Should you add speech recognition to your IVR?

Written by Daniel Noworatzky | Jun 10, 2020 2:49:00 PM

In uncertain or unfamiliar times such as these, customers tend to call companies more than usual to get answers to their questions. Advanced technologies can significantly improve the caller’s experience, provided they are used appropriately. Last week, we looked at best practices in call center design. In this article, we examine speech recognition and how it should (and should not) be used to add value, efficiency, and effectiveness to your contact center procedures.

Speech recognition

Speech recognition refers to the ability of a computer to detect human speech, and to understand the meaning of what is being said. Within the context of an IVR (interactive voice response) system, callers are told to respond with a single word such as “yes,” “no” or “accounts” to be routed accordingly. Or, they may be asked, “How can we help you today?” and their reply will route them to the appropriate menu or agent.

Compare that with a more traditional IVR menu, where callers respond using their telephone keypad. If you have three consecutive menus with four choices in each, callers hear 12 options in total and can reach at most 64 destinations (4 × 4 × 4), but only after working through all three menus. With speech recognition, they potentially have hundreds or thousands of possible choices at a single prompt, limited only by the number of words that the system can understand.

Responding with words streamlines the procedure that a caller follows. Many more user intents can be determined from a single phrase, without multiple nested menus and prompts. This can substantially reduce the time a caller spends in the system.
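
As a rough illustration of routing on a single spoken phrase, the Python sketch below matches keywords in a transcript against a small table of destinations. The table, the destination names, and the `route_utterance` function are all hypothetical; a production system would rely on the grammar or intent model provided by its speech recognition software rather than simple keyword matching.

```python
# Hypothetical keyword table: destination name -> words that suggest it.
# These names are illustrative only and are not taken from any product.
INTENT_KEYWORDS = {
    "billing": ("bill", "invoice", "payment", "charge"),
    "accounts": ("account", "balance", "statement"),
    "support": ("broken", "error", "not working", "help"),
}

# Assumed fallback when nothing matches: hand the caller to a person.
DEFAULT_DESTINATION = "agent"


def route_utterance(transcript: str) -> str:
    """Return the destination whose keywords appear most often in the transcript."""
    text = transcript.lower()
    best_destination = DEFAULT_DESTINATION
    best_hits = 0
    for destination, keywords in INTENT_KEYWORDS.items():
        # Count how many of this destination's keywords occur in the text.
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits > best_hits:
            best_destination, best_hits = destination, hits
    return best_destination


if __name__ == "__main__":
    print(route_utterance("I have a question about a charge on my last bill"))
    print(route_utterance("Can I speak to someone?"))
```

One phrase replaces several rounds of keypad prompts here, and anything the table does not cover falls through to an agent instead of trapping the caller.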

Pitfalls of speech recognition

Speech recognition does have its pitfalls and, if not used appropriately, can do more harm than good. Here are some principles to keep in mind:

Accuracy of understanding – The accuracy of the system is determined by the quality of the speech recognition software, the quality of the caller’s phone line connection, and the clarity of the caller’s diction. If your customers speak with many different accents, the speech recognition system has its work cut out for it. If your users routinely call from mobile phones with poor connectivity, recognition will fail more often. If the system repeatedly fails to understand your callers, you may find that more calls are forwarded to a human agent than before you introduced the technology.
Use it to minimize call time – The objective is to minimize the amount of time a caller spends in the contact center. If your menu structures are not complex (one menu with two choices, for example), it may be preferable to maintain the traditional keypad selection process. If, however, you have four nested menus with seven choices each, then speech recognition will significantly reduce the amount of time a caller spends on the phone.
Tests, tests and more tests – Make sure you test the system thoroughly before putting it into operation. In large contact centers especially, a failed deployment can leave thousands of callers unable to get through or spending too much time trying to get the system to understand what they want.
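
One way to limit the damage from poor recognition is to fall back to the keypad, and then to an agent, after a small number of failed attempts. The sketch below assumes the recognizer reports a confidence score between 0.0 and 1.0; the `RecognitionResult` type, the `collect_choice` function, and the default threshold and retry count are hypothetical values chosen for illustration, not settings from any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecognitionResult:
    text: str
    confidence: float  # assumed to range from 0.0 (no idea) to 1.0 (certain)


def collect_choice(
    recognize: Callable[[], RecognitionResult],
    read_keypad: Callable[[], Optional[str]],
    min_confidence: float = 0.6,   # illustrative threshold, tune through testing
    max_speech_attempts: int = 2,  # illustrative retry limit
) -> str:
    """Try speech first, then the keypad, then hand off to an agent."""
    for _ in range(max_speech_attempts):
        result = recognize()
        if result.confidence >= min_confidence:
            return result.text
    # Speech failed repeatedly: offer the traditional keypad menu instead.
    digit = read_keypad()
    if digit is not None:
        return f"keypad:{digit}"
    # No usable input at all: route to a human agent.
    return "agent"
```

Because the recognizer and the keypad reader are passed in as functions, the fallback logic can be exercised with simulated callers as part of the testing described above, before any real caller reaches it.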

Additional benefits

Speech recognition enables some useful additional features. These include:

Speech-to-text – Any speech that is recognized can be converted into text and stored. This stored text can then be used in many innovative ways. A caller can speak an account number, which can be stored as text in a variable and used to search databases and other data structures, allowing an enriched interaction between caller and system using natural speech.
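
To make the account number example concrete, the sketch below turns a transcript of spoken digits into a string and uses it as a lookup key. The `ACCOUNTS` dictionary stands in for a real database, and the function names and sample record are invented for illustration. It also naively treats every digit word as part of the number, so it assumes the caller has been prompted to say only the account number.

```python
# Spoken digit words mapped to characters; "oh" is a common way to say zero.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

# Hypothetical in-memory stand-in for a customer database.
ACCOUNTS = {"40172": {"name": "Example Customer", "balance": 125.50}}


def transcript_to_account_number(transcript: str) -> str:
    """Keep only digit words and numerals from the transcript, in order."""
    digits = []
    for token in transcript.lower().replace("-", " ").split():
        token = token.strip(".,")
        if token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        elif token.isdigit():
            digits.append(token)
    return "".join(digits)


def look_up_account(transcript: str):
    """Return the matching account record, or None if there is no match."""
    return ACCOUNTS.get(transcript_to_account_number(transcript))


if __name__ == "__main__":
    print(look_up_account("four oh one seven two"))
```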

Translation – Converting spoken language to text also enables translation services. Once the speech is in textual form, mature text translation services can translate it. Then text-to-speech, which is addressed next, can be used to “speak” the translated text to the caller. This three-step translation process can be quite beneficial for multi-language services.
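
The three steps can be sketched as a simple pipeline. The function below does not call any real service: the speech-to-text, translation, and text-to-speech engines are passed in as plain functions, and their signatures are assumptions made for this example. In practice each would wrap whichever vendor service the contact center uses.

```python
from typing import Callable


def translate_speech(
    audio: bytes,
    speech_to_text: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    text_to_speech: Callable[[str, str], bytes],
    source_language: str,
    target_language: str,
) -> bytes:
    """Run the three-step process: recognize, translate, then synthesize."""
    # Step 1: convert the caller's speech to text in the source language.
    transcript = speech_to_text(audio, source_language)
    # Step 2: translate the text with a text translation service.
    translated = translate(transcript, source_language, target_language)
    # Step 3: synthesize the translated text as audio in the target language.
    return text_to_speech(translated, target_language)
```

Keeping the three engines as separate arguments means any one of them can be swapped or tested on its own without touching the other two.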

Text-to-speech – This is a technology that allows a computer to synthesize a human-sounding voice from text. In addition to being useful as part of a translation procedure, it can be used to create caller prompts dynamically, on the spot, rather than having them prerecorded. This makes changes to the contact center simpler and quicker to carry out.
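
As a small illustration of dynamic prompts, the sketch below fills caller-specific values into text templates; the resulting string is what would be handed to a text-to-speech engine. The prompt wording, the `PROMPTS` table, and the `build_prompt` function are hypothetical. It uses `string.Template` from the Python standard library.

```python
from string import Template

# Hypothetical prompt templates; editing the text here changes what callers
# hear without recording any new audio.
PROMPTS = {
    "balance": Template("Hello $name. Your current balance is $amount dollars."),
    "wait": Template("All agents are busy. Your estimated wait is $minutes minutes."),
}


def build_prompt(prompt_id: str, **values) -> str:
    """Fill in a template; raises KeyError if a required value is missing."""
    return PROMPTS[prompt_id].substitute(**values)


if __name__ == "__main__":
    print(build_prompt("balance", name="Alex", amount="125.50"))
    print(build_prompt("wait", minutes=5))
```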

Conclusion

Speech recognition represents an exciting leap forward in serving customers who call in, but it must be used with caution. Make sure the technology is right for your customers’ needs and has been thoroughly tested before putting it into operation.