People either love or hate speech recognition services, whether they use them willingly or out of necessity. There’s rarely a middle ground. Users show less tolerance when they don’t have the option, like in many banks that won’t let you speak to a human until you’ve struggled with their voice assistant for 20 minutes. However, even when people choose to use speech assistants, such as Alexa or Siri, their tolerance still varies widely.
Human versus computer
The problem with automatic speech recognition (ASR) is that there is rarely any measurement of its performance available. You would not buy a motor car without knowing the fuel consumption first, yet if a company decides to offer ASR interaction to its customers, there is very little information on which to base a judgment as to whether it will enhance customer satisfaction. Sadly, many companies offering speech as a method of interaction simply don’t care. A human operator will cost them £50 per hour, whereas a voice-operated bot will cost them pence per hour. They don’t care if the customer spends 20 minutes inputting information that would take a human 2 minutes. It is the customer’s time that is wasted, not the company’s.
Maximising ASR performance
That said, ASR performance is a very difficult thing to measure. It depends on so much: the quality of the speech, the pronunciation and motivation of the speaker, the size of the vocabulary, the language in use. The list is endless. A highly motivated speaker using a high-quality microphone can achieve 95% recognition performance for a given phrase, yet only 20% of the words may be recognised when the same phrase is uttered during a telephone call.
Add to that the fact that ASR service providers are reluctant to provide any sort of performance measurement when they don’t have to. Much better to let the prospective customer assume the best.
Accuracy and reliability of ASR
Because call transcription in Threads depends on ASR, we have always considered it crucial to establish customer expectations before committing to it. It’s all very well saying “if you don’t like it, you can have your money back”, but that does not compensate for the time and money spent on providing the infrastructure needed before call transcription can start improving the bottom line. And used in the proper way, the benefits can be massive.
Given the dearth of information from the ASR service providers and the fact that the performance of ASR is so variable, we devised a framework in which prospective Threads customers can test the performance of ASR with:
- The choice of ASR services
- Their own staff
- Their own telephony equipment
- Their own dialogues
To make things easier, we provide a standard dialogue in 5 European languages which prospective customers can recite and get an exact measurement of the words correctly and incorrectly recognised. That will not necessarily reflect the operational performance, but it does avoid wasting time and money on a technology not yet fit for their purpose.
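For readers curious about what counting correctly and incorrectly recognised words can look like in practice, here is a minimal Python sketch. It is not the Threads framework itself, whose scoring method is not described here. It assumes the widely used word error rate (WER) approach: align the recognised text against the script with a word-level edit distance, then count matches, substitutions, deletions and insertions. The function names `normalise` and `score_transcript` and the sample phrases are purely illustrative.

```python
import re


def normalise(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w'’]+", text.lower())


def score_transcript(reference, hypothesis):
    """Compare what was said (reference) with what the ASR returned (hypothesis).

    Illustrative only: uses a standard word-level edit-distance alignment.
    """
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference script must contain at least one word")
    n, m = len(ref), len(hyp)

    # cost[i][j] = fewest word edits needed to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to label each step of one best alignment.
    counts = {"correct": 0, "substituted": 0, "deleted": 0, "inserted": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            counts["correct" if ref[i - 1] == hyp[j - 1] else "substituted"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deleted"] += 1  # a spoken word the ASR missed
            i -= 1
        else:
            counts["inserted"] += 1  # a word the ASR added
            j -= 1

    errors = counts["substituted"] + counts["deleted"] + counts["inserted"]
    counts["wer"] = errors / n
    counts["word_accuracy"] = counts["correct"] / n
    return counts


if __name__ == "__main__":
    script = "Please transfer fifty pounds to my savings account."
    recognised = "please transfer fifteen pounds to savings account now"
    print(score_transcript(script, recognised))
```

In this made-up example, six of the eight scripted words are recognised correctly, one is substituted, one is dropped and one extra word is inserted, giving a word error rate of 3/8. Because insertions count as errors, WER can exceed 100% on very poor transcripts, which is one reason to report the individual counts alongside it.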
Future advancements in ASR
An interesting spin-off from this work is that it gives us the ability to see how ASR systems are evolving and improving with time. We can submit recordings made months or even years ago and compare the performance now with the performance then.
And another thing it shows is the importance of context in ASR. People are less tolerant of ASR systems when they compare them to humans because they do not realise that the computer has none of the other clues we humans use to recognise speech. In terms of pure word recognition rates, computers are often better than humans. What computers don’t have is a knowledge of the subject, the speaker, the language and a plethora of other contextual information we take for granted. Threads uniquely extracts context from other sorts of digital messages such as emails, to improve ASR performance.
Evaluating Siri and Alexa
And the answer to the question, “how good are Siri and Alexa?” is “it all depends!” Test them yourself with one of our speech workout scripts.
If you are more interested in the science and technology behind Threads’ speech recognition performance measurement framework, have a read of our white paper on the subject: “A Framework for automatically measuring performance of Automatic Speech Recognition Systems”.
Happy transcribing.