Losing the Thread?

A large proportion of the working population has less than perfect hearing. When listening to information conveyed in a conversation, this often leads to misunderstanding or guesswork. The problem is exacerbated when listening to telephone-quality speech, where the range of sounds is significantly reduced and the visual cues one would normally get in a face-to-face conversation are absent.

Most of the time the results are not catastrophic, but sometimes they can lead to a loss of life. But you don’t need to lose life to lose business.

We often think of Automatic Speech Recognition (ASR) as a technology to save us getting up from our armchair to change the music, or to save the banks from employing humans, but in fact its benefits can be much more far-reaching – especially for the hard of hearing.

You Either Love or Hate Automatic Speech Recognition Technology

Sadly, a large number of businesses (mainly financial institutions) have decided it’s cheaper to get a computer to answer the telephone than to pay a human, and undoubtedly this generates the most hate. When they shift the burden of the human operator to the caller, they really don’t care how many times you have to repeat your account details because it’s costing them virtually nothing. However, it drives the customers crazy and eventually they will vote with their feet.

On the other hand, if you are driving a vehicle and want to activate a map without causing an accident, you may love it.

When it does not do what you want, it is tempting to assume that automatic speech recognition technology falls woefully short of the human, yet there are circumstances where it often performs better. The human does much more than convert speech into text – the human extracts “meaning”, and to do that applies many different knowledge sources to establish context. Computers are not bad at recognising human speech; they are simply not so good at understanding it – yet.

None of the above is greatly dependent on users having normal hearing. It’s not hugely important because the companies that tend to deploy ASR – for example in ChatBots – are trying to save themselves money rather than save money for the unsuspecting and involuntary user. But if you are hard of hearing and you need to get information from speech, then ASR can be very helpful – particularly if you feel left out in the cold by your “normal-hearing” colleagues.

Is There a Service That Can Help?

There are already several software services out there that convert acoustic speech into text. However, these services are not easy to deploy and, more importantly, while they are fast, they are not fast enough to be used in real time – i.e. for an interactive phone call. Nevertheless, when deployed via a message hub like Threads, they can still provide very useful information after a call is finished.

Therein lies the first hurdle – finding the call. Associated with the audible part of every call is important information which we call metadata. This includes things like the telephone number of the third party and the date and time of the call. If calls are stored and retrieved simply as audio files – like music – they need to be indexed in some way to be found at a later date.

This raw metadata is not particularly useful. Users will want to search for calls based upon the name rather than the number of the third party. This implies some sort of database to relate call metadata to individual contacts and companies. This is one of the things Threads does. But it can also transcribe calls so they can be searched according to content.
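To make the idea of relating call metadata to contacts concrete, here is a minimal sketch in Python. It is purely illustrative and is not how Threads is implemented: the table names, column names, phone numbers and file paths are all hypothetical, and it simply uses an in-memory SQLite database to show how a search by name can be resolved to recordings that are stored by number.

```python
import sqlite3

def build_call_index():
    """Create an in-memory database with hypothetical contacts and calls tables."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE contacts (number TEXT PRIMARY KEY, name TEXT, company TEXT)"
    )
    db.execute(
        "CREATE TABLE calls ("
        "id INTEGER PRIMARY KEY, number TEXT, started_at TEXT, audio_path TEXT)"
    )
    return db

def find_calls_by_name(db, name):
    """Return (started_at, audio_path) for every call with the named contact."""
    rows = db.execute(
        "SELECT calls.started_at, calls.audio_path "
        "FROM calls JOIN contacts ON contacts.number = calls.number "
        "WHERE contacts.name = ? ORDER BY calls.started_at",
        (name,),
    )
    return rows.fetchall()

# Illustrative data only: fictional numbers, names and paths.
db = build_call_index()
db.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("+441632960001", "Alice Example", "Example Ltd"),
        ("+441632960002", "Bob Sample", "Sample plc"),
    ],
)
db.executemany(
    "INSERT INTO calls (number, started_at, audio_path) VALUES (?, ?, ?)",
    [
        ("+441632960001", "2024-01-05T09:30:00", "calls/0001.wav"),
        ("+441632960002", "2024-01-06T14:10:00", "calls/0002.wav"),
        ("+441632960001", "2024-01-08T11:45:00", "calls/0003.wav"),
    ],
)

alice_calls = find_calls_by_name(db, "Alice Example")
```

The point of the join is that the user never has to remember a number: the raw metadata stays attached to the recording, and the contact table supplies the name people actually search for.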

Most users would find it impractical to locate a call recording on their telephone system and then upload it to the ASR service for transcription. Once that is done, the transcription then needs to be tied back to the original recording. Threads achieves this automatically and transparently by intercepting calls on the network.

By automatically storing all calls in a user-friendly database, calls may be recovered, played back and, if required, transcribed at will. This means that the call can be shared and listened to as many times as is necessary to ensure that the discussion has been fully understood.

You Have to See it Before You Believe it!

Once you have seen this in action, you soon realise that being 100% accurate is not so important. This is because the words that are best recognised often contain the highest amount of information. If you can find keywords, you can search your phone calls just like you can search your emails. And once you have located the phone call you want, you find that it is easier to interpret the transcription than to rely on your own perception – particularly if your hearing is worse than average.
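As a rough illustration of why imperfect transcripts are still searchable, the sketch below matches keywords against a couple of made-up transcripts. It is an assumption-laden toy, not a description of any real product: the function names, call identifiers and transcript text are all hypothetical, and the matching is a plain "every keyword must appear" test.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def search_transcripts(transcripts, query):
    """Return the ids of calls whose transcript contains every word in the query."""
    wanted = tokenize(query)
    if not wanted:
        return []
    return [
        call_id
        for call_id, text in transcripts.items()
        if wanted <= tokenize(text)
    ]

# Made-up transcripts; "meat" stands in for a misrecognised "meet".
transcripts = {
    "call-1": "please send the invoice for the boiler repair by friday",
    "call-2": "we can meat on tuesday to discuss the contract renewal",
}

renewal_hits = search_transcripts(transcripts, "contract renewal")
invoice_hits = search_transcripts(transcripts, "invoice")
```

The second transcript contains a recognition error, yet a search for "contract renewal" still finds it, because the information-bearing words were recognised correctly.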

In summary, call recording and transcription have considerable value for users with normal hearing. However, they provide even more value for those who might not normally feel able to participate directly in phone calls. And those users can have as much to contribute as those involved in the calls.