Back in the 1980s I worked on speech recognition at the National Physical Laboratory (NPL), Teddington. At that time there were two main strategies. At NPL we took a phonetic approach (called WISPA) – breaking speech into units of human understanding – whereas almost everyone else (eg IBM, CMU, GCHQ/JSRU) was using a pattern matching based approach. Pattern matching used to be described as artificial intelligence, but not any more. It would not be accurate to call it a neural network approach either, but it was similar in that the idea was simply to chuck loads of digitised utterances at the system, tell it what word was being uttered, and then let the system work out what they all had in common.
We took the phonetic approach at NPL partly because, back then, computer power was insufficient to do pattern matching in real time, but mainly because we thought that pattern matching (of speech spectrograms) wasn’t what humans were doing when they recognised speech. Although the pattern matching techniques worked well, what we noticed was how much better our approach was at handling phonetically identical words with large acoustic differences – eg the same words spoken by different people.
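(For anyone who never met it, here is a minimal, hypothetical sketch of the pattern matching idea in Python: store labelled example utterances as feature vectors, then classify a new utterance as its nearest stored match. The feature vectors, labels and distance measure are all invented for illustration – real systems of the era matched spectrogram templates with time-warped distance measures, not raw vectors like this.)

```python
# Illustrative sketch only: the pattern matching idea reduced to its
# essentials. All data and the distance measure are invented.
import numpy as np

templates = []  # list of (word_label, feature_vector) pairs

def train(label, features):
    """'Chuck an utterance at the system' and tell it the word."""
    templates.append((label, np.asarray(features, dtype=float)))

def recognise(features):
    """Classify an utterance as the word of its nearest stored example."""
    x = np.asarray(features, dtype=float)
    best_label, best_dist = None, float("inf")
    for label, template in templates:
        dist = np.linalg.norm(x - template)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Usage: two training utterances, then recognise a new one.
train("yes", [0.9, 0.1, 0.4])
train("no",  [0.2, 0.8, 0.5])
print(recognise([0.85, 0.15, 0.45]))  # -> yes
```

The real systems were vastly more elaborate, but the shape of the approach – examples in, labels in, let the distances sort it out – is the same.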
After I left NPL and started JPY, I never thought I’d get involved in pattern recognition again. That all changed about 5 years ago when we started a speculative project called Threads – which evolved from a system we originally developed to aggregate messages – emails and VoIP phone calls – to help us collaborate better. Although we expect speech recognition to become a big part of Threads, we have no intention of re-inventing it – there are now some really good speech recognisers around. However, it did cause us to start looking at neural networks as a way of correlating the information in messages (text) with “projects”. Although neural networks are great, in truth, for us it was a bit of a cop-out – it saved us having to work out exactly how we identified a project. So it started me wondering whether we shouldn’t first be breaking up (or tokenizing) messages to extract units of human understanding before processing further – maybe still using neural networks, pattern matching, or whatever.
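To make the tokenize-first idea concrete, here is a minimal, hypothetical sketch: word tokens are extracted from a message first, and a deliberately crude keyword count then stands in for whatever neural network or pattern matcher would do the real matching. The project names and vocabularies are invented.

```python
# Hypothetical sketch: extract word tokens (units of meaning) from a
# message first, then score them against per-project vocabularies.
import re
from collections import Counter

def tokenize(message):
    """Break a message into lower-cased word tokens before any
    further processing."""
    return re.findall(r"[a-z']+", message.lower())

# Invented vocabularies; in practice these associations might be
# learned by a neural network rather than listed by hand.
PROJECT_VOCAB = {
    "Threads": {"message", "aggregate", "email", "voip", "call"},
    "WISPA":   {"speech", "phoneme", "utterance", "spectrogram"},
}

def match_project(message):
    """Return the project whose vocabulary best covers the tokens."""
    tokens = Counter(tokenize(message))
    scores = {project: sum(tokens[word] for word in vocab)
              for project, vocab in PROJECT_VOCAB.items()}
    return max(scores, key=scores.get)

print(match_project("Can we aggregate the VoIP call logs with email?"))
# -> Threads
```

The point is the ordering, not the scoring: the message is broken into units of human understanding first, and whatever does the matching works on those units rather than on raw text.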
Unfortunately, this is not what everyone wants to hear – investors especially, who think that AI is where all the new money should be going. The term AI is fine, as long as we don’t kid ourselves. It doesn’t necessarily mean emulating the human brain. It can be just as “intelligent” to do a Google search for an answer as to try and work it out algorithmically. Remember that Turing originally defined a machine as “intelligent” if, when questioned, we couldn’t tell that it wasn’t a human. If it seems like a human, then it must be intelligent. There were no preconditions about how the machine should come up with the answer.
To my delight, I chanced upon this article in New Scientist by Sara Agee…
It seems I am not the only person to wonder whether AI and neural networks might turn out to be the King’s New Clothes. What we were doing at NPL in the ’80s is now called Symbolic AI, and it’s making a comeback. To (re)quote Roger Schank of Northwestern University in Illinois:
“When people say AI, they don’t mean AI. What they mean is a lot of brute force computation.”
Computing power is cheap. My iPhone has 1000 times the power of the computers I was using to do speech recognition in the ’80s. But that doesn’t mean it can replicate human intelligence – computers are still nowhere near the power of the human brain. And neural networks, clever as they are, do no more than replicate some of the processes of the brain. Both have their value, but there is no substitute for understanding the problem you are tackling. If you want a piece of software to act like a human, then it has to understand like a human.