Listen to me, dammit
According to a lot of sci-fi we should be talking to our computers more. As I type this on the computer I'm thinking: would I really prefer to be saying it? The answer is that I'm not sure. I installed some speech-recognition software on my PC about 15 years ago; it didn't understand me and I didn't want to train it. That's a long time in computing terms, but the lethargy has stuck. Now speech input, voice commands and dictation seem to be on everyone's radar again, thanks to the popularity of Apple's Siri on the iPhone 4S.
Look under the skin of the Siri software and you will find that the speech recognition is provided by industry stalwart Nuance Communications. The Nuance speech-recognition 'engine' resides on Apple's servers and the computation takes place there too; Siri on the iPhone 4S is just an application layer that taps into it. This is why Siri was left out of the new iPad: the tablet isn't necessarily always online (via Wi-Fi or 3G) the way a phone is supposed to be. Other companies run Nuance's speech-recognition software on their servers too, as it's a leader in the field: it supports over 60 languages and its history goes back to 1982.
The two big rivals to Nuance are Google and Microsoft. Google's speech technologies were developed with the help of one of Nuance's co-founders, Mike Cohen. On Android you can use Voice Actions, voice dialling, voice search and so on, but without an 'assistant' tying them together the overall user experience is much more fragmented. Meanwhile Microsoft has its own speech engine. I've just tried it on my Windows 7 machine and it was quite fun, apart from having to spell out and correct 'Nuance' and 'Google' and then sort out the capitalisation. I've given up already and gone back to typing! Microsoft bought Tellme five years ago for its speech technology. Microsoft's speech tech now features in Windows Phone, Xbox 360/Kinect, Windows 7 PCs and a couple of cars from Ford and Kia.
I think Siri has shown us that the most important thing here isn't the voice-recognition technology itself but the application layer through which users reach it. An automated phone service may use Nuance's software on its server to recognise a caller's requests. But the application built on top of Nuance's speech recognition isn't necessarily programmed to interpret all the ways you might say, for example, "No, thank you". Answers such as "Nope" and "Not for me" are recognised by Nuance, sure, but the application layer doesn't accept them as valid responses, so people conclude that the speech recognition is at fault. Siri may also work better than the rivals I have seen because the iPhone's hardware and software are standardised. PC and Android systems can differ deeply depending on vendors and users' preferences, so it's hard for any assistant to be as polished when it has to deal with such a jumble of applications.
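To make the "No, thank you" point concrete, here is a toy sketch in Python of the application layer only. It assumes the speech recogniser has already turned the caller's words into a correct transcript; the function names `normalise` and `interpret` and the phrase sets are hypothetical, made up for illustration, and this is not how Nuance or any real phone service is actually built. The same transcript succeeds or fails purely depending on which phrases the application was told to accept.

```python
# Hypothetical sketch: the recogniser is ASSUMED to have already produced
# an accurate transcript, so only the application layer is modelled here.

# An application that only accepts one exact phrasing of "no".
RIGID_DECLINE = {"no thank you"}

# A broader (still hand-written, hypothetical) set of ways to say "no".
FLEXIBLE_DECLINE = {
    "no", "no thanks", "no thank you", "nope", "not for me", "no way",
}


def normalise(transcript: str) -> str:
    """Lower-case and drop punctuation so 'No, thank you.' becomes 'no thank you'."""
    cleaned = "".join(ch for ch in transcript.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def interpret(transcript: str, declines: set[str]) -> str:
    """Map a recognised transcript to an intent, or 'unrecognised' if the app can't."""
    if normalise(transcript) in declines:
        return "decline"
    return "unrecognised"


if __name__ == "__main__":
    for heard in ["No, thank you.", "Nope", "Not for me"]:
        print(f"{heard!r}: rigid={interpret(heard, RIGID_DECLINE)}, "
              f"flexible={interpret(heard, FLEXIBLE_DECLINE)}")
```

Run as a script, the rigid application rejects "Nope" and "Not for me" even though the transcript is perfect, while the flexible one accepts all three; from the caller's side, the rigid version looks like bad speech recognition.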
Perhaps a better understanding of how much the app layer matters will help speech input catch on more widely. Nuance released Dragon Go in 2011 and also bought out rival Vlingo. Dragon Go is a well-reviewed app for both iOS and Android, though it doesn't talk back the way Siri does. Speech really does seem the natural human way to communicate: when we can, we speak to people rather than type to them, so the same should be true of communicating with our computers. However, when we create or make things we use tools, so mice, keyboards, pens and touch control seem like a winning combination for those tasks and will always be essential for a lot of people.
It's not just about our computers and phones. As mentioned above, Microsoft has had the foresight to build its speech tech into the Xbox 360 and car controls. Nuance also wants more of our household appliances to use its voice tech, including coffee machines, fridges, alarm systems and TVs. At the weekend Dragon TV was demonstrated to a reporter from The New York Times, including schedule search and channel changing controlled by voice. LG Electronics plans to use this Nuance software in a new TV shortly. As well as navigating the TV, people will be able to use Skype, shop on Amazon and update their Facebook and Twitter accounts using the microphone built into the remote. No headsets required...