We need to talk about speech recognition

Listen to me, dammit

According to a lot of sci-fi, we should be talking to our computers more. As I type this on the computer I'm wondering: would I really prefer to be saying it? The answer is that I'm not sure. I installed some speech-recognition software on my PC about 15 years ago; it didn't understand me and I didn't want to train it. That's a long time ago in computing terms, but the lethargy has stuck. Now speech input, command and dictation seem to be on everyone's radar again thanks to the popularity of Apple's Siri on the iPhone 4S.

Look under the skin of the Siri software and you will find that the speech recognition is provided by industry stalwart Nuance Communications. The Nuance speech-recognition 'engine' resides on Apple's servers and the computation takes place there too; Siri on the iPhone 4S is just an application layer that taps into it. This is why Siri was left out of the new iPad: the tablet isn't necessarily always online (via Wi-Fi or 3G) like a phone is supposed to be. Other companies use Nuance's speech-recognition software on their servers too, as it's a leader in the field, supporting over 60 languages and with a history going way back to 1982.

The two big rivals to Nuance are Google and Microsoft. Google's speech technologies were developed with the help of one of the Nuance co-founders, Mike Cohen. You can use the voice actions in Android, voice dialer, search and so on, but without an 'assistant' tying them together the overall user experience is much more fragmented. Meanwhile Microsoft has its own speech engine. I've just tried it on my Windows 7 machine and it was quite fun, except for having to spell out and correct the words 'Nuance' and 'Google', then sort out the capitalisation, and now I've given up already and gone back to typing! Microsoft purchased Tellme five years ago to acquire the firm's speech technology, which features in Windows Phone, Xbox/Kinect, PCs with Windows 7 and a couple of cars from Ford and Kia.

Controlling some functions of your 2012 Ford Explorer by voice

I think Siri has shown us that the most important thing here isn't the voice-recognition technology but the application software layer that users go through to reach it. An automated phone service may use Nuance's software on its server to recognise a caller's requests, but the application built on top of it isn't necessarily programmed to interpret all the ways you might say, for example, "No, thank you". Answers such as "Nope" and "Not for me" are recognised by Nuance, sure, but not accepted by the application layer as a valid response, so people think that the speech recognition is at fault. Siri may also work better than the rivals I have seen because of the standardisation of hardware and software on the iPhone. PC and Android systems can differ greatly depending on vendors' and users' preferences, so it's hard to be as polished when dealing with such a jumble of applications.
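
The gap between the recogniser and the application layer can be sketched in a few lines of Python. This is only an illustration, assuming the speech engine has already handed back a text transcript; the phrase lists and the `normalise` and `interpret` names are made up for this example and are not part of Nuance's or Apple's software.

```python
import re

# Hypothetical phrase lists for illustration only; a real application
# would need far wider coverage than this.
INTENT_PHRASES = {
    "decline": {"no", "no thank you", "no thanks", "nope", "not for me"},
    "accept": {"yes", "yes please", "yeah", "sure", "ok"},
}


def normalise(transcript: str) -> str:
    """Lower-case the text, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z' ]+", " ", transcript.lower())
    return " ".join(cleaned.split())


def interpret(transcript: str) -> str:
    """Map a recognised transcript to an intent, or 'unknown' if none matches."""
    text = normalise(transcript)
    for intent, phrases in INTENT_PHRASES.items():
        if text in phrases:
            return intent
    return "unknown"


if __name__ == "__main__":
    for heard in ["No, thank you", "Nope", "Not for me", "Maybe later"]:
        print(heard, "->", interpret(heard))
```

An application that only listed "no" under `decline` would reject "Nope" even though the recogniser transcribed it perfectly, which is the kind of failure callers tend to blame on the speech recognition itself.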

I want to talk

Perhaps understanding the greater importance of the app layer will help speech recognition reach widespread use. Nuance released Dragon Go in 2011 and also bought out rival Vlingo. Dragon Go is quite a well-reviewed app for both iOS and Android, though it doesn't talk back like Siri does. Speech really does seem like a natural human way to communicate: when possible we speak to people rather than type, so the same should be true of communicating with our computers. However, when we create or make things we use tools, so mice, keyboards, pens and touch control seem like a winning combination for those tasks and will always be essential for a lot of people.

It's not just about our computers and phones. As mentioned above, Microsoft has had the foresight to include its speech tech in the Xbox 360 and in car controls. Nuance also wants more of our household appliances to use its voice tech, including coffee machines, fridges, alarm systems and TVs. At the weekend Dragon TV was demonstrated to a reporter at The New York Times, including schedule search and channel-changing functions controlled by voice. LG Electronics plans to use this Nuance software in a new TV shortly. As well as using the TV navigation functions, people will be able to use Skype, shop on Amazon and update their Facebook and Twitter accounts using the microphone built into the remote. No headsets required...

The nearest (okay, there are others, but it's the most well known) alternative to Siri on Android is Vlingo. And Nuance bought Vlingo (as it says in the article), so it kind of looks like they've cornered the market wrt speech recognition.

I've got Vlingo on my phone at the moment. I don't use the speech recognition features, but the vocalisation of text messages etc. is useful to me.

2 reasons why it isn't popular:
1) it is actually very difficult; I only scratched the surface of it when studying it, but it's very complex.
2) (and this is the biggun really) you look and feel like an idiot!

I've toyed with voice recognition software on and off over the last 15 or so years, and it still doesn't feature as a useful thing for me. I can type faster and with more accuracy on a keyboard. Furthermore, much of the time that I spend typing is in reasonably quiet environments, where it would not be appropriate to be speaking out loud. And then there are the noisy environments, where there are technical hurdles to overcome - like someone shouting over your shoulder, and suddenly your scientific paper has “Oi Roobubba, get me couple of jammy doughnuts too!” inserted randomly.

Don't get me wrong, I think it's a technology that would be great to have working properly, and it's getting quite close to being genuinely usable.

It's just not quite there, yet…

The final test will be when it recognises Scottish accents.

I can never get on with voice recognition software. I want to like it, and I can sometimes get it to work almost flawlessly with my voice, BUT, and this is a big but, I can't be productive with it… Personally, I have to just think to myself and type it down; it is a separate process for me. As soon as I try to change the typing to voice recognition, my way of thinking has gone. I start thinking in my head more and more, and it puts me off talking to myself (yes, I talk LOADS when discussing a problem with myself :L) as the system will take down what I say.

Even if I do manage to concentrate enough to work out what I need to input, I lose track of my progress; my mind just can't cope with talking to the system WHILE thinking about what I was meant to say. As above, typing just works for me: I tell myself what I need to type and boom, it transfers and my hands have already dealt with it. Occasionally something gets missed, but the error rate is relatively low.

If I could get past that barrier I wouldn't mind at all, though only for my home PC, mind, as you look like a total tool using voice recognition in public! As for Siri, it really isn't there at all… my friend was trying to show off with it and it didn't work 90% of the time… It's really pointless for phones, and I can't see myself being the only person who thinks the whole point of voice recognition is hands-free use, yet with Siri you have to press the start button… If you are touching your phone anyway, I'm sure it's easier to just TYPE the line you want to search in Google, or hit music, than wait for Siri to make the noise, talk, and then wait for it to load.

HEXUS Forums :: 10 Comments
