Posts Tagged ‘vlingo’

Tim O’Reilly: Google Voice Search Key Technology

Thursday, April 2nd, 2009

ReadWriteWeb reports that Tim O’Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies for the Web >2.0. Voice search (the Google iPhone app), he claimed, was a tipping point in terms of “sensor-based interfaces”.

While not the only vendor to provide voice search (e.g. Yahoo oneSearch, powered by Vlingo), Google certainly seems ahead in the game, in what appears to be the gradual unfolding of a broad voice strategy: first Voice Search, and more recently the rebranding of a feature-enhanced GrandCentral as Google Voice. Future work we can expect on the voice front includes promotion of its own speech recognition capacities through Android, Google Gears bringing speech capacities to all browsers, tighter integration of Gaudi (audio indexing) with other services and, perhaps one day, opening up voice services over APIs.

As I’ve previously pointed out, to Google, voice is just another form of data. What is slowly beginning to emerge, though, is a central role for speech and voice technologies in coming developments for the web and in how we search and interface with it.

More speech on the iPhone

Sunday, February 8th, 2009

The iPhone has proved a game-changer in many regards and speech is no exception. Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone.
Today I came across another sighting of iPhone speech recognition: Vocalia by Creaceed, which employs the open-source ASR engine Julius as its back-end technology. There is no “push to talk” button but a “shake to retry” gesture, which may prove useful when recognition goes awry. The app supports French, English and German for now and costs €2.99. Dictation is not available at this point, though Julius is certainly capable of it from an architecture point of view.

Other speech- and language-related iPhone apps:

Has anyone used these extensively? What is your experience with speech on the iPhone?

The Times Reports & Is SciFi Really Wrong?

Sunday, January 27th, 2008

The New York Times today published an interesting, if brief, article about speech recognition in the mobile/telco space, cited as a “$1.6 billion market in 2007”. The article provides a brief overview of a range of applications and mashups that use voice, such as vlingo.com and SimulScribe, as well as some directory assistance services, though it omits others such as SpinVox and GOOG411.
The article opens:

“Innovation usually needs time to steep. Time to turn the idea into something tangible, time to get it to market, time for people to decide they accept it. Speech recognition technology has steeped for a long time”

And concludes:

“Even a digital expert [...] cautions that some people may never be satisfied with the quality of speech recognition technology — thanks to a steady diet of fictional books, movies and television shows featuring machines that understand everything a person says, no matter how sharp the diction or how loud the ambient noise.”

But isn’t this a bit hackneyed? Perhaps by today’s standards a twenty-year steeping period seems long, but by the standards of almost any other period in history it hardly is. And after re-watching 1982’s Blade Runner recently, I actually felt rather optimistic that we are today close to what the movie expected of speech recognition and speaker verification in 2019. Elsewhere, a similar picture emerges.
The Star Trek ship computer’s speech recognition engine (the year is 2151), while accurate, still requires the push of a button to kick in, rather than listening for the hot word “computer”, a capacity that is available today, if not quite ripe for deployment.
Of course, there are the HALs (2001), Marvins (no date) and C-3POs (a long, long time ago…), whose capacities far exceed anything we dare dream our mobile phones will one day have. But here it seems the problem is less about the quality of speech technology – the quality of HAL’s speech synthesis is available today, and Marvin’s characteristic monotone baritone should be easy to do – than about the old hard-soft divide in Artificial Intelligence. As long as we use a hard-AI problem, which speech arguably is, to solve soft-AI problems (“find closest pizza service”), we cannot fail to be disappointed.