Posts Tagged ‘ASR’

Speech and Dialog Conferences / Speech for iPhone and Android

Saturday, July 11th, 2009

Conference time: I will be spending a couple of days in London and Brighton from September 5th, attending Interspeech and SIGDIAL as well as a researcher round-table. If you are interested in meeting up, feel free to get in touch.

Also, here is some more or less recent, interesting news for Android (at about 6:20, thanks Schamai) and iPhone speech developers.

Tim O’Reilly: Google Voice Search Key Technology

Thursday, April 2nd, 2009

ReadWriteWeb reports that Tim O’Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies for the Web >2.0. Voice search (the Google iPhone app), he claimed, was a tipping point in terms of “sensor based interfaces”.

While not the only vendor to provide voice search (e.g. Yahoo oneSearch, powered by Vlingo), Google certainly seems ahead in the game in what appears to be the gradual unfolding of a broad voice strategy, with moves such as Voice Search and the recent rebranding of a feature-enhanced GrandCentral as Google Voice. Future work we can expect on the voice front includes promotion of its own speech recognition capacities through Android, Google Gears bringing speech capacities to all browsers, tighter integration of Gaudi (audio indexing) with other services and perhaps, one day, opening up voice services over APIs.

As I’ve previously pointed out, to Google, voice is just another form of data. What is slowly beginning to emerge, however, is a central role for speech and voice technologies in coming developments for the web and in how we search and interface with it.

Language Technology April Fools

Wednesday, April 1st, 2009

Just posting some gems from today concerning speech and language technology, such as natural language generation, speech recognition and natural language processing.

Have you found any others?

Microsoft Recite Preview – Note Dictation and Voice Search

Monday, February 16th, 2009

Ars Technica reports today on the release of the Microsoft Recite “Technology Preview” for Windows Mobile. The application lets users record short notes as audio snippets, which can later be searched by speaking key words. Apparently it relies not on speech recognition but on simpler pattern matching, meaning notes cannot be searched in text form, but it may work more robustly and avoids the training effort needed for speaker independence.
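To illustrate the general idea of matching audio against audio without any transcription, here is a minimal sketch using dynamic time warping (DTW), a classic template-matching technique. This is not a description of how Recite works, since the report gives no such details; the function names and the toy one-dimensional "feature" sequences are made up for illustration, and a real system would compare frames of acoustic features rather than single numbers.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Smaller values mean the sequences are more alike, even if one is
    spoken faster or slower than the other.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def best_match(query, notes):
    """Return the name of the stored note closest to the spoken query."""
    return min(notes, key=lambda name: dtw_distance(query, notes[name]))


# Hypothetical stored notes, keyed by a label for readability only.
notes = {
    "groceries": [1, 3, 4, 1],
    "meeting": [5, 5, 0, 2],
}
# The query repeats one value, as if the word were spoken more slowly.
print(best_match([1, 3, 3, 4, 1], notes))
```

Because the comparison is audio against audio from the same speaker, no acoustic model or vocabulary is needed, which is consistent with the trade-off described above: nothing is ever converted to text.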

While not a full product yet, this sounds like a nifty little application for cognitive off-loading.

Have you tried Microsoft Recite?



More speech on the iPhone

Sunday, February 8th, 2009

The iPhone has proved a game-changer in many regards and speech is no exception. Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone.
Today I came across another sighting of iPhone speech recognition: Vocalia by Creaceed, which employs the open-source ASR engine Julius as its back-end technology. There is no “push to talk” button but a “shake to retry” feature, which may prove useful when recognition goes awry. The app supports French, English and German for now and costs €2.99. Dictation is not available at this point, though Julius is certainly capable of it from an architectural point of view.

Other speech and language related iPhone apps:

Has anyone used these extensively? What is your experience with speech on the iPhone?

Zumba Lumba – iPhone killer or simply a hoax?

Monday, February 2nd, 2009

A no-frills phone with the unlikely name of Zumba Lumba has recently received some attention from the BBC. The phone is said to be top-secret, developed by a defense-aviation company. It does without frills like a camera or an applications platform, but touts some interesting security and computational features, (not only) related to speech technology:

  • Cloud computing – the phone uses no local storage for contacts or data.
  • Network speech recognition – user input is recognized over the internet. This should avoid hardware intensive local computing for voice input, but requires internet access.
  • Voice identification – enhanced security, because the phone will only respond to a single user’s voice.

Some seem to think this is a potential iPhone killer, at least in terms of making use of innovative input modalities (though Google already released a speech recognition app for the iPhone). Others simply think it’s a hoax.

Either way, the idea of joining mobile with cloud computing is interesting. Using voice identification for security has its appeal as well, even if it’s unclear whether keeping data in the cloud and sending voice data over the internet is any more secure than simply keeping data on your phone, locally.

SVOX purchases Siemens AG speech-related IP

Monday, January 26th, 2009
Following Nuance’s acquisition of IBM speech technology intellectual property two weeks ago, Zurich-based SVOX today announced the purchase of the Siemens AG speech recognition technology group. The deal aims at creating “obvious synergies of developing TTS, ASR and speech dialog solutions” and extends SVOX’s portfolio of technologies, which to date included only highly specialized speech synthesis solutions, to now include speech recognition.
Like the Nuance-IBM deal (and unlike the Microsoft acquisition of TellMe), this merger breaks with the obvious big-fish small-fish paradigm. Here, a larger company’s (IBM, Siemens) R&D division was sold to a smaller, more specialized company (Nuance, SVOX).
Both transactions come with an intent to pursue development of novel interactive voice applications. However, while Nuance announced the potential development of applications across platforms and environments with IBM expertise and IP, SVOX appears to stay on course with its successful line of automotive solutions to build “a commanding market share in speech solutions for premium cars”.

This deal adds SVOX to a list of companies offering network and embedded speech recognition technologies, also including Nuance, Telisma, Loquendo and Microsoft. Financial terms of the deal were not announced.

IBM Predicts Talking Web

Friday, November 28th, 2008

IBM’s annual crystal ball list of Innovations That Will Change Our Lives in the Next Five Years includes a forecast of a voice-enabled talking web. “You will be able to sort through the Web verbally to find what you are looking for and have the information read back to you,” the article predicts.
IBM itself has launched several voice-enabled products and initiatives over the years, most notably the WebSphere Voice family of web servers, which adds various voice functionality to its flagship WebSphere platform, leveraging it in areas such as unified messaging and call-center automation.
Some problems exist with a vision such as the one advocated by the article. Speech recognition accuracy and noise filtering have obviously come a long way and may only pose a minor impediment.
The user’s desire to speak rather than type or click is another problem. Issuing voice commands in the presence of others may not always be desirable and can be disruptive, for instance at work or on public transport. Lastly, there are usability concerns, beyond the quality of speech technology, when converting a visual 2- or even 3-dimensional representation of information into a 1-dimensional audio stream. The cognitive load increases significantly with tasks more complex than, for instance, obtaining time-table information or finding the nearest Italian restaurant.
The effort that stands behind the vision, to put voice technology to uses beyond call-center automation, is laudable. Mobile internet access and computing on the road may indeed do their part to make this vision come true. And clearly, there are use cases, such as improved accessibility for users with impairments, that in their own right merit making the web voice-accessible. Widespread usage of a voice-enabled web, however, may be more than five years off.

Google Mobile iPhone App with Speech Recognition

Tuesday, November 18th, 2008

Google released a new feature for its Google Mobile iPhone Application yesterday: voice search. Users speak a query and the application returns search results formatted for the iPhone. This is similar to the GOOG411 directory assistance application, which allows users to call a phone number, speak a query and receive information about local listings in voice or SMS formats. However, the new application apparently performs recognition locally on the iPhone, meaning it comes bundled with an embedded speech recognition engine.

Aside from GOOG411, during the US presidential election campaign Google released Gaudi, a voice indexing technology for video. That makes the iPhone app the third official service from the company to make use of speech recognition, leaving one guessing when Google’s speech technology will become available as an API, like the Google AJAX Language API for translation and transliteration, rather than bundled into software services. Also, an Android version is probably in the works, one would guess.

All applications are available in US English for now.

Nuance buys Philips Speech Recognition Systems

Thursday, October 2nd, 2008

Nuance announced this week its acquisition of Philips Speech Recognition Systems. This represents another step in a series of acquisitions by the speech technology giant towards market and portfolio expansion. In 2002, ScanSoft Inc., which through further mergers and acquisitions became today’s Nuance, already acquired Philips’ network speech processing group, though not its dictation unit. With this week’s acquisition, the dictation unit will be incorporated into Nuance’s already strong dictation portfolio, expanding especially in European healthcare markets, the company announced. Highlights of the purchase include a larger customer base, broader language and solutions portfolios, additional distribution channels and a great leap forward in international expansion.