Archive for the ‘Vendors’ Category

Assistive and Accessibility Technology

Wednesday, November 21st, 2007

Diligent readers may have noticed that dominant news bits concerning speech and language technologies seem to focus on their cost- or time-saving aspects. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, e.g. where it’s fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.
Not so this week. This week seems to shine under the assistive/accessibility technology star. Note the SourceForge project “Speak as Daisy”, a Microsoft Word plugin that enables creation of XML files with markup for speech synthesis or electronic braille generation. The plugin is said to be available in 2008.
Mac users who need better document read-back in British English will rejoice over the improved Infovox iVox voices.
Philips and Elsevier are developing a speech-enabled diagnostic system for radiologists.
Behold Nattiq’s USB Hal Pen, which allows blind users to use the company’s accessibility features on any computer with a USB port without installation.
Of course, there’s some overlap with time- and cost-saving technologies as well. The FBI has announced widespread use of Nuance’s Dragon NaturallySpeaking dictation software for report and interview transcription.
Lastly, here’s an apropos rant against call center automation and about frustrated end-users, a target group all too often neglected by speech and language technologies. Perhaps the “money savers” employing speech technology could learn a lesson about usability from those who rely on speech recognition and synthesis for their daily needs. I don’t know, but F-word spotting as a means of prioritizing frustrated callers seems like an acknowledgement of defeat.

Back in the saddle with MSFT, GOOG and VoiceGlue

Tuesday, November 13th, 2007

Back after an extensive break. Been working hard on some of my own multi-modal ideas. Keep your eyes peeled.
Looks like it’s been a quiet fall, speech and language technology-wise. After GOOG-411, Microsoft has also added speech to their search engine endeavors (if in a different domain) by speech-enabling Live Search for mobile users. Nuance continues to consolidate the speech tech market.
Exciting news on the IVR front. Finally, a serious attempt to integrate various open-source technologies to provide free carrier-grade speech/telephone services is under way. VoiceGlue has managed to combine OpenVXI (VXML browser) and Flite (speech synthesis) on Asterisk and is planning to integrate Sphinx2 for speech recognition. All components would then be available under some form of the GPL. Could this herald a change in the availability of speech telephone platforms for developers unwilling to dish out horrendous per-port costs? Something to follow, anyway.
Lastly, here’s an article describing the growing role of speech in warehouse management.

In-Game Speech with Fonix and Nintendo

Tuesday, July 3rd, 2007

Slow week in terms of language technology news.
On the gaming front: Nintendo announced they were playing the middleware game for Wii development by opening up the platform to third-party technologies. Among the first to sign on was Fonix, allowing game developers to integrate VoiceIn Game edition, their video game console speech recognition and “karaoke” SDK. The karaoke feature seems rather gimmicky, geared only at the karaoke gaming genre, which is rather niche. Fonix has displayed a strong focus on gaming in the past, integrating as Sony PS3 middleware.
Unfortunately, speech in games has never made a big splash, but it represents a refreshing move away from customer service applications. Perhaps the middleware approach of many platform vendors will change things.
Speaking of the customer service front: Genesys and Merced Systems are teaming up to develop improved reporting tools. Measuring and reporting customer service interaction has made headway recently. The focus on the interaction effectiveness of natural language/speech applications is intended to help correct some of the poor image that self-service applications live with. Relatedly, this article describes the shortcomings of such applications in the past and proposes a less-is-more, faster interaction paradigm for interactive voice response applications. While not all problems with IVR applications boil down to complicated menu structures and long response times, this is certainly a pointer in the right direction, placing emphasis on dialogue design rather than engineering.
Lastly, showing that not all speech communication is simply about customer service, Voxeo snags Gartner’s “Cool Vendors in Enterprise Communications, 2007” title, awarded to companies for being among the “interesting, new and innovative”.

Nuance, Tegic and the woes and comeback of mobile speech

Tuesday, June 26th, 2007

So the big news this week is Nuance’s acquisition of the month: Tegic. Tegic supplies T9 predictive text input to several mobile phone manufacturers. The acquisition reflects Nuance’s recent focus on acquiring companies in the mobile technology market. It gives Nuance a strategic customer base, including obvious candidates for Nuance’s speech technologies. Aside from the strategic benefits, the technical result of mixing predictive text input with speech is interesting and something to follow.
Coincidentally, the woes and comeback of using speech for I/O on mobile devices are described in these articles this week.
Lastly, here is an interesting interview with Lin Chase, director of Accenture R&D in Bangalore, India, who has held several prominent positions in the speech tech industry. Topics include speech, women in the industry and why Americans should travel.

Weekly News Redux…

Tuesday, May 22nd, 2007

Today, I came across some novel(ish) uses for text-to-speech:

On the mainstream speech recognition front:

And some Web3.0 language tech news:

News is back…

Sunday, May 20th, 2007

Ok, I’m back from vacation and finally sorted through some of the recent developments in the speech world. Going forward I will probably post longer but less frequent tidbits here.

The biggest recent speech news is Nuance’s acquisition of VoiceSignals, which broadens Nuance’s mobile end-user market and adds some nifty voice features in short messaging and mobile phone usability.
In related news, here is a short article describing the role of speech in unified messaging.
Lastly, here is a description of progress on open-source telephony and speech recognition.

Three Observations about Recent Language Technology News

Wednesday, March 28th, 2007

To start us off, recent experience has shown three things:

  1. Speech (i.e. voice) related news is dominated by TTS, less so by ASR.
  2. The company featured most frequently in the news is Nuance.
  3. The talk of semantic search engines seems to dominate the NLP news.

The success of TTS is largely due to requirements set by mobile and in-car technologies, especially GPS and communications. The future of ASR, on the other hand, seems to depend on the dictation market (especially in the healthcare sector) and the growing relevance of network ASR (driven by advancing VoIP and the impact of multi-modal applications).

Nuance’s continued position will depend on the role of “super players” IBM and Microsoft and to a lesser degree the role of open-source initiatives, especially on the network/telephony side.

Semantic search engines recently got some media hype with “Google-Killer” Powerset, a PARC offspring. While it is still in its infancy, some believe this development towards the semantic web will usher in a Web3.0 revolution. Of course, some others believe this has already begun, while yet more just wanna see what happens with all this.

Let’s see how these trends develop. Multi-modality and semantic search, especially, will be issues to follow closely.