Archive for the ‘Brands’ Category

Google on the Move, News Redux

Wednesday, July 25th, 2007

Very quiet recently. No big acquisitions, no speech-tech revolution.

Most interesting: Google announced that Mike Cohen (formerly of Nuance) will appear as keynote speaker at SpeechTek in August to reveal Google’s speech technology strategy. Google has already moved into the speech application market with GOOG411, an automatic directory assistance application leveraging business search and Google Maps.
UBC researchers announce a speech learning system that doesn’t use a traditional data-driven model to learn the sounds of a language. Instead, it is said to represent more experience-driven learning, much like that of infants. So far, the system has acquired English and Japanese vowels.
Some product reviews/announcements: a quick history of desktop dictation, uses of TextAloud for the iPhone, and Nuance’s new South African voice “Tessa”.
Also on the web: NIST evaluates DARPA automatic translation software in military contexts, and What Semantic Search is Not.

I may post less frequently in coming weeks. Stay tuned.

In-Game Speech with Fonix and Nintendo

Tuesday, July 3rd, 2007

Slow week in terms of language technology news.
On the gaming front: Nintendo announced they were playing the middleware game for Wii development by opening up the platform to 3rd party technologies. Among the first to sign on was Fonix, allowing game developers to integrate VoiceIn Game edition, their video game console speech recognition and “karaoke” SDK. The karaoke feature seems rather gimmicky, geared only at the karaoke gaming genre, which is itself rather niche. Fonix has displayed a strong focus on gaming in the past, integrating as Sony PS3 middleware.
Unfortunately, speech in games has never made a big splash, but it represents a refreshing move away from customer service applications. Perhaps the middleware approach of many platform vendors will change things.
Speaking of the customer service front: Genesys and Merced Systems team up to develop improved reporting tools. Measuring and reporting customer service interaction has made headway recently. The focus on the interaction effectiveness of natural language/speech applications is intended to help correct some of the poor image that self-service applications live with. Relatedly, this article describes the shortcomings of such applications in the past and proposes a less-is-more, faster interaction paradigm for interactive voice response applications. While not all problems with IVR applications boil down to complicated menu structures and long response times, this is certainly a pointer in the right direction, placing emphasis on dialogue design rather than engineering.
Lastly, showing that not all speech communication is simply about customer service, Voxeo snags Gartner’s “Cool Vendors in Enterprise Communications, 2007” title, awarded to companies for being among the “interesting, new and innovative”.

Nuance, Tegic and the woes and comeback of mobile speech

Tuesday, June 26th, 2007

So the big news this week is Nuance’s acquisition of the month: Tegic. Tegic supplies T9 predictive text input to several mobile phone manufacturers. The acquisition reflects Nuance’s recent focus on acquiring companies in the mobile technology market. It provides Nuance with a strategic customer base, including obvious candidates for Nuance’s speech technologies. Aside from the strategic benefits, the technical result of mixing predictive text input with speech is interesting and something to follow.
Coincidentally, the woes and comeback of using speech for I/O on mobile devices are described in these articles this week.
Lastly, here is an interesting interview with Lin Chase, director of Accenture R&D in Bangalore, India, who held several prominent positions in the speech tech industry in the past. Topics include speech, women in the industry and why Americans should travel.

Healthcare, Security and the Army…

Wednesday, June 20th, 2007

…these are the three overarching themes of the speech technology news that I came across this week. There are some obvious and less obvious points of contact here:

Speech Meets Sales, Video Gaming and the Economist reports…

Tuesday, June 12th, 2007

Many of those working in speech recognition, especially those deploying customer-service telephone applications, have grown tired of the limited scope that most projects entail. I recently wrote about speech-enabled knowledge bases as a novel type of speech app. In what may be another – at least I haven’t heard this before – MTI and FasTrak Retail combine efforts to launch a ‘virtual sales associates’ platform. And of course there are the recurring dreams of voice-enabled video gaming.

Speech synthesis is naturally more diverse than its recognition sibling (perhaps not everything ‘I’ in I/O can be channelled through voice, but pretty much everything ‘O’ can be synthesized). In today’s news, TTS is employed in emergency response systems to broadcast text messages as audio.

Lastly, speech got some coverage in the Economist’s June 7th issue.

Germany-based and search-engines-driven language technology

Friday, June 8th, 2007

There has been lots of German-based language technology news over the past couple of weeks:

Also some attention on language-technology-related search engine news:

Weekly News Redux…

Tuesday, May 22nd, 2007

Today, I came across some novel(ish) uses for text-to-speech:

On the mainstream speech recognition front:

And some Web3.0 language tech news:

News is back…

Sunday, May 20th, 2007

Ok, I’m back from vacation and finally sorted through some of the recent developments in the speech world. Going forward I will probably post longer but less frequent tidbits here.

Biggest recent speech news is Nuance’s acquisition of VoiceSignal, broadening Nuance’s mobile end user market as well as adding some nifty voice features in short messaging and mobile phone usability.
In related news, here is a short article describing the role of speech in unified messaging.
Lastly, here is a description of progress on open-source telephony and speech recognition.

Web 3.0 and Natural Language Processing

Monday, April 9th, 2007

Web 3.0 is getting some buzz in the blogosphere. Like Web 2.0, it raises the question that PCMag.com recently ran by its readers: what is it? However, this time around things seem a bit easier.

Web 2.0 seems to be happy with being vaguely defined (delimited may be a better term) and equally a social and a technological movement. Web 3.0 clearly hovers over the idea of the “Semantic Web”, a term coined by Tim Berners-Lee, in which richly marked-up hypertext and data allow for novel, more meaningful human-machine and machine-machine communication. Radar Networks (currently in stealth mode) claim to be driving some interesting developments in this direction and are followed closely by those interested.

This has already raised some questions: will content be expensive hand labor or machine boot-strappable, what new privacy policies will we have to live with, how does one separate style and content, and what are the alternatives to RDF?

Sadly, there’s very little inspiring out there about potential applications.

My question (though not uniquely mine) to add to this: What role will natural language processing play in this (i.e. how “semantic” is this talk of Semantics)? Semantic content in RDF appears to be little more than a means for one machine to tell another who authored a particular book or what the postal codes in the greater Boston area are. Semantics to me is as much about intentions (“Why is web-service A dispensing such information?”) and interpreting such information for the purposes of action (“What can web-service B – or my browser or I – do with it?”).
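
To make the point concrete, here is a minimal sketch of the kind of statement RDF carries: subject-predicate-object triples plus a pattern lookup. It uses plain Python rather than any real RDF library, and every identifier and value in it (the `ex:` names, the placeholder postal codes, the `match` helper) is invented for illustration only.

```python
from typing import Optional

# A triple is (subject, predicate, object). All identifiers and values
# below are invented placeholders, not real vocabulary terms or real data.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("ex:SomeBook", "ex:author", "ex:JaneDoe"),
    ("ex:SomeBook", "ex:title", "An Example Title"),
    ("ex:GreaterBoston", "ex:postalCode", "00001"),
    ("ex:GreaterBoston", "ex:postalCode", "00002"),
}


def match(
    triples: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return all triples matching the pattern, sorted; None is a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # "Who authored this book?"
    print(match(TRIPLES, s="ex:SomeBook", p="ex:author"))
    # "What are the postal codes of this area?"
    print(match(TRIPLES, p="ex:postalCode"))
```

The lookup answers the "who" and "what" questions, but nothing in the triples says why the data is being offered or what the receiving side should do with it, which is the gap described above.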

Perhaps this misses the mark and the Semantic Web really isn’t about natural language. But there is a weaker, more realistic form of this “language and technology” concern: Insofar as semantics is just information, can it be bootstrapped by a machine (perhaps even in a linguistically informed rather than a statistical way)?

Thoughts?

Three Observations about Recent Language Technology News

Wednesday, March 28th, 2007

To start us off, recent experience has shown three things:

  1. Speech (i.e. voice) related news is TTS-dominated, less so by ASR.
  2. The company featured most frequently in the news is Nuance.
  3. The talk of semantic search engines seems to dominate the NLP news.

The success of TTS is largely due to requirements set by mobile and in-car technologies, especially GPS and communications. The future of ASR, on the other hand, seems to depend on the dictation market (especially in the healthcare sector) and a growing relevance of network ASR (driven by advancing VoIP and the impact of multi-modal applications).

Nuance’s continued position will depend on the role of “super players” IBM and Microsoft and to a lesser degree the role of open-source initiatives, especially on the network/telephony side.

Semantic search engines recently got some media hype with “Google-Killer” Powerset, a PARC offspring. While it is still in its infancy, some believe this development towards the semantic web will usher in a Web3.0 revolution. Of course, some others believe this has already begun, while yet more just wanna see what happens with all this.

Let’s see how these trends develop. Multi-modality and semantic search, especially, will be issues to follow closely.