Posts Tagged ‘Microsoft’

A More Optimistic Outlook on the Future of Speech

Wednesday, June 30th, 2010

The speech application industry has received some critical press in recent months, and the criticism has drawn some spirited responses.

All the more refreshing, then, to come across this New York Times article presenting current work in speech and artificial intelligence. The article gives a broad overview of which kinds of AI applications have moved into the mainstream (or have the potential to do so). Speech and natural language understanding, the article claims, have gone furthest.

One point common to both criticisms is that the development of speech-enabled applications has stagnated in various ways.1 The underlying technology, speech recognition (ASR), has gone as far as it can. Application designers and developers haven’t adopted it. Dictation has learned to understand doctors and lawyers better, but still struggles with conversational speech.

This point may have to be conceded. In terms of commercial applications, however, especially speech-enabled interactive voice response (IVR) systems, the root cause of stagnation is not necessarily a failure of AI but rather a maturing of standards and best practices. Meeting the expectation that voice applications, much like websites, behave according to certain rules greatly benefits the millions who interact with such systems every day.

What I take away from both the critical perspective and the Times’ optimistic one is that, short of a revolution in the underlying technologies (which hardly anyone expects), filling practical, everyday niches is where speech and language processing can still move forward. These niches have certainly not been fully uncovered.

Thoughts?


1 Roughly summarized, Robert Fortner: “development in speech technology has flat-lined since 2001”; David Suendermann: “(statistical) engineering methods are more efficient than traditional symbolic linguistic approaches to language processing.”

Microsoft Recite Preview – Note Dictation and Voice Search

Monday, February 16th, 2009

Ars Technica reports today on the release of the Microsoft Recite “Technology Preview” for Windows Mobile. The application lets users record short notes as audio snippets, which can later be searched by speaking keywords. Apparently it relies on simpler pattern matching rather than speech recognition: the notes cannot be searched in text form, but the approach may be more robust and eliminates the effort of training for speaker independence.
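Microsoft hasn’t said how Recite matches spoken queries, so the following is only a sketch of one generic way to search audio without recognizing words: query-by-example keyword spotting with subsequence dynamic time warping (DTW). It assumes each note and each spoken query have already been turned into sequences of acoustic feature frames (for example MFCC vectors; feature extraction is omitted), and the function names are hypothetical. The query is aligned against every stretch of each note, and notes are ranked by the cheapest alignment.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: one feature vector (e.g. MFCCs) per audio frame.
Frame = Sequence[float]


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: Sequence[Frame], note: Sequence[Frame]) -> float:
    """Cost of the best warped alignment of the whole query to any
    contiguous stretch of the note (start and end in the note are free)."""
    m = len(note)
    # Row 0 is all zeros: the match may start at any frame of the note.
    prev = [0.0] * (m + 1)
    for q in query:
        cur = [math.inf] * (m + 1)  # cur[0] stays inf: every query frame must align
        for j in range(1, m + 1):
            d = frame_dist(q, note[j - 1])
            # Diagonal, vertical, or horizontal step of the warping path.
            cur[j] = d + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    # The match may end at any frame of the note.
    return min(prev[1:])


def search_notes(query: Sequence[Frame], notes: Dict[str, Sequence[Frame]]) -> List[str]:
    """Rank stored notes by how well the spoken query matches somewhere inside them."""
    scores = {name: subsequence_dtw(query, frames) for name, frames in notes.items()}
    return sorted(scores, key=scores.get)
```

With toy one-dimensional "features", a query that occurs inside one note ranks that note first:

```python
notes = {
    "groceries": [[0.0], [5.0], [1.0], [2.0], [3.0], [0.0]],
    "meeting": [[9.0], [9.0], [8.0], [9.0]],
}
query = [[1.0], [2.0], [3.0]]
print(search_notes(query, notes))  # ['groceries', 'meeting']
```

Because nothing is transcribed, such a search can only be driven by speaking, which fits the observation that the notes cannot be searched in text form.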

While not a full product yet, this sounds like a nifty little application for cognitive off-loading.

Have you tried Microsoft Recite?



SVOX purchases Siemens AG speech-related IP

Monday, January 26th, 2009
Following Nuance’s acquisition of IBM speech technology intellectual property two weeks ago, Zurich-based SVOX today announced the purchase of the Siemens AG speech recognition technology group. The deal aims at creating “obvious synergies of developing TTS, ASR and speech dialog solutions” and extends SVOX’s portfolio of technologies, which to date included only highly specialized speech synthesis solutions, to include speech recognition.
Like the Nuance-IBM deal (and unlike the Microsoft acquisition of TellMe), this acquisition breaks with the obvious big-fish-eats-small-fish paradigm. In both cases, a larger company’s R&D division (IBM, Siemens) was sold to a smaller, more specialized company (Nuance, SVOX).
Both transactions come with the intent to pursue development of novel interactive voice applications. However, while Nuance announced the potential development of applications across platforms and environments using IBM expertise and IP, SVOX appears to be staying the course with its successful line of automotive solutions, aiming to build “a commanding market share in speech solutions for premium cars”.

This deal adds SVOX to a list of companies offering network and embedded speech recognition technologies, also including Nuance, Telisma, Loquendo and Microsoft. Financial terms of the deal were not announced.

Microsoft Windows Live Messenger Translation Bot

Monday, September 8th, 2008

In the wake of Google’s release of its Chrome web browser, speculation about plans for Chrome on other platforms, including Android, has drifted ashore. Naturally, this has washed aside much recent news about IE8, which, though not a game-changer, is said to introduce many of the much-needed improvements everyone has been looking for from Microsoft.

With the browser war raging, a little add-on for Microsoft’s Live Messenger may not make many waves, even if it promises real-time chat translation between English and 14 other languages. However, it is still refreshing to read about technology that is geared toward opening channels of communication rather than capturing market share.

What are Google’s plans with Chrome and Android vis-à-vis Microsoft IE on Windows Mobile? Will Microsoft leverage its non-browser language services, such as translation and speech recognition, the way Google has?

OnMobile buys Telisma

Monday, May 19th, 2008
OnMobile Global Ltd today acquired France-based Telisma, a producer of speech recognition software for network/telephony environments.
The acquisition comes shortly after OnMobile partnered with Nuance, a Telisma competitor in speech recognition markets, to deploy voice search applications in its home market, India. India’s multilingualism has made it a tough market to crack for speech technology companies, though a lucrative one: according to Om Malik at GigaOm, India has recently surpassed the U.S. as the second-largest mobile market in the world.
I suspect issues specific to speech technology and India’s multilingualism have something to do with this deal. As I recently pointed out, internationalization of speech and language technologies comes at a steep entry cost, due to the high demands on expertise and data required for building language-specific models. In addition, speech recognition companies like Nuance have long kept their language models under wraps. In other words, if your language isn’t catered to, reaching that language’s customer base becomes a very pricey affair.
While open-source efforts to build freely available language models for speech recognition exist, Telisma has opted for a middle ground in this matter by allowing partners and customers to build their own models, but selling the tools to do so at a price. In a market like India, the ability to cater to a multilingual customer base without purchasing expensive proprietary software (or paying someone else to develop proprietary software for you to purchase) may have made a big difference in this deal.

On a different note, this acquisition is the latest in a series of acquisitions consolidating the speech technology market. While five years ago telephony speech technology was a highly redundant market of small companies building similar products, today they have largely been acquired by or merged with bigger players. In the meantime, companies like Microsoft, IBM, Siemens and Google are making their own moves to enter the market.

Update:
Telisma’s acoustic modelling toolkit is in fact not for sale but free, as one reader has pointed out. Thanks!

Assistive and Accessibility Technology

Wednesday, November 21st, 2007

Diligent readers may have noticed that the dominant news items concerning speech and language technologies tend to focus on their cost- or time-saving aspects. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, e.g. where it’s fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.
Not so this week. This week seems to shine under the assistive/accessibility technology star. Take the SourceForge project “Speak as Daisy”, a Microsoft Word plugin that enables creation of XML files with markup for speech synthesis or electronic braille generation. The plugin is said to be available in 2008.
Mac users who need better document read-back in British English will rejoice over the improved Infovox iVox voices.
Philips and Elsevier are developing a speech-enabled diagnostic system for radiologists.
Behold Nattiq’s USB Hal Pen, which allows blind users to use the company’s accessibility features on any computer with a USB port without installation.
Of course, there’s some overlap with time- and cost-saving technologies as well. The FBI has announced widespread use of Nuance Dragon NaturallySpeaking dictation for report and interview transcription.
Lastly, here’s an apropos rant about call center automation from the perspective of frustrated end users, a target group for speech and language technologies that is all too often neglected. Perhaps the “money savers” employing speech technology could learn a lesson about usability from those who rely on speech recognition and synthesis for their daily needs. I don’t know, but F-word spotting as a means of prioritizing frustrated callers seems like an admission of defeat.

Back in the saddle with MSFT, GOOG and VoiceGlue

Tuesday, November 13th, 2007

Back after an extensive break. Been working hard on some of my own multi-modal ideas. Keep your eyes peeled.
Looks like it’s been a quiet fall, speech and language technology-wise. After GOOG-411, Microsoft has also added speech to its search engine endeavors (if in a different domain) by speech-enabling Live Search for mobile users. Nuance continues to consolidate the speech tech market.
Exciting news on the IVR front. Finally, a serious attempt to integrate various open-source technologies to provide free carrier-grade speech/telephone services is under way. VoiceGlue has managed to combine OpenVXI (a VoiceXML browser) and Flite (speech synthesis) on Asterisk, and is planning to integrate Sphinx2 for speech recognition. All components would then be available under some form of the GPL. Could this herald a change in the availability of speech telephony platforms for developers unwilling to shell out horrendous per-port costs? Something to follow, anyway.
Lastly, here’s an article describing the growing role of speech in warehouse management.

This week: Bunnies, Trojans and the Jetsons

Wednesday, July 11th, 2007

There was no shortage of novel uses for speech technology this week. Avaya and Jersey City’s Liberty Science Center announced speech-enabled exhibits, allowing visitors to access information and services in the museum using their voice (and, of course, mobile devices).
Gizmo freaks should love (and everyone else should hate) this bunny, which offers speech recognition and synthesis along with some unified communication capabilities.
Also novel, though on a sadder note: speech is finally on the malware radar for good, as TTS trojans have popped up that use Microsoft’s built-in text-to-speech engine to annoy users by commenting on their own malicious behavior. Call it the salt-in-the-wound virus. This news comes about half a year after a security flaw in Windows Vista speech recognition was revealed, whereby the recognizer could enable remote execution of commands on a computer running speech recognition.

Traditional speech applications made some headlines this week as well: Nuance signed a deal with Damovo to roll out speech apps in Ireland, forecasting €1.5m in profits over the next year. TuVox announced hosted on-demand speech apps for VoIP access.

Lastly, here is an interesting article about the Jetsons and why speech technology hasn’t caught on as much as we had all hoped.

Daily News Redux…

Wednesday, April 18th, 2007

On the WWW today:

Daily News Redux…

Sunday, April 1st, 2007

On the WWW today: