Archive for the ‘Brands’ Category

Nuance acquires IBM speech patents

Friday, January 16th, 2009

Yesterday Nuance announced its acquisition of speech-related patents from IBM. The deal also includes a “licensing and technical services agreement”, under which IBM will continue to support existing customers. According to the press release, integrated solutions combining the two companies’ technologies are expected in two years’ time.

This deal is a further step in the market consolidation that Nuance has pursued through a number of mergers and acquisitions in recent years. Friends in the industry tell me IBM has been trying to market its suite of IVR voice application server software more aggressively; however, its speech research activity, once part of the company’s “pervasive computing” vision, has declined lately.

Perhaps the IBM vision will bear fruit at Nuance, as the announcement comes with a commitment “to proliferate advanced speech capabilities across a broad range of devices and environments”. One thing is certain: much like Nuance’s recent acquisition of Philips’ voice products, which came years after it took over Philips’ IVR products and solutions, this deal closes a loop, since Nuance has been marketing and supporting IBM’s ViaVoice product line for years. The de facto number of competitors in the speech and voice technology market is shrinking, even as applications become more mainstream.


IBM Predicts Talking Web

Friday, November 28th, 2008

IBM’s annual crystal ball list of Innovations That Will Change Our Lives in the Next Five Years includes a forecast of a voice-enabled talking web. “You will be able to sort through the Web verbally to find what you are looking for and have the information read back to you,” the article predicts.
IBM itself has launched several voice-enabled products and initiatives over the years, most notably the WebSphere Voice product family, which adds voice functionality to its flagship WebSphere platform and applies it in areas such as unified messaging and call-center automation.
A vision such as the one advocated by the article faces some problems. Speech recognition accuracy and noise filtering have come a long way and may pose only a minor impediment.
Whether users actually want to speak rather than type or click is a bigger question. Issuing voice commands in the presence of others is not always desirable and can be disruptive, for instance at work or on public transport. Lastly, there are usability concerns, beyond the quality of speech technology, when converting a visual two- or even three-dimensional representation of information into a one-dimensional audio stream. The cognitive load increases significantly for tasks more complex than, for instance, obtaining timetable information or finding the nearest Italian restaurant.
The effort behind the vision, to put voice technology to uses beyond call-center automation, is laudable. Mobile internet access and computing on the road may indeed do their part to make this vision come true. And clearly there are use cases, such as improved accessibility for users with impairments, that in their own right merit making the web voice-accessible. Widespread use of a voice-enabled web, however, may be more than five years off.

Google Mobile iPhone App with Speech Recognition

Tuesday, November 18th, 2008

Yesterday Google released a new feature for its Google Mobile iPhone Application: voice search. Users speak a query and the application returns search results formatted for the iPhone. This is similar to the GOOG411 directory assistance application, which lets users call a phone number, speak a query and receive information about local listings by voice or SMS. However, the new application apparently performs recognition locally on the iPhone, meaning it comes bundled with an embedded speech recognition engine.

Aside from GOOG411, Google released GAudi, a voice indexing technology for video, during the US presidential campaign. That makes the iPhone app the third official Google service to use speech recognition, which leaves one wondering when Google’s speech technology will become available as an API, like the Google AJAX Language API for translation and transliteration, rather than only bundled into software services. One would also guess that an Android version is in the works.

All applications are available in US English for now.

Nuance buys Philips Speech Recognition Systems

Thursday, October 2nd, 2008

Nuance announced its acquisition of Philips Speech Recognition Systems this week. This is another step in a series of acquisitions by the speech technology giant aimed at expanding its market and portfolio. In 2002, ScanSoft Inc., which through further mergers and acquisitions became today’s Nuance, had already acquired Philips’ network speech processing group, though not its dictation unit. With this week’s acquisition, the dictation unit will be incorporated into Nuance’s already strong dictation portfolio, expanding especially into European healthcare markets, the company announced. Highlights of the purchase include a larger customer base, broader language and solutions portfolios, additional distribution channels, and a great leap forward in international expansion.

Google Showcases Audio Indexing with GAudi

Friday, September 19th, 2008

Google Labs opened GAudi this week to showcase its new audio indexing technology.

Google GAudi allows searching for keywords and phrases in the audio stream of selected YouTube videos. Matches are represented as yellow slots on the playback slider. Top results appear as snippets of text from the audio surrounding the search term, along with information on how many minutes into the video the term occurred.
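Google hasn’t documented how GAudi works beyond what is visible in the demo. Purely as an illustration of the idea, here is a minimal sketch that assumes a recognizer has already produced a word-level transcript with start times; the `(start_seconds, word)` format, the function names and the toy transcript are hypothetical, not Google’s. A phrase lookup yields the time offset, a text snippet, and a position for a marker on the playback slider.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    start_seconds: float  # when the first word of the match was spoken
    snippet: str          # surrounding words, shown as a text snippet


def find_phrase(transcript, phrase, context=3):
    """Return every occurrence of `phrase` in a time-aligned transcript.

    `transcript` is a hypothetical recognizer output: a list of
    (start_seconds, word) pairs in spoken order. Matching is exact
    and case-insensitive.
    """
    words = [w.lower() for _, w in transcript]
    query = phrase.lower().split()
    hits = []
    if not query:
        return hits
    n = len(query)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == query:
            # Include up to `context` words before and after the match.
            lo = max(0, i - context)
            hi = min(len(words), i + n + context)
            snippet = " ".join(w for _, w in transcript[lo:hi])
            hits.append(Hit(transcript[i][0], snippet))
    return hits


def format_offset(seconds):
    """Render an offset as minutes:seconds into the video."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def slider_position(start_seconds, duration_seconds):
    """Fraction of the playback slider (0.0 to 1.0) at which to draw a marker."""
    return start_seconds / duration_seconds


# Toy transcript for a hypothetical 123-second clip.
transcript = [
    (0.0, "Good"), (0.4, "evening"),
    (61.2, "the"), (61.5, "economy"), (62.0, "is"), (62.3, "strong"),
]
for hit in find_phrase(transcript, "economy is", context=1):
    # prints "1:01 0.5 the economy is strong"
    print(format_offset(hit.start_seconds),
          slider_position(hit.start_seconds, 123.0),
          hit.snippet)
```

Exact word matching keeps the sketch short, but it would miss any phrase the recognizer transcribed incorrectly; a production index would likely need to be more forgiving.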

The videos chosen to showcase GAudi concern this year’s US presidential elections, as “part of a broader effort around politics”, but also because recognition performs well on such material and because of its relevance to testers and users.

Indexing does not appear to be complete, as searching for randomly chosen text fragments from the showcased videos did not always return a match. Google does say GAudi uses its own speech recognition engine, perhaps the same one employed by GOOG411, though the FAQ directs most questions about technical details, and about how one could use GAudi for video, to email inquiries.

While GAudi is showcasing campaign material, it seems only a matter of time before audio indexing is used to serve ad content on video.

Microsoft Windows Live Messenger Translation Bot

Monday, September 8th, 2008

In the wake of Google’s release of its Chrome web browser, speculation about plans for Chrome on other platforms, including Android, has drifted ashore. Naturally, this has washed aside much of the recent news about IE8, which, though not a game-changer, is said to introduce many of the much-needed improvements everyone has been waiting for from Microsoft.

With the browser war raging, a little add-on for Microsoft’s Live Messenger may not stir the waters much, even if it promises real-time chat translation between English and 14 other languages. However, it is still refreshing to read about technology that is geared toward opening channels of communication rather than capturing market share.

What are Google’s plans for Chrome and Android vis-à-vis Microsoft IE on Windows Mobile? Will Microsoft leverage its non-browser language services, such as translation and speech recognition, the way Google has?

OnMobile buys Telisma

Monday, May 19th, 2008
OnMobile Global Ltd today acquired France-based Telisma, a producer of speech recognition software for network/telephony environments.
The acquisition comes shortly after OnMobile partnered with Nuance, a Telisma competitor in the speech recognition market, to deploy voice search applications in its home market, India. India’s multilingualism has made it a tough market for speech technology companies to crack, though a lucrative one: India has recently surpassed the U.S. as the second-largest mobile market in the world, according to Om Malik at GigaOm.
I suspect issues specific to speech technology and India’s multilingualism have something to do with this deal. As I recently pointed out, internationalization of speech and language technologies comes at a steep entry cost, due to the high demands on expertise and data required for building language-specific models. In addition, speech recognition companies like Nuance have long kept their language models under wraps. In other words, if your language isn’t catered to, reaching that language’s customer base becomes a very pricey affair.
While open-source efforts to build freely available language models for speech recognition exist, Telisma has opted for a middle ground by allowing partners and customers to build their own models, but selling them the tools to do so. In a market like India, the ability to cater to a multilingual customer base without purchasing expensive proprietary software (or paying someone else to develop proprietary software for you to purchase) may have made a big difference in this deal.

On a different note, this acquisition is the latest in a series of acquisitions consolidating the speech technology market. Five years ago, telephony speech technology was a crowded market of small companies building similar products; today, most of those companies have been acquired by or merged with bigger players. Meanwhile, companies like Microsoft, IBM, Siemens and Google are making their own moves to enter the market.

Update:
Telisma’s acoustic modelling toolkit is in fact not for sale but free, as one reader has pointed out. Thanks!

GOOG: We need more data

Thursday, January 3rd, 2008

The old maxim “I need more data” should be familiar to anyone who has ever tried to wrestle with language technology issues, attempted speech application tuning or delved into any statistical approach to an AI-related problem. Google moved into the speech world last year with GOOG-411, a speech recognition driven directory assistance application (you say what you are looking for and where, it returns suitable businesses and connects you to the one you want or sends you details in an SMS).
Like all (well, most) other Google services, GOOG-411 is free for the end user. As such, the basic business model (collect data, turn data into cash) applies. This was recently confirmed in an interview by Marissa Mayer, Google’s VP of Search Products and User Experience:


Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

Google thus couples statistical AI with its general data-driven approach to everything in a novel way. In doing so, Google may find itself in a catch-up race with the likes of Nuance, Loquendo, IBM or Telisma, whose hold on speech recognition technology comes, in part, from speech and language databases aggregated through data collection during professional services projects.
What’s new in Google’s approach, however, is the convergence of the dual role that data plays in AI and in the overall service-driven business model. Google will presumably not be content to bootstrap a pattern matching engine and sell licenses like the technology companies above. More interesting to follow will be the range of services Google can spin out of this technology (context-sensitive video advertising, audio indexing, IVR hosting), which better befit its overall company strategy.
Unsurprisingly, Mayer goes on to suggest that Google isn’t looking for a way out of the world of brute-force, data-driven algorithms:

People should be able to ask questions, and we should understand their meaning, or they should be able to talk about things at a conceptual level. … A lot of people will turn to things like the semantic Web as a possible answer to that. But what we’re seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they’re done through brute force.

User privacy advocates may also have a thought or two about this new dimension of data collection, as Google is beginning to lose the “conventionally trustworthy” image it held among many over the past years. Fortunately, the way speech data is commonly used to train pattern matching models involves very little in the way of privacy infringement.
Happy data collecting!

Assistive and Accessibility Technology

Wednesday, November 21st, 2007

Diligent readers may have noticed that the dominant news items concerning speech and language technologies seem to focus on their cost- or time-saving aspects. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, e.g. where it’s fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.
Not so this week, which seems to shine under the assistive/accessibility technology star. Note the SourceForge project “Speak as Daisy”, a Microsoft Word plugin that enables creation of XML files with markup for speech synthesis or electronic braille generation. The plugin is said to be available in 2008.
Mac users in need of improved document read-back in British English will rejoice over the improved Infovox iVox voices.
Philips and Elsevier are developing a speech-enabled diagnostic system for radiologists.
Behold Nattiq’s USB Hal Pen, which allows blind users to use the company’s accessibility features on any computer with a USB port without installation.
Of course, there’s some overlap with time- and money-saving technologies as well. The FBI has announced widespread use of Nuance Dragon NaturallySpeaking dictation for report and interview transcription.
Lastly, here’s an apropos rant about call center automation and its frustrated end users, a target group for speech and language technologies that is all too often neglected. Perhaps the “money savers” employing speech technology could learn a lesson about usability from those who rely on speech recognition and synthesis for their daily needs. I don’t know, but F-word spotting as a means of prioritizing frustrated callers seems like an admission of defeat.

Back in the saddle with MSFT, GOOG and VoiceGlue

Tuesday, November 13th, 2007

Back after an extensive break. Been working hard on some of my own multi-modal ideas. Keep your eyes peeled.
Looks like it’s been a quiet fall, speech and language technology-wise. After GOOG-411, Microsoft has also added speech to their search engine endeavors (if in a different domain) by speech-enabling Live Search for mobile users. Nuance continues to consolidate the speech tech market.
Exciting news on the IVR front. A serious attempt to integrate various open-source technologies into free, carrier-grade speech/telephony services is finally under way. VoiceGlue has managed to combine OpenVXI (VoiceXML browser) and Flite (speech synthesis) on Asterisk, and plans to integrate Sphinx2 for speech recognition. All components would then be available under some form of the GPL. Could this herald a change in the availability of speech telephony platforms for developers unwilling to dish out horrendous per-port costs? Something to follow, anyway.
Lastly, here’s an article describing the growing role of speech in warehouse management.