Posts Tagged ‘IBM’

SVOX purchases Siemens AG speech-related IP

Monday, January 26th, 2009
Following Nuance’s acquisition of IBM speech technology intellectual property two weeks ago, Zurich-based SVOX today announced the purchase of the Siemens AG speech recognition technology group. The deal aims at creating “obvious synergies of developing TTS, ASR and speech dialog solutions” and extends SVOX’s technology portfolio, which until now consisted only of highly specialized speech synthesis solutions, to include speech recognition.
Like the Nuance-IBM deal (and unlike the Microsoft acquisition of TellMe), this acquisition breaks with the usual big-fish-eats-small-fish paradigm. Here, a larger company’s (IBM, Siemens) R&D division was sold to a smaller, more specialized company (Nuance, SVOX).
Both transactions come with the intent to pursue development of novel interactive voice applications. However, while Nuance announced the potential development of applications across platforms and environments using IBM expertise and IP, SVOX appears to be staying the course with its successful line of automotive solutions to build “a commanding market share in speech solutions for premium cars”.

This deal adds SVOX to the list of companies offering network and embedded speech recognition technologies, which also includes Nuance, Telisma, Loquendo and Microsoft. Financial terms of the deal were not announced.

Nuance acquires IBM speech patents

Friday, January 16th, 2009

Nuance yesterday announced the acquisition of speech-related patents from IBM. The deal encompasses a “licensing and technical services agreement”, with IBM continuing to support existing customers. Integrated solutions combining the two companies’ technologies are expected in two years’ time, according to the press release.

This deal represents a further step in market consolidation, which Nuance has pursued via a number of mergers and acquisitions over the past few years. Friends in the industry tell me IBM has been trying to market its suite of IVR voice application server software more aggressively; however, speech research activity, once part of the company’s “pervasive computing” vision, has declined lately.

Perhaps the IBM vision will bear fruit at Nuance, as the announcement comes with a commitment “to proliferate advanced speech capabilities across a broad range of devices and environments”. One thing is sure: much like Nuance’s recent acquisition of Philips’ voice products, which came years after it took over Philips’ IVR products and solutions, this deal closes another circle, as Nuance has been marketing and supporting IBM’s ViaVoice product line for years. The de facto number of competitors in the speech and voice technology market is shrinking as applications become more mainstream.


IBM Predicts Talking Web

Friday, November 28th, 2008

IBM’s annual crystal ball list of Innovations That Will Change Our Lives in the Next Five Years includes a forecast of a voice-enabled talking web. “You will be able to sort through the Web verbally to find what you are looking for and have the information read back to you,” the article predicts.
IBM itself has launched several voice-enabled products and initiatives over the years, most notably the WebSphere Voice family of servers, which adds voice functionality to IBM’s flagship WebSphere platform and applies it in areas such as unified messaging and call-center automation.
A vision such as the one advocated by the article faces some problems. Speech recognition accuracy and noise filtering, admittedly, have come a long way and may pose only a minor impediment.
Whether users actually want to speak rather than type or click is another problem. Issuing voice commands in the presence of others may not always be desirable and can be disruptive, for instance at work or on public transport. Lastly, there are usability concerns, beyond the quality of speech technology, when converting a visual 2- or even 3-dimensional representation of information into a 1-dimensional audio stream. The cognitive load increases significantly with tasks more complex than, for instance, obtaining timetable information or finding the nearest Italian restaurant.
The effort behind the vision, to put voice technology to uses beyond call-center automation, is laudable. Mobile internet access and computing on the road may indeed do their part to make this vision come true. And clearly, there are use cases, such as improved accessibility for users with impairments, that in their own right justify making the web voice-accessible. Widespread usage of a voice-enabled web, however, may be more than five years off.

OnMobile buys Telisma

Monday, May 19th, 2008
OnMobile Global Ltd today acquired France-based Telisma, a producer of speech recognition software for network/telephony environments.
The acquisition comes shortly after OnMobile partnered with Nuance, a Telisma competitor in the speech recognition market, to deploy voice search applications in its home market, India. India’s multilingualism has made it a tough market to crack for speech technology companies, though a lucrative one: India has recently surpassed the U.S. as the second-largest mobile market in the world, according to Om Malik at GigaOm.
I suspect issues specific to speech technology and India’s multilingualism have something to do with this deal. As I recently pointed out, internationalization of speech and language technologies comes at a steep entry cost, due to the expertise and data required to build language-specific models. In addition, speech recognition companies like Nuance have long kept their language models under wraps. In other words, if your language isn’t catered to, reaching that language’s customer base becomes a very pricey affair.
While open-source aspirations to build freely available language models for speech recognition exist, Telisma has opted for a middle ground in this matter by allowing partners/customers to build their own models, but selling the tools to do so at a price. In a market like India, the ability to cater to a multilingual customer base without purchasing expensive proprietary software (or paying someone else to develop proprietary software for you to purchase) may have made a big difference in this deal.

On a different note, this acquisition is the latest in a series of acquisitions consolidating the speech technology market. Five years ago, the telephony speech technology market was crowded with small companies building similar products; today most of them have been acquired by or merged with bigger players. In the meantime, companies like Microsoft, IBM, Siemens and Google are making their own moves to enter the market.

Update:
Telisma’s acoustic modelling toolkit is in fact not for sale but free, as one reader has pointed out. Thanks!

GOOG: We need more data

Thursday, January 3rd, 2008

The old maxim “I need more data” should be familiar to anyone who has ever wrestled with language technology issues, attempted speech application tuning or delved into any statistical approach to an AI-related problem. Google moved into the speech world last year with GOOG-411, a speech-recognition-driven directory assistance application (you say what you are looking for and where, it returns suitable businesses and connects you to the one you want or sends you details in an SMS).
Like all (well, most) other Google services, GOOG-411 is free for the end-user. As such, the basic business model (collect data, turn data into cash) applies. This was recently confirmed in an interview by Marissa Mayer, Google’s VP of Search Products and User Experience:


Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

Google thus couples statistical AI and its general data-driven approach to everything in a novel way. In doing so, Google may find itself in a catch-up race with the likes of Nuance, Loquendo, IBM or Telisma, whose strong position in speech recognition technology stems, in part, from speech and language databases aggregated through data collection during professional services projects.
What’s new in Google’s approach, however, is the convergence of the dual role that data plays in AI and in the overall service-driven business model. Google will presumably not be content to bootstrap a pattern matching engine to sell licenses like the technology companies above. More interesting to follow will be the range of services Google can spin off from this technology (context-sensitive video advertising, audio indexing, IVR hosting), which better befit its overall company strategy.
Unsurprisingly, Mayer goes on to claim that Google isn’t working on ways out of the world of brute-force data-driven algorithms:

People should be able to ask questions, and we should understand their meaning, or they should be able to talk about things at a conceptual level. … A lot of people will turn to things like the semantic Web as a possible answer to that. But what we’re seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they’re done through brute force.

User privacy advocates may also have a thought or two on this new dimension of data collection, as Google is beginning to lose the “conventionally trustworthy” image it has held among many over the past years. Fortunately, the way speech data is commonly used to train pattern-matching models involves very little in the way of privacy infringement.
Happy data collecting!

Three Observations about Recent Language Technology News

Wednesday, March 28th, 2007

To start us off, recent experience has shown three things:

  1. Speech-related (i.e. voice) news is dominated by TTS rather than ASR.
  2. The company featured most frequently in the news is Nuance.
  3. The talk of semantic search engines seems to dominate the NLP news.

The success of TTS is largely due to requirements set by mobile and in-car technologies, especially GPS and communications. The future of ASR, on the other hand, seems to depend on the dictation market (especially in the healthcare sector) and the growing relevance of network ASR (driven by the advance of VoIP and the impact of multi-modal applications).

Whether Nuance keeps its position will depend on the role of “super players” IBM and Microsoft and, to a lesser degree, on open-source initiatives, especially on the network/telephony side.

Semantic search engines recently got some media hype with “Google-Killer” Powerset, a PARC offspring. Though this development towards the semantic web is still in its infancy, some believe it will usher in a Web 3.0 revolution. Of course, others believe this revolution has already begun, while still others just want to see where all this goes.

Let’s see how these trends develop. Multi-modality and semantic search in particular will be issues to follow closely.