Posts Tagged ‘usability’

A More Optimistic Outlook on the Future of Speech

Wednesday, June 30th, 2010

The speech application industry has received some critical press in recent months (here are some spirited responses, respectively).

All the more refreshing to come across this New York Times article presenting current work in speech and artificial intelligence. The article gives a broad overview of which kinds of AI applications have moved into the mainstream (or have the potential to do so). Speech and natural language understanding, the article claims, have gone furthest.

One thing that can be generalized from both criticisms above is that the development of speech-enabled applications has stagnated in various ways (see footnote 1). The underlying technology – speech recognition (ASR) – has gone as far as it can. Application designers and developers haven’t adopted. Dictation has learned to understand doctors and lawyers better, but still struggles with conversational speech.

This point may have to be conceded. In terms of commercial applications, however, especially speech-enabled voice (IVR) systems, the root cause of stagnation is not necessarily a failure of AI, but rather a maturing of standards and best practices. Meeting the expectation that voice applications, much like websites, behave according to certain rules is very much to the advantage of the millions who interact with such systems every day.

What I take away from the generalized criticism, as well as from the Times’ optimistic perspective, is that, short of a revolution in the underlying technologies (which hardly anyone expects), filling practical, everyday niches is where things can still move forward for speech and language processing. These niches have certainly not been fully uncovered.

Thoughts?


1 Roughly summarized, Robert Fortner: “development in speech technology has flat-lined since 2001”; David Suendermann: “(statistical) engineering methods are more efficient than traditional symbolic linguistic approaches to language processing.”

Zumba Lumba – iPhone killer or simply a hoax?

Monday, February 2nd, 2009

A no-frills phone with the unlikely name of Zumba Lumba has recently received some attention from the BBC. The phone is said to be top-secret, developed by a defense-aviation company. It does without frills like a camera or an applications platform, but touts some interesting security and computational features (not only) related to speech technology:

  • Cloud computing – the phone uses no local storage for contacts or other data.
  • Network speech recognition – user input is recognized over the internet. This should avoid hardware-intensive local computing for voice input, but requires internet access.
  • Voice identification – enhanced security, because the phone will only respond to a single user’s voice.

Some seem to think this is a potential iPhone killer, at least in terms of making use of innovative input modalities (though Google has already released a speech recognition app for the iPhone). Others simply think it’s a hoax.

Either way, the idea of joining mobile with cloud computing is interesting. Using voice identification for security has its appeal as well, even if it’s unclear whether keeping data in the cloud and sending voice data over the internet is any more secure than simply keeping data on your phone, locally.

Assistive and Accessibility Technology

Wednesday, November 21st, 2007

Diligent readers may have noticed that the dominant news bits concerning speech and language technologies seem to focus on their cost- or time-saving aspects. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, e.g. where it’s fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.

Not so this week. This week seems to shine under the assistive/accessibility technology star. Note the SourceForge project “Speak as Daisy”, a Microsoft Word plugin that enables creation of XML files with markup for speech synthesis or electronic braille generation. The plugin is said to be available in 2008.

Mac users who need better document read-back in British English will rejoice over the improved Infovox iVox voices.

Philips and Elsevier are developing a speech-enabled diagnostic system for radiologists.

Behold Nattiq’s USB Hal Pen, which allows blind users to use the company’s accessibility features on any computer with a USB port without installation.

Of course there’s some overlap with time- and cost-saving technologies as well. The FBI has announced widespread use of Nuance’s Dragon NaturallySpeaking dictation software for report and interview transcription.

Lastly, here’s an apropos rant about call center automation and its frustrated end users, a target group for speech and language technologies that is all too often neglected. Perhaps the “money savers” employing speech technology could learn a lesson about usability from those who rely on speech recognition and synthesis for their daily needs. I don’t know, but F-word spotting as a means of prioritizing frustrated callers seems like an acknowledgement of defeat.