A More Optimistic Outlook on the Future of Speech

The speech application industry has received some critical press in recent months (and drawn some spirited responses).

All the more refreshing, then, to come across this New York Times article presenting current work in speech and artificial intelligence. The article gives a broad overview of which AI applications have moved into the mainstream (or have the potential to do so). Speech and natural language understanding, the article claims, have gone furthest.

One point common to both criticisms above is that the development of speech-enabled applications has stagnated, in various ways.[1] The underlying technology, speech recognition (ASR), has gone as far as it can. Application designers and developers haven’t adopted it. Dictation has learned to understand doctors and lawyers better, but still struggles with conversational speech.

This point may have to be conceded. In commercial applications, however, especially speech-enabled interactive voice response (IVR) systems, the root cause of stagnation is not necessarily a failure of AI but rather a maturing of standards and best practices. Meeting the expectation that voice applications, much like websites, behave according to certain rules greatly benefits the millions of people who interact with such systems every day.

What I take away from both the critical perspective and the Times’ optimistic one is that, short of a revolution in the underlying technologies (which hardly anyone expects), filling practical, everyday niches is where speech and language processing can still move forward. These niches have certainly not all been uncovered yet.

Thoughts?


[1] Roughly summarized, Robert Fortner: “development in speech technology has flat-lined since 2001”; David Suendermann: “(statistical) engineering methods are more efficient than traditional symbolic linguistic approaches to language processing.”


3 Responses to “A More Optimistic Outlook on the Future of Speech”

  1. nsh Says:

    I kind of disagree that it’s just a practice issue. The whole experience of bringing up ASR products leads to the conclusion that the technology is not there yet. Users can’t work with a 90% success rate; most applications require 99.999%.

    But I consider this stagnation a pause before a major breakthrough in the technology, so right now is a perfect time to start with ASR and catch the wave that will come soon.

  2. Okko Says:

    Agreed: good practices are even one of the causes of the stagnation. We know IVR systems are awful to use compared to web pages (or flashy iPhone apps), but by giving their awfulness a certain pattern, we can work around technological limitations.

    What I mean by niches for speech are interesting applications that don’t compete with mouse, keyboard or touch screen for user attention. These are battles bound to be lost. Fancy voice interfaces will always come second to more immersive or efficient input methods.

    I would love to hear where you think the next major breakthrough for ASR technology will come from.

  3. nsh Says:

    > What I mean by niches for speech are interesting applications that don’t compete
    > with mouse, keyboard or touch screen for user attention.

    Exactly. That’s why I consider speech analytics, which runs in parallel with the user’s usual activity and transparently listens to a call, conversation, or meeting, a more promising technology than IVR or dictation. I even started a poll about that on my blog, but surprisingly it shows that far more readers still think dictation and command & control are usable.

    Another such domain is language learning.

    > I would love to hear where you think the next major breakthrough for ASR
    > technology will come from.

    Well, it should come from another source of information. Not necessarily AI, since I still believe that planes shouldn’t flap their wings. It might be the WWW, in which case Google will do it faster than anyone else ;) I took this new-source idea from this nice post:

    http://caterina.net/archive/001211.html
