Posts Tagged ‘TTS’

News Redux & Building VoiceGlue

Tuesday, December 4th, 2007

I stumbled across some “traditional” news bits this week for speech and language technologies, representing most of the major and a few interesting minor market players. Yahoo is offering some kind of NLP-driven structured search for e-commerce solutions starting next year. A new bundled automatic translation software with automatic learning capabilities was announced by across Systems GmbH and Language Weaver. Loquendo is sponsoring a speech-for-in-car-navigation industry event. Persay, maker of voice authentication software, is shipping solutions securing Planet Payment’s voice-enabled payment processing. Lastly, Nuance, continuing its acquisition spree, is buying Viecore, a contact-center integration consulting company, indicating a clear focus on strengthening its traditional speech and telephony market position.

Recently I stumbled across and blogged about VoiceGlue, an integration of various GPL-licensed pieces of software, providing full IVR capabilities (including rudimentary speech synthesis but not recognition). Well, last night, together with Christoph, I finally took a stab at it myself.
Our test setup involved running Fedora 9 virtualized in Mac OS X. Our Fedora installation was missing a few pieces of software beyond the indicated prerequisites, but after about an hour everything was under way.
The trickiest bit proved to be building various modules required for the XML parser (which I presume is needed later for the VoiceGlue-customized DTMF grammar parser). For some reason CPAN’s console kept conking out on us (claiming inexplicably missing/unbuildable prereqs), so after wrestling with that for some time, we decided to manually build all the modules ourselves (hoorah, makefiles).
This worked like a charm, though we hit a snag with the Module::Build perl module, which required C_Support, which in turn required another perl module (ExtUtils-CBuilder), not mentioned in any documentation (scant across the board, though that’s half the fun, isn’t it).
After that, the VoiceGlue installation completed swiftly and all services started running after a minimal bit of configuration.
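For anyone repeating the manual route: the main thing to get right is building modules in prerequisite order. Here is a minimal Python sketch of that idea, not anything VoiceGlue or CPAN ships. The only dependency taken from our experience is Module::Build needing ExtUtils::CBuilder; "Some::XML::Module" is a hypothetical placeholder for whichever XML modules your install asks for.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(prereqs):
    """Return module names so that every prerequisite precedes its dependent."""
    return list(TopologicalSorter(prereqs).static_order())


# Map each module to the set of modules that must be built before it.
prereqs = {
    "Module::Build": {"ExtUtils::CBuilder"},   # the snag we hit
    "Some::XML::Module": {"Module::Build"},    # hypothetical placeholder
}

order = build_order(prereqs)
for name in order:
    # Build each module by hand, in this order.
    print(name)
```

A circular prerequisite would raise graphlib.CycleError, which is at least a clearer message than the ones we got from the CPAN console.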
Next week we’ll be back with some test calls and our first impressions. In the meantime we’ll keep our eyes peeled for ASR integration (LumenVox/Sphinx), which will make this a truly valuable stab at open sourcing some of the most expensive carrier-grade technology out there.

Assistive and Accessibility Technology

Wednesday, November 21st, 2007

Diligent readers may have noticed that dominant news bits concerning speech and language technologies seem to focus on their cost- or time-saving aspects. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, e.g. where it’s fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.
Not so this week. This week seems to shine under the assistive/accessibility technology star. Note Sourceforge project “Speak as Daisy” – a Microsoft Word plugin that enables creation of XML files with markup for speech synthesis or electronic braille generation. The plugin is said to be available in 2008.
Mac users in need of better document read-back in British English will rejoice over the improved Infovox iVox voices.
Philips and Elsevier are developing a speech-enabled diagnostic system for radiologists.
Behold Nattiq’s USB Hal Pen, which allows blind users to use the company’s accessibility features on any computer with a USB port without installation.
Of course there’s some overlap with time- and cost-saving technologies as well. The FBI has announced widespread use of Nuance Dragon NaturallySpeaking dictation for report and interview transcription.
Lastly, here’s an a propos rant about call center automation and its frustrated end-users, a target group for speech and language technologies all too often neglected. Perhaps there’s a lesson to be learned about usability by the “money savers” employing speech technology, taken from those that rely on speech recognition and synthesis for their daily needs. I don’t know, but F-word spotting as a means for prioritizing frustrated callers seems like an acknowledgement of defeat.

Back in the saddle with MSFT, GOOG and VoiceGlue

Tuesday, November 13th, 2007

Back after an extensive break. Been working hard on some of my own multi-modal ideas. Keep your eyes peeled.
Looks like it’s been a quiet fall, speech and language technology-wise. After GOOG-411, Microsoft has also added speech to their search engine endeavors (if in a different domain) by speech-enabling Live Search for mobile users. Nuance continues to consolidate the speech tech market.
Exciting news on the IVR front. Finally a serious attempt to integrate various open-source technologies to provide free carrier-grade speech/telephone services is under way. VoiceGlue has managed to combine OpenVXI (VXML browser) and Flite (speech synthesis) on Asterisk and is planning to integrate Sphinx2 for speech recognition. All components would then be available under some form of the GPL. Could this herald a change in availability of speech telephone platforms for developers unwilling to dish out horrendous per-port costs? Something to follow, anyway.
Lastly, here’s an article describing the growing role of speech in warehouse management.

Google on the Move, News Redux

Wednesday, July 25th, 2007

Very quiet recently. No big acquisitions, no speech-tech revolution.

Most interesting: Google announced Mike Cohen (formerly of Nuance) will appear as keynote speaker at SpeechTek in August to reveal Google’s speech technology strategy. Google has already moved into the speech application market with GOOG411, an automatic directory assistance application leveraging business search and Google Maps.
UBC researchers announce a speech learning system that doesn’t use a traditional data-driven model to learn the sounds of a language. Instead it is said to represent more experience-driven learning, much like that of infants. So far, the system has acquired English and Japanese vowels.
Some product reviews/announcements: a quick history of desktop dictation, uses of TextAloud for the iPhone, and Nuance’s new South African voice “Tessa”.
Also on the web: NIST evaluates DARPA automatic translation software in military contexts, and What Semantic Search is Not.

I may post less frequently in coming weeks. Stay tuned.

This week: Bunnies, Trojans and the Jetsons

Wednesday, July 11th, 2007

There was no shortage of novel uses for speech technology this week. Avaya and Jersey City’s Liberty Science Center announced speech-enabled exhibits, allowing visitors to access information and services in the museum using their voice (and, of course, mobile devices).
Gizmo freaks should love (and everyone else should hate) this bunny, displaying speech recognition and synthesis, while also providing some unified communication capacities.
Also novel, though on a sadder note: speech is finally on the malware radar for good, as TTS trojans popped up using Microsoft’s built-in text-to-speech engine to annoy users by commenting on their own malicious behavior. Call it the salt-in-wound virus. This news comes about half a year after an MS Vista speech recognition security flaw was revealed, whereby the recognizer enables remote execution of content on a computer running speech recognition.

Traditional speech applications made some headlines this week as well: Nuance signs deal with Damovo to roll out speech apps in Ireland, forecasting €1.5m in profits over the next year. TuVox announces hosted on-demand speech apps for VOIP access.

Lastly, here is an interesting article about the Jetsons and why speech technology hasn’t caught on as much as we have all hoped.

Nuance, Tegic and the woes and comeback of mobile speech

Tuesday, June 26th, 2007

So the big news this week is Nuance’s acquisition of the month: Tegic. Tegic supplies T9 predictive text input to several mobile phone manufacturers. The acquisition represents Nuance’s recent focus on acquiring mobile technology market companies. It serves Nuance with a strategic customer base, including obvious candidates for Nuance’s speech technologies. Aside from the strategic benefits, the technical result of mixing predictive text input with speech is interesting and something to be followed.
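For readers who have never thought about what predictive text input does under the hood, here is a rough Python sketch of the general idea: each word is indexed by the keypad digits that spell it, so one digit sequence yields a short list of candidate words. This is an illustration only, not Tegic’s actual T9 implementation; the tiny word list and the function names are made up for the example.

```python
from collections import defaultdict

# Standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Spell a word as the keypad digits a user would press."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(words):
    """Group dictionary words by their digit sequence."""
    index = defaultdict(list)
    for word in words:
        index[to_digits(word)].append(word)
    return index


# Hypothetical mini dictionary, for illustration only.
index = build_index(["home", "good", "gone", "hood", "speech"])
candidates = index["4663"]
print(candidates)
```

The ambiguity is the interesting part: four words share the sequence 4663, and that is exactly where a second input channel such as speech could help pick the right one.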
Coincidentally, the woes and comeback of using speech for I/O on mobile devices are described in these articles this week.
Lastly, here is an interesting interview with Lin Chase, director of Accenture R&D in Bangalore, India, who has held several prominent positions in the speech tech industry. Topics include speech, women in the industry and why Americans should travel.

Healthcare, Security and the Army…

Wednesday, June 20th, 2007

…these are the three overarching themes of the speech technology news that I came across this week. There are some obvious and less obvious points of contact here:

Speech Meets Sales, Video Gaming and the Economist reports…

Tuesday, June 12th, 2007

Many of those working in speech recognition, especially those deploying customer-service telephone applications, have grown tired of the limited scope that most projects entail. I recently wrote about speech enabled knowledge bases as a novel type of speech app. In what may be another – at least I haven’t heard this before – MTI and FasTrak Retail combine efforts to launch a ‘virtual sales associates’ platform. And of course there are the recurring dreams of voice enabled video gaming.

Speech synthesis is naturally more diverse than its recognition sibling (perhaps not everything ‘I’ in I/O can be channelled through voice, but pretty much everything ‘O’ can be synthesized). In today’s news, TTS is employed in emergency response systems to broadcast text messages as audio.

Lastly, speech got some rep in the Economist’s June 7th issue.

Weekly News Redux…

Tuesday, May 22nd, 2007

Today, I came across some novel(ish) uses for text-to-speech:

On the mainstream speech recognition front:

And some Web3.0 language tech news:

Daily News Redux…

Friday, April 20th, 2007

Today on the WWW: