Archive for the ‘Fun’ Category

This Goes to Eleven

Tuesday, July 20th, 2010

No content, just for fun.

Twitter List RSS with Yahoo Pipes

Monday, January 18th, 2010

This post isn’t really about speech technology, but I wanted to share that, after a long time of wondering what the point was, I finally found a use for Twitter: Twitter Lists. With these you can follow a group of users with a common theme, either by packing them into a list yourself or by subscribing to other users’ public lists.
However, I still can’t be bothered to check twitter.com for updates, nor do I care to install another third-party app to enrich my user experience. And unfortunately there is no direct way to follow a list as an RSS feed, which is how I prefer to consume information [1].

Thankfully, yet another neat little Yahoo Pipes mashup comes to the rescue. Simply enter the list creator’s user name and the list name, and off you go.

To add a bit of speech tech to this post, here are a few sample lists that you might find interesting:
@die_lautmaler/voicebusiness
@alisohani/machine-learning
@suellewellyn/cunning-linguists
@rachelcotterill/computational-linguistics
(And thanks to people compiling these!)


[1] Interestingly, several friends have recently pointed out that they have ditched RSS for Twitter, as most of their regular feeds also post there. However, I receive too much content via RSS that Twitter won’t deliver, such as Google Alerts, and I find that sorting through the Twitter feed quickly becomes a chore (something you’ll still have to do when reading lists, I suppose). Also, leaving an open protocol for a commercial (if free) service seems like a step in the wrong direction…

Quick Voice Prompts with Google Translate TTS Service

Tuesday, January 12th, 2010

Last month Google released several new features for their translation service, among them a text-to-speech rendition of the English translation. As reported elsewhere, it turns out you can access this service directly using a simple URL in your browser. Following this link will return an MP3 of the text sent along with it:

http://translate.google.com/translate_tts?q=Hello+reader

Just replace “Hello+reader” in your address bar with any text that you want spoken. Remember to replace spaces with pluses (+).
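If you would rather build such a URL in a script than by hand, here is a minimal Python sketch. It only constructs the URL string and does not contact the service. The function name tts_url is my own invention for illustration, and the only thing taken from the link above is the base address and the q parameter; everything else is standard-library URL encoding.

```python
from urllib.parse import urlencode

# Base address taken from the example link in this post; this is not a
# documented API and may change or disappear.
TTS_BASE = "http://translate.google.com/translate_tts"


def tts_url(text: str) -> str:
    """Return the TTS URL for `text` (hypothetical helper, builds a string only).

    urlencode() turns spaces into pluses and percent-escapes characters
    such as '&' that would otherwise break the query string.
    """
    return TTS_BASE + "?" + urlencode({"q": text})


if __name__ == "__main__":
    print(tts_url("Hello reader"))
```

Using urlencode rather than a plain text.replace(" ", "+") also takes care of punctuation and non-ASCII characters in the text.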

Some browsers, however, seem to have problems with the returned audio. Chrome worked for me, and Internet Explorer reportedly works as well.

As this is not an official RESTful Google API, don’t be surprised if it stops working. Beware that commercial reuse of the output audio is likely also governed by license restrictions.

Update:
Friend Schamai pointed out how this could be employed in a web form. Here’s the corresponding HTML:

<form action="http://translate.google.com/translate_tts">
<input name="q" size="55" value="just saying" />
</form>

Speaking Piano

Thursday, December 31st, 2009

I greatly enjoyed this video about a piano-cum-speech-synthesis installation. I also think that this would make a great GarageBand plugin.

Welcome to the new URL

Tuesday, December 29th, 2009

Hello reader,

You may be new, or you may have found me at my old blog (the content of which has already been migrated here). This is a fairly content-free post, bidding you a warm welcome.

Only the best for 2010,

Okko

Language Technology April Fools

Wednesday, April 1st, 2009

Just posting some gems from today concerning speech and language technology, such as natural language generation, speech recognition and natural language processing.

Have you found any others?

The Times Reports & Is SciFi Really Wrong?

Sunday, January 27th, 2008

The New York Times today published an interesting, if brief, article about speech recognition in the mobile/telco space – cited as a “$1.6 billion market in 2007”. The article gives a short overview of a range of voice-driven applications and mashups, such as vlingo.com and SimulScribe, as well as some directory assistance services (though it omits others, such as SpinVox and GOOG411).
The article opens:

“Innovation usually needs time to steep. Time to turn the idea into something tangible, time to get it to market, time for people to decide they accept it. Speech recognition technology has steeped for a long time”

And concludes:

“Even a digital expert [...] cautions that some people may never be satisfied with the quality of speech recognition technology — thanks to a steady diet of fictional books, movies and television shows featuring machines that understand everything a person says, no matter how sharp the diction or how loud the ambient noise.”

But isn’t this a bit hackneyed? Perhaps by today’s standards a twenty-year steeping period seems long, but this is hardly the case anywhere else in history. And after re-watching 1982’s Blade Runner recently, I actually felt rather optimistic that we are today close to what the movie expected of speech recognition and speaker verification in 2019. Elsewhere, a similar picture emerges.
The Star Trek ship computer’s speech recognition engine (the year is 2151), while accurate, still requires the push of a button to kick in, rather than listening for the hot word “computer”, a capacity that is available today, if not quite ripe for deployment.
Of course, there are the HALs (2001), Marvins (no date) and C-3POs (a long, long time ago…), whose capacities far exceed anything we dare dream our mobile phones will one day manage. But here the problem seems to be less about the quality of speech technology (speech synthesis of HAL’s quality is available today, and Marvin’s characteristic monotone baritone should be easy to do) than about the old hard-soft divide in Artificial Intelligence. As long as we use a hard-AI problem, which speech arguably is, to solve soft-AI problems (“find closest pizza service”), we cannot fail to be disappointed.

Welcome

Wednesday, March 28th, 2007

Welcome. Here I will follow what the news and other blogs have to say about what may broadly be called human language technology. This includes, but isn’t limited to, automatic speech recognition (ASR), text-to-speech (TTS), speaker recognition/verification (SV), machine translation (MT) and natural language processing (NLP).

Oh and of course: this blog is intended to be informative and, unless otherwise specified, makes no claim about the truthfulness of any referenced material. I will do my best to ensure that any of my own opinions can easily be discerned as such. Comments and debate are always welcome.