<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech</title>
	<atom:link href="http://www.okkoblog.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Tue, 20 Jul 2010 08:09:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>This Goes to Eleven</title>
		<link>http://www.okkoblog.com/2010/07/20/this-goes-to-eleven/</link>
		<comments>http://www.okkoblog.com/2010/07/20/this-goes-to-eleven/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 08:09:21 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=195</guid>
		<description><![CDATA[No content, just for fun.]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/5FFRoYhTJQQ" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/5FFRoYhTJQQ"></embed></object></p>
<p>No content, just for fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/07/20/this-goes-to-eleven/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A More Optimistic Outlook on the Future of Speech</title>
		<link>http://www.okkoblog.com/2010/06/30/a-more-optimistic-outlook-on-the-future-of-speech/</link>
		<comments>http://www.okkoblog.com/2010/06/30/a-more-optimistic-outlook-on-the-future-of-speech/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 09:47:04 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[Siri]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=187</guid>
		<description><![CDATA[The speech application industry got some critical press in recent months (here are some spirited responses, respectively). It was all the more refreshing to come across this New York Times article presenting current work in speech and artificial intelligence. The article gives a broad overview of the kinds of AI applications that have moved into the mainstream (or have the potential to [...]]]></description>
			<content:encoded><![CDATA[<p>The speech application industry got some <a href="http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition" target="_blank">critical</a> <a href="http://www.signalprocessingsociety.org/technical-committees/list/sl-tc/spl-nl/2010-04/suendermann/">press</a> in recent months (here are some <a href="http://robertopieraccini.blogspot.com/2010/05/un-rest-in-peas-unrecognized-life-of.html">spirited</a> <a href="http://languagelog.ldc.upenn.edu/nll/?p=2275">responses</a>, respectively.)</p>
<p>All the more refreshing to come across this New York Times <a href="http://www.nytimes.com/2010/06/25/science/25voice.html">article</a> presenting <a href="http://research.microsoft.com/en-us/um/people/horvitz/">current</a> <a href="http://siri.com/">work</a> in speech and artificial intelligence. The article highlights broadly what kind of AI applications have moved into the mainstream (or have potential to do so). Speech and natural language understanding, the article claims, have gone furthest.</p>
<p>One thing that is generalizable from both criticisms above is that development of speech-enabled applications has stagnated, in various ways<sup>1</sup>. The underlying technology – speech recognition (ASR) – has gone as far as it can. Application designers and developers haven&#8217;t adopted. Dictation has learned to understand doctors and lawyers better, but still struggles with conversational speech.</p>
<p>This point may have to be conceded. In terms of commercial applications however, especially speech-enabled voice (IVR) systems, the root cause for stagnation is not necessarily a failure of AI, rather than a maturing of standards and best-practices. Fulfilling expectations that voice applications, much like websites, behave according to certain rules is much to the advantage of the millions who interact with such systems every day.</p>
<p>What I walk away with from the generalized critical, as well as the Times&#8217; optimistic perspective is that, short of a revolution in underlying technologies (which hardly anyone expects), filling practical, everyday niches is where things can still move forward for speech and language processing.  These niches have certainly not been fully uncovered.</p>
<p>Thoughts?</p>
<hr /><p><sup>1</sup> Roughly summarized, Robert Fortner: &#8220;development in speech technology has flat-lined since 2001&#8221;; David Suendermann: &#8220;(statistical) engineering methods are more efficient than traditional symbolic linguistic approaches to language processing.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/06/30/a-more-optimistic-outlook-on-the-future-of-speech/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Roger Ebert TTS</title>
		<link>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/</link>
		<comments>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 16:57:50 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[Cepstral]]></category>
		<category><![CDATA[CereProc]]></category>
		<category><![CDATA[NeoVoice]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=181</guid>
		<description><![CDATA[Roger Ebert, who lost his lower jaw to cancer, has been given his old voice back. Or at least a version of it. Edinburgh-based CereProc has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks. This is of course not the first case [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature"></embed></object></p>
<p>Roger Ebert, who lost his lower jaw to cancer, has been given his old voice back. Or at least a version of it. Edinburgh-based <a href="http://www.cereproc.com/" target="_blank">CereProc</a> has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks.</p>
<p>This is of course not the first case of text-to-speech (TTS) being used for essential day-to-day communication. Most prominently, Professor Stephen Hawking has been doing so since 1985, initially using <a href="http://en.wikipedia.org/wiki/DECtalk" target="_blank">DECtalk</a> and since 2009 <a href="http://www.neospeech.com" target="_blank">NeoSpeech</a>. The poor quality of his voice prior to the switch was of course something of a trademark; the anecdote goes that Professor Hawking stuck with his old voice out of attachment. While many speech and language technologies suffer a wow-but-who-really-needs-it existence, these cases are wonderful examples of real utility.</p>
<p>Mr. Ebert&#8217;s voice is novel in one regard: he got his own voice back. I have half-seriously mused in the past whether this wasn&#8217;t becoming a real option. Typically, developing a new voice for general-purpose speech synthesis is a costly affair, mostly due to time- and labor-intensive data preparation (studio recording, annotation, hand alignment, etc.). However, as this &#8220;grunt work&#8221; becomes more streamlined and automated, the buy-in cost for a new voice drops. Mr. Ebert was &#8220;lucky&#8221; in the sense that large amounts of his speech had already been recorded at good enough quality to build his custom voice. Another player on the TTS market, <a href="http://www.cepstral.com" target="_blank">Cepstral</a>, has recently launched its <a href="http://www.voiceforge.com" target="_blank">VoiceForge</a> offering, which aims to lower the entry threshold for home-grown TTS developers.</p>
<p>Another option that seems increasingly realistic is employing &#8220;voice morphing&#8221; and &#8220;voice transformation&#8221;. The idea here is simply to modify an existing, high-quality TTS voice. The following video demonstrates how the latter can be done by changing purely acoustic properties (timbre, pitch, rate) of a voice signal:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/-pA7cW0UV88" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/-pA7cW0UV88"></embed></object></p>
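<p>To make the idea of changing purely acoustic properties more concrete, here is a minimal Python sketch of the crudest such transformation: resampling a signal so that it plays back faster. This is an illustration only, not the technique used in the video; the helper names (<code>sine</code>, <code>resample</code>, <code>estimate_pitch</code>) and the pure-tone stand-in for a voice are hypothetical. Naive resampling changes pitch and rate together, much like speeding up a tape; actual voice transformation systems use techniques such as PSOLA to modify them independently.</p>

```python
import math

RATE_HZ = 16000  # sampling rate assumed for this toy example


def sine(freq_hz, duration_s, rate_hz=RATE_HZ):
    """Generate a toy 'voice' signal: a pure sine tone."""
    n = int(duration_s * rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def resample(signal, factor):
    """Naively speed up (factor > 1) or slow down (factor < 1) a signal
    by linear interpolation. Pitch and duration change together."""
    n_out = int(len(signal) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append((1 - frac) * signal[j] + frac * nxt)
    return out


def estimate_pitch(signal, rate_hz=RATE_HZ):
    """Rough pitch estimate: count upward zero crossings per second."""
    ups = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    return ups * rate_hz / len(signal)


voice = sine(120, 1.0)          # one second of a 120 Hz tone
faster = resample(voice, 1.5)   # play it back 1.5 times faster
print(len(faster) / RATE_HZ, round(estimate_pitch(faster)))
```

<p>Speeding up by a factor of 1.5 shortens one second of audio to about 0.67 seconds and raises the 120 Hz tone to roughly 180 Hz, which is exactly the coupling that proper pitch and rate modification avoids.</p>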
<p>Voice morphing changes one voice into another. A Cambridge University <a href="http://mi.eng.cam.ac.uk/~hy216/VoiceMorphingPrj" target="_blank">research project</a> demonstrated how recordings of one speaker could be made to sound like those of another using relatively little training data. The following are some examples:</p>
<p>Original Speaker 1:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" autoplay="false"></embed></object></p>
<p>Target Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" autoplay="false"></embed></object></p>
<p>Converted Speaker 1 to Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" autoplay="false"></embed></object></p>
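<p>Voice conversion of this kind is often trained on parallel data: time-aligned recordings of both speakers saying the same sentences, from which a mapping between their spectral features is learned. As a heavily simplified Python sketch (a toy, not the Cambridge project&#8217;s method, which is considerably more sophisticated), the following fits an independent least-squares linear mapping for each feature dimension. The two-dimensional &#8220;features&#8221; are made-up numbers standing in for real spectral features, and the function names are hypothetical.</p>

```python
def fit_linear_map(src_frames, tgt_frames):
    """Fit y = a*x + b independently for each feature dimension by least
    squares, given time-aligned source/target feature frames."""
    dims = len(src_frames[0])
    params = []
    for d in range(dims):
        xs = [f[d] for f in src_frames]
        ys = [f[d] for f in tgt_frames]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var else 1.0  # fall back to identity slope
        params.append((a, my - a * mx))
    return params


def convert(frame, params):
    """Map one source frame into the (estimated) target speaker's space."""
    return [a * x + b for x, (a, b) in zip(frame, params)]


# Toy parallel data: the "target" features are 2*x + 1 and y - 3.
src = [[0.0, 1.0], [1.0, 2.0], [2.0, 4.0], [3.0, 5.0]]
tgt = [[2 * x + 1, y - 3] for x, y in src]
params = fit_linear_map(src, tgt)
print(convert([4.0, 10.0], params))  # prints [9.0, 7.0]
```
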
<p>Similar technology was also <a href="http://www.interspeech2009.org/conference/programme/session.php?id=2710" target="_blank">showcased extensively</a> at the 2009 Interspeech Conference. Perhaps this will one day enable people who have lost their voice, and who lack hours (or days) of recordings of it, to talk to their loved ones in a custom voice of their own.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" length="159788" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" length="200748" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" length="159788" type="audio/x-wav" />
		</item>
		<item>
		<title>SpinVox, Voice-to-Text and Some Terminology</title>
		<link>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/</link>
		<comments>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 11:14:45 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[SpinVox]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=156</guid>
		<description><![CDATA[The recent acquisition of SpinVox by Nuance not only represents another major step towards market consolidation by the latter company, but also prompted me to have a look at the voice-to-text market. Being a &#8220;late adopter power user&#8221; (out of some combination of complacency with existing workflows) and refusing to pay for certain [...]]]></description>
			<content:encoded><![CDATA[<p>The recent <a href="http://www.nuance.com/spinvox/" target="_blank">acquisition</a> of <a href="http://www.spinvox.com" target="_blank">SpinVox</a> by <a href="http://www.nuance.com" target="_blank">Nuance</a> not only represents another major step towards market consolidation by the latter company, but also prompted me have a look at the voice-to-text market.  Being a &#8220;late adopter power user&#8221; – out of some combination of complacency with existing work flows – and refusing to pay for certain conveniences, I have refrained from using such services until now. Shameful for one who&#8217;s bread and butter is working with speech technology, I admin.</p>
<p>Luckily I came across some <a href="http://www.readwriteweb.com/archives/voice-to-text-speech-to-text.php" target="_blank">useful</a> <a href="http://baratunde.posterous.com/this-is-a-test-of-the-google-voice-messaging" target="_blank">reviews</a> of the most prominent providers to get me up to snuff. I won&#8217;t go into them, as I&#8217;m sure others have more to say about the actual user experience. However as &#8220;mobile&#8221; is the way speech and langauge technology seems to want to go, and as I finally plan to use more personal mobile computing resources (especially various gadgets starting with &#8220;i&#8221;) for speech technology, I may give some of these a whirl in the near future…</p>
<p>SpinVox caused somewhat of a stir when launching their voice-to-text service in 2004 and another when the BBC &#8220;<a href="http://news.bbc.co.uk/2/hi/8163511.stm" target="_blank">uncovered</a>&#8221; that the company used a combination of human and machine intelligence. To anyone working in speech and language technology this would have been obvious from the get-go, as well as to anyone reading the company&#8217;s patent or patent applications, in which the use of human operators is mentioned explicitly. However regular users would probably have been duped into thinking a machine was doing all the typing.  Failure to understand/communicate this caused a wholly avoidable privacy debacle.</p>
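<p>As a purely hypothetical illustration of such a hybrid set-up (not SpinVox&#8217;s actual process), one common pattern is to let ASR transcribe every message and route only low-confidence results to a human transcriber. The <code>Transcript</code> type, the <code>route</code> function and the 0.85 threshold in this Python sketch are all assumptions.</p>

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    message_id: str
    text: str
    confidence: float  # hypothetical ASR confidence score in [0, 1]


def route(transcripts, threshold=0.85):
    """Split ASR output into results sent straight to the user and
    results queued for a human transcriber to check or correct."""
    automatic, human_queue = [], []
    for t in transcripts:
        (automatic if t.confidence >= threshold else human_queue).append(t)
    return automatic, human_queue


results = [
    Transcript("m1", "call me back after five", 0.93),
    Transcript("m2", "pick up some uh bread", 0.61),
]
auto, manual = route(results)
print([t.message_id for t in auto], [t.message_id for t in manual])
```

<p>Any message that lands in the human queue is, of course, read or listened to by a person, which is precisely the kind of detail users should be told about.</p>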
<p>One thing that&#8217;s clear from last year&#8217;s privacy debacle is that the terminology around voice and speech technologies is a bit of a mess. So here&#8217;s an attempt at shedding some light on what&#8217;s what:</p>
<p style="padding-left: 30px;"><em>Speech Recognition</em> &#8211; also <em>ASR</em> (automatic speech recognition) for short. This is the general term for the technology that automatically turns spoken words into machine-readable text. However, the technology can be described along several dimensions, such as the models employed (HMM-based vs. connectionist) and whom it is for (a single speaker or all speakers of a dialect or language). There is also a host of applications that employ it (dictation, IVR/telephone systems, voice-to-text services), each with different requirements. Hence ASR is really an umbrella term.</p>
<p style="padding-left: 30px;"><em>Voice Recognition</em> &#8211; often confused with speech recognition. Usually voice recognition refers to software that works for only a single speaker. However, this distinction is anecdotal, and in marketing the two terms are used synonymously.</p>
<p style="padding-left: 30px;"><em>Voice-to-Text</em> &#8211; a service that converts spoken words into text. ASR, human transcribers, or a combination of both may be used to do so; the label itself makes no claim as to whether the process is fully automated.</p>
<p style="padding-left: 30px;"><em>Speaker Recognition</em> &#8211; a security technology typically used to perform one of two tasks: (1) identifying a speaker from a group of known speakers (speaker identification) or (2) determining whether a speaker really is who s/he claims to be (speaker verification). The two tasks are closely related and often confused. Think of the first as picking a person out of a crowd and the second as a kind of &#8220;voice fingerprint matching&#8221;.</p>
<p style="padding-left: 30px;"><em>Text-to-Speech</em> &#8211; or <em>TTS</em> for short, another term for speech synthesis. This technology turns written text into an audio signal (such as an MP3). This should be an obvious label, but surprisingly, people seem to frequently <a href="http://www.youtube.com/watch?v=N9GyPXJGZsU" target="_blank">confuse</a> it with voice-to-text services (purely my own anecdotal impression).</p>
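<p>The difference between the two speaker recognition tasks is perhaps easiest to see in code. In this minimal Python sketch, each enrolled speaker is represented by a fixed-length &#8220;voiceprint&#8221; vector and compared by cosine similarity; the vectors, names and the 0.8 threshold are made-up assumptions, whereas real systems derive such representations from audio using statistical models. Identification picks the best match among all known speakers, while verification only checks the claimed identity against a threshold.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical enrolled "voiceprints" (toy numbers, not real features).
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
    "carol": [0.4, 0.4, 0.9],
}


def identify(sample):
    """Identification: which of the known speakers is this?"""
    return max(enrolled, key=lambda name: cosine(sample, enrolled[name]))


def verify(sample, claimed, threshold=0.8):
    """Verification: is the speaker who s/he claims to be?"""
    return cosine(sample, enrolled[claimed]) >= threshold


sample = [0.85, 0.15, 0.35]
print(identify(sample), verify(sample, "alice"), verify(sample, "bob"))
# prints: alice True False
```
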
<p>I&#8217;m also told that SpinVox&#8217;s sale price of $102m is a bit of a disappointment, representing just over 50% of the initial $200m that SpinVox raised in 2003. But that&#8217;s something I&#8217;ll let others address. Let&#8217;s see where Nuance goes with this in terms of trying to fully automate the whole transcription process…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Twitter List RSS with Yahoo Pipes</title>
		<link>http://www.okkoblog.com/2010/01/18/twitter-list-rss-with-yahoo-pipes/</link>
		<comments>http://www.okkoblog.com/2010/01/18/twitter-list-rss-with-yahoo-pipes/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 11:00:32 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[mashups]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=164</guid>
		<description><![CDATA[This post isn&#8217;t really about speech technology, but I wanted to share that after a long time of wondering what the point was, I finally found a use for twitter: Twitter Lists. With these you can follow a group of users with a common theme, either by packing them into a list yourself or by [...]]]></description>
			<content:encoded><![CDATA[<p>This post isn&#8217;t really about speech technology, but I wanted to share that after a long time of wondering what the point was, I finally found a use for twitter: <a href="http://blog.twitter.com/2009/10/theres-list-for-that.html" target="_blank">Twitter Lists</a>. With these you can follow a group of users with a common theme, either by packing them into a list yourself or by subscribing to other users&#8217; <a href="http://listorious.com/" target="_blank">public lists</a>.<br />
However, I still can&#8217;t be bothered to check twitter.com for updates, nor do I care to install another third-party app to enrich my user experience. Unfortunately, there is also no direct way to follow a list as an RSS feed, which is how I prefer to consume information<sup>1</sup>.</p>
<p>Thankfully, yet another neat little Yahoo Pipes mashup comes <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=fb60de5ff93e81319e3c5fa207b9b276" target="_blank">to the rescue</a>. Simply enter the list creator&#8217;s user name and the list name, and off you go.</p>
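<p>Once a list is available as an RSS feed, any feed reader can consume it. For the curious, here is a minimal Python sketch that uses only the standard library&#8217;s <code>xml.etree.ElementTree</code> to pull titles and links out of an RSS 2.0 document. The feed is a made-up inline example rather than actual output from the Pipe (in practice you would fetch the Pipe&#8217;s RSS URL first), and <code>feed_items</code> is a hypothetical helper name.</p>

```python
import xml.etree.ElementTree as ET


def feed_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# A made-up, minimal RSS 2.0 document standing in for the Pipe's output.
sample_feed = """<rss version="2.0"><channel>
  <title>Twitter list: voicebusiness</title>
  <item><title>First tweet</title><link>http://example.com/1</link></item>
  <item><title>Second tweet</title><link>http://example.com/2</link></item>
</channel></rss>"""

for title, link in feed_items(sample_feed):
    print(title, link)
```
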
<p>To add a bit of speech tech to this post, here are a few sample lists that you might find interesting:<br />
@die_lautmaler/voicebusiness<br />
@alisohani/machine-learning<br />
@suellewellyn/cunning-linguists<br />
@rachelcotterill/computational-linguistics<br />
(And thanks to the people compiling these!)</p>
<hr /><p><sup>1</sup> Interestingly, several friends have recently pointed out that they have ditched RSS for twitter, as most of their regular feeds also post there. However, I receive too much content via RSS that twitter won&#8217;t deliver, such as Google Alerts, and I find that sorting through the twitfeed quickly becomes a chore (something you will still have to do when reading lists, I suppose). Also, leaving an open protocol for a commercial (if free) service seems like a step in the wrong direction…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/18/twitter-list-rss-with-yahoo-pipes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quick Voice Prompts with Google Translate TTS Service</title>
		<link>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/</link>
		<comments>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 08:53:09 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=149</guid>
		<description><![CDATA[Google last month released several new features for their translation service, among them a text-to-speech rendition of the English translation. As reported elsewhere, it turns out you can access this service directly using a simple URL in your browser. Following this link will return an MP3 of the text sent along with it: http://translate.google.com/translate_tts?q=Hello+reader Just [...]]]></description>
			<content:encoded><![CDATA[<p>Google last month released several <a href="http://googleblog.blogspot.com/2009/11/new-look-for-google-translate.html">new features</a> to their <a href="http://translate.google.com">translation service</a> among them a text-to-speech rendition of the English translation.  As <a href="http://www.techcrunch.com/2009/12/14/the-unofficial-google-text-to-speech-api/" target="_blank">reported</a> <a href="http://lifehacker.com/5426797/google-translate-url-generates-instant-text+to+speech-mp3-files" target="_blank">elsewhere</a>, it turns out you can directly access this service using a simple URL in your browser.  Following this link will return an MP3 of the text sent along with it:</p>
<p><a href="http://translate.google.com/translate_tts?q=Hello+reader" target="_blank">http://translate.google.com/translate_tts?q=Hello+reader</a></p>
<p>Just replace &#8220;Hello+reader&#8221; in your address bar with any text that you want spoken. Remember to replace spaces with pluses (+); other special characters (such as &amp; or ?) need to be percent-encoded.</p>
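<p>To generate such URLs programmatically, a minimal Python sketch can lean on the standard library&#8217;s <code>urllib.parse.urlencode</code>, which turns spaces into pluses and percent-encodes other special characters. The helper name <code>tts_url</code> is my own; the endpoint and its <code>q</code> parameter are simply the ones from the link above, and since the service is unofficial it may change or disappear at any time. The sketch only builds URLs and does not contact Google.</p>

```python
from urllib.parse import urlencode

# The unofficial endpoint from the link above; it may stop working at any time.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def tts_url(text):
    """Build a translate_tts URL for the given text. urlencode() turns
    spaces into '+' and percent-encodes other reserved characters."""
    return TTS_ENDPOINT + "?" + urlencode({"q": text})


print(tts_url("Hello reader"))
# prints: http://translate.google.com/translate_tts?q=Hello+reader
print(tts_url("Rock & roll, 100%"))
# prints: http://translate.google.com/translate_tts?q=Rock+%26+roll%2C+100%25
```
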
<p>Some browsers, however, seem to have problems playing the returned audio. Chrome worked for me, and Internet Explorer reportedly works as well.</p>
<p>As this is not an official RESTful Google API, don&#8217;t be surprised if it stops working. Also beware that commercial reuse of the output audio is likely governed by license restrictions.</p>
<p><strong>Update:</strong><br />
Friend <a href="http://ch.linkedin.com/in/safra" target="_self">Schamai</a> pointed out how this could be employed in a web form.  Here&#8217;s an example:</p>
<form action="http://translate.google.com/translate_tts">
<input name="q" size="55" value="just saying" />
<button>Speak as MP3</button><br />
</form>
<p>Or the corresponding HTML:<br />
<code><br />
&lt;form action="http://translate.google.com/translate_tts"&gt;<br />
&lt;input name="q" size="55" value="just saying" /&gt;<br />
&lt;/form&gt;<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://translate.google.com/translate_tts?q=Hello+reader" length="5472" type="audio/mpeg" />
		</item>
		<item>
		<title>Speaking Piano</title>
		<link>http://www.okkoblog.com/2009/12/31/speaking-piano/</link>
		<comments>http://www.okkoblog.com/2009/12/31/speaking-piano/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 22:09:19 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=71</guid>
		<description><![CDATA[I greatly enjoyed this video about a piano-cum-speech-synthesis installation. I also think that this would make a great GarageBand plugin.]]></description>
			<content:encoded><![CDATA[<p><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/muCPjK4nGY4&#038;hl=en_US&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/muCPjK4nGY4&#038;hl=en_US&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></p>
<p>I greatly enjoyed this video about a piano-cum-speech-synthesis installation. I also think that this would make a great GarageBand plugin.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/12/31/speaking-piano/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Incremental Dialogue Management</title>
		<link>http://www.okkoblog.com/2009/12/30/incremental-dialogue-management/</link>
		<comments>http://www.okkoblog.com/2009/12/30/incremental-dialogue-management/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 19:37:59 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[dialogue research]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/blog/?p=65</guid>
		<description><![CDATA[Over the past year I&#8217;ve been involved in research on incremental processing in spoken dialogue systems at Potsdam University. Our project looks at how information in dialogues can be reduced to basic units, which are passed between modules (such as a speech recognizer and a semantic engine), based on a general abstract model of how this [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Dilbert.com" href="http://dilbert.com/strips/comic/2009-12-29/"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/70000/7000/900/77968/77968.strip.gif" border="0" alt="Dilbert.com" /></a><br />
Over the past year I&#8217;ve been involved in research on incremental processing in spoken dialogue systems at Potsdam University. Our <a href="http://coco-lab.org/index.php?option=com_content&amp;task=view&amp;id=21&amp;Itemid=9">project</a> looks at how information in dialogues can be reduced to basic units, which are passed between modules (such as a speech recognizer and a semantic engine), based on a <a href="http://www.ling.uni-potsdam.de/~das/papers/schlangenetal_agmo_eacl2009.pdf" target="_blank">general abstract model</a> of how this can be done. Thus far, we&#8217;ve mainly been concerned with issues close to the input speech signal (<a href="http://www.ling.uni-potsdam.de/~timo/pub/naacl-hlt2009.pdf" target="_blank">ASR</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/interspeech-rubisc.pdf" target="_blank">semantics</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/refres_sigdial.pdf" target="_blank">reference resolution</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/interspeech-nbest.pdf">n-best lists</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/diaholmia.pdf" target="_blank">prosody</a>, etc.). As these issues have mostly been laid out, 2010 will be dedicated to research on larger dialogue issues (interaction &amp; dialogue management, incremental output generation).</p>
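<p>To give a flavour of what passing such units between modules can look like, here is a deliberately simplified Python sketch, loosely inspired by the idea of incremental units with &#8220;add&#8221; and &#8220;revoke&#8221; edits; it is not the project&#8217;s actual implementation. An incremental speech recognizer&#8217;s hypothesis grows word by word and may revise earlier words, so instead of resending the whole hypothesis, the module sends only the edits that turn its previous output into the current one. The function name <code>diff_hypotheses</code> and the word sequences are hypothetical.</p>

```python
def diff_hypotheses(previous, current):
    """Compute the edits that turn the previous word hypothesis into the
    current one: revoke changed words (newest first), then add new ones."""
    common = 0
    while (common < len(previous) and common < len(current)
           and previous[common] == current[common]):
        common += 1
    edits = [("revoke", w) for w in reversed(previous[common:])]
    edits += [("add", w) for w in current[common:]]
    return edits


# Successive (made-up) outputs of an incremental speech recognizer.
hypotheses = [["I"], ["I", "want"], ["I", "won't"], ["I", "won't", "go"]]

previous = []
for current in hypotheses:
    print(diff_hypotheses(previous, current))
    previous = current
```

<p>For this sequence, the recognizer first adds &#8220;I&#8221; and &#8220;want&#8221;, then revokes &#8220;want&#8221; in favour of &#8220;won&#8217;t&#8221;, and finally adds &#8220;go&#8221;; downstream modules can start working on each word as soon as it arrives.</p>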
<p>As in the Dilbert dialogue snippet, some issues that will naturally arise are (1) how an incremental dialogue system can handle different types of questions (breaking with the established Question-Answer-Question-A-Q&#8230; paradigm in favour of something more dynamic) and (2) what turn-taking means in an incremental framework (we now have a system that can interrupt the user at appropriate moments). The benefits of incrementality lie mostly in the speed, robustness and naturalness of the interaction, and these are linked to output generation, so this is a third issue to watch out for. Larger dialogue strategies may be less affected, but if they are, we need to establish in what ways.</p>
<p>We&#8217;ll certainly steer clear of calling our prototype Morgan. If you are involved in speech and language processing and interested in creating more natural human-machine dialogues, I&#8217;d love to hear from you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/12/30/incremental-dialogue-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Welcome at the new URL</title>
		<link>http://www.okkoblog.com/2009/12/29/welcome-at-the-new-url/</link>
		<comments>http://www.okkoblog.com/2009/12/29/welcome-at-the-new-url/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 14:59:31 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[welcome]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=56</guid>
		<description><![CDATA[Hello reader, You may be new, or you may have found me through my old blog (the content of which has already been migrated here). This is a fairly content-free post, bidding you a warm welcome. Only the best for 2010, Okko]]></description>
			<content:encoded><![CDATA[<p>Hello reader,</p>
<p>You may be new, or you may have found me through my <a href="http://okkobuss.blogspot.com" target="_blank">old blog</a> (the content of which has already been migrated here). This is a fairly content-free post, bidding you a warm welcome.</p>
<p>Only the best for 2010,</p>
<p>Okko</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/12/29/welcome-at-the-new-url/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speech and Dialog Conferences / Speech for iPhone and Android</title>
		<link>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/</link>
		<comments>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 08:24:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=54</guid>
		<description><![CDATA[Conference time: I will be spending a couple of days in London and Brighton from September 5th, attending Interspeech and SIGDIAL as well as a researcher round-table. Anyone interested in meeting up, feel free to get in touch. Also, here is some more or less recent, interesting news for Android (at about 6:20, thanks Schamai) and [...]]]></description>
			<content:encoded><![CDATA[<p><!-- AddThis Bookmark Button BEGIN -->Conference time:  I will be spending a couple of days in London and Brighton from September 5th attending <a href="http://www.interspeech2009.org/">Interspeech</a>, <a href="http://www.sigdial.org/workshops/workshop10/index.html">SIGDIAL</a> as well as a researcher<a href="http://www.yrrsds.org/"> round-table</a>.  Anyone interested in meeting up, feel free to <a href="http://www.voxarca.de/app/main/contact">get in touch</a>.</p>
<p>Also, here are some more or less recent, interesting news for <a href="http://www.youtube.com/watch?v=uX9nt8Cpdqg">Android</a> (at about 6:20, thanks Schamai) and <a href="http://prmac.com/release-id-6453.htm">iPhone</a> speech developers.<br /><!-- AddThis Bookmark Button END --></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
