<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech &#187; TTS</title>
	<atom:link href="http://www.okkoblog.com/tag/tts/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Thu, 29 Sep 2011 12:37:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Roger Ebert TTS</title>
		<link>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/</link>
		<comments>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 16:57:50 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[Cepstral]]></category>
		<category><![CDATA[CereProc]]></category>
		<category><![CDATA[NeoVoice]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=181</guid>
		<description><![CDATA[Roger Ebert, who lost his lower jaw to cancer, has been given his old voice back. Or at least a version of it. Edinburgh-based CereProc has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks. This is of course not the first case [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature"></embed></object></p>
<p>Roger Ebert, who lost his lower jaw to cancer, has been given his old voice back. Or at least a version of it. Edinburgh-based <a href="http://www.cereproc.com/" target="_blank">CereProc</a> has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks.</p>
<p>This is of course not the first case of text-to-speech (TTS) being used for essential day-to-day communication. Most prominently, Professor Stephen Hawking has been doing so since 1985, initially using <a href="http://en.wikipedia.org/wiki/DECtalk" target="_blank">DECtalk</a> and, since 2009, <a href="http://www.neospeech.com" target="_blank">NeoSpeech</a>. The poor quality of his voice prior to the switch was of course a bit of a trademark. The anecdote goes that Professor Hawking stuck with his old voice out of attachment. While many speech and language technologies suffer a wow-but-who-really-needs-it existence, these cases are wonderful examples of real utility.</p>
<p>Mr. Ebert&#8217;s voice is novel in one regard: he got his own voice back. I have half-seriously mused in the past whether this wasn&#8217;t becoming a real option. Typically, new voice development for general-purpose speech synthesis is a costly affair, mostly due to time- and labor-intensive data preprocessing (studio recording, annotation, hand alignment, etc.). However, as the &#8220;grunt work&#8221; gets more streamlined and automated, the buy-in cost for a new voice drops. Mr. Ebert was &#8220;lucky&#8221; in the sense that large amounts of his voice had already been recorded in good enough quality to enable building his custom voice. Another player on the TTS market, <a href="http://www.cepstral.com" target="_blank">Cepstral</a>, has recently launched its <a href="http://www.voiceforge.com" target="_blank">VoiceForge</a> offering, which aims to lower the entry threshold for home-grown TTS developers.</p>
<p>Another option that seems to be more and more realistic is employing &#8220;voice-morphing&#8221; and &#8220;voice transformation&#8221;. The idea here is to simply apply changes to an already existing, high-quality TTS voice. The following is a demonstration of how the latter can be done by changing purely acoustic properties (timbre, pitch, rate) of a voice signal:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/-pA7cW0UV88" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/-pA7cW0UV88"></embed></object></p>
<p>Voice morphing changes one voice to another. A Cambridge University <a href="http://mi.eng.cam.ac.uk/~hy216/VoiceMorphingPrj" target="_blank">research project</a> demonstrated how recordings of one speaker could be made to sound like those of another using relatively little training data. The following are some examples:</p>
<p>Original Speaker 1:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" autoplay="false"></embed></object></p>
<p>Target Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" autoplay="false"></embed></object></p>
<p>Converted Speaker 1 to Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" autoplay="false"></embed></object></p>
<p>Similar technology was also <a href="http://www.interspeech2009.org/conference/programme/session.php?id=2710" target="_blank">showcased extensively</a> during the 2009 Interspeech Conference. Perhaps this will one day enable those who have lost their voice, and who do not have hours (or days) of recordings of it at their disposal, to have their own custom voices with which to talk to their loved ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" length="159788" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" length="200748" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" length="159788" type="audio/x-wav" />
		</item>
		<item>
		<title>Quick Voice Prompts with Google Translate TTS Service</title>
		<link>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/</link>
		<comments>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 08:53:09 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=149</guid>
		<description><![CDATA[Google last month released several new features to their translation service, among them a text-to-speech rendition of the English translation.  As reported elsewhere, it turns out you can directly access this service using a simple URL in your browser.  Following this link will return an MP3 of the text sent along with it: http://translate.google.com/translate_tts?q=Hello+reader Just [...]]]></description>
			<content:encoded><![CDATA[<p>Google last month released several <a href="http://googleblog.blogspot.com/2009/11/new-look-for-google-translate.html">new features</a> to their <a href="http://translate.google.com">translation service</a>, among them a text-to-speech rendition of the English translation.  As <a href="http://www.techcrunch.com/2009/12/14/the-unofficial-google-text-to-speech-api/" target="_blank">reported</a> <a href="http://lifehacker.com/5426797/google-translate-url-generates-instant-text+to+speech-mp3-files" target="_blank">elsewhere</a>, it turns out you can directly access this service using a simple URL in your browser.  Following this link will return an MP3 of the text sent along with it:</p>
<p><a href="http://translate.google.com/translate_tts?q=Hello+reader" target="_blank">http://translate.google.com/translate_tts?q=Hello+reader</a></p>
<p>In your address bar, just replace &#8220;Hello+reader&#8221; with any text that you want spoken.  Remember to replace spaces with pluses (+).</p>
<p>Some browsers, however, seem to have problems with the returned audio.  Chrome worked for me, and Internet Explorer reportedly works as well.</p>
<p>As this is not an official RESTful Google API, don&#8217;t be surprised if it stops working. Beware that commercial reuse of the output audio is likely also governed by license restrictions.</p>
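<p>The URL construction described above can be sketched in Python. This is only an illustration of the encoding step: the helper name <code>build_tts_url</code> is hypothetical, the endpoint is the unofficial one quoted in this post and may no longer respond, and no request is actually sent.</p>

```python
from urllib.parse import urlencode

# Unofficial endpoint quoted in the post; it may stop working at any time.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def build_tts_url(text):
    """Return the TTS URL for `text` (hypothetical helper, not a Google API)."""
    # urlencode() applies quote_plus(), so spaces become "+" and other
    # reserved characters are percent-encoded.
    return TTS_ENDPOINT + "?" + urlencode({"q": text})


if __name__ == "__main__":
    print(build_tts_url("Hello reader"))
```

<p>Letting the library do the escaping also covers punctuation and non-ASCII text, which a manual space-to-plus replacement would miss.</p>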
<p><strong>Update:</strong><br />
Friend <a href="http://ch.linkedin.com/in/safra" target="_self">Schamai</a> pointed out how this could be employed in a web form.  Here&#8217;s an example:</p>
<form action="http://translate.google.com/translate_tts">
<input name="q" size="55" value="just saying" />
<button>Speak as MP3</button><br />
</form>
<p>Or the corresponding HTML:<br />
<code><br />
&lt;form action="http://translate.google.com/translate_tts"&gt;<br />
&lt;input name="q" size="55" value="just saying" /&gt;<br />
&lt;button&gt;Speak as MP3&lt;/button&gt;<br />
&lt;/form&gt;<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
<enclosure url="http://translate.google.com/translate_tts?q=Hello+reader" length="5472" type="audio/mpeg" />
		</item>
		<item>
		<title>Speaking Piano</title>
		<link>http://www.okkoblog.com/2009/12/31/speaking-piano/</link>
		<comments>http://www.okkoblog.com/2009/12/31/speaking-piano/#comments</comments>
		<pubDate>Thu, 31 Dec 2009 22:09:19 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=71</guid>
		<description><![CDATA[I greatly enjoyed this video about a piano-cum-speech-synthesis installation. I also think that this would make a great GarageBand plugin.]]></description>
			<content:encoded><![CDATA[<p><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/muCPjK4nGY4&#038;hl=en_US&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/muCPjK4nGY4&#038;hl=en_US&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></p>
<p>I greatly enjoyed this video about a piano-cum-speech-synthesis installation. I also think that this would make a great GarageBand plugin.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/12/31/speaking-piano/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Speech and Dialog Conferences / Speech for iPhone and Android</title>
		<link>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/</link>
		<comments>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 08:24:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=54</guid>
		<description><![CDATA[Conference time: I will be spending a couple of days in London and Brighton from September 5th attending Interspeech, SIGDIAL as well as a researcher round-table. Anyone interested in meeting up, feel free to get in touch. Also, here is some more or less recent, interesting news for Android (at about 6:20, thanks Schamai) and [...]]]></description>
			<content:encoded><![CDATA[<p>Conference time:  I will be spending a couple of days in London and Brighton from September 5th attending <a href="http://www.interspeech2009.org/">Interspeech</a>, <a href="http://www.sigdial.org/workshops/workshop10/index.html">SIGDIAL</a> as well as a researcher <a href="http://www.yrrsds.org/">round-table</a>.  Anyone interested in meeting up, feel free to <a href="http://www.voxarca.de/app/main/contact">get in touch</a>.</p>
<p>Also, here is some more or less recent, interesting news for <a href="http://www.youtube.com/watch?v=uX9nt8Cpdqg">Android</a> (at about 6:20, thanks Schamai) and <a href="http://prmac.com/release-id-6453.htm">iPhone</a> speech developers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kindle Speech Synthesis</title>
		<link>http://www.okkoblog.com/2009/02/26/kindle-speech-synthesis/</link>
		<comments>http://www.okkoblog.com/2009/02/26/kindle-speech-synthesis/#comments</comments>
		<pubDate>Thu, 26 Feb 2009 13:35:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[audio books]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=50</guid>
		<description><![CDATA[News about speech and language technology tends to be an in-industry affair, interesting largely to those who need and use it on a daily basis or those who produce (develop or market) it. Every so often, however, mainstream news surfaces that raises issues of broad interest. Google&#8217;s efforts with speech recognition are an example of [...]]]></description>
			<content:encoded><![CDATA[<p>News about speech and language technology tends to be an in-industry affair, interesting largely to those who need and use it on a daily basis or those who produce (develop or market) it.  Every so often, however, mainstream news surfaces that raises issues of broad interest.  Google&#8217;s efforts with speech recognition are an example of this. Last month, Amazon&#8217;s Kindle 2 e-book reader created a <a href="http://www.nytimes.com/2009/02/25/opinion/25blount.html?_r=1&amp;partner=rss&amp;emc=rss&amp;pagewanted=all">buzz</a> <a href="http://news.cnet.com/8301-1023_3-10172412-93.html">with</a> its text-to-speech &#8220;audio book&#8221; functionality.</p>
<p>The underlying issue is that Amazon is selling e-books, which can be listened to using speech synthesis, without owning the rights to produce audio book versions.  The Authors Guild argues that this undermines the lucrative audio book market.  While it is debatable whether a synthesized voice is comparable to the experience of listening to a well-produced audio book, Amazon decided <a href="http://www.crunchgear.com/2009/02/28/authors-guild-successfully-kills-kindle-2-text-to-speech-feature-its-now-optional-for-publishers/">not to fight this one out</a>.</p>
<p>What do you think?  Can synthesized audio books provide an experience comparable to real voice productions?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/02/26/kindle-speech-synthesis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More speech on the iPhone</title>
		<link>http://www.okkoblog.com/2009/02/08/more-speech-on-the-iphone/</link>
		<comments>http://www.okkoblog.com/2009/02/08/more-speech-on-the-iphone/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 09:34:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[vlingo]]></category>
		<category><![CDATA[Vocalia]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=48</guid>
		<description><![CDATA[The iPhone has proved a game-changer in many regards and speech is no exception. Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone. Today I came across another sighting of iPhone speech recognition, Vocalia by Creaceed, employing open-source ASR engine Julius for back-end technology. There is no &#8220;push to talk&#8221; button [...]]]></description>
			<content:encoded><![CDATA[<p>The iPhone has proved a game-changer in many regards and speech is no exception.  Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone.<br />Today I came across another sighting of iPhone speech recognition, <a href="http://www.creaceed.com/vocalia/">Vocalia</a> by Creaceed, employing open-source ASR engine <a href="http://julius.sourceforge.jp/en_index.php">Julius</a> for back-end technology.  There is no &#8220;push to talk&#8221; button but a &#8220;shake to retry&#8221;, which may prove useful when recognition goes awry.  The app supports French, English and German for now and costs €2.99.  Dictation is not available at this point, though Julius is certainly capable of it from an architecture point of view.</p>
<p>Other speech- and language-related iPhone apps:</p>
<ul>
<li><a href="http://googlemobile.blogspot.com/2008/11/google-mobile-app-for-iphone-now-with.html">Google Mobile</a> &#8211; voice search app</li>
<li><a href="http://vlingo.com/">Vlingo</a> &#8211; speech-enables your phone</li>
<li><a href="http://www.innovativelanguage.com/products/pocket">Pocket</a> &#8211; language learning app</li>
<li><a href="http://www.makayama.com/iphonevoicedial.html">Voice Dial</a> &#8211; speech-enabled dialer</li>
<li><a href="http://www.voicethis.com/">VoiceThis</a> &#8211; speech-enabled dialer</li>
<li><a href="http://www.future-apps.net/iSpeak/iSpeak.html">iSpeak</a> &#8211; multi-language translator with synthesized output</li>
<li>A <a href="http://www.crunchgear.com/2009/02/03/iphone-app-helps-reduce-stuttering/">stuttering aid</a> (not yet available)</li>
</ul>
<p>Has anyone used these extensively?  What is your experience with speech on the iPhone?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/02/08/more-speech-on-the-iphone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SVOX purchases Siemens AG speech-related IP</title>
		<link>http://www.okkoblog.com/2009/01/26/svox-purchases-siemens-ag-speech-related-ip/</link>
		<comments>http://www.okkoblog.com/2009/01/26/svox-purchases-siemens-ag-speech-related-ip/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 18:59:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[Siemens]]></category>
		<category><![CDATA[SVOX]]></category>
		<category><![CDATA[TellMe]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=46</guid>
		<description><![CDATA[Following Nuance&#8217;s acquisition of IBM speech technology intellectual property two weeks ago, Zurich-based SVOX today announced the purchase of the Siemens AG speech recognition technology group. The deal aims at creating &#8220;obvious synergies of developing TTS, ASR and speech dialog solutions&#8221; and enhances SVOX&#8217;s portfolio of technologies, which to date included only highly specialized speech [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">Following <a href="http://okkobuss.blogspot.com/2009/01/nuance-acquires-ibm-speech-patents.html">Nuance&#8217;s acquisition of IBM speech technology</a> intellectual property two weeks ago,  Zurich-based <a href="http://www.svox.com/News-Items-SVOX-acquires-Speech-Processing-unit-of-Siemens-AG.aspx">SVOX today announced</a> the purchase of the Siemens AG  speech recognition technology group.  The deal aims at creating &#8220;<span id="lblBodyText">obvious synergies of developing TTS, ASR and speech dialog solutions&#8221; and enhances SVOX&#8217;s portfolio of technologies, which to date included only highly specialized speech synthesis solutions,</span><span id="lblBodyText"> to now include speech recognition.</span><br /><span id="lblBodyText">Like the Nuance-IBM deal (and unlike the <a href="http://www.microsoft.com/presspass/press/2007/mar07/03-14powerofspeechpr.mspx">Microsoft acquisition of TellMe</a>), this merger breaks with the obvious big-fish small-fish paradigm.  Here, </span><span id="lblBodyText">a larger company&#8217;s (IBM, Siemens) R&amp;D</span><span id="lblBodyText"> division was sold to a smaller, more specialized company (Nuance, SVOX).<br />Both transactions come with an intent to pursue development of novel interactive voice applications.  
However, while Nuance announced the potential development of applications across platforms and environments with IBM expertise and IP, SVOX appears to stay on course with its successful line of automotive solutions to build </span>&#8220;a commanding market share in speech solutions for premium cars<span id="lblBodyText">&#8221;.</span><br /><span id="lblBodyText"></span><br />This deal adds SVOX to a list of companies offering network and embedded speech recognition technologies, also including <a href="http://www.nuance.com/">Nuance</a>, <a href="http://www.telisma.com/">Telisma</a>, <a href="http://www.loquendo.com/">Loquendo</a> and <a href="http://www.microsoft.com/">Microsoft</a>.  Financial terms of the deal were not announced.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/01/26/svox-purchases-siemens-ag-speech-related-ip/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IBM Predicts Talking Web</title>
		<link>http://www.okkoblog.com/2008/11/28/ibm-predicts-talking-web/</link>
		<comments>http://www.okkoblog.com/2008/11/28/ibm-predicts-talking-web/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 08:19:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=44</guid>
		<description><![CDATA[IBM&#8217;s annual crystal ball list of Innovations That Will Change Our Lives in the Next Five Years includes a forecast of a voice-enabled talking web. &#8220;You will be able to sort through the Web verbally to find what you are looking for and have the information [...]]]></description>
			<content:encoded><![CDATA[<p>IBM&#8217;s annual crystal ball list of <a href="http://www-03.ibm.com/press/us/en/pressrelease/26170.wss">Innovations That Will Change Our Lives in the Next Five Years</a> includes a forecast of a voice-enabled talking web.  &#8220;<span>You will be able to sort through the Web verbally to find what you are looking for and have the information read back to you,&#8221; the article predicts.<br />IBM itself has launched several voice-enabled products and initiatives over the years, most notably the <a href="http://www-01.ibm.com/software/voice/">WebSphere Voice</a> family of web servers, which adds various voice functionality to its flagship WebSphere platform, leveraging it in areas such as unified messaging and call-center automation.<br />Some problems exist with a vision such as the one advocated by the article.  Speech recognition accuracy and noise filtering have obviously come a long way and may only pose a minor impediment.</span> The user&#8217;s willingness to speak rather than type or click is another problem. Issuing voice commands in the presence of others may not always be desirable and can be disruptive, for instance at work or on public transport.  Lastly, there are usability concerns, beyond the quality of speech technology, when converting a visual 2- or even 3-dimensional representation of information into a 1-dimensional audio stream.  The cognitive load increases significantly with tasks more complex than, for instance, obtaining time-table information or finding the nearest Italian restaurant.<br />The effort behind the vision, to put voice technology to uses beyond call-center automation, is laudable.  Mobile internet access and computing on the road may indeed do their part to make this vision come true.  
And clearly, there are use cases, such as improved accessibility for users with impairments, that on their own merit making the web voice-accessible.  Widespread usage of a voice-enabled web, however, may be more than five years off.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/11/28/ibm-predicts-talking-web/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Internationalization and Speech Technologies</title>
		<link>http://www.okkoblog.com/2008/05/05/internationalization-and-speech-technologies/</link>
		<comments>http://www.okkoblog.com/2008/05/05/internationalization-and-speech-technologies/#comments</comments>
		<pubDate>Mon, 05 May 2008 05:50:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[How To]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[Philips]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[Voiceforge]]></category>
		<category><![CDATA[Voxforge]]></category>
		<category><![CDATA[XING]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=36</guid>
		<description><![CDATA[The not-so-subtle truth is, of course, that we all speak English. Yet localization and internationalization are at once a prerequisite and a stumbling block for many web-based endeavors. In my own backyard, two examples illustrate the effect of and the need for internationalization, respectively. German professional social network XING has internationally outperformed competitors like LinkedIn through early and [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">
<p>The not-so-subtle truth is, of course, that we all speak English. Yet localization and internationalization are at once a prerequisite and a stumbling block for many web-based endeavors.</p>
<p>In my own backyard, two examples illustrate the effect of and the need for internationalization, respectively.  German professional social network <a href="http://www.xing.com/">XING</a> has internationally outperformed competitors like <a href="http://www.linkedin.com/">LinkedIn</a> through early and aggressive internationalization.  <a href="http://www.studivz.de/">StudiVZ</a> &#8211; the &#8220;German Facebook&#8221; &#8211; gained much of the student social network market <a href="http://uk.techcrunch.com/2008/03/03/facebooks-german-version-may-not-impress-the-locals-after-all/">before</a> <a href="http://www.facebook.com/">Facebook</a> decided to release a German version of its web app, making this a tough-to-crack market.</p>
<p>Ironically, as these two examples underline, the need for localization remains in cases where the demands on usability are low (join group/contact person/send message) and the target audience can largely be expected to speak sufficient English (read <a href="http://lostgarden.com/2008/03/translation-game.html">this</a> for an interesting take on the same issues and solutions in online gaming.)  Moreover, localization is an effort far greater than providing an interface in the local language.</p>
<p>As one expects, localization, internationalization and speech technology are inextricably linked &#8211; in a sense developing speech technologies <span style="font-style: italic;">is</span> internationalization.  And using such technology in professional service projects is akin to building an internationalized web application.  Here are some of the oddities I&#8217;ve observed while working with speech technologies in an international environment:</p>
<p><span style="font-style: italic;">Translation is not enough.</span> When you write software that speaks or wants to be spoken to, there is more at stake than providing interface text.  Can you expect all your users to spell input when your system doesn&#8217;t understand the raw speech input?  Can you be sure that all your translated content will generate well-formed speech-synthesis output?  Language and culture are sensitive issues, so a well-localized speech application must do more than provide a translated user interface.  Employing local staff is usually the minimum for building a speech application for a new market.</p>
<p><span style="font-style: italic;">The cost shifts.</span> Re-usability of resources from previous speech projects is usually low.  So unlike localizing a web application, porting a speech application requires grunt work that you thought you had done the first time around.  Moreover, speech applications in new languages almost always come with additional licensing burdens and questions about the appropriate technology partner.  Expect to pay for things you didn&#8217;t expect.</p>
<p><span style="font-style: italic;">There is no long tail.</span> The buy-in costs for developing a new language in almost any speech or language technology (recognition, synthesis, translation) remain constant.  This makes every newly developed language a strategic decision and translates into a two-tier localization effort: one developing basic technologies, one employing such technology in professional service projects.<br />
As an example, take the world&#8217;s most successful dictation software packages: <a href="http://www.nuance.com/naturallyspeaking/international/">Dragon NaturallySpeaking</a> ships in five flavors of English and six European languages.  <a href="http://www.speechrecognition.philips.com/index.asp?id=532">Philips SpeechMagic</a> ships in 23 dialects of 11 languages.  Both are a far cry from worldwide coverage.<br />
The enormous cost of development weighs heavily on speech technology for lesser-spoken languages.  It has also posed a significant hurdle for <a href="http://www.voiceforge.com/">open-source</a> <a href="http://www.voxforge.org/">initiatives</a> that aim to provide such resources for free.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/05/05/internationalization-and-speech-technologies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Times Reports &amp; Is SciFi Really Wrong?</title>
		<link>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/</link>
		<comments>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/#comments</comments>
		<pubDate>Sun, 27 Jan 2008 08:56:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[science fiction]]></category>
		<category><![CDATA[SimulScribe]]></category>
		<category><![CDATA[SpinVox]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[vlingo]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=35</guid>
		<description><![CDATA[The New York Times today published an interesting, if brief, article about speech recognition in the mobile/telco space &#8211; cited as a &#8220;$1.6 billion market in 2007&#8221;. The article provides a brief overview of a range of applications and mashups, such as vlingo.com and SimulScribe as well as some directory assistance services (but omitting some [...]]]></description>
			<content:encoded><![CDATA[<p>The New York Times today published an interesting, if brief, <a href="http://www.nytimes.com/2008/01/27/business/27proto.html?_r=1&amp;oref=slogin">article</a> about speech recognition in the mobile/telco space &#8211; cited as a &#8220;$1.6 billion market in 2007&#8221;.  The article gives a brief overview of a range of voice-driven applications and mashups, such as <a href="http://www.vlingo.com/">vlingo.com</a> and <a href="http://www.simulscribe.com/">SimulScribe</a>, as well as some directory assistance services (while omitting others, such as <a href="http://www.spinvox.com/">SpinVox</a> and <a href="http://www.google.com/goog411/">GOOG411</a>).<br />The article opens:<br />
<blockquote>&#8220;Innovation usually needs time to steep. Time to turn the idea into something tangible, time to get it to market, time for people to decide they accept it. Speech recognition technology has steeped for a long time&#8221;</p></blockquote>
<p>And concludes:<br />
<blockquote>&#8220;Even a digital expert [...] cautions that some people may never be satisfied with the quality of speech recognition technology — thanks to a steady diet of fictional books, movies and television shows featuring machines that understand everything a person says, no matter how sharp the diction or how loud the ambient noise.&#8221;</p></blockquote>
<p>But isn&#8217;t this a bit hackneyed?  Perhaps by today&#8217;s standards a twenty-year steeping period seems long, but it would hardly have seemed so at any other time in history.  And after re-watching 1982&#8217;s Blade Runner recently, I actually felt rather optimistic that we are today close to what the movie expected of speech recognition and speaker verification in 2019.  Elsewhere, a similar picture emerges.<br />The Star Trek ship computer&#8217;s speech recognition engine (the year is 2151), while accurate, still requires the push of a button to kick in, rather than listening for the hot word &#8220;computer&#8221;, a capability that is available today, if not quite ripe for deployment.<br />Of course, there are the HALs (2001), Marvins (no date), and C-3POs (a long, long time ago&#8230;), whose capacities far exceed what we dare dream our mobile phones will one day understand.  But here the problem seems to be less about the quality of speech technology &#8211; the quality of HAL&#8217;s speech synthesis is available today, and Marvin&#8217;s characteristic monotone baritone should be easy to do &#8211; than about the old hard-soft divide in Artificial Intelligence.  As long as we use a hard-AI problem, which speech arguably is, to solve soft-AI problems (&#8220;find closest pizza service&#8221;), we cannot fail to be disappointed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

