<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech &#187; Google</title>
	<atom:link href="http://www.okkoblog.com/tag/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Tue, 20 Jul 2010 08:09:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Quick Voice Prompts with Google Translate TTS Service</title>
		<link>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/</link>
		<comments>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 08:53:09 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=149</guid>
		<description><![CDATA[Last month Google released several new features for its translation service, among them a text-to-speech rendition of the English translation.  As reported elsewhere, it turns out you can access this service directly with a simple URL in your browser.  Following this link returns an MP3 of the text passed along with it: http://translate.google.com/translate_tts?q=Hello+reader Just [...]]]></description>
			<content:encoded><![CDATA[<p>Last month Google released several <a href="http://googleblog.blogspot.com/2009/11/new-look-for-google-translate.html">new features</a> for its <a href="http://translate.google.com">translation service</a>, among them a text-to-speech rendition of the English translation.  As <a href="http://www.techcrunch.com/2009/12/14/the-unofficial-google-text-to-speech-api/" target="_blank">reported</a> <a href="http://lifehacker.com/5426797/google-translate-url-generates-instant-text+to+speech-mp3-files" target="_blank">elsewhere</a>, it turns out you can access this service directly with a simple URL in your browser.  Following this link returns an MP3 of the text passed along with it:</p>
<p><a href="http://translate.google.com/translate_tts?q=Hello+reader" target="_blank">http://translate.google.com/translate_tts?q=Hello+reader</a></p>
<p>Just replace &#8220;Hello+reader&#8221; in your address bar with any text you want spoken.  Remember to replace spaces with plus signs (+).</p>
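<p>Replacing spaces with pluses is one rule of URL query-string encoding; characters such as &amp;, # or a literal + must also be percent-encoded, otherwise part of the text is lost or altered. Here is a minimal Python sketch that builds such a link with the standard library. It assumes the unofficial <code>translate_tts</code> endpoint and its <code>q</code> parameter work as described above; <code>build_tts_url</code> is a hypothetical helper, and the code only constructs the URL without fetching any audio.</p>

```python
from urllib.parse import urlencode

# Unofficial endpoint as described in the post; it may stop working at any time.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def build_tts_url(text: str) -> str:
    """Return a translate_tts URL for `text` (hypothetical helper).

    urlencode() applies application/x-www-form-urlencoded rules:
    spaces become '+', and reserved characters such as '&', '#' or a
    literal '+' are percent-encoded so they cannot break the query.
    """
    return TTS_ENDPOINT + "?" + urlencode({"q": text})


if __name__ == "__main__":
    print(build_tts_url("Hello reader"))
    # http://translate.google.com/translate_tts?q=Hello+reader
    print(build_tts_url("Fish & chips"))
    # http://translate.google.com/translate_tts?q=Fish+%26+chips
```

<p>A browser form submitted with the default GET method encodes its fields the same way, which is why the web form in the update below produces equivalent URLs.</p>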
<p>Some browsers, however, seem to have problems with the returned audio.  Chrome worked for me, and Internet Explorer reportedly works as well.</p>
<p>As this is not an official RESTful Google API, don&#8217;t be surprised if it stops working. Also beware that commercial reuse of the output audio is likely governed by license restrictions.</p>
<p><strong>Update:</strong><br />
Friend <a href="http://ch.linkedin.com/in/safra" target="_self">Schamai</a> pointed out how this could be employed in a web form.  Here&#8217;s an example:</p>
<form action="http://translate.google.com/translate_tts">
<input name="q" size="55" value="just saying" />
<button>Speak as MP3</button><br />
</form>
<p>Or the corresponding HTML:<br />
<code><br />
&lt;form action="http://translate.google.com/translate_tts"&gt;<br />
&lt;input name="q" size="55" value="just saying" /&gt;<br />
&lt;/form&gt;<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://translate.google.com/translate_tts?q=Hello+reader" length="5472" type="audio/mpeg" />
		</item>
		<item>
		<title>Tim O&#8217;Reilly: Google Voice Search Key Technology</title>
		<link>http://www.okkoblog.com/2009/04/02/tim-oreilly-google-voice-search-key-technology/</link>
		<comments>http://www.okkoblog.com/2009/04/02/tim-oreilly-google-voice-search-key-technology/#comments</comments>
		<pubDate>Thu, 02 Apr 2009 09:47:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Services]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Gaudi]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[vlingo]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=52</guid>
		<description><![CDATA[ReadWriteWeb reports that Tim O&#8217;Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies that point beyond Web 2.0. Voice search (Google iPhone App), he claimed, was a tipping point in terms of &#8220;sensor based interfaces&#8221;. While Google is not the only vendor to provide voice search (e.g. Yahoo oneSearch, powered by Vlingo), it [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.readwriteweb.com/archives/five_applications_tim_oreilly_says_point_past_web20.php">ReadWriteWeb reports</a> that Tim O&#8217;Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies that point beyond Web 2.0.  Voice search (<a href="http://googlesystem.blogspot.com/2008/11/google-voice-search-for-iphone.html">Google iPhone App</a>), he claimed, was a <a href="http://radar.oreilly.com/2008/11/voice-in-google-mobile-app-tipping-point.html">tipping point</a> in terms of &#8220;sensor based interfaces&#8221;.</p>
<p>While Google is not the only vendor to provide voice search (e.g. <a href="http://mobile.yahoo.com/onesearch">Yahoo oneSearch</a>, <a href="http://gigaom.com/2008/04/02/vlingo-gets-20m-and-exclusive-yahoo-deal/">powered by Vlingo</a>), it certainly seems ahead of the game in what appears to be the gradual unfolding of a broad voice strategy, including Voice Search and the recent rebranding of a feature-enhanced GrandCentral as <a href="http://www.google.com/voice/about">Google Voice</a>.  Future developments we can expect on the voice front include promotion of its own speech recognition capabilities through <a href="http://code.google.com/android/">Android</a>, <a href="http://gears.google.com/">Google Gears</a> <a href="http://www.chromeexperiments.com/detail/browsertalk/">bringing speech capabilities to all browsers</a>, tighter integration of <a href="http://labs.google.com/gaudi">Gaudi</a> (audio indexing) with other services and perhaps one day the opening up of voice services through APIs.</p>
<p>As I&#8217;ve <a href="http://okkobuss.blogspot.com/2008/01/goog-we-need-more-data.html">previously pointed out</a>, to Google voice is just another form of data, but what is slowly beginning to emerge is a central role for speech and voice technologies in coming developments of the web and in how we search and interface with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/04/02/tim-oreilly-google-voice-search-key-technology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More speech on the iPhone</title>
		<link>http://www.okkoblog.com/2009/02/08/more-speech-on-the-iphone/</link>
		<comments>http://www.okkoblog.com/2009/02/08/more-speech-on-the-iphone/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 09:34:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[vlingo]]></category>
		<category><![CDATA[Vocalia]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=48</guid>
		<description><![CDATA[The iPhone has proved a game-changer in many regards, and speech is no exception. Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone. Today I came across another sighting of iPhone speech recognition, Vocalia by Creaceed, which uses the open-source ASR engine Julius as its back end. There is no &#8220;push to talk&#8221; button [...]]]></description>
			<content:encoded><![CDATA[<p>The iPhone has proved a game-changer in many regards, and speech is no exception.  Both Google and Yahoo (with vlingo) have deployed mobile speech applications for the iPhone.<br />Today I came across another sighting of iPhone speech recognition, <a href="http://www.creaceed.com/vocalia/">Vocalia</a> by Creaceed, which uses the open-source ASR engine <a href="http://julius.sourceforge.jp/en_index.php">Julius</a> as its back end.  There is no &#8220;push to talk&#8221; button, but there is &#8220;shake to retry&#8221;, which may prove useful when recognition goes awry.  The app supports French, English and German for now and costs €2.99.  Dictation is not available at this point, though Julius&#8217;s architecture is certainly capable of it.</p>
<p>Other speech- and language-related iPhone apps:</p>
<ul>
<li><a href="http://googlemobile.blogspot.com/2008/11/google-mobile-app-for-iphone-now-with.html">Google Mobile</a> &#8211; voice search app</li>
<li><a href="http://vlingo.com/">Vlingo</a> &#8211; speech-enables your phone</li>
<li><a href="http://www.innovativelanguage.com/products/pocket">Pocket</a> &#8211; language learning app</li>
<li><a href="http://www.makayama.com/iphonevoicedial.html">Voice Dial</a> &#8211; speech-enabled dialer</li>
<li><a href="http://www.voicethis.com/">VoiceThis</a> &#8211; speech-enabled dialer</li>
<li><a href="http://www.future-apps.net/iSpeak/iSpeak.html">iSpeak</a> &#8211; multi-language translator with synthesized output</li>
<li>A <a href="http://www.crunchgear.com/2009/02/03/iphone-app-helps-reduce-stuttering/">stuttering aid</a> (not yet available)</li>
</ul>
<p>Has anyone used these extensively?  What is your experience with speech on the iPhone?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/02/08/more-speech-on-the-iphone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Mobile iPhone App with Speech Recognition</title>
		<link>http://www.okkoblog.com/2008/11/18/google-mobile-iphone-app-with-speech-recognition/</link>
		<comments>http://www.okkoblog.com/2008/11/18/google-mobile-iphone-app-with-speech-recognition/#comments</comments>
		<pubDate>Tue, 18 Nov 2008 06:51:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[machine translation]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=43</guid>
		<description><![CDATA[Google released a new feature for its Google Mobile iPhone Application yesterday: voice search. Users speak a query and the application returns search results formatted for the iPhone. This is similar to the GOOG411 directory assistance application, which allows users to call a phone number, speak [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.google.com/">Google</a> released a new feature for its <a href="http://googlemobile.blogspot.com/2008/11/google-mobile-app-for-iphone-now-with.html">Google Mobile iPhone Application</a> yesterday: voice search.  Users speak a query and the application returns search results formatted for the iPhone.  This is similar to the <a href="http://www.google.com/goog411/">GOOG411</a> directory assistance application, which allows users to call a phone number, speak a query and receive information about local listings by voice or SMS. However, the new application apparently performs recognition locally on the iPhone, meaning it comes bundled with an embedded speech recognition engine.</p>
<p>Aside from GOOG411, Google released <a href="http://labs.google.com/gaudi">Gaudi</a>, a voice indexing technology for video, during the US presidential election campaign.  That makes the iPhone app the company&#8217;s third official service to make use of speech recognition, which leaves one wondering when Google&#8217;s speech technology will become available as an API, like the <a href="http://code.google.com/apis/ajaxlanguage/">Google AJAX Language API</a> for translation and transliteration, rather than bundled into software services.  An Android version is presumably in the works, too.</p>
<p>All applications are available in US English for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/11/18/google-mobile-iphone-app-with-speech-recognition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Showcases Audio Indexing with Gaudi</title>
		<link>http://www.okkoblog.com/2008/09/19/google-showcases-audio-indexing-with-gaudi/</link>
		<comments>http://www.okkoblog.com/2008/09/19/google-showcases-audio-indexing-with-gaudi/#comments</comments>
		<pubDate>Fri, 19 Sep 2008 06:37:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[advertising]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[audio indexing]]></category>
		<category><![CDATA[audio search]]></category>
		<category><![CDATA[Gaudi]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=41</guid>
		<description><![CDATA[Google Labs opened GAudi this week to showcase its new audio indexing technology. Google GAudi lets users search for keywords and phrases in the audio stream of selected YouTube videos. Matches are shown as yellow markers on the playback slider. Top results appear as snippets of text from the audio surrounding the search term, along with information on how [...]]]></description>
			<content:encoded><![CDATA[<p>Google Labs opened <a href="http://labs.google.com/gaudi">GAudi</a> this week to showcase its new audio indexing technology.</p>
<p>Google GAudi lets users search for keywords and phrases in the audio stream of selected YouTube videos. Matches are shown as yellow markers on the playback slider. Top results appear as snippets of text from the audio surrounding the search term, along with information on how many minutes into the video the term occurs.</p>
<p>The <a href="http://labs.google.com/gaudi/static/faq.html#why-elections">video material chosen</a> to showcase GAudi covers this year&#8217;s US presidential elections. It was picked as &#8220;part of a broader effort around politics&#8221;, but also because recognition performs well on such material and because it is relevant to testers and users.</p>
<p>Indexing does not appear to be complete: searching for randomly chosen text fragments from the showcased videos did not always return a match.  Google does say GAudi uses its own speech recognition engine, perhaps the same one employed by <a href="http://www.google.com/goog411/">GOOG411</a>, though the FAQ directs most questions about technical details, and about how one could use GAudi for video, to email inquiries.</p>
<p>While GAudi is showcasing campaign material, it seems only a matter of time before audio indexing is used to serve ad content on video.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/09/19/google-showcases-audio-indexing-with-gaudi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft Windows Live Messenger Translation Bot</title>
		<link>http://www.okkoblog.com/2008/09/08/microsoft-windows-live-messenger-translation-bot/</link>
		<comments>http://www.okkoblog.com/2008/09/08/microsoft-windows-live-messenger-translation-bot/#comments</comments>
		<pubDate>Mon, 08 Sep 2008 06:50:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[Microsoft]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=40</guid>
		<description><![CDATA[In the wake of Google&#8217;s release of its Chrome web browser, speculation about plans for Chrome on other platforms, including Android, has drifted ashore. Naturally this has washed aside much recent news about IE8, which, though not a game-changer, is said to introduce many of the much-needed improvements everyone has been looking for from Microsoft. In light [...]]]></description>
			<content:encoded><![CDATA[<p>In the wake of Google&#8217;s release of its <a href="http://www.google.com/chrome">Chrome</a> web browser, speculation about plans for Chrome on other platforms, including <a href="http://code.google.com/android/">Android</a>, has drifted ashore.  Naturally this has washed aside much recent news about <a href="http://www.microsoft.com/windows/internet-explorer/beta/default.aspx">IE8</a>, which, though not a game-changer, is said to introduce many of the much-needed improvements everyone has been looking for from Microsoft.</p>
<p>In light of the raging browser war, <a href="http://arstechnica.com/journals/microsoft.ars/2008/09/03/new-translation-bot-released-for-windows-live-messenger">a little add-on</a> for Microsoft&#8217;s Live Messenger may not stir the waters much, even if it promises real-time chat translation between English and 14 other languages.  Still, it is refreshing to read about technology that is geared toward opening channels of communication rather than capturing market share.</p>
<p>What are Google&#8217;s plans for Chrome and Android vis-à-vis Microsoft IE on Windows Mobile?  Will Microsoft leverage its non-browser language services, such as translation and speech recognition, the way Google has?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/09/08/microsoft-windows-live-messenger-translation-bot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Times Reports &amp; Is SciFi Really Wrong?</title>
		<link>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/</link>
		<comments>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/#comments</comments>
		<pubDate>Sun, 27 Jan 2008 08:56:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[science fiction]]></category>
		<category><![CDATA[SimulScribe]]></category>
		<category><![CDATA[SpinVox]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[vlingo]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=35</guid>
		<description><![CDATA[The New York Times today published an interesting, if brief, article about speech recognition in the mobile/telco space &#8211; cited as a &#8220;$1.6 billion market in 2007&#8221;. The article provides a brief overview of a range of applications and mashups, such as vlingo.com and SimulScribe as well as some directory assistance services (but omitting some [...]]]></description>
			<content:encoded><![CDATA[<p>The New York Times today published an interesting, if brief, <a href="http://www.nytimes.com/2008/01/27/business/27proto.html?_r=1&amp;oref=slogin">article</a> about speech recognition in the mobile/telco space, which it cites as a &#8220;$1.6 billion market in 2007&#8221;.  The article gives a brief overview of a range of applications and mashups that use voice, such as <a href="http://www.vlingo.com/">vlingo.com</a> and <a href="http://www.simulscribe.com/">SimulScribe</a>, as well as some directory assistance services, though it omits others, such as <a href="http://www.spinvox.com/">SpinVox</a> and <a href="http://www.google.com/goog411/">GOOG411</a>.<br />The article opens:<br />
<blockquote>&#8220;Innovation usually needs time to steep. Time to turn the idea into something tangible, time to get it to market, time for people to decide they accept it. Speech recognition technology has steeped for a long time&#8221;</p></blockquote>
<p>And it concludes:<br />
<blockquote>&#8220;Even a digital expert [...] cautions that some people may never be satisfied with the quality of speech recognition technology — thanks to a steady diet of fictional books, movies and television shows featuring machines that understand everything a person says, no matter how sharp the diction or how loud the ambient noise.&#8221;</p></blockquote>
<p>But isn&#8217;t this a bit hackneyed?  Perhaps by today&#8217;s standards a twenty-year steeping period seems long, but it would hardly count as long at any other point in history.  And after re-watching 1982&#8217;s Blade Runner recently, I actually felt rather optimistic that today we are close to the movie&#8217;s expectations for speech recognition and speaker verification in 2019.  Elsewhere, a similar picture emerges.<br />The Star Trek ship computer&#8217;s speech recognition engine (the year is 2151), while accurate, still requires the push of a button to kick in, rather than listening for the hot word &#8220;computer&#8221;, a capability that is available today, if not quite ripe for deployment.<br />Of course, there are the HALs (2001), Marvins (no date) and C-3POs (a long time ago&#8230;), whose abilities far exceed anything we dare dream our mobile phones will one day understand.  But here the problem seems to be less about the quality of speech technology (speech synthesis of HAL&#8217;s quality is available today, and Marvin&#8217;s characteristic monotone baritone should be easy to reproduce) than about the old hard-soft divide in Artificial Intelligence.  As long as we use a hard-AI problem, which speech arguably is, to solve soft-AI problems (&#8220;find closest pizza service&#8221;), we cannot help but be disappointed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>GOOG: We need more data</title>
		<link>http://www.okkoblog.com/2008/01/03/goog-we-need-more-data/</link>
		<comments>http://www.okkoblog.com/2008/01/03/goog-we-need-more-data/#comments</comments>
		<pubDate>Thu, 03 Jan 2008 08:42:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Loquendo]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[Telisma]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=34</guid>
		<description><![CDATA[The old maxim &#8220;I need more data&#8221; should be familiar to anyone who has ever tried to wrestle with language technology issues, attempted speech application tuning or delved into any statistical approach to an AI-related problem. Google moved into the speech world last year with GOOG-411, [...]]]></description>
			<content:encoded><![CDATA[<p>The old maxim &#8220;I need more data&#8221; should be familiar to anyone who has ever tried to wrestle with language technology issues, attempted speech application tuning or delved into any statistical approach to an AI-related problem.  Google <a href="http://www.google.com/goog411/">moved into the speech world</a> last year with GOOG-411, a speech-recognition-driven directory assistance application (you say what you are looking for and where; it returns suitable businesses and either connects you to the one you want or sends you the details by SMS).<br />Like all (well, most) other Google services, GOOG-411 is free for the end user.  As such, the basic business model (collect data, turn data into cash) applies.  This was <a href="http://www.infoworld.com/article/07/10/23/Google-wants-your-phonemes_3.html">recently confirmed</a> in an interview by Marissa Mayer, Google&#8217;s VP <span class="mdTitleGen">of Search Products and User Experience:</span><br />
<p><span class="artText"><br />
<blockquote><span style="font-size:85%;">Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model &#8230; that we can use for all kinds of different things, including video search.</span></p></blockquote>
<p>Google thus couples statistical AI with its general data-driven approach to everything in a novel way.  In doing so, Google may find itself in a catch-up race with the likes of <a href="http://www.nuance.com/">Nuance</a>, <a href="http://www.loquendo.com/">Loquendo</a>, <a href="http://www-306.ibm.com/software/pervasive/voice_server/ivrgateway.html">IBM</a> or <a href="http://www.telisma.com/">Telisma</a>, whose stronghold on speech recognition technology comes, in part, from speech and language databases aggregated through data collection during professional services projects.<br /></span><span class="artText">What&#8217;s new in Google&#8217;s approach, however, is the convergence of the dual role that data plays in AI and in the overall service-driven business model.  Google will presumably not be content to bootstrap a pattern-matching engine and sell licenses like the technology companies above.  More interesting to follow will be the range of services Google can spin out of this technology (context-sensitive video advertising, audio indexing, IVR hosting), which better befit its overall company strategy.</span><span class="artText"><br />Unsurprisingly, Mayer goes on to suggest that Google is not looking for a way out of the world of brute-force, data-driven algorithms:<br /></span><br />
<blockquote><span style="font-size:85%;"><span class="artText">People should be able to ask questions, and we should understand their meaning, or they should be able to talk about things at a conceptual level. &#8230; </span><span class="artText">A lot of people will turn to things like the semantic Web as a possible answer to that. But what we&#8217;re seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they&#8217;re done through brute force.</span></span></p></blockquote>
<p><span class="artText">User privacy advocates may also have a thought or two about this new dimension of data collection, as Google is beginning to lose the &#8220;conventionally trustworthy&#8221; image it held among many over the past years.  Fortunately, the ways in which speech data is commonly used to train pattern-matching models involve very little in the way of privacy infringement.</span><span class="artText"><br />Happy data collecting!</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/01/03/goog-we-need-more-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Back in the saddle with MSFT, GOOG and VoiceGlue</title>
		<link>http://www.okkoblog.com/2007/11/13/back-in-the-saddle-with-msft-goog-and-voiceglue/</link>
		<comments>http://www.okkoblog.com/2007/11/13/back-in-the-saddle-with-msft-goog-and-voiceglue/#comments</comments>
		<pubDate>Tue, 13 Nov 2007 11:15:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[VoiceGlue]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=31</guid>
		<description><![CDATA[Back after an extensive break. Been working hard on some of my own multi-modal ideas. Keep your eyes peeled. Looks like it&#8217;s been a quiet fall, speech and language technology-wise. After GOOG-411, Microsoft has also added speech to its search engine endeavors (albeit in a different domain) by speech-enabling Live Search for mobile users. Nuance continues [...]]]></description>
			<content:encoded><![CDATA[<p>Back after an extensive break.  Been working hard on some of my own multi-modal ideas.  Keep your eyes peeled.<br />Looks like it&#8217;s been a quiet fall, speech and language technology-wise.  After GOOG-411, Microsoft has also added speech to its search engine endeavors (albeit in a different domain) by <a href="http://www.gadgetell.com/2007/11/speech-recognition-added-to-microsofts-live-search/">speech-enabling Live Search for mobile users</a>.  Nuance continues to <a href="http://www.thestreet.com/s/nuance-buys-new-york-software-firm/newsanalysis/techsoftware/10382384.html?puc=_googlen?cm_ven=GOOGLEN&amp;cm_cat=FREE&amp;cm_ite=NA">consolidate the speech tech market</a>.<br />Exciting news on the IVR front: finally, a serious attempt to integrate various open-source technologies into free carrier-grade speech/telephony services is under way.  <a href="http://www.voiceglue.org/">VoiceGlue</a> has managed to combine OpenVXI (VXML browser) and Flite (speech synthesis) on Asterisk, and plans to integrate Sphinx2 for speech recognition.  All components would then be available under some form of the GPL.  Could this herald a change in the availability of speech telephony platforms for developers unwilling to shell out horrendous per-port costs?  Something to follow, anyway.<br />Lastly, <a href="http://www.mmh.com/article/CA6500157.html">here</a>&#8217;s an article describing the growing role of speech in warehouse management.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2007/11/13/back-in-the-saddle-with-msft-goog-and-voiceglue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google on the Move, News Redux</title>
		<link>http://www.okkoblog.com/2007/07/25/google-on-the-move-news-redux/</link>
		<comments>http://www.okkoblog.com/2007/07/25/google-on-the-move-news-redux/#comments</comments>
		<pubDate>Wed, 25 Jul 2007 07:06:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Acapela]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=30</guid>
		<description><![CDATA[Very quiet recently. No big acquisitions, no speech-tech revolution. Most interesting: Google announced that Mike Cohen (formerly of Nuance) will appear as keynote speaker at SpeechTek in August to reveal Google&#8217;s speech technology strategy. Google has already moved into the speech application market with GOOG411, an automatic directory assistance application leveraging business search and Google [...]]]></description>
			<content:encoded><![CDATA[<p>Very quiet recently.  No big acquisitions, no speech-tech revolution.</p>
<p>Most interesting:  Google announced that Mike Cohen (formerly of Nuance) will <a href="http://webcast.broadcastnewsroom.com/articles/viewarticle.jsp?id=163358">appear as keynote speaker</a> at SpeechTek in August to reveal Google&#8217;s speech technology strategy.  Google has already moved into the speech application market with GOOG411, an automatic directory assistance application leveraging business search and Google Maps.<br />UBC researchers announce a <a href="http://www.sciam.com/article.cfm?articleID=F4E67E1F-E7F2-99DF-3ADB8D4E45375897&#038;chanID=sa001">speech learning system</a> that doesn&#8217;t use a traditional data-driven model to learn the sounds of a language.  Instead, it is said to learn in a more experience-driven way, much like infants do.  So far, the system has acquired English and Japanese vowels.<br />Some product reviews/announcements:  a quick <a href="http://www.livescience.com/technology/070716_speech_recognition.html">history of desktop dictation</a>, uses of <a href="http://www.emediawire.com/releases/2007/7/emw539305.htm">TextAloud for the iPhone</a>, and Nuance&#8217;s <a href="http://home.businesswire.com/portal/site/google/index.jsp?ndmViewId=news_view&amp;newsId=20070718005532&amp;newsLang=en">new South African voice</a>, &#8220;Tessa&#8221;.<br />Also on the web:  NIST <a href="http://www.sciencedaily.com/releases/2007/07/070723095328.htm">evaluates</a> DARPA automatic translation software in military contexts, and <a href="http://www.webpronews.com/blogtalk/2007/07/18/what-semantic-search-is-not">What Semantic Search is Not</a>.</p>
<p>I may post less frequently in coming weeks.  Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2007/07/25/google-on-the-move-news-redux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
