<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech &#187; Services</title>
	<atom:link href="http://www.okkoblog.com/category/brands/services/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Tue, 20 Jul 2010 08:09:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>SpinVox, Voice-to-Text and Some Terminology</title>
		<link>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/</link>
		<comments>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 11:14:45 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[SpinVox]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=156</guid>
		<description><![CDATA[The recent acquisition of SpinVox by Nuance not only represents another major step towards market consolidation by the latter company, but also prompted me to have a look at the voice-to-text market. Being a &#8220;late adopter power user&#8221; – out of some combination of complacency with existing work flows – and refusing to pay for certain [...]]]></description>
			<content:encoded><![CDATA[<p>The recent <a href="http://www.nuance.com/spinvox/" target="_blank">acquisition</a> of <a href="http://www.spinvox.com" target="_blank">SpinVox</a> by <a href="http://www.nuance.com" target="_blank">Nuance</a> not only represents another major step towards market consolidation by the latter company, but also prompted me to have a look at the voice-to-text market.  Being a &#8220;late adopter power user&#8221;, out of some combination of complacency with existing workflows and a refusal to pay for certain conveniences, I have refrained from using such services until now. Shameful for one whose bread and butter is working with speech technology, I admit.</p>
<p>Luckily I came across some <a href="http://www.readwriteweb.com/archives/voice-to-text-speech-to-text.php" target="_blank">useful</a> <a href="http://baratunde.posterous.com/this-is-a-test-of-the-google-voice-messaging" target="_blank">reviews</a> of the most prominent providers to get me up to snuff. I won&#8217;t go into them, as I&#8217;m sure others have more to say about the actual user experience. However, as &#8220;mobile&#8221; is the way speech and language technology seems to want to go, and as I finally plan to use more personal mobile computing resources (especially various gadgets starting with &#8220;i&#8221;) for speech technology, I may give some of these a whirl in the near future…</p>
<p>SpinVox caused something of a stir when launching their voice-to-text service in 2004 and another when the BBC &#8220;<a href="http://news.bbc.co.uk/2/hi/8163511.stm" target="_blank">uncovered</a>&#8221; that the company used a combination of human and machine intelligence. To anyone working in speech and language technology this would have been obvious from the get-go, as well as to anyone reading the company&#8217;s patent or patent applications, in which the use of human operators is mentioned explicitly. However, regular users would probably have been duped into thinking a machine was doing all the typing.  Failure to understand and communicate this caused a wholly avoidable privacy debacle.</p>
<p>One thing that&#8217;s clear from last year&#8217;s privacy debacle is that there&#8217;s a bit of a mess of terminology when it comes to voice and speech technologies.  So here&#8217;s an attempt at shedding some light on what&#8217;s what:</p>
<p style="padding-left: 30px;"><em>Speech Recognition</em> &#8211; also <em>ASR</em> (automatic speech recognition) for short. This is the general term used to refer to the technology that automatically turns spoken words into machine-readable text. However, the technology can be described along different dimensions, such as the models employed (HMM-based vs connectionist) and who it&#8217;s for (one single speaker or all speakers of a dialect or language).  Also, there is a host of applications that employ it (dictation, IVR/telephone systems, voice-to-text services), each with different requirements. Hence ASR is really an umbrella term.</p>
<p style="padding-left: 30px;"><em>Voice Recognition</em> &#8211; often confused with speech recognition.  Usually voice recognition refers to software that works for only a single speaker.  However, this is anecdotal, and in marketing the two are used synonymously.</p>
<p style="padding-left: 30px;"><em>Voice-to-Text</em> &#8211; a service that converts spoken words into text. Some ASR may be used to help do so, as well as human transcribers. However, the label itself makes no claim as to whether the process is fully automated.</p>
<p style="padding-left: 30px;"><em>Speaker Recognition</em> &#8211; this is a security technology typically used to perform one of two tasks: (1) identifying a speaker from a group of known speakers or (2) determining whether a speaker is really who s/he claims to be. These are very similar tasks that people often confuse.  Think of the first one as picking a person out of a crowd and the second as a kind of &#8220;voice fingerprint matching&#8221;.</p>
<p style="padding-left: 30px;"><em>Text-to-Speech</em> &#8211; or <em>TTS</em> for short, another term for speech synthesis.  This technology is used to turn written text into an audio signal (such as an MP3).  This should be an obvious label, but surprisingly people seem to <a href="http://www.youtube.com/watch?v=N9GyPXJGZsU" target="_blank">confuse</a> it with Voice-to-Text services frequently (purely my own anecdote).</p>
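<p>The two speaker recognition tasks described above are easy to mix up, so here is a minimal Python sketch of the difference. Everything in it is an assumption made for illustration: the enrolled &#8220;voiceprints&#8221; are toy three-dimensional vectors, the names, the cosine similarity measure and the 0.8 threshold are hypothetical, and no real speaker recognition product is claimed to work this way.</p>

```python
import math

# Toy "voiceprints" for known speakers (hypothetical values, illustration only).
ENROLLED = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(sample):
    """Task (1): pick the closest speaker out of the known group."""
    return max(ENROLLED, key=lambda name: cosine(sample, ENROLLED[name]))


def verify(sample, claimed, threshold=0.8):
    """Task (2): accept or reject one claimed identity (threshold is assumed)."""
    return cosine(sample, ENROLLED[claimed]) >= threshold


sample = [0.9, 0.1, 0.0]
print(identify(sample))
print(verify(sample, "alice"), verify(sample, "bob"))
```

<p>Identification always returns someone from the crowd, while verification only answers yes or no for the one identity being claimed.</p>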
<p>I&#8217;m also told SpinVox&#8217;s sales price of $102m is a bit of a disappointment, representing just over 50% of the initial $200m that SpinVox raised in 2003. But that&#8217;s something I&#8217;ll let others address. Let&#8217;s see where Nuance goes with this, in terms of trying to fully automate the whole transcription process…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Quick Voice Prompts with Google Translate TTS Service</title>
		<link>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/</link>
		<comments>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 08:53:09 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Fun]]></category>
		<category><![CDATA[How To]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=149</guid>
		<description><![CDATA[Google last month released several new features to their translation service among them a text-to-speech rendition of the English translation.  As reported elsewhere, it turns out you can directly access this service using a simple URL in your browser.  Following this link will return an MP3 of the text sent along with it: http://translate.google.com/translate_tts?q=Hello+reader Just [...]]]></description>
			<content:encoded><![CDATA[<p>Google last month released several <a href="http://googleblog.blogspot.com/2009/11/new-look-for-google-translate.html">new features</a> to their <a href="http://translate.google.com">translation service</a>, among them a text-to-speech rendition of the English translation.  As <a href="http://www.techcrunch.com/2009/12/14/the-unofficial-google-text-to-speech-api/" target="_blank">reported</a> <a href="http://lifehacker.com/5426797/google-translate-url-generates-instant-text+to+speech-mp3-files" target="_blank">elsewhere</a>, it turns out you can directly access this service using a simple URL in your browser.  Following this link will return an MP3 of the text sent along with it:</p>
<p><a href="http://translate.google.com/translate_tts?q=Hello+reader" target="_blank">http://translate.google.com/translate_tts?q=Hello+reader</a></p>
<p>In your address bar, just replace &#8220;Hello+reader&#8221; with any text that you want spoken.  Remember to replace spaces with pluses (+).</p>
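<p>If you would rather build the link in a script than by hand, the following is a minimal Python sketch. The helper name <code>build_tts_url</code> is made up for this example; it only assembles the URL shown above and does not contact the service, which is unofficial and may not respond.</p>

```python
from urllib.parse import urlencode

# Unofficial endpoint quoted in this post; it may stop responding at any time.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def build_tts_url(text: str) -> str:
    """Return the TTS URL for `text` (hypothetical helper, not a Google API)."""
    # urlencode turns spaces into "+" and percent-encodes reserved
    # characters such as "&" so they cannot break the query string.
    return TTS_ENDPOINT + "?" + urlencode({"q": text})


print(build_tts_url("Hello reader"))
```

<p>Letting the standard library do the encoding also takes care of characters other than spaces, which the replace-spaces-with-pluses rule alone does not cover.</p>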
<p>Some browsers, however, seem to have problems with the returned audio.  Chrome worked for me, and Internet Explorer reportedly works as well.</p>
<p>As this is not an official RESTful Google API, don&#8217;t be surprised if it stops working. Beware that commercial reuse of the output audio is likely also governed by license restrictions.</p>
<p><strong>Update:</strong><br />
Friend <a href="http://ch.linkedin.com/in/safra" target="_self">Schamai</a> pointed out how this could be employed in a web form.  Here&#8217;s an example:</p>
<form action="http://translate.google.com/translate_tts">
<input name="q" size="55" value="just saying" />
<button>Speak as MP3</button><br />
</form>
<p>Or the corresponding HTML:<br />
<code><br />
&lt;form action="http://translate.google.com/translate_tts"&gt;<br />
&lt;input name="q" size="55" value="just saying" /&gt;<br />
&lt;button&gt;Speak as MP3&lt;/button&gt;<br />
&lt;/form&gt;<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/12/quick-voice-prompts-with-google-translate-tts-service/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://translate.google.com/translate_tts?q=Hello+reader" length="5472" type="audio/mpeg" />
		</item>
		<item>
		<title>Speech and Dialog Conferences / Speech for iPhone and Android</title>
		<link>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/</link>
		<comments>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 08:24:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=54</guid>
		<description><![CDATA[Conference time: I will be spending a couple of days in London and Brighton from September 5th attending Interspeech, SIGDIAL as well as a researcher round-table. Anyone interested in meeting up, feel free to get in touch. Also, here are some more or less recent, interesting news for Android (at about 6:20, thanks Schamai) and [...]]]></description>
			<content:encoded><![CDATA[<p>Conference time:  I will be spending a couple of days in London and Brighton from September 5th attending <a href="http://www.interspeech2009.org/">Interspeech</a>, <a href="http://www.sigdial.org/workshops/workshop10/index.html">SIGDIAL</a> as well as a researcher <a href="http://www.yrrsds.org/">round-table</a>.  Anyone interested in meeting up, feel free to <a href="http://www.voxarca.de/app/main/contact">get in touch</a>.</p>
<p>Also, here are some more or less recent and interesting news items for <a href="http://www.youtube.com/watch?v=uX9nt8Cpdqg">Android</a> (at about 6:20, thanks Schamai) and <a href="http://prmac.com/release-id-6453.htm">iPhone</a> speech developers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tim O&#8217;Reilly: Google Voice Search Key Technology</title>
		<link>http://www.okkoblog.com/2009/04/02/tim-oreilly-google-voice-search-key-technology/</link>
		<comments>http://www.okkoblog.com/2009/04/02/tim-oreilly-google-voice-search-key-technology/#comments</comments>
		<pubDate>Thu, 02 Apr 2009 09:47:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Services]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Gaudi]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[vlingo]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=52</guid>
		<description><![CDATA[ReadWriteWeb reports Tim O&#8217;Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies for the Web >2.0. Voice search (Google iPhone App), he claimed, was a tipping point in terms of &#8220;sensor based interfaces&#8221;. While not the only vendor to provide voice search (e.g. Yahoo oneSearch powered by Vlingo) Google [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.readwriteweb.com/archives/five_applications_tim_oreilly_says_point_past_web20.php">ReadWriteWeb reports</a> Tim O&#8217;Reilly addressed attendees at the San Francisco Web 2.0 Expo this week, talking about key technologies for the Web >2.0.  Voice search (<a href="http://googlesystem.blogspot.com/2008/11/google-voice-search-for-iphone.html">Google iPhone App</a>), he claimed, was a <a href="http://radar.oreilly.com/2008/11/voice-in-google-mobile-app-tipping-point.html">tipping point</a> in terms of &#8220;sensor based interfaces&#8221;.</p>
<p>While not the only vendor to provide voice search (e.g. <a href="http://mobile.yahoo.com/onesearch">Yahoo oneSearch</a> <a href="http://gigaom.com/2008/04/02/vlingo-gets-20m-and-exclusive-yahoo-deal/">powered by Vlingo</a>), Google certainly seems ahead in the game in what appears to be a gradual unfolding of a broad voice strategy, such as Voice Search and recently rebranding a feature-enhanced GrandCentral as <a href="http://www.google.com/voice/about">Google Voice</a>.  Future work on the voice front we can expect includes promotion of its own speech recognition capacities through <a href="http://code.google.com/android/">Android</a>, <a href="http://gears.google.com/">Google Gears</a> <a href="http://www.chromeexperiments.com/detail/browsertalk/">bringing speech capacities to all browsers</a>, tighter integration of <a href="http://labs.google.com/gaudi">Gaudi</a> (audio indexing) with other services and perhaps one day opening up voice services over APIs.</p>
<p>As I&#8217;ve <a href="http://okkobuss.blogspot.com/2008/01/goog-we-need-more-data.html">previously pointed out</a>, to Google, voice is just another form of data, but what&#8217;s slowly beginning to emerge is a central role for speech and voice technologies to play in coming developments for the web and how we search and interface with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/04/02/tim-oreilly-google-voice-search-key-technology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft Recite Preview &#8211; Note Dictation and Voice Search</title>
		<link>http://www.okkoblog.com/2009/02/16/microsoft-recite-preview-note-dictation-and-voice-search/</link>
		<comments>http://www.okkoblog.com/2009/02/16/microsoft-recite-preview-note-dictation-and-voice-search/#comments</comments>
		<pubDate>Mon, 16 Feb 2009 07:43:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mobile]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=49</guid>
		<description><![CDATA[Ars Technica reports today on the release of Microsoft Recite &#8220;Technology Preview&#8221; for Windows Mobile. The application lets users record short notes as audio snippets, which can later be searched for content by speaking key words. Apparently it does not entail speech recognition but rather simpler pattern matching, meaning it cannot be searched in text form [...]]]></description>
			<content:encoded><![CDATA[<p>Ars Technica <a href="http://arstechnica.com/microsoft/news/2009/02/microsoft-recite-for-windows-mobile-previewed.ars">reports</a> today on the release of <a href="http://recite.microsoft.com/">Microsoft Recite</a> &#8220;Technology Preview&#8221; for Windows Mobile.   The application lets users record short notes as audio snippets, which can later be searched for content by speaking key words.  Apparently it does not entail speech recognition but rather simpler pattern matching, meaning it cannot be searched in text form but may work more robustly, eliminating the effort of training for speaker independence.</p>
<p>While not a full product yet, this sounds like a nifty little application for cognitive off-loading.</p>
<p>Have you tried Microsoft Recite?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/02/16/microsoft-recite-preview-note-dictation-and-voice-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Mobile iPhone App with Speech Recognition</title>
		<link>http://www.okkoblog.com/2008/11/18/google-mobile-iphone-app-with-speech-recognition/</link>
		<comments>http://www.okkoblog.com/2008/11/18/google-mobile-iphone-app-with-speech-recognition/#comments</comments>
		<pubDate>Tue, 18 Nov 2008 06:51:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[machine translation]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=43</guid>
		<description><![CDATA[Google released a new feature for its Google Mobile iPhone Application yesterday: voice search. Users speak a query and the application returns search results formatted for the iPhone. This is similar to the GOOG411 directory assistance application, which allows users to call a phone number, speak [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.google.com/">Google</a> released a new feature for its <a href="http://googlemobile.blogspot.com/2008/11/google-mobile-app-for-iphone-now-with.html">Google Mobile iPhone Application</a> yesterday: voice search.  Users speak a query and the application returns search results formatted for the iPhone.  This is similar to the <a href="http://www.google.com/goog411/">GOOG411</a> directory assistance application, which allows users to call a phone number, speak a query and receive information about local listings in voice or SMS formats. However, the new application apparently performs recognition locally on the iPhone, meaning it comes bundled with an embedded speech recognition engine.</p>
<p>Aside from GOOG411, during the US presidential election campaign Google released <a href="http://labs.google.com/gaudi">Gaudi</a>, a voice indexing technology for video.  That makes the iPhone app the third official service from the company to make use of speech recognition, leaving one guessing when Google&#8217;s speech technology will become available as an API, like the <a href="http://code.google.com/apis/ajaxlanguage/">Google AJAX Language API</a> for translation and transliteration, rather than bundled into software services.  Also, an Android version is probably in the works, one would guess.</p>
<p>All applications are available in US English for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/11/18/google-mobile-iphone-app-with-speech-recognition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Showcases Audio Indexing with Gaudi</title>
		<link>http://www.okkoblog.com/2008/09/19/google-showcases-audio-indexing-with-gaudi/</link>
		<comments>http://www.okkoblog.com/2008/09/19/google-showcases-audio-indexing-with-gaudi/#comments</comments>
		<pubDate>Fri, 19 Sep 2008 06:37:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[advertising]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[audio indexing]]></category>
		<category><![CDATA[audio search]]></category>
		<category><![CDATA[Gaudi]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=41</guid>
		<description><![CDATA[Google Labs opened GAudi this week to showcase its new audio indexing technology. Google GAudi allows searching for keywords/phrases in the audio-stream of selected YouTube videos. Matches are represented as yellow slots on the playback slider. Top results appear as snippets of text from the audio surrounding the search term as well as information how [...]]]></description>
			<content:encoded><![CDATA[<p>Google Labs opened <a href="http://labs.google.com/gaudi">GAudi</a> this week to showcase its new audio indexing technology.</p>
<p>Google GAudi allows searching for keywords/phrases in the audio-stream of selected YouTube videos. Matches are represented as yellow slots on the playback slider. Top results appear as snippets of text from the audio surrounding the search term as well as information on how many minutes into the video the term occurred.</p>
<p>The <a href="http://labs.google.com/gaudi/static/faq.html#why-elections">video material chosen</a> to showcase GAudi is material concerning this year&#8217;s US presidential elections as &#8220;part of a broader effort around politics&#8221;, but also because of the high performance with such material and the relevance to testers and users.</p>
<p>Indexing does not appear to be complete, as using randomly chosen text fragments from showcased videos did not always result in a match.  Google does say GAudi is using its own speech recognition engine, perhaps the same employed by <a href="http://www.google.com/goog411/">GOOG411</a>, though most FAQs about technical details and how one could use GAudi for video are directed to email inquiries.</p>
<p>While GAudi is showcasing campaign material, it seems only a matter of time before audio indexing becomes available for serving ad content on video.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/09/19/google-showcases-audio-indexing-with-gaudi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft Windows Live Messenger Translation Bot</title>
		<link>http://www.okkoblog.com/2008/09/08/microsoft-windows-live-messenger-translation-bot/</link>
		<comments>http://www.okkoblog.com/2008/09/08/microsoft-windows-live-messenger-translation-bot/#comments</comments>
		<pubDate>Mon, 08 Sep 2008 06:50:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[Microsoft]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=40</guid>
		<description><![CDATA[In the wake of Google&#8217;s release of its Chrome web-browser, speculation on plans for Chrome on other platforms, including Android have drifted ashore. Naturally this has washed aside much recent IE8 news, which, though not a game-changer, is said to introduce many of the much-needed improvements everyone has been looking for from Microsoft. In light [...]]]></description>
			<content:encoded><![CDATA[<p>In the wake of Google&#8217;s release of its <a href="http://www.google.com/chrome">Chrome</a> web-browser, speculation on plans for Chrome on other platforms, including <a href="http://code.google.com/android/">Android</a>, has drifted ashore.  Naturally this has washed aside much recent <a href="http://www.microsoft.com/windows/internet-explorer/beta/default.aspx">IE8</a> news, which, though not a game-changer, is said to introduce many of the much-needed improvements everyone has been looking for from Microsoft.</p>
<p>In light of the browser war raging, <a href="http://arstechnica.com/journals/microsoft.ars/2008/09/03/new-translation-bot-released-for-windows-live-messenger">a little add-on</a> for Microsoft&#8217;s Live Messenger may not stir many waters, even if it promises real-time chat translation between English and 14 other languages.  However, it is still refreshing to read about technology that is geared towards opening channels of communication, rather than capturing market share.</p>
<p>What are Google&#8217;s plans with Chrome and Android vis-à-vis Microsoft IE on Windows Mobile?  Will Microsoft leverage its non-browser language services such as translation and speech recognition as Google has been doing?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/09/08/microsoft-windows-live-messenger-translation-bot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GOOG: We need more data</title>
		<link>http://www.okkoblog.com/2008/01/03/goog-we-need-more-data/</link>
		<comments>http://www.okkoblog.com/2008/01/03/goog-we-need-more-data/#comments</comments>
		<pubDate>Thu, 03 Jan 2008 08:42:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Loquendo]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[Telisma]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=34</guid>
		<description><![CDATA[The old maxim &#8220;I need more data&#8221; should be familiar to anyone who has ever tried to wrestle with language technology issues, attempted speech application tuning or delved into any statistical approach to an AI-related problem. Google moved into the speech world last year with GOOG-411, [...]]]></description>
			<content:encoded><![CDATA[<p>The old maxim &#8220;I need more data&#8221; should be familiar to anyone who has ever tried to wrestle with language technology issues, attempted speech application tuning or delved into any statistical approach to an AI-related problem.   Google <a href="http://www.google.com/goog411/">moved into the speech world</a> last year with GOOG-411, a speech recognition driven directory assistance application (you say what you are looking for and where, it returns suitable businesses and connects you to the one you want or sends you details in an SMS).<br />Like all (well, most) other Google services, GOOG-411 is free for the end-user.  As such, the basic business model (collect data, turn data into cash) applies.  This was <a href="http://www.infoworld.com/article/07/10/23/Google-wants-your-phonemes_3.html">recently confirmed</a> in an interview by Marissa Mayer, Google&#8217;s VP <span class="mdTitleGen">of Search Products and User Experience:</span><br />
<blockquote></blockquote>
<p><span class="artText"><br />
<blockquote><span style="font-size:85%;">Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model &#8230; that we can use for all kinds of different things, including video search.</span></p></blockquote>
<p>Google thus couples statistical AI and its general data-driven approach to everything in a novel way.  In doing so, Google may find itself in a catch-up race with the likes of <a href="http://www.nuance.com/">Nuance</a>, <a href="http://www.loquendo.com/">Loquendo</a>, <a href="http://www-306.ibm.com/software/pervasive/voice_server/ivrgateway.html">IBM</a>, or <a href="http://www.telisma.com/">Telisma</a>, whose stronghold on speech recognition technology comes, in part, from having aggregated speech and language databases through data collection during professional services projects.<br /></span><span class="artText">What&#8217;s new in Google&#8217;s approach, however, is the convergence of the dual role that data plays in AI and in the overall service-driven business model.  Google will presumably not be content to bootstrap a pattern matching engine to sell licenses like the technology companies above.  More interesting to follow will be the range of services Google can spin using this technology (context sensitive video advertising, audio indexing, IVR hosting), which are more befitting of its overall company strategy.</span><span class="artText"><br />Unsurprisingly, Mayer goes on to claim that Google isn&#8217;t working on ways out of the world of brute-force data-driven algorithms:<br /></span><span class="artText"></span><br />
<blockquote><span style="font-size:85%;"><span class="artText">People should be able to ask questions, and we should understand their meaning, or they should be able to talk                      about things at a conceptual level. &#8230; </span><span class="artText">A lot of people will turn to things like the semantic Web as a possible answer to that. But what we&#8217;re seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they&#8217;re done through brute force.</span></span></p></blockquote>
<p><span class="artText"></span><span class="artText">User privacy advocates may also have a thought or two on this new dimension of data collection, as Google is beginning to lose the &#8220;conventionally trustworthy&#8221; image it held amongst many over the past years.  Fortunately, the way in which speech data is commonly used to train pattern matching models involves very little in the way of privacy infringement.</span><span class="artText"><br />Happy data collecting!<br /></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/01/03/goog-we-need-more-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Assistive and Accessibility Technology</title>
		<link>http://www.okkoblog.com/2007/11/21/assistive-and-accessibility-technology/</link>
		<comments>http://www.okkoblog.com/2007/11/21/assistive-and-accessibility-technology/#comments</comments>
		<pubDate>Wed, 21 Nov 2007 09:25:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[healthcare]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Nattiq]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=32</guid>
		<description><![CDATA[Diligent readers may have noticed that dominant news bits concerning speech and language technologies seem to focus on the cost- or time-saving aspects of it. This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance). Application of natural language technologies elsewhere, [...]]]></description>
			<content:encoded><![CDATA[<p>Diligent readers may have noticed that dominant news bits concerning speech and language technologies seem to focus on the cost- or time-saving aspects of it.  This is understandable, as the big players (Google, Microsoft, Nuance, IBM) have made it their mandate to capture lucrative markets (call center automation, directory assistance).  Application of natural language technologies elsewhere, e.g. where it&#8217;s fun (in games) or necessary (providing accessibility for visually impaired users), seems to lag.<br />Not so this week.  This week seems to shine under the assistive/accessibility technology star.  Note the SourceForge project &#8220;Speak as Daisy&#8221; &#8211; a <a href="http://www.news.com/8301-10784_3-9815836-7.html">Microsoft Word plugin</a> that enables creation of XML files with markup for speech synthesis or electronic braille generation.  The plugin is said to be available in 2008.<br />Mac users in need of improved document read-back in British English will rejoice over the <a href="http://prmac.com/release-id-993.htm">improved Infovox iVox voices</a>.<br />Philips and Elsevier are developing a <a href="http://money.cnn.com/news/newsfeeds/articles/prnewswire/UKTH00615112007-1.htm">speech-enabled diagnostic system</a> for radiologists.<br />Behold Nattiq&#8217;s USB <a href="http://www.mysolutioninfo.com/news-display.aspx?Code=5405&amp;t=Nattiq%20announces%20new%20Hal%20Pen%20technology">Hal Pen</a>, which allows blind users to use the company&#8217;s accessibility features on any computer with a USB port without installation.<br />Of course there&#8217;s some overlap with time- and cost-saving technologies as well.  
The FBI has announced <a href="http://visualvoicemail.tmcnet.com/speech-technologies/articles/14574-fbi-picks-speech-recognition-software-from-nuance.htm">widespread use of Nuance Dragon NaturallySpeaking</a> dictation for report and interview transcription.<br />Lastly, <a href="http://content.hamptonroads.com/story.cfm?story=137184&amp;ran=6782">here&#8217;s an <span style="font-style: italic;">a propos</span> rant</a> against call center automation on behalf of frustrated end-users, a target group all too often neglected by speech and language technologies.  Perhaps there&#8217;s a lesson about usability for the &#8220;money savers&#8221; employing speech technology to learn from those who rely on speech recognition and synthesis for their daily needs.  I don&#8217;t know, but F-word spotting as a means of prioritizing frustrated callers seems like an acknowledgement of defeat.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2007/11/21/assistive-and-accessibility-technology/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
