<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech &#187; SpinVox</title>
	<atom:link href="http://www.okkoblog.com/tag/spinvox/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Thu, 29 Sep 2011 12:37:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>SpinVox, Voice-to-Text and Some Terminology</title>
		<link>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/</link>
		<comments>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 11:14:45 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[SpinVox]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=156</guid>
		<description><![CDATA[The recent acquisition of SpinVox by Nuance not only represents another major step towards market consolidation by the latter company, but also prompted me to have a look at the voice-to-text market. Being a &#8220;late adopter power user&#8221;, out of some combination of complacency with existing workflows and a refusal to pay for certain [...]]]></description>
			<content:encoded><![CDATA[<p>The recent <a href="http://www.nuance.com/spinvox/" target="_blank">acquisition</a> of <a href="http://www.spinvox.com" target="_blank">SpinVox</a> by <a href="http://www.nuance.com" target="_blank">Nuance</a> not only represents another major step towards market consolidation by the latter company, but also prompted me to have a look at the voice-to-text market.  Being a &#8220;late adopter power user&#8221;, out of some combination of complacency with existing workflows and a refusal to pay for certain conveniences, I have refrained from using such services until now. Shameful for one whose bread and butter is working with speech technology, I admit.</p>
<p>Luckily I came across some <a href="http://www.readwriteweb.com/archives/voice-to-text-speech-to-text.php" target="_blank">useful</a> <a href="http://baratunde.posterous.com/this-is-a-test-of-the-google-voice-messaging" target="_blank">reviews</a> of the most prominent providers to bring me up to speed. I won&#8217;t go into them, as I&#8217;m sure others have more to say about the actual user experience. However, as &#8220;mobile&#8221; is the way speech and language technology seems to want to go, and as I finally plan to use more personal mobile computing resources (especially various gadgets starting with &#8220;i&#8221;) for speech technology, I may give some of these a whirl in the near future…</p>
<p>SpinVox caused something of a stir when launching their voice-to-text service in 2004 and another when the BBC &#8220;<a href="http://news.bbc.co.uk/2/hi/8163511.stm" target="_blank">uncovered</a>&#8221; that the company used a combination of human and machine intelligence. To anyone working in speech and language technology this would have been obvious from the get-go, as well as to anyone reading the company&#8217;s patent or patent applications, in which the use of human operators is mentioned explicitly. However, regular users would probably have been duped into thinking a machine was doing all the typing.  Failure to understand or communicate this caused a wholly avoidable privacy debacle.</p>
<p>One thing that&#8217;s clear from last year&#8217;s privacy debacle is that there&#8217;s a bit of a mess of terminology when it comes to voice and speech technologies.  So here&#8217;s an attempt at shedding some light on what&#8217;s what:</p>
<p style="padding-left: 30px;"><em>Speech Recognition</em> &#8211; also <em>ASR</em> (automatic speech recognition) for short. This is the general term used to refer to the technology that automatically turns spoken words into machine-readable text. However, the technology varies along several dimensions, such as the models employed (HMM-based vs. connectionist) and the intended speakers (a single speaker or all speakers of a dialect or language).  Also, there is a host of applications that employ it (dictation, IVR/telephone systems, voice-to-text services), each with different requirements. Hence ASR is really an umbrella term.</p>
<p style="padding-left: 30px;"><em>Voice Recognition</em> &#8211; often confused with speech recognition.  Usually voice recognition refers to software that works for only a single speaker.  However, this distinction is anecdotal, and in marketing the two terms are used synonymously.</p>
<p style="padding-left: 30px;"><em>Voice-to-Text</em> &#8211; a service that converts spoken words into text. ASR may be used to help do so, as may human transcribers; the label itself makes no claim as to whether the process is fully automated.</p>
<p style="padding-left: 30px;"><em>Speaker Recognition</em> &#8211; this is a security technology typically used to perform one of two tasks: (1) identifying a speaker from a group of known speakers or (2) determining whether a speaker is really who s/he claims to be. These are very similar tasks that people often confuse.  Think of the first one as picking a person out of a crowd and the second as a kind of &#8220;voice fingerprint matching&#8221;.</p>
<p style="padding-left: 30px;"><em>Text-to-Speech</em> &#8211; or <em>TTS</em> for short, another term for speech synthesis.  This technology is used to turn written text into an audio signal (such as an MP3).  This should be an obvious label, but surprisingly people seem to <a href="http://www.youtube.com/watch?v=N9GyPXJGZsU" target="_blank">confuse</a> it with Voice-to-Text services frequently (purely my own anecdote).</p>
<p>I&#8217;m also told SpinVox&#8217;s sale price of $102m is a bit of a disappointment, representing just over 50% of the initial $200m that SpinVox raised in 2003. But that&#8217;s something I&#8217;ll let others address. Let&#8217;s see where Nuance goes with this, in terms of trying to fully automate the whole transcription process…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/01/18/spinvox-voice-to-text-and-some-terminology/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Times Reports &amp; Is SciFi Really Wrong?</title>
		<link>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/</link>
		<comments>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/#comments</comments>
		<pubDate>Sun, 27 Jan 2008 08:56:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Fun]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[science fiction]]></category>
		<category><![CDATA[SimulScribe]]></category>
		<category><![CDATA[SpinVox]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[vlingo]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=35</guid>
		<description><![CDATA[The New York Times today published an interesting, if brief, article about speech recognition in the mobile/telco space &#8211; cited as a &#8220;$1.6 billion market in 2007&#8221;. The article provides a brief overview of a range of applications and mashups, such as vlingo.com and SimulScribe as well as some directory assistance services (but omitting some [...]]]></description>
			<content:encoded><![CDATA[<p>The New York Times today published an interesting, if brief, <a href="http://www.nytimes.com/2008/01/27/business/27proto.html?_r=1&amp;oref=slogin">article</a> about speech recognition in the mobile/telco space &#8211; cited as a &#8220;$1.6 billion market in 2007&#8221;.  The article provides a brief overview of a range of applications and mashups that use voice, such as <a href="http://www.vlingo.com/">vlingo.com</a> and <a href="http://www.simulscribe.com/">SimulScribe</a>, as well as some directory assistance services (but omitting some others such as <a href="http://www.spinvox.com/">SpinVox</a> and <a href="http://www.google.com/goog411/">GOOG411</a>).<br />The article opens:<br />
<blockquote>&#8220;Innovation usually needs time to steep. Time to turn the idea into something tangible, time to get it to market, time for people to decide they accept it. Speech recognition technology has steeped for a long time&#8221;</p></blockquote>
<p>And concludes:<br />
<blockquote>&#8220;Even a digital expert [...] cautions that some people may never be satisfied with the quality of speech recognition technology — thanks to a steady diet of fictional books, movies and television shows featuring machines that understand everything a person says, no matter how sharp the diction or how loud the ambient noise.&#8221;</p></blockquote>
<p>But isn&#8217;t this a bit hackneyed?  Perhaps by today&#8217;s standards a twenty-year steeping period seems long, but it would hardly count as long anywhere else in history.  And after re-watching 1982&#8217;s Blade Runner recently, I actually felt rather optimistic that we are today close to the movie&#8217;s expectations of speech recognition and speaker verification for 2019.  Elsewhere, a similar picture emerges.<br />The Star Trek ship computer&#8217;s speech recognition engine (the year is 2151), while accurate, still requires the push of a button to kick in, rather than listening for the hot word &#8220;computer&#8221;, a capacity available, if not quite ripe for deployment, today.<br />Of course, there are the HALs (2001), Marvins (no date), C-3POs (Long long time ago&#8230;), whose capacities far exceed what we dare dream our mobile phones will one day understand.  But here it seems the problem is less about the quality of speech technology &#8211; the quality of HAL&#8217;s speech synthesis is available today, and Marvin&#8217;s characteristic monotone baritone should be easy to do &#8211; than about the old hard-soft divide in Artificial Intelligence.  As long as we use a hard-AI problem, which speech arguably is, to solve soft-AI problems (&#8220;find closest pizza service&#8221;) we cannot fail to be disappointed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/01/27/the-times-reports-is-scifi-really-wrong/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

