<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech &#187; Research</title>
	<atom:link href="http://www.okkoblog.com/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Tue, 20 Jul 2010 08:09:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>A More Optimistic Outlook on the Future of Speech</title>
		<link>http://www.okkoblog.com/2010/06/30/a-more-optimistic-outlook-on-the-future-of-speech/</link>
		<comments>http://www.okkoblog.com/2010/06/30/a-more-optimistic-outlook-on-the-future-of-speech/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 09:47:04 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[Siri]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=187</guid>
		<description><![CDATA[The speech application industry got some critical press in recent months (here are some spirited responses, respectively.) All the more refreshing to come across this New York Times article presenting current work in speech and artificial intelligence. The article highlights broadly what kind of AI applications have moved into the mainstream (or have potential to [...]]]></description>
			<content:encoded><![CDATA[<p>The speech application industry got some <a href="http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition" target="_blank">critical</a> <a href="http://www.signalprocessingsociety.org/technical-committees/list/sl-tc/spl-nl/2010-04/suendermann/">press</a> in recent months (here are some <a href="http://robertopieraccini.blogspot.com/2010/05/un-rest-in-peas-unrecognized-life-of.html">spirited</a> <a href="http://languagelog.ldc.upenn.edu/nll/?p=2275">responses</a>, respectively.)</p>
<p>It was all the more refreshing to come across this New York Times <a href="http://www.nytimes.com/2010/06/25/science/25voice.html">article</a> presenting <a href="http://research.microsoft.com/en-us/um/people/horvitz/">current</a> <a href="http://siri.com/">work</a> in speech and artificial intelligence. The article broadly outlines which kinds of AI applications have moved into the mainstream (or have the potential to do so). Speech and natural language understanding, the article claims, have gone furthest.</p>
<p>One point common to both criticisms above is that the development of speech-enabled applications has stagnated in various ways<sup>1</sup>. The underlying technology, automatic speech recognition (ASR), has gone as far as it can. Application designers and developers haven&#8217;t adopted it. Dictation has learned to understand doctors and lawyers better, but still struggles with conversational speech.</p>
<p>This point may have to be conceded. In terms of commercial applications, however, and especially speech-enabled interactive voice response (IVR) systems, the root cause of stagnation is not necessarily a failure of AI but rather a maturing of standards and best practices. Meeting the expectation that voice applications, much like websites, behave according to certain rules greatly benefits the millions who interact with such systems every day.</p>
<p>What I take away from both the general criticism and the Times&#8217; optimistic perspective is that, short of a revolution in the underlying technologies (which hardly anyone expects), filling practical, everyday niches is where speech and language processing can still move forward.  These niches have certainly not all been discovered yet.</p>
<p>Thoughts?</p>
<hr /><p><sup>1</sup> Roughly summarized, Robert Fortner: &#8220;development in speech technology has flat-lined since 2001&#8221;; David Suendermann: &#8220;(statistical) engineering methods are more efficient than traditional symbolic linguistic approaches to language processing.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/06/30/a-more-optimistic-outlook-on-the-future-of-speech/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Roger Ebert TTS</title>
		<link>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/</link>
		<comments>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 16:57:50 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[Cepstral]]></category>
		<category><![CDATA[CereProc]]></category>
		<category><![CDATA[NeoVoice]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=181</guid>
		<description><![CDATA[Roger Ebert, who lost his lower jaw to cancer, has gotten his old voice back. Or at least a version of it. Edinburgh-based CereProc has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks. This is of course not the first case [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature"></embed></object></p>
<p>Roger Ebert, who lost his lower jaw to cancer, has gotten his old voice back. Or at least a version of it. Edinburgh-based <a href="http://www.cereproc.com/" target="_blank">CereProc</a> has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks.</p>
<p>This is of course not the first case of text-to-speech (TTS) being used for essential day-to-day communication. Most prominently, Professor Stephen Hawking has been doing so since 1985, initially using <a href="http://en.wikipedia.org/wiki/DECtalk" target="_blank">DECtalk</a> and, since 2009, <a href="http://www.neospeech.com" target="_blank">NeoSpeech</a>. The poor quality of his voice prior to the switch had of course become something of a trademark; the anecdote goes that Professor Hawking stuck with his old voice out of attachment. While many speech and language technologies lead a wow-but-who-really-needs-it existence, these cases are wonderful examples of real utility.</p>
<p>Mr. Ebert&#8217;s voice is novel in one regard: he got his own voice back. I have half-seriously mused in the past whether this might become a real option. Typically, developing a new voice for general-purpose speech synthesis is a costly affair, mostly due to time- and labor-intensive data preparation (studio recording, annotation, hand alignment, etc.). However, as this &#8220;grunt work&#8221; becomes more streamlined and automated, the buy-in cost for a new voice is falling. Mr. Ebert was &#8220;lucky&#8221; in the sense that large amounts of his speech had already been recorded in good enough quality to build his custom voice. Another player in the TTS market, <a href="http://www.cepstral.com" target="_blank">Cepstral</a>, has recently launched its <a href="http://www.voiceforge.com" target="_blank">VoiceForge</a> offering, which aims to lower the entry threshold for home-grown TTS developers.</p>
<p>Another option that seems increasingly realistic is employing &#8220;voice morphing&#8221; and &#8220;voice transformation&#8221;. The idea here is simply to apply changes to an existing, high-quality TTS voice. The following video demonstrates how the latter can be done by changing purely acoustic properties (timbre, pitch, rate) of a voice signal:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/-pA7cW0UV88" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/-pA7cW0UV88"></embed></object></p>
<p>Voice morphing changes one voice into another. A Cambridge University <a href="http://mi.eng.cam.ac.uk/~hy216/VoiceMorphingPrj" target="_blank">research project</a> demonstrated how recordings of one speaker could be made to sound like those of another using relatively little training data. The following are some examples:</p>
<p>Original Speaker 1:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" autoplay="false"></embed></object></p>
<p>Target Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" autoplay="false"></embed></object></p>
<p>Converted Speaker 1 to Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" autoplay="false"></embed></object></p>
<p>Similar technology was also <a href="http://www.interspeech2009.org/conference/programme/session.php?id=2710" target="_blank">showcased extensively</a> at the 2009 Interspeech Conference. Perhaps this will one day enable people who have lost their voice, and who do not have hours (or days) of recordings of it at their disposal, to speak to their loved ones in a custom voice of their own.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" length="159788" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" length="200748" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" length="159788" type="audio/x-wav" />
		</item>
		<item>
		<title>Incremental Dialogue Management</title>
		<link>http://www.okkoblog.com/2009/12/30/incremental-dialogue-management/</link>
		<comments>http://www.okkoblog.com/2009/12/30/incremental-dialogue-management/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 19:37:59 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[dialogue research]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/blog/?p=65</guid>
		<description><![CDATA[Over the past year I&#8217;ve been involved in research on incremental processing in spoken dialogue systems at Potsdam University. Our project looks at how information in dialogues can be reduced to basic units, which get passed between modules (such as a speech recognizer and a semantic engine), based on a general abstract model of how this [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Dilbert.com" href="http://dilbert.com/strips/comic/2009-12-29/"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/70000/7000/900/77968/77968.strip.gif" border="0" alt="Dilbert.com" /></a><br />
Over the past year, I&#8217;ve been involved in research on incremental processing in spoken dialogue systems at Potsdam University.  Our <a href="http://coco-lab.org/index.php?option=com_content&amp;task=view&amp;id=21&amp;Itemid=9">project</a> looks at how information in dialogues can be reduced to basic units, which get passed between modules (such as a speech recognizer and a semantic engine), based on a <a href="http://www.ling.uni-potsdam.de/~das/papers/schlangenetal_agmo_eacl2009.pdf" target="_blank">general abstract model</a> of how this can be done.  Thus far, we&#8217;ve mainly been concerned with issues originating close to the input speech signal (<a href="http://www.ling.uni-potsdam.de/~timo/pub/naacl-hlt2009.pdf" target="_blank">ASR</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/interspeech-rubisc.pdf" target="_blank">semantics</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/refres_sigdial.pdf" target="_blank">reference resolution</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/interspeech-nbest.pdf">n-best lists</a>, <a href="http://www.ling.uni-potsdam.de/~timo/pub/diaholmia.pdf" target="_blank">prosody</a>, etc.).  Now that these issues are largely mapped out, 2010 will be dedicated to research on larger dialogue issues (interaction &amp; dialogue management, incremental output generation).</p>
<p>As in the Dilbert dialogue snippet, some issues that will naturally arise are (1) how different types of questions can be handled by an incremental dialogue system (breaking with the established Question-Answer-Question-A-Q&#8230; paradigm in favour of something more dynamic) and (2) what turn-taking means in an incremental framework (we now have a system that can interrupt the user at appropriate moments).  The benefits of incrementality lie mostly in the speed, robustness and naturalness of the interaction, and these are linked to output generation, which makes it a third issue to watch out for.  Larger dialogue strategies may not be as affected, but if they are, we need to establish in what ways.</p>
<p>We&#8217;ll certainly steer clear of calling our prototype Morgan. If you are involved in speech and language processing and interested in creating more natural human-machine dialogues, I&#8217;d love to hear from you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/12/30/incremental-dialogue-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speech and Dialog Conferences / Speech for iPhone and Android</title>
		<link>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/</link>
		<comments>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 08:24:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[Android]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=54</guid>
		<description><![CDATA[Conference time: I will be spending a couple of days in London and Brighton from September 5th attending Interspeech, SIGDIAL and a researcher round-table. If you are interested in meeting up, feel free to get in touch. Also, here are some more or less recent, interesting news items for Android (at about 6:20, thanks Schamai) and [...]]]></description>
			<content:encoded><![CDATA[<p><!-- AddThis Bookmark Button BEGIN -->Conference time:  I will be spending a couple of days in London and Brighton from September 5th attending <a href="http://www.interspeech2009.org/">Interspeech</a>, <a href="http://www.sigdial.org/workshops/workshop10/index.html">SIGDIAL</a> as well as a researcher<a href="http://www.yrrsds.org/"> round-table</a>.  Anyone interested in meeting up, feel free to <a href="http://www.voxarca.de/app/main/contact">get in touch</a>.</p>
<p>Also, here are some more or less recent, interesting news for <a href="http://www.youtube.com/watch?v=uX9nt8Cpdqg">Android</a> (at about 6:20, thanks Schamai) and <a href="http://prmac.com/release-id-6453.htm">iPhone</a> speech developers.<br /><!-- AddThis Bookmark Button END --></p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/07/11/speech-and-dialog-conferences-speech-for-iphone-and-android/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Incrementality in Verbal Interaction</title>
		<link>http://www.okkoblog.com/2009/06/18/incrementality-in-verbal-interaction/</link>
		<comments>http://www.okkoblog.com/2009/06/18/incrementality-in-verbal-interaction/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 07:54:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[incrementality]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=53</guid>
		<description><![CDATA[Since joining a research program at Potsdam University at the end of last year (as a researcher and PhD student), I&#8217;ve decided to use this blog for some additional, more personal updates. This is the first :-). Our research is concerned with human-machine spoken dialog systems from an incremental (i.e. real-time) processing perspective. Members [...]]]></description>
			<content:encoded><![CDATA[<p>Since I&#8217;ve joined a <a href="http://www.coco-lab.org/">research program</a> at <a href="http://uni-potsdam.de/">Potsdam University</a> end of last year (as a researcher and PhD student), I&#8217;ve decided to use this blog for some additional, more personal updates.  This is the first :-).</p>
<p>Our research is concerned with human-machine spoken dialog systems from an incremental (i.e. real-time) processing perspective.  Members of our team, including me, were therefore recently invited to a <a href="http://www.sfb673.org/component/option,com_eventcal/task,event/date,1244444400/eventid,73/Itemid,53/catid,/lang,en/">workshop</a> on &#8220;Incrementality in Verbal Interaction.&#8221;  The workshop brought together an interesting mix of perspectives on incrementality from psycholinguistics as well as theoretical and computational linguistics.  Slides from our project presentation are available <a href="http://www.ling.uni-potsdam.de/%7Eokko/docs/2009_bielefeld_IVI.pdf">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2009/06/18/incrementality-in-verbal-interaction/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Internationalization and Speech Technologies</title>
		<link>http://www.okkoblog.com/2008/05/05/internationalization-and-speech-technologies/</link>
		<comments>http://www.okkoblog.com/2008/05/05/internationalization-and-speech-technologies/#comments</comments>
		<pubDate>Mon, 05 May 2008 05:50:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[How To]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[Philips]]></category>
		<category><![CDATA[TTS]]></category>
		<category><![CDATA[Voiceforge]]></category>
		<category><![CDATA[Voxforge]]></category>
		<category><![CDATA[XING]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=36</guid>
		<description><![CDATA[The not-so-subtle truth is, of course, that we all speak English. Yet localization and internationalization are at once a prerequisite and a stumbling block for many web-based endeavors. In my own backyard, two examples illustrate the effect of, and the need for, internationalization, respectively. German professional social network XING has internationally outperformed competitors like LinkedIn through early and [...]]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">
<p>The not-so-subtle truth is, of course, that we all speak English. Yet localization and internationalization are at once a prerequisite and a stumbling block for many web-based endeavors.</p>
<p>In my own backyard, two examples illustrate the effect of, and the need for, internationalization, respectively.  German professional social network <a href="http://www.xing.com/">XING</a> has internationally outperformed competitors like <a href="http://www.linkedin.com/">LinkedIn</a> through early and aggressive internationalization.  <a href="http://www.studivz.de/">StudiVZ</a>, the &#8220;German Facebook&#8221;, had captured much of the student social network market <a href="http://uk.techcrunch.com/2008/03/03/facebooks-german-version-may-not-impress-the-locals-after-all/">before</a> <a href="http://www.facebook.com/">Facebook</a> decided to release a German version of its web app, making this a tough market to crack.</p>
<p>Ironically, as these two examples underline, the need for localization remains even in cases where the demands on usability are low (join group/contact person/send message) and the target audience can largely be expected to speak sufficient English (read <a href="http://lostgarden.com/2008/03/translation-game.html">this</a> for an interesting take on the same issues and solutions in online gaming).  Moreover, localization takes far more effort than providing an interface in the local language.</p>
<p>As one would expect, localization, internationalization and speech technology are inextricably linked; in a sense, developing speech technologies <span style="font-style: italic;">is</span> internationalization.  And using such technology in professional service projects is akin to building an internationalized web application.  Here are some of the oddities I&#8217;ve observed while working with speech technologies in an international environment:</p>
<p><span style="font-style: italic;">Translation is not enough.</span> When you write software that speaks or wants to be spoken to, there is more at stake than providing interface text.  Can you expect all your users to spell out their input when your system doesn&#8217;t understand the raw speech?  Can you be sure that all your translated content will generate well-formed speech-synthesis output?  Language and culture are sensitive issues, so a well-localized speech application must do more than provide a translated user interface.  Employing local staff is usually the minimum requirement for building a speech application for a new market.</p>
<p><span style="font-style: italic;">The cost shifts.</span> Reusability of resources from previous speech projects is usually low.  So, unlike localizing a web application, porting a speech application requires grunt work that you thought you had done the first time around.  Moreover, speech applications in new languages almost always come with additional licensing burdens and questions about the appropriate technology partner.  Expect to pay for things you didn&#8217;t expect.</p>
<p><span style="font-style: italic;">There is no long tail.</span> The buy-in costs for developing a new language in almost any speech or language technology (recognition, synthesis, translation) remain constant from one language to the next.  This makes every newly developed language a strategic decision and translates into a two-tier localization effort: one tier developing basic technologies, the other employing such technology in professional service projects.<br />
As an example, consider the world&#8217;s most successful dictation software packages: <a href="http://www.nuance.com/naturallyspeaking/international/">Dragon NaturallySpeaking</a> ships in five flavors of English and six European languages.  <a href="http://www.speechrecognition.philips.com/index.asp?id=532">Philips SpeechMagic</a> ships in 23 dialects of 11 languages.  Both are a far cry from worldwide coverage.<br />
The enormous cost of development has a marked effect on speech technology for less widely spoken languages.  It has also posed a significant hurdle for <a href="http://www.voiceforge.com/">open-source</a> <a href="http://www.voxforge.org/">initiatives</a> in speech technology that aim to provide such resources for free.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2008/05/05/internationalization-and-speech-technologies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speech Enabled Knowledge Bases</title>
		<link>http://www.okkoblog.com/2007/04/24/speech-enabled-knowledge-bases/</link>
		<comments>http://www.okkoblog.com/2007/04/24/speech-enabled-knowledge-bases/#comments</comments>
		<pubDate>Tue, 24 Apr 2007 08:21:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[NLP]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=21</guid>
		<description><![CDATA[Two articles and a product showcase recently demonstrated speech-enabled knowledge base solutions. In essence, such products/solutions are expert systems of varying complexity, ranging from speaking manuals to complex diagnosis systems. Users can describe a problem and ultimately receive an answer, whether through complex one-shot natural language processing/understanding or a plain-old, multi-step [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://home.businesswire.com/portal/site/google/index.jsp?ndmViewId=news_view&#038;newsId=20070420005307&amp;newsLang=en">Two</a> <a href="http://news.tmcnet.com/news/2007/04/23/2540489.htm">articles</a> and a product <a href="http://www.excelsisnet.com/voice/en/excelsis/publicrelations/2006/empolis.html">showcase</a> recently demonstrated speech-enabled <span class="blsp-spelling-corrected" id="SPELLING_ERROR_0">knowledge base</span> solutions.  In essence products/solutions such as this are expert systems with various degrees of complexity, ranging from speaking manuals to complex diagnosis systems.  Users can describe a problem and ultimately receive an answer, whether through complex one-shot natural language processing/understanding or a plain-old, multi-step directed dialogue.<br />Alongside traditional call-center automation applications &#8211; e.g. customer service, process automation, <span class="blsp-spelling-error" id="SPELLING_ERROR_1">pre-qualification</span>, directory assistance &#8211; these systems represent a minor market segment.  However they are relatively novel, so much can still happen.  Especially in medical/<span class="blsp-spelling-corrected" id="SPELLING_ERROR_2">health care</span> domains, the market appears untapped and the list of potential applications broad.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2007/04/24/speech-enabled-knowledge-bases/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Web 3.0 and Natural Language Processing</title>
		<link>http://www.okkoblog.com/2007/04/09/web-3-0-and-natural-language-processing/</link>
		<comments>http://www.okkoblog.com/2007/04/09/web-3-0-and-natural-language-processing/#comments</comments>
		<pubDate>Mon, 09 Apr 2007 06:53:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[web3.0]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=13</guid>
		<description><![CDATA[Web 3.0 is getting some buzz in the blogosphere. Like Web 2.0, it raises the question that PCMag.com recently put to its readers: what is it? This time around, however, things seem a bit easier. Web 2.0 seems content to be vaguely defined (delimited may be a better term) and to be equally a social [...]]]></description>
			<content:encoded><![CDATA[<p>Web 3.0 is getting <a href="http://scobleizer.com/2007/04/05/i-finally-get-semantic-web/">some</a> <a href="http://yihongs-research.blogspot.com/2007/04/semantic-web-is-closer-to-be-real-isnt.html">buzz</a> <a href="http://www.pelicancrossing.net/netwars/2007/04/whats_in_a_20.html">in</a> <a href="http://billboushka.blogspot.com/2007/04/web-30-is-getting-attention.html">the</a> blogosphere.  Like Web 2.0, it begs the question that PCMag.com <a href="http://www.pcmag.com/article2/0,1759,2102852,00.asp">recently</a> ran by its readers:  what is it?  However this time around things seems a bit easier.</p>
<p>Web 2.0 seems content to be vaguely defined (delimited may be a better term) and to be equally a social and a technological movement.  Web 3.0 clearly revolves around the idea of the &#8220;Semantic Web&#8221;, a term coined by <a href="http://de.wikipedia.org/wiki/Berners-Lee">Tim Berners-Lee</a>, in which richly <a href="http://de.wikipedia.org/wiki/Resource_Description_Framework">marked-up</a> hypertext and data allow for novel, more meaningful human-machine and machine-machine communication.  <a href="http://www.radarnetworks.com/">Radar Networks</a> (currently in stealth mode) claims to be driving some interesting developments in this direction and is being followed closely by those interested.</p>
<p>This has already raised some questions: Will content require expensive hand labor, or can it be bootstrapped by machines? What new privacy policies will we have to live with? How does one separate <a href="http://www.elainevigneault.com/2007/04/08/semantic-web-and-the-future-of-the-internet.html">style and content</a>? What are the <a href="http://mukhlason.multiply.com/reviews/item/25">alternatives to RDF</a>?</p>
<p>Sadly, there&#8217;s very little inspiring material out there on potential applications.</p>
<p>My question to add (though not uniquely mine):  What role will natural language processing play here (i.e. how &#8220;semantic&#8221; is this talk of semantics)?  Semantic content in RDF appears to be little more than a means for one machine to tell another who authored a particular book or what the postal codes in the greater Boston area are.  To me, semantics is as much about intentions (&#8220;Why is web-service A dispensing such information?&#8221;) as about interpreting such information for the purposes of action (&#8220;What can web-service B (or my browser, or I) do with it?&#8221;).</p>
<p>Perhaps this misses the mark and the Semantic Web really isn&#8217;t about natural language.  But there is a weaker, more practical form of this &#8220;language and technology&#8221; concern: insofar as semantics <span style="font-style: italic;">is</span> just information, can it be bootstrapped by a machine (perhaps even using linguistically rather than statistically informed methods)?</p>
<p>Thoughts?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2007/04/09/web-3-0-and-natural-language-processing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Three Observations about Recent Language Technology News</title>
		<link>http://www.okkoblog.com/2007/03/28/three-observations-about-recent-language-technology-news/</link>
		<comments>http://www.okkoblog.com/2007/03/28/three-observations-about-recent-language-technology-news/#comments</comments>
		<pubDate>Wed, 28 Mar 2007 11:50:00 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Services]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[ASR]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[multi-modal]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[search engines]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://okkoblog.com/blog/?p=4</guid>
		<description><![CDATA[To start us off, recent experience has shown three things: Speech-related (i.e. voice-related) news is dominated by TTS rather than ASR. The company featured most frequently in the news is Nuance. The talk of semantic search engines seems to dominate the NLP news. The success of TTS is largely due to requirements set by mobile [...]]]></description>
			<content:encoded><![CDATA[<p>To start us off, <a href="http://okkobuss.googlepages.com/">recent experience</a> has shown three things:
<ol>
<li>Speech-related (i.e. voice-related) news is dominated by TTS rather than ASR.</li>
<li>The company featured most frequently in the news is Nuance.</li>
<li>The talk of semantic search engines seems to dominate the NLP news.</li>
</ol>
<p>The success of TTS is largely due to requirements set by mobile and in-car technologies, especially GPS and communications.  The future of ASR, on the other hand, seems to depend on the dictation market (especially in the healthcare sector) and on the growing relevance of network ASR (driven by advances in VoIP and the impact of multi-modal applications).</p>
<p>Nuance&#8217;s continued position will depend on the role of &#8220;super players&#8221; IBM and Microsoft and, to a lesser degree, on the role of open-source initiatives, especially on the network/telephony side.</p>
<p>Semantic search engines recently got some media hype with &#8220;Google-Killer&#8221; Powerset, a PARC offspring.  While this development towards a semantic web is still in its infancy, some believe it will usher in a Web 3.0 revolution.  Of course, others believe this has already begun, while yet more just want to see what happens with all this.</p>
<p>Let&#8217;s see how these trends develop.  Multi-modality and semantic search in particular will be issues to follow closely.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2007/03/28/three-observations-about-recent-language-technology-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
