<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Okko in Speech &#187; CereProc</title>
	<atom:link href="http://www.okkoblog.com/tag/cereproc/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.okkoblog.com</link>
	<description>Working with speech and language technology</description>
	<lastBuildDate>Thu, 29 Sep 2011 12:37:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Roger Ebert TTS</title>
		<link>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/</link>
		<comments>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 16:57:50 +0000</pubDate>
		<dc:creator>Okko</dc:creator>
				<category><![CDATA[Brands]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Vendors]]></category>
		<category><![CDATA[accessibility]]></category>
		<category><![CDATA[Cepstral]]></category>
		<category><![CDATA[CereProc]]></category>
		<category><![CDATA[NeoVoice]]></category>
		<category><![CDATA[TTS]]></category>

		<guid isPermaLink="false">http://www.okkoblog.com/?p=181</guid>
		<description><![CDATA[Roger Ebert, who lost his lower jaw to cancer, has been given his old voice back. Or at least a version of it. Edinburgh-based CereProc has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks. This is of course not the first case [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/lJI87Ivk0PM&amp;feature"></embed></object></p>
<p>Roger Ebert, who lost his lower jaw to cancer, has been given his old voice back. Or at least a version of it. Edinburgh-based <a href="http://www.cereproc.com/" target="_blank">CereProc</a> has built a custom voice for its own speech synthesis engine based on old recordings such as TV appearances and DVD commentary tracks.</p>
<p>This is of course not the first case of text-to-speech (TTS) being used for essential day-to-day communication. Most prominently, Professor Stephen Hawking has been doing so since 1985, initially using <a href="http://en.wikipedia.org/wiki/DECtalk" target="_blank">DECtalk</a> and, since 2009, <a href="http://www.neospeech.com" target="_blank">NeoSpeech</a>. The poor quality of his voice prior to the switch was of course a bit of a trademark. The anecdote goes that Professor Hawking stuck with his old voice out of attachment. While many speech and language technologies suffer a wow-but-who-really-needs-it existence, these cases are wonderful examples of real utility.</p>
<p>Mr. Ebert&#8217;s voice is novel in one regard: he got his own voice back. I have half-seriously mused in the past whether this wasn&#8217;t becoming a real option. Typically, new voice development for general-purpose speech synthesis is a costly affair, mostly due to time- and labor-intensive data preprocessing (studio recording, annotation, hand alignment, etc.). However, as the &#8220;grunt work&#8221; becomes more streamlined and automated, the buy-in cost for a new voice drops. Mr. Ebert was &#8220;lucky&#8221; in the sense that large amounts of his voice had already been recorded in good enough quality to enable building his custom voice. Another player in the TTS market, <a href="http://www.cepstral.com" target="_blank">Cepstral</a>, has recently launched its <a href="http://www.voiceforge.com" target="_blank">VoiceForge</a> offering, which aims to lower the entry threshold for home-grown TTS developers.</p>
<p>Another option that seems increasingly realistic is employing &#8220;voice morphing&#8221; and &#8220;voice transformation&#8221;. The idea here is simply to apply changes to an already existing, high-quality TTS voice. The following is a demonstration of how the latter can be done by changing purely acoustic properties (timbre, pitch, rate) of a voice signal:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/-pA7cW0UV88" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/-pA7cW0UV88"></embed></object></p>
<p>Voice morphing changes one voice into another. A Cambridge University <a href="http://mi.eng.cam.ac.uk/~hy216/VoiceMorphingPrj" target="_blank">research project</a> demonstrated how recordings of one speaker could be made to sound like those of another using relatively little training data. The following are some examples:</p>
<p>Original Speaker 1:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" autoplay="false"></embed></object></p>
<p>Target Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" autoplay="false"></embed></object></p>
<p>Converted Speaker 1 to Speaker 2:</p>
<p><object style="width: 100px; height: 25px;" classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" width="100" height="25" codebase="http://www.apple.com/qtactivex/qtplugin.cab#version=6,0,2,0"><param name="autoplay" value="false" /><param name="src" value="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" /><embed style="width: 100px; height: 25px;" type="video/quicktime" width="100" height="25" src="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" autoplay="false"></embed></object></p>
<p>Similar technology was also <a href="http://www.interspeech2009.org/conference/programme/session.php?id=2710" target="_blank">showcased extensively</a> during the 2009 Interspeech Conference. Perhaps this will one day enable those who have lost their voice, but do not have hours (or days) of recordings of it at their disposal, to have their own custom voices with which to talk to their loved ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.okkoblog.com/2010/03/10/roger-ebert-tts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/vc01.wav" length="159788" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/src01.wav" length="200748" type="audio/x-wav" />
<enclosure url="http://mi.eng.cam.ac.uk/~hy216/prjwaves/tgt01.wav" length="159788" type="audio/x-wav" />
		</item>
	</channel>
</rss>

