English Literature Translation
English original

Speech synthesis

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations, like phonetic transcriptions, into speech.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units: a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely synthetic voice output.

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s.

Overview of text processing

A text-to-speech system (or engine) is composed of two parts: a front end and a back end. The front end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by
the front end.

The back end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.

History

Long before electronic signal processing was invented, there were those who tried to build machines to create human speech. Some early legends of the existence of "speaking heads" involved Gerbert of Aurillac (d. 1003 AD), Albertus Magnus (1198-1280), and Roger Bacon (1214-1294).

In 1779 the Danish scientist Christian Kratzenstein, working at the Russian Academy of Sciences, built models of the human vocal tract that could produce the five long vowel sounds (in International Phonetic Alphabet notation, they are [a], [e], [i], [o] and [u]) [5]. This was followed by the bellows-operated "acoustic-mechanical speech machine" of Wolfgang von Kempelen of Pressburg, Hungary, described in a 1791 paper [6]. This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a "speaking machine" based on von Kempelen's design, and in 1857 M. Faber built the "Euphonia". Wheatstone's design was resurrected in 1923 by Paget.

In the 1930s Bell Labs developed the vocoder, which automatically analyzed speech into its fundamental tone and resonances. From his work on the vocoder, Homer Dudley developed a manually keyboard-operated voice synthesizer called The Voder (Voice Demonstrator), which he exhibited at the 1939 New York World's Fair.

The Pattern playback was built by Dr. Franklin S. Cooper and his colleagues at Haskins Laboratories in the late 1940s and completed in 1950. There were several different versions of this hardware device, but only one currently survives. The machine converts pictures of the acoustic patterns of speech, in the form of a spectrogram, back into sound. Using this device, Alvin Liberman and colleagues were able to discover acoustic cues for the perception of phonetic
segments (consonants and vowels).

Dominant systems in the 1980s and 1990s were the MITalk system, based largely on the work of Dennis Klatt at MIT, and the Bell Labs system [8]; the latter was one of the first multilingual language-independent systems, making extensive use of natural language processing methods.

Early electronic speech synthesizers sounded robotic and were often barely intelligible. The quality of synthesized speech has steadily improved, but output from contemporary speech synthesis systems is still clearly distinguishable from actual human speech. As improving cost-performance ratios make speech synthesizers cheaper and more accessible, more people will benefit from the use of text-to-speech programs.

Electronic devices

The first computer-based speech synthesis systems were created in the late 1950s. The first general English text-to-speech system was developed by Noriko Umeda et al. in 1968, at the Electrotechnical Laboratory, Japan [10]. In 1961, physicist John Larry Kelly, Jr. and colleague Louis Gerstman [11] used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel 2001: A Space Odyssey ("Arthur C. Clarke Biography", at the Wayback Machine, archived December 11, 1997), where the HAL 9000 computer sings the same song as it is being put to sleep by astronaut Dave Bowman ("Where 'HAL' First Spoke", Bell Labs Speech Synthesis website, retrieved 2010-02-17).

Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers (for example, the Anthropomorphic Talking Robot Waseda-Talker Series).

Handheld electronics
featuring speech synthesis began emerging in the 1970s. One of the first was the Telesensory Systems, Inc. (TSI) Speech+ portable calculator for the blind, in 1976.

Formant synthesis

However, many concatenative systems also have rules-based components. Many systems based on formant synthesis technology generate artificial, robotic-sounding speech that would never be mistaken for human speech. However, maximum naturalness is not always the goal of a speech synthesis system, and formant synthesis systems have advantages over concatenative systems. Formant-synthesized speech can be reliably intelligible, even at very high speeds, avoiding the acoustic glitches that commonly plague concatenative systems. High-speed synthesized speech is used by the visually impaired to quickly navigate computers using a screen reader. Formant synthesizers are usually smaller programs than concatenative systems because they do not have a database of speech samples. They can therefore be used in embedded systems, where memory and microprocessor power are especially limited. Because formant-based systems have complete control of all aspects of the output speech, a wide variety of prosodies and intonations can be output, conveying not just questions and statements, but a variety of emotions and tones of voice. Examples of non-real-time but highly accurate intonation control in formant synthesis include the work done in the late 1970s for the Texas Instruments toy Speak & Spell.

Text normalization challenges

A number such as 1325 may be read as "one three two five", "thirteen twenty-five", or "thirteen hundred and twenty-five". A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous. Roman numerals can also be read differently depending on context. For example, "Henry VIII" reads as "Henry the Eighth", while "Chapter VIII" reads as "Chapter Eight". Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for "inches" must be differentiated from the word
"in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs, such as "co-operation" being rendered as "company operation".

Text-to-phoneme challenges

Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme or grapheme-to-phoneme conversion (phoneme is the term used by linguists to describe distinctive sounds in a language). The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciations is stored by the program. Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary. The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings. This is similar to the "sounding out", or synthetic phonics, approach to learning reading.

Each approach has advantages and drawbacks. The dictionary-based approach is quick and accurate, but completely fails if it is given a word which is not in its dictionary. As dictionary size grows, so too do the memory space requirements of the synthesis system. On the other hand, the rule-based approach works on any input, but the complexity of the rules grows substantially as the system takes into account irregular spellings or pronunciations. (Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced [v].) As a result, nearly all speech synthesis systems use a combination of these approaches.

Languages with a phonemic orthography have a very
regular writing system, and the prediction of the pronunciation of words based on their spellings is quite successful. Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings. On the other hand, speech synthesis systems for languages like English, which have extremely irregular spelling systems, are more likely to rely on dictionaries, and to use rule-based methods only for unusual words, or words that aren't in their dictionaries.

Evaluation challenges

The consistent evaluation of speech synthesis systems may be difficult because of a lack of universally agreed objective evaluation criteria. Different organizations often use different speech data. The quality of speech synthesis systems also depends to a large degree on the quality of the production technique (which may involve analogue or digital recording) and on the facilities used to replay the speech. Evaluating speech synthesis systems has therefore often been compromised by differences between production techniques and replay facilities. Recently, however, some researchers have started to evaluate speech synthesis systems using a common speech dataset.

Prosodics and emotional content

A study in the journal Speech Communication by Amy Drahota and colleagues at the University of Portsmouth, UK, reported that listeners to voice recordings could determine, at better-than-chance levels, whether or not the speaker was smiling. It was suggested that identification of the vocal features that signal emotional content may be used to help make synthesized speech sound more natural.

Computer operating systems or outlets with speech synthesis

Atari

Arguably the first speech system integrated into an operating system was that of the 1400XL/1450XL personal computers designed by Atari, Inc.,
using the Votrax SC01 chip, in 1983. The 1400XL/1450XL computers used a Finite State Machine to enable World English Spelling text-to-speech synthesis [31]. Unfortunately, the 1400XL/1450XL personal computers never shipped in quantity. The Atari ST computers were sold with "stspeech.tos" on floppy disk.

Apple

The first speech system integrated into an operating system that shipped in quantity was Apple Computer's MacInTalk, in 1984. The software was licensed from third-party developers Joseph Katz and Mark Barton (later, SoftVoice, Inc.) and was featured during the 1984 introduction of the Macintosh computer. Since the 1980s, Macintosh computers have offered text-to-speech capabilities through the MacinTalk software. In the early 1990s Apple expanded its capabilities, offering system-wide text-to-speech support. With the introduction of faster PowerPC-based computers, they included higher-quality voice sampling. Apple also introduced speech recognition into its systems, which provided a fluid command set. More recently, Apple has added sample-based voices. Starting as a curiosity, the speech system of the Apple Macintosh has evolved into a fully supported program, PlainTalk, for people with vision problems. VoiceOver was featured for the first time in Mac OS X Tiger (10.4). During 10.4 (Tiger) and the first releases of 10.5 (Leopard), there was only one standard voice shipping with Mac OS X. Starting with 10.6 (Snow Leopard), the user can choose from a wide-range list of multiple voices. VoiceOver voices feature the taking of realistic-sounding breaths between sentences, as well as improved clarity at high read rates over PlainTalk. Mac OS X also includes "say", a command-line-based application that converts text to audible speech. The AppleScript Standard Additions include a "say" verb that allows a script to use any of the installed voices and to control the pitch, speaking rate, and modulation of the spoken text.

The Apple iOS operating system, used on the iPhone, iPad, and iPod Touch, uses VoiceOver speech synthesis for accessibility.
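The `say` utility described above can also be driven from other programs. The sketch below is a minimal illustration, not Apple documentation: the helper name `build_say_command` is our own invention, while the flags it uses (`-v` for voice, `-r` for speaking rate in words per minute, `-o` to write audio to a file) are ones the macOS tool accepts. The command is only executed when `say` is actually present on the system.

```python
import shutil
import subprocess


def build_say_command(text, voice=None, rate=None, out_file=None):
    """Assemble an argument list for the macOS `say` CLI.

    -v selects an installed voice, -r sets the speaking rate in
    words per minute, -o writes audio to a file instead of
    playing it aloud.
    """
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]
    if rate:
        cmd += ["-r", str(rate)]
    if out_file:
        cmd += ["-o", out_file]
    cmd.append(text)
    return cmd


if __name__ == "__main__":
    cmd = build_say_command("Hello from the speech synthesizer",
                            voice="Alex", rate=180)
    if shutil.which("say"):  # the tool exists only on macOS
        subprocess.run(cmd, check=True)
    else:
        print(" ".join(cmd))
```

Separating command construction from execution keeps the platform-specific part (actually invoking `say`) behind a guard, so the same script degrades gracefully on systems without the tool.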
Some third-party applications also provide speech synthesis to facilitate navigating, reading web pages, or translating text.

AmigaOS

The second operating system with advanced speech synthesis capabilities was AmigaOS, introduced in 1985. The voice synthesis was licensed by Commodore International from SoftVoice, Inc., who also developed the original MacinTalk text-to-speech system. It featured a complete system of voice emulation, with both male and female voices and "stress" indicator markers, made possible by advanced features of the Amiga hardware audio chipset [33]. It was divided into a narrator device and a translator library. Amiga Speak Handler featured a text-to-speech translator. AmigaOS considered speech synthesis a virtual hardware device, so the user could even redirect console output to it. Some Amiga programs, such as word processors, made extensive use of the speech system.

Microsoft Windows

See also: Microsoft Agent

Modern Windows desktop systems can use SAPI 4 and SAPI 5 components to support speech synthesis and speech recognition. SAPI 4.0 was available as an optional add-on for Windows 95 and Windows 98. Windows 2000 added Narrator, a text-to-speech utility for people who have visual handicaps. Third-party programs such as CoolSpeech, Textaloud, and Ultra Hal can perform various text-to-speech tasks, such as reading text aloud from a specified website, email account, text document, the Windows clipboard, the user's keyboard typing, etc. Not all programs can use speech synthesis directly [34]. Some programs can use plug-ins, extensions, or add-ons to read text aloud. Third-party programs are available that can read text from the system clipboard.

Microsoft Speech Server is a server-based package for voice synthesis and recognition. It is designed for network use with web applications and call centers.

Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS engine converts written text to a phonemic representation, then converts the phonemic representation to waveforms
that can be output as sound. TTS engines with different languages, dialects, and specialized vocabularies are available through third-party publishers.

Android

Version 1.6 of Android added support for speech synthesis (TTS).

Internet

Currently, there are a number of applications, plugins, and gadgets that can read messages directly from an e-mail client, and web pages from a web browser or Google Toolbar, such as Text-to-voice, which is an add-on to Firefox. Some specialized software can narrate RSS feeds. On one hand, online RSS narrators simplify information delivery by allowing users to listen to their favourite news sources and to convert them to podcasts. On the other hand, online RSS readers are available on almost any PC connected to the Internet. Users can download generated audio files to portable devices, e.g. with the help of a podcast receiver, and listen to them while walking, jogging, or commuting to work.

A growing field in Internet-based TTS is web-based assistive technology, e.g. Browsealoud (from a UK company) and Readspeaker. It can deliver TTS functionality to anyone (for reasons of accessibility, convenience, entertainment, or information) with access to a web browser. The non-profit project Pediaphon was created in 2006 to provide a similar web-based TTS interface to Wikipedia. Other work is being done in the context of the W3C through the W3C Audio Incubator Group, with the involvement of the BBC and Google Inc.

Others

Some e-book readers feature speech synthesis, such as the Amazon Kindle, Samsung E6, PocketBook eReader Pro, enTourage eDGe, and the Bebook Neo. Some models of Texas Instruments home computers produced in 1979 and 1981 (Texas Instruments TI-99/4 and TI-99/4A) were capable of text-to-phoneme synthesis, or of reciting complete words and phrases (text-to-dictionary), using a very popular Speech Synthesizer peripheral. TI used a proprietary codec to embed complete spoken phrases into applications, primarily video games. IBM's OS/2 Warp 4 included VoiceType, a precursor to IBM ViaVoice. Systems that operate on
free and open source
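The front-end techniques discussed earlier, text normalization of numbers and a dictionary-first, rule-fallback grapheme-to-phoneme step, can be sketched together. This is a deliberately tiny illustration under our own assumptions: the three-word lexicon, the one-letter-per-sound rules, and the pair-wise "year style" number expansion are all invented for the example, and real systems use far larger dictionaries and context-sensitive rules.

```python
# Toy pronunciation dictionary: the dictionary-based approach
# succeeds only for words it actually contains.
LEXICON = {
    "of": "AH V",  # the irregular case noted above: "f" pronounced [v]
    "speech": "S P IY CH",
    "the": "DH AH",
}

# Toy letter-to-sound rules: the rule-based fallback. Real rule sets
# are far larger because English spelling is so irregular.
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K S",
    "y": "Y", "z": "Z",
}

ONES = ["zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]
TEENS = {10: "ten", 11: "eleven", 12: "twelve", 13: "thirteen",
         14: "fourteen", 15: "fifteen", 16: "sixteen",
         17: "seventeen", 18: "eighteen", 19: "nineteen"}
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def two_digit(n):
    """Spell out 0-99 in words."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])


def expand_number(token):
    """Expand a 4-digit number pair-wise, e.g. 1325 -> thirteen twenty five.
    This is only one of several context-dependent readings; a real
    normalizer would choose among them using surrounding words and
    punctuation, as described above."""
    if len(token) == 4:
        return two_digit(int(token[:2])) + " " + two_digit(int(token[2:]))
    return " ".join(ONES[int(d)] for d in token)


def to_phonemes(word):
    """Dictionary-first, rule-fallback grapheme-to-phoneme conversion."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return " ".join(LETTER_RULES[ch] for ch in word if ch in LETTER_RULES)


def front_end(text):
    """Normalize, then transcribe: a toy symbolic linguistic representation."""
    pron = []
    for token in text.split():
        if token.isdigit():
            pron.extend(to_phonemes(w) for w in expand_number(token).split())
        else:
            pron.append(to_phonemes(token))
    return pron
```

For example, `front_end("the speech of 1325")` first expands the number into words and then transcribes each resulting word, looking in the lexicon before falling back to the letter rules, which mirrors the hybrid approach the text says nearly all systems use.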
