Appendix A: English-Language Source

Speaker Recognition

By Judith A. Markowitz, J. Markowitz Consultants

Speaker recognition uses features of a person's voice to identify or verify that person. It is a well-established biometric, with commercial systems that are more than 10 years old and deployed non-commercial systems that are more than 20 years old. This paper describes how speaker recognition systems work and how they are used in applications.

1. Introduction

Speaker recognition (also called voice ID and voice biometrics) is the only human-biometric technology in commercial use today that extracts information from sound patterns. It is also one of the most well-established biometrics, with deployed commercial applications that are more than 10 years old and non-commercial systems that are more than 20 years old.

2. How Do Speaker-Recognition Systems Work?

Speaker-recognition systems use features of a person's voice and speaking style to:

- attach an identity to the voice of an unknown speaker
- verify that a person is who she or he claims to be
- separate one person's voice from other voices in a multi-speaker environment

The first operation is called speaker identification or speaker recognition; the second has many names, including speaker verification, speaker authentication, voice verification, and voice recognition; the third is speaker separation or, in some situations, speaker classification. This paper focuses on speaker verification, the most highly commercialized of these technologies.

2.1 Overview of the Process

Speaker verification is a biometric technology used for determining whether a person is who she or he claims to be. It should not be confused with speech recognition, a non-biometric technology used for identifying what a person is saying. Speech recognition products are not designed to determine who is speaking.

Speaker verification begins with a claim of identity (see Figure A1).
Usually, the claim entails manual entry of a personal identification number (PIN), but a growing number of products allow spoken entry of the PIN and use speech recognition to identify the numeric code. Some applications replace manual or spoken PIN entry with bank cards, smartcards, or the number of the telephone being used. PINs are also eliminated when a speaker-verification system contacts the user, an approach typical of systems used to monitor home-incarcerated criminals.

Figure A1. Steps shown in the figure: (1) the user claims an identity; (2) the system retrieves the stored voiceprint for that identity; (3) the system prompts the user for a password; (4) the user speaks the password; (5) the system compares the spoken password with the stored sample; (6) the system compares the result with the threshold; (7) the system accepts or rejects the identity claim.

Once the identity claim has been made, the system retrieves the stored voice sample (called a voiceprint) for the claimed identity and requests spoken input from the person making the claim. Usually, the requested input is a password. The newly input speech is compared with the stored voiceprint and the results of that comparison are measured against an acceptance/rejection threshold. Finally, the system accepts the speaker as the authorized user, rejects the speaker as an impostor, or takes another action determined by the application. Some systems report a confidence level or other score indicating how confident the system is about its decision.

If the verification is successful, the system may update the acoustic information in the stored voiceprint. This process is called adaptation. Adaptation is an unobtrusive solution for keeping voiceprints current and is used by many commercial speaker-verification systems.

2.2 The Speech Sample

As with all biometrics, before verification (or identification) can be performed the person must provide a sample of speech (a step called enrolment). The sample is used to create the stored voiceprint.

Systems differ in the type and amount of speech needed for enrolment and verification. The basic divisions among these systems are:

- text dependent
- text independent
- text prompted

2.2.1 Text Dependent

Most commercial systems are text dependent. Text-dependent systems expect the speaker to say a pre-determined phrase, password, or ID.
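The verification flow of Section 2.1 (retrieve the voiceprint for the claimed identity, compare, apply a threshold, optionally adapt) can be sketched roughly as follows. This is a minimal illustration, not how any particular product works: the fixed-length feature vectors, the cosine-similarity comparison, the `threshold` value, and the `adapt_rate` blending rule are all assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity in [-1, 1]; stands in for a real acoustic comparison (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(
    claimed_id: str,
    sample: list[float],
    store: dict[str, list[float]],
    threshold: float = 0.85,   # hypothetical acceptance/rejection threshold
    adapt_rate: float = 0.1,   # hypothetical weight given to the new sample
) -> tuple[bool, float]:
    """Return (accepted, score) for an identity claim."""
    voiceprint = store.get(claimed_id)
    if voiceprint is None:
        # No enrolled voiceprint for this identity: reject the claim.
        return False, 0.0

    score = cosine_similarity(sample, voiceprint)
    accepted = score >= threshold

    if accepted:
        # Adaptation: blend the accepted sample into the stored voiceprint
        # so that it stays current as the speaker's voice drifts.
        store[claimed_id] = [
            (1.0 - adapt_rate) * v + adapt_rate * s
            for v, s in zip(voiceprint, sample)
        ]
    return accepted, score


# Usage: one enrolled speaker, one matching and one non-matching attempt.
voiceprints = {"1234": [1.0, 0.0]}
print(verify("1234", [0.9, 0.1], voiceprints))  # close match, accepted
print(verify("1234", [0.0, 1.0], voiceprints))  # poor match, rejected
```

Adapting only after an accepted attempt mirrors the text: a rejected sample never alters the stored voiceprint.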
By controlling the words that are spoken, the system can look for a close match with the stored voiceprint. Typically, each person selects a private password, although some administrators prefer to assign passwords. Passwords offer extra security, requiring an impostor to know the correct PIN and password and to have a matching voice. Some systems further enhance security by not storing a human-readable representation of the password.

A global phrase may also be used. In its 1996 pilot of speaker verification, Chase Manhattan Bank used the phrase "Verification by Chemical Bank". Global phrases avoid the problem of forgotten passwords, but lack the added protection offered by private passwords.

2.2.2 Text Independent

Text-independent systems ask the person to talk. What the person says is different every time. It is extremely difficult to accurately compare utterances that are totally different from each other, particularly in noisy environments or over poor telephone connections. Consequently, commercial deployment of text-independent verification has been limited.

2.2.3 Text Prompted

Text-prompted systems (also called challenge response) ask speakers to repeat one or more randomly selected numbers or words (e.g. "43516", "27, 46", or "Friday, computer"). Text prompting adds time to enrolment and verification, but it enhances security against tape recordings. Since the items to be repeated cannot be predicted, it is extremely difficult for an impostor to play back a suitable recording. Furthermore, there is no password to forget, although the PIN, if one is used, may still be forgotten.

2.3 Anti-speaker Modelling

Most systems compare the new speech sample with the stored voiceprint for the claimed identity. Other systems also compare the newly input speech with the voices of other people. Such techniques are called anti-speaker modelling.
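The text-prompted (challenge-response) idea from Section 2.2.3 can be illustrated with a short sketch. It assumes a separate speech recognizer has already turned the reply into text and a separate voice comparison has produced a score; the function names, the five-digit challenge length, and the threshold are hypothetical choices for the example.

```python
import secrets

DIGITS = "0123456789"


def make_challenge(num_digits: int = 5) -> str:
    """Pick an unpredictable digit string for the speaker to repeat."""
    return "".join(secrets.choice(DIGITS) for _ in range(num_digits))


def check_response(
    challenge: str,
    recognized_text: str,    # assumed output of a speech recognizer
    voice_score: float,      # assumed output of the voice comparison
    threshold: float = 0.85, # hypothetical acceptance threshold
) -> bool:
    """Accept only if the right digits were spoken AND the voice matches."""
    spoken_digits = "".join(ch for ch in recognized_text if ch.isdigit())
    return spoken_digits == challenge and voice_score >= threshold


# Usage: a replayed recording of an earlier session contains the wrong digits,
# so it fails even when the recorded voice itself would match.
challenge = make_challenge()
print(challenge)
print(check_response("43516", "4 3 5 1 6", voice_score=0.93))  # True
print(check_response("43516", "2 7 4 6 0", voice_score=0.93))  # False
```

Drawing the digits from `secrets` rather than `random` keeps the challenge unpredictable, which is the property the text relies on for resistance to recordings.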
The underlying philosophy of anti-speaker modelling is that, under any conditions, a voice sample from a particular speaker will be more like other samples from that person than like voice samples from other speakers. If, for example, the speaker is using a bad telephone connection and the match with the speaker's voiceprint is poor, it is likely that the scores for the cohorts (or world model) will be even worse.

The most common anti-speaker techniques are:

- discriminative training
- cohort modelling
- world models

Discriminative training builds the comparisons into the voiceprint of the new speaker using the voices of the other speakers in the system. Cohort modelling selects a small set of speakers whose voices are similar to that of the person being enrolled. Cohorts are, for example, always the same sex as the speaker. When the speaker attempts verification, the incoming speech is compared with his or her stored voiceprint and with the voiceprints of each of the cohort speakers. World models (also called background models or composite models) contain a cross-section of voices. The same world model is used for all speakers.

2.4 Physical and Behavioural Biometrics

Speaker recognition is often characterized as a behavioural biometric. This description is set in contrast with physical biometrics, such as fingerprinting and iris scanning. Unfortunately, its classification as a behavioural biometric promotes the misunderstanding that speaker recognition is entirely (or almost entirely) behavioural. If that were the case, good mimics would have no difficulty defeating speaker-recognition systems. Early studies determined this was not the case and identified mimic-resistant factors. Those factors reflect the size and shape of a speaker's speaking mechanism (called the vocal tract).

The physical/behavioural classification also implies that performance of physical biometrics is not heavily influenced by behaviour.
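The reasoning behind cohort and world models in Section 2.3 is that a decision based on relative scores survives bad channel conditions better than one based on an absolute score. A minimal sketch of that idea follows; subtracting the best cohort score (or the world-model score) from the claimed speaker's score is one simple normalization chosen here for illustration, and the `margin` value is an assumption.

```python
def cohort_normalized_score(claimed_score: float, cohort_scores: list[float]) -> float:
    """Claimed speaker's score relative to the best-matching cohort speaker."""
    return claimed_score - max(cohort_scores)


def world_normalized_score(claimed_score: float, world_score: float) -> float:
    """Claimed speaker's score relative to a single shared world model."""
    return claimed_score - world_score


def accept(claimed_score: float, cohort_scores: list[float], margin: float = 0.2) -> bool:
    """Accept when the claimed voiceprint beats every cohort by `margin` (assumed value)."""
    return cohort_normalized_score(claimed_score, cohort_scores) > margin


# Clean line: the true speaker scores high, cohorts clearly lower.
print(accept(0.90, [0.60, 0.50]))  # True
# Bad line: every score drops, but the true speaker still stands out.
print(accept(0.55, [0.30, 0.20]))  # True
# Impostor: the sample resembles a cohort speaker as much as the claimed one.
print(accept(0.50, [0.55, 0.40]))  # False
```

An absolute threshold of, say, 0.85 would have rejected the true speaker on the bad line; the relative comparison does not.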
This misconception has led to the design of biometric systems that are unnecessarily vulnerable to careless and resistant users. This is unfortunate because it has delayed good human-factors design for those biometrics.

3. How is Speaker Verification Used?

Speaker verification is well established as a means of providing biometric-based security for:

- telephone networks
- site access
- data and data networks

and monitoring of:

- criminal offenders in community release programmes
- outbound calls by incarcerated felons
- time and attendance

3.1 Telephone Networks

Toll fraud (theft of long-distance telephone services) is a growing problem that costs telecommunications services providers, government, and private industry US$3-5 billion annually in the United States alone. The major types of toll fraud include the following:

- Hacking CPE
- Calling card fraud
- Call forwarding
- Prisoner toll fraud
- Hacking 800 numbers
- Call sell operations
- 900 number fraud
- Switch/network hits
- Social engineering
- Subscriber fraud
- Cloning wireless telephones

Among the most damaging are theft of services from customer premises equipment (CPE), such as PBXs, and cloning of wireless telephones. Cloning involves stealing the ID of a telephone and programming other phones with it. Subscriber fraud, a growing problem in Europe, involves enrolling for services, usually under an alias, with no intention of paying for them.

Speaker verification has two features that make it ideal for telephone and telephone network security: it uses voice input and it is not bound to proprietary hardware. Unlike most other biometrics, which need specialized input devices, speaker verification operates with standard wireline and/or wireless telephones over existing telephone networks. Reliance on input devices created by other manufacturers for a purpose other than speaker verification also means that speaker verification cannot expect the consistency and quality offered by a proprietary input device.
Speaker verification must overcome differences in input quality and in the way speech frequencies are processed. This variability is produced by differences in network type (e.g. wireline vs. wireless), unpredictable noise levels on the line and in the background, transmission inconsistency, and differences in the microphones in telephone handsets. Sensitivity to such variability is reduced through techniques such as speech enhancement and noise modelling, but products still need to be tested under expected conditions of use.

Applications of speaker verification on wireline networks include secure calling cards, interactive voice response (IVR) systems, and integration with security for proprietary network systems. Such applications have been deployed by organizations as diverse as the University of Maryland, the Department of Foreign Affairs and International Trade Canada, and AMOCO. Wireless applications focus on preventing cloning but are being extended to subscriber fraud. The European Union is also actively applying speaker verification to telephony in various projects, including Caller Verification in Banking and Telecommunications, COST250, and Picasso.

3.2 Site Access

The first deployment of speaker verification, more than 20 years ago, was for site access control. Since then, speaker verification has been used to control access to office buildings, factories, laboratories, bank vaults, homes, pharmacy departments in hospitals, and even access to the US and Canada. Since April 1997, the US Immigration and Naturalization Service (INS) and other US and Canadian agencies have been using speaker verification to control after-hours border crossings at the Scobey, Montana port-of-entry.
The INS is now testing a combination of speaker verification and face recognition in the commuter lanes of other ports-of-entry.

3.3 Data and Data Networks

Growing threats of unauthorized penetration of computing networks, concerns about security of the Internet, and increases in off-site employees with data access needs have produced an upsurge in the application of speaker verification to data and network security.

The financial services industry has been a leader in using speaker verification to protect proprietary data networks, electronic funds transfer between banks, access to customer accounts for telephone banking, and employee access to sensitive financial information. The Illinois Department of Revenue, for example, uses speaker verification to allow secure access to tax data by its off-site auditors.

3.4 Corrections

In 1993, there were 4.8 million adults under correctional supervision in the United States, and that number continues to increase. Community release programmes, such as parole and home detention, are the fastest growing segments of this industry. It is no longer possible for corrections officers to provide adequate monitoring of those people.

In the US, corrections agencies have turned to electronic monitoring systems. Since the late 1980s, speaker verification has been one of those electronic monitoring tools. Today, several products are used by corrections agencies, including an alcohol breathalyzer with speaker verification for people convicted of driving while intoxicated and a system that calls offenders on home detention at random times during the day.

Speaker verification also controls telephone calls made by incarcerated felons. Inmates place a lot of calls. In 1994, US telecommunications services providers made $1.5 billion on outbound calls from inmates. Most inmates have restrictions on whom they can call.
Speaker verification ensures that an inmate is not using another inmate's PIN to make a forbidden contact.

3.5 Time and Attendance

Time and attendance applications are a small but growing segment of the speaker-verification market. SOC Credit Union in Michigan has used speaker verification for time and attendance monitoring of part-time employees for several years. Like many others, SOC Credit Union first deployed speaker verification for security and later extended it to time and attendance monitoring.

4. Standards

This paper concludes with a short discussion of application programming interface (API) standards. An API contains the function calls that enable programmers to use speaker verification to create a product or application. Until April 1997, when the Speaker Verification API (SVAPI) standard was introduced, all available APIs for biometric products were proprietary. SVAPI remains the only API standard covering a specific biometric. It is now being incorporated into proposed generic biometric API standards. SVAPI was developed by a cross-section of speaker-recognition vendors, consultants, and end-user organizations to address a spectrum of needs and to support a broad range of product features. Because it supports both high-level functions (e.g. calls to enrol) and low-level functions (e.g. choices of audio input features), it facilitates development of different types of applications by both novice and experienced developers.

Why is it important to support API standards? Developers using a product with a proprietary API face difficult choices if the vendor of that product goes out of business, fails to support its product, or does not keep pace with technological advances. One of those choices is to rebuild the application from scratch using a different product. Given the same events, developers using an SVAPI-compliant product can select another compliant vendor and need perform far fewer modifications.
Consequently, SVAPI makes development with speaker verification less risky and less costly. The advent of generic biometric API standards further facilitates integration of speaker verification with other biometrics. All of this helps speaker-verification vendors because it fosters growth in the marketplace. In the final analysis, active support of API standards by developers and vendors benefits everyone.
