Sphinx-4 Application Programmers Guide

This tutorial shows you how to write Sphinx-4 applications. We will use the HelloWorld demo as an example to show how a simple application can be written. We will then proceed to a more complex example. Accordingly, this tutorial is divided into the following parts:

1. Simple Example - HelloWorld
   o Code Walk - HelloWorld.java
   o Configuration File Walk - helloworld.config.xml
       Recognizer
       Decoder
       Linguist
       Acoustic Model
       Front End
       Instrumentation
2. More Complex Example - Hello NGram
   o Code Walk - HelloNGram.java
   o N-Gram Language Model
   o Configuration File Walk - hellongram.config.xml
3. Two ways of configuring Sphinx4
   o Configuration Management
   o Raw Configuration
4. Interpreting the Recognition Result
5. Writing Sphinx4 Scripts
   o Groovy
   o Python
   o Clojure

1. Simple Example - HelloWorld

We will look at a very simple Sphinx-4 speech application, the HelloWorld demo. This application recognizes a very restricted type of speech: greetings. As you will see, the code is very simple. The harder part is understanding the configuration, but we will guide you through every step of it. Let's look at the code first.

Code Walk - HelloWorld.java

All the source code of the HelloWorld demo is in one short file, sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/HelloWorld.java:

    package edu.cmu.sphinx.demo.helloworld;

    import edu.cmu.sphinx.frontend.util.Microphone;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;

    /**
     * A simple HelloWorld demo showing a simple speech application built using Sphinx-4. This application uses the Sphinx-4
     * endpointer, which automatically segments incoming audio into utterances and silences.
     */
    public class HelloWorld {

        public static void main(String[] args) {
            ConfigurationManager cm;

            // Use the configuration file given on the command line, or fall back to the bundled one.
            if (args.length > 0) {
                cm = new ConfigurationManager(args[0]);
            } else {
                cm = new ConfigurationManager(HelloWorld.class.getResource("helloworld.config.xml"));
            }

            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            recognizer.allocate();

            // Start the microphone, or exit the program if this is not possible.
            Microphone microphone = (Microphone) cm.lookup("microphone");
            if (!microphone.startRecording()) {
                System.out.println("Cannot start microphone.");
                recognizer.deallocate();
                System.exit(1);
            }

            System.out.println("Say: (Good morning | Hello) ( Bhiksha | Evandro | Paul | Philip | Rita | Will )");

            // Loop the recognition until the program exits.
            while (true) {
                System.out.println("Start speaking. Press Ctrl-C to quit.\n");

                Result result = recognizer.recognize();

                if (result != null) {
                    String resultText = result.getBestFinalResultNoFiller();
                    System.out.println("You said: " + resultText + '\n');
                } else {
                    System.out.println("I can't hear what you said.\n");
                }
            }
        }
    }

This demo imports several important classes in Sphinx-4:

    edu.cmu.sphinx.recognizer.Recognizer
    edu.cmu.sphinx.result.Result
    edu.cmu.sphinx.util.props.ConfigurationManager

The Recognizer is the main class any application should interact with. The Result is returned by the Recognizer to the application after recognition completes. The ConfigurationManager creates the entire Sphinx-4 system according to the configuration specified by the user.

Let's look at the main() method. The first few lines create the URL of the XML-based configuration file. A ConfigurationManager is then created using that URL. The ConfigurationManager then reads in the file internally. Since the configuration file specifies the components recognizer and microphone (we will look at the configuration file next), we perform a lookup() in the ConfigurationManager to obtain these components. The allocate() method of the Recognizer is then called to allocate the resources needed for the recognizer.
The Microphone class is used for capturing live audio from the system audio device. Both the Recognizer and the Microphone are configured as specified in the configuration file.

Once all the necessary components are created, we can start running the demo. The program first turns on the Microphone (microphone.startRecording()). After the microphone is turned on successfully, the program enters a loop that repeats the following steps. It tries to recognize what the user is saying, using the Recognizer.recognize() method. Recognition stops when the user stops speaking, which is detected by the endpointer built into the front end by configuration. Once an utterance is recognized, the recognized text, which is returned by the method Result.getBestFinalResultNoFiller(), is printed out. If the Recognizer recognized nothing (i.e., result is null), the program prints a message saying so. Finally, if the demo program cannot turn on the microphone in the first place, the Recognizer is deallocated and the program exits. It is generally good practice to call the method deallocate() after the work is done, to release all the resources.

Note that several exceptions are thrown. These exceptions should be caught and handled appropriately.

Hopefully, by this point, you have some idea of how to write a simple Sphinx-4 application. We will now turn to the harder part: understanding the various components necessary to create a grammar-based recognizer. These components are specified in the configuration file, which we will now explain in depth.

Configuration File Walk - helloworld.config.xml

In this section, we will explain the various Sphinx-4 components that are used for the HelloWorld demo, as specified in the configuration file. We will look at each section of the config file in depth. If you want to learn about the format of these configuration files, please refer to the document Sphinx-4 Configuration Management.

The frequently tuned properties are defined first.
They are located at the top of the configuration file so that they can be edited quickly.

Recognizer

The recognizer component performs speech recognition. Its definition gives the name and class of the recognizer, Recognizer. This is the class that any application should interact with. If you look at the javadoc of the Recognizer class, you will see that it has two properties, decoder and monitors. The configuration file is where the values of these properties are defined. The monitors listed are accuracyTracker, speedTracker and memoryTracker. We will explain the monitors later. For now, let's look at the decoder.

Decoder

The decoder property of the recognizer is set to the component called decoder, which is of class edu.cmu.sphinx.decoder.Decoder. Its property searchManager is set to the component searchManager, which is of class edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager. This class performs a simple breadth-first search through the search graph during the decoding process to find the best path. This search manager is suitable for small to medium sized vocabulary decoding.

The logMath property is the log math that is used for calculation of scores during the search process. It is defined as having a log base of 1.0001. Note that typically the same log base should be used throughout all components, and therefore there should be only one logMath definition in a configuration file.

The linguist of the searchManager is set to the component flatLinguist (which we will look at later), which again is suitable for small to medium sized vocabulary decoding. The pruner is set to the trivialPruner, which is of class edu.cmu.sphinx.decoder.pruner.SimplePruner. This pruner performs simple absolute beam and relative beam pruning based on the scores of the tokens.

The scorer of the searchManager is set to the component threadedScorer, which is of class edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer.
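The absolute and relative beam pruning mentioned above can be sketched in plain Java. This is an illustration only, not the SimplePruner source. The class BeamPruningSketch, its method names and the sample scores are hypothetical. We also assume that the relative beam is given as a linear probability ratio and converted to the log domain. Only the log base 1.0001 comes from the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of absolute and relative beam pruning over log-domain scores.
final class BeamPruningSketch {
    // Log base taken from the logMath description in the text.
    static final double LOG_BASE = 1.0001;

    /** Converts a linear probability into a log-domain score with base LOG_BASE. */
    static double linearToLog(double linear) {
        return Math.log(linear) / Math.log(LOG_BASE);
    }

    /**
     * Keeps the best scores, dropping any score below (best * relativeBeamWidth)
     * and keeping at most absoluteBeamWidth scores overall.
     */
    static List<Double> prune(List<Double> logScores, int absoluteBeamWidth, double relativeBeamWidth) {
        List<Double> sorted = new ArrayList<>(logScores);
        sorted.sort(Collections.reverseOrder()); // best (highest) score first
        List<Double> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        // Multiplying by the beam in the linear domain is an addition in the log domain.
        double threshold = sorted.get(0) + linearToLog(relativeBeamWidth);
        for (double score : sorted) {
            if (kept.size() >= absoluteBeamWidth || score < threshold) {
                break;
            }
            kept.add(score);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> scores = new ArrayList<>();
        for (double p : new double[] {0.5, 0.2, 0.01, 1e-9}) {
            scores.add(linearToLog(p));
        }
        System.out.println(prune(scores, 2, 1e-3).size());  // absolute beam limits the result
        System.out.println(prune(scores, 10, 1e-3).size()); // relative beam limits the result
    }
}
```

With the sample scores, an absolute beam of 2 keeps two tokens. A wide absolute beam of 10 keeps three, because the token with probability 1e-9 falls outside the relative beam.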
The threadedScorer can use multiple threads (usually one per CPU) to score the tokens in the active list. Scoring is one of the most time-consuming steps of the decoding process. Tokens can be scored independently of each other, so using multiple CPUs speeds up scoring. The frontend property of the threadedScorer is the front end from which features are obtained. For details about the other properties of the threadedScorer, please refer to the javadoc for ThreadedAcousticScorer.

Finally, the activeListFactory property of the searchManager is set to the component activeList, which is defined as follows:

    <component name="activeList" type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
        <property name="logMath" value="logMath"/>
        <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
        <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
    </component>

It is of class edu.cmu.sphinx.decoder.search.PartitionActiveListFactory. It uses a partitioning algorithm to select the top N highest scoring tokens when performing absolute beam pruning. The logMath property specifies the logMath used for score calculation, which is the same LogMath used in the searchManager. The property absoluteBeamWidth is set to the value given at the very top of the configuration file, using ${absoluteBeamWidth}. The same applies to ${relativeBeamWidth}.

Linguist

Now let's look at the flatLinguist component (a component inside the searchManager). The linguist is the component that generates the search graph using the guidance from the grammar, and knowledge from the dictionary, acoustic model, and language model. It also uses the logMath that we've seen already.

The grammar used is the component called jsgfGrammar, which is a BNF-style grammar. JSGF grammars are defined in JSAPI. The class that translates JSGF into a form that Sphinx-4 understands is edu.cmu.sphinx.jsapi.JSGFGrammar. (Note that the javadoc for this class also describes the limitations of the current implementation.)
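To see how restricted the language of such a grammar is, the sketch below enumerates every sentence of the form shown in the demo's prompt, (Good morning | Hello) followed by one of six names. It is plain Java and does not use JSGFGrammar or any Sphinx-4 class. The class HelloGrammarSketch and its methods are hypothetical, and the exact casing of recognized text is not taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerates the sentences allowed by the greeting pattern in the demo's prompt.
final class HelloGrammarSketch {
    // Alternatives copied from the prompt printed by the HelloWorld demo.
    static final String[] GREETINGS = {"Good morning", "Hello"};
    static final String[] NAMES = {"Bhiksha", "Evandro", "Paul", "Philip", "Rita", "Will"};

    /** Returns every sentence made of one greeting followed by one name. */
    static List<String> sentences() {
        List<String> result = new ArrayList<>();
        for (String greeting : GREETINGS) {
            for (String name : NAMES) {
                result.add(greeting + " " + name);
            }
        }
        return result;
    }

    /** Returns true if the utterance is one of the enumerated sentences. */
    static boolean accepts(String utterance) {
        return sentences().contains(utterance);
    }

    public static void main(String[] args) {
        for (String sentence : sentences()) {
            System.out.println(sentence);
        }
    }
}
```

Two greetings and six names give twelve sentences in total, which is why a flat linguist and a simple search manager are sufficient for this task.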
The property grammarLocation can take two kinds of values. If it is a URL, it specifies the URL of the directory where JSGF grammar files are to be found. Otherwise, it is interpreted as a resource locator. In our example, the HelloWorld demo is deployed as a JAR file, so the grammarLocation property specifies the location of the resource hello.gram within the JAR file. Note that it is not necessary to specify the JAR file within which to search. The grammarName property specifies the grammar to use when creating the search graph. logMath is the same log math as in the other components.

The dictionary is the component that maps words to their phonemes. It is almost always the dictionary of the acoustic model, which lists all the words that were used to train the acoustic model. Its definition includes a wordReplacement property. The locations of the dictionary files are specified using the Sphinx-4 resource mechanism. The dictionary for filler words like BREATH and LIP_SMACK is the file fillerdict.

For details about the other possible properties, please refer to the javadoc for FastDictionary.

Acoustic Model

The next important property of the flatLinguist is the acoustic model, which describes the sounds of the language. wsj stands for the Wall Street Journal acoustic models. Sphinx-4 can load acoustic models trained by Sphinxtrain. Common models are packed into JAR files during the build and located in the lib folder. The Sphinx3Loader class is used to load them. The JAR needs to be included in the classpath. The JAR file for the WSJ models is called WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, and is in the sphinx4/lib directory. As a programmer, all you need to do is to specify the class of the AcousticModel and the loader of the AcousticModel (note that if you are using the WSJ model in other applications, these definitions should be the same, except that you might have called your logMath component something else).
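To make the role of the dictionary described earlier in this section more concrete, the sketch below maps words to phoneme sequences in plain Java. It is not FastDictionary. The class PronunciationDictionarySketch, the "WORD PH1 PH2 ..." line layout it parses, and the two sample entries are illustrative assumptions, not copied from the WSJ model files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a pronunciation dictionary: each word maps to a list of phonemes.
final class PronunciationDictionarySketch {
    private final Map<String, List<String>> entries = new HashMap<>();

    /** Adds one line of the assumed form "WORD PH1 PH2 ..."; lines without phonemes are ignored. */
    void addLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return;
        }
        List<String> phonemes = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
        entries.put(parts[0].toLowerCase(Locale.ROOT), phonemes);
    }

    /** Returns the phonemes for a word, or null if the word is not in the dictionary. */
    List<String> phonemes(String word) {
        return entries.get(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        PronunciationDictionarySketch dictionary = new PronunciationDictionarySketch();
        // Illustrative entries only.
        dictionary.addLine("HELLO HH AH L OW");
        dictionary.addLine("WILL W IH L");
        System.out.println(dictionary.phonemes("hello"));
        System.out.println(dictionary.phonemes("unknown"));
    }
}
```

A word missing from the dictionary has no phoneme sequence, which is why the dictionary normally has to cover every word in the grammar.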
An acoustic model can also be located in the filesystem or at any other resource. In that case, you need to specify the model location in the location property.

The next properties of the flatLinguist are the wordInsertionProbability and languageWeight. These properties are usually used for fine-tuning the system. Below are the default values we used for the various tasks. You can tune your system accordingly:

    Vocabulary Size                 Word Insertion Probability    Language Weight
    Digits (11 words - TIDIGITS)    1E-36                         8
    Small (