Sphinx-4 Application Programmer's Guide

This tutorial shows you how to write Sphinx-4 applications. We will use the HelloWorld demo as an example to show how a simple application can be written. We will then proceed to a more complex example. Consequently, this tutorial is divided into the following parts:

1. Simple Example - HelloWorld
   o Code Walk - HelloWorld.java
   o Configuration File Walk - helloworld.config.xml
      - Recognizer
      - Decoder
      - Linguist
      - Acoustic Model
      - Front End
      - Instrumentation
2. More Complex Example - Hello NGram
   o Code Walk - HelloNGram.java
   o N-Gram Language Model
   o Configuration File Walk - hellongram.config.xml
3. Two ways of configuring Sphinx-4
   o Configuration Management
   o Raw Configuration
4. Interpreting the Recognition Result
5. Writing Sphinx-4 Scripts
   o Groovy
   o Python
   o Clojure

1. Simple Example - HelloWorld

We will look at a very simple Sphinx-4 speech application, the HelloWorld demo. This application recognizes a very restricted type of speech - greetings. As you will see, the code is very simple. The harder part is understanding the configuration, but we will guide you through every step of it. Let's look at the code first.

Code Walk - HelloWorld.java

All the source code of the HelloWorld demo is in one short file, sphinx4/src/apps/edu/cmu/sphinx/demo/helloworld/HelloWorld.java:

package edu.cmu.sphinx.demo.helloworld;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

/**
 * A simple HelloWorld demo showing a simple speech application built using Sphinx-4.
 * This application uses the Sphinx-4 endpointer, which automatically segments incoming
 * audio into utterances and silences.
 */
public class HelloWorld {

    public static void main(String[] args) {
        ConfigurationManager cm;

        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(HelloWorld.class.getResource("helloworld.config.xml"));
        }

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone, or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        System.out.println("Say: (Good morning | Hello) ( Bhiksha | Evandro | Paul | Philip | Rita | Will )");

        // loop the recognition until the program exits.
        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestFinalResultNoFiller();
                System.out.println("You said: " + resultText + "\n");
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }
}

This demo imports several important classes in Sphinx-4:

edu.cmu.sphinx.recognizer.Recognizer
edu.cmu.sphinx.result.Result
edu.cmu.sphinx.util.props.ConfigurationManager

The Recognizer is the main class any application should interact with. The Result is returned by the Recognizer to the application after recognition completes. The ConfigurationManager creates the entire Sphinx-4 system according to the configuration specified by the user.

Let's look at the main() method. The first few lines create the URL of the XML-based configuration file. A ConfigurationManager is then created using that URL and reads in the file internally. Since the configuration file specifies the components "recognizer" and "microphone" (we will look at the configuration file next), we perform a lookup() in the ConfigurationManager to obtain these components.
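If your configuration file lives outside the demo JAR, the same pattern works with an ordinary file URL. The following is a minimal sketch; the class name HelloWorldFromFile and the file path are purely illustrative and not part of the demo:

import java.io.File;
import java.net.URL;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class HelloWorldFromFile {

    public static void main(String[] args) throws Exception {
        // Hypothetical variation on the demo above: build the configuration URL
        // from a file on disk instead of a classpath resource (path is illustrative).
        URL configURL = new File("config/helloworld.config.xml").toURI().toURL();
        ConfigurationManager cm = new ConfigurationManager(configURL);

        // the component lookups work exactly as in the demo above
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        Microphone microphone = (Microphone) cm.lookup("microphone");

        // ... allocate the recognizer and run the recognition loop as shown above
    }
}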
The allocate() method of the Recognizer is then called to allocate the resources needed for the recognizer. The Microphone class is used for capturing live audio from the system audio device. Both the Recognizer and the Microphone are configured as specified in the configuration file.

Once all the necessary components are created, we can start running the demo. The program first turns on the Microphone (microphone.startRecording()). After the microphone is turned on successfully, the program enters a loop that repeats the following: it tries to recognize what the user is saying, using the Recognizer.recognize() method. Recognition stops when the user stops speaking, which is detected by the endpointer built into the front end by configuration. Once an utterance is recognized, the recognized text, which is returned by the method Result.getBestFinalResultNoFiller(), is printed out. If the Recognizer recognized nothing (i.e., the result is null), a message saying so is printed instead. Finally, if the demo program cannot turn on the microphone in the first place, the Recognizer is deallocated and the program exits. It is generally good practice to call the method deallocate() after the work is done to release all the resources.

Note that several exceptions are thrown. These exceptions should be caught and handled appropriately.

Hopefully, by this point, you will have some idea of how to write a simple Sphinx-4 application. We will now turn to the harder part, understanding the various components necessary to create a grammar-based recognizer. These components are specified in the configuration file, which we will now explain in depth.

Configuration File Walk - helloworld.config.xml

In this section, we will explain the various Sphinx-4 components that are used for the HelloWorld demo, as specified in the configuration file. We will look at each section of the config file in depth. If you want to learn about the format of these configuration files, please refer to the document Sphinx-4 Configuration Management.

The first lines of the configuration file define the frequently tuned properties, such as the absolute and relative beam widths. They are located at the top of the configuration file so that they can be edited quickly.

Recognizer

The configuration file then defines the recognizer component that performs speech recognition, giving its name and its class, Recognizer. This is the class that any application should interact with. If you look at the javadoc of the Recognizer class, you will see that it has two properties, decoder and monitors; this configuration file is where the values of these properties are defined. The monitors property lists the components accuracyTracker, speedTracker, and memoryTracker. We will explain the monitors later. For now, let's look at the decoder.

Decoder

The decoder property of the recognizer is set to the component called decoder, which is of class edu.cmu.sphinx.decoder.Decoder. Its searchManager property is set to the component searchManager, which is of class edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager. This class performs a simple breadth-first search through the search graph during the decoding process to find the best path. This search manager is suitable for small to medium sized vocabulary decoding. The logMath property is the log math that is used for calculation of scores during the search process. It is defined as having the log base 1.0001.
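Taken together, the recognizer, decoder, searchManager, and logMath components described above might be declared along the following lines in helloworld.config.xml. This is an illustrative sketch based on the class and property names given in this walk-through, not the verbatim file; the remaining searchManager properties are covered next:

<component name="recognizer" type="edu.cmu.sphinx.recognizer.Recognizer">
    <property name="decoder" value="decoder"/>
    <propertylist name="monitors">
        <item>accuracyTracker</item>
        <item>speedTracker</item>
        <item>memoryTracker</item>
    </propertylist>
</component>

<component name="decoder" type="edu.cmu.sphinx.decoder.Decoder">
    <property name="searchManager" value="searchManager"/>
</component>

<component name="searchManager" type="edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager">
    <property name="logMath" value="logMath"/>
    <property name="linguist" value="flatLinguist"/>
    <property name="pruner" value="trivialPruner"/>
    <property name="scorer" value="threadedScorer"/>
    <property name="activeListFactory" value="activeList"/>
</component>

<component name="logMath" type="edu.cmu.sphinx.util.LogMath">
    <property name="logBase" value="1.0001"/>
</component>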
Note that typically the same log base should be used throughout all components, and therefore there should be only one logMath definition in a configuration file.

The linguist of the searchManager is set to the component flatLinguist (which we will look at later), which again is suitable for small to medium sized vocabulary decoding. The pruner is set to the trivialPruner, which is of class edu.cmu.sphinx.decoder.pruner.SimplePruner. This pruner performs simple absolute beam and relative beam pruning based on the scores of the tokens.

The scorer of the searchManager is set to the component threadedScorer, which is of class edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer. It can use multiple threads (usually one per CPU) to score the tokens in the active list. Scoring is one of the most time-consuming steps of the decoding process; since tokens can be scored independently of each other, using multiple CPUs will definitely speed things up. The frontend property of the threadedScorer is the front end from which features are obtained. For details about the other properties of the threadedScorer, please refer to the javadoc for ThreadedAcousticScorer.

Finally, the activeListFactory property of the searchManager is set to the component activeList, which is defined as follows:

<component name="activeList" type="edu.cmu.sphinx.decoder.search.PartitionActiveListFactory">
    <property name="logMath" value="logMath"/>
    <property name="absoluteBeamWidth" value="${absoluteBeamWidth}"/>
    <property name="relativeBeamWidth" value="${relativeBeamWidth}"/>
</component>

It is of class edu.cmu.sphinx.decoder.search.PartitionActiveListFactory. It uses a partitioning algorithm to select the top N highest scoring tokens when performing absolute beam pruning. The logMath property specifies the logMath used for score calculation, which is the same LogMath used in the searchManager. The property absoluteBeamWidth is set to the value given at the very top of the configuration file using ${absoluteBeamWidth}; the same goes for ${relativeBeamWidth}.

Linguist

Now let's look at the flatLinguist component (the linguist referenced by the searchManager). The linguist is the component that generates the search graph using guidance from the grammar, and knowledge from the dictionary, acoustic model, and language model. It also uses the logMath that we have already seen. The grammar used is the component called jsgfGrammar, which is a BNF-style grammar. JSGF grammars are defined in JSAPI; the class that translates JSGF into a form that Sphinx-4 understands is edu.cmu.sphinx.jsapi.JSGFGrammar (the javadoc of this class also describes the limitations of the current implementation).

The property grammarLocation can take two kinds of values. If it is a URL, it specifies the URL of the directory where JSGF grammar files are to be found. Otherwise, it is interpreted as a resource locator. In our example, the HelloWorld demo is deployed as a JAR file, so the grammarLocation property is used to specify the location of the resource hello.gram within the JAR file. Note that it is not necessary to specify the JAR file within which to search. The grammarName property specifies the grammar to use when creating the search graph. logMath is the same log math as the other components.

The dictionary is the component that maps words to their phonemes.
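As an illustrative sketch, a dictionary component definition has roughly the following shape. The class FastDictionary is discussed just below; the resource paths shown here are placeholders rather than the exact values used by the demo configuration:

<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
    <!-- pronunciation dictionary and filler dictionary; in the demo these point
         into the acoustic model JAR via the Sphinx-4 resource mechanism.
         The paths below are placeholders, not the demo's exact values. -->
    <property name="dictionaryPath" value="resource:/path/into/acoustic/model/dict/cmudict"/>
    <property name="fillerPath" value="resource:/path/into/acoustic/model/dict/fillerdict"/>
    <!-- no replacement word is substituted for out-of-dictionary words -->
    <property name="wordReplacement" value=""/>
</component>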
The dictionary used is almost always the dictionary of the acoustic model, which lists all the words that were used to train the acoustic model. The locations of these dictionary files are specified using the Sphinx-4 resource mechanism. The dictionary for filler words like BREATH and LIP_SMACK is the file fillerdict. For details about the other possible properties, please refer to the javadoc for FastDictionary.

Acoustic Model

The next important property of the flatLinguist is the acoustic model, which describes the sounds of the language. In the HelloWorld demo it is the component wsj, where wsj stands for the Wall Street Journal acoustic models. Sphinx-4 can load acoustic models trained by SphinxTrain. Common models are packed into JAR files during the build and located in the lib folder; the Sphinx3Loader class is used to load them, and the JAR needs to be included in the classpath. The JAR file for the WSJ models is called WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, and it is in the sphinx4/lib directory. As a programmer, all you need to do is specify the class of the AcousticModel and the loader of the AcousticModel (note that if you are using the WSJ model in other applications, these lines should be the same, except that you might have called your logMath component something else). The acoustic model can also be located on the filesystem or any other resource; in that case you need to specify the model location in the location property.

The next properties of the flatLinguist are the wordInsertionProbability and the languageWeight. These properties are usually used for fine tuning the system. Below are the default values we used for various tasks; you can tune your system accordingly:

Vocabulary Size                 Word Insertion Probability    Language Weight
Digits (11 words - TIDIGITS)    1E-36                         8
Small (
