Speech Processing, Fourth Group Meeting (语音处理第四次组会.pptx)

Uploaded by: j***; IP location: Henan; upload date: 2020-08-08; format: PPTX; pages: 30; size: 2.84 MB

Document summary

2013.11.30

1. Attempt to build a speaker recognition platform with ALIZE + SPro + Python
(1) ALIZE version 3.x - http://mistral.univ-avignon.fr/index_en.html - LIA_RAL - LIA_SpkDET
(2) SPro 4.0.1 - http://www.irisa.fr/metiss/guig/spro/index.html - Filter-bank cepstral features
(3) Python 3.3.2
2. Reading notes on "机器学习的哲学探索" (A Philosophical Exploration of Machine Learning)

Reference material

1. Paper: ALIZE, a free toolkit for speaker recognition
2. Paper: ALIZE/SpkDet: a state-of-the-art open source software for speaker recognition
3. HTK - http://htk.eng.cam.ac.uk/
5. Bill's blog post "使用Alize等工具构建说话人识别平台" (Building a speaker recognition platform with ALIZE and other tools) - http://ibillxia.github.io/blog/2013/04/26/building-speaker-recognition-system-using-alize-etc/
6. ALIZE 3.0 - Open-source platform for speaker recognition - /technical-committees/list/sl-tc/spl-nl/2013-05/ALIZE/

ALIZE introduction

The ALIZE project consists of a low level API (ALIZE) and a set of high level executables that form the LIA_RAL toolkit. Together they make it possible to easily set up a speaker recognition system for research purposes as well as to develop industry-based applications.

LIA_RAL is a high level toolkit based on the low level ALIZE API. It consists of three sets of executables: LIA_SpkSeg, LIA_Utils and LIA_SpkDet. LIA_SpkSeg and LIA_Utils respectively include executables dedicated to speaker segmentation and utility programs to handle ALIZE objects, while LIA_SpkDet is developed to fulfil the main functions of a state-of-the-art speaker recognition system as described in the following figure.

ALIZE introduction

ALIZE does not include acoustic feature extraction but is compatible with SPro, HTK and RAW formats.
Score matrices can be exported in a binary format easily handled by the BOSARIS toolkit.

SPro introduction

SPro is aimed at extracting features for speaker recognition; it can extract features such as MFCC and LPC.

SPro is a free speech signal processing toolkit which provides runtime commands implementing standard feature extraction algorithms for speech related applications and a C library to implement new algorithms and to use SPro files within your own programs. SPro was originally designed for variable resolution spectral analysis but also provides for feature extraction techniques classically used in speech applications. There are commands for the following representations: filter-bank energies; cepstral coefficients; linear prediction derived representations.

SPro introduction

Though the toolkit has been designed as a front-end for applications such as speech or speaker recognition, we believe the library provides enough possibilities to implement various feature extraction algorithms easily (e.g. zero crossing rate). However, no command for such features is provided. The library, written in ANSI C, provides functions for the following: waveform signal input; low-level signal processing (FFT, LPC analysis, etc.); low-level feature processing (lifter, CMS, variance normalization, deltas, etc.); feature I/O.

SPro introduction

The library does not provide high-level feature extraction functions which directly convert a waveform into features, mainly because such functions would require a tremendous number of arguments in order to be versatile. However, it is rather trivial to write such a function for your particular needs using the SPro library.

SPro introduction: Filter-bank cepstral features

The second filter-bank analysis tool, sfbcep, takes as input a waveform and outputs filter-bank derived cepstral features. The filter-bank processing is similar to what is done in sfbank (see previous section). The cepstral coefficients are computed by DCTing the filter-bank log-magnitudes and possibly liftered. Optionally, the log-energy can be added to the feature vector. In sfbcep, the frame energy is calculated as the sum of the squared waveform samples after windowing. As for the magnitudes in the filter-bank, the log-energies are thresholded to keep them positive or null. The log-energies may be scaled to avoid differences between recordings. Mean and variance normalization of the static cepstral coefficients can be specified with the global -cms and -normalize options but do not apply to log-energies. The normalizations can be global (default) or based on a sliding window whose length is specified with -segment-length. Finally, first and second order derivatives of the cepstral coefficients and of the log-energies can be appended to the feature vectors. When using delta features, the absolute log-energy can be suppressed using the -no-static-energy option.

Step 1: Feature extraction (MFCC)
sfbcep.exe (MFCC)

Step 2: Silence removal (silence detection and removal)
NormFeat.exe: first normalize the energy
EnergyDetector.exe: energy-based silence removal

Step 3: Features normalization
NormFeat.exe: run this tool again to normalize the features

Step 4: World model training
TrainWorld.exe: train the UBM

Step 5: Target model training
TrainWorld.exe: starting from the trained UBM, train the GMMs of the training set and the testing set

Step 6: Testing
ComputeTest.exe: test and score the testing-set GMMs against the training-set GMMs

Step 7: Score normalization
ComputeNorm.exe: normalize the scores

Step 8: Compute EER (equal error rate)
MATLAB code for computing the EER can be downloaded from the NIST SRE website (/iad/mig/tools/DETware_v2.1.targz.htm).

Others

For the parameters used at each step, run "<tool> -help" on the command line to see what each parameter of that tool means. The examples in the test directory of each tool in the ALIZE source tree are also a useful reference. For what each tool does and the theory behind it, consult the relevant papers.
Frequently asked questions: http://mistral.univ-avignon.fr/mediawiki/index.php/Frequently_asked_questions
For further questions, ask on the Google group.

Others: ALIZE functions used so far (the role of the other functions is still to be studied)

Others: notes on integrating Python and C programs

Integrating Python and C programs with the ctypes module

ctypes is a standard Python module. It has been part of the standard library since Python 2.5 and was available as a separate package for Python 2.3 and later. ctypes is a high-level foreign function interface for Python that lets a Python program call functions in dynamic link libraries (shared libraries) compiled from C. With ctypes, simple or complex C data types can be created, accessed and manipulated from Python source code. Most importantly, ctypes works on multiple platforms, including Windows, Windows CE, Mac OS X, Linux, Solaris, FreeBSD and OpenBSD.

"机器学习的哲学探索" - A. Research frontiers (p. 32)

Knowledge maps drawn with CiteSpace 2 [figure]
1. Research frontiers shown by the knowledge map [figure]
2. Important authors [figure]
3. Frontier knowledge clusters: reinforcement learning; classification techniques; data mining
4. Analysis of the knowledge clusters shows two aims of machine learning research: improving the system's own performance, and learning knowledge that humans can understand

"机器学习的哲学探索" - B. Evolution path of machine learning research (p. 58)

1. "Reinforcement learning" is a relatively independent area within machine learning research. It continues, inside machine learning, the behaviorist artificial intelligence research paradigm founded by Brooks.
2. "Data mining" merges two research paradigms that were once opposed, taking a pragmatic attitude to solve practical problems together. The most important, or most basic, problem handled by statistical learning centered on data mining is "classification". Some may regard "classification" as a trivial process, yet it pervades the process of intelligent understanding; even an activity such as "robot planning" can be framed as a "classification" problem. In other words, the core problem of machine learning as a whole at present is a ...
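Sketch for the sfbcep description above. The following is a minimal NumPy illustration of the processing chain described there: thresholded log-energy, DCT of the filter-bank log-magnitudes, global mean and variance normalization, and appended first and second order derivatives. It is not SPro code. The function names, the DCT scaling, the two-point delta formula and the random stand-in for filter-bank output are all assumptions made for illustration.

```python
import numpy as np

def log_energy(frame, window):
    """Log of the sum of squared windowed samples, thresholded to be >= 0."""
    energy = np.sum((frame * window) ** 2)
    # log(max(e, 1)) equals max(log(e), 0), so the result is never negative.
    return np.log(np.maximum(energy, 1.0))

def cepstra_from_fbank(log_fbank, num_ceps):
    """DCT of log filter-bank magnitudes: (frames, channels) -> (frames, num_ceps).

    Assumed scaling: c_i = sqrt(2/N) * sum_j e_j * cos(pi * i * (j - 0.5) / N).
    """
    n_chan = log_fbank.shape[1]
    i = np.arange(1, num_ceps + 1)[:, None]   # cepstral index 1..num_ceps
    j = np.arange(1, n_chan + 1)[None, :]     # channel index 1..N
    basis = np.sqrt(2.0 / n_chan) * np.cos(np.pi * i * (j - 0.5) / n_chan)
    return log_fbank @ basis.T

def mean_variance_normalize(ceps):
    """Global (whole-recording) mean and variance normalization per coefficient."""
    centered = ceps - ceps.mean(axis=0)
    std = ceps.std(axis=0)
    return centered / np.where(std > 0, std, 1.0)

def deltas(feats):
    """First-order derivative as a centered difference with edge padding."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Random stand-in for 50 frames of 24-channel log filter-bank magnitudes.
rng = np.random.default_rng(0)
log_fbank = rng.random((50, 24))

static = mean_variance_normalize(cepstra_from_fbank(log_fbank, 12))
features = np.hstack([static, deltas(static), deltas(deltas(static))])
```

With 12 static coefficients, each frame ends up with 36 values (static, delta and delta-delta). A sliding-window normalization would replace the global mean and standard deviation with per-window statistics.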
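Sketch for steps 4 to 7 above. The idea behind scoring a test utterance against a target GMM and a UBM, then normalizing the score, can be illustrated in a few lines of NumPy. This is not how the ALIZE tools are implemented or invoked. The function names, the (weights, means, variances) tuple layout, the use of diagonal covariances and the choice of z-norm as the normalization are assumptions for illustration only.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    x = frames[:, None, :]                                            # (T, 1, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=2)      # (T, M)
    log_comp = np.log(weights) + log_norm + log_exp                    # (T, M)
    # Log-sum-exp over mixture components, computed stably.
    peak = log_comp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_comp - peak).sum(axis=1, keepdims=True)))[:, 0]

def llr_score(frames, target, ubm):
    """Average log-likelihood ratio between a target model and the UBM."""
    return float(np.mean(gmm_loglik(frames, *target) - gmm_loglik(frames, *ubm)))

def znorm(score, impostor_scores):
    """Z-norm: center and scale a score with impostor score statistics."""
    imp = np.asarray(impostor_scores, dtype=float)
    return (score - imp.mean()) / imp.std()

# Toy one-dimensional, single-component models (hypothetical values).
ubm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
target = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))

score = llr_score(np.array([[1.0]]), target, ubm)
normalized = znorm(3.0, [1.0, 3.0])
```

A positive score means the frames fit the target model better than the UBM. In practice the target model is usually adapted from the UBM rather than trained from scratch, which this sketch does not show.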
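Sketch for step 8 above. The equal error rate is the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal threshold sweep in NumPy, given as an illustration of the definition rather than a port of the NIST DETware code. The function name and the example scores are hypothetical.

```python
import numpy as np

def compute_eer(target_scores, impostor_scores):
    """Return (eer, threshold) from a sweep over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False rejection: target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: impostor trials scored at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Hypothetical scores: four target trials and four impostor trials.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.3], [0.6, 0.2, 0.1, 0.05])
# eer == 0.25 at threshold 0.6
```

With few trials the two error curves may not cross exactly. The sketch then reports the average of the two rates at the closest point, which is one common convention among several.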
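Sketch for the ctypes notes above. This small example calls a function from the C runtime and builds a few C data types from Python. The helper load_c_runtime, the Frame structure and the sample values are hypothetical. Only strlen from the standard C library is assumed to exist in the loaded library.

```python
import ctypes
import ctypes.util
import sys

def load_c_runtime():
    """Load the platform's C runtime as a shared library."""
    if sys.platform.startswith("win"):
        return ctypes.CDLL("msvcrt")
    # On POSIX, find_library may return None; CDLL(None) then exposes the
    # symbols already loaded into the process, which include libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_c_runtime()

# Declare the C signature: size_t strlen(const char *s).
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
name_length = libc.strlen(b"ALIZE")   # bytes, not str, in Python 3

class Frame(ctypes.Structure):
    """Hypothetical C struct: struct { int index; double energy; }."""
    _fields_ = [("index", ctypes.c_int), ("energy", ctypes.c_double)]

frame = Frame(index=3, energy=1.5)

# A C array of three doubles, equivalent to: double coeffs[3].
coeffs = (ctypes.c_double * 3)(0.1, 0.2, 0.3)
```

Declaring argtypes and restype up front lets ctypes check and convert arguments instead of guessing, which matters for return types other than int.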
