July 1st, 2002. Speech acoustics and phonetics, Il Ciocco

Overview
- Dynamics in speech acoustics
- Contour modeling (mainly formants)
- Aspects of spectral undershoot
- Modeling V and C reduction
- Phonetic knowledge from speech corpora
  - IFA, CGN, TIMIT, found speech
- Conclusions

Dynamics in speech acoustics
- Dynamics is the norm, not stationarity
  - articulatory efficiency
- Dynamics is everywhere
  - generally no word boundaries in speech
  - deletion of words, syllables, phonemes; insertion
  - within/between word coarticulation/assimilation
  - vowel and consonant reduction
- Acoustic manifestations
  - segment duration, F0, loudness, spectral quality

Dynamics is the norm
- The speaker speaks as sloppily as the listeners allow him to do in communication
  - communicative efficiency
- Articulatory vs. perceptual efficiency
  - do spectral transitions facilitate or hamper perception? see other presentation
- Speaker flexibility; speaking style (clear vs. sloppy); speaking rate

Dynamics is everywhere
- Deletion
  - bread and butter /brEmbY3/
  - Amsterdam (Du) /AmstrdAm/ /AmsdAm/
  - koninklijke (Du) /konIklk/ /kolk/
- Insertion
  - homorganic glide insertion: die een (Du) /dijn/
- Degemination
  - is zichtbaar (Du) /Is zIxtbar/ /IsIxbar/
- Reduction, coarticulation, assimilation

Acoustic manifestations
- pitch, loudness, formant, component contours
- contour stylization (e.g., pitch in Praat)
- contour modeling
  - n-th degree curve fitting (D. van Bergem)
  - Legendre polynomials (R. van Son)
  - 16 points per segment
- (phoneme) segmentation
  - by hand (time consuming; inconsistent)
  - automatically (via forced phoneme recognition and a pronunciation lexicon with alternatives; systematic errors)

Contour modeling
- allows modeling of specific phenomena
  - pitch accentuation (vs. vowel onset)
  - reduction, centralization, undershoot
- allows generation of stimuli for perception experiments
  - phoneme identification in extending context
  - 2-alternative forced-choice identification of continua
  - discrimination, RT
- allows statistics on large speech corpora
  - TIMIT, CGN, IFA-corpus, Switchboard

Static vs. dynamic V recognition
- see Weenink (2001)
  - "Vowel normalizations with the TIMIT acoustic phonetic speech corpus", IFA Proc. 24, 117-123
- 438 males, both train & test sentences of TIMIT
- 35,385 vowel segments, hand segmented
- 13 monophthongal vowel categories
- 1-Bark bandfilter analysis (18), intensity normalized
- 3 frames per segment: central and 25 ms L/R
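The "3 frames per segment: central and 25 ms L/R" representation can be sketched as follows. This is a minimal illustration, not Weenink's implementation: the function name `three_frame_features`, the 5 ms frame step, the clamping rule for short segments, and the reading of "(18)" as 18 band levels per frame are all assumptions, and the input values are synthetic.

```python
import numpy as np

def three_frame_features(spectra, frame_step_ms, offset_ms=25.0):
    """Return the static (central frame) and dynamic (3 stacked frames) vectors.

    spectra: array of shape (n_frames, n_bands) for ONE vowel segment.
    frame_step_ms: time between successive frames (assumed constant).
    """
    spectra = np.asarray(spectra, dtype=float)
    n_frames = spectra.shape[0]
    center = n_frames // 2
    shift = int(round(offset_ms / frame_step_ms))
    # Assumption: clamp to the segment edges so short vowels still give 3 frames.
    left = max(center - shift, 0)
    right = min(center + shift, n_frames - 1)
    static = spectra[center]
    dynamic = np.concatenate([spectra[left], static, spectra[right]])
    return static, dynamic

# Synthetic segment: 21 frames at a 5 ms step, 18 band levels per frame.
rng = np.random.default_rng(0)
segment = rng.normal(size=(21, 18))
static_vec, dynamic_vec = three_frame_features(segment, frame_step_ms=5.0)
print(static_vec.shape, dynamic_vec.shape)
```

The static vector is what a one-frame classifier would see; the dynamic vector adds the spectral change into and out of the vowel center.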
Some results
Vowel classification (%) with discriminant functions

Condition                 # Items   Composition     Static, 1 frame   Dynamic, 3 frames
Original                   35,385   438x13x(125)               59.3                66.9
  speaker normalized       35,385                              62.2                69.2
V centers per speaker       5,374   438x13                     78.9                90.1
  speaker normalized        5,374                              87.9                94.5
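To make the static-versus-dynamic comparison concrete, the short sketch below takes the four pairs of classification scores from the results slide and derives the absolute gain and the relative error reduction. Only the percentages come from the slide; the derived figures and the helper name `relative_error_reduction` are additions for illustration.

```python
# (condition, % correct with 1 static frame, % correct with 3 dynamic frames)
results = [
    ("Original", 59.3, 66.9),
    ("Original, speaker normalized", 62.2, 69.2),
    ("V centers per speaker", 78.9, 90.1),
    ("V centers per speaker, speaker normalized", 87.9, 94.5),
]

def relative_error_reduction(static_pct, dynamic_pct):
    """Percentage of the static error rate removed by the dynamic features."""
    static_err = 100.0 - static_pct
    dynamic_err = 100.0 - dynamic_pct
    return 100.0 * (static_err - dynamic_err) / static_err

summary = {}
for name, static_pct, dynamic_pct in results:
    gain = round(dynamic_pct - static_pct, 1)
    reduction = round(relative_error_reduction(static_pct, dynamic_pct), 1)
    summary[name] = (gain, reduction)
    print(f"{name}: +{gain} points, {reduction}% fewer errors")
```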
Formant tracks / speaking rate
- Ph.D. thesis Rob van Son (1993)
  - "Spectro-temporal features of vowel segments"
  - see also Speech Comm. 13, 135-148 (Pols & vSon)
- 850-word text, read at normal and fast rate
- hand segmentation of 7 most frequent V + schwa
- formant tracks
  - via 16 points per segment or 5 Legendre polynomials
- influence of rate, V-duration, context, sentence accent
- evidence for duration-controlled undershoot?

Some results
- no differences for F1/F2 in vowel center for normal- or fast-rate speech; only some overall rise in F1 for fast rate (irrespective of V)
- same formant track shape (normalized to 16 points) for normal- or fast-rate speech
- same results when using the more elaborate Legendre polynomials
- Conclusion: changes in V-duration do not change the amount of undershoot, which points to active control of articulation speed
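The two track descriptions named on these slides, 16 time-normalized points per segment and 5 Legendre polynomials, can be sketched with NumPy as below. This is an illustration under assumptions, not the procedure from the thesis: the linear-interpolation resampling, the least-squares fit, the function name `legendre_track_coeffs`, and the synthetic F2 track are all chosen here for the example. "5 Legendre polynomials" is read as orders 0 to 4.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_track_coeffs(track_hz, n_points=16, order=4):
    """Time-normalize a formant track and fit Legendre orders 0..order."""
    track_hz = np.asarray(track_hz, dtype=float)
    # Resample to n_points equally spaced samples (linear interpolation).
    src = np.linspace(0.0, 1.0, len(track_hz))
    dst = np.linspace(0.0, 1.0, n_points)
    resampled = np.interp(dst, src, track_hz)
    # Legendre polynomials live on [-1, 1], so map normalized time there.
    x = np.linspace(-1.0, 1.0, n_points)
    return legendre.legfit(x, resampled, order)

# Synthetic F2 track (Hz) with a parabolic shape; values are made up.
t = np.linspace(-1.0, 1.0, 40)
f2_track = 1500.0 + 150.0 * (1.0 - t ** 2)
coeffs = legendre_track_coeffs(f2_track)
print(np.round(coeffs, 1))
```

In this representation the order-0 coefficient tracks the mean formant level over the segment, order 1 the overall slope, and order 2 the curvature, so a change in undershoot would show up mainly in the order-2 term.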
Formant representations
[Figure: two vowel-space plots (F1 vs. F2) comparing normal rate and fast rate. One panel shows the zeroth-order Legendre polynomial coefficients (mean Fi in vowel segment); the other shows the second-order polynomial coefficients (axes reversed).]

Modeling vowel reduction
- Ph.D. thesis Dick van Bergem (1995)
  - "Acoustic and lexical vowel reduction"
  - see also Speech Communication 16, 329-358
- lexical V reduction: Fr /bet/ vs. Du /btOn/
- acoustic V reduction: /banan, bAnan, bnan/
  - f(sent. acc., w. str., w. class): can-candy-canteen
- coarticulatory effects on the schwa
  - C1C2V- and VC1C2-type nonsense words
- perceptual effects (full V or schwa, e.g. ananas)

Some results
The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context.
[Figure: panels labeled t-n and w-l.]

Modeling consonant reduction
- Sp. Comm. (1999) 28, 125-140 (vSon & Pols)
- 20 min. speech, both spontaneous and read
- 2 x 791 similar VCV; hand segmented
- 5 aspects of V and C reduction
  - related to coarticulation: F2 slope differences at CV- vs. VC-boundaries; F2 locus equations (F2 onset vs. F2 target)
  - related to speaking effort: duration; spectral COG (mean freq.); V-C sound energy differences
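Two of the measures listed for consonant reduction lend themselves to a compact sketch: the F2 locus equation (a straight-line fit of F2 onset against F2 target) and the spectral center of gravity (a mean frequency). The code below is a hedged illustration, not the analysis from the cited paper: the function names, the power weighting used for the COG, and all numeric inputs are assumptions made for the example.

```python
import numpy as np

def locus_equation(f2_target_hz, f2_onset_hz):
    """Least-squares fit of F2_onset = slope * F2_target + intercept."""
    slope, intercept = np.polyfit(f2_target_hz, f2_onset_hz, 1)
    return float(slope), float(intercept)

def spectral_cog(signal, sample_rate_hz):
    """Power-weighted mean frequency (Hz) of one signal frame."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(np.sum(freqs * power) / np.sum(power))

# Synthetic F2 targets and onsets (Hz) lying exactly on one line.
targets = np.array([800.0, 1200.0, 1600.0, 2000.0])
onsets = 0.5 * targets + 900.0
slope, intercept = locus_equation(targets, onsets)

# Synthetic frame: a 1 kHz sine, 100 ms at 16 kHz.
sample_rate = 16000
time = np.arange(1600) / sample_rate
cog = spectral_cog(np.sin(2.0 * np.pi * 1000.0 * time), sample_rate)
print(round(slope, 3), round(intercept, 1), round(cog, 1))
```

A locus-equation slope near 1 means the onset follows the vowel target closely (strong coarticulation), while a slope near 0 means a fixed consonant locus; a lower COG corresponds to less high-frequency energy.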
Some results
- V markedly reduced in spontaneous speech
- lower F2-slope difference in spontaneous speech, indicating a decrease in articulation speed
- no systematic effect on F2 locus equation; V onsets and targets change in concert, so any V reduction is mirrored by a comparable change in C
- spontaneous speech: V and C shorter; lower COG, indicating a decrease in vocal and articulatory effort

Access to large corpora
- more, and more realistic, data
- phonetic knowledge via statistical analyses
- e.g. highly accessible IFA-corpus (free, SQL)
  - see "Structure and access of the open source IFA-corpus", IFA Proc. 24, 15-26 (vSon & Pols)
  - 4 M/4 F speakers, 5.5 hrs of speech
  - from informal to read + sentences, words, syllables
  - 50K words segmented and labeled at phoneme level

Some results
- speech + annotation + meta data: relational DB
- realization of final n, e.g. Du geven /xev(n)/

Style          # wrds    /n/   no /n/     All   % /n/
Informal        5,250      1      304     305     0.3
Retelling       6,229     13      236     249     5.2
Narr. story    14,453    180      372     552      33
Sentences      14,970    203      340     543      37
Pseudo-sent     2,554     62       19      81      77
All            43,456    459    1,271   1,730

Further labels and values on the slide: LF, HF; 42, 30 (after the Narr. story row); 36 (after the All row); Read.

Spoken Dutch Corpus (CGN)
- 10 M words, 1,000 hrs of speech
- variety of styles, incl. telephone speech
- adult Dutch and Flemish speakers
- for linguistic and technological research
- see various LREC and ICSLP papers (2002)
- fully transcribed: orthographic, POS, lemmas
- partly transcribed: phonemic, prosodic, syntactic
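Since the IFA-corpus slide stresses SQL access, the per-style counts from the final-/n/ slide can be used to sketch the kind of query involved. The table layout and column names below are hypothetical and are not the actual IFA-corpus schema; the counts are copied from the slide, reading the column next to /n/ as tokens without /n/ (an inference from the row sums).

```python
import sqlite3

# (style, words, final /n/ realized, final /n/ not realized), from the slide.
rows = [
    ("Informal", 5250, 1, 304),
    ("Retelling", 6229, 13, 236),
    ("Narr. story", 14453, 180, 372),
    ("Sentences", 14970, 203, 340),
    ("Pseudo-sent", 2554, 62, 19),
]

conn = sqlite3.connect(":memory:")
# Hypothetical layout for illustration only, NOT the real corpus schema.
conn.execute(
    "CREATE TABLE final_n (style TEXT, words INTEGER, "
    "n_realized INTEGER, n_absent INTEGER)"
)
conn.executemany("INSERT INTO final_n VALUES (?, ?, ?, ?)", rows)

query = """
SELECT style,
       n_realized + n_absent AS total,
       ROUND(100.0 * n_realized / (n_realized + n_absent), 1) AS pct_n
FROM final_n
ORDER BY rowid
"""
per_style = conn.execute(query).fetchall()
totals = conn.execute(
    "SELECT SUM(words), SUM(n_realized), SUM(n_absent) FROM final_n"
).fetchone()

for style, total, pct_n in per_style:
    print(f"{style}: {pct_n}% /n/ of {total} tokens")
print(totals)
```

The query makes the style effect explicit: the share of realized final /n/ rises from informal speech through retelling to the read styles.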
TIMIT
- popular DB in acoustic phonetics and ASR
- also telephone version (NTIMIT)
- hand segmented & labeled at phoneme level
- 438 males, 192 females (8 dialect regions)
- 10 sentences per speaker (2 fixed, 5 phonetically compact, 3 phonetically diverse)
  - sa1: "She had your dark suit in greasy wash water all year"
- includes separate test data (112 M, 56 F)
- e.g. Ph.D. thesis X. Wang (1997), "Incorporating knowledge on segmental duration in HMM-based continuous speech recognition"

Useful info: durational variability
Adopted from Wang (1998)
[Figure: duration statistics for /iy/ broken down by factor (R, S, Lw, Lu), with count, mean and s.d. per factor level. The numeric cell values are not recoverable from the extracted text.]
- overall average = 95 ms
- normal rate = 95
- primary stress = 104
- word final = 136
- utterance final = 186

[Figure: histograms of utterance-averaged phone duration (ms) and of speaking rate r (fast rate to slow rate); vertical axis: histogram count (number of utterances). Data: all 3,696 training sentences (sx + si) of the TIMIT training set. Legend: phone duration d; normalized phone duration; speaking rate r, given on the slide as 1/N times a sum over i = 1..N whose summand is not recoverable from the extracted text.]

found speech
- DARPA-LVSR community rather ambitious
- Broadcast News (BN), Sp. Comm. 37 (2002)
  - 95: WSJ NAB read sp.
  - 1995: Market place
  - 1996: F0-F5, FX partitioned
  - 1997: 3 hrs test unpartit.
  - 1998: + non Engl. speech; also 900 M
  - best % WER on test set: 27.0 %, 27.1 %, 1:4
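The durational-variability slide defines a per-utterance speaking rate r as an average over the N phones of an utterance, but the summand is garbled in the extracted text. The sketch below shows one common way to build such a measure, assuming each phone duration is z-scored against the mean and s.d. of its phone class before averaging. That normalization is an assumption, not necessarily Wang's definition, and the statistics in `phone_stats` are hypothetical apart from the 95 ms overall average for /iy/ quoted on the slide.

```python
import numpy as np

def normalized_durations(durations_ms, phone_labels, phone_stats):
    """z-score each duration against its phone class (assumed normalization)."""
    return np.array([
        (duration - phone_stats[phone][0]) / phone_stats[phone][1]
        for duration, phone in zip(durations_ms, phone_labels)
    ])

def utterance_rate(durations_ms, phone_labels, phone_stats):
    """Mean normalized phone duration over the N phones of one utterance."""
    return float(np.mean(
        normalized_durations(durations_ms, phone_labels, phone_stats)
    ))

# phone -> (mean ms, s.d. ms); s.d. values and the /s/ entry are hypothetical.
phone_stats = {"iy": (95.0, 40.0), "s": (110.0, 35.0)}

# Toy utterance of three phones with made-up durations.
r = utterance_rate([95.0, 135.0, 110.0], ["iy", "iy", "s"], phone_stats)
print(round(r, 3))
```

Under this assumption a negative r means phones shorter than average (fast speech) and a positive r means longer phones (slow speech), which matches the fast-to-slow ordering of the histogram axis.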