Isolated Word Automatic Speech Recognition (ASR) System Using MFCC, DTW and KNN [Chinese translation, 4,600 characters]
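ID: 9103516
Type: Shared resource
Size: 1.49 MB
Format: ZIP
Upload date: 2018-02-28
Uploader: 闰***
Verification: personal verification, 冯** (real-name verified), Henan
IP location: Henan
Points: 13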
- Keywords: MFCC, DTW, KNN, isolated, automatic, speech, recognition, identification, ASR, system, Chinese
- Resource description: Isolated Word Automatic Speech Recognition (ASR) System Using MFCC, DTW and KNN [Chinese translation, 4,600 characters]
- Content summary:
2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast), p. 106, 978-1-4673-9791-9/16/$31.00 ©2016 IEEE

ISOLATED WORD AUTOMATIC SPEECH RECOGNITION (ASR) SYSTEM USING MFCC, DTW AND KNN

Muhammad Atif Imtiaz, Faculty of Electronics

Abstract: … Features of speech are extracted using MFCCs. DTW is applied for speech feature matching, and KNN is employed as the classifier. The experimental setup includes English words collected from five speakers. These words were spoken in an acoustically balanced, noise-free environment. The experimental results of the proposed ASR system are obtained in the form of a confusion matrix. The recognition accuracy achieved in this research is 98.4%.

Keywords: ASR; MFCC; DTW; KNN

I. INTRODUCTION

Speech is the propagation of periodic variations in the air from the human lungs. The human vocal tract produces and shapes the actual sound with the help of the pharynx, nasal cavity and mouth. Automatic speech recognition (ASR) is the process of automatically interpreting human speech in a digital device. It is defined as the transformation of an acoustic speech signal into a word string. In general, the goal of every ASR system is to extract a word string from the input speech signal [1]. In the ASR process, the input is a speech utterance and the output is the textual data associated with that input. The performance of an ASR system depends mainly on three factors: vocabulary size, the amount of training data and the system's computational complexity. ASR has numerous applications and is used extensively in domestic appliances, security devices, cellular phones, ATMs and computers. This paper describes an ASR system for the English language, tested on a small vocabulary of words.

The rest of the paper is organized as follows. Section II gives an overview of the ASR system and its major blocks. Section III describes how the ASR system is implemented using feature extraction and classification techniques. Section IV briefly describes the experimental setup and presents the experimental results. Concluding remarks are given in Section V.

II. ASR SYSTEM OVERVIEW

The ASR system comprises two main blocks, a feature extraction block and a classification block, as shown in Fig. 1. The input to the system is speech and the output is textual data. Each block is described below.

[Fig. 1. Block diagram of proposed ASR system design]

A. Feature Extraction Block

Feature extraction is one of the most vital modules in an ASR system. In ASR, the speech signal is split into smaller frames, usually 10 to 25 ms long. The speech signal contains redundant information, so a feature extraction technique is applied to extract the important and useful information. This step also reduces dimensionality. Perceptual linear prediction (PLP) coefficients, wavelet-transform-based features, linear predictive coefficients (LPC), wavelet-packet-based features and mel-frequency cepstral coefficients (MFCC) are widely used features in ASR [2]. This research uses MFCC, which is discussed in detail in Section III.

B. Classification Block

After features are extracted from the speech signal, they are passed to the classification block for recognition. The classifier is trained on known feature patterns and tested on a test dataset. Its performance is evaluated as percentage recognition accuracy. In this research, DTW is used for feature matching and KNN is used for classification. Both are discussed further in Section III.

C. Database

In an ASR system, the database is a collection of speech samples. These samples are collected so that they illustrate the different variable aspects of a language. Choosing a dataset is of significant importance for successful ASR research. It provides a platform for comparing the performance of different speech recognition techniques [3]. It also gives researchers a balance across speech recognition aspects such as gender, age and dialect. Depending on its word count, a database can be large, medium or small. Data can be gathered from sources such as books, newspapers, magazines, lectures and TV commercials. Volunteers are often unavailable and there are identity issues, so speech databases are not easily available. Standard speech databases exist for only a few languages, such as BREF for French, TIMIT for English and ATR for Japanese [4].

III. IMPLEMENTATION OF ASR SYSTEM

This section describes the implementation of three techniques: the feature extraction technique, mel-frequency cepstral coefficients (MFCC); the feature matching technique, dynamic time warping (DTW); and the feature classification technique, k-nearest neighbor (KNN).

A. Mel-Frequency Cepstral Coefficients

Human perception of speech is not linear as a function of frequency. For this reason, the pitch of a single-frequency acoustic speech signal is mapped onto the "mel" scale. On this scale, frequency spacing is linear below 1 kHz and logarithmic above 1 kHz [5]. The mel frequency corresponding to a frequency f in hertz is calculated using equation (1):

$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{1}$$

Fig. 2 shows the block diagram for the MFCC computation. Its inner blocks are described individually below.

[Fig. 2. Block diagram for MFCC computation]

1) Pre-processing: The audio signals are recorded at a sampling rate of 16 kHz, and each word is stored in a separate audio file. Pre-processing applies pre-emphasis to the signal, which boosts its energy at high frequencies. Equation (2) gives the pre-emphasis filter, which corresponds to the difference equation y(n) = x(n) − 0.97 x(n − 1):

$$H(z) = \frac{B(z)}{A(z)} = 1 - 0.97\,z^{-1} \tag{2}$$

The output response of the pre-emphasis filter is shown in Fig. 3.

[Fig. 3. Pre-emphasis filter output (panels: original signal, filtered signal)]

2) Framing and Windowing: The speech signal is not stationary. Framing is used so that the signal can be treated as stationary within each short frame. Framing is the next step after pre-processing. In this step, the speech signal is split into smaller frames that overlap each other. After framing, windowing is applied to remove discontinuities at the frame edges. This research uses the Hamming window, defined by equation (3):

$$w(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where N is the total number of samples in a single frame. The original and windowed signals are shown in Fig. 4.

[Fig. 4. Original signal vs. windowed signal]
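The pre-processing, framing and windowing steps above can be sketched in NumPy as follows. This is a minimal illustration and not the authors' MATLAB R2014b code. The 25 ms frame length (400 samples at 16 kHz) and the 10 ms hop are assumptions chosen within the 10 to 25 ms range given in Section II-A. The constant and function names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # 16 kHz sampling rate, as stated in the paper
PRE_EMPHASIS = 0.97   # pre-emphasis coefficient from equation (2)
FRAME_LEN = 400       # assumption: 25 ms frames (400 samples at 16 kHz)
HOP_LEN = 160         # assumption: 10 ms hop, so consecutive frames overlap


def hz_to_mel(f_hz):
    """Equation (1): f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def pre_emphasis(x, alpha=PRE_EMPHASIS):
    """Equation (2) as a difference equation: y(n) = x(n) - alpha * x(n - 1)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])  # first sample passes through


def hamming(n_samples):
    """Equation (3) for 0 <= n <= N - 1 (the window is zero elsewhere)."""
    n = np.arange(n_samples)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_samples - 1))


def frame_and_window(x, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Split x into overlapping frames and multiply each by the Hamming window."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:  # zero-pad short signals up to one full frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop_len
    # Row r holds sample indices r*hop_len ... r*hop_len + frame_len - 1.
    idx = hop_len * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx] * hamming(frame_len)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(SAMPLE_RATE)   # stand-in for one second of audio
    frames = frame_and_window(pre_emphasis(speech))
    print(frames.shape)                         # (98, 400)
    print(round(float(hz_to_mel(1000.0)), 1))   # 1000.0
```

With these settings, one second of 16 kHz audio yields 98 windowed frames of 400 samples. A precomputed index array handles the framing without a Python loop. The hand-written `hamming()` matches NumPy's built-in `np.hamming`.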
3) Fast Fourier Transform (FFT): The fast Fourier transform is used to calculate the discrete Fourier transform (DFT) of the signal, with a size of N = 512 [6]. This step transforms the signal into the frequency domain. The FFT is calculated using equation (4):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N} \tag{4}$$

where N is the size of the FFT. The magnitude spectrum of the FFT is shown in Fig. 5.

[Fig. 5. Fast Fourier transform magnitude spectrum (amplitude vs. frequency)]

4) Mel Filter Bank: After the FFT of the signal is taken, the next step is the transformation from the hertz scale to the mel scale, in which the power of the spectrum is mapped onto the mel scale [7]. The mel filter bank consists of triangular, overlapping filters, as shown in Fig. 6.

[Fig. 6. MFCC filter bank output]

5) Delta Energy: In this step, the base-10 logarithm of the output of the previous step is taken. Computing the log energy is essential because the human ear does not respond linearly to the level of an acoustic speech signal. The ear is less sensitive to differences in amplitude at higher amplitudes. The advantage of the logarithmic function is that it mimics this behavior of the human ear. The energy is computed using equation (5). Fig. 7 shows the graph of the energy computation.

[Fig. 7. Signal log energy output (log energy of frames vs. no. of frames)]

6) Discrete Cosine Transform (DCT): The DCT is applied after taking the logarithm of the mel filter bank output, and it produces the final mel-frequency cepstral coefficients. In this research, 39-dimensional features are extracted for each isolated word:
- 12 MFCCs
- one energy feature
- one delta-energy feature
- one double-delta-energy feature
- 12 delta-MFCC features
- 12 double-delta-MFCC features

An N-point DCT [8] is defined by equation (6):

$$X(k) = \sum_{n=0}^{N-1} 2\,x(n)\cos\left(\frac{\pi k(2n+1)}{2N}\right), \quad k = 0, 1, 2, \ldots, N-1 \tag{6}$$

The MFCCs for a single word are shown in Fig. 8.

[Fig. 8. MFCCs for single word (panel: MFCC computation of a single word; amplitude vs. no. of frames)]

B. Classification

Classification has two phases: (i) training and (ii) testing. The results and percentage recognition accuracy are obtained in the form of a confusion matrix. DTW and KNN are discussed below.

1) Dynamic Time Warping (DTW): The DTW algorithm measures the similarity between two time series that may vary in timing and speed. The two series are compared by aligning them. One series may be warped non-linearly by stretching or compressing it along its time axis. The resulting warping can be used to find corresponding regions in the two series or to quantify the similarity between them. Numerically, DTW compares two time-ordered patterns and measures the similarity between them with the help of a minimum-distance formula [9].

Distance formula: Consider two time series P and Q of length n and m respectively, i.e.,

$$P = p_1, p_2, p_3, \ldots, p_i, \ldots, p_n \qquad Q = q_1, q_2, q_3, \ldots, q_j, \ldots, q_m$$

For time series P and Q, the (i, j)th element of the matrix contains the distance d(p_i, q_j) between the two points p_i and q_j [10]. The Euclidean distance formula in equation (7) measures the absolute distance between the two points:

$$d(p_i, q_j) = \sqrt{(p_i - q_j)^2} \tag{7}$$

Every matrix element (i, j) corresponds to an alignment of points p_i and q_j. The accumulated distance is then calculated using equation (8), where d(i, j) denotes d(p_i, q_j):

$$D(i,j) = \min\left[D(i-1,j-1),\; D(i-1,j),\; D(i,j-1)\right] + d(i,j) \tag{8}$$

2) K-Nearest Neighbor (KNN): The KNN classifier works as follows in this research:
- KNN assigns the index of the feature vector that is nearest to a given score in the feature space.
- The minimum-score indices from DTW are processed by KNN, which maps the current feature onto the respective features of the feature space.
- KNN returns the same number of features, but these features are taken from the feature space.
- The mode of the returned features gives the most frequent one, and the word it belongs to is the recognized word.

Fig. 9 shows the flow diagram of the KNN classifier, where K_n is the number of nearest neighbors, N_s is the number of speakers and N_w is the number of words in the vocabulary.

[Fig. 9. Flow diagram of KNN]

3) Confusion Matrix: A confusion matrix is formed to check the efficiency of the system, i.e., its recognition accuracy and percentage of error. For N words, it is an N × N matrix. The diagonal entries a_ij (i = j) give the number of times word i is matched correctly [11]. The non-diagonal entries a_ij (i ≠ j) give the number of times word i is confused with word j.

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1N} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2N} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & a_{N3} & \cdots & a_{NN} \end{bmatrix}$$

4) Percentage Error: The percentage of error is important for checking overall system performance, and it is calculated from the confusion matrix. Each isolated word is tested, and the number of times it is recognized successfully is recorded in the diagonal entry of row i. The correct match is then the number of successful entries divided by the total number of entries in that row. For a particular word, the correct match C and the percentage error E are given by equations (9) and (10). The results obtained from the confusion matrix are discussed further in Section IV.

$$C = \frac{a_{ij}}{a_{i1} + a_{i2} + a_{i3} + \cdots + a_{iN}}, \quad i = j,\; j = 1, 2, 3, \ldots, N \tag{9}$$

$$E = (1 - C) \times 100 \tag{10}$$

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The experiments were performed on a small English vocabulary. The setup includes words spoken by five different speakers in an acoustically balanced, noise-free environment. The implementation and experimental results were analyzed with MATLAB R2014b. The testing and training results of the ASR system are obtained in the form of a confusion matrix, as shown in Fig. 10.

[Fig. 10. Confusion matrix graph of words (x- and y-axes: word index; z-axis: no. of successful or unsuccessful recognitions)]

In the confusion matrix graph of Fig. 10, the x-axis and y-axis show the word indices. The z-axis shows the height, which is the number of times a word is either recognized successfully or confused with another word. The heights of the diagonal slots give the successful recognition rate. Each word is tested 200 times, so the maximum attainable height is 200. Table I summarizes the correct match C and error E for each word.

TABLE I. RECOGNITION ERROR PERCENTAGE OF WORDS

| Word | Value of correct match C | Recognition accuracy (%) | Error (1 − C) × 100 (%) |
|---|---|---|---|
| “Dark” | 0.98 | 98 | 2 |
| “Wash” | 0.99 | 99 | 1 |
| “Water” | 0.995 | 99.5 | 0.5 |
| “Year” | 0.975 | 97.5 | 2.5 |
| “Don't” | 0.97 | 97 | 3 |
| “Carry” | 0.995 | 99.5 | 0.5 |
| “Greasy” | 0.98 | 98 | 2 |
| “Like” | 0.985 | 98.5 | 1.5 |
| “Oily” | 0.975 | 97.5 | 2.5 |
| “That” | 0.995 | 99.5 | 0.5 |
| Accumulative average | 0.984 | 98.4 | 1.6 |

Table I gives the recognition and error rates for the dataset. Each word is first evaluated individually, and then the accumulative average over the dataset is calculated. The data come from the confusion matrix obtained by testing the ASR system. The accumulative average success rate for this dataset is 98.4%, with an error rate of 1.6%.

V. CONCLUSION

This research presents an ASR system built on MFCC, DTW and KNN techniques. Features are extracted using MFCC, DTW is used for speech feature matching and KNN is used for classification. The minimum-score indices acquired from DTW are processed by KNN. The experimental results are obtained in the form of a confusion matrix. Throughout the research, the proposed ASR system showed good recognition performance when MFCC, DTW and KNN were used jointly. The recognition accuracy achieved in this research is 98.4%, with an error of 1.6%.

REFERENCES

[1] J. M. Gilbert, S. I. Rybchenko, R. Hofe, S. R. Ell, M. J. Fagan, R. K. Moore, and P. Green, "Isolated word recognition of silent speech using magnetic implants and sensors," International Journal of Medical Engineering and Physics, vol. 32, pp. 1189–1197, Aug. 2010.
[2] Vimala C. and V. Radha, "A review on speech recognition challenges and approaches," World of Computer Science and Information Technology Journal (WCSIT), ISSN 2221-0741, vol. 2, no. 1, pp. 1–7, 2012.
[3] S. Atkins, J. Clear, and N. Ostler, "Corpus design criteria," Oxford Journal of Literary and Linguistic Computing, vol. 7, no. 1, pp. 1–16, 1992.
[4] J. L. Gauvain, L. F. Lamel, and M. Eskenazi, "Design considerations and text selection for BREF, a large French read-speech corpus," in Proc. 1st International Conference on Spoken Language Processing (ICSLP), 1990, pp. 1097–1100.
[5] M. Murugappan, Nurul Qasturi Idayu Baharuddin, and Jerritta S., "DWT and MFCC based human emotional speech classification using LDA," in International Conference on Biomedical Engineering (ICoBE), Penang, 27–28 Feb. 2012, pp. 203–206.
[6] Sirko Molau, Michael Pitz, Ralf Schlüter, and Hermann Ney, "Computing mel-frequency cepstral coefficients on the power spectrum," in Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), USA, 2001, pp. 73–76.
[7] Ibrahim Patel and Y. Srinivas Rao, "Speech recognition using HMM with MFCC: an analysis using frequency spectral decomposition technique," Signal & Image Processing: An International Journal (SIPIJ), vol. 1, no. 2, pp. 101–110, Dec. 2010.
[8] A. Milton, S. Sharmy Roy, and S. Tamil Selvi, "SVM scheme for speech emotion recognition using MFCC feature," International Journal of Computer Applications (0975-8887), vol. 69, no. 9, pp. 34–39, May 2013.
[9] Areg G. Baghdasaryan and A. A. (Louis) Beex, "Automatic phoneme recognition with segmental hidden Markov models," in 2011 IEEE Conference on Signals, Systems and Computers (ASILOMAR), 2011, pp. 569–574.
[10] Anjali Bala, Abhijeet Kumar, and Nidhika Birla, "Voice command recognition system based on MFCC and DTW," International Journal of Engineering Science and Technology, vol. 2, no. 12, pp. 7335–7342, Jan. 2010.
[11] Hua Nong Ting, Boon Fei Yong, and Seyed Mostafa Mirhassani, "Self-adjustable neural network for speech recognition," International Journal of Engineering Applications of Artificial Intelligence, vol. 26, pp. 2022–2027, Jul. 2013.
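Steps 3) to 6) of Section III-A (FFT, mel filter bank, base-10 log and DCT) and the 39-dimensional feature vector can be sketched in NumPy as below. The input is an array of Hamming-windowed frames. Several choices are assumptions that the paper does not state:
- 26 triangular filters spanning 0 to 8 kHz.
- Coefficient c0 is discarded so that c1 to c12 remain.
- Frame log energy is log10 of the sum of squared windowed samples, used as a stand-in for equation (5).
- Deltas are central differences computed with `np.gradient`.

All names are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000
N_FFT = 512      # FFT size used in the paper
N_FILTERS = 26   # assumption: number of triangular mel filters
N_CEPS = 12      # 12 MFCCs per frame, as in the paper


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)  # equation (1)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)  # inverse of (1)


def mel_filter_bank(n_filters=N_FILTERS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular, overlapping filters whose edges are equally spaced in mel."""
    edges_mel = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(edges_mel) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):    # rising edge
            fb[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling edge, peak of 1 at the centre bin
            fb[m - 1, k] = (right - k) / (right - centre)
    return fb


def dct_ii(x):
    """Equation (6): X(k) = sum_n 2 x(n) cos(pi k (2n + 1) / (2N)), on the last axis."""
    x = np.asarray(x, dtype=float)
    n_pts = x.shape[-1]
    k = np.arange(n_pts)[:, None]   # output index (rows)
    n = np.arange(n_pts)[None, :]   # input index (columns)
    basis = 2.0 * np.cos(np.pi * k * (2 * n + 1) / (2 * n_pts))
    return x @ basis.T


def mfcc(windowed_frames):
    """FFT power spectrum -> mel filter bank -> log10 -> DCT -> 12 coefficients."""
    spectrum = np.fft.rfft(windowed_frames, n=N_FFT, axis=-1)  # equation (4), N = 512
    power = np.abs(spectrum) ** 2 / N_FFT
    log_mel = np.log10(np.maximum(power @ mel_filter_bank().T, 1e-10))  # floor avoids log(0)
    return dct_ii(log_mel)[:, 1:N_CEPS + 1]  # assumption: skip c0, keep c1..c12


def features_39(windowed_frames):
    """12 MFCCs + log energy, plus deltas and double deltas: 39 values per frame."""
    ceps = mfcc(windowed_frames)
    # Assumed stand-in for equation (5): log10 of each frame's energy.
    log_energy = np.log10(np.maximum((windowed_frames ** 2).sum(axis=-1), 1e-10))
    static = np.column_stack([ceps, log_energy])  # 13 static features per frame
    delta = np.gradient(static, axis=0)           # assumption: central-difference delta
    delta2 = np.gradient(delta, axis=0)           # needs at least 2 frames
    return np.hstack([static, delta, delta2])     # shape (n_frames, 39)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    frames = rng.standard_normal((50, 400)) * np.hamming(400)  # stand-in windowed frames
    print(features_39(frames).shape)                           # (50, 39)
```

The DCT is written out from equation (6) instead of calling a library routine, so the 2x scaling is visible. This is the unnormalized DCT-II convention.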
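Equations (7) and (8) of the DTW subsection translate directly into a dynamic-programming loop. The sketch below is written in Python/NumPy with hypothetical names. It generalizes the scalar distance of equation (7) to the Euclidean distance between feature vectors, because each frame here is a vector of MFCC features. An infinite border row and column make D(1, 1) equal d(1, 1).

```python
import numpy as np


def as_frames(x):
    """Treat a 1-D series as one scalar feature per frame; 2-D is (frames, features)."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw_distance(P, Q):
    """Accumulated DTW distance D(n, m) between sequences P and Q."""
    P, Q = as_frames(P), as_frames(Q)
    n, m = len(P), len(Q)
    # Local distances d(p_i, q_j): Euclidean distance between frames, equation (7).
    d = np.sqrt(((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1))
    # D has an extra row and column of +inf so that D(1, 1) = d(1, 1).
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Equation (8): cheapest of the diagonal, vertical and horizontal steps.
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return float(D[n, m])


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
    print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The double loop is O(n·m), which is acceptable for isolated words of a few dozen to a few hundred frames.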
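The KNN step of Section III-B takes the indices of the smallest DTW scores and returns the mode of their word labels. It can be sketched as follows. The distances would come from comparing the test word with every training template (N_s speakers × N_w words, following Fig. 9). The preview does not give a value for K_n, so the default `k_n=3` is a placeholder. The distances in the usage example are invented for illustration.

```python
from collections import Counter

import numpy as np


def knn_classify(distances, labels, k_n=3):
    """Return the most frequent word label among the k_n smallest DTW distances.

    distances: DTW distance from the test word to every training template
    labels:    the word label of each template (N_s speakers x N_w words in total)
    k_n:       number of nearest neighbours (placeholder default)
    """
    order = np.argsort(np.asarray(distances, dtype=float), kind="stable")
    nearest = order[:k_n]  # indices of the minimum scores
    votes = Counter(labels[i] for i in nearest)
    # Mode of the neighbours' labels. Counter.most_common keeps first-seen order
    # among equal counts, so a tie goes to the label with the closest template.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    labels = ["dark", "dark", "wash", "wash", "water", "water"]
    distances = [1.0, 1.2, 0.9, 7.0, 5.0, 6.0]      # hypothetical DTW scores
    print(knn_classify(distances, labels, k_n=3))   # dark (two of the three nearest)
    print(knn_classify(distances, labels, k_n=1))   # wash (single nearest template)
```

Tie-breaking is not specified in the paper. Favoring the closest template is one reasonable choice.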
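Equations (9) and (10), and the averaging behind Table I, can be checked numerically with the short sketch below. The helper names and the 2-word confusion matrix are hypothetical. The only real data are the per-word C values from Table I. Because each word was tested 200 times, C = 0.98 corresponds to 196 correct recognitions.

```python
import numpy as np


def build_confusion_matrix(true_idx, predicted_idx, n_words):
    """a_ij = number of times word i (true) was recognized as word j."""
    cm = np.zeros((n_words, n_words), dtype=int)
    for i, j in zip(true_idx, predicted_idx):
        cm[i, j] += 1
    return cm


def correct_match(cm):
    """Equation (9): C_i = a_ii / (a_i1 + a_i2 + ... + a_iN), one value per word."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)


def percentage_error(c):
    """Equation (10): E = (1 - C) x 100."""
    return (1.0 - np.asarray(c, dtype=float)) * 100.0


if __name__ == "__main__":
    # Hypothetical 2-word run: word 0 recognized once correctly and once as word 1.
    cm = build_confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_words=2)
    print(cm.tolist(), correct_match(cm).tolist())  # [[1, 1], [0, 2]] [0.5, 1.0]

    # Per-word C values from Table I; their mean reproduces the accumulative average.
    table_c = np.array([0.98, 0.99, 0.995, 0.975, 0.97, 0.995, 0.98, 0.985, 0.975, 0.995])
    avg_c = float(table_c.mean())
    print(round(avg_c, 3), round(float(percentage_error(avg_c)), 1))  # 0.984 1.6
```

The mean of the ten per-word values gives 0.984, which confirms the accumulative average row of Table I (98.4% accuracy, 1.6% error).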