中文翻译.docx

Isolated-Word Automatic Speech Recognition (ASR) System Using MFCC, DTW and KNN [Chinese translation, 4,600 characters]

ID: 9103516    Type: shared resource    Size: 1.49MB    Format: ZIP    Uploaded: 2018-02-28    Uploader: 闰***    IP location: Henan
The 2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast)
978-1-4673-9791-9/16/$31.00 ©2016 IEEE

Isolated Word Automatic Speech Recognition (ASR) System Using MFCC, DTW and KNN

Muhammad Atif Imtiaz
Faculty of Electronics …

Abstract: […] Features of speech are extracted using MFCCs, DTW is applied for speech feature matching, and KNN is employed as a classifier. The experimental setup includes words of the English language collected from five speakers. These words were spoken in an acoustically balanced, noise-free environment. The experimental results of the proposed ASR system are obtained in the form of a matrix called the confusion matrix. The recognition accuracy achieved in this research is 98.4%.

Keywords: ASR; MFCC; DTW; KNN

I. INTRODUCTION

Speech is the propagation of periodic variations in the air from the human lungs. The production and shaping of the actual sound is done by the human vocal tract with the help of the pharynx, nasal cavity and mouth. Automatic speech recognition (ASR) is the process of automatically interpreting human speech in a digital device, and is defined as the transformation of acoustic speech signals into a string of words. In general, the goal of every ASR system is to extract a string of words from the input speech signal [1]. In the ASR process the input is the speech utterance and the output is textual data associated with that input. The performance of an ASR system relies mainly on vocabulary size, amount of training data and the computational complexity of the system. ASR has numerous applications: it is extensively used in domestic appliances, security devices, cellular phones, ATMs and computers.

This paper describes an ASR system for the English language, tested on a small vocabulary of words. The rest of the paper is organized as follows. Section II gives an overview of the ASR system and its major blocks, while the implementation of the ASR system using feature extraction and classification techniques is described in Section III. Section IV briefly describes the experimental setup and presents some experimental results. Concluding remarks are given in Section V.

II. ASR SYSTEM OVERVIEW

An ASR system comprises two main blocks, i.e. a feature extraction block and a classification block, as shown in Fig. 1.

Fig. 1. Block diagram of proposed ASR system design.

The input to the system is speech and the output is textual data. The working of the blocks is described below.

A. Feature Extraction Block

Feature extraction is one of the most vital modules in an ASR system. In ASR, the speech signal is split up into smaller frames, usually of 10 to 25 msec. Because the speech signal contains redundant information, a feature extraction technique is applied to take out the important and useful information. This also helps to reduce dimensionality. Perceptual linear prediction (PLP) coefficients, wavelet transform based features, linear predictive coefficients (LPC), wavelet packet based features and mel frequency cepstral coefficients (MFCC) are the widely used features in ASR [2]. MFCC is used in this research and is discussed in detail in Section III.

B. Classification Block

After features are extracted from the speech signal, they are given to the classification block for recognition. In classification, the input speech feature vectors are used to train on known feature patterns, the classifier is tested on a test dataset, and its performance is evaluated in terms of percentage recognition accuracy. In this research, DTW is used for feature matching and KNN is used for classification; both are discussed further in Section III.

C. Database

In an ASR system, the database is a group of speech samples. These samples of speech data are collected in a way that illustrates different changeable aspects of language. Selection of a dataset is of significant importance for successfully conducting ASR research. It provides a platform for comparing the performance of different speech recognition techniques [3]. It also provides researchers with a balance in different speech recognition aspects, i.e. gender, age and dialect. A database is of large, medium or small size depending upon the word count. Data can be gathered from sources such as books, newspapers, magazines, lectures and TV commercials. Due to the unavailability of volunteers and some identity issues, speech databases are not easily available. Some standard speech databases are available for a few languages, like BREF for French, TIMIT for English and ATR for Japanese [4].

III. IMPLEMENTATION OF ASR SYSTEM

In this section the implementation and description of the feature extraction technique, mel frequency cepstral coefficients (MFCC), the feature matching technique, DTW, and the feature classification technique, k-nearest neighbor (KNN), are discussed in detail.

A. Mel Frequency Cepstral Coefficient

Human perception of speech as a function of frequency is not linear in nature; therefore the pitch of an acoustic speech signal of a single frequency is mapped onto a "mel" scale. In the mel scale, the frequency spacing below 1 kHz is linear and the frequency spacing above 1 kHz is logarithmic [5]. The mel frequencies corresponding to the hertz frequencies are calculated using Equation (1).

$$ F_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{1} $$

The block diagram for the computation of mel frequency cepstral coefficients (MFCC) is shown in Fig. 2.

Fig. 2. Block diagram for MFCC computation.

The inner blocks shown in Fig. 2 are individually described below in detail.

1) Pre-processing

The audio signals are recorded at a sampling rate of 16 kHz. Each word is stored in a separate audio file. The pre-processing step includes pre-emphasis of the signal to boost the energy of the signal at high frequencies. The pre-emphasis filter is given by Equation (2).

$$ H(z) = \frac{B(z)}{A(z)} = b_0 + b_1 z^{-1} = 1 - 0.97 z^{-1} \tag{2} $$

The output response of the pre-emphasis filter is shown in Fig. 3.

Fig. 3. Pre-emphasis filter output (panels: original signal, filtered signal).

2) Framing and Windowing

The speech signal is not stationary in nature. Framing is the next step after pre-processing: the speech signal is split up into smaller, mutually overlapping frames, over each of which the signal can be treated as approximately stationary. After framing, windowing is used to remove discontinuities at the edges of the frames. The window used in this research is the Hamming window, defined by Equation (3).

$$ w(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \tag{3} $$

where N is the total number of samples in a single frame. The original signal and the windowed signal are shown in Fig. 4.

Fig. 4. Original signal vs. windowed signal.

3) Fast Fourier Transform (FFT)

The fast Fourier transform is used to calculate the discrete Fourier transform (DFT) of the signal; a size of N = 512 has been used [6]. This step transforms the signal into the frequency domain. The FFT is calculated using Equation (4).

$$ X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \tag{4} $$

where N is the size of the FFT. The magnitude spectrum of the FFT is shown in Fig. 5.

Fig. 5. Fast Fourier transform magnitude spectrum (axes: frequency, amplitude).

4) Mel Filter Bank

The next step after taking the FFT of the signal is the transformation from the hertz scale to the mel scale: the power of the spectrum is transformed onto a mel scale [7]. The mel filter bank comprises triangular-shaped overlapping filters, as shown in Fig. 6.

Fig. 6. MFCC filter bank output.

5) Delta Energy

In this step the base-10 logarithm of the output of the previous step is taken. The computation of log energy is essential because the response of the human ear to the level of an acoustic speech signal is not linear: the ear is not very sensitive to differences in amplitude at higher amplitudes. The advantage of the logarithmic function is that it tends to duplicate this behavior of the human ear. The energy is calculated using Equation (5). The graph of the energy computation is shown in Fig. 7.

[Equation (5)]

Fig. 7. Signal log energy output (axes: no. of frames, log energy of frames).

6) Discrete Cosine Transform (DCT)

The discrete cosine transform (DCT) is applied after taking the logarithm of the output of the mel filter bank. It finally produces the mel frequency cepstral coefficients. In this research, 39-dimensional features are extracted for an isolated word, i.e. 12 MFCC (mel frequency cepstral coefficients), one energy feature, one delta energy feature, one double-delta energy feature, 12 delta MFCC features and 12 double-delta MFCC features. An N-point DCT [8] is defined by Equation (6).

$$ X(k) = \sum_{n=0}^{N-1} 2\, x(n) \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad k = 0, 1, 2, \ldots, N-1 \tag{6} $$

The MFCCs graph for a single word is shown in Fig. 8.

Fig. 8. MFCCs for a single word (axes: no. of frames, amplitude).

B. Classification

Classification has two phases: (i) training and (ii) testing. The results and the percentage recognition accuracy are obtained in the form of a confusion matrix. DTW and KNN are discussed in the following subsections.

1) Dynamic Time Warping (DTW)

The DTW algorithm measures the closeness of two time series that may differ in timing and speed. The comparison is made in terms of the alignment of the two time series, where one series may be warped non-linearly by stretching or contracting it along its time axis. The warping between the two time series can further be used to discover corresponding regions in the two series, or to determine the closeness between them. Numerically, DTW compares two time-ordered patterns and measures the similarity between them with the help of a minimum distance formula. Consider two time series P and Q of length n and m, i.e.

$$ P = p_1, p_2, p_3, \ldots, p_i, \ldots, p_n $$
$$ Q = q_1, q_2, q_3, \ldots, q_j, \ldots, q_m $$

For time series P and Q, the (i, j)-th component of the matrix contains the distance d(p_i, q_j) between the two points p_i and q_j [10]. The Euclidean distance formula in Equation (7) then measures the absolute distance between the two points.

$$ d(p_i, q_j) = \sqrt{(p_i - q_j)^2} \tag{7} $$

Every matrix element (i, j) corresponds to the alignment of the points p_i and q_j. The accumulated distance is then calculated using Equation (8).

$$ D(i, j) = \min\left[ D(i-1, j-1),\; D(i-1, j),\; D(i, j-1) \right] + d(i, j) \tag{8} $$

2) K-Nearest Neighbor (KNN)

The working of the KNN classifier in this research is discussed below. The KNN method consists of assigning the index of the feature vector that is nearest to the given score in the feature space. The minimum-score indices from DTW are processed in the KNN method. It maps the current feature onto the respective feature of the feature space. The same number of features is returned by KNN, but these features are from the feature space. The mode of the KNN-returned features gives the most frequent feature, and this is the recognized word.

Fig. 9. Flow diagram of KNN.

Fig. 9 shows the flow diagram of the KNN classifier; here K_n is the number of nearest neighbors, N_s is the number of speakers and N_w is the number of words in the vocabulary.

3) Confusion Matrix

In order to check the efficiency of the system, i.e. recognition accuracy and percentage of error, a confusion matrix is formed. In the case of N words, it is an N × N matrix. In the confusion matrix, the diagonal entries, A_ij for i = j, show the number of times a word i is matched correctly [11]. Similarly, the non-diagonal entries, A_ij for i ≠ j, show the number of times a word i is confused with the word j.

$$ A = \begin{bmatrix} A_{11} & A_{12} & A_{13} & \cdots & A_{1N} \\ A_{21} & A_{22} & A_{23} & \cdots & A_{2N} \\ A_{31} & A_{32} & A_{33} & \cdots & A_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & A_{N3} & \cdots & A_{NN} \end{bmatrix} $$

4) Percentage Error

The calculation of the percentage of error is very important for checking the overall system performance, and it is calculated from the confusion matrix. For this purpose a single isolated word is tested and the number of times it is recognized successfully is recorded in the diagonal entry of row i. The percentage is calculated by dividing the number of successful entries by the total number of entries. Thus the correct match c and the percentage error e for a particular word can be represented as in Equations (9) and (10). The results obtained from the confusion matrix are discussed further in Section IV.

$$ \text{correct match } c = \frac{A_{ij}}{A_{i1} + A_{i2} + A_{i3} + \cdots + A_{iN}}, \quad \text{where } i = j,\; j = 1, 2, 3, \ldots, N \tag{9} $$

$$ \%\text{ of error } e = (1 - c) \times 100 \tag{10} $$

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The experiments were performed on a small-size vocabulary of English. The setup includes words spoken by five different speakers. These words were spoken in an acoustically balanced, noise-free environment. The implementation and experimental results were analyzed with the help of MATLAB R2014b. The testing and training results of the ASR system are obtained in the form of a matrix called the confusion matrix, as shown in Fig. 10.

Fig. 10. Confusion matrix graph of words (axes: word index, word index, no. of successful or unsuccessful recognitions).

In the confusion matrix graph of Fig. 10, the x-axis and the y-axis show the indices of the words. The z-axis shows the height, i.e. the total number of times an individual word is successfully recognized or confused with any other word. The diagonal slots show heights corresponding to the successful recognition rate. The maximum possible height in this case is 200, which is the total number of times each word is tested. The values of correct match c and error e for the words are summarized in Table I.

TABLE I. RECOGNITION AND ERROR PERCENTAGE OF WORDS

Word                   Correct match c   Recognition accuracy (%)   Error (1 - c) x 100 (%)
"DARK"                 0.98              98                         2
"WASH"                 0.99              99                         1
"WATER"                0.995             99.5                       0.5
"YEAR"                 0.975             97.5                       2.5
"DON'T"                0.97              97                         3
"CARRY"                0.995             99.5                       0.5
"GREASY"               0.98              98                         2
"LIKE"                 0.985             98.5                       1.5
"OILY"                 0.975             97.5                       2.5
"THAT"                 0.995             99.5                       0.5
Accumulative average   0.984             98.4                       1.6

Table I describes the recognition and error rates of the dataset. First each word is evaluated on an individual basis, and then the accumulative average over the dataset is calculated. The data are obtained in the form of a confusion matrix as a result of testing the ASR system. The accumulative average success rate obtained for the dataset given above is 98.4%, with a 1.6% error rate.

V. CONCLUSION

The proposed research on an ASR system delineates the MFCC, DTW and KNN techniques. The extraction of features is performed using MFCC, DTW is used for matching speech features and KNN is used for classification. The minimum-score indices acquired from DTW are processed in KNN. The experimental results are obtained in the form of a confusion matrix. It is observed throughout the research that the proposed ASR system shows good recognition performance when MFCC, DTW and KNN are used jointly. The recognition accuracy achieved in this research is 98.4%, with an error of 1.6%.

REFERENCES

[1] J. M. Gilbert, S. I. Rybchenko, R. Hofe, S. R. Ell, M. J. Fagan, R. K. Moore, P. Green, "Isolated word recognition of silent speech using magnetic implants and sensors," International Journal of Medical Engineering and Physics, vol. 32, pp. 1189-1197, August 2010.
[2] Vimala C. and Dr. V. Radha, "A Review on Speech Recognition Challenges and Approaches," World of Computer Science and Information Technology Journal (WCSIT), ISSN 2221-0741, vol. 2, no. 1, pp. 1-7, 2012.
[3] J. Clear, and N. Ostler S. Atkins, "Corpus Design Criteria," Oxford Journal of Literary and Linguistic Computing, vol. 7, no. 1, pp. 1-16, 1992.
[4] L. F. Lamel, and M. Eskenazi J. L. Gauvain, "Design Considerations and Text Selection for BREF, a large French Read-Speech Corpus," in 1st International Conference on Spoken Language Processing, ICSLP, 1990, pp. 1097-1100.
[5] M. Murugappan, Nurul Qasturi Idayu Baharuddin, Jerritta S., "DWT and MFCC Based Human Emotional Speech Classification Using LDA," International Conference on Biomedical Engineering (ICoBE), Penang, 27-28 February 2012, pp. 203-206.
[6] Michael Pitz, Ralf Schluter, and Hermann Ney Sirko Molau, "Computing Mel-Frequency Cepstral Coefficients on the Power Spectrum," in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001. Proceedings (ICASSP '01), USA, 2001, pp. 73-76.
[7] Ibrahim Patel and Dr. Y. Srinivas Rao, "Speech Recognition Using HMM with MFCC - An Analysis Using Frequency Spectral Decomposition Technique," Signal & Image Processing: An International Journal (SIPIJ), vol. 1, no. 2, pp. 101-110, December 2010.
[8] A. Milton, S. Sharmy Roy, S. Tamil Selvi, "SVM Scheme for Speech Emotion Recognition using MFCC Feature," International Journal of Computer Applications (0975-8887), vol. 69, no. 9, pp. 34-39, May 2013.
[9] Areg G. Baghdasaryan and A. A. Louis Beex, "Automatic Phoneme Recognition with Segmental Hidden Markov Models," IEEE 2011 Conference on Signals, Systems and Computers, Asilomar, 2011, pp. 569-574.
[10] Anjali Bala, Abhijeet Kumar, Nidhika Birla, "Voice command recognition system based on MFCC and DTW," International Journal of Engineering Science and Technology, vol. 2, no. 12, pp. 7335-7342, Jan 2010.
[11] Hua Nong Ting, Boon Fei Yong, Seyed Mostafa Mirhassani, "Self-Adjustable Neural Network for speech recognition," International Journal of Engineering Applications of Artificial Intelligence, vol. 26, pp. 2022-2027, July 2013.
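As an illustration of the classification stage described in Section III, the DTW accumulated distance of Equations (7) and (8) and the mode-based KNN decision might be sketched as below. This is a minimal sketch in pure Python rather than the MATLAB implementation used in the paper, which is not shown in the source. The function names (`local_distance`, `dtw_distance`, `knn_classify`), the choice k = 3 and the toy one-dimensional "feature" sequences are all assumptions made for illustration; real inputs would be sequences of 39-dimensional MFCC frames.

```python
import math
from collections import Counter


def local_distance(p, q):
    """Euclidean distance between two feature vectors (Equation 7)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dtw_distance(P, Q):
    """Accumulated DTW distance between frame sequences P and Q (Equation 8)."""
    n, m = len(P), len(Q)
    inf = float("inf")
    # D[i][j] is the accumulated distance for aligning P[:i] with Q[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = best_prev + local_distance(P[i - 1], Q[j - 1])
    return D[n][m]


def knn_classify(query, templates, k=3):
    """Return the mode of the labels of the k templates with the lowest DTW score."""
    scored = sorted((dtw_distance(query, feats), label) for feats, label in templates)
    nearest_labels = [label for _, label in scored[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: each frame is a 1-element feature vector.
templates = [
    ([[0.0], [1.0], [2.0], [1.0], [0.0]], "dark"),
    ([[0.0], [1.0], [1.0], [2.0], [1.0], [0.0]], "dark"),
    ([[2.0], [1.0], [0.0], [1.0], [2.0]], "wash"),
    ([[2.0], [2.0], [1.0], [0.0], [1.0], [2.0]], "wash"),
]
# A slower rendition of the "dark" shape; DTW absorbs the timing difference.
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]]

print(knn_classify(query, templates, k=3))  # prints "dark"
```

The sketch stores the full (n + 1) × (m + 1) accumulated-distance table for readability; keeping only two rows would be enough if memory matters. Ties between equally frequent labels are resolved in favor of the label whose template has the smaller DTW score, which is a design choice of this sketch and not something the paper specifies.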
About this document
Title: Isolated-Word Automatic Speech Recognition (ASR) System Using MFCC, DTW and KNN [Chinese translation, 4,600 characters]
Link: https://www.renrendoc.com/p-9103516.html
