A System for Video Surveillance and Monitoring

Robert T. Collins, Alan J. Lipton and Takeo Kanade
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
E-MAIL: {rcollins, ajl, …}   PHONE: 412-268-1450   HOMEPAGE: /vsam

Abstract

The Robotics Institute at Carnegie Mellon University (CMU) and the Sarnoff Corporation are developing a system for autonomous Video Surveillance and Monitoring. The technical objective is to use multiple, cooperative video sensors to provide continuous coverage of people and vehicles in cluttered environments. This paper presents an overview of the system and significant results achieved to date.

1 Introduction

The DARPA Image Understanding (IU) program is funding basic research in the area of Video Surveillance and Monitoring (VSAM) to provide battlefield awareness. The thrust of CMU's VSAM research is to develop automated video understanding algorithms that allow a network of active video sensors to automatically monitor objects and events within a complex, urban environment. We have developed video understanding technology that can automatically detect and track multiple people and vehicles within cluttered scenes, and monitor their activities over long periods of time. Human and vehicle targets are seamlessly tracked through the environment using a network of active sensors that cooperatively track targets over areas that cannot be viewed continuously by a single sensor alone. Each sensor transmits symbolic events and representative imagery back to a central operator control station, which provides a visual summary of activities detected over a broad area. The user interacts with the system through an intuitive map-based interface. For example, the user can specify that objects entering a region of interest should trigger an alert, relieving the burden of continually watching that area. The system automatically allocates sensors to optimize system performance while fulfilling user commands.

Although developed within the context of providing battlefield awareness, we believe this technology has great potential for applications in remote monitoring of nuclear facilities. Sample tasks that could be automated are verification that routine maintenance activities are being performed according to schedule, logging and tracking visitors and personnel as they enter and move through the site, and providing security against unauthorized intrusion. Other applications in military and law enforcement scenarios include providing perimeter security for troops, monitoring peace treaties or refugee movements using unmanned air vehicles, providing security for embassies or airports, and staking out suspected drug or terrorist hide-outs by collecting time-stamped pictures of everyone entering and exiting the building.

The following sections present an overview of the video surveillance algorithms developed at CMU over the last two years (Section 2) and their incorporation into a prototype system for remote surveillance and monitoring (Section 3).

This work is funded by the DARPA IU program under VSAM contract number DAAB07-97-C-J031.

2 Video Understanding Technologies

Keeping track of people, vehicles, and their interactions in a complex environment is a difficult task. The role of VSAM video understanding technology in achieving this goal is to automatically "parse" people and vehicles from raw video, determine their geolocations, and automatically insert them into a dynamic scene visualization. We have developed robust routines for detecting moving objects (Section 2.1) and tracking them through a video sequence (Section 2.2) using a combination of temporal differencing and template tracking. Detected objects are classified into semantic categories such as human, human group, car, and truck using shape and color analysis, and these labels are used to improve tracking using temporal consistency constraints (Section 2.3). Further classification of human activity, such as walking and running, has also been achieved (Section 2.4). Geolocations of labeled entities are determined from their image coordinates using either wide-baseline stereo from two or more overlapping camera views, or intersection of viewing rays with a terrain model from monocular views (Section 2.5). The computed geolocations are used to provide higher-level tracking capabilities, such as tasking multiple sensors with variable pan, tilt and zoom to cooperatively track an object through the scene (Section 2.6).
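To make the structure of Section 2 easier to follow, the sketch below outlines one plausible per-frame processing loop. It is an illustration only: the class and method names (`Track`, `detect_moving_blobs`, `match_blobs_to_tracks`, and so on) are hypothetical stand-ins for the components described in Sections 2.1-2.6, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """Hypothetical per-target record maintained by the tracker (Section 2.2)."""
    track_id: int
    centroid: tuple                 # image position p(t)
    velocity: tuple                 # image velocity v(t)
    template: object = None         # appearance template
    class_votes: dict = field(default_factory=dict)   # label histogram (Section 2.3)
    geolocation: tuple = None       # 3D scene coordinates (Section 2.5)

def process_frame(frame, t_now, background, tracker, classifier, geolocator):
    """One pass of the detection -> tracking -> classification -> geolocation pipeline."""
    blobs = background.detect_moving_blobs(frame)             # Section 2.1
    tracks = tracker.match_blobs_to_tracks(blobs, t_now)      # Section 2.2
    for trk in tracks:
        label = classifier.classify(trk)                      # Section 2.3
        trk.class_votes[label] = trk.class_votes.get(label, 0) + 1
        trk.geolocation = geolocator.locate(trk.centroid)     # Sections 2.5-2.6
    return tracks   # symbolic events sent back to the operator control station
```

Each stage invoked here is expanded in the corresponding subsection below.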
2.1 Moving Target Detection

The initial stage of the surveillance problem is the extraction of moving targets from a video stream. There are three conventional approaches to automated moving target detection: temporal differencing (two-frame or three-frame) [Anderson et al., 1985]; background subtraction [Haritaoglu et al., 1998; Wren et al., 1997]; and optical flow (see [Barron et al., 1994] for an excellent discussion). Temporal differencing is very adaptive to dynamic environments, but generally does a poor job of extracting all relevant feature pixels. Background subtraction provides the most complete feature data, but is extremely sensitive to dynamic scene changes due to lighting and extraneous events. Optical flow can be used to detect independently moving targets in the presence of camera motion; however, most optical flow computation methods are very complex and are inapplicable to real-time algorithms without specialized hardware.

The approach presented here is similar to that taken in [Grimson and Viola, 1997], and is an attempt to make background subtraction more robust to environmental dynamism. The key idea is to maintain an evolving statistical model of the background that adapts to slow changes in the environment. For each pixel value p_n in the nth frame, a running average p̄_n and a form of standard deviation σ_n are maintained by temporal filtering, implemented as:

    p̄_{n+1} = α p_{n+1} + (1 − α) p̄_n
    σ_{n+1} = α |p_{n+1} − p̄_{n+1}| + (1 − α) σ_n        (1)

where the gain α is set from the frame rate f and a time constant τ specifying how fast (how responsively) the background model should adapt to intensity changes. The influence of old observations decays exponentially over time, and thus the background gradually adapts to reflect current environmental conditions.

If a pixel has a value that is more than 2σ from p̄_n, it is considered a foreground pixel. At this point a multiple-hypothesis approach is used to determine its behavior. A new set of statistics (p̄⁰, σ⁰) is initialized for this pixel and the original set is remembered. If, after time t = 3τ, the pixel value has not returned to its original statistical value, the new statistics are chosen as replacements for the old. Foreground (moving) pixels are aggregated using a connected component approach so that individual target "blobs" can be extracted. Transient moving objects cause short-term changes to the image stream that are not included in the background model but are continually tracked, whereas more permanent changes are (after a time increment of 3τ has elapsed) absorbed into the background (see Figure 1).

Figure 1: Example of moving target detection by dynamic background subtraction.

Figure 2: Target pre-processing. A moving target region is morphologically dilated (twice), eroded, and then its border is extracted.

The moving target detection algorithm described above is prone to three types of error: incomplete extraction of a moving object; erroneous extraction of non-moving pixels; and legitimate extraction of illegitimate targets (such as trees blowing in the wind). Incomplete targets are partially reconstructed by blob clustering and morphological dilation (Figure 2). Erroneously extracted "noise" is removed using a size filter whereby blobs below a certain critical size are ignored. Illegitimate targets must be removed by other means such as temporal consistency and domain knowledge. This is the purview of the target tracking algorithm.
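The per-pixel update of Equation (1) and the 2σ foreground test translate directly into a few lines of array code. The NumPy sketch below is a minimal illustration under stated assumptions: grayscale frames, a fixed gain `alpha` standing in for the frame-rate/time-constant setting described above, and no multiple-hypothesis replacement of stale statistics.

```python
import numpy as np

class AdaptiveBackground:
    """Running-mean / running-deviation background model in the spirit of Equation (1)."""

    def __init__(self, first_frame, alpha=0.05, k_sigma=2.0):
        self.mean = first_frame.astype(np.float32)   # running average of pixel values
        self.dev = np.full_like(self.mean, 5.0)      # running deviation (arbitrary initial value)
        self.alpha = alpha                           # adaptation gain
        self.k_sigma = k_sigma                       # foreground threshold in deviations

    def apply(self, frame):
        """Update the model and return a boolean foreground mask for this frame."""
        frame = frame.astype(np.float32)
        # Equation (1): exponential forgetting of old observations.
        self.mean = self.alpha * frame + (1.0 - self.alpha) * self.mean
        self.dev = self.alpha * np.abs(frame - self.mean) + (1.0 - self.alpha) * self.dev
        # A pixel more than k_sigma deviations from its running mean is foreground.
        return np.abs(frame - self.mean) > self.k_sigma * self.dev
```

A mask produced this way would still be cleaned with the size filter and morphological operations described above before connected-component labeling into blobs.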
2.2 Target Tracking

To begin to build a temporal model of activity, individual objects must be tracked over time. The first step in this process is to take the blobs generated by motion detection and match them between frames of a video sequence. Many systems for target tracking are based on Kalman filters. However, as Isard and Blake point out, these are of limited use because they are based on unimodal Gaussian densities that cannot support simultaneous alternative motion hypotheses [Isard and Blake, 1996]. Isard and Blake present a new stochastic algorithm called CONDENSATION that does handle alternative hypotheses. Work on the problem of multiple data association in radar tracking contexts is also relevant [Bar-Shalom and Fortmann, 1988].

We employ a much simpler approach based on a frame-to-frame matching cost function. A record of each blob is kept with the following information: the image trajectory (position p(t) and velocity v(t) as functions of time) of the object centroid; the blob "appearance" in the form of an image template; the blob size s in pixels; and a color histogram h of the blob. The position and velocity of each blob T_i is determined from the last time step t_last and used to predict a new image position at the current time t_now:

    p_i(t_now) ≈ p_i(t_last) + v_i(t_last)(t_now − t_last)        (2)

Using this information, a matching cost is determined between a known target T_i and a candidate moving blob R_j:

    C(T_i, R_j) = f(|p_i − p_j|, |s_i − s_j|, |h_i − h_j|).        (3)

Targets that are "close enough" in cost space are considered to be potential matches. To lend more robustness to changes in appearance and occlusions, the full tracking algorithm uses a combination of cost and adaptive template matching, as described in detail in [Lipton et al., 1998]. Recent results from the system are shown in Figure 3.

Figure 3: Recent results of moving entity detection and tracking, showing detected objects and trajectories overlaid on the original video imagery. Note that tracking persists even when targets are temporarily occluded or motionless.
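Equations (2) and (3) leave the combining function f unspecified. The sketch below therefore uses a weighted sum of the position, size, and color-histogram distances as a stand-in; the weights, the L1 histogram distance, and the attribute names on the track and blob records are assumptions for illustration, not values from [Lipton et al., 1998].

```python
import numpy as np

def predict_position(track, t_now):
    """Equation (2): constant-velocity prediction of the target centroid."""
    return track.position + track.velocity * (t_now - track.t_last)

def matching_cost(track, blob, t_now, w_pos=1.0, w_size=0.5, w_color=0.5):
    """Equation (3): cost between known target T_i and candidate blob R_j.

    The paper only states C = f(|p_i - p_j|, |s_i - s_j|, |h_i - h_j|);
    the weighted-sum form used here is an assumption.
    """
    d_pos = np.linalg.norm(predict_position(track, t_now) - blob.position)
    d_size = abs(track.size - blob.size)
    d_color = np.abs(track.histogram - blob.histogram).sum()
    return w_pos * d_pos + w_size * d_size + w_color * d_color

def best_match(track, blobs, t_now, max_cost=50.0):
    """Accept the cheapest candidate that is 'close enough' in cost space."""
    if not blobs:
        return None
    cost, blob = min(((matching_cost(track, b, t_now), b) for b in blobs),
                     key=lambda cb: cb[0])
    return blob if cost < max_cost else None
```

In the full system this cost is combined with adaptive template correlation so that tracks survive appearance changes and short occlusions.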
2.3 Target Classification

The ultimate goal of the VSAM effort is to be able to identify individual entities (such as the "FedEx truck", the "4:15pm bus to Oakland" and "Fred Smith") and determine what they are doing. As a first step, entities are classified into specific class groupings such as "humans" and "vehicles". Currently, we are experimenting with a neural network approach (Figure 4). The neural network is a standard three-layer network which uses a backpropagation algorithm for hierarchical learning. Inputs to the network are a mixture of image-based and scene-based entity parameters: dispersedness (perimeter²/area, in pixels); image area (pixels); aspect ratio (height/width); and camera zoom factor. Using a set of motion regions automatically extracted but labeled by hand, the network is trained to output one of three classes: human; vehicle; or human group (two or more humans walking close together). When teaching the network that an input entity is a human, all outputs are set to 0.0 except for "human", which is set to 1.0. Other classes are trained similarly. If the input does not fit any of the classes, such as a tree blowing in the wind, all outputs are set to 0.0.

Figure 4: Neural network approach to target classification. The network has a four-unit input layer (dispersedness, area, aspect ratio, zoom magnification), a 16-unit hidden layer, and a three-unit output layer (single human, multiple human, vehicle); the teach pattern for reject targets sets all outputs to 0.0.

Results from the neural network are interpreted as follows:

    if (output > THRESHOLD)
        classification = maximum NN output
    else
        classification = REJECT

The results for this classification scheme are summarized in Table 1.

Table 1: Results for the neural network classification algorithm.

    Class          Samples    % Correctly Classified
    Human            430         99.5
    Human group       96         88.5
    Vehicle          508         99.4
    False alarms      48         64.5
    Total           1082         96.9

This classification approach is effective for single images. However, one of the advantages of video is its temporal component. To exploit this, classification is performed on every entity at every frame, and the results are kept in a histogram, with the ith bucket containing the number of times the object was classified as class i. At each time step, the class label that has been output most often for each object is chosen as its most likely classification.

2.4 Activity Analysis

After classifying an object, we want to determine what it is doing. Understanding human activity is one of the most difficult open problems in the area of automated video surveillance. Detecting and analyzing human motion in real time from video imagery has only recently become viable, with algorithms like Pfinder [Wren et al., 1997] and W4 [Haritaoglu et al., 1998]. These algorithms represent a good first step to the problem of recognizing and analyzing humans, but they still have some drawbacks. In general, they work by detecting features (such as hands, feet and head), tracking them, and fitting them to some a priori human model such as the cardboard model of Ju et al. [Ju et al., 1996]. Therefore the human subject must dominate the image frame so that the individual body components can be reliably detected.

We use a "star" skeletonization procedure for analyzing the motion of humans that are relatively small in the image. Details can be found in [Fujiyoshi and Lipton, 1998]. The key idea is that a simple form of skeletonization, one that extracts only the broad internal motion features of a target, can be employed to analyze its motion. This method provides a simple, real-time, robust way of detecting extremal points on the boundary of the target to produce a "star" skeleton. The "star" skeleton consists of the centroid of an entity and all of the local extremal points that can be recovered by traversing the boundary of the entity's image (Figure 5a).

Figure 5: (A) The star skeleton is formed by "unwrapping" a region boundary as a distance function from the centroid. This function is then smoothed and extremal points are extracted. (B) Determination of skeleton features measuring gait and posture: θ is the angle the leftmost leg makes with the vertical, and φ is the angle the torso makes with the vertical.

Using only measurements based on the "star" skeleton, it is possible to determine the gait and posture of a moving human being. Figure 5b shows how the two angles θ_n and φ_n are extracted from the skeleton. The value φ_n represents the angle of the torso with respect to vertical, while θ_n represents the angle of the leftmost leg in the figure. Figure 6 shows skeleton motion for typical sequences of walking and running humans, along with the values of θ_n and φ_n. These data were acquired in real time from a video stream with a frame rate of 8 Hz. Comparing the average values of φ_n in Figures 6(e)-(f) shows that the posture of a running target can easily be distinguished from that of a walking one, using the angle of the torso segment as a guide. Also, the frequency of cyclic motion of the leg segments provides cues for distinguishing running from walking.

Figure 6: Skeleton motion sequences: (a) skeleton motion of a walking person; (b) skeleton motion of a running person; (c) leg angle of a walking person; (d) leg angle of a running person; (e) torso angle of a walking person; (f) torso angle of a running person. Clearly, the periodic motion of θ_n provides cues to the target's motion, as does the mean value of φ_n.
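The "star" skeleton construction can be sketched directly: unwrap the boundary as a distance-from-centroid signal, smooth it, and keep the local maxima as extremal points. In the sketch below a moving-average filter stands in for the DFT low-pass smoothing of Figure 5(A), and the lowest and highest extremal points are used as proxies for the leftmost leg and the torso direction; these simplifications, and all names, are ours rather than the exact procedure of [Fujiyoshi and Lipton, 1998].

```python
import numpy as np

def star_skeleton(boundary, centroid, smooth=7):
    """Extremal points of a blob boundary relative to its centroid.

    boundary: (N, 2) array of (x, y) points in traversal order (closed contour).
    centroid: (2,) array. Returns the boundary points whose smoothed
    distance-to-centroid d(i) is a local maximum.
    """
    d = np.linalg.norm(boundary - centroid, axis=1)       # "unwrapped" distance signal
    kernel = np.ones(smooth) / smooth
    padded = np.r_[d[-smooth:], d, d[:smooth]]            # circular padding for a closed contour
    d_s = np.convolve(padded, kernel, mode="same")[smooth:-smooth]
    is_max = (d_s > np.roll(d_s, 1)) & (d_s >= np.roll(d_s, -1))
    return boundary[is_max]

def gait_posture_angles(extremal, centroid):
    """Leg angle (theta) and torso angle (phi) from vertical, cf. Figure 5(B)."""
    lowest = extremal[np.argmax(extremal[:, 1])]          # image y grows downward: a foot point
    highest = extremal[np.argmin(extremal[:, 1])]         # head end of the torso
    theta = np.arctan2(lowest[0] - centroid[0], lowest[1] - centroid[1])
    phi = np.arctan2(highest[0] - centroid[0], centroid[1] - highest[1])
    return theta, phi
```

Tracking θ over time gives the cyclic leg motion used to estimate gait frequency, while the mean of φ separates the forward-leaning posture of running from upright walking, as in Figure 6.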
2.5 Model-based Geolocation

The video understanding techniques described so far have operated purely in image space. A large leap in terms of descriptive power can be made by transforming image blobs and measurements into 3D scene-based objects and descriptors. In particular, determination of object location in the scene allows us to infer the proper spatial relationships between sets of objects, and between objects and scene features such as roads and buildings. Furthermore, we believe the key to coherently integrating a large number of target hypotheses from multiple widely-spaced sensors is computation of target spatial geolocation.

In regions where multiple sensor viewpoints overlap, object locations can be determined very accurately by wide-baseline stereo triangulation. However, regions of the scene that can be simultaneously viewed by multiple sensors are likely to be a small percentage of the total area of regard in real outdoor surveillance applications, where it is desirable to maximize coverage of a large area given finite sensor resources. Determining target locations from a single sensor requires domain constraints, in this case the assumption that the object is in contact with the terrain. This contact location is estimated by passing a viewing ray through the bottom of the object in the image and intersecting it with a model representing the terrain (see Figure 7a). Sequences of location estimates over time are then assembled into consistent object trajectories.

Previous uses of the ray intersection technique for object localization in surveillance research have been restricted to small areas of planar terrain, where the relation between image pixels and terrain locations is a simple 2D homography [Bradshaw et al., 1997; Flinchbaugh and Bannon, 1994; Koller et al., 1993]. This has the benefit that no camera calibration is required to determine the back-projection of an image point onto the scene plane, provided the mappings of at least four coplanar scene points are known beforehand. However, large outdoor scene areas may contain significantly varied terrain. To handle this situation, we perform geolocation using ray intersection with a full terrain model provided, for example, by a digital elevation map (DEM).

Figure 7: (A) Estimating object geolocations by intersecting target viewing rays with a terrain model. (B) A Bresenham-like traversal algorithm determines which DEM cell contains the first intersection of a viewing ray and the terrain.

Given a calibrated sensor, and an image pixel corresponding to the assumed contact point between an object and the terrain, a viewing ray (x0 + ku, y0 + kv, z0 + kw) is constructed, where (x0, y0, z0) is the 3D sensor location, (u, v, w) is a unit vector designating the direction of the viewing ray emanating from the sensor, and k ≥ 0 is an arbitrary distance. General methods for determining where a viewing ray first intersects a 3D scene (for example, ray tracing) can be quite involved. However, when scene structure is stored as a DEM, a simple geometric traversal algorithm suggests itself, based on the well-known Bresenham algorithm for drawing digital line segments. Consider the vertical projection of the viewing ray onto the DEM grid (see Figure 7b). Starting at the grid cell (x0, y0) containing the sensor, each cell (x, y) that the ray passes through is examined in turn, progressing outward, until the elevation stored in that DEM cell exceeds the z-component of the 3D viewing ray at that location. The z-component of the viewing ray at location (x, y) is computed as either

    z0 + (x − x0)(w/u)    or    z0 + (y − y0)(w/v)        (4)

depending on which direction cosine, u or v, is larger. This approach to viewing ray intersection localizes objects to lie within the boundaries of a single DEM grid cell. A more precise sub-cell location estimate can then be obtained by interpolation. If multiple intersections with the terrain beyond the first are required, this algorithm can be used to generate them in order of increasing distance from the sensor, out to some cut-off distance. See [Collins et al., 1998] for more details.
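The Bresenham-like traversal of Figure 7(B) is straightforward to sketch. The code below steps one cell at a time along the larger horizontal direction cosine and applies Equation (4) for the ray height at each cell; uniform square cells, direct array indexing of the DEM, and the requirement of a nonzero horizontal ray component are our simplifications, not details from [Collins et al., 1998].

```python
import numpy as np

def first_dem_intersection(dem, sensor, direction, cell=1.0, max_cells=5000):
    """Return the (ix, iy) DEM cell where a viewing ray first meets the terrain, or None.

    dem: 2D elevation array indexed as dem[ix, iy] with cells of size `cell`.
    sensor: (x0, y0, z0) sensor location; direction: unit vector (u, v, w) with
    a nonzero horizontal component.
    """
    x0, y0, z0 = sensor
    u, v, w = direction
    for step in range(1, max_cells):
        # Advance one cell along the dominant horizontal axis (Bresenham-like march).
        if abs(u) >= abs(v):
            x = x0 + np.sign(u) * step * cell
            y = y0 + v * (x - x0) / u
            z = z0 + (x - x0) * (w / u)          # Equation (4)
        else:
            y = y0 + np.sign(v) * step * cell
            x = x0 + u * (y - y0) / v
            z = z0 + (y - y0) * (w / v)          # Equation (4)
        ix, iy = int(round(x / cell)), int(round(y / cell))
        if not (0 <= ix < dem.shape[0] and 0 <= iy < dem.shape[1]):
            return None                           # ray left the mapped area
        if dem[ix, iy] >= z:                      # stored elevation exceeds ray height
            return ix, iy
    return None
```

A sub-cell location estimate could then be refined by interpolating the terrain within the returned cell, as noted above.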
2.6 Multi-Sensor Cooperation

In most complex outdoor scenes, it is impossible for a single sensor to maintain its view of an object for long periods of time. Objects become occluded by environmental features such as trees and buildings, and sensors have limited effective fields of regard. A promising solution to this problem is to use a network of video sensors to cooperatively track an object through the scene. Tracked objects are then handed off between cameras to greatly extend the total effective area of surveillance coverage.

There has been little work done on autonomously coordinating multiple active video sensors to cooperatively track a moving target. One approach is presented by Matsuyama for a controlled indoor environment, where four cameras lock onto a particular object moving across the floor [Matsuyama, 1998]. We approach the problem more generally by using the object's 3D geolocation, as computed in the last section, to determine where each sensor should look. The pan, tilt and zoom of the closest sensor …