




A System for Video Surveillance and Monitoring

Robert T. Collins, Alan J. Lipton and Takeo Kanade
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
E-MAIL: frcollins,ajl, PHONE: 412-268-1450 HOMEPAGE: /vsam

Abstract

The Robotics Institute at Carnegie Mellon University (CMU) and the Sarnoff Corporation are developing a system for autonomous Video Surveillance and Monitoring. The technical objective is to use multiple, cooperative video sensors to provide continuous coverage of people and vehicles in cluttered environments. This paper presents an overview of the system and significant results achieved to date.

1 Introduction

The DARPA Image Understanding (IU) program is funding basic research in the area of Video Surveillance and Monitoring (VSAM) to provide battlefield awareness. The thrust of CMU's VSAM research is to develop automated video understanding algorithms that allow a network of active video sensors to automatically monitor objects and events within a complex, urban environment. We have developed video understanding technology that can automatically detect and track multiple people and vehicles within cluttered scenes, and monitor their activities over long periods of time. Human and vehicle targets are seamlessly tracked through the environment using a network of active sensors to cooperatively track targets over areas that cannot be viewed continuously by a single sensor alone. Each sensor transmits symbolic events and representative imagery back to a central operator control station, which provides a visual summary of activities detected over a broad area. The user interacts with the system using an intuitive map-based interface. For example, the user can specify that objects entering a region of interest should trigger an alert, relieving the burden of continually watching that area. The system automatically allocates sensors to optimize system performance while fulfilling user commands.

Although developed within a context of providing battlefield awareness, we believe this technology has great potential for applications in remote monitoring of nuclear facilities. Sample tasks that could be automated are verification that routine maintenance activities are being performed according to schedule, logging and tracking visitors and personnel as they enter and move through the site, and providing security against unauthorized intrusion. Other applications in military and law enforcement scenarios include providing perimeter security for troops, monitoring peace treaties or refugee movements using unmanned air vehicles, providing security for embassies or airports, and staking out suspected drug or terrorist hide-outs by collecting time-stamped pictures of everyone entering and exiting the building.

The following sections present an overview of video surveillance algorithms developed at CMU over the last two years (Section 2) and their incorporation into a prototype system for remote surveillance and monitoring (Section 3).

This work is funded by the DARPA IU program under VSAM contract number DAAB07-97-C-J031.

Collins-1

2 Video Understanding Technologies

Keeping track of people, vehicles, and their interactions in a complex environment is a difficult task. The role of VSAM video understanding technology in achieving this goal is to automatically “parse” people and vehicles from raw video, determine their geolocations, and automatically insert them into a dynamic scene visualization. We have developed robust routines for detecting moving objects (Section 2.1) and tracking them through a video sequence (Section 2.2) using a combination of temporal differencing and template tracking. Detected objects are classified into semantic categories such as human, human group, car, and truck using shape and color analysis, and these labels are used to improve tracking using temporal consistency constraints (Section 2.3). Further classification of human activity, such as walking and running, has also been achieved (Section 2.4). Geolocations of labeled entities are determined from their image coordinates using either wide-baseline stereo from two or more overlapping camera views, or intersection of viewing rays with a terrain model from monocular views (Section 2.5). The computed geolocations are used to provide higher-level tracking capabilities, such as tasking multiple sensors with variable pan, tilt and zoom to cooperatively track an object through the scene (Section 2.6).

2.1 Moving Target Detection

The initial stage of the surveillance problem is the extraction of moving targets from a video stream. There are three conventional approaches to automated moving target detection: temporal differencing (two-frame or three-frame) [Anderson et al., 1985]; background subtraction [Haritaoglu et al., 1998, Wren et al., 1997]; and optical flow (see [Barron et al., 1994] for an excellent discussion). Temporal differencing is very adaptive to dynamic environments, but generally does a poor job of extracting all relevant feature pixels. Background subtraction provides the most complete feature data, but is extremely sensitive to dynamic scene changes due to lighting and extraneous events. Optical flow can be used to detect independently moving targets in the presence of camera motion; however, most optical flow computation methods are very complex and are inapplicable to real-time algorithms without specialized hardware.

The approach presented here is similar to that taken in [Grimson and Viola, 1997], and is an attempt to make background subtraction more robust to environmental dynamism. The key idea is to maintain an evolving statistical model of the background to provide a mechanism that adapts to slow changes in the environment. For each pixel value $p_n$ in the $n$th frame, a running average $\bar{p}_n$ and a form of standard deviation $\sigma_n$ are maintained by temporal filtering, implemented as:

$$\bar{p}_{n+1} = \alpha\, p_{n+1} + (1 - \alpha)\, \bar{p}_n$$
$$\sigma_{n+1} = \alpha\, |p_{n+1} - \bar{p}_{n+1}| + (1 - \alpha)\, \sigma_n \qquad (1)$$

where $\alpha = \tau f$, $f$ is the frame rate and $\tau$ is a time constant specifying how fast (responsive) the background model should be to intensity changes. The influence of old observations decays exponentially over time, and thus the background gradually adapts to reflect current environmental conditions.

If a pixel has a value which is more than $2\sigma$ from $\bar{p}_n$, then it is considered a foreground pixel. At this point a multiple hypothesis approach is used for determining its behavior. A new set of statistics $(\bar{p}', \sigma')$ is initialized for this pixel and the original set is remembered. If, after time $t = 3\tau$, the pixel value has not returned to its original statistical value, the new statistics are chosen as replacements for the old.

Foreground (moving) pixels are aggregated using a connected component approach so that individual target “blobs” can be extracted. Transient moving objects cause short-term changes to the image stream that are not included in the background model, but are continually tracked, whereas more permanent changes are (after a time increment of $3\tau$ has elapsed) absorbed into the background (see Figure 1).

Figure 1: Example of moving target detection by dynamic background subtraction. Panels (A) and (B).

The moving target detection algorithm described above is prone to three types of error: incomplete extraction of a moving object; erroneous extraction of non-moving pixels; and legitimate extraction of illegitimate targets (such as trees blowing in the wind). Incomplete targets are partially reconstructed by blob clustering and morphological dilation (Figure 2). Erroneously extracted “noise” is removed using a size filter whereby blobs below a certain critical size are ignored. Illegitimate targets must be removed by other means such as temporal consistency and domain knowledge. This is the purview of the target tracking algorithm.

Figure 2: Target pre-processing. A moving target region is morphologically dilated (twice), eroded and then its border is extracted.

2.2 Target Tracking

To begin to build a temporal model of activity, individual objects must be tracked over time. The first step in this process is to take the blobs generated by motion detection and match them between frames of a video sequence.

Many systems for target tracking are based on Kalman filters. However, as Isard and Blake point out, these are of limited use because they are based on unimodal Gaussian densities that cannot support simultaneous alternative motion hypotheses [Isard and Blake, 1996]. Isard and Blake present a new stochastic algorithm called CONDENSATION that does handle alternative hypotheses. Work on the problem of multiple data association in radar tracking contexts is also relevant [Bar-Shalom and Fortmann, 1988].

We employ a much simpler approach based on a frame-to-frame matching cost function. A record of each blob is kept with the following information: the image trajectory (position $p(t)$ and velocity $v(t)$ as functions of time) of the object centroid; the blob “appearance” in the form of an image template; the blob size $s$ in pixels; and the color histogram $h$ of the blob.

The position and velocity of each blob $T_i$ are determined from the last time step $t_{last}$ and used to predict a new image position at the current time $t_{now}$:

$$p_i(t_{now}) \approx p_i(t_{last}) + v_i(t_{last})\,(t_{now} - t_{last}) \qquad (2)$$

Using this information, a matching cost is determined between a known target $T_i$ and a candidate moving blob $R_j$:

$$C(T_i, R_j) = f(|p_i - p_j|,\; |s_i - s_j|,\; |h_i - h_j|) \qquad (3)$$

Targets that are “close enough” in cost space are considered to be potential matches. To lend more robustness to changes in appearance and occlusions, the full tracking algorithm uses a combination of cost and adaptive template matching, as described in detail in [Lipton et al., 1998]. Recent results from the system are shown in Figure 3.

Figure 3: Recent results of moving entity detection and tracking showing detected objects and trajectories overlaid on original video imagery. Note that tracking persists even when targets are temporarily occluded or motionless.

2.3 Target Classification

The ultimate goal of the VSAM effort is to be able to identify individual entities (such as the “FedEx truck”, the “4:15pm bus to Oakland” and “Fred Smith”) and determine what they are doing. As a first step, entities are classified into specific class groupings such as “humans” and “vehicles”. Currently, we are experimenting with a neural network approach (Figure 4). The neural network is a standard three-layer network which uses a backpropagation algorithm for hierarchical learning. Inputs to the network are a mixture of image-based and scene-based entity parameters: dispersedness (perimeter$^2$/area, in pixels); image area (pixels); aspect ratio (height/width); and camera zoom factor. Using a set of motion regions automatically extracted but labeled by hand, the network is trained to output one of three classes: human; vehicle; or human group (two or more humans walking close together). When teaching the network that an input entity is a human, all outputs are set to 0.0 except for “human”, which is set to 1.0. Other classes are trained similarly. If the input does not fit any of the classes, such as a tree blowing in the wind, all outputs are set to 0.0.

Figure 4: Neural network approach to target classification. Input layer (4 units: area, dispersedness, aspect ratio, zoom magnification); hidden layer (16 units); output layer (3 units: vehicle, single human, multiple human). The example teach pattern for a single human sets that output to 1.0 and the others to 0.0; a reject target sets all outputs to 0.0.

Results from the neural network are interpreted as follows:

    if (output > THRESHOLD)
        classification = maximum NN output
    else
        classification = REJECT

The results for this classification scheme are summarized in Table 1.

    Class          Samples   % Correctly Classified
    Human          430       99.5
    Human group    96        88.5
    Vehicle        508       99.4
    False alarms   48        64.5
    Total          1082      96.9

Table 1: Results for neural network classification algorithm.

This classification approach is effective for single images. However, one of the advantages of video is its temporal component. To exploit this, classification is performed on every entity at every frame and the results of classification are kept in a histogram, with the $i$th bucket containing the number of times the object was classified as class $i$. At each time step, the class label that has been output most often for each object is chosen as its most likely classification.

2.4 Activity Analysis

After classifying an object, we want to determine what it is doing. Understanding human activity is one of the most difficult open problems in the area of automated video surveillance. Detecting and analyzing human motion in real time from video imagery has only recently become viable with algorithms like Pfinder [Wren et al., 1997] and W4 [Haritaoglu et al., 1998]. These algorithms represent a good first step to the problem of recognizing and analyzing humans, but they still have some drawbacks. In general, they work by detecting features (such as hands, feet and head), tracking them, and fitting them to some a priori human model such as the cardboard model of Ju et al. [Ju et al., 1996]. Therefore the human subject must dominate the image frame so that the individual body components can be reliably detected.

We use a “star” skeletonization procedure for analyzing the motion of humans that are relatively small in the image. Details can be found in [Fujiyoshi and Lipton, 1998]. The key idea is that a simple form of skeletonization that only extracts the broad internal motion features of a target can be employed to analyze its motion. This method provides a simple, real-time, robust way of detecting extremal points on the boundary of the target to produce a “star” skeleton. The “star” skeleton consists of the centroid of an entity and all of the local extremal points which can be recovered when traversing the boundary of the entity's image (Figure 5a).

Figure 5: (A) The star skeleton is formed by “unwrapping” a region boundary as a distance function from the centroid. This function is then smoothed and extremal points are extracted. (B) Determination of skeleton features measuring gait and posture. $\theta$ is the angle the leftmost leg makes with the vertical, and $\phi$ is the angle the torso makes with the vertical.

Using only measurements based on the “star” skeleton, it is possible to determine the gait and posture of a moving human being. Figure 5b shows how two angles $\phi_n$ and $\theta_n$ are extracted from the skeleton. The value $\phi_n$ represents the angle of the torso with respect to vertical, while $\theta_n$ represents the angle of the leftmost leg in the figure. Figure 6 shows skeleton motion for typical sequences of walking and running humans, along with the values of $\theta_n$ and $\phi_n$. These data were acquired in real time from a video stream with frame rate 8 Hz. Comparing the average values of $\phi_n$ in Figures 6(e)-(f) shows that the posture of a running target can easily be distinguished from that of a walking one using the angle of the torso segment as a guide. Also, the frequency of cyclic motion of the leg segments provides cues to distinguishing running from walking.

Figure 6: Skeleton motion sequences. Panels: (a) skeleton motion of a walking person; (b) skeleton motion of a running person; (c) leg angle of a walking person; (d) leg angle of a running person; (e) torso angle of a walking person; (f) torso angle of a running person. Angles are plotted in radians against frame number, with frames 0.125 sec apart. Clearly, the periodic motion of $\theta_n$ provides cues to the target's motion, as does the mean value of $\phi_n$.

2.5 Model-based Geolocation

The video understanding techniques described so far have operated purely in image space. A large leap in terms of descriptive power can be made by transforming image blobs and measurements into 3D scene-based objects and descriptors. In particular, determination of object location in the scene allows us to infer the proper spatial relationships between sets of objects, and between objects and scene features such as roads and buildings. Furthermore, we believe the key to coherently integrating a large number of target hypotheses from multiple widely-spaced sensors is computation of target spatial geolocation.

In regions where multiple sensor viewpoints overlap, object locations can be determined very accurately by wide-baseline stereo triangulation. However, regions of the scene that can be simultaneously viewed by multiple sensors are likely to be a small percentage of the total area of regard in real outdoor surveillance applications, where it is desirable to maximize coverage of a large area given finite sensor resources. Determining target locations from a single sensor requires domain constraints, in this case the assumption that the object is in contact with the terrain. This contact location is estimated by passing a viewing ray through the bottom of the object in the image and intersecting it with a model representing the terrain (see Figure 7a). Sequences of location estimates over time are then assembled into consistent object trajectories.

Previous uses of the ray intersection technique for object localization in surveillance research have been restricted to small areas of planar terrain, where the relation between image pixels and terrain locations is a simple 2D homography [Bradshaw et al., 1997, Flinchbaugh and Bannon, 1994, Koller et al., 1993]. This has the benefit that no camera calibration is required to determine the back-projection of an image point onto the scene plane, provided the mappings of at least four coplanar scene points are known beforehand. However, large outdoor scene areas may contain significantly varied terrain. To handle this situation, we perform geolocation using ray intersection with a full terrain model provided, for example, by a digital elevation map (DEM).

Figure 7: (A) Estimating object geolocations by intersecting target viewing rays with a terrain model. (B) A Bresenham-like traversal algorithm determines which DEM cell contains the first intersection of a viewing ray and the terrain.

Given a calibrated sensor, and an image pixel corresponding to the assumed contact point between an object and the terrain, a viewing ray $(x_0 + ku,\; y_0 + kv,\; z_0 + kw)$ is constructed, where $(x_0, y_0, z_0)$ is the 3D sensor location, $(u, v, w)$ is a unit vector designating the direction of the viewing ray emanating from the sensor, and $k \geq 0$ is an arbitrary distance. General methods for determining where a viewing ray first intersects a 3D scene (for example, ray tracing) can be quite involved. However, when scene structure is stored as a DEM, a simple geometric traversal algorithm suggests itself, based on the well-known Bresenham algorithm for drawing digital line segments. Consider the vertical projection of the viewing ray onto the DEM grid (see Figure 7b). Starting at the grid cell $(x_0, y_0)$ containing the sensor, each cell $(x, y)$ that the ray passes through is examined in turn, progressing outward, until the elevation stored in that DEM cell exceeds the z-component of the 3D viewing ray at that location. The z-component of the view ray at location $(x, y)$ is computed as either

$$z_0 + \frac{(x - x_0)}{u}\, w \quad \text{or} \quad z_0 + \frac{(y - y_0)}{v}\, w \qquad (4)$$

depending on which direction cosine, $u$ or $v$, is larger. This approach to viewing ray intersection localizes objects to lie within the boundaries of a single DEM grid cell. A more precise sub-cell location estimate can then be obtained by interpolation. If multiple intersections with the terrain beyond the first are required, this algorithm can be used to generate them in order of increasing distance from the sensor, out to some cut-off distance. See [Collins et al., 1998] for more details.

2.6 Multi-Sensor Cooperation

In most complex outdoor scenes, it is impossible for a single sensor to maintain its view of an object for long periods of time. Objects become occluded by environmental features such as trees and buildings, and sensors have limited effective fields of regard. A promising solution to this problem is to use a network of video sensors to cooperatively track an object through the scene. Tracked objects are then handed off between cameras to greatly extend the total effective area of surveillance coverage.

There has been little work done on autonomously coordinating multiple active video sensors to cooperatively track a moving target. One approach is presented by Matsuyama for a controlled indoor environment where four cameras lock onto a particular object moving across the floor [Matsuyama, 1998]. We approach the problem more generally by using the object's 3D geolocation as computed in the last section to determine where each sensor should look. The pan, tilt and zoom of the closest se
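
As an illustration of the adaptive background model in Section 2.1, the following Python sketch applies the Equation (1) update and the two-standard-deviation foreground test to a frame stored as a flat list of pixel intensities. The function names, the flat-list frame layout, and the parameter `k` are assumptions made for this example. The sketch omits the multiple-hypothesis bookkeeping and the connected-component grouping described in the text, so it should be read as a minimal sketch, not the system's implementation.

```python
def foreground_mask(frame, mean, sigma, k=2.0):
    """Flag pixels whose value is more than k * sigma away from the running average.

    frame, mean and sigma are equal-length lists of per-pixel values.
    k=2.0 corresponds to the two-standard-deviation test described in Section 2.1.
    """
    return [abs(p - m) > k * s for p, m, s in zip(frame, mean, sigma)]


def update_background(frame, mean, sigma, alpha):
    """Apply one step of the Equation (1) temporal filter to every pixel.

    alpha is the blending weight of the newest frame (between 0 and 1).
    Returns the new (mean, sigma) lists and leaves the inputs unmodified.
    """
    new_mean, new_sigma = [], []
    for p, m, s in zip(frame, mean, sigma):
        # Running average: blend the new observation with the old average.
        m_next = alpha * p + (1.0 - alpha) * m
        # Spread estimate: blend the new absolute deviation with the old spread.
        s_next = alpha * abs(p - m_next) + (1.0 - alpha) * s
        new_mean.append(m_next)
        new_sigma.append(s_next)
    return new_mean, new_sigma
```

In use, each incoming frame would first be tested with `foreground_mask` against the current model and then folded into the model with `update_background`. A smaller `alpha` makes the background adapt more slowly to intensity changes.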
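
The DEM traversal of Section 2.5 can be sketched in the same hedged way. The Python function below steps one grid cell at a time along the dominant horizontal axis of the ray, evaluates the ray height with Equation (4), and stops at the first cell whose stored elevation exceeds it. Several details are assumptions for illustration only: the function name, the DEM as a list of rows indexed `dem[y][x]` with unit-sized cells, sensor coordinates given in grid units, and the `max_steps` cut-off. Stepping along the dominant axis visits one cell per column (or row), as in Bresenham line drawing; this simplifies the traversal described in the text, and sub-cell interpolation is not included.

```python
def first_terrain_hit(dem, x0, y0, z0, u, v, w, max_steps=1000):
    """Return the (x, y) DEM cell where the viewing ray first falls below the terrain.

    dem        -- list of rows, dem[y][x] is the elevation of cell (x, y)
    x0, y0     -- integer grid cell containing the sensor
    z0         -- sensor height, in the same units as the DEM elevations
    u, v, w    -- direction of the viewing ray
    Returns None if the ray leaves the grid or exceeds max_steps without a hit.
    """
    if u == 0 and v == 0:
        return None  # a purely vertical ray never leaves the sensor's cell
    step_along_x = abs(u) >= abs(v)  # pick the larger horizontal direction cosine
    for i in range(1, max_steps + 1):
        if step_along_x:
            x = x0 + (i if u > 0 else -i)
            k = (x - x0) / u              # distance parameter along the ray
            y = int(round(y0 + k * v))
        else:
            y = y0 + (i if v > 0 else -i)
            k = (y - y0) / v
            x = int(round(x0 + k * u))
        if not (0 <= y < len(dem) and 0 <= x < len(dem[0])):
            return None                   # ray left the DEM without intersecting
        z = z0 + k * w                    # Equation (4): ray height over this cell
        if dem[y][x] > z:
            return (x, y)                 # terrain is above the ray: first hit
    return None
```

Continuing the loop past the first hit, rather than returning, would enumerate later intersections in order of increasing distance from the sensor, as the text notes.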