Attachment 2: Thesis Abstract (Chinese and English)

Author: Guofeng Zhang
Thesis title: Video Scene Reconstruction and Enhancement
About the author: Guofeng Zhang, male, born in November 1981, began his doctoral studies under Prof. Hujun Bao at Zhejiang University in September 2003 and received his Ph.D. in June 2009.

Chinese Abstract

With the rapid development of information acquisition and processing technologies, how to use computers to represent the virtual and the real world efficiently and realistically, and to achieve deep interaction and fusion between the two, has become a very important research topic. As the objects to be handled grow ever more complex, the conventional approach of forward 3D modeling followed by rendering faces great challenges in realism, computational efficiency, and naturalness of interaction. Visual information such as images and video, by contrast, is easy to capture directly from the real world, and computer vision techniques can extract from it geometric and illumination information consistent with human visual perception, which effectively compensates for the limitations of traditional graphics techniques. However, captured image and video sequences only partially sample the projection of the real world onto the image plane and do not directly reflect the 3D structure of the actual scene, so it is hard for a computer to understand complex real scenes accurately and automatically. This seriously hinders the deeper use of image and video information. Geometric reconstruction and motion recovery for image and video scenes are therefore the key to the problem, and they are also the core problems of 3D vision.

Against this background, this dissertation studies 3D geometric reconstruction and motion recovery for video scenes. By fully exploiting the coherence and redundancy of the information in a video sequence, and by combining a keyframe representation of the video scene with complementary statistics gathered over multiple frames, we propose an efficient and robust global optimization framework that solves several difficult problems: high-accuracy recovery of camera parameters, depth, and optical flow, and layered segmentation of video scenes. This lays the foundation for important applications such as large-scale urban 3D modeling, autonomous visual navigation, video scene understanding and reuse, and interactive fusion of virtual and real content. The work of this dissertation covers the following four aspects.

1. Video-based Automatic Camera Tracking

Automatic camera tracking is a fundamental problem in computer vision and underlies many other vision problems, with wide applications in urban design and planning, military training and exercises, and film and entertainment. Existing methods, however, hit bottlenecks of varying severity in scale, efficiency, and stability, which seriously limits their practical use. Camera motion parameters are usually recovered from image or video sequences with structure-from-motion (SFM) techniques; the pipeline involves feature matching and tracking, motion and structure initialization, self-calibration, and bundle adjustment. Previous methods fall short in initializing motion and structure and in choosing when to apply self-calibration to upgrade the reconstruction from projective to metric space, which greatly harms reconstruction stability. Based on the idea of combining a simplified keyframe representation with the complementary strengths of multi-frame solving, we propose an efficient and robust video-based automatic camera tracking technique. It adopts a keyframe-based solving framework and, by optimizing the keyframe solving order, choosing the best moment for self-calibration, and localizing bundle adjustment, greatly improves the stability and efficiency of solving long sequences with varying focal length.

Accurate camera parameter recovery also depends heavily on the quality of feature tracking. Feature matches must be very accurate, and the tracks should be as long as possible, so that the established constraints are sufficiently complete to avoid drift in SFM reconstruction. One key difficulty is how to quickly recognize and match the common features (those corresponding to the same 3D scene point) scattered over non-consecutive frames; traditional KLT tracking can hardly handle this case and easily causes SFM drift. To solve this, we propose a new non-consecutive feature tracking method. A two-pass matching strategy effectively extends the lifetime of feature tracks over consecutive frames, and a fast matching-matrix computation finds the subsequences with matching content, so that features on non-consecutive frames can be matched and the common feature tracks distributed over different subsequences merged. The method handles not only loopback sequences but also matching across multiple video sequences, registering the 3D structures and camera trajectories recovered from each sequence into a common world coordinate system. It markedly improves both the accuracy and the scale of camera tracking, laying the foundation for dense 3D reconstruction of large scenes and for efficient, stable real-time tracking.

This work was published at CVPR 2007, a top conference in computer vision, and at the important conference ECCV 2010. The automatic camera tracking system ACTS developed from it outperforms the well-known commercial software Boujou Three on long sequences with varying focal length. A software copyright has been filed for the system, which was released online at the end of July 2009 and now has more than 350 registered users; it has become a basic platform for much of our other research. Further work, which builds a simplified keyframe representation of the scene in offline preprocessing and combines it with fast online keyframe recognition and matching, achieves online real-time camera tracking at street scale and was published at the top vision conference ICCV 2009.

2. Spatio-temporally Consistent Depth Recovery from Video Sequences

With the rapid development and spread of digital capture and display devices, the demand for 3D vision techniques is growing ever more urgent. To recover high-quality depth information from captured images and video, and building on the idea of complementary multi-frame statistics, we propose a dense depth recovery algorithm for video sequences. It creatively introduces bundle optimization into multi-view stereo depth recovery: the depth variables of the frames are linked by geometric coherence constraints, and statistics are gathered over multiple frames for global optimization. This effectively suppresses the influence of noise, occlusion, and error on depth recovery, and solves the problems of spatio-temporal consistency and boundary artifacts. On this basis, we further propose a multi-pass belief propagation algorithm that extends the number of depth levels in the global optimization at little extra computational cost, thereby improving depth accuracy. High-quality depth recovery directly advances many related applications. This work was published as an oral paper at the top vision conference CVPR 2008 (acceptance rate 4%), and an extended and improved version appeared in the top vision and artificial intelligence journal IEEE Transactions on Pattern Analysis and Machine Intelligence (2008 impact factor 5.96). It has so far been cited more than 20 times (per Google Scholar).

3. Stereoscopic Conversion of Monocular Video

Stereoscopic 3D video is clearly the trend, especially after the unprecedented success of 3D films such as Avatar and the rise of 3D television, and converting captured 2D images and video to 3D has attracted unprecedented attention. Conventional stereoscopic video production requires special capture hardware or heavy manual modeling for format conversion and is very expensive; for already-captured monocular video, only the latter route of 2D-to-3D post-conversion is available, and efficient, convenient methods are still lacking. In this context, building on automatic camera tracking, we propose an automatic and efficient stereoscopic conversion algorithm for monocular video. It cleverly bypasses dense depth recovery and casts stereoscopic conversion as a nonlinear energy optimization that jointly considers stereoscopic effect, content similarity, and visual smoothness; by converting motion parallax directly into binocular parallax, it quickly turns monocular video shot with a moving camera into binocular stereoscopic video. This work was published in the international journal IEEE Transactions on Visualization and Computer Graphics and was exhibited at the national "Tenth Five-Year Plan" major scientific and technological innovation achievements exhibition. It has been cited 6 times (per Google Scholar).

4. Depth-based Video Segmentation, Editing, and Virtual-Real Composition

With the massive spread of digital cameras, video has become very easy to capture, and more and more video is published and shared over the Internet. How to exploit this massive video data for editing, virtual-real composition, and re-creation is a practically meaningful but highly challenging problem. It requires not only recovering correct camera parameters and the 3D geometry of the static scene, but also segmenting the video scene and recovering the motion and 3D information of dynamic objects. Probably because accurate depth recovery is difficult, the vast majority of segmentation techniques use only color information. Clearly, if depth and motion information can be incorporated into the optimization energy function and correspondence statistics established across multiple frames, much of the ambiguity of relying on color alone can be resolved and segmentation stability improved. Based on this idea, we propose a new moving foreground extraction method that brings depth recovery, optical flow estimation, and moving object segmentation into a unified framework and optimizes them iteratively. Exploiting the information advantage of multiple frames, it extracts the moving foreground while also estimating the optical flow of the whole video scene and the depth of the static scene. The method breaks the traditional requirement of a fixed camera and needs no prior background model, so it handles well the case of a freely moving camera and a background with complex depth layers.

Combining camera tracking, depth recovery, and video segmentation, we further propose a semi-automatic framework for video re-creation and virtual-real composition. It resolves the geometric, illumination, and occlusion consistency problems in video-based editing and interactive virtual-real fusion, makes varied video resources effectively usable, and enriches the means and diversity of video editing. For static scenes in particular, we propose a fast depth-based layering method that greatly improves the efficiency of video scene layering. Building on high-quality depth recovery and video layering, we discuss how to realistically composite virtual 3D objects and video objects, including geometric consistency and realistic lighting and shadow effects, and present several video special effects such as object removal and camouflage, bullet-time simulation, scene fog, and depth-of-field. This work was published at the top vision conference ICCV 2007 and in the international journals IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Visualization and Computer Graphics, and Computer Animation and Virtual Worlds, and has been cited more than 15 times in total (per Google Scholar).

Keywords: camera tracking, depth recovery, motion estimation, video segmentation, mixed reality, video editing, video enhancement
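The geometric coherence constraint at the heart of part 2 links the depth estimates of different frames by requiring that a pixel, back-projected with its depth and re-projected into a neighboring frame, land on a consistent match. Below is a minimal numpy sketch of that per-pixel warp, assuming for illustration a single shared intrinsic matrix K and world-to-camera poses (R, T); the function name and signature are hypothetical, not the thesis's actual interface:

```python
import numpy as np

def project_to_frame(x, y, d, K, R_t, T_t, R_s, T_s):
    """Warp pixel (x, y) of frame t, with depth d, into frame s.

    Cameras follow the world-to-camera model p_cam = R @ p_world + T;
    a shared intrinsic matrix K is an assumption for simplicity.
    """
    # Back-project the pixel to a 3D point in frame t's camera space
    # (K^-1 [x, y, 1]^T has unit z, so scaling by d sets the depth).
    p_t = d * (np.linalg.inv(K) @ np.array([x, y, 1.0]))
    # Camera space of frame t -> world -> camera space of frame s.
    p_world = R_t.T @ (p_t - T_t)
    p_s = R_s @ p_world + T_s
    # Perspective projection onto frame s's image plane.
    uvw = K @ p_s
    return uvw[:2] / uvw[2]
```

With identical poses the warp is the identity; translating frame s's camera shifts the re-projection by exactly the parallax that the depth estimate predicts, which is the quantity a multi-frame coherence term can score and aggregate across a sequence.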
Video Scene Reconstruction and Enhancement

Guofeng Zhang

ABSTRACT

With the enormous increase in popularity of information acquisition and processing technologies, effectively expressing our real world and integrating it with computer-generated virtual scenes, so as to enable convenient human-environment interaction, has become a very important research topic. As object complexity increases, it becomes more and more challenging to directly model and render objects to obtain realistic visual effects, due to intractable computational cost and manpower requirements. Since, in contrast to forward modeling, images and videos can easily capture the real world, computer vision techniques are the key to extracting and reconstructing the geometric and illumination information consistent with human visual perception, which breaks the modeling limitations of traditional computer graphics. Nonetheless, captured image/video sequences only partially sample the projected view and do not intuitively reflect the actual 3D scene, which makes it difficult for a computer to accurately and automatically understand complex real scenes and seriously hinders the advanced use of image/video information. The key solution is to accurately recover 3D geometry and motion information from images and videos, which is also the core problem of 3D vision.

This thesis focuses on 3D geometry reconstruction and motion recovery from real captured video data. By making use of the information consistency and redundancy of video data, and leveraging the ideas of keyframe representation and complementary information from multiple frames, we propose a robust and efficient global optimization framework to deal with camera tracking, dense depth recovery, motion estimation, and video segmentation. It benefits many applications, such as large-scale city modeling, autonomous visual navigation, video scene understanding and reuse, and mixed reality, all of which have traditionally been regarded as fundamentally important but technically difficult. In summary, the work in this thesis falls into the following four aspects.

1. Video-based Automatic Camera Tracking

Automatic camera tracking is a fundamental problem of computer vision, widely applied in urban planning, military training, television, entertainment, etc. Previous methods generally suffer from efficiency and robustness problems and have difficulty handling large-scale scenes, which seriously hinders their use on practical problems. Specifically, the structure-from-motion (SFM) technique recovers the camera motion parameters from image/video sequences, typically through feature point tracking, motion and structure initialization, self-calibration, and bundle adjustment. Two steps, i.e. initializing the motion and selecting an appropriate moment for self-calibration to upgrade the projective reconstruction to a metric one, are problematic in previous work. With the idea of using a keyframe representation and the complementary information from multiple frames, we propose a robust and efficient video-based automatic camera tracking technique that can efficiently and reliably handle long sequences with varying focal length using a keyframe-based optimization framework. Our method significantly advances SFM. First, keyframes are ordered in the optimization to make SFM initialization reliable. Second, we measure the accumulation error and selectively upgrade the projective reconstruction to a metric one before the error begins to damage the self-calibration. Third, a local, on-demand scheme for bundle adjustment is applied, which dramatically accelerates computation.

Accurate camera motion recovery also relies largely on the quality of feature tracking. It is crucial to obtain long and accurate feature tracks so that the established constraints are sufficient to avoid the drift problem in SFM. One challenge is to rapidly recognize and match the common features which correspond to the same 3D points but are distributed over non-consecutive frames. Traditional sequential trackers such as KLT can hardly handle this situation, which may cause drift in SFM. We address this problem by proposing a new feature tracking method with two phases, namely consecutive point tracking and non-consecutive track matching. A new two-pass matching strategy greatly increases the matching rate of detected invariant features and extends the lifetime of the tracks. In the non-consecutive track matching phase, by efficiently computing a matching matrix, a set of disjoint subsequences with overlapping content can be detected, and the common feature tracks distributed over these subsequences can be reliably matched. Our method is useful not only for loopback sequences but also for tracking and matching multiple videos and registering them in a common 3D coordinate system. It significantly improves the accuracy of camera tracking in large-scale scenes, which is central to dense 3D reconstruction and robust real-time camera tracking.

These two pieces of work were published at the top vision conferences CVPR 2007 and ECCV 2010, respectively. The research findings constitute an automatic camera tracking system, ACTS, which outperforms the state-of-the-art commercial software Boujou in handling long sequences with varying focal length. We have registered the software copyright for ACTS, and the software has been available for download online since the end of July 2009. It has received widespread attention from both domestic and international researchers and now has over 350 registered users. The system has become a basic platform for many of our other research works. Based on this work, we further propose a keyframe-based real-time camera tracking method. An offline module captures reference images for environment modeling and selects keyframes to represent the scene, reducing data redundancy; then, in an online module with fast keyframe recognition and matching, we achieve online real-time camera tracking in street-scale scenes. This work was published at the top vision conference ICCV 2009.

2. Spatio-temporally Consistent Depth Recovery from a Video Sequence

With the rapid development and popularity of digital capture and display devices, 3D vision techniques are in great demand. To recover high-quality depth information from image/video sequences, and with the idea of using complementary statistical information among multiple frames, we propose a novel video-based dense depth recovery algorithm. We propose a bundle optimization model for multi-view stereo which associates the depth variables of different frames through a geometric coherence constraint and collects multi-frame statistics to perform global optimization. It effectively removes the influence of image noise, occlusion, and outliers, and recovers a set of spatio-temporally consistent depth maps with accurate object boundaries. In addition, we propose a multi-pass belief propagation algorithm that significantly extends the number of depth levels in the global optimization without introducing much computational overhead, which is equivalent to improving the depth precision of the computation. The recovered high-quality dense depth maps facilitate many related applications. This work was published at the top conference CVPR 2008 as an oral presentation (acceptance rate 4%). The extended and improved version was published in IEEE Transactions on Pattern Analysis and Machine Intelligence (impact factor 5.96 in 2008). The work has been cited 20 times in Google Scholar.

3. Stereoscopic Video Synthesis from Monocular Videos

Along with the great success of recent 3D films (e.g. Avatar) and the popularity of 3D televisions and monitors, the problem of how to convert a captured 2D video into a 3D one has arisen and urgently awaits proper answers. Acquiring stereoscopic videos generally requires special devices for data capture or excessive labor for manual 3D modeling in format conversion. For existing monocular videos, the latter approach has to be deployed, and with previous methods its cost is very high. In this context, we provide an efficient and convenient algorithmic approach to stereoscopic video production based on automatic camera tracking. Instead of recovering depth maps, our method synthesizes the binocular parallax of the stereoscopic video directly from the motion parallax of the monocular video. The synthesis is formulated as an optimization problem via a cost function incorporating the constraints of stereoscopic effect, content similarity, and visual smoothness. This work was published in IEEE Transactions on Visualization and Computer Graphics and was presented in the 10th Five-Year Plan Significant Scientific Achievements Exhibition. It has been cited 6 times in Google Scholar.

4. Video Segmentation, Editing, and Composition based on Dense Depth Recovery

With the increasing prevalence of portable video capture devices, more and more videos are shared and broadcast over the Internet, accessible to home users in their daily lives. How to utilize these massive video data to synthesize new videos is a fascinating but challenging task. To achieve this objective, besides recovering the camera parameters and the 3D geometry of static objects, we also need to semantically separate video layers and recover the motion and 3D information of dynamic objects. Due to the difficulty of accurate depth recovery, most existing segmentation methods use only color information and ignore the important depth cue. Obviously, if we can optimize an energy function incorporating both depth and motion information and link the correspondences across multiple frames, the ambiguity of the problem can be largely reduced and the reliability of segmentation significantly enhanced. Following this scheme, we propose a new moving object extraction method which integrates depth recovery,
optical flow estimation, and moving object segmentation into a unified framework and solves them iteratively. Because multiple frames are used, our system can robustly accomplish foreground extraction, dense motion field construction, and background depth map estimation simultaneously. Previous methods typically require that the camera be stationary and that the background be known or easy to model; in contrast, our method has no such limitation and can handle the challenging cases where the camera moves freely and the background has complex depth layers.

Combining camera tracking, depth recovery, and video segmentation, we further propose a semi-automatic framework for video re-creation and composition, which addresses the geometric, illumination, and occlusion consistency problems in video editing and virtual-real fusion and makes varied video resources effectively reusable. For static scenes in particular, a fast depth-based layering method greatly improves the efficiency of video scene layering. On top of the high-quality depth and layering results, we discuss the realistic composition of virtual 3D objects and video objects, including geometric consistency and realistic lighting and shadow effects, and present several video special effects such as object removal and camouflage, bullet-time simulation, scene fog, and depth-of-field. This work was published at ICCV 2007 and in IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Visualization and Computer Graphics, and Computer Animation and Virtual Worlds, and has been cited more than 15 times in Google Scholar.

Keywords: camera tracking, depth recovery, motion estimation, video segmentation, mixed reality, video editing, video enhancement
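Part 4 argues that adding a depth cue to the segmentation energy removes much of the ambiguity of color-only labeling. The toy per-pixel data cost below illustrates only that one point; the actual method also uses motion terms, spatial smoothness, and multi-frame statistics, and every name, weight, and model here is an invented placeholder:

```python
import numpy as np

def segment_with_depth(colors, depths, fg_color, bg_color, bg_depth,
                       w_color=1.0, w_depth=1.0, fg_depth_cost=0.5):
    """Label pixels foreground/background with a data cost mixing a
    color term and a depth-residual term (toy single-channel version)."""
    # Color term: distance to tentative foreground/background models.
    c_fg = np.abs(colors - fg_color)
    c_bg = np.abs(colors - bg_color)
    # Depth term: background pixels should agree with the recovered
    # static-scene depth; foreground gets a flat, uninformative cost.
    d_bg = np.abs(depths - bg_depth)
    d_fg = np.full_like(d_bg, fg_depth_cost)
    cost_fg = w_color * c_fg + w_depth * d_fg
    cost_bg = w_color * c_bg + w_depth * d_bg
    return cost_fg < cost_bg  # True where the foreground label wins

# Three pixels: clear foreground, clear background, and a pixel whose
# color looks like foreground but whose depth matches the static scene.
mask = segment_with_depth(colors=np.array([0.9, 0.1, 0.6]),
                          depths=np.array([2.0, 1.0, 1.0]),
                          fg_color=1.0, bg_color=0.0, bg_depth=1.0)
```

The third pixel would be labeled foreground by the color term alone (cost 0.4 vs 0.6), but its zero depth residual against the static scene pulls it back to background, which is exactly the kind of disambiguation the text describes.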
