Attachment 2: Thesis Abstract (Chinese and English)

Author: Guofeng Zhang
Thesis title: Video Scene Reconstruction and Enhancement
About the author: Guofeng Zhang, male, born in November 1981, began his doctoral studies under Prof. Hujun Bao at Zhejiang University in September 2003 and received his Ph.D. in June 2009.

Chinese Abstract

With the rapid development of information acquisition and processing technologies, how to use computers to represent the virtual and the real world efficiently and realistically, and to achieve deep interaction and fusion between the two, has become a very important research topic. As the objects to be handled grow ever more complex, the conventional approach of forward 3D modeling followed by rendering faces great challenges in realism, computational efficiency, and naturalness of interaction. Visual information such as images and video, by contrast, is easy to capture directly from the real world, and computer vision techniques can extract from it geometric and illumination information consistent with human visual perception, which effectively compensates for the limitations of traditional graphics techniques. However, captured image and video sequences only partially sample the projection of the real world onto the image plane and do not directly reflect the 3D structure of the actual scene, so it is hard for a computer to understand complex real scenes accurately and automatically. This seriously hinders the deeper use of image and video information. Geometric reconstruction and motion recovery for image and video scenes are therefore the key to the problem, and they are also the core problems of 3D vision.

Against this background, this dissertation studies 3D geometric reconstruction and motion recovery for video scenes. By fully exploiting the coherence and redundancy of the information in a video sequence, and by combining a keyframe representation of the video scene with complementary statistics gathered over multiple frames, we propose an efficient and robust global optimization framework that solves several difficult problems: high-accuracy recovery of camera parameters, depth, and optical flow, and layered segmentation of video scenes. This lays the foundation for important applications such as large-scale urban 3D modeling, autonomous visual navigation, video scene understanding and reuse, and interactive fusion of virtual and real content. The work of this dissertation covers the following four aspects.

1. Video-based Automatic Camera Tracking

Automatic camera tracking is a fundamental problem in computer vision and underlies many other vision problems, with wide applications in urban design and planning, military training and exercises, and film and entertainment. Existing methods, however, hit bottlenecks of varying severity in scale, efficiency, and stability, which seriously limits their practical use. Camera motion parameters are usually recovered from image or video sequences with structure-from-motion (SFM) techniques; the pipeline involves feature matching and tracking, motion and structure initialization, self-calibration, and bundle adjustment. Previous methods fall short in initializing motion and structure and in choosing when to apply self-calibration to upgrade the reconstruction from projective to metric space, which greatly harms reconstruction stability. Based on the idea of combining a simplified keyframe representation with the complementary strengths of multi-frame solving, we propose an efficient and robust video-based automatic camera tracking technique. It adopts a keyframe-based solving framework and, by optimizing the keyframe solving order, choosing the best moment for self-calibration, and localizing bundle adjustment, greatly improves the stability and efficiency of solving long sequences with varying focal length.

Accurate camera parameter recovery also depends heavily on the quality of feature tracking. Feature matches must be very accurate, and the tracks should be as long as possible, so that the established constraints are sufficiently complete to avoid drift in SFM reconstruction. One key difficulty is how to quickly recognize and match the common features (those corresponding to the same 3D scene point) scattered over non-consecutive frames; traditional KLT tracking can hardly handle this case and easily causes SFM drift. To solve this, we propose a new non-consecutive feature tracking method. A two-pass matching strategy effectively extends the lifetime of feature tracks over consecutive frames, and a fast matching-matrix computation finds the subsequences with matching content, so that features on non-consecutive frames can be matched and the common feature tracks distributed over different subsequences merged. The method handles not only loopback sequences but also matching across multiple video sequences, registering the 3D structures and camera trajectories recovered from each sequence into a common world coordinate system. It markedly improves both the accuracy and the scale of camera tracking, laying the foundation for dense 3D reconstruction of large scenes and for efficient, stable real-time tracking.

This work was published at CVPR 2007, a top conference in computer vision, and at the important conference ECCV 2010. The automatic camera tracking system ACTS developed from it outperforms the well-known commercial software Boujou Three on long sequences with varying focal length. A software copyright has been filed for the system, which was released online at the end of July 2009 and now has more than 350 registered users; it has become a basic platform for much of our other research. Further work, which builds a simplified keyframe representation of the scene in offline preprocessing and combines it with fast online keyframe recognition and matching, achieves online real-time camera tracking at street scale and was published at the top vision conference ICCV 2009.

2. Spatio-temporally Consistent Depth Recovery from Video Sequences

With the rapid development and spread of digital capture and display devices, the demand for 3D vision techniques is growing ever more urgent. To recover high-quality depth information from captured images and video, and building on the idea of complementary multi-frame statistics, we propose a dense depth recovery algorithm for video sequences. It creatively introduces bundle optimization into multi-view stereo depth recovery: the depth variables of the frames are linked by geometric coherence constraints, and statistics are gathered over multiple frames for global optimization. This effectively suppresses the influence of noise, occlusion, and error on depth recovery, and solves the problems of spatio-temporal consistency and boundary artifacts. On this basis, we further propose a multi-pass belief propagation algorithm that extends the number of depth levels in the global optimization at little extra computational cost, thereby improving depth accuracy. High-quality depth recovery directly advances many related applications. This work was published as an oral paper at the top vision conference CVPR 2008 (acceptance rate 4%), and an extended and improved version appeared in the top vision and artificial intelligence journal IEEE Transactions on Pattern Analysis and Machine Intelligence (2008 impact factor 5.96). It has so far been cited more than 20 times (per Google Scholar).

3. Stereoscopic Conversion of Monocular Video

Stereoscopic 3D video is clearly the trend, especially after the unprecedented success of 3D films such as Avatar and the rise of 3D television, and converting captured 2D images and video to 3D has attracted unprecedented attention. Conventional stereoscopic video production requires special capture hardware or heavy manual modeling for format conversion and is very expensive; for already-captured monocular video, only the latter route of 2D-to-3D post-conversion is available, and efficient, convenient methods are still lacking. In this context, building on automatic camera tracking, we propose an automatic and efficient stereoscopic conversion algorithm for monocular video. It cleverly bypasses dense depth recovery and casts stereoscopic conversion as a nonlinear energy optimization that jointly considers stereoscopic effect, content similarity, and visual smoothness; by converting motion parallax directly into binocular parallax, it quickly turns monocular video shot with a moving camera into binocular stereoscopic video. This work was published in the international journal IEEE Transactions on Visualization and Computer Graphics and was exhibited at the national "Tenth Five-Year Plan" major scientific and technological innovation achievements exhibition. It has been cited 6 times (per Google Scholar).

4. Depth-based Video Segmentation, Editing, and Virtual-Real Composition

With the massive spread of digital cameras, video has become very easy to capture, and more and more video is published and shared over the Internet. How to exploit this massive video data for editing, virtual-real composition, and re-creation is a practically meaningful but highly challenging problem. It requires not only recovering correct camera parameters and the 3D geometry of the static scene, but also segmenting the video scene and recovering the motion and 3D information of dynamic objects. Probably because accurate depth recovery is difficult, the vast majority of segmentation techniques use only color information. Clearly, if depth and motion information can be incorporated into the optimization energy function and correspondence statistics established across multiple frames, much of the ambiguity of relying on color alone can be resolved and segmentation stability improved. Based on this idea, we propose a new moving foreground extraction method that brings depth recovery, optical flow estimation, and moving object segmentation into a unified framework and optimizes them iteratively. Exploiting the information advantage of multiple frames, it extracts the moving foreground while also estimating the optical flow of the whole video scene and the depth of the static scene. The method breaks the traditional requirement of a fixed camera and needs no prior background model, so it handles well the case of a freely moving camera and a background with complex depth layers.

Combining camera tracking, depth recovery, and video segmentation, we further propose a semi-automatic framework for video re-creation and virtual-real composition. It resolves the geometric, illumination, and occlusion consistency problems in video-based editing and interactive virtual-real fusion, makes varied video resources effectively usable, and enriches the means and diversity of video editing. For static scenes in particular, we propose a fast depth-based layering method that greatly improves the efficiency of video scene layering. Building on high-quality depth recovery and video layering, we discuss how to realistically composite virtual 3D objects and video objects, including geometric consistency and realistic lighting and shadow effects, and present several video special effects such as object removal and camouflage, bullet-time simulation, scene fog, and depth-of-field. This work was published at the top vision conference ICCV 2007 and in the international journals IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Visualization and Computer Graphics, and Computer Animation and Virtual Worlds, and has been cited more than 15 times in total (per Google Scholar).

Keywords: camera tracking, depth recovery, motion estimation, video segmentation, mixed reality, video editing, video enhancement
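The geometric coherence constraint at the heart of part 2 links the depth estimates of different frames by requiring that a pixel, back-projected with its depth and re-projected into a neighboring frame, land on a consistent match. Below is a minimal numpy sketch of that per-pixel warp, assuming for illustration a single shared intrinsic matrix K and world-to-camera poses (R, T); the function name and signature are hypothetical, not the thesis's actual interface:

```python
import numpy as np

def project_to_frame(x, y, d, K, R_t, T_t, R_s, T_s):
    """Warp pixel (x, y) of frame t, with depth d, into frame s.

    Cameras follow the world-to-camera model p_cam = R @ p_world + T;
    a shared intrinsic matrix K is an assumption for simplicity.
    """
    # Back-project the pixel to a 3D point in frame t's camera space
    # (K^-1 [x, y, 1]^T has unit z, so scaling by d sets the depth).
    p_t = d * (np.linalg.inv(K) @ np.array([x, y, 1.0]))
    # Camera space of frame t -> world -> camera space of frame s.
    p_world = R_t.T @ (p_t - T_t)
    p_s = R_s @ p_world + T_s
    # Perspective projection onto frame s's image plane.
    uvw = K @ p_s
    return uvw[:2] / uvw[2]
```

With identical poses the warp is the identity; translating frame s's camera shifts the re-projection by exactly the parallax that the depth estimate predicts, which is the quantity a multi-frame coherence term can score and aggregate across a sequence.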
Video Scene Reconstruction and Enhancement

Guofeng Zhang

ABSTRACT

With the enormous increase in popularity of information acquisition and processing technologies, effectively expressing our real world and integrating it with computer-generated virtual scenes, so as to enable convenient human-environment interaction, has become a very important research topic. As object complexity increases, it becomes more and more challenging to directly model and render objects to obtain realistic visual effects, due to intractable computational cost and manpower requirements. Since, in contrast to forward modeling, images and videos can easily capture the real world, computer vision techniques are the key to extracting and reconstructing the geometric and illumination information consistent with human visual perception, which breaks the modeling limitations of traditional computer graphics. Nonetheless, captured image/video sequences only partially sample the projected view and do not intuitively reflect the actual 3D scene, which makes it difficult for a computer to accurately and automatically understand complex real scenes and seriously hinders the advanced use of image/video information. The key solution is to accurately recover 3D geometry and motion information from images and videos, which is also the core problem of 3D vision.

This thesis focuses on 3D geometry reconstruction and motion recovery from real captured video data. By making use of the information consistency and redundancy of video data, and leveraging the ideas of keyframe representation and complementary information from multiple frames, we propose a robust and efficient global optimization framework to deal with camera tracking, dense depth recovery, motion estimation, and video segmentation. It benefits many applications, such as large-scale city modeling, autonomous visual navigation, video scene understanding and reuse, and mixed reality, all of which have traditionally been regarded as fundamentally important but technically difficult. In summary, the work in this thesis falls into the following four aspects.

1. Video-based Automatic Camera Tracking

Automatic camera tracking is a fundamental problem of computer vision, widely applied in urban planning, military training, television, entertainment, etc. Previous methods generally suffer from efficiency and robustness problems and have difficulty handling large-scale scenes, which seriously hinders their use on practical problems. Specifically, the structure-from-motion (SFM) technique recovers the camera motion parameters from image/video sequences, typically through feature point tracking, motion and structure initialization, self-calibration, and bundle adjustment. Two steps, i.e. initializing the motion and selecting an appropriate moment for self-calibration to upgrade the projective reconstruction to a metric one, are problematic in previous work. With the idea of using a keyframe representation and the complementary information from multiple frames, we propose a robust and efficient video-based automatic camera tracking technique that can efficiently and reliably handle long sequences with varying focal length using a keyframe-based optimization framework. Our method significantly advances SFM. First, keyframes are ordered in the optimization to make SFM initialization reliable. Second, we measure the accumulation error and selectively upgrade the projective reconstruction to a metric one before the error begins to damage the self-calibration. Third, a local, on-demand scheme for bundle adjustment is applied, which dramatically accelerates computation.

Accurate camera motion recovery also relies largely on the quality of feature tracking. It is crucial to obtain long and accurate feature tracks so that the established constraints are sufficient to avoid the drift problem in SFM. One challenge is to rapidly recognize and match the common features which correspond to the same 3D points but are distributed over non-consecutive frames. Traditional sequential trackers such as KLT can hardly handle this situation, which may cause drift in SFM. We address this problem by proposing a new feature tracking method with two phases, namely consecutive point tracking and non-consecutive track matching. A new two-pass matching strategy greatly increases the matching rate of detected invariant features and extends the lifetime of the tracks. In the non-consecutive track matching phase, by efficiently computing a matching matrix, a set of disjoint subsequences with overlapping content can be detected, and the common feature tracks distributed over these subsequences can be reliably matched. Our method is useful not only for loopback sequences but also for tracking and matching multiple videos and registering them in a common 3D coordinate system. It significantly improves the accuracy of camera tracking in large-scale scenes, which is central to dense 3D reconstruction and robust real-time camera tracking.

These two pieces of work were published at the top vision conferences CVPR 2007 and ECCV 2010, respectively. The research findings constitute an automatic camera tracking system, ACTS, which outperforms the state-of-the-art commercial software Boujou in handling long sequences with varying focal length. We have registered the software copyright for ACTS, and the software has been available for download online since the end of July 2009. It has received widespread attention from both domestic and international researchers and now has over 350 registered users. The system has become a basic platform for many of our other research works. Based on this work, we further propose a keyframe-based real-time camera tracking method. An offline module captures reference images for environment modeling and selects keyframes to represent the scene, reducing data redundancy; then, in an online module with fast keyframe recognition and matching, we achieve online real-time camera tracking in street-scale scenes. This work was published at the top vision conference ICCV 2009.

2. Spatio-temporally Consistent Depth Recovery from a Video Sequence

With the rapid development and popularity of digital capture and display devices, 3D vision techniques are in great demand. To recover high-quality depth information from image/video sequences, and with the idea of using complementary statistical information among multiple frames, we propose a novel video-based dense depth recovery algorithm. We propose a bundle optimization model for multi-view stereo which associates the depth variables of different frames through a geometric coherence constraint and collects multi-frame statistics to perform global optimization. It effectively removes the influence of image noise, occlusion, and outliers, and recovers a set of spatio-temporally consistent depth maps with accurate object boundaries. In addition, we propose a multi-pass belief propagation algorithm that significantly extends the number of depth levels in the global optimization without introducing much computational overhead, which is equivalent to improving the depth precision of the computation. The recovered high-quality dense depth maps facilitate many related applications. This work was published at the top conference CVPR 2008 as an oral presentation (acceptance rate 4%). The extended and improved version was published in IEEE Transactions on Pattern Analysis and Machine Intelligence (impact factor 5.96 in 2008). The work has been cited 20 times in Google Scholar.

3. Stereoscopic Video Synthesis from Monocular Videos

Along with the great success of recent 3D films (e.g. Avatar) and the popularity of 3D televisions and monitors, the problem of how to convert a captured 2D video into a 3D one has arisen and urgently awaits proper answers. Acquiring stereoscopic videos generally requires special devices for data capture or excessive labor for manual 3D modeling in format conversion. For existing monocular videos, the latter approach has to be deployed, and with previous methods its cost is very high. In this context, we provide an efficient and convenient algorithmic approach to stereoscopic video production based on automatic camera tracking. Instead of recovering depth maps, our method synthesizes the binocular parallax of the stereoscopic video directly from the motion parallax of the monocular video. The synthesis is formulated as an optimization problem via a cost function incorporating the constraints of stereoscopic effect, content similarity, and visual smoothness. This work was published in IEEE Transactions on Visualization and Computer Graphics and was presented in the 10th Five-Year Plan Significant Scientific Achievements Exhibition. It has been cited 6 times in Google Scholar.

4. Video Segmentation, Editing, and Composition based on Dense Depth Recovery

With the increasing prevalence of portable video capture devices, more and more videos are shared and broadcast over the Internet, accessible to home users in their daily lives. How to utilize these massive video data to synthesize new videos is a fascinating but challenging task. To achieve this objective, besides recovering the camera parameters and the 3D geometry of static objects, we also need to semantically separate video layers and recover the motion and 3D information of dynamic objects. Due to the difficulty of accurate depth recovery, most existing segmentation methods use only color information and ignore the important depth cue. Obviously, if we can optimize an energy function incorporating both depth and motion information and link the correspondences across multiple frames, the ambiguity of the problem can be largely reduced and the reliability of segmentation significantly enhanced. Following this scheme, we propose a new moving object extraction method which integrates depth recovery,
optical flow estimation, and moving object segmentation into a unified framework and solves them iteratively. Because multiple frames are used, our system can robustly accomplish foreground extraction, dense motion field construction, and background depth map estimation simultaneously. Previous methods typically require that the camera be stationary and that the background be known or easy to model; in contrast, our method has no such limitation and can handle the challenging cases where the camera moves freely and the background has complex depth layers.

Combining camera tracking, depth recovery, and video segmentation, we further propose a semi-automatic framework for video re-creation and composition, which addresses the geometric, illumination, and occlusion consistency problems in video editing and virtual-real fusion and makes varied video resources effectively reusable. For static scenes in particular, a fast depth-based layering method greatly improves the efficiency of video scene layering. On top of the high-quality depth and layering results, we discuss the realistic composition of virtual 3D objects and video objects, including geometric consistency and realistic lighting and shadow effects, and present several video special effects such as object removal and camouflage, bullet-time simulation, scene fog, and depth-of-field. This work was published at ICCV 2007 and in IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Visualization and Computer Graphics, and Computer Animation and Virtual Worlds, and has been cited more than 15 times in Google Scholar.

Keywords: camera tracking, depth recovery, motion estimation, video segmentation, mixed reality, video editing, video enhancement
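Part 4 argues that adding a depth cue to the segmentation energy removes much of the ambiguity of color-only labeling. The toy per-pixel data cost below illustrates only that one point; the actual method also uses motion terms, spatial smoothness, and multi-frame statistics, and every name, weight, and model here is an invented placeholder:

```python
import numpy as np

def segment_with_depth(colors, depths, fg_color, bg_color, bg_depth,
                       w_color=1.0, w_depth=1.0, fg_depth_cost=0.5):
    """Label pixels foreground/background with a data cost mixing a
    color term and a depth-residual term (toy single-channel version)."""
    # Color term: distance to tentative foreground/background models.
    c_fg = np.abs(colors - fg_color)
    c_bg = np.abs(colors - bg_color)
    # Depth term: background pixels should agree with the recovered
    # static-scene depth; foreground gets a flat, uninformative cost.
    d_bg = np.abs(depths - bg_depth)
    d_fg = np.full_like(d_bg, fg_depth_cost)
    cost_fg = w_color * c_fg + w_depth * d_fg
    cost_bg = w_color * c_bg + w_depth * d_bg
    return cost_fg < cost_bg  # True where the foreground label wins

# Three pixels: clear foreground, clear background, and a pixel whose
# color looks like foreground but whose depth matches the static scene.
mask = segment_with_depth(colors=np.array([0.9, 0.1, 0.6]),
                          depths=np.array([2.0, 1.0, 1.0]),
                          fg_color=1.0, bg_color=0.0, bg_depth=1.0)
```

The third pixel would be labeled foreground by the color term alone (cost 0.4 vs 0.6), but its zero depth residual against the static scene pulls it back to background, which is exactly the kind of disambiguation the text describes.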
