Edge-Preserving Camera Trajectories for Improved Optical Character Recognition on Static Scenes with Text

Rohan Katoch and Jun Ueda

Abstract: Camera systems in fast motion suffer from the effects of motion blur, which degrades image quality and can have a large impact on the performance of visual tasks. The degradation in image quality can be mitigated through the use of image reconstruction. Blur effects and the resulting reconstruction performance are highly dependent on the point-spread function resulting from camera motion. This work focuses on the motion planning problem for a camera system with boundary conditions on time and position, with the objective of improving the performance of optical character recognition. Tuned edge-preserving trajectories are shown to result in higher recognition accuracy when compared to inverse error and linear trajectories. Simulation and experimental results provide quantitative measures to verify edge preservation and greater recognition performance.

Index Terms: Computer vision, motion planning, image processing

I. INTRODUCTION

Camera sensors are widely used by robotic systems as a rich source of information about the surrounding environment, helping to resolve a broad array of tasks, including localization, object recognition, path planning, and optical character recognition (OCR) [1], [2]. Performing these tasks successfully in unstructured environments requires processing visual signals at a high level and in an efficient manner. The quality of the captured signal, in general, has a significant impact on task performance. Camera motion is one source of degradation in visual signals, especially when considering fast-moving systems. This setting may occur, for example, when scanning a large scene with sufficient image resolution in a short period of time [3], [4], [5]. In this case, it is desirable to capture images sequentially without making frequent stops.
Images captured by systems in motion suffer from degradation in the visual signal due to two main factors: (i) camera motion, and (ii) scene motion [6]. The combination of these effects results in the phenomenon known as motion blur. Camera sensors have a finite exposure duration to allow sufficient development of charge in the array of photosensitive elements. Any relative motion between camera and scene during this exposure period causes different point sources to be integrated on an individual element. This results in motion blur that can be described by the path a point source takes over the array of elements.

This work was supported by the National Science Foundation under Grant No. 1662029. Rohan Katoch and Jun Ueda are with the Bio-Robotics and Human Modeling Lab, George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.

Fig. 1. An example of an application of motion-controlled cameras: a drone taking a route panorama while flying along a street.

Images blurred under camera motion exhibit blur effects that are spatially invariant, as the motion is applied to the sensor globally. In contrast, objects in motion within a scene can cause spatially varying (local) blur. Only the case of camera motion is considered in this work; therefore the scene being imaged is assumed to be stationary, and depth variations are assumed to be negligible (orthographic scene). Under these stationarity assumptions, a blurry image can be represented as the convolution of a point-spread function (PSF) with a latent image, where the PSF, also known as a blur kernel, represents how the energy of an ideal point source disperses over the sensor array. This blur model is applicable only for cameras with a global shutter in planar motion, and does not apply to cameras with rolling shutters or non-planar motion. Non-planar camera motion can exist; however, since exposure times are relatively short, the motion during exposure can be assumed to be planar.
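The spatially-invariant blur model described above can be sketched directly: blurring is the 2-D convolution of a latent image with a PSF. The latent image (a bright bar as a stand-in for text) and the 7-pixel horizontal-motion PSF below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical latent image: a bright vertical bar on a dark background.
latent = np.zeros((32, 32))
latent[:, 14:18] = 1.0

# Horizontal constant-velocity motion PSF: a point source is spread
# uniformly over 7 pixels along the motion direction.
psf = np.full((1, 7), 1.0 / 7.0)

# Spatially-invariant blur: convolution of the latent image with the PSF.
blurred = convolve2d(latent, psf, mode="same", boundary="symm")

# The PSF sums to one, so total image energy is preserved by the blur.
print(blurred.shape)
```

Because the kernel is normalized, blurring redistributes but does not create or destroy intensity, which is why deconvolution with the same kernel can in principle recover the latent image.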
Note also that both the displacement and the velocity of the camera trajectory are required to determine the resulting PSF.

Mitigating motion blur effects is possible using three methods: (i) controlling optical parameters (exposure, aperture, focal length), (ii) controlling camera motion, and (iii) motion deblurring. Short exposure can be achieved with high-speed cameras, which have very fast shutter speeds and sensors with high photo-sensitivity. This eliminates the possibility of significant motion occurring at the exposure time scale. While capable of producing high-quality images, these camera systems are costly, require very high data rates, and do not operate well in low-intensity lighting conditions. Other methods that involve active control of optical parameters include coded exposure (flutter shutter) [7] and coded aperture [8]. These methods use intelligent control of optical parameters during image capture, followed by post-processing. The ability to actively control optical parameters is not present in most commercially available cameras. Furthermore, these methods generally require a stationary camera and are not applicable under the constraints of the motion planning problem considered in this paper.

Motion compensation involves providing feed-forward control signals to stabilize a camera sensor relative to the scene being captured [9]. Prior knowledge of how the imaging device will move without compensation is required in order to use this method. If performed successfully, the imaging sensor should remain stationary with respect to the scene. In practice this is rarely the case, due to disturbances or imperfect state knowledge, and additional image processing is required to further restore the latent image.

IEEE Robotics and Automation Letters (RAL) paper presented at the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019. Copyright 2019 IEEE.
Furthermore, the use of this method is constrained by the hardware limitations of the compensation mechanism (i.e., speed and displacement limits).

Motion deblurring refers to the use of image processing techniques to remove blur effects after an image has been captured. In general, this is an ill-posed inverse problem that requires either prior knowledge or feature-based information for application [10], [11], [12], [13]. Currently, all motion deblurring algorithms that deal with spatially-invariant blur first evaluate or estimate the PSF, and then utilize it in a deconvolution process.

The motion planning problem for a fast-moving camera system capturing an image is considered, with the objective of improving OCR performance. This is relevant when using a camera for scanning or taking route panoramas of a scene with text, as in Figure 1 [14]. Instead of focusing on stabilizing the camera relative to the scene, as in motion compensation, a scenario is considered where a system needs to move from its current state to a desired final state within a fixed time horizon. The problem then is to determine the camera trajectory that meets the boundary conditions while generating images that preserve salient features for OCR. The chosen camera trajectory can be used to evaluate the expected PSF prior to image capture, reducing computation time.

Prior literature has dealt with goal-oriented blind image reconstruction [15], [16], [17] and trajectory generation for enhanced image reconstruction [18], [13], [19], but not goal-oriented trajectory generation. While blind reconstruction methods offer improved OCR performance without requiring prior knowledge of camera motion, the computation time required for these methods prohibits them from being used in real-time applications. The authors have previously studied the use of dynamics-based deblurring for optical character recognition [20], which provides real-time performance that outperforms other image reconstruction methods.
However, that method attempts to stabilize the camera while taking the image rather than generating a specific trajectory. Levin et al. propose moving the camera along parabolic trajectories, which generate PSFs that are invariant to constant-velocity motion in one direction [18]. This method was extended by Cho et al. to include all planar motion directions by taking two orthogonal parabolic exposures of the same scene [21]. Bando et al. propose using circular trajectories, which generate PSFs that are orientation invariant for linear motion [19].

This work first demonstrates the positive correlation between edge features and recognition accuracy, and proposes a parametric trajectory which can be tuned for edge preservation. The concept of residence time distributions (RTDs), introduced in prior work by the authors [22], is used as a mapping between trajectories and PSFs. Using RTDs, it was shown that inverse error trajectories result in Gaussian PSFs (see Appendix A). Images of natural scenes blurred under inverse error trajectories and then reconstructed resulted in lower mean square error (MSE) values when compared to parabolic and linear trajectories. Furthermore, Gaussian PSFs result in reconstructed images that are robust to additive noise. OCR performance, however, is dependent on image gradients, which are not well preserved by Gaussian PSFs. In contrast to inverse error functions, which generate PSFs robust to noise, this work investigates PSFs that preserve edges in text images.

II. MOTION BLUR ANALYSIS

A. Problem Setting

Consider a camera motion system in planar space X ⊆ R² with position x = (x, y) ∈ X. There are two objectives to accomplish: (i) reach the final position x_f at time t_f, and (ii) capture image I with a desired PSF. The orientation of the camera does not change. The exposure window T⋆ = (t⋆, t⋆ + T], with duration T, is the time period during which the camera sensor captures information, and is considered to be prior information.

B.
Image Formation

The formation of images in digital cameras can be modeled as a noisy integration process with two sources of noise: (i) shot noise, and (ii) thermal noise [11]. Shot noise refers to the variance in the number of photons captured by photo-sensitive elements over time, and is proportional to the square root of the signal intensity on a per-pixel basis, while thermal noise refers to the general uncertainty of reading an electrical signal which is thermally agitated. Shot noise is modeled as a stationary Poisson process P with intensity λ, and thermal noise as an additive zero-mean Gaussian process N with variance σ². The Poisson process generates a blurry image B, to which Gaussian noise N is added, resulting in the captured image I. Therefore, for latent image L and exposure window T⋆, the captured image is described by (1) and (2):

B ∼ P( λ ∫_{T⋆} L(x(t)) dt ),   N ∼ N(0, σ²),   (1)

I = B(x(t), T⋆, λ) + N.   (2)

Fig. 2. Example of a planar camera trajectory for a motion-controlled camera.

The exposure position x⋆ is defined to be the average position of the trajectory x(t) during the exposure window T⋆:

x⋆ = (1/T) ∫_{T⋆} x(t) dt.   (3)

C. Non-blind Motion Deblurring

The reconstructed image L̂(x⋆) is generated by deconvolution with an estimated blur kernel K̂ representing the expected PSF. The kernel K̂ is assumed to accurately represent the time-dependent motion blur process. This assumption holds when considering spatially-invariant blur. The image model in (2) can now be expressed using the convolution operator ∗, as shown in (4):

I = K(x(t), T⋆, λ) ∗ L(x⋆) + N.   (4)

The PSF can be directly estimated from the generated command signals and then used for deconvolution. This process is called dynamics-based motion deblurring, as described in prior work by the authors [23], [24].

D. Residence Time Distributions

Consider a planar camera trajectory x(t) defined on the exposure window T⋆, as shown in Figure 2.
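The noisy integration model of (1) and (2) can be simulated directly: photon counts are drawn from a Poisson distribution whose rate is the integrated latent intensity, and zero-mean Gaussian read noise is added. The latent scene, intensity scale λ, and noise level σ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent scene with normalized intensities in [0, 1].
latent = rng.random((16, 16))
lam = 200.0           # photon intensity scale lambda (assumed)
sigma = 2.0           # thermal (read) noise std dev (assumed)
exposure_time = 1.0   # exposure duration T (assumed)

# Shot noise, eq. (1): photon counts are Poisson with rate equal to the
# integrated latent intensity; here the integral over the exposure window
# is approximated as latent * exposure_time.
B = rng.poisson(lam * latent * exposure_time).astype(float)

# Thermal noise, eq. (2): the captured image is I = B + N with
# N ~ Gaussian(0, sigma^2).
I = B + rng.normal(0.0, sigma, size=B.shape)

print(I.shape)
```

Note the signal dependence of the shot-noise term: its standard deviation grows as the square root of the per-pixel rate, as stated in the text, while the thermal term is identically distributed at every pixel.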
Residence time is the length of time spent at a particular position by the camera while moving along the trajectory, providing the mapping r(x): R² → R⁺, which will be called an RTD. To guarantee a continuous and smooth trajectory, the RTD must be twice differentiable. The trivial case in which the camera is stationary leads to an RTD consisting of a Dirac delta function at position x⋆ with value T. While this results in a clear image, such a trajectory would not meet the desired kinematic constraints. There are an infinite number of possible cyclic trajectories that map to the same RTD. Restricting the analysis to noncyclic trajectories, that is, ẋ ≠ 0 ∀ t ∈ T⋆, is necessary to guarantee a unique solution. The expression for r(x) can then be found using (5):

r(x) = 1 / ‖ẋ(x)‖₂.   (5)

The relation above can be used to numerically or analytically construct an RTD. For the purposes of image reconstruction, the RTD is useful due to its proportionality to the expected PSF; in fact, a PSF can be generated by normalizing an RTD and discretizing it according to the kernel size. The compact support of the distribution is determined by the minimum x⁻ and maximum x⁺ position values of the trajectory during exposure. For monotonically increasing trajectories, the minimum and maximum values are defined by the displacement parameter Δx and the exposure position x⋆.

III. EDGE-PRESERVING TRAJECTORIES

The problem of character recognition in images has been widely studied and is an active area of research [25], [26]. Many approaches have been proposed, including unsupervised learning, convolutional neural networks, conditional random fields, and belief propagation [27]. In general, the text recognition process consists of character segmentation and classification. These processes perform best when the image has the following properties: (i) sharp edges, (ii) high contrast, (iii) well-aligned characters, and (iv) low pixel noise [28], [29].
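The trajectory-to-PSF mapping of Section II-D can be sketched numerically: for a monotone 1-D trajectory, accumulating the time spent in each position bin discretizes r(x) = 1/‖ẋ(x)‖ without differentiating explicitly, and normalizing the result yields the PSF. The trajectory, exposure duration, and bin count below are illustrative assumptions.

```python
import numpy as np

# Assumed exposure window and sampling of it at interval midpoints.
T = 0.05                      # exposure duration [s]
n = 20000
dt = T / n
t = (np.arange(n) + 0.5) * dt

# Illustrative strictly monotone 1-D trajectory x(t) in mm (accelerating).
x = 10.0 * t + 400.0 * t**2

# Residence time per position bin: total time the trajectory spends there.
# For a monotone path this is a discrete version of r(x) = 1 / |x_dot(x)|.
bins = np.linspace(x.min(), x.max(), 16)
rtd, _ = np.histogram(x, bins=bins, weights=np.full(n, dt))

# Normalizing the RTD over the kernel support yields the expected PSF.
psf = rtd / rtd.sum()
```

Since every instant of the exposure is spent somewhere on the support, the RTD integrates to the exposure duration T, and the normalized PSF sums to one, consistent with the energy-preserving blur model.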
While contrast and character alignment cannot be affected by camera motion, the presence of sharp edges and noise can. Post-processing methods for enhancing these features exist and are known as edge-preserving smoothing filters [30]; examples include median, bilateral, and anisotropic diffusion filters [31]. However, these filters are spatially varying and therefore cannot be replicated by camera trajectories. They also require extensive computation and are not efficient enough for real-time use [32]. Instead, camera trajectories are desired that preserve edges and satisfy position and velocity constraints. This paper proposes to use fourth-order polynomial trajectories, which meet the desired constraints and generate PSFs that preserve edges. Since camera exposure periods are generally very short, the following trajectory generation method assumes that the mobile platform carrying the camera follows a straight-line path. Therefore the two-dimensional case can be reduced to one dimension, where the new axis is aligned with the tangent of the general path shown in Figure 2.

A. Polynomial Trajectories

Fourth-order polynomial trajectories are considered as a candidate for generating tuned edge-preserving PSFs while meeting the desired kinematic constraints. This choice is made so that there are just enough parameters to meet the boundary conditions plus a free parameter for tuning. These trajectories are expected to generate PSFs that have low variance, resulting in high RTD values near the exposure position. Equation (6) presents a parametric fourth-order polynomial with five parameters c_i. There are two position constraints and two velocity constraints that need to be satisfied, leaving one free parameter that must be defined. This last parameter is resolved by constraining the acceleration at the exposure position x⋆ to be equal to a user-defined value γ ∈ R⁺. Now all parameters can be found using (7)-(9).
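The coefficient construction described above can be sketched as a small linear solve: the two position constraints, two velocity constraints, and the acceleration constraint γ pin down all five c_i. The numeric boundary values below are illustrative assumptions, and the window is parameterized in shifted time s = t − t⋆.

```python
import numpy as np

# Assumed boundary data for the sketch (not values from the paper).
T = 0.05                       # window duration [s]
x_star, v_star = 62.0, 10.0    # position [mm] and velocity [mm/s] at t_star
dx, v_e = 6.0, 10.0            # displacement [mm] and end velocity [mm/s]
gamma = 50.0                   # tuned acceleration at t_star [mm/s^2]

# Constraint rows for x(s) = c0 + c1 s + c2 s^2 + c3 s^3 + c4 s^4:
# x(0), x'(0), x''(0), x(T), x'(T).
A = np.array([
    [1, 0, 0,      0,       0],
    [0, 1, 0,      0,       0],
    [0, 0, 2,      0,       0],
    [1, T, T**2,   T**3,    T**4],
    [0, 1, 2 * T,  3 * T**2, 4 * T**3],
], dtype=float)
b = np.array([x_star, v_star, gamma, x_star + dx, v_e])

# Solving recovers c0..c4; the first three match the closed forms
# c0 = x_star, c1 = v_star, c2 = gamma / 2.
c = np.linalg.solve(A, b)
print(c[:3])
```

Sweeping γ while holding the boundary data fixed reproduces the one-parameter family of trajectories whose RTDs are tuned for edge preservation.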
Fig. 3. Fourth-order polynomial trajectories (position [mm] vs. time [s]) for various γ values.

Fig. 4. RTDs (residence time [ms] vs. position [mm]) corresponding to fourth-order polynomial trajectories for various γ values.

x(t) = Σ_{i=0}^{4} c_i (t − t⋆)^i,   (6)

c₀ = x⋆,   c₁ = v⋆,   c₂ = γ/2,   (7)

with c₃ and c₄ obtained from the remaining boundary conditions x(t⋆ + T) = x⋆ + Δx and ẋ(t⋆ + T) = v_e:

c₃ = ( 4Δx − (3v⋆ + v_e)T − γT² ) / T³,   (8)

c₄ = ( −3Δx + (2v⋆ + v_e)T + γT²/2 ) / T⁴.   (9)

The RTDs of fourth-order polynomials exhibit a double-peak structure, with maximal values at neighboring pixels rather than the central pixel. As γ is increased, the distance between the RTD peaks reduces and the maximal value increases. For the trajectories in Fig. 3, this trend is demonstrated in Fig. 4. Note that small changes in trajectories can result in large changes in RTDs, due to the inverse relationship between velocity and residence time.

B. Image Gradients

Text edges can be characterized by the gradient information of pixel intensity values. Image gradients can be evaluated using Sobel operators [33], which determine the directional change in image intensity values. Text images generally exhibit high gradient intensities and low gradient variances. Therefore the mean and variance of the image gradients can be used as metrics to characterize the degradation of textual edge information. The mean is evaluated as the average edge intensity over the two-dimensional image, while the variance is evaluated over the average edge intensities across each row.

Fig. 5. Image of an ideal edge (unit step) with one-dimensional representation for: (a) clear image, (b) image blurred by a linear RTD, (c) image reconstructed by Wiener deconvolution.

Fig. 6. Image of an ideal edge (unit step) with one-dimensional representation for: (a) clear image, (b) image blurred by a fourth-order polynomial RTD, (c) image reconstructed by Wiener deconvolution.
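The blur-and-restore pipeline behind Figs. 5 and 6 can be sketched in one dimension: an ideal unit-step edge is blurred by a linear-motion (uniform) RTD and then restored by Wiener deconvolution, with edge degradation quantified by the maximum intensity gradient along the profile. The kernel length and noise-to-signal ratio below are assumptions for the sketch.

```python
import numpy as np

# Ideal unit-step edge, as in the clear image of Fig. 5(a).
n = 512
edge = np.zeros(n)
edge[n // 2:] = 1.0

# Linear (constant-velocity) trajectory -> uniform 1-D PSF of assumed
# length 15, centered for circular convolution in the frequency domain.
L = 15
psf = np.zeros(n)
psf[:L] = 1.0 / L
psf = np.roll(psf, -(L // 2))

H = np.fft.fft(psf)
blurred = np.real(np.fft.ifft(np.fft.fft(edge) * H))

# Wiener filter W = conj(H) / (|H|^2 + nsr): near-inverse at frequencies
# the PSF passes, attenuated where it does not (nsr is assumed).
nsr = 1e-3
W = np.conj(H) / (np.abs(H) ** 2 + nsr)
restored = np.real(np.fft.ifft(np.fft.fft(blurred) * W))

# Edge-sharpness metric: maximum absolute gradient along the profile.
def max_gradient(signal):
    return np.abs(np.diff(signal)).max()

print(max_gradient(blurred), max_gradient(restored))
```

Blurring flattens the step (its maximum gradient drops to roughly 1/L), while Wiener restoration steepens the transition again at the cost of some ringing, mirroring the qualitative behavior shown in the figures.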