Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration

Jianchun Chen, NYU Multimedia and Visual Computing Lab, New York University, Brooklyn, NY 11201
Lingjing Wang, NYU Multimedia and Visual Computing Lab, New York University, Brooklyn, NY 11201, lw1474@nyu.edu
Xiang Li, NYU Multimedia and Visual Computing Lab, New York University, Brooklyn, NY 11201
Yi Fang, NYU Multimedia and Visual Computing Lab, New York University Abu Dhabi, Abu Dhabi, UAE

Abstract

This paper concerns the undetermined problem of estimating the geometric transformation between image pairs. Recent methods introduce deep neural networks to predict the controlling parameters of hand-crafted geometric transformation models (e.g., thin-plate spline) for image registration and matching. However, these low-dimensional parametric models are incapable of estimating highly complex geometric transformations and have limited flexibility to model the actual geometric deformation between image pairs. To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict the dense displacement field for pairwise image alignment. Arbicon-Net generalizes from training data to predict the desired arbitrary continuous geometric transformation in a data-driven manner for unseen pairs of images. In particular, without imposing penalization terms, the predicted displacement vector function is proven to be spatially continuous and smooth. To verify the performance of Arbicon-Net, we conducted semantic alignment tests over both synthetic and real image datasets with various experimental settings. The results demonstrate that Arbicon-Net outperforms previous image alignment techniques in identifying image correspondences.

1 Introduction

Image registration plays a fundamental role in many computer vision applications such as medical image processing [1], camera pose estimation [2], and visual tracking [3]. Fig. 1 shows the image registration process, which includes
geometric transformation estimation and image warping. To formulate the problem of image registration, traditional methods often approach the task in two steps: 1) they first compute hand-crafted image features such as SIFT and HOG [4, 5] to capture pixel-level descriptions, and 2) they then iteratively search for the optimal geometric transformation model to register a pair of images, driven by minimizing an alignment loss function. The alignment loss is usually pre-defined as a certain type of similarity metric (e.g., correlation scores) between two sets of image feature descriptors.

(Equal contribution to this paper. Corresponding author. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.)

Figure 1: Illustration of Arbicon-Net for image alignment.

Previous efforts [6, 7, 8] have achieved great success in image registration through the development of a variety of image feature descriptors and optimization algorithms, as summarized in [9]. However, they often face challenges posed by various deteriorated image conditions such as 1) the dramatic image appearance variation (i.e., texture, color, lighting changes, and so on) between image pairs, and 2) the significant geometric structural variation between image pairs.

The recent success of deep neural networks motivates researchers [10, 11, 12, 13] to develop deep learning techniques that combine the two steps into an end-to-end trainable network, which aims to learn a pre-defined geometric model (i.e., affine or thin-plate spline) through a regression process supervised by minimizing an image matching loss. By generalizing from training data, these methods are able to predict image matching in real time and are robust to various deteriorated image conditions. However, it is suggested by the authors of [14] that pre-defined geometric transformation models only represent a set of low-dimensional transformations, which prevents these methods from predicting the complex geometric transformations needed for high-quality image registration. Moreover, the transformations described by hand-crafted geometric models might not reveal the actual transformation required for image alignment, which leads to a sub-optimal estimation of the desired geometric transformation.

Some methods [14, 15, 16] tackle this problem by directly estimating semantic flow from pixel-level features. These methods are more flexible in transferring the keypoints of images to semantically correlated positions. However, since the flow field is estimated entirely from local features without integrating global motion, local points are unable to move coherently, which consequently generates distorted, unrealistic images. In real-world applications (e.g., [1]), these flow-based methods require explicitly imposed penalization to constrain the smoothness of the flow field.

To address the above-mentioned issues, we propose a novel geometric transformation network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict a dense displacement field that is not formulated by pre-defined hand-crafted geometric models. Compared with geometric-model-based approaches, Arbicon-Net uses a deep neural network to model geometric transformations, accommodating the arbitrarily complex transformations required for the registration of image pairs. Compared with semantic-flow-based methods, Arbicon-Net features an attractive property: it predicts a smooth displacement field. As shown in Fig. 2, Arbicon-Net simultaneously trains three major modules, namely the front-end geometric feature extractor module, the transformation descriptor encoder module, and the displacement field predictor module, in an end-to-end fashion. Arbicon-Net first extracts dense feature maps from input image pairs and encodes the discriminative local feature correlations into a transformation descriptor. The following predictor module uses the transformation descriptor to decode the displacement field for image registration.

Contributions. We have three main contributions in this paper. First, we design a novel Arbicon-Net, which uses deep neural networks to predict a dense displacement field that accommodates arbitrary geometric transformations according to the actual requirements of image registration. This addresses the critical issue that the actual desired geometric transformation does not match the one that can be provided by a pre-defined geometric model. Second, we prove that Arbicon-Net is guaranteed to generate a spatially continuous and smooth displacement field without imposing an additional penalization term as a smoothness constraint. Finally, we show that our proposed Arbicon-Net achieves superior performance against hand-crafted geometric transformation models with both strong
and weak supervision.

Figure 2: Main Pipeline. Our proposed end-to-end trainable Arbicon-Net has three main components: 1) Geometric Feature Extractor Module; 2) Transformation Descriptor Encoder Module; 3) Displacement Field Predictor Module.

2 Related Works

Image Registration. Image registration is defined as the process of determining a smooth geometric transformation between input image pairs, especially for 2D/3D medical images. Existing methods search for the optimal geometric transformation by iteratively minimizing an alignment loss, which is typically defined by feature similarity or a hierarchically defined intensity pattern. To achieve high-quality image registration, researchers [17, 1, 9] have explored diverse geometric transformation models, image similarity metrics, and search algorithms.

Non-learning-based Image Correspondence Matching. The classic image correspondence matching pipeline [6, 7, 8] starts by detecting key points via hand-crafted pixel-level feature descriptors [4, 5, 18], followed by feature matching strategies to determine the optimal point correspondences [19, 5]. Subsequent research developed various hand-crafted algorithms [19, 20] to remove incorrect matches by searching for a global transformation or utilizing neighbor information. While these methods are limited in matching speed and matching performance, they have had a far-reaching impact on the computer vision community by establishing a standard pipeline and introducing geometric transformation estimation as a mainstream approach to the image matching problem.

Learning-based Image Correspondence Matching. Inspired by the success of deep neural networks, pioneering works [21, 22, 23] propose to use pre-trained convolutional neural networks (CNNs) instead of hand-crafted descriptors to extract discriminative pixel-wise features. Subsequent research developed learnable feature extraction layers [24, 25] and learnable feature matching layers [26, 27] with differentiable image alignment losses. Han et al. [28] introduce a fully learnable image correspondence matching strategy over region proposals; however, this method is not trainable in an end-to-end fashion. More recently, researchers [10, 11, 12, 13] have proposed end-to-end trainable network architectures for image correspondence estimation. Specifically, these methods define a regression network to predict the parameters of specific geometric transformation models (i.e., thin-plate spline, affine). But they are limited by the use of low-dimensional geometric models and are consequently less capable of performing fine-grained image geometric transformation. Other researchers either recurrently regress a pixel-level flow field [15, 16] to approximate fine-grained image transformatio
n or determine the flow field by neighbourhood consensus assignment [14]. However, as we stated above, they do not take the smoothness of the displacement field into account.

3 Approach

3.1 Geometric Feature Extractor

Figure 3: Feature Correlation.

Following common image matching paradigms [11], our Arbicon-Net starts by extracting geometric features from the input image pair $I_A, I_B$. We first leverage a shared-weight CNN to generate a representative feature map $F \in \mathbb{R}^{h \times w \times c}$ for each input image, where at each location the feature vector $f_{ij} \in \mathbb{R}^c$ represents local semantic information. In order to estimate the geometric transformation of given image pairs, we establish the local feature correlations between the two feature maps using normalized cosine similarity. For each local descriptor from $F_A$, we compute its similarity score with all local descriptors in $F_B$ to form a 4-D correlation tensor $S \in \mathbb{R}^{h \times w \times h \times w}$, as shown in Fig. 3. Each element $s_{ijkl} \in S$ is computed as

$$s_{ijkl} = \frac{\langle f^A_{ij}, f^B_{kl} \rangle}{|f^A_{ij}|_2 \, |f^B_{kl}|_2} \quad (1)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors; the denominator acts as a normalization term that further amplifies confident matches and reduces ambiguous matches.
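The correlation tensor of Eq. 1 can be sketched in a few lines. This is an illustrative NumPy version (function and array names are our own, not the authors' code):

```python
import numpy as np

def correlation_tensor(fa, fb, eps=1e-8):
    """Eq. 1: 4-D tensor of cosine similarities between every local
    descriptor f^A_ij and every f^B_kl. fa, fb: (h, w, c) feature maps."""
    # L2-normalize each local descriptor so the inner product becomes a
    # cosine similarity; eps guards against zero vectors.
    fa = fa / (np.linalg.norm(fa, axis=-1, keepdims=True) + eps)
    fb = fb / (np.linalg.norm(fb, axis=-1, keepdims=True) + eps)
    # S[i, j, k, l] = <fa[i, j], fb[k, l]>
    return np.einsum('ijc,klc->ijkl', fa, fb)

rng = np.random.default_rng(0)
fa = rng.normal(size=(4, 5, 8))
fb = rng.normal(size=(4, 5, 8))
S = correlation_tensor(fa, fb)
print(S.shape)  # (4, 5, 4, 5)
```

Every entry lies in $[-1, 1]$, and correlating a feature map with itself yields a score of 1 at each matching location.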
3.2 Transformation Descriptor Encoder

To learn a more discriminative feature correlation, we leverage 4-D convolutional neural networks (CNNs) to refine the correlation tensor $S$ using neighbor information [14]. The 4-D convolution layers integrate additional neighborhood information compared with regular 2-D CNNs. Since the order of the input image pair, $(I_A, I_B)$ or $(I_B, I_A)$, does not influence the result of the local feature correlation, the convolution operation is applied symmetrically, formulated as

$$S^c = \mathrm{Conv}(S) + (\mathrm{Conv}(S^T))^T \quad (2)$$

where the transpose of $S$ is computed according to $s^T_{ijkl} = s_{klij}$. Moreover, we normalize the learned 4-D correlation tensor by Eq. 3, where $A_{pq} = \{s^c_{pq11}, \ldots, s^c_{pqhw}\}$ and $B_{pq} = \{s^c_{11pq}, \ldots, s^c_{hwpq}\}$. This normalization encourages the bilateral confidence of correlated pairs from the source image and the target image:

$$\bar{s}_{ijkl} = \frac{s^c_{ijkl}}{\max(A_{ij})} \cdot \frac{s^c_{ijkl}}{\max(B_{kl})} \quad (3)$$
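Eqs. 2–3 can be sketched as follows; the learned 4-D convolution is left as a caller-supplied function `conv` (here replaced by the identity for illustration), so only the symmetrization and normalization logic is shown:

```python
import numpy as np

def symmetrize(conv, S):
    """Eq. 2: apply a 4-D convolution `conv` symmetrically so the refined
    scores do not depend on the order of the input images. S has shape
    (h, w, h, w); the transpose S^T_ijkl = S_klij swaps the image axes."""
    swap = lambda T: np.transpose(T, (2, 3, 0, 1))
    return conv(S) + swap(conv(swap(S)))

def mutual_normalize(Sc, eps=1e-8):
    """Eq. 3: rescale each score by the best score over all target
    positions for its source cell (max A_ij) and by the best score over
    all source positions for its target cell (max B_kl)."""
    a = Sc.max(axis=(2, 3), keepdims=True)  # max over target positions
    b = Sc.max(axis=(0, 1), keepdims=True)  # max over source positions
    return (Sc / (a + eps)) * (Sc / (b + eps))

rng = np.random.default_rng(1)
S = rng.random((3, 4, 3, 4))                 # toy nonnegative correlation tensor
S_bar = mutual_normalize(symmetrize(lambda x: x, S))
```

With nonnegative scores, every normalized entry lies in $[0, 1]$, and only mutually confident pairs stay close to 1.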
Since our goal is to find a global transformation, we use a Multi-Layer Perceptron (MLP) to encode the learned 4-D tensor $\bar{S}$ into a transformation descriptor $d_{AB} \in \mathbb{R}^m$ that represents the overall image correspondence information, as shown in Eq. 4. For global geometric transformation learning, the image correspondence information describes a geometric transformation that optimally aligns corresponding points on the two images:

$$d_{AB} = \mathrm{MLP}(\bar{S}) \quad (4)$$

3.3 Displacement Field Predictor

In general, the geometric transformation $\mathcal{T}$ for each point $x$ in a point set $X \subset \mathbb{R}^2$ can be defined as

$$\mathcal{T}(x, v) = x + v(x) \quad (5)$$

where $v : \mathbb{R}^2 \rightarrow \mathbb{R}^2$ is a "point displacement" function.

Figure 4: Displacement Field Predictor Module.

The image registration task can be formulated as the process of determining the displacement function $v$. It is necessary for the function $v$ to be continuous and smooth according to the Motion Coherence Theory (MCT) [29]. Fortunately, by leveraging a deep neural network architecture, we can construct a suitable displacement function $v$ that satisfies these continuity and smoothness characteristics. As illustrated in Fig. 4, given $n$ 2-D points in the source image plane, we duplicate the transformation descriptor $d_{AB}$ $n$ times. Each point is concatenated with the $m$-D global descriptor $d_{AB}$. We further construct a Displacement Field Predictor network with four successive MLPs to decode the concatenated $(m+2)$-D vector into a 2-D displacement vector. We define this neural network structure in Eq. 6 as $F(\cdot) : \mathbb{R}^{m+2} \rightarrow \mathbb{R}^2$, formulated as

$$v(x) = F(x \oplus d_{AB}) \quad (6)$$

where $\oplus$ indicates the concatenation operation.
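Eqs. 5–6 amount to a small pointwise MLP applied to each coordinate concatenated with the shared global descriptor. A runnable sketch with randomly initialized weights (the layer widths are our own assumptions; in practice the trained network supplies the parameters):

```python
import numpy as np

def softplus(z):
    # numerically stable SoftPlus: log(1 + e^z)
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

def init_mlp(dims, rng):
    """Random weights for a small MLP; a stand-in for the trained
    Displacement Field Predictor F : R^(m+2) -> R^2."""
    return [(rng.normal(scale=0.1, size=(i, o)), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def predict_displacement(params, x, d_ab):
    """Eq. 6: v(x) = F(x concat d_AB). x: (n, 2) point coordinates,
    d_ab: (m,) global transformation descriptor tiled onto every point."""
    h = np.concatenate([x, np.tile(d_ab, (x.shape[0], 1))], axis=1)
    for w, b in params[:-1]:
        h = softplus(h @ w + b)   # smooth activation keeps v smooth
    w, b = params[-1]
    return h @ w + b              # (n, 2) displacement vectors

rng = np.random.default_rng(0)
m = 16
params = init_mlp([m + 2, 64, 64, 64, 2], rng)  # four successive MLP layers
d_ab = rng.normal(size=m)
x = rng.uniform(-1.0, 1.0, size=(100, 2))
v = predict_displacement(params, x, d_ab)
warped = x + v                    # Eq. 5: T(x) = x + v(x)
```

Because the descriptor is shared by all points and only the 2-D coordinate varies, nearby points receive nearby displacements, which is exactly the coherence property argued for below.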
Furthermore, we briefly prove the continuity and smoothness of our displacement field predictor $F$, which serves as our deep-learning-based solution for the displacement function $v$.

Continuity. Since both the MLP and the activation function are continuous, the continuity of the Displacement Field Predictor network can be trivially proven, as it is a composite of continuous functions. Since $d_{AB}$ is concatenated to each point in $X$, this concatenation operation does not change the continuity of the displacement function $v(\cdot) = F(\cdot \oplus d_{AB})$. In contrast, commonly used learning paradigms [1, 15, 16], which directly map a high-dimensional feature space to a 2-D displacement field, output a set of discrete displacement vectors, and the displacements of other points need to be further interpolated.

Smoothness. After choosing the smooth function SoftPlus [30] as the activation function in our Displacement Field Predictor network, it becomes trivial to estimate its complexity and smoothness, since the displacement function is a composite of a number of smooth functions (MLPs and SoftPlus). In practice, Regularization Theory (RT) [31] uses the oscillatory behavior of a function to further measure the smoothness of a displacement function. The oscillatory behavior is measured in a Reproducing Kernel Hilbert Space (RKHS) [31, 32] as in Eq. 7:

$$\|v\|^2_{\mathcal{H}_m} = \int_{\mathbb{R}^D} \frac{|\tilde{v}(s)|^2}{\tilde{g}(s)} \, ds \quad (7)$$

where $\tilde{v}$ is the Fourier transform of the displacement function $v$ and $\tilde{g}$ is a low-pass filter. In other words, a smoother displacement function has considerably less energy in the high-frequency domain. We generally express models that regress pixel-level displacement vectors, including Arbicon-Net and RTNs [16], as a composite function $v$ in Eq. 8:

$$v(x) = F(G(x)) : \mathbb{R}^2 \rightarrow \mathbb{R}^2 \quad (8)$$

The functions $G$ and $F$ denote the point feature encoding network and the point displacement vector regression network, respectively. Specifically, in Arbicon-Net we have $G(x) = x \oplus d_{AB}$. In contrast, in RTNs $G$ generates a high-dimensional feature map through sequential CNNs. Therefore, the $G$ in RTNs is generally considered a sparse and oscillatory function, especially when the dimension of the feature vector (the output of $G$) is high, which causes the widely known "curse of dimensionality" problem. In this section, we assume that the two models have the same regression network $F$ and that the inputs of $F$ are normalized to the same scale. According to [33], the Fourier transform of the composite function $v$ has essentially maximum frequency $u_v$ given by

$$u_v = u_F \max_x |G'(x)| \quad (9)$$

where $u_F$ is the maximum frequency of $F$, which is independent of $G$. Assuming that the outputs of the different functions $G$ are on the same scale, an oscillatory $G$ tends to have a larger maximum value of $|G'(x)|$ than the linear function in Arbicon-Net. As a result, our composite function has a smaller $u_v$ and is thus likely to have lower energy in the high-frequency domain, which further guarantees a smoother displacement function. Based on our proposed paradigm, we further constrain the smoothness of the function $F$. Fortunately, given the popularity of deep learning models, the research community has proposed regularization strategies that naturally help our Displacement Field Predictor network reduce the risk of an oscillatory displacement function. One simple solution is to design a proper network size. In Section 4.3, we provide empirical results to validate the smoothness of our estimated displacement function compared with non-rigid geometric transformation models.
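The frequency argument of Eqs. 7–9 can be illustrated numerically in 1-D: composing the same smooth regressor with a linear $G$ (as in Arbicon-Net, where $G$ is a concatenation) versus an oscillatory $G$ yields outputs with very different high-frequency energy. The function `F` below is an arbitrary smooth stand-in, not the trained network:

```python
import numpy as np

def high_freq_energy(y, cutoff=10):
    """Fraction of spectral energy above frequency bin `cutoff`;
    Eq. 7 penalizes exactly this kind of energy."""
    spec = np.abs(np.fft.rfft(y)) ** 2
    return spec[cutoff:].sum() / spec.sum()

x = np.linspace(0.0, 1.0, 1024)
F = lambda z: np.tanh(3.0 * z)         # a fixed smooth regressor F
v_linear = F(x)                         # linear G: small max|G'(x)|
v_osc = F(np.sin(40.0 * np.pi * x))     # oscillatory G: large max|G'(x)|

# Eq. 9: a larger max|G'(x)| raises the maximum frequency of F(G(x)),
# so the oscillatory composition carries far more high-frequency energy.
print(high_freq_energy(v_linear) < high_freq_energy(v_osc))  # True
```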
3.4 Loss Functions

As shown in the right box of Fig. 2, our method is designed to learn the geometric transformation under either strong supervision or weak supervision.

For the strongly-supervised loss, we have point correspondence information for $x \in X$ from the source plane and $y \in Y$ from the target plane. $L_{strong}$ directly minimizes the pairwise L2 distance between corresponding points in the transformed image plane and the target image plane, as shown in Eq. 10:

$$L_{strong} = \frac{1}{N} \sum_{i=1}^{N} \|\mathcal{T}(x_i) - y_i\|_2^2 \quad (10)$$
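The strongly-supervised loss of Eq. 10 can be sketched directly (the toy keypoints below are illustrative):

```python
import numpy as np

def strong_loss(warped, y):
    """Eq. 10: mean squared L2 distance between transformed source
    keypoints T(x_i) and their annotated correspondences y_i."""
    return np.mean(np.sum((warped - y) ** 2, axis=1))

warped = np.array([[0.1, 0.2], [0.5, 0.5]])
targets = np.array([[0.1, 0.2], [0.5, 0.6]])
loss = strong_loss(warped, targets)   # one point off by 0.1, so ~0.005
```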
For the weakly-supervised loss, we maximize the inner product of corresponding locations in the transformed source feature map $\mathcal{T}(F_A)$ and the target feature map $F_B$, following the paradigm described in [10]. Letting $\mathcal{T}(I_A)$ and $I_B$ be matched, $\mathcal{T}(I_A)_{ij}$ and $(I_B)_{ij}$ are semantically matched, and thus $\langle \mathcal{T}(f_A)_{ij}, (f_B)_{ij} \rangle$ should be maximal. We implement the loss function described in Eq. 11:

$$L_{weak} = \sum_{i,j,k,l} s_{ijkl} \, \mathbb{1}\big[d(\mathcal{T}(i,j), (k,l)) \leq t\big] \quad (11)$$

where $\mathbb{1}(\cdot)$ is the indicator function and $d(\cdot)$ denotes the L1 distance. Since our proposed network is end-to-end trainable, the Geometric Transformation Network is optimized together with the other components. It deserves noting that, since our method learns the geometric transformation from the training dataset, it is more robust to train or fine-tune our network on real
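The weakly-supervised objective of Eq. 11 can be sketched as follows; per the text this quantity is maximized during training. The dense grids of cell coordinates standing in for feature-map locations, and all names, are our own illustrative assumptions:

```python
import numpy as np

def weak_score(S, warped_grid, target_grid, t=1.0):
    """Eq. 11: sum the correlation scores s_ijkl over location pairs
    whose warped source position T(i, j) lies within L1 distance t of
    the target position (k, l). warped_grid and target_grid are
    (h, w, 2) arrays of 2-D coordinates."""
    h, w = warped_grid.shape[:2]
    total = 0.0
    for i in range(h):
        for j in range(w):
            # L1 distance from the warped source cell to every target cell
            d = np.abs(target_grid - warped_grid[i, j]).sum(axis=-1)
            total += (S[i, j] * (d <= t)).sum()
    return total

# perfect alignment on a 2x2 grid: only the matching cell contributes
grid = np.stack(np.meshgrid(np.arange(2.0), np.arange(2.0),
                            indexing='ij'), axis=-1)
print(weak_score(np.ones((2, 2, 2, 2)), grid, grid, t=0.5))  # 4.0
```

Widening the threshold `t` lets more neighboring cells contribute, trading localization precision for a denser training signal.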