Frameworks for Multimodal Biometric Using Sparse Representation

HUANG Zengxi, LIU Yiguang, HUANG Ronggang, YANG Menglong
(College of Computer, Sichuan University, Chengdu 610065)

Abstract: This paper introduces three frameworks, at two fusion levels, for multimodal biometrics using sparse representation based classification (SRC), which has recently been applied successfully to many classification tasks. The first framework is multimodal SRC at the match score level (MSRC_S), in which the feature of each modality is sparsely coded independently, and the representation fidelities are then used as match scores for multimodal classification. The other two frameworks perform multimodal SRC at the feature level, namely MSRC_F1 and MSRC_F2, where the features of all modalities are first fused and then classified with SRC. The difference between them is that MSRC_F1 fuses the features into a single multimodal feature vector, while MSRC_F2 implicitly combines the features in an iterative joint sparse coding process. As a typical application, the fusion of face and ear for human identification is investigated with the three frameworks. Extensive results demonstrate that the proposed multimodal methods are significantly better than multimodal recognition using common classifiers. Among the SRC based methods, MSRC_S achieves the top recognition accuracy on almost all test items, which may benefit from allowing sparse coding independence across modalities.

Keywords: multimodal biometric; sparse representation; match score level; feature level; face and ear

Foundations: Specialized Research Fund for the Doctoral Program of Higher Education of China under grant (No. 20070610031), National Natural Science Foundation of China (NSFC) under grants (No. 61173182, No. 61179071), and Applied Basic Research Project (No. 2011JY0124) and International Cooperation and Exchange Project (No. 2012HH0004) of Sichuan Province.
Brief author introduction: HUANG Zengxi (1985-), male, Ph.D. candidate; main research: image processing, pattern recognition.
Correspondence author: LIU Yiguang (1972-), male, professor; main research: computer vision and image processing, pattern recognition, and computational intelligence.

0 Introduction

The original goal of sparse representation (or coding, SR) was the representation and compression of signals, potentially at sampling rates below the Shannon-Nyquist bound [1]. Nevertheless, Wright et al. [2] argued that sparse representation is naturally discriminative, and designed a novel classification scheme, namely sparse representation based classification (SRC), which was employed in face recognition (FR) and achieved impressive performance. SRC can be seen as a more general model than the earlier nearest classifiers, such as nearest neighbor (NN), nearest feature line (NFL) [3] and nearest subspace (NS) [4], [5]; it uses the samples from all classes to collaboratively represent the query sample, which helps overcome the small-sample-size problem in FR [6].

SRC based techniques have been widely applied to various object classification tasks, such as FR [2], [6] and flower classification [7]. In almost all of these applications, using sparsity as a prior leads to state-of-the-art results. However, to the best of our knowledge, there is still no report of applying sparse coding to multimodal biometrics, although some multi-feature or multi-sample classification methods with sparsity constraints, which share a similar fusion mechanism with multimodal biometrics, have been reported [7], [8]. In multimodal biometric systems, evidence can be fused at five different levels: the sensor data, feature, match score, rank, and decision levels. The later the fusion level, the less information is available for classification; hence rank and decision level fusion have rarely been adopted in the community. Compared with the match score and sensor data levels, fusion at the feature level can exploit the most discriminative information and eliminate redundant or adverse information from the original biometric data, and has therefore become much more popular recently. Overall, for multimodal biometrics using sparse coding, fusion at the match score and feature levels might be the most reasonable choices.

In this paper, we introduce three frameworks for multimodal biometrics using sparse coding. The first framework is multimodal SRC at the match score level (MSRC_S), in which the feature of each modality is sparsely coded independently, and the representation fidelities are then used as match scores. The other two frameworks are multimodal SRC at the feature level (MSRC_F1, MSRC_F2), where all features are first fused and then classified with SRC. The difference between them is that MSRC_F1 fuses the features into a single multimodal feature vector, while MSRC_F2 implicitly combines the features in an iterative joint sparse coding process. As a typical application, the fusion of face and ear for human identification is investigated with the three frameworks. In our experiments, principal component analysis (PCA) [9] based feature extraction is applied. Extensive results demonstrate that the proposed methods are significantly better than commonly used classifiers, such as NN and NFL. Among the frameworks, MSRC_S achieves the top performance on almost all test items, which may benefit from allowing sparse coding independence across modalities; the match score level fusion based method also appears to be more robust.
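As background for the three frameworks, the unimodal SRC scheme of [2] can be sketched as follows. This is a minimal illustration, not the paper's implementation: it replaces the constrained l1 program with an unconstrained lasso solved by ISTA, and all function and variable names are ours.

```python
import numpy as np

def ista_l1(D, z, lam=0.01, n_iter=500):
    """Solve min_a 0.5*||z - D a||_2^2 + lam*||a||_1 by ISTA, a simple
    stand-in for the constrained l1 program used in SRC."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1/L, L = Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - step * (D.T @ (D @ a - z))      # gradient step on the quadratic
        a = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return a

def src_classify(D, labels, z, lam=0.01):
    """SRC rule: assign z to the class whose training columns yield the
    smallest reconstruction error under the shared sparse code."""
    a = ista_l1(D, z, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(z - D[:, labels == c] @ a[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

For example, with a dictionary whose first two columns belong to class 0 and third column to class 1, a query close to class 0's columns is assigned label 0.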
From the viewpoint of sparse coding, we also discuss their working mechanisms and the experimental results, and draw the conclusion that striking a good balance between the similarity and distinctness of the coding vectors may bring better robustness to multimodal biometric recognition.

The rest of this paper is organized as follows: Section 1 details the proposed multimodal biometric frameworks, including MSRC_F1, MSRC_F2 and MSRC_S. Section 2 conducts experiments on multimodal databases to evaluate the frameworks. Finally, our conclusions are summarized in Section 3.

1 Multimodal biometric frameworks

The face is considered the most acceptable and promising biometric. However, the face by itself is not yet as accurate and flexible as desired, owing to makeup, eyeglasses, illumination and expression. On the other hand, the ear has some appealing advantages over the face: a) the ear has a rich and stable structure; b) the ear has a uniform distribution of color; c) the ear is small and requires less computational time [10]. In this paper, for simplicity and convenience in introducing our multimodal frameworks, face and ear based multimodal biometrics is investigated as a case study.

1.1 Feature extraction

The pioneering SRC research in FR [2] showed that with sparsity properly harnessed, the choice of features becomes less important than the number of features used. Besides, SRC methods are relatively time-consuming compared with commonly used classification methods. Hence, a simple, fast and general feature extraction method is desired for our SR based multimodal biometrics. In our frameworks, PCA is utilized for face and ear feature extraction.

Suppose that we have c classes of subjects in a multimodal database. A^f = [A_1^f, A_2^f, ..., A_c^f] and A^e = [A_1^e, A_2^e, ..., A_c^e] separately denote the face and ear training sample sets, where A_i = [a_{i,1}, a_{i,2}, ..., a_{i,m}] (i = 1, 2, ..., c; m samples per class) is the subset from class i.
According to the PCA technique, the training feature sets can be computed as D^f = (P^f)^T A^f and D^e = (P^e)^T A^e, where P^f and P^e are the PCA projection matrices calculated from the face and ear training sets, respectively. The feature vectors of the face and ear query images (y^f and y^e) are calculated by z^f = (P^f)^T y^f and z^e = (P^e)^T y^e.

1.2 Frameworks

MSRC_F1 and MSRC_F2. General feature fusion techniques include serial concatenation, parallel fusion using a complex vector, and CCA-like methods that extract correlated features of two modalities. Compared with the other methods, the serial concatenation adopted by our multimodal frameworks is simple, effective, and easy to extend to more than two modalities. In MSRC_F1, the multimodal feature of the query data is obtained by z = [z^f; z^e]. Likewise, the multimodal feature dictionary is constructed as D = [D^f; D^e]. MSRC_F1 seeks a sparse representation of z in terms of the dictionary D; the l1-norm minimization problem can be formulated as

    α̂ = arg min_α ||α||_1   s.t.   ||z − Dα||_2 ≤ ε        (1)

where α = [α_1; α_2; ...; α_c] and α_i is the coefficient vector associated with class i.

The classification of MSRC_F1 is based on the multimodal feature coding error yielded by each class. The classification rule can be defined as

    g(y) = arg min_i ||z − D_i α̂_i||_2        (2)

where D_i is the subset of multimodal features from class i.

Unlike in MSRC_F1, the face and ear features in MSRC_F2 are not directly combined into a single multimodal feature vector; instead, they are jointly represented in an iterative sparse coding process, which is an implicit way of feature fusion. The joint sparsity goal of MSRC_F2 can be achieved by imposing an l_{1,2} mixed norm on the coding coefficients. Suppose α^f = [α_1^f; α_2^f; ...; α_c^f] and α^e = [α_1^e; α_2^e; ...; α_c^e], where α_i^f and α_i^e are the face and ear coding vectors associated with class i. Let Λ = [α^f, α^e] ∈ R^{cm×2}.
MSRC_F2 can be defined by

    min_{α^f, α^e}  (1/2)||z^f − D^f α^f||_2^2 + (1/2)||z^e − D^e α^e||_2^2 + λ Σ_{i=1}^{c} ||Λ_i||_2        (3)

where Λ_i is the sub-matrix of Λ containing the rows of class i, which can be solved by the l_{1,2} mixed-norm APG algorithm introduced in [7]. Its classification rule is

    g(y) = arg min_i ( ||z^f − D_i^f α̂_i^f||_2^2 + ||z^e − D_i^e α̂_i^e||_2^2 )        (4)

MSRC_S. In SRC, the representation fidelity is often measured by the l2-norm of the sparse coding error, which can also be used as a distance score denoting how "different" the query sample is from the training samples of a class. So, in MSRC_S, we first sparsely represent the face and ear features (z^f, z^e) in terms of their associated dictionaries (D^f, D^e), respectively. After that, we fuse the sparse coding errors of the face and ear features using the sum rule. The two l1-minimization problems can be formulated as follows:

    α̂^f = arg min_{α^f} ||α^f||_1   s.t.   ||z^f − D^f α^f||_2 ≤ ε        (5)

    α̂^e = arg min_{α^e} ||α^e||_1   s.t.   ||z^e − D^e α^e||_2 ≤ ε        (6)

The match score fusion can then be defined by

    r_i = ||z^f − D_i^f α̂_i^f||_2 + ||z^e − D_i^e α̂_i^e||_2,   i = 1, 2, ..., c        (7)

Since r_i is a distance score, classification is ruled in favor of the class with the lowest r_i.

1.3 Theoretical comparison

From the viewpoint of information fusion, MSRC_F1 and MSRC_F2 combine face and ear at the feature level, while MSRC_S integrates them at the match score level. Because of its capability of utilizing more information for classification, feature level fusion has been considered the most promising avenue for multimodal biometrics; intuitively, therefore, the former two approaches might perform better than MSRC_S.

From the viewpoint of sparse coding, their differences mainly lie in the constraint imposed on the coding vectors of face and ear. In MSRC_F1, the modalities are forced to share the same coding vector; that is to say, α^f = α^e is required.
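The contrast between the explicit fusion of MSRC_F1 (Eqs. (1)-(2)) and the independent coding plus sum rule of MSRC_S (Eqs. (5)-(7)) can be sketched as follows. This is an illustrative outline under our own naming, with a small ISTA lasso helper standing in for the constrained l1 programs so the sketch is self-contained; MSRC_F2 is omitted because it requires the l_{1,2} mixed-norm APG solver of [7].

```python
import numpy as np

def ista_l1(D, z, lam=0.02, n_iter=400):
    """Basic ISTA for min_a 0.5*||z - D a||_2^2 + lam*||a||_1, a stand-in
    for the constrained l1 programs in Eqs. (1), (5) and (6)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - step * (D.T @ (D @ a - z))
        a = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return a

def class_residual(D, a, z, mask):
    # Reconstruction error using only one class's columns and coefficients.
    return np.linalg.norm(z - D[:, mask] @ a[mask])

def msrc_f1(Df, De, labels, zf, ze):
    """Feature-level fusion: stack the features, one shared coding vector."""
    D = np.vstack([Df, De])
    z = np.concatenate([zf, ze])
    a = ista_l1(D, z)
    return min(np.unique(labels),
               key=lambda c: class_residual(D, a, z, labels == c))

def msrc_s(Df, De, labels, zf, ze):
    """Score-level fusion: independent coding per modality, then the
    sum rule on the class-wise residuals as in Eq. (7)."""
    af = ista_l1(Df, zf)
    ae = ista_l1(De, ze)
    return min(np.unique(labels),
               key=lambda c: class_residual(Df, af, zf, labels == c)
                             + class_residual(De, ae, ze, labels == c))
```

Note how the only structural difference is where the coding vectors are coupled: msrc_f1 solves one program over the stacked dictionary, while msrc_s solves two and couples the modalities only through the fused score.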
This shared-coding requirement may help to correctly find the coding vector, especially when the query samples of some modalities are of good quality while others are not, thereby making multimodal recognition more robust than unimodal recognition. On the other hand, if the query samples are degraded severely enough, the requirement may deteriorate the estimation of the sparse coding vector. Thus, the strong requirement in MSRC_F1 can be seen as a double-edged sword. In contrast, MSRC_S sparsely codes the features independently. Since each sparse coding process does not affect the others, MSRC_S may yield the lowest overall reconstruction error across modalities, which, however, does not necessarily result in the best classification performance. Compared with MSRC_F1 and MSRC_S, MSRC_F2 requires the coding vectors to be neither identical nor completely independent, but allows flexibility between the similarity and distinctness of the coding vectors, which is a moderate constraint.

2 Experiments and discussions

To evaluate the performance of our multimodal biometric methods, we build two virtual multimodal databases with 38 and 79 subjects, respectively, based on publicly available databases: the Extended Yale B [11] and AR [12] face databases and the USTB ear database III [13]. Sample images of one person are shown in Fig. 1, where the images in the red rectangle are used as gallery images and the rest are used for testing. We name the databases Multimodal Database I and II (MD I and II), respectively; their constitution is described in detail in Table 1. To obtain more test instances, each face test image is paired with every ear test image. All face and ear images are normalized to a size of 50×40.

In the experiments on each multimodal database, we use the same dimension for the face feature vector and the ear feature vector.
The dimensionalities are selected empirically as 120 and 200 for MD I and II, respectively. The commonly used NN and NFL classifiers serve as references in the comparison. We call their multimodal extensions multimodal NN (MNN) and multimodal NFL (MNFL), respectively; they employ the same feature fusion scheme as MSRC_F1.

Fig. 1 Sample images of one subject of USTB ear database III.

Tab. 1 The constitution of MD I and II
Database    | Instances                       | Face                                                             | Ear
MD I (38)   | Gallery: 7×38 = 266             | Extended Yale B subset 1 (7 images per subject)                  | Gallery (38)
            | Probe: 38×13×38 = 18772         | Extended Yale B subsets 2, 3, 4 (38 images per subject)          | Probe (38)
MD II (79)  | Gallery: 7×79 = 553             | AR subset 1 (7 images per subject without occlusion, session 1)  | Gallery (79)
            | Probe subset 1: 7×13×79 = 7189  | AR subset 2 (7 images per subject without occlusion, session 2)  | Probe (79)
            | Probe subset 2: 6×13×79 = 6162  | AR subset 3 (6 images per subject with sunglasses)               |
            | Probe subset 3: 6×13×79 = 6162  | AR subset 4 (6 images per subject with scarf)                    |

2.1 Recognition without occlusion

In this part, we evaluate the performance of the proposed SR based methods under variations such as illumination, pose and expression changes, but without occlusion. Hence, on MD II, only probe subset 1 is used for testing. The experimental results are listed in Table 2. On MD I, the best recognition accuracy of 97.839% is achieved by MSRC_S, slightly better than the 97.552% of MSRC_F2; MSRC_F1 is clearly worse, by about 4%. However, this disadvantage almost disappears on MD II, where MSRC_S achieves the top performance of 99.33%, only slightly better than MSRC_F1. Overall, MSRC_F2 and MSRC_S are comparable to each other, and both are better than MSRC_F1. Compared with the multimodal biometrics using common classifiers, i.e., MNN and MNFL, the superiority of our SR based methods is clear on all the multimodal databases.

Tab. 2 Multimodal recognition without occlusion
        | MNN      | MNFL     | MSRC_F1  | MSRC_F2  | MSRC_S
MD I    | 72.171%  | 76.662%  | 93.663%  | 97.552%  | 97.839%
MD II   | 79.018%  | 82.25%   | 98.92%   | 99.027%  | 99.33%

Tab. 3 Multimodal recognition with real face disguise
          | MNN      | MNFL     | MSRC_F1  | MSRC_F2  | MSRC_S
Subset 2  | 72.99%   | 78.469%  | 89.365%  | 92.365%  | 95.344%
Subset 3  | 41.344%  | 46.74%   | 90.375%  | 95.396%  | 97.677%

2.2 Recognition despite real face disguise

Here we use probe subset 2 (face images with sunglasses) and probe subset 3 (face images with scarf) of MD II to evaluate the robustness of the proposed methods against real face disguise. The detailed performance of all competing methods is listed in Table 3. Clearly, the proposed SR based methods significantly outperform MNN and MNFL. Among the SR based methods, MSRC_S again demonstrates superior robustness, achieving the top performance of 95.344% and 97.677% on the two subsets, respectively. MSRC_F2 achieves the second best recognition accuracies on both subsets, 92.365% and 95.396%, respectively.

2.3 Recognition despite random pixel corruption

Fig. 2 plots the recognition performance of all methods on MD II as the percentage of corrupted pixels in the face, the ear, or both varies from 0% to 100%. As the proportion of corrupted pixels increases, all the methods' recognition-rate curves fall.
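The test protocol above, where each face test image is paired with every ear test image of the same subject to enlarge the number of instances, can be sketched as follows. The helper name and the `classify` callback are hypothetical stand-ins for any of the MSRC frameworks; the same-subject pairing is our reading of how the virtual multimodal instances are formed.

```python
def multimodal_accuracy(face_probes, ear_probes, classify):
    """face_probes and ear_probes are lists of (feature, subject_id) pairs.
    Every same-subject face/ear pair forms one test instance, so a subject
    with nf face probes and ne ear probes contributes nf*ne instances."""
    correct = total = 0
    for zf, sid_f in face_probes:
        for ze, sid_e in ear_probes:
            if sid_f != sid_e:      # only same-subject pairs are instances
                continue
            total += 1
            if classify(zf, ze) == sid_f:
                correct += 1
    return correct / total
```

On MD I, for example, 38 face probes paired with 13 ear probes per subject over 38 subjects give the 38×13×38 = 18772 instances of Table 1.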
