




A Study on Static Hand Gesture Recognition using Moments

S. Padam Priyal and Prabin K. Bora
Department of Electronics and Communication Engineering
Indian Institute of Technology Guwahati, Guwahati, India
Email: {s.priyal, prabin}@iitg.ernet.in

Abstract: Hand gesture recognition is one of the key techniques in developing user-friendly interfaces for human-computer interaction. Static hand gestures are the most essential facets of gesture recognition. View point invariance and user independence are among the important requirements for realizing a real-time gesture recognition system. In this context, the geometric moments and the orthogonal moments, namely the Zernike, Tchebichef and Krawtchouk moments, are explored. The proposed system detects the hand region through skin color identification and obtains the binary silhouette. These images are normalized for rotation and scale changes. The moment features of the normalized hand gestures are classified using a minimum distance classifier. The classification results suggest that the Krawtchouk moment features are comparatively robust to view point changes and also exhibit user independence.

Index Terms: Geometric moments, Hand gesture, Krawtchouk moments, Tchebichef moments, View and user independent recognition, Zernike moments.

I. INTRODUCTION

Human-computer interaction (HCI) is an active area of research that drives development in the field of automation. Recent advances have led to the emergence of HCI systems that embody the natural way of communication between humans.
Therefore, attempts to integrate communication modalities such as speech, handwriting and hand gestures into HCI have gained importance. Researchers have focused on developing advanced hand gesture interfaces, resulting in successful applications in robotics, assistive systems, sign language communication and virtual reality [1].

The interpretation of gestures requires proper means by which the dynamic and/or static configurations of the hand can be defined to the machine. This problem is addressed using computer vision techniques, which are economical and non-obtrusive [1]. The general approach to vision based gesture recognition can be based on 2D models such as the image contour and the silhouette. These models offer low computational cost and high accuracy for a modest gesture vocabulary. In methods based on 2D models, recovering the hand shape is difficult due to scale changes, rotation and view point variations.

Researchers have successfully developed some scale and rotation invariant features, while very few works concentrate on view and user independence. These problems are addressed to some extent using 3D hand modelling techniques [1]. Research on view and user independent methods based on 2D hand models is yet to mature.

Elastic graph matching, proposed in [2], is user independent and can efficiently identify gestures against a complex background. The method is, however, sensitive to view point distortions.
Chin [3] employed the curvature scale space (CSS). The approach is computationally complex and sensitive to boundary distortions. The geometric moment invariants derived from the binary hand silhouettes form the feature set in [4] and are not robust to view point variations. The Zernike and pseudo-Zernike moment features for rotation invariant gesture recognition are introduced in [5]. Gu and Su [6] investigated the Zernike moment features for view and user independent representations. Their approach employed a multivariate piecewise linear decision algorithm to successfully classify a dataset containing eleven static gesture signs.

This work evaluates the geometric moments and a few popular orthogonal moments in view independent gesture classification. The orthogonal moments considered are the (1) Zernike, (2) Tchebichef and (3) Krawtchouk moments. User independence is tested by varying the number of users included in the training data. The classification is done using the minimum distance classifier in order to evaluate the direct representation capability of these moment features. The rest of the paper is organized as follows: Section II presents the required mathematical theory of moments. Section III provides an overview of the proposed gesture recognition system. Experimental results are discussed in Section IV and Section V concludes the paper.

II. THEORY OF MOMENTS

Moments have the ability to represent the global characteristics of the image shape. The geometric moments are efficient and regularly deployed features for object recognition [1], [7]. The Zernike moments are based on continuous orthogonal polynomials defined in the polar domain and are rotation invariant [7]. The implementation of these moments requires proper approximation in the discrete domain, and the discretization error accumulates as the order of the moment increases. Hence, moments based on discrete orthogonal polynomials such as the Tchebichef and the Krawtchouk polynomials have been proposed [8]. These are directly defined in the image coordinate space and do not involve any numerical approximation.

For a 2D image $f(x, y)$ defined over a rectangular grid of size $N \times N$ with $(x, y) \in \{0, 1, \ldots, N-1\} \times \{0, 1, \ldots, N-1\}$, the moments are formulated as follows.

A. Geometric moments

The $(n + m)$th order geometric moment is defined as [7]

$$M_{nm} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} x^n y^m f(x, y) \tag{1}$$

Thus, the geometric moments can be viewed as the projection of $f(x, y)$ on the bases formed by the polynomials $x^n y^m$.

B. Zernike moments

The image coordinates $(x, y)$ are transformed to polar coordinates $(\rho, \theta)$, such that $0 \le \rho \le 1$ and $0 \le \theta \le 2\pi$. The complex Zernike polynomial of order $n \ge 0$ and repetition $r$ is defined as [7]

$$V_{nr}(\rho, \theta) = R_{nr}(\rho) \exp(jr\theta) \tag{2}$$

For even values of $n - |r|$ and $|r| \le n$, $R_{nr}$ is the real-valued radial polynomial given below:

$$R_{nr}(\rho) = \sum_{s=0}^{(n-|r|)/2} (-1)^s \frac{(n-s)!}{s! \left(\frac{n+|r|}{2} - s\right)! \left(\frac{n-|r|}{2} - s\right)!} \rho^{n-2s}$$

The image $f(\rho, \theta)$ defined in the polar domain is represented as

$$f(\rho, \theta) = \sum_{n} \sum_{r} Z_{nr} V_{nr}(\rho, \theta)$$

Using the orthogonality property, the Zernike moment $Z_{nr}$ of order $n$ is obtained from the numerical approximation on the image grid as

$$Z_{nr} = \frac{n+1}{\pi} \sum_{x} \sum_{y} f(x, y) V_{nr}^{*}(\rho, \theta), \quad x^2 + y^2 \le 1 \tag{3}$$

where $V_{nr}^{*}$ denotes the complex conjugate of $V_{nr}$ and the grid coordinates are mapped into the unit disc.

C. Tchebichef moments

The 1D discrete Tchebichef polynomial at a discrete point $x$ is defined as [9]

$$t_n(x) = (1 - N)_n \; {}_3F_2(-n, -x, 1 + n; 1, 1 - N; 1)$$

where ${}_3F_2$ is the hypergeometric function

$${}_3F_2(a_1, a_2, a_3; b_1, b_2; z) = \sum_{v=0}^{\infty} \frac{(a_1)_v (a_2)_v (a_3)_v}{(b_1)_v (b_2)_v} \frac{z^v}{v!}$$

and $(a)_v$ is the Pochhammer symbol given by

$$(a)_v = a (a + 1) \cdots (a + v - 1)$$

The separability property is used to obtain the 2D bases and $f(x, y)$ is represented as

$$f(x, y) = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} T_{nm} \, t_n(x) \, t_m(y)$$

The Tchebichef moment $T_{nm}$ of order $(n + m)$ is obtained as

$$T_{nm} = \frac{1}{\rho(n, N) \, \rho(m, N)} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} t_n(x) \, t_m(y) \, f(x, y) \tag{7}$$

where $\rho(n, N) = \sum_{x=0}^{N-1} [t_n(x)]^2$ is a normalization constant.

D. Krawtchouk moments

The $n$th order weighted Krawtchouk polynomial at a discrete point $x$ is defined as [10]

$$\bar{K}_n(x; p) = K_n(x; p) \sqrt{\frac{w(x; p)}{\rho(n; p)}}$$

By definition,

$$K_n(x; p) = {}_2F_1\left(-n, -x; 1 - N; \frac{1}{p}\right)$$

where

$$w(x; p) = \binom{N-1}{x} p^x (1 - p)^{N-1-x}$$

is the binomial weight function, $\rho(n; p)$ is a constant given by

$$\rho(n; p) = (-1)^n \left(\frac{1-p}{p}\right)^n \frac{n!}{(1 - N)_n}$$

and $p$ ($0 < p < 1$) is a controlling parameter. As $p$ deviates from the value of 0.5 by $\Delta p$, the support of the weighted Krawtchouk polynomial is approximately shifted by $N \Delta p$. The direction of shifting depends on the sign of $\Delta p$ [10].

Using the separability property, the orthogonal 2D Krawtchouk bases are defined, with controlling parameters $p_1$ and $p_2$ along the two axes, and $f(x, y)$ is approximated as

$$f(x, y) = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} Q_{nm} \, \bar{K}_n(x; p_1) \, \bar{K}_m(y; p_2)$$

The Krawtchouk moment $Q_{nm}$ of order $(n + m)$ is obtained as

$$Q_{nm} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \bar{K}_n(x; p_1) \, \bar{K}_m(y; p_2) \, f(x, y) \tag{10}$$

From the plots of the 2D polynomial functions in Fig. 1, we can infer that the Zernike and the Tchebichef polynomials have wide supports, meaning that the polynomial functions extend over all the points in the domain. Hence, the Zernike and Tchebichef moments characterize the global shape features. On the other hand, the Krawtchouk polynomials have compact support. The support of the polynomial increases with its order. Therefore, the lower order Krawtchouk moments capture the local features and the higher order moments represent the global characteristics.
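The Krawtchouk definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names (`pochhammer`, `krawtchouk`, `weighted_krawtchouk`, `basis`, `krawtchouk_moments`, `reconstruct`) are hypothetical, the polynomial is evaluated by direct summation of the terminating hypergeometric series (adequate for small grids, not numerically tuned for large N), and the polynomial parameter is taken as N − 1 so that x runs over 0, ..., N − 1.

```python
import math


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1), with (a)_0 = 1."""
    result = 1.0
    for i in range(k):
        result *= a + i
    return result


def krawtchouk(n, x, p, size):
    """Classical Krawtchouk polynomial 2F1(-n, -x; 1 - size; 1/p).

    The series terminates at k = min(n, x) because (-n)_k or (-x)_k
    vanishes for larger k.
    """
    top = size - 1
    return sum(
        pochhammer(-n, k) * pochhammer(-x, k)
        / (pochhammer(-top, k) * math.factorial(k) * p ** k)
        for k in range(min(n, x) + 1)
    )


def weighted_krawtchouk(n, x, p, size):
    """Weighted polynomial K_n(x; p) * sqrt(w(x; p) / rho(n; p))."""
    top = size - 1
    weight = math.comb(top, x) * p ** x * (1 - p) ** (top - x)
    # rho(n; p) = (-1)^n ((1-p)/p)^n n! / (1-size)_n simplifies to this form.
    rho = ((1 - p) / p) ** n / math.comb(top, n)
    return krawtchouk(n, x, p, size) * math.sqrt(weight / rho)


def basis(p, size):
    """basis[n][x] holds the n-th weighted polynomial sampled at x."""
    return [[weighted_krawtchouk(n, x, p, size) for x in range(size)]
            for n in range(size)]


def krawtchouk_moments(image, p1=0.5, p2=0.5):
    """Moments Q[n][m] of a square image indexed as image[x][y]."""
    size = len(image)
    kx, ky = basis(p1, size), basis(p2, size)
    return [[sum(kx[n][x] * ky[m][y] * image[x][y]
                 for x in range(size) for y in range(size))
             for m in range(size)] for n in range(size)]


def reconstruct(moments, p1=0.5, p2=0.5):
    """Inverse transform: rebuild the image from the full moment set."""
    size = len(moments)
    kx, ky = basis(p1, size), basis(p2, size)
    return [[sum(moments[n][m] * kx[n][x] * ky[m][y]
                 for n in range(size) for m in range(size))
             for y in range(size)] for x in range(size)]
```

Because the weighted polynomials are orthonormal on the grid, reconstructing from the full set of N × N moments returns the original image up to rounding. Truncating the sums in `reconstruct` to low orders gives the partial reconstructions used to compare the moment families.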
Thus, the Krawtchouk moments exhibit better localization.

The moments obtained in (1), (3), (7) and (10) are used as features for gesture representation.

III. GESTURE RECOGNITION SYSTEM

The gesture recognition system consists of four modules, as shown in the block diagram in Fig. 2. The functions of these modules are summarized as follows.

A. Hand detection and segmentation

The first step in processing is to extract the hand from the image background. Teng et al. [11] have given a simple and effective method to detect skin color pixels by combining the features obtained from the YCbCr and YIQ color spaces. Hence, the hand regions are detected using the skin color pixels. The resultant binary image is subjected to connected component analysis followed by the morphological closing operation to obtain the segmented hand image.

B. Normalization

The binary hand images are normalized for orientation changes and scale variations. The image is aligned such that the major axis lying along the forearm region is at 90° with respect to the horizontal axis of the image. After rotation correction, the forearm region is removed through morphological processing.
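The source does not say how the major axis is estimated. One common choice, assumed here purely for illustration, is to take the principal axis from the second-order central geometric moments of the silhouette and rotate about the centroid until that axis is vertical. The sketch below works on the list of foreground pixel coordinates rather than resampling an image, and the names `orientation` and `align_vertical` are hypothetical.

```python
import math


def orientation(points):
    """Return (angle, centroid) of the major axis of a set of (x, y) points.

    The angle, in radians from the horizontal axis, is derived from the
    second-order central moments mu20, mu02 and mu11.
    """
    count = len(points)
    cx = sum(x for x, _ in points) / count
    cy = sum(y for _, y in points) / count
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02), (cx, cy)


def align_vertical(points):
    """Rotate the points about their centroid so the major axis is at 90 degrees."""
    theta, (cx, cy) = orientation(points)
    phi = math.pi / 2 - theta  # rotation still needed to reach 90 degrees
    c, s = math.cos(phi), math.sin(phi)
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in points]


# Hypothetical example: a two-pixel-wide diagonal bar.
bar = [(x, x) for x in range(10)] + [(x + 1, x) for x in range(10)]
aligned = align_vertical(bar)
```

The principal axis is only defined up to 180°, so a complete system needs an extra cue (for example, which end holds the forearm) to decide whether the hand points up or down. That step is not covered by this sketch.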
The resolution of the resultant image is fixed at 104 × 104 with the hand object normalized to 64 × 64.

C. Moment feature extraction

The moments computed from the normalized hand gesture image form the feature vectors. The orders of the orthogonal moments are selected experimentally based on their accuracy in reconstruction. The order of the geometric moments is chosen based on the recognition performance.

D. Classification

Classification is done using the minimum distance classifier defined as follows:

$$d_t(z_s, z_t) = \sum_{j=1}^{T} \left(z_s(j) - z_t(j)\right)^2, \qquad R = \arg\min_{t} \, d_t(z_s, z_t)$$

where $R$ is the index of signs in the trained set, $z_s$ is the feature vector of the test image, $z_t$ is the feature vector of the target image and $T$ is the length of the feature vectors.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The gesture data are captured using an RGB Frontech e-cam of resolution 1280 × 960 connected to an Intel Core 2 Duo processor with 2 GB of RAM. The images are collected against a non-uniform background. The background is restricted such that the hand is the largest object in the field of view (FOV).

Two sets of gesture data are acquired for experimentation. The first dataset consists of gestures collected from a perfect view point, which means the angle between the line of focus and the axis of the object is 90° [12]. The second dataset consists of gestures taken at varying view points.
The database for testing is collected in real time under a controlled environment.

The data consist of 1,240 images collected from 23 users. There are 10 gesture signs with 124 samples for each gesture sign. The gesture signs are shown in Fig. 3. The images are collected under three different scales with random orientations and the view angles at 45°, 90°, 45° and 135°.

The data contain 690 gestures taken at 90° and the remaining 550 at varying view angles. We refer to the data taken at 90° as Dataset 1 and the remaining data as Dataset 2. Thus, Dataset 1 accounts only for the rotation and scale changes and not the shape profile. Dataset 2 consists of gestures that include all three variations (orientation, scale and view angle). Therefore, the images undergo perspective distortion that occurs because of the viewing angles [12].

The following experiments are performed to study and compare the adequacy of the geometric and the orthogonal moments for robust gesture classification.

A. Performance of orthogonal moments in gesture representation

The binary image in Fig. 4(a) is approximated using different orthogonal polynomials. The representation ability of the moments is compared on the basis of accuracy in reconstruction. The image reconstructed from the moments is binarised through thresholding. The dissimilarity between the original and the reconstructed image is measured using the mean square error (MSE) and the structural similarity (SSIM) index. The MSE is sensitive to small imperfections in the reconstructed image caused by thresholding.
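The thresholding and MSE steps can be illustrated with a short sketch. The threshold value of 0.5 and the function names `binarise` and `mean_square_error` are assumptions made for this example, since the source does not state how the threshold is chosen.

```python
def binarise(image, threshold=0.5):
    """Map a real-valued reconstruction to a binary image (assumed threshold)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


def mean_square_error(a, b):
    """Mean of the squared pixel differences between two equal-sized images."""
    count = sum(len(row) for row in a)
    return sum((p - q) ** 2
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b)) / count


# Hypothetical 4 x 4 example: one pixel (value 0.45) falls just below the threshold.
original = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
reconstructed = [[0.10, -0.05, 0.02, 0.00],
                 [0.20, 0.90, 0.45, 0.10],
                 [0.00, 1.10, 0.80, 0.05],
                 [0.03, 0.10, 0.00, -0.10]]
error = mean_square_error(original, binarise(reconstructed))  # 1 of 16 pixels differs
```

For binary images the MSE is simply the fraction of mismatched pixels, so a single value that lands on the wrong side of the threshold changes the score. This is the sensitivity described above.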
However, the SSIM index is insensitive to such deviations and hence corroborates the MSE values in terms of the geometric closeness.

The SSIM index between the images $f$ and $\hat{f}$ is computed locally by dividing the images into $L$ blocks of size 11 × 11. For $l \in \{1, 2, \ldots, L\}$, the SSIM between the $l$th block of $f$ and $\hat{f}$ is evaluated as [13]

$$\mathrm{SSIM}(f_l, \hat{f}_l) = \frac{(2 \mu_f \mu_{\hat{f}} + c_1)(2 \sigma_{f\hat{f}} + c_2)}{(\mu_f^2 + \mu_{\hat{f}}^2 + c_1)(\sigma_f^2 + \sigma_{\hat{f}}^2 + c_2)}$$

where $\mu_f$ and $\mu_{\hat{f}}$ denote the mean intensities, $\sigma_f^2$ and $\sigma_{\hat{f}}^2$ denote the variances and $\sigma_{f\hat{f}}$ denotes the covariance. The constants $c_1$ and $c_2$ are chosen as 0.01 and 0.03 respectively. The average of the locally computed values gives the SSIM index representing the overall image quality. The value of the SSIM index lies in $[-1, 1]$ and a larger value means high similarity between the compared images.

Fig. 4(b) shows the reconstructed images obtained from the Zernike, Tchebichef and Krawtchouk moments. The comparative plots of MSE and SSIM index for a varying number of moments are shown in Fig. 4(c) and 4(d) respectively. From the results, it is evident that the images reconstructed using the Zernike moments are not well defined. Hence, their performance based on the values of MSE and SSIM index is inferior to that of the other two orthogonal moments. Also, for higher orders the Zernike moments are numerically unstable and the reconstruction error increases. The images reconstructed from the Tchebichef and Krawtchouk moments closely approximate the original even for the lower orders. As the order increases, the reconstruction error for the Tchebichef moments decreases and its approximation is close to the performance of the Krawtchouk moments. It is noted that the edges