基于片段和DTW的模式识别外文翻译@中英文翻译@外文文献翻译_第1页
基于片段和DTW的模式识别外文翻译@中英文翻译@外文文献翻译_第2页
基于片段和DTW的模式识别外文翻译@中英文翻译@外文文献翻译_第3页
基于片段和DTW的模式识别外文翻译@中英文翻译@外文文献翻译_第4页
基于片段和DTW的模式识别外文翻译@中英文翻译@外文文献翻译_第5页
已阅读5页,还剩18页未读 继续免费阅读

付费下载

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

基于片段和 DTW的模式识别 摘要 本文专门对动态过程的状态进行评估。过程状态和异常将从被测量过程变量的模式中得到,利用这些模式的正确反映和分类,可以对一种确切的运行状态进行识别。然而相同状态的不同模式有着不同的时间持续或者大小,这篇论文中将提到一种动态时间归正算法( DTW),通过相似性匹配法进行不同模式的比较和分类。这个算法的主要改进在于利用了片段的方法对模式变量的性质进行反映。 介绍 在动态过程的状态评估中对被测量动态信号的解释是一项最重要的工作,即对错误的检测和修正。因此,拥有处理信号的工具是十分重要的 ,性质的反映期望能够代表被监测信号的趋势(倾向、震动度、警示、短暂度 .),特别是在错误的检测和修正中。根据有关过程和行为的知识,一些技术可以用于这个目的。 利用片段的方法反映信号是其中一种技术。在这种情况下,一系列的片段被用于描述表征特定变化状况的模式中,然后,问题转化为获得能够表征这些模式的分类机制。这篇文章将描述用于这种模式识别技术的一种工具。 论文将按照以下线索进行组织。如下部分讨论用以类似时间级数的方法,然后介绍动态时间归整算法和有关片段的基本概念。最后,提及 DTW 的一项新进展并在一个诊断应用例 子中进行检查。 时间级数比较 在许多应用中时间级数比较的研究已经大量展开,下一步,将观察距离类似的一些模型。 Agrawal et al. (1995b)提出形状定义语言 SDL,用于取回包含在基于形状的历史中的信息。 SDL在它可以进行频率比较的改进的性质描述中永许改变原始的数据。 在 Agrawal et al. (1995a)中推出了另一种相似模型,基于两个时间序列如果有足够非重叠时间有序的相似子序列则认为它们是相似的。由于这一模型的推出,通过建立一个可索引的数据结构,快速搜索技术被用于检测一组序列中的相似 序列成分。 Faloutsos et al(1994) 或 Chan and Fu (1999)提出了把 Haar微波转变用于时间系列索引问题的其他从收集到的序列中确定有用序列的索引方法。Keogh 和 Pazzani(1998)采用了一种新的表示法,组成 piecewise线性片段去描述形状和包含每个单独线性片段的重量矢量,并永许用户自己定义各种各样的类似量。 (Keogh &Pazzani 2000) 介绍了一种支持索引法的维度伸缩办法。 另外一种有关序列相似的有用量是最长共同序列( LLCS)的长度,基于从一个 序列传到另一个序列的编辑长度。 Paterson 和 Danck, (1994)对一些存在方案进行了修订。 在 Konstantinov and Yoshida 1992中线的组合代表了信号的性质形状。因此,如果两种瞬时状态的 qshapes偶合可以认为它们性质相同。一个真实时间部分的分析程序,从预先确定的时间间隙中提取 qshapes,并把它们与储存了有趣行为的可扩张库进行比较。 在 (Bakshi and Stephanopoulos 1994b)中描述了基于片段的模式识别。每一个模式被一连串元素代表,同时用模 式语法的办法进行定义。包含所有分类信息的特征序列通过与代表这些趋势中的相似事件的明显句法描述的匹配而确定,模式匹配促进了被用语解决需决定树法再次解决的分类问题中的性质和数量的提取。 动态时间归整 通过时间序列数据进行的大部分算法是使用欧几里得距离或者它的一些变化。然而由于它对于时间轴上小的失真非常敏感,欧几里得距离可以形成相似上的不正确量。 一种试图解决这种不便的方法是动态时间归整法( DTW),这种技术是利用动态方程把时间级数与一个特定的模板对齐使累积距离最小。 DTW已经广泛用于消除词识别中因讲话的不同速度 引起的失真。 下面描述 DTW的概念: 设两个长度分别为 M和 N的时间级数 X、 Y: X=x1,x2,.,xi,.,xm Y=y1,y2,.,yj,.,yn (1) 为了对齐两序列, DTW将在 M*N的矩阵中寻找一个具有 K点的 W序列,矩阵中每一个元素( i, j)包含了 Xi和 Xj之间的距离 d( Xi, Yj)。路径 W是为了减少两序列间矩阵元素对齐的距离。 W=w1,w2,.,wk max(m,n)km+n (2) wk=ik,jk (3) ik和 jk分别表示轨道 X和 Y的索引时间,为了寻找最佳路径 Wi,考虑一些关于匹配过程的条件,主要有: 路径端点条件: w, , wk=m,n。 连续性时间匹配路径不可能是逆时的,所以必须满足: ik+ 1ik, jk+1jk。 通过把该点距离 d(xi,yj)与先前单元中距离的最小值之和 D(i,j) 作为累积距离来抽取路径: D(i,j)=d(xi,yj)+minD(i-1,j-1),D(i-1,j),D(i,j-1) ( 1 ) 图 1:形状相同的两个信号, a)由于信号不及时对齐,欧几里得距离将产生一个不良结果。 b) dtw找到一个永许相似量的对 齐。 这项技术进行了许多更改用于在通过线性代表的较高层面上进行操作。 DTW和基于代表的片段的结合 在前面的部分中, DTW由于其在不同经度下的序列对齐能力而被作为确定不同片段的相似性的一种方法。不利的方面其算法计算时间过长和试图通过歪曲X轴来解决 Y轴的可变性可能引起无法对齐。在这一部分中,将介绍可以解决这种缺陷的 DTW算法。 拟采用的解决方案组成上, DTW将用在基于代表的片段上而不是原始的时间级数上。作为片段的序列表征通过减少数据的计算量来减少计算时间。类似的,定义片段的性质特征将回避 Y轴的可变问题。因此, DTW将可用于更长距离的片段对齐中。 唯一的问题是去定义片段间的累积距离。在这种意义中,一个距离的图表被定义,与前部分所描述的 13类片段相一致。累积距离跟性质状况和定义了不同种类片段的辅助特性有关。然而,这些累积距离是以用户的标准为条件的,因此 .这样 DTW算法的一个新的进展( EPDTW)就建立了,利用片段作为信号更高水平的表征。 必须牢记,被比较的序列可以有不同的持续,这个事实使的拟议技术的概括复杂化,在下一个例子中被分析的序列的长度是不同的,尽管不是太不相似。 诊断应用 如应用的例子中,前面提到的 改进已经在一座以诊断为目的的实验室设备中使用了。在这套设备中,容器 A的水位由 PID控制器通过从水库(容器 B)中抽水来控制。 三个阀门( V1, V2, V3)可以通过控制开或关。然后打开或关闭阀门的合适组合的一些行为将发生。表 2描述了有关情形。 系统力学可以通过利用外部水填满或者清空水库做稍微改变。再说外部水的输入或输出也是控制所感兴趣的部分。试验在假设两种情况不互搭的基础上已经被改进了。这样,阀门配置方面的改变只有过程是稳态时才被实施。被监控信号的容器 A的水位和控制信号。 监测系统可以检测这些情形并且根 据片段序列描述的被测量信号的行为源诊断。监测系统周期性地获得并作为根据目前描述片段序列的表征。这些序列通过 EPDTW法与其他著名模式进行比较来完成发现和诊断情况的目的。 执行例子 这部分中所讲的例子与表格 2中所描述的三个阀门的操作是相一致的。首先,操作阀们模拟失灵,接着再操作阀们使回到正常操作状态。前面提到的三种模式( R1、 R2和 R3)已经得到分别去代表每一个不正常状态,每一种(图 5-7)由两个被监测的信号和它在事件中的表征所组成。 然后,三种测试模板 T1、 T2和 T3(图 8-10)与相同的情 况相一致,但是拥有不同的起点用以与前面提到的模式相比较,从而诊断状态。 首先,每一模式的电平和控制信号在使信号正常化之后已经和一种古典的DTW算法进行了比较,得到的结论在表 3和 4中给出。然后,测试模板的序列与前面模板的已知序列用 EPDTW算法进行比较。表 5和 6给出了电平的控制信号的比较结果。在所有情形下,所获得的有用结果是一个正常距离,因此, 0代表完全匹配。 最后,获得两信号距离主要目的是为了得到每一种状况和不同情况( DTW和 EPDTW)下模式之间是本地距离,这种类似评价的结果,在表 7和 8中给出。 可以看出利用表 8比表 7更容易分离出正确或错误的诊断。另一个要考虑的是处理时间。在这些例子中,利用 DTW算法在 AMD K2处理器中的最大执行时间是 5.34秒,而利用 EPDTW算法是 0.3seg。 总结 这项工作表明利用片段法进行信号的性质表征和用于诊断领域模式识别的DTW算法的结合是可能的。 既然属于相同状态的不同模式可以有不同的时间持续或重要性, DTW算法的一种改进提出来,用以比较和分类相同模式,利用相似匹配法。这样,由 DTW实现的受时间限制的对齐优势就加到了利用片段作为信号表征的优势 中,从水位控制系统的例子中可以看出,控制状况的正确识别可以从当前模式和前面已知模式的比较中得出。 Pattern recognition based on episodes and DTW Abstract This work is oriented towards situation assessment of dynamic processes. Process conditions and abnormalities can be detected from patterns of measured process variables. Then, a correct representation and classification of these patterns allows identifying a particular class of operating situation. Nevertheless, different patterns belonging to the same class of situations could have different time duration or magnitudes. In this paper a modification of Dynamic Time warping (DTW) algorithm is presented in order to compare and classify patterns by means of a measure of similarity. The main improvement introduced in this algorithm is the use of qualitative representation of process variables by means of episodes. Introduction Interpretation of measured process signals is an important task in Situation Assessment of dynamic processes, namely for Fault Detection and Diagnosis. For this reason, it is necessary to have tools for dealing with signal coming from processes. Qualitative representations are proposed to represent trends of signals (tendencies, oscillation degrees,alarms, degree of transient states.) needed in supervision,especially in fault detection and diagnosis. According takenowledge about process and its behaviour several techniques could be used with this aim. One of these techniques is the representation of signals by means of episodes. In this case, series of episodes are used to describe patterns that identify particular classes of operating situation. Then the problem is to obtain a classification mechanism of these patterns in order to identify the state of the process. In this paper a description of the tools used in this pattern recognition methodology is shown. The paper is organized as follows. In the following section, similarity methods applied to time series are discussed. Then Dynamic Time Warping (DTW) is introduced and basic concepts related to episodes are presented. Finally, a new approach of DTW is proposed and tested in a diagnosis application example. Comparing Time Series. There are numerous studies that have been carried to compare time series of data in several applications. Next, some models of distance-similarity are observed. Agrawal et al. (1995b) present a shape definition language (SDL) for retrieving objects contained in the histories based on shapes. SDL allows converting original data in a qualitative description of its evolution that allows a comparison between sequences. In Agrawal et al. (1995a) another model of similarity is introduced, it is based on the notion that two time sequences are said to be similar if they have enough nonoverlapping time-ordered pairs of subsequences that are similar. Given this similarity model, fast search techniques are used for discovering all similar sequences in a set of sequences by creating a indexable data structure. Other indexing methods to locate subsequences within a collection of sequences are presented by Faloutsos et al (1994) or Chan and Fu (1999) where a Haar wavelet transformation is used for the time series indexing problem. A new representation, adopted by Keogh and Pazzani (1998), consists of piecewise linear segments to represent shape and a weight vector containing the relative importance of each individual linear segment, allowing the user to define a variety of similarity measures. (Keogh & Pazzani 2000) introduce a dimensionality reduction technique that supports an indexing algorithm. A useful measure of similarity for strings is the length of a longest common subsequence (LLCS), based on the edit distance required in passing from one string to another one. Paterson and Danck, (1994) carry out a revision of some existing solutions. In (Konstantinov and Yoshida 1992) the qualitative shape of a signal is represented by the combination of strings. Hence, two temporal shapes are considered qualitatively equivalent if their qshapes coincide. A real time analyzing procedure extracts qshapes over a predefined time interval and compares them with those of an expandable shape library that stores all interesting behaviours. A methodology for pattern recognition based on episodes is described in (Bakshi and Stephanopoulos 1994b). Each pattern is represented by a string of primitives, also identified by means of a pattern grammar. The string that captures all the features necessary for classification is determined by matching the distinct syntactic descriptions, which represent similar events in these trends. Pattern matching facilitates extraction of qualitative and quantitative features used for solving the classification problem resolved by means of decision trees. Dynamic Time Warping. Most of algorithms that operate with time series of data use the Euclidean distance or some variation. However,Euclidean distance could produce an incorrect measure of similarity because it is very sensitive to small distortions in the time axis. A method that tries to solve this inconvenience is Dynamic Time Warping (DTW), this technique uses dynamic programming (Sakoe and Chiba, 1978; Silverman,1990) to align time series with a given template so that the total distance measure in minimised (Fig. 1). DTW has been widely used in word recognition to compensate the temporal distortions related to different speeds of speech. Next, a brief notion of DTW is described. Given two time series X and Y, of length m and n respectively X=x1,x2,.,xi,.,xm Y=y1,y2,.,yj,.,yn (1) To align the two sequences, DTW will find a sequence W of k points on a m-by-n matrix where every element (i,j) of the matrix contains the distance d(xi,yj) between the points xi and yj. The path W is a contiguous set of matrix elements that minimise the distance between the two sequences. W=w1,w2,.,wk max(m,n)km+n (2) wk=ik,jk (3) where ik and jk denote the time index of trajectories X and Y respectively. In order to find the best path W, some constraints on the matching process are considered, main ones are: Constraints at the endpoints of the path, w1=1,1 and wk=m,n Continuity constraints, matching paths cannot go backwards in time, this is achieved forcing ik+1ik and jk+1jk. The path is extracted by evaluating the cumulative distance D(i,j) as the sum of the local distance d(xi,yj) in the current cell and the minimum of the cumulative distances in the previous cells. This can be expressed as: D(i,j)=d(xi,yj)+minD(i-1,j-1),D(i-1,j),D(i,j-1) ( 1 ) Several modifications of this technique have been introduced in order to apply the method in several situations. In (Keogh and Pazzani 1999) a modification of DTW is introduced to operate on a higher level of data abstraction through a piecewise linear representation. (Keogh and Pazzani 2001) consider a higher level feature of shape considering the first derivative of the sequences. Caiani et al. (1998) adapt the DTW approach to the analysis of the left ventricular volume signal for an optimal temporal alignment between pairs of cardiac cycles. (Vullings et al. 1998) implement a piecewise linear approximation and segment the signal into separate heartbeats. DTW also is used in (Kassidas et al. 1998) to synchronise batch process trajectories in order to reconcile timing differences among them. Combining DTW and Episodes based Representations In previous section, DTW has been shown as a good method to determine the similarity between two sequences of episodes due to its capacity to align sequences with different longitudes. As disadvantages, it is a computationally expensive algorithm and it could fail in the alignment by trying to solve the variability in the Y-axis by warping the X-axis. In this section, a modification of the DTW algorithm that allows solving these inconveniences is introduced. The proposed solution consists on apply DTW not in original time series but in its episodes based representations. The representation of a sequence as episodes reduces the calculation time by decreasing the amount of manipulated data. Likewise, the qualitative character that defines an episode avoids the problem of the variability in the Y-axis. Therefore DTW can be used to align episodes to obtain a global distance. The only problem is to define a local distance between episodes. In this sense, a chart of distances has been defined where the 13 types of episodes described in the previous section are related. Distances are based on the qualitative state and auxiliary characteristics that define the different types of episodes (Table 1). However, these local distances could be subject to the criterion of the user, so one could give more importance to some episodes concerning another obtaining a different global distance and preserving the essential features of the process signal. This way, a new approach (EpDTW) of the DTW algorithm is created using episodes as a higher level representation of the signal. It is necessary to keep in mind that compared sequences could have different duration. This fact complicates the generalisation of the proposed technique, in the next example the length of the analysed sequences is different although not too dissimilar. A Diagnosis Application As application example, the proposed approach has been used in a laboratory plant for diagnosis purposes. In this plant (See Fig. 4), level in tank A is controlled by means of a PID controller by pumping water from a reservoir (tankB). Three valves (V1,V2 and V3) can be handled in order to simulate obstructions and leakages. Then several behaviours are possible by appropriate combination of opening and closing valves. Table 2 represents these situations. Additionally, system dynamics can by slightly modified by filling or emptying the reservoir with external water. Then, input and output of external water are also interesting situations to detect. The experiments have been developed under the assumption that two situations can not be overlapped. Thus, changes in the configuration of valves are only performed when process is in steady state. Monitored signals are the level in tank A and the control signal (pump). The monitoring system will be able to detect such situations and diagnose about the origin of misbehaviours according to the behaviour of measured signals described by sequences of episodes. The monitoring system acquires data periodically and represents them as sequences of episodes according to the previous description. These sequences are compared by means of EpDTW with other well-known patterns with the purpose of detecting and diagnose the situation. Execution Example The example described in this section corresponds to the manipulation of the three valves as described in Table 2. First the valves are manipulated in order to simulate the failure and later are manipulated again in order to return to the normal operation. Three reference patterns (R1,R2 and R3) have been obtained to represent each abnormal situation, each one (Fig. 5-7) is composed by the two monitored signals (level in tank A and control) and its representation in episodes. Then, three test patterns T1, T2, and T3 (Fig. 8-10)corresponding to the same situations but with different setpoints are used in order to compare them with the referencepatterns and diagnose the situation. First, the level and control signals of each pattern have been compared with a classical DTW algorithm after normalising the signals. The obtained results are shown in the Table 3 and Table 4. Then, the sequences of the test patterns are compared by means of EpDTW with the wellknown seque

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论