DTW_6

Géry Casiez http://www.lifl.fr/~casiez VisA - Master 2 spécialité IVI – Université de Lille 1 DYNAMIC TIME WARPING: DEFORMATION TEMPORELLE DYNAMIQUE 1 Multi-Dimensional Dynamic Time Warping for Gesture Recognition G.A. ten Holt a,b M.J.T. Reinders a E.A. Hendriks a a Information and Communication Theory Group Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands b Human Information Communication Design Delft University of Technology, Landbergstraat 8, 2628 CC, Delft, The Netherlands [email protected] [email protected] [email protected] Keywords: Dynamic Time Warping, pattern recognition, multi-dimensional time series Abstract We present an algorithm for Dynamic Time Warp- ing (DTW) on multi-dimensional time series (MD- DTW). The algorithm utilises all dimensions to find the best synchronisation. It is compared to ordi- nary DTW, where a single dimension is used for aligning the series. Both one-dimensional and multi- dimensional DTW are also tested when derivatives instead of feature values are used for calculating the warp. MD-DTW performed best in finding a known ground truth under noisy conditions. The algorithms were also used to perform simple classification of a set of 121 gestures. MD-DTW performed as well as or better than any single dimension in all tasks. In general, DTW on feature derivatives gave better results than DTW on feature values. 1 Introduction There are various problem areas where signals need to be synchronised. When two time signals are compared, or when a pattern is sought in a larger stream of data, one of the signals may be warped in a non-linear way by shrinking or expand- ing along its time axis. Simple point-to-point comparison then gives unrealistic results, because one might be comparing different relative parts of the same signal/pattern. In these cases, some sort of synchronisation is needed. Figure 1 illustrates the difference between point-to-point comparison and comparison aided by synchronisation. Dynamic Time Warping (DTW) [5] has long been used to find the optimal alignment of two signals. The DTW algorithm calculates the distance between each possible pair of points out of two signals in terms of Figure 1: The importance of synchronisation. If signals are simply compared at each time instance (dot- ted arrow), they may be in different relative phases, giving unrealistic differences. Synchronisation en- sures proper comparison (solid arrow). their associated feature values. It uses these distances to calculate a cumulative distance matrix and finds the least expensive path through this matrix. This path represents the ideal warp - the synchronisation of the two signals which causes the feature distance between their synchronised points to be minimised. Usually, the signals are normalised and smoothed before the distances between points are calculated. DTW has been used in various fields, such as speech recognition [7], data mining [3], and move- ment recognition [1, 2]. Previous work in the field of Références ! C. S. Myers and L. R. Rabiner. A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60(7):1389-1409, September 1981. ! Darrell T.J., and Pentland, A. P., “Recognition of Space-Time Gestures using a Distributed Representation”, (1992) MIT Media Laboratory Perceptual Computing Group TR-197. 2 Introduction ! Exemple des gestes ! Invariance en position ! Calcul de centroïde ! changement de repère ! Invariance en rotation 3 Jacob O. Wobbrock, Andrew D. Wilson, and Yang Li. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th annual ACM symposium on User interface software and technology (UIST '07). ACM, New York, NY, USA, 159-168. DOI=10.1145/1294211.1294238 http://doi.acm.org/ 10.1145/1294211.1294238 Introduction ! Invariance à l’échelle ! Calcul de boite englobante ! Invariance à la fréquence d’échantillonnage, vitesse d’exécution, variations de vitesse en cours d’exécution ! Ré-échantillonnage en N points équidistants (en distance euclidienne) par interpolation linéaire ! L’interpolation linéaire n’est pas réalisable pour n’importe quel type de signal (ex: son) ! Comment reconnaître des gestes sans explicitement spécifier le début et la fin? 4 Introduction ! Comment mettre en correspondance deux signaux en étant insensible aux variations de rythme? ! Comment gérer les différences en nombre d’échantillons de deux signaux? 5 Multi-Dimensional Dynamic Time Warping for Gesture Recognition G.A. ten Holt a,b M.J.T. Reinders a E.A. Hendriks a a Information and Communication Theory Group Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands b Human Information Communication Design Delft University of Technology, Landbergstraat 8, 2628 CC, Delft, The Netherlands [email protected] [email protected] [email protected] Keywords: Dynamic Time Warping, pattern recognition, multi-dimensional time series Abstract We present an algorithm for Dynamic Time Warp- ing (DTW) on multi-dimensional time series (MD- DTW). The algorithm utilises all dimensions to find the best synchronisation. It is compared to ordi- nary DTW, where a single dimension is used for aligning the series. Both one-dimensional and multi- dimensional DTW are also tested when derivatives instead of feature values are used for calculating the warp. MD-DTW performed best in finding a known ground truth under noisy conditions. The algorithms were also used to perform simple classification of a set of 121 gestures. MD-DTW performed as well as or better than any single dimension in all tasks. In general, DTW on feature derivatives gave better results than DTW on feature values. 1 Introduction There are various problem areas where signals need to be synchronised. When two time signals are compared, or when a pattern is sought in a larger stream of data, one of the signals may be warped in a non-linear way by shrinking or expand- ing along its time axis. Simple point-to-point comparison then gives unrealistic results, because one might be comparing different relative parts of the same signal/pattern. In these cases, some sort of synchronisation is needed. Figure 1 illustrates the difference between point-to-point comparison and comparison aided by synchronisation. Dynamic Time Warping (DTW) [5] has long been used to find the optimal alignment of two signals. The DTW algorithm calculates the distance between each possible pair of points out of two signals in terms of Figure 1: The importance of synchronisation. If signals are simply compared at each time instance (dot- ted arrow), they may be in different relative phases, giving unrealistic differences. Synchronisation en- sures proper comparison (solid arrow). their associated feature values. It uses these distances to calculate a cumulative distance matrix and finds the least expensive path through this matrix. This path represents the ideal warp - the synchronisation of the two signals which causes the feature distance between their synchronised points to be minimised. Usually, the signals are normalised and smoothed before the distances between points are calculated. DTW has been used in various fields, such as speech recognition [7], data mining [3], and move- ment recognition [1, 2]. Previous work in the field of Dynamic Time Warping ! DTW ! Déformation temporelle dynamique ! Déterminer pour chaque élément d’une séquence, le meilleur élément correspondant dans l’autre séquence relativement à un certain voisinage et à une certaine métrique ! Complexité polynomiale 6 Applications ! Vidéo, audio, graphique... ! Toutes données qui peuvent être transformées en représentation linéaire en fonction du temps (séries temporelles) ! Echantillons ordonnés par une étiquette de temps ! Reconnaissance vocale ! Reconnaissance de gestes off-line et on-line ! alignement de protéines... 7 Principe de base ! Séquence de référence R = [r 1 , r 2 , ..., r n ] ! Séquence de test T = [t 1 , t 2 , ..., t m ] ! Si m=n alors on peut calculer la distance entre les deux signaux de la façon suivante (pas forcément idéal): ! Possibilité de calculer la distance euclidienne ou d’utiliser une autre métrique (fonction qui donne un réel) ! Plus possible d’utiliser cette méthode dès que n ! m ! Attention: les échantillons des séquences doivent être équidistants en temps 8 D = n i=1 distance(r i , t i ) (1) 1 Principe de base ! DTW réalise d’abord un alignement non linéaire en recherchant parmi tous les alignements possibles, celui qui minimise une fonction de coût cumulé ! «Time Warping»: Dilation ou compression des séquence pour obtenir le meilleur alignement possible 9 Principe de base ! Détermination du chemin W = [w 1 , w 2 , ..., w k ] de longueur minimale 10 D = n i=1 distance(r i , t i ) (1) k i=1 distance(w i ) (2) 1 Principe de base ! Conditions aux frontières ! w 1 = (r 1 , t 1 ) ! w k = (r n , t m ) ! Contraintes locales ! Monotonicité pour respecter le séquencement des points ! Eviter les sauts dans le temps ! Pour tout couple (r i , t j ), le choix des prédécesseurs est limité à " (r i-1 ,t j ), (r i ,t j-1 ), (r i-1 ,t j-1 ) ! Exhaustivité ! Chaque élément de R doit être mis en relation avec au moins un élément de T et vice-versa ! max(m, n) " k " m + n - 1 11 Principe de base 12 Principe de base ! Programmation dynamique ! Fonction d’optimisation: ! Soit D(i,j) la longueur du chemin entre (r 1 , t 1 ) et (r i , t j ) ! Récursion: ! D(i,j) = dist(r i ,t j ) + min(D(i-1,j), D(i-1,j-1), D(i, j-1)) ! condition initiale: D(1,1) = dist(r 0 ,t 0 ) ! Distance minimale entre les deux séquences ! D(n,m) 13 Mise en application ! Construction d’une matrice D de dimensions n x m dans laquelle chaque élément (i,j) contient D(i,j) ! Remplissage de D(1,1) avec la condition initiale ! Utilisation de la formule récursive pour remplir la matrice ligne par ligne ou colonne par colonne ! Cas particuliers première ligne et première colonne ! Distance minimale donnée par l’élément D(n,m) ! La distance minimale peut être normalisée par la longueur du chemin 14 Algorithme 15 i i Chemin de déformation 16 Chemin de déformation ! A chaque calcul de D(i,j), sauvegarde du prédécesseur qui minimise la distance ! Parcours des prédécesseurs en partant de D(n,m) 17 Complexité ! Complexité: O(m*n) ! Optimisation en limitant la région de recherche 18 Optimisation ! La zone supérieure gauche et la zone inférieure droite ne sont pas calculées ! Les distances locales associées sont mises à une valeur très élevée afin que le chemin n’y passe pas 19 Optimisation 20 Optimisation ! Mise à l’échelle ! Réduire les tailles de R et T 21 Singularités ! Une singularité apparaît quand un point d’une séquence est associé à de nombreux points de l’autre séquence sans raison valable 22 2 Although DTW has been successfully used in many domains, it can produce pathological results. The crucial observation is that the algorithm may try to explain variability in the Y-axis by warping the X-axis. This can lead to unintuitive alignments where a single point on one time series maps onto a large subsection of another time series. We call examples of this undesirable behavior "singularities". A variety of ad-hoc measures have been proposed to deal with singularities. All of these approaches essentially constrain the possible warpings allowed. However they suffer from the drawback that they may prevent the "correct" warping from being found. In simulated cases, the correct warping can be known by warping a time series and attempting to recover the original (see Section 4). In naturally occurring cases we take "correct" to mean intuitively obvious "feature to feature" alignment as in Figure 2.B. An additional problem with DTW is that the algorithm may fail to find obvious, natural alignments in two sequences simply because a feature (i.e peak, valley, inflection point, plateau etc.) in one sequence is slightly higher or lower than its corresponding feature in the other sequence. Figure 2 illustrates this problem. Figure 2 : A) Two synthetic signals (with the same mean and variance). B) The natural "feature to feature" alignment. C) The alignment produced by dynamic time warping. Note that DTW failed to align the two central peaks because they are slightly separated in the Y-axis In this paper we address both these problems by introducing a modification of DTW. The crucial difference is in the features we consider when attempting to find the correct warping. Rather than use the raw data, we consider only the (estimated) local derivatives of the data. The rest of the paper is organized as follows. Section 2 contains a review of the classic DTW algorithm, including the various techniques suggested to prevent singularities. In Section 3 we introduce and demonstrate our extension, which we call Derivative Dynamic Time Warping (DDTW). Section 4 contains experimental results, and in Section 5 we offer conclusions and discuss possible directions for future work. 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 30 A B C Singularités: technique de fenêtrage ! Ajout de contraintes dans les alignements possibles ! Technique de fenêtrage ! Donne une borne supérieure à la singularité 23 for j=max(1,i-w); j <= min (m,i+w); j++ do i Singularités: slope weighting ! Pondérer les déplacements suivant les directions: ! D(i,j) = dist(r i ,t j ) + min(X*D(i-1,j), D(i-1,j-1), X*D(i, j-1)) ! X>1 ! Force l’alignement suivant la diagonale 24 Singularités: Step patterns ! D(i,j) = dist(r i ,t j ) + min(D(i-1,j-2), D(i-1,j-1), D(i-2, j-1)) ! Force le déplacement suivant la diagonale 25 5 We could replace equation 5 with !(i,j) = d(i,j) + min{ !(i-1,j-1) , !(i-1,j-2) , !(i- 2,j-1) }, which corresponds with the step-pattern show in Figure 4.B. Using this equation the warping path is forced to move one diagonal step for each step parallel to an axis. Dozens of different step-patterns have been considered, Rabiner and Juang (1993) contains a review. All the above may help to mitigate the problem of singularities, but at the risk of missing the correct warping. Additionally, it is not obvious how to chose the various parameters (R for Windowing and X for Slope Weighting) or Step-Pattern. 3 Derivative dynamic time warping If DTW attempts to align two sequences that are similar except for local accelerations and decelerations in the time axis, the algorithm is likely to be successful. The algorithm has problems when the two sequences also differ in the Y-axis. Global differences, affecting the entire sequences, such as different means (offset translation), different scalings (amplitude scaling) or linear trends can be efficiently removed (Keogh and Pazzani 1998, Agrawal et. al. 1995). However the two series may also have local differences in the Y-axis, for example a valley in one sequence may be deeper that the corresponding valley in the other time series. Consider Figure 5 as an example. Two identical sequences will clearly produce a one to one alignment. But if we slightly change a local feature, in this case the depth of a valley, DTW attempts to explain the difference in terms of the time-axis and produces two singularities. Figure 5: Using DTW, two identical sequences (a) will clearly produce a one to one alignment (b). However, if we slightly change a local feature, in this case the depth of a valley (c), DTW attempts to explain the difference in terms of the time-axis and produces two singularities (d). B A Figure 4: A pictorial representation of two alternative step-patterns: A) The pattern corresponding to !(i,j) = d(i,j) + min{ !(i-1,j-1) , !(i-1,j ) , !(i,j-1) } B) The pattern corresponding to !(i,j) = d(i,j) + min{ !(i-1,j-1) , !(i-1,j-2) , !(i-2,j-1) } a b d c Singularités ! Problème: ces contraintes peuvent empêcher d’obtenir l’alignement correct ! Comment définir les différents paramètres des techniques de levée des singularités (taille de fenêtre, coefficients de pondération, motifs)? 26 DDTW: Derivative Dynamic Time Warping 27 5 We could replace equation 5 with !(i,j) = d(i,j) + min{ !(i-1,j-1) , !(i-1,j-2) , !(i- 2,j-1) }, which corresponds with the step-pattern show in Figure 4.B. Using this equation the warping path is forced to move one diagonal step for each step parallel to an axis. Dozens of different step-patterns have been considered, Rabiner and Juang (1993) contains a review. All the above may help to mitigate the problem of singularities, but at the risk of missing the correct warping. Additionally, it is not obvious how to chose the various parameters (R for Windowing and X for Slope Weighting) or Step-Pattern. 3 Derivative dynamic time warping If DTW attempts to align two sequences that are similar except for local accelerations and decelerations in the time axis, the algorithm is likely to be successful. The algorithm has problems when the two sequences also differ in the Y-axis. Global differences, affecting the entire sequences, such as different means (offset translation), different scalings (amplitude scaling) or linear trends can be efficiently removed (Keogh and Pazzani 1998, Agrawal et. al. 1995). However the two series may also have local differences in the Y-axis, for example a valley in one sequence may be deeper that the corresponding valley in the other time series. Consider Figure 5 as an example. Two identical sequences will clearly produce a one to one alignment. But if we slightly change a local feature, in this case the depth of a valley, DTW attempts to explain the difference in terms of the time-axis and produces two singularities. Figure 5: Using DTW, two identical sequences (a) will clearly produce a one to one alignment (b). However, if we slightly change a local feature, in this case the depth of a valley (c), DTW attempts to explain the difference in terms of the time-axis and produces two singularities (d). B A Figure 4: A pictorial representation of two alternative step-patterns: A) The pattern corresponding to !(i,j) = d(i,j) + min{ !(i-1,j-1) , !(i-1,j ) , !(i,j-1) } B) The pattern corresponding to !(i,j) = d(i,j) + min{ !(i-1,j-1) , !(i-1,j-2) , !(i-2,j-1) } a b d c @INPROCEEDINGS{Keogh01derivativedynamic, author = {Eamonn J. Keogh and Michael J. Pazzani}, title = {Derivative Dynamic Time Warping}, booktitle = {In First SIAM International Conference on Data Mining (SDM’2001}, year = {2001} } DDTW: Derivative Dynamic Time Warping ! Utilisation de la dérivée des données des séquences ! Calcul du carré de la différence des dérivées 28 6 The weakness of DTW is in the features it considers. It only considers a datapoints Y- axis value. For example consider two datapoints q i and c j which have identical values, but q i is part of a rising trend and c j is part of a falling trend. DTW considers a mapping between these two points ideal, although intuitively we would prefer not to map a rising trend to a falling trend. To prevent this problem we propose a modification of DTW that does not consider the Y-values of the datapoints, but rather considers the higher level feature of "shape". We obtain information about shape by considering the first derivative of the sequences, and thus call our algorithm Derivative Dynamic Time Warping (DDTW). 3.1 Algorithm details As before we construct an n-by-m matrix where the (i th , j th ) element of the matrix contains the distance d(q i ,c j ) between the two points q i and c j . With DDTW the distance measure d(q i ,c j ) is not Euclidean but rather the square of the difference of the estimated derivatives of q i and c j . While there exist sophisticated methods for estimating derivatives, particularly if one knows something about the underlying model generating the data, we use the following estimate for simplicity and generality. 1 < i < m (6) This estimate is simply the average of the slope of the line through the point in question and its left neighbor, and the slope of the line through the left neighbor and the right neighbor. Empirically this estimate is more robust to outliers than any estimate considering only two datapoints. Note the estimate is not defined for the first and last elements of the sequence. Instead we use the estimates of the second and penultimate elements respectively. For noisy datasets we use exponential smoothing (Mills 1990) before attempting to estimate the derivatives. DDTW’s time complexity is O(mn), which is the same as DTW. There are some added constant factors because of derivative estimating step, but there is no need to remove offset translation, a necessary step for DTW. Empirically the two algorithms take approximately the same time. A variety of optimizations have been proposed for DTW (Myers et. al 1990). We omit discussion of them here for brevity, but note that they apply equally well to DDTW. Figure 6: 1) Two artificial signals. 2) The intuitive feature to feature warping alignment. 3) The alignment produced by classic DTW. 4) The alignment produced by DDTW. 2 ) 2 ) (( ) ( ] [ 1 1 1 ! + ! ! + ! = i i i i x q q q q q D 0 5 10 15 20 25 30 4 1 2 3 6 The weakness of DTW is in the features it considers. It only considers a datapoints Y- axis value. For example consider two datapoints q i and c j which have identical values, but q i is part of a rising trend and c j is part of a falling trend. DTW considers a mapping between these two points ideal, although intuitively we would prefer not to map a rising trend to a falling trend. To prevent this problem we propose a modification of DTW that does not consider the Y-values of the datapoints, but rather considers the higher level feature of "shape". We obtain information about shape by considering the first derivative of the sequences, and thus call our algorithm Derivative Dynamic Time Warping (DDTW). 3.1 Algorithm details As before we construct an n-by-m matrix where the (i th , j th ) element of the matrix contains the distance d(q i ,c j ) between the two points q i and c j . With DDTW the distance measure d(q i ,c j ) is not Euclidean but rather the square of the difference of the estimated derivatives of q i and c j . While there exist sophisticated methods for estimating derivatives, particularly if one knows something about the underlying model generating the data, we use the following estimate for simplicity and generality. 1 < i < m (6) This estimate is simply the average of the slope of the line through the point in question and its left neighbor, and the slope of the line through the left neighbor and the right neighbor. Empirically this estimate is more robust to outliers than any estimate considering only two datapoints. Note the estimate is not defined for the first and last elements of the sequence. Instead we use the estimates of the second and penultimate elements respectively. For noisy datasets we use exponential smoothing (Mills 1990) before attempting to estimate the derivatives. DDTW’s time complexity is O(mn), which is the same as DTW. There are some added constant factors because of derivative estimating step, but there is no need to remove offset translation, a necessary step for DTW. Empirically the two algorithms take approximately the same time. A variety of optimizations have been proposed for DTW (Myers et. al 1990). We omit discussion of them here for brevity, but note that they apply equally well to DDTW. Figure 6: 1) Two artificial signals. 2) The intuitive feature to feature warping alignment. 3) The alignment produced by classic DTW. 4) The alignment produced by DDTW. 2 ) 2 ) (( ) ( ] [ 1 1 1 ! + ! ! + ! = i i i i x q q q q q D 0 5 10 15 20 25 30 4 1 2 3 DDTW: Derivative Dynamic Time Warping 29 8 These results confirm what a casual perusal of Figure 7 tells us. DTW attempts to correct minor differences between the sequences by wild warpings of the time axis. In contrast DDTW, which considers higher levels features is much less inclined to "find" a warping were none exists. Figure 7: Examples of the three experimental datasets used in this paper. Each box in the leftmost column contains two related sequences that do not contain time warping, but do have minor differences in the Y-axis. The warpings "discovered" by the two algorithms discussed in this paper are therefore completely spurious. The two rightmost columns show examples of the warpings returned by both algorithms. Table 1 contains a numerical comparison. 4.2 Finding the correct warping To test the ability of an algorithm to discover the correct warping between two sequences we need sequences for which the correct warping is known. Our approach is to take a sequence Q and to make a copy of it. This copy, Q’ then has a warping randomly inserted into it. We can then use Q and Q’ as input into the two algorithms and compare warpings. To distort sequence Q’ we begin by randomly choosing an anchor point, somewhere on the sequence. We first distort the Y-axis by adding or subtracting a Gaussian bump, centered on the anchor point, to the sequence. We did this for 3 different height bumps 0.0, 0.1 and 0.2 (0.0 corresponding to no bump). Next, we randomly shifted the anchor point 15 time-units left or right. The other datapoints were moved to compensate for this shift by an amount that depended on their inverse squared distance to the anchor point, thus localizing the effect. After the X-axis transformation we interpolated the data back onto the original, equi-spaced X-axis. The net effect of the two transformations is a smooth local distortion of the original sequence, as shown in Figure 8. Space Shuttle Exchange rate EEG DTW DDTW Recherche de motifs 30 Reconnaissance de gestes ! Le geste fait généralement l’objet de variance dans sa reproduction ! Un seul geste exemple n’est souvent pas suffisant ! Définition d’un modèle statistique prenant en compte ces variations ! Utilisation de plusieurs gestes exemples ! Calcul de la moyenne et de la variance en fonction du temps 31 Reconnaissance de gestes ! Chaque exemple est mis en correspondance, en utilisant DTW, avec le geste exemple le plus long avant de calculer les caractéristiques statistiques ! Calcul d’un modèle de geste g pour chaque classe d’exemples ! Calcul de la moyenne et de la variance à chaque pas de temps: 32 D = n i=1 distance(r i , t i ) (1) k i=1 distance(w i ) (2) ˜ g m [t] et σ 2 (g m [t]) (3) 1 Reconnaissance de gestes ! Un geste candidat est aligné avec chaque modèle en utilisant DTW ! A chaque nouvel événement, la séquence de test T est comparée à chaque modèle en utilisant la métrique suivante: 33 D = n i=1 distance(r i , t i ) (1) k i=1 distance(w i ) (2) ˜ g m [t] et σ 2 (g m [t]) (3) D i,j = 1 σ 2 (g m [t]) (˜ g m [t] −T[j]) 2 (4) 1 Reconnaissance de gestes ! L’alignement de la séquence T avec les modèles se fait par retour dans le temps en considérant que le temps présent est le temps 0 ! Relaxation des contraintes en considérant une longueur minimale pour la séquence T ! Le score du geste est l’inverse de la distance calculée par DTW ! Le geste avec le score le plus élevé est considéré comme le geste reconnu 34 Affine invariant DTW ! Problème: identifier dans une séquence un objet qui subit une transformation affine ! Choix de la fonction de coût? 35 t t r v r x r x x ) sin( ) cos( ! ! " " # $ , t t r v r x r v v ) cos( ) sin( ! ! " % # $ , (12) where & & # # # # n i i w r r n i i w r r v n v x n x 1 ) ( , 1 ) ( , 1 , 1 , )} )( ( ) )( ( ¦ ) ( , , 1 ) ( , , r i w r t i t n i r i w r t i t v v v v x x x x c " " " " " " # & # , )} )( ( ) )( ( ¦ ) ( , , 1 ) ( , , r i w r t i t n i r i w r t i t v v x x x x v v d " " % " " " # & # , } ) ( ) ( ¦ 2 , 2 , 1 t i t t i t n i v v x x e " % " # & # . 4. Experiments on Online Rotated Handwriting Recognition Figure 2 Examples oI Rotated Testing Samples To examine the utility oI AI-DTW, we have executed recognition experiments on a set oI rotated online handwriting digit data Irom the UCI Repository oI machine learning databases |10|. The database includes 7,494 training samples and 3,498 writer independent testing samples. II a sample consists oI more than one stroke, we connect them into one stroke Ior simplicity. All the samples are normalized to a size oI 120×90. We sample 20 points with equal space interval along the pen trace oI each sample. Then each oI the testing samples was rotated by an angle randomly generated between |-!/2, !/2|. Some oI the rotated testing samples are shown in Fig. 2. DTW had been proved to be an eIIicient method to calculate distance in online handwriting recognition |3- 5|. Here we select the simple k-Nearest Neighbor (NN) classiIier to compare the recognition perIormance oI two distance metrics: DTW and AI-DTW. As the online samples have been normalized, to be Iair Ior DTW and AI-DTW, we use the rotation and shiIt invariant with Iixed scale rate r÷1 (see Section 3.3). The recognition results are summarized in Table 1. We can Iind that the AI-DTW perIormed much better than the DTW method did, which indicated that the AI- DTW is a more eIIective method to compare online rotated shapes than classical DTW. The main drawback oI using NN classiIier is its low speed. However, one can reduce the computation time greatly by using learned prototypes |4|. Due to page limitation, we have to omit the detailed discussion on this. Table 1 Recognition rates oI DTW and AI-DTW (k is the number oI neighbors used) k 1 3 5 10 DTW 64.00° 64.44° 65.44° 65.87° AI-DTW 95.54° 95.11° 94.65° 94.08° 5.Conclusion This paper proposes the aIIine invariant dynamic time warping method to compare two sequences which may have local shape deIormation and undergo global unknown aIIine transIorm. We Iormulated the transIormation matrix and warping path into a uniIied optimal problem and developed an iterative algorithm to Iind the optimal solution. Experiments on online rotated handwriting indicated that our method can signiIicantly improve the recognition rates oI the DTW. For Iuture work, we are considering 1) to study the regulations oI transIormation matrix and 2) to combine the statistical learning methods with the AI-DTW. References |1| L. Rabiner und B.H. Juang. "Fundamentals oI Speech Recognition". Prentice Hall PTR. 1993 |2| H. Sakoe and S. Chiba, "A Dynamic Programming Algorithm Optimization Ior Spoken Word Recognition," IEEE Trans. on ASSP, 26(1): 43-49. 1978 |3| H. Mitoma, S.i Uchida, and H. Sakoe, "Online character recognition using eigen-deIormations," 9th IWHFR Tokyo, 2004 |4| J. Alon, V. Athitsos, and S. SclaroII. "Online and OIIline Character Recognition Using Alignment to Prototypes," Proc. ICDAR, Korea 2005 |5| C. Bahlmann and H. Burkhardt, "The writer independent online handwriting recognition system Ilog on hand and cluster generative statistical dynamic time warping," IEEE Trans. on PAMI, 26(3):299- 310, 2004 |6| T. Rath and R. Manmatha. "Word image matching using dynamic time warping", Proc. CJPR, vol. 2. pp. 521-527, 2003 |7| Dionisio, C.R.P.; Kim, H.Y."New Ieatures Ior aIIine-invariant shape classiIication," Proc. ICIP, pp:2135 - 2138, 2004 |8| William H P., Saul A T., William T V., Brian P F., "Numerical Recipes in C¹¹: The Art oI ScientiIic. Computing," Cambridge Universitv Press, Cambridge, England, 1992 |9| Dempster, A., Laird, N., and Rubin, D. "Maximum likelihood Irom incom-plete data via the EM algorithm.," Journal of the Roval Statistical Societv, Series B,39(1):138. 1977 |10| Hettich S,. Blake C.L, and Merz C.J. "UCI Repository oI machine learning databases", 1998 |http://www.ics.uci.edu/~mlearn/MLRepository.html| Affine invariant DTW 36 sub-problems can mutually improve one another during the process. And iI the square oI Euclidean distance is used to in the second sub-problem to calculate d(t i , r f ), the convergence oI this method can be analyzed by a similar way oI Expectation- Maximization (EM) algorithm |9|. The details oI our algorithm are as Iollows: Optimization Algorithm of AI-DTW: Initialize The warping path w (1) ÷DTW¸PATH(T, R). Iteration number k÷1 While not convergence k ÷k¹1; Update the transIormation matrix by: ( 1) 2 ( ) ( ) 1 arg min¦ } k n k i w i A i A t r A ! " " ! # . Update the warping path by: w (k) ÷DTW¸PATH(T, RA (k) ). End While Some examples oI the above algorithm are illustrated in Figure 1, and one can Iind that the most error reduction is done in the Iirst iterations and the AI-DTW can usually converge in a Iew iterations. 3.3 Rotation, Scale and Shift Invariant DTW In many 2-dimension shape analysis applications, such as online handwriting, it is oIten necessary to use rotation scale and shiIt invariant (RSS invariant Ior short) method. Rotation scale and shiIt can be seen as special cases oI aIIine transIorm. Here, we discuss this problem separately Ior the Iollowing reasons: (1) The RSS invariant problem can be solved more eIIiciently without the matrix calculation in Eq (9). (2) In some applications, the AI-DTW may over-Iit the input sequence. For example in the online handwriting, the samples oI digit '2','7' can be transIormed shapes very similar to digit '1', iI we wrinkle the samples along x coordinate. Thus Ior these examples, it is more natural to set the scale rates oI x coordinate and v coordinate as the same. Given a point (x, v), the coordinates aIter rotation, scale and shiIt transIormation can be calculated by: x´÷ rcos(!) x¹ rsin(!)v¹!x v´÷-rsin(!) x¹ rcos(!)v¹!v, (10) where r denotes the scale rate, ! is the rotation angle, and !x, !v represents the shiIt parameters. For simplicity, we denote the transIormation in Eq. (10) by (x´, v´)÷f(x, v), and assume that each element t i ÷ (x t,i , v t,i ) is a 2-dimension vector composed by the coordinates oI point t i . Similarly r f ÷ (x r,i , v r,i ). Given warping path w, the Iirst sub optimal problem discussed in Section 3.2 can be re-Iormulated as: # " ! " n i i w i b a r r f t R T G 1 2 ) ( , , , ) ( ) , ( min arg $ . (11) By solving the Iunctions "G/"r÷0, "G/"!÷0, "G/"a÷0, "G/"b÷0, we have: 1 tan ( ) d c $ ! " , e d c r / 2 2 % " , ! "! #! $! %! ! "! #! $! %! &!! &"! '()*+ - ./! ! /! &!! &! "! 0! #! /! $! 1! %! 2! &!! &&! '()*+ 3 45567"$62!"% ! "! #! $! %! ! "! #! $! %! &!! &"! '+56 & 455671602&/ ! "! #! $! %! ! "! #! $! %! &!! &"! '+56 # 45567/60%#& ! / &! ! / &! &/ "! "/ 0! '+458+9:( ; 5 5 : 5 ! "! #! $! %! ! "! #! $! %! &!! &"! '()*+ - "! #! $! %! ! "! #! $! %! &!! &"! '()*+ 3 455670$62!1/ ! "! #! $! %! ! "! #! $! %! &!! &"! '+56 & 4556726%!2% ! "! #! $! %! ! "! #! $! %! &!! &"! '+56 # 45567$6%!00 ! / &! ! / &! &/ "! "/ 0! '+458+9 :( ; 5 5 : 5 Figure 1 Examples oI AI-DTW. Top row: digit 2. Bottom row: digit 8. LeIt column: Input pattern T; 2nd column: Input pattern R aIter rotation. 3rd column: pattern R aIter the Iirst iteration oI AI-DTW. 4th column: pattern R aIter the 4th iteration; Right column: Error curve along iteration num. Yu Qiao and Makoto Yasuhara. 2006. Affine Invariant Dynamic Time Warping and its Application to Online Rotated Handwriting Recognition. In Proceedings of the 18th International Conference on Pattern Recognition - Volume 02 (ICPR '06), Vol. 2. IEEE Computer Society, Washington, DC, USA, 905-908. DOI=10.1109/ICPR.2006.228 http://dx.doi.org/10.1109/ICPR.2006.228 Multi-Dimensional Dynamic Time Warping ! Comment appliquer DTW quand plusieurs grandeurs sont mesurées à chaque instant? ! Modification de la fonction de distance: ! Chaque dimension est normalisée avec une moyenne de 0 et une variance de 1 ! Calcul de la somme des valeurs absolues des différences suivant chaque dimension 37 G. A. ten Holt, M. J. T. Reinders and E. A. Hendriks. Multi-dimensional dynamic time warping for gesture recognition. In Proceedings of the Thirteenth annual conference of the Advanced School for Computing and Imaging, 2007. Multi-Dimensional Dynamic Time Warping 38 Figure 3: The necessity for MD-DTW. (a) shows two artificial 2D time series of equal mean and variance. Di- mension 1 is shown in column 1, dimension 2 in column 2. The series contain synchronisation information in both dimensions. If 1D-DTW is performed using the first dimension, the result is suboptimal for dimension 2, as illustrated in (b). Note that the peaks and valleys in dimension 2 are not aligned properly. 1D-DTW on dimension 2 gives a similar suboptimal match for dimension 1. MD-DTW takes both dimensions into account and finds the best synchronisation (c). result. MD-DTW takes both dimensions into account in finding the optimal synchronisation. The result is a synchronisation that is as ideal as possible for both dimensions, as shown in figure 3 (c). This is the ad- vantage of MD-DTW over regular DTW. Though the series in figure 3 is artificial, the situation it depicts is not unrealistic. Figure 4 shows a similar situation from a real-world multi-dimensional series: x- and y-co-ordinates of the right hand for the Dutch sign for acrobat. Here, the y-co-ordinate is informative for the first part of the series, and the x- co-ordinate for the second part. To get both peaks properly matched, MD-DTW is necessary. 2.2 DTW on Derivatives [4] argue that for synchronising shape- characteristics of series (such as peaks and slopes), it is beneficial to perform DTW on the first-order derivatives of the feature values (DDTW). Since we want to perform such shape-matching in each of our dimensions, we considered this option for the MD- DTW algorithm. In our case, this meant taking the first-order derivative in each dimension separately. This gives us information about the slopes and peaks in each dimension. The series were first smoothed in each dimension with a Gaussian filter (σ = 5) to diminish noise effects. Then an approximation of the derivative was taken in each dimension using the filter der(a(t)) = (a(t + 1) −a(t −1))/2. Now we can perform two types of MD-DTW: on the feature values and on their derivatives. As a third type, we took the derivatives and added them to the series as extra dimensions, doubling the dimensional- ity of the series. In this setting, both the feature values themselves and their derivatives are taken into consid- eration when searching for the ideal warp. We tested our algorithm in all three settings. They are denoted as S(ignal), D(erivative) and SD(signal+derivative). In each case, the series were smoothed and normalised before (MD)DTW was commenced. 2.3 Dimension Selection To compare series with various dimensions, it is necessary to normalise the dimensions. However, this is a problem if a dimension contains only noise. For example, position data gathered on the non-dominant hand in a one-handed gesture. Normalisation will enlarge the noise in this dimension to the same pro- portions as the informative data in other dimensions (such as dominant hand positions). The noise will Mutli-Dimensional Dynamic Time Warping ! Possibilité d’utiliser les signaux et leur dérivée 39 TP 40

Comments

Description