Bio Metrics



Comments



Description

 BIOMETRICS   Edited by Jucheng Yang                              Biometrics Edited by Jucheng Yang Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2011 InTech All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source. Statements and opinions expressed in the chapters are these of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book. Publishing Process Manager Mirna Cvijic  Technical Editor Teodora Smiljanic Cover Designer Jan Hyrat Image Copyright Andy Piatt, 2010. Used under license from Shutterstock.com First published July, 2011 Printed in Croatia A free online edition of this book is available at www.intechopen.com Additional hard copies can be obtained from [email protected] Biometrics, Edited by Jucheng Yang p. cm. ISBN 978-953-307-618-8 free online editions of InTech Books and Journals can be found at www.intechopen.com     Contents   Preface IX Part 1 Physical Biometrics 1 Chapter 1 Speaker Recognition 3 Homayoon Beigi Chapter 2 Finger Vein Recognition 29 Kejun Wang, Hui Ma, Oluwatoyin P. Popoola and Jingyu Li Chapter 3 Minutiae-based Fingerprint Extraction and Recognition 55 Naser Zaeri Chapter 4 Non-minutiae Based Fingerprint Descriptor 79 Jucheng Yang Chapter 5 Retinal Identification 99 Mikael Agopov Chapter 6 Retinal Vessel Tree as Biometric Pattern 115 Marcos Ortega and Manuel G. Penedo Chapter 7 DNA Biometrics 139 Masaki Hashiyada Part 2 Behavioral Biometrics 155 Chapter 8 Keystroke Dynamics Authentication 157 Romain Giot, Mohamad El-Abed and Christophe Rosenberger Chapter 9 DWT Domain On-Line Signature Verification 183 Isao Nakanishi, Shouta Koike, Yoshio Itoh and Shigang Li VI Contents Part 3 Medical Biometrics 197 Chapter 10 Heart Biometrics: Theory, Methods and Applications 199 Foteini Agrafioti, Jiexin Gao and Dimitrios Hatzinakos Chapter 11 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 217 Francesco Beritelli and Andrea Spadaccini Chapter 12 Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 235 Makoto Fukumoto and Hiroki Hasegawa Chapter 13 The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 249 Charles F. Streckfus and Cynthia Guajardo-Edwards       Preface   Biometrics  uses  methods  for  unique  recognition  of  humans  based  upon  one  or  more  intrinsic  physical  or  behavioral  traits.  In  computer  science,  particularly,  biometrics  is  used  as  a  form  of  identity  access  management  and  access  control.  It  is  also  used  to  identify individuals in groups that are under surveillance.   The  key  objective  of  the  book  is  to  provide  comprehensive  reference  and  text  on  human authentication and people identity verification from physiological, behavioural  and  other  points  of  view  (medical  biometrics).  It  aims  to  publish  new  insights  into  current  innovations  in  computer  systems  and  technology  for  biometrics  development  and its applications.   The book consists of 13 chapters, each focusing on a certain aspect of the problem. The  book  chapters  are  divided  into  three  sections:  physical  biometrics,  behavioral  biometrics  and  medical  biometrics.  In  the  first  physical  biometrics  section,  there  are  seven  chapters.  Chapter  1  provides  an  in‐depth  look  at  speaker  recognition  and  address many  practical and algorithmic issues related to the design and utilization of  speaker recognition. In chapter 2 the author proposes some new algorithms for finger  vein  recognition  such  as  using  oriented  filtering,  template  matching  with  relative  distance  and  angle  and  wavelet  moment  fusing  with  PCA  and  LDA  transform.  In  chapter  3  the  author  gives  the  recent  advancements  in  the  field  of  minutia‐based  fingerprint extraction and recognition. Chapter 4 provides a comprehensive idea about  some  of  the  well‐known  non‐minutiae  based  descriptors  during  the  last  two  decades  and  also  proposes  a  novel  non‐minutiae  based  fingerprint  descriptor  with  tessellated  invariant moment features and Support Vector Machine (SVM). In chapter 5 the retina  scanning  technique  is  considered  in  detail  throughout  its  historical  evolution  and  try  to use the birefringence of the retinal nerve fiber layer (RNFL) as a basis for successful  identification.  Chapter  6  proposes  a  fully‐automatic  authentication  system  using  the  retinal  vessel  tree  pattern  as  biometric  characteristic.  As  the  most  reliable  personal  identification, DNA is intrinsically digital and does not change during a person’s life,  and  even  after  death.  In  chapter  7  the  author  proposes  a  method  for  generating  a  personal  ID  comprising  short  tandem  repeat  (STR)  and  single  nucleotide  polymorphism (SNP) information which are used in personal identification in forensic  application.  X Preface In  Section  2,  two  kinds  of  behavioral  biometrics:  keystroke  dynamics  and  DWT  domain  on‐line  signature  verification  are  introduced  in  chapter  8  and  chapter  9  respectively.  In  section  3,  medical  biometrics  constitutes  another  category  of  new  biometric recognition modalities that encompasses signals which are typically used in  clinical  diagnostics,  so  chapter  10  gives  a  survey  on  heart  biometrics  with  its  theory,  methods  and  applications.  Chapter  11  proposes  the  usage  of  heart  sounds  for  biometric  recognition,  describes  the  strengths  and  the  weaknesses  of  the  novel  trait  and  analyzes  in  detail  the  methods  developed  so  far  and  their  performance.  Chapter  12  investigates  the  temporal  change  in  heartbeat  intervals  in  a  transition  between  different sound stimuli, since observing temporal change in heartbeat is important and  contributes to improvement of exposure method of music and sound. In chapter 13 the  author  proposes  the  use  of  saliva  protein  profiling  as  a  biometric  tool  to  authenticate  the presence of carcinoma among women.  The  book  was  reviewed  by  editor  Dr.  Jucheng  Yang,  and  some  guest  editors,  such  as  Dr.  Girija  Chetty,  Dr.  Norman  Poh,  Dr.  Loris  Nanni,  Dr.  Jianjiang  Feng,  Dr.  Dongsun  Park, Dr. Sook Yoon and other.      Dr. Jucheng Yang   Professor  School of Information Technology,  Jiangxi University of Finance and Economics,   Nanchang, Jiangxi province,  China        Part 1 Physical Biometrics 0 Speaker Recognition Homayoon Beigi Recognition Technologies, Inc. U.S.A. 1. Introduction Speaker Recognition is a multi-disciplinary technology which uses the vocal characteristics of speakers to deduce information about their identities. It is a branch of biometrics that may be used for identification, verification, and classification of individual speakers, with the capability of tracking, detection, and segmentation by extension. A speaker recognition system first tries to model the vocal tract characteristics of a person. This may be a mathematical model of the physiological system producing the human speech or simply a statistical model with similar output characteristics as the human vocal tract. Once a model is established and has been associated with an individual, new instances of speech may be assessed to determine the likelihood of them having been generated by the model of interest in contrast with other observed models. This is the underlying methodology for all speaker recognition applications. The earliest known papers on speaker recognition were published in the 1950s (Pollack et al., 1954; Shearme & Holmes, 1959). Initial speaker recognition techniques relied on a human expert examining representations of the speech of an individual and making a decision on the person’s identity by comparing the characteristics in this representation with others. The most popular representation was the formant representation. In the recent decades, fully automated speaker recognition systems have been developed and are in use (Beigi, 2011). There have been a number of tutorials, surveys, and review papers published in the recent years (Bimbot et al., 2004; Campbell, 1997; Furui, 2005). In a somewhat different approach, we have tried to present the material, more in the form of a comprehensive summary of the field with an ample number of references for the avid reader to follow. A coverage of most of the aspects is presented, not just in the form of a list of different algorithms and techniques used for handling part of the problem, as it has been done before. As for the importance of speaker recognition, it is noteworthy that speaker identity is the only biometric which may be easily tested (identified or verified) remotely through the existing infrastructure, namely the telephone network. This makes speaker recognition quite valuable and unrivaled in many real-world applications. It needs not be mentioned that with the growing number of cellular (mobile) telephones and their ever-growing complexity, speaker recognition will become more popular in the future. There are countless number of applications for the different branches of speaker recognition. If audio is involved, one or more of the speaker recognition branches may be used. However, in terms of deployment, speaker recognition is in its early stages of infancy. This is partly due to unfamiliarity of the general public with the subject and its existence, partly because of the limited development in the field. These include, but are certainly not limited to, financial, 1 2 Will-be-set-by-IN-TECH forensic and legal (Nolan, 1983; Tosi, 1979), access control and security, audio/video indexing and diarization, surveillance, teleconferencing, and proctorless distance learning Beigi (2009). Speaker recognition encompasses many different areas of science. It requires the knowledge of phonetics, linguistics and phonology. Signal processing which by itself is a vast subject is also an important component. Information theory is at its basis and optimization theory is used in solving problems related to the training and matching algorithms which appear in support vector machines (SVMs), hidden Markov models (HMMs), and neural networks (NNs). Then there is statistical learning theory which is used in the form of maximum likelihood estimation, likelihood linear regression, maximum a-posteriori probability, and other techniques. In addition, Parameter estimation and learning techniques are used in HMM, SVM, NN, and other underlying methods, at the core of the subject. Artificial intelligence techniques appear in the form of sub-optimal searches and decision trees. Also applied math, in general, is used in the form of complex variables theory, integral transforms, probability theory, statistics, and many other mathematical domains such as wavelet analysis, etc. The vast domain of the field does not allow for a thorough coverage of the subject in a venue such as this chapter. All that can be done here is to scratch the surface and to speak about the inter-relations among these topics to create a complete speaker recognition system. The avid reader is recommended to refer to (Beigi, 2011) for a comprehensive treatment of the subject, including the details of the underlying theory. To start, let us briefly review different biometrics in contrast with speaker recognition. Then, it is important to clarify the terminology and to describe the problems of interest by reviewing the different manifestations and modalities of this biometric. Afterwards, some of the challenges faced in achieving a practical system are listed. Once the problems are clearly posed and the challenges are understood, a quick reviewof the production and the processing of speech by humans is presented. Then, the state of the art in addressing the problems at hand is briefly surveyed in a section on theory. Finally, concluding remarks are made about the current state of research on the subject and its future trend. 2. Comparison with other biometrics There have been a number of biometrics used in the past few decades for the recognition of individuals. Some of these markers have been discussed in other chapters of this book. A comparison of voice with some other popular biometrics will clarify the scope of its practical usage. Some of the most popular biometrics are Deoxyribonucleic Acid (DNA), image-based and acoustic ear recognition, face recognition, fingerprint and palm recognition, hand and finger geometry, iris and retinal recognition, thermography, vein recognition, gait, handwriting, and keystroke recognition. Fingerprints, as popular as they are, have the problem of not being able to identify people with damaged fingers. These are, for example, construction workers, people who work with their hands, or maybe people without limbs, such as those who have either lost their hands or their fingers in an accident or those who congenitally lack fingers or limbs. According to the National Institute of Standards and Technology (NIST), this is about 2% of the population! Also, latex prints of finger patterns may be used to spoof some sensors. People, with damaged irides, such as some who are blind, either congenitally or due to an illness like glaucoma, may not be recognized through iris recognition. It is very hard to tell the size of this population, but they certainly exist. Additionally, one would need a high quality image of the iris to perform recognition. Acquiring these images is quite problematic. Although there are long distance iris imaging cameras, their field of vision may easily be 4 Biometrics Speaker Recognition 3 blocked by uncooperative users through the turning of the head, blinking, rolling of the eyes, wearing of hats, glasses, etc. The image may also not be acceptable due to lighting and focus conditions. Also, irides tend to change due to changes in lighting conditions as the pupils dilate or contract. It is also possible to spoof some iris recognition systems, either by wearing contact lenses or by simply using an image of the target individual’s irides. Of course, there is also a percentage of the population who are unable to speak, therefore they will not be able to use speaker recognition systems. The latest figures for the population of deaf and mute people in the United States reflected by the US Census Bureau set this percentage at 0.4% for deaf and mute individuals (USC, 2005). Spoofing, using recordings is also a concern in practical speaker recognition systems. In terms of public acceptance, fingerprint recognition has long been associated with criminology. Due to these legacy associations, many individuals are wary of producing a fingerprint for fear of its malicious usage or simply due to the criminal connotation it carries. As an example, a few years ago, the United States government required capturing the image and fingerprint of all tourists entering the nation’s airports. This action offended many tourists to the point that some countries such as Brazil placed a reciprocal system in place only for U.S. citizens entering their country. Many people entering the U.S. felt like they were being treated as criminals, only based on the act of fingerprinting. Of course, since many other countries have been adopting the fingerprint capture requirement, it is being tolerated by travelers much better, around the world. Because facial, iris, retinal images, and fingerprints have a sole purpose of being used in recognition, they are somewhat harder to capture. In general, the public is more wary of providing such information which may be archived and misused. On the other hand, speech has been established for communication and people are far less likely to be concerned about parting with their speech. Even in the technological arena, the use of speech for telephone communication makes it much more socially acceptable. Speaker recognition can also utilize the widely available infrastructure that has been around for so long, namely the telephone network. Speech may be used for doing remote recognition of the individual using the existing telephone network and without the need for any extra hardware or other apparatus. Also, speaker recognition, in the form of tracking and detection may be used to do much more than simple identification and verification of individuals, such as a full diarization of large media databases. Another attractive point is that cellular telephone and PDA-type data security needs no extra hardware, since cellular telephones already have speech capture devices, namely microphones. Most PDAs also contain built-in microphones. On the other hand, for fingerprint and image recognition, a fingerprint scanner and a camera would have to be present. Multimodal biometrics entail systems which combine any two or more of these or other biometrics. These combinations increase the accuracy of the identification or verification of the individual based on the fact that the information is obtained through different, mostly independent sources. Most practical implementations of biometric system will need to utilize some kind of multimodal approach; since any one technique may be bypassed by the eager impostor. It would be much more difficult to fool several independent biometric systems simultaneously. Many of the above biometrics may be successfully combined with speaker recognition to produce viable multimodal systems with much higher accuracies. (Viswanathan et al., 2000) shows an example of such a multimodal approach using speaker and image recognition. 5 Speaker Recognition 4 Will-be-set-by-IN-TECH 3. Terminology and manifestations In addressing the act of speaker recognition many different terms have been coined, some of which have caused great confusion. Speech recognition research has been around for a long time and, naturally, there is some confusion in the public between speech and speaker recognition. One term that has added to this confusion is voice recognition. The term voice recognition has been used in some circles to double for speaker recognition. Although it is conceptually a correct name for the subject, it is recommended that the use of this term is avoided. Voice recognition, in the past, has been mistakenly applied to speech recognition and these terms have become synonymous for a long time. In a speech recognition application, it is not the voice of the individual which is being recognized, but the contents of his/her speech. Alas, the term has been around and has had the wrong association for too long. Other than the aforementioned, a myriad of different terminologies have been used to refer to this subject. They include, voice biometrics, speech biometrics, biometric speaker identification, talker identification, talker clustering, voice identification, voiceprint identification, and so on. With the exception of the term speech biometrics which also introduces the addition of a speech knowledge-base to speaker recognition, the rest do not present any additional information. 3.1 Speaker enrollment The first step required in most manifestations of speaker recognition is to enroll the users of interest. This is usually done by building a mathematical model of a sample speech from the user and storing it in association with an identifier. This model is usually designed to capture statistical information about the nature of the audio sample and is mostly irreversible – namely, the enrollment sample may not be reconstructed from the model. 3.2 Speaker identification There are two different types of speaker identification, closed-set and open-set. Closed-set identification is the simpler of the two problems. In close-set identification, the audio of the test speaker is compared against all the available speaker models and the speaker ID of the model with the closest match is returned. In practice, usually, the top best matching candidates are returned in a ranked list, with corresponding confidence or likelihood scores. In closed-set identification, the ID of one of the speakers in the database will always be closest to the audio of the test speaker; there is no rejection scheme. One may imagine a case where the test speaker is a 5-year old child where all the speakers in the database are adult males. In closed-set Identification, still, the child will match against one of the adult male speakers in the database. Therefore, closed-set identification is not very practical. Of course, like anything else, closed-set identification also has its own applications. An example would be a software programwhich would identify the audio of a speaker so that the interaction environment may be customized for that individual. In this case, there is no great loss by making a mistake. In fact, some match needs to be returned just to be able to pick a customization profile. If the speaker does not exist in the database, then there is generally no difference in what profile is used, unless profiles hold personal information, in which case rejection will become necessary. Open-set identification may be seen as a combination of closed-set identification and speaker verification. For example, a closed-set identification may be conducted and the resulting ID may be used to run a speaker verification session. If the test speaker matches the target speaker based on the ID, returned from the closed-set identification, then the ID is accepted 6 Biometrics Speaker Recognition 5 and passed back as the true ID of the test speaker. On the other hand, if the verification fails, the speaker may be rejected all-together with no valid identification result. An open-set identification problem is therefore at least as complex as a speaker verification task (the limiting case being when there is only one speaker in the database) and most of the time it is more complex. In fact, another way of looking at verification is as a special case of open-set identification in which there is only one speaker in the list. Also, the complexity generally increases linearly with the number of speakers enrolled in the database since theoretically, the test speaker should be compared against all speaker models in the database – in practice this may be avoided by tolerating some accuracy degradation (Beigi et al., 1999). 3.3 Speaker verification (authentication) In a generic speaker verification application, the person being verified (known as the test speaker), identifies himself/herself, usually by non-speech methods (e.g., a username, an identification number, et cetera). The provided ID is used to retrieve the enrolled model for that person which has been stored according to the enrollment process, described earlier, in a database. This enrolled model is called the target speaker model or the reference model. The speech signal of the test speaker is compared against the target speaker model to verify the test speaker. Of course, comparison against the target speaker’s model is not enough. There is always a need for contrast when making a comparison. Therefore, one or more competing models should also be evaluated to come to a verification decision. The competing model may be a so-called (universal) background model or one or more cohort models. The final decision is made by assessing whether the speech sample given at the time of verification is closer to the target model or to the competing model(s). If it is closer to the target model, then the user is verified and otherwise rejected. The speaker verification problem is known as a one-to-one comparison since it does not necessarily need to match against every single person in the database. Therefore, the complexity of the matching does not increase as the number of enrolled subjects increases. Of course in reality, there is more than one comparison for speaker verification, as stated – comparison against the target model and the competing model(s). 3.3.1 Speaker verification modalities There are two major ways in which speaker verification may be conducted. These two are called the modalities of speaker verification and they are text-dependent and text-independent. There are also variations of these two modalities such as text-prompted, language-independent text-independent and language-dependent text-independent. In a purely text-dependent modality, the speaker is required to utter a predetermined text at enrollment and the same text again at the time of verification. Text-dependence does not really make sense in an identification scenario. It is only valid for verification. In practice, using such text-dependent modality will be open to spoofing attacks; namely, the audio may be intercepted and recorded to be used by an impostor at the time of the verification. Practical applications that use the text-dependent modality, do so in the text-prompted flavor. This means that the enrollment may be done for several different textual contents and at the time of verification, one of those texts is requested to be uttered by the test speaker. The chosen text is the prompt and the modality is called text-prompted. A more flexible modality is the text-independent modality in which case the texts of the speech at the time of enrollment and verification are completely random. The difficulty with this 7 Speaker Recognition 6 Will-be-set-by-IN-TECH method is that because the texts are presumably different, longer enrollment and test samples are needed. The long samples increase the probability of better coverage of the idiosyncrasies of the person’s vocal characteristics. The general tendency is to believe that in the text-dependent and text-prompted cases, since the enrollment and verification texts are identical, they can be designed to be much shorter. One must be careful, since the shorter segments will only examine part of the dynamics of the vocal tract. Therefore, the text for text-prompted and text-dependent engines must still be designed to cover enough variation to allow for a meaningful comparison. The problem of spoofing is still present with text-independent speaker verification. In fact, any recording of the person’s voice should now get an impostor through. For this reason, text-independent systems would generally be used with another source of information in a multi-factor authentication scenario. In most cases, text-independent speaker verification algorithms are also language-independent, since they are concerned with the vocal tract characteristics of the individual, mostly governed by the shape of the speaker’s vocal tract. However, because of the coverage issue discussed earlier, some researchers have developed text-independent systems which have some internal models associated with phonemes in the language of their scope. These techniques produce a text-independent, but somewhat language-dependent speaker verification system. The language limitations reduce the space and, hence, may reduce the error rates. 3.4 Speaker and event classification The goal of classification is a bit more vague. It is the general label for any technique that pools similar audio signals into individual bins. Some examples of the many classification scenarios are gender classification, age classification, and event classification. Gender classification, as is apparent from its name, tries to separate male speakers and female speakers. More advanced versions also distinguish children and place them into a separate bin; classifying male and female is not so simple in children since their vocal characteristics are quite similar before the onset of puberty. Classification may use slightly different sets of features from those used in verification and identification, depending on the problem at hand. Also, either there may be no enrollment or enrollment may be done differently. Some examples of special enrollment procedures are, pooling enrollment data fromlike classes together, using extra features in supplemental codebooks related to specific natural or logical specifics of the classes of interest, etc.(Beigi, 2011). Although these methods are called speaker classification, sometimes, the technique are used for doing event classification such as classifying speech, music, blasts, gun shots, screams, whistles, horns, etc. The feature selection and processing methods for classification are mostly dependent on the scope and could be different from mainstream speaker recognition. 3.5 Speaker segmentation, diarization, detection and tracking Automatic segmentation of an audio stream into parts containing the speech of distinct speakers, music, noise, and different background conditions has many applications. This type of segmentation is elementary to the practical considerations of speaker recognition as well as speech and other audio-related recognition systems. Different specialized recognizers may be used for recognition of distinct categories of audio in a stream. An example is the ever-growing tele-conferencing application. In a tele-conference, usually, a host makes an appointment for a conference call and notifies attendees to call a telephone number and to join the conference using a special access code. There is an increasing 8 Biometrics Speaker Recognition 7 interest from the involved parties to obtain transcripts (minutes) of these conversations. In order to fully transcribe the conversations, it is necessary to know the speaker of each statement. If an enrolled model exists for each speaker, then prior to identifying the active speaker (speaker detection), the audio of that speaker should be segmented and separated from adjoining speakers. When speaker segmentation is combined with speaker identification and the resulting index information is extracted, the process is called speaker diarization. In case one is only interested in a specific speaker and where that speaker has spoken within the conversation (the timestamps), the process is called speaker tracking. 3.6 Knowledge-based speaker recognition (speech biometrics) A knowledge-based speaker recognition system is usually a combination of a speaker recognition systemand a speech recognizer and sometimes a natural language understanding engine or more. It is somewhat related to the text-prompted modality with the difference that there is another abstraction layer in the design. This layer uses knowledge fromthe speaker to test for liveness or act as an additional authentication factor. As an example, at the enrollment time, specific information such as a Personal Identification Number (PIN) or other private data may be stored about the speakers. At the verification time, randomized questions may be used to capture the test speaker’s audio and the content of interest. The content is parsed by doing a transcription of the audio and using a natural language understanding (Manning, 1999) system to parse for the information of interest. This will increase the factors in the authentication and is usually a good idea for reducing the chance of successful impostor attacks – see Figure 1. Fig. 1. A practical speaker recognition system utilizing speech recognition and natural language understanding 4. Challenges of speaker recognition Aside from its positive outlook such as the established infrastructure and simplicity of adoption, speaker recognition, too, is filled with difficult challenges for the research community. Channel mismatch is the most serious difficulty faced in this technology. As an example, assume using a specific microphone over a channel such as a cellular communication channel with all the associated band-limitations and noise conditions in one session of using 9 Speaker Recognition 8 Will-be-set-by-IN-TECH a speaker recognition system. For instance, this session can be the enrollment session for instance. Therefore, all that the system would learn about the identity of the individual is tainted by the channel characteristics through which the audio had to pass. On the hand, at the time of performing the identification or verification, a completely different channel could be used. For example, this time, the person being identified or verified may call fromhis/her home number or an office phone. These may either be digital phones going through voice T1 services or may be analog telephony devices going through analog switches and being transferred to digital telephone company switches, on the way. They would have specific characteristics in terms of dynamics, cut-off frequencies, color, timber, etc. These channel characteristics are basically modulated with the characteristics of the person’s vocal tract. Channel mismatch is the source of most errors in speaker recognition. Another problem is signal variability. This is by no means specific to speaker recognition. It is a problem that haunts almost all biometrics. In general, an abundance of data is needed to be able to cover all the variations within an individual’s voice. But even then, a person in two different sessions, would possibly have more variation within his/her own voice than if the signal is compared to that of someone else’s voice, who possesses similar vocal traits. The existence of wide intra-class variations compared with inter-class variations makes it difficult to be able to identify a person accurately. Inter-class variations denote the difference between two different individuals while intra-class variations represent the variation within the same person’s voice in two different sessions. The signal variation problem, as stated earlier, is common to most biometrics. Some of these variations may be due to aging and time-lapse effects. Time-lapse could be characterized in many different ways (Beigi, 2009). One is the aging of the individual. As we grow older, our vocal characteristics change. That is a part of aging in itself. But there are also subtle changes that are not that much related to aging and may be habitual or may also be dependent on the environment, creating variations from one session to another. These short-term variations could happen within a matter of days, weeks, or sometimes months. Of course, larger variations happen with aging, which take effect in the course of many years. Another group of problems is associated with background conditions such as ambient noise and different types of acoustics. Examples would be audio generated in a room with echos or in a street while walking and talking on a mobile (cellular) phone, possibly with fire trucks, sirens, automobile engines, sledge hammers, and similar noise sources being heard in the background. These conditions affect the recognition rate considerably. These types of problems are quite specific to speaker recognition. Of course, similar problems may show up in different forms in other biometrics. For example, analogous conditions in image recognition would show up in the form of noise in the lighting conditions. In fingerprint recognition they appear in the way the fingerprint is captured and related noisy conditions associated with the sensors. However, for biometrics such as fingerprint recognition, the technology may more readily dictate the type of sensors which are used. Therefore, in an official implementation, a vendor or an agency may require the use of the same sensor all around. If one considers the variations across sensors, different results may be obtained even in fingerprint recognition, although they would probably not be as pronounced as the variations in microphone conditions. The original purpose of using speech has been to be able to convey a message. Therefore, we are used to deploying different microphones and channels for this purpose. One person, in general uses many different speech apparatuses such as a home phone, cellphone, office 10 Biometrics Speaker Recognition 9 phone, and even a microphone headset attached to a computer. We still expect to be able to perform reasonable speaker recognition using this varied set of sensors and channels. Although, as mentioned earlier, this becomes an advantage in terms of ease of adoptability of speaker recognition in existing arenas, it also makes the speaker recognition problem much more challenging. Another problem is the presence of vocal variations due to illness. Catching a cold causes changes to our voice and its characteristics which could create difficulties in performing accurate speaker recognition. Bulk of the work in speaker recognition research is to be able to alleviate these problems, although not every problem is easily handled with the current technology. 5. Human speech generation and processing A human child develops an inherent ability to identify the voice of his/her parents before even learning to understand the content of their speech. In humans, speaker recognition is performed in the right (less dominant) hemisphere of the brain, in conjunction with the functions for processing pitch, tempo, and other musical discourse. This is in contrast with most of the language functions (production and perception) in the brain which are processed by the Broca and Wernicke areas in the left (dominant) hemisphere of the cerebral cortex (Beigi, 2011). Speech generation starts with the speech content being developed in the brain and processed through the nervous system. It includes the intended message which is created in the brain. The abstraction of this message is encoded into a code that will then produce the language (language coding step). The brain will then induce neuro-muscular activity to start the vocal tract in vocalizing the message. This message is transmitted over a channel starting with the air surrounding the mouth and continuing with electronic devices and networks such as a telephone system to transmit the coded message. The resulting signal is therefore transmitted to the air surrounding the ear, where vibrations travel through different sections of the outer and the middle ear. The cochlear vibrations excite the cilia in the inner ear, generating neural signals which travel through the Thalamus to the brain. These signals are then decoded by different parts of the brain and are decoded into linguistic concepts which are understood by the receiving individual. The intended message is embedded in the abstraction which is deduced by the brain from the signal being presented to it. This is a very complex system where the intended message generally contains a very low bit-rate content. However, the way this content undergoes transformation into a language code, neuro-muscular excitation, and finally audio, increase the bit-rate of the signal substantially, generating great redundancy. Therefore a low information content is encoded to travel through a high-capacity channel. This small amount of information may easily be tainted by noise throughout this process. Figure 2 depicts a control system representation of speech production proposed by (Beigi, 2011). Earlier, we considered the transformation of a message being formed in the brain into a high-capacity audio signal. In reality, the creation of the audio signal from the fundamental message formed in the brain may be better represented using a control system paradigm. Let us consider the Laplace transform of the original message being generated in the brain as U(s). We may lump together the different portions of the nervous system at work in generating the control signals which move the vocal tract to generate speech, into a controller block, G c (s). This block is made up of G b (s) which makes up those parts of the nervous system 11 Speaker Recognition 10 Will-be-set-by-IN-TECH Fig. 2. Control system representation from (Beigi, 2011) in the brain associated with generating the motor control signals and G m (s) which is the part of the nervous system associated with delivering the signal to the muscles in the vocal tract. The output of G c (s) is delivered to the vocal tract which is seen here as the plant. It is called G v (s) and it includes the moving parts of the vocal tract which are responsible for creating speech. The output, H(s), is the Laplace transformof the speech wave, exciting the transmission medium, namely air. At this point we may model all the noise components and disturbances which may be present in the air surrounding the generated speech. The resulting signal is then transformed by passing through some type of electronic medium through audio capture and communication. The resulting signal, Y(s) is the signal which is used to recognize the speaker. Fig. 3. Speech production in the Cerebral Cortex – from (Beigi, 2011) Figure 3, borrowed from (Beigi, 2011), shows the superimposition of the interesting parts of the brain associated with producing speech. Broca’s area which is part of the frontal lobe is 12 Biometrics Speaker Recognition 11 associated with producing the language code necessary for speech production. It may be seen as a part of the G b (s) in the control system representation of speech production. The Precentral Gyrus shown in the blue color is a long strip in the frontal lobe which is responsible for our motor control. The lower part of this area which is adjacent to Broca’s area is further split into two parts. The lower part of the blue section is responsible for lip movement. The green part is associated with control of our larynx, Pharynx, jaw, and tongue. Together, these parts make up part of the G m (s) which is the second box in the controller. Note the proximity of these control regions to Broca’s area, which is the coding section. Due to the slow transmission of chemical signals, the brain has evolved to allow for messages to travel quickly from G b (s) to G m (s) by utilizing proximity. Note that Broca’s area is also connected to the language perception section known as Wernicke’s area. This will allow the feedback and refinement of the outgoing message. The vocal tract produces a carrier signal, based on its inherent dynamics, which is modified by the signal being generated by the G c (s). This is the actual plant which was called G v (s) in the control paradigm (Figure 2). In text-independent speaker recognition, we are only concerned with learning the characteristics of the carrier signal in G v (s). Speech recognition, on the other hand, is concerned with decoding the intended message produced by Broca’s area. This is why the signal processing is quite similar between the two disciplines, but in essence each discipline is concerned with a different part of the signal. The total time-signal is therefore a convolution of these two signals. The separation of these convolved signals is quite challenging and the results are therefore tainted in both disciplines causing a major part of the recognition error. Other sources are due to many complex disturbances along the way. Figure 4 shows the major portion of the vocal tract which begins with the trachea and ends at the mouth and at the nose. It has a very plastic shape in which many of the cavities can change their shapes to be able to adjust the plant dynamics of Figure 2. 6. Theory and current approaches The plasticity of the shape of the vocal tract makes the speech signal a non-stationary signal. This means that any segment of it, when compared to an adjacent segment in the time domain, has substantially different characteristics, indicating that the dynamics of the system producing these sections varies with time. As mentioned in the Introduction, the first step is to store the vocal characteristics of the speakers in the form of speaker models in a database, for future reference. To build these models, certain features should be defined such that they would best represent the vocal characteristics of the speaker of interest. The most prevalent features used in the field happen to be identical to those used for speech recognition, namely, Mel Frequency Cepstral Coefficients (MFCCs) – see (Beigi, 2011). 6.1 Sampling A Discrete representation of the signal is used for Automatic Speaker Recognition. Therefore we need to utilize the sampling theorem to help us determine the appropriate sampling frequency to be used for converting the continuous speech signal into its discrete signal representation. One must therefore ensure that the sampling rate is picked in accordance with the guidelines set by the Whittaker-Kotelnikoff-Shannon (WKS) sampling theorem (Beigi, 2011). The WKS sampling theorem requires that the sampling frequency be at least two times the Nyquist 13 Speaker Recognition 12 Will-be-set-by-IN-TECH Fig. 4. Sagittal section of Nose, Mouth, Pharynx, and Larynx; Source: Gray’s Anatomy (Gray, 1918) Critical frequency. The Nyquist critical frequency is really the highest frequency content of the analog signal. For simplicity, normally an ideal sampler is used, which acts like the multiplication of an impulse train with the analog signal, where the impulses happen at the chosen sampling frequency. In this representation, each sample has a zero width and lasts for an instant. The sampling theorem may be stated in words by requiring that the sampling frequency be greater than or equal to the Nyquist rate. The Nyquist rate, is defined as two times the Nyquist critical frequency. Fig. 5. Block diagram of a typical sampling process Figure 5 shows a typical sampling process which starts with an analog signal and produces a series of discrete samples at a fixed frequency, representing the speech signal. The discrete samples are usually stored using a Codec (Coder/Decoder) format such as linear PCM, μ-Law, 14 Biometrics Speaker Recognition 13 a-Law, etc. Standardization is quite important for interoperability with different recognition engines (Beigi & Markowitz, 2010). There are different forms for representing the speech signal. The simplest one is the speech waveform which is basically the plot of the sampled points versus time. In general, the amplitude is normalized to dwell between −1 and 1. In its quantized form, the data is stored in the range associated with the quantization representation. For example, for a 16-bit signed linear PCM, it would go from −32768 to 32767. Fig. 6. Narrowband spectrogram of a speech signal Another representation is, so-called, the spectrogram of the signal. Figure 6 shows the narrowband spectrogram of a signal. A sliding widow of 23 ms was used for to generate this figure. The spectrogramshows the frequency content of the speech signal as a function of time. It is really a three-dimensional representation where the z-axis is depicted by the darkness of the points on the figure. The darker the pixel, the stronger the signal strength in that frequency for the time slice of choice. An artifact of the narrowband spectrogram is the existence of the horizontal curved lines across time. A speech waveform representation has also been plotted on top of the spectrogram of Figure 6 to show the relation between different points in the waveform with their corresponding frequency content in the spectrogram. The systemof Figure 5 should be designed so that it reduces aliasing, truncation, band-limitation, and jitter by choosing the right parameters, such as the sampling rate and volume normalization. Figure 7 shows how most of the fricative information is lost going from a 22 kHz sampling rate to 8 kHz. Normal telephone sampling rates are at best 8 kHz. Mostly everyone is familiar with having to qualify fricatives on the telephone by using statements such as “S” as in “Sam” and “F” as in “Frank”. 6.2 Feature extraction Cepstral coefficients have fallen out of studies in exploring the arrival of echos in nature (Bogert et al., 1963). They are related to the spectrum of the log of spectrum of a speech signal. The frequency domain of the signal in computing the MFCCs is warped to the Melody (Mel) scale. It is based on the premise that human perception of pitch is linear up to 1000 Hz and then becomes nonlinear for higher frequencies (somewhat logarithmic). There are models of the 15 Speaker Recognition Fig. 7. Utterance: “Sampling Effects on Fricatives in Speech”, sampled at 22kHz (left) and 8kHz (right) human perception based on other warped scales such as the Bark scale. There are several ways of computing Cepstral Coefficients. They may be computed using the Direct Method, also known as Moving Average (MA) which utilizes the Fast Fourier Transform (FFT) for the first pass and the Discrete Cosine Transform (DCT) for the second pass to ensure real coefficients. This method usually entails the following steps: 1. Framing – Selecting a sliding section of the signal with a fixed width in time which is then moved with some overlap. The sliding window is generally about 30ms with an overlap of about 20ms (10ms shift). 2. Windowing – A window such as a Hamming, Hann, Welch, etc. is used to smooth the signal for the computation of the Discrete Fourier Transform (DFT). 3. FFT – The Fast Fourier Transform (FFT) is generally used for approximating the DFT of the windowed signal. 4. Frequency Warping – The FFT results are warped in the frequency domain in accordance with the Melody (Mel) or Bark scale. 5. MFCC – The Mel Frequency Cepstral Coefficients (MFCC) are computed. 6. Mel Cepstral Dynamics – Delta and Delta-Delta Cepstra are computed based on adjacent MFCC values. Some use the Linear Predictive, also known as AutoRegressive (AR) features by themselves: Linear Predictive Coefficients (LPC), Partial Correlation (PARCOR) – also known as reflection coefficients, or log area ratios. However, mostly the LPCs are converted to cepstral coefficients using autocorrelation techniques (Beigi, 2011). These are called Linear Predictive Cepstral Coefficients (LPCCs). There are also the Perceptual Linear Predictive (PLP) (Hermansky, 1990) features, shown in Figure 9. PLP works by warping the frequency and spectral magnitudes of the speech signal based on auditory perception tests. The domain is changed frommagnitudes and frequencies to loudness and pitch (Beigi, 2011). There have been an array of other features used such as wavelet filterbanks (Burrus et al., 1997), for example in the form of Mel-Frequency Discrete Wavelet Coefficients and Wavelet Octave Coefficients of Residues (WOCOR). There are also Instantaneous Amplitudes and Frequencies which are in the form of Amplitude Modulation (AM) and Frequency Modulation (FM). These features come in different flavors such as Empirical Mode Decomposition (EMD), FEPSTRUM, Mel Cepstrum Modulation Spectrum (MCMS), and so on (Beigi, 2011). 16 Biometrics Speaker Recognition 15 0 2 4 6 8 10 12 14 16 18 20 −60 −50 −40 −30 −20 −10 0 10 20 30 Feature Index M F C C M e a n Fig. 8. A sample MFCC vector – from (Beigi, 2011) Fig. 9. A typical Perceptual Linear Predictive (PLP) system It is important to note that most audio segments include a good deal of silence. Addition of features extracted from silent areas in the speech will increase the similarity of models, since silence does not carry any information about the speaker’s vocal characteristics. Therefore, Silence Detection (SD) or Voice Activity Detection (VAD) (Beigi, 2011) is quite important for better results. Only segments with vocal signals should be considered for recognition. Other preprocessing such as Audio Volume Estimation and normalization and Echo Cancellation may also be necessary for obtaining desirable result (Beigi, 2011). 6.3 Speaker models Once the features of interest are chosen, models are built based on these features to represent the speakers’ vocal characteristics. At this point, depending on whether the system is text-dependent (including text-prompted) or text-independent, different methods may be used. Models are usually based on HMMs, GMMs, SVMs, and NNs. 6.3.1 Gaussian Mixture Models (GMM) In general, there are many different modeling scenarios for speaker recognition. Most of these techniques are similar to those used for speech recognition modeling. For example, a multi-state ergodic Hidden Markov Models is usually used for text-dependent speaker recognition since there is textual context. As a special case of Hidden Markov Models, Gaussian Mixture Models (GMM) are used for doing text-independent speaker recognition. This is probably the most popular technique which is used in this field. GMMs are basically single-state degenerate HMMs. 17 Speaker Recognition 16 Will-be-set-by-IN-TECH The models are tied to the type of learning that is done. A popular technique is the use of a Gaussian Mixture Model (GMM) (Duda & Hart, 1973) to represent the speaker. This is mostly relevant to the text-independent case which encompasses speaker identification and text-independent verification. Even text-dependent techniques can use GMMs, but, they usually use a GMM to initialize Hidden Markov Models (HMMs) (Poritz, 1988) built to have an inherent model of the content of the speech as well. Many speaker diarization (segmentation and ID) systems use GMMs. To build a Gaussian Mixture Model of a speaker’s speech, one should make a few assumptions and decisions. The first assumption is the number of Gaussians to use. This is dependent on the amount of data that is available and the dimensionality of the feature vectors. Standard clustering techniques are usually used for the initial determination of the Gaussians. Once the number of Gaussians is determined, some large pool of features is used to train these Gaussians (learn the parameters). This step is called training. The models generated by training are called by many different names such as background models, universal background models (UBM), speaker independent models, Base models, etc. In a GMM, the models are parameters for collections of multi-variate normal density functions which describe the distribution of the Mel-Cepstral features (Beigi, 2011) for speakers’ enrollment data. This distribution is represented by Equation 1. p(x) = 1 (2π) d 2 |ΣΣΣ| 1 2 exp − 1 2 (x −μμμ) T ΣΣΣ −1 (x −μμμ) (1) where x, μμμ ∈ R d ΣΣΣ : R d → R d In Equation 1, μμμ is the mean vector where, μμμ Δ = E {x} Δ = ˆ ∞ −∞ x p(x)dx (2) The so-called “Sample Mean” approximation for Equation 2 is, μμμ ≈ 1 N N−1 ∑ i=0 x i (3) where N is the number of samples and x i are the Mel-Cepstral feature vectors (Beigi, 2011). The Variance-Covariance matrix of a multi-dimensional random variable is defined as, ΣΣΣ Δ = E (x − E {x}) (x − E {x}) T (4) = E xx T −μμμμμμ T (5) This matrix is called the Variance-Covariance since the diagonal elements are the variances of the individual dimensions of the multi-dimensional vector, x. The off-diagonal elements are the covariances across the different dimensions. Some have called this matrix the Variance matrix. Mostly in the field of Pattern Recognition it has been referred to, simply, as the Covariance matrix which is the name we will adopt here. The Unbiased estimate of ΣΣΣ, ˜ ΣΣΣ is given by the following expression, 18 Biometrics Speaker Recognition 17 ˜ ΣΣΣ = 1 N −1 N−1 ∑ i=0 (x i −μμμ)(x i −μμμ) T (6) = 1 N −1 S xx − N(μμμμμμ T ) (7) where the sample mean μμμ is given by Equation 3 and the second order sum matrix, S xx is given by, S xx = N−1 ∑ i=0 x i x i T (8) After the training is done, generally, the basis for a speaker independent model is built and stored in the form of the above statistics. At this stage, depending on whether a universal background model (UBM) (Reynolds et al., 2000) or cohort models are desired, different processing is done. For a UBM, a pool of speakers is used to optimize the parameters of the Gaussians as well as the mixture coefficients, using standard techniques such as maximum likelihood estimation (MLE), Maximum a-Posteriori (MAP) adaptation and Maximum Likelihood Linear Regression (MLLR). There may be one or more Background models. For example, some create a single background model called the UBM, others may build one for each gender, by using separate male and female databases for the training. Cohort models(Beigi et al., 1999) are built in a similar fashion. A cohort is a set of speakers that have similar vocal characteristics to the target speaker. This information may be used as a basis to either train a Hidden Markov Model including textual context, or to do an expectation maximization in order to come up with the statistics for the underlying model. At this point, the system is ready for performing the enrollment. The enrollment may be done by taking a sample audio of the target speaker and adapting it to be optimal for fitting this sample. This ensures that the likelihoods returned by matching the same sample with the modified model would be maximal. 6.3.2 Support vector machines Support vector machines (SVMs) have been recently used quite often in research papers regarding speaker recognition. Although they show very promising results, most of the implementations suffer from huge optimization problems with large dimensionality which have to be solved at the training stage. Results are not substantially different from GMM techniques and in general it may not be warranted to use such costly optimization calculations. The claim-to-fame of support vector machines (SVMs) is that they determine the boundaries of classes, based on the training data, and they have the capability of maximizing the margin of class separability in the feature space. (Boser et al., 1992) states that the number of parameters used in a support vector machine is automatically computed (see Vapnik-Chervonenkis (VC) dimension (Burges, 1998; Vapnik, 1998)) to present a solution in terms of a linear combination of a subset of observed (training) vectors, which are located closest to the decision boundary. These vectors are called support vectors and the model is known as a support vector machine. Vapnik (Vapnik, 1979) pioneered the statistical learning theory of SVMs, which is based on minimizing the classification error of both the training data and some unknown (held-out) data. Of course, the core of support vector machines and other kernel techniques stems from 19 Speaker Recognition 18 Will-be-set-by-IN-TECH much earlier work on setting up and solving integral equations. Hilbert (Hilbert, 1912) was one of the main developers of the formulation of integral equations and kernel transformations. One of the major problems with SVMs is their intensive need for memory and computation power at the training stage. Training of SVMs for speaker recognition also suffers from these limitations. To address this issue, new techniques have been developed to split the problem into smaller subproblems which would then be solved in parallel as a network of problems. One such technique is known as cascade SVM (Tveit & Engum, 2003) for which certain improvements have also been proposed in the literature (Zhang et al., 2005). Some of the shortcomings of SVMs have been addressed by combining them with other learning techniques such as fuzzy logic and decision trees. Also, to speed up the training process, several techniques based on the decomposition of the problemand selective use of the training data have been proposed. In application to speaker recognition, experimental results have shown that SVM implementations of speaker recognition are slightly inferior to GMM approaches. However, it has also been noted that systems which combine GMM and SVM approaches often enjoy a higher accuracy, suggesting that part of the information revealed by the two approaches may be complementary (Solomonoff et al., 2004). For a detailed coverage, see (Beigi, 2011). In general SVMs are two-class classifiers. That’s why they are suitable for the speaker verification problem which is a two-class problem of comparing the voice of an individual to his/her model versus a background population model. N-class classification problems such as speaker identification have to be reduced to N two-class classification problems where the i th two-class problem compares the i th class with the rest of the classes combined (Vapnik, 1998). This can become quite computationally intensive for large-scale speaker identification problems. Another problem is that the Kernel function being used by SVMs is almost magically chosen. 6.3.3 Neural networks Another modeling paradigm is the neural network perspective. There are quite a number of different neural networks and related architectures such as feed forward networks, TDNNs, probabilistic random access memory or pRAM models, Hierarchical Mixtures of Experts or HMEs, etc. It would take an enormous amount of time to go through all these and other possibilities. See (Beigi, 2011) for details. 6.3.4 Model adaptation (enrollment) For a new person being enrolled in the system, the base speaker-independent models are modified to match the a-posteriori statistics of the enrolled person or target speaker’s sample enrollment speech. This is done by any technique such as maximum a-posteriori probability estimation (MAP), for example, using expectation maximization (EM), or maximumlikelihood linear regression for text-independent systems or simply by modifying the counts of the transitions on a hidden Markov model (HMM) for text-dependent systems. 7. Speaker recognition At the identification and verification stage, a new sample is obtained for the test speaker. In the identification process, the sample is used to compute the likelihood of this sample being generated by the different models in the database. The identity of the model that returns the highest likelihood is returned as the identity of the test speaker. In identification, the 20 Biometrics Speaker Recognition 19 results are usually ranked by these likelihoods. To ensure a good dynamic range and better discrimination capability, log of the likelihood is computed. At the verification stage, the process becomes very similar to the identification process described earlier, with the exception that instead of computing the log likelihood for all the models in the database, the sample is only compared to the model of the target speaker and the background or cohort models. If the target speaker model provides a better log likelihood, the test speaker is verified and otherwise rejected. The comparison is done using the Log Likelihood Ratio (LLR) test. An extension of speaker recognition is diarization which includes segmentation followed by speaker identification and sometimes verification. The segmentation finds abrupt changes in the audio stream. Bayesian Information Criterion (BIC) (Chen & Gopalakrishnan, 1998) and Generalized Likelihood Ratio (GLR) techniques and their combination (Ajmera & McCowan, 2004) as well as other techniques (Beigi & Maes, 1998) have been used for the initial segmentation of the audio. Once the initial segmentation is done, a limited speaker identification procedure allows for tagging of the different parts with different labels. Figure 10 shows such a results for a two-speaker segmentation. 0 5 10 15 20 25 30 35 40 45 50 −1 0 1 Time (s) F r e q u e n c y ( H z ) S p e a k e r A S p e a k e r B S p e a k e r A S p e a k e r B U n k n o w n S p e a k e r S p e a k e r A S p e a k e r B 0 5 10 15 20 25 30 35 40 45 50 0 500 1000 1500 2000 2500 3000 3500 4000 Fig. 10. Segmentation and labeling of two speakers in a conversation using turn detection followed by identification 7.1 Representation of results Speaker identification results are usually presented in terms of the error rate. They may also be presented as the error rate based on the true result being present in the top N matches. This case is usually more prevalent in the cases where identification is used to prune a large set of speakers to only a handful of possible matches so that another expert system (human or machine) would finalize the decision process. In the case of speaker verification, the method of presenting the results is somewhat more controversial. In the early days in the field, a Receiver Operating Characteristic (ROC) curve was used (Beigi, 2011). For the past decade, the Detection Error Trade-Off (DET) curve (Martin et al., 1997; Martin & Przybocki, 2000) has been more prevalent, with a measurement of the cost of producing the results, called the Detection Cost Function (DCF) (Martin & Przybocki, 2000). 21 Speaker Recognition 20 Will-be-set-by-IN-TECH Figures 11 and 12 show sample DET curves for two sets of data underscoring the difference in performances. Recognition results are usually quite data-dependent. The next section will speak about some open problems which degrade results. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 2 4 6 8 10 12 False Acceptance (%) F a l s e R e j e c t i o n ( % ) Fig. 11. DET Curve for quality data 0 10 20 30 40 50 60 70 80 90 0 10 20 30 40 50 60 70 80 90 100 False Acceptance (%) F a l s e R e j e c t i o n ( % ) Fig. 12. DET Curve for highly mismatched and noisy data There is a controversial operating point on the DET curve which is usually marked as the point of comparison between different results. This point is called the Equal Error Rate (EER) and signifies the operating point where the false rejection rate and the false acceptance rate are equal. This point does not carry any real preferential information about the “correct” or “desired” operating point. It is mostly a point of convenience which is easy to denote on the curve. 22 Biometrics Speaker Recognition 21 8. State of the art In designing a practical speaker recognition system, one should try to affect the interaction between the speaker and the engine to be able to capture as many vowels as possible. Vowels are periodic signals which carry much more information about the resonance subtleties of the vocal tract. In the text-dependent and text-prompted cases, this may be done by actively designing prompts that include more vowels. For text-independent cases, the simplest way is to require more audio in hopes that many vowels would be present. Also, when speech recognition and natural language understanding modules are included (Figure 1), the conversation may be designed to allow for higher vowel production by the speaker. As mentioned earlier, the greatest challenge in speaker recognition is channel-mismatch. Considering the general communication system given by Figure 13, it is apparent that the channel and noise characteristics at the time of communication are modulated with the original signal. Removing these channel effects is the most important problem in information theory. This is of course a problem where the goal is to recognize the message being sent. It is, however, a much bigger problem when the quest is the estimation of the model that generated the message – as it is with the speaker recognition problem. In that case, the channel characteristics have mixed in with the model characteristics and their separation is nearly impossible. Once the same source is transmitted over an entirely different channel with its own noise characteristics, the problem of learning the source model becomes even harder. Fig. 13. One-way communication Many techniques are used for resolving this problem, but it is still the most important source of errors in speaker recognition. It is the reason why most systems that have been trained on a predetermined set of channels, such as landline telephone, could fail miserably when cellular (mobile) telephones are used. The techniques that are being used in the industry are listed here, but there are more techniques being introduced every day: • Spectral Filtering and Cepstral Liftering – Cepstral Mean Subtraction (CMS) or Cepstral Mean Normalization (CMN) (Benesty et al., 2008) – Cepstral Mean and Variance Normalization (CMVN) (Benesty et al., 2008) – Histogram Equalization (HEQ) (de la Torre et al., 2005) and Cepstral Histogram Normalization (CHN) (Benesty et al., 2008) – AutoRegressive Moving Average (ARMA) (Benesty et al., 2008) – RelAtive SpecTrAl (RASTA) Filtering (Hermansky, 1991; van Vuuren, 1996) – J-RASTA (Hardt & Fellbaum, 1997) – Kalman Filtering (Kim, 2002) • Other Techniques – Vocal Tract Length Normalization (VTLN) – first introduced for speech recognition: (Chau et al., 2001) and later for speaker recognition (Grashey & Geibler, 2006) – Feature Warping (Pelecanos & Sridharan, 2001) 23 Speaker Recognition 22 Will-be-set-by-IN-TECH – Feature Mapping (Reynolds, 2003) – Speaker Model Synthesis (SMS) (R. et al., 2000) – Speaker Model Normalization (Beigi, 2011) – H-Norm (Handset Normalization) (Dunn et al., 2000) – Z-Norm and T-Norm (Auckenthaler et al., 2000) – Joint Factor Analysis (JFA) (Kenny, 2005) – Nuisance Attribute Projection (NAP) (Solomonoff et al., 2004) – Total Variability (i-vector) (Dehak et al., 2009) Recently, depending on whether GMMs are used or SVMs, the two techniques of joint factor analysis (JFA) and nuisance attribute projection (NAP) have been used respectively, in most research reports. Joint factor analysis (JFA) (Kenny, 2005) is based on factor analysis (FA) (Jolliffe, 2002). FA is a linear transformation which makes the assumption of having an explicit model which differentiates it from principal component analysis (PCA) and linear discriminant analysis (LDA). In fact in some perspective, it may be seen as a more general version of PCA. FA assumes that the underlying random variable is composed of two different components. The first component is a random variable, called the common factors, which has a lower dimensionality compared to the combined random state, X, and the observation, Y. It is called the vector of common factors since the same vector, Θ : θθθ : R 1 → R M , M <= D, is a component of all the samples of y n . The second component is the, so called, vector of specific factors, or sometimes called the error or the residual vector. It is denoted by E : (e) D1 . Therefore, this linear FA model for a specific randomvariable, ˜ Y : ˜ y : R q → R D , related to the observed randomvariable Y may be written as follows, ˜ y n = Vθθθ n +e n (9) where V : R M → R D is known as the factor loading matrix and its elements, (V) dm , are known as the factor loadings. Samples of random variable Θ : (θθθ n ) M1 , n ∈ {1, 2, · · · , N} are known as the vectors of common factors, since due to the linear combination nature of the factor loading matrix, each element, (θθθ) m , has a hand in shaping the value of (generally) all ( ˜ y n ) d , d ∈ {1, 2, · · · , D}. Samples of random variable E : e n , n ∈ {1, 2, · · · , N} are known as vectors of specific factors, since each element, (e n ) d is specifically related to a corresponding, ( ˜ y n ) d . JFA uses the concept of FA to split the space of the model parameters into speaker model parameters and channel parameters. It makes the assumption that the channel parameters are normally distributed, have a smaller dimensionality, and are common to all training samples. The model parameters, on the other hand, are common for each speaker. This separation allows for learning the channel characteristics in the formof separate model parameters, hence producing pure and somewhat channel-independent speaker models. Nuisance attribute projection (NAP) (Solomonoff et al., 2004) is a method of modifying the original kernel, being used for the support vector machine (SVM) formulation, to one with the capability of telling specific channel information apart. The premise behind this approach is that by doing so, in both training and recognition stages, the systemwill not have the ability to distinguish channel specific information. This channel specific information is what is dubbed nuisance by (Solomonoff et al., 2004). NAP is a projection technique which assumes that most of the information related to the channel is stored in specific low-dimensional subspaces 24 Biometrics Speaker Recognition 23 of the higher dimensional space to which the original features are mapped. Furthermore, these regions are assumed to be somewhat distinct from the regions which carry speaker information. Some even more recent developments have been made in speaker modeling. The identity vector or i-vector is a new representation of a speaker in a space of speakers called the total variability space. This model came from an observation by (Dehak et al., 2009) that the channel space in JFA still contained some information which may be used to distinguish speakers. This triggered the following representation of the GMM supervector of means (μμμ) which contains both speaker- and channel-dependent information. μμμ = μμμ I +Tωωω (10) In Equation 10, μμμ is assumed to be normally distributed with E {μμμ} = μμμ I , where μμμ I is the GMM supervector computed over the speaker- and channel-independent model which may be chosen to be the universal background model. The covariance for μμμ is assumed to be Cov(μμμ) = TT T , where T is a low-rank matrix, and ωωω is the i-vector which is a standard normally distributed vector (p(ωωω) = N(0, I)). The i-vector represents the coordinates of the speaker in the, so-called, total variability space. 9. Future of the research There are many challenges that have not been fully addressed in different branches of speaker recognition. For example, the large-scale speaker identification problem is one that is quite hard to handle. In most cases when researchers speak of large-scale in the identification arena, they speak of a few thousands of enrolled speakers. As the number of speakers increases to millions or even billions, the problem becomes quite challenging. As the number of speakers increases, doing an exhaustive match through the whole population becomes almost computationally implausible. Hierarchical techniques (Beigi et al., 1999) would have to be utilized to handle such cases. In addition, the speaker space is really a continuum. This means that if one considers a space where speakers who are closer in their vocal characteristics would be placed near each other in that space, then as the number of enrolled speakers increases, there will always be a new person that would fill in the space between any two neighboring speakers. Since there are intra-speaker variabilities (differences between different samples taken from the same speaker), the intra-speaker variability will be at some point more than inter-speaker variabilities, causing confusion and eventually identification errors. Since there are presently no large databases (in the order of millions and higher), there is no indication of the results, both in terms of the speed of processing and accuracy. Another challenge is the fact that over time, the voice of speakers may change due to many different reasons such as illness, stress, aging, etc. One way to handle this problem is to have models which constantly adapt to changes (Beigi, 2009). Yet another problem plagues speaker verification. Neither background models nor cohort models are error-free. Background models generally smooth out many models and unless the speaker is considerably different from the norm, they may score better than the speaker’s own model. This is especially true if one considers the fact that nature is usually Gaussian and that there is a high chance that the speaker’s characteristics are close to the smooth background model. If one were to only test the target sample on the target model, this would not be a problem. But since a test sample which is different from the target sample (used for creating the model) is used, the intra-speaker variability might be larger than the inter-speaker variability between the test speech and the smooth background model. 25 Speaker Recognition 24 Will-be-set-by-IN-TECH There are, of course, many other open problems. Some of these problems have to do with acceptable noise levels until break-down occurs. Using a cellular telephone with its inherently bandlimited characteristics in a very noisy venue such as a subway (metro) station is one such challenge. Given the number of different operating conditions in invoking speaker recognition, it is quite difficult for technology vendors to provide objective performance results. Results are usually quite data-dependent and different data sets may pronounce particular merits and downfalls of each provider’s algorithms and implementation. A good speaker verification system may easily achieve an 0% EER for clean data with good inter-speaker variability in contrast with intra-speaker variability. It is quite normal for the same “good” system to show very high equal error rates under severe conditions such as high noise levels, bandwidth limitation, and small relative inter-speaker variability compared to intra-speaker variability. However, under most controlled conditions, equal error rates below 5% are readily achieved. Similar variability in performance exists in other branches of speaker recognition, such as identification, etc. 10. References Ajmera, J. & McCowan, I.and Bourlard, H. (2004). Robust speaker change detection, IEEE Signal Processing Letters 11(8): 649–651. Auckenthaler, R., Carey, M. & Lloyd-Thomas, H. (2000). Score normalization for text-independent speaker verification systems, Digital Signal Processing 10(1–3): 42–54. Beigi, H. (2009). Effects of time lapse on speaker recognition results, 16th Internation Conference on Digital Signal Processing, pp. 1–6. Beigi, H. (2011). Fundamentals of Speaker Recognition, Springer, New York. ISBN: 978-0-387-77591-3. Beigi, H. & Markowitz, J. (2010). Standard audio format encapsulation (safe), Telecommunication Systems pp. 1–8. 10.1007/s11235-010-9315-1. URL: http://dx.doi.org/10.1007/s11235-010-9315-1 Beigi, H. S., Maes, S. H., Chaudhari, U. V. & Sorensen, J. S. (1999). A hierarchical approach to large-scale speaker recognition, EuroSpeech 1999, Vol. 5, pp. 2203–2206. Beigi, H. S. & Maes, S. S. (1998). Speaker, channel and environment change detection, Proceedings of the World Congress on Automation (WAC1998). Benesty, J., Sondhi, M. M. & Huang, Y. (2008). Handbook of Speech Processing, Springer, New york. ISBN: 978-3-540-49125-5. Bimbot, F., Bonastre, J.-F., Fredouille, C., Gravier, G., Magrin-Chagnolleau, I., Meignier, S., Merlin, T., Ortega-Garcia, J., Petrovska-Delacrétaz, D. & Reynolds, D. (2004). A tutorial on text-independent speaker verification, EURASIP Journal on Applied Signal Processing 2004(4): 430–451. Bogert, B. P., Healy, M. J. R. & Tukey, J. W. (1963). The quefrency alanysis of time series for echoes: Cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, in M. Rosenblatt (ed.), Time Series Analysis, pp. 209–243. Ch. 15. Boser, B. E., Guyon, I. M. & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers, Proceedings of the fifth annual workshop on Computational learning theory, pp. 144–152. Burges, C. J. (1998). Atutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2: 121–167. 26 Biometrics Speaker Recognition 25 Burrus, C. S., Gopinath, R. A. & Guo, H. (1997). Introduction to Wavelets and Wavelet Transforms: A Primer, Prentice Hall, New york. ISBN: 0-134-89600-9. Campbell, J.P., J. a. (1997). Speaker recognition: a tutorial, Proceedings of the IEEE 85(9): 1437–1462. Chau, C. K., Lai, C. S. & Shi, B. E. (2001). Feature vs. model based vocal tract length normalization for a speech recognition-based interactive toy, Active Media Technology, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, pp. 134–143. ISBN: 978-3-540-43035-3. Chen, S. S. &Gopalakrishnan, P. S. (1998). Speaker, environemnt and channel change detection and clustering via the bayesian inromation criterion, IBM Techical Report, T.J. Watson Research Center. Dehak, N., Dehak, R., Kenny, P., Brummer, N., Ouellet, P & Dumouchel, P. (2009). Support Vector Machines versus Fast Scoring in the Low-Dimensional Total Variability Space for Speaker Verification, Interspeech, pp. 1559–1562. de la Torre, A., Peinado, A. M., Segura, J. C., Perez-Cordoba, J. L., Benitez, M. C. & Rubio, A. J. (2005). Histogram equalization of speech representation for robust speech recognition, IEEE Transaction of Speech and Audio Processing 13(3): 355–366. Duda, R. O. & Hart, P. E. (1973). Pattern Classification and Scene Analysis, John Wiley & Sons, New York. ISBN: 0-471-22361-1. Dunn, R. B., Reynolds, D. A. & Quatieri, T. F. (2000). Approaches to speaker detection and tracking in conversational speech, Digital Signal Processing 10: 92–112. Furui, S. (2005). 50 years of progress in speech and speaker recognition, Proc. SPECOM, pp. 1–9. Grashey, S. & Geibler, C. (2006). Using a vocal tract length related parameter for speaker recognition, Speaker and Language Recognition Workshop, 2006. IEEE Odyssey 2006: The, pp. 1–5. Gray, H. (1918). Anatomy of the Human Body, 20th edn, LEAand FEBIGER, Philadelphia. Online version, New York (2000). URL: http://www.Bartleby.com Hardt, D. & Fellbaum, K. (1997). Spectral subtraction and rasta-filtering in text-dependent hmm-based speaker verification, Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, Vol. 2, pp. 867–870. Hermansky, H. (1990). Perceptual linear predictive (plp) analysis of speech, 87(4): 1738–1752. Hermansky, H. (1991). Compensation for the effect of the communication channel in the auditory-like analysis of speech (rasta-plp), Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH-91), pp. 1367–1370. Hilbert, D. (1912). Grundzüge Einer Allgemeinen Theorie der Linearen Integralgleichungen (Outlines of a General Theory of Linear Integral Equations), Fortschritte der Mathematischen Wissenschaften, heft 3 (Progress in Mathematical Sciences, issue 3), B.G. Teubner, Leipzig and Berlin. In German. Originally published in 1904. Jolliffe, I. (2002). Principal Component Analysis, 2nd edn, Springer, New york. Kenny, P. (2005). Joint factor analysis of speaker and session varaiability: Theory and algorithms, Technical report, CRIM. URL: http://www.crim.ca/perso/patrick.kenny/FAtheory.pdf Kim, N. S. (2002). Feature domain compensation of nonstationary noise for robust speech recognition, Speech Communication 37(3–4): 59–73. 27 Speaker Recognition 26 Will-be-set-by-IN-TECH Manning, C. D. (1999). Foundations of Statistical Natural Language Processing, The MIT Press, Boston. ISBN: 0-26-213360-1. Martin, A., Doddington, G., Kamm, T., Ordowski, M. & Przybocki, M. (1997). The det curve in assessment of detection task performance, Eurospeech 1997, pp. 1–8. Martin, A. &Przybocki, M. (2000). The nist 1999 speaker recognition evaluation – an overview, Digital Signal Processing 10: 1–18. Nolan, F. (1983). The phonetic bases of speaker recognition, Cambridge University Press, New York. ISBN: 0-521-24486-2. Pelecanos, J. & Sridharan, S. (2001). Feature warping for robust speaker verification, A Speaker Odyssey - The Speaker Recognition Workshop, pp. 213–218. Pollack, I., Pickett, J. M. &Sumby, W. (1954). On the identification of speakers by voice, Journal of the Acoustical Society of America 26: 403–406. Poritz, A. B. (1988). Hidden markov models: a guided tour, International Conference on Acoustics, Speech, and Signal Processing (ICASSP-1988), Vol. 1, pp. 7–13. R., T., B., S. & L., H. (2000). A model-based transformational approach to robust speaker recognition, International Conference on Spoken Language Processing, Vol. 2, pp. 495–498. Reynolds, D. A. (2003). Channel robust speaker verification via feature mapping, Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP ’03). 2003 IEEE International Conference on, Vol. 2, pp. II–53–6. Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted gaussian miscture models, Digital Signal Processing 10: 19–41. Shearme, J. N. & Holmes, J. N. (1959). An experiment concerning the recognition of voices, Language and Speech 2: 123–131. Solomonoff, A., Campbell, W. & Quillen, C. (2004). Channel compensation for svm speaker recognition, The Speaker and Language Recognition Workshop Odyssey 2004, Vol. 1, pp. 57–62. Tosi, O. I. (1979). Voice Identification: Theory and Legal Applications, University Park Press, Baltimore. ISBN: 978-0-839-11294-5. Tveit, A. & Engum, H. (2003). Parallelization of the incremental proximal support vector machine classifier using a heap-based tree topology, Workshop on Parallel Distributed Computing for Machine Learning. USC (2005). Disability census results for 2005, World Wide Web. URL: http://www.census.gov van Vuuren, S. (1996). Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch, International Conference on Spoken Language Processing (ICSLP), pp. 784–787. Vapnik, V. N. (1979). Estimation of Dependences Based on Empirical Data, russian edn, Nauka, Moscow. English Translation: Springer-Verlag, New York, 1982. Vapnik, V. N. (1998). Statistical learning theory, John Wiley, New York. ISBN: 0-471-03003-1. Viswanathan, M., Beigi, H. S. & Maali, F. (2000). Information access using speech, speaker and face recognition, IEEE International Conference on Multimedia and Expo (ICME2000). Zhang, J.-P., Li, Z.-W. & Yang, J. (2005). A parallel svm training algorithm on large-scale classification problems, International Conference on Machine Learning and Cybernetics, Vol. 3, pp. 1637–1641. 28 Biometrics 2 Finger Vein Recognition Kejun Wang, Hui Ma, Oluwatoyin P. Popoola and Jingyu Li Pattern Recognition & Intelligent Systems Department, College of Automation, Harbin Engineering University China 1. Introduction Smart recognition of human identity for security and control is a global issue of concern in our world today. Financial losses due to identity theft can be severe, and the integrity of security systems compromised. Hence, automatic authentication systems for control have found application in criminal identification, autonomous vending and automated banking among others. Among the many authentication systems that have been proposed and implemented, finger vein biometrics is emerging as the foolproof method of automated personal identification. Finger vein is a unique physiological biometric for identifying individuals based on the physical characteristics and attributes of the vein patterns in the human finger. It is a fairly recent technological advance in the field of biometrics that is being applied to different fields such as medical, financial, law enforcement facilities and other applications where high levels of security or privacy is very important. This technology is impressive because it requires only small, relatively cheap single-chip design, and has a very fast identification process that is contact-less and of higher accuracy when compared with other identification biometrics like fingerprint, iris, facial and others. This higher accuracy rate of finger vein is not unconnected with the fact that finger vein patterns are virtually impossible to forge thus it has become one of the fastest growing new biometric technology that is quickly finding its way from research labs to commercial development. Historically, R&D at Hitachi of Japan (1997-2000) discovered that finger vein pattern recognition was a viable biometric for personal authentication technology and by 2000-2005 were the first to commercialize the technology into different product forms, such as ATMs. Their research reports false acceptance rate (FAR) of as low as 0.0001 % and false reject rate (FRR) of 0.1%. Today 70% of major financial institutions in Japan are using finger vein authentication. Fig. 1. Hitachi of Japan history of research & development on finger-vein recognition technology Biometrics 30 Fingerprints have been the most widely used and trusted biometrics. The reasons being: the ease of acquiring fingerprints, the availability of inexpensive fingerprint sensors and a long history of its use. However, limitations like the deterioration of the epidermis of the fingers, finger surface particles etc result in inaccuracies that call for more accurate and robust methods of authentication. Vein recognition technology however offers a promising solution to these challenges due the following characteristics. (1) Its universality and uniqueness. Just as individuals have unique fingerprints, so also they do have unique finger vein images. The vein images of most people remain unchanged despite ageing. (2) Hand and finger vein detection methods do not have any known negative effects on body health. (3) The condition of the epidermis has no effect on the result of vein detection. (4) Vein features are difficult to be forged and changed even by surgery [1]. These desirable properties make vein recognition a highly reliable authentication method. Vein object extraction is the first crucial step in the process. The aim is to obtain vein ridges from the background. Recognition performance relates largely to the quality of vein object extraction. The standard practice is to acquire finger vein images by use of near-infrared spectroscopy. When a finger is placed across near infra-red light rays of 760 nm wavelength, finger vein patterns in the subcutaneous tissue of the finger are captured because deoxygenated hemoglobin in the vein absorb the light rays. The resulting vein image appears darker than the other regions of the finger, because only the blood vessels absorb the rays. The extraction method has a direct impact on feature extraction and feature matching [2]. Therefore, vein object extraction significantly affects the effectiveness of the entire system. 2. Processing After vein image extraction, comes segmentation. The traditional vein extraction technology can be classified into three broad categories according to their approach to segmentation i.e separating the actual finder veins from the background and noise. There are those based on region information, those based on edge information, and those based on particular theories and tools. However, the application of the traditional single- threshold segmentation methods such as fixed threshold, total mean, total Otsu to perform segmentation, faces limitations in obtaining the desired accurate segmentation results. Using multi-threshold methods like local mean and local Otsu, improve these results but still cannot effectively deal with noise and over-segmentation effects [3], [4], [5], [6], [7],[8]. In a related research, reference [9] proposed an oriented filter method to enhance the image in order to eliminate noise and enhance ridgeline. Authors in [10] used the directionality feature of fingerprint to present a fingerprint image enhancement method based on orientation field. These two methods take the directionality characteristic of fingerprints into account, so they can enhance and de-noise fingerprint images effectively. Finger vein pattern also has textural and directionality features, with directionality being consistent within the local area. Inspired by methods in [9] and [10], we discuss in this chapter, finger vein pattern extraction methods using oriented filtering from the directionality feature of veins. These utilize the directionality feature of finger vein images using a group of oriented filters, and then extracting the vein object from an enhanced oriented filter image. Finger Vein Recognition 31 2.1 Normalization Normalization is a pixel-wise operation often used in image processing. The main purpose of normalization is to get an output image with desirable mean and variance, which facilitates the subsequent processing. The uniformly illuminated image becomes normalized by this formula: ÷ ÷ = = = × ¿ ¿ 1 1 0 0 1 ( , ) M N i j M I i j M N (1) ÷ ÷ = = = ÷ × ¿ ¿ 1 1 2 0 0 1 ( ( , ) ( )) M N i j VAR I i j M I M N (2) ¦ + × ÷ > ¦ ¦ = ´ ¦ ÷ × ÷ < ¦ ¹ 2 0 0 2 0 0 ( ( , ) ) , ( , ) ( , ) ( ( , ) ) , ( , ) VAR M I i j M I i j M VAR G i j VAR M I i j M I i j M VAR (3) Where Mand VAR denote the estimated mean and variance of input image and = 0 150 M , = 0 255 VAR are desired mean and variance values respectively. After normalization, the output image is ready for next processing step. The result of the above- mentioned process is shown in Fig.2. We note that Fig.2 image has lost some of its contrast but now has uniform illumination. (a) Original finger vein image (b) Normalized finger vein image Fig. 2. Original image and Normalized image 2.2 Oriented filter enhancement Finger vein pattern has directionality characteristics, which the traditional filter methods do not take into account; therefore, its resultant filtering enhancement is not satisfactory. We propose a vein pattern extraction method using oriented filtering technology that takes into account the directionality feature of the veins. This algorithm utilizes a group of oriented filters to filter the image depending on the orientation of the local ridge. Biometrics 32 a. Calculation of directional image The directional image is an image transform, where we use the direction of every pixel of an image to represent the original vein image. Pixels’ direction refers to the orientation of continuous gray value. We can determine the direction of pixel according to the gray distribution of the neighborhood. Pixels along the vein ridge have minimum gray level difference while pixels perpendicular to the vein ridge have the maximum gray level difference. To estimate the orientation field of vein image, the direction of the vein is quantized into eight directions. Using a × 9 9 template window as shown in Fig.2, we choose reference pixel ( , ) p i j the center of the direction template. The values 1-8 of the template correspond to eight directions rotating from 0 tot . Each interval has and angle of t /8 in an anti- clockwise direction from the horizontal axis. The main steps of the method are: 1. For every pixel, use the × 9 9 rectangular window (shown as Fig.2) to obtain the pixel gray value average = ( 1, 2...8) i M i . 2. Divide = ( 1, 2...8) i M i .into four groups. 1 M and 5 M are perpendicular in direction, so they belong to the same group. For the same reason, 2 M and 6 M , 3 M and 7 M , 4 M and 8 M belong to the same group. In each group, calculate AM- the absolute difference between two gray value averages j M and +4 j M . + A = ÷ = 4 , 1,..., 4 j j M M M j (4) where j is the direction of the vein. 3. Choose the maximum AM to determine the pixel’s possible directions max j and max j +4. 4. Determine the actual direction of ( , ) p i j by comparing its gray value with the gray value averages of max j and max j +4. The closer value is its direction. Therefore, the pixels direction is given by: ¦ ÷ < ÷ + ¦ = ´ ¦ + ¹ max max , 4 ( , ) 4, j j j if M M M M D x y j otherwise (5) When the above process is performed on each pixel in the image, we can obtain the directional image ( , ) D x y of the vein image. Due to the presence of noise in the vein image, the estimated orientation field may not always be correct. In a small local neighborhood, the pixels’ orientations are generally uniform; and so a local ridge orientation is specified for a block rather than at every pixel. Using a smoothing process on the point directional map, a continuous directional map is obtained. A continuous × w w window can be used to modify the incorrect ridge orientation and smooth the point directional map. The experiments show that w =8 is very good. We obtain each window’s directional histogram and choose the peak value of the direction histogram as ( , ) P x y ’s orientation. The continuous directional map ( , ) O x y is defined as: = ( , ) (max( )) i O x y ord N (6) Finger Vein Recognition 33 7 6 5 4 3 8 7 6 5 4 3 2 8 2 1 1 p(i,j) 1 1 2 8 2 3 4 5 6 7 8 3 4 5 6 7 Fig. 3. 9×9 rectangular window Fig. 4. Directional image of finger vein Where = 1, 2...8 i , function (*) ord is used to obtain the subscript of element *. The directional image of finger vein is as shown in Fig.3 .Each color of directional image corresponds to a direction. As can be seen from the Fig.4, vein ridge has the feature of directionality, and the vein ridge orientation varies slowly in a local neighborhood. b. Oriented filter The vein directional image is a kind of textured pattern generated by using oriented filters based on directional map to enhance the original images. We designed eight filter masks, each one associated with the discrete ridge orientation of finger vein pixels. From the direction determined for a specific block (from the original image), a corresponding filter is selected to enhance this block image. The template coefficients of horizontal mask are designed first. To generate the seven other masks, the horizontal filter mask is rotated according to the direction of the vein. O’Gorman’s rules for filter design which is described for enhancing fingerprint images consists of four key points: 1. An appropriate filter template size. 2. The width of the filter template should be odd in order that the template is symmetric in the direction of horizontal and vertical. 3. In the vertical direction, central part of the filter template coefficient should be positive while both sides of the coefficient negative. 4. The sum of all template coefficients should be zero. Applying the above rules based on the direction of the finger vein, we modify the filter’s coefficients so that they decay from the center to both ends of the template. The oriented filter’s size is decided according to vein ridge width. From experiments, a filter template of size 7 has been shown to be quite effective. Biometrics 34 ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ /3 2 /3 2 /3 /3 /3 2 /3 2 /3 /3 /3 2 /3 2 /3 /3 /3 2 /3 2 /3 /3 /3 2 /3 2 /3 /3 /3 2 /3 2 /3 /3 /3 2 /3 2 /3 /3 c c c c c c c b b b b b b b a a a a a a a d d d d d d d a a a a a a a b b b b b b b c c c c c c c ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ 8 16 24 24 24 16 8 0 0 0 0 0 0 0 3 6 9 9 9 6 3 10 20 30 30 30 20 10 3 6 9 9 9 6 3 0 0 0 0 0 0 0 8 16 24 24 24 16 8 (a) Template coefficient of horizontal oriented filter (b) An example of oriented filter Fig. 5. Template coefficient of horizontal oriented filter and an example of oriented filter The coefficient spatial arrangement of the horizontal mask is shown in Fig.5. a. The coefficients of the filter template should meet the following given conditions: + + ÷ = 2 2 2 0 d a b c , where > > > > 0, 0; d a d c . An example of oriented filter is shown as Fig.5 (b). Now, aiming at every pixel ( , ) i j in the input image, we select a 3 neighborhood that takes ( , ) i j as the center. It is filtered with the mask that corresponds to the block orientation of the center ( , ) i j .This filtering technique is given by: u =÷ =÷ = + + ¿ ¿ 3 3 3 3 ( , ) ( , ) ( , ) x y f i j G i x j y g x y (7) where ( , ) i j represents the pixel of original image, and , x y represents the size of oriented filter template and u ( , ) g x y represents the corresponding coefficients of the template. d T represents the filtered image. It is possible that some gray values fall outside the [0, 255] range. Equation (8) is used to adjust the gray values to fall within the range. ÷ ' = × ÷ min max min ( , ) ( , ) ( , ) ( 255) ( , ) ( , ) f i j f i j f i j Round f i j f i j (8) Where ( , ) f i j represents the original gray of ( , ) i j , min ( , ) f i j represents smallest gray value of original image, max ( , ) f i j represents biggest gray value of original image, '( , ) f i j represents transformed gray of ( , ) i j and Round(.) is a rounding function. To generate the other seven masks, we rotate the horizontal filter mask according to the following equation: Where ( , ) i j represents the coordinates in the horizontal mask, and ' ' ( , ) i j represents the ones in the rotated mask. u u u u ( ( ( = ( ( ( ÷ ¸ ¸ ¸ ¸ ¸ ¸ ' ' cos sin sin cos i i j j (9) Whereu t = ÷ = ( 1) / , 1, 2,..., 8 d d , u represents the rotation angel, and d represents the direction value of ( , ) i j . To use this method, ( , ) i j is usually not an integer, so we need to use nearest neighbor interpolation to get the coefficients of the rotated mask. u ' ' ( , ) g i j (the coefficients in the rotated mask) is equal to 1 ( , ) g i j (the coefficients in the horizontal mask). Finger Vein Recognition 35 Suppose the four points ( , ) m m i j , ( , ) m n i j , ( , ) n m i j , ( , ) n n i j compose a square centered at pixel ( , ) i j , and u ( , ) m m g i j , u ( , ) m m g i j , u ( , ) n m g i j , u ( , ) n n g i j are their corresponding coefficients in the horizontal mask, where < < m n i i i , < < m n j j j . First, make interpolation between ( , ) m n i j and( , ) n n i j : = + ÷ × ÷ 1 1 1 ( , ) ( , ) ( ) [ ( , ) ( , )] n m n m n n m n g i j g i j i j g i j g i j (10) Then make interpolation between ( , ) m m i j ando : = + ÷ × ÷ 1 1 1 ( , ) ( , ) ( ) [ ( , ) ( , )] m m m m n m m m g i j g i j i i g i j g i j (11) Finally, make interpolation between 0 a and 0 b : u ' ' = + ÷ × ÷ ( , ) ( , ) ( ) [ ( , ) ( , )] m m n m g i j g i j j j g i j g i j (12) Once we have all eight filter masks, their coefficients can be used to enhance vein image. 2.3 Image segmentation NiBlack segmentation method is a commonly used simple and effective local dynamic threshold algorithm, so we choose this method to segment the image. The basic idea of this algorithm is that for every pixel in the input image; calculate their mean and variance of its × r r neighborhood. Then the result of the following formula is used as threshold to segment image. = + × ( , ) ( , ) ( , ) T x y m x y k s x y (13) Fig. 6(a). Image enhanced by oriented filter Fig. 6(b). Image after segmentation and noise removal Where ( , ) x y represents pixel in the input image, ( , ) T x y represents the threshold of ' ' ¬ e , a a M , o = = 0 ( ) ( ( )) d a H x T T a represents the mean value of o ÷ ÷ = 1 1 0 ( ( )) d a T T a ’s × r r neighborhood, and ( , ) s x y represents the standard deviation of ' = ( ) ( ) v a v a ’s o ' = 0 ( ( )) a T v a neighborhood, k is a correction coefficient. The experiments show that o = 0 ( ) ( ( )) v a T v a equal to 9 and o ÷ ' = 1 0 ( ) ( ( )) v a T v a equal to 0.01is excellent. There are some Biometrics 36 small black blocks in the background and some small white holes on the target object in the segmented image. Such noise can be removed with area elimination method. Because of variability in image acquisition and the inherent differences in individual samples, the size and ratios of extracted finger veins are often inconsistent. In order to facilitate further research there is a need for standardization of the segmented vein image height (and width). The normalized image is shown in Fig.7. In the experiment, we have standardized the height of the image to 80 pixels. Fig. 7. Standardized finger vein image width Accurate extraction of finger vein pattern is a fundamental step in developing finger vein based biometric authentication systems. The finger vein pattern extraction method proposed and discussed above extends traditional image segmentation methods, by extracting vein object from the oriented filter enhanced image. The addition of oriented filter operation extracts smooth and continuous vein features from not only high quality vein images but also noisy low quality images and does not suffer from the over-segmentation problem. 3. Feature extraction, fusion and matching Finger vein recognition as a feature for biometric recognition has excellent advantages such as being stable, contactless, unique, immune to counterfeiting, highly accurate etc. This makes finger-vein recognition widely considered as the most promising biometric technology for the future. Naoto Miura [14] proposed one method for finger-vein recognition based on template matching. In the experiment, the finger-vein image is first binarized, and then using a distance transform noise is removed, and embedded hidden Markov model is used for finger-vein recognition. This approach is time intensive, and another major limitation is that it cannot recognize distorted finger-vein images correctly. Kejun Wang [15] combined wavelet moment, PCA and LDA transform for finger-vein recognition. Here the metric of finger-vein image is converted to a one-dimensional vector, which has been reduced dimensionally. To deal with the problem of high dimensionality, researchers usually first partition the finger-vein image and then principal component analysis (PCA) is applied. To date, this has been the most popular method for dimensionality reduction in finger-vein recognition research. Xueyan Li [16] proposed a method, which combines two-dimensional wavelet and texture characteristic, to recognize the finger vein while Xiaohua Qian [17] used seven moment invariant finger vein features. Euclidean distance and a pre-defined threshold were used as the classifying criterion for matching and recognition. Chengbo Yu [13] defined valley regions as finger vein features such that real features could not be missed and the false features would not be extracted. Zhongbo Zhang [18] proposed an algorithm based wavelet and neural network, which extracts features at multi-scale. Zhang's algorithm can capture features from degraded images. Finger Vein Recognition 37 3.1 Novel finger vein recognition methods based using fusion approach The above-mentioned algorithms have different advantages for different problems in finger-vein recognition. However, because fingers have curved surfaces, finger vein diameter is not consistent and the texture characteristic is aperiodic. When near infrared light is used to acquire the image, the gray-scale is uneven and contrast is low; besides, finger veins are tiny and few in number, such that only very few features can be extracted. What is more, a change in the finger position can cause image translation and rotation and influence recognition negatively. To deal with these problems some novel fusion methods are used. First, we discuss a method based on relative distance and angle. This approach makes full use of the uniqueness of topology, the varied distances between the intersection points of two different vein images, and the differences in angles produced by these intersection points connections, all combined for recognition. This method overcomes the influence of image translation and rotation, because relative distance and angle don’t change. Therefore, the method based on these identified characteristics has great use in practice. 3.1.1 Theoretical basis Let a, b, and c be non-zero vectors. u is the angle between two vectors, and the length of line segment is written as   a .The thinned finger-vein image is illustrated as a function denoted as M, which is defined in field D ; where Mis a subset of D . Image translation and rotation occurs in D . Image translation and rotation implies that every point of the image is translated by a vector and rotated by the some angle. The relative distances and angles remain constant before and after translation and rotation, which is proved as follows. The topology produced by all the character point connections can be called the image M. Then Mcan be shown by the vectors: { } ÷ = e 0 1 1 , ,..., n a a a a n N . Let { } + + ÷ = 1 1 [ ] , ,..., s s s n a s a a a , were [ ] a s denotes a vector a , translated by s unit distances. If = < > , [ ] s g a a s , where < > , a b denotes the inner product of a and b ; and ÷ = 0 1 1 ( ) ( , ,..., ) n v a g g g , where ( ) v a is the convolution of a . Now after image M is translated by s unit distances in the plane D , we get the image ' = [ ] M M s . A random translation of image M is translated by d T , is = s s ( ) [ ] ,(0 ) d T a a s s n . Theorem 1. Suppose d T is a translation in D , then = ( ) ( ( )) v a v T a . Proof: ¬ e , s t N , there is < + > = < > [ ], [ ] , [ ] a t a s t a a s (14) + + - < + >= + = = = < > ¿ ¿ ¿ [ ], [ ] [ ] [ ] [0] [ ] [0] [ ] [0], [ ] i i i t i t i i i i i a t a s t a t a s t a a s a a s a a s From formula (14), we know¬ e , s t N , - = ( [ ]) ( ) s s g a t g a which satisfies = ( [ ]) ( ) v a t v a (15) For every d T , - = ( ) ( ) d T a a s which leads to = = ( ) ( [ ]) ( ( )) v a v a s v T a . Theorem 2. After transformation, the relative distances and angles produced by the character point connections are in unchanged Biometrics 38 Proof: In the plane D , ¬ 0 a , 0 b , e 0 c Mdenotes the line segment vector produced by character point connections. u 0 is the angle of 0 a and 0 b . Any angle o , makes the image Mrotate around its ordinate origin by o . ( o is positive while clockwise, and negative otherwise), which is a linear transform o T . This results in image M. For ¬ a and b , there is\ o = 0 0 [ , ] [ , ] a b a b T = 0 0 [ , ] a b o o o o ( ( ÷ ¸ ¸ cos sin sin cos (16) Here [ , ] a b is a homogeneous orthogonal rotated matrix, and o T = o o µ( ) T T T =1, µ · ( ) is the matrix radius. Then we can write o o = = = 0 0 0 T T T a a a a T T a a . Accordingly, =     0 b b (17) o o o < >=< >= < >=< > 2 0 0 0 0 0 0 , , , , a b T a T b T a b a b This leads to u u < > = · 0 , =arccos a b a b (18) Theorem 3. The relative distances and angles are invariable after the translation and rotation Proof: Suppose the image M converts to ' M after translation by d T and rotation by o T . ' ' ¬ e , a a M , suppose o = = 0 ( ) ( ( )) d a H x T T a , then o ÷ ÷ = 1 1 0 ( ( )) d a T T a . Suppose, too, o ÷ ' = 1 ( ) a T a , then o ÷ ' = 1 0 ( ) a T a . From theorem 1, we know ' = ( ) ( ) v a v a , further because of o ' = 0 ( ( )) a T v a and o = 0 ( ) ( ( )) v a T v a , so o ÷ ' = 1 0 ( ) ( ( )) v a T v a . All this leads up to = 0 0 ( ) ( ( )) v a v H a (19) 3.1.2 Method description Extract the intersecting points from the repaired thinned finger-vein image and connect all the points with each other. Compute the relative distances and angles to get the relative distance feature M, and the angle featureu . Fuse these two features by “Logical And”, and on this basis, match any two images to get the number of relative distances and angles that correlate. Only when both features are approximately the same, is the matching successful. Otherwise, the matching has failed. A. Finger vein topology Using Kejun Wang’s method [19] for pre-processing hand-back vein image, combined with region merging and watershed algorithm, the finger-vein skeleton is extracted, thinned and further repaired. A fully meshed topology is formed by selecting the intersecting points on the thinned finger-vein image as character points and connecting these points to each other with straight lines, partitioning the image into several regions as shown in Fig.8. Finger Vein Recognition 39 (i) (ii) (iii) (a) The raw image of finger-vein (b) The image after thinning (c) The image after repairing and marking the intersecting points (d) The image after extracting the intersecting points (e) The fully meshed image after connecting the intersecting points Fig. 8. The finger-vein image feature extraction process In Fig.8, (i) and (ii) are two finger-vein images from same source, so their topology is similar. However, (iii) is of a different source and its topology is obviously different from (i) and (ii). Specifically, the topology expresses an integral property and peculiarity of finger- veins, the relationship between corresponding character points is of importance. 3.1.3 Matching finger-vein images using relative distance and angles From the thinned finger-vein image of Fig.8 (b), we can see the random finger-vein pattern and inner structure. The inner characteristic points produced by the intersecting vein crossings reflect the unique property of the finger-vein. However, those breakpoints may be thought as finger-vein endpoints, which would influence recognition results. For this reason, the more reliable intersecting points are chosen to characterize finger-veins. Considering that different line segments are produced by intersecting points from different finger-vein images, the two features -relative distance and angle - are combined for matching. Relative distance and angle are essential attributes of finger-veins, which ensure the feature uniqueness and reflect different characteristics of finger-vein structure. Fusion of the two Biometrics 40 features with “Logical And” make the recognition results more reliable. Thus, matching two finger-vein images is converted into matching the similarity of topologies. The detailed steps are as follows. 1. Calculate the relative distances and angles of finger-vein image. Suppose, there are d points of intersection in one image, then the number of relative distance is ÷ ( 1) /2 d d . The number of angles produced by the point connections is ÷ ÷ ( 1)( 2) /2 d d d . Here a set of finger-vein image features is defined as u = ( , ) m u R l , where l is the distance of any two intersecting points, u is the angle produced by the point connections, m and u are the number index respectively. Suppose, u = 1 ( , ) m u R l and u = 2 ( , ) n v R l are two sets of finger-vein image features. 2. Compare m relative distances from 1 R with n relative distances from 2 R , by calculating the number of approximately similar relative distances. If the number is greater than the pre-defined threshold, go to next step; else, the matching is assumed to have failed. To take care of position error of those points, we define ÷ <   m n l l e to show the extent of similarity between any two Eigen values ( e is the allowable error range). From experimental analysis, e =0.0005 is very appropriate. 3. Suppose there are q eigenvalues of approximately equivalent relative distances, connect the q character points in the two sets respectively, with each other. Thus, z angles are produced, which are denoted as u 1 z and u 2 z in 1 R and 2 R respectively. On this basis, calculate the number of approximately equivalent angles. If the number is greater than the pre-defined threshold, the matching is successful; else, the matching is thought to have failed. Similarly, u u ' ÷ <   m n e is used to show the relationship of two approximately equivalent. From experimental analysis, ' e =0.006° is very appropriate. 3.2 Finger vein recognition based on wavelet moment fused with PCA transform 3.2.1 Finger vein feature extraction Different people have different finger lengths. Also, there can be variation in the image captured for the same person due to positioning during the image capture process. Thus if image sizes are not standardized, there is bound to be representation error which leads to a decrease in the recognition rate. In this part, we resize the vein image into a specific image block size to facilitate further processing. The original image is standardized to a height of 80 pixels and split along the width into 80 × 80 sub-image block size. If the image is split evenly (given that the image width is generally about 200 pixels) there will be loss of information that will affect recognition. Therefore, the sub-blocks are created with an overlap of 60 pixels for every 80 × 80 image sub-block. The original image can thus be split into 6-7sub-images, with sufficient characteristic quantities for identification. Set a matrix × m n A to represent the standardized images ( , ) f x y . ÷ = 0 1 2 1 [ , , ,..., ] mn n A A A A A (20) Which: i A is a column vector, e ÷ [0, 1] i n Here we define the sub-block of the image width w , standardized image height h (in experiment w=80,h=80 ). Sub-images are extracted at interval r when (in this experiment r =20). Finger Vein Recognition 41 Thus can get sub-image matrix: | | | | | | ÷ ÷ + ÷ + = = = 1 0 1 2 1 1 ,..., ,..., ... ,..., w r w r k kr w kr B A A B A A B A A [x] takes the maximum integer less than x . ÷ ( = + ( ¸ ¸ 1 n w k r (21) Thus we get a total of 1 2 , ,..., k B B B k sub- image, and the size of each sub-image is × w h . Then we extract features for each sub-image B i . The finger block, feature extraction and recognition process shown in Fig.12. 1 v  1 B 2 B k B 2 v  k v  1 2 ; ;...; k V v v v ( = ¸ ¸    feature vector: 1 v  1 B 2 B k B 2 v  k v  1 2 ; ;...; k V v v v ( = ¸ ¸    feature vector: Fig. 12. The sketch map of sub-image extraction 3.2.2 Wavelet transform and wavelet moments extraction Wavelet moment is an invariant descriptor for image features. A wavelet moment feature is invariant to image rotation, translation and scaling so it is successfully applied in the pattern recognition. For each sub-image ( ) , i B x y , its size is × w h . Applying two dimensional Mallat decomposition algorithm, we can make wavelet decomposed image ( ) , i B x y . Biometrics 42 Fig. 13. The sub-image and result of wavelet decomposition Setting ( ) ( ) = e 2 2 , , ( ) i f x y B x y L R to be the analyzed sub-image vein blocks, the wavelet decomposed layer is = + + + 1 2 3 1 1 1 1 ( , ) f x y A D D D (22) where 1 A is the scale for the low frequency component (i.e. approaching component), and 1 2 3 1 1 1 , , D D D are the scales for the horizontal, vertical and diagonal components respectively. ( ) ( ) ( ) ( ) | | = =< > ¿ 1 1 1 ( , ) 1 1 , , , , ( , ), , m n A c m n m n c m n f x y m n (23) ( ) ( ) ¢ ¢ = =< > ¿ 1 1 1 ( , ) 1 1 , , , ( , ) ( , ), ( , ) k k k m n k k D d m n m n d m n f x y m n (24) Where = 1, 2, 3 k = 1 ( , ) c m n is the coefficient of 1 A 1 k d is the coefficient of the three high frequency components. | 1 ( , ) m n is the scale function ¢ 1 ( , ) k m n is the wavelet function Daub4 was chosen for wavelet decomposition, as it produced better identification results from several experimental compared with other wavelets. We use the approximation wavelet coefficients j c to compute the wavelet moment [4]. Set , p q w expressed as (p+q) order central moments of ( , ) f x y . The wavelet moment approximation is: ( ) ÷ + + e ÷ + + e ¦ = ¦ ´ = ¦ ¹ ¿ ¿ ( 1) 0 , 1 , ( 1) , 1 , 2 , 2 ( , ) p q j p q p q m n Z p q j p q k k p q m n Z w m n c m n w m n d m n (25) Here we access the wavelet moment 22 w . 3.2.3 PCA transformation The advantage of using wavelet transform to reduce computation is explored here. Using each sub-image i B directly without the PCA transform not only leads to poor classification of extracted features but also huge computational cost. After the low-frequency wavelet sub- Finger Vein Recognition 43 images compression of the original image to about one-fourth of the original size, PCA decomposition is applied on the sub-image which greatly reduces computation. 3.2.4 The transformation matrix Here we analyze a layer of i B wavelet decomposition of the low-frequency sub-image. For PCA, 1 A is transformed to a separate /4 wh dimension of image vector ç = 1 ( ) Vec A . Five finger vein images per person (*same finger) The Five finger vein images of the m th person Fig. 14. The sketch map of sample classes after PCA transform To illustrate the problem, we take finger vein samples from a total set of c people. Each sample of the same finger has five images as shown in Fig.14. (Note that Fig.14 is only for illustration purpose. In practice, there is an interval of 20pixels between two adjacent sub- blocks, and an overlap of 60pixels as earlier described). The n-th sub-block set of the m-th person is indexed as , m n k ; where n = (1,2,3…., L) ; m L is the total number of sub-blocks for person m. Biometrics 44 To compare images we only use the min k sub-image set where min k = L = min 1 2 3 ( , , ... ) m L L L L . when min k = 5, then = min 1,1 1,2 1,5 ,5 min( , ,..., , ,..., ) c k k k k k (26) Thus, a total of = × min C c k of the available pattern classes, i.e. e e e 1 2 , ,..., C . Four of the corresponding samples in the i-th class ( = + 5 i m k ), = min 1, 2,... k k is the number of sub-image of each finger image, for simplicity, we write: ç ç ç ç ç ,1 ,2 ,3 , 4 , 5 , , , , i i i i i , and all are /4 wh dimension column vectors. The total number of training samples is = 5 N C . Mean of the i-th class training sample: ç ç = = ¿ 5 , 1 1 5 i i j j (27) Mean of all training samples: ç ç = = = ¿¿ 5 1 1 1 C ij i j N (28) The scatter matrix is: e ç ç ç ç = = ÷ ÷ ¿ 1 ( )( )( ) C T i i t i i S P (29) where e ( ) i P is prior probability of the i-th class of training samples. Then we can obtain the characteristic value ì ì 1 2 , ,..., of 1 S (the value of these features have been lined up in sequence by order of ì ì > > 1 2 ..., ) and its corresponding eigenvector¢ ¢ 1 2 , ,..., . Take d before the largest eigenvalue corresponding to the standard eigenvectors orthogonal transformation matrix ¢ ¢ ¢ = 1 2 [ , ,..., ] d P . For each sub-image blocks i B , through the wavelet decomposition of the low-frequency sub-image 1 A , 1 A in accordance with the preceding method into a column vector ç , to extract the features use the transformation matrix P obtained in the previous section, the following formula: ç = T e P (30) This = 1 2 [ , ,..., ] d e e e e is the PCA extraction of feature vectors from sub-image blocks. After several experiments, we found that when = 200 d we can get a good result, and when = 300 d i.e a 300-dimensional compression, we get the best recognition results. 3.2.5 LDA map In general, PCA method is the best for describing feature characteristics, but not the best for feature classification. In order to get better classification results, we use the LDA method for further classification of PCA features. Each sample is transformed into a lower d -dimensional space in the post-dimensional feature vector = 1 2 [ , ,..., ] i i i i d e e e e . Using PCA projection matrix P, = 1, 2,..., i N is the sample number. Our classifier design follows dimension reduction to get PCA feature vectors Finger Vein Recognition 45 1 2 , ,..., N e e e and form the class scatter matrix w S and the within class scatter matrix b S . Calculate the corresponding matrix ÷1 w b S S of the l largest eigenvalue eigenvectorso o o 1 2 , ,..., l . The l largest eigenvectors corresponding to the LDA transformation matrix o o o = 1 2 [ , ,..., ] LDA l W . Then we use the LDA transformation matrix LDA W as = = 1 2 [ , ,..., ] i i i T i l LDA i z z z z W e , = 1, 2,..., i N is the sample number. (31) Thus, we can use the best classification feature z vector to replace the feature vectors e for identification and classification. 3.2.6 Matching and recognition Through the above wavelet decomposition and PCA transform for each sub-image i B , we obtain wavelet moments 22 w and extract feature vector z of PCA and LDA. i B is characterized by = 22 [ ; ] i v w z . Matching feature vectors of finger1 = 22 [ ; ] i v w z and of finger 2 can be done as follows. The first step is the length of V and ' V and may not be the same, that is, k and ' k is not necessarily the same. Here we define: = min( , ') K k k (32) Taking the K vectors of V and ' V for comparison, first analyze the corresponding sub- image blocks i v and ' i v . | | = 22 ; i v w z , | | = 22 ' ' ; ' i v w z (33) From several experiments, we set two threshold vectors t w t z . Euclidean distance between i B sub-images, o i is defined for two feature vectors w and ' w from V and ' V . A matching score defined for V and ' V feature 22 w of wavelet moment of corresponding sub-image i B matching score: o o ÷ ¦ < ¦ = ´ ¦ ¹ _ 0 t i i t t i w if w w w mark else (34) Finally, we obtain wavelet moment feature of the finger matching score: = = ¿ 0 _ _ K i i w mark w mark (35) Similarly, we create a match V and ' V , of the feature vector z scores _ z mark . Finally, combining the scores: = × + × 1 2 _ _ _ total mark s w mark s z mark (36) 1 2 , s s are the share of feature matching scores, and > > 1 2 0, 0, s s + = 1 2 1 s s . Biometrics 46 Thus, if the finger vein 1 and 2 match _ total mark value score is greater than a given threshold, the two fingers match, otherwise they do not match. A minimum distance classifier can also be used for the recognition task. 4. Experimental results 4.1 Processing experimental results To verify the effectiveness of the proposed method, we test the algorithm using images from a custom finger vein image database. The database includes five images each of 300 individuals’ finger veins. Each image size is 320*240. (a) Original image 1 (c) NiBlack segmentation method (d) Our method (b) Original image 2 (e) NiBlack method (f) Our method Fig. 15. Experimental results We have used a variety of traditional segmentation algorithms and their improved algorithms to segment vein image. But segmentation results of vein image by these algorithms aren’t ideal. Because the result of NiBlack segmentation method is better than other methods [13], we use NiBlack segmentation method as the benchmark for comparison. Segmentation was done for all the images in our database using NiBlack segmentation method and using our method. Experimental results show that our method has better performance. To take full account of the original image quality factor, we select two typical images from our database with one from high quality images and the other from poor quality images to show the results of comparative test. Where Fig. 15(a) is the high-quality vein image in which veins are clear and the background noise is small. Fig. 15(b) is the low- quality vein image. The uneven illumination caused the finger vein image to be fuzzy, which seriously affects image quality. We extract veins feature by using our method and compare with results of the NiBlack segmentation method. Experimental results shown in Fig. 15(c) and Fig. 15(e) are obtained from the NiBlack method applied in [9]. This algorithm extracts smooth and continuous vein features of high-quality image. There are a few pseudo-vein characteristics in Fig. 15(c). But in Fig. 15(e), there is much noise in the segmentation results. Segmented image features have poor continuity and smoothness, and there is the effect of the over-segmentation. Experimental results show that apart from smoothness and continuity or removal of noise and pseudo-vein characteristics, the method proposed in this paper extracts vein features effectively not only from the high-quality Finger Vein Recognition 47 images but also from the low-quality vein images as shown in Fig.15(d) and Fig.15(f). We show that the algorithm proposed in this paper performs better than the traditional NiBlack method. 4.2 Relative distance and angle experimental results Finger-vein images (size 320×240) of 300 people were selected randomly from Harbin Engineering University finger-vein database. One forefinger vein image of each person was acquired, so there are 300 training images. Generally, a good recognition algorithm can be successfully trained on a small dataset to get the required parameters and achieve good performance on a large test dataset. Therefore four more images from the forefinger of those 300 people were acquired giving a total sum of 1200 images to be used as verification dataset. When matching, every sample is matched with others, so there are (300×299)/2 = 44850 matching times; 300 of which are legal, while the others are illegal matches. Two different verifying curves are shown in Fig.16. The horizontal axis stands for the matching threshold, and vertical axis stands for the corresponding probability density. The solid curve is legal matching curve, while the dashed is illegal. Both curves are similar to the Gaussian distribution. The two curves intersect, at a threshold of 0.41. The mean legal matching distance corresponds to the wave crest near to 0.21 on the horizontal axis, and the mean of illegal matching distance corresponds to the wave crest near to 0.62 on the horizontal axis. The two wave crests are far from each other with very small intersection. So this method can recognize different finger-veins, especially when the threshold is in the range [0.09-0.38], where the GAR is highest. Fig. 16. Legal matching curve and illegal matching curve The relationship between FRR and FAR is shown in Fig.17. For this method, the closer the ROC curve is to the horizontal axis, the higher the Genuine Acceptance Rate (GAR). Besides, the threshold should be set suitably according to the fact, when FRR and FAR are equivalent, the threshold is 0.47, that is to say, EER is 13.5%. In this case, GAR of the system is 86.5%. The result indicates that this method is reasonable, giving accurate finger-vein recognition. The method above compares the numbers of relative distances and angles which are approximately equivalent from two finger-vein images. In the second step, only the Biometrics 48 intersecting points which are matched successfully on the first step are used and thus, computation of superfluous information is avoided and only information vital to decision making is used. Only when the two matching steps are successful is the recognition successful. According to Theorem 3, the relative distance and angle would not change when even after image translation and rotation. So the proposed algorithm is an effective method for finger-vein recognition. Fig. 17. ROC curve of the method In 1:1 verifying mode, compare the one image out of 1200 samples in verifying set with the image, which has the same source with the former one, in training set to verify. The experiment result is shown in Tab.1, the times of success are 1120, and the rate of success is 93.33%. In 1:n recognition mode, compare the 1200 images with all images in training set, 360000 times in sum. The result is as Tab.2, the times of FAR is 25488, and GAR is 92.92%. Total matching times Number of successes Number of failures Success rate (%) FRR(%) 1200 1120 80 93.33 6.67 Table 1. Test result of FRR in 1:1 mode Total matching times Total false acceptance GAR (%) FAR(%) 360000 25488 92.92 7.08 Table 2. Test result of FAR in 1:n mode To test the ability to overcome image translation and rotation, translate randomly in the range ÷ + [ 10, 10] and rotate the image randomly in the range ÷ + 0 0 [ 10 , 10 ] , in order to establish the translation and rotation test sets. Then verify and recognize the two sets respectively. The samples which have the same source are compared in a 1:1 experiment; matching each sample from the two sets with the samples from the training set to accomplish 1:n experiment. The result is shown in Tab.3, Tab.4, Tab.5 and Tab.6. Finger Vein Recognition 49 Total matching times Number of successes Number of failures Success rate (%) FRR(%) 1:1 1200 1111 89 92.58 7.42 Table 3. Translation test set verification result (1:1) Total matching times No. of genuine acceptance GAR(%) FAR(%) 1:n 360000 27900 92.25 7.75 Table 4. Translation test set recognition result (1:n) As the experiment shows, in 1:1 mode the rate of success can reach 92.58% even though the finger-vein image is translated, and can reach 91.75% when rotated. In 1: n mode, GAR can reach 92.25% and 91.17% respectively, which implies a robust recognition system. Further, the method can overcome the influence caused by image translation and rotation, thus it can meet practical requirements. Total matching times Number of successes Number of failures Success rate (%) FRR(%) 1:1 1200 1101 99 91.75 8.25 Table 5. Rotation test set verification result (1:1) Total matching times No. of genuine acceptance GAR(%) FAR(%) 1:n 360000 31788 91.17 8.83 Table 6. Rotation test set recognition result (1:n) 4.5.5 Experimental results analysis The algorithm was implemented on a Windows XP platform using Visual C + +6.0. Finger vein image capture was performed taking into account the convenience of users, while collecting index finger and middle finger vein images. A total of 287 finger vein images collected for each finger 5 times, and in all, a total of 287x5 = 1435 were collected to form finger vein library. Two sets 287 x 2 = 574 were taken for verification. The identification results based on template matching: We first according to the method proposed in reference [1], recognition for the veins of our library of images. In 1:1 verification mode, we use the validation library of 574 samples to verify the experimental results shown in Tab.7. Biometrics 50 Matching times Pass times Reject recognition times Correct recognition rate (%) Reject recognition rate (%) 574 559 15 97.4 2.6 Table 7. The results of refusing ratio of 1:1 For 1: n identification model, we use the validation library to identify 574 samples of the experimental results shown in Tab.8. Matching times False recognition times False recognition rate (%) 574 7 1.2 Table 8. The result of mistaken identifying ratio of 1: n For a number of reasons, we realize that the algorithm recognition rate is not as high as the reference [1], perhaps due to the acquisition of image and acquisition machine quality problems. The identification results based on wavelet moment: We first decompose the predetermined wavelet sample in the vein sample database, and then construct the wavelet moment features using identified wavelet coefficients. A few typical experimental results of 1:1 verification mode shown in Tab.9. Matching times Pass times Reject recognition times Correct recognition rate (%) Reject recognition rate (%) Hear 574 536 38 93.4 6.6 Daub4 574 547 27 95.3 4.7 Daub8 574 543 31 94.6 5.4 coif2 574 534 40 93.04 6.96 sym 574 529 45 92.2 7.8 Table 9. The results of rejection ratio with different wavelet base in the 1:1 case For 1: n identification pattern, the experimental results shown in Tab.10. Finger Vein Recognition 51 Matching times False recognition times false recognition rate Haar 574 27 4.7 Daub4 574 11 1.9 Daub8 574 19 3.3 coif2 574 30 5.2 sym 574 33 5.7 Table 10. Results of mistaken identifying with different wavelet base in the case of 1: n We chose Daub4 to carry out wavelet decomposition, identification was better than other wavelets. Identification results of wavelet moment integration of PCA. When PCA is used for dimension reduction, the relationship of selection of the compressed dimension k and the proportion of it represent components shown in Tab.11: ì ì = = = ¿ ¿ 1 1 / k N k i i i i w (37) k 215 240 276 299 338 368 380 392 k w 0.75 0.80 0.86 0.90 0.95 0.98 0.99 1.00 Table 11. The compressed dimension and it’s proportion To balance computation and weighting, we use 300 as the dimension for decomposition. Authentication in 1:1 mode and 1: n identification pattern, we use 574 samples in the validation library for the experimental, results shown in Tab.12, 13. Matching times Pass times reject recognition times recognition rate (%) reject recognition rate (%) 574 568 6 98.95 1.05 Table 12. Results of rejection ratio of 1:1 Biometrics 52 Matching times False recognition times False recognition rate (%) 574 4 0.7 Table 13. The results of mistaken identifying ratio of 1: n As seen from the recognition results, recognition rate and rejection rate in the method based on wavelet PCA can meet the requirements of practical applications. In recognition speed, that is, 1: n of the mode can meet the requirements. 5. Conclusion Accurate extraction of finger vein pattern is a fundamental step in developing finger vein based biometric authentication systems. Finger veins have textured patterns, and the directional map of a finger vein image represents an intrinsic nature of the image. The finger vein pattern extraction method using oriented filtering technology. Our method extends traditional image segmentation methods, by extracting vein object from the oriented filter enhanced image. Experimental results indicate that our method is a better enhancement over the traditional NiBlack method [11], [12], [13], and has good segmentation results even with low-quality images. The addition of oriented filter operation, extracts smooth and continuous vein features not only from high quality vein images but also handles noisy low quality images and does not suffer from the over-segmentation problem. However, it requires a little more processing time because of the added oriented filter operation. Topology is an essential image property and usually, even an inflection point may contain plenty of accurate information. Finger-vein recognition is faced with some basic challenges, like positioning, the influence of image translation and rotation etc. To address these problems, essential topology attributes of individual finger veins are utilized in a novel method. Particularly, the relative distance and angles of vein intersection points are used to characterize a finger-vein for recognition, since the topology of finger-vein is invariant to image translation and rotation. The first step is to extract those intersecting points of the thinned finger-vein image, and connect them with line segments. Then relative distances and angles are calculated. Finally combine the two features for matching and recognition. Experimental results indicate that the method can accurately recognize finger-vein, and to a certain degree, overcome the influence image translation and rotation. Furthermore, the method resolves the difficult problem of finger-vein positioning. It is also computationally efficient with minimal storage requirement, which makes the method of practical significance. However there are still problems of non- recognition and false recognition. Besides, pre-procession is an import requirement for this method and the accuracy of pre- processing influences recognition result significantly. In view of this, further research will be done on the pre-procession method, to improve the image quality and the accuracy of feature extraction, and subsequently improve system reliability. This chapter discussed recent approaches to solving the problem of varying finger lengths and proposed using a set of images of same size interval in a selected sub-block approach. For each image sub-block, wavelet moment was performed and PCA features extracted. LDA transform is performed, and the two features were combined for recognition. For Finger Vein Recognition 53 matching and identification, we proposed a method of fuzzy matching scores. Experimental results show that wavelet moment PCA fusion method achieved good recognition performance; error rate FAR was 0.7%, rejection rate FRR of 1.05%. In future research, we are committed to further study finger vein feature fusion with fingerprint and other features to improve system reliability. 6. References [1] N. Miura, A. Nagasaka and T. Miyatake, Extraction of finger-vein patterns using maximum curvature points in image profiles, IEICE Trans. Inf. & Syst. Vol.90, no.D(8),pp. 1185-1194, 2007. [2] Shahin M, Badawi A, Kamel M. Biometric authentication using fast correlation of near infrared hand vein patterns, J. International Journal of Biometrical Sciences, vol.2,no.3,pp.141-148,2007. [3] Lin Xirong,Zhuang Bo,Su Xiaosheng, ZhouYunlong, Bao Guiqiu. Measurement and matching of human vein pattern characteristics, J.Journal of Tsinghua University (Science and Technology).vol. 43no. 2 pp.164-167, 2003. (In Chinese.) [4] H.Tian, S.K.Lam, T. Srikanthan. Implementing OTSU’s Thresholding Process Using Area-time Efficient Logarithmic Approximation Unit, J. Circuits And Systems, vol.5, pp. 21-24, 2003. [5] Zhongbo Zhang, Siliang Ma, Xiao Han. Multiscale Feature Extraction of Finger-Vein Patterns Based on Curvelets and Local Interconnection Structure Neural Network, IEEE Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China,vol 4, pp.145-148,2006. [6] M.Naoto, A.Nagasaka, iM.Takafm Feature Extraction of Finger-vein Patterns Based on Repeated Line Tracking and Its Application to Personal Identification, J. Machine Vision and Application, vol. 15, no.4, pp. 194-203, 2004. [7] Rigau J. Feixas, M.Sbert. Metal Medical image segmentation based on mutual information maximization, In Proceedings of MICCAI 2004, Saint-Malo, France pp.135- 142, 2004. [8] Yuhang Ding, Dayan Zhuang and Kejun Wang, A Study of Hand Vein Recognition Method, Mechatronics and Automation, 2005 IEEE International Conference, vol. 4 no.29pp.2106–2110, 2005. [9] O’Gorman, L. Lindeberg, J.V. Nickerson. An approach to fingerprint filter design, Pattern Recognition, vol. 22 no.1pp. 29-38, 1989. [10] Xiping Luo; Jie Tian. Image Enhancement and Minutia Matching Algorithms in Automated Fingerprint Identification System, J Journal of Software, vol. 13 no.5pp. 946-956. 2002. (In Chinese.) [11] W. Niblack. An Introduction to Digital Image Processing, Prentice Hall, ISBN 978- 0134806747, Englewood Cliffs, NJ, pp.115-116, 1986 [12] Kejun Wang Zhi Yuan. Finger vein recognition based on wavelet moment fused with PCA transform, J Pattern Recognition and Artificial Intelligence, vol. 20 no.5 pp. 692- 697, 2007. (In Chinese.) [13] Chengbo Yu, Huafeng Qing, Biometric Identification Technology Finger Vein Identification Technology: Tsinghua University Press, 2009, pp: 81-87. (In Chinese.) Biometrics 54 [14] Naoto Miura, Akio Nagasaka, Takafumi Miyatake. Feature extraction of finger-vein patterns based on repeated line tracking and its application to personal identification [J]. Machine Vision and Applications, 2004,15(4):194 -203 [15] Kejun Wang, Yuan Zhi. Finger Vein Recognition Based on Wavelet Moment Fused with PCA Transform. [J] Pattern Recognition and Artificial Intelligence. 2007 [16] Xueyan Li. Study of Multibiometrics System Based on Fingerprint and Finger Vein[D]. the doctorate dissertations of Jilin University. 2008. [17] Xiaohua Qian. Research of Finger-vein Recognition Algorithm[D]. MA Dissertation of Jilin University. 2009. [18] Zhong Bo Zhang, Dan Yang Wu, Si Liang Ma. Pattern Recognition, 2006. ICPR 2006. 18th International Conference on Volume: 4 Digital Object Identifier: 10.1109/ICPR.2006.848. Publication Year: 2006, Page(s): 145 – 148 [19] Kejun Wang, Yuhang Ding, Dazhen Wang. A Study of Hand Vein-based Identity Authentication Method [J]. Science & Technology Review. 2005, 23(1) :35-37. [20] Ji Hu, SunJixiang, YaoWei. Wavelet Moment for Images. Journal of Circuits and Systems, 2005, 10(6):132-136 (inChinese) 3 Minutiae-based Fingerprint Extraction and Recognition Naser Zaeri Arab Open University Kuwait 1. Introduction In our electronically inter-connected society, reliable and user-friendly recognition and verification system is essential in many sectors of our life. The person’s physiological or behavioral characteristics, known as biometrics, are important and vital methods that can be used for identification and verification. Fingerprint recognition is one of the most popular biometric techniques used in automatic personal identification and verification. Many researchers have addressed the fingerprint classification problem and many approaches to automatic fingerprint classification have been presented in the literature; nevertheless, the research on this topic is still very active. Although significant progress has been made in designing automatic fingerprint identification systems over the past two decades, a number of design factors (lack of reliable minutia extraction algorithms, difficulty in quantitatively defining a reliable match between fingerprint images, poor image acquisition, low contrast images, the difficulty of reading the fingerprint for manual workers, etc.) create bottlenecks in achieving the desired performance. Nowadays, investigating the influence of the fingerprint quality on recognition performances also gains more and more attention. A fingerprint is the pattern of ridges and valleys on the surface of a fingertip. Each individual has unique fingerprints. Most fingerprint matching systems are based on four types of fingerprint representation schemes (Fig. 1): grayscale image (Bazen et al., 2000), phase image (Thebaud, 1999), skeleton image (Feng, 2006; Hara & Toyama, 2007), and minutiae (Ratha et al., 2000; Bazen & Gerez, 2003). Due to its distinctiveness, compactness, and compatibility with features used by human fingerprint experts, minutiae-based representation has become the most widely adopted fingerprint representation scheme. The uniqueness of a fingerprint is exclusively determined by the local ridge characteristics and their relationships. The ridges and valleys in a fingerprint alternate, flowing in a local constant direction. The two most prominent local ridge characteristics are: 1) ridge ending and, 2) ridge bifurcation. A ridge ending is defined as the point where a ridge ends abruptly. A ridge bifurcation is defined as the point where a ridge forks or diverges into branch ridges. Collectively, these features are called minutiae. Detailed description of fingerprint minutiae will be given in the next section. The widespread deployment of fingerprint recognition systems in various applications has caused concerns that compromised fingerprint templates may be used to make fake fingers, which could then be used to deceive all fingerprint systems the same person is enrolled in. Biometrics 56 Once compromised, the grayscale image is the most at risk. Leakage of a phase image or skeleton image is also dangerous since it is a trivial problem to reconstruct a grayscale fingerprint image from the phase image or the skeleton image. In contrast to the above three representations, leakage of minutiae templates has been considered to be less serious as it is not trivial to reconstruct a grayscale image from the minutiae (Feng & Jain, 2011). Fig. 1. Fingerprint representation schemes. (a) Grayscale image (FVC2002 DB1, 19_1), (b) phase image, (c) skeleton image, and (d) minutiae (Feng & Jain, 2011) In this chapter, we study the recent advancements in the field of minutia-based fingerprint extraction and recognition, where we give a comprehensive idea about some of the well- known methods that were presented by researchers during the last two decades. Further, we provide a special focus on the recent techniques presented in the last few years. A close analysis of the fingerprint image will be discussed and the various minutiae features shall be described, as well. 2. Fingerprint minutiae description The first scientific studies on fingerprint classification were made by (Galton, 1892), who divided the fingerprints into three major classes. Later, (Henry, 1900) refined Galton’s classification by increasing the number of the classes. All the classification schemes currently used by police agencies are variants of the so-called Henry’s classification scheme. As mentioned in the previous section, the uniqueness of a fingerprint is exclusively determined by the local ridge characteristics and their relationships (Kamijo, 1993; Kawagoe & Tojo, 1984). The ridges and valleys in a fingerprint alternate, flowing in a local constant direction (Fig. 2). Eighteen different types of fingerprint features have been enumerated by (Federal Bureau of Investigation, 1984). Further, a total of 150 different local ridge characteristics (islands, short ridges, enclosure, etc.) have been identified by (Kawagoe & Tojo, 1984). These local ridge characteristics are not evenly distributed. Most of them depend heavily on the impression conditions and quality of fingerprints and are rarely observed in fingerprints. The two most prominent local ridge characteristics are: 1) ridge ending and, 2) ridge bifurcation. A ridge ending is defined as the point where a ridge ends abruptly. A ridge bifurcation is defined as the point where a ridge forks or diverges into branch ridges. Collectively, these features are called minutiae. Most of the fingerprint extraction and matching techniques restrict the set of features to two types of minutiae: ridge endings and ridge bifurcations, as shown in Fig. 3. A good quality fingerprint typically contains about 40–100 minutiae. Minutiae-based Fingerprint Extraction and Recognition 57 In a latent or partial fingerprint, the number of minutiae is much less (approximately 20 to 30). More complex fingerprint features can be expressed as a combination of these two basic features. For example, an enclosure can be considered a collection of two bifurcations and a short ridge can be considered a pair of ridge endings as shown in Fig. 4. Each of the ridge endings and ridge bifurcations types of minutiae has three attributes, namely, the x-coordinate, the y-coordinate, and the local ridge direction (θ) as shown in Fig. 5. Many other features have been derived from this basic three- dimensional feature vector. Given the minutiae representation of fingerprints, matching a fingerprint against a database reduces to the problem of point matching. Fig. 2. Gray level fingerprint images of different types of patterns with core (□) and delta (Δ) points: (a) arch; (b) tented arch; (c) right loop; (d) left loop; (e) whorl; (f) twin loop (Ratha et al., 1996) Fig. 3. Two commonly used fingerprint features: (a) ridge bifurcation; (b) ridge ending (Ratha et al., 1996) Biometrics 58 Fig. 4. Complex features as a combination of simple features: (a) short ridge; (b) enclosure (Ratha et al., 1996) The matching problem can be defined as finding a degree of match between a query and reference fingerprint feature set. The minutiae sets can be matched using many techniques, where some of them will be addressed in following sections. The large computational requirement of matching is primarily due to the following three factors: 1) a query fingerprint is usually of poor quality, 2) the fingerprint database is very large, and 3) structural distortion of the fingerprint images requires powerful matching algorithms. Fig. 5. Components of a minutiae feature (Hong et al., 1998) In addition to minutiae features described above, there are other high-level features that can be used in reducing the search space during a match. A very important feature for this purpose is the pattern class of a fingerprint. Fingerprints are classified into five main categories: • arch, • tented arch, • left loop, • right loop, and • whorl. Minutiae-based Fingerprint Extraction and Recognition 59 The pattern class may be ambiguous in partial fingerprints and indeterminate for noisy fingerprints. Yet another high-level feature is the ridge density in a fingerprint. Ridge density can be defined as the number of ridges per unit distance. In order to make it invariant to position, the ridge density between two singular points in a fingerprint is computed. Some singular points of interest are defined as the core and delta points (Ratha et al., 1996). The core point is the top most point on the inner most ridge and a delta point is the tri-radial point with three ridges radiating from it (Fig. 2 and Fig. 6). Fig. 6. Three levels of fingerprint features (Zhang et al., 2011) Fig. 7. Features at three levels in a fingerprint. (a) Grayscale image (NIST SD30, A067_11), (b) Level 1 feature (orientation field), (c) Level 2 feature (ridge skeleton), and (d) Level 3 features (ridge contour, pore, and dot) (Feng & Jain, 2011) Recently, fingerprint features have been classified at three distinctive levels of detail (Feng & Jain, 2011; Zhang et al., 2011), as shown in Figs. 6 and 7. Although their definitions for level- 1 and level-2 are different, they agree on the definition of level-3. In (Zhang et al., 2011), level-1 features are the macro details of fingerprints, such as singular points and global ridge patterns, e.g., deltas and cores (indicated by red triangles in Fig. 6). They are not very distinctive and are thus mainly used for fingerprint classification rather than recognition. The level-2 features (red rectangles) primarily refer to the minutiae, namely, ridge endings and bifurcations. Level-2 features are the most distinctive and stable features, which are used in almost all automated fingerprint recognition systems and can reliably be extracted from low-resolution fingerprint images (~500 dpi). A resolution of 500 dpi is also the standard fingerprint resolution of the Federal Bureau of Investigation for automatic fingerprint recognition systems using minutiae (Jain et al., 2007). Level-3 features (red circles) are often defined as the dimensional attributes of the ridges and include sweat pores, Biometrics 60 ridge contours, and ridge edge features, all of which provide quantitative data supporting more accurate and robust fingerprint recognition. Among these features, recent researches are focusing on pores (International Biometric Group, 2008; Jain et al., 2006; Jain et al., 2007; Parsons et al., 2008; Zhao et al., 2008; Zhao et al., 2009), where they are considered to be reliably available only at a resolution higher than 500 dpi. 3. Structural approach One of the early attempts to automate fingerprint recognition was proposed by (Liu & Shelton, 1970). The fundamental concept underlying the proposed system is to use an operator to recognize the ridge characteristics and to impart to a computer the ability to manipulate and compare the digitized locations and directions of these characteristics for single-fingerprint classification. In (Moayer & Fu, 1975) and (Rao & Balck, 1980), patterns were described by means of terminal symbols and production rules. Terminal symbols are associated to small groups of directional elements within the fingerprint directional image. A grammar is defined for each class and a parsing process is responsible for classifying each new pattern. (Moayer & Fu, 1976) demonstrated how a tree system may be used to represent and classify fingerprint patterns. The fingerprint impressions are subdivided into sampling squares which are preprocessed and postprocessed for feature extraction. A set of regular tree languages is used to describe the fingerprint patterns. In order to infer the structural configuration of the encoded fingerprints, a grammatical inference system is developed. In (Maio & Maltoni, 1996), a well-defined structural approach for fingerprint classification was presented. The basic idea is to perform a directional image partitioning into several homogeneous regular-shaped regions, which are used to build a relational graph summarizing the fingerprint macro-features. The whole approach can be divided into four main steps: computation of the directional image, segmentation of the directional image, construction of the relational graph, and inexact graph matching. The directional image is computed over a discrete grid by means of a robust technique proposed by (Donahue & Rokhlin, 1993). A dynamic clustering algorithm (Maio et al., 1996) is adopted to segment the directional image according to well-suited optimality criteria. In particular, with the aim of creating regions as homogeneous as possible, the algorithm works by minimizing the variance of the element directions within the regions and, simultaneously, by maintaining the regularity of the region shape. Starting from the segmentation of the directional image, a relational graph is built by creating a node for each region and an arc for each pair of adjacent regions. By appropriately labeling the nodes and arcs of the graph, the authors obtained a structure which summarizes the topological features of the fingerprint and is invariant with respect to displacement and rotation. The PCASYS approach (Pattern-level Classification Automation SYStem) proposed by (Candela & Chellappa, 1993) and (Candela et al., 1995) assigns fingerprints to six non- overlapping classes. Before computing the directional images, the ridge-line area is separated from the background and an enhancement is performed in the frequency domain. The computation of the directions is carried out by the method reported in (Stock & Swonger, 1969). The directional image is then registered with respect to the core position which corresponds to the fingerprint center. The dimensionality of the directional image, considered as a vector of 1,680 elements, is reduced to 64 elements by using the principal component analysis (Jolliffe, 1986). At this stage, a PNN (Probabilistic Neural Network) (Specht, 1990) is used for assigning each 64-element vector to one class of the classification Minutiae-based Fingerprint Extraction and Recognition 61 scheme. In order to improve the classification reliability, especially for whorl fingerprints, the authors also implemented an auxiliary module (called pseudoridge tracer), which works by analyzing the ridge-line concavity under the core position. (Wahab et al., 1998) described an enhanced fingerprint recognition system consisting of image preprocessing, feature extraction and matching that runs accurately on a personal computer platform. The image preprocessing includes histogram equalization, modification of directional codes, dynamic thresholding and ridgeline thinning. Only the extracted features are stored in a file for fingerprint matching. The matching algorithm presented is a modification and improvement of the structural approach. In their approach, they first divided the original image (320 x 240) into 40 x 30 small areas. Next, each area is assigned a directional code to represent the direction of the ridgeline in that area. To reduce computational time, a total of eight directional codes are used. The eight directional windows w d (d = 0, 1, 2, ..., 7), each having a length of 16 pixels are shown in Fig. 8. To find the ridge direction of a given area, each of the directional windows, w d is moved in the direction tangential to the direction of the window. Each of the directional windows will have to move eight times to cover the entire area. At each location when the window moves, the mean value M( Wd) of the grey level of the pixels in the window is calculated. The fluctuation of M(Wd) is expected to be the largest when the movement of the directional window is orthogonal to the direction of the ridges. Therefore this area will be assigned to have ridges in the direction d such that the fluctuation of M( Wd) is the largest. Fig. 8. Eight directional windows Wd for extraction of ridge direction (Wahab et al., 1998) 4. Ridge orientation approach Since the performance of a minutiae extraction algorithm relies heavily on the quality of the input fingerprint images, it is essential to incorporate a fingerprint enhancement algorithm in the minutiae extraction module to ensure that the performance of the system is robust with respect to the quality of input fingerprint images. In practice, due to variations in impression conditions, ridge configuration, skin conditions (aberrant formations of epidermal ridges of fingerprints, postnatal marks, and occupational marks), acquisition devices, and non-cooperative attitude of subjects, etc., a significant percentage of acquired fingerprint images is of poor quality. The ridge structures in poor-quality fingerprint images are not always well-defined and, hence, they cannot be correctly detected. This leads to following problems: Biometrics 62 1. a significant number of spurious minutiae may be created, 2. a large percent of genuine minutiae may be ignored, and 3. large errors in their localization (position and orientation) may be introduced. In order to ensure that the performance of the minutiae extraction algorithm is robust with respect to the quality of the input fingerprint images, an enhancement algorithm that improves the clarity of the ridge structures is necessary. Fingerprint enhancement can be conducted on either: 1) binary ridge images or, 2) gray-level images. A binary ridge image is an image where all the ridge pixels are assigned a value one and nonridge pixels are assigned a value zero. The binary image can be obtained by applying a ridge extraction algorithm on a gray-level fingerprint image. Since ridges and valleys in a fingerprint image alternate and run parallel to each other in a local neighborhood, a number of simple heuristics can be used to differentiate the spurious ridge configurations from the true ridge configurations in a binary ridge image. However, after applying a ridge extraction algorithm on the original gray-level images, information about the true ridge structures is often lost depending on the performance of the ridge extraction algorithm. Therefore, enhancement of binary ridge images has its inherent limitations. In a gray-level fingerprint image, ridges and valleys in a local neighborhood form a sinusoidal-shaped plane wave which has a well-defined frequency and orientation. Fig. 9. Fingerprint images of very poor quality (Hong et al., 1998) (Hong et al., 1998) presented a fast fingerprint enhancement algorithm, which can adaptively improve the clarity of ridge and valley structures of input fingerprint images based on the estimated local ridge orientation and frequency using both the local ridge orientation and local frequency information. (Vaikol et al., 2009) presented a reliable method of computation for minutiae feature extraction from fingerprint images. The scheme relies on describing the orientation field of the fingerprint pattern with respect to each minutia detail. A fingerprint image is treated as a textured image, where an orientation flow field of the ridges is computed. To accurately locate ridges, a ridge orientation based computation method is used. After ridge segmentation, smoothing is done using morphological operators. (Choi et al., 2010) introduced a novel fingerprint matching algorithm using both ridge features and the conventional minutiae features to increase the recognition performance against nonlinear deformation in fingerprints. The proposed ridge features are composed of four elements: ridge count, ridge length, ridge curvature direction, and ridge type. These ridge features have some advantages in that they can represent the topology information in Minutiae-based Fingerprint Extraction and Recognition 63 entire ridge patterns that exist between two minutiae and are not changed by non-linear deformation of the finger. For extracting ridge features, they have also defined the ridge- based coordinate system in a skeletonized image. With the proposed ridge features and conventional minutiae features (minutiae type, orientation, and position), they have proposed a novel matching scheme using a breadth first search to detect the matched minutiae pairs incrementally (Fig. 10). Fig. 10. Example of matched minutiae using the proposed ridge feature vectors (solid circles represent matched minutiae and dotted lines represent the vertical axis of each minutia) (Choi et al., 2010) 5. Pixel-level approach (Abutaleb & Kamel, 1999) used the fact that a fingerprint is made of white followed by black lines of bounded number of pixels. This enabled the problem formulation to be cast as a parametric optimization problem. The parameters are the widths of the black and white lines in the scanned line in the fingerprint. The proposed adaptive genetic algorithm proved to be effective in determining the ridges or edges in the fingerprint. Further, (Ceguema & Koprinska, 2002) presented an approach for combining local and global recognition schemes for automatic fingerprint verification by using matched local features as the reference axis for generating global features. In their implementation, minutia-based and shape-based techniques were combined. The first one matches local features (minutiae) by a point- pattern matching algorithm. The second one generates global features (shape signatures) by using the matched minutiae as its frame of reference. Shape signatures are then digitized to form a feature vector describing the fingerprint. Finally, a Learning Vector Quantization neural network was trained to match the fingerprints using the difference between a pair of feature vectors. In (Zhang et al., 2010), investigation has been conducted on analyzing the mechanisms of fingerprint image rotation processing and its potential effects on the major features, mainly minutiae and singular point, of the rotation transformed fingerprint. It was observed that Biometrics 64 the information integrity of the original fingerprint image can be significantly compromised by the image rotation transformation process, which can cause noticeable singular point change and produce non-negligible number of fake minutiae. It is found that the quantization and interpolation process can change the fingerprint features significantly though they may not change the image visually. Their experimental results have shown that up to 7% of the minutiae can be mis-matched. For the matched ones, their positions deviate up to 16 pixels. The position of singular point can change up to 55 pixels while the orientation angle change can be up to 90 degrees. (Kaur et al., 2010) proposed an approach for feature extraction based on dividing the image into equal sized blocks. Each block is processed independently. The gray level projection along a line perpendicular to the local orientation field provides the maximum variance. Then the ridges are located using the peaks and the variance in this projection. The ridges are thinned and the resulting image is enhanced using an adaptive morphological filter. Square-based method was presented in (Gamassi et al., 2005) and (Alibeigi et al., 2009). The Square-based method is composed of the following steps, repeated for each pixel of the binary image: 1. Create a 3x3 square mask around the (x, y) pixel and compute the average of the pixels. If the average is less than 0.25 the pixel is preliminary identified as a ridge termination minutiae, otherwise if the average is greater than 0.75 the pixel is treated like a bifurcation minutiae. 2. Create a square perimeter P around the (x, y) pixel of size W×W. 3. Compute the number of the logic commutations present in the perimeter P without considering isolated pixels as shows in Fig. 11. Fig. 11. Processing the commutations (Gamassi et al., 2005) 4. The algorithm continues if there are two logic commutations, otherwise it jumps to step 1 processing another pixel. 5. Compute the average of the pixels in the perimeter P. If the pixel has been defined as a termination minutiae in step 1, it checks if the average is greater than the threshold K. (in bifurcation minutiae, the average must be less than 1-K) otherwise it jumps to step 1 processing another pixel. 6. Estimate the orientation angle α in the minutiae point. 7. False detection removal (Fig. 12). Minutiae-based Fingerprint Extraction and Recognition 65 Fig. 12. Estimating the orientation angle (left) and removing the false terminations detections (right) (Gamassi et al., 2005) 6. Filtering and wavelet approach (Lee & Wang, 1999) have developed a one-step method using Gabor filters for directly extracting fingerprint features for a small-scale fingerprint recognition system. From the experimental results, the use of magnitude Gabor features with eight orientations as fingerprint features led to good shift-invariant properties and an accuracy of 97.2% with 3- NN classifiers, on a database of 192 inked fingerprint images from 16 persons. In (Watson et al., 1994) and (Willis & Myers, 2001), the fingerprint’s blockwise Fourier transform is multiplied by its power spectrum raised to a power, thus magnifying the dominant orientation. Also, a Laplacian-like image pyramid is used to decompose the original fingerprint into sub- bands corresponding to different spatial scales for fingerprint enhancement. The Laplacian pyramid (Adelson et al., 1984), (Simoncelli & Freeman, 1995) is equivalent to bandpass filtering in the spatial domain. In a further step, contextual smoothing is performed on these pyramid levels, where the corresponding filtering directions stem from the frequency- adapted structure tensor. For minutiae extraction, parabolic symmetry is added to the local fingerprint model which allows to accurately detecting the position and direction of a minutia simultaneously. Also, (Cappelli et al., 1999) have implemented the directional approach. The directional image is partitioned into “homogeneous” connected regions according to the fingerprint topology, thus giving a synthetic representation which can be exploited as a basis for the classification. A set of dynamic masks, together with an optimization criterion, was used to guide the partitioning. The adaptation of the masks produces a numerical vector representing each fingerprint as a multidimensional point, which can be conceived as a continuous classification. Different search strategies were discussed to efficiently retrieve fingerprints both with continuous and exclusive classification. A directional image is a discrete matrix whose elements represent the local average directions of the fingerprint ridge lines. (Hong et al., 1998) proposed an algorithm using Gabor bandpass filters tuned to the corresponding ridge frequency and orientation to remove undesired noise while preserving the true ridge-valley structures. All operations are performed in the spatial domain, whereas Biometrics 66 the contextual filtering in (Sherlock et al., 1994) and (Chikkerur & Govindaraju, 2005) is done in the Fourier domain. From (Tico et al., 2001), the discrete wavelet transform (DWT) coefficients have been used as the ridge pattern. The authors discussed that the middle frequency has an oscillated pattern corresponding to the ridge pattern. Then, to extract wavelet features from a gray-scale fingerprint image, the image was first cropped to the size of 64×64 pixels, where the center point in the image is referred to as a reference point. The cropped image was then quartered, centered at the reference point, to obtain four non- overlapping images of size 32×32 pixels. After applying the DWT to each non-overlapping image four times, twelve sub-images in the wavelet domain at each decomposition level are created as shown in Fig. 13. Next, the standard deviation of the DWT coefficients from each sub-image is computed to create a feature vector of length 48 (12 DWT sub-images from 4 non-overlapping images). The resulted feature vector is then used as a representation of that fingerprint image. Fig. 13. Arrangements of twelve sub-images in the wavelet domain (Tico et al., 2001) On the other hand, it was shown in (Tachaphetpiboon & Amornraksa, 2005, 2007) that the discrete cosine transform (DCT) is better suited in extracting informative features than the DWT. The results have shown that the fingerprint matching system based on the DCT obtained a high recognition rate and a lower complexity. (Tachaphetpiboon & Amornraksa, 2005) proposed to divide all the DCT coefficients containing the oscillate pattern in a zigzag- scanned fashion, extract DCT features from the divided DCT coefficients, and then use them for fingerprint matching. Accordingly, all the DCT coefficients in each non-overlapping image were divided into 12 areas, where one feature was extracted from each one of these areas, generating 12 features for each non-overlapping image. (Fronthaler et al., 2008) proposed the use of an image-scale pyramid and directional filtering in the spatial domain for fingerprint image enhancement to improve the matching performance as well as the computational efficiency. Image pyramids or multiresolution processing is especially known from image compression and medical image processing. (Fronthaler et al., 2008) expected that all the relevant information to be concentrated within a few frequency bands. Furthermore, they have proposed Gaussian directional filtering to enhance the ridge-valley pattern of a fingerprint image using computationally cheap 1-D filtering on higher pyramid levels (lower resolution) only. The filtering directions are recovered from the orientations of the structure tensor (Bigun, 2006) at the corresponding pyramid level. Linear symmetry features are thereby used to extract the local ridge-valley orientation (angle and reliability). (On et al., 2006) presented a filtering strategy that can solve the problem of rotated scanned input images. The fingerprint image is scanned with an optical fingerprint scanner. The Minutiae-based Fingerprint Extraction and Recognition 67 scanned fingerprint image is saved in bitmap format with black and white colour. The scanned fingerprint image is then enhanced for quality improvement. Further, the enhanced fingerprint image is applied for binarization. The conversion is needed to reduce the computation and analysis time for filtering and thinning process. The noise produced from the binarized fingerprint image is then removed using median filtering and the filtered fingerprint image is further thinned. After that, the bifurcation minutiae extraction method is applied for the thinned fingerprint image. The extracted feature data are then used for neural network training. The concept of spectral minutiae representation was used by (Xu & Veldhuis, 2009). The spectral minutiae representation is based on the shift, scale and rotation properties of the two-dimensional continuous Fourier transform. Assume a fingerprint with Z minutiae. In location-based spectral minutiae representation (SML), with every minutia, a function m i (x, y) = δ(x − x i , y − y i ), i = 1, . . . , Z is associated where (x i , y i ) represents the location of the i-th minutia in the fingerprint image. Thus, in the spatial domain, every minutia is represented by a Dirac pulse. The Fourier transform of m i (x, y) is given by: (1) and the location-based spectral minutiae representation is defined as (2) In order to reduce the sensitivity to small variations in minutiae locations in the spatial domain, a Gaussian low-pass filter is used to attenuate the higher frequencies. This multiplication in the frequency domain corresponds to a convolution in the spatial domain where every minutia is now represented by a Gaussian pulse. Following the shift property of the Fourier transform, the magnitude of M is taken in order to make the spectrum invariant to translation of the input, and we obtain (3) Then, the orientation information in the spectral representation is included. The orientation θ of a minutia can be incorporated by using the spatial derivative of m(x, y) in the direction of the minutia orientation. Thus, to every minutia in a fingerprint, a function m i (x, y, θ) is assigned being the derivative of m i (x, y) in the direction θ i , such that (4) As with the SML algorithm, using a Gaussian filter and taking the magnitude of the spectrum yields Biometrics 68 (5) Recently, (Xu & Veldhuis, 2010) have further discussed the objective of the spectral minutiae representation in representing a minutiae set as a fixed-length feature vector that is invariant to translation, rotation and scaling. Fig. 14 illustrates a general procedure of the spectral minutiae representation discussed by (Xu & Veldhuis, 2010). Fig. 14. Illustration of the general spectral minutiae representation procedure. (a) a fingerprint and its minutiae; (b) representation of minutiae points as real (or complex) valued continuous functions; (c) the 2D Fourier spectrum of ‘b’ in a Cartesian coordinate and a polar-logarithmic sampling grid; (d) the Fourier spectrum sampled on a polar- logarithmic grid (Xu & Veldhuis, 2010) Moreover, based on the spectral minutiae feature, (Xu et al., 2009a, 2009b) introduced two feature reduction methods: the Column-Principal Component Analysis (PCA) and the Line- Discrete Fourier Transform feature reduction algorithms. The experiments demonstrated that these methods decrease the minutiae feature dimensionality with a reduction rate of 94%, while at the same time, the recognition performance of the fingerprint system is not degraded. On the other hand, (Dadgostar et al., 2009) presented a novel feature extraction method based on Gabor filter and Recursive Fisher Linear Discriminate (RFLD) algorithm, for fingerprint identification. The proposed method was assessed on images from the biolab database (Biometric System Lab). Experimental results have shown that applying RFLD to a Gabor filter in four orientations, in comparison with Gabor filter and PCA transform, increases the identification accuracy from 85.2% to 95.2% by nearest cluster center point classifier with Leave-One-Out method. Also, it has been shown that applying RFLD to a Gabor filter in four orientations, in comparison with Gabor filter and PCA transform, increases the identification accuracy from 81.9% to 100% by 3NN classifier. 7. Geometric approach (Chen et al., 2009) proposed an algorithm to use minutiae for fingerprint recognition, in which the fingerprint’s orientation field is reconstructed from minutiae and further utilized in the matching stage to enhance the system’s performance. First, they have produced “virtual” minutiae by using interpolation in the sparse area, and then used an orientation model to reconstruct the orientation field from all “real” and “virtual” minutiae (Fig. 15). A Minutiae-based Fingerprint Extraction and Recognition 69 decision fusion scheme is used to combine the reconstructed orientation field matching with the conventional minutiae-based matching. (Min & Thein, 2009) have presented a recognition system which combines both the statistical and geometry approaches. The core point (CP) of the input fingerprint is detected and located in the centre. Then, the fingerprint image is cropped around the based point. Fingerprint features such as minutiae points' determination, their coordinates location, and radius of arcs for each ridge are stored in different databases. For a testing fingerprint image, the features are compared with these pre-defined databases and the decision is made by a voting system. In (Wei-bo et al., 2008), each minutia was defined by the type and the relative topological relationship among the minutia and its 5 nearest neighbors. (Qi et al., 2008) proposed a fingerprint matching algorithm using the elaborate combination of minutiae and curvature maps from fingerprint images. First, they computed the curvature in a simple way based on orientation field, and then performed the sampling operation on the curvature map around each minutia to get the fixed length minutiae specifiers. Second, a similarity measurement was defined between two specifiers. Third, they found the reference points pair based on computing the least squared error of Euclidean distance between these two specifiers. Finally, they completed the matching task by aligning the two fingerprint minutiae sets and accounting the number of overlapping minutiae. Fig. 15. Interpolation step: (a) the minutiae image; (b) the triangulated image; (c) virtual minutiae by interpolation (the bigger red minutiae are “real,” while the smaller purple ones are “virtual”) (Chen et al., 2009) 8. Singularity approach The global features of a fingerprint are singularity points, namely core and delta. Generally, singularity points are used to classify fingerprint images to reduce the search space. (Kryszczuk & Drygajlo, 2006) presented a method in which the singular point detection is performed by analyzing the local quadrant change of the ridge gradient vectors. Singular points (SP) are defined as discontinuities in the directional field (Liu et al., 2005). In formally, this can be stated as the area where ridges oriented rightwards change to leftwards and those that were oriented upwards turn downwards, and opposite. Their algorithm performs a robust estimation of the local ridge gradient. They employed a modified version of the “squared average gradients” to estimate the direction of the smoothed gradient vectors. Biometrics 70 Also, they allowed cancelling out the opposite local gradients, achieving a more robust average local ridge gradient estimation. Moreover, (Militello et al., 2008) proposed a fingerprint recognition approach based on core and delta singularity points detection. The singularity points extraction is performed using three sequential steps: directional image extraction, Poincarè indexes computation and core and delta extraction. The approach has shown a good accuracy level in the singularity points detection and extraction and a low computational cost. (Conti et al., 2010) proposed another fingerprint recognition that is based on singularity points detection and singularity regions analysis. Despite to the classical minutiae-based fingerprint recognition system, the proposed system is based on core and delta position, their relative distance and orientation to perform both classification and matching tasks. The proposed approach enhances the performance of singularity points based methods by introducing pseudo-singularity points when the standard singularity points (core and delta) cannot be extracted. As a result, singularity points and/or pseudosingularity-points are detected and extracted to make possible successful fingerprint classification and recognition. After singularity points extraction, a roto-translation operation is applied for fingerprint image registration. Finally, a matching algorithm based on morphological operation, such as dilation and erosion, on two considered regions of interests (singularity regions or pseudo singularity regions) around core and delta is performed. The obtained similarity degree considering the regions of interest gives the matching result. The experimental results have shown good accuracy levels, reaching a FAR=1.22% and a FRR=9.23% using FVC2002 DB2- A database, and in the best of case, a FAR=0.26% and a FRR=7.36% using FVC2000 DB1-B database. 9. Pore approach Recently, researchers have focused on pores as a distinctive fingerprint features (International Biometric Group, 2008; Jain et al., 2006, 2007; Parsons et al., 2008; Zhang et al., 2011; Zhao et al., 2008; Zhao et al., 2009). Focusing on this kind of features depends heavily on the quality of the digital fingerprint image. Resolution is one of the main parameters affecting the quality of a digital fingerprint image, and so, it has an important role in the design and deployment of fingerprint recognition systems and impacts both their cost and recognition performance. Despite this, the field of fingerprint recognition does not currently have a well-proven reference resolution or standard resolution that can be used interoperably between different systems. For example, (Jain et al., 2007) chose a resolution of 1,000 dpi based on the 2005 ANSI/NIST fingerprint standard update workshop. (Zhao et al., 2008, 2009) proposed some pore extraction and matching methods at a resolution of 902 dpi x 1200 dpi. Finally, the International Biometric Group analyzed level-3 features at a resolution of 2000 dpi. (Zhao & Jain, 2010) have studied the utility of pores on rolled ink fingerprint images which are widely used in forensic applications. Fingerprint images of three different qualities at two different resolutions (500ppi and 1000ppi) were considered in their experiments. By using NIST SD30 database, and a commercial minutiae matcher, they have investigated the impact of fingerprint image quality on the accuracy of automatic pore extraction, and the effectiveness of pores in improving fingerprint recognition accuracy. The experimental results have shown that the (i) pores do not provide any significant improvement to the fingerprint recognition accuracy on 500ppi fingerprint images, and (ii) fusion of pore and Minutiae-based Fingerprint Extraction and Recognition 71 minutiae matchers is effective only for high resolution (1000ppi) fingerprint images of good quality. (Zhang et al., 2011) have taken further steps toward establishing a reference resolution, assuming a fixed image size and making use of the two most representative fingerprint features, i.e., minutiae and pores, and providing a minimum resolution for pore extraction that is based on anatomical evidence. They conducted experiments on a set of fingerprint images of different resolutions (from 500 to 2000 dpi). By evaluating these resolutions in terms of the number of minutiae and pores, their results have shown that 800 dpi would be a good choice for a reference resolution. (Malathi & Meena, 2010) presented a suitable technique for partial fingerprint matching based on pores and its corresponding Local Binary Pattern (LBP) features. The first step involves extracting the pores from the partial image. These pores act as anchor points where a sub window (32x32) is formed to surround them. Then rotation invariant LBP histograms are obtained from this surrounding window. Finally, a chi-square formula is used to calculate the minimum distance between two histograms to find the best matching score. 10. Other approaches In this section we present an overview of some other general techniques proposed for fingerprint recognition. Neural network approaches are mostly based on multilayer perceptrons or Kohonen self-organizing networks (Bowen, 1992; Hughes & Green, 1991; Kamijo, 1993; Moscinska & Tyma, 1993). In particular, (Kamijo, 1993) presented an interesting pyramidal architecture constituted by several multilayer perceptrons, each of which was trained to recognize fingerprints belonging to different classes. (Wang et al., 2008) proposed the cellular neural/nonlinear network (CNN) as a powerful tool for fingerprint feature extraction. They presented two theorems for designing two kinds of CNN templates. These two theorems provided the template parameter inequalities to determine parameter intervals for implementing the corresponding functions. (Senior, 1997) proposed a hidden Markov model classifier whose input features are the measurements (ridge angle, separation, curvature, etc.) taken at the intersection points between some horizontal- vertical fiducial lines and the fingerprint ridge lines. (Yang et al., 2008) proposed a fingerprint matching algorithm based on invariant moments. (Montesanto et al., 2007) have studied the fingerprint verification based on the fuzzy logic, where they combined the results obtained using three different methods of minutiae extraction: the sequential method, the reactive agent and the neural classification system. (Puertas et al., 2010) studied the performance of a fingerprint recognition technology, in several practical scenarios of interest in forensic casework. First, the differences in performance between manual and automatic minutiae extraction for latent fingerprints were presented. Then, automatic minutiae extraction was analyzed using three different types of fingerprints: latent, rolled and plain. The experiments were carried out using a database of latent fingermarks and fingerprint impressions from real forensic cases. 11. Quality assessment methods Fingerprint quality is usually defined as a measure of the clarity of ridges and valleys and the extractability of the features used for identification such as minutiae, core and delta points, etc. Therefore, it is important to estimate the quality and validity of the fingerprint Biometrics 72 images in order to improve recognition performance. A number of factors can affect the quality of fingerprint images (Joun et al., 2003): occupation, motivation/collaboration of users, age, temporal or permanent cuts, dryness/wetness conditions, temperature, dirt, residual prints on the sensor surface, etc. Unfortunately, many of these factors cannot be controlled and/or avoided. For this reason, assessing the quality of captured fingerprints is important for a fingerprint recognition system. (Qi et al., 2005) proposed a hybrid scheme to measure quality by considering local and global features. They used seven quality indices and analyzed the correlation between the quality value and each quality index. (Chen et al., 2005) suggested quality estimation methods based on the power spectrum analysis in the 2-D Fourier domain and coherence in the spatial domain. ISO/INCITS-M1 (International Standards Organization/International Committee for Information Technology Standards) has established a biometric sample-quality draft standard (Int. Com. Inf. Technol. Standards, 2005), in which a biometric sample quality is considered from three different points of view: 1) character, which refers to the quality attributable to inherent physical features of the subject; 2) fidelity, which is the degree of similarity between a biometric sample and its source, attributable to each step through which the sample is processed; and 3) utility, which refers to the impact of the individual biometric sample on the overall performance of a biometric system, where the concept of sample quality is a scalar quantity that is related monotonically to the performance of the system. The character of the sample source and the fidelity of the processed samples contribute to, or similarly detract from, the utility of the sample. It is generally accepted that the utility is most importantly mirrored by a quality metric (Grother & Tabassi, 2007), so that images assigned higher quality shall necessarily lead to better identification of individuals (i.e., better separation of genuine and impostor match score distributions). A theoretical framework for a biometric sample quality has been developed by (Youmaran & Adler, 2006), where they relate biometric sample quality with the identifiable information contained. They defined “Biometric information” (BI) as the decrease in uncertainty about the identity of a person due to a set of biometric measurements. BI is calculated by the relative entropy between the population feature distribution and the person’s feature distribution. The results reported by (Youmaran & Adler, 2006) show that degraded biometric samples result in a decrease in BI. In (Van derWeken et al., 2007) , a number of quality metrics can be found aimed at objectively assessing the quality of an image in terms of the similarity between a reference image and a degraded version of it. Finally, (Lee et al., 2008) proposed a fingerprint-quality measurement method based on the shapes of several probability density functions (PDFs). The 2-D gradients of the fingerprint images are first separated into two sets of 1-D gradients. Then, the shapes of the PDFs of these gradients are measured in order to determine the fingerprint quality. 12. Conclusion In this chapter, we have presented a study covering different automatic fingerprint recognition techniques, presented by the experts in this field. Although many academic and commercial systems for fingerprint recognition exist, there is a necessity for further research in this topic in order to improve the reliability and performances of the current systems. Many unresolved problems still need to be explored and investigated. For example, for a Minutiae-based Fingerprint Extraction and Recognition 73 large automated fingerprint identification system, the recognition accuracy, matching speed and its robustness to poor image quality are normally regarded as the most critical elements of system performance. Also, fast comparison algorithm is necessary since most minutiae- based matching algorithms will fail to meet the high speed requirement. Further, matching partial fingerprints still needs lots of improvement. The major challenges faced in partial fingerprint matching are the absence of sufficient level 2 features (minutiae) and other structures such as core and delta. Thus, common matching methods based on alignment of singular structures would fail in case of partial prints. Pores (level 3 features) on fingerprints have proven to be discriminative features and have recently been successfully employed in automatic fingerprint recognition systems. Finally, there is still a lot of research to be done when dealing with latent fingermarks. Low quality, incompletion and distortion are typical problems that forensic fingerprint recognition systems have to face when extracting features from latent fingermarks. 13. Acknowledgments The author would like to acknowledge and thank Kuwait Foundation for the Advancement of Sciences (KFAS) for financially supporting this work. 14. References Abutaleb, A. S. & Kamel, M. (1999). A Genetic Algorithm for the Estimation of Ridges in Fingerprints, IEEE Trans. On Image Processing, Vol. 8, No. 8, pp. 1134-1139, AUGUST 1999 Adelson, E. H.; Anderson, C. H.; Bergen, J. R.; Burt, P. J. & Ogden, J. M. (1984). Pyramid methods in image processing, RCA Eng., Vol. 29, No. 6, pp. 33–41, 1984 Alibeigi, E.; Rizi M. T. & Behnamfar, P. (2009). Pipelined Minutiae Extraction From Fingerprint Images, Proceedings of the IEEE, 2009 Bazen, A.M. & Gerez, S.H. (2003). Fingerprint Matching by Thin-Plate Spline Modelling of Elastic Deformations, Pattern Recognition, vol. 36, no. 8, pp. 1859-1867, Aug. 2003 Bazen, A.M.; Verwaaijen, G.T.B.; Gerez, S.H.; Veelenturf, L.P.J. & Van der Zwaag, B.J. (2000). A Correlation-Based Fingerprint Verification System, Proc. 11th Ann. Workshop Circuits Systems and Signal Processing, pp. 205-213, Nov. 2000 Bigun, J. (2006). Vision With Direction. New York: Springer, 2006 Biometric System Lab., University of Bologna, Cesena-Italy. (www.csr.unibo.it/research/biolab/) Bowen, J.D. (1992). The Home Office Automatic Fingerprint Pattern Classification Project, Proc. IEE Colloquiym Neural Network for Image Processing Applications, 1992 Candela, G.T. & Chellappa, R. (1993). Comparative Performance of Classification Methods for Fingerprints, NIST Technical Report NISTIR 5163, Apr. 1993 Candela, G.T. & et al. (1995). PCASYS—A Pattern-Level Classification Automation System for Fingerprints, NIST Technical Report NISTIR 5647, Aug. 1995 Cappelli, R.; Lumini, A.; Maio, D. & Maltoni, D. (1999). Fingerprint Classification by Directional Image Partitioning, IEEE Trans. On Pattern Analysis And Machine Intelligence, Vol. 21, No. 5, pp. 402-421, MAY 1999 Biometrics 74 Ceguema, A. V. & Koprinska, I. (2002). Integrating Local and Global Features in Automatic Fingerprint Verification, Proceedings of the 16th International Conference on Pattern Recognition, 2002 Chen, Y.; Dass, S. & Jain, A. (2005). Fingerprint quality indices for predicting authentication performance, Proceedings 5th Int. Conf. Audioand Video-based Biometric Person Authentication, Rye Brook, NY, Jul. 2005, pp. 160–170 Chen, F.; Zhou, J. & Yang, C. (2009). Reconstructing Orientation Field From Fingerprint Minutiae to Improve Minutiae-Matching Accuracy, IEEE Transactions On Image Processing, Vol. 18, No. 7, pp. 1665- 1670, JULY 2009 Chikkerur, S. & Govindaraju, V. (2005). Fingerprint image enhancement using STFT analysis, Proc. Int. Workshop on Pattern Recognition for Crime Prevention, Security and Surveillance, 2005, pp. 20–29 Choi, H.; Choi, K. & Kim, J. (2010). Fingerprint Matching Incorporating Ridge Features with Minutiae, Prodeedings of the IEEE, 2010 Conti, V.; Militello, C.; Sorbello, F. & Vitabile, S. (2010). Introducing Pseudo-Singularity Points for Efficient Fingerprints Classification and Recognition, International Conference on Complex, Intelligent and Software Intensive Systems, 2010 Dadgostar, M.; Tabrizi, P. R.; Fatemizadeh, E. & Soltanian-Zadeh, H. (2009). Feature Extraction Using Gabor-Filter and Recursive Fisher Linear Discriminant with Application in Fingerprint Identification, 7th Int'l Conf. on Advances in Pattern Recognition, 2009 Donahue, M.J. & Rokhlin, S.I. (1993). On the Use of Level Curves in Image Analysis, Image Understanding, Vol. 57, no. 3, pp. 185-203, 1993 Feng, J. & Jain, A. K. (2011). Fingerprint Reconstruction: From Minutiae to Phase, IEEE Trans. On Pattern Analysis And Machine Intelligence, Vol. 33, No. 2, pp. 209-223, FEB. 2011 Feng, J.; Ouyang, Z. & Cai, A. (2006). Fingerprint Matching Using Ridges, Pattern Recognition, vol. 39, no. 11, pp. 2131-2140, 2006 Fronthaler, H.; Kollreider, K. & Bigun, J. (2008). Local Features for Enhancement and Minutiae Extraction in Fingerprints, IEEE Transactions On Image Processing, Vol. 17, No. 3, pp. 354-363, MARCH 2008 Galton, F. (1892). Finger Prints. London: McMillan, 1892 Gamassi, M.; Piuri, V. & Scotti, F. (2005). Fingerprint local analysis for high-performance minutiae extraction, IEEE ICIP, Vol. 3, pp. 265-268, Sept 2005 Grother, P. & Tabassi, E. (2007). Performance of biometric quality measures, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 29, No. 4, pp. 531–543, Apr. 2007 Hara, M. & Toyama, H. (2007). Method and Apparatus for Matching Streaked Pattern Image, US Patent No. 7,295,688, 2007 Henry, E.R. (1900). Classification and Uses of Finger Prints. London: Routledge, 1900 Hong, L.; Wan, Y. & Jain, A. (1998). Fingerprint Image Enhancement: Algorithm and Performance Evaluation, IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 20, No. 8, pp. 777-789, AUGUST 1998 Hughes, P.A. & Green, A.D.P. (1991). The Use of Neural Network for Fingerprint Classification, Proc. Second Int’l Conf. Neural Network, pp. 79-81, 1991 International Biometric Group. (2008). Analysis of Level 3 Features at High Resolutions (Phase II), 2008 Minutiae-based Fingerprint Extraction and Recognition 75 International Com. Inf. Technol. Standards. (2005). Biometric Sample Quality Std. Draft (Rev. 4), Document M1/06-0003, 2005 Jain, A.; Chen, Y. & Demirkus, M. (2006). Pores and ridges: Fingerprint matching using level 3 features, Proc. 18th Int. Conf. Pattern Recog., 2006, pp. 477–480 Jain, A.; Chen, Y. & Demirkus, M. (2007). Pores and ridges: High-resolution fingerprint matching using level-3 features, IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 15–27, Jan. 2007 Jolliffe, I.T. (1986). Principle Component Analysis. New York: Springer-Verlag, 1986 Joun, S.; Kim, H.; Chung, Y. & Ahn, D. (2003). An experimental study on measuring image quality of infant fingerprints, Proceedings of the KES, 2003, pp. 1261–1269 Kamijo, M. (1993). Classifying Fingerprint Images Using Neural Network: Deriving the Classification State, Proc. Third Int’l Conf. Neural Network, pp. 1,932-1,937, 1993 Kaur, R.; Sandhu, P. S. & Kamra, A. (2010). A Novel Method For Fingerprint Feature Extraction, International Conference on Networking and Information Technology, 2010 Kawagoe, M. & Tojo, A. (1984). Fingerprint Pattern Classification, Pattern Recognition, vol 17, no. 3, pp. 295-303, 1984 Kryszczuk, K. & Drygajlo, A. (2006). Singular point detection in fingerprints using quadrant change information, 18th International Conf. on Pattern Recognition, 2006 Lee, C.-J. & Wang, S.-D. (1999). Fingerprint feature extraction using Gabor filters, Electronics Letters, 18th February, Vol. 35 No. 4, pp. 288-290, 1999 Lee, S.; Choi, H.; Choi, K. & Kim, J. (2008). Fingerprint-Quality Index Using Gradient Components, IEEE Tran. On Information Forensics And Security, Vol. 3, No. 4, pp. 792-800, DEC. 2008 Liu, C. N. & Shelton, G. L. Jr. (1970). Computer-Assisted Fingerprint Encoding and Classification, IEEE Tran. on Man-Machine Systems, , pp. 156-160, SEPT. 1970 Liu, T.; Hao, P. & Zhang, C. (2005). Fingerprint Singular Points Detection and Direction Estimation with a “T” Shape Model, Proceedings of the AVBPA 2005, Springer, Hilton Rye Town, NY, USA, July 2005 pp. 201-207 Maio, D. & Maltoni, D. (1996). A Structural Approach to Fingerprint Classification, Proc. 13th ICPR, Vienna, Aug. 1996 Maio, D.; Maltoni, D. & Rizzi, S. (1996). Dynamic Clustering Of Maps In Autonomous Agents, IEEE Trans. Pattern Anal. Mach. Intell, Vol. 18, No. 11, pp. 1,080-1,091, Nov. 1996 Malathi, S. & Meena, C. (2010). An efficient method for partial fingerprint recognition based on Local Binary Pattern, ICCCCT’2010 Militello, C.; Conti, V.; Sorbello, F. & Vitabile, S. (2008). A Novel Embedded Fingerprints Authentication System Based on Singularity Points, International Conference on Complex, Intelligent and Software Intensive Systems, 2008 Min, M. M. & Thein, Y. (2009). Intelligent Fingerprint Recognition System by Using Geometry Approach, Proceedings of the IEEE, 2009 Moayer, B. & Fu, K.S. (1975). A Syntactic Approach to Fingerprint Pattern Recognition, Pattern Recognition, vol. 7, pp. 1-23, 1975 Moayer, B. & FU, K.-S. (1976). A Tree System Approach for Fingerprint Pattern Recognition, IEEE Trans. On Computers, Vol. C-25, NO. 3,pp.262-274, March 1976 Biometrics 76 Montesanto, A.; Baldassarri, P.; Vallesi, G. & Tascini, G. (2007). Fingerprints Recognition Using Minutiae Extraction: a Fuzzy Approach, 14th International Conference on Image Analysis and Processing, 2007 Moscinska, K. & Tyma, G. (1993). Neural Network Based Fingerprint Classification, Proc. Third Int’l Conf. Neural Network, pp. 229-232, 1993 On, C. K.; Pandiyan, P. M.; Yaacob, S. & Saudi, A. (2006). Fingerprint Feature Extraction Based Discrete Cosine Transformation (DCT), Proceedings of the ICOCI, 2006 Parsons, N. R.; Smith, J. Q.; Thonnes, E.; Wang, L. & Wilson, R. G. (2008). Rotationally invariant statistics for examining the evidence from the pores in fingerprints, Law, Probability and Risk, vol. 7, no. 1, pp. 1–14, Mar. 2008 Puertas, M.; Ramos, D.; Fierrez, J.; Ortega-Garcia, J. & Exposito, N. (2010). Towards a Better Understanding of the Performance of Latent Fingerprint Recognition in Realistic Forensic Conditions, International Conference on Pattern Recognition, 2010 Qi, J.; Abdurrachim, D.; Li, D. & Kunieda, H. (2005). A hybrid method for fingerprint image quality calculation, Proceedings of the 4th IEEE Workshop Automatic Identification Advanced Technologies, 2005, pp. 124–129 Qi, J.; Xie, M. & Wang, W. (2008). A Novel Fingerprint Matching Method Using a Curvature- Based Minutia Specifier, Proceedings of the 15th International IEEE Conference on Image Processing, 2008, pp.1488-1491 Rao, K. & Balck, K. (1980). Type Classification of Fingerprints: A Syntactic Approach, IEEE Trans. Anal. Mach. Intell, vol. 2, no. 3, pp. 223-231, 1980 Ratha, N.K.; Bolle, R.M.; Pandit, V.D. & Vaish, V. (2000). Robust Fingerprint Authentication Using Local Structural Similarity, Proc. Fifth IEEE Workshop Applications of Computer Vision, pp. 29-34, 2000 Ratha, N. K.; Karu, K.; Chen, S. & Jain, A. K. (1996). A Real-Time Matching System for Large Fingerprint Databases, IEEE Tran. On Pattern Analysis And Machine Intelligence, VOL. 18, NO. 8, AUGUST 1996 Senior, A. (1997). A Hidden Markov Model Fingerprint Classifier, Proc. 31st Asilomar Conf. Signals, Systems, and Computers, pp. 306-310, 1997 Simoncelli, E. P. & Freeman, W. T. (1995). The steerable pyramid:Aflexible architecture for multi-scale derivative computation, Proc. Int. Conf. Image Processing, Washington, DC, Oct. 1995, vol. 3, pp. 23–26 Sherlock, B. G.; Monro, D. M. & Millard, K. (1994). Fingerprint enhancement by directional fourier filtering, Vision Image and Signal Processing, Vol. 141, No. 2, pp. 87–94, 1994 Specht, D.F. (1990). Probabilistic Neural Network, Neural Networks, vol. 3, no. 1, pp. 109-118, 1990 Stock, R.M. & Swonger, C.W. (1969). Development and Evaluation of a reader of Fingerprint Minutiae, Cornell Aeronautical Laboratory, Technical Report CAL no. XM-2478-X- 1:13-17, 1969 Stosz, J. & Alyea, L. (1994). Automated system for fingerprint authentication using pores and ridge structure, Proceedings of SPIE, Vol. 2277, pp. 210-223, 1994 Tachaphetpiboon, S. & Amornraksa, T. (2007). Fingerprint Features Extraction Using Curve- scanned DCT Coefficients, Proceedings of Asia-Pacific Conference on Communications, 2007 Tachaphetpiboon, S. & Amornraksa, T. (2005). Fingerprint matching method using zigzag- scanned DCT coefficients, Proc. of ITC-CSCC’05, pp. 1171-1172, July 2005 Minutiae-based Fingerprint Extraction and Recognition 77 Thebaud, L.R. (1999). Systems and Methods with Identity Verification by Comparison and Interpretation of Skin Patterns Such as Fingerprints, US Patent No. 5,909,501, 1999 The Science of Fingerprints: Classification and Uses, Federal Bureau of Investigation. Washington, D.C.: US. Government Printing Office, 1984 Tico, M.; Immonen, E.; Ramo, P.; Kuosmanen, P. & Saarinen, J. (2001). Fingerprint recognition using wavelet features, Proc. of ISCS’01, Vol. 2, pp. 21-24, May 2001 Vaikol, S.; Sawarkar, S.D.; Hivrale, S. & Sharma, T. (2009). Minutiae Feature Extraction From Fingerprint Images, IEEE Int'l Advance Comp. Conf. , India, 6-7 March 2009 Van derWeken, D.; Nachtegael, M. & Kerre, E. (2007). Combining neighborhood- based and histogram similarity measures for the design of image quality measures, Image and Vision Computing, Vol. 25, pp. 184–195, 2007 Wahab, A.; Chin, S.H. & Tan, E. C. (1998). Novel approach to automated fingerprint recognition, Prodeedings of the IEE, 1998 Wang, H.; Min, L.Q. & Liu, J. (2008). Robust designs for Fingerprint Feature Extraction CNN with Von Neumann Neighborhood, International Conference on Computational Intelligence and Security, 2008 Watson, C. I.; Candela, G. T. & Grother, P. J. (1994). Comparison of FFT fingerprint filtering methods for neural network classification, NISTIR, vol. 5493, 1994 Wei-bo, Z.; Xin-bao, N. & Chen-jian, W. (2008). A Fingerprint Matching Algorithm Based on Relative Topological Relationship among Minutiae, Proceedings of the of International IEEE Conference Neural Networks & Signal Processing, pp. 225–228, 2008 Willis, A. & Myers, L. (2001). A cost-effective fingerprint recognition system for use with low-quality prints and damaged fingerprint, Pattern Recognition, Vol. 34, No. 2, pp. 255–270, Feb. 2001 Xu, H. & Veldhuis, R. N.J. (2009). Spectral Minutiae Representations of Fingerprints Enhanced by Quality Data, Proceedings of the IEEE, 2009 Xu, H. & Veldhuis, R. N.J. (2010). Complex Spectral Minutiae Representation For Fingerprint Recognition, Proceedings of the IEEE, 2010 Xu, H.; Veldhuis, R. N. J.; Bazen, A. M.; Kevenaar, T. A. M.; Akkermans, T. A. H. M. & Gokberk, B. (2009). Fingerprint Verification Using Spectral Minutiae Representations, IEEE Transactions On Information Forensics And Security, Vol. 4, No. 3, pp. 397-409, SEP. 2009 Xu, H.; Veldhuis, R. N. J.; Kevenaar, T. A. M. & Akkermans, T. A. H. M. (2009). A Fast Minutiae-Based Fingerprint Recognition System, IEEE SYSTEMS JOURNAL, Vol. 3, No. 4, pp. 418-427, December 2009 Yahagi, H.; Igaki, S. & Yamagishi, F. (1990). Moving-Window Algorithm For Fast Fingerprint Verification, Fujitsu Laboratories LTD., Atsugi 243-01, Japan, 1990 Yang, J.C.; Shin, J.W.; Min, B.J.; Lee, J.W.; Park, D.S. & Yoon, S. (2008). Fingerprint Matching using Global Minutiae and Invariant Moments, Congress on Image and Signal Processing, 2008 Youmaran, R. & Adler, A. (2006). Measuring biometric sample quality in terms of biometric information, Proceedings of the Biometrics Symp., Baltimore, MD, Sep. 2006. Zhang, P.; Li, C. & Hu, J. (2010). A Pitfall in Fingerprint Features Extraction, 11th Int. Conf. Control, Automation, Robotics and Vision, Singapore, 7-10th December 2010 Biometrics 78 Zhang, D.; Liu, F.; Zhao, Q.; Lu, G. & Luo, N. (2011). Selecting a Reference High Resolution for Fingerprint Recognition Using Minutiae and Pores, IEEE Trans. On Instrumentation And Measurement, Vol. 60, No. 3, pp. 863-871, MARCH 2011 Zhao, Q. & Jain, A. K. (2010). On the Utility of Extended Fingerprint Features:A Study on Pores, Proceedings of the IEEE, 2010 Zhao, Q.; Zhang, D.; Zhang, L. & Luo, N. (2009). High resolution partial fingerprint alignment using pore-valley descriptors, Pattern Recognition, vol. 43, no. 3, pp. 1050–1061, Aug. 2009 Zhao, Q.; Zhang, L.; Zhang, D.; Luo, N. & Bao, J. (2008). Adaptive pore model for fingerprint pore extraction, Proc. 18th Int. Conf. Pattern Recog., 2008, pp. 1–4 4 Non-minutiae Based Fingerprint Descriptor Jucheng Yang School of information technology, Jiangxi University of Finance and Economics Ahead Software Company Limited, Nanchang China 1. Introduction Fingerprint recognition refers to the techniques of identifying or verifying a match between human fingerprints. Fingerprint recognition has been one of the hot research areas in recent years, and it plays an important role in personal identification (Maio et al., 2003). A general fingerprint recognition system consists of some important steps, such as fingerprint pre- processing, feature extraction, matching, and so on. Usually, a descriptor is defined to identify an item with information storage. A fingerprint descriptor is used to descript and represent a fingerprint image for personal identification. Various fingerprint descriptors have been proposed in the literature. Two main categories for fingerprint descriptors can be classified into minutiae based and non-minutiae based. Minutiae based descriptors (Jain et al. 1997a; Jain et al. 1997b; Liu et al. 2000; Ratha et al. 1996; He et al. 2007; Cappelli et al. 2011) are the most popular algorithms for fingerprint recognition and are sophisticatedly used in fingerprint recognition systems. The major minutia features of fingerprint ridges are: ridge ending, ridge bifurcation and so on (Maio et al., 2003). Minutiae based descriptor use a feature vector extracted from fingerprints as sets of points in a multi-dimensional space, which comprise several characteristics of minutiae such as type, position, orientation, etc. The matching is to essentially search for the best alignment between the template and the input minutiae sets. However, due to the poor image quality and complex input conditions, minutiae are not easy to be accurately determined, thus it may result in low matching accuracy. In addition, minutiae based descriptors may not fully utilize the rich discriminatory information available in the fingerprints with high computational complexity. Non-minutiae based descriptors (Amornraksa & Tachaphetpiboon, 2006; Benhammadi, et al. 2007;Jain et al.,2000; Jin et al., 2004; Nanni and Lumini, 2008; Nanni & Lumini, 2009; Ross, et al. 2003; Sha et al. 2003;Tico et al. 2001; Yang et al., 2006; Yang & Park, 2008a; Yang & Park, 2008b), however, overcome the demerits of the minutiae based method. It uses features other than characteristics of minutiae from the fingerprint ridge pattern, such as local orientation and frequency, ridge shape, and texture information. It can extract more rich discriminatory information and abandon the pre-processing process such as binarization and thinning and post minutiae processing. Other merits are listed by using the non- minutiae based methods, such as a high accuracy; a fast processing speed; a fixed length feature vector; easily coupled with other system; being combined with Biohashing and so on. Biometrics 80 Among various non-minutiae based descriptors, Gabor feature-based ones (Jain et al.,2000; Sha et al. 2003) present a relatively high matching accuracy by using a bank of Gabor filters to capture both the local and global details in a fingerprint, and represent them as a compact fixed-length fingerCode. Ross et al. (2003) describes a hybrid fingerprint descriptor that uses both minutiae and a ridge feature map constructed by a set of eight Gabor filters. Benhammadi et al. (2007) also propose a new hybrid fingerprint descriptor based on minutiae texture maps according to their orientations. It exploits the eight fixed directions of Gabor filters to generate its weighting oriented Minutiae Codes. Tico et al. (2001) propose a transform-based descriptor using digital wavelet transform (DWT) features. The features are obtained from the standard deviations of the DWT coefficients of the image details at different scales and orientations. Amornraksa & Tachaphetpiboon (2006) propose another transform-based descriptor using digital cosine transform (DCT) features. The transform methods show a high matching accuracy for inputs identical to one in its database. Jin et al. (2004) propose an improved transform-based descriptor based on the features extracted from the integrated wavelet and Fourier-Mellin transform (WFMT) framework. Multiple WFMT features can be used to form a reference invariant feature through the linearity property of FMT and hence to reduce the variability of the input fingerprint images. Nanni & Lumini (2008) proposed a hybrid fingerprint descriptor based on local binary pattern (LBP). Nanni & Lumini (2009) proposed another descriptor based on histogram of oriented gradients (HoG). Yang et al. (2006) propose a fingerprint descriptor using invariant moments (IM) with the learning vector quantization neural network (LVQNN) for matching, which use a fixed-size ROI (Region of Interest) to extract seven invariant moments as a feature vector. Its improved ones (Yang & Park, 2008a; Yang & Park, 2008b;) using tessellated invariant moments(Tessellated IM) or sub-regions IM with eigenvalue-weighted cosine (EWC) distance or nonlinear back-propagation Neural Networks (BPNN) to handle the various input conditions for fingerprint recognition. In this chapter, some state of the art non-minutiae based descriptors are first reviewed, and a non-minutiae based fingerprint descriptor with tessellated invariant moment features, feature selection with PCA (Principle Component Analysis), and a Support Vector Machine (SVM) for classification is proposed. The proposed descriptor basically uses moment features invariant to scale, position and rotation to increase the matching accuracy with a low computational load. It further pursues an improved performance by using the alignment and rotation after a sophisticated, reliable detection of a reference point. Having invariant characteristics in the proposed algorithm can significantly improve the performance for input images under various conditions. To fully utilize both the global and local ridge information while removing unwanted noises, the algorithm extracts features from tessellated cells around the reference point. Combining with the PCA for feature selection to reduce the dimension of feature vector and choose the distinct features. Matching with a SVM also contributes to the performance enhancement by simply assigning weights on different cells and classification with high accuracy. The chapter is organized as follows: some state of the art non-minutiae based descriptors are briefly reviewed in section 2. In section 3, a proposed non-minutiae based fingerprint descriptor with tessellated invariant moments and SVM is explained. And experimental results are illuminated in section 4. Finally, conclusion remarks are given in section 5. Non-minutiae Based Fingerprint Descriptor 81 2. Non-minutiae based descriptors It is important to establish descriptors to extract reliable, independent and discriminate fingerprint image features. Exception of the widely used minutiae descriptors, the non- minutiae based descriptors use features other than characteristics of minutiae from the fingerprint ridge pattern are able to achieve the characters of the mentioned fine traits . The features of these descriptors may be extracted more reliably than those of minutiae. The next sub-sections will introduce some classical and state of the art non-minutiae based descriptors, such as Gabor filters, DWT, DCT, WFMT, LBP, HOG, IM based. 2.1 Gabor filters based descriptor The Gabor filters based descriptors (Jain et al.,2000; Sha et al. 2003) have been proved with their effectiveness to capture the local ridge characteristics with both frequency-selective and orientation-selective properties in both spatial and frequency domains. They describe a new texture descriptor scheme called fingerCode which is to utilize both global and local ridge descriptions to represent a fingerprint image. The features are extracted by tessellating the image around a reference point (the core point) determined in advance. The feature vector consists of an ordered collection of texture descriptors from some tessellated cells. Since the scheme assumes that the fingerprint is vertically oriented, to achieve invariance, image rotation is compensated by computing the features at various orientations. The texture descriptors are obtained by filtering each sector with 8 oriented Gabor filters and then computing the AAD (Average Absolute Deviation) of the pixel values in each cell. The features are concatenated to obtain the fingerCode. Fingerprint matching is based on finding the Euclidean distance between the two corresponding FingerCodes. However, the Gabor filters based descriptors are not rotation invariant. To achieve approximate rotation invariance, each fingerprint has to be represented with ten associated templates stored in the database, and the template with the minimum score is considered as the rotated version of the input fingerprint image. So these methods require a larger storage space and a significantly high processing time. Recently, some hybrid descriptors combined with Gabor filters are proposed. Ross et al. (2003) describes a hybrid fingerprint descriptor that uses both minutiae and a ridge feature map constructed by a set of eight Gabor filters. The ridge feature map along with the minutiae set of a fingerprint image is used for matching purposes. The hybrid matcher is proved to perform better than a minutiae-based fingerprint matching system by the author. Benhammadi et al. (2007) also propose a hybrid fingerprint descriptor based on minutiae texture maps according to their orientations. Rather than exploiting the eight fixed directions of Gabor filters for all original fingerprint images filtering process, they construct absolute images starting from the minutiae localizations and orientations to generate the Weighting Oriented Minutiae Codes. The extracted features are invariant to translation and rotation, which avoids the fingerprint pair relative alignment stage. Another Gabor filters based descriptor is proposed by Nanni & Lumini (2007), where the minutiae are used to align the images and a multi-resolution analysis performed on separate regions or sub-windows of the fingerprint pattern is adopted for feature extraction and classification. The features extracted are the standard deviation of the image convolved with 16 Gabor filters. The similarity measurement is done by the weighed Euclidean distance matchers with a sequential forward floating scheme. Biometrics 82 However, the matching accuracy of these hybrid approaches may be degraded for low- quality inputs, since the performance highly depends on extracting all minutiae precisely and reliably. 2.2 DWT based descriptor Tico et al. (2001) proposed a method using DWT features. The features are obtained from the standard deviations of the DWT coefficients of the entire image details at different scales and orientations to form a feature vector of length 12 (48 in total from four subimages). The normalized l 2 -norm of each wavelet sub-image is computed in order to create a feature vector. The feature vector represents an approximation of the image energy distribution. Fingerprint matching is based on k-Nearest Neighbor (KNN) with finding the Euclidean distance between the corresponding feature vectors. However, the features extraction from frequency domain with corresponding transforms are not rotation-invariant, so if the image input with rotation, the features from the same fingerprint image can be judged into the different ones. 2.3 DCT based descriptor Amornraksa & Tachaphetpiboon (2006) also propose a method using digital cosine transform (DCT) features for fingerprint matching. The standard deviations of the DCT coefficients located in six predefined areas from a 64×64 ROI are used as a feature vector of length 6 (24 in total from four sub-images) for fingerprint matching. To extract the informative features from a fingerprint image, the image is first cropped to a 64×64 pixel region, centred at the reference point, and then quartered to obtain four non- overlapping sub-images of size 32×32 pixels. Next, the DCT is applied to each sub-image to obtain a block of 32×32 DCT coefficients. Finally, the standard deviations of the DCT coefficients located in six predefined areas are calculated and used as a feature vector of length 6 (24 in total from four sub-images) for fingerprint matching. Fingerprint matching is also based on KNN with the Euclidean distance. However, the features are not rotation-invariant, too, to achieve rotation-invariant, so an improved method needs. 2.4 Integrated wavelet and the Fourier–Mellin transform (WFMT) based descriptor Jin et al. (2004) propose an improved transform-based descriptor extracted features from the integrated wavelet and Fourier-Mellin transform (WFMT) framework. Wavelet transform, with its energy compacted feature is used to preserve the local edges and reduce noise in the low frequency domain, is able to make the fingerprint images less sensitive to shape distortion. The Fourier–Mellin transform (FMT) serves to produce a translation, rotation and scale invariant feature. And multiple WFMT features can be used to form a reference invariant feature through the linearity property of FMT and hence reduce the variability of the input fingerprint images. Multiple m WFMT features can be used to form a reference WFMT feature and just only one representation per user needs to be stored in the database. Fingerprint matching is based on finding the Euclidean distance between the corresponding multiple WFMT feature vectors. However, the main disadvantage of this descriptor is that the reference point is based on a non-precise core point, and the descriptor requires a high time-consuming process if using the multiple WFMT features with training images. Non-minutiae Based Fingerprint Descriptor 83 2.5 Local binary pattern (LBP) based descriptor Nanni & Lumini (2008) proposed a fingerprint descriptor based on LBPs. A LBP proposed by Ojala et al. (2002) is a grayscale local texture operator with powerful discrimination and low computational complexity. Moreover, it is invariant to monotonic grayscale transformation, hence the LBP representation may be less sensitive to changes in illumination. The two fingerprints to be matched are first aligned using their minutiae, then the images are decomposed in several overlapping sub-windows; and a Gabor filter with LBP hybrid method (GLBP) also proposed by Nanni & Lumini (2008), instead of from the original sub-windows, each sub-window is convolved with a bank of Gabor filters and the invariant LBPs histograms are extracted from the convolved images. The matching value between two fingerprint images is performed by a complex distance function that takes into account the presence of different types of descriptors and different regions. However, the matching accuracy depends on extracting all minutiae precisely and reliably, and it may be degraded for low-quality inputs. 2.6 Histogram of oriented gradients (HoG) based descriptor Nanni & Lumini (2009) recently proposed a hybrid fingerprint descriptor based on HoG. HoG has been first proposed by Dalal & Triggs (2005) as an image descriptor for localizing pedestrians in complex images and has reached increasingly popularity. The aim of this descriptor is to represent an image by a set of local histograms which count occurrences of gradient orientation in a local cell of the image. The implementation of the HoG descriptors can be achieved by computing gradients of the image; dividing the image into small sub- regions; building a histogram of gradient directions and normalizing histograms within some groups of sub-regions for each sub-region to achieve a better invariance to changes in illumination or shadowing. The matching value between two fingerprint images is performed by a complex distance function, too. However, similar with the above minutiae based alignment methods, the matching accuracy depends on extracting all minutiae precisely and reliably. 2.7 IM based descriptor Except for the above non-minutiae based descriptor, recently, Yang et al. (2006) propose an IM based descriptor with the LVQNN for fingerprint matching, which use a fixed-size ROI to extract seven invariant moments as a feature vector. And its improved ones (Yang & Park, 2008a; Yang & Park, 2008b;) using tessellated IM or sub-regions IM with EWC distance or nonlinear BPNN to handle the various input conditions for fingerprint recognition. The details of the invariant moments are introduced as below: 2.7.1 Raw moments For a 2-dimensional continuous function f(x,y) the moment (sometimes called ‘raw moment’) of order (p + q) is defined as p q pq m x y f x y dxdy ( , ) ∞ ∞ −∞ −∞ = ∫ ∫ (1) for p, q= 0, 1, 2,… Biometrics 84 Adapting this to scalar (grey-tone) image with pixel intensities I(x,y), raw image moments are calculated by p q pq x y m x y I x y ( , ) = ∑∑ (2) In some cases, this may be calculated by considering the image as a probability density function, i.e., by dividing the above by x y I x y ( , ) ∑∑ (3) A uniqueness theorem (Papoulis, 1991) states that if f(x,y) is piecewise continuous and has nonzero values only in a finite part of the xy plane, moments of all orders exist, and the moment sequence (Mpq) is uniquely determined by f(x,y). Conversely, (Mpq) uniquely determines f(x,y). In practice, the image is summarized with functions of a few lower order moments. Simple image properties derived via raw moments include: 1. Area (for binary images) or sum of grey level (for grey-tone images): The Area (A) of the image can be extracted by the formula: 00 A=m (4) 2. Centroid point of the image: The centroid point x y { , } of the image can be extracted by the formula: 10 00 01 00 x y m m m m { , } { , } = (5) 2.7.2 Central moments a. The central moments are defined as p q pq x x y y f x y dxdy _ _ ( ) ( ) ( , ) μ ∞ ∞ −∞ −∞ = − − ∫ ∫ (6) Where 10 00 01 00 x m m and y m m _ _ = = (7) If f(x,y) is a digital image, then Eq.(6) becomes p q pq x y x x y y f x y _ _ ( ) ( ) ( , ) μ = − − ∑∑ (8) b. The meanings of the central moments are listed as below: 1. 20 μ represents he horizontal extension parameter of the image; 2. 02 μ represents vertical extension of the image; 3. 11 μ represents the gradient of the image, if 11 0 μ > ,it means the image inclines towards up-left; if 11 0 μ < ,it means the image inclines towards up-right; Non-minutiae Based Fingerprint Descriptor 85 4. 30 μ represents the excursion degree of the image’s barycenter on horizontal direction, when 30 0 μ > ,the barycenter inclines towards left; then if 30 0 μ < ,the barycenter inclines towards right ; 5. 03 μ represents the vertical excursion degree of the image’s barycenter, if 03 0 μ > ,the barycenter inclines towards upward, and inclines towards downward while 03 0 μ < ; 6. 21 μ represents the equilibrium degree about the image on the vertical direction, if 21 0 μ > ,the extension of the downside image is greater than the upside; then smaller when 21 0 μ < ; 7. 12 μ represents the equilibrium degree of the image about the vertical direction, when 12 0 μ > ,the image’s extension on the right side is greater than the left side, then smaller when 12 0 μ < . c. Normalized moments: Moments where p+q >=2 can also be invariant to both translation and changes in scale by dividing central moments by the properly scaled 00 μ moment, as the normalized central moments, denoted pq η , are defined as pq pq 00 γ μ η μ = (9) where p q 1 2 γ + = + for p+q = 2, 3… 2.7.3 Invariant moments As a region description method, invariant moments are used for texture analysis and pattern identification. A set of seven invariant moments derived from the second and the third moments were proposed by Hu (1962). As the equation shown below, Hu derived the expressions from algebraic invariants applied to the moment generating function under a rotation transformation. The set of moment invariants consist of groups of nonlinear centralized moment expressions. And they are scale, position, and rotation invariant (Gonzalez and woods, 2002). 1 20 02 2 2 2 20 02 11 2 2 3 30 12 21 03 2 2 4 30 12 21 03 2 2 5 30 12 30 12 30 12 21 03 2 2 21 03 21 03 30 12 21 03 6 20 02 30 12 30 4 3 3 3 3 3 3 3 ( ) ( ) ( ) ( ) ( ) ( )( )[( ) ( ) ] ( )( )[ ( ) ( ) ] ( )( )[( φ η η φ η η η φ η η η η φ η η η η φ η η η η η η η η η η η η η η η η φ η η η η η = + = − + = − + − = + + + = − + + − + + − + + − + = − + + 2 2 12 21 03 11 30 12 21 03 2 2 7 21 03 30 12 30 12 21 03 2 2 12 30 21 03 30 12 21 03 4 3 3 3 3 ) ( ) ] ( )( ) ( )( )[( ) ( ) ] ( )( )[ ( ) ( ) ] η η η η η η η η φ η η η η η η η η η η η η η η η η − + + + + = − + + − + + − + + − + (10) Biometrics 86 The values of the computed moment invariants are usually small values of higher order moment invariants are close to zero in some cases. So we reset the value range using the logarithmic function, the outputs of the moment invariants are mapped into i i i=1,2,...7 |lg(| |)| φ Φ = , which can take the values to the large dynamic range with a nonlinear scale. Figure 1 and Figure 2 show the invariant moments analysis on the four sub-images with 2D and 3D views, respectively. The experiments here are designed to check the characteristics of the invariant moment features’ invariance to rotation and scale. We choose four sub- images from a ROI of a fingerprint image, and divide the ROI into four sub-images. From the figures, we can see that the ridge valleys of the sub-images are totally different. (a) s1 (b) s2 (c) s3 (d) s4 Fig. 1. Sub-images of sub-images (a) s1 (b) s2 (c) s3 (d) s4 (a) s1 (b) s2 (c) s3 (d) s4 Fig. 2. 3D images of sub-images (a) s1 (b) s2(c) s3 (d) s4 And Table 1 and Table 2 show seven invariant moments of these four sub-images, and scale, rotation invariance of sub-image S1, respectively. As we can see from the tables, the invariant moments are nearly invariant to the scale and rotation invariance. Non-minutiae Based Fingerprint Descriptor 87 Sub- image 1 Φ 2 Φ 3 Φ 4 Φ 5 Φ 6 Φ 7 Φ S1 6.660157 22.792705 27.607114 29.742121 59.753254 41.222497 58.474584 S2 6.664572 20.127298 29.650909 29.742485 59.945926 39.811898 59.811871 S3 6.676695 21.271717 28.456694 29.916016 60.753962 41.309514 59.528609 S4 6.674009 20.209754 28.826501 29.101012 58.682184 39.417999 58.701581 Table 1. Seven invariant moments of the four sub-images Scale Rotation 1 Φ 2 Φ 3 Φ 4 Φ 5 Φ 6 Φ 7 Φ 0.8 0 6.659508 21.450452 27.459971 30.169709 59.518255 41.693181 59.301135 90 6.657605 21.641931 27.433996 31.337310 63.211065 43.918130 61.111226 180 6.657625 21.654452 27.376333 31.318065 61.275622 42.931670 60.714937 270 6.659334 21.473568 27.400168 30.162576 60.756198 41.433472 59.053503 1 0 6.660157 22.792705 27.607114 29.742121 59.753254 41.222497 58.474584 90 6.660157 22.792705 27.607114 29.742121 58.901367 41.222497 58.474584 180 6.660157 22.792705 27.607114 29.742121 59.753254 41.222497 58.474584 270 6.660157 22.792705 27.607114 29.742121 58.901367 41.222497 58.474584 2 0 6.659431 22.792705 27.607114 29.742121 59.753254 41.222497 58.474584 90 6.659431 22.792705 27.607114 29.742121 58.901367 41.222497 58.474584 180 6.659431 22.792705 27.607114 29.742121 59.753254 41.222497 58.474584 270 6.659431 22.792705 27.607114 29.742121 58.901367 41.222497 58.474584 Table 2. Scale, rotation invariance of sub-image S1 2.8 Tessellated invariant moments based descriptors In order to reduce the effects of noise and non-linear distortions, and speed up the processing time, tessellated IM based descriptors (Yang & Park, 2008a) using only a certain area (ROI) around the reference point at the centre as the feature extraction area instead of using the entire fingerprint is proposed. By adjusting the size of ROI and the number of the cells, we can capture both the local and global structure information around the reference point (see Figure 3 (a-b)). The ROI is partitioned into some non-overlapping rectangular cells as depicted in Figure 3 (a), e.g. the size of ROI is 64×64 pixels, 16 rectangular cells, and each cell has a size of 16×16 pixels. Invariant moment analysis introduced in section 2 is used as the features for the fingerprint recognition system. For each cell, a set of seven invariant moments are computed. Suppose 7 n n 1 { } φ φ = = is a set of invariant moments, and N n n 1 s s { } = = is the collection of all the cells, N 7 M n n 1 nk k 1 n 1 s s { } {{ } } φ = = = = = (11) where M is the number of the cells, N=7M is the total length of the collection. Furthermore, in order to improve the matching accuracy, weights w is assigned to each tessellated cell to distinguish the cell from the foreground or background, which can be used to resolve the embarrassment when the reference point is nearby the border of the Biometrics 88 fingerprint image. If the tessellated cells that contain a certain proportion of the background pixels, it will be labelled as background cells and the corresponding w is set to 0, else w equals 1. Therefore, a fingerprint can be represented by a fixed-length feature vector as N N n n 1 n n 1 f f ws { } { } = = = = (12) The length of the total feature vectors is N. (a) (b) Fig. 3. (a) Tessellated cells (local) or (b) tessellated cells (global) (Yang & Park, 2008a) 2.9 Summarization As the above analysis, all kinds of non-minutiae descriptor has proposed to hand the shortcomings of minutiae methods in the fingerprint recognition systems. Table 3 gives a summarization of some classical and state-of-art non-minutiae based descriptors in these recent years, such as Gabor filters, DWT, DCT, WFMT, LBP, HOG, IM based, and analyzes their feature extraction, alignment, and matching methods. Table 3 summarizes the classical and state of the art non-minutiae based method (EER is the equal error rate, FAR is the false acceptance rate, FRR is the false reject rate, and GAR is the genuine acceptance rate (GAR = 1-FRR)). From the table, we can see that based on the public database of FVC2002 DB2, the sub-regions IM method with BPNN has the lowest EER of 3.26%, which means its performance is the best among all the non-minutiae descriptors on the same public database listed in the table. 3. Proposed non-minutiae based descriptor Although the IM descriptor proposed in Yang & Park (2008a) and Yang & Park (2008b) use the tessellated or sub-regions IM to extract both the local and global features, comparing to Yang et al. (2006) just using only a fixed ROI, the tessellated or sub-regions ROIs are delineated from the same area of a fingerprint image, strong correlation may exist among the extracted features, and the dimension of the features are high. An improvement way is Non-minutiae Based Fingerprint Descriptor 89 proposed here to reduce the dimension of feature vector and choose the distinct features before matching. PCA is one of the oldest and best known techniques in multivariate analysis (Fausett, 1994). So we reduce the dimension of feature vector by examining feature covariance matrix and then selecting the most distinct features and using them to improve the verification performance. Besides, as a powerful classification tool, a SVM is proposed for fingerprint matching. Approaches Features Alignment methods Matching methods Performance analysis Databases Resutls Jain et al.(2000) Gabor filters Core point Euclidean distance Self FAR=4.5 FRR=2.8 Ross et al. (2003) Gabor filters& Minutiae Minutiae Euclidean distance Self EER=4 Benhammadi et al. (2007) Minutiae Gabor filters maps Minutiae Normalized distance FVC2002 DB2 EER=5.19 Nanni & Lumini (2007) Gabor filters Minutiae Weighed Euclidean distance FVC2002 DB 2 EER=3.6 Tico et al. (2001) DWT NO KNN with Euclidean distance Self &small Recognition rates 95.2 Amornraksa & Tachaphetpiboo n (2006) DCT NO KNN with Euclidean distance Self &small Recognition rates 99.23 Jin et al. (2004) WFMT Core point Euclidean distance FVC2002 DB 2 EER=5.309 Nanni & Lumini (2008) LBP Minutiae Complex distance function FVC2002 Db2 EER=6.2 Nanni & Lumini (2009) HoG Minutiae Complex distance function FVC2002 DB 2 EER=3.8 Yang et al. (2006) Seven IM Reference point LVQNN FVC2002 DB 2 FAR=0.6G AR=96.1 Yang & Park (2008a) Tessellated IM Reference point EWC distance FVC2002 DB 2 EER=3.78 Yang & Park (2008b) Sub-regions IM Reference point BPNN FVC2002 DB 2 EER=3.26 Table 3. Summarization of the classical and state of the art non-minutiae based method 3.1 Feature vector construction After the pre-processing steps of fingerprint enhancement, reference point determination and image alignment, ROI determination (Yang & Park, 2008a; Yang & Park, 2008b; Yang et Biometrics 90 al, 2008c), due to the good characteristics of reducing the effects of noise and non-linear distortions, we use the tessellated IM as the features, and a fingerprint can be represented by a fixed-length feature vector as { } N N N 7 n n 1 n n 1 nk k 1 n 1 f f ws = w { } { } { } φ = = = = = = , as descript in section 2.8, where 7 n n 1 { } φ φ = = is a set of invariant moments, and the length of the total feature vectors is N, w is the weight of distinguishing the cell from the foreground or background. For example, the ROI is partitioned into some non-overlapping rectangular cells with the size of ROI is 64×64 pixels, 16 rectangular cells, and each cell has a size of 16×16 pixels. Then the feature-vector with the elements consists of a sets of moments derived from tessellated ROIs. So the total length of the feature vector is 16×7=112. 3.2 Feature selection with PCA Here, the feature selection with PCA is briefly introduced. The objective of PCA is to reduce the dimensionality of the data set, while retaining as much as possible variation in the data set, and to identify new meaningful underlying variables. The basic idea in PCA is to find the orthonormal features. Let x ∈ Rn be a random vector, where n is the dimension of the input space. The first principal component is the projection in the direction in which the variance of the projection is maximized. The mth principal component is determined as the principal component of the residual based on the covariance matrix. The covariance matrix of x is defined as Ξ = E{[x-E(x)][x-E(x)] T }. Let u1, u2, . . . , un and λ1, λ2, . . . , λn be eigenvectors and eigenvalues of Ξ, respectively, and λ1 ≥ λ2 ≥. . . ≥ λn. Then, PCA factorizes Ξ into Ξ = UΛU T , with U = [u1, u2, . . . , un] and Λ = diag(λ1, λ2, . . . , λn). We need to note that once the PCA of Ξ is available, the best rank-m approximation of Ξ can be readily computed. Let P = [u1, u2, . . . , um], where m < n. Then y = P T x will be an important application of PCA in dimensionality reduction. 3.3 Matching with SVM SVMs (John & Nello, 2000) are a set of related supervised learning methods used for classification and regression. Viewing input data as two sets of vectors in an n-dimensional space, a SVM will construct a separating hyperplane in that space, one which maximizes the margin between the two data sets. To calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, which are ‘pushed up against’ the two data sets. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes, since in general the larger the margin the better the generalization error of the classifier. SVM is a powerful tool for solving small sample learning problem that offers favorable performance using linear or nonlinear function estimators. It is a type of neural network that automatically determines the structural components. In our study, we used a two-class SVM to classify the input fingerpints into the corresponding class. Let the training set be ( ) { } ,y m i i i 1 = x , with input vector i x (n components), i y 1 = ± indicates two different classes and 1, 2,..., . i m = The decision function of SVM is: m m i i i i i i 1 i 1 y sign y K b sign K b ( ( , ) ) ( ( , ) ) α ω = = = + = + ∑ ∑ x x x x (13) Non-minutiae Based Fingerprint Descriptor 91 Fig. 4. Illustration of a SVM structure Figure 4 illustrates the structure of a SVM, where i K( , ) x x is the output of the ith hidden node with respect to the input vector x , it is a mapping of the input x and the support vector i x in an alternative space (the so-called feature space), by choosing the kernel i i K( , ) ( ) ( ) φ φ = ⋅ x x x x . i α and b are the learning parameters of the hidden nodes, and i i i y ω α = is the weight of the ith hidden node connecting with the output node. The learning task of a SVM is to find the optimal i α by solving the maximization of the Lagrangian m m i i j i j i j i 1 i j 1 1 W y y K 2 , ( ) ( , ) α α α α = = = − ∑ ∑ x x (14) subject to the constraints m i i i i 1 y 0 0 c α α = = ≤ ≤ ∑ (15) The common kernel functions i K( , ) x x are defined as : T i Degree T i 2 i i T i Linear c Polynomial K Radial basis c Sigmoid ( ) ( , ) exp( ) tanh( ) γ γ γ ⎧ ⋅ ⎪ ⋅ ⋅ + ⎪ = ⎨ − ⋅ − − ⎪ ⎪ ⋅ ⋅ + ⎩ x x x x x x x x x x (16) Where γ ,c ,Degree are the kernel parameters. 4. Experimental results The proposed algorithm was evaluated on the fingerprint images taken from the public FVC2002 database (Maio et al. 2002), which contains 4 distinctive databases: DB1_a, DB2_a, DB3_a and DB4_a. The resolution of DB1_a, DB3_a, and DB4_a is 500 dpi, and that of DB2_a is 569 dpi. Each database consists of 800 fingerprint images in 256 gray scale levels (100 persons, 8 fingerprints per person). Fingerprints from 101 to 110 (set B) have been made available to the participants to allow parameter tuning before the submission of the Biometrics 92 algorithms; the benchmark is then constituted by fingers numbered from 1 to 100 (set A). In our experiments, the used FVC2002databases are set A. 4.1 Performances with a SVM matching In the experiments, a SVM is used to verify a matching between feature vectors of input fingerprint and those of template fingerprint, the number of the output class is the same with the verifying persons. For each input fingerprint and its template fingerprint, we compute the IM features. Since the output is to judge whether the input fingerprint is match or non-match according to the identity ID, we can take the matching process as a two-class problem. In the training stage, training samples after normalization and scale processing (Hao, et al. 2007) are fed to a SVM with indicating their corresponding class. The features are computed from the training data, each contains vector from the training fingerprint, and the identity ID of the corresponding class is used to guide the classifying results through the SVM. While in the testing stage, test samples with the same normalization and scale processing are fed to the SVM to produce the output values. Similarly, the features are computed from the testing data, each contains vector from the test fingerprint with the corresponding identity ID. The element of the output values is restricted in the class number. If the output number is equal the corresponding ID, then it means match, vice visa. To evaluate the performance of the verification rate of the proposed method, the receiver operating characteristic (ROC) curve is used. An ROC curve is a plot of false reject rate (FRR) against false acceptance rate (FAR). The FRR and FAR are defined as follows: FRR 100 Number of rejected genuine claims % Total number of genuine accesses = × (17) FAR 100 Number of accepted imposter claims % Total number of imposter accesses = × (18) The equal error rate (EER) is also used as a performance indicator. The EER indicates the point where the FRR and FAR are equal, as below. EER FAR FRR 2 FAR FRR ( ) , if = + = (19) For evaluating the recognition rate performance of the proposed descriptor, we did experiments on FVC2002 databases DB2_a. We divided the database into a training set and a testing set. Six out of eight fingerprints from each person were chosen for training and all the eight fingerprints for testing. For a database, therefore, 600 patterns were used for training, and 800 for testing. In the experiments, the 64×64 pixels size of ROI was adopted, and the ROI was tessellated into 16 rectangular cells with each cell had a size of 16×16 pixels. So the length of the feature vector was 16×7=112. In the feature selection experiments, the feature-vector with the elements consisting of a sets of moments derived from tessellated ROIs was processed with PCA. The new dimensional features are uncorrelated from original features due to the PCA analysis. Figure 5 describes the egienvalues spectrum with different elements by using PCA for feature selection. Since our feature vector contains 112 elements, from the figure, we can see that almost all the egienvalues spectrum with the eigenvectors elements are below than 70, if we keep 95% energy, then eigenvectors element of 50 was chosen. Non-minutiae Based Fingerprint Descriptor 93 Fig. 5. The egienvalues spectrum with different elements by using PCA for feature selection Fig. 6. The curves of the recognition rate of the proposed descriptor by different types of SVM with different γ value on the database FVC2002 DB2_a As we know from the section 3.3, the linear SVM has no parameters, the Radial-basis SVM has only one paramter γ , the Polynominal SVM has three parameters of c, γ , Degree and Biometrics 94 the Sigmoid SVM has two paramters of c, γ . For evaluating the recognition rate performance of the proposed tessellated IM based descriptor with different types of SVM, Figure 6 describes the curves of the recognition rate of the proposed descriptor by different types of SVM with different γ value on the database FVC2002 DB2_a. In our experiments, the parmeters of c and Degree are fixed and determined by experiments, and the egienvalues element is 50. From the figure, we can see that the recognition rates of the proposed descriptor of all nonlinear types of SVM are growing up with the γ value, and the proposed descriptor can achieve high recognition rates with the nonlinear types of SVM. Figure 7 shows the curves of the recognition rate of the proposed descriptor by different γ value of the radial-basis SVM with different PCA egienvalues elements on the database FVC2002 DB2_a. From the figure, we can see that the recognition rate is growing up with the γ value of the Radial-basis SVM. And the descriptor with different eginevalue elements PCA has different recognition rate, with small eginevalue(such as 10) the recognition rate is lower than the PCA with large eginevalue (such as 20) under the same γ value, and the best recognition rate achieves by the egienvalue element equals 50. However, all the recognition rate of the descriptor with PCA may have better performances comparing with the descriptor without PCA. 4.2 Comparing with other descriptors The performances of several descriptors aimed at evaluating the usefulness of our descriptor were compared; all the descriptors were based on the two-stage enhanced image: 1. Gabor filters based, a descriptor using both minutiae and a ridge feature map with eight directions for fingerprint matching according to the method of Ross et al. (2003), and the same parameters as their paper are used; 2. DCT based, a descriptor using DCT features for fingerprint matching according to the method of Amornraksa & Tachaphetpiboon (2006), and the same parameters as their paper are used; 3. WFMT based, a descriptor using WFMT features for fingerprint matching according to the method of Jin et al. (2004), and the same parameters as their paper are used; 4. LBP based, a descriptor using the invariant local binary patterns features (P = 8; R=1; n=10) according to the method of Nanni & Lumini (2008). 5. Tessellated IM based, the 64×64 pixels ROI was tessellated into 16 rectangular cells with each cell had a size of 16×16 pixels and matching with SVM. In our experiment, to compute the FAR and the FRR, the genuine match and impostor match were performed on the four sub-databases of FVC2002 database. We divided each database into a training set and a testing set using a 25% jackknife method. Six out of eight fingerprints from each person were chosen for training and the remaining two for testing. For a database, therefore, 600 patterns are used for training, and 200 for testing. For genuine match, each fingerprint of each person is compared with other fingerprints of the same person. And for impostor match, each test fingerprint is compared with the fingerprints belonged to other persons. Since there are 200 test patterns for an experiment, the number of matches for genuine and impostor are 6×200=1200 and 99×200=19800 for each database. The same experiments are repeated four times by selecting different fingerprints for training and testing, and then the average of four experimental results becomes the final performance. Non-minutiae Based Fingerprint Descriptor 95 Fig. 7. The curves of the recognition rate of the proposed descriptor by different γ values of the radial-basis SVM with different PCA egienvalues elements on the database FVC2002 DB2_a Fig. 8. ROC curves of different methods on database FVC2002 DB2_a Biometrics 96 Figure 8 compares the ROC curves of the descriptors of Gabor filters, DCT, WFMT, LBP based with tessellated IM based descriptor on the database FVC2002 DB2_a. On database containing most poor quality images such as FVC2002 DB2_a, the FRRs of all the descriptors slowly drop down with respect to their FARs; The ROC curves of the Figure 8 prove the facts. On the other hand, we can see that the ROC curve of tessellated IM based descriptor (solid line) is below those of the descriptors of Gabor filters, DCT, WFMT, LBP based (dashed lines), which means that tessellated IM based descriptor outperforms the compared descriptors. And Table 4 illustrates the EER (%) performances of the descriptors of Gabor filters based, DCT based, WFMT based and LBP based with Tessellated IM based over the databases of FVC2002. From the table, we can see that the Tessellated IM based descriptor has the best of results with an average EER of 2.36% over the four sub-databases of FVC2002, while those of the descriptors of Gabor filters, DCT, WFMT, LBP based are 4.17%, 5.68%,4.66%,5.97%, respectively. DB1_a DB2_a DB3_a DB4_a Average Gabor filters based 1.87 3.98 4.64 6.21 4.17 DCT based 2.96 5.42 6.79 7.53 5.68 WFMT based 2.43 4.41 5.18 6.62 4.66 LBP based 3.21 5.62 6.82 8.23 5.97 Tessellated IM based 1.42 2.23 2.48 3.31 2.36 Table 4. EER(%) performance of the descriptors of Gabor filters, DCT, WFMT , LBP based and tessellated IM based over the databases of FVC2002 5. Conclusion In this chapter, we emphases on the introduction of the designing approaches and implementing technologies for non-minutiae based fingerprint descriptors. We firstly review some classical and state of the art fingerprint descriptor, analyze their feature extraction, alignment method and matching methods, and propose a non-minutiae based fingerprint descriptor by using tessellated IM features with a feature selection of PCA and a SVM classifier. The scheme consists of other three important steps after the pervious pre- processing: feature vector construction, feature selection with PCA, and matching with a SVM. Experimental results show that our proposed descriptor has better performances of matching accuracy comparing to other prominent descriptors on public databases. The contributions of this chapter are that we analyze and summarize some classical and state of the art non-minutiae based descriptors and propose a new improved one with tessellated IM features, feature selection by PCA and intelligent SVM to represent fingerprint image in order to effectively handle various input conditions. Further works need to be explored for robustness and reliability of the system. 6. Acknowledgment This work is supported by the National Natural Science Foundation of China (No. 61063035), and it is also supported by Merit-based Science and Technology Activities Foundation for Returned Scholars, Ministry of Human Resources of China. Non-minutiae Based Fingerprint Descriptor 97 7. References Amornraksa, T. ; Tachaphetpiboon, S. (2006). Fingerprint recognition using DCT features, Electron. Lett., 42(9), pp. 522–523 Benhammadi, F. ;Amirouche, M.N. ;Hentous, H. ; Beghdad, K.B ; Aissani, M. (2007). Fingerprint matching from minutiae texture maps, Pattern Recognition,40(1), pp.189- 197 Cappelli, R.; Ferrara, M.; Maltoni, D.(2011). Minutia Cylinder-Code: A New Representation and Matching Technique for Fingerprint Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), pp: 2128 - 2141 Dalal, N.,; Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the ninth European conference on computer vision, San Diego,CA Fausett, L. (1994), Fundamentals of Neural Networks architecture, algorithm and application (Prentice Hall) 289-304 Gonzalez, R. C. ; Woods, R E.(2002), Digital Image Processing (second edition), Prentice Hall, pp. 672-675 Hao, P.Y. ; Chiang, J.H. ; Lin Y.H. (2007). A new maximal-margin spherical-structured multi- class support vector machine. Applied Intelligence, 30(2), pp.98-111 He, X. ; Tian, J. ; Li, L. ; He, Y ; Yang X.(2007). Modeling and Analysis of Local Comprehensive Minutia Relation for Fingerprint Matching, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 4, 37(5), pp.1204-1211 Hu, M.K.(1962). Visual pattern recognition by moment invariants, IRE Trans. Info. Therory. IT-8,pp. 179-187 Jain, A. K.; Hong, L. ; Bolle, R. (1997a). On-line fingerprint verification, IEEE Trans.Pattern Analysis and Machine Intelligence, 19, pp. 302–313 Jain, A.K. ; Hong, L. ; Pankanti, S. ; Bolle, R. (1997b) .An identity-authentication system using fingerprints, Proc. IEEE, 85(9) pp. 1365-1388 Jain, A.K. ; Prabhakar, S. ; Hong, L. ; Pankanti, S. (2000). Filterbank-based fingerprint matching, IEEE Trans. Image Processing,9(5),pp. 846-859 Jiang, X. ; Yau, W.(2000). Fingerprint minutiae matching based on the local and global structures, International Conference on Pattern Recognition, pp. 1038–1041 Jin, A.T.B. ; Ling, D.N.C. ; Song, O.T. (2004). An efficient fingerprint verification system using integrated wavelet and Fourier-Mellin invariant transform, Image and Vision Computing, 22(6),pp. 503-513 John, S.T. ; Nello, C. (2000), Support Vector Machines and other kernel-based learning methods, Cambridge University Press Kenneth, N. ; Josef, B. (2003). Localization of corresponding points in fingerprints by complex filtering, Pattern Recognition Letters, 24 ,pp.2135-2144 Liu, J. ; Huang, Z. ; Chan K.(2000). Direct minutiae extraction from gray-level fingerprint image by relationship examination, Int. Conf.. Image Processing, 2,pp. 427-430 Maio, D. ; Maltoni, D.(1997). Direct gray scale minutia detection in fingerprints, IEEE Trans.Pattern Analysis and Machine Intelligence,19(1), pp. 27-39 Maio, D. ; Maltoni, D. ; Cappelli, R. ; Wayman, J.L. ; Jain, A.K.(2002). FVC2002: Second Fingerprint Verification Competition, Proc. Internationl Conference on Pattern Recognition,(3), pp.811-814 Maltoni, D. ;Maio, D. ;Jain, A.K. ; Prabhakar S. (2003). Handbook of Fingerprint Recognition, Springer, Berlin, pp.135-137 Biometrics 98 Nagaty, K.A.(2005). An adaptive hybrid energy-based fingerprint matching technique, Image and Vision Computing, 23,pp.491-500 Nanni, L. ; Lumini, A. (2007). A hybrid wavelet-based fingerprint matcher, Pattern Recognition, 40(11),pp. 3146-3151 Nanni, L. ; Lumini, A. (2008). Local Binary Patterns for a hybrid fingerprint matcher, Pattern Recognition, 41(11), pp.3461-3466 Nanni, L. ; Lumini, A.(2009). Descriptors for image-based fingerprint matchers, Expert Systems With Applications, 36(10), pp.12414-12422 Ojala, T., Pietikainen, M., & Maeenpaa, T. (2002). Multiresolution gray-scale androtation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), pp.971–987. Papoulis, A. (1991). Probablity, random variables and stochastic processes, Boston: McGraw-Hill. Ratha, N. K. ; Karu, K. ; Chen, S. ; Jain, A. K.(1996). A real-time matching system for large fingerprint databases, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), pp.799–813 Ross, A., Jain, A.K., Reisman, J.(2003). A hybrid fingerprint matcher, Pattern Recognition, 36(7), pp. 1661-1673 Sha, L.F. ; Zhao F. ; Tang X.O.(2003). Improved fingercode for filterbank-based fingerprint matching, Int. Conf. Image Processing, pp.895-898 Tico, M. ; Kuosmanen, P. ; Saarinen, J. (2001) . Wavelet domain features for fingerprint recognition, Electron. Lett., 37(1), pp. 21–22 Yang, J.C. ; Yoon, S. ; Park, D.S. (2006). Applying learning vector quantization neural network for fingerprint matching, Lecture Notes in Artificial Intelligence (LNAI 4304) (Springer, Berlin) , pp.500-509 Yang, J.C. ; Park, D. S. (2008a). A fingerprint verification algorithm using tessellated invariant moment features, Neurocomputing, 71(10-12), pp.1939-1946 Yang, J.C. ; Park, D. S. (2008b). Fingerprint Verification Based on Invariant Moment Features and Nonlinear BPNN, International Journal of Control, Automation, and Systems, 6(6), pp.800-808 Yang, J.C. ; Park, D.S. ; Hitchcock, R. (2008c). Effective Enhancement of Low-Quality Fingerprints with Local Ridge Compensation, IEICE Electronics Express, 5(23), pp.1002–1009 0 Retinal Identification Mikael Agopov University of Heidelberg Germany 1. Introduction Since the pioneering studies of Drs. Carleton Simon and Isodore Goldstein in 1935 [Simon & Goldstein (1935)], it has been known that every eye has its own unique pattern of blood vessels, and that retinal photographs can be used for identifying people. In the 1950’s, this was proven to hold even for identical twins [Tower (1955)]. Hence the idea of using the retinal blood vessel pattern for identification. Eye fundus photography for the purpose of identification is impractical, however. An optical device which scans the retinal blood vessel pattern is required. Such an optical Retinal Identification (RI) device was originally patented in 1978 [Hill (1978)]; after several subsequent patents, it developed into a commercial product in the 1980s and 1990s. As the patent of retinal identification (opposed to the actual design of the device) has now worn off, new developments in the field have taken place. In this chapter, the history, technique and recent developments of RI are discussed. 1.1 The anatomy and optical properties of the human eye Fig. 1. A schematic picture of the human eye. 5 2 Will-be-set-by-IN-TECH A schematic picture of the human eye is shown in Figure 1. The eyeball is of about 24 mm in diameter and filled with vitreous humor, jelly-like substance similar to water; its outer shell, the sclera, is made of rigid proteins called collagen. The light entering the eye first passes through the pupil, an aperture-like opening in the iris. The size of the pupil limits the amount of light entering the eye. The light is focused by the cornea and the crystalline lens onto the retina. The retina converts the photon energy into an electric signal, which is transferred to the brain through the optic nerve. The cornea is about 11 mm in diameter and only 0.5 mm thick. It accounts for most of the refractive power of the eye (about 45 D). The remaining 18 D come from the crystalline lens, which is also - through deformation - able to change its refractive power, thus partly compensating for the refractive error and helping to focus the eye. The retina is a curved surface in the back of the eye. The point of sharpest vision is called fovea - here the light-sensing photoreceptor cells are only behind a small number of other cells. Elsewhere on the retina, the light has to travel through a multi-layered structure of different cells. These various cells are responsible for the eye’s ’signal processing’, i.e. turning the incoming photons first into a chemical and then to an electric signal. After being amplified and pre-processed, the signal is transferred to the nerve fibers, which reside on the peripheral area of the retina around the optic disc, where they form the retinal nerve fiber layer (RNFL). The optic disc is an approximately 5 ◦ ×7 ◦ , ellipse-like opening in the eye fundus, through which the nerve fibers and blood vessels enter the eyeball. It is about 15 ◦ away from the fovea in the nasal direction. The choroid is the utmost layer behind the retina just in front of the sclera. It has a bunch of small blood vessels, and is responsible for the retina’s metaboly. 1.2 The birefringence properties of the eye Birefringence is a formof optical anisotrophy in a material, in which the material has different indices of refraction for p- and s-polarization components of the incoming light beam. The components are thus refracted differently, which in general results the beam being divided into two parts. If the parts are then reflected back by a diffuse reflector (such as the eye fundus), a small portion of the light will travel the same way as it came, joining the polarization components into one again, but having changed the beam’s polarization state in process 1 . The birefringence of the eye is well documented (Cope et al. (1978), Klein Brink et al. (1988), Weinreb et al. (1990), Dreher et al. (1992)). The birefringence of the corneal collagen fibrils constitutes the main part of the total birefringence of the eye. Its amount and orientation changes throughout the cornea. In the retina, the main birefringent component is the retinal nerve fiber layer (RNFL), which consists of the axons of the nerve fibers. The thickness of RNFL is not constant over the retina; the amount of birefringence varies according to the RNFL thickness and also drops steeply if a blood vessel (which is non-birefringent) is encountered. The most successful application of measuring the RNFL thickness around the optic disc is probably the GDx glaucoma diagnostic device (Carl Zeiss Meditec, Jena, Germany). It uses scanning laser polarimetry to topograph the RNFL thickness on the retina. A reduced RNFL thickness means death of the nerve fibers and thus advancing glaucoma. Atypical GDx image is shown in Figure 2. 1 See Appendix about how the polarization change can be measured. 100 Biometrics Retinal Identification 3 Fig. 2. A typical GDx image of a healthy eye. The birefringent nerve fiber layer is seen brightly in the image, as well as the blood vessels which displace the nerve fibers, thus resulting in a weaker measured signal (darker in the image). 2. RI using retinal blood vessel absorption The first patent of the biometric identification using the retinal blood vessel pattern dates back to 1978 [Hill (1978)]. Soon afterwards, the author of the patent founded the company EyeDentity (then Oregon, Portland, USA) and began full-time efforts to develop and commercialize the technique. In the original patent, the retinal blood vessel pattern is scanned with the help of two rings of LEDs. The amount of light reflected back from the retina is measured - when the beam hits a blood vessel, it is absorbed to a bigger extent than when it hits other tissue. In the original retina scan, green laser light was used - it was strongly absorbed by the red blood vessels. However, it was found out that visible light causes discomfort to the identified individual, as well as pupil constriction, causing loss of signal intensity. Since the first working prototype RI, patented in 1981 [Hill (1981)], near-infrared (NIR) light has been used for illumination. The infrared light is not absorbed by the photoreceptors (the absorption drops steeply above 730 nm); however, the retinal blood vessels are fairly transparent to the NIR wavelengths as well - the light is absorbed by the smaller choroidal blood vessels instead (thus, considering this technique, the termRetinal Identification is slightly misleading) before being reflected back fromthe eye fundus. The image acquisition technique has also been changed: the LEDs are given up in favour of scanning optics. A circular scan is preferred over a raster, which suffers from the problem of reflections from the cornea. In the patent of 1986 [Hill (1986)], the scan is centered around the fovea instead of the optic disc. The fovea is on the optical axis of the eye, so this arrangement has the definite advantage that no fixation outside the normal line of sight is required, unlike when a ring around the optic disc is scanned, when the subject has to look 15 degrees off-axis. The downside is that 101 Retinal Identification 4 Will-be-set-by-IN-TECH the choroidal blood vessels are much thinner than around the optic disc, and they don’t form a clear pattern. Thus the price paid for easier fixation is the quality of the signal. Fig. 3. A schematic drawing of the current Retinal Identification technique, based on the patent from 1996. A detailed explanation is in the text. Fig. 4. A schematic drawing fixation-alignment technique using a Fresnel lens. A detailed explanation is in the text. Current RI technology is based on an active US patent [Johnson & Hill (1996)]. A schematic drawing of the measurement setup is shown in Figure 3. An infrared light source, for example a krypton lamp (1), is focused by a lens (2) through an optical mirror (explained below) via a pinhole (3). The light enters a beam splitter (4) which reflects it into a Fresnel optical scanner (5). The rotating optical scanner scans a ring on the cornea, which - if properly focused - will hit the retina at the same angle. A multifocus Fresnel lens, cemented in the scanner, creates a fan of nearly-collimated light beams which hit the eye of the tested subject. One of the 102 Biometrics Retinal Identification 5 beams will be focused on the retina by the eye’s own optical apparatus, thus compensating for refractive error (explained in detail below). The light reflected back from the eye fundus travels the same way through the scanner and into the beam splitter; a part of it is transmitted into a photodetector (8) through a focusing lens (7). After being measured by the photodetector, the signal is A/D-converted, amplified and processed. The processed signal is converted into points, which are stored in an array, which is used for matching. A similar process is used in all RI techniques. Fixation and alignment of the subject’s eye in a RI measurement is critical; it is almost impossible to scan a non-willing subject. The fixation system of the RI technique, which was also patented (Arndt (1990)) , is illustrated in Figure 4. A Light-Emitting Diode (LED) is situated next to the Krypton lamp. It illuminates the optical double-surface mirror, creating several reflections, ’ghost images’, of the LED on the optical axis of the system. These images function as targets for the test subject’s eye. The eye looks at them through a multifocal Fresnel lens. The lens, which consists of several focusing parts with different focal lengths, focuses the ghost images on different points on the eye’s optical axis. Regardless of whether the test subject is emmetropic (normal visual acuity), myopic (near-sighted) or hyperopic (far-sighted), one of the images will almost certainly end up on the retina and will thus result a sharp image and effectively compensate for the refractive error of the subject’s eye. However, this happens at the cost of optical image quality; the measured pattern is a sum of contributions from choroidal blood vessels and other structures. 2.1 Matching At first, a reference measurement, which the further measurements will be compared against, has to be taken from each tested subject. Any further measurement will be compared against the reference. As the subject’s eye can rotate around the optical axis (due to different head position, i.e. head tilt, between the measurements), the best possible match is found by ’rotating’ the measurement points in the array. The matching is done using a Fourier-based correlation; the match is measured on a scale of +1,0 (a perfect match) to -1,0 (a complete mismatch). User experience has shown that a match above 0,7 can be considered a matching identification. 3. RI using the RNFL birefringence The blood vessels emerging into the retina through the optic disc often displace the nerve fibers in the retinal nerve fiber layer. Unlike the RNFL nerve fiber axons, the blood vessels are not birefringent - thus, if the birefringence (change in the state of polarization) of the scanning laser beam is measured around the optic disc, a steep signal drop proportional to the blood vessel size is measured wherever one is encountered. This can be seen in the GDx-pictures, where the blood vessels are seen as dark lines on the otherwise bright nerve fiber layer. The author and his co-workers studied the possibility of using blood vessel-induced RNFL birefringence changes for biometric purposes [Agopov et al. (2008)]. A measurement device was built to scan a circle of 20 ◦ around the optic disc. The measured birefringence would drop steeply where a blood vessel is encountered, creating a sharp drop, or ’blip’, in the measured signal. The scanning angle is big enough to catch the major blood vessels which enter the fundus through the disc. 103 Retinal Identification 6 Will-be-set-by-IN-TECH 3.1 Apparatus and method Fig. 5. A schematic drawing of the RI measurement setup using the RNFL birefringence. A detailed explanation is in the text. The measurement setup is explained in detail in [Agopov et al. (2008)]; here it is explained briefly. The apparatus and the light paths within it are shown in Figure 5. The light path is drawn with a solid line. A 785 nm laser diode (1) was used as a light source. The beam was collimated by a lens (2); then it passed through a non-polarizing beam splitter (3) and was reflected by a mirror (4) further into the optical scanner (5-8). The two scanning mirrors (5 and 6) and a counterweight (7) were cemented on an aluminium plate which was spun by a DC motor (8). The center mirror was (5) tilted 45 ◦ from the disc plane rotated clockwise and reflected the laser beam onto the edge mirror. The edge mirror (6) was tilted 50 ◦ , creating a circular scan subtending a 10 ◦ radius of visual angle (20 ◦ diameter) in the tested subject’s eye (9). The reflection from the ocular fundus traveled back the same optical path through the scanner; now, however, the beam splitter (3) reflected the useful half of the beam (drawn with dashed line) into the detection system (10-13). A lens (10) focused the beam into a polarizing beam splitter (12) through a quarter wave plate (11) which had its fast axis 45 ◦ to the original plane of polarization. The polarizing beam splitter separated the p- and s-polarization components; two avalanche photodiodes (13) were placed right after it, measuring the two signals which corresponded the two polarization components. Amplified by the detection electronics, the signals were added and subtracted 104 Biometrics Retinal Identification 7 respectively. The polarization was manipulated so that the Stokes parameters S 0 and S 3 were measured - it was decided to measure S 3 instead of S 1 for birefringence-based changes as it appeared to suffer less from various amounts and orientations of corneal birefringence in our computerized model. As the alignment of the eye is critical, special care was taken to properly align the test subject’s eye. The measurement apparatus included three eyepieces: the measurement was taken through the fixed central piece; in addition there were two horizontally movable ones; the subject could look through the central piece with either eye while having a ’dummy’ eyepiece available for the other eye. Thus possible head tilt was reduced to almost zero. Because the fixation point of the retina, the fovea, is approximately 15 degrees away from the optic disc, the measured eye had to look 15 ◦ away in the nasal direction to center the scan on the optic disc. To achieve this, two fixation LEDs were set at 15 ◦ angle to the central axis of the scan. Because the human eye is about 0,75 D myopic at the wavelength 785 nm (see [Fernandez et al. (2005)] for details) the fixation LEDs were placed at 130 cm distance, so that the eye’s fixation would compensate for this. 4. RI using the optic disc structure Another interesting RI technology was patented in 2004 [Marshall & Usher (2004)]. The idea is to use the image of the optic disc - taken by a scanning laser ophthalmoscope (SLO) - for identification. A company Retinal Technologies (Boston, MA, USA) was founded for developing the technique. 4.1 The principle of an SLO The best-known application of the SLO is probably the Heidelberg Retina Tomograph (Heidelberg Engineering, Heidelberg, Germany), which is used for glaucoma diagnostics. The principle of an SLO is illustrated in 6. A low-intensity laser diode (1) is used for illumination. A collimated laser beam goes through a beam splitter (2) into an optical scanner (3). The scanner consists of two rotating mirrors, a fast and a slow one, creating a raster scan. The scanning beam is imaged through two lenses (4 and 5) onto exactly one point called the conjugated plane. If the imaged subject’s cornea is at this point(6), the eye’s optics focus the scanning beam onto the retina (7). The reflection from the eye fundus travels back through the system, but is reflected into the detection system (7-9) by the beam splitter. A lens (7) focuses the beam through a pinhole (8) onto a photodiode (9). The pinhole is very important - only the light which comes exactly fromthe conjugated plane reaches the photodetector. Thus the SLOcreates a high-resolution microscopic image of the retina. The scan is usually centered around the optic disc (using an off-axis fixation target - as in the previous setup). A typical SLO image of the optic disc (taken with the HRT) is shown in Figure 7. 4.2 Image analysis The boundary of the optic disc is found from the image taken by the SLO. There is a clear boundary between the disc and surrounding tissue (as seen in Figure 7); an ellipse is fit onto the image by analyzing the average intensity of the pixels around the boundary. Once the disc is identified, a recognization pattern is created from the its structures. This fairly complicated procedure is explained in detail in the patent. The patterns of recognized 105 Retinal Identification 8 Will-be-set-by-IN-TECH Fig. 6. An schematic drawing of a scanning laser ophthalmoscope. A detailed explanation is in the text. Fig. 7. A typical SLO image of the optic disc (taken with an HRT). 106 Biometrics Retinal Identification 9 individuals are stored in a database for comparing with the pattern of a person wishing to be identified. 5. Combined retina and iris identification Fig. 8. The measurement setup of a combined retina and iris identification device. The details are in the text. In a fairly recent patent [Muller et al. (2007)], a combination of retina and iris identification was suggested (about iris identification, see the previous chapter). The measurement system is constructed so that both biometrics can be recorded with one scan. This is obviously advantageous, as now two biometrics are available simultaneously. The measurement setup is shown in Figure 8. The setup has two optical axes: one for the retinal image (dashed line) and one for alignment and the iris image (solid line). They are 15 ◦ apart; as explained earlier, the incoming light is focused on the area around the optic disc only if the eye is looking 15 ◦ in the nasal direction. To achieve this, the test subject has to be faced towards the dashed line but look in the direction of the solid line. As a fixation target, a ring of green LED’s (11) is placed around the optical axis of the iris scanning system. Two illuminating LED’s are used: One infrared (λ ≈ 800 nm) LED (1), and one red (λ ≈ 660 nm) LED (12). The interference from the ’wrong’ light source is blocked a dichroic mirror (5). The light from the infrared LED (1) first goes through a polarizer (2), which ensures that the outgoing light is linearly polarized. The outgoing beam is then divided into two parts by a beam splitter (3). The transmitted (useless) part crosses the beam splitter and is preferably absorbed by a light trap. The reflected (useful) beam part is collimated by a lens (4) and goes through the dichroic mirror (5) into the eye. The beam is focused by the eye’s optics (6) onto the area around the optic disc; after being reflected from the eye fundus (7), a reflection of the beam returns back the same way as the beam came; however, coming from the other direction, a part of the reflected beam now passes through the beam splitter into the detection system. The polarizer (8) is set so that it absorbs the illuminating LED’s polarization direction. However - having changed its state of polarization while passing through the eye tissue - a part of the beam is now able to pass through the polarizer. The beam is focused by a lens (9) onto the detector (10), which can be for example a CCD camera. It should be noted that this setup has no optical scanner (nor it is confocal) - a wide-field image around the optic disc is 107 Retinal Identification 10 Will-be-set-by-IN-TECH captured, thus resulting in lower resolution than a confocal scanning system would achieve. For the iris scan, the illuminating light from the red LED (12) is first linearly polarized (13) 2 . and then guided into the eye through an alignment tube (14). The function of the alignment tube is not explained in the patent; preferably it would consist of at least one lens with a long focal length, which would focus the light onto the iris (the beam entering the eye should not be parallel, otherwise it will be focused on the retina). The distance between the lens and the beam splitter should be much bigger than that between the beam splitter and the eye, so that the slightly de-focused reflection image of the iris can be caught. On its way to the iris and back, the light double-passes a quarter wave plate (16). The wave plate’s axes are set so that both the fast and the slow axis of it are 45 ◦ to the original polarization direction - when the beam passes through it, its polarization becomes circular; the second pass (the reflection from the iris) turns the circular polarization into linear again, but having turned the polarization direction by 90 ◦ in the process. After entering the detection system (17-20), the light can now pass through the polarizer (17), which is set to absorb the polarizing direction of the initial polarizer (13). The beam is focused by a lens onto the detector (a CCD camera); the possible reflections of the green LED are filtered out by a red band pass filter (19). The two detection systems are electronically synchronized so that one scan records both images simultaneously. The recording is triggered by a switch, which is turned on when the eye is correctly aligned. The eye is at a crossing of two optical axes - its distance and orientation are critical. In the setup suggested in the patent, the distance is controlled by an ultrasound transducer. It sends and receives ultrasound pulses which are reflected back fromthe surface of the cornea. When the distance is right, and the optic disc is seen on the CCD (10), the eye is aligned properly, and the recording can be taken. 6. Results of performance tests and limitations of the RI techniques 6.1 RI using absorption In a performance test by Sandia National Laboratories [Sandia Laboratories (1991)], the EyeDentity RI device recognized >99% of the tested subjects in a three-attempt measurement, with no false positives. 6.1.1 Limitations As the light has to pass twice through the pupil during the measurement, a constricted pupil can increase the number of false negative scan results. Thus dim light conditions are preferable for the RI (of course, this is true for almost any optical measurement); the technique has difficulties in broad daylight. In addition, various eye conditions can disturb the light’s passing through the eye, compromising vision; this also affects measuring the eye’s properties optically, including the RI. 2 In the patent, the device is desribed without the polarizers (13 and 17), the quarter wave plate (16), the filter (19); instead of a beam splitter (15), a dichroic mirror is suggested. The fixation LEDs are also placed together with the red illuminating LED. However, this leads to unsurmountable difficulties. First of all, the IR light and the green light used for fixation cannot both pass through the dichroic mirror; moreover, the IR light would first have to pass through the mirror - the reflection from the eye would then have to be reflected by it. Therefore, the author suggests slight modifications in the setup. 108 Biometrics Retinal Identification 11 1. Severe astigmatism: An astigmatic eye’s optics image dots as lines. This results in problems with focusing and also bad optical quality of measurements. 2. Cataracts: A cataract is an eye condition in which clouding develops in the crystallinen lens. The lens becomes opaque so seeing becomes compromised. Obviously any optical measurement in the eye, including the RI, becomes increasingly difficult. 3. Severe eye diseases, such as the age-related macula degeneration (AMD), can change the structure of the retina, either by destroying retinal tissue or by stimulating growth of new blood vessels. 6.2 RI using birefringence Eight eyes of four volunteer subjects were measured. Both absorption- and birefringence- based signals were recorded two times for each eye. For verifying purposes, fundus photographs were taken from all the eyes. The measured peaks and the blood vessels on the fundus photos were compared as follows: A 20 ◦ diameter circle was drawn on a transparent overlay, on which the measured peaks were marked at corresponding angles on the perimeter of the circle. The transparency was then placed on the fundus photo to compare the marked signal peaks with the blood vessels on the photo. Only the vessels larger than a certain threshold size (set for each eye individually) were taken into account, i.e. the smallest vessels were ignored. In this way, the numbers of blood vessels corresponding to the peaks in the measured absorption/reflectance and birefringence-based tracings were calculated. If a confirmed peak did not correspond to a vessel above the threshold size on the fundus photo, it was considered a false positive. Altogether 55 blood vessels were located on the fundus photos of the ’better’ eyes (the ones which yielded clearer signals) of the four volunteers, of which 34 could be correlated with the ’peaks’ in the measured signal. The calculated sensitivities (number of vessels identified / number of vessels altogether) and specificities (number of positive recognitions / number of positive + false positive recognitions) - are presented in Table 1. The columns in the tables represent total percentage of blood vessel ’peaks’ correlating with vessels in the fundus photo (Total), from the two reflection/absorption measurements (Sum) and from the two birefringence measurements (Diff). Total Sum Diff Average sensitivity 69% 51% 33% Average specificity 78% 76% 60% Table 1. Summarized results of the measurements taken from the eye with a better signal of each subject. 6.2.1 Limitations Our system was of ’proof-of-principle’ -nature and - unlike the conventional RI technique - had no inherent defocus compensation; the test subjects were required to have a refractive error of less than about ±2 diopters or a good corrected vision using contact lenses. The eye conditions disturbing the measurement include cataracts and astigmatism as well as a severe glaucoma, which damages the RNFL by killing the nerve fibers going through the optic nerve. 109 Retinal Identification 12 Will-be-set-by-IN-TECH 6.3 Other techniques The author in unaware of any scientific studies on the accuracy of the other techniques mentioned here. 7. Discussion Retinal Identification remains the most reliable and secure biometric. Falsifying an image of a retinal blood vessel pattern appears impossible. The eye scans are generally considered invasive or even harmful, especially if a laser (even if the light is weak intensity and harmless to the eye, as in our system). However, the enrollees should be able to overcome their fear of these scans once they acquire more user experience. The original RI technique uses the choroidal blood vessel pattern for identification; however, several newer RI techniques center the scan around the optic disc. The main drawback in such a device is that the eye has to fixate 15 degrees off-axis while being measured. However, if this is achieved, the blood vessels around the optic disc are easier to detect - in addition, the - as the authors and his co-workers proved - birefringence-based blood vessel detection can help detecting more blood vessels, only at a small cost of specificity. The measured signal was also directly linked to blood vessels and not to other retinal structures, as in the original RI technique. The newer techniques - imaging the optic disc using an SLO, or the combined retina- and iris scan - appear very promising. However, the author is unaware either of any commercial devices or of any scientific studies on the accuracy of these identifying method. The use of retinal identification is not limited to humans. In 2004, a patent was filed on identifying various animals using their retinal blood vessel pattern [Golden et al. (2004)]. To develop the technique for animals, a company OptiBrand was started (Ft. Collins, Colorado, USA). The company produces and develops hand-held video camera -based devices which provide an acceptable image an animal’s eye fundus to allow identification. This method is certainly preferable over the traditional hot iron branding, which is not only painful to the animal, but also costly in the lost hide value. When a false identification is not disastrous (as opposed to some high security installations), a simple video camera based device provides an accurate enough identification. 8. Appendix: The Stokes parameters The modern treatment of polarization was first suggested by Stokes in the mid-1800’s. The polarization state of the light can be completely represented with four quantities, the Stokes parameters: S 0 = E 2 x T +E 2 y T S 1 = E 2 x T −E 2 y T S 2 = 2E x E y cos T S 3 = 2E x E y sin T , (1) where = y − x is the phase difference between x- and y-polarization and the T denote time averages. For totally unpolarized light S 0 > 0 and S 1 = S 2 = S 3 = 0. For completely polarized light, on 110 Biometrics Retinal Identification 13 the other hand, S 2 0 = S 2 1 + S 2 2 + S 2 3 . The polarization state of the light can be defined as V = S 2 1 + S 2 2 + S 2 3 S 0 . (2) The importance of the Stokes parameters lies in the fact that they are connected to easily measurable intensities: S 0 (θ, ρ) = I(0 ◦ , 0) + I(90 ◦ , 0) S 1 (θ, ρ) = I(0 ◦ , 0) − I(90 ◦ , 0) S 2 (θ, ρ) = I(45 ◦ , 0) − I(135 ◦ , 0) S 3 (θ, ρ) = I(45 ◦ , π/2) − I(135 ◦ , π/2), (3) where θ is the angle of the azimuth vector measured from the x-plane and ρ is the birefringence. These can be easily measured: for example, S 1 can be measured with polarizing beam splitter and two detectors, which are placed so that they measure the intensities coming out of the beam splitter, and S 3 can be measured by adding a quarter wave plate before the aforementioned system. 9. References Agopov, M.; Gramatikov, B.I.; Wu, Y.K.; Irsch, K; Guyton, D.L. (2008). Use of retinal nerve fiber layer birefringence as an addition to absorption in retinal scanning for biometric purposes. Applied Optics 47: 1048-1053 Bettelheim, F. (1975). On optical anisotrophy of the lens fiber cells. Exp. Eye Res. 21: 231-234 Cope, W.; Wolbarsht; Yamanashi, B. (1978). The corneal polarization cross. J. Opt Soc Am 68: 1149-1140 Dreher, A.; Reiter, K. (1992). Scanning laser polarimetry of the retinal nerve fiber layer. SPIE 1746:34-41 Fernandez, E. J.; Unterhuber, A; Pieto, P.M.; Hermann, B.; Drexler, W.; Artal, P. (2005). Ocular aberrations as a function of wavelength in the near infrared measured with a femtosecond laser. Optics Express 13:400-409 Hill, R. (1978). Apparatus and method for identifying individuals through their retinal vasculature patterns, United States Patent 4109237 Hill, R. (1981). Rotating beam ocular identification apparatus and method. United States Patent 4,393,366 Hill, R. (1986). Fovea-centered eye fundus scanner. United States Patent 4620318 Johnson, J.C.; Hill, R. (1996). Eye fundus optical scanner system and method, United States Patent 5532771 Golden, B.L.; Rollin, B.E.; Switzer, R.; Comstock, C.R. (2004). Retinal vasculature image acquisition apparatus and method, United States Patent 6766041 Arndt, J.H. (1990). Optical alignment system. United States Patent 4923297 Klein Brink, H. B.; Van Blockland, G.J. (1988). Birefringence of the human foveal area assessed in vivo with the Mueller-Matrix ellipsometry. J. Opt. Soc. Amer. A 5:49-57 Marshall, J.; Usher, D. (2004). Method for generating a unique and consistent signal pattern for identification of an individual, United States Patent 6757409 111 Retinal Identification 14 Will-be-set-by-IN-TECH Muller, D. F.; Heacock G. L.; Usher D. B. (2007). Method and systemfor generating a combined retina/iris patter biometric, United States Patent 7248720 Sandia National Laboratories (1991). Performance Evaluation of Biometric Identification Devices, Technical Report SAND91-0276, UC-906 Simon C.; Goldstein I. (1935). ANewScientific Method of Identification, New York State Journal of Medicine, Vol. 35, No. 18, pp. 901-906 Tower, P. (1955). The fundus Oculi in monozygotic twins: Report of six pairs of identical twins, Archives of Ophthalmology, Vol. 54, pp. 225-239 Weinreb, R.; Dreher A.; Coleman, A.; Quigley H.; Shaw, B.; Reiter, K. (1990). Histopatologic validation of Fourier-ellipsometry measurements of retinal nerve fiber layer thickness, Arch Opthalmol 108:557-60 112 Biometrics 0 Retinal Vessel Tree as Biometric Pattern Marcos Ortega and Manuel G. Penedo University of Coruña Department of Computer Science Spain 1. Introduction In current society, reliable authentication and authorization of individuals are becoming more and more necessary tasks for everyday activities or applications. Just for instance, common situations such as accessing to a building restricted to authorized people (members, workers,...), taking a flight or performing a money transfer require the verification of the identity of the individual trying to perform these tasks. When considering automation of the identity verification, the most challenging aspect is the need of high accuracy, in terms of avoiding incorrect authorizations or rejections. While the user should not be denied to perform a task if authorized, he/she should be also ideally inconvenienced to a minimum which further complicates the whole verification process Siguenza Pizarro &Tapiador Mateos (2005). With this scope in mind, the term biometrics refers to identifying an individual based on his/her distinguished intrinsic characteristics. Particularly, this characteristics usually consist of physiological or behavioral features. Physiological features, such as fingerprints, are physical characteristics usually measured at a particular point of time. Behavioral characteristics, such as speech or handwriting, make reference to the way some action is performed by every individual. As they characterize a particular activity, behavioral biometrics are usually measured over time and are more dependant on the individual’s state of mind or deliberated alteration. To reinforce the active versus passive idea of both paradigms, physiological biometrics are also usually referred to as static biometrics while behavioral ones are referred to as dynamic biometrics. The traditional authentication systems based on possessions or knowledge are widely spread in the society but they have many drawbacks that biometrics try to overcome. For instance, in the scope of the knowledge-based authentication, it is well known that password systems are vulnerable mainly due to the wrong use of users and administrators. It is not rare to find some administrators sharing the same password, or users giving away their own to other people. One of the most common problems is the use of easily discovered passwords (child names, birth dates, car plate,...). On the other hand, the use of sophisticated passwords consisting of numbers, upper and lower case letters and even punctuation marks makes it harder to remember them for an user. Nevertheless, the password systems are easily broken by the use of brute force where powerful computers generate all the possible combinations and test it against the authentication system. In the scope of the possession-based authentication, it is obvious that the main concerns are 6 2 Will-be-set-by-IN-TECH related to the loss of the identification token. If the token was stolen or found by another individual, the privacy and/or security would be compromised. Biometrics overcome most of these concerns while they also allow an easy entry to computer systems to non expert users with no need to recall complex passwords. Additionally, commercial webs on the Internet are favored not only by the increasing trust being transmitted to the user but also by the possibility of offering a customizable environment for every individual along with the valuable information on personal preferences for each of them. Many different human biometrics have been used to build a valid template for verification and identification tasks. Among the most common biometrics, we can find the fingerprint Bolle et al. (2002); Maio & Maltoni (1997); Seung-Hyun et al. (1995); Venkataramani & Kumar (2003), iris Chou et al. (2006); He et al. (2008); Kim, Cho, Choi & Marks (2004); Ma et al. (2002); Nabti & Bouridane (2007) or face Kim, Kim, Bang & Lee (2004); Kisku et al. (2008); Mian et al. (2008); Moghaddam & Pentland (1997); Yang et al. (2000) or hand geometry Jain et al. (1999); Sidlauskas (1988); systems Lab (n.d.); Zunkel (1999). However, there exist other emerging biometrics where we can find retina biometrics. Identity verification based on retina uses the blood vessels pattern present in the retina (Figure1). Fig. 1. Schema of the retina in the human eye. Blood vessels are used as biometric characteristic. Retinal blood vessel pattern is unique for each human being even in the case of identical twins. Moreover, it is a highly stable pattern over time and totally independent of genetic factors. Also, it is one of the hardest biometric to forge as the identification relies on the blood circulation along the vessels. These property make it one of the best biometric characteristic in high security environments. Its main drawback is the acquisition process which requires collaboration from the user and it is sometimes perceived as intrusive. As it will be further discussed, some advances have been done in this field but, in any case, this continues to be the weak point in retinal based authentication. Robert Hill introduced the first identification system based on retina Hill (1999). The general idea was that of taking advantage of the inherent properties of the retinal vessel pattern to build a secure system. The system acquired the data via a scanner that required the user to be still for a few seconds. The scanner captured a band in the blood vessels area similar to the one employed in the iris recognition as shown in Figure 2. The scanned area is a circular band around blood vessels. This contrast information of this area is processed via fast Fourier transform. The transformed data forms the final biometric pattern 114 Biometrics Retinal Vessel Tree as Biometric Pattern 3 Fig. 2. Illustration of the scan area in the retina used in the system of Robert Hill. considered in this system. This pattern worked good enough as the acquisition environment was very controlled. Of course, this is also the source of the major drawbacks present in the device: the data acquisition process. This process was both slow and uncomfortable for the user. Moreover, the hardware was very expensive and, therefore, it rendered the system hardly appealing. Finally, the result was that the use of retinal pattern as a biometric characteristic, despite all its convenient properties, was discontinued. Nowadays, retinal image cameras (Figure 3) are capable of taking a photograph of the retina area in a human eye without any intrusive or dangerous scanning. Also, currently, the devices are cheaper and more accessible in general. This technology reduces the perception of danger by the user during the retina acquisition process but also brings more freedom producing a more heterogeneous type of retinal images to work with. The lighting conditions and the movement of the user’s eye vary between acquisitions. This produces as a result that previous systems based on contrast information of reduced areas may lack the required precision in some cases, increasing the false rejection rate. Fig. 3. Two retinal image cameras. The retinal image is acquired by taking an instant photograph. In Figure 4 it can be observed two images from the same person acquired at different times by the same retinograph. There are some zones in the retinal vessels that can not be compared 115 Retinal Vessel Tree as Biometric Pattern 4 Will-be-set-by-IN-TECH because of the lack of information in one of the images. Thus, to allow the retinal biometrics to keep and increase the acquisition comfortability, it is necessary to implement a more robust methodology that, maintaining the extremely low error rates, is capable to cope with a more heterogeneous range of retinal images. Fig. 4. Example of two digital retina images from the same individual acquired by the same retinal camera at different times. This work is focused on the proposal of a novel personal authentication system based on the retinal vessel tree. This system deals with the new challenges in the retinal field where a more robust pattern has to be designed in order to increase the usability for the acquisition stage. In this sense, the approach presented here to the retinal recognition is closer to the fingerprint developments than to the iris ones as the own structure of the retinal vessel tree suggests. Briefly, the objectives of this work are enumerated: • Empirical evaluation of the retinal vessel tree as biometric pattern • Design a robust, easy to store and process biometric pattern making use of the whole retinal vessel tree information • Development of an efficient and effective methodology to compare and match such retinal patterns • Analysis on similarity metrics performance to establish reliable thresholds in the authentication process To deal with the suggested goals, the rest of this document is organized as follows. Second section introduces previous works and research on the retinal vessel tree as biometric pattern. Section 3 presents the methodology developed to build the authentication system, including biometric template construction and template matching algorithms. Section 4 discusses the experiments aimed to test the proposed methodologies, including an analysis of similarity measures. Finally, Section 5 offers some conclusions and final discussion. 2. Related work Awareness of the uniqueness of the retinal vascular pattern dates back to 1935 when two ophthalmologists, Drs. Carleton Simon and Isodore Goldstein, while studying eye disease, realized that every eye has its own unique pattern of blood vessels. They subsequently published a paper on the use of retinal photographs for identifying people based on their blood vessel patterns Simon & Goldstein (1935). Later in the 1950s, their conclusions were supported by Dr. Paul Tower in the course of his study of identical twins. He noted that, of any two persons, identical twins would be the most likely to have similar retinal vascular 116 Biometrics Retinal Vessel Tree as Biometric Pattern 5 patterns. However, Tower showed that, of all the factors compared between twins, retinal vascular patterns showed the least similarities Tower (1955). Blood vessels are among the first organs to develop and are entirely derived from the mesoderm. Vascular development occurs via two processes termed vasculogenesis and angiogenesis. Vasculogenesis, this is, the blood vessel assembly during embryogenesis, begins with the clustering of primitive vascular cells or hemangioblasts into tube-like endothelial structures, which define the pattern of the vasculature. In angiogenesis, new vessels arise by sprouting of budlike and fine endothelial extensions from preexisting vessels Noden (1989). In a more recent study Whittier et al. (2003), retinal vascular pattern images from livestock were digitally acquired in order to evaluate their pattern uniqueness. To evaluate each retinal vessel pattern, the dominate trunk vessel of bovine retinal images was positioned vertically and branches on the right and left of the trunk and other branching points were evaluated. Branches from the left (mean 6.4 and variance 2.2) and the right (mean 6.4 and variance 1.5) of the vascular trunk; total branches from the vascular trunk (mean 12.8 and variance 4.3), and total branching points (mean 20.0 and variance 13.2) showed differences across all animals (52). A paired comparison of the retinal vessel patterns from both eyes of 30 other animals confirmed that eyes from the same animal differ. Retinal images of 4 cloned sheep from the same parent line were evaluated to confirm the uniqueness of the retinal vessel patterns in genetically identical animals. This would be confirming the uniqueness of animal retinal vascular pattern suggested earlier in the 1980s also by De Schaepdrijver et al. (1989). In general, retinal vessel tree his is a unique pattern in each individual and it is almost impossible to forge that pattern in a false individual. Of course, the pattern does not change through the individual’s life, unless a serious pathology appears in the eye. Most common diseases like diabetes do not change the pattern in a way that its topology is affected. Some lesions (points or small regions) can appear but they are easily avoided in the vessels extraction method that will be discussed later. Thus, retinal vessel tree pattern has been proved a valid biometric trait for personal authentication as it is unique, time invariant and very hard to forge, as showed by Mariño et al. C.Mariño et al. (2003); Mariño et al. (2006), who introduced a novel authentication system based on this trait. In that work, the whole arterial-venous tree structure was used as the feature pattern for individuals. The results showed a high confidence band in the authentication process but the database included only 6 individuals with 2 images for each of them. One of the weak points of the proposed system was the necessity of storing and handling a whole image as the biometric pattern. This greatly difficults the storing of the pattern in databases and even in different devices with memory restrictions like cards or mobile devices. In Farzin et al. (2008) a pattern is defined using the optic disc as reference structure and using multi scale analysis to compute a feature vector around it. Good results were obtained using an artificial scenario created by randomly rotating one image per user for different users. The dataset size is 60 images, rotated 5 times each. The performance of the system is about a 99% accuracy. However, the experimental results do not offer error measures in a real case scenario where different images from the same individual are compared. Based on the idea of fingerprint minutiae, a robust pattern is introduced where a set of landmarks (bifurcations and crossovers of retinal vessel tree) were extracted and used as feature points. In this scenario, the pattern matching problem is reduced to a point pattern matching problem and the similarity metric has to be defined in terms of matched points. A common problem in previous approaches is that the optic disc is used as a reference structure in the image. The detection of the optic disc is a complex problem and in some individuals 117 Retinal Vessel Tree as Biometric Pattern 6 Will-be-set-by-IN-TECH with eye diseases this cannot be achieved correctly. In this work, the use of reference structures is avoided to allow the system to cope with a wider range of images and users. 3. Retinal verification based on feature points Figure 5 illustrates the general schema for the new feature point based authentication approach. The newly introduced stages are the feature point extraction and the feature point matching. The following chapter sections will discuss the methodology on these new stages of the system. Fig. 5. Schema of the main stages for the authentication system based in the retinal vessel tree structure. 3.1 Feature points extraction Following the idea that vessels can be thought of as creases (ridges or valleys) when images are seen as landscapes (see Figure 6), curvature level curves will be used to calculate the creases 118 Biometrics Retinal Vessel Tree as Biometric Pattern 7 (ridge and valley lines). Several methods for crease detection have been proposed in the literature (see López et al. (1999) for a comparison between methods), but finally a differential geometry based method López et al. (2000) was selected because of its good performance in similar images Lloret et al. (1999; 2001), producing very good results. Fig. 6. Picture of a region of the retinal image as landscape. Vessels can be represented as creases. Among the many definitions of crease, the one based on Level Set Extrinsic Curvature, LSEC López et al. (1998), has useful invariance properties. The geometry based method named LSEC gives rise to several problems, solved through the improvement of this method by a multilocal solution, the MLSECLópez et al. (2000). But results obtained with MLSEC can still be improved by pre-filtering the image gradient vector field using structure tensor analysis and by discarding creaseness at isotropic areas by means of the computation of a confidence measure. The methodology allows to tune several parameters to apply such filters as for creases with a concrete width range or crease length. In Caderno et al. (2004) a methodology was presented for automatic parameter tuning by analyzing contrast variance in the retinal image. One of the main advantages of this method is that it is invariant to changes in contrast and illumination, allowing the extraction of creases from arteries and veins independently of the characteristics of the images, avoiding a previous normalization of the input images. The final result is an image where the retinal vessel tree is represented by its crease lines. Figure 7 shows several examples of the creases obtained from different retinal images. The landmarks of interest are points where two different vessels are connected. Therefore, it is necessary to study the existing relationships between vessels in the image. The first step is to track and label the vessels to be able to establish their relationships between them. In Figure 8, it can be observed that the crease images show discontinuities in the crossovers and bifurcations points. This occurs because of the two different vessels (valleys or ridges) coming together into a region where the crease direction can not be set. Moreover, due to some illumination or intensity loss issues, the crease images can also show some discontinuities along a vessel (Figure 8). This issue requires a process of joining segments to build the whole vessels prior to the bifurcation/crossover analysis. Once the relationships between segments are established, a final stage will take place to remove some possible spurious feature points. Thus, the four main stages in the feature point extraction process are: 119 Retinal Vessel Tree as Biometric Pattern 8 Will-be-set-by-IN-TECH Fig. 7. Three examples of digital retinal images, showing the variability of the vessel tree among individuals. Left column: input images. Right column: creases of images on the left column representing the main vessels. Fig. 8. Example of discontinuities in the creases of the retinal vessels. Discontinuities in bifurcations and crossovers are due to two creases with different directions joining in the same region. But, also, some other discontinuities along a vessel can happen due to illumination and contrast variations in the image. 120 Biometrics Retinal Vessel Tree as Biometric Pattern 9 1. Labelling of the vessels segments 2. Establishing the joint or union relationships between vessels 3. Establishing crossover and bifurcation relationships between vessels 4. Filtering of the crossovers and bifurcations 3.1.1 Tracking and labelling of vessel segments To detect and label the vessel segments, an image tracking process is performed. As the crease images eliminate background information, any non-null pixel (intensity greater than zero) belongs to a vessel segment. Taking this into account, each row in the image is tracked (from top to bottom) and when a non-null pixel is found, the segment tracking process takes place. The aim is to label the vessel segment found, as a line of 1 pixel width. This is, every pixel will have only two neighbors (previous and next) avoiding ambiguity to track the resulting segment in further processes. To start the tracking process, the configuration of the 4 pixels which have not been analyzed by the initially detected pixel is calculated. This leads to 16 possible configurations depending on whether there is a segment pixel or not in each one of the 4 positions. If the initial pixel has no neighbors, it is discarded and the image tracking continues. In the other cases there are two main possibilities: either the initial pixel is an endpoint for the segment, so this is tracked in one way only or the initial pixel is a middle point and the segment is tracked in two ways from it. Figure 9 shows the 16 possible neighborhood configurations and how the tracking directions are established in any case. Once the segment tracking process has started, in every step a neighbor of the last pixel flagged as segment is selected to be the next. This choice is made using the following criterion: the best neighbor is the one with the most non-flagged neighbors corresponding to segment pixels. This heuristic contains the idea of keeping the 1-pixel width segment to track along the middle of the crease, where pixels have more segment pixel neighbors. In case of a tie, the heuristic tries to preserve the most repeated orientation in the last steps. When the whole image tracking process finishes, every segment is a 1 pixel width line with its endpoints defined. The endpoints are very useful to establish relationships between segments because these relationships can always be detected in the surroundings of a segment endpoint. This avoids the analysis of every pixel belonging to a vessel, considerably reducing the complexity of the algorithm and therefore the running time. 3.1.2 Union relationships As stated before, the union detection is needed to build the vessels out of their segments. Aside the segments fromthe crease image, no additional information is required and therefore is the first kind of relationship to be detected in the image. An union or joint between two segments exists when one of the segments is the continuation of the other in the same retinal vessel. Figure 10 shows some examples of union relationships between segments. To find these relationships, the developed algorithm uses the segment endpoints calculated and labelled in the previous subsection. The main idea is to analyze pairs of close endpoints from different segments and quantify the likelihood of one being the prolongation of the other. The proposed algorithm connects both endpoints and measures the smoothness of the connection. An efficient approach to connect the segments is using an straight line between both endpoints. In Figure 11, a graphical description of the detection process for an union is 121 Retinal Vessel Tree as Biometric Pattern 10 Will-be-set-by-IN-TECH (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n) (o) (p) Fig. 9. Initial tracking process for a segment depending on the neighbor pixels surrounding the first pixel found for the new segment in a 8-neighborhood. As there are 4 neighbors not tracked yet (the bottom row and the one to the right), there are a total of 16 possible configurations. Gray squares represent crease (vessel) pixels and the white ones, background pixels. The upper row neighbors and the left one are ignored as they have already been tracked due to the image tracking direction. Arrows point to the next pixels to track while crosses flag pixels to be ignored. In 9(d), 9(g), 9(j) and 9(n) the forked arrows mean that only the best of the pointed pixels (i.e. the one with more new vessel pixel neighbors) is selected for continuing the tracking. Arrows starting with a black circle flag the central pixel as an endpoint for the segment (9(b), 9(c), 9(d), 9(e), 9(g), 9(i), 9(j)). shown. The smoothness measurement is obtained from the angles between the straight line and the segment direction. The segment direction is calculated by the endpoint direction. The maximum smoothness occurs when both angles are π rad., i.e. both segments are parallel and belong to the straight line connecting it. The smoothness decreases as both angles decrease. A criterion to accept the candidate relationship must be established. A minimum angle θ min is set as the threshold for both angles. This way, the criterion to accept an union relationship is defined as Union(r, s) = (α > θ min ) ∧ (β > θ min ) (1) 122 Biometrics Retinal Vessel Tree as Biometric Pattern 11 Fig. 10. Examples of union relationships. Some of the vessels present discontinuities leading to different segments. These discontinuities are detected in the union relationships detection process. where r, s are the segments involved in the union and α, β their respective endpoint directions. It has been observed that for values of θ min close to 3 4 π rad. the algorithmdelivers good results in all cases. Fig. 11. Union of the crease segments r and s. The angles between the new segment AB and the crease segments r (α) and s (β) are near π rad, so they are above the required threshold ( 3 4 π) and the union is finally accepted. 3.1.3 Bifurcation/crossover relationships Bifurcations and crossovers are the feature interest points in this work for characterizing individuals by a biometric pattern. A crossover is an intersection between two segments. A bifurcation is a point in a segment where another one starts from. While unions allow to build the vessels, bifurcations allow to build the vessel tree by establishing relationships between them. Using both types, the retinal vessel tree can be reconstructed by joining all segments. An example of this is shown in Figure 12. Acrossover can be seen in the segment image, as two close bifurcations forking fromthe same segment. Therefore, finding bifurcation and crossover relationships between segments can be initially reduced to find only bifurcations. Crossovers can then be detected analyzing close bifurcations. In order to find bifurcations in the image, an idea similar to the union algorithm is followed based on the search of the bifurcations from the segments endpoints. The criterion in this case is finding a segment close to an endpoint whose segment can be assumed to start in the found one. This way, the algorithm does not require to track the whole segments, bounding complexity to the number of segments and not to their length. 123 Retinal Vessel Tree as Biometric Pattern 12 Will-be-set-by-IN-TECH Fig. 12. Retinal Vessel Tree reconstruction by unions (t, u) and bifurcations (r, s) and (r, t). For every endpoint in the image, the process is as follows (Figure 13): 1. Compute the endpoint direction. 2. Extend the segment in that direction a fixed length l max . 3. Analyze the points in and nearby the prolongation segment to find candidate segments. 4. If a point of a different segment is found, compute the angle (α) associated to that bifurcation, defined by the direction of this point and the extreme direction from step 1. The parameter l max is inserted in the model to avoid indefinite prolongation of the segments. If it follows that l <= l max , the segments will be joined and a bifurcation will be detected, being l the distance from the endpoint of the segment to the other segment. Fig. 13. Bifurcation between segment r and s. The endpoint of r is prolonged a maximum distance l max and eventually a point of segment s is found. Figure 14 shows an example of results after this stage where feature points are marked. Also, spurious detected points are identified in the image. These spurious points may occur for different reasons such as wrongly detected segments. In the image test set used (over 100 images) the approximate mean number of feature points detected per image was 28. The mean of spurious points corresponded to 5 points per image. To improve the performance of the matching process is convenient to eliminate as spurious points as possible. Thus, the last stage in the biometric pattern extraction process will be the filtering of spurious points in order to obtain an accurate biometric pattern for an individual. 124 Biometrics Retinal Vessel Tree as Biometric Pattern 13 (a) (b) Fig. 14. Example of feature points extracted from original image after the bifurcation/crossover stage. (a) Original Image. (b) Feature points marked over the segment image. Spurious points corresponding to the same crossover (detected as two bifurcations) are signalled in squares. 3.1.4 Filtering of feature points A segment filtering process takes place in the tracking stage, filtering detected segments by their length using a threshold, T min . This leads to images with minimum false segments and with only important segments in the vessel tree. Finally, since crossover points are detected as two bifurcation points, as Figure 14(b) shows, these bifurcation points are merged into an unique feature point by calculating the midpoint between them. Figure 15 shows an example of the filtering process result, i.e. the biometric pattern obtained from an individual. Briefly, in the initial test set of images used to tune the parameters, the reduction of false detected points was about from 5 to 2 in the average. (a) (b) Fig. 15. Example of the result after the feature point filtering. (a) Image containing feature points before filtering. (b) Image containing feature points after filtering. Spurious points from duplicate crossover points have been eliminated. 3.2 Biometric pattern matching In the matching stage, the stored reference pattern, ν, for the claimed identity is compared to the pattern extracted, ν , during the previous stage. Due to the eye movement during the image acquisition stage, it is necessary to align ν with ν in order to be matched L.G.Brown (1992); M.S.Markov et al. (1993); Zitová & Flusser (2003). This fact is illustrated in Figure 16 125 Retinal Vessel Tree as Biometric Pattern 14 Will-be-set-by-IN-TECH where two images from the same individual, 16(a) and 16(c), and the obtained results in each case, 16(b) and 16(d), are shown using the crease approach. (a) (b) (c) (d) Fig. 16. Examples of feature points obtained from images of the same individual acquired in different times. (a) and (c) original images. (b) Feature point image from (a). A set of 23 points is obtained. (d) Feature point image from (c). A set of 17 points are obtained. Depending on several factors, such as the eye location in the objective, patterns may suffer some deformations. A reliable and efficient model is necessary to deal with these deformations allowing to transform the candidate pattern in order to get a pattern similar to the reference one. The movement of the eye in the image acquisition process basically consists in translation in both axis, rotation and sometimes a very small change in scale. It is also important to note that both patterns ν and ν could have a different number of points even being from the same individual. This is due to the different conditions of illumination and orientation in the image acquisition stage. The transformation considered in this work is the Similarity Transformation (ST), which is a special case of the Global Affine Transformation (GAT). ST can model translation, rotation and isotropic scaling using 4 parameters Ryan et al. (2004). The ST works fine with this kind of images as the rotation angle is moderate. It has also been observed that the scaling, due to eye proximity to the camera, is nearly constant for all the images. Also, the rotations are very slight as the eye orientation when facing the camera is very similar. Under these circumstances, the ST model appears to be very suitable. The ultimate goal is to achieve a final value indicating the similarity between the two feature points set, in order to decide about the acceptance or the rejection of the hypothesis that both images correspond to the same individual. To develop this task the matching pairings between both images must be determined. Atransformation has to be applied to the candidate image in order to register its feature points with respect to the corresponding points in the reference image. The set of possible transformations is built based on some restrictions and 126 Biometrics Retinal Vessel Tree as Biometric Pattern 15 a matching process is performed for each one of these. The transformation with the highest matching score will be accepted as the best transformation. To obtain the four parameters of a concrete ST, two pairs of feature points between the reference and candidate patterns are considered. If M is the total number of feature points in the reference pattern and N the total number of points in the candidate one, the size of the set T of possible transformations is computed using Eq.(2): T = (M 2 − M)(N 2 − N) 2 (2) where M and N represent the cardinality of ν and ν respectively. Since T represents a high number of transformations, some restrictions must be applied in order to reduce it. As the scale factor between patterns is always very small in this acquisition process, a constraint can be set to the pairs of points to be associated. In this scenario, the distance between both points in each pattern has to be very similar. As it cannot be assumed that it will be the same, two thresholds are defined, S min and S max , to bound the scale factor. This way, elements from T are removed where the scale factor is greater or lower than the respective thresholds S min and S max . Eq.(3) formalises this restriction: S min < distance(p, q) distance(p , q ) < S max (3) where p, q are points from ν pattern, and p , q are the matched points from the ν pattern. Using this technique, the number of possible matches greatly decrease and, in consequence, the set of possible transformations decreases accordingly. The mean percentage of not considered transformations by these restrictions is around 70%. In order to check feature points, a similarity value between points (SI M) is defined which indicates how similar two points are. The distance between these two points will be used to compute that value. For two points A and B, their similarity value is defined by Eq.(4): SI M(A, B) = 1 − distance(A, B) D max (4) where D max is a threshold that stands for the maximum distance allowed for those points to be considered a possible match. If distance(A, B) > D max then SI M(A, B) = 0. D max is a threshold introduced in order to consider the quality loss and discontinuities during the creases extraction process leading to mislocation of feature points by some pixels. In some cases,two points B 1 , B 2 could have both a good value of similarity with one point A in the reference pattern. This happens because B 1 and B 2 are close to each other in the candidate pattern. To identify the most suitable matching pair, the possibility of correspondence is defined comparing the similarity value between those points to the rest of similarity values of each one of them: P(A i , B j ) = SI M(A i , B j ) 2 ⎛ ⎝ M ∑ i=1 SI M(A i , B j ) + N ∑ j =1 SI M(A i , B j ) −SI M(A i , B j ) ⎞ ⎠ (5) 127 Retinal Vessel Tree as Biometric Pattern 16 Will-be-set-by-IN-TECH A M× N matrix Q is constructed such that position (i, j) holds P(A i , B j ). Note that if the similarity value is 0, the possibility value is also 0. This means that only valid matchings will have a non-zero value in Q. The desired set C of matching feature points is obtained from P using a greedy algorithm. The element (i, j) inserted in C is the position in Q where the maximum value is stored. Then, to prevent the selection of the same point in one of the images again, the row (i) and the column(j) associated to that pair are set to 0. The algorithm finishes when no more non-zero elements can be selected from Q. The final set of matched points between patterns is C. Using this information, a similarity metric must be established to obtain a final criterion of comparison between patterns. 4. Similarity metrics analysis The goal in this stage of the process is to define similarity measures on the aligned patterns to correctly classify authentications in both classes: attacks (unauthorised accesses), when the two matched patterns are from different individuals and clients (authorised accesses) when both patterns belong to the same person. For the metric analysis a set of 150 images (100 images, 2 images per individual and 50 different images more) from VARIA database VARIA (2007) were used. The rest of the images will be used for testing in the next section. The images from the database have been acquired with a TopCon non-mydriatic camera NW-100 model and are optic disc centred with a resolution of 768x584. There are 60 individuals with two or more images acquired in a time span of 6 years. These images have a high variability in contrast and illumination allowing the system to be tested in quite hard conditions. In order to build the training set of matchings, all images are matched versus all the images (a total of 150x150 matchings) for each metric. The matchings are classified into attacks or clients accesses depending if the images belong to the same individual or not. Distributions of similarity values for both classes are compared in order to analyse the classification capabilities of the metrics. The main information to measure similarity between two patterns is the number of feature points successfully matched between them. Fig.17(a) shows the histogram of matched points for both classes of authentications in the training set. As it can be observed, matched points information is by itself quite significative but insufficient to completely separate both populations as in the interval [10, 13] there is an overlapping between them. This overlapping is caused by the variability of the patterns size in the training set because of the different illumination and contrast conditions in the acquisition stage. Fig.17(b) shows the histogram for the biometric pattern size, i.e. the number of feature points detected. A high variability can be observed, as some patterns have more than twice the number of feature points of other patterns. As a result of this, some patterns have a small size, capping the possible number of matched points (Fig. 18). Also, using the matched points information alone lacks a well bounded and normalised metric space. To combine information of patterns size and normalise the metric, a function f will be used. Normalised metrics are very common as they make easier to compare class separability or establishing valid thresholds. The similarity measure (S) between two patterns will be defined by S = C f (M, N) (6) 128 Biometrics Retinal Vessel Tree as Biometric Pattern 17 (a) (b) Fig. 17. (a) Matched points histogram in the attacks (unauthorised) and clients (authorised) authentications cases. In the interval [10, 13] both distributions overlap. (b) histogram of detected points for the patterns extracted from the training set. where C is the number of matched points between patterns, and M and N are the matching patterns sizes. The first f function defined and tested is: f (M, N) = min(M, N) (7) The min function is the less conservative as it allows to obtain a maximum similarity even in cases of different sized patterns. Fig.19(a) shows the distributions of similarity scores for clients and attacks classes in the training set using the normalisation function defined in Eq.(7), and Fig.19(b) shows the FAR and FRR curves versus the decision threshold. 129 Retinal Vessel Tree as Biometric Pattern 18 Will-be-set-by-IN-TECH (a) (b) Fig. 18. Example of matching between two samples from the same individual in VARIA database. White circles mark the matched points between both images while crosses mark the unmatched points. In (b) the illumination conditions of the image lead to miss some features from left region of the image. Therefore, a small amount of detected feature points is obtained capping the total amount of matched points. Although the results are good when using the normalisation function defined in Eq.(7), a few cases of attacks show high similarity values, overlapping with the clients class. This is caused by matchings involving patterns with a low number of feature points as min(M, N) will be very small, needing only a few points to match in order to get a high similarity value. This suggests, as it will be reviewed in section 5, that some minimum quality constraint in terms of detected points would improve performance for this metric. To improve the class separability, a new normalisation function f is defined: f (M, N) = √ MN (8) Fig.20(a) shows the distributions of similarity scores for clients and attacks classes in the training set using the normalisation function defined in Eq.(8) and Fig.20(b) shows the FAR and FRR curves versus the decision threshold. Function defined in Eq.(8) combines both patterns size in a more conservative way, preventing the system to obtain a high similarity value if one pattern in the matching process contains a low number of points. This allows to reduce the attacks class variability and, moreover, to separate its values away from the clients class as this class remains in a similar values range. As a result of the new attacks class boundaries, a decision threshold can be safely established where FAR = FRR = 0 in the interval [0.38, 0.5] as Fig.20(b) clearly exposes. Although this metric shows good results, it also has some issues due to the normalisation process which can be corrected to improve the results as showed in next subsection. 4.1 Confidence band improvement Normalising the metric has the side effect of reducing the similarity between patterns of the same individual where one of them had a much greater number of points than the other, even in cases with a high number of matched points. This means that some cases easily distinguishable based on the number of matched points are now near the confidence band borders. To take a closer look at this region surrounding the confidence band, the cases of 130 Biometrics Retinal Vessel Tree as Biometric Pattern 19 (a) (b) Fig. 19. (a) Similarity values distribution for authorised and unauthorised accesses using f = min(M, N) as normalisation function for the metric. (b) False Accept Rate (FAR) and False Rejection Rate (FRR) for the same metric. unauthorised accesses with the highest similarity values (S) and authorised accesses with the lowest ones are evaluated. Fig.21 shows the histogram of matched points for cases in the marked region of Fig.20(b). It can be observed that there is an overlapping but both histograms are highly distinguishable. To correct this situation, the influence of the number of matched points and the patterns size have to be balanced. A correction parameter (γ) is introduced in the similarity measure to control this. The new metric is defined as: S γ = S · C γ−1 = C γ √ MN (9) 131 Retinal Vessel Tree as Biometric Pattern 20 Will-be-set-by-IN-TECH (a) (b) Fig. 20. (a) similarity values distribution for authorised and unauthorised accesses using f = √ MN as normalisation function for the metric. (b) False Accept Rate (FAR) and False Rejection Rate (FRR) for the same metric. Dotted lines delimit the interest zone surrounding the confidence band which will be used for further analysis. with S, C, M and N the same parameters from Eq.(8). The γ correction parameter allows to improve the similarity values when a high number of matched points is obtained, specially in cases of patterns with a high number of points. Using the gamma parameter, values can be higher than 1. In order to normalise the metric back into a [0, 1] values space, a sigmoid transference function, T(x), is used: T(x) = 1 1 + e s·(x−0.5) (10) 132 Biometrics Retinal Vessel Tree as Biometric Pattern 21 Fig. 21. Histogram of matched points in the populations of attacks whose similarity is higher than 0.3 and clients accesses whose similarity is lower than 0.6. where s is a scale factor to adjust the function to the correct domain as S γ does not return negatives or much higher than 1 values when a typical γ ∈ [1, 2] is used. In this work, s=6 was chosen empirically. The normalised gamma-corrected metric, S γ (x), is defined by: S γ = T(S γ ) (11) Finally, to choose a good γ parameter, the confidence band improvement has been evaluated for different values of γ (Fig.22(a)). The maximum improvement is achieved at γ = 1.12 with a confidence band of 0.3288, much higher than the original from previous section. The distribution of the whole training set (using γ = 1.12) is showed in Fig.22(b) where the wide separation between classes can be observed. 5. Results A set of 90 images, 83 different from the training set and 7 from the previous set with the highest number of points, has been built in order to test the metrics performance once their parameters have been fixed with the training set. To test the metrics performance, the False Acceptance Rate and False Rejection Rate were calculated for each of them (the metrics normalised by Eq.(7), Eq.(8) and the gamma-corrected normalised metric defined in Eq.(11). A usual error measure is the Equal Error Rate (EER) that indicates the error rate where FAR curve and FRR curve intersect. Fig.23(a) shows the FAR and FRR curves for the three previously specified metrics. The EER is 0 for the normalised by geometrical mean (MEAN) and gamma corrected (GAMMA) metrics as it was the same case in the training set, and, again, the gamma corrected metric shows the highest confidence band in the test set, 0.2337. The establishment of a wide confidence band is specially important in this scenario of different images fromusers acquired on different times and with different configurations of the capture hardware. Finally, to evaluate the influence of the image quality, in terms of feature points detected per image, a test is run where images with a biometric pattern size belowa threshold are removed 133 Retinal Vessel Tree as Biometric Pattern 22 Will-be-set-by-IN-TECH (a) (b) Fig. 22. (a) Confidence band size vs gamma (γ) parameter value. Maximum band is obtained at γ = 1.12. (b) Similarity values distributions using the normalised metric with γ=1.12. for the set and the confidence band obtained with the rest of the images is evaluated. Fig.23(b) shows the evolution of the confidence band versus the minimum detected points constraint. The confidence band does not grow significatively until a fairly high threshold is set. Taking as threshold the mean value of detected points for all the test set, 25.2, the confidence band grows from 0.2337 to 0.3317. So removing half of the images, the band is increased only by 0.098 suggesting that the gamma-corrected metric is very robust to low quality images. The mean execution time on a 2.4Ghz. Intel Core Duo desktop PC for the authentication process, implemented in C++, was 155ms: 105ms in the feature extraction stage and 50ms in the registration and similarity measure estimation, so that the method is very well-fitted to be employed in a real verification system. 134 Biometrics Retinal Vessel Tree as Biometric Pattern 23 (a) (b) Fig. 23. (a) FAR and FRR curves for the normalised similarity metrics (min: normalised by minimum points, mean: normalised by geometrical mean and gamma: gamma corrected metric). The best confidence band is the one belonging to the gamma corrected metric corresponding to 0.2337.(b) Evolution of the confidence band using a threshold of minimum detected points per pattern. 135 Retinal Vessel Tree as Biometric Pattern 24 Will-be-set-by-IN-TECH 6. Conclusions and future work In this work a complete identity verification method has been introduced. Following the same idea as the fingerprint minutiae-based methods, a set of feature points is extracted fromdigital retinal images. This unique pattern will allow for the reliable authentication of authorised users. To get the set of feature points, a creases-based extraction algorithm is used. After that, a recursive algorithm gets the point features by tracking the creases from the localised optic disc. Finally, a registration process is necessary in order to match the reference pattern from the database and the acquired one. With the patterns aligned, it is possible to measure the degree of similarity by means of a similarity metric. Normalised metrics have been defined and analysed in order to test the classification capabilities of the system. The results are very good and prove that the defined authentication process is suitable and reliable for the task. The use of feature points to characterise individuals is a robust biometric pattern allowing to define metrics that offer a good confidence band even in unconstrained environments when the image quality variance can be very high in terms of distortion, illumination or definition. This is also possible as this methodology does not rely on the localisation or segmentation of some reference structures, as it might be the optic disc. Thus, if the the user suffers some structure distorting pathology and this structure cannot be detected, the system works the same with the only problem being a possible loss of feature points constrained to that region. Future work includes the use of some high-level information of points to complement metrics performance and new ways of codification of the biometric pattern allowing to perform faster matches. 7. References Bolle, R. M., Senior, A. W., Ratha, N. K. & Pankanti, S. (2002). Fingerprint minutiae: A constructive definition, Biometric Authentication, pp. 58–66. Caderno, I. G., Penedo, M. G., Mariño, C., Carreira, M. J., Gómez-Ulla, F. &González, F. (2004). Automatic extraction of the retina av index, ICIAR (2), pp. 132–140. Chou, C.-T., Shih, S.-W. & Chen, D.-Y. (2006). Design of gabor filter banks for iris recognition, IIH-MSP, pp. 403–406. C.Mariño, M.G.Penedo, M.J.Carreira & F.Gonzalez (2003). Retinal angiography based authentication, Lecture Notes in Computer Science 2905: 306–313. De Schaepdrijver, L., Simoens, L., Lauwers, H. & DeGesst, J. (1989). Retinal vascular patterns in domestic animals, Res. Vet. Sci. 47: 34–42. Farzin, H., Abrishami-Moghaddam, H. & Moin, M.-S. (2008). A novel retinal identification system, EURASIP Journal on Advances in Signal Processing ID 280635: 10 pp. He, Z., Sun, Z., Tan, T., Qiu, X., Zhong, C. & Dong, W. (2008). Boosting ordinal features for accurate and fast iris recognition, CVPR. Hill, R. (1999). Retina identification, in A. Jain, R. Bolle &S. Pankanti (eds), Biometrics: Personal Identification in Networked Society, Kluwer Academic Press, Boston, pp. 123–142. Jain, A. K., Ross, A. & Pankanti, S. (1999). A prototype hand geometry-based verification system, AVBPA, pp. 166–171. Kim, H.-C., Kim, D., Bang, S. Y. & Lee, S.-Y. (2004). Face recognition using the second-order mixture-of-eigenfaces method, Pattern Recognition 37(2): 337–349. Kim, J., Cho, S., Choi, J. & Marks, R. J. (2004). Iris recognition using wavelet features, VLSI Signal Processing 38(2): 147–156. 136 Biometrics Retinal Vessel Tree as Biometric Pattern 25 Kisku, D. R., Rattani, A., Tistarelli, M. & Gupta, P. (2008). Graph application on face for personal authentication and recognition, ICARCV, pp. 1150–1155. L.G.Brown (1992). A survey of image registration techniques, ACM Computer Surveys 24(4): 325–376. Lloret, D., López, A., Serrat, J. & Villanueva, J. (1999). Creaseness-based CT and MR registration: comparison with the mutual information method, Journal of Electronic Imaging 8(3): 255–262. Lloret, D., Mariño, C., Serrat, J., A.M.López & Villanueva, J. (2001). Landmark-based registration of full SLO video sequences, Proceedings of the IX Spanish Symposium on Pattern Recognition and Image Analysis, Vol. I, pp. 189–194. López, A., Lloret, D., Serrat, J. & Villanueva, J. (2000). Multilocal creaseness based on the level-set extrinsic curvature, Computer Vision and Image Understanding 77(1): 111–144. López, A., Lumbreras, F., Serrat, J. & Villanueva, J. (1999). Evaluation of methods for ridge and valley detection, IEEE Trans. on Pattern Analysis and Machine Intelligence 21(4): 327–335. López, A. M., Lumbreras, F. & Serrat, J. (1998). Creaseness from level set extrinsic curvature, ECCV, pp. 156–169. Ma, L., Wang, Y. & Tan, T. (2002). Iris recognition using circular symmetric filters, ICPR (2), pp. 414–417. Maio, D. &Maltoni, D. (1997). Direct gray-scale minutiae detection in fingerprints, IEEE Trans. Pattern Anal. Mach. Intell. 19(1): 27–40. Mariño, C., Penedo, M. G., Penas, M., Carreira, M. J. & Gonzalez, F. (2006). Personal authentication using digital retinal images, Pattern Analysis and Applications 9: 21–33. Mian, A. S., Bennamoun, M. & Owens, R. A. (2008). Keypoint detection and local feature matching for textured 3d face recognition, International Journal of Computer Vision 79(1): 1–12. Moghaddam, B. & Pentland, A. (1997). Probabilistic visual learning for object representation, IEEE Trans. Pattern Anal. Mach. Intell. 19(7): 696–710. M.S.Markov, H.G.Rylander & A.J.Welch (1993). Real-time algorithm for retinal tracking, IEEE Trans. on Biomedical Engineering 40(12): 1269–1281. Nabti, M. & Bouridane, A. (2007). An improved iris recognition system using feature extraction based on wavelet maxima moment invariants, ICB, pp. 988–996. Noden, D. (1989). Embryonic origins and assembly of blood vessels, Am. Rev. Respir. Dis. 140: 1097–1103. Ryan, N., Heneghan, C. & de Chazal, P. (2004). Registration of digital retinal images using landmark correspondence by expectation maximization, Image and Vision Computing 22: 883–898. Seung-Hyun, L., Sang-Yi, Y. & Eun-Soo, K. (1995). Fingerprint identification by use of a volume holographic optical correlator, Proc. SPIE Vol. 3715, Optical Pattern Recognition pp. 321–325. Sidlauskas, D. (1988). 3d hand profile identification apparatus, United States Patent No.4,736.203. Siguenza Pizarro, J. A. & Tapiador Mateos, M. (2005). Tecnologias Biometricas Aplicadas a Seguridad, Ra–Ma. Simon, C. & Goldstein, I. (1935). A new scientific method of identification, J. Medicine 35(18): 901–906. systems Lab, B. (n.d.). Hasis, a hand shape identification system. 137 Retinal Vessel Tree as Biometric Pattern 26 Will-be-set-by-IN-TECH Tower, P. (1955). The fundus oculi in monozygotic twins: Report of six pairs of identical twins, Arch. Ophthalmol. 54: 225–239. VARIA (2007). VARPA Retinal Images for Authentication. http://www.varpa.es/varia.html. Venkataramani, K. & Kumar, V. (2003). Fingerprint verification using correlation filters, AVBPA, pp. 886–894. Whittier, J. C., Doubet, J., Henrickson, D., Cobb, J., Shadduck, J. &Golden, B. (2003). Biological considerations pertaining to use of the retinal vascular pattern for permanent identification of livestock, J. Anim. Sci 81: 1–79. Yang, M.-H., Ahuja, N. & Kriegman, D. J. (2000). Face recognition using kernel eigenfaces, ICIP. Zitová, B. & Flusser, J. (2003). Image registration methods: a survey, Image Vision and Computing 21(11): 977–1000. Zunkel, R. (1999). Hand geometry based verification, BIOMETRICS:Personal Identification in Networked Society, Kluwert Academic Publishers. 138 Biometrics 7 DNA Biometrics Masaki Hashiyada Division of Forensic Medicine, Department of Public Health and Forensic Medicine, Tohoku University Graduate School of Medicine Japan 1. Introduction The biometric authentication technologies, typified by fingerprint, face recognition and iris scanning, have been making rapid progress. Retinal scanning, voice dynamics and handwriting recognition are also being developed. These methods have been commercialized and are being incorporated into systems that require accurate on-site personal authentication. However, these methods are based on the measurement of similarity of feature-points. This introduces an element of inaccuracy that renders existing technologies unsuitable for a universal ID system. Among the various possible types of biometric personal identification system, deoxyribonucleic acid (DNA) provides the most reliable personal identification. It is intrinsically digital, and does not change during a person’s life or after his/her death. This chapter addresses three questions: First, how can personally identifying information be obtained from DNA sequences in the human genome? Second, how can a personal ID be generated from DNA-based information? And finally, what are the advantages, deficiencies, and future potential for personal IDs generated from DNA data (DNA-ID)? 2. Human identification based on DNA polymorphism A human body is composed of approximately of 60 trillion cells. DNA, which can be thought of as the blueprint for the design of the human body, is folded inside the nucleus of each cell. DNA is a polymer, and is composed of nucleotide units that each has three parts: a base, a sugar, and a phosphate. The bases are adenine, guanine, cytosine and thymine, abbreviated A, G, C and T, respectively. These four letters represent the informational content in each nucleotide unit; variations in the nucleotide sequence bring about biological diversity, not only among human beings but among all living creatures. Meanwhile, the phosphate and sugar portions form the backbone structure of the DNA molecule. Within a cell, DNA exists in the double-stranded form, in which two antiparallel strands spiral around each other in a double helix. The bases of each strand project into the core of the helix, where they pair with the bases of the complementary strand. A pairs strictly with T, and C with G (Alberts, 2002; Watson, 2004). Within human cells, DNA found in the nucleus of the cell (nuclear DNA) is divided into chromosomes. The human genome consists of 22 matched pairs of autosomal chromosomes and two sex-determining chromosomes, X and Y. In other words, human cells contain 46 different chromosomes. Males are described as XY since they possess a single copy of the X Biometrics 140 chromosome and a single copy of the Y chromosome, while females possess two copies of the X chromosome and are described as XX. The regions of DNA that encode and regulate the synthesis of proteins are called genes; these regions consist of exons (protein-coding portions) and introns (the intervening sequences) and constitute approximately 25% of the genome (Jasinska & Krzyzosiak, 2004). The human genome contains only 20,000−25,000 genes (Collins et al., 2004; Lander et al., 2001; Venter et al., 2001). Therefore, most of the genome, approximately 75%, is extragenic. These regions are sometimes referred to as ‘junk’ DNA; however, recent research suggests that they may have other essential functions. Markers commonly used to identify individual human beings are usually found in the noncoding regions, either between genes or within genes (i.e., introns). 2.1 Sort tandem repeat (STR) Target region (short tandem repeat) 7 repeats 8 repeats 9 repeats 10 repeats 11 repeats 1 2 3 4 5 6 8 7 9 10 11 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 8 8 9 9 10 ・ 2-nucletotide repeat unit : (CA)(CA)(CA)・・・・ ・ 3 -nucletotiderepeat unit : (GCC)(GCC)(GCC) ・・・・ ・ 4 -nucletotiderepeat unit : (AATG)(AATG)(AATG) ・・・・ ・ 5 -nucletotiderepeat unit : (AGAAA)(AGAAA) ・・・・ Primer Primer 1 2 3 … Fig. 1. The structure of Short Tandem Repeat (STR) In the extragenic region of eukaryotic genome, there are many repeated DNA sequences (approximately 50% of the whole genome). These repeated DNA sequences come in all sizes, and are typically designated by the length of the core repeat unit and either the number of contiguous repeat units or the overall length of the repeat region. These regions are referred to as satellite DNA (Jeffreys et al., 1995). The core repeat unit for a medium- length repeat, referred to as a minisatellite or VNTR (variable number of tandem repeats), is in the range of approximately 8−100 bases in length (Jeffreys et al., 1985). DNA regions with DNA Biometrics 141 repeat units that are 2−7 base pairs (bp) in length are called microsatellites, simple sequence repeats (SSRs), or most commonly short tandem repeats (STRs) (Clayton et al. ,1995; Hagelberg et al., 1991;Jeffreys et al., 1992)(Fig. 1). STRs have become popular DNA markers because they are easily amplified by the polymerase chain reaction (PCR) and they are spread throughout the genome, including both the 22 autosomal chromosomes and the X and Y sex chromosomes. The number of repeats in STR markers can vary widely among individuals, making the STRs an effective means of human identification in forensic science (Ruitberg et al., 2001). The location of an STR marker is called its “locus.” The type of STR is represented by the number of repeat called ‘allele’ which is taken from biological father and mother. When an individual has two copies of the same allele for a given marker, they are homozygous; when they have two different alleles, they are heterozygous. 2.1.1 DNA sample collection DNA can be easily obtained from a variety of biological sources, not only body fluid but also nail, hair and used razors (Anderson et al., 1999; Lee et al., 1998; Lee & Ladd, 2001). For biometric applications, a buccal swab is the most simple, convenient and painless sample collection method (Hedman et al., 2008). Buccal cell collection involves wiping a small piece of filter paper or a cotton swab against the inside of the subject’s cheek, in order to collect shed epithelial cells. The swab is then air dried, or can be pressed against a treated collection card in order to transfer epithelial cells for storage purposes. Fig. 2. The flow of DNA polymorphism analysis 2.1.2 DNA extraction and quantification There are many methods available for extracting DNA (Butler, 2010). The choice of which method to use depends on several factors, especially the number of samples, cost, and speed. Extraction time is the critical factor for biometric applications. The author has already reported the “5-minute DNA extraction” using an automated procedure (Hashiyada, 2007a). The use of large quantities of fresh buccal cells made it possible to extract DNA in a short time. Biometrics 142 In forensic cases, DNA quantitation is an important step (Butler, 2010). However, this step can be omitted in biometrics because a relatively large quantity of DNA can be recovered from fresh buccal swab samples. 2.1.3 DNA amplification (polymerase chain reaction: PCR) The field of molecular biology has greatly benefited from the discovery of a technique known as the polymerase chain reaction, or PCR (Mullis et al., 1986; Mullis & Faloona, 1987; Saiki et al., 1986). First described in 1985 by Kary Mullis, who received the Novel Prize in Chemistry in 1993, PCR has made it possible to make hundreds of millions of copies of a specific sequence of DNA in a few hours. PCR is an enzymatic process in which a specific region of DNA is replicated over and over again to yield many copies of a particular sequence. This molecular process involves heating and cooling samples in a precise thermal cycling pattern for approximately 30 cycles. During each cycle, a copy of the target DNA sequence is generated for every molecule containing the target sequence. In recent years, it has become possible to PCR amplify 16 STRs, including the gender assignment locus called ‘amelogenin,’ in one tube (Kimpton et al., 1993; Kimpton et al., 1996). Such multiplex PCR is enabled by commercial typing kits, such as AmpFlSTR® Identifiler® (Applied Biosystems, Foster City, CA, USA) and PowerPlex® 16 (Promega, Madison, WI, USA). Fig. 3. DNA amplification with polymerase chain reaction (PCR) 2.1.4 DNA separation and detection After STR polymorphisms have been amplified using PCR, the length of products must be measured precisely; some STR alleles differ by only 1 base-pair. Electrophoresis of the PCR products through denaturing polyacrylamide gels can be used to separate DNA molecules from 20−500 nucleotides in length with single base pair resolution (Slater et al., 2000). Recently, the fluorescence labelling of PCR products followed by multicolour detection has DNA Biometrics 143 been adopted by the forensic science field. Up to five different dyes can be used in a single analysis. Electrophoresis platforms have evolved from slab-gels to capillary electrophoresis (CE), which use a narrow glass filled with an cross-linked polymer solution to separate the DNA molecules (Butler et al., 2004). After data collection by the CE, the alleles (i.e., the type or the number of STR repeat units), are analyzed by the software that accompanies the CE machine. It takes around four hours, starting with DNA extraction, to obtain data from 16 STRs including the sex determination locus. 2.2 Single nucleotide polymorphism (SNP) The simplest type of polymorphism is the single nucleotide polymorphism (SNP), a single base difference at a particular point in the sequence of DNA (Brookes, 1999). SNPs normally have just two alleles, e.g., one allele is a cytosine (C) and the other is a thymine (T) (Fig. 4). SNPs therefore are not highly polymorphic and do not possess ideal properties for DNA polymorphism to be used in forensic analysis. However, SNPs are so abundant throughout the genome that it is theoretically possible to type hundreds of them. Furthermore, sample processing and data analysis may be more fully automated because size-based separation is not required. Thus, SNPs are prospective new bio-markers in clinical medicine (Sachidanandam et al., 2001; Stenson et al., 2009). Fig. 4. The schema of Single nucleotide polymorphism (SNP) 2.2.1 SNP detection methods Several SNP typing methods are available, each with its own strengths and weaknesses, unlike the STR analysis (Butler, 2010). In order to achieve the same power of discrimination as that provided by STRs, it is necessary to analyse many more SNPs. 40 to 50 SNPs must be analyzed in order to obtain reasonable powerful discrimination and define the unique profile of an individual (Gill, 2001). Importantly, however, we can count on the development of new SNP detection technologies, capable of high-throughput analysis, in the near future. 2.3 Lineage markers Autosomal DNA markers are shuffled with each generation, which means that half of an individual's genetic information comes from his or her father and the other half from his or her mother. However, the Y chromosome (Chr Y) and mitochondrial DNA (mtDNA) Biometrics 144 markers are called “lineage markers” because they are passed down from generation to generation without changing (except for mutational events). Maternal lineages can be followed using mtDNA sequence information (Anderson et al., 1981; Andrews et al., 1999) and whereas paternal lineages can be traced using Chr Y markers (Jobling & Tyler−Smith, 2003; Kayser et al., 2004). The analysis of lineage markers does not have the discriminatory power of autosomal markers. Even so, there are some features of both Chr Y and mtDNA that make them valuable forensic tools. 3. DNA polymorphism for biometric source The most commonly studied or implemented biometrics are fingerprinting, face, iris, voice, signature, retina and the patterns of vein and hand geometry (Shen & Tan, 1999; Vijaya Kumar et al., 2004). No one model is best for all situations. In addition, these technologies are based on the measurement of similarity of features. This introduces an element of inaccuracy that renders the existing technologies unsuitable for a universal ID system. However, DNA polymorphism information, such as STRs and SNPs, could provide the most reliable personal identification. This data can be precisely defined the most minute level, is intrinsically digital, and does not change during a person’s life or after his/her death. Therefore, DNA identification data is utilized in the forensic sciences. On the negative side, the biggest problem in using DNA is the time required for the extraction of nucleic acid and the evaluation of STR or SNP data. In addition, there are several other problems, such as the high cost of analysis, issues raised by monozygotic twins, and ethical concerns. This section describes a method for generation of DNA personal ID (DNA-ID) based on STR and SNP data, specifically. In addition, by way of example, the author proposes DNA INK for authentic security. 3.1 DNA personal ID using STR system We will refer to repeat counts of alleles obtained by STR analysis, as described in section 2.1, as (j, k). Each locus is associated with two alleles with distinct repeat counts (j, k), as shown in Fig. 2: one allele is inherited from the father, and the other from the mother. Before (j, k) can be applied to a DNA personal ID, it is necessary to statistically analyze how the distribution of (j, k) varies at a given locus based on actual data. We can generate a DNA-ID, X α , that includes allelic information about STR loci. The loci are incorporated in the following sequence. The repeat counts for the pair of alleles at each locus are arranged in ascending order. Step 1. Measure the STR alleles at each locus. Step 2. Obtain STR count values for each locus; express these in ascending order. : , L j k j k ≤ Depending on the measurement, the same person's STR count may appear as (j, k) or (k, j). Therefore, j and k are expressed in an ascending order, i.e., using (j, k|j ≤ k), in order to establish a one-to-one correspondence for each individual. This step is referred to as a ordering operation. Step 3. Generate a DNA-ID X α according to the following series, L i (j, k): X 1 2 3 . . . n L L L L α = DNA Biometrics 145 where L i indicates the ith STR count (j, k). For example, suppose that Mr. M has the following alleles at the respective loci; ( ) ( ) ( ) ( ) ( ) X D3S1358 D13S317 D18S51 D21S11 . . . D16S539 12, 14 8, 11 13, 15 29, 32.2 10, 10 α = = … The X α was thus defined as follows. X α = 1214811131529322 …… 1010 When the STR number of an allele had a fractional component, such as allele32.2 in D21S11, the decimal point was removed, and all of the numbers, including those after the decimal point, were retained. Finally, X α is generated number with several tends of digits, and becomes a personal identification information that is unique with a certain probability predicted by statistical and theoretical analysis. 3.1.1 Establishment of the identification format Because X α contains personal STR information, it must be encrypted to protect privacy. This can be achieved using a one-way function that also reduces the data length of the DNA-ID. This one-way function, the secure hash algorithm-1 (SHA-1), produces an ID with a length X δ of 160 bits, according to the following transformation: X δ = h ( X α ) 3.2 Statistical and theoretical analysis of DNA-ID 3.2.1 Matching probability at locus L The probability that a STR allele (j, k) at locus L will occur in this combination is denoted as p jk . The individual occurrence probabilities of j and k are denoted as p j and p k , respectively. Here, j and k are sequenced in ascending order to make the choice of generated ID unambiguous, using the STR analysis system described above. After the STR analysis, the probability that (j, k) occurs is p jk plus the probability p kj that (k, j) occurs. The reason for this is as follows. Even if (k, j) occurs in the same person during measurement, it is treated as (j, k) by rewriting it as (j, k) if k >j. Therefore, p jk is expressed as follows when j ≠ k (j <k): p jk = p j • p k + p k • p j = 2p j p k If j = k, p jk = p j • p j 3.2.2 Probability of a match between any two persons’ DNA-ID Probability p that the STR count at the same locus is identical for any two persons can be expressed as follows: Biometrics 146 When j = k, ( ) 2 1 m j k j p p = ⋅ ∑ When j ≠ k, ( ) 2 1 2 m j k j k m p p ≤ < ≤ ⋅ ∑ ( ) ( ) 4 2 1 1 4 m m j j k j j k m p p p p = ≤ < ≤ ∴ = + ⋅ ∑ ∑ Here, m is the upper limit of j and k, and the information reported so far indicates m= 60. Next, a determination is made of the DNA-ID matching probability p n , where n loci were used to generate the ID. The probability that the STR counts at the i th locus will match for any two persons is denoted as p i . When n loci are used, the probability p n that the DNA-IDs of any two persons will match (the DNA-ID matching probability) is as follows: 1 n n i i p p = = ∏ Here, it is assumed that there is no correlation among the STR loci. 3.2.3 Verification using validation experiment (STR) As a validation experiment, we studied the genotype and distribution of allele frequencies at 18 STRs in 526 unrelated Japanese individuals. Data was obtained using three commercial STR typing kits: PowerPlex™ 16 system (Promega), PowerPlex SE33 (Promega), and AmpFlSTR Identifiler™ (Applied biosystems) (Hashiyada, 2003a; 2003b). Information about the 18 target STRs is described in Table 1. Step 1. Perform DNA extraction, PCR amplification and STR typing Step 2. Perform the exact test (the data were shuffled 10,000 times), the homozygosity, and likelihood ratio tests using STR data for each STR locus in order to evaluate Hardy–Weinberg equilibrium (HWE). HWE provides a simple mathematical representation of the relationship among genotype and allele frequencies within an ideal population, and is central to forensic genetics. Importantly, when a population is in HWE, the genotype frequencies can be predicted from the allele frequencies. Step 3. Calculate parameters, the matching probability, the expected and observed heterozygosity, the power of discrimination, the polymorphic information content, the mean exclusion chance, in order to estimate the polymorphism at each STR locus. There are some loci on the same chromosomes (chr) such as D21S11 and Penta D on chr 21, D5S818 and CSF1PO on chr 5, and TPOX and D2S1338 on chr 2. No correlation was found between any sets of loci on the same chromosome, which means they are statistically independent. In addition, the statistical data for the 18 analyzed STRs, excluding the Amelogenin locus, were analyzed and showed a relatively high rate of matching probability; no significant deviation from HWE was detected. The combined mean exclusion chance was 0.9999998995 and the combined matching probability was 1 in 9.98 × 10 21 , i.e., 1.0024 × 10 − 22 . These values were calculated using polymorphism data from Japanese subjects; it is likely that different values would be obtained using data compiled from different ethnic groups, e.g., Caucasian or African. DNA Biometrics 147 Locus Chromosome Location Repeat Motif* Locus Chromosome Location Repeat Motif* TPOX 2 q 25.3 GAAT TH01 11 p 15.5 TCAT D2S1338 2 q 35 TGCC/TTCC VWA 12 p 13.31 TCTG/TCTA D3S1358 3 p 21.31 TCTG/TCTA D13S317 13 q 31.1 TATC FGA 4 q 31.3 CTTT/TTCC Penta E 15 q 26.2 AAAGA D5S818 5 q 23.2 AGAT D16S539 16 q 24.1 GATA CSF1PO 5 q 33.1 TAGA D18S51 18 q 21.33 AGAA SE33 6 q 14 AAAG D19S433 19 q 12 AAGG/TAGG D7S820 7 q 21.11 GATA D21S11 21 q 21.1 TCTA/TCTG D8S1179 8 q 24.13 TCTA/TCTG Penta D 21 q 22.3 AAAGA * Two types of motif means a compound or complex repeat sequence Table 1. Information about autosomal STR loci 3.2.4 The “Birthday Paradox” of DNA-ID In principle, the low matching probability of STR-based IDs would allow absolute and unequivocal discrimination between individuals. However, if STRs are to be used as an authentication system in our society, we must investigate the probability of two or more randomly selected people having an identical DNA- ID. The most well-known simulation of this probability is “the birthday paradox“. Of 40 students in a class, the probability that at least two students have the same birthday is approximately 0.9. This result seems counterintuitive, and is called a “paradox,” because for any single pair of students, the probability that they have the same birthday is 1/365 (0.0027). The paradox arises when we forget to consider that we are selecting samples randomly out of the members in a group. In two randomly selected individuals, the probability that one STR locus is different and that all STR loci are identical is (1-P M ) L(L-1)/2 and 1-(1-P M ) L(L-1)/2 , respectively, where L is the population size. However, the formula, 1-(1-P M ) L(L-1)/2 , is beyond the ability of personal computers, so we use the expected value, L(L-1)/2 · P M , to estimate two persons having the same STR genotype. This formula can use an approximate value of 1-(1-P M ) L(L-1)/2 . This is because L 2 is much smaller than 1/ P M when L is small, and because 1-(1-P M ) L(L-1)/2 is smaller than L(L-1)/2 · P M when L is not small. In this report, the value, L(L-1)/2 · P M , is defined as the practical matching probability (P PM ). The matching probability (P M ) for 18 STRs is 1.0024 × 10 − 22 , as described above. When P PM multiplied by the population size is less than 1, each person in the population could have a unique DNA-ID. Therefore, when using 18 loci, a population of tens of millions could be expected to include pairs of individuals with identical STR alleles. If the frequencies of STR alleles are similar among all ethnic groups, each person in Japan (or the world) could have a unique DNA-ID if the P PM of the STR system were approximately 10 − 24 and 10 − 30 , respectively. As the number of people in a community increases, the more the practical matching probability increases. This number can be applied for unrelated persons; however, we also need to consider P PM between related individuals. For instance, between two first cousins, if 41 STR loci are analyzed, we can obtain a unique DNA-ID. In addition, discrimination between half siblings requires analysis of 57 STR loci guarantee a unique DNA-ID. Thus, when using DNA identification systems such as STR systems for DNA-personal-IDs, the P PM should be considered for both related and unrelated individuals (Hashiyada, 2007b). Biometrics 148 3.3 DNA personal ID using SNP system The vast majority of SNPs are biallelic, meaning that they have two possible alleles and therefore three possible genotypes. For example, if the alleles for a SNP locus are R and S (where ‘R’ and ‘S’ could represent a A(adenine), G(guanine), C(cytosine) and T(thymine) nucleotide), three possible genotypes would be RR, RS (SR) or SS. Because a single biallelic SNP by itself yields less information than a multiallelic STR marker, it is necessary to analyze a larger number of SNPs in order to obtain a reasonable power of discrimination to define a unique profile. Computational analysis have shown that on average, 25 to 45 SNP loci are needed in order to yield equivalent random match probabilities comparable to those obtained with the 13 core STR loci that have been adopted by the FBI’s DNA database (COmbined DNA Index System, CODIS). The steps of creating a DNA-ID using SNPs are as follows; Step 1. Define alleles 1 and 2 for each SNP locus. Since DNA has a double helix structure, the single nucleotide polymorphism of A or G is the same polymorphism of T or C, respectively (Fig. 4). In other words, it is important to specify which strand of the double helix is to be analyzed, and to define allele 1 and allele 2 at the outset. Step 2. Analyze the SNP loci and place them in the following order. L : allele 1 allele 2 Step 3. Generate the DNA-ID X α according to the following series of L i (allele1, allele2): X α = L 1 L 2 L 3 . . . L n where L i indicates the i th SNP nucleotide (allele1, allele2). For example, suppose that a person has the following alleles at the respective loci; X α = SNP 1 SNP 2 SNP 3 SNP 4 . . . SNP 50 = (A,A) (C,T) (T,C) (C,C …… (G,A) Then  X would be defined as follows. X α = AACTTCCC……GA Next, the four types of nucleotide, A, G, C and T, are translated into binary notation. A=00, G=01, C=10, T=11 Finally, the  X is described as a string of 100 bits (digits of value 0 or 1). X α = 0000101111101010……0100 This  X must be encrypted for privacy protection using the secure hash algorithm-1 (SHA-1) for the same reasons as described above for STRs. The resulting DNA-ID (SNP) has a length δ X of 160 bits, according to the following transformation: X δ = h ( X α ) DNA Biometrics 149 3.3.1 Verification using validation experiment (SNP) As a validation experiment, the author analyzed 120 autosomal SNPs in 100 unrelated Japanese subjects using the TaqMan ® method (Applied Biosystems), and built a Japanese SNP database for identification. Although several SNPs were located on the same autosomal chromosome, no correlation was found between alleles at any SNP loci. Furthermore, no significant deviation from Hardy−Weinberg Equilibrium (HWE) was detected. The macthing probability (MP) of each SNP ranged from 0.375−0.465 (Hashiyada, 2007a). The MP for 41 SNPs (3.63 × 10 − 18 ), which have high MP in each loci, was very similar to the MPs obtained with the current STR multiplex kits, PowerPlex™ 16 System(Promega) and AmpFlSTR Identifiler (Applied Biosystems), which were 5.369 × 10 − 18 and 1.440 × 10 − 17 , respectively in Japanese population. 3.4 Rapid analysis system of SNP A reduction of the time required for DNA analysis is necessary in order to make practical use of DNA biometrics. In the STR system, it is difficult to decrease the analysis time because it is necessary to perform electrophoresis after PCR amplification. From DNA extraction to STR typing, the entire process takes 4−5 hours. However, there are many methods for analyzing SNPs that do not demand such a lengthy process. The author developed the SNP typing methodology using the modified TaqMan ® method, which is capable of amplifying the DNA and typing the SNPs at the same time. The author modified the number of PCR cycles and the annealing/extension time, and selected SNP loci that yield successful results under the modified PCR conditions. This new method is capable of detecting and typing 96 SNPs within 30 minutes (Hashiyada et al., 2009). 3.5 DNA INK In this paragraph, the author demonstrates an example of an application of STR polymorphism information, specifically the authentication of rare or expensive goods using the DNA-ID. The author outlines the development of biometric ink containing DNA whose sequence is based on personal STR information. The “DNA INK” is made of synthetic DNA and printing ink. Step 1. Perform STR analysis by the method described above. Step 2. Generate the DNA-ID, δ X , consisting of 160 bits, as described above. Step 3. Extract one-quarter of the data in the DNAI-ID ( X δ ) in order to reduce costs and improve practicality. The original 160-bit length was defined as X i X 1 X2 X 3 X4 δ δ δ δ δ = | | | | where δ X 1 , δ X2 , δ X 3 and δ X4 refer to the identification, ID, containing of the first, second, third and fourth 40 bits of δ X . Each set of 8 data bits is extended by two redundant bits known as the shift and check bits, which serve not only as check but also as limiting factors in the latter stages of DNA sequence generation. These limiting factors are necessary in DNA sequence analysis in order to exclude five or more repetitions of the same base. The extracted 40-bit data as follows; X 1 δ =1001100110011101011010101001011110100010 X 1 δ =10011001 [10] 10011101 [00] 01101010 [01] 10010111 [00] 10100010 [11] Biometrics 150 (Shift and check bits show as square brackets with underlines.) Step 4. Transform the bit series generated above into base sequences according to the following scheme. We called this step the “Encodeed Base Array“ method. 00=A(adenine), 01=C(cytosine), 10=G(guanine), 11=T(thymine) X 1 δ =10011001 [10]10011101 [00] 01101010 [01] 10010111 [00] 10100010 [11] =GCGC [G] GCTC [A] CGGG [C] GCCT [A] GGAG [T] Step 5. Define the identification data format by adding a header (H, 10 bits) and a serial number (N, 30 bits) to X 1 δ (40 + 10 = 50 bits). The resulting DNA sequence, consisting of H (5-bp), N (15-bp) and δ X 1 (25-bp) would then be flanked by two 20- bp−long primer sequences. This synthetic DNA could be amplified by PCR, and only those who know the primer sequences would be able to analyze the intervening sequence. Figure 5 shows the structure of the 85-bp synthetic DNA sequence. Step 6. Synthesize the complementary strand. Synthetic single-strand DNA is more economical to produce than double-strand DNA, but much less physically stable; therefore, double-strand PCR-amplified DNA should be used for incorporation into the DNA ink. Step 7. Mix 3 mg of double-strand DNA with 100 ml of ink. The ink itself is composed of a colorless transparent pigment, so that it is invisible to the naked eye, but contains an IR color former that enables easy detection of the printed mark. In addition, add dummy DNA in order to make the DNA-ID sequence difficult to analyze by someone who does not know the primer sequences. Fig. 5. Sequence structure of the 85-bp single-strand DNA-ID P1, P2: Primer sequences are designed so as not to anneal to the human genome H: Header, N: Serial number DNA Biometrics 151 The several types of resistance tests, by heat, acids, alkalis, alcohol, ultraviolet (UV) and sunlight, were used to ascertain the durability of DNA ink for practical use. Samples printed using DNA ink were covered with zinc oxide (ZnO) on the surface in order to enhance resistance to UV light, which is the major cause of DNA degradation. The target DNA sequence was detected successfully in all resistance tests except for the UV exposure test. However, the durability improved when the ink was covered by ZnO, allowing successful amplification even after 40 hours of UV exposure. Finally, the DNA ink was proved as a sort of biological memory which could print the polymorphism information created by DNA, on the surface of everything excluding the air and water. 4. Problems of DNA biometrics There can be no doubt that DNA-ID is potentially useful as a biometric. It has many advantages, including accuracy, strictness, discriminatory power (and ease of increasing this power), and the ability to use the same analysis platform all over the world. However, DNA polymorphism information is not widely used in biometrics at this point. The weak points of DNA-ID are discussed below. 4.1 Time required for DNA analysis The most serious flaw is that DNA analysis is time-consuming compared to other authentication methods. It takes at least 4 hours to get STR identification data by common methods used in forensic science. Most of the time required for DNA analysis is taken up by PCR amplification and electrophoresis. It is impossible to dramatically shorten the duration of these steps using existing technologies. SNP analysis may be faster, however: it is possible to analyze 96 SNPs within 30 minutes (Hashiyada, Itakura et al., 2009). Thus, a SNP system could use a specific usage, for example in passports or in very large-scale mercantile transactions. 4.2 Ethical concerns The polymorphic target region in DNA used to create the DNA-ID does not relate to a person’s physical characteristics or disease factors, since the STRs and the SNP loci were selected from the extragenic regions. However, because the DNA-ID system involves handling information that can identify each individual, it should be strictly supervised in order to protect privacy. Once the DNA-ID has been generated, the one-way encryption described above makes it impossible to recover any of the original DNA information (3.1, 3.3). Therefore, raw materials like buccal swab should be especially tightly controlled in order to prevent spoofing. 4.3 Monozygotic twins and DNA chimeras Monozygotic twins, or more commonly referred to as identical twins, begin life as a single egg, which is fertilized by one sperm but then splits into two eggs early in the gestational period. Therefore, the twins share a precisely duplicated whole genome, and can‘t be distinguished by DNA polymorphism. However, sometimes one member of a pair of identical twins can develop cancer or schizophrenia while the other does not (Zwijnenburg et al., 2010). A recent “twin study” has revealed that twin pairs have significant differences in their DNA sequence, and furthermore that environmental factors can change gene expression and susceptibility to disease by affecting epigenetics, i.e., changes in the DNA Biometrics 152 that do not alter its sequence (Haque et al., 2009). Such data will hopefully aid development of tools that allow discrimination between the identical twins in the near future. A DNA chimera refers to a recombinant molecule of DNA composed of segments from more than one source. The author has observed chimerism in a case of allogeneic bone marrow transplantation (BMT).The recipient had suffered from acute promyelocytic leukemia and received a BMT from a healthy donor, resulting in complete remission of the leukemia. Samples of peripheral blood leukocytes (PBL), buccal mucosa, hair follicles and fingernails were collected from the transplant recipient. DNA analysis revealed that the STR profile of PBL of the recipient had completely converted to donor type, whereas the hair follicles and fingernails were recipient-derived. DNA patterns of the buccal mucosa appeared chimeric, i.e., they had qualities of both the recipient and donor. Neutrophilic leukocytes were observed in smear specimens from buccal swabs of the recipient, indicating that the buccal cells were not truly chimeric but were instead merely contaminated with leukocytes. 4.4 Cost DNA analysis requires a high capital cost in order to buy and maintain equipment as well as purchase commercial kits. In addition, it is necessary to equip a laboratory and employ specialists in molecular biology. These high costs may pose a barrier to entry of venture capitals. The more popular such DNA techniques become, however, the lower the unit costs of the apparatus and reagents will become. 5. Conclusion Development of biometric authentication technologies has progressed rapidly in the last few years. Personal identification devices based on unique patterns of fingerprints, iris, or subcutaneous veins in the finger have all been commercialized. All of these methods of verification are based on matching analog patterns or feature-point comparisons. Because they lack absolute accuracy, they have not yet achieved a universal standard. Among the various types of biometric information source, the DNA-ID is thought to be the most reliable method for personal identification. DNA information is intrinsically digital, and does not change either during a person’ life or after his/her death. The discriminatory power of the data can be enhanced by increasing the number of STR or SNP loci. The DNA-ID could be encrypted via the one-way function (SHA-1) to protect privacy and to reduce data length. Using the STR system, it is currently difficult to complete analysis within 3 hours; however, using the SNP system, it is possible to analyse 96 SNPs within 30 minutes. Both systems yielded verifiable results in validation experiments. The author also introduced the idea of DNA-INK as a practical application of DNA-ID. DNA-ID has some disadvantages, as well, including long analysis time, ethical concerns, high cost, and the impossibility of discrimination of monozygotic twins. However, the author believes that the DNA-ID must be employed as a biometric methodology, using breakthrough methods developed in the near future. 6. Acknowledgments I am grateful to Dr. Yukio Itakura for his extensive support, and I give special thanks to my colleagues at Div. Forensic Medicine, Tohoku University. I also thank Prof. M. Funayama for reading the manuscript and giving me helpful advice. DNA Biometrics 153 7. References Alberts, B., Jhonson, A., Lewis, J., Raff, M., Roberts, K., Walter P. (2002). Molecular biology of THE CELL NY, USA: Garland Science. Anderson, S., et al. (1981). Sequence and organization of the human mitochondrial genome. Nature, 290(5806): p. 457-65. Anderson, T.D., et al. (1999). A validation study for the extraction and analysis of DNA from human nail material and its application to forensic casework. J Forensic Sci, 44(5): p. 1053-6. Andrews, R.M., et al. (1999). Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet, 23(2): p. 147. Brookes, A.J. (1999). The essence of SNPs. Gene, 234(2): p. 177-86. Butler, J.M., et al. (2004). Forensic DNA typing by capillary electrophoresis using the ABI Prism 310 and 3100 genetic analyzers for STR analysis. Electrophoresis, 25(10-11): p. 1397-412. Butler, J.M. (2010). Fundamemntals of Forensic DNA Typipng: ELSERVIER. Clayton, T.M., et al. (1995). Identification of bodies from the scene of a mass disaster using DNA amplification of short tandem repeat (STR) loci. Forensic Sci Int, 76(1): p. 7-15. Collins, F.S., et al. (2004). Finishing the euchromatic sequence of the human genome. Nature, 431(7011): p. 931-45. Gill, P. (2001). An assessment of the utility of single nucleotide polymorphisms (SNPs) for forensic purposes. Int J Legal Med, 114(4-5): p. 204-10. Hagelberg, E., et al. (1991). Identification of the skeletal remains of a murder victim by DNA analysis. Nature, 352(6334): p. 427-9. Haque, F.N., et al. (2009). Not really identical: epigenetic differences in monozygotic twins and implications for twin studies in psychiatry. Am J Med Genet C Semin Med Genet, 151C(2): p. 136-41. Hashiyada, M., et al. (2009). Development of a spreadsheet for SNPs typing using Microsoft EXCEL. Leg Med (Tokyo), 11 Suppl 1: p. S453-4. Hashiyada, M., Itakura, Y., Nagashima, T., Nata, M., Funayama, M. (2003a). Polymorphism of 17 STRs by multiplex analysis in Japanese population. Forensic Sci Int, 133(3): p. 250-3. Hashiyada, M., Itakura, Y., Nagasima, T., Sakai, J., Funatyama, M. (2007a). High-throughput SNP analysis for human identification. DNA Polymorphism Official Journal of Japanese Society for DNA Polymorphism Resarch, 15: p. 3. Hashiyada, M., Matsuo, S., Takei, Y., Nagasima, T.,Itakura, Y., Nata, M., Funatyama, M. (2003b). The length polymorphism od SE33(ACTBP2) locus in Japanese population. Practice in Forensuic Medicine, 46: p. 4. Hashiyada, M., Sakai, J., Nagashima, T., Itakura Y., Kanetake, J., Takahashi, S., Funayama, M. (2007b). The birthday paradox in the biometric personal authentication system using STR polymorphism -Practical matching probabilities evaluate the DNA psesonal ID system-. The Research and practice in forensic medicine, 50: p. 5. Hedman, J., et al. (2008). A fast analysis system for forensic DNA reference samples. Forensic Sci Int Genet, 2(3): p. 184-9. Jasinska, A.&W.J. Krzyzosiak (2004). Repetitive sequences that shape the human transcriptome. FEBS Lett, 567(1): p. 136-41. Biometrics 154 Jeffreys, A.J., et al. (1985). Hypervariable 'minisatellite' regions in human DNA. Nature, 314(6006): p. 67-73. Jeffreys, A.J., et al. (1992). Identification of the skeletal remains of Josef Mengele by DNA analysis. Forensic Sci Int, 56(1): p. 65-76. Jeffreys, A.J., et al. (1995). Mutation processes at human minisatellites. Electrophoresis, 16(9): p. 1577-85. Jobling, M.A.&C. Tyler-Smith (2003). The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet, 4(8): p. 598-612. Kayser, M., et al. (2004). A comprehensive survey of human Y-chromosomal microsatellites. Am J Hum Genet, 74(6): p. 1183-97. Kimpton, C.P., et al. (1993). Automated DNA profiling employing multiplex amplification of short tandem repeat loci. PCR Methods Appl, 3(1): p. 13-22. Kimpton, C.P., et al. (1996). Validation of highly discriminating multiplex short tandem repeat amplification systems for individual identification. Electrophoresis, 17(8): p. 1283-93. Lander, E.S., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822): p. 860-921. Lee, H.C., et al. (1998). Forensic applications of DNA typing: part 2: collection and preservation of DNA evidence. Am J Forensic Med Pathol, 19(1): p. 10-8. Lee, H.C.&C. Ladd (2001). Preservation and collection of biological evidence. Croat Med J, 42(3): p. 225-8. Mullis, K., et al. (1986). Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol, 51 Pt 1: p. 263-73. Mullis, K.B.&F.A. Faloona (1987). Specific synthesis of DNA in vitro via a polymerase- catalyzed chain reaction. Methods Enzymol, 155: p. 335-50. Ruitberg, C.M., et al. (2001). STRBase: a short tandem repeat DNA database for the human identity testing community. Nucleic Acids Res, 29(1): p. 320-2. Sachidanandam, R., et al. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409(6822): p. 928-33. Saiki, R.K., et al. (1986). Analysis of enzymatically amplified beta-globin and HLA-DQ alpha DNA with allele-specific oligonucleotide probes. Nature, 324(6093): p. 163-6. Shen, W.&T. Tan (1999). Automated biometrics-based personal identification. Proc Natl Acad Sci U S A, 96(20): p. 11065-6. Slater, G.W., et al. (2000). Theory of DNA electrophoresis: a look at some current challenges. Electrophoresis, 21(18): p. 3873-87. Stenson, P.D., et al. (2009). The Human Gene Mutation Database: 2008 update. Genome Med, 1(1): p. 13. Venter, J.C., et al. (2001). The sequence of the human genome. Science, 291(5507): p. 1304-51. Vijaya Kumar, B.V., et al. (2004). Biometric verification with correlation filters. Appl Opt, 43(2): p. 391-402. Watson, J., Baker, T., Bell, S., Gann, A., Levine, M., Losick R. (2004). Molecular Biology of the Gene, San Francisco, CA, USA: Benjamin Cummings, Cold Spring Harbor Laboratory Press. Zwijnenburg, P.J., et al. (2010). Identical but not the same: the value of discordant monozygotic twins in genetic research. Am J Med Genet B Neuropsychiatr Genet, 153B(6): p. 1134-49. Part 2 Behavioral Biometrics 0 Keystroke Dynamics Authentication Romain Giot, Mohamad El-Abed and Christophe Rosenberger GREYC Research Lab Université de Caen Basse Normandie, CNRS, ENSICAEN France 1. Introduction Everybody needs to authenticate himself on his computer before using it, or even before using different applications (email, e-commerce, intranet, . . . ). Most of the times, the adopted authentication procedure is the use of a classical couple of login and password. In order to be efficient and secure, the user must adopt a strict management of its credentials (regular changing of the password, use of different credentials for different services, use of a strong password containing various types of characters and no word contained in a dictionary). As these conditions are quite strict and difficult to be applied for most users, they do not not respect them. This is a big security flawin the authentication mechanism(Conklin et al., 2004). According to the 2002 NTA Monitor Password Survey 1 , a study done on 500 users shows that there is approximately 21 passwords per user, 81% of them use common passwords and 30% of them write their passwords down or store them in a file. Hence, password-based solutions suffer from several security drawbacks. A solution to this problem, is the use of strong authentication. With a strong authentication system, you need to provide, at least, two different authenticators among the three following: (a) what you know such as passwords , (b) what you own such as smart cards and (c) what you are which is inherent to your person, such as biometric data. You can adopt a more secure password-based authentication by including the keystroke dynamics verification (Gaines et al., 1980; Giot et al., 2009c). In this case, the strong authentication is provided by what we know (the password) and what we are (the way of typing it). With such a scheme, during an authentication, we verify two issues: (i) is the credential correct ? (ii) is the way of typing it similar ? If an attacker is able to steal the credential of a user, he will be rejected by the verification system because he will not be able to type the genuine password in a same manner as its owner. With this short example, we can see the benefits of this behavioral modality. Figure 1 presents the enrollment and verification schemes of keystroke dynamics authentication systems. We have seen that keystroke dynamics allows to secure the authentication process by verifying the way of typing the credentials. It can also be used to secure the session after its opening by detecting the changing of typing behavior in the session (Bergadano et al., 2002; Marsters, 2009). In this case, we talk about continuous authentication (Rao, 2005), the computer knows how the user interacts with its keyboard. It is able to recognize if another individual uses the 1 http://www.nta-monitor.com/ 8 Fig. 1. Keystroke dynamics enrolment and authentication schemes: A password-based authentication scenario keyboard, because the way of interacting with it is different. Moreover, keystroke dynamics can also prevent the steal of data or non authorized computer use by attackers. In this chapter, we present the general research field in keystroke dynamics based methods. Section 2 presents generalities on keystroke dynamics as the topology of keystroke dynamics methods and its field of application. Even if it has not been studied a lot comparing to other biometric modalities (see Table 1), keystroke dynamics is a biometric modality studied for many years. The first reference to such system dates from 1975 (Spillane, 1975), while the first real study dates from 1980 (Gaines et al., 1980). Since, new methods appeared all along the time which implies the proposal of many keystroke dynamics systems. They can be static, dynamic, based on one or two classes pattern recognition methods. The aim of this section is to explain all these points. Modality keystroke dynamics gait fingerprint face iris voice Nb doc. 2,330 1,390 17,700 18,300 10,300 14,000 Table 1. Number of documents referenced by Google Scholar per modality. The query is “modality biometric authentication" In section 3, we present the acquisition and features extraction processes of keystroke dynamics systems. Section 4 presents the authentication process of such keystroke dynamics based methods. These methods can be of different types: one class based (in this case, the model of a user is only built with its own samples), or two classes based (in this case, the model of a user is built also with samples of impostors). For one class problems, studies are based 158 Biometrics Keystroke Dynamics Authentication 3 on distance measures Monrose & Rubin (1997), others on statistical properties (de Magalhaes et al., 2005; Hocquet et al., 2006) or bioinformatics tools Revett (2009). Concerning two classes problems, neural networks (Bartmann et al., 2007) and Support Vectors Machines (SVM) (Giot et al., 2009c) have been used. Section 5 presents the evaluation aspects (performance, satisfaction and security) of keystroke dynamics systems. A conclusion of the chapter and some emerging trends in this research field are given in section 6. 2. Generalities 2.1 Keystroke dynamics topology Keystroke dynamics has been first imagined in 1975 (Spillane, 1975) and it has been proved to work in early eigthies (Gaines et al., 1980). First studies have proved that keystroke dynamics works quite well when providing a lot of data to create the model of a user. Nowadays, we are able to perform good performance without necessitating to ask a user to give a lot of data. “A lot of data” means typing a lot of texts on a computer. This possibility of using, or not, a lot of data to create the model allows us to have two main families of keystroke dynamics methods (as illustrated in Figure 2): • The static families, where the user is asked to type several times the same string in order to build its model. During the authentication phase, the user is supposed to provide the same string captured during his enrollment. Such methodology is really appropriate to authenticate an individual by asking him to type its own password, before login to its computer session, and verifying if its way of typing matches the model. Changing the password implies to enroll again, because the methods are not able to work with a different password. Two main procedures exist: the use of a real password and, the use of a common secret. In the first case, each user uses its own password, and the pattern recognition methods which can be applied can only use one class classifiers or distance measures. In the second case, all users share the same password and we have to address a two classes problem (genuine and impostor samples) (Bartmann et al., 2007; Giot et al., 2009c). Such systems can work even if all the impostors were not present during the training phase (Bartmann et al., 2007). • The dynamic families allow to authenticate individuals independently of what they are typing on the keyboard. Usually, they are required to provide a lot of typing data to create their model (directly by asking them to type some long texts, or indirectly by monitoring their computer use during a certain period). In this solution, the user can be verified on the fly all the time he uses its computer. We can detect a changing of user during the computer usage. This is related as continuous authentication in the literature. When we are able to model the behavior of a user, whatever the thing he types, we can also authenticate him through a challenge during the normal login process: we ask the user to type a random phrase, or a shared secret (as a one-time password, for example). 2.2 Applications and interest From the topology depicted in Figure 2, we can imagine many applications. Most of them have been presented in scientific papers and some of them are proposed by commercial applications. 159 Keystroke Dynamics Authentication 4 Will-be-set-by-IN-TECH Keystroke Dynamics Dynamic authentication Continuous authentication Random password Static authentication Two classes authentication One class authentication Fig. 2. Topology of keystroke dynamics families 2.2.1 Authentication for logical access control Most of commercial softwares are related to static keystroke dynamics authentication by modifying the Operating System login procedure. The authentication form is modified to include the capture of the timing information of the password (see Section 3.2.1), and, in addition of verifying the password, the way of typing is also verified. If it matches to the user profile, he is authenticated. Otherwise, he is rejected and considered as an impostor. By this way, we obtain two authentication factors (strong authentication): (i) what we know, which is the password of the user; (ii) what we are, which is the way of typing the password. The best practices of password management are rarely (even never) respected (regular change of password, use of a complex password, forbid to write the password on a paper, . . . ), because they are too restrictive. Moreover, they can be easily obtained by sniffing network, since a wide range of websites or protocols do not implement any protection measures on the transmission links. That is here, where keystroke dynamics is interesting, since it allows to avoid impostors which were able to get the password to authenticate instead of the real user. In addition, some studies showed that keystroke dynamics holds better performance when using simple passwords, than more complicated ones. If the user keeps a simple password, he remembers it more easily, and, administrators lost less time by giving new passwords. When used in a logical access control, the keystroke dynamics process uses different information such as the name of the user, the password of the user, the name and the password of the user, an additional passphrase (common for all the users, unique to the user). Modi & Elliott (2006) show that, sadly, using spontaneously generated password does not give interesting performance. This avoids the use of one time passwords associated to keystroke dynamics (when we are not in a monitoring way of capturing biometric data). 2.2.2 Monitoring and continuous authentication Continuously monitoring the way the user interacts with the keyboard is interesting (Ahmed & Traore, 2008; Rao, 2005; Song et al., 1997). With such a mechanism, the system is able to detect the change of user during the session life. By this way, the computer is able to lock the session if it detects that the user is different than the one which has previously been authenticated on this computer. Such monitoring can also be used to analyse the behavior of the user (instead its identity), and, detect abnormal activities while accessing to highly restricted documents or executing tasks in an environment where the user must be alert at all the times (Monrose & Rubin, 2000). 160 Biometrics Keystroke Dynamics Authentication 5 Continuous authentication is interesting, but has a lot of privacy concerns, because the system monitors all the events. Marsters (2009) proposes a solution to this problem of privacy. His keystroke dynamics system is not able to get the typed text from the biometric data. It collects quadgraphs (more information on ngraphs is given later in the chapter) for latency and trigraphs for duration. Instead of storing this information in an ordered log, it is stored in a matrix. By this way, it is impossible to recover the chronological log of keystroke, and, improve the privacy of the data. 2.2.3 Ancillary information Keystroke dynamics can also be used in different contexts than the authentication. Monrose & Rubin (2000) suggest the use of keystroke dynamics to verify the state of the user and alert a third party if its behavior is abnormal. But, this was just a suggestion, and not a verification. Hocquet et al. (2006) show that keystroke dynamics users can be categorised into different groups. They automatically assign each user to a group (authors empirically use 4 clusters). The parameters of the keystroke dynamics system are different for each group (and common for each user of the group), which allows to improve the performance of the system. However, there is no semantic information on the group, as everything is automatic. Giot &Rosenberger (2011) showthat it is possible to recognize the gender of an individual who types a predefined string. The gender recognition accuracy is superior to 91%. This information can be useful to automatically verify if the gender given by an individual is correct. It can be also used as an extra feature during the authentication process in order to improve the performance. Authors achieved an improvement of 20% of the Error Equal Rate (EER) when using the guessed gender information during the verification process. Epp (2010) shows that it is possible to get the emotional state of an individual through its keystroke dynamics. The author argues that if the computer is able to get the emotional state of the user, it can adapt its interface depending on this state. Such ability facilitates computer-mediated communication (communication through a computer). He respectively obtains 79.5% and 84.2% of correct classification for the relaxed and tired states. Khanna & Sasikumar (2010) show that 70% of users decrease their typing speed while there are in a negative emotional state (compared to a neutral emotional state) and 83%of users increase their typing speed when their are in a positive emotional state. Keystroke dynamics is also used to differentiate human behavior and robot behavior in keyboard use. This way, it is possible to detect a bot which controls the computer, and, intercepts its actions (Stefan & Yao, 2008). 3. Keystroke dynamics capture The capture phase is considered as an important issue within the biometric authentication process. The capture takes place at two different important times: • The enrollment, where it is necessary to collect several samples of the user in order to build its model. Depending of the type of keystroke dynamics systems, the enrollment procedure can be relatively different (typing of the same fixed string several times, monitoring of the computer usage, . . . ), and, the quantity of required data can be totally different between the studies (from five inputs (Giot et al., 2009c) to more than one hundred Obaidat & Sadoun (1997)). • The verification, where a single sample is collected. Various features are extracted from this sample. They are compared to the biometric model of the claimant. 161 Keystroke Dynamics Authentication 6 Will-be-set-by-IN-TECH This section first presents the hardware which must be used in order to capture the biometric data, and, the various associated features which can be collected from this data. 3.1 Mandatory hardware and variability Each biometric modality needs a particular hardware to capture the biometric data. The price of this hardware, as well as the number of sensors to buy, can be determinant when choosing a biometric system supposed to be used in a large infrastructure with number of users (e.g, necessity to buy a fingerprint sensor for each computer, if we choose a logical access control for each machine). Keystroke dynamics is probably the biometric modality with the cheapest biometric sensor : it uses only a simple keyboard of your computer. Such keyboard is present in all the personal computers and in all the laptops. If a keyboard is broken and it is necessary to change it, it would cost no more than 5$. Table 2 presents the sensor and its relative price for some modalities, in order to ease the comparison of these systems. Modality keystroke fingerprint face iris hand veins Sensor keyboard fingerprint sensor camera infrared camera near infra red camera Price very cheap normal normal very expensive expensive Table 2. Price comparison of hardware for various biometric modalities Of course, each keyboard is different on various points: • The shape (straight keyboard, keyboard with a curve, ergonomic keyboard, . . . ) • The pressure (how hard it is to press the key) • The position of keys (AZERTY, QWERTY, . . . ). Some studies only used the numerical keyboard of a computer (Killourhy & Maxion, 2010; Rodrigues et al., 2006). Hence, changing a keyboard may affect the performances of the keystroke recognition. This problem is well known in the biometric community and is related as cross device matching (Ross & Jain, 2004). It has not been treated a lot in the keystroke dynamics literature. Figure 3 presents the shape of two commonly used keyboards (laptop and desktop). We can see that they are totally different, and, the way of typing on it is also different (maybe mostly due by the red ball on the middle of the laptop keyboard). (a) Desktop keyboard (b) Laptop keyboard Fig. 3. Difference of shape of two classical keyboards Having this sensor (the keyboard) is not sufficient, because (when it is a classical one), the only information it provides is the code of the key pressed or released. This is not at all a biometric information, all the more we already know if it is the correct password or not, whereas we 162 Biometrics Keystroke Dynamics Authentication 7 are interested in if it is the right individual who types it. The second thing we need is an accurate timer, in order to capture at a sufficient precision the time when an event occurs on the keyboard. Once again, this timer is already present in every computer, and, each operating system is able to use it. Hence, we do not need to buy it. There is a drawback with this timer: its resolution can be different depending on the chosen programming language or the operating system. This issue has been extensively discussed by Killourhy & Maxion (2008), where it is shown that better performance are obtained with higher accuracy timer. Some researchers have also studied the effect of using an external clock instead of the one inside the computer. Pavaday. et al. (2010) argue that it is important to take into consideration this timer, especially when comparing algorithms, because it has an impact on performance. They also explain how to configure the operating system in order to obtain the best performances. Even on the same machine, the timer accuracy can be different between the different languages used (by the way, keep in mind, that web based keystroke dynamics implementation use interpreted languages –java or javascript– which are known to not have a precise timer on all the architectures). Historically, keystroke dynamics works with a classical keyboard on a computer, and avoids the necessity to buy a specific sensor. However, some studies have been done by using other kinds of sensors in order to capture additional information and improve the recognition. Some works (Eltahir et al., 2008; Grabham & White, 2008) have tested the possibility of using a pressure sensor inside each key of the keyboard. In this case, we can exploit an extra information in order to discriminate more easily the users: the pressure force exerced on the key. Lopatka & Peetz (2009) propose to use a keyboard incorporating a Sudden Motion Sensor (SMS) 2 . Such sensor (or similar ones) is present in recent laptops and is used to detect sudden motion of the computer in order to move the writing heads of the hard drive when a risk of damage of the drive is detected. Lopatka & Peetz use the movement in the z axis as information. From these preliminary study, it seems that this information is quite efficient. Sound signals produced by the keyboard typing have also been used in the literature. Nguyen et al. (2010) only use sound signals when typing the password, and obtain indirectly through the analysis of this signal, key-pressed time, key-released time and key-typed forces. Performance is similar to classical keystroke dynamics systems. Dozono et al. (2007) use the sound information in addition to the timing values (i.e., it is a feature fusion) which held better performance than the sound alone, or the timing information alone. Of course, as keystroke dynamics can work with any keyboard, it can also work with any machine providing a keyboard, or something similar to a keyboard. One common machine having a keyboard and owned by a lot of people is the mobile phone where we can use keystroke dynamics on it. We have three kinds of mobile phones: • Mobile phone with a numerical keyboard. In this case, it is necessary to press several times the same key in order to obtain an alphabetical character. Campisi et al. (2009) present a study on such a mobile phone. They argue that such authentication mechanism must be coupled with another one. • Mobile phone with all the keys (letters and numbers) accessible with the thumbs. This is a kind of keyboard quite similar to a computer’s keyboard. Clarke & Furnell (2007) show its feasibility and highlight the fact that such authentication mechanism can only be used by regular users of mobile phones. 2 http://support.apple.com/kb/HT1935 163 Keystroke Dynamics Authentication 8 Will-be-set-by-IN-TECH • Mobile phone without any keyboard, but a touch screen. We can argue that the two previous mobile phones are already obsolete and will be soon replaced by such kind of mobile phones. Although, there are few studies on this kind of mobile phone, we think the future of keystroke dynamics is on this kind of material. With such a mobile phone, we can capture the pressure information and position of the finger on the key which could be discriminating. Figure 4 presents the topology of the different keystroke dynamics sensors, while the Figure 5 presents the variability on the timer. Keystroke Dynamics Sensor Computer PC/Laptop keyboard Microphone Numeric keyboard Pressure sensitive Mobile Touch screen Mobile keyboard All the keys Numeric keyboard Fig. 4. Topology of keystroke dynamics sensors of the literature Timer variations Operating System Type Desktop application Mobile phone Web based application Language Native Interpreted Fig. 5. Topology of factors which may impact the accuracy of the timer 3.2 Captured information As argued before, various kinds of information can be captured. They mainly depend on the kind of used sensors. Although, we have presented some sensors that are more or less advanced in the previous subsection, we only emphasize, in this chapter, on a classic keyboard. 164 Biometrics Keystroke Dynamics Authentication 9 3.2.1 Raw data In all the studies, the same rawdata is captured (even if they are not manipulated as explained here). We are interested by events on the keyboard. These events are initiated by its user. The raw biometric data, for keystroke dynamics, is a chronologically ordered list of events: the list starts empty, when an event occurs, it is appended at the tail of the list with the following information: • Event. It is generated by an action on the key. There are two different events: – press occurs when the key is pressed. – release occurs when the key is released. • Key code. It is the code of the key from which the event occurs. We can obtain the character from this code (in order to verify if the list of characters corresponds to the password, for example). The key code is more interesting than the character, because it gives some information on the location of the key on the keyboard (which can be used by some keystroke dynamics recognition methods) and allows to differentiate different keys giving the same character (which is a discriminant information (Araujo et al., 2005)). This key code may be dependant of the platform and the language used. • Timestamp. It encodes the time when the event occurs. Its precision influence greatly the recognition performance. Pavaday. et al. (2010) propose to use the Windows function QueryPer f ormanceCounter 3 with the highest priority enabled for Windows computers, and, changing the scheduler policy to FIFO for Linux machines. It is usually represented in milliseconds, but this is not mandatory. The raw data can be expressed as (with n the number of events on the form n = 2 ∗ s with s the number of keys pressed to type the text): ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ (keycode i , event i , time i ), ∀i, 0 <= i < n keycode i ∈ Z event i ∈ {PRESS, RELEASE} time i ∈ N (1) Umphress & Williams (1985) only use the six first time values of each word (so s ≤ 6). Depending on the kind of keystroke dynamics application, the raw data is captured in different kind of scenarios: in the authentication form to type the login and password, in a form asking to type a predefined or random text different than the login and password, or in continuous capture during the use of the computer. 3.2.2 Extracted features Various features can be extracted from this raw data, we present the most commonly used in the literature. 3.2.2.1 First order The most often extracted features are local ones, computed by subtracting timing values. • Duration. The duration is the amount of time a key is pressed. For the key i (i is omitted for sake of readability) it is computed as following: duration = time{event = RELEASE} − time{event = PRESS} (2) 3 http://msdn.microsoft.com/en-us/library/ms644904%28v=VS.85%29.aspx 165 Keystroke Dynamics Authentication 10 Will-be-set-by-IN-TECH We then obtain a timing vector (of the size of the typed text), also named PR in the literature, containing the duration of each key press (by order of press). ∀i, 1 ≤ i ≤ n, PR i = duration i (3) • Latencies. Different kinds of latencies can be used. They are computed by getting the differences of time between two keys events. We can obtain the PP latencies which are the difference of time between the pressure of each key: ∀i, 1 ≤ i < n, PP i = time i+1 {event i+1 = PRESS} − time i {event i = PRESS} (4) We can obtain the RR latencies which are the difference of time between the release of each key: ∀i, 1 ≤ i < n, RR i = time i+1 {event i+1 = RELEASE} − time i {event i = RELEASE} (5) We can obtain the RP latencies which are the difference of time between the release of one key and the pressure of the next one: ∀i, 1 ≤ i < n, RP i = time i+1 {event i+1 = PRESS} − time i {event i = RELEASE} (6) Most of the time, a feature fusion is operated by concatenating the duration vector with, at least, one of the latency vector (it seems that most of the time, the selected latency vector is the PP one, but it is not always indicated in the papers). A recent paper Balagani et al. (2011) discusses on the way of using these extracted features in order to improve the recognition rate of keystroke dynamics systems. Other kinds of data can be encountered in various papers Ilonen (2003). They are mainly global types of information: • Total typing. The total time needed to type the text can also be used. The information can be used as an extra feature to append to the feature vectors, or as a normalisation factor. • Middle time. The time difference between the time when the user types the character at the middle of the password, and the time at the beginning of the input. • Mistake ratio. When the user is authorised to do typing mistakes (this is always the case in continuous authentication, but almost never the case in static authentication), counting the number of times the backspace key is hit gives an interesting feature. Another concept that is often encountered in the literature, is the notion of digraph. A digraph represents the time necessary to hit two keys. The digraph features D of a password is computed as following: ∀i, 1 ≤ i < n, D i = time i+1 {event i+1 = RELEASE} − time i {event i = PRESS} (7) This notion has been extended to ngraph, with n taking different values. trigraph are heavily used in (Bergadano et al., 2002). de Ru & Eloff (1997) use a concept of typing difficulty based on the fact that certain key combinations are more difficult to type than other. The typing difficulty is based on the distance (on the keyboard) between two successive characters (to type), and if several keys are needed to create a character (i.e., use of shift key). 166 Biometrics Keystroke Dynamics Authentication 11 3.2.2.2 Second order Some features are not extracted from the raw biometric data, but from the first order features. • min/max. It consists to get the minimumand maximum value of each type of data (latency and duration). • mean/std. It consists to get the mean value and its standard deviation of each type of data (latency and duration). • Slope. By using the slope of the biometric sample, we are interested in the global shape of the typing. We expect that users type in the same way even if the speed may be different (Modi & Elliott, 2006). The new features (result) set is computed as following (with source): ∀i, 1 ≤ i < n, result i = source i+1 − source i (8) • Entropy. The entropy inside a sample has been only studied in (Monrose et al., 2002). • Spectral information. Chang (2006a) applies a discrete wavelet transformation to the original extracted features. All the operations are done with the wavelet transformed data. We can imagine more complicated features, but the final biometric data is always a single vector composed of various features. While computing the model with several samples (see next section) feature selection mechanisms can remove non informative features. We do not insist on papers using other information than timing values in the rest of this chapter (pressure force, movements, . . . ). We have seen in this section that several features can be extracted. Verification procedures performance greatly depends on the chosen features, but, most of the time, papers only use one latency and the duration. 4. Authentication framework Once the different biometric data during enrolment procedure have been captured, it is time to build the model of each user. The way of computing it greatly depends on the used verification methods. During an authentication, the verification method compares the query sample (the biometric data captured during the authentication) to the model. Based on the result of this comparison (which is commonly a distance), the decision module accepts or rejects the user. 4.1 Enrolment The enrolment step allows to create the model of each user, thanks to its enrolled samples. Most of the time, the number of samples used during the enrolment is superior to 20. Such a high quantity of data can be really boring for the users to provide. 4.1.1 Outliers detection It is known that the classifier performance greatly depends on outliers presence in the learning dataset. Most keystroke dynamics studies do not take care of the presence of outliers in the learning set. Some studies (mainly in free text) remove times superior to a certain threshold. In (Gaines et al., 1980), filtering is done by removing timing values superior to 500ms, while in (Umphress & Williams, 1985) it is timing values superior to 750ms. Rogers & Brown (1996) cleanup data with using a Kohonen network (Kohonen, 1995) using impostors samples. They also use a statistical method. Killourhy & Maxion (2010) also detect outliers in biometric samples. An outlier feature is detected in the following way (for each feature): the feature is more than 1.5 inter-quartile range greater than the third quartile, or more than 1.5 167 Keystroke Dynamics Authentication 12 Will-be-set-by-IN-TECH inter-quartile less than the first quartile. When a feature is detected as being an outlier, it is replaced by a random sample (which is not an outlier) selected among possible values of this feature for this user. The procedure is operated for each feature of each sample. By this way, the number of samples is always the same. It seems that, most of the time, the outlier detection and correction is operated on the whole dataset, and not on the learning set. This allows to cleanup the used dataset to compute the algorithm performance (and obtain better performance), but not the enrolled samples of the user. 4.1.2 Preprocessing Biometric data may be normalized before being used. Such pre-processing allows to get better performance by using a normalisation function (Filho &Freire (2006) observed that the timing distribution is roughly Log-Normal) : g(x) = 1 1 + exp − K(log e (x)−μ σ (9) We did not find other references to other pre-processing approaches in the literature. The parameters K (k is chosen in order to minimise the squared error between the approximated function and the cumulative distribution function of the logarithm of timings distribution), μ and sigma respectively represent an optimisation factor, the mean of the logarithm of the timing values, the standard deviation of the logarithm of the timing values. 4.1.3 Feature selection A feature selection mechanism can be applied to remove irrelevant features. It seems that this point has also been rarely tested. The aim of the feature selection is to reduce the quantity of data and speed up the computation time, and, eventually to improve the performance. Very few studies have applied such kind of mechanism. Two different kinds of feature extraction systems can be used: • Filter approach which does not depend on the verification algorithm. The aim is to remove irrelevant features based on different measures (e.g., the variance); • Wrapper approach which depends on the verification algorithm. Different feature subsets are generated and evaluated. The best one is kept. Boechat et al. (2006) select a subset of N features with the minors of standard deviation, which allows to eliminate less significant features. Experiments are done at Zero False Acceptance Rate. False Rejection Rate reduces when the number of selected features increases. Keeping 70%of the features gives interesting results. Azevedo et al. (2007) use a wrapper systembased on Particle Swarm Optimization (PSO) to operate the feature selection. The PSO gives better results than a Genetic Algorithm. Bleha & Obaidat (1991) use a reduction technique based on Fisher analysis. However, the technique consists in keeping m −1 dimension for each vector, with m the number of users in the system (they have only 9 users in their system). Yu & Cho (2004) use an algorithm based an Support Vector Machines (SVM) and Genetic Algorithms (GA) to reduce the size of samples and keep only key values for each user. Other similar methods are present in the literature (Chen & Lin, 2005). 168 Biometrics Keystroke Dynamics Authentication 13 4.1.4 Model computation There are numbers of methods to verify if a query corresponds to the expected user. Some of them are based on statistical methods, other on data mining methods. Some methods use one-class assumption (they only use the enrolment samples of the user), while other use two-class or multi-class assumption (they also use impostors enrollment samples to compute the model). When impostors samples are needed, they may be automatically generated (Sang et al., 2004), instead of being collected with real impostors (Clarke & Furnell, 2006; Obaidat & Sadoun, 1997). Generally, data mining methods use a really huge number of enrolled samples to compute the model (several hundred of samples in some neural network methods) which is not realistic at all. Most used way of model computing are: • Computing the mean vector and standard deviation of enrolled samples (Umphress & Williams, 1985); • Store the enrolled vectors in order to use them with k nearest neighbourg methods (Rao, 2005) (variations being in the distance computing method (Kang & Cho, 2009)); • Learning of bayesian classifiers (Janakiraman & Sim, 2007; Rao, 2005); • Learning clusters with k-mean (Hwang et al., 2006; Obaidat & Sadoun, 1997) ; • Learning parameters of generative functions: Hidden Markov Model (HMM) (Galassi et al., 2007; Pohoa et al., 2009; Rodrigues et al., 2006) or Gaussian Mixture Models (GMM) (Hosseinzadeh & Krishnan, 2008) ; • Neural network learning (Bartmann et al., 2007; Clarke &Furnell, 2006; Obaidat &Sadoun, 1997; Rogers & Brown, 1996) ; • SVM learning (Giot et al., 2009c; Rao, 2005; Sang et al., 2004; Yu & Cho, 2004). 4.2 Verification The verification consists in verifying if the input of the user corresponds to the claimed identity. The way of capturing these inputs greatly depends on the kind of used keystroke dynamics system (e.g., for static authentication, the user must type its login and password). While the features are extracted from the raw biometric sample (same procedure than during the enrollment), they are compared to the model of the claimed user. Usually, the verification module returns a comparison score. If this score is below than a predefined threshold, the user is authenticated, otherwise, he is rejected. Several verification methods exist and depend on the way the enrollment is done, so they are similar to the present list. query represents the query biometric sample (the test capture to compare to the model). . p represents the p norm of vector. The main families of computing are (Guven & Sogukpinar, 2003): • The minimal distance computing. In (Monrose &Rubin, 1997), the euclidean distance between the query and each of enrolled samples is computed. The comparison score is the min of these distances. score = min query −enrolled u 2 , ∀ u∈[1,Card(enrolement)] (10) • The statistical methods. 169 Keystroke Dynamics Authentication 14 Will-be-set-by-IN-TECH One of the oldest methods is based on bayesian probabilities (Bleha et al., 1990). μ is the mean value of enrolled samples: score = (query −μ) t (query −μ) query 2 · μ 2 (11) A normalized version is also presented in the study. The statistical method presented in (Hocquet et al., 2006) computes the score depending on the mean μ and the standard deviation σ of the enrolled samples: score = 1 − 1 Card(query) exp − |query −μ| σ 1 (12) Filho & Freire (2006) present another method which also computes a distance: score = query −μ 2 2 (13) • Application of fuzzy rules de Ru & Eloff (1997). • Class verification. For classifiers able to give a label, the verification consists in verifying if the guessed label corresponds to the label of the claimed identity (cf. neural networks, SVM, k − nn). • Some methods are based on the disorder degree of vectors (Bergadano et al., 2002). • Others are based on timing discretisation (Hocquet et al., 2006). • Bioinformatic methods based on string motif searching are also used (Revett et al., 2007). 4.3 Improving the performance Different ways can be used to improve the performance of the recognition. Several studies (Bartmann et al., 2007; Hosseinzadeh & Krishnan, 2008; Killourhy & Maxion, 2010; Revett, 2009) request the user to type the verification text several times (mainly between two and three), when he is rejected, in order to give him more chances of being verified. Such procedure reduces the False Rejection Rate without growing the False Acceptance Rate too much. Other studies try to update the model of a user after being authenticated (Hosseinzadeh & Krishnan, 2008; Revett, 2009). This way, the model tracks the behavior modifications of the user through time, and integrate them in the model. As the keystroke data deviates progressively with time, performance degrades with time when not using such procedure. It is not always clear in the various studies if the template update is done in a supervised way (impostors samples never added), or in a semi-supervised way (samples added if the classifier recognizes them as being genuine). Even if the aim is to improve performance, the result can be totally different: semi-supervised methods may add impostor samples in the model. This way, the model deviates from the real biometric data of the user and attracts more easily impostors samples. Classifier performances greatly depend on the number of used samples to compute them. Chang (2006b) artificially generates new samples from the enrolled samples in order to improve keystroke recognition. The system uses a transformation in frequential domains thanks to wavelets. Another way to improve recognition performance is to fuse two samples together (Bleha & Obaidat, 1991). This way, timing values are smoothed when merging the two samples and light hesitation are suppressed. The fusion (Ross et al., 2006) of several keystroke dynamics methods on the same query is also a good way to improve performances: 170 Biometrics Keystroke Dynamics Authentication 15 • Bleha et al. (1990) associate a bayesian classifier to a minimal distance computing between the query vector and the model. • Hocquet et al. (2006) apply a fusion between three different keystroke dynamics methods, which greatly improves the performance. • Different kinds of weighted sums score fusion functions are proposed in Giot, El-Abed & Rosenberger (2010); Teh et al. (2007). Keystroke dynamics has also been successfully fused with other modalities, like face (Giot, Hemery & Rosenberger, 2010) or speaker recognition (Montalvao Filho & Freire, 2006). Hwang et al. (2006) have defined various measures to get the unicity, consistancy and discriminality. By analysing the behavior of these measures comparing the recognition performance, they find that it is possible to improve performance by asking users to artificially add pauses (helped by cues for being synchronized) when typing the password. Karnan et al. (2011) propose an interesting review of most of the keystroke dynamics recognition methods. 4.4 User identification The verification consists in verifying if the identity of the claimant is correct, while the identification consists to determine the identity of the user. We may find methods specifics to identification, or compare the query to each model, the identity being the owner of the model returning the lowest distance (or a reject if this distance is higher a threshold). Bleha et al. (1990) use a bayesian classifier to identify the user. Identification based on keystroke dynamics has not been much experimented in the literature. 5. Evaluation of keystroke dynamics systems Despite the obvious advantages of keystroke dynamics systems in enhancing traditional methods based on a secret, its proliferation is still not as much as expected. The main drawback is notably the lack of a generic evaluation method for such systems. We need a reliable evaluation methodology in order to put into obviousness the benefit of a new method. Nowadays, several studies exist in the state-of-the-art to evaluate keystroke dynamics systems. It is generally realized within three aspects: performance, satisfaction and security. 5.1 Performance The goal of this evaluation aspect is to quantify and to compare keystroke dynamics systems. In order to compare these systems, we need generally to compute their performance using a predefined protocol (acquisition conditions, test database, performane metrics, . . .). According to the International Organization for Standardization ISO/IEC 19795-1 (2006), the performance metrics are divided into three sets: • Acquisition performance metrics such as the Failure-To-Enroll rate (FTE). • Verification system performance metrics such as the Equal Error Rate (EER). • Identification system performance metrics such as the False-Negative and the False-Positive Identification Rates (FNIR and FPIR, respectively). Several benchmark databases exist in order to compare keystroke dynamics systems. A benchmark database can contain real samples from individuals, which reflect the best the real use cases. Nevertheless, it is costly in terms of efforts and time to create such a database. As argued by Cherifi et al. (2009), a good benchmark database must satisfy various requirements: 171 Keystroke Dynamics Authentication 16 Will-be-set-by-IN-TECH 1. As keystroke dynamics is a behavioral modality, the database must be captured among different sessions, with a reasonable time interval between sessions, in order to take into account the variation of individuals behavior. 2. The database must also contain fake biometric templates to test the robustness of the system. It seems that there is no other reference to this kind of experiment in the literature. 3. The benchmark must embed a large diversity of users (culture, age, . . .). This point is essential for any biometrics, but, it is really difficult to attain. We present an overview of the existing benchmark databases: DB 1 Chaves Montalvão et al. have used the same keystroke databases in several papers (Filho & Freire, 2006). The databases are available at http://itabi.infonet.com.br/biochaves/ br/download.htm. The databases do not seem to be yet available on their website. The maximum number of users in a database is 15, and, the number of provided samples per user is 10. Each database contains the raw data. The database is composed of couples of ASCII code of the pressed key and the elapsed time since the last key down event. Release of a key is not tracked. Four different databases have been created. Most databases were built under two different sessions spaced of one week or one month (depending on the database). Each database is stored in raw text files. DB 2 DSN2009 Killourhy &Maxion (2009) propose a database of 51 users providing four hundred samples captured in height sessions (there are fifty inputs per session). The delay between each session is one one day at minimum, but the mean value is not stated. This is the dataset having the most number of samples per user, but, a lot of them are typed on a short period (50 at the same time). Each biometric data has been captured when typing the following password: “.tie5Roanl”. The database contains some extracted features: hold time, interval between two pressures, interval between the release of a key, and the pressure of the next one. The database is available at http://www.cs.cmu.edu/~keystroke/. It is stored in raw text, csv or Excel files. DB 3 Greyc alpha Giot et al. (2009a) propose the most important public dataset in term of users. It contains 133 users and, 100 of them provided samples of, at least, five distinct sessions. Each user typed the password “greyc laboratory” twelve times, on two distinct keyboards, during each session (which give 60 samples for the 100 users having participated to each session). Both extracted features (hold time and latencies) and raw data are available (which allow to build other extracted features). The database is available at http: //www.ecole.ensicaen.fr/~rosenber/keystroke.html. It is stored in an sqlite database file. DB 4 Pressure-Sensitive Keystroke Dynamics Dataset Allen (2010) has created a public keystroke dynamics database using a pressure sensitive keyboard. The database is available at http://jdadesign.net/2010/04/ pressure-sensitive-keystroke-dynamics-dataset/ in a csv or sql file. It embeds the following raw data: key code, time when pressed, time when release, pressure force. 104 users are present on the database, but, only 7 of them provided a significant amount of data (between 89 and 504), whereas the 97 other have only provided between 3 and 15 samples. Three different passwords have been typed: “pr7q1z”, “jeffrey allen” and “drizzle”. 172 Biometrics Keystroke Dynamics Authentication 17 DB 5 Fixed Text The most recent database has been released in 2010 Bello et al. (2010). 58 volunteers participated to the experiment. Each session consists in typing 14 phrases extracted from books and 15 common UNIX commands. It seems that almost all the users have done only one session. The database is available at http://www.citefa.gov.ar/si6/ k-profiler/dataset/ in a raw text file. Press and release times for each key are saved, as well as the user agent of the browser from which the session has been done, the age, gender and handness of the user and other information. We can see that some databases are available. Each of them has been created for keystroke dynamics on computer (i.e. no public dataset available for smartphones). Despite this, these databases do not always fit the previous requirements, which may explain why none of them have been used by researchers different than their creators. Although, it would be the best kind of dataset, no public dataset has been built with one login/password different for each user. Table 3 presents a summary of these public datasets. Dataset Type Information Users Samples /users Sessions Filho & Freire (2006) Various Press events < 15 < 10 2 Killourhy & Maxion (2009) 1 fixed string Duration and 2 latencies 51 400 8 Giot et al. (2009a) 1 fixed String Press and release events. Duration and 3 latencies > 100 60 5 Allen (2010) 3 fixed strings Press and release events and pressure 7/97 (89-504) /(3-15) few months Bello et al. (2010) 14 phrases and 15 unix commands Press and release time 58 1 1 Table 3. Summary of keystroke dynamics datasets Most of the proposed keystroke dynamics methods in the literature have quantified their methods using different protocols for their data acquisition (Giot et al., 2009c; Killourhy & Maxion, 2009). Table 4 illustrates the differences of the used protocols in this research area for some major studies. The performance comparison of these methods is quite impossible, as stated in (Crawford, n.d.; Giot et al., 2009a; Karnan et al., 2011; Killourhy & Maxion, 2009), due to several reasons. First, most of these studies have used different protocols for their data acquisition, which is totally understandable due to the existence of different kinds of keystroke dynamics systems (static, continuous, dynamic) that require different acquisition protocols. Second, they differ on the used database (number of individuals, separation between sessions . . .), the acknowledgement of the password (if it is an imposed password, a high FTA is expected), the used keyboards (which may deeply influences the way of typing), and the use of different or identical passwords (which impacts on the quality of impostors’ data). In order to resolve such problematic, Giot et al. (2011) presents a comparative study of seven methods (1 contribution against 6 methods existing in the literature) using a predefined protocol, and GREYC alpha database (Giot et al., 2009a). The results from this study show a promising EER value equal to to 6.95%. To our knowledge, this is the only work that compares 173 Keystroke Dynamics Authentication 18 Will-be-set-by-IN-TECH keystroke methods within the same protocol, and using a publicly available database. The performance of keystroke dynamics systems (more general speaking, of behavioral systems) provides a lower quality than the morphological and biological ones, because they depend a lot on user’s feelings at the moment of the data acquisition: user may change his way of performing tasks due to its stress, tiredness, concentration or illness. Previous works presented by Cho & Hwang (2006); Hwang et al. (2006) focus on improving the quality of the captured keystroke features as a mean to enhance system overall performance. Hwang et al. (2006) have employed pauses and cues to improve the uniqueness and consistency of keystroke features. We believe that it is relevant to more investigate the quality of the captured keystroke features, in order to enhance the performance of keystroke dynamics systems. Paper A B C D E FAR FRR Obaidat & Sadoun (1997) 8 weeks 15 112 no no 0% 0% Bleha et al. (1990) 8 weeks 36 30 yes yes 2.8% 8.1% Rodrigues et al. (2006) 4 sessions 20 30 / no 3.6% 3.6% Hocquet et al. (2007) / 38 / / no 1.7% 2.1% Revett et al. (2007) 14 days 30 10 / no 0.15% 0.2% Hosseinzadeh & Krishnan (2008) / 41 30 no no 4.3% 4.8% Monrose & Rubin (1997) 7 weeks 42 / no no / 20% Revett et al. (2006) 4 weeks 8 12 / / 5.58% 5.58% Killourhy & Maxion (2009) 8 sessions 51 200 yes no 9.6% 9.6% Giot et al. (2009c) 5 sessions 100 5 yes no 6.96% 6.96% Table 4. Summary of the protocols used for different studies in the state-of-the-art (A: Duration of the database acquisition, B: Number of individuals in the database, C: Number of samples required to create the template, D: Is the acquisition procedure controlled?, E: Is the threshold global?). “/” indicates that no information is provided in the article. 5.2 Satisfaction This evaluation aspect focuses on measuring users’ acceptance and satisfaction regarding the system (Theofanos et al., 2008). It is generally measured by studying several properties such as easiness to use, trust in the system, etc. The works done by El-Abed et al. (2010); Giot et al. (2009b) focusing on studying users’ acceptance and satisfaction of a keystroke dynamics system (Giot et al., 2009a), show that the system is well perceived and accepted by the users. Figure 7 summarizes users’ acceptance and satisfaction while using the tested system. Satisfaction factors are rated between 0 and 10 (0 : not satisfied · · · 10 : quite satisfied). These results show that the tested system is well perceived among the five acceptance and satisfaction properties. Moreover, there were no concerns about privacy issues during its use. In biometrics, there is a potential concern about the misuse of personal data (i.e., templates) which is seen as violating users’ privacy and civil liberties. Hence, biometric systems respecting this satisfaction factor are considered as usefull. 5.3 Security Biometric authentication systems present several drawbacks which may considerably decrease their security. Schneier (1999) compares traditional security systems with biometric systems. The study presents several drawbacks of biometric systems including: • The lack of secrecy: everybody knows our biometric traits such as iris, • and, the fact that a biometric trait cannot be replaced if it is compromised. 174 Biometrics Keystroke Dynamics Authentication 19 El-Abed et al. (2011) propose an extension of the Ratha et al. model (Ratha et al., 2001) to categorize the common threats and vulnerabilities of a generic biometric system. Their proposed model is divided into two sets as depicted in figure 6: architecture threats and system overall vulnerabilities. Fig. 6. Vulnerability points in a general biometric system. 5.3.1 Set I architecture threats 1) Involves presenting a fake biometric data to the sensor. An example of such attack is the zero-effort attempts. Usually, attackers try to impersonate legitimate users having weak templates; 2) and 4) In a replay attack, an intercepted biometric data is submitted to the feature extractor or the matcher bypassing the sensor. Attackers may collect then inject previous keystroke events features using a keylogger; 3) and 5) The system components are replaced with a Trojan horse program that functions according to its designer specifications; 6) Involves attacks on the template database such as modifying or suppresing keystroke templates; 7) The keystroke templates can be altered or stolen during the transmission between the template database and the matcher; 8) The matcher result (accept or reject) can be overridden by the attacker. 5.3.2 Set II system overall vulnerabilities 9) Performance limitations By contrast to traditional authentication methods based on “what we know” or “what we own” (0% comparison error), biometric systems is subject to errors such as False Acceptance Rate (FAR) and False Rejection Rate (FRR). This inaccuracy illustrated by statistical rates would have potential implications regarding the level of security provided by a biometric system. Doddington et al. (1998) assign users into four categories: • Sheep: users who are recognized easily (contribute to a low FRR), • Lambs: users who are easy to imitate (contribute to a high FAR), • Goats: users who are difficult to recognize (contribute to a high FRR), and • Wolves: users who have the capability to spoof the biometric characteristics of other users (contribute to a high FAR). 175 Keystroke Dynamics Authentication 20 Will-be-set-by-IN-TECH A poor biometric in term of performance, may be easily attacked by lambs, goats and wolves users. There is no reference to this user classification in the keystroke dynamics literature. Therefore, it is important to take into consideration system performance within the security evaluation process. The Half Total Error Rate (HTER) may be used as an illustration of system overall performance. It is defined as the mean of both error rates FAR and FRR: HTER = FAR + FRR 2 (14) 10) Quality limitations during enrollment The quality of the acquired biometric samples is considered as an important factor during the enrollment process. The absence of a quality test increases the possibility of enrolling authorized users with weak templates. Such templates increase the probability of success of zero-effort impostor, hill-climbing and brute force (Martinez-Diaz et al., 2006) attempts. Therefore, it is important to integrate such information within the security evaluation process. In order to integrate such information, a set of rules is presented in (El-Abed et al., 2011). According to the International Organization for Standardization ISO/IEC FCD 19792 (2008), the security evaluation of biometric systems is generally divided into two complementary assessments: 1. Assessment of the biometric system (devices and algorithms), and 2. Assessment of the environmental (for example, is the system is used indoor or outdoor?) and operational conditions (for example, tasks done by system administrators to ensure that the claimed identities during enrollment of the users are valid). A type-1 security assessment of a keystroke dynamics system (Giot et al., 2009a) is presented in El-Abed et al. (2011). The presented method is based on the use of a database of common threats and vulnerabilities of biometric systems, and the notion of risk factor. A risk factor, for each identified threat and vulnerability, is considered as an indicator of its importance. It is calculated using three predefined criteria (effectiveness, easiness and cheapness) and is defined between 0 and 1000. More the risk factor is near 0, better is the robustness of the Target of Evaluation (ToE). Figure 7 summarizes the security assessment of the TOE, which illustrates the risk factors of the identified threats and system overall vulnerabilities among the ten assessment points (the maximal risk factor is retained from each point). 5.4 Discussion The evaluation of keystroke dynamics modality are very few in comparison to other types of modalities (such as fingerprint modality). As shown in section 5.1, there is only a few public databases that could be used to evaluate keystroke dynamics authentication systems. There is none competition neither existing platform to compare such behavioral modality. The results presented in the previous section show that the existing keystroke dynamics methods provide promising recognition rates, and such systems are well perceived and accepted by users. In our opinion, we believe that keystroke dynamics systems belong to the possible candidates that may be implemented in an Automated Teller Machine (ATM), and can be widely used for e-commerce applications. 176 Biometrics Keystroke Dynamics Authentication 21 Fig. 7. Satisfaction (on the left) and security (on the right) assessment of a keystroke dynamics based system. 6. Conclusion and future trends We have presented in this chapter an overview of keystroke dynamics literature. More information on the subject can be found in various overviews: Revett (2008, chapter 4) deeply presents some studies. We believe that the future of the keystroke dynamics is no more on desktop application, whereas it is the most studied in the literature, but in the mobile and internet worlds, because mobile phones are more popular than computers and its use is very democratized. They are more and more powerful every year (in terms of calculation and memory) and embeds interesting sensors (pressure information with tactile phones). Mobile phone owners are used to use various applications on their mobile and they will probably agree to lock them with a keystroke dynamics biometric method. Nowadays, more applications are available in a web browser. These applications use the classical couple of login and password to verify the identity of a user. Integrating them a keystroke dynamics verification would harden the authentication process. In order to spread the keystroke modality, it is necessary to solve various problems related to: • The cross devices problem. We daily use several computers which can have different keyboards on timing resolution. These variability must not have an impact on the recognition performances. Users tend to change often their mobile phone. In an online authentication scheme (were the template is stored on a server), it could be useful to not re-enroll the user on its new mobile phone. • The aging of the biometric data. Keystroke dynamics, is subject to a lot of intra class variability. One of the main reasons is related to the problem of template aging: performances degrade with time because user (or impostors) type differently with time. 7. Acknowledgment The authors would like to thank the Lower Normandy Region and the French Research Ministry for their financial support of this work. 8. References Ahmed, A. & Traore, I. (2008). Handbook of Research on Social and Organizational Liabilities in Information Security, Idea Group Publishing, chapter Employee Surveillance based on Free Text Detection of Keystroke Dynamics, pp. 47–63. 177 Keystroke Dynamics Authentication 22 Will-be-set-by-IN-TECH Allen, J. D. (2010). An analysis of pressure-based keystroke dynamics algorithms, Master’s thesis, Southern Methodist University, Dallas, TX. Araujo, L., Sucupira, L.H.R., J., Lizarraga, M., Ling, L. & Yabu-Uti, J. (2005). User authentication through typing biometrics features, IEEE Transactions on Signal Processing 53(2 Part 2): 851–855. Azevedo, G., Cavalcanti, G., Carvalho Filho, E. & Recife-PE, B. (2007). An approach to feature selection for keystroke dynamics systems based on pso and feature weighting, Evolutionary Computation, 2007. CEC 2007. IEEE Congress on. Balagani, K. S., Phoha, V. V., Ray, A. & Phoha, S. (2011). On the discriminability of keystroke feature vectors used in fixed text keystroke authentication, Pattern Recognition Letters 32(7): 1070 – 1080. Bartmann, D., Bakdi, I. & Achatz, M. (2007). On the design of an authentication system based on keystroke dynamics using a predefined input text, Techniques and Applications for Advanced Information Privacy and Security: Emerging Organizational, Ethical, and Human Issues 1(2): 149. Bello, L., Bertacchini, M., Benitez, C., Carlos, J., Pizzoni & Cipriano, M. (2010). Collection and publication of a fixed text keystroke dynamics dataset, XVI Congreso Argentino de Ciencias de la Computacion (CACIC 2010). Bergadano, F., Gunetti, D. & Picardi, C. (2002). User authentication through keystroke dynamics, ACM Transactions on Information and System Security (TISSEC) 5(4): 367–397. Bleha, S. & Obaidat, M. (1991). Dimensionality reduction and feature extraction applications inidentifying computer users, IEEE transactions on systems, man and cybernetics 21(2): 452–456. Bleha, S., Slivinsky, C. & Hussien, B. (1990). Computer-access security systems using keystroke dynamics, IEEE Transactions On Pattern Analysis And Machine Intelligence 12 (12): 1216–1222. Boechat, G., Ferreira, J. & Carvalho, E. (2006). Using the keystrokes dynamic for systems of personal security, Proceedings of World Academy of Science, Engineering and Technology, Vol. 18, pp. 200–205. Campisi, P., Maiorana, E., Lo Bosco, M. & Neri, A. (2009). User authentication using keystroke dynamics for cellular phones, Signal Processing, IET 3(4): 333 –341. Chang, W. (2006a). Keystroke biometric system using wavelets, ICB 2006, Springer, pp. 647–653. Chang, W. (2006b). Reliable keystroke biometric systembased on a small number of keystroke samples, Lecture Notes in Computer Science 3995: 312. Chen, Y.-W. & Lin, C.-J. (2005). Combining svms with various feature selection strategies, Technical report, Department of Computer Science, National Taiwan University, Taipei 106, Taiwan. Cherifi, F., Hemery, B., Giot, R., Pasquet, M. & Rosenberger, C. (2009). Behavioral Biometrics for Human Identification: Intelligent Applications, IGI Global, chapter Performance Evaluation Of Behavioral Biometric Systems, pp. 57–74. Cho, S. & Hwang, S. (2006). Artificial rhythms and cues for keystroke dynamics based authentication, In International Conference on Biometrics (ICB), pp. 626–632. Clarke, N. & Furnell, S. (2006). Advanced user authentication for mobile devices, computers & security 27: 109–119. 178 Biometrics Keystroke Dynamics Authentication 23 Clarke, N. L. & Furnell, S. M. (2007). Authenticating mobile phone users using keystroke analysis, International Journal of Information Security 6: 1–14. Conklin, A., Dietrich, G. & Walz, D. (2004). Password-based authentication: A system perspective, Proceedings of the 37th Hawaii International Conference on System Sciences, Hawaii. Crawford, H. (n.d.). Keystroke dynamics: Characteristics and opportunities, Privacy Security and Trust (PST), 2010 Eighth Annual International Conference on, IEEE, pp. 205–212. de Magalhaes, T., Revett, K. & Santos, H. (2005). Password secured sites: stepping forward with keystroke dynamics, International Conference on Next Generation Web Services Practices. de Ru, W. G. & Eloff, J. H. P. (1997). Enhanced password authentication through fuzzy logic, IEEE Expert: Intelligent Systems and Their Applications 12: 38–45. Doddington, G., Liggett, W., Martin, A., Przybocki, M. & Reynolds, D. (1998). Sheep, goats, lambs and wolves: A statistical analysis of speaker performance in the nist 1998 speaker recognition evaluation, ICSLP98. Dozono, H., Itou, S. & Nakakuni, M. (2007). Comparison of the adaptive authentication systems for behavior biometrics using the variations of self organizing maps, International Journal of Computers and Communications 1(4): 108–116. El-Abed, M., Giot, R., Hemery, B. & Rosenberger, C. (2010). A study of users’ acceptance and satisfaction of biometric systems, 44th IEEE International Carnahan Conference on Security Technology (ICCST). El-Abed, M., Giot, R., Hemery, B., Shwartzmann, J.-J. & Rosenberger, C. (2011). Towards the security evaluation of biometric authentication systems, IEEE International Conference on Security Science and Technology (ICSST). Eltahir, W., Salami, M., Ismail, A. & Lai, W. (2008). Design and Evaluation of a Pressure-Based Typing Biometric Authentication System, EURASIP Journal on Information Security, Article ID 345047(2008): 14. Epp, C. (2010). Identifying emotional states through keystroke dynamics, Master’s thesis, University of Saskatchewan, Saskatoon, CANADA. Filho, J. R. M. & Freire, E. O. (2006). On the equalization of keystroke timing histograms, Pattern Recognition Letters 27: 1440–1446. Gaines, R., Lisowski, W., Press, S. & Shapiro, N. (1980). Authentication by keystroke timing: some preliminary results, Technical report, Rand Corporation. Galassi, U., Giordana, A., Julien, C. & Saitta, L. (2007). Modeling temporal behavior via structured hidden markov models: An application to keystroking dynamics, Proceedings 3rd Indian International Conference on Artificial Intelligence (Pune, India). Giot, R., El-Abed, M. & Chri (2011). Unconstrained keystroke dynamics authentication with shared secret, Computers & Security pp. 1–20. [in print]. Giot, R., El-Abed, M. & Rosenberger, C. (2009a). Greyc keystroke: a benchmark for keystroke dynamics biometric systems, IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2009), IEEE Computer Society, Washington, District of Columbia, USA, pp. 1–6. Giot, R., El-Abed, M. & Rosenberger, C. (2009b). Keystroke dynamics authentication for collaborative systems, International Symposium on Collaborative Technologies and Systems, pp. 172–179. Giot, R., El-Abed, M. & Rosenberger, C. (2009c). Keystroke dynamics with low constraints svm based passphrase enrollment, IEEE International Conference on Biometrics: Theory, 179 Keystroke Dynamics Authentication 24 Will-be-set-by-IN-TECH Applications and Systems (BTAS 2009), IEEE Computer Society, Washington, District of Columbia, USA, pp. 1–6. Giot, R., El-Abed, M. & Rosenberger, C. (2010). Fast learning for multibiometrics systems using genetic algorithms, The International Conference on High Performance Computing & Simulation (HPCS 2010), IEEE Computer Society, Caen, France, pp. 1–8. Giot, R., Hemery, B. & Rosenberger, C. (2010). Low cost and usable multimodal biometric system based on keystroke dynamicsand 2d face recognition, IAPR International Conference on Pattern Recognition (ICPR), IAPR, Istanbul, Turkey, pp. 1128–1131. Acecptance rate: 54/100. Giot, R. & Rosenberger, C. (2011). A new soft biometric approach for keystroke dynamics based on gender recognition, Int. J. of Information Technology and Management (IJITM), Special Issue on: "Advances and Trends in Biometric pp. 1–17. [in print]. Grabham, N. & White, N. (2008). Use of a novel keypad biometric for enhanced user identity verification, Instrumentation and Measurement Technology Conference Proceedings, 2008. IMTC 2008. IEEE, pp. 12–16. Guven, A. & Sogukpinar, I. (2003). Understanding users’ keystroke patterns for computer access security, Computers & Security 22(8): 695–706. Hocquet, S., Ramel, J.-Y. & Cardot, H. (2006). Estimation of user specific parameters in one-class problems, ICPR ’06: Proceedings of the 18th International Conference on Pattern Recognition, IEEE Computer Society, Washington, DC, USA, pp. 449–452. Hocquet, S., Ramel, J.-Y. & Cardot, H. (2007). User classification for keystroke dynamics authentication, The Sixth International Conference on Biometrics (ICB2007), pp. 531–539. Hosseinzadeh, D. & Krishnan, S. (2008). Gaussian mixture modeling of keystroke patterns for biometric applications, Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 38(6): 816–826. Hwang, S.-s., Lee, H.-j. & Cho, S. (2006). Improving authentication accuracy of unfamiliar passwords with pauses and cues for keystroke dynamics-based authentication, Intelligence and Security Informatics 3917: 73–78. Ilonen, J. (2003). Keystroke dynamics, Advanced Topics in Information Processing–Lecture . ISO/IEC19795-1 (2006). Information technology biometric performance testing and reporting, Technical report, International Organization for Standardization ISO/IEC 19795-1. ISO/IECFCD19792 (2008). Information technology – security techniques –security evaluation of biometrics, Technical report, International Organization for Standardization ISO/IEC FCD 19792. Janakiraman, R. & Sim, T. (2007). Keystroke dynamics in a general setting, Lecture notes in computer science 4642: 584. Kang, P. & Cho, S. (2009). A hybrid novelty score and its use in keystroke dynamics-based user authentication, Pattern Recognition p. 30. Karnan, M., Akila, M. & Krishnaraj, N. (2011). Biometric personal authentication using keystroke dynamics: A review, Applied Soft Computing 11(2): 1565 – 1573. The Impact of Soft Computing for the Progress of Artificial Intelligence. Khanna, P. & Sasikumar, M. (2010). Recognising Emotions from Keyboard Stroke Pattern, International Journal of Computer Applications IJCA 11(9): 24–28. Killourhy, K. & Maxion, R. (2008). The effect of clock resolution on keystroke dynamics, Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection, Springer, pp. 331–350. 180 Biometrics Keystroke Dynamics Authentication 25 Killourhy, K. & Maxion, R. (2009). Comparing anomaly-detection algorithms for keystroke dynamics, IEEE/IFIP International Conference on Dependable Systems & Networks, 2009. DSN’09, pp. 125–134. Killourhy, K. & Maxion, R. (2010). Keystroke biometrics with number-pad input, IEEE/IFIP International Conference on Dependable Systems & Networks, 2010. DSN’10. Kohonen, T. (1995). Self-organising maps, Springer Series in Information Sciences 30. Lopatka, M. & Peetz, M. (2009). Vibration sensitive keystroke analysis, Proceedings of the 18th Annual Belgian-Dutch Conference on Machine Learning, pp. 75–80. Marsters, J.-D. (2009). Keystroke Dynamics as a Biometric, PhD thesis, University of Southampton. Martinez-Diaz, M., Fierrez-Aguilar, J., Alonso-Fernandez, F., Ortega-Garcia, J. & Siguenza, J. (2006). Hill-climbing and brute force attacks on biometric systems: a case study in match-on-card fingerprint verification, Proceedings of the IEEE of International Carnahan Conference on Security Technology (ICCST). Modi, S. K. & Elliott, S. J. (2006). Kesytroke dynamics verification using spontaneously generated password, IEEE International Carnahan Conferences Security Technology. Monrose, F., Reiter, M. &Wetzel, S. (2002). Password hardening based on keystroke dynamics, International Journal of Information Security 1(2): 69–83. Monrose, F. & Rubin (1997). Authentication via keystroke dynamics, Proceedings of the 4th ACM conference on Computer and communications security, ACM Press New York, NY, USA, pp. 48–56. Monrose, F. & Rubin, A. (2000). Keystroke dynamics as a biometric for authentication, Future Generation Computer Syststems 16(4): 351–359. Montalvao Filho, J. & Freire, E. (2006). Multimodal biometric fusion–joint typist (keystroke) and speaker verification, Telecommunications Symposium, 2006 International, pp. 609–614. Nguyen, T., Le, T. & Le, B. (2010). Keystroke dynamics extraction by independent component analysis and bio-matrix for user authentication, in B.-T. Zhang & M. Orgun (eds), PRICAI 2010: Trends in Artificial Intelligence, Vol. 6230 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, pp. 477–486. Obaidat, M. & Sadoun, B. (1997). Verification of computer users using keystroke dynamics, Systems, Man and Cybernetics, Part B, IEEE Transactions on 27(2): 261–269. Pavaday., N., ., S. S. & Nugessur, S. (2010). Investigating & improving the reliability and repeatability of keystroke dynamics timers, International Journal of Network Security & Its Applications (IJNSA), 2(3): 70–85. Pohoa, V. v., Pohoa, S., Ray, A. & Joshi, S. S. (2009). Hidden markov model (hmm)-based user authentication using keystroke dynamics, patent. Rao, B. (2005). Continuous keystroke biometric system, Master’s thesis, University of California. Ratha, N. K., Connell, J. H. & Bolle, R. M. (2001). An analysis of minutiae matching strength, Audio- and Video-Based Biometric Person Authentication. Revett, K. (2008). Behavioral biometrics: a remote access approach, Wiley Publishing. Revett, K. (2009). A bioinformatics based approach to user authentication via keystroke dynamics, International Journal of Control, Automation and Systems 7(1): 7–15. Revett, K., de Magalhães, S. & Santos, H. (2006). Enhancing login security through the use of keystroke input dynamics, Lecture notes in computer science 3832. Revett, K., de Magalhaes, S. & Santos, H. (2007). On the use of rough sets for user authentication via keystroke dynamics, Lecture notes in computer science 4874: 145. 181 Keystroke Dynamics Authentication 26 Will-be-set-by-IN-TECH Rodrigues, R., Yared, G., do NCosta, C., Yabu-Uti, J., Violaro, F. & Ling, L. (2006). Biometric access control through numerical keyboards based on keystroke dynamics, Lecture notes in computer science 3832: 640. Rogers, S. J. & Brown, M. (1996). Method and apparatus for verification of a computer user’s identification, based on keystroke characteristics. US Patent 5,557,686. Ross, A. & Jain, A. (2004). Biometric sensor interoperability: A case study in fingerprints, Proc. of International ECCV Workshop on Biometric Authentication (BioAW), Springer, pp. 134–145. Ross, A., Nandakumar, K. & Jain, A. (2006). Handbook of Multibiometrics, Springer. Sang, Y., Shen, H. & Fan, P. (2004). Novel impostors detection in keystroke dynamics by support vector machine, Proc. of the 5th international conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2004). Schneier, B. (1999). Inside risks: the uses and abuses of biometrics, Commun. ACM . Song, D., Venable, P. & Perrig, A. (1997). User recognition by keystroke latency pattern analysis, Retrieved on 19. Spillane, R. (1975). Keyboard apparatus for personal identification. Stefan, D. & Yao, D. (2008). Keystroke dynamics authentication and human-behavior driven bot detection, Technical report, Technical report, Rutgers University. Teh, P., Teoh, A., Ong, T. & Neo, H. (2007). Statistical fusion approach on keystroke dynamics, Proceedings of the 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System-Volume 00, IEEE Computer Society, pp. 918–923. Theofanos, M., Stanton, B. & Wolfson, C. A. (2008). Usability & biometrics: Ensuring successful biometric systems, Technical report, The National Institute of Standards and Technology (NIST). Umphress, D. & Williams, G. (1985). Identity verification through keyboard characteristics, Internat. J. Manâ ˘ A¸ SMachine Studies 23: 263–273. Yu, E. & Cho, S. (2004). Keystroke dynamics identity verification – its problems and practical solutions, Computers & Security 23(5): 428–440. 182 Biometrics 0 DWT Domain On-Line Signature Verification Isao Nakanishi, Shouta Koike, Yoshio Itoh and Shigang Li Tottori University Japan 1. Introduction Biometrics attracts attention since person authentication becomes very important in networked society. As the biometrics, the fingerprint, iris, face, ear, vein, gate, voice and signature are well known and are used in various applications (Jain et al., 1999; James et al., 2005). Especially, assuming mobile access using a portable terminal such as a personal digital assistant (PDA), a camera, microphone, and pen-tablet are normally equipped; therefore, authentication using the face, voice and/or signature can be realized with no additional sensor. Fig. 1. A PDA with a pen-tablet On the other hand, the safety of biometric data is discussed actively. Every human being has limited biometrics, for example, only ten fingerprints and one face. If the biometric data are leaked out and it is known whose they are, they are never used for authentication again. To deal with this problem, cancelable biometric techniques have been proposed, which use not biometric data directly but one-to-one transformed data from the biometric data. However, such a technique is unnecessary if the biometrics itself is cancelable. Among various biometric modalities, only the signature is cancelable from a viewpoint of spoofing. Even if a signature shape is known by others, it is possible to cope with the problem by changing the shape. Especially, in on-line signatures, the habit during writing is biometrics and it is not remained in the signature shape; therefore, to imitate it is quite difficult even if the signature shape is copied. As a result, the on-line signature verification is actively researched 9 2 Biometrics (Dimauro et al., 2004; Fierrez & Ortega-Garcia, 2007; Jain et al., 2002; Plamondon & Srihari, 2000). However, the verification performance tends to be degraded since the on-line signature is a dynamic trait. We have proposed a new on-line signature verification method in which a pen-position parameter is decomposed into sub-band signals using the discrete wavelet transform (DWT) and total decision is done by fusing verification results in sub-bands (Nakanishi et al., 2003; 2004; 2005). The reason why we use only the pen-position parameter is that detecting functions of other parameters such as pen-pressure, pen-altitude, and/or pen-direction are not equipped in the PDA. However, since the signature shape is visible, it is relatively easy to forge the pen-position parameter by tracing genuine signatures by others. In the proposed method, individual features of a signature are enhanced and extracted in the sub-band signals, so that such well-forged signatures can be distinguished from genuine ones. Additionally, in the verification process of the proposed method, dynamic programming (DP) matching is adopted to make it possible to verify two data series with different number of sampled points. The purpose of the DP matching is to find the best combination between such two data series. Concretely, a DP distance is calculated in every possible combination of the two data series and as a result the combination which has the smallest DP distance is regarded as the best. But there are problems in use of the DP matching. The DP distance is obtained as dissimilarity; therefore, signatures with large DP distances are rejected even if they are of genuine. For instance, in a pen-tablet system, a pen-up while writing causes large differences in coordinate values of pen-position and so increases false rejection. On the other hand, signatures with small DP distances are accepted even if they are forgery. The DP matching forces to match two signatures even if either is forgery. It increases false acceptance. Consequently, we propose simply-partitioned DP matching. Two data series compared are divided into several partitions and the DP distance is calculated every partition. The DP distance is initialized at the start of a next partition, so that it reduces excessively large DP distances, that is, the false rejection. On the other hand, limitation of combination in matching is effective for rejecting forgeries; therefore, it reduces the false acceptance. There is another important problem when we use the DP matching. The DP distance is proportional to the number of signature’s sampled data, that is, signature complexity (shape), so that if it is used as a criterion in verification, each signature (user) has a different optimal threshold. But, it is general to use a single threshold commonly in an authentication system. If the common threshold is used for all signatures, it results in degradation of verification performance. Therefore, we have studied threshold equalization in the on-line signature verification (Nakanishi et al., 2008). We propose new equalizing methods based on linear and nonlinear approximation between the number of sampled data and optimal thresholds. 2. DWT domain on-line signature verification In this section, we briefly explain the proposed on-line signature verification in the DWT domain. 2.1 System overview A signal flow diagram is shown in Fig. 2. An on-line signature is captured as x and y coordinate (pen-position) data in a digital pen-tablet system and their sampled data are 184 Biometrics DWT Domain On-Line Signature Verification 3 Fig. 2. DWT domain on-line signature verification given by x(n) and y(n) where n = 0, 1, · · · , S n − 1 and S n is the number of sampled data. They are respectively normalized in both time and amplitude domains and then decomposed into sub-band signals using sub-band filters by the DWT (Nakanishi et al., 2003; 2004; 2005). In advance of verification, sub-band signals are enrolled as a template for each user. Templates are generated by ensemble-averaging several genuine signatures. Please refer to Ref. (Nakanishi et al., 2003) for the details. At the verification stage, each decomposedsignal is compared with its template based on the DP matching and a DP distance is obtained at each decomposed level. Final score is calculated by combing the DP distances at appropriate sub-bands in both coordinates. Total decision is done by comparing the final score with a threshold and it is verified whether the signature data are of genuine or not. 2.2 Feature extraction by DWT In the following, the x(n) and y(n) are represented as v(n) together for convenience. The DWT of the pen-position data: v(n) is defined as u k (n) = ∑ m v(m)Ψ k,n (m) (1) where Ψ k,n (m) is a wavelet function and · denotes the conjugate. k is a frequency (level) index. It is well known that the DWT corresponds to an octave-band filter bank (Strang & Nguyen, 1997) of which parallel structure and frequency characteristics are shown in Fig. 3, where (↓ 2 k ) and (↑ 2 k ) are down-sampling and up-sampling, respectively. M is the maximum level of the sub-band, that is, the decomposition level. A k (z) and S k (z) (k = 1, · · · , M) are analysis filters and synthesis ones, respectively. The synthesized signal: v k (n) in each sub-band is the signal in higher frequency band and called Detail which corresponds to the difference between signals. Therefore, we adopt the Detail as an enhanced individual feature which can be extracted with no specialized function: pen-pressure, pen-altitude, and/or pen-direction which are not equipped in the PDA. 185 DWT Domain On-Line Signature Verification 4 Biometrics (a) Parallel structure (b) Frequency characteristics Fig. 3. Sub-band decomposition by DWT Let us get another perspective on the effect of the sub-band decomposition using Fig. 4. Each signature is digitized at equal (common) sampling period using a pen-tablet system. In the proposed system, writing time of all signatures is normalized in order to suppress intra-class variation. Concretely, the sampling period of each signature is divided by the number of sampled data and so becomes real-valued. Even genuine signatures have different number of sampled data; therefore, all signatures have different normalized sampling periods, that is, different sampling frequencies. In general, variation of writing time in the genuine signatures is small, so that their sampling periods (frequencies) are comparable as shown in Fig. 4 (a). On the other hand, in the case of forged signatures, the variation of writing time is relatively large since it is not easy for forgers to imitate writing speed and rhythm of genuine signatures. Thus, sampling periods (frequencies) of the forged signatures become greatly different from those of the genuine signatures as in Fig. 4 (b). The maximum frequency: f m of the octave-band filter bank is determined by the sampling frequency based on the “sampling theory". If the sampling frequencies are greatly different, each octave band (decomposition level) includes greatly different frequency elements as illustrated in (b). In other words, even if levels compared are the same, frequency elements included in one level are different from the other, so that the differences between genuine signatures and forged ones are accentuated. 186 Biometrics DWT Domain On-Line Signature Verification 5 (a) Comparison of genuine signatures (b) Comparison of a genuine signature with its forgery Fig. 4. Effect of sub-band decomposition The advantage was confirmed comparing with a time-domain method (Nakanishi et al., 2005). Of course, if forgers imitate writing speed and rhythm of genuine signatures, it is impossible for the proposed method to distinguish forged signatures from genuine ones. 2.3 Verification by DP matching Since on-line signatures have large intra-class variation, one-to-one matching cannot be applied in verification. In order to deal with the problem, the verification was performed every stroke (intra-stroke or inter-stroke) in the conventional system (Nakanishi et al., 2003; 2004; 2005; 2008). However, a part of signature databases eliminates the data in inter-strokes and so we could not apply the conventional system to such a database. Therefore, we introduce DP matching into the verification process. The DP matching is effective in finding the best combination between two data series even if they have different number as illustrated in Fig. 5. Letting the two data series be a(i) (i = 0, 1, · · · , I −1) and b(j) (j = 0, 1, · · · , J −1), the local distance at kth is defined as d(k) = |a(i) k − b(j) k | (k = 0, 1, · · · , K −1) (2) 187 DWT Domain On-Line Signature Verification 6 Biometrics Fig. 5. DP matching where instead of i and j, k is used as another time index since these data are permitted to be referred redundantly. By accumulating the local distances in one possible combination between the two series, a DP distance is given by D(a, b) = K−1 ∑ k=0 w(k)d(k) (3) where w(k) is a weighting factor. After calculating the DP distance in all possible combinations, we can find the best combination by searching the combination with the smallest DP distance. Moreover, since the DP distance depends on the number of sampled data, the normalized DP distance is used in general. nD(a, b) = D(a, b)/ K−1 ∑ k=0 w(k) (4) Assuming the weight is symmetric: (1-2-1) and the initial value is zero, K−1 ∑ k=0 w(k) = I + J. (5) In the proposed method, the DP distance is obtained at each sub-band level. Let the DP distance at lth level be D(v, v t ) l where v is sampled data series of a signature for verification and v t is that of a template, the normalized DP distance is given by nD(v, v t ) l = D(v, v t ) l /(V n + T n ) (6) where V n is the number of sampled data in the verification signature and T n is that of the template. 188 Biometrics DWT Domain On-Line Signature Verification 7 A total distance (TD) is obtained by accumulating the normalized DP distances in sub-bands. TD = c x · 1 L M ∑ l=M−L+1 nD(x, x t ) l + c y · 1 L M ∑ l=M−L+1 nD(y, y t ) l (7) where c x and c y are weights for combining the DP distances in x and y coordinates and c x + c y = 1, c x > 0, c y > 0. L is the number of levels used in the total decision. 2.4 Verification experiments In order to confirm verification performance of the proposed system, we carried out experiments in the following conditions. The wavelet function was Daubechies8. The maximum level of the sub-band: M was 8 and the number of levels used in the total decision: L was 4. The combination weights were c x = c y = 0.5, which mean to take the average. For generating templates, data of five genuine signatures were ensemble-averaged. We used a part of the on-line signature database: SVC2004 in which the data in inter-strokes were eliminated. The number of subjects was 40 and 17 subjects signed their names in Chinese characters and the rest in alphabetical ones. For collecting skilled forgeries, imposters could see how genuine signatures were being written. The total number of signatures was 1600. Please refer to Ref. (SVC2004, 2004) for more information. The verification performance was evaluated by using an equal error rate (EER) where a false rejection rate (FRR) was equal to a false acceptance rate (FAR). The EER of the proposed system was 20.0 %. For reference, the EER of the conventional system was 28.3 %, so that it is confirmed that introducing the DP matching is effective for not only applying to the standard database but also improving verification performance. On the other hand, assuming to use individually-optimal thresholds for all subjects, we averaged EERs of all subjects and then so obtained EER of 15.3 %. This is a rough evaluation but suggests that if a single common threshold is optimal for all subjects, the verification performance could be improved further. This issue is examined in Sect. 4. 3. Simply partitioned DP matching There is another issue to be overcome in order to improve the verification performance. For instance, in a pen-tablet system, when the pen tip is released from the surface of the tablet, it is not guaranteed to get precise coordinate values of pen-position. It sometimes brings large differences fromtemplate data and then leads to a large DP distance. The signature with such a large DP distance is rejected even if it is genuine. This increases false rejection. Conversely, signatures with small DP distances are accepted even if they are forgery. In particular, skilled forgeries (well-forged signatures) could make the DP distance smaller. The DP matching forces to match two signatures even if either is forged one. This increases false acceptance. Consequently, we propose simply-partitioned DP (spDP) matching. The concept is illustrated in Fig. 6, where the number of partitions is four. Both data series: a(i) and b(j) are divided into several partitions of the same integer number and a sub DP distance is calculated every partition. If the division leaves remainders, they are singly distributed to partitions. The sub DP distances are initialized at the start of next partitions and a total DP distance is obtained by summing the sub DP distances in all partitions. 189 DWT Domain On-Line Signature Verification 8 Biometrics Fig. 6. Simply-partitioned DP matching (Q=4) Assuming that genuine signatures have equivalent rhythms in writing, even if their data are partitioned, rhythms in corresponding partitions compared are still equivalent. Therefore, when a verification signature is genuine, appropriate matching pairs tend to exist in diagonal direction in Fig. 6. As a result, the spDP matching has no ill effect for false rejection. Furthermore, even if excessively a large sub DP distance is caused by the irregular pen-up mentioned above in a partition, it is initialized at the start of the next partition, so that the spDP matching prevents the total DP distance from becoming excessively large and has an effect on reducing false rejection. On the other hand, it is difficult for forgers to copy rhythms in writing of genuine signatures, so that the rhythm in each partition of forged signatures becomes different from that of genuine ones. Resultingly, matching pairs between the genuine signature and its forged one are not in the diagonal direction and so are excluded even if they have small DP distances. The spDP matching is also effective in reducing false acceptance. Such a concept that inappropriate pairs are excluded by partitioning the DP distance has been already proposed (Sano et al., 2007; Yoshimura & Yoshimura, 1998) but they assume to write Chinese (Kanji) characters in standard style and the partitioning is done every character or stroke. Therefore, they could not be directly applied to the case of a cursive style (connected characters). Of course, they need additional processing for character or stroke detection. Let the number of partitions and the sub DP distance be Q and D(v, v t ) q , respectively, the normalized DP distance at the sub-band level: l is obtained by summing sub DP distances in all partitions. nD(v, v t ) l = Q ∑ q=1 D(v, v t ) q /(V n + T n ) (8) A total distance (TD) is given by Eq. (7). 190 Biometrics DWT Domain On-Line Signature Verification 9 By the way, the matching window is generally adopted in the DP matching as shown in Fig. 5 in order to reduce calculation amount by excluding unlikely pairs. Comparing between Figs. 5 and 6, it is clear that the spDP matching is more effective for excluding inappropriate pairs than the matching window. 3.1 Evaluation of spDP matching We evaluated verification performance using the spDP matching. Conditions are similar with those in Sect. 2.4. EERs in various numbers of partitions are summarized in Table 1 where the case of 0 partitions corresponds to the conventional normalized DP matching. Number of Partitions 0 2 3 4 5 6 EER (%) 20.0 17.8 16.4 16.6 17.0 16.4 Table 1. EERs in various numbers of partitions From these results, it is confirmed that the spDP matching decreased the EER by 2-3%. In the following, we set the number of partitions at 4. 4. Threshold equalizing There is an important issue to be overcome as mentioned in Sect. 2.4 in order to improve verification performance. In not only on-line signature verification but also all biometric authentication systems, final scores are compared with a threshold which is preliminary determined. In addition, the threshold should be common to all users. Therefore, when the final score (the DP distance) of each user is greatly different from those of others, the verification performance tends to be degraded by using the common threshold. In general, the normalized DP distance given by Eq. (4) is used for dealing with this problem. However, the normalization also makes the DP distances of forged signatures small and thereby might increases false acceptance. We have studied to equalize the threshold instead of using the normalization (Nakanishi et al., 2008). A total distance (TD) is rewritten as TD = c x · 1 L M ∑ l=M−L+1 D(x, x t ) l + c y · 1 L M ∑ l=M−L+1 D(y, y t ) l (9) where please be aware that unnormalized DP distance D(v, v t ) l is used. Generally, complex signatures have large number of sampled data since they consume relatively long time for writing. The larger the number of sampled data of a signature becomes, the larger intra-class variation becomes and as a result, it makes a DP distance large. Final decision is achieved by comparing the DP distance with a threshold; therefore, to make the DP distance inversely proportional to the number of sampled data suppresses the variation range of the DP distance and then it leads to equalization of thresholds. Based on this concept, the conventional equalization is defined as TD p eq = γ T p n TD p (10) where p is user number and TD p , TD p eq and T p n are the total distance, the equalized total (final) distance and the number of sampled data of the template of the user, respectively. γ is a 191 DWT Domain On-Line Signature Verification 10 Biometrics constant for adjusting the final distance to an appropriate value. When the number of sampled data in a signature is too small, the final distance of the signature is enlarged. Conversely, large number of sampled data in a signature reduces the final distance. The effect of the threshold equalizing was already confirmed (Nakanishi et al., 2008). 4.1 New threshold equalizing methods Figure 7 shows the relation between the number of sampled data in signatures (templates) and their optimal thresholds (total DP distances) using the spDP matching (Q = 4) in SVC2004, where the thresholds which bring EERs are regarded as optimal. Fig. 7. Relation between the number of sampled data and optimal thresholds The optimal thresholds are widely distributed; therefore, it is easy to guess that common use of a single threshold is not good for verification performance. In addition, the relation between the number of sampled data and the optimal threshold is not simple differently from that assumed in the conventional equalization. 4.1.1 Equalization using linear approximation Assuming that the relation between the number of sampled data and the optimal threshold is approximated by a linear function, the total DP distance is equalized as TD p eq = γ α · T p n + β TD p (11) where γ is the adjustment constant as well as the conventional method. α and β are the gradient and intercept of the linear function. 4.1.2 Equalization using nonlinear approximation On the other hand, the relation between the number of sampled data and the optimal threshold could be fitted by a nonlinear function. The total DP distance is adjusted by using an exponential function as TD p eq = γ exp(α · T p n + β) TD p (12) 192 Biometrics DWT Domain On-Line Signature Verification 11 where α and β are constants for fitting the nonlinear function to the relation between the number of sampled data and the optimal threshold. 4.2 Evaluation of threshold equalizing In order to verify effectiveness of the threshold equalizing methods, we evaluated verification performance using the SVC2004, again. Conditions are the same as those in Sect. 2.4. The number of partitions in DP matching was 4. The distribution of optimal thresholds after equalization is compared with that before equalization in Fig. 8 where the uncolored triangles are before equalization and the black ones are after equalization. The broken lines indicate approximation functions where α = 10 and β = −293 in the linear case and α = 0.0069 and β = 3.3 in the nonlinear case. (a) Linear case (b) Nonlinear case Fig. 8. Distribution of optimal thresholds before and after equalization in the linear and nonlinear approximation cases From a viewpoint of their universality, it is better to determine them using a training data set, which is independent of a test data set. However, the proposed equalizing methods are based on rough approximation of the relation between the number of sampled data and the optimal threshold in the SVC2004. If the relation in the training data set is equivalent with that in the 193 DWT Domain On-Line Signature Verification 12 Biometrics test data set, the proposed methods does not depend on the data used. The approximation depends on not training data sets but databases. The larger the number of data becomes, the more universal the constants. In both cases, γ was set to a value which adjusts the thresholds to around 2000. It is confirmed that the optimal thresholds, that is, the DP distances were adjusted to around 2000 and the variation range of the DP distances was narrowed. For achieving quantitative evaluation, we analyzed statistical variance of optimal threshold values before and after the equalization. The variance before the equalization was 0.27 but after the equalization it was reduced to 0.05 in the linear case and 0.07 in the nonlinear case. Method EER(%) Variance Unnormalized DP 25.4 0.54 Normalized DP 20.0 0.17 4-partitioned DP 16.6 0.05 Conventional equalization 19.9 0.24 Linear equalization 19.0 0.22 Nonlinear equalization 19.5 0.23 4-partitioned DP + Linear equalization 14.6 0.05 4-partitioned DP + Nonlinear equalization 14.9 0.07 Table 2. EERs and variances in various methods Finally, EERs and variances in various methods are summarized in Table 2. Comparing the EER and variance in the 4-partitioned DP matching to those in the normalized DP one, it is confirmed that the proposed spDP matching is more effective in improving verification performance than the general-used DP matching. Similarly, the proposed new threshold equalization methods are confirmed to be more efficient than the normalizedDP matching and the conventional method. Moreover, combining the spDP matching with the new threshold equalization is much more effective. Especially, the smallest EER of 14.6% and variance of 0.05 were obtained when the threshold equalization using the linear approximation was applied. As confirmed in Fig. 8 (b), the adjustment in the nonlinear case might be excessive when the number of sampled data was large. It is a future problem to adopt other functions for approximating the relation between the number of sampled data and the optimal threshold. On the other hand, the EER of about 15% may not be absolutely superior to those of other on-line signature verification methods. However, it is possible to introduce the spDP matching and/or the threshold equalizing into the methods based on the DP matching and it might also improve their performance. 5. Conclusions We have studied on-line signature verification in the DWT domain. In order to improve the verification performance, we introduced spDP matching and threshold equalizing into the verification process. In the spDP matching, two data series compared were divided into partitions, a sub DP distance was calculated every partition, and then a total DP distance was obtained by summing the sub DP distances. The sub DP distances were initialized at the start of next partitions; therefore, accumulative distances were also initialized and the total DP distance 194 Biometrics DWT Domain On-Line Signature Verification 13 was prevented from becoming excessively large. It was effective in reducing false rejection. Also, the spDP matching reducedfalse acceptance since limitation of combination in matching excluded inappropriate matching pairs. In the threshold equalizing, by approximating the relation between the number of sampled data in a signature and its optimal threshold by linear or nonlinear functions, the variation range of optimal thresholds of all signatures were suppressed and as a result, it prevented the verification performance from being degraded by using a single common threshold for all signatures. In experiments using a part of the signature database: SVC2004, it was confirmed that each proposed method was efficient in improving the verification performance. Moreover, combining the spDP matching with the threshold equalizing was more effective and reduced the error rate by about 5% comparing with the general-used DP matching. We have an issue that there might be more effective approximate functions for threshold equalization. Also, we evaluated signature’s complexity by using the number of sampled data but it is expected to use sub-band signals for evaluating the complexity. 6. References Dimauro, G.; Impedovo, S.; Lucchese, M. G.; Modugno, R. & Pirlo, G. (2004). Recent Advancements in Automatic Signature Verification, Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, Oct. 2004, pp. 179-184 Fierrez, J. &Ortega-Garcia, J. (2007). On-Line Signature Verification, In: Handbook of Biometrics, A. K. Jain, P. Flynn, and A. A. Ross, (Eds.), Chapter 10, Springer, New York Jain, A. K.; Griess, F. D. & Connell, S. D. (2002). On-Line Signature Verification, Pattern Recognition, Vol. 35, No. 12, Dec. 2002, pp. 2963-2972 Jain, A.; Bolle, R. & Pankanti, S. (1999). BIOMETRICS, Kluwer Academic Publishers, Massachusetts Wayman, J.; Jain, A.; Maltoni, D. & Maio, D. (2005). Biometric Systems, Springer, London Nakanishi, I.; Nishiguchi, N.; Itoh, Y. & Fukui, Y. (2003). On-Line Signature Verification Method Based on Discrete Wavelet Transform and Adaptive Signal Processing, Proceedings of Workshop on Multimodal User Authentication, Santa Barbara, USA, Dec. 2003, pp.207-214 Nakanishi, I.; Nishiguchi, N.; Itoh, Y. & Fukui, Y. (2004). On-Line Signature Verification Based on Discrete Wavelet Domain Adaptive Signal Processing, Proceedings of International Conference on Biometric Authentication, Hongkong, Jul. 2004, pp. 584-591 Nakanishi, I.; Nishiguchi, N.; Itoh, Y. & Fukui, Y. (2005). On-Line Signature Verification Based on Subband Decomposition by DWT and Adaptive Signal Processing, Electronics and Communications in Japan, Part 3, Vol. 88, No.6, Jun. 2005, pp.1-11 Nakanishi, I.; Sakamoto, H.; Itoh, Y. & Fukui, Y. (2008). Threshold Equalization for On-Line Signature Verification, IEICE Trans. Fundamentals, Vol.E91-A, No.8, Aug. 2008, pp. 2244-2247 Plamondon, R. & Srihari, S. N. (2000). On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, Jan. 2000, pp. 63-84 195 DWT Domain On-Line Signature Verification 14 Biometrics Sano, T.; Wada, N.; Yoshida, T. & Hangai, S. (2007). A Study on Segmentation Scheme for DP Matching in Japanese Signature Verification (in Japanese), Proceedings of 2007 IEICE General Conference, Mar. 2007, p. B-18-4 Strang, G. & Nguyen, T. (1997). Wavelet and Filter Banks, Wellesley-Cambridge Press, Massachusetts. SVC2004. (2004). URL: http://www.cse.ust.hk/svc2004/index.html Yoshimura, M. & Yoshimura, I. (1998). An Off-Line Verification Method for Japanese Signatures Based on a Sequential Application of Dynamic Programming Matching Method (in Japanese), IEICE Trans. Information and Systems, Vol. J81-D-II, No. 10, Oct. 1998, pp. 2259-2266 196 Biometrics Part 3 Medical Biometrics 0 Heart Biometrics: Theory, Methods and Applications Foteini Agrafioti, Jiexin Gao and Dimitrios Hatzinakos The Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto 10 Kings College Road, Toronto Canada 1. Introduction Automatic and accurate identity validation is becoming increasingly critical in several aspects of our every day lives such as in financial transactions, access control, traveling, healthcare and other. Traditional strategies to automatic identity recognition include items such as PIN numbers, tokens, passwords and ID cards. Despite the wide deployment of such tactics, the authenticating means is either entity or knowledge-based which rises concerns with regard to their ease of acquisition and use from unauthorized third parties. According to the latest The US Federal Commission Report, Frebruary 2010 (n.d.), in 2009 identity theft was the number one complaint category ( a total of 721,418 cases of consumer complaints). As identity theft can take different forms, credit card fraud was the most prominent (17%), followed by falsification of government documents (16%), utilities fraud (15%), employment fraud (13%) and other. Among these cases, true-identity theft constitutes only a small portion of the complaints, while ID falsification appears to be the greatest threat. Unfortunately, the technology for forgery advances without analogous counterfeit improvements. Biometric recognition was introduced as a more secure means of identity establishment. Biometric features are characteristics of the human body that are unique for every individual and that can be used to establish his/her identity in a population. These characteristics can be either physiological or behavioral. For instance, the face, the iris and the fingerprints are physiological features of the body with identifying information. Examples of behavioral features include the keystroke dynamics, the gait and the voice. The fact that biometric features are directly linked with the users presents an extraordinary opportunity to bridge the security gaps caused by traditional recognition strategies. Biometric features are difficult to steal or counterfeit when compared to PIN numbers or passwords. In addition, the convenience by which a biometric feature can be presentedmakes the respective systems more accessible and easy to use. However, biometric recognition has a drawback that rises from the nature of the authenticating modality. As opposed to static PIN numbers or passwords, biometric recognition may present false rejection since usually no two readings of the same biometric feature are identical. Anatomical, psychological or even environmental factors affect the appearance of the biometric feature at a particular instance. For instance, faces may be presented to the recognizers under various expressions, different lighting settings or with 10 2 Will-be-set-by-IN-TECH occlusion (glasses, hats etc). This may introduce significant variability (commonly referred to as intra-subject or intra-class variability) and the challenge is to design algorithms that can find robust biometric patterns. Although the intra-subject variability is universal for all biometric modalities, every feature has unique characteristics. For instance, face pictures may be acquired from distance which makes them suitable for surveillance. On the opposite, fingerprints need direct contact with the sensing device, and despite the robust biometric signature that there exists, most of the error arises from inefficient processing of the image. Therefore, given a set of standards it is difficult, if not impossible, to choose one feature that satisfies all criteria. Every biometric feature has its own strengths and weaknesses and deployment choices are based on the characteristics of the envisioned application environment. On the assumption that intra-subject variability can be sufficiently addressed with appropriate feature extraction, another consideration with this technology is the robustness to circumvention and replay attacks. Circumvention is a form of biometric feature forgery, for example the use of falsified fingerprint credentials that were copied from a print of the original finger. A replay attack is the presentation to the system of the original biometric feature from an illegitimate subject, for example voice playbacks in speaker recognition systems. Biometric obfuscation is another prominent risk with this technology. There are cases where biometric features are intentionally removed to avoid establishment of the true identity (for example asylum-seekers in Europe Peter Allen, Calais migrants mutilate fingerprints to hide true identity, Daily Mail (n.d.)). With the wide deployment of biometrics, these attacks are becoming frequent and concerns are once again rising on the security levels that this technology can offer. Concentrated efforts have been made for the development the next generation of biometric characteristics that are inherently robust to the above mentioned attacks. Characteristics that are internal to the human body have been investigated such as vein patterns, the odor and cognitive biometrics. Similarly, the medical biometrics constitutes another category of new biometric recognition modalities that encompasses signals which are typically used in clinical diagnostics. Examples of medical biometric signals are the electrocardiogram (ECG), phonocardiogram (PPG), electroencephalogram (EEG), blood volume pressure (BVP), electromyogram (EMG) and other. Medical biometrics have been actively investigated only within the last decade. Although the specificity to the individuals had been observed before, the complicated acquisition process and the waiting times were restrictive for application in access control. However, with the development of dry recoding sensors that are easy to attach even by non-trained personnel, the medical biometrics field flourished. The rapid advancement between 2001-2010 was supported by the fact that signal processing of physiological signals (or biosignals) had already progressedfor diagnostic purposes and a plethora of tools were available for biometric pattern recognition. The most prominent strength of medical biometrics is the robustness to circumvention, replay and obfuscation attacks. If established as biometrics, then the respective systems are empowered with an inherent shield to such threats. Another advantage of medical biometrics is the possibility of utilizing them for continuous authentication, since they can provide a fresh biometric reading every couple of seconds. This work is interested in the ECG signal, however the concepts presented herein may be extended to all medical biometric modalities. 200 Biometrics Heart Biometrics: Theory, Methods and Applications 3 2. ECG biometrics: Motivation and challenges The ECG signal describes the electrical activity of the heart over time. It is recorded non-invasively with electrodes attached at the surface of the body. Traditionally, physicians use the ECG to gain insight on heart conditions, usually with complementary tests required in order to finalize a diagnosis. However, from a biometrics perspective, it has been demonstrated that the ECG has sufficient detail for identification. The advantages of using the ECGfor biometric recognition can be summarized as universality, permanence, uniqueness, robustness to attacks, liveness detection, continuous authentication and data minimization. More precisely, 1. Universality refers to the ability of collecting the biometric sample from the general population. Since the ECG is a vital signal, this property is satisfied naturally. 2. Permanence refers to the ability of performing biometric matches against templates that have been designed earlier in time. This essentially requires that the signal is stable over time. As it will be discussed later, the ECG is affected by both physical and psychological activity, however even though the specific local characteristics of the pulses may change, the overall diacritical waves and morphologies are still observable. 3. Uniqueness is guaranteed in the ECG signal because of its physiological origin. While ECG signals of different individuals conform to approximately the same pattern, there is large inter-individual variability due to the various electrophysiological parameters that control the generation of this waveform. Uniqueness will be further discussed in Section 3.1. 4. Robustness to attacks. The particular appearance of the ECG waveform is the outcome of several sympathetic and parasympathetic factors of the human body. Controlling the waveform or attempting to mimic somebody else’s ECG signal is extremely difficult, if not impossible. To the best of our knowledge there is currently no means of falsifying an ECG waveform and presenting it to a biometric recognition system. Obfuscation is also addressed naturally. 5. Liveness detection. ECG offers natural liveness detection, being only present in a living subject. With this modality the recognizer can trivially ensure sensor liveness. Other biometric modalities, such the iris or the fingerprint require additional processing to establish the liveness of the reading. 6. Continuous authentication. As opposed to static iris or fingerprint images, the ECG is a dynamic biometric feature that evolves with time. When deployed for security in welfare monitoring environments, a fresh reading can be obtained every couple of second to re-authenticate an identity. This property is unique to medical biometrics and can be vital in avoiding threats such as field officer impersonation. 7. Data minimization. Privacy intrusion is becoming increasingly critical in environments of airtight security. One way to address this problem is to utilize as less identifying credentials as possible. Data minimization is a great possibility with ECG biometrics because there are environments where the collection of the signal is performedirrespective of the identification task. Examples of such environments are tele-medicine, patient monitoring in hospitals, field agent monitoring (fire-fighters, policemen, soldiers etc). Despite the advantages, notable challenges are presented with this technology when envisioning large-scale deployment: 201 Heart Biometrics: Theory, Methods and Applications 4 Will-be-set-by-IN-TECH 1. Time dependency. With time-varying biosignals there is high risk of instantaneous changes which may endanger biometric security. Recordings of the cardiac potential at the surface of the body are very prone to noise due to movements. However, even in the absence of noise, the ECG signal may destabilize with respect to a biometric template that was constructed some time earlier. The reason for this is the direct effect that the body’s physiology and psychology have on the cardiac function. Therefore, a central aspect of the ECG biometrics research is the investigation of the sources of intra-subject variability. 2. Collection periods. As opposed to biometrics such as the face, the iris or the fingerprint, where the biometric information is available for capturing at any time instance, this is not the case with the ECG signal. Every heart beat is formed within approximately a second, which essentially means that longer waiting times are expected with this technology especially when long ECG segments are required for feature extraction. The challenge however is to minimize the number of pulses that the algorithm uses for recognition, as well as the processing time. 3. Privacy implications. When collecting ECG signals a large amount of sensitive information is collected inevitably. The ECG signal may reveal current and past medical conditions as well as hints on the instantaneous emotional activity of the monitored individual. Traditionally, the ECG is available to physicians only. Thus, the possibility of linking ECG samples to identities can imply catastrophic privacy issues. 4. Cardiac Conditions. Although cardiac disorders are not as a frequent damaging factor as injuries for more conventional biometrics (fingerprint, face), they can limit ECG biometric methods. Disorders can range from an isolated irregularity (Atria and ventricle premature contractions) to severe conditions which require immediate medical assistance. The biometric challenge is therefore to design algorithms that are invariant to tolerable everyday ECG irregularities Agrafioti & Hatzinakos (2008a). 3. Electrocardiogram fundamentals The electrocardiogram (ECG) is one of the most widely used signals in healthcare. Recorded at the surface of the body, with electrodes attached in various configurations, the ECG signal is studied for diagnostics even at the very early stage of a disease. In essence, this signal describes the electrical activity of the heart over time, and pictures the sequential depolarization and repolarization of the different muscles that form the myocardium. The first recording device was developed by the physiologist Williem Einthoven in the early 20th century, and for this discovery he was rewarded with the Nobel Prize in Medicine in 1924 Sornmo & Laguna (2005). Since then, ECGbecame an indispensable tool in clinical cardiology. The deployment of this signal in biometric recognition and affective computing is relatively young. Figure 1 shows the salient components of an ECGsignal i.e., the P wave, the QRS complex and the T wave. The P wave describes the depolarization of the right and left atria. The amplitude of this wave is relatively small, because the atrial muscle mass is limited. The absence of a P wave typically indicates ventricular ectopic focus. This wave usually has a positive polarity, with a duration of approximately 120 ms, while its spectral content is limited to 10-15 Hz, i.e., low frequencies. The QRS complex corresponds to the largest wave, since it represents the depolarization of the right and left ventricles, being the heart chambers with substantial mass. The duration of this complex is approximately 70-110 ms in a normal heartbeat. The anatomic characteristics 202 Biometrics Heart Biometrics: Theory, Methods and Applications 5 R T P Q S Fig. 1. Main components of an ECG heart beat. of the QRS complex depend on the origin of the pulse. Due to its steep slopes, the spectrum of a QRS wave is higher compared to that of other ECG waves, and is mostly concentrated in the interval of 10-40 Hz. Finally, the T wave depicts the ventricular repolarization. It has a smaller amplitude, compared to the QRS complex, and is usually observed 300 ms after this larger complex. However, its precise position depends on the heart rate, e.g., appearing closer to the QRS waves at rapid heart rates. 3.1 Inter-individual variability This section will briefly discuss the physiological rationale for the use of ECG in biometric recognition. Overall, healthy ECG signals from different people conform to roughly the same repetitive pulse pattern. However, further investigation of a person’s ECG signal can reveal notably unique trends which are not present in recordings from other individuals. The inter-individual variability of ECG has been extensively reported in the literature Draper et al. (1964); Green et al. (1985); Hoekema et al. (2001); Kozmann et al. (1989; 2000); Larkin & Hunyor (1980); Pilkington et al. (2006). More specific, the ECG depicts various electrophysiological properties of the cardiac muscle. Model studies have shown that physiological factors such as the heart mass orientation, the conductivity of various areas and the activation order of the heart, are sources of significant variability among individuals Hoekema et al. (2001); Kozmann et al. (2000). Furthermore, geometrical attributes such as the exact position and orientation of the myocardium, and torso shape designate ECG signals with particularly distinct and personalized characteristics. Other factors affecting the ECG signal are the timing of depolarization and repolarization and lead placement. In addition, except for the anatomic idiosyncrasy of the heart, unique patterns have been associated to physical characteristics such as the body habitus and gender Green et al. (1985); Hoekema et al. (2001); Kozmann et al. (1989; 2000); Simon & Eswaran (1997). The electrical map of the area surrounding the heart may also be affected by variations of other organs in the thorax Hoekema et al. (2001). In fact, various methodologies have been proposed to eliminate the differences among ECG recordings. The idea of clearing off the inter-individual variability is typical when seeking to establish healthy diagnostic standards Draper et al. (1964). Automatic diagnosis 203 Heart Biometrics: Theory, Methods and Applications 6 Will-be-set-by-IN-TECH Fig. 2. Variability surrounding the QRS complex among heart beats of the same individual. of pathologies using the ECG is infeasible if the level of variability among healthy people is high Kozmann et al. (2000). In such algorithms, personalized parameters of every subject are treated as randomvariables and a number of criteria have been defined to quantify the degree of subjects’ similarities on a specific feature basis. 4. Related research Prior works in the ECG biometric recognition field can be categorized as either fiducial points dependent or independent. Fiducials are specific points of interest on an ECG heart beat such as the ones shown in Figure 1. Therefore, fiducial based approaches rely on local features of the heart beats for biometric template design, such as the temporal or amplitude difference between consecutive fiducial points. On the other hand, fiducial points independent approaches treat the ECG signal or isolated heart beats holistically and extract features statistically based on the overall morphology of the waveform. This distinction has a direct analogy to face biometric systems, where one can operate locally and extract biometric features such the distance between the eyes or the size of the mouth. A holistic approach in this case would then be to analyze the facial image globally. Both approaches have advantages and disadvantages. While fiducial oriented features risk to miss identifying information hidden behind the overall morphology of the biometric, holistic approaches deal with a large amount of redundant information that needs to be eliminated. The challenge in the later case, is remove this information in a way that the intra-subject variability is minimized and the inter-subject is maximized. For the ECG case, detecting fiducial points is a very obscure process due to the high variability of the signal. Figure 2 shows an example of aligned ECG heart beats which belong to the same individual. Even 204 Biometrics Heart Biometrics: Theory, Methods and Applications 7 though the QRS complex is perfectly aligned there is significant variability surrounding the P and the T wave, rendering the localization of these waves’ onsets and offsets very difficult. In fact, there is no universally acknowledged rule that can guide this detection Hoekema et al. (2001). This section provides an overviewof fiducial dependent and independent approaches to ECG biometrics that are currently found in the literature. A comparison is also provided in Tables 1 and 2. 5. Fiducial based approaches Among the earliest works in the area is Biel et al. (2001) proposal, in 2001, for a fiducial feature extraction algorithm, which demonstrated the feasibility of using ECG signals for human identification. The standard 12 lead system was used to record signals from 20 subjects of various ages. Special experimentation was carried out to test variations due to lead placement in terms of the exact location and the operators who place the electrodes. Out of 30 clinical diagnosis features that were estimated for each of the 12 leads, only 12features were retained for matching by inspection of the correlation matrix. These features pictured local characteristics of the pulses, such as the QRS complex and T wave amplitudes, P wave duration and other. This feature set was subsequently fed to SIMCA for training and classification. Results of combining different features were compared to demonstrate that the best case classification rate was 100% with just 10 features. Kyoso & Uchiyama (2001), also proposed fiducial based features for ECG biometric recognition. Overall, four feature parameters were selected i.e., the P wave duration, PQ interval, QRS complex and QT durations. These features were identified on the pulses by applying a threshold to the second order derivative. The subject with the smallest Mahalanobis distance between each two of the four feature parameters was selected as output. The highest reported performance was 94.2% for using just the QRS and QT intervals. In 2002, Shen et al. (2002) reported an ECG based recognition method with seven fiducial based features defined based on the QRS complex. The underlying idea was that this wave is less affected by varying heart rates, and thus is appropriate for ECG biometric recognition. The proposed methodology encompassed two steps: During a first step, template matching was used to compute the correlation coefficient among QRS complexes in the gallery set in order to find possible candidates and prune the search space. Adecision based neural network (DBNN) was then formed to strengthen the validation of the identity resulting from the first step. While the first step managed to correctly identify only 85% of the cases, the neural network resulted in 100% recognition. More complete biometric recognition tests were reported in 2004, by Israel et al. (2005). This work presented the three clear stages of ECG biometric recognition i.e., preprocessing, feature extraction and classification. In addition, a variety of experimental settings are described in Israel et al. (2005) such as, electrode placement and physical stress. The proposed system employed only temporal features. A filter was first applied to retain signal information in the band 1.1- 40 Hz and discard the rest of the spectral components which were attributed to noise. By targeting to keep discriminative information while applying a stable filter over the gallery set, different filtering techniques were examined to conclude to a local averaging, spectral differencing and Fourier band-pass filter. The highest identification rate achieved was close to 100% which generally established the ECG signal as a biometric feature that is robust to heart rate variability. 205 Heart Biometrics: Theory, Methods and Applications 8 Will-be-set-by-IN-TECH A similar approach was reported in the same year by Palaniappan & Krishnan (2004). In addition to commonly used features within QRS complex, a form factor, which is a measure of signal complexity, was proposed an tested as input to a neural network classifier. An identification rate of 97.6% was achieved over recordings of 10 individuals, by training a MLP-BP with 30 hidden units. Kim et al. (2005), proposed a method to normalize time domain features by Fourier synthesizing an up-sampled ECG heart beat. In addition, the P wave was excluded when calculating the features since it disappears when heart rate increases. With this strategy, the performance improved significantly when the testing subjects were performing physical activities. Another work that addressed the heart rate variations was by Saechia et al. (2005) in 2005. The heart beats were normalized to a the healthy durations and then divided into three sub-sequences: P wave, QRS complex and T wave. The Fourier transform was applied on a heart beat itself and all three sub-sequences. The spectrum were then passed to a neural network for classification. It was shown that false rate was significantly lower (17.14% to 2.85%) by using the three sub-sequences instead of the original heart beat. Zhang & Wei (2006), suggested 14 commonly used features from ECG heart beats on which a PCA was applied to reduce dimensionality. A classification method based on BayesŠ Theorem was proposed to maximize the posterior probability given prior probabilities and class-conditional densities. The proposed method outperformed Mahalanobis’ distance by 3.5% to 13% depending on the particular lead that was used. Singh & Gupta (2008), proposeda way to delineate the P and T waveforms for accurate feature extraction. By examining the ECG signal within a preset window before Q wave onset and apply a threshold to its first derivative, the precise position of P was revealed. In addition the onset, peak and offset of the P wave were detected by tracing the signal and examining the zero crossings in its first derivative. The system’s accuracy was 99% as tested over 25 subjects. In 2009, Boumbarov et al. (2009), investigated different models, such as HMM-GMM(Hidden markov model with Gaussian mixture model), HMM-SGM (Hidden markov model with single Gaussian model) and CRF (Conditional Random Field), to determine different fiducial points in an ECG segment, followed by PCA and LDAfor dimensionality reduction. A neural network with radial basis function was realized as the classifier and the recognition rate was between 62% to 94% for different subjects. Ting & Salleh (2010), described in 2010 a nonlinear dynamical model to represent the ECG in a state space form with the posterior states inferred by an Extended Kalman Filter. The Log-likelihood score was used to compare the estimated model of a testing ECG to that of the enrolled templates. The reported identification rate was 87.5% on the normal beats of 13 subjects from the MIT Arrhythmia database. It was also reported that the method was robust to noise for SNR above 20 dB. The Dynamic time warping or FLDA were used in Venkatesh & Jayaraman (2010), together with the nearest neighbor classifier. The proposed system was comprised of two steps as follows: First the FLDA and nearest neighbor operated on the features and then a DTW classifier was applied to additionally boost the performance (100%over a 12-subject database). For verification, only features related to QRS complex were selected due to their robustness to heart rate variability. Same two-stage setting was applied together with a threshold and the reported performance was 96% for 12 legitimates and 3 intruders. Another fiducial based method was proposed by Tawfik et al. (2010). In this work, the ECG segment between the QRS complex and the T wave was first extracted and normalized in the 206 Biometrics Heart Biometrics: Theory, Methods and Applications 9 time domain by using Framinghamcorrection formula or assuming constant QT interval. The DCT was then applied and the coefficients were fed into a neural network for classification. The identification rate was 97.727% for that of Framinghamcorrection formula and 98.18% for that with constant QT interval. Furthermore, using only the QRS complex without any time domain manipulation yielded a performance of 99.09%. 6. Fiducial independent approaches On the non-fiducial methodologies side the majority of the works were reported after 2006. Among the earliest is Plataniotis et al. (2006) proposal for an autocorrelation (AC) based feature extractor. With the objective of capturing the repetitive pattern of ECG, the authors suggested the AC of an ECG segment as a way to avoid fiducial points detection. It was demonstrated that the autocorrelation of windowed ECG signals,embeds highly discriminative information in a population. However, depending on the original sampling frequency of the signal, the dimensionality of a segment from the autocorrelation was considerably high for cost efficient applications. To reduce the dimensionality and retain only useful for recognition characteristics, the discrete cosine transform (DCT) was applied. The method was tested on 14 subjects, for which multiple ECGrecordings were available, acquired a few years apart. The identification performance was 100%. Wübbeler et al. (2007), have also reported an ECG based human recognizer by extracting biometric features from a combination of Leads I, II and III i.e., a two dimensional heart vector also known as the characteristic of the electrocardiogram. To locate and extract pulses a thresholding procedure was applied. For classification, the distance between two heart vectors as well as their first and second temporal derivatives were calculated. A verification functionality was also designed by setting a threshold on the distances. Authenticated pairs were considered those which were adequately related, while in any other case, input signals were rejected. The reported false acceptance and rejection rates were 0.2% and 2.5% corresponding to a 2.8% equal error rate (EER). The overall recognition rate of the systemwas 99% for 74 subjects. A methodology for ECG synthesis was proposed in 2007 by Molina et al. (2007). An ECG heartbeat was normalized and compared with its estimate which was constructed from itself and the templates from the claimed identity. The estimated version was produced by a morphological synthesis algorithm involving a modified dynamic time warping procedure. The Euclidean distance was used as a similarity measure and a threshold was applied to decide the authenticity. The highest reported performance was 98% with a 2 In 2008, Chan et al. (2008), reported ECG signal collection from the fingers by asking the participants to hold two electrode pads with their thumb and index finger. The Wavelet distance (WDIST) was used as the similarity measure with a classification accuracy of 89.1%, which outperformed other methods such as the percent residual distance(PRD) and the correlation coefficient(CCORR). Furthermore, a new recording session was conducted on several misclassified subjects which improved the system performance to 95%. In the same year,Chiu et al. (2008), proposed the use of DWT on heuristically isolated pulses. More precisely, every heart beat was determined on the ECG signal, as 43 samples backward and 84 samples forward fromevery R peak. The DWT was used for feature extraction and the Euclidean distance as the similarity measure. When the proposed method was applied to a database of 35 normal subjects, a 100% verification rate was reported. The author also pointed 207 Heart Biometrics: Theory, Methods and Applications 10 Will-be-set-by-IN-TECH out that false rate would increase if 10 subjects with cardiac arrhythmia were included in the database. Fatemian & Hatzinakos (2009), also suggested the Wavelet transformto denoise and delineate the ECG signals, followed by a process wherein every heart beat was resampled, normalized, aligned and averaged to create one strong template per subject. A correlation analysis was directly applied to test heart beats and the template since the gallery size was greatly reduced. The reported recognition rate was 99.6% for a setting were every subject has 2 templates in the gallery. The Spectogram was employed in Odinaka et al. (2010) to transform the ECG into a set of time-frequency bins which were modeled by independent normal distributions. Dimensionality reduction was based on Kullback-Leibler divergence where a feature is selected only if the relative entropy between itself and the nominal model (which is the spectrogramof all subjects in database) is larger than a certain threshold. Log-likelihood ratio was used as a similarity measure for classification and different scenarios were examined. For enrollment and testings over the same day, a 0.37% ERR was achieved for verification and a 99% identification rate. For different days, the performance was e 5.58% ERR and 76.9% respectively. Ye et al. (2010), applied the discrete wavelet transform (DWT) and independent component analysis (ICA) on ECG heart beat segments to obtain 118 and 18 features respectively (the feature vectors were concatenated). The dimensionality of the feature space was reduced from 136 to 26 using a PCA which retained 99% of the data’s variance. An SVM with Guassian radial basis function was used for classification with a decision level fusion of the results from the two leads. Rank-1 classification rate of 99.6% was achieved for normal heart beats. Another observation was that even though dynamic features such as the R-R interval proved to be beneficial for arrhythmia classification, they were not as good descriptors for biometric recognition. Coutinho et al. (2010), isolated the heart beats and performed 8-bit uniform quantization to map the ECG samples to strings from a 256-symbol alphabet. Classification was based on finding the template in the gallery set that results in the shortest description length of the test input (given the strings in the particular template) which was calculated by the Ziv-Merhav cross parsing algorithm. The reported classification accuracy was 100% on a 19-subject database in presence of emotional state variation. Autoregressive modeling was used in Ghofrani & Bostani (2010). The ECG signal was segmented with 50%overlap and an AR model of order 4 was estimated so that its coefficients are used for classification. Furthermore, the mean PSD of each segment was concatenated as add-on features which increased the system performance to 100% using a KNN classifier. The proposed method outperformed the state-of-art fractal-based approach such as the Lyapunov exponent, ApEn, Shanon Entropy, and the Higuchi chaotic dimension. Li & Narayanan (2010), proposed a method to model the ECG signal in both the temporal and cepstral domain. The hermite polynomial expansion was employed to transform heart beats into Hermite polynomial coefficients which were then modeled by an SVM with a linear kernel. Cepstral features were extracted by simple linear filtering and modeled by GMM/GSV(GMM supervector). The highest reported performance was 98.26% with a 5% ERR corresponding to a score level fusion of both temporal and cepstral classifiers. Tables 1 and 2 provide a high level comparison of the works that can currently be found in the literature. 208 Biometrics Heart Biometrics: Theory, Methods and Applications 11 ECG Filter AC LDA Classification Templates ID Fig. 3. Block diagram of the AC/LDA algorithm. 7. The AC/LDA algorithm The AC/ LDA, is a fiducial points independent methodology originally proposed in Agrafioti & Hatzinakos (2008b) and later expanded to cardiac irregularities in Agrafioti & Hatzinakos (2008a). This method, relies on a small segment of the autocorrelation of 5 sec ECG signals. The 5 sec duration has been chosen experimentally, as it is fast enough for real life applications, and also allows to capture the ECG’s repetitive property. The reader should note that this ECG window is allowed to cut the signal even in the middle of pulse, and it does not require any prior knowledge on the heart beat locations. The autocorrelation (AC) is computed for every 5 sec ECG using: R xx (m) = N−|m|−1 ∑ i=0 x(i)x(i + m) (1) where x(i) is an ECG sample for i = 0, 1...(N − |m| −1), x(i + m) is the time shifted version of the ECG with a time lag m, and N is the length of the signal. Out of R xx only a segment φ(m), m = 0, 1...M, defined by the zero lag instance and extending to approximately the length of a QRS complex 1 is used for further processing. This is because this wave is the least affected by heart rate changes, thus utilizing only this segment for discriminant analysis, makes the system robust to the heart rate variability. Figure 3, shows a block diagram of the AC/LDA algorithm. In a distributed system, such as human verification on smart phones, every user can use his/her phone in one of the two modes of operation i.e., enrollment or verification. During enrollment, ECG sensors located on the smart phone, first acquire an ECG sample from a subject’s fingers and then a biometric signature is designed and saved. Similarly, during verification a newly acquired sample is matched against the store template. Essentially, the verification decision is performed with a threshold on the distance of the two feature vectors. Although verification performs one-to-one matches, false acceptance can be controlled by learning the patterns of possible attackers. Therefore for smart phone based verification the objective is to optimally reduce the within class variability while learning patterns of the general population. This can be done with LDA training over a large generic dataset, against which a new enrollee will be learned. More specific, given a generic dataset (which can be anonymous to ensure privacy) the autocorrelation of every 5 sec ECG segment is computed using Eq. 1. This results into a number of AC segments Φ(m) against which an input AC feature vector φ input (m) is learned. 1 A QRS complex lasts for approximately 100 msec 209 Heart Biometrics: Theory, Methods and Applications M e t h o d P r i n c i p l e P e r f o r m a n c e N u m b e r o f S u b j e c t s K y o s o & U c h i y a m a ( 2 0 0 1 ) A n a l y z e d f o u r fi d u c i a l b a s e d f e a t u r e s f r o m h e a r t b e a t s , t o 9 4 . 2 % 9 d e t e r m i n e t h o s e w i t h g r e a t e r i m p a c t o n t h e i d e n t i fi c a t i o n p e r f o r m a n c e B i e l e t a l . ( 2 0 0 1 ) U s e a S I E M E N S E C G a p p a r a t u s t o r e c o r d 1 0 0 % 2 0 a n d s e l e c t a p p r o p r i a t e m e d i c a l d i a g n o s t i c f e a t u r e s f o r c l a s s i fi c a t i o n S h e n e t a l . ( 2 0 0 2 ) U s e t e m p l a t e m a t c h i n g a n d n e u r a l n e t w o r k s t o 1 0 0 % 2 0 c l a s s i f y Q R S c o m p l e x r e l a t e d c h a r a c t e r i s t i c s I s r a e l e t a l . ( 2 0 0 5 ) A n a l y z e fi d u c i a l b a s e d t e m p o r a l f e a t u r e s 1 0 0 % 2 9 u n d e r v a r i o u s s t r e s s c o n d i t i o n s P a l a n i a p p a n & K r i s h n a n ( 2 0 0 4 ) U s e t w o d i f f e r e n t n e u r a l n e t w o r k a r c h i t e c t u r e s 9 7 . % 1 0 f o r c l a s s i fi c a t i o n o f s i x Q R S w a v e r e l a t e d f e a t u r e s K i m e t a l . ( 2 0 0 5 ) B y n o r m a l i z i n g E C G h e a r t b e a t u s i n g F o u r i e r s y n t h e s i s N / A 1 0 t h e p e r f o r m a n c e u n d e r p h y s i c a l a c t i v i t i e s w a s i m p r o v e d S a e c h i a e t a l . ( 2 0 0 5 ) E x a m i n e d t h e e f f e c t i v e n e s s o f s e g m e n t i n g 9 7 . 1 5 % 2 0 E C G h e a r t b e a t i n t o t h r e e s u b s e q u e n c e s Z h a n g & W e i ( 2 0 0 6 ) B a y e s ’ c l a s s i fi e r b a s e d o n c o n d i t i o n a l p r o b a b i l i t y w a s 9 7 . 4 % 5 0 2 u s e d f o r i d e n t i fi c a t i o n a n d w a s f o u n d s u p u r i o r t o M a h a l a n o b i s ’ d i s t a n c e . P l a t a n i o t i s e t a l . ( 2 0 0 6 ) A n a l y z e t h e a u t o c o r r e l a t i o n o f E C G s f o r f e a t u r e 1 0 0 % 1 4 e x t r a c t i o n a n d a p p l y D C T f o r d i m e n s i o n a l i t y r e d u c t i o n W ü b b e l e r e t a l . ( 2 0 0 7 ) U t i l i z e t h e c h a r a c t e r i s t i c v e c t o r o f t h e e l e c t r o c a r d i o g r a m 9 9 % 7 4 f o r fi d u c i a l b a s e d f e a t u r e e x t r a c t i o n o u t o f t h e Q R S c o m p l e x M o l i n a e t a l . ( 2 0 0 7 ) M o r p h o l o g i c a l s y n t h e s i s t e c h n i q u e w a s 9 8 % 1 0 p r o p o s e d t o p r o d u c e d a s y n t h e s i z e d E C G h e a r t b e a t b e t w e e n t h e t e s t s a m p l e a n d t e m p l a t e C h a n e t a l . ( 2 0 0 8 ) W a v e l e t d i s t a n c e m e a s u r e w a s i n t r o d u c e d 9 5 % 5 0 t o t e s t t h e s i m i l a r i t y b e t w e e n E C G s T a b l e 1 . S u m m a r y o f r e l a t e d t o E C G b a s e d r e c o g n i t i o n w o r k s . 210 Biometrics M e t h o d P r i n c i p l e P e r f o r m a n c e N u m b e r o f S u b j e c t s S i n g h & G u p t a ( 2 0 0 8 ) A n e w m e t h o d t o d e l i n e a t e P a n d T w a v e s 9 9 % 2 5 F a t e m i a n & H a t z i n a k o s ( 2 0 0 9 ) L e s s t e m p l a t e s p e r s u j e c t i n g a l l e r y s e t t o s p e e d u p 9 9 . 6 % 1 3 c o m p u t a t i o n a n d r e d u c e m e m o r y r e q u i r e m e n t B o u m b a r o v e t a l . ( 2 0 0 9 ) N e u r a l n e t w o r k w i t h r a d i a l b a s i s f u n c t i o n 8 6 . 1 % 9 w a s e m p l o y e d a s t h e c l a s s i fi e r T i n g & S a l l e h ( 2 0 1 0 ) U s e e x t e n d e d K a l m a n fi l t e r a s i n f e r e n c e 8 7 . 5 % 1 3 e n g i n e t o e s t i m a t e E C G i n s t a t e s p a c e O d i n a k a e t a l . ( 2 0 1 0 ) T i m e f r e q u e n c y a n a l y s i s a n d r e l a t i v e 7 6 . 9 % 2 6 9 e n t r o p y t o c l a s s i f y E C G s V e n k a t e s h & J a y a r a m a n ( 2 0 1 0 ) A p p l y d y n a m i c t i m e w a r p i n g a n d 1 0 0 % 1 5 F i s h e r ’ s d i s c r i m i n a n t a n a l y s i s o n E C G f e a t u r e s T a w fi k e t a l . ( 2 0 1 0 ) E x a m i n e d t h e s y s t e m p e r f o r m a n c e w h e n 9 9 . 0 9 % 2 2 u s i n g n o r m a l i z e d Q T a n d Q R S a n d u s i n g r a w Q R S Y e e t a l . ( 2 0 1 0 ) A p p l i e d W a v e l e t t r a n s f o r m a n d I n d e p e n d e n t c o m p o n e n t 9 9 . 6 % 3 6 n o r m a l a n a l y s i s , t o g e t h e r w i t h s u p p o r t v e c t o r m a c h i a n d 1 1 2 a r r h y t h m i c a s c l a s s i fi e r t o f u s e i n f o r m a t i o n f r o m t w o l e a d s C o u t i n h o e t a l . ( 2 0 1 0 ) T r e a t h e a r t b e a t s a s a s t r i n g s a n d u s i n g 1 0 0 % 1 9 Z i v - M e r h a v p a r s i n g t o m e a s u r e t h e c r o s s c o m p l e x i t y G h o f r a n i & B o s t a n i ( 2 0 1 0 ) A u t o r e g r e s s i v e c o e f fi c i e n t a n d m e a n o f 1 0 0 % 1 2 p o w e r s p e c t r a l d e n s i t y w e r e p r o p o s e d t o m o d e l t h e s y s t e m f o r c l a s s i fi c a t i o n m o d e l t h e s y s t e m f o r c l a s s i fi c a t i o n L i & N a r a y a n a n ( 2 0 1 0 ) F u s i o n o f t e m p o r a l a n d c e p s t r a l f e a t u r e s 9 8 . 3 % 1 8 T a b l e 2 . ( C o n t i n u e d ) S u m m a r y o f r e l a t e d t o E C G b a s e d r e c o g n i t i o n w o r k s 211 Heart Biometrics: Theory, Methods and Applications 14 Will-be-set-by-IN-TECH Let the number of classes in the generic dataset be C. The training set will then involve C +1 classes as follows: Φ(m) = [Φ 1 (m), Φ 2 (m)...Φ C (m), Φ input (m)] (2) and for every subject i in C + 1 , a number of C i AC vectors are available. This is because during enrollment, longer ECG recordings can be acquired, so that multiple segments of the user’s biometric participate in training. The longer the training ECG signal the lower the chances of false rejection. Furthermore, since this is only required in the enrollment mode of operation, it does not affect the overall waiting time of the verification. Given Φ(m), LDA will find a set of k feature basis vectors {ψ v } k v=1 by maximizing the ratio of between-class and within-class scatter matrix. Given the transformation matrix Ψ, a feature vector is projected using: Y i (k) = Ψ T Φ i (m) (3) where eventually k << m and at most C. An advantage of distributed verification is that smart phone can be optimized experimentally for the intra-class variability of a particular user. This can be done during enrollment, by choosing the smallest distance threshold at which an individual is authenticated. Essentially, rather than imposing universal distance thresholds for all enrolles, every device can be "tuned" to the expected variability of the user. 8. ECG signal collection The performance of the framework discussed in Section 7 was evaluated over ECGrecordings collected at the BioSec.Lab 2 , at the University of Toronto. Overall, two recording sessions took place, scheduled a couple of weeks apart, in order to investigate the permanence of the signal in terms of verification performance. During the first session, 52 healthy volunteers were recorded for approximately 3 min each. The experiment was repeated a month later for 16 of the volunteers. The signals were collected from the subject’s wrists, with the Vernier ECG sensor. The wrists were selected for this recording so that the morphology of the acquired signal can resemble the one collected by a smart phone from the subject’s fingers. The sampling frequency was 200Hz. During the collection, the subjects were given no special instructions, in order to allow for mental state variability to be captured in the data. The recordings of the 36 volunteers who participated to the experiment only once were used for generic training. For the 16 volunteers that two recordings were available, the earliest ones were used for enrollment and the latter for testing. 9. Experimental performance Preprocessing of the signals is a very important because ECGis affected by both high and low frequency noise. For this reason, a butterworth bandpass filter of order 4 was used, centered between 0.5 Hz and 40Hz based on empirical results. After filtering, the autocorrelation was computed according to Eq. 1, for the generic dataset, the enrollment records and the testing ones. Each of the enrollees’ recordings were appended to the generic dataset individually, and a new LDA was trained for every enrollee. The performance each recognizer was then tested with matches the respective subject’s recordings in the test set. To estimate the False Acceptance 2 http://www.comm.utoronto.ca/∼biometrics/ 212 Biometrics Heart Biometrics: Theory, Methods and Applications 15 100% 80% 90% FalseȱAcceptanceȱRateȱ FalseȱRejectionȱRateȱ 60% 70% a t e s 40% 50% E r r o r ȱ R a EERȱ=ȱ32ȱ% 20% 30% 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0% 10% DistanceȱThreshold Fig. 4. False Acceptance and Rejection Rates when imposing universal recognition thresholds. 40% 30% FalseȱAcceptanceȱRate FalseȱRejectionȱRate 20% e s EER % 10% E r r o r ȱ R a t e EERȱ=ȱ14% 0 10% 0 0.167 0.168 0.169 0.17 0.171 0.172 0.173 DistanceȱThreshold Fig. 5. False Acceptance and Rejection Rates after personalization of the thresholds. 213 Heart Biometrics: Theory, Methods and Applications 16 Will-be-set-by-IN-TECH Rate, the remaining enrolles (i.e., subjects who did not participate in the generic pool), acted as intruders to the system. This subset of recordings is unseen to the current LDA, and thus constitutes, the unknown population. Figure 4 demonstrates the tradeoffs between false acceptance (FA) and rejection (FR) when the same threshold values are imposed for all users. The Equal Error Rate (EER) i.e., the rate at which false acceptance and rejection rates are equal is 32%. This performance is generally unacceptable for a viable security solution. When verification is performed locally on a smart device, one can take advantage of the fact that every card can be optimized to a particular individual. This treatment controls the false rejection since the matching threshold is "tuned" to the biometric variability of the individual. Figure 5 demonstrates the error distribution for the case of personalized thresholds, when aligning the individual ROC plots of the enrolles. The EER drops on average to 14%, with a significant number of subjects exhibiting EER between 0%- 5%. More details pertaining to this result can be found in Gao et al. (2011). 10. Conclusion This chapter discussed on one of the most extensively studied medical biometric feature, the ECG. ECG reflects the cardiac electrical activity over time, and presents significant intra subject variability due to electrophysiological variations of the myocardium. However, as it was argued there are significant advantages of using ECG for human identification such as universality, permanence and uniqueness. Therefore, a number of approaches have been developed to address the challenges of designing templates that are robust to the heart rate variability. The reported performance of the AC/LDA approach as well as of other methodologies in the literature, establish the ECG in the biometrics world and render its use in human recognition very promising. It is crucial however to perform large scale tests that can generalize the current performance as well as to address the privacy implications that arise with this technology. 11. References Agrafioti, F. & Hatzinakos, D. (2008a). ECG biometric analysis in cardiac irregularity conditions, Signal, Image and Video Processing pp. 1863–1703. Agrafioti, F. & Hatzinakos, D. (2008b). Fusion of ECG sources for human identification, 3rd Int. Symp. on Communications Control and Signal Processing, Malta, pp. 1542–1547. Biel, L., Pettersson, O., Philipson, L. & Wide, P. (2001). ECG analysis: a new approach in human identification, IEEE Trans. on Instrumentation and Measurement 50(3): 808–812. Boumbarov, O., Velchev, Y. & Sokolov, S. (2009). ECG personal identification in subspaces using radial basis neural networks, IEEE Int. Workshop on Intelligent Data Acquisition and Advanced Computing Systems, pp. 446 –451. Chan, A., Hamdy, M., Badre, A. & Badee, V. (2008). Wavelet distance measure for person identification using electrocardiograms, Instrumentation and Measurement, IEEE Transactions on 57(2): 248 –253. Chiu, C. C., Chuang, C. & Hsu, C. (2008). A novel personal identity verification approach using a discrete wavelet transform of the ECG signal, International Conference on Multimedia and Ubiquitous Engineering, pp. 201 –206. 214 Biometrics Heart Biometrics: Theory, Methods and Applications 17 Coutinho, D., Fred, A. & Figueiredo, M. (2010). One-lead ECG-based personal identification using Ziv-Merhav cross parsing, 20th Int. Conf. on Pattern Recognition, pp. 3858 –3861. Draper, H., Peffer, C., Stallmann, F., Littmann, D. & Pipberger, H. (1964). The corrected orthogonal electrocardiogram and vectorcardiogram in 510 normal men (frank lead system), Circulation 30: 853–864. Fatemian, S. & Hatzinakos, D. (2009). A new ECG feature extractor for biometric recognition, 16th International Conference on Digital Signal Processing, pp. 1 –6. Gao, J., Agrafioti, F., Mohammadzade, H. & Hatzinakos, D. (2011). ECG for blind identity verification in distributed systems, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). Ghofrani, N. & Bostani, R. (2010). Reliable features for an ECG-based biometric system, 17th Iranian Conference of Biomedical Engineering, pp. 1 –5. Green, L., Lux, R., Williams, C. H. R., Hunt, S. & Burgess, M. (1985). Effects of age, sex, and body habitus on qrs and st-t potential maps of 1100 normal subjects, Circulation 85: 244–253. Hoekema, R., G.Uijen & van Oosterom, A. (2001). Geometrical aspect of the interindividual variaility of multilead ECG recordings, IEEE Trans. Biomed. Eng. 48: 551–559. Israel, S. A., Irvine, J. M., Cheng, A., Wiederhold, M. D. & Wiederhold, B. K. (2005). ECG to identify individuals, Pattern Recognition 38(1): 133–142. Kim, K. S., Yoon, T. H., L., J., Kim, D. J. & Koo, H. S. (2005). A robust human identification by normalized time-domain features of Electrocardiogram, 27th Annual Int. Conf on Eng. in Medicine and Biology Society, pp. 1114 –1117. Kozmann, G., Lux, R. & Green, L. (1989). Sources of variability in normal body surface potential maps, Circulation 17: 1077–1083. Kozmann, G., Lux, R. & Green, L. (2000). Geometrical factors affecting the interindividual variability of the ECG and the VCG, J. Electrocardiology 33: 219–227. Kyoso, M. & Uchiyama, A. (2001). Development of an ECG identification system, 23rd Annual International Conference of the Engin.g in Medicine and Biology Society. Larkin, H. & Hunyor, S. (1980). Precordial voltage variation in the normal electrocardiogram, J. Electrocardiology 13: 347–352. Li, M. & Narayanan, S. (2010). Robust ECG biometrics by fusing temporal and cepstral information, 20th International Conference on Pattern Recognition, pp. 1326 –1329. Molina, G. G., Bruekers, F., Presura, C., Damstra, M. &van der Veen, M. (2007). Morphological sythesis of ECG signals for person authentication, 15th European Signal Proc. Conf., Poland. Odinaka, I., Lai, P.-H., Kaplan, A., O’Sullivan, J., Sirevaag, E., Kristjansson, S., Sheffield, A. & Rohrbaugh, J. (2010). Ecg biometrics: A robust short-time frequency analysis, IEEE International Workshop on Information Forensics and Security, pp. 1 –6. Palaniappan, R. & Krishnan, S. (2004). Identifying individuals using ECG beats, International Conference on Signal Processing and Communications, pp. 569 – 572. Peter Allen, Calais migrants mutilate fingerprints to hide true identity, Daily Mail (n.d.). http://www.dailymail.co.uk/news/worldnews/article-1201126/Calais-migrants- mutilate-fingertips-hide-true-identity.html. Pilkington, T., Barr, R. & Rogers, C. L. (2006). Effect of conductivity interfaces in electrocardiography, Springer New York. 30: 637–643. Plataniotis, K., Hatzinakos, D. & Lee, J. (2006). ECG biometric recognition without fiducial detection, Proc. of Biometrics Symposiums (BSYM), Baltimore, Maryland, USA. 215 Heart Biometrics: Theory, Methods and Applications 18 Will-be-set-by-IN-TECH Saechia, S., Koseeyaporn, J. & Wardkein, P. (2005). Human identification system based ECG signal, TENCON 2005, pp. 1 –4. Shen, T. W., Tompkins, W. J. & Hu, Y. H. (2002). One-lead ECG for identity verification, Proc. of the 2nd Conf. of the IEEE Eng. in Med. and Bio. Society and the Biomed. Eng. Society, Vol. 1, pp. 62–63. Simon, B. P. & Eswaran, C. (1997). An ECG classifier designed using modified decision based neural network, Comput. Biomed. Res. 30: 257–272. Singh, Y. & Gupta, P. (2008). ECG to individual identification, nd IEEE Int. Conf. on Biometrics: Theory, Applications and Systems 2. Sornmo, L. & Laguna, P. (2005). Bioelectrical Signal Processing in Cardiac and Neurological Applications, Elsevier. Tawfik, M., Selim, H. & Kamal, T. (2010). Human identification using time normalized QT signal and the QRS complex of the ECG, 7th International Symposium on Communication Systems Networks and Digital Signal Processing, pp. 755 –759. The US Federal Commission Report, Frebruary 2010 (n.d.). http://www.ftc.gov /sentinel/reports/sentinel-annual-reports/sentinel-cy2009.pdf. Ting, C. M. & Salleh, S. H. (2010). ECG based personal identification using extended kalman filter, 10th International Conference on Information Sciences Signal Processing and their Applications, pp. 774 –777. Venkatesh, N. & Jayaraman, S. (2010). Human electrocardiogram for biometrics using DTW and FLDA, 20th International Conference on Pattern Recognition (ICPR), pp. 3838 –3841. Wübbeler, G., Stavridis, M., Kreiseler, D., Bousseljot, R. & Elster, C. (2007). Verification of humans using the electrocardiogram, Pattern Recogn. Lett. 28(10): 1172–1175. Ye, C., Coimbra, M. & Kumar, B. (2010). Investigation of human identification using two-lead electrocardiogram (ECG) signals, 4th Int. Conf. on Biometrics: Theory Applications and Systems, pp. 1 –8. Zhang, Z. & Wei, D. (2006). A new ECG identification method using bayes’ teorem, TENCON 2006, pp. 1 –4. 216 Biometrics 0 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Francesco Beritelli and Andrea Spadaccini Dipartimento di Ingegneria Elettrica, Elettronica ed Informatica (DIEEI) University of Catania Italy 1. Introduction Identity verification is an increasingly important process in our daily lives. Whether we need to use our own equipment or to prove our identity to third parties in order to use services or gain access to physical places, we are constantly required to declare our identity and prove our claim. Traditional authentication methods fall into two categories: proving that you knowsomething (i.e., password-based authentication) and proving that you own something (i.e., token-based authentication). These methods connect the identity with an alternate and less rich representation, for instance a password, that can be lost, stolen, or shared. A solution to these problems comes from biometric recognition systems. Biometrics offers a natural solution to the authentication problem, as it contributes to the construction of systems that can recognize people by the analysis of their anatomical and/or behavioral characteristics. With biometric systems, the representation of the identity is something that is directly derived from the subject, therefore it has properties that a surrogate representation, like a password or a token, simply cannot have (Jain et al. (2006; 2004); Prabhakar et al. (2003)). The strength of a biometric system is determined mainly by the trait that is used to verify the identity. Plenty of biometric traits have been studied and some of them, like fingerprint, iris and face, are nowadays used in widely deployed systems. Today, one of the most important research directions in the field of biometrics is the characterization of novel biometric traits that can be used in conjunction with other traits, to limit their shortcomings or to enhance their performance. The aim of this chapter is to introduce the reader to the usage of heart sounds for biometric recognition, describing the strengths and the weaknesses of this novel trait and analyzing in detail the methods developed so far and their performance. The usage of heart sounds as physiological biometric traits was first introduced in Beritelli & Serrano (2007), in which the authors proposed and started exploring this idea. Their system is based on the frequency analysis, by means of the Chirp z-Transform (CZT), of the sounds produced by the heart during the closure of the mitral tricuspid valve and during the closure of the aortic pulmonary valve. These sounds, called S1 and S2, are extracted from the input 11 2 Biometrics / Book 1 signal using a segmentation algorithm. The authors build the identity templates using feature vectors and test if the identity claim is true by computing the Euclidean distance between the stored template and the features extracted during the identity verification phase. In Phua et al. (2008), the authors describe a different approach to heart-sounds biometry. Instead of doing a structural analysis of the input signal, they use the whole sequences, feeding them to two recognizers built using Vector Quantization and Gaussian Mixture Models; the latter proves to be the most performant system. In Beritelli & Spadaccini (2009a;b), the authors further develop the system described in Beritelli & Serrano (2007), evaluating its performance on a larger database, choosing a more suitable feature set (Linear Frequency Cepstrum Coefficients, LFCC), adding a time-domain feature specific for heart sounds, called First-to-Second Ratio (FSR) and adding a quality-based data selection algorithm. In Beritelli & Spadaccini (2010a;b), the authors take an alternative approach to the problem, building a system that leverages statistical modelling using Gaussian Mixture Models. This technique is different from Phua et al. (2008) in many ways, most notably the segmentation of the heart sounds, the database, the usage of features specific to heart sounds and the statistical engine. This system proved to yield good performance in spite of a larger database, and the final Equal Error Rate (EER) obtained using this technique is 13.70 % over a database of 165 people, containing two heart sequences per person, each lasting from 20 to 70 seconds. This chapter is structured as follows: in Section 2, we describe in detail the usage of heart sounds for biometric identification, comparing them to other biometric traits, briefly explaining how the human cardio-circulatory system works and produces heart sounds and how they can be processed; in Section 3 we present a survey of recent works on heart-sounds biometry by other research groups; in Section 4 we describe in detail the structural approach; in Section 5 we describe the statistical approach; in Section 6 we compare the performance of the two methods on a common database, describing both the performance metrics and the heart sounds database used for the evaluation; finally, in Section 7 we present our conclusions, and highlight current issues of this method and suggest the directions for the future research. 2. Biometric recognition using heart sounds Biometric recognition is the process of inferring the identity of a person via quantitative analysis of one or more traits, that can be derived either directly from a person’s body (physiological traits) or from one’s behaviour (behavioural traits). Speaking of physiological traits, almost all the parts of the body can already be used for the identification process (Jain et al. (2008)): eyes (iris and retina), face, hand (shape, veins, palmprint, fingerprints), ears, teeth etc. In this chapter, we will focus on an organ that is of fundamental importance for our life: the heart. The heart is involved in the production of two biological signals, the Electrocardiograph(ECG) and the Phonocardiogram(PCG). The first is a signal derived fromthe electrical activity of the organ, while the latter is a recording of the sounds that are produced during its activity (heart sounds). While both signals have been used as biometric traits (see Biel et al. (2001) for ECG-based biometry), this chapter will focus on hearts-sounds biometry. 218 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 3 2.1 Comparison to other biometric traits The paper Jain et al. (2004) presents a classification of available biometric traits with respect to 7 qualities that, according to the authors, a trait should possess: • Universality: each person should possess it; • Distinctiveness: it should be helpful in the distinction between any two people; • Permanence: it should not change over time; • Collectability: it should be quantitatively measurable; • Performance: biometric systems that use it should be reasonably performant, with respect to speed, accuracy and computational requirements; • Acceptability: the users of the biometric system should see the usage of the trait as a natural and trustable thing to do in order to authenticate; • Circumvention: the system should be robust to malicious identification attempts. Each trait is evaluated with respect to each of these qualities using 3 possible qualifiers: H (high), M (medium), L (low). We added to the original table a row with our subjective evaluation of heart-sounds biometry with respect to the qualities described above, in order to compare this new technique with other more established traits. The updated table is reproduced in Table 1. The reasoning behind each of our subjective evaluations of the qualities of heart sounds is as follows: • High Universality: a working heart is a conditio sine qua non for human life; • Medium Distinctiveness: the actual systems’ performance is still far from the most discriminating traits, and the tests are conducted using small databases; the discriminative power of heart sounds still must be demonstrated; • Low Permanence: although to the best of our knowledge no studies have been conducted in this field, we perceive that heart sounds can change their properties over time, so their accuracy over extended time spans must be evaluated; • Low Collectability: the collection of heart sounds is not an immediate process, and electronic stethoscopes must be placed in well-defined positions on the chest to get a high-quality signal; • Low Performance: most of the techniques used for heart-sounds biometry are computationally intensive and, as said before, the accuracy still needs to be improved; • Medium Acceptability: heart sounds are probably identified as unique and trustable by people, but they might be unwilling to use them in daily authentication tasks; • Low Circumvention: it is very difficult to reproduce the heart sound of another person, and it is also difficult to record it covertly in order to reproduce it later. Of course, heart-sounds biometry is a new technique, and some of its drawbacks probably will be addressed and resolved in future research work. 219 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 4 Biometrics / Book 1 Biometric identifier U n i v e r s a l i t y D i s t i n c t i v e n e s s P e r m a n e n c e C o l l e c t a b i l i t y P e r f o r m a n c e A c c e p t a b i l i t y C i r c u m v e n t i o n DNA H H H L H L L Ear M M H M M H M Face H L M H L H H Facial thermogram H H L H M H L Fingerprint M H H M H M M Gait M L L H L H M Hand geometry M M M H M M M Hand vein M M M M M M L Iris H H H M H L L Keystroke L L L M L M M Odor H H H L L M L Palmprint M H H M H M M Retina H H M L H L L Signature L L L H L H H Voice M L M L L M H Heart sounds H M L L L M L Table 1. Comparison between biometric traits as in Jain et al. (2004) and heart sounds 2.2 Physiology and structure of heart sounds The heart sound signal is a complex, non-stationary and quasi-periodic signal that is produced by the heart during its continuous pumping work (Sabarimalai Manikandan &Soman (2010)). It is composed by several smaller sounds, each associated with a specific event in the working cycle of the heart. Heart sounds fall in two categories: • primary sounds, produced by the closure of the heart valves; • other sounds, produced by the blood flowing in the heart or by pathologies; The primary sounds are S1 and S2. The first sound, S1, is caused by the closure of the tricuspid and mitral valves, while the second sound, S2, is caused by the closure of the aortic and pulmonary valves. Among the other sounds, there are the S3 and S4 sounds, that are quieter and rarer than S1 and S2, and murmurs, that are high-frequency noises. In our systems, we only use the primary sounds because they are the two loudest sounds and they are the only ones that a heart always produces, even in pathological conditions. We separate them from the rest of the heart sound signal using the algorithm described in Section 2.3.1. 2.3 Processing heart sounds Heart sounds are monodimensional signals, and can be processed, to some extent, with techniques known to work on other monodimensional signals, like audio signals. Those 220 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 5 techniques then need to be refined taking into account the peculiarities of the signal, its structure and components. In this section we will describe an algorithm used to separate the S1 and S2 sounds from the rest of the heart sound signal (2.3.1) and three algorithms used for feature extraction (2.3.2, 2.3.3, 2.3.4), that is the process of transforming the original heart sound signal into a more compact, and possibly more meaningful, representation. We will briefly discuss two algorithms that work in the frequency domain, and one in the time domain. 2.3.1 Segmentation In this section we describe a variation of the algorithm that was employed in (Beritelli & Serrano (2007)) to separate the S1 and S2 tones from the rest of the heart sound signal, improved to deal with long heart sounds. Such a separation is done because we believe that the S1 and S2 tones are as important to heart sounds as the vowels are to the voice signal. They are stationary in the short term and they convey significant biometric information, that is then processed by feature extraction algorithms. A simple energy-based approach can not be used because the signal can contain impulsive noise that could be mistaken for a significant sound. The first step of the algorithm is searching the frame with the highest energy, that is called SX1. At this stage, we do not know if we found an S1 or an S2 sound. Then, in order to estimate the frequency of the heart beat, and therefore the period P of the signal, the maximum value of the autocorrelation function is computed. Low-frequency components are ignored by searching only over the portion of autocorrelation after the first minimum. The algorithm then searches other maxima to the left and to the right of SX1, moving by a number P of frames in each direction and searching for local maxima in a window of the energy signal in order to take into account small fluctuations of the heart rate. After each maximum is selected, a constant-width window is applied to select a portion of the signal. After having completed the search that starts from SX1, all the corresponding frames in the original signal are zeroed out, and the procedure is repeated to find a new maximum-energy frame, called SX2, and the other peaks are found in the same way. Finally, the positions of SX1 and SX2 are compared, and the algorithmthen decides if SX1, and all the frames found starting from it, must be classified as S1 or S2; the remaining identified frames are classified accordingly. The nature of this algorithm requires that it work on short sequences, 4 to 6 seconds long, because as the sequence gets longer the periodicity of the sequence fades away due to noise and variations of the heart rate. To overcome this problem, the signal is split into 4-seconds wide windows and the algorithm is applied to each window. The resulting sets of heart sounds endpoint are then joined into a single set. 2.3.2 The chirp z-transform The Chirp z-Transform (CZT) is an algorithm for the computation of the z-Transform of sampled signals that offers some additional flexibility to the Fast Fourier Transform (FFT) algorithm. 221 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 6 Biometrics / Book 1 Fig. 1. Example of S1 and S2 detection The main advantage of the CZT exploited in the analysis of heart sounds is the fact that it allows high-resolution analysis of narrow frequency bands, offering higher resolution than the FFT. For more details on the CZT, please refer to Rabiner et al. (1969) 2.3.3 Cepstral analysis Mel-Frequency Cepstrum Coefficients (MFCC) are one of the most widespread parametric representation of audio signals (Davis & Mermelstein (1980)). The basic idea of MFCC is the extraction of cepstrum coefficients using a non-linearly spaced filterbank; the filterbank is instead spaced according to the Mel Scale: filters are linearly spaced up to 1 kHz, and then are logarithmically spaced, decreasing detail as the frequency increases. This scale is useful because it takes into account the way we perceive sounds. The relation between the Mel frequency ˆ f mel and the linear frequency f lin is the following: ˆ f mel = 2595 · log 10 1 + f lin 700 (1) Some heart-sound biometry systems use MFCC, while others use a linearly-spaced filterbank. The first step of the algorithm is to compute the FFT of the input signal; the spectrum is then feeded to the filterbank, and the i-th cepstrum coefficient is computed using the following formula: C i = K ∑ k=1 X k · cos i · k − 1 2 · π K i = 0, ..., M (2) where K is the number of filters in the filterbank, X k is the log-energy output of the k-th filter and M is the number of coefficients that must be computed. Many parameters have to be chosen when computing cepstrum coefficients. Among them: the bandwidth and the scale of the filterbank (Mel vs. linear), the number and spectral width of filters, the number of coefficients. In addition to this, differential cepstrum coefficients, tipically denoted using a Δ (first order) or ΔΔ (second order), can be computed and used. Figure 2 shows an example of three S1 sounds and the relative MFCC spectrograms; the first two (a, b) belong to the same person, while the third (c) belongs to a different person. 222 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 7 Fig. 2. Example of waveforms and MFCC spectrograms of S1 sounds 2.3.4 The First-to-Second Ratio (FSR) In addition to standard feature extraction techniques, it would be desirable to develop ad-hoc features for the heart sound, as it is not a simple audio sequence but has specific properties that could be exploited to develop features with additional discriminative power. This is why we propose a time-domain feature called First-to-Second Ratio (FSR). Intuitively, the FSR represents the power ratio of the first heart sound (S1) to the second heart sound (S2). During our work, we observed that some people tend to have an S1 sound that is louder than S2, while in others this balance is inverted. We try to represent this diversity using our new feature. The implementation of the feature is different in the two biometric systems that we described in this chapter, and a discussion of the two algorithms can be found in 4.4 and 5.4. 3. Review of related works In the last years, different research groups have been studying the possibility of using heart sounds for biometric recognition. In this section, we will briefly describe their methods. In Table 2 we summarized the main characteristics of the works that will be analyzed in this section, using the following criteria: • Database - the number of people involved in the study and the amount of heart sounds recorded from each of them; 223 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 8 Biometrics / Book 1 • Features - which features were extracted from the signal, at frame level or from the whole sequence; • Classification - how features were used to make a decision. We chose not to represent performance in this table for two reasons: first, most papers do not adopt the same performance metric, so it would be difficult to compare them; second, the database and the approach used are quite different one from another, so it would not be a fair comparison. Paper Database Features Classification Phua et al. (2008) 10 people MFCC GMM 100 HS each LBFC VQ Tran et al. (2010) 52 people Multiple SVM 100m each Jasper & Othman (2010) 10 people Energy Euclidean 20 HS each peaks distance Fatemian et al. (2010) 21 people MFCC, LDA, Euclidean 6 HS each energy peaks distance 8 seconds per HS El-Bendary et al. (2010) 40 people autocorrelation MSE 10 HS cross-correlation kNN 10 seconds per HS complex cepstrum Table 2. Comparison of recent works about heart-sound biometrics In the rest of the section, we will briefly review each of these papers. Phua et al. (2008) was one of the first works in the field of heart-sounds biometry. In this paper, the authors first do a quick exploration of the feasibility of using heart sounds as a biometric trait, by recording a test database composed of 128 people, using 1-minute heart sounds and splitting the same signal into a train and a testing sequence. Having obtained goodrecognition performance using the HTK Speech Recognition toolkit, they do a deeper test using a database recorded from 10 people and containing 100 sounds for each person, investigating the performance of the system using different feature extraction algorithms (MFCC, Linear Frequency Band Cepstra (LFBC)), different classification schemes (Vector Quantization (VQ) and Gaussian Mixture Models (GMM)) and investigating the impact of the frame size and of the training/test length. After testing many combinations of those parameters, they conclude that, on their database, the most performing system is composed of LFBC features (60 cepstra + log-energy + 256ms frames with no overlap), GMM-4 classification, 30s of training/test length. The authors of Tran et al. (2010), one of which worked on Phua et al. (2008), take the idea of finding a good and representative feature set for heart sounds even further, exploring 7 sets of features: temporal shape, spectral shape, cepstral coefficientrs, harmonic features, rhythmic features, cardiac features and the GMM supervector. They then feed all those features to a feature selection method called RFE-SVM and use two feature selection strategies (optimal and sub-optimal) to find the best set of features among the ones they considered. The tests 224 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 9 were conducted on a database of 52 people and the results, expressed in terms of Equal Error Rate (EER), are better for the automatically selected feature sets with respect to the EERs computed over each individual feature set. In Jasper & Othman (2010), the authors describe an experimental system where the signal is first downsampled from 11025 Hz to 2205 Hz; then it is processed using the Discrete Wavelet Transform, using the Daubechies-6 wavelet, and the D4 and D5 subbands (34 to 138 Hz) are then selected for further processing. After a normalization and framing step, the authors then extract from the signal some energy parameters, and they find that, among the ones considered, the Shannon energy envelogram is the feature that gives the best performance on their database of 10 people. The authors of Fatemian et al. (2010) do not propose a pure-PCG approach, but they rather investigate the usage of both the ECG and PCG for biometric recognition. In this short summary, we will focus only on the part of their work that is related to PCG. The heart sounds are processed using the Daubechies-5 wavelet, up to the 5th scale, and retaining only coefficients from the 3rd, 4th and 5th scales. They then use two energy thresholds (low and high), to select which coefficients should be used for further stages. The remaining frames are then processed using the Short-Term Fourier Transform (STFT), the Mel-Frequency filterbank and Linear Discriminant Analysis (LDA) for dimensionality reduction. The decision is made using the Euclidean distance from the feature vector obtained in this way and the template stored in the database. They test the PCG-based system on a database of 21 people, and their combined PCG-ECG systems has better performance. The authors of El-Bendary et al. (2010) filter the signal using the DWT; then they extract different kinds of features: auto-correlation, cross-correlation and cepstra. They then test the identities of people in their database, that is composed by 40 people, using two classifiers: Mean Square Error (MSE) and k-Nearest Neighbor (kNN). On their database, the kNN classifier performs better than the MSE one. 4. The structural approach to heart-sounds biometry The first system that we describe in depth was introduced in Beritelli & Serrano (2007); it was designed to work with short heart sounds, 4 to 6 seconds long and thus containing at least four cardiac cycles (S1-S2). The restriction on the length of the heart sound was removed in Beritelli & Spadaccini (2009a), that introduced the quality-based best subsequence selection algorithm, described in 4.1. We call this system “structural” because the identity templates are stored as feature vectors, in opposition to the “statistical” approach, that does not directly keep the feature vectors but instead it represents identities via statistical parameters inferred in the learning phase. Figure 3 contains the block diagram of the system. Each of the steps will be described in the following sections. 4.1 The best subsequence selection algorithm The fact that the segmentation and matching algorithms of the original systemwere designed to work on short sequences was a strong constraint for the system. It was required that a human operator selected a portion of the input signal based on some subjective assumptions. It was clearly a flaw that needed to be addressed in further versions of the system. 225 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 10 Biometrics / Book 1 detector S1/S2 endpoint detector S1/S2 sounds S1/S2 sounds MFCC FSR Template yes no Matcher ˆ x(n) x(n) Low-pass filter Best subsequence detector Fig. 3. Block diagram of the proposed cardiac biometry system To resolve this issue, the authors developed a quality-based subsequence selection algorithm, based on the definition of a quality index DHS QI (i) for each contiguous subsequence i of the input signal. The quality index is based on a cepstral similarity criterion: the selectedsubsequence is the one for which the cepstral distance of the tones is the lowest possible. So, for a given subsequence i, the quality index is defined as: DHS QI (i) = 1 4 ∑ k=1 4 ∑ j=1 j=k d S1 (j, k) + 4 ∑ k=1 4 ∑ j=1 j=k d S2 (j, k) (3) Where d S1 and d S2 are the cepstral distances defined in 4.5. The subsequence i with the maximum value of DHS QI (i) is then selected as the best one and retained for further processing, while the rest of the input signal is discarded. 4.2 Filtering and segmentation After the best subsequence selection, the signal is then given in input to the heart sound endpoint detection algorithm described in 2.3.1. The endpoints that it finds are then used to extract the relevant portions of the signal over a version of the heart sound signal that was previously filtered using a low-pass filter, which removed the high-frequency extraneous components. 4.3 Feature extraction The heart sounds are then passed to the feature extraction module, that computes the cepstral features according to the algorithm described in 2.3. This systemuses M = 12 MFCCcoefficients, with the addition of a 13-th coefficient computed using an i = 0 value in Equation 2, that is the log-energy of the analyzed sound. 4.4 Computation of the First-to-Second Ratio For each input signal, the system computes the FSR according to the following algorithm. Let N be the number of complete S1-S2 cardiac cycles in the signal. Let P S1 i (resp. P S2 i ) be the power of the i-th S1 (resp. S2) sound. We can then define P S1 and P S2 , the average powers of S1 and S2 heart sounds: 226 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 11 P S1 = 1 N N ∑ i=1 P S1 i (4) P S2 = 1 N N ∑ i=1 P S2 i (5) Using these definitions, we can then define the First-to-Second Ration of a given heart sound signal as: FSR = P S1 P S2 (6) For two given DHS sequences x 1 and x 2 , we define the FSR distance as: d FSR (x 1 , x 2 ) = |FSR dB (x 1 ) − FSR dB (x 2 )| (7) 4.5 Matching and identity verification The crucial point of identity verification is the computation of the distance between the feature set that represents the input signal and the template associated with the identity claimed in the acquisition phase by the person that is trying to be authenticated by the system. This system employs two kinds of distance: the first in the cepstral domain and the second using the FSR. MFCC are compared using the Euclidean metric (d 2 ). Given two heart sound signals X and Y, let X S1 (i) (resp. X S2 (i)) be the feature vector for the i-th S1 (resp. S2) sound of the X signal and Y S1 and Y S2 the analogous vectors for the Y signal. Then the cepstral distances between X and Y can be defined as follows: d S1 (X, Y) = 1 N 2 N ∑ i,j=1 d 2 (X S1 (i), Y S1 (j)) (8) d S2 (X, Y) = 1 N 2 N ∑ i,j=1 d 2 (X S2 (i), Y S2 (j)) (9) Now let us take into account the FSR. Starting from the d FSR as defined in Equation 7, we wanted this distance to act like an amplifying factor for the cepstral distance, making the distance bigger when it has an high value while not changing the distance for low values. We then normalized the values of d FSR between 0 and 1 (d FSR norm ), we chose a threshold of activation of the FSR (th F SR) and we defined defined k FSR , an amplifying factor used in the matching phase, as follows: k FSR = max 1, d FSR norm th FSR (10) In this way, if the normalized FSR distance is lower than th FSR it has no effect on the final score, but if it is larger, it will increase the cepstral distance. Finally, the distance between X and Y can be computed as follows: d(X, Y) = k FSR · d S1 (X, Y) 2 + d S2 (X, Y) 2 (11) 227 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 12 Biometrics / Book 1 5. The statistical approach to heart-sounds biometry In opposition to the systemanalyzed in Section 4, the one that will be described in this section is based on a learning process that does not directly take advantage of the features extracted from the heart sounds, but instead uses them to infer a statistical model of the identity and makes a decision computing the probability that the input signal belongs to the person whose identity was claimed in the identity verification process. 5.1 Gaussian Mixture Models Gaussian Mixture Models (GMM) are a powerful statistical tool used for the estimation of multidimensional probability density representationand estimation (Reynolds &Rose (1995)). A GMM λ is a weighted sum of N Gaussian probability densities: p(x|λ) = N ∑ i=1 w i p i (x) (12) where x is a D-dimensional data vector, whose probability is being estimated, and w i is the weight of the i-th probability density, that is defined as: p i (x) = 1 (2π) D |Σ i | e − 1 2 (x−μ i ) Σ i (x−μ i ) The parameters of p i are μ i (∈ R D ) and Σ i (∈ R D×D ), that together with w i (∈ R N ) form the set of values that represent the GMM: λ = {w i , μ i , Σ i } (13) Those parameters of the model are learned in the training phase using the Expectation-Maximization algorithm (McLachlan & Krishnan (1997)), using as input data the feature vectors extracted from the heart sounds. 5.2 The GMM/UBM method The problem of verifying whether an input heart sound signal s belongs to a stated identity I is equivalent to a hypothesis test between two hypotheses: H 0 : s belongs to I H 1 : s does not belong to I This decision can be taken using a likelihood test: S(s, I) = p(s|H 0 ) p(s|H 1 ) ⎧ ⎨ ⎩ ≥ θ accept H 0 < θ reject H 0 (14) where θ is the decision threshold, a fundamental systemparameter that is chosen in the design phase. The probability p(s|H 0 ), in our system, computed using Gaussian Mixture Models. 228 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 13 The input signal is converted by the front-end algorithms to a set of K feature vectors, each of dimension D, so: p(s|H 0 ) = K ∏ j=1 p(x j |λ I ) (15) In Equation 14, the p(s|H 1 ) is still missing. In the GMM/UBM framework (Reynolds et al. (2000)), this probability is modelled by building a model trained with a set of identities that represent the demographic variability of the people that might use the system. This model is called Universal Background Model (UBM). The UBMis created during the systemdesign, and is subsequently used every time the system must compute a matching score. The final score of the identity verification process, expressed in terms of log-likelihood ratio, is Λ(s) = log S(s, I) = log p(s|λ I ) −log p(s|λ W ) (16) 5.3 Front-end processing Each time the system gets an input file, whether for training a model or for identity verification, it goes through some common steps. First, heart sounds segmentation is carried on, using the algorithm described in Section 2.3.1. Then, cepstral features are extracted using a tool called sfbcep, part of the SPro suite (Gravier (2003)). Finally, the FSR, computed as described in Section 5.4, is appended to each feature vector. 5.4 Application of the First-to-Second Ratio The FSR, as first defined in Section 4.4, is a sequence-wise feature, i.e., it is defined for the whole input signal. It is then used in the matching phase to modify the resulting score. In the context of the statistical approach, it seemed more appropriate to just append the FSR to the feature vector computed from each frame in the feature extraction phase, and then let the GMM algorithms generalize this knowledge. To do this, we split the input heart sound signal in 5-second windows and we compute an average FSR (FSR) for each signal window. It is then appended to each feature vector computed from frames inside the window. 5.5 The experimental framework The experimental set-up created for the evaluation of this technique was implemented using some tools provided by ALIZE/SpkDet , an open source toolkit for speaker recognition developed by the ELISA consortium between 2004 and 2008 (Bonastre et al. (n.d.)). The adaptation of parts of a system designed for speaker recognition to a different problem was possible because the toolkit is sufficiently general and flexible, and because the features used for heart-sounds biometry are similar to the ones used for speaker recognition, as outlined in Section 2.3. During the world training phase, the system estimates the parameters of the world model λ W using a randomly selected subset of the input signals. The identity models λ i are then derived from the world model W using the Maximum A-Posteriori (MAP) algorithm. During identity verification, the matching score is computed using Equation 16, and the final decision is taken comparing the score to a threshold (θ), as described in Equation 14 229 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 14 Biometrics / Book 1 5.6 Optimization of the method During the development of the system, some parameters have been tuned in order to get the best performance. Namely, three different cepstral feature sets have been considered in (Beritelli & Spadaccini (2010b)): • 16 + 16 Δ + E + ΔE • 16 + 16 Δ + 16 ΔΔ • 19 + 19 Δ + E + ΔE However, the first of these sets proved to be the most effective In (Beritelli & Spadaccini (2010a)) the impact of the FSR and of the number of Gaussian densities in the mixtures was studied. Four different model sizes (128, 256, 512, 1024) were tested, with and without FSR, and the best combination of those parameters, on our database, is 256 Gaussians with FSR. 6. Performance evaluation In this section, we will compare the performance of the two systems described in Section 4 and 5 using a common heart sounds database, that will be further described in Section 6.1. 6.1 Heart sounds database One of the drawbacks of this biometric trait is the absence of large enough heart sound databases, that are needed for the validation of biometric systems. To overcome this problem, we are building a heart sounds database suitable for identity verification performance evaluation. Currently, there are 206 people in the database, 157 male and 49 female; for each person, there are two separate recordings, each lasting from 20 to 70 seconds; the average length of the recordings is 45 seconds. The heart sounds have been acquired using a Thinklabs Rhythm Digital Electronic Stethoscope, connected to a computer via an audio card. The sounds have been converted to the Wave audio format, using 16 bit per second and at a rate of 11025 Hz. One of the two recordings available for each personused to build the models, while the other is used for the computation of matching scores. 6.2 Metrics for performance evaluation A biometric identity verification system can be seen as a binary classifier. Binary classification systems work by comparing matching scores to a threshold; their accuracy is closely linked with the choice of the threshold, which must be selected according to the context of the system. There are two possible errors that a binary classifier can make: • False Match (Type I Error): accept an identity claim even if the template does not match with the model; • False Non-Match (Type II Error): reject an identity claim even if the template matches with the model The importance of errors depends on the context in which the biometric system operates; for instance, in a high-security environment, a Type I error can be critical, while Type II errors could be tolerated. 230 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 15 When evaluating the performance of a biometric system, however, we need to take a threshold-independent approach, because we cannot know its applications in advance. A common performance measure is the Equal Error Rate (EER) (Jain et al. (2008)), defined as the error rate at which the False Match Rate (FMR) is equal to the False Non-Match Rate (FNMR). A finer evaluation of biometric systems can be done by plotting the Detection Error Tradeoff (DET) curve, that is the plot of FMR against FNMR. This allows to study their performance when a low FNMR or FMR is imposed to the system. The DET curve represents the trade-off between security and usability. A system with low FMR is a highly secure one but will lead to more non-matches, and can require the user to try the authentication step more times; a system with low FNMR will be more tolerant and permissive, but will make more false match errors, thus letting more unauthorized users to get a positive match. The choice between the two setups, and between all the intermediate security levels, is strictly application-dependent. 6.3 Results The performance of our two systems has been computed over the heart sounds database, and the results are reported in Table 3. System EER (%) Structural 36.86 Statistical 13.66 Table 3. Performance evaluation of the two heart-sounds biometry systems The huge difference in the performance of the two systems reflects the fact that the first one is not being actively developed since 2009, and it was designed to work on small databases, while the second has already proved to work well on larger databases. It is important to highlight that, in spite of a 25% increment of the size of the database, the error rate remained almost constant with respect to the last evaluation of the system, in which a test over a 165 people database yielded a 13.70% EER. Figure 4 shows the Detection Error Trade-off (DET) curves of the two systems. As stated before, a DET curve shows how the analyzed systemperforms in terms of false matches/false non-matches as the system threshold is changed. In both cases, fixing a false match (resp. false non-match) rate, the systemthat performs better is the one with the lowest false non-match (resp. false match) rate. Looking at Figure 4, it is easy to understand that the statistical system performs better in both high-security (e.g., FMR = 1-2%) and low-security (e.g., FNMR = 1-2%) setups. We can therefore conclude that the statistical approach is definitely more promising that the structural one, at least with the current algorithms and using the database described in 6.1.. 7. Conclusions In this chapter, we presented a novel biometric identification technique that is based on heart sounds. After introducing the advantages and shortcomings of this biometric trait with respect to other traits, we explained how our body produces heart sounds, and the algorithms used to process them. 231 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 16 Biometrics / Book 1 Fig. 4. Detection Error Tradeoff (DET) curves of the two systems A survey of recent works on this field written by other research groups has been presented, showing that there has been a recent increase of interest of the research community in this novel trait. Then, we described the two systems that we built for biometric identification based on heart sounds, one using a structural approach and another leveraging Gaussian Mixture Models. We compared their performance over a database containing more than 200 people, concluding that the statistical system performs better. 7.1 Future directions As this chapter has shown, heart sounds biometry is a promising research topic in the field of novel biometric traits. So far, the academic community has produced several works on this topic, but most of them share the problem that the evaluation is carried on over small databases, making the results obtained difficult to generalize. We feel that the community should start a joint effort for the development of systems and algorithms for heart-sounds biometry, at least creating a common database to be used for the evaluation of different research systems over a shared dataset that will make possible to compare their performance in order to refine them and, over time, develop techniques that might be deployed in real-world scenarios. As larger databases of heart sounds become available to the scientific community, there are some issues that need to be addressed in future research. First of all, the identification performance should be kept low even for larger databases. This means that the matching algorithms will be fine-tuned and a suitable feature set will be identified, probably containing both elements from the frequency domain and the time domain. Next, the mid-term and long-term reliability of heart sounds will be assessed, analyzing how their biometric properties change as time goes by. Additionally, the impact of cardiac diseases on the identification performance will be assessed. 232 Biometrics Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 17 Finally, when the algorithms will be more mature and several independent scientific evaluations will have given positive feedback on the idea, some practical issues like computational efficiency will be tackled, and possibly ad-hoc sensors with embedded matching algorithms will be developed, thus making heart-sounds biometry a suitable alternative to the mainstream biometric traits. 8. References Beritelli, F. & Serrano, S. (2007). Biometric Identification based on Frequency Analysis of Cardiac Sounds, IEEE Transactions on Information Forensics and Security 2(3): 596–604. Beritelli, F. & Spadaccini, A. (2009a). Heart sounds quality analysis for automatic cardiac biometry applications, Proceedings of the 1st IEEE International Workshop on Information Forensics and Security. Beritelli, F. & Spadaccini, A. (2009b). Human Identity Verification based on Mel Frequency Analysis of Digital Heart Sounds, Proceedings of the 16th International Conference on Digital Signal Processing. Beritelli, F. & Spadaccini, A. (2010a). An improved biometric identification system based on heart sounds and gaussian mixture models, Proceedings of the 2010 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications, IEEE, pp. 31–35. Beritelli, F. & Spadaccini, A. (2010b). A statistical approach to biometric identity verification based on heart sounds, Proceedings of the Fourth International Conference on Emerging Security Information, Systems and Technologies (SECURWARE2010), IEEE, pp. 93–96. URL: http://dx.medra.org/10.1109/SECURWARE.2010.23 Biel, L. & Pettersson, O. & Philipson, L. & Wide, P. (2001). ECG Analysis: A New Approach in Human Identification, IEEE Transactions on Instrumentation and Measurement 50(3): 808–812. Bonastre, J.-F., Scheffer, N., Matrouf, D., Fredouille, C., Larcher, A., Preti, R., Pouchoulin, G., Evans, N., Fauve, B. & Mason, J. (n.d.). Alize/Spkdet: a state-of-the-art open source software for speaker recognition. Davis, S. & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing 28(4): 357–366. El-Bendary, N., Al-Qaheri, H., Zawbaa, H. M., Hamed, M., Hassanien, A. E., Zhao, Q. & Abraham, A. (2010). Hsas: Heart sound authentication system, Nature and Biologically Inspired Computing (NaBIC), 2010 Second World Congress on, pp. 351 –356. Fatemian, S., Agrafioti, F. & Hatzinakos, D. (2010). Heartid: Cardiac biometric recognition, Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on, pp. 1 –5. Gravier, G. (2003). SPro: speech signal processing toolkit. URL: http://gforge.inria.fr/projects/spro Jain, A. K., Flynn, P. & Ross, A. A. (2008). Handbook of Biometrics, Springer. Jain, A. K., Ross, A. A. & Pankanti, S. (2006). Biometrics: A tool for information security, IEEE Transactions on Information Forensics and Security 1(2): 125–143. Jain, A. K., Ross, A. A. & Prabhakar, S. (2004). An introduction to biometric recognition, IEEE Transactions on Circuits and Systems for Video Technology 14(2): 4–20. 233 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 18 Biometrics / Book 1 Jasper, J. & Othman, K. (2010). Feature extraction for human identification based on envelogram signal analysis of cardiac sounds in time-frequency domain, Electronics and Information Engineering (ICEIE), 2010 International Conference On, Vol. 2, pp. V2–228 –V2–233. McLachlan, G. J. & Krishnan, T. (1997). The EM Algorithm and Extensions, Wiley. Phua, K., Chen, J., Dat, T. H. & Shue, L. (2008). Heart sound as a biometric, Pattern Recognition 41(3): 906–919. Prabhakar, S., Pankanti, S. & Jain, A. K. (2003). Biometric recognition: Security & privacy concerns, IEEE Security and Privacy Magazine 1(2): 33–42. Rabiner, L., Schafer, R. & Rader, C. (1969). The chirp z-transform algorithm, Audio and Electroacoustics, IEEE Transactions on 17(2): 86 – 92. Reynolds, D. A., Quatieri, T. F. & Dunn, R. B. (2000). Speaker verification using adapted gaussian mixture models, Digital Signal Processing, p. 2000. Reynolds, D. A. & Rose, R. C. (1995). Robust text-independent speaker identification using gaussian mixture speaker models, IEEE Transactions on Speech and Audio Processing 3: 72–83. Sabarimalai Manikandan, M. & Soman, K. (2010). Robust heart sound activity detection in noisy environments, Electronics Letters 46(16): 1100 –1102. Tran, D. H., Leng, Y. R. & Li, H. (2010). Feature integration for heart sound biometrics, Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 1714 –1717. 234 Biometrics 12 Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli Makoto Fukumoto and Hiroki Hasegawa Fukuoka Institute of Technology, Mukogawa Women’s University, Japan 1. Introduction Music is widely believed as one of the most effective media forms that effect on human psycho-physiologically. Many people expect the effects of music and uses music pieces in various situations. For example, relaxation effect of music is used in change in personal mind in daily life, creation of sedative atmosphere in a home, and therapeutic purposes. Oppositely, parts of music pieces excite people in disco and party. These various effects of music have been investigated in many previous studies. Especially, relaxation effects of music were investigated from various view points with psycho-physiological experiments. Although many previous studies investigated the effects, the effects have not been clarified at all. One of the reasons of that is existence of many music factors; melody, tempo, harmony, rhythm, etc. Anyway, investigations of music and its effects will contribute to theoretical use of music for therapy and so on. This chapter aims to investigate the temporal change in heartbeat intervals in a transition between different sound stimuli. Heartbeat is one of the most important physiological indices and is often used as physiological index to investigate the effect of music and sound stimuli, because the heartbeat reflects autonomic nervous activity [Pappano, 2008] of a listener and is easy to measure. Furthermore, a device measuring electrocardiogram is generally cheaper than devices measuring other physiological indices such as electroencephalogram. Although many previous studies have investigated the effect of music and sound with heartbeat interval as physiological index, very few previous studies have investigated the change in heartbeat in the transition of different stimulus; most of the previous studies have observed average of heartbeat intervals a certain range in pre- and post-listening sound stimulus. Observing temporal change in heartbeat is important and contributes to improvement of exposure method of music and sound to the listeners. For example, time intervals eliciting effect of music and sound and decreasing the effect are important information to determine the time length to exposure them to the listeners. The experimental method in the present study is set by referring to our previous study [Fukumoto et al., 2009]. As further investigations, we newly add No-sound stimulus to relaxation music piece (Air by Bach), white noise, which are music and sound stimuli employed as sound stimuli in the previous study. In the listening experiment of our previous study, two min relaxing music piece and white noise were employed as the different sound stimuli, and these sound stimuli were played twice alternately after five min rest. The alternate exposure of different sound stimuli is also employed in the present study. Biometrics 236 By employing No-sound stimulus as sound stimulus, we can observe change in heartbeat in start and end of noise and music stimuli. It means observing the change in heartbeat caused by listening sound stimuli. Objectives of this study are fundamental investigations of temporal change in heartbeat in a transition between sound stimuli; music, noise, and No-sound. Figure 1 illustrates the objectives and explains that listening sounds elicit deceleration or acceleration of heartbeat. Time cost is also investigated. The results of this study will contribute to develop the studies in music therapy and the studies on musical system reflecting a user's KANSEI information automatically. Fig. 1. Illustration of objectives of this study. As mentioned above, we generally believe relaxation and excitation effects of sounds and utilize the effect in music therapy, reducing patient’s anxiety during a surgical operation and personal use to quickly change our mind and so on. Especially for the effects of music, many researchers have investigated the effect with psycho-physiological indices [Dainow, 1977], however, the effects have not been clarified completely yet. Heartbeat has been often used to investigate the effects of music, because heartbeat is non- invasive physiological index being measured with relatively convenient and low-cost device. Moreover, as mentioned above, heartbeat reflects autonomic nervous activity, and the decrease of the heart rate is caused by a combination of two factors; increase in parasympathetic activity and decrease in sympathetic activity [Pappano, 2008]. Most of the previous studies using heartbeat as physiological index have tried to investigate the effect of music by comparing heartbeats before and after listening to the musical piece. However, heartbeat information has not been used well, because we do not know how long time the sound stimuli would cost to elicit the change of heartbeat. According to a previous study, impression for the listening to music piece on the listener is determined with 1 s [Bigand et al., 2005]. Referring to this finding, response of heartbeat needs longer than 1 s, because general physiological changes come after psychological changes. Some previous studies have investigated the temporal cost that sound elicits heartbeat approximately with different sound stimuli and different conditions. Etzel et al. have investigated the physiological effects of musical pieces inducing different moods on listener [Etzel et al., 2005]. They have not mentioned about the concrete time cost, however, the Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 237 results of heart rate development seem to elicit the change of heart rate from 30 s to 40 s. Gomez et al. have investigated the physiological effects of 30 s noise and music [Gomez & Danuser, 2004]. The results of their study showed that part of noises and musical pieces elicited physiological change including heart rate within 30 s. Moreover, Hazama et al. have associated listener’s preference (with 2-point scale) and heartbeat interval for 40 s musical pieces obtained from their evolutionary computation method composing musical piece [Hazama & Fukumoto, 2009]. Their result showed different distribution of heartbeat intervals for preferred and not preferred musical pieces. These previous studies give us useful and interesting information about the time cost, however, precise time that sound stimulus need to elicit the change of heartbeat has not been clarified. In our previous study [Fukumoto et al., 2009], average of heartbeat intervals in music section was larger than that in noise section significantly (P < 0.05). Furthermore, after the transition of sound stimuli, it took about 30 s to change heartbeat interval from previous section. In the investigation of the change in heartbeat interval, 20 s sliding window was employed. Additionally, a questionnaire asking relaxation feeling was used as psychological index. After the exposure of the sound stimuli, a questionnaire asks the subjects relaxation feeling for noise and music sections, respectively. Psychological result showed that music section induced the higher relaxation feeling than noise section significantly (P < 0.001). The significant difference in subjective relaxation feeling meant that the relaxing music piece and the noise were greatly different. These results in our previous study support the findings by other previous studies that investigated the effects of sound stimuli on heartbeat. What kind areas does this study contribute for? First, as described above, theoretical use of music is first candidate of the application of this study. For example, in Guided Imagery Method, patients listen to several music pieces for a long time. If a therapist know a time length that music pieces need to elicit change in heartbeat, the knowledge must be useful to make a selection of music pieces. From engineering point of view, with a mind to develop a musical system that reflects user’s KANSEI and psychological condition of listeners automatically, several previous studies have investigated the psycho-physiological response for sound stimulus with various physiological indices [Aoto & Ookura, 2007; Chung & Vercoe, 2006; Hazama & Fukumoto, 2009; Healy et al., 1998; Kim & André, 2004; Sugimoto et al., 2008; Yoshida et al., 2006]. Heartbeat is included the indices, and revealing time cost eliciting the change of heartbeat by sounds will contribute to musical information techniques such as automatic musical composition based on physiological index [Fukumoto & Imai, 2008]. This section has explained the background, the previous studies, and the applications as introductions of this study. The remains of this chapter are constructed as below. The section 2 describes experimental method and sound materials, and the section 3 shows results of the experiment. The section 4 discusses the effects of the sound stimuli on temporal change in heartbeat based on the experimental results. Finally, the section 5 concludes this chapter. 2. Procedure and materials This section describes experimental method used in this study. Basically, the experimental method used in this study is referring to our previous study [Fukumoto et al., 2009]: different sound stimuli are included in one experimental set, and electrocardiogram is measured in listening to the sound stimuli. Music and noise were used in our previous Biometrics 238 study. To investigate the effects of various transitions of sound stimuli, we add mute sound as a sound stimulus. Therefore, three kinds of transitions and their inverted sequences are used in the listening experiment. 2.1 Procedure Time length of one experimental set was thirteen min. The set of the listening experiment was basically constructed from two parts; five min rest and eight min sound stimuli. The part of eight min sound stimuli was composed of two min different sound stimuli. These sounds were played two times respectively with change places their sequence. Figure 2 shows example of one experimental set. Especially, the change in heartbeat intervals in second presentations of Sound A and B were used in analysis, because the subjects already realized the sound stimuli in second presentations; remove of surprise for first time listening to the sound stimulus. Based on this experimental set, three experiments were performed, and each of the experiments included two different sound stimuli; Noise and Music, Noise and No-sound, Music and No-sound. Sixteen males and females (mean age: 21.9±0.7 years) participated in the listening experiments as subjects. None of the subjects had professional or college-level music experience. Beforehand for the experiments, the subjects were instructed not to eat, drink and smoke anything from 30 min before the listening experiment. As described in the next section, each of three experiments included two conditions, and all of the subjects participated in the one condition in all of three experiments. Therefore, there were eight kind of combination of experimental conditions: Each of the subjects participated in three experimental sets. Sixteen subjects were randomly and counter-balanced assigned to the eight combinatins. Fig. 2. Experimental procedure of one experimental set. 2.2 Sound stimuli As sound stimuli, a music piece, white noise, and no sound were employed. In the listening experiment, Air in G composed by Bach was used as relaxing musical piece in music sections. This music piece is well known as its relaxing mood and was used as sedative musical piece in a previous study [Yamada et al., 2000]. White noise was used in noise sections. Format of these sound stimuli was WAVE format, and these sound stimuli were played by notebook. Both of the music piece and white noise were stereo recorded sound. In the no-sound section, sound was not played. The subjects listened to these sound stimuli through a headphone. With three experiments, the change in heart beat in the transition between different sounds was investigated. Each of the three experiments was mainly composed of two different sound stimuli. Figures from 3 to 5 show outline of waveforms of sound stimuli including first 5 min rest. Horizontal axis means time, and vertical axis means sound amplitude. As shown in the outline of waveforms, to prevent to induce strange feeling to the subjects, music stimuli and noise were introduced after 10 s of fade-in and finished with 10 s of fade- out. This control of volume enabled us to construct the experiment composed of continuous Rest Sound A Sound B Sound A Sound B Questionnaire 5 min 2 min 2 min 2 min 2 min 13 min Time Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 239 different sound stimuli. In our previous study [Fukumoto et al., 2009], volume of sound stimuli was adjusted initially by the subjects themselves by listening to white noise before the listening experiment. In the present study, to adjust the experimental condition between the subjects further, the volume of the sound stimuli were fixed. The volume of noise and musical piece was arround 66.0 to 70.0 dB(A). Fig. 3. Volumes of sound stimuli used in the experiment 1: (upper: Noise to Music condition, lower: Music to Noise condition). Fig. 4. Volumes of sound stimuli used in the experiment 2: (upper: Noise to No-sound condition, lower: No-sound to Noise condition). Fig. 5. Volumes of sound stimuli used in the experiment 3: (upper: Music to No-sound condition, lower: No-sound to Music condition). 2.3 Psychological index In the questionnaire after listening all of the sound stimuli, semantic differential method [Osgood et al., 1957] was used to ask the subjects relaxation feeling. The subjects estimated their relaxation feelings with 7-point scale of “relaxed - stressful” for the afforded two sound stimuli, respectively. Additionally, in the expeiment 1, the questionnaire also asked the subjects that the subject had an experience listening to the musical piece played in music sections. These questions were written on a paper in Japanese and were explained by an experimenter, and the subjects answered with writing on the paper by themselves. Additionally, in statistical analysis for relaxation feeling, sign test was used. 2.4 Physiological index In the physiological analysis, R-waves were detected from the subjects’ electrocardiogram. R-wave approximately represents time of heartbeat, and R-R intervals represent temporal development of heartbeat intervals. Development of heartbeat interval is represented from (t n , t n+1 -t n ), where t n means time of n-th R-wave. In the analyses, 2 min and 20 s windows Biometrics 240 were used to observe the change in heartbeat intervals. First, all heartbeat intervals were detected from electrocardiogram. Then, average of heartbeat intervals included in each window (t n is in the temporal range of each window) was calculated. Two kinds of windows were used for different observation. 2 min window was same as its temporal length of each section and was used for broad and rough observation for change in heartbeat intervals in each section. On the other hand, 20 s was used for detail observation, and temporal length of 20 s window was determined referring to a method of a previous study [Yoshida et al., 2006]. Heartbeat intervals in general condition contain two kinds of period of heartbeat fluctuations reflecting autonomic nervous activity and respiration, and the temporal lengths are 4 s and 10 s (in 60 heartbeats per 1-min). Length of 20 s window was available for omitting these heartbeat fluctuations. In a part of the analyses, the 20 s window slid as queue processing, and time of the window was defined as central time obtained from average of first and last time of the window. The analyses with 20 s window were mainly applied for latter two listening sections. 3. Experimental results This section shows the psychological and physiological results of the listening experiment. In the statistical analyses, pairwise comparison was used, because individual variation of subjective evaluation and physiological indices were considered large. 3.1 Results of psychological index First, result of questionnaire only for the experiment 1 showed that all of the subjects knew the music piece played in music sections. Figure 6 shows subjective relaxation feeling in the experiment 1. Higher point means higher subjective relaxation feeling. As shown in Fig. 6, large difference between relaxation feelings of Noise and Music conditions was observed. Statistical analysis for this result showed that there was significant difference between Noise and Music conditions (P < 0.001). Figure 7 shows subjective relaxation feelings of Noise and No-music Stimuli in the experiment 2. Average point of No-sound was larger than that of Noise a little bit, however, there was no significant difference. Figure 8 shows subjective relaxation feelings of Music and No-sound in the experiment 3. Relaxation point of Music was also larger than that of No-sound (P < 0.001). As summary of the results of relaxation feelings, Music stimulus elicited the largest relaxation feeling. Noise elicited the smallest. No-sound was intermediate, however, Noise and No-sound were almost same level. 1 2 3 4 5 6 7 Noise Music P o i n t Fig. 6. Subjective relaxation feeling to Noise and Music stimuli in the experiment 1 (N=16). P<0.001 Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 241 1 2 3 4 5 6 7 Noise No-sound P o i n t Fig. 7. Subjective relaxation feeling to Noise and No-sound stimuli in the experiment 2 (N=16). 1 2 3 4 5 6 7 Music No-sound P o i n t Fig. 8. Subjective relaxation feeling to Music and No-sound stimuli in the experiment 3 (N=16). 3.2 Results of physiological index This subsection shows the results of physiological index. First, rough change in heartbeat intervals in each section is investigated. Then, detail change in heartbeat is observed with 20 s window. 3.2.1 Change in heartbeat in each 2 min section Figures from 9 to 11 show whole changes in heartbeat in each three experiment, respectively. In the analysis, average heartbeat intervals in 2 min sections were used. For rest section prior to listening to sound stimuli, last 2 min in 5 min rest was utilized as analyzed section. Each result was composed of average and standard deviation between subjects and was obtained after analysis of each subject. 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Rest Music1 Noise1 Music2 Noise2 H e a r t b e a t I n t e r v a l 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Rest Noise1 Music1 Noise2 Music2 H e a r t b e a t I n t e r v a l Fig. 9. Average heartbeat interval in each 2 min section in the experiment 1 (Left: Music to Noise condition (N=8), Right: Noise to Music condition (N=8)). P<0.001 Biometrics 242 A common tendency between the results shown in Figures from 9 to 11 was gradual extend of heartbeat intervals in correspondence with time development. It means that lowest average heartbeat was observed in the rest section except Music to No-sound condition in the experiment 3. From these results, large differences between different sound stimuli were not observed. 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Rest Noise1 No-sound1 Noise2 No-sound2 H e a r t b e a t I n t e r v a l 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Rest No-sound1 Noise1 No-sound2 Noise2 H e a r t b e a t I n t e r v a l Fig. 10. Average heartbeat interval in each 2 min section in the experiment 2 (Left: Noise to No-sound condition (N=8), Right: No-sound to Noise condition (N=8)). 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Rest Music1 No-sound1 Music2 No-sound2 H e a r t b e a t I n t e r v a l 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Rest No-sound1 Music1 No-sound2 Music2 H e a r t b e a t I n t e r v a l Fig. 11. Average heartbeat interval in each 2 min section in the experiment 3 (Left: Music to No-sound condition (N=8), Right: No-sound to Music condition (N=8)). 3.2.2 Detail change in heartbeat in a transition of sound stimuli Figures 12 and 13 show the results of the experiment 1. Figure 12 shows change in heartbeat in latter 2 sections from 540 s to 780 s with 20 s window. The sound changed its content at 660 s. We can observe gradual change in heartbeat and tendencies that shorten of heartbeat in noise sections and extension of heartbeat in music sections.To investigate time cost eliciting change in heartbeat by listening to each sound stimulus, 20 s sliding window was used. As previous analysis section, 650 s window (640 s to 660 s) was used. From 650 s and latter windows are target sections and compared with 650 s statistically. Figure 13 shows 0.76 0.78 0.8 0.82 0.84 0.86 0.88 0.9 550 570 590 610 630 650 670 690 710 730 750 770 Time [s] H e a r t b e a t I n t e r v a l [ s ] Music to Noise condition Noise to Music condition Fig. 12. Average heartbeat interval in 20 s sections from 540 to 780 s in the experiment 1. Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 243 the results of analysis with sliding window. Gradual change with sliding window was observed, and plotted line means P-value obtained from statistical analysis. In the result of Noise to Music condition, heartbeat interval tended to be extended from previous section in 761 s window. It means that 111 s was needed to change heartbeat interval by listening to music from listening to noise. In the result of Music to Noise, heartbeat interval was shortened in 678 s window significantly: Time cost to elicit the change of heartbeat from Music to Noise was 28 s. 0.76 0.78 0.8 0.82 0.84 0.86 0.88 6 5 0 6 7 3 6 7 8 6 8 3 6 8 8 6 9 3 6 9 8 7 0 3 7 0 8 7 1 3 7 1 8 7 2 3 7 2 8 7 3 3 7 3 8 7 4 3 7 4 8 7 5 3 7 5 8 7 6 3 7 6 8 Time [s] H e a r t b e a t I n t e r v a l [ s ] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 S i g n i f i c a n c e Average P-value 0.76 0.78 0.8 0.82 0.84 0.86 0.88 6 5 0 6 7 3 6 7 8 6 8 3 6 8 8 6 9 3 6 9 8 7 0 3 7 0 8 7 1 3 7 1 8 7 2 3 7 2 8 7 3 3 7 3 8 7 4 3 7 4 8 7 5 3 7 5 8 7 6 3 7 6 8 Time [s] H e a r t b e a t I n t e r v a l [ s ] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 S i g n i f i c a n c e Average P-value Fig. 13. Detail observation in the transition using 20 s sliding window in the experiment 1 (upper: Noise to Music condition, lower: Music to Noise condition). Figure 14 shows change in heartbeat in latter 2 sections in the experiment 2. Gradual change in heartbeat was also observed in this result. 0.74 0.76 0.78 0.8 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 550 570 590 610 630 650 670 690 710 730 750 770 Time [s] H e a r t b e a t I n t e r v a l [ s ] Noise to No-sound condition No-sound to Noise condition Fig. 14. Average heartbeat interval in 20 s sections from 540 to 780 s in the experiment 2. Figure 15 shows the results of analysis with sliding window. In the result of Noise to No- sound condition, heartbeat interval tended to be extended from previous section in 671 s Biometrics 244 window: Smallest P-value was observed in the section (P = 0.0547). After that, heartbeat was gradually extended. 21 s was needed to change heartbeat interval from listening to noise. In the result of No-sound to Noise, heartbeat interval was shortened around 690 s window, furthermore, it was shortened in 756 s window significantly. 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 6 5 0 6 7 3 6 7 8 6 8 3 6 8 8 6 9 3 6 9 8 7 0 3 7 0 8 7 1 3 7 1 8 7 2 3 7 2 8 7 3 3 7 3 8 7 4 3 7 4 8 7 5 3 7 5 8 7 6 3 7 6 8 Time [s] H e a r t b e a t I n t e r v a l [ s ] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 S i g n i f i c a n c e Average P-value 0.76 0.78 0.8 0.82 0.84 0.86 0.88 6 5 0 6 7 3 6 7 8 6 8 3 6 8 8 6 9 3 6 9 8 7 0 3 7 0 8 7 1 3 7 1 8 7 2 3 7 2 8 7 3 3 7 3 8 7 4 3 7 4 8 7 5 3 7 5 8 7 6 3 7 6 8 Time [s] H e a r t b e a t I n t e r v a l [ s ] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 S i g n i f i c a n c e Average P-value Fig. 15. Detail observation in the transition using 20 s sliding window in the experiment 2 (upper: Noise to No-sound condition, lower: No-sound to Noise condition). Figure 16 shows change in heartbeat in latter 2 sections in the experiment 3. Gradual and rapid changes in heartbeat were observed in this result. 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 550 570 590 610 630 650 670 690 710 730 750 770 Time [s] H e a r t b e a t I n t e r v a l [ s ] Music to No-sound condition No-sound to Music condition Fig. 16. Average heartbeat interval in 20 s sections from 540 to 780 s in the experiment 3. Figure 17 shows the results of analysis with sliding window. In the result of Music to No- sound condition, heartbeat interval tended to be shortened from previous section in 686 s window. After that, heartbeat interval kept same level till the end of the listening experiment. In the result of No-sound to Music, heartbeat interval was extended around 752 s window. Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 245 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 6 5 0 6 7 3 6 7 8 6 8 3 6 8 8 6 9 3 6 9 8 7 0 3 7 0 8 7 1 3 7 1 8 7 2 3 7 2 8 7 3 3 7 3 8 7 4 3 7 4 8 7 5 3 7 5 8 7 6 3 7 6 8 Time [s] H e a r t b e a t I n t e r v a l [ s ] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 S i g n i f i c a n c e Average P-value 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 6 5 0 6 7 3 6 7 8 6 8 3 6 8 8 6 9 3 6 9 8 7 0 3 7 0 8 7 1 3 7 1 8 7 2 3 7 2 8 7 3 3 7 3 8 7 4 3 7 4 8 7 5 3 7 5 8 7 6 3 7 6 8 Time [s] H e a r t b e a t I n t e r v a l [ s ] 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 S i g n i f i c a n c e Average P-value Fig. 17. Detail observation in the transition using 20 s sliding window in the experiment 3 (upper: Music to No-sound condition, lower: No-sound to Music condition). 4. Discussion Result of relaxation feelings of sound stimuli showed that music elicited highest relaxation and noise elicited lowest one. These results support our previous study with more concrete investigations. No-sound was intermediate. The sequence of them is reasonable, however, difference between Noise and No-sound was little (no significant). Generally, noise is believed as sound everyone dislikes. One of the reasons for that is the subjects were in the listening experiment without any task. If they have tasks to do, noise deny doing the tasks, therefore, they might feel the noise as more negative. In addition, adjustment of volume of sound stimuli between the subjects might reduce the volume for some of the subjects (The adjustment was not applied for the sound stimuli in our previous study). The reduction of volume might elicit little negative impression for noise. Entire change in 2 min average heartbeat interval did not show large difference between sections, while a significant difference was observed between music and noise conditions in our previous study [Fukumoto et al., 2009]. Adjustment of volume might effect on change in heartbeat interval as small change. Same tendency between the present and the previous study as gradual extend of heartbeat interval was observed. The tendency is considered as caused of sitting on a chair for a long time. Detail observation in a transition between different stimuli showed obvious change in heartbeat intervals. In the all experiments, tendency and significant change in heartbeat were observed. Time cost eliciting the change and direction of the change (extension or shorten) was quite different between experimental conditions: The direction of change obeyed subjective relaxation feelings. In our previous study that employed music and noise Biometrics 246 as sound stimuli, about 30 s was needed to observe obvious change in heartbeat interval from previous condition to post transition. In the present study, time cost of noise to elicit the change from music piece was 28 s. While the time cost of noise was almost same time length as our previous study, the time cost of music was longer than our previous study: 111 s was needed. As mentioned above, the difference was considered as caused of adjustment of sounds’ volume. For Noise condition, the time costs were 28 s from Music. From No-sound to Noise, to observe significant change, it costs 106 s. However, around 690 s window, shorten of heartbeat was observed. With these results, Noise elicited the change in heartbeat than Music. The reason why noise affects on heartbeat earlier than music is considered that noise is a continuous sound content: from start to end, noise sound contents unpleasant sound. On the other hand, music effects on psycho-physiologically by its development. The different of sound contents might be cause of difference of time cost. Additionally, physiological change is considered faster for unpleasant and dangerous stimuli, because the human have to defend or run away from dangerous things quickly. No-sound was a new additional condition from our previous study, and it played a role of release from music and noise and obvious start of the stimuli from no-sound. From noise to no-sound, it was needed almost 20 s to elicit the extension heartbeat intervals: With the analysis used in the present study, shorter reaction was not measured. The time cost was the fastest among the experimental conditions. Furthermore, from Music to No-sound, 36 s was needed to shorten the heartbeat intervals. The time cost was shorter than that of music. Generally, impressions and feelings to sound stimuli are believed to remain for a long time, however, the results with no-sound condition suggest that change in heartbeat interval occurred by sound stimuli disappear from 20 s to 36 s. Strength of the impressions and feelings would relate to the time cost. In some results, although temporal changes in heartbeat intervals just after the transition were observed, higher P-values around the end of post section were also observed. This tendency was also shown in temporal development of heartbeat intervals. Homeostasis is an important function assisting the body in maintaining a constant internal environment [Rubinson & Lang, 2008], and this function keeps heartbeat interval in certain range. In the physiological evaluation processes, it should be noted that heartbeat interval has its limits to change. Furthermore, the change in heartbeat was mainly caused of psychological change as discussed above, however, some previous studies indicated the possibility that tempo of the sound stimulus entrain listener’s heartbeat. Previous studies have investigated the physiological effect of tempo of sound stimuli on heartbeat with simple tone [Bason & Celler, 1976] and musical piece [Kusunoki et al., 1972] and have observed the entrainment and synchronization of heartbeat by the sound stimulus. According to the results of these previous studies, the change of heartbeat interval might be partly affected physiologically from tempo of sound stimulus. On the other hand, there was no possibility that heartbeat interval in noise section was entrained by white noise because white noise does not have any tempo and cycle. The difference of physical property of sound stimuli used in the listening experiment might affect the change of heartbeat intervals in each section. To clarify the effect of the tempo of sound stimuli, further investigation comparing tempo of musical piece and heartbeat interval is needed. 5. Conclusions In this study, as fundamental investigation of temporal development of heartbeat interval in listening to sound stimuli, we investigated the effects of relaxing musical piece and white Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 247 noise, and no-sound on heartbeat through three listening experiments. These sound stimuli induced the subjects’ different relaxation feelings between the sound stimuli. Averages of heartbeat intervals were almost same level. As precise observation with statistical analysis, detail temporal development of heartbeat interval in transitions between two diffrent sound stimuli were observed using 20-s sliding window. Some of the results, especiallyl for noise, supports previous studies. However, time cost of change in heartbeat by listening to music were longer than the result in our previous study. Some previous studies have investigated the relationship user’s KANSEI and physiological response in listening sound and have tried to apply the relation between them for developing musical system; selection, arrangement and creation of musical piece. These approach aims to reflect user’s KANSEI to the system and to absorb individual difference. The results of this study will contribute to these trials, especially if the system uses heartbeat interval as physiological index. As future study, we will investigate the psycho-physiological effects of the sound stimuli inducing different relaxation feeling with spectral analysis [Akselrod et al., 1981]. With the analysis, we can separately evaluates autonomic nervous activities based on heartbeat intervals; sympathetic and parasympathetic nervous activity. The results with the analysis would show us more precise physiological change in the change in the sound stimuli, and it will contribute to more effective applications based on user’s physiological information. 6. Acknowledgment This work was supported partly by Grant from Computer Science Laboratory, Fukuoka Institute of Technology. 7. References Akselrod, S., Gordon, D., Ubel, F. A., Shanon, D. C., Barger, A. C., & Cohen, R. J. (1981). Power Spectrum Analysis of Heart Rate Fluctuation: A Quantitative Probe of Beat- to-Beat Cardiovascular Control, Science, Vol. 213, No. 10, pp. 220-222, doi:10.1126/science.6166045 Aoto, T. & Ohkura, M. (2007). Study on Usage of Biological Signal to Evaluate Kansei of a System, Proceedings of the 1st International Conference on Kansei Engineering and Emotion Research 2007, Sapporo, L-9 Bason, P. T. & Celler, B. G. (1972). Control of the Heart Rate by External Stimuli, Nature, Vol. 238, pp. 279-280 Bigand, E., Filipic, S., & Lalitte P. (2005). The time course of emotional responses to music, Annals of the New York Academy of Sciences, pp. 429-437 Chung, J. & Vercoe, G. S. (2006). The affective remixer: personalized music arranging, Conference on Human Factors in Computing Systems, pp. 393-398 Dainow, E. (1977). Physical effects and motor responses to music, Journal of Research in Music Education, pp. 211-221 Etzel, J. A., Johnsen, E. L., Dickerson, J., Tranel, D., & Adolphs, R. (2006). Cardiovascular and respiratory responses during musical mood induction, International Journal of Psychophysiology, Vol. 61, pp. 57-69, doi:10.1016/j.ijpsycho.2005.10.025 Fukumoto, M. & Imai, J. (2008). Evolutionary Computation System for Musical Composition using Listener’s Heartbeat Information, IEEJ Transactions on Electrical and Electronic Engineering, Vol. 3, No. 6, pp. 629-631, doi:10.1002/tee.20324 Biometrics 248 Fukumoto, M., Hasegawa, H., Hazama, T., & Nagashima, T. (2009). Temporal Development of Heartbeat Intervals in Transition of Sound Stimuli Inducing Different Relaxation Feelings, Proceedings of Biometrics and Kansei Engineering, pp. 84-89, ISBN: 978-0- 7695-3692-7, Cieszyn, Poland Gomez, P. & Danuser, B. (2004). Affective and physiological responses to environmental noises and music, International Journal of Psychophysiology, Vol. 53, No. 2, pp. 91-103, doi:10.1016/j.ijpsycho.2004.02.002 Hazama T. & Fukumoto, M. (2009). Relationship between Evaluation Values in GA using Physiological Index and Subjective Evaluation, Proceedings of the 2009 IEICE General Conference, 2009, p.105, (in Japanese) Healey, J., Picard, R., Dabek, F. (1998). A New Affect-Perceiving Interface and Its Application to Personalized Music Selection, Proceedings of the 1998 Workshop on Perceptual User Interfaces Kim, S. & André, E. (2004). A Generate and Sense Approach to Automated Music Composition, Proceedings of the 9th international conference on Intelligent user interfaces, Funchal, Madeira, Portugal, pp. 268-270 Kusunoki, Y., Fukumoto, M., & Nagashima, T. (2003). A Statistical Method of Detecting Synchronization for Cardio-Music Synchrogram, IEICE Transactions, Fundamentals, Vol. E86-A, No. 9, 2003, pp. 2241-2247 Osgood, C. E., Suci, G. J., & Tannenbaum, P. (1957). The measurement of meaning, University of Illinois Press, IL, USA Pappano, A. J. (2008). ‘Section IV: The Cardiovascular System’, in Berne and Levy Physiology (6th ed.), Koeppen, B. M. & Stanton, B. A. (Eds.), Mosby Elsevier, PA, USA, pp. 287-414 Rubinson, K. & Lang, E. J. (2008) ‘Section II: The Nervous System’, in Berne and Levy Physiology (6th ed.), Koeppen, B. M. & Stanton, B. A. (Eds.), Mosby Elsevier, PA, USA, pp. 51-230 Sugimoto, T., Legaspi, R., Ota, A., Moriyama, K., Kurihara S., & Numao, M. (2008). Modelling affective-based music compositional intelligence with the aid of ANS analyses, Knowledge-Based Systems, Vol. 21, Issue 3, pp. 200-208, doi:10.1016/j.knosys.2007.11.010 Yamada, T., Yamazaki, I., Misaki, K., & Sawada, Y. (2000). A Basic Study on Psychophysiological Changes in Listening Music, Japanese Bulletin of Arts Therapy, vol. 31, no. 2, 2000, pp. 33-41, (in Japanese) Yoshida, Y., Yokoyama, K. & Ishii, N. (2006). Real-time Continuous Assessment Method for Mental and Physiological Condition using Heart Rate Variability, IEEJ Transactions on Electronics, Information and Systems, vol. 126, no. 12, 2006, pp. 1441-1446 (in Japanese) 13 The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women Charles F. Streckfus and Cynthia Guajardo-Edwards University of Texas Dental Branch at Houston, United States of America 1. Introduction 1.1 Background information Biometrics is the science and technology of measuring and analyzing biological data. It also refers to technologies that measures and analyzes human body characteristics for identification purposes. In the context of this book chapter, identification will refer to the recognition of those individuals in a disease state i.e., carcinoma of the breast. Using “start- of-the-art” mass spectrometry protein analysis, the author will demonstrate the use of salivary protein profiles to recognize individuals at risk for carcinoma of the breast. Proteomic analyses of varying body fluids are propelling the field of medical research forward at unprecedented rates due to its consistent ability to identify proteins that are at the fentomole level in concentration. These advancements have also benefited biometric research to the point where saliva is currently recognized as an excellent diagnostic medium for biometric authentication of human body characteristics. The saliva microbiome, for example, is reputed to be biometrically as accurate as a fingerprint. Collectively, these efforts are in the area of biological verification; however, biometric can be applied to identify the biological characteristics of a diseased individual. 1.2 Why Saliva as a diagnostic media? 1.2.1 Analytical advantages of Saliva Saliva as a diagnostic fluid has significant biochemical and logistical advantages when compared to blood. Bio-chemically, saliva is a clear liquid with an average protein concentration of 1.5 to 2.0 mg/ml. As a consequence of this low protein concentration, it was once assumed that this was a major drawback for using saliva as a diagnostic fluid; however, current ultra sensitive analyte detection techniques have eliminated this barrier. Saliva specimen preparation is simple involving centrifugation prior to storage and the addition of a cocktail of protease inhibitors to reduce protein degradation for long-term storage. Blood is a far more complex medium. A decision has to be made as to whether to use serum or plasma. Serum has a total protein concentration of approximately 60-80 mg/ml. Since serum possesses more proteins than saliva, assaying trace amounts of “factors” (e.g., oncogenes, etc.), may result in a greater risk of non-specific interference and a greater chance Biometrics 250 for hydrostatic (and other) interactions between the factors and the abundant serum proteins. Serum also possesses numerous carrier proteins, e.g., albumin, which must either be removed or treated prior to being assayed for protein content. Additionally, it has been demonstrated that clotting removes many background proteins, which may be altered in the presence of disease. It has been demonstrated that enzymatic activity continues during this process, which may cleave proteins from many relevant pathways (Koomen et al., 2005). It would be ideal if all enzymatic activity in serum would cease at the time of collection; however, proteomic analyses of serum has shown that this is not the case. As a consequence, plasma is also being explored as a diagnostic fluid. The main consideration in using plasma is the selection of a proper anticoagulant (Koomen et al., 2005; Teisner et al., 1983). Heparin for example can be used as an anti-clotting agent; however, current research has found that heparin has a relatively short half life (3 to 4 hours) and can produce products of coagulation which are abundantly comparable to those assayed in serum. Based on these observations, it is recommended that blood specimens be collected with ethylenediamine tetraacetic acid (EDTA). 1.2.2 Collection advantages of Saliva From a logistical perspective, the collection of saliva is safe (e.g., no needle punctures), non- invasive and relatively simple, and may be collected repeatedly without discomfort to the patient [4]. Consequently it may be possible to develop a simplified method for “home- testing”, testing in a “health fair” setting or in dental clinics where individuals are available for periodic oral examinations. This diagnostic potential could reach many individuals who for personal, logistical or economical reasons lack access to preventive care. Blood is a more complicated medium to collect. It requires highly trained personnel to collect it and if collected incorrectly, can lead to misinterpretations which can result in patient mismanagement (Ernest & Balance, 2006). Blood specimens need to be collected in a specific sequence and under-filling tubes with additives may possibly alter protein analyses. Additionally, if specimens are collected during hospital or clinical settings, there may be a lapse of time before being processed. 1.2.3 Saliva collection The oral cavity receives secretions from three pairs of major salivary glands and numerous minor salivary glands that are located on the oral buccal mucosa, palate, and tongue each producing a unique type of secretion with varying protein constituents (Birkhed & Heintze, 1989). For example the parotid and Von Ebner glands (located on the tongue) produce serous secretions while the minor salivary glands produce mucinous secretions. The submandibular and sublingual glands, however, produce mixed secretions which are both serous and mucinous. As a consequence, composite or “whole” saliva is preferred as it enhances the chances of finding a biomarker due to the variety of sources from which it derives and because of its simplicity to collect. There are basically two types of saliva to collect. One type is “resting” or unstimulated whole saliva and the other is stimulated whole saliva. There are several methods for collecting unstimulated whole saliva. These include the draining or drool method, spitting method, suction method and the swab method. These methods will yield 0.47, 0.47, 0.54 and 0.52 ml/minute of saliva respectively. Of the four methods, the most reliable is the suction method with a reliability coefficient of r = 0.93. It also revealed a within subject variance of The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 251 0.14. This is a very reliable method; however, a vacuum pump is required to collect the specimens (Birkhed & Heintze, 1989). There are several drawbacks when using unstimulated saliva. The major problem is the small amount of saliva derived from collection. The 0.47-0.54 ml/minute is the range for healthy individuals (0.25–0.35 ml/minute normal range) under ideal conditions using those aforementioned collection methodologies. If the subject is taking medications that decrease flow rates (e.g., anti-hypertensive medications) the amount collected will be significantly reduced. Additionally, if the subject has autoimmune disorders (e.g., Sjögren’s syndrome), has under gone head and neck radiation, or is very elderly, it will be difficult to obtain 0.5 ml over a five minute period. Unstimulated saliva flow rates are also influenced by circadian and circannual rhythms. Therefore, for consistency, individuals will need to be serially assessed at approximately the same time of day that the baseline specimen was collected. All other participants will need to be collected at approximately the same time in order to reduce inter-variability among the participating subjects. In conclusion, due to the small quantity of specimen obtained from these techniques and the large within subject variance, one can conclude that using unstimulated saliva is not the ideal medium for cancer biomarker discover. The alternative to using unstimulated whole saliva is obviously to use stimulated whole saliva. Stimulated secretions produce about three times the volume of unstimulated secretions and are not subjected to the effects of circadian rhythm. Additionally, you will be able to collect sufficient quantities of saliva despite health status and medication usage. The flow rate range is 1 – 3 ml/minute for healthy individuals (Birkhed & Heintze, 1989; Gu et al., 2004). There are two methods for collecting stimulated whole saliva. One method of collection is the gustatory method and the other is the reflexive or “masticatory” technique. The gustatory technique requires the use of an oral based secretory stimulant. Citric acid is the most widely used stimulant. Five drops of a 1-6% citric acid solution is applied to the dorsum of the tongue every 30 seconds. The saliva accumulates in the mouth and is expectorated intermittently for a period of five minutes. This technique produces copious amounts of saliva; however, the reliability is only r = 0.76 and has a within subject variance of 0.49. The reflexive method is based on the reflex response occurring during the mastication of a bolus of food. Usually, a standardized bolus (1 gram) of paraffin or a gum base (Wrigley Co., Peoria, IL) is given to the test subject and they chew the substance at a regular rate. The subject expectorates intermittently during the collection period for duration of five minutes. This is an accurate technique as it has a reliability coefficient of r = 0.95 and a within subject variability of 0.11. The authors recommend this salivary collection method for biomarker discovery. The procedure for collecting Stimulated Whole Salivary Gland Secretions is as follows: A standard piece of unflavored gum base (1.0 - 1.5 g.) is placed in the subject's mouth. The armamentarium used for this procedure is illustrated in Figure 1. The patient is asked to swallow any accumulated saliva and then instructed to chew the gum at a regular rate (using a metronome). The subject, upon sufficient accumulation of saliva in the oral cavity, expectorates periodically into a preweighed disposable plastic cup. This procedure is continued for a period of five minutes. The cup with the saliva specimen is reweighed and the flow rate determined gravimetrically. The volume and flow rate is then recorded along with a brief description of the specimen’s physical appearance (Gu et al., 2004). Biometrics 252 1.2.4. Long-term Saliva specimen banking Roughly 2 - 5 ml of whole saliva will be obtained from the individual. In order to minimize the degradation of the proteins, protease inhibitor cocktail (Sigma, 1 mg/ml whole saliva) and 1 mM of sodium orthovanadate are added immediately after sample collection (Shevchenko et al., 2002). All samples are kept on ice during the process. The specimen is next divided into 0.5 ml aliquots, placed into bar code labeled cryotubes, and frozen (-80°C). To assess specimen degradation, ten healthy subjects were serially sampled for saliva over a five-year period. We used c-erbB-2 to test for specimen stability as this is a large 185-kDa protein, which would be susceptible to degradation by proteases and other biochemical activity. The results are shown in Figure 1 and illustrate protein stability when frozen at - 80°C.These results are consistent with Wu et al, 1993 where they assayed serially sampled salivary specimens which were collected over a ten year period for total protein, lactoferrin (77 kDa) and histidine rich proteins concentrations. In their study, they found no concentration differences due to specimen aging. Fig. 1. Armamentarium for the collection and storage of stimulated whole saliva 1.3 Studies using Saliva protein profiling for disease state detection The majority of the literature concerning human saliva biometrics is associated with the oral cavity and its associated maladies. An example of this statement is demonstrated in a manuscript assessing salivary proteins associated with burning mouth syndrome (Moura et al., 2007). The principle objective the present study was to analyze the characteristics of salivary production and its composition in individuals with burning mouth syndrome. The investigators compared salivary flow rates, potassium, iron, chloride, thiocyanate, magnesium, calcium, phosphorus, glucose, total protein and urea concentrations, as well as the expression profile of salivary proteins by SDS-PAGE among healthy individuals and those diagnosed with burning mouth syndrome. The results of the study showed that mean salivary flow rates among control patients were lower than that of burning mouth syndrome patients. Chloride, phosphorus and potassium levels were elevated in patients with burning The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 253 mouth syndrome (p = 0.041, 0.001 and 0.034, respectively). Total salivary protein concentrations were reduced in individuals with burning mouth syndrome (p = 0.223). Additionally, the analysis of the expression of salivary proteins by Coomassie blue SDS- PAGE revealed a lower expression of low molecular weight proteins in individuals with burning mouth syndrome compared to healthy controls. The results suggested that the identification and characterization of low molecular weight salivary proteins in burning mouth syndrome may be important in understanding BMS pathogenesis, thus contributing to its diagnosis and treatment. Another study using salivary protein profiles investigated the modification of the salivary proteome occurring in type 1 diabetes and to highlight potential biomarkers of the disorder. High-resolution two-dimensional gel electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry was combined to perform a large scale analysis of the salivary specimens. The proteomic comparison of saliva samples from healthy subjects and poorly controlled type-1 diabetes patients revealed a modulation of 23 proteins. Fourteen isoforms of α-amylase, one prolactin inducible protein, three isoforms of salivary acidic protein-1, and three isoforms of salivary cystatins SA-1 were detected as under expressed proteins, whereas two isoforms of serotransferrin were over expressed secondary to type-1 diabetes. The proteins under expressed were all known to be implicated in the oral anti-inflammatory process, suggesting that the pathology induced a decrease of non-immunological defense of oral cavity. As only particular isoforms of proteins were modulated, type-1 diabetes seemed to differentially affect posttranslational modification of the proteins (Hirtz et al., 2006). An additional study (Delaleu et al., 2008) investigated the involvement of 87 proteins measured in serum and 75 proteins analyzed in saliva in spontaneous experimental Sjögren's syndrome. In addition, they intended to compute a model of the immunological situation representing the overt disease stage of Sjögren's syndrome. In this animal study, they used non-diabetic, non-obese diabetic mice for salivary gland dysfunction. The mice aged 21 weeks and were evaluated for salivary gland function, salivary gland inflammation and extra-glandular disease manifestations. The analytes, comprising chemokines, cytokines, growth factors, autoantibodies and other biomarkers, were quantified using multi-analyte profile technology and fluorescence-activated cell sorting. Age-matched and sex-matched Balb/c mice served as a reference. The investigators found non-diabetic, non- obese diabetic mice tended to exhibit impaired salivary flow, glandular inflammation and increased secretory SSB (anti-La) levels. Thirty-eight biomarkers in serum and 34 in saliva obtained from non-diabetic, non-obese diabetic mice were significantly different from those in Balb/c mice. Eighteen biomarkers in serum and three chemokines measured in saliva could predict strain membership with 80% to 100% accuracy. Factor analyses identified principal components mostly correlating with one clinical aspect of Sjögren's syndrome and having distinct associations with components extracted from other families of proteins. They concluded that the autoimmune manifestations of Sjögren's syndrome are greatly independent and associated with various immunological processes; however, CD40, CD40 ligand, IL-18, granulocyte chemotactic protein-2 and anti-muscarinic M3 receptor IgG3 may connect the different aspects of Sjögren's syndrome. Processes related to the adaptive immune system appear to promote Sjögren's syndrome with a strong involvement of T- helper-2 related proteins in hyposalivation. This approach further established saliva as an attractive biofluid for biomarker analyses in Sjögren's syndrome and provides a basis for the comparison and selection of potential drug targets and diagnostic markers (Delaleu et al., 2008). Biometrics 254 2. Current research in Salivary protein profiling for cancer detection 2.1 Methods 2.1.1 Study design The purpose of this study was to determine if individuals could be protein profiled and potentially classified as having cancer. The investigator also wanted to ascertain if there were alterations of the protein profiles due to the primary tissue site and the varying degree of tumor staging. In order to achieve this objective, the investigators collected saliva from women that were healthy and from those diagnosed with carcinoma breast in the following stages: Stage 0, Stage I, Stage IIa and Stage IIb. All the tumors were adenocarcinomas. Additionally, specimens were collected from women diagnosed with varying gynecological carcinomas. These included women diagnosed with moderate cervical dysplasia, severe cervical dysplasia and cervical carcinoma in situ. These tumors were all squamous cell carcinomas. Women diagnosed with ovarian and endometrial carcinomas were also included in the study. These malignancies were identified as adenocarcinomas. The final group consisted of women diagnosed with head and neck squamous cell carcinomas ten women with varying stages of development. Due to the difficulty in obtaining early stage tumors for ovarian, endometrial and head/neck carcinomas, a composite of varying staged patients formed these saliva pools. This study was performed under the UTHSC IRB approved protocol number HSC-DB-05- 0394. All procedures were in accordance with the ethical standards of the UTHSC IRB and with the Helsinki Declaration of 1975, as revised in 1983. The specimens were banked at the University of Texas Dental Branch Saliva repository, which stores the specimens at -80°C. Ten saliva specimens were pooled for each type of carcinoma. The saliva samples were pooled by combining equal volumes of cleared stimulated whole saliva from a set of archived healthy and cancer subjects. The subjects were matched for age and race and were non-tobacco users. Previous studies by the investigator have demonstrated that properly prepared specimens can remain in storage for a long period of time. 2.1.2 Saliva collection and sample preparation Stimulated whole salivary gland secretion is based on the reflex response occurring during the mastication of a bolus of food. Usually, a standardized bolus (1 gram) of paraffin or a gum base (generously provided by the Wrigley Co., Peoria, IL) is given to the subject to chew at a regular rate. The individual, upon sufficient accumulation of saliva in the oral cavity, expectorates periodically into a preweighed disposable plastic cup. This procedure is continued for a period of five minutes. The volume and flow rate is then recorded along with a brief description of the specimen’s physical appearance (Navazesh &Christensen, 1982). The cup with the saliva specimen is reweighed and the flow rate determined gravimetrically. The authors recommend this salivary collection method with the following modifications for consistent protein analyses. A protease inhibitor from Sigma Co (St. Louis, MI, USA) is added along with enough orthovanadate from a 100mM stock solution to bring its concentration to 1mM. The treated samples were centrifuged for 10 minutes at top speed in a table top centrifuge. The supernatant was divided into 1 ml aliquots and frozen at -80°C. 2.1.3 LC-MS/MS mass spectroscopy with isotopic labeling Recent advances in mass spectrometry, liquid chromatography, analytical software and bioinformatics have enabled the researchers to analyze complex peptide mixtures with the The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 255 ability to detect proteins differing in abundance by over 8 orders of magnitude (Wilmarth et al., 2004). One current method is isotopic labeling coupled with liquid chromatography tandem mass spectrometry (IL-LC-MS/MS) to characterize the salivary proteome (Gu et al., 2004). The main approach for discovery is a mass spectroscopy based method that uses isotope coding of complex protein mixtures such as tissue extracts, blood, urine or saliva to identify differentially expressed proteins (18). The approach readily identifies changes in the level of expression, thus permitting the analysis of putative regulatory pathways providing information regarding the pathological disturbances in addition to potential biomarkers of disease. The analysis was performed on a tandem QqTOF QStar XL mass spectrometer (Applied Biosystems, Foster City, CA, USA) equipped with a LC Packings (Sunnyvale, CA, USA) HPLC for capillary chromatography. The HPLC is coupled to the mass spectrometer by a nanospray ESI head (Protana, Odense, Denmark) for maximal sensitivity (Shevchenko et al., 2002). The advantage of tandem mass spectrometry combined with LC is enhanced sensitivity and the peptide separations afforded by chromatography. Thus even in complex protein mixtures MS/MS data can be used to sequence and identify peptides by sequence analysis with a high degree of confidence (Birkhed et al., 1989; Gu et al., 2004; Shevchenko et al., 2002; Wilmarth et al., 2004). Isotopic labeling of protein mixtures has proven to be a useful technique for the analysis of relative expression levels of proteins in complex protein mixtures such as plasma, saliva urine or cell extracts. There are numerous methods that are based on isotopically labeled protein modifying reagents to label or tag proteins to determine relative or absolute concentrations in complex mixtures. The higher resolution offered by the tandem Qq-TOF mass spectrometer is ideally suited to isotopically labeled applications (Gu et al., 2004; Koomen et al 2004; Ward et al., 1990). Applied Biosystems recently introduced iTRAQ reagents (Gu et al., 2004; Koomen et al 2004; Ward et al., 1990), which are amino reactive compounds that are used to label peptides in a total protein digest of a fluid such as saliva. The real advantage is that the tag remains intact through TOF-MS analysis; however, it is revealed during collision induced dissociation by MSMS analysis. Thus in the MSMS spectrum for each peptide there is a fingerprint indicating the amount of that peptide from each of the different protein pools. Since virtually all of the peptides in a mixture are labeled by the reaction, numerous proteins in complex mixtures are identified and can be compared for their relative concentrations in each mixture. Thus even in complex mixtures there is a high degree of confidence in the identification. 2.1.4 Salivary protein analyses with iTRAQ Briefly, the saliva samples were thawed and immediately centrifuged to remove insoluble materials. The supernatant was assayed for protein using the Bio-Rad protein assay (Hercules, CA, USA) and an aliquot containing 100 µg of each specimen was precipitated with 6 volumes of -20ºC acetone. The precipitate was resuspended and treated according to the manufacturers instructions. Protein digestion and reaction with iTRAQ labels was carried out as previously described and according to the manufacturer’s instructions (Applied Biosystems, Foster City, CA). Briefly, the acetone precipitable protein was centrifuged in a table top centrifuge at 15,000 x g for 20 minutes. The acetone supernatant was removed and the pellet resuspended in 20 ųl dissolution buffer. The soluble fraction was denatured and disulfides reduced by incubation in the presence of 0.1% SDS and 5 mM TCEP (tris-(2-carboxyethyl)phosphine)) at 60ºC for one hour. Cysteine residues were Biometrics 256 blocked by incubation at room temperature for 10 minutes with MMTS (methyl methane- thiosulfonate). Trypsin was added to the mixture to a protein:trypsin ratio of 10:1. The mixture was incubated overnight at 37ºC. The protein digests were labeled by mixing with the appropriate iTRAQ reagent and incubating at room temperature for one hour. On completion of the labeling reaction, the four separate iTRAQ reaction mixtures were combined. Since there are a number of components that can interfere with the LC-MS/MS analysis, the labeled peptides are partially purified by a combination of strong cation exchange followed by reverse phase chromatography on preparative columns. The combined peptide mixture is diluted 10 fold with loading buffer (10 mM KH 2 PO 4 in 25% acetonitrile at pH 3.0) and applied by syringe to an ICAT Cartridge-Cation Exchange column (Applied Biosystems, Foster City, CA) column that has been equilibrated with the same buffer. The column is washed with 1 ml loading buffer to remove contaminants. To improve the resolution of peptides during LCMSMS analysis, the peptide mixture is partially purified by elution from the cation exchange column in 3 fractions. Stepwise elution from the column is achieved with sequential 0.5 ml aliquots of 10 mM KH2PO4 at pH 3.0 in 25% acetonitrile containing 116 mM, 233 mM and 350 mM KCl respectively. The fractions are evaporated by Speed Vacuum to about 30% of their volume to remove the acetonitrile and then slowly applied to an Opti-Lynx Trap C18 100 ul reverse phase column (Alltech, Deerfield, IL) with a syringe. The column was washed with 1 ml of 2% acetonitrile in 0.1% formic acid and eluted in one fraction with 0.3 ml of 30% acetonitrile in 0.1% formic acid. The fractions were dried by lyophilization and resuspended in 10 ul 0.1% formic acid in 20% acetonitrile. Each of the three fractions was analyzed by reverse phase LCMSMS. The analytical strategy is illustrated in Figure 2. Fig. 2. Analytical strategy for quantifying peptides using iTRAQ tagging The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 257 2.1.5 Reverse phase LCMSMS The desalted and concentrated peptide mixtures were quantified and identified by nano-LC- MS/MS on an API QSTAR XL mass spectrometer (ABS Sciex Instruments) operating in positive ion mode. The chromatographic system consists of an UltiMate nano-HPLC and FAMOS auto-sampler (Dionex LC Packings). Peptides were loaded on a 75cm x 10 cm, 3cm fused silica C18 capillary column, followed by mobile phase elution: buffer (A) 0.1% formic acid in 2% acetonitrile/98% Milli-Q water and buffer (B): 0.1% formic acid in 98% acetonitrile/2% Milli-Q water. The peptides were eluted from 2% buffer B to 30% buffer B over 180 minutes at a flow rate 220 nL/min. The LC eluent was directed to a NanoES source for ESI/MS/MS analysis. Using information-dependent acquisition, peptides were selected for collision induced dissociation (CID) by alternating between an MS (1 sec) survey scan and MS/MS (3 sec) scans. The mass spectrometer automatically chooses the top two ions for fragmentation with a 60 second dynamic exclusion time. The IDA collision energy parameters were optimized based upon the charge state and mass value of the precursor ions. Each saliva sample set there are three separate LCMSMS analyses. The accumulated MSMS spectra are analyzed by ProQuant and ProGroup software packages (Applied Biosystems) using the SwissProt database for protein identification. The ProQuant analysis was carried out with a 75% confidence cutoff with a mass deviation of 0.15 Da for the precursor and 0.1 Da for the fragment ions. The ProGroup reports were generated with a 95% confidence level for protein identification. 2.1.6 Bioinformatics The Swiss-Prot database was employed for protein identification while the PathwayStudio ® bioinformatics software package was used to determine Venn diagrams were also constructed using the NIH software program (http://ncrr.pnl.gov). Graphic comparisons with log conversions and error bars for protein expression were produced using the ProQuant ® software. 2.1.7 Western blot analysis for marker validation 2.1.7.1 Preparation of samples We selected the protein profilin-1 for validating the presence of these proteins in saliva. The profilin-1 antibody was a rabbit polyclonal from the Abcam Co. #Ab10608 diluted 1:1000. The saliva samples from a healthy individuals, benign tumor subjects and individuals diagnosed with Stage IIa her2/neu receptor positive breast cancer subjects and Stage IIa her2/neu receptor negative breast cancer subjects were pooled by combining equal volumes of cleared stimulated whole saliva from a set of archived specimens. The pooled saliva was mixed with loading buffer (Laemmli buffer containing BME) in 1:1 ratio. The sample was then incubated at 95°C for 5 minutes and was then loaded onto the 4- 15% Tris-HCl polyacrylamide gel. Four-fifteen percent Tris-HCl polyacrylamide gels were loaded with molecular weight markers, controls, and the pooled saliva samples. Electrophoresis was run at 200 Volts, 30 minutes in 1X TGS buffer. The gels were equilibrated and extra thick blot paper and PVDF membranes were soaked in 1X TGS buffer for 15 minutes prior to running Western Transfer. Semi-dry transfer apparatus was used. Transfer conditions were 0.52mA constant, 17 volts, 19 minutes. Polyvinylidene fluoride (PVDF) membranes were air dried for minimum of 1 hour. Dry PVDF membranes were activated in methanol for about 10 seconds then transferred to soak in 1X PBS-T for 3 washes Biometrics 258 of 5 minutes each. The membranes were then incubated for 1 hour in 5% NFDM in PBS-T. Afterwards the membranes were washed 3 times for 5 minutes in PBS-T. The membranes were incubated overnight with a primary antibody in PBS-T. The membranes were washed 3 times for 5 minutes in PBS-T and were incubated for 4 hours with secondary antibody (HRP conjugate) in PBS-T. Again, the membranes were washed 3 times for 5 minutes in PBS- T. Finally, the membranes were treated with ECL plus detection reagents and photographed with exposure of 800 seconds. 3. Experimental results in salivary protein profiling for cancer detection 3.1 Mass spectrometry analysis Tables 1-5 summarize the results of the proteomic analysis and illustrates protein comparisons between breast (Stage 0 – IIb), cervical (moderate, severe dysplasia and in situ) Proteins Breast Ovarian Endo. Cervical H & N Staging Gene ID Accession Stage 0 Stage I Stage IIa Stage IIb Variable Variable Mod. Severe Stage 0 Variable A1AT P01009 4.32 ANXA3 P12429 0.74 CO3 P01024 2.16 HPTR P00739 3.57 K1C16 P08779 5.37 COBA1 P12107 0.57 LUZP1 Q86V48 0.62 ZN248 Q8NDW4 0.84 CYTC P01034 0.77 KAC P01834 1.42 K2C6C P48666 3.40 K1CJ P13645 0.47 0.26 SCOT2 Q9BYC2 0.83 0.50 VEGP P31025 1.36 0.47 PROF1 P07737 0.74 NGAL P80188 0.90 NUCB2 P80303 1.28 HEMO P02790 0.74 CYTD P28325 0.82 CRIS3 P54108 0.65 ACBP P07108 1.42 KLK P06870 0.80 1.46 0.86 PIP P12273 0.88 0.80 PERL P22079 1.31 0.88 PPIB P23284 1.32 Table 1. The table shows which proteins are unique to each type and stage of carcinoma The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 259 Protein Status Breast Ovarian Endo. Cervical H & N Staging Stage 0 Stage I Stage IIa Stage IIb Variable Variable Mod. Severe Stage 0 Variable Unique Proteins Up Regulated 3 4 0 0 1 0 0 0 1 2 Down Regulated 6 3 4 2 0 0 4 0 0 2 Total Proteins 9 7 4 2 1 0 4 0 1 4 Common Proteins Up Regulated 16 7 2 7 17 21 20 13 13 15 Down Regulated 8 11 11 7 5 5 7 4 8 7 Total Proteins 24 18 13 14 22 26 27 17 21 22 Total Number Proteins Up Regulated 19 11 2 7 18 21 20 13 14 17 Down Regulated 14 14 15 9 5 5 11 4 8 9 Grand Total 33 25 17 16 23 26 31 17 22 26 Table 2. The table demonstrates the unique, common and total protein counts among the varying carcinomas and their stages ovarian, endometrial and head and neck cancer subjects. The values exhibited in table 1 and table 2 represent the ratio of disease state to healthy state. Simply stated the ratio for each protein is a result of the log sum integrated area under each peak formed when plotting the intensity versus mass-to-charge ratios for each peptide (Boehm et al., 2007). The value is divided by the corresponding value yielded for that particular protein in the healthy cohort. Significance between ratios is determined by t-test analysis. Those with values greater than 1.000 were considered up-regulated proteins while those with values less than 1.000 were considered down-regulated proteins. In total, 166 proteins were identified at a confidence level at >95. Of these there were 76 proteins that were determined to be expressed significantly different (p<0.05) in the saliva from cancer subjects as compared to the healthy individuals. Table 1 lists the 25 proteins that are unique to each type of carcinoma with their corresponding ratios. Unique being defined as not associated with the other types of carcinoma. Eleven of the unique proteins were up-regulated and fourteen were down-regulated. There were no unique proteins associated with endometrial carcinoma or severe dysplasia of the cervix. Carcinoma of the breast and its associated stages accounted for nearly 58% of the total unique proteins. Table 3 indicates the function of the associated proteins. Generally, the proteins appear to be diverse with the exception of the cytoskeletal associated proteins which comprised 20% of the protein panel. Additionally, over 50% of the proteins in table 3 are referenced in the literature for the presence of carcinoma in blood and cell supernatants from cancer cell lines (Polanski & Anderson, 2006). The following table represents the up and down-regulated proteins that over-lapped the various types of carcinomas. Numerous proteins, 21 percent, were common to both squamous cell carcinoma and the adenocarcinoma cancer types. This was particularly noticed among the S100 proteins. There were 13 proteins that were exclusive to the gynecological carcinomas despite the fact that some were squamous cell carcinoma or adenocarcinoma cancer types. These proteins comprised 25% of the protein panel. Table 5 illustrates the function of the proteins listed in table 4. Biometrics 260 Gene ID Accession Protein Name Reported Function A1AT P01009 Alpha 1 protease inhibitor Protease inhibitor ACBP P07108 AcylCoA binding protein Transport protein ANXA3 P12429 Annexin A3 Inhibits phospholipase activity CO3 P01024 Complement 3 precursor Activates complement system COBA1 P12107 Collagen alpha-1 chain Fibrillogenesis CRISP3 P54108 Cysteine rich secretory protein Immune response CYTC P01034 Cystatin C Inhibitor of cysteine proteases CYTD P28325 Cystatin D precursor Protein degradation & inhibitor HEMO P02790 Hemopexin precursor Heme transporter protein HPTR P00739 Haptoglobin – 1 related protein proteolysis K1C16 P08779 Cytoskeleton 16 Cytoskeleton associated K1CJ P13645 Cytoskeleton 10 Cytoskeleton associated K2C6C P48666 Cytoskeleton 6C Cytoskeleton associated KAC P01834 Immunoglobulin kappa chain C region Immune response KLK P06870 Kallikrein-1 Cleaves kininogen LUZP1 Q86V48 Leucine zipper protein-1 Nucleus functioning protein NGAL P80188 Neutrophil Gelatinase associated lipocalin Iron trafficking protein NUCB2 P80303 Nucleobindin-2 precursor Calcium binding protein PERL P22079 Lactoperoxidase Transport, antimicrobial PIP P12273 Prolactin inducible protein precursor Secretory actin binding protein PPIB P23284 Peptidyl-prolyl cis trans isomerase B Accelerates protein folding PROF1 P07737 Profilin-1 Cytoskeleton associated SCOT2 Q9BYC2 Succinyl CoA Ketone body catabolism VEGP P31025 Lipocalin - 1 Ligand binding protein ZN248 Q8NDW4 Zinc finger protein 248 Transcriptional regulation Table 3. The table reveals the protein functions associated with the varying carcinomas and their stages Proteins Breast Ovarian Endo. Cervical H & N Staging Gene ID Accession Stage 0 Stage I Stage IIa Stage IIb Variable Variable Mod. Severe Stage 0 Variable 1433S O70456 1.50 2.14 2.09 2.12 ADA32 Q8TC27 0.52 0.69 ALBU P02768 0.86 0.91 1.46 1.15 1.23 1.94 2.50 ALK1 P03973 2.94 1.33 AMYS P04745 0.84 1.11 0.75 0.84 0.47 0.58 0.55 0.78 0.65 ANXA1 P04083 1.54 1.43 3.79 2.04 2.06 0.63 BPIL1 Q8N4F0 1.15 1.94 0.86 1.78 CAH6 P23280 1.46 0.85 1.57 CYTA P01040 1.63 1.16 CYTN P01037 0.77 0.53 1.16 0.67 0.73 CYTS P01036 0.80 0.64 0.59 CYTT P09228 1.14 0.59 0.49 0.62 0.50 DEF3 P59666 3.86 2.61 1.17 Table 4. Represents the up and down-regulated proteins that over-lapped the various types of carcinomas. The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 261 Proteins Breast Ovarian Endo. Cervical H & N Staging Gene ID Accession Stage 0 Stage I Stage IIa Stage IIb Variable Variable Mod. Severe Stage 0 Variable DMBT1 Q9UGM3 1.83 1.40 1.22 ENOA P06733 0.74 0.81 1.71 1.43 1.55 FABPE Q01469 1.56 0.55 FGRL1 Q8N441 0.21 0.44 0.64 HPT P00738 0.65 2.22 1.94 2.64 IGHA1 P01876 1.23 1.55 1.55 1.23 1.34 IGHA2 P01877 1.81 1.65 2.18 IGHG1 P01857 1.26 0.83 1.41 1.48 1.50 IGHG2 P01859 0.82 0.80 1.58 1.89 IGJ P01591 1.23 1.41 0.77 1.30 K1C13 P13646 3.50 6.21 5.91 0.73 K1C14 P02533 5.34 6.55 K1CM P13646 3.92 2.95 K1CP P08779 0.13 0.13 2.61 K22O Q01546 7.47 5.40 K2C1 P04264 0.46 0.63 0.33 0.32 2.07 2.22 K2C4 P19013 4.42 4.47 4.17 5.46 5.91 1.11 0.54 K2C5 P13647 3.08 5.39 2.65 3.49 0.75 K2C6A P02538 4.34 5.78 K2C6E P48668 0.10 0.09 LAC P01842 0.84 1.43 1.51 1.62 LCN1 P31025 0.72 1.33 LV3B P80748 1.33 1.26 MUC P01871 1.22 0.72 1.45 MUC5B Q9HC84 1.25 0.71 0.54 1.26 1.39 2.04 1.75 2.15 PERM P05164 0.72 1.88 PIGR P01833 0.82 0.88 1.29 PRDX1 Q06830 1.67 1.22 S10A7 P31151 2.05 0.67 1.26 0.74 0.76 S10A8 P05109 1.46 1.36 0.76 1.33 1.49 3.46 3.35 1.42 1.73 S10A9 P06702 1.53 0.67 1.15 3.86 3.10 1.38 1.73 SPLC2 Q96DR5 1.22 1.18 0.87 0.68 0.74 1.22 2.11 SPRR3 Q9UBC9 1.10 0.87 1.42 1.92 THIO P10599 1.51 1.43 TRFE P02787 0.80 0.80 0.83 2.00 2.36 TRFL P02788 1.36 1.19 0.64 TRY P07477 1.46 1.66 0.84 0.88 ZA2G P25311 0.90 0.91 1.13 1.29 Table 4. Represents the up and down-regulated proteins that over-lapped the various types of carcinomas.(continuation) Biometrics 262 Gene ID Accession Protein Name Reported Function 1433S O70456 1433 sigma Signaling ADA32 Q8TC27 Disintegrin Role in sperm development ALBU P02768 Albumin Protein transport ALK1 P03973 Antileukoproteinase Acid-stable proteinase inhibitor AMYS P04745 Amylase Enzyme ANXA1 P04083 Annexin A1 Involved in exocytosis BPIL1 Q8N4F0 Bactericidal/permeability-increasing protein-like 1 Lipid binding CAH6 P23280 Carbonic anhydrase 6 Hydration of carbon dioxide. CYTA P01040 Cystatin-A Intracellular thiol proteinase inhibitor. CYTN P01037 Cystatin-SN Cysteine proteinase inhibitors CYTS P01036 Cystatin-S Inhibits papain and ficin CYTT P09228 Cystatin-SA Thiol protease inhibitor DEF3 P59666 Neutrophil defensin 3 Antimicrobial activity DMBT1 Q9UGM3 Deleted in malignant brain tumors 1 Candidate tumor suppressor gene ENOA P06733 Alpha-enolase Multifunctional enzyme FABPE Q01469 Fatty acid-binding protein, epidermal Keratinocyte differentiation FGRL1 Q8N441 Fibroblast growth factor receptor 1 Negative effect on cell proliferation HPT P00738 Haptoglobin Combines with free hemoglobin IGHA1 P01876 Ig alpha-1 chain C region Immune Response IGHA2 P01877 Ig alpha-2 chain C region Immune Response IGHG1 P01857 Ig gamma-1 chain C region Immune Response IGHG2 P01859 Ig gamma-2 chain C region Immune Response IGJ P01591 Immunoglobulin J chain Immune Response K1C13 P13646 Cytokeratin 13 Cytoskeleton K1C14 P02533 Cytokeratin 14 Cytoskeleton K1CM P13646 Keratin, type I cytoskeletal 13 Cytoskeleton K1CP P08779 Keratin, type I cytoskeletal 16 Cytoskeleton K22O Q01546 Keratin, type II cytoskeletal 2 oral Contributes to terminal cornification K2C1 P04264 Keratin, type II cytoskeletal 1 Regulate the activity of kinases K2C4 P19013 Keratin, type II cytoskeletal 4 Cytoskeleton K2C5 P13647 Keratin, type II cytoskeletal 5 Protein binding K2C6A P02538 Keratin, type II cytoskeletal 6A Protein binding K2C6C P48666 Keratin, type II cytoskeletal 6C Structural molecule activity K2C6E P48668 Keratin, type II cytoskeletal 6C Cytoskeleton organization LAC P01842 Lactoperoxidase Antimicrobial LCN1 P31025 Lipocalin-1 precursor Plays a role in taste reception. LV3B P80748 Ig lambda chain V-III region LOI Activates complement pathway MUC P01871 Ig Mu Chain Immune Response MUC5B Q9HC84 Mucin-5B Contribute to the lubricating PERM P05164 Myeloperoxidase Microbiocidal activity PIGR P01833 Polymeric immunoglobulin receptor Binds IgA and IgM at cell surface PRDX1 Q06830 Peroxiredoxin-1 Involved in redox regulation Table 5. The table represents the function of the common proteins associated with the various types of carcinomas. The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 263 Gene ID Accession Protein Name Reported Function S10A7 P31151 S100 A7 Interacts with RANBP9 S10A8 P05109 S100 A8 Calcium-binding protein S10A9 P06702 S100 A9 Calcium-binding protein SPLC2 Q96DR5 Epithelial carcinoma associated protein-2 Lipid binding protein SPRR3 Q9UBC9 Small proline-rich protein 3 Protein of keratinocytes. THIO P10599 Thioredoxin Redox activity TRFE P02787 Serotransferrin Iron binding transport proteins TRFL P02788 Lactotransferrin precursor Iron binding transport proteins TRY P07477 Trypsin-1 Activity against synthetic substrates ZA2G P25311 Zinc alpha-2 glycoprotein Signaling Table 5. The table represents the function of the common proteins associated with the various types of carcinomas.(continuation) 3.2 Western blot analyses The results of the western blot suggest the presence of profilin-1 in saliva. Figure 3 illustrates the presence of profilin-1 protein in both the human submandibular gland cell lysates and in whole saliva. Fig. 3. The western blot indicates the presence of the 15 kDa profilin-1 protein in human submandibular gland cell lysates and stimulated whole saliva The western blot analyses also revealed the presence of profilin-1 In SKBR3 cell lysates. Fig. 4. Also revealed the presence of profilin-1 In SKBR3 HER2/neu receptor positive breast cancer cell lysates. Figure 4 Biometrics 264 Figure 5 illustrates the presence of profilin-1 in the saliva sampled from healthy, benign and malignant tumor patients. Profilin is a down-regulated protein in the presence of malignancy and it is visualized by the lighter bands associated with malignancy. It is also worth noting that the Her2/neu receptor negative band is darker than the Her2/neu receptor positive counterpart suggesting further down-regulation of the profilin-1 protein. Fig. 5. This figure illustrates the presence and modulation of salivary profilin-1 protein secondary to HER2/neu receptor status. 3.3 Conclusions It is interesting to note that the salivary protein profile mimics the findings of protein alterations secondary to cancer in serum, tissues and cell lysates that are numerously cited in the literature (Polanski et al, 2006). For example, there are salivary protein alterations in the cytoskeleton phenotype, which are strongly implicated in tumor growth and cancer metastases. Additionally, there are alterations in the metabolic, growth and signaling pathways all of which provide support for the concept that salivary protein profiles are altered secondary to the presence of cancer. The investigator has spent 15 years investigating the phenomena and postulates, that in the presence of disease, i.d., carcinoma, that there is an over abundance of protein resulting from the rapid growth of the malignancy which in turn, produces a humoral response in the salivary glands. This response results in altered salivary protein concentrations. Another possible explanation is active transport of the proteins of interest. It is plausible that these proteins are secreted into saliva as consequence of localized regulatory function in the oral cavity via signal transduction similar to the proposed explanation of HER-2/neu protein in nipple aspirates. These “loop” mechanisms, in health, appear to be in equilibrium both intercellularly and extracellularly with each pathway fulfilling the resultant phenotypic process of growth, proliferation, and differentiation. Figure 5 The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 265 The results of this preliminary study suggest that there are two panels of biomarkers that need to be assessed. There are those biomarkers that are unique to each tumor site i.e., breast, cervix, etc. and those that are common to various types of malignancies. The finding may prove to be useful. It may be possible to develop a “global” profile for the overall detection of cancer while using the unique protein identifiers to determine the site of the tumor. Theoretically, by using the biomarkers S100A7, S100A8 and S100A9 you may predict the presence of a malignancy and by using the biomarkers listed in table 1 you, potentially could determine the site of the tumor. This is merely speculation at this time; however, advances in mass spectrometry and microarray technology may make the concept cost effective and logistically feasible. These results are very preliminary and investigated only the abundant proteins found in saliva. Further research is required to determine the low abundant protein profiles associated with the various carcinomas. Additionally, the peptidome and the fragmetome need to be evaluated to further enhance the profile. Other carcinomas such as lung, colon, pancreas, bladder, skin, brain and kidney at varying stages of development also need to be studied and their resultant profiles added to the catalogue of cancer associated protein profiles. One major question that is essential to answer is, “Are we identifying the primary site of the tumor’s origin?” This information is essential for rendering “tailored” treatment regimens. The results of this study suggest that it is feasible to employ biometric principles for cancer detection. It is obvious from the resultant data that cancer is a very complex disease process involving numerous gene alterations in varying numbers of molecular pathways. This in turn renders the concept of the single biomarker as passé with respect to cancer detection. The multi-marker approach is evident as it avoids the pitfalls of a single marker concept as demonstrated by the current shortcomings of the prostate specific antigen marker for prostate cancer. Due to the overwhelming complexity of the malady high-throughput protein detection coupled with protein modeling in health and disease may yield the tools necessary for life saving early cancer detection. 4. Acknowledgement The research presented in this chapter was supported by the Avon Breast Cancer Foundation (#07-2007-071), Komen Foundation (KG080928), Gillson-Longenbaugh Foundation and the Texas Ignition Fund. The authors would also like to thank Dr. William Dubinsky for the LC-MS/MS salivary mass spectrometry analyses and Dr. Karen Storthz for her technical expertise concerning the western blot analyses. 5. References Birkhed D & Heintze U (1989). Salivary secretion rate, buffer capacity, and pH. In: Human Saliva: Clinical Chemistry and Microbiology - Volume 1. Tenovuo JO, editor, pp. (25- 30), CRC press, Boca Raton, ISBN 0-8493-6391-8. Costantino JP, Gail MH, Pee D, Anderson S, Redmond CK, Benichou J & Wieand HS (1999): Validation studies for models projecting the risk of invasive and total breast cancer incidence. J Natl Cancer Inst , 91. 18, (Sept, 1999), pp. (1541-1548), ISSN 0027-8874. Delaleu N, Immervoll H, Cornelius J & Jonsson R. Biomarker profiles in serum and saliva of experimental Sjögren's syndrome: associations with specific autoimmune Biometrics 266 manifestations. Arthritis Research & Therapy 10. 1,(Feb 2008), (R22-R36), ISSN (electronic) 1478-6362. Ernst DJ & Balance LO (2006). Quality collection: the phlebotomist’s role in pre-analytical errors. Med Lab Obs, 83. 9, (Sept 2006), pp. (30-38), ISSN 0580-7247. Gu S, Liu Z, Pan S, Jiang Z, Lu H, Amit O, Bradbury E.M, Hu CA & Chen X (2004). Global investigation of p53-induced apoptosis through quantitative proteomic profiling using comparative amino acid-coded tagging. Mol Cell Proteomics, 3.10, (Oct 2004), pp. (998-1008), ISSN 1535-9476. Hirtz C, Chevalier F, Sommerer N, Raingeard I, Bringer J, Rossignol M & and Deville de Périere D (2006). Salivary protein profiling in type I diabetes using two- dimensional electrophoresis and mass spectrometry. Clinical Proteomics 2. 2, (Sept 2006), pp. (117-127), ISSN 1542-6416. Kallenberg CG, Vissink A, Kroese FG, Abdulahad WH, Bootsma H. What have we learned from clinical trials in primary Sjögren's syndrome about pathogenesis? Arthritis Res Ther. 2011 Feb 28;13(1):205. Koomen JM, Li D, Xiao L, Liu TC, Coombes KR, Abbruzzese J & Kobayashi R (2005). Direct tandem mass spectrometry reveals limitations in protein profiling experiments for plasma biomarker recovery. J Prot Res, 4. 3, (2005), pp. (972-981), ISSN 1535-3893. Koomen JM, Zhao H, Li D, Abbruzzese J, Baggerly K & Kobayashi R (2004). Diagnostic protein discovery using proteolytic peptide targeting and identification. Rapid Com Mass Spec, 18. 21, (Nov 2004), pp. (2537-2548) ISSN 1097-0231. Moura S, Sousa J, Lima D, Negreiros A, Silva F & Costa L. (2007). Burning mouth syndrome (BMS): sialometric and sialochemical analysis and salivary protein profile. Gerodontology, 24, (Sept., 2007), pp. (173–176), ISSN 0734-0664. Navazesh M, Christensen C (1982). A comparison of whole mouth resting and stimulated salivary measurements. J Dent Res, 61. 10, (Oct 1982), pp. 1158-1162, ISSN 0022- 0345. Polanski M & Anderson NL. A list of candidate cancer biomarkers for targeted proteomics (2006). Biomarker Insights, 7. 1, (Feb 2006), pp. (1-48), ISSN 1177-2719. Shevchenko AV, Chernushevic I, Shevchenko A, Wilm M & Mann M (2002). "De novo" sequencing of peptides recovered from in-gel digested proteins by nanoelectrospray tandem mass spectrometry. Mol Biotech, 20. 1, (Jan 2002), pp. (107- 118) ISSN 1073-6083. Teisner B, Davey MW & Grudzinskas JG (1983). Interaction between heparin and plasma proteins analyzed by crossed immunoelectrophoresis and affinity chromatography. Clin Chem Acta, 127. 3, (Feb 1983) pp. 413-417, ISSN 0009-8981. Ward LD, Reid, GE, Moritz RL & Simpson RJ (1990). Strategies for internal amino acid sequence analysis of proteins separated by polyacrylamide gel electrophoresis. J Chroma, 519. 1, (Oct 1990), pp. (199-216), ISSN 0021-9673. Wilmarth PA, Riviere MA, Rustvold DL, Lauten JD, Madden TE & David LL (2004). Two dimensional liquid chromatography study of the human whole saliva proteome. J Prot Res, 3. 5, (Sept-Oct, 2004), pp. (1017-1023), ISSN 1535-3893. Wu AJ, Atkinson JC, Fox PC, Baum BJ & Ship JA (1993). Cross-sectional and longitudinal analyses of stimulated parotid salivary constituents in healthy, different-aged subjects. J Geron, 41. 5, (Sept,1993); pp. (M219-224), ISSN 1079-5006.                       Biometrics Edited by Jucheng Yang Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2011 InTech All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source. Statements and opinions expressed in the chapters are these of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book. Publishing Process Manager Mirna Cvijic  Technical Editor Teodora Smiljanic Cover Designer Jan Hyrat Image Copyright Andy Piatt, 2010. Used under license from Shutterstock.com First published July, 2011 Printed in Croatia A free online edition of this book is available at www.intechopen.com Additional hard copies can be obtained from [email protected] Biometrics, Edited by Jucheng Yang p. cm. ISBN 978-953-307-618-8 free online editions of InTech Books and Journals can be found at www.intechopen.com     Contents   Preface IX Part 1 Chapter 1 Physical Biometrics Speaker Recognition Homayoon Beigi 1 3 Chapter 2 Finger Vein Recognition 29 Kejun Wang, Hui Ma, Oluwatoyin P. Popoola and Jingyu Li Minutiae-based Fingerprint Extraction and Recognition 55 Naser Zaeri Non-minutiae Based Fingerprint Descriptor Jucheng Yang Retinal Identification Mikael Agopov 99 79 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Retinal Vessel Tree as Biometric Pattern 115 Marcos Ortega and Manuel G. Penedo DNA Biometrics 139 Masaki Hashiyada Behavioral Biometrics 155 Chapter 7 Part 2 Chapter 8 Keystroke Dynamics Authentication 157 Romain Giot, Mohamad El-Abed and Christophe Rosenberger DWT Domain On-Line Signature Verification 183 Isao Nakanishi, Shouta Koike, Yoshio Itoh and Shigang Li Chapter 9 Jiexin Gao and Dimitrios Hatzinakos Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 217 Francesco Beritelli and Andrea Spadaccini Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 235 Makoto Fukumoto and Hiroki Hasegawa The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 249 Charles F. Streckfus and Cynthia Guajardo-Edwards Chapter 11 Chapter 12 Chapter 13 . Methods and Applications Foteini Agrafioti.VI Contents Part 3 Chapter 10 Medical Biometrics 197 199 Heart Biometrics: Theory.   .   .  behavioural  and  other  points  of  view  (medical  biometrics).  .   The  key  objective  of  the  book  is  to  provide  comprehensive  reference  and  text  on  human authentication and people identity verification from physiological.  As  the  most  reliable  personal  identification. In chapter 5 the retina  scanning technique is considered in detail throughout its historical evolution and try  to use the birefringence of the retinal nerve fiber layer (RNFL) as a basis for successful  identification.  template  matching  with  relative  distance  and  angle  and  wavelet  moment  fusing  with  PCA  and  LDA  transform.  biometrics  is  used  as  a  form  of  identity  access  management  and  access  control. In chapter 2 the author proposes some new algorithms for finger  vein  recognition  such  as  using  oriented  filtering. Chapter 4 provides a comprehensive idea about  some of the well‐known non‐minutiae based descriptors during the last two decades  and also proposes a novel non‐minutiae based fingerprint descriptor with tessellated  invariant moment features and Support Vector Machine (SVM).  Chapter  6  proposes  a  fully‐automatic  authentication  system  using  the  retinal  vessel  tree  pattern  as  biometric  characteristic. DNA is intrinsically digital and does not change during a person’s life.  It  aims  to  publish  new  insights  into  current innovations in computer systems and technology for biometrics development  and its applications.  particularly.  and  even  after  death.  In  chapter  7  the  author  proposes  a  method  for  generating  a  personal  ID  comprising  short  tandem  repeat  (STR)  and  single  nucleotide  polymorphism (SNP) information which are used in personal identification in forensic  application.   The book consists of 13 chapters.  In  computer  science. each focusing on a certain aspect of the problem. The  book  chapters  are  divided  into  three  sections:  physical  biometrics.  In  the  first  physical  biometrics  section.  It  is  also  used  to  identify individuals in groups that are under surveillance.  Preface   Biometrics  uses  methods  for  unique  recognition  of  humans  based  upon  one  or  more  intrinsic  physical  or  behavioral  traits.  behavioral  biometrics  and  medical  biometrics.  In  chapter  3  the  author  gives  the  recent  advancements  in  the  field  of  minutia‐based  fingerprint extraction and recognition.  Chapter  1  provides  an  in‐depth  look  at  speaker  recognition  and  address many practical and algorithmic issues related to the design and utilization of  speaker recognition.  there  are  seven  chapters.  Dr. Norman Poh.  Chapter  11  proposes  the  usage  of  heart  sounds  for  biometric  recognition. and some guest editors.  medical  biometrics  constitutes  another  category  of  new  biometric recognition modalities that encompasses signals which are typically used in  clinical  diagnostics.  methods  and  applications. Jiangxi province. Girija Chetty. since observing temporal change in heartbeat is important and  contributes to improvement of exposure method of music and sound. Jucheng Yang   Professor  School of Information Technology.  China        .X Preface In  Section  2.  In  section  3. Jucheng Yang. Dongsun  Park. such as  Dr.  so  chapter  10  gives  a  survey  on  heart  biometrics  with  its  theory.      Dr. Dr. Sook Yoon and other. Dr. Loris Nanni.  two  kinds  of  behavioral  biometrics:  keystroke  dynamics  and  DWT  domain  on‐line  signature  verification  are  introduced  in  chapter  8  and  chapter  9  respectively. In chapter 13 the  author proposes the use of saliva protein profiling as a biometric tool to authenticate  the presence of carcinoma among women.  Jiangxi University of Finance and Economics. Dr.  The book was reviewed by editor Dr.  describes  the  strengths  and  the  weaknesses  of  the  novel  trait  and  analyzes in  detail  the  methods developed  so  far  and  their  performance. Jianjiang Feng.  Chapter  12  investigates  the  temporal  change  in  heartbeat  intervals  in  a  transition  between  different sound stimuli. Dr.   Nanchang. . . Part 1 Physical Biometrics . . speaker recognition is in its early stages of infancy. with the capability of tracking. Once a model is established and has been associated with an individual. but are certainly not limited to. Inc.A. Campbell. 2004. 1959). and segmentation by extension. fully automated speaker recognition systems have been developed and are in use (Beigi. 2011). There are countless number of applications for the different branches of speaker recognition. It needs not be mentioned that with the growing number of cellular (mobile) telephones and their ever-growing complexity. Initial speaker recognition techniques relied on a human expert examining representations of the speech of an individual and making a decision on the person’s identity by comparing the characteristics in this representation with others. In the recent decades. Shearme & Holmes. A coverage of most of the aspects is presented. more in the form of a comprehensive summary of the field with an ample number of references for the avid reader to follow. we have tried to present the material. In a somewhat different approach. as it has been done before. partly because of the limited development in the field. not just in the form of a list of different algorithms and techniques used for handling part of the problem. new instances of speech may be assessed to determine the likelihood of them having been generated by the model of interest in contrast with other observed models. 1954. This is partly due to unfamiliarity of the general public with the subject and its existence. This makes speaker recognition quite valuable and unrivaled in many real-world applications. This may be a mathematical model of the physiological system producing the human speech or simply a statistical model with similar output characteristics as the human vocal tract. . The most popular representation was the formant representation. As for the importance of speaker recognition. in terms of deployment.S.. one or more of the speaker recognition branches may be used. The earliest known papers on speaker recognition were published in the 1950s (Pollack et al. speaker recognition will become more popular in the future. A speaker recognition system first tries to model the vocal tract characteristics of a person. 2005). and review papers published in the recent years (Bimbot et al.. Introduction Speaker Recognition is a multi-disciplinary technology which uses the vocal characteristics of speakers to deduce information about their identities. and classification of individual speakers. 1. These include. surveys. There have been a number of tutorials.0 1 Speaker Recognition Homayoon Beigi Recognition Technologies. verification. However. It is a branch of biometrics that may be used for identification. Furui. This is the underlying methodology for all speaker recognition applications. financial. 1997. U. detection. If audio is involved. it is noteworthy that speaker identity is the only biometric which may be easily tested (identified or verified) remotely through the existing infrastructure. namely the telephone network. It requires the knowledge of phonetics. integral transforms. either congenitally or due to an illness like glaucoma. it is important to clarify the terminology and to describe the problems of interest by reviewing the different manifestations and modalities of this biometric. SVM. The avid reader is recommended to refer to (Beigi. People. Speaker recognition encompasses many different areas of science. 1983. in general. such as some who are blind. Some of these markers have been discussed in other chapters of this book. Parameter estimation and learning techniques are used in HMM. fingerprint and palm recognition. access control and security. is used in the form of complex variables theory. teleconferencing. linguistics and phonology. Finally. According to the National Institute of Standards and Technology (NIST). Then. All that can be done here is to scratch the surface and to speak about the inter-relations among these topics to create a complete speaker recognition system. Also applied math. including the details of the underlying theory. surveillance. gait. such as those who have either lost their hands or their fingers in an accident or those who congenitally lack fingers or limbs. have the problem of not being able to identify people with damaged fingers. people who work with their hands. The vast domain of the field does not allow for a thorough coverage of the subject in a venue such as this chapter. may not be recognized through iris recognition. statistics. probability theory. but they certainly exist. let us briefly review different biometrics in contrast with speaker recognition. Once the problems are clearly posed and the challenges are understood. Then there is statistical learning theory which is used in the form of maximum likelihood estimation. It is very hard to tell the size of this population. or maybe people without limbs. and proctorless distance learning Beigi (2009). vein recognition. this is about 2% of the population! Also. and neural networks (NNs). maximum a-posteriori probability. at the core of the subject. one would need a high quality image of the iris to perform recognition. Signal processing which by itself is a vast subject is also an important component. hand and finger geometry. iris and retinal recognition. Afterwards. Some of the most popular biometrics are Deoxyribonucleic Acid (DNA). concluding remarks are made about the current state of research on the subject and its future trend. some of the challenges faced in achieving a practical system are listed. Tosi. In addition. latex prints of finger patterns may be used to spoof some sensors. the state of the art in addressing the problems at hand is briefly surveyed in a section on theory. a quick review of the production and the processing of speech by humans is presented. NN. for example. 2. face recognition. etc. handwriting. These are. thermography. Artificial intelligence techniques appear in the form of sub-optimal searches and decision trees. Then. Although there are long distance iris imaging cameras. To start. construction workers. hidden Markov models (HMMs). likelihood linear regression.4 2 Will-be-set-by-IN-TECH Biometrics forensic and legal (Nolan. and keystroke recognition. Information theory is at its basis and optimization theory is used in solving problems related to the training and matching algorithms which appear in support vector machines (SVMs). and other underlying methods. with damaged irides. and many other mathematical domains such as wavelet analysis. as popular as they are. their field of vision may easily be . Comparison with other biometrics There have been a number of biometrics used in the past few decades for the recognition of individuals. Acquiring these images is quite problematic. image-based and acoustic ear recognition. 2011) for a comprehensive treatment of the subject. Fingerprints. audio/video indexing and diarization. Additionally. A comparison of voice with some other popular biometrics will clarify the scope of its practical usage. 1979). and other techniques. These combinations increase the accuracy of the identification or verification of the individual based on the fact that the information is obtained through different. Speech may be used for doing remote recognition of the individual using the existing telephone network and without the need for any extra hardware or other apparatus. speech has been established for communication and people are far less likely to be concerned about parting with their speech. blinking. many individuals are wary of producing a fingerprint for fear of its malicious usage or simply due to the criminal connotation it carries. namely microphones. This action offended many tourists to the point that some countries such as Brazil placed a reciprocal system in place only for U. The latest figures for the population of deaf and mute people in the United States reflected by the US Census Bureau set this percentage at 0. iris. rolling of the eyes. 2000) shows an example of such a multimodal approach using speaker and image recognition. since any one technique may be bypassed by the eager impostor. wearing of hats. 2005). therefore they will not be able to use speaker recognition systems. (Viswanathan et al. either by wearing contact lenses or by simply using an image of the target individual’s irides. Many people entering the U. in the form of tracking and detection may be used to do much more than simple identification and verification of individuals. the use of speech for telephone communication makes it much more socially acceptable. In general. Due to these legacy associations. the United States government required capturing the image and fingerprint of all tourists entering the nation’s airports. a fingerprint scanner and a camera would have to be present. since cellular telephones already have speech capture devices. It would be much more difficult to fool several independent biometric systems simultaneously. such as a full diarization of large media databases. Spoofing. there is also a percentage of the population who are unable to speak. Also. they are somewhat harder to capture. Even in the technological arena. Of course. On the other hand.S. Multimodal biometrics entail systems which combine any two or more of these or other biometrics. Speaker recognition can also utilize the widely available infrastructure that has been around for so long. for fingerprint and image recognition. The image may also not be acceptable due to lighting and focus conditions. Many of the above biometrics may be successfully combined with speaker recognition to produce viable multimodal systems with much higher accuracies. Because facial. fingerprint recognition has long been associated with criminology.. around the world.Speaker Recognition Speaker Recognition 5 3 blocked by uncooperative users through the turning of the head. it is being tolerated by travelers much better. only based on the act of fingerprinting. since many other countries have been adopting the fingerprint capture requirement. Also. Another attractive point is that cellular telephone and PDA-type data security needs no extra hardware. glasses. the public is more wary of providing such information which may be archived and misused. irides tend to change due to changes in lighting conditions as the pupils dilate or contract. mostly independent sources. As an example. speaker recognition. It is also possible to spoof some iris recognition systems. felt like they were being treated as criminals. . Most PDAs also contain built-in microphones. On the other hand.4% for deaf and mute individuals (USC. Most practical implementations of biometric system will need to utilize some kind of multimodal approach. namely the telephone network. citizens entering their country. a few years ago. In terms of public acceptance. Of course. and fingerprints have a sole purpose of being used in recognition.S. etc. retinal images. using recordings is also a concern in practical speaker recognition systems. the rest do not present any additional information. An example would be a software program which would identify the audio of a speaker so that the interaction environment may be customized for that individual. This is usually done by building a mathematical model of a sample speech from the user and storing it in association with an identifier. the audio of the test speaker is compared against all the available speaker models and the speaker ID of the model with the closest match is returned. talker identification. voiceprint identification. Therefore. like anything else. If the speaker does not exist in the database. the enrollment sample may not be reconstructed from the model. In this case. has been mistakenly applied to speech recognition and these terms have become synonymous for a long time. there is no rejection scheme. talker clustering. still. They include. the ID of one of the speakers in the database will always be closest to the audio of the test speaker. In closed-set identification. If the test speaker matches the target speaker based on the ID. In a speech recognition application. it is recommended that the use of this term is avoided. The term voice recognition has been used in some circles to double for speaker recognition. Speech recognition research has been around for a long time and. some of which have caused great confusion. naturally. With the exception of the term speech biometrics which also introduces the addition of a speech knowledge-base to speaker recognition. then the ID is accepted . Alas. it is not the voice of the individual which is being recognized. For example. the child will match against one of the adult male speakers in the database. a closed-set identification may be conducted and the resulting ID may be used to run a speaker verification session. In fact. 3. Terminology and manifestations In addressing the act of speaker recognition many different terms have been coined. some match needs to be returned just to be able to pick a customization profile. there is no great loss by making a mistake. In practice. One may imagine a case where the test speaker is a 5-year old child where all the speakers in the database are adult males. but the contents of his/her speech. the top best matching candidates are returned in a ranked list. speech biometrics. closed-set identification is not very practical.2 Speaker identification There are two different types of speaker identification. One term that has added to this confusion is voice recognition. a myriad of different terminologies have been used to refer to this subject. Other than the aforementioned. In closed-set Identification. Voice recognition. in which case rejection will become necessary. Of course. In close-set identification. returned from the closed-set identification. voice biometrics. and so on. voice identification. 3. the term has been around and has had the wrong association for too long. then there is generally no difference in what profile is used. Closed-set identification is the simpler of the two problems. Although it is conceptually a correct name for the subject. unless profiles hold personal information. Open-set identification may be seen as a combination of closed-set identification and speaker verification. closed-set identification also has its own applications. in the past. with corresponding confidence or likelihood scores. closed-set and open-set.6 4 Will-be-set-by-IN-TECH Biometrics 3. there is some confusion in the public between speech and speaker recognition. biometric speaker identification.1 Speaker enrollment The first step required in most manifestations of speaker recognition is to enroll the users of interest. usually. This model is usually designed to capture statistical information about the nature of the audio sample and is mostly irreversible – namely. if the verification fails. do so in the text-prompted flavor. There are also variations of these two modalities such as text-prompted. Also. There is always a need for contrast when making a comparison. et cetera). an identification number. The provided ID is used to retrieve the enrolled model for that person which has been stored according to the enrollment process. a username. the person being verified (known as the test speaker).1 Speaker verification modalities There are two major ways in which speaker verification may be conducted. using such text-dependent modality will be open to spoofing attacks. A more flexible modality is the text-independent modality in which case the texts of the speech at the time of enrollment and verification are completely random. These two are called the modalities of speaker verification and they are text-dependent and text-independent. If it is closer to the target model. usually by non-speech methods (e. Of course in reality. The chosen text is the prompt and the modality is called text-prompted.Speaker Recognition Speaker Recognition 7 5 and passed back as the true ID of the test speaker. Therefore. then the user is verified and otherwise rejected.. described earlier. The speaker verification problem is known as a one-to-one comparison since it does not necessarily need to match against every single person in the database. This enrolled model is called the target speaker model or the reference model. An open-set identification problem is therefore at least as complex as a speaker verification task (the limiting case being when there is only one speaker in the database) and most of the time it is more complex. as stated – comparison against the target model and the competing model(s).. The speech signal of the test speaker is compared against the target speaker model to verify the test speaker. The difficulty with this . one or more competing models should also be evaluated to come to a verification decision. 3. Of course. In a purely text-dependent modality. there is more than one comparison for speaker verification. the speaker is required to utter a predetermined text at enrollment and the same text again at the time of verification. the test speaker should be compared against all speaker models in the database – in practice this may be avoided by tolerating some accuracy degradation (Beigi et al. the complexity generally increases linearly with the number of speakers enrolled in the database since theoretically. another way of looking at verification is as a special case of open-set identification in which there is only one speaker in the list. It is only valid for verification. In fact. one of those texts is requested to be uttered by the test speaker.3 Speaker verification (authentication) In a generic speaker verification application. This means that the enrollment may be done for several different textual contents and at the time of verification. language-independent text-independent and language-dependent text-independent.g.3. the audio may be intercepted and recorded to be used by an impostor at the time of the verification. Practical applications that use the text-dependent modality. The competing model may be a so-called (universal) background model or one or more cohort models. the complexity of the matching does not increase as the number of enrolled subjects increases. in a database. namely. Therefore. Text-dependence does not really make sense in an identification scenario. 3. The final decision is made by assessing whether the speech sample given at the time of verification is closer to the target model or to the competing model(s). the speaker may be rejected all-together with no valid identification result. 1999). On the other hand. In practice. comparison against the target speaker’s model is not enough. identifies himself/herself. and event classification. any recording of the person’s voice should now get an impostor through. the text for text-prompted and text-dependent engines must still be designed to cover enough variation to allow for a meaningful comparison. may reduce the error rates. classifying male and female is not so simple in children since their vocal characteristics are quite similar before the onset of puberty. In fact. because of the coverage issue discussed earlier. pooling enrollment data from like classes together. blasts. a host makes an appointment for a conference call and notifies attendees to call a telephone number and to join the conference using a special access code. since they are concerned with the vocal tract characteristics of the individual. 3. the technique are used for doing event classification such as classifying speech.(Beigi. However. detection and tracking Automatic segmentation of an audio stream into parts containing the speech of distinct speakers. horns. Gender classification. Therefore. tries to separate male speakers and female speakers. music. music. since the enrollment and verification texts are identical. mostly governed by the shape of the speaker’s vocal tract. The general tendency is to believe that in the text-dependent and text-prompted cases. diarization. some researchers have developed text-independent systems which have some internal models associated with phonemes in the language of their scope. but somewhat language-dependent speaker verification system. Different specialized recognizers may be used for recognition of distinct categories of audio in a stream. 2011). The long samples increase the probability of better coverage of the idiosyncrasies of the person’s vocal characteristics. either there may be no enrollment or enrollment may be done differently. The language limitations reduce the space and. Some examples of special enrollment procedures are. usually. whistles. gun shots. sometimes. etc. text-independent speaker verification algorithms are also language-independent. as is apparent from its name. Also. Some examples of the many classification scenarios are gender classification. More advanced versions also distinguish children and place them into a separate bin. The feature selection and processing methods for classification are mostly dependent on the scope and could be different from mainstream speaker recognition. In a tele-conference. screams. The problem of spoofing is still present with text-independent speaker verification. These techniques produce a text-independent. Although these methods are called speaker classification. depending on the problem at hand. hence. In most cases.4 Speaker and event classification The goal of classification is a bit more vague. etc.8 6 Will-be-set-by-IN-TECH Biometrics method is that because the texts are presumably different. they can be designed to be much shorter.5 Speaker segmentation. This type of segmentation is elementary to the practical considerations of speaker recognition as well as speech and other audio-related recognition systems. age classification. using extra features in supplemental codebooks related to specific natural or logical specifics of the classes of interest. For this reason. It is the general label for any technique that pools similar audio signals into individual bins. text-independent systems would generally be used with another source of information in a multi-factor authentication scenario. There is an increasing . An example is the ever-growing tele-conferencing application. One must be careful. since the shorter segments will only examine part of the dynamics of the vocal tract. noise. longer enrollment and test samples are needed. 3. Classification may use slightly different sets of features from those used in verification and identification. and different background conditions has many applications. it is necessary to know the speaker of each statement. This layer uses knowledge from the speaker to test for liveness or act as an additional authentication factor. It is somewhat related to the text-prompted modality with the difference that there is another abstraction layer in the design. In case one is only interested in a specific speaker and where that speaker has spoken within the conversation (the timestamps). A practical speaker recognition system utilizing speech recognition and natural language understanding 4. Challenges of speaker recognition Aside from its positive outlook such as the established infrastructure and simplicity of adoption. at the enrollment time. too. 3. speaker recognition. If an enrolled model exists for each speaker. As an example. The content is parsed by doing a transcription of the audio and using a natural language understanding (Manning. the process is called speaker tracking. As an example. assume using a specific microphone over a channel such as a cellular communication channel with all the associated band-limitations and noise conditions in one session of using . 1999) system to parse for the information of interest. is filled with difficult challenges for the research community. At the verification time.Speaker Recognition Speaker Recognition 9 7 interest from the involved parties to obtain transcripts (minutes) of these conversations. then prior to identifying the active speaker (speaker detection). 1. randomized questions may be used to capture the test speaker’s audio and the content of interest. Fig. Channel mismatch is the most serious difficulty faced in this technology. This will increase the factors in the authentication and is usually a good idea for reducing the chance of successful impostor attacks – see Figure 1. In order to fully transcribe the conversations. When speaker segmentation is combined with speaker identification and the resulting index information is extracted. specific information such as a Personal Identification Number (PIN) or other private data may be stored about the speakers. the audio of that speaker should be segmented and separated from adjoining speakers.6 Knowledge-based speaker recognition (speech biometrics) A knowledge-based speaker recognition system is usually a combination of a speaker recognition system and a speech recognizer and sometimes a natural language understanding engine or more. the process is called speaker diarization. a person in two different sessions. Of course. sirens. If one considers the variations across sensors. Inter-class variations denote the difference between two different individuals while intra-class variations represent the variation within the same person’s voice in two different sessions. although they would probably not be as pronounced as the variations in microphone conditions. creating variations from one session to another. It is a problem that haunts almost all biometrics. all that the system would learn about the identity of the individual is tainted by the channel characteristics through which the audio had to pass. different results may be obtained even in fingerprint recognition. a completely different channel could be used. this time. For instance. as stated earlier. The signal variation problem. The existence of wide intra-class variations compared with inter-class variations makes it difficult to be able to identify a person accurately. In general. However. sledge hammers. cellphone. Channel mismatch is the source of most errors in speaker recognition. an abundance of data is needed to be able to cover all the variations within an individual’s voice. These channel characteristics are basically modulated with the characteristics of the person’s vocal tract. analogous conditions in image recognition would show up in the form of noise in the lighting conditions. which take effect in the course of many years. These conditions affect the recognition rate considerably. Examples would be audio generated in a room with echos or in a street while walking and talking on a mobile (cellular) phone. would possibly have more variation within his/her own voice than if the signal is compared to that of someone else’s voice. in an official implementation. on the way. Therefore. office . timber. But even then. That is a part of aging in itself. for biometrics such as fingerprint recognition. Another problem is signal variability. Time-lapse could be characterized in many different ways (Beigi. But there are also subtle changes that are not that much related to aging and may be habitual or may also be dependent on the environment. or sometimes months. a vendor or an agency may require the use of the same sensor all around. in general uses many different speech apparatuses such as a home phone. at the time of performing the identification or verification. As we grow older. These types of problems are quite specific to speaker recognition. etc. The original purpose of using speech has been to be able to convey a message. we are used to deploying different microphones and channels for this purpose. For example. For example. On the hand. weeks. One is the aging of the individual. who possesses similar vocal traits. 2009). They would have specific characteristics in terms of dynamics. and similar noise sources being heard in the background.10 8 Will-be-set-by-IN-TECH Biometrics a speaker recognition system. Therefore. is common to most biometrics. Therefore. In fingerprint recognition they appear in the way the fingerprint is captured and related noisy conditions associated with the sensors. These short-term variations could happen within a matter of days. similar problems may show up in different forms in other biometrics. Another group of problems is associated with background conditions such as ambient noise and different types of acoustics. this session can be the enrollment session for instance. color. the technology may more readily dictate the type of sensors which are used. our vocal characteristics change. Some of these variations may be due to aging and time-lapse effects. automobile engines. This is by no means specific to speaker recognition. the person being identified or verified may call from his/her home number or an office phone. cut-off frequencies. Of course. larger variations happen with aging. possibly with fire trucks. One person. These may either be digital phones going through voice T1 services or may be analog telephony devices going through analog switches and being transferred to digital telephone company switches. This is in contrast with most of the language functions (production and perception) in the brain which are processed by the Broca and Wernicke areas in the left (dominant) hemisphere of the cerebral cortex (Beigi.Speaker Recognition Speaker Recognition 11 9 phone. In humans. increase the bit-rate of the signal substantially. generating great redundancy. into a controller block. This block is made up of Gb (s) which makes up those parts of the nervous system . 5. This message is transmitted over a channel starting with the air surrounding the mouth and continuing with electronic devices and networks such as a telephone system to transmit the coded message. 2011). The resulting signal is therefore transmitted to the air surrounding the ear. The brain will then induce neuro-muscular activity to start the vocal tract in vocalizing the message. Earlier. This is a very complex system where the intended message generally contains a very low bit-rate content. as mentioned earlier. We may lump together the different portions of the nervous system at work in generating the control signals which move the vocal tract to generate speech. 2011). Figure 2 depicts a control system representation of speech production proposed by (Beigi. It includes the intended message which is created in the brain. where vibrations travel through different sections of the outer and the middle ear. and finally audio. Although. This small amount of information may easily be tainted by noise throughout this process. and other musical discourse. Gc (s). These signals are then decoded by different parts of the brain and are decoded into linguistic concepts which are understood by the receiving individual. The intended message is embedded in the abstraction which is deduced by the brain from the signal being presented to it. we considered the transformation of a message being formed in the brain into a high-capacity audio signal. Speech generation starts with the speech content being developed in the brain and processed through the nervous system. In reality. the creation of the audio signal from the fundamental message formed in the brain may be better represented using a control system paradigm. generating neural signals which travel through the Thalamus to the brain. this becomes an advantage in terms of ease of adoptability of speaker recognition in existing arenas. tempo. Human speech generation and processing A human child develops an inherent ability to identify the voice of his/her parents before even learning to understand the content of their speech. The abstraction of this message is encoded into a code that will then produce the language (language coding step). Bulk of the work in speaker recognition research is to be able to alleviate these problems. in conjunction with the functions for processing pitch. and even a microphone headset attached to a computer. it also makes the speaker recognition problem much more challenging. neuro-muscular excitation. Catching a cold causes changes to our voice and its characteristics which could create difficulties in performing accurate speaker recognition. We still expect to be able to perform reasonable speaker recognition using this varied set of sensors and channels. although not every problem is easily handled with the current technology. The cochlear vibrations excite the cilia in the inner ear. However. Therefore a low information content is encoded to travel through a high-capacity channel. Let us consider the Laplace transform of the original message being generated in the brain as U (s). Another problem is the presence of vocal variations due to illness. speaker recognition is performed in the right (less dominant) hemisphere of the brain. the way this content undergoes transformation into a language code. The resulting signal. namely air. exciting the transmission medium. Y (s) is the signal which is used to recognize the speaker. Speech production in the Cerebral Cortex – from (Beigi.12 10 Will-be-set-by-IN-TECH Biometrics Fig. 3. 2011) in the brain associated with generating the motor control signals and Gm (s) which is the part of the nervous system associated with delivering the signal to the muscles in the vocal tract. Control system representation from (Beigi. 2011). The output. borrowed from (Beigi. is the Laplace transform of the speech wave. The resulting signal is then transformed by passing through some type of electronic medium through audio capture and communication. Broca’s area which is part of the frontal lobe is . The output of Gc (s) is delivered to the vocal tract which is seen here as the plant. At this point we may model all the noise components and disturbances which may be present in the air surrounding the generated speech. It is called Gv (s) and it includes the moving parts of the vocal tract which are responsible for creating speech. 2. 2011) Figure 3. Fig. H (s). shows the superimposition of the interesting parts of the brain associated with producing speech. Speech recognition. on the other hand. The total time-signal is therefore a convolution of these two signals. the brain has evolved to allow for messages to travel quickly from Gb (s) to Gm (s) by utilizing proximity. Due to the slow transmission of chemical signals. has substantially different characteristics. 2011). 6. The WKS sampling theorem requires that the sampling frequency be at least two times the Nyquist . It may be seen as a part of the Gb (s) in the control system representation of speech production. Theory and current approaches The plasticity of the shape of the vocal tract makes the speech signal a non-stationary signal. The Precentral Gyrus shown in the blue color is a long strip in the frontal lobe which is responsible for our motor control. In text-independent speaker recognition. Pharynx. Together. the first step is to store the vocal characteristics of the speakers in the form of speaker models in a database. but in essence each discipline is concerned with a different part of the signal. This will allow the feedback and refinement of the outgoing message. Figure 4 shows the major portion of the vocal tract which begins with the trachea and ends at the mouth and at the nose. jaw.Speaker Recognition Speaker Recognition 13 11 associated with producing the language code necessary for speech production. To build these models. 6. Note the proximity of these control regions to Broca’s area. It has a very plastic shape in which many of the cavities can change their shapes to be able to adjust the plant dynamics of Figure 2. we are only concerned with learning the characteristics of the carrier signal in Gv (s). One must therefore ensure that the sampling rate is picked in accordance with the guidelines set by the Whittaker-Kotelnikoff-Shannon (WKS) sampling theorem (Beigi. The green part is associated with control of our larynx. Other sources are due to many complex disturbances along the way. is concerned with decoding the intended message produced by Broca’s area. This means that any segment of it. and tongue. Mel Frequency Cepstral Coefficients (MFCCs) – see (Beigi. which is modified by the signal being generated by the Gc (s). which is the coding section.1 Sampling A Discrete representation of the signal is used for Automatic Speaker Recognition. This is the actual plant which was called Gv (s) in the control paradigm (Figure 2). for future reference. indicating that the dynamics of the system producing these sections varies with time. The lower part of the blue section is responsible for lip movement. This is why the signal processing is quite similar between the two disciplines. 2011). certain features should be defined such that they would best represent the vocal characteristics of the speaker of interest. The lower part of this area which is adjacent to Broca’s area is further split into two parts. The separation of these convolved signals is quite challenging and the results are therefore tainted in both disciplines causing a major part of the recognition error. when compared to an adjacent segment in the time domain. The most prevalent features used in the field happen to be identical to those used for speech recognition. these parts make up part of the Gm (s) which is the second box in the controller. Note that Broca’s area is also connected to the language perception section known as Wernicke’s area. namely. Therefore we need to utilize the sampling theorem to help us determine the appropriate sampling frequency to be used for converting the continuous speech signal into its discrete signal representation. based on its inherent dynamics. As mentioned in the Introduction. The vocal tract produces a carrier signal. For simplicity. . is defined as two times the Nyquist critical frequency. 1918) Critical frequency. The Nyquist critical frequency is really the highest frequency content of the analog signal. In this representation. Source: Gray’s Anatomy (Gray. 4. where the impulses happen at the chosen sampling frequency. 5. which acts like the multiplication of an impulse train with the analog signal. Block diagram of a typical sampling process Figure 5 shows a typical sampling process which starts with an analog signal and produces a series of discrete samples at a fixed frequency.14 12 Will-be-set-by-IN-TECH Biometrics Fig. The Nyquist rate. The sampling theorem may be stated in words by requiring that the sampling frequency be greater than or equal to the Nyquist rate. Fig. Sagittal section of Nose. Pharynx. and Larynx. Mouth. The discrete samples are usually stored using a Codec (Coder/Decoder) format such as linear PCM. normally an ideal sampler is used. each sample has a zero width and lasts for an instant. μ-Law. representing the speech signal. For example.. the spectrogram of the signal. band-limitation.2 Feature extraction Cepstral coefficients have fallen out of studies in exploring the arrival of echos in nature (Bogert et al. In its quantized form. 6. The frequency domain of the signal in computing the MFCCs is warped to the Melody (Mel) scale. the stronger the signal strength in that frequency for the time slice of choice. The simplest one is the speech waveform which is basically the plot of the sampled points versus time. There are models of the . 1963). The system of Figure 5 should be designed so that it reduces aliasing. Mostly everyone is familiar with having to qualify fricatives on the telephone by using statements such as “S” as in “Sam” and “F” as in “Frank”. Narrowband spectrogram of a speech signal Another representation is. In general. the amplitude is normalized to dwell between −1 and 1. It is really a three-dimensional representation where the z-axis is depicted by the darkness of the points on the figure. The darker the pixel. so-called. truncation. An artifact of the narrowband spectrogram is the existence of the horizontal curved lines across time. Figure 7 shows how most of the fricative information is lost going from a 22 kHz sampling rate to 8 kHz. Standardization is quite important for interoperability with different recognition engines (Beigi & Markowitz. Normal telephone sampling rates are at best 8 kHz. the data is stored in the range associated with the quantization representation. There are different forms for representing the speech signal. 6. it would go from −32768 to 32767. Figure 6 shows the narrowband spectrogram of a signal. They are related to the spectrum of the log of spectrum of a speech signal. 2010). for a 16-bit signed linear PCM. It is based on the premise that human perception of pitch is linear up to 1000 Hz and then becomes nonlinear for higher frequencies (somewhat logarithmic). etc. The spectrogram shows the frequency content of the speech signal as a function of time. Fig. A sliding widow of 23 ms was used for to generate this figure. such as the sampling rate and volume normalization. A speech waveform representation has also been plotted on top of the spectrogram of Figure 6 to show the relation between different points in the waveform with their corresponding frequency content in the spectrogram.Speaker Recognition Speaker Recognition 15 13 a-Law. and jitter by choosing the right parameters. Framing – Selecting a sliding section of the signal with a fixed width in time which is then moved with some overlap. also known as AutoRegressive (AR) features by themselves: Linear Predictive Coefficients (LPC). This method usually entails the following steps: 1. 4. MFCC – The Mel Frequency Cepstral Coefficients (MFCC) are computed. Windowing – A window such as a Hamming. 2011). 5. Welch. 1997). Hann.16 Biometrics Fig. There are also the Perceptual Linear Predictive (PLP) (Hermansky. and so on (Beigi. etc. Some use the Linear Predictive. also known as Moving Average (MA) which utilizes the Fast Fourier Transform (FFT) for the first pass and the Discrete Cosine Transform (DCT) for the second pass to ensure real coefficients. 2011). However. Mel Cepstral Dynamics – Delta and Delta-Delta Cepstra are computed based on adjacent MFCC values. Frequency Warping – The FFT results are warped in the frequency domain in accordance with the Melody (Mel) or Bark scale. There have been an array of other features used such as wavelet filterbanks (Burrus et al. 3. 6. 1990) features. for example in the form of Mel-Frequency Discrete Wavelet Coefficients and Wavelet Octave Coefficients of Residues (WOCOR). . 7. or log area ratios. sampled at 22kHz (left) and 8kHz (right) human perception based on other warped scales such as the Bark scale. The domain is changed from magnitudes and frequencies to loudness and pitch (Beigi. Partial Correlation (PARCOR) – also known as reflection coefficients. They may be computed using the Direct Method. These features come in different flavors such as Empirical Mode Decomposition (EMD). shown in Figure 9. is used to smooth the signal for the computation of the Discrete Fourier Transform (DFT). 2011). The sliding window is generally about 30ms with an overlap of about 20ms (10ms shift). PLP works by warping the frequency and spectral magnitudes of the speech signal based on auditory perception tests.. Utterance: “Sampling Effects on Fricatives in Speech”. Mel Cepstrum Modulation Spectrum (MCMS). mostly the LPCs are converted to cepstral coefficients using autocorrelation techniques (Beigi. FEPSTRUM. There are also Instantaneous Amplitudes and Frequencies which are in the form of Amplitude Modulation (AM) and Frequency Modulation (FM). These are called Linear Predictive Cepstral Coefficients (LPCCs). There are several ways of computing Cepstral Coefficients. FFT – The Fast Fourier Transform (FFT) is generally used for approximating the DFT of the windowed signal. 2. Therefore.Speaker Recognition Speaker Recognition 30 17 15 20 10 0 MFCC Mean −10 −20 −30 −40 −50 −60 0 2 4 6 8 10 12 Feature Index 14 16 18 20 Fig. As a special case of Hidden Markov Models.3 Speaker models Once the features of interest are chosen. 8. Addition of features extracted from silent areas in the speech will increase the similarity of models. Models are usually based on HMMs. Other preprocessing such as Audio Volume Estimation and normalization and Echo Cancellation may also be necessary for obtaining desirable result (Beigi. For example. This is probably the most popular technique which is used in this field. a multi-state ergodic Hidden Markov Models is usually used for text-dependent speaker recognition since there is textual context. SVMs. 2011) Fig. GMMs. Only segments with vocal signals should be considered for recognition. 2011) is quite important for better results.3. 6. models are built based on these features to represent the speakers’ vocal characteristics. and NNs. A sample MFCC vector – from (Beigi. .1 Gaussian Mixture Models (GMM) In general. different methods may be used. GMMs are basically single-state degenerate HMMs. A typical Perceptual Linear Predictive (PLP) system It is important to note that most audio segments include a good deal of silence. At this point. 9. 6. depending on whether the system is text-dependent (including text-prompted) or text-independent. Gaussian Mixture Models (GMM) are used for doing text-independent speaker recognition. since silence does not carry any information about the speaker’s vocal characteristics. there are many different modeling scenarios for speaker recognition. Most of these techniques are similar to those used for speech recognition modeling. Silence Detection (SD) or Voice Activity Detection (V AD) (Beigi. 2011). The Variance-Covariance matrix of a multi-dimensional random variable is defined as. In a GMM. 2011). one should make a few assumptions and decisions. Σ is given by the following expression. they usually use a GMM to initialize Hidden Markov Models (HMMs) (Poritz. 1 N −1 x (3) N i∑ i =0 where N is the number of samples and xi are the Mel-Cepstral feature vectors (Beigi. Base models. μ≈ Σ = E (x − E {x}) (x − E {x}) T Δ (4) (5) = E xx T − μμ T This matrix is called the Variance-Covariance since the diagonal elements are the variances of the individual dimensions of the multi-dimensional vector. A popular technique is the use of a Gaussian Mixture Model (GMM) (Duda & Hart. This step is called training. etc. Standard clustering techniques are usually used for the initial determination of the Gaussians. . This is dependent on the amount of data that is available and the dimensionality of the feature vectors. 1988) built to have an inherent model of the content of the speech as well. ˜ The Unbiased estimate of Σ . μ ∈ Rd Σ : Rd → Rd ˆ In Equation 1. Once the number of Gaussians is determined. but. μ is the mean vector where. This is mostly relevant to the text-independent case which encompasses speaker identification and text-independent verification. This distribution is represented by Equation 1. The off-diagonal elements are the covariances across the different dimensions. μ = E {x} = Δ Δ ∞ −∞ x p(x)dx (2) The so-called “Sample Mean” approximation for Equation 2 is. Some have called this matrix the Variance matrix. The models generated by training are called by many different names such as background models. p(x) = 1 exp − (x − μ ) T Σ −1 (x − μ ) 2 (2π ) |Σ | d 2 1 2 1 (1) where x. as the Covariance matrix which is the name we will adopt here. Even text-dependent techniques can use GMMs. simply. x. the models are parameters for collections of multi-variate normal density functions which describe the distribution of the Mel-Cepstral features (Beigi. speaker independent models. universal background models (UBM). 2011) for speakers’ enrollment data. The first assumption is the number of Gaussians to use. some large pool of features is used to train these Gaussians (learn the parameters). Many speaker diarization (segmentation and ID) systems use GMMs.18 16 Will-be-set-by-IN-TECH Biometrics The models are tied to the type of learning that is done. To build a Gaussian Mixture Model of a speaker’s speech. 1973) to represent the speaker. Mostly in the field of Pattern Recognition it has been referred to. The claim-to-fame of support vector machines (SVMs) is that they determine the boundaries of classes. generally. by using separate male and female databases for the training. At this stage.2 Support vector machines Support vector machines (SVMs) have been recently used quite often in research papers regarding speaker recognition.. which are located closest to the decision boundary. which is based on minimizing the classification error of both the training data and some unknown (held-out) data. the system is ready for performing the enrollment. 1998)) to present a solution in terms of a linear combination of a subset of observed (training) vectors. 6.. using standard techniques such as maximum likelihood estimation (MLE). 1998. and they have the capability of maximizing the margin of class separability in the feature space. There may be one or more Background models. 1999) are built in a similar fashion. Cohort models(Beigi et al. the core of support vector machines and other kernel techniques stems from . 2000) or cohort models are desired. 1992) states that the number of parameters used in a support vector machine is automatically computed (see Vapnik-Chervonenkis (VC) dimension (Burges.. Of course. A cohort is a set of speakers that have similar vocal characteristics to the target speaker. Maximum a-Posteriori (MAP) adaptation and Maximum Likelihood Linear Regression (MLLR). The enrollment may be done by taking a sample audio of the target speaker and adapting it to be optimal for fitting this sample. This information may be used as a basis to either train a Hidden Markov Model including textual context. Vapnik. most of the implementations suffer from huge optimization problems with large dimensionality which have to be solved at the training stage. depending on whether a universal background model (UBM) (Reynolds et al.3. some create a single background model called the UBM. This ensures that the likelihoods returned by matching the same sample with the modified model would be maximal. Results are not substantially different from GMM techniques and in general it may not be warranted to use such costly optimization calculations. S xx = N −1 i =0 ∑ xi xi T (8) After the training is done. For example. or to do an expectation maximization in order to come up with the statistics for the underlying model. based on the training data. others may build one for each gender. different processing is done. the basis for a speaker independent model is built and stored in the form of the above statistics.Speaker Recognition Speaker Recognition 19 17 ˜ Σ = 1 N −1 (x − μ )(xi − μ ) T N − 1 i∑ i =0 1 S xx − N (μμ T ) N−1 (6) (7) = where the sample mean μ is given by Equation 3 and the second order sum matrix. Vapnik (Vapnik. a pool of speakers is used to optimize the parameters of the Gaussians as well as the mixture coefficients. These vectors are called support vectors and the model is known as a support vector machine. For a UBM. (Boser et al. 1979) pioneered the statistical learning theory of SVMs. S xx is given by. Although they show very promising results. At this point. to speed up the training process. suggesting that part of the information revealed by the two approaches may be complementary (Solomonoff et al. In application to speaker recognition.4 Model adaptation (enrollment) For a new person being enrolled in the system. One such technique is known as cascade SVM (Tveit & Engum. Another problem is that the Kernel function being used by SVMs is almost magically chosen. or maximum likelihood linear regression for text-independent systems or simply by modifying the counts of the transitions on a hidden Markov model (HMM) for text-dependent systems. 2011) for details. Speaker recognition At the identification and verification stage. See (Beigi. To address this issue. several techniques based on the decomposition of the problem and selective use of the training data have been proposed. see (Beigi. Hilbert (Hilbert. for example. 2005). Some of the shortcomings of SVMs have been addressed by combining them with other learning techniques such as fuzzy logic and decision trees. experimental results have shown that SVM implementations of speaker recognition are slightly inferior to GMM approaches. This is done by any technique such as maximum a-posteriori probability estimation (MAP).3.3 Neural networks Another modeling paradigm is the neural network perspective. TDNNs. In general SVMs are two-class classifiers. Hierarchical Mixtures of Experts or HMEs. This can become quite computationally intensive for large-scale speaker identification problems. There are quite a number of different neural networks and related architectures such as feed forward networks. the base speaker-independent models are modified to match the a-posteriori statistics of the enrolled person or target speaker’s sample enrollment speech. 6. In the identification process.. etc. a new sample is obtained for the test speaker. new techniques have been developed to split the problem into smaller subproblems which would then be solved in parallel as a network of problems. using expectation maximization (EM). However. it has also been noted that systems which combine GMM and SVM approaches often enjoy a higher accuracy. probabilistic random access memory or pRAM models. 2004). One of the major problems with SVMs is their intensive need for memory and computation power at the training stage.. N-class classification problems such as speaker identification have to be reduced to N two-class classification problems where the ith two-class problem compares the ith class with the rest of the classes combined (Vapnik. 2003) for which certain improvements have also been proposed in the literature (Zhang et al. It would take an enormous amount of time to go through all these and other possibilities. That’s why they are suitable for the speaker verification problem which is a two-class problem of comparing the voice of an individual to his/her model versus a background population model.3. Training of SVMs for speaker recognition also suffers from these limitations. The identity of the model that returns the highest likelihood is returned as the identity of the test speaker. In identification. 2011). the sample is used to compute the likelihood of this sample being generated by the different models in the database. 1912) was one of the main developers of the formulation of integral equations and kernel transformations. 7. 1998). For a detailed coverage. Also.20 18 Will-be-set-by-IN-TECH Biometrics much earlier work on setting up and solving integral equations. 6. the . 2011). An extension of speaker recognition is diarization which includes segmentation followed by speaker identification and sometimes verification. In the case of speaker verification. 2000) has been more prevalent. 2000). To ensure a good dynamic range and better discrimination capability. log of the likelihood is computed. Figure 10 shows such a results for a two-speaker segmentation. They may also be presented as the error rate based on the true result being present in the top N matches. called the Detection Cost Function (DCF) (Martin & Przybocki. The comparison is done using the Log Likelihood Ratio (LLR) test. with a measurement of the cost of producing the results. If the target speaker model provides a better log likelihood. Segmentation and labeling of two speakers in a conversation using turn detection followed by identification 7.Speaker Recognition Speaker Recognition 21 19 results are usually ranked by these likelihoods. In the early days in the field. Bayesian Information Criterion (BIC) (Chen & Gopalakrishnan. Once the initial segmentation is done.. 1998) have been used for the initial segmentation of the audio. with the exception that instead of computing the log likelihood for all the models in the database. At the verification stage. the test speaker is verified and otherwise rejected. the Detection Error Trade-Off (DET) curve (Martin et al. The segmentation finds abrupt changes in the audio stream. 1998) and Generalized Likelihood Ratio (GLR) techniques and their combination (Ajmera & McCowan. For the past decade. the process becomes very similar to the identification process described earlier. . a limited speaker identification procedure allows for tagging of the different parts with different labels. This case is usually more prevalent in the cases where identification is used to prune a large set of speakers to only a handful of possible matches so that another expert system (human or machine) would finalize the decision process. Martin & Przybocki. the sample is only compared to the model of the target speaker and the background or cohort models. 2004) as well as other techniques (Beigi & Maes. a Receiver Operating Characteristic (ROC) curve was used (Beigi. 1 0 −1 4000 0 5 10 15 20 25 30 35 40 45 50 Speaker A Speaker B Speaker A Speaker B Speaker A 45 3000 Frequency (Hz) 2500 2000 1500 1000 500 0 0 5 10 15 20 25 30 35 40 Unknown Speaker 3500 Speaker B 50 Time (s) Fig.1 Representation of results Speaker identification results are usually presented in terms of the error rate. 1997. 10. the method of presenting the results is somewhat more controversial. 2 0. This point does not carry any real preferential information about the “correct” or “desired” operating point. DET Curve for highly mismatched and noisy data There is a controversial operating point on the DET curve which is usually marked as the point of comparison between different results. DET Curve for quality data 100 90 80 70 False Rejection (%) 60 50 40 30 20 10 0 0 10 20 30 40 50 60 False Acceptance (%) 70 80 90 Fig.4 False Acceptance (%) 0. The next section will speak about some open problems which degrade results. It is mostly a point of convenience which is easy to denote on the curve. . 12 10 False Rejection (%) 8 6 4 2 0 0 0.6 0.5 0. This point is called the Equal Error Rate (EER) and signifies the operating point where the false rejection rate and the false acceptance rate are equal. 11.1 0.7 Fig.22 20 Will-be-set-by-IN-TECH Biometrics Figures 11 and 12 show sample DET curves for two sets of data underscoring the difference in performances. Recognition results are usually quite data-dependent.3 0. 12. Vowels are periodic signals which carry much more information about the resonance subtleties of the vocal tract. but it is still the most important source of errors in speaker recognition. 2008) – Cepstral Mean and Variance Normalization (CMVN) (Benesty et al. the greatest challenge in speaker recognition is channel-mismatch. It is. van Vuuren. It is the reason why most systems that have been trained on a predetermined set of channels. a much bigger problem when the quest is the estimation of the model that generated the message – as it is with the speaker recognition problem. This is of course a problem where the goal is to recognize the message being sent. Fig. For text-independent cases. however.. State of the art In designing a practical speaker recognition system. the simplest way is to require more audio in hopes that many vowels would be present. The techniques that are being used in the industry are listed here. Also. the channel characteristics have mixed in with the model characteristics and their separation is nearly impossible. 2008) – Histogram Equalization (HEQ) (de la Torre et al. 2008) – AutoRegressive Moving Average (ARMA) (Benesty et al.. Considering the general communication system given by Figure 13.Speaker Recognition Speaker Recognition 23 21 8. could fail miserably when cellular (mobile) telephones are used. one should try to affect the interaction between the speaker and the engine to be able to capture as many vowels as possible. it is apparent that the channel and noise characteristics at the time of communication are modulated with the original signal. 2008) – RelAtive SpecTrAl (RASTA) Filtering (Hermansky. Removing these channel effects is the most important problem in information theory. 1996) – J-RASTA (Hardt & Fellbaum. In the text-dependent and text-prompted cases. 13. the conversation may be designed to allow for higher vowel production by the speaker.. 1991. such as landline telephone. but there are more techniques being introduced every day: • Spectral Filtering and Cepstral Liftering – Cepstral Mean Subtraction (CMS) or Cepstral Mean Normalization (CMN) (Benesty et al.. 2002) • Other Techniques – Vocal Tract Length Normalization (VTLN) – first introduced for speech recognition: (Chau et al.. 2005) and Cepstral Histogram Normalization (CHN) (Benesty et al. 2006) – Feature Warping (Pelecanos & Sridharan. 2001) and later for speaker recognition (Grashey & Geibler. this may be done by actively designing prompts that include more vowels. Once the same source is transmitted over an entirely different channel with its own noise characteristics. when speech recognition and natural language understanding modules are included (Figure 1).. 1997) – Kalman Filtering (Kim. As mentioned earlier. In that case. the problem of learning the source model becomes even harder. One-way communication Many techniques are used for resolving this problem. 2001) . JFA uses the concept of FA to split the space of the model parameters into speaker model parameters and channel parameters. is a component of all the samples of yn . depending on whether GMMs are used or SVMs. 2. Samples of random variable Θ : (θ n ) M1 ... and the observation. D }. 2003) Speaker Model Synthesis (SMS) (R. 2000) Z-Norm and T-Norm (Auckenthaler et al. et al. FA is a linear transformation which makes the assumption of having an explicit model which differentiates it from principal component analysis (PCA) and linear discriminant analysis (LDA). · · · . since due to the linear combination nature of the factor loading matrix. The second component is the. Nuisance attribute projection (NAP) (Solomonoff et al.. since each element. vector of specific factors. M <= D. the system will not have the ability to distinguish channel specific information. hence producing pure and somewhat channel-independent speaker models. 2000) Speaker Model Normalization (Beigi. · · · . which has a lower dimensionality compared to the combined random state. 2005) Nuisance Attribute Projection (NAP) (Solomonoff et al. The premise behind this approach is that by doing so.. has a hand in shaping the value of (generally) all ˜ (yn )d .. 2011) H-Norm (Handset Normalization) (Dunn et al. In fact in some perspective. It is called the vector of common factors since the same vector. and are common to all training samples. the two techniques of joint factor analysis (JFA) and nuisance attribute projection (NAP) have been used respectively. 2000) Joint Factor Analysis (JFA) (Kenny. θ yn = Vθ n + en ˜ (9) where V : R M → R D is known as the factor loading matrix and its elements. (V)dm .. The model parameters. Therefore. This separation allows for learning the channel characteristics in the form of separate model parameters. Y. in both training and recognition stages. Joint factor analysis (JFA) (Kenny. in most research reports. (en )d is specifically related to a corresponding. This channel specific information is what is dubbed nuisance by (Solomonoff et al. NAP is a projection technique which assumes that most of the information related to the channel is stored in specific low-dimensional subspaces . Y : y : Rq → R D . n ∈ {1.. being used for the support vector machine (SVM) formulation. this linear FA model for a specific ˜ ˜ random variable. so called. are known as the factor loadings. 2009) Will-be-set-by-IN-TECH Biometrics Recently. called the common factors. to one with the capability of telling specific channel information apart. it may be seen as a more general version of PCA. n ∈ {1. · · · . X. It is denoted by E : (e) D1 . 2. on the other hand. 2004) is a method of modifying the original kernel. or sometimes called the error or the residual vector. FA assumes that the underlying random variable is composed of two different components. ˜ (yn )d . 2002). 2004). each element. The first component is a random variable. have a smaller dimensionality. d ∈ {1. 2004) Total Variability (i-vector) (Dehak et al. 2.24 22 – – – – – – – – Feature Mapping (Reynolds. N } are known as vectors of specific factors. 2005) is based on factor analysis (FA) (Jolliffe. (θ )m . related to the observed random variable Y may be written as follows. It makes the assumption that the channel parameters are normally distributed. are common for each speaker. Samples of random variable E : en . Θ : θ : R1 → R M . N } are known as the vectors of common factors. aging. Some even more recent developments have been made in speaker modeling. then as the number of enrolled speakers increases. As the number of speakers increases. causing confusion and eventually identification errors. etc. Furthermore. and ω is the i-vector which is a standard normally distributed vector (p(ω ) = N (0.. doing an exhaustive match through the whole population becomes almost computationally implausible. the large-scale speaker identification problem is one that is quite hard to handle. This means that if one considers a space where speakers who are closer in their vocal characteristics would be placed near each other in that space. μ is assumed to be normally distributed with E {μ } = μ I . ω μ = μ I + Tω (10) In Equation 10. Future of the research There are many challenges that have not been fully addressed in different branches of speaker recognition. Hierarchical techniques (Beigi et al. the speaker space is really a continuum. The covariance for μ is assumed to be Cov(μ ) = TT T . 2009) that the channel space in JFA still contained some information which may be used to distinguish speakers.and channel-independent model which may be chosen to be the universal background model. stress. Since there are intra-speaker variabilities (differences between different samples taken from the same speaker).. Neither background models nor cohort models are error-free. 2009). there will always be a new person that would fill in the space between any two neighboring speakers. the intra-speaker variability will be at some point more than inter-speaker variabilities. This μ triggered the following representation of the GMM supervector of means (μ ) which contains both speaker. total variability space. For example. I)).Speaker Recognition Speaker Recognition 25 23 of the higher dimensional space to which the original features are mapped. But since a test sample which is different from the target sample (used for creating the model) is used. Yet another problem plagues speaker verification. 1999) would have to be utilized to handle such cases. both in terms of the speed of processing and accuracy. Since there are presently no large databases (in the order of millions and higher). As the number of speakers increases to millions or even billions. 9. In most cases when researchers speak of large-scale in the identification arena. they may score better than the speaker’s own model. where T is a low-rank matrix. . the voice of speakers may change due to many different reasons such as illness.and channel-dependent information. In addition. One way to handle this problem is to have models which constantly adapt to changes (Beigi. If one were to only test the target sample on the target model. they speak of a few thousands of enrolled speakers. there is no indication of the results. the problem becomes quite challenging. Background models generally smooth out many models and unless the speaker is considerably different from the norm. the intra-speaker variability might be larger than the inter-speaker variability between the test speech and the smooth background model. This is especially true if one considers the fact that nature is usually Gaussian and that there is a high chance that the speaker’s characteristics are close to the smooth background model. The identity vector or i-vector is a new representation of a speaker in a space of speakers called the total variability space. this would not be a problem. where μ I is the GMM supervector computed over the speaker. The i-vector represents the coordinates of the speaker in the. Another challenge is the fact that over time. these regions are assumed to be somewhat distinct from the regions which carry speaker information. This model came from an observation by (Dehak et al. so-called. EuroSpeech 1999. & Sorensen. & Maes. D. A tutorial on support vector machines for pattern recognition.and Bourlard..org/10. Time Series Analysis. J. Merlin. E. U. H. F. 10. Burges. of course. & McCowan.. and saphe cracking. 1–8. M. M. Bimbot. Similar variability in performance exists in other branches of speaker recognition. A hierarchical approach to large-scale speaker recognition. Handbook of Speech Processing. (2000). Healy. Springer. S. Score normalization for text-independent speaker verification systems. C.26 24 Will-be-set-by-IN-TECH Biometrics There are. The quefrency alanysis of time series for echoes: Cepstrum. Carey. such as identification. H. W. 144–152. pp. Beigi. Auckenthaler. under most controlled conditions.. & Vapnik.. V. Fundamentals of Speaker Recognition. 2203–2206. pp. ISBN: 978-3-540-49125-5. Given the number of different operating conditions in invoking speaker recognition.. (2008). Some of these problems have to do with acceptable noise levels until break-down occurs. and small relative inter-speaker variability compared to intra-speaker variability. (2004). Beigi. G. bandwidth limitation.. 16th Internation Conference on Digital Signal Processing. Bonastre. equal error rates below 5% are readily achieved. Ch. & Markowitz. M. channel and environment change detection. it is quite difficult for technology vendors to provide objective performance results. many other open problems. Rosenblatt (ed. S. However. S. S. H. Beigi. & Huang. Benesty. Proceedings of the fifth annual workshop on Computational learning theory. 15.. & Lloyd-Thomas. S. Fredouille.). S. in M. pp. pp. Speaker. Results are usually quite data-dependent and different data sets may pronounce particular merits and downfalls of each provider’s algorithms and implementation. (2009). Petrovska-Delacrétaz. H. (1999). Digital Signal Processing 10(1–3): 42–54. B.. & Tukey. H. H. Chaudhari. etc. J. J. Magrin-Chagnolleau. J. Guyon. Gravier. ISBN: 978-0-387-77591-3. M. S. Bogert. R. B. J. 209–243. Y. New York.. I. It is quite normal for the same “good” system to show very high equal error rates under severe conditions such as high noise levels. I. R. New york. M.. 10. D. IEEE Signal Processing Letters 11(8): 649–651. C. P. J. (2010). H. V. (1998). (2004). A training algorithm for optimal margin classifiers. Standard audio format encapsulation (safe). I. pseudo-autocovariance. T.1007/s11235-010-9315-1. Telecommunication Systems pp. J. (1963). Meignier. Data Mining and Knowledge Discovery 2: 121–167. Vol. cross-cepstrum. Sondhi.-F. Boser. (1992).. Using a cellular telephone with its inherently bandlimited characteristics in a very noisy venue such as a subway (metro) station is one such challenge. Robust speaker change detection.. . J. Ortega-Garcia. EURASIP Journal on Applied Signal Processing 2004(4): 430–451.1007/s11235-010-9315-1 Beigi... (1998). Proceedings of the World Congress on Automation (WAC1998). URL: http://dx. Maes. References Ajmera. A good speaker verification system may easily achieve an 0% EER for clean data with good inter-speaker variability in contrast with intra-speaker variability. A tutorial on text-independent speaker verification. N. (2011). J. 1–6. Beigi. 5. & Reynolds. Effects of time lapse on speaker recognition results.doi. H. Springer. M. Support Vector Machines versus Fast Scoring in the Low-Dimensional Total Variability Space for Speaker Verification.pdf Kim. John Wiley & Sons. Approaches to speaker detection and tracking in conversational speech. S. Philadelphia. B. Hermansky.. Watson Research Center.. Speech.. New York. pp. Vol. B. P. Principal Component Analysis. (2001). Speaker and Language Recognition Workshop. (2000). Joint factor analysis of speaker and session varaiability: Theory and algorithms. ISBN: 978-3-540-43035-3. URL: http://www. Jolliffe. Grundzüge Einer Allgemeinen Theorie der Linearen Integralgleichungen (Outlines of a General Theory of Linear Integral Equations). Prentice Hall. Dehak. issue 3). (2006). (2009). pp. P.. R. 2006. Gopinath. Dunn.. (2005). N.Speaker Recognition Speaker Recognition 27 25 Burrus.com Hardt.. E. Leipzig and Berlin. 2. URL: http://www. B. Technical report. S. Speech Communication 37(3–4): 59–73. S. Acoustics. O. Furui. pp. & Geibler. J. Using a vocal tract length related parameter for speaker recognition. 1367–1370. Pattern Classification and Scene Analysis. (1997).. (1991). Springer. (1990). IBM Techical Report. T. 134–143. J. pp. Dehak. (1997). A. C. Duda.crim. S. pp. Gray. Compensation for the effect of the communication channel in the auditory-like analysis of speech (rasta-plp). Hermansky. Peinado. 1–5. Spectral subtraction and rasta-filtering in text-dependent hmm-based speaker verification. heft 3 (Progress in Mathematical Sciences.. Reynolds. J. Berlin/Heidelberg. 867–870.. S. In German. de la Torre. Hilbert. P. Lai. S. Introduction to Wavelets and Wavelet Transforms: A Primer. & Gopalakrishnan. D.Bartleby. ISBN: 0-471-22361-1. C. . P. & Rubio. I. (1998). R. (1997). Anatomy of the Human Body. pp.ca/perso/patrick. S. (2005). K.P.. IEEE Odyssey 2006: The. C. J. Speaker. Perceptual linear predictive (plp) analysis of speech. Kenny. Feature domain compensation of nonstationary noise for robust speech recognition. Digital Signal Processing 10: 92–112. New York (2000). SPECOM. (2005). Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH-91). T. C. Fortschritte der Mathematischen Wissenschaften. Grashey. R. Springer. 20th edn. Originally published in 1904. Segura. ISBN: 0-134-89600-9. Brummer. (1973). (1918). 1559–1562. H. & Shi. (2002). D. LEA and FEBIGER. Interspeech. A. environemnt and channel change detection and clustering via the bayesian inromation criterion. Campbell. New york. A. H. H. J. and Signal Processing. Chen.kenny/FAtheory. 87(4): 1738–1752. K. C. & Fellbaum. CRIM. (1912). Online version. ICASSP-97. Kenny. N. & Hart. S. A. Proc. Ouellet. H. R. New york. M. C. Benitez. D. & Guo. Speaker recognition: a tutorial. Active Media Technology.. 1–9. F. A. Chau. P.. N.G. Lecture Notes in Computer Science. 1997 IEEE International Conference on. a. Perez-Cordoba. Histogram equalization of speech representation for robust speech recognition. P & Dumouchel. 50 years of progress in speech and speaker recognition. (2002). Feature vs. Proceedings of the IEEE 85(9): 1437–1462. 1997. E.J. L.. 2nd edn. IEEE Transaction of Speech and Audio Processing 13(3): 355–366. model based vocal tract length normalization for a speech recognition-based interactive toy. Teubner. & Quatieri. . Language and Speech 2: 123–131. & Holmes. Nolan. Pelecanos. S. pp. F. B. University Park Press. & Sumby. 784–787. (1983). 1982. Digital Signal Processing 10: 1–18. A.. Channel robust speaker verification via feature mapping. (1954). An experiment concerning the recognition of voices... Speech. Eurospeech 1997. Vol. A. pp. W. Ordowski. H. Quatieri. 1637–1641. 57–62. Beigi. Vapnik. I. . Reynolds. & Dunn.. (2000). (2005).The Speaker Recognition Workshop. (1997). A. (1988). A. Statistical learning theory. Information access using speech. M. The Speaker and Language Recognition Workshop Odyssey 2004. Shearme. B. Estimation of Dependences Based on Empirical Data. International Conference on Machine Learning and Cybernetics. T. USC (2005).. N. (2003). pp. ISBN: 0-521-24486-2. Vol. Reynolds. S. (2004). and Signal Processing (ICASSP-1988). C. Martin. ISBN: 0-26-213360-1. 2003. Zhang. Moscow. Feature warping for robust speaker verification. pp. (1979). Z. W.-W. speaker and face recognition. (2000). IEEE International Conference on Multimedia and Expo (ICME2000). J. F. Voice Identification: Theory and Legal Applications.28 26 Will-be-set-by-IN-TECH Biometrics Manning. T. 2.gov van Vuuren. International Conference on Spoken Language Processing (ICSLP). International Conference on Spoken Language Processing. M. Vapnik. Li. Cambridge University Press. A. R... New York. (2000). Solomonoff.census. A model-based transformational approach to robust speaker recognition. 1. & Quillen. Disability census results for 2005. Pollack. English Translation: Springer-Verlag. pp. The nist 1999 speaker recognition evaluation – an overview. Workshop on Parallel Distributed Computing for Machine Learning. & Przybocki. Poritz. II–53–6. F. Hidden markov models: a guided tour. Baltimore. D. D. J. Doddington. 2. pp. International Conference on Acoustics. (1999). 7–13. M. The phonetic bases of speaker recognition. H. (1979). M. Comparison of text-independent speaker recognition methods on telephone speech with acoustic mismatch. Viswanathan. 495–498. C.-P. 3. Parallelization of the incremental proximal support vector machine classifier using a heap-based tree topology. A. Tveit. ISBN: 978-0-839-11294-5. (2001).. Channel compensation for svm speaker recognition. (1959). Acoustics. A parallel svm training algorithm on large-scale classification problems. M. S. Vol. Tosi. Boston. V. J. Martin. On the identification of speakers by voice. The MIT Press. Vol. pp. O. R. B. (1998). John Wiley.. H. S. J. Digital Signal Processing 10: 19–41. & Engum.. (1996). and Signal Processing. & Sridharan. pp. Foundations of Statistical Natural Language Processing. ISBN: 0-471-03003-1. URL: http://www. (2003). The det curve in assessment of detection task performance. D. V. New York. & Maali. & L. J. russian edn. J. (2000). Kamm. N. Proceedings. N. A Speaker Odyssey . Campbell. G. T.. Nauka. New York. 1–8. 2003 IEEE International Conference on. Pickett.. I. (ICASSP ’03). 1. Speaker verification using adapted gaussian miscture models. World Wide Web. Speech. Vol. N. & Yang. Journal of the Acoustical Society of America 26: 403–406. & Przybocki. A. 213–218. such as ATMs. Hitachi of Japan history of research & development on finger-vein recognition technology . Financial losses due to identity theft can be severe. facial and others. Historically. Today 70% of major financial institutions in Japan are using finger vein authentication. Popoola and Jingyu Li Fig. Hui Ma. Their research reports false acceptance rate (FAR) of as low as 0. automatic authentication systems for control have found application in criminal identification. This higher accuracy rate of finger vein is not unconnected with the fact that finger vein patterns are virtually impossible to forge thus it has become one of the fastest growing new biometric technology that is quickly finding its way from research labs to commercial development. Kejun Wang. Oluwatoyin P. and the integrity of security systems compromised. R&D at Hitachi of Japan (1997-2000) discovered that finger vein pattern recognition was a viable biometric for personal authentication technology and by 2000-2005 were the first to commercialize the technology into different product forms. Introduction Smart recognition of human identity for security and control is a global issue of concern in our world today. iris. Among the many authentication systems that have been proposed and implemented. finger vein biometrics is emerging as the foolproof method of automated personal identification. autonomous vending and automated banking among others. Harbin Engineering University China 1. This technology is impressive because it requires only small. It is a fairly recent technological advance in the field of biometrics that is being applied to different fields such as medical. financial. law enforcement facilities and other applications where high levels of security or privacy is very important. College of Automation. and has a very fast identification process that is contact-less and of higher accuracy when compared with other identification biometrics like fingerprint.1%. relatively cheap single-chip design.0001 % and false reject rate (FRR) of 0.2 Finger Vein Recognition Pattern Recognition & Intelligent Systems Department. Finger vein is a unique physiological biometric for identifying individuals based on the physical characteristics and attributes of the vein patterns in the human finger. Hence. 1. [6]. . The vein images of most people remain unchanged despite ageing. The resulting vein image appears darker than the other regions of the finger. These two methods take the directionality characteristic of fingerprints into account. However. Therefore. so also they do have unique finger vein images. Authors in [10] used the directionality feature of fingerprint to present a fingerprint image enhancement method based on orientation field. The extraction method has a direct impact on feature extraction and feature matching [2]. faces limitations in obtaining the desired accurate segmentation results. limitations like the deterioration of the epidermis of the fingers. 2. total mean. vein object extraction significantly affects the effectiveness of the entire system. reference [9] proposed an oriented filter method to enhance the image in order to eliminate noise and enhance ridgeline. (1) Its universality and uniqueness. we discuss in this chapter. Just as individuals have unique fingerprints. and those based on particular theories and tools.[8]. the application of the traditional singlethreshold segmentation methods such as fixed threshold. total Otsu to perform segmentation. In a related research. with directionality being consistent within the local area. [7]. (3) The condition of the epidermis has no effect on the result of vein detection. There are those based on region information. Processing After vein image extraction. finger vein pattern extraction methods using oriented filtering from the directionality feature of veins. Finger vein pattern also has textural and directionality features. improve these results but still cannot effectively deal with noise and over-segmentation effects [3]. because only the blood vessels absorb the rays. The standard practice is to acquire finger vein images by use of near-infrared spectroscopy. Vein object extraction is the first crucial step in the process. Recognition performance relates largely to the quality of vein object extraction. so they can enhance and de-noise fingerprint images effectively. Vein recognition technology however offers a promising solution to these challenges due the following characteristics. The traditional vein extraction technology can be classified into three broad categories according to their approach to segmentation i. The reasons being: the ease of acquiring fingerprints. comes segmentation. finger surface particles etc result in inaccuracies that call for more accurate and robust methods of authentication. Inspired by methods in [9] and [10]. These desirable properties make vein recognition a highly reliable authentication method. and then extracting the vein object from an enhanced oriented filter image. However. (4) Vein features are difficult to be forged and changed even by surgery [1]. the availability of inexpensive fingerprint sensors and a long history of its use.30 Biometrics Fingerprints have been the most widely used and trusted biometrics.e separating the actual finder veins from the background and noise. Using multi-threshold methods like local mean and local Otsu. [5]. These utilize the directionality feature of finger vein images using a group of oriented filters. those based on edge information. The aim is to obtain vein ridges from the background. finger vein patterns in the subcutaneous tissue of the finger are captured because deoxygenated hemoglobin in the vein absorb the light rays. When a finger is placed across near infra-red light rays of 760 nm wavelength. (2) Hand and finger vein detection methods do not have any known negative effects on body health. [4]. The uniformly illuminated image becomes normalized by this formula: M 1 M 1 N 1   I(i . j )    VAR0  ( I ( i . which facilitates the subsequent processing. I ( i . j )  M  M0  VAR  G( i .1 Normalization Normalization is a pixel-wise operation often used in image processing. which the traditional filter methods do not take into account. j )  M )2 . j )  M )2 . . the output image is ready for next processing step. The main purpose of normalization is to get an output image with desirable mean and variance. VAR0  255 are desired mean and variance values respectively. therefore.2.2 Oriented filter enhancement Finger vein pattern has directionality characteristics. j )  M( I ))2 M  N i 0 j 0 (2)  VAR0  ( I ( i . This algorithm utilizes a group of oriented filters to filter the image depending on the orientation of the local ridge. 2. I ( i . Original image and Normalized image (b) Normalized finger vein image 2. (a) Original finger vein image Fig. We note that Fig. After normalization.Finger Vein Recognition 31 2.2 image has lost some of its contrast but now has uniform illumination. The result of the abovementioned process is shown in Fig. We propose a vein pattern extraction method using oriented filtering technology that takes into account the directionality feature of the veins. its resultant filtering enhancement is not satisfactory. j ) M  N i 0 j 0 (1) VAR  1 M 1 N 1   ( I(i . j )  M  M0  VAR  (3) Where M and VAR denote the estimated mean and variance of input image and M 0  150 . where we use the direction of every pixel of an image to represent the original vein image. We can determine the direction of pixel according to the gray distribution of the neighborhood. A continuous w  w window can be used to modify the incorrect ridge orientation and smooth the point directional map. 2. M 1 and M 5 are perpendicular in direction. the direction of the vein is quantized into eight directions. Pixels along the vein ridge have minimum gray level difference while pixels perpendicular to the vein ridge have the maximum gray level difference. Divide M i (i  1. M 3 and M 7 .. the pixels direction is given by:  jmax .. The closer value is its direction. so they belong to the same group. y ) is defined as: O( x . 2. In each group. Choose the maximum M to determine the pixel’s possible directions jmax and jmax +4. the pixels’ orientations are generally uniform. The main steps of the method are: 1. M 4 and M 8 belong to the same group.. j  1. the estimated orientation field may not always be correct. Due to the presence of noise in the vein image. (4) where j is the direction of the vein. M 2 and M 6 . For the same reason. we can obtain the directional image D( x . Therefore. 4 3. To estimate the orientation field of vein image. The values 1-8 of the template correspond to eight directions rotating from 0 to  . The experiments show that w =8 is very good. 4. and so a local ridge orientation is specified for a block rather than at every pixel.8) . y )  ord(max( N i )) (6) . y )    jmax  4.8) . In a small local neighborhood. we choose reference pixel p(i .32 a. Using a 9  9 template window as shown in Fig. 2. Pixels’ direction refers to the orientation of continuous gray value.. y ) ’s orientation. Using a smoothing process on the point directional map. We obtain each window’s directional histogram and choose the peak value of the direction histogram as P( x . use the 9  9 rectangular window (shown as Fig.. Determine the actual direction of p(i . M  M j  M j  4 . a continuous directional map is obtained. calculate M . y ) of the vein image. The continuous directional map O( x .. j ) by comparing its gray value with the gray value averages of jmax and jmax +4.2) to obtain the pixel gray value average M i (i  1.into four groups..2. Calculation of directional image Biometrics The directional image is an image transform.. For every pixel.the absolute difference between two gray value averages M j and M j  4 . j ) the center of the direction template. otherwise  (5) When the above process is performed on each pixel in the image. if M  M j  M  M j  4  D( x . Each interval has and angle of  / 8 in an anticlockwise direction from the horizontal axis. central part of the filter template coefficient should be positive while both sides of the coefficient negative. 9×9 rectangular window 6 7 8 1 2 3 4 6 5 5 p(i. function ord(*) is used to obtain the subscript of element *. the horizontal filter mask is rotated according to the direction of the vein.8 . Directional image of finger vein Where i  1. The width of the filter template should be odd in order that the template is symmetric in the direction of horizontal and vertical. To generate the seven other masks. The oriented filter’s size is decided according to vein ridge width.3 . As can be seen from the Fig. In the vertical direction. The sum of all template coefficients should be zero. 3. The template coefficients of horizontal mask are designed first.j) 4 4 3 2 1 8 7 6 3 2 1 8 7 4 5 5 6 Fig. each one associated with the discrete ridge orientation of finger vein pixels. vein ridge has the feature of directionality. O’Gorman’s rules for filter design which is described for enhancing fingerprint images consists of four key points: 1. and the vein ridge orientation varies slowly in a local neighborhood. An appropriate filter template size. From experiments. 4. 2. a filter template of size 7 has been shown to be quite effective. we modify the filter’s coefficients so that they decay from the center to both ends of the template. Oriented filter The vein directional image is a kind of textured pattern generated by using oriented filters based on directional map to enhance the original images. 3. .4. We designed eight filter masks. a corresponding filter is selected to enhance this block image. The directional image of finger vein is as shown in Fig. From the direction determined for a specific block (from the original image).Finger Vein Recognition 33 7 8 1 2 3 Fig.. 4. Applying the above rules based on the direction of the finger vein. 2..Each color of directional image corresponds to a direction. b. j ) represents the pixel of original image. Td represents the filtered image. f min (i . j ) represents transformed gray of (i . Equation (8) is used to adjust the gray values to fall within the range. and d represents the direction value of (i . j )  Round( f (i . f (i . we rotate the horizontal filter mask according to the following equation: Where (i . j ) is usually not an integer.. where d  a  d  0. j ) in the input image. .. j )  f min (i . j ) and Round(. It is possible that some gray values fall outside the [0. f max (i .  represents the rotation angel. 5. To generate the other seven masks. g  (i. 255] range. a. An example of oriented filter is shown as Fig. j ) represents biggest gray value of original image. j )  f min (i . 2. j ) represents smallest gray value of original image.5. Now.This filtering technique is given by: f (i .. j) (the coefficients in the rotated mask) is equal to g1 (i . f (i . j ) . j )  x 3 3 G(i  x . j ) as the center. c  0. y represents the size of oriented filter template and g ( x . (i . d  1.) is a rounding function. j ) . so we need to use nearest neighbor interpolation to get the coefficients of the rotated mask. we select a 3 neighborhood that takes (i .8 . and x .  i   cos  j     sin     sin    i'    cos   j '   (9) Where   ( d  1) /  .. j ) represents the coordinates in the horizontal mask. j ) . j ) represents the original gray of (i . The coefficients of the filter template should meet the following given conditions: d  2 a  2 b  2c  0 . j )  255) f max ( i . j ) (8) Where f (i . y )   y 3 3 (7) where (i . y ) represents the corresponding coefficients of the template. j) represents the ones in the rotated mask. It is filtered with the mask that corresponds to the block orientation of the center (i . and (i. .34 Biometrics c / 3 2c / 3 c c c 2c / 3 c / 3 b /3 a /3 d /3 a /3 b /3 2b / 3 2a / 3 2d / 3 2a / 3 2b / 3 b a d a b b a d a b b a d a b 2b / 3 2a / 3 2d / 3 2a / 3 2b / 3 b /3 a /3 d /3 a /3 b /3 8 16 24 24 24 16 8 0 3 10 3 0 0 6 20 6 0 0 9 30 9 0 0 9 30 9 0 0 9 30 9 0 0 6 20 6 0 0 3 10 3 0 c / 3 2c / 3 c c c 2c / 3 c / 3 (a) Template coefficient of horizontal oriented filter 8 16 24 24 24 16 8 (b) An example of oriented filter Fig. To use this method. j ) (the coefficients in the horizontal mask). Template coefficient of horizontal oriented filter and an example of oriented filter The coefficient spatial arrangement of the horizontal mask is shown in Fig. j  y ) g ( x .5 (b). aiming at every pixel (i . jn )  ( i  jm )  [ g1 (in . jm  j  jn . g (in . y )  k  s( x . a  H ( x )  Td (T ( a0 )) represents the mean value of a0  T1 (Td1 ( a)) ’s r  r neighborhood. jn )  g1 ( im . jn ) are their corresponding coefficients in the horizontal mask. jm )] (12) Once we have all eight filter masks. jn )  g(i . (in . jm ) and  : g(i . jm )  ( j  jm )  [ g(i . and g (im . jn ) compose a square centered at pixel (i . The experiments show that v( a)  T ( v( a0 )) equal to 9 and v( a0 )  T1 ( v( a)) equal to 0. where im  i  in . make interpolation between (im . jn ) . jm )] (11) Finally. jm ) .Finger Vein Recognition 35 Suppose the four points (im . (im . Then the result of the following formula is used as threshold to segment image. g (in . T ( x . The basic idea of this algorithm is that for every pixel in the input image. y )  m( x . g (im . k is a correction coefficient. jm )  (i  im )  [ g1 (in . y ) represents the standard deviation of v( a)  v( a) ’s a0  T ( v( a)) neighborhood. 6(a). jm ) . j ) . 6(b). and s( x . their coefficients can be used to enhance vein image. y ) (13) Fig. j)  g(i . jm ) . a  M . calculate their mean and variance of its r  r neighborhood. make interpolation between a0 and b0 : g (i. jm )  g1 (im . There are some . (in . jm ) . jm ) . jn ) : g(i . y ) represents the threshold of  a . 2. y ) represents pixel in the input image. jn )] (10) Then make interpolation between (im . so we choose this method to segment the image. First. jn )  g1 (im .01is excellent. Image after segmentation and noise removal Where ( x . Image enhanced by oriented filter Fig. jn ) and (in . T ( x .3 Image segmentation NiBlack segmentation method is a commonly used simple and effective local dynamic threshold algorithm. jm )  g1 (im . which has been reduced dimensionally. highly accurate etc. To date. to recognize the finger vein while Xiaohua Qian [17] used seven moment invariant finger vein features. Kejun Wang [15] combined wavelet moment. Euclidean distance and a pre-defined threshold were used as the classifying criterion for matching and recognition. we have standardized the height of the image to 80 pixels. this has been the most popular method for dimensionality reduction in finger-vein recognition research. by extracting vein object from the oriented filter enhanced image. Here the metric of finger-vein image is converted to a one-dimensional vector. fusion and matching Finger vein recognition as a feature for biometric recognition has excellent advantages such as being stable. researchers usually first partition the finger-vein image and then principal component analysis (PCA) is applied. the finger-vein image is first binarized. Because of variability in image acquisition and the inherent differences in individual samples. In the experiment. which extracts features at multi-scale. Chengbo Yu [13] defined valley regions as finger vein features such that real features could not be missed and the false features would not be extracted.36 Biometrics small black blocks in the background and some small white holes on the target object in the segmented image. unique. the size and ratios of extracted finger veins are often inconsistent. Zhang's algorithm can capture features from degraded images. Naoto Miura [14] proposed one method for finger-vein recognition based on template matching.7. immune to counterfeiting. Fig. contactless. PCA and LDA transform for finger-vein recognition. Such noise can be removed with area elimination method. Standardized finger vein image width Accurate extraction of finger vein pattern is a fundamental step in developing finger vein based biometric authentication systems. The finger vein pattern extraction method proposed and discussed above extends traditional image segmentation methods. and another major limitation is that it cannot recognize distorted finger-vein images correctly. and embedded hidden Markov model is used for finger-vein recognition. In order to facilitate further research there is a need for standardization of the segmented vein image height (and width). The normalized image is shown in Fig. Xueyan Li [16] proposed a method. Zhongbo Zhang [18] proposed an algorithm based wavelet and neural network. This makes finger-vein recognition widely considered as the most promising biometric technology for the future. . This approach is time intensive. Feature extraction. 3. To deal with the problem of high dimensionality. 7. which combines two-dimensional wavelet and texture characteristic. and then using a distance transform noise is removed. In the experiment. The addition of oriented filter operation extracts smooth and continuous vein features from not only high quality vein images but also noisy low quality images and does not suffer from the over-segmentation problem. a[s  t ]    a . because relative distance and angle don’t change. besides. What is more. we know  s . The topology produced by all the character point connections can be called the image M .. and the differences in angles produced by these intersection points connections. all combined for recognition. and v( a )  ( g0 . an  1  n  N . were a[s ] denotes a vector a . is Td ( a)  a[s ] . This approach makes full use of the uniqueness of topology... a[s ]  . t  N .Finger Vein Recognition 37 3. 3.  is the angle between two vectors. where M is a subset of D . we get the image M  M[s ] . Image translation and rotation occurs in D . as  1 .. A random translation of image M is translated by Td . finger veins are tiny and few in number.. First.(0  s  n) . t  N .The thinned finger-vein image is illustrated as a function denoted as M . as  n  1  . The relative distances and angles remain constant before and after translation and rotation. a1 . Then M can be shown by the vectors: a  a0 . a[s]  i i i From formula (14)... Theorem 1.1. and the length of line segment is written as  a  . b.  Td ( a)  a( s ) which leads to v( a)  v( a[s ])  v(T ( a)) .. finger vein diameter is not consistent and the texture characteristic is aperiodic.  gs ( a[t ])  gs ( a) which satisfies v( a[t ])  v( a) (15) For every Td . Therefore. a[s ]  (14)   a[t ]. b  denotes the inner product of a and b . where v( a ) is the convolution of a . the relative distances and angles produced by the character point connections are in unchanged . the gray-scale is uneven and contrast is low. the method based on these identified characteristics has great use in practice. we discuss a method based on relative distance and angle. Suppose Td is a translation in D . such that only very few features can be extracted.. a[ s  t ]   ai [t ]ai [ s  t ]   ai  t [0]ai  t [s ]  ai [0]ai [s ]   a[0]. This method overcomes the influence of image translation and rotation. Proof:  s . When near infrared light is used to acquire the image. which is proved as follows. there is  a[t ]. gn  1 ) . Now after image M is translated by s unit distances in the plane D . However. because fingers have curved surfaces. where  a . Image translation and rotation implies that every point of the image is translated by a vector and rotated by the some angle. To deal with these problems some novel fusion methods are used.. g1 . and c be non-zero vectors. a change in the finger position can cause image translation and rotation and influence recognition negatively. the varied distances between the intersection points of two different vein images. which is defined in field D .1 Novel finger vein recognition methods based using fusion approach The above-mentioned algorithms have different advantages for different problems in finger-vein recognition. After transformation... Theorem 2. If gs   a . translated by s unit distances.1 Theoretical basis Let a. then v( a )  v(T ( a)) . Let a[s ]  as . makes the image M rotate around its ordinate origin by  . a  T1 ( a) . and on this basis. and T =  (TT T ) =1. Fuse these two features by “Logical And”. too. and negative otherwise). (  is positive while clockwise. A.  a . then a0  T1 (Td1 ( a)) . b   0 a  b (18) Theorem 3.  () is the matrix radius. Only when both features are approximately the same. there is\  cos  sin   [ a .38 Biometrics Proof: In the plane D . combined with region merging and watershed algorithm. b0 ]     sin  cos   (16) Here [ a . Any angle  .1. b ] is a homogeneous orthogonal rotated matrix. b  T a0 . partitioning the image into several regions as shown in Fig. b0 ] T = [ a0 . which is a linear transform T . match any two images to get the number of relative distances and angles that correlate. A fully meshed topology is formed by selecting the intersecting points on the thinned finger-vein image as character points and connecting these points to each other with straight lines. . suppose a  H ( x )  Td (T ( a0 )) . we know v( a)  v( a) .2 Method description Extract the intersecting points from the repaired thinned finger-vein image and connect all the points with each other. thinned and further repaired. further because of a0  T ( v( a)) and v( a )  T ( v( a0 )) . T b0  T This leads to 2  a0 . and the angle feature  . Otherwise. b0   =arc cos  a.  0 is the angle of a0 and b0 . Accordingly. This results in image M . then a0  T1 ( a) . b ]  [ a0 . For  a and b . Suppose. so v( a0 )  T1 ( v( a)) . Finger vein topology Using Kejun Wang’s method [19] for pre-processing hand-back vein image. b0  a0 . From theorem 1. Compute the relative distances and angles to get the relative distance feature M. a  M  .8. the matching has failed.  b  b0  (17)  a . The relative distances and angles are invariable after the translation and rotation Proof: Suppose the image M converts to M after translation by Td and rotation by T .  a0 , b0 , c0  M denotes the line segment vector produced by character point connections. All this leads up to v( a0 )  v( H ( a0 )) (19) 3. the finger-vein skeleton is extracted. is the matching successful. Then we can write T a  aT a  a0 TTT a0  a0 . (i) and (ii) are two finger-vein images from same source.1. However.are combined for matching.3 Matching finger-vein images using relative distance and angles From the thinned finger-vein image of Fig. (iii) is of a different source and its topology is obviously different from (i) and (ii). The inner characteristic points produced by the intersecting vein crossings reflect the unique property of the finger-vein. However. Relative distance and angle are essential attributes of finger-veins. we can see the random finger-vein pattern and inner structure.Finger Vein Recognition 39 (i) (ii) (iii) (a) The raw image of finger-vein (b) The image after thinning (c) The image after repairing and marking the intersecting points (d) The image after extracting the intersecting points (e) The fully meshed image after connecting the intersecting points Fig.8 (b). For this reason. 3. Fusion of the two . 8. Specifically. so their topology is similar.8. those breakpoints may be thought as finger-vein endpoints. The finger-vein image feature extraction process In Fig. the two features -relative distance and angle . the more reliable intersecting points are chosen to characterize finger-veins. which ensure the feature uniqueness and reflect different characteristics of finger-vein structure. the relationship between corresponding character points is of importance. the topology expresses an integral property and peculiarity of fingerveins. which would influence recognition results. Considering that different line segments are produced by intersecting points from different finger-vein images. we resize the vein image into a specific image block size to facilitate further processing. The detailed steps are as follows. If the number is greater than the pre-defined threshold. go to next step. .1 Finger vein feature extraction Different people have different finger lengths.40 Biometrics features with “Logical And” make the recognition results more reliable.. there are d points of intersection in one image. with each other. the sub-blocks are created with an overlap of 60 pixels for every 80 × 80 image sub-block.. where l is the distance of any two intersecting points. Thus if image sizes are not standardized. Compare m relative distances from R1 with n relative distances from R2 . Here a set of finger-vein image features is defined as R  (lm .. The original image is standardized to a height of 80 pixels and split along the width into 80 × 80 sub-image block size. Suppose. u ) . u ) and R2  (ln . connect the q character points in the two sets respectively. Suppose there are q eigenvalues of approximately equivalent relative distances. else. Also. Suppose. 1. Thus. If the number is greater than the pre-defined threshold. which are denoted as  z 1 and  z 2 in R1 and R2 respectively. In this part.  is the angle produced by the point connections. n  1] Here we define the sub-block of the image width w . Thus.2. Calculate the relative distances and angles of finger-vein image. If the image is split evenly (given that the image width is generally about 200 pixels) there will be loss of information that will affect recognition. z angles are produced. calculate the number of approximately equivalent angles. 3. with sufficient characteristic quantities for identification. we define  lm  ln  e to show the extent of similarity between any two Eigen values ( e is the allowable error range). else. matching two finger-vein images is converted into matching the similarity of topologies. v ) are two sets of finger-vein image features. there can be variation in the image captured for the same person due to positioning during the image capture process. Similarly. then the number of relative distance is d( d  1) / 2 . by calculating the number of approximately similar relative distances. A1 . the matching is assumed to have failed. there is bound to be representation error which leads to a decrease in the recognition rate. To take care of position error of those points. An  1 ] (20) Which: Ai is a column vector. the matching is successful. e =0.. Amn  [ A0 . On this basis. i  [0. y ) .2 Finger vein recognition based on wavelet moment fused with PCA transform 3. From experimental analysis. The original image can thus be split into 6-7sub-images. A2 . R1  ( lm . From experimental analysis. standardized image height h (in experiment w=80,h=80 ). Therefore. The number of angles produced by the point connections is d( d  1)( d  2) / 2 . Set a matrix Amn to represent the standardized images f ( x .0005 is very appropriate.006° is very appropriate. 2. the matching is thought to have failed.   m   n  e is used to show the relationship of two approximately equivalent. e =0. 3. Sub-images are extracted at interval r when (in this experiment r =20). m and u are the number index respectively. 12. Aw  1  r   Bk   Akr .. For each sub-image Bi  x .. y  .. A wavelet moment feature is invariant to image rotation...image... Aw  1 Thus can get sub-image matrix: ... The sketch map of sub-image extraction 3.Finger Vein Recognition 41 B1   A0 . Then we extract features for each sub-image Bi.. B2   Ar .12. v 2 .. B1 B2 Bk  v1  v2  vk feature vector:    V  v1 .. its size is w  h .. feature extraction and recognition process shown in Fig...2... n  w  k  1  r  (21) Thus we get a total of B1 . .2 Wavelet transform and wavelet moments extraction Wavelet moment is an invariant descriptor for image features. translation and scaling so it is successfully applied in the pattern recognition. The finger block. and the size of each sub-image is w  h .. B2 . y  . Bk k sub. Aw  1  kr  [x] takes the maximum integer less than x . v k    Fig. we can make wavelet decomposed image Bi  x .... Applying two dimensional Mallat decomposition algorithm.. and 1 2 3 D1 . n) is the coefficient of A1 k d1 is the coefficient of the three high frequency components. n) is the scale function  1k (m . q  2 ( p  q  1) j  m p nqc 1  m .2. n)  f ( x . D1 . n   f ( x .n )  c  m . y ) .42 Biometrics Fig. 2. Set wp . q expressed as (p+q) order central moments of f ( x . n   m .1  m . y   Bi  x . The sub-image and result of wavelet decomposition Setting f  x . The wavelet moment approximation is: 0  wp . nZ  k  ( p  q  1) j  wp . n   k D1  k 1 ( m . n)  Where k  1. y )  A1  D1  D1  D1 the wavelet (22) where A1 is the scale for the low frequency component (i. Using each sub-image Bi directly without the PCA transform not only leads to poor classification of extracted features but also huge computational cost.n )  d  m. decomposed layer is 1 2 3 f ( x . k 1 k 1 k 1 (24) d (m . After the low-frequency wavelet sub- . n) is the wavelet function Daub4 was chosen for wavelet decomposition. A1  ( m .3 PCA transformation The advantage of using wavelet transform to reduce computation is explored here. D1 are the scales for the horizontal. n   m . n) m . n   m.e. 3 c1  (m . n  . 3. as it produced better identification results from several experimental compared with other wavelets. approaching component). n . (m . 13. y   L2 ( R 2 ) to be the analyzed sub-image vein blocks. y ). nZ  (25) Here we access the wavelet moment w22 . 1 1 (23) c1  m . 1 (m . q  2  mpnq d1k (m. y ). We use the approximation wavelet coefficients c j to compute the wavelet moment [4]. vertical and diagonal components respectively. 14. and an overlap of 60pixels as earlier described).2.Finger Vein Recognition 43 images compression of the original image to about one-fourth of the original size. there is an interval of 20pixels between two adjacent subblocks. In practice.2. Lm is the total number of sub-blocks for person m. PCA decomposition is applied on the sub-image which greatly reduces computation.3…. (Note that Fig. The sketch map of sample classes after PCA transform To illustrate the problem. we take finger vein samples from a total set of c people.14 is only for illustration purpose.. For PCA. The n-th sub-block set of the m-th person is indexed as km .4 The transformation matrix Here we analyze a layer of Bi wavelet decomposition of the low-frequency sub-image.14. L) . 3. Each sample of the same finger has five images as shown in Fig. Five finger vein images per person (*same finger) The Five finger vein images of the mth person Fig. where n = (1. . A1 is transformed to a separate wh / 4 dimension of image vector   Vec( A1 ) . n . .. Each sample is transformed into a lower d -dimensional space in the post-dimensional i i i feature vector ei  [ e1 ... for simplicity.... Take d before the largest eigenvalue corresponding to the standard eigenvectors orthogonal transformation matrix P  [1 .. we get the best recognition results.4 . . we found that when d  200 we can get a good result. when kmin = 5. N is the sample number. PCA method is the best for describing feature characteristics.....1 . ed ] is the PCA extraction of feature vectors from sub-image blocks.2.. For each sub-image blocks Bi . i . through the wavelet decomposition of the low-frequency sub-image A1 ... The total number of training samples is N  5C .. we write:  i . d ] .kmin is the number of sub-image of each finger image. k1.  i . 2.e a 300-dimensional compression.3 . k1.44 Biometrics To compare images we only use the kmin sub-image set where kmin = L = min (L1 .2 . A1 in accordance with the preceding method into a column vector  .e.. Our classifier design follows dimension reduction to get PCA feature vectors .....1 . L2 . After several experiments. of S1 (the value of these features have been lined up in sequence by order of 1  2  ... Mean of the i-th class training sample: i  Mean of all training samples: 1 5  i . and all are wh / 4 dimension column vectors. we use the LDA method for further classification of PCA features.. then kmin  min( k1... and when d  300 i.. 3... Then we can obtain the characteristic value 1 . but not the best for feature classification.5 ) (26) Thus. i. i .. 2 . j 5 j 1 (27)  The scatter matrix is: 1 C 5  ij N i 1 j 1 (28) St   P(i )( i   )( i   )T i 1 C (29) where P(i ) is prior probability of the i-th class of training samples... i  1.. In order to get better classification results. 1 ... ) and its corresponding eigenvector 1 . a total of C  c  kmin of the available pattern classes.. 2 .5 LDA map In general... k  1. 2 .. e2 . the following formula: e  PT  (30) This e  [ e1 .2 .. 2 . L3 . . e2 . ed ] . kc . C . 2.Lm ) .5 . Four of the corresponding samples in the i-th class ( i  5m  k ).. Using PCA projection matrix P.  i .5 .... to extract the features use the transformation matrix P obtained in the previous section.  2 ... s2 are the share of feature matching scores. z ' (33) From several experiments.. 2... Here we define: K  min( k . 1 Calculate the corresponding matrix Sw Sb of the l largest eigenvalue eigenvectors  1 . we obtain wavelet moment feature of the finger matching score: w _ mark   w _ marki i 0 K (35) Similarly. s1  s2  1 .. we set two threshold vectors wt zt . vi   w22 . (31) Thus. l . z] .. s2  0. we create a match V and V ' . combining the scores: total _ mark  s1  w _ mark  s2  z _ mark s1 . N is the sample number. v 'i   w '22 ...2. zli ]  WLDA ei .Finger Vein Recognition 45 e1 . Bi is characterized by vi  [ w22 .. we can use the best classification feature z vector to replace the feature vectors e for identification and classification. first analyze the corresponding subimage blocks vi and v 'i . (36) .. of the feature vector z scores z _ mark . that is...  i is defined for two feature vectors w and w ' from V and V ' . k and k ' is not necessarily the same. Then we use the LDA transformation matrix WLDA as i i T zi  [ z1 .. A matching score defined for V and V ' feature w22 of wavelet moment of corresponding sub-image Bi matching score:  wt   i  w _ marki   wt 0  if  i  wt (34) else Finally. Finally. k ') (32) Taking the K vectors of V and V ' for comparison.. 3.6 Matching and recognition Through the above wavelet decomposition and PCA transform for each sub-image Bi . The first step is the length of V and V ' and may not be the same. we obtain wavelet moments w22 and extract feature vector z of PCA and LDA. e2 . i  1. z2 .. 2 . Matching feature vectors of finger1 vi  [ w22 . Euclidean distance between Bi sub-images. z .... The l largest eigenvectors corresponding to the LDA transformation matrix WLDA  [ 1 . l ] . and s1  0. z] and of finger 2 can be done as follows. eN and form the class scatter matrix Sw and the within class scatter matrix Sb .. But in Fig. we test the algorithm using images from a custom finger vein image database. (a) Original image 1 (c) NiBlack segmentation method (d) Our method (b) Original image 2 Fig. 15(a) is the high-quality vein image in which veins are clear and the background noise is small. we use NiBlack segmentation method as the benchmark for comparison. Experimental results show that our method has better performance. if the finger vein 1 and 2 match total _ mark value score is greater than a given threshold. the two fingers match. Fig. Where Fig. otherwise they do not match. A minimum distance classifier can also be used for the recognition task. 15(e). The database includes five images each of 300 individuals’ finger veins. there is much noise in the segmentation results. Each image size is 320*240.46 Biometrics Thus. Experimental results (e) NiBlack method (f) Our method We have used a variety of traditional segmentation algorithms and their improved algorithms to segment vein image. We extract veins feature by using our method and compare with results of the NiBlack segmentation method. 15(e) are obtained from the NiBlack method applied in [9]. 4. Experimental results 4. But segmentation results of vein image by these algorithms aren’t ideal. There are a few pseudo-vein characteristics in Fig. 15(b) is the lowquality vein image. To take full account of the original image quality factor. The uneven illumination caused the finger vein image to be fuzzy. Experimental results show that apart from smoothness and continuity or removal of noise and pseudo-vein characteristics. This algorithm extracts smooth and continuous vein features of high-quality image. 15(c).1 Processing experimental results To verify the effectiveness of the proposed method. the method proposed in this paper extracts vein features effectively not only from the high-quality . 15(c) and Fig. 15. Experimental results shown in Fig. Because the result of NiBlack segmentation method is better than other methods [13]. we select two typical images from our database with one from high quality images and the other from poor quality images to show the results of comparative test. and there is the effect of the over-segmentation. Segmented image features have poor continuity and smoothness. which seriously affects image quality. Segmentation was done for all the images in our database using NiBlack segmentation method and using our method. 47.5%. The two wave crests are far from each other with very small intersection. Generally. 4. giving accurate finger-vein recognition.5%.41.15(d) and Fig. and the mean of illegal matching distance corresponds to the wave crest near to 0. EER is 13. Besides. while the others are illegal matches. when FRR and FAR are equivalent. the closer the ROC curve is to the horizontal axis. The solid curve is legal matching curve. Two different verifying curves are shown in Fig. the threshold should be set suitably according to the fact. Fig. especially when the threshold is in the range [0. One forefinger vein image of each person was acquired. The mean legal matching distance corresponds to the wave crest near to 0. We show that the algorithm proposed in this paper performs better than the traditional NiBlack method. the threshold is 0. at a threshold of 0. where the GAR is highest. a good recognition algorithm can be successfully trained on a small dataset to get the required parameters and achieve good performance on a large test dataset.16. In the second step. The method above compares the numbers of relative distances and angles which are approximately equivalent from two finger-vein images. so there are 300 training images. The result indicates that this method is reasonable.15(f). while the dashed is illegal.2 Relative distance and angle experimental results Finger-vein images (size 320×240) of 300 people were selected randomly from Harbin Engineering University finger-vein database. Both curves are similar to the Gaussian distribution. So this method can recognize different finger-veins. GAR of the system is 86. only the .17. The horizontal axis stands for the matching threshold.21 on the horizontal axis.Finger Vein Recognition 47 images but also from the low-quality vein images as shown in Fig. 300 of which are legal. so there are (300×299)/2 = 44850 matching times. Therefore four more images from the forefinger of those 300 people were acquired giving a total sum of 1200 images to be used as verification dataset. 16. and vertical axis stands for the corresponding probability density. that is to say. every sample is matched with others. Legal matching curve and illegal matching curve The relationship between FRR and FAR is shown in Fig. The two curves intersect. the higher the Genuine Acceptance Rate (GAR).62 on the horizontal axis.09-0.38]. For this method. In this case. When matching. in training set to verify. Test result of FRR in 1:1 mode Total matching times 360000 Total false acceptance 25488 GAR (%) 92. So the proposed algorithm is an effective method for finger-vein recognition. and GAR is 92. computation of superfluous information is avoided and only information vital to decision making is used.1. 360000 times in sum. compare the 1200 images with all images in training set. translate randomly in the range [ 10. ROC curve of the method In 1:1 verifying mode. The experiment result is shown in Tab.33%. Fig. The result is as Tab. The result is shown in Tab. The samples which have the same source are compared in a 1:1 experiment.92 FAR(%) 7.2. Only when the two matching steps are successful is the recognition successful.92%.33 FRR(%) 6. Tab. matching each sample from the two sets with the samples from the training set to accomplish 1:n experiment. 17.5 and Tab.08 Table 2. the times of success are 1120. the relative distance and angle would not change when even after image translation and rotation. which has the same source with the former one. 100 ] . In 1:n recognition mode.48 Biometrics intersecting points which are matched successfully on the first step are used and thus. the times of FAR is 25488. According to Theorem 3. compare the one image out of 1200 samples in verifying set with the image.4. in order to establish the translation and rotation test sets. and the rate of success is 93.3.6. Then verify and recognize the two sets respectively. Total matching times 1200 Number of successes 1120 Number of failures 80 Success rate (%) 93. . Tab. 10] and rotate the image randomly in the range [ 10 0 . Test result of FAR in 1:n mode To test the ability to overcome image translation and rotation.67 Table 1. the method can overcome the influence caused by image translation and rotation. Rotation test set verification result (1:1) Total matching times 1:n 360000 No.7. In 1:1 verification mode.25% and 91. Finger vein image capture was performed taking into account the convenience of users. Total matching times 1:1 1200 Number of successes 1101 Number of failures 99 Success rate (%) 91. and in all. thus it can meet practical requirements. GAR can reach 92.58 FRR(%) 7.25 Table 5. Translation test set recognition result (1:n) As the experiment shows.Finger Vein Recognition 49 Total matching times 1:1 1200 Number of successes 1111 Number of failures 89 Success rate (%) 92.5 Experimental results analysis The algorithm was implemented on a Windows XP platform using Visual C + +6. Further. and can reach 91.83 Table 6.42 Table 3.75 FRR(%) 8. Rotation test set recognition result (1:n) 4.17 FAR(%) 8. A total of 287 finger vein images collected for each finger 5 times. of genuine acceptance 27900 GAR(%) 92. In 1: n mode. Translation test set verification result (1:1) Total matching times 1:n 360000 No. in 1:1 mode the rate of success can reach 92.75 Table 4. a total of 287x5 = 1435 were collected to form finger vein library.5. recognition for the veins of our library of images. The identification results based on template matching: We first according to the method proposed in reference [1].17% respectively. while collecting index finger and middle finger vein images.75% when rotated. of genuine acceptance 31788 GAR(%) 91. Two sets 287 x 2 = 574 were taken for verification.25 FAR(%) 7. which implies a robust recognition system. .58% even though the finger-vein image is translated.0. we use the validation library of 574 samples to verify the experimental results shown in Tab. 50 Biometrics Matching times 574 Pass times Reject recognition times 15 Correct recognition rate (%) 97.04 92.3 94. The results of refusing ratio of 1:1 For 1: n identification model.10.4 Reject recognition rate (%) 2. Matching times Reject Correct Pass recognition recognition times times rate (%) 536 547 543 534 529 38 27 31 40 45 93.2 Table 8.4 95.9.8 Hear Daub4 Daub8 coif2 sym 574 574 574 574 574 Table 9. Matching times 574 False recognition times 7 False recognition rate (%) 1. . we realize that the algorithm recognition rate is not as high as the reference [1]. A few typical experimental results of 1:1 verification mode shown in Tab. the experimental results shown in Tab. The results of rejection ratio with different wavelet base in the 1:1 case For 1: n identification pattern.7 5. The result of mistaken identifying ratio of 1: n For a number of reasons. The identification results based on wavelet moment: We first decompose the predetermined wavelet sample in the vein sample database. we use the validation library to identify 574 samples of the experimental results shown in Tab.6 93.2 Reject recognition rate (%) 6.4 6.8.6 4. and then construct the wavelet moment features using identified wavelet coefficients.96 7.6 559 Table 7. perhaps due to the acquisition of image and acquisition machine quality problems. reject recognition times 6 reject recognition rate (%) 1.11: wk   i /  i i 1 i 1 k N (37) k wk 215 0.Finger Vein Recognition 51 Matching times Haar Daub4 Daub8 coif2 sym 574 574 574 574 574 False recognition times 27 11 19 30 33 false recognition rate 4. we use 574 samples in the validation library for the experimental. we use 300 as the dimension for decomposition. Results of rejection ratio of 1:1 .00 Table 11. When PCA is used for dimension reduction. Results of mistaken identifying with different wavelet base in the case of 1: n We chose Daub4 to carry out wavelet decomposition.12.7 Table 10. the relationship of selection of the compressed dimension k and the proportion of it represent components shown in Tab.99 392 1. 13.05 Matching times 574 Pass times recognition rate (%) 568 98. Identification results of wavelet moment integration of PCA.3 5. results shown in Tab.9 3.98 380 0.86 299 0.75 240 0.95 Table 12.90 338 0. identification was better than other wavelets. Authentication in 1:1 mode and 1: n identification pattern. The compressed dimension and it’s proportion To balance computation and weighting.7 1.95 368 0.2 5.80 276 0. 5. To address these problems. which makes the method of practical significance. Our method extends traditional image segmentation methods.7 Table 13. For each image sub-block. and to a certain degree. extracts smooth and continuous vein features not only from high quality vein images but also handles noisy low quality images and does not suffer from the over-segmentation problem. However there are still problems of non. It is also computationally efficient with minimal storage requirement. the influence of image translation and rotation etc. Finger veins have textured patterns. by extracting vein object from the oriented filter enhanced image. it requires a little more processing time because of the added oriented filter operation. essential topology attributes of individual finger veins are utilized in a novel method. and the directional map of a finger vein image represents an intrinsic nature of the image. 1: n of the mode can meet the requirements. since the topology of finger-vein is invariant to image translation and rotation. the relative distance and angles of vein intersection points are used to characterize a finger-vein for recognition. The addition of oriented filter operation. [12]. wavelet moment was performed and PCA features extracted. [13]. Particularly. However. Experimental results indicate that our method is a better enhancement over the traditional NiBlack method [11]. the method resolves the difficult problem of finger-vein positioning. even an inflection point may contain plenty of accurate information. Conclusion Accurate extraction of finger vein pattern is a fundamental step in developing finger vein based biometric authentication systems. and subsequently improve system reliability. Besides. Experimental results indicate that the method can accurately recognize finger-vein. and connect them with line segments. For . recognition rate and rejection rate in the method based on wavelet PCA can meet the requirements of practical applications. further research will be done on the pre-procession method. In view of this. Finally combine the two features for matching and recognition. and the two features were combined for recognition. LDA transform is performed.52 Biometrics Matching times 574 False recognition times 4 False recognition rate (%) 0. The results of mistaken identifying ratio of 1: n As seen from the recognition results. Then relative distances and angles are calculated. Topology is an essential image property and usually. and has good segmentation results even with low-quality images. like positioning. In recognition speed. The finger vein pattern extraction method using oriented filtering technology. Furthermore.recognition and false recognition. The first step is to extract those intersecting points of the thinned finger-vein image. overcome the influence image translation and rotation. that is. pre-procession is an import requirement for this method and the accuracy of preprocessing influences recognition result significantly. to improve the image quality and the accuracy of feature extraction. Finger-vein recognition is faced with some basic challenges. This chapter discussed recent approaches to solving the problem of varying finger lengths and proposed using a set of images of same size interval in a selected sub-block approach. Saint-Malo.Nagasaka. Lindeberg. 15. (In Chinese. Kamel M. pp. Metal Medical image segmentation based on mutual information maximization. Biometric authentication using fast correlation of near infrared hand vein patterns. 6. vol. 692697. 1185-1194. J.Tian. M. Implementing OTSU’s Thresholding Process Using Area-time Efficient Logarithmic Approximation Unit. 2 pp. 2002. Bao Guiqiu. L. 20 no. 2009. 2007.2.V. Machine Vision and Application. iM. France pp.vol. A. Vol. IEICE Trans.Journal of Tsinghua University (Science and Technology).1pp. Srikanthan.Lam. [5] Zhongbo Zhang. pp: 81-87.) [13] Chengbo Yu. 2007. International Journal of Biometrical Sciences. 2003. 13 no. [10] Xiping Luo. Xiao Han. vol. Finger vein recognition based on wavelet moment fused with PCA transform. J. 2003. Miyatake. 194-203.) [11] W. & Syst. we are committed to further study finger vein feature fusion with fingerprint and other features to improve system reliability. [3] Lin Xirong. 2005. 29-38. 21-24.5. Experimental results show that wavelet moment PCA fusion method achieved good recognition performance. (In Chinese.141-148. J Pattern Recognition and Artificial Intelligence.Zhuang Bo. Hong Kong.2106–2110. no. J. IEEE Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06). ZhouYunlong. Huafeng Qing. [9] O’Gorman. Biometric Identification Technology Finger Vein Identification Technology: Tsinghua University Press. 2005 IEEE International Conference. An Introduction to Digital Image Processing. Inf.4. Prentice Hall. Nagasaka and T. vol.Takafm Feature Extraction of Finger-vein Patterns Based on Repeated Line Tracking and Its Application to Personal Identification. [7] Rigau J.164-167.K.) . [8] Yuhang Ding. References [1] N. Dayan Zhuang and Kejun Wang. ISBN 9780134806747. Englewood Cliffs.Finger Vein Recognition 53 matching and identification. 2004. In future research. 1986 [12] Kejun Wang Zhi Yuan.pp. J.90. In Proceedings of MICCAI 2004. China.Naoto.135142. Feixas. Pattern Recognition. 22 no. [6] M. T.2007. Badawi A. NJ. Siliang Ma.29pp. rejection rate FRR of 1. (In Chinese. Extraction of finger-vein patterns using maximum curvature points in image profiles. Multiscale Feature Extraction of Finger-Vein Patterns Based on Curvelets and Local Interconnection Structure Neural Network. no.Su Xiaosheng. Niblack. 2004. 946-956. J Journal of Software.D(8). pp.2006. Image Enhancement and Minutia Matching Algorithms in Automated Fingerprint Identification System. A Study of Hand Vein Recognition Method. vol. Mechatronics and Automation.Sbert. 1989. Miura.5pp. (In Chinese.) [4] H. vol. 4 no.5 pp.no.145-148. J.3. Measurement and matching of human vein pattern characteristics. Nickerson.7%.115-116. vol. Jie Tian. S. vol. [2] Shahin M. error rate FAR was 0. An approach to fingerprint filter design. pp. A. we proposed a method of fuzzy matching scores. Circuits And Systems. pp.05%.vol 4. 43no.pp. MA Dissertation of Jilin University. Machine Vision and Applications. Yuan Zhi. 2005. 2004.1109/ICPR.15(4):194 -203 [15] Kejun Wang. Page(s): 145 – 148 [19] Kejun Wang. the doctorate dissertations of Jilin University.848. Science & Technology Review. Finger Vein Recognition Based on Wavelet Moment Fused with PCA Transform. 10(6):132-136 (inChinese) . Study of Multibiometrics System Based on Fingerprint and Finger Vein[D]. ICPR 2006. [17] Xiaohua Qian. Takafumi Miyatake. 2006. A Study of Hand Vein-based Identity Authentication Method [J]. [18] Zhong Bo Zhang. 2008.54 Biometrics [14] Naoto Miura.2006. SunJixiang. Feature extraction of finger-vein patterns based on repeated line tracking and its application to personal identification [J]. 23(1) :35-37. 2005. Dan Yang Wu. 2007 [16] Xueyan Li. Journal of Circuits and Systems. Wavelet Moment for Images. [20] Ji Hu. YaoWei. Akio Nagasaka. 18th International Conference on Volume: 4 Digital Object Identifier: 10. Publication Year: 2006. Research of Finger-vein Recognition Algorithm[D]. Dazhen Wang. [J] Pattern Recognition and Artificial Intelligence. Si Liang Ma. 2009. Yuhang Ding. Pattern Recognition. ) create bottlenecks in achieving the desired performance. 2003). 2000). Nowadays. and minutiae (Ratha et al. A fingerprint is the pattern of ridges and valleys on the surface of a fingertip. The two most prominent local ridge characteristics are: 1) ridge ending and. Fingerprint recognition is one of the most popular biometric techniques used in automatic personal identification and verification. Introduction In our electronically inter-connected society. Many researchers have addressed the fingerprint classification problem and many approaches to automatic fingerprint classification have been presented in the literature. Naser Zaeri . difficulty in quantitatively defining a reliable match between fingerprint images. 2) ridge bifurcation. A ridge bifurcation is defined as the point where a ridge forks or diverges into branch ridges. etc. nevertheless. Most fingerprint matching systems are based on four types of fingerprint representation schemes (Fig. flowing in a local constant direction. poor image acquisition. are important and vital methods that can be used for identification and verification. Each individual has unique fingerprints. 2006. the difficulty of reading the fingerprint for manual workers. The widespread deployment of fingerprint recognition systems in various applications has caused concerns that compromised fingerprint templates may be used to make fake fingers. minutiae-based representation has become the most widely adopted fingerprint representation scheme. Although significant progress has been made in designing automatic fingerprint identification systems over the past two decades. which could then be used to deceive all fingerprint systems the same person is enrolled in. A ridge ending is defined as the point where a ridge ends abruptly. low contrast images. the research on this topic is still very active.. reliable and user-friendly recognition and verification system is essential in many sectors of our life. 1999). Collectively. and compatibility with features used by human fingerprint experts. 1): grayscale image (Bazen et al. investigating the influence of the fingerprint quality on recognition performances also gains more and more attention. Detailed description of fingerprint minutiae will be given in the next section. The person’s physiological or behavioral characteristics.. known as biometrics. The ridges and valleys in a fingerprint alternate. 2000. Due to its distinctiveness. skeleton image (Feng. The uniqueness of a fingerprint is exclusively determined by the local ridge characteristics and their relationships. 2007). Hara & Toyama. a number of design factors (lack of reliable minutia extraction algorithms. Bazen & Gerez. phase image (Thebaud. these features are called minutiae.3 Minutiae-based Fingerprint Extraction and Recognition Arab Open University Kuwait 1. compactness. the grayscale image is the most at risk. the uniqueness of a fingerprint is exclusively determined by the local ridge characteristics and their relationships (Kamijo. 2011). as shown in Fig. 1984). 1993. These local ridge characteristics are not evenly distributed. Later. a total of 150 different local ridge characteristics (islands. Fingerprint representation schemes. these features are called minutiae. and (d) minutiae (Feng & Jain. A ridge bifurcation is defined as the point where a ridge forks or diverges into branch ridges. Most of them depend heavily on the impression conditions and quality of fingerprints and are rarely observed in fingerprints. 2) ridge bifurcation. Collectively. As mentioned in the previous section. (b) phase image. (Henry. enclosure. Fig. Fingerprint minutiae description The first scientific studies on fingerprint classification were made by (Galton. The two most prominent local ridge characteristics are: 1) ridge ending and. 19_1). 1892). 1900) refined Galton’s classification by increasing the number of the classes. The ridges and valleys in a fingerprint alternate. Most of the fingerprint extraction and matching techniques restrict the set of features to two types of minutiae: ridge endings and ridge bifurcations. flowing in a local constant direction (Fig. A good quality fingerprint typically contains about 40–100 minutiae. A ridge ending is defined as the point where a ridge ends abruptly. leakage of minutiae templates has been considered to be less serious as it is not trivial to reconstruct a grayscale image from the minutiae (Feng & Jain. Leakage of a phase image or skeleton image is also dangerous since it is a trivial problem to reconstruct a grayscale fingerprint image from the phase image or the skeleton image. as well. 2). . Further. 3. 1984).56 Biometrics Once compromised. we provide a special focus on the recent techniques presented in the last few years. (c) skeleton image. where we give a comprehensive idea about some of the wellknown methods that were presented by researchers during the last two decades. A close analysis of the fingerprint image will be discussed and the various minutiae features shall be described. 1.) have been identified by (Kawagoe & Tojo. 1984). Eighteen different types of fingerprint features have been enumerated by (Federal Bureau of Investigation. short ridges. In contrast to the above three representations. (a) Grayscale image (FVC2002 DB1. All the classification schemes currently used by police agencies are variants of the so-called Henry’s classification scheme. 2. Kawagoe & Tojo. who divided the fingerprints into three major classes. etc. we study the recent advancements in the field of minutia-based fingerprint extraction and recognition. Further. 2011) In this chapter. Minutiae-based Fingerprint Extraction and Recognition 57 In a latent or partial fingerprint. the number of minutiae is much less (approximately 20 to 30). 4.. an enclosure can be considered a collection of two bifurcations and a short ridge can be considered a pair of ridge endings as shown in Fig. matching a fingerprint against a database reduces to the problem of point matching. Gray level fingerprint images of different types of patterns with core (□) and delta (Δ) points: (a) arch. (e) whorl. 5. Many other features have been derived from this basic three. 1996) . For example. (f) twin loop (Ratha et al. the y-coordinate. (c) right loop. the x-coordinate. and the local ridge direction (θ) as shown in Fig. Fig. Each of the ridge endings and ridge bifurcations types of minutiae has three attributes. 1996) Fig. (b) tented arch. 3.dimensional feature vector. namely. Given the minutiae representation of fingerprints. More complex fingerprint features can be expressed as a combination of these two basic features. (b) ridge ending (Ratha et al.. 2. (d) left loop. Two commonly used fingerprint features: (a) ridge bifurcation. • right loop. The minutiae sets can be matched using many techniques. tented arch. 5. 1998) In addition to minutiae features described above. and 3) structural distortion of the fingerprint images requires powerful matching algorithms. and • whorl. 4. A very important feature for this purpose is the pattern class of a fingerprint. 1996) The matching problem can be defined as finding a degree of match between a query and reference fingerprint feature set. 2) the fingerprint database is very large. Components of a minutiae feature (Hong et al. The large computational requirement of matching is primarily due to the following three factors: 1) a query fingerprint is usually of poor quality. (b) enclosure (Ratha et al. there are other high-level features that can be used in reducing the search space during a match.. where some of them will be addressed in following sections. • • left loop.58 Biometrics Fig. . Fig. Complex features as a combination of simple features: (a) short ridge.. Fingerprints are classified into five main categories: • arch. 2 and Fig. 6 and 7.. Features at three levels in a fingerprint. namely. Level-2 features are the most distinctive and stable features. level-1 features are the macro details of fingerprints. fingerprint features have been classified at three distinctive levels of detail (Feng & Jain. 1996). Although their definitions for level1 and level-2 are different. (a) Grayscale image (NIST SD30. (c) Level 2 feature (ridge skeleton). 2011) Fig. which are used in almost all automated fingerprint recognition systems and can reliably be extracted from low-resolution fingerprint images (~500 dpi).Minutiae-based Fingerprint Extraction and Recognition 59 The pattern class may be ambiguous in partial fingerprints and indeterminate for noisy fingerprints. e. Zhang et al. Yet another high-level feature is the ridge density in a fingerprint. deltas and cores (indicated by red triangles in Fig. Three levels of fingerprint features (Zhang et al. 2011). 6. Ridge density can be defined as the number of ridges per unit distance. A067_11). pore... A resolution of 500 dpi is also the standard fingerprint resolution of the Federal Bureau of Investigation for automatic fingerprint recognition systems using minutiae (Jain et al.. In (Zhang et al. (b) Level 1 feature (orientation field). 6). 6). ridge endings and bifurcations.. 7. The core point is the top most point on the inner most ridge and a delta point is the tri-radial point with three ridges radiating from it (Fig. such as singular points and global ridge patterns. 2011) Recently.g. They are not very distinctive and are thus mainly used for fingerprint classification rather than recognition. the ridge density between two singular points in a fingerprint is computed. In order to make it invariant to position. Level-3 features (red circles) are often defined as the dimensional attributes of the ridges and include sweat pores. 2011). 2011. as shown in Figs. Some singular points of interest are defined as the core and delta points (Ratha et al. and dot) (Feng & Jain. . and (d) Level 3 features (ridge contour. they agree on the definition of level-3. Fig. 2007).. The level-2 features (red rectangles) primarily refer to the minutiae. Zhao et al. Parsons et al. In (Maio & Maltoni. 2007. In order to infer the structural configuration of the encoded fingerprints. recent researches are focusing on pores (International Biometric Group.. segmentation of the directional image. At this stage. a grammatical inference system is developed. 1996) is adopted to segment the directional image according to well-suited optimality criteria. A grammar is defined for each class and a parsing process is responsible for classifying each new pattern.. A dynamic clustering algorithm (Maio et al. and inexact graph matching. Among these features. The dimensionality of the directional image. 1969). Zhao et al. 1976) demonstrated how a tree system may be used to represent and classify fingerprint patterns. The fundamental concept underlying the proposed system is to use an operator to recognize the ridge characteristics and to impart to a computer the ability to manipulate and compare the digitized locations and directions of these characteristics for single-fingerprint classification. The fingerprint impressions are subdivided into sampling squares which are preprocessed and postprocessed for feature extraction. 2008. a relational graph is built by creating a node for each region and an arc for each pair of adjacent regions. 1970). A set of regular tree languages is used to describe the fingerprint patterns. The basic idea is to perform a directional image partitioning into several homogeneous regular-shaped regions. Structural approach One of the early attempts to automate fingerprint recognition was proposed by (Liu & Shelton. where they are considered to be reliably available only at a resolution higher than 500 dpi.60 Biometrics ridge contours. The directional image is computed over a discrete grid by means of a robust technique proposed by (Donahue & Rokhlin.. the ridge-line area is separated from the background and an enhancement is performed in the frequency domain. the authors obtained a structure which summarizes the topological features of the fingerprint and is invariant with respect to displacement and rotation. a well-defined structural approach for fingerprint classification was presented. Before computing the directional images. is reduced to 64 elements by using the principal component analysis (Jolliffe. by maintaining the regularity of the region shape. 2006. Starting from the segmentation of the directional image.680 elements. 1986).. construction of the relational graph. 1975) and (Rao & Balck. 1996). 2009). In particular. 2008.. Terminal symbols are associated to small groups of directional elements within the fingerprint directional image. Jain et al. 1993). The PCASYS approach (Pattern-level Classification Automation SYStem) proposed by (Candela & Chellappa. 1990) is used for assigning each 64-element vector to one class of the classification . The computation of the directions is carried out by the method reported in (Stock & Swonger. considered as a vector of 1.. 1980). Jain et al. 1993) and (Candela et al. (Moayer & Fu. the algorithm works by minimizing the variance of the element directions within the regions and. a PNN (Probabilistic Neural Network) (Specht. 3. 2008. The whole approach can be divided into four main steps: computation of the directional image. 1995) assigns fingerprints to six nonoverlapping classes. and ridge edge features. In (Moayer & Fu. with the aim of creating regions as homogeneous as possible. which are used to build a relational graph summarizing the fingerprint macro-features. By appropriately labeling the nodes and arcs of the graph.. patterns were described by means of terminal symbols and production rules. The directional image is then registered with respect to the core position which corresponds to the fingerprint center. simultaneously. all of which provide quantitative data supporting more accurate and robust fingerprint recognition. modification of directional codes. Next.. The matching algorithm presented is a modification and improvement of the structural approach. 7). ridge configuration. The fluctuation of M(Wd) is expected to be the largest when the movement of the directional window is orthogonal to the direction of the ridges. To find the ridge direction of a given area. and occupational marks). dynamic thresholding and ridgeline thinning. a significant percentage of acquired fingerprint images is of poor quality. feature extraction and matching that runs accurately on a personal computer platform. they cannot be correctly detected. each of the directional windows. 8. 8. a total of eight directional codes are used. Eight directional windows Wd for extraction of ridge direction (Wahab et al. they first divided the original image (320 x 240) into 40 x 30 small areas. which works by analyzing the ridge-line concavity under the core position. the mean value M( Wd) of the grey level of the pixels in the window is calculated.. Ridge orientation approach Since the performance of a minutiae extraction algorithm relies heavily on the quality of the input fingerprint images. The ridge structures in poor-quality fingerprint images are not always well-defined and. To reduce computational time. The eight directional windows wd (d = 0... At each location when the window moves. the authors also implemented an auxiliary module (called pseudoridge tracer). In practice. postnatal marks. 1998) 4. This leads to following problems: . acquisition devices. and non-cooperative attitude of subjects. 2.. . 1. especially for whorl fingerprints. each having a length of 16 pixels are shown in Fig. wd is moved in the direction tangential to the direction of the window. In their approach. Fig. etc. (Wahab et al. due to variations in impression conditions. 1998) described an enhanced fingerprint recognition system consisting of image preprocessing. The image preprocessing includes histogram equalization. In order to improve the classification reliability. Only the extracted features are stored in a file for fingerprint matching. it is essential to incorporate a fingerprint enhancement algorithm in the minutiae extraction module to ensure that the performance of the system is robust with respect to the quality of input fingerprint images. each area is assigned a directional code to represent the direction of the ridgeline in that area. Each of the directional windows will have to move eight times to cover the entire area. hence. skin conditions (aberrant formations of epidermal ridges of fingerprints.Minutiae-based Fingerprint Extraction and Recognition 61 scheme.. Therefore this area will be assigned to have ridges in the direction d such that the fluctuation of M( Wd) is the largest. 1998) (Hong et al. a ridge orientation based computation method is used. Therefore. A binary ridge image is an image where all the ridge pixels are assigned a value one and nonridge pixels are assigned a value zero. and ridge type.. which can adaptively improve the clarity of ridge and valley structures of input fingerprint images based on the estimated local ridge orientation and frequency using both the local ridge orientation and local frequency information. ridges and valleys in a local neighborhood form a sinusoidal-shaped plane wave which has a well-defined frequency and orientation. (Vaikol et al. large errors in their localization (position and orientation) may be introduced. The binary image can be obtained by applying a ridge extraction algorithm on a gray-level fingerprint image. Fig. ridge curvature direction. a number of simple heuristics can be used to differentiate the spurious ridge configurations from the true ridge configurations in a binary ridge image. after applying a ridge extraction algorithm on the original gray-level images. A fingerprint image is treated as a textured image. 9. ridge length. Fingerprint images of very poor quality (Hong et al. In a gray-level fingerprint image. where an orientation flow field of the ridges is computed. To accurately locate ridges. Since ridges and valleys in a fingerprint image alternate and run parallel to each other in a local neighborhood. After ridge segmentation. These ridge features have some advantages in that they can represent the topology information in . (Choi et al. an enhancement algorithm that improves the clarity of the ridge structures is necessary. The proposed ridge features are composed of four elements: ridge count. 2010) introduced a novel fingerprint matching algorithm using both ridge features and the conventional minutiae features to increase the recognition performance against nonlinear deformation in fingerprints. a significant number of spurious minutiae may be created. In order to ensure that the performance of the minutiae extraction algorithm is robust with respect to the quality of the input fingerprint images. enhancement of binary ridge images has its inherent limitations.. 2009) presented a reliable method of computation for minutiae feature extraction from fingerprint images. However. information about the true ridge structures is often lost depending on the performance of the ridge extraction algorithm. a large percent of genuine minutiae may be ignored. and 3.. 2. Fingerprint enhancement can be conducted on either: 1) binary ridge images or. 1998) presented a fast fingerprint enhancement algorithm. 2) gray-level images.. The scheme relies on describing the orientation field of the fingerprint pattern with respect to each minutia detail. smoothing is done using morphological operators.62 Biometrics 1. 2010). The parameters are the widths of the black and white lines in the scanned line in the fingerprint. Example of matched minutiae using the proposed ridge feature vectors (solid circles represent matched minutiae and dotted lines represent the vertical axis of each minutia) (Choi et al. In their implementation. 2010) 5. This enabled the problem formulation to be cast as a parametric optimization problem. 1999) used the fact that a fingerprint is made of white followed by black lines of bounded number of pixels. 10). Pixel-level approach (Abutaleb & Kamel.. and position). 2002) presented an approach for combining local and global recognition schemes for automatic fingerprint verification by using matched local features as the reference axis for generating global features.Minutiae-based Fingerprint Extraction and Recognition 63 entire ridge patterns that exist between two minutiae and are not changed by non-linear deformation of the finger. minutia-based and shape-based techniques were combined. they have also defined the ridgebased coordinate system in a skeletonized image. a Learning Vector Quantization neural network was trained to match the fingerprints using the difference between a pair of feature vectors. Fig. In (Zhang et al. The second one generates global features (shape signatures) by using the matched minutiae as its frame of reference. Finally. orientation. (Ceguema & Koprinska. With the proposed ridge features and conventional minutiae features (minutiae type.. The first one matches local features (minutiae) by a pointpattern matching algorithm. It was observed that . they have proposed a novel matching scheme using a breadth first search to detect the matched minutiae pairs incrementally (Fig. Further. mainly minutiae and singular point. of the rotation transformed fingerprint. For extracting ridge features. investigation has been conducted on analyzing the mechanisms of fingerprint image rotation processing and its potential effects on the major features. 10. The proposed adaptive genetic algorithm proved to be effective in determining the ridges or edges in the fingerprint. Shape signatures are then digitized to form a feature vector describing the fingerprint. . 2005) and (Alibeigi et al.64 Biometrics the information integrity of the original fingerprint image can be significantly compromised by the image rotation transformation process. 2009). Each block is processed independently. Their experimental results have shown that up to 7% of the minutiae can be mis-matched. (in bifurcation minutiae. which can cause noticeable singular point change and produce non-negligible number of fake minutiae. Processing the commutations (Gamassi et al. repeated for each pixel of the binary image: 1. The gray level projection along a line perpendicular to the local orientation field provides the maximum variance. Compute the average of the pixels in the perimeter P... 3. 2. The ridges are thinned and the resulting image is enhanced using an adaptive morphological filter. 6. 7. . 2005) 4. 12).75 the pixel is treated like a bifurcation minutiae. If the pixel has been defined as a termination minutiae in step 1. y) pixel and compute the average of the pixels. The algorithm continues if there are two logic commutations. Compute the number of the logic commutations present in the perimeter P without considering isolated pixels as shows in Fig. otherwise if the average is greater than 0. False detection removal (Fig. If the average is less than 0. Fig. otherwise it jumps to step 1 processing another pixel. 2010) proposed an approach for feature extraction based on dividing the image into equal sized blocks.. The position of singular point can change up to 55 pixels while the orientation angle change can be up to 90 degrees. For the matched ones. it checks if the average is greater than the threshold K. It is found that the quantization and interpolation process can change the fingerprint features significantly though they may not change the image visually. y) pixel of size W×W. Square-based method was presented in (Gamassi et al.25 the pixel is preliminary identified as a ridge termination minutiae. Then the ridges are located using the peaks and the variance in this projection. 5. Create a square perimeter P around the (x. their positions deviate up to 16 pixels. (Kaur et al. 11. Create a 3x3 square mask around the (x. the average must be less than 1-K) otherwise it jumps to step 1 processing another pixel. 11. The Square-based method is composed of the following steps. Estimate the orientation angle α in the minutiae point. The directional image is partitioned into “homogeneous” connected regions according to the fingerprint topology. which can be conceived as a continuous classification. a Laplacian-like image pyramid is used to decompose the original fingerprint into subbands corresponding to different spatial scales for fingerprint enhancement.. Estimating the orientation angle (left) and removing the false terminations detections (right) (Gamassi et al. parabolic symmetry is added to the local fingerprint model which allows to accurately detecting the position and direction of a minutia simultaneously. thus giving a synthetic representation which can be exploited as a basis for the classification.Minutiae-based Fingerprint Extraction and Recognition 65 Fig. (Simoncelli & Freeman. the fingerprint’s blockwise Fourier transform is multiplied by its power spectrum raised to a power.. For minutiae extraction. 2005) 6. where the corresponding filtering directions stem from the frequencyadapted structure tensor. thus magnifying the dominant orientation.2% with 3NN classifiers. was used to guide the partitioning. on a database of 192 inked fingerprint images from 16 persons.. Also. 1995) is equivalent to bandpass filtering in the spatial domain. In (Watson et al. A set of dynamic masks. (Hong et al.. The adaptation of the masks produces a numerical vector representing each fingerprint as a multidimensional point. The Laplacian pyramid (Adelson et al. 1984). contextual smoothing is performed on these pyramid levels.. In a further step. 1999) have developed a one-step method using Gabor filters for directly extracting fingerprint features for a small-scale fingerprint recognition system. A directional image is a discrete matrix whose elements represent the local average directions of the fingerprint ridge lines. 1999) have implemented the directional approach. Also. 1994) and (Willis & Myers. 12. whereas . From the experimental results. Filtering and wavelet approach (Lee & Wang. together with an optimization criterion. (Cappelli et al. Different search strategies were discussed to efficiently retrieve fingerprints both with continuous and exclusive classification. 1998) proposed an algorithm using Gabor bandpass filters tuned to the corresponding ridge frequency and orientation to remove undesired noise while preserving the true ridge-valley structures. the use of magnitude Gabor features with eight orientations as fingerprint features led to good shift-invariant properties and an accuracy of 97. All operations are performed in the spatial domain. 2001). Next. the discrete wavelet transform (DWT) coefficients have been used as the ridge pattern. 2007) that the discrete cosine transform (DCT) is better suited in extracting informative features than the DWT. From (Tico et al. 1994) and (Chikkerur & Govindaraju. The results have shown that the fingerprint matching system based on the DCT obtained a high recognition rate and a lower complexity. Fig. to extract wavelet features from a gray-scale fingerprint image. Then. all the DCT coefficients in each non-overlapping image were divided into 12 areas. 13. 2005. 2006) at the corresponding pyramid level. (Fronthaler et al. The filtering directions are recovered from the orientations of the structure tensor (Bigun. Furthermore. twelve sub-images in the wavelet domain at each decomposition level are created as shown in Fig. 2001). 2008) proposed the use of an image-scale pyramid and directional filtering in the spatial domain for fingerprint image enhancement to improve the matching performance as well as the computational efficiency.. 2005) is done in the Fourier domain. they have proposed Gaussian directional filtering to enhance the ridge-valley pattern of a fingerprint image using computationally cheap 1-D filtering on higher pyramid levels (lower resolution) only. 2001) On the other hand. The cropped image was then quartered. 2006) presented a filtering strategy that can solve the problem of rotated scanned input images. where the center point in the image is referred to as a reference point.... the image was first cropped to the size of 64×64 pixels.66 Biometrics the contextual filtering in (Sherlock et al. Image pyramids or multiresolution processing is especially known from image compression and medical image processing. generating 12 features for each non-overlapping image. Linear symmetry features are thereby used to extract the local ridge-valley orientation (angle and reliability). 13. The fingerprint image is scanned with an optical fingerprint scanner. The resulted feature vector is then used as a representation of that fingerprint image. to obtain four nonoverlapping images of size 32×32 pixels. 2005) proposed to divide all the DCT coefficients containing the oscillate pattern in a zigzagscanned fashion. centered at the reference point. The authors discussed that the middle frequency has an oscillated pattern corresponding to the ridge pattern. where one feature was extracted from each one of these areas. Arrangements of twelve sub-images in the wavelet domain (Tico et al. Accordingly. (Tachaphetpiboon & Amornraksa. (Fronthaler et al. extract DCT features from the divided DCT coefficients. it was shown in (Tachaphetpiboon & Amornraksa.. 2008) expected that all the relevant information to be concentrated within a few frequency bands. and then use them for fingerprint matching. (On et al. the standard deviation of the DWT coefficients from each sub-image is computed to create a feature vector of length 48 (12 DWT sub-images from 4 non-overlapping images). After applying the DWT to each non-overlapping image four times. The .. The Fourier transform of mi(x. Further. and we obtain (3) Then. y) is given by: (1) and the location-based spectral minutiae representation is defined as (2) In order to reduce the sensitivity to small variations in minutiae locations in the spatial domain. the orientation information in the spectral representation is included. . a function mi(x. a Gaussian low-pass filter is used to attenuate the higher frequencies. y) in the direction of the minutia orientation. using a Gaussian filter and taking the magnitude of the spectrum yields . The orientation θ of a minutia can be incorporated by using the spatial derivative of m(x. The conversion is needed to reduce the computation and analysis time for filtering and thinning process. This multiplication in the frequency domain corresponds to a convolution in the spatial domain where every minutia is now represented by a Gaussian pulse. a function mi(x. the enhanced fingerprint image is applied for binarization. with every minutia. every minutia is represented by a Dirac pulse. Z is associated where (xi. in the spatial domain. The noise produced from the binarized fingerprint image is then removed using median filtering and the filtered fingerprint image is further thinned. y) in the direction θi. . y. Assume a fingerprint with Z minutiae. Following the shift property of the Fourier transform. i = 1. such that (4) As with the SML algorithm. scale and rotation properties of the two-dimensional continuous Fourier transform. yi) represents the location of the i-th minutia in the fingerprint image. θ) is assigned being the derivative of mi(x. The concept of spectral minutiae representation was used by (Xu & Veldhuis. . Thus. the bifurcation minutiae extraction method is applied for the thinned fingerprint image. The scanned fingerprint image is then enhanced for quality improvement. to every minutia in a fingerprint. . The spectral minutiae representation is based on the shift. 2009). Thus. the magnitude of M is taken in order to make the spectrum invariant to translation of the input. y) = δ(x − xi. y − yi).Minutiae-based Fingerprint Extraction and Recognition 67 scanned fingerprint image is saved in bitmap format with black and white colour. After that. The extracted feature data are then used for neural network training. In location-based spectral minutiae representation (SML). 2009b) introduced two feature reduction methods: the Column-Principal Component Analysis (PCA) and the LineDiscrete Fourier Transform feature reduction algorithms. in which the fingerprint’s orientation field is reconstructed from minutiae and further utilized in the matching stage to enhance the system’s performance. 14 illustrates a general procedure of the spectral minutiae representation discussed by (Xu & Veldhuis. On the other hand.. for fingerprint identification. the recognition performance of the fingerprint system is not degraded. (b) representation of minutiae points as real (or complex) valued continuous functions. (c) the 2D Fourier spectrum of ‘b’ in a Cartesian coordinate and a polar-logarithmic sampling grid. 2009a. they have produced “virtual” minutiae by using interpolation in the sparse area. 14. 2009) proposed an algorithm to use minutiae for fingerprint recognition. Fig. (d) the Fourier spectrum sampled on a polarlogarithmic grid (Xu & Veldhuis. increases the identification accuracy from 85. (Xu & Veldhuis.2% by nearest cluster center point classifier with Leave-One-Out method. in comparison with Gabor filter and PCA transform. (Xu et al. The proposed method was assessed on images from the biolab database (Biometric System Lab). The experiments demonstrated that these methods decrease the minutiae feature dimensionality with a reduction rate of 94%. increases the identification accuracy from 81. 2010) have further discussed the objective of the spectral minutiae representation in representing a minutiae set as a fixed-length feature vector that is invariant to translation. 15). (a) a fingerprint and its minutiae. 2010) Moreover. First. Geometric approach (Chen et al. it has been shown that applying RFLD to a Gabor filter in four orientations... Illustration of the general spectral minutiae representation procedure. rotation and scaling. Also. Fig. (Dadgostar et al. Experimental results have shown that applying RFLD to a Gabor filter in four orientations. 7. 2010). 2009) presented a novel feature extraction method based on Gabor filter and Recursive Fisher Linear Discriminate (RFLD) algorithm. and then used an orientation model to reconstruct the orientation field from all “real” and “virtual” minutiae (Fig.2% to 95. in comparison with Gabor filter and PCA transform.68 Biometrics (5) Recently. A .9% to 100% by 3NN classifier. based on the spectral minutiae feature. while at the same time. Minutiae-based Fingerprint Extraction and Recognition 69 decision fusion scheme is used to combine the reconstructed orientation field matching with the conventional minutiae-based matching. the features are compared with these pre-defined databases and the decision is made by a voting system. Generally. Third. . (c) virtual minutiae by interpolation (the bigger red minutiae are “real. Singularity approach The global features of a fingerprint are singularity points. In (Wei-bo et al. Then. 2006) presented a method in which the singular point detection is performed by analyzing the local quadrant change of the ridge gradient vectors. they completed the matching task by aligning the two fingerprint minutiae sets and accounting the number of overlapping minutiae.. the fingerprint image is cropped around the based point. singularity points are used to classify fingerprint images to reduce the search space. they computed the curvature in a simple way based on orientation field.. 2009) 8. (Kryszczuk & Drygajlo. Finally. they found the reference points pair based on computing the least squared error of Euclidean distance between these two specifiers. Singular points (SP) are defined as discontinuities in the directional field (Liu et al. Interpolation step: (a) the minutiae image. their coordinates location. Fig. and opposite.” while the smaller purple ones are “virtual”) (Chen et al. 2008) proposed a fingerprint matching algorithm using the elaborate combination of minutiae and curvature maps from fingerprint images. (b) the triangulated image.. Second. 15. Their algorithm performs a robust estimation of the local ridge gradient. First.. (Qi et al. Fingerprint features such as minutiae points' determination. 2008). In formally. this can be stated as the area where ridges oriented rightwards change to leftwards and those that were oriented upwards turn downwards. 2005). For a testing fingerprint image. and radius of arcs for each ridge are stored in different databases. The core point (CP) of the input fingerprint is detected and located in the centre. namely core and delta. They employed a modified version of the “squared average gradients” to estimate the direction of the smoothed gradient vectors. each minutia was defined by the type and the relative topological relationship among the minutia and its 5 nearest neighbors. (Min & Thein. a similarity measurement was defined between two specifiers. 2009) have presented a recognition system which combines both the statistical and geometry approaches. and then performed the sampling operation on the curvature map around each minutia to get the fixed length minutiae specifiers. . After singularity points extraction. a FAR=0. 2010) have studied the utility of pores on rolled ink fingerprint images which are widely used in forensic applications. it has an important role in the design and deployment of fingerprint recognition systems and impacts both their cost and recognition performance. and (ii) fusion of pore and . Focusing on this kind of features depends heavily on the quality of the digital fingerprint image. Resolution is one of the main parameters affecting the quality of a digital fingerprint image. 2010) proposed another fingerprint recognition that is based on singularity points detection and singularity regions analysis. the field of fingerprint recognition does not currently have a well-proven reference resolution or standard resolution that can be used interoperably between different systems.36% using FVC2000 DB1-B database. and so. achieving a more robust average local ridge gradient estimation.. Fingerprint images of three different qualities at two different resolutions (500ppi and 1000ppi) were considered in their experiments. 2008. The proposed approach enhances the performance of singularity points based methods by introducing pseudo-singularity points when the standard singularity points (core and delta) cannot be extracted. 2008) proposed a fingerprint recognition approach based on core and delta singularity points detection.26% and a FRR=7. 2008.22% and a FRR=9. For example.. the International Biometric Group analyzed level-3 features at a resolution of 2000 dpi. Jain et al. Parsons et al. 2009). and in the best of case. they allowed cancelling out the opposite local gradients. Zhao et al.23% using FVC2002 DB2A database. Poincarè indexes computation and core and delta extraction. By using NIST SD30 database. Despite to the classical minutiae-based fingerprint recognition system.. The experimental results have shown good accuracy levels. their relative distance and orientation to perform both classification and matching tasks. (Zhao & Jain. Moreover.. Despite this. and a commercial minutiae matcher. 2008. The experimental results have shown that the (i) pores do not provide any significant improvement to the fingerprint recognition accuracy on 500ppi fingerprint images. The singularity points extraction is performed using three sequential steps: directional image extraction. Zhang et al. 2006. Finally.. 2007. 2007) chose a resolution of 1. Zhao et al. singularity points and/or pseudosingularity-points are detected and extracted to make possible successful fingerprint classification and recognition. such as dilation and erosion. on two considered regions of interests (singularity regions or pseudo singularity regions) around core and delta is performed. they have investigated the impact of fingerprint image quality on the accuracy of automatic pore extraction. The obtained similarity degree considering the regions of interest gives the matching result.. Finally. 2008.. 9. 2009) proposed some pore extraction and matching methods at a resolution of 902 dpi x 1200 dpi.70 Biometrics Also. researchers have focused on pores as a distinctive fingerprint features (International Biometric Group.. As a result. (Conti et al. (Jain et al. a roto-translation operation is applied for fingerprint image registration.000 dpi based on the 2005 ANSI/NIST fingerprint standard update workshop. the proposed system is based on core and delta position. Pore approach Recently. a matching algorithm based on morphological operation. (Zhao et al. reaching a FAR=1. The approach has shown a good accuracy level in the singularity points detection and extraction and a low computational cost. (Militello et al. 2011. and the effectiveness of pores in improving fingerprint recognition accuracy. 1993) presented an interesting pyramidal architecture constituted by several multilayer perceptrons. Neural network approaches are mostly based on multilayer perceptrons or Kohonen self-organizing networks (Bowen. automatic minutiae extraction was analyzed using three different types of fingerprints: latent. 2011) have taken further steps toward establishing a reference resolution. 11. Therefore. their results have shown that 800 dpi would be a good choice for a reference resolution. the differences in performance between manual and automatic minutiae extraction for latent fingerprints were presented.e. Then.. (Wang et al. 2008) proposed a fingerprint matching algorithm based on invariant moments. and providing a minimum resolution for pore extraction that is based on anatomical evidence.. (Puertas et al.) taken at the intersection points between some horizontal. (Montesanto et al. minutiae and pores. etc. In particular. The first step involves extracting the pores from the partial image. These two theorems provided the template parameter inequalities to determine parameter intervals for implementing the corresponding functions. i. core and delta points. Other approaches In this section we present an overview of some other general techniques proposed for fingerprint recognition. 2007) have studied the fingerprint verification based on the fuzzy logic. (Malathi & Meena. 1997) proposed a hidden Markov model classifier whose input features are the measurements (ridge angle. assuming a fixed image size and making use of the two most representative fingerprint features. a chi-square formula is used to calculate the minimum distance between two histograms to find the best matching score.vertical fiducial lines and the fingerprint ridge lines.. Finally. Kamijo. (Zhang et al. 1991. They presented two theorems for designing two kinds of CNN templates. These pores act as anchor points where a sub window (32x32) is formed to surround them. Moscinska & Tyma. 2010) studied the performance of a fingerprint recognition technology.. 1993). in several practical scenarios of interest in forensic casework. 2008) proposed the cellular neural/nonlinear network (CNN) as a powerful tool for fingerprint feature extraction.. (Senior. the reactive agent and the neural classification system. (Kamijo. rolled and plain. it is important to estimate the quality and validity of the fingerprint . 1992. First.Minutiae-based Fingerprint Extraction and Recognition 71 minutiae matchers is effective only for high resolution (1000ppi) fingerprint images of good quality. By evaluating these resolutions in terms of the number of minutiae and pores. (Yang et al. Then rotation invariant LBP histograms are obtained from this surrounding window. etc. where they combined the results obtained using three different methods of minutiae extraction: the sequential method. each of which was trained to recognize fingerprints belonging to different classes. 10. The experiments were carried out using a database of latent fingermarks and fingerprint impressions from real forensic cases. Quality assessment methods Fingerprint quality is usually defined as a measure of the clarity of ridges and valleys and the extractability of the features used for identification such as minutiae. separation. 2010) presented a suitable technique for partial fingerprint matching based on pores and its corresponding Local Binary Pattern (LBP) features. They conducted experiments on a set of fingerprint images of different resolutions (from 500 to 2000 dpi).. 1993. curvature. Hughes & Green. Standards. 2006) show that degraded biometric samples result in a decrease in BI. 2007) .. temperature. a number of quality metrics can be found aimed at objectively assessing the quality of an image in terms of the similarity between a reference image and a degraded version of it. which refers to the impact of the individual biometric sample on the overall performance of a biometric system. 2008) proposed a fingerprint-quality measurement method based on the shapes of several probability density functions (PDFs). motivation/collaboration of users. They defined “Biometric information” (BI) as the decrease in uncertainty about the identity of a person due to a set of biometric measurements. 2007). Many unresolved problems still need to be explored and investigated. we have presented a study covering different automatic fingerprint recognition techniques. dirt. the shapes of the PDFs of these gradients are measured in order to determine the fingerprint quality. so that images assigned higher quality shall necessarily lead to better identification of individuals (i. Inf. 2005). BI is calculated by the relative entropy between the population feature distribution and the person’s feature distribution. 2006).72 Biometrics images in order to improve recognition performance. Conclusion In this chapter. A theoretical framework for a biometric sample quality has been developed by (Youmaran & Adler. The 2-D gradients of the fingerprint images are first separated into two sets of 1-D gradients. etc.. 2005) suggested quality estimation methods based on the power spectrum analysis in the 2-D Fourier domain and coherence in the spatial domain. assessing the quality of captured fingerprints is important for a fingerprint recognition system. Then.. 2) fidelity. age. (Chen et al. attributable to each step through which the sample is processed. It is generally accepted that the utility is most importantly mirrored by a quality metric (Grother & Tabassi. the utility of the sample. For example.e. They used seven quality indices and analyzed the correlation between the quality value and each quality index. For this reason. Technol. which refers to the quality attributable to inherent physical features of the subject.. Com. for a . In (Van derWeken et al. 2005) proposed a hybrid scheme to measure quality by considering local and global features. ISO/INCITS-M1 (International Standards Organization/International Committee for Information Technology Standards) has established a biometric sample-quality draft standard (Int. where the concept of sample quality is a scalar quantity that is related monotonically to the performance of the system. A number of factors can affect the quality of fingerprint images (Joun et al. temporal or permanent cuts. The character of the sample source and the fidelity of the processed samples contribute to. where they relate biometric sample quality with the identifiable information contained. The results reported by (Youmaran & Adler. many of these factors cannot be controlled and/or avoided. which is the degree of similarity between a biometric sample and its source. and 3) utility. Although many academic and commercial systems for fingerprint recognition exist. better separation of genuine and impostor match score distributions). 12.. in which a biometric sample quality is considered from three different points of view: 1) character. 2003): occupation. or similarly detract from. presented by the experts in this field. (Lee et al. dryness/wetness conditions. Unfortunately. (Qi et al.. Finally. there is a necessity for further research in this topic in order to improve the reliability and performances of the current systems. residual prints on the sensor surface. Aug. P. 1992 Candela. A. A. (1999). Bergen. & Chellappa. (1993). H. 1995 Cappelli. (2000). pp. Also. A. 2000 Bigun. R. Pyramid methods in image processing. 1984 Alibeigi. S. D.M. (2003). Proc. 5. NIST Technical Report NISTIR 5647.T. IEEE Trans. & Maltoni. Proc.. S. S. Comparative Performance of Classification Methods for Fingerprints. Pipelined Minutiae Extraction From Fingerprint Images. A Correlation-Based Fingerprint Verification System. common matching methods based on alignment of singular structures would fail in case of partial prints. New York: Springer. E. J. Further. pp. Cesena-Italy. Vol.. L. (1992). 13. H.. Aug. Acknowledgments The author would like to acknowledge and thank Kuwait Foundation for the Advancement of Sciences (KFAS) for financially supporting this work. No. R. B. & Van der Zwaag. D. 14.J. (2009). pp.. (1984). Vol. J.T. 205-213. Workshop Circuits Systems and Signal Processing. Lumini. NIST Technical Report NISTIR 5163. E. Veelenturf.. fast comparison algorithm is necessary since most minutiaebased matching algorithms will fail to meet the high speed requirement. pp.. Pattern Recognition.M. & et al. 1993 Candela.csr. R.J.. Apr. Proceedings of the IEEE. AUGUST 1999 Adelson. The major challenges faced in partial fingerprint matching are the absence of sufficient level 2 features (minutiae) and other structures such as core and delta. A. PCASYS—A Pattern-Level Classification Automation System for Fingerprints. IEE Colloquiym Neural Network for Image Processing Applications. 11th Ann. (1999). 1859-1867.P. G. Fingerprint Classification by Directional Image Partitioning. P.D.B. Low quality. The Home Office Automatic Fingerprint Pattern Classification Project. Verwaaijen. M. (2006). J. 8. Thus. Burt. G... 6. 36. Vision With Direction. & Gerez. 8. J. IEEE Trans. & Kamel. A Genetic Algorithm for the Estimation of Ridges in Fingerprints. On Pattern Analysis And Machine Intelligence. RCA Eng. References Abutaleb. Rizi M. Vol. no. Maio. T.it/research/biolab/) Bowen. pp.. Anderson. MAY 1999 . 29. (1995). matching partial fingerprints still needs lots of improvement.T. 8. incompletion and distortion are typical problems that forensic fingerprint recognition systems have to face when extracting features from latent fingermarks. C. 2006 Biometric System Lab. & Behnamfar.unibo. matching speed and its robustness to poor image quality are normally regarded as the most critical elements of system performance. 21. (www. Fingerprint Matching by Thin-Plate Spline Modelling of Elastic Deformations. 402-421.Minutiae-based Fingerprint Extraction and Recognition 73 large automated fingerprint identification system. Nov. On Image Processing. No. vol. 2003 Bazen.. 33–41. Pores (level 3 features) on fingerprints have proven to be discriminative features and have recently been successfully employed in automatic fingerprint recognition systems. there is still a lot of research to be done when dealing with latent fingermarks. No. University of Bologna. M. G. 1134-1139. 2009 Bazen.H. J. & Ogden. the recognition accuracy. Finally.H. Gerez. 7. 1900 Hong. & Bigun. Sorbello. IEEE Transactions On Pattern Analysis And Machine Intelligence. 2010 Dadgostar. vol. London: Routledge. pp. 2009 Donahue. US Patent No.J. Performance of biometric quality measures.688. 2011 Feng. no. Vol. J. Workshop on Pattern Recognition for Crime Prevention. 3. Ouyang. & Vitabile. (2005). J. Method and Apparatus for Matching Streaked Pattern Image. Integrating Local and Global Features in Automatic Fingerprint Verification.. 2005..R. (1993). Rye Brook. & Tabassi. F. Image Understanding.D. C. M. Sept 2005 Grother. 39. 2008 . V. Fingerprint Image Enhancement: Algorithm and Performance Evaluation. AUGUST 1998 Hughes. E. 79-81. Vol. Intelligent and Software Intensive Systems. E. 2010 Conti. 1665. IEEE Trans. (2007). Kollreider. Audioand Video-based Biometric Person Authentication. F. (2009). 7.. V. Y. Fingerprint Reconstruction: From Minutiae to Phase. Piuri. On the Use of Level Curves in Image Analysis. (1998). 2007 Henry. 2131-2140. Pattern Recognition.. (2008). 3. 160–170 Chen. 2002 Chen. Vol. K. H. M. 29. P. & Yang. (2011). Neural Network. & Jain. Fatemizadeh. 2006 Fronthaler. & Koprinska. A.. Feature Extraction Using Gabor-Filter and Recursive Fisher Linear Discriminant with Application in Fingerprint Identification. Proc. (2006). London: McMillan. Analysis of Level 3 Features at High Resolutions (Phase II). Int. V. Tabrizi. S.1670. 4. 2. Pattern Anal. Second Int’l Conf. P. (2010).. I. FEB. S. F. Militello. M. 1892 Gamassi. Proc. Jul. Y. (2009). Wan. Fingerprint image enhancement using STFT analysis. IEEE Transactions On Image Processing. No. 3. A. 57. 11. & Cai. pp. Fingerprint quality indices for predicting authentication performance. 777-789. & Soltanian-Zadeh. & Jain. pp. J. 209-223. Vol. Fingerprint Matching Using Ridges. no. IEEE Trans. H.. 20–29 Choi. Zhou. (1892). Proceedings 5th Int. Security and Surveillance. Classification and Uses of Finger Prints. Prodeedings of the IEEE. & Rokhlin. 33. J. (2010). (1991). The Use of Neural Network for Fingerprint Classification. 2005. K. A. S. 18. Reconstructing Orientation Field From Fingerprint Minutiae to Improve Minutiae-Matching Accuracy. (2007). (2005). & Kim. H. A. Vol. NY. pp.. K. L. Vol. 20. 185-203. pp.I. No. V. & Toyama. R. On Pattern Analysis And Machine Intelligence. & Scotti. International Conference on Complex. pp. pp. on Advances in Pattern Recognition. & Green. 7th Int'l Conf. Local Features for Enhancement and Minutiae Extraction in Fingerprints. & Jain. P. MARCH 2008 Galton. Introducing Pseudo-Singularity Points for Efficient Fingerprints Classification and Recognition. A. No.. pp. A. Intell. pp.. 531–543. Conf. (2005). (2008). E. 1991 International Biometric Group. Fingerprint Matching Incorporating Ridge Features with Minutiae. 17. (2002). J. F. Mach. Z. No. & Govindaraju. JULY 2009 Chikkerur. pp. 354-363. Dass. Apr. No. Choi. Fingerprint local analysis for high-performance minutiae extraction. 1993 Feng. IEEE ICIP. pp. Vol.A.295. M.. 265-268. 8. 2007 Hara. Finger Prints.. C. (1900). IEEE Transactions On Image Processing. H.74 Biometrics Ceguema.P. S. Proceedings of the 16th International Conference on Pattern Recognition. . Electronics Letters. M.T. 1261–1269 Kamijo. Kim. 18th February.-S. & Kamra. 1996 Maio. Mach. S. On Computers. Vienna. (1999). 156-160. S. Computer-Assisted Fingerprint Encoding and Classification. March 1976 . Jr. Pattern Recognition. (2010). R. An efficient method for partial fingerprint recognition based on Local Binary Pattern. M. 1993 Kaur. A. T. Fingerprint Singular Points Detection and Direction Estimation with a “T” Shape Model. 2008 Liu.. An experimental study on measuring image quality of infant fingerprints. International Conference on Networking and Information Technology. K. K. 15–27. J. Sorbello. 1970 Liu. (1996). 11. IEEE Tran. Fingerprint-Quality Index Using Gradient Components. Proceedings of the IEEE. (1976). Inf. 1999 Lee. A. & Demirkus. Y. New York: Springer-Verlag. G. Dynamic Clustering Of Maps In Autonomous Agents. Choi. Proc. 1. ICCCCT’2010 Militello. V. Conf. (2007). 3. (1975).262-274. H.. (2009). C. pp. L. (1993). B. pp. 13th ICPR.. 29. 201-207 Maio. P. M.. 3. (1986). 1-23. 1996 Malathi. Intelligent and Software Intensive Systems. A Novel Method For Fingerprint Feature Extraction. Pores and ridges: Fingerprint matching using level 3 features. A Novel Embedded Fingerprints Authentication System Based on Singularity Points. (2006).. A. 4. C. Y.. (2005). Pattern Recog. 2003. 18th International Conf. D. Standards. Pattern Anal. Chung. 477–480 Jain.-D. (2008). Biometric Sample Quality Std. S. Fingerprint Pattern Classification. S.080-1. Aug. pp. M.. (2006). 7. Proc. C. & Tojo. & Maltoni. 1975 Moayer. 2005 Jain. 1.S. Mach. I. no. & Thein. 792-800. International Conference on Complex. & Meena. on Man-Machine Systems. 1984 Kryszczuk. S. Technol.937. vol. A Structural Approach to Fingerprint Classification. (2008). D. C. Third Int’l Conf. no. (2005). IEEE Tran. . 1. K. Intell. Vol. No. Intell. vol 17. C. Proc. NY. F. Chen. 18..Minutiae-based Fingerprint Extraction and Recognition 75 International Com. D. on Pattern Recognition. July 2005 pp. Pattern Anal. IEEE Trans. IEEE Trans. A Tree System Approach for Fingerprint Pattern Recognition. & Ahn. Vol. S. M. 4). & Rizzi. (2003). Choi. Nov. On Information Forensics And Security. H. Vol. D. Springer. 1986 Joun. & Drygajlo. 3. Hilton Rye Town. 2009 Moayer. A. pp.pp. Chen. S. Principle Component Analysis. B. Draft (Rev. 18th Int. (2010). A Syntactic Approach to Fingerprint Pattern Recognition. DEC. & Zhang. pp. pp. Y.. Conti. P. K. pp. Vol. & FU. NO. 4.. 2006. USA. Sandhu. A. pp. & Kim. Neural Network. No. (1970). & Demirkus. D. Intelligent Fingerprint Recognition System by Using Geometry Approach. Proceedings of the KES. SEPT. 2010 Kawagoe. 2006 Lee. C-25. IEEE Trans.091. & Vitabile. & Wang. Maltoni.. (1996). Fingerprint feature extraction using Gabor filters. 2007 Jolliffe. pp. & Fu. 35 No. N. Document M1/06-0003. 288-290. Classifying Fingerprint Images Using Neural Network: Deriving the Classification State. Pattern Recognition. 295-303. Singular point detection in fingerprints using quadrant change information. Pores and ridges: High-resolution fingerprint matching using level-3 features. (1984). pp. 2008 Min. Hao. Y.. Proceedings of the AVBPA 2005. M.932-1.-J. Jan. vol. & Shelton. Ortega-Garcia. K. (1969). K.. D.. Fingerprint enhancement by directional fourier filtering. A. No. pp. & Millard.. J. pp. T. Mar. (1980). 1990 Stock. 29-34. A. 1171-1172. T. Baldassarri. C. & Tyma.. & Vaish. Anal. Cornell Aeronautical Laboratory. 2010 Qi. The steerable pyramid:Aflexible architecture for multi-scale derivative computation. (2007). 23–26 Sherlock. & Freeman. (2007). Chen. Proc. Karu. (1995). 141. G. Proceedings of Asia-Pacific Conference on Communications. Proceedings of the ICOCI. P.D. E. pp.F. Probability and Risk. Fifth IEEE Workshop Applications of Computer Vision. J. B. 2... 7. no. S. Washington. (2000). On Pattern Analysis And Machine Intelligence.. Abdurrachim. Third Int’l Conf. 1. 109-118. G. 31st Asilomar Conf. Mach. DC. 2006 Parsons. Proceedings of SPIE. Proc. vol. Fingerprint matching method using zigzagscanned DCT coefficients. Vol. S. N. R. Signals. K. (2008). N.. M.K. Thonnes. V. & Amornraksa. pp. & Saudi. Development and Evaluation of a reader of Fingerprint Minutiae. J. pp..M. W. pp. J. P. & Wang. D. Technical Report CAL no. C. 223-231. (1994). Smith. vol. of ITC-CSCC’05. W. Yaacob. Li. Vol. K. & Swonger. Law. 1994 Tachaphetpiboon. K. 1997 Simoncelli. Towards a Better Understanding of the Performance of Latent Fingerprint Recognition in Realistic Forensic Conditions. (1990). A hybrid method for fingerprint image quality calculation. (2010). Proc. 1994 Specht. & Amornraksa. 306-310. (1997).W.1488-1491 Rao. pp. P. 8. Neural Networks.. Fingerprint Feature Extraction Based Discrete Cosine Transformation (DCT). G. Proc. Type Classification of Fingerprints: A Syntactic Approach. L. Robust Fingerprint Authentication Using Local Structural Similarity. & Tascini. & Balck. Fierrez. D. 1980 Ratha. International Conference on Pattern Recognition. Int.M. G. 210-223. Pandiyan. 229-232. 2007 Tachaphetpiboon. vol. M. no. Conf. VOL. K. AUGUST 1996 Senior.. Neural Network Based Fingerprint Classification. 2000 Ratha. 2008. G. 1993 On. Fingerprints Recognition Using Minutiae Extraction: a Fuzzy Approach. & Exposito. (2005). Proc. & Alyea. pp. Proceedings of the 4th IEEE Workshop Automatic Identification Advanced Technologies. pp. R. Systems. Vision Image and Signal Processing. 2. IEEE Tran. (2006). and Computers. 1995. 14th International Conference on Image Analysis and Processing. (1996). D. Image Processing. 2005. 3. L. H. 3. 2007 Moscinska.. A. July 2005 . 1–14. pp. 124–129 Qi. 1. (1994). Pandit. Oct. Monro. & Kunieda. Ramos. IEEE Trans. Wang. 2008 Puertas.. Proceedings of the 15th International IEEE Conference on Image Processing. (2008). N. 2277. M. R. A Hidden Markov Model Fingerprint Classifier. R. vol. E. NO. Bolle. Automated system for fingerprint authentication using pores and ridge structure. A. Intell. XM-2478-X1:13-17. (1993).. 3. Vallesi. pp.. S. 18. Neural Network. Xie. D.. & Jain. V. M.. 87–94.. 1969 Stosz. K. no. Fingerprint Features Extraction Using Curvescanned DCT Coefficients.76 Biometrics Montesanto. J. & Wilson. T. pp. Rotationally invariant statistics for examining the evidence from the pores in fingerprints. K. S. A Real-Time Matching System for Large Fingerprint Databases. (2005). Probabilistic Neural Network. N. Q. A Novel Fingerprint Matching Method Using a CurvatureBased Minutia Specifier. J. A. 7-10th December 2010 . 34. Fujitsu Laboratories LTD. 2. 2006. Min. & Tan. M. .. Minutiae Feature Extraction From Fingerprint Images.J.. Min. Conf. Congress on Image and Signal Processing. & Grother.. Complex Spectral Minutiae Representation For Fingerprint Recognition. Conf. Xin-bao.... 4. (1999).Minutiae-based Fingerprint Extraction and Recognition 77 Thebaud. Moving-Window Algorithm For Fast Fingerprint Verification. A. J. IEEE SYSTEMS JOURNAL. J. 2008 Willis. J. No. H. Automation. H. (2009). Robotics and Vision. Government Printing Office. H.J. Federal Bureau of Investigation. E. N. S.W.. A. Z. M..Q. Novel approach to automated fingerprint recognition. T. SEP. Kevenaar. Veldhuis. Vol. P. 397-409. (2010). D.. Immonen. & Sharma.. Vol. S.501. (2007). (1990). December 2009 Yahagi. Systems and Methods with Identity Verification by Comparison and Interpretation of Skin Patterns Such as Fingerprints. Proceedings of the IEEE. pp. & Saarinen. Shin. G. (2001). pp. (2008). 25.. J. 2010 Xu. B. L. 1994 Wei-bo. Combining neighborhood. & Myers. & Akkermans. Lee. MD.D. (2009). 2001 Xu.. (1998). J.. IEEE Int'l Advance Comp. Prodeedings of the IEE. 3. C. L. (1994). & Yamagishi. Chin. Comparison of FFT fingerprint filtering methods for neural network classification. Robust designs for Fingerprint Feature Extraction CNN with Von Neumann Neighborhood. R. Feb. D. A cost-effective fingerprint recognition system for use with low-quality prints and damaged fingerprint. C. P. 1998 Wang. C. 2008 Youmaran. Measuring biometric sample quality in terms of biometric information.. M. T. 11th Int. S. & Liu. J. Candela. Sawarkar. W. 2. & Gokberk. & Veldhuis. R. H. T. R...S. Igaki. 5. M. 6-7 March 2009 Van derWeken. (2009). 418-427. F. pp.909. H.C. N.based and histogram similarity measures for the design of image quality measures. A Fast Minutiae-Based Fingerprint Recognition System. Spectral Minutiae Representations of Fingerprints Enhanced by Quality Data. 3. 5493. Kevenaar. H. 1984 Tico. NISTIR. (2008). 4. & Adler. Fingerprint Matching using Global Minutiae and Invariant Moments. Vol. 225–228. D. A.. M. J. L.R. 184–195. Sep. pp. International Conference on Computational Intelligence and Security. P.. S.C. A Fingerprint Matching Algorithm Based on Relative Topological Relationship among Minutiae. J.. Nachtegael. 2009 Xu. Baltimore.J.. & Chen-jian. 1999 The Science of Fingerprints: Classification and Uses. Hivrale. B. M. Pattern Recognition. A. & Hu. Washington. S. Veldhuis. Park. (2009). Vol. N. 2007 Wahab. India. H. US Patent No. N. Zhang. & Yoon. Proc. Control. (2001).H. Akkermans. IEEE Transactions On Information Forensics And Security. 2008 Watson. Li. P. (2006). pp..: US. Atsugi 243-01. E.. N. of ISCS’01. Ramo. pp. Proceedings of the Biometrics Symp. R. No. Vol. Image and Vision Computing. & Veldhuis. Fingerprint recognition using wavelet features.. 21-24. R. A. A. E. vol. T.W. Bazen. & Kerre.. Singapore. Kuosmanen. May 2001 Vaikol. Proceedings of the IEEE. H. Japan. A. M. (2010). Proceedings of the of International IEEE Conference Neural Networks & Signal Processing. I. No. 255–270. 1990 Yang. T. (2008). 2009 Xu. T. Fingerprint Verification Using Spectral Minutiae Representations. S. A Pitfall in Fingerprint Features Extraction. J. & Luo. Zhao. On Instrumentation And Measurement.. Q. 18th Int. pp.. Adaptive pore model for fingerprint pore extraction. (2010). J. no. Zhang. Zhang. D.. Lu. G. Pattern Recog.. On the Utility of Extended Fingerprint Features:A Study on Pores. 3. MARCH 2011 Zhao. A. 863-871. Conf. IEEE Trans. Aug. D. Selecting a Reference High Resolution for Fingerprint Recognition Using Minutiae and Pores. L. Vol. 1–4 . High resolution partial fingerprint alignment using pore-valley descriptors.. N. No. Zhang. D. 1050–1061. 2009 Zhao.78 Biometrics Zhang. & Jain. 3. 43.. K. Zhang. N. (2011).. L.. & Bao. pp. 60. (2008). Q. pp. Pattern Recognition. N. Proceedings of the IEEE. (2009).. Liu. Q. & Luo. 2008. Luo. 2010 Zhao. Proc. vol. F. Q. It uses features other than characteristics of minutiae from the fingerprint ridge pattern. et al. overcome the demerits of the minutiae based method. A fingerprint descriptor is used to descript and represent a fingerprint image for personal identification. 1997a. et al. Non-minutiae based descriptors (Amornraksa & Tachaphetpiboon. 2006. 2006. a descriptor is defined to identify an item with information storage. Cappelli et al. Fingerprint recognition has been one of the hot research areas in recent years. Ratha et al. a fixed length feature vector. Ross. minutiae are not easy to be accurately determined.. and texture information. such as local orientation and frequency. being combined with Biohashing and so on. 2009. 2003). etc. a fast processing speed. position. Minutiae based descriptor use a feature vector extracted from fingerprints as sets of points in a multi-dimensional space. It can extract more rich discriminatory information and abandon the pre-processing process such as binarization and thinning and post minutiae processing. However. Various fingerprint descriptors have been proposed in the literature.. Jain et al. however. 2011) are the most popular algorithms for fingerprint recognition and are sophisticatedly used in fingerprint recognition systems.. Nanni and Lumini. Two main categories for fingerprint descriptors can be classified into minutiae based and non-minutiae based. Usually. 2008b). Other merits are listed by using the nonminutiae based methods. Yang & Park. Yang et al. 2003). thus it may result in low matching accuracy. 1997b. 2007. such as a high accuracy.2000. such as fingerprint preprocessing. The matching is to essentially search for the best alignment between the template and the input minutiae sets. 2000. Jin et al. He et al. 1996.Jain et al. 2001. ridge bifurcation and so on (Maio et al. 2008. A general fingerprint recognition system consists of some important steps. minutiae based descriptors may not fully utilize the rich discriminatory information available in the fingerprints with high computational complexity. ridge shape. In addition. 2008a. Minutiae based descriptors (Jain et al. 2003. and it plays an important role in personal identification (Maio et al. Jucheng Yang . Benhammadi. 2004. due to the poor image quality and complex input conditions. Yang & Park. Nanni & Lumini. 2007. Sha et al. Nanchang China 1. feature extraction..4 Non-minutiae Based Fingerprint Descriptor School of information technology.. orientation. easily coupled with other system. and so on. which comprise several characteristics of minutiae such as type. The major minutia features of fingerprint ridges are: ridge ending. Jiangxi University of Finance and Economics Ahead Software Company Limited. Introduction Fingerprint recognition refers to the techniques of identifying or verifying a match between human fingerprints. matching. 2003;Tico et al. Liu et al. In section 3. . 2008b. Yang & Park. Finally. Having invariant characteristics in the proposed algorithm can significantly improve the performance for input images under various conditions. the algorithm extracts features from tessellated cells around the reference point. position and rotation to increase the matching accuracy with a low computational load. Tico et al. It further pursues an improved performance by using the alignment and rotation after a sophisticated.2000. which use a fixed-size ROI (Region of Interest) to extract seven invariant moments as a feature vector. and represent them as a compact fixed-length fingerCode. In this chapter.80 Biometrics Among various non-minutiae based descriptors. Nanni & Lumini (2009) proposed another descriptor based on histogram of oriented gradients (HoG). Nanni & Lumini (2008) proposed a hybrid fingerprint descriptor based on local binary pattern (LBP). The features are obtained from the standard deviations of the DWT coefficients of the image details at different scales and orientations. reliable detection of a reference point. And experimental results are illuminated in section 4. Amornraksa & Tachaphetpiboon (2006) propose another transform-based descriptor using digital cosine transform (DCT) features. Sha et al.. (2006) propose a fingerprint descriptor using invariant moments (IM) with the learning vector quantization neural network (LVQNN) for matching. Its improved ones (Yang & Park. (2007) also propose a new hybrid fingerprint descriptor based on minutiae texture maps according to their orientations. (2001) propose a transform-based descriptor using digital wavelet transform (DWT) features. feature selection with PCA (Principle Component Analysis). Multiple WFMT features can be used to form a reference invariant feature through the linearity property of FMT and hence to reduce the variability of the input fingerprint images. Benhammadi et al. Ross et al. and a Support Vector Machine (SVM) for classification is proposed. Matching with a SVM also contributes to the performance enhancement by simply assigning weights on different cells and classification with high accuracy. Gabor feature-based ones (Jain et al. It exploits the eight fixed directions of Gabor filters to generate its weighting oriented Minutiae Codes. The proposed descriptor basically uses moment features invariant to scale. 2003) present a relatively high matching accuracy by using a bank of Gabor filters to capture both the local and global details in a fingerprint. (2004) propose an improved transform-based descriptor based on the features extracted from the integrated wavelet and Fourier-Mellin transform (WFMT) framework. 2008a. To fully utilize both the global and local ridge information while removing unwanted noises. Combining with the PCA for feature selection to reduce the dimension of feature vector and choose the distinct features. The chapter is organized as follows: some state of the art non-minutiae based descriptors are briefly reviewed in section 2. and a non-minutiae based fingerprint descriptor with tessellated invariant moment features.) using tessellated invariant moments(Tessellated IM) or sub-regions IM with eigenvalue-weighted cosine (EWC) distance or nonlinear back-propagation Neural Networks (BPNN) to handle the various input conditions for fingerprint recognition. The transform methods show a high matching accuracy for inputs identical to one in its database. Jin et al. Yang et al. conclusion remarks are given in section 5. some state of the art non-minutiae based descriptors are first reviewed. (2003) describes a hybrid fingerprint descriptor that uses both minutiae and a ridge feature map constructed by a set of eight Gabor filters. a proposed non-minutiae based fingerprint descriptor with tessellated invariant moments and SVM is explained. image rotation is compensated by computing the features at various orientations. to achieve invariance. independent and discriminate fingerprint image features. Sha et al. . Another Gabor filters based descriptor is proposed by Nanni & Lumini (2007). HOG. The features are extracted by tessellating the image around a reference point (the core point) determined in advance. IM based.Non-minutiae Based Fingerprint Descriptor 81 2. They describe a new texture descriptor scheme called fingerCode which is to utilize both global and local ridge descriptions to represent a fingerprint image. LBP. Fingerprint matching is based on finding the Euclidean distance between the two corresponding FingerCodes. (2003) describes a hybrid fingerprint descriptor that uses both minutiae and a ridge feature map constructed by a set of eight Gabor filters. such as Gabor filters. The texture descriptors are obtained by filtering each sector with 8 oriented Gabor filters and then computing the AAD (Average Absolute Deviation) of the pixel values in each cell. and the template with the minimum score is considered as the rotated version of the input fingerprint image. some hybrid descriptors combined with Gabor filters are proposed. The feature vector consists of an ordered collection of texture descriptors from some tessellated cells. The features of these descriptors may be extracted more reliably than those of minutiae. where the minutiae are used to align the images and a multi-resolution analysis performed on separate regions or sub-windows of the fingerprint pattern is adopted for feature extraction and classification.1 Gabor filters based descriptor The Gabor filters based descriptors (Jain et al. which avoids the fingerprint pair relative alignment stage. Recently. DCT. So these methods require a larger storage space and a significantly high processing time. Exception of the widely used minutiae descriptors.2000. Non-minutiae based descriptors It is important to establish descriptors to extract reliable. the Gabor filters based descriptors are not rotation invariant. they construct absolute images starting from the minutiae localizations and orientations to generate the Weighting Oriented Minutiae Codes. The similarity measurement is done by the weighed Euclidean distance matchers with a sequential forward floating scheme. the nonminutiae based descriptors use features other than characteristics of minutiae from the fingerprint ridge pattern are able to achieve the characters of the mentioned fine traits . 2. However. Ross et al. The ridge feature map along with the minutiae set of a fingerprint image is used for matching purposes. The hybrid matcher is proved to perform better than a minutiae-based fingerprint matching system by the author. (2007) also propose a hybrid fingerprint descriptor based on minutiae texture maps according to their orientations. 2003) have been proved with their effectiveness to capture the local ridge characteristics with both frequency-selective and orientation-selective properties in both spatial and frequency domains.. The extracted features are invariant to translation and rotation. The features are concatenated to obtain the fingerCode. DWT. The features extracted are the standard deviation of the image convolved with 16 Gabor filters. each fingerprint has to be represented with ten associated templates stored in the database. The next sub-sections will introduce some classical and state of the art non-minutiae based descriptors. To achieve approximate rotation invariance. Rather than exploiting the eight fixed directions of Gabor filters for all original fingerprint images filtering process. Since the scheme assumes that the fingerprint is vertically oriented. Benhammadi et al. WFMT. The Fourier–Mellin transform (FMT) serves to produce a translation. centred at the reference point. and then quartered to obtain four nonoverlapping sub-images of size 32×32 pixels. And multiple WFMT features can be used to form a reference invariant feature through the linearity property of FMT and hence reduce the variability of the input fingerprint images.2 DWT based descriptor Tico et al. Fingerprint matching is based on finding the Euclidean distance between the corresponding multiple WFMT feature vectors. Next. The features are obtained from the standard deviations of the DWT coefficients of the entire image details at different scales and orientations to form a feature vector of length 12 (48 in total from four subimages). to achieve rotation-invariant. 2. To extract the informative features from a fingerprint image. the features extraction from frequency domain with corresponding transforms are not rotation-invariant. since the performance highly depends on extracting all minutiae precisely and reliably. rotation and scale invariant feature.82 Biometrics However. The normalized l2-norm of each wavelet sub-image is computed in order to create a feature vector. Fingerprint matching is also based on KNN with the Euclidean distance. Multiple m WFMT features can be used to form a reference WFMT feature and just only one representation per user needs to be stored in the database. The feature vector represents an approximation of the image energy distribution. the features from the same fingerprint image can be judged into the different ones. the features are not rotation-invariant. the standard deviations of the DCT coefficients located in six predefined areas are calculated and used as a feature vector of length 6 (24 in total from four sub-images) for fingerprint matching. 2. the matching accuracy of these hybrid approaches may be degraded for lowquality inputs. and the descriptor requires a high time-consuming process if using the multiple WFMT features with training images. However. so an improved method needs.4 Integrated wavelet and the Fourier–Mellin transform (WFMT) based descriptor Jin et al. is able to make the fingerprint images less sensitive to shape distortion. the DCT is applied to each sub-image to obtain a block of 32×32 DCT coefficients. with its energy compacted feature is used to preserve the local edges and reduce noise in the low frequency domain. The standard deviations of the DCT coefficients located in six predefined areas from a 64×64 ROI are used as a feature vector of length 6 (24 in total from four sub-images) for fingerprint matching. (2001) proposed a method using DWT features. However. the main disadvantage of this descriptor is that the reference point is based on a non-precise core point. 2. Finally. too. However. . the image is first cropped to a 64×64 pixel region. Fingerprint matching is based on k-Nearest Neighbor (KNN) with finding the Euclidean distance between the corresponding feature vectors.3 DCT based descriptor Amornraksa & Tachaphetpiboon (2006) also propose a method using digital cosine transform (DCT) features for fingerprint matching. so if the image input with rotation. Wavelet transform. (2004) propose an improved transform-based descriptor extracted features from the integrated wavelet and Fourier-Mellin transform (WFMT) framework. 2. The matching value between two fingerprint images is performed by a complex distance function. HoG has been first proposed by Dalal & Triggs (2005) as an image descriptor for localizing pedestrians in complex images and has reached increasingly popularity. The aim of this descriptor is to represent an image by a set of local histograms which count occurrences of gradient orientation in a local cell of the image. 2. The implementation of the HoG descriptors can be achieved by computing gradients of the image. and it may be degraded for low-quality inputs.6 Histogram of oriented gradients (HoG) based descriptor Nanni & Lumini (2009) recently proposed a hybrid fingerprint descriptor based on HoG. hence the LBP representation may be less sensitive to changes in illumination.7. 2008b. recently. it is invariant to monotonic grayscale transformation.y) the moment (sometimes called ‘raw moment’) of order (p + q) is defined as ∞ ∞ mpq = −∞ −∞ ∫ ∫x y p q f ( x . The matching value between two fingerprint images is performed by a complex distance function that takes into account the presence of different types of descriptors and different regions.1 Raw moments For a 2-dimensional continuous function f(x. building a histogram of gradient directions and normalizing histograms within some groups of sub-regions for each sub-region to achieve a better invariance to changes in illumination or shadowing. 2008a.) using tessellated IM or sub-regions IM with EWC distance or nonlinear BPNN to handle the various input conditions for fingerprint recognition. A LBP proposed by Ojala et al. the matching accuracy depends on extracting all minutiae precisely and reliably. And its improved ones (Yang & Park. The two fingerprints to be matched are first aligned using their minutiae. and a Gabor filter with LBP hybrid method (GLBP) also proposed by Nanni & Lumini (2008). instead of from the original sub-windows. Yang & Park.Non-minutiae Based Fingerprint Descriptor 83 2. 2. which use a fixed-size ROI to extract seven invariant moments as a feature vector. Moreover. then the images are decomposed in several overlapping sub-windows.7 IM based descriptor Except for the above non-minutiae based descriptor. (2006) propose an IM based descriptor with the LVQNN for fingerprint matching. too. the matching accuracy depends on extracting all minutiae precisely and reliably. dividing the image into small subregions. 1.… . Yang et al. However. (2002) is a grayscale local texture operator with powerful discrimination and low computational complexity. similar with the above minutiae based alignment methods. q= 0. The details of the invariant moments are introduced as below: 2. However.5 Local binary pattern (LBP) based descriptor Nanni & Lumini (2008) proposed a fingerprint descriptor based on LBPs. each sub-window is convolved with a bank of Gabor filters and the invariant LBPs histograms are extracted from the convolved images. y )dxdy (1) for p. 2. Conversely.2 Central moments a. (Mpq) uniquely determines f(x. and the moment sequence (Mpq) is uniquely determined by f(x. if μ11 < 0 .it means the image inclines towards up-right. μ11 represents the gradient of the image.y). then Eq. y )dxdy (6) x = m10 m00 _ and y = m01 m00 _ (7) If f(x.. μ02 represents vertical extension of the image. y ) x y (3) A uniqueness theorem (Papoulis.y) is piecewise continuous and has nonzero values only in a finite part of the xy plane.y). The meanings of the central moments are listed as below: μ 20 represents he horizontal extension parameter of the image. 3. moments of all orders exist. The central moments are defined as μ pq = Where ∞ ∞ −∞ −∞ ∫ ∫ (x − x) (y − y ) p _ _ q f ( x .y). 1. y } = { m10 m00 . i. y ) x y (2) In some cases.it means the image inclines towards up-left. this may be calculated by considering the image as a probability density function.7. Centroid point of the image: The centroid point { x . the image is summarized with functions of a few lower order moments. In practice.y) is a digital image.e.(6) becomes μ pq = ∑∑ ( x − x )p ( y − y )q f ( x .84 Biometrics Adapting this to scalar (grey-tone) image with pixel intensities I(x. if μ11 > 0 . y } of the image can be extracted by the formula: { x . by dividing the above by ∑∑ I (x . raw image moments are calculated by mpq = ∑∑ x p y q I ( x . y ) x y _ _ (8) b. 1991) states that if f(x. m01 m00 } (5) 2. Simple image properties derived via raw moments include: 1. . Area (for binary images) or sum of grey level (for grey-tone images): The Area (A) of the image can be extracted by the formula: A=m00 (4) 2. As the equation shown below. φ1 = η 20 + η02 2 φ2 = (η20 − η02 )2 + 4η11 φ3 = (η30 − 3η12 )2 + ( 3η21 − 3η03 )2 φ4 = (η30 + η12 )2 + (η 21 + η03 )2 φ5 = (η30 − 3η12 )(η30 + η12 )[(η30 + η12 )2 − 3(η21 + η03 )2 ] + ( 3η 21 − η03 )(η 21 + η03 )[ 3(η 30 + η12 )2 − (η 21 + η03 )2 ] (10) φ6 = (η20 − η02 )(η 30 + η12 )[(η 30 + η12 )2 − (η 21 + η03 )2 ] + 4η11 (η 30 + η12 )(η 21 + η03 ) φ7 = ( 3η21 − η03 )(η30 + η12 )[(η 30 + η12 )2 − 3(η21 + η03 )2 ] + ( 3η12 − η 30 )(η 21 + η03 )[ 3(η 30 + η12 )2 − (η 21 + η03 )2 ] .the barycenter inclines towards right . position. 3… 2 μ pq γ μ00 (9) 2. when μ12 > 0 . and rotation invariant (Gonzalez and woods. when μ 30 > 0 . 6. 7.Non-minutiae Based Fingerprint Descriptor 85 μ 30 represents the excursion degree of the image’s barycenter on horizontal direction.the image’s extension on the right side is greater than the left side. 5. then if μ 30 < 0 . as the normalized central moments.the extension of the downside image is greater than the upside. 2002). The set of moment invariants consist of groups of nonlinear centralized moment expressions. if μ03 > 0 . Hu derived the expressions from algebraic invariants applied to the moment generating function under a rotation transformation. and inclines towards downward while μ03 < 0 . A set of seven invariant moments derived from the second and the third moments were proposed by Hu (1962). c. invariant moments are used for texture analysis and pattern identification. then smaller when μ12 < 0 . η pq = where γ = p+q + 1 for p+q = 2. denoted η pq . then smaller when μ 21 < 0 . μ 21 represents the equilibrium degree about the image on the vertical direction.3 Invariant moments As a region description method.the barycenter inclines towards left. are defined as 4. Normalized moments: Moments where p+q >=2 can also be invariant to both translation and changes in scale by dividing central moments by the properly scaled μ00 moment. And they are scale. if μ21 > 0 .7.the barycenter inclines towards upward. μ03 represents the vertical excursion degree of the image’s barycenter. μ12 represents the equilibrium degree of the image about the vertical direction. Figure 1 and Figure 2 show the invariant moments analysis on the four sub-images with 2D and 3D views. 2.. We choose four subimages from a ROI of a fingerprint image.. respectively. we can see that the ridge valleys of the sub-images are totally different. and scale. (a) s1 (b) s2 (c) s3 (d) s4 Fig. The experiments here are designed to check the characteristics of the invariant moment features’ invariance to rotation and scale.2.. and divide the ROI into four sub-images. rotation invariance of sub-image S1. As we can see from the tables.86 Biometrics The values of the computed moment invariants are usually small values of higher order moment invariants are close to zero in some cases. the outputs of the moment invariants are mapped |lg(|φi |)| i=1. So we reset the value range using the logarithmic function.7 . Sub-images of sub-images (a) s1 (b) s2 (c) s3 (d) s4 (a) s1 (b) s2 (c) s3 Fig. the invariant moments are nearly invariant to the scale and rotation invariance. . which can take the values to the large dynamic range with a into Φ i = nonlinear scale. 3D images of sub-images (a) s1 (b) s2(c) s3 (d) s4 (d) s4 And Table 1 and Table 2 show seven invariant moments of these four sub-images. respectively. From the figures. 1. 101012 59. tessellated IM based descriptors (Yang & Park.222497 39.162576 29.753254 58.753254 58.753254 59.792705 20.693181 43.811898 41.222497 41.682184 41. a set of seven invariant moments are computed.742121 29.8 0 90 180 270 0 90 180 270 0 90 180 270 6.474584 58.660157 6.g.474584 Table 2. For each cell. Furthermore.607114 27. and s = {sn } n =1 is the collection of all the cells.474584 58.674009 22.659508 6.607114 27.209754 27.916016 29.664572 6. and each cell has a size of 16×16 pixels. n N M s = {sn } n =1 = {{φnk }7 =1 } n =1 k (11) where M is the number of the cells. By adjusting the size of ROI and the number of the cells.301135 61.660157 6.753962 58.528609 58.742121 29.931670 41. and speed up the processing time. the size of ROI is 64×64 pixels.701581 Table 1.459971 27.607114 27.742121 29.607114 29.607114 30.400168 27.169709 31. in order to improve the matching accuracy.518255 63.607114 27.222497 41. e.742121 29.659431 6.792705 27.275622 60.222497 41.474584 58. rotation invariance of sub-image S1 2. 2008a) using only a certain area (ROI) around the reference point at the centre as the feature extraction area instead of using the entire fingerprint is proposed.450452 21.211065 61.901367 59.660157 6.271717 20. Scale.742121 29.918130 42.811871 59.945926 60.474584 58.657625 6.474584 59.433472 41.127298 21.Non-minutiae Based Fingerprint Descriptor 87 Φ4 Φ5 Φ6 Φ7 Subimage S1 S2 S3 S4 Φ1 Φ2 Φ3 6. Seven invariant moments of the four sub-images Scale Rotation Φ 1 0.742121 29. N=7M is the total length of the collection.660157 6.417999 58.792705 22. weights w is assigned to each tessellated cell to distinguish the cell from the foreground or background.456694 28. Suppose N φ = {φn }7 = 1 is a set of invariant moments.641931 21.792705 22.659431 6.222497 59.309514 39.222497 41.337310 31.657605 6.714937 59. which can be used to resolve the embarrassment when the reference point is nearby the border of the .826501 29.742121 59.8 Tessellated invariant moments based descriptors In order to reduce the effects of noise and non-linear distortions.792705 22. 16 rectangular cells.792705 22.660157 6.742121 29.607114 27.654452 21.792705 22.474584 58.753254 58.753254 58.111226 60. we can capture both the local and global structure information around the reference point (see Figure 3 (a-b)).474584 58.659431 Φ2 Φ3 Φ4 Φ5 Φ6 Φ7 1 2 21.676695 6.901367 59.433996 27.659431 6.222497 41.222497 41.473568 22.650909 28.901367 59.659334 6.607114 27.792705 22. Invariant moment analysis introduced in section 2 is used as the features for the fingerprint recognition system.222497 41.053503 58.742485 29.792705 22. The ROI is partitioned into some non-overlapping rectangular cells as depicted in Figure 3 (a).607114 27.474584 58.376333 27.901367 41.742121 29.756198 59.318065 30. FAR is the false acceptance rate. HOG. else w equals 1.26%. Table 3 gives a summarization of some classical and state-of-art non-minutiae based descriptors in these recent years. WFMT. and the dimension of the features are high. DWT. (2006) just using only a fixed ROI. a fingerprint can be represented by a fixed-length feature vector as N N f = { f n } n =1 = { wsn } n =1 (12) The length of the total feature vectors is N. An improvement way is . the tessellated or sub-regions ROIs are delineated from the same area of a fingerprint image. DCT. IM based. it will be labelled as background cells and the corresponding w is set to 0. From the table. all kinds of non-minutiae descriptor has proposed to hand the shortcomings of minutiae methods in the fingerprint recognition systems. If the tessellated cells that contain a certain proportion of the background pixels. we can see that based on the public database of FVC2002 DB2. strong correlation may exist among the extracted features. Therefore. which means its performance is the best among all the non-minutiae descriptors on the same public database listed in the table. such as Gabor filters. (a) Tessellated cells (local) or (b) tessellated cells (global) (Yang & Park. FRR is the false reject rate. 2008a) 2. (a) (b) Fig. and analyzes their feature extraction. and GAR is the genuine acceptance rate (GAR = 1-FRR)). LBP. 3. 3. alignment.9 Summarization As the above analysis. Proposed non-minutiae based descriptor Although the IM descriptor proposed in Yang & Park (2008a) and Yang & Park (2008b) use the tessellated or sub-regions IM to extract both the local and global features. the sub-regions IM method with BPNN has the lowest EER of 3. Table 3 summarizes the classical and state of the art non-minutiae based method (EER is the equal error rate. and matching methods. comparing to Yang et al.88 Biometrics fingerprint image. Approaches Features Alignment methods Core point Matching methods Performance analysis Databases Resutls Self Self FVC2002 DB2 FVC2002 DB 2 Self &small Self &small FVC2002 DB 2 FVC2002 Db2 FVC2002 DB 2 FVC2002 DB 2 FVC2002 DB 2 FVC2002 DB 2 FAR=4. (2001) Gabor filters Amornraksa & Tachaphetpiboo n (2006) Jin et al. So we reduce the dimension of feature vector by examining feature covariance matrix and then selecting the most distinct features and using them to improve the verification performance. ROI determination (Yang & Park.1 Feature vector construction After the pre-processing steps of fingerprint enhancement. (2003) Benhammadi et al. PCA is one of the oldest and best known techniques in multivariate analysis (Fausett. (2004) Nanni & Lumini (2008) Nanni & Lumini (2009) Yang et al. (2006) Yang & Park (2008a) Yang & Park (2008b) Euclidean distance Gabor filters& Minutiae Euclidean Minutiae distance Minutiae Gabor Minutiae Normalized filters maps distance Gabor filters Minutiae Weighed Euclidean distance DWT NO KNN with Euclidean distance DCT NO KNN with Euclidean distance WFMT Core point Euclidean distance LBP Minutiae Complex distance function HoG Minutiae Complex distance function Seven IM Reference point LVQNN Tessellated IM Reference point EWC distance Sub-regions IM Reference point BPNN Recognition rates 95. (2007) Nanni & Lumini (2007) Tico et al.26 Table 3.2 EER=3. as a powerful classification tool. 2008a. a SVM is proposed for fingerprint matching. Besides. 2008b.(2000) Ross et al.1 EER=3.8 FAR=0. reference point determination and image alignment.5 FRR=2.309 EER=6.8 EER=4 EER=5.78 EER=3. Yang & Park. Yang et . 1994).2 Recognition rates 99.6 Jain et al.23 EER=5. Summarization of the classical and state of the art non-minutiae based method 3.6G AR=96.19 EER=3.Non-minutiae Based Fingerprint Descriptor 89 proposed here to reduce the dimension of feature vector and choose the distinct features before matching. Let x ∈ Rn be a random vector.90 Biometrics al. w is the weight of distinguishing the cell from the foreground or background. and each cell has a size of 16×16 pixels. um]. . λn). 2. ≥ λn. It is a type of neural network that automatically determines the structural components. . PCA factorizes Ξ into Ξ = UΛUT. Let u1. For example. Let the training set m be {( x i . with input vector x i (n components).. . . The basic idea in PCA is to find the orthonormal features. un and λ1. 3. 16 rectangular cells. due to the good characteristics of reducing the effects of noise and non-linear distortions. and a fingerprint can be represented by N N N a fixed-length feature vector as f = { f n } n =1 = { wsn } n = 1= {w{φnk }7 =1} . with U = [u1. which are ‘pushed up against’ the two data sets. The covariance matrix of x is defined as Ξ = E{[x-E(x)][x-E(x)]T}. since in general the larger the margin the better the generalization error of the classifier. . as descript in section 2. λn be eigenvectors and eigenvalues of Ξ. . m. . we used a two-class SVM to classify the input fingerpints into the corresponding class. λ2. . while retaining as much as possible variation in the data set. The mth principal component is determined as the principal component of the residual based on the covariance matrix. a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes. 3. u2. The first principal component is the projection in the direction in which the variance of the projection is maximized. Let P = [u1. x ) + b ) = sign( ∑ i =1 ωi K ( x i . where m < n. We need to note that once the PCA of Ξ is available.. u2. x ) + b ) m m (13) . . 2000) are a set of related supervised learning methods used for classification and regression. . and the length of the total feature vectors is n N. . the best rank-m approximation of Ξ can be readily computed.8. un] and Λ = diag(λ1. 2008c). . k n =1 where φ = {φn }7 = 1 is a set of invariant moments. Then the feature-vector with the elements consists of a sets of moments derived from tessellated ROIs. . two parallel hyperplanes are constructed.3 Matching with SVM SVMs (John & Nello. Then. SVM is a powerful tool for solving small sample learning problem that offers favorable performance using linear or nonlinear function estimators. The objective of PCA is to reduce the dimensionality of the data set. a SVM will construct a separating hyperplane in that space. Intuitively. . So the total length of the feature vector is 16×7=112. one on each side of the separating hyperplane.. . u2. the feature selection with PCA is briefly introduced. λ2. respectively. where n is the dimension of the input space. y i = ±1 indicates two different classes and i = 1.2 Feature selection with PCA Here. Then y = PTx will be an important application of PCA in dimensionality reduction.. The decision function of SVM is: y = sign( ∑ i =1α i yi K ( x i . . . . we use the tessellated IM as the features. and to identify new meaningful underlying variables. Viewing input data as two sets of vectors in an n-dimensional space. In our study. one which maximizes the margin between the two data sets. the ROI is partitioned into some non-overlapping rectangular cells with the size of ROI is 64×64 pixels. .y i )}i =1 . . and λ1 ≥ λ2 ≥. To calculate the margin. . . Degree are the kernel parameters. 8 fingerprints per person). DB3_a. and DB4_a is 500 dpi. x ) is the output of the ith hidden node with respect to the input vector x . where K( x i . Experimental results The proposed algorithm was evaluated on the fingerprint images taken from the public FVC2002 database (Maio et al. DB2_a. 4. and ωi = α i y i is the weight of the ith hidden node connecting with the output node. by choosing the kernel K( x i . The learning task of a SVM is to find the optimal α i by solving the maximization of the Lagrangian W (α ) = ∑ i =1α i − m 1 m ∑ α iα j yi y j K( xi . which contains 4 distinctive databases: DB1_a. and that of DB2_a is 569 dpi. it is a mapping of the input x and the support vector x i in an alternative space (the so-called feature space).Non-minutiae Based Fingerprint Descriptor 91 Fig. DB3_a and DB4_a. The resolution of DB1_a. Fingerprints from 101 to 110 (set B) have been made available to the participants to allow parameter tuning before the submission of the . j =1 (14) subject to the constraints ∑ m i =1 α i yi = 0 0 ≤ αi ≤ c (15) The common kernel functions K ( x i . Linear Polynomial Radial − basis Sigmoid (16) 4. x ) = ⎨ 2 ⎪exp( −γ ⋅ x i − x ) ⎪ T ⎩ tanh(γ ⋅ x i ⋅ x + c ) Where γ . 2002). x j ) 2 i . x ) = φ ( x i ) ⋅ φ ( x ) . Each database consists of 800 fingerprint images in 256 gray scale levels (100 persons.c . Illustration of a SVM structure Figure 4 illustrates the structure of a SVM. x ) are defined as : ⎧ xT ⋅ x i ⎪ Degree T ⎪ (γ ⋅ x i ⋅ x + c ) K( x i . α i and b are the learning parameters of the hidden nodes. 92 Biometrics algorithms. therefore. we compute the IM features. For each input fingerprint and its template fingerprint. While in the testing stage. The EER indicates the point where the FRR and FAR are equal. Similarly. Since the output is to judge whether the input fingerprint is match or non-match according to the identity ID. if we keep 95% energy. we can see that almost all the egienvalues spectrum with the eigenvectors elements are below than 70. et al. from the figure. the 64×64 pixels size of ROI was adopted. we did experiments on FVC2002 databases DB2_a. each contains vector from the test fingerprint with the corresponding identity ID. To evaluate the performance of the verification rate of the proposed method. The FRR and FAR are defined as follows: FRR = FAR = Number of rejected genuine claims × 100% Total number of genuine accesses (17) (18) Number of accepted imposter claims × 100% Total number of imposter accesses The equal error rate (EER) is also used as a performance indicator. test samples with the same normalization and scale processing are fed to the SVM to produce the output values. If the output number is equal the corresponding ID. 4. then it means match. Six out of eight fingerprints from each person were chosen for training and all the eight fingerprints for testing. The element of the output values is restricted in the class number. EER = (FAR + FRR ) 2 . the number of the output class is the same with the verifying persons. and the identity ID of the corresponding class is used to guide the classifying results through the SVM. each contains vector from the training fingerprint. In the training stage. the benchmark is then constituted by fingers numbered from 1 to 100 (set A). and 800 for testing. if FAR = FRR (19) For evaluating the recognition rate performance of the proposed descriptor. vice visa. as below. 600 patterns were used for training. The features are computed from the training data. In the feature selection experiments. We divided the database into a training set and a testing set. An ROC curve is a plot of false reject rate (FRR) against false acceptance rate (FAR). In our experiments. the receiver operating characteristic (ROC) curve is used. Since our feature vector contains 112 elements. 2007) are fed to a SVM with indicating their corresponding class. we can take the matching process as a two-class problem. the feature-vector with the elements consisting of a sets of moments derived from tessellated ROIs was processed with PCA. then eigenvectors element of 50 was chosen. For a database. . training samples after normalization and scale processing (Hao. a SVM is used to verify a matching between feature vectors of input fingerprint and those of template fingerprint. So the length of the feature vector was 16×7=112. Figure 5 describes the egienvalues spectrum with different elements by using PCA for feature selection. The new dimensional features are uncorrelated from original features due to the PCA analysis.1 Performances with a SVM matching In the experiments. and the ROI was tessellated into 16 rectangular cells with each cell had a size of 16×16 pixels. the used FVC2002databases are set A. the features are computed from the testing data. In the experiments. 3. the Radial-basis SVM has only one paramter γ .Non-minutiae Based Fingerprint Descriptor 93 Fig. 6. The curves of the recognition rate of the proposed descriptor by different types of SVM with different γ value on the database FVC2002 DB2_a As we know from the section 3. γ . the Polynominal SVM has three parameters of c. The egienvalues spectrum with different elements by using PCA for feature selection Fig. Degree and . 5. the linear SVM has no parameters. γ . n=10) according to the method of Nanni & Lumini (2008). we can see that the recognition rate is growing up with the γ value of the Radial-basis SVM. (2004). In our experiments. And the descriptor with different eginevalue elements PCA has different recognition rate. However. In our experiment. we can see that the recognition rates of the proposed descriptor of all nonlinear types of SVM are growing up with the γ value.2 Comparing with other descriptors The performances of several descriptors aimed at evaluating the usefulness of our descriptor were compared. 2. a descriptor using the invariant local binary patterns features (P = 8. the parmeters of c and Degree are fixed and determined by experiments. with small eginevalue(such as 10) the recognition rate is lower than the PCA with large eginevalue (such as 20) under the same γ value. and the proposed descriptor can achieve high recognition rates with the nonlinear types of SVM. Gabor filters based. For genuine match. For a database. all the recognition rate of the descriptor with PCA may have better performances comparing with the descriptor without PCA. From the figure. The same experiments are repeated four times by selecting different fingerprints for training and testing. and 200 for testing. therefore. Tessellated IM based. 4. the number of matches for genuine and impostor are 6×200=1200 and 99×200=19800 for each database. and the same parameters as their paper are used. each test fingerprint is compared with the fingerprints belonged to other persons. a descriptor using both minutiae and a ridge feature map with eight directions for fingerprint matching according to the method of Ross et al. DCT based. And for impostor match. each fingerprint of each person is compared with other fingerprints of the same person. Figure 6 describes the curves of the recognition rate of the proposed descriptor by different types of SVM with different γ value on the database FVC2002 DB2_a. and the egienvalues element is 50. We divided each database into a training set and a testing set using a 25% jackknife method. For evaluating the recognition rate performance of the proposed tessellated IM based descriptor with different types of SVM. a descriptor using WFMT features for fingerprint matching according to the method of Jin et al. 5. and the same parameters as their paper are used. the genuine match and impostor match were performed on the four sub-databases of FVC2002 database. Figure 7 shows the curves of the recognition rate of the proposed descriptor by different γ value of the radial-basis SVM with different PCA egienvalues elements on the database FVC2002 DB2_a. and the best recognition rate achieves by the egienvalue element equals 50. to compute the FAR and the FRR. (2003). From the figure. and the same parameters as their paper are used. 3. Since there are 200 test patterns for an experiment. and then the average of four experimental results becomes the final performance. WFMT based. 600 patterns are used for training. LBP based. R=1.94 Biometrics the Sigmoid SVM has two paramters of c. 4. . Six out of eight fingerprints from each person were chosen for training and the remaining two for testing. all the descriptors were based on the two-stage enhanced image: 1. a descriptor using DCT features for fingerprint matching according to the method of Amornraksa & Tachaphetpiboon (2006). the 64×64 pixels ROI was tessellated into 16 rectangular cells with each cell had a size of 16×16 pixels and matching with SVM. 8.Non-minutiae Based Fingerprint Descriptor 95 Fig. The curves of the recognition rate of the proposed descriptor by different γ values of the radial-basis SVM with different PCA egienvalues elements on the database FVC2002 DB2_a Fig. ROC curves of different methods on database FVC2002 DB2_a . 7. 66 5.42 4.36 Table 4. The scheme consists of other three important steps after the pervious preprocessing: feature vector construction.17%. respectively.5.4. We firstly review some classical and state of the art fingerprint descriptor. From the table.97%. WFMT. DCT. we emphases on the introduction of the designing approaches and implementing technologies for non-minutiae based fingerprint descriptors. On the other hand.36% over the four sub-databases of FVC2002.21 1. And Table 4 illustrates the EER (%) performances of the descriptors of Gabor filters based.21 7. Acknowledgment This work is supported by the National Natural Science Foundation of China (No. Experimental results show that our proposed descriptor has better performances of matching accuracy comparing to other prominent descriptors on public databases.23 3.68%.23 DB3_a 4. Further works need to be explored for robustness and reliability of the system. Conclusion In this chapter. WFMT based and LBP based with Tessellated IM based over the databases of FVC2002.18 6. and it is also supported by Merit-based Science and Technology Activities Foundation for Returned Scholars.97 2.42 DB2_a 3.43 3. LBP based (dashed lines).64 6. DCT.96 Biometrics Figure 8 compares the ROC curves of the descriptors of Gabor filters. DCT.48 DB4_a 6. .68 4. DCT based. DCT. the FRRs of all the descriptors slowly drop down with respect to their FARs. WFMT.96 2.62 8. On database containing most poor quality images such as FVC2002 DB2_a. WFMT. Ministry of Human Resources of China. 61063035). alignment method and matching methods. and matching with a SVM. we can see that the ROC curve of tessellated IM based descriptor (solid line) is below those of the descriptors of Gabor filters.31 Average 4. we can see that the Tessellated IM based descriptor has the best of results with an average EER of 2.98 5. LBP based are 4. EER(%) performance of the descriptors of Gabor filters. The contributions of this chapter are that we analyze and summarize some classical and state of the art non-minutiae based descriptors and propose a new improved one with tessellated IM features.17 5. analyze their feature extraction. LBP based and tessellated IM based over the databases of FVC2002 5. while those of the descriptors of Gabor filters. and propose a non-minutiae based fingerprint descriptor by using tessellated IM features with a feature selection of PCA and a SVM classifier. The ROC curves of the Figure 8 prove the facts. feature selection by PCA and intelligent SVM to represent fingerprint image in order to effectively handle various input conditions. LBP based with tessellated IM based descriptor on the database FVC2002 DB2_a.53 6.66%. DB1_a Gabor filters based DCT based WFMT based LBP based Tessellated IM based 1. which means that tessellated IM based descriptor outperforms the compared descriptors. 5. feature selection with PCA.62 2.87 2.79 5.82 2.41 5. WFMT . 6. Hong.K..Maio. Fingerprint minutiae matching based on the local and global structures. .. R.Y. Visual pattern recognition by moment invariants. R.811-814 Maltoni. . pp. 302–313 Jain.H.. D. R. . He. Proc.135-137 . T. .9(5). On-line fingerprint verification. Image and Vision Computing. Direct gray scale minutia detection in fingerprints.19(1). J.Hentous.(3). Chiang. Fingerprint matching from minutiae texture maps. (2004). pp. Pattern Recognition Letters.189197 Cappelli. 672-675 Hao. Electron. F. D. 846-859 Jiang. IEEE Trans. W.98-111 He. R E. J. A. Maltoni.2135-2144 Liu. Tachaphetpiboon.CA Fausett. 4. . (2007). K. .(2002). International Conference on Pattern Recognition. S. L. Handbook of Fingerprint Recognition. K. and Cybernetics. S. 1365-1388 Jain. S. Man.K. . pp. . D. (2003). Part B: Cybernetics. . Pankanti. Support Vector Machines and other kernel-based learning methods.(2000). . 24 . Chan K. Song.(2000). . M. Localization of corresponding points in fingerprints by complex filtering.pp. A. D. 2.pp. pp. pp.T. R. B. An efficient fingerprint verification system using integrated wavelet and Fourier-Mellin invariant transform. Maltoni. IEEE Transactions on Pattern Analysis and Machine Intelligence. Nello. Proc.T. Fundamentals of Neural Networks architecture.H. Li. L.N. Conf. L. pp. Yang X. M. . (2007).K. Internationl Conference on Pattern Recognition. . A. . pp. Pattern Recognition. . Prentice Hall. Fingerprint recognition using DCT features. 179-187 Jain. . . Berlin.(2007). . Info. A new maximal-margin spherical-structured multiclass support vector machine. D. algorithm and application (Prentice Hall) 289-304 Gonzalez. D. Maltoni. (1997a). Josef. Applied Intelligence. References Amornraksa. (1994). 27-39 Maio.pp. N. Therory.C. . P. Modeling and Analysis of Local Comprehensive Minutia Relation for Fingerprint Matching. D. L.Pattern Analysis and Machine Intelligence.Amirouche. Triggs. . Z.1204-1211 Hu. S. Int. Cappelli. Prabhakar S. San Diego. Filterbank-based fingerprint matching. X. A. IRE Trans. L. B.2141 Dalal. Tian. H. R. . pp. IEEE Trans. S. M. .T. 503-513 John. . Prabhakar. pp. X.N. C.pp.. N. Aissani.B. Hong. Lett. Yau. IEEE.40(1). Image Processing. D.K. . 427-430 Maio. Pankanti. Cambridge University Press Kenneth. In Proceedings of the ninth European conference on computer vision.Jain. FVC2002: Second Fingerprint Verification Competition.Pattern Analysis and Machine Intelligence. (1997b) . (2006). Digital Image Processing (second edition). M. . Lin Y. pp: 2128 .. . 522–523 Benhammadi. Y . 32(12).(1962).(1997). .pp. Huang. (2000). Image Processing. Hong. 19. J.. Ling. 85(9) pp. . C. pp. Bolle. .An identity-authentication system using fingerprints. Woods. (2005). Bolle.. (2000). . IEEE Transactions on Systems.K. Histograms of oriented gradients for human detection. Wayman. 22(6).(2011). IT-8.B . J. Beghdad. Ferrara.(2002). 37(5). Springer. 30(2). Jain. Direct minutiae extraction from gray-level fingerprint image by relationship examination.Non-minutiae Based Fingerprint Descriptor 97 7. 42(9). 1038–1041 Jin. . A. . Minutia Cylinder-Code: A New Representation and Matching Technique for Fingerprint Recognition. O. A. (2003).L. IEEE Trans. C. Image Processing. Pattern Recognition. 36(10). Fingerprint Verification Based on Invariant Moment Features and Nonlinear BPNN. A real-time matching system for large fingerprint databases. Reisman. A. Automation. T. S.3461-3466 Nanni. Expert Systems With Applications. pp. N. Int. A.A. Lumini.(1996). L. D. International Journal of Control. pp. L.(2003).S.pp..12414-12422 Ojala. . Lecture Notes in Artificial Intelligence (LNAI 4304) (Springer.1939-1946 Yang. Papoulis. Effective Enhancement of Low-Quality Fingerprints with Local Ridge Compensation.pp.K. J. pp. . S. IEICE Electronics Express. Ratha. & Maeenpaa. M. J..500-509 Yang. and Systems. D.(2003). S. A. J. Neurocomputing.491-500 Nanni. pp. Boston: McGraw-Hill. K. S. Electron. Karu. Berlin) . (2001) . A. J. . . 18(8). Jain.895-898 Tico. Improved fingercode for filterbank-based fingerprint matching. T. Wavelet domain features for fingerprint recognition. Saarinen. A.98 Biometrics Nagaty. (2002).S. 3146-3151 Nanni. A fingerprint verification algorithm using tessellated invariant moment features.. . Park. pp. (2008b). Chen. Park.(2005).799–813 Ross. 36(7). A hybrid wavelet-based fingerprint matcher. 24(7). 40(11). Lumini. K. L. A.1002–1009 . .971–987. Image and Vision Computing. Descriptors for image-based fingerprint matchers. pp. A. . Probablity. pp. . 21–22 Yang. R. 1661-1673 Sha.. (2006). Applying learning vector quantization neural network for fingerprint matching. Kuosmanen.800-808 Yang. 41(11). 5(23). Hitchcock. Tang X. (2008). D. . . Lett. J. J. 37(1). Lumini. Local Binary Patterns for a hybrid fingerprint matcher. Conf. Pattern Recognition. Zhao F.O. K. 6(6). (2008c). . pp. D. An adaptive hybrid energy-based fingerprint matching technique. IEEE Transactions on Pattern Analysis and Machine Intelligence.F.C.. . (2008a). IEEE Transactions on Pattern Analysis and Machine Intelligence.C. (2007). 71(10-12). K. Yoon. . random variables and stochastic processes. 23. Park. . M. (1991).C. Multiresolution gray-scale androtation invariant texture classification with local binary patterns. L.(2009). P. . A hybrid fingerprint matcher. Pattern Recognition. pp. pp. . Park. Jain. Pietikainen. pp. 1. 1. Introduction Since the pioneering studies of Drs. An optical device which scans the retinal blood vessel pattern is required. Such an optical Retinal Identification (RI) device was originally patented in 1978 [Hill (1978)]. and that retinal photographs can be used for identifying people. Eye fundus photography for the purpose of identification is impractical. this was proven to hold even for identical twins [Tower (1955)]. it has been known that every eye has its own unique pattern of blood vessels. the history. Hence the idea of using the retinal blood vessel pattern for identification. technique and recent developments of RI are discussed. Carleton Simon and Isodore Goldstein in 1935 [Simon & Goldstein (1935)]. As the patent of retinal identification (opposed to the actual design of the device) has now worn off.1 The anatomy and optical properties of the human eye Fig. In this chapter. after several subsequent patents. In the 1950’s. . however. new developments in the field have taken place.0 5 Retinal Identification Mikael Agopov University of Heidelberg Germany 1. it developed into a commercial product in the 1980s and 1990s. A schematic picture of the human eye. jelly-like substance similar to water. The retina is a curved surface in the back of the eye. In the retina. The most successful application of measuring the RNFL thickness around the optic disc is probably the GDx glaucoma diagnostic device (Carl Zeiss Meditec.and s-polarization components of the incoming light beam.through deformation . It uses scanning laser polarimetry to topograph the RNFL thickness on the retina. It is about 15◦ away from the fovea in the nasal direction. The optic disc is an approximately 5◦ × 7◦ . turning the incoming photons first into a chemical and then to an electric signal.2 100 Will-be-set-by-IN-TECH Biometrics A schematic picture of the human eye is shown in Figure 1. After being amplified and pre-processed. the signal is transferred to the nerve fibers. A typical GDx image is shown in Figure 2. The retina converts the photon energy into an electric signal. The components are thus refracted differently. (1978). Klein Brink et al. (1988). the amount of birefringence varies according to the RNFL thickness and also drops steeply if a blood vessel (which is non-birefringent) is encountered. It has a bunch of small blood vessels. These various cells are responsible for the eye’s ’signal processing’. where they form the retinal nerve fiber layer (RNFL). The thickness of RNFL is not constant over the retina. which in general results the beam being divided into two parts. its outer shell. which is transferred to the brain through the optic nerve. The light is focused by the cornea and the crystalline lens onto the retina. If the parts are then reflected back by a diffuse reflector (such as the eye fundus). The size of the pupil limits the amount of light entering the eye.5 mm thick. through which the nerve fibers and blood vessels enter the eyeball. A reduced RNFL thickness means death of the nerve fibers and thus advancing glaucoma. The choroid is the utmost layer behind the retina just in front of the sclera. The point of sharpest vision is called fovea . Dreher et al. The light entering the eye first passes through the pupil. The eyeball is of about 24 mm in diameter and filled with vitreous humor. Weinreb et al. . It accounts for most of the refractive power of the eye (about 45 D). Its amount and orientation changes throughout the cornea. Germany). i. a small portion of the light will travel the same way as it came. the sclera.e.able to change its refractive power. an aperture-like opening in the iris. is made of rigid proteins called collagen. The birefringence of the eye is well documented (Cope et al. The remaining 18 D come from the crystalline lens. Elsewhere on the retina. 1. The birefringence of the corneal collagen fibrils constitutes the main part of the total birefringence of the eye.here the light-sensing photoreceptor cells are only behind a small number of other cells. which is also . which consists of the axons of the nerve fibers.2 The birefringence properties of the eye Birefringence is a form of optical anisotrophy in a material. The cornea is about 11 mm in diameter and only 0. (1992)). the main birefringent component is the retinal nerve fiber layer (RNFL). thus partly compensating for the refractive error and helping to focus the eye. (1990). which reside on the peripheral area of the retina around the optic disc. in which the material has different indices of refraction for p. 1 See Appendix about how the polarization change can be measured. and is responsible for the retina’s metaboly. ellipse-like opening in the eye fundus. joining the polarization components into one again. Jena. but having changed the beam’s polarization state in process1 . the light has to travel through a multi-layered structure of different cells. The infrared light is not absorbed by the photoreceptors (the absorption drops steeply above 730 nm). as well as pupil constriction. The downside is that . patented in 1981 [Hill (1981)]. In the original patent. so this arrangement has the definite advantage that no fixation outside the normal line of sight is required. unlike when a ring around the optic disc is scanned. green laser light was used . 2. the retinal blood vessels are fairly transparent to the NIR wavelengths as well . In the original retina scan. However. A typical GDx image of a healthy eye.the light is absorbed by the smaller choroidal blood vessels instead (thus. the retinal blood vessel pattern is scanned with the help of two rings of LEDs. RI using retinal blood vessel absorption The first patent of the biometric identification using the retinal blood vessel pattern dates back to 1978 [Hill (1978)]. Since the first working prototype RI. The image acquisition technique has also been changed: the LEDs are given up in favour of scanning optics.it was strongly absorbed by the red blood vessels. the scan is centered around the fovea instead of the optic disc. thus resulting in a weaker measured signal (darker in the image). A circular scan is preferred over a raster. Soon afterwards. which suffers from the problem of reflections from the cornea. considering this technique. The fovea is on the optical axis of the eye. In the patent of 1986 [Hill (1986)]. however. when the subject has to look 15 degrees off-axis. it was found out that visible light causes discomfort to the identified individual. 2. The birefringent nerve fiber layer is seen brightly in the image. it is absorbed to a bigger extent than when it hits other tissue. the term Retinal Identification is slightly misleading) before being reflected back from the eye fundus. as well as the blood vessels which displace the nerve fibers. the author of the patent founded the company EyeDentity (then Oregon.Retinal Identification Retinal Identification 101 3 Fig. causing loss of signal intensity. near-infrared (NIR) light has been used for illumination. The amount of light reflected back from the retina is measured .when the beam hits a blood vessel. Portland. USA) and began full-time efforts to develop and commercialize the technique. which . The light enters a beam splitter (4) which reflects it into a Fresnel optical scanner (5). based on the patent from 1996.if properly focused . One of the . Fig. creates a fan of nearly-collimated light beams which hit the eye of the tested subject. A detailed explanation is in the text. 3. 4. An infrared light source.will hit the retina at the same angle. A detailed explanation is in the text. Current RI technology is based on an active US patent [Johnson & Hill (1996)]. A schematic drawing of the current Retinal Identification technique. A schematic drawing fixation-alignment technique using a Fresnel lens. Thus the price paid for easier fixation is the quality of the signal. A schematic drawing of the measurement setup is shown in Figure 3. A multifocus Fresnel lens. cemented in the scanner. for example a krypton lamp (1). The rotating optical scanner scans a ring on the cornea.4 102 Will-be-set-by-IN-TECH Biometrics the choroidal blood vessels are much thinner than around the optic disc. Fig. is focused by a lens (2) through an optical mirror (explained below) via a pinhole (3). and they don’t form a clear pattern. amplified and processed. in the measured signal. the measured pattern is a sum of contributions from choroidal blood vessels and other structures. it is almost impossible to scan a non-willing subject.0 (a complete mismatch). thus compensating for refractive error (explained in detail below). 3. User experience has shown that a match above 0. creating several reflections. between the measurements).e. where the blood vessels are seen as dark lines on the otherwise bright nerve fiber layer. this happens at the cost of optical image quality. RI using the RNFL birefringence The blood vessels emerging into the retina through the optic disc often displace the nerve fibers in the retinal nerve fiber layer. 2.1 Matching At first. focuses the ghost images on different points on the eye’s optical axis. After being measured by the photodetector. if the birefringence (change in the state of polarization) of the scanning laser beam is measured around the optic disc. A Light-Emitting Diode (LED) is situated next to the Krypton lamp. As the subject’s eye can rotate around the optical axis (due to different head position. the blood vessels are not birefringent . is illustrated in Figure 4. a reference measurement. . creating a sharp drop.thus. However. The author and his co-workers studied the possibility of using blood vessel-induced RNFL birefringence changes for biometric purposes [Agopov et al. which consists of several focusing parts with different focal lengths. Fixation and alignment of the subject’s eye in a RI measurement is critical. The matching is done using a Fourier-based correlation. i. The lens. (2008)]. A measurement device was built to scan a circle of 20◦ around the optic disc. The scanning angle is big enough to catch the major blood vessels which enter the fundus through the disc. The light reflected back from the eye fundus travels the same way through the scanner and into the beam splitter. Regardless of whether the test subject is emmetropic (normal visual acuity). The measured birefringence would drop steeply where a blood vessel is encountered. or ’blip’. which the further measurements will be compared against. The eye looks at them through a multifocal Fresnel lens. This can be seen in the GDx-pictures. of the LED on the optical axis of the system. which is used for matching. head tilt. the best possible match is found by ’rotating’ the measurement points in the array. which are stored in an array. a part of it is transmitted into a photodetector (8) through a focusing lens (7).7 can be considered a matching identification. The processed signal is converted into points. has to be taken from each tested subject. which was also patented (Arndt (1990)) . myopic (near-sighted) or hyperopic (far-sighted). A similar process is used in all RI techniques.0 (a perfect match) to -1. one of the images will almost certainly end up on the retina and will thus result a sharp image and effectively compensate for the refractive error of the subject’s eye. ’ghost images’. Any further measurement will be compared against the reference. It illuminates the optical double-surface mirror. the signal is A/D-converted. Unlike the RNFL nerve fiber axons. These images function as targets for the test subject’s eye. the match is measured on a scale of +1.Retinal Identification Retinal Identification 103 5 beams will be focused on the retina by the eye’s own optical apparatus. The fixation system of the RI technique. a steep signal drop proportional to the blood vessel size is measured wherever one is encountered. A 785 nm laser diode (1) was used as a light source. however. 5. The measurement setup is explained in detail in [Agopov et al. (2008)]. creating a circular scan subtending a 10◦ radius of visual angle (20◦ diameter) in the tested subject’s eye (9). now. Amplified by the detection electronics. the signals were added and subtracted . then it passed through a non-polarizing beam splitter (3) and was reflected by a mirror (4) further into the optical scanner (5-8). The edge mirror (6) was tilted 50◦ . The center mirror was (5) tilted 45◦ from the disc plane rotated clockwise and reflected the laser beam onto the edge mirror. The polarizing beam splitter separated the p. here it is explained briefly. two avalanche photodiodes (13) were placed right after it. The light path is drawn with a solid line.6 104 Will-be-set-by-IN-TECH Biometrics 3. A schematic drawing of the RI measurement setup using the RNFL birefringence.and s-polarization components. measuring the two signals which corresponded the two polarization components. A detailed explanation is in the text. The reflection from the ocular fundus traveled back the same optical path through the scanner. A lens (10) focused the beam into a polarizing beam splitter (12) through a quarter wave plate (11) which had its fast axis 45◦ to the original plane of polarization. the beam splitter (3) reflected the useful half of the beam (drawn with dashed line) into the detection system (10-13). The two scanning mirrors (5 and 6) and a counterweight (7) were cemented on an aluminium plate which was spun by a DC motor (8). The apparatus and the light paths within it are shown in Figure 5.1 Apparatus and method Fig. The beam was collimated by a lens (2). The polarization was manipulated so that the Stokes parameters S0 and S3 were measured . RI using the optic disc structure Another interesting RI technology was patented in 2004 [Marshall & Usher (2004)]. If the imaged subject’s cornea is at this point(6). special care was taken to properly align the test subject’s eye.2 Image analysis The boundary of the optic disc is found from the image taken by the SLO. Heidelberg. which is used for glaucoma diagnostics. A typical SLO image of the optic disc (taken with the HRT) is shown in Figure 7. Because the fixation point of the retina.it was decided to measure S3 instead of S1 for birefringence-based changes as it appeared to suffer less from various amounts and orientations of corneal birefringence in our computerized model. As the alignment of the eye is critical. The scan is usually centered around the optic disc (using an off-axis fixation target . 4. USA) was founded for developing the technique. the measured eye had to look 15◦ away in the nasal direction to center the scan on the optic disc.1 The principle of an SLO The best-known application of the SLO is probably the Heidelberg Retina Tomograph (Heidelberg Engineering. MA. This fairly complicated procedure is explained in detail in the patent. the fovea. The reflection from the eye fundus travels back through the system. Thus possible head tilt was reduced to almost zero. an ellipse is fit onto the image by analyzing the average intensity of the pixels around the boundary.taken by a scanning laser ophthalmoscope (SLO) for identification. The pinhole is very important only the light which comes exactly from the conjugated plane reaches the photodetector. Germany). The measurement apparatus included three eyepieces: the measurement was taken through the fixed central piece. There is a clear boundary between the disc and surrounding tissue (as seen in Figure 7). The patterns of recognized . Because the human eye is about 0. two fixation LEDs were set at 15◦ angle to the central axis of the scan. 4.75 D myopic at the wavelength 785 nm (see [Fernandez et al. The idea is to use the image of the optic disc . The principle of an SLO is illustrated in 6. The scanner consists of two rotating mirrors. but is reflected into the detection system (7-9) by the beam splitter. a fast and a slow one. 4. A lens (7) focuses the beam through a pinhole (8) onto a photodiode (9). creating a raster scan. A collimated laser beam goes through a beam splitter (2) into an optical scanner (3).Retinal Identification Retinal Identification 105 7 respectively. in addition there were two horizontally movable ones. The scanning beam is imaged through two lenses (4 and 5) onto exactly one point called the conjugated plane.as in the previous setup). To achieve this. A low-intensity laser diode (1) is used for illumination. a recognization pattern is created from the its structures. (2005)] for details) the fixation LEDs were placed at 130 cm distance. Once the disc is identified. the eye’s optics focus the scanning beam onto the retina (7). so that the eye’s fixation would compensate for this. Thus the SLO creates a high-resolution microscopic image of the retina. A company Retinal Technologies (Boston. is approximately 15 degrees away from the optic disc. the subject could look through the central piece with either eye while having a ’dummy’ eyepiece available for the other eye. 7.8 106 Will-be-set-by-IN-TECH Biometrics Fig. An schematic drawing of a scanning laser ophthalmoscope. Fig. A typical SLO image of the optic disc (taken with an HRT). 6. . A detailed explanation is in the text. The light from the infrared LED (1) first goes through a polarizer (2).a part of the beam is now able to pass through the polarizer. the test subject has to be faced towards the dashed line but look in the direction of the solid line. The measurement setup of a combined retina and iris identification device. and one red (λ ≈ 660 nm) LED (12). Combined retina and iris identification Fig. The interference from the ’wrong’ light source is blocked a dichroic mirror (5). This is obviously advantageous. The measurement setup is shown in Figure 8. a combination of retina and iris identification was suggested (about iris identification. It should be noted that this setup has no optical scanner (nor it is confocal) . as explained earlier. a part of the reflected beam now passes through the beam splitter into the detection system.Retinal Identification Retinal Identification 107 9 individuals are stored in a database for comparing with the pattern of a person wishing to be identified. The beam is focused by the eye’s optics (6) onto the area around the optic disc. a reflection of the beam returns back the same way as the beam came. after being reflected from the eye fundus (7). The beam is focused by a lens (9) onto the detector (10). The outgoing beam is then divided into two parts by a beam splitter (3). 5.having changed its state of polarization while passing through the eye tissue . which can be for example a CCD camera. The details are in the text.a wide-field image around the optic disc is . To achieve this. The transmitted (useless) part crosses the beam splitter and is preferably absorbed by a light trap. The measurement system is constructed so that both biometrics can be recorded with one scan. coming from the other direction. The reflected (useful) beam part is collimated by a lens (4) and goes through the dichroic mirror (5) into the eye. However . which ensures that the outgoing light is linearly polarized. (2007)]. as now two biometrics are available simultaneously. In a fairly recent patent [Muller et al. They are 15◦ apart. however. see the previous chapter). 8. Two illuminating LED’s are used: One infrared (λ ≈ 800 nm) LED (1). The setup has two optical axes: one for the retinal image (dashed line) and one for alignment and the iris image (solid line). As a fixation target. The polarizer (8) is set so that it absorbs the illuminating LED’s polarization direction. the incoming light is focused on the area around the optic disc only if the eye is looking 15◦ in the nasal direction. a ring of green LED’s (11) is placed around the optical axis of the iris scanning system. In the setup suggested in the patent. The two detection systems are electronically synchronized so that one scan records both images simultaneously. Therefore.10 108 Will-be-set-by-IN-TECH Biometrics captured. On its way to the iris and back. The fixation LEDs are also placed together with the red illuminating LED. the possible reflections of the green LED are filtered out by a red band pass filter (19). However. instead of a beam splitter (15). The wave plate’s axes are set so that both the fast and the slow axis of it are 45◦ to the original polarization direction . 6. which would focus the light onto the iris (the beam entering the eye should not be parallel. including the RI. the distance is controlled by an ultrasound transducer. the illuminating light from the red LED (12) is first linearly polarized (13) 2 . and then guided into the eye through an alignment tube (14). which is turned on when the eye is correctly aligned.1 Limitations As the light has to pass twice through the pupil during the measurement. The distance between the lens and the beam splitter should be much bigger than that between the beam splitter and the eye. various eye conditions can disturb the light’s passing through the eye. The beam is focused by a lens onto the detector (a CCD camera). First of all. 2 In the patent. moreover. . compromising vision. 6.the reflection from the eye would then have to be reflected by it. For the iris scan. and the recording can be taken. the technique has difficulties in broad daylight. which is set to absorb the polarizing direction of the initial polarizer (13). When the distance is right. the device is desribed without the polarizers (13 and 17).when the beam passes through it.1 RI using absorption In a performance test by Sandia National Laboratories [Sandia Laboratories (1991)]. a constricted pupil can increase the number of false negative scan results. its polarization becomes circular. preferably it would consist of at least one lens with a long focal length. the IR light and the green light used for fixation cannot both pass through the dichroic mirror. this is true for almost any optical measurement). the light double-passes a quarter wave plate (16). The eye is at a crossing of two optical axes . the second pass (the reflection from the iris) turns the circular polarization into linear again. Results of performance tests and limitations of the RI techniques 6. this leads to unsurmountable difficulties. and the optic disc is seen on the CCD (10). with no false positives.1. After entering the detection system (17-20). so that the slightly de-focused reflection image of the iris can be caught. the eye is aligned properly.its distance and orientation are critical. The recording is triggered by a switch. the author suggests slight modifications in the setup. a dichroic mirror is suggested. thus resulting in lower resolution than a confocal scanning system would achieve. the IR light would first have to pass through the mirror . the light can now pass through the polarizer (17). Thus dim light conditions are preferable for the RI (of course. It sends and receives ultrasound pulses which are reflected back from the surface of the cornea. The function of the alignment tube is not explained in the patent. this also affects measuring the eye’s properties optically. In addition. otherwise it will be focused on the retina). the filter (19). the quarter wave plate (16). but having turned the polarization direction by 90◦ in the process. the EyeDentity RI device recognized >99% of the tested subjects in a three-attempt measurement. Cataracts: A cataract is an eye condition in which clouding develops in the crystallinen lens.2 RI using birefringence Eight eyes of four volunteer subjects were measured.2. such as the age-related macula degeneration (AMD). of which 34 could be correlated with the ’peaks’ in the measured signal. 6. Altogether 55 blood vessels were located on the fundus photos of the ’better’ eyes (the ones which yielded clearer signals) of the four volunteers. from the two reflection/absorption measurements (Sum) and from the two birefringence measurements (Diff). 2. Severe eye diseases. . the test subjects were required to have a refractive error of less than about ±2 diopters or a good corrected vision using contact lenses. problems with focusing and also bad optical quality of measurements.e. becomes increasingly difficult. which damages the RNFL by killing the nerve fibers going through the optic nerve.unlike the conventional RI technique had no inherent defocus compensation. the smallest vessels were ignored. The columns in the tables represent total percentage of blood vessel ’peaks’ correlating with vessels in the fundus photo (Total). The eye conditions disturbing the measurement include cataracts and astigmatism as well as a severe glaucoma. The transparency was then placed on the fundus photo to compare the marked signal peaks with the blood vessels on the photo. If a confirmed peak did not correspond to a vessel above the threshold size on the fundus photo. i. Total Sum Diff Average sensitivity 69% 51% 33% Average specificity 78% 76% 60% Table 1. Obviously any optical measurement in the eye. fundus photographs were taken from all the eyes. Summarized results of the measurements taken from the eye with a better signal of each subject. In this way. The lens becomes opaque so seeing becomes compromised. either by destroying retinal tissue or by stimulating growth of new blood vessels. including the RI. on which the measured peaks were marked at corresponding angles on the perimeter of the circle. For verifying purposes.Retinal Identification Retinal Identification 109 11 This results in 1. Severe astigmatism: An astigmatic eye’s optics image dots as lines. the numbers of blood vessels corresponding to the peaks in the measured absorption/reflectance and birefringence-based tracings were calculated. Only the vessels larger than a certain threshold size (set for each eye individually) were taken into account. The measured peaks and the blood vessels on the fundus photos were compared as follows: A 20◦ diameter circle was drawn on a transparent overlay.are presented in Table 1.and birefringencebased signals were recorded two times for each eye. it was considered a false positive. The calculated sensitivities (number of vessels identified / number of vessels altogether) and specificities (number of positive recognitions / number of positive + false positive recognitions) . 6. 3. can change the structure of the retina. Both absorption.1 Limitations Our system was of ’proof-of-principle’ -nature and . but also costly in the lost hide value. However. The original RI technique uses the choroidal blood vessel pattern for identification. the . especially if a laser (even if the light is weak intensity and harmless to the eye.imaging the optic disc using an SLO. 8. Colorado. To develop the technique for animals. Collins. the author is unaware either of any commercial devices or of any scientific studies on the accuracy of these identifying method.birefringence-based blood vessel detection can help detecting more blood vessels. 7. a simple video camera based device provides an accurate enough identification. or the combined retina. The company produces and develops hand-held video camera -based devices which provide an acceptable image an animal’s eye fundus to allow identification. The polarization state of the light can be completely represented with four quantities. as in the original RI technique. only at a small cost of specificity. The measured signal was also directly linked to blood vessels and not to other retinal structures.and y-polarization and the T denote time averages. several newer RI techniques center the scan around the optic disc. however. This method is certainly preferable over the traditional hot iron branding. When a false identification is not disastrous (as opposed to some high security installations). S2 = 2Ex Ey cos S3 = 2Ex Ey sin (1) where = y − x is the phase difference between x. The use of retinal identification is not limited to humans.3 Other techniques The author in unaware of any scientific studies on the accuracy of the other techniques mentioned here. if this is achieved.in addition. In 2004.12 110 Will-be-set-by-IN-TECH Biometrics 6.appear very promising. The main drawback in such a device is that the eye has to fixate 15 degrees off-axis while being measured.as the authors and his co-workers proved . a patent was filed on identifying various animals using their retinal blood vessel pattern [Golden et al. the blood vessels around the optic disc are easier to detect . which is not only painful to the animal.and iris scan . as in our system). USA). the enrollees should be able to overcome their fear of these scans once they acquire more user experience. However. The newer techniques . Falsifying an image of a retinal blood vessel pattern appears impossible. For totally unpolarized light S0 > 0 and S1 = S2 = S3 = 0. The eye scans are generally considered invasive or even harmful. a company OptiBrand was started (Ft. However. the Stokes parameters: S0 = E 2 x S1 = T 2 + Ey T E2 T x − 2 Ey T T T. Appendix: The Stokes parameters The modern treatment of polarization was first suggested by Stokes in the mid-1800’s. For completely polarized light. on . Discussion Retinal Identification remains the most reliable and secure biometric. (2004)]. . Comstock. Exp. π/2). D. United States Patent 5532771 Golden. (1986). J. W. (1975). On optical anisotrophy of the lens fiber cells.. 0) S2 (θ. R. (1996).. Optical alignment system. Rollin. B. Opt. Eye fundus optical scanner system and method. J. J. 0) − I (90◦ . Amer. United States Patent 4. ρ) = I (45◦ .E. Wolbarsht. 9. K. J. United States Patent 4923297 Klein Brink. M. 0) S1 (θ. SPIE 1746:34-41 Fernandez. (1978). 21: 231-234 Cope. (1978).393. Opt Soc Am 68: 1149-1140 Dreher. R. Van Blockland. Hill..I. Rotating beam ocular identification apparatus and method. Hermann. Use of retinal nerve fiber layer birefringence as an addition to absorption in retinal scanning for biometric purposes. Wu. 0) − I (135◦ . Applied Optics 47: 1048-1053 Bettelheim.M. F.. P.. (2008). Reiter. Fovea-centered eye fundus scanner.J. References Agopov. (2004). The polarization state of the light can be defined as 2 2 2 S1 + S2 + S3 111 13 V= S0 . π/2) − I (135◦ . G. These can be easily measured: for example.K. A.. B. A 5:49-57 Marshall.L.. (1981). Method for generating a unique and consistent signal pattern for identification of an individual. Switzer. Drexler. J. K. C.. Gramatikov. Irsch.Retinal Identification Retinal Identification 2 2 2 2 the other hand. (2005). Eye Res.R.. P. (1990). Birefringence of the human foveal area assessed in vivo with the Mueller-Matrix ellipsometry. R. W. Pieto. United States Patent 4109237 Hill. Retinal vasculature image acquisition apparatus and method. A. R. J. 0) S3 (θ. United States Patent 6766041 Arndt.L. (3) where θ is the angle of the azimuth vector measured from the x-plane and ρ is the birefringence. Yamanashi. Y. Ocular aberrations as a function of wavelength in the near infrared measured with a femtosecond laser. Guyton. B. R. Unterhuber. Soc. B. and S3 can be measured by adding a quarter wave plate before the aforementioned system. (1988). H. United States Patent 4620318 Johnson. (1992)..H. ρ) = I (0◦ . 0) + I (90◦ . Apparatus and method for identifying individuals through their retinal vasculature patterns.. ρ) = I (0◦ . Usher. B. S0 = S1 + S2 + S3 . (2) The importance of the Stokes parameters lies in the fact that they are connected to easily measurable intensities: S0 (θ. B. (2004).366 Hill. D. Scanning laser polarimetry of the retinal nerve fiber layer. United States Patent 6757409 . Artal... The corneal polarization cross. which are placed so that they measure the intensities coming out of the beam splitter. Optics Express 13:400-409 Hill.C.. ρ) = I (45◦ . E. S1 can be measured with polarizing beam splitter and two detectors. (2007).. P. Technical Report SAND91-0276.. L. The fundus Oculi in monozygotic twins: Report of six pairs of identical twins. Vol. Quigley H.. United States Patent 7248720 Sandia National Laboratories (1991). Coleman. R.. B. Histopatologic validation of Fourier-ellipsometry measurements of retinal nerve fiber layer thickness. (1990). (1935). 225-239 Weinreb. F. Performance Evaluation of Biometric Identification Devices.. Reiter. B. Usher D. (1955). Arch Opthalmol 108:557-60 . Archives of Ophthalmology.. 54. K. D. A New Scientific Method of Identification. Dreher A. UC-906 Simon C. New York State Journal of Medicine. Shaw.. Heacock G. Goldstein I.. pp. pp. 35. Method and system for generating a combined retina/iris patter biometric.14 112 Will-be-set-by-IN-TECH Biometrics Muller. 901-906 Tower. A. 18. Vol. No. behavioral biometrics are usually measured over time and are more dependant on the individual’s state of mind or deliberated alteration. the most challenging aspect is the need of high accuracy. Behavioral characteristics. Penedo University of Coruña Department of Computer Science Spain 1. When considering automation of the identity verification. the password systems are easily broken by the use of brute force where powerful computers generate all the possible combinations and test it against the authentication system. Nevertheless. it is obvious that the main concerns are ..0 6 Retinal Vessel Tree as Biometric Pattern Marcos Ortega and Manuel G. car plate. he/she should be also ideally inconvenienced to a minimum which further complicates the whole verification process Siguenza Pizarro & Tapiador Mateos (2005). in the scope of the knowledge-based authentication. the use of sophisticated passwords consisting of numbers. such as speech or handwriting... To reinforce the active versus passive idea of both paradigms. reliable authentication and authorization of individuals are becoming more and more necessary tasks for everyday activities or applications. make reference to the way some action is performed by every individual. Particularly.. such as fingerprints. With this scope in mind. this characteristics usually consist of physiological or behavioral features.. birth dates. On the other hand. Physiological features. it is well known that password systems are vulnerable mainly due to the wrong use of users and administrators.). In the scope of the possession-based authentication. the term biometrics refers to identifying an individual based on his/her distinguished intrinsic characteristics.). One of the most common problems is the use of easily discovered passwords (child names. Just for instance. It is not rare to find some administrators sharing the same password. taking a flight or performing a money transfer require the verification of the identity of the individual trying to perform these tasks. are physical characteristics usually measured at a particular point of time. As they characterize a particular activity. While the user should not be denied to perform a task if authorized. For instance. The traditional authentication systems based on possessions or knowledge are widely spread in the society but they have many drawbacks that biometrics try to overcome. in terms of avoiding incorrect authorizations or rejections.. Introduction In current society. physiological biometrics are also usually referred to as static biometrics while behavioral ones are referred to as dynamic biometrics. upper and lower case letters and even punctuation marks makes it harder to remember them for an user. workers. or users giving away their own to other people. common situations such as accessing to a building restricted to authorized people (members. Maio & Maltoni (1997). (2006). (1999). some advances have been done in this field but. Yang et al. Additionally.114 2 Will-be-set-by-IN-TECH Biometrics related to the loss of the identification token. This contrast information of this area is processed via fast Fourier transform. Retinal blood vessel pattern is unique for each human being even in the case of identical twins. The transformed data forms the final biometric pattern . Biometrics overcome most of these concerns while they also allow an easy entry to computer systems to non expert users with no need to recall complex passwords. (2008). Choi & Marks (2004). Robert Hill introduced the first identification system based on retina Hill (1999). it is one of the hardest biometric to forge as the identification relies on the blood circulation along the vessels. Mian et al.d.). The general idea was that of taking advantage of the inherent properties of the retinal vessel pattern to build a secure system. Kim. in any case. (2002). (2002). (2008). (2000) or hand geometry Jain et al. it is a highly stable pattern over time and totally independent of genetic factors. systems Lab (n. 1. Schema of the retina in the human eye. the privacy and/or security would be compromised. Its main drawback is the acquisition process which requires collaboration from the user and it is sometimes perceived as intrusive. Moghaddam & Pentland (1997). Blood vessels are used as biometric characteristic. Many different human biometrics have been used to build a valid template for verification and identification tasks. The scanned area is a circular band around blood vessels. Moreover. Cho. As it will be further discussed. If the token was stolen or found by another individual. However. this continues to be the weak point in retinal based authentication. iris Chou et al. (1995). He et al. there exist other emerging biometrics where we can find retina biometrics. Seung-Hyun et al. These property make it one of the best biometric characteristic in high security environments. Among the most common biometrics. Kim. (2008). Zunkel (1999). Fig. The scanner captured a band in the blood vessels area similar to the one employed in the iris recognition as shown in Figure 2. The system acquired the data via a scanner that required the user to be still for a few seconds. Bang & Lee (2004). Identity verification based on retina uses the blood vessels pattern present in the retina (Figure1). Venkataramani & Kumar (2003). Kisku et al. Also. Sidlauskas (1988). commercial webs on the Internet are favored not only by the increasing trust being transmitted to the user but also by the possibility of offering a customizable environment for every individual along with the valuable information on personal preferences for each of them. Ma et al. we can find the fingerprint Bolle et al. Nabti & Bouridane (2007) or face Kim. Also. considered in this system. This pattern worked good enough as the acquisition environment was very controlled. 2. The retinal image is acquired by taking an instant photograph. There are some zones in the retinal vessels that can not be compared . increasing the false rejection rate. Two retinal image cameras. the devices are cheaper and more accessible in general. Finally. it rendered the system hardly appealing. despite all its convenient properties.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 115 3 Fig. was discontinued. This process was both slow and uncomfortable for the user. the result was that the use of retinal pattern as a biometric characteristic. Nowadays. therefore. this is also the source of the major drawbacks present in the device: the data acquisition process. This produces as a result that previous systems based on contrast information of reduced areas may lack the required precision in some cases. In Figure 4 it can be observed two images from the same person acquired at different times by the same retinograph. Fig. currently. Illustration of the scan area in the retina used in the system of Robert Hill. The lighting conditions and the movement of the user’s eye vary between acquisitions. the hardware was very expensive and. retinal image cameras (Figure 3) are capable of taking a photograph of the retina area in a human eye without any intrusive or dangerous scanning. 3. Moreover. This technology reduces the perception of danger by the user during the retina acquisition process but also brings more freedom producing a more heterogeneous type of retinal images to work with. Of course. Briefly. Drs. the objectives of this work are enumerated: • Empirical evaluation of the retinal vessel tree as biometric pattern • Design a robust. identical twins would be the most likely to have similar retinal vascular . to allow the retinal biometrics to keep and increase the acquisition comfortability. Later in the 1950s. Fig. Carleton Simon and Isodore Goldstein. Section 3 presents the methodology developed to build the authentication system. This work is focused on the proposal of a novel personal authentication system based on the retinal vessel tree. the approach presented here to the retinal recognition is closer to the fingerprint developments than to the iris ones as the own structure of the retinal vessel tree suggests. They subsequently published a paper on the use of retinal photographs for identifying people based on their blood vessel patterns Simon & Goldstein (1935). Example of two digital retina images from the same individual acquired by the same retinal camera at different times. of any two persons. 4. In this sense. maintaining the extremely low error rates. Finally. is capable to cope with a more heterogeneous range of retinal images. Section 5 offers some conclusions and final discussion. Section 4 discusses the experiments aimed to test the proposed methodologies. 2. the rest of this document is organized as follows. He noted that. easy to store and process biometric pattern making use of the whole retinal vessel tree information • Development of an efficient and effective methodology to compare and match such retinal patterns • Analysis on similarity metrics performance to establish reliable thresholds in the authentication process To deal with the suggested goals. Second section introduces previous works and research on the retinal vessel tree as biometric pattern.116 4 Will-be-set-by-IN-TECH Biometrics because of the lack of information in one of the images. Thus. including an analysis of similarity measures. This system deals with the new challenges in the retinal field where a more robust pattern has to be designed in order to increase the usability for the acquisition stage. including biometric template construction and template matching algorithms. Related work Awareness of the uniqueness of the retinal vascular pattern dates back to 1935 when two ophthalmologists. it is necessary to implement a more robust methodology that. their conclusions were supported by Dr. Paul Tower in the course of his study of identical twins. realized that every eye has its own unique pattern of blood vessels. while studying eye disease. The dataset size is 60 images. In Farzin et al. and total branching points (mean 20. total branches from the vascular trunk (mean 12. A paired comparison of the retinal vessel patterns from both eyes of 30 other animals confirmed that eyes from the same animal differ. Blood vessels are among the first organs to develop and are entirely derived from the mesoderm.3). The results showed a high confidence band in the authentication process but the database included only 6 individuals with 2 images for each of them. time invariant and very hard to forge.2) and the right (mean 6.0 and variance 13.2) showed differences across all animals (52). The detection of the optic disc is a complex problem and in some individuals . the dominate trunk vessel of bovine retinal images was positioned vertically and branches on the right and left of the trunk and other branching points were evaluated. retinal vascular pattern images from livestock were digitally acquired in order to evaluate their pattern uniqueness. Mariño et al. (2003). the whole arterial-venous tree structure was used as the feature pattern for individuals.4 and variance 1. which define the pattern of the vasculature. In this scenario. In a more recent study Whittier et al. Branches from the left (mean 6. Thus. Tower showed that. Most common diseases like diabetes do not change the pattern in a way that its topology is affected. Vascular development occurs via two processes termed vasculogenesis and angiogenesis. In angiogenesis. However.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 117 5 patterns. the blood vessel assembly during embryogenesis.Mariño et al. Based on the idea of fingerprint minutiae. as showed by Mariño et al. the pattern does not change through the individual’s life. retinal vessel tree pattern has been proved a valid biometric trait for personal authentication as it is unique. (2006). of all the factors compared between twins. A common problem in previous approaches is that the optic disc is used as a reference structure in the image. who introduced a novel authentication system based on this trait. This greatly difficults the storing of the pattern in databases and even in different devices with memory restrictions like cards or mobile devices. the pattern matching problem is reduced to a point pattern matching problem and the similarity metric has to be defined in terms of matched points. In general.4 and variance 2. Some lesions (points or small regions) can appear but they are easily avoided in the vessels extraction method that will be discussed later. retinal vessel tree his is a unique pattern in each individual and it is almost impossible to forge that pattern in a false individual.5) of the vascular trunk. retinal vascular patterns showed the least similarities Tower (1955). new vessels arise by sprouting of budlike and fine endothelial extensions from preexisting vessels Noden (1989). (1989). this is. (2008) a pattern is defined using the optic disc as reference structure and using multi scale analysis to compute a feature vector around it. begins with the clustering of primitive vascular cells or hemangioblasts into tube-like endothelial structures. Vasculogenesis. unless a serious pathology appears in the eye. the experimental results do not offer error measures in a real case scenario where different images from the same individual are compared. rotated 5 times each. In that work. (2003). However. C.8 and variance 4. One of the weak points of the proposed system was the necessity of storing and handling a whole image as the biometric pattern. This would be confirming the uniqueness of animal retinal vascular pattern suggested earlier in the 1980s also by De Schaepdrijver et al. Of course. a robust pattern is introduced where a set of landmarks (bifurcations and crossovers of retinal vessel tree) were extracted and used as feature points. Good results were obtained using an artificial scenario created by randomly rotating one image per user for different users. The performance of the system is about a 99% accuracy. Retinal images of 4 cloned sheep from the same parent line were evaluated to confirm the uniqueness of the retinal vessel patterns in genetically identical animals. To evaluate each retinal vessel pattern. 118 6 Will-be-set-by-IN-TECH Biometrics with eye diseases this cannot be achieved correctly. In this work, the use of reference structures is avoided to allow the system to cope with a wider range of images and users. 3. Retinal verification based on feature points Figure 5 illustrates the general schema for the new feature point based authentication approach. The newly introduced stages are the feature point extraction and the feature point matching. The following chapter sections will discuss the methodology on these new stages of the system. Fig. 5. Schema of the main stages for the authentication system based in the retinal vessel tree structure. 3.1 Feature points extraction Following the idea that vessels can be thought of as creases (ridges or valleys) when images are seen as landscapes (see Figure 6), curvature level curves will be used to calculate the creases Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 119 7 (ridge and valley lines). Several methods for crease detection have been proposed in the literature (see López et al. (1999) for a comparison between methods), but finally a differential geometry based method López et al. (2000) was selected because of its good performance in similar images Lloret et al. (1999; 2001), producing very good results. Fig. 6. Picture of a region of the retinal image as landscape. Vessels can be represented as creases. Among the many definitions of crease, the one based on Level Set Extrinsic Curvature, LSEC López et al. (1998), has useful invariance properties. The geometry based method named LSEC gives rise to several problems, solved through the improvement of this method by a multilocal solution, the MLSECLópez et al. (2000). But results obtained with MLSEC can still be improved by pre-filtering the image gradient vector field using structure tensor analysis and by discarding creaseness at isotropic areas by means of the computation of a confidence measure. The methodology allows to tune several parameters to apply such filters as for creases with a concrete width range or crease length. In Caderno et al. (2004) a methodology was presented for automatic parameter tuning by analyzing contrast variance in the retinal image. One of the main advantages of this method is that it is invariant to changes in contrast and illumination, allowing the extraction of creases from arteries and veins independently of the characteristics of the images, avoiding a previous normalization of the input images. The final result is an image where the retinal vessel tree is represented by its crease lines. Figure 7 shows several examples of the creases obtained from different retinal images. The landmarks of interest are points where two different vessels are connected. Therefore, it is necessary to study the existing relationships between vessels in the image. The first step is to track and label the vessels to be able to establish their relationships between them. In Figure 8, it can be observed that the crease images show discontinuities in the crossovers and bifurcations points. This occurs because of the two different vessels (valleys or ridges) coming together into a region where the crease direction can not be set. Moreover, due to some illumination or intensity loss issues, the crease images can also show some discontinuities along a vessel (Figure 8). This issue requires a process of joining segments to build the whole vessels prior to the bifurcation/crossover analysis. Once the relationships between segments are established, a final stage will take place to remove some possible spurious feature points. Thus, the four main stages in the feature point extraction process are: 120 8 Will-be-set-by-IN-TECH Biometrics Fig. 7. Three examples of digital retinal images, showing the variability of the vessel tree among individuals. Left column: input images. Right column: creases of images on the left column representing the main vessels. Fig. 8. Example of discontinuities in the creases of the retinal vessels. Discontinuities in bifurcations and crossovers are due to two creases with different directions joining in the same region. But, also, some other discontinuities along a vessel can happen due to illumination and contrast variations in the image. Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 121 9 1. Labelling of the vessels segments 2. Establishing the joint or union relationships between vessels 3. Establishing crossover and bifurcation relationships between vessels 4. Filtering of the crossovers and bifurcations 3.1.1 Tracking and labelling of vessel segments To detect and label the vessel segments, an image tracking process is performed. As the crease images eliminate background information, any non-null pixel (intensity greater than zero) belongs to a vessel segment. Taking this into account, each row in the image is tracked (from top to bottom) and when a non-null pixel is found, the segment tracking process takes place. The aim is to label the vessel segment found, as a line of 1 pixel width. This is, every pixel will have only two neighbors (previous and next) avoiding ambiguity to track the resulting segment in further processes. To start the tracking process, the configuration of the 4 pixels which have not been analyzed by the initially detected pixel is calculated. This leads to 16 possible configurations depending on whether there is a segment pixel or not in each one of the 4 positions. If the initial pixel has no neighbors, it is discarded and the image tracking continues. In the other cases there are two main possibilities: either the initial pixel is an endpoint for the segment, so this is tracked in one way only or the initial pixel is a middle point and the segment is tracked in two ways from it. Figure 9 shows the 16 possible neighborhood configurations and how the tracking directions are established in any case. Once the segment tracking process has started, in every step a neighbor of the last pixel flagged as segment is selected to be the next. This choice is made using the following criterion: the best neighbor is the one with the most non-flagged neighbors corresponding to segment pixels. This heuristic contains the idea of keeping the 1-pixel width segment to track along the middle of the crease, where pixels have more segment pixel neighbors. In case of a tie, the heuristic tries to preserve the most repeated orientation in the last steps. When the whole image tracking process finishes, every segment is a 1 pixel width line with its endpoints defined. The endpoints are very useful to establish relationships between segments because these relationships can always be detected in the surroundings of a segment endpoint. This avoids the analysis of every pixel belonging to a vessel, considerably reducing the complexity of the algorithm and therefore the running time. 3.1.2 Union relationships As stated before, the union detection is needed to build the vessels out of their segments. Aside the segments from the crease image, no additional information is required and therefore is the first kind of relationship to be detected in the image. An union or joint between two segments exists when one of the segments is the continuation of the other in the same retinal vessel. Figure 10 shows some examples of union relationships between segments. To find these relationships, the developed algorithm uses the segment endpoints calculated and labelled in the previous subsection. The main idea is to analyze pairs of close endpoints from different segments and quantify the likelihood of one being the prolongation of the other. The proposed algorithm connects both endpoints and measures the smoothness of the connection. An efficient approach to connect the segments is using an straight line between both endpoints. In Figure 11, a graphical description of the detection process for an union is 122 10 Will-be-set-by-IN-TECH Biometrics (a) (b) (c) (d) (e) (f) (g) (h) (i) (j) (k) (l) (m) (n) (o) (p) Fig. 9. Initial tracking process for a segment depending on the neighbor pixels surrounding the first pixel found for the new segment in a 8-neighborhood. As there are 4 neighbors not tracked yet (the bottom row and the one to the right), there are a total of 16 possible configurations. Gray squares represent crease (vessel) pixels and the white ones, background pixels. The upper row neighbors and the left one are ignored as they have already been tracked due to the image tracking direction. Arrows point to the next pixels to track while crosses flag pixels to be ignored. In 9(d), 9(g), 9(j) and 9(n) the forked arrows mean that only the best of the pointed pixels (i.e. the one with more new vessel pixel neighbors) is selected for continuing the tracking. Arrows starting with a black circle flag the central pixel as an endpoint for the segment (9(b), 9(c), 9(d), 9(e), 9(g), 9(i), 9(j)). shown. The smoothness measurement is obtained from the angles between the straight line and the segment direction. The segment direction is calculated by the endpoint direction. The maximum smoothness occurs when both angles are π rad., i.e. both segments are parallel and belong to the straight line connecting it. The smoothness decreases as both angles decrease. A criterion to accept the candidate relationship must be established. A minimum angle θmin is set as the threshold for both angles. This way, the criterion to accept an union relationship is defined as Union(r, s) = (α > θmin ) ∧ ( β > θmin ) (1) an idea similar to the union algorithm is followed based on the search of the bifurcations from the segments endpoints. the retinal vessel tree can be reconstructed by joining all segments. A bifurcation is a point in a segment where another one starts from. 10. . 4 3. Some of the vessels present discontinuities leading to different segments. s are the segments involved in the union and α. bifurcations allow to build the vessel tree by establishing relationships between them. The criterion in this case is finding a segment close to an endpoint whose segment can be assumed to start in the found one. A crossover is an intersection between two segments. A crossover can be seen in the segment image. An example of this is shown in Figure 12. so they are above the required threshold ( 3 π) and the union is finally accepted. While unions allow to build the vessels. β their respective endpoint directions. In order to find bifurcations in the image.1.3 Bifurcation/crossover relationships Bifurcations and crossovers are the feature interest points in this work for characterizing individuals by a biometric pattern. the algorithm does not require to track the whole segments. Therefore. bounding complexity to the number of segments and not to their length. where r. 11. the algorithm delivers good results 4 in all cases. Using both types. Union of the crease segments r and s. Examples of union relationships. This way. as two close bifurcations forking from the same segment. finding bifurcation and crossover relationships between segments can be initially reduced to find only bifurcations. These discontinuities are detected in the union relationships detection process.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 123 11 Fig. It has been observed that for values of θmin close to 3 π rad. Fig. The angles between the new segment AB and the crease segments r (α) and s (β) are near π rad. Crossovers can then be detected analyzing close bifurcations. Also. the process is as follows (Figure 13): 1. the last stage in the biometric pattern extraction process will be the filtering of spurious points in order to obtain an accurate biometric pattern for an individual. To improve the performance of the matching process is convenient to eliminate as spurious points as possible. 3. Analyze the points in and nearby the prolongation segment to find candidate segments.124 12 Will-be-set-by-IN-TECH Biometrics Fig. 12. Thus. s) and (r. For every endpoint in the image. Extend the segment in that direction a fixed length lmax . Retinal Vessel Tree reconstruction by unions (t. 4. 2. If it follows that l <= lmax . In the image test set used (over 100 images) the approximate mean number of feature points detected per image was 28. . If a point of a different segment is found. being l the distance from the endpoint of the segment to the other segment. spurious detected points are identified in the image. The mean of spurious points corresponded to 5 points per image. Compute the endpoint direction. Figure 14 shows an example of results after this stage where feature points are marked. These spurious points may occur for different reasons such as wrongly detected segments. Bifurcation between segment r and s. the segments will be joined and a bifurcation will be detected. t). The parameter lmax is inserted in the model to avoid indefinite prolongation of the segments. 13. The endpoint of r is prolonged a maximum distance lmax and eventually a point of segment s is found. u) and bifurcations (r. Fig. compute the angle (α) associated to that bifurcation. defined by the direction of this point and the extreme direction from step 1. Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 125 13 (a) (b) Fig.e.Markov et al. M. as Figure 14(b) shows. during the previous stage. 15. Spurious points corresponding to the same crossover (detected as two bifurcations) are signalled in squares. Tmin . 3. ν . the biometric pattern obtained from an individual.4 Filtering of feature points A segment filtering process takes place in the tracking stage.S. Spurious points from duplicate crossover points have been eliminated. Figure 15 shows an example of the filtering process result. (b) Image containing feature points after filtering. This fact is illustrated in Figure 16 . in the initial test set of images used to tune the parameters. (b) Feature points marked over the segment image. Briefly.2 Biometric pattern matching In the matching stage.G. it is necessary to align ν with ν in order to be matched L. 14. since crossover points are detected as two bifurcation points. Zitová & Flusser (2003). (1993). i. Finally.1. This leads to images with minimum false segments and with only important segments in the vessel tree. the reduction of false detected points was about from 5 to 2 in the average. Example of the result after the feature point filtering. ν.Brown (1992). Example of feature points extracted from original image after the bifurcation/crossover stage. (a) Original Image. Due to the eye movement during the image acquisition stage. (a) (b) Fig. for the claimed identity is compared to the pattern extracted. filtering detected segments by their length using a threshold. the stored reference pattern. these bifurcation points are merged into an unique feature point by calculating the midpoint between them. (a) Image containing feature points before filtering. 3. (2004). are shown using the crease approach. Under these circumstances. It is also important to note that both patterns ν and ν could have a different number of points even being from the same individual. in order to decide about the acceptance or the rejection of the hypothesis that both images correspond to the same individual. Depending on several factors. such as the eye location in the objective. The transformation considered in this work is the Similarity Transformation (ST). rotation and sometimes a very small change in scale. (a) and (c) original images. (d) Feature point image from (c). To develop this task the matching pairings between both images must be determined. 16. rotation and isotropic scaling using 4 parameters Ryan et al. Also. 16(b) and 16(d). A reliable and efficient model is necessary to deal with these deformations allowing to transform the candidate pattern in order to get a pattern similar to the reference one.126 14 Will-be-set-by-IN-TECH Biometrics where two images from the same individual. is nearly constant for all the images. A transformation has to be applied to the candidate image in order to register its feature points with respect to the corresponding points in the reference image. (b) Feature point image from (a). the rotations are very slight as the eye orientation when facing the camera is very similar. Examples of feature points obtained from images of the same individual acquired in different times. It has also been observed that the scaling. patterns may suffer some deformations. the ST model appears to be very suitable. The set of possible transformations is built based on some restrictions and . and the obtained results in each case. A set of 23 points is obtained. The ultimate goal is to achieve a final value indicating the similarity between the two feature points set. A set of 17 points are obtained. The ST works fine with this kind of images as the rotation angle is moderate. (a) (b) (c) (d) Fig. ST can model translation. This is due to the different conditions of illumination and orientation in the image acquisition stage. due to eye proximity to the camera. The movement of the eye in the image acquisition process basically consists in translation in both axis. which is a special case of the Global Affine Transformation (GAT). 16(a) and 16(c). The transformation with the highest matching score will be accepted as the best transformation. the number of possible matches greatly decrease and. B j ) = ⎛ ⎝ ∑ SI M( Ai . Bj )2 N ⎞ (5) j =1 ∑ SI M( Ai . Dmax is a threshold introduced in order to consider the quality loss and discontinuities during the creases extraction process leading to mislocation of feature points by some pixels. two thresholds are defined.(4): distance( A. For two points A and B. In some cases. In this scenario. If distance( A.(3) formalises this restriction: T= Smin < distance( p. The distance between these two points will be used to compute that value. In order to check feature points. The mean percentage of not considered transformations by these restrictions is around 70%. To identify the most suitable matching pair. the distance between both points in each pattern has to be very similar. Bj )⎠ . As it cannot be assumed that it will be the same. Since T represents a high number of transformations. in consequence. q) < Smax distance( p . q are points from ν pattern. some restrictions must be applied in order to reduce it. Smin and Smax . As the scale factor between patterns is always very small in this acquisition process. q are the matched points from the ν pattern. and p . two pairs of feature points between the reference and candidate patterns are considered. B) > Dmax then SI M ( A. This happens because B1 and B2 are close to each other in the candidate pattern. the set of possible transformations decreases accordingly. B) = 0. to bound the scale factor. the possibility of correspondence is defined comparing the similarity value between those points to the rest of similarity values of each one of them: SI M( A. To obtain the four parameters of a concrete ST. B2 could have both a good value of similarity with one point A in the reference pattern. This way.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 127 15 a matching process is performed for each one of these. B) = 1 − P ( Ai . a similarity value between points (SI M) is defined which indicates how similar two points are. their similarity value is defined by Eq. Using this technique.two points B1 . B) (4) Dmax where Dmax is a threshold that stands for the maximum distance allowed for those points to be considered a possible match. If M is the total number of feature points in the reference pattern and N the total number of points in the candidate one. elements from T are removed where the scale factor is greater or lower than the respective thresholds Smin and Smax . Eq.(2): ( M2 − M)( N 2 − N ) (2) 2 where M and N represent the cardinality of ν and ν respectively. a constraint can be set to the pairs of points to be associated. q ) (3) where p. the size of the set T of possible transformations is computed using Eq. Bj ) − SI M( Ai . Bj ) + i =1 M SI M ( Ai . The algorithm finishes when no more non-zero elements can be selected from Q. Note that if the similarity value is 0. These images have a high variability in contrast and illumination allowing the system to be tested in quite hard conditions. 2 images per individual and 50 different images more) from VARIA database VARIA (2007) were used. Fig. N ) (6) . 4. Bj ). The main information to measure similarity between two patterns is the number of feature points successfully matched between them.17(a) shows the histogram of matched points for both classes of authentications in the training set. all images are matched versus all the images (a total of 150x150 matchings) for each metric. This means that only valid matchings will have a non-zero value in Q. j) inserted in C is the position in Q where the maximum value is stored. capping the possible number of matched points (Fig. The similarity measure (S) between two patterns will be defined by S= C f ( M. using the matched points information alone lacks a well bounded and normalised metric space. the possibility value is also 0. As it can be observed. a function f will be used. A high variability can be observed. Using this information. There are 60 individuals with two or more images acquired in a time span of 6 years. i.128 16 Will-be-set-by-IN-TECH Biometrics A M × N matrix Q is constructed such that position (i. the row (i) and the column(j) associated to that pair are set to 0. matched points information is by itself quite significative but insufficient to completely separate both populations as in the interval [10. In order to build the training set of matchings. The desired set C of matching feature points is obtained from P using a greedy algorithm. The final set of matched points between patterns is C. 18). j) holds P( Ai .e. some patterns have a small size. to prevent the selection of the same point in one of the images again. Fig. Then. Distributions of similarity values for both classes are compared in order to analyse the classification capabilities of the metrics. The rest of the images will be used for testing in the next section. as some patterns have more than twice the number of feature points of other patterns.17(b) shows the histogram for the biometric pattern size. The element (i. Also. As a result of this. This overlapping is caused by the variability of the patterns size in the training set because of the different illumination and contrast conditions in the acquisition stage. when the two matched patterns are from different individuals and clients (authorised accesses) when both patterns belong to the same person. The matchings are classified into attacks or clients accesses depending if the images belong to the same individual or not. The images from the database have been acquired with a TopCon non-mydriatic camera NW-100 model and are optic disc centred with a resolution of 768x584. the number of feature points detected. To combine information of patterns size and normalise the metric. Normalised metrics are very common as they make easier to compare class separability or establishing valid thresholds. Similarity metrics analysis The goal in this stage of the process is to define similarity measures on the aligned patterns to correctly classify authentications in both classes: attacks (unauthorised accesses). 13] there is an overlapping between them. a similarity metric must be established to obtain a final criterion of comparison between patterns. For the metric analysis a set of 150 images (100 images. Fig. where C is the number of matched points between patterns. (a) Matched points histogram in the attacks (unauthorised) and clients (authorised) authentications cases. and Fig.19(b) shows the FAR and FRR curves versus the decision threshold. 13] both distributions overlap.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 129 17 (a) (b) Fig. and M and N are the matching patterns sizes.(7).19(a) shows the distributions of similarity scores for clients and attacks classes in the training set using the normalisation function defined in Eq. . The first f function defined and tested is: f ( M. In the interval [10. (b) histogram of detected points for the patterns extracted from the training set. N ) (7) The min function is the less conservative as it allows to obtain a maximum similarity even in cases of different sized patterns. 17. N ) = min( M. 4. 0.1 Confidence band improvement Normalising the metric has the side effect of reducing the similarity between patterns of the same individual where one of them had a much greater number of points than the other.20(a) shows the distributions of similarity scores for clients and attacks classes in the training set using the normalisation function defined in Eq.(8) and Fig. White circles mark the matched points between both images while crosses mark the unmatched points.38. Although the results are good when using the normalisation function defined in Eq. the cases of .20(b) clearly exposes. it also has some issues due to the normalisation process which can be corrected to improve the results as showed in next subsection. Although this metric shows good results. a few cases of attacks show high similarity values. This allows to reduce the attacks class variability and. a small amount of detected feature points is obtained capping the total amount of matched points. N ) = √ MN (8) Fig. to separate its values away from the clients class as this class remains in a similar values range.20(b) shows the FAR and FRR curves versus the decision threshold. overlapping with the clients class.(8) combines both patterns size in a more conservative way. moreover. 18. a new normalisation function f is defined: f ( M. that some minimum quality constraint in terms of detected points would improve performance for this metric.5] as Fig. To improve the class separability. This is caused by matchings involving patterns with a low number of feature points as min( M. needing only a few points to match in order to get a high similarity value. In (b) the illumination conditions of the image lead to miss some features from left region of the image.130 18 Will-be-set-by-IN-TECH Biometrics (a) (b) Fig. even in cases with a high number of matched points. a decision threshold can be safely established where FAR = FRR = 0 in the interval [0. To take a closer look at this region surrounding the confidence band. Therefore. This suggests.(7). This means that some cases easily distinguishable based on the number of matched points are now near the confidence band borders. Example of matching between two samples from the same individual in VARIA database. N ) will be very small. Function defined in Eq. As a result of the new attacks class boundaries. as it will be reviewed in section 5. preventing the system to obtain a high similarity value if one pattern in the matching process contains a low number of points. 21 shows the histogram of matched points for cases in the marked region of Fig. Fig.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 131 19 (a) (b) Fig.20(b). The new metric is defined as: Cγ Sγ = S · C γ −1 = √ MN (9) . A correction parameter (γ) is introduced in the similarity measure to control this. (b) False Accept Rate (FAR) and False Rejection Rate (FRR) for the same metric. unauthorised accesses with the highest similarity values (S) and authorised accesses with the lowest ones are evaluated. It can be observed that there is an overlapping but both histograms are highly distinguishable. the influence of the number of matched points and the patterns size have to be balanced. (a) Similarity values distribution for authorised and unauthorised accesses using f = min( M. To correct this situation. N ) as normalisation function for the metric. 19. (b) False Accept Rate (FAR) and False Rejection Rate (FRR) for the same metric. C. values can be higher than 1. a sigmoid transference function. with S. 1] values space. Using the gamma parameter. (a) similarity values distribution for authorised and unauthorised accesses using √ f = MN as normalisation function for the metric.(8). In order to normalise the metric back into a [0. specially in cases of patterns with a high number of points. T ( x ). 20. is used: T (x) = 1 1 + es·( x−0.5) (10) . M and N the same parameters from Eq.132 20 Will-be-set-by-IN-TECH Biometrics (a) (b) Fig. Dotted lines delimit the interest zone surrounding the confidence band which will be used for further analysis. The γ correction parameter allows to improve the similarity values when a high number of matched points is obtained. 83 different from the training set and 7 from the previous set with the highest number of points. The maximum improvement is achieved at γ = 1.3 and clients accesses whose similarity is lower than 0. and.(11). A usual error measure is the Equal Error Rate (EER) that indicates the error rate where FAR curve and FRR curve intersect. Histogram of matched points in the populations of attacks whose similarity is higher than 0. Sγ ( x ). The EER is 0 for the normalised by geometrical mean (MEAN) and gamma corrected (GAMMA) metrics as it was the same case in the training set. to evaluate the influence of the image quality. the False Acceptance Rate and False Rejection Rate were calculated for each of them (the metrics normalised by Eq.12 with a confidence band of 0. The establishment of a wide confidence band is specially important in this scenario of different images from users acquired on different times and with different configurations of the capture hardware. where s is a scale factor to adjust the function to the correct domain as Sγ does not return negatives or much higher than 1 values when a typical γ ∈ [1. To test the metrics performance. the gamma corrected metric shows the highest confidence band in the test set. 21.22(a)).2337.3288. much higher than the original from previous section. Eq.22(b) where the wide separation between classes can be observed. The distribution of the whole training set (using γ = 1. is defined by: Sγ = T ( Sγ ) (11) Finally. 5. 2] is used. 0. s=6 was chosen empirically. in terms of feature points detected per image. the confidence band improvement has been evaluated for different values of γ (Fig. In this work. to choose a good γ parameter. Fig. has been built in order to test the metrics performance once their parameters have been fixed with the training set. Finally. a test is run where images with a biometric pattern size below a threshold are removed .23(a) shows the FAR and FRR curves for the three previously specified metrics.6.12) is showed in Fig. Results A set of 90 images. again.(8) and the gamma-corrected normalised metric defined in Eq.(7). The normalised gamma-corrected metric.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 133 21 Fig. 12. 25.3317. was 155ms: 105ms in the feature extraction stage and 50ms in the registration and similarity measure estimation. the band is increased only by 0. The mean execution time on a 2. (a) Confidence band size vs gamma (γ) parameter value. Fig. for the set and the confidence band obtained with the rest of the images is evaluated.12. So removing half of the images.098 suggesting that the gamma-corrected metric is very robust to low quality images.134 22 Will-be-set-by-IN-TECH Biometrics (a) (b) Fig. The confidence band does not grow significatively until a fairly high threshold is set.23(b) shows the evolution of the confidence band versus the minimum detected points constraint.2.2337 to 0. the confidence band grows from 0. Maximum band is obtained at γ = 1. Intel Core Duo desktop PC for the authentication process. 22. . so that the method is very well-fitted to be employed in a real verification system. Taking as threshold the mean value of detected points for all the test set. implemented in C++.4Ghz. (b) Similarity values distributions using the normalised metric with γ=1. The best confidence band is the one belonging to the gamma corrected metric corresponding to 0.(b) Evolution of the confidence band using a threshold of minimum detected points per pattern.2337. (a) FAR and FRR curves for the normalised similarity metrics (min: normalised by minimum points. 23. mean: normalised by geometrical mean and gamma: gamma corrected metric). .Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 135 23 (a) (b) Fig. G. This is also possible as this methodology does not rely on the localisation or segmentation of some reference structures. K. Ross. Penedo. C. To get the set of feature points. L. C. N.-Y. S..-T.Gonzalez (2003). A. Kim. Sun. & Marks. (2002)... Sci. Following the same idea as the fingerprint minutiae-based methods. 403–406. Cho. M. D. A prototype hand geometry-based verification system. (2004). Z.. Farzin. S. 7. Boosting ordinal features for accurate and fast iris recognition. & González. (1999). H. VLSI Signal Processing 38(2): 147–156. R. (2004). Choi. Tan. M. Conclusions and future work In this work a complete identity verification method has been introduced. Chou.. if the the user suffers some structure distorting pathology and this structure cannot be detected. (2004). EURASIP Journal on Advances in Signal Processing ID 280635: 10 pp. (2006). pp. Pankanti (eds). R. R. the system works the same with the only problem being a possible loss of feature points constrained to that region. J. pp. Fingerprint minutiae: A constructive definition. Pattern Recognition 37(2): 337–349.Mariño. 132–140. Normalised metrics have been defined and analysed in order to test the classification capabilities of the system. a creases-based extraction algorithm is used.J. IIH-MSP. H. Biometrics: Personal Identification in Networked Society. References Bolle. A. 47: 34–42. 58–66.-W. X. Boston.. Jain. it is possible to measure the degree of similarity by means of a similarity metric. a set of feature points is extracted from digital retinal images. M. I. Retinal angiography based authentication. Design of gabor filter banks for iris recognition. A novel retinal identification system. Zhong. Y. in A. CVPR. (2008). C.. pp... After that. Retina identification.. Vet. ICIAR (2). 166–171... De Schaepdrijver. J. Bolle & S. & Lee. M. M. & Pankanti. He. H. Z. 123–142. W.G. S.Carreira & F. Gómez-Ulla. Face recognition using the second-order mixture-of-eigenfaces method. Lecture Notes in Computer Science 2905: 306–313. Carreira. F.-C. S. W. & Pankanti. Retinal vascular patterns in domestic animals. M. . Lauwers. L. Bang. (1989). Mariño. (2008). Kluwer Academic Press. Jain. & DeGesst..Penedo. H. J. Finally. Hill. J. The use of feature points to characterise individuals is a robust biometric pattern allowing to define metrics that offer a good confidence band even in unconstrained environments when the image quality variance can be very high in terms of distortion. D. pp.-Y. G. Thus. Kim.136 24 Will-be-set-by-IN-TECH Biometrics 6. Automatic extraction of the retina av index.. C.. Kim. Simoens. AVBPA. Res. Abrishami-Moghaddam. F.. S.. K. A. With the patterns aligned. S. as it might be the optic disc.. Senior. & Dong. a recursive algorithm gets the point features by tracking the creases from the localised optic disc. Future work includes the use of some high-level information of points to complement metrics performance and new ways of codification of the biometric pattern allowing to perform faster matches. J.. illumination or definition. Biometric Authentication. Qiu. This unique pattern will allow for the reliable authentication of authorised users. pp. & Chen. Caderno. R. (1999). The results are very good and prove that the defined authentication process is suitable and reliable for the task. T. Ratha. Iris recognition using wavelet features. Shih. a registration process is necessary in order to match the reference pattern from the database and the acquired one. & Moin.-S. . D. M. D.203. Fingerprint identification by use of a volume holographic optical correlator.López & Villanueva. Sidlauskas. Lloret. (2008). 156–169.. (2001). pp.S. Graph application on face for personal authentication and recognition.J. D. A. Evaluation of methods for ridge and valley detection. Mariño. López. D. Dis. Probabilistic visual learning for object representation. A. (1999).). systems Lab. B. M. Lloret. United States Patent No. A. López. Respir. Mariño. a hand shape identification system. A. G. A. Real-time algorithm for retinal tracking. J. F. Moghaddam.. (1998). F. & Maltoni.. Y. Simon. J. A. D. Pattern Anal. A. R.. pp.. ECCV. (2002). Personal authentication using digital retinal images. An improved iris recognition system using feature extraction based on wavelet maxima moment invariants. (1935). IEEE Trans. M. Keypoint detection and local feature matching for textured 3d face recognition. Ma. Seung-Hyun. Wang. Creaseness from level set extrinsic curvature. A. A. Carreira.M. (1989).. P. Proc. B. (2007). Tistarelli. Am. D. Mian. & Bouridane. (1997).. (2000).Brown (1992). M. C. Proceedings of the IX Spanish Symposium on Pattern Recognition and Image Analysis. & Eun-Soo. ICARCV. (1988). (2008). F. J. & Villanueva.. S. C. pp. D.. Sang-Yi. T. Serrat. L.G. Creaseness-based CT and MR registration: comparison with the mutual information method. on Pattern Analysis and Machine Intelligence 21(4): 327–335. . J. P. Ryan. Heneghan. ICPR (2). J. & Gupta. & Pentland. (1997). Optical Pattern Recognition pp. & Serrat. M. L. Serrat. Pattern Anal. Penas. M. Rattani. M. López. Rev. Iris recognition using circular symmetric filters. Serrat. IEEE Trans.. 3715. & Gonzalez. J. J.4. I. Computer Vision and Image Understanding 77(1): 111–144. & Owens. & Villanueva.Rylander & A. Embryonic origins and assembly of blood vessels.. Nabti. on Biomedical Engineering 40(12): 1269–1281. (2004). H. Registration of digital retinal images using landmark correspondence by expectation maximization. D. & Tan. IEEE Trans.. IEEE Trans.. J. Image and Vision Computing 22: 883–898. M. Penedo. Lumbreras. A survey of image registration techniques. M. A. ACM Computer Surveys 24(4): 325–376. 321–325. J. & de Chazal. Landmark-based registration of full SLO video sequences. J. Mach. A. J. N. & Goldstein. ICB. 19(7): 696–710.G. Siguenza Pizarro. López. Lumbreras. Hasis. Bennamoun. International Journal of Computer Vision 79(1): 1–12. (2005). Pattern Analysis and Applications 9: 21–33. pp. Vol. Medicine 35(18): 901–906. J. Y. Direct gray-scale minutiae detection in fingerprints. (n.d. Maio. Serrat.Retinal Vessel TreePattern Retinal Vessel Tree as Biometric as Biometric Pattern 137 25 Kisku. (1995). & Villanueva. pp. 988–996. 3d hand profile identification apparatus. 414–417.Markov.. 189–194. 1150–1155. C. Journal of Electronic Imaging 8(3): 255–262.736. Lloret. & Tapiador Mateos. SPIE Vol. K. (1999). A new scientific method of identification. Intell. Multilocal creaseness based on the level-set extrinsic curvature. C.. 140: 1097–1103. Intell..Welch (1993). I. 19(1): 27–40. (2006). Tecnologias Biometricas Aplicadas a Seguridad.. Ra–Ma. Mach. R. Noden. L.. & Golden. J. N. Venkataramani. B. D. Hand geometry based verification. Henrickson. The fundus oculi in monozygotic twins: Report of six pairs of identical twins. Kluwert Academic Publishers. Yang. Shadduck. Whittier. & Flusser. J. Ophthalmol.-H. ICIP. & Kumar. K. C. AVBPA. (1999).. Doubet. VARPA Retinal Images for Authentication. BIOMETRICS:Personal Identification in Networked Society. M. (1955). (2003). Cobb.. J..varpa. Biological considerations pertaining to use of the retinal vascular pattern for permanent identification of livestock. B. http://www. Fingerprint verification using correlation filters. Sci 81: 1–79. Image registration methods: a survey.es/varia. D. J. pp.. Zunkel. R. J. Face recognition using kernel eigenfaces. (2003). V. Zitová. VARIA (2007).. Ahuja.html. . J. Anim. P. (2000). 54: 225–239. (2003). J. & Kriegman. 886–894. Image Vision and Computing 21(11): 977–1000.138 26 Will-be-set-by-IN-TECH Biometrics Tower. Arch. is folded inside the nucleus of each cell. in which two antiparallel strands spiral around each other in a double helix. However. X and Y. and C with G (Alberts. Human identification based on DNA polymorphism A human body is composed of approximately of 60 trillion cells. Retinal scanning. not only among human beings but among all living creatures. 2002. These four letters represent the informational content in each nucleotide unit. 2004).7 DNA Biometrics Division of Forensic Medicine. The bases of each strand project into the core of the helix. Tohoku University Graduate School of Medicine Japan 1. DNA. and does not change during a person’s life or after his/her death. Watson. DNA exists in the double-stranded form. Within human cells. respectively. typified by fingerprint. voice dynamics and handwriting recognition are also being developed. Males are described as XY since they possess a single copy of the X . C and T. have been making rapid progress. cytosine and thymine. deficiencies. a sugar. deoxyribonucleic acid (DNA) provides the most reliable personal identification. and is composed of nucleotide units that each has three parts: a base. face recognition and iris scanning. variations in the nucleotide sequence bring about biological diversity. A pairs strictly with T. G. what are the advantages. DNA found in the nucleus of the cell (nuclear DNA) is divided into chromosomes. where they pair with the bases of the complementary strand. abbreviated A. DNA is a polymer. It is intrinsically digital. The bases are adenine. Department of Public Health and Forensic Medicine. guanine. Meanwhile. human cells contain 46 different chromosomes. the phosphate and sugar portions form the backbone structure of the DNA molecule. The human genome consists of 22 matched pairs of autosomal chromosomes and two sex-determining chromosomes. these methods are based on the measurement of similarity of feature-points. Introduction The biometric authentication technologies. how can a personal ID be generated from DNA-based information? And finally. Among the various possible types of biometric personal identification system. and future potential for personal IDs generated from DNA data (DNA-ID)? Masaki Hashiyada 2. Within a cell. This chapter addresses three questions: First. how can personally identifying information be obtained from DNA sequences in the human genome? Second. and a phosphate. This introduces an element of inaccuracy that renders existing technologies unsuitable for a universal ID system. These methods have been commercialized and are being incorporated into systems that require accurate on-site personal authentication. which can be thought of as the blueprint for the design of the human body. In other words. These repeated DNA sequences come in all sizes. These regions are referred to as satellite DNA (Jeffreys et al. is in the range of approximately 8−100 bases in length (Jeffreys et al. however. referred to as a minisatellite or VNTR (variable number of tandem repeats). is extragenic.1 Sort tandem repeat (STR) Primer 1 2 7 repeats 3 4 5 6 7 Primer 5 6 7 8 8 repeats 1 2 3 4 9 repeats 1 2 3 4 5 6 7 8 9 10 repeats 1 2 3 4 5 6 7 8 9 10 11 repeats 1 2 3 4 5 6 7 8 9 10 11 Target region (short tandem repeat) 1 2 3 … ・ 2-nucletotide repeat unit : (CA)(CA)(CA)・・・・ ・ 3 -nucletotide repeat unit : (GCC)(GCC)(GCC) ・・・・ ・ 4 -nucletotide repeat unit : (AATG)(AATG)(AATG) ・・・・ ・ 5 -nucletotide repeat unit : (AGAAA)(AGAAA) ・・・・ Fig. Therefore. while females possess two copies of the X chromosome and are described as XX. approximately 75%.. either between genes or within genes (i.. These regions are sometimes referred to as ‘junk’ DNA. 2004). these regions consist of exons (protein-coding portions) and introns (the intervening sequences) and constitute approximately 25% of the genome (Jasinska & Krzyzosiak.. and are typically designated by the length of the core repeat unit and either the number of contiguous repeat units or the overall length of the repeat region. 1. DNA regions with . Markers commonly used to identify individual human beings are usually found in the noncoding regions.. The structure of Short Tandem Repeat (STR) In the extragenic region of eukaryotic genome. 2001). recent research suggests that they may have other essential functions. The regions of DNA that encode and regulate the synthesis of proteins are called genes.e. introns). The human genome contains only 20. Venter et al. 1995).000−25.140 Biometrics chromosome and a single copy of the Y chromosome... 1985). most of the genome.000 genes (Collins et al. there are many repeated DNA sequences (approximately 50% of the whole genome). 2004. 2001. Lander et al. The core repeat unit for a mediumlength repeat. 2. The swab is then air dried.1 DNA sample collection DNA can be easily obtained from a variety of biological sources. 1992)(Fig.1995.. The flow of DNA polymorphism analysis 2. Lee et al.1.. 1991. 2008).Jeffreys et al. they are homozygous. a buccal swab is the most simple. Buccal cell collection involves wiping a small piece of filter paper or a cotton swab against the inside of the subject’s cheek. 2007a)... hair and used razors (Anderson et al. 2010).” The type of STR is represented by the number of repeat called ‘allele’ which is taken from biological father and mother. convenient and painless sample collection method (Hedman et al. 1998. simple sequence repeats (SSRs). The use of large quantities of fresh buccal cells made it possible to extract DNA in a short time. or can be pressed against a treated collection card in order to transfer epithelial cells for storage purposes. . The number of repeats in STR markers can vary widely among individuals. For biometric applications. Fig. Lee & Ladd. including both the 22 autosomal chromosomes and the X and Y sex chromosomes.2 DNA extraction and quantification There are many methods available for extracting DNA (Butler. when they have two different alleles. cost. Hagelberg et al. The location of an STR marker is called its “locus. 1). 2. The author has already reported the “5-minute DNA extraction” using an automated procedure (Hashiyada. STRs have become popular DNA markers because they are easily amplified by the polymerase chain reaction (PCR) and they are spread throughout the genome.1. especially the number of samples. Extraction time is the critical factor for biometric applications. they are heterozygous.DNA Biometrics 141 repeat units that are 2−7 base pairs (bp) in length are called microsatellites. not only body fluid but also nail. 2001). The choice of which method to use depends on several factors. When an individual has two copies of the same allele for a given marker. and speed. ... in order to collect shed epithelial cells. or most commonly short tandem repeats (STRs) (Clayton et al. 2001). making the STRs an effective means of human identification in forensic science (Ruitberg et al. 1999. 2. some STR alleles differ by only 1 base-pair.. it has become possible to PCR amplify 16 STRs.. 1987. the fluorescence labelling of PCR products followed by multicolour detection has . Foster City. 2000). or PCR (Mullis et al. DNA amplification with polymerase chain reaction (PCR) 2.4 DNA separation and detection After STR polymorphisms have been amplified using PCR.142 Biometrics In forensic cases. 1986). Recently. 1993. USA). However. Fig.’ in one tube (Kimpton et al. 1996). First described in 1985 by Kary Mullis. CA. During each cycle. a copy of the target DNA sequence is generated for every molecule containing the target sequence. including the gender assignment locus called ‘amelogenin. Such multiplex PCR is enabled by commercial typing kits.3 DNA amplification (polymerase chain reaction: PCR) The field of molecular biology has greatly benefited from the discovery of a technique known as the polymerase chain reaction.. such as AmpFlSTR® Identifiler® (Applied Biosystems. Madison.1. PCR is an enzymatic process in which a specific region of DNA is replicated over and over again to yield many copies of a particular sequence. USA) and PowerPlex® 16 (Promega. In recent years. Electrophoresis of the PCR products through denaturing polyacrylamide gels can be used to separate DNA molecules from 20−500 nucleotides in length with single base pair resolution (Slater et al.1. Saiki et al. Kimpton et al. 1986.. 2. This molecular process involves heating and cooling samples in a precise thermal cycling pattern for approximately 30 cycles. Mullis & Faloona.. DNA quantitation is an important step (Butler. WI. PCR has made it possible to make hundreds of millions of copies of a specific sequence of DNA in a few hours. the length of products must be measured precisely. 2010). 3. who received the Novel Prize in Chemistry in 1993. this step can be omitted in biometrics because a relatively large quantity of DNA can be recovered from fresh buccal swab samples. e.DNA Biometrics 143 been adopted by the forensic science field.. 2010). the alleles (i. However. capable of high-throughput analysis. one allele is a cytosine (C) and the other is a thymine (T) (Fig. 2001). However.. SNPs normally have just two alleles. the type or the number of STR repeat units). 40 to 50 SNPs must be analyzed in order to obtain reasonable powerful discrimination and define the unique profile of an individual (Gill. 4). it is necessary to analyse many more SNPs. Importantly. e. the Y chromosome (Chr Y) and mitochondrial DNA (mtDNA) . to obtain data from 16 STRs including the sex determination locus. a single base difference at a particular point in the sequence of DNA (Brookes. 4.1 SNP detection methods Several SNP typing methods are available. Thus. 2. starting with DNA extraction.3 Lineage markers Autosomal DNA markers are shuffled with each generation. which means that half of an individual's genetic information comes from his or her father and the other half from his or her mother. Furthermore. 2.. It takes around four hours.2 Single nucleotide polymorphism (SNP) The simplest type of polymorphism is the single nucleotide polymorphism (SNP). After data collection by the CE. Fig. are analyzed by the software that accompanies the CE machine.. 1999). however. SNPs are prospective new bio-markers in clinical medicine (Sachidanandam et al. 2009). SNPs are so abundant throughout the genome that it is theoretically possible to type hundreds of them. unlike the STR analysis (Butler. In order to achieve the same power of discrimination as that provided by STRs. The schema of Single nucleotide polymorphism (SNP) 2. in the near future. Electrophoresis platforms have evolved from slab-gels to capillary electrophoresis (CE). each with its own strengths and weaknesses.2. sample processing and data analysis may be more fully automated because size-based separation is not required. 2001. Stenson et al. we can count on the development of new SNP detection technologies. Up to five different dyes can be used in a single analysis.g.. which use a narrow glass filled with an cross-linked polymer solution to separate the DNA molecules (Butler et al. 2004). SNPs therefore are not highly polymorphic and do not possess ideal properties for DNA polymorphism to be used in forensic analysis. No one model is best for all situations. Generate a DNA-ID α X according to the following series. This introduces an element of inaccuracy that renders the existing technologies unsuitable for a universal ID system. there are some features of both Chr Y and mtDNA that make them valuable forensic tools. Measure the STR alleles at each locus. retina and the patterns of vein and hand geometry (Shen & Tan. Vijaya Kumar et al. However. k). signature. that includes allelic information about STR loci. Even so. such as STRs and SNPs.e. could provide the most reliable personal identification. it is necessary to statistically analyze how the distribution of (j.144 Biometrics markers are called “lineage markers” because they are passed down from generation to generation without changing (except for mutational events). j ≤ k Depending on the measurement. k|j ≤ k). voice. such as the high cost of analysis. In addition. 1999) and whereas paternal lineages can be traced using Chr Y markers (Jobling & Tyler−Smith. there are several other problems. Before (j. in order to establish a one-to-one correspondence for each individual. Ln . k) or (k. as described in section 2. Obtain STR count values for each locus. as (j. is intrinsically digital. face. as shown in Fig. Kayser et al. . by way of example. 2004). The repeat counts for the pair of alleles at each locus are arranged in ascending order. and the other from the mother. k). specifically. In addition. 2: one allele is inherited from the father. Li (j. j and k are expressed in an ascending order. 1981. 3. the biggest problem in using DNA is the time required for the extraction of nucleic acid and the evaluation of STR or SNP data. DNA identification data is utilized in the forensic sciences. 1999. Each locus is associated with two alleles with distinct repeat counts (j.. iris.. 2003. 3. i. Step 1. . Andrews et al. j). The loci are incorporated in the following sequence.1 DNA personal ID using STR system We will refer to repeat counts of alleles obtained by STR analysis. Maternal lineages can be followed using mtDNA sequence information (Anderson et al. Therefore. the author proposes DNA INK for authentic security. Therefore. the same person's STR count may appear as (j. k) varies at a given locus based on actual data.. 2004). k): α X = L1 L2 L3 . On the negative side. these technologies are based on the measurement of similarity of features. L: j k.1. k) can be applied to a DNA personal ID. The analysis of lineage markers does not have the discriminatory power of autosomal markers. issues raised by monozygotic twins. This data can be precisely defined the most minute level. This section describes a method for generation of DNA personal ID (DNA-ID) based on STR and SNP data. express these in ascending order.. and does not change during a person’s life or after his/her death. This step is referred to as a ordering operation. Step 3. In addition. Step 2. DNA polymorphism for biometric source The most commonly studied or implemented biometrics are fingerprinting. and ethical concerns. using (j. DNA polymorphism information. α X .. We can generate a DNA-ID. it must be encrypted to protect privacy. using the STR analysis system described above. . k) if k >j. suppose that Mr. Finally. For example. α X = 1214811131529322 …… 1010 When the STR number of an allele had a fractional component. Therefore. 3. j) occurs. pjk is expressed as follows when j ≠ k (j <k): pjk = pj • pk + pk • pj = 2pjpk If j = k.2 ) … ( 10. it is treated as (j.1 Matching probability at locus L The probability that a STR allele (j.14 ) ( 8. Here. the secure hash algorithm-1 (SHA-1). M has the following alleles at the respective loci. and becomes a personal identification information that is unique with a certain probability predicted by statistical and theoretical analysis. pjk = pj • pj 3.2. the probability that (j.11) ( 13.15 ) ( 29. The reason for this is as follows.2 Statistical and theoretical analysis of DNA-ID 3.10 ) The α X was thus defined as follows.1.2. and all of the numbers. k) at locus L will occur in this combination is denoted as pjk. This can be achieved using a one-way function that also reduces the data length of the DNA-ID. were retained. k).DNA Biometrics 145 where Li indicates the ith STR count (j. j and k are sequenced in ascending order to make the choice of generated ID unambiguous. After the STR analysis. such as allele32. 32. α X = D3S1358 D13S317 D18S51 D21S11 . produces an ID with a length δ X of 160 bits. k) occurs is pjk plus the probability pkj that (k. respectively. k) by rewriting it as (j. Even if (k. . This one-way function. according to the following transformation: δX = h ( αX ) 3.2 Probability of a match between any two persons’ DNA-ID Probability p that the STR count at the same locus is identical for any two persons can be expressed as follows: .2 in D21S11. the decimal point was removed. j) occurs in the same person during measurement. D16S539 = ( 12. including those after the decimal point. The individual occurrence probabilities of j and k are denoted as pj and pk.1 Establishment of the identification format Because α X contains personal STR information. α X is generated number with several tends of digits. .146 Biometrics When j = k.9999998995 and the combined matching probability was 1 in 9. These values were calculated using polymorphism data from Japanese subjects. Caucasian or African.. were analyzed and showed a relatively high rate of matching probability.000 times). which means they are statistically independent. the genotype frequencies can be predicted from the allele frequencies. ∑( p j =1 m m j ⋅ pk ) 2 When j ≠ k. ∴ p = ∑ pj j =1 m 1≤ j < k ≤ m ∑ (2p m 1≤ j < k ≤ m j ⋅ pk ) 2 ( ) 4 + ∑ 4(p j ⋅ pk ) 2 Here. D5S818 and CSF1PO on chr 5. PCR amplification and STR typing Step 2. the probability pn that the DNA-IDs of any two persons will match (the DNA-ID matching probability) is as follows: pn = ∏ pi i =1 n Here. and AmpFlSTR Identifiler™ (Applied biosystems) (Hashiyada. Step 1. Perform the exact test (the data were shuffled 10. Calculate parameters. and the information reported so far indicates m= 60. it is likely that different values would be obtained using data compiled from different ethnic groups. the power of discrimination. 1. There are some loci on the same chromosomes (chr) such as D21S11 and Penta D on chr 21. m is the upper limit of j and k. 3.e. Next. it is assumed that there is no correlation among the STR loci.2.3 Verification using validation experiment (STR) As a validation experiment. Step 3. the polymorphic information content. where n loci were used to generate the ID. HWE provides a simple mathematical representation of the relationship among genotype and allele frequencies within an ideal population. the mean exclusion chance. . we studied the genotype and distribution of allele frequencies at 18 STRs in 526 unrelated Japanese individuals. 2003a. The combined mean exclusion chance was 0.98 × 1021. the matching probability. no significant deviation from HWE was detected. Importantly. the expected and observed heterozygosity. 2003b).g. No correlation was found between any sets of loci on the same chromosome. Information about the 18 target STRs is described in Table 1. The probability that the STR counts at the ith locus will match for any two persons is denoted as pi. e. i. In addition. PowerPlex SE33 (Promega). When n loci are used. a determination is made of the DNA-ID matching probability pn.0024 × 10−22. and TPOX and D2S1338 on chr 2. the homozygosity. excluding the Amelogenin locus. and likelihood ratio tests using STR data for each STR locus in order to evaluate Hardy–Weinberg equilibrium (HWE). Data was obtained using three commercial STR typing kits: PowerPlex™ 16 system (Promega). and is central to forensic genetics. Perform DNA extraction. the statistical data for the 18 analyzed STRs. in order to estimate the polymorphism at each STR locus. when a population is in HWE. 13 Repeat Motif* GAAT TGCC/TTCC TCTG/TCTA CTTT/TTCC AGAT TAGA AAAG GATA TCTA/TCTG Locus TH01 VWA D13S317 Penta E D16S539 D18S51 D19S433 D21S11 Penta D Chromosome Location 11 p 15. a population of tens of millions could be expected to include pairs of individuals with identical STR alleles. and is called a “paradox.0024 × 10−22 .ID. This formula can use an approximate value of 1-(1-PM)L(L-1)/2 . when using 18 loci. between two first cousins.3 5 q 23. we must investigate the probability of two or more randomly selected people having an identical DNA.1 6 q 14 7 q 21. As the number of people in a community increases. the probability that at least two students have the same birthday is approximately 0.2 5 q 33. discrimination between half siblings requires analysis of 57 STR loci guarantee a unique DNA-ID. when using DNA identification systems such as STR systems for DNA-personal-IDs. however. if STRs are to be used as an authentication system in our society.11 8 q 24. is defined as the practical matching probability (PPM). 1-(1-PM)L(L-1)/2. we can obtain a unique DNA-ID. If the frequencies of STR alleles are similar among all ethnic groups.” because for any single pair of students. the more the practical matching probability increases. However. L(L-1)/2 · PM. Therefore.31 13 q 31. In addition. The paradox arises when we forget to consider that we are selecting samples randomly out of the members in a group.4 The “Birthday Paradox” of DNA-ID In principle.DNA Biometrics 147 Locus TPOX D2S1338 D3S1358 FGA D5S818 CSF1PO SE33 D7S820 D8S1179 Chromosome Location 2 q 25. Of 40 students in a class. so we use the expected value. 2007b).5 12 p 13.1 21 q 22. to estimate two persons having the same STR genotype. Thus. the PPM should be considered for both related and unrelated individuals (Hashiyada. each person in the population could have a unique DNA-ID.2. In this report. where L is the population size. as described above.1 15 q 26. The most well-known simulation of this probability is “the birthday paradox“. each person in Japan (or the world) could have a unique DNA-ID if the PPM of the STR system were approximately 10−24 and 10−30. the probability that they have the same birthday is 1/365 (0. is beyond the ability of personal computers. L(L-1)/2 · PM.3 2 q 35 3 p 21. The matching probability (PM) for 18 STRs is 1. the formula.33 19 q 12 21 q 21.2 16 q 24. we also need to consider PPM between related individuals.0027). respectively. This number can be applied for unrelated persons. the probability that one STR locus is different and that all STR loci are identical is (1-PM)L(L-1)/2 and 1-(1-PM)L(L-1)/2. However. This result seems counterintuitive.31 4 q 31. Information about autosomal STR loci 3. When PPM multiplied by the population size is less than 1. In two randomly selected individuals.9. if 41 STR loci are analyzed.1 18 q 21. respectively. the low matching probability of STR-based IDs would allow absolute and unequivocal discrimination between individuals. and because 1-(1-PM)L(L-1)/2 is smaller than L(L-1)/2 · PM when L is not small.3 Repeat Motif* TCAT TCTG/TCTA TATC AAAGA GATA AGAA AAGG/TAGG TCTA/TCTG AAAGA * Two types of motif means a compound or complex repeat sequence Table 1. This is because L2 is much smaller than 1/ PM when L is small. the value. For instance. . α X = 0000101111101010……0100 This X must be encrypted for privacy protection using the secure hash algorithm-1 (SHA-1) for the same reasons as described above for STRs. 25 to 45 SNP loci are needed in order to yield equivalent random match probabilities comparable to those obtained with the 13 core STR loci that have been adopted by the FBI’s DNA database (COmbined DNA Index System. Analyze the SNP loci and place them in the following order. the four types of nucleotide. Since DNA has a double helix structure. T=11 Finally. The resulting DNA-ID (SNP) has a length δX of 160 bits. Ln where Li indicates the ith SNP nucleotide (allele1.C .T) SNP 3 (T.A) Then X would be defined as follows. it is important to specify which strand of the double helix is to be analyzed. α X = AACTTCCC……GA Next. Computational analysis have shown that on average. C=10. meaning that they have two possible alleles and therefore three possible genotypes. 4). Define alleles 1 and 2 for each SNP locus. L : allele 1 allele 2 Step 3. according to the following transformation: δX = h ( αX ) .. are translated into binary notation. three possible genotypes would be RR. α X = SNP 1 = (A. The steps of creating a DNA-ID using SNPs are as follows. CODIS). Generate the DNA-ID α X according to the following series of Li (allele1. C and T. G. For example. the X is described as a string of 100 bits (digits of value 0 or 1).. C(cytosine) and T(thymine) nucleotide). G=01. G(guanine). Step 1.3 DNA personal ID using SNP system The vast majority of SNPs are biallelic. it is necessary to analyze a larger number of SNPs in order to obtain a reasonable power of discrimination to define a unique profile. A=00. For example. if the alleles for a SNP locus are R and S (where ‘R’ and ‘S’ could represent a A(adenine). A.A) SNP 2 (C. Step 2.C) SNP 4 (C. Because a single biallelic SNP by itself yields less information than a multiallelic STR marker. In other words. respectively (Fig. allele2): α X = L1 L2 L3 .148 Biometrics 3.. RS (SR) or SS. and to define allele 1 and allele 2 at the outset. the single nucleotide polymorphism of A or G is the same polymorphism of T or C.. …… SNP 50 (G. allele2). suppose that a person has the following alleles at the respective loci. Although several SNPs were located on the same autosomal chromosome. the author demonstrates an example of an application of STR polymorphism information. Perform STR analysis by the method described above. Each set of 8 data bits is extended by two redundant bits known as the shift and check bits.63 × 10−18). The author outlines the development of biometric ink containing DNA whose sequence is based on personal STR information.440 × 10−17. no significant deviation from Hardy−Weinberg Equilibrium (HWE) was detected.465 (Hashiyada. 3. ID. These limiting factors are necessary in DNA sequence analysis in order to exclude five or more repetitions of the same base. which is capable of amplifying the DNA and typing the SNPs at the same time.375−0. which have high MP in each loci. This new method is capable of detecting and typing 96 SNPs within 30 minutes (Hashiyada et al. and selected SNP loci that yield successful results under the modified PCR conditions. From DNA extraction to STR typing.3. specifically the authentication of rare or expensive goods using the DNA-ID. The extracted 40-bit data as follows.4 Rapid analysis system of SNP A reduction of the time required for DNA analysis is necessary in order to make practical use of DNA biometrics. The original 160-bit length was defined as δ X i = δ X|δ X2 δ X|δ X4 | 3 | 1 where δX 1. respectively in Japanese population. The author developed the SNP typing methodology using the modified TaqMan® method. δX 3 and δX4 refer to the identification. Step 2. which serve not only as check but also as limiting factors in the latter stages of DNA sequence generation. Step 3. it is difficult to decrease the analysis time because it is necessary to perform electrophoresis after PCR amplification.DNA Biometrics 149 3. PowerPlex™ 16 System(Promega) and AmpFlSTR Identifiler (Applied Biosystems). containing of the first.369 × 10−18 and 1. there are many methods for analyzing SNPs that do not demand such a lengthy process. consisting of 160 bits. which were 5. The MP for 41 SNPs (3. second. δX2.5 DNA INK In this paragraph. Furthermore. However. the entire process takes 4−5 hours. In the STR system. δ X 1 =1001100110011101011010101001011110100010 δ X 1 =10011001 [10] 10011101 [00] 01101010 [01] 10010111 [00] 10100010 [11] .. Generate the DNA-ID. was very similar to the MPs obtained with the current STR multiplex kits. 2007a). The macthing probability (MP) of each SNP ranged from 0. Step 1.1 Verification using validation experiment (SNP) As a validation experiment. and built a Japanese SNP database for identification. no correlation was found between alleles at any SNP loci. the author analyzed 120 autosomal SNPs in 100 unrelated Japanese subjects using the TaqMan® method (Applied Biosystems). Extract one-quarter of the data in the DNAI-ID ( δ X ) in order to reduce costs and improve practicality. The author modified the number of PCR cycles and the annealing/extension time. 3. The “DNA INK” is made of synthetic DNA and printing ink. δX . third and fourth 40 bits of δX. 2009). as described above. We called this step the “Encodeed Base Array“ method. 5. This synthetic DNA could be amplified by PCR. Define the identification data format by adding a header (H. 00=A(adenine).) Step 4. N: Serial number . add dummy DNA in order to make the DNA-ID sequence difficult to analyze by someone who does not know the primer sequences. therefore. Step 6. 10=G(guanine). 10 bits) and a serial number (N. Transform the bit series generated above into base sequences according to the following scheme. 30 bits) to δ X 1 (40 + 10 = 50 bits). Mix 3 mg of double-strand DNA with 100 ml of ink. The resulting DNA sequence. N (15-bp) and δX 1 (25-bp) would then be flanked by two 20bp−long primer sequences. Sequence structure of the 85-bp single-strand DNA-ID P1. In addition. but contains an IR color former that enables easy detection of the printed mark. P2: Primer sequences are designed so as not to anneal to the human genome H: Header. and only those who know the primer sequences would be able to analyze the intervening sequence.150 Biometrics (Shift and check bits show as square brackets with underlines. but much less physically stable. so that it is invisible to the naked eye. double-strand PCR-amplified DNA should be used for incorporation into the DNA ink. Fig. Synthesize the complementary strand. Step 7. 11=T(thymine) δ X 1 =10011001 [10]10011101 [00] 01101010 [01] 10010111 [00] 10100010 [11] =GCGC [G] GCTC [A] CGGG [C] GCCT [A] GGAG [T] Step 5. consisting of H (5-bp). Synthetic single-strand DNA is more economical to produce than double-strand DNA. 01=C(cytosine). The ink itself is composed of a colorless transparent pigment. Figure 5 shows the structure of the 85-bp synthetic DNA sequence. and the ability to use the same analysis platform all over the world.3 Monozygotic twins and DNA chimeras Monozygotic twins.. It takes at least 4 hours to get STR identification data by common methods used in forensic science. because the DNA-ID system involves handling information that can identify each individual. Most of the time required for DNA analysis is taken up by PCR amplification and electrophoresis. and furthermore that environmental factors can change gene expression and susceptibility to disease by affecting epigenetics. i. ultraviolet (UV) and sunlight. and can‘t be distinguished by DNA polymorphism. 3. acids. Itakura et al. It is impossible to dramatically shorten the duration of these steps using existing technologies. The weak points of DNA-ID are discussed below. Samples printed using DNA ink were covered with zinc oxide (ZnO) on the surface in order to enhance resistance to UV light. Problems of DNA biometrics There can be no doubt that DNA-ID is potentially useful as a biometric. However.. which is the major cause of DNA degradation. discriminatory power (and ease of increasing this power). alkalis.DNA Biometrics 151 The several types of resistance tests. the durability improved when the ink was covered by ZnO. which is fertilized by one sperm but then splits into two eggs early in the gestational period.1. however: it is possible to analyze 96 SNPs within 30 minutes (Hashiyada. it should be strictly supervised in order to protect privacy. begin life as a single egg. However. raw materials like buccal swab should be especially tightly controlled in order to prevent spoofing. 4. Once the DNA-ID has been generated. on the surface of everything excluding the air and water. Therefore. including accuracy. sometimes one member of a pair of identical twins can develop cancer or schizophrenia while the other does not (Zwijnenburg et al. were used to ascertain the durability of DNA ink for practical use. DNA polymorphism information is not widely used in biometrics at this point. allowing successful amplification even after 40 hours of UV exposure. 4. Therefore. 2009). It has many advantages. the one-way encryption described above makes it impossible to recover any of the original DNA information (3. 2010). However. strictness. the DNA ink was proved as a sort of biological memory which could print the polymorphism information created by DNA. alcohol. The target DNA sequence was detected successfully in all resistance tests except for the UV exposure test.3). 4. or more commonly referred to as identical twins. a SNP system could use a specific usage.e.. changes in the DNA .1 Time required for DNA analysis The most serious flaw is that DNA analysis is time-consuming compared to other authentication methods. A recent “twin study” has revealed that twin pairs have significant differences in their DNA sequence. by heat. since the STRs and the SNP loci were selected from the extragenic regions.2 Ethical concerns The polymorphic target region in DNA used to create the DNA-ID does not relate to a person’s physical characteristics or disease factors. SNP analysis may be faster. the twins share a precisely duplicated whole genome. for example in passports or in very large-scale mercantile transactions. Finally. However. 4. Thus. DNA-ID has some disadvantages. However. it is currently difficult to complete analysis within 3 hours. it is necessary to equip a laboratory and employ specialists in molecular biology. 4. The DNA-ID could be encrypted via the one-way function (SHA-1) to protect privacy and to reduce data length. ethical concerns. as well. using breakthrough methods developed in the near future. All of these methods of verification are based on matching analog patterns or feature-point comparisons. These high costs may pose a barrier to entry of venture capitals. DNA analysis revealed that the STR profile of PBL of the recipient had completely converted to donor type. M. Both systems yielded verifiable results in validation experiments. The author has observed chimerism in a case of allogeneic bone marrow transplantation (BMT). and does not change either during a person’ life or after his/her death. iris. buccal mucosa. I also thank Prof. they have not yet achieved a universal standard. A DNA chimera refers to a recombinant molecule of DNA composed of segments from more than one source. whereas the hair follicles and fingernails were recipient-derived. and I give special thanks to my colleagues at Div.The recipient had suffered from acute promyelocytic leukemia and received a BMT from a healthy donor. Such data will hopefully aid development of tools that allow discrimination between the identical twins in the near future. and the impossibility of discrimination of monozygotic twins. 6. however. Acknowledgments I am grateful to Dr. Forensic Medicine.e. DNA information is intrinsically digital. Personal identification devices based on unique patterns of fingerprints. . Neutrophilic leukocytes were observed in smear specimens from buccal swabs of the recipient. The discriminatory power of the data can be enhanced by increasing the number of STR or SNP loci. The more popular such DNA techniques become.. Conclusion Development of biometric authentication technologies has progressed rapidly in the last few years. DNA patterns of the buccal mucosa appeared chimeric. Yukio Itakura for his extensive support. The author also introduced the idea of DNA-INK as a practical application of DNA-ID. however. hair follicles and fingernails were collected from the transplant recipient. the DNA-ID is thought to be the most reliable method for personal identification. resulting in complete remission of the leukemia. the author believes that the DNA-ID must be employed as a biometric methodology. Tohoku University. 2009). indicating that the buccal cells were not truly chimeric but were instead merely contaminated with leukocytes. Among the various types of biometric information source. 5. In addition. Funayama for reading the manuscript and giving me helpful advice.152 Biometrics that do not alter its sequence (Haque et al. Because they lack absolute accuracy. Using the STR system. or subcutaneous veins in the finger have all been commercialized. including long analysis time. using the SNP system.4 Cost DNA analysis requires a high capital cost in order to buy and maintain equipment as well as purchase commercial kits. i.. it is possible to analyse 96 SNPs within 30 minutes. high cost. they had qualities of both the recipient and donor. the lower the unit costs of the apparatus and reagents will become. Samples of peripheral blood leukocytes (PBL). Forensic DNA typing by capillary electrophoresis using the ABI Prism 310 and 3100 genetic analyzers for STR analysis. 7-15. Practice in Forensuic Medicine. Hashiyada. J. Collins. 234(2): p. (2001). S.. (2002). Kanetake. F. DNA Polymorphism Official Journal of Japanese Society for DNA Polymorphism Resarch. Butler. A. Jasinska. M. et al. 1397-412. Krzyzosiak (2004)... The Research and practice in forensic medicine.S. Repetitive sequences that shape the human transcriptome..N. S453-4. Hagelberg.. Nata. The essence of SNPs. 23(2): p. Y.. Nat Genet. 11 Suppl 1: p..DNA Biometrics 153 7.. Sakai. 3.M. Lewis.D. Y. Andrews. M.J. F. et al. Y. Roberts. (1981). Hashiyada. (2004). References Alberts.&W. (1999). Walter P. Finishing the euchromatic sequence of the human genome. Forensic Sci Int. 5. Takahashi. Forensic Sci Int Genet... M. Funatyama. 250-3. Anderson.M. T. Nature..M. 204-10. An assessment of the utility of single nucleotide polymorphisms (SNPs) for forensic purposes. 931-45. Identification of the skeletal remains of a murder victim by DNA analysis. et al. USA: Garland Science. T.. M. E. (2010). 44(5): p. Takei.. Am J Med Genet C Semin Med Genet.. 177-86. 133(3): p. Int J Legal Med.. B.Itakura. Nagashima. T. et al. M. (2003b). (1999). Forensic Sci Int. The length polymorphism od SE33(ACTBP2) locus in Japanese population. Hashiyada. 151C(2): p.. Nagasima. 427-9. Y. FEBS Lett. et al. 25(10-11): p.. 114(4-5): p. (2007a). A. J. J. M. Butler. T. Matsuo. K. et al. M... J. . Itakura.. 457-65. et al. Nagashima. (2008). Leg Med (Tokyo). 352(6334): p. Hashiyada. (2007b).. (2004). T. 567(1): p. Nata. S. (1995). Hashiyada. Funatyama. Raff. Itakura. Sakai. 136-41.. Gill. M. The birthday paradox in the biometric personal authentication system using STR polymorphism -Practical matching probabilities evaluate the DNA psesonal ID system-. Funayama. (1999). 136-41. Polymorphism of 17 STRs by multiplex analysis in Japanese population. J. A validation study for the extraction and analysis of DNA from human nail material and its application to forensic casework. et al. 1053-6. M. J. 15: p.. 290(5806): p. (2003a). Development of a spreadsheet for SNPs typing using Microsoft EXCEL. 76(1): p. Not really identical: epigenetic differences in monozygotic twins and implications for twin studies in psychiatry. M.J. Haque. High-throughput SNP analysis for human identification.. M... Hedman. Electrophoresis. Identification of bodies from the scene of a mass disaster using DNA amplification of short tandem repeat (STR) loci. S.. Nagasima. Jhonson. Nature.. Fundamemntals of Forensic DNA Typipng: ELSERVIER. P. Nature.M. Sequence and organization of the human mitochondrial genome. et al. Anderson. A.. Clayton. J. (2009). (2009). Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA.. Brookes. R... 46: p.. 147. A fast analysis system for forensic DNA reference samples. J Forensic Sci. 4. M. (1991). T.. Funayama. Itakura Y. et al. Gene. 50: p.. 2(3): p. 184-9. Molecular biology of THE CELL NY. 431(7011): p. 154 Biometrics Jeffreys, A.J., et al. (1985). Hypervariable 'minisatellite' regions in human DNA. Nature, 314(6006): p. 67-73. Jeffreys, A.J., et al. (1992). Identification of the skeletal remains of Josef Mengele by DNA analysis. Forensic Sci Int, 56(1): p. 65-76. Jeffreys, A.J., et al. (1995). Mutation processes at human minisatellites. Electrophoresis, 16(9): p. 1577-85. Jobling, M.A.&C. Tyler-Smith (2003). The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet, 4(8): p. 598-612. Kayser, M., et al. (2004). A comprehensive survey of human Y-chromosomal microsatellites. Am J Hum Genet, 74(6): p. 1183-97. Kimpton, C.P., et al. (1993). Automated DNA profiling employing multiplex amplification of short tandem repeat loci. PCR Methods Appl, 3(1): p. 13-22. Kimpton, C.P., et al. (1996). Validation of highly discriminating multiplex short tandem repeat amplification systems for individual identification. Electrophoresis, 17(8): p. 1283-93. Lander, E.S., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822): p. 860-921. Lee, H.C., et al. (1998). Forensic applications of DNA typing: part 2: collection and preservation of DNA evidence. Am J Forensic Med Pathol, 19(1): p. 10-8. Lee, H.C.&C. Ladd (2001). Preservation and collection of biological evidence. Croat Med J, 42(3): p. 225-8. Mullis, K., et al. (1986). Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol, 51 Pt 1: p. 263-73. Mullis, K.B.&F.A. Faloona (1987). Specific synthesis of DNA in vitro via a polymerasecatalyzed chain reaction. Methods Enzymol, 155: p. 335-50. Ruitberg, C.M., et al. (2001). STRBase: a short tandem repeat DNA database for the human identity testing community. Nucleic Acids Res, 29(1): p. 320-2. Sachidanandam, R., et al. (2001). A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 409(6822): p. 928-33. Saiki, R.K., et al. (1986). Analysis of enzymatically amplified beta-globin and HLA-DQ alpha DNA with allele-specific oligonucleotide probes. Nature, 324(6093): p. 163-6. Shen, W.&T. Tan (1999). Automated biometrics-based personal identification. Proc Natl Acad Sci U S A, 96(20): p. 11065-6. Slater, G.W., et al. (2000). Theory of DNA electrophoresis: a look at some current challenges. Electrophoresis, 21(18): p. 3873-87. Stenson, P.D., et al. (2009). The Human Gene Mutation Database: 2008 update. Genome Med, 1(1): p. 13. Venter, J.C., et al. (2001). The sequence of the human genome. Science, 291(5507): p. 1304-51. Vijaya Kumar, B.V., et al. (2004). Biometric verification with correlation filters. Appl Opt, 43(2): p. 391-402. Watson, J., Baker, T., Bell, S., Gann, A., Levine, M., Losick R. (2004). Molecular Biology of the Gene, San Francisco, CA, USA: Benjamin Cummings, Cold Spring Harbor Laboratory Press. Zwijnenburg, P.J., et al. (2010). Identical but not the same: the value of discordant monozygotic twins in genetic research. Am J Med Genet B Neuropsychiatr Genet, 153B(6): p. 1134-49. Part 2 Behavioral Biometrics 0 8 Keystroke Dynamics Authentication Romain Giot, Mohamad El-Abed and Christophe Rosenberger GREYC Research Lab Université de Caen Basse Normandie, CNRS, ENSICAEN France 1. Introduction Everybody needs to authenticate himself on his computer before using it, or even before using different applications (email, e-commerce, intranet, . . . ). Most of the times, the adopted authentication procedure is the use of a classical couple of login and password. In order to be efficient and secure, the user must adopt a strict management of its credentials (regular changing of the password, use of different credentials for different services, use of a strong password containing various types of characters and no word contained in a dictionary). As these conditions are quite strict and difficult to be applied for most users, they do not not respect them. This is a big security flaw in the authentication mechanism (Conklin et al., 2004). According to the 2002 NTA Monitor Password Survey1 , a study done on 500 users shows that there is approximately 21 passwords per user, 81% of them use common passwords and 30% of them write their passwords down or store them in a file. Hence, password-based solutions suffer from several security drawbacks. A solution to this problem, is the use of strong authentication. With a strong authentication system, you need to provide, at least, two different authenticators among the three following: (a) what you know such as passwords , (b) what you own such as smart cards and (c) what you are which is inherent to your person, such as biometric data. You can adopt a more secure password-based authentication by including the keystroke dynamics verification (Gaines et al., 1980; Giot et al., 2009c). In this case, the strong authentication is provided by what we know (the password) and what we are (the way of typing it). With such a scheme, during an authentication, we verify two issues: (i) is the credential correct ? (ii) is the way of typing it similar ? If an attacker is able to steal the credential of a user, he will be rejected by the verification system because he will not be able to type the genuine password in a same manner as its owner. With this short example, we can see the benefits of this behavioral modality. Figure 1 presents the enrollment and verification schemes of keystroke dynamics authentication systems. We have seen that keystroke dynamics allows to secure the authentication process by verifying the way of typing the credentials. It can also be used to secure the session after its opening by detecting the changing of typing behavior in the session (Bergadano et al., 2002; Marsters, 2009). In this case, we talk about continuous authentication (Rao, 2005), the computer knows how the user interacts with its keyboard. It is able to recognize if another individual uses the 1 http://www.nta-monitor.com/ 158 Biometrics Fig. 1. Keystroke dynamics enrolment and authentication schemes: A password-based authentication scenario keyboard, because the way of interacting with it is different. Moreover, keystroke dynamics can also prevent the steal of data or non authorized computer use by attackers. In this chapter, we present the general research field in keystroke dynamics based methods. Section 2 presents generalities on keystroke dynamics as the topology of keystroke dynamics methods and its field of application. Even if it has not been studied a lot comparing to other biometric modalities (see Table 1), keystroke dynamics is a biometric modality studied for many years. The first reference to such system dates from 1975 (Spillane, 1975), while the first real study dates from 1980 (Gaines et al., 1980). Since, new methods appeared all along the time which implies the proposal of many keystroke dynamics systems. They can be static, dynamic, based on one or two classes pattern recognition methods. The aim of this section is to explain all these points. Modality keystroke dynamics gait fingerprint face iris voice Nb doc. 2,330 1,390 17,700 18,300 10,300 14,000 Table 1. Number of documents referenced by Google Scholar per modality. The query is “modality biometric authentication" In section 3, we present the acquisition and features extraction processes of keystroke dynamics systems. Section 4 presents the authentication process of such keystroke dynamics based methods. These methods can be of different types: one class based (in this case, the model of a user is only built with its own samples), or two classes based (in this case, the model of a user is built also with samples of impostors). For one class problems, studies are based Keystroke Dynamics Authentication Keystroke Dynamics Authentication 159 3 on distance measures Monrose & Rubin (1997), others on statistical properties (de Magalhaes et al., 2005; Hocquet et al., 2006) or bioinformatics tools Revett (2009). Concerning two classes problems, neural networks (Bartmann et al., 2007) and Support Vectors Machines (SVM) (Giot et al., 2009c) have been used. Section 5 presents the evaluation aspects (performance, satisfaction and security) of keystroke dynamics systems. A conclusion of the chapter and some emerging trends in this research field are given in section 6. 2. Generalities 2.1 Keystroke dynamics topology Keystroke dynamics has been first imagined in 1975 (Spillane, 1975) and it has been proved to work in early eigthies (Gaines et al., 1980). First studies have proved that keystroke dynamics works quite well when providing a lot of data to create the model of a user. Nowadays, we are able to perform good performance without necessitating to ask a user to give a lot of data. “A lot of data” means typing a lot of texts on a computer. This possibility of using, or not, a lot of data to create the model allows us to have two main families of keystroke dynamics methods (as illustrated in Figure 2): • The static families, where the user is asked to type several times the same string in order to build its model. During the authentication phase, the user is supposed to provide the same string captured during his enrollment. Such methodology is really appropriate to authenticate an individual by asking him to type its own password, before login to its computer session, and verifying if its way of typing matches the model. Changing the password implies to enroll again, because the methods are not able to work with a different password. Two main procedures exist: the use of a real password and, the use of a common secret. In the first case, each user uses its own password, and the pattern recognition methods which can be applied can only use one class classifiers or distance measures. In the second case, all users share the same password and we have to address a two classes problem (genuine and impostor samples) (Bartmann et al., 2007; Giot et al., 2009c). Such systems can work even if all the impostors were not present during the training phase (Bartmann et al., 2007). • The dynamic families allow to authenticate individuals independently of what they are typing on the keyboard. Usually, they are required to provide a lot of typing data to create their model (directly by asking them to type some long texts, or indirectly by monitoring their computer use during a certain period). In this solution, the user can be verified on the fly all the time he uses its computer. We can detect a changing of user during the computer usage. This is related as continuous authentication in the literature. When we are able to model the behavior of a user, whatever the thing he types, we can also authenticate him through a challenge during the normal login process: we ask the user to type a random phrase, or a shared secret (as a one-time password, for example). 2.2 Applications and interest From the topology depicted in Figure 2, we can imagine many applications. Most of them have been presented in scientific papers and some of them are proposed by commercial applications. 2 Monitoring and continuous authentication Continuously monitoring the way the user interacts with the keyboard is interesting (Ahmed & Traore. . the system is able to detect the change of user during the session life. By this way. in addition of verifying the password. The best practices of password management are rarely (even never) respected (regular change of password. since it allows to avoid impostors which were able to get the password to authenticate instead of the real user. Rao. he is rejected and considered as an impostor. they can be easily obtained by sniffing network. which is the way of typing the password. If it matches to the user profile. Song et al. This avoids the use of one time passwords associated to keystroke dynamics (when we are not in a monitoring way of capturing biometric data). (ii) what we are. 1997). detect abnormal activities while accessing to highly restricted documents or executing tasks in an environment where the user must be alert at all the times (Monrose & Rubin. use of a complex password. Otherwise.. the way of typing is also verified.160 4 Keystroke Dynamics Will-be-set-by-IN-TECH Biometrics Dynamic authentication Static authentication Continuous authentication Random password Two classes authentication One class authentication Fig. If the user keeps a simple password. In addition. ). than more complicated ones. That is here. sadly. 2. Modi & Elliott (2006) show that. using spontaneously generated password does not give interesting performance. unique to the user). the keystroke dynamics process uses different information such as the name of the user. 2005. we obtain two authentication factors (strong authentication): (i) what we know. .1).2. Such monitoring can also be used to analyse the behavior of the user (instead its identity). forbid to write the password on a paper.2. 2. and. where keystroke dynamics is interesting. he remembers it more easily. By this way. Moreover. When used in a logical access control.1 Authentication for logical access control Most of commercial softwares are related to static keystroke dynamics authentication by modifying the Operating System login procedure. the computer is able to lock the session if it detects that the user is different than the one which has previously been authenticated on this computer. because they are too restrictive. and. administrators lost less time by giving new passwords. and. With such a mechanism. the password of the user. an additional passphrase (common for all the users. . he is authenticated. Topology of keystroke dynamics families 2. some studies showed that keystroke dynamics holds better performance when using simple passwords. since a wide range of websites or protocols do not implement any protection measures on the transmission links. which is the password of the user. The authentication form is modified to include the capture of the timing information of the password (see Section 3.2. . the name and the password of the user. 2008. 2000). the quantity of required data can be totally different between the studies (from five inputs (Giot et al. The author argues that if the computer is able to get the emotional state of the user. Marsters (2009) proposes a solution to this problem of privacy. it is possible to detect a bot which controls the computer. He respectively obtains 79. which allows to improve the performance of the system. improve the privacy of the data. . and. Hocquet et al. 3. as everything is automatic. Keystroke dynamics capture The capture phase is considered as an important issue within the biometric authentication process. it is impossible to recover the chronological log of keystroke. Depending of the type of keystroke dynamics systems. there is no semantic information on the group. Instead of storing this information in an ordered log.2. this was just a suggestion. intercepts its actions (Stefan & Yao. the enrollment procedure can be relatively different (typing of the same fixed string several times. Monrose & Rubin (2000) suggest the use of keystroke dynamics to verify the state of the user and alert a third party if its behavior is abnormal. where it is necessary to collect several samples of the user in order to build its model. It can be also used as an extra feature during the authentication process in order to improve the performance. because the system monitors all the events. but has a lot of privacy concerns. Keystroke dynamics is also used to differentiate human behavior and robot behavior in keyboard use. It collects quadgraphs (more information on ngraphs is given later in the chapter) for latency and trigraphs for duration. . ). . it can adapt its interface depending on this state. Khanna & Sasikumar (2010) show that 70% of users decrease their typing speed while there are in a negative emotional state (compared to a neutral emotional state) and 83% of users increase their typing speed when their are in a positive emotional state. (2006) show that keystroke dynamics users can be categorised into different groups. Giot & Rosenberger (2011) show that it is possible to recognize the gender of an individual who types a predefined string.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 161 5 Continuous authentication is interesting.5% and 84. They are compared to the biometric model of the claimant.2% of correct classification for the relaxed and tired states. 2008).. where a single sample is collected. 2. Such ability facilitates computer-mediated communication (communication through a computer). This information can be useful to automatically verify if the gender given by an individual is correct. By this way. and. The capture takes place at two different important times: • The enrollment. His keystroke dynamics system is not able to get the typed text from the biometric data. Various features are extracted from this sample. But. Authors achieved an improvement of 20% of the Error Equal Rate (EER) when using the guessed gender information during the verification process. This way. The gender recognition accuracy is superior to 91%. The parameters of the keystroke dynamics system are different for each group (and common for each user of the group). and. monitoring of the computer usage. • The verification. Epp (2010) shows that it is possible to get the emotional state of an individual through its keystroke dynamics. it is stored in a matrix. However. . 2009c) to more than one hundred Obaidat & Sadoun (1997)).3 Ancillary information Keystroke dynamics can also be used in different contexts than the authentication. They automatically assign each user to a group (authors empirically use 4 clusters). and not a verification. 162 6 Will-be-set-by-IN-TECH Biometrics This section first presents the hardware which must be used in order to capture the biometric data. The price of this hardware. 2004). Hence. QWERTY. 3. and. because (when it is a classical one).. whereas we . . We can see that they are totally different. ). it would cost no more than 5$. 3. (a) Desktop keyboard (b) Laptop keyboard Fig. each keyboard is different on various points: • The shape (straight keyboard. Figure 3 presents the shape of two commonly used keyboards (laptop and desktop). Some studies only used the numerical keyboard of a computer (Killourhy & Maxion. It has not been treated a lot in the keystroke dynamics literature. in order to ease the comparison of these systems. as well as the number of sensors to buy. the various associated features which can be collected from this data. Table 2 presents the sensor and its relative price for some modalities. . can be determinant when choosing a biometric system supposed to be used in a large infrastructure with number of users (e. This is not at all a biometric information.1 Mandatory hardware and variability Each biometric modality needs a particular hardware to capture the biometric data. Difference of shape of two classical keyboards Having this sensor (the keyboard) is not sufficient. Price comparison of hardware for various biometric modalities Of course. . This problem is well known in the biometric community and is related as cross device matching (Ross & Jain. Rodrigues et al. the way of typing on it is also different (maybe mostly due by the red ball on the middle of the laptop keyboard). if we choose a logical access control for each machine). Modality keystroke fingerprint face iris hand veins Sensor keyboard fingerprint sensor camera infrared camera near infra red camera Price very cheap normal normal very expensive expensive Table 2. . ) • The pressure (how hard it is to press the key) • The position of keys (AZERTY. 2006). 2010. Keystroke dynamics is probably the biometric modality with the cheapest biometric sensor : it uses only a simple keyboard of your computer. If a keyboard is broken and it is necessary to change it. all the more we already know if it is the correct password or not. keyboard with a curve.g. Such keyboard is present in all the personal computers and in all the laptops. necessity to buy a fingerprint sensor for each computer. the only information it provides is the code of the key pressed or released. changing a keyboard may affect the performances of the keystroke recognition. . ergonomic keyboard. and. . Some works (Eltahir et al. However. Grabham & White. Nguyen et al. we can exploit an extra information in order to discriminate more easily the users: the pressure force exerced on the key. it is necessary to press several times the same key in order to obtain an alphabetical character. it can also work with any machine providing a keyboard. Lopatka & Peetz use the movement in the z axis as information.e. key-released time and key-typed forces. some studies have been done by using other kinds of sensors in order to capture additional information and improve the recognition. The second thing we need is an accurate timer. Sound signals produced by the keyboard typing have also been used in the literature.apple. Performance is similar to classical keystroke dynamics systems. et al. (2010) only use sound signals when typing the password. keystroke dynamics works with a classical keyboard on a computer. One common machine having a keyboard and owned by a lot of people is the mobile phone where we can use keystroke dynamics on it.. (2010) argue that it is important to take into consideration this timer. it seems that this information is quite efficient. Some researchers have also studied the effect of using an external clock instead of the one inside the computer. and avoids the necessity to buy a specific sensor. and. the timer accuracy can be different between the different languages used (by the way. Dozono et al. Clarke & Furnell (2007) show its feasibility and highlight the fact that such authentication mechanism can only be used by regular users of mobile phones.com/kb/HT1935 . especially when comparing algorithms. (2007) use the sound information in addition to the timing values (i.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 163 7 are interested in if it is the right individual who types it. This issue has been extensively discussed by Killourhy & Maxion (2008). each operating system is able to use it. in order to capture at a sufficient precision the time when an event occurs on the keyboard. • Mobile phone with all the keys (letters and numbers) accessible with the thumbs. We have three kinds of mobile phones: • Mobile phone with a numerical keyboard. (2009) present a study on such a mobile phone. 2008) have tested the possibility of using a pressure sensor inside each key of the keyboard. Of course. it is a feature fusion) which held better performance than the sound alone. Lopatka & Peetz (2009) propose to use a keyboard incorporating a Sudden Motion Sensor (SMS)2 . Once again. Campisi et al. They argue that such authentication mechanism must be coupled with another one. where it is shown that better performance are obtained with higher accuracy timer. 2008. There is a drawback with this timer: its resolution can be different depending on the chosen programming language or the operating system. this timer is already present in every computer. Pavaday. or something similar to a keyboard. Historically. This is a kind of keyboard quite similar to a computer’s keyboard. In this case.. keep in mind. key-pressed time. 2 http://support. Hence. we do not need to buy it. Such sensor (or similar ones) is present in recent laptops and is used to detect sudden motion of the computer in order to move the writing heads of the hard drive when a risk of damage of the drive is detected. In this case. as keystroke dynamics can work with any keyboard. Even on the same machine. They also explain how to configure the operating system in order to obtain the best performances. From these preliminary study. that web based keystroke dynamics implementation use interpreted languages –java or javascript– which are known to not have a precise timer on all the architectures). or the timing information alone. and obtain indirectly through the analysis of this signal. because it has an impact on performance. there are few studies on this kind of mobile phone. Topology of keystroke dynamics sensors of the literature Timer variations Operating System Desktop application Type Language Mobile phone Web based application Native Interpreted Fig. Figure 4 presents the topology of the different keystroke dynamics sensors. Although. we can capture the pressure information and position of the finger on the key which could be discriminating. Topology of factors which may impact the accuracy of the timer 3. on a classic keyboard. we think the future of keystroke dynamics is on this kind of material. We can argue that the two previous mobile phones are already obsolete and will be soon replaced by such kind of mobile phones. 4. while the Figure 5 presents the variability on the timer. With such a mobile phone. . various kinds of information can be captured.2 Captured information As argued before. we only emphasize. They mainly depend on the kind of used sensors. but a touch screen. in this chapter. 5. Keystroke Dynamics Sensor Computer Mobile PC/Laptop keyboard Microphone Numeric keyboard Touch screen Mobile keyboard Pressure sensitive All the keys Numeric keyboard Fig. we have presented some sensors that are more or less advanced in the previous subsection. Although.164 8 Will-be-set-by-IN-TECH Biometrics • Mobile phone without any keyboard. 1 Raw data In all the studies.2. and. or in continuous capture during the use of the computer. computed by subtracting timing values. timei ). • Timestamp. – release occurs when the key is released. 3.microsoft. we present the most commonly used in the literature. Depending on the kind of keystroke dynamics application. changing the scheduler policy to FIFO for Linux machines. the raw data is captured in different kind of scenarios: in the authentication form to type the login and password. is a chronologically ordered list of events: the list starts empty.2. eventi .2. The raw data can be expressed as (with n the number of events on the form n = 2 ∗ s with s the number of keys pressed to type the text): ⎧ ⎪ (keycodei . the same raw data is captured (even if they are not manipulated as explained here). • Key code. • Duration. ∀i. This key code may be dependant of the platform and the language used. for keystroke dynamics.. It is usually represented in milliseconds. The raw biometric data. We are interested by events on the keyboard. The key code is more interesting than the character. because it gives some information on the location of the key on the keyboard (which can be used by some keystroke dynamics recognition methods) and allows to differentiate different keys giving the same character (which is a discriminant information (Araujo et al.com/en-us/library/ms644904%28v=VS. for example). We can obtain the character from this code (in order to verify if the list of characters corresponds to the password. It encodes the time when the event occurs. The duration is the amount of time a key is pressed. Pavaday. For the key i (i is omitted for sake of readability) it is computed as following: duration = time{event = RELEASE} − time{event = PRESS} 3 (2) http://msdn.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 165 9 3. There are two different events: – press occurs when the key is pressed. It is the code of the key from which the event occurs.aspx . (2010) propose to use the Windows function QueryPer f ormanceCounter3 with the highest priority enabled for Windows computers. 0 <= i < n ⎪ ⎨ keycodei ∈ Z (1) ⎪ eventi ∈ { PRESS. 2005)). but this is not mandatory. These events are initiated by its user. It is generated by an action on the key. when an event occurs.1 First order The most often extracted features are local ones.2. it is appended at the tail of the list with the following information: • Event.2 Extracted features Various features can be extracted from this raw data. in a form asking to type a predefined or random text different than the login and password. et al. Its precision influence greatly the recognition performance.85%29. RELEASE} ⎪ ⎩ timei ∈ N Umphress & Williams (1985) only use the six first time values of each word (so s ≤ 6). 3. RRi = timei+1 {eventi+1 = RELEASE} − timei {eventi = RELEASE} (5) We can obtain the RP latencies which are the difference of time between the release of one key and the pressure of the next one: ∀i. The typing difficulty is based on the distance (on the keyboard) between two successive characters (to type). The total time needed to type the text can also be used. (2011) discusses on the way of using these extracted features in order to improve the recognition rate of keystroke dynamics systems. at least. Different kinds of latencies can be used. • Mistake ratio. The digraph features D of a password is computed as following: ∀i. use of shift key).. but almost never the case in static authentication). with n taking different values. 1 ≤ i < n. 1 ≤ i ≤ n. We can obtain the PP latencies which are the difference of time between the pressure of each key: ∀i. the selected latency vector is the PP one. ∀i. RPi = timei+1 {eventi+1 = PRESS} − timei {eventi = RELEASE} (6) Most of the time. is the notion of digraph. • Middle time. a feature fusion is operated by concatenating the duration vector with. and the time at the beginning of the input. They are mainly global types of information: • Total typing. de Ru & Eloff (1997) use a concept of typing difficulty based on the fact that certain key combinations are more difficult to type than other. PRi = durationi (3) • Latencies. Other kinds of data can be encountered in various papers Ilonen (2003). 1 ≤ i < n. also named PR in the literature. one of the latency vector (it seems that most of the time.. counting the number of times the backspace key is hit gives an interesting feature. . When the user is authorised to do typing mistakes (this is always the case in continuous authentication. but it is not always indicated in the papers). They are computed by getting the differences of time between two keys events. 2002). The time difference between the time when the user types the character at the middle of the password. 1 ≤ i < n. PPi = timei+1 {eventi+1 = PRESS} − timei {eventi = PRESS} (4) We can obtain the RR latencies which are the difference of time between the release of each key: ∀i.166 10 Will-be-set-by-IN-TECH Biometrics We then obtain a timing vector (of the size of the typed text). trigraph are heavily used in (Bergadano et al.e. A digraph represents the time necessary to hit two keys. and if several keys are needed to create a character (i. A recent paper Balagani et al. or as a normalisation factor. 1 ≤ i < n. The information can be used as an extra feature to append to the feature vectors. containing the duration of each key press (by order of press). Di = timei+1 {eventi+1 = RELEASE} − timei {eventi = PRESS} (7) This notion has been extended to ngraph. Another concept that is often encountered in the literature. They also use a statistical method. 4. 4. . we are interested in the global shape of the typing. but the final biometric data is always a single vector composed of various features. Rogers & Brown (1996) cleanup data with using a Kohonen network (Kohonen. Most keystroke dynamics studies do not take care of the presence of outliers in the learning set. resulti = sourcei+1 − sourcei (8) • Entropy.5 . 2006). papers only use one latency and the duration. but from the first order features. It consists to get the minimum and maximum value of each type of data (latency and duration). or more than 1. 1985) it is timing values superior to 750ms. The way of computing it greatly depends on the used verification methods. While computing the model with several samples (see next section) feature selection mechanisms can remove non informative features. . 4.1 Outliers detection It is known that the classifier performance greatly depends on outliers presence in the learning dataset. the verification method compares the query sample (the biometric data captured during the authentication) to the model.2. most of the time. movements. it is time to build the model of each user. By using the slope of the biometric sample. but. Most of the time. During an authentication. It consists to get the mean value and its standard deviation of each type of data (latency and duration). An outlier feature is detected in the following way (for each feature): the feature is more than 1. • mean/std.1. 1 ≤ i < n. Based on the result of this comparison (which is commonly a distance). • Slope. • Spectral information. while in (Umphress & Williams. 1980). We expect that users type in the same way even if the speed may be different (Modi & Elliott.. We do not insist on papers using other information than timing values in the rest of this chapter (pressure force. Such a high quantity of data can be really boring for the users to provide. Authentication framework Once the different biometric data during enrolment procedure have been captured. 1995) using impostors samples.. the decision module accepts or rejects the user. All the operations are done with the wavelet transformed data. Chang (2006a) applies a discrete wavelet transformation to the original extracted features. We can imagine more complicated features. The entropy inside a sample has been only studied in (Monrose et al. Verification procedures performance greatly depends on the chosen features. 2002). thanks to its enrolled samples. ). Some studies (mainly in free text) remove times superior to a certain threshold. We have seen in this section that several features can be extracted. .2.1 Enrolment The enrolment step allows to create the model of each user. the number of samples used during the enrolment is superior to 20.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 167 11 3. Killourhy & Maxion (2010) also detect outliers in biometric samples.2 Second order Some features are not extracted from the raw biometric data. • min/max. filtering is done by removing timing values superior to 500ms. In (Gaines et al. The new features (result) set is computed as following (with source): ∀i.5 inter-quartile range greater than the third quartile. 4. μ and sigma respectively represent an optimisation factor. It seems that. Very few studies have applied such kind of mechanism. The parameters K (k is chosen in order to minimise the squared error between the approximated function and the cumulative distribution function of the logarithm of timings distribution). The procedure is operated for each feature of each sample.1. False Rejection Rate reduces when the number of selected features increases. Bleha & Obaidat (1991) use a reduction technique based on Fisher analysis. (2006) select a subset of N features with the minors of standard deviation. (2007) use a wrapper system based on Particle Swarm Optimization (PSO) to operate the feature selection. 4. and not on the learning set.1. The aim of the feature selection is to reduce the quantity of data and speed up the computation time. 2005). . Azevedo et al. It seems that this point has also been rarely tested.2 Preprocessing Biometric data may be normalized before being used. However. • Wrapper approach which depends on the verification algorithm. When a feature is detected as being an outlier. By this way. it is replaced by a random sample (which is not an outlier) selected among possible values of this feature for this user. the variance).168 12 Will-be-set-by-IN-TECH Biometrics inter-quartile less than the first quartile. the outlier detection and correction is operated on the whole dataset.3 Feature selection A feature selection mechanism can be applied to remove irrelevant features. The best one is kept. Keeping 70% of the features gives interesting results. the number of samples is always the same. The PSO gives better results than a Genetic Algorithm. Experiments are done at Zero False Acceptance Rate. the technique consists in keeping m − 1 dimension for each vector. most of the time.. The aim is to remove irrelevant features based on different measures (e. This allows to cleanup the used dataset to compute the algorithm performance (and obtain better performance). the standard deviation of the logarithm of the timing values. eventually to improve the performance. which allows to eliminate less significant features. Different feature subsets are generated and evaluated. Such pre-processing allows to get better performance by using a normalisation function (Filho & Freire (2006) observed that the timing distribution is roughly Log-Normal) : g( x ) = 1 1 + exp − K (loge ( x )−μ σ (9) We did not find other references to other pre-processing approaches in the literature. Other similar methods are present in the literature (Chen & Lin. and.g. Two different kinds of feature extraction systems can be used: • Filter approach which does not depend on the verification algorithm. Yu & Cho (2004) use an algorithm based an Support Vector Machines (SVM) and Genetic Algorithms (GA) to reduce the size of samples and keep only key values for each user. Boechat et al. with m the number of users in the system (they have only 9 users in their system). the mean of the logarithm of the timing values. but not the enrolled samples of the user. . 2007. 2004. the verification module returns a comparison score.4 Model computation There are numbers of methods to verify if a query corresponds to the expected user. 2005) (variations being in the distance computing method (Kang & Cho.2 Verification The verification consists in verifying if the input of the user corresponds to the claimed identity. While the features are extracted from the raw biometric sample (same procedure than during the enrollment).g. 2009c. • Learning clusters with k-mean (Hwang et al. otherwise. for static authentication. he is rejected. 2006) or Gaussian Mixture Models (GMM) (Hosseinzadeh & Krishnan. Rogers & Brown. p represents the p norm of vector.. other on data mining methods. Generally. Clarke & Furnell. 2005). 1996) . ∀u∈[1. Some methods use one-class assumption (they only use the enrolment samples of the user).. Most used way of model computing are: • Computing the mean vector and standard deviation of enrolled samples (Umphress & Williams. 2009)). Usually. instead of being collected with real impostors (Clarke & Furnell. • Store the enrolled vectors in order to use them with k nearest neighbourg methods (Rao. query represents the query biometric sample (the test capture to compare to the model). 2006. 2005. 1997.. while other use two-class or multi-class assumption (they also use impostors enrollment samples to compute the model). 2007. they are compared to the model of the claimed user. 1997) . • Learning of bayesian classifiers (Janakiraman & Sim. Rao. 4. 2004). 2004). score = min query − enrolledu • The statistical methods. Pohoa et al. Obaidat & Sadoun.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 169 13 4. Some of them are based on statistical methods. Obaidat & Sadoun. • Learning parameters of generative functions: Hidden Markov Model (HMM) (Galassi et al. Sang et al. so they are similar to the present list.. The comparison score is the min of these distances. the euclidean distance between the query and each of enrolled samples is computed. In (Monrose & Rubin.1. 1997). Rodrigues et al. Obaidat & Sadoun. 2 . If this score is below than a predefined threshold. 2006.. Yu & Cho. 1997). they may be automatically generated (Sang et al.Card(enrolement)] (10) . ... • Neural network learning (Bartmann et al. the user is authenticated. Several verification methods exist and depend on the way the enrollment is done. The way of capturing these inputs greatly depends on the kind of used keystroke dynamics system (e. 2009. 2006. When impostors samples are needed. The main families of computing are (Guven & Sogukpinar. • SVM learning (Giot et al. 2008) . Rao. 2003): • The minimal distance computing. 2007. the user must type its login and password).. 1985). data mining methods use a really huge number of enrolled samples to compute the model (several hundred of samples in some neural network methods) which is not realistic at all. It is not always clear in the various studies if the template update is done in a supervised way (impostors samples never added). 2006). • Bioinformatic methods based on string motif searching are also used (Revett et al. Other studies try to update the model of a user after being authenticated (Hosseinzadeh & Krishnan. 1991). This way. 2009) request the user to type the verification text several times (mainly between two and three). 2008. As the keystroke data deviates progressively with time.3 Improving the performance Different ways can be used to improve the performance of the recognition. or in a semi-supervised way (samples added if the classifier recognizes them as being genuine). Revett. The statistical method presented in (Hocquet et al. Such procedure reduces the False Rejection Rate without growing the False Acceptance Rate too much. This way. μ is the mean value of enrolled samples: score = (query − μ)t (query − μ) query 2 · μ 2 (11) A normalized version is also presented in the study. 2008. • Some methods are based on the disorder degree of vectors (Bergadano et al. SVM. • Others are based on timing discretisation (Hocquet et al. the verification consists in verifying if the guessed label corresponds to the label of the claimed identity (cf.170 14 Will-be-set-by-IN-TECH Biometrics One of the oldest methods is based on bayesian probabilities (Bleha et al. Several studies (Bartmann et al. Classifier performances greatly depend on the number of used samples to compute them. the model tracks the behavior modifications of the user through time. The system uses a transformation in frequential domains thanks to wavelets. 2007. and integrate them in the model. Killourhy & Maxion. 1990).. the result can be totally different: semi-supervised methods may add impostor samples in the model. timing values are smoothed when merging the two samples and light hesitation are suppressed. • Class verification. Even if the aim is to improve performance. 2009). 2007).. the model deviates from the real biometric data of the user and attracts more easily impostors samples. 2010. neural networks. 2002).. Another way to improve recognition performance is to fuse two samples together (Bleha & Obaidat.. For classifiers able to give a label.. 4. Hosseinzadeh & Krishnan. The fusion (Ross et al. 2006) of several keystroke dynamics methods on the same query is also a good way to improve performances: . when he is rejected. k − nn). Chang (2006b) artificially generates new samples from the enrolled samples in order to improve keystroke recognition. Revett. 2006) computes the score depending on the mean μ and the standard deviation σ of the enrolled samples: score = 1 − |query − μ| 1 exp − Card(query ) σ score = query − μ 2 2 (12) 1 Filho & Freire (2006) present another method which also computes a distance: (13) • Application of fuzzy rules de Ru & Eloff (1997). This way.. performance degrades with time when not using such procedure. in order to give him more chances of being verified.. . respectively). . Evaluation of keystroke dynamics systems Despite the obvious advantages of keystroke dynamics systems in enhancing traditional methods based on a secret. while the identification consists to determine the identity of the user. Identification based on keystroke dynamics has not been much experimented in the literature. Keystroke dynamics has also been successfully fused with other modalities. Hemery & Rosenberger. Teh et al. A benchmark database can contain real samples from individuals. El-Abed & Rosenberger (2010). (2006) have defined various measures to get the unicity. (2006) apply a fusion between three different keystroke dynamics methods. • Verification system performance metrics such as the Equal Error Rate (EER). Nevertheless. Hwang et al. several studies exist in the state-of-the-art to evaluate keystroke dynamics systems. 5. (2009). like face (Giot. • Different kinds of weighted sums score fusion functions are proposed in Giot. 5. (2011) propose an interesting review of most of the keystroke dynamics recognition methods.4 User identification The verification consists in verifying if the identity of the claimant is correct. (2007).Keystroke Dynamics Authentication Keystroke Dynamics Authentication 171 15 • Bleha et al. The main drawback is notably the lack of a generic evaluation method for such systems. Bleha et al. performane metrics. Nowadays. we need generally to compute their performance using a predefined protocol (acquisition conditions. It is generally realized within three aspects: performance. We need a reliable evaluation methodology in order to put into obviousness the benefit of a new method. test database. 2006). According to the International Organization for Standardization ISO/IEC 19795-1 (2006). satisfaction and security. • Hocquet et al. (1990) associate a bayesian classifier to a minimal distance computing between the query vector and the model. the identity being the owner of the model returning the lowest distance (or a reject if this distance is higher a threshold). Several benchmark databases exist in order to compare keystroke dynamics systems. By analysing the behavior of these measures comparing the recognition performance. which reflect the best the real use cases. (1990) use a bayesian classifier to identify the user.). 4. Karnan et al. 2010) or speaker recognition (Montalvao Filho & Freire. . the performance metrics are divided into three sets: • Acquisition performance metrics such as the Failure-To-Enroll rate (FTE). which greatly improves the performance. consistancy and discriminality. • Identification system performance metrics such as the False-Negative and the False-Positive Identification Rates (FNIR and FPIR. they find that it is possible to improve performance by asking users to artificially add pauses (helped by cues for being synchronized) when typing the password. it is costly in terms of efforts and time to create such a database. its proliferation is still not as much as expected. In order to compare these systems. As argued by Cherifi et al.1 Performance The goal of this evaluation aspect is to quantify and to compare keystroke dynamics systems. We may find methods specifics to identification. a good benchmark database must satisfy various requirements: . or compare the query to each model. The benchmark must embed a large diversity of users (culture. 3. interval between the release of a key. interval between two pressures.ecole.edu/~keystroke/. a lot of them are typed on a short period (50 at the same time). but.infonet. “jeffrey allen” and “drizzle”.). DB 3 Greyc alpha Giot et al. . Each user typed the password “greyc laboratory” twelve times. with a reasonable time interval between sessions. but. 2006). at least. DB 2 DSN2009 Killourhy & Maxion (2009) propose a database of 51 users providing four hundred samples captured in height sessions (there are fifty inputs per session). We present an overview of the existing benchmark databases: DB 1 Chaves Montalvão et al. The databases are available at http://itabi. Release of a key is not tracked. The maximum number of users in a database is 15.com. (2009a) propose the most important public dataset in term of users.172 16 Will-be-set-by-IN-TECH Biometrics 1. Three different passwords have been typed: “pr7q1z”. it is really difficult to attain. DB 4 Pressure-Sensitive Keystroke Dynamics Dataset Allen (2010) has created a public keystroke dynamics database using a pressure sensitive keyboard. . The database must also contain fake biometric templates to test the robustness of the system.html. but. The database is composed of couples of ASCII code of the pressed key and the elapsed time since the last key down event. The database is available at http://jdadesign. This is the dataset having the most number of samples per user. The database is available at http: //www. It seems that there is no other reference to this kind of experiment in the literature. on two distinct keyboards. The databases do not seem to be yet available on their website.ensicaen. in order to take into account the variation of individuals behavior. As keystroke dynamics is a behavioral modality. five distinct sessions. the number of provided samples per user is 10.br/biochaves/ br/download. This point is essential for any biometrics. It embeds the following raw data: key code. 104 users are present on the database. have used the same keystroke databases in several papers (Filho & Freire. during each session (which give 60 samples for the 100 users having participated to each session).cmu. .cs. Four different databases have been created. It is stored in raw text. and. time when release.htm. Most databases were built under two different sessions spaced of one week or one month (depending on the database). 2.fr/~rosenber/keystroke. only 7 of them provided a significant amount of data (between 89 and 504). Both extracted features (hold time and latencies) and raw data are available (which allow to build other extracted features). and the pressure of the next one. The database contains some extracted features: hold time. It contains 133 users and. age. The database is available at http://www. the database must be captured among different sessions. It is stored in an sqlite database file.net/2010/04/ pressure-sensitive-keystroke-dynamics-dataset/ in a csv or sql file. . csv or Excel files. Each database contains the raw data. whereas the 97 other have only provided between 3 and 15 samples. pressure force. time when pressed. Each biometric data has been captured when typing the following password: “. Each database is stored in raw text files. The delay between each session is one one day at minimum.tie5Roanl”. but the mean value is not stated. 100 of them provided samples of. Duration and 3 latencies Press and 7/97 (89-504) few months release /(3-15) events and pressure Press and 58 1 1 release time Filho & Freire (2006) Various Killourhy & Maxion (2009) 1 fixed string Giot et al.). Table 4 illustrates the differences of the used protocols in this research area for some major studies. Second. (2010) 14 phrases and 15 unix commands Table 3. the acknowledgement of the password (if it is an imposed password.citefa. these databases do not always fit the previous requirements. Although. It seems that almost all the users have done only one session. The database is available at http://www. Table 3 presents a summary of these public datasets. and GREYC alpha database (Giot et al. the used keyboards (which may deeply influences the way of typing). Each of them has been created for keystroke dynamics on computer (i. separation between sessions . this is the only work that compares .. due to several reasons. 2009c. n. Press and release times for each key are saved. (2010). (2009a) 1 fixed String Allen (2010) 3 fixed strings Bello et al.. 2009a). as stated in (Crawford. dynamic) that require different acquisition protocols. 2009a.gov..e. and the use of different or identical passwords (which impacts on the quality of impostors’ data). Summary of keystroke dynamics datasets Most of the proposed keystroke dynamics methods in the literature have quantified their methods using different protocols for their data acquisition (Giot et al. it would be the best kind of dataset. the age. Karnan et al. they differ on the used database (number of individuals. Giot et al.. First. as well as the user agent of the browser from which the session has been done. (2011) presents a comparative study of seven methods (1 contribution against 6 methods existing in the literature) using a predefined protocol. no public dataset has been built with one login/password different for each user. Killourhy & Maxion. 2009). no public dataset available for smartphones). Despite this. The performance comparison of these methods is quite impossible. The results from this study show a promising EER value equal to to 6. most of these studies have used different protocols for their data acquisition. gender and handness of the user and other information. Each session consists in typing 14 phrases extracted from books and 15 common UNIX commands. . We can see that some databases are available. which is totally understandable due to the existence of different kinds of keystroke dynamics systems (static. 2009). continuous. a high FTA is expected).Keystroke Dynamics Authentication Keystroke Dynamics Authentication 173 17 DB 5 Fixed Text The most recent database has been released in 2010 Bello et al. Giot et al.d.. 58 volunteers participated to the experiment. In order to resolve such problematic.95%.ar/si6/ k-profiler/dataset/ in a raw text file. which may explain why none of them have been used by researchers different than their creators. Dataset Type Information Users Samples Sessions /users Press events < 15 < 10 2 Duration and 51 400 8 2 latencies Press and > 100 60 5 release events. . To our knowledge. 2011. Killourhy & Maxion. 96% FRR 0% 8. (2007) Hosseinzadeh & Krishnan (2008) Monrose & Rubin (1997) Revett et al. • and. (2007) Revett et al. We believe that it is relevant to more investigate the quality of the captured keystroke features. because they depend a lot on user’s feelings at the moment of the data acquisition: user may change his way of performing tasks due to its stress. Previous works presented by Cho & Hwang (2006). E: Is the threshold global?). Hwang et al. Summary of the protocols used for different studies in the state-of-the-art (A: Duration of the database acquisition.2% 4. 5. (2009b) focusing on studying users’ acceptance and satisfaction of a keystroke dynamics system (Giot et al.1% 0. “/” indicates that no information is provided in the article. (1990) Rodrigues et al.58% 9. biometric systems respecting this satisfaction factor are considered as usefull. Satisfaction factors are rated between 0 and 10 (0 : not satisfied · · · 10 : quite satisfied). show that the system is well perceived and accepted by the users. trust in the system. (2010). there is a potential concern about the misuse of personal data (i.6% 2. The study presents several drawbacks of biometric systems including: • The lack of secrecy: everybody knows our biometric traits such as iris.e. in order to enhance the performance of keystroke dynamics systems. B: Number of individuals in the database. etc. Figure 7 summarizes users’ acceptance and satisfaction while using the tested system.8% 3. Paper Obaidat & Sadoun (1997) Bleha et al. Schneier (1999) compares traditional security systems with biometric systems.174 18 Will-be-set-by-IN-TECH Biometrics keystroke methods within the same protocol.58% 9. D: Is the acquisition procedure controlled?. concentration or illness. C: Number of samples required to create the template. 2009a). 5. of behavioral systems) provides a lower quality than the morphological and biological ones...8% 20% 5.6% 1. Hence. (2006) have employed pauses and cues to improve the uniqueness and consistency of keystroke features.2 Satisfaction This evaluation aspect focuses on measuring users’ acceptance and satisfaction regarding the system (Theofanos et al.6% 6. (2006) focus on improving the quality of the captured keystroke features as a mean to enhance system overall performance. . Hwang et al. These results show that the tested system is well perceived among the five acceptance and satisfaction properties.3 Security Biometric authentication systems present several drawbacks which may considerably decrease their security. the fact that a biometric trait cannot be replaced if it is compromised.96% Table 4. Moreover. 2008). tiredness.. (2009c) A 8 weeks 8 weeks 4 sessions / 14 days / 7 weeks 4 weeks 8 sessions 5 sessions B 15 36 20 38 30 41 42 8 51 100 C 112 30 30 / 10 30 / 12 200 5 D no yes / / / no no / yes yes E no yes no no no no no / no no FAR 0% 2.6% 6. It is generally measured by studying several properties such as easiness to use. The works done by El-Abed et al. In biometrics. templates) which is seen as violating users’ privacy and civil liberties. (2006) Hocquet et al. (2006) Killourhy & Maxion (2009) Giot et al.7% 0. and using a publicly available database.1% 3.15% 4. The performance of keystroke dynamics systems (more general speaking. there were no concerns about privacy issues during its use.3% / 5. Giot et al. (1998) assign users into four categories: • Sheep: users who are recognized easily (contribute to a low FRR). 6. An example of such attack is the zero-effort attempts. 2) and 4) In a replay attack. • Goats: users who are difficult to recognize (contribute to a high FRR). attackers try to impersonate legitimate users having weak templates. biometric systems is subject to errors such as False Acceptance Rate (FAR) and False Rejection Rate (FRR). Attackers may collect then inject previous keystroke events features using a keylogger. (2011) propose an extension of the Ratha et al.3.1 Set I architecture threats 1) Involves presenting a fake biometric data to the sensor. an intercepted biometric data is submitted to the feature extractor or the matcher bypassing the sensor. Usually. model (Ratha et al.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 175 19 El-Abed et al.. Fig. . Doddington et al. 7) The keystroke templates can be altered or stolen during the transmission between the template database and the matcher.3. 6) Involves attacks on the template database such as modifying or suppresing keystroke templates. This inaccuracy illustrated by statistical rates would have potential implications regarding the level of security provided by a biometric system. Vulnerability points in a general biometric system. 2001) to categorize the common threats and vulnerabilities of a generic biometric system. 5. • Lambs: users who are easy to imitate (contribute to a high FAR). 3) and 5) The system components are replaced with a Trojan horse program that functions according to its designer specifications. and • Wolves: users who have the capability to spoof the biometric characteristics of other users (contribute to a high FAR). 5. 8) The matcher result (accept or reject) can be overridden by the attacker. Their proposed model is divided into two sets as depicted in figure 6: architecture threats and system overall vulnerabilities.2 Set II system overall vulnerabilities 9) Performance limitations By contrast to traditional authentication methods based on “what we know” or “what we own” (0% comparison error). goats and wolves users.. 2011). which illustrates the risk factors of the identified threats and system overall vulnerabilities among the ten assessment points (the maximal risk factor is retained from each point). it is important to integrate such information within the security evaluation process. 2009a) is presented in El-Abed et al. More the risk factor is near 0. Assessment of the environmental (for example. for each identified threat and vulnerability. In order to integrate such information. The results presented in the previous section show that the existing keystroke dynamics methods provide promising recognition rates. Therefore. A risk factor. we believe that keystroke dynamics systems belong to the possible candidates that may be implemented in an Automated Teller Machine (ATM). . the security evaluation of biometric systems is generally divided into two complementary assessments: 1.176 20 Will-be-set-by-IN-TECH Biometrics A poor biometric in term of performance. a set of rules is presented in (El-Abed et al. there is only a few public databases that could be used to evaluate keystroke dynamics authentication systems. may be easily attacked by lambs. There is none competition neither existing platform to compare such behavioral modality. Figure 7 summarizes the security assessment of the TOE. hill-climbing and brute force (Martinez-Diaz et al. is the system is used indoor or outdoor?) and operational conditions (for example. There is no reference to this user classification in the keystroke dynamics literature. It is calculated using three predefined criteria (effectiveness. Assessment of the biometric system (devices and algorithms). As shown in section 5. In our opinion. it is important to take into consideration system performance within the security evaluation process. (2011). tasks done by system administrators to ensure that the claimed identities during enrollment of the users are valid). The absence of a quality test increases the possibility of enrolling authorized users with weak templates. According to the International Organization for Standardization ISO/IEC FCD 19792 (2008). Therefore. and such systems are well perceived and accepted by users. The presented method is based on the use of a database of common threats and vulnerabilities of biometric systems. 5. and 2. It is defined as the mean of both error rates FAR and FRR: HTER = FAR + FRR 2 (14) 10) Quality limitations during enrollment The quality of the acquired biometric samples is considered as an important factor during the enrollment process.. A type-1 security assessment of a keystroke dynamics system (Giot et al. Such templates increase the probability of success of zero-effort impostor.4 Discussion The evaluation of keystroke dynamics modality are very few in comparison to other types of modalities (such as fingerprint modality). is considered as an indicator of its importance. 2006) attempts.1.. better is the robustness of the Target of Evaluation (ToE). and can be widely used for e-commerce applications. and the notion of risk factor. The Half Total Error Rate (HTER) may be used as an illustration of system overall performance. easiness and cheapness) and is defined between 0 and 1000. 47–63.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 177 21 Fig. & Traore. 8. whereas it is the most studied in the literature. 7. Users tend to change often their mobile phone. chapter Employee Surveillance based on Free Text Detection of Keystroke Dynamics. because mobile phones are more popular than computers and its use is very democratized. it could be useful to not re-enroll the user on its new mobile phone. more applications are available in a web browser. Conclusion and future trends We have presented in this chapter an overview of keystroke dynamics literature. Handbook of Research on Social and Organizational Liabilities in Information Security. 7. A. We daily use several computers which can have different keyboards on timing resolution. (2008). is subject to a lot of intra class variability. We believe that the future of the keystroke dynamics is no more on desktop application. Nowadays. They are more and more powerful every year (in terms of calculation and memory) and embeds interesting sensors (pressure information with tactile phones). In order to spread the keystroke modality. it is necessary to solve various problems related to: • The cross devices problem. In an online authentication scheme (were the template is stored on a server). . One of the main reasons is related to the problem of template aging: performances degrade with time because user (or impostors) type differently with time. These applications use the classical couple of login and password to verify the identity of a user. More information on the subject can be found in various overviews: Revett (2008. Integrating them a keystroke dynamics verification would harden the authentication process. 6. These variability must not have an impact on the recognition performances. • The aging of the biometric data. Acknowledgment The authors would like to thank the Lower Normandy Region and the French Research Ministry for their financial support of this work. Keystroke dynamics. Satisfaction (on the left) and security (on the right) assessment of a keystroke dynamics based system. References Ahmed. Idea Group Publishing. chapter 4) deeply presents some studies. pp. I. but in the mobile and internet worlds. Mobile phone owners are used to use various applications on their mobile and they will probably agree to lock them with a keystroke dynamics biometric method. Evolutionary Computation. (2007). User authentication through typing biometrics features. & Neri. pp..H. Carvalho Filho. Cho. Giot. Behavioral Biometrics for Human Identification: Intelligent Applications. 200–205. M. 647–653. Bergadano. & Carvalho. J. pp. An approach to feature selection for keystroke dynamics systems based on pso and feature weighting. On the design of an authentication system based on keystroke dynamics using a predefined input text.. E.. M. ICB 2006.. User authentication through keystroke dynamics. Engineering and Technology. & Rosenberger. B. Combining svms with various feature selection strategies. & Achatz. Bakdi. Springer. S. Ling. Chang. Vol. Ferreira. Pattern Recognition Letters 32(7): 1070 – 1080. Proceedings of World Academy of Science. V. chapter Performance Evaluation Of Behavioral Biometric Systems. (2009). J. L. National Taiwan University. M. Chen. L. S. Department of Computer Science. A. Artificial rhythms and cues for keystroke dynamics based authentication.. Lecture Notes in Computer Science 3995: 312.. (2005). C. M... A. IGI Global. F. Advanced user authentication for mobile devices. ACM Transactions on Information and System Security (TISSEC) 5(4): 367–397. IEEE Congress on.. Pasquet. M. G. B. & Yabu-Uti. Gunetti. man and cybernetics 21(2): 452–456. Ethical. Dallas. (1991). Computer-access security systems using keystroke dynamics.. V.-W. 57–74. & Phoha. Pizzoni & Cipriano. D. IEEE Transactions On Pattern Analysis And Machine Intelligence 12 (12): 1216–1222. (2006). C. (2011). Ray. On the discriminability of keystroke feature vectors used in fixed text keystroke authentication.. Benitez. Chang. M. M. Boechat. & Hussien. E. S. 2007.. W. S.178 22 Will-be-set-by-IN-TECH Biometrics Allen. TX. pp. Phoha. Dimensionality reduction and feature extraction applications inidentifying computer users. IET 3(4): 333 –341. S. IEEE transactions on systems. Sucupira.R.. Slivinsky. Bello. CEC 2007. Reliable keystroke biometric system based on a small number of keystroke samples. 626–632. (2010).. Cavalcanti. Carlos. Collection and publication of a fixed text keystroke dynamics dataset. D.. Y. G. Lo Bosco. Hemery. & Recife-PE. J. (2006b). Using the keystrokes dynamic for systems of personal security. (2010). Taiwan. (1990). .. K. & Obaidat. Southern Methodist University. Taipei 106. S. Bertacchini. & Furnell.. Technical report. (2007). Cherifi. R. I. An analysis of pressure-based keystroke dynamics algorithms. C. Campisi. Signal Processing. In International Conference on Biometrics (ICB). Maiorana. Azevedo. Araujo. L. User authentication using keystroke dynamics for cellular phones.. XVI Congreso Argentino de Ciencias de la Computacion (CACIC 2010). P. W. N. J. G. pp. (2009). (2006a). (2006). Balagani. C.. IEEE Transactions on Signal Processing 53(2 Part 2): 851–855. (2005). and Human Issues 1(2): 149. Lizarraga. & Picardi. S. Keystroke biometric system using wavelets. F. Techniques and Applications for Advanced Information Privacy and Security: Emerging Organizational. E. L. Bartmann. B. & Hwang.. Bleha. & Lin. C. computers & security 27: 109–119. D.. Clarke. (2002).-J. (2006). 18. Bleha. J. Master’s thesis. .. M. A. & Rosenberger.. pp. & Chri (2011). A. Giot. Giot. IEEE Expert: Intelligent Systems and Their Applications 12: 38–45. Giot. de Magalhaes. Doddington. R. M. Authenticating mobile phone users using keystroke analysis. & Lai. Giordana. Hemery. (1980). Design and Evaluation of a Pressure-Based Typing Biometric Authentication System. (2009a).). & Saitta. Rand Corporation. P.. ICSLP98. & Nakakuni. (2006). Saskatoon. & Freire. . A.. H.d.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 179 23 Clarke. (2004). Pattern Recognition Letters 27: 1440–1446. J. W. W. El-Abed. pp.. District of Columbia. C. C. M. El-Abed.. Master’s thesis. Comparison of the adaptive authentication systems for behavior biometrics using the variations of self organizing maps. (2007). IEEE. N.. L. M. 44th IEEE International Carnahan Conference on Security Technology (ICCST). R. Article ID 345047(2008): 14. M. U. H. Galassi. K. T. S. (2007).. Hawaii. D. Filho. Password secured sites: stepping forward with keystroke dynamics. O. On the equalization of keystroke timing histograms. C.. Salami. IEEE Computer Society. (2010). 172–179. & Rosenberger. University of Saskatchewan. IEEE International Conference on Biometrics: Theory. Press. (1998).. R. C. (2008). S. Proceedings of the 37th Hawaii International Conference on System Sciences. R. (n. N. E.-J. El-Abed.. IEEE International Conference on Biometrics: Theory.. Dietrich. Eltahir. Modeling temporal behavior via structured hidden markov models: An application to keystroking dynamics. 1–6. Dozono.. Giot. goats. A. S. & Furnell. International Conference on Next Generation Web Services Practices. M. Giot. A study of users’ acceptance and satisfaction of biometric systems. Sheep. Keystroke dynamics with low constraints svm based passphrase enrollment. & Santos.. Julien. pp. Epp. (2007). International Symposium on Collaborative Technologies and Systems. H. USA. B. Washington. (1997). & Shapiro. IEEE International Conference on Security Science and Technology (ICSST). R. El-Abed. Crawford. (2009b). 2010 Eighth Annual International Conference on. R. Authentication by keystroke timing: some preliminary results. EURASIP Journal on Information Security. Revett. D. CANADA. (2010). M. Shwartzmann. Martin... Hemery. G. Gaines. El-Abed. & Rosenberger. El-Abed. Password-based authentication: A system perspective. Giot. W. Liggett. R. lambs and wolves: A statistical analysis of speaker performance in the nist 1998 speaker recognition evaluation. & Eloff. L.. Lisowski. Przybocki. H. Keystroke dynamics authentication for collaborative systems. India). Enhanced password authentication through fuzzy logic. Itou.. Conklin. & Rosenberger. de Ru. M. J. R. Identifying emotional states through keystroke dynamics. (2005). J. International Journal of Information Security 6: 1–14. Proceedings 3rd Indian International Conference on Artificial Intelligence (Pune. Computers & Security pp. & Rosenberger. M. B. Privacy Security and Trust (PST). M. (2009c). Keystroke dynamics: Characteristics and opportunities. C. W. C. 1–20. W. & Walz. Unconstrained keystroke dynamics authentication with shared secret.. Applications and Systems (BTAS 2009). [in print]. C. Greyc keystroke: a benchmark for keystroke dynamics biometric systems. Towards the security evaluation of biometric authentication systems.. G. & Reynolds. Technical report. 205–212. (2011). G. International Journal of Computers and Communications 1(4): 108–116. Ismail. M. (2007). S. S. Washington. (2008). and Cybernetics. Computers & Security 22(8): 695–706. pp. & Rosenberger. Gaussian mixture modeling of keystroke patterns for biometric applications. Advanced Topics in Information Processing–Lecture . Akila. Hwang. (2006).-j. J. R. pp. IAPR. P. Lee.180 24 Will-be-set-by-IN-TECH Biometrics Applications and Systems (BTAS 2009). Intelligence and Security Informatics 3917: 73–78. P. C. (2010). I. Recognising Emotions from Keyboard Stroke Pattern. M. H. 449–452. Applied Soft Computing 11(2): 1565 – 1573. Ramel. Caen. & Sogukpinar. International Journal of Computer Applications IJCA 11(9): 24–28. & Krishnan. Understanding users’ keystroke patterns for computer access security. Special Issue on: "Advances and Trends in Biometric pp. ISO/IEC 19795-1 (2006). Hemery. N. J. 1–6. 1128–1131. M. 331–350. Biometric personal authentication using keystroke dynamics: A review. Int. Giot. Hocquet. (2010). The Sixth International Conference on Biometrics (ICB2007). N. Estimation of user specific parameters in one-class problems. (2003). C. & Cho. (2009). IAPR International Conference on Pattern Recognition (ICPR). Man. H. France. 12–16. The Impact of Soft Computing for the Progress of Artificial Intelligence. IEEE. Information technology – security techniques –security evaluation of biometrics. Washington.-Y. Istanbul. T.. The International Conference on High Performance Computing & Simulation (HPCS 2010). (2008). Part C: Applications and Reviews. Hocquet. Giot. & Cardot. IEEE Computer Society. R. D. Lecture notes in computer science 4642: 584. of Information Technology and Management (IJITM). Instrumentation and Measurement Technology Conference Proceedings.. pp. 1–8.-s. J. Grabham. Acecptance rate: 54/100. Fast learning for multibiometrics systems using genetic algorithms. B. Keystroke dynamics in a general setting. A hybrid novelty score and its use in keystroke dynamics-based user authentication. Pattern Recognition p. & Sim. & Rosenberger. District of Columbia.. 1–17. Karnan. User classification for keystroke dynamics authentication. C. (2006). IEEE Computer Society. IMTC 2008. . Low cost and usable multimodal biometric system based on keystroke dynamicsand 2d face recognition. USA. R. Use of a novel keypad biometric for enhanced user identity verification. Ramel. Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection. H. (2011). Guven. S. Technical report. S. A new soft biometric approach for keystroke dynamics based on gender recognition. (2010). A. Hosseinzadeh. M. Keystroke dynamics. & Maxion. Killourhy. ICPR ’06: Proceedings of the 18th International Conference on Pattern Recognition.. ISO/IEC FCD 19792 (2008). (2008). Improving authentication accuracy of unfamiliar passwords with pauses and cues for keystroke dynamics-based authentication. Janakiraman. The effect of clock resolution on keystroke dynamics. International Organization for Standardization ISO/IEC 19795-1. R. Systems. J. Kang. & Sasikumar. & White. Khanna.. USA. Technical report. El-Abed. R. (2003).-Y. & Krishnaraj. Ilonen. 2008. Turkey. & Rosenberger. pp. 30. & Cardot. (2011). N. S. IEEE Computer Society. International Organization for Standardization ISO/IEC FCD 19792. (2007). Information technology biometric performance testing and reporting. & Cho. Springer. pp.. pp. Giot. pp. K. IEEE Transactions on 38(6): 816–826. [in print]. S. M. 531–539. DC. 6230 of Lecture Notes in Computer Science. K. F. 2010. Comparing anomaly-detection algorithms for keystroke dynamics. International Journal of Information Security 1(2): 69–83. (1997). & Wetzel. A. V. S.. S. patent. T. Revett. International Journal of Control. (2010). . Kohonen.. K. & Rubin. pp.Keystroke Dynamics Authentication Keystroke Dynamics Authentication 181 25 Killourhy. Future Generation Computer Syststems 16(4): 351–359. N. M.. & Siguenza. Rao. Proceedings of the 4th ACM conference on Computer and communications security. On the use of rough sets for user authentication via keystroke dynamics.. M. PRICAI 2010: Trends in Artificial Intelligence. (2005). S. IEEE International Carnahan Conferences Security Technology. S. pp. Automation and Systems 7(1): 7–15. Continuous keystroke biometric system. DSN’10. J. PhD thesis. Proceedings of the 18th Annual Belgian-Dutch Conference on Machine Learning. Le. Vol. Wiley Publishing. Pohoa. M. & Santos. (2006). Audio. A. T. Telecommunications Symposium. Springer Berlin / Heidelberg. Ray. Zhang & M. J. F. Monrose. (2010). 125–134..-D. Keystroke biometrics with number-pad input. (2006). Hill-climbing and brute force attacks on biometric systems: a case study in match-on-card fingerprint verification. Alonso-Fernandez. (2007). & Le. Multimodal biometric fusion–joint typist (keystroke) and speaker verification. 75–80. Orgun (eds). & Peetz. S.. (2009). J. K. & Freire. University of Southampton. pp. K. J. Monrose. (2006). Monrose. v. Pavaday. de Magalhães. Password hardening based on keystroke dynamics. (2009). 48–56. Master’s thesis. & Joshi. Reiter.. Behavioral biometrics: a remote access approach. M. in B.. M. S.. N. B. Ortega-Garcia.. Authentication via keystroke dynamics. Montalvao Filho. 2009. & Elliott. 609–614. Part B. Proceedings of the IEEE of International Carnahan Conference on Security Technology (ICCST). Revett. USA. (2006). Vibration sensitive keystroke analysis. Connell. Pohoa. . Killourhy. International Journal of Network Security & Its Applications (IJNSA). (2000). Ratha. (2008). M. T. IEEE Transactions on 27(2): 261–269. Lopatka. Enhancing login security through the use of keystroke input dynamics. H.and Video-Based Biometric Person Authentication. An analysis of minutiae matching strength. S.. & Maxion. S. (2009). H. & Bolle. ACM Press New York. University of California. J.. S. & Rubin (1997). (2001). Keystroke dynamics as a biometric for authentication. H. E. J. Revett. F. pp. R.-T. K. Hidden markov model (hmm)-based user authentication using keystroke dynamics. Systems. Obaidat. NY. & Nugessur. & Santos. Investigating & improving the reliability and repeatability of keystroke dynamics timers. 2006 International. 477–486. Modi. (2009). & Sadoun. R. K. de Magalhaes. Kesytroke dynamics verification using spontaneously generated password. Marsters. (2009). 2(3): 70–85. Lecture notes in computer science 4874: 145. R. Man and Cybernetics. pp. Self-organising maps. Springer Series in Information Sciences 30. F. Keystroke Dynamics as a Biometric. A bioinformatics based approach to user authentication via keystroke dynamics. Fierrez-Aguilar. K. Lecture notes in computer science 3832. J. B. Revett. (2010). (2002). K. B. IEEE/IFIP International Conference on Dependable Systems & Networks. Martinez-Diaz. S. IEEE/IFIP International Conference on Dependable Systems & Networks. Keystroke dynamics extraction by independent component analysis and bio-matrix for user authentication. & Maxion. S.. Nguyen. DSN’09. Verification of computer users using keystroke dynamics. (1995). Usability & biometrics: Ensuring successful biometric systems. (1999). Ross. J. Keyboard apparatus for personal identification. & Neo.. (2004). Computers & Security 23(5): 428–440. C. H. M. Springer.. S. R. . A. A. Ross. H. Yabu-Uti. R. D. pp. (2008). Stefan. Yared. Spillane. J.. Commun. Y. Teoh. Retrieved on 19. Biometric access control through numerical keyboards based on keystroke dynamics. & Brown. (1985). Springer. ManâASMachine Studies 23: 263–273. ACM . (2006). The National Institute of Standards and Technology (NIST). (1996). & Williams. Violaro.686.182 26 Will-be-set-by-IN-TECH Biometrics Rodrigues. B. B. A. D. Rutgers University. Proc. Rogers. & Wolfson. J. (1997)... P.. D. Song. D. Proceedings of the 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System-Volume 00.557. L. & Cho. G. (2007). Ong. Venable. A. Biometric sensor interoperability: A case study in fingerprints. ˘¸ Internat. Statistical fusion approach on keystroke dynamics. Technical report. M. Technical report. IEEE Computer Society. Stanton. Inside risks: the uses and abuses of biometrics. & Jain.. P. Sang. P. Method and apparatus for verification of a computer user’s identification. do NCosta. Applications and Technologies (PDCAT 2004). Umphress. Lecture notes in computer science 3832: 640. US Patent 5. of International ECCV Workshop on Biometric Authentication (BioAW).. Identity verification through keyboard characteristics. S. C. 134–145.. (1975). G. Keystroke dynamics authentication and human-behavior driven bot detection. & Jain. F. Novel impostors detection in keystroke dynamics by support vector machine. & Fan. T. of the 5th international conference on Parallel and Distributed Computing. Theofanos. A.. & Perrig. Teh. pp. E. & Ling. (2004). Technical report. Yu. A. Handbook of Multibiometrics. Keystroke dynamics identity verification – its problems and practical solutions. Shen. & Yao. K. 918–923. Proc. Schneier. Nandakumar. (2004). based on keystroke characteristics. User recognition by keystroke latency pattern analysis. A. (2006). (2008). ear. vein. the on-line signature verification is actively researched . for example. only ten fingerprints and one face. the safety of biometric data is discussed actively. the habit during writing is biometrics and it is not remained in the signature shape. Fig. in on-line signatures. a camera. 2005). and pen-tablet are normally equipped. James et al. it is possible to cope with the problem by changing the shape. gate.. therefore. A PDA with a pen-tablet On the other hand. such a technique is unnecessary if the biometrics itself is cancelable. If the biometric data are leaked out and it is known whose they are. Especially. iris. microphone. cancelable biometric techniques have been proposed. Introduction Biometrics attracts attention since person authentication becomes very important in networked society. to imitate it is quite difficult even if the signature shape is copied. assuming mobile access using a portable terminal such as a personal digital assistant (PDA). only the signature is cancelable from a viewpoint of spoofing.0 9 DWT Domain On-Line Signature Verification Isao Nakanishi. authentication using the face. 1999. voice and signature are well known and are used in various applications (Jain et al. To deal with this problem. 1. Even if a signature shape is known by others. face. Especially. As a result. which use not biometric data directly but one-to-one transformed data from the biometric data. Shouta Koike. Among various biometric modalities. voice and/or signature can be realized with no additional sensor. they are never used for authentication again. Yoshio Itoh and Shigang Li Tottori University Japan 1. As the biometrics. therefore.. However. Every human being has limited biometrics. the fingerprint. Concretely. An on-line signature is captured as x and y coordinate (pen-position) data in a digital pen-tablet system and their sampled data are .2 184 Biometrics Biometrics (Dimauro et al. the verification performance tends to be degraded since the on-line signature is a dynamic trait. 2000). signatures with small DP distances are accepted even if they are forgery. 2. 2002.. 2004. it is relatively easy to forge the pen-position parameter by tracing genuine signatures by others. Two data series compared are divided into several partitions and the DP distance is calculated every partition. There is another important problem when we use the DP matching. 2. we propose simply-partitioned DP matching. The DP matching forces to match two signatures even if either is forgery. For instance. Plamondon & Srihari.. In the proposed method. However. that is. 2004. 2008). we briefly explain the proposed on-line signature verification in the DWT domain. it is general to use a single threshold commonly in an authentication system. a pen-up while writing causes large differences in coordinate values of pen-position and so increases false rejection. so that such well-forged signatures can be distinguished from genuine ones. signatures with large DP distances are rejected even if they are of genuine. therefore.. Additionally. it results in degradation of verification performance. we have studied threshold equalization in the on-line signature verification (Nakanishi et al. individual features of a signature are enhanced and extracted in the sub-band signals. We propose new equalizing methods based on linear and nonlinear approximation between the number of sampled data and optimal thresholds. If the common threshold is used for all signatures. We have proposed a new on-line signature verification method in which a pen-position parameter is decomposed into sub-band signals using the discrete wavelet transform (DWT) and total decision is done by fusing verification results in sub-bands (Nakanishi et al. in the verification process of the proposed method. Therefore. since the signature shape is visible. Fierrez & Ortega-Garcia. The purpose of the DP matching is to find the best combination between such two data series. Jain et al. a DP distance is calculated in every possible combination of the two data series and as a result the combination which has the smallest DP distance is regarded as the best. However. it reduces the false acceptance. On the other hand. The reason why we use only the pen-position parameter is that detecting functions of other parameters such as pen-pressure. pen-altitude. DWT domain on-line signature verification In this section. The DP distance is proportional to the number of signature’s sampled data. in a pen-tablet system. limitation of combination in matching is effective for rejecting forgeries. 2005). But. 2003. signature complexity (shape). Consequently. dynamic programming (DP) matching is adopted to make it possible to verify two data series with different number of sampled points. The DP distance is obtained as dissimilarity.1 System overview A signal flow diagram is shown in Fig. But there are problems in use of the DP matching. so that it reduces excessively large DP distances. and/or pen-direction are not equipped in the PDA. It increases false acceptance. that is. each signature (user) has a different optimal threshold. 2007. On the other hand. so that if it is used as a criterion in verification. The DP distance is initialized at the start of a next partition. 2. the false rejection. therefore.. where (↓ 2k ) and (↑ 2k ) are down-sampling and up-sampling. · · · . 2. 2003) for the details. Total decision is done by comparing the final score with a threshold and it is verified whether the signature data are of genuine or not. 2. the decomposition level. respectively. Therefore. 1.DWT DomainSignature Verification DWT Domain On-Line On-Line Signature Verification 185 3 Fig. pen-altitude. Ak (z) and Sk (z) (k = 1. that is. The synthesized signal: vk (n ) in each sub-band is the signal in higher frequency band and called Detail which corresponds to the difference between signals.n (m) is a wavelet function and · denotes the conjugate. 2005). They are respectively normalized in both time and amplitude domains and then decomposed into sub-band signals using sub-band filters by the DWT (Nakanishi et al.2 Feature extraction by DWT In the following. M) are analysis filters and synthesis ones. 1997) of which parallel structure and frequency characteristics are shown in Fig. DWT domain on-line signature verification given by x (n ) and y(n ) where n = 0. The DWT of the pen-position data: v(n ) is defined as u k (n ) = ∑ v(m)Ψk. sub-band signals are enrolled as a template for each user. · · · . 2004. (Nakanishi et al. the x (n ) and y(n ) are represented as v(n ) together for convenience. In advance of verification. we adopt the Detail as an enhanced individual feature which can be extracted with no specialized function: pen-pressure. Sn − 1 and Sn is the number of sampled data. Templates are generated by ensemble-averaging several genuine signatures. k is a frequency (level) index.. and/or pen-direction which are not equipped in the PDA. 2003. 3. M is the maximum level of the sub-band. each decomposed signal is compared with its template based on the DP matching and a DP distance is obtained at each decomposed level. It is well known that the DWT corresponds to an octave-band filter bank (Strang & Nguyen. respectively. Final score is calculated by combing the DP distances at appropriate sub-bands in both coordinates.. At the verification stage.n (m) m (1) where Ψk. Please refer to Ref. . If the sampling frequencies are greatly different. Each signature is digitized at equal (common) sampling period using a pen-tablet system. In general. all signatures have different normalized sampling periods. therefore. Even genuine signatures have different number of sampled data. the sampling period of each signature is divided by the number of sampled data and so becomes real-valued. Thus. so that the differences between genuine signatures and forged ones are accentuated. 3. On the other hand. the variation of writing time is relatively large since it is not easy for forgers to imitate writing speed and rhythm of genuine signatures. 4. 4 (b). sampling periods (frequencies) of the forged signatures become greatly different from those of the genuine signatures as in Fig. different sampling frequencies. so that their sampling periods (frequencies) are comparable as shown in Fig. Concretely. In the proposed system. Sub-band decomposition by DWT Let us get another perspective on the effect of the sub-band decomposition using Fig. variation of writing time in the genuine signatures is small. that is. The maximum frequency: f m of the octave-band filter bank is determined by the sampling frequency based on the “sampling theory". In other words. 4 (a). in the case of forged signatures. each octave band (decomposition level) includes greatly different frequency elements as illustrated in (b). writing time of all signatures is normalized in order to suppress intra-class variation. frequency elements included in one level are different from the other. .4 186 Biometrics Biometrics (a) Parallel structure (b) Frequency characteristics Fig. even if levels compared are the same. In order to deal with the problem. 2. we introduce DP matching into the verification process. 2008). J − 1). However. 1. if forgers imitate writing speed and rhythm of genuine signatures. the local distance at kth is defined as d ( k ) = | a (i ) k − b ( j ) k | (k = 0. 1. 2003. 1. · · · . the verification was performed every stroke (intra-stroke or inter-stroke) in the conventional system (Nakanishi et al. I − 1) and b ( j) ( j = 0. 4. 2005).3 Verification by DP matching Since on-line signatures have large intra-class variation. The DP matching is effective in finding the best combination between two data series even if they have different number as illustrated in Fig. 2005. 2004.DWT DomainSignature Verification DWT Domain On-Line On-Line Signature Verification 187 5 (a) Comparison of genuine signatures (b) Comparison of a genuine signature with its forgery Fig. Letting the two data series be a(i ) (i = 0.. 5. Effect of sub-band decomposition The advantage was confirmed comparing with a time-domain method (Nakanishi et al. · · · .. a part of signature databases eliminates the data in inter-strokes and so we could not apply the conventional system to such a database. it is impossible for the proposed method to distinguish forged signatures from genuine ones. Therefore. K − 1) (2) . one-to-one matching cannot be applied in verification. Of course. · · · . b )/ ∑ w(k) k =0 K −1 (4) Assuming the weight is symmetric: (1-2-1) and the initial value is zero. b ) = D ( a. vt )l /(Vn + Tn ) (6) where Vn is the number of sampled data in the verification signature and Tn is that of the template. b ) = K −1 k =0 ∑ w(k)d(k) (3) where w(k) is a weighting factor. K −1 k =0 ∑ w(k) = I + J. a DP distance is given by D ( a. nD ( a. the normalized DP distance is given by nD (v. vt )l where v is sampled data series of a signature for verification and vt is that of a template. 5. DP matching where instead of i and j. Let the DP distance at lth level be D (v. After calculating the DP distance in all possible combinations. we can find the best combination by searching the combination with the smallest DP distance. (5) In the proposed method. vt )l = D (v. the normalized DP distance is used in general.6 188 Biometrics Biometrics Fig. since the DP distance depends on the number of sampled data. . the DP distance is obtained at each sub-band level. k is used as another time index since these data are permitted to be referred redundantly. Moreover. By accumulating the local distances in one possible combination between the two series. cy > 0. assuming to use individually-optimal thresholds for all subjects. 6. the EER of the conventional system was 28. The number of subjects was 40 and 17 subjects signed their names in Chinese characters and the rest in alphabetical ones. The wavelet function was Daubechies8. in a pen-tablet system. The total number of signatures was 1600. If the division leaves remainders. so that it is confirmed that introducing the DP matching is effective for not only applying to the standard database but also improving verification performance. On the other hand. This is a rough evaluation but suggests that if a single common threshold is optimal for all subjects. For instance. we propose simply-partitioned DP (spDP) matching. This increases false acceptance. c x > 0. The EER of the proposed system was 20. Consequently. (SVC2004.3 %. This issue is examined in Sect.0 %. The DP matching forces to match two signatures even if either is forged one. This increases false rejection. The maximum level of the sub-band: M was 8 and the number of levels used in the total decision: L was 4. We used a part of the on-line signature database: SVC2004 in which the data in inter-strokes were eliminated. In particular. skilled forgeries (well-forged signatures) could make the DP distance smaller. The combination weights were c x = cy = 0. The verification performance was evaluated by using an equal error rate (EER) where a false rejection rate (FRR) was equal to a false acceptance rate (FAR). For generating templates. Simply partitioned DP matching There is another issue to be overcome in order to improve the verification performance. 2004) for more information. Conversely. 4.DWT DomainSignature Verification DWT Domain On-Line On-Line Signature Verification 189 7 A total distance (TD) is obtained by accumulating the normalized DP distances in sub-bands. which mean to take the average. Both data series: a(i ) and b ( j) are divided into several partitions of the same integer number and a sub DP distance is calculated every partition. It sometimes brings large differences from template data and then leads to a large DP distance. it is not guaranteed to get precise coordinate values of pen-position. we averaged EERs of all subjects and then so obtained EER of 15.4 Verification experiments In order to confirm verification performance of the proposed system. signatures with small DP distances are accepted even if they are forgery. For collecting skilled forgeries. the verification performance could be improved further. L is the number of levels used in the total decision. 3. we carried out experiments in the following conditions. .5. The signature with such a large DP distance is rejected even if it is genuine. For reference. xt )l + cy · L ∑ nD(y. they are singly distributed to partitions. 2. TD = c x · M M 1 1 ∑ nD(x. when the pen tip is released from the surface of the tablet. imposters could see how genuine signatures were being written. The concept is illustrated in Fig. Please refer to Ref.3 %. where the number of partitions is four. yt )l L l = M − L +1 l = M − L +1 (7) where c x and cy are weights for combining the DP distances in x and y coordinates and c x + cy = 1. data of five genuine signatures were ensemble-averaged. The sub DP distances are initialized at the start of next partitions and a total DP distance is obtained by summing the sub DP distances in all partitions. Furthermore. so that the rhythm in each partition of forged signatures becomes different from that of genuine ones. Of course. they need additional processing for character or stroke detection. the normalized DP distance at the sub-band level: l is obtained by summing sub DP distances in all partitions. Let the number of partitions and the sub DP distance be Q and D (v. 6. respectively. matching pairs between the genuine signature and its forged one are not in the diagonal direction and so are excluded even if they have small DP distances. Therefore. even if excessively a large sub DP distance is caused by the irregular pen-up mentioned above in a partition. when a verification signature is genuine. On the other hand. vt )q Q /(Vn + Tn ) (8) . Therefore. Resultingly. As a result. 6. The spDP matching is also effective in reducing false acceptance. nD (v. appropriate matching pairs tend to exist in diagonal direction in Fig. q =1 ∑ D(v. 1998) but they assume to write Chinese (Kanji) characters in standard style and the partitioning is done every character or stroke. it is difficult for forgers to copy rhythms in writing of genuine signatures. they could not be directly applied to the case of a cursive style (connected characters). even if their data are partitioned.8 190 Biometrics Biometrics Fig. 2007. rhythms in corresponding partitions compared are still equivalent. vt )l = A total distance (TD) is given by Eq. Simply-partitioned DP matching (Q=4) Assuming that genuine signatures have equivalent rhythms in writing. Such a concept that inappropriate pairs are excluded by partitioning the DP distance has been already proposed (Sano et al. Yoshimura & Yoshimura.. the spDP matching has no ill effect for false rejection. vt )q . it is initialized at the start of the next partition. (7). so that the spDP matching prevents the total DP distance from becoming excessively large and has an effect on reducing false rejection. Comparing between Figs.6 17.8 16. In the following. 2008). 5 in order to reduce calculation amount by excluding unlikely pairs. 2. (4) is used for dealing with this problem. TDeq and Tn are the total distance. 3. However. it is confirmed that the spDP matching decreased the EER by 2-3%. the larger intra-class variation becomes and as a result. Final decision is achieved by comparing the DP distance with a threshold. therefore. it makes a DP distance large.4. when the final score (the DP distance) of each user is greatly different from those of others. complex signatures have large number of sampled data since they consume relatively long time for writing. Generally.4 Table 1. A total distance (TD) is rewritten as TD = c x · M M 1 1 ∑ D(x. the normalization also makes the DP distances of forged signatures small and thereby might increases false acceptance. the matching window is generally adopted in the DP matching as shown in Fig. Therefore. Number of Partitions 0 2 3 4 5 6 EER (%) 20.4 in order to improve verification performance.DWT DomainSignature Verification DWT Domain On-Line On-Line Signature Verification 191 9 By the way. In addition.4 16. γ is a . 4. Conditions are similar with those in Sect. We have studied to equalize the threshold instead of using the normalization (Nakanishi et al. it is clear that the spDP matching is more effective for excluding inappropriate pairs than the matching window. the verification performance tends to be degraded by using the common threshold. the normalized DP distance given by Eq. the conventional equalization is defined as TDeq = p p γ p p TD Tn p (10) where p is user number and TD p . In not only on-line signature verification but also all biometric authentication systems.0 16. we set the number of partitions at 4. The larger the number of sampled data of a signature becomes.1 Evaluation of spDP matching We evaluated verification performance using the spDP matching. EERs in various numbers of partitions are summarized in Table 1 where the case of 0 partitions corresponds to the conventional normalized DP matching. Threshold equalizing There is an important issue to be overcome as mentioned in Sect. respectively. yt )l L l = M − L +1 l = M − L +1 (9) where please be aware that unnormalized DP distance D (v. to make the DP distance inversely proportional to the number of sampled data suppresses the variation range of the DP distance and then it leads to equalization of thresholds. xt )l + cy · L ∑ D(y.. 2. In general. 5 and 6. the threshold should be common to all users. Based on this concept. vt )l is used. EERs in various numbers of partitions From these results. the equalized total (final) distance and the number of sampled data of the template of the user.0 17. final scores are compared with a threshold which is preliminary determined. 10 192 Biometrics Biometrics constant for adjusting the final distance to an appropriate value. therefore. 7. 4. the relation between the number of sampled data and the optimal threshold could be fitted by a nonlinear function.1 Equalization using linear approximation Assuming that the relation between the number of sampled data and the optimal threshold is approximated by a linear function. the final distance of the signature is enlarged. 4.. Relation between the number of sampled data and optimal thresholds The optimal thresholds are widely distributed. In addition. α and β are the gradient and intercept of the linear function. When the number of sampled data in a signature is too small. Fig.1 New threshold equalizing methods Figure 7 shows the relation between the number of sampled data in signatures (templates) and their optimal thresholds (total DP distances) using the spDP matching (Q = 4) in SVC2004. where the thresholds which bring EERs are regarded as optimal. The effect of the threshold equalizing was already confirmed (Nakanishi et al. The total DP distance is adjusted by using an exponential function as γ p TDeq = TD p (12) p exp(α · Tn + β) .2 Equalization using nonlinear approximation On the other hand. 2008).1. it is easy to guess that common use of a single threshold is not good for verification performance. 4.1. large number of sampled data in a signature reduces the final distance. Conversely. the relation between the number of sampled data and the optimal threshold is not simple differently from that assumed in the conventional equalization. the total DP distance is equalized as TDeq = p γ TD p p α · Tn + β (11) where γ is the adjustment constant as well as the conventional method. Conditions are the same as those in Sect. again. If the relation in the training data set is equivalent with that in the . The broken lines indicate approximation functions where α = 10 and β = −293 in the linear case and α = 0. However. The distribution of optimal thresholds after equalization is compared with that before equalization in Fig. 8. we evaluated verification performance using the SVC2004. The number of partitions in DP matching was 4. which is independent of a test data set.3 in the nonlinear case. it is better to determine them using a training data set.2 Evaluation of threshold equalizing In order to verify effectiveness of the threshold equalizing methods.DWT DomainSignature Verification DWT Domain On-Line On-Line Signature Verification 193 11 where α and β are constants for fitting the nonlinear function to the relation between the number of sampled data and the optimal threshold. the proposed equalizing methods are based on rough approximation of the relation between the number of sampled data and the optimal threshold in the SVC2004. 4. (a) Linear case (b) Nonlinear case Fig.4. Distribution of optimal thresholds before and after equalization in the linear and nonlinear approximation cases From a viewpoint of their universality.0069 and β = 3. 8 where the uncolored triangles are before equalization and the black ones are after equalization. 2. it is confirmed that the proposed spDP matching is more effective in improving verification performance than the general-used DP matching. Moreover. accumulative distances were also initialized and the total DP distance . and then a total DP distance was obtained by summing the sub DP distances.07 in the nonlinear case. the more universal the constants.27 but after the equalization it was reduced to 0. As confirmed in Fig. a sub DP distance was calculated every partition. The approximation depends on not training data sets but databases.54 0. EERs and variances in various methods are summarized in Table 2. EERs and variances in various methods Finally. we analyzed statistical variance of optimal threshold values before and after the equalization.23 0.22 0. γ was set to a value which adjusts the thresholds to around 2000.17 0. that is.12 194 Biometrics Biometrics test data set. Conclusions We have studied on-line signature verification in the DWT domain. It is a future problem to adopt other functions for approximating the relation between the number of sampled data and the optimal threshold.05 0. Especially. the DP distances were adjusted to around 2000 and the variation range of the DP distances was narrowed. On the other hand. The larger the number of data becomes. In both cases.6 14. EER(%) Variance 25. In order to improve the verification performance.0 16. the EER of about 15% may not be absolutely superior to those of other on-line signature verification methods.5 14. 8 (b). It is confirmed that the optimal thresholds. the smallest EER of 14.05 in the linear case and 0.4 20. the adjustment in the nonlinear case might be excessive when the number of sampled data was large. Similarly.6 19.05 were obtained when the threshold equalization using the linear approximation was applied. The sub DP distances were initialized at the start of next partitions. combining the spDP matching with the new threshold equalization is much more effective.0 19. Comparing the EER and variance in the 4-partitioned DP matching to those in the normalized DP one. For achieving quantitative evaluation.05 0. In the spDP matching. it is possible to introduce the spDP matching and/or the threshold equalizing into the methods based on the DP matching and it might also improve their performance.6% and variance of 0.07 5. The variance before the equalization was 0. However.9 19. we introduced spDP matching and threshold equalizing into the verification process. the proposed new threshold equalization methods are confirmed to be more efficient than the normalized DP matching and the conventional method. Method Unnormalized DP Normalized DP 4-partitioned DP Conventional equalization Linear equalization Nonlinear equalization 4-partitioned DP + Linear equalization 4-partitioned DP + Nonlinear equalization Table 2. two data series compared were divided into partitions.24 0. the proposed methods does not depend on the data used. therefore.9 0. On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey. Vol. Pattern Recognition. Y.. Proceedings of Workshop on Multimodal User Authentication. pp. Oct. Itoh. A. K. N. Lucchese. M. G. (2000). by approximating the relation between the number of sampled data in a signature and its optimal threshold by linear or nonlinear functions. Hongkong. Vol. On-Line Signature Verification Method Based on Discrete Wavelet Transform and Adaptive Signal Processing. Maltoni.1-11 Nakanishi. A. I. D. Also. Springer. K. London Nakanishi. Jun. S. Aug. P. 2000. S. Jan. Itoh. New York Jain. J. J. Y. & Fukui.207-214 Nakanishi. Threshold Equalization for On-Line Signature Verification. Y. & Fukui. Also. No. 35.. it was confirmed that each proposed method was efficient in improving the verification performance. Moreover. Proceedings of International Conference on Biometric Authentication. (2004). Jul. No. 2008. Bolle. & Fukui. 2004. S. A. 2004. On-Line Signature Verification Based on Subband Decomposition by DWT and Adaptive Signal Processing. Y. Santa Barbara. (2005). Itoh. N. Biometric Systems. On-Line Signature Verification. (2003). R.. Vol. No. F. Impedovo. (Eds. A. 179-184 Fierrez. Modugno. H. 2963-2972 Jain. I.. 6. (2005).. IEEE Trans. pp. Vol. References Dimauro. (1999). We have an issue that there might be more effective approximate functions for threshold equalization. On-Line Signature Verification. pp. & Ortega-Garcia. & Fukui. Springer. No. Kluwer Academic Publishers. it prevented the verification performance from being degraded by using a single common threshold for all signatures. IEICE Trans. USA.E91-A. pp. Y. Griess. On-Line Signature Verification Based on Discrete Wavelet Domain Adaptive Signal Processing. (2004).6. 584-591 Nakanishi. Y. G.. J. Itoh. Pattern Analysis and Machine Intelligence. S. (2002).8. Y. 2005. 88. (2007). Jain. 1. Massachusetts Wayman.. BIOMETRICS. pp. N. In the threshold equalizing. the spDP matching reduced false acceptance since limitation of combination in matching excluded inappropriate matching pairs. In: Handbook of Biometrics. I. 2244-2247 Plamondon. pp. D. Part 3. and A. 22.. & Srihari. Flynn. Recent Advancements in Automatic Signature Verification. Jain. I. 63-84 ... we evaluated signature’s complexity by using the number of sampled data but it is expected to use sub-band signals for evaluating the complexity. 2002. Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition. & Pirlo. D. Chapter 10.. Nishiguchi. (2008). pp. Electronics and Communications in Japan. Fundamentals. Nishiguchi. R. Y.. R. D.). Sakamoto. G. the variation range of optimal thresholds of all signatures were suppressed and as a result. & Maio. Dec. Nishiguchi. A... In experiments using a part of the signature database: SVC2004. It was effective in reducing false rejection.DWT DomainSignature Verification DWT Domain On-Line On-Line Signature Verification 195 13 was prevented from becoming excessively large. 12. Ross. & Pankanti.. Dec. & Connell. combining the spDP matching with the threshold equalizing was more effective and reduced the error rate by about 5% comparing with the general-used DP matching. N. 2003. 14 196 Biometrics Biometrics Sano, T.; Wada, N.; Yoshida, T. & Hangai, S. (2007). A Study on Segmentation Scheme for DP Matching in Japanese Signature Verification (in Japanese), Proceedings of 2007 IEICE General Conference, Mar. 2007, p. B-18-4 Strang, G. & Nguyen, T. (1997). Wavelet and Filter Banks, Wellesley-Cambridge Press, Massachusetts. SVC2004. (2004). URL: http://www.cse.ust.hk/svc2004/index.html Yoshimura, M. & Yoshimura, I. (1998). An Off-Line Verification Method for Japanese Signatures Based on a Sequential Application of Dynamic Programming Matching Method (in Japanese), IEICE Trans. Information and Systems, Vol. J81-D-II, No. 10, Oct. 1998, pp. 2259-2266 Part 3 Medical Biometrics 0 10 Heart Biometrics: Theory, Methods and Applications Foteini Agrafioti, Jiexin Gao and Dimitrios Hatzinakos The Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto 10 Kings College Road, Toronto Canada 1. Introduction Automatic and accurate identity validation is becoming increasingly critical in several aspects of our every day lives such as in financial transactions, access control, traveling, healthcare and other. Traditional strategies to automatic identity recognition include items such as PIN numbers, tokens, passwords and ID cards. Despite the wide deployment of such tactics, the authenticating means is either entity or knowledge-based which rises concerns with regard to their ease of acquisition and use from unauthorized third parties. According to the latest The US Federal Commission Report, Frebruary 2010 (n.d.), in 2009 identity theft was the number one complaint category ( a total of 721,418 cases of consumer complaints). As identity theft can take different forms, credit card fraud was the most prominent (17%), followed by falsification of government documents (16%), utilities fraud (15%), employment fraud (13%) and other. Among these cases, true-identity theft constitutes only a small portion of the complaints, while ID falsification appears to be the greatest threat. Unfortunately, the technology for forgery advances without analogous counterfeit improvements. Biometric recognition was introduced as a more secure means of identity establishment. Biometric features are characteristics of the human body that are unique for every individual and that can be used to establish his/her identity in a population. These characteristics can be either physiological or behavioral. For instance, the face, the iris and the fingerprints are physiological features of the body with identifying information. Examples of behavioral features include the keystroke dynamics, the gait and the voice. The fact that biometric features are directly linked with the users presents an extraordinary opportunity to bridge the security gaps caused by traditional recognition strategies. Biometric features are difficult to steal or counterfeit when compared to PIN numbers or passwords. In addition, the convenience by which a biometric feature can be presented makes the respective systems more accessible and easy to use. However, biometric recognition has a drawback that rises from the nature of the authenticating modality. As opposed to static PIN numbers or passwords, biometric recognition may present false rejection since usually no two readings of the same biometric feature are identical. Anatomical, psychological or even environmental factors affect the appearance of the biometric feature at a particular instance. For instance, faces may be presented to the recognizers under various expressions, different lighting settings or with 2 200 Will-be-set-by-IN-TECH Biometrics occlusion (glasses, hats etc). This may introduce significant variability (commonly referred to as intra-subject or intra-class variability) and the challenge is to design algorithms that can find robust biometric patterns. Although the intra-subject variability is universal for all biometric modalities, every feature has unique characteristics. For instance, face pictures may be acquired from distance which makes them suitable for surveillance. On the opposite, fingerprints need direct contact with the sensing device, and despite the robust biometric signature that there exists, most of the error arises from inefficient processing of the image. Therefore, given a set of standards it is difficult, if not impossible, to choose one feature that satisfies all criteria. Every biometric feature has its own strengths and weaknesses and deployment choices are based on the characteristics of the envisioned application environment. On the assumption that intra-subject variability can be sufficiently addressed with appropriate feature extraction, another consideration with this technology is the robustness to circumvention and replay attacks. Circumvention is a form of biometric feature forgery, for example the use of falsified fingerprint credentials that were copied from a print of the original finger. A replay attack is the presentation to the system of the original biometric feature from an illegitimate subject, for example voice playbacks in speaker recognition systems. Biometric obfuscation is another prominent risk with this technology. There are cases where biometric features are intentionally removed to avoid establishment of the true identity (for example asylum-seekers in Europe Peter Allen, Calais migrants mutilate fingerprints to hide true identity, Daily Mail (n.d.)). With the wide deployment of biometrics, these attacks are becoming frequent and concerns are once again rising on the security levels that this technology can offer. Concentrated efforts have been made for the development the next generation of biometric characteristics that are inherently robust to the above mentioned attacks. Characteristics that are internal to the human body have been investigated such as vein patterns, the odor and cognitive biometrics. Similarly, the medical biometrics constitutes another category of new biometric recognition modalities that encompasses signals which are typically used in clinical diagnostics. Examples of medical biometric signals are the electrocardiogram (ECG), phonocardiogram (PPG), electroencephalogram (EEG), blood volume pressure (BVP), electromyogram (EMG) and other. Medical biometrics have been actively investigated only within the last decade. Although the specificity to the individuals had been observed before, the complicated acquisition process and the waiting times were restrictive for application in access control. However, with the development of dry recoding sensors that are easy to attach even by non-trained personnel, the medical biometrics field flourished. The rapid advancement between 2001-2010 was supported by the fact that signal processing of physiological signals (or biosignals) had already progressed for diagnostic purposes and a plethora of tools were available for biometric pattern recognition. The most prominent strength of medical biometrics is the robustness to circumvention, replay and obfuscation attacks. If established as biometrics, then the respective systems are empowered with an inherent shield to such threats. Another advantage of medical biometrics is the possibility of utilizing them for continuous authentication, since they can provide a fresh biometric reading every couple of seconds. This work is interested in the ECG signal, however the concepts presented herein may be extended to all medical biometric modalities. 7. 4. More precisely. policemen. With this modality the recognizer can trivially ensure sensor liveness. This property is unique to medical biometrics and can be vital in avoiding threats such as field officer impersonation. To the best of our knowledge there is currently no means of falsifying an ECG waveform and presenting it to a biometric recognition system. Universality refers to the ability of collecting the biometric sample from the general population. if not impossible. The particular appearance of the ECG waveform is the outcome of several sympathetic and parasympathetic factors of the human body. from a biometrics perspective. When deployed for security in welfare monitoring environments. 5. Permanence refers to the ability of performing biometric matches against templates that have been designed earlier in time. the ECG is a dynamic biometric feature that evolves with time. Traditionally. a fresh reading can be obtained every couple of second to re-authenticate an identity. continuous authentication and data minimization. such the iris or the fingerprint require additional processing to establish the liveness of the reading. 6. Controlling the waveform or attempting to mimic somebody else’s ECG signal is extremely difficult. ECG offers natural liveness detection. Privacy intrusion is becoming increasingly critical in environments of airtight security. 1. being only present in a living subject.1. usually with complementary tests required in order to finalize a diagnosis. the ECG is affected by both physical and psychological activity. Methods 201 3 2. Since the ECG is a vital signal. Continuous authentication. permanence. While ECG signals of different individuals conform to approximately the same pattern. 3. liveness detection. this property is satisfied naturally. However. Obfuscation is also addressed naturally. This essentially requires that the signal is stable over time. As opposed to static iris or fingerprint images. field agent monitoring (fire-fighters. robustness to attacks. It is recorded non-invasively with electrodes attached at the surface of the body. Liveness detection. 2. however even though the specific local characteristics of the pulses may change. notable challenges are presented with this technology when envisioning large-scale deployment: . it has been demonstrated that the ECG has sufficient detail for identification. As it will be discussed later. uniqueness. Robustness to attacks. Uniqueness is guaranteed in the ECG signal because of its physiological origin. Other biometric modalities. Data minimization is a great possibility with ECG biometrics because there are environments where the collection of the signal is performed irrespective of the identification task. physicians use the ECG to gain insight on heart conditions. Despite the advantages. The advantages of using the ECG for biometric recognition can be summarized as universality. soldiers etc). ECG biometrics: Motivation and challenges The ECG signal describes the electrical activity of the heart over time. there is large inter-individual variability due to the various electrophysiological parameters that control the generation of this waveform. Examples of such environments are tele-medicine. Data minimization. Methods and Applications and Applications Heart Biometrics: Theory. Uniqueness will be further discussed in Section 3.Heart Biometrics: Theory. One way to address this problem is to utilize as less identifying credentials as possible. patient monitoring in hospitals. the overall diacritical waves and morphologies are still observable. because the atrial muscle mass is limited. 3. this is not the case with the ECG signal. the ECG signal may destabilize with respect to a biometric template that was constructed some time earlier. the ECG signal is studied for diagnostics even at the very early stage of a disease. The first recording device was developed by the physiologist Williem Einthoven in the early 20th century. The P wave describes the depolarization of the right and left atria. Traditionally. since it represents the depolarization of the right and left ventricles. The anatomic characteristics .. and pictures the sequential depolarization and repolarization of the different muscles that form the myocardium. Although cardiac disorders are not as a frequent damaging factor as injuries for more conventional biometrics (fingerprint. 4. this signal describes the electrical activity of the heart over time. Therefore.e. where the biometric information is available for capturing at any time instance. Electrocardiogram fundamentals The electrocardiogram (ECG) is one of the most widely used signals in healthcare. they can limit ECG biometric methods. and for this discovery he was rewarded with the Nobel Prize in Medicine in 1924 Sornmo & Laguna (2005). the iris or the fingerprint. The ECG signal may reveal current and past medical conditions as well as hints on the instantaneous emotional activity of the monitored individual. low frequencies. Recorded at the surface of the body. 2. Since then. Privacy implications. 3. a central aspect of the ECG biometrics research is the investigation of the sources of intra-subject variability. In essence. while its spectral content is limited to 10-15 Hz. When collecting ECG signals a large amount of sensitive information is collected inevitably. as well as the processing time. even in the absence of noise. The deployment of this signal in biometric recognition and affective computing is relatively young. Time dependency. Collection periods. the P wave. The duration of this complex is approximately 70-110 ms in a normal heartbeat. However. with electrodes attached in various configurations. As opposed to biometrics such as the face. i. face). which essentially means that longer waiting times are expected with this technology especially when long ECG segments are required for feature extraction. The challenge however is to minimize the number of pulses that the algorithm uses for recognition.e.. Thus. ECG became an indispensable tool in clinical cardiology. Cardiac Conditions. This wave usually has a positive polarity. with a duration of approximately 120 ms. being the heart chambers with substantial mass. the possibility of linking ECG samples to identities can imply catastrophic privacy issues.4 202 Will-be-set-by-IN-TECH Biometrics 1. The biometric challenge is therefore to design algorithms that are invariant to tolerable everyday ECG irregularities Agrafioti & Hatzinakos (2008a). The reason for this is the direct effect that the body’s physiology and psychology have on the cardiac function. The absence of a P wave typically indicates ventricular ectopic focus. The amplitude of this wave is relatively small. the ECG is available to physicians only. With time-varying biosignals there is high risk of instantaneous changes which may endanger biometric security. Recordings of the cardiac potential at the surface of the body are very prone to noise due to movements. Disorders can range from an isolated irregularity (Atria and ventricle premature contractions) to severe conditions which require immediate medical assistance. Every heart beat is formed within approximately a second. The QRS complex corresponds to the largest wave. Figure 1 shows the salient components of an ECG signal i. the QRS complex and the T wave. (2001). Simon & Eswaran (1997).g. 1. various methodologies have been proposed to eliminate the differences among ECG recordings. 2000). The electrical map of the area surrounding the heart may also be affected by variations of other organs in the thorax Hoekema et al. Hoekema et al. Kozmann et al. healthy ECG signals from different people conform to roughly the same repetitive pulse pattern. of the QRS complex depend on the origin of the pulse. However.. e. Green et al. Automatic diagnosis . are sources of significant variability among individuals Hoekema et al. (1985). and is usually observed 300 ms after this larger complex. Kozmann et al. The inter-individual variability of ECG has been extensively reported in the literature Draper et al. except for the anatomic idiosyncrasy of the heart. Model studies have shown that physiological factors such as the heart mass orientation. (1989. In fact. appearing closer to the QRS waves at rapid heart rates. and torso shape designate ECG signals with particularly distinct and personalized characteristics. Other factors affecting the ECG signal are the timing of depolarization and repolarization and lead placement. In addition. (1964). unique patterns have been associated to physical characteristics such as the body habitus and gender Green et al. and is mostly concentrated in the interval of 10-40 Hz. further investigation of a person’s ECG signal can reveal notably unique trends which are not present in recordings from other individuals. Furthermore. the conductivity of various areas and the activation order of the heart.1 Inter-individual variability This section will briefly discuss the physiological rationale for the use of ECG in biometric recognition. the ECG depicts various electrophysiological properties of the cardiac muscle. The idea of clearing off the inter-individual variability is typical when seeking to establish healthy diagnostic standards Draper et al. (2001). Due to its steep slopes. Main components of an ECG heart beat. However. 3. Hoekema et al.Heart Biometrics: Theory. (1964). (2001). (1989. Kozmann et al. 2000). Finally. geometrical attributes such as the exact position and orientation of the myocardium. Overall. (2000). (2006). Larkin & Hunyor (1980). compared to the QRS complex. the spectrum of a QRS wave is higher compared to that of other ECG waves. its precise position depends on the heart rate. Methods 203 5 R T P Q S Fig. It has a smaller amplitude. Pilkington et al. the T wave depicts the ventricular repolarization. (2001). More specific. (1985). Methods and Applications and Applications Heart Biometrics: Theory. such as the temporal or amplitude difference between consecutive fiducial points. A holistic approach in this case would then be to analyze the facial image globally.6 204 Will-be-set-by-IN-TECH Biometrics Fig. where one can operate locally and extract biometric features such the distance between the eyes or the size of the mouth. personalized parameters of every subject are treated as random variables and a number of criteria have been defined to quantify the degree of subjects’ similarities on a specific feature basis. 4. Fiducials are specific points of interest on an ECG heart beat such as the ones shown in Figure 1. fiducial based approaches rely on local features of the heart beats for biometric template design. Both approaches have advantages and disadvantages. Figure 2 shows an example of aligned ECG heart beats which belong to the same individual. The challenge in the later case. Therefore. 2. Related research Prior works in the ECG biometric recognition field can be categorized as either fiducial points dependent or independent. Even . fiducial points independent approaches treat the ECG signal or isolated heart beats holistically and extract features statistically based on the overall morphology of the waveform. detecting fiducial points is a very obscure process due to the high variability of the signal. is remove this information in a way that the intra-subject variability is minimized and the inter-subject is maximized. holistic approaches deal with a large amount of redundant information that needs to be eliminated. On the other hand. Variability surrounding the QRS complex among heart beats of the same individual. For the ECG case. While fiducial oriented features risk to miss identifying information hidden behind the overall morphology of the biometric. of pathologies using the ECG is infeasible if the level of variability among healthy people is high Kozmann et al. (2000). This distinction has a direct analogy to face biometric systems. In such algorithms. The proposed system employed only temporal features. This work presented the three clear stages of ECG biometric recognition i. A filter was first applied to retain signal information in the band 1. These features pictured local characteristics of the pulses. (2001). also proposed fiducial based features for ECG biometric recognition. By targeting to keep discriminative information while applying a stable filter over the gallery set. While the first step managed to correctly identify only 85% of the cases. Kyoso & Uchiyama (2001). Shen et al. such as the QRS complex and T wave amplitudes. In 2002.e. Methods 205 7 though the QRS complex is perfectly aligned there is significant variability surrounding the P and the T wave.. feature extraction and classification. . The subject with the smallest Mahalanobis distance between each two of the four feature parameters was selected as output. 5.e. spectral differencing and Fourier band-pass filter. electrode placement and physical stress.1. The highest identification rate achieved was close to 100% which generally established the ECG signal as a biometric feature that is robust to heart rate variability. QRS complex and QT durations.Heart Biometrics: Theory. preprocessing. Fiducial based approaches Among the earliest works in the area is Biel et al. by Israel et al. A comparison is also provided in Tables 1 and 2. (2005) such as. and thus is appropriate for ECG biometric recognition. there is no universally acknowledged rule that can guide this detection Hoekema et al. only 12features were retained for matching by inspection of the correlation matrix. This section provides an overview of fiducial dependent and independent approaches to ECG biometrics that are currently found in the literature. in 2001. the P wave duration. four feature parameters were selected i. Methods and Applications and Applications Heart Biometrics: Theory. (2005). The standard 12 lead system was used to record signals from 20 subjects of various ages. More complete biometric recognition tests were reported in 2004. The underlying idea was that this wave is less affected by varying heart rates. In fact. different filtering techniques were examined to conclude to a local averaging. Special experimentation was carried out to test variations due to lead placement in terms of the exact location and the operators who place the electrodes..2% for using just the QRS and QT intervals.40 Hz and discard the rest of the spectral components which were attributed to noise. Results of combining different features were compared to demonstrate that the best case classification rate was 100% with just 10 features. (2001) proposal. which demonstrated the feasibility of using ECG signals for human identification. the neural network resulted in 100% recognition. (2002) reported an ECG based recognition method with seven fiducial based features defined based on the QRS complex. The proposed methodology encompassed two steps: During a first step. for a fiducial feature extraction algorithm. template matching was used to compute the correlation coefficient among QRS complexes in the gallery set in order to find possible candidates and prune the search space. This feature set was subsequently fed to SIMCA for training and classification. The highest reported performance was 94. A decision based neural network (DBNN) was then formed to strengthen the validation of the identity resulting from the first step. P wave duration and other. These features were identified on the pulses by applying a threshold to the second order derivative. In addition. rendering the localization of these waves’ onsets and offsets very difficult. Overall. Out of 30 clinical diagnosis features that were estimated for each of the 12 leads. a variety of experimental settings are described in Israel et al. PQ interval. A classification method based on BayesŠ Theorem was proposed to maximize the posterior probability given prior probabilities and class-conditional densities. An identification rate of 97. It was also reported that the method was robust to noise for SNR above 20 dB. together with the nearest neighbor classifier. followed by PCA and LDA for dimensionality reduction.5% on the normal beats of 13 subjects from the MIT Arrhythmia database. investigated different models. (2005) in 2005. (2010). The Fourier transform was applied on a heart beat itself and all three sub-sequences. The proposed system was comprised of two steps as follows: First the FLDA and nearest neighbor operated on the features and then a DTW classifier was applied to additionally boost the performance (100% over a 12-subject database).6% was achieved over recordings of 10 individuals. (2009). The Log-likelihood score was used to compare the estimated model of a testing ECG to that of the enrolled templates. A neural network with radial basis function was realized as the classifier and the recognition rate was between 62% to 94% for different subjects. Another fiducial based method was proposed by Tawfik et al.5% to 13% depending on the particular lead that was used. In addition. suggested 14 commonly used features from ECG heart beats on which a PCA was applied to reduce dimensionality. In addition the onset. The proposed method outperformed Mahalanobis’ distance by 3. the P wave was excluded when calculating the features since it disappears when heart rate increases. (2005). The system’s accuracy was 99% as tested over 25 subjects. HMM-SGM (Hidden markov model with single Gaussian model) and CRF (Conditional Random Field). Zhang & Wei (2006). The spectrum were then passed to a neural network for classification. only features related to QRS complex were selected due to their robustness to heart rate variability. QRS complex and T wave. peak and offset of the P wave were detected by tracing the signal and examining the zero crossings in its first derivative. Ting & Salleh (2010). Boumbarov et al. In addition to commonly used features within QRS complex. By examining the ECG signal within a preset window before Q wave onset and apply a threshold to its first derivative.85%) by using the three sub-sequences instead of the original heart beat.14% to 2. Singh & Gupta (2008).8 206 Will-be-set-by-IN-TECH Biometrics A similar approach was reported in the same year by Palaniappan & Krishnan (2004). which is a measure of signal complexity. proposed a method to normalize time domain features by Fourier synthesizing an up-sampled ECG heart beat. a form factor. Another work that addressed the heart rate variations was by Saechia et al. was proposed an tested as input to a neural network classifier. The heart beats were normalized to a the healthy durations and then divided into three sub-sequences: P wave. For verification. The reported identification rate was 87. In 2009. the ECG segment between the QRS complex and the T wave was first extracted and normalized in the . described in 2010 a nonlinear dynamical model to represent the ECG in a state space form with the posterior states inferred by an Extended Kalman Filter. to determine different fiducial points in an ECG segment. In this work. It was shown that false rate was significantly lower (17. The Dynamic time warping or FLDA were used in Venkatesh & Jayaraman (2010). Same two-stage setting was applied together with a threshold and the reported performance was 96% for 12 legitimates and 3 intruders. Kim et al. proposed a way to delineate the P and T waveforms for accurate feature extraction. such as HMM-GMM (Hidden markov model with Gaussian mixture model). With this strategy. the performance improved significantly when the testing subjects were performing physical activities. the precise position of P was revealed. by training a MLP-BP with 30 hidden units. 5% corresponding to a 2. For classification. the authors suggested the AC of an ECG segment as a way to avoid fiducial points detection. Among the earliest is Plataniotis et al. (2006) proposal for an autocorrelation (AC) based feature extractor. Fiducial independent approaches On the non-fiducial methodologies side the majority of the works were reported after 2006. proposed the use of DWT on heuristically isolated pulses.2% and 2. reported ECG signal collection from the fingers by asking the participants to hold two electrode pads with their thumb and index finger. The author also pointed . The overall recognition rate of the system was 99% for 74 subjects. However. Methods and Applications and Applications Heart Biometrics: Theory. for which multiple ECG recordings were available. Wübbeler et al. Furthermore. The method was tested on 14 subjects. The estimated version was produced by a morphological synthesis algorithm involving a modified dynamic time warping procedure.Heart Biometrics: Theory.Chiu et al. the dimensionality of a segment from the autocorrelation was considerably high for cost efficient applications. a new recording session was conducted on several misclassified subjects which improved the system performance to 95%. A verification functionality was also designed by setting a threshold on the distances.1%.embeds highly discriminative information in a population. The highest reported performance was 98% with a 2 In 2008.727% for that of Framingham correction formula and 98.18% for that with constant QT interval. Chan et al. acquired a few years apart. input signals were rejected. When the proposed method was applied to a database of 35 normal subjects. 6. To locate and extract pulses a thresholding procedure was applied. The DWT was used for feature extraction and the Euclidean distance as the similarity measure. More precisely. The identification rate was 97. Methods 207 9 time domain by using Framingham correction formula or assuming constant QT interval.09%. (2007). (2008). a 100% verification rate was reported. The DCT was then applied and the coefficients were fed into a neural network for classification. have also reported an ECG based human recognizer by extracting biometric features from a combination of Leads I. using only the QRS complex without any time domain manipulation yielded a performance of 99. The reported false acceptance and rejection rates were 0. With the objective of capturing the repetitive pattern of ECG. The Wavelet distance (WDIST) was used as the similarity measure with a classification accuracy of 89. depending on the original sampling frequency of the signal. An ECG heartbeat was normalized and compared with its estimate which was constructed from itself and the templates from the claimed identity. II and III i. which outperformed other methods such as the percent residual distance(PRD) and the correlation coefficient(CCORR). (2007). as 43 samples backward and 84 samples forward from every R peak. while in any other case.e. In the same year.8% equal error rate (EER). The identification performance was 100%. It was demonstrated that the autocorrelation of windowed ECG signals. To reduce the dimensionality and retain only useful for recognition characteristics. (2008).. Furthermore. Authenticated pairs were considered those which were adequately related. every heart beat was determined on the ECG signal. the discrete cosine transform (DCT) was applied. a two dimensional heart vector also known as the characteristic of the electrocardiogram. A methodology for ECG synthesis was proposed in 2007 by Molina et al. the distance between two heart vectors as well as their first and second temporal derivatives were calculated. The Euclidean distance was used as a similarity measure and a threshold was applied to decide the authenticity. applied the discrete wavelet transform (DWT) and independent component analysis (ICA) on ECG heart beat segments to obtain 118 and 18 features respectively (the feature vectors were concatenated). An SVM with Guassian radial basis function was used for classification with a decision level fusion of the results from the two leads. The reported classification accuracy was 100% on a 19-subject database in presence of emotional state variation. followed by a process wherein every heart beat was resampled. and the Higuchi chaotic dimension. The dimensionality of the feature space was reduced from 136 to 26 using a PCA which retained 99% of the data’s variance. . they were not as good descriptors for biometric recognition. Fatemian & Hatzinakos (2009). Shanon Entropy. The hermite polynomial expansion was employed to transform heart beats into Hermite polynomial coefficients which were then modeled by an SVM with a linear kernel. Another observation was that even though dynamic features such as the R-R interval proved to be beneficial for arrhythmia classification. isolated the heart beats and performed 8-bit uniform quantization to map the ECG samples to strings from a 256-symbol alphabet.37% ERR was achieved for verification and a 99% identification rate. normalized. also suggested the Wavelet transform to denoise and delineate the ECG signals. ApEn. Classification was based on finding the template in the gallery set that results in the shortest description length of the test input (given the strings in the particular template) which was calculated by the Ziv-Merhav cross parsing algorithm.10 208 Will-be-set-by-IN-TECH Biometrics out that false rate would increase if 10 subjects with cardiac arrhythmia were included in the database. Li & Narayanan (2010). aligned and averaged to create one strong template per subject. Tables 1 and 2 provide a high level comparison of the works that can currently be found in the literature. (2010). (2010). Dimensionality reduction was based on Kullback-Leibler divergence where a feature is selected only if the relative entropy between itself and the nominal model (which is the spectrogram of all subjects in database) is larger than a certain threshold. A correlation analysis was directly applied to test heart beats and the template since the gallery size was greatly reduced. Ye et al.6% was achieved for normal heart beats. Log-likelihood ratio was used as a similarity measure for classification and different scenarios were examined. The highest reported performance was 98. Cepstral features were extracted by simple linear filtering and modeled by GMM/GSV(GMM supervector). The ECG signal was segmented with 50% overlap and an AR model of order 4 was estimated so that its coefficients are used for classification. the mean PSD of each segment was concatenated as add-on features which increased the system performance to 100% using a KNN classifier. Coutinho et al. Autoregressive modeling was used in Ghofrani & Bostani (2010). Furthermore. Rank-1 classification rate of 99.26% with a 5% ERR corresponding to a score level fusion of both temporal and cepstral classifiers. a 0. For enrollment and testings over the same day. the performance was e 5. (2010) to transform the ECG into a set of time-frequency bins which were modeled by independent normal distributions.9% respectively. For different days. The reported recognition rate was 99. proposed a method to model the ECG signal in both the temporal and cepstral domain. The proposed method outperformed the state-of-art fractal-based approach such as the Lyapunov exponent.6% for a setting were every subject has 2 templates in the gallery. The Spectogram was employed in Odinaka et al.58% ERR and 76. 3. Similarly. The 5 sec duration has been chosen experimentally.. The autocorrelation (AC) is computed for every 5 sec ECG using: R xx (m) = N −| m |−1 i =0 ∑ x (i ) x (i + m ) (1) where x (i ) is an ECG sample for i = 0. and N is the length of the signal. ECG sensors located on the smart phone. Out of R xx only a segment φ(m). false acceptance can be controlled by learning the patterns of possible attackers. the verification decision is performed with a threshold on the distance of the two feature vectors. This results into a number of AC segments Φ(m) against which an input AC feature vector φinput (m) is learned. Therefore for smart phone based verification the objective is to optimally reduce the within class variability while learning patterns of the general population. enrollment or verification. defined by the zero lag instance and extending to approximately the length of a QRS complex1 is used for further processing. Essentially. every user can use his/her phone in one of the two modes of operation i. 1 A QRS complex lasts for approximately 100 msec . such as human verification on smart phones.. given a generic dataset (which can be anonymous to ensure privacy) the autocorrelation of every 5 sec ECG segment is computed using Eq. thus utilizing only this segment for discriminant analysis. is a fiducial points independent methodology originally proposed in Agrafioti & Hatzinakos (2008b) and later expanded to cardiac irregularities in Agrafioti & Hatzinakos (2008a). makes the system robust to the heart rate variability. This method..M. The AC/LDA algorithm The AC/ LDA. and also allows to capture the ECG’s repetitive property. as it is fast enough for real life applications. m = 0. 1. Methods and Applications and Applications Heart Biometrics: Theory. Block diagram of the AC/LDA algorithm. shows a block diagram of the AC/LDA algorithm. 1. Methods 209 11 Filter ECG AC LDA Classification ID Templates Fig.e..( N − | m| − 1). In a distributed system.. 7. against which a new enrollee will be learned. first acquire an ECG sample from a subject’s fingers and then a biometric signature is designed and saved. Figure 3. x (i + m) is the time shifted version of the ECG with a time lag m. Although verification performs one-to-one matches. relies on a small segment of the autocorrelation of 5 sec ECG signals. More specific. 1.Heart Biometrics: Theory. during verification a newly acquired sample is matched against the store template. This is because this wave is the least affected by heart rate changes. The reader should note that this ECG window is allowed to cut the signal even in the middle of pulse. During enrollment. and it does not require any prior knowledge on the heart beat locations. This can be done with LDA training over a large generic dataset. (2006) Analyze the autocorrelation of ECGs for feature extraction and apply DCT for dimensionality reduction Wübbeler et al. (2007) Utilize the characteristic vector of the electrocardiogram for fiducial based feature extraction out of the QRS complex Molina et al. Performance Number of Subjects 9 94.2% 100% 20 100% 20 100% 29 97. (2007) Morphological synthesis technique was proposed to produced a synthesized ECG heartbeat between the test sample and template Wavelet distance measure was introduced to test the similarity between ECGs Chan et al. Plataniotis et al.Biometrics Method Principle Kyoso & Uchiyama (2001) Analyzed four fiducial based features from heart beats. (2005) Analyze fiducial based temporal features under various stress conditions Palaniappan & Krishnan (2004) Use two different neural network architectures for classification of six QRS wave related features Kim et al.% 10 N/A 10 97.15% 20 97. (2001) Use a SIEMENS ECG apparatus to record and select appropriate medical diagnostic features for classification Shen et al. (2005) By normalizing ECG heartbeat using Fourier synthesis the performance under physical activities was improved Saechia et al.4% 502 100% 14 99% 74 98% 10 95% 50 210 . (2005) Examined the effectiveness of segmenting ECG heartbeat into three subsequences Zhang & Wei (2006) Bayes’ classifier based on conditional probability was used for identification and was found supurior to Mahalanobis’ distance. to determine those with greater impact on the identification performance Biel et al. Summary of related to ECG based recognition works. (2008) Table 1. (2002) Use template matching and neural networks to classify QRS complex related characteristics Israel et al. 09% 22 99. (2010) Examined the system performance when using normalized QT and QRS and using raw QRS Ye et al.211 Method Principle Performance Heart Biometrics: Theory. (2010) Time frequency analysis and relative entropy to classify ECGs Venkatesh & Jayaraman (2010) Apply dynamic time warping and Fisher’s discriminant analysis on ECG features Tawfik et al.6% 36 normal and 112 arrhythmic 100% 19 100% 12 98. (Continued) Summary of related to ECG based recognition works .1% 9 87. (2010) Treat heartbeats as a strings and using Ziv-Merhav parsing to measure the cross complexity Ghofrani & Bostani (2010) Autoregressive coefficient and mean of power spectral density were proposed to model the system for classification model the system for classification Fusion of temporal and cepstral features Li & Narayanan (2010) 99% 99. (2010) Applied Wavelet transform and Independent component analysis.6% Number of Subjects 25 13 86. together with support vector machi as classifier to fuse information from two leads Coutinho et al.9% 269 100% 15 99.3% 18 Table 2. (2009) Neural network with radial basis function was employed as the classifier Ting & Salleh (2010) Use extended Kalman filter as inference engine to estimate ECG in state space Odinaka et al. Methods and Applications Singh & Gupta (2008) A new method to delineate P and T waves Fatemian & Hatzinakos (2009) Less templates per suject in gallery set to speed up computation and reduce memory requirement Boumbarov et al.5% 13 76. Experimental performance Preprocessing of the signals is a very important because ECG is affected by both high and low frequency noise. 9. For the 16 volunteers that two recordings were available. it does not affect the overall waiting time of the verification. and a new LDA was trained for every enrollee. Overall. This is because during enrollment. every device can be "tuned" to the expected variability of the user.comm. During the collection.Φ C (m). with the Vernier ECG sensor. ECG signal collection The performance of the framework discussed in Section 7 was evaluated over ECG recordings collected at the BioSec. Furthermore. for the generic dataset. During the first session. Φ2 (m). two recording sessions took place. longer ECG recordings can be acquired. The training set will then involve C + 1 classes as follows: (2) Φ(m) = [ Φ1 (m). The wrists were selected for this recording so that the morphology of the acquired signal can resemble the one collected by a smart phone from the subject’s fingers.Lab2 . scheduled a couple of weeks apart. a feature vector is projected using: Yi (k) = Ψ T Φ i (m) (3) where eventually k << m and at most C. at the University of Toronto. in order to allow for mental state variability to be captured in the data.. the subjects were given no special instructions. centered between 0. This can be done during enrollment.ca/∼biometrics/ . the autocorrelation was computed according to Eq. The performance each recognizer was then tested with matches the respective subject’s recordings in the test set. 52 healthy volunteers were recorded for approximately 3 min each. Each of the enrollees’ recordings were appended to the generic dataset individually. The experiment was repeated a month later for 16 of the volunteers. so that multiple segments of the user’s biometric participate in training.14 212 Will-be-set-by-IN-TECH Biometrics Let the number of classes in the generic dataset be C. the enrollment records and the testing ones.. The longer the training ECG signal the lower the chances of false rejection. since this is only required in the enrollment mode of operation. Essentially. rather than imposing universal distance thresholds for all enrolles. An advantage of distributed verification is that smart phone can be optimized experimentally for the intra-class variability of a particular user. 8.utoronto. by choosing the smallest distance threshold at which an individual is authenticated. in order to investigate the permanence of the signal in terms of verification performance. Φ input (m)] and for every subject i in C + 1 . Given Φ(m). 1. The recordings of the 36 volunteers who participated to the experiment only once were used for generic training. The sampling frequency was 200Hz. the earliest ones were used for enrollment and the latter for testing. LDA will find a set of k feature basis vectors {ψv }k =1 by maximizing the ratio of v between-class and within-class scatter matrix. For this reason. After filtering. Given the transformation matrix Ψ. a butterworth bandpass filter of order 4 was used. a number of Ci AC vectors are available. The signals were collected from the subject’s wrists.5 Hz and 40Hz based on empirical results. To estimate the False Acceptance 2 http://www. Methods 213 15 100% 90% 80% 70% 60% 50% 40% False Acceptance Rate False Rejection Rate Error Ra ates EER = 32 % 30% 20% 10% 0% 0 0.15 0.17 0.2 0.167 0. 5.Heart Biometrics: Theory.35 0.4 Distance Threshold Fig. .1 0.173 Distance Threshold Fig. 40% False Acceptance Rate False Rejection Rate 30% Error Rate es 20% EER = 14% % 10% 0 0. 4.05 0. False Acceptance and Rejection Rates when imposing universal recognition thresholds. False Acceptance and Rejection Rates after personalization of the thresholds.3 0.171 0.169 0.168 0. Methods and Applications and Applications Heart Biometrics: Theory.25 0.172 0. F. This subset of recordings is unseen to the current LDA. M. Velchev. O.e. . When verification is performed locally on a smart device. C. However. subjects who did not participate in the generic pool). The Equal Error Rate (EER) i. Wavelet distance measure for person identification using electrocardiograms. acted as intruders to the system. (2008). Chiu. International Conference on Multimedia and Ubiquitous Engineering.. Agrafioti. D. Badre.. the ECG. ECG biometric analysis in cardiac irregularity conditions. Malta. establish the ECG in the biometrics world and render its use in human recognition very promising. IEEE Transactions on 57(2): 248 –253. L. 446 –451. Symp. a number of approaches have been developed to address the challenges of designing templates that are robust to the heart rate variability. when aligning the individual ROC plots of the enrolles. IEEE Int. with a significant number of subjects exhibiting EER between 0%.. Pettersson. Instrumentation and Measurement. ECG personal identification in subspaces using radial basis neural networks. Y. P. the unknown population. on Instrumentation and Measurement 50(3): 808–812. C.e. Fusion of ECG sources for human identification. O.16 214 Will-be-set-by-IN-TECH Biometrics Rate. Philipson. S.. A. Chuang. (2001). 3rd Int. Therefore. This performance is generally unacceptable for a viable security solution.. Signal. F. pp. ECG analysis: a new approach in human identification. (2009). D. one can take advantage of the fact that every card can be optimized to a particular individual. ECG reflects the cardiac electrical activity over time. the remaining enrolles (i. & Badee. 11. Hamdy. 1863–1703. 10. and thus constitutes. Biel. Conclusion This chapter discussed on one of the most extensively studied medical biometric feature.5%. More details pertaining to this result can be found in Gao et al. It is crucial however to perform large scale tests that can generalize the current performance as well as to address the privacy implications that arise with this technology. pp. permanence and uniqueness. 1542–1547. V. & Sokolov. C. L. and presents significant intra subject variability due to electrophysiological variations of the myocardium. Boumbarov. A novel personal identity verification approach using a discrete wavelet transform of the ECG signal. & Hatzinakos. (2008b).. A. The reported performance of the AC/LDA approach as well as of other methodologies in the literature. Image and Video Processing pp. (2008a). (2008). Chan. IEEE Trans. on Communications Control and Signal Processing. Figure 5 demonstrates the error distribution for the case of personalized thresholds. & Hsu. (2011). 201 –206. Workshop on Intelligent Data Acquisition and Advanced Computing Systems. & Wide.. the rate at which false acceptance and rejection rates are equal is 32%. The EER drops on average to 14%. pp. as it was argued there are significant advantages of using ECG for human identification such as universality. This treatment controls the false rejection since the matching threshold is "tuned" to the biometric variability of the individual. References Agrafioti. C.. Figure 4 demonstrates the tradeoffs between false acceptance (FA) and rejection (FR) when the same threshold values are imposed for all users. & Hatzinakos. (2010). Kozmann. Springer New York. G. Development of an ECG identification system. A. & Lee. S. & Green. (2001). Calais migrants mutilate fingerprints to hide true identity. (1985). Pattern Recognition 38(1): 133–142. A robust human identification by normalized time-domain features of Electrocardiogram. 1114 –1117. pp. & Green. C. 569 – 572.uk/news/worldnews/article-1201126/Calais-migrantsmutilate-fingertips-hide-true-identity. (2010). on Pattern Recognition.. L. Bruekers. R. Presura. S. Mohammadzade. J. International Conference on Signal Processing and Communications. pp. Fred. R. I.. Sheffield. Li. pp. Poland. D. & Hatzinakos. (2010). & Rohrbaugh. Odinaka. H. Peffer. Geometrical factors affecting the interindividual variability of the ECG and the VCG. M. (2001).. H. C. M. D. Stallmann. R. K. J. H.html. N. Precordial voltage variation in the normal electrocardiogram. Plataniotis. (2010). A. Morphological sythesis of ECG signals for person authentication.. G. (2007). F. Larkin. S. (2005). (1964). 16th International Conference on Digital Signal Processing. H. Palaniappan. J. 1 –5. A new ECG feature extractor for biometric recognition. The corrected orthogonal electrocardiogram and vectorcardiogram in 510 normal men (frank lead system). M. 20th Int. Geometrical aspect of the interindividual variaility of multilead ECG recordings. M. & Figueiredo.. & Narayanan. Molina.. Ghofrani. Wiederhold. D... Robust ECG biometrics by fusing temporal and cepstral information. Green. and body habitus on qrs and st-t potential maps of 1100 normal subjects. C.... IEEE Trans. A. K. 17th Iranian Conference of Biomedical Engineering. Methods and Applications and Applications Heart Biometrics: Theory. pp. pp. L. & Hatzinakos. A. P. S. T. L.. Littmann. Kim. Ecg biometrics: A robust short-time frequency analysis. (2006). 23rd Annual International Conference of the Engin. Hoekema. F. (1989). Barr. D. A... & Hunyor. 1326 –1329.Heart Biometrics: Theory. K. & Pipberger. J. Williams. T. Cheng. (2011). G. Speech and Signal Processing (ICASSP)... Conf. ECG biometric recognition without fiducial detection. http://www. & Uchiyama. Effects of age. 20th International Conference on Pattern Recognition.d.. S. Pilkington. R. B. Conf. D.. G.. Gao. Fatemian. Conf on Eng. Electrocardiology 13: 347–352. Circulation 85: 244–253. pp. R. sex. Israel. J. Kyoso. D. 1 –6. (2000). Damstra.. Conf. C.. Lux. Kozmann. & Bostani.. D. R. O’Sullivan. on Acoustics. Irvine. pp. S. Kristjansson. Draper. L. 3858 –3861. Agrafioti. M. Proc. ECG to identify individuals. Eng. & Krishnan.Uijen & van Oosterom. J. Methods 215 17 Coutinho. M. G.. H. S. Electrocardiology 33: 219–227. 1 –6.dailymail. (2006). . & Burgess. F. Biomed.. Daily Mail (n. Sirevaag. & Wiederhold. & Koo. Circulation 17: 1077–1083. & van der Veen. Kaplan. of Biometrics Symposiums (BSYM). Sources of variability in normal body surface potential maps.g in Medicine and Biology Society. USA. in Medicine and Biology Society. H. Yoon.co. M. Hunt. Effect of conductivity interfaces in electrocardiography. Lux. One-lead ECG-based personal identification using Ziv-Merhav cross parsing. 48: 551–559. R. A. J. & Rogers. 30: 637–643. Circulation 30: 853–864.... (1980). Lai.). 27th Annual Int. (2004).. J. (2005). Peter Allen. Kim. Baltimore.-H. 15th European Signal Proc. Maryland. M. Hatzinakos. Reliable features for an ECG-based biometric system. IEEE Int. S. A. H.. E. (2009). R.. IEEE International Workshop on Information Forensics and Security. Identifying individuals using ECG beats. L. J. S.. ECG for blind identity verification in distributed systems. Lux. 755 –759. (2005). Koseeyaporn. Proc. 1. Simon. 774 –777. TENCON 2006. 1 –4. A new ECG identification method using bayes’ teorem. M. 62–63. Human identification system based ECG signal. C. ECG to individual identification. S. B.. Y. nd IEEE Int. pp.d. (2010). One-lead ECG for identity verification. http://www.gov /sentinel/reports/sentinel-annual-reports/sentinel-cy2009.. Zhang. S. Selim. N. 3838 –3841. H. in Med. D. & Wei. Venkatesh. Investigation of human identification using two-lead electrocardiogram (ECG) signals. on Biometrics: Theory Applications and Systems. 7th International Symposium on Communication Systems Networks and Digital Signal Processing. D. 1 –4. Eng. pp.). . Frebruary 2010 (n. 20th International Conference on Pattern Recognition (ICPR). S. Shen. B. Applications and Systems 2. H. (1997). M. C. P. Human identification using time normalized QT signal and the QRS complex of the ECG.. Society and the Biomed. Kreiseler.. (2010). Verification of humans using the electrocardiogram. and Bio. & Salleh. M. pp. Vol. 1 –8. M. (2010). & Gupta. L.. H. The US Federal Commission Report. Ting. & Laguna. 28(10): 1172–1175. Lett. & Eswaran. T. (2007). C. C. Y. Elsevier. J. & Kamal. T. Singh. of the IEEE Eng. ECG based personal identification using extended kalman filter. Sornmo. on Biometrics: Theory.ftc. Tompkins. 10th International Conference on Information Sciences Signal Processing and their Applications. W. TENCON 2005..pdf. G. Biomed. & Jayaraman. & Hu. (2010).. pp. & Kumar. W. P. pp. Bousseljot. Human electrocardiogram for biometrics using DTW and FLDA. Coimbra. pp. pp. Pattern Recogn. 4th Int. (2002). Ye. 30: 257–272. (2005). P. (2006). J. Comput. Conf. P. Z. (2008). & Wardkein. An ECG classifier designed using modified decision based neural network. of the 2nd Conf. Tawfik. Res. Wübbeler. & Elster. Society. Stavridis.18 216 Will-be-set-by-IN-TECH Biometrics Saechia. Conf. R. Bioelectrical Signal Processing in Cardiac and Neurological Applications. . (2006. Prabhakar et al. like fingerprint. are nowadays used in widely deployed systems. Traditional authentication methods fall into two categories: proving that you know something (i. Biometrics offers a natural solution to the authentication problem. A solution to these problems comes from biometric recognition systems. Today. Their system is based on the frequency analysis. Whether we need to use our own equipment or to prove our identity to third parties in order to use services or gain access to physical places. These sounds. 2004). therefore it has properties that a surrogate representation. (2003)). like a password or a token. called S1 and S2. stolen. iris and face. that can be lost.e. token-based authentication). to limit their shortcomings or to enhance their performance. The strength of a biometric system is determined mainly by the trait that is used to verify the identity. Introduction Identity verification is an increasingly important process in our daily lives. in which the authors proposed and started exploring this idea. These methods connect the identity with an alternate and less rich representation. are extracted from the input .. of the sounds produced by the heart during the closure of the mitral tricuspid valve and during the closure of the aortic pulmonary valve. password-based authentication) and proving that you own something (i. The aim of this chapter is to introduce the reader to the usage of heart sounds for biometric recognition. With biometric systems. simply cannot have (Jain et al. we are constantly required to declare our identity and prove our claim. for instance a password. The usage of heart sounds as physiological biometric traits was first introduced in Beritelli & Serrano (2007). Elettronica ed Informatica (DIEEI) University of Catania Italy 1.e. as it contributes to the construction of systems that can recognize people by the analysis of their anatomical and/or behavioral characteristics. or shared. one of the most important research directions in the field of biometrics is the characterization of novel biometric traits that can be used in conjunction with other traits. by means of the Chirp z-Transform (CZT). describing the strengths and the weaknesses of this novel trait and analyzing in detail the methods developed so far and their performance.0 11 Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Francesco Beritelli and Andrea Spadaccini Dipartimento di Ingegneria Elettrica. the representation of the identity is something that is directly derived from the subject. Plenty of biometric traits have been studied and some of them. comparing them to other biometric traits. In Beritelli & Spadaccini (2010a. in Section 5 we describe the statistical approach. . each lasting from 20 to 70 seconds. while the latter is a recording of the sounds that are produced during its activity (heart sounds). This technique is different from Phua et al. building a system that leverages statistical modelling using Gaussian Mixture Models. in Section 3 we present a survey of recent works on heart-sounds biometry by other research groups. containing two heart sequences per person. In Beritelli & Spadaccini (2009a. In this chapter. choosing a more suitable feature set (Linear Frequency Cepstrum Coefficients. finally. the authors further develop the system described in Beritelli & Serrano (2007). (2008)): eyes (iris and retina). 2. the latter proves to be the most performant system. (2008). the authors take an alternative approach to the problem. they use the whole sequences. Instead of doing a structural analysis of the input signal. the usage of features specific to heart sounds and the statistical engine. in Section 7 we present our conclusions. face. in Section 6 we compare the performance of the two methods on a common database. in Section 4 we describe in detail the structural approach. this chapter will focus on hearts-sounds biometry. LFCC). almost all the parts of the body can already be used for the identification process (Jain et al. The first is a signal derived from the electrical activity of the organ. palmprint. ears. called First-to-Second Ratio (FSR) and adding a quality-based data selection algorithm. This chapter is structured as follows: in Section 2.2 218 Biometrics / Book 1 Biometrics signal using a segmentation algorithm. the authors describe a different approach to heart-sounds biometry.b). and highlight current issues of this method and suggest the directions for the future research.b). evaluating its performance on a larger database. This system proved to yield good performance in spite of a larger database. the database. (2008) in many ways. Biometric recognition using heart sounds Biometric recognition is the process of inferring the identity of a person via quantitative analysis of one or more traits. fingerprints). the Electrocardiograph (ECG) and the Phonocardiogram (PCG). The heart is involved in the production of two biological signals. teeth etc. we describe in detail the usage of heart sounds for biometric identification.70 % over a database of 165 people. feeding them to two recognizers built using Vector Quantization and Gaussian Mixture Models. adding a time-domain feature specific for heart sounds. and the final Equal Error Rate (EER) obtained using this technique is 13. In Phua et al. Speaking of physiological traits. describing both the performance metrics and the heart sounds database used for the evaluation. (2001) for ECG-based biometry). briefly explaining how the human cardio-circulatory system works and produces heart sounds and how they can be processed. While both signals have been used as biometric traits (see Biel et al. most notably the segmentation of the heart sounds. hand (shape. that can be derived either directly from a person’s body (physiological traits) or from one’s behaviour (behavioural traits). veins. we will focus on an organ that is of fundamental importance for our life: the heart. The authors build the identity templates using feature vectors and test if the identity claim is true by computing the Euclidean distance between the stored template and the features extracted during the identity verification phase. • Low Collectability: the collection of heart sounds is not an immediate process. • Circumvention: the system should be robust to malicious identification attempts. with respect to speed. and it is also difficult to record it covertly in order to reproduce it later. a trait should possess: • Universality: each person should possess it. as said before. the discriminative power of heart sounds still must be demonstrated. • Medium Acceptability: heart sounds are probably identified as unique and trustable by people. but they might be unwilling to use them in daily authentication tasks. • Low Performance: most of the techniques used for heart-sounds biometry are computationally intensive and. (2004) presents a classification of available biometric traits with respect to 7 qualities that. • Performance: biometric systems that use it should be reasonably performant. heart-sounds biometry is a new technique. in order to compare this new technique with other more established traits. M (medium). • Distinctiveness: it should be helpful in the distinction between any two people. The reasoning behind each of our subjective evaluations of the qualities of heart sounds is as follows: • High Universality: a working heart is a conditio sine qua non for human life. • Acceptability: the users of the biometric system should see the usage of the trait as a natural and trustable thing to do in order to authenticate. • Medium Distinctiveness: the actual systems’ performance is still far from the most discriminating traits. • Collectability: it should be quantitatively measurable.1 Comparison to other biometric traits The paper Jain et al. L (low). the accuracy still needs to be improved. • Permanence: it should not change over time. • Low Permanence: although to the best of our knowledge no studies have been conducted in this field. and some of its drawbacks probably will be addressed and resolved in future research work. We added to the original table a row with our subjective evaluation of heart-sounds biometry with respect to the qualities described above. • Low Circumvention: it is very difficult to reproduce the heart sound of another person. Each trait is evaluated with respect to each of these qualities using 3 possible qualifiers: H (high). and electronic stethoscopes must be placed in well-defined positions on the chest to get a high-quality signal. so their accuracy over extended time spans must be evaluated. The updated table is reproduced in Table 1.Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 219 3 2. . we perceive that heart sounds can change their properties over time. accuracy and computational requirements. and the tests are conducted using small databases. according to the authors. Of course. Heart sounds fall in two categories: • primary sounds. The first sound. (2004) and heart sounds 2. non-stationary and quasi-periodic signal that is produced by the heart during its continuous pumping work (Sabarimalai Manikandan & Soman (2010)).3. It is composed by several smaller sounds. Comparison between biometric traits as in Jain et al. with techniques known to work on other monodimensional signals. Those . even in pathological conditions. S2. produced by the blood flowing in the heart or by pathologies.1. The primary sounds are S1 and S2. there are the S3 and S4 sounds. We separate them from the rest of the heart sound signal using the algorithm described in Section 2. produced by the closure of the heart valves. we only use the primary sounds because they are the two loudest sounds and they are the only ones that a heart always produces. Among the other sounds. S1.3 Processing heart sounds Heart sounds are monodimensional signals. to some extent. is caused by the closure of the aortic and pulmonary valves.4 220 Distinctiveness Circumvention L M H L M M M L L M L M L H H L Collectability Acceptability Performance Universality Permanence Biometrics / Book 1 Biometrics Biometric identifier DNA Ear Face Facial thermogram Fingerprint Gait Hand geometry Hand vein Iris Keystroke Odor Palmprint Retina Signature Voice Heart sounds H M H H M M M M H L H M H L M H H M L H H L M M H L H H H L L M H H M L H L M M H L H H M L M L L M H H M H H M M M L M L H L L H M L M H L M M H L L H H L L L L H H H M H M M L M M M L H M M Table 1.2 Physiology and structure of heart sounds The heart sound signal is a complex. In our systems. like audio signals. while the second sound. that are quieter and rarer than S1 and S2. 2. that are high-frequency noises. is caused by the closure of the tricuspid and mitral valves. each associated with a specific event in the working cycle of the heart. and can be processed. • other sounds. and murmurs. Low-frequency components are ignored by searching only over the portion of autocorrelation after the first minimum. the positions of SX1 and SX2 are compared. 2. moving by a number P of frames in each direction and searching for local maxima in a window of the energy signal in order to take into account small fluctuations of the heart rate.3. A simple energy-based approach can not be used because the signal can contain impulsive noise that could be mistaken for a significant sound. To overcome this problem. a constant-width window is applied to select a portion of the signal. and the other peaks are found in the same way.Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 221 5 techniques then need to be refined taking into account the peculiarities of the signal. At this stage.1) and three algorithms used for feature extraction (2. . and possibly more meaningful. the maximum value of the autocorrelation function is computed. We will briefly discuss two algorithms that work in the frequency domain. Finally.3. the signal is split into 4-seconds wide windows and the algorithm is applied to each window.2 The chirp z-transform The Chirp z-Transform (CZT) is an algorithm for the computation of the z-Transform of sampled signals that offers some additional flexibility to the Fast Fourier Transform (FFT) algorithm. because as the sequence gets longer the periodicity of the sequence fades away due to noise and variations of the heart rate. 2. The resulting sets of heart sounds endpoint are then joined into a single set.3. all the corresponding frames in the original signal are zeroed out. its structure and components. and one in the time domain. The algorithm then searches other maxima to the left and to the right of SX1. After having completed the search that starts from SX1. that is then processed by feature extraction algorithms.4). must be classified as S1 or S2. improved to deal with long heart sounds. and the algorithm then decides if SX1. Then.2.3.3. in order to estimate the frequency of the heart beat. that is the process of transforming the original heart sound signal into a more compact. The nature of this algorithm requires that it work on short sequences. representation. the remaining identified frames are classified accordingly. 2. called SX2. Such a separation is done because we believe that the S1 and S2 tones are as important to heart sounds as the vowels are to the voice signal. The first step of the algorithm is searching the frame with the highest energy. that is called SX1. 4 to 6 seconds long. and the procedure is repeated to find a new maximum-energy frame. and therefore the period P of the signal.3. we do not know if we found an S1 or an S2 sound. In this section we will describe an algorithm used to separate the S1 and S2 sounds from the rest of the heart sound signal (2.1 Segmentation In this section we describe a variation of the algorithm that was employed in (Beritelli & Serrano (2007)) to separate the S1 and S2 tones from the rest of the heart sound signal. They are stationary in the short term and they convey significant biometric information. After each maximum is selected. and all the frames found starting from it. 2.3. can be computed and used. Xk is the log-energy output of the k-th filter and M is the number of coefficients that must be computed. the number and spectral width of filters. the number of coefficients.3. 1. linear). please refer to Rabiner et al. b) belong to the same person. Figure 2 shows an example of three S1 sounds and the relative MFCC spectrograms. For more details on the CZT. the first two (a. In addition to this. M (2) where K is the number of filters in the filterbank. differential cepstrum coefficients. offering higher resolution than the FFT. This scale is useful because it takes into account the way we perceive sounds. and then are logarithmically spaced. decreasing detail as the frequency increases.6 222 Biometrics / Book 1 Biometrics Fig. and the i-th cepstrum coefficient is computed using the following formula: Ci = k =1 ∑ Xk · cos K i· k− 1 2 · π K i = 0. Among them: the bandwidth and the scale of the filterbank (Mel vs. Many parameters have to be chosen when computing cepstrum coefficients. while the third (c) belongs to a different person.. . (1969) 2. while others use a linearly-spaced filterbank. the filterbank is instead spaced according to the Mel Scale: filters are linearly spaced up to 1 kHz. the spectrum is then feeded to the filterbank. The basic idea of MFCC is the extraction of cepstrum coefficients using a non-linearly spaced filterbank. tipically denoted using a Δ (first order) or ΔΔ (second order). The relation between the Mel frequency fˆmel and the linear frequency f lin is the following: fˆmel = 2595 · log10 1 + f lin 700 (1) Some heart-sound biometry systems use MFCC.. Example of S1 and S2 detection The main advantage of the CZT exploited in the analysis of heart sounds is the fact that it allows high-resolution analysis of narrow frequency bands. The first step of the algorithm is to compute the FFT of the input signal.. .3 Cepstral analysis Mel-Frequency Cepstrum Coefficients (MFCC) are one of the most widespread parametric representation of audio signals (Davis & Mermelstein (1980)). 4 and 5.the number of people involved in the study and the amount of heart sounds recorded from each of them. This is why we propose a time-domain feature called First-to-Second Ratio (FSR).4 The First-to-Second Ratio (FSR) In addition to standard feature extraction techniques.3. 3. In Table 2 we summarized the main characteristics of the works that will be analyzed in this section. we observed that some people tend to have an S1 sound that is louder than S2. we will briefly describe their methods. In this section. 2. as it is not a simple audio sequence but has specific properties that could be exploited to develop features with additional discriminative power.Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 223 7 Fig. We try to represent this diversity using our new feature. During our work. The implementation of the feature is different in the two biometric systems that we described in this chapter. it would be desirable to develop ad-hoc features for the heart sound. and a discussion of the two algorithms can be found in 4. while in others this balance is inverted. Example of waveforms and MFCC spectrograms of S1 sounds 2. using the following criteria: • Database . Intuitively.4. different research groups have been studying the possibility of using heart sounds for biometric recognition. Review of related works In the last years. . the FSR represents the power ratio of the first heart sound (S1) to the second heart sound (S2). Comparison of recent works about heart-sound biometrics In the rest of the section. (2008). so it would be difficult to compare them. at frame level or from the whole sequence. take the idea of finding a good and representative feature set for heart sounds even further. (2008) Tran et al. energy peaks Classification GMM VQ SVM Euclidean distance Euclidean distance MSE kNN Fatemian et al.which features were extracted from the signal. • Classification . rhythmic features. Phua et al. the most performing system is composed of LFBC features (60 cepstra + log-energy + 256ms frames with no overlap). (2010). (2010) 10 HS cross-correlation 10 seconds per HS complex cepstrum Table 2. Having obtained good recognition performance using the HTK Speech Recognition toolkit. so it would not be a fair comparison. they do a deeper test using a database recorded from 10 people and containing 100 sounds for each person. GMM-4 classification. using 1-minute heart sounds and splitting the same signal into a train and a testing sequence.how features were used to make a decision. 30s of training/test length. harmonic features. one of which worked on Phua et al. we will briefly review each of these papers. cepstral coefficientrs. (2010) Jasper & Othman (2010) Database 10 people 100 HS each 52 people 100m each 10 people 20 HS each 21 people 6 HS each 8 seconds per HS Features MFCC LBFC Multiple Energy peaks MFCC. the authors first do a quick exploration of the feasibility of using heart sounds as a biometric trait. LDA. We chose not to represent performance in this table for two reasons: first. on their database. Linear Frequency Band Cepstra (LFBC)). the database and the approach used are quite different one from another. second. (2008) was one of the first works in the field of heart-sounds biometry. cardiac features and the GMM supervector. The tests . After testing many combinations of those parameters. exploring 7 sets of features: temporal shape. they conclude that. spectral shape. investigating the performance of the system using different feature extraction algorithms (MFCC.8 224 Biometrics / Book 1 Biometrics • Features . by recording a test database composed of 128 people. different classification schemes (Vector Quantization (VQ) and Gaussian Mixture Models (GMM)) and investigating the impact of the frame size and of the training/test length. Paper Phua et al. The authors of Tran et al. (2010) 40 people autocorrelation El-Bendary et al. They then feed all those features to a feature selection method called RFE-SVM and use two feature selection strategies (optimal and sub-optimal) to find the best set of features among the ones they considered. In this paper. most papers do not adopt the same performance metric. are better for the automatically selected feature sets with respect to the EERs computed over each individual feature set. using the Daubechies-6 wavelet. using two classifiers: Mean Square Error (MSE) and k-Nearest Neighbor (kNN). that does not directly keep the feature vectors but instead it represents identities via statistical parameters inferred in the learning phase. 4. (2010) do not propose a pure-PCG approach. They test the PCG-based system on a database of 21 people. but they rather investigate the usage of both the ECG and PCG for biometric recognition. and they find that. The authors of El-Bendary et al. described in 4. The restriction on the length of the heart sound was removed in Beritelli & Spadaccini (2009a). They then test the identities of people in their database. among the ones considered. that is composed by 40 people.1. and retaining only coefficients from the 3rd. In this short summary. The remaining frames are then processed using the Short-Term Fourier Transform (STFT).Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 225 9 were conducted on a database of 52 people and the results. up to the 5th scale. The heart sounds are processed using the Daubechies-5 wavelet. The decision is made using the Euclidean distance from the feature vector obtained in this way and the template stored in the database. In Jasper & Othman (2010). (2010) filter the signal using the DWT. expressed in terms of Equal Error Rate (EER). cross-correlation and cepstra. It was required that a human operator selected a portion of the input signal based on some subjective assumptions. 4 to 6 seconds long and thus containing at least four cardiac cycles (S1-S2). then it is processed using the Discrete Wavelet Transform. 4. and the D4 and D5 subbands (34 to 138 Hz) are then selected for further processing. On their database. then they extract different kinds of features: auto-correlation. Figure 3 contains the block diagram of the system. the Mel-Frequency filterbank and Linear Discriminant Analysis (LDA) for dimensionality reduction.1 The best subsequence selection algorithm The fact that the segmentation and matching algorithms of the original system were designed to work on short sequences was a strong constraint for the system. it was designed to work with short heart sounds. the authors then extract from the signal some energy parameters. The structural approach to heart-sounds biometry The first system that we describe in depth was introduced in Beritelli & Serrano (2007). 4th and 5th scales. in opposition to the “statistical” approach. that introduced the quality-based best subsequence selection algorithm. After a normalization and framing step. the authors describe an experimental system where the signal is first downsampled from 11025 Hz to 2205 Hz. the Shannon energy envelogram is the feature that gives the best performance on their database of 10 people. we will focus only on the part of their work that is related to PCG. and their combined PCG-ECG systems has better performance. The authors of Fatemian et al. to select which coefficients should be used for further stages. Each of the steps will be described in the following sections. . They then use two energy thresholds (low and high). We call this system “structural” because the identity templates are stored as feature vectors. the kNN classifier performs better than the MSE one. It was clearly a flaw that needed to be addressed in further versions of the system. the average powers of S1 and S2 heart sounds: . So. 4.5. The subsequence i with the maximum value of DHS QI (i ) is then selected as the best one and retained for further processing. k) k =1 j =1 j =k Where dS1 and dS2 are the cepstral distances defined in 4.10 226 S1/S2 endpoint detector MFCC Biometrics / Book 1 Biometrics Template yes x ( n) Best subsequence detector ˆ x ( n) S1/S2 sounds detector S1/S2 sounds FSR Low-pass filter Matcher no Fig. which removed the high-frequency extraneous components. that is the log-energy of the analyzed sound. that computes the cepstral features according to the algorithm described in 2. 4. 3. k) + ∑ ∑ dS2 ( j. Let N be the number of complete S1-S2 cardiac cycles in the signal. the authors developed a quality-based subsequence selection algorithm. S2) sound. while the rest of the input signal is discarded. for a given subsequence i. We can then define PS1 and PS2 . the quality index is defined as: DHS QI (i ) = 1 k =1 j =1 j =k 4 4 4 4 (3) ∑ ∑ dS1 ( j. The quality index is based on a cepstral similarity criterion: the selected subsequence is the one for which the cepstral distance of the tones is the lowest possible. the signal is then given in input to the heart sound endpoint detection algorithm described in 2. with the addition of a 13-th coefficient computed using an i = 0 value in Equation 2. 4. The endpoints that it finds are then used to extract the relevant portions of the signal over a version of the heart sound signal that was previously filtered using a low-pass filter. Block diagram of the proposed cardiac biometry system To resolve this issue.3.2 Filtering and segmentation After the best subsequence selection.3 Feature extraction The heart sounds are then passed to the feature extraction module. This system uses M = 12 MFCC coefficients. Let PS1i (resp.3.1. the system computes the FSR according to the following algorithm.4 Computation of the First-to-Second Ratio For each input signal. PS2i ) be the power of the i-th S1 (resp. based on the definition of a quality index DHS QI (i ) for each contiguous subsequence i of the input signal. j =1 ∑ ∑ N N d2 ( XS1 (i ). Y ) = 1 N2 1 N2 i. as follows: k FSR = max 1. YS2 ( j)) (8) (9) i. Y )2 (11) . we can then define the First-to-Second Ration of a given heart sound signal as: PS1 PS2 For two given DHS sequences x1 and x2 . d FSR norm th FSR (10) In this way. XS2 (i )) be the feature vector for the i-th S1 (resp. We then normalized the values of d FSR between 0 and 1 (d FSRnorm ). Y ) = dS2 ( X. we chose a threshold of activation of the FSR (th F SR) and we defined defined k FSR . MFCC are compared using the Euclidean metric (d2 ). it will increase the cepstral distance. Y )2 + dS2 ( X. Y ) = k FSR · dS1 ( X. Then the cepstral distances between X and Y can be defined as follows: dS1 ( X.Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 227 11 PS1 = PS2 = 1 N P N i∑ S1i =1 (4) 1 N P (5) N i∑ S2i =1 Using these definitions. S2) sound of the X signal and YS1 and YS2 the analogous vectors for the Y signal. we define the FSR distance as: FSR = d FSR ( x1 . the distance between X and Y can be computed as follows: d( X. let XS1 (i ) (resp. making the distance bigger when it has an high value while not changing the distance for low values. Starting from the d FSR as defined in Equation 7. Given two heart sound signals X and Y.5 Matching and identity verification (6) (7) The crucial point of identity verification is the computation of the distance between the feature set that represents the input signal and the template associated with the identity claimed in the acquisition phase by the person that is trying to be authenticated by the system. Finally. an amplifying factor used in the matching phase. This system employs two kinds of distance: the first in the cepstral domain and the second using the FSR. we wanted this distance to act like an amplifying factor for the cepstral distance. but if it is larger. YS1 ( j)) d2 ( XS2 (i ). if the normalized FSR distance is lower than th FSR it has no effect on the final score. x2 ) = | FSRdB ( x1 ) − FSRdB ( x2 )| 4.j =1 Now let us take into account the FSR. Σ i } (13) Those parameters of the model are learned in the training phase using the Expectation-Maximization algorithm (McLachlan & Krishnan (1997)). . but instead uses them to infer a statistical model of the identity and makes a decision computing the probability that the input signal belongs to the person whose identity was claimed in the identity verification process. computed using Gaussian Mixture Models. 5. a fundamental system parameter that is chosen in the design phase. whose probability is being estimated. and wi is the weight of the i-th probability density.2 The GMM/UBM method The problem of verifying whether an input heart sound signal s belongs to a stated identity I is equivalent to a hypothesis test between two hypotheses: H0 : s belongs to I H1 : s does not belong to I This decision can be taken using a likelihood test: ⎧ p(s| H0 ) ⎨ ≥ θ accept H0 S (s. The probability p(s| H0 ). that is defined as: p i (x ) = 1 (2π ) D |Σ i | e − 2 (x− μ i ) Σ i (x− μ i ) 1 The parameters of pi are μ i (∈ R D ) and Σi (∈ R D× D ). A GMM λ is a weighted sum of N Gaussian probability densities: p (x | λ ) = i =1 ∑ w i p i (x ) N (12) where x is a D-dimensional data vector.12 228 Biometrics / Book 1 Biometrics 5. that together with wi (∈ R N ) form the set of values that represent the GMM: λ = {w i . I ) = p(s| H1 ) ⎩ < θ reject H 0 (14) where θ is the decision threshold. in our system. μ i . using as input data the feature vectors extracted from the heart sounds. The statistical approach to heart-sounds biometry In opposition to the system analyzed in Section 4. the one that will be described in this section is based on a learning process that does not directly take advantage of the features extracted from the heart sounds.1 Gaussian Mixture Models Gaussian Mixture Models (GMM) are a powerful statistical tool used for the estimation of multidimensional probability density representation and estimation (Reynolds & Rose (1995)). 5. using the algorithm described in Section 2. as first defined in Section 4. i. The final score of the identity verification process. heart sounds segmentation is carried on. it seemed more appropriate to just append the FSR to the feature vector computed from each frame in the feature extraction phase. During the world training phase. as outlined in Section 2. 5.e. so: p(s| H0 ) = ∏ p( x j | λ I ) j =1 K (15) In Equation 14. we split the input heart sound signal in 5-second windows and we compute an average FSR (FSR) for each signal window. It is then used in the matching phase to modify the resulting score. the FSR. During identity verification.4. To do this..4. and the final decision is taken comparing the score to a threshold (θ). whether for training a model or for identity verification. and because the features used for heart-sounds biometry are similar to the ones used for speaker recognition. it is defined for the whole input signal. First. and is subsequently used every time the system must compute a matching score. This model is called Universal Background Model (UBM). expressed in terms of log-likelihood ratio. computed as described in Section 5. Finally. an open source toolkit for speaker recognition developed by the ELISA consortium between 2004 and 2008 (Bonastre et al. The adaptation of parts of a system designed for speaker recognition to a different problem was possible because the toolkit is sufficiently general and flexible.3.5 The experimental framework The experimental set-up created for the evaluation of this technique was implemented using some tools provided by ALIZE/SpkDet . (n. In the context of the statistical approach. the system estimates the parameters of the world model λW using a randomly selected subset of the input signals.4 Application of the First-to-Second Ratio The FSR. is appended to each feature vector. is a sequence-wise feature.3 Front-end processing Each time the system gets an input file. It is then appended to each feature vector computed from frames inside the window.)). part of the SPro suite (Gravier (2003)).3. The identity models λi are then derived from the world model W using the Maximum A-Posteriori (MAP) algorithm. is Λ(s) = log S (s. this probability is modelled by building a model trained with a set of identities that represent the demographic variability of the people that might use the system. the matching score is computed using Equation 16. and then let the GMM algorithms generalize this knowledge.Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 229 13 The input signal is converted by the front-end algorithms to a set of K feature vectors.d. In the GMM/UBM framework (Reynolds et al. as described in Equation 14 . it goes through some common steps. (2000)). each of dimension D. The UBM is created during the system design. cepstral features are extracted using a tool called sfbcep. Then. 5. the p(s| H1 ) is still missing. I ) = log p(s| λ I ) − log p(s| λW ) (16) 5.1. for instance. that are needed for the validation of biometric systems. the average length of the recordings is 45 seconds. on our database. 1024) were tested. while Type II errors could be tolerated. 6. To overcome this problem. Binary classification systems work by comparing matching scores to a threshold. The sounds have been converted to the Wave audio format. that will be further described in Section 6. the first of these sets proved to be the most effective In (Beritelli & Spadaccini (2010a)) the impact of the FSR and of the number of Gaussian densities in the mixtures was studied. 6. for each person. in a high-security environment. we are building a heart sounds database suitable for identity verification performance evaluation. while the other is used for the computation of matching scores. . using 16 bit per second and at a rate of 11025 Hz. there are two separate recordings. some parameters have been tuned in order to get the best performance. which must be selected according to the context of the system. is 256 Gaussians with FSR. each lasting from 20 to 70 seconds. One of the two recordings available for each personused to build the models. connected to a computer via an audio card. 6. and the best combination of those parameters. • False Non-Match (Type II Error): reject an identity claim even if the template matches with the model The importance of errors depends on the context in which the biometric system operates. three different cepstral feature sets have been considered in (Beritelli & Spadaccini (2010b)): • 16 + 16 Δ + E + ΔE • 16 + 16 Δ + 16 ΔΔ • 19 + 19 Δ + E + ΔE However.1 Heart sounds database One of the drawbacks of this biometric trait is the absence of large enough heart sound databases. 157 male and 49 female. 512. Four different model sizes (128. with and without FSR. there are 206 people in the database. a Type I error can be critical.6 Optimization of the method During the development of the system. 256. Currently.14 230 Biometrics / Book 1 Biometrics 5.1. Namely. we will compare the performance of the two systems described in Section 4 and 5 using a common heart sounds database.2 Metrics for performance evaluation A biometric identity verification system can be seen as a binary classifier. There are two possible errors that a binary classifier can make: • False Match (Type I Error): accept an identity claim even if the template does not match with the model. The heart sounds have been acquired using a Thinklabs Rhythm Digital Electronic Stethoscope. their accuracy is closely linked with the choice of the threshold. Performance evaluation In this section. however. The choice between the two setups. the error rate remained almost constant with respect to the last evaluation of the system.. and it was designed to work on small databases. in spite of a 25% increment of the size of the database. a DET curve shows how the analyzed system performs in terms of false matches/false non-matches as the system threshold is changed.86 Statistical 13. FNMR = 1-2%) setups. that is the plot of FMR against FNMR. Performance evaluation of the two heart-sounds biometry systems The huge difference in the performance of the two systems reflects the fact that the first one is not being actively developed since 2009. thus letting more unauthorized users to get a positive match. and the results are reported in Table 3. because we cannot know its applications in advance. in which a test over a 165 people database yielded a 13. we need to take a threshold-independent approach. System EER (%) Structural 36. Figure 4 shows the Detection Error Trade-off (DET) curves of the two systems. (2008)). A finer evaluation of biometric systems can be done by plotting the Detection Error Tradeoff (DET) curve. and can require the user to try the authentication step more times. false non-match) rate. and between all the intermediate security levels. fixing a false match (resp. We can therefore conclude that the statistical approach is definitely more promising that the structural one.g. 6. The DET curve represents the trade-off between security and usability. defined as the error rate at which the False Match Rate (FMR) is equal to the False Non-Match Rate (FNMR). As stated before. the system that performs better is the one with the lowest false non-match (resp. In both cases. It is important to highlight that.1. Conclusions In this chapter. while the second has already proved to work well on larger databases. Looking at Figure 4. is strictly application-dependent.. This allows to study their performance when a low FNMR or FMR is imposed to the system. it is easy to understand that the statistical system performs better in both high-security (e..Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 231 15 When evaluating the performance of a biometric system. and the algorithms used to process them. After introducing the advantages and shortcomings of this biometric trait with respect to other traits. we explained how our body produces heart sounds.66 Table 3. but will make more false match errors. 7. false match) rate. a system with low FNMR will be more tolerant and permissive. we presented a novel biometric identification technique that is based on heart sounds.3 Results The performance of our two systems has been computed over the heart sounds database. at least with the current algorithms and using the database described in 6.70% EER. A common performance measure is the Equal Error Rate (EER) (Jain et al.g. . A system with low FMR is a highly secure one but will lead to more non-matches. FMR = 1-2%) and low-security (e. concluding that the statistical system performs better. This means that the matching algorithms will be fine-tuned and a suitable feature set will be identified. 4. the academic community has produced several works on this topic. 7. but most of them share the problem that the evaluation is carried on over small databases. As larger databases of heart sounds become available to the scientific community. probably containing both elements from the frequency domain and the time domain. making the results obtained difficult to generalize. First of all. at least creating a common database to be used for the evaluation of different research systems over a shared dataset that will make possible to compare their performance in order to refine them and. there are some issues that need to be addressed in future research. over time. the mid-term and long-term reliability of heart sounds will be assessed. one using a structural approach and another leveraging Gaussian Mixture Models. heart sounds biometry is a promising research topic in the field of novel biometric traits. Additionally. Detection Error Tradeoff (DET) curves of the two systems A survey of recent works on this field written by other research groups has been presented. develop techniques that might be deployed in real-world scenarios. We feel that the community should start a joint effort for the development of systems and algorithms for heart-sounds biometry. So far.1 Future directions As this chapter has shown. analyzing how their biometric properties change as time goes by. the impact of cardiac diseases on the identification performance will be assessed. we described the two systems that we built for biometric identification based on heart sounds. Then. showing that there has been a recent increase of interest of the research community in this novel trait. the identification performance should be kept low even for larger databases. We compared their performance over a database containing more than 200 people. Next. .16 232 Biometrics / Book 1 Biometrics Fig. D. S. Flynn. (2010).d. when the algorithms will be more mature and several independent scientific evaluations will have given positive feedback on the idea. A. Alize/Spkdet: a state-of-the-art open source software for speaker recognition. IEEE Transactions on Acoustics. and possibly ad-hoc sensors with embedded matching algorithms will be developed. IEEE Transactions on Instrumentation and Measurement 50(3): 808–812.. Jain. 93–96. K.. Heartid: Cardiac biometric recognition.. N. Scheffer. A. O.2010. A. An improved biometric identification system based on heart sounds and gaussian mixture models. IEEE. Beritelli.org/10. A.. Fatemian. Biometrics: Theory Applications and Systems (BTAS). Larcher. Gravier. F. Heart sounds quality analysis for automatic cardiac biometry applications. IEEE Transactions on Information Forensics and Security 2(3): 596–604. El-Bendary. Fauve. Davis. ECG Analysis: A New Approach in Human Identification. & Wide. Beritelli. URL: http://dx. Preti. Agrafioti. A. & Mermelstein. (2008). Human Identity Verification based on Mel Frequency Analysis of Digital Heart Sounds. A. A. & Pankanti. Al-Qaheri. IEEE.). (n.. 2010 Fourth IEEE International Conference on. 8. (2007). pp. & Pettersson. & Spadaccini. H. (2010). A. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. (1980). G. & Prabhakar. (2009a). some practical issues like computational efficiency will be tackled. & Spadaccini. Evans. K. P. A. B. & Serrano. Biometrics: A tool for information security. Matrouf. K.. Proceedings of the Fourth International Conference on Emerging Security Information. & Ross. F. P. Fredouille.. A.inria.. pp. E. Nature and Biologically Inspired Computing (NaBIC). IEEE Transactions on Information Forensics and Security 1(2): 125–143. N. A. A.23 Biel. & Mason. S. A.. 351 –356. SPro: speech signal processing toolkit.. F. P.. H. Jain.fr/projects/spro Jain. A. Proceedings of the 1st IEEE International Workshop on Information Forensics and Security. pp. (2006). A. . Proceedings of the 16th International Conference on Digital Signal Processing. (2010a). J. Handbook of Biometrics. (2009b). 1 –5. A statistical approach to biometric identity verification based on heart sounds.Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions Human Identity Verification Based on Heart Sounds: Recent Advances and Future Directions 233 17 Finally. & Spadaccini. & Abraham. L.. thus making heart-sounds biometry a suitable alternative to the mainstream biometric traits. Systems and Technologies (SECURWARE2010).. Ross. Beritelli. & Spadaccini. Q. D. & Philipson. (2001). S. Ross. IEEE Transactions on Circuits and Systems for Video Technology 14(2): 4–20. Hamed. Hsas: Heart sound authentication system. G. L. Proceedings of the 2010 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications. S. & Hatzinakos. Zhao. Springer. 31–35. J. URL: http://gforge. Speech and Signal Processing 28(4): 357–366. R. N.. Biometric Identification based on Frequency Analysis of Cardiac Sounds. (2010b).-F. F. Bonastre. F. M.medra. C. (2003)... 2010 Second World Congress on. Pouchoulin. pp.1109/SECURWARE. Zawbaa. References Beritelli. F. M. (2004). Beritelli.. An introduction to biometric recognition. Hassanien. A. S. . (2010). Pankanti. & Rader. K. (2008). (1969). (1995). (2000). C. K. R. Reynolds. McLachlan. (2003). T. (1997). A. . L. & Soman.. K. Leng. 2010 International Conference On. Chen. The EM Algorithm and Extensions. D. Prabhakar. Reynolds. & Li. A. S. H. H. Y.. V2–228 –V2–233. A.. L. T. T. F. Feature integration for heart sound biometrics. J. & Jain. 2000. 1714 –1717. K. Rabiner. Audio and Electroacoustics. 2010 IEEE International Conference on. J. Biometric recognition: Security & privacy concerns.. Speaker verification using adapted gaussian mixture models. B. Electronics and Information Engineering (ICEIE). IEEE Transactions on Speech and Audio Processing 3: 72–83. Dat. Heart sound as a biometric. Tran. M. The chirp z-transform algorithm.. G. Electronics Letters 46(16): 1100 –1102. (2010). Robust heart sound activity detection in noisy environments. & Rose. Digital Signal Processing. & Shue.18 234 Biometrics / Book 1 Biometrics Jasper. Schafer. Sabarimalai Manikandan. (2010). D. pp. Pattern Recognition 41(3): 906–919. R. Acoustics Speech and Signal Processing (ICASSP). p. pp. H. D. R. C. S. IEEE Transactions on 17(2): 86 – 92. Robust text-independent speaker identification using gaussian mixture speaker models. J. Feature extraction for human identification based on envelogram signal analysis of cardiac sounds in time-frequency domain. Wiley. 2. Phua. & Dunn. R. & Othman. IEEE Security and Privacy Magazine 1(2): 33–42. & Krishnan. Vol. Quatieri. 2009]. most of the previous studies have observed average of heartbeat intervals a certain range in pre. parts of music pieces excite people in disco and party. 2008] of a listener and is easy to measure. As further investigations. a device measuring electrocardiogram is generally cheaper than devices measuring other physiological indices such as electroencephalogram. Oppositely. The alternate exposure of different sound stimuli is also employed in the present study. Furthermore. Especially. For example. For example. These various effects of music have been investigated in many previous studies. . The experimental method in the present study is set by referring to our previous study [Fukumoto et al. white noise.and post-listening sound stimulus. Introduction Music is widely believed as one of the most effective media forms that effect on human psycho-physiologically. Mukogawa Women’s University. One of the reasons of that is existence of many music factors. Japan 1. because the heartbeat reflects autonomic nervous activity [Pappano. two min relaxing music piece and white noise were employed as the different sound stimuli. harmony. and these sound stimuli were played twice alternately after five min rest. relaxation effects of music were investigated from various view points with psycho-physiological experiments. investigations of music and its effects will contribute to theoretical use of music for therapy and so on. Although many previous studies investigated the effects. the effects have not been clarified at all. time intervals eliciting effect of music and sound and decreasing the effect are important information to determine the time length to exposure them to the listeners. creation of sedative atmosphere in a home. tempo. which are music and sound stimuli employed as sound stimuli in the previous study.. Heartbeat is one of the most important physiological indices and is often used as physiological index to investigate the effect of music and sound stimuli. Many people expect the effects of music and uses music pieces in various situations. This chapter aims to investigate the temporal change in heartbeat intervals in a transition between different sound stimuli. very few previous studies have investigated the change in heartbeat in the transition of different stimulus. etc. melody. rhythm. we newly add No-sound stimulus to relaxation music piece (Air by Bach). Although many previous studies have investigated the effect of music and sound with heartbeat interval as physiological index. In the listening experiment of our previous study. Anyway. relaxation effect of music is used in change in personal mind in daily life.12 Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli Makoto Fukumoto and Hiroki Hasegawa Fukuoka Institute of Technology. Observing temporal change in heartbeat is important and contributes to improvement of exposure method of music and sound to the listeners. and therapeutic purposes. Most of the previous studies using heartbeat as physiological index have tried to investigate the effect of music by comparing heartbeats before and after listening to the musical piece. however. According to a previous study. as mentioned above. It means observing the change in heartbeat caused by listening sound stimuli. have investigated the physiological effects of musical pieces inducing different moods on listener [Etzel et al. They have not mentioned about the concrete time cost. The results of this study will contribute to develop the studies in music therapy and the studies on musical system reflecting a user's KANSEI information automatically. because heartbeat is noninvasive physiological index being measured with relatively convenient and low-cost device. Illustration of objectives of this study. because general physiological changes come after psychological changes. noise. 2008]. heartbeat reflects autonomic nervous activity. reducing patient’s anxiety during a surgical operation and personal use to quickly change our mind and so on. Fig. because we do not know how long time the sound stimuli would cost to elicit the change of heartbeat. 1977]. and the decrease of the heart rate is caused by a combination of two factors. heartbeat information has not been used well. As mentioned above. the effects have not been clarified completely yet.. Moreover. many researchers have investigated the effect with psycho-physiological indices [Dainow. 1. Objectives of this study are fundamental investigations of temporal change in heartbeat in a transition between sound stimuli. we can observe change in heartbeat in start and end of noise and music stimuli. increase in parasympathetic activity and decrease in sympathetic activity [Pappano. Time cost is also investigated. response of heartbeat needs longer than 1 s. Heartbeat has been often used to investigate the effects of music. impression for the listening to music piece on the listener is determined with 1 s [Bigand et al. Etzel et al. however. Figure 1 illustrates the objectives and explains that listening sounds elicit deceleration or acceleration of heartbeat.236 Biometrics By employing No-sound stimulus as sound stimulus. Some previous studies have investigated the temporal cost that sound elicits heartbeat approximately with different sound stimuli and different conditions. 2005]. 2005]. the .. Referring to this finding. Especially for the effects of music. we generally believe relaxation and excitation effects of sounds and utilize the effect in music therapy. and No-sound. music. However. The remains of this chapter are constructed as below. have investigated the physiological effects of 30 s noise and music [Gomez & Danuser. 2004]. Music and noise were used in our previous . it took about 30 s to change heartbeat interval from previous section. 2008. Chung & Vercoe. The section 2 describes experimental method and sound materials. In our previous study [Fukumoto et al. Additionally. Moreover. 1998. Yoshida et al. 2008].. Hazama et al. 2. 2004..Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 237 results of heart rate development seem to elicit the change of heart rate from 30 s to 40 s. Hazama & Fukumoto. a questionnaire asking relaxation feeling was used as psychological index. several previous studies have investigated the psycho-physiological response for sound stimulus with various physiological indices [Aoto & Ookura. Kim & André. These results in our previous study support the findings by other previous studies that investigated the effects of sound stimuli on heartbeat. theoretical use of music is first candidate of the application of this study. the section 5 concludes this chapter. Furthermore. 2009]: different sound stimuli are included in one experimental set.05). Psychological result showed that music section induced the higher relaxation feeling than noise section significantly (P < 0. however. Gomez et al. What kind areas does this study contribute for? First. and the applications as introductions of this study.. as described above. 20 s sliding window was employed. 2009]. 2009. From engineering point of view. with a mind to develop a musical system that reflects user’s KANSEI and psychological condition of listeners automatically. respectively. the previous studies. after the transition of sound stimuli. Sugimoto et al. Finally. and the section 3 shows results of the experiment. precise time that sound stimulus need to elicit the change of heartbeat has not been clarified. In the investigation of the change in heartbeat interval. in Guided Imagery Method.. 2009]. The results of their study showed that part of noises and musical pieces elicited physiological change including heart rate within 30 s. The significant difference in subjective relaxation feeling meant that the relaxing music piece and the noise were greatly different. average of heartbeat intervals in music section was larger than that in noise section significantly (P < 0. have associated listener’s preference (with 2-point scale) and heartbeat interval for 40 s musical pieces obtained from their evolutionary computation method composing musical piece [Hazama & Fukumoto. patients listen to several music pieces for a long time. 2006. Healy et al. Procedure and materials This section describes experimental method used in this study. and electrocardiogram is measured in listening to the sound stimuli.. If a therapist know a time length that music pieces need to elicit change in heartbeat. Heartbeat is included the indices. the experimental method used in this study is referring to our previous study [Fukumoto et al. This section has explained the background. and revealing time cost eliciting the change of heartbeat by sounds will contribute to musical information techniques such as automatic musical composition based on physiological index [Fukumoto & Imai. After the exposure of the sound stimuli. 2007. Basically. the knowledge must be useful to make a selection of music pieces.001). For example. Their result showed different distribution of heartbeat intervals for preferred and not preferred musical pieces. The section 4 discusses the effects of the sound stimuli on temporal change in heartbeat based on the experimental results. These previous studies give us useful and interesting information about the time cost. a questionnaire asks the subjects relaxation feeling for noise and music sections. 2006]. five min rest and eight min sound stimuli. to prevent to induce strange feeling to the subjects. The part of eight min sound stimuli was composed of two min different sound stimuli. Experimental procedure of one experimental set. and no sound were employed. Each of the three experiments was mainly composed of two different sound stimuli.9±0. the change in heart beat in the transition between different sounds was investigated. 2000]. Both of the music piece and white noise were stereo recorded sound. because the subjects already realized the sound stimuli in second presentations. and each of the experiments included two different sound stimuli. Based on this experimental set. In the no-sound section. 2. the subjects were instructed not to eat. drink and smoke anything from 30 min before the listening experiment.238 Biometrics study. there were eight kind of combination of experimental conditions: Each of the subjects participated in three experimental sets. Rest 5 min Sound A Sound B 2 min 13 min Fig. Noise and Music. As shown in the outline of waveforms. remove of surprise for first time listening to the sound stimulus. Especially. This control of volume enabled us to construct the experiment composed of continuous 2 min Sound A Sound B Questionnaire 2 min 2 min Time . we add mute sound as a sound stimulus. music stimuli and noise were introduced after 10 s of fade-in and finished with 10 s of fadeout. Horizontal axis means time. In the listening experiment. sound was not played. white noise. and all of the subjects participated in the one condition in all of three experiments. 2.. Sixteen subjects were randomly and counter-balanced assigned to the eight combinatins. None of the subjects had professional or college-level music experience. The set of the listening experiment was basically constructed from two parts. The subjects listened to these sound stimuli through a headphone. the change in heartbeat intervals in second presentations of Sound A and B were used in analysis. Therefore. and vertical axis means sound amplitude. With three experiments.7 years) participated in the listening experiments as subjects. Figure 2 shows example of one experimental set. each of three experiments included two conditions. These sounds were played two times respectively with change places their sequence. White noise was used in noise sections.1 Procedure Time length of one experimental set was thirteen min. Format of these sound stimuli was WAVE format. Music and No-sound. Air in G composed by Bach was used as relaxing musical piece in music sections. three kinds of transitions and their inverted sequences are used in the listening experiment. As described in the next section. To investigate the effects of various transitions of sound stimuli. Noise and No-sound. Figures from 3 to 5 show outline of waveforms of sound stimuli including first 5 min rest. and these sound stimuli were played by notebook. Sixteen males and females (mean age: 21. three experiments were performed. a music piece. Beforehand for the experiments. 2. Therefore.2 Sound stimuli As sound stimuli. This music piece is well known as its relaxing mood and was used as sedative musical piece in a previous study [Yamada et al. 2009]. In the present study. 3. 5. 2 min and 20 s windows . 1957] was used to ask the subjects relaxation feeling. Development of heartbeat interval is represented from (tn.Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 239 different sound stimuli..3 Psychological index In the questionnaire after listening all of the sound stimuli. tn+1-tn). the questionnaire also asked the subjects that the subject had an experience listening to the musical piece played in music sections. R-waves were detected from the subjects’ electrocardiogram. In our previous study [Fukumoto et al. where tn means time of n-th R-wave. Fig. lower: No-sound to Music condition).0 to 70. Fig. Additionally. in statistical analysis for relaxation feeling. Volumes of sound stimuli used in the experiment 3: (upper: Music to No-sound condition.stressful” for the afforded two sound stimuli. lower: Music to Noise condition). lower: No-sound to Noise condition). and the subjects answered with writing on the paper by themselves. 2. 4. Additionally. volume of sound stimuli was adjusted initially by the subjects themselves by listening to white noise before the listening experiment. These questions were written on a paper in Japanese and were explained by an experimenter. respectively. semantic differential method [Osgood et al. in the expeiment 1.0 dB(A). Volumes of sound stimuli used in the experiment 2: (upper: Noise to No-sound condition. The volume of noise and musical piece was arround 66. and R-R intervals represent temporal development of heartbeat intervals. R-wave approximately represents time of heartbeat. The subjects estimated their relaxation feelings with 7-point scale of “relaxed . the volume of the sound stimuli were fixed. Fig..4 Physiological index In the physiological analysis. In the analyses. sign test was used. 2. to adjust the experimental condition between the subjects further. Volumes of sound stimuli used in the experiment 1: (upper: Noise to Music condition. Noise elicited the smallest. As summary of the results of relaxation feelings. 6. Figure 7 shows subjective relaxation feelings of Noise and No-music Stimuli in the experiment 2.240 Biometrics were used to observe the change in heartbeat intervals.001). Average point of No-sound was larger than that of Noise a little bit. First. In the statistical analyses. and time of the window was defined as central time obtained from average of first and last time of the window. Figure 6 shows subjective relaxation feeling in the experiment 1. 2006].. No-sound was intermediate.001 Point 4 3 2 1 Noise Music Fig. average of heartbeat intervals included in each window (tn is in the temporal range of each window) was calculated. 7 6 5 P<0. 2 min window was same as its temporal length of each section and was used for broad and rough observation for change in heartbeat intervals in each section. all heartbeat intervals were detected from electrocardiogram. pairwise comparison was used. Two kinds of windows were used for different observation. result of questionnaire only for the experiment 1 showed that all of the subjects knew the music piece played in music sections. however. 3. and the temporal lengths are 4 s and 10 s (in 60 heartbeats per 1-min). Higher point means higher subjective relaxation feeling. 20 s was used for detail observation. Noise and No-sound were almost same level. The analyses with 20 s window were mainly applied for latter two listening sections. Relaxation point of Music was also larger than that of No-sound (P < 0. Heartbeat intervals in general condition contain two kinds of period of heartbeat fluctuations reflecting autonomic nervous activity and respiration. however. Music stimulus elicited the largest relaxation feeling. As shown in Fig. In a part of the analyses. Statistical analysis for this result showed that there was significant difference between Noise and Music conditions (P < 0. because individual variation of subjective evaluation and physiological indices were considered large. On the other hand. the 20 s window slid as queue processing.1 Results of psychological index First. 3. Figure 8 shows subjective relaxation feelings of Music and No-sound in the experiment 3. Experimental results This section shows the psychological and physiological results of the listening experiment. . there was no significant difference. large difference between relaxation feelings of Noise and Music conditions was observed. Length of 20 s window was available for omitting these heartbeat fluctuations.001). Subjective relaxation feeling to Noise and Music stimuli in the experiment 1 (N=16). Then. 6. and temporal length of 20 s window was determined referring to a method of a previous study [Yoshida et al. Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 7 6 5 241 Point 4 3 2 1 Noise No-sound Fig.9 0. Subjective relaxation feeling to Music and No-sound stimuli in the experiment 3 (N=16).7 0. For rest section prior to listening to sound stimuli. . 1 0.6 Rest Noise1 Music1 Noise2 Music2 Fig.65 0. First. Each result was composed of average and standard deviation between subjects and was obtained after analysis of each subject. In the analysis.8 0.9 0.75 0. 7. 7 6 5 P<0. rough change in heartbeat intervals in each section is investigated. respectively.2.85 0. 3.7 0. detail change in heartbeat is observed with 20 s window. Right: Noise to Music condition (N=8)).95 1 0. last 2 min in 5 min rest was utilized as analyzed section.001 Point 4 3 2 1 Music No-sound Fig.65 0. 3.8 0. 8.85 0. average heartbeat intervals in 2 min sections were used.95 Heartbeat Interval Heartbeat Interval 0. Then. Average heartbeat interval in each 2 min section in the experiment 1 (Left: Music to Noise condition (N=8).1 Change in heartbeat in each 2 min section Figures from 9 to 11 show whole changes in heartbeat in each three experiment.6 Rest Music1 Noise1 Music2 Noise2 0. 9.75 0. Subjective relaxation feeling to Noise and No-sound stimuli in the experiment 2 (N=16).2 Results of physiological index This subsection shows the results of physiological index. From these results.8 0.95 1 0. Figure 12 shows change in heartbeat in latter 2 sections from 540 s to 780 s with 20 s window. 20 s sliding window was used.8 0.9 0. 11. 1 0. Average heartbeat interval in each 2 min section in the experiment 2 (Left: Noise to No-sound condition (N=8). From 650 s and latter windows are target sections and compared with 650 s statistically. 650 s window (640 s to 660 s) was used.7 0.2.95 Heartbeat Interval 0. 12.82 0.75 0.242 Biometrics A common tendency between the results shown in Figures from 9 to 11 was gradual extend of heartbeat intervals in correspondence with time development.75 0.7 0. Figure 13 shows 0.95 1 0. 1 0.75 0.9 0. Average heartbeat interval in each 2 min section in the experiment 3 (Left: Music to No-sound condition (N=8).95 Heartbeat Interval 0.2 Detail change in heartbeat in a transition of sound stimuli Figures 12 and 13 show the results of the experiment 1. Average heartbeat interval in 20 s sections from 540 to 780 s in the experiment 1. 3. 10.85 0.To investigate time cost eliciting change in heartbeat by listening to each sound stimulus. Right: No-sound to Music condition (N=8)). Right: No-sound to Noise condition (N=8)).6 Rest No-sound1 Noise1 No-sound2 Noise2 Fig.6 Rest Music1 No-sound1 Music2 No-sound2 Heartbeat Interval 0.78 0. .9 Heartbeat Interval [s] 0.88 0. We can observe gradual change in heartbeat and tendencies that shorten of heartbeat in noise sections and extension of heartbeat in music sections.76 550 570 590 610 630 650 670 690 710 730 750 770 Music to Noise condition Noise to Music condition Time [s] Fig.65 0.8 0. large differences between different sound stimuli were not observed.86 0.75 0.8 0.8 0.7 0.6 Rest No-sound1 Music1 No-sound2 Music2 Fig. The sound changed its content at 660 s.65 0.85 0.9 0.85 0.9 0.7 0.6 Rest Noise1 No-sound1 Noise2 No-sound2 Heartbeat Interval 0.84 0. It means that lowest average heartbeat was observed in the rest section except Music to No-sound condition in the experiment 3.65 0.85 0.65 0. As previous analysis section. Average heartbeat interval in 20 s sections from 540 to 780 s in the experiment 2.82 0.1 0. 0.76 0.78 0.8 0.25 0.88 0.35 0.88 0.74 550 570 590 610 630 650 670 690 Heartbeat Interval [s] Noise to No-sound condition No-sound to Noise condition 710 730 750 770 Time [s] Fig.84 0.94 0.4 0.84 0. In the result of Noise to Nosound condition.05 0 650 673 678 683 688 693 698 703 708 713 718 723 728 733 738 743 748 753 758 763 768 0.25 0. In the result of Music to Noise.82 0.15 0.92 0. heartbeat interval tended to be extended from previous section in 761 s window. lower: Music to Noise condition). Gradual change with sliding window was observed.2 0.82 0. Gradual change in heartbeat was also observed in this result.86 0.45 0. heartbeat interval was shortened in 678 s window significantly: Time cost to elicit the change of heartbeat from Music to Noise was 28 s. 13.1 0.84 0.3 0. 0. It means that 111 s was needed to change heartbeat interval by listening to music from listening to noise.3 0. Detail observation in the transition using 20 s sliding window in the experiment 1 (upper: Noise to Music condition.2 0. and plotted line means P-value obtained from statistical analysis. 14.4 0.88 0.86 0.78 0.45 0.15 0.8 0.9 0.8 0.78 0. In the result of Noise to Music condition. Figure 15 shows the results of analysis with sliding window. Figure 14 shows change in heartbeat in latter 2 sections in the experiment 2.35 Heartbeat Interval [s] 0.86 0.76 Time [s] Fig.96 0.5 Average P-value 0.05 0 650 673 678 683 688 693 698 703 708 713 718 723 728 733 738 743 748 753 758 763 768 0.5 Average P-value 0.76 Heartbeat Interval [s] Time [s] 0.Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 243 the results of analysis with sliding window. heartbeat interval tended to be extended from previous section in 671 s Significance Significance . 94 0. 21 s was needed to change heartbeat interval from listening to noise.2 0. Gradual and rapid changes in heartbeat were observed in this result. heartbeat interval tended to be shortened from previous section in 686 s window.35 Heartbeat Interval [s] 0.89 Heartbeat Interval [s] 0. Significance Significance .85 0.5 0.95 0.78 0. 16.244 Biometrics window: Smallest P-value was observed in the section (P = 0. In the result of No-sound to Music. Figure 17 shows the results of analysis with sliding window.1 0.88 0.05 0 650 673 678 683 688 693 698 703 708 713 718 723 728 733 738 743 748 753 758 763 768 0.88 Time [s] 0.9 0.4 0.2 0.45 0.8 0.25 0. heartbeat was gradually extended.91 0.45 Average P-value 0.35 0.3 0. Detail observation in the transition using 20 s sliding window in the experiment 2 (upper: Noise to No-sound condition. 15. heartbeat interval was extended around 752 s window.15 0.87 0.3 0.4 0.82 0. 0. heartbeat interval kept same level till the end of the listening experiment.9 0.86 0. After that. In the result of Music to Nosound condition. Average heartbeat interval in 20 s sections from 540 to 780 s in the experiment 3.86 0. lower: No-sound to Noise condition). it was shortened in 756 s window significantly. furthermore. After that.82 550 570 590 610 630 650 670 690 710 730 750 770 Music to No-sound condition No-sound to Music condition Time [s] Fig.83 0.05 0 650 673 678 683 688 693 698 703 708 713 718 723 728 733 738 743 748 753 758 763 768 0. 0.89 0. In the result of No-sound to Noise.1 0.88 0.0547).25 0.93 0.15 0.5 Average P-value 0.84 0. heartbeat interval was shortened around 690 s window.92 0.84 0. Figure 16 shows change in heartbeat in latter 2 sections in the experiment 3.76 Heartbeat Interval [s] Time [s] Fig. tendency and significant change in heartbeat were observed.15 0. Generally. 2009].4 0.3 0.1 0.85 0.15 0. Detail observation in a transition between different stimuli showed obvious change in heartbeat intervals..89 0.45 0.84 0. noise is believed as sound everyone dislikes.9 0.92 0. The tendency is considered as caused of sitting on a chair for a long time.25 0.35 245 Heartbeat Interval [s] 0.05 0 650 673 678 683 688 693 698 703 708 713 718 723 728 733 738 743 748 753 758 763 768 0. Discussion Result of relaxation feelings of sound stimuli showed that music elicited highest relaxation and noise elicited lowest one.3 0.05 0 650 673 678 683 688 693 698 703 708 713 718 723 728 733 738 743 748 753 758 763 768 0.1 0.35 Heartbeat Interval [s] 0.5 0.82 Time [s] 0.86 0. Adjustment of volume might effect on change in heartbeat interval as small change. therefore.87 0.4 0. Time cost eliciting the change and direction of the change (extension or shorten) was quite different between experimental conditions: The direction of change obeyed subjective relaxation feelings. One of the reasons for that is the subjects were in the listening experiment without any task.2 0.91 Average P-value 0.9 0.83 0. In addition.2 0.86 0. If they have tasks to do. 17. lower: No-sound to Music condition).45 0. noise deny doing the tasks.25 0.Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 0. Detail observation in the transition using 20 s sliding window in the experiment 3 (upper: Music to No-sound condition. they might feel the noise as more negative. however. difference between Noise and No-sound was little (no significant). The sequence of them is reasonable. In the all experiments.84 Time [s] Fig.88 0.87 0. No-sound was intermediate. These results support our previous study with more concrete investigations. In our previous study that employed music and noise Significance Significance . Entire change in 2 min average heartbeat interval did not show large difference between sections. 4.89 0. while a significant difference was observed between music and noise conditions in our previous study [Fukumoto et al. The reduction of volume might elicit little negative impression for noise.5 Average P-value 0.88 0. adjustment of volume of sound stimuli between the subjects might reduce the volume for some of the subjects (The adjustment was not applied for the sound stimuli in our previous study).85 0. Same tendency between the present and the previous study as gradual extend of heartbeat interval was observed. Noise elicited the change in heartbeat than Music. In the physiological evaluation processes. 5. we investigated the effects of relaxing musical piece and white . higher P-values around the end of post section were also observed. Homeostasis is an important function assisting the body in maintaining a constant internal environment [Rubinson & Lang. it was needed almost 20 s to elicit the extension heartbeat intervals: With the analysis used in the present study. however. music effects on psycho-physiologically by its development. No-sound was a new additional condition from our previous study. The time cost was the fastest among the experimental conditions. the change of heartbeat interval might be partly affected physiologically from tempo of sound stimulus. 2008]. 36 s was needed to shorten the heartbeat intervals. however. As mentioned above. although temporal changes in heartbeat intervals just after the transition were observed. Strength of the impressions and feelings would relate to the time cost. According to the results of these previous studies. it should be noted that heartbeat interval has its limits to change. Furthermore. Conclusions In this study. While the time cost of noise was almost same time length as our previous study. noise sound contents unpleasant sound. In the present study. as fundamental investigation of temporal development of heartbeat interval in listening to sound stimuli. about 30 s was needed to observe obvious change in heartbeat interval from previous condition to post transition. time cost of noise to elicit the change from music piece was 28 s. around 690 s window. the results with no-sound condition suggest that change in heartbeat interval occurred by sound stimuli disappear from 20 s to 36 s. The difference of physical property of sound stimuli used in the listening experiment might affect the change of heartbeat intervals in each section. This tendency was also shown in temporal development of heartbeat intervals. 1976] and musical piece [Kusunoki et al. shorten of heartbeat was observed. there was no possibility that heartbeat interval in noise section was entrained by white noise because white noise does not have any tempo and cycle. In some results. because the human have to defend or run away from dangerous things quickly. To clarify the effect of the tempo of sound stimuli. The reason why noise affects on heartbeat earlier than music is considered that noise is a continuous sound content: from start to end. Previous studies have investigated the physiological effect of tempo of sound stimuli on heartbeat with simple tone [Bason & Celler. However. to observe significant change. On the other hand. the time costs were 28 s from Music. physiological change is considered faster for unpleasant and dangerous stimuli. further investigation comparing tempo of musical piece and heartbeat interval is needed. 1972] and have observed the entrainment and synchronization of heartbeat by the sound stimulus. the difference was considered as caused of adjustment of sounds’ volume. Furthermore. the change in heartbeat was mainly caused of psychological change as discussed above. impressions and feelings to sound stimuli are believed to remain for a long time. the time cost of music was longer than our previous study: 111 s was needed. The time cost was shorter than that of music. From No-sound to Noise.. For Noise condition. Additionally. it costs 106 s. and this function keeps heartbeat interval in certain range. The different of sound contents might be cause of difference of time cost. some previous studies indicated the possibility that tempo of the sound stimulus entrain listener’s heartbeat. With these results. Generally. From noise to no-sound. shorter reaction was not measured.246 Biometrics as sound stimuli. and it played a role of release from music and noise and obvious start of the stimuli from no-sound. from Music to No-sound. On the other hand. These sound stimuli induced the subjects’ different relaxation feelings between the sound stimuli. & Imai. The affective remixer: personalized music arranging.. selection. D. International Journal of Psychophysiology. Sapporo. 220-222.. time cost of change in heartbeat by listening to music were longer than the result in our previous study.. pp.. doi:10. S. Evolutionary Computation System for Musical Composition using Listener’s Heartbeat Information. Johnsen. doi:10. Acknowledgment This work was supported partly by Grant from Computer Science Laboratory. and it will contribute to more effective applications based on user’s physiological information. A. (1977). D. G. 57-69.025 Fukumoto. & Lalitte P.2005. pp. arrangement and creation of musical piece. (2005). 211-221 Etzel. M. F. Conference on Human Factors in Computing Systems. With the analysis. J.6166045 Aoto. pp. & Celler.1126/science. Science. especially if the system uses heartbeat interval as physiological index. G. Journal of Research in Music Education. 279-280 Bigand. 10. Vol. S.. Tranel. Study on Usage of Biological Signal to Evaluate Kansei of a System.. R. E. Dickerson. M.ijpsycho. Control of the Heart Rate by External Stimuli. P. Annals of the New York Academy of Sciences. 7. L. Some of the results.. No. & Adolphs. T. The results with the analysis would show us more precise physiological change in the change in the sound stimuli. No.. As precise observation with statistical analysis. we will investigate the psycho-physiological effects of the sound stimuli inducing different relaxation feeling with spectral analysis [Akselrod et al.10. doi:10. As future study. 3. R. J.1002/tee. & Ohkura. & Cohen.. IEEJ Transactions on Electrical and Electronic Engineering. Ubel. A. Proceedings of the 1st International Conference on Kansei Engineering and Emotion Research 2007. Vol. 238. E. S. Filipic.Investigation of Temporal Change in Heartbeat in Transition of Sound and Music Stimuli 247 noise. (1981). (2006). The results of this study will contribute to these trials. 1981]. we can separately evaluates autonomic nervous activities based on heartbeat intervals. Power Spectrum Analysis of Heart Rate Fluctuation: A Quantitative Probe of Beatto-Beat Cardiovascular Control. Cardiovascular and respiratory responses during musical mood induction. supports previous studies. pp. J. 429-437 Chung. detail temporal development of heartbeat interval in transitions between two diffrent sound stimuli were observed using 20-s sliding window. C. A. Gordon. pp. Barger. Some previous studies have investigated the relationship user’s KANSEI and physiological response in listening sound and have tried to apply the relation between them for developing musical system.. (2006). & Vercoe. Shanon. Vol. 6. References Akselrod. L-9 Bason. Vol. E. 6. Physical effects and motor responses to music. Nature. pp. These approach aims to reflect user’s KANSEI to the system and to absorb individual difference. 629-631. (2007).1016/j.20324 . T. Fukuoka Institute of Technology. sympathetic and parasympathetic nervous activity. C. 213. and no-sound on heartbeat through three listening experiments. especiallyl for noise. 393-398 Dainow. However.. J. Averages of heartbeat intervals were almost same level. D. 61. (1972). (2008). J.. B. pp. The time course of emotional responses to music. & Stanton. 1441-1446 (in Japanese) . T. no. (2004). ISBN: 978-07695-3692-7..248 Biometrics Fukumoto. A Basic Study on Psychophysiological Changes in Listening Music. Yamazaki.. Vol. Suci. Hasegawa. Yokoyama. Proceedings of Biometrics and Kansei Engineering. pp.. & Nagashima. & Danuser.. M. Japanese Bulletin of Arts Therapy. Y. PA. 200-208. Modelling affective-based music compositional intelligence with the aid of ANS analyses. (2008). E. Affective and physiological responses to environmental noises and music. J. Legaspi. International Journal of Psychophysiology.). 126. M.. I. pp. Proceedings of the 9th international conference on Intelligent user interfaces. ‘Section IV: The Cardiovascular System’. R. IEICE Transactions. 84-89. Fundamentals. (Eds. (in Japanese) Yoshida. IL. & Sawada..11. Real-time Continuous Assessment Method for Mental and Physiological Condition using Heart Rate Variability. Vol. T. & Numao. pp. pp. K. M. Proceedings of the 2009 IEICE General Conference. vol.1016/j. Relationship between Evaluation Values in GA using Physiological Index and Subjective Evaluation.. 2009. PA. C. 2000. Issue 3. 287-414 Rubinson..1016/j. H. A New Affect-Perceiving Interface and Its Application to Personalized Music Selection. No. A. P. G. The measurement of meaning. in Berne and Levy Physiology (6th ed. doi:10..). Fukumoto. F. S. & Nagashima. Poland Gomez. pp. N. K. doi:10. M. & Tannenbaum. & Stanton. 33-41. A.. Temporal Development of Heartbeat Intervals in Transition of Sound Stimuli Inducing Different Relaxation Feelings. USA Pappano.2004. P. Moriyama. 2. USA. J. 12. 2241-2247 Osgood.. E. E. Mosby Elsevier. A Statistical Method of Detecting Synchronization for Cardio-Music Synchrogram. J. Koeppen. Kurihara S. A. M. M.2007. T. (2006).). B. E86-A. (2009). B.010 Yamada.ijpsycho. & Lang.. Hazama.. 2.. (2009). Misaki. USA.. 268-270 Kusunoki. K. Koeppen. B.. Picard. 2006. Y. 31. Information and Systems. No. pp. Portugal. & Ishii. R. B. & André. Y. Funchal. pp.knosys. Cieszyn. 53. 2003. Ota. pp. (2008) ‘Section II: The Nervous System’. A. Knowledge-Based Systems. Dabek.105. Proceedings of the 1998 Workshop on Perceptual User Interfaces Kim. (2003). p.. (1957). (Eds. no. IEEJ Transactions on Electronics. Mosby Elsevier. J. 51-230 Sugimoto. (2004). T. & Fukumoto. (1998).). (in Japanese) Healey. pp. (2008).002 Hazama T.02. (2000). Madeira. K. T. A Generate and Sense Approach to Automated Music Composition. 9.. Vol. 91-103. vol. B. in Berne and Levy Physiology (6th ed. University of Illinois Press. 21. Introduction 1. Proteomic analyses of varying body fluids are propelling the field of medical research forward at unprecedented rates due to its consistent ability to identify proteins that are at the fentomole level in concentration. Bio-chemically. The saliva microbiome. etc. Blood is a far more complex medium.13 The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women Charles F. Streckfus and Cynthia Guajardo-Edwards University of Texas Dental Branch at Houston. current ultra sensitive analyte detection techniques have eliminated this barrier. Since serum possesses more proteins than saliva. may result in a greater risk of non-specific interference and a greater chance . oncogenes..0 mg/ml.1 Background information Biometrics is the science and technology of measuring and analyzing biological data. however. Using “startof-the-art” mass spectrometry protein analysis. Saliva specimen preparation is simple involving centrifugation prior to storage and the addition of a cocktail of protease inhibitors to reduce protein degradation for long-term storage. It also refers to technologies that measures and analyzes human body characteristics for identification purposes. identification will refer to the recognition of those individuals in a disease state i. for example. As a consequence of this low protein concentration. These advancements have also benefited biometric research to the point where saliva is currently recognized as an excellent diagnostic medium for biometric authentication of human body characteristics.2 Why Saliva as a diagnostic media? 1. it was once assumed that this was a major drawback for using saliva as a diagnostic fluid. these efforts are in the area of biological verification. carcinoma of the breast.. assaying trace amounts of “factors” (e. saliva is a clear liquid with an average protein concentration of 1. A decision has to be made as to whether to use serum or plasma.1 Analytical advantages of Saliva Saliva as a diagnostic fluid has significant biochemical and logistical advantages when compared to blood.g.5 to 2. Collectively. Serum has a total protein concentration of approximately 60-80 mg/ml.2. is reputed to be biometrically as accurate as a fingerprint. 1. the author will demonstrate the use of salivary protein profiles to recognize individuals at risk for carcinoma of the breast.). United States of America 1.e. biometric can be applied to identify the biological characteristics of a diseased individual. however. In the context of this book chapter. albumin. e. 2006). composite or “whole” saliva is preferred as it enhances the chances of finding a biomarker due to the variety of sources from which it derives and because of its simplicity to collect.250 Biometrics for hydrostatic (and other) interactions between the factors and the abundant serum proteins. Serum also possesses numerous carrier proteins. 1. and may be collected repeatedly without discomfort to the patient [4]. it is recommended that blood specimens be collected with ethylenediamine tetraacetic acid (EDTA). which may be altered in the presence of disease.2. suction method and the swab method. the most reliable is the suction method with a reliability coefficient of r = 0.. and tongue each producing a unique type of secretion with varying protein constituents (Birkhed & Heintze.. it has been demonstrated that clotting removes many background proteins.2. 1989). Additionally.. plasma is also being explored as a diagnostic fluid. Consequently it may be possible to develop a simplified method for “hometesting”. The submandibular and sublingual glands. which may cleave proteins from many relevant pathways (Koomen et al. As a consequence. This diagnostic potential could reach many individuals who for personal. however. Blood specimens need to be collected in a specific sequence and under-filling tubes with additives may possibly alter protein analyses. These methods will yield 0. There are basically two types of saliva to collect.g. which must either be removed or treated prior to being assayed for protein content. the collection of saliva is safe (e. These include the draining or drool method. spitting method. logistical or economical reasons lack access to preventive care. The main consideration in using plasma is the selection of a proper anticoagulant (Koomen et al. palate. It also revealed a within subject variance of . no needle punctures). produce mixed secretions which are both serous and mucinous. Blood is a more complicated medium to collect. It would be ideal if all enzymatic activity in serum would cease at the time of collection. 0.93. 2005. It has been demonstrated that enzymatic activity continues during this process. As a consequence.3 Saliva collection The oral cavity receives secretions from three pairs of major salivary glands and numerous minor salivary glands that are located on the oral buccal mucosa. One type is “resting” or unstimulated whole saliva and the other is stimulated whole saliva. however. It requires highly trained personnel to collect it and if collected incorrectly. current research has found that heparin has a relatively short half life (3 to 4 hours) and can produce products of coagulation which are abundantly comparable to those assayed in serum. There are several methods for collecting unstimulated whole saliva.47.54 and 0. Additionally.52 ml/minute of saliva respectively.g. 2005). 1. proteomic analyses of serum has shown that this is not the case. if specimens are collected during hospital or clinical settings.47.. Teisner et al. For example the parotid and Von Ebner glands (located on the tongue) produce serous secretions while the minor salivary glands produce mucinous secretions. noninvasive and relatively simple. Based on these observations. can lead to misinterpretations which can result in patient mismanagement (Ernest & Balance. 1983)..2 Collection advantages of Saliva From a logistical perspective. testing in a “health fair” setting or in dental clinics where individuals are available for periodic oral examinations. 0. however. there may be a lapse of time before being processed. Heparin for example can be used as an anti-clotting agent. Of the four methods. a vacuum pump is required to collect the specimens (Birkhed & Heintze. Additionally. individuals will need to be serially assessed at approximately the same time of day that the baseline specimen was collected.. Stimulated secretions produce about three times the volume of unstimulated secretions and are not subjected to the effects of circadian rhythm. If the subject is taking medications that decrease flow rates (e. The authors recommend this salivary collection method for biomarker discovery.47-0.. the reliability is only r = 0. however. This technique produces copious amounts of saliva.g. The patient is asked to swallow any accumulated saliva and then instructed to chew the gum at a regular rate (using a metronome)... Sjögren’s syndrome). IL) is given to the test subject and they chew the substance at a regular rate.25–0. Additionally.0 . The reflexive method is based on the reflex response occurring during the mastication of a bolus of food. has under gone head and neck radiation. Five drops of a 1-6% citric acid solution is applied to the dorsum of the tongue every 30 seconds. 2004). expectorates periodically into a preweighed disposable plastic cup. The saliva accumulates in the mouth and is expectorated intermittently for a period of five minutes. The armamentarium used for this procedure is illustrated in Figure 1.49. The major problem is the small amount of saliva derived from collection. The cup with the saliva specimen is reweighed and the flow rate determined gravimetrically. 1989).) is placed in the subject's mouth. if the subject has autoimmune disorders (e. Citric acid is the most widely used stimulant. The gustatory technique requires the use of an oral based secretory stimulant. Therefore. a standardized bolus (1 gram) of paraffin or a gum base (Wrigley Co. There are two methods for collecting stimulated whole saliva. In conclusion. This procedure is continued for a period of five minutes. Unstimulated saliva flow rates are also influenced by circadian and circannual rhythms.1. .5 ml over a five minute period.54 ml/minute is the range for healthy individuals (0.. The subject expectorates intermittently during the collection period for duration of five minutes. for consistency.11.14. This is an accurate technique as it has a reliability coefficient of r = 0. One method of collection is the gustatory method and the other is the reflexive or “masticatory” technique. This is a very reliable method. The procedure for collecting Stimulated Whole Salivary Gland Secretions is as follows: A standard piece of unflavored gum base (1. upon sufficient accumulation of saliva in the oral cavity. There are several drawbacks when using unstimulated saliva. 2004).The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 251 0. Peoria. Usually. All other participants will need to be collected at approximately the same time in order to reduce inter-variability among the participating subjects. The alternative to using unstimulated whole saliva is obviously to use stimulated whole saliva. you will be able to collect sufficient quantities of saliva despite health status and medication usage.35 ml/minute normal range) under ideal conditions using those aforementioned collection methodologies. or is very elderly.95 and a within subject variability of 0.g.76 and has a within subject variance of 0. The 0. The volume and flow rate is then recorded along with a brief description of the specimen’s physical appearance (Gu et al. Gu et al. 1989. anti-hypertensive medications) the amount collected will be significantly reduced.5 g. The subject. it will be difficult to obtain 0. however. one can conclude that using unstimulated saliva is not the ideal medium for cancer biomarker discover. due to the small quantity of specimen obtained from these techniques and the large within subject variance. The flow rate range is 1 – 3 ml/minute for healthy individuals (Birkhed & Heintze. Armamentarium for the collection and storage of stimulated whole saliva 1. total protein and urea concentrations. protease inhibitor cocktail (Sigma. 2002). We used c-erbB-2 to test for specimen stability as this is a large 185-kDa protein. lactoferrin (77 kDa) and histidine rich proteins concentrations. Fig. An example of this statement is demonstrated in a manuscript assessing salivary proteins associated with burning mouth syndrome (Moura et al. potassium. The results of the study showed that mean salivary flow rates among control patients were lower than that of burning mouth syndrome patients. iron. thiocyanate. 2007). glucose. Chloride.. 1 mg/ml whole saliva) and 1 mM of sodium orthovanadate are added immediately after sample collection (Shevchenko et al. 1993 where they assayed serially sampled salivary specimens which were collected over a ten year period for total protein. 1. phosphorus and potassium levels were elevated in patients with burning .2. which would be susceptible to degradation by proteases and other biochemical activity.5 ml aliquots. In order to minimize the degradation of the proteins. and frozen (-80°C). The specimen is next divided into 0. ten healthy subjects were serially sampled for saliva over a five-year period. Long-term Saliva specimen banking Roughly 2 .3 Studies using Saliva protein profiling for disease state detection The majority of the literature concerning human saliva biometrics is associated with the oral cavity and its associated maladies.These results are consistent with Wu et al. The results are shown in Figure 1 and illustrate protein stability when frozen at 80°C. calcium.4.252 Biometrics 1. The principle objective the present study was to analyze the characteristics of salivary production and its composition in individuals with burning mouth syndrome.. The investigators compared salivary flow rates. In their study. they found no concentration differences due to specimen aging. To assess specimen degradation. chloride.5 ml of whole saliva will be obtained from the individual. phosphorus. placed into bar code labeled cryotubes. magnesium. All samples are kept on ice during the process. as well as the expression profile of salivary proteins by SDS-PAGE among healthy individuals and those diagnosed with burning mouth syndrome. High-resolution two-dimensional gel electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry was combined to perform a large scale analysis of the salivary specimens. As only particular isoforms of proteins were modulated. Additionally.. Eighteen biomarkers in serum and three chemokines measured in saliva could predict strain membership with 80% to 100% accuracy. three isoforms of salivary acidic protein-1. An additional study (Delaleu et al. CD40 ligand. autoantibodies and other biomarkers. IL-18. . Fourteen isoforms of α-amylase. growth factors. nonobese diabetic mice tended to exhibit impaired salivary flow. the analysis of the expression of salivary proteins by Coomassie blue SDSPAGE revealed a lower expression of low molecular weight proteins in individuals with burning mouth syndrome compared to healthy controls. The results suggested that the identification and characterization of low molecular weight salivary proteins in burning mouth syndrome may be important in understanding BMS pathogenesis. they intended to compute a model of the immunological situation representing the overt disease stage of Sjögren's syndrome. and three isoforms of salivary cystatins SA-1 were detected as under expressed proteins. The investigators found non-diabetic. They concluded that the autoimmune manifestations of Sjögren's syndrome are greatly independent and associated with various immunological processes. Thirty-eight biomarkers in serum and 34 in saliva obtained from non-diabetic. The proteins under expressed were all known to be implicated in the oral anti-inflammatory process. The proteomic comparison of saliva samples from healthy subjects and poorly controlled type-1 diabetes patients revealed a modulation of 23 proteins. however. thus contributing to its diagnosis and treatment. respectively). Age-matched and sex-matched Balb/c mice served as a reference.223).041. non-obese diabetic mice for salivary gland dysfunction.001 and 0.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 253 mouth syndrome (p = 0.. whereas two isoforms of serotransferrin were over expressed secondary to type-1 diabetes. 2008) investigated the involvement of 87 proteins measured in serum and 75 proteins analyzed in saliva in spontaneous experimental Sjögren's syndrome. cytokines. Another study using salivary protein profiles investigated the modification of the salivary proteome occurring in type 1 diabetes and to highlight potential biomarkers of the disorder. granulocyte chemotactic protein-2 and anti-muscarinic M3 receptor IgG3 may connect the different aspects of Sjögren's syndrome. Factor analyses identified principal components mostly correlating with one clinical aspect of Sjögren's syndrome and having distinct associations with components extracted from other families of proteins. glandular inflammation and increased secretory SSB (anti-La) levels. one prolactin inducible protein. comprising chemokines. 0.034. were quantified using multi-analyte profile technology and fluorescence-activated cell sorting. type-1 diabetes seemed to differentially affect posttranslational modification of the proteins (Hirtz et al. 2006). suggesting that the pathology induced a decrease of non-immunological defense of oral cavity. This approach further established saliva as an attractive biofluid for biomarker analyses in Sjögren's syndrome and provides a basis for the comparison and selection of potential drug targets and diagnostic markers (Delaleu et al. The mice aged 21 weeks and were evaluated for salivary gland function. In addition. non-obese diabetic mice were significantly different from those in Balb/c mice. salivary gland inflammation and extra-glandular disease manifestations. Total salivary protein concentrations were reduced in individuals with burning mouth syndrome (p = 0. they used non-diabetic. Processes related to the adaptive immune system appear to promote Sjögren's syndrome with a strong involvement of Thelper-2 related proteins in hyposalivation. CD40. In this animal study. 2008). The analytes.. analytical software and bioinformatics have enabled the researchers to analyze complex peptide mixtures with the . liquid chromatography. a composite of varying staged patients formed these saliva pools. The subjects were matched for age and race and were non-tobacco users. Additionally. These included women diagnosed with moderate cervical dysplasia.2 Saliva collection and sample preparation Stimulated whole salivary gland secretion is based on the reflex response occurring during the mastication of a bolus of food. Stage I. the investigators collected saliva from women that were healthy and from those diagnosed with carcinoma breast in the following stages: Stage 0. The authors recommend this salivary collection method with the following modifications for consistent protein analyses.3 LC-MS/MS mass spectroscopy with isotopic labeling Recent advances in mass spectrometry. 2.1 Methods 2. 1982). as revised in 1983. Ten saliva specimens were pooled for each type of carcinoma. These tumors were all squamous cell carcinomas. The treated samples were centrifuged for 10 minutes at top speed in a table top centrifuge. a standardized bolus (1 gram) of paraffin or a gum base (generously provided by the Wrigley Co. Stage IIa and Stage IIb. In order to achieve this objective. severe cervical dysplasia and cervical carcinoma in situ. Due to the difficulty in obtaining early stage tumors for ovarian. The saliva samples were pooled by combining equal volumes of cleared stimulated whole saliva from a set of archived healthy and cancer subjects. which stores the specimens at -80°C.1. The final group consisted of women diagnosed with head and neck squamous cell carcinomas ten women with varying stages of development. The specimens were banked at the University of Texas Dental Branch Saliva repository.1 Study design The purpose of this study was to determine if individuals could be protein profiled and potentially classified as having cancer. endometrial and head/neck carcinomas. These malignancies were identified as adenocarcinomas.1. Louis. All the tumors were adenocarcinomas.254 Biometrics 2. USA) is added along with enough orthovanadate from a 100mM stock solution to bring its concentration to 1mM. upon sufficient accumulation of saliva in the oral cavity. Usually. All procedures were in accordance with the ethical standards of the UTHSC IRB and with the Helsinki Declaration of 1975. The investigator also wanted to ascertain if there were alterations of the protein profiles due to the primary tissue site and the varying degree of tumor staging. Women diagnosed with ovarian and endometrial carcinomas were also included in the study. Previous studies by the investigator have demonstrated that properly prepared specimens can remain in storage for a long period of time.. Peoria. The individual. The supernatant was divided into 1 ml aliquots and frozen at -80°C. The cup with the saliva specimen is reweighed and the flow rate determined gravimetrically. A protease inhibitor from Sigma Co (St. The volume and flow rate is then recorded along with a brief description of the specimen’s physical appearance (Navazesh &Christensen. 2. IL) is given to the subject to chew at a regular rate. expectorates periodically into a preweighed disposable plastic cup. specimens were collected from women diagnosed with varying gynecological carcinomas. Current research in Salivary protein profiling for cancer detection 2.1. This study was performed under the UTHSC IRB approved protocol number HSC-DB-050394. This procedure is continued for a period of five minutes. MI. The acetone supernatant was removed and the pellet resuspended in 20 ųl dissolution buffer. Wilmarth et al.000 x g for 20 minutes.. numerous proteins in complex mixtures are identified and can be compared for their relative concentrations in each mixture. Applied Biosystems recently introduced iTRAQ reagents (Gu et al.. Thus even in complex protein mixtures MS/MS data can be used to sequence and identify peptides by sequence analysis with a high degree of confidence (Birkhed et al. CA. Isotopic labeling of protein mixtures has proven to be a useful technique for the analysis of relative expression levels of proteins in complex protein mixtures such as plasma. the saliva samples were thawed and immediately centrifuged to remove insoluble materials. 1989. 2004. Odense. The supernatant was assayed for protein using the Bio-Rad protein assay (Hercules. 2004). CA.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 255 ability to detect proteins differing in abundance by over 8 orders of magnitude (Wilmarth et al. 1990). urine or saliva to identify differentially expressed proteins (18).. however. One current method is isotopic labeling coupled with liquid chromatography tandem mass spectrometry (IL-LC-MS/MS) to characterize the salivary proteome (Gu et al. The HPLC is coupled to the mass spectrometer by a nanospray ESI head (Protana. 2002).4 Salivary protein analyses with iTRAQ Briefly.. 2004). The higher resolution offered by the tandem Qq-TOF mass spectrometer is ideally suited to isotopically labeled applications (Gu et al. The main approach for discovery is a mass spectroscopy based method that uses isotope coding of complex protein mixtures such as tissue extracts. The approach readily identifies changes in the level of expression. it is revealed during collision induced dissociation by MSMS analysis.... The advantage of tandem mass spectrometry combined with LC is enhanced sensitivity and the peptide separations afforded by chromatography. 2002.. Koomen et al 2004. Foster City. Thus in the MSMS spectrum for each peptide there is a fingerprint indicating the amount of that peptide from each of the different protein pools.1. Cysteine residues were . Since virtually all of the peptides in a mixture are labeled by the reaction. 2004). USA) and an aliquot containing 100 µg of each specimen was precipitated with 6 volumes of -20ºC acetone. CA). Foster City. There are numerous methods that are based on isotopically labeled protein modifying reagents to label or tag proteins to determine relative or absolute concentrations in complex mixtures. Briefly. The soluble fraction was denatured and disulfides reduced by incubation in the presence of 0. Thus even in complex mixtures there is a high degree of confidence in the identification. 2004. the acetone precipitable protein was centrifuged in a table top centrifuge at 15.1% SDS and 5 mM TCEP (tris-(2-carboxyethyl)phosphine)) at 60ºC for one hour. 2004. saliva urine or cell extracts. The analysis was performed on a tandem QqTOF QStar XL mass spectrometer (Applied Biosystems. Koomen et al 2004. thus permitting the analysis of putative regulatory pathways providing information regarding the pathological disturbances in addition to potential biomarkers of disease. Shevchenko et al. The precipitate was resuspended and treated according to the manufacturers instructions. 1990). USA) equipped with a LC Packings (Sunnyvale. blood. CA. The real advantage is that the tag remains intact through TOF-MS analysis. Ward et al. which are amino reactive compounds that are used to label peptides in a total protein digest of a fluid such as saliva... Gu et al. Protein digestion and reaction with iTRAQ labels was carried out as previously described and according to the manufacturer’s instructions (Applied Biosystems. Denmark) for maximal sensitivity (Shevchenko et al.. Ward et al. USA) HPLC for capillary chromatography. 2. Analytical strategy for quantifying peptides using iTRAQ tagging . 2.5 ml aliquots of 10 mM KH2PO4 at pH 3.1% formic acid. the four separate iTRAQ reaction mixtures were combined. CA) column that has been equilibrated with the same buffer. The fractions are evaporated by Speed Vacuum to about 30% of their volume to remove the acetonitrile and then slowly applied to an Opti-Lynx Trap C18 100 ul reverse phase column (Alltech. To improve the resolution of peptides during LCMSMS analysis. The column is washed with 1 ml loading buffer to remove contaminants.1% formic acid and eluted in one fraction with 0. The protein digests were labeled by mixing with the appropriate iTRAQ reagent and incubating at room temperature for one hour. 233 mM and 350 mM KCl respectively. Since there are a number of components that can interfere with the LC-MS/MS analysis.256 Biometrics blocked by incubation at room temperature for 10 minutes with MMTS (methyl methanethiosulfonate). Deerfield. On completion of the labeling reaction. The combined peptide mixture is diluted 10 fold with loading buffer (10 mM KH2PO4 in 25% acetonitrile at pH 3. Each of the three fractions was analyzed by reverse phase LCMSMS.0 in 25% acetonitrile containing 116 mM. the labeled peptides are partially purified by a combination of strong cation exchange followed by reverse phase chromatography on preparative columns. IL) with a syringe. The column was washed with 1 ml of 2% acetonitrile in 0.0) and applied by syringe to an ICAT Cartridge-Cation Exchange column (Applied Biosystems.1% formic acid in 20% acetonitrile. The analytical strategy is illustrated in Figure 2. Fig. The fractions were dried by lyophilization and resuspended in 10 ul 0. The mixture was incubated overnight at 37ºC. the peptide mixture is partially purified by elution from the cation exchange column in 3 fractions.3 ml of 30% acetonitrile in 0. Foster City. Stepwise elution from the column is achieved with sequential 0. Trypsin was added to the mixture to a protein:trypsin ratio of 10:1. Graphic comparisons with log conversions and error bars for protein expression were produced using the ProQuant® software.7 Western blot analysis for marker validation 2. and the pooled saliva samples. Dry PVDF membranes were activated in methanol for about 10 seconds then transferred to soak in 1X PBS-T for 3 washes .1.1% formic acid in 2% acetonitrile/98% Milli-Q water and buffer (B): 0. The gels were equilibrated and extra thick blot paper and PVDF membranes were soaked in 1X TGS buffer for 15 minutes prior to running Western Transfer. followed by mobile phase elution: buffer (A) 0. Four-fifteen percent Tris-HCl polyacrylamide gels were loaded with molecular weight markers. Each saliva sample set there are three separate LCMSMS analyses. The pooled saliva was mixed with loading buffer (Laemmli buffer containing BME) in 1:1 ratio. Polyvinylidene fluoride (PVDF) membranes were air dried for minimum of 1 hour.1.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 257 2. Transfer conditions were 0. 17 volts.52mA constant.1 Preparation of samples We selected the protein profilin-1 for validating the presence of these proteins in saliva. controls.7. The ProGroup reports were generated with a 95% confidence level for protein identification.1.15 Da for the precursor and 0. The chromatographic system consists of an UltiMate nano-HPLC and FAMOS auto-sampler (Dionex LC Packings). The IDA collision energy parameters were optimized based upon the charge state and mass value of the precursor ions. Using information-dependent acquisition. The mass spectrometer automatically chooses the top two ions for fragmentation with a 60 second dynamic exclusion time.1% formic acid in 98% acetonitrile/2% Milli-Q water. The peptides were eluted from 2% buffer B to 30% buffer B over 180 minutes at a flow rate 220 nL/min. The LC eluent was directed to a NanoES source for ESI/MS/MS analysis.6 Bioinformatics The Swiss-Prot database was employed for protein identification while the PathwayStudio® bioinformatics software package was used to determine Venn diagrams were also constructed using the NIH software program (http://ncrr.1 Da for the fragment ions. Semi-dry transfer apparatus was used. #Ab10608 diluted 1:1000. The accumulated MSMS spectra are analyzed by ProQuant and ProGroup software packages (Applied Biosystems) using the SwissProt database for protein identification. 30 minutes in 1X TGS buffer. peptides were selected for collision induced dissociation (CID) by alternating between an MS (1 sec) survey scan and MS/MS (3 sec) scans.gov).1. The saliva samples from a healthy individuals.5 Reverse phase LCMSMS The desalted and concentrated peptide mixtures were quantified and identified by nano-LCMS/MS on an API QSTAR XL mass spectrometer (ABS Sciex Instruments) operating in positive ion mode. 2. 3cm fused silica C18 capillary column. 19 minutes. 2. The profilin-1 antibody was a rabbit polyclonal from the Abcam Co. benign tumor subjects and individuals diagnosed with Stage IIa her2/neu receptor positive breast cancer subjects and Stage IIa her2/neu receptor negative breast cancer subjects were pooled by combining equal volumes of cleared stimulated whole saliva from a set of archived specimens. The sample was then incubated at 95°C for 5 minutes and was then loaded onto the 415% Tris-HCl polyacrylamide gel. Peptides were loaded on a 75cm x 10 cm. Electrophoresis was run at 200 Volts. The ProQuant analysis was carried out with a 75% confidence cutoff with a mass deviation of 0.pnl. 74 2. Experimental results in salivary protein profiling for cancer detection 3.74 0.37 Breast Ovarian Endo. The membranes were incubated overnight with a primary antibody in PBS-T.26 0. Severe Stage 0 Variable 4.46 0.74 0. Finally.1 Mass spectrometry analysis Tables 1-5 summarize the results of the proteomic analysis and illustrates protein comparisons between breast (Stage 0 – IIb).65 1. Staging Stage Stage Stage Stage Variable 0 I IIa IIb Variable Mod.57 Cervical H&N NUCB2 P80303 Table 1.31 1.57 0.86 0. The table shows which proteins are unique to each type and stage of carcinoma . The membranes were then incubated for 1 hour in 5% NFDM in PBS-T.82 3.84 0.77 5.40 0.28 0.47 0.32 0. the membranes were washed 3 times for 5 minutes in PBST. Afterwards the membranes were washed 3 times for 5 minutes in PBS-T.36 0.50 1.42 0.62 0.47 0.16 3. severe dysplasia and in situ) Proteins Gene ID Accession A1AT P01009 ANXA3 P12429 CO3 HPTR K1C16 COBA1 LUZP1 ZN248 CYTC KAC K2C6C K1CJ SCOT2 VEGP PROF1 NGAL HEMO CYTD CRIS3 ACBP KLK PIP PERL PPIB P01024 P00739 P08779 P12107 Q86V48 Q8NDW4 P01034 P01834 P48666 P13645 Q9BYC2 P31025 P07737 P80188 P02790 P28325 P54108 P07108 P06870 P12273 P22079 P23284 0.258 Biometrics of 5 minutes each.88 1. The membranes were washed 3 times for 5 minutes in PBS-T and were incubated for 4 hours with secondary antibody (HRP conjugate) in PBS-T.80 0.42 0. Again.80 1. the membranes were treated with ECL plus detection reagents and photographed with exposure of 800 seconds.88 1. 3.90 1.83 0.32 0. cervical (moderate. Carcinoma of the breast and its associated stages accounted for nearly 58% of the total unique proteins. 2007). endometrial and head and neck cancer subjects. The values exhibited in table 1 and table 2 represent the ratio of disease state to healthy state.05) in the saliva from cancer subjects as compared to the healthy individuals. over 50% of the proteins in table 3 are referenced in the literature for the presence of carcinoma in blood and cell supernatants from cancer cell lines (Polanski & Anderson. Cervical 259 H&N Stage 0 Stage I Stage IIa Stage IIb Variable Variable Mod. were common to both squamous cell carcinoma and the adenocarcinoma cancer types. The table demonstrates the unique. Generally. This was particularly noticed among the S100 proteins. Those with values greater than 1. the proteins appear to be diverse with the exception of the cytoskeletal associated proteins which comprised 20% of the protein panel. Severe Stage 0 Variable Table 2. There were 13 proteins that were exclusive to the gynecological carcinomas despite the fact that some were squamous cell carcinoma or adenocarcinoma cancer types. 21 percent. The value is divided by the corresponding value yielded for that particular protein in the healthy cohort.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women Breast Protein Status Unique Proteins Up Regulated Total Proteins Up Regulated 3 9 16 4 3 7 7 11 18 11 14 25 0 4 4 2 11 13 2 15 17 0 2 2 7 7 14 7 9 16 1 0 1 17 5 22 18 5 23 0 0 0 21 5 26 21 5 26 0 4 4 20 7 27 20 11 31 0 0 0 13 4 17 13 4 17 1 0 1 13 8 21 14 8 22 2 2 4 15 7 22 17 9 26 Down Regulated 6 Common Proteins Down Regulated 8 Total Proteins 24 Total Number Proteins Up Regulated Grand Total 19 33 Down Regulated 14 Ovarian Staging Endo. Unique being defined as not associated with the other types of carcinoma. 2006). Table 5 illustrates the function of the proteins listed in table 4. Eleven of the unique proteins were up-regulated and fourteen were down-regulated. Additionally. These proteins comprised 25% of the protein panel. Table 3 indicates the function of the associated proteins. . common and total protein counts among the varying carcinomas and their stages ovarian.000 were considered down-regulated proteins. Table 1 lists the 25 proteins that are unique to each type of carcinoma with their corresponding ratios. Simply stated the ratio for each protein is a result of the log sum integrated area under each peak formed when plotting the intensity versus mass-to-charge ratios for each peptide (Boehm et al. 166 proteins were identified at a confidence level at >95. The following table represents the up and down-regulated proteins that over-lapped the various types of carcinomas. In total.. Of these there were 76 proteins that were determined to be expressed significantly different (p<0.000 were considered up-regulated proteins while those with values less than 1. There were no unique proteins associated with endometrial carcinoma or severe dysplasia of the cervix. Numerous proteins. Significance between ratios is determined by t-test analysis. 33 0.17 1.49 1.77 0.62 0.43 1.67 0.47 3.09 2. Severe Stage 0 Variable 2.16 0.86 0.11 0.50 2.85 1.63 1.50 0.260 Gene ID A1AT ACBP ANXA3 CO3 COBA1 CRISP3 CYTC CYTD HEMO HPTR K1C16 K1CJ K2C6C KAC KLK LUZP1 NGAL NUCB2 PERL PIP PPIB PROF1 SCOT2 VEGP ZN248 Accession P01009 P07108 P12429 P01024 P12107 P54108 P01034 P28325 P02790 P00739 P08779 P13645 P48666 P01834 P06870 Q86V48 P80188 P80303 P22079 P12273 P23284 P07737 Q9BYC2 P31025 Q8NDW4 Protein Name Alpha 1 protease inhibitor AcylCoA binding protein Annexin A3 Complement 3 precursor Collagen alpha-1 chain Cysteine rich secretory protein Cystatin C Cystatin D precursor Hemopexin precursor Haptoglobin – 1 related protein Cytoskeleton 16 Cytoskeleton 10 Cytoskeleton 6C Immunoglobulin kappa chain C region Kallikrein-1 Leucine zipper protein-1 Neutrophil Gelatinase associated lipocalin Nucleobindin-2 precursor Lactoperoxidase Prolactin inducible protein precursor Peptidyl-prolyl cis trans isomerase B Profilin-1 Succinyl CoA Lipocalin .46 0.65 0. .78 2.54 1.91 1.53 0.86 Variable Mod.14 0.59 3.06 0.15 2.86 1.84 1. Represents the up and down-regulated proteins that over-lapped the various types of carcinomas.79 0.75 0. The table reveals the protein functions associated with the varying carcinomas and their stages Proteins Gene ID 1433S ADA32 ALBU ALK1 AMYS ANXA1 BPIL1 CAH6 CYTA CYTN CYTS CYTT DEF3 Accession O70456 Q8TC27 P02768 P03973 P04745 P04083 Q8N4F0 P23280 P01040 P01037 P01036 P09228 P59666 Breast Ovarian Endo.57 2.04 0.15 1.46 1.64 1.84 1.94 1.12 1.50 0.61 Table 4.63 0.94 0.58 2.14 0.80 0.59 0.55 2.69 1.73 0.23 0. Staging Cervical H&N Stage Stage Stage Stage Variable 0 I IIa IIb 1.52 0.1 Zinc finger protein 248 Biometrics Reported Function Protease inhibitor Transport protein Inhibits phospholipase activity Activates complement system Fibrillogenesis Immune response Inhibitor of cysteine proteases Protein degradation & inhibitor Heme transporter protein proteolysis Cytoskeleton associated Cytoskeleton associated Cytoskeleton associated Immune response Cleaves kininogen Nucleus functioning protein Iron trafficking protein Calcium binding protein Transport.16 0.94 1. antimicrobial Secretory actin binding protein Accelerates protein folding Cytoskeleton associated Ketone body catabolism Ligand binding protein Transcriptional regulation Table 3.78 1. 22 1.36 Table 4.55 0.68 1.88 1.80 1.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women Breast Ovarian Endo. Represents the up and down-regulated proteins that over-lapped the various types of carcinomas.86 0.83 0.72 1.73 1.23 1.36 0.26 1.53 1.41 1.04 1.94 1.64 1.22 1.34 2.13 0.10 0.82 2.51 0.61 7.95 2.71 0.91 1.07 5.22 1.49 1.81 1.13 2.36 1.87 0.32 5.90 0.48 1.51 1.43 0.26 0.77 5.65 2.92 Variable Mod.38 2.74 0.45 2.42 0.72 0.11 0.76 1.33 1.29 0.58 0.46 3.26 2.42 1.55 0.11 1.22 0.75 0.33 1.43 0.44 1.55 1.66 0.88 1.83 1.35 3.92 1.25 0.46 1.74 3.33 0.75 6.18 1.78 1.62 2.46 4.41 3.54 1.88 1.73 1.83 0.76 0.65 2.10 1.73 0.87 1.56 0.23 Cervical H&N 261 Proteins Gene ID DMBT1 ENOA FABPE FGRL1 HPT IGHA1 IGHA2 IGHG1 IGHG2 IGJ K1C13 K1C14 K1CM K1CP K22O K2C1 K2C4 K2C5 K2C6A K2C6E LAC LCN1 LV3B MUC MUC5B PERM PIGR PRDX1 S10A7 S10A8 S10A9 SPLC2 SPRR3 THIO TRFE TRFL TRY ZA2G Accession Q9UGM3 P06733 Q01469 Q8N441 P00738 P01876 P01877 P01857 P01859 P01591 P13646 P02533 P13646 P08779 Q01546 P04264 P19013 P13647 P02538 P48668 P01842 P31025 P80748 P01871 Q9HC84 P05164 P01833 Q06830 P31151 P05109 P06702 Q96DR5 Q9UBC9 P10599 P02787 P02788 P07477 P25311 3.65 1.47 4.40 1.26 1.22 3.30 0.50 5.(continuation) .84 0.40 4.42 1.13 0.15 0.43 2. Staging Stage Stage Stage Stage Variable 0 I IIa IIb 1.67 1.09 1.46 2.46 1.34 2.10 1.67 0.00 0.64 1.67 1.84 0.74 0.15 0.91 3.39 1.22 5.64 2.49 1.19 0.29 1.34 0.80 1.50 1.80 1.21 0.89 1.47 3.17 5.08 4.39 5.18 0.63 0. Severe Stage 0 Variable 1.91 0.55 1.54 1.71 0.23 1.33 1.22 1.81 1.55 6.05 1.21 0.82 1.72 0. type I cytoskeletal 13 Keratin. type I cytoskeletal 16 Keratin. type II cytoskeletal 1 Keratin. type II cytoskeletal 6C Lactoperoxidase Lipocalin-1 precursor Ig lambda chain V-III region LOI Ig Mu Chain Mucin-5B Myeloperoxidase Polymeric immunoglobulin receptor Peroxiredoxin-1 Reported Function Biometrics Signaling Role in sperm development Protein transport Acid-stable proteinase inhibitor Enzyme Involved in exocytosis Lipid binding Hydration of carbon dioxide. type II cytoskeletal 5 Keratin. type II cytoskeletal 6A Keratin. epidermal Fibroblast growth factor receptor 1 Haptoglobin Ig alpha-1 chain C region Ig alpha-2 chain C region Ig gamma-1 chain C region Ig gamma-2 chain C region Immunoglobulin J chain Cytokeratin 13 Cytokeratin 14 Keratin.262 Gene ID 1433S ADA32 ALBU ALK1 AMYS ANXA1 BPIL1 CAH6 CYTA CYTN CYTS CYTT DEF3 DMBT1 ENOA FABPE FGRL1 HPT IGHA1 IGHA2 IGHG1 IGHG2 IGJ K1C13 K1C14 K1CM K1CP K22O K2C1 K2C4 K2C5 K2C6A K2C6C K2C6E LAC LCN1 LV3B MUC MUC5B PERM PIGR PRDX1 Accession O70456 Q8TC27 P02768 P03973 P04745 P04083 Q8N4F0 P23280 P01040 P01037 P01036 P09228 P59666 Q9UGM3 P06733 Q01469 Q8N441 P00738 P01876 P01877 P01857 P01859 P01591 P13646 P02533 P13646 P08779 Q01546 P04264 P19013 P13647 P02538 P48666 P48668 P01842 P31025 P80748 P01871 Q9HC84 P05164 P01833 Q06830 Protein Name 1433 sigma Disintegrin Albumin Antileukoproteinase Amylase Annexin A1 Bactericidal/permeability-increasing protein-like 1 Carbonic anhydrase 6 Cystatin-A Cystatin-SN Cystatin-S Cystatin-SA Neutrophil defensin 3 Deleted in malignant brain tumors 1 Alpha-enolase Fatty acid-binding protein. Activates complement pathway Immune Response Contribute to the lubricating Microbiocidal activity Binds IgA and IgM at cell surface Involved in redox regulation Table 5. Intracellular thiol proteinase inhibitor. . type II cytoskeletal 4 Keratin. type II cytoskeletal 2 oral Keratin. Cysteine proteinase inhibitors Inhibits papain and ficin Thiol protease inhibitor Antimicrobial activity Candidate tumor suppressor gene Multifunctional enzyme Keratinocyte differentiation Negative effect on cell proliferation Combines with free hemoglobin Immune Response Immune Response Immune Response Immune Response Immune Response Cytoskeleton Cytoskeleton Cytoskeleton Cytoskeleton Contributes to terminal cornification Regulate the activity of kinases Cytoskeleton Protein binding Protein binding Structural molecule activity Cytoskeleton organization Antimicrobial Plays a role in taste reception. The table represents the function of the common proteins associated with the various types of carcinomas. type II cytoskeletal 6C Keratin. Figure 4 Fig.(continuation) 3. Figure 3 illustrates the presence of profilin-1 protein in both the human submandibular gland cell lysates and in whole saliva. Fig. 4.2 Western blot analyses The results of the western blot suggest the presence of profilin-1 in saliva. The table represents the function of the common proteins associated with the various types of carcinomas. Redox activity Iron binding transport proteins Iron binding transport proteins Activity against synthetic substrates Signaling Table 5.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women Gene ID S10A7 S10A8 S10A9 SPLC2 SPRR3 THIO TRFE TRFL TRY ZA2G Accession P31151 P05109 P06702 Q96DR5 Q9UBC9 P10599 P02787 P02788 P07477 P25311 Protein Name S100 A7 S100 A8 S100 A9 Epithelial carcinoma associated protein-2 Small proline-rich protein 3 Thioredoxin Serotransferrin Lactotransferrin precursor Trypsin-1 Zinc alpha-2 glycoprotein Reported Function Interacts with RANBP9 Calcium-binding protein Calcium-binding protein Lipid binding protein 263 Protein of keratinocytes. 3. . The western blot indicates the presence of the 15 kDa profilin-1 protein in human submandibular gland cell lysates and stimulated whole saliva The western blot analyses also revealed the presence of profilin-1 In SKBR3 cell lysates. Also revealed the presence of profilin-1 In SKBR3 HER2/neu receptor positive breast cancer cell lysates. Figure 5 Fig. For example.3 Conclusions It is interesting to note that the salivary protein profile mimics the findings of protein alterations secondary to cancer in serum. produces a humoral response in the salivary glands. proliferation. that in the presence of disease.d. there are alterations in the metabolic. appear to be in equilibrium both intercellularly and extracellularly with each pathway fulfilling the resultant phenotypic process of growth. Another possible explanation is active transport of the proteins of interest. Profilin is a down-regulated protein in the presence of malignancy and it is visualized by the lighter bands associated with malignancy. i. It is also worth noting that the Her2/neu receptor negative band is darker than the Her2/neu receptor positive counterpart suggesting further down-regulation of the profilin-1 protein.264 Biometrics Figure 5 illustrates the presence of profilin-1 in the saliva sampled from healthy. which are strongly implicated in tumor growth and cancer metastases. This figure illustrates the presence and modulation of salivary profilin-1 protein secondary to HER2/neu receptor status. It is plausible that these proteins are secreted into saliva as consequence of localized regulatory function in the oral cavity via signal transduction similar to the proposed explanation of HER-2/neu protein in nipple aspirates. 5. there are salivary protein alterations in the cytoskeleton phenotype. 3. These “loop” mechanisms. and differentiation. in health. that there is an over abundance of protein resulting from the rapid growth of the malignancy which in turn. The investigator has spent 15 years investigating the phenomena and postulates. . benign and malignant tumor patients. growth and signaling pathways all of which provide support for the concept that salivary protein profiles are altered secondary to the presence of cancer. carcinoma. This response results in altered salivary protein concentrations.. tissues and cell lysates that are numerously cited in the literature (Polanski et al. Additionally. 2006). The multi-marker approach is evident as it avoids the pitfalls of a single marker concept as demonstrated by the current shortcomings of the prostate specific antigen marker for prostate cancer. It is obvious from the resultant data that cancer is a very complex disease process involving numerous gene alterations in varying numbers of molecular pathways. This is merely speculation at this time.The Use of Saliva Protein Profiling as a Biometric Tool to Determine the Presence of Carcinoma among Women 265 The results of this preliminary study suggest that there are two panels of biomarkers that need to be assessed. pp. skin. cervix. In: Human Saliva: Clinical Chemistry and Microbiology . Gillson-Longenbaugh Foundation and the Texas Ignition Fund. Tenovuo JO. References Birkhed D & Heintze U (1989). Immervoll H.Volume 1..e. however. (1541-1548). Biomarker profiles in serum and saliva of experimental Sjögren's syndrome: associations with specific autoimmune . It may be possible to develop a “global” profile for the overall detection of cancer while using the unique protein identifiers to determine the site of the tumor. CRC press. potentially could determine the site of the tumor. colon. 91. and pH. brain and kidney at varying stages of development also need to be studied and their resultant profiles added to the catalogue of cancer associated protein profiles. S100A8 and S100A9 you may predict the presence of a malignancy and by using the biomarkers listed in table 1 you. pancreas. (Sept. Pee D. The authors would also like to thank Dr. Delaleu N. There are those biomarkers that are unique to each tumor site i. ISSN 0027-8874. J Natl Cancer Inst . Costantino JP. 18. Gail MH. These results are very preliminary and investigated only the abundant proteins found in saliva. advances in mass spectrometry and microarray technology may make the concept cost effective and logistically feasible. and those that are common to various types of malignancies. Other carcinomas such as lung. buffer capacity. (2530). Anderson S. The finding may prove to be useful. by using the biomarkers S100A7. Acknowledgement The research presented in this chapter was supported by the Avon Breast Cancer Foundation (#07-2007-071). One major question that is essential to answer is. Komen Foundation (KG080928). Theoretically. bladder. breast. Boca Raton. Additionally. This in turn renders the concept of the single biomarker as passé with respect to cancer detection. the peptidome and the fragmetome need to be evaluated to further enhance the profile. etc. William Dubinsky for the LC-MS/MS salivary mass spectrometry analyses and Dr. pp. Benichou J & Wieand HS (1999): Validation studies for models projecting the risk of invasive and total breast cancer incidence. 5. Due to the overwhelming complexity of the malady high-throughput protein detection coupled with protein modeling in health and disease may yield the tools necessary for life saving early cancer detection. editor. Redmond CK. Cornelius J & Jonsson R. Salivary secretion rate. Karen Storthz for her technical expertise concerning the western blot analyses. ISBN 0-8493-6391-8. “Are we identifying the primary site of the tumor’s origin?” This information is essential for rendering “tailored” treatment regimens. 1999). Further research is required to determine the low abundant protein profiles associated with the various carcinomas. The results of this study suggest that it is feasible to employ biometric principles for cancer detection. 4. Koomen JM. Madden TE & David LL (2004).10. Coombes KR. Reid. pp. Wu AJ. Strategies for internal amino acid sequence analysis of proteins separated by polyacrylamide gel electrophoresis. Interaction between heparin and plasma proteins analyzed by crossed immunoelectrophoresis and affinity chromatography. (Oct 2004). 519. 41.266 Biometrics manifestations. "De novo" sequencing of peptides recovered from in-gel digested proteins by nanoelectrospray tandem mass spectrometry. 7. 2011 Feb 28. (117-127). 83. Xiao L. Moura S. 18. Negreiros A. Two dimensional liquid chromatography study of the human whole saliva proteome. (2005). ISSN 0021-9673.13(1):205. Lima D. Silva F & Costa L. ISSN 1542-6416. different-aged subjects. Vissink A.. 1158-1162. Lu H. J Chroma. A comparison of whole mouth resting and stimulated salivary measurements. 5. ISSN (electronic) 1478-6362. 1. Sousa J. Arthritis Research & Therapy 10. (Sept-Oct. Rapid Com Mass Spec. (972-981). (Oct 1990). Mol Biotech. Raingeard I. Baum BJ & Ship JA (1993). Moritz RL & Simpson RJ (1990). J Prot Res. pp. 3. 3. pp. (Feb 2006). ISSN 1177-2719.(Feb 2008). What have we learned from clinical trials in primary Sjögren's syndrome about pathogenesis? Arthritis Res Ther. Rustvold DL. ISSN 0734-0664. ISSN 1535-3893. 127. 413-417. J Dent Res. (Feb 1983) pp. Sommerer N. ISSN 0580-7247. Shevchenko AV. 2. Liu TC. 3. Abbruzzese J. 20. Lauten JD. Li D. Jiang Z. Direct tandem mass spectrometry reveals limitations in protein profiling experiments for plasma biomarker recovery. (Sept 2006). Chernushevic I. Abbruzzese J & Kobayashi R (2005). Gerodontology. ISSN 0009-8981. Bootsma H. Wilm M & Mann M (2002). GE. J Prot Res. Christensen C (1982). Liu Z. pp. Gu S. J Geron. Diagnostic protein discovery using proteolytic peptide targeting and identification. 24. Bradbury E. (2007). pp. 21. 9. pp. (Sept. Bringer J. pp. 10. . Quality collection: the phlebotomist’s role in pre-analytical errors. (Sept 2006). (M219-224). Hu CA & Chen X (2004).1993). Navazesh M. Amit O. 1. Burning mouth syndrome (BMS): sialometric and sialochemical analysis and salivary protein profile. Hirtz C. (199-216). pp. 1. (Oct 1982). Zhao H. pp. (Jan 2002). Salivary protein profiling in type I diabetes using twodimensional electrophoresis and mass spectrometry. Davey MW & Grudzinskas JG (1983). ISSN 1079-5006. Kallenberg CG. Global investigation of p53-induced apoptosis through quantitative proteomic profiling using comparative amino acid-coded tagging. (998-1008). (1017-1023). (107118) ISSN 1073-6083. 1. (30-38). 5. Pan S.M. Ward LD. pp. Abdulahad WH. pp. 61. Fox PC. (2537-2548) ISSN 1097-0231. Clinical Proteomics 2. Shevchenko A. Cross-sectional and longitudinal analyses of stimulated parotid salivary constituents in healthy. ISSN 1535-9476. 3. ISSN 1535-3893. 4. 2007). Li D. A list of candidate cancer biomarkers for targeted proteomics (2006). Med Lab Obs. Biomarker Insights. (Sept. Koomen JM. Rossignol M & and Deville de Périere D (2006). 2004). Atkinson JC. (R22-R36). Kroese FG. (173–176). Teisner B. Wilmarth PA. Chevalier F. (1-48). Mol Cell Proteomics. Clin Chem Acta. Baggerly K & Kobayashi R (2004). pp. ISSN 00220345. (Nov 2004). Ernst DJ & Balance LO (2006). Polanski M & Anderson NL. Riviere MA. Documents Similar To Bio MetricsSkip carouselcarousel previouscarousel nextVoice Verification System for Mobile Devices Based on ALIZE LIA_RALDigital Communications QBBio MetricFinger Vein Extraction And Authentication for Security Purpose131104-ivector-microsoft-wj.pdfV4I206Bio Metricssanjujcs_si_84-88Ma KaleBio Metrics Pptarun_nairBiometrics in Network Security3-Handouts_Lecture_35.pdfDigital Communication VTU unit 1 SeminarFingerprint ReaderOverlapped Fingerprint Separation for Fingerprint AuthenticationPhdPres 1uid seminar presentation.pptxVeer Narmad South Gujarat University, Surat. M.sc. (Information Technology) ProgrammeThe 'New World Order'Biometrics Registrationyagiz09faces3d Password BharatDCTA Study of Cancellable Fingerprint Template Generation Techniques using cryptographyIris Recognition PptDSP CEN352 Ch2 SamplingMore From NicoletaSkip carouselcarousel previouscarousel nextRegulament cadru 2016.pdfLeadership La Elevi Si Tinericalendar bacalaureat 2018.pdfStandardeLucrari2013.pdfGhid de Practica 2015-2016PDI_FINALNorme de Protecţie a Muncii Înstandarde pentru profesia didactica.pdftendinte moderne in formarea cadrelor didactice in moldovaRealizați o Comparație Între Procesel Cognitive Inferioare Și Cele Superioar1Laboratorul Prescolarmvisem.docplan_de_dezvoltate_personala_model.pdf01 Ghid de Elaborare PasPIPPVACCINURILE PREVENTIE SAU BOALA O Noua Patologie Pediatrica Dr Christa Todea Gross Editura Christiana 2012Prajitura tavalitaTort Raw in trei culori.docxChoux a la creme.docxManual ElevPrimul Ajutor Pentru Un CGmona-lisaclasa a XI-a ManualBlackboard 1Fisa Lucru FunctiiPagini WEBCpp TutorialFisa Lucru Functiioferta_2014_2015_siteTort Raw in trei culori.docxFooter MenuBack To TopAboutAbout ScribdPressOur blogJoin our team!Contact UsJoin todayInvite FriendsGiftsLegalTermsPrivacyCopyrightSupportHelp / FAQAccessibilityPurchase helpAdChoicesPublishersSocial MediaCopyright © 2018 Scribd Inc. .Browse Books.Site Directory.Site Language: English中文EspañolالعربيةPortuguês日本語DeutschFrançaisTurkceРусский языкTiếng việtJęzyk polskiBahasa indonesiaSign up to vote on this titleUsefulNot usefulYou're Reading a Free PreviewDownloadClose DialogAre you sure?This action might not be possible to undo. Are you sure you want to continue?CANCELOK
Copyright © 2025 DOKUMEN.SITE Inc.