International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 263-271
The Society of Digital Information and Wireless Communications (SDIWC) 2012 (ISSN: 2305-0012)

Secure Network Communication Based on Text-to-Image Encryption

Ahmad Abusukhon1, Mohamad Talib2, Issa Ottoum3
1 IT Faculty, Computer Network Department, Al-Zaytoonah University of Jordan, Amman, JORDAN
[email protected]
2 Department of Computer Science, University of Botswana, Gaborone, BOTSWANA
[email protected]
3 IT Faculty, Computer Network Department, Al-Zaytoonah University of Jordan, Amman, JORDAN
[email protected]

ABSTRACT

Security becomes an important issue when secure or sensitive information is sent over a network where all computers are connected together. In such a network a computer is recognized by its IP address. Unfortunately, an IP address can be attacked by hackers: one host claims to have the IP address of another host and thus sends packets to a certain machine, causing it to take some sort of action. To overcome this problem, cryptography is used. In a cryptographic application, the data are first encrypted at the source machine using an encryption key, and the encrypted data are then sent to the destination machine. This way the attacker does not have the encryption key required to recover the original data and is therefore unable to do anything with the session. In this paper, we propose a novel method for data encryption based on private-key encryption. We call our method Text-To-Image Encryption (TTIE).

KEYWORDS

Network; Secured Communication; Text-to-Image Encryption; Algorithm; Decryption; Private key; Encoding.

1 INTRODUCTION

Information security is one of the most important issues to be considered when describing computer networks. Many applications on the Internet, for example e-commerce (selling and buying through the Internet), depend on network security. In addition, the success of sending and receiving sensitive data over wireless networks depends on the existence of a secure communication channel (the Virtual Private Network, VPN) [11]. One of the methods used to provide secure communication is cryptography. Cryptography (sometimes referred to as encipherment) is used to convert plain text to an encoded or unreadable form [9]. An encryption method uses what is known as an encryption key to hide the contents of a plain text (make it unintelligible). Without knowing the decryption key it is difficult to determine what the plain text is.

In computer networks, sensitive data are encrypted on the sender side in order to keep them hidden and protected from unauthorized access, and then sent via the network. When the data are received they are decrypted using an algorithm and zero or more encryption keys, as described in Fig. 1. Decryption is the process of converting data from encrypted format back to their original format [3]. Data encryption becomes an important issue when sensitive data are to be sent through a network where unauthorized users may attack the network. Such attacks include IP spoofing, in which intruders create packets with false IP addresses and exploit applications that use authentication based on IP, and packet sniffing, in which hackers read transmitted information. One of the applications attacked by hackers is e-mail. Many companies provide e-mail service, such as Gmail, Hotmail and Yahoo Mail. These companies need to provide the user with a certain data capacity and access speed, as well as a certain level of security. Security is an important issue that we should consider when we choose a Web mail provider [14]. Techniques used to verify a user's identity (i.e. to verify that the user sending a message is who he claims to be) include the digital signature and the digital certificate [5]; these are not the focus of this research. There are some standard methods used in cryptography, such as private-key (also known as symmetric, conventional, or secret-key), public-key (also known as asymmetric), digital signatures, and hash functions [17].
In private-key cryptography, a single key is used for both encryption and decryption. This requires that each individual possess a copy of the key, and the key must be passed over a secure channel to the other individual [15]. Private-key algorithms are very fast and easily implemented in hardware, and are therefore commonly used for bulk data encryption. There are two main types of private-key encryption: stream ciphers and block ciphers [1].

Figure 1 Encryption and decryption methods with a secure channel for key exchange.

In stream ciphers a given text is encrypted one byte or one bit at a time, whereas in block ciphers a given text is divided into chunks which are then encrypted using an encryption algorithm. Examples of stream ciphers are RC4 and one-time-pad ciphers. Examples of block ciphers are DES and AES [15]. Data encryption is performed either serially or in parallel; parallel encryption is used to speed up cryptographic transformations. In block-cipher algorithms such as DES, some modes of operation execute serially, like CBC and CFB, while others execute in parallel, like ECB and OFB [10]. Parallel encryption is not the focus of this research. In this research we focus on stream ciphers rather than block ciphers.

The main components of symmetric encryption are the plaintext, the encryption algorithm, the secret key, the cipher text and the decryption algorithm. The plaintext is the text before applying the encryption algorithm; it is one of the inputs to the encryption algorithm. The encryption algorithm is the algorithm used to transform the data from plaintext to cipher text.
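As an illustration of the stream-cipher idea described above, the following sketch XORs each byte of the plaintext with a keystream byte. This is a generic illustration, not the paper's TTIE algorithm; the class and method names are our own, and `java.util.Random` stands in as a placeholder keystream generator (a real stream cipher would use a cryptographically secure one).

```java
import java.util.Random;

public class StreamCipherSketch {
    // Encrypts (or decrypts) data by XOR-ing each byte with a keystream
    // derived from a shared seed. XOR is its own inverse, so the same
    // method performs both encryption and decryption.
    static byte[] xorWithKeystream(byte[] data, long seed) {
        Random keystream = new Random(seed); // placeholder PRNG, not cryptographically secure
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ keystream.nextInt(256));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "Here is a text message".getBytes();
        byte[] cipher = xorWithKeystream(plain, 42L);
        byte[] back = xorWithKeystream(cipher, 42L);
        System.out.println(new String(back)); // round-trips to the original text
    }
}
```

Both sides must share the seed (the secret key) over a secure channel, exactly as Fig. 1 shows.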
The secret key is a value independent of the encryption algorithm and of the plaintext, and it is one of the inputs to the encryption algorithm. The cipher text is the scrambled text produced as output. The decryption algorithm is the encryption algorithm run in reverse [16, 3, 14]. Public-key encryption uses two distinct but mathematically related keys: a public key and a private key. The public key is the non-secret key that is available to anyone you choose (it is often made available through a digital certificate). The private key is kept in a secure location and used only by its owner. When data are sent, they are protected with secret-key encryption, and the secret key itself is encrypted with the public key. The encrypted secret key is then transmitted to the recipient along with the encrypted data. The recipient uses the private key to decrypt the secret key, and the secret key is then used to decrypt the message itself. This way the data can be sent over insecure communication channels [16]. Examples of public-key encryption are Pretty Good Privacy (PGP) and RSA. PGP is one of the most widely used public-key encryption methods. RSA [12] is based on the product of two very large prime numbers (greater than 10^100); the idea behind the RSA algorithm is that it is difficult to determine the prime factors of such large numbers. There are other algorithms used to create public keys, such as ElGamal and Rabin, but these algorithms are not as common as RSA [9]. In this paper, we propose a new data encryption algorithm based on the symmetric encryption technique: we propose to encrypt a given text into an image.

2 RELATED WORK

Bh. P., et al. [2] proposed encoding and decoding a message in the implementation of Elliptic Curve Cryptography, a public-key cryptography, using Koblitz's method [7, 8]. In their work, each point on the curve represents one character in the text message.
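The RSA idea sketched above can be illustrated with a textbook toy example. The primes 61 and 53 and exponent 17 are the standard small-number demonstration values, not ones from this paper; real RSA uses primes of hundreds of digits, which is precisely what makes factoring the modulus hard.

```java
import java.math.BigInteger;

public class RsaToy {
    // Toy RSA with tiny primes, for illustration only.
    static final BigInteger p = BigInteger.valueOf(61);
    static final BigInteger q = BigInteger.valueOf(53);
    static final BigInteger n = p.multiply(q);                            // 3233, the public modulus
    static final BigInteger phi = p.subtract(BigInteger.ONE)
                                   .multiply(q.subtract(BigInteger.ONE)); // 3120
    static final BigInteger e = BigInteger.valueOf(17);                   // public exponent, coprime to phi
    static final BigInteger d = e.modInverse(phi);                        // private exponent (2753)

    static BigInteger encrypt(BigInteger m) { return m.modPow(e, n); }
    static BigInteger decrypt(BigInteger c) { return c.modPow(d, n); }

    public static void main(String[] args) {
        BigInteger m = BigInteger.valueOf(65);
        System.out.println(encrypt(m));          // prints 2790
        System.out.println(decrypt(encrypt(m))); // prints 65
    }
}
```

Anyone knowing (e, n) can encrypt, but decrypting requires d, which in turn requires knowing the factors p and q of n.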
When the message is parsed, each character is encoded by its ASCII code, and the ASCII value is then encoded to one point on the curve, and so on. Our work differs from theirs: they used a public-key technique whereas we use a private-key technique, and they encoded each character by its ASCII value whereas we encode each character by one pixel (three integer values: R for Red, G for Green and B for Blue). Singh and Gilhorta [15] proposed encrypting a word of text to a floating-point number in the range from 0 to 1. The floating-point number is converted into a binary number, and a one-time key is then used to encrypt this binary number. In this paper, we encode each character by one pixel (three integer values R, G and B). Kiran et al. [6] proposed a new method for data encryption. In their method the original text (plain text) is arranged into a two-directional circular queue in a matrix A of a given size, say m x n. In their work data encryption relies on matrix disordering: transformation operations are performed on the rows or the columns of matrix A a number of times. They proposed three types of transformation operations on A, encoded as follows: 0 for circular left shift, 1 for circular right shift, and 2 for reverse. The matrix disordering is carried out by generating a positive random number R, which is converted to a binary number. The decision whether to perform a row or a column transformation is based on the value of the individual bits in the binary number: if the bit is 0 a row transformation is performed, otherwise (if the bit is 1) a column transformation is performed.
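The three transformation operations of Kiran et al.'s scheme (0 = circular left shift, 1 = circular right shift, 2 = reverse) can be sketched as follows. This is an illustrative single-row version with our own class and method names, not the authors' implementation; their full algorithm drives these operations with random numbers and records each step as a sub-key.

```java
import java.util.Arrays;

public class RowTransform {
    // Applies one of the three operations to an array:
    // 0 = circular left shift, 1 = circular right shift, 2 = reverse.
    static int[] apply(int[] row, int op) {
        int n = row.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            switch (op) {
                case 0:  out[i] = row[(i + 1) % n];     break; // left shift by one
                case 1:  out[i] = row[(i - 1 + n) % n]; break; // right shift by one
                default: out[i] = row[n - 1 - i];       break; // reverse
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] row = {1, 2, 3, 4};
        System.out.println(Arrays.toString(apply(row, 0))); // [2, 3, 4, 1]
        System.out.println(Arrays.toString(apply(row, 1))); // [4, 1, 2, 3]
        System.out.println(Arrays.toString(apply(row, 2))); // [4, 3, 2, 1]
    }
}
```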
To determine which transformation operation should be carried out, another random number is generated and divided by 3. The remainder of the division, 0, 1, or 2, identifies the transformation operation. In the case of a row transformation, two distinct rows are selected randomly by generating two distinct random numbers, say R1 and R2. Another two distinct random numbers, c1 and c2, are generated to represent two distinct columns; these determine the range of rows over which the transformation is performed. After the completion of each transformation a sub-key is generated and stored in a key file, which is then sent to the receiver to be used as the decryption key. The sub-key format is (T, Op, R1, R2, Min, Max) where:

T: the transformation applied, either row or column.
Op: the operation type coded as 0, 1, or 2, i.e. shift array contents left, shift array contents right, or reverse array contents.
R1 and R2: two random rows or columns.
Min, Max: minimum and maximum values of the range for the two selected R1 and R2.

3 OUR ALGORITHM

Here we describe the main features of our proposed algorithm, TTIE. The algorithm includes two main phases, namely the TTIE phase (where our contribution lies) and the ISE (Image-Shuffle Encryption) phase. In the TTIE phase the plain text is transformed (encrypted) into an image: the plain text is concatenated into one string, which is stored in an array of characters, say C. For each character in C, one pixel of the resulting image is generated. Each pixel consists of three integers created randomly in advance, before the transformation (encryption) begins (see Fig. 3-A, key 1). Each of the three integer values represents one color.
The color value is in the range from 0 to 255. The result of this phase is a matrix, say M, in which each three contiguous columns in a given row represent one character of the original text (plain text). This is done in order to make it difficult for hackers to guess what the plain text is. To the best of our knowledge, no previous work has attempted transforming a text file into an image.

The second phase is the ISE phase. The work in this phase is based on previous work carried out by Kiran et al. [6]. In the ISE phase the matrix M is shuffled a number of times. The shuffle process includes row swapping and column swapping: in row swapping, two rows are selected randomly and then swapped, and in column swapping two columns are selected randomly and then swapped. This matrix disordering makes it difficult for hackers to guess the original order of the matrix M. The shuffle key (key 2) is shown in Fig. 3-B. These two phases (the TTIE and the ISE) are carried out on the sender machine (in this paper, the server machine) as described in Fig. 2: on the server, key 1 is generated randomly (three random numbers per character) and used to encrypt the plaintext into a pixel matrix (RGB); key 2 is then used to shuffle the matrix, swapping random columns/rows with other random columns/rows; the pixels are stored in an image "img" of type PNG, which is sent to the client. On the client side, the pixels are read from the image "img", the matrix is re-shuffled to produce the original one, and the pixels are decoded back to the plaintext.

Figure 2 The main steps of the Text-to-Image Encryption (TTIE) algorithm.

The encrypted message is then sent to the client machine, where the message is decrypted using key 2 and key 1, respectively.

4 OUR EXPERIMENT

Java NetBeans is used as a vehicle to carry out our experiments. We built the client and server programs on different machines and then tested sending and receiving data on both sides. We use the following text message in our experiments:

"encryption is the conversion of data into a form called a cipher text that cannot be easily understood by unauthorized people. decryption is the process of converting encrypted data back into its original form so it can be understood. The use of encryption decryption is as old as the art of communication in wartime. a cipher often incorrectly called a code can be employed to keep the enemy from obtaining the contents of transmissions. technically a code is a means of representing a signal without the intent of keeping it secret. examples are morse code and ascii simple ciphers include the substitution of letters for numbers the rotation of letters in the alphabet and the scrambling of voice signals by inverting the sideband frequencies". [13]

Fig. 3 shows part of the generated keys, namely Key 1 and Key 2. Fig. 3 (A) shows the format of Key 1. Each value is delimited by the # symbol. The first three values (0, 5, 5) represent one pixel in the resulting image: in this pixel, R (the Red color value) = 0, G (the Green color value) = 5, and B (the Blue color value) = 5. In order to guarantee that distinct letters have unique colors, i.e. unique RGB values, we create 26 different ranges, one for each letter of the alphabet. These ranges are disjoint subsets of the main range from 0 to 255.
For example, the letter A may be represented by RGB values in the range from 0 to 9, the letter B in the range from 10 to 19, and so on. The pixel (0, 5, 5) thus represents the letter A, the next three values (12, 13, 17) are another pixel representing the letter B, and so on.

0#5#5#12#13#17#20#25#25#30#32#32#37#41#37#47#52#53#55#56#60#68#69#68#78#74#79#88#82#86#9
(A) Part of Key 1

5736834348:644:34:3641834:868:4348:644,34:364,438:1643,34::6413:316:33::6:4:38:364:138136::8313463:
(B) Part of Key 2

Figure 3 The format of Key 1 and Key 2

Figure 4 Cipher text – the output of Text-to-Image Encryption

Fig. 3 (B) shows the format of Key 2. Each two contiguous values represent two columns in the matrix M. The first pair in Key 2 is 375:364, which means that column number 375 is swapped with column number 364, and so on. Fig. 4 shows the cipher text (the text after it is encrypted as an image). The image in Fig. 4 is zoomed out many times to make it clear. In this image the pixels are created randomly and thus do not form a recognizable shape such as a tree, a fish or a mobile phone. The image shown in Fig. 4 is sent to the client, and on the client side we decrypt the cipher text shown in Fig. 4 to finally obtain the original text message (i.e. the plain text).

5 ANALYSIS

In our algorithm each letter is represented by a random pixel, i.e., three random values R, G and B. To attack the data, hackers need to guess the following:

1. That each three contiguous values represent one letter. Since we send the data as integer values, it is hard to guess that each three contiguous values represent one letter.
2. If a hacker is able to guess point 1, he still needs to guess which random numbers represent the letters A, B, C, etc. In other words, a hacker needs to guess the value of key 1 (Fig. 3).
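The per-letter range scheme described above can be sketched as follows. This is our own illustrative code, not the paper's implementation: we use a band width of 256 / 26 = 9 values per letter so that all 26 disjoint ranges fit within 0–255, which differs slightly from the 0–9 / 10–19 widths used in the example in the text.

```java
import java.util.Random;

public class TextToImageKey {
    // Each letter a..z owns a disjoint band of values; every colour
    // component of that letter's pixel is drawn from its band, so
    // distinct letters can never share RGB values.
    static final int WIDTH = 256 / 26; // 9 values per letter band

    static int[] pixelFor(char letter, Random rng) {
        int base = (letter - 'a') * WIDTH;
        return new int[] {
            base + rng.nextInt(WIDTH), // R
            base + rng.nextInt(WIDTH), // G
            base + rng.nextInt(WIDTH)  // B
        };
    }

    // Decryption: any component value identifies the band, hence the letter.
    static char letterFor(int[] pixel) {
        return (char) ('a' + pixel[0] / WIDTH);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (char ch : "abcd".toCharArray()) {
            int[] px = pixelFor(ch, rng);
            System.out.println(ch + " -> (" + px[0] + "," + px[1] + "," + px[2] + ")");
        }
    }
}
```

Because the component values within a band are drawn at random, the same letter maps to different pixels on different runs, yet the receiver holding the band layout can always decode it.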
Note that guessing the value of key 1 is difficult since we shuffle (scramble) the matrix using key 2 (key 2 is based on the algorithm described in [6]). For example, suppose that the message we want to send is "abcd". Using key 1 (Fig. 3 (A)) the random numbers generated for "a", "b", "c" and "d" are (0,5,5), (12,13,17), (20,25,25), and (30,32,32) respectively. The matrix before shuffling is described in Table 1. Table 2 describes the matrix after shuffling (a simple swap operation in which column 1 is swapped with column 2).

Table 1 Pixels before shuffling: each three contiguous integers in a row represent one pixel, i.e. one letter.

Letter  R-value  G-value  B-value
A       0        5        5
B       12       13       17
C       20       25       25
D       30       32       32

Table 2 Pixels after column 1 is swapped with column 2

Letter  R-value  G-value  B-value
?       5        0        5
?       13       12       17
?       25       20       25
?       32       30       32

Using statistical analysis, hackers may guess the letters from Table 1. However, it is very difficult for hackers to guess the letters from Table 2 because the order of the RGB values is changed. In other words, each three contiguous RGB values in Table 1 which represent one letter are now distributed randomly in Table 2, which makes it difficult to guess that letter even if hackers use statistical analysis (a statistical breakdown of byte patterns, such as the number of times any particular value appears in the encrypted output, would quickly reveal whether any potential patterns exist). Similarly, it is hard for "letter A follows letter B" analysis to decrypt the cipher text.
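The swap shown in Tables 1 and 2 can be reproduced in a few lines. This is a minimal sketch of the ISE column swap with our own naming; the actual ISE phase chooses the two columns at random and records the pair in key 2.

```java
import java.util.Arrays;

public class ColumnSwap {
    // Swaps two columns of the pixel matrix in place, as done in the
    // ISE shuffle phase.
    static void swapColumns(int[][] m, int c1, int c2) {
        for (int[] row : m) {
            int tmp = row[c1];
            row[c1] = row[c2];
            row[c2] = tmp;
        }
    }

    public static void main(String[] args) {
        int[][] m = { {0, 5, 5}, {12, 13, 17}, {20, 25, 25}, {30, 32, 32} }; // Table 1
        swapColumns(m, 0, 1);
        System.out.println(Arrays.deepToString(m));
        // [[5, 0, 5], [13, 12, 17], [25, 20, 25], [32, 30, 32]]  (Table 2)
    }
}
```

Undoing the shuffle on the receiver simply replays the recorded swaps in reverse order.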
With a simple calculation, the number of possible permutations to encrypt 26 letters is

((256)^3)^26    (1)

Since each pixel consists of three values and each of these values is in the range from 0 to 255, choosing three values has (256)^3 permutations. We have 26 letters, and thus the number of permutations for 26 letters is ((256)^3)^26, which is approximately 6.96 x 10^187. The individual keys, key 1 and key 2, are generated each time a new message is sent. This is done in order to avoid regularity in the resulting cipher text.

6 CONCLUSION AND FUTURE WORK

In this paper, we add another level of data security on top of the data security system proposed by Kiran et al. [6]. In our method of encryption we first encrypt the text into an image (a matrix of pixels); then, based on the work done by Kiran et al. [6], we scramble the matrix into a new one, making it more difficult for hackers to guess the original text message. Our algorithm is suitable for text encryption in a network system as well as on individual offline machines. It is also useful for e-mail security, since all messages stored in the mailbox are displayed as images; even if someone leaves the e-mail page open, it is difficult for others to guess the meaning (the original text) of these images. In future work, we propose to investigate dividing the text into blocks and then transforming each block into an image, creating an individual key for each block. This will make it difficult for hackers to use a statistical approach to guess the color of each letter, since different colors will be assigned to the same letter when it appears in different blocks. In addition, we will investigate the efficiency of our proposed algorithm (the TTIE) on large-scale data collections (multiple gigabytes).
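Equation (1) can be evaluated exactly with arbitrary-precision arithmetic; the following check confirms the magnitude of the key space.

```java
import java.math.BigInteger;

public class KeySpace {
    public static void main(String[] args) {
        // (256^3)^26 = 256^78: one of 256^3 possible pixels per letter,
        // chosen independently for each of the 26 letters.
        BigInteger perms = BigInteger.valueOf(256).pow(3).pow(26);
        System.out.println(perms.toString().length()); // number of decimal digits: 188
    }
}
```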
ACKNOWLEDGMENT

I would like to acknowledge and extend my heartfelt gratitude to Al-Zaytoonah University for their financial support in carrying out this work successfully.

REFERENCES

[1] Bellare, M., Kilian, J., and Rogaway, P.: The security of cipher block chaining. In Proceedings of the Conference on Advances in Cryptology (CRYPTO '94). Lecture Notes in Computer Science, vol. 839 (1994).
[2] Bh, P., Chandravathi, D., Roja, P.: Encoding and decoding of a message in the implementation of Elliptic Curve cryptography using Koblitz's method. International Journal of Computer Science and Engineering, 2(5) (2010).
[3] Chan, A.: A security framework for privacy-preserving data aggregation in wireless sensor networks. ACM Transactions on Sensor Networks 7(4) (2011).
[4] Chomsiri, T.: A comparative study of security level of Hotmail, Gmail and Yahoo Mail by using session hijacking hacking test. International Journal of Computer Science and Network Security (IJCSNS), 8(5) (2008).
[5] Goldwasser, S., Micali, S., Rivest, R.L.: A digital signature scheme secure against adaptive chosen-message attacks. SIAM Journal on Computing 17(2), pp. 281-308 (1988).
[6] Kiran Kumar, M., Mukthyar Azam, S., and Rasool, S.: Efficient digital encryption algorithm based on matrix scrambling technique. International Journal of Network Security and its Applications (IJNSA), 2(4) (2010).
[7] Koblitz, N.: Elliptic Curve cryptosystems. Mathematics of Computation, 48, pp. 203-209 (1987).
[8] Koblitz, N.: A Course in Number Theory and Cryptography. 2nd edition. Springer-Verlag (1994).
[9] Lakhtaria, K.: Protecting computer network with encryption technique: A study. International Journal of u- and e-Service, Science and Technology 4(2) (2011).
[10] Pieprzyk, J. and Pointcheval, D.: Parallel authentication and public-key encryption. The Eighth Australasian Conference on Information Security and Privacy (ACISP '03), Wollongong, Australia. R. Safavi-Naini, Ed. Springer-Verlag, LNCS (2003).
[11] Ramaraj, E., and Karthikeyan, S.: A new type of network security protocol using hybrid encryption in virtual private networking. Journal of Computer Science 2(9) (2006).
[12] Rivest, R.L., Shamir, A., and Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2) (1978).
[13] SearchSecurity, definition: Encryption [online]. Available at: http://searchsecurity.techtarget.com/definition/encryption. Accessed on 13-06-2012.
[14] Shannon, C. E.: Communication theory of secrecy systems. Bell System Technical Journal (1949).
[15] Singh, A., Gilhorta, R.: Data security using private key encryption system based on arithmetic coding. International Journal of Network Security and its Applications (IJNSA), 3(3) (2011).
[16] Stallings, W.: Cryptography and Network Security: Principles and Practices, 4th edition. Prentice Hall. [online] Available at: http://www.filecrop.com/cryptography-and-network-security-4th-edition.html. Accessed on 1-Oct-2011.
[17] Zaidan, B., Zaidan, A., Al-Frajat, A., Jalab, H.: On the differences between hiding information and cryptography techniques: An overview. Journal of Applied Sciences 10(15) (2010).

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 272-279
The Society of Digital Information and Wireless Communications (SDIWC) 2012 (ISSN: 2305-0012)

An Analysis of Base Station Location Accuracy within Mobile-Cellular Networks

Liam Smit, Adrie Stander and Jacques Ophoff
Department of Information Systems, University of Cape Town, South Africa
[email protected], {adrie.stander, jacques.ophoff}@uct.ac.za

Abstract—An important feature within a mobile-cellular network is that the location of a cellphone can be determined. As long as the cellphone is powered on, its location can always be traced to at least the cell from which it is receiving, or last received, signal from the cellular network. Such network-based methods of estimating the location of a cellphone are useful in cases where the cellphone user is unable or unwilling to reveal his or her location, and have practical value in digital forensic investigations. This study investigates the accuracy of using mobile-cellular network base station information for estimating the location of cellphones. Through quantitative analysis of mobile-cellular network base station data, large variations between the best and worst accuracy of recorded location information are exposed. Thus, depending on the requirements, base station locations may or may not be accurate enough for a particular application.

Index Terms—Mobile-cellular network, Base station, Cellphone, Location, Information accuracy

I. INTRODUCTION

It is well known that the location of a cellphone, and thus the location of its user, can be determined with a certain degree of accuracy. This information can be used to offer various location-based services and creates the opportunity to build new information services that can be useful to both cellphone users and companies. In addition, location information can be used in other scenarios, such as providing law enforcement agencies with tracking data [1]. One example is that of a murder suspect being found by police after inserting his SIM card into the cellphone of a murder victim [2]. Location information can be used to aid police in tracking movements during investigations and locating suspects. However, it can also be valuable in tracing people for humanitarian reasons, such as search-and-rescue teams defining search areas for locating missing persons. By increasing the accuracy of location information, the process of finding the cellphone and its user can be made faster, simpler, and cheaper. In borderline cases it can be the difference between finding someone in need of medical attention in time, or catching a suspect who would otherwise have escaped.

Many of the most feasible methods for estimating the location of a cellphone within a mobile-cellular network depend on using the locations of network base stations as known reference points from which to calculate the estimated position of the cellphone. The benefit of such network-based approaches is that no modifications to the handset or network are required. However, by using network, handset, or hybrid approaches the accuracy of location information can be improved [1]. This study investigates the accuracy with which the locations of network base stations are known, as inaccuracy can impair the ability of many of the most feasible methods to provide accurate cellphone location estimates. It starts by providing background information on current techniques for determining the location of a cellphone within a mobile-cellular network. Thereafter the research methodology followed in the investigation is discussed, followed by a report of the data collected. Finally, the findings are presented and the implications are highlighted.

II. BACKGROUND

Many handset and network techniques for determining location exist. The most widely known, using the internal hardware of the cellphone, is satellite positioning using GPS, but WiFi, Bluetooth, and augmented sensor networks can also be employed [3], [4], [5]. The accuracy of these techniques can vary depending on the technology, line-of-sight, and sensor network coverage [6]. An improvement is to use such hardware in combination with mobile-cellular network information, as in the case of Assisted GPS (A-GPS), which uses network resources in the case of poor signal reception. In addition, new algorithms have greatly improved the accuracy and efficiency with which a cellphone can calculate its position [7], [8]. However, major obstacles remain, including high energy usage and the non-availability of features in older cellphones. Thus, using location methods based primarily on mobile-cellular network information is widespread.

Global System for Mobile Communications (GSM) networks were not originally designed to calculate locations for the cellphones which access and make use of the network. Many methods have been proposed and developed to be retrofitted to existing networks [9]. There are a range of accuracies and costs associated with the various methods. The following are the most feasible methods, in order of increasing potential accuracy.

• Cell identification (Cell ID) is the simplest location estimation method available, but also the least accurate. The estimated area is at best a wedge-shaped area, comprising roughly a third of the cell (for three-sectored sites), but can include the entire circular area for sites using omnidirectional antennas in low-density single-sector cells [10].
• Round Trip Time (RTT) is merely a measure of distance from the base station, calculated from the time taken by a radio signal to travel from the base station to the cellphone and back. It provides a drastic reduction in the estimated location area compared to the Cell ID method for the same site.
• Cell ID and RTT combines the aforementioned methods to provide an estimated location for the cellphone where these areas overlap [11].
• Observed Time Difference of Arrival (OTDOA) uses hyperbolic arcs from three (or more) base stations to estimate the location of a cellphone. These arcs are determined by the distance that the radio signals travel in the measured time difference [12].
• Angle of Arrival (AOA) is a seemingly practical solution due to its straightforward method of calculating an estimated location from the intersection of the bearings to the cellphone provided by each base station. In practice this method requires expensive antenna arrays, which limits its feasibility despite its potential for high accuracy [10].

It is important to bear in mind that all of the above methods estimate the location of the cellphone, and thus its user, relative to the location of the base station. Next follows a discussion of factors impacting accuracy and ways of negating these factors.

A. Factors that negatively impact accuracy

There are a number of well-recognized challenges to accurately determining the location of cellphones. In addition to degrading accuracy, these challenges can also increase the cost of estimating location. They include non-line-of-sight and multi-path propagation of radio waves, the near-far effect in Code Division Multiple Access (CDMA) based third-generation networks [12], base station density (or the lack thereof) and the accuracy of base station locations [13], optimisations for network capacity, and the unsynchronised nature of Universal Mobile Telecommunications System (UMTS) type networks [14]. There are varying levels of accuracy inherent to the methods and combinations thereof, as well as to the enhancements which have been implemented for a particular method.
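The RTT measure used by several of the methods above reduces to a simple distance calculation: the signal covers the base-station-to-phone distance twice, so d = c × RTT / 2. The following sketch (our own naming, illustrative values) shows the conversion:

```java
public class RttDistance {
    static final double C = 299_792_458.0; // speed of light, m/s

    // Round-trip time covers the distance twice, so halve it.
    static double distanceMetres(double rttSeconds) {
        return C * rttSeconds / 2.0;
    }

    public static void main(String[] args) {
        // A 10-microsecond round trip puts the phone roughly 1499 m
        // from the base station, defining a circular band around it.
        System.out.printf("%.0f m%n", distanceMetres(10e-6));
    }
}
```

This single distance only constrains the phone to a circular band around the site, which is why RTT is combined with Cell ID (or with arcs from several sites) to narrow the estimate.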
In order of increasing accuracy: Cell ID (the whole area of a circular cell), Cell ID and sector (the area of the wedge), Cell ID and RTT (a circular band), Cell ID and the intersection of multiple RTT-determined hyperbolic arcs, and A-GPS (outdoor only, and requiring GPS functionality to be available in the cellphone) [15]. The pilot correlation method (PCM) has been left out of this list as it can be made as accurate as the fidelity of the spacing of the measurement sites allows [16]. Certain base stations with low utilisation, in small towns for example, will not be sectored and there will be only one site. It will be possible to obtain a circular band from RTT calculations, but achieving a more precise location will require adding another measurement technique such as PCM or probabilistic fingerprinting [17].

B. Methods of improving accuracy

To address these challenges there are various solutions and enhancements to location estimation methods that can be employed. Less accurate measurements can be identified and then discarded, re-weighted or adjusted. It is feasible to use more than the minimum number of required data points, to use other methods which are not impacted by inaccurate measurements, and to improve the precision of data by employing high-fidelity measurements and oversampling [15]. It is also possible to employ techniques such as forced soft handover, and to minimise problems by using methods which are not negatively affected by challenges such as non-line-of-sight or multi-path radio wave propagation.

The methods of estimating location can be organised into two groups. The first group consists of those methods which do not depend on base station location and are thus unaffected by the accuracy with which these locations are known. These methods include A-GPS, PCM [16], probabilistic fingerprinting [17], bulk map-matching, and the centroid algorithm [18].
The second group consists of methods which estimate the location of the cellphone and its user relative to the location of the base station, and which are therefore dependent on the accuracy with which these network base station locations are known. These include the Cell ID based methods of Cell ID, Cell ID and RTT, enhanced Cell ID and RTT, as well as cell polygons and RTT [15]. Time of Arrival (TOA) and OTDOA, as well as their enhancements such as cumulative virtual blanking, are affected in a similar fashion, although the impact may be greater as these methods are meant to deliver better accuracy than the Cell ID based methods [14]. While not very widespread in implementation, the methods of AOA and the TOA to Time Difference of Arrival algorithm are also negatively impacted [12].

There are a range of direct and indirect costs that can be attributed to most methods. The greater the work involved in network configuration, the larger the amount of additional hardware, and the more involved the deployment, the higher the cost. Some methods require more human intervention to set up, such as PCM and probabilistic fingerprint matching, whilst others require additional hardware, such as OTDOA, which needs location measurement units. There is also the possibility that certain methods will reduce network capacity. Thus it is vitally important to the network operator that existing infrastructure information (i.e. network base station locations) is as accurate as possible, in order to minimise and manage the further costs of improving accuracy.

In summary, it can be seen that there are many methods of determining the location of a cellphone within a mobile-cellular network. While some of these are not dependent on base station location, the majority of network-based methods are. The accuracy of such data is thus the main focus of this study.
III. RESEARCH METHODOLOGY

A quantitative analysis of base station information in a Southern African mobile-cellular network was performed. The population consisted of all active base stations that form part of the network. Any base station that was operational on the network (including those that had recently gone live or were scheduled to be replaced) was included, due to the possibility that such a base station could participate in estimating the location of cellphones.

To evaluate the accuracy of base station locations, their recorded locations had to be compared to observations of their actual locations. For each base station a GPS location in a valid number format was stored in the network database. The method used to measure a base station's actual observed location, in order to compare it to the stored value, also served to validate the stored value. As this is a time-consuming process it was not performed for all base station sites. Instead, the entire population, consisting of all available recorded base station locations, was sampled. All sub-populations needed to be represented in the sample in order to compare their results for commonalities or differences.

Each of the ten regions which comprise the Southern African network was individually queried to find a list of sites that contain operational base stations. The sampling interval was determined by taking the number of sites and dividing it by the desired minimum sample size of thirty base stations for each region. The sampling interval was then rounded down in order to provide some spare sample base station locations in the event of being unable to locate one or more of the selected base stations and having to select another.
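The interval-based systematic sampling described above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; the region size and site identifiers are invented.

```python
import random

def periodic_sample(site_ids, min_sample=30):
    """Systematic sampling with a random start: the interval is the
    number of sites divided by the desired minimum sample size,
    rounded down so that spare samples are available."""
    interval = max(1, len(site_ids) // min_sample)
    start = random.randrange(interval)
    return site_ids[start::interval]

# e.g. a hypothetical region with 100 operational sites
sites = list(range(100))
sample = periodic_sample(sites)  # interval = 3, so 33 or 34 sites
```

Rounding the interval down (integer division) slightly over-samples each region, which is what provides the spare base stations mentioned above.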
A sampling method of a random starting number followed by periodic sampling was employed. For each sample the latitude and longitude were entered into Google Maps [19] with maximum zoom enabled, together with the 'Satellite' and 'Show labels' options selected. The resulting aerial photograph was examined to identify the presence of a base station. If the base station could be identified then its position was measured using a set procedure:

• The map was centred on the base of the sampled base station using the 'Right-Click' and 'Center map here' function.
• The latitude and longitude of the map centred on the base station were copied via the 'Link' function.

Fig. 1. Aerial view of palm tree
Fig. 2. 'Street View' of palm tree

For each base station that was found by the above process, the following additional information was captured in a spreadsheet to add to the original recorded base station location:

• The base station's location was categorised as serving either: 1) a population centre (city, town, suburb, village, township, commercial or industrial area), or 2) an area outside of a population centre (mountains, road, farms or mines).
• Categorising information was captured for each base station location: 1) technology generation (second and/or third), and 2) equipment vendor.

The GPS coordinates of the recorded and measured locations were then used to calculate the difference in metres between the two using the 'Great Circle' method: 1) employ the law of cosines, 2) convert to radians, and 3) multiply by the radius of the Earth.

If a base station could not be identified from the aerial photograph then the Google Maps Street View function was used to assist with identifying the base station location. If the base station still could not be detected then it was discarded, and the next base station was selected and the identification and measuring process repeated.
Reasons for not being able to identify a base station included unclear satellite photographs, the use of camouflage, and multiple base stations in close proximity to each other. An example of the difficulty in identifying structures is illustrated in Figures 1 and 2, which show an aerial and a 'Street View' of a base station camouflaged as a palm tree.

The first stage of analysis consisted of categorising the collected data into various categories, such as geographic region, technology type, vendor, site owner, and whether or not the base station serves a population centre. This was followed by finding the minimum (best accuracy), maximum (worst accuracy), median, average and standard deviation values for the location accuracy data in each category. Accuracy results for base stations were placed into categories of various intervals of accuracy to better allow for evaluation in terms of the desired levels of accuracy of the base station locations for varying applications. The preceding steps allowed for comparisons between different categories to see if there were differences or similarities in terms of accuracy. By identifying the base station sites for which the recorded location accuracy was far worse and categorising them as outliers, these sites could be revisited in an attempt to find out why they differed so markedly from the rest of the base station locations in the category.

IV. DATA ANALYSIS

Due to the nature of how the network database was constructed, the location data was both complete and in a valid number format. Accuracy was examined for the entire sample as well as for the various categories of base stations.

TABLE I
SUMMARY OF ENTIRE SAMPLE

Interval Spacing  STDV    Worst  Best  AVG    Median  Sample Size
5                 152.38  1634   0.52  77.04  25.38   369
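The interval categorisation and cumulative evaluation described above might be computed as in the following sketch. The error values are invented for illustration and are not the study's data.

```python
def cumulative_percentages(errors_m, thresholds_m):
    """Percentage of base stations whose recorded location lies within
    each accuracy threshold, as used for the cumulative evaluation."""
    n = len(errors_m)
    return {t: 100.0 * sum(1 for e in errors_m if e <= t) / n
            for t in thresholds_m}

# invented error values (metres), not the study's data
errors = [5, 20, 30, 45, 60, 80, 120, 150, 400, 900]
cum = cumulative_percentages(errors, [50, 100, 200])
# cum[50] -> 40.0, cum[100] -> 60.0, cum[200] -> 80.0
```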
The best, worst, average (AVG) and median accuracies, together with the standard deviation (STDV), were calculated and are shown in Table I. By starting with a high-level overview of all sampled base station locations it is possible to gain an understanding of the range of accuracies for the overall sample population. The data is represented in Figure 3 as a cumulative percentage of the base stations for a given level of accuracy. For example, 66.67 percent of base stations have a recorded location that is accurate to within 50 metres of the measured location, while 80 percent of recorded base station locations are accurate to within 100 metres of their measured locations. In a near-ideal situation 100 percent of the base station locations would be accurate to less than two and a half metres (rounded down), with zero deviation remaining the ultimate prize. This would result in a vertical line at zero metres from zero to 100 percent (of base stations), after which it would make a ninety-degree turn to the right, indicating that all base station locations are accurate to within the distances given on the X axis.

Fig. 4. Map of South Africa [20]
Fig. 5. Distribution per region

A. Regions

The base stations that comprise the sample are situated in ten regions. These regions are Central (CEN), Eastern (EAS), KwaZulu Natal (KZN), Lesotho (LES), Limpopo (LIM), Mpumalanga (MPU), as well as Northern Gauteng (NGA), Central Gauteng (SGC) and Southern Gauteng (SGS), and lastly Western (WES). These regions correspond in area to the provinces of South Africa, which are illustrated in Figure 4 for reference. Figure 5 shows the distribution graph for these regions. The KwaZulu Natal region stands out markedly as having the best average and median accuracy values. It also has the lowest worst-accuracy figure which, all told, results in it having the lowest standard deviation.
The Lesotho region has an extremely large worst-accuracy figure, which results in it having the worst average and the highest standard deviation of all the regions. The Central Gauteng region stands out for having the highest median value, despite not having a large worst value. The accuracy of the Central Gauteng region is lower than that of the Lesotho and Southern Gauteng regions for the cumulative most accurate 80 percent of base stations portrayed in Figure 5. It lags the other regions until the 160 metres of accuracy level is reached, where it then begins to rapidly surpass the cumulative percentage of the other regions. In addition to the Central Gauteng and Lesotho regions, the Southern Gauteng and Northern Gauteng regions also lag behind the accuracy of the more accurate regions.

Fig. 3. Entire Sample
Fig. 6. Vendors
Fig. 7. Technology generation

B. Vendors

The sampled base stations can also be categorised by the network equipment vendors that supply them. These base station vendors are Alcatel, Huawei, Motorola and Siemens. As before, the highest (worst) numbers have been marked in bold and the lowest (best) numbers have been italicised in addition to being marked in bold. Looking at Table II it is clear that Siemens offers the best overall accuracy of the vendors and Huawei the worst, with Alcatel and Motorola falling in between these two extremes. However, when analysing Figure 6 it is apparent that Alcatel offers the best accuracy for the most accurate cumulative 85 percent of its measured base stations (up to 110 metres difference between recorded and measured locations). Only when the last 15 percent of the base stations, with accuracies worse than 110 metres, are included is it overtaken by Siemens.
The accuracy of the base station location information for Huawei is confirmed as the lowest of the four vendors, with Motorola assuming a position between it and the two more accurate vendors.

TABLE II
BASE STATION DATA CATEGORISED BY VENDORS

Vendor    STDV    Worst   Best  AVG    Median  Sample Size
Alcatel   141.77  879.32  0.52  68.14  19.98   121
Huawei    133.76  849.44  1.73  86.80  36.59   94
Motorola  170.90  1634    1     77.12  25.27   150
Siemens   62.05   296.55  1.99  47.52  19.35   94

C. Technology generation

When categorising base station locations by technology generation (for example second or third) there are three categories. This is due to the co-location of base stations of different generations on the same sites. It is, however, not a simple 'one for one' correlation, but rather a case where a site which has a second generation base station on it may also have a third generation base station on it, while the converse is not necessarily true. This results in three categories of sites:

1) Those with only second generation base stations (2nd Only).
2) Those with both third and second generation base stations (3rd & 2nd).
3) Those with second generation base stations which will possibly, but not necessarily, also include third generation base stations (2nd (incl. 3rd)).

In comparing the sites in Figure 7 it becomes clear that the locations of those sites that contain third (and second) generation base stations are known with better accuracy than those containing only second generation base stations. Sites that contain second generation base stations, and possibly include third generation base stations, tend to fall in the middle. Unfortunately there is no set of sites containing only third generation base stations, which would have enabled the comparison of sites containing only second generation base stations to sites containing only third generation base stations.

D. Site owner

Base station sites are not necessarily used exclusively by the owner of the sites.
This leads to a situation where some base stations are installed on sites that belong to another network operator. The "Own" network sites constitute the vast majority of the sampled base station locations. As such, it was necessary to combine the sites from the other operators into a single category, "Other", in order to achieve a meaningful sample size. According to Table III, despite the "Own" category containing a very large worst-accuracy figure and being only slightly worse for best accuracy, it offers better overall accuracy as shown by all other metrics. When reviewing Figure 8, for any cumulative percentage the "Own" category has a lower (better) accuracy measure for base station locations than the "Other" category for at least the first cumulative 95 percent of most accurate recorded locations.

E. Population centres

Base station locations contain base stations that either serve centres of population or the areas in between them. Base stations serving population centres have a higher median value than those serving the areas between population centres. However, Figure 9 shows that base stations in population centres only have better accuracy once the last (most inaccurate) 15 percent of the base station locations are included.

F. Outliers

Outliers were defined as the ten percent of the total sample with the worst accuracy. Notably, this category covers all regions except for the KwaZulu Natal region, with only one base station location for the Western region. In Table IV the results for the ten percent least accurate base station locations are presented. Even looking past the 'Worst' accuracy figure, and instead at the average, median or even the 'Best' figures, the outlier locations are clearly very inaccurate.
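The 'worst ten percent' outlier definition above can be illustrated with a short sketch. The error values below are invented for illustration (a few loosely echo figures quoted in the tables) and are not the study's dataset.

```python
def split_outliers(errors_m, fraction=0.10):
    """Separate the worst `fraction` of location errors, mirroring the
    definition of outliers as the least accurate ten percent."""
    ranked = sorted(errors_m)               # best (smallest error) first
    cut = round(len(ranked) * (1 - fraction))
    return ranked[:cut], ranked[cut:]

# illustrative error values in metres, loosely echoing reported figures
errors = [0.52, 12.0, 25.38, 49.0, 77.0, 178.65, 303.92, 410.0, 879.32, 1634.0]
typical, outliers = split_outliers(errors)
# with ten values, the single worst (1634.0) is flagged as an outlier
```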
To gain an understanding of why outliers occur and how their accuracies can be so poor, examples of outliers were selected to illustrate the difference in recorded versus measured accuracy.

TABLE III
BASE STATION DATA CATEGORISED BY SITE OWNER

Site owner  STDV    Worst   Best  AVG     Median  Sample Size
Own         151.05  1634    1     73.07   25.07   318
Other       161.93  879.32  0.52  105.61  49.14   49

TABLE IV
BASE STATION OUTLIERS

Interval Spacing  STDV    Worst  Best    AVG     Median  Sample Size
25                297.67  1634   178.65  410.15  303.92  38

Fig. 8. Site owner
Fig. 9. Population centres

The location of the access road (marked with a red 'A') which is used to reach the base station, instead of the location of the base station itself (marked with six red dots), has been recorded in Figure 10. This Northern Gauteng region base station serves a population centre, but its location is off by 324 metres.

Fig. 10. Watloo Despatch

The Pretoria University building (tagged with a green arrow) in Figure 11 has been recorded instead of the actual location of the base station (indicated by six red dots) on the grounds. This base station serves a population centre in the Northern Gauteng region. It has a difference of 178.5 metres between its recorded and measured locations.

Fig. 11. Pretoria University

Figure 12 shows that while the recorded location (marked by the red 'A') is atop the same mountain in the Central region, it does not follow the track all the way to the base station (circled with red dots). This results in a deviation of 879 metres from the measured location of the base station, which serves a population centre at the foot of the mountain.

Fig. 12. Carnarvon

From the above data several points need to be considered. Firstly, there are large outliers and standard deviations for all vendors, technology generations, site owners, and almost all regions. The KwaZulu Natal region was a notable exception to this pattern, proving by example that good accuracy is entirely possible. Secondly, one category could be cumulatively more accurate for the majority of its (more accurate) base station locations, but when its least accurate base stations were included these were so inaccurate that its overall accuracy dropped below that of another category. Lastly, the extent of the inaccuracy for the outliers was so great that it warranted further assessment. This revealed the ease with which highly inaccurate locations could be recorded.

V. CONCLUSIONS

This paper builds on previous research emphasising the importance of accurately knowing base station locations for cellphone localisation [12], [21]. The nature of this study allows it to be replicated in any country and for any technology type or other category of base station site. The resulting data show that, depending on the requirements, base station locations may or may not be accurate enough for a particular application. This could have serious implications when the data is used for security-related incidents. Base station accuracies ranged from less than one metre to more than 1600 metres. Fifty percent of base stations were accurate to 25 metres (rounded) and 80 percent were accurate to 100 metres (rounded). However, to include 90 percent of base stations it would be necessary to accept base station locations that were off by 180 metres (rounded). The deviation of the least accurate ten percent of base station locations ranged from 179 to 1634 metres. The significance of these inaccuracies and their impact depends on the particular application and its requirement for accuracy.

When investigating outliers a discernible pattern emerged, revealing that the recorded locations were often actually the access point, or the access road to the base station, rather than the base station itself.

Network operators can improve the accuracy of the estimated locations that they are able to provide by increasing the accuracy of recorded base station locations. This can be done by analysing and measuring aerial photographs, or by taking more accurate measurements when performing routine maintenance, upgrades or equipment swap-outs of base stations.

REFERENCES

[1] I.A. Junglas and R.T. Watson, "Location-based services," Commun. ACM, vol. 51, no. 3, pp. 65-69, 2008.
[2] J. Warner, "Murder Suspect Caught," Weekend Argus (Sept. 11), p. 4, 2010.
[3] V. Zeimpekis, G.M. Giaglis, and G. Lekakos, "A Taxonomy of Indoor and Outdoor Positioning Techniques for Mobile Location Services," SIGecom Exch., vol. 3, no. 4, pp. 19-27, 2003.
[4] M. Hazas, J. Scott, and J. Krumm, "Location-Aware Computing Comes of Age," Comput., vol. 37, no. 2, pp. 95-97, 2004.
[5] A. Küpper, Location-Based Services: Fundamentals and Operation. Chichester: Wiley, 2005.
[6] S. von Watzdorf and F. Michahelles, "Accuracy of Positioning Data on Smartphones," in Proc. 3rd Int. Workshop on Location and the Web, Tokyo, Japan, 2010, pp. 1-4.
[7] M. Ibrahim and M. Youssef, "A Hidden Markov Model for Localization Using Low-End GSM Cell Phones," in Proc. 2011 IEEE Int. Conf. on Communications (ICC), Cairo, Egypt, 2011, pp. 1-5.
[8] J. Paek, K. Kim, J.P. Singh, and R. Govindan, "Energy-Efficient Positioning for Smartphones using Cell-ID Sequence Matching," in Proc. 9th Int. Conf. on Mobile Systems, Applications, and Services, Maryland, USA, 2011, pp. 293-306.
[9] W. Buchanan, J. Munoz, R. Manson, and K. Raja, "Analysis and Migration of Location-Finding Methods for GSM and 3G Networks," in Proc. 5th IEE Int. Conf. on 3G Mobile Communication Technologies, Edinburgh, United Kingdom, 2004, pp. 352-358.
[10] J. Borkowski, "Performance of Cell ID+RTT Hybrid Positioning Method for UMTS," M.Sc. thesis, Tampere University of Technology, Finland, 2004.
[11] J. Niemelä and J. Borkowski. (2004) Topology planning considerations for capacity and location techniques in WCDMA radio networks. [Online]. Available: http://www.cs.tut.fi/tlt/RNG/publications/abstracts/topoplanning.shtml
[12] J.J. Caffery and G.L. Stüber, "Overview of Radiolocation in CDMA Cellular Systems," IEEE Commun. Mag., vol. 36, no. 4, pp. 38-45, 1998.
[13] M. Mohr, C. Edwards, and B. McCarthy, "A study of LBS accuracy in the UK and a novel approach to inferring the positioning technology employed," Comput. Commun., vol. 31, no. 6, pp. 1148-1159, 2008.
[14] P.J. Duffett-Smith and M.D. Macnaughtan, "Precise UE Positioning in UMTS using Cumulative Virtual Blanking," in Proc. 3rd Int. Conf. on 3G Mobile Communication Technologies, London, United Kingdom, 2002, pp. 355-359.
[15] J. Borkowski, J. Niemelä, and J. Lempiäinen. (2004) Location Techniques for UMTS Radio Networks. [Online]. Available: http://www.cs.tut.fi/tlt/RNG/publications/abstracts/UMTSlocation.shtml
[16] J. Borkowski and J. Lempiäinen, "Pilot correlation positioning method for urban UMTS networks," in Proc. 11th European Next Generation Wireless and Mobile Communications and Services Conf., Tampere, Finland, 2005, pp. 1-5.
[17] M. Ibrahim and M. Youssef, "CellSense: A Probabilistic RSSI-Based GSM Positioning System," in Proc. 2010 IEEE Global Telecommunications Conf., Cairo, Egypt, 2010, pp. 1-5.
[18] A. Varshavsky, M.Y. Chen, E. de Lara, J. Froehlich, D. Haehnel, J. Hightower, A. LaMarca, F. Potter, T. Sohn, K. Tang, and I. Smith, "Are GSM phones THE solution for localization?" in Proc. 7th IEEE Workshop on Mobile Computing Systems and Applications, Washington, USA, 2006, pp. 20-28.
[19] Google. (2012) Google Maps. [Online]. Available: https://maps.google.com/
[20] Htonl. (2010) Map of South Africa (via Wikimedia Commons). [Online]. Available: http://commons.wikimedia.org/wiki/File:Map_of_South_Africa_with_English_labels.svg
[21] J. Yang, A. Varshavsky, H. Liu, Y. Chen, and M. Gruteser, "Accuracy Characterization of Cell Tower Localization," in Proc. 12th ACM Int. Conf. on Ubiquitous Computing, Copenhagen, Denmark, 2010, pp. 223-226.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 280-288
The Society of Digital Information and Wireless Communications (SDIWC), 2012 (ISSN: 2305-0012)

Technical Security Metrics Model in Compliance with ISO/IEC 27001 Standard

M.P. Azuwa, Rabiah Ahmad, Shahrin Sahib and Solahuddin Shamsuddin
[email protected], {rabiah,shahrin}@utem.edu.my
[email protected]

ABSTRACT

Technical security metrics provide measurements for ensuring the effectiveness of the technical security controls, or technology devices/objects, that are used in protecting information systems. However, a lack of understanding of, and methods for developing, technical security metrics may lead to unachievable security control objectives and inefficient implementation. This paper proposes a model of technical security metrics to measure the effectiveness of network security management. The measurement is based on the security performance of (1) network security controls such as firewalls, Intrusion Detection and Prevention Systems (IDPS), switches, wireless access points and network architecture; and (2) network services such as Hypertext Transfer Protocol Secure (HTTPS) and virtual private networks (VPN). The methodology used is the Plan-Do-Check-Act process model. The proposed technical security metrics provide guidance for organizations in complying with the requirements of the ISO/IEC 27001 Information Security Management System (ISMS) standard. The proposed model should also be able to provide a comprehensive measurement and a guide to using the ISO/IEC 27004 ISMS Measurement standard.

1 INTRODUCTION

The rapid growth and increasing number of cyber attacks have urged organizations to adopt security standards and guidelines. The International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC) have developed the ISO/IEC 27000 series of standards, which have been specifically reserved for information security matters. Through ISO/IEC 27001 Information Security Management System (ISMS) - Requirements [1], an organization may comply and obtain certification, increasing the level of protection for its information and information systems.
KEYWORDS

Information security metrics, technical security metrics model, measurement, vulnerability assessment, ISO/IEC 27001:2005, ISO/IEC 27004:2009, Critical National Information Infrastructure.

Information security metrics can be ineffective tools if organizations do not have data to measure, procedures or processes to follow, indicators with which to make good protection decisions, and people to develop the metrics and report to management. To be useful, measurements of information security effectiveness should be comparable. Comparisons are usually made on the basis of quantifiable measurement of a common characteristic. The main problems in information security metrics development have been identified as: (i) a lack of clarity in defining quantitative, effective security metrics against the security standards and guidelines; and (ii) a lack of methods to guide organizations in choosing security objectives, metrics and measurements for mitigating current cyber attacks [2][3]. Hulitt and Vaughn [4] report a lack of clarity in a standard quantitative metric to describe an information system's level of compliance with the FISMA standard, even when a thorough and repeatable compliance assessment is conducted using the Risk Management Framework (RMF). Bellovin [5] remarks that defining metrics is hard, if not infeasible, because an attacker's effort is often linear even when exponential security work is needed. Those pursuing the development of a security metrics program should think of themselves as pioneers and be prepared to adjust strategies as experience dictates [6]. It is also known that ISO/IEC 27001 provides only generic guidance on developing security objectives and metrics, and still lacks a method to guide organizations [2][3].
1.1 Information Security Metrics

In seeking to understand the meaning of information security metrics, security practitioners and researchers have offered their own definitions of information security metrics and measures, as summarised in Table 1.

Table 1: Definitions of Information Security Metrics and Measures

Stoddard et al. [7]: A metric is a measurement that is compared to a scale or benchmark to produce a meaningful result. Metrics are a key component of risk management.

Savola [8]: A security metric is a quantitative and objective basis for security assurance. It eases the making of business and engineering decisions concerning information security. The metrics are derived from comparing two or more measurements taken over time with a predetermined baseline.

Brotby [9]: The metric is a term used to denote a measure based on a reference, and involves at least two points: the measure and the reference. Security is the protection from, or absence of, danger. Security metrics are categorized by what they measure; the measures include process, performance, outcomes, quality, trends, conformance to standards and probabilities.

Masera et al. [10]: "Security metrics are indicators, and not measurements of security. Security metrics highly depend on the point of reference taken for the measurement, and shouldn't be considered as absolute values with respect to an external scale."

Hallberg et al. [11]: "A security metric contains three main parts: a magnitude, a scale and an interpretation. The security values of systems are measured according to a specified magnitude and related to a scale. The interpretation prescribes the meaning of obtained security values."

Lundholm et al. [12]: The measurement quantifies only a single dimension of the object of measurement and does not hold value (facilitate decision making) in itself. The metric is derived from two or more measurements to demonstrate an important correlation that can aid a decision.
From these definitions, we propose the following definition: information security metrics are a measurement standard for information security controls that can be quantified and reviewed to meet the security objectives. Such metrics facilitate relevant actions for improvement, support decision making, and guide compliance with security standards. Information security measurement is the process of measuring/assessing the effectiveness of information security controls; it can be described by the relevant measurement methods for quantifying data, and its results are comparable and reproducible. Hence, information security measurement is a subset of an information security metric.

1.2 Technical Security Metrics and Measurement

We found that research activities on technical security metrics are very limited. There is also a lack of specific technical security metrics research to measure the technical security controls among the total of 133 security controls in the ISO/IEC 27001 standard. Vaughn et al. [13] define a Technical Target of Assessment (TTOA) as a measure of how much a technical object, system or product is capable of providing assurance in terms of protection, detection and response. According to Stoddard et al. [7], technical security metrics are used to assess technical objects, particularly products or systems [8], against standards; to compare such objects; or to assess the risks inherent in such objects. Additionally, technical security metrics should be able to evaluate the strength of resistance and response to attacks and weaknesses (in terms of threats, vulnerabilities, risks, and the anticipation of losses in the face of attack) [13]. At the same time, they indicate the security readiness with respect to a possible set of attack scenarios [10].
1.3 Effective Measurement Requirement from the ISO/IEC 27001 Standard

Information security measurement is a mandatory requirement in the ISO/IEC 27001 standard, as indicated in several clauses: 4.2.2(d) "Define how to measure the effectiveness of the selected controls or groups of controls and specify how these measurements are to be used to assess control effectiveness to produce comparable and reproducible results"; 4.2.3(c) "Measure the effectiveness of controls to verify that security requirements have been met"; 4.3.1(g) "documented procedures needed by the organization to ensure the effective planning, operation and control of its information security processes and describe how to measure the effectiveness of controls"; 7.2(f) "results from effectiveness measurements"; and 7.3(e) "Improvement to how the effectiveness of controls is being measured". The importance of information security measurement is well defined in these clauses.

2 SECURITY METRICS DEVELOPMENT APPROACH

The development of the technical security metrics model (TSMM) is derived from the following approach:
(1) Base the requirements of technical security controls on the ISO/IEC 27002 ISMS Code of Practice standard [14].
(2) Identify relevant security requirements.
(3) Achieve security performance objectives.
(4) Align to the risk assessment value.
(5) Do not develop an extensive list of technical security metrics; focus instead on the critical security controls that provide high impact to the organizations. According to Lennon [15], "the metrics must be prioritized to ensure that the final set selected for initial implementation facilitates improvement of high priority security control implementation. Based on current priorities, no more than 10 to 20 metrics at a time should be used. This ensures that an IT security metrics program will be manageable."
(6) Align to the risk assessment value.
(7) Ensure ease of measurement.
(8) Provide the process to obtain data/evidence, and the method and formula to assess the security measurement.
(9) Address resistance and response to known and unknown attacks.
(10) Provide threshold values to determine the level of protection.
(11) Provide actions for improvement.
(12) Comply with the ISO/IEC 27001 standard.

3 TECHNICAL SECURITY METRICS MODEL (TSMM)

The development of the TSMM is based on the Plan-Do-Check-Act (PDCA) model and is described in Figure 1.

3.1 PLAN Phase: (Selection of Controls and Definition)

The focus is on the technical security controls extracted from the total of 133 security controls stated in Annex A of the ISO/IEC 27001 standard. We define technical security metrics as a measurement standard to address the performance of security countermeasures within the technical security controls and to fulfill the security requirements. The technical security measures are based on information security performance objectives that can be accomplished by quantifying the implementation, efficiency, and effectiveness of security controls. ISO/IEC 27002 [14] provides best-practice guidance for initiating, implementing or maintaining security controls in the ISMS. The standard notes that "not all of the controls and guidance in this code of practice may be applicable and additional controls and guidelines not included in this standard may be required". Federal Information Processing Standards 200 (FIPS 200) [16] defines technical controls as "the security controls (i.e., safeguards or countermeasures) for an information system that are primarily implemented and executed by the information system through mechanisms contained in the hardware, software, or firmware components of the system". These are the basis of our definition of technical security controls.
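The notion of a technical security metric developed above, one that carries a formula, a data source, a threshold and a corrective action (requirements (8), (10) and (11)), can be captured as a simple record. The following is a minimal sketch; the field names and the example metric values are hypothetical illustrations, not taken from the standard:

```python
from dataclasses import dataclass

@dataclass
class TechnicalSecurityMetric:
    """Hypothetical record capturing requirements (8), (10) and (11):
    a formula, a data source, a threshold, and an improvement action."""
    control_id: str       # ISO/IEC 27001 Annex A control, e.g. "A.12.6.1"
    objective: str        # security performance objective
    data_source: str      # where the evidence comes from (requirement 8)
    formula: str          # how the value is computed (requirement 8)
    threshold_pct: float  # protection-level threshold (requirement 10)
    action: str           # corrective action if below threshold (requirement 11)

    def meets_threshold(self, measured_pct: float) -> bool:
        # A measured value at or above the threshold counts as adequate.
        return measured_pct >= self.threshold_pct

# Hypothetical example metric for patch management.
patching = TechnicalSecurityMetric(
    control_id="A.12.6.1",
    objective="Timely patching of known vulnerabilities",
    data_source="patch management system logs",
    formula="patched systems / total systems * 100",
    threshold_pct=95.0,
    action="escalate unpatched systems to system owners",
)
print(patching.meets_threshold(97.0))  # True
```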
Based on the NIST SP800-53 guidelines [17], the technical security controls comprise:
(1) Access Control (AC, 19 controls)
(2) Audit and Accountability (AU, 14 controls)
(3) Identification and Authentication (IA, 8 controls)
(4) System and Communications Protection (SC, 34 controls)
The total number of technical security controls in the NIST SP800-53 guidelines is therefore seventy-five (75). In Appendix H of [18], the technical security controls are extracted from Table H-2, which maps the security controls in ISO/IEC 27001 (Annex A) to NIST Special Publication 800-53. We extracted and analyzed these technical security controls and found that:
(1) They fall within three (3) main domains of ISO/IEC 27001 (Annex A): A.10 Communications and operations management; A.11 Access control; A.12 Information systems acquisition, development and maintenance.
(2) The initial total of technical security controls is forty-five (45).
(3) Some of the identified technical security controls require only a process or policy implementation and are not related to technical implementation, such as A.11.1.1 Access control policy, A.11.4.1 Policy on use of network services, A.11.5.1 Secure log-on procedures, A.11.6.2 Sensitive system isolation, A.11.7.2 Teleworking, A.12.3.1 Policy on the use of cryptographic controls and A.12.6.1 Control of technical vulnerabilities.
(4) There are relationships with other security controls in the NIST SP800-53 document, including:
• Management controls: Security Assessment and Authorization (CA), Planning (PL), System and Services Acquisition (SA)
• Operational controls: Configuration Management (CM), Maintenance (MA), Media Protection (MP), Physical and Environmental Protection (PE), Personnel Security (PS), System and Information Integrity (SI).
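As a quick sanity check, the four control families and counts listed above can be tallied; this is a trivial sketch using only the family data as quoted from the guidelines:

```python
# Technical security control families and per-family control counts
# from NIST SP800-53, as cited in the text above.
technical_families = {
    "AC": ("Access Control", 19),
    "AU": ("Audit and Accountability", 14),
    "IA": ("Identification and Authentication", 8),
    "SC": ("System and Communications Protection", 34),
}

# Sum the counts across all four families.
total = sum(count for _name, count in technical_families.values())
print(total)  # 75, matching the total stated in the text
```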
Figure 1: Technical Security Metrics Model (TSMM)

The technical security controls should be practical, customized and measured according to the organization's business requirements and environment. A risk management approach is used to identify the relevant security controls: a threat and vulnerability assessment is carried out, and both impact and risk exposure are identified to determine the prioritization of security controls.

Cyber-Risk Index: A cyber-risk index is used to evaluate the vulnerability and threat probabilities related to the success of current and future attacks. The Attack-Vulnerability-Damage (AVD) model [19] and the Common Vulnerability Scoring System (CVSS) Base Metric [20] are used to determine this weighted index. We will extend it to include the criticality or impact of loss to the organization. The CVSS base score is calculated using the information provided by the U.S. National Vulnerability Database (NVD) Common Vulnerability Scoring System Support v2 [21] and other relevant Cyber Emergency Response Team (CERT) advisories and reports.

3.2 DO Phase: (Effective Measurement)

The security requirements describe the actual security functions of the technical security controls in protecting the information systems. Security functions include identification and authentication, access control, configurations/algorithms, architecture and communication. A set of performance objectives is developed for each security requirement.

Vulnerability Assessment (VA) Index: The VA index is derived by conducting a security or vulnerability assessment of the information systems through a simulation assessment, vulnerability scanning or penetration testing.
This is based on the current assessment of potential attacks and is turned into a weighted index using the numeric CVSS base scores: "Low" severity (CVSS base score 0.0-3.9), "Medium" severity (4.0-6.9) and "High" severity (7.0-10.0). The VA index can also be derived from the Vulnerability-Exploits-Attack (VEA-bility) metric [22]. VEA-bility measures the security of a network as influenced by the severity of existing vulnerabilities, the distribution of services, the connectivity of hosts, and the possible attack paths. These factors are modeled as three network dimensions: Vulnerability, Exploitability, and Attackability. The overall VEA-bility score, a numeric value in the range [0, 10], is a function of these three dimensions. At this phase, the data collection must be easily obtainable and the measurements must not be complicated. The measurement should be able to cater for current attacks (through audit reports and evidence of events) as well as future attacks.

3.3 CHECK Phase: (Security Indicators and Corrective Action)

In verifying the effectiveness of controls, we measure how much a control decreases the probability of realization of the described risks. The attributes must be significant in determining the increase or decrease of risk. The expected measure function can be derived from the percentage of successful or failed occurrences, for example: number of patches successfully installed on information systems (> 95%), number of security incidents caused by attacks from the network (< 3%). The determination of the percentage should consider that even though the security controls are implemented, the risk of attacks can still occur. Therefore, the percentage depicts the strength of the existing security controls in mitigating the risks.

Security Indicator Index: If the measure is equal to or below the recommendation, the risk is adequately controlled, which demonstrates the effectiveness of the security controls.
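The CVSS severity banding used for the weighted index above can be sketched as a simple mapping. This is a minimal illustration using the CVSS v2 band boundaries exactly as quoted in the text:

```python
def cvss_severity(base_score: float) -> str:
    """Map a CVSS v2 base score to the severity band used in the VA index:
    Low (0.0-3.9), Medium (4.0-6.9), High (7.0-10.0)."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError("CVSS base score must be in [0.0, 10.0]")
    if base_score <= 3.9:
        return "Low"
    if base_score <= 6.9:
        return "Medium"
    return "High"

print(cvss_severity(5.0))  # Medium
```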
The proposed indicators are the trends of the derived measures, and they must be on the same measurement scale in order to establish that the risk is adequately controlled [23]. This indicator index can also act as a compliance index for the ISO/IEC 27001 standard. An algorithm or calculation combines one or more base and/or derived measures with associated decision criteria, for example: 0-60% Red; 60-90% Yellow; 90-100% Green.

Decision Criteria: Thresholds, targets, or patterns used to determine the need for action or further investigation, or to describe the level of confidence in a given result (for example, Red: intervention is required, and a causation analysis must be conducted to determine the reasons for non-compliance and poor performance; Yellow: the indicator should be watched closely for possible slippage to Red; Green: no action is required).

Corrective actions provide the range of potential changes for improving the efficiency and effectiveness of the security controls. They can be prioritized based on overall risk-mitigation goals and selected based on cost-benefit analysis.

3.4 ACT Phase:

The developed technical security metrics and measurements will be validated by the respective organizations. The metrics must comply with the ISO/IEC 27001 standard requirements. The development of technical security metrics will be based on the information security measurement model in the ISO/IEC 27004 standard. The measurement results should be reported to management to ensure the continuity and improvement of information security in the organization.

4 CONCLUSIONS AND FUTURE WORK

The Malaysian government has recognized the importance of Critical National Information Infrastructure (CNII) organizations protecting their critical information systems. In 2010, the government mandated that their systems be ISO/IEC 27001 ISMS certified within 3 years [24]. ISO/IEC 27001 certification is one of the most widely used corporate best practices for IT security standards, addressing management requirements as well as identifying specific control areas for information security. It provides a comprehensive framework for designing and implementing a risk-based Information Security Management System. The requirements and guidance cover policies and actions that are necessary across the whole range of information security vulnerabilities and threats. By customizing the security requirements from ISO/IEC 27002 and other relevant security standards and guidelines, CNII organizations can implement the necessary security controls in compliance with the ISO/IEC 27001 ISMS standard. The proposed TSMM provides guidance for CNII organizations to measure the effectiveness of network security controls in compliance with the ISO/IEC 27001 standard. The relevant types of information security measurements and metrics are interrelated and worth using in alignment with business risk management. We also intend to explore the usability of the ISO/IEC 27004 standard and to conduct case studies at several CNII organizations.

ACKNOWLEDGMENT

The authors wish to acknowledge and thank the members of the research teams of the Long Term Fundamental Research Grant Scheme (LRGS) number LRGS/TD/2011/UKM/ICT/02/03 for this work. The research scheme is supported by the Ministry of Higher Education (MOHE) under the Malaysian R&D National Funding Agency Programme.

5 REFERENCES
1. International Organization for Standardization and International Electrotechnical Commission, "Information technology - Security techniques - Information security management systems - Requirements," ISO/IEC 27001:2005, 2005.
2. R. Barabanov, S. Kowalski, and L. Yngström, "Information Security Metrics: Research Directions," FOI Swedish Defence Research Agency, 2011.
3. C. Fruehwirth, S. Biffl, M. Tabatabai, and E. Weippl, "Addressing misalignment between information security metrics and business-driven security objectives," in Proceedings of the 6th International Workshop on Security Measurements and Metrics (MetriSec '10), 2010.
4. E. Hulitt and R. B. Vaughn, "Information system security compliance to FISMA standard: A quantitative measure," in 2008 International Multiconference on Computer Science and Information Technology, pp. 799-806, Oct. 2008.
5. S. M. Bellovin, "On the Brittleness of Software and the Infeasibility of Security Metrics," IEEE Security & Privacy Magazine, vol. 4, no. 4, p. 96, Jul. 2006.
6. K. Stouffer, J. Falco, and K. Scarfone, "Guide to Industrial Control Systems (ICS) Security," National Institute of Standards and Technology, NIST Special Publication 800-82, June 2011.
7. M. Stoddard, D. Bodeau, R. Carlson, C. Glantz, Y. Haimes, C. Lian, J. Santos, and J. Shaw, "Process Control System Security Metrics - State of Practice," Institute for Information Infrastructure Protection (I3P), Research Report, August 2005.
8. R. Savola, "Towards a Security Metrics Taxonomy for the Information and Communication Technology Industry," in International Conference on Software Engineering Advances, 2007.
9. W. K. Brotby, Information Security Management Metrics: A Definitive Guide to Effective Security Monitoring and Measurement. Auerbach Publications, 2009.
10. M. Masera and I. N. Fovino, "Security metrics for cyber security assessment and testing," Joint Research Centre of the European Commission, ESCORTS D4, pp. 1-26, August 2010.
11. J. Hallberg, M. Eriksson, H. Granlund, S. Kowalski, K. Lundholm, Y. Monfelt, S. Pilemalm, T. Wätterstam, and L. Yngström, "Controlled Information Security: Results and conclusions from the research project," FOI Swedish Defence Research Agency, pp. 1-42, 2011.
12. K. Lundholm, J. Hallberg, and H. Granlund, "Design and Use of Information Security Metrics," FOI Swedish Defence Research Agency, ISSN 1650-1942, 2011.
13. R. B. Vaughn, Jr., R. Henning, and A. Siraj, "Information Assurance Measures and Metrics - State of Practice and Proposed Taxonomy," in Proceedings of the 36th Hawaii International Conference on System Sciences, 2003.
14. International Organization for Standardization and International Electrotechnical Commission, "Information technology - Security techniques - Code of practice for information security management," ISO/IEC 27002:2005, 2005.
15. E. B. Lennon, M. Swanson, J. Sabato, J. Hash, and L. Graffo, "IT Security Metrics," ITL Bulletin, National Institute of Standards and Technology, August 2003.
16. C. M. Gutierrez and W. Jeffrey, "Federal Information Processing Standards 200: Minimum Security Requirements for Federal Information and Information Systems," National Institute of Standards and Technology, March 2006.
17. Computer Security Division and Information Technology Laboratory, "Recommended Security Controls for Federal Information Systems and Organizations," National Institute of Standards and Technology, NIST Special Publication 800-53, Revision 3, 2010.
18. Computer Security Division and I. T.
Laboratory, "Security and Privacy Controls for Federal Information Systems and Organizations," National Institute of Standards and Technology, NIST Special Publication 800-53, Revision 4, February 2012.
19. T. Fleury, H. Khurana, and V. Welch, "Towards a Taxonomy of Attacks Against Energy Control Systems," in Proceedings of the IFIP International Conference on Critical Infrastructure Protection, 2008.
20. P. Mell, K. Scarfone, and S. Romanosky, "A Complete Guide to the Common Vulnerability Scoring System," Forum of Incident Response and Security Teams (FIRST), pp. 1-23, 2007.
21. "NVD Common Vulnerability Scoring System Support v2," NIST, National Vulnerability Database (NVD), http://nvd.nist.gov/cvss.cfm?version=2.
22. M. Tupper and A. N. Zincir-Heywood, "VEA-bility Security Metric: A Network Security Analysis Tool," in 2008 Third International Conference on Availability, Reliability and Security, pp. 950-957, Mar. 2008.
23. M. H. S. Peláez, "Measuring effectiveness in Information Security Controls," SANS Institute InfoSec Reading Room, http://www.sans.org/reading_room/whitepapers/basics/measuring-effectivenessinformation-security-controls_33398, 2010.
24. J. P. M. Malaysia, "Pelaksanaan Pensijilan MS ISO/IEC 27001:2007 Dalam Sektor Awam" [Implementation of MS ISO/IEC 27001:2007 Certification in the Public Sector], Unit Pemodenan Tadbiran dan Perancangan Pengurusan Malaysia (MAMPU), MAMPU.BPIC, 2010.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 289-296 The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

Trusted Document Signing based on use of biometric (Face) keys

Ahmed B. Elmadani
Department of Computer Science, Faculty of Science, Sebha University, Sebha, Libya
www.sebhau.edu.ly
[email protected]

ABSTRACT

Online secure document exchange, secure bank transactions, and other e-commerce activities need protection as the commercial environment grows; the digital signature (DS) is the principal means of achieving it. This paper introduces a prototype online algorithm for signing and verifying a document digitally. The document's hash value is calculated and protected using keys derived from face characteristics. The paper presents a method of signing documents that differs from traditional systems based on passwords, smartcards or directly accessed biometrics: it utilizes a wirelessly accessed biometric to provide (1) untampered biometrics in digital signatures and (2) proof of a true identity. It also investigates an existing digital signature system based on smart cards. The obtained results are expressed in terms of speed and security enhancement, which are highly in demand in the e-commerce society [2].

Keywords: Digital Signature, Smart card, Hash, True identity, Biometric (face).

1. INTRODUCTION

A mathematical scheme for demonstrating the authenticity of a digital message or document is known as a Digital Signature (DS) [1]. A DS convinces a recipient that a document was created by a known sender. DSs are commonly used for software distribution, financial transactions, and in other cases to avoid forgery and tampering. Digitally signed messages may be anything that can be represented as a bit string; examples include electronic mail, contracts, or a message sent via some other cryptographic protocol [3].

A hash function is used in creating and verifying a DS. A hash function is an algorithm which creates a digital representation of a document. Several hashing algorithms, such as the Secure Hash Algorithm (SHA-1) and Message Digest version 5 (MD5), have been developed for use in e-commerce [4]. SHA-1 produces a 160-bit hash value; it was designed by NIST and the NSA in 1993, revised in 1995, and is the US standard for use with the Digital Signature Algorithm (DSA) signature scheme. SHA-256, SHA-384, and SHA-512 were designed for compatibility with the increased security provided by the Advanced Encryption Standard (AES) cipher [3].

In traditional DS, a smart card is normally used to perform signatures because the cryptographic keys used are stored inside the card [6]. However, most existing DS systems provide signatures without proving true identity [5], because they rely on keys that anyone can use [7]. Therefore, documents have to be signed in a way that proves the true identity, to avoid the many attacks reported in [8][11]. This can be done only by using the user's personal characteristics, such as fingerprint, iris or face [7]. In automation security, faces are more secure than passwords because of the fine differentiation between seemingly identical faces, and because they cannot be forgotten or stolen [9]. Faces are also more secure than fingerprints, because a fingerprint can be spoofed using jelly [10].

A face image, like any digital image, always needs to be enhanced to bring out its features clearly, because of the low quality of images captured with camera devices. Once captured and resized, an image is filtered using one of the known filter methods, such as linear, Wiener, median, or Gaussian filters [9]. The image is filtered several times, using one or more filtering algorithms, until it becomes clear; then information can be constructed [12] and stored for future comparison. The face structure comprises the eyes, the mouth, and their positions, which differ from person to person; together they form a unique characteristic of the face [9].
There are further factors that can make recognition easy or difficult; they are listed with the FERET dataset [15]. Several face recognition algorithms have been introduced in recent years. One of them measures the triangle formed between the eyes and the mouth, but this measure changes over time, so measurements should be taken at age intervals [16]. The first mention of eigenfaces in image processing, a technique that would become the dominant approach in the following years, was made by L. Sirovich and M. Kirby in 1987; it is based on principal component analysis (PCA) [16]. It became the basis for developing many new face algorithms, such as measuring the importance of certain intuitive features and geometric measures between eye distances with length ratios [17].

This work is considered an improvement on the research done by Costas et al. (2008), who performed face-based digital signatures for retrieving video segments using pre-extracted faces for detection and recognition [14]. They use signatures for retrieval, while in this work we use segments of a document to retrieve their signatures for verification. In our proposed DS system, we introduce a scheme that uses keys derived from the user's face to help assure true identity; the face factors mentioned in [15] are outside our concern. In our security analysis, we consider only secure signature-generation systems that use SMCs (smart cards) to protect the DS from the attacks mentioned in [8]. We then improve the use of biometrics to prove true user identity as in [13] and to protect the DS, while avoiding systems based on biometrics that can be tampered with, such as fingerprints [14]. In the proposed system, we construct keys from the face and protect them using Ron's Code version 5 (RC5), a variable-key-size encryption algorithm that is fast and suitable for protecting SMC keys [6]. Of course, other solutions exist; however, they are out of the scope of this paper.

2.
METHODOLOGY AND DISCUSSIONS

The following paragraphs discuss the proposed algorithm, the experiment and the obtained results.

2.1 PROPOSED ALGORITHM

The sequence of DS operations in the proposed system for any given document, shown in Figure 1, is performed in five steps:
• Enhancement: face image adjustment and filtering.
• Feature extraction: information extraction and key construction.
• Document signing: obtaining the document fingerprint.
• Signature protection: document and key protection.
• Signing authenticity: signature matching.

Figure 1. Sequence of processes in the proposed system

2.2 FACE IMAGE ENHANCEMENT

At each sign-point, a fixed webcam is used to capture the face image. Figure 2(A) shows an original image, while (B) presents the histogram of (A), which shows that the information is not well distributed, so the image has to be filtered. (C) shows the face image after removing noise using the fast Fourier transform (FFT) "Wiener filter", which was applied several times to bring out the features. The histogram of the well-distributed information resulting from the filtering process is shown in (D).

Figure 2. Face image enhancement and noise removal using the Wiener filter

The face image is then cropped to 150x150 pixels in an area rich in information, containing the eyes, nose, and mouth, to be used for feature extraction as shown in Figure 3.

Figure 3. Selected face image area that is rich in information

2.3 INFORMATION EXTRACTION

The cropped face image prepared in paragraph 2.2 is used to extract features to calculate the user keys (keys, the sender's key, and keyr, the receiver's key). User keys are calculated using equation (1):

keys = Σ_{i=0}^{n} Σ_{j=0}^{m} x1(i, j);  keyr = Σ_{i=0}^{n} Σ_{j=0}^{m} x2(i, j)    (1)

where x1 and x2 are the sender's and receiver's cropped face images (160x160 pixels), i = 0...n and j = 0...m. The selected area surrounds the eyes, nose and mouth, within a dimension of 200x200 pixels.

The obtained user keys are unique as a result of applying equation (1). Table 1 shows the obtained user keys and confirms that users can be distinguished from each other.

Table 1. Users keys
User No. | User key (keys or keyr)
6  | 581497
7  | 533018
8  | 668856
9  | 627684
18 | 632414

As a requirement of the signing process, the user requires another key (keysr), which is constructed after selecting a target user as the receiver of a document. The key is constructed by combining the two keys (keys and keyr) using equation (2), and is used in the encryption process:

keysr = keys ‖ keyr    (2)

i.e., the concatenation of the n digits of keys (indexed i = 1...n) and the m digits of keyr (indexed j = n+1...n+m). The constructed keysr is used on both sides for encryption and decryption, protecting the outgoing document on the sender's side and the incoming document on the receiver's side. The third column of Table 2 shows the results of applying equation (2) to construct the key keysr used in the encryption process.

Table 2. Constructed key keysr between sender and receiver
Sender's key (keys) | Receiver's key (keyr) | Combined key (keysr)
581497  | 7533018 | 5814977533018
7533018 | 581497  | 7533018581497
668856  | 668856  | 668856668856
627684  | 632414  | 627684632414
632414  | 627684  | 632414627684

2.4 SIGNING PROCESS

A user who intends to sign a document (Doc) first selects or prepares the document, then calculates its fingerprint using equation (3). SHA-1, a stable hash algorithm, was chosen to calculate the document's fingerprint. The sender then invokes the RC5 algorithm with the constructed key (keysr) to encrypt the calculated fingerprint, as shown in equation (4):

Fingerprint = SHA-1(Doc)    (3)
Encrypted-fingerprint = RC5_keysr(Fingerprint)    (4)

The sender prepares a message that contains the document, its encrypted fingerprint and the sender's key, and sends it to the receiver according to equation (5), as shown in Figure 4:

Message = (Encrypted-fingerprint, Doc, keys)    (5)

Figure 4. Sequence of processes in document signing and message encryption

2.5 SECURE SIGNATURES

To avoid unauthorized use of the document and of the keys used in signatures, the RC5 cryptographic algorithm is used to protect them. The message containing the document (Doc), the fingerprint and the keys is prepared by the sender and protected using the formed key (keysr), which only the target receiver can decrypt, according to equation (6):

Encrypted-Msg = RC5_keysr(Message)    (6)

2.6 AUTHENTICITY OF SIGNATURES

The verification process is performed on the receiver's side. Once the receiver receives an encrypted message, he decrypts it using his key (keyr) to obtain the original document, the sender's key (keys), and the encrypted fingerprint. Two processes follow: one calculates a new fingerprint, and the second constructs the combined key (keysr) as discussed in 2.3, which is needed to decrypt the received fingerprint. The receiver calculates the fingerprint of the received document using the SHA-1 algorithm and compares the two fingerprints to see if they match. A document is said to be authenticated, and sent by a trusted person, if the fingerprints are equal. Figure 5 illustrates the verification process: it starts by decrypting the received message with the receiver's key to obtain the sender's key (keys), which is used to construct the combination key (keysr) needed to decrypt the received fingerprint.

Figure 5. Received message decryption and signing authentication process

2.7 TESTING THE ALGORITHM

Two signature points were configured using two connected computers, each equipped with a webcam, to test the proposed algorithm: one for document signing and the second for signature verification.
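The key derivation, combination, signing and verification steps of equations (1)-(6) can be sketched end to end. This is a minimal illustration under stated assumptions: the 2x2 "face images" and the document bytes are hypothetical stand-ins for the real 160x160 crops, and since RC5 is not available in the Python standard library, a SHA-1-derived XOR keystream is used here purely as a stand-in symmetric cipher, not as the paper's actual RC5 step:

```python
import hashlib

def pixel_sum_key(image):
    """Eq. (1): user key as the sum of pixel intensities over the cropped image."""
    return sum(sum(row) for row in image)

def combine_keys(key_s, key_r):
    """Eq. (2): shared key as the digit-wise concatenation of the two user keys."""
    return int(str(key_s) + str(key_r))

def keystream_cipher(key, data):
    """Stand-in for RC5 (not in the standard library): XOR against a
    SHA-1-derived keystream. Symmetric, so the same call also decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha1(str(key).encode() + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Hypothetical 2x2 "cropped face images" standing in for the 160x160 crops.
sender_img = [[120, 130], [140, 150]]
receiver_img = [[100, 110], [105, 115]]

key_s = pixel_sum_key(sender_img)      # sender's key (Eq. 1)
key_r = pixel_sum_key(receiver_img)    # receiver's key (Eq. 1)
key_sr = combine_keys(key_s, key_r)    # shared signing key (Eq. 2)

doc = b"sample document"
fingerprint = hashlib.sha1(doc).digest()        # Eq. (3): 160-bit fingerprint
enc_fp = keystream_cipher(key_sr, fingerprint)  # Eq. (4): protect the fingerprint

# Receiver side (Eqs. (5)-(6) simplified): rebuild key_sr, decrypt, compare.
dec_fp = keystream_cipher(combine_keys(key_s, key_r), enc_fp)
assert dec_fp == hashlib.sha1(doc).digest()  # fingerprints match: signature verifies
```

A tampered document would hash to a different SHA-1 value, so the final comparison would fail, which is exactly the check described in section 2.6.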
The system was tested for acceptance and rejection in term of signature-verification running process. This test is used to discover the system’s incorrect decision. Use was made of 1030 matching trails (MT) and three security levels. Table 3 shows used intensity level for each of the three levels security. Group (1) uses 30 low intensity face images, group (2) uses 400 medium intensity face images, where group (3) uses 600 high intensity face images. Table 3 Number of RecognizedRejected users by the proposed system Group Description Number of Users Recognized Rejected Recognized Rate Error Rate Group (1) Low 30 28 2 93.33 6.67 Group (2) mediu m 400 393 7 98.25 1.75 Group (3) High 600 592 8 98.67 1.33 Total 1030 1013 17 98.35 1.65 Figure 5 Received message decryption and signing authentication process. illustration of verification process starts by the decrypting of the received message with receiver’s key to obtain the sender’s key ( keys ). The keys will be used to The results of testing for the system to the MT, for group 1 a 28 out of 30 low intensity images were recognized, that is 93.33%. Meanwhile, 2 images were rejected with 6.67% as demonstrated in Figure 6. 293 International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 289-296 The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012) Figure 6 Accepted users by the proposed system Figure 8 Accepted and rejected users by the proposed system In group 2 which presents medium intensity face images as shown in Figure 7, 393 out of 400 face images, were recognized with percentage of 98.25% and only 7 images were rejected with percentage of 1.75%. In Table 4 tests of the proposed system done only for known users, the false acceptance rate (FAR) registered value equals to zero in all groups. 
The false rejection rate (FRR) decreases from group to group, which means that configuring the system with a larger number of users results in fewer rejections, as the results for low and all intensities show.

Table 4: FAR and FRR ranges

No. | Description | FAR | FRR
1 | Low intensity | 0 | 6.67
2 | Medium intensity | 0 | 1.75
3 | High intensity | 0 | 1.33
4 | All intensities | 0 | 1.65

Figure 7: Rejected users by the proposed system

In group 3, 600 high-intensity face images were used, as shown in Figure 8; 592 were recognized (98.67%) and 8 were rejected (1.33%). In summary, 1,030 face images of different intensities were used; 1,013 of them were recognized (98.35%) and only 17 were rejected (1.65%), which demonstrates the success of the proposed system.

2.8 THE PROPOSED ALGORITHM AGAINST EXISTING ALGORITHMS

In recent years a few algorithms have been developed for signing documents digitally, but they fail to cover many issues, which the proposed algorithm solves as described below. Most DS systems, as in Sufreenmohd et al. (2002) or Elmadani et al. (2005), use a smart card to store keys and so suffer from forgery or tampering. The proposed algorithm solves this problem by authenticating users with their faces, which cannot be stolen, forgotten or tampered with, and the user has nothing to carry. Existing DS algorithms, as in Sirovich and Kirby (1987), are based on template selection for feature extraction, while the algorithms of Givens et al. (2003) and Yang (2010) are based on calculating values from an image and comparing them later with stored ones. Such processes are time consuming, whereas in the proposed system the features form keys, which are numbers that are processed directly with no need to store them; this also protects against the attacks mentioned by Langweg (2006).
The proposed algorithm uses simpler mathematical functions for key calculation than the algorithms used by Costas et al. (2008) or by Kirby and Sirovich (1990). Our system is fast because it is based on calculating numbers; it requires less processing and less memory space compared to them.

3. CONCLUSION

A model for signing and verifying a document signature and protecting it was presented, and the drawbacks of existing digital signature schemes were investigated. The proposed algorithm uses a person's biometric characteristics (the face), which cannot be stolen, forged or tampered with. It provides a method that is easy to use and requires the user to carry nothing. Our results show that the face can be strongly recommended for online document signing.

4. REFERENCES

1. Nentwich F, Kirda E and Kruegel C. Practical Security Aspects of Digital Signature Systems. Technical report, Technical University Vienna. 2006.
2. Introduction to digital signature. www.esignature.gov.eg/ElectronicSignature_Mechanizm_Arabic. 2010.
3. Robshaw M. MD2, MD5, SHA and other Hash Functions. RSA Laboratories Technical Report TR-101. 1995.
4. Wang X, Feng D, Lai X and Yu H. Collisions for Hash Functions MD4, MD5, HAVAL-128, and RIPEMD. Proceedings of the 24th Annual International Cryptology Conference (Crypto '04), Santa Barbara, CA. 2004.
5. Elmadani A. B. Digital Signature forming and keys protection based on person's characteristics. Proceedings of the IEEE International Conference on Information Technology and e-Services (ICITeS'2012), Sousse, Tunisia. 2012.
6. Elmadani A. B, Prakash V and Ramli A. R. Application of Smartcard & Secure Coprocessor. BICET Conference, Brunei. 2001.
7. Elmadani A. B. Human Authentication using Finger-Iris algorithm based on statistical approach. The 2nd International Conference on Networked Digital Technologies (NDT '10), Prague, Czech Republic, pp. 288-296. 2010.
8. Spalka A, Cremers A and Langweg H.
Protecting the Creation of Digital Signatures with Trusted Computing Platform Technology Against Attacks by Trojan Horse Programs. In IFIP Security Conference. 2001.
9. Fang Y, Wang Y and Tan T. Combining Color, Contour and Region for Face Detection. ACCV 2002: The 5th Asian Conference on Computer Vision, Melbourne, Australia. 2002.
10. Elmadani A. B, Prakash V, Ali B. M, Ramli A. R and Jumari K. Fingerprint Access Control with Anti-spoofing Protection. Brunei Darussalam Journal of Technology and Commerce, Brunei. 2005.
11. Langweg H. Malware Attacks on Electronic Signatures Revisited. In Sicherheit, 3. Jahrestagung des Fachbereichs Sicherheit der Gesellschaft für Informatik. 2006.
12. Zhao W, Chellappa R, Phillips P. J and Rosenfeld A. Face Recognition: A Literature Survey. ACM Computing Surveys, Vol. 35, No. 4, pp. 399–458. 2003.
13. Yang J. Biometrics Verification Techniques Combining with Digital Signature for Multimodal Biometrics Payment System. Proceedings of the Fourth International Conference on Management of e-Commerce and e-Government (ICMeCG), pp. 405-420, China. 2010.
14. Costas C, Nikolaidis N and Ioannis P. Face-based Digital Signatures for Video Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 4, pp. 549-553. 2008.
15. Givens G, Beveridge J, Bruce A, Draper B and Bolme D. A Statistical Assessment of Subject Factors in the PCA Recognition of Human Faces. Proceedings of the Computer Vision and Pattern Recognition Workshop (CVPRW'03), Wisconsin, USA. 2003.
16. Sirovich L and Kirby M. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A - Optics, Image Science and Vision, Vol. 4, No. 3, pp. 519–524. 1987.
17. Kirby M and Sirovich L. Application of the Karhunen-Loève procedure for the characterization of human faces.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 103–108. 1990.

Ahmed B. Elmadani was born in Libya in 1956. He received his Ph.D. degree from UPM University, Malaysia, in 2003. He worked in the Computer Science Department, Faculty of Science, Sebha University (Libya): from 1997 to 1999 as an assistant lecturer and head of the Department of Computer Science, from 2003 to 2008 as a lecturer in the same department, and from 2009 until now as an assistant professor and Vice Dean of the same Faculty. His main research interests include cryptography, information security, imaging, digital signatures and biometric fingerprints.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 297-310 The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

A Comparative Study of the Perceptions of End Users in the Eastern, Western, Central, Southern and Northern Regions of Saudi Arabia about Email SPAM and Dealing with it

Hasan Alkahtani*, Robert Goodwin**, and Paul Gardner-Stephen**

* Computer Science Department, College of Computer Science and Information Technology, King Faisal University, P.O. Box 400, Al-Hassa 31982, Kingdom of Saudi Arabia
[email protected] ** School of Computer Science, Engineering and Mathematics, Faculty of Science and Engineering, Flinders University, GPO Box 2100, Adelaide SA 5001, Australia
[email protected],
[email protected]

ABSTRACT

This paper presents the results of a survey of email users in different regions of Saudi Arabia about email SPAM. The survey investigated the nature of email SPAM, how email users in the eastern, western, central, southern and northern regions dealt with it, and the efforts made to combat it. It also investigated the effectiveness of existing Anti-SPAM filters in detecting Arabic and English email SPAM. 1,500 participants located in the eastern, western, central, southern and northern regions of Saudi Arabia were surveyed and completed surveys were collected from 1,020 of the participants. The results showed that there were different definitions of email SPAM based on different users' opinions in Saudi Arabia. The results showed that the participants in the central and western regions were more aware of SPAM than the participants in other regions. The results revealed that the volume of email SPAM differed from one region to another, and that the volume of SPAM received by the participants in the northern and central regions was larger than that received in other regions. The results indicated that the majority of email SPAM received by the participants in different regions was written in English. The results showed that the most common type of email SPAM received in Arabic was emails related to forums, and in English was phishing and fraud, and business advertisements. The results also showed that only a few participants in all regions responded to SPAM, and that the proportion of participants who responded to SPAM was larger in the southern region than in other regions. The results showed that most of the participants were not aware of Anti-SPAM programs, and that the participants in the central region were more aware of Anti-SPAM programs than the participants in other regions. The results showed that the participants in all regions estimated that the existing Anti-SPAM programs were more effective in detecting English SPAM than Arabic SPAM. The results showed that most of the participants in all regions were not aware of the government efforts to combat SPAM, and that the participants in the central region were more aware of the government efforts than the participants in other regions. The results also showed that most of the participants in all regions were not aware of the ISPs' efforts to combat SPAM, and that the participants in the central and western regions were more aware of the ISPs' efforts than the participants in other regions.

KEYWORDS: SPAM, email, Arabic, users, English, Saudi.

1. INTRODUCTION

Email is an important tool for many people, who consider it a necessary part of their daily lives. Email enables people to communicate with each other in a short time and at low cost. Although email benefits the people who use it, some people, called spammers, have exploited email for their own purposes. They send so-called SPAM to a large number of recipients. They can use programs known as spam-bots to harvest email addresses on the internet, or they can buy email addresses from individuals and organizations, in order to send email SPAM to these addresses [11]. They also use many methods to bypass SPAM filters, such as tokenization and obfuscation [27]. Email SPAM is defined as "Unsolicited, unwanted email that is sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient" [12], [13]. It is also defined as Unsolicited Bulk Email (UBE) that is sent to a large number of recipients who were not asked if they wanted to receive it [4], [14], [18].
Some studies [6], [7], [25] defined email SPAM as Unsolicited Commercial Email (UCE) that contains business advertisements sent to a large number of recipients.

There are legal and technical methods [2] to combat SPAM. Legally, some countries have enacted laws against SPAM; examples include the United States of America [26], European Union countries and Australia [5]. However, there are no laws in Saudi Arabia to combat SPAM, although research and projects have been conducted to assess the problem of SPAM in the country. Technically, many filters exist to combat SPAM. Examples include content-based filters such as Bayesian filters [24], keyword filters [11] and genetic algorithms [15], and origin-based filters such as black lists [11], white lists [22], origin diversity analysis [16] and challenge-response systems [21]. However, some of these techniques need to be updated to detect new types of email SPAM, because spammers keep developing ways to bypass them.

This study aimed to gain an understanding of:
a. The nature of email SPAM, its definition based on email users' opinions, and its volume and types in different regions of Saudi Arabia.
b. Differences between Arabic SPAM and English SPAM received by the participants in different regions of Saudi Arabia.
c. The effects of email SPAM on email users in different regions of Saudi Arabia.
d. How email users in the eastern, western, central, southern and northern regions deal with email SPAM.
e. The efforts of the government to combat email SPAM.
f. The efforts of ISPs to combat email SPAM.
g. Email users' evaluation, in different regions of Saudi Arabia, of the effectiveness of Anti-SPAM filters in detecting Arabic and English email SPAM.

2. METHODOLOGY

2.1. Measures

It was decided that the best way to answer the research questions was through a questionnaire. Therefore, a questionnaire was distributed to the participants in different regions of Saudi Arabia and the responses were analyzed.
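As an illustration of the content-based (Bayesian) filtering mentioned above, the sketch below trains a naive Bayes classifier on a toy corpus; the training messages, tokenisation and smoothing choices are illustrative, not taken from any of the cited filters.

```python
import math
from collections import Counter

# Toy training corpus; real filters are trained on large labelled mail sets.
spam_docs = ["buy cheap products now", "win money now", "cheap offer buy now"]
ham_docs = ["meeting agenda attached", "lunch tomorrow", "project report attached"]

def train(docs):
    # Count word occurrences across all documents of one class.
    words = Counter(w for d in docs for w in d.split())
    return words, sum(words.values())

spam_words, spam_total = train(spam_docs)
ham_words, ham_total = train(ham_docs)
vocab = set(spam_words) | set(ham_words)

def score(text):
    # Log-probability ratio with Laplace smoothing; positive => spam.
    s = h = 0.0
    for w in text.split():
        s += math.log((spam_words[w] + 1) / (spam_total + len(vocab)))
        h += math.log((ham_words[w] + 1) / (ham_total + len(vocab)))
    return s - h

print(score("buy cheap now") > 0)    # True: classified as spam
print(score("project meeting") > 0)  # False: classified as ham
```

This is the core of the Bayesian approach: words frequent in SPAM push the score up, words frequent in legitimate mail push it down, and smoothing keeps unseen words from dominating.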
Initially a pilot questionnaire was prepared and distributed to a few participants to get their comments about the questions. Then all the participants completed the 10-page questionnaire, which included both yes/no answers and open-ended answers. The questionnaire consisted of three main parts, as follows.

2.1.1. General information questions

In this part, the participants were asked for the following information: gender, age, nationality, spoken language, highest level of education, major area of study, work status and the nature of their work. These questions helped in understanding and comparing the level of awareness of users about email SPAM. Examples of the questions in the first part of the survey can be seen in Figure 1.

1. Gender: O Male O Female
2. What is your age?
3. Nationality: O Saudi O Other
4. What is your current work status? O Student O Employed O Self employed

Figure 1: Examples of questions in the first part of the survey

2.1.2. Email SPAM questions

At the beginning of this part, the participants were asked for a definition of email SPAM in their own words, in order to understand the definition of email SPAM based on their opinions. The study then defined email SPAM as "an unsolicited, unwanted, commercial or non-commercial email that is sent indiscriminately, directly or indirectly, to a large number of recipients without their permission and there is no relationship between the recipients and sender". This definition was given in the survey and was used to provide a reference point for the remainder of the questions. Care was taken to ensure that the respondents did not see the study-supplied definition until after they had supplied their own definition of email SPAM, to prevent introducing a strong bias.
The variety of responses to the question of what SPAM is provides evidence that this approach was successful. Some examples of email SPAM, and of keywords and phrases used in email SPAM, were given in the survey. The participants were asked if they knew about email SPAM prior to reading the survey, and what the sources of their knowledge were. The participants were also asked if they received email SPAM and how many SPAM emails they received on average weekly. They were also asked about the languages of the email SPAM they received and the types of Arabic and English email SPAM. The study focused on English and Arabic email SPAM because English is the main language in the world and Arabic is the native language of Saudi Arabia.

The participants were asked what they did when they received email SPAM (i.e. the actions of email users in dealing with SPAM). The actions described in the survey were as follows: reading the entire SPAM email, deleting the SPAM email without reading it, and contacting the ISP to notify it about the SPAM email. For each action, the participants were asked to choose one of the following options: never, sometimes and always. Figure 2 shows an example of the questions about the actions of email users in Saudi Arabia in dealing with email SPAM.

Note: the following question will ask you to choose the appropriate option for your way of dealing with email SPAM. For example, if I do not read the SPAM email at all, I will circle the option "Never" in the scale in the following table. If I sometimes read SPAM, I will circle the option "Sometimes".

Read the entire email: Never | Sometimes | Always

Figure 2: An example of the questions about the actions of email users in Saudi Arabia in dealing with email SPAM

The participants were asked if they were aware of Anti-SPAM filters to block email SPAM, what the sources of their knowledge about these filters were, and how effective these filters were in detecting Arabic and English email SPAM. Examples of the questions in the second part of the survey can be seen in Figure 3.

1. Everyone defines SPAM differently; in your own words, how would you define email SPAM?
2. Did you know about SPAM emails prior to reading this survey? O Yes O No
3. Have you received SPAM emails? O Yes O No
4. What is the language of the SPAM email you receive on average weekly? (Give a percentage for each; the percentages should add up to 100%.) O English O Arabic O Other language O Languages I do not recognize
5. Are you aware of Anti-SPAM programs? O Yes O No
6. If you have used Anti-SPAM programs, please rate their effectiveness in detecting English and Arabic email SPAM:
Current programs | 0% | 25% | 50% | 75% | 100%
The effectiveness of current programs in detecting Arabic email SPAM | | | | |
The effectiveness of current programs in detecting English email SPAM | | | | |

Figure 3: Examples of questions in the second part of the survey

2.1.3. Questions about the efforts of the government and ISPs to combat email SPAM

The participants were asked if they had purposely responded to an offer made by a SPAM email and what benefits they derived from email SPAM. They were also asked if they were affected by email SPAM and what the effects of email SPAM on them were. In this part, the participants were asked if they were aware of government efforts to combat SPAM and which efforts they were aware of. The participants were also asked if they were aware of ISPs' efforts to combat SPAM and which efforts they were aware of. Examples of the questions in the third part of the survey can be seen in Figure 4.

1. Are you aware of efforts by the government in Saudi Arabia to combat email SPAM? O Yes O No
2.
Are you aware of efforts by ISPs in Saudi Arabia to combat email SPAM? O Yes O No

Figure 4: Examples of questions in the third part of the survey

2.2. Participants

The questionnaire was designed and distributed to 1,500 participants in the central, eastern, western, southern and northern regions of Saudi Arabia. Completed questionnaires were received from 1,020 participants: 34% of the participants were from the central region, 20% from the eastern region, 20% from the western region, 13% from the southern region and 13% from the northern region. Table 1 shows general information about the participants located in the eastern, western, central, southern and northern regions of Saudi Arabia.

Table 1: General information about the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia (Part 1: General Information; values are percentages)

Question | Answer | C | E | W | S | N
Gender | Male | 57 | 62 | 59 | 64 | 61
Gender | Female | 43 | 38 | 41 | 36 | 39
Age | 15-25 | 35 | 58 | 63 | 37 | 35
Age | 26-35 | 41 | 25 | 26 | 38 | 47
Age | 36-45 | 17 | 14 | 10 | 21 | 12
Age | 46-55 | 6 | 2 | 1 | 2 | 6
Age | 56 and more | 1 | 1 | 0 | 2 | 0
Nationality | Saudi | 81 | 90 | 88 | 75 | 86
Nationality | Other | 19 | 10 | 12 | 25 | 14
Spoken language | Arabic | 99 | 99 | 100 | 99 | 99
Spoken language | English | 63 | 62 | 81 | 73 | 75
Spoken language | Other | 3 | 2 | 2 | 3 | 1
Highest level of education | High school | 11 | 17 | 17 | 15 | 12
Highest level of education | Diploma | 7 | 2 | 2 | 5 | 8
Highest level of education | Bachelor | 54 | 61 | 70 | 49 | 52
Highest level of education | Master | 16 | 12 | 7 | 17 | 19
Highest level of education | PhD | 12 | 8 | 4 | 14 | 9
Major area of study | Education and teaching | 20 | 17 | 13 | 16 | 26
Major area of study | Computer science and information technology | 34 | 31 | 40 | 31 | 26
Major area of study | Social sciences | 12 | 4 | 5 | 20 | 15
Major area of study | Physical and biological sciences | 11 | 21 | 7 | 5 | 6
Major area of study | Health sciences and medicine | 8 | 16 | 7 | 12 | 10
Major area of study | Other | 15 | 11 | 28 | 16 | 17
Work status | Student | 29 | 58 | 61 | 41 | 45
Work status | Employed | 70 | 42 | 37 | 59 | 51
Work status | Self-employed | 1 | 0 | 2 | 0 | 4
Nature of work (employed participants) | Educational | 48 | 44 | 55 | 47 | 58
Nature of work (employed participants) | Medical | 8 | 17 | 8 | 8 | 16
Nature of work (employed participants) | Technical | 18 | 14 | 20 | 16 | 9
Nature of work (employed participants) | Management | 19 | 21 | 16 | 24 | 3
Nature of work (employed participants) | Other | 7 | 4 | 1 | 5 | 14

3. RESULTS

This section describes the responses of the participants in the eastern, western, central, southern and northern regions of Saudi Arabia to the email users' survey.

3.1. Respondents' Definition and Awareness of Email SPAM

Email users were asked for a definition of email SPAM based on their own opinions. Only 428 of the 1,020 participants in the different regions of Saudi Arabia answered this question. 42% of the participants who answered this question defined email SPAM as an email that was sent randomly to numerous recipients and contained spyware, files, links, images or text that aims to hack the computer or steal confidential information such as email passwords, credit card numbers and bank account numbers. 39% defined email SPAM as an email that did not contain an email address, or that was sent randomly, directly or indirectly, by unknown senders or sources to a large number of recipients without their permission to receive it. 33% said that email SPAM was an email that was sent randomly and contained malicious programs such as viruses, Trojans and worms, or contained hidden links, strange contents and untrusted attachments, aiming to damage the computer's software and hardware or to delete important information on the computer. 29% defined email SPAM as Unsolicited Commercial Email (UCE), or email that was sent to a large number of recipients to promote commercial advertisements and contained attractive words used to encourage the recipient to buy medical, technical and sexual products. 9% said that email SPAM was annoying and unimportant email that was sent from friends, but not in person, and contained jokes, greetings, invitations to subscribe to forums, invitations to friendship through social networks such as Facebook, competitions, puzzles, political and religious reviews, news, and scandals of famous people in the world. 7% defined email SPAM as junk email, or as unwanted, Unsolicited Bulk Email (UBE) that was sent randomly to a large number of recipients. 1% defined email SPAM as an email that was not related to the recipients' work or interests.

From the definitions described above, it can be clearly seen that there was no single specific definition of email SPAM among email users, and that the most common definition was "an email that was sent randomly to numerous recipients and contained spyware, files, links, images or text that aims to hack the computer or steal confidential information such as email passwords, credit card numbers and bank account numbers". The definitions described above indicate that some users' definitions of email SPAM in Saudi Arabia agreed with the international definitions by defining email SPAM as Unsolicited Commercial Email (UCE) and as Unsolicited Bulk Email (UBE). The differences in the definition of email SPAM could cause problems in enacting laws to combat SPAM in Saudi Arabia and in developing Anti-SPAM filters for different languages such as Arabic. This suggests that there is scope to specify an agreed definition of email SPAM which could be used for enacting laws to combat SPAM and for developing Anti-SPAM techniques in Saudi Arabia.

When the participants were asked if they knew about email SPAM prior to reading the survey, the results revealed that approximately a third of email users in Saudi Arabia did not know about email SPAM, which is significant and a risk for Saudi society. The results of the survey revealed that most of the participants indicated prior awareness of SPAM, suggesting that the survey itself acted as a means of educating the participants about SPAM and its impact. This suggests that a broader survey or information campaign about SPAM would have a further positive impact in the different regions of Saudi Arabia. It also suggests that conducting research related to SPAM, and funding researchers who work in the field of SPAM, could help in increasing the awareness of email users in all regions about email SPAM and hence in reducing the impact of email SPAM in Saudi Arabia.
As seen in Table 2, the results revealed that the participants in the central and western regions were more aware of SPAM than the participants in the other regions of Saudi Arabia. This could be because of their major areas of study: the percentages of participants who studied computer science and information technology in the western and central regions were higher than in the other regions. It could also be because of the nature of their work: more participants worked in technical positions in the central and western regions than in the other regions. The results suggest that there should be a focus on awareness programs about SPAM for users in the different regions of Saudi Arabia, especially in the eastern, southern and northern regions. These awareness programs could be executed by government sectors or private sectors.

The results, as shown in Table 2, revealed that most of the participants in all regions knew about SPAM through self-education via the internet and forums, and through friends and relatives. The results showed that there were prominent efforts by school and university education in informing users about SPAM in all regions compared to other public and private sectors, and that the educational sectors in the southern region had the highest percentage for making users aware of SPAM. The results also revealed a deficiency in the government's efforts to make email users aware of SPAM in all regions; the government's efforts in informing users about SPAM were better in the northern region than in the other regions. The results also revealed that there were no government efforts in informing users about SPAM in the western region.
The results also revealed a deficiency in the ISPs' efforts to make users aware of SPAM, although they are one of the sectors responsible for controlling the internet service in Saudi Arabia. This suggests that the government should focus on making users aware of SPAM in all regions, especially in the western region. The awareness programs could be executed by educational sectors such as universities, by broadcast media such as magazines and newspapers, and by the sectors responsible for providing and controlling internet services in Saudi Arabia.

Table 2: Responses of the participants in the Eastern, Western, Central, Southern and Northern regions about their knowledge of email SPAM (Part 2: Email SPAM; values are percentages)

Question | Answer | C | E | W | S | N
Did you know about SPAM emails prior to reading the survey? | Yes | 72 | 57 | 70 | 56 | 37
Did you know about SPAM emails prior to reading the survey? | No | 28 | 43 | 30 | 44 | 63
How do you know about SPAM emails? | Internet Service Providers (ISPs) | 6 | 9 | 7 | 13 | 13
How do you know about SPAM emails? | The internet and forums | 59 | 67 | 76 | 51 | 50
How do you know about SPAM emails? | Broadcast media such as radio, TV, newspapers and magazines | 13 | 10 | 21 | 11 | 8
How do you know about SPAM emails? | Friends and relatives | 39 | 45 | 56 | 48 | 44
How do you know about SPAM emails? | Government ministries and commissions | 4 | 6 | 0 | 4 | 8
How do you know about SPAM emails? | Through my school or university education | 41 | 38 | 29 | 44 | 40
How do you know about SPAM emails? | Other | 5 | 4 | 3 | 7 | 6

3.2. Volume and Nature of Email SPAM in Saudi Arabia

When the participants were asked if they received email SPAM, the results showed that most of the participants in Saudi Arabia received email SPAM. Email users estimated that they received an average of 108 SPAM emails per week. Another study [17] showed that its participants received an average of 94.5 SPAM emails per week.
By comparing the volume of SPAM received in Saudi Arabia to the volume of SPAM in that study [17], it can be seen that the volume of SPAM in Saudi Arabia was broadly similar. The results shown in Table 3 reveal that the highest percentage of participants who received SPAM was in the southern region. The results indicated that the average number of SPAM emails received weekly by the participants differed from one region to another: 77 SPAM emails in the eastern region, 104 in the western region, 126 in the central region, 95 in the southern region and 129 in the northern region. This indicates that the number of SPAM emails received was larger in the northern and central regions than in the other regions.

When the participants were asked about the language of the email SPAM they received, the results showed that most of the email SPAM received (59%) was in English, 34% was in Arabic, 4% was not recognized and 3% was in other languages. A study conducted in Bahrain indicated that 64% of the respondents said that they received English SPAM, 18% received Arabic SPAM and 18% received both Arabic and English SPAM [1]. The results of that study indicate that the volume of English SPAM received in Bahrain was similar to the volume of English SPAM received in Saudi Arabia, while the volume of Arabic SPAM received in Bahrain was less than that received in Saudi Arabia. As seen in Table 3, the results revealed that the volume of English SPAM received was larger in the northern region than in the other regions, while the volume of Arabic SPAM was larger in the western region than in the other regions. The number of unrecognized SPAM emails was larger in the southern and northern regions than in the other regions.
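The overall figure of 108 SPAM emails per week is consistent with the per-region averages quoted above, once each region's average is weighted by its share of respondents (34% central, 20% eastern, 20% western, 13% southern, 13% northern, from Section 2.2):

```python
# Weight each region's average weekly SPAM count by its share of respondents.
share = {"eastern": 0.20, "western": 0.20, "central": 0.34,
         "southern": 0.13, "northern": 0.13}
weekly_spam = {"eastern": 77, "western": 104, "central": 126,
               "southern": 95, "northern": 129}

overall = sum(share[r] * weekly_spam[r] for r in share)
print(round(overall))  # 108
```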
The results showed that the participants in the southern region received SPAM in other languages, such as Chinese, Japanese, Russian, Turkish, French, Brazilian Portuguese, Spanish, Persian, German, Italian, Hindi, Urdu and Hebrew, more than the other regions.

Table 3: Responses of the participants in the Eastern, Western, Central, Southern and Northern regions about the languages of email SPAM (Part 2: Email SPAM; values are percentages)

Question | Answer | C | E | W | S | N
Have you received SPAM emails? | Yes | 73 | 70 | 75 | 83 | 65
Have you received SPAM emails? | No | 27 | 30 | 25 | 17 | 35
What is the language of the SPAM email you receive on average weekly? | English | 61 | 60 | 51 | 61 | 65
What is the language of the SPAM email you receive on average weekly? | Arabic | 33 | 33 | 43 | 30 | 29
What is the language of the SPAM email you receive on average weekly? | Not recognized | 4 | 4 | 3 | 5 | 5
What is the language of the SPAM email you receive on average weekly? | Other language | 2 | 3 | 3 | 4 | 1

When the participants were asked about the types of Arabic and English SPAM emails they received, the results showed that there were many types of both Arabic and English SPAM and that these types differed between the two languages. The types of Arabic and English SPAM and the differences between them can be seen in Table 4.

Table 4: The differences between Arabic and English email SPAM received by end users in Saudi Arabia

Type of email SPAM | AR (%) | EN (%)
Business | 31 | 30
Religious and political party | 5 | 2
Pornographic | 10 | 24
Forums | 36 | 3
Products and services | 11 | 12
Phishing and fraud | 6 | 28
Other | 1 | 1
Total | 100 | 100

As described in Table 4, it can be clearly seen that the volume of business advertisements, emails from religious and political parties, and emails related to forums was larger in Arabic SPAM than in English SPAM. The percentages indicate a significant difference in composition between Arabic and English SPAM, for example in the volume of forum emails, which was much larger in Arabic SPAM than in English SPAM.
Also, the results showed that the volume of pornographic emails, products and services emails, and phishing and fraud emails was larger in English SPAM than in Arabic SPAM. The percentages indicated a significant difference between Arabic and English SPAM in the volume of pornographic emails and of phishing and fraud emails, which was much higher in English SPAM (see Table 4). The results also revealed types of Arabic SPAM that did not exist in English SPAM, including news, training consultation, jokes, scandals of famous people, puzzles, greetings, competitions, and invitations from social networking websites such as Facebook. A study conducted by the Communication and Information Technology Commission (CITC) in Saudi Arabia in 2007 showed that 64% of the email SPAM received in Saudi Arabia was direct marketing, 25% was sexual emails, 5% was religious emails, and 5% was other types [20]. However, that study did not specify whether the SPAM received was written in Arabic or English. The CITC results for religious emails, pornographic emails and other types of SPAM were similar to the volumes of the same types found in this study. The results in Table 4 showed that the volume of pornographic emails, for both Arabic and English SPAM, was low compared to the same type in other countries such as Bahrain: a study conducted in Bahrain [1] revealed that 76% of the participants received pornographic emails while 24% did not, although that study did not specify whether the volume of pornographic emails was larger in English or in Arabic.
Therefore, the results of this study indicated that the volume of pornographic emails in Saudi Arabia was lower; this could be because public access to pornographic websites is not allowed in Saudi Arabia, which could contribute to reducing the volume of SPAM sent from such websites. Table 5 shows the average composition of Arabic email SPAM received by the participants in the eastern, western, central, southern and northern regions of Saudi Arabia. The results revealed that the participants in the southern region received business advertisements more than the participants in other regions. The volume of religious and political emails received by participants in the eastern region was higher than for the same type in other regions. The volume of pornographic emails received in the western and central regions was larger than in other regions. In addition, the participants in the northern region received more forum emails than participants in other regions. The volume of products and services emails was larger in the eastern and western regions, and the volume of phishing and fraud emails was larger in the western region, than in other regions. The percentages also showed that the volume of other types of Arabic SPAM was larger in the eastern, central and southern regions than in other regions (see Table 5). Table 5 also shows the average composition of English email SPAM received by the participants in the five regions. The volume of business advertisements was larger in the northern region than in other regions, and the volume of religious and political emails received in the western and southern regions was larger than for the same type in other regions.
The results revealed that the participants in the eastern region received pornographic emails more than participants in other regions. The volume of forums, products and services, and other types of English SPAM was larger in the western region, and the volume of phishing and fraud emails was larger in the southern region, than in other regions.

Table 5: Averages of Arabic and English email SPAM received by the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia

                                    E          W          C          S          N
Types of email SPAM              AR%  EN%   AR%  EN%   AR%  EN%   AR%  EN%   AR%  EN%
Business                          31   27    29   28    32   31    34   30    31   32
Religious and Political Parties    6    2     5    3     5    2     4    3     5    2
Pornographic                       9   27    11   22    11   24     6   23     9   26
Forums                            35    3    30    6    36    2    39    3    42    2
Products and Services             13    9    13   17    10   13    11    9     8   10
Phishing and Fraud                 5   31    12   22     5   28     5   32     5   27
Other                              1    1     0    2     1    0     1    0     0    1

A study conducted by [3] described some keywords and phrases used in Arabic and English email SPAM in Saudi Arabia. These keywords and phrases were collected from different ISPs in Saudi Arabia. Examples of Arabic SPAM keywords and phrases are as follows: "فياقرا", "ألعاب", "أدوية", "ريجيم", "فرصة للربح", "مبروك لقد ربحت", "مسابقة", "اربح مليون ﷼ سعودي", "تعليم", "انضم إلينا", "بطاقة خضراء للسفر إلى أمريكا", "موضة", "حصرياً", "زواج", "شريك العمر", "جنس", "+18 فما فوق", "رومانسية", "فضيحة", "مفاجآت", "برامج", "تدريب", "تبرعات", "اشترك في المنتدى", "شارك واربح", "جائزة", "عرض خاص", "هدية", "إباحية", "أقل الأسعار", "ثورة", "أسهم", "بشرى", "أموال", "دورة", "أزياء", "مقاطع مضحكة", "اعمل من المنزل", and "للرجال فقط".
Examples of English SPAM keywords and phrases are as follows: "sex", "Cialis", "gift", "Dollar", "discount", "bonus", "girls", "Viagra", "Loto winner", "Investment", "Forex", "Green", "Visa and Master", "reactivate your email account", "Incomplete personal information", "Verify your account", "Account not updated", "Financial Information Missing", "$USD", "You have won", "fund", "money", "winning promotion", "transferring", "Training", "South Africa", "Partnership", "Bank loans", and "work and live in USA".

3.3. Actions of Email Users in Dealing with SPAM

The participants were asked about the appropriate action for dealing with email SPAM. In the survey, the participants were given three possible actions: reading the entire SPAM email; deleting the SPAM email without reading it; and contacting the ISP to notify it about the SPAM email. The participants were asked to evaluate their behaviour for each action by choosing one of three options: never, sometimes or always. Firstly, when the participants were asked whether they read the entire SPAM email, most said that they sometimes did. The participants in the eastern and central regions performed better than those in other regions, as the proportion who said they never read the entire SPAM email was larger in the eastern and central regions than in other regions (see Table 6). Secondly, when the participants were asked whether they delete SPAM email without reading it, most said that they sometimes did.
The results, as shown in Table 6, revealed that the participants in the central and eastern regions performed better than those in other regions, as the proportion who said they always delete SPAM email without reading it was larger in the central and eastern regions than in other regions. Thirdly, when the participants were asked whether they contact the ISP to notify it about email SPAM, most said that they never do (see Table 6). The participants in the southern and northern regions performed better than those in other regions, as the proportion who said they always contact the ISP was larger in those regions than in other regions. A study conducted by [17] showed that 11.7% of the participants contacted their ISPs when they received email SPAM. Comparing the two studies, it is clear that most email users in both did not contact their ISPs regarding SPAM problems. These results about users' actions in dealing with email SPAM clearly suggest that the ISPs in Saudi Arabia should inform users about email SPAM, its impacts, the technical and legal efforts of the ISPs to combat SPAM, and the procedures users should follow when they receive SPAM. When the participants were asked whether they had responded to an offer made by a SPAM email, the results showed that most of the participants in all regions had not (see Table 6).
The results revealed that the participants in the southern region responded to offers made by SPAM email more than the participants in other regions of Saudi Arabia. The participants in the western and southern regions enjoyed the fun emails contained in SPAM more than participants in other regions; the participants in the eastern and northern regions made more use of purchasing and selling offers contained in SPAM; the participants in the central, southern and northern regions used SPAM as a learning tool more; and the participants in the northern region derived other benefits from SPAM, such as friendship requests, more than participants in other regions (see Table 6). The results indicated that as long as some users respond to SPAM offers, SPAM could increase and cause problems for other users. This suggests that laws against SPAM in Saudi Arabia could reduce its incidence by greatly reducing the ability of spammers to make sales without fear of penalties.

Table 6: Actions of users in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia in dealing with email SPAM

Question / Answer                                             E     W     C     S     N
What do you do when you receive SPAM email?
  1- Read the entire email
     Never                                                   40%   33%   37%   28%   29%
     Sometimes                                               48%   62%   53%   62%   65%
     Always                                                  12%    5%   10%   10%    6%
  2- Delete the email without reading it
     Never                                                   11%    6%    7%   13%    5%
     Sometimes                                               49%   59%   50%   52%   62%
     Always                                                  40%   35%   43%   35%   33%
  3- Contact the ISP and notify it about email SPAM
     Never                                                   77%   87%   83%   73%   86%
     Sometimes                                               19%   12%   14%   15%    6%
     Always                                                   4%    1%    3%   12%    8%
Have you ever purposely responded to an offer made by a SPAM email?
     Yes                                                     19%   15%   20%   34%   20%
     No                                                      81%   85%   80%   66%   80%
What benefits did you derive from SPAM emails?
     Purchasing and selling                                  23%   10%   18%   16%   23%
     Learning                                                33%   39%   47%   47%   46%
     Fun                                                     56%   71%   54%   71%   50%
     Other                                                    3%    3%    0%    0%    4%

3.4. Effects of Email SPAM on End Users

When the participants were asked whether they had been affected negatively by email SPAM, the results revealed that approximately half of the participants in all regions had been (see Table 7). The participants in the southern and northern regions were affected by email SPAM more than participants in other regions. This could be because most of the participants in the southern and northern regions were not aware of SPAM and of effective ways of dealing with it, or because they responded to offers made by SPAM email more than participants in other regions (see Table 7). The results revealed that the main impact of SPAM on users was filling their inboxes with SPAM, and the participants in the southern region were more affected by this impact than participants in other regions. The second main impact of SPAM on users was the infection of computers by a virus, worm or other malicious program, and the participants in the northern and central regions were more affected by this impact than participants in other regions (see Table 7).
The results showed that the participants in the western region were affected by SPAM through losing time and reduced productivity more than participants in other regions. The participants in the eastern, southern and western regions were affected by SPAM through the theft of personal information, such as user names, passwords and credit card numbers, more than participants in other regions. The participants in the eastern, western and central regions felt less confidence in using email more than participants in other regions. Also, the participants in the central region were affected by other effects of email SPAM, such as annoyance, more than participants in other regions (see Table 7).

Table 7: Effects of email SPAM on users in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia

Question / Answer                                                  E     W     C     S     N
Have you been affected negatively by email SPAM?
  Yes                                                             43%   37%   46%   51%   52%
  No                                                              57%   63%   54%   49%   48%
What was the impact of email SPAM?
  Stealing personal information such as user name,
  password and credit card numbers                                23%   22%   18%   23%   16%
  Losing time and reducing productivity                           45%   51%   44%   36%   35%
  Less confidence in using the email                              25%   23%   22%    7%   15%
  Filling email inbox                                             52%   66%   65%   71%   56%
  Computer was infected by a Virus, Worm or other
  malicious program                                               55%   51%   58%   43%   59%
  Other impacts                                                    2%    3%    4%    3%    3%

3.5. Awareness of Anti-SPAM Filters and the Effectiveness of Anti-SPAM Filters in Detecting Arabic and English SPAM

When the participants were asked whether they were aware of Anti-SPAM programs, the results revealed that most of the participants in all regions were not. The participants in the central region were more aware of Anti-SPAM programs than participants in other regions (see Table 8). A study conducted in Bahrain [1] revealed that 26% of the participants knew about Anti-SPAM programs while 74% did not. Comparing the Bahraini study with this study, Saudi society was more aware of Anti-SPAM programs than Bahraini society, but most of Saudi society was still not aware of them. When the participants were asked how they knew about Anti-SPAM programs, the results showed that the majority of the participants in all regions knew about them through the internet and forums and through school and university education. The results also revealed a deficiency in the efforts of the government and the ISPs in informing users about Anti-SPAM programs and how they work. As seen in Table 8, there were no government efforts to inform users about Anti-SPAM programs in the western and southern regions. This suggests that there should be coordination between the government and the sectors providing the internet service in Saudi Arabia in informing users in all regions, especially the western and southern regions, about Anti-SPAM programs and how they work to detect SPAM. It also suggests that distributing free Anti-SPAM programs, by the government or by the sectors providing the internet service, could reduce the effects of SPAM on email users in Saudi Arabia.

Table 8: Responses of the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia about their knowledge of Anti-SPAM programs

Question / Answer                                  E     W     C     S     N
Are you aware of Anti-SPAM programs?
  Yes                                             38%   38%   44%   31%   28%
  No                                              62%   62%   56%   69%   72%
How did you know about Anti-SPAM programs?
  Internet Service Providers (ISPs)                4%    8%    6%   10%    8%
  The internet and forums                         67%   79%   62%   52%   67%
  (Remaining sources: broadcast media such as radio, TV, newspapers and magazines;
  friends and relatives; government ministries and commissions; through my school
  or university education; other. Reported values: 6%, 3%, 8%, 5%, 3%, 32%, 6%,
  25%, 0%, 28%, 3%, 48%, 0%, 14%, 11%, 33%, 1%, 27%, 5%, 47%, 5%, 52%, 5%, 36%, 6%.)

When the participants were asked to rate the effectiveness of Anti-SPAM programs in detecting Arabic and English SPAM, the results revealed that the existing Anti-SPAM programs were not completely effective in detecting either. This suggests that the existing Anti-SPAM filters need to be developed further to detect SPAM in different languages, such as Arabic and English. The participants in all regions estimated that the existing Anti-SPAM programs were more effective in detecting English SPAM than Arabic SPAM, which suggests that there should be a focus on producing and developing techniques to detect email SPAM in the Arabic language. The participants' evaluation of the effectiveness of Anti-SPAM programs in detecting Arabic and English SPAM can be seen in Figure 5 and Figure 6.
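The keyword and phrase lists reported earlier hint at one reason filters tuned for English underperform on Arabic SPAM. The following sketch is purely illustrative (it is not a method from this paper or its references; the phrase list simply reuses a few of the reported English phrases): a substring-matching filter can only catch what its list contains, so an English-only list misses Arabic SPAM entirely.

```python
# Illustrative sketch only, not a production filter: a naive
# phrase-matching SPAM check built from a few English phrases
# reported in the survey.

SPAM_PHRASES = [
    # a subset of the English SPAM phrases listed in Section 3.2
    "viagra",
    "loto winner",
    "you have won",
    "verify your account",
    "winning promotion",
    "work and live in usa",
]

def spam_score(message: str) -> int:
    """Count how many known phrases appear in the message (case-insensitive)."""
    text = message.lower()
    return sum(phrase in text for phrase in SPAM_PHRASES)

def is_spam(message: str, threshold: int = 1) -> bool:
    """Flag the message once it matches at least `threshold` phrases."""
    return spam_score(message) >= threshold

# An English message matching two listed phrases is caught ...
print(is_spam("Dear winner, you have won! Please verify your account."))  # True
# ... but an Arabic SPAM phrase from the same survey is missed,
# because the phrase list contains no Arabic entries.
print(is_spam("مبروك لقد ربحت"))  # False
```

Filters in the cited literature rely on statistical methods such as naive Bayes [4, 18] rather than fixed lists, but the coverage problem is the same: a model trained only on English vocabulary assigns no weight to Arabic phrases.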
Table 9: The awareness of the participants in the Eastern, Western, Central, Southern and Northern regions of Saudi Arabia about the government and ISPs efforts (Part 3: Efforts of combating email SPAM in Saudi Arabia)

Question                                              E     W     C     S     N
Are you aware of efforts by the government in
Saudi Arabia to combat email SPAM?
  Yes                                                20%   22%   30%   20%   23%
  No                                                 80%   78%   70%   80%   77%
Are you aware of efforts by ISPs in Saudi Arabia
to combat email SPAM?
  Yes                                                11%   15%   16%   13%   10%
  No                                                 89%   85%   84%   87%   90%

[Figure 5: The effectiveness of Anti-SPAM filters in detecting Arabic email SPAM based on the evaluation of the participants in the Eastern (n=203), Western (n=201), Central (n=352), Southern (n=134) and Northern (n=130) regions of Saudi Arabia; reported effectiveness ranged from 51% to 63% across the regions.]

[Figure 6: The effectiveness of Anti-SPAM filters in detecting English email SPAM based on the evaluation of the participants in the same regions; reported effectiveness ranged from 74% to 85% across the regions.]

3.6. Efforts of Government and ISPs to Combat SPAM

When the participants were asked whether they were aware of the government's efforts to combat SPAM, the results showed that only a few participants were.
The results revealed that users in the central region were more aware of the government's efforts to combat SPAM than users in other regions (see Table 9). This suggests that the government should inform users about its efforts to combat SPAM and should provide awareness programmes about SPAM, its impacts and methods of combating it for users in all regions of Saudi Arabia. This could help in reducing the effects of SPAM on email users in Saudi Arabia. The participants who were aware of government efforts to combat SPAM were asked which efforts they were aware of. Most of these participants (62%) said that the government's efforts could be observed through King Abdulaziz City for Science and Technology (KACST), which blocks unsecured websites and websites that send SPAM, informs people about dangerous security attacks and their impacts, and conducts and funds research related to information technology [19]. 24% of the participants said that the government recommended that every government and private sector organization in Saudi Arabia apply a security policy. Such a policy should include: providing the organization with the software and hardware necessary to avoid security attacks such as viruses and SPAM; making employees and customers aware of security attacks and methods of combating them; conducting research on security attacks and countermeasures for these attacks; conducting training and workshops on security issues for employees; employing people qualified in network security to deal with security attacks; providing a financial budget to develop the work of the security policy; and reviewing the security policy regularly to find its strengths and weaknesses. 22% said that the government established and funded centres to deal with information security issues.
Examples of these centres are the Centre of Excellence in Information Assurance (COEIA) [8], the Computer Emergency Response Team (CERT) [10] and the Prince Muqrin Chair for Information Security Technologies (PMC IT SECURITY) [23]. The participants said that the aims of these centres were to inform people about security attacks, such as viruses and SPAM, and their impacts; to conduct and fund research related to security issues; and to hold conferences and workshops on security attacks. 19% of the participants said that the government's efforts could be observed through the Communication and Information Technology Commission (CITC). They said that CITC funded the Saudi National Anti-SPAM Program project and created a public website for this project that includes information about SPAM and methods of combating it. This project also informs people about SPAM by publishing brochures and through subscription to a CITC mailing list that keeps people informed of new developments in SPAM. The project conducted research on SPAM problems and published the results publicly. They also said that CITC receives complaints from people regarding SPAM problems and processes them with the other responsible government sectors [9]. 18% said that some universities in Saudi Arabia established information security centres which provide the following services. First, these centres raise people's awareness of security attacks. Second, they conduct workshops, conferences and ongoing training in the field of security issues and methods of combating attacks.
Third, the centres publish valuable research in the field of security for the public and for different libraries in Saudi Arabia. 18% of the participants said that the government enacted a law for combating electronic crimes in Saudi Arabia, but that there were no specific laws for SPAM. They said that the government sector responsible for executing the electronic crime law is the Communication and Information Technology Commission (CITC), in coordination with other legal sectors. When the participants were asked whether they were aware of the ISPs' efforts to combat SPAM, the results revealed that only a few participants were. Users in the central and western regions were more aware of the ISPs' efforts than users in other regions (see Table 9). This suggests that the ISPs should provide awareness programmes about SPAM, its impact and their efforts to combat it for users in all regions of Saudi Arabia, which could help in reducing the effects of SPAM on email users. The participants who were aware of the ISPs' efforts to combat SPAM were asked which efforts they were aware of. 42% of the participants said that the ISPs use advanced Anti-SPAM filters to block email SPAM before it reaches end users' inboxes. 26% said that the ISPs block websites and forums that send email SPAM to recipients and put them on black lists. 13% of the participants said that the ISPs inform people about email SPAM and methods of combating it by email, brochures and Short Message Service (SMS). 13% said that the ISPs warn customers not to send SPAM, receive customers' complaints regarding SPAM, and take legal actions against people who send email SPAM, such as disconnecting the internet service and cancelling the contract. 4.
CONCLUSION AND FUTURE WORK

This paper presented the results of a survey of email users in the eastern, western, central, southern and northern regions of Saudi Arabia about email SPAM and how they deal with it. The results showed that there was no single definition of email SPAM; the most common definition was "an email that was sent randomly to numerous recipients and contained Spyware, files, links, images or text that aimed to hack the computer or steal confidential information such as email passwords, credit card numbers and bank account numbers". The results revealed that approximately a third of users in Saudi Arabia did not know about email SPAM, which represents a significant risk for Saudi society. The level of awareness of SPAM differed from one region to another, and the participants in the central and western regions were more aware of SPAM than participants in other regions. The volume of email SPAM was high in Saudi Arabia compared to other countries; it differed from one region to another and was larger in the northern and central regions than in other regions. Most of the email SPAM received in all regions was written in English, and the volume of English SPAM differed from one region to another. The results also showed that the participants in all regions received many types of Arabic and English SPAM.
The results showed that the most common type of Arabic SPAM was forum emails, and the most common types of English SPAM were business advertisements and phishing and fraud emails; the volumes of these types, for both Arabic and English SPAM, differed from one region to another. A few participants in all regions responded to SPAM, with the proportion largest in the southern region. Approximately half of the participants in all regions were affected negatively by email SPAM, with the proportion largest in the southern and northern regions. Most of the participants in all regions were not aware of Anti-SPAM programs, with the central region the most aware. The participants in all regions estimated that the existing Anti-SPAM programs were more effective in detecting English SPAM than Arabic SPAM. Most of the participants in all regions were not aware of the government's efforts to combat SPAM, with the central region the most aware. Finally, most of the participants in all regions were not aware of the ISPs' efforts to combat SPAM, with the central and western regions the most aware. Future work could include investigating government efforts to combat SPAM in order to find more effective methods. Laws to combat SPAM in Saudi Arabia could be investigated, drawing on the experience of developed countries in combating SPAM; this could help in enacting a new, clear law to combat SPAM in Saudi Arabia.
The legal and technical efforts of ISPs in Saudi Arabia to combat email SPAM could be investigated, along with ways to encourage ISPs to collaborate with other ISPs, private sectors, government sectors and customers. Effective awareness programmes to inform users in all regions of Saudi Arabia, as well as private and government sectors, about SPAM, its effects and methods of combating it could be investigated. Improving the performance of existing Anti-SPAM filters in detecting Arabic and English email SPAM could also be investigated. This could be achieved by testing the effectiveness of existing Anti-SPAM filters on Arabic and English SPAM, which could help in creating and developing effective filters to detect new types of Arabic and English SPAM. A list of keywords and phrases used in Arabic email SPAM was included in this research, which could help in designing and producing special Anti-SPAM filters for Arabic SPAM.

5. REFERENCES

1. Al-A'ali, M.: A Study of Email Spam and How to Effectively Combat It. Webology 4, 1 (2007). http://www.webology.org/2007/v4n1/a37.html
2. Alkahtani, H. S., Gardner-Stephen, P., Goodwin, R.: A taxonomy of email SPAM filters. In: Proc. of the 12th International Arab Conference on Information Technology (ACIT), pp. 351--356, Riyadh, Saudi Arabia (2011).
3. Alkahtani, H. S., Goodwin, R., Gardner-Stephen, P.: Email SPAM related issues and methods of controlling used by ISPs in Saudi Arabia. In: Proc. of the 12th International Arab Conference on Information Technology (ACIT), pp. 344--351, Riyadh, Saudi Arabia (2011).
4. Androutsopoulos, I., Koutsias, J., Chandrinos, K. V., Spyropoulos, C. D.: An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages. In: Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 160--167, Athens, Greece (2000).
5. Australian Communications & Media Authority (ACMA), http://www.efa.org.au/Issues/Privacy/spam.html#acts
6. Boykin, O., Roychowdhury, V.: Personal Email networks: an effective anti-spam tool. Condensed Matter cond-mat/0402143, pp. 1--10 (2004).
7. Carreras, X., Marquez, L.: Boosting Trees for Anti-Spam Email Filtering. In: Proc. of RANLP, 4th International Conference on Recent Advances in Natural Language Processing, pp. 1--7, Tzigov Chark, BG (2001).
8. Centre of Excellence in Information Assurance (COEIA), http://coeia.edu.sa/index.php/en/aboutcoeia/strategic-plan.html
9. Communication and Information Technology Commission (CITC), http://www.spam.gov.sa/eng_main.htm
10. Computer Emergency Response Team (CERT), http://www.cert.gov.sa/index.php?option=com_content&task=view&id=69&Itemid=116
11. Cook, D., Hartnett, J., Manderson, K., Scanlan, J.: Catching spam before it arrives: domain specific dynamic blacklists. In: Proc. of the 2006 Australasian Workshops on Grid Computing and e-Research, pp. 193--202, Hobart, Tasmania, Australia (2006).
12. Cormack, G., Lynam, T.: Spam corpus creation for TREC. In: Proc. of the Second Conference on Email and Anti-Spam (CEAS), pp. 1--2 (2005).
13. Cormack, G. V., Kolcz, A.: Spam filter evaluation with imprecise ground truth. In: Proc. of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 604--611, Boston, MA, USA (2009).
14. Damiani, E., Vimercati, S. D. C. d., Paraboschi, S., Samarati, P.: An Open Digest-based Technique for Spam Detection. pp. 1--6, San Francisco, CA, USA (2004).
15. Garcia, F. D., Hoepman, J.-H., Nieuwenhuizen, J. v.: Spam Filter Analysis. SEC, pp. 395--410 (2004).
16. Gardner-Stephen, P.: A Biologically Inspired Method of SPAM Detection. In: 20th International Workshop (DEXA), pp. 53--56 (2009).
17. Grimes, G. A., Hough, M. G., Signorella, M. L.: Email end users and spam: relations of gender and age group to attitudes and actions. Computers in Human Behavior 23, 1, 318--332 (2007).
18. Hovold, J.: Naive Bayes Spam Filtering Using Word-Position-Based Attributes. In: Proc. of the Conference on Email and Anti-Spam, pp. 1--8 (2005).
19. King Abdulaziz City for Science and Technology, http://www.kacst.edu.sa/en/about/Pages/default.aspx
20. National Saudi Anti-SPAM Program, http://www.spam.gov.sa/eng_stat2.htm
21. O'Brien, C., Vogel, C.: Spam filters: bayes vs. chi-squared; letters vs. words. In: Proc. of the 1st International Symposium on Information and Communication Technologies, pp. 291--296, Dublin, Ireland (2003).
22. Pfleeger, S. L., Bloom, G.: Canning Spam: Proposed Solutions to Unwanted Email. IEEE Security and Privacy 3, 2, pp. 40--47 (2005).
23. Prince Muqrin Chair for Information Security Technologies (PMC IT SECURITY), http://pmc.ksu.edu.sa/AboutPMC.aspx
24. Sahami, M., Dumais, S., Heckerman, D., Horvitz, E.: A Bayesian Approach to Filtering Junk E-Mail. In: Learning for Text Categorization: Papers from the 1998 Workshop, pp. 1--8, Madison, Wisconsin (1998).
25. Sakkis, G., Androutsopoulos, I., Paliouras, G., Karkaletsis, V., Spyropoulos, C. D., Stamatopoulos, P.: A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists. Information Retrieval 6, 1, 49--73 (2003).
26. Sorkin, D. E.: Spam Laws. The Center for Information Technology and Privacy Law, http://www.spamlaws.com/ (2009).
27. Wittel, G. L., Wu, S. F.: On Attacking Statistical Spam Filters. In: Proc. of the Conference on Email and Anti-Spam (CEAS), pp. 1--7, Mountain View, CA, USA (2004).
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 311-323
The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

A Survey on Privacy Issues in Digital Forensics

Asou Aminnezhad
Faculty of Computer Science and Information Technology
University Putra Malaysia
[email protected]

Ali Dehghantanha
Faculty of Computer Science and Information Technology
University Putra Malaysia
[email protected]

Mohd Taufik Abdullah
Faculty of Computer Science and Information Technology
University Putra Malaysia
[email protected]

ABSTRACT

Privacy has always been a major concern in computer forensics and security, and privacy issues surface in any investigation, whether computer-related or not. Protecting privacy in the physical world requires legislation; in the digital world, however, the rapid growth of technology and the ever-increasing use of digital devices generate enormous amounts of private data, making it impossible to provide a fully protected space while data are transferred, stored, and collected. Since the field's inception, forensics investigators and developers have faced the challenge of balancing the retrieval of key evidence against the infringement of user privacy. This paper surveys developmental trends in computer forensics and security in various aspects of achieving such a balance. In addition, the paper analyses each scenario to determine the trend of solutions in these aspects and to evaluate their effectiveness in resolving the aforementioned issues.

KEYWORDS

Privacy, Computer Forensics, Digital Forensics, Security.

1 INTRODUCTION

Computer forensics has always been a field that grows alongside technology. As networks become ever more available and data transfer through them gets faster, the risks involved get higher. Malicious software, tools, and methodologies are designed and implemented every day to exploit networks and their associated data storage, extracting private information that can be used in various crimes. This is where computer forensics and security come in. The field scientifically collects, preserves, and recovers latent evidence from crime scenes using dedicated techniques and tools. Computer forensics is the science of identifying, analyzing, preserving, documenting, and presenting evidence and information from digital and electronic devices, and it is meant to preserve the privacy of users from being exploited.
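The "preserve" step above is commonly anchored by cryptographic hashes computed at acquisition time, so that any later tampering with the evidence is detectable. A minimal sketch in Python (the function name and chunk size are illustrative, not taken from any specific forensics toolkit):

```python
import hashlib

def acquisition_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash an evidence file in fixed-size chunks so that large disk
    images can be fingerprinted without loading them into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Recomputing the digest at presentation time and comparing it with the value recorded at collection supports the chain of custody without exposing the file's contents.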
Forensic specialists have a duty to their clients to pay attention to the data being extracted, since it can become evidence; essentially, the extraction drives the digital investigation and guides any feasible litigation. However, the process of extracting evidence itself opens up avenues for forensic investigators to infringe user privacy themselves. The private material that computer forensics can disclose includes images, encryption keys, and user passwords, and the knowledge obtained may exceed the aim of the investigation. In order to prevent such potential abuses and to protect forensics investigators as well as users, research and analysis have been done in various fields to provide solutions to this problem.

This paper comprises five sections and is presented as such: Section 2 determines the limitations of the study, collects data from research publications, and reviews related work on the application of privacy in various fields and its solutions. Section 3 analyses these solutions and determines whether privacy can be preserved from both the user's and the forensic investigator's perspective. Section 4 identifies the privacy issues overlooked by current developmental trends in privacy preservation and their potential setbacks. Section 5 concludes the paper and summarizes the overall development of technology in privacy preservation.

1.1 Limitations of the study

This paper focuses on statistical analysis based on trends from 2006. Due to the technical specificity of each paper's research field, it is not possible to rely solely on the results to reflect a holistic picture of the real trend in privacy issues in forensics investigations. It is also difficult to fully explain the development trends of privacy issues, as they are delicate in each research specimen.
The research natures and scenarios used cannot be fully depended upon, as they are not necessarily applicable in other, similar scenarios. The number of specimens provided is also too small to sustain very significant research value. In this case, where most of the papers reviewed are too specific in their corresponding research fields and purposes, it is difficult to generalize the specimens into statistical data with high accuracy. We also realize that most specimens are from the Elsevier journal platform, and we acknowledge this as a limitation on the availability of related research publications from other sources. A further limitation is the lack of graphical statistical data, as most of the papers reviewed do not belong to statistically based research; it is not practical to add statistical assumptions into graphical statistical data, as doing so would distort the accurate picture of the research.

1.2 Data Collection

In this research, a stringent data collection procedure was set up. Such a procedure is required because the resources available for achieving high-level research results are scarce, so no important data can be risked being overlooked. We consider three important analyses: research nature analysis, keyword analysis, and an individual analytic platform. A total of 21 documents were analyzed using these three approaches.

Table 1 signifies the shift of research focus when it comes to preserving privacy. It is evident that forensics and security solutions now focus more on databases and networking, following the rise of dependency on cloud computing technology, with 8 papers in that area. More data are stored in third-party databases than 5 years ago, and they have become a tempting source of valuable private information.
Under such circumstances, where it is harder to gain access to information without network access and to maintain that access for further exploitation, a shift of focus from software and systems to databases and networking is inevitable. Methodologies and frameworks still receive adequate attention, as they are the foundation of many solutions yet to be proposed.

The keyword analysis signifies the focus of each specific specimen analyzed. As shown in Table 2, the keywords used do not necessarily bear the same signatures as published in the specimens, but are grouped by what they represent; for example, a computer forensics publication and a digital forensics publication are grouped together, as they represent similar research natures. Keyword analysis provides a picture of the techniques and theories being emphasized within the timeframe of this paper. The clear focus of researchers on privacy and digital forensics issues marks the importance of balancing the two. Excluding the specific related issues, general privacy and digital forensics keywords achieved a total of 24 matches across 21 papers; this means that at least 3 papers draw a comparison between both issues in finding a balance as a major purpose of their research. The other important trend is the diversity of the research: only 11 of the 53 representable keywords identified bear more than 2 matches, meaning that more attention is given to individually specified research subjects rather than to a holistic picture of the privacy-forensics balance. The individual analytic platform is conducted as a final data collection step.
This is done by summarizing each paper and briefly explaining what the paper tries to prove and the possible benefits of the publication.

Before a forensics investigator or computer security designer works on finding evidence or setting up detection systems, the first step is always to gather information and plan. The problem with the Standard Operating Procedures (SOP) [1] of forensics investigations is that there are many instances where investigators step into information that is not necessarily related to the particular crime.

Table 1. Research nature analysis (bar chart of paper counts, 0-9, per category: methodologies and framework; software and systems; database and networking; education and networking).

Table 2. Keyword analysis (bar chart of keyword-match counts, 0-16, over 53 representable keywords, including: PIS, forensics framework, phisher, network, suspicious database queries, relational database forensics, distributed information system, sensor web, data privacy/protection, commutative encryption, homomorphic encryption, encrypted data searching, privacy-preserving forensics, private browsing, information forensics, Portable Document Format, information leakage, document security, object privacy, forensics/digital investigation, forensics computing education, forensics readiness, warrants, legal issues, privacy enhancing technologies, network intelligence, traffic analysis, log file analysis, anonymizers, anti-forensics, onion routing, statistical database, identity-based encryption, cryptography, network forensics, netflow, fraud, computer/digital forensics, and cybercrime, among others).

The Fourth Amendment of the Constitution of the United States of America is no stranger to digital forensics investigators.
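The scoping problem described above can be illustrated with a toy filter that discloses only material matching the warrant's terms while logging every item examined (the function and field names are hypothetical, not part of any actual SOP):

```python
def scoped_search(files: dict, warrant_terms: set):
    """Disclose only files matching the warrant; audit every file touched."""
    audit_log, in_scope = [], {}
    for name, text in files.items():
        matched = sorted(t for t in warrant_terms if t in text.lower())
        audit_log.append((name, matched))  # record the examination either way
        if matched:
            in_scope[name] = text          # only in-scope files are disclosed
    return in_scope, audit_log
```

The audit log records every file the search touched, which is the kind of trail that would let a court verify whether the search exceeded its authorization.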
The layering model proposed by [5] also protects forensics investigators themselves from infringing privacy. It allows investigators to first obtain a layer of information classified as not related to the individual before moving to the next layer. As each layer of information is justified and obtained, the layers get deeper and closer in relation to the individual, until the final layer, where the information needed for the forensics investigation is directly linked to the person.

In [6], PPINA (Protect Private Information Not Abuser) is proposed: a framework embedded in Privacy Enhancing Technologies (PET), a class of technology designed to preserve user anonymity while accessing the Internet. The framework allows users to remain anonymous unless the server has enough evidence to prove that a user is attacking it, in which case it requests a forensics investigation entity to reveal the user's identity. The framework is designed to achieve a balance between user privacy and digital forensics, where both goals are met through a harmonious combination of network forensics and PET.

The development of digital forensics and security at the software level also raises many privacy-related issues, including in information systems and related tools. The first software type examined is the counter-forensics privacy tool. A review was done in 2005 on this type of software, which prevents forensics investigators from accessing private information by wiping out data such as caches, temporary files, and registry values when executed.
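Naively, such a tool overwrites a file's contents in place before unlinking it. The sketch below (a hypothetical illustration, not one of the reviewed products) deliberately ignores slack space, file-system journals, and stray copies elsewhere on disk, which is exactly the kind of gap the evaluation that follows exposes:

```python
import os

def naive_wipe(path: str, passes: int = 3) -> None:
    """Overwrite a file with random bytes several times, then delete it.
    Simplistic by design: it does not touch unallocated space, registry
    traces, or file-system journals."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())   # force the overwrite to reach the disk
    os.remove(path)
```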
In [7], the researchers evaluated six tools in this category and found that, while the tools potentially eliminate the vast majority of targeted data, each partially or fully failed in six evaluation areas in which they claim to function: complete wiping of unallocated space; erasing targeted user and system files; registry usage records; recoverable registry archives from system restore points; recoverable data from special file-system structures; and disclosure of the tool's own activity records. The authors suggested that encryption, such as the Encrypting File System, might be a better alternative to these tools.

2 CURRENT TRENDS OF PRIVACY IN DIGITAL FORENSICS

The Fourth Amendment protects people from unreasonable searches and seizures, and warrants that allow such seizures have to be specific to their cause. For example, if a warrant is issued against an individual to be searched for evidence of drugs, any related search that turns up child pornography will not be eligible to be used against the individual. The Amendment also stretches to the interception of communication networks, including wiretapping [2]. However, the Amendment only limits what type of information may be searched and seized, not the protocols for how the search and seizure are conducted. On this ground, [2] proposed that an audit trail of the methodologies used by forensics investigators would be enough to verify whether the investigation protocols exceeded court authorization.

Apart from a general audit, much related research has also produced different models for forensics investigations in recent years. [3] proposed a framework through which enterprises can attain forensics readiness for privacy-related violations. It consists of a series of business processes and forensics steps, executed in hierarchical order, so that enterprises can conduct quality privacy-related forensics investigations of information privacy incidents. Two later models were proposed in 2010.
Firstly, [4] proposed a cryptographic model to be incorporated into the current digital investigation framework: forensics investigators first have the data owner encrypt his digital data with a key, and an index of the image of the data storage is built. Investigators then extract data, using the encryption key, only from the image sectors that match the keywords they search for. Image sectors without the keywords are never revealed to the investigators, guaranteeing privacy. The next model, proposed by [5], introduces a layering system on data that protects user privacy from violation and keeps forensics investigators from infringing it.

A similar analysis was done on Privacy-Invasive Software (PIS) by [8]. PIS is software that collects user information without the user's knowledge, such as spyware and advertising software (adware). The study found that the current tools designed to combat PIS (anti-spyware and anti-adware) fail to identify it quickly enough, or at all, and have problems classifying PIS properly. The research concluded that these tools, which run algorithms similar to those used against viruses and malware (signature identification), do not work well on PIS because PIS occupies a grey area between business facilitation and malice. In experiments, manual forensics methods provided better results.

Browsers also raise privacy-related issues, as they are used for many activities, such as online trading, that require transfers of private information. [9] published an analysis of three widely used browsers in terms of their private browsing effectiveness. Private browsing is a feature that prevents browsing history from being stored on the computer's data storage.
The authors concluded that, while none of the three browsers displays visible evidence in private browsing mode, related data can still be extracted with the proper forensics tools and methodology. From the user's viewpoint, the authors also concluded that Google Chrome and Mozilla Firefox are better private browsing solutions than Internet Explorer.

The Portable Document Format (PDF), invented by Adobe, is credited with better security than other document formats. In [10], the researchers released a review of this format, suggesting that PDF is subject to information leakage due to several of its interactive features, including flagging content as "deleted" instead of really deleting it and allowing the tracing of IP addresses along its distribution, and that it is very exposed to hackers who collect this information while using PDF to conduct malicious cyber-attacks. The authors demonstrated the problems with several tools and attacks, and suggested a few administrator-level solutions for dealing with PDFs, such as checking the nature of received PDF files and using systems like EES (the Elsevier Editorial System) to monitor PDF files.

[11] reviewed the concept of Onion Routing, pointing out that the evolution of the concept in preserving privacy has raised difficulties during investigations. Onion Routing is created to prevent traffic analysis by third parties entirely, by encrypting socket connections and acting as a proxy instead. Only the adjacent routers along the anonymous connection can "unpeel" a layer of encryption as the packets approach their destination, preventing hijacking and man-in-the-middle attacks. However, the author argued that the same technology could be used by criminals to prevent traffic analysis by forensics investigators and to bypass censorship, or combined with other concepts to perform malicious attacks on networks.
Such a concept makes it very difficult for forensics investigators to collect evidence, as there are too few avenues for third parties to access the information packets: access must be gained from inside the chain of the connection, or by tracing the last router's communication with the destination, which is the weakest protection in the chain.

In [12], the researchers published their findings on preserving privacy in forensic DNA databases. Such databases are designed to be centralized and usable by forensics investigators globally to identify criminals based on DNA matches. To address the risk of such information leaking to parties for non-investigative purposes, the authors proposed a framework that reworks the database access controls to accept only legitimate forensic queries, such as queries on blood samples and cell tissues found at crime scenes.

In [13], the researcher outlined his research on privacy issues raised by sensor webs and distributed information systems, an active field after the 9/11 incident. Distributed information systems are information-collecting systems with huge data repositories, including private information such as financial and communications records. Sensor webs use small, independent, wireless sensors to collect and share information about their environment. The author proposed several policies to maintain privacy in distributed information systems and sensor webs: fundamental security primitives such as low-level encryption and authentication, human interfaces to limit queries, selective revelation of data, strong audits, and better querying technologies, together with policy experimentation, security and legal analysis, and masking strategies.
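The query-restriction idea behind the DNA-database framework above can be sketched as a gatekeeper that admits only recognized forensic query types; the type names below are illustrative placeholders, not the authors' actual schema:

```python
# Illustrative whitelist: only match requests on crime-scene material pass.
LEGITIMATE_QUERY_TYPES = {"crime_scene_blood_sample", "crime_scene_cell_tissue"}

def gatekeeper(query_type: str, profile: str):
    """Admit legitimate forensic match requests; reject everything else."""
    if query_type not in LEGITIMATE_QUERY_TYPES:
        raise PermissionError(f"non-investigative query type: {query_type!r}")
    return ("match_request", query_type, profile)
```

Rejecting everything outside the whitelist, rather than blacklisting known-bad queries, is what keeps the database unusable for bulk, non-investigative extraction.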
Another networking issue arises with shared and remote servers, which store data for users as a form of third-party data storage. Essentially there are two problems here. Firstly, these servers are owned by third-party service providers, so gaining access without revealing to them what investigators are looking for is difficult due to permission grants (privacy preservation). Secondly, the servers' remote nature also makes it difficult to trace evidence across a large number of shared and distributed stores using the traditional forensics method of imaging (cloning) the storage devices. The usual privacy issue of tampering with irrelevant data also exists. To solve these problems, [14] proposed two schemes: homomorphic and commutative encryption. In the homomorphic scheme, both the administrator of the remote servers and the investigators encrypt their data and queries; the administrator then uses the encrypted queries, under the investigator's key, to search the server for relevant data, and the investigator decrypts the results with the administrator's key. The commutative scheme introduces a Trusted Third Party (TTP) that supervises the administrator to prevent unfair play; the details are similar to the homomorphic scheme, with another layer of commutative-law-based encryption applied by the TTP before the data storage is searched. Both schemes allow investigators to obtain the information they need without exposing it to the administrators of the remote servers.

In [15], the researchers presented an approach to detecting, through queries, the parties accessing leaked information from a relational database. The authors argued that a query can be deemed suspicious if and only if the disclosed secret information could be inferred from its answers. To determine this, a series of optimization steps involving the concepts of replaceable tuples, certificates, and database instances is developed in relational mathematics.
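The commutative layer can be illustrated with a toy SRA-style cipher based on modular exponentiation, where applying two parties' keys in either order yields the same ciphertext. This is a sketch of the commutative property only, not the scheme in [14], and the prime and exponents below are illustrative, far from secure parameter choices:

```python
# Toy commutative cipher: enc(enc(m, a), b) == enc(enc(m, b), a) because
# (m^a)^b = (m^b)^a = m^(a*b) mod p. NOT cryptographically secure as-is.
P = 2**61 - 1  # a Mersenne prime; exponents must be coprime to P - 1

def keypair(e: int):
    """Return (encryption exponent, matching decryption exponent)."""
    return e, pow(e, -1, P - 1)

def enc(m: int, key: int) -> int:
    return pow(m, key, P)
```

A TTP can therefore add its own layer on top of the administrator's without either side decrypting first, and the layers can be stripped off in any order.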
An algorithm is then constructed from these optimization steps to determine whether a query is suspicious with respect to a secret and a database instance.

In [16], a framework to preserve privacy while handling network flow records was proposed in 2011. Network flow recording collects information about network traffic sessions; this information can contain very private data, including network users' identities, their activities on the network, the amount of data transferred, and the services used. The authors proposed a framework of integrated tools and concepts to prevent such data from falling into the wrong hands. The framework is divided into three sections: data collection and traffic flow recording; combined encryption using Identity-Based Encryption and the Advanced Encryption Standard; and statistical database modelling with inference controls. The framework preserves privacy in two phases: encryption and decryption of the collected data, and construction of statistical reports in such a way that inference controls prevent responses to suspicious queries.

To combat phishing, which often leads to identity theft, [17] proposed a framework in 2008 to counter-phish phishers using a fake service (a phoneypot) with traceable credential data (phoneytokens). When a phisher is identified, he or she is directed to the phoneypot and transacts with it, transferring phoneytokens into the phisher's collection server. This allows investigators to trace and profile the phisher's identity through these tokens. The authors argued that even if the counter-phishing attempt is discovered, it will have caused the phisher enough trouble to avoid the target in the future, protecting the user from further exploitation by phishing attacks.

In general, database systems are supposed to store and handle data in a proper manner. In [18], the researchers published findings from 2007 that proved this wrong.
They concluded that database systems do not necessarily remove stored data securely after deletion: remnant data and operations can be found in allocated storage, and database systems also make redundant copies of data items that can be found in file systems. These data present a strong threat to privacy: not only may investigators find themselves dealing with unwarranted data, but criminals may also access them for malicious purposes. To avoid this, the authors designed a set of transparency principles to ensure the secure deletion of data and modified the internals of a database system (MySQL) to encrypt the expunction log while avoiding most of the performance impact that usually accompanies overwriting and encryption.

In 2008, [19] published a paper explaining the importance of practicing computer forensics in today's networked organizations. It outlined the key questions, including the definition of computer forensics, its importance, the legal aspects involved, and the online resources available for organizations to understand computer forensics in a nutshell.

In [20], a paper addressed a rising problem of professionalism when digital forensics meets other fields. The author pointed out that in many scenarios where InfoSec professionals are deployed to work on digital crime investigations, their duties are tightly limited by laws and legal systems and lack the intersection with business requirements from enterprises and government. He argued that coordination between different departments is essential to achieving investigation goals, and hence proposed a GRC-InfoSec compliance effort.
A few suggestions put forth include a legal research database that cross-references regulatory actions and legal case citations to IT-specific laws and guidelines, and the presentation of resulting costs and business disruption. (GRC stands for Governance, Risk management, and Compliance.)

As for education, [21] published a system that produces file-system images for forensic computing training courses. The system, known as forensig and developed with Python and Qemu, allows instructors to script certain user behaviors, such as deleting and copying files; the script is then executed to produce an image that the students can analyze, and the results can be matched against the input script. It solves the problem of instructors using second-hand hard disks for analysis practice, which often contain private data.

Besides that, [22] tackles cybercrime-related issues; privacy as a fundamental right and a comparison of legal issues between countries were discussed in the workshop. In addition, there were a few works on privacy issues that may arise during malware analysis [23,24], the analysis of cloud and virtualized environments [25-27], and pervasive and ubiquitous systems [28-32]. With the growing usage of mobile devices and the Voice over IP (VoIP) protocol, several researchers have tried to provide privacy-sound models for investigation in these environments [33-36]. Finally, there were models for protecting forensics logs while considering user privacy when the logs are accessed [37,38].

3 DISCUSSION AND ANALYSIS OF RESULTS

We believe that the development of solutions and frameworks to contain privacy issues in various fields is not synchronized. Our analysis is done field by field, with comparisons to related fields and their effects as a whole on privacy preservation.
We found that, while research in one field contributed compelling solutions that might be a long-term answer to privacy preservation, the same is not necessarily the case in another field. To analyze the development of each field, we consider the stakeholders in each section from two perspectives: the users' and the forensics investigators'.

3.1 Privacy Preservation from the User's Perspective

We found that, for users, the major obstacle to preserving privacy is the lack of knowledge and understanding. General users do not know the technicalities of how networks and data storage are managed, nor their rights regarding the use of their personal and private information by organizations. Hence, research and development of frameworks and systems for preserving the privacy of users' data focus more on passive preservation, without users knowing how the framework or system preserves their data. We found this to be very effective, yet deceiving at the same time. Where frameworks are applied to networks and databases, for example the inference controls and encryption framework implemented for network flow recording and traffic analysis, onion routing, the cryptographic approach to forensic DNA databases, homomorphic and commutative encryption, and the sensor web protection framework, the solutions are usually effective in tackling situational crises in data privacy, and users usually do not know such solutions are implemented to protect their data from exploitation.
However, the review of counter-forensics privacy tools, the analysis of how database systems delete data, and the problems with how the Portable Document Format "deletes" data showed how deceptively these tools and systems fail to live up to expectations, presenting a false picture that they deliver on their tasks. Especially since users generally do not know whether these tools work exactly as they expect, and assume that they do, private data are constantly under threat of exploitation by malicious parties, with no warning to make users aware of the situation of their private data.

We also found that privacy preservation can never be achieved fully. The proposed frameworks and models, with their encryption and technologies, share a similar issue: it is particularly hard to design a fully protected system, so constraints and assumptions are added into the calculus to prove that the frameworks and models function under those constraints. Mentions of "future works" or manual audits appear in particularly general models, including those for sensor webs and distributed information systems, database systems, relational database query controls, and counter-phishing. This presents another issue: not all users are aware of the scenarios in which their data would most likely be exploited, or of which scenario their current data storage is in. This contributes to a further problem: when user privacy is breached, engaging the different kinds of professionalism needed to handle the investigation becomes difficult due to the lack of standardization and understanding of the scenarios and the status quo. From these flaws, we understand that, while development and research to better preserve user privacy are improving, the idea of a fully protected framework or model will not materialize in the near future.
It is important for users to understand that securing their private information is in their own best interest, particularly now that cloud computing is on the rise and more remote and shared data storage is available to them. Users must accept responsibility for their own personal information and combine, as far as possible, several of the privacy-preserving solutions that have been developed in order to protect their data while networking. Picking the right browser for private browsing and using the services of trusted organizations with proven, functioning privacy preservation policies and technologies are examples of the decisions and combinations of models and frameworks that secure private data better. We also think that users must always be aware that their private data might be leaked. Such awareness is necessary given that privacy preservation is still at a developmental stage, redefining its borders and the extent of the protection it should provide. Users must be prepared to face such scenarios and to seek solutions when leaks happen, and should know how forensics investigators conduct investigations without further threatening their privacy. To conclude this subsection, we believe that users need a general understanding of how technologies aid privacy preservation while they store data on networks and use tools and services, and of whether these technologies actually deliver their functions. We also believe that users must understand that technology alone can only help so much: better privacy preservation is a collective effort that combines technologies with professionalism and expertise from other fields.
It is also important that users are prepared to deal with situations in which their privacy has been breached, and to seek the best solutions available, including forensics investigations. It is also evident to us that the development of privacy preservation techniques and tools leans more towards technical solutions than towards a holistic approach, diverting focus from the underlying problem.

3.2 Privacy Preservation from the Forensics Investigators' Perspective

The job of forensics investigators is to collect, preserve and analyse information, then reconstruct the events of a crime. We found that privacy preservation from the forensics investigators' perspective is always a dilemma strongly linked with user privacy and legal systems, as many related works point out. We concur that the procedural methodologies investigators use to collect, preserve and analyse information open potential avenues for infringing user privacy. Our agreement rests on a general assumption that forensic investigators have a vested interest in this information: either it is important in proving a court case or a crime, or it is sought for personal use, which often involves malicious purposes. We found that the related research and proposed solutions have both positive and negative effects on forensics investigations. We argue that the limitations and constraints implemented in these systems and models do help protect forensics investigators from infringing privacy, but, on the other hand, prevent them from conducting investigations in a more direct and effective way. We explain this at both levels.
On the positive side, the constraints applied in various frameworks, such as homomorphic and commutative encryption, onion routing, inference controls, the use of DNA blood and tissue samples from the crime scene as key queries, sequential data release based on relational levels, and the network flow recording framework, all demonstrate a broad use of constraints to keep unrelated data from being exposed to forensics investigators during an investigation. We believe that sequential data release based on relational levels is particularly important for addressing privacy issues and balancing user privacy against the legal need to access private data, as it grants access to private information through a specific process rather than through queries and encryption organized in a general way. We believe that integrating these technologies can further aid forensics investigators; using sequential release of information based on relational levels as a framework for shaping organized queries is one example of combining both techniques during an investigation. However, there are negative sides as well, concerning the non-technical aspects of dealing with privacy. The most obvious impact is that proposed frameworks such as cross-referencing encrypted queries with data, onion routing and strong audit directly limit the avenues investigators can take in their investigations. Considering that crime investigations are generally time sensitive, the constraints these frameworks impose may prolong the already time-consuming investigation process, as investigators must now plan more technical and direct methods in order to extract the right evidence.
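To make the commutative encryption idea concrete, the following is a minimal sketch, not the scheme used in any of the surveyed works: an SRA-style cipher in which both the data owner and the investigator encrypt by modular exponentiation over a shared prime, so the two encryption layers can be applied and removed in either order. All key values here are illustrative.

```python
# Hedged sketch: SRA-style commutative encryption. Parameters and key
# choices are illustrative, not taken from the surveyed frameworks.
from math import gcd

P = 2**127 - 1  # a Mersenne prime, shared by both parties

def make_key(e):
    """Return (encryption exponent, matching decryption exponent)."""
    assert gcd(e, P - 1) == 1
    return e, pow(e, -1, P - 1)

def enc(m, e): return pow(m, e, P)
def dec(c, d): return pow(c, d, P)

inv_e, inv_d = make_key(65537)    # investigator's key pair
own_e, own_d = make_key(1000003)  # data owner's key pair

m = 123456789
# Commutativity: either encryption order yields the same ciphertext.
assert enc(enc(m, inv_e), own_e) == enc(enc(m, own_e), inv_e)
# Layers can also be removed in any order.
assert dec(dec(enc(enc(m, inv_e), own_e), inv_d), own_d) == m
```

Because layers peel off in any order, an investigator can match a doubly encrypted query against doubly encrypted records without either party revealing its plaintext to the other, which is the property the surveyed frameworks rely on.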
Besides that, the possibility of extracting wrong or irrelevant evidence still exists regardless of these frameworks. Tracing private information by keywords alone, without knowing the content, does not necessarily reflect the nature of the data collected: the data might be useless to the investigation while still risking the exposure of private information. Finally, we found that ambiguity always surrounds privacy issues where forensics investigators are concerned. A forensics investigator is an individual equipped with decent knowledge of computer security; if such an individual's purpose in obtaining private information is malicious, the data will still leak into the wrong hands anyway. In other words, however far technology has advanced in preserving privacy, data can still be leaked and exposed once they are used and managed by someone other than the user. While such technologies deter forensics investigators from misusing the extracted information, they do not guarantee that the information will not be misused, intentionally or unintentionally, in investigators' hands. To conclude, the proposed frameworks, technologies, models and tools believed to keep forensics investigators from infringing user privacy during investigations may not be as one-sided as they seem. We believe that the rationality and professionalism of forensic investigators are crucial when handling private data, since their expertise in computer security is sufficient for them to know how these technologies protect private data.
We also believe that such technologies must remain in place to keep forensics investigators from drifting away from their professionalism, but the negative impacts of this deterrence might jeopardize privacy even further, with irrelevant information possibly leaking out anyway and the investigation process being prolonged. We conclude that it is important for forensics investigators to know the sensitivity of the data they handle in each investigation and to understand that their professionalism is essential to preserving privacy.

3.3 Privacy Preservation from the Technologies' Perspective

We found that, from a technology perspective, the current development of cyber security and digital forensics for preserving privacy may have reached a bottleneck, with the latest developments constrained to a handful of general security measures. This brings little positive improvement to the field and produces negative effects as well. We analysed the reviews and highlight several examples to support our findings. The first problem with current technologies is the similarity of techniques. Across almost all the frameworks and models reviewed, whether for database systems, remote servers, relational databases or network flow recording, the security measures look similar in terms of their algorithms, which combine encryption, data deletion and access controls. We concur that some of the combinations are effective, such as onion routing together with sequential data release, in keeping private data from being exposed to unrelated parties. In general scenarios, however, similarity among security frameworks often means faster workarounds by malicious hackers: because the frameworks share a common structure, each one gives malicious parties another example from which to work their way around the security system.
We also noticed that in some of the proposed frameworks the authors state assumptions that, if violated, would jeopardize the system, and offer a contingency solution. In one such scenario, onion routing, the author notes how the framework would also harm investigators should it be used against them. Since onion routing makes traffic analysis by third parties impossible, it would be extremely difficult to trace or extract information when the routing method is used by malicious users for tracking and profiling purposes. This is a typical example of how technologies, even in the cyber security field, can reverse the intended results and produce unexpected, undesired effects when used by the wrong party. The same applies to the commutative encryption example: the framework only works properly under the assumption that the administrator provides all database information in encrypted form. Should this not be the case, the information extracted by forensics investigators may not only be irrelevant; the investigation itself is jeopardized, as investigators would likely miss important evidence when reconstructing the sequence of events of the crime. To conclude, the development of technologies in cyber security and digital forensics is very much predicated on technicalities alone, and does not necessarily improve privacy preservation as much as expected. The similarity of the proposed frameworks and models, together with the possibility of these technologies falling into the wrong hands, are issues that must be solved at the grassroots level for privacy preservation to succeed.
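The layered encryption that makes onion routing opaque to traffic analysis can be sketched as follows. This is a toy illustration only: the relay keys, the XOR keystream cipher and the three-hop route are all invented here, and real systems such as Tor use authenticated, negotiated circuit keys rather than anything this simple.

```python
# Hedged sketch of onion routing's layered encryption using a toy
# hash-based XOR stream cipher (illustrative only, not secure).
import hashlib
from itertools import count

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from key via SHA-256 in counter mode."""
    out = b""
    for i in count():
        if len(out) >= n:
            return out[:n]
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()

def xor_layer(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

route = [b"relay-1-key", b"relay-2-key", b"relay-3-key"]  # invented keys
msg = b"meet at the usual place"

# The sender wraps one layer per relay, innermost layer for the last hop.
onion = msg
for key in reversed(route):
    onion = xor_layer(onion, key)

# Each relay peels exactly one layer; only the exit relay sees plaintext,
# and no single relay knows both the origin and the content.
for key in route:
    onion = xor_layer(onion, key)
assert onion == msg
```

The point the survey makes follows directly from the structure: since every intermediate node sees only ciphertext, an investigator performing traffic analysis is in exactly the same position as any other third party.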
We believe that, beyond technical development, technologies will need to take into consideration the other aspects that influence digital forensics and cyber security, including education, business requirements and professionalism from related fields, and work together so that a more holistic improvement in preserving privacy can be achieved. We also argue that technologies in digital forensics and security can backfire and become dangerous when turned around by malicious users intent on harming and infringing user privacy.

4 CRITICALLY OVERLOOKED ISSUES

We also note that the focus on education and awareness of the intention behind protecting and preserving privacy in the professional forensics field is not adequate to strike a balance between preserving privacy and completing investigations at a high standard. We find this particularly detrimental: technologies continuously rolled out onto the commercial market cannot be used satisfactorily by professional forensics investigators without proper training and awareness. This opens up more possibilities of abuse without consent, or abuse without a motive, by investigators. Awareness is likewise not emphasized on the user's side, which exposes users to a higher risk of being abused under the same paradigm. Simply put, even with the latest technologies and frameworks in place to preserve privacy, they are rendered useless if the parties using them are unaware of their potential, and those parties instead risk being abused through these very technologies.

5 CONCLUSION

As mentioned in the analysis section, we believe that privacy issues stem from intention and are made possible by the use of technology. Technology, however, has evolved to the point of being applicable to almost every industry; a good example is how database technology is used to store DNA samples of criminals, which extends into medical forensics for a start. Research focus should now be placed on solving the issue at its root rather than on introducing more technical countermeasures in the field, which many of the publications reviewed here have shown to be usable for both privacy preservation and privacy exploitation. This paper has identified various privacy issues in cyber security and digital forensics, covering the techniques used to protect the privacy of data in forensic investigations, how forensics investigators may infringe user privacy while conducting investigations, and how user privacy is always under threat without proper protection. It has also reviewed the current shift in development trends in this industry, why that shift may have happened, and what drives it. The paper reviewed various fields and their development of the techniques and technologies that address this problem, describing each field in a nutshell to explain how these technologies work and how they approach the problem of preserving privacy. The reviews were split into three sections, each with its corresponding fields of review and explanation. The paper then analysed these reviews from the user's and the forensics investigator's perspectives, asking whether such developments in cyber security and digital forensics actually improve efforts to preserve privacy. The paper concluded that, while every development takes a positive approach and solves the problem its authors set out to solve, the issue of privacy preservation remains, with non-technical aspects of professional practice and the ambiguity of scenarios causing some approaches to be counterproductive.
The paper also analysed how, at a technical level, advanced technologies in digital forensics and security face a development bottleneck and could bring equal harm to current efforts to preserve privacy.

6 REFERENCES

[1] I-Long Lin, Yun-Sheng Yen, Annie Chang, "A Study on Digital Forensics Standard Operation Procedure for Wireless Cybercrime," International Journal of Computer Engineering Science (IJCES), volume 2, issue 3, 2012.
[2] C. W. Adams, "Legal issues pertaining to the development of digital forensic tools," Third International Workshop on Systematic Approaches to Digital Forensic Engineering, pp. 123-132, 2008.
[3] K. Reddy and H. Venter, "A Forensic Framework for Handling Information Privacy Incidents," Advances in Digital Forensics V, pp. 143-155, 2009.
[4] Frank Y. W. Law et al., "Protecting Digital Data Privacy in Computer Forensic Examination," Systematic Approaches to Digital Forensic Engineering (SADFE), 2011.
[5] N. J. Croft and M. S. Olivier, "Sequenced release of privacy-accurate information in a forensic investigation," Digital Investigation, volume 7, pp. 1-7, 2010.
[6] G. Antoniou, C. Wilson, and D. Geneiatakis, "PPINA - A Forensic Investigation Protocol for Privacy Enhancing Technologies," Proceedings of the 10th IFIP Conference on Communications and Multimedia Security, pp. 185-195, 2006.
[7] M. Geiger and L. F. Cranor, "Counter-forensic privacy tools," Privacy in the Electronic Society, 2005.
[8] M. Boldt and B. Carlsson, "Analysing countermeasures against privacy-invasive software," ICSEA, 2006.
[9] H. Said, N. Al Mutawa, I. Al Awadhi and M. Guimaraes, "Forensic analysis of private browsing artifacts," International Conference on Innovations in Information Technology, 2011.
[10] A. Castiglione, A. De Santis and C. Soriente, "Security and privacy issues in the Portable Document Format," The Journal of Systems and Software, volume 83, pp. 1813-1822, 2010.
[11] D. Forte, "Advances in Onion Routing: Description and backtracing/investigation problems," Digital Investigation, volume 3, pp. 85-88, 2006.
[12] P. Bohannon, M. Jakobsson and S. Srikwan, "Cryptographic Approaches to Privacy in Forensic DNA Databases," Lecture Notes in Computer Science, volume 1751, pp. 373-390, 2000.
[13] J. D. Tygar, "Privacy in sensor webs and distributed information systems," Software Security, pp. 84-95, 2003.
[14] Y. M. Lai, Xueling Zheng, K. P. Chow, Lucas Chi Kwong Hui, Siu-Ming Yiu, "Privacy preserving confidential forensic investigation for shared or remote servers," International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 378-383, 2011.
[15] S. Böttcher, R. Hartel and M. Kirschner, "Detecting suspicious relational database queries," Third International Conference on Availability, Reliability and Security, 2008.
[16] B. Shebaro and J. R. Crandall, "Privacy-preserving network flow recording," Digital Investigation, volume 8, pp. 90-100, 2011.
[17] S. Gajek and A. Sadeghi, "A forensic framework for tracing phishers," LNCS, volume 6102, pp. 19-33, Springer, 2008.
[18] P. Stahlberg, G. Miklau, and B. N. Levine, "Threats to privacy in the forensic analysis of database systems," ACM International Conference on Management of Data (SIGMOD/PODS), 2007.
[19] US-CERT, Computer Forensics, 2008.
[20] S. M. Giordano, "Applying Information Security and Privacy Principles to Governance, Risk Management & Compliance," 2010.
[21] C. Moch and F. C. Freiling, "The forensic image generator generator," Fifth International Conference on IT Security Incident Management and IT Forensics, 2009.
[22] J. R. Agustina and F. Insa, "Challenges before crime in a digital era: Outsmarting cybercrime offenders," Workshop on Cybercrime, Computer Crime Prevention and the Surveillance Society, volume 27, pp. 211-212, 2011.
[23] F. Daryabar, A. Dehghantanha, H. G. Broujerdi, "Investigation of Malware Defence and Detection Techniques," International Journal of Digital Information and Wireless Communications (IJDIWC), volume 1, issue 3, pp. 645-650, 2012.
[24] F. Daryabar, A. Dehghantanha, N. I. Udzir, "Investigation of bypassing malware defences and malware detections," Conference on Information Assurance and Security (IAS), pp. 173-178, 2011.
[25] M. Damshenas, A. Dehghantanha, R. Mahmoud, S. Bin Shamsuddin, "Forensics investigation challenges in cloud computing environments," Cyber Warfare and Digital Forensics (CyberSec), pp. 190-194, 2012.
[26] F. Daryabar, A. Dehghantanha, F. Norouzi, F. Mahmoodi, "Analysis of virtual honeynet and VLAN-based virtual networks," Science & Engineering Research (SHUSER), pp. 73-70, 2011.
[27] S. H. Mohtasebi, A. Dehghantanha, "Defusing the Hazards of Social Network Services," International Journal of Digital Information, pp. 504-515, 2012.
[28] A. Dehghantanha, R. Mahmod, N. I. Udzir, Z. A. Zulkarnain, "User-centered Privacy and Trust Model in Cloud Computing Systems," Computer And Network Technology, pp. 326-332, 2009.
[29] A. Dehghantanha, "XML-Based Privacy Model in Pervasive Computing," Master's thesis, Universiti Putra Malaysia, 2008.
[30] C. Sagaran, A. Dehghantanha, R. Ramli, "A User-Centered Context-sensitive Privacy Model in Pervasive Systems," Communication Software and Networks, pp. 78-82, 2010.
[31] A. Dehghantanha, N. Udzir, R. Mahmod, "Evaluating user-centered privacy model (UPM) in pervasive computing systems," Computational Intelligence in Security for Information Systems, pp. 272-284, 2011.
[32] A. Dehghantanha, R. Mahmod, "UPM: User-Centered Privacy Model in Pervasive Computing Systems," Future Computer and Communication, pp. 65-70, 2009.
[33] S. Parvez, A. Dehghantanha, H. G. Broujerdi, "Framework of digital forensics for the Samsung Star Series phone," Electronics Computer Technology (ICECT), volume 2, pp. 264-267, 2011.
[34] S. H. Mohtasebi, A. Dehghantanha, H. G. Broujerdi, "Smartphone Forensics: A Case Study with Nokia E5-00 Mobile Phone," International Journal of Digital Information and Wireless Communications (IJDIWC), volume 1, issue 3, pp. 651-655, 2012.
[35] F. N. Dezfouli, A. Dehghantanha, R. Mahmoud, "Volatile memory acquisition using backup for forensic investigation," Cyber Warfare and Digital Forensic, pp. 186-189, 2012.
[36] M. Ibrahim, M. T. Abdullah, A. Dehghantanha, "VoIP evidence model: A new forensic method for investigating VoIP malicious attacks," Cyber Security, Cyber Warfare and Digital Forensic, pp. 201-206, 2012.
[37] Y. TzeTzuen, A. Dehghantanha, A. Seddon, "Greening Digital Forensics: Opportunities and Challenges," Signal Processing and Information Technology, pp. 114-119, 2012.
[38] N. Borhan, R. Mahmod, A. Dehghantanha, "A Framework of TPM, SVM and Boot Control for Securing Forensic Logs," International Journal of Computer Application, volume 50, issue 13, pp. 65-70, 2009.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(4): 324-340 The Society of Digital Information and Wireless Communications, 2012 (ISSN: 2305-0012)

Modelling Based Approach for Reconstructing Evidence of VoIP Malicious Attacks

Mohammed Ibrahim, Mohd Taufik Abdullah and Ali Dehghantanha
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia, 43400 UPM, Serdang, Selangor, Malaysia
[email protected], {mtaufik, alid}@fsktm.upm.edu.my

ABSTRACT

Voice over Internet Protocol (VoIP) is a new communication technology that uses the Internet Protocol to provide phone services. VoIP offers various benefits, such as low monthly fees and cheaper rates for long-distance and international calls. However, VoIP is accompanied by novel security threats. Criminals often take advantage of these threats to commit illicit activities, which require digital forensic experts to acquire, analyse, reconstruct and provide digital evidence. While various methodologies and models have been proposed for detecting, analysing and providing digital evidence in VoIP forensics, at the time of writing there is no formalized model for the reconstruction of VoIP malicious attacks. Reconstruction of the attack scenario is an important technique for exposing unknown criminal acts, and this paper strives to address that gap. We propose a model for reconstructing VoIP malicious attacks. To achieve this, a formal logic approach called Secure Temporal Logic of Actions (S-TLA+) was adopted for rebuilding the attack scenario. The expected result of this model is the generation of additional related evidence, whose consistency with the existing evidence can be determined by means of the S-TLA+ model checker.

KEYWORDS

Voice over IP, S-TLA+, reconstruction, malicious attack, investigation, SIP, evidence generation, attack scenario

1 INTRODUCTION

Voice over Internet Protocol (VoIP) phone services are prevalent in modern telecommunication settings and show the potential to become the next-generation telephone system. This novel telecommunication system provides a set of platforms that differ from the closed, subscription-based environment offered by conventional public switched telephone network (PSTN) service providers [1].
The exploitation of VoIP applications has drastically changed universal communication patterns by dynamically combining video and audio (voice) data so that they traverse the network together with ordinary data packets [2]. The advantages of VoIP services include cheaper call costs for long-distance, local and international calls. Users make telephone calls with softphones or IP phones (such as Skype) and send instant messages to their friends or loved ones from their computers [3]. The development of VoIP has brought significant benefits and satisfactory services to its subscribers [2]. However, VoIP services are exposed to various security threats derived from the Internet Protocol (IP) [4]. Threats related to this new technology include denial of service, host and protocol vulnerability exploits, surveillance of calls, hijacking of calls, identity theft, eavesdropping, and the insertion, deletion and modification of audio streams [5]. Criminals take advantage of these security threats to commit illicit activities such as VoIP malicious attacks, which require the acquisition, analysis and reconstruction of digital evidence. However, detecting and analysing evidence of attacks on converged network applications is a most complicated task. Moreover, the complex configuration of the service infrastructure, such as DHCP servers, AAA servers, routers, SIP registrars, SIP proxies, DNS servers, and wireless and wired network devices, further complicates the analysis of digital evidence. As a result, reconstructing the root cause of an incident or crime scenario would be difficult without a specific model guiding the process.
1.1 Related Work

In recent times, researchers have developed new models to assist forensic analysis by providing comprehensive methodologies and sound proving techniques. Palmer [6] first proposed a framework with the following steps: identification, preservation, collection, examination, analysis, presentation and decision. The framework was presented at the first Digital Forensic Research Workshop (DFRWS) and served as the first attempt to apply forensic science to network systems. It was later extended into an abstract digital forensic model by adding preparation and approach strategy phases and replacing the decision phase with returning evidence; the resulting model works independently of system technology or the digital crime [7]. Similarly, Mandia and Prosise developed a simple and accurate incident response methodology, whose initial response phase aims to determine the incident, followed by a response strategy formulation phase [8]. Casey and Palmer [9] proposed an investigative process model that ensures appropriate handling of evidence and decreases the chance of mistakes through a comprehensive, systematic investigation. It has also been reported that Carrier and Spafford [10] adopted the process of physical investigation and proposed an integrated digital forensic process. In another approach, [11] combined existing digital forensic models into an extended model for investigating cybercrime that represents the flow of information and executes a full investigation. Baryamureeba and Tushabe reorganized the phases of Carrier and Spafford's work and enhanced the digital investigation process by adding two new phases, traceback and dynamite [12]. Other frameworks include Beebe and Clark's hierarchical, objectives-based framework for the digital investigation process [22].
However, all the aforementioned models apply to digital investigation in a generalized form. Ren and Jin [14] were the first to introduce a general model for network forensics, involving the following steps: capture, copy, transfer, analysis, investigation and presentation. After surveying the existing models, the authors in [15] suggested a new generic model for network forensics built from them, consisting of preparation, detection, collection, preservation, examination, analysis, investigation and presentation. Furthermore, many authors have proposed models for reconstructing attack events. Stephenson [16] analysed the root causes of digital incidents and applied colored Petri nets to model the events that occurred. Gladyshev and Patel [17] developed an event reconstruction approach in which potential attack scenarios are constructed from a finite state machine (FSM), discarding scenarios that deviate from the available evidence. The author in [18] used a computation model based on finite state machines together with computer history to build a model that supports existing investigations. Rekhis and Boudriga proposed in [19], [20] and [21] a formal logic entitled Investigation-based Temporal Logic of Actions (I-TLA), which can be used to prove the existence or non-existence of a potential attack scenario for the reconstruction and investigation of network malicious attacks. Pelaez and Fernandez [22], in an effort to analyse and reconstruct evidence of attacks in converged networks, proposed log correlation and normalization techniques; such techniques are effective, however, only if the data in the files or forensic logs have not been altered. The models stated above are generic rather than specific to a particular kind of attack.
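The FSM-based reconstruction idea attributed to Gladyshev and Patel can be illustrated with a small sketch. The toy state machine, event names and evidence below are invented for illustration; the technique is simply to enumerate every executable run of the machine and discard the scenarios that contradict the observed evidence.

```python
# Hedged sketch of FSM-based event reconstruction in the spirit of
# Gladyshev and Patel: enumerate candidate runs, keep only those
# consistent with the evidence. States and events are illustrative.
from itertools import product

# transitions[state][event] -> next state (a toy job-queue example)
transitions = {
    "empty":  {"add_job": "queued"},
    "queued": {"print": "empty", "add_job": "queued"},
}

def runs(start, length):
    """Yield every event sequence of the given length executable from start."""
    for events in product(["add_job", "print"], repeat=length):
        state, feasible = start, True
        for ev in events:
            if ev not in transitions[state]:
                feasible = False
                break
            state = transitions[state][ev]
        if feasible:
            yield events, state

# Evidence: the system was found in state "queued" after 3 events.
scenarios = [ev for ev, final in runs("empty", 3) if final == "queued"]
print(scenarios)  # two candidate event sequences survive the evidence
```

Discarding infeasible and evidence-inconsistent runs is exactly the filtering step the survey describes; on real systems the state space is far larger, which is why model checkers are brought in.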
Therefore, reconstructing the evidence of malicious attacks against VoIP is highly needed, because it plays an important role in revealing unknown attack scenarios. It would improve the reliability and integrity of evidence analysis in VoIP digital forensics and enhance its admissibility in a court of law. In view of that, the work in this paper focuses on the reconstruction of Session Initiation Protocol (SIP) server malicious attacks. Hence, the VoIP evidence reconstruction model (VoIPERM) is proposed, which organizes the previous model in [23] into main components and subcomponents. The model describes the VoIP system as a state machine through which information can be aggregated from the various components of the system and formulated into hypotheses that enable investigators to model the attack scenario. In the reconstructed attack scenario, actions that contradict the desirable properties of the system state machine are considered malicious [23]. Consequently, collecting both legitimate and malicious actions enables the reconstruction of an attack scenario that uncovers new evidence. To determine the consistency of additional evidence with the existing evidence, a state space representation is adopted that depicts the relationships between sets of evidence graphically. The graphical representation lets investigators see whether generated evidence supports the existing evidence, and hence reduces the accumulation of unnecessary data during the investigation [23]. Additionally, the model is capable of reconstructing the actions executed during the attack that move the system from its initial state to an unsafe state. Thus, all activities of the attacker are conceptualized to determine what, where and how the attack occurred, for proper analysis of evidence [23]. To handle ambiguities in the reconstruction of the attack scenario, S-TLA+ is applied.
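One way to picture the evidence-consistency check described above is as reachability in a directed graph of "corroborates" relations: generated evidence that cannot be connected to any existing evidence is flagged as noise. This is our own minimal sketch, not the paper's representation; the node names and edges are invented.

```python
# Hedged sketch: evidence relations as a directed graph, with a
# reachability test for consistency. All names here are invented.
from collections import deque

supports = {  # edge a -> b means "a corroborates b"
    "proxy_log_entry": ["call_record"],
    "registrar_entry": ["call_record"],
    "call_record":     ["attack_hypothesis"],
    "stray_packet":    [],  # generated, but linked to nothing known
}

def corroborates(graph, new_item, existing):
    """True if new_item reaches any existing evidence item in the graph."""
    seen, frontier = {new_item}, deque([new_item])
    while frontier:
        node = frontier.popleft()
        if node in existing:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

existing = {"attack_hypothesis"}
assert corroborates(supports, "proxy_log_entry", existing)
assert not corroborates(supports, "stray_packet", existing)
```

Items like `stray_packet` that fail the test correspond to the "unnecessary data" the model aims to keep out of the investigation.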
Essentially, the application of S-TLA+ to computer security technology is efficient and generic. S-TLA+ is built on a logic formalism that accumulates forward hypotheses when there are insufficient details to understand the compromised system [19]. In addition, there have been several works on malware investigation [24, 25], analysis of cloud and virtualized environments [26-28], privacy issues that may arise during forensic investigation [29-34], mobile device investigation [35-37] and greening the digital forensics process [38]. The main contribution of this paper is a novel model for VoIP digital forensic analysis that can integrate digital evidence from the various components of a VoIP system and reconstruct the attack scenario. Our objective is to reconstruct VoIP malicious attacks so as to generate additional evidence from the existing evidence. The remainder of the paper is arranged as follows: section 2 discusses VoIP malicious attacks; section 3 discusses VoIP digital forensic investigation; section 4 introduces the new model; section 5 discusses the S-TLC model checker; section 6 presents a case study; and section 7 concludes.

2 VoIP MALICIOUS ATTACKS

In general, the appropriate term for software built purposely to negatively affect a computer system without the consent of the user is malware [39], and the increased number of malicious activities during the last decade has caused most of the failures in computer systems [40]. Voice over IP is prone to such malware attacks through the exploitation of its related vulnerabilities. Having gained access to VoIP network devices, intruders can disrupt media services by flooding traffic, or capture and control confidential information by illicit interception of call content or call signalling.
Through impersonating servers, intruders can hijack calls and make fake calls by spoofing identities [3]. Consequently, the confidentiality, integrity and availability of the users' communications are negatively affected. VoIP services are also exploited by spammers to deliver instant messages, spam calls, or presence information. These spam calls are more problematic than the usual email spam, since they are hard to filter [3]. Similarly, attacks can traverse gateways into integrated network systems such as traditional telephony and mobile systems. Meanwhile, compromised VoIP applications form a link for breaking through security mechanisms and attacking internal networks [39]. Attackers also make use of malformed SIP messages to attack embedded web servers through database injection vectors or cross-site scripting attacks [39].

2.1 SIP Malicious Attack

As previously explained, this paper considers SIP server attacks. Several attacks are related to the SIP server, but the threat of most concern within the research community is VoIP spam. Generally, spam is an unwanted bulk email or call, deliberately used for advertising or social engineering. The author in [3] notes that "Spam wastes network bandwidth and system resources. It exists in the form of instant message (IM), Voice and presence Spam within a VoIP setting" [3]. It affects the availability of network resources to legitimate users, which can result in a denial of service (DoS) attack. Voice spam originates from a set of session initiation attempts seeking to set up a video or audio communication session; if the user accepts, the attacker proceeds to transmit a message over the real-time media. This kind of spam is referred to as classic telemarketer spam; applied to the SIP protocol it is well known as Spam over Internet Telephony (SPIT). In addition, spam is categorized into instant message spam (IM spam) and presence spam (SPPP).
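The claim that SPIT is harder to filter than email spam can be illustrated with a minimal, hypothetical rate-based heuristic: since the spam content is live media rather than inspectable text, only signalling metadata such as call rate is available. The call-log format and the threshold below are illustrative assumptions, not part of the SIP standard.

```python
from collections import Counter

def flag_spit_sources(call_log, max_calls_per_minute=10):
    """Flag callers whose INVITE rate exceeds a simple threshold.

    call_log: list of (caller_uri, minute_bucket) tuples -- an assumed format.
    Unlike email spam, the payload is real-time media, so filtering here
    relies on signalling metadata (rate) rather than content inspection.
    """
    rate = Counter((caller, minute) for caller, minute in call_log)
    return sorted({caller for (caller, minute), n in rate.items()
                   if n > max_calls_per_minute})

# A caller issuing 12 INVITEs in one minute is flagged; a normal caller is not.
log = [("sip:spammer@a.com", 0)] * 12 + [("sip:alice@b.com", 0)] * 2
print(flag_spit_sources(log))  # ['sip:spammer@a.com']
```

Such a heuristic illustrates the filtering difficulty: a patient spammer who stays under the rate threshold is indistinguishable, by signalling alone, from a legitimate caller.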
The former is like email spam, but consists of bulk, unsolicited sets of instant messages carrying the content the attacker wishes to send. IM spam is delivered using SIP MESSAGE requests with bulky subject headers, or SIP messages with text or HTML bodies. The latter is like the former, but is placed on presence requests (that is, SIP SUBSCRIBE requests) in an effort to get onto the "white list" of users in order to send them instant messages or initiate another kind of communication [3].

3 VoIP DIGITAL FORENSIC INVESTIGATION

Lin and Yen [41] define digital forensic science as "to preserve, identify, extract, record as well as interpret the computer and network system evidence and analyse through complete and perfect methods and procedures." On the other hand, forensic computing is a particularly important interdisciplinary research area founded in computer science and drawing on telecommunications and network engineering, law, justice studies, and social science [42]. To cope with security challenges, various organizations have developed numerous models and methodologies that satisfy their organizational security policies; presently, hundreds of digital forensic procedures have been developed globally [43]. The increasing number of security challenges in VoIP has likewise prompted researchers to develop several models. In VoIP digital forensics, a standard operating procedure called the VoIP Digital Evidence Forensic Standard Operating Procedure (VoIP DEFSOP) has been established [41]. Moreover, a previous study noted that there was no established research agenda in digital forensics; to resolve this, six additional research areas were proposed at the 42nd Hawaii International Conference, which include evidence modelling.
In evidence modelling, the investigation procedure is replicated for practitioners, and case modelling is done for various categories of crimes [44]. The increasing number of computer-related crimes over the last decade has pushed products and companies to support the understanding of what, who, where and how such attacks happened [45]. In line with this development, the model proposed in this paper supports the investigation and analysis of evidence by reconstructing attack scenarios related to VoIP malicious attacks. The reconstruction of the potential attack scenario then assists investigators to conceptualize what, where, and how the attack happened in the VoIP system.

4 VoIP EVIDENCE RECONSTRUCTION MODEL (VoIPERM)

The idea proposed in [43] is to assist investigators in finding and tracing the origin of attacks through the formulation of hypotheses. Our proposed model considers the VoIP system as a state machine (which observes the system properties in a given state), and the model is built from four main components, as shown in Figure 1.

Figure 1. VoIP evidence reconstruction model

The explanation of each component is as follows:

4.1 Terminal State/Available Evidence

This component observes the final state of the system at the occurrence of the crime; it is the primary source of evidence and is characterized by undesirable system behaviour. The terminal state provides the available evidence and gives an insight into the kind of action acted upon the compromised system [23]. Other indications of system compromise described by [21] include an undesirable safety property of some system components, or an unexpected temporal property. Let S be the set of all reachable states in the VoIP system and P be the collection of all desirable properties in a given state.
If the properties observed in the final state do not belong to the set of desirable properties, then the final state of the system is said to be unsafe. Any action, in the sequence of actions associated with the reachable states, that leads the system to such an unsafe state is said to be a malicious action, and it signifies one piece of the available evidence [23].

4.2 Information Gathering

This component aims to collect and gather information that gives details about the VoIP system state. It involves the following subcomponents. VoIP components: these components provide services such as voice mail access, user interaction media control, protocol conversion, call set-up, and so on. The components can be proxy servers, call processing servers, media gateways and so on, depending on the type of protocol in use [23]. Moreover, software and hardware behaviours are observed to give the investigator some clues about the VoIP system state. VoIP system states are defined as the valuation of component variables that change as a result of the actions acted upon them. The component variables that change when an action is executed in a given state are referred to as flexible variables, written v_1, ..., v_n. For any action that transforms a state s into a state t, v and v' denote the values of a variable in the old and the new state respectively. The properties of v and v' are then observed to decide whether they belong to the system's desirable properties [23]. VoIP vulnerabilities: these refer to any faults an adversary can abuse to commit a crime. Vulnerabilities make a system more prone to attack by a threat, or permit some degree of chance for an attack to be successful [46]. In VoIP systems, vulnerabilities include weaknesses of the operating systems and network infrastructures. Some weaknesses stem from poorly designed and implemented security mechanisms and mis-configured network devices. The VoIP protocol stack is also associated with weaknesses that attackers exploit to access text-based credentials and other private information.
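The state-machine view above can be sketched as follows: a state is a valuation of flexible variables, an action updates the old valuation v to the new valuation v', and a state is unsafe when it violates a desirable property. The property and variable names below are illustrative assumptions, not taken from the paper's case data.

```python
# Sketch: VoIP system as a state machine over flexible variables.

def desirable(state):
    """Illustrative desirable property P (an assumption): administrator
    access must only ever be held by an authenticated user."""
    return state["access_level"] != "admin" or state["authenticated"]

def apply_action(state, action):
    """Return the new state t produced by executing `action` on state s."""
    new_state = dict(state)             # old valuation v ...
    new_state.update(action["effect"])  # ... updated to new valuation v'
    return new_state

initial = {"access_level": "none", "authenticated": False}
brute_force = {"name": "brute_force_default_password",
               "effect": {"access_level": "admin"}}

final = apply_action(initial, brute_force)
# The final state grants admin access without authentication -> unsafe,
# so the action that produced it is marked malicious (available evidence).
print(desirable(final))  # False: the producing action is malicious
```

The same check, applied along the whole sequence of actions, is what separates the legitimate actions from the malicious ones in the model.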
4.3 Evidence Generation

In this component, hypotheses are formulated based on the information gathered in the previous stage. The formulated hypotheses are used in the process of finding and generating additional evidence. The formal logic of digital investigation is applied to consider the available evidence collected from different sources and to handle incompleteness in it by generating a series of crime scenarios according to the formulated hypotheses. This stage involves the following subcomponents. Hypothesis formulation: to overcome the lack of system details encountered during the investigation, hypotheses are formulated based on the intruder's anticipated knowledge about the system and the details of the information captured from the VoIP components. The purpose of hypothesis formulation is to predict the unknown VoIP malicious attack. In this case, specific variables need to be attached to the hypotheses and VoIP components respectively, and assumptions made to establish a relationship between the variables; this determines the effect of a hypothesis if it is applied to the VoIP components. To achieve this, three main requirements are set out: (1) hypotheses should establish a relationship between system states (that is, VoIP component states in this regard), so as to avoid violating the original properties (type invariant) of the system under investigation; (2) all hypotheses found to be contradictory are eliminated, to avoid adding deceptive hypotheses to a generated attack scenario; (3) to efficiently select and minimize the number of hypotheses through which a node is reached, the relationship among the hypotheses should be clearly expressed [19]. The process of investigation thus relies on the formulation of hypotheses to describe the occurrence of the crime.
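The three requirements can be read as a filtering step over candidate hypotheses. The sketch below is a simplified illustration under assumed encodings: a hypothesis is a name plus the variable assignments it assumes, which is not the paper's formal S-TLA+ notation.

```python
# Sketch: select hypotheses that (1) relate consistently to the component
# state, (2) do not contradict one another, and (3) share assumptions
# so the accumulated hypothesis set stays minimal.

def consistent(hyp, known):
    """Req. 1: a hypothesis must not contradict any already-known value
    (a stand-in for not violating the type invariant)."""
    return all(known.get(var, val) == val for var, val in hyp["assumes"].items())

def select_hypotheses(candidates, state):
    chosen, assumed = [], {}
    for hyp in candidates:
        # Req. 2: drop hypotheses contradicting the state or prior choices.
        if not consistent(hyp, state) or not consistent(hyp, assumed):
            continue
        chosen.append(hyp["name"])
        assumed.update(hyp["assumes"])  # Req. 3: shared assumptions merge
    return chosen

state = {"sip_version": "initial"}  # known from information gathering
candidates = [
    {"name": "plaintext_credentials", "assumes": {"sip_version": "initial"}},
    {"name": "encrypted_credentials", "assumes": {"sip_version": "secure"}},
]
print(select_hypotheses(candidates, state))  # ['plaintext_credentials']
```

Here the second candidate is eliminated because it contradicts the observed SIP version, matching requirement (2).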
At the lowest levels of an investigation, hypotheses are used to reconstruct events and to abstract data into files and complex storage types, while at higher levels hypotheses are used to explain user actions and sequences of events [45]. An investigation is a process that applies scientific techniques to formulate and test hypotheses. At this point, VoIP variables are regarded as indigenous variables, while variables formed by hypotheses are denoted as exogenous variables; this describes how the VoIP components are expected to behave if the formulated hypotheses are executed. Assumptions are made based on the expected knowledge of the attacker about the system: the sets of hypotheses are variables signifying the attacker's expected knowledge about the system, which are different from the flexible variables mentioned earlier. All the variables derived from hypothesis formulation are referred to as constrained variables. While hypotheses are aggregated, care should be taken to avoid adding an ambiguous hypothesis that can prevent the system from moving to the next state; in S-TLA+ this signifies inconsistency and is denoted by a dedicated symbol [19]. Modelling of attack scenario: digital forensic practice demands the generation of a temporal analysis that logically reconstructs the crime [26]. According to [47], in crime investigation it should be possible to reason about crime scenarios: explanations of states, and of the events that change those states, that may have occurred in the real world. However, because attack scenarios are complex to understand, it is vital to develop a model that simplifies their description and representation within a collection of information and allows new attacks to be regenerated from the existing ones [19].
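The distinction between flexible (indigenous) variables and constrained (exogenous) variables can be sketched as below. The `UNKNOWN` sentinel stands in for the fictive value a constrained variable holds before any hypothesis fixes it, and the variable names are illustrative assumptions.

```python
# Sketch: a state carries flexible variables (observed from VoIP components)
# and constrained variables (introduced only by hypotheses). A constrained
# variable no hypothesis has fixed yet keeps a fictive value.

UNKNOWN = object()  # fictive value: "no hypothesis has assumed this yet"

state = {
    "flexible":    {"access_level": "none", "service_port": 5060},
    "constrained": {"attacker_knows_password": UNKNOWN},
}

def assume(state, var, value):
    """Accumulate a hypothesis, rejecting ambiguous (contradictory) ones,
    which in S-TLA+ would signify inconsistency."""
    current = state["constrained"].get(var, UNKNOWN)
    if current is not UNKNOWN and current != value:
        raise ValueError(f"inconsistent hypothesis on {var!r}")
    new = {k: dict(v) for k, v in state.items()}
    new["constrained"][var] = value
    return new

s1 = assume(state, "attacker_knows_password", True)
print(s1["constrained"]["attacker_knows_password"])  # True
# A contradictory second hypothesis on the same variable would be rejected.
```

Rejecting the contradictory assumption is the sketch's analogue of refusing an ambiguous hypothesis that would block the system's next-state transition.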
For this reason, it is essential to model VoIP malicious attacks so that investigators can understand the attack scenario and determine how and where to acquire digital evidence. In this regard, instead of modelling both the system and witness statements as finite automata as in [40], S-TLA+ is used to model the attack scenario, as it supports logic formulation under uncertainty. In addition, evidence can easily be identified with S-TLA+ using a state predicate that evaluates the relevant system variables [19]. Moreover, S-TLA+ is an advancement over the Temporal Logic of Actions (TLA). In TLA, a system is specified by a formula of the form Init ∧ □[Next]_v ∧ L, relating the set of all its authorised behaviours: it expresses a system whose initial state satisfies Init and where every transition satisfies the next-state relation Next or leaves the tuple v of specification variables unchanged. The infinite behaviour of the system is constrained by the liveness property L (written as a conjunction of weak and strong fairness conditions on actions). In this regard, TLA can be used in S-TLA+ to illustrate a system's progress from one state to another through the execution of an action under a given hypothesis [11]. Meanwhile, in S-TLA+ a constrained variable whose hypothesis is not yet expressed assumes a fictive value [19]. An action is a Boolean function over a pair of states: it is true of states s and t if, when each unprimed variable is replaced by its value in state s and each primed variable by its value in state t, the action evaluates to true [19]. Likewise, each non-assumed constrained variable in state s is replaced with its assumed value in state t. If the conjunction of the executed actions preserves the desirable properties of the system behaviour, the set of actions is said to be legitimate.
Likewise, if the conjunction violates the property satisfying the desirable behaviour of the system, the set of actions is said to be malicious [23]. An attack scenario fragment is the collection of both legitimate and malicious actions that move the system to an unsafe state; the attack scenario is defined as such a collection [23]. Testing the attack scenario: the purpose of testing a generated attack scenario is to ascertain its reliability with respect to the system behaviours. The properties of the system at a given state are examined, and the investigator compares the properties of the generated attack scenarios with the system's final state. If any of the scenarios satisfies the properties of the final state, the investigator generates and prints the digital evidence; otherwise the hypotheses are reformulated [23]: a generated attack scenario that satisfies the properties of the system's final state reproduces the available evidence.

4.4 Print Generated Evidence

Evidence can be generated from the attack scenario using the forward and backward chaining phases adopted from inferring scenarios with S-TLC [19]. After the proposed model has been logically proved with S-TLA+, it is expected to reconstruct the malicious attack scenario in the form of specifications that can be verified using the S-TLA+ model checker, called S-TLC. S-TLC builds a directed graph, founded on a state-space representation, to verify the logical flow of specifications written in the S-TLA+ formal language. Complete reconstructions of attack scenario fragments are therefore represented, and the logical relationships between them illustrated, on a directed graph [23].
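The forward-chaining construction that S-TLC performs over the state graph can be sketched as a breadth-first exploration. The action set, predicates and state encoding below are simplified illustrative assumptions, not the actual S-TLC implementation.

```python
from collections import deque

# Sketch of a forward-chaining pass: explore states reachable from the
# initial state, keep those satisfying the invariant, and collect states
# satisfying the evidence predicate (the observed terminal state).

def forward_chain(initial, actions, invariant, evidence):
    graph = {initial: set()}       # node -> set of predecessor nodes
    uf, ub = deque([initial]), []  # UF: frontier queue; UB: evidence states
    while uf:
        s = uf.popleft()
        for action in actions:
            t = action(s)
            if t is None or not invariant(t):
                continue
            if t in graph:
                graph[t].add(s)    # node seen before: just add a pointer
                continue
            graph[t] = {s}
            (ub if evidence(t) else uf).append(t)
    return graph, ub

# Illustrative actions over states encoded as (access_level, spit_sent).
def brute_force(s):  return ("admin", s[1]) if s[0] == "none" else None
def send_spit(s):    return (s[0], True) if s[0] == "admin" else None

graph, ub = forward_chain(("none", False), [brute_force, send_spit],
                          invariant=lambda s: True,
                          evidence=lambda s: s[1])  # terminal: SPIT was sent
print(ub)  # [('admin', True)] -- the state matching the available evidence
```

The states collected in `ub` are the starting points for the backward-chaining pass, which walks predecessors to enumerate the scenarios that could have produced the evidence.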
At this point, the investigator is likely to realize what, how, where and why the incident was accomplished in the VoIP system. The resulting outcome of the graph is new generated evidence that matches the existing evidence. For each generated attack scenario, all the flexible variables and constrained variables are evaluated: the valuation of all non-constrained variables is called the node core, and the valuation of all constrained variables is called the node label. Each reachable state can then be represented on the directed graph G by its node core and node label.

5 S-TLC MODEL CHECKER: STATE SPACE REPRESENTATION

A state is represented on the generated graph as a valuation of all its variables, including the constrained ones. This involves two notions: the node core, which represents the valuation of all the non-constrained variables, and the node label, which is the valuation of all the constrained variables under a given hypothesis. The S-TLC algorithm is built on three data structures, G, UF and UB: G refers to the reachable directed graph under construction, while UF and UB are FIFO (first in, first out) queues containing states whose successors are not yet computed, used during the forward and backward chaining phases respectively. Given a state t, tn is used to denote its node core, tc to describe its resulting environment (a set of hypotheses) and Label(G, t) to refer to its label in graph G. The S-TLC model checker works in three phases [19].

5.1 Initialization Phase

The initialization phase is the first stage of the S-TLC algorithm and involves the following steps:
1. G, UF and UB are created and initialized to the empty graph and empty sequences respectively. At this step, each state satisfying the initial predicate is computed and then checked against the invariant predicate Invariant (a state predicate to be satisfied by each reachable state).
2. On satisfying the predicate Invariant, the state is appended to graph G with a pointer to the null state and a label equal to the set of hypotheses relative to the current state; otherwise, an error is generated. If the state does not satisfy the evidence predicate EvidenceState (a predicate characterizing the system terminal state that represents the digital evidence), it is attached to UF; otherwise it is considered a terminal state and appended to UB, from which it can be retrieved in the backward chaining phase [19].

5.2 Forward Chaining Phase

In this phase, all the scenarios that originate from the set of initial system states are inferred in forward chaining. This involves the generation of the new sets of hypotheses and evidence that are consequent to these scenarios. During this phase, and until the queue becomes empty, a state s is retrieved from the tail of UF and its successor states are computed. For every successor state t satisfying the predicate Constraint (specified to assert a bound on the set of reachable states): if the predicate Invariant is not satisfied, an error is generated and the algorithm terminates; otherwise state t is appended to G as follows:
1. If the node core tn does not exist in G, a new node (set to tn) is appended to the graph with a label equal to tc and a predecessor equal to sn. State t is appended to UB if it satisfies the predicate EvidenceState; otherwise it is attached to UF.
2. If there exists a node x in G that is equal to tn and whose label includes tc, then node t was added previously to G; in that case, a pointer is simply added from x to the predecessor state sn.
3. If there exists a node x in G that is equal to tn, but whose label does not include tc, then the node label is updated as follows: tc is added to Label(G, x).
Any environment from Label(G, x) which is a superset of some other element of this label is deleted, to ensure hypothesis minimality. If tc is still in Label(G, x), then x is pointed to the predecessor state sn, and node t is appended to UB if it satisfies the predicate EvidenceState; otherwise it is attached to UF [19]. The resulting graph is a set of scenarios that end in any state satisfying the predicate EvidenceState and/or Constraint.

5.3 Backward Chaining Phase

In this phase, all the scenarios that could produce the states satisfying the predicate EvidenceState generated in forward chaining are constructed. During this phase, and until the queue becomes empty, the tail of UB, described by state t, is retrieved, and its predecessor states (i.e. the set of states si such that (si, t) satisfies the action Next) which are not terminal states and satisfy the predicates Invariant and Constraint are computed (states that do not satisfy the predicate Invariant are discarded, because this step aims simply to generate additional explanations). Each computed state s is appended to G as follows:
1. If sn is not in G, a new node (set to sn) is appended to G with a label equal to the environment sc; then a pointer is added from node tn to sn, and state s is appended to UB.
2. If there exists a node x in G that is equal to sn and whose label includes sc, then node s was added previously to G; in that case a pointer is simply added from tn to the predecessor node, and s is appended to UB.
3. If there exists a node x in G that is equal to sn, but whose label does not include sc, then the node label is updated as follows: sc is added to Label(G, x). Any environment from Label(G, x) which is a superset of some other element of this label is deleted, to ensure hypothesis minimality. If sc is still contained in the label of node x, then node t is pointed to the predecessor node x, and the node is appended to UB.

The outcome of the three phases is a graph G containing the set of possible causes relative to the collected evidence. It embodies different initial system states apart from those described by the specification [19].

6 CASE STUDY

To investigate a VoIP malicious attack using the proposed model, the following case study on the reconstruction of a Spam over Internet Telephony (SPIT) attack is presented, investigating the denial of service experienced by some VoIP users as a result of VoIP spam. A direct investigation shows that the network bandwidth and other resources have been exhausted by the server, as it was busy receiving and sending audio message requests to SIP URIs (Uniform Resource Identifiers). According to the VoIP evidence reconstruction model, the first stage emphasizes the identification of the terminal state and the available evidence of the attack.

6.1 Terminal State/Available Evidence

Exhaustion of bandwidth and other resources/sending of audio message requests to SIP URIs.

6.2 Information Gathering

This includes the following. VoIP components: these comprise both the signalling and media infrastructure. The former is based on the Session Initiation Protocol (SIP) in particular, including the SIP stack (SS), which is responsible for sending, receiving, manufacturing and parsing SIP messages, and SIP addressing (SA), which is based on the URI. The latter considers the Real-time Transport Protocol (RTP) stacks, which code and decode, compress and expand, and encapsulate and demultiplex media flows. VoIP vulnerabilities: these can result from the following: a. Unchanged default passwords of deployed VoIP platforms, which can be strongly vulnerable to remote brute-force attack. b.
Many of the services that expose data also interact as web services with the VoIP system, and these are open to common vulnerabilities such as cross-site request forgery and cross-site scripting. c. Many phones expose a service that allows administrators to gather statistics, information and remote configuration settings; these ports open the door to information disclosure that attackers can use to gain more insight into a network and identify the VoIP phones. d. Wrongly configured access devices that broadcast messages enable an attacker to sniff messages in the VoIP domain. e. The initial version of SIP allows plain text-based credentials to pass through the access device.

6.3 Evidence Generation

This stage involves the following. Hypothesis formulation: a. A hypothesis stating that a VoIP platform running a service on a default password can grant access to an intruder after a remote brute-force attack. b. A hypothesis stating that service ports on VoIP phones expose data and also interact as web services; an intruder that has access to a VoIP service can exploit such a vulnerability in the form of cross-site scripting to gain administrator access. c. A hypothesis stating that phones exposing a service that allows administrators to gather statistics, information and remote configuration can grant an intruder direct access to administrative responsibilities. d. A hypothesis stating that there is a wrongly configured access device which broadcasts SIP messages; this enables the attacker to intercept SIP messages. e. A hypothesis stating that the messages are running on the initial version of SIP, which has a vulnerability that sends SIP messages in plain text; the intruder that intercepts the messages can extract user information from them. f.
An intruder equipped with administrator functions can create, decode and send a request message. g. An intruder can extract SIP extensions/URIs, after scanning for ports running on 5060 in the SIP domain, by sending an OPTIONS message request, in order to send SIP messages. h. A hypothesis stating that the credentials were encrypted as cipher text; an encryption engine would enable the intruder to digest the SIP message header and obtain other information. Modelling of the attack scenario: in this case S-TLA+ is used. The specification describes the available evidence with a predicate which uses the function request to state that the machine is busy sending INVITE audio messages. In this segment the hacking scenario fragments are represented in the form of hypothetical actions, as described below. a. Under the hypothesis that there is a vulnerability whereby VoIP runs a service on a default password, an intruder can easily brute-force the password, gain access, and raise his privilege from no access to access level on the VoIP network. b. Under the hypothesis that the service ports on VoIP have vulnerabilities, exploiting a service-port vulnerability can raise the accessibility level of an attacker from access level to administrator access. c. Under the hypothesis that some VoIP phones expose a service that allows administrators to gather information for remote configuration, exploiting the phone vulnerability can grant direct administrator access. d. Under the hypothesis that there are wrongly configured access devices which allow messages to be broadcast, and that SIP has a vulnerability that sends messages with plain-text credentials, an intruder can intercept SIP messages and eavesdrop. e. A user with administrative access can manufacture, decode and encapsulate SIP messages using the SIP stack (SS).
f. The user requires SIP extensions or URIs to send INVITE messages; being equipped with administrative access, the intruder sends an OPTIONS message request to extract SIP URIs, provided that the service port is running on port 5060. g. The intruder takes advantage of the fact that the device has an encryption engine, which enables him to digest the cipher text in the SIP message header field value and extract other information related to the SIP message credentials. h. The intruder, with administrative access and a manufactured SIP message, then sends an INVITE audio message to the server as a message request. i. The user then logs out from the VoIP domain. The S-TLA+ attack scenario fragment module is depicted in Figure 2.

Figure 2. Generated attack scenario fragment using S-TLA+

Testing the generated scenario: given a set of generated attack scenarios, if any of the scenarios satisfies the terminal state of the system under investigation, then digital evidence is generated and printed; otherwise the hypotheses are reformulated. In the case study presented above, an action in the generated scenarios satisfied the available evidence of the terminal state of the system. Print generated evidence: to generate evidence from the attack scenario fragment presented in Figure 2, we used the forward and backward chaining phases as explained above, adopted from inferring scenarios with S-TLC [19].

Figure 3. Forward chaining phase VoIP attack scenario

The graph of Figure 3 shows the main possible attack scenario on VoIP. Initially, there is no user accessing the VoIP system. The default password was not changed during the implementation of the system.
An intruder exploits this vulnerability by performing the brute-force action and gains access to the VoIP service. The intruder further exploits a vulnerability in the service ports and gains administrator access, or exploits a VoIP phone vulnerability that grants access to administrative functions and obtains administrator access. The hacker can intercept all the incoming messages to the server by executing the interception action, as a result of exploiting the vulnerability whereby messages are sent as plain text under the initial version of SIP. With administrative power, the intruder extracts SIP URIs from the intercepted messages and sends INVITE audio messages to the collected URIs, without any hypothesis being established for the last two actions; therefore the node labels remain the same. The intruder then logs out, leaving evidence within the system. The underlined text in the generated graph is the available evidence, while the rest is new evidence generated during the investigation. The generated attack scenario prevented inconsistency from occurring: one action is excluded from the generated scenario because it contradicts another of the executed actions. The generated graph after execution of the forward and backward chaining phases is shown in Figure 4; it shows a newly generated scenario. It follows the same pattern as the forward chaining phase, but in this case the VoIP system holds information on received messages that is not accessible to the intruder. The intruder performs the same actions as in the forward chaining phase and is granted administrator access; thereafter, the intruder manufactures SIP INVITE messages.
The intruder accesses SIP URIs and sends a SIP invite audio message to the collected URIs by performing the two corresponding actions, respectively. No hypotheses had to be established for these actions to be executed; the intruder then logs out from the system after executing an action and leaves digital evidence. The underlined texts in the generated graph are the available evidence, while the other texts are new evidence generated during the reconstruction of the attack scenario.

Figure 4. Backward chaining phase, scenario attacks on VoIP

7 CONCLUSIONS

In this paper, we proposed a model for reconstructing Voice over IP (VoIP) malicious attacks. The model generates more specific evidence that matches the existing evidence through the reconstruction of potential attack scenarios. Consequently, it provides significant information on what, where, why and how a particular attack happens in a VoIP system. To complement this study, there remains a need for the reconstruction of anonymous and peer-to-peer SIP malicious attacks.

REFERENCES

1. Yun-Sheng Yen, I-Long Lin, Bo-Lin Wu: A Study on the Mechanisms of VoIP Attacks: Analysis and Digital Evidence. Journal of Digital Investigation 8, 56-67, ScienceDirect (2011).
2. Jaun C. Pelaez: Using Misuse Patterns for VoIP Steganalysis. In: 20th International Workshop on Database and Expert Systems Application (2009).
3. Patric Park: Voice over IP Security. Cisco Press, ISBN: 1587054698 (2009).
4. Hsien-Ming Hsu, Yeali S. Sun, Meng Chang Chen: Collaborative Forensic Framework for VoIP Services in Multi-network Environments. In: Proc. 2008 IEEE International Workshops on Intelligence and Security Informatics, pp. 260-271, Springer-Verlag Berlin Heidelberg (2008).
5. Jill Slay and Mathew Simon: Voice over IP: Privacy and Forensic Implication.
International Journal of Digital Crime and Forensics (IJDCF), IGI Global (2009).
6. Palmer G.: A Road Map for Digital Forensic Research. In: First Digital Forensic Research Workshop (DFRWS) Technical Report, New York (2001).
7. Mark Reith, Clint Carr and Gregg Gunsch: An Examination of Digital Forensic Models. International Journal of Digital Evidence, Vol. 1, Issue 3, Fall (2002).
8. Mandia K., Procise C.: Incident Response and Computer Forensics. In: Emmanuel S. Pilli, R.C. Joshi, Rajdeep Niyogi: Network Forensic Frameworks: Survey and Research Challenges. Digital Investigation, pp. 1-14, Elsevier (2010).
9. Casey E., Palmer G.: The Investigative Process. In: Emmanuel S. Pilli, R.C. Joshi, Rajdeep Niyogi: Network Forensic Frameworks: Survey and Research Challenges. Digital Investigation, pp. 1-14, Elsevier (2010).
10. Barian Carrier, Eugene Spafford: Getting Physical with the Digital Investigation Process. International Journal of Digital Evidence, Vol. 2, Issue 2, Fall (2003).
11. Ciarduhain O.S.: An Extended Model of Cybercrime Investigation. International Journal of Digital Evidence, Vol. 3, Issue 1, Summer (2004).
12. Baryamureeba V., Tushabe F.: The Enhanced Digital Investigation Process Model. In: Proceedings of the Fourth Digital Forensic Research Workshop (DFRWS) (2004). www.makerere.ac.ug/ics
13. Beebe N.L., Clark J.G.: A Hierarchical, Objectives-Based Framework for the Digital Investigations Process. Digital Investigation 2(2), pp. 146-166, Elsevier (2005).
14. Ren W., Jin H.: Modeling the Network Forensic Behavior. In: Security and Privacy for Emerging Areas in Communication Networks, 2005, Workshop of the 1st International Conference, pp. 1-8, IEEE (2005).
15. Emmanuel S. Pilli, R.C. Joshi, Rajdeep Niyogi: Network Forensic Frameworks: Survey and Research Challenges. Digital Investigation, pp. 1-14, Elsevier (2010).
16. Peter Stephenson: Modeling of Post-incident Root Cause Analysis. International Journal of Digital Evidence 2, pp. 1-16 (2003).
17.
Pavel Gladyshev and Ahmed Patel: Finite State Machine Approach to Digital Event Reconstruction. International Journal of Digital Forensic & Incident, ACM, pp. 130-149 (2004).
18. Brian D. Carrier and Eugene H. Spafford: An Event-Based Digital Forensic Investigation Framework. In: Proc. DFRWS 2004, pp. 1-12 (2004).
19. Slim Rekhis: Theoretical Aspects of Digital Investigation of Security Incidents. PhD thesis, Communication Networks and Security (CN&S) Research Laboratory (2008).
20. Slim Rekhis and Noureddine Boudriga: Logic-Based Approach for Digital Forensic Investigation in Communication Networks. Computers & Security, pp. 1-21, Elsevier (2011).
21. Slim Rekhis and Noureddine Boudriga: A Formal Logic-Based Language and an Automated Verification Tool for Computer Forensic Investigation in Communication Networks. In: 2005 ACM Symposium on Applied Computing, pp. 287-289 (2005).
22. Jaun C. Pelaez and Eduardo B. Fernandez: Network Forensic Models for Converged Architectures. International Journal on Advances in Security, Vol. 3, No. 1 & 2 (2010).
23. Mohammed Ibrahim, Mohd Taufik Abdullah, Ali Dehghantanha: VoIP Evidence Model: A New Forensic Method for Investigating VoIP Malicious Attacks. In: Cyber Security, Cyber Warfare and Digital Forensic (CyberSec), IEEE International Conference, Malaysia (2012).
24. F. Daryabar, A. Dehghantanha, H.G. Broujerdi: "Investigation of Malware Defence and Detection Techniques," International Journal of Digital Information and Wireless Communications (IJDIWC), Vol. 1, Issue 3, pp. 645-650, 2012.
25. F. Daryabar, A. Dehghantanha, N.I. Udzir: "Investigation of Bypassing Malware Defences and Malware Detections," Conference on Information Assurance and Security (IAS), pp. 173-178, 2011.
26. M. Damshenas, A. Dehghantanha, R. Mahmoud, S.
Bin Shamsuddin: "Forensics Investigation Challenges in Cloud Computing Environments," Cyber Warfare and Digital Forensics (CyberSec), pp. 190-194, 2012.
27. F. Daryabar, A. Dehghantanha, F. Norouzi, F. Mahmoodi: "Analysis of Virtual Honeynet and VLAN-based Virtual Networks," Science & Engineering Research (SHUSER), pp. 73-70, 2011.
28. S.H. Mohtasebi, A. Dehghantanha: "Defusing the Hazards of Social Network Services," International Journal of Digital Information, pp. 504-515, 2012.
29. A. Dehghantanha, R. Mahmod, N.I. Udzir, Z.A. Zulkarnain: "User-centered Privacy and Trust Model in Cloud Computing Systems," Computer and Network Technology, pp. 326-332, 2009.
30. A. Dehghantanha: "XML-Based Privacy Model in Pervasive Computing," Master thesis, University Putra Malaysia, 2008.
31. C. Sagaran, A. Dehghantanha, R. Ramli: "A User-Centered Context-sensitive Privacy Model in Pervasive Systems," Communication Software and Networks, pp. 78-82, 2010.
32. A. Dehghantanha, N. Udzir, R. Mahmod: "Evaluating User-centered Privacy Model (UPM) in Pervasive Computing Systems," Computational Intelligence in Security for Information Systems, pp. 272-284, 2011.
33. A. Dehghantanha, R. Mahmod: "UPM: User-Centered Privacy Model in Pervasive Computing Systems," Future Computer and Communication, pp. 65-70, 2009.
34. A. Aminnezhad, A. Dehghantanha, M.T. Abdullah: "A Survey on Privacy Issues in Digital Forensics," International Journal of Cyber-Security and Digital Forensics (IJCSDF), Vol. 1, Issue 4, pp. 311-323, 2013.
35. S. Parvez, A. Dehghantanha, H.G. Broujerdi: "Framework of Digital Forensics for the Samsung Star Series Phone," Electronics Computer Technology (ICECT), Vol. 2, pp. 264-267, 2011.
36. S.H. Mohtasebi, A. Dehghantanha, H.G. Broujerdi: "Smartphone Forensics: A Case Study with Nokia E5-00 Mobile Phone," International Journal of Digital Information and Wireless Communications (IJDIWC), Vol. 1, Issue 3, pp. 651-655, 2012.
37. F.N. Dezfouli, A. Dehghantanha, R.
Mahmoud: "Volatile Memory Acquisition Using Backup for Forensic Investigation," Cyber Warfare and Digital Forensic (CyberSec), pp. 186-189, 2012.
38. Y. TzeTzuen, A. Dehghantanha, A. Seddon: "Greening Digital Forensics: Opportunities and Challenges," Signal Processing and Information Technology, pp. 114-119, 2012.
39. Mohammed Nassar, Radu State, Olivier Festor: VoIP Malware: Attack Tool & Attack Scenarios. In: 2009 IEEE International Conference on Communications (2009).
40. Mouna Jouini, Anis Ben Aissa, Latifa Ben Arfa Rabai, Ali Milli: Towards Quantitative Measures of Information Security: A Cloud Computing Case Study. International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(3): 248-262, The Society of Digital Information and Wireless Communications (ISSN: 2305-0012) (2012).
41. I-Long Lin, Yun-Sheng Yen: VoIP Digital Evidence Standard Operating Procedure. International Journal of Research and Reviews in Computer Science 2, pp. 173 (2011).
42. Jill Slay and Mathew Simon: Voice over IP Forensics. In: e-Forensics '08, Proceedings of the 1st International Conference on Forensic Applications and Techniques in Telecommunications, Information, and Multimedia, Adelaide, Australia (2008).
43. Siti Rahayu Selamat, Robiah Yusof, Shaharin Sahib, Nor Hafeizah Hassan, Mohd Faizal Abdollah, Zaheera Zainal Abidin: Traceability in Digital Forensic Investigation Process. In: 2011 IEEE Conference on Open Systems, pp. 101-106 (2011).
44. Kara Nance, Brian Hay, Matt Bishop: Digital Forensics: Defining a Research Agenda. In: Proc. 42nd Hawaii International Conference on System Sciences (2009).
45. Karen Kent, Suzanne Chevalier, Tim Grance, Hung Dang: Integrating Forensic Techniques into Incident Response. A white paper submitted by Guidance Software Inc., UK (2006).
46. Tamjidyamcholo A., Dawoud R.A.: Genetic Algorithm for Risk Reduction of Information Security.
International Journal of Cyber-Security and Digital Forensics (IJCSDF) 1(1): 59-66, The Society of Digital Information and Wireless Communications (ISSN: 2305-0012) (2012).
47. Jeroen Keppens and John Zeleznikow: A Model Based Approach for Generating Plausible Crime Scenarios from Evidence. In: Proc. of the 9th International Conference on Artificial Intelligence and Law (2003).