´ LIBRE DE BRUXELLESUNIVERSITE Facult´ e des Sciences D´ epartement d’Informatique Digital reverse engineering of executable files. Obfuscation techniques against patching. Nikita Veshchikov Promoteur : Prof. Olivier Markowitch M´emoire pr´esent´e en vue de l’obtention du grade de Master en Sciences Informatiques Ann´ ee acad´ emique 2010 - 2011 Acknowledgments First of all, I would like to thank my family for their patience. I would like to thank my advisor - Olivier Markowitch for his advices and support. I am grateful to everyone who helped me editing this paper, especially Tony Osborne and Julia Zavyalova. I would also like to thank persons who suggested interesting ideas and new sections for this work - Liran Lerman and Markus Lindstr¨om. A very special thanks goes to everyone who listened to my explanations about reverse engineering, code obfuscation and error correction on numerous occasions. Thank you for your patience! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Definition of reverse engineering . . . . . . . . .4 Notations . . . . . . . . . . . . . . 5. . . . . . . .1 Hex editors . . . . . . . . 5. . . . . . 4 4 5 5 3 Definitions of reverse engineering 3. . . . . . . . . . . . . . . . . 3. .2. . . . . . . . . . . . . . . . . . . . . . 2. . . . . . . .3 Operating systems . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . . .2 Organization . . . . . . . . . . . . . . . . . . . . 5. . . . 14 14 15 15 16 17 20 6 Reversing tools 6. . . . 4. . . . .1 Programming languages . . . . . . . . . . . . . . . . . . . . . . . . . .1 Copyright . . . . . . . . . . . . . . . . . . . . . . .3 Definition of digital reverse engineering . . .1 Goal and context 1. . . . . . . . . .2 Changes due to optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1. 1. . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Compilers . . . 
.Contents 1 Introduction 1. 8 8 9 10 4 Legal aspects of digital reverse engineering 4. . . . . . . . . . .1. . . . . . . . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . 11 12 12 12 II Minimum knowledge required to perform digital reverse engineering 13 5 Theoretical knowledge 5. . . . . . . . . . . . . . . . . . . . . .2 Sandboxes . . . . . . . . . 6. . . . .2 Reverse engineering in military . . . . . . . . .1 General changes in the code structure 5. . . .3 Contributions . . . . . . . . . . .3 Digital reverse engineering . . . . . . . . 22 22 24 . . . . . . . . . . . . . . . . . . . . . . . .1 Determining language used . . . . . . . 4. . . . . . . . . . Understanding reverse engineering 1 1 1 2 2 3 2 History 2. . . . . . . . . . . . . 1. . . .1 Reasons for reverse engineering . . . . . .2 Patent . . . . . . . . . .1 Intellectual property protection . . . . . . .1 Intuition behind reverse engineering . . . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . . .2. . Monitoring tools . . . . . . . .4 Algorithm TPCA: Checker Network . . . .1 File type recognition . . 7. . . 9. . . . . . . . . . . . . . . . . . . . . . . . 69 69 III 71 6. . . . . . .3 Error correcting codes . . . . . . . . . . .1 Manual checking . . . . . . . . . . 9. . . . . . . . . 9. . .5 Hardware protections summary . . . 7. . . . . . . . . . 24 25 26 26 28 29 29 30 32 33 33 7 Code obfuscation 7. .1 The definition . . . . . . . . . . . . . . . .1 Training . . . . . . . . .3. . . .i . . 7. . . 7. . . . 9. . . . . . . . . . . . . . . . .1 The idea behind error detection and correction 9. . . . . . . . . . . . . . . . Visual representations . .3. . . . .2 The problem . . . . . . . . . . . . . . Contribution 9 Anti-patching 9.3 Check results of computations . 7. . . . . . .9 6. . . . . . . . . . . 7. . . . . . . . .3 6. . . . . . . . . . . . . . . .1 Why obfuscate? . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . .2 Strings and pattern searching . . .1 A known problem . . . . . . . . . . . . . . . . . . . 39 39 39 40 40 41 47 52 58 60 62 63 64 65 65 65 66 67 68 8 Applied reversing 8. . . . . . . . . . . .3. . . . 7. . . . . . . . . . . . . . . Disassemblers . . . . .3 Anti-reversing techniques .4 6. . . . . 7. . .2 Cryptoprocessors . . . . . . . . 7. . . . . . . . . .2 Control flow obfuscation . . . . . . . . . .7 6. . . . . . .3. .4 Trusted computing . . . . .1 Decompilers . . . . . . .6 6. . . . . . Automated deobfuscators . . . . .7 Eliminating symbolic information .1 Virtual environments . . . . . . . . . . . . .5 6. . . . . .9. . . . . . . . .2 Existing solutions . .5 Data transformations .2 Error detecting codes . . . . . . . . . . 6. . . . . . . . . . . . .9. . . . . . . . . . . . . . . . .3 Error detecting and error correcting codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . . . . . Dumping tools . . . . .4. 7. . .3. . . . . . . . . . . . .1 Packing techniques . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . Debuggers . . . . . . . . . . . . . . .2. . . . . . . . . .4 Pushing the reversing problem out of the software world 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . .4 Crashing and confusing reversing tools . . . . . . . . . . . . . .3.8 6. . . . . . . . . Miscellaneous useful tools . . . . . . .3. . . . . 9. . . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . .3 Dongles . . . . . . . . . . . . .8 Human reversers versus automated deobfuscators 7. . . . . . . . . . . . . . . . 7. . . . . . 9. . . . . 7. . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . .2. . . . . . . . . . . . . . . 6. . . . . . . . . . . . . 6. 9. . . . . . . . . . . .6 Hiding data . .4. . . . . . . . . . . . . . . . . . . . . 7.2. .3 Detection of digital reverse engineering . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . .2 Automatic error detection . . . . . . . . . . . . . . . . . . . . .1 Program as a service . . . . . . . . . .2. . 72 72 72 73 73 73 74 74 74 75 77 . . . 7. . . . . . . . . . . 9. . . . . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Implementation . . . . .4 Advantages and disadvantages . . . . . . . . 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 The idea . . . 96 96 96 97 99 100 106 D Internship report . .3 Other possible implementations 9. . . . . .4. . . . . . . . . . . . . . . .1 History of reverse engineering . . . .1 Reverse engineering in military . . . . C. . . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . .4 SecretValue . . . . .2 ComputeHash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . . .1. . . . . . . . . . world . . . . .4 My addition . . . . . . . . . . . . . . . . . . . . . . . C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5 AutoCorrect . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . .6 Makefile . .1 Hello. . . . . . . . . . . . C. . . . . . . . . C. . . . . . . . . . .ii 9. . . . . . . . . . . . . 79 79 81 84 86 10 Conclusions 10. . . . . . . . . . . . . . . . . . . . . B. . . . . . 93 93 93 94 95 C Code C. . . .2 Further work . . .4. . . . . . .1 Anti-patching . . . . . . . . . . . . . . . 90 90 90 A Abbreviations 91 B Images B. .3 AddChecksum C. . . . . . . . . . . . . . . . . 9. . . . 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Reverse engineering in digital world B. . . . 10. .2 Flowchart . . . . . . . . 107 . B. . . . . . . . . and improvements . . Part I of this paper is meant to introduce digital reverse engineering. Secondly. This work has three main goals. 
The first goal is to understand the basic concepts of digital reverse engineering. digital reverse engineering could be used in order to understand how a malicious program works (and then protect a system against this program) or it could be used to crack a copyright-protected software i. Then. Finally. disable its protection. Chapter 2 describes two cases from the history of reverse engineering. Then.Chapter 1 Introduction 1. Chapter 4 gives a quick overview about the legislation related to reverse engineering. There exist many kinds of obfuscations against different reversing techniques. Digital reverse engineering is a very powerful tool that can be used for good or evil. but it is also used by reverse engineers (especially by crackers.1 Goal and context Digital reverse engineering is a very interesting sub-domain of reverse engineering. Patching is a process of modifying a program. the third goal is to suggest improvements and other possible implementations for the implementation of anti-patching techniques which use error correcting codes. It consists in extracting from a software the knowledge about how this software works. One commonly used reversing technique is called patching. Obfuscation techniques are techniques used for protection against reverse engineering. Part II describes the minimum knowledge needed in order to do digital reverse engineering. The second one is to propose an anti-patching technique based on error correcting codes and to implement a proof of concept program. in order to show that this technique works. Several obfuscation techniques can be used against patching.2 Organization First of all. It is used by software developers in order to update their programs. that any reverser must have. Chapters 6 and 7 give an overview about reversing tools and techniques 1 . Developers of malware do not want their programs being disabled and developers of copyrightprotected software do not want their programs being cracked and distributed for free. 
Chapter 5 presents a theoretical background. they use patching to disable the protection of a program). For example. Chapter 3 defines the reverse engineering and the digital reverse engineering. That is why almost all software developers are concerned about reverse engineering.e. 1. The source code of the proof of concept program is given in appendix C on the page 96. Implementation of a proof of concept program which use an error correcting code in order to restore its original code if it was tampered (see Chapter 9 and Section 9. . presents existing solutions against patching and explains how error correcting codes could be used against patching. The abbreviations RE and DRE are used for reverse engineering and digital reverse engineering. explains why some developers are concerned about it. it also explains how to start practicing digital reverse engineering.4 Notations The terms reverse engineering and reversing are used interchangeably in this paper. The Chapter 10 concludes this paper. 1. as well as terms a reverse engineer and a reverser.2 used in order to prevent digital reverse engineering (also called obfuscating techniques). 1. List of difficulties related to the use of error correcting codes against patching and suggestions on how to overcome these difficulties (see Section 9.2). by pointing out that proof of concept program (described in Chapter 9) works well.3 Contributions The list of contributions: 1.4. Finally. Part III describes the proof of concept program and concludes this paper. 2. that theoretical knowledge is not enough to do digital reverse engineering. Appendix ?? is the internship report at Forensic Technology Solutions of PriceWaterhouseCoopers.3). Chapter 8 points out. A list of abbreviations used in this paper is given in appendix A on the page 91.4. Chapter 9 presents patching. Part I Understanding reverse engineering 3 . Protection removal. Replication of an existing device or object. Before delving into history. 
Here is a list of the most common goals achieved by reverse engineering (RE): Creation and the completion of documentation for an existing product or device. Mere curiosity.Chapter 2 History The history of reverse engineering is older than might appear at first glance. Reverse engineering was used long before the invention of computers. 2.1 Reasons for reverse engineering One of the most relevant questions that one might ask is: “Why do people want or need to reverse-engineer something?”. Security auditing. Because of the lack of documentation sometimes RE might help to design a device that would be able to use or otherwise interact with an existing component or product. Copyright protection or access restriction removal. For example to identify potential patent/copyright violations or to do malware analysis. Sometimes existing documentation for a device or a software is not clear enough. Military or commercial espionage to acquire sensitive data. learning from others’ successes and mistakes. It appears that there are many answers to this question. Re-designing. lets see why people do reverse engineering. Espionage. or has been lost or was never written and nobody knows how the device works. The best way to audit a secured system is to try to break it. 4 . Learning. Interoperability. Incorporation of new functionalities into an existing device or program. Academic purposes. Product analysis. 3 (page 94). Originally its characteristics were a 4. a prototype of the Tu-4 (see figure B. The third was left untouched as an original reference (For more information about the Tu-4 creation read [52] and [64]). One B-29 was given to Gromov Flight Research Institute for flight training. It was mainly used by the United States Army Air Force (USAAF) in the war against Japan. which were then reverse engineered by Tupolev Design Bureau in order to build the Tu-4. On 21 September 1942 a new Boeing Bomber B-29 Superfortress (see figure B. 
see figure B.3 Digital reverse engineering The history of digital reverse engineering (DRE) began in June 1982 with one of the most famous cases of reverse engineering . The USA refused Soviet requests to purchase this aircraft. page 93) had its first flight. on 19 May 1947. One of them was broken and could not be repaired while three others had only minor damage and were repaired.g. Tu-4 was put into mass production two years later. On 12 August 1981 the first IBM PC was released [59. 56]. tank) to upgrade an already existing equipment There are many known examples of machines that were reverse engineered. airplane. The second B-29 was fully disassembled by Tupolev Design Bureau in order to study its parts. occasionally on Soviet territory.5 2.g.2. 34].256 kB of memory. 16 kB . Based on the flight tests from this institute full instructions for B-29 as well as its flight characteristics and capabilities were provided. In this way the USSR acquired three B-29 bombers. 2. Here is the story of one such case that illustrates the military approach towards reverse engineering. In 1944 four B-29 bombers landed in the USSR. B-29 bombers made many missions to Japan and sometimes had to make emergency landings. It also had three possible operating systems (OS): IBM BASIC / PC-DOS 1.2 Reverse engineering in military As for many other things (e. the Boeing B-29 became one of the most performant (range and gross weight) bombers in the world.0 CP/M-86 (later MS-DOS) UCSD p-System . The most common military motivations for reverse engineering have been: to find any vulnerabilities in a potential enemy’s equipment to produce a copy of a component or an entire machine (e. At the same time the Soviet Union did not have any comparable bomber. A few years later. see [6]. However.implementation of the first non-IBM PC BIOS by Columbia Data Products.1 on page 93) prototype had its first flight [57.77 MHz CPU. GPS) the military was instrumental in applying RE techniques. 
Two years later after substantial modifications. Clean-room software engineering is a process of software development that intend to produce a software with a certifiable level of reliability i. The main principle: in order to copy something two separate teams are created.1. The first group reverse-engineers the original product and creates its documentation. These two sources of information would allow to recreate the BIOS. This turned out to be wrong. 42].1). The second group creates a new product based on the new documentation. It was possible to understand the inner working of the BIOS with the help of the manual. IBM wanted to be the only PC manufacturer but could not have copyright nor patent on the entire IBM PC because it was designed and manufactured with different existing parts and chips of non IBM origin.e. one has to know how to ”talk” to the PC’s motherboard in order to design a new component. The expression ”the BIOS was clean-roomed” is also used. The idea of clean room reversing is based on the principle of independent invention. 2 Do not confuse clean room reverse engineering with clean-room software engineering. Here was the problem: how to create a similar software that would be compatible with all already existing hardware without knowing the BIOS? Many companies were thinking about how to do it ever since the IBM PC had become a big success. IBM PC compatible computers were produced by the following companies: Columbia Data Products presented its IBM compatible MPC 1600 ”Multi Personal Computer” in June 1982 [6]. Here is the answer to the problem: the copy was done through the process that is called clean room reverse engineering. They could not copy IBM’s BIOS as that was illegal. having confidence that no other company could reproduce the BIOS. Compaq Computer Corporation released its version of the BIOS in November 1982. Also it was possible to acquire the original assembly code and then reverse engineer it. 
Although many companies wanted to create their own PC compatible computer they were not allowed to use IBM’s BIOS. focus on defect prevention.6 The Basic Input/Output System (BIOS) was the only component with an IBM copyright (see Section 4. Since putting a license on the hardware was not possible IBM decided to put a copyright license on the BIOS. Phoenix Software created its BIOS in 1983 and introduced it to the market in May 1984 [34. In this way IBM hoped to be the only manufacturer able to produce PCs. and in one year. IBM encouraged other manufacturers to produce add-on modules for the IBM PC. Thus anyone could reproduce the entire hardware of an IBM PC by simply buying the same chips from the same suppliers that IBM used. However. . Definition 1. The 1 Hardware specifications are not enough to produce add-ons. copying the code via RE was illegal because lawyers consider it as a copy since the reverser would see the original code which was under copyright protection. Clean room reverse engineering2 (also called Chinese wall) is a reversing method that involves two separate groups of engineers. Finally the answer was found. It released a technical reference manual with diagrams of all PC’s hardware components as well as the motherboard specifications1 for the contributing manufacturers. The task of the second team is to create a completely new product using only the documentation produced by the first team. . schemes.7 goal of the first team is to reverse engineer the subject in order to create exhaustive documentation (diagrams. Otherwise the judge will consider that the new product is an illegal copy. as it is not very cost effective compared to 20-30 years ago: in 1982 BIOS was a quite small code compared to modern software. but not the patent (see Section 4. Recreating a large modern software from scratch is a very challenging exercise. 
The most challenging part of this process is to ensure and to be able to prove that the members of the second team have never seen or studied the original product. because it was created after seeing the original product. reversing a patented product for security auditing or for studying could be very interesting.1). flow charts etc) which is then then given to the second team. Nowadays clean room RE is not widely used. However. The above example illustrates how to evade the copyright protection. Then Bob disassembled the code of the progaram and started to analyse the general structure of the program. Alice imediately called Bob and asked him to come in order to inspect her finding. a famous explorer. Alice had a friend Bob . He discovered that Alice’s finding was copiled with g++ 1. It contained about 150 bones. Bob was not able to tell directly what it was. so he took the image of the hard drive (with the suspicious file) in order to analyse it in his laboratory. 8 .8 meters long and 4. the process of reverse engineering as well as resemblance between scientific research and reverse engineering. The program contained about different 40 functions. He discovered that Alice’s finding lived about 66 million years ago. Alice imediately called Bob and asked him to come in order to inspect her finding.0. The file was a dynamically linked executable of 26 kilobytes. Once upon a time.1 Intuition behind reverse engineering Here are two little stories that illustrate the intuition behind reverse engineering.0 tall. The sceleton was 12. Alice had a friend Bob . Alice.2. but he was curious about Alice’s finding. Alice. so he extracted the fossil from the ground in order to analyse it in his laboratory. but he was curious about Alice’s finding. a system administrator. Then Bob assembled all fossils of the sceleton together and started to analyse the general structure of the sceleton..an experienced reverse engineer. Bob was not able to tell directly what it was. 
was inspecting her network when she found a strange executable file.Chapter 3 Definitions of reverse engineering 3. was exploring a desert when she found strange fossils. Bob started his analysis by dating the executable file. Bob started his analysis by dating the fossils..an experienced paleontologist. An integrated circuit might also be reverse engineered by an unscrupulous company wishing to make unlicensed copies of a popular chip. 1988). Here is the definition from the Oxford dictionary [17]: Reverse engineering is the reproduction of another manufacturer’s product following detailed examination of its construction or composition.9 He found. run it to study how it behaved with different input and then attempt to write a program oneself which behaved identically (or better). Further analysis of the progam showed that the program copies itself.a dinosaur.2 Definition of reverse engineering Different sources give slightly different definitions of reverse engineering. that the the animal is bipedal reptile .” Let us consider a more complete definition is as given in Free On-Line Dictionary of Computing [20]: “Reverse engineering is the process of analyzing an existing system to identify its components and their interrelationships and create representations of the system in another form or at a higher level of abstraction.” Here is a very compact definition by Andrew Huang. He found. that the the program comunicates through the network. based on various sources of information ([19] and those listed above): . Finaly. The first time a part of a bone (a teeth) of a Tyrannosaurus rex was found in 1874 near Golden. Bob classified Alice’s finding as a computer worm. The definition from the Cambridge dictionary [16]: “Reverse engineering is when a company copies the product of another company by looking carefully at how it is made. The first worm spreaded throug the Internet was the Moriss Worm (November 2. 
This two stories are work of fiction and any resemblance between the characters and persons living or dead is purely coincidental. one might take the executable code of a computer program. Finaly. Reverse engineering is usually undertaken in order to redesign the system for better maintainability or to produce a copy of a system without access to the design from which it was originally produced. Further analysis of the fossils showed that the animal was a predator. 3. Colorado. Bob called Alice’s finding Tyrannosaurus. that he gives in his book about reverse engineering of the Xbox [25]: “Reverse engineering is the process of extracting know-how or knowledge from an artifact” Here we would like to propose the following definition. For example. 10 Definition 2. Definition 4. Data reverse engineering is a reverse engineering of file formats.a sub-domain of reverse engineering. data structures and protocols (the structure of fields as well as the order of messages). e. In both cases the subject could be studied both passively and actively: disassembling or execution in a virtual environment in reverse engineering and observations or experiments in science. 3. Binary reverse engineering (also called code reverse engineering) is a reverse engineering of executable files. Definition 3. Reverse engineering is the process of discovering the technological principles of a man-made object or system through analysis of its structure and operation. Quite often a reverser has to do some data reverse engineering in order to perform the reversing of an executable file and vice versa. RE involves man made objects or devices whereas scientific research usually involves natural things or phenomena.3). When a person is trying to study natural phenomena by observing and analyzing it. The understanding of data structure helps to understand how it is managed and algorithms that use a data can reveal their structure. 
In both cases the study is made because the person who undertakes it does not have access to the relevant documentation or any other kind of explanations about how the subject works. malware reverse engineering.2) whereas digital reverse engineering is.3 Definition of digital reverse engineering This paper is about digital reverse engineering (DRE) (also called software reverse engineering) . In other words. Data reverse engineering. Excel and PowerPoint formats were reverse engineered and now OpenOffice can handle files of these formats. proprietary Microsoft Word. RE is not necessarily attached to the IT domain (see example in Section 2. e. BIOS (see Section 2. . Digital (or Software) Reverse Engineering is reverse engineering of any part of a computer software.g. RE is not unlike scientific researche. the whole process is called scientific research. Definition 5.g. or by reproducing it in an experiment in order to understand how it works. Generally reversers distinguish between two types of digital reverse engineering: Binary reverse engineering. reverse engineering is a methodology used to find out how things work. 19. In many cases it is hard to tell if RE of a given system is legal or not. 44. Nowadays many people try to reverse Skype’s protocol (see [12] and [11]) for different reasons (see Section 2. Here are some examples which illustrate it. The protocol used by this program to communicate between clients is also proprietary. if the structure of Skype’s protocol is revealed. compared to others mentioned above. as there is almost no clarity (see [53. Another example.Chapter 4 Legal aspects of digital reverse engineering Legal aspects of digital reverse engineering seems to be a very complicated area. Clean room reverse engineering (see Section 2. While legislation for murders. Therefore reverse engineering is still considered as a “grey area”1 . some programs are open to be reverse engineered .1. it is not a high price to pay. 
and this is an obvious way to copy any system which is protected by the copyright law (see Section 4. However.full disclosure. Maybe it would be better.1). Skype is a proprietary Voice-over-IP (VoIP) program. grey hats are somewhere in the middle. 5. which is described in [25]. is the reversing of the Xbox. There is also another important issue – the security of protocol (now it seems more like security through obscurity. which is free of charge.3) is legal in most countries. divorces. legislation for RE still remains quite unclear. On the other hand. Grey hat. 25]). 11 .CrackMe’s and KeygenMe’s (see 1 Do not confuse with grey area that referres to grey hats. stealing. white hats . see definition 22). In this way it would be possible to check if the personal data is collected and if the protocol is secure. because Skype provides very good service and has developed a good protocol. However. This is due to the fact that RE (and IT in general) is a very young domain. making copies of copyright-protected material is forbidden by the copyright law.1). as well as black hat and white hat are the terms usually used to refer security experts in terms of their attitude towards sharing the information about security issues. but the developers of the original product were not happy about it. heritage and many other domains is well developed. Skype’s owners would like to retain the monopoly ownership of the protocol. The final example is about the data reverse engineering of Skype’s protocol. the monopoly of Skype’s protocol does not reveal whether personal data is collected (and used by Skype). Their respective attitudes are: black hats .full concealment. The reversing was done for learning purposes. On the one hand. a program is copied into RAM memory and into processor’s cache memory during its execution. 4. In order to be patented an invention has to be non-obvious and novel. For example. There also exist several exceptions e.1. Copyright law contains an exception. 
specially created in order to be reverse-engineered for studying purposes, and no legislation clearly forbids it.

4.1 Intellectual property protection

The laws that protect intellectual property differ somewhat from country to country, so any reverser should know whether he is reversing a protected product. He should also be aware of the difference between a copyright and a patent.

4.1.1 Copyright

Copyright law protects any recorded (written) work of expression, e.g. programs, from the moment it was created. It gives the owner exclusive rights to distribute and reproduce his (or her) product. However, it does not protect from independent invention or from use of ideas based on the original work, e.g. it is legally possible to write a program that does the same thing as long as its code is not a copy of the original program. If you buy a program, you own a copy, but only the copyright owner can copy and distribute it. There are provisions that allow the owner of a copy to copy the program into the memory of a computer (see [...], chapter 8).

Copyright law is very complex and differs between countries. In most countries the length of the copyright is the author's life plus 70 years (e.g. EU) or plus 50 years (e.g. Canada). Some countries have no copyright law (e.g. Afghanistan). The Berne Convention [1] sets minimum terms: the author's life plus 50 years in general, 25 years from creation for photographic works, and, for cinematographic works, 50 years from publication or, if not shown, 50 years from creation. See [60] for copyright lengths.

4.1.2 Patent

A patent is another commonly used protection for original works. It gives the author of an invention the exclusive right to make and distribute his invention. Unlike copyright, patents also protect against independent invention. Not all inventions can be patented. The inventor has to provide all the information needed by someone else to recreate the invention; in this way the inventor contributes to the general store of knowledge of the society. Nowadays patents are granted for at least 20 years (the first patent protection was granted for only 14 years), see [25] and [62].

Part II
Minimum knowledge required to perform digital reverse engineering

Chapter 5
Theoretical knowledge

Before studying reverse engineering one should familiarize himself (or herself) with several concepts that could be referred to as "low level". A software engineer usually uses high-level concepts: UML diagrams, flow charts, design patterns. A programmer deals with lower-level concepts such as classes and algorithms. To do DRE, a reverse engineer has to work most of the time with assembly code, which is about as low level as you can get (in software), see [47].

5.1 Programming languages

In order to do DRE it is necessary to understand the differences between programming languages. A future reverse engineer has to understand the difference between a high-level language and assembly code (a low-level language), and what happens when a program is compiled, i.e. transformed from a high-level language to a low-level language.

Definition 6. A high-level programming language is a programming language with a high level of abstraction from the particular computer's instruction set.

Definition 7. A low-level programming language is a programming language that provides no (or very little) abstraction from the particular computer's instruction set.

There are many different programming languages and we could classify them in several ways: compiled and interpreted, high-level and low-level, functional and procedural (imperative), etc. From the point of view of a reverse engineer (or obfuscator) there are three types of programming languages to be distinguished:

- interpreted or compiled just-in-time (JIT) programming languages: at each execution the program is translated into machine code by an interpreter, e.g. Perl, Ruby, PHP;
- compiled to bytecode or precompiled programming languages: the program is translated once into an intermediate language of a virtual machine and then interpreted by this virtual machine at each execution, e.g. Java, .NET languages;
- compiled programming languages: the program is translated into machine code once and can then be executed many times, e.g. C/C++, Fortran, Cobol.

Interpreted and compiled-to-bytecode languages are executed by an interpreter. Compiled-to-bytecode and compiled languages are compiled off-line. Even if there are some similarities between these types of languages, in the case of DRE they should be treated separately, because different obfuscation techniques are applied to programs written in different types of programming languages. For example, some obfuscations, such as renaming of variables and functions, do not make sense for compiled languages but are useful for compiled-to-bytecode languages (see chapter 7). Since obfuscation techniques differ between types of languages, the DRE techniques required also differ.

5.1.1 Determining language used

Knowing which language was used to write the program in question can help a reverse engineer in his work, because different programming languages have different structures and different underlying logic. Here are some basic features that can help the reverser to find out which programming language was used.

- String representation: even if each individual character is usually represented in ASCII or Unicode format, there exist many possible representations of strings, e.g. in C/C++ each string is terminated by a special symbol, the zero byte '\0'; in Pascal each string is prefixed by its length; in Fortran there are no delimiters between strings.
- Function and procedure calls: there exist many standard conventions for passing arguments to a callee and returning the result to the caller, e.g. cdecl, stdcall, fastcall, thiscall, etc. (see [19]). Arguments may be passed (and the result returned) through the stack or through registers. If the stack is used, one of two conventions for the order of parameters can be used (straight or reverse). Also, depending on the convention, either the caller or the callee is responsible for restoring the stack. For example, C/C++ compilers on x86 typically default to cdecl, in which the caller restores the stack, whereas the Win32 API uses the stdcall convention. In stdcall, parameters are passed to the callee through the stack, they are pushed in reverse order (last parameter first), and the callee is responsible for restoring the stack after the call.
- Libraries used: different programming languages have their own libraries, and their names may be found in the header of the executable file, see figure 6.5 (the name of a library is needed in order to load it before the execution of the main program).
- File headers: most of the time compilers add their name and their version to the header of the file (see the example on figure 7.21).

A good knowledge of the differences between the various programming languages can significantly help the reverser in his (or her) work. Once the reverser knows which programming language was used, he may try to figure out which compiler was used (in some cases this might be interesting, e.g. in malware analysis [44]).
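The string representations listed in Section 5.1.1 can be told apart mechanically when inspecting raw bytes. The following C++ sketch is only an illustration: the function names readCString and readPascalString are hypothetical, and the Pascal variant assumes a one-byte length prefix (other prefix widths exist).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read a C-style string: all bytes up to (not including) the first zero byte.
// Returns an empty string if no terminator is found before the buffer ends.
std::string readCString(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    std::string out;
    for (std::size_t i = offset; i < buf.size(); ++i) {
        if (buf[i] == 0) return out;
        out.push_back(static_cast<char>(buf[i]));
    }
    return std::string();  // unterminated: treat as "no string here"
}

// Read a Pascal-style string, ASSUMING a one-byte length prefix.
// Returns an empty string if the declared length runs past the buffer.
std::string readPascalString(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    if (offset >= buf.size()) return std::string();
    std::size_t len = buf[offset];
    if (offset + 1 + len > buf.size()) return std::string();
    return std::string(buf.begin() + offset + 1, buf.begin() + offset + 1 + len);
}
```

Trying both readers on the data section of an unknown executable and seeing which one yields sensible text is one crude hint about the source language; it is a heuristic, not a proof.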
5.2 Compilers

It is important to understand the transformations that compilers apply to a program during compilation. Since a reverser has to deal with assembly code, he should know what kinds of changes the compiler applies to the original source code during the translation from a higher-level programming language into machine code.

Definition 8. A compiler is a program that transforms source code into another programming language (the target language).

Compilers are mostly used to transform a program written in a high-level language into a low-level language. Understanding a program written in a high-level language is usually easier than trying to understand a program written in a low-level language. Understanding what a program does is a very challenging task even if you have access to the source code; in the case of a compiled program this task is harder to accomplish, mainly due to the changes made by the compiler. The main changes that a reverser has to be aware of are described below.

5.2.1 General changes in the code structure

A compiler makes many changes during compilation, and it can be very hard to recognize the source code once it has been compiled. A computer needs neither the names of variables, functions, classes and methods nor any kind of labels (although in some cases compilers keep some labels). If you have the source code of a program, these names help to understand what the program does, but once the program has been compiled, it becomes more difficult. Knowing what a given variable represents can be even harder, because computers have a limited number of registers, which are recycled and reused for different variables many times during the execution. (Many operations cannot be done directly in memory, so the data has to be copied into a register, modified and then copied back.)

Generally, an instruction of a high-level programming language cannot be translated directly into one low-level instruction but into a block of instructions (see figures 5.1 and 5.2).

Listing 5.1: drawLine function call in C++

    drawLine(x1, y1, x2, y2);

Listing 5.2: drawLine function call in x86asm assembler

    mov  EAX, [y2]
    push EAX
    mov  EBX, [x2]
    push EBX
    mov  ECX, [y1]
    push ECX
    mov  EDX, [x1]
    push EDX
    call drawLine
    pop  EDX
    pop  ECX
    pop  EBX
    pop  EAX

Figure 5.1: Example of translation of a simple function call from C++ code into x86asm assembler.

The entire structure of the code changes after compilation. Most modern languages allow variables to be declared in different blocks of the code, so instructions and data are mixed in the source code. After compilation, however, the data and text (code) sections are clearly separated.

Listing 5.3: "Is it a frog?" program in C++

    if ((paws == 4 and ears == 0) or saysRibbet) {
        // it is a frog!
    } else {
        // it is not a frog
    }

Listing 5.4: "Is it a frog?" program in x86asm assembler

        mov EAX, [paws]
        cmp EAX, 4
        jne checkOR
        mov EBX, [ears]
        cmp EBX, 0
        je  frog
    checkOR:
        mov ECX, [saysRibbet]
        cmp ECX, 1
        jne notFrog
    frog:
        ; it is a frog!
        jmp continue
    notFrog:
        ; it is not a frog
    continue:

Figure 5.2: Example of an if-else statement with several conditions. Translation from C++ code into x86asm assembler.

5.2.2 Changes due to optimization

Some changes are not obligatory but are made in order to optimize the program, e.g. to accelerate the program's execution or to gain some space by reducing the size of the final executable file. There are many different types of optimization that a compiler can perform during the translation of a program from a high-level programming language to machine code. Most optimization options can be turned on and off by the programmer depending on the purpose of the program, e.g. version 4.4.3 of the g++ compiler has about 150 optimization options such as -finline-small-functions and -funsafe-loop-optimizations (see the g++ manpages, man g++).

Code duplication

Sometimes you can find that some pieces of the original code are duplicated several times in the final program. This kind of optimization targets the space-time trade-off by trying to minimize branch penalties (taking a branch, i.e. a CALL or JUMP instruction, takes more time than executing the physically following instruction). It accelerates the execution of a program, but it increases the size of the code. Most modern compilers provide this option, although sometimes it is embedded in the programming language itself, e.g. the inline keyword in C/C++. The inline keyword means that the code of a given function or method is inserted directly into the program instead of making a function call. Since a CALL instruction takes more time than simply executing the next instruction, an 'inlined' function will execute faster. Usually this optimization is used for relatively small procedures.

Loop unrolling (also known as loop unwinding) is another case of code duplication, see the example on figure 5.3. Loop unrolling also minimizes branch penalties and can accelerate the program's execution: in this example there would be 5 times fewer branch penalties after the loop unrolling. It can also be useful if the statements inside the loop are independent, i.e. the next instruction does not need the result of the previous one; in that case they can be executed in parallel (in the case of a loop, this means that a cycle does not depend on the result of the previous cycle of the loop).

Listing 5.5: Normal loop

    for (int i = 0; i < 1000; i++) {
        print(i);
    }

Listing 5.6: Same loop after unrolling

    for (int i = 0; i < 1000; i += 5) {
        print(i);
        print(i + 1);
        print(i + 2);
        print(i + 3);
        print(i + 4);
    }

Figure 5.3: Example of loop unrolling in C++.

Replacing

Different instructions are not executed at the same speed, and in some cases different instructions can give the same result. Knowing this, we are interested in using the faster instruction when possible. Programmers often do not know (or do not think about) which instruction is faster and whether there is a way to use an equivalent faster instruction. Fortunately, the compiler is well suited for this kind of replacement and can do it for programmers during compilation.

The most widely known example of an optimization that uses an equivalent instruction to obtain the same result is the equivalence between multiplication/division by the base b (also called radix) and shifting the radix point one position to the right/left. If a number is multiplied/divided by b^n, the point has to be shifted n positions to the right/left in order to obtain the same result. See the example on figure 5.4.

    base = 10
    404.0  × 100  = 404.0  × 10^2 = 40400
    42.0   × 1000 = 42.0   × 10^3 = 42000
    1024.0 ÷ 100  = 1024.0 ÷ 10^2 = 10.24

Figure 5.4: Example: multiplication (division) by the base is equivalent to shifting the decimal point to the right (left).

In the digital world, the base is 2. This means that multiplying a number by 2^n is equivalent to shifting its bits n positions to the left. Many processors have a SHIFT instruction which does exactly this. Multiplications and divisions take longer than additions, even for a computer, and the SHL instruction is faster than the MUL instruction, as you can see in the example on figure 5.5.

    base = 2
    101 × 100 = 101 × 10 × 10 = 10100
    Same operation in base 10: 5 × 4 = 5 × 2^2 = 20

    [Diagram: result of the SHL instruction on a register; CF denotes the Carry Flag.]

    Instructions:
        mov  EAX, 5         ; eax = 0...000101
        imul EAX, EAX, 4    ; eax = 0...010100
    are equivalent to:
        mov  EAX, 5         ; eax = 0...000101
        shl  EAX, 2         ; eax = 0...010100

Figure 5.5: Multiplication by the base.

Reorganizing

Sometimes reorganizing the code, i.e. placing certain blocks of code in a different order, can accelerate the program. The first example is about the if-else statement (see figure 5.6). If the condition of the if statement is inverted and the instructions inside the if and else branches are swapped, the result will be the same. However, as you can see in the example on figure 5.6, inverting the condition can save some instructions in the final assembly code, so the program will execute faster. In this particular case (listing 5.7) the programmer would probably do the reorganization himself, but in most situations this is not so obvious.

Listing 5.7: Written code

    // valueFound is a boolean
    if (not valueFound) {
        // I have no answer :-(
    } else {
        // I have an answer :-)
    }

Listing 5.8: Direct translation into assembly language

        movzx EAX, BYTE [valueFound]
        xor   EAX, 1            ; invert the boolean
        cmp   EAX, 0
        je    else
        ; I have no answer :-(
        jmp   continue
    else:
        ; I have an answer :-)
    continue:

Listing 5.9: Translation into assembly language after reorganization

        movzx EAX, BYTE [valueFound]
        ; no need to invert
        cmp   EAX, 0
        je    else
        ; I have an answer :-)
        jmp   continue
    else:
        ; I have no answer :-(
    continue:

Figure 5.6: Example of code reorganization in C++: instructions inside if and else were swapped.

The other case in which the compiler decides to reorganize some instructions of a program is due to its knowledge of the processor's architecture. Modern processors can execute several instructions at the same time (processors with several cores and programs with multiple threads) or almost at the same time (several instructions are in different stages: fetch, decode, execute, memory access, write back; also see [51]). To use this opportunity, instructions sometimes need to be arranged in such a way that, of two consecutive instructions, the later one does not need the result of the earlier one. This can be achieved by putting another instruction in between them; this way the processor is better used. See the example on figure 5.7: the instruction add EAX, EDX has to wait for the result of the instruction mov EAX, [EBX+ECX], so the processor has to wait until the previous instruction has completed. If the inc ECX instruction (which does not affect the two other instructions) is placed in between (see listing 5.11), the processor is able to execute inc ECX while the result of mov EAX, [EBX+ECX] is not yet ready.

Listing 5.10: Direct translation into assembly language

        xor ECX, ECX
        mov EBX, [myVector]
        mov EDX, 5
    loop:
        cmp ECX, 10
        je  end
        mov EAX, [EBX+ECX]
        add EAX, EDX
        inc ECX
        jmp loop
    end:

Listing 5.11: Translation into assembly language after reorganization

        xor ECX, ECX
        mov EBX, [myVector]
        mov EDX, 5
    loop:
        cmp ECX, 10
        je  end
        mov EAX, [EBX+ECX]
        inc ECX
        add EAX, EDX
        jmp loop
    end:

Figure 5.7: Example of code reorganization: the order of instructions changes. The add EAX, EDX and inc ECX instructions are swapped.
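The shift/multiply equivalence from the "Replacing" paragraph of Section 5.2.2 can be checked directly in C++. This is a minimal sketch under stated assumptions: the function names are hypothetical, values are unsigned 32-bit, and n is smaller than 32. Whether a given compiler actually emits a shift for a multiplication depends on the compiler and its options.

```cpp
#include <cstdint>

// Multiply by 2^n with a left shift (what a compiler may emit as SHL).
// Assumes n < 32; unsigned arithmetic wraps modulo 2^32 in both variants.
std::uint32_t mulPow2Shift(std::uint32_t x, unsigned n) {
    return x << n;
}

// The same computation written as an explicit multiplication.
std::uint32_t mulPow2Mul(std::uint32_t x, unsigned n) {
    return x * (std::uint32_t{1} << n);
}

// Unsigned division by 2^n corresponds to a logical right shift (SHR).
std::uint32_t divPow2Shift(std::uint32_t x, unsigned n) {
    return x >> n;
}
```

For example, mulPow2Shift(5, 2) and mulPow2Mul(5, 2) both return 20, matching the 5 × 4 example of figure 5.5. For signed values the division case needs extra care because of rounding, which is why the sketch sticks to unsigned integers.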
5.3 Operating systems

An operating system (OS) coordinates the different elements of a computer: it manages the hardware and the software. An operating system is a key component of any computer and a reverser must understand how it works. Since an OS plays the role of a guardian that controls all links between a program and the outside world, many reversing techniques (and some obfuscation techniques) are based on the functioning of the OS. A reverse engineer must have at least some basic knowledge of the following concepts:

- Memory management: kernel and user memory, memory sharing, memory allocation mechanisms, memory management scheme (e.g. paging).
- Process handling: initialization, threads, context switching, synchronization.
- Interruptions: exception handling.
- The structure of the executable file format.
- Communication with the rest of the world: I/O, APIs, system calls.

For example, a reverser should know that 32-bit Windows uses 32-bit memory addresses (4 addressable gigabytes). By default, Windows uses the upper 2 gigabytes (GB) for kernel-related memory and the lower 2 GB for user-related memory, so every address used in user mode has its first (most significant) bit cleared, i.e. set to "0". This means that all addresses whose first bit is set to "1" are not valid user-mode pointers.

Here is another example that shows the importance of a basic understanding of OS fundamentals: all programs communicate with the outside world. In order to communicate with the OS, a program uses system calls. Intercepting and understanding these communications can give the reverser some clues about the program. Understanding exactly what each system call does will help the reverser to quickly understand the general structure of the program.

See [51] and [19] to learn more about OS fundamentals.
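The Windows address-space example of Section 5.3 translates into a one-line check. The sketch below is illustrative only: the function name isUserModeAddress32 is hypothetical, and it assumes a 32-bit Windows system with the default 2 GB / 2 GB split described in the text (other configurations move the boundary).

```cpp
#include <cstdint>

// ASSUMPTION: 32-bit Windows with the default 2 GB user / 2 GB kernel split.
// An address whose most significant bit is set lies in the kernel half
// and therefore cannot be a valid user-mode pointer.
bool isUserModeAddress32(std::uint32_t address) {
    return (address & 0x80000000u) == 0;
}
```

A reverser can use this kind of test when reading a disassembly: a constant such as 0x00401000 may be a user-mode pointer, whereas 0x80001000 cannot be one under these assumptions.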
Chapter 6
Reversing tools

Reverse engineering is based on observations made during and after the use of the reversed object. There are two main reversing techniques: active reversing and passive reversing.

Definition 9. Active reverse engineering is reverse engineering that is done using the reversed object as a black box.

In the case of DRE, active reverse engineering is also called live code analysis. The main idea of active reverse engineering in DRE is to execute the code and then observe and analyse the result: how the system (the environment) changed after (or during) the execution. It may be done with different granularities, i.e. changes can be observed after the execution of the entire program, after each system call or after each instruction.

Definition 10. Passive reverse engineering is reverse engineering done by disassembling the reversed object and studying its components.

Passive digital reverse engineering is also called off-line code analysis. In the case of passive DRE the file is analyzed without being executed (sometimes the permission to execute the file is deactivated at the system level as an additional precaution against execution).

Active and passive techniques can be used regardless of the object that is reversed. Section 2.2 shows a good example in which both approaches were used.

There exist many tools (programs) that make active or passive DRE possible. Here is a list of the basic tools most used by reverse engineers. Sometimes the possibilities offered by these different tools are combined into one 'swiss army knife' tool that can do many things (e.g. IDA Pro Debugger & Disassembler).

6.1 Hex editors

One of the most basic, but still useful, tools used by reverse engineers is a hex editor. A hex editor is a program which is very similar to a simple text editor such as gedit, emacs, notepad++, nano or vim. The prefix 'hex' means hexadecimal and refers to base sixteen, which is used to represent numbers (numerals go from 0 to F: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). Basically, all hex editors show the content of any file in two forms:

- numbers in hexadecimal representation, for all bytes of the file;
- characters, for bytes which correspond to printable characters.

Generally, modern and more sophisticated hex editors also show other information (see the example on figure 6.1):

- the offset from the beginning of the file (in bytes);
- number representations in other bases, e.g. 10, 2;
- the possibility to switch between big endian and little endian representations.

Figure 6.1: View of a part of the 'hello world' program (see source in appendix C.1) in ghex2, a hex editor for Linux. The left column gives the addresses in hexadecimal representation (i.e. offsets from the beginning of the file), the central column shows the hex representations of the byte values, and the right column shows the same values as characters (non-printable characters are replaced by dots).

Hex editors are useful for modifying files, e.g. to replace one instruction by another or to modify some part of the data. That is why hex editors are sometimes classified as patching tools.

Definition 11. Patching is the process of modifying code in a binary executable to somehow alter its behavior. Definition from [19].

Patching a program can be useful for active DRE, e.g. to ensure that the program executes the particular branch that the reverser wants to explore. Patching is also used for protection removal, i.e. cracking. Many popular proprietary programs such as Photoshop, Microsoft Office, WinRAR, etc. are cracked shortly after their release.

There exist similar programs that do not allow the content of the file to be modified; these tools are called hex readers or hex viewers.
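The three-column layout of a hex viewer described in Section 6.1 (offset, hexadecimal bytes, printable characters) is easy to reproduce. The following C++ sketch is not how ghex2 or any particular tool is implemented; hexDump is a hypothetical helper that formats an in-memory buffer, with 16 bytes per line as an assumed default.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format a buffer as a hex viewer would: offset, hex bytes, printable characters.
// Non-printable bytes are shown as dots in the character column.
std::string hexDump(const std::vector<std::uint8_t>& data, std::size_t bytesPerLine = 16) {
    if (bytesPerLine == 0) bytesPerLine = 16;  // avoid an endless loop
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t off = 0; off < data.size(); off += bytesPerLine) {
        out << std::setw(8) << off << "  ";    // offset from the start of the buffer
        std::string text;
        for (std::size_t i = 0; i < bytesPerLine; ++i) {
            if (off + i < data.size()) {
                std::uint8_t b = data[off + i];
                out << std::setw(2) << static_cast<unsigned>(b) << ' ';
                text.push_back(std::isprint(b) ? static_cast<char>(b) : '.');
            } else {
                out << "   ";                  // pad a short last line
            }
        }
        out << ' ' << text << '\n';
    }
    return out.str();
}
```

A hex editor adds the reverse direction on top of this view: the user changes a byte in the hexadecimal or character column and the tool writes it back at the corresponding offset, which is exactly what patching requires.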
6.2 Sandboxes

Usually a reverse engineer deals with an unknown executable file, i.e. he does not know if the given file could be harmful to his system. A reverser therefore has to analyze the file in a closed environment called a sandbox.

Definition 12. A sandbox (in computer security) is a mechanism that allows programs to be run in separate environments. It is often used to separate untrusted programs from the rest of the system.

In the case of DRE a sandbox could be a computer (or a network) which is used only for DRE and which is disconnected from the Internet or any kind of external network. This way a virus (if analyzed in such a closed environment) would not be able to contaminate the rest of the digital world.

A real (physical) system, e.g. a computer or a network, can be used as a sandbox in order to analyze an unknown executable file. The disadvantage of using a real physical system is the time consumed by the re-installation of the system if the analyzed executable file harms it. In order to avoid this time-consuming task, reversers should use virtual environments.

6.2.1 Virtual environments

Definition 13. Virtualization is the use of a virtual version of a real object. Hardware virtualization is the virtualization of a computer; in this case, the virtual computer is called a virtual machine (VM).

Virtualization is not solely used for DRE. Sometimes virtualization is used to host several virtual machines on one physical machine, in order to give end users the impression that they have an entire machine for themselves. Virtualization is also used during software development for devices like mobile phones, CD players, etc.

In order to do the DRE of a potentially harmful executable file, hardware virtualization needs to be used (it is strongly recommended to do DRE of all unknown executable files only in virtual environments). In DRE a virtual machine is used as a sandbox instead of a real computer. A lot of software that allows virtual machines to be created is available, e.g. VirtualBox and VMware. Virtual machines allow things which cannot be done with physical machines, e.g. pausing and resuming the execution or taking a snapshot (saving the current state). The use of snapshots significantly simplifies the life of a reverse engineer: once the system is installed he takes a snapshot of the system and then works with any suspicious files. If a suspicious file breaks the system on the virtual machine, the reverser just needs to restore the initial state of the system from the snapshot. This operation is similar to opening a file in any program.

In some cases virtualization of a whole network is necessary for a complete analysis. In this situation several virtual machines are installed and then inter-connected on the same physical machine (software such as VirtualBox and VMware makes this more or less easy).
The use of a virtual machine has several disadvantages. Some of them, like performance degradation, are not very significant for DRE, but some of them are crucial for DRE. Usually a virtual environment is not exactly the same as a real environment, so there are ways for a program to detect whether it is executed on a real or a virtual machine and to act accordingly. For more information see chapter 7.

6.3 Disassemblers

The disassembler is the most important reversing tool. A disassembler is a tool that shows the content of the text (code) section of an executable file. It converts a raw stream of numbers (machine code) into a human-readable format, a program written in an assembly language, i.e. disassemblers translate the machine code bytes used by computers into human-readable text. See figure 6.2.

The format used to encode instructions and the set of instructions are platform-specific (they depend on the system), so disassemblers are also platform-specific, although some disassemblers support several platforms. Some powerful disassemblers (e.g. IDA Pro) can also generate flowcharts of functions and entire programs.

Listing 6.1: objdump -d hello_world

    80486d4:  55                     push   %ebp
    80486d5:  89 e5                  mov    %esp,%ebp
    80486d7:  83 e4 f0               and    $0xfffffff0,%esp
    80486da:  83 ec 10               sub    $0x10,%esp
    80486dd:  c7 44 24 04 30 88 04   movl   $0x8048830,0x4(%esp)
    80486e4:  08
    80486e5:  c7 04 24 40 a0 04 08   movl   $0x804a040,(%esp)
    80486ec:  e8 ef fe ff ff         call   80485e0
    80486f1:  c7 44 24 04 00 86 04   movl   $0x8048600,0x4(%esp)
    80486f8:  08
    80486f9:  89 04 24               mov    %eax,(%esp)
    80486fc:  e8 ef fe ff ff         call   80485f0
    8048701:  b8 00 00 00 00         mov    $0x0,%eax
    8048706:  c9                     leave
    8048707:  c3                     ret

Figure 6.2: Disassembled code of the main() function of the hello world program (see code in appendix C.1), disassembled by the disassembler integrated into the objdump program. Command line: objdump -d hello_world. Left column: addresses in hexadecimal. Middle column: raw data (values in hexadecimal). Two right columns: corresponding instructions.

Disassemblers can use one of two different approaches in order to disassemble the code:

- sequential, also called linear sweep;
- recursive, also called recursive traversal.

The sequential algorithm is simpler than the recursive one, and the first disassemblers used only sequential algorithms. With the sequential algorithm the disassembler reads the code section of the file byte after byte and translates it into instructions (in a human-readable form, i.e. assembly language).

The recursive approach first appeared as a countermeasure to the insertion of data into the code section of the file. Such insertion is used by some compilers for the optimization of switch statements, e.g. jump tables (see the example on figure 6.3). Insertion of data into the code section is also an obfuscation technique used to break or confuse different analysis tools (see chapter 7). The recursive algorithm does not follow the physical order of instructions but their logical order, i.e. it follows the control flow through branches, so a given address is decoded only if it is reachable from previously decoded code. Jump tables are also used as a control flow obfuscation technique (see chapter 7). It is difficult to determine the length of these jump tables, so recursive algorithms always use some heuristics to determine which instruction to disassemble next.

6.3.1 Decompilers

A decompiler is the dream tool of all reverse engineers. Decompilers, like disassemblers, translate programs into a human-readable form, but decompilers try to translate programs into a high-level programming language. The work of a decompiler is harder than the work of a disassembler because it has to deal with all the transformations that a compiler applied to the program (see Section 5.2). Decompiling a program is a very complicated task, especially if obfuscation techniques were applied (see chapter 7). Nowadays there are several more or less successful projects that develop decompilers, e.g. Andromeda [14] and Boomerang [15].

Decompilers do their best job in the case of compiled-to-bytecode languages (see Section 5.1 about programming languages). A lot of information from the source code remains in the compiled program, so decompilers for Java and .NET produce very good human-readable code.
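The difference between the two disassembly strategies of Section 6.3 shows up as soon as data is placed inside the code section. The C++ sketch below uses a deliberately tiny, made-up instruction set (NOP, JMP, JZ, RET with one-byte absolute targets); the encoding, the opcode values and the function names are all assumptions chosen for illustration, not a real architecture or a real disassembler.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy encoding: JMP and JZ are followed by a one-byte absolute target.
enum : std::uint8_t { NOP = 0x01, JMP = 0x02, JZ = 0x03, RET = 0x04 };

// Length in bytes of the instruction that starts with this opcode.
static std::size_t insnLength(std::uint8_t opcode) {
    return (opcode == JMP || opcode == JZ) ? 2 : 1;
}

// Linear sweep: decode byte after byte from the start, ignoring control flow.
std::set<std::size_t> linearSweep(const std::vector<std::uint8_t>& code) {
    std::set<std::size_t> starts;
    for (std::size_t pc = 0; pc < code.size(); pc += insnLength(code[pc]))
        starts.insert(pc);
    return starts;
}

// Recursive traversal: decode only the offsets reachable from the entry point.
std::set<std::size_t> recursiveTraversal(const std::vector<std::uint8_t>& code,
                                         std::size_t entry) {
    std::set<std::size_t> starts;
    std::vector<std::size_t> work{entry};
    while (!work.empty()) {
        std::size_t pc = work.back();
        work.pop_back();
        // Follow straight-line code until it ends or reaches decoded code.
        while (pc < code.size() && starts.insert(pc).second) {
            std::uint8_t op = code[pc];
            if (op == RET) break;
            if (op == JMP || op == JZ) {
                if (pc + 1 >= code.size()) break;  // truncated operand
                work.push_back(code[pc + 1]);      // branch target
                if (op == JMP) break;              // unconditional: no fall-through
            }
            pc += insnLength(op);
        }
    }
    return starts;
}
```

For the byte sequence {JMP 3, 0x02, NOP, RET}, where the byte at offset 2 is data that merely looks like a JMP opcode, the linear sweep decodes offsets 0, 2 and 4 and so swallows the real NOP at offset 3, whereas the recursive traversal decodes offsets 0, 3 and 4. The sketch does not handle indirect jumps, which is where real recursive disassemblers need the heuristics mentioned in the text.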
Listing 6.2: Switch statement in C++

    switch (state) {
        case 0:
            statements_0;
            break;
        case 1:
            statements_1;
            break;
        case 8:
            statements_8;
            break;
        default:
            statements_def;
            break;
    }

Listing 6.3: switch statement in assembler

        mov EAX, [state]
        jmp find_adr
    states_vec: [0, 1, 8]
    adr_vec:    [adr0, adr1, adr8]

    find_adr:
        xor ESI, ESI                ; ESI <- 0
    loop:
        cmp ESI, [vec_size]
        je  adr_default             ; no matching case found
        cmp EAX, [states_vec+ESI]
        je  found
        inc ESI
        jmp loop
    found:
        jmp [adr_vec+ESI]

    adr0:
        statements_0
        jmp continue
    adr1:
        statements_1
        jmp continue
    adr8:
        statements_8
        jmp continue
    adr_default:
        statements_def

    continue:
        ...

Figure 6.3: Example of implementation of a switch statement in C++ and in x86 assembler. The tables states_vec and adr_vec are in the code section of the file.

6.4 Debuggers

A debugger is a program that allows the execution of a process (called the debuggee) to be monitored. Debuggers were first used by software developers for tracing and fixing errors and bugs. Debuggers make it possible to execute programs and stop them at various breakpoints. They can show the values of different variables and registers and can also show the content of the stack (the order of called procedures with their parameters). All debuggers include the features of a disassembler. Most debuggers were created for bug fixing; however, there exist several debuggers that were created for DRE (although they can still be used for bug fixing), e.g. OllyDbg [66]. Generally, debuggers that were created for DRE purposes incorporate more powerful disassemblers.

In order to halt the execution of a program, debuggers use breakpoints. There are two types of breakpoints:

- Software breakpoints: special instructions that a debugger adds to a program. When such an instruction is reached during the execution of the program, the program pauses and control is transferred to the debugger.
- Hardware breakpoints: special features (interruptions) of the processor that pause the execution of a program and transfer control to the debugger when a certain memory address is accessed. Hardware breakpoints are very useful for detecting code regions responsible for managing data structures.

There are two different types of debuggers:

User-mode debuggers (e.g. OllyDbg [66], gdb [21], IDA Pro [43]) are the simplest and most conventional debuggers. User-mode debuggers are programs that, as their name implies, operate in user mode. A user-mode debugger attaches to the debuggee program in order to take full control of it. Since a user-mode debugger is a program that runs on top of the system level, it is easier to install and operate than any kernel-mode debugger (which is a component of the system). Most user-mode debuggers cannot analyze the code that is executed before the main entry point of the debuggee process is reached (user-mode code from libraries loaded during the initialization, before the main program).

Kernel-mode debuggers (e.g. WinDbg [36], Numega SoftICE [61]; unfortunately, the SoftICE project was closed in 2006, but the program is still used) are more powerful than user-mode debuggers. A kernel-mode debugger is installed as a component of the kernel (the core of the system). Such debuggers make it possible to stop and observe the entire system at any given moment. The main advantage of kernel-mode debuggers is that they can monitor the entire system, while user-mode debuggers can monitor only one process. Kernel-mode debuggers are mandatory if kernel-mode code is analyzed (e.g. a driver). The biggest disadvantage of kernel-mode debuggers is that they affect the system, i.e. they destabilize the operating system that they are attached to. This is not the case for user-mode debuggers.

User-mode and kernel-mode debuggers are very different and are used depending on what kind of program is analyzed. Generally, a simple user-mode debugger is enough for almost all kinds of DRE.

Main advantages and disadvantages of user-mode and kernel-mode debuggers:

                   Advantages                   Disadvantages
    User-mode      easy to install and use      limited field of view (one-process monitoring)
                                                cannot debug kernel-mode code
    Kernel-mode    monitors the entire system   hard to install
                                                destabilizes the system
Most user-mode debuggers can not analyze the code that is executed before the main entry point of the debugee process is reached (user-mode code from libraries loaded during the initialization. Kernel-mode debuggers (e. which ports it is listening to?. than user-mode debuggers. it is easier to install and operate than any kernel-mode debugger (which is a component of the system). IDA Pro [43]) are the most simple and more conventional debuggers.5 Monitoring tools Monitoring tools are used in order to perform live-code analysis i. but the program is still used. destabilize the operating system that they attached to. while user-mode debuggers allow to monitor only one process. operate in the user-mode. a simple user-mode debugger is enough for almost all kinds of DRE. Here is a list of several most commonly used types of monitoring tools: 3 Unfortunately. Generally.e. OllyDbg [66]. User-mode debuggers are programs that. monitoring tools observe inputs and outputs (I/O) on channels that exist between a process and the OS. This is not the case of user-mode debuggers. this project was closed in 2006. Some questions that a reverse engineer asks himself or herself (e. Generally. 6 Dumping tools In IT to dump means to copy data from one place to another e. from a memory to a printout or from the RAM to a file. A tool like gcore (in Linux) extracts the content of the memory of a running process into a file. See an example of a flowchart in the appendix (figure B.5.g. CPU usage. network traffic monitoring tools watch on all opened network connections. loaded libraries. This image can be used in a debugger to inspect the state of the program at the time when it terminated. when a program crashes.5 on the page 95). The most known use of dumping is a core dump. TCP. Programs such as objdump in Linux and dumpbin in Windows perform executabledumping. There also exist several visualization tools that help to do digital reverse engineering. . 
These tools represent information about a file (or its part) in graphical format and sometimes it is very useful for a reverse engineer. The most conventional visual representation of a program is a flowchart.4 and 6. Generally shows which port is used by which program and what kind of traffic it uses (e. usually a file. data contained in the header of an object. Another type of dumping is object dumping. these tools shows all opened files with their permissions per process. A core dump is a disk file containing an image of the process’ memory at the time of termination. In this case the live-code analysis could be used. 6. During the execution of a program the code of that program can be extracted directly from the memory of the process and then analyzed (in a debugger). etc) port monitoring tools monitor I/O on parallel and serial ports. Dumping a memory of a process is useful if the code of the original program could not be extracted during off-line analysis using different deobfuscators (see Section 6. command ”gcore 42” will generate a core dump file (core. Sometimes core dump files are generated automatically. Such files could also be used in case of active reverse engineering. Object dumping is presenting the metadata.29 file monitoring tools monitors all I/O on files. Definition from linux manpages ( man core). 6.7 Visual representations People can more easily and rapidly understand information in its visual representation such as images and diagrams rather than a long text. Generally. these tools are very powerful and upgraded versions of Windows’ task manager and Linux’s top program. For example. Actually. memory usage etc. These programs can present all data from header to footer in a simple human readable format see examples on figures 6. Definition 14.42) of a process with process id (pid) equal to 42. In case of executable files the term executable-dumping is used.8).g. UDP. process monitoring tools show information about running programs e.g. 
Definition 15. A flowchart is a diagram that represents an algorithm. Each action is shown as a box and the order of actions is represented by arrows connecting the boxes.

Flowcharts give a significant level of abstraction over assembly code. By using flowcharts a reverser can quickly understand the general structure of a function. Several reversing tools (e.g. IDA Pro disassembler & debugger [43]) can generate flowcharts, see figure 6.6.

There also exist less conventional visualization tools that were created for DRE purposes, like the one presented in [10]. It tries to give a general impression of the entire program by showing the overall control flow of the program as a graph, see figure 6.7.

Another interesting tool, created to do DRE of executable files and data formats, is presented in [23]. This tool gives several graphical representations of a file, see figure 6.8. For example, it plots maps in which each byte is shown as a pixel on the display. This representation can help to find interesting areas of code in the analyzed executable file. This tool can also be used to reveal some types of steganography, i.e. hidden message passing (see Section 7.3.6 and figure 6.9).

Case studies in the original papers [23] and [10] show that such visualizing tools are very helpful in DRE: they can significantly reduce the time that a reverse engineer spends in order to find an interesting part of a file.

Listing 6.4: objdump -f hello_world

    hello_world:     file format elf32-i386
    architecture: i386, flags 0x00000112:
    EXEC_P, HAS_SYMS, D_PAGED
    start address 0x08048620

Figure 6.4: A part of the object dump of the hello world program (see source in appendix C.1). Command line to display the contents of the overall file header: objdump -f hello_world.

Listing 6.5: objdump -p hello_world

    hello_world:     file format elf32-i386

    Program Header:
    ...
    Dynamic Section:
      NEEDED      libstdc++.so.6
      NEEDED      libm.so.6
      NEEDED      libgcc_s.so.1
      NEEDED      libc.so.6
      INIT        0x08048550
      FINI        0x0804880c
      HASH        0x080481ac
      GNU_HASH    0x080481f4
      STRTAB      0x080482f8
      SYMTAB      0x08048228
      STRSZ       0x00000183
      SYMENT      0x00000010
      DEBUG       0x00000000
      PLTGOT      0x08049ff4
      PLTRELSZ    0x00000048
      PLTREL      0x00000011
      JMPREL      0x08048508
      REL         0x080484f8
      RELSZ       0x00000010
      RELENT      0x00000008
      VERNEED     0x08048498
      VERNEEDNUM  0x00000002
      VERSYM      0x0804847c

    Version References:
      required from libstdc++.so.6:
        0x056bafd3 0x00 05 CXXABI_1.3
        0x08922974 0x00 03 GLIBCXX_3.4
      required from libc.so.6:
        0x0d696910 0x00 04 GLIBC_2.0
        0x09691f73 0x00 02 GLIBC_2.1.3

Figure 6.5: A part of the object dump of the hello world program. Command line to display object format specific file header contents: objdump -p hello_world. Here the reverser can see the libraries (and their versions) used by the program, different flags etc.

Figure 6.6: A flowchart generated by IDA Pro disassembler. The picture comes from the official site of IDA Pro [43].

Figure 6.7: Visualization of the control flow of a program with the utility created in [10]. Image comes from the original paper.

Figure 6.8: Program presented in [23]. Visualization of a file. (a) Current position in the file. (b) Byte Presence Visualization - each of 256 columns shows presence (green) and absence (black) of bytes of a given value. (c) Byteview Visualization - the color of each pixel maps to the value of a byte from 00 (black) to FF (green). (d) ASCII strings contained in the file. (e) Dot Plot Visualization - used by bioinformatics for visual detection of repeated sequences. (f) Byte Frequency tag cloud. (g) Canonical hex editor view - hexadecimal and ASCII. (h) Control Toolbar. Image from the original paper.

Figure 6.9: Steganographic message hidden in an audio mp3 file. The image was generated by the program described in [23] (Byte presence view). Image from the original paper.

6.8 Automated deobfuscators

A fully automated deobfuscator which would effectively remove the effect of any obfuscation would be a dream-tool for a reverse engineer. Creating such a tool is a very challenging task which has not yet been achieved. However, there exist many automated deobfuscators which are well suited to handle some types of obfuscation, like packing or control flow obfuscation. Here are some of the most common types of automated deobfuscators:

Code extractors or unpackers - can extract code from packed executables (see Section 7.3.1). There exist many code extractors that use different techniques to unpack the original code from the packed file. For example, Renovo [37], PolyUnpack [41], UUnP (plug-in for IDAPro [43]).

Control flow deobfuscators - try to remove different kinds of control flow obfuscations (see 7.3.2), such as jump tables. For example, Loco [33], Diablo [40], Pltobased [4]. Most of these tools were not created for DRE purposes in the first place, but once reverse engineers started to use them some DRE-versions were created.

Automated format reversers - are tools that usually monitor programs in order to discover the structure of files and/or protocols that these programs use. For example, Autoformat [31] and Tupni [55]. These tools are mostly used in data reverse engineering.

Generally, automated deobfuscators of executable files create a new executable file which contains the deobfuscated version of the code from the original file. Automated deobfuscators of file formats and protocol structures usually create diagrams or textual representations of protocol structures and file formats.

OEP - original entry point.

6.9 Miscellaneous useful tools

There exist many miscellaneous tools that do not belong in any category, but that turn out to be very useful for DRE. The most common ones are presented below.

6.9.1 File type recognition

A tool that is very useful for DRE is a program that tries to identify the type of a file, e.g. the file command in Linux bash (see figure 6.10). Most of the time, in order to find out the type of a file, programs simply check its extension (but this is not foolproof). Tools such as Linux's file command also check for some 'magic patterns' that are specific to particular types of files (see linux manpages, man file). Tools that can identify file types are very useful in DRE (especially in DRE of malicious programs), because sometimes, in order to obfuscate a program, a file which contains sensitive data or code may be disguised as a simple image file (see figure 6.11).

6.9.2 Strings and pattern searching

Tools that perform string search in files are extremely useful in DRE. One such program is integrated in most Linux distributions: the program strings, which can be called from bash (see for example figure 6.12). Most such programs can look for strings encoded in different formats (ASCII, Unicode, etc). Searching for strings can be useful in order to find messages which are displayed in case of errors and the names of libraries used (see figure 6.12).

Once a reverse engineer has discovered all messages (strings) that can be displayed, he can find the addresses where these strings are stored. Once these addresses are found, the reverser can find the part of the code that uses these strings (e.g. by using hardware breakpoints, see Section 6.4 about debuggers) and thus start to analyze it. Nowadays almost all powerful disassemblers and hex editors integrate utilities that make it possible to search for a string.

Another searching utility, which is rarely integrated in specific tools, is the search for strings using regular expressions (see [27]), i.e. the search for patterns instead of exact strings (see example on figure 6.13).
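The two ideas of Section 6.9 (recognizing a file by its 'magic patterns' rather than by its extension, and extracting printable strings) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the signature table is a hypothetical three-entry subset, not the database used by the real file command, and the helper names guess_type and printable_strings are invented for this example.

```python
import re

# Hypothetical, deliberately tiny signature table (assumption); the real
# `file` utility consults a much larger magic database.
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable or object"),
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"#!", "script text executable"),
]


def guess_type(data):
    """Guess a file type from its leading bytes, ignoring the file name."""
    for magic, description in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return description
    return "data"


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# A shell script that would be saved under a misleading name such as picture2.jpg.
disguised = b"#!/bin/bash\necho 'not a picture'\n"
print(guess_type(disguised))

# A fake binary blob with two embedded messages.
blob = b"\x7fELF\x01\x01\x01\x00" + b"\x00" * 8 \
    + b"Enter the secret value:\x00\x03\x04Wrong value!\x00ab\x00"
print(guess_type(blob))
print(printable_strings(blob))
```

The extension never enters the decision, which is why a script renamed to .jpg is still reported as a script. The string extractor ignores runs shorter than min_len, in the spirit of the strings program.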
    /> file hello_world
    hello_world: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
    /> file hello_world.cpp
    hello_world.cpp: ASCII C program text

Figure 6.10: Example of output of the file program for a C++ source code file hello_world.cpp (see source in appendix C.1) and its compiled version hello_world.

    /> file picture1.jpg
    picture1.jpg: JPEG image data, JFIF standard 1.01
    /> file picture2.jpg
    picture2.jpg: Bourne-Again shell script text executable

Figure 6.11: Example of output of the file program for a real picture (picture1.jpg) and for a shell code disguised as a picture (picture2.jpg). Both files have the same extension.

    /> strings secretValue
    /lib/ld-linux.so.2
    CyIk
    libstdc++.so.6
    __gmon_start__
    _Jv_RegisterClasses
    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    _ZSt4cout
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    _ZNSt8ios_base4InitC1Ev
    _ZSt3cin
    _ZNSt8ios_base4InitD1Ev
    _ZNSirsERj
    _ZNSolsEPFRSoS_E
    __gxx_personality_v0
    libm.so.6
    libgcc_s.so.1
    libc.so.6
    _IO_stdin_used
    __cxa_atexit
    __libc_start_main
    GLIBC_2.0
    GLIBC_2.1.3
    CXXABI_1.3
    GLIBCXX_3.4
    PTRh`
    QVhD
    [^_]
    Enter the secret value:
    Congratulations!
    Wrong value!
    Bye-bye!

Figure 6.12: Example of output of the strings program for the program secretValue (see source code in appendix C).

    /> strings autoCorrect | grep "[E,e]rror"
    Error while exec corrected file
    Error while fork() for copy exec.
    Error while opening file "
    exec cp error

Figure 6.13: Search for all embedded messages which contain the word 'Error' or 'error' in the executable file autoCorrect (see code in appendix C).

Chapter 7
Code obfuscation

Generally, as in many other domains, if there are techniques that are used to 'attack' a system, then there also exist other techniques to protect the system against attacks. So, in DRE, you can observe an arms race between reversers on one side and anti-reversers (or obfuscators) on the other. In the case of binary reverse engineering the anti-reversing technique is called code obfuscation.

7.1 The definition

Definition 16. Two programs α and β are equivalent if these programs give equal outputs for equal inputs1.

Definition 17. An obfuscator is a function that takes a program α as an input and returns a program β which is equivalent to the program α but is harder to reverse-engineer.

Harder to reverse-engineer means that it would require more resources (see Section 7.2) to do DRE on the obfuscated program than on the original program. Sometimes this function is applied by hand, but there also exist different automated obfuscators. The challenge of obfuscation is to create some irreversible or very difficult to reverse transformation(s) that complicate the reversing process but do not affect the program's execution.

1 In the case of programs that use heuristics or calculate an irrational number (e.g. π) the definition of equality between two outputs could be 'soft', e.g. 'equal to the third decimal place': 3.141 ≈ 3.141592.

7.2 The problem

Before presenting different obfuscating techniques we would like to point out that there is no way to create a perfectly secured or an unbreakable system. This means any system could be broken, if it is cost-effective to do so. It is also impossible to create a system that cannot be reverse-engineered (given unlimited resources). The only issue is the amount of resources that is required in order to break the system. Three main resources are needed in order to reverse-engineer any system:

tools - in the case of digital reversing this includes computational power, e.g. a powerful computer.

skills/knowledge - having a modern computer would not help to reverse-engineer a system if you do not know how to use it for DRE.

time - how much time someone is willing to spend on this task.

7.2.1 Why obfuscate?

If there is no way to have a sound protection against reverse engineers, why then should one obfuscate programs? As already mentioned in Section 7.2, in order to reverse-engineer a system, you need a certain amount of resources. Most of the time obfuscation techniques put the emphasis on two resources: time and knowledge.

Some obfuscation techniques might require trying all possibilities in order to reverse them (e.g. finding a password using exhaustive search [39]), and even with a powerful computer, it could take a lot of time. For example, if a string was encrypted and you do not have the decryption key, which is 1024 bits long, it could take up to $4 \times 10^{275}$ times the age of the universe to find the key using exhaustive search (see figure 7.1).

All obfuscation techniques require knowledge about how to bypass (or remove) them. Somebody who knows how to circumvent only protection type α would not be able to reverse-engineer a type β obfuscation technique. Some obfuscation techniques are easier to bypass than others, but often it is difficult to say if a given obfuscation technique is indeed easier to bypass than another one.

The aim of obfuscation is to make the reversers' work very difficult, i.e. to discourage as many reversers as possible by using obfuscation techniques that require a lot of resources in order to reverse them. Each obfuscation technique might protect a given system against some reversers. Generally, an obfuscation technique protects against one or two (sometimes more) reversing techniques, so combining several obfuscation techniques is very useful. By adding one more obfuscation technique to our program we increase its level of protection. By combining many different obfuscation techniques it is possible to reduce dramatically the number of potential reverse engineers that could successfully reverse the program. Of course, an experienced reverse engineer has enough skill to bypass many different obfuscations, so it is impossible to guarantee that a given program could not be reverse-engineered.

    key length = 1024 bits
    Number of possible values = $2^{1024} \approx 2 \times 10^{308}$
    If trying $10^{15}$ possibilities takes 1 second, in the worst case it will take $2 \times 10^{293}$ sec. to find the key.
    $2 \times 10^{293}$ seconds $\approx 3 \times 10^{291}$ minutes $\approx 5 \times 10^{289}$ hours $\approx 2 \times 10^{288}$ days $\approx 6 \times 10^{285}$ years $\approx 4 \times 10^{275} \times$ age of the universe

Figure 7.1: Time needed to find a 1024-bit secret key using exhaustive search.
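The orders of magnitude of figure 7.1 can be rechecked with a few lines of Python. This is only a sketch: the rate of $10^{15}$ tries per second comes from the figure, while the 365-day year and the age of the universe (taken here as 14 billion years) are assumptions of this example.

```python
KEY_BITS = 1024
TRIES_PER_SECOND = 10**15               # rate used in figure 7.1
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # assumption: 365-day year
AGE_OF_UNIVERSE_YEARS = 14 * 10**9      # assumption: about 14 billion years


def worst_case_seconds(key_bits, tries_per_second):
    """Seconds needed to try every possible key of the given length."""
    return 2**key_bits // tries_per_second


# Python integers have arbitrary precision, so 2**1024 is computed exactly.
seconds = worst_case_seconds(KEY_BITS, TRIES_PER_SECOND)
years = seconds // SECONDS_PER_YEAR
universe_ages = years // AGE_OF_UNIVERSE_YEARS

for label, value in [("seconds", seconds),
                     ("years", years),
                     ("ages of the universe", universe_ages)]:
    # Convert to float only for display, in scientific notation.
    print(f"{label}: about {float(value):.0e}")
```

Integer arithmetic is used on purpose: $2^{1024}$ is slightly too large for a double-precision float, whereas the quotients fit and can be printed in scientific notation.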
7.3 Anti-reversing techniques

Many anti-reversing techniques exist nowadays. In the case of a compiled program some such transformations are done by the compiler (in a compiled version of a program there are no variable or function names), but these are only minor transformations compared to other techniques of obfuscation used nowadays. Several of the most common techniques of code obfuscation are presented below.

All techniques of obfuscation somehow reduce the performance of the obfuscated program or function. Some obfuscations increase the size of the final executable file, others reduce the execution speed of the program, while some do both. Obfuscation is always a trade-off between how concerned the programmer is about his (her) program being reversed and performance degradation. Most of the time obfuscating the entire program is pointless, because the majority of the code is not sensitive and because of the performance degradation (of the obfuscated program). Thus it is a good idea to obfuscate only sensitive code and sensitive data.

It is better to do an irreversible transformation, because in this case the obfuscated program will be harder to reverse-engineer. However, performing irreversible transformations and still being able to execute the program is almost impossible; that is why most obfuscation techniques use reversible transformations and other tricks in order to render them as difficult to reverse as possible.

7.3.1 Packing techniques

One of the most common techniques used to obfuscate an executable file is packing. In the case of a normal executable file anyone can read its content and see the assembly code. In order to hide it (the code), the program could be encrypted and the key could be kept secret (see Section 7.2.1). This way nobody would be able to do DRE of the program, since there would be no way to reverse the transformation, but nobody would be able to execute it either! In order to execute a packed program, the transformation which the original code undergoes has to be reversible. So, the content of the file is modified: the actual code is packed and an unpacker's code is added to the file, see figure 7.2. So now, when the new file is executed, the unpacker extracts the hidden code and then 'jumps' into the original (extracted) program. By simply reading the file's content, it would not be possible to see the original assembly code, but only the unpacker's assembly code and the packed version of the original code.

Figure 7.2: General scheme of a packed program.

So, basically the packing algorithm can use one of three ways to pack the code:

encryption
compression
virtualization

Encryption

In Greek κρυπτός [kryptos] means "hidden, secret". Cryptology is a science of hiding information2. In the case of anti-reversing techniques, we aim at hiding the assembly code by encrypting it.

2 Packing uses only encryption; modern cryptology also studies digital signatures, protocols and integrity of data.

Definition 18. Plaintext or cleartext is the original message before encryption.

Definition 19. Ciphertext is the plaintext after the encryption.

Definition 20. An encryption (also called enciphering) is a mapping of plaintext to ciphertext, based on some chosen keytext. It is performed by a stepwise application of a (more or less formalized) encryption algorithm (definition from [2]).

In other words, encryption is a process of transformation of a plaintext into a ciphertext. Decryption is the reverse transformation.

Definition 21. Key is a parameter that determines the output of an encryption (or decryption) algorithm.

Here is the general scheme (see figure 7.3): there are two algorithms, the encrypting algorithm and the decrypting algorithm, and two corresponding keys.

Imagine that Alice wants to send a message to Bob3. If Alice does not care if somebody else reads it, she can simply send it as a plaintext message. If she does not want anyone else, except Bob, to read this message, she has to transform the message from plaintext to ciphertext before sending it. In this way, nobody, except Bob, should be able to reverse this transformation and read the original message. However, Bob needs to know the secret (the decryption key) that would allow him to find the original plaintext message from the ciphertext message that he received.

3 Alice and Bob are common names used in cryptography for convenience (rather than 'party A' and 'party B') in order to designate participants of protocols.

Figure 7.3: General scheme for sending encrypted message.

In order to be sure that a random person, who does not know the secret, would not be able to find out the plaintext from the ciphertext, algorithms rely on different principles. Symmetric encryption algorithms rely on irreversible transformations that could be reversed only if a secret key is known. Asymmetric encryption algorithms use 'difficult' problems, e.g. discrete logarithm and integer factorization. A problem is considered to be difficult if the only way to solve it is to use exhaustive search.

Definition 22. Exhaustive search or brute-force search is a problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's statement.

A long time ago even the encryption and the decryption algorithms were a part of the secret.

Definition 23. Security through obscurity is a way of securing a system by hiding how it was secured.

Nowadays this is considered a bad practice, as is any kind of security through obscurity. Modern cryptography relies on another principle, Kerckhoffs's principle, which is the opposite of security through obscurity.

Definition 24. Kerckhoffs's principle says that a cryptographic system has to be secure even if the potential enemy knows everything about the system, except the secret key.

In modern cryptography the algorithms are well known and studied in order to test their properties (e.g. resistance against different kinds of attacks). The only secret here is the decryption key.

The idea of packing executables is to hide the original assembly code. If we want our program to be executed on many different computers without our intervention, the decryption function and the corresponding key have to be stored in the executable file. It means that they could be found by a reverse engineer. If a reverser finds the decryption algorithm (which is easy because the packed program starts by executing this algorithm), then by Kerckhoffs's principle our program is still safe, but should he also find the decryption key, he would be able to decrypt the original code. Note that in order to unpack, i.e. decrypt, the original code, the decryption algorithm needs the decryption key. The exhaustive search for such a key could take a lot of time (see figure 7.1). However, the key should be well hidden in the file (see Section 7.3.6). Hiding the decryption key in the executable file is equivalent to security through obscurity, because we are hiding the information about 'where the key is stored', but once it is found the system is no longer secured.
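The weakness discussed above (the unpacker and its key travel together with the packed code) can be made concrete with a toy packer. The sketch below is written in Python and makes several assumptions: a repeating-key XOR stands in for a real cipher (it is not secure), the 'packed file' is modelled as a dictionary rather than a real executable format, and the names xor_bytes, pack and unpack are invented for this example.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR every byte of data with the repeating key (toy cipher, not secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def pack(original_code, key):
    """Return a toy 'packed file': the key is stored next to the payload."""
    return {"key": key, "payload": xor_bytes(original_code, key)}


def unpack(packed_file):
    """Role of the unpacker stub: recover the original code before running it."""
    return xor_bytes(packed_file["payload"], packed_file["key"])


original = b"MOV EAX, 1; RET"      # stands in for the original machine code
packed = pack(original, key=b"\x5a\xa5")

print(packed["payload"])            # not readable by simply looking at the file
print(unpack(packed))               # anyone who locates the stored key recovers the code
```

Reading the payload alone does not reveal the original bytes, yet everything needed to undo the transformation sits in the same structure, which is the security-through-obscurity situation described in the text.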
Compression

Another technique that is used for code obfuscation is compression.

Definition 25. Data compression is the reduction in the amount of signal space that must be allocated to a given message set or a data sample set. Definition from [32].

In other words, data compression is a process of transformation of data into a format that requires less space. Compression is done by reducing the amount of redundant information. Normally compression is used in order to gain some space on the storage device or to use less bandwidth in the case of network transfers. Since some compression techniques heavily transform the original data, the original assembly code would not be found by simply reading the transformed file, so this method can also be used as an anti-reversing technique.

There are two kinds of data compression:

lossless data compression
lossy data compression

Definition 26. Lossless data compression is a kind of data encoding method that uses a statistical redundancy in data in order to compress it.

Lossless data compression makes it possible to reconstruct the original data from the compressed data without losses. It is possible because there are many redundancies in real-world data, see figure 7.4. Lossless data compression is used in cases when information loss is not tolerable, for example for a text.

Definition 27. Lossy data compression is a kind of data encoding method that exploits the fact that in some cases a part of the original data could be discarded.

Lossy data compression is mostly used to compress images and audio.

The kind of data compression method that should be used depends on the application. In the case of packing of computer programs only lossless data compression may be used; otherwise the processor would not be able to execute the decompressed program, since it would no longer correspond to its original version.

    Phrase: "There are two types of data compression: lossless and lossy."
    Length of this phrase is 60 symbols.

    If the following replacements are done:

        string   new symbol
        loss     λ
        ss       β
        re       φ

    the same phrase could be rewritten: "Theφ aφ two types of data compφβion: λleβ and λy."
    The length of the new phrase is 49 symbols. The table of replacements needs an additional storage of 11 symbols. The total space used to store the compressed phrase is 60 symbols, so for such a short phrase the table cancels the gain; the saving appears on longer texts, where the same table is reused many times.

Figure 7.4: Example of lossless data compression.
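The substitution scheme of figure 7.4 is simple enough to be tried out directly. The following Python sketch applies the replacement table from the figure and then reverses it; the function names are invented for this example, and the sketch assumes that the symbols λ, β and φ never occur in the input text.

```python
# Replacement table from figure 7.4. Order matters when compressing:
# "loss" has to be replaced before "ss", otherwise it would never match.
REPLACEMENTS = [("loss", "λ"), ("ss", "β"), ("re", "φ")]


def compress(text):
    """Replace each listed string by its one-character symbol."""
    for pattern, symbol in REPLACEMENTS:
        text = text.replace(pattern, symbol)
    return text


def decompress(text):
    """Undo the substitution (assumes the symbols do not occur in the input)."""
    for pattern, symbol in REPLACEMENTS:
        text = text.replace(symbol, pattern)
    return text


phrase = "There are two types of data compression: lossless and lossy."
packed = compress(phrase)
table_cost = sum(len(pattern) + len(symbol) for pattern, symbol in REPLACEMENTS)

print(packed)
print(len(phrase), len(packed), table_cost)   # original, compressed, table sizes
print(decompress(packed) == phrase)           # lossless: the round trip is exact
```

The round trip returns the original phrase character for character, which is the property that makes lossless compression usable for packing executable code.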
Virtualization

Virtualization is slightly different from the two previous techniques. During the packing process all instructions of the original program are translated into instructions of a nonexistent virtual machine (see Section 6.2). This virtual machine should not exist, because otherwise packing does not really hide the original code. Generally, automated packers that embed virtual machines into the original code generate a random instruction set for the new virtual machine and then translate the original program into it.

Definition 28. Instruction set is a list of all the instructions that a processor (or in the case of a virtual machine, an interpreter) can execute.

See figure 7.5. See more about instruction sets in [51].

    Instruction  Name                           Description
    ADD          Addition                       regA ← regB + regC
    ADDI         Add Immediate                  regA ← regB + immediate
    NAND         Not And                        regA ← NOT (regB AND regC)
    LUI          Load Upper Immediate           regA ← immediate & 0xFFC0
    LW           Load Word                      regA ← Mem[regB + immediate]
    SW           Store Word                     regA → Mem[regB + immediate]
    BEQ          Branch If EQual                if (regA == regB): PC ← PC+1+immediate
                                                else: PC ← PC+1
    JALR         Jump And Link Using Register   PC ← regB, regA ← PC + 1

    PC - Program Counter (sometimes called IP - instruction pointer)

Figure 7.5: Instruction set of a Reduced instruction set computing (RISC) processor (developed for learning purposes by Bruce Jacob from the University of Maryland [26]).

In the case of virtualization, there are two options for the unpacker. The first option is the same as in the case of encryption or compression: translating all instructions back into the original program before 'jumping' into it.

In the other case the unpacking algorithm would not just unpack the original code and jump into it, but it would execute the code like an interpreter: reading and then executing instruction after instruction. The virtual machine reads one (or several) instructions from the file, translates them into real instructions (instructions that can be executed by the physical machine) and then executes these instructions. It means that only the code of the interpreter is loaded into the memory, while the code of the original packed program remains in the file.

Packing summary

Packing techniques are not very difficult to implement but they have two disadvantages:

executable file's size increase - the obfuscated program contains the original program and the unpacker's code (which is usually not big in relation to the original code).

execution speed decrease - in the case of encryption or compression there is an additional initialization phase during which the original code is unpacked; in the case of a virtual machine the interpreter has to translate instructions (thereby adding a delay).

Generally, the basic use of packing techniques protects against simple reading of the executable file, but it may be more or less easily circumvented. Note that the unpacker's code and the packed original code are in the same file. So the reverser has access to both of them and could use the unpacker in order to access the original code. That is why in this case encryption does not add any security in terms of cryptography. Nevertheless, encryption can bring more security in terms of obfuscation of the file, because in order to unpack it the reverser needs two things: the algorithm and the key, which could be very well hidden somewhere inside the executable file.

Basically there are two ways to access the original unpacked assembly code in order to reverse-engineer it and, as for almost all techniques, one of them is passive and the other is active.

Passive techniques consist of finding the unpacker's code in the packed executable file and writing a little program that uses it in order to unpack the original assembly code. These techniques, executed 'by hand', are acceptable if the original program was packed only a few times.

An active technique relies on the fact that the unpacker unpacks the original code into the memory at the beginning of the execution. So the active technique consists of executing the packed program and dumping its memory (see Section 6.6) just after the unpacking is finished. In order to stop the program at the right moment it could be executed in the debugger environment (see Section 6.4) with a breakpoint placed after the unpacker's code. The advantage of active reversing in this case is that the reverser does not need to understand how the unpacker's algorithm works. Note that if the virtual machine works as an interpreter, the memory dump would not work, because the only code that would be in memory all the time is the interpreter's code (see Section 7.3.1).

But since the aim of obfuscation is to discourage as many reversers as possible, nowadays packing techniques are not used in their basic form.
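The interpreter-style unpacker described in the virtualization part above can be illustrated with a very small virtual machine. The Python sketch below interprets four of the instructions of figure 7.5 (ADD, ADDI, NAND, BEQ). Several points are assumptions of this example rather than part of the original design: instructions are stored as tuples instead of binary words, there are eight 16-bit registers, register 0 is simply never written by the sample program, and the machine halts when the program counter leaves the program.

```python
WORD_MASK = 0xFFFF  # assumption: 16-bit registers


def run(program, max_steps=10_000):
    """Interpret (opcode, a, b, c) tuples one by one and return the registers."""
    regs = [0] * 8
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break                                   # leaving the program halts the VM
        op, a, b, c = program[pc]
        if op == "ADD":                             # regA <- regB + regC
            regs[a] = (regs[b] + regs[c]) & WORD_MASK
            pc += 1
        elif op == "ADDI":                          # regA <- regB + immediate
            regs[a] = (regs[b] + c) & WORD_MASK
            pc += 1
        elif op == "NAND":                          # regA <- NOT (regB AND regC)
            regs[a] = ~(regs[b] & regs[c]) & WORD_MASK
            pc += 1
        elif op == "BEQ":                           # branch if regA == regB
            pc = pc + 1 + c if regs[a] == regs[b] else pc + 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


# 'Packed' program: multiplies 3 by 4 with repeated additions, result in r1.
PROGRAM = [
    ("ADDI", 2, 0, 4),    # r2 <- 4 (loop counter)
    ("ADDI", 3, 0, 3),    # r3 <- 3 (value to add)
    ("BEQ",  2, 0, 3),    # if r2 == 0: jump past the end of the program
    ("ADD",  1, 1, 3),    # r1 <- r1 + r3
    ("ADDI", 2, 2, -1),   # r2 <- r2 - 1
    ("BEQ",  0, 0, -4),   # always taken: back to the test
]

print(run(PROGRAM)[1])
```

At no moment does a native version of the multiplication loop exist in memory: only the interpreter and the table of virtual instructions do, which is why a plain memory dump is of little help against this kind of packing.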
Improvements to basic packing techniques A first and very simple improvement that can be made . Another interesting addition that could be made. Different keys could be used in order to decrypt different parts of the code.47 Here are several improvements and upgrades used to render the basic packing scheme more resistant to DRE. Renovo. using several variables of the original program. making a memory dump less useful. keystroke).2 Control flow obfuscation Different techniques of control flow obfuscation of a program work against human reversers as well as against automated reversers. one of the most interesting things that could be done in order to improve packing is finding better ways to hide the decryption key (if the encryption is used). in order to protect the program against active reversing: as soon as an unpacked part of the program is used it could be repacked back again. UUnP (also see Section 6. PECompact.g. The second upgrade used in order to discourage as many reversers as possible. Normally. These parts could be packed with different algorithms. UPX. Here below is a list of the most common control flow obfuscation techniques: Mixing the code The idea is to break the original code of several functions into little chunks and mix them using jump instructions to go through them in the right order.g.3. FSG. PolyUnpack. Even better: the key does not need to be stored somewhere in the file.g. The main idea of all control flow obfuscation techniques is to transform the program in such way that it would be difficult to follow the logical order of instructions. WinUPack etc. so it becomes harder to follow the control flow.6) it could be found more or less easily. 7.3. Finally. See the example . they could be unpacked at once or only as soon as they are needed during the execution of the program (if different branches of the program are packed separately.8 about reversing tools). it could be calculated on the fly during the execution e. even up to 100 times. 
As a countermeasure to this improvement there exist automated unpackers e. the entire code of the program would never be present in the memory at one given moment). If it’s simply stored somewhere in the file (also see Section about stegonography 7. All control flow obfuscations introduce additional branches into the program. is to pack different parts of the original program separately. MoleBox. // end o f f1 segment2 . because these obfuscations use different code structures and techniques used in order to deobfuscate these obfuscations are different. } Listing 7. In other words. each block of the code would ’provide’ the control flow to the (logically) next block. Jump tables This obfuscating technique is very similar to the previous one (mixing the code). f1 segment3 . f1 segment2 . // end o f f 1 s e g m e n t 3 . so the control loop would know which part to execute next. The idea of jump tables is the same as in the previous technique . // end o f f 3 s e g m e n t 1 . goto f 3 s e g m e n t 3 f 1 s e g m e n t 1 . Each part of the code would set a special unique value (usually. mixing the code of several functions is more effective if it is combined with obfuscations of conditions instead of simple goto statements. This technique confuses human reversers and can confuse some automated deobfuscators (most of the time they fail to reconstruct the correct functions). f2 segment3 . Listing 7. However they are always presented separately. Example inspired from [19] Generally.8 . goto f 2 s e g m e n t 3 f3 segment2 . something that corresponds to an index in a table or a pointer). Mixing several functions in such a way has very little impact on the size of the code as well as on the execution speed of the obfuscated program. goto f 1 s e g m e n t 3 . See example on figure 7. f 2 s e g m e n t 1 . f3 segment3 . see figure 7. in case of mixing code obfuscation each block of the code ’knows’ his next neighbour. 
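The control loop of a jump table can be sketched in C as follows. This is only an illustrative sketch and not the listing of figure 7.8: standard C has no computed goto, so function pointers stand in for the block addresses, and each segment returns the index of its successor instead of storing it in a shared variable. All names (segment_fn, seg_add, seg_sub, seg_double, obfuscated_compute) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_END (-1)

/* A segment does its part of the work and returns the index of its
   logical successor in the jump table (or SEG_END when finished). */
typedef int (*segment_fn)(int *acc);

static int seg_add(int *acc)    { *acc += 5; return 2; }        /* logical step 1 */
static int seg_double(int *acc) { *acc *= 2; return SEG_END; }  /* logical step 3 */
static int seg_sub(int *acc)    { *acc -= 1; return 1; }        /* logical step 2 */

/* Physical order in the table: step 1, step 3, step 2. */
static const segment_fn jump_table[] = { seg_add, seg_double, seg_sub };

/* Computes ((x + 5) - 1) * 2, but the order of the steps is only
   visible by following the indexes through the control loop. */
int obfuscated_compute(int x)
{
    int acc = x;
    int idx = 0;                                  /* entry segment */
    while (idx != SEG_END) {                      /* the control loop */
        assert(idx >= 0 &&
               (size_t)idx < sizeof jump_table / sizeof jump_table[0]);
        idx = jump_table[idx](&acc);
    }
    return acc;
}
```

A segment never names its successor directly; it only hands an index back to the control loop, which is what makes the logical order hard to read from the physical layout.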
Listing 7.1: Original functions

    void function1() {
        f1_segment1;
        f1_segment2;
        f1_segment3;
    }

    int function2() {
        f2_segment1;
        f2_segment2;
        f2_segment3;
    }

    string function3() {
        f3_segment1;
        f3_segment2;
        f3_segment3;
    }

Listing 7.2: After obfuscation (15 lines: the nine segments are stored in mixed order; the first segment of each function is marked as its entry point and followed by a goto to the second segment, e.g. goto f1_segment2, and the second segment is followed by a goto to the third one, e.g. goto f1_segment3)

Figure 7.6: Example of mixing 3 functions. Each of them is separated into 3 pieces.

As with mixing, the idea of jump tables is to break the original code of one or several function(s) into short parts and mix them. The difference consists in the way of determining which segment should be executed next. Unlike segments mixed using the technique described earlier, in case of jump tables each little block ends by jumping into a control loop, a piece of code that determines which block has to be executed next. In case of jump tables each block 'knows' only itself and the control loop.

Figure 7.7: Structure of control flow obfuscations that mix different blocks of the code.

Listing 7.3: Original code

    statement_0;
    statement_1;
    statement_2;
    statement_3;
    statement_4;
    statement_5;
    statement_6;

Listing 7.4: After obfuscation

    jump_table:
        [adr7, adr4, adr2, adr6,
         adr1, adr3, adr5, end]

    idx = -1;       // so that the first jump goes to jump_table[0]
    control:
        goto jump_table[idx+1]

    adr1: statement_4;
        idx = 4;
        goto control;
    adr2: statement_2;
        idx = 2;
        goto control;
    adr3: statement_5;
        idx = 5;
        goto control;
    adr4: statement_1;
        idx = 1;
        goto control;
    adr5: statement_6;
        idx = 6;
        goto control;
    adr6: statement_3;
        idx = 3;
        goto control;
    adr7: statement_0;
        idx = 0;
        goto control;
    end: continue;

Figure 7.8: Example of use of jump tables.

Jump tables, especially jump tables with several levels of indirection, dramatically reduce the execution speed of the obfuscated program or function. This is due to the jumps that the control flow has to follow: jump instructions are among the slowest and 'heaviest' instructions that a processor can execute.

An improvement that could be made on the idea of jump tables is to add a second layer of indirection, i.e. to have a second table, which could be filled in during run time. Indexes in that table would give the indexes in the first jump table, and the code would use indexes of the second table, see figure 7.9.

Figure 7.9: Example of jump table with a layer of indirection.

Inlining and outlining
Inlining duplicates some parts of the code (see more in Section 5.2.2) in order to improve execution speed. Inlining reduces the abstraction created by the programmer in the first place. Taking a function and duplicating its code makes the work of a reverse engineer harder: the reverser does not know that the code he is looking at is actually an inlined function. Outlining is the inverse transformation, i.e. a part of the code is transformed into a function. This obfuscation is effective if a function is created from a random piece of code; it will confuse human reversers. The other idea consists in combining inlining with outlining. It consists of creating several copies of one function and using them all in the main program. This adds confusion for reversers. This kind of control flow obfuscation reduces the readability of the code, which can confuse human reversers as well as repel automated deobfuscators.

Condition obfuscation
All techniques of control flow obfuscation 'play' with branches. This means that, in case of conditional branching, obfuscating the conditions necessary to take the branch raises the level of difficulty to understand the program, i.e. raises the level of protection against DRE. Such obfuscated conditions are also called opaque predicates.

Definition 30. Opaque predicate is a logical statement whose outcome is constant (always true or always false) and is known in advance (by the programmer). Definition from [19].

One of the main ideas of condition obfuscation consists in creating a condition which is always true (or always false). In order to confuse reversers, there always exists a way to create a fake control flow statement (i.e. a branching point with a condition which is always true or always false). For example, look at figure 7.10. The result of the condition in listing 7.5 is always true and the program will never execute the else statement. The result of the condition in listing 7.6 is always false and the program will always execute the else statement.

Listing 7.5: Condition is always true

    bool bb;
    // ... some actions ...
    if (bb or not bb) {
        cout << "..that is the question" << endl;
    } else {
        cout << "Problems with logic?!" << endl;
    }

Listing 7.6: Condition is always false

    int n;
    // ... some actions ...
    if (n == n+1) {
        cout << ".oO(Problems with math?)" << endl;
    } else {
        cout << "Everything is fine." << endl;
    }

Figure 7.10: Example of opaque predicates in C++.

The simple examples from listings 7.5 and 7.6 could be easily reverse-engineered by human reversers and also by automated control-flow deobfuscators (see Section 6.8). Generally, values used inside opaque predicates have to be generated during the execution of the program. For example, one thread of the main program can generate random values that adhere to some rule (e.g. be divisible by 7, be greater than some fixed value, etc), and store them in a place that is accessible by the main thread. In its turn, the main thread, knowing the rule(s) used to generate the random values, will use these values in opaque predicates. Such an approach is described in [19].
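A slightly less obvious opaque predicate than those of figure 7.10 can be built on a value that is only known at run time but always satisfies a fixed arithmetic rule. The following C sketch is an illustration under that assumption, not the construction of [19]; the names opaque_true, guarded_increment and runtime_noise are hypothetical. It relies on the fact that the product of two consecutive integers is always even.

```c
#include <stdbool.h>

/* Always true: of two consecutive integers one is even, so their
   product is even. Unsigned arithmetic keeps wrap-around well defined,
   and wrapping modulo a power of two does not change the parity. */
static bool opaque_true(unsigned x)
{
    return (x * (x + 1u)) % 2u == 0u;
}

/* The else branch is bogus code: it is never executed, but a reverser
   (or a disassembler) has to prove that before ignoring it. */
int guarded_increment(int value, unsigned runtime_noise)
{
    if (opaque_true(runtime_noise)) {
        return value + 1;        /* real code */
    } else {
        return value * 31 - 7;   /* fake branch */
    }
}
```

The value of runtime_noise can come from anywhere (a counter, a timestamp, user input); the outcome of the condition does not depend on it.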
Reordering
Generally, reversers rely on the locality of the code, i.e. they assume that operations that reside near each other are somehow codependent. The idea here is to randomize the order of operations as much as possible. This is not always possible, because many operations are codependent. The order of operations that are not codependent could be randomized, see figure 7.11. This transformation has almost no effect on the performance of the final code. This technique is not very effective against automated deobfuscators but can confuse most human reversers.

Listing 7.7: Normal order of operations

    ; get parameter
    mov EAX, [EBX]
    push EAX
    call foo          ; result in ECX
    pop EAX

    ; get parameters
    ; for loop condition
    mov EDX, [EBP+14]
    mov ESI, [EDX]
    loop:
    ; operations
    cmp ECX, ESI
    je loop

Listing 7.8: Randomized order of operations

    mov EAX, [EBX]
    mov EDX, [EBP+14]
    push EAX
    mov ESI, [EDX]
    call foo          ; result in ECX
    loop:
    ; operations
    cmp ECX, ESI
    je loop
    pop EAX

Figure 7.11: Example of randomization of the order of operations.

7.3.3 Detection of digital reverse engineering

One obfuscating technique consists in detecting that someone is trying to reverse-engineer the program and then taking some precautions in order to complicate the reversing process. Basically, if the program detects that it is being reverse-engineered, it will jump into code that was designed to confuse reversers (see figure 7.12). Since a program can detect that it is analyzed only during its execution, this method of obfuscation mostly protects against live code analysis and patching.

Figure 7.12: Flowchart: idea of behavior in case of detection of DRE.

Generally, one of three actions is taken in order to stop the reverser:

Stop the program - the simplest thing to do; it will force the reverser to patch the obfuscated program (disable the part of the code that detects that the program is reversed) in order to proceed with the live-code analysis.

Execute meaningless code - another option; in this case the reverser will spend his time trying to understand a meaningless and irrelevant part of the program. This change in the execution may be unseen by the reverser, and will not be detected by automated deobfuscators (because an automated deobfuscator can not distinguish between interesting and meaningless code).

Crash the system - the most radical thing that can be done. The most harmless variant is deleting or wiping (4) the executable file of the program. The most harmful method consists in crashing the entire system, in order to do maximum damage.

(4) When a file is wiped, it means that before deleting the file some meaningless data is written into it, in order to overwrite the original data. In such a way, even when the memory blocks are restored, the original data can not be retrieved.

Methods used in order to detect different reversing tools and reversing techniques are varied. Here we present the most common techniques used to detect the presence of the most wide-spread reversing tools.

Detecting debuggers
A debugger is one of the most used reversing tools (see Section 6.4). Obfuscators are thus very interested in ways that can detect the presence of a debugger. Obfuscation techniques that detect the presence of a debugger are more effective if combined with packing techniques. If automated unpackers are unable to extract the original code from the obfuscated file, the reverser is forced to use a debugger in order to analyze it. Techniques used for detecting debuggers have two significant disadvantages:

Almost all solutions are platform-specific - this means that the programmer has to know what kind of system will execute his program. The way to reduce the impact of this disadvantage is to implement several debugger detecting techniques.
False positives - could be generated by the part of the code that detects the debugger, so the final code can malfunction even if the debugger is not present.

Generally, it is easier to detect a user-mode debugger than a kernel-mode debugger. This is due to the fact that a kernel-mode debugger has less 'direct' impact on the program. Kernel-mode debuggers are not attached to the process directly, but observe the program from the kernel (see more in Section 6.4).

Since almost all solutions for detecting the presence of a debugger are platform-specific, it is difficult to describe all of them. Here below some general ideas that are used to detect debuggers are presented.

Some systems have an application programming interface (API) which returns true if a user-mode debugger is present, e.g. the IsDebuggerPresent API in Windows. This method is not very effective because an API call is easy to detect and easy to bypass. However it could be improved by inlining the code of the API in the program instead of calling it (see Section 5.2.2 about inlining).

Windows also allows to make a request SystemKernelDebuggerInformation, which returns a structure (see figure 7.13). This structure shows if a kernel debugger is present and if it is activated.

Listing 7.9: Structure SYSTEM_KERNEL_DEBUGGER_INFORMATION

    typedef struct SYSTEM_KERNEL_DEBUGGER_INFORMATION {
        bool DebuggerEnabled;
        bool DebuggerNotPresent;
    } SYSTEM_KERNEL_DEBUGGER_INFORMATION,
      *PSYSTEM_KERNEL_DEBUGGER_INFORMATION;

Figure 7.13: Structure returned by the request SystemKernelDebuggerInformation.

In Linux OS there is also a way to discover if the program is being traced, see an example in figure 7.14.

Listing 7.10: debuggerPresent program for Linux

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ptrace.h>
    int main() {
        printf("Hello!\n");
        /* the request fails if the process is already being traced */
        int trace = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        if (trace) {
            printf("Debugger is present!\n");
        }
        return EXIT_SUCCESS;
    }

Output:

    /> ./debuggerPresent
    Hello!
    /> gdb ./debuggerPresent
    (gdb) run
    Hello!
    Debugger is present!
    />

Figure 7.14: C debuggerPresent program for Linux and its output.

Another approach is more generic and consists in the use of the Trap Flag. The Trap Flag is a flag defined for x86 processors. When the Trap Flag is activated, the processor will execute only one instruction and then raise an interruption in order to allow the debugger to inspect the debuggee. The idea is to enable the Trap Flag and check if the exception was raised. If the exception was not raised, it means that the debugger handled it for us, i.e. that the debugger is present.

The use of checksums is another generic approach that allows to detect if the debugger is present. When debuggers set software breakpoints, they change the code of the program. If the program checks its integrity, it will detect that its code was modified. Read more about checksums in Section 9.3. This method works if the debugger sets a software breakpoint. It could be bypassed by the use of hardware breakpoints, see Section 6.4 about software and hardware breakpoints.

Sometimes, debuggers may also be detected by measuring the time spent in a given procedure. However this method could give a lot of false positives, e.g. if the process has a very low priority and is often rescheduled.

Detecting virtual environments
Since all reversers work in virtual machines (VM) (see Section 6.2.1), from the obfuscator's point of view it is interesting to be able to detect virtual environments. The main ideas used for detecting virtual environments are similar to the ideas used for detecting debuggers. Detection of virtual environments has the same general disadvantages:

All solutions are platform-specific
False positives

However there are several differences between detecting debuggers and detecting virtual environments. First of all, detecting a virtual environment is much more difficult than detecting a debugger. Nowadays, virtual machines become more and more sophisticated and the differences between real (physical) and virtual machines tend to converge to zero. Secondly, virtual machines do not set any software breakpoints, so checksums can not help in the detection of virtual environments. Finally, virtual machines are not only used by reverse engineers (see more in Section 6.2). So even if a VM is detected, there are good chances that the program is not being reverse engineered.

Generally, detecting that the program is executed inside of a virtual machine is based on some differences between real and virtual machines (e.g. some low-level operations could have a different effect on a real and on a virtual machine). These differences exist because it is almost impossible to create an environment which would perfectly simulate the real machine. There would always be 'something missing' in the virtual machine, so there is always a more or less tricky way to detect a virtual environment, because even if a virtual machine perfectly simulates a real machine's hardware, the set of possible interactions between a user and the virtual machine is different from the set of possible interactions between a user and a physical machine.

Virtualization software always uses a rarely used key (or a combination of keys) in order to switch from the virtual to the physical machine, i.e. to release keyboard and mouse controls, see figure 7.15. For example, VirtualBox uses the right Ctrl key, so when the end user presses the right control (ctrl) key, VirtualBox releases the mouse and the keyboard from the virtual machine (it means that the right key is not pressed in the virtual machine). Imagine the following scenario: a program might ask a user to press the right ctrl key on the keyboard. If VirtualBox is used (with its default configuration), the user would not be able to press the right ctrl key in the virtual machine. The program might ask to press a random key from the set of all keys used in order to switch from virtual to physical machine. Of course, the virtualization software could implement a functionality that sends the signal (used to switch from virtual to physical machine) to the virtual machine. In that case the reversed program might ask to press different keys and the switching key in order to measure the time between two signals (two keys pressed). Such kinds of tests are only limited by the imagination of the developer.

Figure 7.15: Virtualization: example of passing control (keyboard and mouse) from virtual to physical machine.

Detecting patching
Reverse engineers use patching, i.e. modification of the original code of the analyzed program, in several cases:

Disabling a part of a program - generally used for protection removal. Also used in order to crack programs.
Forcing a branch - mostly used in live code analysis, when the reverser wants to analyze a precise part of the code.
Modifying (sometimes adding) functionalities - could be useful in some rare cases.

All methods that detect patching use error detection mechanisms (see more in Section 9.1). The main idea is to add a control checksum to the file and to have a control procedure. This control procedure would be called just before the execution of the code of a function in order to calculate the checksum of the function and compare it with the precalculated control checksum. If the checksums are equal (i.e. the file was not modified) the program continues its execution, otherwise the program stops.
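The control procedure described in the part on detecting patching can be sketched in C as follows. This is a minimal sketch under several assumptions: a plain byte buffer stands in for the machine code of the protected function, the rotate-and-XOR checksum is chosen only for brevity (it is not one of the error detecting codes discussed in Section 9), and the names region_checksum and region_is_intact are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy checksum: rotate the running value left by one bit, then XOR in
   the next byte. Any change of a single byte changes the result. */
uint32_t region_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (sum << 1) | (sum >> 31);   /* rotate left by one bit */
        sum ^= data[i];
    }
    return sum;
}

/* Control procedure: returns 1 if the region still matches the
   precalculated control checksum, 0 if it was modified. */
int region_is_intact(const uint8_t *data, size_t len, uint32_t expected)
{
    return region_checksum(data, len) == expected;
}
```

In a protected program the caller would stop (or take one of the other actions listed in Section 7.3.3) when region_is_intact returns 0, for instance after a reverser has replaced a conditional jump (75) by an unconditional one (eb) or has set a software breakpoint.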
Calculating checksums is an expensive operation, i.e. it takes a lot of time. If the program does many checksum verifications, its execution speed is reduced dramatically. A good way to use checksums is to calculate them only for the most sensitive parts of the code. Generally programs have several highly sensitive functions that are called at the initialization of the program (this is always the case of programs that have license verification procedures).

Ironically, anti-patching techniques could be deactivated by patching. There are two ways to deactivate checking procedures:

Patch instructions - disable the checking procedure or circumvent (disable) the function call to the checking procedure.
Patch data - recalculate the checksum for the patched part of the code/data.

Other techniques used against patching are discussed in Section 9. Section 9.3 presents more information about error detecting codes and techniques used against patching.

7.3.4 Crashing and confusing reversing tools

One of the countermeasures against DRE is to confuse or crash the reversing tool, or to make it produce an incorrect output. The most extreme type of such countermeasure is to exploit a bug in the analyzing tool and to take control of it. This last option is extremely rare and very difficult to implement, because its success depends on the precise version of a precise tool that is used for analysis.

Confusing disassemblers
Since a disassembler is one of the most important reversing tools, confusing a disassembler is a very interesting obfuscation technique. The basic idea consists in adding a piece of data into the code section of a program. Disassemblers that use a linear sweep algorithm (see Section 6.3) would interpret this data as an instruction. If the value of the inserted data is well chosen, all or several following instructions will be misinterpreted by the disassembler. This technique is also called byte insertion, because generally only one byte is inserted in order to confuse a disassembler. This technique is not very difficult to implement and has almost no effect on the execution speed of the program.

See an example in figure 7.16. It shows how a byte could be inserted into the code section of a program without affecting the normal execution of the program. The program will still be executed correctly, but will be disassembled incorrectly by linear sweep disassemblers. Disassemblers which use recursive traversal algorithms (see Section 6.3) will disassemble the code such as in figure 7.16 (listing 7.12) correctly. Listings 7.13 and 7.14 show how such code would be disassembled by recursive traversal and linear sweep disassemblers. The destination address of a short jump, e.g. adr+04 in listing 7.13, is calculated from the address of the next instruction (adr+03) plus the parameter 01 (the instruction pointer is advanced past the jump automatically). The instruction jne (jump-if-not-equal) has bytecode 75 (in hexadecimal) and needs a parameter: the next byte. In listing 7.14 the bytecodes of the instructions following adr+05 will also be disassembled with errors. Disassemblers that use recursive traversal algorithms could be used in order to detect byte insertions.

Listing 7.11: Original code

    instruction 1
    instruction 2
    ...

Listing 7.12: Modified code

    instruction 1
    jump continue
    DATA BYTE[75h]    ; one byte insertion = bogus instruction
    continue:
    instruction 2

Listing 7.13: Modified code disassembled correctly

    adr+00   bytecode1      instruction 1
    adr+01   eb 01          jump <adr+04> = <(adr+03)+01>
    adr+03   75             DATA
    adr+04   bytecode2      instruction 2
    adr+05   ...            ...

Listing 7.14: Modified code disassembled incorrectly

    adr+00   bytecode1      instruction 1
    adr+01   eb 01          jump <adr+04>
    adr+03   75 bytecode2   jne <(adr+05)+bytecode2>
    adr+05   ...            ...

Figure 7.16: An example of one byte insertion into the code section of a program.

Since recursive traversal disassemblers use heuristics in order to estimate when to stop, it is still possible to make them produce wrong results (the code is not fully disassembled), at least for some parts of the code. Opaque predicates (see Section 7.3.2) could also be used in order to confuse recursive traversal disassemblers. If opaque predicates are used, the disassembler would not always be able to tell if a section of the code could be accessed or not.

Confusing decompilers
Since decompilers for languages compiled to bytecode can produce good results (see the section about decompilers in Section 6), it is interesting to try to obfuscate the final executable files of programs written in such languages as Java or .NET. Differences between the bytecode and the high-level language can be used in order to confuse a decompiler. For example, there is a goto statement in Java bytecode, but not in the high-level language. This could be used in order to break the control flow of the program in such a way that a decompiler would not be able to create a corresponding high-level language structure, see an example in figure 7.17.

Listing 7.15: Normal flow

    loop1:
        statement1_1;
        statement1_2;
        goto loop1;
    loop2:
        statement2_1;
        statement2_2;
        goto loop2;

Listing 7.16: Two loops with an overlay

    loop1:
        statement1_1;
    loop2:
        if (in_loop2) {
            statement2_1;
        }
        if (in_loop1) {
            statement1_2;
            goto loop1;
        }
        statement2_2;
        goto loop2;

Figure 7.17: Example: two loops with an overlay.

7.3.5 Data transformations

The way different variables and data structures are stored and handled could reveal their purpose and their meaning, i.e. what a given variable represents (a counter, a sum, coordinates, etc). Data transformations can significantly reduce the readability of the code and will make the work of the reverser harder. Many data transformations are possible; here are some general rules that may be applied in order to obfuscate data.

Encoding formats
There exist many standard formats that are used to encode (represent) data, e.g. two's complement, binary-coded decimal (BCD), Gray code, etc (see figure 7.18). The use of several different formats and of non-standard encodings will reduce the readability of the program. The main idea consists in changing the 'normal' order of bits in a byte (or bytes in a word) or introducing bogus values in the real data. For example, all bits in a byte can be shifted n positions to the left, see the example for n = 1 in figure 7.19. In this case all numbers would be multiplied by 2^n. Many such transformations can be done in high-level languages (see listings 7.17 and 7.18). The reverser could spend a lot of time to understand this logic.

    Number   Binary   BCD         Gray code
    0        0000     0000        0000
    1        0001     0001        0001
    2        0010     0010        0011
    3        0011     0011        0010
    4        0100     0100        0110
    5        0101     0101        0111
    6        0110     0110        0101
    7        0111     0111        0100
    8        1000     1000        1100
    9        1001     1001        1101
    10       1010     0001 0000   1111
    11       1011     0001 0001   1110
    12       1100     0001 0010   1010
    13       1101     0001 0011   1011
    14       1110     0001 0100   1001
    15       1111     0001 0101   1000

Figure 7.18: Examples of different binary representations for numbers.

    Decimal number   Normal     Shifted
    0                00000000   00000000
    1                00000001   00000010
    2                00000010   00000100
    3                00000011   00000110
    4                00000100   00001000
    5                00000101   00001010
    6                00000110   00001100
    7                00000111   00001110
    8                00001000   00010000
    9                00001001   00010010
    10               00001010   00010100
    11               00001011   00010110
    12               00001100   00011000
    13               00001101   00011010
    14               00001110   00011100
    15               00001111   00011110

Listing 7.17: Normal code

    for (int i = 0; i < 16; i++) {
        // ... do something ...
    }

Listing 7.18: With shifted encoding

    for (int i = 0; i < 32; i += 2) {
        // ... do something ...
    }

Figure 7.19: Example: all bits in a variable are shifted one position to the left. Example inspired from [19]
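The two encodings of figures 7.18 and 7.19 are short enough to be written out. The following C sketch shows the binary-reflected Gray code and the shifted encoding together with their inverse transformations; the function names are hypothetical, and the shifted encoding assumes a shift of at most 8 so that the result fits in 16 bits.

```c
#include <stdint.h>

/* Binary-reflected Gray code of n (Gray code column of figure 7.18). */
uint8_t to_gray(uint8_t n)
{
    return (uint8_t)(n ^ (n >> 1));
}

/* Inverse transformation: XOR together all right-shifted copies of g. */
uint8_t from_gray(uint8_t g)
{
    uint8_t n = g;
    n ^= (uint8_t)(n >> 1);
    n ^= (uint8_t)(n >> 2);
    n ^= (uint8_t)(n >> 4);
    return n;
}

/* Shifted encoding (figure 7.19): the value is stored multiplied by
   2^shift. Assumes shift <= 8. */
uint16_t shift_encode(uint8_t value, unsigned shift)
{
    return (uint16_t)((unsigned)value << shift);
}

/* Recovers the original value from its shifted representation. */
uint8_t shift_decode(uint16_t stored, unsigned shift)
{
    return (uint8_t)(stored >> shift);
}
```

A program that keeps a counter in Gray code, or multiplied by 2^n, still computes the same results once every use is wrapped in the matching decode step, but the raw values seen in a debugger no longer look like a counter.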
Data structures
The structures used to store data can be altered in order to mislead reversers. For example, several arrays that contain totally independent information could be stored together: concatenated or mixed (see figure 7.20).

    Array A:                  α1 α2 ... αn
    Array B:                  β1 β2 ... βn
    Array AB, concatenated:   α1 α2 ... αn β1 β2 ... βn
    Array AB, mixed:          α1 β1 α2 β2 ... αn βn

Figure 7.20: Example of how two arrays could be stored.

7.3.6 Hiding data

Reversers always look for interesting pieces of data and code. Finding a good way to hide sensitive data improves the resistance of a program to reverse engineering and raises its level of protection.

Hiding strings
During the reversing process reversers often look for strings stored in the file (see Section 6.2). A reverser can execute a program and observe its output, so generally, reversers can know only the cleartext form of the strings that they are looking for. Several precautions can be taken in order to prevent reversers from finding these strings. For example, encryption (see Section 7.3.1) could be used. All strings could be stored in encrypted form, and decrypted before each use. If a string is stored in its encrypted form (as a ciphertext), standard modules (embedded in different reversing tools) that search for patterns become useless. The decryption key could be hidden somewhere in the file or in some rare cases generated from the input (see [39]).

Another way to prevent reversers from finding messages stored in the file is to store them as images, e.g. use an image with an error message instead of the standard 'normal' error message. In this last case an image would be shown instead of a text message. In order to use this technique, the function that displays windows with error messages should be overwritten. This should be done in such a way that the reverser would not suspect that it is an image.

Steganography
The word steganography comes from the Greek words στεγανός [steganos], which means "covered", and γραφή [graphei], which means "writing". So steganography means "concealed writing". Steganography is a science of writing hidden messages and thus could be used in order to hide any type of data.

One of the most famous (and one of the oldest) cases of steganography is the case of Histiaeus. He tattooed a secret message on the shaved head of his slave. After some time, the message was hidden, covered by his hair. Nowadays digital steganography is used more often than tattoos on slaves' heads. Secret messages can be embedded into images, video or audio files.

Unlike cryptography, steganography is a form of security through obscurity (see Section 7.2), i.e. in theory only the sender and the receiver know how and where the message was hidden. Steganography has an advantage over cryptography. If a message is encrypted, in most cases this means that the message contains something 'interesting' and it will attract the attention of reversers. When steganography is used, the message might go unnoticed, a possibility which is of great interest for obfuscation. Note that steganography could be used together with cryptography, in which case the message would be encrypted and then hidden.

Generally steganographic messages may be more easily hidden in messages that contain a lot of redundancies, i.e. in files that are not stored using compressed file formats. For example a secret message could be embedded into an image that is stored in a format that does not use a lossy compression (see Section 7.1). In order to represent an image, each color receives a numerical value. Since in most cases the human eye can not distinguish the color represented by "11111111" from the color represented by "11111110", the least significant bit of each pixel could be used for purposes of steganography. If lossy compression is used, the receiver would not be able to restore the message. There exist programs that can help to find embedded steganographic messages. Also see [54].

7.3.7 Eliminating symbolic information

Names of variables and functions certainly help the reverser to do DRE. So, eliminating all symbolic information will complicate the analysis. In the case of compiled languages (see Section 5.1), part of the symbolic information is eliminated by the compiler (e.g. names of variables). Such information would not be eliminated in case of compiled to bytecode languages. This information remains in the final compiled code and it is used by the interpreter for cross-referencing (instead of addresses).

Another kind of symbolic information that could be replaced by something misleading is stored in the header (or in the footer) of the file, e.g. the name and the version of the compiler (see the example in Section 6). In the example in figure 7.21, the compiler (g++) added a string with its version and its name in the executable file. In order to confuse reversers, the original string "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3" could be replaced by something like "GNU Fortran (openSUSE 3.2.3-3suse4) 3.2.3".

7.3.8 Human reversers versus automated deobfuscators

Obfuscation techniques that prevent deobfuscation by automated systems are different from techniques used to prevent deobfuscation by human reversers. Some techniques could easily fool a debugger, but a more or less experienced reverser would understand what is going on in less than a minute. The opposite is also possible: some techniques are effective only against human reversers, but not against automated deobfuscators. Generally, in the DRE process the first part is done by automated deobfuscators (e.g. extraction of the packed code) and then human reversers try to reverse-engineer the rest. It is better to use a combination of techniques against human reversers and techniques against automated deobfuscators.

    /> g++ --version
    g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3
    Copyright (C) 2009 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Figure 7.21: View of hello world program in ghex hex editor. Compiler: GCC, version: 4.4.3, compiled on Ubuntu OS. These info can be seen in the string "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3". Command g++ --version gives the same information.
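The compiler banner of figure 7.21 is exactly the kind of symbolic information that a plain byte search finds in a fraction of a second, and it can be overwritten just as easily. The following C sketch illustrates both steps on a memory buffer that stands in for the executable file. The function names are hypothetical, and the sketch assumes that the fake banner is not longer than the original one, so that the file size and all following offsets stay unchanged (the shorter banner is padded with spaces).

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of needle in buf,
   or -1 if it does not occur. */
long find_bytes(const unsigned char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || n > len) {
        return -1;
    }
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Overwrites old_banner with fake_banner in place, padding with spaces.
   Returns 0 on success, -1 if the old banner is missing or the fake one
   is too long to fit. */
int replace_banner(unsigned char *buf, size_t len,
                   const char *old_banner, const char *fake_banner)
{
    size_t old_n = strlen(old_banner);
    size_t fake_n = strlen(fake_banner);
    long at = find_bytes(buf, len, old_banner);
    if (at < 0 || fake_n > old_n) {
        return -1;
    }
    memcpy(buf + at, fake_banner, fake_n);
    memset(buf + at + fake_n, ' ', old_n - fake_n);
    return 0;
}
```

After the replacement, a search for "GCC:" in the buffer fails, and a tool that identifies the compiler from this string would report the misleading value instead.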
7.4 Pushing the reversing problem out of the software world

As was already mentioned in Section 7.2, it is impossible to create a perfectly secured system. In the case of DRE, the reverser has full access to the executable file and can do whatever he (she) wants with it. This means that, sooner or later, the file will be cracked (if it is cost effective).

There exist several solutions that consist of taking the sensitive parts of the code and putting them in a place that is inaccessible (using only software means) for reversers. If such a solution is applied (assuming that the solution was properly designed and implemented), reverse engineering becomes significantly harder and thus requires more resources. The reversing could be achieved only by a combination of digital reverse engineering and hardware reverse engineering. The main solutions that protect against software reverse engineering are presented below.

7.4.1 Program as a service

The first solution consists in implementing a client-server solution. In this case, the sensitive code is executed and stored on the server side. All end users execute a client's code which contains no sensitive parts. Clients make requests to the server (like requests to a database). The server handles these requests and sends the result back to the clients. In such an implementation the sensitive code can never be accessed by the end users.

Several years ago this solution was not realistic. Nowadays, when many computers are connected to the Internet and technology permits high-bandwidth interconnectivity, a client-server solution is possible. Note that this kind of network solution is not suitable for all kinds of applications.

There exist many solutions in terms of how the end users might pay for such a service:
Data volume processed - by megabytes or gigabytes.
Time spent - time of connection between the client and the server, or time that the server spends treating the clients' requests.
Package - for a month, for a year, or for a number of requests or connections.
7.4.2 Cryptoprocessors

Another solution is to use encryption (see definition 20). The code of a sensitive program is stored on the user's computer, but it is encrypted and can not be accessed (in theory) without a secret decryption key. The difference with the idea described earlier in this chapter consists in the place where the decryption key is stored. The idea is to create a processor that can execute encrypted code (by decrypting it on the fly). See an example of a procedure in figure 7.22.

Implementing such a processor is not enough: a whole system must be created in order to support cryptoprocessors. Cryptoprocessors have to separate different processes and prevent them from accessing each other's data, otherwise a reverser can write a program that could access the data of the encrypted program (by interacting with it).

Manufacturing
1. The manufacturer asks a certification authority (CA) for a set of identification numbers (ID) and corresponding private keys.
2. During manufacturing, each processor receives an ID and a corresponding private key.
3. Processors are sold.

Purchasing a program
1. An end user buys a program.
2. The software developer asks him for his processor ID.
3. The user sends his processor ID.
4. The software manufacturer asks the CA for the public key that corresponds to the ID of the processor.
5. The CA responds by sending the public key of the processor.
6. The software developer encrypts the program with the public key.
7. The software developer sends the encrypted program to the user.
8. The user is happy.

Note that steps 3 and 4 could be replaced by: "the user sends the processor's public key".

Figure 7.22: Basic steps for protecting a program using a cryptoprocessor.
7.4.3 Dongles

The use of software protection dongles is a very interesting solution that could be used as a countermeasure against digital reverse engineering.

Definition 31. A dongle (see footnote 5) is a piece of hardware that plugs into a computer.

Footnote 5: In the first place, the term dongle was used for any piece of hardware that plugs into a computer. Nowadays this term is used when the dongle serves copyright protection, e.g. contains a license key. Instead of calling all types of hardware "a dongle", different names are used for little hardware devices (depending on their purpose). Pluggable storage devices are generally called usb-keys or flash drives; authentication devices are called security tokens (there also exist security tokens that do not plug into a computer).

A software protection dongle usually contains a microcontroller. The use of dongles can be seen as a hybrid of the two previously discussed solutions (cryptoprocessors and client-server): the dongle plays the role of a server (it returns responses to user requests). The code of sensitive functions is usually encrypted. Since the dongle is accessible by the end user, it has to be tamper resistant (see Section 7.4.5), otherwise the dongle would be reverse-engineered more or less easily.

The basic idea consists in having a dongle which is able to execute the sensitive code and return the answers to the main program. Two general schemes (in terms of where the sensitive code is stored) that could be used in a dongle solution are presented below.
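Store all code together

In this case, the sensitive code is encrypted and stored with the main program. When the main program needs to execute the sensitive part of the code, it sends the encrypted code and eventual parameters to the dongle and then waits for results. The dongle can contain a cryptoprocessor or a processor, a decryption module and a secret decryption key. In the first case it executes the encrypted code directly and then returns the result to the main program. In the second case, the dongle decrypts the code, executes it and then sends the result back to the main program.

The first advantage of this scheme is the simplicity with which code updates may be handled. If the developer updates a sensitive part of the code, he can just include its encrypted version in general updates. The second advantage is the storage space that the dongle needs. If several parts of the program that contain sensitive code are encrypted, the dongle only has to be able to store the biggest encrypted function (and its eventual parameters).

The disadvantage of storing the encrypted sensitive code with the main code lies in a security issue. Since the encrypted code can be accessed by the reverser, it could also be modified. There exist attacks that consist in sending incorrect messages (which will be handled as encrypted code by the dongle) and observing the results.

Store the sensitive code on the dongle

This solution has its own advantages and disadvantages. If this scheme is applied, all sensitive functions are stored on the dongle. When the main program needs the result of one particular function, it sends the id (a number or a name) of the function and its parameters. In its turn, the dongle executes the function and sends its results back to the main program.

The main advantage of such an implementation (when the sensitive code is stored on the dongle) is that the sensitive code is harder to obtain. Supposing that the dongle is well conceived, DRE could be done only by using hardware hacks. Note that the code that is contained on the dongle does not have to be encrypted. If it is not, the dongle has to be conceived in such a way that the code can not be accessed from the outside. In order to add one more level of protection, the code could be encrypted. In this last case, if the reverser finally obtains the code (by means of some hardware hacks) he still would not be able to read it (except in the case where the reverser finds both the encrypted code and the decryption key).

There are two main disadvantages of storing all sensitive code and data on the dongle. First of all, updates are not as easy as in the "store all code together" solution. Secondly, the dongle needs to have a larger storage space in order to store all sensitive functions (and their eventual parameters).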
7.4.4 Trusted computing

The trusted computing platform (see [49]) is another solution that could be used in order to protect programs from being reversed. The trusted computing concept includes hardware and software; it also relies on cryptography. The main idea is to be able to verify that only authorized code is executed on the system.

Trusted computing is not only for anti-reverse engineering purposes. One of the main motivations was preventing users from sharing copyrighted files (e.g. music, films or programs). Trusted computing could also be used for:
Disk encryption
Platform integrity verification
Digital Rights Management
Password protection

The Trusted Computing Group (a non-profit organization) promotes trusted computing and creates specifications needed to meet the requirements of a particular trusted system. See more on http://www.trustedcomputinggroup.org/.

Trusted computing has a lot of opponents. Their main criticism is based on the fact that trusted computing restricts the end user too much. While using trusted computing, the end users would lose their anonymity and full control over their data (inability to move files from one computer to another). Also see [49].
7.4.5 Hardware protections summary

Hardware solutions do not offer full protection against reverse engineering. They transform the reversing from a pure software operation into a hardware one. Use of hardware protection does not mean that a program can not be reverse engineered.

Since the hardware is accessible by the end user, in order to be difficult to reverse engineer, hardware has to be tamper resistant, i.e. hard to tamper with even with physical access to it. There exist many hardware protections that make tampering difficult, e.g. screws with special (non-standard) heads, chips erasing their memory if opened (exposed to sunlight), etc. There also exist many techniques that could be used in order to overcome hardware protections, e.g. power analysis. See more about hardware reverse engineering in [25].

Chapter 8 Applied reversing

General knowledge about reversing tools, reversing techniques and obfuscation techniques is required for doing DRE. Unfortunately, in most real world cases, it is not enough. When analysing a file, a reverser has to know as much as possible about the underlying operating system, the assembly language etc. (see chapter 5). He or she also has to know as much as possible about error handling, more exactly, what results are generated by different errors. The only effective way to learn about outputs produced because of errors is practice.

One of the most difficult questions remains: what does a given error message mean exactly? That is, why did the error occur, where in the code did it happen and how can it be corrected (see figure 8.1)? Software developers also encounter this problem, but generally compilers and debuggers help to answer these questions quickly (see figure 8.2).

    /> ./autoCorrect
    Segmentation fault

Figure 8.1: Error: Segmentation fault. Reason: the process tried to access a part of memory that does not belong to it.

    /> g++ hello_world.cpp -o hello_world
    hello_world.cpp: In function 'int main()':
    hello_world.cpp:9: error: expected ';' before 'return'

Figure 8.2: g++ compiler output for a common error: forgotten semicolon.
8.1 Training

As was already mentioned, training is very important for learning DRE. The main difficulty in training for DRE consists in choosing a suitable target or a program for practicing. This is mostly due to two reasons:

Choosing the right level of difficulty - most modern pieces of software are very difficult to reverse-engineer, so a beginner should not start practicing using such programs. Unfortunately, there is no way to know the level of difficulty of reversing a given program. It is also impossible to know what kinds of obfuscating techniques were used (before starting DRE).
Legislation for DRE is not clear - in some countries reverse engineering is not always legal. Most of the time the legislation is not clear enough to understand if someone can legally reverse-engineer a given program.

Regardless of these difficulties, there exist many programs that are good to start DRE training with. Generally, these programs are called CrackMe or KeygenMe. There exist entire databases of such programs. Sites like http://www.bright-shadows.net/ propose such training programs for reverse engineers.

There are two main advantages in using programs such as CrackMe. First of all, these programs were created to be reverse-engineered, so reversing them is legal. Secondly, users who were able to reverse a program can generally evaluate the level of difficulty of reversing it. This means that a beginner can start from easier DRE challenges without knowing what obfuscation techniques (or any other kind of difficulties) he will have to deal with. If the reverser knows what he will be dealing with, the reversing process becomes easier, less interesting and less didactic.

Part III Contribution

Chapter 9 Anti-patching

Usually patching is used in one of the following cases:
Updating - in this case, the software developer modifies his own code. This is done in order to fix bugs or in order to add a new functionality to the program.
Reverse engineering - in this case, a reverse engineer modifies the program in order to accomplish his (or her) goal.
Cracking - usually done by a hacker; most of the time it is done in order to break the protection of a program.

9.1 A known problem

A software developer has to be able to fix bugs and update his own product. In this case patching is 100% legitimate and raises no questions. At the same time, developers of proprietary software do not want others to reverse-engineer and modify (patch) their programs. Software developers may not want their programs to be patched for several reasons:

Proprietary algorithms and protocols - many proprietary programs use secret algorithms and protocols that are better (more performant) than equivalents used by other developers, e.g. the protocol used by Skype or the algorithms used by the Oracle database manager.
Illegal and free copies - in many cases, patching is used in order to break or circumvent protections. Once the software protection is broken, the cracked program can spread very quickly. Nowadays, protections of almost all proprietary non-free programs are cracked as soon as they appear on the market.
Embedded malware - software developers do not want their programs to be modified because a piece of malicious code could be embedded in their code.

9.2 Existing solutions

Different solutions can protect programs from patching. The idea consists in checking the integrity of the code (or data) before executing (or using) it (also see Section 7.3.3). General schemes and ideas of existing solutions are presented below.

9.2.1 Manual checking

One of the first protections against patching was manual checking of a checksum. Generally, an MD5 (Message-Digest Algorithm 5) hash is used (see Section 9.3.2). Imagine the following scenario:
Alice obtains a program.
She computes the MD5 hash of the program and compares it to the value given by some trusted authority, e.g. she looks for it on the website of the software developer (who developed the original program).
Alice installs the program if and only if the values match.
Manual checking can prevent legitimate users from installing software with embedded malware. However, not all users check whether the program that they install was modified. This solution can only protect legitimate users from using modified programs; it does not protect programs from being modified.

Only cryptographic collision resistant hash functions (see Section 9.3.2) should be used for checking. Otherwise, a malicious person would be able to create a program that has the same hash value as a legitimate program. Note that practical collision attacks on MD5 are known, so MD5 no longer meets this requirement.

This protection could be defeated. For example, a hacker who is trying to embed a malicious program into a legitimate one can find a second preimage for the hash function (see the definition in Section 9.3.2). He can also try to break into the server of the software developer and replace the hash value of the program on it.

9.2.2 Automatic error detection

This technique is very similar to the previous one, except that it is not done manually by the end user. As a result, this technique offers an additional advantage: it protects programs from being modified. As was already mentioned in the previous section, not all users check new software before installation, so programs started to check their integrity by themselves. The idea is to use error detecting codes (see Sections 9.3 and 9.3.2); if an error is detected, it means that the original program was modified.

In order to check themselves, programs use the following scheme: before executing a part of the program (e.g. before executing a sensitive function), the program checks its checksum. If the value is correct, the execution continues normally. Otherwise, the program changes its normal behavior, see Section 7.3.3 and figure 7.12. Generally, cryptographic collision resistant hash functions are used as checksums (see Section 9.3.2).

In order to break this protection, all functions that check a patched part of the program have to be disabled; alternatively, the protection can be circumvented by recalculating the checksum.
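The self-checking scheme described above (check a checksum, then either continue or change behavior) can be sketched as follows. This is only an illustration under stated assumptions: the names fnv1a64 and runIfIntact are hypothetical, the protected code is modelled as a plain byte vector, and FNV-1a is used purely as a small stand-in checksum. FNV-1a is not a cryptographic hash and is not collision resistant, so a real protection would use a cryptographic hash function instead.

```cpp
#include <cstdint>
#include <vector>

// FNV-1a, 64-bit: a small NON-cryptographic checksum, used here only as a
// stand-in for the collision resistant hash that the text recommends.
std::uint64_t fnv1a64(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (std::uint8_t b : bytes) {
        hash ^= b;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Runs `sensitive` only if the checksum of `region` still matches the value
// recorded at build time (hypothetical helper). Returning false models the
// "program changes its normal behavior" branch.
template <typename Function>
bool runIfIntact(const std::vector<std::uint8_t>& region,
                 std::uint64_t expected, Function sensitive) {
    if (fnv1a64(region) != expected) {
        return false;  // a patch was detected
    }
    sensitive();
    return true;
}
```

As a usage example, consider a region containing the x86 opcode 0x74 (JE, a short conditional jump). Patching it to 0x75 (JNE) inverts the condition with a single byte change, and the checksum comparison in runIfIntact then fails, so the sensitive function is not executed.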
Breaking this protection can be very difficult, especially if many obfuscations (see chapter 7) were applied to the functions that check the program's integrity.

9.2.3 Check results of computations

This technique attempts to check the code integrity through data. It consists in checking the results of a given function after the function call. For example, consider a function QSort that sorts an array. If after the call of QSort the array is not sorted, it means that the function was altered or circumvented (assuming that the function was properly implemented in the first place), see [7].

This technique has a significant disadvantage: it checks the code after it was executed, which could be annoying if a part of a legitimate program was replaced by malicious code. This protection could also be defeated by disabling all checking procedures.

9.2.4 Algorithm TPCA: Checker Network

Mikhail J. Atallah and Hoi Chang patented an algorithm that detects and corrects patches, see [3]. The idea consists in having a network of checker procedures that verify the integrity of the entire code and of each other's code. Typically, they compute hash value(s) over a region of code. If a checker detects that a part of the code was patched, a responder (i.e. corrector) procedure replaces the patched region with a copy of the code stored elsewhere.
9.3 Error detecting and error correcting codes

Some time ago people started to use electromagnetic waves (e.g. radio waves, electricity) as a medium for information, i.e. electromagnetic waves are used to transfer information from one point (sender) to another (receiver). Waves interfere with each other, which means that a message could be modified during the transmission, i.e. the receiver will receive a message with errors and there is no guarantee that the message received is correct. In computer science all messages are streams of bits. Since one bit can take only one of two possible values (zero or one), a transmission error is also called a bit inversion.

In order to solve the problem of errors that occur during the transmission of a message, error detecting and error correcting codes may be used. Error detecting codes make it possible to detect that a message was modified during its transmission. Error correcting codes make it possible to detect and correct errors that occurred during the transmission of a message. This means that all error correcting codes are also error detecting codes.

Nowadays, error detection and error correction are used in almost all data transmissions, e.g. the TCP protocol and CD/DVD disks; all space rovers use error correction in order to send photos of space to the earth.

9.3.1 The idea behind error detection and correction

The main idea is to introduce redundancy into the message before transmitting it. If the sender and the receiver know what kind of redundancy should be introduced in the message, the receiver will be able to detect whether he received a correct message or whether errors (bit inversions) occurred during the transmission.

The efficiency of error correcting and error detecting codes can be evaluated using two parameters:
Redundancy rate - the amount of redundancy (otherwise useless information); the ratio between the size of useful information and the total size of the message is often used.
Number of errors - the maximum number of errors that could be detected and/or corrected in the message.

The general ideas used in error detection and error correction are presented below.
9.3.2 Error detecting codes

As was already mentioned, error detecting codes introduce redundancy in the message. The idea is that the receiver checks whether the message is valid, i.e. whether the message contains the specific redundancy. Then, if the message is valid, the receiver acknowledges it by sending an "OK" message to the sender. If the sender does not receive an acknowledgement in a certain period of time, or if he receives an incomprehensible answer, the sender sends the last message one more time. This scheme is similar to the scheme used by the TCP/IP protocol.

The redundancy that has to be added to the message can be calculated using different algorithms. This redundancy is generally called a checksum.

Repetition code

Here is an example of the most simple error detecting code, also known as repetition code. The idea is simply to repeat the original message (of size n) k times. See the example in figure 9.1. If this error detecting code is used, each message contains $\frac{n}{k \times n} = \frac{1}{k}$ of useful information and $\frac{k-1}{k}$ of redundancy.

    Original message   Message to send
    00                 000000
    01                 010101
    10                 101010
    11                 111111

    Examples of invalid messages: 010001, 110110, 010111, 001000

Figure 9.1: Example of repetition code, k = 3 and n = 2.

Such a code can detect up to k × n − 1 errors (single bit inversions), where n is the size of the original message and k is the number of repetitions. See the example of detection of k × n − 1 errors (all bits received incorrectly except one) in figure 9.2.

    Original message: "0000" (n = 4)
    Message to send: "0000 0000 0000" (k = 3)
    Received message: "1111 1110 1111"
    11 errors = 4 × 3 − 1

Figure 9.2: Example of detecting errors with repetition code.

However, this code is unable to detect k bit inversions if they all hit the same bit in every copy, i.e. if they occur at positions $i + t \times n$ for all $t \in [0, k-1]$, where $i \in [0, n-1]$. See the example in figure 9.3.

    Original message: "0000" (n = 4)
    Message to send: "0000 0000 0000" (k = 3)
    Received message: "0010 0010 0010"
    3 errors occurred at positions $2 + t \times 4$ for all $t \in [0, 2]$, where k = 3, n = 4 and i = 2.

Figure 9.3: Example of undetected errors, k = 3 and n = 4.
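The repetition code and its validity check can be sketched in a few lines. This is a minimal illustration, assuming messages are represented as strings of '0' and '1' characters; the function names encodeRepetition and isValidRepetition are hypothetical and do not come from the thesis.

```cpp
#include <cstddef>
#include <string>

// Repeat the n-bit message k times.
std::string encodeRepetition(const std::string& message, std::size_t k) {
    std::string encoded;
    encoded.reserve(message.size() * k);
    for (std::size_t t = 0; t < k; ++t) {
        encoded += message;
    }
    return encoded;
}

// A received word is valid only if it consists of k identical copies of
// n bits, i.e. every bit equals the bit at the same offset in the first copy.
bool isValidRepetition(const std::string& received, std::size_t n,
                       std::size_t k) {
    if (n == 0 || received.size() != n * k) {
        return false;
    }
    for (std::size_t pos = n; pos < received.size(); ++pos) {
        if (received[pos] != received[pos % n]) {
            return false;
        }
    }
    return true;
}
```

With n = 4 and k = 3, the word "111111101111" from the figure with 11 errors is rejected, while "001000100010" is accepted although it contains three inversions, because the same bit was inverted in every copy.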
Parity bits

Here is another family of slightly more sophisticated error detecting codes, called parity bits. This error detecting code adds only one bit to the message. The idea is to set the additional bit of the message to 0 or to 1 in order to obtain an even or an odd number of bits set to 1 in the message. There exist two types of parity bits code: even and odd. See an example in figure 9.4.

    Original message   Number of '1'   Even parity   Odd parity
    0100110            3               01001101      01001100
    0010100            2               00101000      00101001

Figure 9.4: Example of parity bits error detection code.

A message with a parity bits code contains $\frac{n-1}{n}$ of useful data, where n is the number of bits in the message. A parity bits error detecting code can detect any odd number of bit inversions in the message. It can not detect an even number of errors.

The parity bits error detection code has a big advantage: it is very simple to implement using a logical exclusive or (XOR), a very simple and fast instruction. See figure 9.5. Parity bits code is used in redundant arrays of independent disks (RAID 5).

    Original message: "01001010 11010110 11101001" (3 bytes)
    Original message presented as 8 messages of size n = 3; 8 parity bits will be used.

    message      011  111  001  010  101  010  110  001
    # '1'        2    3    1    1    2    1    2    1
    parity bit   0    1    1    1    0    1    0    1

    Same result using XOR:
    byte 1     01001010
    byte 2     11010110
    XOR (r1)   10011100

    r1         10011100
    byte 3     11101001
    XOR (r2)   01110101

    Byte containing parity bits: "01110101".

Figure 9.5: Implementation of even parity bits error detection code using XOR.
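The XOR implementation of even parity can be sketched as follows. This is an illustrative sketch only; the function names evenParityBit and evenParityByte are hypothetical. The second function reproduces the byte-wise computation of the XOR figure above: bit p of the result is the even parity bit of the bits found at position p in every byte.

```cpp
#include <cstdint>
#include <vector>

// Even parity bit of a single word: true (1) when the number of bits set
// to 1 is odd, so that the word plus its parity bit has an even count.
bool evenParityBit(std::uint8_t value) {
    bool parity = false;
    while (value != 0) {
        if (value & 1u) {
            parity = !parity;
        }
        value = static_cast<std::uint8_t>(value >> 1);
    }
    return parity;
}

// XOR of all bytes: one even parity bit per bit position, packed in a byte.
std::uint8_t evenParityByte(const std::vector<std::uint8_t>& bytes) {
    std::uint8_t parity = 0;
    for (std::uint8_t b : bytes) {
        parity ^= b;
    }
    return parity;
}
```

For the three bytes 01001010, 11010110 and 11101001 (0x4A, 0xD6, 0xE9) the result is 01110101 (0x75), as in the figure. Inverting one bit changes the parity byte, while inverting the same bit position in two different bytes leaves it unchanged, which illustrates why an even number of errors can go undetected.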
Hash functions

Hash functions are used for different purposes, e.g. hash tables and digital signatures. A cryptographic hash function could also be used in order to calculate a checksum. Because of the conditions (fixed size, collision resistance) that a cryptographic hash function has to satisfy, hash functions are good checksums.

Definition 32. A one-way hash function (OWHF) is a function h satisfying the following conditions:
The argument X can be of arbitrary length and the result h(X) has a fixed length of n bits.
The hash function must be one-way in the sense that given a Y in the image of h, it is computationally infeasible to find a message X such that h(X) = Y (preimage resistant), and given X it is computationally infeasible to find a message X' ≠ X such that h(X') = h(X) (second preimage resistant).
Definition from [2].

Definition 33. A collision resistant hash function is a function h satisfying the following conditions:
The argument X can be of arbitrary length and the result h(X) has a fixed length of n bits.
The hash function must be an OWHF.
The hash function must be collision resistant: it is computationally infeasible to find two distinct messages that hash to the same result.
Definition from [2].

The advantages of using cryptographic hash functions for error detection are:
Fixed size - a very small amount of redundant information is added to the original message.
Collision resistance - it is extremely unlikely that several bit inversions in a message produce another message with the same hash value.
9.3.3 Error correcting codes

Sometimes detecting an error is not enough, since in many cases information always arrives at the receiver with errors. For example, imagine that there is a little scratch on a CD disk: each time, the CD player will read the message with errors, no matter how many times the track is read.

There exist a variety of different error correcting codes (see [24]). Here, only a simple error correcting code (used in 9.4.2) is presented.

Generally, error correcting codes rely on the fact that if a received message is not a valid message then, most likely, the original message is the valid message closest to the received message. The Hamming distance is used to quantify the distance between two messages.

Definition 34. The Hamming distance between two codewords, c and c', is defined by: $d(c, c') = \mathrm{card}\{\, i \in [0, n-1] \mid c_i \neq c'_i \,\}$. In other words, the Hamming distance is the number of positions at which the corresponding symbols of two equal length strings are different. Definition from [2].

See the example in figure 9.6. In the case when a received message has more than one closest valid message, errors are detected, however they can not be corrected (see figure 9.7).

    Sender can send only two possible messages: "0000" or "1111".
    Suppose the receiver receives "1101".
    Hamming distance ("0000", "1101") = 3
    Hamming distance ("1111", "1101") = 1
    It is more likely that the sender sent "1111".

Figure 9.6: Example of a valid message close to the received invalid message.

    Sender can send only two possible messages: "0000" or "1111".
    Suppose the receiver receives "1001".
    Hamming distance ("0000", "1001") = 2
    Hamming distance ("1111", "1001") = 2
    Errors are detected, but can not be corrected.

Figure 9.7: Example of equal Hamming distances between the received message and two distinct valid messages.
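The nearest-codeword principle can be sketched directly from the definition of the Hamming distance. This is a minimal sketch, assuming codewords are strings of '0' and '1' of equal length; the names hammingDistance and decodeNearest are hypothetical, and returning an empty string to signal "detected but not correctable" is a choice made only for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Number of positions at which two equal-length strings differ.
std::size_t hammingDistance(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("codewords must have equal length");
    }
    std::size_t distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            ++distance;
        }
    }
    return distance;
}

// Returns the unique closest valid codeword, or an empty string when two
// or more valid codewords are equally close (no correction possible).
std::string decodeNearest(const std::string& received,
                          const std::vector<std::string>& codewords) {
    std::string best;
    std::size_t bestDistance = 0;
    bool found = false;
    bool tie = false;
    for (const std::string& candidate : codewords) {
        const std::size_t d = hammingDistance(received, candidate);
        if (!found || d < bestDistance) {
            best = candidate;
            bestDistance = d;
            found = true;
            tie = false;
        } else if (d == bestDistance) {
            tie = true;
        }
    }
    return tie ? std::string() : best;
}
```

With the two valid messages "0000" and "1111", the received word "1101" is decoded as "1111", while "1001" is at distance 2 from both and therefore can not be corrected.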
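Parity bits

The code described in Section 9.3.2 can only detect errors, but not correct them. In order to be able to correct a single bit inversion two things are required: the error must be detected and the place where the error occurred must be found.

There is a simple way to modify the parity bits error detecting code in order to transform it into an error correcting code. The idea is to represent a message as a matrix M of m × n bits and then add parity bits to each line and to each column (m + n parity bits). See figure 9.8.

    Original message: "0100 1010 1101 0110" (2 bytes)
    Message presented as a 4 × 4 matrix:

    Line          parity bit
    0100          0
    1010          1
    1101          0
    0110          1
    parity bits
    1010

    Parity bits for lines: "0101"
    Parity bits for columns: "1010"
    Message to send: "0100 1010 1101 0110 0101 1010"

Figure 9.8: Example of odd parity bits error correcting code.

Suppose that a message was received with one error (single bit inversion). In this case an error would be detected in a line i of M and an error would also be detected in a column j of M; it means that the bit inversion occurred in M[i][j]. One error is a single bit inversion, so it only takes inverting one bit in order to correct one error. See the example in figure 9.9.

    Original message: "0100 1010 1101 0110" (2 bytes)
    Sent message: "0100 1010 1101 0110 0101 1010"
    Received message (with one error): "0100 1000 1101 0110 0101 1010"
    Received message presented as a matrix M:

    Line   parity bit received   parity bit calculated
    0100   0                     0
    1000   1                     0
    1101   0                     0
    0110   1                     1

    Column parity bits received:   1010
    Column parity bits calculated: 1000

    There is an error in the second line and in the third column. M[2][3] has to be corrected from zero to one.

Figure 9.9: Example of odd parity bits correction of one error.

This error correcting code can always correct a single bit error. If there are more errors, they could be detected or undetected depending on their configuration. See the examples in figures 9.10 and 9.11.

    Original message: "0100 1010 1101 0110" (2 bytes)
    Sent message: "0100 1010 1101 0110 0101 1010"
    Received message (with two errors): "0100 1100 1101 0110 0101 1010"
    Received message presented as a matrix M:

    Line   parity bit received   parity bit calculated
    0100   0                     0
    1100   1                     1
    1101   0                     0
    0110   1                     1

    Column parity bits received:   1010
    Column parity bits calculated: 1100

    The message can not be corrected, since there is no way to know which lines of the matrix were affected by errors.

Figure 9.10: Example of 2 detected errors that could not be corrected.

    Original message: "0100 1010 1101 0110" (2 bytes)
    Sent message: "0100 1010 1101 0110 0101 1010"
    Received message (with four errors): "0100 0000 1101 1100 0101 1010"
    Received message presented as a matrix M:

    Line   parity bit received   parity bit calculated
    0100   0                     0
    0000   1                     1
    1101   0                     0
    1100   1                     1

    Column parity bits received:   1010
    Column parity bits calculated: 1010

    Errors were not detected. All parity bits are correct.

Figure 9.11: Example of 4 errors that could not be detected.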
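The line and column parity scheme of the figures above can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: the names BitMatrix, lineParities, columnParities and correctSingleError are hypothetical, the matrix is assumed to be rectangular with rows given as strings of '0' and '1', and the received parity bits themselves are assumed to be error free.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using BitMatrix = std::vector<std::string>;  // each row is a string of '0'/'1'

// Odd parity: the parity bit makes the total number of ones odd.
char oddParity(std::size_t ones) { return ones % 2 == 0 ? '1' : '0'; }

std::string lineParities(const BitMatrix& m) {
    std::string parities;
    for (const std::string& row : m) {
        std::size_t ones = 0;
        for (char c : row) ones += (c == '1');
        parities += oddParity(ones);
    }
    return parities;
}

std::string columnParities(const BitMatrix& m) {
    std::string parities;
    const std::size_t columns = m.empty() ? 0 : m[0].size();
    for (std::size_t j = 0; j < columns; ++j) {
        std::size_t ones = 0;
        for (const std::string& row : m) ones += (row[j] == '1');
        parities += oddParity(ones);
    }
    return parities;
}

enum class Outcome { NoErrorDetected, Corrected, Uncorrectable };

// Recomputes the parity bits and inverts M[i][j] when exactly one line i
// and exactly one column j disagree with the received parity bits.
Outcome correctSingleError(BitMatrix& m, const std::string& receivedLines,
                           const std::string& receivedColumns) {
    const std::string lines = lineParities(m);
    const std::string columns = columnParities(m);
    if (lines.size() != receivedLines.size() ||
        columns.size() != receivedColumns.size()) {
        return Outcome::Uncorrectable;
    }
    std::vector<std::size_t> badLines, badColumns;
    for (std::size_t i = 0; i < lines.size(); ++i)
        if (lines[i] != receivedLines[i]) badLines.push_back(i);
    for (std::size_t j = 0; j < columns.size(); ++j)
        if (columns[j] != receivedColumns[j]) badColumns.push_back(j);
    if (badLines.empty() && badColumns.empty()) return Outcome::NoErrorDetected;
    if (badLines.size() == 1 && badColumns.size() == 1) {
        char& bit = m[badLines[0]][badColumns[0]];
        bit = (bit == '0') ? '1' : '0';
        return Outcome::Corrected;
    }
    return Outcome::Uncorrectable;
}
```

Applied to the received matrices of the three figures, with line parity bits "0101" and column parity bits "1010", the sketch restores the single error, reports the two-error case as uncorrectable, and reports nothing for the four-error case, in agreement with the figures.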
9.4 My addition

9.4.1 The idea

The general idea consists in the use of error correcting codes instead of error detecting codes. Nate Lawson mentions in his blog [30] that the use of error correcting codes is possible but, to the best of my knowledge, there was no publication about the use of error correcting codes against patching.

This offers an advantage over a simple use of error detection: the error can be corrected, i.e. a patch can be deactivated and replaced by the original code. It raises the level of protection of a program.

Consider the following scenario: a reverse engineer modifies the code of a program which uses an error detection mechanism. He will see that the program has completely changed its behavior, e.g. unexpectedly terminated with an error. In this case the reverser will start to search where the program checks its integrity in order to deactivate the integrity checking procedure.

Now, consider the scenario in which an error correction mechanism is used. A reverser will see that there are no changes in the behavior of the program. The reverser might consider that he patched a wrong part of the code (e.g. a part that was not reached during the execution of the program). The use of error correction might be an additional source of frustration for a reverse engineer.

Proof of concept

Two proof of concept programs were implemented (under Ubuntu OS) in order to show that the use of error correcting codes is possible and that it is possible to restore the original code (i.e. deactivate a patch) and execute it. The proof of concept program is able to correct a 1 byte error. This kind of patch (1 byte replacement) is very common when a hacker (or a reverser) wants to invert the condition of a loop or of an if-else statement (see further in this section).

9.4.2 Implementation

The general idea consists in implementing the scheme used by error detection mechanisms, i.e. before executing a part of the code, a checking procedure is called in order to ensure that the following code was not patched. See figure 9.12.

Figure 9.12: General idea of implementation.

An odd parity bits error correcting code (see Section 9.3.3) was used as follows: the executable file is presented as a matrix of $1024 \times \lceil \mathit{fileSize} / 1024 \rceil$ bytes, and parity bits were calculated for all bits at the same position in the bytes (in order to form a parity byte), see figure 9.13.
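The byte-level variant described for the proof of concept (one parity byte per line and per column of the byte matrix) can be sketched as follows. This is not the code of the thesis appendices: the names ParityBytes, computeParity and repairSingleByte are hypothetical, the data is assumed to be already padded to a multiple of the line width, and the stored parity bytes are assumed to be unmodified.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One odd-parity byte per line and per column of the byte matrix.
struct ParityBytes {
    std::vector<std::uint8_t> lines;
    std::vector<std::uint8_t> columns;
};

// `data` is read as a matrix with `width` bytes per line.
ParityBytes computeParity(const std::vector<std::uint8_t>& data,
                          std::size_t width) {
    if (width == 0 || data.size() % width != 0) {
        throw std::invalid_argument("data size must be a multiple of width");
    }
    ParityBytes parity;
    // Starting from 0xFF and XOR-ing every byte yields the complement of the
    // XOR, i.e. odd parity for each of the 8 bit positions.
    parity.lines.assign(data.size() / width, 0xFF);
    parity.columns.assign(width, 0xFF);
    for (std::size_t k = 0; k < data.size(); ++k) {
        parity.lines[k / width] ^= data[k];
        parity.columns[k % width] ^= data[k];
    }
    return parity;
}

// Restores one modified byte in place. Returns true only when exactly one
// line and one column disagree and both report the same inverted bits.
bool repairSingleByte(std::vector<std::uint8_t>& data, std::size_t width,
                      const ParityBytes& stored) {
    const ParityBytes current = computeParity(data, width);
    if (current.lines.size() != stored.lines.size() ||
        current.columns.size() != stored.columns.size()) {
        return false;
    }
    std::vector<std::size_t> badLines, badColumns;
    for (std::size_t i = 0; i < current.lines.size(); ++i)
        if (current.lines[i] != stored.lines[i]) badLines.push_back(i);
    for (std::size_t j = 0; j < current.columns.size(); ++j)
        if (current.columns[j] != stored.columns[j]) badColumns.push_back(j);
    if (badLines.size() != 1 || badColumns.size() != 1) return false;
    const std::size_t i = badLines[0];
    const std::size_t j = badColumns[0];
    const auto lineMask =
        static_cast<std::uint8_t>(current.lines[i] ^ stored.lines[i]);
    const auto columnMask =
        static_cast<std::uint8_t>(current.columns[j] ^ stored.columns[j]);
    if (lineMask != columnMask) return false;
    data[i * width + j] ^= lineMask;  // invert exactly the modified bits
    return true;
}
```

In this sketch the difference between the stored and the recomputed parity byte is the mask of inverted bits, so XOR-ing it into the located byte undoes a one-byte patch such as replacing opcode 0x74 by 0x75. A width of 1024 would correspond to the layout described in the text; the example works for any width that divides the data size.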
The reverser might consider that he patched a wrong part of the code (e.13.3) was used as following: the executable file is presented as a matrix of 1024 bytes ×d f ileSize 1024 e bytes.81 will start to search where the program checks its integrity in order to deactivate the integrity checking procedure. a part that was not reached during the execution of the program). before executing a part of a code a checking procedure is called in order to ensure that the following code was not patched.12: General idea of implementation. using program cp 1 (called with the system call ’exec’).14 The second program autoCorrect (see source code in appendix C. The executable file is presented as a matrix 1024 × d f ileSize 1024 e. the execution continues normally. Once the file was copied. System call is a mechanism used by programs in order to request a service from the OS. AutoCorrect program is equivalent (see definition 17) to the program secretValue (see source code in appendix C.3) adds parity bits to a file. The first program addChecksum (see code in appendix C. See figure 9. Since a simple parity bits error correcting code was used. Program autoCorrect checks its own code before executing it. Definition 35. Then the two arrays of parity bytes (lines and columns) are appended to the file.4).82 Figure 9.13: Use of parity bits error correcting code in programs addChecksum and autoCorrect. If there are no errors. the main process corrects the error (in the newly created file) and forks for the second time. If an error is detected the main process saves its current state ( it is modeled by a variable ’state’). In order to align the last line of the matrix it is filled with bytecode ”00000000”.5) is able to correct its own code. Then the main process will fork the first time in order to create a new process. program can always restore one byte of its code. The second fork is used in order to execute the new 1 Linux program cp copies a file . which will copy the executable file. 
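The parity scheme described above can be illustrated with a minimal in-memory sketch. It is not the code of addChecksum or autoCorrect (see appendix C): it works on a byte vector instead of a file, it assumes that the stored parity bytes themselves are intact, and the names Parity, computeParity and checkAndCorrect are hypothetical names chosen for this illustration. It shows how one mismatching line parity byte and one mismatching column parity byte locate a patched byte, and how the difference between stored and recomputed parity restores it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one odd-parity byte per line and per column.
struct Parity {
    std::vector<std::uint8_t> rows;  // one parity byte per line of the matrix
    std::vector<std::uint8_t> cols;  // one parity byte per column of the matrix
};

// Odd parity: XOR all bytes of a line (or column), then XOR the result with 0xFF.
// Assumption: data.size() is a multiple of cols (the caller pads with 0x00).
Parity computeParity(const std::vector<std::uint8_t>& data, std::size_t cols) {
    assert(cols > 0 && data.size() % cols == 0);
    const std::size_t rows = data.size() / cols;
    Parity p{std::vector<std::uint8_t>(rows, 0), std::vector<std::uint8_t>(cols, 0)};
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            p.rows[i] ^= data[i * cols + j];
            p.cols[j] ^= data[i * cols + j];
        }
    }
    for (auto& b : p.rows) b ^= 0xFF;
    for (auto& b : p.cols) b ^= 0xFF;
    return p;
}

// Returns 0 if the data is clean, 1 if exactly one byte was corrected in place,
// and -1 if the damage cannot be corrected by this simple code.
int checkAndCorrect(std::vector<std::uint8_t>& data, std::size_t cols,
                    const Parity& stored) {
    const std::size_t none = static_cast<std::size_t>(-1);
    const Parity now = computeParity(data, cols);
    std::size_t badRow = none, badCol = none;
    for (std::size_t i = 0; i < now.rows.size(); ++i) {
        if (now.rows[i] != stored.rows[i]) {
            if (badRow != none) return -1;  // more than one damaged line
            badRow = i;
        }
    }
    for (std::size_t j = 0; j < now.cols.size(); ++j) {
        if (now.cols[j] != stored.cols[j]) {
            if (badCol != none) return -1;  // more than one damaged column
            badCol = j;
        }
    }
    if (badRow == none && badCol == none) return 0;
    if (badRow == none || badCol == none) return -1;  // not handled in this sketch
    // The XOR of stored and recomputed parity gives the flipped bits of the byte.
    const std::uint8_t rowMask = now.rows[badRow] ^ stored.rows[badRow];
    const std::uint8_t colMask = now.cols[badCol] ^ stored.cols[badCol];
    if (rowMask != colMask) return -1;
    data[badRow * cols + badCol] ^= rowMask;
    return 1;
}
```

As a usage example, take the illustrative bytes {0x90, 0x74, 0x05, 0x90, 0x31, 0xC0, 0xC3, 0x00} arranged in lines of 4 bytes. Replacing 0x74 (the x86 short je opcode) by 0x75 (short jne) is located and undone by checkAndCorrect, whereas two patched bytes in different lines are reported as not correctable.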
    input: file
    Integer columnsNbr = 1024;
    Integer linesNbr = ceil(file.size() / 1024)
    Integer zeroesToAdd = columnsNbr * linesNbr - size(file);
    for (unsigned i = 0; i < zeroesToAdd; ++i) {
        file.append(BYTE("00000000"));
    }
    Vector buffer[columnsNbr] = 0;
    Vector columnsChecksum[columnsNbr] = 0;
    Vector linesChecksum[linesNbr] = 0;
    for (unsigned i = 0; i < linesNbr; ++i) {
        file.read(buffer);
        for (unsigned j = 0; j < columnsNbr; ++j) {
            linesChecksum[i] = linesChecksum[i] XOR buffer[j];
            columnsChecksum[j] = columnsChecksum[j] XOR buffer[j];
        }
    }
    // Odd parity:
    linesChecksum = linesChecksum XOR vector("0xFF");
    columnsChecksum = columnsChecksum XOR vector("0xFF");
    file.append(linesChecksum);
    file.append(columnsChecksum);

Figure 9.14: Pseudocode of addChecksum program.

The first fork (the one that runs cp) could be replaced by a procedure which opens the executable file and then copies all data from the executable file to a new one. Once a new, clean file has been created, it is executed with the saved state as a parameter, i.e. the execution continues as if there were no errors (patches) in the file. See figure 9.15.

The proof of concept program creates a copy of the original executable file and executes it. This has to be done because any changes made to the original executable file during its execution would not affect the execution of the program (since it is already loaded into memory). In addition, some operating systems (e.g. Windows) do not allow a file to be modified during its execution. For educational purposes the corrected file is not deleted after its execution. If such a scheme is used in a real application, the corrected file should be deleted.

The autoCorrect and secretValue programs ask the user for a password. Depending on the result, the program goes into the if or the else statement. A hacker can simply invert the condition of the if statement by replacing the jump instruction of the if-else statement, e.g. a je (jump-if-equal) instruction could be replaced by jne (jump-if-not-equal).

Figure 9.15: Flowchart of checking procedure of autoCorrect program.

9.4.3 Other possible implementations and improvements

The proof of concept program shows that it is possible to use error correction against patching. Error correction mechanisms could be implemented in different ways, and several parts of a program that uses an error correcting mechanism could be implemented differently. The main possible upgrades are presented here.

Where to put the corrected code

There are several places where the corrected code could be stored.

Copy the original file. This technique (used in the proof of concept program) has one advantage: it is simple to implement. However, it has several disadvantages:
– If monitoring tools are used, the reverser will notice the disk I/O.
– The original program might not have permission to write to the disk or to create files.

Correct the code in the RAM. This technique presents a big advantage - the reverser would not be able to easily see the trace of the correction. However, this technique cannot be applied on all operating systems, because most OS do not allow writing directly into the executable part of the memory. A problem might also appear if a corrected page is swapped out and then reloaded into memory (see [51] about memory management).

Use the stack. This technique is the most tricky one. Sometimes the program has permission to execute the stack memory (this security issue is often used by hackers); in that case the corrected (small) part of the program could be placed on the stack and executed.

What to correct

The proof of concept applies one error correction mechanism to the entire file, including its headers, code and data sections. An error correcting code could be applied separately to each function or to each block of the code (and data). But in this case a system of checkpoints should also be implemented (see Section 9.4.3). Blocks of the code that have checksums could overlap; in this case the code will be checked several times with the use of different checksums. These kinds of improvements will complicate the process of reverse engineering and patching.

How to correct

The proof of concept uses a very simple error correcting code, which can always correct only a single error (one byte). There exist many other error correcting codes that are able to correct many more errors, e.g. Reed-Solomon, Turbocodes etc. A more powerful error correction mechanism will certainly improve the resistance of the program against patching.

Hide checksum

Sometimes it is difficult to circumvent the checking mechanism, but it is easier to calculate a new checksum and replace the original checksum by the new one, so the checksum should be well hidden in the file. This could be achieved using encryption (see Section 7.3.1) and steganography (see Section 7.3.6). If a program communicates through the network, all checksums could be stored on a distant (always accessible) machine.

Checkpoints

The system of management of checkpoints could be improved in order to raise the level of protection against patching.

Definition 36. In a program, a checkpoint is a place where the current state of the program is saved. A state of a program includes everything that is needed (value of the instruction pointer, registers, stack and variables) in order to continue the execution from the checkpoint.

The idea of checkpoints is to be able to continue (resume) the execution of a program if the program stops unexpectedly, e.g. crashes. In patching, with the use of checking procedures, a system of checkpoints could be used in order to be able to execute a program from the last clean checkpoint.

Definition 37. A clean checkpoint is a checkpoint C such that all previously used blocks (data and instructions) of the program contain no patches.

Imagine that the checking procedure is inlined (see Section 5.2.2). Suppose that a reverse engineer was able to detect and disable all checking procedures between the first checking procedure and the patch that he (or she) applied (see figure 9.16). The first non-disabled checking procedure will detect errors in the previous (already executed) part of the code. In this case, it will be able to find the last clean checkpoint, correct the error and resume the execution. This means that all checkpoints must be saved. If the system of checkpoints is separated from the error-checking procedure, the system of checkpoints would not be disabled. In order to be able to patch such a system, all checking procedures have to be found and deactivated. This could be much harder than deactivating one checking procedure, especially if different obfuscating techniques are applied to different checking procedures.

Figure 9.16: Inlined checking procedure with multiple checkpoints.

Other improvements

The code of the checking procedure will be harder to delete if many obfuscations (see Chapter 7) are applied to it, e.g. the procedure that checks a block of the code or data for errors will be harder to circumvent if it is inlined (see Section 5.2.2).

Another improvement that could be made is duplication - many sensitive parts of the code could be duplicated (also see 9.2.4 and [3]). In this way, if one part is patched and the error correction code is not able to correct all errors (restore the original program from the patched version), the program might execute the code of a duplicate procedure (or replace the patched part of the program by clean code stored elsewhere and then execute it, see Section 9.2.2). See the example in figure 9.17. This technique should be used mostly on sensitive parts of the code.

Figure 9.17: Duplicated function bar. Bar copy is a procedure equivalent to bar.

9.4.4 Advantages and disadvantages

The use of an error checking mechanism against patching has its own advantages and disadvantages. First of all, the size of the file grows. Secondly, computing a checksum before entering each function (or a block of the code) might take a lot of time, especially if checking procedures are inlined.

This mechanism will also protect the program from the use of some automated deobfuscators, e.g. if an automated unpacker unpacks the code and then the reverser tries to execute it, the program will stop because the program is not packed anymore.

The disadvantages and the advantages described above are also present in the anti-patching error detection mechanism. However, error correction has an additional advantage - the ability to disable patches, i.e. restore the original code. The advantage of this technique is the additional source of frustration for reverse engineers who try to break the protection of the program. Normally a reverser will patch the code and then try to execute it. If the program changes its behavior unexpectedly, the reverser will try to find out why - he will try to find the error detection mechanism. If error correction is used, the program will continue its execution normally and the reverser might think that the program did not reach the patched code (another branch of the program was chosen).
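The checkpoint mechanism of Section 9.4.3 can be illustrated with a small simulation. It is only a sketch under strong assumptions and not part of the proof of concept: the program is modeled as a sequence of blocks that each add a number to a running state, a stored copy of the original value stands in for the error correcting code, and Block and run are hypothetical names. A checkpoint is saved before each block; the first enabled check finds the first patched block, restores it, rolls back to the checkpoint saved before that block (the last clean checkpoint) and resumes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a code block: executing it adds `addend` to the state.
// `original` stands in for what an error correcting code could restore.
struct Block {
    int addend;         // current (possibly patched) content of the block
    int original;       // clean content of the block
    bool checkEnabled;  // false if the reverser disabled the inlined check
};

// Runs the blocks in order and returns the final state.
int run(std::vector<Block> blocks) {
    std::vector<int> checkpoints(blocks.size(), 0);
    int state = 0;
    std::size_t i = 0;
    while (i < blocks.size()) {
        checkpoints[i] = state;  // checkpoint saved before block i
        if (blocks[i].checkEnabled) {
            // Look for the first patched block among blocks 0..i.
            std::size_t f = 0;
            while (f <= i && blocks[f].addend == blocks[f].original) ++f;
            if (f <= i) {
                // Correct the patched blocks, then resume from the last
                // clean checkpoint, i.e. the one saved before block f.
                for (std::size_t k = f; k <= i; ++k)
                    blocks[k].addend = blocks[k].original;
                state = checkpoints[f];
                i = f;
                continue;
            }
        }
        state += blocks[i].addend;  // execute block i
        ++i;
    }
    return state;
}
```

With four blocks adding 1, 2, 3 and 4, a patch of the second block is undone as long as at least one later check is still enabled, so the result is the same as for the clean program. If every check is disabled, the patch takes effect, which corresponds to the first weakness discussed below.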
Weaknesses

Any system has its weaknesses and the error correction mechanism is no exception. Error detection and error correction mechanisms are very close and so they share some weaknesses.

First of all, the error checking procedure could be disabled. If it is disabled, a patch can be applied to the file and this patch will work. Secondly, if disabling the checking procedure is too hard, the checksum could be recalculated after a patch was applied. Since a checksum calculation mechanism is implemented in the file, it could be used by a reverser to recalculate the new checksum (even if the reverser does not understand how the checking procedure works). Finally, the error correction mechanism will not be able to restore the original code or data if too many changes were made. However, this weakness could be mitigated by the use of duplication (see Section 9.4.3) and by the use of error correcting codes that are able to correct many errors.

Chapter 10
Conclusions

Many different goals can be achieved through reverse engineering - a methodology used to find out how things work. Any system can be reverse-engineered and it will be reverse-engineered if it is cost-effective. Digital reverse engineering (a sub-domain of reverse engineering) and code obfuscation are closely connected and should be studied together. One should know how to obfuscate systems in order to understand how to reverse-engineer them and vice versa. Regardless of the fact that any system can be reversed, a developer concerned about his product being reversed must use obfuscations, because each obfuscation raises the level of protection of the system and decreases the number of potential reversers.

10.1 Anti-patching

One technique used by reverse engineers and by crackers in order to deactivate the protection of a program is patching. Patches are applied to different programs on a daily basis. There are legitimate (updates) and illegitimate (cracking, protection removal) uses of patching. Developers (especially developers of non-free programs) are very concerned about their programs being patched and distributed freely. There exist several solutions that prevent programs from being patched. This paper presents an anti-patching technique based on error correcting codes and a proof of concept program. The proof of concept program demonstrates that it is possible to restore and execute the original code after it was patched.

10.2 Further work

Further work on this subject can study different implementations of error correcting mechanisms based on the possible improvements presented in this paper. It would also be interesting to implement an efficient system of checkpoints. Finally, it would be very interesting (and challenging) to implement an automated obfuscator based on the ideas presented in this work and then study how different automated deobfuscators can handle this obfuscation combined with other obfuscation techniques.

Appendix A
Abbreviations

API     Application Programming Interface
BCD     Binary-coded decimal
BIOS    Basic Input/Output System
CA      Certification Authority
CPU     Central Processing Unit
IBM     International Business Machines
I/O     Input/Output
IP      Internet Protocol
IT      Information Technology
DRE     Digital Reverse Engineering
EU      European Union
GB      Gigabyte
GPS     Global Positioning System
OS      Operating System
PC      Personal Computer
RAID    Redundant Array of Independent Disks
RAM     Random Access Memory
RE      Reverse Engineering
RISC    Reduced Instruction Set Computing
TCP     Transmission Control Protocol
VM      Virtual Machine
UDP     User Datagram Protocol
UML     Unified Modeling Language
USA     United States of America
USSR    Union of Soviet Socialist Republics

Appendix B
Images

B.1 History of reverse engineering

B.1.1 Reverse engineering in military

Figure B.1: B-29 Superfortress created by Boeing. Image from [56]

Figure B.2: Tu-4 created by Tupolev Design bureau. Image from [63]

B.1.2 Reverse engineering in digital world

Figure B.3: IBM PC 5150. Image from [58]

Figure B.4: IBM-compatible portable by Compaq. Image from [58]

B.2 Flowchart

Figure B.5: Guide to understand flowcharts. Image comes from http://xkcd.com/518/

Appendix C
Code

C.1 Hello, world

    #include <iostream>
    using namespace std;

    int main() {
        cout << "Hello, world!" << endl;
        return 0;
    }
C.2 ComputeHash

    #include <iostream>
    #include <string>
    #include <stdlib.h>
    #include <locale>   // hash
    using namespace std;

    int main(int argc, char* argv[]) {
        string value;
        locale loc;  // the "C" locale
        const collate<char>& coll = use_facet<collate<char> >(loc);

        cout << "Enter a value: ";
        getline(cin, value);
        long myhash = coll.hash(value.data(), value.data() + value.length());
        cout << "Hash: " << myhash << endl;
        cout << "Bye-bye!" << endl;
        return EXIT_SUCCESS;
    }

C.3 AddChecksum

    #include <iostream>    // I/O
    #include <fstream>     // file
    #include <stdlib.h>    // exit success/failure
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <math.h>      // ceil
    using namespace std;

    // ODD parity in horizontal and in vertical parity vectors
    int main(int argc, char* argv[]) {
        unsigned origFileSize, linesNbr, zerosToAdd;
        const unsigned lineSize = 1024;
        char* buffer;
        char* hpVector;  // horizontal parity vector, size = bufferSize
        char* vpVector;  // vertical parity vector
        fstream file;

        if (argc < 2) {
            cerr << "Bad number of arguments!" << endl;
            cerr << "Usage: " << argv[0] << " <filename>" << endl;
            return EXIT_FAILURE;
        }

        // get file size:
        struct stat filestatus;
        stat(argv[1], &filestatus);
        origFileSize = unsigned(filestatus.st_size);
        cout << "File size: " << origFileSize << " bytes\n";

        // calculate number of "lines" in the file
        // and number of 00 to add at the end
        // to get a complete last line
        linesNbr = unsigned(ceil(double(origFileSize) / double(lineSize)));
        zerosToAdd = linesNbr * lineSize - origFileSize;  // zerosToAdd < lineSize

        // open file:
        file.open(argv[1], fstream::in | fstream::out);
        if (!file.is_open()) {
            cerr << "Error while opening file \"" << argv[1] << "\"" << endl;
            return EXIT_FAILURE;
        }

        buffer = new char[lineSize];
        hpVector = new char[lineSize];
        vpVector = new char[linesNbr];

        // init vectors:
        for (unsigned i = 0; i < lineSize; ++i) {
            buffer[i] = 0;
            hpVector[i] = 0;
        }
        for (unsigned i = 0; i < linesNbr; ++i) {
            vpVector[i] = 0;
        }

        // go to the end of the file && add zeros
        file.seekg(0, ios::end);
        file.write(buffer, zerosToAdd);
        cout << "Zeros added." << endl;

        // go to the beginning of the file
        file.seekg(0, ios::beg);
        for (unsigned i = 0; i < linesNbr; ++i) {  // == while (!file.eof()) {...}
            file.read(buffer, lineSize);
            for (unsigned j = 0; j < lineSize; ++j) {
                // XOR == sum "bit-bit"
                vpVector[i] = vpVector[i] ^ buffer[j];
                hpVector[j] = hpVector[j] ^ buffer[j];
            }
        }

        // parity bytes for odd parity: XOR FF = ^ 255
        for (unsigned i = 0; i < lineSize; ++i) {
            hpVector[i] = hpVector[i] ^ 255;
        }
        for (unsigned i = 0; i < linesNbr; ++i) {
            vpVector[i] = vpVector[i] ^ 255;
        }
        cout << "Parity vectors calculated." << endl;

        // add parity vectors to the end of the file
        file.seekg(0, ios::end);
        file.write(vpVector, linesNbr);
        file.write(hpVector, lineSize);
        cout << "Parity vectors written." << endl;

        // delete buffers and close files
        delete[] hpVector;
        delete[] vpVector;
        delete[] buffer;
        file.close();

        // check new file size
        stat(argv[1], &filestatus);
        unsigned newFileSize = unsigned(filestatus.st_size);
        cout << "New file size: " << newFileSize << " bytes." << endl;
        if (newFileSize == (linesNbr + 1) * (lineSize + 1) - 1) {
            cout << "check" << endl;
        } else {
            cout << "error" << endl;
            return EXIT_FAILURE;
        }
        cout << "Bye-bye!" << endl;
        return EXIT_SUCCESS;
    }

C.4 SecretValue

    #include <iostream>
    #include <string>
    #include <stdlib.h>
    #include <locale>   // hash
    using namespace std;

    int main(int argc, char* argv[]) {
        string secretValue = "";
        long secretValHash = 0;
        locale loc;  // the "C" locale
        const collate<char>& coll = use_facet<collate<char> >(loc);

        cout << "Enter the secret value: ";
        getline(cin, secretValue);
        secretValue += "nv";  // salt
        // The end of the next line is cut off in the original listing;
        // it is completed here by analogy with C.2.
        secretValHash = coll.hash(secretValue.data(),
                                  secretValue.data() + secretValue.length());
        if (secretValHash == 109885302) {  // secretValue == 42
            cout << "Congratulations!" << endl;
        } else {
            cout << "Wrong value!" << endl;
        }
        cout << "Bye-bye!" << endl;
        return EXIT_SUCCESS;
    }

C.5 AutoCorrect

    #include <iostream>
    #include <fstream>     // file
    #include <stdlib.h>    // exit success/failure
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <math.h>      // ceil
    #include <string>
    #include <sstream>     // string stream
    #include <locale>      // hash
    using namespace std;

    const unsigned good_checksum = 0;
    const unsigned bad_checksum = 1;
    const unsigned correction_impossible = -1;

    void check(char*, unsigned);
    int checkAndCorrectFile(char*);

    int main(int argc, char* argv[]) {
        // init
        unsigned state = 0;  // state changes at each CheckPoint
        char fileName[32];

        // original call? continue; else: get state and continue from CheckPoint
        if (argc == 2) {
            // get state (param)
            unsigned i = 0;
            while (argv[1][i] != '\0') {
                state *= 10;
                state += argv[1][i] - unsigned('0');
                ++i;
            }
        }

        // cout<<"fname:"<<argv[0]<<endl;
        // delete "./" at the beginning
        unsigned i = 2;
        while (argv[0][i] != '\0') {
            fileName[i - 2] = argv[0][i];
            i++;
        }
        fileName[i - 2] = '\0';

        switch (state) {
            case 0:
                state = 0;
                // cout<<"State 0"<<endl;
                check(fileName, 0);
                // no break: the execution continues with state 1
            case 1: {
                state = 1;
                // cout<<"State 1"<<endl;
                // check(fileName, 1);

                // main
                string secretValue = "";
                long secretValHash = 0;
                locale loc;  // the "C" locale
                const collate<char>& coll = use_facet<collate<char> >(loc);

                cout << "Enter the secret value: ";
                getline(cin, secretValue);
                secretValue += "nv";  // salt
                // The end of the next line is cut off in the original listing;
                // it is completed here by analogy with C.2.
                secretValHash = coll.hash(secretValue.data(),
                                          secretValue.data() + secretValue.length());
                if (secretValHash == 109885302) {  // secretValue == 42
                    cout << "Congratulations!" << endl;
                } else {
                    cout << "Wrong value!" << endl;
                }
                break;
            }
            default:
                // cout<<"Unknown state "<<state<<endl;
                break;
        }
        cout << "Bye-bye!" << endl;
        cout << flush;
        return EXIT_SUCCESS;
    }

    void check(char* fileName, unsigned currentState) {
        int corrected = checkAndCorrectFile(fileName);
        if (corrected == correction_impossible) {
            // cerr<<"Recovery is impossible"<<endl;
            exit(EXIT_FAILURE);
        } else if (corrected == bad_checksum) {
            pid_t pid = fork();
            if (pid == 0) {  // son
                // exec new file
                string correctedFileName, stateStr;
                std::stringstream out, out2;
                char* arguments[3];
                out << "./" << fileName << "_cor";
                correctedFileName = out.str();
                out2 << currentState;
                stateStr = out2.str();
                arguments[0] = (char*) correctedFileName.c_str();
                arguments[1] = (char*) (stateStr.c_str());
                arguments[2] = NULL;
                if (execvp(correctedFileName.c_str(), arguments) == -1) {
                    // cerr<<"Error while exec corrected file"<<endl;
                    exit(EXIT_FAILURE);
                }
            } else if (pid != -1) {  // father
                int status;
                wait(&status);
                // *eventually erase the copy file here*
                exit(EXIT_SUCCESS);
            } else {
                // cerr<<"Error while fork() for copy exec."<<endl;
                exit(EXIT_FAILURE);
            }
        }
    }

    // checkAndCorrect returns:
    // -1 = bad checksum AND correction is impossible
    //  0 = checksum OK
    //  1 = bad checksum AND corrected, correct fileName = <origFileName>_cor
    int checkAndCorrectFile(char* fileName) {
        const unsigned lineSize = 1024;
        unsigned fileSize, linesNbr, zerosToAdd;
        char* buffer;
        char* hpVector;  // horizontal parity vector, size = bufferSize
        char* vpVector;  // vertical parity vector
        fstream file;
        int result = -1;

        // get file size:
        struct stat filestatus;
        stat(fileName, &filestatus);
        fileSize = unsigned(filestatus.st_size);
        // cout << "File size: " << fileSize << " bytes\n";

        // calculate number of "lines" in the original file
        linesNbr = (fileSize + 1) / (lineSize + 1) - 1;

        // open file:
        file.open(fileName, fstream::in);
        if (!file.is_open()) {
            // cerr<<"Error while opening file \""<<fileName<<"\""<<endl;
            return EXIT_FAILURE;
        }

        buffer = new char[lineSize];
        hpVector = new char[lineSize];
        vpVector = new char[linesNbr];

        // init vectors from file
        file.seekg(linesNbr * lineSize, ios::beg);
        file.read(vpVector, linesNbr);
        file.read(hpVector, lineSize);

        // go to the beginning of the file
        file.seekg(0, ios::beg);
        for (unsigned i = 0; i < linesNbr; ++i) {
            file.read(buffer, lineSize);
            for (unsigned j = 0; j < lineSize; ++j) {
                // XOR == sum "bit-bit"
                vpVector[i] = vpVector[i] ^ buffer[j];
                hpVector[j] = hpVector[j] ^ buffer[j];
            }
        }
        // cout<<"File read."<<endl;

        // check results:
        int vPos = -1, hPos = -1;
        bool possibleRecovery = true, needRecovery = false;
        for (unsigned i = 0; i < linesNbr && possibleRecovery; ++i) {
            // clean CMP is VERY important! (same type)
            if (vpVector[i] != char(255)) {
                if (vPos != -1) {
                    possibleRecovery = false;
                }
                vPos = i;
                needRecovery = true;
            }
        }
        for (unsigned i = 0; i < lineSize && possibleRecovery; ++i) {
            // clean CMP is VERY important! (same type)
            if (hpVector[i] != char(255)) {
                if (hPos != -1) {
                    possibleRecovery = false;
                }
                hPos = i;
                needRecovery = true;
            }
        }
        // cout<<"Results checked."<<endl;

        // seekg: reposition get pointer
        // seekp: reposition put pointer
        if (possibleRecovery && needRecovery) {
            // assumption: 1 real error and it was detected
            // or more than 1 error and detected
            // there exist some cases with many errors
            // that could not be detected.
            string copyFileName;
            std::stringstream out;
            out << fileName << "_cor";
            copyFileName = out.str();
            bool clean = true;

            pid_t pid = fork();
            // process son
            // copy corrupted file
            if (pid == 0) {
                char* arguments[4];
                arguments[0] = (char*) ("cp");
                arguments[1] = fileName;
                arguments[2] = (char*) (copyFileName.c_str());
                arguments[3] = NULL;
                if (execvp("cp", arguments) == -1) {
                    // cerr<<"exec cp error"<<endl;
                    exit(EXIT_FAILURE);
                }
            } else if (pid != -1) {  // father
                // wait for son death
                int status;
                wait(&status);
                /* if (WIFEXITED(status)) {
                       cout<<"son exited OK"<<endl;
                   } */

                // open the copy of the corrupted file:
                // (copy that created by son process)
                fstream copyFile;
                copyFile.open(copyFileName.c_str(), fstream::in | fstream::out);
                if (!copyFile.is_open()) {
                    // cerr<<"Error while opening file \""<<copyFileName<<"\""<<endl;
                    exit(1);
                }

                if (vPos == -1 && hPos != -1) {  // error only in hpVector
                    copyFile.seekg(linesNbr * (lineSize + 1) + hPos, ios::beg);
                    copyFile.get(buffer[0]);
                    buffer[0] = buffer[0] ^ hpVector[hPos] ^ 255;
                    copyFile.seekp(linesNbr * (lineSize + 1) + hPos, ios::beg);
                    copyFile.put(buffer[0]);
                } else if (hPos == -1 && vPos != -1) {  // error only in vpVector
                    copyFile.seekp(linesNbr * lineSize + vPos, ios::beg);
                    copyFile.get(buffer[0]);
                    buffer[0] = buffer[0] ^ vpVector[vPos] ^ 255;
                    copyFile.seekp(linesNbr * lineSize + vPos, ios::beg);
                    copyFile.put(buffer[0]);
                } else if (hPos != -1 && vPos != -1) {  // error in the middle
                    copyFile.seekg(vPos * lineSize + hPos, ios::beg);
                    copyFile.get(buffer[0]);
                    // calculating the correct byte:
                    buffer[0] = buffer[0] ^ (hpVector[hPos] ^ 255);
                    copyFile.seekp(vPos * lineSize + hPos, ios::beg);
                    copyFile.put(buffer[0]);
                }
                copyFile.close();
                // cout<<"Recovery done."<<endl;
                result = bad_checksum;
            } else {
                // cerr<<"Error while fork() for copy."<<endl;
                exit(EXIT_FAILURE);
            }
        } else if (needRecovery) {
            // cout<<"File was corrupted and recovery is impossible!"<<endl;
            result = correction_impossible;
        }

        if (!needRecovery) {
            // cout<<"Clean file. no need to recover data."<<endl;
            result = good_checksum;
        }

        // delete buffers and close files
        delete[] vpVector;
        delete[] hpVector;
        delete[] buffer;
        file.close();
        return result;
    }

C.6 Makefile

    TARGET=addChecksum autoCorrect computeHash secretValue

    all: computeHash secretValue addChecksum autoCorrect

    normal: $(TARGET)

    computeHash: computeHash.cpp
    	g++ computeHash.cpp -o $@

    secretValue: secretValue.cpp
    	g++ secretValue.cpp -o $@

    addChecksum: addChecksum.cpp
    	g++ addChecksum.cpp -o $@

    autoCorrect: autoCorrect.cpp addChecksum
    	g++ autoCorrect.cpp -o $@
    	./addChecksum autoCorrect

    clean:
    	$(RM) $(TARGET)
Appendix D

Internship report

Report
Internship at Forensic Technology Solutions of PriceWaterhouseCoopers
Nikita Veshchikov
08-09/2010

Contents
1 Introduction
2 Effort
  2.1 Inventory DB
  2.2 Import scripts
  2.3 Investigations
    2.3.1 General information
    2.3.2 Data acquisition
    2.3.3 Analysis
    2.3.4 Meet the client
  2.4 Scientific articles
  2.5 Miscellaneous
3 Acknowledgments
4 Conclusions
A Consent
B Data Acquisition Form
C Chain of custody
D Case procedure form
Bibliography

1 Introduction

I started my seven-week internship on 9 August 2010 at the Belgian FTS (Forensic Technology Solutions) team of PwC. PwC has many FTS teams in different countries. At the request of other firms all over the world, the FTS team provides services such as fraud and bribery detection (usually for litigation), securing data and lost data recovery, and securing IT infrastructure and reducing its complexity (also see [11]).

This internship had multiple goals:
• integrate into a real-world working environment
• better define and understand the subject of my master's thesis
• apply theoretical knowledge in practice

Before the beginning of my internship I had found two subjects for my master's thesis that I was deeply interested in: reverse engineering, and volatile data acquisition and its analysis.

Reverse engineering is the process of discovering the technological principles of a human made device, object or system through analysis of its structure, function and operation. - Wikipedia

Volatile data acquisition is the process of acquiring information from volatile memory (memory that requires a power supply to maintain the stored information, e.g. RAM).

Both subjects deal with sensitive and confidential data and security, so it was a very challenging task to find an internship in one of these domains. FTS mostly deals with non-volatile data capture and analysis, so it is quite close to the second subject of my interest. This internship allowed me to investigate some cases myself and, importantly, it helped me to choose between the two subjects mentioned above.

2 Effort

During my internship I worked on several tasks, mainly involving work on digital forensics projects. I also worked with different scripts for databases and read some scientific articles (mainly during the first three weeks, see 2.4). During the last weeks I worked on several digital forensics projects. Because of the nature of the tasks that I was required to do (quite often I had to work with confidential information), no real names or numbers are mentioned in this report.

2.1 Inventory DB

One of the first tasks entrusted to me was the elaboration of an inventory database for the FTS laboratory. This database had to be created in Microsoft (MS) Access and had to be designed to store most of the FTS Lab equipment with some satellite information, including OS and software. The database also had to manage decommissioning for almost all types of equipment (i.e. the database has to keep records about decommissioned objects): once an object is decommissioned, its record should not be deleted. Here is the list of the equipment that this database had to manage:
• External Hard Drives: name, capacity, type of connector, encrypted (y/n), data on the drive, location
• Internal Hard Drives: name, capacity, type of connectors, location
• Flash Drives: name, capacity
• Forensic Equipment: id, name
• Computers: general computer characteristics, Operating System and software installed

It took me several days to accomplish this task. I first drew a complete structure of the future database on paper in order to understand and plan it prior to its design work.

2.2 Import scripts

Another task, also related to databases, was to import information from text files into MS SQL Server. There were several types of such files (they contained different information about the same objects). I installed MS SQL Server on my laptop. Then, after analyzing all the different types of files that I had to manage, I created several SQL scripts:
• a setup script that created all the tables needed
• import scripts, as many as there were file types
• finally, the main script that called the other scripts with the needed arguments

2.3 Investigations

During my internship I worked on several digital forensics investigations. This was the main part of my interest during my internship, therefore I would like to describe all the different parts of a digital forensics investigation in detail. The following procedures were carried out several times, with a few variations, during my internship.

During my internship I had an opportunity to work on Windows-based Windows forensics. Windows forensics are digital forensics of a Windows system. In other words, Windows forensics means that a suspect Windows system is analysed (read more in [10]). Windows-based forensics are digital forensics of a system when the investigation is carried out on a Windows system. Windows-based forensics means that the computer used to analyse the evidence is running the Windows OS.

Different software packages for digital forensics currently exist. I have mostly worked with EnCase (http://www.guidancesoftware.com/forensic.htm), but I also have some experience with FTK (http://accessdata.com/products/forensic-investigation/ftk). I will elaborate on different aspects of EnCase in more detail in the following sections. I will also present differences between EnCase and FTK.

On the one hand, FTK requires an Oracle database (which is provided with the installation CD) to store all search results. It makes the installation process a little bit more complex but allows FTK to start up very quickly (a few seconds to open a case). On the other hand, EnCase has its own index system and is operational by itself, but EnCase has to read and load all index files at opening and this can take some time (if a lot of keyword searches were run it could take up to 30-40 minutes); that is why EnCase generally keeps running during the whole time of the case investigation.

2.3.1 General information

Each new case, also called a project, has its own name. This is done in order to refer to a project without mentioning the client's name or any person's first or last name, so that someone who sees a document or hears something related to a project would not understand exactly what it is about. It is very useful because most of the time the team of investigators has to deal with private and confidential data.

Everything has to be documented, especially if the case has to be presented as evidence in court. Every action (script, procedure etc.) that an investigator runs on evidence has to be documented. This creates a lot of paperwork, but still not too much. To facilitate the investigators' paperwork and work in general there are some forms (see appendix B, C, D) to be filled in, so that he (or she) remembers to carry out all steps in the procedure (the order of actions to perform and scripts to run, what has to be documented etc.). However, sometimes investigators forget to fill in all these forms. This can create future problems for colleagues who continue working on the project.

2.3.2 Data acquisition

The first part of any digital forensic investigation is data acquisition, and it has to be done in the right way (also see appendix B). First of all, forensics have to be done on an image (copy) of the evidence, not on the real data. Not only the allocated space has to be acquired; often the most interesting things are found in unallocated space.

EnCase as well as FTK can acquire all different kinds of evidence:
• an image of a hard drive
• regular files
• the hard drive itself (in this case the program creates an image)

The image acquisition process takes a lot of time (several hours). This allows plenty of time to do the paperwork. EnCase double-checks the image after the acquisition and it also offers multiple options such as computing the hash value of files during the acquisition. It can also compress the image during the acquisition; the duration of this process depends on the compression quality.

Basically there are two ways to acquire data from a hard drive:
• direct acquisition requires extracting the actual hard drive from the computer and mounting it on the system that will receive the image copy of that drive. There is a special device, FastBloc, which ensures that no process can write to the hard drive being acquired by physically blocking all writes.
• network acquisition requires booting (using a live CD or USB) the computer which contains the hard drive that has to be acquired (the evidence) and connecting this computer by a cable to the computer that will be used for the investigation. Sometimes this type of acquisition is called cross-over acquisition because of the crossed cable which is used during the acquisition.

To avoid all risk of corrupting the evidence (changing it by writing something to the hard drive) direct acquisition is always preferred. Cross-over acquisition is used only in cases when for some reason it is impossible to extract the hard drive from the computer. I was not confronted with such a case during my internship, so I always used direct data acquisition.

The investigator has to take photos of the evidence: workstation, laptop, hard drive. The serial number must be easily readable on these photos. Other photos of the person's working place are also recommended in situations when the person does not know that his disk is being acquired. These photos help to put everything back in place after the data acquisition.

2.3.3 Analysis

Once the data is acquired, the data analysis can begin. As I already mentioned, there are some standard procedures that an investigator has to run for any case:
• search for hidden partitions
• restore deleted files
• recover lost files
• etc.

EnCase offers tools that allow all these things to be done. And of course each action has to be documented (see appendix D). Fortunately, the standard case procedure forms prescribe exactly which actions to perform. So the investigator should follow these forms, run the scripts and fill in the forms.

On average all these actions (scripts) take a lot of time (3 - 6 hours for each script), and some processes can take more than 10 hours (many keywords/regular expressions to search for on a large hard drive), so I always tried to run these scripts overnight in order to continue the work the next morning.

Keywords constitute the variable part of any digital forensics case. These keywords are provided by the client depending on what he wants. Generally keywords are pretty meaningless to an investigator, but the client knows exactly what he is looking for, so usually he reviews all search results and, based on what he saw, asks to find something more (more keywords or some specific files).

2.3.4 Meet the client

After the analysis, once the investigator has all results, he (or she) needs to report to the client. Unfortunately, EnCase and FTK are quite "hacker-friendly", so the investigator has to explain and show to the client how the reviewing environment works. A hacker-friendly environment is an environment that is friendly only for experienced users who are familiar with the IT domain. EnCase has a simple environment for reviewing pictures (jpg, bmp, png, etc.) and documents (power point, excel, word). FTK has a slightly more friendly internal reviewing environment for mails and HTML pages, but it is still not easy to understand for an inexperienced user.

2.4 Scientific articles

Sometimes, as already mentioned, I had to wait several hours for the end of a task; during that time I read some scientific articles about one of the two subjects of my interest:
• Reverse engineering ([13], [2], [8])
  – Malware reverse engineering (references [12], [14], [9], [1])
  – File formats and protocols reverse engineering (references [16], [7], [15], [3], [4], [5])
• Volatile data capture/analysis (reference [6])

This deepened my knowledge in these domains and helped me to understand that I am more interested in reverse engineering than in volatile data capture or volatile data analysis. Also, during the internship I saw that in practice investigators do not usually possess volatile data. And even if they did, the structure of volatile data might change very quickly (with each new release of an OS), therefore any program for volatile data analysis is quite difficult to maintain (especially for non open source operating systems). This made me understand that volatile data analysis is not cost effective. That is why I mostly read articles about reverse engineering.

While reading more about reverse engineering I found two very interesting domains:
• files/protocols structure reverse engineering is reverse engineering of file formats (e.g. png, mp3); the goal is to understand the meaning of each field of a file and discover its structure (which fields are optional etc). In the case of a protocol (e.g. TCP, IP, HTTP) the goal is to understand the meaning of each field in the message and the related state machine (a diagram that shows all sequences of different messages).
• executable files reverse engineering is reverse engineering of executable files. Generally, the goal is to understand how a given program works and what it does. Counter reverse engineering techniques (code obfuscation) are techniques used to protect an executable file from reverse engineering. (Executable files reverse engineering and code obfuscation techniques should be learned together, because in order to reverse-engineer an executable file one has to understand code obfuscation techniques and vice versa.)

2.5 Miscellaneous

While I was waiting (for a project to investigate) I also did some other work:
• install Windows Server
• encrypt several external hard drives that are used to store case files with TrueCrypt (free open-source disk encryption software for Linux, Windows and Mac OS X, http://www.truecrypt.org/)
• test some equipment (FastBlocs, cables and external hard drives)

3 Acknowledgments

I am very grateful to all members of the PwC FTS team for their help, their advice, the great experience that I acquired, and also for trusting me by actively involving me in projects.

4 Conclusions

This internship allowed me to integrate into a real-world working environment and gave me some very interesting experience. I learned how to work with digital evidence and saw theoretical knowledge applied in practice.
During my internship I understood that volatile data acquisition and analysis are not really cost and time effective for investigations, and tend to be useless in many situations (although they are still very interesting). This helped me to choose my master's thesis subject between volatile data acquisition/analysis and reverse engineering. I also managed to find two interesting domains of reverse engineering:
• files/protocols structure reverse engineering
• executable files reverse engineering and counter reverse engineering techniques (code obfuscation)

This internship was valuable and helped me to make this choice, so I can start to write my master's thesis immediately.

A Consent

Appendix A

CONSENT TO ENTER, SEARCH, SEIZE AND REMOVE

I ………………………… (full name), being employed by ………………………… (company name) as a/the ………………………… (position), do hereby consent and authorise the appointed members of PricewaterhouseCoopers Ltd Forensic Technology Solutions to:
• Search the premises ………………………… (location);
• Seize all and any relevant electronic data, stored in any format;
• Copy all and any relevant electronic data;
• Seize all and any relevant computer or related equipment;
being the property of or in legal possession of ………………………… (company).

I further declare that I, in my personal capacity or due to the position I hold, am duly authorised to grant the authorisation as above.

SIGNED: …………………………
DATE: …………………………
FULL NAMES: …………………………
PLACE SIGNED: …………………………
B Data Acquisition Form

Forensic Technology Solutions
DATA ACQUISITION RECORD

Case Information
  PwC Office:    Client:    Acquired by:    Signature:
  FTS Data Tracker #:    Image Name: ____________
  Date of Acquisition:    Image created at: Office / Client Site

SUBJECT COMPUTER INFORMATION
  MAKE:    MODEL:    SERIAL #:
  Desktop:    Laptop:    Server:    Bare Dr:
  CMOS TIME: Date:    Time:
  ACTUAL TIME: Date:    Time:
  Time Zone:    Daylight Saving:    Source:
  Photos taken: Yes / No    If no, reason not taken:
  Photo CD/DVD: FTS Data Tracker #:
  Photo range on memory card (e.g. IMG_011.jpg - IMG_016.jpg):
  Computer State: OFF / ON / ON and LOGGED IN    User:    Other:
  If ON, Shutdown method: Normal shutdown / Pulled the plug
  Encryption:    Encryption Type (e.g. Safeguard Easy):    Other:
  UN:    Password:
  Acquisition Notes (note any running processes):

SUBJECT HARD DRIVE PHYSICAL INFORMATION
  MAKE:    MODEL:    SERIAL #:
  IDE / SCSI / SATA    SCSI ID#:    M / S / C / OTHER    Terminated: Y / N / N/A
  LBA:    Cylinders:    Heads:    Sectors:    CAPACITY:
  RAID type:    Stripe size:    CONTROLLER:
  Notes:
  Photos taken: YES / NO    If no, reason photos not taken:
  Photo CD/DVD: FTS Data Tracker #:
  Photo range on memory card (e.g. IMG_017.jpg - IMG_022.jpg):

OTHER SUBJECT MEDIA INFORMATION
  FLOPPY: Label:    Description:
  CD: Label:    Description:
  ZIP DISK/LS-120: Label:    Description:
  OTHER: PDA / Other Media:    Description:    Make:    Model:    Memory:

Version 7, April 16, 2008. Page 1 of 2. Privileged and Confidential Attorney Work Product

Forensic Technology Solutions
DATA ACQUISITION RECORD
FTS Data Tracker #:

ENCASE ACQUISITION INFORMATION
  Encase Acquisition Version:    Suspect (Encase Boot Disk) Version:
  Method: DOS/BIOS / DOS/ATA / Parallel / Network / FastBloc2 FE / FastBloc FE / Other (Detail other method here)
  Other method:
  After Acquisition: Do not add / Add to case / Replace source drive
  Restart Acquisition:
  Notes:
  File Segment Size: 640 MB / Other:
  Start Sector: 0 /    Stop Sector:
  Sectors reported by Encase matches total physical sectors? YES / NO    If NO, image with Linux or DOS.
  Compression: NONE / GOOD / BEST
  Password: NONE / YES    Password:
  Block size: 64 /    Error granularity: 64 /
  Image Hash, Read Ahead, Quick Acquisition, Remote Acquisition: CHECKED / UNCHECKED
  Output Path:    Alternate Path:

IMAGE HASH and VERIFICATION
  Acquisition Hash Value:
  Verify Hash matches Acquisition Hash? YES / NO
  Acquisition Notes (Document any errors):

PwC ACQUISITION DESTINATION DRIVE INFORMATION
  Make:    Model:    Serial #:    Capacity:
  Acquisition Drive FTS Tracking #:    Description:

PwC ACQUISITION BACKUP DESTINATION DRIVE INFORMATION
  Make:    Model:    Serial #:    Capacity:
  Backup Drive FTS Tracking #:    Description:

ADDITIONAL NOTES

Version 7, April 16, 2008. Page 2 of 2. Privileged and Confidential Attorney Work Product

C Chain of custody

Appendix C

Project Name: …………………………    Case Number: …………………………
Client: …………………………    Contact person: …………………………    Telephone: …………………………
Physical Address: …………………………
Date: …………………………    Time: …………………………
FTS Examiner: …………………………
Short Summary of case: …………………………

EQUIPMENT TO BE EXAMINED:
Evidence Number: …………………………    System: …………………………    Model: …………………………
Serial No: …………………………
System date: …………………………    System time: …………………………
Real date: …………………………    Real time: …………………………
Owner (or agent thereof) of PC: …………………………
Location of PC: …………………………

HARD DRIVE DATA ACQUISITION    Drive: ………. of ……….
  Manufacturer | Model | Serial Number | Type
  Acquisition Method: Desktop PC Acquisition / Laptop Acquisition
  Acquisition Type: Fast Bloc / DOS / Network / Parallel Cable

HARD DRIVE DATA ACQUISITION    Drive: ………. of ……….
  Manufacturer | Model | Serial Number | Type
  Acquisition Method: Desktop PC Acquisition / Laptop Acquisition
  Acquisition Type: Fast Bloc / DOS / Network / Parallel Cable

FDISK
  Drive 1: Partition | Status | Type | Vol. Label | MB | Sys | Usage
  Drive 2: Partition | Status | Type | Vol. Label | MB | Sys | Usage

I ………………………… hereby confirm that the computer, laptop or electronic equipment was left in a working condition after the image process was completed by the FTS investigator.

………………………… (signature of Owner (or agent thereof) of PC)

D Case procedure form

Forensic Technology Solutions
(En)CASE PROCEDURE RECORD
Evidence Number: ____________

Case Notes
  Developer:    Signature:    Case Date (mm/dd/yyyy): _____/_____/200_____
  Forensic Computer Info: Make: ________    Model: ________    Notes: ________
  Default Export Folder:    Temporary Folder:

CASE PROCEDURES

EnScript - Partition Finder
  Number of Partition signatures found/valid: ____/____    Types: ________
  Notes: ________
  (This EnScript searches for the signature of a partition in unallocated disk space, a potentially deleted partition.)

RECOVER LOST FOLDERS
  NTFS Drive: _____ Number of Recovered Folders: _______
  FAT32 Drive: _____ Number of Recovered Folders: _______
  Other: Drive: _____ Number of Recovered Folders: _______
  Notes: ________
  (For NTFS file system, initial recovery is automatically available; right-click on the volume and the option Recover Folders will be available for additional, advanced recovery. For FAT file system, scan the volume for lost folders. Locate the Recovered Folders folder and open it. The recovered folders will be listed. Any sub-folders or files will be undeleted.)

EnScript - FILE MOUNTER: EXTENSION / SIGNATURE / BOTH
  Number of files mounted: _______    Mount Persistent:    File Types:
  Notes:
  (This EnScript mounts the selected file types in a case to allow viewing and searching.)

VERIFY FILE SIGNATURE AND COMPUTE HASH VALUE
  Notes:
  (Verify File Signature: compares each file signature with its extension to identify any files whose extensions have been deliberately changed. Hash Value: MD5 algorithm used to generate a unique 128-bit fingerprint.)

EXPORT FILE LISTING AND BOOKMARK FILES SELECTED
  Notes:
  (This creates a text file containing the attributes of the files viewed in the case. Note the path of the text file.)

EnScript - FILTER MAIL AND BOOKMARK FILES SELECTED
  Number of files script identified: _________
  (Search for common mail file types. 1. Create a note. 2. Paste the contents of the EnScript in the note. 3. Save note as "EnScript - Mail Filter".)

COPY\UNERASE FILTER MAIL
  From: Highlighted Files / All Selected Files
  To: Separate Files / Merge into one file
  Automatically Replace First Character With (circle one): _ / Other: _____
  Copy: Logical File Only / Entire Physical File / RAM and Disk Slack / RAM Slack Only
  Character Mask: None / Don't Write Non-ASCII Characters / Replace Non-ASCII Characters
  Show Errors
  Number of Files Copied to folder: _______________    Copy Files Size: _______________ (kb)
  Destination of files: _______________
  Split files larger than: 500,000 (MB) / 640 (MB) Default / Other ___________
  Notes: _______________

Version 5, April 16, 2008. Page 1 of 2.

Forensic Technology Solutions
(En)CASE PROCEDURE RECORD
Evidence Number: ____________

EnScript - FILTER COMMON FILE TYPES (Active & Deleted) AND BOOKMARK FILES SELECTED
  Number of files script identified: _________
  (Search for common file types. Example: .doc, .xls, .ppt, .pdf, etc. 1. Create a note. 2. Paste the contents of the EnScript in the note. 3. Save note as "EnScript - Common File Types Active & Deleted".)

COPY\UNERASE COMMON FILE TYPES
  From: Highlighted Files / All Selected Files
  To: Separate Files / Merge into one file
  Automatically Replace First Character With (circle one): _ / Other: _____
  Copy: Logical File Only / Entire Physical File / RAM and Disk Slack / RAM Slack Only
  Character Mask: None / Don't Write Non-ASCII Characters / Replace Non-ASCII Characters
  Show Errors
  Number of Files Copied to folder: _______________    Copy Files Size: _______________ (kb)
  Destination of files: _______________
  Split files larger than: 500,000 (MB) / 640 (MB) Default / Other ___________
  Notes:

ADDITIONAL EnScript(s)
  Notes:
  (1. Create a note. 2. Paste the contents of the EnScript(s) in the note. 3. Save note after the executed EnScript(s).)

FILE SIGNATURE Export    Count: _________
  (Tools, File Signature and Viewers. Select all File Signatures. Export. Select all "Export checked columns". Output file: ________ Open text file and copy the contents. Create a note. Paste the contents of the File Signature list in the note. Save note as "File Signature List".)

HASH SET Export    Count: _________
  (Tools, Hash Set. Export. Select all "Export checked columns". Output file: ________ Open text file and copy the contents. Create a note. Paste the contents of the Hash Set list in the note. Save note as "Hash Set".)

KEYWORDS
  Notes:

SAVE CASE (Note the path)
  Notes:

ADDITIONAL CASE NOTES

Version 5, April 16, 2008. Page 2 of 2.

References

[1] Daniel A. Quist and Lorie M. Liebrock. Visualizing compiled executables for malware analysis. In Visualization for Cyber Security, VizSec '09, October 2009.
[2] Dario Bonfiglio, Marco Mellia, Michela Meo, Nicolò Ritacca, and Dario Rossi. Tracking down skype traffic. In INFOCOM '08, pages 261–265. IEEE, April 2008.
[3] E. Buss, R. De Mori, W. M. Gentleman, J. Henshaw, H. Johnson, K. Kontogiannis, E. Merlo, H. A. Muller, J. Mylopoulos, S. Paul, A. Prakash, M. Stanley, S. R. Tilley, J. Troster, and K. Wong. Investigating reverse engineering technologies: The CAS program understanding project. IBM Systems Journal, 33(3):477–500, July 1994.
[4] Gerardo Canfora and Massimiliano Di Penta. New frontiers of reverse engineering. In FOSE '07, pages 326–341. IEEE Computer Society, Washington, 2007.
[5] Gregory Conti, Erik Dean, Matthew Sinda, and Benjamin Sangster. Visual reverse engineering of binary and data files. In VizSec '08, pages 1–17. Springer-Verlag Berlin, 2008.
[6] Iain Sutherland, Jon Evans, Theodore Tryfonas, and Andrew Blyth. Acquiring volatile operating system data tools and techniques. ACM SIGOPS Operating Systems Review, 42:65–73, 2008.
[7] Kris Kendall. Practical malware analysis. 2007.
[8] Min Gyung Kang, Pongsin Poosankam, and Heng Yin. Renovo: A hidden code extractor for packed executables. In WORM '07, pages 46–53. ACM, 2007.
[9] Monirul Sharif, Vinod Yegneswaran, Hassen Saidi, Phillip Porras, and Wenke Lee. Eureka: A framework for enabling static analysis on malware. In ESORICS '08, pages 481–500. Springer-Verlag Berlin, 2008.
[10] The Honeynet Project. Know your enemy. Learning about security threats. Boston, 2 edition, 2004.
[11] PWC. Fts brochure. Available at http://www.pwc.com/en_BE/be/dispute-analysis-and-investigation/forensic-technology-solutions-pwc-08.pdf, consulted 28 December 2010.
[12] Robert Lyda and James Hamrock. Using entropy analysis to find encrypted and packed malware. IEEE Security and Privacy, 5:40–45, March 2007.
[13] Rui Wang, XiaoFeng Wang, Kehuan Zhang, and Zhuowei Li. Towards automatic reverse engineering of software security configurations. In CCS '08, pages 245–256. ACM, 2008.
[14] Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, and Luiz Irun-Briz. Tupni: Automatic reverse engineering of input formats. In CCS '08, pages 391–402. ACM, 2008.
[15] Lenny Zeltser. Reverse engineering malware. April/May 2001.
[16] Zhiqiang Lin, Xuxian Jiang, Dongyan Xu, and Xiangyu Zhang. Automatic protocol format reverse engineering through context-aware monitored execution. 2008.