Dedication

We dedicate this work to the great persons found throughout history who gave everything to their nations and took nothing in return, and to every person around this world searching for pure love and pure peace among the civilizations, believing that this love can save the world from its narrow disputes.

SE4 compiler design workgroup

SE4 Compiler Design Documentation Page 2

Contents

Chapter One
1 Introduction ................................................................ 7
  1.1 Preface ................................................................. 7
  1.2 Project Proposal ........................................................ 7
  1.3 The Plan ................................................................ 7
  1.4 Setting Milestones ...................................................... 8
  1.5 Report Overview ......................................................... 8

Chapter Two
2 Compiler Analysis .......................................................... 10
  2.1 What Is a Compiler ..................................................... 11
  2.2 History of the Compiler ................................................ 11
  2.3 Compiler vs. Interpreter ............................................... 12
  2.4 Compiler Phases ........................................................ 12
    2.4.1 Front End Phases ................................................... 13
    2.4.2 Back End Phases .................................................... 14
  2.5 Object Oriented Development ............................................ 15
  2.6 Compilation Processes .................................................. 15
    2.6.1 Lexical Analyzer Phase ............................................. 16
    2.6.2 Syntax Analyzer Phase .............................................. 16
      2.6.2.1 Context Free Grammar (CFG) ..................................... 17
      2.6.2.2 Deriving the Parse Tree ........................................ 17
    2.6.3 Semantic Analyzer Phase ............................................ 18
      2.6.3.1 Types and Declarations ......................................... 19
      2.6.3.2 Type Checking .................................................. 19
      2.6.3.3 Scope Checking ................................................. 19
    2.6.4 Intermediate Code Generator Phase .................................. 19
    2.6.5 Code Optimizer Phase ............................................... 20
      2.6.5.1 Types of Optimizations ......................................... 21
    2.6.6 Code Generator Phase ............................................... 21
    2.6.7 Error Handling Methods ............................................. 22
      2.6.7.1 Recovery from Lexical Phase Errors ............................. 23
      2.6.7.2 Recovery from Semantic Phase Errors ............................ 23
    2.6.8 Symbol Table & Symbol Table Manager ................................ 23

Chapter Three
3
Compiler Design .............................................................. 25
  3.1 Compiler Design ........................................................ 25
  3.2 The Context of the System .............................................. 25
    3.2.1 Compiler Architectural Design ...................................... 25
      3.2.1.1 Compiler Structuring ........................................... 25
        3.2.1.1.1 Compiler Organization ...................................... 26
      3.2.1.2 Compiler Control Modeling ...................................... 27
      3.2.1.3 Compiler Modular Decomposition ................................. 28
      3.2.1.4 Compiler Domain-Specific Architectures ......................... 28
  3.3 Compiler Use Case Models ............................................... 28
    3.3.1 Lexical Analyzer Use Case Diagram .................................. 29
    3.3.2 Syntax Analyzer Use Case Diagram ................................... 29
    3.3.3 Semantic Analyzer Use Case Diagram ................................. 30
    3.3.4 I.R Code Generator Use Case Diagram ................................ 30
    3.3.5 I.R Code Optimizer Use Case Diagram ................................ 31
    3.3.6 Code Generator Use Case Diagram .................................... 32
    3.3.7 Symbol Table Manager Use Case Diagram .............................. 32
    3.3.8 Error Handler Use Case Diagram ..................................... 33
  3.4 Compiler's Subsystems Activity Diagrams ................................ 33
    3.4.1 Lexical Analyzer Activity Diagram .................................. 34
    3.4.2 Syntax Analyzer Activity Diagram ................................... 35
    3.4.3 Semantic Analyzer Activity Diagram ................................. 36
    3.4.4 I.R Code Generator Activity Diagram ................................ 37
    3.4.5 I.R Code Optimizer Activity Diagram ................................ 38
    3.4.6 Code Generator Activity Diagram .................................... 39
    3.4.7 Symbol Table Manager Activity Diagram .............................. 40
    3.4.8 Error Handler Activity Diagram ..................................... 41
  3.5 Compiler Class Diagram ................................................. 42
  3.6 Compiler Sequence Diagrams ............................................. 44
    3.6.1 Lexical Analyzing Sequence Diagram ................................. 44
    3.6.2 Syntax Analyzing Sequence Diagram .................................. 45
    3.6.3 Semantic Analyzing Sequence Diagram ................................ 46
    3.6.4 I.R Code Generation Sequence Diagram ............................... 47
    3.6.5 I.R Code Optimizing Sequence Diagram ............................... 48
    3.6.6 Code Generation Sequence Diagram ................................... 49
    3.6.7 Additional Sequence Diagram ........................................ 50

Glossary ..................................................................... 51
Bibliography ................................................................. 54
Introduction

1.1 Preface

Compiler design and construction is an exercise in engineering design. The compiler writer must choose a path through a decision space that is filled with diverse alternatives, each with distinct costs, advantages, and complexity. Each decision has an impact on the resulting compiler, and the quality of the end product depends on informed decisions at each step of the way.

This project, or short study, tries to explore that design space and to present some of the ways used to design and construct a compiler, side by side with the principles of the software engineering phases. Our project is to design a small functional compiler, taking the above hints into consideration, implementing the software design principles in our work, and establishing the compiler step by step until the end of the development. We also have a main advantage that we do not find in any book that illustrates compiler construction: the UML (Unified Modeling Language) diagrams that illustrate the compiler development process from many sides, describe the small procedures that make up the subsystems of the compiler, and show the communication between them that leads to producing the compiler's target file.

1.2 Project Proposal

In the early days, the approach taken to compiler design was directly affected by the complexity of the processing, the experience of the person(s) designing it, and the resources available. A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high-quality output is required, the design may be split into a number of relatively independent phases. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional optimizations).

1.3 The Plan

We planned to carry out the common tasks that every software developer puts in his/her work plan: information gathering, analysis of this information, design of the basic diagrams, and implementation of the project.

1.4 Setting Milestones

In order to establish a sense of progression towards the goals of the project, certain milestones have been set:

• Implementation of the object-oriented strategy in the analysis phase of the development.
• Implementation of the object-oriented strategy in the design phase of the development.
• Implementation of the UML (Unified Modeling Language) diagrams in the design phase, as follows:
  o Implementation of the Use Case Diagrams.
  o Implementation of the Activity Diagrams.
  o Implementation of the Class Diagrams and the relations between these classes.
  o Implementation of the Sequence Diagrams.

1.5 Report Overview

Chapter 1: A brief overview of what will follow in the report is given here.

Chapter 2 (Compiler Analysis): Describes why we use the object-oriented technique in the analysis phase and how the compiler is produced, giving an illustration of the theoretical concepts and principles of the compiler's phases.

Chapter 3 (Compiler Design): Describes the main design aspects of the project, commencing with the requirements analysis and concluding with the specific design of project parts, and giving the main diagrams produced during the design phase to simplify the implementation work done by the programmers.

Chapter Two
Compiler Analysis
2.1 What Is a Compiler?

A compiler is a special type of computer program that translates a human-readable text file into a form that the computer can more easily understand. At its most basic level, a computer can only understand two things, a 1 and a 0. At this level, a human will operate very slowly and find the information contained in the long string of 1s and 0s incomprehensible. A compiler is a computer program that bridges this gap.

In the beginning, compilers were very simple programs that could only translate symbols into the bits, the 1s and 0s, that the computer understood. Programs were also very simple, composed of a series of steps that were originally translated by hand into data the computer could understand. This was a very time-consuming task, so portions of it were automated or programmed, and the first compiler was written. This program assembled, or compiled, the steps required to execute the step-by-step program.

These simple compilers were used to write a more sophisticated compiler. With the newer version, more rules could be added to the compiler program to allow a more natural language structure for the human programmer to operate with. This made writing programs easier and allowed more people to begin writing programs. As more people started writing programs, more ideas about writing programs were offered and used to make more sophisticated compilers. In this way, compiler programs continue to evolve, improve, and become easier to use.

Compiler programs can also be specialized. Certain language structures are better suited for a particular task than others, so specific compilers were developed for specific tasks or languages. Some compilers are multistage or multiple-pass. A first pass could take a very natural language and make it closer to a computer-understandable language, and a second or even a third pass could take it to the final stage, the executable file. The intermediate output in a multistage compiler is usually called pseudo-code, since it is not usable by the computer. Pseudo-code is very structured, like a computer program, not free-flowing and verbose like a more natural language. The final output is called the executable file, since it is what is actually executed or run by the computer. Splitting the task up like this made it easier to write more sophisticated compilers, as each sub-task is different. It also made it easier for the computer to point out where it had trouble understanding what it was being asked to do.

Errors that limit the compiler in understanding a program are called syntax errors. Errors in the way the program functions are called logic errors. Syntax errors are like spelling mistakes, whereas logic errors are a bit more like grammatical errors. Logic errors are much harder to spot and correct.

Cross compiler programs have also been developed. A cross compiler allows a text file set of instructions that is written for one computer designed by a specific manufacturer to be compiled and run for a different computer by a different manufacturer. For example, a program that was written to run on an Intel computer can sometimes be cross-compiled to run on a computer developed by Motorola. This frequently does not work very well: at the level at which computer programs operate, the computer hardware can look very different, even if the machines may look similar to you. For cross compilation to work, you need both the original natural-language text that describes the program and a computer that is sufficiently similar to the original computer for the program to function on it.

Cross compilation is different from having one computer emulate another computer. If a computer is emulating a different computer, it is pretending to be that other computer. Emulation is frequently slower than cross compilation, since two programs are running at once: the program that is pretending to be the other computer and the program that is actually running. However, this is not always possible, so both techniques are in use.

2.2 History of the Compiler

Software for early computers was primarily written in assembly language for many years. Higher-level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs started to become significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler.

Towards the end of the 1950s, machine-independent programming languages were first proposed, and subsequently several experimental compilers were developed. The first compiler was written by Grace Hopper, in 1952, for the A-0 programming language. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler, in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. In many application domains the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex.

Early compilers were written in assembly language. The first self-hosting compiler, capable of compiling its own source code in a high-level language, was created for Lisp by Tim Hart and Mike Levin at MIT in 1962. Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping problem: the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an interpreter.
2.3 Compiler vs. Interpreter

We usually prefer to write computer programs in languages we understand rather than in machine language, but the processor can only understand machine language. So we need a way of converting our instructions (source code) into machine language. This is done by an interpreter or a compiler.

An interpreter reads the source code one instruction or line at a time, converts this line into machine code and executes it. The machine code is then discarded and the next line is read. The advantage of this is that it is simple and you can interrupt the program while it is running, change it, and either continue or start again. The disadvantage is that every line has to be translated every time it is executed, even if it is executed many times as the program runs. Because of this, interpreters tend to be slow. Examples of interpreters are Basic on older home computers, script interpreters such as JavaScript, and languages such as Lisp and Forth.

A compiler reads the whole source code and translates it into a complete machine code program to perform the required tasks, which is output as a new file. This completely separates the source code from the executable file. The biggest advantage of this is that the translation is done once only and as a separate process. The program that is run is already translated into machine code, so it is much faster in execution. The disadvantage is that you cannot change the program without going back to the original source code, editing that, and recompiling (though for a professional software developer this is more of an advantage because it stops source code being copied). Current examples of compilers are Visual Basic, C, C++, C#, Fortran, Cobol, Ada, Pascal, and so on.

You will sometimes see reference to a third type of translation program: an assembler. This is like a compiler, but works at a much lower level, where one source code line usually translates directly into one machine code instruction. Assemblers are normally used only by people who want to squeeze the last bit of performance out of a processor by working at machine code level.

2.4 Compiler Phases

In the early days, the approach taken to compiler design was directly affected by the complexity of the processing, the experience of the person(s) designing it, and the resources available. A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high-quality output is required, the design may be split into a number of relatively independent phases. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional optimizations).

2.4.1 Front End Phases

The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type, and scope. This is done over several phases, which include some of the following:

1. Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. Atlas Autocode and Imp (and some implementations of Algol and Coral66) are examples of stropped languages whose compilers would have a Line Reconstruction phase.

2. Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier, or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner.

3. Preprocessing. Some languages, e.g. C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g., in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms.

4. Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.

5. Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.

2.4.2 Back End Phases

The term back end is sometimes confused with code generator because of the overlapped functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators. The main phases of the back end include the following:

1. Analysis: this is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis, etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.

2. Optimization: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation, or even automatic parallelization.

3. Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also the Sethi-Ullman algorithm).

2.5 Object Oriented Development

Object-oriented strategy is a development strategy where system analyzers, designers, and programmers think in terms of 'things' instead of operations or functions. The executing system is made up of interacting objects that maintain their own local state and provide operations on that state information. They hide information about the representation of the state and hence limit access to it. An object-oriented development process can be shortly described by the following hints:

• Object-oriented analysis is concerned with developing an object-oriented model of the application domain. The identified objects reflect entities and operations that are associated with the problem to be solved.

• Object-oriented design is concerned with developing an object-oriented model of a software system to implement the identified requirements. The objects in an object-oriented design are related to the solution of the problem that is being solved. There may be close relationships between some problem objects and some solution objects, but the designer inevitably has to add new objects and to transform problem objects to implement the solution.

• Object-oriented programming is concerned with realising a software design using an object-oriented programming language. An object-oriented programming language, such as Java, supports the direct implementation of objects and provides facilities to define object classes.

Here, in our project, we use this strategy to analyze our compiler because we see that object-oriented analysis is the best way to collect our data from previous compilers and from reference books that discuss compilers; this is the outline information on which we can base our compiler design. As we know, the compiler is a software system, and the analysis stage of our project would not be completed correctly if we used another strategy to analyze it. After this short illustration of the object-oriented development process, we will give the full data about the terminologies in our work, and illustrate each one carefully.
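As a minimal illustrative sketch of the object-oriented view described above (the class and method names here are assumptions for illustration, not the project's actual class design), each compiler phase can be modeled as an object that hides its internal state and exposes a single operation:

```python
# Sketch: each phase is an object hiding its state behind one operation.
# The tokenization and parsing are deliberately trivial stand-ins.

class LexicalAnalyzer:
    def run(self, source):
        # The character-stream position is state hidden inside the object.
        return source.split()                      # stand-in tokenization

class SyntaxAnalyzer:
    def run(self, tokens):
        return {"kind": "program", "body": tokens}  # stand-in parse tree

class Compiler:
    """Coordinates the phase objects; knows nothing of their internals."""
    def __init__(self):
        self.phases = [LexicalAnalyzer(), SyntaxAnalyzer()]

    def compile(self, source):
        data = source
        for phase in self.phases:
            data = phase.run(data)
        return data

tree = Compiler().compile("x = 1 + 2")
print(tree["kind"])   # program
```

Replacing one phase object with an improved one, as discussed in section 2.4, then amounts to swapping a single entry in the `phases` list.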
2.6 Compilation Processes

Compilation refers to the compiler's process of translating a high-level language program into a low-level language program. This process is very complex; hence, from the logical as well as the implementation point of view, it is customary to partition the compilation process into several phases, which are nothing more than logically cohesive operations that input one representation of a source program and output another representation. A typical compilation, broken down into phases, is shown in Figure 2.1.

Figure 2.1: Source Code → Lexical Analyzer Phase → Syntax Analyzer Phase → Semantic Analyzer Phase → Intermediate Code Generator Phase → Code Optimizer Phase → Code Generator Phase → Object Code

2.6.1 Lexical Analyzer Phase

The lexical analyzer handles the process where the stream of characters making up the source program is read from left to right and grouped into tokens; this process is also called the tokenization process. Tokens are sequences of characters with a collective meaning. There are usually only a small number of token classes for a programming language: constants (integer, double, char, string, etc.), operators (arithmetic, relational, logical), punctuation, and reserved words. The lexical analyzer takes a source program as input and produces a stream of tokens as output, as shown in Figure 2.2.

A lexeme is the actual character sequence forming a token, while the token is the general class that a lexeme belongs to. The lexical analyzer might recognize particular instances of tokens such as 3 or 255 for an integer constant token, "Fred" or "Wilma" for a string constant token, or numTickets and queue for a variable token. Such specific instances are called lexemes. Some tokens have exactly one lexeme (e.g., the > character); for others, there are many lexemes (e.g., integer constants).
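The grouping of characters into lexemes described above can be sketched with a small regex-driven scanner. This is an illustrative sketch only; the token classes and patterns below are assumptions, not the project's actual token set:

```python
import re

# Illustrative token specification: (token class, pattern) pairs.
TOKEN_SPEC = [
    ("INT",   r"\d+"),            # integer constants: 3, 255, ...
    ("ID",    r"[A-Za-z_]\w*"),   # identifiers: numTickets, queue, ...
    ("OP",    r"[+\-*/=<>]"),     # arithmetic/relational operators
    ("PUNCT", r"[(),;]"),         # punctuation
    ("SKIP",  r"\s+"),            # whitespace separates lexemes
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right, yielding (token class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("numTickets = 255 + 3"))
# [('ID', 'numTickets'), ('OP', '='), ('INT', '255'), ('OP', '+'), ('INT', '3')]
```

Note that this sketch silently skips characters outside the specification; a real scanner would report them to the error handler.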
2.6.1.1 Regular Expression Notation

All valid tokens of the language must be specified, and suitable notation must be used for the specification; the notation should be compact, precise, and easy to understand. Regular expression notation can be used for the specification of tokens because tokens constitute a regular set. It is compact and precise, and it makes the automatic construction of a recognizer of tokens possible: from the regular expression specification, a deterministic finite automaton (DFA) that accepts the language specified by the regular expression can be constructed, and the DFA is used to recognize the tokens. Hence, the study of regular expression notation and finite automata becomes necessary. A simple regular expression used to separate the different kinds of tokens is shown in Figure 2.3.

2.6.2 Syntax Analyzer Phase

In the syntax-analysis phase, a compiler verifies whether or not the tokens generated by the lexical analyzer are grouped according to the syntactic rules of the language. If the tokens in a string are grouped according to the language's rules of syntax, then the string of tokens generated by the lexical analyzer is accepted as a valid construct of the language; otherwise, an error handler is called. Hence, two issues are involved when designing the syntax-analysis phase of a compilation process:

1. All valid constructs of a programming language must be specified. Suitable notation must be used to specify the constructs of a language, and by using these specifications we specify in what manner the tokens are to be grouped so that the result of the grouping is a valid construct of the language.

2. A suitable recognizer must be designed to recognize whether a string of tokens generated by the lexical analyzer is a valid construct or not; if it is, a valid program is formed.
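The regular-expression-to-DFA idea above can be sketched by simulating a hand-built DFA for a single token class. The transition table below, for the identifier pattern letter (letter | digit)*, is an assumption for illustration (it also treats underscore as a letter); a scanner generator would build such a table from the regular expression automatically:

```python
# Sketch: a DFA for the token pattern  letter (letter | digit)*.
# State 0 is the start state; state 1 is the only accepting state.

def classify(ch):
    if ch.isalpha() or ch == "_":   # underscore treated as a letter here
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

TRANSITIONS = {
    (0, "letter"): 1,   # the first character must be a letter
    (1, "letter"): 1,   # further letters or digits stay accepting
    (1, "digit"):  1,
}
ACCEPTING = {1}

def accepts(lexeme):
    state = 0
    for ch in lexeme:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:           # no transition: reject immediately
            return False
    return state in ACCEPTING

print(accepts("numTickets"))   # True
print(accepts("2fast"))        # False: starts with a digit
```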
The syntax-structure specification for the programming language (i.e., the valid constructs of the language) uses context-free grammar (CFG), because a regular expression is not powerful enough to represent languages which require parenthesis matching to arbitrary depths. And for certain classes of grammar, we can automatically construct an efficient parser that determines if a source program is syntactically correct. Hence, CFG notation is a required topic of study.

2.6.2.1 Context Free Grammar (CFG)

CFG notation specifies a context-free language that consists of terminals, nonterminals, a start symbol, and productions. The terminals are nothing more than tokens of the language, used to form the language constructs. Nonterminals are the variables that denote a set of strings; for example, in a typical programming language, S and E are nonterminals that denote statement strings and expression strings, respectively. The nonterminals define the sets of strings that are used to define the language generated by the grammar, and they also impose a hierarchical structure on the language, which is useful for both syntax analysis and translation.

Grammar productions specify the manner in which the terminals, and the string sets defined by the nonterminals, can be combined to form a set of strings defined by a particular nonterminal. Each production consists of a nonterminal on the left-hand side and a string of terminals and nonterminals on the right-hand side. The left-hand side of a production is separated from the right-hand side using the "→" symbol, which is used to identify a relation on the set (V ∪ T)*. For example, consider the production S → aSb. This production specifies that the set of strings defined by the nonterminal S is obtained by concatenating terminal a with any string belonging to the set of strings defined by nonterminal S, and then with terminal b.

Therefore, a context-free grammar is a four-tuple denoted as G = (V, T, P, S), where:

1. V is a finite set of symbols called nonterminals or variables,
2. T is a finite set of symbols called terminals,
3. P is a set of productions, and
4. S, a member of V, is called the start symbol.

2.6.2.2 Deriving the Parse Tree

When deriving a string w from S, if every derivation is considered to be a step in the tree construction, then we get the graphical display of the derivation of string w as a tree, as shown in Figure 2.4. This is called a "derivation tree" or a "parse tree" of string w. Note that a tree is a derivation tree if it satisfies the following requirements:

1. The root node of the tree is labeled by the start symbol of the grammar.
2. All the leaf nodes of the tree are labeled by terminals of the grammar.
3. The interior nodes are labeled by the nonterminals.
4. If an interior node has a label A, and it has n descendents with labels X1, X2, …, Xn from left to right, then the production rule A → X1 X2 X3 … Xn must exist in the grammar.
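The production S → aSb discussed above can be made concrete with a small derivation sketch. Note the terminating production S → ab used below is an assumption added so the derivation can finish; the text itself only shows S → aSb:

```python
# Sketch: leftmost derivation with the grammar  S -> aSb | ab,
# which generates the strings ab, aabb, aaabbb, ... (a^n b^n).

def derive(steps):
    """Apply S -> aSb (steps) times, then S -> ab, recording each form."""
    forms = ["S"]
    for _ in range(steps):
        forms.append(forms[-1].replace("S", "aSb", 1))  # rewrite S once
    forms.append(forms[-1].replace("S", "ab", 1))       # terminate
    return forms

print(" => ".join(derive(2)))
# S => aSb => aaSbb => aaabbb
```

Each rewriting step here corresponds to one level of the derivation tree: the rewritten S becomes an interior node whose children are a, S, b.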
For example, consider a grammar whose list of productions is:

E → E + E
E → E * E
E → id

The tree shown in Figure 2.5 is a derivation tree for the string id + id * id. Therefore, a derivation tree or parse tree is the display of the derivations as a tree.

2.6.3 Semantic Analyzer Phase

After completing the lexical and syntax functions without errors, we move forward to the semantic analyzer, where we delve even deeper to check whether the tokens form a sensible set of instructions in the programming language; here we will also discover some functions that are handled by a context-sensitive grammar (CSG). Whereas any old noun phrase followed by some verb phrase makes a syntactically correct English sentence, a semantically correct one has subject-verb agreement, proper use of gender, and components that go together to express an idea that makes sense. For a program to be semantically valid, all variables, functions, classes, etc. must be properly defined, expressions and variables must be used in ways that respect the type system, access control must be respected, and so forth. The semantic analyzer is the front end's penultimate phase and the compiler's last chance to weed out incorrect programs: we need to ensure the program is sound enough to carry on to code generation.

A large part of semantic analysis consists of tracking variable/function/type declarations and type checking. In many languages, identifiers have to be declared before they are used. As the compiler encounters a new declaration, it records the type information assigned to that identifier; then, as it continues examining the rest of the program, it verifies that the type of each identifier is respected in terms of the operations being performed. For example, the type of the right side expression of an assignment statement should match the type of the left side, and the left side needs to be a properly declared and assignable identifier. The parameters of a function should match the arguments of a function call in both number and type. The language may require that identifiers be unique, thereby forbidding two global declarations from sharing the same name. Arithmetic operands will need to be of numeric, perhaps even the exact same, type (no automatic int-to-double conversion, for instance). These are examples of the things checked in the semantic analyzer phase.
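The declaration tracking and type checks described above can be sketched with a table that records each declaration and verifies later uses. This is a minimal sketch: the two checks shown and the error messages are illustrative assumptions, and a real semantic analyzer would walk the parse tree rather than take simplified calls:

```python
# Sketch: record declarations, then check (1) use-before-declaration,
# (2) redeclaration, and (3) assignment type mismatch.

declared = {}      # identifier -> declared type
errors = []

def declare(name, typ):
    if name in declared:
        errors.append(f"redeclaration of '{name}'")
    else:
        declared[name] = typ

def check_assign(name, value_type):
    if name not in declared:
        errors.append(f"'{name}' used before declaration")
    elif declared[name] != value_type:
        errors.append(f"type mismatch: {name} is {declared[name]}, "
                      f"got {value_type}")

declare("count", "int")
check_assign("count", "int")      # ok: types match
check_assign("count", "double")   # error: no implicit conversion here
check_assign("total", "int")      # error: never declared

for e in errors:
    print(e)
```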
2.6.3.1 Types and Declarations
We begin with some basic definitions to set the stage for performing semantic analysis. A type is a set of values and a set of operations operating on those values. There are three categories of types in most programming languages:
• Base types: int, float, double, char, bool, etc. These are the primitive types provided directly by the underlying hardware.
• Compound types: arrays, pointers, records, strings, structs, unions, classes, enums, and so on. These types are constructed as aggregations of the base types and of simpler compound types. There may be a facility for user-defined variants on the base types (such as C enums).
• Complex types: lists, queues, trees, heaps, tables, stacks, and so on. You may recognize these as abstract data types. A language may or may not have support for these sorts of higher-level abstractions.
In many languages, such as C, a programmer must first establish the name and type of any data object (e.g. variable, function, type). The basic declaration is just a name and a type, but in many languages it may include modifiers that control visibility and lifetime (i.e. static in C, private in Java); in addition, the programmer usually defines the lifetime. Some languages also allow declarations to initialize variables, so that you can declare and initialize in one statement.

2.6.3.2 Type Checking
Much of what we do in the semantic analyzer phase is type checking: the process of verifying that each operation executed in a program respects the type system of the language. This generally means that all operands in any expression are of appropriate types and number. Sometimes the rules regarding operations are a part of the definition of the language itself (as in "both operands of a binary arithmetic operation must be of the same type"), and sometimes they are defined by other parts of the code (as in function prototypes). If a problem is found, e.g. one tries to add a char pointer to a double in C, we encounter a type error, and we should get an error message. A language is considered strongly typed if each and every type error is detected during compilation. Type checking can be done at compilation time, during execution, or divided across both.

2.6.3.3 Scope Checking
To understand how scoping is handled in a compiler, we need a few definitions. A scope is a section of program text enclosed by basic program delimiters, e.g. {} in C, or begin-end in Pascal. Many languages allow nested scopes, that is, scopes defined within other scopes. The scope defined by the innermost such unit is called the current scope; the scopes defined by the current scope and by any enclosing program units are known as open scopes, and any other scope is a closed scope. As we encounter identifiers in a program, we need to determine whether each identifier is accessible at that point in the program; this is called scope checking, and it is necessary because only variables declared in the current scope and in the open scopes containing it are accessible. If we try to access a local variable declared in one function from another function, we should get an error message.

2.6.4 Intermediate Code Generator Phase
The intermediate code generator is the compilation phase in which the compiler converts an internal representation of the source program into a form that brings it closer to code that can be readily executed by a machine. The input to the intermediate code generator typically consists of a parse tree or an abstract syntax tree; the tree is converted into a linear sequence of instructions, usually in an intermediate language such as three-address code. Sophisticated compilers typically perform multiple passes over various intermediate forms. This multi-stage process is used because many algorithms for code optimization are easier to apply one at a time, or because the input to one optimization relies on the processing performed by another. This organization also facilitates the creation of a single compiler that can target multiple architectures, since only the last of the code generation stages (the back end) needs to change from target to target.
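The flattening of a tree into three-address code can be sketched as follows. The tuple-based AST and the temporary-naming scheme are assumptions made for this illustration:

```python
from itertools import count

# Flatten a tuple-based expression tree, e.g. ("+", "a", ("*", "b", "c")),
# into three-address instructions: each has one operator, two operand
# addresses, and one result address.
def gen_tac(node, code, temps):
    if isinstance(node, str):          # leaf: a variable or constant
        return node
    op, left, right = node
    l = gen_tac(left, code, temps)     # operands are flattened first, so
    r = gen_tac(right, code, temps)    # the evaluation order is preserved
    t = f"t{next(temps)}"              # fresh temporary for the result
    code.append((t, l, op, r))         # meaning: t = l op r
    return t

code = []
gen_tac(("+", "a", ("*", "b", "c")), code, count(1))
for t, l, op, r in code:
    print(f"{t} = {l} {op} {r}")
# a + b * c becomes:
#   t1 = b * c
#   t2 = a + t1
```

Because each instruction names at most three addresses, the resulting list is the kind of linear intermediate form the optimizer phase below can analyze and transform.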
2.6.5 Code Optimizer Phase
The code optimizer takes as its input the intermediate representation code generated by the intermediate code generator and performs the optimization process, finally producing the optimized code that becomes the input to the code generator. We can define the compiler optimization process as the process of tuning the intermediate representation code so as to minimize or maximize some attribute of the executable program. Code optimization refers to the techniques used by the compiler to improve the execution efficiency of the generated object code; it involves a complex analysis of the intermediate code and the performance of various transformations. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied; and the growth of portable computers has created a market for minimizing the power consumed by a program. Whatever the goal, every optimizing transformation must preserve the semantics of the program: a compiler should not attempt any optimization that would lead to a change in the program's meaning. Actually, "code optimization" is a misnomer: even after performing various optimizing transformations, there is no guarantee that the generated object code will be optimal, so we are really performing code improvement. (Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. A peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)
Optimization can be machine-independent or machine-dependent. Machine-independent optimizations can be performed independently of the target machine for which the compiler is generating code; that is, these optimizations are not tied to the target machine's specific platform. Examples of machine-independent optimizations are elimination of loop-invariant computation, induction variable elimination, and elimination of common subexpressions. On the other hand, machine-dependent optimization requires knowledge of the target machine; an attempt to generate object code that will utilize the target machine's registers more efficiently is an example of machine-dependent code optimization. Generally speaking, it is not possible to generate good code without considering the details of the particular machine for which the compiler is expected to generate code.
When attempting any optimizing transformation, the following criteria should be applied:
1. The optimization should capture most of the potential improvements without an unreasonable amount of effort.
2. The optimization should be such that the meaning of the source program is preserved.
3. The optimization should, on average, reduce the time and space expended by the object code.

2.6.5.1 Types of optimizations
Techniques used in optimization can be broken up among various scopes, which can affect anything from a single statement to the entire program. Generally speaking, locally scoped techniques are easier to implement than global ones but result in smaller gains. Some examples of scopes include:
• Local optimizations: these only consider information local to a function definition. This reduces the amount of analysis that needs to be performed (saving time and reducing storage requirements) but means that worst-case assumptions have to be made when function calls occur or global variables are accessed (because little information about them is available).
• Loop optimizations: these act on the statements which make up a loop, such as a for loop (e.g. loop-invariant code motion). Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.
• Peephole optimizations: usually performed late in the compilation process, after machine code has been generated. This form of optimization examines a few adjacent instructions (like "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions. For instance, a multiplication of a value by 2 might be more efficiently executed by left-shifting the value or by adding the value to itself. (This example is also an instance of strength reduction.)

2.6.6 Code Generator Phase
Code generation is the last phase in the compiler's operation, and it is a machine-dependent phase. The clearly visible operations of this phase are allocating the registers of the target machine, representing the logical addresses of our compiled program as physical addresses, and linking the contents of the symbol table with the optimized code produced by the code optimizer to finally produce the target file, or object file. The choice of algorithm matters here: a carefully selected code-generation algorithm can produce code that is twice as fast as code generated by an ill-considered one.

2.6.7 Error Handling methods
One of the important tasks that a compiler must perform is the detection of and recovery from errors. After detecting an error, the first thing a compiler is supposed to do is report it by producing a suitable diagnostic. A good error diagnostic should possess the following properties:
1. The error message should be easy for the user to understand. For example, the message should be produced along with the line numbers of the source program.
2. The error message should be specific and should localize the problem. For example, an error message should read "x is not declared in function fun," and not just "missing declaration." The message should be produced in terms of the original source program rather than in terms of some internal representation of it.
3. The message should not be redundant; the same message should not be produced again and again.
Every phase of a compilation expects its input to be in a particular format, and whenever that input is not in the required format, an error is returned. For example, consider the following statement:

if a = b then x := y + z

The error in this statement will be detected in the syntax analysis phase, but not before the syntax analyzer sees the token "then". When detecting an error, a compiler scans some of the tokens that lie ahead of the error's point of occurrence; the fewer the tokens that must be scanned ahead of that point, the better the compiler's error-detection capability. Recovery from errors is also important, because the compiler will be scanning and compiling the entire program, perhaps in the presence of errors, and as many errors as possible need to be detected.
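The three diagnostic properties above can be captured by a small reporting helper. This is an illustrative sketch; the class and message format are assumptions, not part of the SE4 design:

```python
# A small error reporter: each diagnostic carries a source line number
# (specific and localized), is phrased in source-program terms
# (understandable), and exact duplicates are suppressed (not redundant).
class ErrorReporter:
    def __init__(self):
        self.messages = []
        self._seen = set()

    def report(self, line, message):
        diagnostic = f"line {line}: {message}"
        if diagnostic in self._seen:       # same message, same place: skip
            return
        self._seen.add(diagnostic)
        self.messages.append(diagnostic)

errors = ErrorReporter()
errors.report(3, "x is not declared in function fun")
errors.report(3, "x is not declared in function fun")  # suppressed
print(errors.messages)  # ['line 3: x is not declared in function fun']
```

Collecting diagnostics in a list, rather than stopping at the first one, is what lets the compiler keep scanning and report as many errors as possible in a single run.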
2.6.7.1 Recovery from the lexical phase errors
The lexical analyzer detects an error when it discovers that an input's prefix does not fit the specification of any token class, and it can then invoke an error recovery routine. The simplest possible error recovery is to skip the erroneous characters until the lexical analyzer finds another token; however, this is likely to cause the parser to see a deletion error. One way the parser can help the lexical analyzer improve its ability to recover from errors is to make its list of legitimate tokens (in the current context) available to the error recovery routine. The error-recovery routine can then decide whether a remaining input's prefix matches one of these tokens closely enough to be treated as that token.

2.6.7.2 Recovery from the syntax phase errors
A syntax analyzer detects an error when it has no legal move from its current configuration. A parser with the valid-prefix property is capable of announcing an error as soon as it reads an input that is not a valid continuation of the previous input's prefix; this is the earliest time that a left-to-right parser can announce an error. The advantages of using a parser with the valid-prefix property are that it reports an error as soon as possible and that it minimizes the amount of erroneous output passed to subsequent phases of the compiler, output which could otherwise cause severe difficulties in the syntax-analysis and remaining phases. (There are also a variety of parsers that do not necessarily have this property.) After an error is detected, recovery can entail a variety of remedial actions.

2.6.8 Symbol Table & Symbol Table Manager
A symbol table is a data structure used by a compiler to keep track of scope and binding information about names: variables, constants, procedures, and the labels of statements. This information is used to identify the various program elements in the source program. The symbol table is searched every time a name is encountered in the source text; when a new name, or new information about an existing name, is discovered, the content of the symbol table changes. A symbol table that can grow dynamically as necessary is therefore more useful for a compiler.
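One common implementation of such a table is a chain of dictionaries, one per open scope; it grows dynamically and directly supports the scope checking of Section 2.6.3.3. The method names here are assumptions made for the sketch:

```python
# A symbol table as a stack of dictionaries: one dict per open scope.
# Entries are added dynamically, and lookup walks from the current
# scope outward through the enclosing (open) scopes.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                     # the global scope stays open

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()                      # the scope becomes closed

    def add(self, name, info):
        self.scopes[-1][name] = info           # bind in the current scope

    def lookup(self, name):
        for scope in reversed(self.scopes):    # current scope first
            if name in scope:
                return scope[name]
        return None                            # undeclared, or a closed scope

table = SymbolTable()
table.add("g", {"type": "int"})
table.enter_scope()                            # e.g. entering a function body
table.add("x", {"type": "double"})
print(table.lookup("x"), table.lookup("g"))    # both are accessible here
table.exit_scope()
print(table.lookup("x"))                       # None: x is no longer visible
```

Because each scope is a hash table, both adding a new entry and looking up an existing one cost little, which is exactly the efficiency requirement discussed next.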
For efficiency, our choice of the implementation data structure for the symbol table, and the organization of its contents, should impose a minimal cost when adding new entries or accessing the information on existing entries. In advanced compilers there is also a symbol table manager, which helps arrange the data in the symbol table and provides the other components of the compiler with a well-defined access path whenever they need to use the symbol table.

This is all the information required to construct a complete picture of the compiler in the mind of any interested person who has a little knowledge of compilers and automata. Now we are ready to move to the next, and most important, chapter, Compiler Design, because this project is above all a practical exercise in the principles of the Software Design subject.

Chapter Three
Compiler Design

3.1 Compiler Design
As we mentioned above, we used the object-oriented development strategy in the design process of our compiler. In this chapter we will show how to design the compiler using the object-oriented strategy and why we use this strategy in our work and apply it in all stages; we will also use the standard notations and diagrams of the UML (Unified Modeling Language), which help the programmers implement the system. We will show the subsystems of our compiler, how these subsystems are interconnected, and how they interact to produce the target file that the compiler exists to generate.

3.2 The Context of the System
If we consider any low- or high-level language as a system, we can observe that the compiler is one of its subsystems: we can extract or determine many subsystems, such as the compiler, the linker, the loader, the libraries, and the report generator (if it is a high-level language). When we look at Figure 3.1, we see that our system is a subsystem within a larger system, and that it interacts with the other subsystems so that each completes its function without any conflict.

3.2.1 Compiler Architectural Design
Here we establish the architectural design of our compiler. In this process we define the subsystems that make up our main system and determine how they communicate with each other. This is one of the important steps in the architectural design of any system we want to build: if this step is completed successfully, we give a big help to the programmers who will write the code of our compiler, because programmers need diagrams rather than natural language to implement a system.

3.2.1.1 Compiler Structuring
The main output of system structuring is the definition of the subsystems. By using object-oriented design, we can easily organize the subsystems and determine the multiple stages of the compiler, from receiving the source code at the lexical analyzer through producing the target file.
Figure 3.1

3.2.1.1.1 Compiler Organization
From Figure 3.2 we can determine the organization models used in our compiler. These models help illustrate how the subsystems share their data, communicate, and interact. We can easily find the following models in our compiler:
• Repository Model: found in the symbol table manager, which is connected with all the other subsystems in order to check, update, or save any subsystem's data. From this model we can say that our compiler has one subsystem that shares data with all the others, so all of our subsystems have a flexible, uniform interface for getting and setting data through the symbol table manager. The repository model also appears in the error handler subsystem, which communicates with all the other subsystems to report the errors found at any stage.
• Abstract Machine (Layered) Model: this is the main model on which the compiler depends, because it handles the main function of the compiler, and the compilation process goes to its end through the layered model. At the beginning, the lexical analyzer receives the source code from the editor screen (if the language has a UI) or from the file the source code is written in. When the lexical analyzer finishes separating the tokens of the source, it saves the stream of tokens in the symbol table with the help of the symbol table manager. The syntax analyzer then receives this stream of tokens, performs the parsing operation on them, and produces the parse tree into the symbol table, again with the help of the symbol table manager. The semantic analyzer in turn receives this parse tree, checks for logical errors, and produces the abstract syntax tree. See Figure 3.2 to understand this.
After that, the next phase begins with the intermediate representation code generator, which receives the abstract syntax tree and performs low-level operations, such as generating the machine-dependent instructions of the instruction set architecture, to produce the intermediate representation code. This intermediate representation code is then received by the I.R code optimizer, which puts additional effort into optimizing the code, for example reducing the time the CPU needs to execute the program and reducing the size of the memory required to complete the execution; this is done using many functions that work in the low-level environment and address problems such as loop optimization, parallel optimization, and so on. The output of this stage is the optimized code, which is received by the code generator. The code generator works with the optimized code and the contents of the symbol table to execute many operations, such as controlling run-time errors, transforming the logical addresses of the program we want to compile into physical addresses in memory, and allocating the registers, to finally produce the target code. We convey the meaning of this illustration in Figure 3.3, which represents the layered model in our system.

Figure 3.2
Figure 3.3

3.2.1.2 Compiler Control Modeling
At this level we work with the controls to answer questions such as: how does the sequence of control flow through these subsystems, and when does this sequence pass from one subsystem to another? After some analysis, we determined that the control model of our system is the Broadcast Model, because each subsystem is concerned with doing its own job, and the compilation process passes from one subsystem to another depending on the output of that subsystem. We can also say that the control of our compiler is Event-Based Control, because timing in our system is very important: the compilation process does not take a fixed few seconds but a little longer, and a subsystem does not know in advance when an event will be handled; it first performs a monitoring operation to determine when to start, depending on the events broadcast to all subsystems.
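The layered organization described above can be sketched as a chain of phase functions, one per subsystem. The phase names follow this design, while the function bodies are stand-in stubs assumed for the illustration:

```python
# The layered (abstract machine) model as a pipeline: each phase
# consumes the previous phase's output, as in Figure 3.3. The bodies
# are stubs standing in for the real subsystems.
def lexical_analyzer(source):
    return source.split()                       # stream of tokens

def syntax_analyzer(tokens):
    return ("program", tokens)                  # parse tree

def semantic_analyzer(tree):
    return tree                                 # abstract syntax tree

def ir_generator(ast):
    return [f"IR({node})" for node in ast[1]]   # intermediate code

def ir_optimizer(ir_code):
    return ir_code                              # optimized code

def code_generator(optimized):
    return "\n".join(optimized)                 # target code

PHASES = [lexical_analyzer, syntax_analyzer, semantic_analyzer,
          ir_generator, ir_optimizer, code_generator]

def compile_source(source):
    data = source
    for phase in PHASES:                        # control flows layer by layer
        data = phase(data)
    return data

print(compile_source("a = b"))
```

The sketch shows why only the last layer needs to change to retarget the compiler: the earlier phases never look past the output of their immediate neighbor.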
We can say that this is an efficient way of integrating the subsystems.

3.2.1.3 Compiler Modular Decomposition
In this section we show the strategy used to decompose our subsystems into components. As mentioned above, we use the object-oriented strategy to analyze and design our system, so we also use the Object Model for the modular decomposition of the subsystem components, identifying the needed classes and their attributes and operations. We will illustrate this step when we reach the UML class diagram in this chapter.

3.2.1.4 Compiler Domain Specific Architectures
Here we determine the domain-specific model of our compiler. In the SE4 compiler design, we analyze the domain using the Generic Model, which emerges from studying previous compilers that share our compiler's objectives and extracting their good features to implement in our compiler.

3.3 Compiler USE Case Models
Use case models, or diagrams, are important for describing the functions of our subsystems, so we developed a use case model for each subsystem of the compiler. We start with the lexical analyzer use case diagram.

3.3.1 Lexical Analyzer USE Case Diagram
The diagram shown in Figure 3.4 specifies the lexical analyzer use case diagram, whose main function is to generate the stream of tokens. As the diagram shows, it also has many additional functions.

Figure 3.4

3.3.2 Syntax Analyzer USE Case Diagram
The syntax analyzer use case diagram is shown in Figure 3.5; its main use case is parsing the stream of tokens generated by the lexical analyzer.

Figure 3.5

3.3.3 Semantic Analyzer USE Case Diagram
The semantic analyzer performs the functions of detecting logical errors, by comparing the parse tree generated by the syntax analyzer against its CSG (context-sensitive grammar), and it also produces the AST (abstract syntax tree), as shown in Figure 3.6.

Figure 3.6

3.3.4 I.R Code Generator USE Case Diagram
For the intermediate representation code generator, we can say this subsystem's main use case is to produce the simple intermediate representation code. Its use case diagram is shown in Figure 3.7.

Figure 3.7

3.3.5 I.R Code Optimizer USE Case Diagram
In Figure 3.8 we show the use case diagram of the I.R code optimizer, which performs its operations on the simple I.R code generated by the I.R code generator to finally produce the simple optimized code.

Figure 3.8

3.3.6 Code Generator USE Case Diagram
The main function of this subsystem, and of the system as a whole, is to produce the target file. Its use case diagram is shown in Figure 3.9.

Figure 3.9

3.3.7 Symbol Table Manager USE Case Diagram
Through this subsystem we can access the symbol table, so this subsystem has many functions applied on the symbol table, performed together with many environmental functions, as shown in Figure 3.10.

Figure 3.10

3.3.8 Error Handler USE Case Diagram
The function of the error handler is to report any errors that occur and to recognize the types of errors occurring during the compilation process. This is shown in Figure 3.11.

Figure 3.11

3.4 Compiler's Subsystems Activity Diagrams
Activity diagrams translate the use cases into sequences of operations, together with the conditions allowed along those sequences; an activity diagram therefore describes a function from its beginning to its end.

3.4.1 Lexical Analyzer Activity Diagram
The activity diagram of the lexical analyzer subsystem illustrates the process of tokenization, how this process happens, and the possible results: here the result may be the stream of tokens or an error. Figure 3.12 shows this diagram.

Figure 3.12

3.4.2 Syntax Analyzer Activity Diagram
The syntax analyzer activity diagram illustrates how the syntax analyzer interacts internally to finally produce the parse tree, or to raise the syntax error that prevents this stage from completing its sequence and producing the parse tree. Figure 3.13 shows the activity diagram of the syntax analyzer subsystem.

Figure 3.13

3.4.3 Semantic Analyzer Activity Diagram
The sequence by which the semantic analyzer produces the abstract parse tree, if no error occurs, is shown in Figure 3.14.

Figure 3.14
3.4.4 I.R Code Generator Activity Diagram
From this subsystem until the end of the compilation process, the work is done at the low level, meaning that it is done in a way dependent on the machine itself. Seen from the other side, if the compilation process reaches this stage without any error, the user's requirements are finished, and the compiler carries on to complete the operation and produce the target file with the least negative effect on the user's machine. Figure 3.15 shows how the I.R code generator interacts internally to produce the simple I.R code.

Figure 3.15

3.4.5 I.R Code Optimizer Activity Diagram
The I.R code optimizer activity diagram is shown in Figure 3.16; its main function is to produce the optimized code from the simple I.R code produced by the I.R code generator.

Figure 3.16

3.4.6 Code Generator Activity Diagram
The activity diagram of this subsystem is shown in Figure 3.17; this figure shows the sequence used in this subsystem to produce the target file, or code.

Figure 3.17

3.4.7 Symbol Table Manager Activity Diagram
In the activity diagram of the symbol table manager we illustrate the sequence of this subsystem's operations, as shown in Figure 3.18.

Figure 3.18

3.4.8 Error Handler Activity Diagram
In the error handler activity diagram we illustrate the operations that are executed when any type of error happens at any stage of the compilation process. Figure 3.19 shows this diagram.

Figure 3.19

3.5 Compiler Class Diagram
In this subsection of the design process we show the classes generated to implement our compiler. The class attributes are extracted from the use case and activity diagrams discussed above. In determining these classes, we also extracted the additional classes that our main classes need in order to complete their functions. The large Figure 3.20 shows the classes of our compiler.

Figure 3.20

3.6 Compiler Sequence Diagram
A sequence diagram describes the interaction among the actors of a system, i.e. it describes how the subsystems communicate and the sequence of the various paths of communication between them.

3.6.1 Lexical analyzing sequence diagram
For the lexical analyzing stage, we generate the sequence diagram of the interaction among our interface, the lexical analyzer, the symbol table manager, the symbol table, and the error handler actors. This sequence diagram is shown in Figure 3.21.

Figure 3.21

3.6.2 Syntax analyzing sequence diagram
For the syntax analyzing stage, we generate the sequence diagram of the interaction among our interface, the syntax analyzer, the symbol table manager, the symbol table, and the error handler actors. This sequence diagram is shown in Figure 3.22.

Figure 3.22

3.6.3 Semantic analyzing sequence diagram
For the semantic analyzing stage, we generate the sequence diagram of the interaction among our interface, the semantic analyzer, the symbol table manager, the symbol table, and the error handler actors. This sequence diagram is shown in Figure 3.23.

Figure 3.23

3.6.4 I.R Code Generation sequence diagram
For the I.R code generation stage, we generate the sequence diagram of the interaction among our intermediate representation code generator, the symbol table manager, and the symbol table. This sequence diagram is shown in Figure 3.24.

Figure 3.24

3.6.5 I.R Code Optimizing sequence diagram
For the I.R code optimizing stage, we generate the sequence diagram of the interaction among our intermediate representation code optimizer, the symbol table manager, and the symbol table.
This sequence diagram is shown in Figure 3.25.

Figure 3.25

3.6.6 Code Generation sequence diagram
For the code generation stage, we generate the sequence diagram of the interaction among our code generator, the symbol table manager, and the symbol table. This sequence diagram is shown in Figure 3.26.

Figure 3.26

3.6.7 Additional sequence diagram
The sequence diagrams shown in Figure 3.27 represent the sequences that are executed once, the first time our compiler is installed, to feed our subsystems with their grammars.

Figure 3.27

Glossary

Compiler
A compiler is a special type of computer program that translates a human-readable text file into a form that the computer can more easily understand.

Interpreter
An interpreter reads the source code one instruction or line at a time, converts this line into machine code, and executes it.

Lexical Analyzer
The lexical analyzer is the part of the compiler where the stream of characters making up the source program is read from left to right and grouped into tokens.

Tokens
Tokens are sequences of characters with a collective meaning. There are usually only a small number of token kinds for a programming language: constants (integer, double, char, string, etc.), operators (arithmetic, relational, logical), punctuation, and reserved words.

Lexeme
A lexeme is the actual character sequence forming a token.

Regular Expression
A regular set is a set of strings for which there exists some finite automaton that accepts that set. That is, if R is a regular set, then R = L(M) for some finite automaton M; similarly, if M is a finite automaton, then L(M) is always a regular set.

Syntax Analyzer
The syntax analyzer is the part of a compiler that verifies whether or not the tokens generated by the lexical analyzer are grouped according to the syntactic rules of the language.

Object-oriented strategy
A development strategy where system analyzers, designers, and programmers think in terms of "things" instead of operations or functions.
Context Free Grammar (CFG)
CFG notation specifies a context-free language that consists of terminals, nonterminals, a start symbol, and productions.

Terminals
The terminals are nothing more than the tokens of the language.

Nonterminals
The nonterminals define the sets of strings that are used to define the language generated by the grammar.

Derivation
Derivation refers to replacing an instance of a nonterminal in a given string, using a production rule whose left-hand side is that nonterminal, by the right-hand side of the production rule.

Semantic Analyzer
The part of the compiler where we delve even deeper to check whether the tokens form a sensible set of instructions in the programming language.

Types and Declarations
A type is a set of values and a set of operations operating on those values.

Type Checking
Type checking is the process of verifying that each operation executed in a program respects the type system of the language.

Scope Checking
The determination of whether an identifier is accessible at a given point in the program.

Scope
A scope is a section of program text enclosed by basic program delimiters.

Intermediate Code Generator
The part of the compilation phases by which a compiler converts some internal representation of source code into a form that can be readily executed by a machine.

Code Optimizer
Refers to the techniques a compiler can employ in order to produce improved object code for a given source program.

Code Generator
The last phase in the compiler's operation, which converts the optimized code into dependent machine code.

Error Handling
One of the important tasks that a compiler must perform: the detection of and recovery from errors.

Symbol Table
A symbol table is a data structure used by a compiler to keep track of scope/binding information about names, like variables, constants, procedures, and the labels of statements. This information is used in the source program to identify the various program elements.

Symbol Table Manager
The part of the compiler that provides the other compiler components with access to the symbol table.

Bibliography
[1] O.G. Kakde, "Algorithms for Compiler Design", Charles River Media, 2002.
[2] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, "Compilers: Principles, Techniques and Tools", Reading, MA: Addison-Wesley, 1986.
[3] Ian Sommerville, "Software Engineering".
[4] Y.N. Srikant and Priti Shankar, "The Compiler Design Handbook", 2nd Edition, Dec. 2007.
[5] Cohen, "Introduction to Computer Theory", New York: Wiley.
[6] Hopcroft and Ullman, "Introduction to Automata Theory, Languages, and Computation", Reading, MA: Addison-Wesley, 1979.
[7] Bennett, J., "Introduction to Compiling Techniques", Berkshire, England: McGraw-Hill, 1990.

Websites
www.wikipedia.com