Data Structures

March 26, 2018 | Author: krishnastays | Category: Time Complexity, Summation, Matrix (Mathematics), Algorithms, Division (Mathematics)






Introduction to Data Structures
Complexity, Rate of Growth, Big O Notation
Arrays
Linked List
Stacks, Queues, Recursion
Sorting and Searching Techniques
Hashing Techniques

MSc. In Software

1 INTRODUCTION TO DATA STRUCTURES

• The Need to Learn Data Structures
• Coding, Testing and Refinement
• Pitfalls
• Basic Terminology; Elementary Data
• Data Structures
• Data Structure Operations
• Complexity, Time-Space Tradeoff
• Summary

THE NEED TO LEARN DATA STRUCTURES

Some of the rules for good programming are:

1. Always name your variables and functions with the greatest care and explain them thoroughly.
2. The reading time for programs is much more than the writing time. Write clearly to enable easy reading.
3. Each function should do only one task. Ensure that it is done well.
4. Each function should hide something.
5. Keep your connections simple. Avoid global variables whenever possible. If you must use global variables as input, document them thoroughly.
6. Never code until the specifications are precise and complete.

This subject describes programming methods and tools that will prove effective for projects.

By now we know that the computer understands logical commands. The instructions given to the computer must be in a very structured form. This structured form is called an algorithm in computer jargon. An algorithm is a representation of any event in a stepwise manner. In some cases the sequence of activities is quite simple and the algorithm can be easily constructed.

Food for thought: How do you rewrite the following function so that it accomplishes the same result in a less tricky way?

    void Doessomething(int *first, int *second)
    {
        *first = *second - *first;
        *second = *second - *first;
        *first = *second + *first;
    }

Tracing the three statements shows that the function exchanges the values of *first and *second. A probable solution is therefore to swap them through a temporary variable:

    void Doessomething(int *first, int *second)
    {
        int temp = *first;
        *first = *second;
        *second = temp;
    }
When the problem at hand is quite complex, and a lot of different activities have to be considered within a single problem, keeping track of all these events and the variables they involve becomes a very tedious task. To manage and handle these events and variables in a more structured and orderly manner, we take the aid of data structures. The need for using data structures also arises from the fact that they teach how to write a function with meaningful variable names, with better format and without unnecessary variables.

Food for thought: Rewrite the following function with meaningful variable names, with a better layout, without redundant and useless statements, and without extra variables that contribute nothing to the understanding.

    #define MAXINT 100

    int calculate(int apple, int orange)
    {
        int peach, lemon;
        peach = 0;
        lemon = 0;
        if (apple < orange) {
            peach = orange;
        } else if (orange <= apple) {
            peach = apple;
        } else {
            peach = MAXINT;
            lemon = MAXINT;
        }
        if (lemon != MAXINT) {
            return (peach);
        }
    }

A probable solution: Do it yourself. If you cannot, email to us.

CODING, TESTING AND REFINEMENT

The three processes in the title above go hand-in-hand and must be done together. Yet it is important to keep them separate in our thinking, since each requires its own approach and method. Coding is the process of writing an algorithm in the correct syntax (grammar) of a computer language like C. Testing is the process of running the program on sample data to find errors, if there are any. Refinement is done on the basic program.

After coding the main program, most programmers wish to complete the writing and coding of the functions as soon as possible, to check if the full project works. But even for small projects, there are good reasons for debugging functions one at a time. To compile the program correctly, there must be something in the place of each function that is used, and hence we must put in short, dummy functions, called stubs. This makes debugging easier.

PITFALLS

1. Be sure you understand your problem before you decide how to solve it.
2. Be sure you understand the algorithmic method before you start to program.
3. In case of difficulty, divide the problem into pieces and think of each part separately.
4. Keep your functions short and simple.
5. Keep your programs well formatted as you write them; it will make debugging much easier.
6. Keep your documentation consistent with your code, and when reading a program make sure that you debug the code and not just the comments.
7. Explain your program to somebody else. Doing so will help you understand it better.

BASIC TERMINOLOGY; ELEMENTARY DATA

Try to remember some basic terminologies, which we will be using throughout the subject.

Data: Values or sets of values.
Data items: Refers to a single unit of values.
Group items: Data items that can be divided into sub-items.
Elementary items: Data items that cannot be divided into sub-items.

An entity is something that has certain attributes or properties that may be assigned values. These values may be numeric or nonnumeric. For better understanding of the concept, look at the example given below. The following are the possible attributes and their corresponding values for an entity, an employee of a given organization:

    Attributes:  Name   Age   Sex   Emp. No.
    Values:      XYZ    34    M     134

Entity set: Entities with similar attributes (e.g., all the employees in an organization) form an entity set.
Range of values: Each attribute of an entity set has a range of values, the set of all possible values that could be assigned to the particular attribute.
Information: The term information is sometimes used for data with given attributes, or, in other words, meaningful or processed data.

The way data is organized into the hierarchy of fields, records and files reflects the relationship between attributes, entities and entity sets.

Field: Single elementary unit of information representing an attribute of an entity.
Record: Collection of field values of a given entity.
File: Collection of records of the entities in a given entity set.
Primary key: Each record in a file may contain many field items, but the value in a certain field may uniquely determine the record in the file. Such a field K is called a primary key, and the values K1, K2, … in such a field are called keys or key values.

Consider these cases to understand primary key better:

(1) Suppose an automobile dealership maintains a file where each record contains the following data: Serial Number, Type, Year, Price, Accessories. The 'Serial Number' field can serve as a primary key for the file, since each automobile has a unique serial number.

(2) Suppose an organization maintains a membership file where each record contains the following data: Name, Address, Telephone Number, Dues Owed. Although there are four data items, 'Name' cannot be the primary key, since more than one person can have the same name. Note also that the 'Address' and 'Telephone Number' fields may not serve as primary keys, since some members may belong to the same family and have the same address and telephone number. 'Dues Owed' is out of the question because many people can have the same value. 'Name' and 'Address' may be group items and together can serve as a primary key.

The above examples should clear your doubts about keys, which we are going to use.

Records may also be classified according to length. A file can have fixed-length records or variable-length records.

Fixed-length records: All the records contain the same data items with the same amount of space assigned to each data item.
Variable-length records: File records may be of different lengths. Usually, variable-length records have a minimum and maximum length. For example, student records usually have variable lengths, since different students take a varying number of courses.
DATA STRUCTURES

The logical or mathematical model of a particular organization of data is called a data structure. The study of such data structures includes the following three steps:

(1) Logical or mathematical description of the structure
(2) Implementation of the structure on a computer
(3) Quantitative analysis of the structure, which includes determining the amount of memory needed to store the structure and the time required to process the structure

Here we will discuss three types of data structures in detail. They are: arrays, linked lists, and trees.

Arrays

The simplest type of data structure is a linear (or one-dimensional) array. By a linear array, we mean a list of a finite number n of similar data elements referenced respectively by a set of n consecutive numbers, usually 1, 2, 3, …, n. If we choose the name A for the array, then the elements of A are denoted by the subscript notation $A_1, A_2, A_3, \ldots, A_n$, or by the parenthesis notation A(1), A(2), A(3), …, A(N), or by the bracket notation A[1], A[2], A[3], …, A[N].

Let's see an example so that we can understand it easily. A linear array STUDENT consisting of the names of six students is pictured in figure 1-1.

    1       2       3       4       5       6
    Name1   Name2   Name3   Name4   Name5   Name6
Fig 1-1

Here STUDENT[1] denotes 'Name1', STUDENT[2] denotes 'Name2', and so on. This is the simplest example of a single-dimensional array.

A two-dimensional array is a collection of similar data elements where each element is referenced by two subscripts. Such arrays are called matrices in mathematics and tables in business applications. Multidimensional arrays are defined analogously.

You can visualize a two-dimensional array as a block having 3 rows and 4 columns. If we denote the first array member as array[0][0], the following array members will be denoted like this:

    array[0][0]  array[0][1]  array[0][2]  array[0][3]
    array[1][0]  array[1][1]  array[1][2]  array[1][3]
    array[2][0]  array[2][1]  array[2][2]  array[2][3]
Fig 1-2

The size of this array is denoted by 3 × 4 (read "3 by 4"), since it contains 3 rows (the horizontal lines of elements) and 4 columns (the vertical lines of elements). In array notation, the cell in the third row and fourth column of the block is array[2][3].

Linked Lists

A linked list is the most important and difficult part of data structures. Therefore, we will put more emphasis on linked lists. If we only give the theory of linked lists it will be difficult for you to understand, so we will introduce it with an example. If you want to be an expert in data structures, understand the basic idea and try to solve as many examples as you can.

Consider a file where each record contains a customer's name and his or her salesperson, and suppose the file contains the data as appearing in figure 1-3. Clearly the file could be stored in the computer by such a table, i.e., by two columns of five names. However, this may not be the most useful way to store the data.

[Fig. 1-3: table of five customers (Customer1 to Customer5) and the salesperson of each]

Practically speaking, in figure 1-3, in front of each customer we have specified his or her salesperson's name. In the above case the number of customers is very small, which is why we can afford to do this. But imagine a case where there are hundreds of customers. In such a case, repeating the name of the salesperson will consume a lot of space. Instead, we will give numbers to the salespersons and mention that number in front of the customer's name. Thus we can save a lot of space.

Another way of storing the data in figure 1-3 is to have a separate array for the salespeople and an entry (called a pointer) in the customer file, which gives the location of each customer's salesperson. This is done in figure 1-4, where against every customer name we have written the number (pointer) of the corresponding salesperson.

[Fig. 1-4: array of salespersons numbered 1 to 8, and the customer file in which each customer carries the number (pointer) of his or her salesperson]

Suppose the firm wants the list of customers for a given salesperson. Using the data representation in figure 1-4, the firm would have to search through the entire customer file. One way to simplify such a search is to have a table in which each salesperson has a set of numbers (pointers) giving the positions of his or her customers, as in figure 1-5.

[Fig. 1-5: array of salespersons in which each salesperson carries the pointers (a to e) of his or her customers]

Disadvantage: The main disadvantage of this representation is that each salesperson may have many pointers and the set of pointers will change as customers are added and deleted.

The most popular way to store such data is shown in figure 1-6. Here each salesperson has one pointer which points to his or her first customer, whose pointer in turn points to the second customer, and so on, with the salesperson's last customer indicated by a 0.

[Fig. 1-6: linked list for Salesperson1: Customer1 → Customer2 → Customer3 → Customer4 → 0]

Consider figure 1-6 for the salesperson 'Salesperson1'. (In this picture we have considered that Salesperson1 has got Customer1, Customer2, Customer3 and Customer4.) Here '1' is the pointer of 'Salesperson1'. This pointer points to 'Customer1'. 'a' is the pointer of 'Customer1', which in turn points to 'Customer2'. Similarly, 'b', which is the pointer of 'Customer2', points to 'Customer3', and so on. Since 'Customer4' is the last customer, i.e., this customer is not further connected to any other customer, its pointer has been assigned 0.
Trees

Data frequently contains a hierarchical relationship between various elements. The data structure which reflects this relationship is called a rooted tree graph or, simply, a tree. Trees will be defined and discussed in detail in later modules, but here we indicate some of their basic properties by means of two examples:

(a) An employee's personnel record: This may contain the following data items:
i) Social Security Number
ii) Name
iii) Address
iv) Age
v) Salary
vi) Dependents

However, Name may be a group item with the subitems Last, First and MI (middle initial). Also, Address may be a group item with the subitems Street address and Area address, where Area itself may be a group item having subitems City, State and ZIP code. This hierarchical structure is shown in figure 1-7(a).

[Fig. 1-7(a): tree diagram of the employee record]

Another way of picturing such a tree structure is in terms of levels, as shown in figure 1-7(b):

    01 Employee
       02 Social Security Number
       02 Name
          03 Last
          03 First
          03 Middle Initial
       02 Address
          03 Street
          03 Area
             04 City
             04 State
             04 ZIP
       02 Age
       02 Salary
       02 Dependents
Fig. 1-7(b)

(b) An algebraic expression in tree structure format: Let the expression be $(2x + y)(a - 7b)^3$. To represent the expression by a tree, let's use a vertical arrow (↑) for exponentiation and an asterisk (*) for multiplication. Thus we can show the expression in terms of a tree diagram as shown in figure 1-8.

[Fig. 1-8: tree diagram of the expression (2x + y)(a - 7b)↑3]

Observe that the order in which the operations will be performed is reflected in the diagram: the exponentiation must take place after the subtraction, and the multiplication at the top of the tree must be executed last.

Some More Data Structures

(a) Stack: A stack, also called a last-in first-out (LIFO) system, is a linear list in which insertions and deletions can take place only at one end, called the top. This structure is similar in its operation to a stack of dishes on a spring. New dishes are inserted only at the top of the stack and dishes can be deleted only from the top of the stack.

(b) Queue: A queue, also called a first-in first-out (FIFO) system, is a linear list in which deletions can take place only at one end, the front of the list, and insertions can take place only at the other end, the rear of the list. This structure operates in much the same way as a line of people waiting at a bus stop: the first person in line is the first person to board the bus. Another analogy is with automobiles waiting to pass through an intersection: the first car in line is the first car through.
(c) Graph: Data sometimes contain a relationship between pairs of elements which is not necessarily hierarchical in nature. For example, suppose an airline flies only between the cities connected by lines as shown in figure 1-9. The data structure which reflects such a relationship is called a graph.

[Fig. 1-9: airline routes connecting City1, City2, City3, City4 and City5]

DATA STRUCTURE OPERATIONS

Now we are going to see how data appearing in our data structures are processed by certain operations. In this section we will introduce some of the frequently used operations. Remember that the particular data structure one chooses for a given situation depends largely on the frequency with which specific operations are performed.

The following four operations play a major role in this text:

(1) Traversing: Accessing each record exactly once so that certain items in the record may be processed. (The accessing and processing is sometimes called "visiting" the record.)
(2) Searching: Finding the location of the record with a given key value, or finding the locations of all records which satisfy one or more conditions.
(3) Inserting: Adding a new record to the structure.
(4) Deleting: Removing a record from the structure.

Sometimes two or more of the operations may be used in a given situation; e.g., we may want to delete the record with a given key, which may mean we first need to search for the location of the record.

The following two operations, which are used in special situations, will also be considered:

(1) Sorting: Arranging the records in some logical order (e.g., alphabetically according to some NAME key, or in numerical order according to some NUMBER key, such as employee number or account number).
(2) Merging: Combining the records in two different sorted files into a single sorted file.

A real example will make these concepts clear. An organization contains a membership file in which each record contains the following data for a given member: Name, Address, Telephone Number, Age, Sex.

(a) Suppose the organization wants to announce a meeting through a mailing. Then one would traverse the file to obtain Name and Address for each member.
(b) Suppose one wants to find the names of all members living in a certain area. Again one would traverse the file to obtain the data.
(c) Suppose one wants to obtain Address for a given Name. Then one would search the file for the record containing Name.
(d) Suppose a new person joins the organization. Then one would insert his or her record into the file.
(e) Suppose a member dies. Then one would delete his or her record from the file.
(f) Suppose a member has moved and has a new address and telephone number. Given the name of the member, one would first need to search for the record in the file. Then one would perform the "update", i.e., change items in the record with the new data.
(g) Suppose one wants to find the number of members 65 or older. Again one would traverse the file, counting such members.

ALGORITHMS: COMPLEXITY, TIME-SPACE TRADEOFF

An algorithm is a well-defined list of steps for solving a particular problem. One major purpose of this section is to develop efficient algorithms for the processing of our data. The time and space it uses are two major measures of the efficiency of an algorithm.

• Consider time and space trade-offs in deciding on your algorithm.
• Never be afraid to start over. Next time it may be both shorter and easier.

If the storage space is available and otherwise unused, it is preferable to use the algorithm requiring more space and less time. If not, then time may have to be sacrificed.

Let's see these ideas with two examples.

Searching Algorithms

Consider a membership file in which each record contains, among other data, the name and telephone number of its member. Suppose we are given the name of a member and we want to find his or her telephone number. One way to do this is to linearly search through the file, i.e., to apply the following algorithm:

Linear Search: Search each record of the file, one at a time, until the given Name and the corresponding telephone number is found.

Consider that the time required to execute the algorithm is proportional to the number of comparisons. Assuming that each name in the file is equally likely to be picked, it is intuitively clear that the average number of comparisons for a file with n records is about n/2; that is, the complexity of the linear search algorithm is given by C(n) = n/2.

Binary Search: With the names sorted alphabetically, compare the given Name with the name in the middle of the list. This indicates which half of the list contains Name. Then compare Name with the name in the middle of the correct half to determine which quarter of the list contains Name. Continue the process until Name is found in the list.

One can show that the complexity of the binary search algorithm is approximately $C(n) = \log_2 n$; more precisely, it never needs more than $\lfloor \log_2 n \rfloor + 1$ comparisons, since each comparison halves the part of the list still to be searched. Thus, for example, one will not require more than 7 comparisons to find a given Name in a list containing $64 = 2^6$ names.

Drawback: Although the binary search algorithm is a very efficient algorithm, it has some major drawbacks. Specifically, the algorithm assumes that one has direct access to the middle name in the list or a sublist. This means that the list must be stored in some type of array. Unfortunately, inserting an element in an array requires elements to be moved down the list, and deleting an element from an array requires elements to be moved up the list.

An Example of Time-Space Tradeoff

Suppose a file of records contains names, employee numbers and much additional information among its fields. For finding the record for a given name, sorting the file alphabetically and using a binary search is a very efficient way. On the other hand, suppose we are given only the employee number of the person. Then we would have to do a linear search for the record, which is extremely time-consuming for a very large number of records. How can we solve such a problem? One way is to have another file, which is sorted numerically according to the employee number. This, however, would double the space required for storing the data. Another way, pictured in figure 1-10, is to have the main file sorted numerically by employee number and to have an auxiliary array with only two columns, the first column containing an alphabetized list of the names and the second column containing pointers, which give the locations of the corresponding records in the main file. This is one way of solving the problem that is used frequently, since the additional space, containing only two columns, is minimal for the amount of extra information it provides.

[Fig. 1-10: main file sorted by employee number (Employee No., Name, Extra Data), and an auxiliary two-column array of alphabetized names with pointers to the records in the main file]

Quote of the chapter: Act in haste and repent at leisure. Program in haste and debug forever.

Summary

# An entity is something that has certain attributes or properties that may be assigned values. These values may be numeric or nonnumeric.
# The logical or mathematical model of a particular organization of data is called a data structure. The three most popular data structures are arrays, linked lists and trees.
# Some other data structures are stacks, queues and graphs.
# Sometimes two or more operations may be combined in a given situation.
# We have discussed two searching algorithms: (i) linear search, (ii) binary search.
MSc. In Software, Zee Interactive Learning Systems

2 COMPLEXITY, RATE OF GROWTH, BIG O NOTATION

MAIN POINTS COVERED

• INTRODUCTION
   a) Floor and Ceiling Functions
   b) Remainder Function: Modular Arithmetic
   c) Integer and Absolute Value Functions
   d) Summation Symbol: Sums
   e) Factorial Function
   f) Permutations
   g) Exponents and Logarithms
• ALGORITHMIC NOTATION
• CONTROL STRUCTURES
   • Sequence logic, or sequential flow
   • Selection logic, or conditional flow
      1. Single alternative
      2. Double alternative
      3. Multiple alternatives
   • Iteration logic, or repetitive flow
• COMPLEXITY OF ALGORITHMS
• RATE OF GROWTH, BIG O NOTATION
• SUMMARY

INTRODUCTION

This section gives various mathematical functions which appear very often in the analysis of algorithms and in computer science.

a) Floor and Ceiling Functions

Let x be any real number. Then x lies between two integers called the floor and the ceiling of x. Specifically, $\lfloor x \rfloor$, called the floor of x, denotes the greatest integer that does not exceed x, and $\lceil x \rceil$, called the ceiling of x, denotes the least integer that is not less than x. If x is itself an integer, then $\lfloor x \rfloor = \lceil x \rceil$; otherwise $\lfloor x \rfloor + 1 = \lceil x \rceil$. For example:

$\lfloor 3.14 \rfloor = 3$, $\lfloor \sqrt{5} \rfloor = 2$, $\lfloor -8.5 \rfloor = -9$, $\lfloor 7 \rfloor = 7$
$\lceil 3.14 \rceil = 4$, $\lceil \sqrt{5} \rceil = 3$, $\lceil -8.5 \rceil = -8$, $\lceil 7 \rceil = 7$

b) Remainder Function: Modular Arithmetic

Let k be any integer and let M be a positive integer. Then k (mod M) (read "k modulo M") will denote the integer remainder when k is divided by M. More exactly, k (mod M) is the unique integer r such that

$$k = Mq + r, \quad \text{where } 0 \le r < M.$$

When k is positive, simply divide k by M to obtain the remainder r. Some examples:

25 (mod 7) = 4     [dividend = 25, divisor = 7, remainder is 4]
25 (mod 5) = 0     [dividend = 25, divisor = 5, remainder is 0]
35 (mod 11) = 2    [dividend = 35, divisor = 11, remainder is 2]
3 (mod 8) = 3      [dividend = 3, divisor = 8, remainder is 3]

The term "mod" is also used for the mathematical congruence relation, which is denoted and defined as follows:

$a \equiv b \pmod{M}$ if and only if M divides (b - a).

M is called the modulus, and $a \equiv b \pmod{M}$ is read "a is congruent to b modulo M". The following aspects of the congruence relation are frequently useful:

$0 \equiv M \pmod{M}$ and $a \pm M \equiv a \pmod{M}$

Arithmetic modulo M refers to the arithmetic operations of addition, multiplication and subtraction where the arithmetic value is replaced by its equivalent value in the set {0, 1, 2, …, M - 1} or in the set {1, 2, 3, …, M}. (The use of 0 or M depends on the application.) For example, in arithmetic modulo 12, sometimes called "clock" arithmetic:

$6 + 9 \equiv 3$, $7 \times 5 \equiv 11$, $1 - 5 \equiv 8$, $2 + 10 \equiv 0 \equiv 12$

c) Integer and Absolute Value Functions

Let x be any real number. The integer value of x, written INT(x), converts x into an integer by deleting (truncating) the fractional part of the number. Thus

INT(3.14) = 3, INT($\sqrt{5}$) = 2, INT(-8.5) = -8, INT(7) = 7

Observe that INT(x) = $\lfloor x \rfloor$ or INT(x) = $\lceil x \rceil$ according to whether x is positive or negative.

The absolute value of the real number x, written ABS(x) or |x|, is defined as the greater of x or -x. Hence ABS(0) = 0, and, for x ≠ 0, ABS(x) = x or ABS(x) = -x, depending on whether x is positive or negative. Thus

|-15| = 15, |7| = 7, |-3.33| = 3.33, |4.44| = 4.44, |-0.075| = 0.075

We note that |x| = |-x| and, for x ≠ 0, |x| is positive.
d) Summation Symbol: Sums

Here we introduce the summation symbol Σ (the Greek letter sigma). Consider a sequence $a_1, a_2, a_3, \ldots$. Then the sums $a_1 + a_2 + \cdots + a_n$ and $a_m + a_{m+1} + \cdots + a_n$ will be denoted, respectively, by

$$\sum_{j=1}^{n} a_j \quad \text{and} \quad \sum_{j=m}^{n} a_j$$

The expression $a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$ is denoted as $\sum_{i=1}^{n} a_i b_i$. For example, taking $a_j = b_j = j$, n = 5 and the index starting from 2, we have

$$\sum_{j=2}^{5} j^2 = 2^2 + 3^2 + 4^2 + 5^2 = 4 + 9 + 16 + 25 = 54$$

e) Factorial Function

The product of the positive integers from 1 to n, inclusive, is denoted by n! (read "n factorial"). That is,

$$n! = 1 \cdot 2 \cdot 3 \cdots (n-2)(n-1)\,n$$

It is also convenient to define 0! = 1. For example:

(a) 2! = 1 · 2 = 2
(b) 3! = 1 · 2 · 3 = 6
(c) 4! = 1 · 2 · 3 · 4 = 24
(d) 5! = 5 · 4! = 5 · 24 = 120
(e) 6! = 6 · 5! = 6 · 120 = 720

f) Permutations

A permutation of a set of n elements is an arrangement of the elements in a given order. For example, the permutations of the set consisting of the elements a, b, c are:

abc, acb, bac, bca, cab, cba

One can prove: There are n! permutations of a set of n elements. Accordingly, there are 4! = 24 permutations of a set of 4 elements, 5! = 120 permutations of a set of 5 elements, and so on.

g) Exponents and Logarithms

We consider first integer exponents (where m is a positive integer):

$$a^m = \underbrace{a \cdot a \cdots a}_{m \text{ times}}, \qquad a^0 = 1, \qquad a^{-m} = \frac{1}{a^m}$$

Exponents are extended to include all rational numbers by defining, for any rational number m/n,

$$a^{m/n} = \sqrt[n]{a^m} = \left(\sqrt[n]{a}\right)^m$$

For example, $2^4 = 16$, $2^{-4} = 1/2^4 = 1/16$, $125^{2/3} = 5^2 = 25$.

Logarithms are related to exponents as follows. Let b be a positive number other than 1. The logarithm of any positive number x to the base b, written $\log_b x$, represents the exponent to which b must be raised to obtain x. That is, $y = \log_b x$ and $b^y = x$ are equivalent statements. Accordingly,

$\log_2 8 = 3$ since $2^3 = 8$
$\log_{10} 100 = 2$ since $10^2 = 100$
$\log_2 64 = 6$ since $2^6 = 64$
$\log_{10} 0.001 = -3$ since $10^{-3} = 0.001$

Furthermore, for any base b,

$\log_b 1 = 0$ since $b^0 = 1$, and $\log_b b = 1$ since $b^1 = b$.

The logarithm of a negative number and the logarithm of 0 are not defined. The exponential function $f(x) = b^x$ and the logarithmic function $g(x) = \log_b x$ are inverses of each other. Logarithms are frequently expressed using approximate values; for example, $\log_e 40 = 3.6889\ldots$ Three bases are used most often:

Natural logarithms: logarithms to the base e, where e = 2.718281…
Common logarithms: logarithms to the base 10
Binary logarithms: logarithms to the base 2

Notation: log x will mean $\log_2 x$ unless otherwise specified.
ALGORITHMIC NOTATION

An algorithm is a finite step-by-step list of well-defined instructions for solving a particular problem. This section describes the format that is used to present algorithms throughout the text. This algorithmic notation is best described by means of examples.

An array DATA of numerical values is in memory. We want to find the location LOC and the value MAX of the largest element of DATA. Given no other information about DATA, one way to solve the problem is:

i) Initially begin with LOC = 1 and MAX = DATA[1].
ii) Then compare MAX with each successive element DATA[K] of DATA.
iii) If DATA[K] exceeds MAX, then update LOC and MAX so that LOC = K and MAX = DATA[K].

The final values appearing in LOC and MAX give the location and value of the largest element of DATA.

Algorithm (Largest Element in Array): A nonempty array DATA with N numerical values is given. The algorithm finds the location LOC and the value MAX of the largest element of DATA. The variable K is used as a counter.

Step 1. [Initialize.] Set K = 1, LOC = 1 and MAX = DATA[1].
Step 2. [Increment counter.] Set K = K + 1.
Step 3. [Test counter.] If K > N, then: Write LOC, MAX, and Exit.
Step 4. [Compare and update.] If MAX < DATA[K], then: Set LOC = K and MAX = DATA[K].
Step 5. [Repeat loop.] Go to Step 2.

The format for the formal presentation of an algorithm consists of two parts. The first part identifies the variables which occur in the algorithm and lists the input data. The second part of the algorithm consists of the list of steps that is to be executed.

CONTROL STRUCTURES

Algorithms and their equivalent computer programs are more easily understood if they use self-contained modules and three types of logic, or flow of control:

(1) Sequence logic, or sequential flow
(2) Selection logic, or conditional flow
(3) Iteration logic, or repetitive flow

These three types of logic are discussed below, and in each case we show the equivalent flowchart.

Sequence Logic (Sequential Flow)

In this case the modules are executed in the obvious sequence. The sequence may be presented explicitly, by means of numbered steps, or implicitly, by the order in which the modules are written.

[Flowchart: Module A → Module B → Module C, executed one after another]
Module A Flow chart equivalent Module A Module B Module B Module C . The first part identifies the variables. The sequence may be presented explicitly. In Software Step 2. Algorithm . The format for the formal presentation of an algorithm consists of two parts. or repetitive flow These three types of logic are discussed below and in each case we show the equivalent flowchart. [Test counter. Sequence Logic (Sequential Flow) In this case the modules are executed in the obvious sequence. or sequential flow Selection logic. by the order in which the modules are written. . . and Exit. then Write LOC.MSc. then Set LOC = K and MAX = DATA [K] [Repeat loop. which occur in the algorithm and lists the input data. For clarity. which may consist of one or more statements. we will frequently indicate the end of such a structure by the statement [End of if Structure. which are discussed separately. then Module A. then [ Module A ] else [ Module B ] [ End of if structure.] These conditional structures fall into three types.] The logic of this structure is pictured in Fig.MSc. then [ Module A2 ] : : else if condition(M). The structures. otherwise Module A is skipped and control transfers to the next step of the algorithm. Double alternative: This structure has the form: if condition. then [ Module A1 ] else if condition(1). then 8 . In Software Selection Logic (Conditional Flow) Selection logic employs a number of conditions. which implement this logic are called conditional structures or if structures. is executed. then [Module A] [End of if Structure. ] Multiple alternatives: The structure has the form if condition(1). Single alternative: This structure has the form if condition. which lead to a selection of one out of several alternative modules. If the condition holds. 2-3(a). if any. x = -b/2a. X . Algorithm (Quadratic Equation) This algorithm inputs the coefficients A. then there is only one(double) real solution. 
The logic of the multiple-alternative structure allows only one of the modules to be executed. Let's see this with an example.

Example
The solution of the quadratic equation $ax^2 + bx + c = 0$, where $a \neq 0$, is given by
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
where $D = b^2 - 4ac$ is called the discriminant of the equation. If D is positive, the formula gives two distinct real solutions. If D = 0, there is only one (double) real solution, $x = -b/2a$. If D is negative, there are no real solutions. The following algorithm finds the real solutions of a quadratic equation.

Algorithm (Quadratic Equation)
This algorithm inputs the coefficients A, B, C of a quadratic equation and outputs the real solutions, if any.
Step 1. Read A, B, C.
Step 2. Set D = B² - 4AC.
Step 3. If D > 0, then:
            Set X1 = (-B + √D)/(2A) and X2 = (-B - √D)/(2A).
            Write X1, X2.
        Else if D = 0, then:
            Set X = -B/(2A).
            Write 'UNIQUE SOLUTION', X.
        Else:
            Write 'NO REAL SOLUTION'.
        [End of if structure.]
Step 4. Exit.
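Iteration Logic (Repetitive Flow)
The third kind of logic refers to either of two types of structures involving loops. Each type begins with a Repeat statement and is followed by a module, called the body of the loop. The two types of loops are:
(1) Repeat-for loop
Repeat for K = R to S step T:
    [Module]
[End of loop.]
(2) Repeat-while loop
Repeat while condition:
    [Module]
[End of loop.]
In the repeat-for loop, the counter K starts at R and is increased by the step T after each execution of the body, until K exceeds S (for a positive step T); the repeat-while loop executes its body as long as the condition holds.

We have discussed an algorithm for finding the maximum element in an array. Now we discuss the same problem using a repeat-while loop.

Algorithm (Largest Element in Array)
Given a nonempty array DATA with N numerical values, this algorithm finds the location LOC and the value MAX of the largest element of DATA.
1. [Initialize.] Set K = 1, LOC = 1 and MAX = DATA[1].
2. Repeat Steps 3 and 4 while K ≤ N:
3.     If MAX < DATA[K], then:
           Set LOC = K and MAX = DATA[K].
       [End of if structure.]
4.     Set K = K + 1.
   [End of Step 2 loop.]
5. Write LOC, MAX.
6. Exit.

COMPLEXITY OF ALGORITHMS
In designing algorithms we need methods to separate bad algorithms from good ones. The analysis of algorithms and the comparison of alternative methods constitute an important part of software engineering. In order to compare algorithms, we have to find out their efficiency; this enables us to choose the right one in a given situation. In this section we discuss how to measure efficiency.

Example 1
Food for thought: What do you think are the possible criteria that measure the efficiency of an algorithm?
(a) Time taken
(b) Length of the algorithm
(c) Memory space used
(d) Time required in writing the algorithm
(a) and (c) are the correct answers.

Suppose you are given an algorithm M, and the size of the input data is n. The efficiency of M depends on two main measures:
• Time taken by the algorithm
• Space used by the algorithm
Time: We measure time by counting the number of key operations performed during the execution of the algorithm.
Key operations: During the execution of an algorithm we perform several operations. Among them, the operation that takes the most time compared with the others is called the key operation; in searching, for example, it is the comparison of the item sought with an element of the data.
Space: We measure space by counting the maximum memory needed by the algorithm.
The complexity of an algorithm M is the function f(n) which gives the running time and/or storage space requirement of the algorithm in terms of the size n of the input data. Frequently, the storage space required by an algorithm is simply a multiple of the data size n, so complexity usually refers to the running time of the algorithm.

Example 2
Suppose we have been given an English short story TEXT, and we want to search through TEXT for the first occurrence of a given 3-letter word W. If W is the 3-letter word "the", then it is likely that W occurs near the beginning of TEXT, so f(n) will be small. On the other hand, suppose W is the 3-letter word "zee".
Food for thought: Where do you think the word "zee" will occur?
(a) Beginning of the text
(b) End of the text
(c) Never
(d) Not quite sure
Any answer can be true. However, "zee" is not a very common word, so W may not appear at all, in which case the whole of TEXT must be searched and the complexity f(n) of the algorithm will be large.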
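To make "counting key operations" concrete for Example 2, the sketch below searches TEXT for the first occurrence of a 3-letter word W and counts each comparison of W with a 3-character window of TEXT as one key operation. The function find_word is hypothetical, and it simplifies the example by not requiring W to be a whole word (so "the" would also match inside "there").

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for Example 2: returns the 0-based position of the
 * first occurrence of the 3-letter word w in text, or -1 if there is none,
 * and stores the number of key operations (window comparisons) in *comparisons.
 * Simplification: matches are not restricted to whole words. */
long find_word(const char *text, const char *w, long *comparisons)
{
    size_t n = strlen(text);
    size_t i;

    assert(strlen(w) == 3);
    *comparisons = 0;
    for (i = 0; i + 3 <= n; i++) {
        (*comparisons)++;                     /* one key operation */
        if (strncmp(text + i, w, 3) == 0)
            return (long)i;
    }
    return -1;
}
```

On the text "the cat sat on the mat" (22 characters), searching for "the" stops after 1 comparison, while searching for "zee" examines all 20 windows before returning -1, which is the large-f(n) situation described above.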
The above discussion leads us to the question of finding the complexity function f(n) for certain cases. The cases one usually investigates in complexity theory are:
(1) Worst case: the maximum value of f(n) for any possible input
(2) Average case: the expected value of f(n)
(3) Best case: sometimes we also consider the minimum possible value of f(n)

Food for thought: What is the best case while searching in an array for a specific element?
(a) Element occurs at the end of the array
(b) Element occurs at the beginning of the array
(c) Element occurs at the middlemost position of the array
(d) Best case does not exist
(b) is the correct choice, as only one element has to be compared before the searching procedure can end.

Average case analysis of algorithms: The analysis of the average case assumes a certain probabilistic distribution for the input data. One such assumption might be that all possible permutations of an input data set are equally likely. The average case also uses the following concept from probability theory. Suppose the numbers $n_1, n_2, \ldots, n_k$ occur with respective probabilities $p_1, p_2, \ldots, p_k$. Then the expectation or average value E is given by
$$E = n_1 p_1 + n_2 p_2 + \cdots + n_k p_k$$
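The expectation formula can be sketched as a small C function. The name expectation and the parallel-array interface are assumptions of this sketch; the probabilities are expected to sum to 1, and the function does not check this.

```c
#include <assert.h>

/* Sketch of E = n1*p1 + n2*p2 + ... + nk*pk.
 * values[i] holds n_(i+1) and probs[i] holds p_(i+1). */
double expectation(const double values[], const double probs[], int k)
{
    double e = 0.0;
    int i;

    assert(k >= 1);
    for (i = 0; i < k; i++)
        e += values[i] * probs[i];   /* add the term n_i * p_i */
    return e;
}
```

For linear search over n = 4 positions, each reached after 1, 2, 3 or 4 comparisons with probability 1/4, the function returns 2.5, which equals (n + 1)/2.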
Example (Linear Search)
Suppose the linear array DATA contains n elements, and suppose a specific ITEM of information is given. We want either to find the location LOC of ITEM in the array DATA, or to send some message, such as LOC = 0, to indicate that ITEM does not appear in DATA. The linear search algorithm solves this problem by comparing ITEM, one by one, with each element in DATA. That is, we compare ITEM with DATA[1], then DATA[2], and so on, until we find LOC such that ITEM = DATA[LOC]. A formal representation of the algorithm is as follows:

Algorithm (Linear Search)
A linear array DATA with N elements and a specific ITEM of information are given. The algorithm finds the location LOC of ITEM in the array DATA, or sets LOC = 0.
1. [Initialize.] Set K = 1 and LOC = 0.
2. Repeat Steps 3 and 4 while LOC = 0 and K ≤ N:
3.     If ITEM = DATA[K], then: Set LOC = K.
4.     [Increment counter.] Set K = K + 1.
   [End of Step 2 loop.]
5. [Successful?]
   If LOC = 0, then:
       Write: ITEM is not in the array DATA.
   Else:
       Write: LOC is the location of ITEM.
   [End of if structure.]
6. Exit.

We can find the complexity of the search algorithm by the number C of comparisons between ITEM and DATA[K]. We seek C(n) for the worst case and the average case.

Worst case
Clearly the worst case occurs when ITEM is the last element in the array DATA or is not there at all. In either case, we have C(n) = n.

Average case
Here we assume that ITEM does appear in DATA, and that it is equally likely to occur at any position in the array. Accordingly, the number of comparisons can be any of the numbers 1, 2, ..., n, and each number occurs with probability p = 1/n. Then
$$C(n) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + \cdots + n\cdot\frac{1}{n} = (1 + 2 + \cdots + n)\cdot\frac{1}{n} = \frac{n(n+1)}{2}\cdot\frac{1}{n} = \frac{n+1}{2}$$
This agrees with our intuitive feeling that the average number of comparisons needed to find the location of ITEM is approximately equal to half the number of elements in DATA.
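The Linear Search algorithm translates directly into C. In this sketch (the function name linear_search_count is hypothetical), data[0..n-1] holds DATA[1..N], the returned LOC is 1-based with 0 meaning "not found", and the number C of comparisons between ITEM and DATA[K] is reported so that the worst-case and average-case counts can be checked.

```c
#include <assert.h>

/* Sketch of Algorithm (Linear Search), without a sentinel.
 * Returns LOC (1-based) or 0; *c receives the number of comparisons. */
int linear_search_count(const int data[], int n, int item, int *c)
{
    int k = 1, loc = 0;                /* Step 1: initialize */

    assert(n >= 0);
    *c = 0;
    while (loc == 0 && k <= n) {       /* Step 2 */
        (*c)++;                        /* one comparison of ITEM with DATA[K] */
        if (item == data[k - 1])       /* Step 3 */
            loc = k;
        k = k + 1;                     /* Step 4: increment counter */
    }
    return loc;                        /* Step 5: 0 means unsuccessful */
}
```

Searching the six-element array 4, 8, 15, 16, 23, 42 for 42, or for a value not present, takes C = 6 = n comparisons (the worst case). Searching once for each of the six elements takes 1 + 2 + ... + 6 = 21 comparisons in total, an average of 21/6 = 3.5 = (n + 1)/2.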
Rate of Growth: Big O Notation
Suppose M is an algorithm, and n is the size of the input data. Clearly the complexity f(n) of M increases as n increases. It is usually the rate of increase of f(n) that we want to examine. This is usually done by comparing f(n) with some standard function, such as:
Logarithmic time: $\log_2 n$
Linear time: $n$
Linear cum logarithmic time: $n \log_2 n$
Quadratic time: $n^2$
Cubic time: $n^3$

If f(n) and g(n) are functions defined for positive integers, then to write "f(n) is O(g(n))" means that there exists a constant c such that
$$|f(n)| \le c\,|g(n)|$$
for all sufficiently large positive integers n. Under these conditions we also say that "f(n) has order at most g(n)" or "f(n) grows no more rapidly than g(n)". When we apply this notation, f(n) will normally be the operation count or time for some algorithm, and we wish to choose the form of g(n) to be as simple as possible.

We thus write $O(1)$ to mean computing time that is bounded by a constant, not dependent on n. $O(n)$ means that the time is directly proportional to n, and is called linear time. We call $O(n^2)$ quadratic time, $O(n^3)$ cubic, and $O(2^n)$ exponential. These five orders, together with logarithmic time $O(\log n)$ and $O(n \log n)$, are the ones most commonly used in analyzing algorithms.

Food for thought: What will be the name of the function whose functional form is $2^n$?
(a) Hyperbolic
(b) Parabolic
(c) Exponential
(d) Logarithmic
(c) is the correct choice.

The rates of growth of these standard functions are given in Fig 2-1. To indicate the convenience of this notation, we give the complexity of certain well-known searching and sorting algorithms:
(a) Linear search: $O(n)$
(b) Binary search: $O(\log n)$
(c) Bubble sort: $O(n^2)$
(d) Merge-sort: $O(n \log n)$
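As a worked illustration of the definition (the function f(n) below is a hypothetical operation count, not taken from the text), take f(n) = 3n² + 5n + 2. For n ≥ 1 each lower-order term is at most the corresponding multiple of n², which gives an explicit constant c; the same reasoning shows that the average-case count C(n) = (n + 1)/2 of linear search is O(n).

```latex
f(n) = 3n^2 + 5n + 2 \;\le\; 3n^2 + 5n^2 + 2n^2 \;=\; 10n^2 \qquad (n \ge 1)
\;\Longrightarrow\; |f(n)| \le 10\,|n^2| \;\Longrightarrow\; f(n) = O(n^2), \quad c = 10

C(n) = \frac{n+1}{2} \;\le\; \frac{n+n}{2} \;=\; n \qquad (n \ge 1)
\;\Longrightarrow\; C(n) = O(n), \quad c = 1
```

The constant c and the starting point are not unique: any larger c also satisfies the definition, which is why big O describes only the rate of growth.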
Fig 2-1: Rates of growth of the standard functions $\log n$, $n$, $n \log n$, $n^2$, $n^3$ and $2^n$.
The above table is arranged so that the rate of growth of the functions increases from left to right, with $\log n$ having the lowest rate of growth and $2^n$ having the largest.

Summary
• In order to compare algorithms, we have to find out their efficiency. This helps us employ the right algorithm to solve a problem effectively.
• Suppose M is an algorithm, and n is the size of the input data. Clearly the complexity f(n) of M increases as n increases.
• Big O notation states that if, for a function f(n), there exist a positive integer $n_0$ and a positive number M such that $|f(n)| \le M\,|g(n)|$ for all $n > n_0$, then we may write $f(n) = O(g(n))$.

Zee Interactive Learning Systems

3 ARRAYS
MAIN POINTS COVERED
• INTRODUCTION
• LINEAR ARRAY
• REPRESENTATION OF LINEAR ARRAY IN MEMORY
• TRAVERSING LINEAR ARRAYS
• SORTING, BUBBLE SORT
• SEARCHING, LINEAR SEARCH
• MULTIDIMENSIONAL ARRAYS
• REPRESENTATION OF MULTIDIMENSIONAL ARRAYS IN MEMORY
• SUMMARY

Introduction
Data structures are classified into two broad categories: linear and nonlinear. The most elementary data structure that we will introduce is the array.

LINEAR ARRAY
A linear array is a list of a finite number n of homogeneous data elements (i.e., data elements of the same type) such that:
(a) the elements of the array are referenced respectively by an index set consisting of n consecutive numbers;
(b) the elements of the array are stored respectively in successive memory locations.
The length or size of an array is the number of elements in the array:
Length = UB - LB + 1
where UB is the largest index, called the upper bound, and LB is the smallest index, called the lower bound.

Advantages:
• Arrays have a linear structure.
• They are easy to traverse, search and sort.
• They are easy to implement.
Disadvantage:
• The length of the array cannot be changed once it is specified.

Food for thought: What do you think is the reason for the above disadvantage?
(a) Array size has to be fixed at the beginning
(b) Problem with memory allocation occurs if the size is altered
(c) Once a fixed block of memory has been reserved for a particular array, it cannot be altered
(d) Variable-length arrays are not required in real life
(a) and (c) are the correct choices.

Notation for Representing Arrays
Food for thought: You can represent the elements of an array A by
(a) $A_1, A_2, A_3, \ldots, A_n$
(b) A(1), A(2), A(3), ..., A(N)
(c) A[1], A[2], A[3], ..., A[N]
(d) A{1}, A{2}, A{3}, ..., A{N}
(a), (b) and (c) are the correct choices.

The elements of an array A may be denoted by the subscript notation
$A_1, A_2, A_3, \ldots, A_n$
or by the parentheses notation (used in FORTRAN, PL/1 and BASIC)
A(1), A(2), A(3), ..., A(N)
or by the bracket notation (used in C and Pascal)
A[1], A[2], A[3], ..., A[N]
We will usually use the subscript notation or the bracket notation. Regardless of the notation, the number K in A[K] is called a subscript or an index, and A[K] is called a subscripted variable. Note that subscripts allow any element of A to be referenced by its relative position in A.

REPRESENTATION OF LINEAR ARRAYS IN MEMORY
Let LA be a linear array in the memory of the computer. The memory of the computer is simply a sequence of addressed locations, as shown in Fig 3-1.
Fig 3-1: Memory as a sequence of addressed locations 100, 101, 102, 103, 104, ...
Food for thought: What is the nature of these memory locations?
(a) Linear
(b) Circular
(c) Random
(d) We cannot know about the memory locations
(a) is the correct choice, as seen from Fig 3-1.
We use the notation
LOC(LA[K]) = address of the element LA[K] of the array LA
As previously noted, the elements of LA are stored in successive memory cells. Accordingly, the computer does not need to keep track of the address of every element of LA, but needs to keep track only of the address of the first element of LA, denoted by Base(LA) and called the base address of LA.
Food for thought: Why don't we keep track of all the array elements?
(a) Array elements except the first one are not required
(b) The first element contains information about all the other elements
(c) Knowing the address of the first element and the position of the required element, we can compute where that element is stored
(c) is the correct choice, as explained below.

Using the base address Base(LA), the computer calculates the address of any element of LA by the following formula:
$$\mathrm{LOC}(LA[K]) = \mathrm{Base}(LA) + w\,(K - \text{lower bound})$$
where w is the number of words per memory cell for the array LA. Observe that the time to calculate LOC(LA[K]) is essentially the same for any value of K. Furthermore, given any subscript K, one can locate and access the content of LA[K] without scanning any other element of LA.
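The address formula can be checked against a real C array. In this sketch (element_address is a hypothetical helper), w is taken to be sizeof(element) bytes rather than "words", and the first C element, index 0, plays the role of subscript LB. For instance, with a hypothetical Base(LA) = 200, w = 4 and LB = 1, the formula gives LOC(LA[5]) = 200 + 4(5 - 1) = 216.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of LOC(LA[K]) = Base(LA) + w(K - LB).
 * Assumptions: w = sizeof(element) in bytes; C index 0 holds LA[LB]. */
const char *element_address(const void *base, size_t w, long k, long lb)
{
    assert(k >= lb);
    return (const char *)base + w * (size_t)(k - lb);
}
```

Because the address is computed with one subtraction, one multiplication and one addition, the cost is the same for every K, which is exactly the property noted above.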
[End of Step 2 loop. the choice of the end from which insertion and deletion occurs solely depends on the programmer. In Software 5 Algorithm: (Traversing a Linear Array) Here LA is a linear array with lower bound LB and upper bound UB. This algorithm traverses LA applying an operation PROCESS to each element of LA. 3 [Visit element. which must be initialized before PROCESS is applied to any of the elements in the array. . "Inserting" refers to the operation of adding another element to the collection A.] Set K = LB.] Set K = K + 1. Here is an alternative of the algorithm.] 2. 2 Repeat Steps 3 and 4 while K ≤ UB. Now we present the same algorithm using a different control structure. and "deleting" refers to the operation of removing one of the elements of A. Food for thought: In an array. Algorithm: (Traversing a Linear Array) This algorithm traverses a linear array LA with lower bound LB and upper bound UB. 1 [Initialize counter. Exit.MSc. Exit. many different sorting algorithms. BUBBLE SORT Let A be a list of ‘n’ numbers.13.6. Similarly. Actually. If X is the value of the next test.6 4. In Software 6 We can easily insert an element at the "end" of a linear array provided the memory space allocated for the array is large enough to accommodate the additional element. if Y is the value of the subsequent test. then we simply assign TEST[5] = Y to add Y to the list. Similarly.MSc. Sorting A refers to the operation of rearranging the elements of A so they are in increasing order. Example Suppose TEST has been declared to be a 5-element array but data have been recorded only for TEST[1]. there are many. Now. then one simply assigns TEST [4] = X to add X to the list. suppose A originally is the list After sorting. SORTING. sorting efficiently may be quite complicated.13.19. so that A [1] < A [2] < A [3] < … < A [N] For example. i. In fact. deleting an element at the "end" of an array presents no difficulties.4.23 Sorting may seem to be an easy task. 
Inserting and Deleting
Let A be a collection of data elements in the memory of the computer. "Inserting" refers to the operation of adding another element to the collection A, and "deleting" refers to the operation of removing one of the elements of A. This section discusses inserting and deleting when A is a linear array.

Food for thought: In an array, from which end does insertion and deletion take place?
(a) Beginning of the array
(b) End of the array
(c) Middle of the array
(d) Beginning and end of the array
(c) and (d) are the correct choices: insertion and deletion may take place at any position, and the choice depends solely on the programmer. As the following discussion shows, however, the cost depends on the position.

We can easily insert an element at the "end" of a linear array, provided the memory space allocated for the array is large enough to accommodate the additional element.

Example
Suppose TEST has been declared to be a 5-element array, but data have been recorded only for TEST[1], TEST[2] and TEST[3]. If X is the value of the next test, then we simply assign TEST[4] = X to add X to the list. Similarly, if Y is the value of the subsequent test, then we simply assign TEST[5] = Y to add Y to the list. Now, however, we cannot add any new test scores to the list, since all five locations are in use.

On the other hand, if we need to insert an element in the middle of the array, then, on the average, half of the elements must be moved downward to new locations to accommodate the new element and keep the order of the other elements.

Similarly, deleting an element at the "end" of an array presents no difficulties, but deleting an element somewhere in the middle of the array requires each subsequent element to be moved one location upward in order to "fill up" the array.
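The element moves described above can be sketched in C. The helpers insert_at and delete_at are hypothetical; the array occupies a[0..n-1] with room for capacity elements, positions are 1-based as in the text, and each function returns 0 when the operation is impossible (for example, when the array is already full, as with TEST after TEST[5] is filled).

```c
#include <assert.h>
#include <stddef.h>

/* Insert item at 1-based position k of a[0..*n-1]; returns 1 on success. */
int insert_at(int a[], int *n, int capacity, int k, int item)
{
    int j;

    assert(a != NULL && n != NULL);
    if (*n >= capacity || k < 1 || k > *n + 1)
        return 0;                       /* no room, or invalid position */
    for (j = *n; j >= k; j--)           /* move A[K..N] one place downward */
        a[j] = a[j - 1];
    a[k - 1] = item;
    (*n)++;
    return 1;
}

/* Delete the element at 1-based position k, storing it in *item. */
int delete_at(int a[], int *n, int k, int *item)
{
    int j;

    assert(a != NULL && n != NULL && item != NULL);
    if (k < 1 || k > *n)
        return 0;                       /* invalid position */
    *item = a[k - 1];
    for (j = k; j < *n; j++)            /* move A[K+1..N] one place upward */
        a[j - 1] = a[j];
    (*n)--;
    return 1;
}
```

Inserting at position n + 1 moves nothing, while inserting at position 1 moves all n elements; this is why insertion and deletion at the end are cheap, while those in the middle cost time proportional to the number of elements moved.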
SORTING, BUBBLE SORT
Let A be a list of n numbers. Sorting A refers to the operation of rearranging the elements of A so that they are in increasing order, that is, so that
A[1] ≤ A[2] ≤ A[3] ≤ ... ≤ A[N]
Sorting may seem to be an easy task; however, sorting efficiently may be quite complicated. In fact, there are many, many different sorting algorithms. Here we present and discuss a very simple sorting algorithm known as the bubble sort.

Food for thought: On which type of data can sorting take place?
(a) Numeric data
(b) Non-numeric data
(c) Symbolic data
(d) Binary data
(a) and (b) are the correct choices.

Remark: The above definition of sorting refers to arranging numerical data in increasing order; this restriction is only for notational convenience. Clearly, sorting may also mean arranging numerical data in decreasing order or arranging non-numerical data in alphabetical order. Actually, A is frequently a file of records, and sorting A refers to rearranging the records of A so that the values of a given key are ordered.

Bubble Sort
Suppose the list of numbers A[1], A[2], ..., A[N] is in memory. The bubble sort algorithm works as follows.
Step 1. Compare A[1] and A[2] and arrange them in the desired order, so that A[1] ≤ A[2]. Then compare A[2] and A[3] and arrange them so that A[2] ≤ A[3]. Then compare A[3] and A[4] and arrange them so that A[3] ≤ A[4]. Continue until we compare A[N-1] with A[N] and arrange them so that A[N-1] ≤ A[N].
    Observe that Step 1 involves n - 1 comparisons. (During Step 1, the largest element is "bubbled up" to the nth position, or "sinks" to the nth position.) When Step 1 is completed, A[N] will contain the largest element.
Step 2. Repeat Step 1 with one less comparison; that is, now we stop after we compare and possibly rearrange A[N-2] and A[N-1]. (Step 2 involves n - 2 comparisons and, when Step 2 is completed, the second largest element will occupy A[N-1].)
Step 3. Repeat Step 1 with two fewer comparisons; that is, we stop after comparing and rearranging A[N-3] and A[N-2].
    ...
Step N-1. Compare A[1] with A[2] and arrange them so that A[1] ≤ A[2].
After n - 1 steps, the list will be sorted in increasing order.

The process of sequentially traversing all or part of a list is frequently called a "pass", so each of the above steps is called a pass. Accordingly, the bubble sort algorithm requires n - 1 passes, where n is the number of input items.
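A minimal C sketch of the bubble sort steps above (the function name is illustrative): pass p performs n - p comparisons of neighbouring elements and swaps any pair that is out of order, so after pass p the p largest elements occupy the last p positions. Indices are 0-based, so A[K] is a[K - 1].

```c
#include <assert.h>

/* Sketch of bubble sort: n - 1 passes; pass p compares a[0],a[1] up to
 * a[n-p-1],a[n-p] and swaps neighbours that are out of order. */
void bubble_sort(int a[], int n)
{
    int pass, j, tmp;

    assert(n >= 0);
    for (pass = 1; pass <= n - 1; pass++) {      /* Steps 1 .. N-1 */
        for (j = 0; j < n - pass; j++) {         /* n - pass comparisons */
            if (a[j] > a[j + 1]) {               /* out of order: swap */
                tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
}
```

For example, sorting 9, 4, 7, 1, 8, 2 with n = 6 performs 5 passes and 5 + 4 + 3 + 2 + 1 = 15 comparisons, giving 1, 2, 4, 7, 8, 9. In general the n(n - 1)/2 comparisons are why bubble sort is listed as O(n²).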
SEARCHING, LINEAR SEARCH
Consider a collection of data in memory, and let DATA represent that collection. Suppose we have been given a specific ITEM of information to search for. Searching refers to the operation of finding the location LOC of ITEM in DATA, or printing some message that ITEM does not appear there. The search is said to be successful if ITEM does appear in DATA and unsuccessful otherwise.
Frequently, we may want to add the element ITEM to DATA after an unsuccessful search for ITEM in DATA. One then uses a search and insertion algorithm, rather than simply a search algorithm.
The complexity of searching algorithms is measured in terms of the number f(n) of comparisons required to find ITEM in DATA, where DATA contains n elements.

Food for thought: What is the time complexity of the linear search algorithm?
(a) Linear
(b) Logarithmic
(c) Exponential
(d) Quadratic
(a) is the correct choice; we shall show that linear search is a linear time algorithm.

Linear Search
Suppose DATA is a linear array with n elements, and we have not been given any other information about DATA. The most intuitive way to search for a given ITEM in DATA is to compare ITEM with each element of DATA one by one. That is, first we test whether DATA[1] = ITEM, then we test whether DATA[2] = ITEM, and so on. This method, which traverses DATA sequentially to locate ITEM, is called linear search or sequential search.
To simplify the matter, we first assign ITEM to DATA[N + 1], the position following the last element of DATA. Then the outcome
LOC = N + 1,
where LOC denotes the location where ITEM first occurs in DATA, signifies that the search is unsuccessful. The purpose of this initial assignment is to avoid repeatedly testing whether we have reached the end of the array DATA. This way, the search must eventually "succeed".

Algorithm (Linear Search) LINEAR(DATA, N, ITEM, LOC)
Here DATA is a linear array with N elements and ITEM is a given item of information. This algorithm finds the location LOC of ITEM in DATA, or sets LOC = 0 if the search is unsuccessful.
1. [Insert ITEM at the end of DATA.] Set DATA[N + 1] = ITEM.
2. [Initialize counter.] Set LOC = 1.
3. [Search for ITEM.]
   Repeat while DATA[LOC] ≠ ITEM:
       Set LOC = LOC + 1.
   [End of loop.]
4. [Successful?] If LOC = N + 1, then: Set LOC = 0.
5. Exit.

Observe that Step 1 guarantees that the loop in Step 3 must terminate. Without Step 1, the Repeat statement in Step 3 must be replaced by the following statement, which involves two comparisons, not one:
   Repeat while LOC ≤ N and DATA[LOC] ≠ ITEM:
On the other hand, in order to use Step 1, one must guarantee that there is an unused memory location, DATA[N + 1], at the end of the array DATA.
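A C sketch of LINEAR(DATA, N, ITEM, LOC) with the sentinel of Step 1. To keep the text's 1-based subscripts, this sketch (the function name linear_search is an assumption) stores DATA[1..N] in data[1..n], leaves data[0] unused, and requires the caller to provide the unused slot data[n + 1]; note that the sentinel overwrites that slot.

```c
#include <assert.h>

/* Sketch of LINEAR(DATA, N, ITEM, LOC) with a sentinel.
 * Assumptions: DATA[1..N] is in data[1..n]; data[n + 1] exists and is unused.
 * Returns LOC, or 0 if the search is unsuccessful. */
int linear_search(int data[], int n, int item)
{
    int loc;

    assert(n >= 0);
    data[n + 1] = item;            /* Step 1: insert ITEM at the end of DATA */
    loc = 1;                       /* Step 2: initialize counter */
    while (data[loc] != item)      /* Step 3: one comparison per element */
        loc = loc + 1;
    if (loc == n + 1)              /* Step 4: only the sentinel matched */
        loc = 0;
    return loc;
}
```

The loop contains a single comparison, DATA[LOC] ≠ ITEM, instead of the two comparisons LOC ≤ N and DATA[LOC] ≠ ITEM needed without the sentinel.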
Complexity of the Linear Search Algorithm
Food for thought: What is/are the factors on which the complexity of a given algorithm depends?
(a) Number of steps
(b) Number of comparisons
(c) Number of arithmetic operations
(d) Memory space occupied by the algorithm
(a) and (c) are the correct choices.

We know that the complexity of our search algorithm is measured by the number f(n) of comparisons required to find ITEM in DATA, where DATA contains n elements. Two important cases to consider are the worst case and the average case.

Clearly the worst case occurs when we have to search through the entire array DATA, i.e., when ITEM does not appear in DATA. In this case, the algorithm requires f(n) = n + 1 comparisons (the last one with the copy of ITEM in DATA[N + 1]). Thus, in the worst case, the running time is proportional to n.

The running time of the average case uses the probabilistic notion of expectation. Suppose $p_k$ is the probability that ITEM appears in DATA[K], and suppose q is the probability that ITEM does not appear in DATA. (Then $p_1 + p_2 + \cdots + p_n + q = 1$.) Since the algorithm uses K comparisons when ITEM appears in DATA[K], the average number of comparisons is given by
$$f(n) = 1\cdot p_1 + 2\cdot p_2 + \cdots + n\cdot p_n + (n+1)\cdot q$$
In particular, suppose q is very small (so that we may take q = 0) and ITEM appears with equal probability in each element of DATA, so that each $p_k = 1/n$. Accordingly,
$$f(n) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + \cdots + n\cdot\frac{1}{n} + (n+1)\cdot 0 = (1 + 2 + \cdots + n)\cdot\frac{1}{n} = \frac{n(n+1)}{2}\cdot\frac{1}{n} = \frac{n+1}{2}$$
That is, in this special case, the average number of comparisons required to find the location of ITEM is approximately equal to half the number of elements in the array.

MULTIDIMENSIONAL ARRAYS
The linear arrays discussed so far are also called one-dimensional arrays, since each element in the array is referenced by a single subscript. Most programming languages allow two-dimensional and three-dimensional arrays, i.e., arrays whose elements are referenced, respectively, by two and three subscripts. In fact, some programming languages allow the number of dimensions for an array to be as high as 7.

Food for thought: Which of the following have to be represented on the computer by multidimensional arrays?
Chess board: yes
Sales figures per year of a certain firm: no
Co-ordinates in mathematics: yes
Sales figures per year and firm name: yes (one axis represents the year and the other the name of the firm)
Matrix: yes (the most obvious example)
Two-Dimensional Arrays
A two-dimensional m × n array A is a collection of m · n data elements such that each element is specified by a pair of integers (such as j, k), called subscripts, with the property that
$$1 \le j \le m \quad\text{and}\quad 1 \le k \le n$$
The element of A with first subscript j and second subscript k will be denoted by $A_{j,k}$ or A[j, k]. Two-dimensional arrays are called matrices in mathematics and tables in business applications; hence, two-dimensional arrays are sometimes called matrix arrays.
There is a standard way of drawing a two-dimensional m × n array A, where the elements of A form a rectangular array with m rows and n columns, and where the element A[j, k] appears in row j and column k. (A row is a horizontal list of elements, and a column is a vertical list of elements.) Fig 3-2 shows the case where A has 3 rows and 4 columns. We emphasize that each row contains those elements with the same first subscript, and each column contains those elements with the same second subscript.

Fig 3-2: Two-dimensional 3 × 4 array A
          Column 1   Column 2   Column 3   Column 4
Row 1     A[1,1]     A[1,2]     A[1,3]     A[1,4]
Row 2     A[2,1]     A[2,2]     A[2,3]     A[2,4]
Row 3     A[3,1]     A[3,2]     A[3,3]     A[3,4]

Suppose A is a two-dimensional m × n array. The first dimension of A contains the index set 1, 2, ..., m, with lower bound 1 and upper bound m; the second dimension of A contains the index set 1, 2, ..., n, with lower bound 1 and upper bound n. The length of a dimension is the number of integers in its index set. The pair of lengths m × n (read "m by n") is called the size of the array.
Some programming languages allow you to define multidimensional arrays in which the lower bounds are not 1 (such arrays are sometimes called nonregular). However, the index set for each dimension still consists of the consecutive integers from the lower bound to the upper bound of the dimension. The length of a given dimension (i.e., the number of integers in its index set) can be obtained from the formula
Length = upper bound - lower bound + 1

Representation of Two-Dimensional Arrays in Memory
Let A be a two-dimensional m × n array. Although we represent A as a rectangular array of elements with m rows and n columns, the array will be represented in memory by a block of m · n sequential memory locations. Specifically, the programming language will store the array A either (1) column by column, which is called column-major order, or (2) row by row, which is called row-major order. Fig 3-4 shows these two ways when A is a two-dimensional 3 × 4 array. We emphasize that the particular representation used depends upon the programming language, not on the user.

Fig 3-4: Storage of a 3 × 4 array A
Column-major order (Column 1 | Column 2 | Column 3 | Column 4):
(1,1) (2,1) (3,1) | (1,2) (2,2) (3,2) | (1,3) (2,3) (3,3) | (1,4) (2,4) (3,4)
Row-major order (Row 1 | Row 2 | Row 3):
(1,1) (1,2) (1,3) (1,4) | (2,1) (2,2) (2,3) (2,4) | (3,1) (3,2) (3,3) (3,4)

Recall that, for a linear array LA, the computer does not keep track of the address LOC(LA[K]) of every element LA[K] of LA, but does keep track of Base(LA), the address of the first element of LA. The computer uses the formula
LOC(LA[K]) = Base(LA) + w(K - 1)
to find the address of LA[K] in time independent of K. (Here w is the number of words per memory cell for the array LA, and 1 is the lower bound of the index set of LA.)
A similar situation also holds for any two-dimensional m × n array A. That is, the computer keeps track of Base(A), the address of the first element A[1, 1] of A, and computes the address LOC(A[j, k]) of A[j, k] using the formula
(Column-major order) LOC(A[j, k]) = Base(A) + w[M(k - 1) + (j - 1)]
or the formula
(Row-major order) LOC(A[j, k]) = Base(A) + w[N(j - 1) + (k - 1)]
Again, w denotes the number of words per memory location for the array A, M = m is the number of rows, and N = n is the number of columns. Note that the formulas are linear in j and k, and that one can find the address LOC(A[j, k]) in time independent of j and k.

Example
Consider the 35 × 4 matrix array SCORE. Suppose Base(SCORE) = 200 and there are w = 4 words per memory cell. Furthermore, suppose the programming language stores two-dimensional arrays using row-major order. Then the address of SCORE[12, 3], the third test of the twelfth student, is as follows:
LOC(SCORE[12, 3]) = 200 + 4[4(12 - 1) + (3 - 1)] = 200 + 4[46] = 384

General Multidimensional Arrays
General multidimensional arrays are defined analogously. More specifically, an n-dimensional $m_1 \times m_2 \times \cdots \times m_n$ array B is a collection of $m_1 \cdot m_2 \cdots m_n$ data elements in which each element is specified by a list of n integers, such as $K_1, K_2, \ldots, K_n$, called subscripts, with the property that
$$1 \le K_1 \le m_1,\quad 1 \le K_2 \le m_2,\quad \ldots,\quad 1 \le K_n \le m_n$$
The element of B with subscripts $K_1, K_2, \ldots, K_n$ will be denoted by $B_{K_1, K_2, \ldots, K_n}$ or B[K1, K2, ..., Kn].
The array will be stored in memory in a sequence of memory locations. Specifically, the programming language will store array B either in row-major order or in column-major order. By row-major order, we mean that the elements are listed so that the subscripts vary like an automobile odometer, i.e., the last subscript varies first (most rapidly), the next-to-last subscript varies second (less rapidly), and so on. By column-major order, we mean that the elements are listed so that the first subscript varies first (most rapidly), the second subscript varies second (less rapidly), and so on.
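The two-dimensional address formulas above can be written as C functions (the names loc_row_major and loc_column_major are hypothetical, and addresses are treated as plain numbers). C itself stores two-dimensional arrays in row-major order, so the row-major formula, with 0 as the base and w = sizeof(int), predicts the byte offset of an element of an int array.

```c
#include <assert.h>

/* Column-major: LOC(A[j,k]) = Base + w[M(k-1) + (j-1)], M = number of rows. */
long loc_column_major(long base, long w, long m, long j, long k)
{
    assert(j >= 1 && j <= m && k >= 1);
    return base + w * (m * (k - 1) + (j - 1));
}

/* Row-major: LOC(A[j,k]) = Base + w[N(j-1) + (k-1)], N = number of columns. */
long loc_row_major(long base, long w, long n, long j, long k)
{
    assert(j >= 1 && k >= 1 && k <= n);
    return base + w * (n * (j - 1) + (k - 1));
}
```

For the SCORE example, loc_row_major(200, 4, 4, 12, 3) returns 384; if SCORE were stored in column-major order instead, loc_column_major(200, 4, 35, 12, 3) would give 200 + 4[35(3 - 1) + (12 - 1)] = 524.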
Summary
In this module we studied the following:
• A linear array is a list of a finite number n of homogeneous data elements, i.e., data elements of the same type.
• Let A be a collection of data elements stored in the memory of the computer. Suppose we want to print the contents of each element of A, or to count the number of elements of A with a given property. We can accomplish this by traversing A, that is, by accessing and processing (frequently called visiting) each element of A exactly once.
• "Inserting" refers to the operation of adding another element to a collection A of data elements in memory, and "deleting" refers to the operation of removing one of the elements of A.
• Sorting means arranging numerical data in increasing or decreasing order, or arranging non-numerical data alphabetically.
• Searching refers to the operation of finding the location LOC of ITEM in DATA, or printing some message that ITEM does not appear there.
• Linear arrays are called one-dimensional arrays, since each element in the array is referenced by a single subscript. Most programming languages allow two-dimensional and three-dimensional arrays, i.e., arrays whose elements are referenced, respectively, by two and three subscripts. In fact, some programming languages allow the number of dimensions for an array to be as high as 7.

4 LINKED LISTS
MAIN POINTS COVERED
• Introduction
• Traversing a linked list
• Searching a linked list
• Memory allocation, garbage collection
• Insertion into a linked list
• Deletion from a linked list
• Header linked lists
• Two-way lists
• Binary trees
• Representing binary trees in memory
• Traversing binary trees
• Traversal algorithms using stacks
• Binary search trees
• Searching and inserting in binary search trees
• Deleting in binary search trees
• Heap; Heapsort
• Summary

Introduction
A linked list is a linear collection of data elements, called nodes, where the linear order is maintained by pointers. We divide each node into two parts:
• the information of the element;
• the link field or next-pointer field, containing the address of the next node in the list.
TRAVERSING A LINKED LIST
Let LIST be a linked list in memory. LIST requires two linear arrays, e.g., INFO and LINK, such that INFO[K] and LINK[K] contain the information part and the nextpointer field of a node of LIST, respectively. Let the pointer START point to the first element and NULL indicate the end of LIST. This section presents an algorithm that traverses LIST to process each node exactly once. You can use this algorithm in future applications.

Our traversing algorithm uses a pointer variable PTR that points to the node that is currently being processed. Accordingly, PTR->LINK points to the next node to be processed. Thus the assignment PTR = PTR->LINK moves the pointer to the next node in the list, as pictured in Figure 4-2.
Fig 4-2
Here we have initialized PTR to START. Then we processed INFO[PTR], the information at the first node, and updated PTR by the assignment PTR = PTR->LINK, so that PTR points to the second node. Then we processed INFO[PTR], the information at the second node, again updated PTR by the assignment PTR = PTR->LINK, and then processed INFO[PTR], the information at the third node. And so on. We continued until we reached PTR = NULL, which signals the end of the list. A formal presentation of the algorithm is as follows:

Algorithm: (Traversing a Linked List) Let LIST be a linked list in the memory. This algorithm traverses LIST, applying an operation PROCESS to each element of LIST. The variable PTR points to the node currently being processed.
1. Set PTR = START. [Initializes pointer PTR.]
2. Repeat Steps 3 and 4 while PTR ≠ NULL.
3. Apply PROCESS to PTR->INFO.
4. Set PTR = PTR->LINK. [PTR now points to the next node.]
[End of Step 2 loop.]
5. Exit.
Example 1
The following procedure prints the information at each node of a linked list. Since the procedure must traverse the list, it will be very similar to the Algorithm above.

Procedure: PRINT(INFO, LINK, START)
This procedure prints the information at each node of the list.
1. Set PTR = START.
2. Repeat Steps 3 and 4 while PTR ≠ NULL:
3. Write INFO[PTR].
4. Set PTR = PTR->LINK. [Updates pointer.]
[End of Step 2 loop.]
5. Return.

In other words, the procedure may be obtained by substituting the statement Write INFO[PTR] for the processing step in the above Algorithm.

Consider this procedure to find the number NUM of elements in a linked list. Here, we require an initialization step for the variable NUM before traversing the list.

Procedure: COUNT(INFO, LINK, START, NUM)
1. Set NUM = 0. [Initializes counter.]
2. Set PTR = START. [Initializes pointer.]
3. Repeat Steps 4 and 5 while PTR ≠ NULL.
4. Set NUM = NUM + 1. [Increases NUM by 1.]
5. Set PTR = PTR->LINK. [Updates pointer.]
[End of Step 3 loop.]
6. Return.

We can observe that the procedure traverses the linked list in order to count the number of elements. Hence the procedure is very similar to the above traversing algorithm.
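The PRINT and COUNT procedures above can be sketched in C as follows. This is a minimal sketch, assuming the pointer-type representation in which each node is a C struct; the struct node type and the function names print_list and count_list are illustrative choices, not part of the original procedures. NULL plays the role of the null pointer, and ptr->link corresponds to PTR->LINK.

```c
#include <stdio.h>
#include <stddef.h>

/* One node of a singly linked list: INFO part plus LINK (nextpointer) field. */
struct node {
    int info;
    struct node *link;
};

/* PRINT: visit every node exactly once, writing its INFO field. */
void print_list(const struct node *start)
{
    const struct node *ptr = start;      /* Step 1: PTR = START */
    while (ptr != NULL) {                /* Step 2: repeat while PTR != NULL */
        printf("%d\n", ptr->info);       /* Step 3: Write INFO[PTR] */
        ptr = ptr->link;                 /* Step 4: PTR = PTR->LINK */
    }
}

/* COUNT: the same traversal, but the processing step increments NUM. */
int count_list(const struct node *start)
{
    int num = 0;                         /* initializes counter */
    for (const struct node *ptr = start; ptr != NULL; ptr = ptr->link)
        num = num + 1;                   /* increases NUM by 1 */
    return num;
}
```

For example, with three nodes holding 10, 20 and 30 linked in that order, count_list returns 3, and count_list(NULL) returns 0 for the empty list.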
SEARCHING A LINKED LIST
Let LIST be a linked list in memory. Let us now try to search for an ITEM in the LIST.

Case 1) Unsorted LIST
Suppose the data in LIST is not sorted. We can still search for ITEM in LIST by traversing through the list using a pointer variable PTR and comparing ITEM with the contents INFO[PTR] of each node of LIST, one by one. Before we update the pointer PTR by PTR = PTR->LINK, we require two tests. First we have to check whether we have reached the end of the list, i.e., whether PTR == NULL. If not, we check whether INFO[PTR] == ITEM. The two tests cannot be performed at the same time, since INFO[PTR] is not defined when PTR == NULL. Accordingly, we use the first test to control the execution of a loop, and we let the second test take place inside the loop. The algorithm is as follows:

Algorithm: SEARCH(INFO, LINK, START, ITEM, LOC)
LIST is a linked list in memory. This algorithm finds the location LOC of the node where ITEM first appears in LIST, or sets LOC = NULL.
1. Set PTR = START.
2. Repeat Step 3 while PTR ≠ NULL:
3. If ITEM == INFO[PTR], then:
   Set LOC = PTR, and Exit.
   Else:
   Set PTR = PTR->LINK. [PTR now points to the next node.]
   [End of If structure.]
   [End of Step 2 loop.]
4. Set LOC = NULL. [Search is unsuccessful.]
5. Exit.

The complexity of this algorithm is the same as that of other linear search algorithms: the worst-case running time is proportional to the number n of elements in LIST, and the average-case running time is approximately proportional to n/2.

Case 2) Sorted LIST
Suppose the data in LIST is sorted (in increasing order). Again we search for ITEM in LIST by traversing the list using a pointer variable PTR and comparing ITEM with the contents INFO[PTR] of each node of LIST, one by one. Now, however, we can stop once INFO[PTR] exceeds ITEM.

Algorithm: SEARCH(INFO, LINK, START, ITEM, LOC)
LIST is a sorted list in memory. This algorithm finds the location LOC of the node where ITEM first appears in LIST, or sets LOC = NULL.
1. Set PTR = START.
2. Repeat Step 3 while PTR ≠ NULL:
3. If INFO[PTR] < ITEM, then:
   Set PTR = PTR->LINK. [PTR now points to the next node.]
   Else if ITEM == INFO[PTR], then:
   Set LOC = PTR, and Exit. [Search is successful.]
   Else:
   Set LOC = NULL, and Exit. [INFO[PTR] now exceeds ITEM.]
   [End of If structure.]
   [End of Step 2 loop.]
4. Set LOC = NULL.
5. Exit.

Drawback of linked list as a data structure
You know that with a sorted linear array we can apply a binary search, whose running time is proportional to log2 n. On the other hand, a binary search algorithm cannot be applied to a sorted linked list, since there is no way of indexing the middle element in the list.
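A minimal C sketch of the two SEARCH algorithms, assuming the same illustrative struct node type (INFO as int info, LINK as a pointer link) and, for the second function, a list sorted in increasing order. The names search and search_sorted are hypothetical; each returns the node location LOC, or NULL when ITEM is not found.

```c
#include <stddef.h>

struct node {
    int info;
    struct node *link;
};

/* SEARCH (unsorted LIST): location of the first node containing item, or NULL. */
struct node *search(struct node *start, int item)
{
    struct node *ptr = start;
    while (ptr != NULL) {               /* first test controls the loop */
        if (ptr->info == item)          /* second test runs inside the loop */
            return ptr;                 /* LOC = PTR: search is successful */
        ptr = ptr->link;
    }
    return NULL;                        /* LOC = NULL: search is unsuccessful */
}

/* SEARCH (LIST sorted in increasing order): stop once INFO[PTR] exceeds item. */
struct node *search_sorted(struct node *start, int item)
{
    struct node *ptr = start;
    while (ptr != NULL) {
        if (ptr->info < item)
            ptr = ptr->link;            /* keep scanning */
        else if (ptr->info == item)
            return ptr;                 /* found */
        else
            return NULL;                /* INFO[PTR] > item: item cannot appear later */
    }
    return NULL;
}
```

In search, the while condition is the first test (PTR ≠ NULL) and the if statement is the second test, so ptr->info is never read through a null pointer.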
Free pool
Together with the linked lists in memory, a special list is maintained which consists of unused memory cells. This list, which has its own pointer, is called the list of available space, the free-storage list or the free pool. Suppose we implement linked lists by parallel arrays and insertions and deletions are to be performed on our linked lists. Then the unused memory cells in the arrays will also be linked together to form a linked list using AVAIL as its list pointer variable. Hence this free-storage list will also be called the AVAIL list. Such a data structure will frequently be denoted by writing LIST(INFO, LINK, START, AVAIL).

In the pointer-type representation of linked lists, the language provides facilities for returning memory that is no longer in use. In C the function free returns memory that has been obtained by a call to malloc or calloc.

Syntax of malloc: void *malloc(size_t size)
malloc returns a pointer to space for an object of size size, or NULL if the request cannot be satisfied. The space is uninitialized.

Syntax of calloc: void *calloc(size_t nobj, size_t size)
calloc returns a pointer to space for an array of nobj objects, each of size size, or NULL if the request cannot be satisfied. The space is initialized to zero bytes.

Syntax of free: void free(void *p)
free de-allocates the space pointed to by p. It does nothing if p is NULL. p must be a pointer to space previously allocated by calloc or malloc.

Note: More examples on memory allocation are given on the web. Please refer to the site zeelearn.com
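To make the AVAIL list concrete, here is a small sketch of the parallel-array representation LIST(INFO, LINK, START, AVAIL) in C. The capacity SIZE, the marker NIL = -1 standing in for the null pointer, and the helper names init_pool, get_node and free_node are assumptions made for illustration. get_node removes the first cell from the AVAIL list (reporting OVERFLOW when AVAIL is NIL), and free_node returns a cell to the beginning of the AVAIL list.

```c
#include <stdio.h>

#define SIZE 8      /* hypothetical capacity of the parallel arrays */
#define NIL  (-1)   /* plays the role of the null pointer */

int INFO[SIZE];     /* information parts */
int LINK[SIZE];     /* nextpointer fields */
int START = NIL;    /* list pointer variable */
int AVAIL = NIL;    /* free-storage list pointer variable */

/* Link every cell into the AVAIL list: 0 -> 1 -> ... -> SIZE-1 -> NIL. */
void init_pool(void)
{
    for (int k = 0; k < SIZE - 1; k++)
        LINK[k] = k + 1;
    LINK[SIZE - 1] = NIL;
    AVAIL = 0;
    START = NIL;    /* the list itself starts out empty */
}

/* Remove the first cell from the AVAIL list; return NIL on OVERFLOW. */
int get_node(void)
{
    if (AVAIL == NIL) {
        printf("OVERFLOW\n");
        return NIL;
    }
    int new_loc = AVAIL;        /* NEW = AVAIL */
    AVAIL = LINK[AVAIL];        /* AVAIL = AVAIL->LINK */
    return new_loc;
}

/* Return cell loc to the beginning of the AVAIL list. */
void free_node(int loc)
{
    LINK[loc] = AVAIL;
    AVAIL = loc;
}
```

In the pointer-type representation, malloc and free take over the roles of get_node and free_node, with a NULL result from malloc corresponding to overflow.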
Garbage Collection
The operating system of a computer may periodically collect all the deleted space onto the free-storage list. Any technique which does this collection is called garbage collection. Garbage collection usually takes place in two steps. First the computer runs through all lists, tagging those cells which are currently in use, and then the computer runs through the memory, collecting all untagged space onto the free-storage list. The garbage collection may take place when there is only some minimum amount of space or no space at all left in the free-storage list, or when the CPU is idle and has time to do the collection. Generally speaking, the garbage collection is invisible to the programmer.

Overflow and Underflow
Sometimes we insert new data into a data structure but there is no available space, i.e., the free-storage list is empty. This situation is usually called overflow. In such cases, we need to modify the program by adding space to the underlying arrays. Notice that overflow will occur with our linked lists when AVAIL = NULL and there is an insertion. Similarly, the term underflow refers to the situation where we want to delete data from a data structure that is empty. Observe that underflow will occur with our linked lists when START = NULL and there is a deletion. We can handle underflow by printing the message UNDERFLOW.

INSERTION INTO A LINKED LIST
Let LIST be a linked list with successive nodes A and B, as pictured in Fig. 4-3(a). Suppose we want to insert a node N into the list between nodes A and B. We have shown the insertion in Fig. 4-3(b): node A now points to the new node N, and node N points to node B, to which A previously pointed.
Fig. 4-3
Suppose our linked list is maintained in the memory in the form LIST(INFO, LINK, START, AVAIL). In the above discussion we did not consider the AVAIL list for the new node N. Let us consider that the first node in the AVAIL list will be used for the new node N.
Thus the above figure looks like Figure 4-4. Observe that three pointer fields are changed as follows:
(1) The nextpointer field of node A now points to the new node N, to which AVAIL previously pointed.
(2) AVAIL now points to the second node in the free pool, to which node N previously pointed.
(3) The nextpointer field of node N now points to node B, to which node A previously pointed.
Fig. 4-4
There are two special cases: if the new node N is the first node in the list, then START will point to N; and if the new node N is the last node in the list, then N will contain the null pointer.

Insertion Algorithms
Algorithms which insert nodes into linked lists come up in various situations. We will discuss three of them here:
1) Inserting a node at the beginning of the list
2) Inserting a node after the node with a given location
3) Inserting a node into a sorted list
All our algorithms assume that the linked list is in the memory in the form LIST(INFO, LINK, START, AVAIL) and that the variable ITEM contains new information to be added to the list. Since our insertion algorithms will use a node in the AVAIL list, all the algorithms will include the following steps:
(a) Checking to see if space is available in the AVAIL list. If AVAIL is NULL, then the algorithm will print the message OVERFLOW.
(b) Removing the first node from the AVAIL list. Using the variable NEW to keep track of the location of the new node, this step can be implemented by the pair of assignments (in this order)
NEW = AVAIL, AVAIL = AVAIL->LINK
(c) Copying new information into the new node. In other words,
INFO[NEW] = ITEM
The schematic diagram of the latter two steps is given in Fig. 4-5.
Fig. 4-5

Inserting at the Beginning of a List
Suppose we have not sorted our linked list and there is no reason to insert a new node in any special place in the list. Then the easiest place to insert the node is at the beginning of the list. Following is such an algorithm:

Algorithm: INSFIRST(INFO, LINK, START, AVAIL, ITEM)
This algorithm inserts ITEM as the first node in the list.
1. [OVERFLOW?] If AVAIL == NULL, then Write OVERFLOW, and Exit.
2. [Remove first node from AVAIL list.] Set NEW = AVAIL and AVAIL = AVAIL->LINK.
3. Set INFO[NEW] = ITEM. [Copies new data into new node.]
4. Set NEW->LINK = START. [New node now points to original first node.]
5. Set START = NEW. [Changes START so it points to the new node.]
6. Exit.
Fig. 4-6

Inserting after a Given Node
Suppose we have been given a value of LOC where either LOC is the location of a node A in a linked LIST or LOC = NULL. The following is an algorithm which inserts ITEM into LIST so that ITEM follows node A or, when LOC = NULL, so that ITEM is the first node. Let N denote the new node (whose location is NEW). If LOC = NULL, then N is inserted as the first node in LIST, as in algorithm INSFIRST. Otherwise, as pictured in Fig. 4-4, we let node N point to node B (which originally followed node A) by the assignment
NEW->LINK = LOC->LINK
and we let node A point to the new node N by the assignment
LOC->LINK = NEW
A formal statement of the algorithm is as follows:

Algorithm: INSLOC(INFO, LINK, START, AVAIL, LOC, ITEM)
This algorithm inserts ITEM so that ITEM follows the node with location LOC, or inserts ITEM as the first node when LOC = NULL.
1. [OVERFLOW?] If AVAIL == NULL, then Write OVERFLOW, and Exit.
2. [Remove first node from AVAIL list.] Set NEW = AVAIL and AVAIL = AVAIL->LINK.
3. Set INFO[NEW] = ITEM. [Copies new data into new node.]
4. If LOC == NULL, then: [Insert as first node.]
   Set NEW->LINK = START and START = NEW.
   Else: [Insert after node with location LOC.]
   Set NEW->LINK = LOC->LINK and LOC->LINK = NEW.
   [End of If structure.]
5. Exit.
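The INSFIRST and INSLOC algorithms can be sketched in C with pointer nodes. In this sketch, which is an assumption rather than the course's own code, malloc stands in for removing the first node from the AVAIL list, so a NULL result from malloc corresponds to the OVERFLOW case. START is passed by address (struct node **start) so the functions can change it; the names insfirst, insloc and get_node are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* Obtain a node and copy ITEM into it; NULL plays the role of AVAIL == NULL. */
static struct node *get_node(int item)
{
    struct node *new_node = malloc(sizeof *new_node);
    if (new_node == NULL) {
        fprintf(stderr, "OVERFLOW\n");
        return NULL;
    }
    new_node->info = item;              /* copies new data into new node */
    return new_node;
}

/* INSFIRST: insert item as the first node; returns 0, or -1 on overflow. */
int insfirst(struct node **start, int item)
{
    struct node *new_node = get_node(item);
    if (new_node == NULL)
        return -1;
    new_node->link = *start;            /* new node points to original first node */
    *start = new_node;                  /* START now points to the new node */
    return 0;
}

/* INSLOC: insert item after node loc, or as the first node when loc == NULL. */
int insloc(struct node **start, struct node *loc, int item)
{
    struct node *new_node = get_node(item);
    if (new_node == NULL)
        return -1;
    if (loc == NULL) {                  /* insert as first node */
        new_node->link = *start;
        *start = new_node;
    } else {                            /* insert after node with location LOC */
        new_node->link = loc->link;     /* NEW->LINK = LOC->LINK */
        loc->link = new_node;           /* LOC->LINK = NEW */
    }
    return 0;
}
```

Note that the order of the two assignments in the else branch matters: LOC->LINK must be copied into the new node before it is overwritten.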
Inserting into a Sorted Linked List
Suppose we want to insert ITEM into a sorted linked LIST. Then ITEM must be inserted between nodes A and B so that
INFO(A) < ITEM < INFO(B)
The following is a procedure which finds the location LOC of node A, that is, which finds the location LOC of the last node in LIST whose value is less than ITEM.

Traverse the list using a pointer variable PTR and comparing ITEM with PTR->INFO at each node. While traversing, keep track of the location of the preceding node by using a pointer variable SAVE, as pictured in Fig. 4-7. Thus SAVE and PTR are updated by the assignments
SAVE = PTR and PTR = PTR->LINK
The traversing continues as long as PTR->INFO ≤ ITEM, or in other words, the traversing stops as soon as ITEM < PTR->INFO. Then PTR points to node B, so SAVE will contain the location of the node A.
Fig. 4-7
The cases where the list is empty or where ITEM < START->INFO, so LOC = NULL, are treated separately, since they do not involve the variable SAVE. The formal statement of our procedure is as follows.

Procedure: FINDA(INFO, LINK, START, ITEM, LOC)
This procedure finds the location LOC of the last node in a sorted list such that INFO[LOC] < ITEM, or sets LOC = NULL.
1. [List empty?] If START == NULL, then Set LOC = NULL, and Return.
2. [Special case?] If ITEM < START->INFO, then Set LOC = NULL, and Return.
3. Set SAVE = START and PTR = START->LINK. [Initializes pointers.]
4. Repeat Steps 5 and 6 while PTR ≠ NULL:
5. If ITEM < PTR->INFO, then Set LOC = SAVE, and Return.
   [End of If structure.]
6. Set SAVE = PTR and PTR = PTR->LINK. [Updates pointers.]
   [End of Step 4 loop.]
7. Set LOC = SAVE.
8. Return.

Now we have all the components to present an algorithm which inserts ITEM into a linked list. The simplicity of the algorithm comes from using the previous two procedures.

Algorithm: INSSRT(INFO, LINK, START, AVAIL, ITEM)
This algorithm inserts ITEM into a sorted linked list.
1. [Use Procedure FINDA to find the location of the node preceding ITEM.] Call FINDA(INFO, LINK, START, ITEM, LOC).
2. [Use Algorithm INSLOC to insert ITEM after the node with location LOC.] Call INSLOC(INFO, LINK, START, AVAIL, LOC, ITEM).
3. Exit.
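Below is a minimal C sketch of FINDA and INSSRT for a list sorted in increasing order, again assuming an illustrative struct node type and using malloc in place of the AVAIL list. finda returns the location of node A (or NULL when ITEM belongs at the front), and inssrt then performs the INSLOC step. Following the procedure's stopping rule (ITEM < PTR->INFO), an ITEM equal to existing values is placed after them.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* FINDA: location of node A (the node ITEM should follow), or NULL. */
struct node *finda(struct node *start, int item)
{
    if (start == NULL || item < start->info)   /* list empty / special case */
        return NULL;
    struct node *save = start;                 /* initializes pointers */
    struct node *ptr = start->link;
    while (ptr != NULL) {
        if (item < ptr->info)
            return save;                       /* PTR is at node B, SAVE at node A */
        save = ptr;                            /* updates pointers */
        ptr = ptr->link;
    }
    return save;                               /* item goes after the last node */
}

/* INSSRT: keep the list sorted; returns 0, or -1 on overflow. */
int inssrt(struct node **start, int item)
{
    struct node *loc = finda(*start, item);
    struct node *new_node = malloc(sizeof *new_node);
    if (new_node == NULL) {
        fprintf(stderr, "OVERFLOW\n");
        return -1;
    }
    new_node->info = item;
    if (loc == NULL) {                         /* INSLOC: insert as first node */
        new_node->link = *start;
        *start = new_node;
    } else {                                   /* INSLOC: insert after LOC */
        new_node->link = loc->link;
        loc->link = new_node;
    }
    return 0;
}
```

Inserting 30, 10, 20, 40 and 5 in that order into an empty list leaves the nodes in the order 5, 10, 20, 30, 40.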
DELETION FROM A LINKED LIST
Suppose N is a node between nodes A and B in linked list LIST, as pictured in Fig. 4-8, and suppose node N is to be deleted from the linked list. The deletion occurs as soon as the nextpointer field of node A is changed so that it points to node B. (Accordingly, when performing deletions, one must keep track of the address of the node which immediately precedes the node that is to be deleted.)
Fig. 4-8
Suppose our linked list is maintained in the memory in the form LIST(INFO, LINK, START, AVAIL). The above figure does not take into account the fact that, when a node N is deleted from our list, we will immediately return its memory space to the AVAIL list. Specifically, for easier processing, it will be returned to the beginning of the AVAIL list. Thus a more exact schematic diagram of such a deletion is the one in Fig. 4-9.
Fig. 4-9
Observe that three pointer fields are changed as follows:
(1) The nextpointer field of node A now points to node B, where node N previously pointed.
(2) The nextpointer field of N now points to the original first node in the free pool, where AVAIL previously pointed.
(3) AVAIL now points to the deleted node N.
There are two special cases: if the deleted node N is the first node in the list, then START will point to node B; and if the deleted node N is the last node in the list, then node A will contain the NULL pointer.

Deleting the Node Following a Given Node
Consider the LIST again. Suppose we have been given the location LOC of a node N in LIST. Furthermore, we are given the location LOCP of the node preceding N; when N is the first node, LOCP = NULL. The following algorithm deletes N from the list.

Algorithm: DEL(INFO, LINK, START, AVAIL, LOC, LOCP)
This algorithm deletes the node N with location LOC. LOCP is the location of the node which precedes N or, when N is the first node, LOCP = NULL.
1. If LOCP == NULL, then:
   Set START = START->LINK. [Deletes first node.]
   Else:
   Set LOCP->LINK = LOC->LINK. [Deletes node N.]
   [End of If structure.]
2. [Return deleted node to the AVAIL list.] Set LOC->LINK = AVAIL and AVAIL = LOC.
3. Exit.
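A minimal C sketch of DEL, assuming pointer nodes allocated with malloc. Here free(loc) stands in for Step 2 (returning the deleted node to the AVAIL list); the function name del and the struct node type are illustrative. As in the algorithm, locp must be NULL exactly when loc is the first node.

```c
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* DEL: unlink node loc, whose predecessor is locp (NULL if loc is first). */
void del(struct node **start, struct node *loc, struct node *locp)
{
    if (locp == NULL)
        *start = (*start)->link;    /* deletes first node: START = START->LINK */
    else
        locp->link = loc->link;     /* deletes node N: LOCP->LINK = LOC->LINK */
    free(loc);                      /* return deleted node to the free store */
}
```

No traversal is needed because LOCP is supplied by the caller; the next procedure shows how LOCP can be found when only ITEM is known.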
START = START->LINK is the statement which effectively deletes the first node from the list. This covers the case when N is the first node. Figure 4-10 is the schematic diagram of the assignment START = START->LINK.
Fig. 4-10
Figure 4-11 is the schematic diagram of the assignment LOCP->LINK = LOC->LINK, which effectively deletes node N when N is not the first node. The simplicity of the algorithm comes from the fact that we are already given the location LOCP of the node which precedes node N. In many applications, we must first find LOCP.
Fig. 4-11

Deleting the Node for a Given ITEM of Information
Let LIST be a linked list in memory. Suppose we have been given an ITEM of information and we want to delete from the LIST the first node N that contains ITEM. (If ITEM is a key value, then only one node can contain ITEM.) Recall that before we delete N from the list, we have to know the location of the node preceding N. Accordingly, first we will give a procedure which finds the location LOC of the node N containing ITEM and the location LOCP of the node preceding node N. If N is the first node, we set LOCP = NULL, and if ITEM does not appear in LIST, we set LOC = NULL.

Traverse the list using a pointer variable PTR and compare ITEM with INFO[PTR] at each node. While traversing, keep track of the location of the preceding node by using a pointer variable SAVE, as pictured in Figure 4-7. Thus SAVE and PTR are updated by the assignments
SAVE = PTR and PTR = PTR->LINK
We will continue with the traversing as long as PTR->INFO ≠ ITEM, or in other words, the traversing stops as soon as ITEM = PTR->INFO. Then PTR contains the location LOC of node N and SAVE contains the location LOCP of the node preceding N.
The cases where the list is empty or where START->INFO = ITEM (i.e., where node N is the first node) are treated separately, since they do not involve the variable SAVE. The formal statement of our procedure is as follows:
Procedure: FINDB(INFO, LINK, START, ITEM, LOC, LOCP)
This procedure finds the location LOC of the first node N which contains ITEM and the location LOCP of the node preceding N. If ITEM does not appear in the list, then the procedure sets LOC = NULL, and if ITEM appears in the first node, then it sets LOCP = NULL.
1. [List empty?] If START == NULL, then Set LOC = NULL and LOCP = NULL, and Return.
   [End of If structure.]
2. [ITEM in first node?] If START->INFO == ITEM, then Set LOC = START and LOCP = NULL, and Return.
   [End of If structure.]
3. Set SAVE = START and PTR = START->LINK. [Initializes pointers.]
4. Repeat Steps 5 and 6 while PTR ≠ NULL.
5. If INFO[PTR] == ITEM, then Set LOC = PTR and LOCP = SAVE, and Return.
   [End of If structure.]
6. Set SAVE = PTR and PTR = PTR->LINK. [Updates pointers.]
   [End of Step 4 loop.]
7. Set LOC = NULL. [Search unsuccessful.]
8. Return.

Now we can easily present an algorithm to delete from a linked list the first node N which contains a given ITEM of information. The simplicity of the algorithm comes from the fact that the task of finding the location of N and the location of its preceding node has already been done in the above procedure.

Algorithm: DELETE(INFO, LINK, START, AVAIL, ITEM)
This algorithm deletes from a linked list the first node N which contains the given item of information.
1. [Use Procedure FINDB to find the location of N and its preceding node.] Call FINDB(INFO, LINK, START, ITEM, LOC, LOCP).
2. If LOC == NULL, then Write: ITEM not in list, and Exit.
3. [Delete node.] If LOCP == NULL, then:
   Set START = START->LINK. [Deletes first node.]
   Else:
   Set LOCP->LINK = LOC->LINK.
   [End of If structure.]
4. [Return deleted node to the AVAIL list.] Set LOC->LINK = AVAIL and AVAIL = LOC.
5. Exit.
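FINDB and DELETE can be sketched in C as follows. This is an illustrative sketch: findb reports LOC and LOCP through two output parameters, delete_item (a hypothetical name for DELETE) returns -1 when ITEM is not in the list, and free again stands in for returning the node to the AVAIL list.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* FINDB: LOC = first node containing item (NULL if absent),
   LOCP = its predecessor (NULL if LOC is the first node). */
void findb(struct node *start, int item, struct node **loc, struct node **locp)
{
    *loc = NULL;
    *locp = NULL;
    if (start == NULL)                  /* list empty */
        return;
    if (start->info == item) {          /* ITEM in first node */
        *loc = start;
        return;
    }
    struct node *save = start;          /* initializes pointers */
    struct node *ptr = start->link;
    while (ptr != NULL) {
        if (ptr->info == item) {
            *loc = ptr;
            *locp = save;
            return;
        }
        save = ptr;                     /* updates pointers */
        ptr = ptr->link;
    }
    /* search unsuccessful: *loc stays NULL */
}

/* DELETE: remove the first node containing item; 0 if deleted, -1 if absent. */
int delete_item(struct node **start, int item)
{
    struct node *loc, *locp;
    findb(*start, item, &loc, &locp);
    if (loc == NULL) {
        printf("ITEM not in list\n");
        return -1;
    }
    if (locp == NULL)
        *start = (*start)->link;        /* deletes first node */
    else
        locp->link = loc->link;         /* deletes node N */
    free(loc);                          /* stands in for returning LOC to AVAIL */
    return 0;
}
```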
HEADER LINKED LISTS
A header linked list is a linked list which always contains a special node, called the header node, at the beginning of the list. The following are two kinds of widely used header lists:
(1) A grounded header list is a header list where the last node contains the null pointer. (The term "grounded" comes from the fact that in many cases the electrical ground symbol is used to indicate the null pointer.)
(2) A circular header list is a header list where the last node points back to the header node.
Figure 4-12 contains schematic diagrams of these header lists. Unless otherwise stated or implied, our header lists will always be circular. Accordingly, in such a case, the header node also acts as a sentinel indicating the end of the list.
Fig. 4-12
We can observe that the list pointer START always points to the header node. Accordingly, LINK[START] = NULL indicates that a grounded header list is empty, and LINK[START] = START indicates that a circular header list is empty.

Although our data may be maintained by header lists in the memory, the AVAIL list will always be maintained as an ordinary linked list.

We frequently use circular header lists instead of ordinary linked lists because many operations are much easier to state and implement using header lists. This comes from the following two properties of circular header lists:
(1) The null pointer is not used, and hence all pointers contain valid addresses.
(2) Every (ordinary) node has a predecessor, so the first node may not require a special case.
The next example illustrates the usefulness of these properties.

Algorithm: (Traversing a Circular Header List) Let LIST be a circular header list in the memory. This algorithm traverses LIST, applying an operation PROCESS to each node of LIST.
1. Set PTR = START->LINK. [Initializes the pointer PTR.]
2. Repeat Steps 3 and 4 while PTR ≠ START:
3. Apply PROCESS to PTR->INFO.
4. Set PTR = PTR->LINK. [PTR now points to the next node.]
   [End of Step 2 loop.]
5. Exit.
Example
Suppose LIST is a linked list in the memory, and a specific ITEM of information is given. The SEARCH algorithm given earlier finds the location LOC of the first node in LIST which contains ITEM when LIST is an ordinary linked list. The following is such an algorithm when LIST is a circular header list.

Algorithm: SRCHHL(INFO, LINK, START, ITEM, LOC)
LIST is a circular header list in memory. This algorithm finds the location LOC of the node where ITEM first appears in LIST, or sets LOC = NULL.
1. Set PTR = START->LINK. [Initializes the pointer PTR.]
2. Repeat while PTR->INFO ≠ ITEM and PTR ≠ START:
   Set PTR = PTR->LINK. [PTR now points to the next node.]
   [End of loop.]
3. If PTR ≠ START, then:
   Set LOC = PTR.
   Else:
   Set LOC = NULL.
   [End of If structure.]
4. Exit.

The two tests which control the searching loop (Step 2) could not be performed at the same time in the algorithm for an ordinary linked list, because for an ordinary linked list PTR->INFO is not defined when PTR = NULL. In a circular header list PTR always holds a valid address (at worst the header node), so both tests can be combined in one loop condition.
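A sketch of the circular-header traversal and of SRCHHL in C, assuming START points to a header node whose link field leads to the first ordinary node, and the last ordinary node links back to the header. sum_list (an illustrative PROCESS that adds up the INFO fields) and srchhl are hypothetical names. Because every pointer in the list is a valid address, both tests of SRCHHL sit in one loop condition, and the final comparison with the header decides between LOC = PTR and LOC = NULL.

```c
#include <stddef.h>

struct node {
    int info;
    struct node *link;
};

/* Traverse a circular header list; PROCESS here adds up the INFO fields. */
long sum_list(const struct node *start)
{
    long sum = 0;
    for (const struct node *ptr = start->link; ptr != start; ptr = ptr->link)
        sum += ptr->info;
    return sum;
}

/* SRCHHL: location of the first node containing item, or NULL. */
struct node *srchhl(struct node *start, int item)
{
    struct node *ptr = start->link;
    while (ptr->info != item && ptr != start)   /* both tests at once */
        ptr = ptr->link;
    return (ptr != start) ? ptr : NULL;         /* stopping at the header means "not found" */
}
```

An empty circular header list is simply a header whose link points to itself; both functions then stop immediately.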
Enough with one-way linked lists. Take a break and then continue again.

TWO-WAY LISTS
Each list we have discussed above is called a one-way list, since there is only one way we can traverse the list. We now introduce a new list structure, called a two-way list, which can be traversed in two directions: in the usual forward direction from the beginning of the list to the end, or in the backward direction from the end of the list to the beginning. Furthermore, given the location LOC of a node N in the list, you now have immediate access to both the next node and the preceding node in the list. This means, in particular, that you are able to delete N from the list without traversing any part of the list.

A two-way list is a linear collection of data elements, called nodes, where each node N is divided into three parts:
(1) An information field INFO which contains the data of N
(2) A pointer field FORW which contains the location of the next node in the list
(3) A pointer field BACK which contains the location of the preceding node in the list
The list also requires two list pointer variables: FIRST, which points to the first node in the list, and LAST, which points to the last node in the list. Figure 4-13 shows such a list.
Fig. 4-13
Observe that the null pointer appears in the FORW field of the last node in the list and also in the BACK field of the first node in the list. Using the variable FIRST and the pointer field FORW, we can traverse a two-way list in the forward direction as before. On the other hand, using the variable LAST and the pointer field BACK, we can also traverse the list in the backward direction.

Suppose LOCA and LOCB are the locations of nodes A and B, respectively. Then the way the pointers FORW and BACK are defined gives us the following:
Pointer property: FORW[LOCA] = LOCB if and only if BACK[LOCB] = LOCA
In other words, the statement that node B follows node A is equivalent to the statement that node A precedes node B.

We can maintain two-way lists in memory by means of linear arrays in the same way as one-way lists, except that now we require two pointer arrays, FORW and BACK, instead of one pointer array LINK, and two list pointer variables, FIRST and LAST, instead of one list pointer variable START. On the other hand, the list AVAIL of available space in the arrays will still be maintained as a one-way list, using FORW as the pointer field, since we delete and insert nodes only at the beginning of the AVAIL list.

Two-Way Header Lists
The advantages of a two-way list and a circular header list may be combined into a two-way circular header list, as pictured in Figure 4-14. The list is circular because the two end nodes point back to the header node. Observe that such a two-way list requires only one list pointer variable START, which points to the header node. This is because the two pointers in the header node point to the two ends of the list.
Fig. 4-14
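The following C sketch shows why a two-way circular header list makes deletion cheap: given only LOC, delete_node reaches both neighbours through BACK and FORW without traversing the list. The struct dnode type and the functions init_header, insert_after and delete_node are illustrative assumptions; insert_after maintains the pointer property FORW[LOCA] = LOCB if and only if BACK[LOCB] = LOCA, and malloc/free stand in for the AVAIL list.

```c
#include <stdlib.h>

/* Node of a two-way list: INFO, FORW (next node) and BACK (preceding node). */
struct dnode {
    int info;
    struct dnode *forw;
    struct dnode *back;
};

/* An empty two-way circular header list: the header points to itself both ways. */
void init_header(struct dnode *head)
{
    head->forw = head;
    head->back = head;
}

/* Insert a new node holding item between node loca and its successor. */
struct dnode *insert_after(struct dnode *loca, int item)
{
    struct dnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;                    /* OVERFLOW */
    struct dnode *locb = loca->forw;
    n->info = item;
    n->forw = locb;
    n->back = loca;
    loca->forw = n;                     /* FORW[LOCA] = NEW ... */
    locb->back = n;                     /* ... and BACK[LOCB] = NEW, keeping the property */
    return n;
}

/* Delete node loc without traversing: both neighbours are reachable from loc. */
void delete_node(struct dnode *loc)
{
    loc->back->forw = loc->forw;
    loc->forw->back = loc->back;
    free(loc);
}
```

Because the list is circular with a header, neither function needs a special case for the first or last node.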
TREES
So far, we were concentrating on linear types of data structures: strings, arrays, lists and queues. Here we define a nonlinear data structure called a tree. This structure is mainly used to represent data containing a hierarchical relationship between elements, e.g., records, family trees and tables of contents. First we investigate a special kind of tree, called a binary tree, which can be easily maintained in the computer. Although such a tree may seem to be very restrictive, we will see later in the chapter that more general trees may be viewed as binary trees.

BINARY TREES
A binary tree T is defined as a finite set of elements, called nodes, such that:
(a) T is empty (called the null tree or empty tree), or
(b) T contains a distinguished node R, called the root of T, and the remaining nodes of T form an ordered pair of disjoint binary trees T1 and T2.
If T does contain a root R, then the two trees T1 and T2 are called the left and right subtrees of R, respectively. If T1 is nonempty, then its root is called the left successor of R; similarly, if T2 is nonempty, then its root is called the right successor of R.
We frequently represent a binary tree T by means of a diagram. Specifically, the diagram in figure 4-15 represents a binary tree T as follows:
(i) T consists of 11 nodes, represented by the letters A through L, excluding I.
(ii) The root of T is the node A at the top of the diagram.
(iii) A left-downward slanted line from a node N indicates a left successor of N, and a right-downward slanted line from N indicates a right successor of N.
Fig. 4-15
Observe that:
(a) B is the left successor and C is the right successor of the node A.
(b) The left subtree of the root A consists of the nodes B, D, E and F, and the right subtree of A consists of the nodes C, G, H, J, K and L.

Any node N in a binary tree T has either 0, 1 or 2 successors. The nodes A, B, C and H have two successors, the nodes E and J have only one successor, and the nodes D, F, G, L and K have no successors. The nodes with no successors are called terminal nodes.

The above definition of the binary tree T is recursive, since T is defined in terms of the binary subtrees T1 and T2. This means, in particular, that every node N of T contains a left and a right subtree. Moreover, if N is a terminal node, then both its left and right subtrees are empty.

Binary trees T and T' are said to be similar if they have the same structure or, in other words, if they have the same shape. The trees are said to be copies if they are similar and if they have the same contents at corresponding nodes.

Food for thought: Consider these four binary trees, (a), (b), (c) and (d). Which is the right option for similar trees?
Options:
(a) Trees (a) and (b)
(b) Trees (b) and (c)
(c) Trees (c) and (d)
(d) Trees (a), (c) and (d)
(d) is the right answer. The three trees (a), (c) and (d) are similar. In particular, the trees (a) and (c) are copies, since they also have the same data at corresponding nodes. The tree (b) is neither similar to nor a copy of the tree (d) because, in a binary tree, we distinguish between a left successor and a right successor even when there is only one successor.

Terminology
We frequently use terminology describing family relationships to describe relationships between the nodes of a tree T. Specifically, suppose N is a node in T with left successor S1 and right successor S2. Then N is called the parent (or father) of S1 and S2. Analogously, S1 is called the left child (or son) of N, and S2 is called the right child (or son) of N. Furthermore, S1 and S2 are said to be siblings (or brothers). Every node N in a binary tree T, except the root, has a unique parent, called the predecessor of N.

The terms descendant and ancestor have their usual meaning. That is, a node L is called a descendant of a node N (and N is called an ancestor of L) if there is a succession of children from N to L. In particular, L is called a left or right descendant of N according to whether L belongs to the left or right subtree of N.

Terminology from graph theory and horticulture is also used with a binary tree T. Specifically, the line drawn from a node N of T to a successor is called an edge, and a sequence of consecutive edges is called a path. A terminal node is called a leaf, and a path ending in a leaf is called a branch.

Each node in a binary tree has a level number. First, we assign the root R of the tree T the level number 0; then every other node is assigned a level number which is 1 more than the level number of its parent. Furthermore, those nodes with the same level number are said to be of the same generation.

The depth (or height) of a tree T is the maximum number of nodes in a branch of T. This turns out to be 1 more than the largest level number of T.

Extended Binary Trees: 2-Trees
A binary tree T is said to be a 2-tree or an extended binary tree if each node N has either 0 or 2 children. In such a case, the nodes with 2 children are called internal nodes, and the nodes with 0 children are called external nodes. Sometimes, we distinguish the nodes in diagrams by using circles for internal nodes and squares for external nodes.
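The definitions of depth, 2-tree, similar trees and copies translate directly into recursive C functions. This is a sketch assuming a node structure with a char info field and left/right pointers (the same shape as the struct tree_node used for the linked representation); the function names are illustrative, the empty tree (NULL) has depth 0, and it is treated as a 2-tree.

```c
#include <stdbool.h>
#include <stddef.h>

struct tree_node {
    char info;
    struct tree_node *left;
    struct tree_node *right;
};

/* Depth: maximum number of nodes in a branch (0 for the empty tree). */
int depth(const struct tree_node *t)
{
    if (t == NULL)
        return 0;
    int dl = depth(t->left), dr = depth(t->right);
    return 1 + (dl > dr ? dl : dr);
}

/* 2-tree test: every node has either 0 or 2 children. */
bool is_2tree(const struct tree_node *t)
{
    if (t == NULL)
        return true;
    if ((t->left == NULL) != (t->right == NULL))
        return false;                   /* exactly one child */
    return is_2tree(t->left) && is_2tree(t->right);
}

/* Similar: same shape, with left and right successors distinguished. */
bool similar(const struct tree_node *s, const struct tree_node *t)
{
    if (s == NULL || t == NULL)
        return s == t;                  /* both empty, or shapes differ */
    return similar(s->left, t->left) && similar(s->right, t->right);
}

/* Copies: similar, and with the same contents at corresponding nodes. */
bool copies(const struct tree_node *s, const struct tree_node *t)
{
    if (s == NULL || t == NULL)
        return s == t;
    return s->info == t->info
        && copies(s->left, t->left) && copies(s->right, t->right);
}
```

A node with a single left child and a node with a single right child are not similar, which is exactly why tree (b) above is not similar to tree (d).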
The second way. is called the sequential representation of T. indeed. the nodes in the original tree T are now the internal nodes in the extended tree. struct tree_node *left . such as the tree in figure 4-16. Furthermore. The root of the tree will be denoted as: struct tree_node *root . The first and usual way is called the link representation of T and is analogous to the way linked lists are represented in the memory. as pictured in the figure 4-16(b). i. } Where left and right are pointers to the left son and right son of that node. and the new nodes are the external nodes in the extended tree. 4-16 We get the term “extended binary tree” from the following operation. In Software 23 Fig. Sequential Representation of Binary Trees . REPRESENTING BINARY TREES IN MEMORY Let T be a binary tree. given any node N of T. a 2-tree. Then we may convert T into a 2-tree by replacing each empty subtree by a new node. Consider any binary tree T.MSc. . These three algorithms. Fig. NULL is used to indicate an empty subtree. 4-16 TRAVERSING BINARY TREES There are three standard ways in which we can traverse a binary tree T with root R. TREE[1] = NULL indicates that the tree is empty. This representation uses only a single linear array TREE as follows: (a) The root R of T is stored in TREE[1]. the binary tree T is complete or nearly complete. if we include null entries for the successors of the terminal nodes. (b) If a node N occupies TREE[K]. In particular. In fact. as stated above. Figure 4-16(b) is the sequential representation of the binary tree T shown in figure 416(a). inorder and postorder. In Software 24 Suppose T is a binary tree that is complete or nearly complete. the tree T in Fig. which means it would require an array with approximately 26 = 64 elements. Then there is an efficient way of maintaining T in memory called the sequential representation of T. 
Generally speaking the sequential representation of a tree with depth d will require an array with approximately 2d+1 elements. 4-15 has 11 nodes and depth 5. Observe that we require 14 locations in the array TREE even though T has only 9 nodes. are as follows: Preorder: (1) Process the root R. For example. then we would actually require TREE[29] for the right successor of TREE[14]. Accordingly this sequential representation is usually inefficient unless. then its left child is stored in TREE[2*K] and its right child is stored in TREE[2*K+1] Again.MSc. called preorder. the root R is processed between the traversals of the subtrees. Traverse the right subtree of R in postorder. RIGHT. the node-left-right (NLR) traversal. Note: More examples on binary tree are given on the web.com TRAVERSAL ALGORITHMS USING STACKS Suppose a binary tree T is maintained in memory by some linked representation TREE (INFO. . Postorder: (1) (2) (3) 25 Traverse the left subtree of R in postorder. The algorithm also uses an array STACK. In Software Inorder: (2) (3) Traverse the left subtree of R in preorder. where L(N) denotes the left child of node N and R(N) denotes the right child. The difference between the algorithms is the time at which the root R is processed. Traverse the right subtree of R in preorder. by means of non-recursive procedures using stacks. which will hold the addresses of nodes for future processing. Process the root R. the root R is processed before the subtrees are traversed. which were defined recursively in the last section. Traverse the right subtree of R in inorder. Preorder Traversal The preorder traversal algorithm uses a variable PTR (pointer).MSc. respectively. the root R is processed after the subtrees are traversed. Observe that each of the above traversal algorithms is recursively defined. We will discuss the three traversals separately. Specifically. We can observe that each algorithm contains the same three steps. 
we will expect that a stack be used when the algorithms are implemented on the computer. Process the root R. Please refer to the site zeelearn. ROOT) We will discuss the implementation of the three standard traversals of T. LEFT. in the "pre" algorithm. The three algorithms are sometimes called. and in the "post" algorithm. since the algorithm involves traversing subtrees in the given order. Accordingly. in the "in" algorithm. and that the left subtree of R is always traversed before the right subtree. This is pictured in this figure 4-17. (1) (2) (3) Traverse the left subtree of R in inorder. which will contain the location of the node N currently being scanned. the left-node-right (LNR) traversal and the left-right-node (LRN) traversal. Algorithm : PREORD (INFO. [End of if structure. then return to Step (a).) [Backtracking. (Thus PTR is updated using the assignment PTR = LEFT [PTR]. 4. The traversing ends after a node N with no left child L(N) is processed.] 5. Apply PROCESS to INFO [PTR]. The algorithm does a preorder traversal of T. in actual practice the locations of the nodes are assigned to PTR and are pushed onto the STACK. STACK{1} = NULL and PTR = ROOT 2. processing each node N on the path and pushing each right child R(N).) We simulate the algorithm in the next example. and the traversing stops when LEFT [PTR] == NULL.] Set TOP = TOP + 1. An array STACK is used to temporarily hold the addresses of nodes. applying an operation PROCESS to each of its nodes. if any. and STACK[TOP] = RIGHT [PTR]. otherwise Exit.] Pop and assign to PTR the top element on STACK. ROOT) A binary tree T is in memory.] set TOP = 1.1. Then repeat the following steps until PTR = NULL or. equivalently. 4-17 Algorithm: Initially push NULL onto STACK and then set PTR = ROOT. [Right child?] if RIGHT[PTR] = NULL. [End of Step 2 loop. [Let child?] if LEFT [PTR] = NULL. Although the example works with the nodes themselves. then Set PTR = STACK [TOP] and TOP = TOP .MSc. 
(Note that the initial element NULL on STACK is used as a sentinel. 1. In Software 26 Fig. Repeat Steps 3 to 5 while PTR = NULL 3. while PTR ≠ NULL (a) (b) Proceed down the left-most path rooted at PTR. RIGHT. and initialize PTR. Exit. onto STACK. [Initially push NULL onto STACK.] 6. then [Push on STACK. If PTR ≠ NULL. . LEFT. MSc. BINARY SEARCH TREES We will discuss one of the most important data structures in computer science. set PTR= N (by assigning PTR: = -PTR) and return to Step (a). Please refer to the site zeelearn.] Pop and process the nodes on STACK. (a) Proceed down the left-most path rooted at PTR. the location of N is pushed onto STACK. that is. if PTR = . pushing each node N onto STACK and stopping when a node N with no left child is pushed onto STACK. (In actual practice. a variable PTR (pointer) is used which contains the location of the node N that is currently being scanned. Algorithm: Initially push NULL onto STACK (for a sentinel) and then set PTR = ROOT. a binary search tree. which will hold the addresses of nodes for future processing. (b)[Backtracking. onto STACK. (a) Proceed down the left-most path rooted at PTR. set PTR = R (N) (by assigning PTR = RIGHT [PTR] and return to Step (a). We emphasize that a node N is processed only when it is popped from STACK and it is positive. In Software 27 Inorder Traversal The inorder traversal algorithm also uses a variable pointer PTR. Algorithm: Initially push NULL into STACK (as a sentinel) and then set PTR = ROOT. In fact.) Again. if N has a right child R (N). We distinguish between the two cases by pushing either N or its negative. so -N has the obvious meaning. If NULL is popped. Note: Examples on Traversal algorithm are given on the web. Then repeat the following steps until NULL is popped from STACK. because here we may have to save a node N in two different situations. then Exit. and an array STACK. At each node N of the path.] Pop and process positive nodes on STACK. 
a node is processed only when it is popped from STACK.com Postorder Traversal The postorder traversal algorithm is more complicated than the proceeding two algorithms. then Exit. This structure enables you to search for and find an element with an average running time f(n) = O(log2n). push -R(N) onto STACK. We emphasize that a node N is processed only when it is popped from STACK. . -N. If NULL is popped. It also enables us to easily insert and delete elements.N for some node N. If a negative node is popped. with this algorithm. This structure contrasts with the following structures. (b)[Backtracking. as shown in this figure 4-17. If a node N with a right child R (N) is processed. Then repeat the following steps until NULL is popped from STACK. push N onto STACK and. which will contain the location of the node N currently being scanned. In Software 28 (a) Sorted linear array. In fact. (b) Repeat Step (a) until one of the following occurs: (i) We meet a node N such that ITEM = N.com SEARCHING AND INSERTING IN BINARY SEARCH TREES Suppose T is a binary search tree. (ii) We meet an empty subtree. (It is not difficult to see that this property guarantees that the inorder traversal of T will yield a sorted listing of the elements of T. or inserts ITEM as a new node in its appropriate place in the tree.MSc. proceed to the right child of N. (b) Linked list. Suppose T is a binary tree. but it is expensive to search for and find an element. Suppose we give an ITEM of information. Please refer to the site zeelearn. but it is expensive to insert and delete elements. which indicates that the search is unsuccessful. the definition of the binary tree depends on a given field whose values is distinct and may be ordered. (ii) If ITEM > N. proceed to the left child of N. and we insert ITEM in place of the empty subtree. since you must use a linear search with running time f(n) = O(n). Here you can easily insert and delete elements. 
a single search and insertion algorithm will give the searching and inserting. 39. 63.) Note: Examples on binary search tree are given on the web. Here you can search for and find an element with a running time f(n) = O(log2 n). (a) Compare ITEM with the root node N of the tree. In other words. 60. Then we call T a binary search tree (or binary sorted tree) if each node N of T has the following property: The value at N is greater than every value in the left subtree of N and is less than every value in the right subtree of N. 54. 16 . Although each node in a binary search tree may contain an entire record of data. proceed from the root R down through the three T until finding ITEM in T or inserting ITEM as a terminal node in T. We will discuss the basic operations of searching and inserting with respect to T. In this case the search is successful. Example Suppose the following six numbers are inserted in order into an empty binary search tree: 43. (i) If ITEM < N. The following algorithm finds the location of ITEM in the binary search tree T. This procedure finds the location LOC of ITEM in T and also the location PAR of the parent of ITEM. In Software 29 This figure 4-18 shows the six stages of the tree. Procedure : FIND (INFO.ROOT. and Return. This procedure will also be used in the next section. Observe that. [ITEM at root?] . (ii) LOC ≠NULL and PAR=NULL will indicate that ITEM is the root of T. there are three possibilities: (1) the tree is empty. (iii) LOC = NULL and PAR ≠ NULL will indicate that ITEM is not in T and can be added to T as a child of the node N with location PAR. on deletion. [Tree empty?] if ROOT = NULL. 2. RIGHT. in Step 4. (2) ITEM is added as a left child and (3) ITEM is added as a right child. then Set LOC = NULL and PAR = NULL. PAR) A binary search tree T is in memory and an ITEM of information is given. ITEM. The procedure traverses down the tree using the pointer PTR and the pointer SAVE for the parent node. 
4-18 The formal presentation of our search and insertion algorithm will use the following procedure.MSc. 1. which finds the locations of a given ITEM and its parent. then the tree might be different and we might have a different depth. LOC. We emphasize that if the six numbers were given in a different order. Fig. LEFT. There are three special cases: (i) LOC = NULL and PAR = NULL will indicate that the tree is empty. ] [End of Step 4 loop. then Set SAVE = PTR and PTR = PTR->LEFT. (c) Set LOC = NEW. ROOT. if ITEM < PTR->INFO. RIGHT. we move to the left child or the right child according to whether ITEM < PTR->INFO or ITEM > PTR->INFO. 3. then Set LOC = ROOT and PAR = NULL. then Write OVERFLOW.] (a) if AVAIL = NULL. [Add ITEM to tree.] 4. and Return. LOC. then Set PTR = ROOT->RIGHT and SAVE = ROOT. 6. Exit.] 7. ITEM. [ITEM found?] if ITEM = PTR->INFO. NEW->LEFT = NULL and NEW->RIGHT = NULL 4. LOC) A binary search tree T is in memory and an ITEM of information is given. [Copy ITEM into new node in AVAIL list. Notice that. Exit. This algorithm finds the location LOC of ITEM in T or adds ITEM as a new node in T at location LOC. [Search unsuccessful.] 5. 1. 8.] if PAR = NULL. and Exit. . Call FIND(INFO. then Set LOC = PTR and PAR = SAVE. in step 6. ITEM. AVAIL. Set PTR = ROOT->RIGHT and SAVE = ROOT.MSc. LEFT. if LOC ≠ NULL.] if ITEM < ROOT->INFO. then Set ROOT = NEW else if ITEM < PAR->INFO = NEW Set PAR->LEFT = NEW else Set PAR-> RIGHT = NEW [End of if structure. PAR). then Exit. and Return. In Software 30 if ITEM = ROOT->INFO. LEFT. 3.] Set LOC = NULL and PAR = SAVE. Repeat Steps 5 and 6 while PTR ≠ NULL 5. [End of If Structure. 2. [End of If Structure. [Initialize pointers PTR and SAVE. else Set SAVE = PTR and PTR = PTR->RIGHT. RIGHT. else. ROOT. Algorithm: INSBST(INFO. MSc. In Software Observe i) ii) iii) 31 that, in step 4, there are three possibilities: The tree is empty. ITEM is added as left child. ITEM is added as right child. 
Complexity of the Searching Algorithm

Suppose we are searching for an item of information in a binary search tree T. The number of comparisons is bounded by the depth of the tree. This comes from the fact that we proceed down a single path of the tree. Accordingly, the running time of the search will be proportional to the depth of the tree.

Suppose we are given n data items, A1, A2, ..., An, and suppose the items are inserted in order into a binary search tree T. Each of the n! possible orders of the items gives rise to a corresponding tree, and it can be shown that the average depth of these n! trees is approximately c log2 n, where c ≈ 1.4. Accordingly, the average time f(n) to search for an item in a binary search tree T with n elements is proportional to log2 n, that is, f(n) = O(log2 n).

Application of Binary Search Trees

Consider a collection of n data items, A1, A2, ..., An. Suppose we want to find and delete all duplicates in the collection. One straightforward way to do this is as follows:

Algorithm A: Scan the elements from A1 to An (that is, from left to right).
(a) For each element AK, compare AK with A1, A2, ..., AK-1, that is, compare AK with those elements which precede AK.
(b) If AK does occur among A1, A2, ..., AK-1, then delete AK.
After all the elements have been scanned, there will be no duplicates.

Example
Suppose Algorithm A is applied to the following list of 15 numbers:
14, 10, 17, 12, 10, 11, 20, 12, 18, 25, 20, 8, 22, 11, 23
Observe that the first four numbers (14, 10, 17 and 12) are not deleted. However,
A5 = 10 is deleted, since A5 = A2
A8 = 12 is deleted, since A8 = A4
A11 = 20 is deleted, since A11 = A7
A14 = 11 is deleted, since A14 = A6
When Algorithm A is finished running, the 11 numbers
14, 10, 17, 12, 11, 20, 18, 25, 8, 22, 23
which are all distinct, will remain.

Consider now the time complexity of Algorithm A, which is determined by the number of comparisons. First of all, we assume that the number d of duplicates is very small compared with the number n of data items.
Observe that the step involving AK will require approximately k - 1 comparisons, since we compare AK with items A1, A2, ..., AK-1. Accordingly, the number f(n) of comparisons required by Algorithm A is approximately

0 + 1 + 2 + 3 + ... + (n - 2) + (n - 1) = (n - 1)n/2 = O(n^2)

For example, for n = 1000 items, Algorithm A will require approximately 500,000 comparisons. In other words, the running time of Algorithm A is proportional to n^2.

Using a binary search tree, we can give another algorithm to find the duplicates in the set A1, A2, ..., An of n data items.

Algorithm B: Build a binary search tree T using the elements A1, A2, ..., An. In building the tree, delete AK from the list whenever the value of AK already appears in the tree.

The main advantage of Algorithm B is that each element AK is compared only with the elements in a single branch of the tree. It can be shown that the average length of such a branch is approximately c log2 k, where c ≈ 1.4. Accordingly, the total number f(n) of comparisons required by Algorithm B is approximately n log2 n, that is, f(n) = O(n log2 n). For example, for n = 1000, Algorithm B will require approximately 10,000 comparisons rather than the 500,000 comparisons of Algorithm A. (We note that, for the worst case, the number of comparisons for Algorithm B is the same as for Algorithm A.)

Note: An explanation of deletion from a binary search tree, with examples, is given on the web. Please refer to the site zeelearn.com.

HEAP; HEAPSORT

In this section we will discuss another tree structure, called a heap. We will use the heap in an elegant sorting algorithm called heapsort.

Suppose H is a complete binary tree with n elements. (Unless otherwise stated, we assume that H is maintained in memory by a linear array TREE using the sequential representation of H, not a linked representation.)
Then H is called a heap, or a maxheap, if each node N of H has the following property: The value at N is greater than or equal to the value at each of the children of N. Accordingly, the value at N is greater than or equal to the value at any of the descendants of N. A minheap is defined analogously: The value at N is less than or equal to the value at any of the children of N. Example MSc. In Software 33 Consider the complete tree H in this figure 4-19. Observe that H is a heap. This means, in particular, that the largest element in H appears at the "top" of the heap, that is, at the root of the tree. This figure 4-19(b) shows the sequential representation of H by the array TREE. That is, TREE [1] is the root of the tree H, and the left and right children of node TREE [K] are, respectively, TREE[2K] and TREE [2K + 1]. This means, in particular, that the parent of any nonroot node TREE[J] is the node TREE [J / 2] (where J / 2 means integer division). Observe that the nodes of H on the same level appear one after the other in the array TREE. 101 92 97 67 57 97 50 67 37 50 57 66 67 26 39 19 41 31 27 25 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 (b) Sequential representation Fig. 4-19 Inserting into a Heap Suppose H is a heap with N elements, and suppose an ITEM of information is given. We insert ITEM into the heap H as follows: (1) First adjoin ITEM at the end of H so that H is still a complete tree, but not necessarily a heap. (2) Then let ITEM rise to its "appropriate place" in H so that H is finally a heap. We will illustrate the way this procedure works before stating the procedure formally. Example Consider the heap H in this figure 4-20. Suppose we want to add ITEM = 71 to H. First we adjoin 71 as the next element in the complete tree; that is, we set TREET [21] = 71. Then (c) Compare 71 with its new parent. Fig. N. Since 71 does not exceed 92. (b) Compare 71 with its new parent. The path from 71 to the root of H is pictured in this figure 4-20(a). 
interchange 71 and 57. the path will now look like this 4-20(b). Since 71 is greater than 57. Please refer to the site zeelearn. the path will now look like this 4-20(c). We now find the appropriate place of 71 in the heap as follows: (a) Compare 71 with its parent. In Software 34 71 is the right child of TREE [10] = 50. ITEM = 71 has risen to is appropriate place in H. 57. Since 71 is greater than 50. interchange 71 and 50.MSc. 4-23 Note: More examples on Inserting into heapsort are given on the web. 50. A dotted line indicates that an exchange has taken place.com Procedure: INSHEAP (TREE. 92. This figure 4-23(d) shows the final tree. ITEM) . . [End of If structure.] 5.. Example .] Repeat Steps 3 to 6 while PTR < 1.] 6.. Observe that ITEM is not assigned to an element of the array TREE until the appropriate place for ITEM is found. (3) (Reheap) let L sink to its appropriate place in H so that H is finally a heap. [Add new node to H and initialize PTR. Suppose an array A with N elements is given.] Set N = N + 1 and PTR = N 2. This is accomplished as follows: (1) Assign the root R to some variable ITEM (2) Replace the deleted node R by the last node L of H so that H is still a complete tree..] 7.MSc. This procedure inserts ITEM as a new element of H. we can build a heap H out of the array A. [Moves node down. PTR gives the location of ITEM as it rises in the tree. [Find location to insert ITEM. Set PAR = [PTR/2].. Again we illustrate the way the procedure works before stating the procedure formally. and Return. [Updates PTR. with N elements. 3. Return..] Set TREE[1] = ITEM 8. Step 7 takes care of the special case that ITEM rises to the root TREE [1].1. and PAR denotes the location of the parent of ITEM 1. and an ITEM of information is given. Deleting the root of a Heap Suppose H is a heap. [Location of parent node. [Assign ITEM as the root of H. In Software 35 A heap H with N elements is stored in the array TREE. by executing Call INSHEAP(A. that is. 
By repeatedly applying the above Procedure to A. and suppose we want to delete the root R of H. If ITEM < TREE [PAR]. Set TREE [PTR] =TREE[PAR]..N . A[J + 1]) For j = 1. J. Set PTR = PAR. 2. but not necessarily a heap.] 4.] [End of Step 2 loop. then: Set TREE [PTR] = ITEM. 85 and 70. Fig. we find the appropriate place of 22 in the heap as follows: (a) Compare 22 with its two children. interchange 22 and 85 so the tree now looks like this 4-24(c). Again we leave this verification to the reader. which is not a heap. node 22 has dropped to its appropriate place in H.MSc. . Thus Fig. Since 22 is less than the larger child 85. Since 22 is greater than both children. (b) Compare 22 with its two new children. We also note that Step 3 of the procedure may not end until the node L reaches the bottom of the tree. that both the right and left subtrees of 22 are still heaps. interchange 22 and 55 so the tree now looks like this 4-24(d). we must verify that the above procedure does always yield a heap as a final tree. This gives the complete tree in this figure 4-24(b). (c) Compare 22 with its new children. where R = 95 is the root and L=22 is the last node of the tree. 15 and 20. 55 and 33. however. In Software 36 Consider the heap H in this figure 4-24(a). until L has no children. Applying Step 3. Since 22 is less than the larger child. i. Step 1 of the above procedure deletes R = 95 by L=22. 4-24 Remark: As we are inserting an element into a heap.e. 55. The formal statement of our procedure is as follows. Observe. 4-24(d) is the required heap H without its original root R. 7.] Set PTR =1. using Procedure 7. else Set TREE[PTR]= TREE[RIGHT] and PTR= RIGHT. then Set TREE[PTR] = TREE[LEFT] and PTR= LEFT. N) An array A with N elements is given. 1. then Set PTR = LEFT. 6. Algorithm: HEAPSORT(A.] Set LEFT =2*PTR and RIGHT= LEFT+1 [End of Step 4 loop. Set ITEM = TREE[1]. The pointers PTR. A formal statement of the algorithm is as follows. 10. [End of If structure.9. 9. 
Phase B deletes the elements of A in decreasing order. Application to Sorting Suppose an array A with N elements is given. This procedure assigns the root TREE[1] of H to the variable ITEM and then reheaps the remaining elements.MSc. The variable LAST saves the value of the original last node of H. N. The reason for the two “if” statement in Step 8 is that TREE[LEFT] may not defined when LEFT > N. [Remove root of H] Set LAST = TREE[N] and N= N-1. [End of If structure. Set TREE[PTR] = LAST. then Set TREE[PTR] =TREE[LEFT] and Return. The heapsort algorithm to sort A consists of the two following phases: Phase A: Build a heap H out of the elements of A.] if TREE[RIGHT]≤ TREE[LEFT]. 4. 1. 2. [Removes last node of H. Return. 5.] Repeat for j=1 to N-1 . LEFT and RIGHT gives the locations of LAST and its left and right children as LAST sinks in the tree. Phase B: Repeatedly delete the root element of H. [Initializes pointers] Repeat Steps 5 to 7 while RIGHT≤ N if LAST ≥ TREE[LEFT] and LAST ≥ TREE[RIGHT]. 3.LEFT=2 and RIGHT = 3. Step 8 takes care of the special case in which LAST does not have a right child but does have a left child (which has to be the last node in H). 8. In Software 37 Procedure: DELHEAP(TREE. This algorithm sorts the elements of A. [Build a heap H. Since the root of H always contains the largest node in H. The step 4 loop repeats as long as LAST has a right child.] if LEFT=N and if LAST<TREE[LEFT]. ITEM) A heap H with N elements is stored in the array TREE. Observe that this gives a worst-case complexity of the heapsort algorithm. the running time to sort the nelement array A using heapsort is proportional to n log2n.] 3. Phase B. Suppose H is a complete tree with m elements. That is. Phase A. and suppose the left and right subtrees of H are heaps and L is the root of H. Observe that the number of comparisons to find the appropriate place of a new element ITEM in H cannot exceed the depth of H. that is. In Software 38 Call INSHEAP(A. [End of loop. 
its depth is bounded by log2 m where m is the number of elements in H. the reader can verify that the given Step2(b) does not interfere with the algorithm. Suppose H is a heap.MSc. [Sort A by repeatedly deleting the root of H] Repeat while N > 1 (a) Call DELHEAP(A. A[J=1]). Complexity of Heapsort Suppose the heapsort algorithm is applied to an array A with n elements. Exit The purpose of step2(b) is to save space. and we analyze the complexity of each phase separately. the running time of Phase B of heapsort is also proportional to n log2n. which requires reheaping n times is bounded as follows: H(n) ≤ 4n log2 n Accordingly. (b) Set A[N+1] =ITEM. [End of Loop. This means that the total number h(n) of comparisons to delete the elements of A from H. one could use another array B to hold the sorted elements of A and replace Step 2(b) by Set B[N+1]= ITEM However. The algorithm has two phases. J. Accordingly the total number g(n) of comparisons to insert the n elements of A into H is bounded as follows: g(n) ≤ n log2n Consequently the running time of Phase A of heapsort is proportional to n log2 n. .n). reheaping uses at most 4 log2 m comparisons to find the appropriate place of L in the tree. Since the depth of H does not exceed log2 m. Observe that reheaping uses 4 comparisons to move the node L one step down the tree H.N. Since each phase requires time proportional to n log2n.ITEM). Since H is a complete tree. since A[N+1] does not belong to the heap H. f(n)=O(n log2.] 2. which can be traversed in two directions: in the usual forward direction from the beginning of the list to the end. This structure is mainly used to represent data containing a hierarchical relationship between elements.e. at the beginning of the list. suppose N is a node in T with left successor S1 and right successor S2. # We frequently use terminology describing family relationships to describe relationships between the nodes of a tree T. called nodes. # We have introduced a new list structure. 
# Suppose we have not sorted our linked list and there is no reason to insert a new node in any special place in the list. # The list. called the header node. or in the backward direction from the end of the list to the beginning. Furthermore. # Languages like C that support dynamic memory allocation with structures and pointers use the following technique. i. called a two-way list. The term underflow refers to the situation where we want to delete data from a data structure that is empty. # Sometimes we insert new data into a data structure but there is no available space. called the predecessor of N. Then N is called the parent (or father) of S1 and S2. Analogously. which always contains a special node. contains the address of the next node in the list. Every node N in a binary tree T. The first part contains the information of the element. . # Binary tree is a nonlinear data structure called a tree.. Then the easiest place to insert the node is at the beginning of the list. S1 is called the left child (or son) of N. which has its own pointer. the free-storage list is empty. where the linear order (or linearity) is given by means of pointers. We divide each node into two parts. records. has a unique parent.MSc.g. In Software 39 Summary # A linked list is a linear collection of data elements. except the root. This situation is usually called overflow. and the second part. is called the list of available space or the freestorage list or the free pool. S1and S2 are said to be siblings (or brothers). e. # If ITEM is actually a key value and we are searching through a file for the record containing ITEM then ITEM can appear only once in LIST. The linked list node is defined as a structure. family and tables of contents. # We frequently use Circular header lists instead of ordinary linked lists because many operations are much easier to state and implement-using header lists. Specifically. and S2 is called the right child (or son) of N. 
# A header-linked list is a linked list. called the link field or next pointer field. In Software Zee Interactive Learning Systems 40 .MSc. MSc. QUEUES. Polish Notation Consider this C-Program on Stack Quicksort Recursion Factorial Function Fibonacci Sequence Divide-and-Conquer Algorithms Towers Of Hanoi Queues Representation of Queues 1 . In Software 5 STACKS. RECURSION Main Points Covered ! ! ! ! ! ! ! ! ! ! ! ! ! Introduction Stacks Array Representation Of Stacks Arithmetic Expressions. a stack of dishes. Observe a queue at the bus stop. stacks are also called last-in first-out (LIFO) lists. Another example of a queue is a batch of jobs waiting to be processed. assuming no job has higher priority than the others. the people at the front of the line board first. Thus queues are also called first-in first-out (FIFO) lists. E. which allows us to insert and delete elements at any place in the list i. boards last. in particular. Queue: A queue is a linear list in which items may be added only at one end and items may be removed only at the other end. its underlying property is that . For notational convenience. Remember that these terms are used only with stacks. F The implication is that the right-most element is the top element. Other names used for stacks are "piles" and "push-down lists.MSc. STACK: A . B. not with other data structures. or in the middle. the person who comes first. That is. we will frequently designate the stack by writing. This means. E. Each new person who comes takes his or her place at the end of the line. STACKS : Special terminology is used for two basic operations associated with stacks. Observe that an item may be added or removed only from the top of any of the stacks. regardless of the way a stack is described. C. B. An example of such a structure is. (b) "Pop" is the term used to delete an element from a stack. it has many important applications in computer science. 
Introduction
We have already learned linear lists and linear arrays, in which an element may be inserted or deleted anywhere: at the beginning, at the end, or in the middle. There are certain frequent situations in computer science when we want to restrict insertions and deletions so that they can take place only at the beginning or the end of the list, not in the middle. Two of the data structures that are useful in such situations are stacks and queues.

Stack: A stack is a linear structure in which items may be added or removed only at one end, called the top. Observe that the last item to be added to a stack is the first item to be removed. Accordingly, stacks are also called last-in first-out (LIFO) lists. Although the stack may seem to be a very restricted type of data structure, it has many important applications in computer science.

Queue: A queue is a linear list in which items may be added only at one end and removed only at the other end. The name "queue" likely comes from the everyday use of the term. Consider a queue of people waiting at a bus stop. Each new person takes his or her place at the end of the line. When the bus comes, the people at the front of the line board first. That is, whoever comes first boards first, and whoever comes last boards last.

Two special terms are used for the basic operations on a stack:
(a) "Push" is the term used to insert an element into a stack.
(b) "Pop" is the term used to delete an element from a stack.
We emphasize that these terms are used only with stacks, not with other data structures.

Example: Suppose the following 6 elements are pushed, in order, onto an empty stack:
A, B, C, D, E, F
Figure 5-1 shows three ways of picturing such a stack. Whichever picture we use, insertions and deletions can occur only at the top of the stack. This means E cannot be deleted before F is deleted, D cannot be deleted before E and F are deleted, and so on. Consequently, the elements may be popped from the stack only in the reverse order of that in which they were pushed onto the stack.

Fig 5-1 Diagrams of stacks

Postponed Decisions
We frequently use stacks to record the order in which data must be processed when certain steps of the processing must be postponed until other conditions are fulfilled. We illustrate this below.

Suppose that while processing some project A we are required to move on to project B, which we need to complete before we return to project A. We place the folder containing the data of A onto a stack, as pictured in figure 5-2(a), and begin to process B. Suppose further that while processing B we are led to project C, for the same reason. Then we place B on the stack above A, as pictured in figure 5-2(b), and begin to process C. Suppose also that while processing C we are likewise led to project D. Then we place C on the stack above B, as pictured in figure 5-2(c), and begin to process D.

Now suppose we are able to complete the processing of project D. Then the only project we may continue to process is project C, which is on top of the stack. Hence we remove folder C from the stack, leaving the stack as pictured in figure 5-2(d), and continue to process C. Similarly, after completing the processing of C, we remove folder B from the stack, leaving the stack as pictured in figure 5-2(e), and continue to process B. Finally, after completing the processing of B, we remove the last folder, A, leaving the empty stack pictured in figure 5-2(f), and continue the processing of our original project A.

Notice that, at each stage of the above processing, the stack automatically maintains the order that is required to complete the processing.

Fig. 5-2

ARRAY REPRESENTATION OF STACKS
We can represent stacks in the computer in various ways; usually we use a linear array. Unless otherwise stated or implied, each of our stacks will be maintained by:
- a linear array STACK;
- a pointer variable TOP, which contains the location of the top element of the stack;
- a variable MAXSTK, which gives the maximum number of elements that the stack can hold.
The condition TOP = 0 or TOP = NULL will indicate that the stack is empty.

Figure 5-3 pictures such an array representation of a stack. Since TOP = 3, the stack has three elements, XXX, YYY and ZZZ. Since MAXSTK = 8, there is room for 5 more items in the stack.

Fig 5-3 (STACK[1] = XXX, STACK[2] = YYY, STACK[3] = ZZZ; TOP = 3; MAXSTK = 8)

The procedure for adding (pushing) an element is called PUSH, and the procedure for removing (popping) an element is called POP.

Overflow: In executing the procedure PUSH, we have to test whether there is room in the stack for the new item. If not, then we have the condition known as overflow.

Underflow: The case is the same in executing the procedure POP. We must first test whether there is an element in the stack to be deleted. If not, then we have the condition known as underflow.

Procedure: PUSH(STACK, TOP, MAXSTK, ITEM)
This procedure pushes an ITEM onto a stack.
1. [Stack already filled?]
   If TOP = MAXSTK, then: Print OVERFLOW, and Return.
2. Set TOP = TOP + 1. [Increases TOP by 1.]
3. Set STACK[TOP] = ITEM. [Inserts ITEM in new TOP position.]
4. Return.

Procedure: POP(STACK, TOP, ITEM)
This procedure deletes the top element of STACK and assigns it to the variable ITEM.
1. [Stack has an item to be removed?]
   If TOP = 0, then: Print UNDERFLOW, and Return.
2. Set ITEM = STACK[TOP]. [Assigns TOP element to ITEM.]
3. Set TOP = TOP - 1. [Decreases TOP by 1.]
4. Return.

Frequently, TOP and MAXSTK are global variables. In that case the procedures may be called using only PUSH(STACK, ITEM) and POP(STACK, ITEM), respectively. Note that the value of TOP is changed before the insertion in PUSH, but after the deletion in POP.

Let us see a few examples of the above procedures.

(a) Consider the stack in figure 5-3. We simulate the operation PUSH(STACK, WWW):
1. Since TOP = 3, control is transferred to Step 2.
2. TOP = 3 + 1 = 4.
3. STACK[TOP] = STACK[4] = WWW.
4. Return.
Note that WWW is now the top element in the stack.

(b) Consider again the stack in figure 5-3. This time we simulate the operation POP(STACK, ITEM):
1. Since TOP = 3, control is transferred to Step 2.
2. ITEM = ZZZ.
3. TOP = 3 - 1 = 2.
4. Return.
Observe that STACK[TOP] = STACK[2] = YYY is now the top element in the stack.
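The PUSH and POP procedures above can be sketched in C as follows. This is a minimal sketch, not the notes' own code. The type Stack and the functions stack_init, stack_push and stack_pop are hypothetical names. Items are assumed to be strings such as "XXX". Slot 0 of the array is left unused so that the indices match STACK[1..MAXSTK] in the text.

```c
#include <stdio.h>
#include <stdbool.h>

#define MAXSTK 8  /* capacity, as in Fig 5-3 */

/* A stack kept in a linear array. Slot 0 is unused so that indices
   match STACK[1..MAXSTK]; top == 0 means the stack is empty. */
typedef struct {
    const char *items[MAXSTK + 1];
    int top;
} Stack;

void stack_init(Stack *s) { s->top = 0; }

/* PUSH: returns false (overflow) if the stack is already full. */
bool stack_push(Stack *s, const char *item)
{
    if (s->top == MAXSTK) {            /* Step 1: stack already filled? */
        fprintf(stderr, "OVERFLOW\n");
        return false;
    }
    s->top = s->top + 1;               /* Step 2: increase TOP by 1 */
    s->items[s->top] = item;           /* Step 3: insert ITEM at new TOP */
    return true;
}

/* POP: returns false (underflow) if there is nothing to remove. */
bool stack_pop(Stack *s, const char **item)
{
    if (s->top == 0) {                 /* Step 1: stack empty? */
        fprintf(stderr, "UNDERFLOW\n");
        return false;
    }
    *item = s->items[s->top];          /* Step 2: ITEM = STACK[TOP] */
    s->top = s->top - 1;               /* Step 3: decrease TOP by 1 */
    return true;
}
```

For example, pushing XXX, YYY and ZZZ reproduces Fig 5-3 with top = 3. A following stack_pop yields ZZZ and leaves top = 2, as in example (b). Returning a flag, rather than only printing a message, lets the caller decide how to react to overflow or underflow.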
Minimizing Overflow
There is an essential difference between underflow and overflow in dealing with stacks. Underflow depends exclusively on the given algorithm and the given input data, so the programmer has no direct control over it. Overflow, on the other hand, depends on the programmer's arbitrary choice of how much memory space to reserve for each stack, and this choice does influence the number of times overflow may occur.

Generally, the number of elements in a stack fluctuates as elements are added to or removed from it. Accordingly, choosing the amount of memory for a given stack involves a time-space tradeoff:
- Initially reserving a great deal of space for each stack will decrease the number of times overflow may occur. However, this may be an expensive use of space if most of the space is seldom used.
- Reserving a small amount of space for each stack may increase the number of times overflow occurs. The time required to resolve an overflow, for example by adding space to the stack, may then cost more than the space saved.

Various techniques have been developed which modify the array representation of stacks so that the space reserved for more than one stack is used more efficiently. Most of these techniques lie beyond the scope of this text; one of them is shown in Fig. 5-4.

Suppose we have been given an algorithm which requires two stacks, A and B. We can define an array STACKA with n1 elements for stack A and an array STACKB with n2 elements for stack B. Overflow will occur when either STACKA contains more than n1 elements or STACKB contains more than n2 elements.

Suppose instead that we define a single array STACK with n = n1 + n2 elements for stacks A and B together. As pictured in Fig. 5-4:
- STACK[1] is the bottom of stack A, and A "grows" to the right.
- STACK[n] is the bottom of stack B, and B "grows" to the left.
In this case, overflow will occur only when A and B together have more than n = n1 + n2 elements. This technique will usually decrease the number of times overflow occurs, even though we have not increased the total amount of space reserved for the two stacks. In using this data structure, the operations PUSH and POP need to be modified.

Fig. 5-4 Two stacks sharing one array
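The single-array technique of Fig. 5-4 might be sketched in C as follows. This is an illustration under stated assumptions, not code from the notes. Items are assumed to be integers, and the total capacity N = 10 is chosen only for illustration. TwoStacks, push_a, push_b, pop_a and pop_b are hypothetical names. Stack A grows to the right from STACK[1] and stack B grows to the left from STACK[N], so overflow is reported only when the two tops meet.

```c
#include <stdbool.h>

#define N 10  /* total space n = n1 + n2 shared by both stacks (assumed) */

/* Stack A grows to the right from items[1]; stack B grows to the
   left from items[N]. Slot 0 is unused to keep 1-based indices. */
typedef struct {
    int items[N + 1];
    int topA;   /* index of A's top element; 0 means A is empty      */
    int topB;   /* index of B's top element; N + 1 means B is empty  */
} TwoStacks;

void two_init(TwoStacks *s) { s->topA = 0; s->topB = N + 1; }

/* Overflow happens only when A and B together already hold N
   elements, i.e. when there is no free slot between the two tops. */
static bool is_full(const TwoStacks *s) { return s->topA + 1 == s->topB; }

bool push_a(TwoStacks *s, int x)
{
    if (is_full(s)) return false;           /* overflow */
    s->items[++s->topA] = x;
    return true;
}

bool push_b(TwoStacks *s, int x)
{
    if (is_full(s)) return false;           /* overflow */
    s->items[--s->topB] = x;
    return true;
}

bool pop_a(TwoStacks *s, int *x)
{
    if (s->topA == 0) return false;         /* underflow of A */
    *x = s->items[s->topA--];
    return true;
}

bool pop_b(TwoStacks *s, int *x)
{
    if (s->topB == N + 1) return false;     /* underflow of B */
    *x = s->items[s->topB++];
    return true;
}
```

With separate arrays of 5 elements each, pushing a seventh item onto A would overflow. Here A can take 7 items while B holds 3, and a push fails only when all 10 slots are in use.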
ARITHMETIC EXPRESSIONS; POLISH NOTATION
Let Q be an arithmetic expression involving constants and operations. In this section we give an algorithm which finds the value of Q by using reverse Polish (postfix) notation. We will see that the stack is an essential tool in this algorithm.

Recall that the binary operations in Q may have different levels of precedence. Specifically, we assume the following three levels of precedence for the usual five binary operations:
Highest: Exponentiation (^)
Next highest: Multiplication (*) and division (/)
Lowest: Addition (+) and subtraction (-)

Food for thought
Suppose we want to evaluate the following parenthesis-free arithmetic expression:
4 ^ 2 + 9 * 4 ^ 3 - 10 / 2
The right answer will be:
(a) 587   (b) 299990   (c) 999995
(a) is the right choice. First we evaluate the exponentiations to obtain
16 + 9 * 64 - 5
Then we evaluate the multiplication and division to obtain
16 + 576 - 5
Last, we evaluate the addition and subtraction to obtain the final result, 587. Observe that the expression is traversed three times, each time corresponding to a level of precedence of the operations.

Polish Notation (prefix notation)
For most common arithmetic operations, we place the operator symbol between its two operands. For example,
A + B    C - D    E * F    G / H
This is called infix notation. With this notation, we must distinguish between
(A + B) * C and A + (B * C)
by using either parentheses or some operator-precedence convention such as the usual precedence levels discussed above. Accordingly, in infix notation the order of the operators and operands in an expression does not uniquely determine the order in which the operations are to be performed.

Polish notation is named after the Polish mathematician Jan Lukasiewicz, who introduced it. It is the notation in which the operator symbol is placed before its two operands. For example,
+AB    -CD    *EF    /GH
We translate, step by step, the following infix expressions into Polish notation, using brackets [ ] to indicate a partial translation:
(A + B) * C = [+AB] * C = *+ABC
A + (B * C) = A + [*BC] = +A*BC
(A + B) / (C - D) = [+AB] / [-CD] = /+AB-CD
The fundamental property of Polish notation is that the order in which the operations are to be performed is completely determined by the positions of the operators and operands in the expression. Accordingly, one never needs parentheses when writing expressions in Polish notation.

Reverse Polish notation (postfix notation) refers to the analogous notation in which the operator symbol is placed after its two operands:
AB+    CD-    EF*    GH/
Again, we do not need parentheses to determine the order of the operations in any arithmetic expression written in reverse Polish notation.

The computer usually evaluates an arithmetic expression written in infix notation in two steps:
1. It converts the expression to postfix notation.
2. It evaluates the postfix expression.
In each step, the stack is the main tool used to accomplish the task. We first illustrate how stacks are used to evaluate postfix expressions. We then show how stacks are used to transform infix expressions into postfix expressions.

Evaluation of a Postfix Expression
Suppose P is an arithmetic expression written in postfix notation. The following algorithm, which uses a STACK to hold operands, evaluates P.

Algorithm: This algorithm finds the VALUE of an arithmetic expression P written in postfix notation.
1. Add a right parenthesis ")" at the end of P. [This acts as a sentinel.]
2. Scan P from left to right and repeat Steps 3 and 4 for each element of P until the sentinel ")" is encountered.
3. If an operand is encountered, put it on STACK.
4. If an operator ⊗ is encountered, then:
   (a) Remove the two top elements of STACK, where A is the top element and B is the next-to-top element.
   (b) Evaluate B ⊗ A.
   (c) Place the result of (b) back on STACK.
   [End of If structure.]
   [End of Step 2 loop.]
5. Set VALUE equal to the top element on STACK.
6. Exit.
We note that, when Step 5 is executed, there should be only one number on STACK.
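The postfix evaluation algorithm above can be sketched in C as follows. This sketch rests on several assumptions:
- The expression has already been split into tokens. The array of strings plays the role of the comma-separated P, and reaching the end of the array plays the role of the sentinel ")".
- Operands are integers, and "/" means integer division.
- The input is well formed, so there are no checks for stack underflow.
The names apply and eval_postfix are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAXSTK 100

/* Compute B (op) A, where A was the top element and B the next-to-top. */
static long apply(char op, long b, long a)
{
    long r = 1;
    switch (op) {
    case '+': return b + a;
    case '-': return b - a;
    case '*': return b * a;
    case '/': return b / a;              /* integer division in this sketch */
    case '^': while (a-- > 0) r *= b;    /* non-negative integer exponent */
              return r;
    }
    return 0;
}

/* Evaluate a postfix expression given as an array of n tokens. */
long eval_postfix(const char *tokens[], int n)
{
    long stack[MAXSTK];
    int top = 0;                          /* number of items on STACK */
    for (int i = 0; i < n; i++) {
        const char *t = tokens[i];
        if (strlen(t) == 1 && strchr("+-*/^", t[0])) {  /* Step 4 */
            long a = stack[--top];        /* A: top element */
            long b = stack[--top];        /* B: next-to-top element */
            stack[top++] = apply(t[0], b, a);
        } else {
            stack[top++] = strtol(t, NULL, 10);          /* Step 3: operand */
        }
    }
    return stack[top - 1];                /* Step 5: VALUE = top of STACK */
}
```

For instance, the token list 4, 2, ^, 9, 4, 3, ^, *, +, 10, 2, /, - is the postfix form of the "food for thought" expression, and evaluating it gives 587.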
Consider the arithmetic expression
5 * (6 + 2) - 12 / 4
In postfix notation it becomes
P: 5, 6, 2, +, *, 12, 4, /, -
(Commas are used to separate the elements of P so that 5, 6, 2 is not interpreted as the number 562.) After the sentinel ")" has been added, the table below shows the contents of STACK as each element of P is scanned.

Symbol scanned    STACK
(1)  5            5
(2)  6            5, 6
(3)  2            5, 6, 2
(4)  +            5, 8
(5)  *            40
(6)  12           40, 12
(7)  4            40, 12, 4
(8)  /            40, 3
(9)  -            37
(10) )            37

How the STACK works:
- STACK first receives 5, and then 6 and 2, one after another.
- When '+' is scanned, the two top numbers are removed from STACK and added, and the result, 8, is placed back on STACK.
- Next '*' is scanned; it multiplies the two numbers now on top of STACK, giving 40.
- Next 12 and 4 are placed on STACK one after another.
- When the division sign is scanned, it divides the two top numbers, giving 3. Remember that 40 is still on STACK below it.
- Now '-' is scanned and applied to 40 and 3, giving 37.
- The sentinel ')' marks the end of the procedure. VALUE = 37, the only number left on STACK.

Transforming Infix Expressions into Postfix Expressions
Let Q be an arithmetic expression written in infix notation. Besides operands and operators, Q may also contain left and right parentheses. We assume the following:
- The operators in Q consist only of exponentiations (^), multiplications (*), divisions (/), additions (+) and subtractions (-).
- These operators have the usual three levels of precedence given above.
- Operators on the same level, including exponentiations, are performed from left to right unless otherwise indicated by parentheses.
These assumptions are not standard, since expressions may contain unary operators and some languages perform exponentiations from right to left. However, they simplify our algorithm.

The following algorithm transforms the infix expression Q into its equivalent postfix expression P. It uses a stack to temporarily hold operators and left parentheses. The postfix expression P is constructed from left to right, using the operands from Q and the operators that are removed from STACK. We begin by pushing a left parenthesis onto STACK and adding a right parenthesis at the end of Q. The algorithm is completed when STACK is empty.

Algorithm: POLISH(Q, P)
Suppose Q is an arithmetic expression written in infix notation. This algorithm finds the equivalent postfix expression P.
1. Push "(" onto STACK, and add ")" to the end of Q.
2. Scan Q from left to right and repeat Steps 3 to 6 for each element of Q until the STACK is empty.
3. If an operand is encountered, add it to P.
4. If a left parenthesis is encountered, push it onto STACK.
5. If an operator ⊗ is encountered, then:
   (a) Repeatedly pop from STACK and add to P each operator (on the top of STACK) which has the same precedence as or higher precedence than ⊗.
   (b) Add ⊗ to STACK.
   [End of If structure.]
6. If a right parenthesis is encountered, then:
   (a) Repeatedly pop from STACK and add to P each operator (on the top of STACK) until a left parenthesis is encountered.
   (b) Remove the left parenthesis. [Do not add the left parenthesis to P.]
   [End of If structure.]
   [End of Step 2 loop.]
7. Exit.
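The POLISH algorithm can be sketched in C as below. It assumes single-character operands, no spaces, well-formed parentheses, and an output buffer p large enough for the result. The names polish and prec are hypothetical. The end of the input string is treated as the ")" that Step 1 appends to Q. The ">=" test in Step 5(a) makes operators of equal precedence, including ^, associate from left to right, as the text assumes.

```c
#include <string.h>

#define MAXSTK 100

/* Precedence levels from the text: ^ highest, then * and /, then + and -.
   Returns 0 for anything that is not one of the five binary operators,
   including "(", so Step 5(a) never pops past a left parenthesis. */
static int prec(char c)
{
    switch (c) {
    case '^': return 3;
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:  return 0;
    }
}

/* POLISH(Q, P): translate the infix string q into the postfix string p. */
void polish(const char *q, char *p)
{
    char stack[MAXSTK];
    int top = 0, k = 0;
    size_t len = strlen(q);

    stack[top++] = '(';                          /* Step 1: push "(" ...  */
    for (size_t i = 0; i <= len && top > 0; i++) {
        char c = (i == len) ? ')' : q[i];        /* ... and add ")" to Q  */
        if (c == '(') {
            stack[top++] = c;                    /* Step 4 */
        } else if (c == ')') {                   /* Step 6 */
            while (stack[top - 1] != '(')
                p[k++] = stack[--top];
            top--;                               /* drop "(", not added to P */
        } else if (prec(c) > 0) {                /* Step 5 */
            while (prec(stack[top - 1]) >= prec(c))
                p[k++] = stack[--top];
            stack[top++] = c;
        } else {
            p[k++] = c;                          /* Step 3: operand */
        }
    }
    p[k] = '\0';
}
```

For example, polish("(A+B)*C", p) stores "AB+C*" in p. Calling polish("A+(B*C-(D/E^F)*G)*H", p) produces "ABC*DEF^/G*-H*+".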
Consider this C Program on Stack
Suppose we wish to write a function that reads a line of input and then writes it backward. We can do this by putting each character onto a stack as it is read. When the line is finished, we pop the characters off the stack, and they come off in the reverse order. Consider this function first:

    void ReverseRead(void)
    {
        item_type item;
        item_type *item_ptr = &item;
        stack_type stack;
        stack_type *stack_ptr = &stack;

        initialize(stack_ptr);                 /* initialize the stack to be empty */
        while (!full(stack_ptr) && (item = getchar()) != '\n')
            push(item, stack_ptr);             /* push each item onto the stack */
        while (!empty(stack_ptr)) {
            pop(item_ptr, stack_ptr);          /* pop an item from the stack */
            putchar(item);
        }
        putchar('\n');
    }

This is a typical program for a stack. The above function calls five other functions:
1) push()
2) pop()
3) initialize()
4) empty()
5) full()

Difficult to understand? Read on; our rule is one step at a time. Let MAXSTACK be a symbolic constant giving the maximum size allowed for stacks, and let item_type be a type describing the data that will be put into the stack. The first few lines of your program will then look like this:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXSTACK 10

    typedef char item_type;
    typedef int Boolean_type;    /* not defined in the text; int is assumed (nonzero = true) */

    typedef struct struct_tag {
        int top;
        item_type entry[MAXSTACK];
    } stack_type;

Let us now discuss the prototypes of all these functions:

    void Error(char *);
    void initialize(stack_type *);
    void push(item_type, stack_type *);
    void pop(item_type *, stack_type *);
    Boolean_type empty(stack_type *);
    Boolean_type full(stack_type *);

    /* Push: push an item onto the stack. */
    void push(item_type item, stack_type *stack_ptr)
    {
        if (stack_ptr->top >= MAXSTACK)
            Error("Stack is full");
        else
            stack_ptr->entry[stack_ptr->top++] = item;
    }

    /* Pop: pop an item from the stack. */
    void pop(item_type *item_ptr, stack_type *stack_ptr)
    {
        if (stack_ptr->top <= 0)
            Error("Stack is empty");
        else
            *item_ptr = stack_ptr->entry[--stack_ptr->top];
    }

    /* Error: print error message and terminate the program. */
    void Error(char *s)
    {
        fprintf(stderr, "%s\n", s);
        exit(1);
    }

Error is a function that receives a pointer to a character string, prints the string, and then invokes the standard function exit to terminate the program. In normal circumstances Error is never executed. As in ReverseRead, we check whether the stack is full before we push an item, and whether it is empty before we pop an item.

The next function initializes a stack to be empty before it is first used in a program:

    /* Initialize: initialize the stack to be empty. */
    void initialize(stack_type *stack_ptr)
    {
        stack_ptr->top = 0;
    }

OTHER OPERATIONS

    /* Empty: returns non-zero if the stack is empty. */
    Boolean_type empty(stack_type *stack_ptr)
    {
        return stack_ptr->top <= 0;
    }

    /* Full: returns non-zero if the stack is full. */
    Boolean_type full(stack_type *stack_ptr)
    {
        return stack_ptr->top >= MAXSTACK;
    }

QUICKSORT, AN APPLICATION OF STACKS
Let A be a list of n data items. "Sorting A" refers to the operation of rearranging the elements of A so that they are in some logical order. Examples are numerical order when A contains numerical data, or alphabetical order when A contains character data. In this section we discuss only one sorting algorithm, called quicksort, in order to illustrate an application of stacks.

Quicksort is an algorithm of the divide-and-conquer type. That is, the problem of sorting a set is reduced to the problem of sorting two smaller sets. We illustrate this "reduction step" by means of a specific example. Suppose A is the following list of 12 numbers:
49, 37, 13, 57, 81, 93, 64, 102, 43, 25, 91, 69
The reduction step of the quicksort algorithm finds the final position of one of the numbers; in this illustration we use the first number, 49. This is accomplished as follows. Beginning with the last number, 69, scan the list from right to left, comparing each number with 49 and stopping at the first number less than 49. The number is 25. Interchange 49 and 25 to obtain the list
(25), 37, 13, 57, 81, 93, 64, 102, 43, (49), 91, 69
(Observe that the numbers 91 and 69 to the right of 49 are each greater than 49.)

Beginning with 25, next scan the list in the opposite direction, from left to right, comparing each number with 49 and stopping at the first number greater than 49. The number is 57. Interchange 49 and 57 to obtain the list
25, 37, 13, (49), 81, 93, 64, 102, 43, (57), 91, 69
(Observe that the numbers 25, 37 and 13 to the left of 49 are each less than 49.)

Beginning this time with 57, now scan the list in the original direction, from right to left, until you meet the first number less than 49. It is 43. Interchange 49 and 43 to obtain the list
25, 37, 13, (43), 81, 93, 64, 102, (49), 57, 91, 69
(Again, the numbers to the right of 49 are each greater than 49.)

Beginning with 43, scan the list from left to right. The first number greater than 49 is 81. Interchange 49 and 81 to obtain the list
25, 37, 13, 43, (49), 93, 64, 102, (81), 57, 91, 69
(Again, the numbers to the left of 49 are each less than 49.)

Beginning with 81, now scan the list in the original direction, from right to left, seeking a number less than 49. We do not meet such a number before meeting 49. This means all numbers have been scanned and compared with 49. Furthermore, all numbers less than 49 now form the sub-list of numbers to the left of 49, and all numbers greater than 49 form the sub-list of numbers to the right of 49, as shown below:
25, 37, 13, 43,    49,    93, 64, 102, 81, 57, 91, 69
(first sub-list)          (second sub-list)
Thus 49 is correctly placed in its final position. The task of sorting the original list A has now been reduced to the task of sorting each of the above sub-lists.

We have to repeat the above reduction step with each sub-list containing 2 or more elements. Since we can process only one sub-list at a time, we must keep track of some sub-lists for future processing. We do this with two stacks, called LOWER and UPPER, which temporarily "hold" such sub-lists. The addresses of the first and last elements of each sub-list, called its boundary values, are pushed onto LOWER and UPPER, respectively. The reduction step is applied to a sub-list only after its boundary values are removed from the stacks. The following procedure and algorithm show how the stacks LOWER and UPPER are used.

Procedure: QUICK(A, N, BEG, END, LOC)
Here A is an array with N elements. Parameters BEG and END contain the boundary values of the sublist of A to which this procedure applies. LOC keeps track of the position of the first element A[BEG] of the sublist during the procedure. The local variables LEFT and RIGHT will contain the boundary values of the list of elements that have not been scanned.
1. [Initialize.] Set LEFT = BEG, RIGHT = END and LOC = BEG.
2. [Scan from right to left.]
   (a) Repeat while A[LOC] ≤ A[RIGHT] and LOC ≠ RIGHT:
          RIGHT = RIGHT - 1.
       [End of loop.]
   (b) If LOC = RIGHT, then: Return.
   (c) If A[LOC] > A[RIGHT], then:
       (i) [Interchange A[LOC] and A[RIGHT].]
           TEMP = A[LOC], A[LOC] = A[RIGHT], A[RIGHT] = TEMP.
       (ii) Set LOC = RIGHT.
       (iii) Go to Step 3.
       [End of If structure.]
3. [Scan from left to right.]
   (a) Repeat while A[LEFT] ≤ A[LOC] and LEFT ≠ LOC:
          LEFT = LEFT + 1.
       [End of loop.]
   (b) If LOC = LEFT, then: Return.
   (c) If A[LEFT] > A[LOC], then:
       (i) [Interchange A[LEFT] and A[LOC].]
           TEMP = A[LOC], A[LOC] = A[LEFT], A[LEFT] = TEMP.
       (ii) Set LOC = LEFT.
       (iii) Go to Step 2.
       [End of If structure.]

Algorithm: (Quicksort) This algorithm sorts an array A with N elements.
1. [Initialize.] TOP = NULL.
2. [Push boundary values of A onto stacks when A has 2 or more elements.]
   If N > 1, then: TOP = TOP + 1, LOWER[1] = 1, UPPER[1] = N.
3. Repeat Steps 4 to 7 while TOP ≠ NULL.
4. [Pop sublist from stacks.]
   Set BEG = LOWER[TOP], END = UPPER[TOP], TOP = TOP - 1.
5. Call QUICK(A, N, BEG, END, LOC).
6. [Push left sublist onto stacks when it has 2 or more elements.]
   If BEG < LOC - 1, then: TOP = TOP + 1, LOWER[TOP] = BEG, UPPER[TOP] = LOC - 1.
   [End of If structure.]
7. [Push right sublist onto stacks when it has 2 or more elements.]
   If LOC + 1 < END, then: TOP = TOP + 1, LOWER[TOP] = LOC + 1, UPPER[TOP] = END.
   [End of If structure.]
   [End of Step 3 loop.]
8. Exit.
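The QUICK procedure and the Quicksort algorithm above can be sketched in C as below, using 0-based indices instead of the text's 1-based ones. The names quick and quicksort are hypothetical, and MAXSTACK is an assumed size for the LOWER and UPPER arrays. The sublists waiting on the stacks are disjoint and each has at least two elements, so an array of up to 2 * MAXSTACK elements cannot overflow them.

```c
#define MAXSTACK 64  /* assumed size of LOWER and UPPER */

/* QUICK: reduction step on a[beg..end]. Returns LOC, the final
   position of the element that started at a[beg]. */
int quick(int a[], int beg, int end)
{
    int left = beg, right = end, loc = beg, temp;
    for (;;) {
        /* Step 2: scan from right to left */
        while (a[loc] <= a[right] && loc != right)
            right--;
        if (loc == right) return loc;
        temp = a[loc]; a[loc] = a[right]; a[right] = temp;
        loc = right;
        /* Step 3: scan from left to right */
        while (a[left] <= a[loc] && left != loc)
            left++;
        if (loc == left) return loc;
        temp = a[loc]; a[loc] = a[left]; a[left] = temp;
        loc = left;
    }
}

/* Quicksort with explicit LOWER/UPPER stacks (no recursion). */
void quicksort(int a[], int n)
{
    int lower[MAXSTACK], upper[MAXSTACK];
    int top = 0;                                 /* TOP = NULL */
    if (n > 1) {                                 /* push boundary values of A */
        lower[top] = 0; upper[top] = n - 1; top++;
    }
    while (top > 0) {
        top--;                                   /* pop a sublist */
        int beg = lower[top], end = upper[top];
        int loc = quick(a, beg, end);
        if (beg < loc - 1) {                     /* left sublist, 2+ elements */
            lower[top] = beg; upper[top] = loc - 1; top++;
        }
        if (loc + 1 < end) {                     /* right sublist, 2+ elements */
            lower[top] = loc + 1; upper[top] = end; top++;
        }
    }
}
```

Applied to the example list 49, 37, 13, 57, 81, 93, 64, 102, 43, 25, 91, 69, the first call quick(a, 0, 11) returns 4. It leaves 25, 37, 13, 43, 49, 93, 64, 102, 81, 57, 91, 69, the same arrangement as in the reduction example. The "≤" comparisons matter: with a strict "<", a list such as 5, 3, 5 would stop with 3 still to the right of the first 5.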
Let us now discuss a little C function on this logic. It uses a recursive function; we will discuss recursion a little later. The text does not give the type definitions, the LT macro or the swap function. The versions shown first are assumptions added so that the code compiles.

    /* Assumed supporting definitions (not given in the text). */
    #define MAXLIST 100
    #define LT(a, b) ((a) < (b))

    typedef int key_type;
    typedef struct { key_type key; } entry_type;
    typedef struct { int count; entry_type entry[MAXLIST]; } list_type;

    static void swap(int i, int j, list_type *lp)
    {
        entry_type temp = lp->entry[i];
        lp->entry[i] = lp->entry[j];
        lp->entry[j] = temp;
    }

    void sort(list_type *lp, int low, int high);
    int partition(list_type *lp, int low, int high);

    /* quick_sort: sort a contiguous list. */
    list_type *quick_sort(list_type *lp)
    {
        sort(lp, 0, lp->count - 1);
        return lp;
    }

    /* sort: sort the contiguous list between low and high (recursive). */
    void sort(list_type *lp, int low, int high)
    {
        int pivotloc;

        if (low < high) {
            pivotloc = partition(lp, low, high);
            sort(lp, low, pivotloc - 1);
            sort(lp, pivotloc + 1, high);
        }
    }

    /* partition: return the pivot location for a contiguous list. */
    int partition(list_type *lp, int low, int high)
    {
        int i, pivotloc;
        key_type pivotkey;

        swap(low, (low + high) / 2, lp);   /* swap pivot into first position */
        pivotkey = lp->entry[low].key;
        pivotloc = low;                    /* remember its location */
        for (i = low + 1; i <= high; i++)
            if (LT(lp->entry[i].key, pivotkey))
                swap(++pivotloc, i, lp);
        swap(low, pivotloc, lp);
        return pivotloc;
    }

Read this function very carefully; we will discuss it further in our program section. Note that this C version takes the middle element as the pivot and partitions with a single left-to-right scan, so it is a variation on the QUICK procedure above rather than a direct translation.

Complexity of the Quicksort Algorithm
We can measure the running time of a sorting algorithm by the number f(n) of comparisons required to sort n elements. Generally speaking, the quicksort algorithm has a worst-case running time of order n^2/2 but an average-case running time of order n log n. The reason for this is indicated below.

For the QUICK procedure above, which uses the first element of each sublist, the worst case occurs when the list is already sorted. Then the first element requires n comparisons to recognize that it remains in the first position. The first sublist will be empty, and the second sublist will have n - 1 elements. Accordingly, the second element requires n - 1 comparisons to recognize that it remains in the second position, and so on. Consequently, there will be a total of
f(n) = n + (n - 1) + ... + 2 + 1 = n(n + 1)/2 = n^2/2 + O(n) = O(n^2)
comparisons. Observe that this is equal to the complexity of the bubble sort algorithm.

The average-case complexity f(n) = O(n log n) comes from the fact that, on average, each reduction step of the algorithm produces two sublists. Accordingly:
(1) Reducing the initial list places 1 element and produces two sublists.
(2) Reducing the two sublists places 2 elements and produces four sublists.
(3) Reducing the four sublists places 4 elements and produces eight sublists.
(4) Reducing the eight sublists places 8 elements and produces sixteen sublists.
And so on. Observe that the reduction step in the kth level finds the location of 2^(k-1) elements, so there will be approximately log2 n levels of reduction steps. Each level uses at most n comparisons, so f(n) = O(n log n). In fact, mathematical analysis and empirical evidence have both shown that f(n) ≈ 1.4 [n log n] is the expected number of comparisons for the quicksort algorithm.

RECURSION
Recursion is an important concept in computer science. Many algorithms are best described in terms of recursion. In this section we introduce this powerful tool.
Consider a procedure P containing either a Call statement to itself or a Call statement to a second procedure that may eventually result in a Call statement back to P. Then P is called a recursive procedure. So that the program does not continue to run indefinitely, a recursive procedure must have the following two properties:
(1) There must be certain criteria, called base criteria, for which the procedure does not call itself.
(2) Each time the procedure does call itself (directly or indirectly), it must be closer to the base criteria.
A recursive procedure with these two properties is said to be well defined.

Similarly, a function is said to be recursively defined if the function definition refers to itself. Again, in order for the definition not to be circular, it must have the following two properties:
(1) There must be certain arguments, called base values, for which the function does not refer to itself.
(2) Each time the function does refer to itself, the argument of the function must be closer to a base value.
A recursive function with these two properties is also said to be well defined. The following examples should help clarify these ideas.

Factorial Function
We can explain recursion with the example of the factorial function. The product of the positive integers from 1 to n, inclusive, is called "n factorial" and is usually denoted by n!:
n! = 1 · 2 · 3 ··· (n - 2)(n - 1)n
It is also convenient to define 0! = 1, so that the function is defined for all nonnegative integers. Thus we have
0! = 1
1! = 1
2! = 1 · 2 = 2
3! = 1 · 2 · 3 = 6
4! = 1 · 2 · 3 · 4 = 24
5! = 1 · 2 · 3 · 4 · 5 = 120
6! = 1 · 2 · 3 · 4 · 5 · 6 = 720
and so on. Observe that
5! = 5 · 4! = 5 · 24 = 120 and 6! = 6 · 5! = 6 · 120 = 720
This is true for every positive integer n; that is,
n! = n · (n - 1)!
Accordingly, the factorial function may also be defined as follows:

Definition: (Factorial Function)
(a) If n = 0, then n! = 1.
(b) If n > 0, then n! = n · (n - 1)!

Observe that this definition of n! is recursive, since it refers to itself when it uses (n - 1)!. However:
(a) the value of n! is explicitly given when n = 0 (thus 0 is the base value), and
(b) the value of n! for arbitrary n is defined in terms of a smaller value of n, which is closer to the base value 0.
Accordingly, the definition is not circular; in other words, the function is well defined.

The following are two procedures that each calculate n factorial.

Procedure A: FACTORIAL(FACT, N)
This procedure calculates N! and returns the value in the variable FACT.
1. If N = 0, then: Set FACT = 1, and Return.
2. Set FACT = 1. [Initializes FACT for loop.]
3. Repeat for K = 1 to N:
      Set FACT = K * FACT.
   [End of loop.]
4. Return.

Procedure B: FACTORIAL(FACT, N)
This procedure calculates N! and returns the value in the variable FACT.
1. If N = 0, then: Set FACT = 1, and Return.
2. Call FACTORIAL(FACT, N - 1).
3. Set FACT = N * FACT.
4. Return.

The first procedure evaluates N! using an iterative loop. The second procedure, on the other hand, is recursive, since it contains a call to itself.

Suppose P is a recursive procedure. During the running of an algorithm or a program which contains P, we associate a level number with each execution of P as follows. The original execution of P is assigned level 1. Each time P is executed because of a recursive call, its level is 1 more than the level of the execution that made the recursive call. The depth of recursion of P with a given set of arguments is the maximum level number of P during its execution.
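Procedures A and B might be written in C as the two functions below. This is a minimal sketch: fact_iter and fact_rec are hypothetical names, and unsigned long is assumed to be wide enough for the values used.

```c
/* Procedure A: iterative (loop) version. */
unsigned long fact_iter(unsigned n)
{
    unsigned long fact = 1;      /* initializes FACT; also the answer for n = 0 */
    for (unsigned k = 1; k <= n; k++)
        fact = k * fact;
    return fact;
}

/* Procedure B: recursive version. n == 0 is the base value, and each
   recursive call uses n - 1, which is closer to that base value. */
unsigned long fact_rec(unsigned n)
{
    if (n == 0)
        return 1;
    return n * fact_rec(n - 1);
}
```

In the recursive version, fact_rec(n) is executed at level 1, fact_rec(n - 1) at level 2, and so on down to fact_rec(0) at level n + 1. The depth of recursion is therefore n + 1, while the loop version uses no recursion at all.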
N) This procedure calculates FN and returns the value in the first parameter FIB. and with a searching algorithm. We can view a divide-and-conquer algorithm as a recursive procedure. then: Set FIB = N. 1. F0. We can use the quicksort algorithm to find the location of a single element and to reduce the problem of sorting the entire set to the problem of sorting smaller sets. 0. 2. 3. then Fn = Fn-2 + Fn-1. Divide-and-Conquer Algorithms We consider a problem P associated with a set S. 34. N-1) 4. N -2) 3. 1. 5. 55. if N = 0 or N = 1. 13. We can use the binary search algorithm to divide the given sorted set into two halves so that the problem of searching for an item in the entire set is reduced to the problem of searching for the item in one of the two halves. …) is as follows. Set FIB = FIBA + FIBB 5. In Software 20 Fibonacci Sequence The celebrated Fibonacci sequence (usually denoted by F0. For example. 8. (0.(0. Specifically. then A (m. n) = n + 1 (b) If m ≠ 0 but n = 0. TOWERS OF HANOI In the preceding section we have given some examples of recursive definition and procedures. n)…. n) is explicitly given only when m = 0. In this section we show how recursion may be used as a tool in developing an algorithm to solve a particular problem. Its importance comes from its use in mathematical logic. (0. (0. In Software 21 Ackermann Function The Ackermann function is a function with two arguments each of which can be assigned any nonnegative integer: 0. 2.…… This function is defined as follows. labeled A. B and C. 2). The function is stated here in order to give another example of a classical recursive function and to show that the recursion part of a definition may be complicated. A (m. then A(m. The object of the game is to move the disks from peg A to peg C using peg B as an auxiliary. the value of any A (m. n) = A (m . The rules of the game are as follows: (a) (b) We can move only one disk at a time. 1.1. Definition: (Ackermann Function) (a) If m = 0. n) = A(m-1. 
Observe that A (m. 1). n) may eventually be expressed in terms of the value of the function on one or more of the base pairs. At no time can a larger disk be placed on a smaller disk. we have a recursive definition. The problem we pick is known as the Towers of Hanoi problem.…. . only the top disk on any peg may be moved to any other peg. Although it is not obvious from the definition. Suppose we have been given three pegs. (0.MSc. 3). and suppose on peg A there are placed a finite number n of disks with decreasing size. 1). since the definition refers to itself in parts (b) and (c). 0). then a (m. n -1)) Once more. (c) If m ≠ 0 and n ≠ 0. This is pictured in figure 5-5 for the case n = 6. The base criteria are the pairs. The Ackermann function is too complex to evaluate on any example. C. 5-5 Sometimes we will write X Y to denote the instruction "Move top disk from peg X to peg Y.MSc. A. " where X and Y may be any of the three pegs. B. C. C. . 5-6. B. The solution to the Towers of Hanoi problem for n = 3 appears in fig. Observe that it consists of the following seven moves: n=3 Move Move Move Move Move Move Move top top top top top top top disk disk disk disk disk disk disk from from from from from from from peg peg peg peg peg peg peg A A C A B B A to to to to to to to peg peg peg peg peg peg peg C. In Software 22 Fig. we also give the solution to the Towers of Hanoi problem for n = 1 and n = 2 .MSc. For completeness. In Software 23 Fig 5-6 In other words. and then we move the top five disks from peg B to peg C. First we observe that the solution to the Towers of Hanoi problem for n > 1 disks may be reduced to the following subproblems. (2) Move the top disk from peg A to peg C: A C (3) Move the top n . In Software n = 1: n = 2: A A C B. we use the technique of recursion to develop a general solution. The reduction is illustrated in figure 5-7 for n = 6.1 disks from peg A to peg B. then we move the large disk from peg A to peg C. (1) Move the top n . . 
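TOWERS OF HANOI
In the preceding section we have given some examples of recursive definitions and procedures. In this section we show how recursion may be used as a tool in developing an algorithm to solve a particular problem. The problem we pick is known as the Towers of Hanoi problem.

Suppose we have been given three pegs, labeled A, B and C, and suppose on peg A there are placed a finite number n of disks with decreasing size. This is pictured in figure 5-5 for the case n = 6. The object of the game is to move the disks from peg A to peg C using peg B as an auxiliary. The rules of the game are as follows:
(a) We can move only one disk at a time. Specifically, only the top disk on any peg may be moved to any other peg.
(b) At no time can a larger disk be placed on a smaller disk.
Fig. 5-5

Sometimes we will write X → Y to denote the instruction "Move top disk from peg X to peg Y," where X and Y may be any of the three pegs. The solution to the Towers of Hanoi problem for n = 3 appears in fig. 5-6. Observe that it consists of the following seven moves:
n = 3:
Move top disk from peg A to peg C.
Move top disk from peg A to peg B.
Move top disk from peg C to peg B.
Move top disk from peg A to peg C.
Move top disk from peg B to peg A.
Move top disk from peg B to peg C.
Move top disk from peg A to peg C.
Fig. 5-6

For completeness, we also give the solution to the Towers of Hanoi problem for n = 1 and n = 2:
n = 1: A → C
n = 2: A → B, A → C, B → C
Note that n = 1 uses only one move and that n = 2 uses three moves.

Rather than finding a separate solution for each n, we use the technique of recursion to develop a general solution. First we observe that the solution to the Towers of Hanoi problem for n > 1 disks may be reduced to the following subproblems:
(1) Move the top n − 1 disks from peg A to peg B.
(2) Move the top disk from peg A to peg C: A → C.
(3) Move the top n − 1 disks from peg B to peg C.
The reduction is illustrated in figure 5-7 for n = 6. That is, first we move the top five disks from peg A to peg B, then we move the large disk from peg A to peg C, and then we move the top five disks from peg B to peg C.
Fig. 5-7

Let us now introduce the general notation TOWER(N, BEG, AUX, END) to denote a procedure which moves the top n disks from the initial peg BEG to the final peg END using the peg AUX as an auxiliary. When n = 1, we have the following obvious solution: TOWER(1, BEG, AUX, END) consists of the single instruction BEG → END. Furthermore, as discussed above, when n > 1, the solution may be reduced to the solution of the following three sub-problems:
(1) TOWER(N − 1, BEG, END, AUX)
(2) TOWER(1, BEG, AUX, END), or BEG → END
(3) TOWER(N − 1, AUX, BEG, END)
Observe that each of these three sub-problems may be solved directly or is essentially the same as the original problem using fewer disks. Accordingly, this reduction process does yield a recursive solution to the Towers of Hanoi problem. Observe that the recursive solution for n = 4 disks consists of the following 15 moves:
A → B, A → C, B → C, A → B, C → A, C → B, A → B, A → C, B → C, B → A, C → A, B → C, A → B, A → C, B → C
In general, this recursive solution requires $f(n) = 2^n - 1$ moves for n disks.

Procedure: TOWER(N, BEG, AUX, END)
This procedure gives a recursive solution to the Towers of Hanoi problem for N disks.
1. If N = 1, then:
   (a) Write: BEG → END.
   (b) Return.
   [End of If structure.]
2. [Move N − 1 disks from peg BEG to peg AUX.]
   Call TOWER(N − 1, BEG, END, AUX).
3. Write: BEG → END.
4. [Move N − 1 disks from peg AUX to peg END.]
   Call TOWER(N − 1, AUX, BEG, END).
5. Return.

One can view this solution as a divide-and-conquer algorithm, since the solution for n disks is reduced to a solution for n − 1 disks and a solution for n = 1 disk. The Towers of Hanoi problem illustrates the power of recursion in the solution of various algorithmic problems.

Let us now discuss a small C program on this.

    #include <stdio.h>

    #define DISK 3   /* number of disks; 64 disks would need 2^64 - 1 moves */

    void move(int n, int a, int b, int c);

    int main(void)
    {
        move(DISK, 1, 2, 3);   /* pegs are numbered 1, 2 and 3 */
        return 0;
    }

    /* Move n disks from peg a to peg c, using peg b as the auxiliary peg. */
    void move(int n, int a, int b, int c)
    {
        if (n > 0) {
            move(n - 1, a, c, b);                        /* n-1 disks: a -> b */
            printf("Move a disk from %d to %d.\n", a, c);
            move(n - 1, b, a, c);                        /* n-1 disks: b -> c */
        }
    }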
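To check the move list for n = 3 and the count $2^n - 1$ mechanically, TOWER can record its moves instead of writing them. This is a sketch under assumptions: the names `tower` and `struct hanoi_move` are hypothetical, and the caller must supply room for $2^n - 1$ moves.

```c
#include <stddef.h>

/* One instruction BEG -> END: the top disk moves from peg `from` to peg `to`. */
struct hanoi_move { char from, to; };

/* TOWER(N, BEG, AUX, END) for n >= 1, storing each move in out[*count]. */
void tower(unsigned int n, char beg, char aux, char end,
           struct hanoi_move *out, size_t *count)
{
    if (n == 1) {                            /* step 1: BEG -> END */
        out[*count].from = beg;
        out[*count].to = end;
        (*count)++;
        return;
    }
    tower(n - 1, beg, end, aux, out, count); /* step 2: N-1 disks BEG -> AUX */
    tower(1, beg, aux, end, out, count);     /* step 3: BEG -> END */
    tower(n - 1, aux, beg, end, out, count); /* step 4: N-1 disks AUX -> END */
}
```

Calling `tower(3, 'A', 'B', 'C', moves, &count)` leaves `count` equal to 7, with the moves in the same order as the n = 3 list above; for n = 6 the count is 63.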
QUEUES
A queue is a linear list of elements in which deletion can take place only at one end, called the front, and insertions can take place only at the other end, called the rear. We use the terms "front" and "rear" to describe a linear list only when it is implemented as a queue. We also call queues first-in first-out (FIFO) lists, since the first element in a queue will be the first element out of the queue. In other words, the order in which elements enter a queue is the order in which they leave. This contrasts with stacks, which are last-in first-out (LIFO) lists.

Queues are important in everyday life. The automobiles waiting to pass through an intersection form a queue, in which the first car in line is the first car through. People waiting in line at a bank form a queue, where the first person in line is the first person to be waited on. An important example of a queue in computer science occurs in a timesharing system, in which programs with the same priority form a queue while waiting to be executed.

Example
Figure 5-8(a) is a schematic diagram of a queue with 4 elements, where AAA is the front element and DDD is the rear element. Observe that the front and rear elements of the queue are also, respectively, the first and last elements of the list. Suppose we delete an element from the queue. Then it must be AAA. This yields the queue in figure (b), where BBB is the front element. Next, suppose EEE is added to the queue and then FFF is added to the queue. Then they must be added at the rear of the queue, as pictured in figure (c). Note that FFF is now the rear element. Now suppose we delete another element from the queue; then it must be BBB, which yields the queue in figure (d). And so on. Observe that in such a data structure, EEE will be deleted before FFF because it has been placed in the queue before FFF. However, EEE will have to wait until CCC and DDD are deleted.
Fig. 5-8
Representation of Queues
We can represent queues in the computer in various ways, usually by means of one-way lists or linear arrays. Unless otherwise stated or implied, each of our queues will be maintained by a linear array QUEUE and two pointer variables: FRONT, containing the location of the front element of the queue, and REAR, containing the location of the rear element of the queue. The condition FRONT = NULL will indicate that the queue is empty.

Figure 5-9 shows the way the array will be stored in memory using an array QUEUE with N elements. This figure also indicates the way elements will be deleted from the queue and the way new elements will be added to the queue. Observe that whenever an element is deleted from the queue, the value of FRONT is increased by 1; this can be implemented by the assignment
FRONT = FRONT + 1
Similarly, whenever an element is added to the queue, the value of REAR is increased by 1; this can be implemented by the assignment
REAR = REAR + 1
This means that after N insertions, the rear element of the queue will occupy QUEUE[N]; in other words, eventually the queue will occupy the last part of the array. This occurs even though the queue itself may not contain many elements.

Suppose we want to insert an element ITEM into a queue at the time the queue does occupy the last part of the array, i.e., when REAR = N. One way to do this is to simply move the entire queue to the beginning of the array, changing FRONT and REAR accordingly, and then inserting ITEM as above. This procedure may be very expensive. The procedure we adopt is to assume that the array QUEUE is circular, that is, that QUEUE[1] comes after QUEUE[N] in the array. With this assumption, we insert ITEM into the queue by assigning ITEM to QUEUE[1]. Specifically, instead of increasing REAR to N + 1, we reset REAR = 1 and then assign
QUEUE[REAR] = ITEM
Similarly, if FRONT = N and an element of QUEUE is deleted, we reset FRONT = 1 instead of increasing FRONT to N + 1.
Fig. 5-9

Suppose that our queue contains only one element, i.e.,
FRONT = REAR ≠ NULL
and suppose that the element is deleted. Then we assign FRONT = NULL and REAR = NULL to indicate that the queue is empty.

Comparative definition between Stack & Queue
Stack:
• Initialise the stack to be empty.
• Determine if the stack is empty or not.
• Determine if the stack is full or not.
• If the stack is not full, insert a node at the top end of the stack.
• If the stack is not empty, retrieve the top node.
• If the stack is not empty, delete the node at the top.
Queue:
• Initialise the queue to be empty.
• Determine if the queue is empty or not.
• Determine if the queue is full or not.
• If the queue is not full, insert a node at the end (rear) of the queue.
• If the queue is not empty, retrieve the first node.
• If the queue is not empty, delete the first node.
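The circular-array scheme and the queue operations listed above can be sketched in C. The sketch keeps the 1-based QUEUE[1..N] convention and lets index 0 play the role of NULL; the capacity name `QSIZE`, the value 5, and the function names are assumptions chosen for illustration.

```c
#include <stdbool.h>

#define QSIZE 5   /* N in the text; an arbitrary capacity for this sketch */

/* QUEUE[1..QSIZE] holds the elements; FRONT = REAR = 0 stands for NULL. */
struct queue {
    int items[QSIZE + 1];
    int front, rear;
};

void queue_init(struct queue *q) { q->front = q->rear = 0; }

bool queue_insert(struct queue *q, int item)
{
    if ((q->front == 1 && q->rear == QSIZE) || q->front == q->rear + 1)
        return false;                   /* overflow: every cell is in use */
    if (q->front == 0)                  /* queue was empty */
        q->front = q->rear = 1;
    else if (q->rear == QSIZE)          /* wrap: QUEUE[1] comes after QUEUE[N] */
        q->rear = 1;
    else
        q->rear = q->rear + 1;
    q->items[q->rear] = item;
    return true;
}

bool queue_delete(struct queue *q, int *item)
{
    if (q->front == 0)
        return false;                   /* underflow: queue is empty */
    *item = q->items[q->front];
    if (q->front == q->rear)            /* it was the only element */
        q->front = q->rear = 0;
    else if (q->front == QSIZE)         /* wrap FRONT back to 1 */
        q->front = 1;
    else
        q->front = q->front + 1;
    return true;
}
```

After inserting five items and deleting two, a sixth insertion succeeds by wrapping REAR back to 1, which is exactly the case that would otherwise force the whole queue to be moved.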
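Summary
• A stack is a linear structure in which items may be added or removed only at one end. Stacks are also called last-in first-out (LIFO) lists. Other names used for stacks are "piles" and "push-down lists." "Push" is the term used to insert an element into a stack, and "Pop" is the term used to delete an element from a stack.
• Consider a procedure P containing either a Call statement to itself or a Call statement to a second procedure that may eventually result in a Call statement back to the original procedure P. Then P is called a recursive procedure.
• We consider a problem P associated with a set S. Suppose A is an algorithm which partitions S into smaller sets such that the solution of the problem P for S is reduced to the solution of P for one or more of the smaller sets. Then A is called a divide-and-conquer algorithm.
• Quicksort is an algorithm of the divide-and-conquer type. That is, the problem of sorting a set is reduced to the problem of sorting two smaller sets.
• A queue is a linear list of elements in which deletion can take place only at one end, called the front, and insertions can take place only at the other end, called the rear.

6 SORTING AND SEARCHING TECHNIQUES

MAIN POINTS COVERED
• Introduction
• Sorting
• Insertion Sort
• Selection Sort
• Merging
• Merge-sort
• Summary

INTRODUCTION
Sorting and searching are important operations in computer science. Sorting refers to the operation of arranging data in some given order, such as increasing or decreasing, with numerical data, or alphabetically, with character data. Searching refers to the operation of finding the location of a given item in a collection of items. We have already discussed some of the sorting and searching algorithms, such as linear and binary search, but there are many more sorting and searching algorithms. The particular algorithm one chooses depends on:
• the properties of the data
• the operations that one may perform on the data
We will discuss the complexity of each algorithm, that is, the running time f(n) of each algorithm, and the space requirements of our algorithms.

Sorting and searching apply to a file of records, and here are some standard terminologies of that field. Each record in a file F can contain many fields, but there may be one particular field whose values uniquely determine the records in the file. Such a field K is called a primary key, and the values k1, k2, … in such a field are called keys or key values. Sorting the file F usually refers to sorting F with respect to a particular primary key, and searching in F refers to searching for the record with a given key value.

SORTING
Let A be a list of n elements A1, A2, …, An in memory. Sorting A refers to the operation of rearranging the contents of A so that they are increasing in order (numerically or lexicographically), that is, so that A1 ≤ A2 ≤ A3 ≤ … ≤ An. Since A has n elements, there are n! ways that the contents can appear in A. Each sorting algorithm must take care of these n! possibilities.

Example
Suppose an array named DATA contains 8 elements as follows:
DATA: 87, 21, 44, 76, 55, 20, 2, 100
After sorting, DATA must appear in memory as follows:
DATA: 2, 20, 21, 44, 55, 76, 87, 100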
Since DATA consists of 8 elements, there are 8! = 40320 ways that the numbers 2, 20, …, 100 can appear in DATA.

We note that each sorting algorithm S will be made up of the following operations, where A1, A2, …, An contain the items to be sorted and B is an auxiliary location (used for temporary storage):
(a) Comparisons, which test whether Ai < Aj or test whether Ai < B.
(b) Interchanges, which switch the contents of Ai and Aj or of Ai and B.
(c) Assignments, which set B = Ai and then set Aj = B or Aj = Ai.
Normally, we use the complexity function to measure the number of comparisons, since the number of other operations is at most a constant factor of the number of comparisons.

Complexity of Sorting Algorithms
We measure the complexity of a sorting algorithm by its running time as a function of the number n of items to be sorted. There are two main cases whose complexity we will consider: the worst case and the average case. In studying the average case, we make the probabilistic assumption that all the n! permutations of the given n items are equally likely.

Food for thought: In average case analysis, what is the usual assumption that one has to make?
(a) The elements are sorted in ascending order.
(b) The elements are sorted in descending order.
(c) The probabilistic assumption that all the n! permutations are equally likely.
(d) No assumptions are required.
(c) is the correct choice.

Just to give you a feel of how things work, we give you the approximate number of comparisons and the order of complexity of some algorithms that we have already discussed in the previous modules:

Algorithm      Worst Case                 Average Case
Bubble Sort    n(n−1)/2 = O(n²)           n(n−1)/2 = O(n²)
Quicksort      n(n+3)/2 = O(n²)           1.4 n log n = O(n log n)
Heapsort       3n log n = O(n log n)      3n log n = O(n log n)

Remark: Note first that the bubble sort is a very slow way of sorting; its main advantage is the simplicity of the algorithm. Observe that the average-case complexity (n log n) of heapsort is the same as that of quicksort, but its worst-case complexity (n log n) seems quicker than that of quicksort (n²). However, empirical evidence seems to indicate that quicksort is superior to heapsort except on rare occasions.

Sorting Files, Sorting Pointers
Suppose a file F of records R1, R2, …, Rn is stored in memory. "Sorting F" refers to sorting F with respect to some field K with corresponding values k1, k2, …, kn. That is, the records are ordered so that
k1 ≤ k2 ≤ … ≤ kn
The field K is called the sort key. (Recall that we call K a primary key if its values uniquely determine the records in F.) Sorting the file with respect to another key will order the records in another way.

Example
Suppose the personnel file of a company contains the following data on each of its employees: Name, Employee Number, Sex, Salary. Sorting the file with respect to the Name key will yield a different order of the records than sorting the file with respect to the Employee Number key. The company may want to sort the file according to the Salary field even though the field may not uniquely determine the employees. Sorting the file with respect to the Sex key will likely be useless; it simply separates the employees into two subfiles, one with the male employees and one with the female employees.

Sorting a file F by reordering the records in memory may be very expensive when the records are very long. Moreover, the records may be in secondary memory, where it is even more time-consuming to move records into different locations. Accordingly, we may prefer to form an auxiliary array POINT containing pointers to the records in memory and then sort the array POINT with respect to a field KEY rather than sorting the records themselves. That is, we sort POINT so that
KEY[POINT[1]] ≤ KEY[POINT[2]] ≤ … ≤ KEY[POINT[N]]
Note that choosing a different field KEY will yield a different order of the array POINT.
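Sorting pointers instead of records can be sketched in C with an array of pointers playing the role of POINT. Everything named here is hypothetical: the `struct employee` layout mirrors the personnel example, KEY is taken to be the salary field, and the exchanges use simple bubble-sort passes only to keep the sketch short.

```c
/* Hypothetical personnel record, following the example above. */
struct employee {
    char name[32];
    int number;
    char sex;
    double salary;
};

/* Rearranges point[0..n-1] so that
   point[0]->salary <= point[1]->salary <= ... <= point[n-1]->salary.
   Only the pointers are exchanged; the records stay where they are. */
void sort_pointers_by_salary(struct employee *point[], int n)
{
    for (int pass = 1; pass < n; pass++)          /* bubble-sort passes */
        for (int j = 0; j < n - pass; j++)
            if (point[j]->salary > point[j + 1]->salary) {
                struct employee *tmp = point[j];  /* swap two pointers */
                point[j] = point[j + 1];
                point[j + 1] = tmp;
            }
}
```

Choosing a different KEY, such as the employee number, only changes the comparison; each exchange still moves a single pointer, however long the records are.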
INSERTION SORT
Suppose an array A with n elements A[1], A[2], …, A[N] is in memory. The insertion sort algorithm scans A from A[1] to A[N], inserting each element A[K] into its proper position in the previously sorted subarray A[1], A[2], …, A[K−1]. That is:
Pass 1. A[1] by itself is trivially sorted.
Pass 2. A[2] is inserted either before or after A[1] so that A[1], A[2] is sorted.
Pass 3. A[3] is inserted into its proper place in A[1], A[2], that is, before A[1], between A[1] and A[2], or after A[2], so that A[1], A[2], A[3] is sorted.
Pass 4. A[4] is inserted into its proper place in A[1], A[2], A[3] so that A[1], A[2], A[3], A[4] is sorted.
…
Pass N. A[N] is inserted into its proper place in A[1], A[2], …, A[N−1] so that A[1], A[2], …, A[N] is sorted.

Real life instances where insertion sort is used: this sorting algorithm is frequently used when n is small.

Example
Suppose an array A contains 8 elements as follows: 7, 3, 4, 1, 8, 2, 6, 5. The accompanying figure illustrates the insertion sort algorithm on this array. The circled element indicates A[k] in each pass of the algorithm, and the arrow indicates the proper place for inserting A[k].

There remains only the problem of deciding how to insert A[k] in its proper place in the sorted subarray A[1], A[2], …, A[k−1]. We can accomplish this by comparing A[k] with A[k−1], comparing A[k] with A[k−2], comparing A[k] with A[k−3], and so on, until first meeting an element A[j] such that A[j] ≤ A[k]. Then each of the elements A[k−1], A[k−2], …, A[j+1] is moved forward one location, and A[k] is then inserted in the (j+1)st position in the array. The algorithm is simplified if there always is an element A[j] such that A[j] ≤ A[k]; otherwise we must constantly check to see if we are comparing A[k] with A[1]. This condition can be accomplished by introducing a sentinel element A[0] = −∞ (or a very small number).

The formal statement of our insertion sort algorithm is as follows:

Algorithm: (Insertion Sort) INSERTION(A, N)
This algorithm sorts the array A with N elements.
1. Set A[0] = −∞. [Initializes sentinel element.]
2. Repeat Steps 3 to 5 for K = 2, 3, …, N:
3.    Set TEMP = A[K] and PTR = K − 1.
4.    Repeat while TEMP < A[PTR]:
         (a) Set A[PTR+1] = A[PTR]. [Moves element forward.]
         (b) Set PTR = PTR − 1.
      [End of loop.]
5.    Set A[PTR+1] = TEMP. [Inserts element in proper place.]
   [End of Step 2 loop.]
6. Return.
Complexity of Insertion Sort
We can easily compute the number f(n) of comparisons in the insertion sort algorithm. Observe that there is an inner loop, which is essentially controlled by the variable PTR, and there is an outer loop, which uses K as an index. First of all, we get the worst case when the array A is in reverse order and the inner loop must use the maximum number K − 1 of comparisons. Hence

$$f(n) = 1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2} = O(n^2)$$

Furthermore, we can show that, on the average, there will be approximately (K − 1)/2 comparisons in the inner loop. Accordingly, for the average case,

$$f(n) = \frac{1}{2} + \frac{2}{2} + \cdots + \frac{n-1}{2} = \frac{n(n-1)}{4} = O(n^2)$$

Thus the insertion sort algorithm is a very slow algorithm when n is very large. The above results are summarized in the following table:

Algorithm         Worst Case            Average Case
Insertion Sort    n(n−1)/2 = O(n²)      n(n−1)/4 = O(n²)

Remark: Time may be saved by performing a binary search, rather than a linear search, to find the location in which to insert A[K] in the subarray A[1], A[2], …, A[K−1]. This requires, on the average, log K comparisons rather than (K − 1)/2 comparisons. However, one still needs to move (K − 1)/2 elements forward, on the average. Thus the order of complexity is not changed. Furthermore, insertion sort is usually used only when n is small, and in such a case the linear search is about as efficient as the binary search.
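The INSERTION algorithm, including its sentinel, can be sketched in C. The sketch keeps the 1-based indexing by reserving a[0] for the sentinel, so the array needs n + 1 cells; using `INT_MIN` in place of −∞ and the name `insertion_sort` are assumptions of this illustration.

```c
#include <limits.h>

/* INSERTION(A, N): sorts a[1..n]; a[0] is overwritten by the sentinel. */
void insertion_sort(int a[], int n)
{
    a[0] = INT_MIN;                     /* step 1: sentinel standing in for -infinity */
    for (int k = 2; k <= n; k++) {      /* step 2 */
        int temp = a[k];                /* step 3 */
        int ptr = k - 1;
        while (temp < a[ptr]) {         /* step 4: the sentinel always stops this loop */
            a[ptr + 1] = a[ptr];        /* move element forward */
            ptr = ptr - 1;
        }
        a[ptr + 1] = temp;              /* step 5: insert element in proper place */
    }
}
```

With `int a[9] = {0, 7, 3, 4, 1, 8, 2, 6, 5}`, the call `insertion_sort(a, 8)` leaves a[1..8] holding 1 through 8. The sentinel removes the test "have we reached A[1]?" from the inner loop, which is exactly the simplification described above.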
SELECTION SORT
Suppose an array A with n elements A[1], A[2], …, A[n] is in memory. The selection sort algorithm for sorting A works as follows. First we find the smallest element in the list and put it in the first position; then we find the second smallest element in the list and put it in the second position; and so on. The stepwise procedure is as follows:
Pass 1. Find the location LOC of the smallest element in the list of n elements A[1], A[2], …, A[n], and then interchange A[LOC] and A[1]. Then A[1] is sorted.
Pass 2. Find the location LOC of the smallest element in the sublist of n − 1 elements A[2], A[3], …, A[n], and then interchange A[LOC] and A[2]. Then A[1], A[2] is sorted, since A[1] ≤ A[2].
Pass 3. Find the location LOC of the smallest element in the sublist of n − 2 elements A[3], A[4], …, A[n], and then interchange A[LOC] and A[3]. Then A[1], A[2], A[3] is sorted, since A[2] ≤ A[3].
…
Pass n − 1. Find the location LOC of the smaller of the elements A[n−1], A[n], and then interchange A[LOC] and A[n−1]. Then A[1], A[2], …, A[n] is sorted, since A[n−1] ≤ A[n].
Thus A is sorted after n − 1 passes.

Example
Suppose an array A contains 8 elements as follows: 7, 3, 4, 1, 8, 2, 6, 5.

The remaining problem is finding, during the kth pass, the location LOC of the smallest among the elements A[k], A[k+1], …, A[n]. This may be accomplished by using a variable MIN to hold the current smallest value while scanning the subarray from A[k] to A[n]. Specifically, first set MIN = A[k] and LOC = k, and then traverse the list, comparing MIN with each other element A[j] as follows:
(a) If MIN ≤ A[j], then simply move to the next element.
(b) If MIN > A[j], then update MIN and LOC by setting MIN = A[j] and LOC = j.
After comparing MIN with the last element A[n], MIN will contain the smallest among the elements A[k], A[k+1], …, A[n], and LOC will contain its location. The above process will be stated separately as a procedure.

Procedure: MIN(A, K, N, LOC)
An array A is in memory. This procedure finds the location LOC of the smallest element among A[K], A[K+1], …, A[N].
1. Set MIN = A[K] and LOC = K. [Initializes pointers.]
2. Repeat for J = K+1, K+2, …, N:
      If MIN > A[J], then: Set MIN = A[J] and LOC = J.
   [End of loop.]
3. Return.

The selection sort algorithm can now be easily stated.

Algorithm: (Selection Sort) SELECTION(A, N)
This algorithm sorts the array A with N elements.
1. Repeat Steps 2 and 3 for K = 1, 2, …, N − 1:
2.    Call MIN(A, K, N, LOC).
3.    [Interchange A[K] and A[LOC].]
      Set TEMP = A[K], A[K] = A[LOC] and A[LOC] = TEMP.
   [End of Step 1 loop.]
4. Exit.

Complexity of the Selection Sort Algorithm
First we have to note that the number f(n) of comparisons in the selection sort algorithm is independent of the original order of the elements. Observe that MIN(A, K, N, LOC) requires n − k comparisons. That is, there are n − 1 comparisons during Pass 1 to find the smallest element, there are n − 2 comparisons during Pass 2 to find the second smallest element, and so on. Accordingly,

$$f(n) = (n-1) + (n-2) + \cdots + 2 + 1 = \frac{n(n-1)}{2} = O(n^2)$$

Since f(n) does not depend on the original order, the worst case and the average case coincide. The above result is summarized in the following table:

Algorithm         Worst Case            Average Case
Selection Sort    n(n−1)/2 = O(n²)      n(n−1)/2 = O(n²)

Remark: The number of interchanges and assignments depends on the original order of the elements in the array A, but the sum of these operations does not exceed a factor of n².
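The MIN procedure and the SELECTION algorithm can be sketched in C as two functions. The names `min_location` and `selection_sort` are hypothetical, and a[0] is left unused so that the indices match the 1-based text.

```c
/* MIN(A, K, N, LOC): returns LOC, the index of the smallest of a[k..n]. */
int min_location(const int a[], int k, int n)
{
    int min = a[k], loc = k;                /* step 1: initialize */
    for (int j = k + 1; j <= n; j++)        /* step 2: n - k comparisons */
        if (min > a[j]) {
            min = a[j];
            loc = j;
        }
    return loc;
}

/* SELECTION(A, N): sorts a[1..n]; a[0] is not used. */
void selection_sort(int a[], int n)
{
    for (int k = 1; k <= n - 1; k++) {
        int loc = min_location(a, k, n);
        int temp = a[k];                    /* interchange A[k] and A[LOC] */
        a[k] = a[loc];
        a[loc] = temp;
    }
}
```

On the example array 7, 3, 4, 1, 8, 2, 6, 5, the first call `min_location(a, 1, 8)` returns 4, the position of 1. However the elements are arranged, the loops make exactly n(n − 1)/2 comparisons in total.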
MERGING
Suppose A is a sorted list with r elements and B is a sorted list with s elements. The operation that combines the elements of A and B into a single sorted list C with n = r + s elements is called merging. One simple way to merge is to place the elements of B after the elements of A and then use some sorting algorithm on the entire list. This method does not take advantage of the fact that A and B are individually sorted. Given below is a more efficient algorithm. First, however, we indicate the general idea of the algorithm by means of two examples.

Description of the formal algorithm
We will translate the above discussion into a formal algorithm which merges a sorted r-element array A and a sorted s-element array B into a sorted array C, with n = r + s elements. First of all, we must always keep track of the locations of the smallest element of A and the smallest element of B which have not yet been placed in C. Let NA and NB denote these locations, respectively. Also, let PTR denote the location in C to be filled. Thus, initially we set NA = 1, NB = 1 and PTR = 1. At each step of the algorithm, we compare A[NA] and B[NB] and assign the smaller element to C[PTR]. Then we increment PTR by setting PTR = PTR + 1, and we either increment NA by setting NA = NA + 1, or increment NB by setting NB = NB + 1, according to whether the new element in C has come from A or from B. Furthermore, if NA > r, then the remaining elements of B are assigned to C; or if NB > s, then the remaining elements of A are assigned to C.

The formal statement of the algorithm is as follows:

Algorithm: MERGING(A, R, B, S, C)
Let A and B be sorted arrays with R and S elements, respectively. This algorithm merges A and B into an array C with N = R + S elements.
1. [Initialize.] Set NA = 1, NB = 1 and PTR = 1.
2. [Compare.] Repeat while NA ≤ R and NB ≤ S:
      If A[NA] < B[NB], then:
         (a) [Assign element from A to C.] Set C[PTR] = A[NA].
         (b) [Update pointers.] Set PTR = PTR + 1 and NA = NA + 1.
      Else:
         (a) [Assign element from B to C.] Set C[PTR] = B[NB].
         (b) [Update pointers.] Set PTR = PTR + 1 and NB = NB + 1.
      [End of If structure.]
   [End of loop.]
3. [Assign remaining elements to C.]
   If NA > R, then:
      Repeat for K = 0, 1, 2, …, S − NB:
         Set C[PTR + K] = B[NB + K].
      [End of loop.]
   Else:
      Repeat for K = 0, 1, 2, …, R − NA:
         Set C[PTR + K] = A[NA + K].
      [End of loop.]
   [End of If structure.]
4. Exit.
Complexity of the Merging Algorithm
The input consists of the total number n = r + s of elements in A and B. Each comparison assigns an element to the array C, which eventually has n elements. Accordingly, the number f(n) of comparisons cannot exceed n:
f(n) ≤ n = O(n)
In other words, the merging algorithm can be run in linear time.

Example
Suppose A has 5 elements and B has 100 elements. Then if we merge A and B by the above algorithm, it will perform approximately 100 comparisons. On the other hand, only approximately log 100 ≈ 7 comparisons are needed to find the proper place to insert an element of A into B using the binary search and insertion algorithm, that is, approximately 5 × 7 = 35 comparisons in all.

The binary search and insertion algorithm does not take into account the fact that A is sorted. Accordingly, the algorithm may be improved in two ways as follows (here we assume that A has 5 elements and B has 100 elements):
(1) Reducing the target set: Suppose after the first search we find that A[1] is to be inserted after B[16]. Then we need to use only a binary search on B[17], …, B[100] to find the proper location to insert A[2]. And so on.
(2) Tabbing: The expected location for inserting A[1] in B is near B[20] (that is, B[s/r]), not near B[50]. Hence we first use a linear search on B[20], B[40], B[60], B[80] and B[100] to find B[K] such that A[1] ≤ B[K], and then we use a binary search on B[K−20], B[K−19], …, B[K]. (This is analogous to using the tabs in a dictionary, which indicate the location of all the words with the same first letter.)
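The MERGING algorithm can be sketched in C as follows. The name `merging` and the convention that index 0 of every array is unused (to keep NA, NB and PTR 1-based) are assumptions of this illustration; note that step 3 copies the leftover elements of B when A is exhausted, and the leftover elements of A otherwise.

```c
/* MERGING(A, R, B, S, C): a[1..r] and b[1..s] are sorted;
   c[1..r+s] receives the merged list. Index 0 is unused everywhere. */
void merging(const int a[], int r, const int b[], int s, int c[])
{
    int na = 1, nb = 1, ptr = 1;            /* step 1: initialize */
    while (na <= r && nb <= s) {            /* step 2: compare */
        if (a[na] < b[nb])
            c[ptr++] = a[na++];             /* element comes from A */
        else
            c[ptr++] = b[nb++];             /* element comes from B */
    }
    /* step 3: at most one of these loops does any work */
    while (nb <= s)                         /* NA > R: copy the rest of B */
        c[ptr++] = b[nb++];
    while (na <= r)                         /* NB > S: copy the rest of A */
        c[ptr++] = a[na++];
}
```

Each pass through the step 2 loop makes one comparison and places one element, so at most r + s comparisons are made, in line with f(n) ≤ n.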
MERGE-SORT
Suppose an array A with n elements A[1], A[2], …, A[n] is in memory. The merge-sort algorithm which sorts A will first be described by means of a specific example.

Example
Suppose the array A contains 14 elements, the first of which is 72. Each pass of the merge-sort algorithm will start at the beginning of the array A and merge pairs of sorted subarrays as follows:
Pass 1. Merge each pair of elements to obtain the following list of sorted pairs:
37, 72 | 25, 43 | 59, 91 | 13, 64 | 22, 84 | 47, 54 | 33, 80
Pass 2. Merge each pair of pairs to obtain the following list of sorted quadruplets:
25, 37, 43, 72 | 13, 59, 64, 91 | 22, 47, 54, 84 | 33, 80
Pass 3. Merge each pair of sorted quadruplets to obtain the following two sorted subarrays:
13, 25, 37, 43, 59, 64, 72, 91 | 22, 33, 47, 54, 80, 84
Pass 4. Merge the two sorted subarrays to obtain the single sorted array:
13, 22, 25, 33, 37, 43, 47, 54, 59, 64, 72, 80, 84, 91
The original array A is now sorted.

Description
The above merge-sort algorithm for sorting an array A has the following important property. After Pass k, the array A will be partitioned into sorted subarrays where each subarray, except possibly the last, will contain exactly L = 2^k elements. Hence the algorithm requires at most log n passes to sort an n-element array A.

We will translate the above informal description of merge-sort into a formal algorithm, which will be divided into two parts. The first part will be a procedure MERGEPASS, which uses the merging procedure discussed above to execute a single pass of the algorithm, and the second part will repeatedly apply MERGEPASS until A is sorted.

We apply the MERGEPASS procedure to an n-element array A which consists of a sequence of sorted subarrays. Moreover, each subarray consists of L elements except that the last subarray may have fewer than L elements. Dividing n by 2*L, we obtain the quotient Q, which tells the number of pairs of L-element sorted subarrays; that is,
Q = INT(N/(2*L))
(We use INT(X) to denote the integer value of X.) Setting S = 2*L*Q, we get the total number S of elements in the Q pairs of subarrays. Hence R = N − S denotes the number of remaining elements. The procedure first merges the initial Q pairs of L-element subarrays. Then the procedure takes care of the case where there is an odd number of subarrays (when R ≤ L) or where the last subarray has fewer than L elements.

In the formal statement below, MERGE(A, R, LBA, B, S, LBB, C, LBC) denotes the merging procedure applied to the R-element sorted subarray of A beginning at A[LBA] and the S-element sorted subarray of B beginning at B[LBB], with the result placed in C beginning at C[LBC]. The formal statement of MERGEPASS and the merge-sort algorithm is as follows:

Procedure: MERGEPASS(A, N, L, B)
The N-element array A is composed of sorted subarrays where each subarray has L elements except possibly the last subarray, which may have fewer than L elements. The procedure merges the pairs of subarrays of A and assigns them to the array B.
1. Set Q = INT(N/(2*L)), S = 2*L*Q and R = N − S.
2. [Use the merging procedure to merge the Q pairs of subarrays.]
   Repeat for J = 1, 2, …, Q:
      (a) Set LB = 1 + (2*J − 2)*L. [Finds lower bound of first array.]
      (b) Call MERGE(A, L, LB, A, L, LB + L, B, LB).
   [End of loop.]
3. [Only one subarray left?]
   If R ≤ L, then:
      Repeat for J = 1, 2, …, R:
         Set B[S + J] = A[S + J].
      [End of loop.]
   Else: [A full subarray and a last subarray of R − L elements remain.]
      Call MERGE(A, L, S + 1, A, R − L, L + S + 1, B, S + 1).
   [End of If structure.]
4. Return.

Algorithm: MERGESORT(A, N)
This algorithm sorts the N-element array A using an auxiliary array B.
1. Set L = 1. [Initializes the number of elements in the subarrays.]
2. Repeat Steps 3 to 5 while L < N:
3.    Call MERGEPASS(A, N, L, B).
4.    Call MERGEPASS(B, N, 2*L, A).
5.    Set L = 4*L.
   [End of Step 2 loop.]
6. Exit.

Since we want the sorted array to finally appear in the original array A, we must execute the procedure MERGEPASS an even number of times.
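MERGEPASS and MERGESORT can be sketched in C as below. This is an illustrative transcription, not the module's own code: the function names are hypothetical, index 0 of each array is unused so the 1-based bounds carry over unchanged, and the auxiliary array b must have n + 1 cells.

```c
/* MERGE(A, R, LBA, B, S, LBB, C, LBC): merges the sorted runs
   a[lba .. lba+r-1] and b[lbb .. lbb+s-1] into c starting at c[lbc]. */
static void merge_runs(const int a[], int r, int lba,
                       const int b[], int s, int lbb, int c[], int lbc)
{
    int na = lba, nb = lbb, ptr = lbc;
    int ua = lba + r, ub = lbb + s;     /* one past the end of each run */
    while (na < ua && nb < ub)
        c[ptr++] = (a[na] < b[nb]) ? a[na++] : b[nb++];
    while (na < ua) c[ptr++] = a[na++];
    while (nb < ub) c[ptr++] = b[nb++];
}

/* MERGEPASS(A, N, L, B): merges pairs of L-element runs of a[1..n] into b. */
static void merge_pass(const int a[], int n, int l, int b[])
{
    int q = n / (2 * l);                /* Q = INT(N/(2*L)) */
    int s = 2 * l * q;
    int r = n - s;
    for (int j = 1; j <= q; j++) {
        int lb = 1 + (2 * j - 2) * l;   /* lower bound of first run */
        merge_runs(a, l, lb, a, l, lb + l, b, lb);
    }
    if (r <= l)                         /* only one run left: copy it */
        for (int j = 1; j <= r; j++)
            b[s + j] = a[s + j];
    else                                /* full run plus a shorter run */
        merge_runs(a, l, s + 1, a, r - l, l + s + 1, b, s + 1);
}

/* MERGESORT(A, N): sorts a[1..n] using the auxiliary array b[1..n]. */
void merge_sort(int a[], int b[], int n)
{
    int l = 1;
    while (l < n) {
        merge_pass(a, n, l, b);         /* runs of l -> runs of 2l, into b */
        merge_pass(b, n, 2 * l, a);     /* runs of 2l -> runs of 4l, back into a */
        l = 4 * l;
    }
}
```

Because each loop iteration performs two passes, the result always lands back in a, as the remark above requires. The second run in the final MERGE call has R − L elements, so the merge never reads past A[N].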
# We measure the complexity of a sorting algorithm of the running time as a function of the numbers n of items to be sorted. # We learned about Insertion sort. Accordingly for both the worst case and average case. Merge Sort. which is independent of n. The main drawback of merge-sort is that it requires an auxiliary array with n element. Each pass will require at most n comparisons. Each of the other sorting algorithms we have studied is that it requires only a finite number of extra locations.MSc. Zee Interactive Learning Systems . and by the discussion on the complexity of merging. selection sort. The above results are summarized in the following table: Algorithm Merge-Sort Worst Case n log n = O(n log n) Average Case n log n = O(n log n) Extra O(n) Summary # Sorting and Searching are important operations in computer science. such as increasing or decreasing. Searching refers to the operation of finding the location of a given item in a collection of items. Sorting refers to the operation of arranging data in some given order. In Software 13 most log n passes. MSc. In Software 7 HASHING TECHNIQUES MAIN POINTS COVERED ! Hashing ! Hash Functions ! Collision Resolution ! Open Addressing (Linear probing and modification) ! Chaining ! Summary 1 . as far as possible uniformly distribute the hash addresses throughout set L so that there are minimum number of collisions. The terminology. the function H should be very easy and quick to compute. In Software 2 HASHING T he search time for each algorithm we discussed so far depends on the number of elements of data. which we will be using in our presentation of hashing. Second. the function H should. Secondly we assume that F is maintained in memory by a table T of m memory locations and that L is the set of memory addresses of the locations in T. The search will require no comparisons at all but a lot of space will be wasted. This situation is called collision. 
but it must be modified so that a great deal of space is not wasted. For notational convenience. Unfortunately. It is possible that two different keys k1 and k2 will yield the same hash address. use the employee number as the address of the record in memory. And some method must be used to resolve it. Such a function.MSc. This modification takes the form of a function H from the set K of keys into the set L of memory addresses. such a function H may not yield distinct values. H: K # L is called a hash function or hashing function. The idea of using the key to determine the address of a record is an excellent idea. Example Suppose a company with n employees assigns an employee number to each employee. which is essentially independent of the number of elements. Hashing is a searching technique. We will introduce the subject of hashing by the following example. We can in fact. Hash Functions The two principal criteria in selecting a hash function H: K # L are as follows: First of all. we assume that the keys in k and the addresses in L are (decimal) integers. . First of all we assume that there is a file F of n records with a set k of keys that uniquely determine the records in F. Accordingly the topic of hashing is divided into two parts: (1) Hash function and (2) Collision resolutions We will discuss each of these two parts with you separately. will be oriented towards file management. Example We can consider the company in the previous example each of whose 68 employees is assigned a unique 4-digit employee number.) The hash function H is defined by or H (k) = k (mod m) H(k) = k(mod m) + 1 Here k(mod m) denotes the remainder when k is divided by m. k4. We emphasize that the positions of k2 must be the same for all the keys. Since this frequently minimizes the number of collisions. kr where each part. However. that is. Sometimes for extra milling the even-numbered parts. … .MSc. certain general techniques do help. are ignored. 
Hash Functions
The two principal criteria in selecting a hash function H: K → L are as follows. First of all, the function H should be very easy and quick to compute. Second, the function H should, as far as possible, uniformly distribute the hash addresses throughout the set L so that there is a minimum number of collisions. We cannot guarantee that the second condition can be completely fulfilled without actually knowing beforehand the keys and addresses. However, certain general techniques do help. One technique is to chop a key k into pieces and combine the pieces in some way to form the hash address H(k).

We will now illustrate some popular hash functions. We emphasize that the computer can easily and quickly evaluate each of these hash functions.

(a) Division method: Choose a number m larger than the number n of keys in K. (We usually choose m to be a prime number or a number without small divisors, since this frequently minimizes the number of collisions.) The hash function H is defined by
    H(k) = k (mod m)    or    H(k) = k (mod m) + 1
Here k (mod m) denotes the remainder when k is divided by m. The second formula is used when we want the hash addresses to range from 1 to m rather than from 0 to m − 1.

(b) Midsquare method: Here the key k is squared and the hash function H is defined by H(k) = l, where l is obtained by deleting digits from both ends of k². We emphasize that the same positions of k² must be used for all the keys.

(c) Folding method: We partition the key k into a number of parts k1, k2, …, kr, where each part, except possibly the last, has the same number of digits as the required address. Then the parts are added together, ignoring the last carry. That is,
    H(k) = k1 + k2 + … + kr
where the leading-digit carries, if any, are ignored. Sometimes, for extra "milling", the even-numbered parts k2, k4, … are each reversed before the addition.
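The three methods can be sketched as C functions. This is a minimal sketch, assuming non-negative 4-digit keys and the 2-digit addresses 00 to 99 for the midsquare and folding methods; the function names are hypothetical, and keeping the fourth and fifth digits of k² (counting from the right) is one fixed choice of positions.

```c
#include <assert.h>

/* Division method: H(k) = k mod m, giving addresses 0..m-1. */
int hash_division(int k, int m) {
    assert(k >= 0 && m > 0);
    return k % m;
}

/* Division method, variant for addresses 1..m: H(k) = k mod m + 1. */
int hash_division1(int k, int m) {
    return hash_division(k, m) + 1;
}

/* Midsquare method for 2-digit addresses: square k and keep the fourth and
   fifth digits counting from the right, i.e. floor(k*k / 1000) mod 100. */
int hash_midsquare(int k) {
    assert(k >= 0 && k <= 9999);
    long long sq = (long long)k * k;
    return (int)((sq / 1000) % 100);
}

/* Folding method for a 4-digit key and 2-digit addresses: split k into the
   parts k1 (first two digits) and k2 (last two digits), add them and ignore
   the carry out of the leading digit. If reverse_second is nonzero, k2 is
   reversed before the addition ("milling"). */
int hash_folding(int k, int reverse_second) {
    assert(k >= 0 && k <= 9999);
    int k1 = k / 100;
    int k2 = k % 100;
    if (reverse_second)
        k2 = (k2 % 10) * 10 + k2 / 10;
    return (k1 + k2) % 100;
}
```

For instance, hash_division(3205, 97) gives 4, hash_midsquare(3205) gives 72 and hash_folding(7148, 0) gives 19, since 71 + 48 = 119 and the leading 1 is dropped.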
Example
Consider the company in the previous example, each of whose 68 employees is assigned a unique 4-digit employee number. Suppose L consists of 100 two-digit addresses: 00, 01, 02, …, 99. We apply the above hash functions to each of the following employee numbers:
    3205, 7148, 2345

(a) Division method: Choose a prime number m close to 99, such as m = 97. Then
    H(3205) = 4,  H(7148) = 67,  H(2345) = 17
That is, dividing 3205 by 97 gives a remainder of 4, dividing 7148 by 97 gives a remainder of 67, and dividing 2345 by 97 gives a remainder of 17. In the case that the memory addresses begin with 01 rather than 00, we choose the function H(k) = k (mod m) + 1 to obtain:
    H(3205) = 4 + 1 = 5,  H(7148) = 67 + 1 = 68,  H(2345) = 17 + 1 = 18

(b) Midsquare method: The following calculations are performed:
    k:      3205          7148          2345
    k²:     10 272 025    51 093 904    5 499 025
    H(k):   72            93            99
Observe that the fourth and fifth digits, counting from the right, are chosen for the hash address.

(c) Folding method: Chopping the key k into two parts and adding yields the following hash addresses:
    H(3205) = 32 + 05 = 37,  H(7148) = 71 + 48 = 19,  H(2345) = 23 + 45 = 68
Observe that the leading digit 1 of 71 + 48 = 119 is ignored in H(7148). Alternatively, we may want to reverse the second part before adding, thus producing the following hash addresses:
    H(3205) = 32 + 50 = 82,  H(7148) = 71 + 84 = 55,  H(2345) = 23 + 54 = 77
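The remainders used in the division method, and the digit selection used in the midsquare method, can be checked directly: each key is written as quotient · 97 + remainder, and taking the fourth and fifth digits of k² from the right is the same as computing ⌊k²/10³⌋ mod 10².

```latex
\begin{aligned}
3205 &= 33 \cdot 97 + 4,  & H(3205) &= 4,\\
7148 &= 73 \cdot 97 + 67, & H(7148) &= 67,\\
2345 &= 24 \cdot 97 + 17, & H(2345) &= 17;\\[4pt]
\left\lfloor 10\,272\,025 / 10^{3} \right\rfloor \bmod 10^{2} &= 10\,272 \bmod 100 = 72, & H(3205) &= 72,\\
\left\lfloor 51\,093\,904 / 10^{3} \right\rfloor \bmod 10^{2} &= 51\,093 \bmod 100 = 93, & H(7148) &= 93,\\
\left\lfloor 5\,499\,025 / 10^{3} \right\rfloor \bmod 10^{2} &= 5\,499 \bmod 100 = 99, & H(2345) &= 99.
\end{aligned}
```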
Collision Resolution
Suppose we want to add a new record R with key k to our file F, but suppose the memory location address H(k) is already occupied. This situation is called collision. In this subsection we discuss two general ways of resolving collisions. The particular procedure that we choose depends on many factors. One important factor is the ratio of the number n of keys in K (which is the number of records in F) to the number m of hash addresses in L. This ratio, λ = n/m, is called the load factor.

First we will show that collisions are almost impossible to avoid. Specifically, suppose a student class has 24 students and suppose the table has space for 365 records. One random hash function is to choose the student's birthday as the hash address. Although the load factor λ = 24/365 ≈ 7% is very small, it can be shown that there is a better than fifty-fifty chance that two of the students have the same birthday.

We can measure the efficiency of a hash function with a collision resolution procedure by the average number of probes (key comparisons) needed to find the location of the record with a given key k. The efficiency depends mainly on the load factor λ. Specifically, we are interested in the following two quantities:
    S(λ) = average number of probes for a successful search
    U(λ) = average number of probes for an unsuccessful search

Open Addressing: Linear Probing and Modifications
Suppose we add a new record R with key k to the memory table T, but the memory location with hash address H(k) = h is already filled. One natural way to resolve the collision is to assign R to the first available location following T[h]. (We assume that the table T with m locations is circular, so that T[1] comes after T[m].) Accordingly, with such a collision procedure, we will search for the record R in the table T by linearly searching the locations T[h], T[h+1], T[h+2], … until finding R or meeting an empty location, which indicates an unsuccessful search.

The above collision resolution is called linear probing. The average numbers of probes for a successful search and for an unsuccessful search are known to be the following respective quantities:
$$S(\lambda) = \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right), \qquad U(\lambda) = \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^{2}}\right)$$
(Here λ = n/m is the load factor.)
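Linear probing can be sketched in C as follows. This is a minimal sketch under assumptions: keys are non-negative integers, the hash function is the division method H(k) = k mod m, and the table size m = 11 is illustrative; table_t, lp_insert and lp_search are hypothetical names. Locations are 0-based, so location 0 follows location m − 1 in the circular table, and the search reports how many probes it made so the result can be compared with S(λ) and U(λ).

```c
#include <assert.h>

#define TABLE_SIZE 11   /* number m of locations in T (illustrative) */
#define EMPTY      (-1) /* marks an unused location; keys are assumed non-negative */

typedef struct {
    int key[TABLE_SIZE];
    int n;              /* records stored, so the load factor is n / TABLE_SIZE */
} table_t;

void table_init(table_t *t) {
    for (int i = 0; i < TABLE_SIZE; i++) t->key[i] = EMPTY;
    t->n = 0;
}

/* Division-method hash (assumed): H(k) = k mod m, giving locations 0..m-1. */
static int hash(int k) { return k % TABLE_SIZE; }

/* Put k in the first available location T[h], T[h+1], ... (wrapping around).
   Keys are assumed unique. Returns the location used, or -1 if T is full. */
int lp_insert(table_t *t, int k) {
    assert(k >= 0);
    if (t->n == TABLE_SIZE) return -1;
    int i = hash(k);
    while (t->key[i] != EMPTY)
        i = (i + 1) % TABLE_SIZE;
    t->key[i] = k;
    t->n++;
    return i;
}

/* Search T[h], T[h+1], ... until finding k or meeting an empty location.
   Returns the location of k, or -1 for an unsuccessful search; *probes
   receives the number of locations examined. */
int lp_search(const table_t *t, int k, int *probes) {
    assert(k >= 0);
    int i = hash(k);
    *probes = 0;
    for (int step = 0; step < TABLE_SIZE; step++) {
        (*probes)++;
        if (t->key[i] == k) return i;
        if (t->key[i] == EMPTY) return -1;
        i = (i + 1) % TABLE_SIZE;
    }
    return -1;          /* every location examined: T is full and k is absent */
}
```

For example, the keys 4, 15 and 26 all hash to location 4 when m = 11, so linear probing stores them in locations 4, 5 and 6, and finding 26 takes 3 probes. Such clusters are why the averages grow quickly as λ approaches 1: at λ = 1/2 the formulas give S = 1.5 and U = 2.5.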
Chaining
Chaining involves maintaining two tables in memory. First of all, as before, there is a table T in memory which contains the records in F, except that T now has an additional field LINK which is used so that all records in T with the same hash address h may be linked together to form a linked list. Second, there is a hash address table LIST which contains pointers to the linked lists in T.

Suppose a new record R with key k is added to the file F. We place R in the first available location in the table T and then add R to the linked list with pointer LIST[H(k)]. If the linked lists of records are not sorted, then R is simply inserted at the beginning of its linked list. Searching for a record or deleting a record is nothing more than searching for a node or deleting a node from a linked list.

The average numbers of probes, using chaining, for a successful search and for an unsuccessful search are known to be the following approximate values:
$$S(\lambda) \approx 1 + \frac{\lambda}{2}, \qquad U(\lambda) \approx e^{-\lambda} + \lambda$$
Here the load factor λ = n/m may be greater than 1, since the number m of hash addresses in L (not the number of locations in T) may be less than the number n of records in F.

Example
Consider 8 records A, B, C, D, E, X, Y, Z with the following hash addresses:
    Record:  A   B   C   D    E   X    Y   Z
    H(k):    4   8   2   11   4   11   5   1
Using chaining, the records will appear in memory as pictured in Fig. 7-1. Observe that the location of a record R in table T is not related to its hash address: a record is simply put in the first node in the AVAIL list of table T. In fact, table T need not have the same number of elements as the hash address table.

The main disadvantage of chaining is that we need 3m memory cells for the data. Specifically, there are m cells for the information field INFO, m cells for the link field LINK, and m cells for the pointer array LIST. Suppose each record requires only 1 word for its information field. Then it may be more useful to use open addressing with a table with 3m locations, which has the load factor λ ≤ 1/3, than to use chaining to resolve collisions.
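The two-table layout of chaining can be sketched in C with parallel arrays that mirror the fields named in the text (INFO, LINK, LIST and the AVAIL list), using index 0 as the null pointer. This is a minimal sketch under assumptions: records are single characters, the hash address is supplied by the caller rather than computed, and the sizes (m = 11 hash addresses, 16 locations in T) are illustrative. The function names chain_init, chain_insert and chain_search are hypothetical.

```c
#include <assert.h>

#define M    11      /* number m of hash addresses, numbered 1..M (illustrative) */
#define CAP  16      /* number of locations in table T, numbered 1..CAP (illustrative) */
#define NIL  0       /* null pointer: location 0 is never used */

static char info[CAP + 1];        /* INFO field of table T */
static int  link_field[CAP + 1];  /* LINK field of table T */
static int  list[M + 1];          /* LIST[h]: first location of the chain for address h */
static int  avail;                /* first location of the AVAIL list of free nodes */

/* Put every location of T on the AVAIL list and make every chain empty. */
void chain_init(void) {
    for (int i = 1; i < CAP; i++) link_field[i] = i + 1;
    link_field[CAP] = NIL;
    avail = 1;
    for (int h = 1; h <= M; h++) list[h] = NIL;
}

/* Store record r (hash address h) in the first node of AVAIL and insert that
   node at the beginning of the unsorted chain LIST[h].
   Returns its location in T, or NIL if T is full. */
int chain_insert(char r, int h) {
    assert(h >= 1 && h <= M);
    if (avail == NIL) return NIL;
    int loc = avail;
    avail = link_field[loc];
    info[loc] = r;
    link_field[loc] = list[h];
    list[h] = loc;
    return loc;
}

/* Follow the chain LIST[h]; returns the location of r in T, or NIL if absent. */
int chain_search(char r, int h) {
    assert(h >= 1 && h <= M);
    for (int p = list[h]; p != NIL; p = link_field[p])
        if (info[p] == r) return p;
    return NIL;
}
```

Inserting A, B, C, D, E, X, Y, Z with the hash addresses from the example places them in locations 1 to 8 of T in arrival order, regardless of their hash addresses. The chain for address 4 is then E followed by A, because each new record goes to the front of its list.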
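Summary
• The terminology which we used in our presentation of hashing is oriented towards file management.
• We used two principal criteria in selecting a hash function H: K → L. First of all, the function H should be very easy and quick to compute. Second, the function H should, as far as possible, uniformly distribute the hash addresses throughout the set L so that there is a minimum number of collisions.
• Suppose we want to add a new record R with key k to our file F, but suppose the memory location address H(k) is already occupied. This situation is called collision.
• Chaining involves maintaining two tables in memory. First of all, as before, there is a table T in memory which contains the records in F, except that T now has an additional field LINK which is used so that all records in T with the same hash address may be linked together to form a linked list. Second, there is a hash address table LIST which contains pointers to the linked lists in T.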