CS6303 COMPUTER ARCHITECTURE

8 Great Ideas in Computer Architecture
The following are eight great ideas that computer architects have invented in the last 60 years of computer design.
1. Design for Moore's Law
2. Use Abstraction to Simplify Design
3. Make the Common Case Fast
4. Performance via Parallelism
5. Performance via Pipelining
6. Performance via Prediction
7. Hierarchy of Memories
8. Dependability via Redundancy

1. Design for Moore's Law
- The number of transistors in an integrated circuit doubles approximately every two years (Gordon Moore, one of the founders of Intel).
- As computer designs can take years, the resources available per chip can easily double or quadruple between the start and finish of the project.
- Computer architects must therefore anticipate where the technology will be when the design finishes, rather than design for where it starts.

2. Use Abstraction to Simplify Design
- Both computer architects and programmers had to invent techniques to make themselves more productive, for otherwise design time would lengthen as dramatically as resources grew by Moore's Law.
- A major productivity technique for hardware and software is to use abstraction.
- In computer science, an abstraction level is a generalization of a model or algorithm. The simplification provided by a good abstraction layer facilitates easy reuse.
- A computer system is usually represented as consisting of five abstraction levels: hardware, firmware, assembler, operating system, and processes.

3. Make the Common Case Fast
- Making the common case fast will tend to enhance performance better than optimizing the rare case.
- It implies that you know what the common case is, which is only possible with careful experimentation and measurement.

4. Performance via Parallelism
- Computer architects have offered designs that get more performance by performing operations in parallel.
- Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones which are then solved concurrently ("in parallel").
- Parallelism has been employed for many years, mainly in high-performance computing.

5. Performance via Pipelining
- Pipelining is a technique used in the design of computers to increase instruction throughput (the number of instructions that can be executed in a unit of time). It does not reduce instruction latency (the time to complete a single instruction from start to finish), since each instruction must still go through all the steps.
- The basic instruction cycle is broken up into a series of pipeline stages. Rather than processing each instruction sequentially (one at a time, finishing one instruction before starting the next), each instruction is split up into a sequence of steps so that different steps can be executed concurrently (by different circuitry) and in parallel (at the same time).
- Pipelining therefore increases instruction throughput by performing multiple operations at the same time.

6. Performance via Prediction
- To improve the flow and throughput in an instruction pipeline, processors guess the outcome of conditional branches.
- Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage of the pipeline.
- The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not taken; a minimal sketch of one common scheme follows below.
- Branch predictors play a critical role in achieving high effective performance in many modern pipelined microprocessor architectures.
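The guess is most often made with a small table of saturating counters. The Python sketch below shows a 2-bit saturating-counter predictor of the kind described in most textbooks; the table size, the indexing by PC, and the example branch history are illustrative assumptions, not a description of any particular processor.

```python
# Minimal sketch of a 2-bit saturating-counter branch predictor.
# Counter values 0-1 predict "not taken", 2-3 predict "taken".
# The table size (16 entries) and PC indexing are illustrative assumptions.

class TwoBitPredictor:
    def __init__(self, entries=16):
        self.entries = entries
        self.table = [1] * entries          # start in the weak "not taken" state

    def predict(self, pc):
        return self.table[pc % self.entries] >= 2     # True means "predict taken"

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop branch that is taken 9 times and then falls through on exit:
predictor = TwoBitPredictor()
outcomes = [True] * 9 + [False]
correct = 0
for outcome in outcomes:
    if predictor.predict(pc=0x400080) == outcome:
        correct += 1
    predictor.update(pc=0x400080, taken=outcome)
print(f"correct predictions: {correct}/{len(outcomes)}")   # 8/10
```

With two bits of state, a single atypical outcome (such as the final loop exit) does not immediately flip the prediction, which is why 8 of the 10 outcomes above are predicted correctly.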
7. Hierarchy of Memories
- Programmers want memory to be fast, large, and cheap, as memory speed often shapes performance, capacity limits the size of problems that can be solved, and the cost of memory today is often the majority of computer cost.
- Architects have found that they can address these conflicting demands with a hierarchy of memories, with the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at the bottom.
- Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy (a small numeric sketch of this illusion follows after idea 8 below).

Levels of the Memory Hierarchy (upper level = faster, lower level = larger)
  CPU Registers : 100s of bytes, about 1 ns                 (instructions and operands move to/from the cache)
  Cache         : K bytes, about 4 ns, 1-0.1 cents/bit      (blocks move to/from main memory)
  Main Memory   : M bytes, 100-300 ns, 0.0001-0.00001 cents/bit   (pages move to/from disk)
  Disk          : G bytes, about 10 ms (10,000,000 ns), 10^-5 - 10^-6 cents/bit   (files move to/from tape)
  Tape          : effectively infinite capacity, seconds to minutes, 10^-8 cents/bit

8. Dependability via Redundancy
- Computers not only need to be fast; they need to be dependable.
- Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and that help detect failures.
- Example: systems designers usually provide failover capability in servers, systems, or networks that require continuous availability; the term used is high availability.
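To make the "nearly as fast, nearly as big and cheap" illusion concrete, here is a small average-memory-access-time calculation in Python; the hit time, memory latency, and hit rate are assumed round numbers chosen for the example, not measurements of a real machine.

```python
# Illustrative average memory access time (AMAT) for a two-level hierarchy.
# All numbers below are assumed for the example, not real measurements.

cache_hit_time_ns = 1      # fast, small, expensive level (assumed)
memory_access_ns  = 100    # slow, large, cheap level (assumed)
cache_hit_rate    = 0.95   # fraction of accesses served by the cache (assumed)

# AMAT = hit time + miss rate * miss penalty
amat_ns = cache_hit_time_ns + (1 - cache_hit_rate) * memory_access_ns
print(f"average access time: {amat_ns:.1f} ns")   # 6.0 ns

# Even though main memory is 100x slower than the cache, most accesses hit in
# the cache, so the programmer sees memory that is close to cache speed while
# retaining main memory's capacity.
```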
Terms and Definitions
binary digit: Also called a bit. One of the two numbers in base 2 (0 or 1) that are the components of information.
Instruction: A command that computer hardware understands and obeys.
Operating System: Supervising program that manages the resources of a computer for the benefit of the programs that run on that computer.
System Software: Software that provides services that are commonly useful, including operating systems, compilers, loaders, and assemblers.
Compiler: A program that translates high-level language statements into assembly language statements.
assembler: A program that translates a symbolic version of instructions (an assembly language program) into the binary version.
High-level Programming Language: A portable language such as C, C++, Java, or Visual Basic that is composed of words and algebraic notation and that can be translated by a compiler into assembly language.
Input Device: A mechanism through which the computer is fed information, such as a keyboard.
output device: A mechanism that conveys the result of a computation to a user, such as a display, or to another computer.
liquid crystal display (LCD): A display technology using a thin layer of liquid polymers that can be used to transmit or block light according to whether a charge is applied.
active matrix display: A liquid crystal display using a transistor to control the transmission of light at each individual pixel.
pixel: The smallest individual picture element. Screens are composed of hundreds of thousands to millions of pixels.
Datapath: The component of the processor that performs arithmetic operations.
control: The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program.
Memory: The storage area in which programs are kept when they are running and that contains the data needed by the running programs.
Integrated circuit: Also called a chip. A device combining dozens to millions of transistors.
central processing unit (CPU): Also called the processor. The active part of the computer, which contains the datapath and control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
dynamic random access memory (DRAM): Memory built as an integrated circuit; it provides random access to any location.
static random access memory (SRAM): Memory also built as an integrated circuit, but faster and less dense than DRAM.
cache memory: A small, fast memory that acts as a buffer for a slower, larger memory.
main memory: Also called primary memory. Memory used to hold programs while they are running; typically consists of DRAM in today's computers.
secondary memory: Nonvolatile memory used to store programs and data between runs; typically consists of flash memory in PMDs and magnetic disks in servers.
magnetic disk: Also called a hard disk. A form of nonvolatile secondary memory composed of rotating platters coated with a magnetic recording material. Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds.
flash memory: A nonvolatile semiconductor memory. It is cheaper and slower than DRAM but more expensive per bit and faster than magnetic disks. Access times are about 5 to 50 microseconds, and cost per gigabyte in 2012 was $0.75 to $1.00.
local area network (LAN): A network designed to carry data within a geographically confined area, typically within a single building.
wide area network (WAN): A network extended over hundreds of kilometers that can span a continent.

Components of a Computer System
The five classic components are:
1. Input
2. Output
3. Memory
4. Datapath
5. Control
- The processor gets instructions and data from memory. Input writes data to memory, and output reads data from memory. Control sends the signals that determine the operations of the datapath, memory, input, and output.

LCD: The most fascinating I/O device is probably the graphics display. Most personal mobile devices use liquid crystal displays (LCDs) to get a thin, low-power display.
Touch screen: While PCs also use LCD displays, the tablets and smartphones of the post-PC era have replaced the keyboard and mouse with touch-sensitive displays, which have the wonderful user-interface advantage that users point directly at what they want rather than indirectly with a mouse.

Components cont...
- While there are a variety of ways to implement a touch screen, many tablets today use capacitive sensing. Since people are electrical conductors, if an insulator like glass is covered with a transparent conductor, touching distorts the electrostatic field of the screen, which results in a change in capacitance. This technology can allow multiple touches simultaneously, which allows gestures that can lead to attractive user interfaces.
- The list of I/O devices includes a capacitive multitouch LCD display, front-facing camera, rear-facing camera, microphone, headphone jack, speakers, accelerometer, gyroscope, Wi-Fi network, and Bluetooth network.
- Integrated Circuits (ICs): nicknamed chips.

Components cont...
- The processor is the active part of the computer, following the instructions of a program. It adds numbers, tests numbers, signals I/O devices to activate, and so on. Occasionally, people call the processor the CPU, for central processing unit.
- The datapath performs the arithmetic operations, and control tells the datapath, memory, and I/O devices what to do according to the wishes of the instructions of the program.
- The memory is where the programs are kept when they are running; it also contains the data needed by the running programs. The memory is built from DRAM chips; DRAM stands for dynamic random access memory.
- Cache memory consists of a small, fast memory that acts as a buffer for the DRAM memory. Cache is built using a different memory technology, static random access memory (SRAM). SRAM is faster but less dense, and hence more expensive, than DRAM.
- SRAM and DRAM are two layers of the memory hierarchy.

Components cont... Instruction Set Architecture (ISA)
- The ISA is the interface between the hardware and the lowest-level software.
- The ISA includes anything programmers need to know to make a binary machine-language program work correctly, including instructions, I/O devices, and so on.
- The operating system encapsulates the details of doing I/O, allocating memory, and other low-level system functions, so that application programmers do not need to worry about such details.
- The combination of the basic instruction set and the operating system interface provided for application programmers is called the application binary interface (ABI).

A Safe Place to Save Data
- Main memory (primary memory), secondary memory, flash memory; volatile and nonvolatile memory.

Communicating with Other Computers
- Communication: information is exchanged between computers at high speeds.
- Resource sharing: rather than each computer having its own I/O devices, computers on the network can share I/O devices.
- Nonlocal access: by connecting computers over long distances, users need not be near the computer they are using.
- Related terms: LAN, WAN, bandwidth.

Technologies for Building Processors and Memory
- The IC manufacturing process starts with a silicon crystal ingot. The ingots are 8-12 inches in diameter and about 12-24 inches long.
- An ingot is finely sliced into wafers no more than 0.1 inches thick.
- These wafers then go through a series of processing steps, during which patterns of chemicals are placed on each wafer, creating the transistors, conductors, and insulators.
- In the figure (not reproduced here), one wafer produced 20 dies, of which 17 passed testing (an X marks a bad die). The yield of good dies in this case was 17/20, or 85%.
- These good dies are then bonded into packages (connected to the input/output pins of a package).

Silicon: A natural element that is a semiconductor.
Semiconductor: A substance that does not conduct electricity well.
Transistor: An on/off switch controlled by an electric signal.
die: The individual rectangular sections that are cut from a wafer, more informally known as chips.
yield: The percentage of good dies from the total number of dies on the wafer.
VLSI: A device (IC) containing millions of transistors.

Performance
Response time or execution time: The total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.
Throughput: The total amount of work done in a given time.
Bandwidth: The amount of data that can be carried from one point to another in a given time period (usually a second). This kind of bandwidth is usually expressed in bits of data per second (bps); occasionally, it is expressed as bytes per second (Bps).
Clock cycles per instruction (CPI): Average number of clock cycles per instruction for a program or program fragment.

We can relate performance and execution time for a computer X:
    Performance_X = 1 / Execution time_X
so "X is n times faster than Y" means
    Performance_X / Performance_Y = Execution time_Y / Execution time_X = n

Measuring Performance
clock cycle: Also called a tick, clock tick, clock period, clock, or cycle. The time for one clock period, usually of the processor clock, which runs at a constant rate.
clock period: The length of each clock cycle.

Instruction Performance
Clock cycles per instruction (CPI): Average number of clock cycles per instruction for a program or program fragment.

The Classic CPU Performance Equation
    CPU time = Instruction count x CPI x Clock cycle time
             = (Instruction count x CPI) / Clock rate
The basic components of performance are therefore the instruction count (instructions executed for the program), the CPI (average clock cycles per instruction), and the clock cycle time (seconds per clock cycle).

Example: Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much? (The sketch below works this out.)
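The example above can be worked directly from the classic CPU performance equation; the short Python sketch below does the arithmetic, with the identical (and unknown) instruction count of the two machines set to 1 since it cancels in the comparison.

```python
# Classic CPU performance equation, applied to the example above:
#   CPU time = instruction count x CPI x clock cycle time
# Both computers run the same program, so the instruction count is the same
# on both machines and cancels when comparing them; we set it to 1 here.

instr_count = 1                                   # identical (unknown) on A and B
cpu_time_a = instr_count * 2.0 * 250e-12          # A: CPI 2.0, 250 ps cycle time
cpu_time_b = instr_count * 1.2 * 500e-12          # B: CPI 1.2, 500 ps cycle time

print(f"A: {cpu_time_a * 1e12:.0f} ps per instruction")   # 500 ps
print(f"B: {cpu_time_b * 1e12:.0f} ps per instruction")   # 600 ps
print(f"A is {cpu_time_b / cpu_time_a:.1f} times faster than B")   # 1.2
```

This also matches the relative-performance definition given earlier: Performance_A / Performance_B = Execution time_B / Execution time_A = 600 / 500 = 1.2.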
Addressing Modes
The method used to identify the location of an operand. The following are the MIPS addressing modes:
1. Immediate addressing
2. Register addressing
3. Base or displacement addressing
4. PC-relative addressing
5. Pseudodirect addressing

Addressing Modes cont...
1. Immediate addressing: The operand is a constant contained within the instruction itself.
2. Register addressing: The operand is in a CPU register; the register is specified in the instruction.
3. Base or displacement addressing: The operand is at the memory location whose address is the sum of a register and a constant in the instruction.
4. PC-relative addressing: The branch address is the sum of the PC and a constant in the instruction.
5. Pseudodirect addressing: The jump address is the 26 bits of the instruction concatenated with the upper four bits of the PC.
A sketch of the two address calculations that involve the PC follows below.
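As referenced above, here is a Python sketch of the two PC-based address calculations. It follows standard MIPS conventions, using the address of the following instruction (PC + 4) where the simplified wording above says "the PC"; the PC value, branch offset, and jump field are made-up example numbers.

```python
# Address calculation for MIPS PC-relative branches and pseudodirect jumps.
# The PC value below is an assumed example address.

pc = 0x00400020             # address of the branch/jump instruction (assumed)

# PC-relative addressing: branch target = (PC + 4) + (signed offset << 2).
# The 16-bit immediate counts words, so it is shifted left by 2 bits.
branch_offset = 12                       # 16-bit immediate from the instruction
branch_target = (pc + 4) + (branch_offset << 2)
print(hex(branch_target))                # 0x400054

# Pseudodirect addressing: the 26-bit field is shifted left by 2 and
# concatenated with the upper 4 bits of PC + 4.
jump_field = 0x0100040                   # 26-bit address field from the instruction
jump_target = ((pc + 4) & 0xF0000000) | (jump_field << 2)
print(hex(jump_target))                  # 0x400100
```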
Uniprocessors to Multiprocessors
1. Increasing the clock speed of a uniprocessor has reached saturation; it cannot be increased beyond a certain limit because of power consumption and heat dissipation issues.
2. As the physical size of the chip decreased while the number of transistors per chip increased, clock speed increased, which boosted the heat dissipation across the chip to a dangerous level.
3. Cooling and heat-sink requirements became an issue.
4. There were limitations in the use of silicon surface area.
5. There were limitations in reducing the size of individual gates further.
6. To gain performance within a single core, many techniques such as pipelining, superpipelined, and superscalar architectures are used.
7. Most of the early dual-core processors ran at lower clock speeds; the rationale is that a dual-core processor with each core running at 1 GHz should be equivalent to a single-core processor running at 2 GHz. The problem is that this does not work in practice: the benefit is more on throughput than on response time.

Uniprocessors to Multiprocessors cont...
8. In the past, programmers could rely on innovations in the hardware, architecture, and compilers to double the performance of their programs every 18 months without having to change a line of code.
9. Today, for programmers to get a significant improvement in response time, they need to rewrite their programs to take advantage of multiple processors, and they have to keep improving the performance of their code as the number of cores increases.
10. The need of the hour is the ability to write parallel programs.
11. In multi-core processors, challenges in scheduling and load balancing have to be addressed.
12. Care must be taken to reduce communication and synchronization overhead.

Power Wall: Power and Energy in Integrated Circuits
Power is the biggest challenge facing the computer designer for every class of computer. First, power must be brought in and distributed around the chip, which includes hundreds of pins and multiple interconnection layers just for power and ground. Second, power is dissipated as heat and must be removed.

How should a system architect think about performance, power, and energy? There are three primary concerns.
1. What is the maximum power a processor ever requires?
- If the processor attempts to draw more power than the power supply can provide, by drawing more current, the voltage will eventually drop, which can cause the device to malfunction.
- Modern processors can vary widely in power consumption, with high peak currents.
2. What is the sustained power consumption?
- This metric is called the thermal design power (TDP), since it determines the cooling requirement.
- The power supply is usually designed to match or exceed the TDP.
- Failure to provide adequate cooling will allow the temperature to exceed the maximum value, resulting in device failure.
- Modern processors provide two features to manage heat: (1) reduce the clock rate, thereby reducing power; (2) a thermal overload trip is activated to power down the chip.
3. Which metric is the right one for comparing processors: energy or power?

Power and Energy cont...
- The dominant technology for integrated circuits is called CMOS (complementary metal oxide semiconductor).
- For CMOS, the primary source of energy consumption is so-called dynamic energy, that is, energy consumed when transistors switch states from 0 to 1 and vice versa.
- The dynamic energy depends on the capacitive loading of each transistor and the voltage applied:
      Energy is proportional to Capacitive load x Voltage^2
      Power is proportional to 1/2 x Capacitive load x Voltage^2 x Frequency switched
- The capacitive load per transistor is a function of both the number of transistors connected to an output (called the fan-out) and the technology, which determines the capacitance of both wires and transistors. Frequency switched is a function of the clock rate.
- Although dynamic energy is the primary source of energy consumption in CMOS, static energy consumption also occurs because of leakage current that flows even when a transistor is off. Thus, increasing the number of transistors increases power dissipation, even if the transistors are always off. In servers, leakage is typically responsible for 40% of the energy consumption. A variety of design techniques and technology innovations are being deployed to control leakage, but it is hard to lower the voltage further.
A small numeric sketch of the dynamic power relation follows below.
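The dynamic power relation above can be made concrete with a small sketch; the capacitance, voltage, and frequency are assumed round numbers, not figures for any real chip, and the point is only how strongly the quadratic voltage term matters.

```python
# CMOS dynamic power: P ~ 1/2 * C * V^2 * f  (proportionality, as in the notes).
# All values below are assumed round numbers for illustration only.

def dynamic_power(c_load_farads, voltage_v, freq_hz):
    return 0.5 * c_load_farads * voltage_v**2 * freq_hz

old = dynamic_power(c_load_farads=1e-9, voltage_v=1.0, freq_hz=2e9)    # baseline
new = dynamic_power(c_load_farads=1e-9, voltage_v=0.85, freq_hz=2e9)   # 15% lower V

print(f"baseline power : {old:.2f} W")
print(f"reduced-V power: {new:.2f} W")
print(f"ratio          : {new / old:.2f}")   # ~0.72, i.e. about 28% less power
```

Because the voltage term is squared, lowering the supply voltage has historically been the most effective lever for controlling power, which is why, as noted above, it is a problem that voltages are now hard to lower further.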
Representing Instructions in the Computer
- Instructions are kept in the computer as a series of high and low electronic signals and may be represented as numbers.
- Each piece of an instruction can be considered as an individual number, and placing these numbers side by side forms the instruction (an encoding sketch appears at the end of this section).
- There is a difference between the way humans instruct computers and the way computers see instructions.
- Instruction format: a form of representation of an instruction composed of fields of binary numbers.

Memory Operands
Logical Operations
Instructions for Making Decisions

ALU: Arithmetic and Logic Unit. Hardware that performs addition, subtraction, and other logical operations such as AND and OR.
Exception: Also called an interrupt. An unscheduled event that disrupts program execution; used to detect overflow.

Addition and Subtraction
Subtraction
Four-Bit Adder-Subtractor

Checking Overflow
- If the numbers in the adder-subtractor are considered to be signed, V detects overflow: V = 0 means no overflow, and V = 1 means the result is wrong because of overflow.
- Overflow can happen when adding two numbers of the same sign (both negative or both positive) and the result cannot be represented with the available bits. It can be detected by observing the carry into the sign bit position and the carry out of the sign bit position: if these two carries are not equal, an overflow has occurred. That is why these two carries are applied to an exclusive-OR gate to generate V. (A short sketch at the end of this section traces this check.)

Multiplication
An Improved Version of the Multiplication Hardware
MIPS Multiplication Instructions
Division
An Improved Version of the Division Hardware
MIPS Division Instructions
MIPS Multiply and Division Instructions
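As mentioned under Representing Instructions in the Computer above, an instruction is formed by placing the field values side by side. The sketch below packs the six MIPS R-format fields for add $t0, $s1, $s2 into a 32-bit word; the field widths, register numbers, and function code follow the standard MIPS conventions.

```python
# Packing the MIPS R-format fields (op, rs, rt, rd, shamt, funct) side by side
# to form the 32-bit instruction word for: add $t0, $s1, $s2
# Register numbers: $s1 = 17, $s2 = 18, $t0 = 8; add uses op = 0, funct = 32.

fields = [   # (value, width in bits), left to right
    (0, 6),     # op
    (17, 5),    # rs  = $s1
    (18, 5),    # rt  = $s2
    (8, 5),     # rd  = $t0
    (0, 5),     # shamt
    (32, 6),    # funct = add
]

word = 0
for value, width in fields:
    word = (word << width) | value   # shift earlier fields left, append this one

print(f"{word:032b}")        # 00000010001100100100000000100000
print(f"0x{word:08X}")       # 0x02324020
```

Reading the printed binary back in 6-5-5-5-5-6 bit groups recovers the original field values, which is exactly what the hardware's decoder does.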
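The overflow rule described under Checking Overflow (V is the exclusive-OR of the carry into and the carry out of the sign bit) can be traced with a few lines of Python; the 4-bit width matches the adder-subtractor slides, and the operand values are arbitrary examples.

```python
# 4-bit two's-complement addition with overflow detection, as described under
# "Checking Overflow": V = (carry into sign bit) XOR (carry out of sign bit).

def add4(a, b):
    carry_in_sign  = ((a & 0x7) + (b & 0x7)) >> 3     # carry into bit 3 (sign bit)
    total          = (a & 0xF) + (b & 0xF)
    carry_out_sign = total >> 4                       # carry out of bit 3
    v = carry_in_sign ^ carry_out_sign                # V = 1 means overflow
    return total & 0xF, v

# 5 + 6 = 11 does not fit in the 4-bit signed range (-8..7): overflow expected.
result, v = add4(0b0101, 0b0110)
print(f"result bits={result:04b}, V={v}")    # result 1011 (reads as -5), V=1

# 5 + (-6) = -1 fits: no overflow.
result, v = add4(0b0101, 0b1010)
print(f"result bits={result:04b}, V={v}")    # result 1111 (reads as -1), V=0
```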