Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 1: Introduction

Outline
• Embedded systems overview
  – What are they?
• Design challenge – optimizing design metrics
• Technologies
  – Processor technologies
  – IC technologies
  – Design technologies

Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2000 Vahid/Givargis

Embedded systems overview
• Computing systems are everywhere
• Most of us think of “desktop” computers
  – PC’s
  – Laptops
  – Mainframes
  – Servers
• But there’s another type of computing system
  – Far more common...

Embedded systems overview
• Embedded computing systems
  – Computing systems embedded within electronic devices
  – Hard to define. Nearly any computing system other than a desktop computer
  – Billions of units produced yearly, versus millions of desktop units
  – Perhaps 50 per household and per automobile
[Figure: example devices, captioned “Computers are in here... and here... and even here...” and “Lots more of these, though they cost a lot less each.”]

A “short list” of embedded systems
Anti-lock brakes, Auto-focus cameras, Automatic teller machines, Automatic toll systems, Automatic transmission, Avionic systems, Battery chargers, Camcorders, Cell phones, Cell-phone base stations, Cordless phones, Cruise control, Curbside check-in systems, Digital cameras, Disk drives, Electronic card readers, Electronic instruments, Electronic toys/games, Factory control, Fax machines, Fingerprint identifiers, Home security systems, Life-support systems, Medical testing systems, Modems, MPEG decoders, Network cards, Network switches/routers, On-board navigation, Pagers, Photocopiers, Point-of-sale systems, Portable video games, Printers, Satellite phones, Scanners, Smart ovens/dishwashers, Speech recognizers, Stereo systems, Teleconferencing systems, Televisions, Temperature controllers, Theft tracking systems, TV set-top boxes, VCR’s, DVD players, Video game consoles, Video phones, Washers and dryers
And the list goes on and on

Some common characteristics of embedded systems
• Single-functioned
  – Executes a single program, repeatedly
• Tightly-constrained
  – Low cost, low power, small, fast, etc.
• Reactive and real-time
  – Continually reacts to changes in the system’s environment
  – Must compute certain results in real-time without delay

An embedded system example -- a digital camera
[Figure: digital camera chip block diagram: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, display ctrl, ISA bus interface, UART, LCD ctrl]
• Single-functioned -- always a digital camera
• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent

Design challenge – optimizing design metrics
• Obvious design goal:
  – Construct an implementation with desired functionality
• Key design challenge:
  – Simultaneously optimize numerous design metrics
• Design metric
  – A measurable feature of a system’s implementation
  – Optimizing design metrics is a key challenge

Design challenge – optimizing design metrics
• Common metrics
  – Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
  – NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of designing the system
  – Size: the physical space required by the system
  – Performance: the execution time or throughput of the system
  – Power: the amount of power consumed by the system
  – Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost

Design challenge – optimizing design metrics
• Common metrics (continued)
  – Time-to-prototype: the time needed to build a working version of the system
  – Time-to-market: the time required to develop a system to the point that it can be released and sold to
customers
  – Maintainability: the ability to modify the system after its initial release
  – Correctness, safety, many more

Design metric competition -- improving one may worsen others
• Expertise with both software and hardware is needed to optimize design metrics
  – Not just a hardware or software expert, as is common
  – A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: competing metrics (power, performance, size, NRE cost) shown beside the digital camera chip block diagram, with its hardware and software parts]

Time-to-market: a demanding design metric
• Time required to develop a product to the point it can be sold to customers
• Market window
  – Period during which the product would have highest sales
• Average time-to-market constraint is about 8 months
• Delays can be costly
[Figure: revenues ($) versus time (months)]

Losses due to delayed market entry
• Simplified revenue model
  – Product life = 2W, peak at W
  – Time of market entry defines a triangle, representing market penetration
  – Triangle area equals revenue
• Loss
  – The difference between the on-time and delayed triangle areas
[Figure: revenues ($) versus time, showing the on-time and delayed triangles, market rise and market fall, peak revenue, peak revenue from delayed entry, on-time entry, delayed entry, delay D, W and 2W]

Losses due to delayed market entry (cont.)
• Area = 1/2 * base * height
  – On-time = 1/2 * 2W * W
  – Delayed = 1/2 * (W-D+W)*(W-D)
• Percentage revenue loss = ((On-time - Delayed) / On-time)*100% = (D(3W-D)/(2W^2))*100%
• Try some examples
  – Lifetime 2W=52 wks, delay D=4 wks: (4*(3*26-4))/(2*26^2) = 22%
  – Lifetime 2W=52 wks, delay D=10 wks: (10*(3*26-10))/(2*26^2) = 50%
  – Delays are costly!

NRE and unit cost metrics
• Costs:
  – Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
  – NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of designing the system
  – total cost = NRE cost + unit cost * # of units
  – per-product cost = total cost / # of units = (NRE cost / # of units) + unit cost
• Example
  – NRE=$2000, unit=$100
  – For 10 units
    – total cost = $2000 + 10*$100 = $3000
    – per-product cost = $2000/10 + $100 = $300
  – Amortizing NRE cost over the units results in an additional $200 per unit

NRE and unit cost metrics
• Compare technologies by costs -- best depends on quantity
  – Technology A: NRE=$2,000, unit=$100
  – Technology B: NRE=$30,000, unit=$30
  – Technology C: NRE=$100,000, unit=$2
[Figure: total cost and per-product cost versus number of units (volume, 0 to 2400) for technologies A, B and C]
• But, must also consider time-to-market

The performance design metric
• Widely-used measure of system, widely-abused
  – Clock frequency, instructions per second – not good measures
  – Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second
• Latency (response time)
  – Time between task start and end
  – e.g., Cameras A and B process images in 0.25 seconds
• Throughput
  – Tasks per second, e.g. Camera A processes 4 images per second
  – Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while previous image is being stored).
• Speedup of B over A = B’s performance / A’s performance
  – Throughput speedup = 8/4 = 2

Three key embedded system technologies
• Technology
  – A manner of accomplishing a task, especially using technical processes, methods, or knowledge
• Three key technologies for embedded systems
  – Processor technology
  – IC technology
  – Design technology

Processor technology
• The architecture of the computation engine used to implement a system’s desired functionality
• Processor does not have to be programmable
  – “Processor” not equal to general-purpose processor
[Figure: controller and datapath of three processor types, each implementing total = 0; for i = 1 to …: general-purpose (control logic and state register, IR, PC, register file, general ALU, program memory, data memory); application-specific (control logic and state register, IR, PC, registers, custom ALU, program memory, data memory); single-purpose, “hardware” (control logic, state register, index, total, adder, data memory)]

Processor technology
• Processors vary in their customization for the problem at hand
  Desired functionality:
    total = 0
    for i = 1 to N loop
      total += M[i]
    end loop
[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor, and a single-purpose processor]
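
As a rough check of the arithmetic on the time-to-market and NRE slides above, the following C sketch evaluates the percentage revenue loss D(3W-D)/(2W^2) and the amortized per-product cost. The function names are illustrative and do not come from the slides. The revenue model is the simplified triangular one, so the sketch assumes the delay D lies between 0 and W.

```c
#include <assert.h>

/* Percentage revenue loss when market entry is delayed by `delay`,
   for a product whose market peaks at `w` and whose life is 2*w.
   Assumes the simplified triangular revenue model (0 <= delay <= w). */
double revenue_loss_percent(double delay, double w)
{
    assert(w > 0.0 && delay >= 0.0 && delay <= w);
    return delay * (3.0 * w - delay) / (2.0 * w * w) * 100.0;
}

/* total cost = NRE cost + unit cost * number of units */
double total_cost(double nre, double unit_cost, long units)
{
    return nre + unit_cost * (double)units;
}

/* per-product cost = total cost / number of units */
double per_product_cost(double nre, double unit_cost, long units)
{
    assert(units > 0);
    return total_cost(nre, unit_cost, units) / (double)units;
}
```

With W = 26 weeks, `revenue_loss_percent(4, 26)` evaluates to about 21.9 and `revenue_loss_percent(10, 26)` to about 50.3, in line with the 22% and 50% quoted in the examples. `per_product_cost(2000, 100, 10)` gives 300, the figure from the NRE example.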

General-purpose processors
• Programmable device used in a variety of applications
  – Also known as “microprocessor”
• Features
  – Program memory
  – General datapath with large register file and general ALU
• User benefits
  – Low time-to-market and NRE costs
  – High flexibility
• “Pentium” the most well-known, but there are hundreds of others
[Figure: general-purpose processor: controller (control logic and state register, IR, PC), datapath (register file, general ALU), program memory holding assembly code for total = 0; for i = 1 to …, data memory]

Single-purpose processors
• Digital circuit designed to execute exactly one program
  – a.k.a. coprocessor, accelerator or peripheral
• Features
  – Contains only the components needed to execute a single program
  – No program memory
• Benefits
  – Fast
  – Low power
  – Small size
[Figure: single-purpose processor: controller (control logic, state register), datapath (index, total, adder), data memory]

Application-specific processors
• Programmable processor optimized for a particular class of applications having common characteristics
  – Compromise between general-purpose and single-purpose processors
• Features
  – Program memory
  – Optimized datapath
  – Special functional units
• Benefits
  – Some flexibility, good performance, size and power
[Figure: application-specific processor: controller (control logic and state register, IR, PC), datapath (registers, custom ALU), program memory holding assembly code for total = 0; for i = 1 to …, data memory]

IC technology
• The manner in which a digital (gate-level) implementation is mapped onto an IC
  – IC: Integrated circuit, or “chip”
  – IC technologies differ in their customization to a design
  – IC’s consist of numerous layers (perhaps 10 or more)
    • IC technologies differ with respect to who builds each layer and when
[Figure: IC package, IC, and transistor cross-section: source, gate, oxide, channel, drain, silicon substrate]

IC technology
• Three types of IC technologies
  – Full-custom/VLSI
  – Semi-custom ASIC (gate array and standard cell)
  – PLD (Programmable Logic Device)

Full-custom/VLSI
• All layers are optimized for an embedded system’s particular digital implementation
  – Placing transistors
  – Sizing transistors
  – Routing wires
• Benefits
  – Excellent performance, small size, low power
• Drawbacks
  – High NRE cost (e.g., $300k), long time-to-market

Semi-custom
• Lower layers are fully or partially built
  – Designers are left with routing of wires and maybe placing some blocks
• Benefits
  – Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
• Drawbacks
  – Still require weeks to months to develop

PLD (Programmable Logic Device)
• All layers already exist
  – Designers can purchase an IC
  – Connections on the IC are either created or destroyed to implement desired functionality
  – Field-Programmable Gate Array (FPGA) very popular
• Benefits
  – Low NRE costs, almost instant IC availability
• Drawbacks
  – Bigger, expensive (perhaps $30 per unit), power hungry, slower
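
The IC technologies above trade NRE cost against unit cost, so which one is cheapest depends on production volume. The sketch below picks the lowest-total-cost option for a given number of units using total cost = NRE cost + unit cost * # of units. The struct and function names are hypothetical. Any figures passed in are the caller's; the usage note assumes the three illustrative technologies from the NRE slide read as A: NRE $2,000, unit $100; B: NRE $30,000, unit $30; C: NRE $100,000, unit $2.

```c
#include <stddef.h>

/* One implementation technology: a one-time NRE cost plus a cost per unit. */
struct technology {
    const char *name;
    double nre;
    double unit_cost;
};

/* total cost = NRE cost + unit cost * number of units */
double technology_total_cost(const struct technology *t, long units)
{
    return t->nre + t->unit_cost * (double)units;
}

/* Index of the technology with the lowest total cost at the given volume.
   Ties go to the earlier entry; count must be at least 1. */
size_t cheapest_technology(const struct technology *techs, size_t count, long units)
{
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (technology_total_cost(&techs[i], units) <
            technology_total_cost(&techs[best], units)) {
            best = i;
        }
    }
    return best;
}
```

Under the assumed A/B/C figures, A is cheapest at 100 units, B at 1,000 units and C at 3,000 units; the crossover points fall at 400 units (A to B) and 2,500 units (B to C).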

Moore’s law
• The most important trend in embedded systems
  – Predicted in 1965 by Intel co-founder Gordon Moore
  – IC transistor capacity has doubled roughly every 18 months for the past several decades
[Figure: logic transistors per chip (in millions) over time, from 0.001 to 10,000. Note: logarithmic scale]

Moore’s law
• Wow
  – This growth rate is hard to imagine, most people underestimate
  – How many ancestors do you have from 20 generations ago
    • i.e., roughly how many people alive in the 1500’s did it take to make you?
    • 2^20 = more than 1 million people
  – (This underestimation is the key to pyramid schemes!)

Graphical illustration of Moore’s law
[Figure: timeline from 1981 to 2002 in 3-year steps; leading edge chip in 1981: 10,000 transistors; leading edge chip in 2002: 150,000,000 transistors]
• Something that doubles frequently grows more quickly than most people realize!
  – A 2002 chip can hold about 15,000 1981 chips inside itself

Design Technology
• The manner in which we convert our concept of desired system functionality into an implementation
  – Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
  – Libraries/IP: Incorporates predesigned implementation from lower abstraction level into higher level.
  – Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.
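
The Moore’s law figures quoted on the slides above (a doubling roughly every 18 months, 10,000 transistors in 1981, about 150,000,000 in 2002, and 2^20 ancestors) can be sanity-checked with a few lines of C. This is a minimal sketch; the function names are illustrative, and it counts only whole doubling periods.

```c
/* Number of whole doubling periods that fit between two years. */
int doublings_between(int start_year, int end_year, int months_per_doubling)
{
    return (end_year - start_year) * 12 / months_per_doubling;
}

/* Value reached after doubling `start` the given number of times. */
double after_doublings(double start, int doublings)
{
    double value = start;
    for (int i = 0; i < doublings; i++) {
        value *= 2.0;
    }
    return value;
}
```

From 1981 to 2002 there are 14 doublings of 18 months, and `after_doublings(10000, 14)` gives 163,840,000, the same order as the 150,000,000 transistors quoted for 2002. `after_doublings(1, 20)` gives 1,048,576, the "more than 1 million" ancestors.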
[Figure (Design Technology): abstraction levels and the design technologies applied at each, leading to final implementation]
  Level                      Compilation/Synthesis   Libraries/IP     Test/Verification
  System specification       System synthesis        Hw/Sw/OS         Model simulat./checkers
  Behavioral specification   Behavior synthesis      Cores            Hw-Sw cosimulators
  RT specification           RT synthesis            RT components    HDL simulators
  Logic specification        Logic synthesis         Gates/Cells      Gate simulators

Design productivity exponential increase
[Figure: productivity, (K) Trans./Staff-Mo., from 0.01 to 100,000 on a logarithmic scale, for 1983 through 2009]
• Exponential increase over the past few decades

The co-design ladder
• In the past:
  – Hardware and software design technologies were very different
  – Recent maturation of synthesis enables a unified view of hardware and software
• Hardware/software “codesign”
[Figure: the co-design ladder, from sequential program code (e.g., C, VHDL) down to implementation.
  Software side: compilers (1960's, 1970's), assembly instructions, assemblers and linkers (1950's, 1960's), machine instructions, microprocessor plus program bits: “software”.
  Hardware side: behavioral synthesis (1990's), register transfers, RT synthesis (1980's, 1990's), logic equations / FSM's, logic synthesis (1970's, 1980's), logic gates, VLSI, ASIC, or PLD implementation: “hardware”]
The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

Independence of processor and IC technologies
• Basic tradeoff
  – General vs.
custom
  – With respect to processor technology or IC technology
  – The two technologies are independent
[Figure: spectrum from general to customized. Processor technology: general-purpose processor, ASIP, single-purpose processor. IC technology: PLD, semi-custom, full-custom.
  General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume).
  Customized, providing improved: power efficiency, performance, size, cost (high volume)]

Design productivity gap
• While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
[Figure: IC capacity (logic transistors per chip, in millions) and productivity ((K) Trans./Staff-Mo.) over time on logarithmic scales, with the gap between them widening]

Design productivity gap
• 1981 leading edge chip required 100 designer months
  – 10,000 transistors / 100 transistors/month
• 2002 leading edge chip requires 30,000 designer months
  – 150,000,000 / 5000 transistors/month
• Designer cost increase from $1M to $300M
[Figure: same IC capacity versus productivity plot, with the gap marked]

The mythical man-month
• The situation is even worse than the productivity gap indicates
• In theory, adding designers to team reduces project completion time
• In reality, productivity per designer decreases due to complexities of team management and communication
• In the software community, known as “the mythical man-month” (Brooks 1975)
• At some point, can actually lengthen project completion time!
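
The effect described on this slide can be tabulated with a short C sketch. It assumes the figures used in the slide's own example: a 1M-transistor design, 5000 transistors/month for a lone designer, and a loss of 100 transistors/month per designer for each additional team member. The function names are illustrative only.

```c
/* Productivity of each designer on a team of `designers`, in transistors/month.
   Every additional team member costs each designer `drop_per_extra`. */
long individual_productivity(int designers, long solo_rate, long drop_per_extra)
{
    long rate = solo_rate - drop_per_extra * (long)(designers - 1);
    return rate > 0 ? rate : 0;
}

/* Combined output of the whole team, in transistors/month. */
long team_productivity(int designers, long solo_rate, long drop_per_extra)
{
    return (long)designers *
           individual_productivity(designers, solo_rate, drop_per_extra);
}

/* Months needed to finish the design; -1.0 if the team makes no progress. */
double months_to_complete(long transistors, int designers,
                          long solo_rate, long drop_per_extra)
{
    long team = team_productivity(designers, solo_rate, drop_per_extra);
    if (team <= 0) {
        return -1.0;
    }
    return (double)transistors / (double)team;
}
```

Under these assumptions one designer needs 200 months, 10 designers about 24.4 months, 25 designers about 15.4 months, and 40 designers about 22.7 months, so beyond roughly 25 designers the project takes longer, not shorter.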
  (“Too many cooks”)
• 1M transistors, 1 designer = 5000 trans/month
• Each additional designer reduces every designer's productivity by 100 trans/month
• So 2 designers produce 4900 trans/month each
[Figure: individual and team productivity (trans/month) and months until completion versus number of designers (10 to 40)]

Summary
• Embedded systems are everywhere
• Key challenge: optimization of design metrics
  – Design metrics compete with one another
• A unified view of hardware and software is necessary to improve productivity
• Three key technologies
  – Processor: general-purpose, application-specific, single-purpose
  – IC: Full-custom, semi-custom, PLD
  – Design: Compilation/synthesis, libraries/IP, test/verification

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 10: IC Technology

Outline
• Anatomy of integrated circuits
• Full-Custom (VLSI) IC Technology
• Semi-Custom (ASIC) IC Technology
• Programmable Logic Device (PLD) IC Technology

CMOS transistor
• Source, Drain
  – Diffusion area where electrons can flow
  – Can be connected to metal contacts (via’s)
• Gate
  – Polysilicon area where control voltage is applied
• Oxide
  – SiO2 insulator so the gate voltage can’t leak
[Figure: IC package, IC, and transistor cross-section: source, gate, oxide, channel, drain, silicon substrate]

End of the Moore’s Law?
• Every dimension of the MOSFET has to scale
  – (PMOS) Gate oxide has to scale down to
    • Increase gate capacitance
    • Reduce leakage current from S to D
    • Pinch off current from source to drain
  – Current gate oxide thickness is about 2.5-3nm
    • That’s about 25 atoms!!!

20Ghz +
• FinFET has been manufactured to 18nm
  – Still acts as a very good transistor
• Simulation has shown that it can be scaled to 10nm
  – Quantum effects start to kick in
    • Reduce mobility by ~10%
  – Ballistic transport becomes significant
    • Increase current by about 20%

NAND
• Metal layers for routing (~10)
• PMOS don’t like 0
• NMOS don’t like 1
• A stick diagram forms the basis for mask sets

Silicon manufacturing steps
• Tape out
  – Send design to manufacturing
• Spin
  – One time through the manufacturing process
• Photolithography
  – Drawing patterns by using photoresist to form barriers for deposition

Full Custom
• Very Large Scale Integration (VLSI)
• Placement
  – Place and orient transistors
• Routing
  – Connect transistors
• Sizing
  – Make fat, fast wires or thin, slow wires
  – May also need to size buffer
• Design Rules
  – “simple” rules for correct circuit function
    • Metal/metal spacing, min poly width…

Full Custom
• Best size, power, performance
• Hand design
  – Horrible time-to-market/flexibility/NRE cost…
  – Reserve for the most important units in a processor
    • ALU, Instruction fetch…
• Physical design tools
  – Less optimal, but faster…

Semi-Custom
• Gate Array
  – Array of prefabricated gates
  – “place” and route
  – Higher density, faster time-to-market
  – Does not integrate as well with full-custom
• Standard Cell
  – A library of pre-designed cells
  – Place and route
  – Lower density, higher complexity
  – Integrates well with full-custom

Semi-Custom
• Most popular design style
• Jack of all trades
  – Good
    • Power, time-to-market, performance, NRE cost, per-unit cost, area…
• Master of none
  – Integrate with full custom for critical regions of design

Programmable Logic Device
• Programmable Logic Device
  – Programmable Logic Array, Programmable Array Logic, Field Programmable Gate Array
• All layers already exist
  – Designers can purchase an IC
  – To implement desired functionality
    • Connections on the IC are either created or destroyed to implement
• Benefits
  – Very low NRE costs
  – Great time to market
• Drawback
  – High unit cost, bad for large volume
  – Power
    • Except special PLA
  – Slower
[Figure: example device: 800-6400 usable gates; 5-15 ns delay, up to 125 MHz (2004); a few $’s price]

Xilinx FPGA
[Figure: Xilinx FPGA]

Configurable Logic Block (CLB)
[Figure: configurable logic block]

I/O Block
[Figure: I/O block]

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 8: State Machine and Concurrent Process Model

Outline
• Models vs. Languages
• State Machine Model
  – FSM/FSMD
  – HCFSM and Statecharts Language
  – Program-State Machine (PSM) Model
• Concurrent Process Model
  – Communication
  – Synchronization
  – Implementation
• Dataflow Model
• Real-Time Systems

Introduction
• Describing embedded system’s processing behavior
  – Can be extremely difficult
    • Complexity increasing with increasing IC capacity
      – Past: washing machines, small games, etc.
        • Hundreds of lines of code
      – Today: TV set-top boxes, Cell phone, etc.
        • Hundreds of thousands of lines of code
    • Desired behavior often not fully understood in beginning
      – Many implementation bugs due to description mistakes/omissions
  – English (or other natural language) common starting point
    • Precise description difficult to impossible
    • Example: Motor Vehicle Code – thousands of pages long...

An example of trying to be precise in English
• California Vehicle Code
  – Right-of-way of crosswalks
    • 21950. (a) The driver of a vehicle shall yield the right-of-way to a pedestrian crossing the roadway within any marked crosswalk or within any unmarked crosswalk at an intersection, except as otherwise provided in this chapter.
    • (b) The provisions of this section shall not relieve a pedestrian from the duty of using due care for his or her safety. No pedestrian shall suddenly leave a curb or other place of safety and walk or run into the path of a vehicle which is so close as to constitute an immediate hazard. No pedestrian shall unnecessarily stop or delay traffic while in a marked or unmarked crosswalk.
    • (c) The provisions of subdivision (b) shall not relieve a driver of a vehicle from the duty of exercising due care for the safety of any pedestrian within any marked crosswalk or within any unmarked crosswalk at an intersection.
  – All that just for crossing the street (and there’s much more)!

Models and languages
• How can we (precisely) capture behavior?
  – We may think of languages (C, C++), but computation model is the key
• Common computation models:
  – Sequential program model
    • Statements, rules for composing statements, semantics for executing them
  – Communicating process model
    • Multiple sequential programs running concurrently
  – State machine model
    • For control dominated systems, monitors control inputs, sets control outputs
  – Dataflow model
    • For data dominated systems, transforms input data streams into output streams
  – Object-oriented model
    • For breaking complex software into simpler, well-defined pieces

Models vs. languages
[Figure: models (poetry, recipe, story; state machine, sequential program, dataflow) set against languages (English, Spanish, Japanese; C, C++, Java). Recipes vs. English; sequential programs vs. C]
• Computation models describe system behavior
  – Conceptual notion, e.g., recipe, sequential program
• Languages capture models
  – Concrete form, e.g., English, C
• Variety of languages can capture one model
  – E.g., sequential program model → C, C++, Java
• One language can capture variety of models
  – E.g., C++ → sequential program model, object-oriented model, state machine model
• Certain languages better at capturing certain computation models

Text versus Graphics
• Models versus languages not to be confused with text versus graphics
  – Text and graphics are just two types of languages
    • Text: letters, numbers
    • Graphics: circles, arrows (plus some letters, numbers)
[Figure: the same behavior as text (X = 1; Y = X + 1;) and as graphics (X=1 with an arrow to Y=X+1)]

Introductory example: An elevator controller
• Simple elevator controller
  – Request Resolver resolves various floor requests into single requested floor
  – Unit Control moves elevator to this requested floor
• Try capturing in C...
Partial English description
  “Move the elevator either up or down to reach the requested floor. Once at the requested floor, open the door for at least 10 seconds, and keep it open until the requested floor changes. Ensure the door is never open while moving. Don’t change directions unless there are no higher requests when moving up or no lower requests when moving down…”
[Figure: system interface. Unit Control drives up, down and open, and reads floor and req. Request Resolver produces req from b1, b2, ..., bN (buttons inside elevator) and up1, up2, dn2, up3, dn3, ..., dnN (up/down buttons on each floor)]

Elevator controller using a sequential program model
Sequential program model
  Inputs: int floor; bit b1..bN; up1..upN-1; dn2..dnN;
  Outputs: bit up, down, open;
  Global variables: int req;

  void UnitControl()
  {
    up = down = 0; open = 1;
    while (1) {
      while (req == floor);
      open = 0;
      if (req > floor) { up = 1;}
      else {down = 1;}
      while (req != floor);
      up = down = 0;
      open = 1;
      delay(10);
    }
  }
  void RequestResolver()
  {
    while (1)
      ...
      req = ...
      ...
  }
  void main()
  {
    Call concurrently:
      UnitControl() and RequestResolver()
  }
You might have come up with something having even more if statements.

Finite-state machine (FSM) model
• Trying to capture this behavior as sequential program is a bit awkward
• Instead, we might consider an FSM model, describing the system as:
  – Possible states
    • E.g., Idle, GoingUp, GoingDn, DoorOpen
  – Possible transitions from one state to another based on input
    • E.g., req > floor
  – Actions that occur in each state
    • E.g., In the GoingUp state, u,d,o,t = 1,0,0,0 (up = 1, down, open, and timer_start = 0)
• Try it...

Finite-state machine (FSM) model
UnitControl process using a state machine
  State      u,d,o,t    Transitions
  Idle       0,0,1,0    req > floor: GoingUp; req < floor: GoingDn; req == floor: Idle
  GoingUp    1,0,0,0    req > floor: GoingUp; !(req > floor): DoorOpen
  GoingDn    0,1,0,0    req < floor: GoingDn; !(req < floor): DoorOpen
  DoorOpen   0,0,1,1    timer < 10: DoorOpen; !(timer < 10): Idle
  u is up, d is down, o is open, t is timer_start

Formal definition
• An FSM is a 6-tuple F<S, I, O, F, H, s0>
  – S is a set of all states {s0, s1, …, sl}
  – I is a set of inputs {i0, i1, …, im}
  – O is a set of outputs {o0, o1, …, on}
  – F is a next-state function (S x I → S)
  – H is an output function (S → O)
  – s0 is an initial state
• Moore-type
  – Associates outputs with states (as given above, H maps S → O)
• Mealy-type
  – Associates outputs with transitions (H maps S x I → O)
• Shorthand notations to simplify descriptions
  – Implicitly assign 0 to all unassigned outputs in a state
  – Implicitly AND every transition condition with clock edge (FSM is synchronous)

Finite-state machine with datapath model (FSMD)
• FSMD extends FSM: complex data types and variables for storing data
  – FSMs use only Boolean data types and operations, no variables
• FSMD: 7-tuple <S, I, O, V, F, H, s0>
  – S is a set of states {s0, s1, …, sl}
  – I is a set of inputs {i0, i1, …, im}
  – O is a set of outputs {o0, o1, …, on}
  – V is a set of variables {v0, v1, …, vn}
  – F is a next-state function (S x I x V → S)
  – H is an action function (S → O + V)
  – s0 is an initial state
• I, O, V may represent complex data types (i.e., integers, floating point, etc.)
• F, H may include arithmetic operations
• H is an action function, not just an output function
  – Describes variable updates as well as outputs
• Complete system state now consists of current state, si, and values of all variables
• We described UnitControl as an FSMD (see the UnitControl state table above)

Describing a system as a state machine
1. List all possible states
2. Declare all variables (none in this example)
3. For each state, list possible transitions, with conditions, to other states
4. For each state and/or transition, list associated actions
5. For each state, ensure exclusive and complete exiting transition conditions
   • No two exiting conditions can be true at same time
     – Otherwise nondeterministic state machine
   • One condition must be true at any given time
     – Reducing explicit transitions should be avoided when first learning

State machine vs. sequential program model
• Different thought process used with each model
• State machine:
  – Encourages designer to think of all possible states and transitions among states based on all possible input conditions
• Sequential program model:
  – Designed to transform data through series of instructions that may be iterated and conditionally executed
• State machine description excels in many cases
  – More natural means of computing in those cases
  – Not due to graphical representation (state diagram)
    • Would still have same benefits if textual language used (i.e., state table)
    • Besides, sequential program model could use graphical representation (i.e., flowchart)

Try Capturing Other Behaviors with an FSM
• E.g., Answering machine blinking light when there are messages
• E.g., A simple telephone answering machine that answers after 4 rings when activated
• E.g., A simple crosswalk traffic control light
• Others

Capturing state machines in sequential programming language
• Despite benefits of state machine model, most popular development tools use sequential programming language
  – C, C++, Java, Ada, Verilog, etc.
  – Development tools are complex and expensive, … (… upgrades, … etc.)
  – Language subset approach
    • Most common approach...

Language subset approach
• Follow rules (template) for capturing state machine constructs in equivalent sequential language constructs
• Used with software (e.g., …
• Each case checks transition conditions to determine next state
  – if(…) {state = …;}

  /* req, floor and timer are inputs; up, down, open and timer_start are outputs (globals) */
  #define IDLE     0
  #define GOINGUP  1
  #define GOINGDN  2
  #define DOOROPEN 3
  void UnitControl() {
     int state = IDLE;
     while (1) {
        switch (state) {
           case IDLE: up=0; down=0; open=1; timer_start=0;
              if (req==floor) {state = IDLE;}
              if (req > floor) {state = GOINGUP;}
              if (req < floor) {state = GOINGDN;}
              break;
           case GOINGUP: up=1; down=0; open=0; timer_start=0;
              if (req > floor) {state = GOINGUP;}
              if (!(req>floor)) {state = DOOROPEN;}
              break;
           case GOINGDN: up=0; down=1; open=0; timer_start=0;  /* per the state table: u,d,o,t = 0,1,0,0 */
              if (req < floor) {state = GOINGDN;}
              if (!(req<floor)) {state = DOOROPEN;}
              break;
           case DOOROPEN: up=0; down=0; open=1; timer_start=1;
              if (timer < 10) {state = DOOROPEN;}
              if (!(timer<10)) {state = IDLE;}
              break;
        }
     }
  }
if (req < floor) {state = GOINGDN. VHDL.} if (req < floor) {state = GOINGDN.. if (req==floor) {state = IDLE. } } } UnitControl state machine in sequential programming language 18 .} if (!(req>floor)) {state = DOOROPEN.g..} break.C) and hardware languages (e.VHDL) Capturing UnitControl state machine in C – – – – Enumerate all states (#define) Declare state variable initialized to initial state (IDLE) Single switch statement branches to current state’s case Each case has actions • – up. open=1. training. down=0. timer_start=0.} Embedded Systems Design: A Unified Hardware/Software Introduction.} break.} break. therefore not easy to adapt or replace • Must protect investment • Two approaches to capturing state machine model with sequential programming language – Front-end tool approach • Additional tool installed to support state machine language – Graphical and/or textual state machine languages – May support graphical simulation – Automatically generate code in sequential programming language that is input to main development tool • Drawback: must support additional tool (licensing costs. down. down=0. timer_start=1. open=1. General template #define S0 0 #define S1 1 ... #define SN N void StateMachine() { int state = S0; // or whatever is the initial state. while (1) { switch (state) { S0: // Insert S0’s actions here & Insert transitions Ti leaving S0: if( T0’s condition is true ) {state = T0’s next state; /*actions*/ } if( T1’s condition is true ) {state = T1’s next state; /*actions*/ } ... if( Tm’s condition is true ) {state = Tm’s next state; /*actions*/ } break; S1: // Insert S1’s actions here // Insert transitions Ti leaving S1 break; ... 
             case SN:
                // Insert SN's actions here
                // Insert transitions Ti leaving SN
                break;
          }
       }
    }

HCFSM and the Statecharts language
• Hierarchical/concurrent state machine model (HCFSM)
  – Extension to state machine model to support hierarchy and concurrency
  – States can be decomposed into another state machine
    • With hierarchy has identical functionality as Without hierarchy, but has one less transition (z)
    • Known as OR-decomposition
  – States can execute concurrently
    • Known as AND-decomposition
• Statecharts
  – Graphical language to capture HCFSM
  – timeout: transition with time limit as condition
  – history: remember last substate OR-decomposed state A was in before transitioning to another state B
    • Return to saved substate of A when returning from B instead of initial state
[Figure: states A1, A2 and B with transitions x, y, w, z, drawn without hierarchy and with hierarchy (A1 and A2 grouped inside A). Concurrency: state B containing concurrent substates C (C1, C2; transitions x, y) and D (D1, D2; transitions u, v).]

UnitControl with FireMode
• FireMode
  – When fire is true, move elevator to 1st floor and open door
  – w/o hierarchy: Getting messy!
  – w/ hierarchy: Simple!
[Figure, without hierarchy: UnitControl with states Idle, GoingUp (u,d,o = 1,0,0), GoingDn (u,d,o = 0,1,0), DoorOpen (u,d,o = 0,0,1, left on timeout(10)), FireGoingDn (u,d,o = 0,1,0, stays while floor>1) and FireDrOpen (u,d,o = 0,0,1, entered when floor==1, left on !fire); a fire transition leaves every normal state.]
[Figure, with hierarchy: UnitControl contains NormalMode (Idle, GoingUp, GoingDn, DoorOpen) and FireMode (FireGoingDn, FireDrOpen); fire leads from NormalMode to FireMode and !fire leads back.]
[Figure, with concurrent RequestResolver: ElevatorController contains UnitControl and RequestResolver executing concurrently.]
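The "with hierarchy" version can also be captured with the language subset approach. The following is a minimal sketch in C, not the slides' own code: all identifiers (uc_state, uc_inputs, uc_next, uc_output) are hypothetical, the timeout(10) condition is modeled as a boolean input, and it is assumed that !fire leaves FireMode from either substate and re-enters NormalMode at Idle.

```c
#include <assert.h>
#include <stdbool.h>

/* All identifiers here are hypothetical names chosen for this sketch. */
typedef enum {
    IDLE, GOING_UP, GOING_DN, DOOR_OPEN,   /* substates of NormalMode */
    FIRE_GOING_DN, FIRE_DR_OPEN            /* substates of FireMode   */
} uc_state;

typedef struct { int req; int floor; bool fire; bool timeout; } uc_inputs;
typedef struct { bool up, down, open; } uc_outputs;

static bool in_fire_mode(uc_state s) {
    return s == FIRE_GOING_DN || s == FIRE_DR_OPEN;
}

/* Next-state function. The two superstate transitions are written once,
   ahead of the per-substate logic, instead of once per normal state. */
uc_state uc_next(uc_state s, uc_inputs in) {
    if (!in_fire_mode(s) && in.fire) return FIRE_GOING_DN; /* NormalMode -> FireMode */
    if (in_fire_mode(s) && !in.fire) return IDLE;          /* FireMode -> NormalMode (assumed from any substate) */
    switch (s) {
    case IDLE:
        if (in.req > in.floor) return GOING_UP;
        if (in.req < in.floor) return GOING_DN;
        return IDLE;
    case GOING_UP:      return (in.req > in.floor) ? GOING_UP : DOOR_OPEN;
    case GOING_DN:      return (in.req < in.floor) ? GOING_DN : DOOR_OPEN;
    case DOOR_OPEN:     return in.timeout ? IDLE : DOOR_OPEN;  /* timeout(10) */
    case FIRE_GOING_DN: return (in.floor > 1) ? FIRE_GOING_DN : FIRE_DR_OPEN;
    case FIRE_DR_OPEN:  return FIRE_DR_OPEN;
    }
    return s;
}

/* Moore-style outputs u,d,o for each state, following the slide's values. */
uc_outputs uc_output(uc_state s) {
    uc_outputs out = { false, false, false };
    switch (s) {
    case GOING_UP:      out.up = true;   break;
    case GOING_DN:
    case FIRE_GOING_DN: out.down = true; break;
    case IDLE:
    case DOOR_OPEN:
    case FIRE_DR_OPEN:  out.open = true; break;
    }
    return out;
}
```

A caller would invoke uc_next once per clock tick and drive the outputs from uc_output. Checking the mode-level transitions first is what replaces the four separate fire arcs of the flat diagram.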
Program-state machine model (PSM): HCFSM plus sequential program model
• Program-state’s actions can be FSM or sequential program
  – Designer can choose most appropriate
• Stricter hierarchy than HCFSM used in Statecharts
  – transition between sibling states only, single entry
  – Program-state may “complete”
    • Reaches end of sequential program code, OR
    • FSM transition to special complete substate
• PSM has 2 types of transitions
  – Transition-immediately (TI): taken regardless of source program-state
  – Transition-on-completion (TOC): taken only if condition is true AND source program-state is complete
• SpecCharts: extension of VHDL to capture PSM model
• SpecC: extension of C to capture PSM model
[Figure: ElevatorController declares int req; and contains UnitControl and RequestResolver. UnitControl contains program-states NormalMode and FireMode, connected by fire and !fire transitions.]
    NormalMode:
       up = down = 0; open = 1;
       while (1) {
          while (req == floor);
          open = 0;
          if (req > floor) { up = 1;}
          else {down = 1;}
          while (req != floor);
          up = down = 0;
          open = 1;
          delay(10);
       }
    FireMode:
       up = 0; down = 1; open = 0;
       while (floor > 1);
       up = 0; down = 0; open = 1;
    RequestResolver:
       ...
       req = ...
       ...
• NormalMode and FireMode described as sequential programs
• Black square originating within FireMode indicates !fire is a TOC transition
  – Transition from FireMode to NormalMode only after FireMode completed

Role of appropriate model and language
• Finding appropriate model to capture embedded system is an important step
  – Model shapes the way we think of the system
    • Originally thought of sequence of actions, wrote sequential program
      – First wait for requested floor to differ from target floor
      – Then, we close the door
      – Then, we move up or down to the desired floor
      – Then, we open the door
      – Then, we repeat this sequence
    • To create state machine, we thought in terms of states and transitions among states
      – When system must react to changing inputs, state machine might be best model
        • HCFSM described FireMode easily, clearly
• Language should capture model easily
  – Ideally should have features that directly capture constructs of model
  – FireMode would be very complex in sequential program
    • Checks inserted throughout code
  – Other factors may force choice of different model
    • Structured techniques can be used instead
      – E.g., Template for state machine capture in sequential program language

Concurrent process model
    ConcurrentProcessExample() {
       x = ReadX()
       y = ReadY()
       Call concurrently:
          PrintHelloWorld(x) and
          PrintHowAreYou(y)
    }
    PrintHelloWorld(x) {
       while( 1 ) {
          print "Hello world."
          delay(x);
       }
    }
    PrintHowAreYou(y) {
       while( 1 ) {
          print "How are you?"
          delay(y);
       }
    }
Simple concurrent process example
• Describes functionality of system in terms of two or more concurrently executing subtasks
• Many systems easier to describe with concurrent process model because inherently multitasking
• E.g., simple example:
  – Read two numbers X and Y
  – Display “Hello world.” every X seconds
  – Display “How are you?” every Y seconds
• More effort would be required with sequential program or state machine model
[Figure: Subroutine execution over time. ReadX and ReadY run first, then PrintHelloWorld and PrintHowAreYou run concurrently.]
Sample input and output
    Enter X: 1
    Enter Y: 2
    Hello world.  (Time = 1 s)
    Hello world.  (Time = 2 s)
    How are you?  (Time = 2 s)
    Hello world.  (Time = 3 s)
    How are you?  (Time = 4 s)
    Hello world.  (Time = 4 s)
    ...

Dataflow model
• Derivative of concurrent process model
• Nodes represent transformations
  – May execute concurrently
• Edges represent flow of tokens (data) from one node to another
  – May or may not have token at any given time
• When all of node’s input edges have at least one token, node may fire
• When node fires, it consumes input tokens, processes transformation and generates output token
• Nodes may fire simultaneously
• Several commercial tools support graphical languages for capture of dataflow model
  – Can automatically translate to concurrent process model for implementation
  – Each node becomes a process
[Figure: Nodes with arithmetic transformations. Z = (A + B) * (C - D): A and B feed a + node producing t1, C and D feed a – node producing t2, and t1 and t2 feed a * node producing Z.]
[Figure: Nodes with more complex transformations. A and B feed modulate producing t1, C and D feed convolve producing t2, and t1 and t2 feed transform producing Z.]

Synchronous dataflow
• With digital signal-processors (DSPs), data flows at fixed rate
• Multiple tokens consumed and produced per firing
• Synchronous dataflow model takes advantage of this
  – Each edge labeled with number of tokens consumed/produced each firing
  – Can statically schedule nodes, so can easily use sequential program model
    • Don’t need real-time operating system and its overhead
• How would you map this model to a sequential programming language? Try it...
• Algorithms developed for scheduling nodes into “single-appearance” schedules
  – Only one statement needed to call each node’s associated procedure
    • Allows procedure inlining without code explosion, thus reducing overhead even more
[Figure: Synchronous dataflow. The modulate/convolve/transform graph with token counts on each edge: mA, mB into modulate and mt1 out; mC, mD into convolve and ct2 out; tt1, tt2 into transform and tZ out.]

Concurrent processes and real-time systems

Concurrent processes
• Consider two examples having separate tasks running independently but sharing data
• Difficult to write system using sequential program model
• Concurrent process model easier
  – Separate sequential programs (processes) for each task
  – Programs communicate with each other
Heartbeat Monitoring System (inputs: buttons B[1..4], heart-beat pulse)
    Task 1:
       Read pulse
       If pulse < Lo then Activate Siren
       If pulse > Hi then Activate Siren
       Sleep 1 second
       Repeat
    Task 2:
       If B1/B2 pressed then Lo = Lo +/– 1
       If B3/B4 pressed then Hi = Hi +/– 1
       Sleep 500 ms
       Repeat
Set-top Box (input: Input Signal; outputs: Video, Audio)
    Task 1:
       Read Signal
       Separate Audio/Video
       Send Audio to Task 2
       Send Video to Task 3
       Repeat
    Task 2:
       Wait on Task 1
       Decode/output Audio
       Repeat
    Task 3:
       Wait on Task 1
       Decode/output Video
       Repeat

Process
• A sequential program, typically an infinite loop
  – Executes concurrently with other processes
  – We are about to enter the world of “concurrent programming”
• Basic operations on processes
  – Create and terminate
    • Create is like a procedure call but caller doesn’t wait
      – Created process can itself create new processes
    • Terminate kills a process, destroying all
data
    • In HelloWorld/HowAreYou example, we only created processes
  – Suspend and resume
    • Suspend puts a process on hold, saving state for later execution
    • Resume starts the process again where it left off
  – Join
    • A process suspends until a particular child process finishes execution

Communication among processes
• Processes need to communicate data and signals to solve their computation problem
  – Processes that don’t communicate are just independent programs solving separate problems
• Basic example: producer/consumer
  – Process A produces data items, Process B consumes them
  – E.g., A decodes video packets, B displays decoded packets on a screen
• How do we achieve this communication?
  – Two basic methods
    • Shared memory
    • Message passing
[Figure: Encoded video packets enter processA, decoded video packets pass from processA to processB, and processB’s output goes to the display.]
    processA() {
       // Decode packet
       // Communicate packet to B
    }
    void processB() {
       // Get packet from A
       // Display packet
    }

Shared Memory
• Processes read and write shared variables
  – No time overhead, easy to implement
  – But, hard to use – mistakes are common
• Example: Producer/consumer with a mistake
  – Share buffer[N], count
    • count = # of valid data items in buffer
  – processA produces data items and stores in buffer
    • If buffer is full, must wait
  – processB consumes data items from buffer
    • If buffer is empty, must wait
  – Error when both processes try to update count concurrently (lines 10 and 19) and the following execution sequence occurs. Say “count” is 3.
    • A loads count (count = 3) from memory into register R1 (R1 = 3)
    • A increments R1 (R1 = 4)
    • B loads count (count = 3) from memory into register R2 (R2 = 3)
    • B decrements R2 (R2 = 2)
    • A stores R1 back to count in memory (count = 4)
    • B stores R2 back to count in memory (count = 2)
  – count now has incorrect value of 2
    01: data_type buffer[N];
    02: int count = 0;
    03: void processA() {
    04:   int i;
    05:   while( 1 ) {
    06:     produce(&data);
    07:     while( count == N );/*loop*/
    08:     buffer[i] = data;
    09:     i = (i + 1) % N;
    10:     count = count + 1;
    11:   }
    12: }
    13: void processB() {
    14:   int i;
    15:   while( 1 ) {
    16:     while( count == 0 );/*loop*/
    17:     data = buffer[i];
    18:     i = (i + 1) % N;
    19:     count = count - 1;
    20:     consume(&data);
    21:   }
    22: }
    23: void main() {
    24:   create_process(processA);
    25:   create_process(processB);
    26: }

Message Passing
• Message passing
  – Data explicitly sent from one process to another
    • Sending process performs special operation, send
    • Receiving process must perform special operation, receive, to receive the data
    • Both operations must explicitly specify which process it is sending to or receiving from
    • Receive is blocking, send may or may not be blocking
  – Safer model, but less flexible
    void processA() {
       while( 1 ) {
          produce(&data);
          send(B, &data);
          /* region 1 */
          receive(B, &data);
          consume(&data);
       }
    }
    void processB() {
       while( 1 ) {
          receive(A, &data);
          transform(&data);
          send(A, &data);
          /* region 2 */
       }
    }

Back to Shared Memory: Mutual Exclusion
• Certain sections of code should not be performed concurrently
  – Critical section
    • Possibly noncontiguous section of code where simultaneous updates, by multiple processes to a shared memory location, can occur
• When a process enters the critical section, all other processes must be locked out until it leaves the critical section
  – Mutex
    • A shared object used for locking and unlocking segment of shared data
    • Disallows read/write access to memory it guards
    • Multiple processes can perform lock operation simultaneously, but only one process will acquire lock
    • All other processes trying to obtain lock will be put in blocked state until unlock operation performed by acquiring process when it exits critical section
    • These processes will then be placed in runnable state and will compete for lock again

Correct Shared Memory Solution to the Consumer-Producer Problem
• The primitive mutex is used to ensure critical sections are executed in mutual exclusion of each other
• Following the same execution sequence as before:
  – A/B execute lock operation on count_mutex
  – Either A or B will acquire lock
    • Say B acquires it
    • A will be put in blocked state
  – B loads count (count = 3) from memory into register R2 (R2 = 3)
  – B decrements R2 (R2 = 2)
  – B stores R2 back to count in memory (count = 2)
  – B executes unlock operation
    • A is placed in runnable state again
  – A loads count (count = 2) from memory into register R1 (R1 = 2)
  – A increments R1 (R1 = 3)
  – A stores R1 back to count in memory (count = 3)
• Count now has correct value of 3
    data_type buffer[N];
    int count = 0;
    mutex count_mutex;
    void processA() {
       int i;
       while( 1 ) {
          produce(&data);
          while( count == N );/*loop*/
          buffer[i] = data;
          i = (i + 1) % N;
          count_mutex.lock();
          count = count + 1;
          count_mutex.unlock();
       }
    }
    void processB() {
       int i;
       while( 1 ) {
          while( count == 0 );/*loop*/
          data = buffer[i];
          i = (i + 1) % N;
          count_mutex.lock();
          count = count - 1;
          count_mutex.unlock();
          consume(&data);
       }
    }
    void main() {
       create_process(processA);
       create_process(processB);
    }

Process Communication
• Try modeling “req” value of our elevator controller
  – Using shared memory
  – Using shared memory and mutexes
  – Using message passing
[Figure: System interface. Request Resolver reads b1 b2 ... bN (buttons inside elevator) and up1 up2 dn2 up3 dn3 ... dnN (up/down buttons on each floor) and produces req; Unit Control reads req and floor and drives up, down, open.]

A Common Problem in Concurrent Programming: Deadlock
• Deadlock: A condition where 2 or more processes are blocked waiting for the other to unlock critical sections of code
  – Both processes are then in blocked state
  – Cannot execute unlock operation so will wait forever
• Example code has 2 different critical sections of code that can be accessed simultaneously
  – 2 locks needed (mutex1, mutex2)
  – Following execution sequence produces deadlock
    • A executes lock operation on mutex1 (and acquires it)
    • B executes lock operation on mutex2 (and acquires it)
    • A/B both execute in critical sections 1 and 2, respectively
    • A executes lock operation on mutex2
      – A blocked until B unlocks mutex2
    • B executes lock operation on mutex1
      – B blocked until A unlocks mutex1
    • DEADLOCK!
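The sequence above deadlocks because the two processes take the two mutexes in opposite orders. As a minimal sketch, assuming POSIX threads stand in for the slides' processes and mutex objects (worker, run_ordered and the two counters are hypothetical names), both threads below take mutex1 before mutex2 and release only after holding both:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
static long shared1 = 0, shared2 = 0;   /* data guarded by mutex1 / mutex2 */

/* Every thread locks mutex1 first and mutex2 second, so no thread can hold
   mutex2 while waiting for mutex1, and the circular wait cannot form. */
static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long k = 0; k < n; k++) {
        pthread_mutex_lock(&mutex1);
        shared1++;                       /* critical section 1 */
        pthread_mutex_lock(&mutex2);
        shared2++;                       /* critical section 2 */
        pthread_mutex_unlock(&mutex2);   /* releases start only after all locks are held */
        pthread_mutex_unlock(&mutex1);
    }
    return NULL;
}

/* Runs two workers to completion and returns the sum of both counters. */
long run_ordered(long iterations) {
    pthread_t a, b;
    shared1 = shared2 = 0;
    pthread_create(&a, NULL, worker, &iterations);
    pthread_create(&b, NULL, worker, &iterations);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared1 + shared2;
}
```

With two workers each incrementing both counters, run_ordered(n) returns 4n once both threads have joined. Swapping the lock order in only one of the threads would reintroduce the interleaving described above.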
• One deadlock elimination protocol requires locking of numbered mutexes in increasing order and two-phase locking (2PL)
  – Acquire locks in 1st phase only, release locks in 2nd phase
    mutex mutex1, mutex2;
    void processA() {
       while( 1 ) {
          …
          mutex1.lock();
          /* critical section 1 */
          mutex2.lock();
          /* critical section 2 */
          mutex2.unlock();
          /* critical section 1 */
          mutex1.unlock();
       }
    }
    void processB() {
       while( 1 ) {
          …
          mutex2.lock();
          /* critical section 2 */
          mutex1.lock();
          /* critical section 1 */
          mutex1.unlock();
          /* critical section 2 */
          mutex2.unlock();
       }
    }

Synchronization among processes
• Sometimes concurrently running processes must synchronize their execution
  – When a process must wait for:
    • another process to compute some value
    • reach a known point in their execution
    • signal some condition
• Recall producer-consumer problem
  – processA must wait if buffer is full
  – processB must wait if buffer is empty
  – This is called busy-waiting
    • Process executing loops instead of being blocked
    • CPU time wasted
• More efficient methods
  – Join operation, and blocking send and receive discussed earlier
    • Both block the process so it doesn’t waste CPU time
  – Condition variables and monitors

Condition variables
• Condition variable is an object that has 2 operations, signal and wait
• When process performs a wait on a condition variable, the process is blocked until another process performs a signal on the same condition variable
• How is this done?
  – Process A acquires lock on a mutex
  – Process A performs wait, passing this mutex
    • Causes mutex to be unlocked
  – Process B can now acquire lock on same mutex
  – Process B enters critical section
    • Computes some value and/or make condition true
  – Process B performs signal when condition true
    • Causes process A to implicitly reacquire mutex lock
    • Process A becomes runnable

Condition variable example: consumer-producer
• 2 condition variables
  – buffer_empty
    • Signals at least 1 free location available in buffer
  – buffer_full
    • Signals at least 1 valid data item in buffer
• processA:
  – produces data item
  – acquires lock (cs_mutex) for critical section
  – checks value of count
  – if count = N, buffer is full
    • performs wait operation on buffer_empty
    • this releases the lock on cs_mutex allowing processB to enter critical section, consume data item and free location in buffer
    • processB then performs signal
  – if count < N, buffer is not full
    • processA inserts data into buffer
    • increments count
    • signals processB making it runnable if it has performed a wait operation on buffer_full
Consumer-producer using condition variables
    data_type buffer[N];
    int count = 0;
    mutex cs_mutex;
    condition buffer_empty, buffer_full;
    void processA() {
       int i;
       while( 1 ) {
          produce(&data);
          cs_mutex.lock();
          if( count == N ) buffer_empty.wait(cs_mutex);
          buffer[i] = data;
          i = (i + 1) % N;
          count = count + 1;
          cs_mutex.unlock();
          buffer_full.signal();
       }
    }
    void processB() {
       int i;
       while( 1 ) {
          cs_mutex.lock();
          if( count == 0 ) buffer_full.wait(cs_mutex);
          data = buffer[i];
          i = (i + 1) % N;
          count = count - 1;
          cs_mutex.unlock();
          buffer_empty.signal();
          consume(&data);
       }
    }
    void main() {
       create_process(processA);
       create_process(processB);
    }

Monitors
• Collection of data and methods or subroutines that operate on data similar to an object-oriented paradigm
• Monitor guarantees only 1 process can execute inside monitor at a time
• (a) Process X executes while Process Y has to wait
• (b) Process X performs wait on a condition
  – Process Y allowed to enter and execute
• (c) Process Y signals condition Process X waiting on
  – Process Y blocked
  – Process X allowed to continue executing
• (d) Process X finishes executing in monitor or waits on a condition again
  – Process Y made runnable again
[Figure: four panels (a) to (d), each showing a Monitor box with DATA and CODE, a Waiting area, and the positions of Process X and Process Y at that step.]

Monitor example: consumer-producer
• Single monitor encapsulates both processes along with buffer and count
• One process will be allowed to begin executing first
• If processB allowed to execute first
  – Will execute until it finds count = 0
  – Will perform wait on buffer_full condition variable
  – processA now allowed to enter monitor and execute
  – processA produces data item
  – finds count < N so writes to buffer and increments count
  – processA performs signal on buffer_full condition variable
  – processA blocked
  – processB reenters monitor and continues execution, consumes data, etc.
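C has no monitor construct, but the behavior walked through above can be approximated with a struct that holds the data, one mutex taken on entry to every operation, and two condition variables. The sketch below assumes POSIX threads; bounded_buffer, bb_init, bb_put, bb_get, producer and run_demo are hypothetical names, and the condition variables keep the slides' naming. Unlike the slide listings, the waits re-test their condition in a while loop, because pthread_cond_wait is allowed to return without a matching signal.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N 4   /* buffer capacity, chosen arbitrarily for the sketch */

typedef struct {
    int buffer[N];
    int head, tail, count;
    pthread_mutex_t lock;          /* plays the role of the monitor's entry lock */
    pthread_cond_t  buffer_empty;  /* signaled when a free location exists */
    pthread_cond_t  buffer_full;   /* signaled when a valid data item exists */
} bounded_buffer;

void bb_init(bounded_buffer *b) {
    b->head = b->tail = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->buffer_empty, NULL);
    pthread_cond_init(&b->buffer_full, NULL);
}

/* Producer side: blocks (without busy-waiting) while the buffer is full. */
void bb_put(bounded_buffer *b, int item) {
    pthread_mutex_lock(&b->lock);
    while (b->count == N)
        pthread_cond_wait(&b->buffer_empty, &b->lock);  /* unlocks while waiting */
    b->buffer[b->tail] = item;
    b->tail = (b->tail + 1) % N;
    b->count++;
    pthread_cond_signal(&b->buffer_full);
    pthread_mutex_unlock(&b->lock);
}

/* Consumer side: blocks while the buffer is empty. */
int bb_get(bounded_buffer *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)
        pthread_cond_wait(&b->buffer_full, &b->lock);
    int item = b->buffer[b->head];
    b->head = (b->head + 1) % N;
    b->count--;
    pthread_cond_signal(&b->buffer_empty);
    pthread_mutex_unlock(&b->lock);
    return item;
}

static void *producer(void *arg) {
    bounded_buffer *b = arg;
    for (int k = 1; k <= 100; k++) bb_put(b, k);   /* produce 1..100 */
    return NULL;
}

/* Consumes 100 items from a concurrently running producer and returns their sum. */
long run_demo(void) {
    bounded_buffer b;
    pthread_t p;
    long sum = 0;
    bb_init(&b);
    pthread_create(&p, NULL, producer, &b);
    for (int k = 0; k < 100; k++) sum += bb_get(&b);
    pthread_join(p, NULL);
    return sum;
}
```

Keeping the lock, the conditions and the data in one struct, and touching them only through bb_put and bb_get, is what gives the monitor-like guarantee that only one thread is inside at a time.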
    Monitor {
       data_type buffer[N];
       int count = 0;
       condition buffer_full, buffer_empty;
       void processA() {
          int i;
          while( 1 ) {
             produce(&data);
             if( count == N ) buffer_empty.wait();
             buffer[i] = data;
             i = (i + 1) % N;
             count = count + 1;
             buffer_full.signal();
          }
       }
       void processB() {
          int i;
          while( 1 ) {
             if( count == 0 ) buffer_full.wait();
             data = buffer[i];
             i = (i + 1) % N;
             count = count - 1;
             buffer_empty.signal();
             consume(&data);
          }
       }
    } /* end monitor */
    void main() {
       create_process(processA);
       create_process(processB);
    }

Implementation
• Mapping of system’s functionality onto hardware processors:
  – captured using computational model(s)
  – written in some language(s)
• Implementation choice independent from language(s) choice
• Implementation choice based on power, size, performance, timing and cost requirements
• Final implementation tested for feasibility
  – Also serves as blueprint/prototype for mass manufacturing of final product
[Figure: three layers. Computational models (State machine, Sequent. program, Dataflow, Concurrent processes) map to languages (Pascal, C/C++, Java, VHDL), which map to Implementation A, Implementation B, Implementation C.]
  – The choice of computational model(s) is based on whether it allows the designer to describe the system.
  – The choice of language(s) is based on whether it captures the computational model(s) used by the designer.
  – The choice of implementation is based on whether it meets power, size, performance and cost requirements.
Concurrent process model: implementation
• Can use single and/or general-purpose processors
• (a) Multiple processors, each executing one process
  – True multitasking (parallel processing)
  – General-purpose processors
    • Use programming language like C and compile to instructions of processor
    • Expensive and in most cases not necessary
  – Custom single-purpose processors
    • More common
• (b) One general-purpose processor running all processes
  – Most processes don’t use 100% of processor time
  – Can share processor time and still achieve necessary execution rates
• (c) Combination of (a) and (b)
  – Multiple processes run on one general-purpose processor while one or more processes run on own single-purpose processor
[Figure: (a) Process1 to Process4 each on its own processor (Processor A to Processor D) connected by a communication bus; (b) Process1 to Process4 all on one general-purpose processor; (c) some processes on Processor A and the rest on a general-purpose processor, connected by a communication bus.]

Implementation: multiple processes sharing single processor
• Can manually rewrite processes as a single sequential program
  – Ok for simple examples, but extremely difficult for complex examples
  – Automated techniques have evolved but not common
  – E.g., simple Hello World concurrent program from before would look like:
    I = 1; T = 0;
    while (1) {
       Delay(I); T = T + 1;
       if T modulo X is 0 then call PrintHelloWorld
       if T modulo Y is 0 then call PrintHowAreYou
    }
• Can use multitasking operating system
  – Much more common
  – Operating system schedules processes, allocates storage, and interfaces to peripherals, etc.
  – Real-time operating system (RTOS) can guarantee execution rate constraints are met
  – Describe concurrent processes with languages having built-in processes (Java, Ada, etc.) or a sequential programming language with library support for concurrent processes (C, C++, etc. using POSIX threads for example)
• Can convert processes to sequential program with process scheduling right in code
  – Less overhead (no operating system)
  – More complex/harder to maintain

Processes vs. threads
• Different meanings in operating system terminology
• Regular processes
  – Heavyweight process
  – Own virtual address space (stack, data, code)
  – System resources (e.g., open files)
• Threads
  – Lightweight process
  – Subprocess within process
  – Only program counter, stack, and registers
  – Shares address space, system resources with other threads
    • Allows quicker communication between threads
  – Small compared to heavyweight processes
    • Can be created quickly
    • Low cost switching between threads

Implementation: suspending, resuming, and joining
• Multiple processes mapped to single-purpose processors
  – Built into processor’s implementation
  – Could be extra input signal that is asserted when process suspended
  – Additional logic needed for determining process completion
    • Extra output signals indicating process done
• Multiple processes mapped to single general-purpose processor
  – Built into programming language or special multitasking library like POSIX
  – Language or library may rely on operating system to handle

Implementation: process scheduling
• Must meet timing requirements when multiple concurrent processes implemented on single general-purpose processor
  – Not true multitasking
• Scheduler
  – Special process that decides when and for how long each process is executed
  – Implemented as preemptive or nonpreemptive scheduler
  – Preemptive
    • Determines how long a process executes before preempting to allow another process to execute
      – Time quantum: predetermined amount of execution time
preemptive scheduler allows each process (may be 10 to 100s of milliseconds long)
    • Determines which process will be next to run
  – Nonpreemptive
    • Only determines which process is next after current process finishes execution

Scheduling: priority
• Process with highest priority always selected first by scheduler
  – Typically determined statically during creation and dynamically during execution
• FIFO
  – Runnable processes added to end of FIFO as created or become runnable
  – Front process removed from FIFO when time quantum of current process is up or process is blocked
• Priority queue
  – Runnable processes again added as created or become runnable
  – Process with highest priority chosen when new process needed
  – If multiple processes with same highest priority value then selects from them using first-come first-served
  – Called priority scheduling when nonpreemptive
  – Called round-robin when preemptive

Priority assignment
• Period of process
  – Repeating time interval the process must complete one execution within
    • E.g., period = 100 ms
    • Process must execute once every 100 ms
  – Usually determined by the description of the system
    • E.g., refresh rate of display is 27 times/sec
    • Period = 37 ms
• Execution deadline
  – Amount of time process must be completed by after it has started
    • E.g., execution time = 5 ms, deadline = 20 ms, period = 100 ms
    • Process must complete execution within 20 ms after it has begun regardless of its period
    • Process begins at start of period, runs for 4 ms then is preempted
    • Process suspended for 14 ms, then runs for the remaining 1 ms
    • Completed within 4 + 14 + 1 = 19 ms which meets deadline of 20 ms
    • Without deadline process could be suspended for much longer
• Rate monotonic scheduling
  – Processes with shorter periods have higher priority
  – Typically used when execution deadline = period
• Deadline monotonic scheduling
  – Processes with shorter deadlines have higher priority
  – Typically used when execution deadline < period

Rate monotonic
  Process   Period    Priority
  A         25 ms     5
  B         50 ms     3
  C         12 ms     6
  D         100 ms    1
  E         40 ms     4
  F         75 ms     2

Deadline monotonic
  Process   Deadline  Priority
  G         17 ms     5
  H         50 ms     2
  I         32 ms     3
  J         10 ms     6
  K         140 ms    1
  L         32 ms     4

Real-time systems
• Systems composed of 2 or more cooperating, concurrent processes with stringent execution time constraints
  – E.g., set-top boxes have separate processes that read or decode video and/or sound concurrently and must decode 20 frames/sec for output to appear continuous
  – Other examples with stringent time constraints are:
    • digital cell phones
    • navigation and process control systems
    • assembly line monitoring systems
    • multimedia and networking systems
    • etc.
• Communication and synchronization between processes for these systems is critical
• Therefore, concurrent process model best suited for describing these systems
Real-time operating systems (RTOS)
• Provide mechanisms, primitives, and guidelines for building real-time embedded systems
• Windows CE
  – Built specifically for embedded systems and appliance market
  – Scalable real-time 32-bit platform
  – Supports Windows API
  – Perfect for systems designed to interface with Internet
  – Preemptive priority scheduling with 256 priority levels per process
  – Kernel is 400 Kbytes
• QNX
  – Real-time microkernel surrounded by optional processes (resource managers) that provide POSIX and UNIX compatibility
    • Microkernels typically support only the most basic services
    • Optional resource managers allow scalability from small ROM-based systems to huge multiprocessor systems connected by various networking and communication technologies
  – Preemptive process scheduling using FIFO, round-robin, adaptive, or priority-driven scheduling
  – 32 priority levels per process
  – Microkernel < 10 Kbytes and complies with POSIX real-time standard

Summary
• Computation models are distinct from languages
• Sequential program model is popular
  – Most common languages like C support it directly
• State machine models good for control
  – Extensions like HCFSM provide additional power
  – PSM combines state machines and sequential programs
• Concurrent process model for multi-task systems
  – Communication and synchronization methods exist
  – Scheduling is critical
• Dataflow model good for signal processing

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 2: Custom single-purpose processors

Outline
• Introduction
• Combinational logic
• Sequential logic
• Custom single-purpose processor design
• RT-level custom single-purpose processor design
Introduction
• Processor
  – Digital circuit that performs computation tasks
  – Controller and datapath
  – General-purpose: variety of computation tasks
  – Single-purpose: one particular computation task
  – Custom single-purpose: non-standard task
• A custom single-purpose processor may be
  – Fast, small, low power
  – But, high NRE, longer time-to-market, less flexible
[Figure: digital camera chip: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accum, DMA controller, display ctrl, memory controller, ISA bus interface, UART, LCD ctrl]

CMOS transistor on silicon
• Transistor
  – The basic electrical component in digital systems
  – Acts as an on/off switch
  – Voltage at “gate” controls whether current flows from source to drain
  – Don’t confuse this “gate” with a logic gate
[Figure: IC package, IC, and cross-section of an nMOS transistor on a silicon substrate (source, gate, oxide, channel, drain); conducts if gate at 1]

CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
  – Typically 0 : 0V, 1 : 5V or less
• Two basic CMOS types
  – nMOS conducts if gate at 1
  – pMOS conducts if gate at 0
  – Hence “complementary”
• Basic gates
  – Inverter, NAND, NOR
[Figure: transistor-level inverter F = x', NAND gate F = (xy)', NOR gate F = (x+y)']

Basic logic gates
  Driver:   F = x        Inverter: F = x'
  AND:      F = x y      NAND:     F = (x y)'
  OR:       F = x + y    NOR:      F = (x + y)'
  XOR:      F = x ⊕ y    XNOR:     F = (x ⊕ y)'

  x  |  Driver  Inverter
  0  |  0       1
  1  |  1       0

  x y  |  AND  NAND  OR  NOR  XOR  XNOR
  0 0  |  0    1     0   1    0    1
  0 1  |  0    1     1   0    1    0
  1 0  |  0    1     1   0    1    0
  1 1  |  1    0     1   0    0    1

Combinational logic design
A) Problem description
  y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.
B) Truth table
  Inputs   Outputs
  a b c  |  y z
  0 0 0  |  0 0
  0 0 1  |  0 1
  0 1 0  |  0 1
  0 1 1  |  1 0
  1 0 0  |  1 0
  1 0 1  |  1 1
  1 1 0  |  1 1
  1 1 1  |  1 1
C) Output equations
  y = a'bc + ab'c' + ab'c + abc' + abc
  z = a'b'c + a'bc' + ab'c + abc' + abc
D) Minimized output equations (from the Karnaugh maps for y and z)
  y = a + bc
  z = ab + b'c + bc'
E) Logic gates (random logic)
[Figure: gate-level circuit computing y and z from a, b, c]

Combinational components
• n-bit, m x 1 Multiplexor (inputs I0 … I(m-1), select S0 … S(log m), output O)
  – O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11
• log n x n Decoder (inputs I0 … I(log n -1), outputs O0 … O(n-1))
  – O0 = 1 if I=0..00, O1 = 1 if I=0..01, …, O(n-1) = 1 if I=1..11
  – With enable input e → all O’s are 0 if e=0
• n-bit Adder (inputs A, B, outputs carry, sum)
  – sum = A+B (first n bits)
  – carry = (n+1)’th bit of A+B
  – With carry-in input Ci → sum = A + B + Ci
• n-bit Comparator (inputs A, B, outputs less, equal, greater)
  – less = 1 if A<B, equal = 1 if A=B, greater = 1 if A>B
• n bit, m function ALU (inputs A, B, select S0 … S(log m), output O)
  – O = A op B, op determined by S
  – May have status outputs carry, zero, etc.
Sequential components
• n-bit Register (inputs I, load, clear; output Q)
  – Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise
• n-bit Shift register (inputs I, shift; output Q)
  – Q = lsb
  – Content shifted
  – I stored in msb
• n-bit Counter (output Q)
  – Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1

Sequential logic design
A) Problem Description
  You want to construct a clock divider. Slow down your preexisting clock so that you output a 1 for every four clock cycles.
B) State Diagram
  States 0, 1, 2 output x=0; state 3 outputs x=1. With a=1 the machine advances 0 → 1 → 2 → 3 → 0; with a=0 it stays in its current state.
C) Implementation Model
  State register (Q1, Q0) plus combinational logic that computes I1, I0 and x from a, Q1 and Q0.
  • Given this implementation model
    – Sequential logic design quickly reduces to combinational logic design
D) State Table (Moore-type)
  Inputs      Outputs
  Q1 Q0 a  |  I1 I0  x
  0  0  0  |  0  0   0
  0  0  1  |  0  1   0
  0  1  0  |  0  1   0
  0  1  1  |  1  0   0
  1  0  0  |  1  0   0
  1  0  1  |  1  1   0
  1  1  0  |  1  1   1
  1  1  1  |  0  0   1

Sequential logic design (cont.)
E) Minimized Output Equations (from the Karnaugh maps for I1, I0 and x)
  I1 = Q1’Q0a + Q1a’ + Q1Q0’
  I0 = Q0a’ + Q0’a
  x = Q1Q0
F) Combinational Logic (random logic)
[Figure: gate-level circuit computing I1, I0 and x from a, Q1, Q0]

Custom single-purpose processor basic model
[Figure: controller and datapath. The controller (next-state and control logic, state register) has external control inputs and outputs; the datapath (registers, functional units) has external data inputs and outputs; the controller drives the datapath control inputs and reads the datapath control outputs]

Example: greatest common divisor
• First create algorithm
• Convert algorithm to “complex” state machine
  – Known as FSMD: finite-state machine with datapath
  – Can use templates to perform such conversion
(a) black-box view: inputs go_i, x_i, y_i; output d_o
(b) algorithm
  0: int x, y;
  1: while (1) {
  2:   while (!go_i);
  3:   x = x_i;
  4:   y = y_i;
  5:   while (x != y) {
  6:     if (x < y)
  7:       y = y - x;
         else
  8:       x = x - y;
       }
  9:   d_o = x;
     }
(c) state diagram
  1:    on 1 → 2:; on !1 → 1-J:
  2:    on !go_i → 2-J:; on !(!go_i) → 3:
  2-J:  → 2:
  3:    x = x_i → 4:
  4:    y = y_i → 5:
  5:    on x!=y → 6:; on !(x!=y) → 9:
  6:    on x<y → 7:; on !(x<y) → 8:
  7:    y = y - x → 6-J:
  8:    x = x - y → 6-J:
  6-J:  → 5-J:
  5-J:  → 5:
  9:    d_o = x → 1-J:
  1-J:  → 1:

State diagram templates
• Assignment statement
  – a = b; next statement
  – One state performing a = b, followed by the next statement
• Loop statement
  – while (cond) { loop-body-statements } next statement
  – Condition state C: on cond go to the loop-body statements, which end in join state J: and return to C:; on !cond go to the next statement
• Branch statement
  – if (c1) c1 stmts else if c2 c2 stmts else other stmts; next statement
  – Condition state C: on c1 go to c1 stmts; on !c1*c2 go to c2 stmts; on !c1*!c2 go to others; all branches meet at join state J:, followed by the next statement

Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
  – Based on reads and writes
  – Use multiplexors for multiple sources
• Create unique identifier
  – for each datapath component control input and output
[Figure: GCD datapath. Two n-bit 2x1 multiplexors (x_sel, y_sel) choose between x_i / y_i and the subtractor outputs; registers 0: x (x_ld) and 0: y (y_ld); comparators 5: x!=y (x_neq_y) and 6: x<y (x_lt_y); subtractors 8: x-y and 7: y-x; register 9: d (d_ld) driving d_o]

Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations
  State   Encoding   Datapath configuration
  1:      0000
  2:      0001       (branches on go_i)
  2-J:    0010
  3:      0011       x_sel = 0, x_ld = 1
  4:      0100       y_sel = 0, y_ld = 1
  5:      0101       (branches on x_neq_y)
  6:      0110       (branches on x_lt_y)
  7:      0111       y_sel = 1, y_ld = 1
  8:      1000       x_sel = 1, x_ld = 1
  6-J:    1001
  5-J:    1010
  9:      1011       d_ld = 1
  1-J:    1100

Splitting into a controller and datapath
[Figure: (a) controller implementation model: state register Q3 Q2 Q1 Q0 with next-state inputs I3 I2 I1 I0 and combinational logic; inputs go_i, x_neq_y, x_lt_y; outputs x_sel, y_sel, x_ld, y_ld, d_ld. (b) datapath as above, with inputs x_i, y_i and output d_o]
Controller state table for the GCD example
  Inputs                              Outputs
  Q3 Q2 Q1 Q0  x_neq_y x_lt_y go_i  |  I3 I2 I1 I0  x_sel y_sel x_ld y_ld d_ld
  0  0  0  0   *       *      *     |  0  0  0  1   X     X     0    0    0
  0  0  0  1   *       *      0     |  0  0  1  0   X     X     0    0    0
  0  0  0  1   *       *      1     |  0  0  1  1   X     X     0    0    0
  0  0  1  0   *       *      *     |  0  0  0  1   X     X     0    0    0
  0  0  1  1   *       *      *     |  0  1  0  0   0     X     1    0    0
  0  1  0  0   *       *      *     |  0  1  0  1   X     0     0    1    0
  0  1  0  1   0       *      *     |  1  0  1  1   X     X     0    0    0
  0  1  0  1   1       *      *     |  0  1  1  0   X     X     0    0    0
  0  1  1  0   *       0      *     |  1  0  0  0   X     X     0    0    0
  0  1  1  0   *       1      *     |  0  1  1  1   X     X     0    0    0
  0  1  1  1   *       *      *     |  1  0  0  1   X     1     0    1    0
  1  0  0  0   *       *      *     |  1  0  0  1   1     X     1    0    0
  1  0  0  1   *       *      *     |  1  0  1  0   X     X     0    0    0
  1  0  1  0   *       *      *     |  0  1  0  1   X     X     0    0    0
  1  0  1  1   *       *      *     |  1  1  0  0   X     X     0    0    1
  1  1  0  0   *       *      *     |  0  0  0  0   X     X     0    0    0
  1  1  0  1   *       *      *     |  0  0  0  0   X     X     0    0    0
  1  1  1  0   *       *      *     |  0  0  0  0   X     X     0    0    0
  1  1  1  1   *       *      *     |  0  0  0  0   X     X     0    0    0

Completing the GCD custom single-purpose processor design
• We finished the datapath
• We have a state table for the next state and control logic
  – All that’s left is combinational logic design
• This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller (next-state and control logic, state register) and datapath (registers, functional units)]

RT-level custom single-purpose processor design
• We often start with a state machine
  – Rather than algorithm
  – Cycle timing often too central to functionality
• Example
  – Bus bridge that converts 4-bit bus to 8-bit bus
  – Start with FSMD
  – Known as register-transfer (RT) level
  – Exercise: complete the design
• Problem Specification
  – Bridge: a single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse
  – Sender drives rdy_in, clock and data_in(4); Receiver reads rdy_out and data_out(8)
• FSMD
  – Inputs: rdy_in: bit; data_in: bit[4];
  – Outputs: rdy_out: bit; data_out: bit[8];
  – Variables: data_lo, data_hi: bit[4];
  – States: WaitFirst4, RecFirst4Start (data_lo = data_in), RecFirst4End, WaitSecond4, RecSecond4Start (data_hi = data_in), RecSecond4End, Send8Start (data_out = data_hi & data_lo, rdy_out = 1), Send8End (rdy_out = 0)
  – Transitions are guarded by rdy_in = 0 and rdy_in = 1
[Figure: FSMD state diagram of the bridge]

RT-level custom single-purpose processor design (cont’)
(a) Controller
  – Same states as the FSMD, with the register transfers replaced by load signals: RecFirst4Start: data_lo_ld = 1; RecSecond4Start: data_hi_ld = 1; Send8Start: data_out_ld = 1, rdy_out = 1; Send8End: rdy_out = 0
(b) Datapath
  – Registers data_hi and data_lo loaded from data_in(4) under data_hi_ld and data_lo_ld; register data_out loaded under data_out_ld; clk goes to all registers
[Figure: bridge controller and datapath]

Optimizing custom single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
  – original program
  – FSMD
  – datapath
  – FSM

Optimizing the original program
• Analyze program attributes and look for areas of possible improvement
  – number of computations
  – size of variable
  – time and space complexity
  – operations used
    • multiplication and division very expensive

Optimizing the original program (cont’)
original program
  0: int x, y;
  1: while (1) {
  2:   while (!go_i);
  3:   x = x_i;
  4:   y = y_i;
  5:   while (x != y) {
  6:     if (x < y)
  7:       y = y - x;
         else
  8:       x = x - y;
       }
  9:   d_o = x;
     }
  GCD(42, 8): 9 iterations to complete the loop (counting the final test that exits it)
  x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2)

replace the subtraction operation(s) with modulo operation in order to speed up program

optimized program
  0: int x, y, r;
  1: while (1) {
  2:   while (!go_i);
       // x must be the larger number
  3:   if (x_i >= y_i) {
  4:     x = x_i;
  5:     y = y_i;
       }
  6:   else {
  7:     x = y_i;
  8:     y = x_i;
       }
  9:   while (y != 0) {
  10:    r = x % y;
  11:    x = y;
  12:    y = r;
       }
  13:  d_o = x;
     }
  GCD(42, 8): 3 iterations to complete the loop (counting the final test that exits it)
  x and y values evaluated as follows: (42, 8), (8, 2), (2, 0)

Optimizing the FSMD
• Areas of possible improvements
  – merge states
    • states with constants on transitions can be eliminated, transition taken is already known
    • states with independent operations can be merged
  – separate states
    • states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
  – scheduling

Optimizing the FSMD (cont.)
original FSMD → optimized FSMD
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9
optimized FSMD (int x, y;)
  2:  on !go_i stay in 2:; on go_i → 3:
  3:  x = x_i, y = y_i → 5:
  5:  on x<y → 7:; on x>y → 8:; otherwise → 9:
  7:  y = y - x → 5:
  8:  x = x - y → 5:
  9:  d_o = x → 2:

Optimizing the datapath
• Sharing of functional units
  – one-to-one mapping, as done previously, is not necessary
  – if same operation occurs in different states, they can share a single functional unit
• Multi-functional units
  – ALUs support a variety of operations, it can be shared among operations occurring in different states

Optimizing the FSM
• State encoding
  – task of assigning a unique bit pattern to each state in an FSM
  – size of state register and combinational logic vary
  – can be treated as an ordering problem
• State minimization
  – task of merging equivalent states into a single state
    • two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state

Summary
• Custom single-purpose processors
  – Straightforward design techniques
  – Can be built to execute algorithms
  – Typically start with FSMD
  – CAD tools can be of great assistance

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 3 Instruction-Set Processors: Software

Introduction
• Instruction-Set Processor
  – Processor designed for a variety of computation tasks
    • General-Purpose Processor (GPP)
    • Application-Specific Processor (ASIP): optimized for a specific subset of tasks
  – Low unit cost because NRE is spread over large numbers of units
    • Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
  – Carefully designed since higher NRE is acceptable
    • Can yield good performance, size and power
  – System implementations designed with low NRE cost, short time-to-market/prototype, high flexibility
    • User just writes software; no processor design
  – Terms microprocessor, microcontroller or micro adopted when they were finally implemented on one or few chips
Basic Architecture
• Control unit and datapath
  – Note similarity to single-purpose processor
• Key differences
  – Datapath is general
  – Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers, control/status), connected to I/O and memory]

Datapath Operations
• Load
  – Read memory location into register
• ALU operation
  – Input certain registers through ALU, store back in register
• Store
  – Write register to memory location
[Figure: value 10 loaded from memory, incremented by the ALU (+1) to 11, and stored back to memory]

Control Unit
• Control unit: configures the datapath operations
  – Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
  – Fetch: Get next instruction into IR
  – Decode: Determine what the instruction means
  – Fetch operands: Move data from memory to datapath register
  – Execute: Move data through the ALU
  – Store results: Write data from register to memory
Example program in memory:
  100  load R0, M[500]
  101  inc R1, R0
  102  store M[501], R1
  (M[500] holds 10)

Control Unit Sub-Operations
(running example: PC = 100, IR = load R0, M[500])
• Fetch
  – Get next instruction into IR
  – PC: program counter, always points to next instruction
  – IR: holds the fetched instruction
• Decode
  – Determine what the instruction means
• Fetch operands
  – Move data from memory to datapath register
• Execute
  – Move data through the ALU
  – This particular instruction does nothing during this sub-operation
• Store results
  – Write data from register to memory
  – This particular instruction does nothing during this sub-operation

Instruction Cycles
Each instruction takes the five sub-operations (fetch, decode, fetch operands, execute, store results), one per clock cycle:
  PC=100  load R0, M[500]    R0 = 10
  PC=101  inc R1, R0         R1 = 11
  PC=102  store M[501], R1   M[501] = 11

Architectural Considerations
• N-bit processor
  – N-bit ALU, registers, buses, memory data interface
  – Embedded: 8-bit, 16-bit, 32-bit common
  – Desktop/servers: 32-bit, even 64
• PC size determines address space

Architectural Considerations
• Clock frequency
  – Inverse of clock period
  – Clock period must be longer than longest register to register delay in entire processor
  – Memory access is often the longest

Superscalar and VLIW Architectures
• Performance can be improved by:
  – Faster clock (but there’s a limit)
  – Pipelining: slice up instruction into stages, overlap stages
  – Multiple ALUs to support more than one instruction stream
    • Superscalar
      – Scalar: non-vector operations
      – Fetches instructions in batches, executes as many as possible
        • May require extensive hardware to detect independent instructions
      – VLIW: each word in memory has multiple independent instructions
        • Relies on the compiler to detect and schedule instructions
        • Currently growing in popularity

Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined vs. pipelined dish cleaning (wash, dry) for 8 dishes over time, and pipelined instruction execution with the stages fetch-instr., decode, fetch ops., execute, store res. for 8 instructions over time]

Two Memory Architectures
• Princeton (Von Neumann)
  – Fewer memory wires
• Harvard
  – Simultaneous program and data memory access
[Figure: Harvard: processor with separate program memory and data memory; Princeton: processor with one memory (program and data)]

Cache Memory
• Memory access may be slow
• Cache is small but fast memory close to processor
  – Holds copy of part of memory
  – Hits and misses
[Figure: processor and cache: fast/expensive technology, usually on the same chip; memory: slower/cheaper technology, usually on a different chip]

Programmer’s View
• Programmer doesn’t need detailed understanding of architecture
  – Instead, needs to know what instructions can be executed
• Two levels of instructions:
  – Assembly level
  – Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages
  – But, some assembly level programming may still be necessary
  – Drivers: portion of program that communicates with and/or controls (drives) another device
    • Often have detailed timing considerations, extensive bit manipulation
    • Assembly level may be best for these

Assembly-Level Instructions
  Instruction 1   opcode  operand1  operand2
  Instruction 2   opcode  operand1  operand2
  Instruction 3   opcode  operand1  operand2
  Instruction 4   opcode  operand1  operand2
  ...
• Instruction Set
  – Defines the legal set of instructions for that processor
    • Data transfer: memory/register, register/register, I/O, etc.
    • Arithmetic/logical: move register through ALU and back
    • Branches: determine next PC value when not just PC+1

A Simple (Trivial) Instruction Set
  Assembly instruct.   First byte    Second byte   Operation
                       (opcode Rn)   (operands)
  MOV Rn, direct       0000 Rn       direct        Rn = M(direct)
  MOV direct, Rn       0001 Rn       direct        M(direct) = Rn
  MOV @Rn, Rm          0010 Rn       Rm            M(Rn) = Rm
  MOV Rn, #immed.      0011 Rn       immediate     Rn = immediate
  ADD Rn, Rm           0100 Rn       Rm            Rn = Rn + Rm
  SUB Rn, Rm           0101 Rn       Rm            Rn = Rn - Rm
  JZ Rn, relative      0110 Rn       relative      PC = PC + relative (only if Rn is 0)

Addressing Modes
  Addressing mode     Operand field      Register-file contents   Memory contents
  Immediate           Data
  Register-direct     Register address   Data
  Register indirect   Register address   Memory address           Data
  Direct              Memory address                              Data
  Indirect            Memory address                              Memory address, then Data

Sample Programs
C program
  int total = 0;
  for (int i=10; i!=0; i--)
      total += i;
  // next instructions...
Equivalent assembly program
  0      MOV R0, #0;    // total = 0
  1      MOV R1, #10;   // i = 10
  2      MOV R2, #1;    // constant 1
  3      MOV R3, #0;    // constant 0
  Loop:  JZ R1, Next;   // Done if i=0
  5      ADD R0, R1;    // total += i
  6      SUB R1, R2;    // i--
  7      JZ R3, Loop;   // Jump always
  Next:  // next instructions...
• Try some others
  – Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait until M[254] is 0, set M[255] to 0 (assume those locations are ports).
  – (Harder) Count the occurrences of zero in an array stored in memory locations 100 through 199.

Programmer Considerations
• Program and data memory space
  – Embedded processors often very limited
    • e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• Registers: How many are there?
  – Only a direct concern for assembly-level programmers
• I/O
  – How to communicate with external signals?
• Interrupts

Microprocessor Architecture Overview
• If you are using a particular microprocessor, now is a good time to review its architecture

Example: parallel port driver
  LPT Connection Pin   I/O Direction   Register Address
  1                    Output          0th bit of register #2
  2-9                  Output          0th-7th bit of register #0
  10,11,12,13,15       Input           6,7,5,4,3th bit of register #1
  14,16,17             Output          1,2,3th bit of register #2
• Using assembly language programming we can configure a PC parallel port to perform digital I/O (8255A peripheral I/F controller chip)
  – write and read to three special registers to accomplish this
  – table provides list of parallel port connector pins and corresponding register location
  – Example: parallel port which monitors the input switch and turns the LED on/off accordingly
[Figure: PC parallel port with a switch on pin 13 and an LED on pin 2]
Parallel Port Example

   ; This program consists of a sub-routine that reads
   ; the state of the input pin, determining the on/off state
   ; of our switch and asserts the output pin, turning the LED
   ; on/off accordingly
   ; x86 assembly language

   CheckPort   proc
               push  ax               ; save the content
               push  dx               ; save the content
               mov   dx, 3BCh + 1     ; base + 1 for register #1
               in    al, dx           ; read register #1
               and   al, 10h          ; mask out all but bit # 4
               cmp   al, 0            ; is it 0?
               jne   SwitchOn         ; if not, we need to turn the LED on

   SwitchOff:  mov   dx, 3BCh + 0     ; base + 0 for register #0
               in    al, dx           ; read the current state of the port
               and   al, 0FEh         ; clear first bit (masking); the slide's f7h would clear bit 3 instead
               out   dx, al           ; write it out to the port
               jmp   Done             ; we are done

   SwitchOn:   mov   dx, 3BCh + 0     ; base + 0 for register #0
               in    al, dx           ; read the current state of the port
               or    al, 01h          ; set first bit (masking)
               out   dx, al           ; write it out to the port

   Done:       pop   dx               ; restore the content
               pop   ax               ; restore the content
               ret                    ; return to the caller
   CheckPort   endp

   extern "C" void CheckPort(void);   // defined in assembly

   int main(void) {
      while( 1 ) {
         CheckPort();
      }
   }

Operating System

• Optional software layer providing low-level services to a program (application).
   – File management, disk access
   – Keyboard/display interfacing
   – Scheduling multiple programs for execution
      • Or even just multiple threads from one program
   – Program makes system calls to the OS

   DB file_name "out.txt"   -- store file name

   MOV R0, 1324             -- system call "open" id
   MOV R1, file_name        -- address of file-name
   INT 34                   -- cause a system call
   JZ  R0, L1               -- if zero -> error
   . . .
   read the file
   JMP L2                   -- bypass error cond.
   L1:
   . . . handle the error
   L2:

Development Environment

• Development processor
   – The processor on which we write and debug our programs
      • Usually a PC
• Target processor
   – The processor that the program will run on in our embedded system
      • Often different from the development processor

[Figure: development processor and target processor]

Software Development Process

• Compilers
   – Cross compiler
      • Runs on one processor, but generates code for another
• Assemblers
• Linkers
• Debuggers
• Profilers

[Figure: implementation phase (C files go through the compiler and assembly files through the assembler to binary files; the linker combines binary files and a library into an executable file) followed by the verification phase (debugger, profiler)]

Running a Program

• If development processor is different than target, how can we run our compiled code? Two options:
   – Download to target processor
   – Simulate
• Simulation
   – One method: Hardware description language
      • But slow, not always available
   – Another method: Instruction set simulator (ISS)
      • Runs on development processor, but executes instructions of target processor

Instruction Set Simulator For A Simple Processor

   #include <stdio.h>

   typedef struct {
      unsigned char first_byte, second_byte;
   } instruction;

   instruction program[1024];    //instruction memory
   unsigned char memory[256];    //data memory

   /* Not shown on the slide; a minimal version that dumps data memory. */
   void print_memory_contents(void) {
      int i;
      for( i = 0; i < 256; i++ ) printf("%02x ", memory[i]);
      printf("\n");
   }

   /* Returns 0 on success, -1 on an unknown opcode. */
   int run_program(int num_bytes) {
      int pc = -1;
      unsigned char reg[16] = {0}, fb, sb;

      while( ++pc < (num_bytes / 2) ) {
         fb = program[pc].first_byte;
         sb = program[pc].second_byte;
         switch( fb >> 4 ) {
            case 0: reg[fb & 0x0f] = memory[sb]; break;
            case 1: memory[sb] = reg[fb & 0x0f]; break;
            case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
            case 3: reg[fb & 0x0f] = sb; break;
            case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
            case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
            /* JZ: jump only if Rn is 0; offset treated as signed so that
               backward jumps work (the slide shows only "pc += sb") */
            case 6: if( reg[fb & 0x0f] == 0 ) pc += (signed char)sb; break;
            default: return -1;
         }
      }
      return 0;
   }

   int main(int argc, char *argv[]) {
      FILE* ifs;

      if( argc != 2 || (ifs = fopen(argv[1], "rb")) == NULL ) {
         return -1;
      }
      /* read up to sizeof(program) bytes; fread returns the byte count */
      if( run_program((int)fread(program, 1, sizeof(program), ifs)) == 0 ) {
         print_memory_contents();
         return(0);
      }
      else return(-1);
   }

Testing and Debugging

• ISS
   – Gives us control over time: set breakpoints, look at register values, set values, step-by-step execution, ...
   – But, doesn't interact with real environment
• Download to board
   – Use device programmer
   – Runs in real environment, but not controllable
• Compromise: emulator
   – Runs in real environment, at speed or near
   – Supports some controllability from the PC

[Figure: (a) implementation phase followed by verification phase on the development processor using a debugger/ISS; (b) implementation phase followed by verification phase using an emulator, a programmer, and external tools]

Application-Specific Instruction-Set Processors (ASIPs)

• GPPs
   – Sometimes too general to be effective in demanding application
      • e.g., video processing: requires huge video buffers and operations on large arrays of data, inefficient on a GPP
   – But single-purpose processor has high NRE, not programmable
• ASIPs: targeted to a particular domain
   – Contain architectural features specific to that domain
      • e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc.
   – Still programmable
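Returning to the simple instruction set simulated earlier in this chapter, it can help to see the sample summation loop hand-assembled into bytes and executed. The sketch below is not the slides' simulator. It is a compact stand-alone variant with hypothetical names (`sample`, `run_sample`), and it assumes that the JZ offset is a signed count of instructions relative to the instruction following the JZ. The slides do not spell out that convention.

```c
#include <stdint.h>

/* The sample loop, encoded per the instruction table:
 * first byte = opcode << 4 | Rn; second byte = immediate, Rm << 4, or offset. */
static const uint8_t sample[] = {
    0x30, 0x00,   /* MOV R0, #0     total = 0                */
    0x31, 0x0A,   /* MOV R1, #10    i = 10                   */
    0x32, 0x01,   /* MOV R2, #1     constant 1               */
    0x33, 0x00,   /* MOV R3, #0     constant 0               */
    0x61, 0x03,   /* JZ  R1, +3     Loop: done if i == 0     */
    0x40, 0x10,   /* ADD R0, R1     total += i               */
    0x51, 0x20,   /* SUB R1, R2     i--                      */
    0x63, 0xFC    /* JZ  R3, -4     jump always (R3 is 0)    */
};

/* Executes the program and returns the final value of R0.
 * Returns 0xFF if an unknown opcode is met (an arbitrary choice for this sketch). */
uint8_t run_sample(const uint8_t *prog, int num_bytes)
{
    uint8_t reg[16] = {0};
    uint8_t mem[256] = {0};
    int pc = 0;
    int count = num_bytes / 2;            /* two bytes per instruction */

    while (pc >= 0 && pc < count) {
        uint8_t fb = prog[2 * pc];
        uint8_t sb = prog[2 * pc + 1];
        uint8_t rn = fb & 0x0f;
        uint8_t rm = sb >> 4;
        pc++;                             /* point at the next instruction */
        switch (fb >> 4) {
        case 0: reg[rn] = mem[sb]; break;                         /* MOV Rn, direct  */
        case 1: mem[sb] = reg[rn]; break;                         /* MOV direct, Rn  */
        case 2: mem[reg[rn]] = reg[rm]; break;                    /* MOV @Rn, Rm     */
        case 3: reg[rn] = sb; break;                              /* MOV Rn, #immed. */
        case 4: reg[rn] = (uint8_t)(reg[rn] + reg[rm]); break;    /* ADD */
        case 5: reg[rn] = (uint8_t)(reg[rn] - reg[rm]); break;    /* SUB */
        case 6: if (reg[rn] == 0) pc += (int8_t)sb; break;        /* JZ  */
        default: return 0xFF;
        }
    }
    return reg[0];
}
```

Calling `run_sample(sample, sizeof sample)` sums 10 + 9 + ... + 1 in R0, which matches the C version of the loop.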
A Common ASIP: Microcontroller

• For embedded control applications
   – Reading sensors, setting actuators
   – Mostly dealing with events (bits): data is present, but not in huge amounts
   – e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven
• Microcontroller features
   – On-chip peripherals
      • Timers, analog-digital converters, serial communication, etc.
      • Tightly integrated for programmer, typically part of register space
   – On-chip program and data memory
   – Direct programmer access to many of the chip's pins
   – Specialized instructions for bit-manipulation and other low-level operations

Another Common ASIP: Digital Signal Processors (DSP)

• For signal processing applications
   – Large amounts of digitized data, often streaming
   – Data transformations must be applied fast
   – e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
   – Several instruction execution units
   – Multiply-accumulate single-cycle instruction, other instrs.
   – Efficient vector operations, e.g., add two arrays
      • Vector ALUs, loop buffers, etc.

Trend: Even More Customized ASIPs

• In the past, microprocessors were acquired as chips
• Today, we increasingly acquire a processor as Intellectual Property (IP)
   – e.g., synthesizable VHDL model
• Opportunity to add a custom datapath hardware and a few custom instructions, or delete a few instructions
   – Can have significant performance, power and size impacts
   – Problem: need compiler/debugger for customized ASIP
      • Remember, most development uses structured languages
      • One solution: automatic compiler/debugger generation
         – e.g., www.tensilica.com
      • Another solution: retargettable compilers
         – e.g., www.improvsys.com (customized VLIW architectures)

Selecting a Microprocessor

• Issues
   – Technical: speed, power, size, cost
   – Other: development environment, prior expertise, licensing, etc.
• Speed: how to evaluate a processor's speed?
   – Clock speed, but instructions per cycle may differ
   – Instructions per second, but work per instr. may differ
   – Dhrystone: Synthetic benchmark (1984). Standard code (mostly string handling, no floating point operations). Dhrystones/sec.
      • MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
         – So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
   – SPEC: set of more realistic benchmarks, but oriented to desktops
   – EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
      • Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications

General Purpose Processors

   Processor          Clock speed   Periph.                                       Bus Width   MIPS    Power   Trans.   Price
   General Purpose Processors
   Intel PIII         1 GHz         2x16 K L1, 256K L2, MMX                       32          ~900    97W     ~7M      $900
   IBM PowerPC 750X   550 MHz       2x32 K L1, 256K L2                            32/64       ~1300   5W      ~7M      $900
   MIPS R5000         250 MHz       2x32 K 2 way set assoc.                       32/64       NA      NA      3.6M     NA
   StrongARM SA-110   233 MHz       None                                          32          268     1W      2.1M     NA
   Microcontroller
   Intel 8051         12 MHz        4K ROM, 128 RAM, 32 I/O, Timer, UART          8           ~1      ~0.2W   ~10K     $7
   Motorola 68HC811   3 MHz         4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI      8           ~.5     ~0.1W   ~10K     $5
   Digital Signal Processors
   TI C5416           160 MHz       128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC    16/32       ~600    NA      NA       $34
   Lucent DSP32C      80 MHz        16K Inst., 2K Data, Serial Ports, DMA         32          40      NA      NA       $75

   Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

Designing an Instruction-Set Processor

• Not something an embedded system designer normally would do
   – But instructive to see how simply we can build one top down
   – Remember that real processors aren't usually built this way
      • Much more optimized, much more bottom-up design

FSMD

   Declarations:
      bit PC[16];        // Program Counter
      bit IR[16];        // Instruction Reg.
      bit M[64k][16];    // Memory
      bit RF[16][16];    // Register File

   Aliases:
      Op   IR[15..12]
      Rn   IR[11..8]
      Rm   IR[7..4]
      dir  IR[7..0]
      imm  IR[7..0]
      rel  IR[7..0]

   State    Op     Action                         Next state
   Reset           PC=0;                          Fetch
   Fetch           IR=M[PC]; PC=PC+1              Decode
   Decode          (entered from states below)    selected by Op
   Mov1     0000   RF[Rn] = M[dir]                to Fetch
   Mov2     0001   M[dir] = RF[Rn]                to Fetch
   Mov3     0010   M[@Rn] = RF[Rm]                to Fetch
   Mov4     0011   RF[Rn] = imm                   to Fetch
   Add      0100   RF[Rn] = RF[Rn]+RF[Rm]         to Fetch
   Sub      0101   RF[Rn] = RF[Rn]-RF[Rm]         to Fetch
   Jz       0110   PC=(RF[Rn]=0) ? rel : PC       to Fetch

Architecture of a Simple Microprocessor

• Storage devices for each declared variable
   – register file holds each of the variables
• Functional units to carry out the FSMD operations
   – One ALU carries out every required operation
• Connections added among the components' ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal

[Figure: control unit (controller with next-state and control logic and state register, PC with PCld/PCinc/PCclr, IR with Irld) driving all input control signals and receiving all output control signals of the datapath (2x1 mux selected by RFs; register file RF(16) with RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e; ALU with ALUs and ALUz; 4x1 mux selected by Ms; memory with address A, data D, Mre, Mwe)]

A Simple Microprocessor

FSM operations that replace the FSMD operations after a datapath is created. The pairing of control signals with states below is rebuilt from the scattered slide text and should be checked against the original figure:

   State    FSMD operation                 FSM control signals
   Reset    PC=0;                          PCclr=1;
   Fetch    IR=M[PC]; PC=PC+1              Ms=10; Irld=1; Mre=1; PCinc=1;
   Mov1     RF[Rn] = M[dir]                RFwa=Rn; RFwe=1; RFs=01; Ms=01; Mre=1;
   Mov2     M[dir] = RF[Rn]                RFr1a=Rn; RFr1e=1; Ms=01; Mwe=1;
   Mov3     M[@Rn] = RF[Rm]                RFr1a=Rn; RFr1e=1; RFr2a=Rm; RFr2e=1; Ms=11; Mwe=1;
   Mov4     RF[Rn] = imm                   RFwa=Rn; RFwe=1; RFs=10;
   Add      RF[Rn] = RF[Rn]+RF[Rm]         RFwa=Rn; RFwe=1; RFs=00; RFr1a=Rn; RFr1e=1; RFr2a=Rm; RFr2e=1; ALUs=00
   Sub      RF[Rn] = RF[Rn]-RF[Rm]         RFwa=Rn; RFwe=1; RFs=00; RFr1a=Rn; RFr1e=1; RFr2a=Rm; RFr2e=1; ALUs=01
   Jz       PC=(RF[Rn]=0) ? rel : PC       PCld=ALUz; RFr1a=Rn; RFr1e=1; ALUs=11

You just built a simple microprocessor!

Chapter Summary

• Instruction-Set processors
   – Good performance, low NRE, flexible
• Controller, datapath, and memory
• Structured languages prevail
   – But some assembly level programming still necessary
• Many tools available
   – Including instruction-set simulators, and in-circuit emulators
• ASIPs
   – Microcontrollers, DSPs, network processors, more customized ASIPs
• Choosing among processors is an important step
• Designing an instruction-set processor is conceptually the same as designing a single-purpose processor

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 4: Standard Single Purpose Processors: Peripherals
Introduction

• Single-purpose processors
   – Performs specific computation task
   – Custom single-purpose processors
      • Designed by us for a unique task
   – Standard single-purpose processors
      • "Off-the-shelf": pre-designed for a common task
      • a.k.a. peripherals
      • serial transmission
      • analog/digital conversions

Timers, counters, watchdog timers

• Timer: measures time intervals
   – To generate timed output events
      • e.g., hold traffic light green for 10 s
   – To measure input events
      • e.g., measure a car's speed
• Based on counting clock pulses
   • E.g., let Clk period be 10 ns (f = 100 MHz)
   • And we count 20,000 Clk pulses
   • Then 200 microseconds have passed
   • 16-bit counter would count up to 65,535*10 ns = 655.35 microsec., resolution = 10 ns
   • Top: indicates top count reached, wrap-around

[Figure: basic timer. Clk drives a 16-bit up counter with outputs Cnt (16 bits) and Top, and input Reset]

Counters

• Counter: like a timer, but counts pulses on a general input signal rather than clock
   – e.g., count cars passing over a sensor
   – Can often configure device as either a timer or counter

[Figure: timer/counter. A 2x1 mux controlled by Mode selects Clk or Cnt_in as the input of a 16-bit up counter with outputs Cnt and Top, and input Reset]

Other timer structures

• Interval timer
   – Indicates when desired time interval has passed
   – We set terminal count to desired interval
      • Number of clock cycles = Desired time interval / Clock period
• Cascaded counters
• Prescaler
   – Divides clock
   – Increases range, decreases resolution

[Figure: timer with a terminal count (16-bit up counter compared against the terminal count to produce Top); 16/32-bit timer (two cascaded 16-bit up counters with Cnt1/Top1 and Cnt2/Top2); timer with prescaler (Clk passes through a prescaler controlled by Mode before the 16-bit up counter)]

Example: Reaction Timer

• Measure time between turning light on and user pushing button
   – 16-bit timer, clk period is 83.33 ns (12 MHz), counter increments every 6 cycles
   – Resolution = 6*83.33 ns = 0.5 microsec.
   – Range = 65535*0.5 microseconds = 32.77 milliseconds
   – Want program to count millisec., so initialize counter to 65535 – 1000/0.5 = 63535

[Figure: indicator light, reaction button, and LCD showing "time: 100 ms"]

   /* main.c */

   #define MS_INIT 63535

   void main(void){
      int count_milliseconds = 0;

      configure timer mode
      set Cnt to MS_INIT

      wait a random amount of time
      turn on indicator light
      start timer

      while (user has not pushed reaction button){
         if(Top) {
            stop timer
            set Cnt to MS_INIT
            start timer
            reset Top
            count_milliseconds++;
         }
      }
      turn light off
      printf("time: %i ms", count_milliseconds);
   }

Watchdog timer

• Must reset timer every X time unit, else timer generates a signal
• Common use: detect failure, self-reset
• Another use: timeouts
   – e.g., ATM machine
   – 16-bit timer, 2 millisec. resolution
   – timereg value = 2*(2^16-1)–X = 131070–X
   – For 2 min., X = 120,000 millisec., so timereg = 11070

[Figure: 12 MHz osc feeds a prescaler (/12) producing a 1 MHz clk; scalereg (12 bits) overflows into timereg (16 bits), whose overflow goes to system reset or interrupt; checkreg guards loading of timereg]

   /* main.c */

   main(){
      wait until card inserted
      call watchdog_reset_routine

      while(transaction in progress){
         if(button pressed){
            perform corresponding action
            call watchdog_reset_routine
         }
      }
      /* if watchdog_reset_routine not called every < 2 minutes,
         interrupt_service_routine is called */
   }

   watchdog_reset_routine(){
      /* checkreg is set so we can load value into timereg.
         Zero is loaded into scalereg and 11070 is loaded into timereg */
      checkreg = 1
      scalereg = 0
      timereg = 11070
   }

   void interrupt_service_routine(){
      eject card
      reset screen
   }

Serial Transmission Using UARTs

• UART: Universal Asynchronous Receiver Transmitter
   – Takes parallel data and transmits serially
   – Receives serial data and converts to parallel
• Parity: extra bit for simple error checking
• Start bit, stop bit
• Baud rate
   – signal changes per second
   – bit rate usually higher

[Figure: an embedded device hands the byte 10011011 to the sending UART, which transmits start bit, data bits, and end bit serially to the receiving UART, which reassembles 10011011]

Pulse width modulator

• Generates pulses with specific high/low times
• Duty cycle: % time high
   – Square wave: 50% duty cycle
• Common use: control average voltage to electric device
   – Simpler than DC-DC converter or digital-analog converter
   – DC motor speed, dimmer lights
• Another use: encode commands, receiver uses timer to decode

[Figure: pwm_o against clk for three cases. 25% duty cycle: average pwm_o is 1.25V; 50% duty cycle: average pwm_o is 2.5V; 75% duty cycle: average pwm_o is 3.75V]

Controlling a DC motor with a PWM

   Input Voltage   % of Maximum Voltage Applied   RPM of DC Motor
   0               0                              0
   2.5             50                             1840
   3.75            75                             6900
   5.0             100                            9200

   Relationship between applied voltage and speed of the DC Motor

[Figure: internal structure of PWM. clk_div controls how fast the counter (0 – 254) increments; an 8-bit comparator compares the counter with cycle_high. counter < cycle_high: pwm_o = 1; counter >= cycle_high: pwm_o = 0]

   void main(void){

      /* controls period */
      PWMP = 0xff;
      /* controls duty cycle */
      PWM1 = 0x7f;

      while(1){};
   }

The PWM alone cannot drive the DC motor; a possible way to implement a driver uses an MJE3055T NPN transistor.

[Figure: the processor's PWM output drives the transistor, which switches 5V DC to the motor]

LCD controller

[Figure: microcontroller connected to the LCD controller through a communications bus: E, R/W, RS, and DB7–DB0 (8 bits)]

   void WriteChar(char c){
      RS = 1;           /* indicate data being sent */
      DATA_BUS = c;     /* send data to LCD */
      EnableLCD(45);    /* toggle the LCD with appropriate delay */
   }

   CODES
   I/D = 1 cursor moves left       DL = 1 8-bit
   I/D = 0 cursor moves right      DL = 0 4-bit
   S = 1 with display shift        N = 1 2 rows
   S/C = 1 display shift           N = 0 1 row
   S/C = 0 cursor movement         F = 1 5x10 dots
   R/L = 1 shift to right          F = 0 5x7 dots
   R/L = 0 shift to left

   RS  R/W  DB7 DB6 DB5 DB4 DB3 DB2 DB1 DB0   Description
   0   0    0   0   0   0   0   0   0   1     Clears all display, return cursor home
   0   0    0   0   0   0   0   0   1   *     Returns cursor home
   0   0    0   0   0   0   0   1   I/D S     Sets cursor move direction and/or specifies not to shift display
   0   0    0   0   0   0   1   D   C   B     ON/OFF of all display (D), cursor ON/OFF (C), and blink position (B)
   0   0    0   0   0   1   S/C R/L *   *     Move cursor and shifts display
   0   0    0   0   1   DL  N   F   *   *     Sets interface data length, number of display lines, and character font
   1   0    WRITE DATA                        Writes Data

Keypad controller

[Figure: keypad with rows N1–N4 and columns M1–M4 (N=4, M=4) connected to a keypad controller that outputs k_pressed and a 4-bit key_code]

Stepper motor controller

• Stepper motor: rotates fixed number of degrees when given a "step" signal
   – In contrast, DC motor just rotates when power applied, coasts to stop otherwise
   – Specification: degrees/step or #steps/revol. (e.g., 1.8° or 200 steps)
• Rotation achieved by applying specific voltage sequence to 4 coils (1 or 2 coils driven during each step)
• Controller greatly simplifies this

[Figure: coil drive sequence table (sequence steps 1–5 for coils A, B, A', B') and MC3479P stepper driver pinout (Vd, Vm, A, A', B, B', GND, Bias'/Set, Clk, O|C, Phase A', CW'/CCW, Full'/Half Step), with motor leads Red, White, Yellow, Black]

Stepper motor with controller (driver)

[Figure: 8051 pins P1.0 and P1.1 connected to CW'/CCW (pin 10) and CLK (pin 7) of the MC3479P stepper motor driver, whose outputs A, A', B, B' go through buffers to the stepper motor]

   /* main.c */

   sbit clk=P1^1;
   sbit cw=P1^0;

   void delay(void){
      int i, j;
      for (i=0; i<1000; i++)
         for ( j=0; j<50; j++)
            i = i + 0;
   }

   void main(void){

      /*turn the motor forward */
      cw=0;        /* set direction */
      clk=0;       /* pulse clock */
      delay();
      clk=1;

      /*turn the motor backwards */
      cw=1;        /* set direction */
      clk=0;       /* pulse clock */
      delay();
      clk=1;
   }

The output pins on the stepper motor driver do not provide enough current to drive the stepper motor. To amplify the current, a buffer is needed. In one possible implementation of the buffers, Q1 is an MJE3055T NPN transistor and Q2 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor.

[Figure: buffer circuit with +V, two 1K resistors, Q1, and Q2 between A and B]

Stepper motor without controller (driver)

[Figure: 8051 pins P2.0–P2.4 driving the stepper motor through transistor buffers, with GND/+V]

   /*main.c*/

   sbit notA=P2^0;
   sbit isA=P2^1;
   sbit notB=P2^2;
   sbit isB=P2^3;
   sbit dir=P2^4;

   /* coil drive table; moved to file scope so that move() can see it */
   int lookup[20] = {
      1, 1, 0, 0,
      0, 1, 1, 0,
      0, 0, 1, 1,
      1, 0, 0, 1,
      1, 1, 0, 0 };

   void delay(){
      int a, b;
      for(a=0; a<5000; a++)
         for(b=0; b<10000; b++)
            a=a+0;
   }

   void move(int dir, int steps) {
      int y, z;
      /* clockwise movement */
      if(dir == 1){
         for(y=0; y<=steps; y++){
            for(z=0; z<=19; z+=4){
               isA=lookup[z];
               isB=lookup[z+1];
               notA=lookup[z+2];
               notB=lookup[z+3];
               delay();
            }
         }
      }
      /* counter clockwise movement */
      if(dir==0){
         for(y=0; y<=steps; y++){
            for(z=19; z>=0; z-=4){
               isA=lookup[z];
               isB=lookup[z-1];
               notA=lookup[z-2];
               notB=lookup[z-3];
               delay();
            }
         }
      }
   }

   void main( ){
      while(1){
         /*move forward, 15 degrees (2 steps) */
         move(1, 2);
         /* move backwards, 7.5 degrees (1step)*/
         move(0, 1);
      }
   }

The 8051 alone cannot drive the stepper motor, so several transistors were added to increase the current going to the stepper motor. In one possible implementation of the buffers, Q1 and Q3 are MJE3055T NPN transistors and Q2 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor.

[Figure: buffer circuit with +V, two 1K resistors, a 330 resistor, Q1, Q2, and Q3 between A and B]

Analog-to-digital converters

[Figure: proportionality between analog voltage and 4-bit digital code, from 0V = 0000 up to Vmax = 7.5V = 1111 in 0.5V steps; analog-to-digital plot (analog input in V over time t1–t4, with digital output 0100, 1000, 0110, 0101) and digital-to-analog plot (digital input 0100, 1000, 0110, 0101 over t1–t4, with analog output in V)]

ADC using successive approximation

Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for 5 volts. Then trace the successive-approximation approach to find the correct encoding.

   Va / Vmax = d / (2^n – 1)
   5/15 = d/(2^8 – 1)
   d = 85
   encoding: 01010101

Successive-approximation method

   Step                                              Encoding so far
   ½(Vmax + Vmin) = 7.5 volts;  Vmax = 7.5 volts     0 0 0 0 0 0 0 0
   ½(7.5 + 0) = 3.75 volts;     Vmin = 3.75 volts    0 1 0 0 0 0 0 0
   ½(7.5 + 3.75) = 5.63 volts;  Vmax = 5.63 volts    0 1 0 0 0 0 0 0
   ½(5.63 + 3.75) = 4.69 volts; Vmin = 4.69 volts    0 1 0 1 0 0 0 0
   ½(5.63 + 4.69) = 5.16 volts; Vmax = 5.16 volts    0 1 0 1 0 0 0 0
   ½(5.16 + 4.69) = 4.93 volts; Vmin = 4.93 volts    0 1 0 1 0 1 0 0
   ½(5.16 + 4.93) = 5.05 volts; Vmax = 5.05 volts    0 1 0 1 0 1 0 0
   ½(5.05 + 4.93) = 4.99 volts                       0 1 0 1 0 1 0 1
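The successive-approximation trace above is a binary search over the voltage range, deciding one bit per step from the most significant bit down. A minimal sketch in C follows. The function name `sar_encode` and its parameters are hypothetical, the lower bound of the range is assumed to be 0 volts, and a real converter would compare against a DAC output in hardware rather than use floating-point arithmetic.

```c
/* Successive approximation: halve the [vmin, vmax] window once per bit.
 * vin     analog input voltage
 * vrange  top of the input range (bottom assumed to be 0 V)
 * nbits   width of the digital encoding
 * Returns the encoding as an unsigned integer. */
unsigned sar_encode(double vin, double vrange, int nbits)
{
    double vmin = 0.0;
    double vmax = vrange;
    unsigned code = 0;
    int bit;

    for (bit = nbits - 1; bit >= 0; bit--) {
        double mid = 0.5 * (vmax + vmin);
        if (vin >= mid) {
            code |= 1u << bit;   /* input is in the upper half: bit = 1 */
            vmin = mid;
        } else {
            vmax = mid;          /* input is in the lower half: bit = 0 */
        }
    }
    return code;
}
```

For the worked example, `sar_encode(5.0, 15.0, 8)` makes the same eight comparisons as the trace (7.5, 3.75, 5.63, 4.69, 5.16, 4.93, 5.05, 4.99 volts after rounding) and ends at 01010101.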
Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 5: Memory

Outline

• Memory Write Ability and Storage Permanence
• Common Memory Types
• Composing Memory
• Memory Hierarchy and Cache
• Advanced RAM

Introduction

• Embedded system's functionality aspects
   – Processing
      • processors
      • transformation of data
   – Storage
      • memory
      • retention of data
   – Communication
      • buses
      • transfer of data

Memory: basic concepts

• Stores large number of bits
   – m x n: m words of n bits each
   – k = Log2(m) address input signals
   – or m = 2^k words
   – e.g., 4,096 x 8 memory:
      • 32,768 bits
      • 12 address input signals
      • 8 input/output data signals
• Memory access
   – r/w: selects read or write
   – enable: read or write only when asserted
   – multiport: multiple accesses to different locations simultaneously

[Figure: m × n memory (m words, n bits per word); memory external view: 2^k × n read and write memory with r/w, enable, address inputs A0 … Ak-1, and data lines Qn-1 … Q0]

Write ability/storage permanence

• Traditional ROM/RAM distinctions
   – ROM
      • read only, bits stored without power
   – RAM
      • read and write, lose stored bits without power
• Traditional distinctions blurred
   – Advanced ROMs can be written to
      • e.g., EEPROM
   – Advanced RAMs can hold bits without power
      • e.g., NVRAM
• Write ability
   – Manner and speed a memory can be written
• Storage permanence
   – ability of memory to hold stored bits after they are written

[Figure: write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). Storage permanence axis: near zero, battery life (10 years), tens of years, life of product. Write ability axis: during fabrication only; external programmer, one time only; external programmer, 1,000s of cycles; external programmer OR in-system, 1,000s of cycles; external programmer OR in-system, block-oriented writes, 1,000s of cycles; in-system, fast writes, unlimited cycles. Memories plotted: mask-programmed ROM, OTP ROM, EPROM, EEPROM, FLASH, NVRAM, SRAM/DRAM, and the ideal memory, with the nonvolatile and in-system programmable regions marked]

Write ability

• Ranges of write ability
   – High end
      • processor writes to memory simply and quickly
      • e.g., RAM
   – Middle range
      • processor writes to memory, but slower
      • e.g., FLASH, EEPROM
   – Lower range
      • special equipment, "programmer", must be used to write to memory
      • e.g., EPROM, OTP ROM
   – Low end
      • bits stored only during fabrication
      • e.g., Mask-programmed ROM
• In-system programmable memory
   – Can be written to by a processor in the embedded system using the memory
   – Memories in high end and middle range of write ability

Storage permanence

• Range of storage permanence
   – High end
      • essentially never loses bits
      • e.g., mask-programmed ROM
   – Middle range
      • holds bits days, months, or years after memory's power source turned off
      • e.g., NVRAM
   – Lower range
      • holds bits as long as power supplied to memory
      • e.g., SRAM
   – Low end
      • begins to lose bits almost immediately after written
      • e.g., DRAM
• Nonvolatile memory
   – Holds bits after power is no longer supplied
   – High end and middle range of storage permanence

ROM: "Read-Only" Memory

• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system
• Traditionally written to, "programmed", before inserting to embedded system
• Uses
   – Store software program for general-purpose processor
      • program instructions can be one or more ROM words
   – Store constant data needed by system
   – Implement combinational circuit

[Figure: external view of a 2^k × n ROM with enable, address inputs A0 … Ak-1, and outputs Qn-1 … Q0]

Example: 8 x 4 ROM

• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
• Decoder sets word 2's line to 1 if address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a "programmed" connection with word 2's line
• Word 2 is not connected with data lines Q2 and Q0
• Output is 1010

[Figure: internal view of an 8 × 4 ROM: a 3×8 decoder with enable and inputs A0, A1, A2 drives word lines word 0, word 1, word 2, ...; programmable connections join word lines to data lines, which are wired-OR onto Q3, Q2, Q1, Q0]

Implementing combinational function

• Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM

   Truth table
   Inputs (address)   Outputs
   a  b  c            y  z
   0  0  0            0  0
   0  0  1            0  1
   0  1  0            0  1
   0  1  1            1  0
   1  0  0            1  0
   1  0  1            1  1
   1  1  0            1  1
   1  1  1            1  1

[Figure: 8×2 ROM with enable and address inputs c, b, a; word 0 through word 7 hold the y and z columns of the truth table]

Mask-programmed ROM

• Connections "programmed" at fabrication
   – set of masks
• Lowest write ability
   – only once
• Highest storage permanence
   – bits never change unless damaged
• Typically used for final design of high-volume systems
   – spread out NRE cost for a low unit cost

OTP ROM: One-time programmable ROM

• Connections "programmed" after manufacture by user
   – user provides file of desired contents of ROM
   – file input to machine called ROM programmer
   – each programmable connection is a fuse
   – ROM programmer blows fuses where connections should not exist
• Very low write ability
   – typically written only once and requires ROM programmer device
• Very high storage permanence
   – bits don't change unless reconnected to programmer and more fuses blown
• Commonly used in final products
   – cheaper, harder to inadvertently modify

EPROM: Erasable programmable ROM

• Programmable component is a MOS transistor
   – Transistor has "floating" gate surrounded by an insulator
   – (a) Negative charges form a channel between source and drain storing a logic 1
   – (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0
   – (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate restoring the logic 1
   – (d) An EPROM package showing quartz window through which UV light can pass
• Better write ability
   – can be erased and reprogrammed thousands of times
• Reduced storage permanence
   – program lasts about 10 years but is susceptible to radiation and electric noise
• Typically used during design development

[Figure: (a) floating-gate transistor with 0V at the gate, source and drain; (b) +15V applied at the gate; (c) erase by UV exposure, 5-30 min; (d) EPROM package]

EEPROM: Electrically erasable programmable ROM

• Programmed and erased electronically
   – typically by using higher than normal voltage
   – can program and erase individual words
• Better write ability
   – can be in-system programmable with built-in circuit to provide higher than normal voltage
      • built-in memory controller commonly used to hide details from memory user
   – writes very slow due to erasing and programming
      • "busy" pin indicates to processor EEPROM still writing
   – can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive

Flash Memory

• Extension of EEPROM
   – Same floating gate principle
   – Same write ability and storage permanence
• Fast erase
   – Large blocks of memory erased at once, rather than one word at a time
   – Blocks typically several thousand bytes large
• Writes to single words may be slower
   – Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in nonvolatile memory
   – e.g., digital cameras, TV set-top boxes, cell phones

RAM: "Random-access" memory

• Typically volatile memory
   – bits are not held without power supply
• Read and written to easily by embedded system during execution
• Internal structure more complex than ROM
   – a word consists of several memory cells, each storing 1 bit
   – each input and output data line connects to each cell in its column
   – rd/wr connected to every cell
   – when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read

[Figure: external view of a 2^k × n read and write memory (r/w, enable, A0 … Ak-1, Qn-1 … Q0); internal view of a 4×4 RAM: a 2×4 decoder with enable, A0, A1 selects a row of memory cells; inputs I3 I2 I1 I0, outputs Q3 Q2 Q1 Q0, and rd/wr routed to every cell]

Basic types of RAM

• SRAM: Static RAM
   – Memory cell uses flip-flop to store bit
   – Requires 6 transistors
   – Holds data as long as power supplied
• DRAM: Dynamic RAM
   – Memory cell uses MOS transistor and capacitor to store bit
   – More compact than SRAM
   – "Refresh" required due to capacitor leak
      • word's cells refreshed when read
   – Typical refresh rate 15.625 microsec.
   – Slower to access than SRAM

[Figure: memory cell internals. SRAM cell with Data, Data', and word line W; DRAM cell with Data and word line W]

RAM variations

• PSRAM: Pseudo-static RAM
   – DRAM with built-in memory refresh controller
   – Popular low-cost high-density alternative to SRAM
• NVRAM: Nonvolatile RAM
   – Holds data after external power removed
   – Battery-backed RAM
      • SRAM with own permanently connected battery
      • writes as fast as reads
      • no limit on number of writes unlike nonvolatile ROM-based memory
   – SRAM with EEPROM or flash
      • stores complete RAM contents on EEPROM or flash before power turned off

Example: HM6264 & 27C256 RAM/ROM devices

• Low-cost low-capacity memory devices
• Commonly used in 8-bit microcontroller-based embedded systems
• First two numeric digits indicate device type
   – RAM: 62
   – ROM: 27
• Subsequent digits indicate capacity in kilobits

   Device    Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
   HM6264    85-100             .01                 15                 5
   27C256    90                 .5                  100                5

   device characteristics

[Figure: block diagrams (HM6264 with data, addr, /OE, /WE, /CS1, CS2; 27C256 with data, addr, /OE, /CS) and timing diagrams for the read operation and the write operation (data, addr, OE, WE, /CS1, CS2)]

Example: TC55V2325FF-100 memory device

• 2-megabit synchronous pipelined burst SRAM memory device
• Designed to be interfaced with 32-bit processors
• Capable of fast sequential reads and writes as well as single byte I/O

   Device             Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
   TC55V2325FF-100    10                 na                  1200               3.3

   device characteristics

[Figure: block diagram (data<31…0>, addr<15…0>, /CS1, /CS2, CS3, /WE, /OE, MODE, /ADSP, /ADSC, /ADV, CLK) and timing diagram of a single read operation]

compose several smaller memories into one larger memory
   – Connect side-by-side to increase width of words
   – Connect top to bottom to increase number of words
      • added high-order address line selects smaller memory containing desired word using a decoder
   – Combine techniques to increase number and width of words

[Figure: increase number of words (two 2^m × n ROMs form a 2^(m+1) × n ROM; address lines A0 … Am-1 go to both ROMs and Am feeds a 1×2 decoder that enables one of them); increase width of words (three 2^m × n ROMs side by side form a 2^m × 3n ROM with outputs Q3n-1 … Q0); increase number and width of words]
fast memory • Main memory – Large. slow memory stores entire program and data • Cache – Small. inexpensive. fast memory stores copy of likely accessed parts of larger memory – Can be multiple levels of cache Embedded Systems Design: A Unified Hardware/Software Introduction. (c) 2000 Vahid/Givargis 21 Memory hierarchy • Want inexpensive.Composing memory • • • Memory size needed often differs from size of readily available memories When available memory is larger. (c) 2000 Vahid/Givargis Processor Registers Cache Main memory Disk Tape 22 . simply ignore unneeded high-order address bits and higher data lines When available memory is smaller. expensive. check cache for copy • cache hit – copy is in cache. quick access • cache miss – copy not in cache. and write techniques Embedded Systems Design: A Unified Hardware/Software Introduction.Cache • Usually designed with SRAM – faster but more expensive than DRAM • Usually on same chip as processor – space limited. replacement policies. so much smaller than off-chip main memory – faster access ( 1 cycle vs. read address and possibly its neighbors into cache • Several cache design choices – cache mapping. (c) 2000 Vahid/Givargis 24 . (c) 2000 Vahid/Givargis 23 Cache mapping • Far fewer number of available cache addresses • Are address’ contents in cache? • Cache mapping used to assign main memory address to cache address and determine hit or miss • Three basic techniques: – Direct mapping – Fully associative mapping – Set-associative mapping • Caches partitioned into indivisible blocks or lines of adjacent memory addresses – usually 4 or 8 addresses per line Embedded Systems Design: A Unified Hardware/Software Introduction. several cycles for main memory) • Cache operation: – Request for main memory access (read or write) – First. 
Direct mapping
• Main memory address divided into 2 fields
  – Index
    • cache address
    • number of bits determined by cache size
  – Tag
    • compared with tag stored in cache at address indicated by index
    • if tags match, check valid bit
• Valid bit
  – indicates whether data in slot has been loaded from memory
• Offset
  – used to find particular word in cache line
[Figure: address split into Tag, Index, Offset; the index selects one cache entry (V, T, D); the stored tag is compared (=) with the address tag to produce Valid and Data]

Fully associative mapping
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with desired address
• Valid bit and offset same as direct mapping
[Figure: address split into Tag, Offset; every cache entry (V, T, D) is compared (=) in parallel to produce Valid and Data]

Set-associative mapping
• Compromise between direct mapping and fully associative mapping
• Index same as in direct mapping
• But each cache address contains content and tags of 2 or more memory address locations
• Tags of that set simultaneously compared as in fully associative mapping
• Cache with set size N called N-way set-associative
  – 2-way, 4-way, 8-way are common
[Figure: address split into Tag, Index, Offset; the index selects a set of entries (V, T, D) whose tags are compared (=) in parallel to produce Valid and Data]

Cache-replacement policy
• Technique for choosing which block to replace
  – when fully associative cache is full
  – when set-associative cache’s line is full
• Direct mapped cache has no choice
• Random
  – replace block chosen at random
• LRU: least-recently used
  – replace block not accessed for longest time
• FIFO: first-in-first-out
  – push block onto queue when accessed
  – choose block to replace by popping queue
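The mapping and replacement slides above can be tied together in a small model. The following is a minimal sketch, not a description of any real cache: the class name `SetAssociativeCache`, its parameters, and the use of word addresses are all assumptions made for illustration. It splits an address into tag, index, and offset, treats presence in a set as the valid bit, and applies LRU replacement when a set is full. Setting `ways=1` gives direct mapping, where the policy has no choice.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, num_sets=4, ways=2, line_size=4):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set: keys are tags, ordered from least to
        # most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def split(self, address):
        """Split a word address into (tag, index, offset)."""
        offset = address % self.line_size
        line_number = address // self.line_size
        index = line_number % self.num_sets
        tag = line_number // self.num_sets
        return tag, index, offset

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        tag, index, _ = self.split(address)
        lines = self.sets[index]
        if tag in lines:                 # matching tag on a loaded line: hit
            lines.move_to_end(tag)       # now the most recently used
            return True
        if len(lines) >= self.ways:      # set is full: evict least recently used
            lines.popitem(last=False)
        lines[tag] = True                # presence in the dict acts as the valid bit
        return False


if __name__ == "__main__":
    cache = SetAssociativeCache(num_sets=4, ways=2, line_size=4)
    for addr in (0, 16, 0, 32, 16):      # all five addresses map to set 0
        print(addr, "hit" if cache.access(addr) else "miss")
```

A FIFO policy could be modeled the same way by dropping the `move_to_end` call, so that eviction order depends only on when a line was loaded.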
Cache write techniques
• When written, data cache must update main memory
• Write-through
  – write to main memory whenever cache is written to
  – easiest to implement
  – processor must wait for slower main memory write
  – potential for unnecessary writes
• Write-back
  – main memory only written when “dirty” block replaced
  – extra dirty bit for each block set when cache block written to
  – reduces number of slow main memory writes

Cache impact on system performance
• Most important parameters in terms of performance:
  – Total size of cache
    • total number of data bytes cache can hold
    • tag, valid and other housekeeping bits not included in total
  – Degree of associativity
  – Data block size
• Larger caches achieve lower miss rates but higher access cost
  – e.g.,
    • 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
      – avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
    • 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
      – avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
    • 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
      – avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)

Cache performance trade-offs
• Improving cache hit rate without increasing size
  – Increase line size
  – Change set-associativity
[Figure: % cache miss (0 to 0.16) versus cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way, and 8-way set-associativity]

Advanced RAM
• DRAMs commonly used as main memory in processor based embedded systems
  – high capacity, low cost
• Many variations of DRAMs proposed
  – need to keep pace with processor speeds
  – FPM DRAM: fast page mode DRAM
  – EDO DRAM: extended data out DRAM
  – SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM
  – RDRAM: rambus DRAM

Basic DRAM
• Address bus multiplexed between row and column components
• Row and column addresses are latched in, sequentially, by strobing ras and cas signals, respectively
• Refresh circuitry can be external or internal to DRAM device
  – strobes consecutive memory address periodically causing memory content to be refreshed
  – Refresh circuitry disabled during read or write operation
[Figure: basic DRAM block diagram: address, ras, cas, rd/wr, and clock inputs; row address buffer and row decoder; column address buffer and column decoder; refresh circuit; sense amplifiers; bit storage array; data in buffer and data out buffer]

Fast Page Mode DRAM (FPM DRAM)
• Each row of memory bit array is viewed as a page
• Page contains multiple words
• Individual words addressed by column address
• Timing diagram:
  – row (page) address sent
  – 3 words read consecutively by sending column address for each
• Extra cycle eliminated on each read/write of words from same page
[Figure: timing diagram: ras, cas, address (row, col, col, col), data (data, data, data)]

Extended data out DRAM (EDO DRAM)
• Improvement of FPM DRAM
• Extra latch before output buffer
  – allows strobing of cas before data read operation completed
• Reduces read/write latency by additional cycle
[Figure: timing diagram: ras, cas, address (row, col, col, col), data (data, data, data), annotated “Speedup through overlap”]
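Returning to the three cache sizes listed under “Cache impact on system performance”: the averages there all follow one formula, hit rate times hit cost plus miss rate times miss cost. The sketch below reproduces that arithmetic. The function name `average_access_cost` and the `configs` list are illustrative assumptions; the miss rates and cycle counts are the ones quoted on the slide.

```python
def average_access_cost(miss_rate, hit_cost, miss_cost):
    """Average cycles per memory access for a given miss rate (0.0 to 1.0)."""
    hit_rate = 1.0 - miss_rate
    return hit_rate * hit_cost + miss_rate * miss_cost


# (label, miss rate, hit cost in cycles); miss cost is 20 cycles in every case.
configs = [
    ("2 Kbyte", 0.15, 2),
    ("4 Kbyte", 0.065, 3),
    ("8 Kbyte", 0.05565, 4),
]

if __name__ == "__main__":
    for label, miss_rate, hit_cost in configs:
        cost = average_access_cost(miss_rate, hit_cost, miss_cost=20)
        print(f"{label} cache: {cost:.4f} cycles per access")
```

The 8 Kbyte case comes out worse than the 4 Kbyte case because the extra cycle of hit cost is paid on roughly 94% of accesses, while the miss rate falls by less than one percentage point.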
Synchronous (S) and Enhanced Synchronous (ES) DRAM
• SDRAM latches data on active edge of clock
• Eliminates time to detect ras/cas and rd/wr signals
• A counter is initialized to column address then incremented on active edge of clock to access consecutive memory locations
• ESDRAM improves SDRAM
  – added buffers enable overlapping of column addressing
  – faster clocking and lower read/write latency possible
[Figure: timing diagram: clock, ras, cas, address (row, col), data (data, data, data)]

Rambus DRAM (RDRAM)
• More of a bus interface architecture than DRAM architecture
• Data is latched on both rising and falling edge of clock
• Broken into 4 banks each with own row decoder
  – can have 4 pages open at a time
• Capable of very high throughput

DRAM integration problem
• SRAM easily integrated on same chip as processor
• DRAM more difficult
  – Different chip making process between DRAM and conventional logic
  – Goal of conventional logic (IC) designers:
    • minimize parasitic capacitance to reduce signal propagation delays and power consumption
  – Goal of DRAM designers:
    • create capacitor cells to retain stored information
  – Integration processes beginning to appear

Memory Management Unit (MMU)
• Duties of MMU
  – Handles DRAM refresh, bus interface and arbitration
  – Takes care of memory sharing among multiple processors
  – Translates logical memory addresses from processor to physical memory addresses of DRAM
• Modern CPUs often come with MMU built-in
• Single-purpose processors can be used

Embedded Systems Design: A Unified Hardware/Software Introduction Chapter 11: Design Technology
Outline
• Automation: synthesis
• Verification: hardware/software co-simulation
• Reuse: intellectual property cores
• Design process models

Introduction
• Design task
  – Define system functionality
  – Convert functionality to physical implementation while
    • Satisfying constrained metrics
    • Optimizing other design metrics
• Designing embedded systems is hard
  – Complex functionality
    • Millions of possible environment scenarios
    • Competing, tightly constrained metrics
  – Productivity gap
    • As low as 10 lines of code or 100 transistors produced per day

Improving productivity
• Design technologies developed to improve productivity
• We focus on technologies advancing hardware/software unified view
  – Automation
    • Program replaces manual design
    • Synthesis
  – Reuse
    • Predesigned components
    • Cores
    • General-purpose and single-purpose processors on single IC
  – Verification
    • Ensuring correctness/completeness of each design step
    • Hardware/software co-simulation
[Figure: Specification to Implementation, supported by Automation, Verification, and Reuse]

Automation: synthesis
• Early design mostly hardware
• Software complexity increased with advent of general-purpose processor
• Different techniques for software design and hardware design
  – Caused division of the two fields
• Design tools evolve for higher levels of abstraction
  – Different rate in each field
• Hardware/software design fields rejoining
  – Both can start from behavioral description in sequential program model
  – 30 years longer for hardware design to reach this step in the ladder
    • Many more design dimensions
    • Optimization critical
[Figure: the codesign ladder. Software side: sequential program code (e.g., C, VHDL), compilers (1960s, 1970s), assembly instructions, assemblers and linkers (1950s, 1960s), machine instructions, microprocessor plus program bits. Hardware side: sequential program code, behavioral synthesis (1990s), register transfers, RT synthesis (1980s, 1990s), logic equations / FSMs, logic synthesis (1970s, 1980s), logic gates, VLSI, ASIC, or PLD implementation]

Hardware/software parallel evolution
• Software design evolution
  – Machine instructions
  – Assemblers
    • convert assembly programs into machine instructions
  – Compilers
    • translate sequential programs into assembly
• Hardware design evolution
  – Interconnected logic gates
  – Logic synthesis
    • converts logic equations or FSMs into gates
  – Register-transfer (RT) synthesis
    • converts FSMDs into FSMs, logic equations, predesigned RT components (registers, adders, etc.)
  – Behavioral synthesis
    • converts sequential programs into FSMDs
[Figure: the codesign ladder, as on the previous slide]

Increasing abstraction level
• Higher abstraction level focus of hardware/software design evolution
  – Description smaller/easier to capture
    • E.g., line of sequential program code can translate to 1000 gates
  – Many more possible implementations available
    • (a) Like flashlight, the higher above the ground, the more ground illuminated
      – Sequential program designs may differ in performance/transistor count by orders of magnitude
      – Logic-level designs may differ by only power of 2
    • (b) Design process proceeds to lower abstraction level, narrowing in on single implementation
[Figure: (a) and (b) show the levels idea, back-of-the-envelope, sequential program, register-transfers, logic, implementation; moving down, modeling cost increases and opportunities decrease]

Synthesis
• Automatically converting system’s behavioral description to a structural implementation
  – Complex whole formed by parts
  – Structural implementation must optimize design metrics
• More expensive, complex than compilers
  – Cost = $100s to $10,000s
  – User controls 100s of synthesis options
  – Optimization critical
    • Otherwise could use software
  – Optimizations different for each user
  – Run time = hours, days

Gajski’s Y-chart
• Each axis represents type of description
  – Behavioral
    • Defines outputs as function of inputs
    • Algorithms but no implementation
  – Structural
    • Implements behavior by connecting components with known behavior
  – Physical
    • Gives size/locations of components and wires on chip/board
• Synthesis converts behavior at given level to structure at same level or lower
  – E.g. (X marks a conversion that synthesis does not perform),
    • FSM → gates, flip-flops (same level)
    • FSM → transistors (lower level)
    • FSM X registers, FUs (higher level)
    • FSM X processors, memories (higher level)
[Figure: Y-chart with Behavior, Structural, and Physical axes. Behavior: sequential programs, register transfers, logic equations/FSM, transfer functions. Structural: processors, memories; registers, FUs, MUXs; gates, flip-flops; transistors. Physical: cell layout, modules, chips, boards]

Logic synthesis
• Logic-level behavior to structural implementation
  – Logic equations and/or FSM to connected gates
• Combinational logic synthesis
  – Two-level minimization (sum of products/product of sums)
    • Best possible performance
      – Longest path = 2 gates (AND gate + OR gate/OR gate + AND gate)
    • Minimize size
      – Minimum cover
      – Minimum cover that is prime
      – Heuristics
  – Multilevel minimization
    • Trade performance for size
    • Pareto-optimal solution
      – Heuristics
• FSM synthesis
  – State minimization
  – State encoding

Two-level minimization
• Represent logic function as sum of products (or product of sums)
  – AND gate for each product
  – OR gate for each sum
• Gives best possible performance
  – At most 2 gate delay
• Goal: minimize size
  – Minimum cover
    • Minimum # of AND gates (sum of products)
  – Minimum cover that is prime
    • Minimum # of inputs to each AND gate (sum of products)
• Sum of products: F = abc'd' + a'b'cd + a'bcd + ab'cd
• Direct implementation: 4 4-input AND gates and 1 4-input OR gate → 40 transistors
[Figure: direct implementation with inputs a, b, c, d and output F]

Minimum cover
• Minimum # of AND gates (sum of products)
• Literal: variable or its complement
  – a or a', b or b', etc.
• Minterm: product of literals
  – Each variable appears exactly once, as itself or its complement
    • abc'd', ab'cd, a'bcd, etc.
• Implicant: product of literals
  – Each variable appears no more than once
    • abc'd', a'cd, etc.
  – Covers 1 or more minterms
    • a'cd covers a'bcd and a'b'cd
• Cover: set of implicants that covers all minterms of function
• Minimum cover: cover with minimum # of implicants

Minimum cover: K-map approach
• Karnaugh map (K-map)
  – 1 represents minterm
  – Circle represents implicant
• Minimum cover
  – Covering all 1’s with min # of circles
  – Example: direct vs. min cover
    • Fewer gates
      – 4 vs. 5
    • Fewer transistors
      – 28 vs. 40

K-map: sum of products (rows ab, columns cd)
ab\cd   00   01   11   10
00       0    0    1    0
01       0    0    1    0
11       1    0    0    0
10       0    0    1    0

• Minimum cover: F = abc'd' + a'cd + ab'cd
• Minimum cover implementation: 2 4-input AND gates, 1 3-input AND gate, 1 3-input OR gate → 28 transistors
[Figure: the same K-map with the minimum cover circled, and its implementation with inputs a, b, c, d and output F]

Minimum cover that is prime
• Minimum # of inputs to AND gates
• Prime implicant
  – Implicant not covered by any other implicant
  – Max-sized circle in K-map
• Minimum cover that is prime
  – Covering with min # of prime implicants
  – Min # of max-sized circles
  – Example: prime cover vs. min cover
    • Same # of gates
      – 4 vs. 4
    • Fewer transistors
      – 26 vs. 28
• Minimum cover that is prime: F = abc'd' + a'cd + b'cd
• Implementation: 1 4-input AND gate, 2 3-input AND gates, 1 3-input OR gate → 26 transistors
[Figure: the same K-map with max-sized circles, and the implementation with inputs a, b, c, d and output F]

Minimum cover: heuristics
• K-maps give optimal solution every time
  – Functions with > 6 inputs too complicated
  – Use computer-based tabular method
    • Finds all prime implicants
    • Finds min cover that is prime
    • Also optimal solution every time
    • Problem: 2^n minterms for n inputs
      – 32 inputs = 4 billion minterms
      – Exponential complexity
• Heuristic
  – Solution technique where optimal solution not guaranteed
  – Hopefully comes close
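The three sum-of-products forms of F on the two-level minimization slides (direct, minimum cover, minimum cover that is prime) can be checked mechanically. The sketch below is only an illustration: the helper names `parse_term`, `truth_table`, and `transistor_count` are made up here, and the cost model assumes 2 transistors per gate input with one AND gate per product term and a single OR gate, which is what reproduces the 40, 28, and 26 transistor figures.

```python
from itertools import product


def parse_term(term):
    """Turn a product term such as "ab'cd" into [(variable, required value), ...]."""
    literals = []
    i = 0
    while i < len(term):
        complemented = i + 1 < len(term) and term[i + 1] == "'"
        literals.append((term[i], 0 if complemented else 1))
        i += 2 if complemented else 1
    return literals


def truth_table(cover, variables="abcd"):
    """Evaluate a sum of products for every input combination, in counting order."""
    table = []
    for bits in product((0, 1), repeat=len(variables)):
        values = dict(zip(variables, bits))
        hit = any(all(values[v] == want for v, want in parse_term(t)) for t in cover)
        table.append(int(hit))
    return table


def transistor_count(cover):
    """Assumed cost model: 2 transistors per gate input, one OR input per term."""
    and_inputs = sum(len(parse_term(t)) for t in cover)
    or_inputs = len(cover)
    return 2 * (and_inputs + or_inputs)


direct = ["abc'd'", "a'b'cd", "a'bcd", "ab'cd"]
min_cover = ["abc'd'", "a'cd", "ab'cd"]
prime_cover = ["abc'd'", "a'cd", "b'cd"]

if __name__ == "__main__":
    for name, cover in (("direct", direct), ("minimum", min_cover), ("prime", prime_cover)):
        same = truth_table(cover) == truth_table(direct)
        print(name, "equivalent:", same, "transistors:", transistor_count(cover))
```

Enumerating all 2^n rows is exactly the exponential cost the heuristics slide warns about; it is practical here only because F has four inputs.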
Heuristics: iterative improvement
• Start with initial solution
  – i.e., original logic equation
• Repeatedly make modifications toward better solution
• Common modifications
  – Expand
    • Replace each nonprime implicant with a prime implicant covering it
    • Delete all implicants covered by new prime implicant
  – Reduce
    • Opposite of expand
  – Reshape
    • Expands one implicant while reducing another
    • Maintains total # of implicants
  – Irredundant
    • Selects min # of implicants that cover from existing implicants
• Synthesis tools differ in modifications used and the order they are used

Multilevel logic minimization
• Trade performance for size
  – Increase delay for lower # of gates
  – Gray area represents all possible solutions
  – Circle with X represents ideal solution
    • Generally not possible
  – 2-level gives best performance
    • max delay = 2 gates
    • Solve for smallest size
  – Multilevel gives pareto-optimal solution
    • Minimum delay for a given size
    • Minimum size for a given delay
[Figure: delay versus size, with the 2-level minimized solution marked]

Example
• Minimized 2-level logic function:
  – F = adef + bdef + cdef + gh
  – Requires 5 gates with 18 total gate inputs
    • 4 ANDs and 1 OR
• After algebraic manipulation:
  – F = (a + b + c)def + gh
  – Requires only 4 gates with 11 total gate inputs
    • 2 ANDs and 2 ORs
  – Fewer inputs per gate
  – Assume each gate input = 2 transistors
    • Reduced by 14 transistors
      – 36 (18 * 2) down to 22 (11 * 2)
  – Sacrifices performance for size
    • Inputs a, b, and c now have 3-gate delay
• Iterative improvement heuristic commonly used
[Figure: 2-level minimized and multilevel minimized circuits with inputs a through h and output F]

FSM synthesis
• FSM to gates
• State minimization
  – Reduce # of states
    • Identify and merge equivalent states
      – Outputs, next states same for all possible inputs
  – Tabular method gives exact solution
    • Table of all possible state pairs
    • If n states, n^2 table entries
    • Thus, heuristics used with large # of states
• State encoding
  – Unique bit sequence for each state
  – If n states, log2(n) bits
  – n! possible encodings
  – Thus, heuristics common

Technology mapping
• Library of gates available for implementation
  – Simple
    • only 2-input AND, OR gates
  – Complex
    • various-input AND, OR, NAND, NOR, etc. gates
    • Efficiently implemented meta-gates (i.e., AND-OR-INVERT, MUX)
• Final structure consists of specified library’s components only
• If technology mapping integrated with logic synthesis
  – More efficient circuit
  – More complex problem
  – Heuristics required

Complexity impact on user
• As complexity grows, heuristics used
• Heuristics differ tremendously among synthesis tools
  – Computationally expensive
    • Higher quality results
    • Variable optimization effort settings
    • Long run times (hours, days)
    • Requires huge amounts of memory
    • Typically needs to run on servers, workstations
  – Fast heuristics
    • Lower quality results
    • Shorter run times (minutes, hours)
    • Smaller amount of memory required
    • Could run on PC
• Super-linear-time (i.e., n^3) heuristics usually used
  – User can partition large systems to reduce run times/size
  – 100^3 > 50^3 + 50^3 (1,000,000 > 250,000)
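The partitioning figure on the last slide (100^3 versus 50^3 + 50^3) generalizes to any number of equal pieces. The sketch below is a rough model under stated assumptions: run time is taken to be exactly n^3 abstract work units, the pieces are equal in size, and the cost of stitching the partitions back together is ignored. The function names are hypothetical.

```python
def heuristic_cost(n, exponent=3):
    """Abstract work units for a heuristic whose run time grows as n**exponent."""
    return n ** exponent


def partitioned_cost(n, parts, exponent=3):
    """Total cost when a size-n design is split into equal pieces run separately."""
    if n % parts != 0:
        raise ValueError("this sketch assumes the design divides evenly")
    return parts * heuristic_cost(n // parts, exponent)


if __name__ == "__main__":
    whole = heuristic_cost(100)
    for parts in (1, 2, 4):
        split = partitioned_cost(100, parts)
        print(f"{parts} partition(s): {split} units ({whole // split}x less than unpartitioned)")
```

For a cubic heuristic, halving the problem cuts the total work by a factor of four, which is why partitioning pays off even though two runs are needed. Optimizations that span a partition boundary are lost, so result quality may suffer.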
Integrating logic design and physical design
• Past
  – Gate delay much greater than wire delay
  – Thus, performance evaluated as # of levels of gates only
• Today
  – Gate delay shrinking as feature size shrinking
  – Wire delay increasing
    • Performance evaluation needs wire length
  – Transistor placement (needed for wire length) domain of physical design
  – Thus, simultaneous logic synthesis and physical design required for efficient circuits
[Figure: wire delay and transistor delay plotted against reduced feature size]

Register-transfer synthesis
• Converts FSMD to custom single-purpose processor
  – Datapath
    • Register units to store variables
      – Complex data types
    • Functional units
      – Arithmetic operations
    • Connection units
      – Buses, MUXs
  – FSM controller
    • Controls datapath
  – Key sub problems:
    • Allocation
      – Instantiate storage, functional, connection units
    • Binding
      – Mapping FSMD operations to specific units

Behavioral synthesis
• High-level synthesis
• Converts single sequential program to single-purpose processor
  – Does not require the program to schedule states
• Key sub problems
  – Allocation
  – Binding
  – Scheduling
    • Assign sequential program’s operations to states
    • Conversion template given in Ch. 2
• Optimizations important
  – Compiler
    • Constant propagation, dead-code elimination, loop unrolling
  – Advanced techniques for allocation, binding, scheduling

System synthesis
• Convert 1 or more processes into 1 or more processors (system)
  – For complex embedded systems
    • Multiple processes may provide better performance/power
    • May be better described using concurrent sequential programs
• Tasks
  – Transformation
    • Can merge 2 exclusive processes into 1 process
    • Can break 1 large process into separate processes
    • Procedure inlining
    • Loop unrolling
  – Allocation
    • Essentially design of system architecture
      – Select processors to implement processes
      – Also select memories and busses

System synthesis
• Tasks (cont.)
  – Partitioning
    • Mapping 1 or more processes to 1 or more processors
    • Variables among memories
    • Communications among buses
  – Scheduling
    • Multiple processes on a single processor
    • Memory accesses
    • Bus communications
  – Tasks performed in variety of orders
  – Iteration among tasks common

System synthesis
• Synthesis driven by constraints
  – E.g.,
    • Meet performance requirements at minimum cost
      – Allocate as much behavior as possible to general-purpose processor
        • Low-cost/flexible implementation
      – Minimum # of SPPs used to meet performance
• System synthesis for GPP only (software)
  – Common for decades
    • Multiprocessing
    • Parallel processing
    • Real-time scheduling
• Hardware/software codesign
  – Simultaneous consideration of GPPs/SPPs during synthesis
  – Made possible by maturation of behavioral synthesis in 1990’s

Temporal vs. spatial thinking
• Design thought process changed by evolution of synthesis
• Before synthesis
  – Designers worked primarily in structural domain
    • Connecting simpler components to build more complex systems
      – Connecting logic gates to build controller
      – Connecting registers, MUXs, ALUs to build datapath
  – “capture and simulate” era
    • Capture using CAD tools
    • Simulate to verify correctness before fabricating
  – Spatial thinking
    • Structural diagrams
    • Data sheets

Temporal vs. spatial thinking
• After synthesis
  – Designers work primarily in behavioral domain
  – “describe and synthesize” era
    • Describe FSMDs or sequential programs
    • Synthesize into structure
  – Temporal thinking
    • States or sequential statements have relationship over time
• Strong understanding of hardware structure still important
  – Behavioral description must synthesize to efficient structural implementation

Verification
• Ensuring design is correct and complete
  – Correct
    • Implements specification accurately
  – Complete
    • Describes appropriate output to all relevant input
• Formal verification
  – Hard
  – For small designs or verifying certain key properties only
• Simulation
  – Most common verification method

Formal verification
• Analyze design to prove or disprove certain properties
• Correctness example
  – Prove ALU structural implementation equivalent to behavioral description
    • Derive Boolean equations for outputs
    • Create truth table for equations
    • Compare to truth table from original behavior
• Completeness example
  – Formally prove elevator door can never open while elevator is moving
    • Derive conditions for door being open
    • Show conditions conflict with conditions for elevator moving
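The correctness example above (derive Boolean equations, build the truth table, compare with the truth table of the original behavior) can be illustrated on something much smaller than an ALU. The sketch below uses a hypothetical 1-bit full adder as a stand-in: `behavioral_adder` plays the role of the specification, `structural_adder` of the gate-level implementation, and `buggy_adder` is a deliberately wrong implementation added here to show a disproof. None of these names come from the slides.

```python
from itertools import product


def behavioral_adder(a, b, cin):
    """Specification: add three bits and return (sum, carry-out)."""
    total = a + b + cin
    return total & 1, total >> 1


def structural_adder(a, b, cin):
    """Gate-level version: two XORs, two ANDs, one OR."""
    x = a ^ b
    return x ^ cin, (a & b) | (x & cin)


def buggy_adder(a, b, cin):
    """Same structure with one wrong wire in the carry logic."""
    x = a ^ b
    return x ^ cin, (a & b) | (a & cin)


def equivalent(f, g, num_inputs):
    """Compare the complete truth tables of f and g."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=num_inputs))


def first_mismatch(f, g, num_inputs):
    """Return an input combination on which f and g differ, or None."""
    for bits in product((0, 1), repeat=num_inputs):
        if f(*bits) != g(*bits):
            return bits
    return None


if __name__ == "__main__":
    print("structural matches behavior:", equivalent(behavioral_adder, structural_adder, 3))
    print("buggy version differs at:", first_mismatch(behavioral_adder, buggy_adder, 3))
```

Because every row of the truth table is compared, a pass here is a proof for this tiny circuit rather than an increase in confidence. The table doubles with each added input, which is one reason formal verification is described as hard and reserved for small designs or key properties.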
Simulation
• Create computer model of design
  – Provide sample input
  – Check for acceptable output
• Correctness example
  – ALU
    • Provide all possible input combinations
    • Check outputs for correct results
• Completeness example
  – Elevator door closed when moving
    • Provide all possible input sequences
    • Check door always closed when elevator moving

Increases confidence
• Simulating all possible input sequences impossible for most systems
  – E.g., 32-bit ALU
    • 2^32 * 2^32 = 2^64 possible input combinations
    • At 1 million combinations/sec
    • About ½ million years to simulate
    • Sequential circuits even worse
• Can only simulate tiny subset of possible inputs
  – Typical values
  – Known boundary conditions
    • E.g., 32-bit ALU
      – Both operands all 0's
      – Both operands all 1's
• Increases confidence of correctness/completeness
• Does not prove

Advantages over physical implementation
• Controllability
  – Control time
    • Stop/start simulation at any time
  – Control data values
    • Inputs or internal values
• Observability
  – Examine system/environment values at any time
• Debugging
  – Can stop simulation at any point and:
    • Observe internal values
    • Modify system/environment values before restarting
  – Can step through small intervals (e.g., 500 nanoseconds)

Disadvantages
• Simulation setup time
  – Often has complex external environments
  – Could spend more time modeling environment than system
• Models likely incomplete
  – Some environment behavior undocumented if complex environment
  – May not model behavior correctly
• Simulation speed much slower than actual execution
  – Sequentializing parallel design
    • IC: gates operate in parallel
    • Simulation: analyze inputs, generate outputs for each gate 1 at a time
  – Several programs added between simulated system and real hardware
    • 1 simulated operation:
      – = 10 to 100 simulator operations
      – = 100 to 10,000 operating system operations
      – = 1,000 to 100,000 hardware operations

Simulation speed
• Relative speeds of different types of simulation/emulation
  – 1 hour actual execution of SOC
    • = 1.2 years instruction-set simulation
    • = 10,000,000 hours gate-level simulation

  Method                                    Slowdown       Time for 1 hour of actual execution
  IC                                        1              1 hour
  FPGA                                      x10            1 day
  Hardware emulation                        x100           4 days
  Throughput model                          x1,000         1.4 months
  Instruction-set simulation                x10,000        1.2 years
  Cycle-accurate simulation                 x100,000       12 years
  Register-transfer-level HDL simulation    x1,000,000     >1 lifetime
  Gate-level HDL simulation                 x10,000,000    1 millennium

Overcoming long simulation time
• Reduce amount of real time simulated
  – 1 msec execution instead of 1 hour
    • 0.001 sec * 10,000,000 = 10,000 sec = 3 hours
  – Reduced confidence
    • 1 msec of cruise controller operation tells us little
• Faster simulator
  – Emulators
    • Special hardware for simulations
  – Less precise/accurate simulators
    • Exchange speed for observability/controllability
Reducing precision/accuracy
• Don't need gate-level analysis for all simulations
  – E.g., cruise control
    • Don't care what happens at every input/output of each logic gate
  – Simulating RT components ~10x faster
  – Cycle-based simulation ~100x faster
    • Accurate at clock boundaries only
    • No information on signal changes between boundaries
• Faster simulator often combined with reduction in real time
  – If willing to simulate for 10 hours
    • Use instruction-set simulator
    • Real execution time simulated
      – 10 hours * 1 / 10,000
      – = 0.001 hour
      – = 3.6 seconds

Hardware/software co-simulation
• Variety of simulation approaches exist
  – From very detailed
    • E.g., gate-level model
  – To very abstract
    • E.g., instruction-level model
• Simulation tools evolved separately for hardware/software
  – Recall separate design evolution
  – Software (GPP)
    • Typically with instruction-set simulator (ISS)
  – Hardware (SPP)
    • Typically with models in HDL environment
• Integration of GPP/SPP on single IC creating need for merging simulation tools

Integrating GPP/SPP simulations
• Simple/naïve way
  – HDL model of microprocessor
    • Runs system software
    • Much slower than ISS
    • Less observable/controllable than ISS
  – HDL models of SPPs
  – Integrate all models
• Hardware-software co-simulator
  – ISS for microprocessor
  – HDL model for SPPs
  – Create communication between simulators
  – Simulators run separately except when transferring data
  – Faster
  – Though, frequent communication between ISS and HDL model slows it down

Minimizing communication
• Memory shared between GPP and SPPs
  – Where should memory go?
  – In ISS?
    • HDL simulator must stall for memory access
  – In HDL?
    • ISS must stall when fetching each instruction
• Model memory in both ISS and HDL
  – Most accesses by each model unrelated to other's accesses
    • No need to communicate these between models
  – Co-simulator ensures consistency of shared data
  – Huge speedups (100x or more) reported with this technique

Emulators
• General physical device system mapped to
  – Microprocessor emulator
    • Microprocessor IC with some monitoring, control circuitry
  – SPP emulator
    • FPGAs (10s to 100s)
  – Usually supports debugging tasks
• Created to help solve simulation disadvantages
  – Mapped relatively quickly
    • Hours, days
  – Can be placed in real environment
    • No environment setup time
    • No incomplete environment
  – Typically faster than simulation
    • Hardware implementation

Disadvantages
• Still not as fast as real implementations
  – E.g., emulated cruise-control may not respond fast enough to keep control of car
• Mapping still time consuming
  – E.g., mapping complex SOC to 10 FPGAs
    • Just partitioning into 10 parts could take weeks
• Can be very expensive
  – Top-of-the-line FPGA-based emulator: $100,000 to $1 million
  – Leads to resource bottleneck
    • Can maybe only afford 1 emulator
    • Groups wait days, weeks for other group to finish using

Reuse: intellectual property cores
• Commercial off-the-shelf (COTS) components
  – Predesigned, prepackaged ICs
  – Implements GPP or SPP
  – Reduces design/debug time
  – Have always been available
• System-on-a-chip (SOC)
  – All components of system implemented on single chip
  – Made possible by increasing IC capacities
  – Changing the way COTS components sold
    • As intellectual property (IP) rather than actual IC
      – Behavioral, structural, or physical descriptions
      – Processor-level components known as cores
    • SOC built by integrating multiple descriptions

Cores
• Soft core
  – Synthesizable behavioral description
  – Typically written in HDL (VHDL/Verilog)
• Firm core
  – Structural description
  – Typically provided in HDL
• Hard core
  – Physical description
  – Provided in variety of physical layout file formats
[Figure: Gajski's Y-chart. Behavior axis: sequential programs, register transfers, logic equations/FSM, transfer functions. Structural axis: processors and memories; registers, FUs, MUXs; gates and flip-flops; transistors. Physical axis: cell layout, modules, chips, boards.]

Advantages/disadvantages of hard core
• Ease of use
  – Developer already designed and tested core
    • Can use right away
    • Can expect to work correctly
• Predictability
  – Size, power, performance predicted accurately
• Not easily mapped (retargeted) to different process
  – E.g., core available for vendor X's 0.25 micrometer CMOS process
    • Can't use with vendor X's 0.18 micrometer process
    • Can't use with vendor Y

Advantages/disadvantages of soft/firm cores
• Soft cores
  – Can be synthesized to nearly any technology
  – Can optimize for particular use
    • E.g., delete unused portion of core
      – Lower power, smaller designs
  – Requires more design effort
  – May not work in technology not tested for
  – Not as optimized as hard core for same processor
• Firm cores
  – Compromise between hard and soft cores
    • Some retargetability
    • Limited optimization
    • Better predictability/ease of use

New challenges to processor providers
• Cores have dramatically changed business model
  – Pricing models
    • Past
      – Vendors sold product as IC to designers
      – Designers must buy any additional copies
        • Could not (economically) copy from original
    • Today
      – Vendors can sell as IP
      – Designers can make as many copies as needed
    • Vendor can use different pricing models
      – Royalty-based model
        • Similar to old IC model
        • Designer pays for each additional copy
      – Fixed price model
        • One price for IP and as many copies as needed
      – Many other models used

IP protection
• Past
  – Illegally copying IC very difficult
    • Reverse engineering required tremendous, deliberate effort
    • "Accidental" copying not possible
• Today
  – Cores sold in electronic format
    • Deliberate/accidental unauthorized copying easier
    • Safeguards greatly increased
    • Contracts to ensure no copying/distributing
    • Encryption techniques: limit actual exposure to IP
    • Watermarking: determines if particular instance of processor was copied and whether copy was authorized
New challenges to processor users
• Licensing arrangements
  – Not as easy as purchasing IC
  – More contracts enforcing pricing model and IP protection
    • Possibly requiring legal assistance
• Extra design effort
  – Especially for soft cores
    • Must still be synthesized and tested
    • Minor differences in synthesis tools can cause problems
• Verification requirements more difficult
  – Extensive testing for synthesized soft cores and soft/firm cores mapped to particular technology
    • Ensure correct synthesis
    • Timing and power vary between implementations
  – Early verification critical
    • Cores buried within IC
    • Cannot simply replace bad core

Design process model
• Describes order that design steps are processed
  – Behavior description step
  – Behavior to structure conversion step
  – Mapping structure to physical implementation step
• Waterfall model
  – Proceed to next step only after current step completed
• Spiral model
  – Proceed through 3 steps in order but with less detail
  – Repeat 3 steps gradually increasing detail
  – Keep repeating until desired system obtained
  – Becoming extremely popular (hardware & software development)
[Figure: Waterfall design model (Behavioral, Structural, Physical) and Spiral design model (Behavioral, Structural, Physical).]

Waterfall method
• Not very realistic
  – Bugs often found in later steps that must be fixed in earlier step
    • E.g., forgot to handle certain input condition
  – Prototype often needed to know complete desired behavior
    • E.g., customer adds features after product demo
  – System specifications commonly change
    • E.g., to remain competitive by reducing power, size
      – Certain features dropped
• Unexpected iterations back through 3 steps cause missed deadlines
  – Lost revenues
  – May never make it to market

Spiral method
• First iteration of 3 steps incomplete
• Much faster, though
  – End up with prototype
    • Use to test basic functions
    • Get idea of functions to add/remove
  – Original iteration experience helps in following iterations of 3 steps
• Must come up with ways to obtain structure and physical implementations quickly
  – E.g., FPGAs for prototype
    • Silicon for final product
  – May have to use more tools
    • Extra effort/cost
• Could require more time than waterfall method
  – If correct implementation first time with waterfall

General-purpose processor design models
• Previous slides focused on SPPs
• Can apply equally to GPPs
  – Waterfall model
    • Structure developed by particular company
    • Acquired by embedded system designer
    • Designer develops software (behavior)
    • Designer maps application to architecture
      – Compilation
      – Manual design
  – Spiral-like model
    • Beginning to be applied by embedded system designers

Spiral-like model
• Designer develops or acquires architecture
• Develops application(s)
• Maps application to architecture
• Analyzes design metrics
• Now makes choice
  – Modify mapping
  – Modify application(s) to better suit architecture
  – Modify architecture to better suit application(s)
    • Not as difficult now
      – Maturation of synthesis/compilers
      – IPs can be tuned
• Continue refining to lower abstraction level until particular implementation chosen
[Figure: Y-chart with Architecture, Application(s), Mapping, Analysis.]
Summary
• Design technology seeks to reduce gap between IC capacity growth and designer productivity growth
• Synthesis has changed digital design
• Increased IC capacity means sw/hw components coexist on one chip
• Design paradigm shift to core-based design
• Simulation essential but hard
• Spiral design process is popular

Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 7: Digital Camera Example

Outline
• Introduction to a simple digital camera
• Designer's perspective
• Requirements specification
• Design
  – Four implementations

Introduction
• Putting it all together
  – Instruction-set processor (GPP, ASIP)
  – Single-purpose processor
    • Custom
    • Standard
  – Memory
  – Interfacing
• Knowledge applied to designing a simple digital camera
  – GPP/ASIP vs. single-purpose processors
  – Partitioning of functionality among different processor types

Introduction to a simple digital camera
• Captures images
• Stores images in digital format
  – No film
  – Multiple images stored in camera
    • Number depends on amount of memory and bits used per image
• Downloads images to PC
• Only recently possible
  – Systems-on-a-chip
    • Multiple processors and memories on one IC
  – High-capacity flash memory
• Very simple description used for example
  – Many more features with real digital camera
    • Variable size images, image deletion, digital stretching, zooming in and out, etc.

Designer's perspective
• Two key tasks
  – Processing images and storing in memory
    • When shutter pressed:
      – Image captured
      – Converted to digital form by charge-coupled device (CCD)
      – Compressed and archived in internal memory
  – Uploading images to PC
    • Digital camera attached to PC
    • Special software commands camera to transmit archived images serially

Charge-coupled device (CCD)
• Special sensor that captures a B/W image (8 bits/pixel, 16 bits/pixel, …)
• Light-sensitive silicon solid-state device composed of many cells
[Figure: CCD with lens area, pixel rows, pixel columns, covered columns, electromechanical shutter, and electronic circuitry. Annotations:]
  – When exposed to light, each cell becomes electrically charged. This charge can then be converted to an 8-bit value where 0 represents no exposure while 255 represents very intense exposure of that cell to light.
  – Some of the columns are covered with a black strip of paint. The light intensity of these pixels is used for zero-bias adjustments of all the cells.
  – The electromechanical shutter is activated to expose the cells to light for a brief moment.
  – The electronic circuitry, when commanded, discharges the cells, activates the electromechanical shutter, and then reads the 8-bit charge value of each cell. These values can be clocked out of the CCD by external logic through a standard parallel bus interface.

Zero-bias error
• Manufacturing errors cause cells to measure slightly above or below actual light intensity
• Error is typically the same across columns, but is different across rows
• Some of leftmost columns blocked by black paint to detect zero-bias error
  – Reading of other than 0 in blocked cells is zero-bias error
  – Each row is corrected by subtracting the average error found in blocked cells for that row

  Before zero-bias adjustment             Covered cells   Zero-bias adjustment
  136 170 155 140 144 115 112 248         12 14           -13
  145 146 168 123 120 117 119 147         12 10           -11
  144 153 168 117 121 127 118 135          9  9            -9
  176 183 161 111 186 130 132 133          0  0             0
  144 156 161 133 192 153 138 139          7  7            -7
  122 131 128 147 206 151 131 127          2  0            -1
  121 155 164 185 254 165 138 129          4  4            -4
  173 175 176 183 188 184 117 129          5  5            -5

  After zero-bias adjustment
  123 157 142 127 131 102  99 235
  134 135 157 112 109 106 108 136
  135 144 159 108 112 118 109 126
  176 183 161 111 186 130 132 133
  137 149 154 126 185 146 131 132
  121 130 127 146 205 150 130 126
  117 151 160 181 250 161 134 125
  168 170 171 178 183 179 112 124

Compression
• Store more images
• Transmit image to PC in less time
• JPEG (Joint Photographic Experts Group)
  – Popular standard format for representing digital images in a compressed form
  – Provides for a number of different modes of operation
  – Mode used in this chapter provides high compression ratios using DCT (discrete cosine transform)
  – Image data divided into blocks of 8 x 8 pixels
  – 3 steps performed on each block
    • DCT
    • Quantization
    • Huffman encoding

DCT step
• Transforms original 8 x 8 block into a cosine-frequency domain
  – Upper-left corner values represent more of the essence of the image
  – Lower-right corner values represent finer details
    • Can reduce precision of these values and retain reasonable image quality
• FDCT (Forward DCT) formula
  – C(h) = if (h == 0) then 1/sqrt(2) else 1.0
    • Auxiliary function used in main function F(u,v)
  – $$F(u,v) = \tfrac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} D_{xy} \cos\!\left(\frac{\pi(2x+1)u}{16}\right) \cos\!\left(\frac{\pi(2y+1)v}{16}\right)$$
    • Gives encoded pixel at row u, column v
    • D_xy is original pixel value at row x, column y
• IDCT (Inverse DCT)
  – Reverses process to obtain original block (not needed for this design)

Quantization step
• Achieve high compression ratio by reducing image quality (lossy compression)
  – Reduce bit precision of encoded data
    • Fewer bits needed for encoding
    • One way is to divide all values by a power of 2
      – Simple right shifts can do this
  – Dequantization would reverse process for decompression

  After being encoded using DCT
  1150   39  -43  -10   26  -83   11   41
   -81   -3  115  -73   -6   -2   22   -5
    14  -11    1  -42   26   -3   17  -38
     2  -61  -13  -12   36  -23  -18    5
    44   13   37   -4   10  -21    7   -8
    36  -11   -9   -4   20  -28  -21   14
   -19   -7   21   -6    3    3   12  -21
    -5  -13  -11  -17   -4   -1    7   -4

  Divide each cell's value by 8

  After quantization
   144    5   -5   -1    3  -10    1    5
   -10    0   14   -9   -1    0    3   -1
     2   -1    0   -5    3    0    2   -5
     0   -8   -2   -2    5   -3   -2    1
     6    2    5   -1    1   -3    1   -1
     5   -1   -1   -1    3   -4   -3    2
    -2   -1    3   -1    0    0    2   -3
    -1   -2   -1   -2   -1    0    1   -1

Huffman encoding step
• Serialize 8 x 8 block of pixels
  – Values are converted into single list using zigzag pattern
• Perform Huffman encoding
  – More frequently occurring pixels assigned short binary code
  – Longer binary codes left for less frequently occurring pixels
• Each pixel in serial list converted to Huffman encoded values
  – Much shorter list, thus compression
1 for right traversal Pixel frequencies -1 15x 0 8x -2 6x 1 5x 2 5x 3 5x 5 5x -3 4x -5 3x -10 2x 144 1x -9 1x -8 1x -4 1x 6 1x 14 1x 6 4 3 5 29 -1 1 5 9 5 1 7 1 8 1 4 1 0 8 -2 4 5 -10 5 5 2 2 5 2 3 1 6 2 4 -5 1 14 6 2 3 5 1 1 6 0 1 Huffman encoding is reversible – Huffman codes Huffman tree -3 1 -4 1 -8 1 -9 1 144 -1 0 -2 1 2 3 5 -3 -5 -10 144 -9 -8 -4 6 14 00 100 110 010 1110 1010 0110 11110 10110 01110 111111 111110 101111 101110 011111 011110 No code is a prefix of another code Embedded Systems Design: A Unified Hardware/Software Introduction. image size.Huffman encoding example • Pixel frequencies on left – – • Pixel value –1 occurs 15 times Pixel value 14 occurs 1 time Build Huffman tree from bottom up – – Create one leaf node for each pixel value and assign frequency as node’s value Create an internal node by joining any two nodes whose sum is a minimal value • – • Repeat until complete binary tree Traverse tree from root to leaf to obtain binary code for leaf’s pixel value – • This sum is internal nodes value Append 0 for left traversal. image-size variables occupy N x 4 bytes • First image archived starting at address N x 4 • Global memory address updated to N x 4 + (compressed image size) • Memory requirement based on N. (c) 2000 Vahid/Givargis 12 Archive step • Record starting address and image size – Can use linked list • One possible way to archive images – If max number of images archived is N: • • • • Set aside memory for N addresses and N image-size variables Keep a counter for location of next available address Initialize addresses and image-size variables to 0 Set global memory address to N x 4 – Assuming addresses. (c) 2000 Vahid/Givargis 13 . and average compression ratio Embedded Systems Design: A Unified Hardware/Software Introduction. short document detailing market need for a low-end digital camera that: – – – – – – captures and stores at least 50 low-res images and uploads to PC. 
“output X should be input Y times 2”) – Initial specification may be very general and come from marketing dept. insignificant sales beyond 12 months Embedded Systems Design: A Unified Hardware/Software Introduction.001 watt or less”) – Functional requirements • System’s behavior (e. costs around $100 with single medium-size IC costing less that $25. (c) 2000 Vahid/Givargis 14 Requirements Specification • System’s requirements – what system should do – Nonfunctional requirements • Constraints on design metrics (e..000 if market entry < 6 months.g. image-size variables and global memory pointer accordingly Embedded Systems Design: A Unified Hardware/Software Introduction. • E. has long as possible battery life.000 if between 6 and 12 months.g.g. has expected sales volume of 200. 100.. “should use 0. (c) 2000 Vahid/Givargis 15 .Uploading to PC • When connected to PC and upload command received – Read images from memory – Transmit serially using UART – While transmitting • Reset pointers.. (c) 2000 Vahid/Givargis 16 Nonfunctional requirements (cont. (c) 2000 Vahid/Givargis 17 . but smaller would be cheaper • Power – Must operate below certain temperature (cooling fan not possible) – Therefore. electrical energy consumed while processing Energy: battery lifetime (power x time) • Constrained metrics – Values must be below (sometimes above) certain threshold • Optimization metrics – Improved as much as possible to improve product • A metric can be both constrained and optimization Embedded Systems Design: A Unified Hardware/Software Introduction. constrained metric • Size – Must use IC that fits in reasonably sized camera – Constrained and optimization metric • Constraint could be 200. constrained metric • Energy – Reducing power or time reduces energy – Optimized metric: want battery to last as long as possible Embedded Systems Design: A Unified Hardware/Software Introduction.) 
• Performance – Must process image fast enough to be useful – 1 sec reasonable constraint • Slower would be annoying • Faster not necessary for low-end of market – Therefore.000 gates.Nonfunctional requirements • Design metrics of importance based on initial specification – – – – Performance: time required to process image Size: number of elementary logic gates (2-input NAND gate) in IC Power: measure of avg. C output file 19 . • Mapping functions to a particular processor type not done at this stage Embedded Systems Design: A Unified Hardware/Software Introduction.. typically 640x480 or more) yes no Archive in memory yes More 8×8 blocks? no Done? Transmit serially serial output e.. or simply model – Also is first implementation • Can provide insight into operations of system Executable model of digital camera 101011010 110101010 010101101.. CCD...C 101010101 010101010 101010101 0.C image file CNTRL.Informal functional specification • Flowchart breaks functionality down into simpler functions • Each function’s details could then be described in English Zero-bias adjust CCD input DCT – Done earlier in chapter Quantize • Low quality image has resolution of 64 x 64 (only for example. . 011010..g. (c) 2000 Vahid/Givargis 18 Refined functional specification • Refine informal specification into one that can actually be executed • Can use C/C++ code to describe each function – Called system-level model.C CCDPP. (c) 2000 Vahid/Givargis CODEC.C UART. – Profiling can find computationally intensive functions • Can obtain sample output used to verify correctness of final implementation Embedded Systems Design: A Unified Hardware/Software Introduction. prototype. static unsigned rowIndex. buffer[rowIndex][colIndex] = CcdPopPixel(). colIndex = 0. } } } rowIndex = 0. colIndex++) { pixel = buffer[rowIndex][colIndex]. for(colIndex=0. colIndex. &pixel) == 1 ) { buffer[rowIndex][colIndex] = (char)pixel. colIndex = 0. void CcdppCapture(void) { colIndex = -1. } } return pixel. 
} Embedded Systems Design: A Unified Hardware/Software Introduction. colIndex. colIndex<SZ_COL. for(colIndex=0. } #include <stdio. pixel = buffer[rowIndex][colIndex]. rowIndex<SZ_ROW. CcdCapture().CCD module • • • • Simulates real CCD CcdInitialize is passed name of image file CcdCapture reads “image” from file CcdPopPixel outputs pixels one at a time void CcdInitialize(const char *imageFileName) { imageFileHandle = fopen(imageFileName. void CcdppInitialize() { rowIndex = -1. for(rowIndex=0. rowIndex = -1. colIndex<SZ_COL. rowIndex = -1. buffer[rowIndex][colIndex] -= bias. rowIndex++) { } char CcdppPopPixel(void) { char pixel. colIndex++) { char CcdPopPixel(void) { char pixel. static char buffer[SZ_ROW][SZ_COL]. colIndex = -1. return pixel. rowIndex = -1. for(rowIndex=0. rewind(imageFileHandle). } if( fscanf(imageFileHandle. if( ++colIndex == SZ_COL ) { } bias = (CcdPopPixel() + CcdPopPixel()) / 2. char bias. for(colIndex=0. rowIndex++) { static unsigned rowIndex. (c) 2000 Vahid/Givargis } 21 . } Embedded Systems Design: A Unified Hardware/Software Introduction. "%i". colIndex = 0. if( ++rowIndex == SZ_ROW ) { colIndex = -1. "r"). } } } } rowIndex = 0.h> #define SZ_ROW 64 void CcdCapture(void) { #define SZ_COL (64 + 2) int pixel. (c) 2000 Vahid/Givargis 20 CCDPP (CCD PreProcessing) module • • Performs zero-bias adjustment CcdppCapture uses CcdCapture and CcdPopPixel to obtain image Performs zero-bias adjustment after each row read in • #define SZ_ROW 64 #define SZ_COL 64 static char buffer[SZ_ROW][SZ_COL]. static FILE *imageFileHandle. if( ++colIndex == SZ_COL ) { colIndex = 0. colIndex++) { if( ++rowIndex == SZ_ROW ) { colIndex = -1. rowIndenx<SZ_ROW. colIndex<SZ_COL. obuffer[8][8]. y++) obuffer[x][y] = FDCT(x. y. idx++. for(x=0. 
• • • • Models FDCT encoding ibuffer holds original 8 x 8 block obuffer holds encoded 8 x 8 block CodecPushPixel called 64 times to fill ibuffer with original block • CodecDoFdct called once to transform 8 x 8 block void CodecInitialize(void) { idx = 0. (c) 2000 Vahid/Givargis 23 . if( idx == 64 ) idx = 0. ibuffer). idx++. x<8. } Embedded Systems Design: A Unified Hardware/Software Introduction. does not receive • UartInitialize is passed name of file to output to • UartSend transmits (writes to output file) bytes at a time #include <stdio. x++) { for(y=0. • CodecPopPixel called 64 times to retrieve encoded block from obuffer } short CodecPopPixel(void) { short p. } void UartSend(char d) { fprintf(outputFileHandle. y.UART module • Actually a half UART – Only transmits. (c) 2000 Vahid/Givargis 22 CODEC module static short ibuffer[8][8]. – Explained in next slide } idx = 0. p = obuffer[idx / 8][idx % 8]. y<8. ibuffer[idx / 8][idx % 8] = p. } void CodecDoFdct(void) { int x. void UartInitialize(const char *outputFileName) { outputFileHandle = fopen(outputFileName. return p. idx. } void CodecPushPixel(short p) { if( idx == 64 ) idx = 0. } Embedded Systems Design: A Unified Hardware/Software Introduction. "w"). "%i\n". (int)d).h> static FILE *outputFileHandle. 7 Dxy x FRVʌ[.v) = ¼ x C(u) x C(v) Ȉx=0..CODEC (cont.) • Implementing FDCT formula C(h) = if (h == 0) then 1/sqrt(2) else 1..7 Ȉy=0.0 F(u. X. [FRVʌ\. Y. -12539. int x. -30273. i++) for(j=0. -12539. 23170. { 32768. j<NUM_COL_BLOCKS. 18204. UartSend(((char*)&temp)[0]). short img[8][8]) { double s[8]. u). { 32768. 12539. { 32768. 12539. 12539. -32138. v) + img[x][3] * COS(3. 27245. i<NUM_ROW_BLOCKS. } static short buffer[SZ_ROW][SZ_COL]. k<8. j++) CodecDoFdct(). -18204 }. j<SZ_COL. v) + img[x][5] * COS(5. -32138.0.FDCT */ buffer[i][j] = CcdppPopPixel(). 27245. -18204. 18204. k. temp. return (short)(r * . } } Embedded Systems Design: A Unified Hardware/Software Introduction. -27245. 
• Only 64 possible inputs to COS, so a table can be used to save computation time
  – Floating-point values multiplied by 32,768 and rounded to nearest integer
  – 32,768 chosen in order to store each value using only 2 bytes of memory
  – Fixed-point representation explained more later
• FDCT unrolls inner loop of summation, implements outer summation as two consecutive for loops

    /* note: 32768 is one more than the largest 16-bit signed value (32767) */
    static const short COS_TABLE[8][8] = {
        { 32768,  32138,  30273,  27245,  23170,  18204,  12539,   6392 },
        { 32768,  27245,  12539,  -6392, -23170, -32138, -30273, -18204 },
        { 32768,  18204, -12539, -32138, -23170,   6392,  30273,  27245 },
        { 32768,   6392, -30273, -18204,  23170,  27245, -12539, -32138 },
        { 32768,  -6392, -30273,  18204,  23170, -27245, -12539,  32138 },
        { 32768, -18204, -12539,  32138, -23170,  -6392,  30273, -27245 },
        { 32768, -27245,  12539,   6392, -23170,  32138, -30273,  18204 },
        { 32768, -32138,  30273, -27245,  23170, -18204,  12539,  -6392 }
    };
    static short ONE_OVER_SQRT_TWO = 23170;
    static double COS(int xy, int uv) {
        return COS_TABLE[xy][uv] / 32768.0;
    }
    static double C(int h) {
        return h ? 1.0 : ONE_OVER_SQRT_TWO / 32768.0;
    }
    static int FDCT(int u, int v, short img[8][8]) {
        double s[8], r = 0; int x;
        for(x=0; x<8; x++) {
            s[x] = img[x][0] * COS(0, v) + img[x][1] * COS(1, v) +
                   img[x][2] * COS(2, v) + img[x][3] * COS(3, v) +
                   img[x][4] * COS(4, v) + img[x][5] * COS(5, v) +
                   img[x][6] * COS(6, v) + img[x][7] * COS(7, v);
        }
        for(x=0; x<8; x++) r += s[x] * COS(x, u);
        return (short)(r * .25 * C(u) * C(v));
    }

CNTRL (controller) module
• Heart of the system
• CntrlInitialize for consistency with other modules only
• CntrlCaptureImage uses CCDPP module to input image and place in buffer
• CntrlCompressImage breaks the 64 x 64 buffer into 8 x 8 blocks and performs FDCT on each block using the CODEC module
  – Also performs quantization on each block
• CntrlSendImage transmits encoded image serially using UART module

    #define SZ_ROW          64
    #define SZ_COL          64
    #define NUM_ROW_BLOCKS  (SZ_ROW / 8)
    #define NUM_COL_BLOCKS  (SZ_COL / 8)
    static short buffer[SZ_ROW][SZ_COL], i, j, k, l, temp;

    void CntrlInitialize(void) {}

    void CntrlCaptureImage(void) {
        CcdppCapture();
        for(i=0; i<SZ_ROW; i++)
            for(j=0; j<SZ_COL; j++)
                buffer[i][j] = CcdppPopPixel();
    }

    void CntrlCompressImage(void) {
        for(i=0; i<NUM_ROW_BLOCKS; i++)
            for(j=0; j<NUM_COL_BLOCKS; j++) {
                for(k=0; k<8; k++)
                    for(l=0; l<8; l++)
                        CodecPushPixel(
                            (char)buffer[i * 8 + k][j * 8 + l]);
                CodecDoFdct();  /* part 1 - FDCT */
                for(k=0; k<8; k++)
                    for(l=0; l<8; l++) {
                        buffer[i * 8 + k][j * 8 + l] = CodecPopPixel();
                        /* part 2 - quantization */
                        buffer[i*8+k][j*8+l] >>= 6;
                    }
            }
    }

    void CntrlSendImage(void) {
        for(i=0; i<SZ_ROW; i++)
            for(j=0; j<SZ_COL; j++) {
                temp = buffer[i][j];
                UartSend(((char*)&temp)[0]);  /* send upper byte */
                UartSend(((char*)&temp)[1]);  /* send lower byte */
            }
    }

Putting it all together
• Main initializes all modules, then uses CNTRL module to capture, compress, and transmit one image
  – Note: only for off-line test, no iterative real-time behavior ( no while(1) )
• This system-level model can be used for extensive experimentation
  – Bugs much easier to correct here rather than in later models

    int main(int argc, char *argv[]) {
        char *uartOutputFileName = argc > 1 ? argv[1] : "uart_out.txt";
        char *imageFileName = argc > 2 ? argv[2] : "image.txt";
        /* initialize the modules */
        UartInitialize(uartOutputFileName);
        CcdInitialize(imageFileName);
        CcdppInitialize();
        CodecInitialize();
        CntrlInitialize();
        /* simulate functionality */
        CntrlCaptureImage();
        CntrlCompressImage();
        CntrlSendImage();
    }

Design
• Determine system’s architecture
  – Processors
    • Any combination of single-purpose (custom or standard) or general-purpose processors
  – Memories, buses
• Map functionality to that architecture
  – Multiple functions on one processor
  – One function on one or more processors
• Implementation
  – A particular architecture and mapping
  – Solution space is set of all implementations
• Starting point
  – Low-end general-purpose processor connected to flash memory
    • All functionality mapped to software running on processor
    • Usually satisfies power, size, and time-to-market constraints
    • If timing constraint not satisfied then later implementations could:
      – use single-purpose processors for time-critical functions
      – rewrite functional specification

Implementation 1: Microcontroller alone
• Low-end processor could be Intel 8051 microcontroller
• Total IC cost including NRE about $5
• Well below 200 mW power
• Time-to-market about 3 months
• However, one image per second not possible
  – 12 MHz, 12 cycles per instruction
    • Executes one million instructions per second
  – CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations
    • ~100 assembly instructions each iteration
    • 409,600 (4096 x 100) instructions per image
    • Nearly half of the one-second time budget for reading image alone
  – Would be over budget after adding compute-intensive DCT and Huffman encoding

Implementation 2: Microcontroller and CCDPP
[Block diagram: SOC containing 8051, EEPROM, RAM, UART, CCDPP]
• CCDPP function implemented on custom single-purpose processor
  – Improves performance: fewer microcontroller cycles
  – Increases NRE cost and time-to-market
  – Easy to implement
    • Simple datapath
    • Few states in controller
• Simple UART easy to implement as standard single-purpose processor also
• EEPROM for program memory and RAM for data memory added as well
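The Implementation 1 estimate above is simple arithmetic, and a short C sketch makes it easy to rerun with other clock rates or loop costs. The function names here are hypothetical; the 12 MHz clock, 12 cycles per instruction, and ~100 instructions per iteration are the figures quoted on the slide.

```c
/* Instructions executed per second for a given clock and CPI. */
long instr_per_second(long clock_hz, int cycles_per_instr) {
    return clock_hz / cycles_per_instr;
}

/* Instructions needed to read one image with nested row/column loops. */
long capture_instructions(int rows, int cols, int instr_per_iteration) {
    return (long)rows * cols * instr_per_iteration;
}

/* Time in seconds to execute a given number of instructions. */
double seconds_needed(long instructions, long instr_per_sec) {
    return (double)instructions / (double)instr_per_sec;
}
```

With the slide's numbers, `seconds_needed(capture_instructions(64, 64, 100), instr_per_second(12000000, 12))` comes to roughly 0.41 s, which is the "nearly half of the budget" figure before any DCT or Huffman work is counted.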
Microcontroller
• Synthesizable version of Intel 8051 available
  – Written in VHDL
  – Captured at register transfer level (RTL)
• Fetches instruction from ROM
• Decodes using Instruction Decoder
• ALU executes arithmetic operations
  – Source and destination registers reside in RAM
• Special data movement instructions used to load and store externally
• Special program generates VHDL description of ROM from output of C compiler/linker
[Block diagram of Intel 8051 processor core: 4K ROM, Instruction Decoder, Controller, 128 RAM, ALU, connection to external memory bus]

UART
• UART in idle mode until invoked
  – UART invoked when 8051 executes store instruction with UART’s enable register as target address
    • Memory-mapped communication between 8051 and all single-purpose processors
    • Lower 8-bits of memory address for RAM
    • Upper 8-bits of memory address for memory-mapped I/O devices
• Start state transmits 0 indicating start of byte transmission then transitions to Data state
• Data state sends 8 bits serially then transitions to Stop state
• Stop state transmits 1 indicating transmission done then transitions back to idle mode

FSMD description of UART
  Idle:  I = 0; when invoked, go to Start
  Start: transmit LOW; go to Data
  Data:  transmit data(I), then I++; stay in Data while I < 8, go to Stop when I = 8
  Stop:  transmit HIGH; go to Idle
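The UART FSMD described above (Idle, Start, Data, Stop) can be modeled in a few lines of C. This is only a behavioral sketch: the names `UartModel`, `uart_model_invoke`, `uart_model_step`, and `uart_model_frame` are hypothetical, and two details are assumptions not stated on the slides, namely that bits go out least-significant first and that the idle line is HIGH.

```c
#include <stdint.h>

typedef enum { UART_IDLE, UART_START, UART_DATA, UART_STOP } UartState;

typedef struct {
    UartState state;
    uint8_t   data;  /* byte latched when the UART is invoked */
    int       i;     /* bit counter I from the FSMD */
} UartModel;

/* Models the 8051 store to the UART's memory-mapped register. */
void uart_model_invoke(UartModel *u, uint8_t byte) {
    u->data = byte;
    u->i = 0;                 /* Idle: I = 0 */
    u->state = UART_START;
}

/* Advances one bit time and returns the level driven on the line. */
int uart_model_step(UartModel *u) {
    int level = 1;            /* assumption: idle line is HIGH */
    switch (u->state) {
    case UART_START:          /* Start: transmit LOW */
        level = 0;
        u->state = UART_DATA;
        break;
    case UART_DATA:           /* Data: transmit data(I), then I++ */
        level = (u->data >> u->i) & 1;   /* assumption: LSB first */
        u->i++;
        if (u->i == 8) u->state = UART_STOP;
        break;
    case UART_STOP:           /* Stop: transmit HIGH */
        level = 1;
        u->state = UART_IDLE;
        break;
    case UART_IDLE:
        break;
    }
    return level;
}

/* Sends one byte and records the 10 line levels of the frame. */
void uart_model_frame(uint8_t byte, int levels[10]) {
    UartModel u;
    int n;
    uart_model_invoke(&u, byte);
    for (n = 0; n < 10; n++) levels[n] = uart_model_step(&u);
}
```

Each call to `uart_model_step` corresponds to one state visit in the FSMD, so a full frame is one Start step, eight Data steps, and one Stop step.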
CCDPP
• Hardware implementation of zero-bias operations
• Interacts with external CCD chip
  – CCD chip resides external to our SOC mainly because combining CCD with ordinary logic not feasible
• Internal buffer, B, memory-mapped to 8051
• Variables R, C are buffer’s row, column indices
• GetRow state reads in one row from CCD to B
  – 66 bytes: 64 pixels + 2 blacked-out pixels
• ComputeBias state computes bias for that row and stores in variable Bias
• FixBias state iterates over same row subtracting Bias from each element
• NextRow transitions to GetRow for repeat of process on next row or to Idle state when all 64 rows completed

FSMD description of CCDPP
  Idle:        R = 0, C = 0; when invoked, go to GetRow
  GetRow:      B[R][C] = Pxl, C = C + 1; stay while C < 66, go to ComputeBias when C = 66
  ComputeBias: Bias = (B[R][11] + B[R][10]) / 2, C = 0; go to FixBias
  FixBias:     B[R][C] = B[R][C] - Bias; stay while C < 64, go to NextRow when C = 64
  NextRow:     R++, C = 0; go to GetRow while R < 64, go to Idle when R = 64

Connecting SOC components
• Memory-mapped
  – All single-purpose processors and RAM are connected to 8051’s memory bus
• Read
  – Processor places address on 16-bit address bus
  – Asserts read control signal for 1 cycle
  – Reads data from 8-bit data bus 1 cycle later
  – Device (RAM or SPP) detects asserted read control signal
  – Checks address
  – Places and holds requested data on data bus for 1 cycle
• Write
  – Processor places address and data on address and data bus
  – Asserts write control signal for 1 clock cycle
  – Device (RAM or SPP) detects asserted write control signal
  – Checks address bus
  – Reads and stores data from data bus
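The per-row work of the CCDPP states described above (GetRow, ComputeBias, FixBias) can be sketched in C as a single function over one 66-byte row. The function name is hypothetical, and the position of the two blacked-out cells is an assumption: this sketch takes them to be the last two bytes of the row (indices 64 and 65), whereas the FSMD as printed reads indices 10 and 11.

```c
#define CCD_PIXELS_PER_ROW 64   /* visible pixels in one row */
#define CCD_ROW_BYTES      66   /* 64 pixels + 2 blacked-out pixels */

/*
 * Zero-bias correction for one row already read from the CCD.
 * Assumption: the blacked-out cells are row[64] and row[65].
 * Returns the bias that was subtracted.
 */
int ccdpp_zero_bias_row(int row[CCD_ROW_BYTES]) {
    int c;
    /* ComputeBias: average of the two blacked-out cells */
    int bias = (row[CCD_ROW_BYTES - 2] + row[CCD_ROW_BYTES - 1]) / 2;

    /* FixBias: subtract the bias from each of the 64 visible pixels */
    for (c = 0; c < CCD_PIXELS_PER_ROW; c++)
        row[c] -= bias;

    return bias;
}
```

The NextRow state then amounts to calling this function once for each of the 64 rows.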
Software
• System-level model provides majority of code
  – Module hierarchy, procedure names, and main program unchanged
• Code for UART and CCDPP modules must be redesigned
  – Simply replace with memory assignments
    • xdata used to load/store variables over external memory bus
    • _at_ specifies memory address to store these variables
    • Byte sent to U_TX_REG by processor will invoke UART
    • U_STAT_REG used by UART to indicate it is ready for next byte
      – UART may be much slower than processor
  – Similar modification for CCDPP code
• All other modules untouched

Original code from system-level model

    #include <stdio.h>
    static FILE *outputFileHandle;
    void UartInitialize(const char *outputFileName) {
        outputFileHandle = fopen(outputFileName, "w");
    }
    void UartSend(char d) {
        fprintf(outputFileHandle, "%i\n", (int)d);
    }

Rewritten UART module

    static unsigned char xdata U_TX_REG _at_ 65535;
    static unsigned char xdata U_STAT_REG _at_ 65534;
    void UARTInitialize(void) {}
    void UARTSend(unsigned char d) {
        while( U_STAT_REG == 1 ) {
            /* busy wait */
        }
        U_TX_REG = d;
    }

Analysis
• Entire SOC tested on VHDL simulator
  – Interprets VHDL descriptions and functionally simulates execution of system
    • Recall program code translated to VHDL description of ROM
  – Tests for correct functionality
  – Measures clock cycles to process one image (performance)
• Gate-level description obtained through synthesis
  – Synthesis tool like compiler for SPPs
  – Simulate gate-level models to obtain data for power analysis
    • Number of times gates switch from 1 to 0 or 0 to 1
  – Count number of gates for chip area
[Figure: Obtaining design metrics of interest. The VHDL descriptions feed a VHDL simulator (execution time) and a synthesis tool (gates); the gates feed a gate-level simulator and power equation (power) and are summed for chip area]
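The power analysis above rests on counting how often signals switch between 0 and 1. As a rough illustration only, the C sketch below counts bit toggles across a recorded sequence of 8-bit signal samples; the function names are hypothetical, and the actual power equation used by the tools is not given in the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 1 bits in an 8-bit value. */
static unsigned bits_set(uint8_t v) {
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/*
 * Counts 0->1 and 1->0 transitions between consecutive samples.
 * Each sample holds the values of eight signals at one simulation step.
 */
unsigned long count_toggles(const uint8_t *samples, size_t n) {
    unsigned long toggles = 0;
    size_t k;
    for (k = 1; k < n; k++)
        toggles += bits_set((uint8_t)(samples[k] ^ samples[k - 1]));
    return toggles;
}
```

XOR-ing consecutive samples leaves a 1 exactly where a signal changed, so the population count of the XOR is the number of switching events in that step.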
Implementation 2: Microcontroller and CCDPP
• Analysis of implementation 2
  – Total execution time for processing one image:
    • 9.1 seconds
  – Power consumption:
    • 0.033 watt
  – Energy consumption:
    • 0.30 joule (9.1 s x 0.033 watt)
  – Total chip area:
    • 98,000 gates

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
• 9.1 seconds still doesn’t meet performance constraint of 1 second
• DCT operation prime candidate for improvement
  – Execution of implementation 2 shows microprocessor spends most cycles here
  – Could design custom hardware like we did for CCDPP
    • More complex so more design effort
  – Instead, will speed up DCT functionality by modifying behavior

DCT floating-point cost
• Floating-point cost
  – DCT uses ~260 floating-point operations per pixel transformation
  – 4096 (64 x 64) pixels per image
  – 1 million floating-point operations per image
  – No floating-point support with Intel 8051
    • Compiler must emulate
      – Generates procedures for each floating-point operation
        • mult, add
      – Each procedure uses tens of integer operations
  – Thus, > 10 million integer operations per image
  – Procedures increase code size
• Fixed-point arithmetic can improve on this
Fixed-point arithmetic
• Integer used to represent a real number
  – Constant number of integer’s bits represents fractional portion of real number
    • More bits, more accurate the representation
  – Remaining bits represent portion of real number before decimal point
• Translating a real constant to a fixed-point representation
  – Multiply real value by 2 ^ (# of bits used for fractional part)
  – Round to nearest integer
  – E.g., represent 3.14 as 8-bit integer with 4 bits for fraction
    • 2^4 = 16
    • 3.14 x 16 = 50.24 ≈ 50 = 0011.0010
    • 16 (2^4) possible values for fraction, each represents 0.0625 (1/16)
    • Last 4 bits (0010) = 2
    • 2 x 0.0625 = 0.125
    • 3(0011) + 0.125 = 3.125 ≈ 3.14 (more bits for fraction would increase accuracy)

Fixed-point arithmetic operations
• Addition
  – Simply add integer representations
  – E.g., 3.14 + 2.71 = 5.85
    • 3.14 → 50 = 0011.0010
    • 2.71 → 43 = 0010.1011
    • 50 + 43 = 93 = 0101.1101
    • 5(0101) + 13(1101) x 0.0625 = 5.8125 ≈ 5.85
• Multiply
  – Multiply integer representations
  – Shift result right by # of bits in fractional part
  – E.g., 3.14 * 2.71 = 8.5094
    • 50 * 43 = 2150 = 1000.01100110 [ = (3.14*16) * (2.71*16) = (3.14 * 2.71*16) *16 ]
    • >> 4 = 1000.0110
    • 8(1000) + 6(0110) x 0.0625 = 8.375 ≈ 8.5094
• Range of real values used is limited by bit widths of possible resulting values

Fixed-point implementation of CODEC
• COS_TABLE gives 8-bit fixed-point representation of cosine values
• 6 bits used for fractional portion
• Result of multiplications shifted right by 6

    static const char code COS_TABLE[8][8] = {
        { 64,  62,  59,  53,  45,  35,  24,  12 },
        { 64,  53,  24, -12, -45, -62, -59, -35 },
        { 64,  35, -24, -62, -45,  12,  59,  53 },
        { 64,  12, -59, -35,  45,  53, -24, -62 },
        { 64, -12, -59,  35,  45, -53, -24,  62 },
        { 64, -35, -24,  62, -45, -12,  59, -53 },
        { 64, -53,  24,  12, -45,  62, -59,  35 },
        { 64, -62,  59, -53,  45, -35,  24, -12 }
    };
    static const char ONE_OVER_SQRT_TWO = 5;
    static short xdata inBuffer[8][8], outBuffer[8][8], idx;

    void CodecInitialize(void) { idx = 0; }

    static unsigned char C(int h) { return h ? 64 : ONE_OVER_SQRT_TWO; }

    static int F(int u, int v, short img[8][8]) {
        long s[8], r = 0;
        unsigned char x, j;
        for(x=0; x<8; x++) {
            s[x] = 0;
            for(j=0; j<8; j++)
                s[x] += (img[x][j] * COS_TABLE[j][v] ) >> 6;
        }
        for(x=0; x<8; x++) r += (s[x] * COS_TABLE[x][u]) >> 6;
        return (short)((((r * (((16*C(u)) >> 6) *C(v)) >> 6)) >> 6) >> 6);
    }

    void CodecPushPixel(short p) {
        if( idx == 64 ) idx = 0;
        inBuffer[idx / 8][idx % 8] = p << 6; idx++;
    }

    void CodecDoFdct(void) {
        unsigned short x, y;
        for(x=0; x<8; x++)
            for(y=0; y<8; y++)
                outBuffer[x][y] = F(x, y, inBuffer);
        idx = 0;
    }

Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT
• Analysis of implementation 3
  – Use same analysis techniques as implementation 2
  – Total execution time for processing one image:
    • 1.5 seconds
  – Power consumption:
    • 0.033 watt (same as 2)
  – Energy consumption:
    • 0.050 joule (1.5 s x 0.033 watt)
    • Battery life 6x longer!!
  – Total chip area:
    • 90,000 gates
    • 8,000 fewer gates (less memory needed for code)

Implementation 4: Microcontroller and CCDPP/DCT and CODEC
[Block diagram: SOC containing 8051, EEPROM, RAM, UART, CCDPP, CODEC]
• Performance close but not good enough
• Must resort to implementing CODEC in hardware
  – Single-purpose processor to perform DCT on 8 x 8 block
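The fixed-point conversion, addition, and multiplication rules described above can be checked with a few lines of C. This is a minimal sketch using 4 fractional bits to match the 3.14 and 2.71 worked examples; the function names are hypothetical and no overflow handling is attempted.

```c
#define FRAC_BITS 4   /* bits used for the fractional part */

/* Translate a real constant: multiply by 2^FRAC_BITS, round to nearest. */
int to_fixed(double real) {
    double scaled = real * (1 << FRAC_BITS);
    return (int)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Value represented by a fixed-point integer. */
double to_real(int fixed) {
    return (double)fixed / (1 << FRAC_BITS);
}

/* Addition: simply add the integer representations. */
int fixed_add(int a, int b) {
    return a + b;
}

/* Multiplication: multiply, then shift right by the fractional bits. */
int fixed_mul(int a, int b) {
    return (a * b) >> FRAC_BITS;
}
```

Using these, `to_fixed(3.14)` and `to_fixed(2.71)` give 50 and 43, their sum 93 represents 5.8125, and their product shifted right by 4 is 134, which represents 8.375, matching the slide's numbers.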
CODEC design
• 4 memory mapped registers
  – C_DATAI_REG/C_DATAO_REG used to push/pop 8 x 8 block into and out of CODEC
  – C_CMND_REG used to command CODEC
    • Writing 1 to this register invokes CODEC
  – C_STAT_REG indicates CODEC done and ready for next block
    • Polled in software
• Direct translation of C code to VHDL for actual hardware implementation
  – Fixed-point version used
• CODEC module in software changed similar to UART/CCDPP in implementation 2

Rewritten CODEC software

    static unsigned char xdata C_STAT_REG _at_ 65527;
    static unsigned char xdata C_CMND_REG _at_ 65528;
    static unsigned char xdata C_DATAI_REG _at_ 65529;
    static unsigned char xdata C_DATAO_REG _at_ 65530;
    void CodecInitialize(void) {}
    void CodecPushPixel(short p) { C_DATAO_REG = (char)p; }
    short CodecPopPixel(void) {
        return ((C_DATAI_REG << 8) | C_DATAI_REG);
    }
    void CodecDoFdct(void) {
        C_CMND_REG = 1;
        while( C_STAT_REG == 1 ) { /* busy wait */ }
    }

Implementation 4: Microcontroller and CCDPP/DCT and CODEC
• Analysis of implementation 4
  – Total execution time for processing one image:
    • 0.099 seconds (well under 1 sec)
  – Power consumption:
    • 0.040 watt
    • Increase over 2 and 3 because SOC has another processor
  – Energy consumption:
    • 0.0040 joule (0.099 s x 0.040 watt)
    • Battery life 12x longer than previous implementation!!
  – Total chip area:
    • 128,000 gates
    • Significant increase over previous implementations

Summary of implementations

                          Implementation 2   Implementation 3   Implementation 4
    Performance (second)  9.1                1.5                0.099
    Power (watt)          0.033              0.033              0.040
    Size (gate)           98,000             90,000             128,000
    Energy (joule)        0.30               0.050              0.0040

• Implementation 3
  – Close in performance
  – Cheaper
  – Less time to build
• Implementation 4
  – Great performance and energy consumption
  – More expensive and may miss time-to-market window
    • If DCT designed ourselves then increased NRE cost and time-to-market
    • If existing DCT purchased then increased IC cost (IP royalties)
• Which is better?
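The energy figures quoted for implementations 2, 3, and 4 all follow from energy = execution time x power, and the battery-life comparisons are ratios of those energies. A small C sketch (function names hypothetical) reproduces them:

```c
/* Energy in joules for one image: execution time times power. */
double energy_joules(double seconds, double watts) {
    return seconds * watts;
}

/* How many times longer a battery lasts when energy per image drops. */
double battery_life_ratio(double old_energy, double new_energy) {
    return old_energy / new_energy;
}
```

For example, `energy_joules(0.099, 0.040)` is about 0.0040 J, and comparing it with implementation 3's 1.5 s x 0.033 W gives a ratio of about 12.5, in line with the "12x longer" claim.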
Summary
• Digital camera example
  – Specifications in English and executable language
  – Design metrics: performance, power and area
• Several implementations
  – Microcontroller: too slow
  – Microcontroller and coprocessor: better, but still too slow
  – Fixed-point arithmetic: almost fast enough
  – Additional coprocessor for compression: fast enough, but expensive and hard to design
  – Tradeoffs between hw/sw: this is the main design concern

Introduction to VHDL
Slides adapted from the “Introduction to VLSI” course, GM University, VA, USA

VHDL
• VHDL is a language for describing digital hardware used by industry worldwide
• VHDL is an acronym for VHSIC (Very High Speed Integrated Circuit) Hardware Description Language

Genesis of VHDL
State of art circa 1980
• Multiple design entry methods and hardware description languages in use
• No or limited portability of designs between CAD tools from different vendors
• Objective: shortening the time from a design concept to implementation from 18 months to 6 months
A Brief History of VHDL
• June 1981: Woods Hole Workshop
• July 1983: contract awarded to develop VHDL
  – Intermetrics
  – IBM
  – Texas Instruments
• August 1985: VHDL Version 7.2 released
• December 1987: VHDL became IEEE Standard 1076-1987 and in 1988 an ANSI standard

Three versions of VHDL
• VHDL-87
• VHDL-93
• VHDL-01

• VHDL for Specification
• VHDL for Simulation
• VHDL for Synthesis

Levels of design description
• Algorithmic level
• Register Transfer Level (level of description most suitable for synthesis)
• Logic (gate) level
• Circuit (transistor) level
• Physical (layout) level

Register Transfer Logic (RTL) Design Description
[Figure: blocks of combinational logic separated by registers]

Naming and Labeling (1)
• VHDL is not case sensitive
  Example: the names or labels
    databus
    Databus
    DataBus
    DATABUS
  are all equivalent

Naming and Labeling (2)
General rules of thumb (according to VHDL-87)
1. All names should start with an alphabet character (a-z or A-Z)
2. Use only alphabet characters (a-z or A-Z), digits (0-9) and underscore (_)
3. Do not use any punctuation or reserved characters within a name (!, ?, ., &, +, -, etc.)
4. Do not use two or more consecutive underscore characters (__) within a name (e.g., Sel__A is invalid)
5. All names and labels in a given entity and architecture must be unique

Free Format
• VHDL is a “free format” language
  No formatting conventions, such as spacing or indentation, are imposed by VHDL compilers. Space and carriage return are treated the same way.
  Example:
    if (a=b) then
  or
    if (a=b)    then
  or
    if (a =
    b) then
  are all equivalent

Comments
• Comments in VHDL are indicated with a “double dash”, i.e., “--”
  – Comment indicator can be placed anywhere in the line
  – Any text that follows in the same line is treated as a comment
  – Carriage return terminates a comment
  – No method for commenting a block extending over a couple of lines
  Examples:
    -- main subcircuit
    Data_in <= Data_bus;  -- reading data from the input FIFO
Design Entity
• Design entity: most basic building block of a design
• One entity can have many different architectures
[Figure: a design entity consists of an entity declaration plus architecture 1, architecture 2, architecture 3]

Entity Declaration
• Entity Declaration describes the interface of the component, i.e. input and output ports.

    ENTITY nand_gate IS
        PORT(
            a : IN STD_LOGIC;
            b : IN STD_LOGIC;
            z : OUT STD_LOGIC
        );
    END nand_gate;

  [Annotations: entity name (nand_gate); port names (a, b, z); port modes, i.e. data flow directions (IN, OUT); port type (STD_LOGIC); reserved words (ENTITY, IS, PORT, END); a semicolon follows each port declaration except the last, which has no semicolon]

Entity declaration – simplified syntax

    ENTITY entity_name IS
        PORT (
            port_name : signal_mode signal_type;
            port_name : signal_mode signal_type;
            ………….
            port_name : signal_mode signal_type);
    END entity_name;

Architecture
• Describes an implementation of a design entity.
• Architecture example:

    ARCHITECTURE model OF nand_gate IS
    BEGIN
        z <= a NAND b;
    END model;

Architecture – simplified syntax

    ARCHITECTURE architecture_name OF entity_name IS
        [ declarations ]
    BEGIN
        code
    END architecture_name;

Entity Declaration & Architecture
nand_gate.vhd

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY nand_gate IS
        PORT(
            a : IN STD_LOGIC;
            b : IN STD_LOGIC;
            z : OUT STD_LOGIC);
    END nand_gate;

    ARCHITECTURE model OF nand_gate IS
    BEGIN
        z <= a NAND b;
    END model;

Mode In
[Figure: port signal a entering the entity]
• Driver resides outside the entity

Mode out
[Figure: port signal z leaving the entity, internal signal c]
• Driver resides inside the entity
• Can’t read out within an entity: c <= z is not allowed

Mode out with signal
[Figure: internal signal x drives both port signal z and internal signal c]
• Signal X can be read inside the entity
• Driver resides inside the entity
    z <= x
    c <= x

Mode inout
[Figure: bidirectional port signal a]
• Signal can be read inside the entity
• Driver may reside both inside and outside of the entity
Mode buffer
[Figure: port signal z leaving the entity, internal signal c]
• Driver resides inside the entity
• Port signal Z can be read inside the entity
    c <= z

Port Modes
The Port Mode of the interface describes the direction in which data travels with respect to the component
• In: Data comes in this port and can only be read within the entity. It can appear only on the right side of a signal or variable assignment.
• Out: The value of an output port can only be updated within the entity. It cannot be read. It can only appear on the left side of a signal assignment.
• Inout: The value of a bi-directional port can be read and updated within the entity model. It can appear on both sides of a signal assignment.
• Buffer: Used for a signal that is an output from an entity. The value of the signal can be used inside the entity, which means that in an assignment statement the signal can appear on the left and right sides of the <= operator

Library declarations

    LIBRARY ieee;                   -- library declaration
    USE ieee.std_logic_1164.all;    -- use all definitions from the package std_logic_1164

    ENTITY nand_gate IS
        PORT(
            a : IN STD_LOGIC;
            b : IN STD_LOGIC;
            z : OUT STD_LOGIC);
    END nand_gate;

    ARCHITECTURE model OF nand_gate IS
    BEGIN
        z <= a NAND b;
    END model;

Library declarations - syntax

    LIBRARY library_name;
    USE library_name.package_name.package_parts;

Fundamental parts of a library
[Figure: a LIBRARY contains PACKAGE 1, PACKAGE 2, ...; each package contains TYPES, CONSTANTS, FUNCTIONS, PROCEDURES, COMPONENTS]

Libraries
• ieee
  – Specifies multi-level logic system, including STD_LOGIC and STD_LOGIC_VECTOR data types
  – Needs to be explicitly declared
• std
  – Specifies pre-defined data types (BIT, BOOLEAN, INTEGER, REAL, SIGNED, UNSIGNED, etc.), arithmetic operations, basic type conversion functions, basic text i/o functions, etc.
  – Visible by default
• work
  – Current designs after compilation
  – Visible by default

STD_LOGIC
• Every port in the nand_gate listing under Library declarations is declared as STD_LOGIC
• What is STD_LOGIC you ask?
STD_LOGIC type demystified

    Value   Meaning
    'X'     Forcing (Strong driven) Unknown
    '0'     Forcing (Strong driven) 0
    '1'     Forcing (Strong driven) 1
    'Z'     High Impedance
    'W'     Weak (Weakly driven) Unknown
    'L'     Weak (Weakly driven) 0. Models a pull down.
    'H'     Weak (Weakly driven) 1. Models a pull up.
    '-'     Don't Care

More on STD_LOGIC Meanings (1)
[Figure: one driver forces '1' and another forces '0' on the same wire]
• 'X': contention on the bus

More on STD_LOGIC Meanings (2)
[Figure: driver output '0']

More on STD_LOGIC Meanings (3)
[Figure: a resistor to VDD models 'H' next to a strong '1'; a resistor to ground models 'L' next to a strong '0']

More on STD_LOGIC Meanings (4)
'-'
• Do not care.
• Can be assigned to outputs for the case of invalid inputs (may produce significant improvement in resource utilization after synthesis).
• Use with caution: '1' = '-' gives FALSE

Resolving logic levels

         X  0  1  Z  W  L  H  -
    X    X  X  X  X  X  X  X  X
    0    X  0  X  0  0  0  0  X
    1    X  X  1  1  1  1  1  X
    Z    X  0  1  Z  W  L  H  X
    W    X  0  1  W  W  W  W  X
    L    X  0  1  L  W  L  W  X
    H    X  0  1  H  W  W  H  X
    -    X  X  X  X  X  X  X  X

Signals

    SIGNAL a : STD_LOGIC;                       -- 1 wire
    SIGNAL b : STD_LOGIC_VECTOR(7 DOWNTO 0);    -- 8-wire bus

Standard Logic Vectors

    SIGNAL a: STD_LOGIC;
    SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL c: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL d: STD_LOGIC_VECTOR(7 DOWNTO 0);
    SIGNAL e: STD_LOGIC_VECTOR(15 DOWNTO 0);
    SIGNAL f: STD_LOGIC_VECTOR(8 DOWNTO 0);
    ……….
    a <= '1';
    b <= "0000";         -- Binary base assumed by default
    c <= B"0000";        -- Binary base explicitly specified
    d <= B"0110_0111";   -- You can use '_' to increase readability (requires a base specifier)
    e <= X"AF67";        -- Hexadecimal base
    f <= O"723";         -- Octal base

Vectors and Concatenation

    SIGNAL a: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL c, d, e: STD_LOGIC_VECTOR(7 DOWNTO 0);

    a <= "0000";
    b <= "1111";
    c <= a & b;                                            -- c = "00001111"
    d <= '0' & "0001111";                                  -- d <= "00001111"
    e <= '0' & '0' & '0' & '0' & '1' & '1' & '1' & '1';    -- e <= "00001111"
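The "Resolving logic levels" table above is just a symmetric lookup: given the values two drivers place on the same wire, it yields the value seen on the wire. A C sketch of that lookup, using only the eight values listed in these slides, might look as follows; `sl_resolve` and the table names are hypothetical.

```c
#include <string.h>

/* The eight values, in the row/column order of the resolution table. */
static const char SL_VALUES[] = "X01ZWLH-";

/* Resolution table, one string per row. */
static const char SL_RESOLVE[8][9] = {
    "XXXXXXXX",   /* X */
    "X0X0000X",   /* 0 */
    "XX11111X",   /* 1 */
    "X01ZWLHX",   /* Z */
    "X01WWWWX",   /* W */
    "X01LWLWX",   /* L */
    "X01HWWHX",   /* H */
    "XXXXXXXX"    /* - */
};

/* Value on a wire driven with a and b; unknown characters give 'X'. */
char sl_resolve(char a, char b) {
    const char *pa = strchr(SL_VALUES, a);
    const char *pb = strchr(SL_VALUES, b);
    if (a == '\0' || b == '\0' || pa == NULL || pb == NULL)
        return 'X';
    return SL_RESOLVE[pa - SL_VALUES][pb - SL_VALUES];
}
```

For instance, a strong '1' against a strong '0' resolves to 'X' (bus contention), while a high-impedance 'Z' against '1' resolves to '1'.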
VHDL Design Styles
• dataflow: concurrent statements
• structural: components and interconnects
• behavioral: sequential statements
  – Registers
  – State machines
  – Test benches
  – Algorithm spec.
[Figure note: a subset is marked as most suitable for synthesis]

xor3 Example
[Figure: xor3 block with inputs A, B, C and output Result]

Entity xor3

    ENTITY xor3 IS
        PORT(
            A      : IN  STD_LOGIC;
            B      : IN  STD_LOGIC;
            C      : IN  STD_LOGIC;
            Result : OUT STD_LOGIC
        );
    END xor3;

Dataflow Architecture (xor3 gate)

    ARCHITECTURE dataflow OF xor3 IS
        SIGNAL U1_out: STD_LOGIC;
    BEGIN
        U1_out <= A XOR B;
        Result <= U1_out XOR C;
    END dataflow;

Dataflow Description
• Describes how data moves through the system and the various processing steps.
• Data Flow uses series of concurrent statements to realize logic. Concurrent statements are evaluated at the same time; thus, order of these statements doesn’t matter.
• Data Flow is most useful style when series of Boolean equations can represent a logic.

Structural Architecture (xor3 gate)
[Figure: two xor2 gates (ports I1, I2, Y) in series; U1_OUT connects the first gate's output to the second gate's input]

    ARCHITECTURE structural OF xor3 IS
        SIGNAL U1_OUT: STD_LOGIC;
        COMPONENT xor2 IS
            PORT(
                I1 : IN  STD_LOGIC;
                I2 : IN  STD_LOGIC;
                Y  : OUT STD_LOGIC
            );
        END COMPONENT;
    BEGIN
        U1: xor2 PORT MAP (I1 => A, I2 => B, Y => U1_OUT);
        U2: xor2 PORT MAP (I1 => U1_OUT, I2 => C, Y => Result);
    END structural;

Component and Instantiation (1)
• Named association connectivity (recommended)

    COMPONENT xor2 IS
        PORT(
            I1 : IN  STD_LOGIC;
            I2 : IN  STD_LOGIC;
            Y  : OUT STD_LOGIC
        );
    END COMPONENT;

    U1: xor2 PORT MAP (I1 => A, I2 => B, Y => U1_OUT);

Component and Instantiation (2)
• Positional association connectivity (not recommended), for the same xor2 component

    U1: xor2 PORT MAP (A, B, U1_OUT);
Structural Description
• Structural design is the simplest to understand. This style is the closest to schematic capture and utilizes simple building blocks to compose logic functions.
• Components are interconnected in a hierarchical manner.
• Structural descriptions may connect simple gates or complex, abstract components.
• Structural style is useful when expressing a design that is naturally composed of sub-blocks.

Behavioral Architecture (xor3 gate)

    ARCHITECTURE behavioral OF xor3 IS
    BEGIN
        xor3_behave: PROCESS (A, B, C)
        BEGIN
            IF ((A XOR B XOR C) = '1') THEN
                Result <= '1';
            ELSE
                Result <= '0';
            END IF;
        END PROCESS xor3_behave;
    END behavioral;

Behavioral Description
• It accurately models what happens on the inputs and outputs of the black box (no matter what is inside and how it works).
• This style uses PROCESS statements in VHDL.

Testbench Block Diagram
[Figure: the testbench contains processes generating stimuli, the Design Under Test (DUT), and the observed outputs]

Testbench Defined
• Testbench applies stimuli (drives the inputs) to the Design Under Test (DUT) and (optionally) verifies expected outputs.
• The results can be viewed in a waveform window or written to a file.
• Since Testbench is written in VHDL, it is not restricted to a single simulation tool (portability).
• The same Testbench can be easily adapted to test different implementations (i.e. different architectures) of the same design.

Testbench Anatomy

    ENTITY tb IS
        -- TB entity has no ports
    END tb;

    ARCHITECTURE arch_tb OF tb IS
        -- Local signals and constants
        COMPONENT TestComp
            -- All Design Under Test component declarations
            PORT ( );
        END COMPONENT;
    -----------------------------------------------------
    BEGIN
        testSequence: PROCESS
            -- Input stimuli
        END PROCESS;
        DUT: TestComp PORT MAP ( );
            -- Instantiations of DUTs
    END arch_tb;

Testbench for XOR3 (1)

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY xor3_tb IS
    END xor3_tb;

    ARCHITECTURE xor3_tb_architecture OF xor3_tb IS
        -- Component declaration of the tested unit
        COMPONENT xor3
            PORT(
                A      : IN  STD_LOGIC;
                B      : IN  STD_LOGIC;
                C      : IN  STD_LOGIC;
                Result : OUT STD_LOGIC );
        END COMPONENT;
        -- Stimulus signals - signals mapped to the input and inout ports of tested entity
        SIGNAL test_vector: STD_LOGIC_VECTOR(2 DOWNTO 0);
        SIGNAL test_result : STD_LOGIC;

Testbench for XOR3 (2)

    BEGIN
        UUT : xor3
            PORT MAP (
                A => test_vector(0),
                B => test_vector(1),
                C => test_vector(2),
                Result => test_result);

        Testing: PROCESS
        BEGIN
            test_vector <= "000";
            WAIT FOR 10 ns;
            test_vector <= "001";
            WAIT FOR 10 ns;
            test_vector <= "010";
            WAIT FOR 10 ns;
            test_vector <= "011";
            WAIT FOR 10 ns;
            test_vector <= "100";
            WAIT FOR 10 ns;
            test_vector <= "101";
            WAIT FOR 10 ns;
            test_vector <= "110";
            WAIT FOR 10 ns;
            test_vector <= "111";
            WAIT FOR 10 ns;
        END PROCESS;
    END xor3_tb_architecture;
Constants
Syntax:

    CONSTANT name : type := value;

Examples:

    CONSTANT init_value : STD_LOGIC_VECTOR(3 downto 0) := "0100";
    CONSTANT ANDA_EXT : STD_LOGIC_VECTOR(7 downto 0) := X"B4";
    CONSTANT counter_width : INTEGER := 16;
    CONSTANT buffer_address : INTEGER := 16#FFFE#;
    CONSTANT clk_period : TIME := 20 ns;
    CONSTANT strobe_period : TIME := 333.333 ms;

Constants - features
Constants can be declared in a PACKAGE, ENTITY, ARCHITECTURE
• When declared in a PACKAGE, the constant is truly global, for the package can be used in several entities.
• When declared in an ARCHITECTURE, the constant is local, i.e., it is visible only within this architecture.
• When declared in an ENTITY declaration, the constant can be used in all architectures associated with this entity.

Physical data types
• Types representing physical quantities, such as time, voltage, capacitance, etc. are referred to in VHDL as physical data types.
• TIME is the only predefined physical data type.
• Value of the physical data type is called a physical literal.

Time values (physical literals) - Examples

    7 ns
    1 min
    min
    10.65 us
    10.65 fs

  [Annotations: numeric value, space, unit of time (dimension)]

TIME values
• Numeric value can be an integer or a floating point number.
• Numeric value is optional. If not given, 1 is implied.
• Numeric value and dimension MUST be separated by a space.
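The rules for TIME literals above (optional numeric value, mandatory space, unit name) can be exercised with a small C parser that converts a literal to femtoseconds. This is a sketch under stated assumptions: it uses the standard VHDL unit names with fs as the base unit, truncates anything below 1 fs, does no overflow checking, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Femtoseconds per unit; returns 0 for an unknown unit name. */
static int64_t unit_fs(const char *unit) {
    static const struct { const char *name; int64_t fs; } units[] = {
        { "fs", 1LL }, { "ps", 1000LL }, { "ns", 1000000LL },
        { "us", 1000000000LL }, { "ms", 1000000000000LL },
        { "sec", 1000000000000000LL },
        { "min", 60000000000000000LL },
        { "hr", 3600000000000000000LL }
    };
    size_t k;
    for (k = 0; k < sizeof units / sizeof units[0]; k++)
        if (strcmp(unit, units[k].name) == 0) return units[k].fs;
    return 0;
}

/*
 * Converts a literal such as "10.65 us" or "min" to femtoseconds.
 * Returns -1 if the unit is unknown or the space is missing.
 * Integer arithmetic only, so results are exact for power-of-ten units.
 */
int64_t vhdl_time_to_fs(const char *lit) {
    const char *sp = strrchr(lit, ' ');
    const char *p = lit;
    int64_t fs = unit_fs(sp ? sp + 1 : lit);
    int64_t whole = 0, total, scale;

    if (fs == 0) return -1;
    if (sp == NULL) return fs;          /* numeric value omitted: 1 implied */

    while (isdigit((unsigned char)*p)) whole = whole * 10 + (*p++ - '0');
    total = whole * fs;
    if (*p == '.') {
        p++;
        scale = fs;
        while (isdigit((unsigned char)*p)) {
            scale /= 10;                /* weight of this decimal digit */
            total += (*p++ - '0') * scale;
        }
    }
    return p == sp ? total : -1;        /* reject trailing garbage */
}
```

Working in integers avoids binary floating-point rounding, so `"10.65 us"` comes out as exactly 10,650,000,000 fs and `"10.65 fs"` truncates to 10 fs.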
Units of time

    Unit    Definition
    Base unit
    fs      femtoseconds (10^-15 seconds)
    Derived units
    ps      picoseconds (10^-12 seconds)
    ns      nanoseconds (10^-9 seconds)
    us      microseconds (10^-6 seconds)
    ms      milliseconds (10^-3 seconds)
    sec     seconds
    min     minutes (60 seconds)
    hr      hours (3600 seconds)

Values of the type TIME
• Value of a physical literal is defined in terms of integral multiples of the base unit, e.g.
    10.65 us = 10,650,000,000 fs
    10.65 fs = 10 fs
• Smallest available resolution in VHDL is 1 fs.
• Smallest available resolution in simulation can be set using a simulator command or parameter.

Arithmetic operations on values of the type TIME
Examples:
    7 ns + 10 ns = 17 ns
    1.2 ns - 12.6 ps = 1187400 fs
    5 ns * 4.3 = 21.5 ns
    20 ns / 5 ns = 4

Data-flow VHDL
Major instructions: concurrent statements
• concurrent signal assignment (<=)
• conditional concurrent signal assignment (when-else)
• selected concurrent signal assignment (with-select-when)
• generate scheme for equations (for-generate)

Data-flow VHDL: Example (Full adder)
(a) Truth table

    ci  xi  yi | ci+1  si
    0   0   0  |  0    0
    0   0   1  |  0    1
    0   1   0  |  0    1
    0   1   1  |  1    0
    1   0   0  |  0    1
    1   0   1  |  1    0
    1   1   0  |  1    0
    1   1   1  |  1    1

(b) Karnaugh maps give
    si   = xi XOR yi XOR ci
    ci+1 = xi yi + xi ci + yi ci
(c) Circuit [Figure: inputs xi, yi, ci; outputs si, ci+1]

Data-flow VHDL: Example (1)

    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY fulladd IS
        PORT ( x    : IN  STD_LOGIC ;
               y    : IN  STD_LOGIC ;
               cin  : IN  STD_LOGIC ;
               s    : OUT STD_LOGIC ;
               cout : OUT STD_LOGIC ) ;
    END fulladd ;

Data-flow VHDL: Example (2)

    ARCHITECTURE fulladd_dataflow OF fulladd IS
    BEGIN
        s    <= x XOR y XOR cin ;
        cout <= (x AND y) OR (cin AND x) OR (cin AND y) ;
    END fulladd_dataflow ;
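The two full-adder equations above can be checked exhaustively against ordinary addition. The C sketch below mirrors the `s` and `cout` assignments with bitwise operators; the struct and function names are hypothetical.

```c
typedef struct {
    int s;      /* sum bit */
    int cout;   /* carry out */
} FullAddOut;

/* Same Boolean equations as the fulladd dataflow architecture. */
FullAddOut fulladd(int x, int y, int cin) {
    FullAddOut out;
    out.s = x ^ y ^ cin;
    out.cout = (x & y) | (cin & x) | (cin & y);
    return out;
}

/* Returns 1 if x + y + cin == 2*cout + s for all eight input rows. */
int fulladd_matches_arithmetic(void) {
    int x, y, cin;
    for (x = 0; x <= 1; x++)
        for (y = 0; y <= 1; y++)
            for (cin = 0; cin <= 1; cin++) {
                FullAddOut o = fulladd(x, y, cin);
                if (x + y + cin != 2 * o.cout + o.s) return 0;
            }
    return 1;
}
```

Because there are only eight input combinations, the loop covers the entire truth table.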
Logic Operators
• Logic operators
    and  or  nand  nor  xor  not  xnor (xnor only in VHDL-93)
• Logic operators precedence
    Highest: not
    Lowest:  and  or  nand  nor  xor  xnor

No Implied Precedence
Wanted: y = ab + cd
Incorrect
    y <= a and b or c and d ;
  equivalent to
    y <= ((a and b) or c) and d ;
  equivalent to
    y = (ab + c)d
Correct
    y <= (a and b) or (c and d) ;

Concatenation

    SIGNAL a: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL b: STD_LOGIC_VECTOR(3 DOWNTO 0);
    SIGNAL c, d, e, f: STD_LOGIC_VECTOR(7 DOWNTO 0);

    a <= "0000";
    b <= "1111";
    c <= a & b;                                            -- c = "00001111"
    d <= '0' & "0001111";                                  -- d <= "00001111"
    e <= '0' & '0' & '0' & '0' & '1' & '1' & '1' & '1';    -- e <= "00001111"
    f <= ('0','0','0','0','1','1','1','1');                -- f <= "00001111"

Rotations in VHDL
a<<<1: a(3) a(2) a(1) a(0) becomes a(2) a(1) a(0) a(3)

    a_rotL <= a(2 downto 0) & a(3);

Arithmetic Operators in VHDL (1)
To use basic arithmetic operations involving std_logic_vectors you need to include the following library packages:

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_unsigned.all;

  or

    USE ieee.std_logic_signed.all;

Arithmetic Operators in VHDL (2)
You can use standard +, - operators to perform addition and subtraction:

    signal A : STD_LOGIC_VECTOR(3 downto 0);
    signal B : STD_LOGIC_VECTOR(3 downto 0);
    signal C : STD_LOGIC_VECTOR(3 downto 0);
    ……
    C <= A + B;
Conditional concurrent signal assignment: When - Else
    target_signal <= value1 when condition1 else
                     value2 when condition2 else
                     . . .
                     valueN-1 when conditionN-1 else
                     valueN;
[Figure: chain of 2-to-1 multiplexers; Condition 1 selects Value 1 with the highest priority, Condition N-1 selects Value N-1, and Value N is the default driving Target Signal]

Operators
• Relational operators: =  /=  <  <=  >  >=
• Logic and relational operators precedence
    Highest: not
             =  /=  <  <=  >  >=
    Lowest:  and  or  nand  nor  xor  xnor

Priority of logic and relational operators
Wanted: compare a = bc
Incorrect:
    … when a = b and c else …
    equivalent to
    … when (a = b) and c else …
Correct:
    … when a = (b and c) else …

Tri-state Buffer – example (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY tri_state IS
        PORT ( enable : IN  STD_LOGIC;
               input  : IN  STD_LOGIC_VECTOR(7 downto 0);
               output : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) );
    END tri_state;

Tri-state Buffer – example (2)
    ARCHITECTURE tri_state_dataflow OF tri_state IS
    BEGIN
        output <= input WHEN (enable = '0') ELSE
                  (OTHERS => 'Z');
    END tri_state_dataflow;

Selected concurrent signal assignment: With - Select - When
    with choice_expression select
        target_signal <= expression1 when choices_1,
                         expression2 when choices_2,
                         . . .
                         expressionN when choices_N;
[Figure: multiplexer with data inputs expression1 … expressionN labelled choices_1 … choices_N, select input choice expression, output target_signal]

Allowed formats of choices_k
    WHEN value
    WHEN value_1 to value_2
    WHEN value_1 | value_2 | .... | value N

Allowed formats of choice_k - example
    WITH sel SELECT
        y <= a WHEN "000",
             b WHEN "011" to "110",
             c WHEN "001" | "111",
             d WHEN OTHERS;

MLU: Entity Declaration
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mlu IS
        PORT ( NEG_A : IN  STD_LOGIC;
               NEG_B : IN  STD_LOGIC;
               NEG_Y : IN  STD_LOGIC;
               A     : IN  STD_LOGIC;
               B     : IN  STD_LOGIC;
               L1    : IN  STD_LOGIC;
               L0    : IN  STD_LOGIC;
               Y     : OUT STD_LOGIC );
    END mlu;
MLU: Block Diagram
[Figure: inputs A and B pass through optional inverters (NEG_A, NEG_B) to give A1 and B1; four gates produce MUX_0 … MUX_3, which feed inputs IN0 … IN3 of MUX_4_1; L1 and L0 drive SEL1 and SEL0; the multiplexer output Y1 passes through an optional inverter (NEG_Y) to give Y]

MLU: Architecture Declarative Section
    ARCHITECTURE mlu_dataflow OF mlu IS
        SIGNAL A1    : STD_LOGIC;
        SIGNAL B1    : STD_LOGIC;
        SIGNAL Y1    : STD_LOGIC;
        SIGNAL MUX_0 : STD_LOGIC;
        SIGNAL MUX_1 : STD_LOGIC;
        SIGNAL MUX_2 : STD_LOGIC;
        SIGNAL MUX_3 : STD_LOGIC;
        SIGNAL L     : STD_LOGIC_VECTOR(1 DOWNTO 0);

MLU - Architecture Body
    BEGIN
        A1 <= NOT A WHEN (NEG_A='1') ELSE A;
        B1 <= NOT B WHEN (NEG_B='1') ELSE B;
        Y  <= NOT Y1 WHEN (NEG_Y='1') ELSE Y1;

        MUX_0 <= A1 AND B1;
        MUX_1 <= A1 OR B1;
        MUX_2 <= A1 XOR B1;
        MUX_3 <= A1 XNOR B1;

        L <= L1 & L0;

        with (L) select
            Y1 <= MUX_0 WHEN "00",
                  MUX_1 WHEN "01",
                  MUX_2 WHEN "10",
                  MUX_3 WHEN OTHERS;
    END mlu_dataflow;

For Generate Statement: For - Generate
    label: FOR identifier IN range GENERATE
    BEGIN
        {Concurrent Statements}
    END GENERATE;

PARITY: Block Diagram
[Figure: chain of XOR gates reducing parity_in(7 downto 0) to parity_out]

PARITY: Entity Declaration
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY parity IS
        PORT ( parity_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
               parity_out : OUT STD_LOGIC );
    END parity;

PARITY: Block Diagram
[Figure: the same XOR chain with the intermediate signals labelled xor_out(1) … xor_out(6)]

PARITY: Architecture
    ARCHITECTURE parity_dataflow OF parity IS
        SIGNAL xor_out: std_logic_vector (6 downto 1);
    BEGIN
        xor_out(1) <= parity_in(0) XOR parity_in(1);
        xor_out(2) <= xor_out(1) XOR parity_in(2);
        xor_out(3) <= xor_out(2) XOR parity_in(3);
        xor_out(4) <= xor_out(3) XOR parity_in(4);
        xor_out(5) <= xor_out(4) XOR parity_in(5);
        xor_out(6) <= xor_out(5) XOR parity_in(6);
        parity_out <= xor_out(6) XOR parity_in(7);
    END parity_dataflow;
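The XOR chain above repeats one statement per input bit. As a sketch of how the For - Generate scheme can express the same chain for an arbitrary width, the following uses a generic N; the entity name parity_n and the generic are assumptions made for illustration, not part of the slides.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical width-generic parity generator.
ENTITY parity_n IS
    GENERIC ( N : INTEGER := 8 ) ;   -- number of input bits, assumed N >= 1
    PORT ( parity_in  : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
           parity_out : OUT STD_LOGIC ) ;
END parity_n ;

ARCHITECTURE dataflow OF parity_n IS
    SIGNAL xor_out : STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
BEGIN
    -- Seed the chain with bit 0 so every later stage has the same form.
    xor_out(0) <= parity_in(0) ;

    G1: FOR i IN 1 TO N-1 GENERATE
        xor_out(i) <= xor_out(i-1) XOR parity_in(i) ;
    END GENERATE G1 ;

    parity_out <= xor_out(N-1) ;
END dataflow ;
```

Seeding xor_out(0) with the first input is a design choice that keeps all generated statements identical, at the cost of one extra signal bit.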
PARITY: Block Diagram (2)
[Figure: XOR chain with the intermediate signals labelled xor_out(0) … xor_out(7)]

PARITY: Architecture
    ARCHITECTURE parity_dataflow OF parity IS
        SIGNAL xor_out: STD_LOGIC_VECTOR (7 downto 0);
    BEGIN
        xor_out(0) <= parity_in(0);
        xor_out(1) <= xor_out(0) XOR parity_in(1);
        xor_out(2) <= xor_out(1) XOR parity_in(2);
        xor_out(3) <= xor_out(2) XOR parity_in(3);
        xor_out(4) <= xor_out(3) XOR parity_in(4);
        xor_out(5) <= xor_out(4) XOR parity_in(5);
        xor_out(6) <= xor_out(5) XOR parity_in(6);
        xor_out(7) <= xor_out(6) XOR parity_in(7);
        parity_out <= xor_out(7);
    END parity_dataflow;

PARITY: Architecture (2)
    ARCHITECTURE parity_dataflow OF parity IS
        SIGNAL xor_out: STD_LOGIC_VECTOR (7 DOWNTO 0);
    BEGIN
        xor_out(0) <= parity_in(0);
        G2: FOR i IN 1 TO 7 GENERATE
            xor_out(i) <= xor_out(i-1) XOR parity_in(i);
        end generate G2;
        parity_out <= xor_out(7);
    END parity_dataflow;

Left vs. right side of the assignment
Applies to <=, when-else, and with-select.
Left side:
• Internal signals (defined in a given architecture)
• Ports of the mode
  – out
  – inout
  – buffer
Right side: expressions including
• Internal signals (defined in a given architecture)
• Ports of the mode
  – in
  – inout
  – buffer

Arithmetic operations
Synthesizable arithmetic operations:
• Addition, +
• Subtraction, -
• Comparisons, >, >=, <, <=
• Multiplication, *
• Division by a power of 2, /2**6 (equivalent to right shift)
• Shifts by a constant, SHL, SHR

Arithmetic operations
The result of synthesis of an arithmetic operation is a
• combinational circuit
• without pipelining.
The exact internal architecture used (and thus the delay and area of the circuit) may depend on the timing constraints specified during synthesis (e.g., the requested maximum clock frequency).
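The remark that division by a power of 2 is equivalent to a right shift can be made concrete with slicing and concatenation. The sketch below assumes a 16-bit unsigned operand and the divisor 2**6 mentioned above; the entity name div64 is hypothetical.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical example: unsigned division by 2**6 written as a right shift.
ENTITY div64 IS
    PORT ( x : IN  STD_LOGIC_VECTOR(15 DOWNTO 0) ;
           y : OUT STD_LOGIC_VECTOR(15 DOWNTO 0) ) ;
END div64 ;

ARCHITECTURE dataflow OF div64 IS
BEGIN
    -- Drop the 6 least significant bits and fill the top with zeros.
    -- The result is floor(x / 64) when x is read as an unsigned number.
    y <= "000000" & x(15 DOWNTO 6) ;
END dataflow ;
```

For signed operands the sign bit would have to be replicated instead of inserting zeros, which is outside the scope of this sketch.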
Operations on Unsigned Numbers
For operations on unsigned numbers
    USE ieee.std_logic_unsigned.all
    and signals (inputs/outputs) of the type STD_LOGIC_VECTOR
OR
    USE ieee.std_logic_arith.all
    and signals (inputs/outputs) of the type UNSIGNED

Operations on Signed Numbers
For operations on signed numbers
    USE ieee.std_logic_signed.all
    and signals (inputs/outputs) of the type STD_LOGIC_VECTOR
OR
    USE ieee.std_logic_arith.all
    and signals (inputs/outputs) of the type SIGNED

Signed and Unsigned Types
• Behave exactly like STD_LOGIC_VECTOR; in addition, they determine whether a given vector should be treated as a signed or unsigned number.
• Require USE ieee.std_logic_arith.all;

Addition of Unsigned Numbers
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_unsigned.all;

    ENTITY adder16 IS
        PORT ( Cin  : IN  STD_LOGIC;
               X, Y : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
               S    : OUT STD_LOGIC_VECTOR(15 DOWNTO 0);
               Cout : OUT STD_LOGIC );
    END adder16;

    ARCHITECTURE Behavior OF adder16 IS
        SIGNAL Sum : STD_LOGIC_VECTOR(16 DOWNTO 0);
    BEGIN
        Sum  <= ('0' & X) + Y + Cin;
        S    <= Sum(15 DOWNTO 0);
        Cout <= Sum(16);
    END Behavior;

Addition of Unsigned Numbers
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;

    ENTITY adder16 IS
        PORT ( Cin  : IN  STD_LOGIC;
               X, Y : IN  UNSIGNED(15 DOWNTO 0);
               S    : OUT UNSIGNED(15 DOWNTO 0);
               Cout : OUT STD_LOGIC );
    END adder16;

    ARCHITECTURE Behavior OF adder16 IS
        SIGNAL Sum : UNSIGNED(16 DOWNTO 0);
    BEGIN
        Sum  <= ('0' & X) + Y + Cin;
        S    <= Sum(15 DOWNTO 0);
        Cout <= Sum(16);
    END Behavior;

Addition of Signed Numbers (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_signed.all;

    ENTITY adder16 IS
        PORT ( Cin            : IN  STD_LOGIC;
               X, Y           : IN  STD_LOGIC_VECTOR(15 DOWNTO 0);
               S              : OUT STD_LOGIC_VECTOR(15 DOWNTO 0);
               Cout, Overflow : OUT STD_LOGIC );
    END adder16;

    ARCHITECTURE Behavior OF adder16 IS
        SIGNAL Sum : STD_LOGIC_VECTOR(16 DOWNTO 0);
    BEGIN
        Sum      <= ('0' & X) + Y + Cin;
        S        <= Sum(15 DOWNTO 0);
        Cout     <= Sum(16);
        Overflow <= Sum(16) XOR X(15) XOR Y(15) XOR Sum(15);
    END Behavior;
Addition of Signed Numbers (2)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;

    ENTITY adder16 IS
        PORT ( Cin            : IN  STD_LOGIC;
               X, Y           : IN  SIGNED(15 DOWNTO 0);
               S              : OUT SIGNED(15 DOWNTO 0);
               Cout, Overflow : OUT STD_LOGIC );
    END adder16;

    ARCHITECTURE Behavior OF adder16 IS
        SIGNAL Sum : SIGNED(16 DOWNTO 0);
    BEGIN
        Sum      <= ('0' & X) + Y + Cin;
        S        <= Sum(15 DOWNTO 0);
        Cout     <= Sum(16);
        Overflow <= Sum(16) XOR X(15) XOR Y(15) XOR Sum(15);
    END Behavior;

Multiplication of signed and unsigned numbers (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE ieee.std_logic_arith.all;

    entity multiply is
        port( a  : in  STD_LOGIC_VECTOR(15 downto 0);
              b  : in  STD_LOGIC_VECTOR(7 downto 0);
              cu : out STD_LOGIC_VECTOR(23 downto 0);
              cs : out STD_LOGIC_VECTOR(23 downto 0) );
    end multiply;

    architecture dataflow of multiply is
        SIGNAL sa:   SIGNED(15 downto 0);
        SIGNAL sb:   SIGNED(7 downto 0);
        SIGNAL sres: SIGNED(23 downto 0);
        SIGNAL ua:   UNSIGNED(15 downto 0);
        SIGNAL ub:   UNSIGNED(7 downto 0);
        SIGNAL ures: UNSIGNED(23 downto 0);

Multiplication of signed and unsigned numbers (2)
    begin
        -- signed multiplication
        sa   <= SIGNED(a);
        sb   <= SIGNED(b);
        sres <= sa * sb;
        cs   <= STD_LOGIC_VECTOR(sres);

        -- unsigned multiplication
        ua   <= UNSIGNED(a);
        ub   <= UNSIGNED(b);
        ures <= ua * ub;
        cu   <= STD_LOGIC_VECTOR(ures);
    end dataflow;

Integer Types
Operations on signals (variables) of the integer types INTEGER, NATURAL, and their subtypes, such as
    TYPE day_of_month IS RANGE 0 TO 31;
are synthesizable in the range
    -(2^31 - 1) .. 2^31 - 1 for INTEGERs and their subtypes
    0 .. 2^31 - 1 for NATURALs and their subtypes

Integer Types
Operations on signals (variables) of the integer types INTEGER, NATURAL, and their subtypes are less flexible and more difficult to control than operations on signals (variables) of the types
    STD_LOGIC_VECTOR
    UNSIGNED
    SIGNED
and thus beginners are advised to avoid them.
Addition of Signed Integers
    ENTITY adder16 IS
        PORT ( X, Y : IN  INTEGER RANGE -32767 TO 32767;
               S    : OUT INTEGER RANGE -32767 TO 32767 );
    END adder16;

    ARCHITECTURE Behavior OF adder16 IS
    BEGIN
        S <= X + Y;
    END Behavior;

Structural VHDL
Major instructions
• component instantiation (port map)
• component instantiation with generic (generic map, port map)
• generate scheme for component instantiations (for-generate)

Circuit built of medium scale components
[Figure: two 2-to-1 multiplexers (select inputs s(0), s(1); data inputs r(0), r(1) and r(4), r(5)) produce p(0) and p(3), while r(2) and r(3) drive p(1) and p(2); p feeds a priority encoder with outputs q(1), q(0) and z = ena; q and ena drive a 2-to-4 decoder dec2to4 whose outputs z(0) … z(3) are stored in register regn, clocked by Clock with Enable]

2-to-1 Multiplexer
(a) Graphical symbol [Figure: multiplexer with data inputs w0, w1, select s, output f]
(b) Truth table
    s | f
    0 | w0
    1 | w1

VHDL code for a 2-to-1 Multiplexer
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux2to1 IS
        PORT ( w0, w1, s : IN  STD_LOGIC;
               f         : OUT STD_LOGIC );
    END mux2to1;

    ARCHITECTURE dataflow OF mux2to1 IS
    BEGIN
        f <= w0 WHEN s = '0' ELSE w1;
    END dataflow;

Priority Encoder
[Figure: block with inputs w0 … w3 and outputs y0, y1, z]
    w3 w2 w1 w0 | y1 y0 z
    0  0  0  0  | d  d  0
    0  0  0  1  | 0  0  1
    0  0  1  x  | 0  1  1
    0  1  x  x  | 1  0  1
    1  x  x  x  | 1  1  1

2-to-4 Decoder
(a) Truth table
    En w1 w0 | y0 y1 y2 y3
    1  0  0  | 1  0  0  0
    1  0  1  | 0  1  0  0
    1  1  0  | 0  0  1  0
    1  1  1  | 0  0  0  1
    0  x  x  | 0  0  0  0
(b) Graphical symbol [Figure: block with inputs w0, w1, En and outputs y0 … y3]
VHDL code for a Priority Encoder
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY priority IS
        PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
               y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
               z : OUT STD_LOGIC );
    END priority;

    ARCHITECTURE dataflow OF priority IS
    BEGIN
        y <= "11" WHEN w(3) = '1' ELSE
             "10" WHEN w(2) = '1' ELSE
             "01" WHEN w(1) = '1' ELSE
             "00";
        z <= '0' WHEN w = "0000" ELSE '1';
    END dataflow;

VHDL code for a 2-to-4 Decoder
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY dec2to4 IS
        PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               En : IN  STD_LOGIC;
               y  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
    END dec2to4;

    ARCHITECTURE dataflow OF dec2to4 IS
        SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0);
    BEGIN
        Enw <= En & w;
        WITH Enw SELECT
            y <= "0001" WHEN "100",
                 "0010" WHEN "101",
                 "0100" WHEN "110",
                 "1000" WHEN "111",
                 "0000" WHEN OTHERS;
    END dataflow;

N-bit register with enable
[Figure: block regn with N-bit input D, N-bit output Q, and inputs Enable and Clock]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY regn IS
        GENERIC ( N : INTEGER := 8 );
        PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0);
               Enable, Clock : IN  STD_LOGIC;
               Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) );
    END regn;

    ARCHITECTURE Behavior OF regn IS
    BEGIN
        PROCESS (Clock)
        BEGIN
            IF (Clock'EVENT AND Clock = '1') THEN
                IF Enable = '1' THEN
                    Q <= D;
                END IF;
            END IF;
        END PROCESS;
    END Behavior;

Circuit built of medium scale components
[Figure: the same circuit as before, with the decoder outputs named z(0) … z(3) and the register outputs named t(0) … t(3)]

Structural description – example (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY priority_resolver IS
        PORT ( r   : IN  STD_LOGIC_VECTOR(5 DOWNTO 0);
               s   : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               clk : IN  STD_LOGIC;
               en  : IN  STD_LOGIC;
               t   : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
    END priority_resolver;

    ARCHITECTURE structural OF priority_resolver IS
        SIGNAL p   : STD_LOGIC_VECTOR (3 DOWNTO 0);
        SIGNAL q   : STD_LOGIC_VECTOR (1 DOWNTO 0);
        SIGNAL z   : STD_LOGIC_VECTOR (3 DOWNTO 0);
        SIGNAL ena : STD_LOGIC;
Structural description – example (2)
        COMPONENT mux2to1
            PORT ( w0, w1, s : IN  STD_LOGIC;
                   f         : OUT STD_LOGIC );
        END COMPONENT;

        COMPONENT priority
            PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
                   y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
                   z : OUT STD_LOGIC );
        END COMPONENT;

        COMPONENT dec2to4
            PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
                   En : IN  STD_LOGIC;
                   y  : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
        END COMPONENT;

Structural description – example (3)
        COMPONENT regn
            GENERIC ( N : INTEGER := 8 );
            PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0);
                   Enable, Clock : IN  STD_LOGIC;
                   Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) );
        END COMPONENT;

Structural description – example (4)
    BEGIN
        u1: mux2to1 PORT MAP (w0 => r(0),
                              w1 => r(1),
                              s  => s(0),
                              f  => p(0));
        p(1) <= r(2);
        p(2) <= r(3);
        u2: mux2to1 PORT MAP (w0 => r(4),
                              w1 => r(5),
                              s  => s(1),
                              f  => p(3));
        u3: priority PORT MAP (w => p,
                               y => q,
                               z => ena);
        u4: dec2to4 PORT MAP (w  => q,
                              En => ena,
                              y  => z);

Structural description – example (5)
        u5: regn GENERIC MAP (N => 4)
                 PORT MAP (D      => z,
                           Enable => En,
                           Clock  => Clk,
                           Q      => t);
    END structural;

Named association connectivity
• Recommended in the majority of cases; prevents omissions and mistakes
    COMPONENT dec2to4
        PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               En : IN  STD_LOGIC;
               y  : OUT STD_LOGIC_VECTOR(0 TO 3) );
    END COMPONENT;

    u4: dec2to4 PORT MAP (w => q, En => ena, y => z);

Positional association connectivity
• Allowed, especially for the cases of
  – a small number of ports
  – multiple instantiations of the same component, in regular structures
    COMPONENT dec2to4
        PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               En : IN  STD_LOGIC;
               y  : OUT STD_LOGIC_VECTOR(0 TO 3) );
    END COMPONENT;

    u4: dec2to4 PORT MAP (w, En, y);
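The two association styles above can be compared side by side in one small design. The following sketch is an illustration only; the entities and2 and and3 and their port names are hypothetical and are not part of the slides.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical leaf component: 2-input AND gate.
ENTITY and2 IS
    PORT ( a, b : IN  STD_LOGIC ;
           y    : OUT STD_LOGIC ) ;
END and2 ;

ARCHITECTURE dataflow OF and2 IS
BEGIN
    y <= a AND b ;
END dataflow ;

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical 3-input AND built from two and2 instances.
ENTITY and3 IS
    PORT ( x0, x1, x2 : IN  STD_LOGIC ;
           f          : OUT STD_LOGIC ) ;
END and3 ;

ARCHITECTURE structural OF and3 IS
    COMPONENT and2
        PORT ( a, b : IN  STD_LOGIC ;
               y    : OUT STD_LOGIC ) ;
    END COMPONENT ;
    SIGNAL t : STD_LOGIC ;
BEGIN
    -- Named association: each formal port is spelled out, so order is irrelevant.
    u1: and2 PORT MAP ( a => x0, b => x1, y => t ) ;
    -- Positional association: actuals are matched to ports in declaration order.
    u2: and2 PORT MAP ( t, x2, f ) ;
END structural ;
```

With positional association, swapping two actuals of the same type compiles silently, which is why named association is the safer default.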
Structural description with positional association connectivity
    BEGIN
        u1: mux2to1 PORT MAP (r(0), r(1), s(0), p(0));
        p(1) <= r(2);
        p(2) <= r(3);
        u2: mux2to1 PORT MAP (r(4), r(5), s(1), p(3));
        u3: priority PORT MAP (p, q, ena);
        u4: dec2to4 PORT MAP (q, ena, z);
        u5: regn GENERIC MAP (4)
                 PORT MAP (z, En, Clk, t);
    END structural;

Package – example (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    PACKAGE GatesPkg IS

        COMPONENT mux2to1
            PORT ( w0, w1, s : IN  STD_LOGIC;
                   f         : OUT STD_LOGIC );
        END COMPONENT;

        COMPONENT priority
            PORT ( w : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
                   y : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
                   z : OUT STD_LOGIC );
        END COMPONENT;

Package – example (2)
        COMPONENT dec2to4
            PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
                   En : IN  STD_LOGIC;
                   y  : OUT STD_LOGIC_VECTOR(0 TO 3) );
        END COMPONENT;

        COMPONENT regn
            GENERIC ( N : INTEGER := 8 );
            PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0);
                   Enable, Clock : IN  STD_LOGIC;
                   Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) );
        END COMPONENT;

Package – example (3)
        constant ADDAB : std_logic_vector(3 downto 0) := "0000";
        constant ADDAM : std_logic_vector(3 downto 0) := "0001";
        constant SUBAB : std_logic_vector(3 downto 0) := "0010";
        constant SUBAM : std_logic_vector(3 downto 0) := "0011";
        constant NOTA  : std_logic_vector(3 downto 0) := "0100";
        constant NOTB  : std_logic_vector(3 downto 0) := "0101";
        constant NOTM  : std_logic_vector(3 downto 0) := "0110";
        constant ANDAB : std_logic_vector(3 downto 0) := "0111";

    END GatesPkg;

Package usage (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE work.GatesPkg.all;

    ENTITY priority_resolver IS
        PORT ( r   : IN  STD_LOGIC_VECTOR(5 DOWNTO 0);
               s   : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               clk : IN  STD_LOGIC;
               en  : IN  STD_LOGIC;
               t   : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
    END priority_resolver;

    ARCHITECTURE structural OF priority_resolver IS
        SIGNAL p   : STD_LOGIC_VECTOR (3 DOWNTO 0);
        SIGNAL q   : STD_LOGIC_VECTOR (1 DOWNTO 0);
        SIGNAL z   : STD_LOGIC_VECTOR (3 DOWNTO 0);
        SIGNAL ena : STD_LOGIC;

Package usage (2)
    BEGIN
        u1: mux2to1 PORT MAP (w0 => r(0),
                              w1 => r(1),
                              s  => s(0),
                              f  => p(0));
        p(1) <= r(2);
        p(2) <= r(3);
        u2: mux2to1 PORT MAP (w0 => r(4),
                              w1 => r(5),
                              s  => s(1),
                              f  => p(3));
        u3: priority PORT MAP (w => p,
                               y => q,
                               z => ena);
        u4: dec2to4 PORT MAP (w  => q,
                              En => ena,
                              y  => z);
Package usage (3)
        u5: regn GENERIC MAP (N => 4)
                 PORT MAP (D      => z,
                           Enable => En,
                           Clock  => Clk,
                           Q      => t);
    END structural;

Configuration declaration
    CONFIGURATION SimpleCfg OF priority_resolver IS
        FOR structural
            FOR ALL: mux2to1
                USE ENTITY work.mux2to1(dataflow);
            END FOR;
            FOR u3: priority
                USE ENTITY work.priority(dataflow);
            END FOR;
            FOR u4: dec2to4
                USE ENTITY work.dec2to4(dataflow);
            END FOR;
        END FOR;
    END SimpleCfg;

Configuration specification
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;
    USE work.GatesPkg.all;

    ENTITY priority_resolver IS
        PORT ( r : IN  STD_LOGIC_VECTOR(5 DOWNTO 0);
               s : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               z : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) );
    END priority_resolver;

    ARCHITECTURE structural OF priority_resolver IS
        SIGNAL p   : STD_LOGIC_VECTOR (3 DOWNTO 0);
        SIGNAL q   : STD_LOGIC_VECTOR (1 DOWNTO 0);
        SIGNAL ena : STD_LOGIC;

        FOR ALL: mux2to1 USE ENTITY work.mux2to1(dataflow);
        FOR u3: priority USE ENTITY work.priority(dataflow);
        FOR u4: dec2to4 USE ENTITY work.dec2to4(dataflow);

Example 1
[Figure: 16-to-1 multiplexer built from five 4-to-1 multiplexers; inputs w0 … w15 in four groups selected by s0, s1; the four group outputs are selected by s2, s3 to give f]

A 4-to-1 Multiplexer
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux4to1 IS
        PORT ( w0, w1, w2, w3 : IN  STD_LOGIC;
               s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               f              : OUT STD_LOGIC );
    END mux4to1;

    ARCHITECTURE Dataflow OF mux4to1 IS
    BEGIN
        WITH s SELECT
            f <= w0 WHEN "00",
                 w1 WHEN "01",
                 w2 WHEN "10",
                 w3 WHEN OTHERS;
    END Dataflow;
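As an alternative to the component declaration plus configuration shown earlier, VHDL-93 also allows an entity and architecture to be named directly in the instantiation. The sketch below repeats the 4-to-1 multiplexer from above so that it is self-contained; the wrapper entity mux4_vec is hypothetical.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- The 4-to-1 multiplexer from the slides, repeated to keep the sketch self-contained.
ENTITY mux4to1 IS
    PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
           s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           f              : OUT STD_LOGIC ) ;
END mux4to1 ;

ARCHITECTURE Dataflow OF mux4to1 IS
BEGIN
    WITH s SELECT
        f <= w0 WHEN "00",
             w1 WHEN "01",
             w2 WHEN "10",
             w3 WHEN OTHERS ;
END Dataflow ;

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical wrapper that selects one bit of a 4-bit vector.
ENTITY mux4_vec IS
    PORT ( w : IN  STD_LOGIC_VECTOR(0 TO 3) ;
           s : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           f : OUT STD_LOGIC ) ;
END mux4_vec ;

ARCHITECTURE structural OF mux4_vec IS
BEGIN
    -- Direct entity instantiation: no COMPONENT declaration and no
    -- configuration are needed, because the binding is stated here.
    u1: ENTITY work.mux4to1(Dataflow)
        PORT MAP ( w0 => w(0), w1 => w(1), w2 => w(2), w3 => w(3),
                   s => s, f => f ) ;
END structural ;
```

The trade-off is flexibility: a configuration can rebind an instance to a different architecture without editing the structural code, while direct instantiation fixes the binding in place.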
Straightforward code for Example 1
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY Example1 IS
        PORT ( w : IN  STD_LOGIC_VECTOR(0 TO 15);
               s : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
               f : OUT STD_LOGIC );
    END Example1;

Straightforward code for Example 1
    ARCHITECTURE Structure OF Example1 IS
        COMPONENT mux4to1
            PORT ( w0, w1, w2, w3 : IN  STD_LOGIC;
                   s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
                   f              : OUT STD_LOGIC );
        END COMPONENT;
        SIGNAL m : STD_LOGIC_VECTOR(0 TO 3);
    BEGIN
        Mux1: mux4to1 PORT MAP ( w(0), w(1), w(2), w(3), s(1 DOWNTO 0), m(0) );
        Mux2: mux4to1 PORT MAP ( w(4), w(5), w(6), w(7), s(1 DOWNTO 0), m(1) );
        Mux3: mux4to1 PORT MAP ( w(8), w(9), w(10), w(11), s(1 DOWNTO 0), m(2) );
        Mux4: mux4to1 PORT MAP ( w(12), w(13), w(14), w(15), s(1 DOWNTO 0), m(3) );
        Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f );
    END Structure;

Modified code for Example 1
    ARCHITECTURE Structure OF Example1 IS
        COMPONENT mux4to1
            PORT ( w0, w1, w2, w3 : IN  STD_LOGIC;
                   s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
                   f              : OUT STD_LOGIC );
        END COMPONENT;
        SIGNAL m : STD_LOGIC_VECTOR(0 TO 3);
    BEGIN
        G1: FOR i IN 0 TO 3 GENERATE
            Muxes: mux4to1 PORT MAP (
                w(4*i), w(4*i+1), w(4*i+2), w(4*i+3), s(1 DOWNTO 0), m(i) );
        END GENERATE;
        Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f );
    END Structure;

Example 2
[Figure: 4-to-16 decoder built from five 2-to-4 decoders; one decoder with inputs w2, w3 and En enables four decoders with inputs w0, w1 that produce y0 … y15]

A 2-to-4 binary decoder
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY dec2to4 IS
        PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
               En : IN  STD_LOGIC;
               y  : OUT STD_LOGIC_VECTOR(0 TO 3) );
    END dec2to4;

    ARCHITECTURE Dataflow OF dec2to4 IS
        SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0);
    BEGIN
        Enw <= En & w;
        WITH Enw SELECT
            y <= "1000" WHEN "100",
                 "0100" WHEN "101",
                 "0010" WHEN "110",
                 "0001" WHEN "111",
                 "0000" WHEN OTHERS;
    END Dataflow;

VHDL code for Example 2 (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY dec4to16 IS
        PORT ( w  : IN  STD_LOGIC_VECTOR(3 DOWNTO 0);
               En : IN  STD_LOGIC;
               y  : OUT STD_LOGIC_VECTOR(0 TO 15) );
    END dec4to16;
VHDL code for Example 2 (2)
    ARCHITECTURE Structure OF dec4to16 IS
        COMPONENT dec2to4
            PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
                   En : IN  STD_LOGIC;
                   y  : OUT STD_LOGIC_VECTOR(0 TO 3) );
        END COMPONENT;
        SIGNAL m : STD_LOGIC_VECTOR(0 TO 3);
    BEGIN
        G1: FOR i IN 0 TO 3 GENERATE
            Dec_ri: dec2to4 PORT MAP ( w(1 DOWNTO 0), m(i), y(4*i TO 4*i+3) );
            G2: IF i=3 GENERATE
                Dec_left: dec2to4 PORT MAP ( w(i DOWNTO i-1), En, m );
            END GENERATE;
        END GENERATE;
    END Structure;

Mixed Style Modeling
    architecture ARCHITECTURE_NAME of ENTITY_NAME is
        • Here you can declare signals, constants, functions, procedures…
        • Component declarations
        • No variable declarations !!
    begin
        Concurrent statements:
        • Concurrent simple signal assignment
        • Conditional signal assignment
        • Selected signal assignment
        • Generate statement
        • Component instantiation statement
        • Process statement
          – inside a process you can use only sequential statements
    end ARCHITECTURE_NAME;

Anatomy of a Process
    [label:] process [(sensitivity list)]
        [declaration part]
    begin
        statement part
    end process [label];
The parts in square brackets are OPTIONAL.

Statement Part
• Contains sequential statements to be executed each time the process is activated
• Analogous to conventional programming languages

What is a PROCESS?
• A process is a sequence of instructions referred to as sequential statements.
• The statements execute one after another; hence, the order of statements is important.
• A process must end with the keywords END PROCESS.

    Testing: PROCESS
    BEGIN
        test_vector <= "00";
        WAIT FOR 10 ns;
        test_vector <= "01";
        WAIT FOR 10 ns;
        test_vector <= "10";
        WAIT FOR 10 ns;
        test_vector <= "11";
        WAIT FOR 10 ns;
    END PROCESS;
The keyword PROCESS
• A process can be given a unique name using an optional LABEL
• This is followed by the keyword PROCESS
• The keyword BEGIN is used to indicate the start of the process
• All statements within the process are executed SEQUENTIALLY.

Execution of statements in a PROCESS
• The execution of statements continues sequentially till the last statement in the process.
• After execution of the last statement, control is again passed to the beginning of the process.

    Testing: PROCESS
    BEGIN
        test_vector <= "00";      -- order of execution: top to bottom
        WAIT FOR 10 ns;
        test_vector <= "01";
        WAIT FOR 10 ns;
        test_vector <= "10";
        WAIT FOR 10 ns;
        test_vector <= "11";
        WAIT FOR 10 ns;
    END PROCESS;                  -- program control is passed to the first statement after BEGIN

PROCESS with a WAIT Statement
• The last statement in the PROCESS is a WAIT instead of WAIT FOR 10 ns.
• This will cause the PROCESS to suspend indefinitely when the WAIT statement is executed.
• This form of WAIT can be used in a process included in a testbench when all possible combinations of inputs have been tested or a non-periodical signal has to be generated.

    Testing: PROCESS
    BEGIN
        test_vector <= "00";      -- order of execution: top to bottom
        WAIT FOR 10 ns;
        test_vector <= "01";
        WAIT FOR 10 ns;
        test_vector <= "10";
        WAIT FOR 10 ns;
        test_vector <= "11";
        WAIT;                     -- program execution stops here
    END PROCESS;

WAIT FOR vs. WAIT
• WAIT FOR: the waveform will keep repeating itself forever (0 1 2 3 0 1 2 3 …)
• WAIT: the waveform will keep its state after the last wait instruction (0 1 2 3 …)

Sequential Statements (1)
• If Statement
    if boolean expression then
        statements
    elsif boolean expression then
        statements
    else
        statements
    end if;
• else and elsif are optional

If Statement - Example
    SELECTOR: process
    begin
        WAIT UNTIL Clock'EVENT AND Clock = '1';
        IF Sel = "00" THEN
            f <= x1;
        ELSIF Sel = "10" THEN
            f <= x2;
        ELSE
            f <= x3;
        END IF;
    end process;

Loop Statement
• Loop Statement
    FOR i IN range LOOP
        statements
    END LOOP;
• Repeats a section of VHDL code
• Example: process every element in an array in the same way
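The loop bullet above mentions processing every element of an array in the same way. One possible illustration is bit reversal of a vector; the entity bitrev8 and its port names are hypothetical and are not taken from the slides.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical example: reverse the bit order of an 8-bit vector.
ENTITY bitrev8 IS
    PORT ( d : IN  STD_LOGIC_VECTOR(7 DOWNTO 0) ;
           q : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) ) ;
END bitrev8 ;

ARCHITECTURE Behavior OF bitrev8 IS
BEGIN
    PROCESS ( d )
    BEGIN
        -- Every element is handled by the same statement; the loop
        -- is unrolled into eight wire connections during synthesis.
        FOR i IN 0 TO 7 LOOP
            q(i) <= d(7 - i) ;
        END LOOP ;
    END PROCESS ;
END Behavior ;
```

The loop index i is declared implicitly by the FOR statement and exists only inside the loop.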
Loop Statement – Example (1)
    Testing: PROCESS
    BEGIN
        test_vector <= "000";
        FOR i IN 0 TO 7 LOOP
            WAIT FOR 10 ns;
            test_vector <= test_vector + "001";
        END LOOP;
    END PROCESS;

Loop Statement – Example (2)
    Testing: PROCESS
    BEGIN
        test_ab  <= "00";
        test_sel <= "00";
        FOR i IN 0 TO 3 LOOP
            FOR j IN 0 TO 3 LOOP
                WAIT FOR 10 ns;
                test_ab <= test_ab + "01";
            END LOOP;
            test_sel <= test_sel + "01";
        END LOOP;
    END PROCESS;

PROCESS with a SENSITIVITY LIST
• List of signals to which the process is sensitive.
• Whenever there is an event on any of the signals in the sensitivity list, the process fires.
• Every time the process fires, it will run in its entirety.
• WAIT statements are NOT ALLOWED in a process with a SENSITIVITY LIST.

    label: process (sensitivity list)
        declaration part
    begin
        statement part
    end process;

Generating selected values of one input
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 downto 0);
    BEGIN
        .......
        testing: PROCESS
        BEGIN
            test_vector <= "000";
            WAIT FOR 10 ns;
            test_vector <= "001";
            WAIT FOR 10 ns;
            test_vector <= "010";
            WAIT FOR 10 ns;
            test_vector <= "011";
            WAIT FOR 10 ns;
            test_vector <= "100";
            WAIT FOR 10 ns;
        END PROCESS;
        ........
    END behavioral;

Generating all values of one input
    SIGNAL test_vector : STD_LOGIC_VECTOR(3 downto 0) := "0000";
    BEGIN
        .......
        testing: PROCESS
        BEGIN
            WAIT FOR 10 ns;
            test_vector <= test_vector + 1;
        end process TESTING;
        ........
    END behavioral;

Generating all possible values of two inputs
    SIGNAL test_ab  : STD_LOGIC_VECTOR(1 downto 0);
    SIGNAL test_sel : STD_LOGIC_VECTOR(1 downto 0);
    BEGIN
        .......
        double_loop: PROCESS
        BEGIN
            test_ab  <= "00";
            test_sel <= "00";
            for I in 0 to 3 loop
                for J in 0 to 3 loop
                    wait for 10 ns;
                    test_ab <= test_ab + 1;
                end loop;
                test_sel <= test_sel + 1;
            end loop;
        END PROCESS;
        ........
    END behavioral;
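The stimulus fragments above rely on "+" for STD_LOGIC_VECTOR, which comes from a package such as std_logic_unsigned. As a complete, self-contained variant, the sketch below swaps in a different technique: it derives each test value from the loop index using ieee.numeric_std. The entity name stim_tb is hypothetical.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.numeric_std.all ;   -- assumption: numeric_std is used instead of std_logic_unsigned

-- Hypothetical testbench skeleton: applies all 8 values of a 3-bit input once.
ENTITY stim_tb IS
END stim_tb ;

ARCHITECTURE behavioral OF stim_tb IS
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0) := "000" ;
BEGIN
    testing: PROCESS
    BEGIN
        FOR i IN 0 TO 7 LOOP
            -- Convert the loop index to a 3-bit vector.
            test_vector <= STD_LOGIC_VECTOR(TO_UNSIGNED(i, 3)) ;
            WAIT FOR 10 ns ;
        END LOOP ;
        WAIT ;   -- all combinations applied; suspend forever
    END PROCESS ;
END behavioral ;
```

Deriving the value from the index avoids reading the signal back, so the stimulus does not depend on the signal's initial value.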
Generating periodical signals, such as clocks
    CONSTANT clk1_period : TIME := 20 ns;
    CONSTANT clk2_period : TIME := 200 ns;
    SIGNAL clk1 : STD_LOGIC;
    SIGNAL clk2 : STD_LOGIC := '0';
    BEGIN
        .......
        clk1_generator: PROCESS
        BEGIN
            clk1 <= '0';
            WAIT FOR clk1_period/2;
            clk1 <= '1';
            WAIT FOR clk1_period/2;
        END PROCESS;

        clk2 <= not clk2 after clk2_period/2;
        .......
    END behavioral;

Generating one-time signals, such as resets
    CONSTANT reset1_width : TIME := 100 ns;
    CONSTANT reset2_width : TIME := 150 ns;
    SIGNAL reset1 : STD_LOGIC;
    SIGNAL reset2 : STD_LOGIC := '1';
    BEGIN
        .......
        reset1_generator: PROCESS
        BEGIN
            reset1 <= '1';
            WAIT FOR reset1_width;
            reset1 <= '0';
            WAIT;
        END PROCESS;

        reset2_generator: PROCESS
        BEGIN
            WAIT FOR reset2_width;
            reset2 <= '0';
            WAIT;
        END PROCESS;
        .......
    END behavioral;

Typical error
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 downto 0);
    SIGNAL reset : STD_LOGIC;
    BEGIN
        .......
        generator1: PROCESS
        BEGIN
            reset <= '1';
            WAIT FOR 100 ns;
            reset <= '0';
            test_vector <= "000";
            WAIT;
        END PROCESS;

        generator2: PROCESS
        BEGIN
            WAIT FOR 200 ns;
            test_vector <= "001";
            WAIT FOR 600 ns;
            test_vector <= "011";
        END PROCESS;
        .......
    END behavioral;
Note: test_vector is assigned in both generator1 and generator2, which gives the signal two drivers.

Register Transfer Level (RTL) Design Description
[Figure: blocks of combinational logic separated by registers: Combinational Logic, Registers, Combinational Logic, …]

Component Equivalent of a Process
    priority: PROCESS (clk)
    BEGIN
        IF w(3) = '1' THEN
            y <= "11";
        ELSIF w(2) = '1' THEN
            y <= "10";
        ELSIF w(1) = c THEN
            y <= a and b;
        ELSE
            z <= "00";
        END IF;
    END PROCESS;
[Figure: block named priority with inputs clk, w, a, b, c and outputs y, z]
• All signals which appear on the left of a signal assignment statement (<=) are outputs, e.g. y, z
• All signals which appear on the right of a signal assignment statement (<=) or in logic expressions are inputs, e.g. w, a, b, c
• All signals which appear in the sensitivity list are inputs, e.g. clk
• Note that not all inputs need to be included in the sensitivity list
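One way to avoid the situation in the "Typical error" slide above is to give each signal exactly one driving process. The sketch below assumes that the intended error is the double assignment of test_vector; the entity name one_driver_tb and the process labels are hypothetical.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical testbench skeleton: one process per generated signal.
ENTITY one_driver_tb IS
END one_driver_tb ;

ARCHITECTURE behavioral OF one_driver_tb IS
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0) ;
    SIGNAL reset       : STD_LOGIC ;
BEGIN
    -- Only this process assigns reset.
    reset_gen: PROCESS
    BEGIN
        reset <= '1' ;
        WAIT FOR 100 ns ;
        reset <= '0' ;
        WAIT ;
    END PROCESS ;

    -- Only this process assigns test_vector.
    vector_gen: PROCESS
    BEGIN
        test_vector <= "000" ;
        WAIT FOR 200 ns ;
        test_vector <= "001" ;
        WAIT FOR 600 ns ;
        test_vector <= "011" ;
        WAIT ;   -- one-time waveform: hold the last value
    END PROCESS ;
END behavioral ;
```

The timing values are copied from the slide; whether the final WAIT matches the author's intent for the second process is an assumption.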
Processes in VHDL
• Processes describe sequential behavior
• Processes in VHDL are very powerful statements
  – They allow you to define an arbitrary behavior that may be difficult to represent by a real circuit
  – Not every process can be synthesized
• Use processes with caution in the code to be synthesized
• Use processes freely in testbenches and algorithm specifications

D latch
Graphical symbol [Figure: block with inputs D, Clock and output Q]
Truth table
    Clock D | Q(t+1)
    0     – | Q(t)
    1     0 | 0
    1     1 | 1
Timing diagram [Figure: waveforms of Clock, D, Q over time, with instants t1 … t4]

D flip-flop
Graphical symbol [Figure: block with inputs D, Clock and output Q]
Truth table
    Clk         D | Q(t+1)
    rising edge 0 | 0
    rising edge 1 | 1
    0           – | Q(t)
    1           – | Q(t)
Timing diagram [Figure: waveforms of Clock, D, Q over time, with instants t1 … t4]

D latch
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY latch IS
        PORT ( D, Clock : IN  STD_LOGIC;
               Q        : OUT STD_LOGIC );
    END latch;

    ARCHITECTURE Behavior OF latch IS
    BEGIN
        PROCESS ( D, Clock )
        BEGIN
            IF Clock = '1' THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior;

D flip-flop (1)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY flipflop IS
        PORT ( D, Clock : IN  STD_LOGIC;
               Q        : OUT STD_LOGIC );
    END flipflop;

    ARCHITECTURE Behavior_1 OF flipflop IS
    BEGIN
        PROCESS ( Clock )
        BEGIN
            IF Clock'EVENT AND Clock = '1' THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior_1;

D flip-flop (2)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY flipflop IS
        PORT ( D, Clock : IN  STD_LOGIC;
               Q        : OUT STD_LOGIC );
    END flipflop;

    ARCHITECTURE Behavior_2 OF flipflop IS
    BEGIN
        PROCESS
        BEGIN
            WAIT UNTIL Clock'EVENT AND Clock = '1';
            Q <= D;
        END PROCESS;
    END Behavior_2;

D flip-flop (3)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY flipflop IS
        PORT ( D, Clock : IN  STD_LOGIC;
               Q        : OUT STD_LOGIC );
    END flipflop;

    ARCHITECTURE Behavior_1 OF flipflop IS
    BEGIN
        PROCESS ( Clock )
        BEGIN
            IF rising_edge(Clock) THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior_1;
D flip-flop (4)
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY flipflop IS
        PORT ( D, Clock : IN  STD_LOGIC;
               Q        : OUT STD_LOGIC );
    END flipflop;

    ARCHITECTURE Behavior_2 OF flipflop IS
    BEGIN
        PROCESS
        BEGIN
            WAIT UNTIL rising_edge(Clock);
            Q <= D;
        END PROCESS;
    END Behavior_2;

D flip-flop with asynchronous reset
[Figure: block with inputs D, Clock, Resetn and output Q]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY flipflop IS
        PORT ( D, Resetn, Clock : IN  STD_LOGIC;
               Q                : OUT STD_LOGIC );
    END flipflop;

    ARCHITECTURE Behavior OF flipflop IS
    BEGIN
        PROCESS ( Resetn, Clock )
        BEGIN
            IF Resetn = '0' THEN
                Q <= '0';
            ELSIF Clock'EVENT AND Clock = '1' THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior;

D flip-flop with synchronous reset
[Figure: block with inputs D, Clock, Resetn and output Q]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY flipflop IS
        PORT ( D, Resetn, Clock : IN  STD_LOGIC;
               Q                : OUT STD_LOGIC );
    END flipflop;

    ARCHITECTURE Behavior OF flipflop IS
    BEGIN
        PROCESS
        BEGIN
            WAIT UNTIL Clock'EVENT AND Clock = '1';
            IF Resetn = '0' THEN
                Q <= '0';
            ELSE
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior;

8-bit register with asynchronous reset
[Figure: block reg8 with 8-bit input D, 8-bit output Q, and inputs Resetn and Clock]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY reg8 IS
        PORT ( D             : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
               Resetn, Clock : IN  STD_LOGIC;
               Q             : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) );
    END reg8;

    ARCHITECTURE Behavior OF reg8 IS
    BEGIN
        PROCESS ( Resetn, Clock )
        BEGIN
            IF Resetn = '0' THEN
                Q <= "00000000";
            ELSIF Clock'EVENT AND Clock = '1' THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior;

N-bit register with asynchronous reset
[Figure: block regn with N-bit input D, N-bit output Q, and inputs Resetn and Clock]
    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY regn IS
        GENERIC ( N : INTEGER := 16 );
        PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0);
               Resetn, Clock : IN  STD_LOGIC;
               Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) );
    END regn;

    ARCHITECTURE Behavior OF regn IS
    BEGIN
        PROCESS ( Resetn, Clock )
        BEGIN
            IF Resetn = '0' THEN
                Q <= (OTHERS => '0');
            ELSIF Clock'EVENT AND Clock = '1' THEN
                Q <= D;
            END IF;
        END PROCESS;
    END Behavior;
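The asynchronous-reset pattern above and a clock enable can be combined in a single process. The following is a minimal sketch under that assumption; the entity name regne is hypothetical and the code is not taken from the slides.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical N-bit register with asynchronous active-low reset and clock enable.
ENTITY regne IS
    GENERIC ( N : INTEGER := 8 ) ;
    PORT ( D                     : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
           Resetn, Enable, Clock : IN  STD_LOGIC ;
           Q                     : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
END regne ;

ARCHITECTURE Behavior OF regne IS
BEGIN
    PROCESS ( Resetn, Clock )
    BEGIN
        IF Resetn = '0' THEN
            -- Asynchronous branch: takes effect regardless of the clock.
            Q <= (OTHERS => '0') ;
        ELSIF rising_edge(Clock) THEN
            -- Synchronous branch: Q changes only when Enable is high.
            IF Enable = '1' THEN
                Q <= D ;
            END IF ;
        END IF ;
    END PROCESS ;
END Behavior ;
```

Enable is deliberately left out of the sensitivity list because it is sampled only at the clock edge.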
N-bit register with asynchronous reset
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY regn IS
        GENERIC ( N : INTEGER := 16 ) ;
        PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
               Resetn, Clock : IN  STD_LOGIC ;
               Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
    END regn ;

    ARCHITECTURE Behavior OF regn IS
    BEGIN
        PROCESS ( Resetn, Clock )
        BEGIN
            IF Resetn = '0' THEN
                Q <= (OTHERS => '0') ;
            ELSIF Clock'EVENT AND Clock = '1' THEN
                Q <= D ;
            END IF ;
        END PROCESS ;
    END Behavior ;
[Figure: graphical symbol of regn with N-bit D and Q, Resetn and Clock]

N-bit register with enable
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY regn IS
        GENERIC ( N : INTEGER := 8 ) ;
        PORT ( D             : IN  STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
               Enable, Clock : IN  STD_LOGIC ;
               Q             : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
    END regn ;

    ARCHITECTURE Behavior OF regn IS
    BEGIN
        PROCESS ( Clock )
        BEGIN
            IF (Clock'EVENT AND Clock = '1') THEN
                IF Enable = '1' THEN
                    Q <= D ;
                END IF ;
            END IF ;
        END PROCESS ;
    END Behavior ;
[Figure: graphical symbol of regn with N-bit D and Q, Enable and Clock]

2-bit up-counter with synchronous reset
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    USE ieee.std_logic_unsigned.all ;

    ENTITY upcount IS
        PORT ( Clear, Clock : IN     STD_LOGIC ;
               Q            : BUFFER STD_LOGIC_VECTOR(1 DOWNTO 0) ) ;
    END upcount ;

    ARCHITECTURE Behavior OF upcount IS
    BEGIN
        upcount: PROCESS ( Clock )
        BEGIN
            IF (Clock'EVENT AND Clock = '1') THEN
                IF Clear = '1' THEN
                    Q <= "00" ;
                ELSE
                    Q <= Q + "01" ;
                END IF ;
            END IF ;
        END PROCESS ;
    END Behavior ;
[Figure: graphical symbol of upcount with Clear, Clock and 2-bit Q]

4-bit up-counter with asynchronous reset (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    USE ieee.std_logic_unsigned.all ;

    ENTITY upcount IS
        PORT ( Clock, Resetn, Enable : IN  STD_LOGIC ;
               Q                     : OUT STD_LOGIC_VECTOR (3 DOWNTO 0) ) ;
    END upcount ;
[Figure: graphical symbol of upcount with Enable, Clock, Resetn and 4-bit Q]

4-bit up-counter with asynchronous reset (2)
    ARCHITECTURE Behavior OF upcount IS
        SIGNAL Count : STD_LOGIC_VECTOR (3 DOWNTO 0) ;
    BEGIN
        PROCESS ( Clock, Resetn )
        BEGIN
            IF Resetn = '0' THEN
                Count <= "0000" ;
            ELSIF (Clock'EVENT AND Clock = '1') THEN
                IF Enable = '1' THEN
                    Count <= Count + 1 ;
                END IF ;
            END IF ;
        END PROCESS ;
        Q <= Count ;
    END Behavior ;

Shift register
[Figure: four D flip-flops in series; Sin feeds the first stage, the stage outputs are Q(3), Q(2), Q(1), Q(0); Clock and Enable are common to all stages]
Shift Register With Parallel Load
[Figure: four D flip-flops with parallel inputs D(3), D(2), D(1), D(0), serial input Sin, control inputs Load and Enable, common Clock, and outputs Q(3), Q(2), Q(1), Q(0)]

4-bit shift register with parallel load (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY shift4 IS
        PORT ( D      : IN     STD_LOGIC_VECTOR(3 DOWNTO 0) ;
               Enable : IN     STD_LOGIC ;
               Load   : IN     STD_LOGIC ;
               Sin    : IN     STD_LOGIC ;
               Clock  : IN     STD_LOGIC ;
               Q      : BUFFER STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
    END shift4 ;
[Figure: graphical symbol of shift4 with 4-bit D and Q, Enable, Load, Sin and Clock]

4-bit shift register with parallel load (2)
    ARCHITECTURE Behavior_1 OF shift4 IS
    BEGIN
        PROCESS ( Clock )
        BEGIN
            IF Clock'EVENT AND Clock = '1' THEN
                IF Load = '1' THEN
                    Q <= D ;
                ELSIF Enable = '1' THEN
                    Q(0) <= Q(1) ;
                    Q(1) <= Q(2) ;
                    Q(2) <= Q(3) ;
                    Q(3) <= Sin ;
                END IF ;
            END IF ;
        END PROCESS ;
    END Behavior_1 ;

N-bit shift register with parallel load (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY shiftn IS
        GENERIC ( N : INTEGER := 8 ) ;
        PORT ( D      : IN     STD_LOGIC_VECTOR(N-1 DOWNTO 0) ;
               Enable : IN     STD_LOGIC ;
               Load   : IN     STD_LOGIC ;
               Sin    : IN     STD_LOGIC ;
               Clock  : IN     STD_LOGIC ;
               Q      : BUFFER STD_LOGIC_VECTOR(N-1 DOWNTO 0) ) ;
    END shiftn ;
[Figure: graphical symbol of shiftn with N-bit D and Q, Enable, Load, Sin and Clock]

N-bit shift register with parallel load (2)
    ARCHITECTURE Behavior OF shiftn IS
    BEGIN
        PROCESS ( Clock )
        BEGIN
            IF (Clock'EVENT AND Clock = '1') THEN
                IF Load = '1' THEN
                    Q <= D ;
                ELSIF Enable = '1' THEN
                    Genbits: FOR i IN 0 TO N-2 LOOP
                        Q(i) <= Q(i+1) ;
                    END LOOP ;
                    Q(N-1) <= Sin ;
                END IF ;
            END IF ;
        END PROCESS ;
    END Behavior ;

Variable – Example (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY Numbits IS
        PORT ( X     : IN  STD_LOGIC_VECTOR(1 TO 3) ;
               Count : OUT INTEGER RANGE 0 TO 3 ) ;
    END Numbits ;

Variable – Example (2)
    ARCHITECTURE Behavior OF Numbits IS
    BEGIN
        PROCESS ( X )   -- count the number of bits in X equal to 1
            VARIABLE Tmp : INTEGER ;
        BEGIN
            Tmp := 0 ;
            FOR i IN 1 TO 3 LOOP
                IF X(i) = '1' THEN
                    Tmp := Tmp + 1 ;
                END IF ;
            END LOOP ;
            Count <= Tmp ;
        END PROCESS ;
    END Behavior ;
Variables – features
• Can only be declared within processes and subprograms (functions & procedures)
• Initial value can be explicitly specified in the declaration
• When assigned, take the assigned value immediately
• Variable assignments represent the desired behavior, not the structure of the circuit
• Should be avoided, or at least used with caution, in synthesizable code

Delays
Delays are not synthesizable. Statements such as
    wait for 5 ns ;
    a <= b after 10 ns ;
will not produce the required delay, and should not be used in the code intended for synthesis.

Initializations
Declarations of signals (and variables) with initialized values, such as
    SIGNAL a : STD_LOGIC := '0' ;
cannot be synthesized, and thus should be avoided. If present, they will be ignored by the synthesis tools. Use set and reset signals instead.

Reports and asserts
Reports and asserts, such as
    report "Initialization complete" ;
    assert initial_value <= max_value
        report "initial value too large"
        severity error ;
cannot be synthesized, but they can be freely used in the code intended for synthesis. They will be used during simulation and ignored during synthesis.

Floating-point operations
Operations on signals (and variables) of the type real are not synthesizable by the current generation of synthesis tools.

Records – Examples (1)
    -- "and" and "or" are reserved words in VHDL, so the last two
    -- literals carry an _op suffix here
    type opcodes is (add, sub, and_op, or_op) ;
    type reg_number is range 0 to 8 ;

    type instruction is record
        opcode       : opcodes ;
        source_reg1  : reg_number ;
        source_reg2  : reg_number ;
        dest_reg     : reg_number ;
        displacement : integer ;
    end record instruction ;

    constant add_instr_1_3 : instruction :=
        (opcode => add,
         source_reg1 | dest_reg => 1,
         source_reg2 => 3,
         displacement => 0) ;

Records – Examples (2)
    type word is record
        instr : instruction ;
        data  : bit_vector(31 downto 0) ;
    end record word ;
2-to-4 Decoder
(a) Truth table
    En  w1  w0 | y0  y1  y2  y3
     1   0   0 |  1   0   0   0
     1   0   1 |  0   1   0   0
     1   1   0 |  0   0   1   0
     1   1   1 |  0   0   0   1
     0   x   x |  0   0   0   0
(b) Graphical symbol
[Figure: block with inputs w0, w1, En and outputs y0, y1, y2, y3]

Describing combinational logic using processes
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY dec2to4 IS
        PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               En : IN  STD_LOGIC ;
               y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
    END dec2to4 ;

    ARCHITECTURE Behavior OF dec2to4 IS
    BEGIN
        PROCESS ( w, En )
        BEGIN
            IF En = '1' THEN
                CASE w IS
                    WHEN "00"   => y <= "1000" ;
                    WHEN "01"   => y <= "0100" ;
                    WHEN "10"   => y <= "0010" ;
                    WHEN OTHERS => y <= "0001" ;
                END CASE ;
            ELSE
                y <= "0000" ;
            END IF ;
        END PROCESS ;
    END Behavior ;

Describing combinational logic using processes
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY seg7 IS
        PORT ( bcd  : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
               leds : OUT STD_LOGIC_VECTOR(1 TO 7) ) ;
    END seg7 ;

    ARCHITECTURE Behavior OF seg7 IS
    BEGIN
        PROCESS ( bcd )
        BEGIN
            CASE bcd IS               --         abcdefg
                WHEN "0000" => leds <= "1111110" ;
                WHEN "0001" => leds <= "0110000" ;
                WHEN "0010" => leds <= "1101101" ;
                WHEN "0011" => leds <= "1111001" ;
                WHEN "0100" => leds <= "0110011" ;
                WHEN "0101" => leds <= "1011011" ;
                WHEN "0110" => leds <= "1011111" ;
                WHEN "0111" => leds <= "1110000" ;
                WHEN "1000" => leds <= "1111111" ;
                WHEN "1001" => leds <= "1110011" ;
                WHEN OTHERS => leds <= "-------" ;
            END CASE ;
        END PROCESS ;
    END Behavior ;

Describing combinational logic using processes
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY compare1 IS
        PORT ( A, B : IN  STD_LOGIC ;
               AeqB : OUT STD_LOGIC ) ;
    END compare1 ;

    ARCHITECTURE Behavior OF compare1 IS
    BEGIN
        PROCESS ( A, B )
        BEGIN
            AeqB <= '0' ;
            IF A = B THEN
                AeqB <= '1' ;
            END IF ;
        END PROCESS ;
    END Behavior ;

Incorrect code for combinational logic – Implied latch (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    ENTITY implied IS
        PORT ( A, B : IN  STD_LOGIC ;
               AeqB : OUT STD_LOGIC ) ;
    END implied ;

    ARCHITECTURE Behavior OF implied IS
    BEGIN
        PROCESS ( A, B )
        BEGIN
            IF A = B THEN
                AeqB <= '1' ;
            END IF ;
        END PROCESS ;
    END Behavior ;

Incorrect code for combinational logic – Implied latch (2)
[Figure: circuit implied by the code, with inputs A and B and output AeqB]
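The effect of the missing ELSE branch can be checked in simulation. The sketch below repeats the implied entity from the slide and adds a hypothetical testbench (implied_tb, the architecture name sim and the 10 ns steps are assumptions for illustration). It shows that once AeqB has been set to '1' it never returns to '0', which is exactly the memory behavior of a latch.

```vhdl
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- The incomplete IF from the "Implied latch" slide, repeated so the
-- example is self-contained.
ENTITY implied IS
    PORT ( A, B : IN  STD_LOGIC ;
           AeqB : OUT STD_LOGIC ) ;
END implied ;

ARCHITECTURE Behavior OF implied IS
BEGIN
    PROCESS ( A, B )
    BEGIN
        IF A = B THEN
            AeqB <= '1' ;
        END IF ;   -- no ELSE: AeqB keeps its old value when A /= B
    END PROCESS ;
END Behavior ;

LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

-- Hypothetical testbench: names and timing are illustrative only.
ENTITY implied_tb IS
END implied_tb ;

ARCHITECTURE sim OF implied_tb IS
    SIGNAL A, B : STD_LOGIC := '0' ;
    SIGNAL AeqB : STD_LOGIC ;
BEGIN
    dut : ENTITY work.implied(Behavior)
        PORT MAP ( A => A, B => B, AeqB => AeqB ) ;

    stimulus : PROCESS
    BEGIN
        WAIT FOR 10 ns ;
        ASSERT AeqB = '1'
            REPORT "expected '1' while A = B" SEVERITY error ;

        A <= '1' ;               -- now A /= B
        WAIT FOR 10 ns ;
        -- A true comparator would output '0' here; the implied latch
        -- still holds the previous '1'.
        ASSERT AeqB = '1'
            REPORT "AeqB did not hold its value" SEVERITY error ;

        WAIT ;
    END PROCESS ;
END sim ;
```

Replacing the architecture with the compare1 version, which assigns a default of '0' before the IF, would make the second assertion fail, because the output then follows the inputs as combinational logic should.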
Describing combinational logic using processes
Rules that need to be followed:
1. All inputs to the combinational circuit should be included in the sensitivity list
2. No other signals should be included in the sensitivity list
3. None of the statements within the process should be sensitive to rising or falling edges
4. All possible cases need to be covered in the internal IF and CASE statements in order to avoid implied latches

Covering all cases in the IF statement
Using ELSE
    IF A = B THEN
        AeqB <= '1' ;
    ELSE
        AeqB <= '0' ;
    END IF ;
Using default values
    AeqB <= '0' ;
    IF A = B THEN
        AeqB <= '1' ;
    END IF ;

Covering all cases in the CASE statement
Using WHEN OTHERS
    CASE y IS
        WHEN S1     => Z <= "10" ;
        WHEN S2     => Z <= "01" ;
        WHEN OTHERS => Z <= "00" ;
    END CASE ;

    CASE y IS
        WHEN S1     => Z <= "10" ;
        WHEN S2     => Z <= "01" ;
        WHEN S3     => Z <= "00" ;
        WHEN OTHERS => Z <= "--" ;
    END CASE ;
Using default values
    Z <= "00" ;
    CASE y IS
        WHEN S1     => Z <= "10" ;
        WHEN S2     => Z <= "10" ;
        WHEN OTHERS => NULL ;   -- remaining choices keep the default value
    END CASE ;

One-dimensional arrays – Examples (1)
    type word_asc is array(0 to 31) of std_logic ;
    type word_desc is array(31 downto 0) of std_logic ;
    ...
    signal buffer_register : word_desc ;
    ...
    buffer_register(6) <= '1' ;
    ...
    variable tmp : word_asc ;
    ...
    tmp(5) := '0' ;

One-dimensional arrays – Examples (2)
    type controller_state is (initial, idle, active, error) ;
    type state_counts_imp is array(idle to error) of natural ;
    type state_counts_exp is array(controller_state range idle to error) of natural ;
    type state_counts_full is array(controller_state) of natural ;
    ...
    variable counters : state_counts_exp ;
    ...
    counters(active) := 0 ;
    ...
    counters(active) := counters(active) + 1 ;

Predefined Unconstrained Array Types
Predefined:
    bit_vector          array of bits
    string              array of characters
Defined in the ieee.std_logic_1164 package:
    std_logic_vector    array of std_logic
Predefined Unconstrained Array Types
    subtype byte is bit_vector(7 downto 0) ;
    ...
    variable channel_busy : bit_vector(1 to 4) ;
    ...
    constant ready_message : string := "ready" ;
    ...
    signal memory_bus : std_logic_vector (31 downto 0) ;

User-defined Unconstrained Array Types
    type sample is array (natural range <>) of integer ;
    ...
    variable long_sample : sample(0 to 255) ;
    ...
    constant look_up_table_1 : sample := (127, -45, 63, 23, 76) ;

Array Attributes
    A'left(N)            left bound of index range of dimension N of A
    A'right(N)           right bound of index range of dimension N of A
    A'low(N)             lower bound of index range of dimension N of A
    A'high(N)            upper bound of index range of dimension N of A
    A'range(N)           index range of dimension N of A
    A'reverse_range(N)   reversed index range of dimension N of A
    A'length(N)          length of index range of dimension N of A
    A'ascending(N)       true if index range of dimension N of A is an ascending range, false otherwise

Array Attributes – Examples
    type A is array (1 to 4, 31 downto 0) ;

    A'left(1)      = 1
    A'right(2)     = 0
    A'low(1)       = 1
    A'high(2)      = 31
    A'range(1)     = 1 to 4
    A'length(2)    = 32
    A'ascending(2) = false

Subprograms
• Include functions and procedures
• Commonly used pieces of code
• Can be placed in a library, and then reused and shared among various projects
• Abstract operations that are repeatedly performed
• Type conversions
• Use only sequential statements, the same as processes

Typical locations of subprograms
• PACKAGE / PACKAGE BODY (in a LIBRARY): global FUNCTION / PROCEDURE
• ENTITY: local for all architectures of a given entity
• ARCHITECTURE declarative part: local for a given architecture
Functions – basic features
Functions
• always return a single value as a result
• are called using formal and actual parameters the same way as components
• never modify parameters passed to them
• parameters can only be constants (including generics) and signals (including ports); variables are not allowed; the default is a CONSTANT
• when passing parameters, no range specification should be included (for example no RANGE for INTEGERS, or TO/DOWNTO for STD_LOGIC_VECTOR)
• are always used in some expression, and not called on their own

Function syntax
    FUNCTION function_name (<parameter_list>) RETURN data_type IS
        [declarations]
    BEGIN
        (sequential statements)
    END function_name ;

Function parameters – example
    FUNCTION f1 (a, b : INTEGER ; SIGNAL c : STD_LOGIC_VECTOR) RETURN BOOLEAN IS
    BEGIN
        (sequential statements)
    END f1 ;

Function calls – examples
    x <= conv_integer(a) ;
    IF x > maximum(a, b) THEN ...
    WHILE minimum(a, b) LOOP ...

Function – Example 1
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;

    PACKAGE my_package IS
        FUNCTION log2_ceil (CONSTANT s : INTEGER) RETURN INTEGER ;
    END my_package ;

    PACKAGE BODY my_package IS
        FUNCTION log2_ceil (CONSTANT s : INTEGER) RETURN INTEGER IS
            VARIABLE m, n : INTEGER ;
        BEGIN
            m := 0 ;
            n := 1 ;
            WHILE (n < s) LOOP
                m := m + 1 ;
                n := n * 2 ;
            END LOOP ;
            RETURN m ;
        END log2_ceil ;
    END my_package ;

Function call – Example 1
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    USE ieee.std_logic_arith.all ;      -- provides conv_std_logic_vector
    USE ieee.std_logic_unsigned.all ;
    USE work.my_package.all ;

    ENTITY log2_int IS
        GENERIC ( m : INTEGER := 20 ) ;
        PORT ( x : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
               y : OUT STD_LOGIC_VECTOR(7 DOWNTO 0) ) ;
    END log2_int ;

    ARCHITECTURE log2_int OF log2_int IS
        CONSTANT l2m : INTEGER := log2_ceil (m) ;
        SIGNAL r : STD_LOGIC_VECTOR(3 DOWNTO 0) ;
    BEGIN
        r <= conv_std_logic_vector(l2m, 4) ;
        y <= x * r ;
    END log2_int ;

Function – Example 2
    library IEEE ;
    use IEEE.std_logic_1164.all ;

    ENTITY powerOfFour IS
        PORT ( X : IN  INTEGER ;
               Y : OUT INTEGER ) ;
    END powerOfFour ;

    ARCHITECTURE behavioral OF powerOfFour IS
        FUNCTION Pow ( SIGNAL N : INTEGER ; Exp : INTEGER ) RETURN INTEGER IS
            VARIABLE Result : INTEGER := 1 ;
        BEGIN
            FOR i IN 1 TO Exp LOOP
                Result := Result * N ;
            END LOOP ;
            RETURN( Result ) ;
        END Pow ;
    BEGIN
        Y <= Pow(X, 4) ;
    END behavioral ;
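Because a function is always used inside an expression, it can be checked directly with assertions. The following is a minimal sketch that repeats the log2_ceil function of Example 1 and evaluates it for a few arguments; the testbench entity log2_ceil_tb and the chosen arguments are assumptions made for illustration.

```vhdl
-- Package from "Function – Example 1", repeated so the example is
-- self-contained.
PACKAGE my_package IS
    FUNCTION log2_ceil (CONSTANT s : INTEGER) RETURN INTEGER ;
END my_package ;

PACKAGE BODY my_package IS
    FUNCTION log2_ceil (CONSTANT s : INTEGER) RETURN INTEGER IS
        VARIABLE m, n : INTEGER ;
    BEGIN
        m := 0 ;
        n := 1 ;
        WHILE (n < s) LOOP      -- double n until it reaches s
            m := m + 1 ;
            n := n * 2 ;
        END LOOP ;
        RETURN m ;              -- number of doublings = ceil(log2(s))
    END log2_ceil ;
END my_package ;

USE work.my_package.all ;

-- Hypothetical testbench: the entity name is illustrative only.
ENTITY log2_ceil_tb IS
END log2_ceil_tb ;

ARCHITECTURE sim OF log2_ceil_tb IS
BEGIN
    PROCESS
    BEGIN
        ASSERT log2_ceil(1)  = 0 REPORT "log2_ceil(1)"  SEVERITY error ;
        ASSERT log2_ceil(8)  = 3 REPORT "log2_ceil(8)"  SEVERITY error ;
        ASSERT log2_ceil(9)  = 4 REPORT "log2_ceil(9)"  SEVERITY error ;
        -- m = 20 is the default generic of log2_int, so l2m = 5 there
        ASSERT log2_ceil(20) = 5 REPORT "log2_ceil(20)" SEVERITY error ;
        WAIT ;
    END PROCESS ;
END sim ;
```

With the default generic m = 20, the constant l2m in the log2_int architecture therefore evaluates to 5, and r carries the 4-bit value "0101".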
Package containing a function (1)
    LIBRARY IEEE ;
    USE IEEE.std_logic_1164.all ;

    PACKAGE specialFunctions IS
        FUNCTION Pow ( SIGNAL N : INTEGER ; Exp : INTEGER ) RETURN INTEGER ;
    END specialFunctions ;

Package containing a function (2)
    PACKAGE BODY specialFunctions IS
        FUNCTION Pow ( SIGNAL N : INTEGER ; Exp : INTEGER ) RETURN INTEGER IS
            VARIABLE Result : INTEGER := 1 ;
        BEGIN
            FOR i IN 1 TO Exp LOOP
                Result := Result * N ;
            END LOOP ;
            RETURN( Result ) ;
        END Pow ;
    END specialFunctions ;

Type conversion function (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    ---------------------------------------------------------
    PACKAGE my_package IS
        FUNCTION conv_integer (SIGNAL vector : STD_LOGIC_VECTOR) RETURN INTEGER ;
    END my_package ;
    ---------------------------------------------------------

Type conversion function (2)
    PACKAGE BODY my_package IS
        FUNCTION conv_integer (SIGNAL vector : STD_LOGIC_VECTOR) RETURN INTEGER IS
            VARIABLE result : INTEGER RANGE 0 TO 2**vector'LENGTH - 1 ;
        BEGIN
            -- start from the most significant bit
            IF (vector(vector'HIGH) = '1') THEN
                result := 1 ;
            ELSE
                result := 0 ;
            END IF ;
            -- shift in the remaining bits, one per iteration
            FOR i IN (vector'HIGH - 1) DOWNTO (vector'LOW) LOOP
                result := result * 2 ;
                IF (vector(i) = '1') THEN
                    result := result + 1 ;
                END IF ;
            END LOOP ;
            RETURN result ;
        END conv_integer ;
    END my_package ;

Type conversion function (3)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    USE work.my_package.all ;
    ---------------------------------------------------------
    ENTITY conv_int2 IS
        PORT ( a : IN  STD_LOGIC_VECTOR (3 DOWNTO 0) ;
               y : OUT INTEGER RANGE 0 TO 15 ) ;
    END conv_int2 ;
    ---------------------------------------------------------
    ARCHITECTURE my_arch OF conv_int2 IS
    BEGIN
        y <= conv_integer(a) ;
    END my_arch ;
Procedures – basic features
Procedures
• do not return a value
• are called using formal and actual parameters the same way as components
• may modify parameters passed to them
• each parameter must have a mode: IN, OUT, INOUT
• parameters can be constants (including generics), signals (including ports), and variables; the default for inputs (mode in) is a constant, the default for outputs (modes out and inout) is a variable
• when passing parameters, range specification should be included (for example RANGE for INTEGERS, and TO/DOWNTO for STD_LOGIC_VECTOR)
• procedure calls are statements on their own

Procedure syntax
    PROCEDURE procedure_name (<parameter_list>) IS
        [declarations]
    BEGIN
        (sequential statements)
    END procedure_name ;

Procedure parameters – example
    FUNCTION f1 (a, b : INTEGER ; SIGNAL c : STD_LOGIC_VECTOR) RETURN BOOLEAN IS
    BEGIN
        (sequential statements)
    END f1 ;

Procedure calls – examples
    compute_min_max(in1, in2, in3, out1, out2) ;
    divide(dividend, divisor, quotient, remainder) ;
    IF (a > b) THEN
        compute_min_max(in1, in2, in3, out1, out2) ;
    ...

Procedure – example (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    USE work.decProcs.all ;

    ENTITY decoder IS
        PORT ( decIn  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               decOut : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) ;
    END decoder ;

Procedure – example (2)
    ARCHITECTURE simple OF decoder IS
        -- decode is declared as a SIGNAL parameter so that the port
        -- decOut can be passed in the concurrent call below
        PROCEDURE DEC2x4 ( inputs        : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
                           SIGNAL decode : OUT STD_LOGIC_VECTOR(3 DOWNTO 0) ) IS
        BEGIN
            CASE inputs IS
                WHEN "11"   => decode <= "1000" ;
                WHEN "10"   => decode <= "0100" ;
                WHEN "01"   => decode <= "0010" ;
                WHEN "00"   => decode <= "0001" ;
                WHEN OTHERS => decode <= "0001" ;
            END CASE ;
        END DEC2x4 ;
    BEGIN
        DEC2x4(decIn, decOut) ;
    END simple ;

Operator as a function (1)
    LIBRARY ieee ;
    USE ieee.std_logic_1164.all ;
    ---------------------------------------------------------
    PACKAGE my_package IS
        FUNCTION "+" (a, b : STD_LOGIC_VECTOR) RETURN STD_LOGIC_VECTOR ;
    END my_package ;
    ---------------------------------------------------------
Operator as a function (2)
    PACKAGE BODY my_package IS
        FUNCTION "+" (a, b : STD_LOGIC_VECTOR) RETURN STD_LOGIC_VECTOR IS
            -- a variable must be constrained; the result takes the range of a
            VARIABLE result : STD_LOGIC_VECTOR(a'RANGE) ;
            VARIABLE carry  : STD_LOGIC ;
        BEGIN
            carry := '0' ;
            FOR i IN a'REVERSE_RANGE LOOP
                result(i) := a(i) XOR b(i) XOR carry ;
                carry := (a(i) AND b(i)) OR (a(i) AND carry) OR (b(i) AND carry) ;
            END LOOP ;
            RETURN result ;
        END "+" ;
    END my_package ;

Operator overloading
• Operator overloading allows different argument types for a given operation (function)
• The VHDL tools resolve which of these functions to select based on the types of the inputs
• This selection is transparent to the user as long as the function has been defined for the given argument types

Different declarations for the same operator – Example
Declarations in the package ieee.std_logic_unsigned:
    function "+" ( L : std_logic_vector ; R : std_logic_vector ) return std_logic_vector ;
    function "+" ( L : std_logic_vector ; R : integer ) return std_logic_vector ;
    function "+" ( L : std_logic_vector ; R : std_logic ) return std_logic_vector ;

Different declarations for the same operator – Example
    signal count : std_logic_vector(7 downto 0) ;
You can use:
    count <= count + "00000001" ;
or
    count <= count + 1 ;
or
    count <= count + '1' ;

Notion of type
• Type defines a set of values and a set of applicable operations
• Declaration of a type determines which values can be stored in an object (signal, variable, constant) of a given type
• Every object can only assume values of its nominated type
• Each operation (e.g., and, +, *) includes the types of values to which the operation may be applied, and the type of the result
• The goal of strong typing is the detection of errors at an early stage of the design process
Example of strong typing
    architecture incorrect of example1 is
        type apples is range 0 to 100 ;
        type oranges is range 0 to 100 ;
        signal apple1 : apples ;
        signal orange1 : oranges ;
    begin
        apple1 <= orange1 ;
    end incorrect ;

Integer type
    Name:      integer
    Status:    predefined
    Contents:  all integer numbers representable on a particular host computer, but at least numbers in the range -(2^31 - 1) to 2^31 - 1

User defined integer types – Examples
    type day_of_month is range 0 to 31 ;
    type year is range 0 to 2100 ;
    type set_index_range is range 999 downto 100 ;
    constant number_of_bits : integer := 32 ;
    type bit_index is range 0 to number_of_bits-1 ;
Values of bounds can be expressions, but need to be known when the model is analyzed.

Predefined enumeration types (1)
    boolean    (false, true)
    bit        ('0', '1')
    character  VHDL-87: 128 7-bit ASCII characters
               VHDL-93: 256 ISO 8859 Latin-1 8-bit characters

Predefined enumeration types (2)
    severity_level    (note, warning, error, failure)
Predefined in VHDL-93 only:
    file_open_kind    (read_mode, write_mode, append_mode)
    file_open_status  (open_ok, status_error, name_error, mode_error)

User-defined enumeration types – Examples
    type state is (S0, S1) ;
    type alu_function is (disable, pass, add, subtract, multiply, divide) ;
    type octal_digit is ('0', '1', '2', '3', '4', '5', '6', '7') ;
    type mixed is (lf, cr, ht, '-', '/', '\') ;
Each value in an enumeration type must be either an identifier or a character literal.

Floating point types
• Used to represent real numbers
• Numbers are represented using a significand (mantissa) part and an exponent part
• Conform to the IEEE standard 754 or 854
Minimum size of representation that must be supported by the implementation of the VHDL standard:
    VHDL-2001:          64-bit representation
    VHDL-87, VHDL-93:   32-bit representation
The ANSI/IEEE standard floating-point number representation formats
[Figure: floating-point representation formats]

Real literals – examples
    23.1          = 23.1
    46E5          = 46 × 10^5
    1E+12         = 1 × 10^12
    1.234E09      = 1.234 × 10^9
    34.0e-08      = 34.0 × 10^-8
    2#0.101#E5    = (0.101)_2 × 2^5 = (2^-1 + 2^-3) × 2^5
    8#0.4#E-6     = (0.4)_8 × 8^-6 = (4 × 8^-1) × 8^-6
    16#0.a5#E-8   = (0.a5)_16 × 16^-8 = (10 × 16^-1 + 5 × 16^-2) × 16^-8

User-defined floating-point types – Examples
    type input_level is range -10.0 to +10.0 ;
    type probability is range 0.0 to 1.0 ;
    constant max_output : real := 1.0E6 ;
    constant min_output : real := 1.0E-6 ;
    type output_range is range max_output downto min_output ;

Attributes of all scalar types
    T'left         first (leftmost) value in T
    T'right        last (rightmost) value in T
    T'low          least value in T
    T'high         greatest value in T
Not available in VHDL-87:
    T'ascending    true if T is an ascending range, false otherwise
    T'image(x)     a string representing the value of x
    T'value(s)     the value in T that is represented by s

Attributes of all scalar types – examples
    type index_range is range 21 downto 11 ;

    index_range'left        = 21
    index_range'right       = 11
    index_range'low         = 11
    index_range'high        = 21
    index_range'ascending   = false
    index_range'image(14)   = "14"
    index_range'value("20") = 20

Attributes of discrete types
    T'pos(x)       position number of x in T
    T'val(n)       value in T at position n
    T'succ(x)      value in T at position one greater than position of x
    T'pred(x)      value in T at position one less than position of x
    T'leftof(x)    value in T at position one to the left of x
    T'rightof(x)   value in T at position one to the right of x

Attributes of discrete types – examples
    type logic_level is (unknown, low, undriven, high) ;

    logic_level'pos(unknown)      = 0
    logic_level'val(3)            = high
    logic_level'succ(unknown)     = low
    logic_level'pred(undriven)    = low
    logic_level'leftof(unknown)   error
    logic_level'rightof(undriven) = high
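The attribute values listed above can be confirmed by a simulator. The following is a small sketch that declares the two example types and checks several of the attributes with assertions; the entity name attributes_tb is an assumption made for illustration, and the sketch relies on VHDL-93 or later because of 'ascending, 'image and 'value.

```vhdl
-- Hypothetical self-checking example: the entity name is illustrative only.
ENTITY attributes_tb IS
END attributes_tb ;

ARCHITECTURE sim OF attributes_tb IS
    TYPE index_range IS RANGE 21 DOWNTO 11 ;
    TYPE logic_level IS (unknown, low, undriven, high) ;
BEGIN
    PROCESS
    BEGIN
        -- attributes of all scalar types
        ASSERT index_range'left  = 21 REPORT "'left"  SEVERITY error ;
        ASSERT index_range'right = 11 REPORT "'right" SEVERITY error ;
        ASSERT index_range'low   = 11 REPORT "'low"   SEVERITY error ;
        ASSERT index_range'high  = 21 REPORT "'high"  SEVERITY error ;
        ASSERT NOT index_range'ascending REPORT "'ascending" SEVERITY error ;
        ASSERT index_range'image(14) = "14" REPORT "'image" SEVERITY error ;
        ASSERT index_range'value("20") = 20 REPORT "'value" SEVERITY error ;

        -- attributes of discrete types
        ASSERT logic_level'pos(unknown) = 0      REPORT "'pos"  SEVERITY error ;
        ASSERT logic_level'val(3) = high         REPORT "'val"  SEVERITY error ;
        ASSERT logic_level'succ(unknown) = low   REPORT "'succ" SEVERITY error ;
        ASSERT logic_level'pred(undriven) = low  REPORT "'pred" SEVERITY error ;
        ASSERT logic_level'rightof(undriven) = high
            REPORT "'rightof" SEVERITY error ;
        -- logic_level'leftof(unknown) is left out on purpose: there is
        -- no value to the left of the first literal, so it is an error.
        WAIT ;
    END PROCESS ;
END sim ;
```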
Subtype
• Defines a subset of the values of a base type
• A condition that is used to determine which values are included in the subtype is called a constraint
• All operations that are applicable to the base type also apply to any of its subtypes
• Base type and subtype can be mixed in the operations, but the result must belong to the subtype, otherwise an error is generated

Predefined subtypes
    natural        integers ≥ 0
    positive       integers > 0
Not predefined in VHDL-87:
    delay_length   time ≥ 0

User-defined subtypes – Examples
    subtype bit_index is integer range 31 downto 0 ;
    subtype input_range is real range 1.0E-9 to 1.0E+12 ;

Operators (1)
Operators (2)
Operators (3)
[Figures: slide content not recoverable from the extracted text]

Propagation delay in VHDL – Example
    entity MAJORITY is
        port ( A_IN, B_IN, C_IN : in  STD_LOGIC ;
               Z_OUT            : out STD_LOGIC ) ;
    end MAJORITY ;

    architecture DATA_FLOW of MAJORITY is
    begin
        Z_OUT <= (not A_IN and B_IN and C_IN) or
                 (A_IN and not B_IN and C_IN) or
                 (A_IN and B_IN and not C_IN) or
                 (A_IN and B_IN and C_IN) after 20 ns ;
    end DATA_FLOW ;

Propagation delay – Example
[Figure: slide content not recoverable from the extracted text]

Inertial delay model
Short pulses (spikes) are not passed to the outputs of logic gates due to the inertia of physical systems. Logic gates behave like low-pass filters and effectively filter out high-frequency input changes as if they never occurred.

Inertial delay model – Example
    SIG_OUT <= not SIG_IN after 7 ns ;

VHDL-87 Inertial delay model
Any input signal change that does not persist for at least a propagation delay of the device is not reflected at the output.
    inertial delay (pulse rejection limit) = propagation delay
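The pulse rejection of the inertial model can be observed in simulation. The following is a minimal sketch built around the 7 ns inverter of the example above; the testbench entity inertial_tb, the monitor process, the saw_low flag and the pulse widths are assumptions made for illustration. A 3 ns input pulse is swallowed, while a 10 ns pulse reaches the output.

```vhdl
-- Hypothetical testbench: names and timing are illustrative only.
ENTITY inertial_tb IS
END inertial_tb ;

ARCHITECTURE sim OF inertial_tb IS
    SIGNAL SIG_IN  : BIT := '0' ;
    SIGNAL SIG_OUT : BIT := '1' ;
    SIGNAL saw_low : BOOLEAN := FALSE ;   -- set once SIG_OUT has gone to '0'
BEGIN
    -- Inverter from the slide; the delay is inertial by default.
    SIG_OUT <= NOT SIG_IN AFTER 7 ns ;

    monitor : PROCESS
    BEGIN
        WAIT UNTIL SIG_OUT = '0' ;
        saw_low <= TRUE ;
        WAIT ;
    END PROCESS ;

    stimulus : PROCESS
    BEGIN
        WAIT FOR 20 ns ;
        SIG_IN <= '1' ;          -- 3 ns pulse, shorter than the 7 ns delay
        WAIT FOR 3 ns ;
        SIG_IN <= '0' ;
        WAIT FOR 17 ns ;         -- simulation time is now 40 ns
        ASSERT NOT saw_low
            REPORT "short pulse was not rejected" SEVERITY error ;

        SIG_IN <= '1' ;          -- 10 ns pulse, longer than the 7 ns delay
        WAIT FOR 10 ns ;
        SIG_IN <= '0' ;
        WAIT FOR 10 ns ;         -- simulation time is now 60 ns
        ASSERT saw_low
            REPORT "long pulse did not reach the output" SEVERITY error ;
        ASSERT SIG_OUT = '1'
            REPORT "output did not return to '1'" SEVERITY error ;
        WAIT ;
    END PROCESS ;
END sim ;
```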
VHDL-93 Enhanced inertial delay model
VHDL-93 allows the inertial delay model to be declared explicitly as well as implicitly.
Explicitly:
    Z_OUT <= inertial (not A_IN and B_IN and C_IN) or
                      (A_IN and not B_IN and C_IN) or
                      (A_IN and B_IN and not C_IN) or
                      (A_IN and B_IN and C_IN) after 20 ns ;
Implicitly:
    Z_OUT <= (not A_IN and B_IN and C_IN) or
             (A_IN and not B_IN and C_IN) or
             (A_IN and B_IN and not C_IN) or
             (A_IN and B_IN and C_IN) after 20 ns ;

VHDL-93 Enhanced inertial delay model
VHDL-93 allows inertial delay, also called a pulse rejection limit, to be different from the propagation delay.
    SIG_OUT <= reject 5 ns inertial not SIG_IN after 7 ns ;

Transport delay model
With a transport delay model, all input signal changes are reflected at the output, regardless of how long the signal changes persist.
Transport delay model must be declared explicitly using the keyword transport.
Inertial delay model is a default delay model because it reflects better the actual behavior of logic components.
Transport delay model is used for high-level modeling.

Transport delay model – Example
    SIG_OUT <= transport not SIG_IN after 7 ns ;

Event-driven simulation
[Figure: time axis with, at time tq, the list of events scheduled to occur at that time; each event holds a signal and its new value]

Event list as an array – Timing wheel
[Figure: array indexed by time; some entries hold no events, the entry for time tc holds the list of events scheduled to occur at tc, each with a signal and its new value]

Delta delay
A propagation delay of 0 time units is equivalent to omitting the after clause and is called a delta delay.
Used for functional simulation.

Two-dimensional aspect of time
[Figure: slide content not recoverable from the extracted text]

Simulation engine algorithm
    while (event list not empty)
    begin
        t = next time in list
        process entries for time t
    end
If next time in list = previous time, then the previous iteration of the loop has advanced time by one delta delay.
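Delta delays can be made visible by chaining assignments without an after clause. The following is a minimal sketch; the entity delta_tb and the signals a, b and c are assumptions made for illustration. Each assignment takes effect one delta cycle after the previous one, while the simulation time reported by NOW stays at 10 ns.

```vhdl
-- Hypothetical testbench: names and timing are illustrative only.
ENTITY delta_tb IS
END delta_tb ;

ARCHITECTURE sim OF delta_tb IS
    SIGNAL a, b, c : BIT := '0' ;
BEGIN
    b <= a ;   -- no AFTER clause: b follows a one delta delay later
    c <= b ;   -- c follows b after one more delta delay

    stimulus : PROCESS
    BEGIN
        WAIT FOR 10 ns ;
        a <= '1' ;

        WAIT ON a ;              -- first delta cycle: a has been updated
        ASSERT a = '1' AND b = '0'
            REPORT "b changed in the same cycle as a" SEVERITY error ;

        WAIT ON b ;              -- second delta cycle: b has been updated
        ASSERT b = '1' AND c = '0'
            REPORT "c changed in the same cycle as b" SEVERITY error ;

        WAIT ON c ;              -- third delta cycle: c has been updated
        ASSERT c = '1'
            REPORT "c was not updated" SEVERITY error ;
        ASSERT NOW = 10 ns
            REPORT "delta cycles advanced simulation time" SEVERITY error ;
        WAIT ;
    END PROCESS ;
END sim ;
```

This matches the simulation engine loop: three consecutive iterations are processed for the same time t = 10 ns, each one delta delay after the previous one.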
Signals vs Variables
    architecture DUMMY_1 of JUNK is
        signal Y : bit := '0' ;
    begin
        process
            variable X : bit := '0' ;
        begin
            wait for 10 ns ;
            X := '1' ;
            Y <= X ;
            wait for 10 ns ;
            -- What is Y at this point ? '1'
            ...
        end process ;
    end DUMMY_1 ;

    architecture DUMMY_2 of JUNK is
        signal X, Y : bit := '0' ;
    begin
        process
        begin
            wait for 10 ns ;
            X <= '1' ;
            Y <= X ;
            wait for 10 ns ;
            -- What is Y at this point ? '0'
            ...
        end process ;
    end DUMMY_2 ;
Variable assignment is immediate; signal assignment with 0 delay takes effect only after a delta delay, i.e., in the next simulation cycle.

Properties of signals
Signals represent a time-ordered list of values denoting past, present and future values. This time history of a signal is called a waveform. A value/time pair (v, t) is called a transaction. If a transaction changes the value of a signal, it is called an event.

Signal attributes (1)
    S'transaction   a signal of type bit that changes value from '0' to '1' or vice versa each time there is a transaction on S
    S'event         true if there is an event on S in the current simulation cycle, false otherwise
    S'active        true if there is a transaction on S in a given simulation cycle, false otherwise

Signal attributes (2)
    S'last_event    the time interval since the last event on S
    S'last_active   the time interval since the last transaction on S
    S'last_value    the value of S just before the last event on S

Signal attributes (3)
    S'delayed(T)    a signal that takes on the same value as S, but is delayed by time T
    S'stable(T)     a Boolean signal that is true if there has been no event on S in the time interval T up to the current time, otherwise false
    S'quiet(T)      a Boolean signal that is true if there has been no transaction on S in the time interval T up to the current time, otherwise false
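The distinction between a transaction and an event, and the attributes built on it, can be exercised with a single signal. The following is a minimal sketch; the entity signal_attr_tb, the signal s and the chosen times are assumptions made for illustration. The second assignment writes the value that s already has, so it produces a transaction but no event.

```vhdl
-- Hypothetical testbench: names and timing are illustrative only.
ENTITY signal_attr_tb IS
END signal_attr_tb ;

ARCHITECTURE sim OF signal_attr_tb IS
    SIGNAL s : BIT := '0' ;
BEGIN
    PROCESS
    BEGIN
        WAIT FOR 10 ns ;
        s <= '1' ;
        WAIT ON s ;                  -- resumes in the cycle of the event
        ASSERT s'event AND s'active
            REPORT "expected an event on s" SEVERITY error ;
        ASSERT s'last_value = '0'
            REPORT "'last_value should be the old value" SEVERITY error ;

        WAIT FOR 4 ns ;              -- 4 ns after the event
        ASSERT s'last_event = 4 ns
            REPORT "'last_event should be 4 ns" SEVERITY error ;
        ASSERT s'stable(3 ns)
            REPORT "no event in the last 3 ns expected" SEVERITY error ;
        ASSERT NOT s'stable(5 ns)
            REPORT "the event lies within the last 5 ns" SEVERITY error ;

        s <= '1' ;                   -- same value: transaction, no event
        WAIT FOR 0 ns ;              -- advance by one delta cycle
        ASSERT s'active AND NOT s'event
            REPORT "expected a transaction without an event" SEVERITY error ;
        ASSERT s'last_active = 0 ns AND s'last_event = 4 ns
            REPORT "unexpected 'last_active or 'last_event" SEVERITY error ;
        WAIT ;
    END PROCESS ;
END sim ;
```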