Stanford Linear System Theory

Lecture Notes for EE263Stephen Boyd Introduction to Linear Dynamical Systems Autumn 2010-11 Copyright Stephen Boyd. Limited copying or use for educational purposes is fine, but please acknowledge source, e.g., “taken from Lecture Notes for EE263, Stephen Boyd, Stanford 2010.” Contents Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture Lecture 1 – Overview 2 – Linear functions and examples 3 – Linear algebra review 4 – Orthonormal sets of vectors and QR factorization 5 – Least-squares 6 – Least-squares applications 7 – Regularized least-squares and Gauss-Newton method 8 – Least-norm solutions of underdetermined equations 9 – Autonomous linear dynamical systems 10 – Solution via Laplace transform and matrix exponential 11 – Eigenvectors and diagonalization 12 – Jordan canonical form 13 – Linear dynamical systems with inputs and outputs 14 – Example: Aircraft dynamics 15 – Symmetric matrices, quadratic forms, matrix norm, and SVD 16 – SVD applications 17 – Example: Quantum mechanics 18 – Controllability and state transfer 19 – Observability and state estimation 20 – Some final comments Basic notation Matrix primer Crimes against matrices Least-squares and least-norm solutions using Matlab Solving general linear equations using Matlab Low rank approximation and extremal gain problems Exercises EE263 Autumn 2010-11 Stephen Boyd Lecture 1 Overview • course mechanics • outline & topics • what is a linear dynamical system? • why study linear systems? • some examples 1–1 Course mechanics • all class info, lectures, homeworks, announcements on class web page: www.stanford.edu/class/ee263 course requirements: • weekly homework • takehome midterm exam (date TBD) • takehome final exam (date TBD) Overview 1–2 Prerequisites • exposure to linear algebra (e.g., Math 103) • exposure to Laplace transform, differential equations not needed, but might increase appreciation: • control systems • circuits & systems • dynamics Overview 1–3 Major topics & outline • linear algebra & applications • autonomous linear dynamical systems • linear dynamical systems with inputs & outputs • basic quadratic control & estimation Overview 1–4 Linear dynamical system continuous-time linear dynamical system (CT LDS) has the form dx = A(t)x(t) + B(t)u(t), dt where: • t ∈ R denotes time • x(t) ∈ Rn is the state (vector) • u(t) ∈ Rm is the input or control • y(t) ∈ Rp is the output Overview 1–5 y(t) = C(t)x(t) + D(t)u(t) • A(t) ∈ Rn×n is the dynamics matrix • B(t) ∈ Rn×m is the input matrix • C(t) ∈ Rp×n is the output or sensor matrix • D(t) ∈ Rp×m is the feedthrough matrix for lighter appearance, equations are often written x = Ax + Bu, ˙ y = Cx + Du • CT LDS is a first order vector differential equation • also called state equations, or ‘m-input, n-state, p-output’ LDS Overview 1–6 Some LDS terminology • most linear systems encountered are time-invariant: A, B, C, D are constant, i.e., don’t depend on t • when there is no input u (hence, no B or D) system is called autonomous • very often there is no feedthrough, i.e., D = 0 • when u(t) and y(t) are scalar, system is called single-input, single-output (SISO); when input & output signal dimensions are more than one, MIMO Overview 1–7 Discrete-time linear dynamical system discrete-time linear dynamical system (DT LDS) has the form x(t + 1) = A(t)x(t) + B(t)u(t), where • t ∈ Z = {0, ±1, ±2, . . .} • (vector) signals x, u, y are sequences y(t) = C(t)x(t) + D(t)u(t) DT LDS is a first order vector recursion Overview 1–8 Why study linear systems? applications arise in many areas, e.g. • automatic control systems • signal processing • communications • economics, finance • circuit analysis, simulation, design • mechanical and civil engineering • aeronautics • navigation, guidance Overview 1–9 Usefulness of LDS • depends on availability of computing power, which is large & increasing exponentially • used for – analysis & design – implementation, embedded in real-time systems • like DSP, was a specialized topic & technology 30 years ago Overview 1–10 1960s • transitioned from specialized topic to ubiquitous in 1980s (just like digital signal processing. ) but with more emphasis on linear algebra • first engineering application: aerospace. . ) Overview 1–11 Nonlinear dynamical systems many dynamical systems are nonlinear (a fascinating topic) so why study linear systems? • most techniques for nonlinear systems are based on linear methods • methods for linear systems often work unreasonably well. for nonlinear systems • if you don’t understand linear dynamical systems you certainly can’t understand nonlinear dynamical systems Overview 1–12 . .Origins and history • parts of LDS theory can be traced to 19th century • builds on classical circuits & systems (1920s on) (transfer functions . information theory. in practice. . . . no details) • let’s consider a specific system x = Ax. y(t) ∈ R (a ‘16-state single-output system’) • model of a lightly damped mechanical system. ˙ y = Cx with x(t) ∈ R16. looks almost random and unpredictable • we’ll see that such a solution can be decomposed into much simpler (modal) components Overview 1–14 . but it doesn’t matter Overview 1–13 typical output: 3 2 1 y 0 −1 −2 −3 0 50 100 150 200 250 300 350 3 2 1 t y 0 −1 −2 −3 0 100 200 300 400 500 600 700 800 900 1000 t • output waveform is very complicated.Examples (ideas only. C ∈ R2×16 (same A as before) problem: find appropriate u : R+ → R2 so that y(t) → ydes = (1.5 0 2 0 −2 0 1 0 −1 0 2 0 −2 0 5 0 −5 0 0. two outputs to system: x = Ax + Bu. y constant): x = 0 = Ax + Bustatic.2 0 −0.2 0 −0.0.63 0. x(0) = 0 where B ∈ R16×2. x. −2) simple approach: consider static conditions (u.36 Overview 1–16 . ˙ solve for u to get: ustatic = −CA−1B −1 y = ydes = Cx ydes = −0.2 0 50 100 150 50 100 150 50 100 150 50 100 150 50 100 150 50 100 150 50 100 150 t 200 250 300 350 50 100 150 200 250 300 350 200 250 300 350 200 250 300 350 200 250 300 350 200 250 300 350 200 250 300 350 200 250 300 350 (idea probably familiar from ‘poles’) Overview 1–15 Input design add two inputs.5 0 −0. ˙ y = Cx.2 0 1 0 −1 0 0. 2 0 −0. here y converges exactly in 50 sec Overview 1–18 .2 −0. .6 −0.let’s apply u = ustatic and just wait for things to settle: 0 u1 −0. takes about 1500 sec for y(t) to converge to ydes Overview 1–17 using very clever input waveforms (EE263) we can do much better.4 u2 0.2 0 u1 −0.4 u2 0.4 −0.2 −0.5 0 200 400 600 800 t 1000 1200 1400 1600 1800 y1 1 0.5 0 10 20 t 30 40 50 60 .5 0 10 20 t 30 40 50 60 0 y2 −0.1 −200 2 1.6 0 10 20 t 30 40 50 60 0.5 −1 −1. 0.5 0 −0.g. .2 0.3 0.5 −2 −2.1 0 −0.2 0 10 20 t 30 40 50 60 1 y1 0. e.5 0 −200 0 200 400 600 800 t 1000 1200 1400 1600 1800 0 y2 −1 −2 −3 −4 −200 0 200 400 600 800 1000 1200 1400 1600 1800 t . .8 −1 −200 0 200 400 600 800 t 1000 1200 1400 1600 1800 0. .4 −0. in fact by using larger inputs we do still better. . e.g. 5 u1 0 −5 −5 0 5 10 t 15 20 25 1 0. here we have (exact) convergence in 20 sec Overview 1–19 in this course we’ll study • how to synthesize or design such inputs • the tradeoff between size of u and convergence time Overview 1–20 .5 −5 2 1 0 5 10 t 15 20 25 y1 0 −1 −5 0 5 10 15 20 25 t 0 −0.5 −2 −5 0 5 10 t 15 20 25 .5 u2 0 −0. .5 y2 −1 −1.5 −1 −1. filtered signal y Overview 1–22 .5 0 1 2 3 4 5 6 7 8 9 10 s(t) 1 0. given quantized.5 0 1 0 1 2 3 4 5 6 7 8 9 10 w(t) 0 −1 1 0 1 2 3 4 5 6 7 8 9 10 y(t) 0 −1 0 1 2 3 4 5 6 7 8 9 10 t problem: estimate original signal u.Estimation / filtering u H(s) w A/D y • signal u is piecewise constant (period 1 sec) • filtered by 2nd-order system H(s). step response s(t) • A/D runs at 10Hz. with 3-bit quantizer Overview 1–21 1 u(t) 0 −1 1. well below quantization error (!) Overview 1–24 .4 −0.8 0. .e. .6 0. 1 u(t) (solid) and u(t) (dotted) ˆ 0.8 −1 0 1 2 3 4 5 6 7 8 9 10 t RMS error 0. GH ≈ 1) • approximate u as G(s)y . . yields terrible results Overview 1–23 formulate as estimation problem (EE263) .6 −0. .03.2 −0..simple approach: • ignore quantization • design equalizer G(s) for H(s) (i.4 0.2 0 −0.  ym   a11 a12 · · · a1n  a a22 · · · a2n  A =  21 .. . .  am1 am2 · · · amn   x1  x  x =  .EE263 Autumn 2010-11 Stephen Boyd Lecture 2 Linear functions and examples • linear equations and functions • engineering examples • interpretations 2–1 Linear equations consider system of linear equations y1 y2 ym = a11x1 + a12x2 + · · · + a1nxn = a21x1 + a22x2 + · · · + a2nxn .2   ..  .  . .  xn  Linear functions and examples 2–2 . = am1x1 + am2x2 + · · · + amnxn can be written in matrix form as y = Ax. where  y1  y  y =  .2   . e. ∀x. y ∈ Rn • f (αx) = αf (x). where A ∈ Rm×n • matrix multiplication function f is linear • converse is true: any linear function f : Rn → Rm can be written as f (x) = Ax for some A ∈ Rm×n • representation via matrix multiplication is unique: for any linear function f there is only one matrix A for which f (x) = Ax for all x • y = Ax is a concrete representation of a generic linear function Linear functions and examples 2–4 . ∀x ∈ Rn ∀α ∈ R i. superposition holds x+y f (y) f (x + y) f (x) y x Linear functions and examples 2–3 Matrix multiplication function • consider function f : Rn → Rm given by f (x) = Ax..Linear functions a function f : Rn −→ Rm is linear if • f (x + y) = f (x) + f (y). x is unknown to be determined • x is ‘input’ or ‘action’. • ith row of A concerns ith output • jth column of A concerns jth input • a27 = 0 means 2nd output (y2) doesn’t depend on 7th input (x7) • |a31| ≫ |a3j | for j = 1 means y3 depends mainly on x1 Linear functions and examples 2–6 .Interpretations of y = Ax • y is measurement or observation. y is ‘output’ or ‘result’ • y = Ax defines a function or transformation that maps x ∈ Rn into y ∈ Rm Linear functions and examples 2–5 Interpretation of aij n yi = j=1 aij xj aij is gain factor from jth input (xj ) to ith output (yi) thus.g. e.. e. xi • A is diagonal. list of zero/nonzero entries of A. aij = 0 for i < j. ... i. in some fixed direction x1 x2 x3 x4 (provided x. .e. shows which xj affect which yi Linear functions and examples 2–7 Linear elastic structure • xj is external force applied at some node.• |a52| ≫ |ai2| for i = 5 means x2 affects mainly y5 • A is lower triangular. means yi only depends on x1. i. .e. i. means ith output depends only on ith input more generally. . aij = 0 for i = j. sparsity pattern of A.. y are small) we have y ≈ Ax • A is called the compliance matrix • aij gives deflection i per unit force at j (in m/N) Linear functions and examples 2–8 . in some fixed direction • yi is (small) deflection of some node. y3 are x-.Total force/torque on rigid body x4 x1 x3 CG x2 • xj is external force/torque applied at some point/direction/axis • y ∈ R6 is resulting total force & torque on body (y1. y2.components of total torque) • we have y = Ax • A depends on geometry (of applied forces and torques with respect to center of gravity CG) • jth column gives resulting force & torque for unit force/torque j Linear functions and examples 2–9 Linear static circuit interconnection of resistors. z. y6 are x-. y4. y-. and independent sources y3 ib y1 x1 βib x2 y2 • xj is value of independent source j • yi is some circuit variable (voltage. current) • we have y = Ax • if xj are currents and yi are voltages. linear dependent (controlled) sources.components of total force. A is called the impedance or resistance matrix Linear functions and examples 2–10 . y5. z. y-. .Final position/velocity of mass due to applied forces f • unit mass. . y2 are final position and velocity (i. some component (typically vertical) of gi − gavg • y = Ax Linear functions and examples 2–12 . . constant in each interval) • y1. .. zero position/velocity at t = 0. subject to force f (t) for 0≤t≤n • f (t) = xj for j − 1 ≤ t < j. j = 1.e.e. n (x is the sequence of applied forces. i. . at t = n) • we have y = Ax • a1j gives influence of applied force during j − 1 ≤ t < j on final position • a2j gives influence of applied force during j − 1 ≤ t < j on final velocity Linear functions and examples 2–11 Gravimeter prospecting gi gavg ρj • xj = ρj − ρavg is (excess) mass density of earth in voxel j. • yi is measured gravity anomaly at location i. • A comes from physics and geometry • jth column of A shows sensor readings caused by unit density anomaly at voxel j • ith row of A shows sensitivity pattern of sensor i Linear functions and examples 2–13 Thermal system location 4 heating element 5 x1 x2 x3 x4 x5 • xj is power of jth heating element or heat source • yi is change in steady-state temperature at location i • thermal transport via conduction • y = Ax Linear functions and examples 2–14 . flat) patches. 0} (cos θij < 0 means patch i is shaded from lamp j) • jth column of A shows illumination pattern from lamp j Linear functions and examples 2–16 . no shadows • xj is power of jth lamp. yi is illumination level of patch i −2 • y = Ax.• aij gives influence of heater j at location i (in ◦C/W) • jth column of A gives pattern of steady-state temperature rise due to 1W at heater j • ith row shows how heaters affect location i Linear functions and examples 2–15 Illumination with multiple lamps pwr. xj θij rij illum. where aij = rij max{cos θij . yi • n lamps illuminating m (small. B is ‘small’) Linear functions and examples 2–17 Cost of production production inputs (materials. to the other receivers) • pj is power of jth transmitter • si is received signal power of ith receiver • zi is received interference power of ith receiver • Gij is path gain from transmitter j to receiver i • we have s = Ap. z = Bp.Signal and interference power in wireless system • n transmitter/receiver pairs • transmitter j transmits to receiver j (and. . parts. B has zero diagonal (ideally. . labor. where aij = Gii i = j 0 i=j bij = 0 Gij i=j i=j • A is diagonal. . A is ‘large’. inadvertantly. ) are combined to make a number of products • xj is price per unit of production input j • aij is units of production input j required to manufacture one unit of product i • yi is production cost per unit of product i • we have y = Ax • ith row of A is bill of materials for unit of product i Linear functions and examples 2–18 . .production inputs needed • qi is quantity of product i to be produced • rj is total quantity of production input j needed • we have r = AT q total production cost is rT x = (AT q)T x = q T Ax Linear functions and examples 2–19 Network traffic and flows • n flows with rates f1. is sum of rates of flows passing through it • flow routes given by flow-link incidence matrix Aij = 1 flow j goes over link i 0 otherwise • traffic and flow rates related by t = Af Linear functions and examples 2–20 . . fn pass from their source nodes to their destination nodes over fixed routes in a network • ti. . traffic on link i. . they are (approximately) related by a linear function Linear functions and examples 2–22 ∂fi ∂xj x0 . y0 = f (x0). . and l1. total # of packets in network Linear functions and examples 2–21 Linearization • if f : Rn → Rm is differentiable at x0 ∈ Rn. ln be latency (total travel time) of flows • l = AT d • f T l = f T AT d = (Af )T d = tT d. . . then x near x0 =⇒ f (x) very near f (x0) + Df (x0)(x − x0) where Df (x0)ij = is derivative (Jacobian) matrix • with y = f (x). . . . . dm be link delays. output deviation δy := y − y0 • then we have δy ≈ Df (x0)δx • when deviations are small.link delays and flow latency • let d1. define input deviation δx := x − x0. . q1) (p4. q3) Linear functions and examples 2–23 ρ2 (p2. (x. qi) known coordinates of beacons for i = 1. y) unknown position ρ3 (p3. y) unknown coordinates in plane • (pi. q2) • ρ ∈ R4 is a nonlinear function of (x.Navigation by range measurement • (x. y0) is last navigation fix. a short time later Linear functions and examples 2–24 . where (y0 − qi) (x0 − pi)2 + (y0 − qi)2 • linearize around (x0. y) is current position. 2. y) from (x0. q4) ρ4 ρ1 (x. . 3. y) = (x − pi)2 + (y − qi)2 δx δy . y) ∈ R2: ρi(x. y0) • first column of A shows sensitivity of range measurements to (small) change in x from x0 • obvious application: (x0. 4 • ρi measured (known) distance or range from beacon i beacons (p1. y0): δρ ≈ A ai1 = (x0 − pi) (x0 − pi)2 + (y0 − qi)2 ai2 = • ith row of A shows (approximate) change in ith range measurement for (small) shift in (x. all x’s consistent with measurements) • if there is no x such that y = Ax.Broad categories of applications linear model or function y = Ax some broad categories of applications: • estimation or inversion • control or design • mapping or transformation (this list is not exclusive..e. . find x which is almost consistent) Linear functions and examples 2–26 .. if the sensor readings are inconsistent.t. can have combinations . . find x s.e. ) Linear functions and examples 2–25 Estimation or inversion y = Ax • yi is ith measurement or sensor reading (which we know) • xj is jth parameter to be estimated or determined • aij is sensitivity of ith sensor to jth parameter sample problems: • find x. given y • find all x’s that result in y (i. y ≈ Ax (i. find a small or efficient x that meets specifications) Linear functions and examples 2–27 Mapping or transformation • x is mapped or transformed to y by linear function y = Ax sample problems: • determine if there is an x that maps to a given y • (if possible) find an x that maps to y • find all x’s that map to a given y • if there is only one x that maps to y.e. or outcomes • A describes how input choices affect results sample problems: • find x so that y = ydes • find all x’s that result in y = ydes (i.e. find it (i.Control or design y = Ax • x is vector of design parameters or inputs (which we can choose) • y is vector of results. find all designs that meet specifications) • among x’s that satisfy y = ydes.. decode or undo the mapping) Linear functions and examples 2–28 .e. find a small one (i...  . then Aej = aj .  1  .  .  0   0  1  e2 =  . the jth unit vector  1  0  e1 =  ..  . the jth column of A (ej corresponds to a pure mixture..  0   0  0  en =  .   .Matrix multiplication as mixture of columns write A ∈ Rm×n in terms of its columns: A= where aj ∈ Rm then y = Ax can be written as y = x1a1 + x2a2 + · · · + xnan (xj ’s are scalars.  . giving only column j) Linear functions and examples 2–30 . aj ’s are m-vectors) • y is a (linear) combination or mixture of the columns of A • coefficients of x give coefficients of mixture Linear functions and examples 2–29 a1 a2 · · · an an important example: x = ej . T amx ˜  thus yi = ai. x .e.2   . yi is inner product of ith row of A with x ˜ Linear functions and examples 2–31 geometric interpretation: yi = aT x = α is a hyperplane in Rn (normal to ai) ˜i ˜ yi = ai. x = 0 ˜ yi = ai..  . x = 1 ˜ Linear functions and examples 2–32 . x = 3 ˜ ai ˜ yi = ai. i. x = 2 ˜ yi = ai.  aT ñ  where ai ∈ Rn ˜ then y = Ax can be written as  aT x ˜1  aT x  ˜ y= 2   .Matrix multiplication as inner product with rows write A in terms of its rows:  aT ˜1  aT  ˜ A =  . for m = n = 2.g. A21 ∈ Rm2×n1 . A22 ∈ Rm2×n2 partition x and y conformably as x= x1 x2 .g.. block upper triangular. . diagonal.e. ) Linear functions and examples 2–33 example: block upper triangular. i.e. y2 ∈ Rm2 ) so y1 = A11x1 + A12x2. arrow . A12 ∈ Rm1×n2 ... . y1 ∈ Rm1 . .Block diagram representation y = Ax can be represented by a signal flow graph or block diagram e. i. y= y1 y2 (x1 ∈ Rn1 . x2 ∈ Rn2 . A= A11 A12 0 A22 where A11 ∈ Rm1×n1 . y2 doesn’t depend on x1 Linear functions and examples 2–34 y2 = A22x2. we represent y1 y2 as x1 a11 a21 x2 a12 a22 y2 y1 = a11 a12 a21 a22 x1 x2 • aij is the gain along the path from jth input to ith output • (by not drawing paths with zero gain) shows sparsity structure of A (e. C = AB ∈ Rm×p where n cij = k=1 aik bkj composition interpretation: y = Cz represents composition of y = Ax and x = Bz z p B x n A m y ≡ z p AB m y (note that B is on left in block diagram) Linear functions and examples 2–36 . so y2 doesn’t depend on x1 Linear functions and examples 2–35 Matrix multiplication as composition for A ∈ Rm×n and B ∈ Rn×p. no path from x1 to y2. . .block diagram: x1 A11 A12 y1 x2 A22 y2 . Column and row interpretations can write product C = AB as C= c1 · · · cp = AB = Ab1 · · · Abp i. entries of C are inner products of rows of A and columns of B • cij = 0 means ith row of A is orthogonal to jth column of B • Gram matrix of vectors f1. . . C= .  .e. fn defined as Gij = fiT fj (gives inner product of each vector with the others) • G = [f1 · · · fn]T [f1 · · · fn] Linear functions and examples 2–38 . T T cm ˜ amB ˜  i. ith column of C is A acting on ith column of B similarly we can write   T  cT ˜1 a1 B ˜ . bj ˜i ˜ i. ith row of C is ith row of A acting (on left) on B Linear functions and examples 2–37 Inner product interpretation inner product interpretation: cij = aT bj = ai.  = AB =  .. .e. .e... Matrix multiplication interpretation via paths x1 b21 b12 x2 b22 z2 b11 z1 a21 a12 a22 y2 path gain= a22b21 a11 y1 • aik bkj is gain of path from input j to output i via k • cij is sum of gains over all paths from input j to output i Linear functions and examples 2–39 . angle.EE263 Autumn 2010-11 Stephen Boyd Lecture 3 Linear algebra review • vector space. subspaces • independence. inner product 3–1 Vector spaces a vector space or linear space (over the reals) consists of • a set V • a vector sum + : V × V → V • a scalar multiplication : R × V → V • a distinguished element 0 ∈ V which satisfy a list of properties Linear algebra review 3–2 . nullspace. dimension • range. basis. rank • change of coordinates • norm. t. y. • 1x = x. ∀x ∈ V ∀α ∈ R ∀x. ∀x. . . vk ) = {α1v1 + · · · + αk vk | αi ∈ R} and v1. • (α + β)x = αx + βx. . vk ∈ Rn Linear algebra review 3–4 . z ∈ V (+ is associative) • (x + y) + z = x + (y + z). is associative) (right distributive rule) (left distributive rule) • ∀x ∈ V ∃(−x) ∈ V s. y ∈ V (+ is commutative) ∀x. . . . v2. y ∈ V ∀α. v2. β ∈ R ∀x ∈ V Linear algebra review 3–3 Examples • V1 = Rn. .• x + y = y + x. vk ) where span(v1. β ∈ R ∀x ∈ V • α(x + y) = αx + αy. x + (−x) = 0 • (αβ)x = α(βx). . ∀α. with standard (componentwise) vector addition and scalar multiplication • V2 = {0} (where 0 ∈ Rn) • V3 = span(v1. . • 0 + x = x. . . ∀x ∈ V (0 is additive identity) (existence of additive inverse) (scalar mult. . Subspaces • a subspace of a vector space is a subset of a vector space which is itself a vector space • roughly speaking. where vector sum is sum of functions: (x + z)(t) = x(t) + z(t) and scalar multiplication is defined by (αx)(t) = αx(t) (a point in V4 is a trajectory in Rn) • V5 = {x ∈ V4 | x = Ax} ˙ (points in V5 are trajectories of the linear system x = Ax) ˙ • V5 is a subspace of V4 Linear algebra review 3–6 . V3 above are subspaces of Rn Linear algebra review 3–5 Vector spaces of functions • V4 = {x : R+ → Rn | x is differentiable}. V2. a subspace is closed under vector addition and scalar multiplication • examples V1. . . α1v1 + α2v2 + · · · + αk vk = β1v1 + β2v2 + · · · + βk vk implies α1 = β1. . vk } is a basis for a vector space V if • v1. . vi+1. . α2 = β2. . . . . V = span(v1. i. . . . . . i. v2. . vi−1. vk Linear algebra review 3–7 Basis and dimension set of vectors {v1. . . . vk ) • {v1. . . . vk span V. the number of vectors in any basis is the same number of vectors in any basis is called the dimension of V. αk = βk • no vector vi can be expressed as a linear combination of the other vectors v1. .. vk } is independent equivalent: every v ∈ V can be uniquely expressed as v = α1v1 + · · · + αk vk fact: for a given vector space V. and dimV = ∞ if there is no basis) Linear algebra review 3–8 .e. v2. . denoted dimV (we assign dim{0} = 0. . . .e. v2. . v2. vk } is independent if α1v1 + α2v2 + · · · + αk vk = 0 =⇒ α1 = α2 = · · · = 0 some equivalent conditions: • coefficients of α1v1 + α2v2 + · · · + αk vk are uniquely determined. . v2. .Independent set of vectors a set of vectors {v1.. . . . a basis for their span) • A has a left inverse. BA = I • det(AT A) = 0 (we’ll establish these later) Linear algebra review 3–10 . if y = Ax and y = A˜.t.Nullspace of a matrix the nullspace of A ∈ Rm×n is defined as N (A) = { x ∈ Rn | Ax = 0 } • N (A) is set of vectors mapped to zero by y = Ax • N (A) is set of vectors orthogonal to all rows of A N (A) gives ambiguity in x given y = Ax: • if y = Ax and z ∈ N (A). then x = x + z for some z ∈ N (A) x ˜ Linear algebra review 3–9 Zero nullspace A is called one-to-one if 0 is the only element of its nullspace: N (A) = {0} ⇐⇒ • x can always be uniquely determined from y = Ax (i. the linear transformation y = Ax doesn’t ‘lose’ information) • mapping from x to Ax is one-to-one: different x’s map to different y’s • columns of A are independent (hence.e. there is a matrix B ∈ Rn×m s.. then y = A(x + z) • conversely.e. i.. Interpretations of nullspace suppose z ∈ N (A) y = Ax represents measurement of x • z is undetectable from sensors — get zero sensor readings • x and x + z are indistinguishable from sensors: Ax = A(x + z) N (A) characterizes ambiguity in x from measurement y = Ax y = Ax represents output resulting from input x • z is an input with no result • x and x + z have same result N (A) characterizes freedom of input choice for given result Linear algebra review 3–11 Range of a matrix the range of A ∈ Rm×n is defined as R(A) = {Ax | x ∈ Rn} ⊆ Rm R(A) can be interpreted as • the set of vectors that can be ‘hit’ by linear mapping y = Ax • the span of columns of A • the set of vectors y for which Ax = y has a solution Linear algebra review 3–12 . Onto matrices A is called onto if R(A) = Rm ⇐⇒ • Ax = y can be solved in x for any y • columns of A span Rm • A has a right inverse. sensors have failed or model is wrong y = Ax represents output resulting from input x • v is a possible result or output • w cannot be a result or output R(A) characterizes the possible results or achievable outputs Linear algebra review 3–14 . we’ll establish them later) Linear algebra review 3–13 Interpretations of range suppose v ∈ R(A). there is a matrix B ∈ Rn×m s. w ∈ R(A) y = Ax represents measurement of x • y = v is a possible or consistent sensor signal • y = w is impossible or inconsistent. AB = I • rows of A are independent • N (AT ) = {0} • det(AAT ) = 0 (some of these are not obvious.t. i..e. or post-) equalizer for the channel y = Ax • x = By is unique solution of Ax = y Linear algebra review 3–16 .Inverse A ∈ Rn×n is invertible or nonsingular if det A = 0 equivalent conditions: • columns of A are a basis for Rn • rows of A are a basis for Rn • y = Ax has a unique solution x for every y ∈ Rn • A has a (left and right) inverse denoted A−1 ∈ Rn×n. with AA−1 = A−1A = I • N (A) = {0} • R(A) = Rn • det AT A = det AAT = 0 Linear algebra review 3–15 Interpretations of inverse suppose A ∈ Rn×n has inverse B = A−1 • mapping associated with B undoes mapping associated with A (applied either before or after!) • x = By is a perfect (pre. Dual basis interpretation • let ai be columns of A. . . . and ˜T be rows of B = A−1 bi • from y = x1a1 + · · · + xnan and xi = ˜T y. . ñ and a1. we get bi n y= i=1 (˜T y)ai bi thus. . . inner product with rows of inverse matrix gives the coefficients in the expansion of a vector in the columns of the matrix • ˜1. n) • rank(A) + dim N (A) = n Linear algebra review 3–18 . an are called dual bases b b Linear algebra review 3–17 Rank of a matrix we define the rank of A ∈ Rm×n as rank(A) = dim R(A) (nontrivial) facts: • rank(A) = rank(AT ) • rank(A) is maximum number of independent columns (or rows) of A hence rank(A) ≤ min(m. . . C ∈ Rr×n: rank(A) lines x A y x C B y n m n r m • rank(A) = r is minimum size of vector needed to faithfully reconstruct y from x Linear algebra review 3–20 . C ∈ Rr×n.Conservation of dimension interpretation of rank(A) + dim N (A) = n: • rank(A) is dimension of set ‘hit’ by the mapping y = Ax • dim N (A) is dimension of set of x ‘crushed’ to zero by y = Ax • ‘conservation of dimension’: each dimension of input is either crushed to zero or ends up in output • roughly speaking: – n is number of degrees of freedom in input x – dim N (A) is number of degrees of freedom lost in the mapping from x to y = Ax – rank(A) is number of degrees of freedom in output y Linear algebra review 3–19 ‘Coding’ interpretation of rank • rank of product: rank(BC) ≤ min{rank(B). then rank(A) ≤ r • conversely: if rank(A) = r then A ∈ Rm×n can be factored as A = BC with B ∈ Rm×r . rank(C)} • hence if A = BC with B ∈ Rm×r . full rank means nonsingular • for skinny matrices (m ≥ n). n) • for square matrices. full rank means columns are independent • for fat matrices (m ≤ n). then y = Bz): rn + mr = (m + n)r operations • savings can be considerable if r ≪ min{m. n) we say A is full rank if rank(A) = min(m. A ∈ Rm×n • A has known factorization A = BC.Application: fast matrix-vector multiplication • need to compute matrix-vector product y = Ax. full rank means rows are independent Linear algebra review 3–22 . n} Linear algebra review 3–21 Full rank matrices for A ∈ Rm×n we always have rank(A) ≤ min(m. B ∈ Rm×r • computing y = Ax directly: mn operations • computing y = Ax as y = B(Cx) (compute z = Cx first. t2. 1 . . tn) is another basis for Rn. . . . . . . tn) ˜ define T = t 1 t2 · · · t n so x = T x. . . en) where  0 . . we have x = x1t1 + x2t2 + · · · + xntn ˜ ˜ ˜ where xi are the coordinates of x in the basis (t1. . hence ˜ x = T −1x ˜ (T is invertible since ti are a basis) T −1 transforms (standard basis) coordinates of x into ti-coordinates inner product ith row of T −1 with x extracts ti-coordinate of x Linear algebra review 3–24 . 0         ei =    (1 in ith component) obviously we have x = x1e1 + x2e2 + · · · + xnen xi are called the coordinates of x (in the standard basis) Linear algebra review 3–23 if (t1.Change of coordinates ‘standard’ basis vectors in Rn: (e1. t2. . e2. . . t2 . tn: x = T x. tn Linear algebra review 3–25 (Euclidean) norm for x ∈ Rn we define the (Euclidean) norm as x = x2 + x2 + · · · + x2 = n 1 2 √ xT x x measures length of vector (from origin) important properties: • αx = |α| x (homogeneity) • x + y ≤ x + y (triangle inequality) • x ≥ 0 (nonnegativity) • x = 0 ⇐⇒ x = 0 (definiteness) Linear algebra review 3–26 .consider linear transformation y = Ax. A ∈ Rn×n express y and x in terms of t1. t2. . ˜ so y = (T −1AT )˜ ˜ x y = Ty ˜ • A −→ T −1AT is called similarity transformation • similarity transformation by T expresses linear transformation y = Ax in coordinates t1. . . . . . . y = y. y = α x.RMS value and (Euclidean) distance root-mean-square (RMS) value of vector x ∈ Rn: rms(x) = 1 n n 1/2 x2 i i=1 x = √ n norm defines distance between vectors: dist(x. x ≥ 0 • x. with linear map defined by row vector xT Linear algebra review 3–28 . y := x1y1 + x2y2 + · · · + xnyn = xT y important properties: • αx. x • x. z • x. y is linear function : Rn → R. y • x + y. y) = x − y x x−y y Linear algebra review 3–27 Inner product x. x = 0 ⇐⇒ x = 0 f (y) = x. z = x. z + y. (if x = 0) y = αx for some α ≥ 0 • x and y are opposed: θ = π.Cauchy-Schwartz inequality and angle between vectors • for any x. xT y = x y . y) = cos−1 xT y x y x y θ thus xT y = x y cos θ Linear algebra review 3–29 xT y y 2 y special cases: • x and y are aligned: θ = 0. xT y = 0 denoted x ⊥ y Linear algebra review 3–30 . |xT y| ≤ x y • (unsigned) angle between vectors in Rn defined as θ = (x. y ∈ Rn. xT y = − x y (if x = 0) y = −αx for some α ≥ 0 • x and y are orthogonal: θ = π/2 or −π/2. and boundary passing through 0 {x | xT y ≤ 0} y 0 Linear algebra review 3–31 .interpretation of xT y > 0 and xT y < 0: • xT y > 0 means (x. y) is acute • xT y < 0 means (x. y) is obtuse x y x y xT y > 0 xT y < 0 {x | xT y ≤ 0} defines a halfspace with outward normal vector y. not vectors individually in terms of U = [u1 · · · uk ]. . . orthonormal means U T U = Ik Orthonormal sets of vectors and QR factorization 4–2 . . k (ui are called unit vectors or direction vectors) • orthogonal if ui ⊥ uj for i = j • orthonormal if both slang: we say ‘u1. QR factorization • orthogonal decomposition induced by a matrix 4–1 Orthonormal set of vectors set of vectors {u1. . i = 1. .EE263 Autumn 2010-11 Stephen Boyd Lecture 4 Orthonormal sets of vectors and QR factorization • orthonormal set of vectors • Gram-Schmidt procedure. uk are orthonormal vectors’ but orthonormality (like independence) is a property of a set of vectors. . . . uk } ∈ Rn is • normalized if ui = 1. . . . . . then w = z • multiplication by U does not change norm • mapping w = U z is isometric: it preserves distances • simple derivation using matrices: w 2 = Uz 2 = (U z)T (U z) = z T U T U z = z T z = z 2 Orthonormal sets of vectors and QR factorization 4–4 . uk } is an orthonormal basis for span(u1. .• an orthonormal set of vectors is independent (multiply α1u1 + α2u2 + · · · + αk uk = 0 by uT ) i • hence {u1. . . . . . ) Orthonormal sets of vectors and QR factorization 4–3 Geometric properties suppose columns of U = [u1 · · · uk ] are orthonormal if w = U z. uk ) = R(U ) • warning: if k < n then U U T = I (since its rank is at most k) (more on this matrix later . . . . and distances Orthonormal sets of vectors and QR factorization 4–5 Orthonormal basis for Rn • suppose u1. U z ) = (z. z ) ˜ ˜ • thus. not orthogonal) • it follows that U −1 = U T .e. U z = (U z)T (U z ) = z T U T U z = z.• inner products are also preserved: U z. . n ui uT = I i i=1 Orthonormal sets of vectors and QR factorization 4–6 . . z ˜ ˜ • if w = U z and w = U z then ˜ ˜ w. un is an orthonormal basis for Rn • then U = [u1 · · · un] is called orthogonal: it is square and satisfies UTU = I (you’d think such matrices would be called orthonormal. and hence also U U T = I. . angles. w = U z. multiplication by U preserves inner products. z ˜ ˜ ˜ ˜ ˜ • norms and inner products preserved. so angles are preserved: (U z.. . U z = z. i. Expansion in orthonormal basis suppose U is orthogonal.e. i. n x= i=1 (uT x)ui i • uT x is called the component of x in the direction ui i • a = U T x resolves x into the vector of its ui components • x = U a reconstitutes x from its ui components n • x = Ua = i=1 aiui is called the (ui-) expansion of x 4–7 Orthonormal sets of vectors and QR factorization the identity I = U U T = n T i=1 ui ui n is sometimes written (in physics) as |ui ui| I= i=1 since x= n |ui ui|x i=1 (but we won’t use this notation) Orthonormal sets of vectors and QR factorization 4–8 .. so x = U U T x. sin θ). i. e2 → (− sin θ. i. Uθ = cos θ − sin θ sin θ cos θ since e1 → (cos θ.. cos θ) reflection across line x2 = x1 tan(θ/2) is given by y = Rθ x.e. U z = z ˜ ˜ • preserves angles between vectors.. Rθ = cos θ sin θ sin θ − cos θ since e1 → (cos θ. then transformation w = U z • preserves norm of vectors. − cos θ) Orthonormal sets of vectors and QR factorization 4–10 .e. z ) examples: • rotations (about some axis) • reflections (through some plane) Orthonormal sets of vectors and QR factorization 4–9 Example: rotation by θ in R2 is given by y = Uθ x. (U z. e2 → (sin θ. U z ) = (z.Geometric interpretation if U is orthogonal. sin θ). . . . ak ∈ Rn. ar ) • rough idea of method: first orthogonalize each vector w. . .r. . . .x2 e2 θ rotation x2 e2 θ reflection e1 x1 e1 x1 can check that Uθ and Rθ are orthogonal Orthonormal sets of vectors and QR factorization 4–11 Gram-Schmidt procedure • given independent vectors a1. . . . . .t. . . span(a1. previous ones. .t. . . qk s. then normalize result to have norm one Orthonormal sets of vectors and QR factorization 4–12 . q1. . . qr is an orthonormal basis for span(a1. G-S procedure finds orthonormal vectors q1. . ar ) = span(q1. . . qr ) for r ≤ k • thus. . q3 := q3/ q3 ˜ ˜ • etc. q1 := a1 ˜ • step 1b. k we have T T T ai = (q1 ai)q1 + (q2 ai)q2 + · · · + (qi−1ai)qi−1 + qi qi ˜ = r1iq1 + r2iq2 + · · · + riiqi (note that the rij ’s come right out of the G-S procedure.Gram-Schmidt procedure • step 1a. q2 := q2/ q2 ˜ ˜ (normalize) T T • step 3a. . q2 components) ˜ • step 3b. q3 := a3 − (q1 a3)q1 − (q2 a3)q2 (remove q1. . q1 := q1/ q1 ˜ ˜ (normalize) T • step 2a. 2. q2 := a2 − (q1 a2)q1 (remove q1 component from a2) ˜ • step 2b. and rii = 0) Orthonormal sets of vectors and QR factorization 4–14 . (normalize) Orthonormal sets of vectors and QR factorization 4–13 a2 T q2 = a2 − (q1 a2)q1 ˜ q2 q1 q1 = a1 ˜ for i = 1. . . where A ∈ Rn×k . Q ∈ Rn×k . . 0 0 · · · rkk  R a1 a2 · · · ak A = q1 q2 · · · q k Q • QT Q = Ik . . . qr = a/ a . ˜ if a = 0 { r = r + 1.  . skip to next vector aj+1 ˜ and continue: r = 0. . . . . and R is upper triangular & invertible • called QR decomposition (or factorization) of A • usually computed using a variation on Gram-Schmidt procedure which is less sensitive to numerical (rounding) errors • columns of Q are orthonormal basis for R(A) Orthonormal sets of vectors and QR factorization 4–15 General Gram-Schmidt procedure • in basic G-S we assume a1. . . we find qj = 0 for some j..  . R ∈ Rk×k :  r11 r12 · · · r1k  0 r22 · · · r2k   . aj−1 • modified algorithm: when we encounter qj = 0. . for i = 1. . which means aj ˜ is linearly dependent on a1.. ak ∈ Rn are independent • if a1.  . . . . ak are dependent. k { r T a = ai − j=1 qj qj ai. } ˜ ˜ ˜ } Orthonormal sets of vectors and QR factorization 4–16 . . . .QR decomposition written in matrix form: A = QR. . . . .on exit. . qr is an orthonormal basis for R(A) (hence r = Rank(A)) • each ai is linear combination of previously generated qj ’s in matrix notation we have A = QR with QT Q = Ir and R ∈ Rr×k in upper staircase form: × × × possibly nonzero entries × × × zero entries × × × ‘corner’ entries (shown as ×) are nonzero Orthonormal sets of vectors and QR factorization 4–17 can permute columns with × to front of matrix: ˜ A = Q[R S]P where: • QT Q = Ir ˜ • R ∈ Rr×r is upper triangular and invertible • P ∈ Rk×k is a permutation matrix (which moves forward the columns of a which generated a new q) Orthonormal sets of vectors and QR factorization 4–18 . • q1. . ..g. . columns of Q2 ∈ Rn×(n−r) are orthonormal. .t. A = I) ˜ • apply general Gram-Schmidt to [A A] • Q1 are orthonormal vectors obtained from columns of A ˜ • Q2 are orthonormal vectors obtained from extra columns (A) Orthonormal sets of vectors and QR factorization 4–20 . r = Rank(A) • to check if b ∈ span(a1. .e. . .Applications • directly yields orthonormal basis for R(A) • yields factorization A = BC with B ∈ Rn×r . ak ): apply Gram-Schmidt to [a1 · · · ak b] • staircase pattern in R shows which columns of A are dependent on previous ones works incrementally: one G-S procedure yields QR factorizations of [a1 · · · ap] for p = 1. k: [a1 · · · ap] = [q1 · · · qs]Rp where s = Rank([a1 · · · ap]) and Rp is leading s × p submatrix of R Orthonormal sets of vectors and QR factorization 4–19 ‘Full’ QR factorization with A = Q1R1 the QR factorization as above. write A= Q1 Q2 R1 0 where [Q1 Q2] is orthogonal. C ∈ Rr×k . [A A] is full rank (e.. . . i. orthogonal to Q1 to find Q2: ˜ ˜ ˜ • find any matrix A s. .e. but what is its orthogonal complement R(Q2)? Orthonormal sets of vectors and QR factorization 4–21 ⊥ Orthogonal decomposition induced by A from AT = T R1 0 QT 1 QT 2 we see that AT z = 0 ⇐⇒ QT z = 0 ⇐⇒ z ∈ R(Q2) 1 so R(Q2) = N (AT ) (in fact the columns of Q2 are an orthonormal basis for N (AT )) we conclude: R(A) and N (AT ) are complementary subspaces: • R(A) + N (AT ) = Rn (recall A ∈ Rn×k ) • R(A)⊥ = N (AT ) (and N (AT )⊥ = R(A)) • called orthogonal decomposition (of Rn) induced by A ∈ Rn×k Orthonormal sets of vectors and QR factorization 4–22 ⊥ . any set of orthonormal vectors can be extended to an orthonormal basis for Rn R(Q1) and R(Q2) are called complementary subspaces since • they are orthogonal (i. one from each subspace) this is written • R(Q1) + R(Q2) = Rn • R(Q2) = R(Q1)⊥ (and R(Q1) = R(Q2)⊥) (each subspace is the orthogonal complement of the other) we know R(Q1) = R(A).i. every vector in the first subspace is orthogonal to every vector in the second subspace) • their sum is Rn (i. every vector in Rn can be expressed as a sum of two vectors..e.e.. ) • can now prove most of the assertions from the linear algebra review lecture • switching A ∈ Rn×k to AT ∈ Rk×n gives decomposition of Rk : N (A) + R(AT ) = Rk ⊥ Orthonormal sets of vectors and QR factorization 4–23 . with z ∈ R(A). . .• every y ∈ Rn can be written uniquely as y = z + w. w ∈ N (AT ) (we’ll soon see what the vector z is . cannot solve for x one approach to approximately solve y = Ax: • define residual or error r = Ax − y • find x = xls that minimizes r xls called least-squares (approximate) solution of y = Ax Least-squares 5–2 .EE263 Autumn 2010-11 Stephen Boyd Lecture 5 Least-squares • least-squares (approximate) solution of overdetermined equations • projection and orthogonality principle • least-squares estimation • BLUE property 5–1 Overdetermined linear equations consider y = Ax where A ∈ Rm×n is (strictly) skinny.e. i.. m > n • called overdetermined set of linear equations (more equations than unknowns) • for most y. .Geometric interpretation Axls is point in R(A) closest to y (Axls is projection of y onto R(A)) y r Axls R(A) Least-squares 5–3 Least-squares (approximate) solution • assume A is full rank.t. so we have xls = (AT A)−1AT y . a very famous formula Least-squares 5–4 • yields the normal equations: AT Ax = AT y . = xT AT Ax − 2y T Ax + y T y • set gradient w. x to zero: ∇x r 2 = 2AT Ax − 2AT y = 0 • assumptions imply AT A invertible. we’ll minimize norm of residual squared. . skinny r 2 • to find xls.r. and given by PR(A)(y) = Axls = A(AT A)−1AT y • A(AT A)−1AT is called the projection matrix (associated with R(A)) Least-squares 5–6 . it is the projection of y onto R(A) Axls = PR(A)(y) • the projection function PR(A) is linear. skinny) A: A†A = (AT A)−1AT A = I Least-squares 5–5 Projection on R(A) Axls is (by definition) the point in R(A) that is closest to y.e..• xls is linear function of y • xls = A−1y if A is square • xls solves y = Axls if y ∈ R(A) • A† = (AT A)−1AT is called the pseudo-inverse of A • A† is a left inverse of (full rank. i. Orthogonality principle optimal residual r = Axls − y = (A(AT A)−1AT − I)y is orthogonal to R(A): r. Ax − y > Axls − y Least-squares 5–8 . we have Ax − y 2 = = Axls − y (Axls − y) + A(x − xls) 2 2 2 + A(x − xls) this shows that for x = xls. Az = y T (A(AT A)−1AT − I)T Az = 0 for all z ∈ Rn y r Axls R(A) Least-squares 5–7 Completion of squares since r = Axls − y ⊥ A(x − xls) for any x. so Ax − y 2 = = [Q1 Q2] T R1 0 2 x−y R1 0 2 [Q1 Q2] [Q1 Q2] x − [Q1 Q2] y 5–10 T Least-squares . R ∈ Rn×n upper triangular.Least-squares via QR factorization • A ∈ Rm×n skinny. invertible • pseudo-inverse is (AT A)−1AT = (RT QT QR)−1RT QT = R−1QT so xls = R−1QT y • projection on R(A) given by matrix A(AT A)−1AT = AR−1QT = QQT Least-squares 5–9 Least-squares via full QR factorization • full QR factorization: A = [Q1 Q2] R1 0 with [Q1 Q2] ∈ Rm×m orthogonal. R1 ∈ Rn×n upper triangular. full rank • factor as A = QR with QT Q = In. invertible • multiplication by orthogonal matrix doesn’t change norm. = = R1x − QT y 1 R1x − QT y 1 −QT y 2 2 2 + QT y 2 2 −1 • this is evidently minimized by choice xls = R1 QT y 1 (which makes first term zero) • residual with optimal x is Axls − y = −Q2QT y 2 • Q1QT gives projection onto R(A) 1 • Q2QT gives projection onto R(A)⊥ 2 Least-squares 5–11 Least-squares estimation many applications in inversion. estimation. and reconstruction problems have form y = Ax + v • x is what we want to estimate or reconstruct • y is our sensor measurement(s) • v is an unknown noise or measurement error (assumed small) • ith row of A characterizes ith sensor Least-squares 5–12 . and • what we would observe if x = x. deviation between • what we actually observed (y).least-squares estimation: choose as estimate x that minimizes ˆ Aˆ − y x i. skinny consider a linear estimator of form x = By ˆ • called unbiased if x = x whenever v = 0 ˆ (i.. no estimation error when there is no noise) same as BA = I..e. and there were no noise (v = 0) ˆ least-squares estimate is just x = (AT A)−1AT y ˆ Least-squares 5–13 BLUE property linear measurement with noise: y = Ax + v with A full rank.e. B is left inverse of A Least-squares 5–14 . i.e.. so linearization around x = 0 (say) nearly exact Least-squares 5–16 .e..j i. in the following sense: for any B with BA = I. then. we’d like B ‘small’ (and BA = I) • fact: A† = (AT A)−1AT is the smallest left inverse of A.• estimation error of unbiased linear estimator is x − x = x − B(Ax + v) = −Bv ˆ obviously.j i. least-squares provides the best linear unbiased estimator (BLUE) Least-squares 5–15 Navigation from range measurements navigation using range measurements from distant beacons beacons k4 x unknown position k3 k1 k2 beacons far from unknown position x ∈ R2. we have 2 Bij ≥ A†2 ij i. 84.9 Least-squares 5–18 . −2. given y ∈ R4 (roughly speaking.ranges y ∈ R4 measured. with standard deviation 2 (details not important) problem: estimate x ∈ R2. a 2 : 1 measurement redundancy ratio) actual position is x = (5.95.07) 0 −1.58).5 0 0 y= 2.84 11. 10. measurement is y = (−11. 2.0 0 0 −1.81) Least-squares 5–17 Just enough measurements method y1 and y2 suffice to find x (when v = 0) compute estimate x by inverting top (2 × 2) half of A: ˆ x = Bjey = ˆ (norm of error: 3. Gaussian.81. −9. with measurement noise v:  T k1  kT  y = − 2 x + v T  k3  T k4  where ki is unit vector from 0 to beacon i measurement errors are independent.59.12 0. . .02 −0. . period 1 sec. j = 1. 10 • filtered by system with impulse response h(t): t w(t) = 0 h(t − τ )u(τ ) dτ • sample at 10Hz: yi = w(0.72) • Bje and A† are both left inverses of A • larger entries in B lead to larger estimation error Least-squares 5–19 Example from overview lecture u H(s) w A/D y • signal u is piecewise constant. .44 −0. .04 0.23 −0. i = 1.95 10.47 −0. .Least-squares method compute estimate x by least-squares: ˆ x = A† y = ˆ −0.26 (norm of error: 0. 0 ≤ t ≤ 10: u(t) = xj .18 y= 4.48 0. .1i). 100 ˜ Least-squares 5–20 . . j − 1 ≤ t < j.51 −0. .1i − τ ) dτ • v ∈ R100 is quantization error: vi = Q(˜i) − yi (so |vi| ≤ 0.6 −0. . where • A∈R 100×10 j is given by Aij = j−1 h(0.• 3-bit quantization: yi = Q(˜i). .5 0 1 2 3 4 5 6 7 8 9 10 s(t) 1 0. i = 1.2 0 −0.6 0.4 −0.2 −0. 100. where Q is 3-bit y quantizer characteristic Q(a) = (1/4) (round(4a + 1/2) − 1/2) • problem: estimate x ∈ R10 given y ∈ R100 example: 1 u(t) 0 −1 1.5 0 1 0 −1 1 0 −1 0 1 2 3 4 5 6 7 8 9 10 y(t) w(t) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 t Least-squares 5–21 we have y = Ax + v.8 0.4 0.125) y ˜ least-squares estimate: xls = (AT A)−1AT y u(t) (solid) & u(t) (dotted) ˆ 1 0. .8 −1 0 1 2 3 4 5 6 7 8 9 10 t 5–22 Least-squares . 5.1 0.05 0 1 2 3 4 5 6 7 8 9 10 t • rows show how sampled measurements of y are used to form estimate of xi for i = 2.15 0.15 0.05 0 −0. .15 0. Least-squares 5–23 some rows of Bls = (AT A)−1AT : row 2 0. we mostly use y(t) for 3 ≤ t ≤ 7 Least-squares 5–24 .05 0 −0.1 0.03 10 better than if we had no filtering! (RMS error 0.RMS error is x − xls √ = 0. which is the original input signal for 4 ≤ t < 5.05 0 1 2 3 4 5 6 7 8 9 10 row 8 0. 8 • to estimate x5.05 0 1 2 3 4 5 6 7 8 9 10 row 5 0.1 0.07) more on this later .05 0 −0. . . find linear combination of functions that fits data least-squares fit: choose x to minimize total square fitting error: m (x1f1(si) + · · · + xnfn(si) − gi) i=1 2 Least-squares applications 6–2 . . . . fn : S → R.e. i = 1. . xn ∈ R so that x1f1(si) + · · · + xnfn(si) ≈ gi. . m.. gi). . .EE263 Autumn 2010-11 Stephen Boyd Lecture 6 Least-squares applications • least-squares data fitting • growing sets of regressors • system identification • growing sets of measurements and recursive least-squares 6–1 Least-squares data fitting we are given: • functions f1. . i = 1. . where si ∈ S and (usually) m≫n problem: find coefficients x1. . m i. . . . . called regressors or basis functions • data or measurements (si. . to data (ti.   . . . where Aij = fj (si) • hence. . . least-squares fit is given by x = (AT A)−1AT g (assuming A is skinny. . . full rank) • corresponding function is flsfit(s) = x1f1(s) + · · · + xnfn(s) • applications: – interpolation. .• using matrix notation. approximate model of data Least-squares applications 6–3 Least-squares polynomial fitting problem: fit polynomial of degree < n. . total square fitting error is Ax − g 2. j = 1. smoothing of data – developing simple. . p(t) = a0 + a1t + · · · + an−1tn−1. . i = 1. n j−1 • matrix A has form Aij = ti  n−1 1 t1 t 2 · · · t 1 1  1 t2 t2 · · · tn−1  2 2 A= . extrapolation.  n−1 1 tm t 2 · · · t m m  (called a Vandermonde matrix) Least-squares applications 6–4 . yi). m • basis functions are fj (t) = tj−1. 135. .. i. so p is identically zero.assuming tk = tl for k = l and m ≥ n. . A is full rank: • suppose Aa = 0 • corresponding polynomial p(t) = a0 + · · · + an−1tn−1 vanishes at m points t1.025. . tm • by fundamental theorem of algebra p can have no more than n − 1 zeros. 3. 2. 4 have RMS errors .076.e.005. A full rank Least-squares applications 6–5 Example • fit g(t) = 4t/(1 + 10t2) with polynomial • m = 100 points between t = 0 & t = 1 • least-squares fit for degrees 1. respectively Least-squares applications 6–6 . . . . . and a = 0 • columns of A are independent. 7 0.8 0.1 0.1 p1(t) 0. ap} • regress y on a1. .4 0.6 0. .6 0.5 0. .1 0.9 1 p3(t) 0.3 0.5 0 1 0 0.3 0. get better fit.2 0.9 1 p4(t) 0.1 0.7 0. .2 0.7 0.5 0.8 0.7 0. . ap • project y onto span{a1. . .2 0.3 0.6 0. .4 0.5 0. .8 0. . . .6 0. .9 1 t Least-squares applications 6–7 Growing sets of regressors consider family of least-squares problems minimize for p = 1. ap are called regressors) • approximate y by linear combination of a1.5 0. .2 0.4 0.5 0 0 0.1 0.9 1 p2(t) 0. . . . so optimal residual decreases Least-squares applications 6–8 p i=1 xi ai −y .3 0.5 0 1 0 0. n (a1. .8 0. ap • as p increases.5 0 1 0 0. . .4 0. solution for each p ≤ n is given by −1 xls = (AT Ap)−1AT y = Rp QT y p p p (p) where • Ap = [a1 · · · ap] ∈ Rm×p is the first p columns of A • Ap = QpRp is the QR factorization of Ap • Rp ∈ Rp×p is the leading p × p submatrix of R • Qp = [q1 · · · qp] is the first p columns of Q Least-squares applications 6–9 Norm of optimal residual versus p plot of optimal residual versus p shows how well y can be matched by linear combination of a1... . ap. . ... as function of p residual y x1a1 − y minx1 minx1. .x7 7 i=1 xi ai −y p 0 1 2 3 4 5 6 7 Least-squares applications 6–10 . . h) that minimizes norm of model prediction error e . y (vector u. y (N ) ˆ u(N ) u(N − 1) · · · u(N − n) model prediction error is e = (y(n) − y (n). .    . . .. a least-squares problem (with variables h) Least-squares applications 6–12 . . . y(N ) − y (N )) ˆ ˆ  h0   h1   . .  hn  least-squares identification: choose model (i. . . . hn ∈ R Least-squares applications 6–11 we can write model or predicted output as    y (n) ˆ u(n) u(n − 1) · · · u(0)  y (n + 1)   u(n + 1) u(n) ··· u(1)  ˆ . .   . . . . . = . . . y example with scalar u. . y readily handled): fit I/O data with moving-average (MA) model with n delays y (t) = h0u(t) + h1u(t − 1) + · · · + hnu(t − n) ˆ where h0.e. .Least-squares system identification we measure input u(t) and output y(t) for t = 0. N of unknown system u(t) unknown system y(t) system identification problem: find reasonable model for system based on measured I/O data u. . . . . h7) = (.243.37 Least-squares applications 6–13 5 4 3 2 1 0 −1 −2 −3 −4 0 solid: y(t): actual output dashed: y (t).487. .208. .354. . .024.Example 4 2 u(t) 0 −2 −4 0 10 20 30 t 40 50 60 70 5 y(t) 0 −5 0 10 20 30 t 40 50 60 70 for n = 7 we obtain MA model with (h0. .441) with relative prediction error e / y = 0.418. . . predicted from model ˆ 10 20 30 t 40 50 60 70 Least-squares applications 6–14 .282. 9 0.1 0 0 5 10 15 20 25 n 30 35 40 45 50 difficulty: for n too large the predictive ability of the model on other I/O data (from the same system) becomes worse Least-squares applications 6–16 .7 0.Model order selection question: how large should n be? • obviously the larger n.4 0.2 0.3 0.6 0. the smaller the prediction error on the data used to form the model • suggests using largest possible model order for smallest prediction error Least-squares applications 6–15 1 relative prediction error e / y 0.5 0.8 0. 2 modeling data 0 0 5 10 15 20 25 n 30 35 40 45 50 plot suggests n = 10 is a good choice Least-squares applications 6–18 .6 0.4 validation data 0.Out of sample validation evaluate model predictive performance on another I/O data set not used to develop model model validation data set: 4 2 u(t) ¯ 0 −2 −4 0 10 20 30 t 40 50 60 70 5 y (t) ¯ 0 −5 0 10 20 30 t 40 50 60 70 Least-squares applications 6–17 now check prediction error of models (developed using modeling data) on validation data: 1 relative prediction error 0.8 0. yi corresponds to one measurement ˜ • solution is m −1 m xls = i=1 aiaT ˜ ˜i i=1 yiai ˜ • suppose that ai and yi become available sequentially..e. m increases ˜ with time Least-squares applications 6–20 . i.for n = 50 the actual and predicted outputs on system identification and model validation data are: 5 solid: y(t) dashed: predicted y(t) 0 −5 0 10 20 30 t 40 50 60 70 5 solid: y (t) ¯ dashed: predicted y (t) ¯ 0 −5 0 10 20 30 t 40 50 60 70 loss of predictive ability when n too large is called model overfit or overmodeling Least-squares applications 6–19 Growing sets of measurements least-squares problem in ‘row’ form: m minimize Ax − y 2 = i=1 (˜T x − yi)2 ai where aT are the rows of A (˜i ∈ Rn) ˜i a • x ∈ Rn is some vector to be estimated • each pair ai. P (m + 1) = P (m) + am+1aT ˜ ˜m+1 q(m + 1) = q(m) + ym+1am+1 ˜ • if P (m) is invertible. and P and P + aaT are both invertible ˜˜ • gives an O(n2) method for computing P (m + 1)−1 from P (m)−1 • standard methods for computing P (m + 1)−1 from P (m + 1) are O(n3) Least-squares applications 6–22 . . . . . once P (m) becomes invertible. q(0) = 0 ∈ Rn • for m = 0. . . we have xls(m) = P (m)−1q(m) • P (m) is invertible ⇐⇒ a1. . 1. .Recursive least-squares m −1 m we can compute xls(m) = i=1 aiaT ˜ ˜i i=1 yiai recursively ˜ • initialize P (0) = 0 ∈ Rn×n. it stays invertible) Least-squares applications 6–21 Fast update for recursive least-squares we can calculate P (m + 1)−1 = P (m) + am+1aT ˜ ˜m+1 −1 efficiently from P (m)−1 using the rank one update formula P + aaT ˜˜ −1 = P −1 − 1 (P −1a)(P −1a)T ˜ ˜ 1 + aT P −1a ˜ ˜ valid when P = P T . am span Rn ˜ ˜ (so. Verification of rank one update formula (P + aaT ) P −1 − ˜˜ 1 (P −1a)(P −1a)T ˜ ˜ T P −1 a 1+a ˜ ˜ 1 = I + aaT P −1 − ˜˜ P (P −1a)(P −1a)T ˜ ˜ T P −1 a 1+a ˜ ˜ 1 − aaT (P −1a)(P −1a)T ˜˜ ˜ ˜ T P −1 a 1+a ˜ ˜ aT P −1a ˜ ˜ 1 aaT P −1 − ˜˜ aaT P −1 ˜˜ = I + aaT P −1 − ˜˜ T P −1 a T P −1 a 1+a ˜ ˜ 1+a ˜ ˜ = I Least-squares applications 6–23 . we want Ax − y small.EE263 Autumn 2010-11 Stephen Boyd Lecture 7 Regularized least-squares and Gauss-Newton method • multi-objective least-squares • regularized least-squares • nonlinear least-squares • Gauss-Newton method 7–1 Multi-objective least-squares in many problems we have two (or more) objectives • we want J1 = Ax − y • and also J2 = F x − g (x ∈ Rn is the variable) • usually the objectives are competing • we can make one smaller. g = 0. at the expense of making the other larger common example: F = I. with small x Regularized least-squares and Gauss-Newton method 7–2 2 small small 2 . x(3) • x(3) is worse than x(2) on both counts (J2 and J1) • x(1) is better than x(2) in J2. J1) for every x: J1 x(1) x(2) x(3) J2 note that x ∈ Rn. x(2).Plot of achievable objective pairs plot (J2. F x − g 2) three example choices of x: x(1). J1(x(1)) Regularized least-squares and Gauss-Newton method 7–3 • shaded area shows (J2. J1) not achieved by any x ∈ Rn • boundary of region is called optimal trade-off curve • corresponding x are called Pareto optimal (for the two objectives Ax − y 2. point labeled x(1) is really J2(x(1)). but worse in J1 Regularized least-squares and Gauss-Newton method 7–4 . but this plot is in R2. J1) achieved by some x ∈ Rn • clear area shows (J2. Weighted-sum objective • to find Pareto optimal points. x’s on optimal trade-off curve. correspond to line with slope −µ on (J2..e. J1) plot Regularized least-squares and Gauss-Newton method 7–5 S J1 x(1) x(3) x(2) J1 + µJ2 = α J2 • x(2) minimizes weighted-sum objective for µ shown • by varying µ from 0 to +∞. J1 + µJ2 = α. i. we minimize weighted-sum objective J1 + µJ2 = Ax − y 2 + µ Fx − g 2 • parameter µ ≥ 0 gives relative weight between J1 and J2 • points where weighted sum is constant. can sweep out entire optimal tradeoff curve Regularized least-squares and Gauss-Newton method 7–6 . .Minimizing weighted-sum objective can express weighted-sum objective as ordinary least-squares objective: Ax − y 2 + µ Fx − g 2 = = A √ µF ˜ Ax − y ˜ y= ˜ 2 x− y √ µg 2 where ˜ A= A √ µF . i = 1. y √ µg ˜ hence solution is (assuming A full rank) x = = ˜ ˜ AT A −1 ˜ ˜ AT y −1 AT A + µF T F AT y + µF T g Regularized least-squares and Gauss-Newton method 7–7 Example f • unit mass at rest subject to forces xi for i − 1 < t ≤ i. . . y = aT x where a ∈ R10 • J1 = (y − 1)2 (final position error squared) • J2 = x 2 (sum of squares of forces) 2 weighted-sum objective: (aT x − 1)2 + µ x optimal x: x = aaT + µI Regularized least-squares and Gauss-Newton method −1 a 7–8 . 10 • y ∈ R is position at t = 10. . works for any A (no restrictions on shape.5 −3 J2 = x 2 • upper left corner of optimal trade-off curve corresponds to x = 0 • bottom right corresponds to input that yields y = 1.2 0.9 0. minimizer of weighted-sum objective.5 2 2.5 0. is called regularized least-squares (approximate) solution of Ax ≈ y • also called Tychonov regularization • for µ > 0. i.optimal trade-off curve: 1 0.. g = 0 the objectives are J1 = Ax − y 2.5 3 x 10 3. . J1 = 0 Regularized least-squares and Gauss-Newton method 7–9 Regularized least-squares when F = I. ) Regularized least-squares and Gauss-Newton method 7–10 . x = AT A + µI −1 J2 = x 2 AT y.3 0. .1 0 0 0.8 J1 = (y − 1)2 0.4 0.6 0.7 0.e.5 1 1. rank . • reduces to (linear) least-squares if r(x) = Ax − y Regularized least-squares and Gauss-Newton method 7–12 . model only accurate for x small • regularized solution trades off sensor fit.estimation/inversion application: • Ax − y is sensor residual • prior information: x small • or. size of x Regularized least-squares and Gauss-Newton method 7–11 Nonlinear least-squares nonlinear least-squares (NLLS) problem: find x ∈ Rn that minimizes m r(x) where r : Rn → Rm • r(x) is a vector of ‘residuals’ 2 = i=1 ri(x)2. very hard to solve exactly n 2 = i=1 ri(x)2. unknown but assumed small) • NLLS estimate: choose x to minimize ˆ m m ri(x) = i=1 i=1 2 (ρi − x − bi ) 2 Regularized least-squares and Gauss-Newton method 7–13 Gauss-Newton method for NLLS m NLLS: find x ∈ R that minimizes r(x) r : Rn → Rm • in general. using linearized r until convergence Regularized least-squares and Gauss-Newton method 7–14 . where • many good heuristics to compute locally optimal solution Gauss-Newton method: given starting guess for x repeat linearize r near current guess new guess is linear LS solution.Position estimation from ranges estimate position x ∈ R2 from approximate distances to beacons at locations b1. bm ∈ R2 without linearizing • we measure ρi = x − bi + vi (vi is range error. . . . . we approximate NLLS problem by linear LS problem: r(x) Regularized least-squares and Gauss-Newton method 2 ≈ A(k)x − b(k) 2 7–15 • next iterate solves this linearized LS problem: x(k+1) = A(k)T A(k) −1 A(k)T b(k) • repeat until convergence (which isn’t guaranteed) Regularized least-squares and Gauss-Newton method 7–16 .Gauss-Newton method (more detail): • linearize r near current iterate x(k): r(x) ≈ r(x(k)) + Dr(x(k))(x − x(k)) where Dr is the Jacobian: (Dr)ij = ∂ri/∂xj • write linearized approximation as r(x(k)) + Dr(x(k))(x − x(k)) = A(k)x − b(k) A(k) = Dr(x(k)). b(k) = Dr(x(k))x(k) − r(x(k)) • at kth iteration. 2).Gauss-Newton example • 10 beacons • + true position (−3.2) • range estimates accurate to ±0. ♦ initial guess (1. 3. −1.6.2. objective would be nice quadratic ‘bowl’ • bumps in objective due to strong nonlinearity of r Regularized least-squares and Gauss-Newton method 7–18 .5 5 4 3 2 1 0 −1 −2 −3 −4 −5 −5 −4 −3 −2 −1 0 1 2 3 4 5 Regularized least-squares and Gauss-Newton method 7–17 NLLS objective r(x) 2 versus x: 16 14 12 10 8 6 4 2 5 0 5 0 −5 −5 0 • for a linear LS problem. objective of Gauss-Newton iterates: 12 10 8 r(x) 2 6 4 2 0 1 2 3 4 5 6 7 8 9 10 iteration • x(k) converges to (in this case. 3.31 ˆ (substantially smaller than range accuracy!) Regularized least-squares and Gauss-Newton method 7–20 . global) minimum of r(x) • convergence takes only five or so steps Regularized least-squares and Gauss-Newton method 7–19 2 • final estimate is x = (−3.3) ˆ • estimation error is x − x = 0.3. linearized model still pretty accurate) Regularized least-squares and Gauss-Newton method 7–22 .convergence of Gauss-Newton iterates: 5 4 4 56 3 3 2 2 1 0 −1 1 −2 −3 −4 −5 −5 −4 −3 −2 −1 0 1 2 3 4 5 Regularized least-squares and Gauss-Newton method 7–21 useful varation on Gauss-Newton: add regularization term A(k)x − b(k) 2 + µ x − x(k) 2 so that next iterate is not too far from previous one (hence. • there are more variables than equations • x is underspecified. i.e. Axp = y Least-norm solutions of undetermined equations 8–2 . many choices of x lead to the same y we’ll assume that A is full rank (m). there is a solution set of all solutions has form { x | Ax = y } = { xp + z | z ∈ N (A) } where xp is any (‘particular’) solution. i.EE263 Autumn 2010-11 Stephen Boyd Lecture 8 Least-norm solutions of undetermined equations • least-norm solution of underdetermined equations • minimum norm solutions via QR factorization • derivation via Lagrange multipliers • relation to regularized least-squares • general norm minimization with equality constraints 8–1 Underdetermined linear equations we consider y = Ax where A ∈ Rm×n is fat (m < n).e.e.. so for each y ∈ Rm.. i.. e. xln is solution of optimization problem minimize x subject to Ax = y (with variable x ∈ Rn) Least-norm solutions of undetermined equations 8–4 ..• z characterizes available choices in solution • solution has dim N (A) = n − m ‘degrees of freedom’ • can choose z to satisfy other specs or optimize among solutions Least-norm solutions of undetermined equations 8–3 Least-norm solution one particular solution is xln = AT (AAT )−1y (AAT is invertible since A full rank) in fact. xln is the solution of y = Ax that minimizes x i. . so A(x − xln) = 0 and (x − xln)T xln = (x − xln)T AT (AAT )−1y = (A(x − xln)) (AAT )−1y = 0 i.e.suppose Ax = y. xln has smallest norm of any solution Least-norm solutions of undetermined equations 8–5 { x | Ax = y } N (A) = { x | Ax = 0 } xln • orthogonality condition: xln ⊥ N (A) • projection interpretation: xln is projection of 0 on solution set { x | Ax = y } Least-norm solutions of undetermined equations 8–6 .. so x 2 T = xln + x − xln 2 = xln 2 + x − xln 2 ≥ xln 2 i. (x − xln) ⊥ xln.e. skinny matrix A: • A† = (AT A)−1AT • (AT A)−1AT is a left inverse of A • A(AT A)−1AT gives projection onto R(A) Least-norm solutions of undetermined equations 8–7 Least-norm solution via QR factorization find QR factorization of AT . fat A • AT (AAT )−1 is a right inverse of A • I − AT (AAT )−1A gives projection onto N (A) cf. nonsingular then • xln = AT (AAT )−1y = QR−T y • xln = R−T y Least-norm solutions of undetermined equations 8–8 . QT Q = Im • R ∈ Rm×m upper triangular.e. analogous formulas for full rank. i. AT = QR.. with • Q ∈ Rn×m.• A† = AT (AAT )−1 is called the pseudo-inverse of full rank. . • from first condition.. . i.e. 10 • y1 is position at t = 10.Derivation via Lagrange multipliers • least-norm solution solves optimization problem minimize xT x subject to Ax = y • introduce Lagrange multipliers: L(x. i = 1. λ) = xT x + λT (Ax − y) • optimality conditions are ∇xL = 2x + AT λ = 0. 0) Least-norm solutions of undetermined equations 8–10 . x = −AT λ/2 • substitute into second to get λ = −2(AAT )−1y • hence x = AT (AAT )−1y Least-norm solutions of undetermined equations 8–9 ∇λL = Ax − y = 0 Example: transferring mass unit distance f • unit mass at rest subject to forces xi for i − 1 < t ≤ i. . y2 is velocity at t = 10 • y = Ax where A ∈ R2×10 (A is fat) • find least norm force that transfers mass unit distance with zero final velocity. y = (1. . 04 −0.06 0.4 0.0..e. fat A) Least-norm solutions of undetermined equations 8–12 −1 AT → AT AAT −1 .6 0.15 0.02 0 −0. regularized solution converges to least-norm solution as µ → 0 • in matrix terms: as µ → 0.05 0 0 2 4 6 8 10 12 t Least-norm solutions of undetermined equations 8–11 Relation to regularized least-squares • suppose A ∈ Rm×n is fat.2 0 0 2 4 6 8 10 12 0. AT A + µI (for full rank.04 xln 0.06 0 2 4 6 8 10 12 1 t position 0.02 −0.8 0.2 t velocity 0. i. full rank • define J1 = Ax − y 2.1 0. J2 = x 2 • least-norm solution minimizes J2 with J1 = 0 • minimizer of weighted-sum objective J1 + µJ2 = Ax − y xµ = AT A + µI −1 2 +µ x 2 is AT y • fact: xµ → xln as µ → 0. General norm minimization with equality constraints consider problem minimize Ax − b subject to Cx = d with variable x • includes least-squares and least-norm problems as special cases • equivalent to minimize (1/2) Ax − b subject to Cx = d • Lagrangian is L(x. we have x λ = AT A C T C 0 −1 AT b d Least-norm solutions of undetermined equations 8–14 . λ) = (1/2) Ax − b 2 2 + λT (Cx − d) = (1/2)xT AT Ax − bT Ax + (1/2)bT b + λT Cx − λT d Least-norm solutions of undetermined equations 8–13 • optimality conditions are ∇xL = AT Ax − AT b + C T λ = 0. • write in block matrix form as AT A C T C 0 x λ = AT b d ∇λL = Cx − d = 0 • if the block matrix is invertible. we can derive a more explicit (and complicated) formula for x • from first block equation we get x = (AT A)−1(AT b − C T λ) • substitute into Cx = d to get C(AT A)−1(AT b − C T λ) = d so λ = C(AT A)−1C T −1 C(AT A)−1AT b − d • recover x from equation above (not pretty) x = (AT A)−1 AT b − C T C(AT A)−1C T −1 C(AT A)−1AT b − d Least-norm solutions of undetermined equations 8–15 .if AT A is invertible. EE263 Autumn 2010-11 Stephen Boyd Lecture 9 Autonomous linear dynamical systems • autonomous linear dynamical systems • examples • higher order systems • linearization near equilibrium point • linearization along trajectory 9–1 Autonomous linear dynamical systems continuous-time autonomous LDS has form x = Ax ˙ • x(t) ∈ Rn is called the state • n is the state dimension or (informally) the number of states • A is the dynamics matrix (system is time-invariant if A doesn’t depend on t) Autonomous linear dynamical systems 9–2 . picture (phase plane): x2 x(t) = Ax(t) ˙ x(t) x1 Autonomous linear dynamical systems 9–3 example 1: x = ˙ 2 −1 0 2 1 x 1.5 1 1.5 0 −0.5 −1 −0.5 −1 −1.5 1 0.5 2 Autonomous linear dynamical systems 9–4 .5 0 0.5 −2 −2 −1. 5 1 0.5 0 0.5 −1 −1.5 1 −1 0.5 2 Autonomous linear dynamical systems 9–5 Block diagram block diagram representation of x = Ax: ˙ n x(t) ˙ A n x(t) 1/s • 1/s block represents n parallel scalar integrators • coupling comes from dynamics matrix A Autonomous linear dynamical systems 9–6 .5 −2 −2 −1.5 1 1.5 x 1.example 2: x = ˙ 2 −0.5 0 −0.5 −1 −0. dt ic vl =F vc il C = diag(C1. .useful when A has structure. Cp). . . Autonomous linear dynamical systems L = diag(L1. . dt L dil = vl. block upper triangular: x= ˙ A11 A12 0 A22 x 1/s x1 A11 A12 1/s A22 here x1 doesn’t affect x2 at all Autonomous linear dynamical systems 9–7 x2 Linear circuit ic1 vc1 C1 icp vcp Cp linear static circuit il1 L1 ilr Lr vlr vl1 circuit equations are C dvc = ic. . e.. Lr ) 9–8 . .g. . . A is usually sparse Autonomous linear dynamical systems 9–10 . xi is concentration of chemical i • linear model of reaction kinetics dxi = ai1x1 + · · · + ainxn dt • good model for some reactions.with state x = vc il . we have C −1 0 0 L−1 x= ˙ Fx Autonomous linear dynamical systems 9–9 Chemical reactions • reaction involving n chemicals. 3 0.1 0 t Autonomous linear dynamical systems 9–11 Finite-state discrete-time Markov chain z(t) ∈ {1. p(t) =  Prob(z(t) = n) (so. . e.4 0.1 2 Example: series reaction A −→ B −→ C with linear dynamics k k  −k1 0 0 x =  k1 −k2 0  x ˙ 0 k2 0 plot for k1 = k2 = 1.  .5 0.2 x2 0 1 2 3 4 5 6 7 8 9 10 0.6 x(t) 0. 2. initial x(0) = (1. . or 3) = [1 1 1 0 · · · 0]p(t)) then we have p(t + 1) = P p(t) Autonomous linear dynamical systems 9–12  .g. n} is a random sequence with Prob( z(t + 1) = i | z(t) = j ) = Pij where P ∈ Rn×n is the matrix of transition probabilities can represent probability distribution of z(t) as n-vector  Prob(z(t) = 1) . 0) 1 0. .9  0. 0..8 0. .7 x1 x3 0. Prob(z(t) = 1. P is often sparse.0 3 1 0.0 p(t + 1) =  0.9 1.1 0.7 1.1  0.9 0.1 0  p(t) 0 0.2 0 Autonomous linear dynamical systems 9–14  .1 2 0.2 • state 1 is ‘system OK’ • state 2 is ‘system down’ • state 3 is ‘system being repaired’ 0. Markov chain is depicted graphically • nodes are states • edges show transition probabilities Autonomous linear dynamical systems 9–13 example: 0.7 0. 0 0 0 A0 A1 A 2 ··· ··· 0 0 . .   . z= . I  x(t) ∈ Rn ··· · · · Ak−1   z   a (first order) LDS (with bigger state) Autonomous linear dynamical systems 9–16 . so define new variable z =  . starting at x(0) = x0. (k−1) x   x(1)  . x(0) = x0 ˙ suppose h is small time step (x doesn’t change much in h seconds) simple (‘forward Euler’) approximation: x(t + h) ≈ x(t) + hx(t) = (I + hA)x(t) ˙ by carrying out this recursion (discrete-time LDS). = ˙   x(k)   0 I 0 0 0 I . we get approximation x(kh) ≈ (I + hA)k x(0) (forward Euler is never used in practice) Autonomous linear dynamical systems 9–15 Higher order linear dynamical systems x(k) = Ak−1x(k−1) + · · · + A1x(1) + A0x.Numerical integration of continuous system compute approximate solution of x = Ax. where x(m) denotes mth derivative   x  x(1)   ∈ Rnk . . block diagram: (k−1) 1/s x (k−2) 1/s x x (k) 1/s x A0 Ak−1 Ak−2 Autonomous linear dynamical systems 9–17 Mechanical systems mechanical system with k degrees of freedom undergoing small motions: M q + Dq + Kq = 0 ¨ ˙ • q(t) ∈ Rk is the vector of generalized displacements • M is the mass matrix • K is the stiffness matrix • D is the damping matrix with state x = q q ˙ x= ˙ we have q ˙ q ¨ 0 −M −1K I −M −1D = x Autonomous linear dynamical systems 9–18 . Linearization near equilibrium point nonlinear.. i.e. f (xe) = 0 (so x(t) = xe satisfies DE) now suppose x(t) is near xe. so x(t) = f (x(t)) ≈ f (xe) + Df (xe)(x(t) − xe) ˙ Autonomous linear dynamical systems 9–19 with δx(t) = x(t) − xe. rewrite as ˙ δx(t) ≈ Df (xe)δx(t) replacing ≈ with = yields linearized approximation of DE near xe ˙ we hope solution of δx = Df (xe)δx is a good approximation of x − xe (more later) Autonomous linear dynamical systems 9–20 . time-invariant differential equation (DE): x = f (x) ˙ where f : Rn → Rn suppose xe is an equilibrium point. example: pendulum l θ m mg ¨ 2nd order nonlinear DE ml2θ = −lmg sin θ rewrite as first order DE with state x = θ ˙ θ : x= ˙ x2 −(g/l) sin x1 Autonomous linear dynamical systems 9–21 equilibrium point (pendulum down): x = 0 linearized system near xe = 0: ˙ δx = 0 1 −g/l 0 δx Autonomous linear dynamical systems 9–22 . 5 0.Does linearization ‘work’ ? the linearized system usually.4 0.45 0. solutions are constant example 2: z = z 3 near ze = 0 ˙ for z(0) > 0 solutions have form z(t) = z(0)−2 − 2t (finite escape time at t = z(0)−2/2) ˙ linearized system is δz = 0.35 0.3 0. solutions are constant Autonomous linear dynamical systems 9–23 −1/2 −1/2 0. but not always.2 0.1 0.25 0. gives a good idea of the system behavior near xe example 1: x = −x3 near xe = 0 ˙ for x(0) > 0 solutions have form x(t) = x(0)−2 + 2t ˙ linearized system is δx = 0.15 0.05 0 0 z(t) δx(t) = δz(t) x(t) 10 20 30 40 50 60 70 80 90 100 t • systems with very different behavior have same linearized system • linearized systems do not predict qualitative behavior of either system Autonomous linear dynamical systems 9–24 . e. t) − f (xtraj. t) ≈ Dxf (xtraj.. i. so linearized system is called T -periodic linear system. and is near ˙ xtraj(t) • then d (x − xtraj) = f (x. t) ˙ • suppose x(t) is another trajectory. t)(x − xtraj) dt • (time-varying) LDS ˙ δx = Dxf (xtraj.Linearization along trajectory • suppose xtraj : R+ → Rn satisfies xtraj(t) = f (xtraj(t). t)δx is called linearized or variational system along trajectory xtraj Autonomous linear dynamical systems 9–25 example: linearized oscillator suppose xtraj(t) is T -periodic solution of nonlinear DE: xtraj(t) = f (xtraj(t)). t). x(t) = f (x(t). used to study: • startup dynamics of clock and oscillator circuits • effects of power supply and other disturbances on clock behavior Autonomous linear dynamical systems 9–26 . ˙ xtraj(t + T ) = xtraj(t) linearized system is where A(t) = Df (xtraj(t)) ˙ δx = A(t)δx A(t) is T -periodic. where Z : D ⊆ C → Cp×q is defined by ∞ Z(s) = 0 e−stz(t) dt • integral of matrix is done term-by-term • convention: upper case denotes Laplace transform • D is the domain or region of convergence of Z • D includes at least {s | ℜs > a}. q Solution via Laplace transform and matrix exponential 10–2 . . i = 1. . . . . .EE263 Autumn 2010-11 Stephen Boyd Lecture 10 Solution via Laplace transform and matrix exponential • Laplace transform • solving x = Ax via Laplace transform ˙ • state transition matrix • matrix exponential • qualitative behavior and stability 10–1 Laplace transform of matrix valued function suppose z : R+ → Rp×q Laplace transform: Z = L(z). j = 1. . p. where a satisfies |zij (t)| ≤ αeat for t ≥ 0. . Derivative property L(z) = sZ(s) − z(0) ˙ to derive. where x(t) ∈ Rn • take Laplace transform: sX(s) − x(0) = AX(s) • rewrite as (sI − A)X(s) = x(0) • hence X(s) = (sI − A)−1x(0) • take inverse transform x(t) = L−1 (sI − A)−1 x(0) Solution via Laplace transform and matrix exponential 10–4 . integrate by parts: ∞ L(z)(s) = ˙ 0 e−stz(t) dt ˙ ∞ = e−stz(t) t→∞ t=0 +s 0 e−stz(t) dt = sZ(s) − z(0) Solution via Laplace transform and matrix exponential 10–3 Laplace transform solution of x = Ax ˙ consider continuous-time time-invariant (TI) LDS x = Ax ˙ for t ≥ 0. .5 0 0. i. s such that det(sI − A) = 0 • Φ(t) = L−1 (sI − A)−1 is called the state-transition matrix.5 −1 −1. it maps the initial state to the state at time t: x(t) = Φ(t)x(0) (in particular.5 2 Solution via Laplace transform and matrix exponential 10–6 .5 −2 −2 −1. state x(t) is a linear function of initial state x(0)) Solution via Laplace transform and matrix exponential 10–5 Example 1: Harmonic oscillator 0 1 −1 0 x= ˙ x 2 1.Resolvent and state transition matrix • (sI − A)−1 is called the resolvent of A • resolvent defined for s ∈ C except eigenvalues of A.5 1 1.5 0 −0.5 −1 −0.5 1 0.e. 5 −2 −2 −1.5 −1 −0.5 0 −0.sI − A = s −1 1 s .5 1 0.5 2 Solution via Laplace transform and matrix exponential 10–8 .5 −1 −1.5 1 1.5 0 0. so resolvent is s s2 +1 −1 s2 +1 1 s2 +1 s s2 +1 (sI − A) (eigenvalues are ±i) state transition matrix is Φ(t) = L −1 −1 = s s2 +1 −1 s2 +1 1 s2 +1 s s2 +1 = cos t sin t − sin t cos t a rotation matrix (−t radians) so we have x(t) = cos t sin t − sin t cos t x(0) Solution via Laplace transform and matrix exponential 10–7 Example 2: Double integrator 0 1 0 0 x= ˙ x 2 1. so resolvent is 1 s 1 s2 1 s (sI − A)−1 = (eigenvalues are 0.. so eigenvalues are either real or occur in conjugate pairs • there are n eigenvalues (if we count multiplicity as roots of X ) Solution via Laplace transform and matrix exponential 10–10 .sI − A = s −1 0 s . sn) coefficient one • roots of X are the eigenvalues of A • X has real coefficients. 0) state transition matrix is Φ(t) = L −1 1 s 1 s2 1 s 0 0 = 1 t 0 1 so we have x(t) = 1 t 0 1 x(0) Solution via Laplace transform and matrix exponential 10–9 Characteristic polynomial X (s) = det(sI − A) is called the characteristic polynomial of A • X (s) is a polynomial of degree n. with leading (i.e. j entry of resolvent has form fij (s)/X (s) where fij is polynomial with degree less than n • poles of entries of resolvent must be eigenvalues of A • but not all eigenvalues of A show up as poles of each entry (when there are cancellations between det ∆ij and X (s)) Solution via Laplace transform and matrix exponential 10–11 Matrix exponential (I − C)−1 = I + C + C 2 + C 3 + · · · (if series converges) • series expansion of resolvent: (sI − A)−1 = (1/s)(I − A/s)−1 = (valid for |s| large enough) so Φ(t) = L−1 (sI − A)−1 = I + tA + Solution via Laplace transform and matrix exponential I A A2 + + 3 + ··· s s2 s (tA)2 + ··· 2! 10–12 . so i.Eigenvalues of A and poles of resolvent i. j entry of resolvent can be expressed via Cramer’s rule as (−1)i+j det ∆ij det(sI − A) where ∆ij is sI − A with jth row and ith column deleted • det ∆ij is a polynomial of degree less than n. with A ∈ Rn×n and constant. • define matrix exponential as eM = I + M + M2 + ··· 2! for M ∈ Rn×n (which in fact converges for all M ) • with this definition. .• looks like ordinary power series e at (ta)2 = 1 + ta + + ··· 2! with square matrices instead of scalars . with a ∈ R and constant. is ˙ x(t) = etAx(0) generalizes scalar case: solution of x = ax. . is ˙ x(t) = etax(0) Solution via Laplace transform and matrix exponential 10–14 . state-transition matrix is Φ(t) = L−1 (sI − A)−1 = etA Solution via Laplace transform and matrix exponential 10–13 Matrix exponential solution of autonomous LDS solution of x = Ax. • matrix exponential is meant to look like scalar exponential • some things you’d guess hold for the matrix exponential (by analogy with the scalar exponential) do in fact hold • but many things you’d guess are wrong example: you might guess that eA+B = eAeB .84 −0. we do have eA+B = eAeB if AB = BA.40 −0.16 1. A and B commute thus for t. B= eA = eA+B = 0.84 0.16 = eAeB = 0. e(tA+sA) = etAesA with s = −t we get etAe−tA = etA−tA = e0 = I so etA is nonsingular.54 1. s ∈ R.70 0.e. with inverse etA −1 = e−tA Solution via Laplace transform and matrix exponential 10–16 . eB = 0. but it’s false (in general) 0 1 −1 0 0 1 0 0 1 1 0 1 A= .30 Solution via Laplace transform and matrix exponential 10–15 however.54 .. i.54 0.38 −0.84 −0. example: let’s find eA. plugging in t = 1. where A = we already found 0 1 0 0 etA = L−1(sI − A)−1 = 1 1 0 1 1 t 0 1 so. for any t and τ . we get eA = let’s check power series: A2 e =I +A+ + ··· = I + A 2! A since A2 = A3 = · · · = 0 Solution via Laplace transform and matrix exponential 10–17 Time transfer property for x = Ax we know ˙ x(t) = Φ(t)x(0) = etAx(0) interpretation: the matrix etA propagates initial condition into state at time t more generally we have. x(τ + t) = etAx(τ ) (to see this. apply result above to z(t) = x(t + τ )) interpretation: the matrix etA propagates state t seconds forward in time (backward if t < 0) Solution via Laplace transform and matrix exponential 10–18 . a discrete-time LDS (called discretized version of continuous-time system) Solution via Laplace transform and matrix exponential 10–20 . for small t: x(τ + t) ≈ x(τ ) + tx(τ ) = (I + tA)x(τ ) ˙ • exact solution is x(τ + t) = etAx(τ ) = (I + tA + (tA)2/2! + · · ·)x(τ ) • forward Euler is just first two terms in series Solution via Laplace transform and matrix exponential 10–19 Sampling a continuous-time system suppose x = Ax ˙ sample x at times t1 ≤ t2 ≤ · · ·: define z(k) = x(tk ) then z(k + 1) = e(tk+1−tk )Az(k) for uniform sampling tk+1 − tk = h.• recall first order (forward Euler) approximate state update. so z(k + 1) = ehAz(k). . and denoted Φ(t)) Solution via Laplace transform and matrix exponential 10–21 Qualitative behavior of x(t) suppose x = Ax. X(s) = (sI − A)−1x(0) ith component Xi(s) has form Xi(s) = where ai is a polynomial of degree < n thus the poles of Xi are all eigenvalues of A (but not necessarily the other way around) ai(s) X (s) Solution via Laplace transform and matrix exponential 10–22 .Piecewise constant system consider time-varying LDS x = A(t)x. ti+1] we have x(t) = e(t−ti)Ai · · · e(t3−t2)A2 e(t2−t1)A1 et1A0 x(0) (matrix on righthand side is called state transition matrix for system. where 0 < t1 < t2 < · · · (sometimes called jump linear system) for t ∈ [ti. x(t) ∈ Rn ˙ then x(t) = etAx(0). with ˙   A0 0 ≤ t < t1 A1 t1 ≤ t < t2 A(t) =  . first assume eigenvalues λi are distinct. or exponential decay rate (if < 0) of term • ℑλj gives frequency of oscillatory term (if = 0) eigenvalues ℑs ℜs Solution via Laplace transform and matrix exponential 10–24 . so Xi(s) cannot have repeated poles then xi(t) has form xi(t) = j=1 n βij eλj t where βij depend on x(0) (linearly) eigenvalues determine (possible) qualitative behavior of x: • eigenvalues give exponents that can occur in exponentials • real eigenvalue λ corresponds to an exponentially decaying or growing term eλt in solution • complex eigenvalue λ = σ + jω corresponds to decaying or growing sinusoidal term eσt cos(ωt + φ) in solution Solution via Laplace transform and matrix exponential 10–23 • ℜλj gives exponential growth rate (if > 0). . . . . respectively (n1 + · · · + nr = n) then xi(t) has form xi(t) = r pij (t)eλj t j=1 where pij (t) is a polynomial of degree < nj (that depends linearly on x(0)) Solution via Laplace transform and matrix exponential 10–25 Stability we say system x = Ax is stable if etA → 0 as t → ∞ ˙ meaning: • state x(t) converges to 0. n Solution via Laplace transform and matrix exponential 10–26 . no matter what x(0) is • all trajectories of x = Ax converge to 0 as t → ∞ ˙ fact: x = Ax is stable if and only if all eigenvalues of A have negative real ˙ part: ℜλi < 0. . . . . λr (distinct) with multiplicities n1. . i = 1. nr .now suppose A has repeated eigenvalues. . . as t → ∞. . so Xi can have repeated poles express eigenvalues as λ1. maxi ℜλi determines the maximum asymptotic logarithmic growth rate of x(t) (or decay.the ‘if’ part is clear since t→∞ lim p(t)eλt = 0 for any polynomial. if ℜλ < 0 we’ll see the ‘only if’ part next lecture more generally. if < 0) Solution via Laplace transform and matrix exponential 10–27 . i.EE263 Autumn 2010-11 Stephen Boyd Lecture 11 Eigenvectors and diagonalization • eigenvectors • dynamic interpretation: invariant sets • complex eigenvectors & invariant planes • left eigenvectors • diagonalization • modal form • discrete-time stability 11–1 Eigenvectors and eigenvalues λ ∈ C is an eigenvalue of A ∈ Cn×n if X (λ) = det(λI − A) = 0 equivalent to: • there exists nonzero v ∈ Cn s.e. i.t. (λI − A)v = 0.t.. wT A = λwT any such w is called a left eigenvector of A Eigenvectors and diagonalization 11–2 . wT (λI − A) = 0.e. Av = λv any such v is called an eigenvector of A (associated with eigenvalue λ) • there exists nonzero w ∈ Cn s.. λ ∈ R. effect of A on v is very simple: scaling by λ Ax v x Av (what is λ here?) Eigenvectors and diagonalization 11–4 . with A ∈ Rn×n. then so is αv. for any α ∈ C. and v ∈ Cn. if they are nonzero (and at least one is) • conjugate symmetry : if A is real and v ∈ Cn is an eigenvector associated with λ ∈ C. . we’ll consider λ ∈ C later) if v is an eigenvector. Eigenvectors and diagonalization 11–3 Scaling interpretation (assume λ ∈ R for now. then Aℜv = λℜv. then v is an eigenvector associated with λ: taking conjugate of Av = λv we get Av = λv. we can always find a real eigenvector v associated with λ: if Av = λv. Aℑv = λℑv so ℜv and ℑv are real eigenvectors. so Av = λv we’ll assume A is real from now on . α = 0 • even when A is real. .• if v is an eigenvector of A with eigenvalue λ. eigenvalue λ and eigenvector v can be complex • when A and λ are real. .g.. ) Eigenvectors and diagonalization 11–5 Dynamic interpretation suppose Av = λv. . then x(t) = eλtv ˙ several ways to see this.• λ ∈ R. λ < 0: v and Av point in opposite directions • λ ∈ R.and discrete-time systems. e. |λ| < 1: Av smaller than v • λ ∈ R. |λ| > 1: Av larger than v (we’ll see later how this relates to stability of continuous. λ > 0: v and Av point in same direction • λ ∈ R. x(t) = etAv = I + tA + (tA)2 + ··· v 2! (λt)2 = v + λtv + v + ··· 2! = eλtv (since (tA)k v = (λt)k v) Eigenvectors and diagonalization 11–6 . v = 0 if x = Ax and x(0) = v. solution is complex (we’ll interpret later). for now.• for λ ∈ C. mode contracts or shrinks as t ↑ • for λ ∈ R.e. mode expands or grows as t ↑ Eigenvectors and diagonalization 11–7 Invariant sets a set S ⊆ Rn is invariant under x = Ax if whenever x(t) ∈ S. resulting motion is very simple — always on the line spanned by v • solution x(t) = eλtv is called mode of system x = Ax (associated with ˙ eigenvalue λ) • for λ ∈ R. λ > 0. assume λ∈R • if initial state is an eigenvector v. it stays in S trajectory S vector field interpretation: trajectories only cut into S. λ < 0. never out Eigenvectors and diagonalization 11–8 . then ˙ x(τ ) ∈ S for all τ ≥ t i.: once trajectory enters S. a = α + jβ vre vim cos ωt sin ωt − sin ωt cos ωt α −β • σ gives logarithmic growth/decay factor • trajectory stays in invariant plane span{vre. λ ∈ R • line { tv | t ∈ R } is invariant (in fact. vim} • ω gives angular velocity of rotation in plane Eigenvectors and diagonalization 11–10 .suppose Av = λv. λ = σ + jω. ray { tv | t > 0 } is invariant) • if λ < 0. v = 0. λ is complex for a ∈ C. (complex) trajectory aeλtv satisfies x = Ax ˙ hence so does (real) trajectory x(t) = ℜ aeλtv = eσt where v = vre + jvim. line segment { tv | 0 ≤ t ≤ a } is invariant Eigenvectors and diagonalization 11–9 Complex eigenvectors suppose Av = λv. v = 0. g. λ < 0. e.Dynamic interpretation: left eigenvectors suppose wT A = λwT . remains on line or in plane) • left eigenvectors give linear functions of state that are simple. w = 0 d T (w x) = wT x = wT Ax = λ(wT x) ˙ dt i.. for any initial condition Eigenvectors and diagonalization 11–12 .. λ ∈ R.e. wT x is simple then • for λ = σ + jω ∈ C..e. halfspace { z | wT z ≤ a } is invariant (for a ≥ 0) Eigenvectors and diagonalization 11–11 Summary • right eigenvectors are initial conditions from which resulting motion is simple (i. wT x satisfies the DE d(wT x)/dt = λ(wT x) hence wT x(t) = eλtwT x(0) • even if trajectory x is complicated. (ℜw)T x and (ℑw)T x both have form eσt (α cos(ωt) + β sin(ωt)) • if. 5 t 3 3.5 t 3 3.5 2 2.5 4 4.5 2 2.5 0 0.5 1 1.5 5 x3 0.5 5 x2 0 −0.5 0.5 −1 0 1 0. −1 −10 −10 0 0 x example 1: x =  1 ˙ 0 1 0 block diagram: x1 −1  1/s 1/s x2 −10 1/s x3 −10 X (s) = s3 + s2 + 10s + 10 = (s + 1)(s2 + 10) √ eigenvalues are −1.5 t 3 3. ± j 10 Eigenvectors and diagonalization 11–13 trajectory with x(0) = (0.5 0 −0.5 1 1.5 5 Eigenvectors and diagonalization 11–14 .5 4 4. 1): 2 x1 1 0 −1 0 0.5 4 4.5 2 2.5 1 1. −1. 8 0.554 + j0.077 so an invariant plane is spanned by  −0.left eigenvector asssociated with eigenvalue −1 is  0. 0.6  gT x 0.7 0.5 2 2.055   0.2 0.244  .5 3 3.9 0.175  0.5 5 t Eigenvectors and diagonalization 11–15 √ eigenvector associated with eigenvalue j 10 is  −0.055 − j0. −1.4 0.5 0.771 =  0.244 + j0.1 0 0 0.3 0.771 v =  0.175  −0.5 1 1.1 g= 0  1 let’s check g T x(t) when x(0) = (0. 1) (as above): 1 0.554 vre =  0.5 4 4.077   vim Eigenvectors and diagonalization 11–16 . .5 0 0.5 1 1.5 2 2.5 1 1.e.5 5 Eigenvectors and diagonalization 11–17 Example 2: Markov chain probability distribution satisfies p(t + 1) = P p(t) pi(t) = Prob( z(t) = i ) so n i=1 pi (t) =1 n i=1 Pij Pij = Prob( z(t + 1) = i | z(t) = j ).for example.5 0.5 t 3 3. so (such matrices are called stochastic) rewrite as: =1 [1 1 · · · 1]P = [1 1 · · · 1] i.1 0.5 1 1.5 2 2. 1 hence det(I − P ) = 0. i. [1 1 · · · 1] is a left eigenvector of P with e.5 t 3 3.v.5 t 3 3. with x(0) = vre we have 1 x1 0 −1 0 0.5 2 2.5 4 4.5 4 4. hence we can n normalize v so that i=1 vi = 1 interpretation: v is an equilibrium distribution. if p(0) = v then p(t) = v for all t ≥ 0 (if v is unique it is called the steady-state distribution of the Markov chain) Eigenvectors and diagonalization 11–18 .5 4 4.1 0 0..5 5 x3 0 −0. so there is a right eigenvector v = 0 with P v = v it can be shown that v can be chosen so that vi ≥ 0.5 5 x2 0 −0.e. . . . i = 1. . so AT = T Λ and finally T −1AT = Λ Eigenvectors and diagonalization 11–19 • similarity transformation by T diagonalizes A conversely if there is a T = [v1 · · · vn] s. λn   A v1 · · · vn = v1 · · · vn define T = v1 · · · vn and Λ = diag(λ1..e. . . λn). vn linearly independent T −1AT = Λ = diag(λ1. . T −1AT = Λ is diagonal • A has a set of n linearly independent eigenvectors (if A is not diagonalizable. . . . • T invertible since v1. λn) then AT = T Λ. i. it is sometimes called defective) Eigenvectors and diagonalization 11–20 .Diagonalization suppose v1. . Avi = λivi. i = 1.. . vn is a linearly independent set of n eigenvectors of A we say A is diagonalizable if • there exists T s. . . . . vn is a linearly independent set of eigenvectors of A ∈ Rn×n: Avi = λivi. . .. .t. n express as   λ1 . . . . . n so v1. . . . .t. . Not all matrices are diagonalizable example: A = 0 1 0 0 characteristic polynomial is X (s) = s2.. so λ = 0 is only eigenvalue eigenvectors satisfy Av = 0v = 0. λi = λj for i = j. A cannot have two independent eigenvectors Eigenvectors and diagonalization 11–21 Distinct eigenvalues fact: if A has distinct eigenvalues.e. i. then A is diagonalizable (the converse is false — A can have repeated eigenvalues but still be diagonalizable) Eigenvectors and diagonalization 11–22 . 0 1 0 0 v1 v2 =0 so all eigenvectors have form v = v1 0 where v1 = 0 thus.e. i. normalized so that T wi vj = δij (i.  . T T wn wn  T T where w1 .Diagonalization and left eigenvectors rewrite T −1AT = Λ as T −1A = ΛT −1. so ˜ ˙ T x = AT x ˜ ˜ ⇔ ˙ x = T −1AT x ˜ ˜ ⇔ ˙ x = Λ˜ ˜ x Eigenvectors and diagonalization 11–24 . indep. . the rows of T −1 are (lin.. wn are the rows of T −1 thus T T wi A = λ i w i i. A = Λ . .) left eigenvectors.e. . . or   T  T w1 w1  .. . left & right eigenvectors chosen this way are dual bases) Eigenvectors and diagonalization 11–23 Modal form suppose A is diagonalizable by T define new coordinates by x = T x.e. . . system is diagonal (decoupled): 1/s λ1 x1 ˜ 1/s λn xn ˜ trajectories consist of n independent modes. Mr+3. and Mi = σi ωi −ωi σi . i. system can be put in real modal form: S −1AS = diag (Λr .e. λr ) are the real eigenvalues. r + 3. . i = r + 1. .. xi(t) = eλitxi(0) ˜ ˜ hence the name modal form Eigenvectors and diagonalization 11–25 Real modal form when eigenvalues (hence T ) are complex. λi = σi + jωi.in new coordinate system. . Mn−1) where Λr = diag(λ1. Mr+1. . . . . n where λi are the complex eigenvalues (one from each conjugate pair) Eigenvectors and diagonalization 11–26 . . . . .g.. .. resolvent: (sI − A)−1 = = sT T −1 − T ΛT −1 T (sI − Λ)T −1 −1 −1 = T (sI − Λ)−1T −1 = T diag powers (i.. . all λi = 0) Eigenvectors and diagonalization 11–28 ... . T −1 s − λ1 s − λn = T Λk T −1 T ΛT −1 · · · T ΛT −1 = T diag(λk ..e. discrete-time solution): Ak = = T ΛT −1 k 1 1 .e..block diagram of ‘complex mode’: σ 1/s −ω ω 1/s σ Eigenvectors and diagonalization 11–27 diagonalization simplifies many matrix expressions e. i. λk )T −1 1 n (for k < 0 only if A invertible. i.e. eλn )T −1 Eigenvectors and diagonalization 11–29 Analytic function of a matrix for any analytic function f : R → R. . . . we have f (A) = β0I + β1A + β2A2 + β3A3 + · · · = β0T T −1 + β1T ΛT −1 + β2(T ΛT −1)2 + · · · = T β0I + β1Λ + β2Λ2 + · · · T −1 = T diag(f (λ1). . continuous-time solution): eA = I + A + A2/2! + · · · = I + T ΛT −1 + T ΛT −1 = T eΛT −1 2 = T (I + Λ + Λ2/2! + · · ·)T −1 /2! + · · · = T diag(eλ1 .. .. .exponential (i. .e.. . f (λn))T −1 Eigenvectors and diagonalization 11–30 .e. given by power series f (a) = β0 + β1a + β2a2 + β3a3 + · · · we can define f (A) for A ∈ Rn×n (i. overload f ) as f (A) = β0I + β1A + β2A2 + β3A3 + · · · substituting A = T ΛT −1. with T −1AT = Λ ˙ then x(t) = etAx(0) = T eΛtT −1x(0) n = i=1 T eλit(wi x(0))vi thus: any trajectory can be expressed as linear combination of modes Eigenvectors and diagonalization 11–31 interpretation: • (left eigenvectors) decompose initial state x(0) into modal components T wi x(0) • eλit term propagates ith mode forward t seconds • reconstruct state as linear combination of (right) eigenvectors Eigenvectors and diagonalization 11–32 .Solution via diagonalization assume A is diagonalizable consider LDS x = Ax. . ℜλs < 0. . or equivalently. . .application: for what x(0) do we have x(t) → 0 as t → ∞? divide eigenvalues into those with negative real parts ℜλ1 < 0. . . . ℜλn ≥ 0 from x(t) = i=1 n T eλit(wi x(0))vi condition for x(t) → 0 is: x(0) ∈ span{v1. T wi x(0) = 0. vs}. . . i = s + 1. . . . n. then Ak = T Λk T −1 then x(t) = A x(0) = i=1 t n T λt(wi x(0))vi → 0 i as t → ∞ for all x(0) if and only if |λi| < 1. and the others. . . i = 1. so we have fact: x(t + 1) = Ax(t) is stable if and only if all eigenvalues of A have magnitude less than one Eigenvectors and diagonalization 11–34 . . . . n (can you prove this?) Eigenvectors and diagonalization 11–33 Stability of discrete-time systems suppose A diagonalizable consider discrete-time LDS x(t + 1) = Ax(t) if A = T ΛT −1. . . ℜλs+1 ≥ 0. we will see later that this is true even when A is not diagonalizable. . ..   ∈ Cni×ni 1  λi q i=1 ni ) 12–2 .. i..e.  J1 . ..EE263 Autumn 2010-11 Stephen Boyd Lecture 12 Jordan canonical form • Jordan canonical form • generalized modes • Cayley-Hamilton theorem 12–1 Jordan canonical form what if A cannot be diagonalized? any matrix A ∈ Rn×n can be put in Jordan canonical form by a similarity transformation.. Jq    T −1AT = J =  where  λi 1 λi is called a Jordan block of size ni with eigenvalue λi (so n = Jordan canonical form  Ji =   . we can determine the sizes of the Jordan blocks associated with λ Jordan canonical form 12–4 .• J is upper bidiagonal • J diagonal is the special case of n Jordan blocks of size ni = 1 • Jordan form is unique (up to permutations of the blocks) • can have multiple blocks with same eigenvalue Jordan canonical form 12–3 note: JCF is a conceptual tool. dim N (λI − A)k = λi =λ min{k. 2. . . . never used in numerical computations! X (s) = det(sI − A) = (s − λ1)n1 · · · (s − λq )nq hence distinct eigenvalues ⇒ ni = 1 ⇒ A diagonalizable dim N (λI − A) is the number of Jordan blocks with eigenvalue λ more generally. ni} so from dim N (λI − A)k for k = 1. v. .• factor out T and T −1. . . . . λi for j = 2.e. the first column of each Ti is an eigenvector associated with e.. ni. vini are sometimes called generalized eigenvectors Jordan canonical form 12–6 . i. size 3. . Avij = vi j−1 + λivij the vectors vi1. . . . . for k ≥ 2) (λi − λj )k k 0 (λiI−Jj ) =  0   −k(λi − λj )k−1 (k(k − 1)/2)(λi − λj )k−2  (λj − λi)k −k(λj − λi)k−1 k 0 (λj − λi) Jordan canonical form 12–5 Generalized eigenvectors suppose T −1AT = J = diag(J1. say. Jq ) express T as where Ti ∈ Cn×ni T = [T1 T2 · · · Tq ] are the columns of T associated with ith Jordan block Ji we have ATi = TiJi let Ti = [vi1 vi2 · · · vini ] then we have: Avi1 = λivi1. a block of size 3:  0 −1 0 0 −1  λiI−Ji =  0 0 0 0   0 0 1 (λiI−Ji)2 =  0 0 0  0 0 0  (λiI−Ji)3 = 0 • for other blocks (say. λI − A = T (λI − J)T −1 • for. . −1 (s − λ)   −1  s−λ = (s − λ)−1I + (s − λ)−2F1 + · · · + (s − λ)−k Fk−1 where Fi is the matrix with ones on the ith upper diagonal Jordan canonical form 12–8 . can put into form x = J x ˜ ˜ ˜ ˙ system is decomposed into independent ‘Jordan block systems’ xi = Jixi ˜ ˜ xni ˜ xni−1 ˜ x1 ˜ 1/s λ 1/s λ 1/s λ Jordan blocks are sometimes called Jordan chains (block diagram shows why) Jordan canonical form 12–7 Resolvent.  (s − λ)−1 (s − λ)−2 · · · (s − λ)−k  (s − λ)−1 · · · (s − λ)−k+1   =  . .. exponential of Jordan block resolvent of k × k Jordan block with eigenvalue λ:  s−λ −1 s−λ −1  (sI − Jλ)−1 =    .....Jordan form LDS consider LDS x = Ax ˙ ˙ by change of coordinates x = T x. .   ..   . where p is polynomial • such solutions are called generalized modes of the system Jordan canonical form 12–10 . . with ˙ x(0) = a1vi1 + · · · + ani vini = Tia then x(t) = T eJtx(0) = TieJita ˜ • trajectory stays in span of generalized eigenvectors • coefficients have form p(t)eλt. 1 Jordan blocks yield: • repeated poles in resolvent • terms of form tpetλ in etA Jordan canonical form 12–9 Generalized modes consider x = Ax.by inverse Laplace transform... exponential is: etJλ = etλ I + tF1 + · · · + (tk−1/(k − 1)!)Fk−1   1 t · · · tk−1/(k − 1)!  1 · · · tk−2/(k − 2)!   = etλ  . where X (s) = det(sI − A) example: with A = 1 2 3 4 we have X (s) = s2 − 5s − 2.with general x(0) we can write q x(t) = e x(0) = T e T where T −1 tA tJ −1 x(0) = i=1 T TietJi (Si x(0))  T S1 . we define p(A) = a0I + a1A + · · · + ak Ak Cayley-Hamilton theorem: for any A ∈ Rn×n we have X (A) = 0. = .  T Sq  hence: all solutions of x = Ax are linear combinations of (generalized) ˙ modes Jordan canonical form 12–11 Cayley-Hamilton theorem if p(s) = a0 + a1s + · · · + ak sk is a polynomial and A ∈ Rn×n. so X (A) = A2 − 5A − 2I = = 0 7 10 15 22 −5 1 2 3 4 − 2I Jordan canonical form 12–12 . we have Ap ∈ span I. also for p ∈ Z) i. . inverse is linear combination of Ak . . . . . .. n − 1 Jordan canonical form 12–14 .e. A2. An−1 (and if A is invertible. . . An−1 proof: divide X (s) into sp to get sp = q(s)X (s) + r(s) r = α0 + α1s + · · · + αn−1sn−1 is remainder polynomial then Ap = q(A)X (A) + r(A) = r(A) = α0I + α1A + · · · + αn−1An−1 Jordan canonical form 12–13 for p = −1: rewrite C-H theorem X (A) = An + an−1An−1 + · · · + a0I = 0 as I = A −(a1/a0)I − (a2/a0)A − · · · − (1/a0)An−1 (A is invertible ⇔ a0 = 0) so A−1 = −(a1/a0)I − (a2/a0)A − · · · − (1/a0)An−1 i. A. . .corollary: for every p ∈ Z+. every power of A can be expressed as linear combination of I. A. .e. .. k = 0. 0) = 0 Jordan canonical form 12–15 now let’s do general case: T −1AT = J X (s) = (s − λ1)n1 · · · (s − λq )nq suffices to show X (Ji) = 0 ni 0 1 0 ··· X (Ji) = (Ji − λ1I)n1 · · ·  0 0 1 · · ·  · · · (Ji − λq I)nq = 0 . . (Ji −λi I)ni  Jordan canonical form 12–16 . . . .Proof of C-H theorem first assume A is diagonalizable: T −1AT = Λ X (s) = (s − λ1) · · · (s − λn) since X (A) = X (T ΛT −1) = T X (Λ)T −1 it suffices to show X (Λ) = 0 X (Λ) = (Λ − λ1I) · · · (Λ − λnI) = diag(0.. .. . λ2 − λ1. λn − λ1) · · · diag(λ1 − λn. . λn−1 − λn. . EE263 Autumn 2010-11 Stephen Boyd Lecture 13 Linear dynamical systems with inputs & outputs • inputs & outputs: interpretations • transfer function • impulse and step responses • examples 13–1 Inputs & outputs recall continuous-time time-invariant LDS has form x = Ax + Bu. with B ∈ R2×1: x(t) (with u(t) = 1) ˙ x(t) (with u(t) = −1. ˙ • Ax is called the drift term (of x) ˙ • Bu is called the input term (of x) ˙ picture.5) Ax(t) ˙ B x(t) y = Cx + Du Linear dynamical systems with inputs & outputs 13–2 . where B = [b1 · · · bm] ˙ • state derivative is sum of autonomous term (Ax) and one term per input (biui) • each input ui gives another degree of freedom for x (assuming columns ˙ of B independent) bi ˜ i bi write x = Ax + Bu as xi = aT x + ˜T u. where aT . ˜T are the rows of A. B ˙ ˙ ˜i • ith state derivative is linear function of state x and input u Linear dynamical systems with inputs & outputs 13–3 Block diagram D x(t) ˙ x(t) u(t) B 1/s C y(t) A • Aij is gain factor from state xj into integrator i • Bij is gain factor from input uj into integrator i • Cij is gain factor from state xj into output yi • Dij is gain factor from input uj into output yi Linear dynamical systems with inputs & outputs 13–4 .Interpretations write x = Ax + b1u1 + · · · + bmum. i. e.e.g. y= C1 C2 x1 x2 B1 1/s A11 A12 1/s A22 x1 C1 y x2 C2 • x2 is not affected by input u... x2 propagates autonomously • x2 affects y directly and through x1 Linear dynamical systems with inputs & outputs 13–5 Transfer function take Laplace transform of x = Ax + Bu: ˙ sX(s) − x(0) = AX(s) + BU (s) hence X(s) = (sI − A)−1x(0) + (sI − A)−1BU (s) so x(t) = e x(0) + 0 tA t e(t−τ )ABu(τ ) dτ • etAx(0) is the unforced or autonomous response • etAB is called the input-to-state impulse response or impulse matrix • (sI − A)−1B is called the input-to-state transfer function or transfer matrix Linear dynamical systems with inputs & outputs 13–6 . with x1 ∈ Rn1 .interesting when there is structure. x2 ∈ Rn2 : d dt x1 x2 = A11 A12 0 A22 u x1 x2 + B1 0 u. y =h∗u where ∗ is convolution (of matrix valued functions) intepretation: • Hij is transfer function from input uj to output yi Linear dynamical systems with inputs & outputs 13–8 .with y = Cx + Du we have: Y (s) = C(sI − A)−1x(0) + (C(sI − A)−1B + D)U (s) so y(t) = Ce x(0) + 0 tA t Ce(t−τ )ABu(τ ) dτ + Du(t) • output term CetAx(0) due to initial condition • H(s) = C(sI − A)−1B + D is called the transfer function or transfer matrix • h(t) = CetAB + Dδ(t) is called the impulse response or impulse matrix (δ is the Dirac delta function) Linear dynamical systems with inputs & outputs 13–7 with zero initial condition we have: Y (s) = H(s)U (s). τ indexes time lag Linear dynamical systems with inputs & outputs 13–9 Step response the step response or step matrix is given by t s(t) = 0 h(τ ) dτ interpretations: • sij (t) is step response from jth input to ith output • sij (t) gives yi when u = ej for t ≥ 0 for invertible A. i. τ seconds ago • i indexes output.Impulse response impulse response h(t) = CetAB + Dδ(t) with x(0) = 0. we have s(t) = CA−1 etA − I B + D Linear dynamical systems with inputs & outputs 13–10 . m t yi(t) = j=1 0 hij (t − τ )uj (τ ) dτ interpretations: • hij (t) is impulse response from jth input to ith output • hij (t) gives yi when u(t) = ej δ • hij (τ ) shows how dependent output i is. y = h ∗ u.e. on what input j was.. j indexes input. 2.3 • x= y y ˙ 13–11 Linear dynamical systems with inputs & outputs system is:  0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 −2 1 0 −2 1 0 1 −2 1 1 −2 1 0 1 −2 0 1 −2   0 0 0 0 0 0 1 0 −1 1 0 −1            x= ˙          x +        u1 u2 eigenvalues of A are −1. dampers • u1 is tension between 1st & 2nd masses • u2 is tension between 2nd & 3rd masses • y ∈ R3 is displacement of masses 1. springs.71 ± j0.29 ± j0.Example 1 u1 u1 u2 u2 • unit masses. −0.71 Linear dynamical systems with inputs & outputs 13–12 . −1.00.00 ± j1.71. 2 0 t 10 15 roughly speaking: • impulse at u1 affects third mass less than other two • impulse at u2 affects first mass later than other two Linear dynamical systems with inputs & outputs 13–13 Example 2 interconnect circuit: C3 u C1 C2 C4 • u(t) ∈ R is input (drive) voltage • xi is voltage across Ci • output is state: y = x • unit resistors.2 0.impulse response: impulse response from u1 0.2 0.2 0 h11 h31 h21 t impulse response from u2 h22 h12 h32 5 5 10 15 0. unit capacitors • step response matrix shows delay to each node Linear dynamical systems with inputs & outputs 13–14 .1 −0.1 0 −0.1 0 −0.1 −0. e.21. longest delay to x4 • delays ≈ 10. 0  0 y=x eigenvalues of A are −0. dominant) eigenvalue −0.5 s1 s2 s3 0.1 0 0 5 10 15 • shortest delay to x1.17.8 0.system is   −3 1 1 0  1 −1  0 0  x +  x= ˙   1 0 −2 1  0 0 1 −1   1 0   u.4 s4 0.6 0. −2. −3. −0.66.2 0.. consistent with slowest (i.9 0.7 0.17 Linear dynamical systems with inputs & outputs 13–16 .96 Linear dynamical systems with inputs & outputs 13–15 step response matrix s(t) ∈ R4×1: 1 0.3 0. u. ˙ eliminate x to get y = H(0)u y = Cx + Du Linear dynamical systems with inputs & outputs 13–17 • if system is stable.e. i.DC or static gain matrix • transfer function at s = 0 is H(0) = −CA−1B + D ∈ Rm×p • DC transfer function describes system under static conditions. y constant: 0 = x = Ax + Bu. s(t) = 0 h(τ ) dτ ) if u(t) → u∞ ∈ Rm. ∞ H(0) = 0 ∞ h(t) dt = lim s(t) t→∞ t (recall: H(s) = 0 e −st h(t) dt. then y(t) → y∞ ∈ Rp where y∞ = H(0)u∞ Linear dynamical systems with inputs & outputs 13–18 . x.. DC gain matrix for example 1 (springs):  1/4 1/4 1/2  H(0) =  −1/2 −1/4 −1/4 DC gain matrix for example 2 (RC circuit):  1  1  H(0) =    1  1  (do these make sense?)  Linear dynamical systems with inputs & outputs 13–19 Discretization with piecewise constant inputs linear system x = Ax + Bu, y = Cx + Du ˙ suppose ud : Z+ → Rm is a sequence, and u(t) = ud(k) define sequences xd(k) = x(kh), yd(k) = y(kh), k = 0, 1, . . . for kh ≤ t < (k + 1)h, k = 0, 1, . . . • h > 0 is called the sample interval (for x and y) or update interval (for u) • u is piecewise constant (called zero-order-hold) • xd, yd are sampled versions of x, y Linear dynamical systems with inputs & outputs 13–20 xd(k + 1) = x((k + 1)h) h = e hA x(kh) + 0 eτ ABu((k + 1)h − τ ) dτ h = ehAxd(k) + 0 eτ A dτ B ud(k) xd, ud, and yd satisfy discrete-time LDS equations xd(k + 1) = Adxd(k) + Bdud(k), where h yd(k) = Cdxd(k) + Ddud(k) Ad = e hA , Bd = 0 eτ A dτ B, Cd = C, Dd = D 13–21 Linear dynamical systems with inputs & outputs called discretized system if A is invertible, we can express integral as h eτ A dτ = A−1 ehA − I 0 stability: if eigenvalues of A are λ1, . . . , λn, then eigenvalues of Ad are ehλ1 , . . . , ehλn discretization preserves stability properties since ℜλi < 0 ⇔ for h > 0 Linear dynamical systems with inputs & outputs 13–22 ehλi < 1 extensions/variations: • offsets: updates for u and sampling of x, y are offset in time • multirate: ui updated, yi sampled at different intervals (usually integer multiples of a common interval h) both very common in practice Linear dynamical systems with inputs & outputs 13–23 Dual system the dual system associated with system x = Ax + Bu, ˙ is given by z = AT z + C T v, ˙ • all matrices are transposed • role of B and C are swapped transfer function of dual system: (B T )(sI − AT )−1(C T ) + DT = H(s)T where H(s) = C(sI − A)−1B + D Linear dynamical systems with inputs & outputs 13–24 y = Cx + Du w = B T z + DT v (for SISO case, TF of dual is same as original) eigenvalues (hence stability properties) are the same Linear dynamical systems with inputs & outputs 13–25 Dual via block diagram in terms of block diagrams, dual is formed by: • transpose all matrices • swap inputs and outputs on all boxes • reverse directions of signal flow arrows • swap solder joints and summing junctions Linear dynamical systems with inputs & outputs 13–26 original system: D u(t) B 1/s x(t) C y(t) A dual system: DT w(t) BT z(t) 1/s AT CT v(t) Linear dynamical systems with inputs & outputs 13–27 Causality interpretation of t x(t) = e x(0) + 0 tA e(t−τ )ABu(τ ) dτ t y(t) = CetAx(0) + 0 Ce(t−τ )ABu(τ ) dτ + Du(t) for t ≥ 0: current state (x(t)) and output (y(t)) depend on past input (u(τ ) for τ ≤ t) i.e., mapping from input to state and output is causal (with fixed initial state) Linear dynamical systems with inputs & outputs 13–28 now consider fixed final state x(T ): for t ≤ T , t x(t) = e(t−T )Ax(T ) + T e(t−τ )ABu(τ ) dτ, i.e., current state (and output) depend on future input! so for fixed final condition, same system is anti-causal Linear dynamical systems with inputs & outputs 13–29 Idea of state x(t) is called state of system at time t since: • future output depends only on current state and future input • future output depends on past input only through current state • state summarizes effect of past inputs on future output • state is bridge between past inputs and future outputs Linear dynamical systems with inputs & outputs 13–30 Change of coordinates start with LDS x = Ax + Bu, y = Cx + Du ˙ change coordinates in Rn to x, with x = T x ˜ ˜ then ˙ x = T −1x = T −1(Ax + Bu) = T −1AT x + T −1Bu ˜ ˙ ˜ hence LDS can be expressed as ˜x ˜ ˙ x = A˜ + Bu, ˜ where ˜ A = T −1AT, ˜ B = T −1B, ˜ C = CT, ˜ D=D ˜˜ ˜ y = C x + Du TF is same (since u, y aren’t affected): ˜ ˜ ˜ ˜ C(sI − A)−1B + D = C(sI − A)−1B + D Linear dynamical systems with inputs & outputs 13–31 Standard forms for LDS can change coordinates to put A in various forms (diagonal, real modal, Jordan . . . ) e.g., to put LDS in diagonal form, find T s.t. T −1AT = diag(λ1, . . . , λn) write  ˜T b1 . T −1B =  .  , ˜T bn  ˙ xi = λixi + ˜T u, ˜ ˜ bi CT = c1 · · · cn ˜ ˜ n so y= i=1 cixi ˜˜ Linear dynamical systems with inputs & outputs 13–32 ˜T b1 1/s λ1 x1 ˜ c1 ˜ u y ˜T bn 1/s λn xn ˜ cn ˜ (here we assume D = 0) Linear dynamical systems with inputs & outputs 13–33 Discrete-time systems discrete-time LDS: x(t + 1) = Ax(t) + Bu(t), D u(t) x(t + 1) 1/z B A • only difference w/cts-time: z instead of s • interpretation of z −1 block: – unit delayor (shifts sequence back in time one epoch) – latch (plus small delay to avoid race condition) Linear dynamical systems with inputs & outputs 13–34 y(t) = Cx(t) + Du(t) x(t) C y(t) we have: x(1) = Ax(0) + Bu(0). for t ∈ Z+. t−1 x(t) = A x(0) + τ =0 t A(t−1−τ )Bu(τ ) hence y(t) = CAtx(0) + h ∗ u Linear dynamical systems with inputs & outputs 13–35 where ∗ is discrete-time convolution and h(t) = is the impulse response D. and in general. x(2) = Ax(1) + Bu(1) = A2x(0) + ABu(0) + Bu(1). t > 0 Linear dynamical systems with inputs & outputs 13–36 . t=0 t−1 CA B. 13–37 Z-transform of time-advanced signal: ∞ V (z) = = z z −tw(t + 1) t=0 ∞ z −tw(t) t=1 = zW (z) − zw(0) Linear dynamical systems with inputs & outputs 13–38 . .. .e. 1.Z-transform suppose w ∈ Rp×q is a sequence (discrete-time signal). w : Z+ → Rp×q recall Z-transform W = Z(w): ∞ W (z) = t=0 z −tw(t) where W : D ⊆ C → Cp×q (D is domain of W ) time-advanced or shifted signal v: v(t) = w(t + 1). Linear dynamical systems with inputs & outputs t = 0. . i. Discrete-time transfer function take Z-transform of system equations x(t + 1) = Ax(t) + Bu(t). Y (z) = CX(z) + DU (z) solve for X(z) to get X(z) = (zI − A)−1zx(0) + (zI − A)−1BU (z) (note extra z in first term!) Linear dynamical systems with inputs & outputs 13–39 hence Y (z) = H(z)U (z) + C(zI − A)−1zx(0) where H(z) = C(zI − A)−1B + D is the discrete-time transfer function note power series expansion of resolvent: (zI − A)−1 = z −1I + z −2A + z −3A2 + · · · Linear dynamical systems with inputs & outputs 13–40 . y(t) = Cx(t) + Du(t) yields zX(z) − zx(0) = AX(z) + BU (z). EE263 Autumn 2010-11 Stephen Boyd Lecture 14 Example: Aircraft dynamics • longitudinal aircraft dynamics • wind gust & control inputs • linearized dynamics • steady-state analysis • eigenvalues & modes • impulse matrices 14–1 Longitudinal aircraft dynamics θ body axis horizontal variables are (small) deviations from operating point or trim conditions state (components): • u: velocity of aircraft along body axis • v: velocity of aircraft perpendicular to body axis (down is positive) • θ: angle between body axis and horizontal (up is positive) ˙ • q = θ: angular velocity of aircraft (pitch rate) Example: Aircraft dynamics 14–2 .    u ˙ u − uw −. sec.429  ˙ q 0 ˙ θ 0 0 1 0 θ   .01rad ≈ 0.319 7.020 −.57◦) • matrix coefficients are called stability derivatives Example: Aircraft dynamics 14–4 . 40000 ft.322  v   −. level flight. 774 ft/sec.Inputs disturbance inputs: • uw : velocity of wind along body axis • vw : velocity of wind perpendicular to body axis control or actuator inputs: • δe: elevator angle (δe > 0 is down) • δt: thrust Example: Aircraft dynamics 14–3 Linearized dynamics for 747.01 1  −.74   v − vw 0  ˙  =    q   . crad (= 0.18 −.04  δe  +   −1.101 −.039 0 −.065 −.598  δt 0 0      • units: ft.003 .16 . 0229 . elevator & thrust changes solve for control variables in terms of wind velocities.outputs of interest: • aircraft speed u (deviation from trim) ˙ • climb rate h = −v + 7.74θ Example: Aircraft dynamics 14–5 Steady-state analysis ˙ DC gain from (uw . δt) to (u.0020 . h): H(0) = −CA−1B + D = 1 0 27. desired speed & climb rate δe δt .0 0 −1 −1. δe.34 24.0379 .2 −15. vw .0413 u − uw ˙ h + vw = Example: Aircraft dynamics 14–6 .9 gives steady-state change in speed & climb rate due to wind. increase in speed is obtained mostly by increasing elevator (i.0674j • two complex modes. increase in climb rate is obtained by increasing thrust and increasing elevator (i.. −0. downwards) • constant speed. called short-period and phugoid. downwards) (thrust on 747 gives strong pitch up torque) Example: Aircraft dynamics 14–7 Eigenvalues and modes eigenvalues are −0.e.8818j..e.• level flight.3750 ± 0.0005 ± 0. respectively • system is stable (but lightly damped) • hence step responses converge (eventually) to DC gain matrix Example: Aircraft dynamics 14–8 . 0899   −0.0005 0.5433     ± j  0.6130  −0.8235  . decays in ≈ 10 sec Example: Aircraft dynamics 14–10 .0111   0.7510 0.1637  xshort xphug Example: Aircraft dynamics 14–9 Short-period mode y(t) = CetA(ℜxshort) (pure short-period mode motion) 1 0.5 ˙ h(t) 0 −0.0082  0.5 −1 0 2 4 6 8 10 12 14 16 18 20 • only small effect on speed u • period ≈ 7 sec.0677  −0.0941  =   −0.1140     −0.0135  −0.5 u(t) 0 −0.0962     ± j  0.0283 0. =   −0.1225 0.5 −1 0 2 4 6 8 10 12 14 16 18 20 1 0.eigenvectors are    0. 5 0 5 10 15 20 Example: Aircraft dynamics 14–12 . vw ) to (u.5 h21 0 h22 0 5 10 15 20 0 −0.5 0. decays in ≈ 5000 sec Example: Aircraft dynamics 14–11 Dynamic response to wind gusts ˙ impulse response matrix from (uw .1 h11 0 h12 0 5 10 15 20 0 −0.1 0 5 10 15 20 0. h) (gives response to short wind bursts) over time period [0.1 −0.Phugoid mode y(t) = CetA(ℜxphug ) (pure phugoid mode motion) 2 1 u(t) 0 −1 −2 0 200 400 600 800 1000 1200 1400 1600 1800 2000 2 1 ˙ h(t) 0 −1 −2 0 200 400 600 800 1000 1200 1400 1600 1800 2000 • affects both speed and climb rate • period ≈ 100 sec.1 0.5 −0. 20]: 0. 20]: 2 2 1 1 h11 0 h12 0 5 10 15 20 0 −1 −1 −2 −2 0 5 10 15 20 5 3 2.5 h21 0 h22 0 200 400 600 0 −0. h) over time period [0.5 −0.1 −0. δt) to (u.1 h11 0 h12 0 200 400 600 0 −0.1 0 200 400 600 0.5 1 0.5 0 200 400 600 Example: Aircraft dynamics 14–13 Dynamic response to actuators ˙ impulse response matrix from (δe. 600]: 0.5 2 h21 0 h22 0 5 10 15 20 1.1 0.over time period [0.5 0.5 −5 0 0 5 10 15 20 Example: Aircraft dynamics 14–14 . over time period [0. 600]: 2 2 1 1 h11 0 h12 0 200 400 600 0 −1 −1 −2 −2 0 200 400 600 3 2 1 3 2 1 h21 0 −1 −2 −3 0 200 400 600 h22 0 −1 −2 −3 0 200 400 600 Example: Aircraft dynamics 14–15 . v = 0. suppose Av = λv. matrix norm. i. quadratic forms.e. and SVD 15–2 . matrix norm.EE263 Autumn 2010-11 Stephen Boyd Lecture 15 Symmetric matrices. can assume v ∈ Rn) Symmetric matrices. and SVD • eigenvectors of symmetric matrices • quadratic forms • inequalities for quadratic forms • positive semidefinite matrices • norm of a matrix • singular value decomposition 15–1 Eigenvalues of symmetric matrices suppose A ∈ Rn×n is symmetric.. i. v ∈ Cn then v Av = v (Av) = λv v = λ i=1 T T T n |vi|2 but also v Av = (Av) v = (λv) v = λ T T T n i=1 |vi|2 so we have λ = λ.e. A = AT fact: the eigenvalues of A are real to see this. λ ∈ R (hence.. quadratic forms. . T Aqi = λiqi. and SVD 15–4 . i. matrix norm. . q1. qn s. quadratic forms.t. . qi qj = δij in matrix form: there is an orthogonal Q s. quadratic forms. Q−1AQ = QT AQ = Λ hence we can express A as n A = QΛQ = i=1 T T λi q i q i in particular.e.Eigenvectors of symmetric matrices fact: there is a set of orthonormal eigenvectors of A. and SVD 15–3 ac A = QΛQT x Interpretations Q T QT x Λ ΛQT x Q Ax linear mapping y = Ax can be decomposed as • resolve into qi coordinates • scale coordinates by λi • reconstitute with basis qi Symmetric matrices. qi are both left and right eigenvectors Symmetric matrices. matrix norm. . .t. and SVD 15–6 . matrix norm. quadratic forms. • rotate by QT • diagonal real scale (‘dilation’) by Λ • rotate back by Q decomposition A= n T λi q i q i i=1 expresses A as linear combination of 1-dimensional projections Symmetric matrices. matrix norm.or. quadratic forms. geometrically. and SVD 15–5 example: A = = −1/2 3/2 3/2 −1/2 1 √ 2 1 1 1 −1 1 0 0 −2 1 √ 2 1 1 1 −1 T T q2 q2 x q1 x T q1 q1 x T λ1 q 1 q 1 x q2 T λ2 q 2 q 2 x Ax Symmetric matrices. cn) ˙ note −C −1G is not symmetric Symmetric matrices. . vn. and SVD 15–7 Example: RC circuit i1 v1 c1 resistive circuit in vn cn ck vk = −ik . λi = λj . quadratic forms. quadratic forms. hence vi vj = 0 • in this case we can say: eigenvectors are orthogonal • in general case (λi not distinct) we must say: eigenvectors can be chosen to be orthogonal Symmetric matrices. . . a set of linearly independent eigenvectors of A: Avi = λivi.proof (case of λi distinct) since λi distinct. vi = 1 then we have T T T vi (Avj ) = λj vi vj = (Avi)T vj = λivi vj T so (λi − λj )vi vj = 0 T for i = j. can find v1. . and SVD 15–8 . matrix norm. . . ˙ i = Gv G = GT ∈ Rn×n is conductance matrix of resistive circuit thus v = −C −1Gv where C = diag(c1. matrix norm. . . . quadratic forms. −C −1G) are real • eigenvectors qi (in xi coordinates) can be chosen orthogonal • eigenvectors in voltage coordinates. and SVD 15–10 . . and SVD 15–9 Quadratic forms a function f : Rn → R of the form n f (x) = x Ax = i.j=1 T Aij xixj is called a quadratic form in a quadratic form we may as well assume A = AT since xT Ax = xT ((A + AT )/2)x ((A + AT )/2 is called the symmetric part of A) uniqueness: if xT Ax = xT Bx for all x ∈ Rn and A = AT . matrix norm.use state xi = √ civi. si = C −1/2qi. satisfy −C −1Gsi = λisi. sT Csi = δij i Symmetric matrices. quadratic forms. . cn) we conclude: • eigenvalues λ1. λn of −C −1/2GC −1/2 (hence. . . then A=B Symmetric matrices. so x = C 1/2v = −C −1/2GC −1/2x ˙ ˙ √ √ where C 1/2 = diag( c1. . B = B T . . . matrix norm. and SVD 15–11 Inequalities for quadratic forms suppose A = AT . quadratic forms..e.Examples • Bx • 2 = xT B T Bx − xi)2 2 n−1 i=1 (xi+1 2 • Fx − Gx sets defined by quadratic forms: • { x | f (x) = a } is called a quadratic surface • { x | f (x) ≤ a } is called a quadratic region Symmetric matrices. we have xT Ax ≤ λ1xT x Symmetric matrices. matrix norm. and SVD 15–12 . quadratic forms. A = QΛQT with eigenvalues sorted so λ1 ≥ · · · ≥ λn xT Ax = xT QΛQT x = (QT x)T Λ(QT x) n = i=1 T λi(qi x)2 n ≤ λ1 T (qi x)2 i=1 2 = λ1 x i. matrix norm. j we say A is positive definite if xT Ax > 0 for all x = 0 • denoted A > 0 • A > 0 if and only if λmin(A) > 0. all eigenvalues are positive Symmetric matrices. and SVD 15–13 Positive semidefinite and positive definite matrices suppose A = AT ∈ Rn×n we say A is positive semidefinite if xT Ax ≥ 0 for all x • denoted A ≥ 0 (and sometimes A • not the same as Aij ≥ 0 for all i. i.e.. so the inequalities are tight Symmetric matrices. T qn Aqn = λn qn 2. so we have λnxT x ≤ xT Ax ≤ λ1xT x sometimes λ1 is called λmax.e. all eigenvalues are nonnegative . and SVD 15–14 0) • A ≥ 0 if and only if λmin(A) ≥ 0.. matrix norm. quadratic forms. quadratic forms. λn is called λmin note also that T q1 Aq1 = λ1 q1 2. i. matrix norm.similar argument shows xT Ax ≥ λn x 2. etc. we say A is indefinite matrix inequality: if B = B T ∈ Rn we say A ≥ B if A − B ≥ 0. for example: • A ≥ 0 means A is positive semidefinite • A > B means xT Ax > xT Bx for all x = 0 Symmetric matrices. • if A ≥ B and C ≥ D. then αA ≥ 0 • if A > 0. matrix norm. then A + C ≥ B + D • if B ≤ 0 then A + B ≤ A • A2 ≥ 0 • if A ≥ 0 and α ≥ 0. matrix norm. and SVD 15–15 many properties that you’d guess hold actually do. (such matrices are called incomparable) B≥A Symmetric matrices. then A−1 > 0 matrix inequality is only a partial order : we can have A ≥ B. quadratic forms. and SVD 15–16 .Matrix inequalities • we say A is negative semidefinite if −A ≥ 0 • we say A is negative definite if −A > 0 • otherwise. A < B if B − A > 0. e.g.. quadratic forms. hence ellipsoid is thin in direction q1 • in direction qn. quadratic forms. hence ellipsoid is fat in direction qn • λmax/λmin gives maximum eccentricity ˜ ˜ if E = { x | xT Bx ≤ 1 }. then E ⊆ E ⇐⇒ A ≥ B Symmetric matrices. where B > 0. and SVD 15–17 semi-axes are given by si = λi −1/2 qi.: • eigenvectors determine directions of semiaxes • eigenvalues determine lengths of semiaxes note: • in direction q1.Ellipsoids if A = AT > 0. i. centered at 0 s1 s2 E Symmetric matrices. xT Ax is large. matrix norm. quadratic forms. the set E = { x | xT Ax ≤ 1 } is an ellipsoid in Rn. and SVD 15–18 . matrix norm. xT Ax is small.e. gain varies with direction of input x questions: • what is maximum gain of A (and corresponding maximum gain direction)? • what is minimum gain of A (and corresponding minimum gain direction)? • how does gain of A vary with direction? Symmetric matrices. matrix norm. and SVD 15–19 Matrix norm the maximum gain Ax x=0 x is called the matrix norm or spectral norm of A and is denoted A max xT AT Ax Ax 2 = max = λmax(AT A) max 2 2 x=0 x=0 x x so we have A = λmax(AT A) similarly the minimum gain is given by min Ax / x = x=0 λmin(AT A) Symmetric matrices. quadratic forms. quadratic forms.Gain of a matrix in a direction suppose A ∈ Rm×n (not necessarily square or symmetric) for x ∈ Rn. Ax / x gives the amplification factor or gain of A in the direction x obviously. matrix norm. and SVD 15–20 . 785 −0.620 0. matrix norm.265 0. eigenvector of AT A associated with λmin Symmetric matrices.53: 0.785 = 1. eigenvector of AT A associated with λmax • ‘min gain’ input direction is x = qn.620 0.78 15–22 0.620 90.620 T then A = λmax(AT A) = 9.99  = 9.18 =  4.620 0. matrix norm.53 7. quadratic forms.note that • AT A ∈ Rn×n is symmetric and AT A ≥ 0 so λmin.785 0. λmax ≥ 0 • ‘max gain’ input direction is x = q1. and SVD 15–21  1 2 example: A =  3 4  5 6  AT A = = 35 44 44 56 0.785  2. A  Symmetric matrices. and SVD . quadratic forms.785 0.7 0 0 0.785 −0.620 0. we have 0.18  0. and SVD 15–23 Properties of matrix norm • consistent with vector norm: matrix norm of a ∈ Rn×1 is √ λmax(aT a) = aT a • for any x.514 −0. quadratic forms.620  0. Ax ≤ A • scaling: aA = |a| A • triangle inequality: A + B ≤ A + B • definiteness: A = 0 ⇔ A = 0 • norm of product: AB ≤ A B x Symmetric matrices.14  = 0.514 ≤ Ax ≤ 9.785 −0.514: 0. A for all x = 0. matrix norm.785 −0. and SVD 15–24 .46 =  0.620 = 1.53 x Symmetric matrices.min gain is λmin(AT A) = 0. quadratic forms. matrix norm. Rank(A) = r • U ∈ Rm×r . matrix norm. . where σ1 ≥ · · · ≥ σr > 0 Symmetric matrices. U T U = I • V ∈ Rn×r .Singular value decomposition more complete picture of gain properties of A given by singular value decomposition (SVD) of A: A = U ΣV T where • A ∈ Rm×n. and SVD 15–25 with U = [u1 · · · ur ]. V = [v1 · · · vr ]. and SVD 15–26 . quadratic forms. σr ). . matrix norm. quadratic forms. . V T V = I • Σ = diag(σ1. . r A = U ΣV T = i=1 T σiuivi • σi are the (nonzero) singular values of A • vi are the right or input singular vectors of A • ui are the left or output singular vectors of A Symmetric matrices. . matrix norm. vr are orthonormal basis for N (A)⊥ Symmetric matrices. . quadratic forms. and SVD 15–28 . . ur are orthonormal basis for range(A) • v1. and SVD 15–27 similarly. matrix norm. . quadratic forms. AAT = (U ΣV T )(U ΣV T )T = U Σ2U T hence: • ui are eigenvectors of AAT (corresponding to nonzero eigenvalues) • σi = λi(AAT ) (and λi(AAT ) = 0 for i > r) • u1. .AT A = (U ΣV T )T (U ΣV T ) = V Σ2V T hence: • vi are eigenvectors of AT A (corresponding to nonzero eigenvalues) • σi = λi(AT A) (and λi(AT A) = 0 for i > r) • A = σ1 Symmetric matrices. . matrix norm. . . and SVD 15–29 • v1 is most sensitive (highest gain) input direction • u1 is highest gain output direction • Av1 = σ1u1 Symmetric matrices. . quadratic forms. and SVD 15–30 . ur difference with eigenvalue decomposition for symmetric A: input and output directions are different Symmetric matrices. quadratic forms. vr • scale coefficients by σi • reconstitute along output directions u1. . . matrix norm. . . .Interpretations r A = U ΣV T = i=1 T σiuivi x VT V Tx Σ ΣV T x U Ax linear mapping y = Ax can be decomposed as • compute coefficients of x along input directions v1. 0. i.. with σ1 = 1. quadratic forms. 0. matrix norm.5v1 + 0.5.6. v2 x = 0.1.6)(0. and SVD 15–32 .05) • input components along directions v1 and v2 are amplified (by about 10) and come out mostly along plane spanned by u1.5)(1)u1 + (0.5 T T • resolve x along v1. and SVD 15–31 example: A ∈ R2×2.05 • A is nonsingular • for some applications you might say A is effectively rank 2 Symmetric matrices.5)u2 v2 u1 x Ax v1 u2 Symmetric matrices. u2 • input components along directions v3 and v4 are attenuated (by about 10) • Ax / x can range between 10 and 0. v2: v1 x = 0. x = 0.e.6v2 T T • now form Ax = (v1 x)σ1u1 + (v2 x)σ2u2 = (0. matrix norm. 7. σ2 = 0. quadratic forms.SVD gives clearer picture of gain as function of input/output directions example: consider A ∈ R4×4 with Σ = diag(10. A† = (AT A)−1AT gives the least-squares approximate solution xls = A†y if A is fat and full rank. A† = AT (AAT )−1 gives the least-norm solution xln = A†y SVD Applications 16–2 . A† = V Σ−1U T is the pseudo-inverse or Moore-Penrose inverse of A if A is skinny and full rank.EE263 Autumn 2010-11 Stephen Boyd Lecture 16 SVD Applications • general pseudo-inverse • full SVD • image of unit ball under linear transformation • SVD in estimation/inversion • sensitivity of linear equations to data error • low rank approximation via SVD 16–1 General pseudo-inverse if A = 0 has SVD A = U ΣV T . we have lim AT A + µI µ→0 −1 A T = A† (check this!) SVD Applications 16–4 .e. xµ = AT A + µI here. let xµ be (unique) minimizer of Ax − y i. i.. least-squares approximate solution SVD Applications 16–3 Pseudo-inverse via regularization for µ > 0.e. xpinv is the minimum-norm. AT A + µI > 0 and so is invertible then we have lim xµ = A†y µ→0 2 +µ x 2 −1 AT y in fact..in general case: Xls = { z | Az − y = min Aw − y } w is set of least-squares approximate solutions xpinv = A†y ∈ Xls has minimum norm on Xls. Full SVD SVD of A ∈ Rm×n with Rank(A) = r:   σ1 . U = [U1 U2] ∈ Rm×m and V = [V1 V2] ∈ Rn×n are orthogonal • add zero rows/cols to Σ1 to form Σ ∈ Rm×n: Σ= Σ1 0(m − r)×r 0r×(n − r) 0(m − r)×(n − r) 16–5 SVD Applications then we have A = U1Σ1V1T = i.  .. T σr vr  A = U1Σ1V1T = u1 · · · u r • find U2 ∈ Rm×(m−r).e. V2 ∈ Rn×(n−r) s.t.: A = U ΣV T U1 U2 Σ1 0(m − r)×r 0r×(n − r) 0(m − r)×(n − r) V1T V2T called full SVD of A (SVD with positive singular values only called compact SVD) SVD Applications 16–6 ..  T v1  . 0.5 2 {Ax | x ≤ 1} is ellipsoid with principal axes σiui.Image of unit ball under linear transformation full SVD: A = U ΣV T gives intepretation of y = Ax: • rotate (by V T ) • stretch along axes by σi (σi = 0 for i > r) • zero-pad (if m > n) or truncate (if m < n) to get m-vector • rotate (by U ) SVD Applications 16–7 Image of unit ball under A 1 1 rotate by V T 1 1 stretch.5) u2 u1 rotate by U 0. Σ = diag(2. SVD Applications 16–8 . with BA = I (i.e.SVD in estimation/inversion suppose y = Ax + v.: ˆ ˜ ˆ ˆ true x lies in uncertainty ellipsoid Eunc. i. unbiased) ˆ • estimation or inversion error is x = x − x = Bv ˜ ˆ • set of possible estimation errors is ellipsoid x ∈ Eunc = { Bv | v ≤ α } ˜ • x = x − x ∈ x − Eunc = x + Eunc. of course) SVD Applications 16–10 .e.. where • y ∈ Rm is measurement • x ∈ Rn is vector to be estimated • v is a measurement noise or error ‘norm-bound’ model of noise: we assume v ≤ α but otherwise know nothing about v (α gives max norm of noise) SVD Applications 16–9 • consider estimator x = By. centered at estimate x ˆ • ‘good’ estimator has ‘small’ Eunc (with BA = I. maximum norm of error is α B . the least-squares estimator gives the smallest uncertainty ellipsoid SVD Applications 16–11 Example: navigation using range measurements (lect.. x − x ≤ α B ˆ optimality of least-squares: suppose BA = I is any estimator.g.e.e... i.semiaxes of Eunc are ασiui (singular values & vectors of B) e. and Bls = A† is the least-squares estimator then: T • BlsBls ≤ BB T • Els ⊆ E • in particular Bls ≤ B i. 4) we have  T k1  kT  y = − 2 x T  k3  T k4  where ki ∈ R2 using first two measurements and inverting: x=− ˆ T k1 T k2 −1 02×2 y using all four measurements and least-squares: x = A† y ˆ SVD Applications 16–12 . m > n. so B = Bls + Z then ZA = ZU ΣV T = 0.uncertainty regions (with α = 1): uncertainty region for x using inversion 20 15 x2 10 5 0 −10 −8 x1 uncertainty region for x using least-squares −6 −4 −2 0 2 4 6 8 10 20 15 x2 10 5 0 −10 −8 −6 −4 −2 x1 0 2 4 6 8 10 SVD Applications 16–13 Proof of optimality property suppose A ∈ Rm×n. so ZU = 0 (multiply by V Σ−1 on right) therefore BB T = (Bls + Z)(Bls + Z)T T T = BlsBls + BlsZ T + ZBls + ZZ T T = BlsBls + ZZ T T ≥ BlsBls T using ZBls = (ZU )Σ−1V T = 0 SVD Applications 16–14 . is full rank SVD: A = U ΣV T . and B satisfies BA = I define Z = B − Bls. with V orthogonal Bls = A† = V Σ−1U T . i. A can be considered singular in practice SVD Applications 16–15 δy a more refined analysis uses relative instead of absolute errors in x and y since y = Ax.e. y becomes y + δy then x becomes x + δx with δx = A−1δy hence we have δx = A−1δy ≤ A−1 if A−1 is large. • small errors in y can lead to large errors in x • can’t solve for x given y (with small errors) • hence. of course x = A−1y suppose we have an error or noise in y. we also have y ≤ A δx ≤ A x x .Sensitivity of linear equations to data error consider y = Ax.. hence A−1 δy y κ(A) = A A−1 = σmax(A)/σmin(A) is called the condition number of A we have: relative error in solution x ≤ condition number · relative error in data y or. in terms of # bits of guaranteed accuracy: # bits accuracy in solution ≈ # bits accuracy in data − log2 κ SVD Applications 16–16 . A ∈ Rn×n invertible. s. κ = σmax(A)/σmin(A) SVD Applications 16–17 Low rank approximations r suppose A ∈ R m×n . A ≈ A in the sense that ˆ A − A is minimized solution: optimal rank p approximator is p ˆ A= i=1 T σiuivi ˆ • hence A − A = r T i=p+1 σi ui vi = σp+1 T • interpretation: SVD dyads uivi are ranked in order of ‘importance’.we say • A is well conditioned if κ is small • A is poorly conditioned if κ is large (definition of ‘small’ and ‘large’ depend on application) same analysis holds for least-squares approximate solutions with A nonsquare.t. with SVD A = U ΣV T = i=1 T σiuivi ˆ ˆ ˆ we seek matrix A. Rank(A) = r. take p to get rank p approximant SVD Applications 16–18 . Rank(A) ≤ p < r. there is a unit vector z ∈ Rn s. the two subspaces intersect. . if A ∈ Rn×n. . . z ∈ span{v1. . . . dim span{v1.t. σn = σmin is distance to nearest singular matrix hence..e. the distance (measured by matrix norm) to the nearest rank i − 1 matrix for example. vp+1} = p + 1 hence. vp+1} p+1 T σiuivi z i=1 (A − B)z = Az = p+1 (A − B)z 2 = i=1 2 T 2 σi (vi z)2 ≥ σp+1 z 2 ˆ hence A − B ≥ σp+1 = A − A SVD Applications 16–19 Distance to singularity another interpretation of σi: σi = min{ A − B | Rank(B) ≤ i − 1 } i. small σmin means A is near to a singular matrix SVD Applications 16–20 . .proof: suppose Rank(B) ≤ p then dim N (B) ≥ n − p also. Bz = 0.e.. i. . . 0. 30. for i = 5. 0. are substantially smaller than the noise term v simplified model: 4 y= i=1 T σiuivi x + v SVD Applications 16–21 . 2. . 7. . .5.0001 • x is on the order of 1 • unknown error or noise v has norm on the order of 0. where • A ∈ R100×30 has SVs 10. .1 T then the terms σiuivi x.01. 0.application: model simplification suppose y = Ax + v. . . . mass m • potential V : [0. t)|2 is probability density of particle at position x. t)|2 dx = 1 for all t) evolution of Ψ governed by Schrodinger equation: i¯ Ψ = h˙ V − h ¯2 2 ∇ Ψ = HΨ 2m x √ −1 17–2 where H is Hamiltonian operator.EE263 Autumn 2010-11 Stephen Boyd Lecture 17 Example: Quantum mechanics • wave function and Schrodinger equation • discretization • preservation of probability • eigenvalues & eigenstates • example 17–1 Quantum mechanics • single particle in interval [0. time t 1 (so 0 |Ψ(x. i = Example: Quantum mechanics . 1]. 1] → R Ψ : [0. 1] × R+ → C is (complex-valued) wave function interpretation: |Ψ(x. . ... .Discretization let’s discretize position x into N discrete points. N wave function is approximated as vector Ψ(t) ∈ CN ∇2 operator is approximated as matrix x  −2 1  1 −2 1  2 2 1 −2 ∇ =N  ... (which approximates w = ∂ 2v/∂x2) Example: Quantum mechanics 17–3 discretized Schrodinger equation is (complex) linear dynamical system ˙ Ψ = (−i/¯ )(V − (¯ /2m)∇2)Ψ = (−i/¯ )HΨ h h h where V is a diagonal matrix with Vkk = V (k/N ) hence we analyze using linear dynamical system theory (with complex vectors & matrices): ˙ Ψ = (−i/¯ )HΨ h h solution of Shrodinger equation: Ψ(t) = e(−i/¯ )tH Ψ(0) h matrix e(−i/¯ )tH propogates wave function forward in time t seconds (backward if t < 0) Example: Quantum mechanics 17–4 .  so w = ∇2v means wk = (vk+1 − vk )/(1/N ) − (vk − vk−1)(1/N ) 1/N  . 1 −2      1 . ... k = 1. k/N . our discretization preserves probability exactly Example: Quantum mechanics 17–5 h U = e−(i/¯ )tH is unitary. Ψ(t) 2 is constant.Preservation of probability d Ψ dt 2 d ∗ Ψ Ψ dt ˙ ˙ = Ψ∗Ψ + Ψ∗Ψ = = ((−i/¯ )HΨ)∗Ψ + Ψ∗((−i/¯ )HΨ) h h = (i/¯ )Ψ∗HΨ + (−i/¯ )Ψ∗HΨ h h = 0 (using H = H T ∈ RN ×N ) hence. then Uz 2 = (U z)∗(U z) = z ∗U ∗U z = z ∗z = z 2 Example: Quantum mechanics 17–6 . meaning U ∗U = I unitary is extension of orthogonal for complex matrix: if U ∈ CN ×N is unitary and z ∈ CN . λN are real (λ1 ≤ · · · ≤ λN ) • its eigenvectors v1. probability density h |Ψm(t)|2 = e(−i/¯ )λk tvk 2 = |vmk |2 doesn’t change with time (vmk is mth entry of vk ) Example: Quantum mechanics 17–8 . vN h • eigenvalues of (−i/¯ )H are (−i/¯ )λ1. . . . . (−i/¯ )λN (which are pure h h h imaginary) Example: Quantum mechanics 17–7 • eigenvectors vk are called eigenstates of system • eigenvalue λk is energy of eigenstate vk h • for mode Ψ(t) = e(−i/¯ )λk tvk . . . . vN can be chosen to be orthogonal (and real) from Hv = λv ⇔ (−i/¯ )Hv = (−i/¯ )λv we see: h h • eigenvectors of (−i/¯ )H are same as eigenvectors of H. . .Eigenvalues & eigenstates H is symmetric. .. . . . so • its eigenvalues λ1. . . i. . v1.e. 2 0 0.1 0.9 1 x • potential V shown as dotted line (scaled to fit plot) • four eigenstates with lowest energy shown (i.8 0.2 0.5 0.3 0.4 0.2 0.8 0.7 0.6 0. v1.2 0 0.6 0.3 0. v2.2 0 −0.8 0. v3.9 1 x • potential bump in middle of infinite potential well • (for this example.2 0 −0.8 0.1 0.5 0. .4 0.5 0.Example Potential Function V (x) 1000 900 800 700 600 V 500 400 300 200 100 0 0 0.7 0.6 0.4 0.2 0 −0.9 1 0.3 0.7 0.6 0. we set ¯ = 1. .9 1 0.1 0.5 0.2 0 0. m = 1 .e. v4) Example: Quantum mechanics 17–10 .6 0.5 0.7 0.2 0.7 0.9 1 0.2 0 −0.3 0.1 0.3 0.8 0.2 0.2 0 0..2 0.1 0. ) h Example: Quantum mechanics 17–9 lowest energy eigenfunctions 0.4 0.4 0. 08 0.2 0..4 0.7 0.e.9 1 0.05 0 0 10 20 30 40 50 60 70 80 90 100 eigenstate ∗ • bottom plot shows |vk Ψ(0)|2. with initial wave function Ψ(0) • particle near x = 0.3 0. resolution of Ψ(0) into eigenstates Example: Quantum mechanics 17–12 • top plot shows initial probability density |Ψ(0)|2 . i.02 0 0 0.now let’s look at a trajectory of Ψ.6 0.2 • with momentum to right (can’t see in plot of |Ψ|2) • (expected) kinetic energy half potential bump height Example: Quantum mechanics 17–11 0.8 0.15 0.04 0.1 0.5 x 0.1 0.06 0. time evolution. 320: |Ψ(t)|2 Example: Quantum mechanics 17–13 cf. . then rolls back down • reverses velocity when it hits the wall at left (perfectly elastic collision) • then repeats Example: Quantum mechanics 17–14 . for t = 0. . classical solution: • particle rolls half way up potential bump. 40. stops. . . 80. 4 0. versus time t Example: Quantum mechanics k=1 |Ψk (t)|2.6 1.9 0.2 1.3 0..6 0.e.8 0. i.1 0 0 0.4 1. 17–15 .2 0.7 0.6 0.1 0.4 0.8 x 10 2 4 t N/2 plot shows probability that particle is in left half of well.8 1 1.5 0.2 0. tf ] we say input u : [ti. tf ]) (subscripts stand for initial and final) questions: • where can x(ti) be transfered to at t = tf ? • how quickly can x(ti) be transfered to some xtarget? • how do we find a u that transfers x(ti) to x(tf )? • how do we find a ‘small’ or ‘efficient’ u that transfers x(ti) to x(tf )? Controllability and state transfer 18–2 .EE263 Autumn 2010-11 Stephen Boyd Lecture 18 Controllability and state transfer • state transfer • reachable set. controllability matrix • minimum norm inputs • infinite-horizon minimum norm transfer 18–1 State transfer consider x = Ax + Bu (or x(t + 1) = Ax(t) + Bu(t)) over time interval ˙ [ti. tf ] → Rm steers or transfers state from x(ti) to x(tf ) (over time interval [ti. t−1 Rt = τ =0 At−1−τ Bu(τ ) u(0). . . can reach more points given more time) we define the reachable set R as the set of points reachable for some t: R= t≥0 Rt Controllability and state transfer 18–4 . ˙ t Rt = 0 e(t−τ )ABu(τ ) dτ u : [0. .e.Reachability consider state transfer from x(0) = 0 to x(t) we say x(t) is reachable (in t seconds or epochs) we define Rt ⊆ Rn as the set of points reachable in t seconds or epochs for CT system x = Ax + Bu.. . t] → Rm and for DT system x(t + 1) = Ax(t) + Bu(t). u(t − 1) ∈ Rm Controllability and state transfer 18–3 • Rt is a subspace of Rn • Rt ⊆ Rs if t ≤ s (i. . . An−1 hence for t ≥ n. range(Ct) = range(Cn) Controllability and state transfer 18–5 thus we have range(Ct) t < n range(C) t ≥ n where C = Cn is called the controllability matrix Rt = • any state that can be reached can be reached by t = n • the reachable set is R = range(C) Controllability and state transfer 18–6 .  .Reachability for discrete-time LDS DT system x(t + 1) = Ax(t) + Bu(t). x(t) ∈ Rn  u(t − 1) . . . x(t) = Ct  u(0) where Ct = B AB · · · At−1B  so reachable set at t is Rt = range(Ct) by C-H theorem. we can express each Ak for k ≥ n as linear combination of A0. R = Rn) system is reachable if and only if Rank(C) = n example: x(t + 1) = 0 1 1 0 x(t) + 1 1 1 1 1 1 u(t) controllability matrix is C = hence system is not controllable.. reachable set is R = range(C) = { x | x1 = x2 } Controllability and state transfer 18–7 General state transfer with tf > ti.e.  u(tf − 1) . x(tf ) = Atf −ti x(ti) + Ctf −ti  u(ti) hence can transfer x(ti) to x(tf ) = xdes ⇔ xdes − Atf −ti x(ti) ∈ Rtf −ti  • general state transfer reduces to reachability problem • if system is controllable any state transfer can be achieved in ≤ n steps • important special case: driving state to zero (sometimes called regulating or controlling state) Controllability and state transfer 18–8 .  .Controllable system system is called reachable or controllable if all states are reachable (i.   = Ct (CtCt )−1xdes . uln(0) uln is called least-norm or minimum energy input that effects state transfer can express as t−1 −1 uln(τ ) = B T (AT )(t−1−τ ) s=0 AsBB T (AT )s xdes. . . = Ct  u(0)  xdes among all u that steer x(0) = 0 to x(t) = xdes. Rank(Ct) = n to steer x(0) = 0 to x(t) = xdes. .  . u(t − 1) must satisfy  u(t − 1) . . . . the one that minimizes t−1 u(τ ) τ =0 Controllability and state transfer 2 18–9 is given by   uln(t − 1) T T . inputs u(0). . t − 1 Controllability and state transfer 18–10 . . for τ = 0.Least-norm input for reachability assume system is reachable. t) gives practical measure of controllability/reachability (as function of xdes. and directions that can be reached only with large inputs) Controllability and state transfer 18–12 .. is sometimes called minimum energy required to reach x(t) = xdes t−1 Emin = τ =0 uln(τ ) 2 = T T Ct (CtCt )−1xdes T T T Ct (CtCt )−1xdes T = xT (CtCt )−1xdes des t−1 −1 = xT des τ =0 Aτ BB T (AT )τ xdes Controllability and state transfer 18–11 • Emin(xdes.t−1 Emin.e. t) ≤ 1 } shows points in state space reachable at t with one unit of energy (shows directions that can be reached with small inputs. t) • ellipsoid { z | Emin(z. t) gives measure of how hard it is to reach x(t) = xdes from x(0) = 0 (i. how large a u is required) • Emin(xdes. the minimum value of τ =0 u(τ ) 2 required to reach x(t) = xdes. e. s) i.8 −0.75 0. t) ≤ Emin(xdes.Emin as function of t: if t ≥ s then t−1 s−1 A BB (A ) ≥ τ =0 τ =0 τ T T τ Aτ BB T (AT )τ hence t−1 −1 s−1 −1 Aτ BB T (AT )τ τ =0 ≤ τ =0 Aτ BB T (AT )τ so Emin(xdes.95 0 x(t) + 1 0 u(t) Emin(z.: takes less energy to get somewhere more leisurely Controllability and state transfer 18–13 example: x(t + 1) = 1. t) for z = [1 1]T : 10 9 8 7 6 Emin 5 4 3 2 1 0 0 5 10 15 t 20 25 30 35 Controllability and state transfer 18–14 . can’t get anywhere for free) if A is not stable. then P can have nonzero nullspace Controllability and state transfer 18–16 . x(t) = xdes = xT P xdes des if A is stable. P > 0 (i.ellipsoids Emin ≤ 1 for t = 3 and t = 10: 10 Emin (x. 10) ≤ 1 5 x2 0 −5 −10 −10 −8 −6 −4 −2 0 x1 2 4 6 8 10 Controllability and state transfer 18–15 Minimum energy over infinite horizon the matrix t−1 −1 P = lim t→∞ A BB (A ) τ =0 τ T T τ always exists. 3) ≤ 1 5 x2 0 −5 −10 −10 −8 −6 −4 −2 0 x1 2 4 6 8 10 10 Emin (x.. and gives the minimum energy required to reach a point xdes (with no limit on t): t−1 min τ =0 u(τ ) 2 x(0) = 0.e. t] → Rm fact: for t > 0. z = 0 means can get to z using u’s with energy as small as you like (u just gives a little kick to the state. unstable aircraft Controllability and state transfer 18–17 Continuous-time reachability consider now x = Ax + Bu with x(t) ∈ Rn ˙ reachable set at time t is t Rt = 0 e(t−τ )ABu(τ ) dτ u : [0. any reachable point can be reached as fast as you like (with large enough u) Controllability and state transfer 18–18 . B) • same R as discrete-time system • for continuous-time system.• P z = 0. the instability carries it out to z efficiently) • basis of highly maneuverable. where C= B AB · · · An−1B is the controllability matrix of (A. Rt = R = range(C). . An−1 and collect powers of A: etA = α0(t)I + α1(t)A + · · · + αn−1(t)An−1 therefore t x(t) = 0 t eτ ABu(t − τ ) dτ n−1 = 0 Controllability and state transfer αi(τ )Ai Bu(t − τ ) dτ i=0 18–19 n−1 t = i=0 Ai B 0 αi(τ )u(t − τ ) dτ = Cz t where zi = 0 αi(τ )u(t − τ ) dτ hence. . . in terms of A0. An+1.first let’s show for any u (and x(0) = 0) we have x(t) ∈ range(C) write etA as power series: etA = I + t2 t A + A2 + · · · 1! 2! by C-H. . . express An. x(t) is always in range(C) need to show converse: every point in range(C) can be reached Controllability and state transfer 18–20 . . . Impulsive inputs suppose x(0−) = 0 and we apply input u(t) = δ (k)(t)f . input u = δ (k)f transfers state from x(0−) = 0 to x(0+) = Ak Bf now consider input of form u(t) = δ(t)f0 + · · · + δ (n−1)(t)fn−1 where fi ∈ Rm by linearity we have  f0 . x(0+) = Ak Bf Controllability and state transfer 18–21 k k+1 t t2 k+2 Bf + A Bf + · · · 1! 2! thus. fn−1   x(0+) = Bf0 + · · · + An−1Bfn−1 = C  hence we can reach any point in range(C) (at least. using impulse inputs) Controllability and state transfer 18–22 . . where δ (k) denotes kth derivative of δ and f ∈ Rm then U (s) = sk f . so X(s) = (sI − A)−1Bsk f = s−1I + s−2A + · · · Bsk f = ( sk−1 + · · · + sAk−2 + Ak−1 +s−1Ak + · · · )Bf impulsive terms hence x(t) = impulsive terms + A Bf + A in particular. . where can we maneuver state? Controllability and state transfer 18–24 . connected by unit springs. . dampers • input is tension between masses • state is x = [y T y T ]T ˙ u(t) u(t) system is   0 0 0 1 0  0   0 0 0 1  x+ x= ˙   1 −1 1 −1 1  1 −1 1 −1 −1    u  • can we maneuver state anywhere. starting from x(0) = 0? • if not. need to show etAx(0) ∈ R if x(0) ∈ R . then x(t) ∈ R for all t (no matter what u is) to show this.can also be shown that any point in range(C) can be reached for any t > 0 using nonimpulsive inputs fact: if x(0) ∈ R. Controllability and state transfer 18–23 Example • unit masses at y1. y2. . i. precisely the ˙ ˙ differential motions it’s obvious — internal force does not affect center of mass position or total momentum! Controllability and state transfer 18–25 Least-norm input for reachability (also called minimum energy input) assume that x = Ax + Bu is reachable ˙ we seek u that steers x(0) = 0 to x(t) = xdes and minimizes t u(τ ) 0 2 dτ let’s discretize system with interval h = t/N (we’ll let N → ∞ later) thus u is piecewise constant: u(τ ) = ud(k) for kh ≤ τ < (k + 1)h. . y1 = −y2. .e.. R = span   0    0 we can reach states with y1 = −y2. Controllability and state transfer k = 0.controllability matrix is  0 1 −2 2  0 −1 2 −2   =  1 −2 2 0  −1 2 −2 0   0    0      1     −1   C= B AB A2B A3B hence reachable set is    1       −1  . N − 1 18–26 . .   . Bd = 0 Bd AdBd · · · AN −1Bd d h   ud(N − 1) . so this is approximately (t/N )B T e(t−τ )A T similarly N −1 T Ai BdBd (AT )i d d i=0 N −1 T e(ti/N )ABdBd e(ti/N )A i=0 t T = ≈ (t/N ) 0 ¯ etABB T etA dt ¯ ¯ T for large N Controllability and state transfer 18–28 .so x(t) = where Ad = ehA. ud(0) eτ A dτ B least-norm ud that yields x(t) = xdes is N −1 −1 T Ai BdBd (AT )i d d i=0 udln(k) = T Bd (AT )(N −1−k) d xdes let’s express in terms of A: T T Bd (AT )(N −1−k) = Bd e(t−τ )A d Controllability and state transfer T 18–27 where τ = t(k + 1)/N for N large. Bd ≈ (t/N )B. hence least-norm discretized input is approximately uln(τ ) = B e for large N T (t−τ )AT 0 t −1 e BB e ¯ tA ¯ T tAT ¯ dt xdes. DT solution: sum becomes integral Controllability and state transfer 18–29 min energy is 0 t uln(τ ) where Q(t) = 2 dτ = xT Q(t)−1xdes des t eτ ABB T eτ A dτ 0 T can show (A. 0≤τ ≤t hence. B) controllable ⇔ Q(t) > 0 for all t > 0 ⇔ Q(s) > 0 for some s > 0 in fact. but get larger u • cf. range(Q(t)) = R for any t > 0 Controllability and state transfer 18–30 . this is the least-norm continuous input • can make t small. and gives minimum energy required to reach a point xdes (with no limit on t): t min 0 u(τ ) 2 dτ x(0) = 0. P > 0 (i.Minimum energy over infinite horizon the matrix P = lim t→∞ 0 t −1 e τA BB e T τ AT dτ always exists. x(t) = xdes = xT P xdes des • if A is stable. can’t get anywhere for free) • if A is not stable. the instability carries it out to z efficiently) Controllability and state transfer 18–31 General state transfer consider state transfer from x(ti) to x(tf ) = xdes.. tf > ti since x(tf ) = e(tf −ti)Ax(ti) + u steers x(ti) to x(tf ) = xdes ⇔ u (shifted by ti) steers x(0) = 0 to x(tf − ti) = xdes − e(tf −ti)Ax(ti) • general state transfer reduces to reachability problem • if system is controllable. z = 0 means can get to z using u’s with energy as small as you like (u just gives a little kick to the state. then P can have nonzero nullspace • P z = 0. any state transfer can be effected – in ‘zero’ time with impulsive inputs – in any positive time with non-impulsive inputs tf ti e(tf −τ )ABu(τ ) dτ Controllability and state transfer 18–32 .e. 2. dampers • u1 is force between 1st & 2nd masses • u2 is force between 2nd & 3rd masses • y ∈ R3 is displacement of masses 1. springs.Example u1 u1 u2 u2 • unit masses.3 • x= y y ˙ 18–33 Controllability and state transfer system is:  0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 −2 1 0 −2 1 0 1 −2 1 1 −2 1 0 1 −2 0 1 −2   0 0 0 0 0 0 1 0 −1 1 0 −1            x= ˙          x +        u1 u2 Controllability and state transfer 18–34 . u = uln is: 5 u1(t) 0 −5 0 0.5 1 1.5 1 1.5 2 2.5 4 4.5 t 5 u2(t) 0 −5 0 0..e.steer state from x(0) = e1 to x(tf ) = 0 i. tf : 50 45 40 35 30 Emin 25 20 15 10 5 0 0 2 4 6 8 10 12 tf Controllability and state transfer 18–35 for tf = 3. control initial state e1 to zero at t = tf tf Emin = 0 uln(τ ) 2 dτ vs.5 t Controllability and state transfer 18–36 .5 3 3.5 2 2.5 3 3.5 4 4. 5 4 4.and for tf = 4: 5 u1(t) 0 −5 0 0.5 1 1.2 −0.5 2 2.5 3 3.4 0.2 0 −0.6 y1(t) 0.8 0.4 0 2 4 6 8 10 12 14 t Controllability and state transfer 18–38 .5 2 2.5 1 1.5 t 5 u2(t) 0 −5 0 0.5 3 3.5 t Controllability and state transfer 18–37 output y1 for u = 0: 1 0.5 4 4. 5 2 2.5 3 3.4 0 0.4 0.5 4 4.6 y1(t) 0.5 1 1.4 0.5 1 1.5 2 2.2 0 −0.output y1 for u = uln with tf = 3: 1 0.5 4 4.5 t Controllability and state transfer 18–39 output y1 for u = uln with tf = 4: 1 0.8 0.2 0 −0.5 3 3.4 0 0.2 −0.5 t Controllability and state transfer 18–40 .6 y1(t) 0.8 0.2 −0. and D are known • u and y are observed over time interval [0.g. C. in RMS value) y(t) = Cx(t) + Du(t) + v(t) Observability and state estimation 19–2 . t − 1] • w and v are not known.EE263 Autumn 2010-11 Stephen Boyd Lecture 19 Observability and state estimation • state estimation • discrete-time observability • observability – controllability duality • observers for noiseless case • least-squares observers • example • continuous-time observability 19–1 State estimation set up we consider the discrete-time system x(t + 1) = Ax(t) + Bu(t) + w(t). B. but can be described statistically. or assumed small (e. • w is state disturbance or noise • v is sensor noise or error • A.. ..e. predict) next state an algorithm or system that yields an estimate x(s) is called an observer or ˆ state estimator x(s) is denoted x(s|t − 1) to show what information estimate is based on ˆ ˆ (read. “ˆ(s) given t − 1”) x Observability and state estimation 19–3 Noiseless case let’s look at finding x(0). . . y(t − 1) • s = 0: estimate initial state • s = t − 1: estimate current state • s = t: estimate (i. with no state or measurement noise: x(t + 1) = Ax(t) + Bu(t). y(t − 1) u(t − 1) Observability and state estimation 19–4 . .State estimation problem state estimation problem: estimate x(s) from u(0). u(t − 1). .   = Otx(0) + Tt   . with x(t) ∈ Rn. y(t) ∈ Rp y(t) = Cx(t) + Du(t) then we have     y(0) u(0) . . . y(0). u(t) ∈ Rm. . . . Ot =  . t − 1] • Tt maps input to output over [0. output is zero over interval [0. t − 1] hence we have    y(0) u(0) . its effect can be subtracted out Observability and state estimation 19–6 . t−1 CA  D  CB Tt =  .   . x(0) is to be determined Observability and state estimation 19–5  hence: • can uniquely determine x(0) if and only if N (Ot) = {0} • N (Ot) gives ambiguity in determining x(0) • if x(0) ∈ N (Ot) and u = 0. . t−2 CA B  0 D CAt−3B ··· 0 ··· D     · · · CB • Ot maps initials state into resulting output over [0.  − Tt   . t − 1] • input u does not affect ability to determine x(0). .  .where  C  CA  . Otx(0) =  y(t − 1) u(t − 1) RHS is known. t − 1] for any t. ˜ A = AT . D) be dual of system (A. i. n − 1] N (O) is called unobservable subspace.. n−1 CA is called the observability matrix if x(0) can be deduced from u and y over [0. i. = [C T AT C T · · · (AT )n−1C T ] transpose of observability matrix ˜ similarly we have O = C T Observability and state estimation 19–8 . N (Ot) = N (O) where  C  CA   O = On =  . C. D).. Rank(O) = n Observability and state estimation 19–7  Observability – controllability duality ˜ ˜ ˜ ˜ let (A.   .e.e. describes ambiguity in determining state from input and output system is called observable if N (O) = {0}. ˜ B = CT . then x(0) can be deduced from u and y over [0. C. B. ˜ C = BT . ˜ D = DT controllability matrix of dual system is ˜ ˜ ˜˜ ˜ ˜ C = [B AB · · · An−1B] = OT . . B. . . An−1 hence for t ≥ n. each Ak is linear combination of A0. .Observability matrix by C-H theorem. . x(0) = F  y(t − 1) u(t − 1) which deduces x(0) (exactly) from u.  − Tt   .thus. system is observable) and let F be any left inverse of Ot. system is observable (controllable) if and only if dual system is controllable (observable) in fact.e. y over [0. .. t − 1] in fact we have    y(τ − t + 1) u(τ − t + 1) . . i. .e. x(τ − t + 1) = F  y(τ ) u(τ ) Observability and state estimation 19–10   . unobservable subspace is orthogonal complement of controllable subspace of dual ˜ N (O) = range(OT )⊥ = range(C)⊥ Observability and state estimation 19–9 Observers for noiseless case suppose Rank(Ot) = n (i.  − Tt   . F Ot = I then we have the observer    y(0) u(0) ...e. i. given past t − 1 inputs & outputs observer is (multi-input. . with inputs u and y.. n−1 CA n−1 (Az) = CA z = − n αiCAiz = 0 i=0 (by C-H) where det(sI − A) = sn + αn−1sn−1 + · · · + α0 Observability and state estimation 19–12 . then Az ∈ N (O) proof: suppose z ∈ N (O). . multi-output) finite impulse response (FIR) filter. and output x ˆ Observability and state estimation 19–11 Invariance of unobservable set fact: the unobservable subspace N (O) is invariant.e. if z ∈ N (O).e. CAk z = 0 for k = 0. . our observer estimates what state was t − 1 epochs ago. n − 1 evidently CAk (Az) = 0 for k = 0. . i.. i. .. .i. . n − 2. .e. . u(n−1)      D    ··· 0 ··· CAn−3B · · · CB (same matrices we encountered in discrete-time case!) Observability and state estimation 19–14 . ˙ y = Cx + Du can we deduce state x from u and y? let’s look at derivatives of y: y y ˙ = Cx + Du = C x + Du = CAx + CBu + Du ˙ ˙ ˙ y = CA2x + CABu + CB u + D¨ ¨ ˙ u and so on Observability and state estimation 19–13 hence we have     y y ˙ . .Continuous-time observability continuous-time system with no sensor or state noise: x = Ax + Bu.  y (n−1) where O is the observability matrix and D  CB T = . n−2 CA B  0 D    = Ox + T     u u ˙ .  . . y(t) up to order n − 1 in this case we say system is observable can construct an observer using any left inverse F of O:  x = F   Observability and state estimation  y y ˙ . y (n−1)(t) • derivative-based state reconstruction is dual of state transfer using impulsive inputs Observability and state estimation 19–16 . y(t). . . . . u(n−1)     hence if N (O) = {0} we can deduce x(t) from derivatives of u(t). . u(n−1)(t). y (n−1)    −T     u u ˙ . .rewrite as RHS is known. u(n−1)     19–15 • reconstructs x(t) (exactly and instantaneously) from u(t). x is to be determined  Ox =    y y ˙ . . . y (n−1)    −T     u u ˙ . . . . no signal processing of any kind applied to u and y can deduce x unobservable subspace N (O) gives fundamental ambiguity in deducing x from u. y the corresponding state and output. y(t) = Cx(t) + Du(t) + v(t) we assume Rank(Ot) = n (hence. i. x = Ax + Bu. x(0) = Ot  ˆ y(t − 1) u(t − 1) † T where Ot = Ot Ot Observability and state estimation  −1 T Ot 19–18 .e. with sensor noise: x(t + 1) = Ax(t) + Bu(t).  − Tt   . ˙ y = Cx + Du then state trajectory x = x + etAz satisfies ˜ ˙ x = A˜ + Bu.e. and u is any input. y consistent with both state trajectories x.. system is observable) least-squares observer uses pseudo-inverse:    y(0) u(0) † . x ˜ hence if system is unobservable.. y Observability and state estimation 19–17 Least-squares observers discrete-time system. with x. ˜ x y = C x + Du ˜ i. .A converse suppose z ∈ N (O) (the unobservable subspace). . input/output signals u.  . xls(0) = x(0) if sensor noise is zero ˆ (i. t−1 measured as τ =0 y (τ ) − y(τ ) ˆ 2 can express least-squares initial state estimate as t−1 −1 t−1 xls(0) = ˆ (A ) C CA τ =0 T τ T τ τ =0 (AT )τ C T y (τ ) ˜ where y is observed output with portion due to input subtracted: ˜ y = y − h ∗ u where h is impulse response ˜ Observability and state estimation 19–19 Least-squares observer uncertainty ellipsoid † since Ot Ot = I. x(0) = xls(0) − x(0) = Ot  ˜ ˆ v(t − 1) where x(0) is the estimation error of the initial state ˜ in particular. observer recovers exact state in noiseless case) now assume sensor noise is unknown. 1 v(τ ) t τ =0 t−1 2  ≤ α2 Observability and state estimation 19–20 .e. with input u and initial state x(0) ˆ (and no sensor noise). and • output y that was observed. but has RMS value ≤ α.. we have  v(0) † .interpretation: xls(0) minimizes discrepancy between ˆ • output y that would be observed. y(t). then P can have nonzero nullspace i. . (since memory of x(0) fades ..  . y over longer and longer periods: • if A is stable. .. ) • if A is not stable. . can’t estimate initial state perfectly even with infinite number of measurements u(t). t = 0. P > 0 i. initial state estimation error gets arbitrarily small (at least in some directions) as more and more of signals u and y are observed Observability and state estimation 19–22 . and gives the limiting uncertainty in estimating x(0) from u.e. Ot  =  v(t − 1)    1 v(τ ) t τ =0 t−1 2 x(0) ∈ Eunc ˜ ≤ α2    Eunc is ‘uncertainty ellipsoid’ for x(0) (least-square gives best Eunc) shape of uncertainty ellipsoid determined by matrix T Ot Ot −1 t−1 −1 = τ =0 (AT )τ C T CAτ maximum norm of error is √ † xls(0) − x(0) ≤ α t Ot ˆ Observability and state estimation 19–21 Infinite horizon uncertainty ellipsoid the matrix t−1 −1 P = lim t→∞ (AT )τ C T CAτ τ =0 always exists. .set of possible estimation errors is ellipsoid  v(0) † .e. . noisy) range measurements from directions −15◦. 30◦.03) Observability and state estimation 19–24 . y(t) =  . x4(t)) is velocity of particle • can assume RMS value of v is around 2 • ki is unit vector from sensor i to origin true initial position & velocities: x(0) = (1 − 3 − 0. 1). 0  1  T k1 .  x(t) + v(t) T k4  • (x1(t).Example • particle in R2 moves with uniform velocity • (linear. 0◦.04 0. can assume RMS value of v is not much more than 2 • no assumptions about initial position & velocity particle range sensors problem: estimate initial position & velocity from range measurements Observability and state estimation 19–23 express as linear system 1  0 x(t + 1) =   0 0  0 1 0 0 1 0 1 0  0 1   x(t). x2(t)) is position of particle • (x3(t). once per second • range noises IID N (0. 20◦. range measurements (& noiseless versions): 5 measurements from sensors 1 − 4 4 3 2 1 range 0 −1 −2 −3 −4 −5 0 20 40 60 80 100 120 t Observability and state estimation 19–25 • estimate based on (y(0), . . . , y(t)) is x(0|t) ˆ • actual RMS position error is (ˆ1(0|t) − x1(0))2 + (ˆ2(0|t) − x2(0))2 x x (similarly for actual RMS velocity error) Observability and state estimation 19–26 RMS position error 1.5 1 0.5 0 10 20 30 40 50 60 70 80 90 100 110 120 RMS velocity error 1 0.8 0.6 0.4 0.2 0 10 20 30 40 50 60 70 80 90 100 110 120 t Observability and state estimation 19–27 Continuous-time least-squares state estimation assume x = Ax + Bu, y = Cx + Du + v is observable ˙ least-squares estimate of initial state x(0), given u(τ ), y(τ ), 0 ≤ τ ≤ t: choose xls(0) to minimize integral square residual ˆ t J= 0 y (τ ) − Ceτ Ax(0) ˜ 2 dτ where y = y − h ∗ u is observed output minus part due to input ˜ let’s expand as J = x(0)T Qx(0) + 2rT x(0) + s, t t Q= 0 e τ AT C Ce T τA dτ, t r= 0 ˜ eτ A C T y (τ ) dτ, T s= 0 Observability and state estimation y (τ )T y (τ ) dτ ˜ ˜ 19–28 setting ∇x(0)J to zero, we obtain the least-squares observer t −1 xls(0) = Q ˆ −1 r= 0 e τ AT t 0 C Ce T τA dτ eA τ C T y (τ ) dτ ˜ T estimation error is t −1 x(0) = xls(0) − x(0) = ˜ ˆ e 0 τ AT t 0 C Ce T τA dτ eτ A C T v(τ ) dτ T therefore if v = 0 then xls(0) = x(0) ˆ Observability and state estimation 19–29 EE263 Autumn 2010-11 Stephen Boyd Lecture 20 Some parting thoughts . . . • linear algebra • levels of understanding • what’s next? 20–1 Linear algebra • comes up in many practical contexts (EE, ME, CE, AA, OR, Econ, . . . ) • nowadays is readily done cf. 10 yrs ago (when it was mostly talked about) • Matlab or equiv for fooling around • real codes (e.g., LAPACK) widely available • current level of linear algebra technology: – 500 – 1000 vbles: easy with general purpose codes – much more possible with special structure, special codes (e.g., sparse, convolution, banded, . . . ) Some parting thoughts . . . 20–2 Levels of understanding Simple, intuitive view: • 17 vbles, 17 eqns: usually has unique solution • 80 vbles, 60 eqns: 20 extra degrees of freedom Platonic view: • singular, rank, range, nullspace, Jordan form, controllability • everything is precise & unambiguous • gives insight & deeper understanding • sometimes misleading in practice Some parting thoughts . . . 20–3 Quantitative view: • based on ideas like least-squares, SVD • gives numerical measures for ideas like singularity, rank, etc. • interpretation depends on (practical) context • very useful in practice Some parting thoughts . . . 20–4 • must have understanding at one level before moving to next • never forget which level you are operating in Some parting thoughts . . . 20–5 What’s next? • EE364a — convex optimization I (Win 10-11) • EE364b — convex optimization II (Spr 10-11) (plus lots of other EE, CS, ICME, MS&E, Stat, ME, AA courses on signal processing, control, graphics & vision, machine learning, computational geometry, numerical linear algebra, . . . ) Some parting thoughts . . . 20–6 EE263 Prof. S. Boyd Basic Notation Basic set notation the set with elements a1 , . . . , ar . a is in the set S. the sets S and T are equal, i.e., every element of S is in T and every element of T is in S. S⊆T the set S is a subset of the set T , i.e., every element of S is also an element of T . ∃a ∈ S P(a) there exists an a in S for which the property P holds. ∀x ∈ S P(a) property P holds for every element in S. {a ∈ S | P(a)} the set of all a in S for which P holds (the set S is sometimes omitted if it can be determined from context). A∪B union of sets, A ∪ B = {x | x ∈ A or x ∈ B}. A∩B intersection of sets, A ∩ B = {x | x ∈ A and x ∈ B}. A×B Cartesian product of two sets, A × B = {(a, b) | a ∈ A, b ∈ B}. Some specific sets R Rn R1×n Rm×n j i the set of real numbers. the set of real n-vectors (n × 1 matrices). the set of real n-row-vectors (1 × n matrices). the set of real m × n matrices. √ can mean √ −1, in the company of electrical engineers. can mean −1, for normal people; i is the polite term in mixed company (i.e., when non-electrical engineers are present). C, Cn , Cm×n the set of complex numbers, complex n-vectors, complex m × n matrices. Z the set of integers: Z = {. . . , −1, 0, 1, . . .}. R+ the nonnegative real numbers, i.e., R+ = {x ∈ R | x ≥ 0}. [a, b], (a, b], [a, b), (a, b) the real intervals {x | a ≤ x ≤ b}, {x | a < x ≤ b}, {x | a ≤ x < b}, and {x | a < x < b}, respectively. {a1 , . . . , ar } a∈S S=T 1 Vectors and matrices We use square brackets [ and ] to construct matrices and vectors, with white space delineating the entries in a row, and a new line indicating a new row. For example [1 2] is a row vector 1 2 3 is matrix in R2×3 . [1 2]T denotes a column vector, i.e., an element in R1×2 , and 4 5 6 of R2×1 , which we abbreviate as R2 . We use curved brackets ( and ) surrounding lists of entries, delineated by commas, as an alternative method to construct (column) vectors. Thus, we have three ways to denote a column vector: 1 (1, 2) = [1 2]T = . 2 Note that in our notation scheme (which is fairly standard), [1, 2, 3] and (1 2 3) aren’t used. We also use square and curved brackets to construct block matrices and vectors. For example if x, y, z ∈ Rn , we have x y z ∈ Rn×3 , a matrix with columns x, y, and z. We can construct a block vector as x  (x, y, z) =  y  ∈ R3n .  z Functions The notation f : A → B means that f is a function on the set A into the set B. The notation b = f (a) means b is the value of the function f at the point a, where a ∈ A and b ∈ B. The set A is called the domain of the function f ; it can thought of as the set of legal parameter values that can be passed to the function f . The set B is called the codomain (or sometimes range) of the function f ; it can thought of as a set that contains all possible returned values of the function f . There are several ways to think of a function. The formal definition is that f is a subset of A × B, with the property that for every a ∈ A, there is exactly one b ∈ B such that (a, b) ∈ f . We denote this as b = f (a). Perhaps a better way to think of a function is as a black box or (software) function or subroutine. The domain is the set of all legal values (or data types or structures) that can be passed to f . The codomain of f gives the data type or data structure of the values returned by f . Thus f (a) is meaningless if a ∈ A. If a ∈ A, then b = f (a) is an element of B. Also note that the function is denoted f ; it is wrong to say ‘the function f (a)’ (since f (a) is an element 2   of B, not a function). Having said that, we do sometimes use sloppy notation such as ‘the function f (t) = t3 ’. To say this more clearly you could say ‘the function f : R → R defined by f (t) = t3 for t ∈ R’. Examples • −0.1 ∈ R, √ 2 ∈ R+ , 1 − 2j ∈ C (with j = A= √ −1). • The matrix 0.3 6.1 −0.12 7.2 0 0.01 is an element in R2×3 . We can define a function f : R3 → R2 as f (x) = Ax for any x ∈ R3 . If x ∈ R3 , then f (x) is a particular vector in R2 . We can say ‘the function f is linear’. To say ‘the function f (x) is linear’ is technically wrong since f (x) is a vector, not a function. Similarly we can’t say ‘A is linear’; it is just a matrix. • We can define a function f : {a ∈ R | a = 0} × Rn → Rn by f (a, x) = (1/a)x, for any a ∈ R, a = 0, and any x ∈ Rn . The function f could be informally described as division of a vector by a nonzero scalar. • Consider the set A = { 0, − 1, 3.2 }. The elements of A are 0, −1 and 3.2. Therefore, for example, −1 ∈ A and { 0, 3.2 } ⊆ A. Also, we can say that ∀x ∈ A, − 1 ≤ x ≤ 4 or ∃x ∈ A, x > 3. • Suppose A = { 1, − 1 }. Another representation for A is A = { x ∈ R | x2 = 1 }. • Suppose A = { 1, − 2, 0 } and B = { 3, − 2 }. Then A ∪ B = { 1, − 2, 0, 3 }, A ∩ B = { − 2 }. • Suppose A = { 1, − 2, 0 } and B = {1, 3}. Then A × B = { (1, 1), (1, 3), (−2, 1), (−2, 3), (0, 1), (0, 3) }. • f : R → R with f (x) = x2 − x defines a function from R to R while u : R+ → R2 with t cos t u(t) = . 2e−t defines a function from R+ to R2 . 3 A primer on matrices Stephen Boyd September 27, 2011 These notes describe the notation of matrices, the mechanics of matrix manipulation, and how to use matrices to formulate and solve sets of simultaneous linear equations. We won’t cover • linear algebra, i.e., the underlying mathematics of matrices • numerical linear algebra, i.e., the algorithms used to manipulate matrices and solve linear equations • software for forming and manipulating matrices, e.g., Matlab, Mathematica, or Octave • how to represent and manipulate matrices, or solve linear equations, in computer languages such as C/C++ or Java • applications, for example in statistics, mechanics, economics, circuit analysis, or graph theory 1 Matrix terminology and notation scalars), written between square  Matrices A matrix is a rectangular array of numbers (also called brackets, as in  0 1 −2.3 0.1  4 −0.1 0 A =  1.3 4.1 −1 0 1.7 An important attribute of a matrix is its size or dimensions, i.e., the numbers of rows and columns. The matrix A above, for example, has 3 rows and 4 columns, so its size is 3 × 4. (Size is always given as rows × columns.) A matrix with m rows and n columns is called an m × n matrix. An m × n matrix is called square if m = n, i.e., if it has an equal number of rows and columns. Some authors refer to an m × n matrix as fat if m < n (fewer rows than columns), or skinny if m > n (more rows than columns). The matrix A above is fat. 1  . scalars. . vectors.The entries or coefficients of a matrix are the values in the array.3. vectors. as in a3 . Unfortunately. x.. . its column index is 1. f . Greek letters (α. and scalars Some authors try to use notation that helps the reader distinguish between matrices. vectors. ) for vectors. A13 = −2. i. respectively) indices. The entries are sometimes called the components of the vector. . The i. .e. denoted by double subscripts: the i. there are about as many notational conventions as authors. j entry of a matrix C is denoted Cij (which is a number). Sometimes a 1 × 1 matrix is considered to be the same as an ordinary number. a matrix with only one row. is called a row vector. where the subscript denotes the size.3. a zero matrix is denoted just 0. ordinary numbers are often called scalars. j entry is the value in the ith row and jth column. A32 = −1. matrices) despite the author’s notational scheme (if any exists). with size n × 1. As an example. i. Zero and identity matrices The zero matrix (of size m × n) is the matrix with all entries equal to zero.e. Sometimes the size is specified by calling it an n-vector. F . so you should be prepared to figure out what things are (i. Similarly. lower-case letters (a. . and capital letters (A. its third component is w3 = 0. The positive integers i and j are called the (row and column. β. Sometimes the zero matrix is written as 0m×n . In the context of matrices and scalars. . or vector of dimension 4). In this case you’ll 2 . As an example.e. But often.3  0.. Vectors and scalars A matrix with only one column. with size 1 × n.3 is a 4-vector (or 4 × 1 matrix. w = −2. . . Notational conventions for matrices. ) for matrices. . ) might be used for numbers. The row index of the bottom left entry (which has value 4. Two matrices are equal if they are the same size and all the corresponding entries (which are numbers) are equal.1 −3 0 is a row vector (or 1 × 3 matrix). For our example above. the same symbol used to denote the number 0. Other notational conventions include matrices given in bold font (G). or vectors written with arrows above them (a).. and scalars (numbers). is called a column vector or just a vector. For example. The entries of a vector are denoted with just one subscript (since the other is 1). and the number of rows of a vector is sometimes called its dimension. its third component is v3 = 3.   1  −2     v=  3.1) is 3. namely.e. we call it a zero (row or column) vector.have to figure out the size of the zero matrix from the context. Sometimes a subscript denotes the size.. and denoted 1 (by some authors) or e (by others). e1 =  0  . even though we use the same symbol to denote them (i.. sometimes called the ones vector. Unit and ones vectors A vector with one component one and all others zero is called a unit vector. An identity matrix is another common matrix. For example. 0 i = j. The ith unit vector.e. (Remember that both are denoted with the same symbol. Also. which are the 2 × 2 and 4 × 4 identity matrices. Another term for ei is ith standard basis vector. i. the 4-dimensional ones vector is   1  1    1 =  . is usually denoted ei . and its off-diagonal entries (those with unequal row and column indices) are zero.e. you usually have the figure out the dimension of a unit vector from context. whose ith component is 1 and all others are zero.. the identity matrix of size n is defined by Iij = Perhaps more illuminating are the examples 1 0 0 1      1 i = j.) The importance of the identity matrix will become clear later. 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1      . But more often the size must be determined from context (just like zero matrices). i. you should watch out. 0).) When a zero matrix is a (row or column) vector. I. are all equal to one. As with zero or identity matrices. Note that zero matrices of different sizes are different matrices. The three unit 3-vectors are:       0 0 1 e3 =  0  . (More on this later.       1 0 0 Note that the n columns of the n × n identity matrix are the n unit n-vectors. Formally. has the same number of rows as columns. It is always square. e2 =  1  . those with equal row and column index. because some authors use the term ‘unit vector’ to mean a vector of length one. (We’ll explain that later. as in I4 or I2×2 . Identity matrices are denoted by the letter I. Its diagonal entries. the equation it appears in).e.) Another common vector is the one with all components one.  1  1 3 . In programming we call this overloading: we say the symbol 0 is overloaded because it can mean different things depending on its context (i.. . the rows and columns of A are transposed in AT .2 Matrix operations Matrices can be combined using various operations to form other matrices.  1 6 1 2 0 4        7 0 + 2 3 = 9 3  3 5 0 4 3 1 A pair of row or column vectors of the same size can be added. (Thus the symbol + is overloaded to mean scalar addition when scalars appear on its left and right hand side.) 4 . we conclude that I must refer to a 2 × 2 identity matrix.e. 1 6 9 3 −I = 0 6 9 2 .. T  0 4 0 7 3   .e. Matrix addition is denoted by the symbol +. Matrix addition is commutative. If we transpose a T matrix twice. We always have A + 0 = 0 + A = A. if A and B are matrices of the same size. i. For ij example. i. denoted AT (or sometimes A′ ). i. In words.. (This is another example where you have to figure out the exact dimensions of the zero matrix from context. Here.e. we get back the original matrix: AT = A. otherwise they could not be added. then A + B = B + A. Matrix transpose If A is an m × n matrix.      Note that this gives an example where we have to figure out what size the identity matrix is.  7 0  = 4 0 1 3 1 Transposition converts row vectors into column vectors. Matrix subtraction is similar. the zero matrix must have the same dimensions as A. but you cannot add together a row vector and a column vector (except when they are both scalars!). Matrix addition Two matrices of the same size can be added together. Since you can only add (or subtract) matrices of the same size. its transpose. to form another matrix (of the same size). by adding the corresponding entries (which are numbers). As an example. is the n × m matrix given by AT = Aji . and vice versa.) For example. and matrix addition when matrices appear on its left and right hand sides. adding the zero matrix to a matrix has no effect. It’s also associative. so we write both as A + B + C. (A + B) + C = A + (B + C). is defined by p Cij = k=1 aik bkj = ai1 b1j + · · · + aip bpj . j entry of the product C = AB. i. or even scalar division with the scalar shown in the denominator (which just means scalar multiplication by one over the scalar). i = 1. . on the right we see two cases of scalar-matrix multiplication.. then (α + β)A = αA + βA. as in      2 12 1 6  9 3  · 2 =  18 6  . as in −2 −12 1 6    (−2)  9 3  =  −18 −6  . where α and β are scalars and A is a matrix. if A is any matrix and α. . Matrix multiplication It’s also possible to multiply two matrices using matrix multiplication. Suppose A and B are compatible..     12 0 6 0    9 6 9 6 0 3 3 = 3 2 3 2 0 1 . . its width) equals the number of rows of B (i. its height). . ..e. . β are any scalars. . you need to know the ith row of A and the jth column of B. This rule looks complicated. A has size m × p and B has size p × n. n.. You can multiply two matrices A and B provided their dimensions are compatible. The product matrix C = AB. which has size m × n. Scalar multiplication obeys several laws you can figure out for yourself. The + symbol on the left is addition of scalars. which means the number of columns of A (i. .g.e. Note that 0 · A = 0 (where the lefthand zero is the scalar zero. The 5 . Scalar multiplication is usually denoted by juxtaposition. Another simple property is (αβ)A = (α)(βA). j = 1. It’s a useful exercise to identify the symbols appearing in this formula.. m. which is done by multiplying every entry of the matrix by the scalar.e. with the scalar on the left. e. but I think these look pretty ugly.e. On the left hand side we see scalar-scalar multiplication (αβ) and scalar-matrix multiplication.Scalar multiplication Another operation is scalar multiplication: multiplying a matrix by a scalar (i. but there are several ways to remember it. number). while the + symbol on the right denotes matrix addition. and the righthand zero is a matrix zero of the same size as A).  −12 0 6 0 Sometimes you see scalar multiplication with the scalar on the right. To find the i. 2 entry. Two more similar calculations give us the remaining entries C21 and C22 : 1 2 3 −1 0 4  At this point. In fact. summing the products of corresponding entries: C11 = (1)(0) + (2)(2) + (3)(−1) = 1. we move across the first row of A and down the first column of B. and B has three rows. then AB makes sense (the dimensions are compatible) but BA doesn’t even make sense (much less equal AB). the lefthand number comes from the first row of A. 6 0 −1 −1 2 1 6 9 3 = −9 −3 17 0 . may be a different size than AB (so that equality in AB = BA is meaningless). when you multiply a matrix by an identity matrix. if it makes sense. BA may not even make sense. The product matrix C will have two rows (the number of rows of A) and two columns (the number of columns of B). i.. we move across the first row of A and down the second column of B: C12 = (1)(−3) + (2)(1) + (3)(0) = −1. 0 −3  1 =   2 −1 0  1 −1 −4 3 .summation above can be interpreted as ‘moving left to right along the ith row of A’ while moving ‘top to bottom’ down the jth column of B. (The identity matrices in the formulas AI = A and IA = A have different sizes — what are they?) One very important fact about matrix multiplication is that it is (in general) not commutative: we don’t (in general) have AB = BA. Some properties of matrix multiplication Now we can explain why the identity has its name: if A is any m × n matrix. matrix multiplication probably looks very complicated to you. it has no effect. we check that they are compatible: A has three columns. It is. To find the 1. and the righthand number comes from the first column of B. In each product term here.e. you’ll get used to it. then AI = A and IA = A. so they’re compatible. we don’t (in general) have AB = BA. or. To find the 1. when A and B are square. As a simple example. where A= 1 2 3 −1 0 4 . consider: 1 6 9 3 0 −1 −1 2 = −6 11 −3 −3 . B= 2  −1 0   First. Even when AB and BA both make sense and are the same size. For example. you keep a running sum of the product of the coresponding entries from A and B. .. i. 0 −3  1 . let’s find the product C = AB. As an example. but once you see all the uses for it. Now let’s find the entries of the product C.e. As you go. if A is 2 × 3 and B is 3 × 4. 1 entry. e. . then their inner product is x.. i. 7 . y or x · y.e.e. is a scalar: vw = v1 w1 + · · · + vn wn . m Inner product Another important special case of matrix multiplication occurs when v is an row n-vector and w is a column n vector. (Non-integer powers. Matrix powers When a matrix A is square. We call F the inverse of A. We can think of matrix vector multiplication (with an m × n matrix) as a function that transforms n-vectors into m-vectors. Similarly. Matrix multiplication distributes across matrix addition: A(B + C) = AB + AC and (A + B)C = AC + BC. Therefore we write the product simply as ABC. such as A1/2 (the matrix squareroot). Other notation for the inner product is x. k copies of A multiplied together is denoted Ak .. y = xT y = x1 y1 + · · · + xn yn . i. If a matrix is not invertible. We can then also define k A−k = (A−1 ) . But remember that the matrix product xy doesn’t make sense (unless they are both scalars). to form A · A. and has size 1 × 1. then it makes sense to multiply A by itself. Matrix-vector product A very important and common case of matrix multiplication is y = Ax. and y is an m-vector. . If x and y are n-vectors. and denote it A−1 . Matrix inverse If A is square.. . α(AB) = (αA)B. (AB)C = A(BC) (provided the products make sense). i = 1. Then the product vw makes sense. or be ambiguous.e.Matrix multiplication is associative. The formula is yi = Ai1 x1 + · · · + Ain xn . i. This occurs often in the form xT y where x and y are both n-vectors. Matrix multiplication is also associative with scalar multiplication. x is an n-vector. we say it is singular or noninvertible. are pretty tricky — they might not make sense. In this case the product (which is a number) is called the inner product or dot product of the vectors x and y. i. unless certain conditions on A hold. where A is an m × n matrix.. and there is a matrix F such that F A = I. then we say that A is invertible or nonsingular. . where α is a scalar and A and B are matrices (that can be multiplied).) By convention we set A0 = I (usually only when A is invertible — see below). This is an advanced topic in linear algebra. We refer to this matrix as A2 . l ≥ 1 in general. have inverses.e. we have 1 −1 1 2 −1 = 1 3 2 1 −1 1 (you should check this!). Here we list a few others.) As a less obvious example.e. (If ad − bc = 0.. l if A is invertible) 8 . that you could figure out yourself. In other words. A basic result of linear algebra is that AA−1 = I.g.It’s important to understand that not all square matrices are invertible. When a matrix is invertible. i. the matrix is not invertible. that are not hard to derive. i. but they are not used in practice. but much more complicated. the inverse of the inverse is the original matrix. A + 0 = A. formulas for the inverse of larger (invertible) square matrices. The importance of the matrix inverse will become clear when we study linear equations. and for all k. and quite useful. −1 (A−1 ) = A. if you multiply a matrix by its inverse on the right (as well as the left).. It’s very useful to know the general formula for a 2 × 2 matrix inverse: a b c d −1 = 1 ad − bc d −b −c a provided ad − bc = 0. a zero matrix never has an inverse. you might try to show that the matrix 1 −1 −2 2 does not have an inverse. you get the identity. Useful identities We’ve already mentioned a handful of matrix identities. As an example of the matrix inverse.) There are similar. (We’re making no claims that our list is complete!) • transpose of product: (AB)T = B T AT • transpose of sum: (A + B)T = AT + B T • inverse of product: (AB)−1 = B −1 A−1 provided A and B are square (of the same size) and invertible • products of powers: Ak Al = Ak+l (for k. (For example.. e. Thus we have A B C D X Y = 9 AX + BY CX + DY = 2 2 1 3 . and 2 × 1). In this context the blocks are sometimes called submatrices of the big matrix. Thus in the examples above. You can also divide a larger matrix (or vector) into ‘blocks’. 2 × 2. it’s often useful to write an m × n matrix as a 1 × n block matrix of m-vectors (which are just its columns).e. Block matrices can be added and multiplied as if the entries were numbers. or as an m × 1 block matrix of n-row-vectors (which are its rows).g. 0 G where A.. Such matrices are called block matrices. For example. because now the bottom block has two columns. 2 position must have size m × m (since it must have the same number of rows as F ). say. and G are matrices (as are 0 and I). 1 block — it must be p × n. Then the identity matrix in the 1. suppose that C= Then we have D C Continuing this example. Of course the block matrices must have the right dimensions to be able to fit together: matrices in the same (block) row must have the same number of rows (i. matrices in the same (block) column must have the same number of columns (i. As a specific example. because the top block has two columns and the bottom block has three. the entries A. We also see that G must have m columns.. The second example is more interesting. they could be 2 × 3. B. 0 2 3 2 2 5 4 7 1 3 . Thus. . F is called the 1. as in F I A B C . D= 0 2 3 5 4 7 .e. B.. That fixes the dimensions of the 0 matrix in the 2. 1 block of the second matrix. just like the top block.Block matrices and submatrices In some applications it’s useful to form matrices whose entries are themselves matrices. B and C must have the same number of rows (e. the same ‘width’). A. Suppose that F is m × n. etc. F . are called ‘blocks’ and are sometimes named by indices. dimensions p × m. the same ‘height’). But the block expression C DT does make sense. provided the corresponding entries have the right sizes (i. ‘conform’) and you’re careful about the order of multipication. the expression C D doesn’t make sense.. . C.e. functions defined by matrix-vector multiplication are linear. Then z is a linear function of x. 3 Linear equations and matrices Linear functions Suppose that f is a function that takes as argument (input) n-vectors and returns (as output) m-vectors. ... y = Ax where A is m × n. i = 1. f (αx) = αf (x) • superposition: for any n-vectors u and v. The first thing to do is to rewrite the equations with the variables lined up in columns. z = By where B is p × m. m This gives a simple interpretation of Aij : it gives the coefficient by which yi depends on xj . (Conversely. We say f is linear if it satisfies two properties: • scaling: for any n-vector x and any scalar α. BY .. .provided the products AX. i. linear functions of linear functions of some variables). i. Suppose also that a p-vector z is a linear function of y.e. . CX. Let’s start with a simple example of two equations in three variables: 1 + x2 = x3 − 2x1 . xn can be represented by the compact matrix equation Ax = b. i. f (u + v) = f (u) + f (v) It’s not hard to show that such a function can always be represented as matrix-vector multiplication: there is an m × n matrix A such that f (x) = Ax for all n-vectors x.e. f (x) = y where n yi = j=1 Aij xj = Ai1 x1 + · · · + Ain xn . where x is a vector made from the variables. . A is an m × n matrix and b is an m-vector.e. and DY makes sense. .e.. .) We can also write out the linear function in explicit form. which we can express in the simple form z = By = (BA)x. . . . Linear equations Any set of m linear equations in (scalar) variables x1 . So matrix multiplication corresponds to composition of linear functions (i. Suppose an m-vector y is a linear function of the n-vector x. x3 = x2 − 2. and the constants on the righthand side: 2x1 +x2 −x3 = −1 0x1 −x2 +x3 = −2 Now it’s easy to rewrite the equations as a single matrix equation: 2 1 −1 0 −1 1  x1    x2  = x3 10  −1 −2 . the inverse A−1 exists. . One or more of the equations might be redundant (i. to get the solution x = A−1 b (although that procedure would.g.. then. around 10MB. Solving linear equations in practice When we solve linear equations in practice.) In many applications the matrix A has many. this means each 11 . Of course. ... so solving say a set of 1000 equations on a small PC class computer takes only a second or so. or the equations could be inconsistent as in x1 = 1. xn . When these pathologies occur. solving a system of n simultaneous linear equations) require on the order of n3 basic arithmetic operations. Another (perhaps more pessimistic) way to put this is. . (These facts are studied in linear algebra.. work).so we can express the two equations in the three variables as Ax = b where A= 2 1 −1 0 −1 1 . of its entries equal to zero. In terms of the associated linear equations. Practical methods compute the solution x = A−1 b directly. or almost all. and then multiply it on the right by the vector b.e. The power of matrix notation is that just a few symbols (e. when a matrix A is singular. (A 1000 × 1000 matrix requires storage for 106 doubles. But modern computers are very fast. will take much (125×) longer (on a PC class computer). Suppose that A is invertible. x1 x =  x2  . by computer) we do not first compute the matrix A−1 . where A is an n × n matrix. (A 5000 × 5000 matrix requires around 250MB to store. A is singular means that the equations in Ax = b are redundant or inconsistent — a sign that you have set up the wrong equations (or don’t have enough of them). written as the compact matrix equation Ax = b. The most common methods for computing x = A−1 b (i. it turns out the simultaneous linear equations Ax = b are redundant or inconsistent. 5000.. can be obtained from the others).) But solving larger sets of equations. in which case we say it is sparse.e. i. of course. A−1 exists. for example. A−1 ). Let’s multiply both sides of the matrix equation Ax = b on the left by A−1 : A−1 (Ax) = A−1 b.e.e. so we have actually solved the simultaneous linear equations: x = A−1 b.. Here we should make a comment about matrix notation. noninvertible. Now you can see the importance of the matrix inverse: it can be used to solve simultaneous linear equations. i.e. Conversely. A−1 ) can express a lot. The lefthand side simplifies to A−1 Ax = Ix = x..) From a practical point of view. the matrix A is singular. (i. x1 = 2. Otherwise. and b is an n-vector. and the equations can be solved as x = A−1 b. you can’t always solve n linear equations for n variables.   x3   b= −1 −2 Solving linear equations Now suppose we have n linear equations in n variables x1 . . a lot of work can be hidden behind just a few symbols (e.g. with hundreds of thousands of (sparse) equations. using sparse matrix techniques. It turns out that such sparse equations can be solved by computer very efficiently. and might take only a few seconds (but it depends on how sparse the equations are).equation involves only some (often just a few) of the variables. It’s not uncommon to solve for hundreds of thousands of variables. 12 . solving a system of 10000 simultaneous sparse linear equations is feasible. Even on a PC class computer. determinant. the submatrices in any row of a block matrix must have the same number of rows). • Multiplying matrices with incompatible dimensions (i. trace. subtracting. when A ∈ R2×3 and B ∈ R3×3 . • Violating the rules of constructing block matrices (e. the perpetrator forms an expression or makes an assertion that does not break any syntax rules. but is wrong because of the meaning.EE263 Prof. Example: forming AT B. (What you do with matrices in your spare time. when the number of columns of A does not equal the number of rows of B). Boyd Crimes Against Matrices In this note we list some matrix crimes that we have. since it is so easy to check your work for potential violations. forming AB. Semantic crimes In a semantic crime. 1 . at least on things you hand in to us. when A ∈ R2×3 . in EE263 we have a zero-tolerance policy for crimes against matrices. the perpetrator attempts to combine matrices (or other mathematical objects) in ways that violate basic syntax rules. or powers of a nonsquare matrix. when A ∈ R2×3 and B ∈ R3×3 . or on scratch paper. witnessed too often. matrices (or vectors) of different dimensions..) Check your work — don’t become just another sad statistic! Syntax crimes In a syntax crime. Example: writing A + B. so you need to be more vigilant to avoid committing them.e. But we recommend you avoid these crimes at all times.. We list some typical examples below. S. Be very careful to avoid committing any of these crimes. These crimes are a bit harder to detect than syntax crimes. or equating. when A ∈ R2×3 and B ∈ R3×3 . sadly. • Taking the inverse. These are serious crimes of negligence. is of course your own business. Example: forming the block matrix [A B]. Example: forming A−1 .g. in order to not build bad habits. • Adding. or involve both syntax and semantic elements. Example: writing QQT = I. 2 . concluding that B = C from AB = AC. Note: (AB)−1 = A−1 B −1 violates syntax rules. • Using (AB)−1 = A−1 B −1 (instead of the correct formula (AB)−1 = B −1 A−1 ). when a. e.) Example: forming (wwT )−1 . • Referring to a left inverse of a strictly fat matrix or a right inverse of a strictly skinny matrix. have independent columns). involves both a syntax and semantic crime. (Taking the inverse of a nonsquare matrix is a syntax crime—see above. • Dimension crimes. but singular matrix. where A ∈ Rm×n and B ∈ Rk×p . Here’s how you might check for various crimes you might commit in forming this expression. Miscellaneous crimes Some crimes are hard to classify. when m < n.e. Example: concluding x = y from aT x = aT y. when w ∈ R2 . Note: this also violates syntax rules. Note: writing (wwT )−1 = (wT )−1 w−1 .. where w ∈ R2 .• Taking the inverse of a square. y ∈ R4 . Alleging that a set of m vectors in Rn is independent. if AT B T is not a valid product. when m > n. when A is not known to be one-to-one (i. This (false) identity relies on the very useful. it violates semantic rules if A or B is not invertible. when Q ∈ R5×3 . • Cancelling matrices on the left or right in inappropriate circumstances. x. An example Let’s consider the expression (AT B)−1 . • Using (AB)T = AT B T (instead of the correct formula (AB)T = B T AT ). if A or B is not square.g. identity AB = BA.. Alleging that a set of m vectors in Rn span Rn . • Using (A + B)2 = A2 + 2AB + B 2 . but unfortunately false. Incorrect use of a matrix identity often falls in this class. or if A and B are strictly fat. we know that the dimensions of A and B must be the same (ignoring the case where one or the other is interpreted as a scalar).e. m ≥ n. • The product AT B is n × p. so we better have m = k to avoid a syntax violation. • If AT B is a strictly skinny–strictly fat product (i. The point of our analysis above is that if A and B don’t have the same dimension. the matrix AT B can be singular.• We multiply AT . but can be argued to not violate syntax. when B is scalar. i. then AT B cannot possibly be invertible. so we have a semantic violation. Note: if A is a scalar. A and B must have the same dimensions. we must have A and B square or skinny. In a similar way. which is n × m... then AT B might be a strange thing to write. no matter what values A and B might have in your application or argument.e. you can write AT B. and argue that syntax is not violated. Summary: to write (AT B)−1 (assuming neither A nor B is interpreted as a scalar). so we better have n = p in order to (attempt to) invert it. and B. To avoid this. 3 . in which case (AT B)−1 is meaningless. A and B are strictly fat). At this point. then (AT B)−1 is guaranteed to be meaningless. Of course. and are skinny or square. which is k × p. even if A and B have the same dimensions. and be skinny or square. even though m = k. is obtained using [Q. and a bit more sensitive to roundoff errors. for example. it computes A−1 y. The ‘economy’ QR decomposition. There are several ways to compute xls in Matlab. You better be sure here that A is skinny (or square) and full rank. i.0).R]=qr(A. then A\b will generate an error message. with square.. The least-squares approximate solution of Ax = y is given by xls = (AT A)−1 AT y.R]=qr(A) returns the ‘full’ QR decomposition. (The pseudo-inverse is also defined when A is not full rank.e. Yet another method is via a QR decomposition. 1 . We encourage you to verify that you get the same answer (except possibly for some small difference due to roundoff error) as you do using the backslash operator. Here you’ll get an error if A is not full rank.) To find the least-squares approximate solution using the pseudo-inverse. orthogonal Q ∈ Rm×m . which is A† = (AT A)−1 AT when A is full rank and skinny (or square). You can compute the least-squares approximate solution using the economy QR decomposition using. and R ∈ Rm×n upper triangular (with zeros in its bottom part). and full rank.. the backslash operator just solves the linear equations. m ≥ n. which means that Rank(A) = n. this method is a bit less efficient. If A is square (and invertible). You can also use the pseudo-inverse function pinv(). This is the unique x ∈ Rn that minimizes Ax − y . [Q. If A is not full rank.e. Boyd Least squares and least norm in Matlab Least squares approximate solution Suppose A ∈ Rm×n is skinny (or square). Compared with the backslash operator. or fat. In Matlab. in which Q ∈ Rm×n (with orthonormal columns) and invertible R. S.EE263 Prof. you can use xls=pinv(A)*y. i. You can also use the formula to compute the least squares approximate solution: xls=inv(A’*A)*A’*y. and then a least-squares solution will be returned. but you won’t notice it unless m and n are a thousand or so. The simplest method is to use the backslash operator : xls=A\y. otherwise you’ll compute something (with no warning messages) that isn’t the least-squares approximate solution. but it’s not given by the formula above. which computes the pseudo-inverse. You can find the least-norm solution via QR as well: [Q. or if A is skinny.. % compute residual A’*r % compute inner product of columns of A and r and checking that the result is very small.R]=qr(A. xln=Q*(R’\y). The least-norm solution of Ax = y is given by xln = AT (AAT )−1 y. Warning! We should also mention a method that doesn’t work : the backslash operator. since then R will not be invertible.0). Least norm solution Now suppose A ∈ Rm×n and is fat (or square). then A\y gives a solution of the Ax = y. which means that Rank(A) = m.[Q. To be sure you’ve really computed the least-squares approximate solution.) xln=pinv(A)*y.0). not necessarily the least-norm solution. You better be sure here that A is fat (or square) and full rank. (The pseudo-inverse is also defined when A is not full rank. You can use the formula to compute xln : xln=A’*inv(A*A’)*y. If A is fat and full rank. 2 . % compute economy QR decomposition xls=R\(Q’*y). xln has the smallest norm. for example with the commands r=A*x-y. and full rank. You can also use the pseudo-inverse function pinv(). which computes the pseudo-inverse. we encourage you to check that the residual is orthogonal to the columns of A. Among all solutions of Ax = y.e. This will fail if A is not full rank. but it’s not given by the formula above. m ≤ n. which is A† = AT (AAT )−1 when A is full rank and fat or square. You’ll get an error here if A is not full rank. We can compute xln in Matlab in several ways. i. otherwise you’ll compute something (with no warning messages) that isn’t the least-norm solution.R]=qr(A’. This means that you can’t just use the backslash operator: you have to check that what it returns is a solution.. is to use the backslash operator. Second. In particular. then Ax = b has a solution. the easiest way to determine whether Ax = b has a solution. with Rank(A) = r. we have a solution If the second line yields a result that is not very small. To check existence of a solution is the same as checking if b ∈ R(A). how small Ax−b has to be before you accept x as a solution of Ax = b). it’s just good common sense to check numerical computations as you do them. But this method does not give us a solution. and to find such a solution when it does. we conclude that Ax = b does not have a solution. find one. we do not assume that A is full rank. m > n = r). Exactly what A\b returns is a bit complicated to describe in the most general case. % possibly a solution to Ax=b norm(A*x-b) % if this is zero or very small.e. or r + 1. For example. In contrast to the rank method described above.) In Matlab this can be done as follows: x = A\b. but if there is a solution to Ax = b. and if so. when one exists. with A ∈ Rm×n .. when A is skinny and full rank (i. This method also has a hidden catch: Matlab uses a numerical tolerance to decide on the rank of a matrix.. 1 . which is either r (i. if b ∈ R(A).e. you decide on the numerical tolerance you’ll accept (i. Note that executing the first line might cause a warning to be issued. A\b returns the least-squares approximate solution. Existence of solution via rank A simple way to check if b ∈ R(A) is to check the rank of [A b]. A\b sometimes causes a warning to be issued. which in general is not a solution of Ax = b (unless we happen to have b ∈ R(A)). A common test that works well in many applications is Ax − b ≤ 10−5 b . and this tolerance might not be appropriate for your particular application. S.e. then A\b returns one. even when it returns a solution of Ax = b.EE263 Prof. If the two ranks above are equal. (In any case. A couple of warnings: First. This can be checked in Matlab using rank([A b]) == rank(A) (or evaluating the two ranks separately and comparing them). Boyd Solving general linear equations using Matlab In this note we consider the following problem: Determine whether there is a solution x ∈ Rn of the (set of) m linear equations Ax = b. the rank of A) if b ∈ R(A). A\b returns a result in many cases when there is no solution to Ax = b. We consider here the general case. Using the backslash and pseudo-inverse operator In Matlab. where z = P T x. Multiplying both sides of this equation by QT gives the equivalent set of m equations Rz = QT b. unless we have QT b = 0. 2 Actually. We get the 1 corresponding x from x = P z: −1 R1 QT b 1 x=P . Here Q ∈ Rm×m is orthogonal. Using A = QRP T we can write Ax = b as QRP T x = QRz = b. Whew. the equation above becomes R1 z1 = QT b. R ∈ Rm×n is upper triangular. Expanding this into subcomponents gives Rz = R1 R2 0 0 z= QT b 1 QT b 2 . it’s a bit opaque. R1 ∈ Rr×r is upper triangular with nonzero elements along its main diagonal. because 2 the bottom component of Rz is always zero. The zero submatrices in the bottom (block) row of R have m − r rows. Now we have a z that satisfies Rz = QT b: z = [z1 0]T . Then the equations reduce to 2 R1 z1 + R2 z2 = QT b. the construction outlined above is pretty much what A\b does. and P ∈ Rn×n is a permutation matrix. We see immediately that there is no solution of Ax = b. and can be fully explained and understood using material we’ve seen in the course. from which we 1 −1 T get z1 = R1 QT b. 1 a set r linear equations in n variables. provided we have QT b = 0. As with the backslash operator. We start with the full QR factorization of A with column permutations: AP = QR = Q1 Q2 R1 R2 0 0 . 2 . The submatrices have the following dimensions: Q1 ∈ Rm×r .You can also use the pseudo-inverse operator: x=pinv(A)*b is also guaranteed to solve Ax = b. since in general. Here we describe a method that is transparent. and R2 ∈ Rr×(n−r) . 0 This x satisfies Ax = b. With this form for z. Using the QR factorization While the backslash operator is a convenient way to check if Ax = b has a solution. if Ax = b has a solution. We can find a solution of these equations by setting z2 = 0. it doesn’t have to. Now let’s assume that we do have QT b = 0. Q2 ∈ Rm×(m−r) . you have to check that the result satisfies Ax = b. % could also get rank directly from QR factorization .1)]. % construct the submatrices Q1=Q(:. % satisfies Ax=b. [Q.r+1:m). % check if b is in range(A) norm(Q2’*b) % if this is zero or very small.1:r). if b is in range(A) % check alleged solution (just to be sure) norm(A*x-b) 3 . b is in range(A) % construct a solution x=P*[R1\(Q1’*b)..P]=qr(A). we can carry out this construction as follows: [m.n]=size(A).1:r).R.In Matlab.. % full QR factorization r=rank(A). R1=R(1:r. Q2=Q(:. zeros(n-r. e.. Consider the problem of finding a ˆ ˆ ˆ nearest positive semidefinite matrix. . a matrix A = AT 0 that minimizes A − A .) In this case. i. also of rank p. where {q1 .e... . (The Frobenius norm is just the Euclidean norm of the matrix. S. as measured in the Frobenius norm. ˜ ˆ there can be other matrices. ˆ It turns out that the same matrix A is also the nearest rank p matrix to A. i. and λ1 ≥ · · · ≥ λn . with eigenvalue decomposition A= i=1 T λi qi qi . 2 Nearest positive semidefinite matrix n Suppose that A = AT ∈ Rn×n . ˆ A−A ˆ ˆ = Tr(A − A)T (A − A) 1/2 F =  m n i=1 j=1 ˆ (Aij − Aij )2  1/2 . A − A . however. as measured in the Frobenius norm. We are given a matrix A ∈ Rm×n with ˆ rank r. Boyd Low Rank Approximation and Extremal Gain Problems These notes pull together some similar results that depend on partial or truncated SVD or eigenvector expansions. .e. i. 1 Low rank approximation In lecture 15 we considered the following problem. We found that a solution is ˆ A= where A= i=1 p T σi ui vi . A is the unique rank p closest matrix to A. The matrix A need not be the only rank p matrix that is closest to A. written out as a long column ˆ vector. i=1 r T σi ui vi ˆ is the SVD of A. A 1 .EE263 Prof. . qn } are orthonormal. that satisfy A − A = A − A = σp+1 . and we want to find the nearest matrix A ∈ Rm×n with rank p (with p ≤ r). where ˆ ‘nearest’ is measured by the matrix norm. We can also find a direction of minimum gain. If r < n. vr }. namely. zero. they all have the same minimum gain. . 3 Extremal gain problems r Suppose A ∈ Rm×n has SVD A= T σi ui vi . then any unit vector x in N (A) minimizes Ax . V = span{q1 . the span of the right singular vectors associated with the p largest singular values. .. . In other words. vp }. on which A has the largest possible minimum gain. ˆ As you might guess. to get a nearest positive semidefinite matrix. we have QT Q = Ip . v1 defines a direction of maximum gain for A. then the vector vn minimizes Ax among all vectors of norm one. as measured in the Frobenius norm. . . . 2 . then any subspace of dimension p intersects the nullspace of A. To define a subspace of dimension p we use an orthonormal basis. If r = n. The matrix A is sometimes called the positive semidefinite part of A. the matrix A is also the nearest positive semidefinite matrix to A. We define the minimum gain of A ∈ Rm×n on V as min{ Ax | x ∈ V. The solution is what you’d guess. 0}qi qi . Defining Q = [q1 · · · qp ]. So when p > r you can take V as any subspace of dimension p. The minimum gain of A on this subspace is σp . you simply remove the terms in the ˆ eigenvector expansion that correspond to negative eigenvalues. .solution to this problem is ˆ A=A= n T max{λi . Assuming r = n (i. . i=1 Thus. . x = 1}. where Ip is the p × p identity matrix. one such subspace is V = span{vr−p+1 . A has nullspace {0}). We can also find a subspace V of dimension p that has the smallest maximum gain of A. and therefore has minimum gain zero. We can put state these results in a more concrete form using matrices. . i=1 You already know that v = v1 maximizes Ax over all x with norm one. provided p ≤ r: V = span{v1 . . We can express the minimum gain of A on V as σmin (AQ). . the span of the right singular vectors associated with the p smallest singular values. Let V be a subspace of Rn . If p > r. x = 1}. defined as max{ Ax | x ∈ V. qp }.e. We can then pose the question: find a subspace of dimension p. These results can be extended to finding subspaces on which A has large or small gain. and we assume k ≤ n. The related maximization problem is maximize Tr(X T AX) subject to X T X = Ik .) A solution of this problem is X = [qn−k+1 · · · qn ]. is x = qn . and {q1 . . where the variable is x ∈ Rn . . 4 Extremal trace problems T Let A ∈ Rn×n be symmetric. and Ik denotes the k × k identity matrix. with variable x ∈ Rn . with eigenvalue decomposition A = n λi qi qi . Now consider the following generalization of the first problem: minimize Tr(X T AX) subject to X T X = Ik . with variable X ∈ Rn×k . A solution of this problem is X = [q1 · · · qk ]. Note that when k = 1. (The constraint means that the columns of X are orthonormal. 3 . . with λ1 ≥ i=1 · · · ≥ λn . A solution to this problem is x = q1 . where the variable is X ∈ Rn×k . this reduces to the first problem above.The problem of finding a subspace of dimension p that maximizes the minimum gain of A can be stated as maximize σmin (AQ) subject to QT Q = Ip . One solution to this problem is Q = [v1 · · · vp ]. The related maximization problem is maximize xT Ax subject to xT x = 1. You know that a solution of the problem minimize xT Ax subject to xT x = 1. qn } orthonormal. . . so it’s not that simple! Finally.1 3 Plot Si and p as a function of t. γ. we consider a simple power control update algorithm. starting from various initial (positive) power levels. .2 State equations for a linear mechanical system. In this problem. First some background.. . and S are discrete-time signals. Various power control algorithms are used to adjust the powers pi to ensure that Si ≥ γ (so that each receiver can receive the signal transmitted by its associated transmitter).2 . one or two dB). 1. Thus the quantities p. 2. . α = 1. Use the problem data   1 . S. The equations of motion of a lumped mechanical system undergoing small motions can be expressed as M q + Dq + Kq = f ¨ ˙ 1 . we get to the problem. Describe A and b explicitly in terms of σ.1 A simple power control algorithm for a wireless network. α and the components of G. if the interference plus noise term were to stay the same. The noise plus interference power at receiver i is given by qi = σ + Gij pj j=i where σ > 0 is the self-noise power of the receivers (assumed to be the same for all receivers).e. Repeat for γ = 5. What we’d like is Si (t) = si (t)/qi (t) = αγ.1 G =  . q. Transmitter i transmits at power level pi (which is positive). denoted by t = 0. Boyd EE263 homework problems Lecture 2 – Linear functions and examples 2. (b) matlab simulation. for example. Comment: You’ll soon understand what you see. and compare it to the target value αγ. (a) Show that the power control algorithm (1) can be expressed as a linear dynamical system with constant input. But unfortunately. σ = 0. We consider a network of n transmitter/receiver pairs. The signal power at receiver i is given by si = Gii pi . Comment briefly on what you observe. (1) This scales the power at the next time step to be the power that would achieve Si = αγ. . where α > 1 is an SINR safety margin (of. The path gain from transmitter j to receiver i is Gij (which are all nonnegative. i. Use matlab to simulate the power control algorithm (1). and Gii are positive).01. changing the transmit powers also changes the interference powers. For signal reception to occur. The signal to interference plus noise ratio (SINR) at receiver i is defined as Si = si /qi .1  . the SINR must exceed some threshold value γ (which is often in the range 3 – 10). 2.EE263 Autumn 2011-12 Prof. where A ∈ Rn×n and b ∈ Rn are constant.. in the form p(t + 1) = Ap(t) + b. Note that increasing pi (t) (power of the ith transmitter) increases Si but decreases all other Sj .2. so for example p3 (5) denotes the transmit power of transmitter 3 at time epoch t = 5. The powers are all updated synchronously at a fixed time interval.3 . A very simple power update algorithm is given by pi (t + 1) = pi (t)(αγ/Si (t)). γ = 3.1 2 . since the current output is a linear combination of (i. use state You decide on an appropriate state vector for the ARMA model. and is used in many fields (e. multi-output time-series models (i. or give an explicit counterexample. econometrics). if A ∈ Rm×n is another matrix such that f (x) = Ax for all ˜ x ∈ Rn ..g. Another model is given by y(k) = u(k) + b1 y(k − 1) + · · · + bp y(k − p). bi to be matrices. and then verify that f (x) = Ax for any x ∈ Rn . Let u and y be two time series (input and output. y(k − p)  and for the AR model. use state   u(k − 1)   . is called a moving average (MA) model. a function from Z+ into R.e. then do we have A = A? Either show that this is so. The study of time series predates the extensive study of state-space linear systems. i. Assuming M is invertible. . f (x) = Ax. i. 2. k ∈ Z. We think of u(k) as the value of the signal or quantity u at time (or epoch) k. and the set of variables over which we average ‘slides along’ with time. Another widely used model is the autoregressive moving average (ARMA) model.. sequences) related via convolution: y(k) = j hj u(k − j). . u(k − r)  y(k − 1)   . .. You can assume that the convolution is causal. . hj = 0 when j < 0. u(k) ∈ Rm .e.5 Some linear functions associated with a convolution system. with state x= input u = f . write linear system equations for the mechanical system. Suppose that f : Rn −→ Rm is linear. and output y = q.e.. 2 . 2. Show that there is a matrix A ∈ Rm×n such that for all x ∈ Rn . damping.e. This model is called an autoregressive (AR) model.) Is the matrix A ˜ ˜ that represents f unique? In other words. Suppose that u and y are scalar-valued discrete-time signals (i. and K are the mass. For the MA model.4 Representing linear functions as matrix multiplication. Finally. since the output at time k is a weighted average of the previous r inputs.) Remark: multi-input. We recommend you choose a state for the ARMA model that makes it easy for you to derive the state equations. D. respectively). The relation (or time series model) y(k) = a0 u(k) + a1 u(k − 1) + · · · + ar u(k − r) q q ˙ ..e. A time series is just a discrete-time signal.where q(t) ∈ Rk is the vector of deflections. where hk ∈ R. regression on) the current input and some previous values of the output. and f (t) ∈ Rk is the vector of externally applied forces. (There are many possible choices for the state here.3 Some standard time-series models. (Explicitly describe how you get the coefficients Aij from f . 2. x(k) =  .. and stiffness matrices. x(k) =  . the problem: Express each of these models as a linear dynamical system with input u and output y. which combines the MA and AR models: y(k) = b1 y(k − 1) + · · · + bp y(k − p) + a0 u(k) + · · · + ar u(k − r). M . even with different dimensions. y(k) ∈ Rp ) are readily handled by allowing the coefficients ai . respectively. 6 Matrix representation of polynomial differentiation. . . . p(x) = an−1 xn−1 + an−2 xn−2 + · · · + a1 x + a0 . . . . . as the vector (a0 . and Y gives (a chunk of) the resulting future output.   . 2. . Find the matrix D that represents D (i. .    . . Give the structure a reasonable.9 Matrices and signal flow graphs. an−1 ) ∈ Rn . . . y(N )  y(t) = C(t)x(t) + D(t)u(t). u(N )  The matrix G shows how the output at t = 0. if the coefficients of p are given by a. . We can represent a polynomial of degree less than n. . u(−N ) y(N ) Here U gives the past input to the system. The matrix T describes the linear mapping from (a chunk of) the input to (a chunk of) the output. u(N ) y(N ) Thus U and Y are vectors that give the first N + 1 values of the input and output signals. (a) Find A ∈ R2×2 such that y = Ax in the system below: 3      = G     . . T is called the input/output or Toeplitz matrix (of size N + 1) associated with the convolution system. yi depends only on xj for j even. For i odd. . 2. H is called the Hankel matrix (of size N + 1) associated with the convolution system.7 Consider the (discrete-time) linear dynamical system x(t + 1) = A(t)x(t) + B(t)u(t).  . (b) The Hankel matrix. i. Y =  .  . u(N ).. respectively. . Now assume that u(k) = 0 for k > 0 or k < −N and let     u(0) y(0)  u(−1)   y(1)      U = . N depends on the initial state x(0) and the sequence of inputs u(0).  x(0) u(0) .8 Some sparsity patterns. Draw a block diagram of y = Ax for A tridiagonal.(a) The input/output (Toeplitz) matrix. . suggestive name. (a) A matrix A ∈ Rn×n is tridiagonal if Aij = 0 for |i − j| > 1. 2. Dp = dp/dx.  . then the coefficients of dp/dx are given by Da). yi depends only on xj for j odd. . (b) Consider a certain linear mapping y = Ax with A ∈ Rm×n . Y =  . Find a matrix G such that      y(0) y(1) . Find the matrix T such that Y = T U . a1 .e. and define     u(0) y(0)  u(1)   y(1)      U =  .e. .. Find the matrix H such that Y = HU . Describe the sparsity structure of A. . 2. . Similarly. Consider the linear transformation D that differentiates polynomials.  . Assume that u(k) = 0 for k < 0. . for i even. . and Aii = 0 since there are no self loops. by expressing the matrix B in terms of A from the previous part (explaining why they are related as you claim).5 2 2 + y2 2 2 z1 . To describe the allowed symbol transitions we can define a matrix A ∈ Rn×n by Aij = 1 if symbol i is allowed to follow symbol j . A sentence is a finite sequence of symbols. 2. . We consider a language or code with an alphabet of n symbols 1. For each symbol we give a set of symbols which are allowed to follow the symbol. We can intrepret Aij (which is either zero or one) as the number of branches that connect node i to node j. not in the language).. (b) Consider the Markov language with five symbols 1. Give an interpretation of Bij in terms of the language. . and the following transition rules: • 1 must be followed by 2 or 3 • 2 must be followed by 2 or 5 4 . . Let B = Ak . For n = 4. symbol 2 must be followed by 3.12 Counting sequences in a language or code. 3. all branches connect two different nodes). n.e.5 . . k1 .e.5 x2 (b) Find B ∈ R2×2 such that z = Bx in the system below: x1 . n}. in the language).11 Counting paths in an undirected graph. Find the matrix A for the mass/force example in the lecture notes. (You might need to use the concept of a path of length m from node p to node q. 2. defined as 1 if there is a branch from node i to node j Aij = 0 if there is no branch from node i to node j Note that A = AT . and second.. and no self loops (i. km where ki ∈ {1. 2. find a specific input force sequence x that moves the mass to final position 1 and final velocity zero. 0 if symbol i is not allowed to follow symbol j (a) Let B = Ak . . . A language or code consists of a set of sequences. Give a simple interpretation of Bij in terms of the original graph. 3. consider a Markov language with three symbols 1. 2. which we will call the allowable sequences. 4.5 x2 + + + + z2 Do this two ways: first. . Consider an undirected graph with n nodes. where k ∈ Z. . 2. 5. The sentence 1132313 is allowable (i. k ≥ 1.e.) 2. the sentence 1132312 is not allowable (i. .. .x1 2 y1 0. As a simple example.10 Mass/force example. .5 . and symbol 3 can be followed by 1 or 2. A language is called Markov if the allowed sequences can be described by giving the allowable transitions between consecutive symbols. by directly evaluating all possible paths from each xj to each zi . Let A ∈ Rn×n be the node adjacency matrix. Symbol 1 can be followed by 1 or 3. 14 Communication over a wireless network with time-slots. . transmissions can occur only over edges assigned to time-slot k.Find the total number of allowed sentences of length 10. 3. with five symbols 1.e. and so on. . K}. 1 k=2 2 • 3 must be followed by 1 • 4 must be followed by 4 or 2 or 5 • 5 must be followed by 1 or 3 k=1 k=3 k=1 4 k=2 5 3 . Do not hesitate to use matlab. In principle you could solve this problem by writing down all allowed sequences of length 10. . 4. the message can be sent across any edge that is active at period t. . . or transmitted across any edge emanating from node i and active at time period t + 1. only the edges assigned to time-slot 2 can transmit a message. Explain clearly how you solve the problem. you must explain how you solve the problem. with n = 4 nodes. . 2. and the following symbol transition rules: • 1 must be followed by 2 or 3 • 2 must be followed by 2 or 5 • 3 must be followed by 1 • 4 must be followed by 4 or 2 or 5 • 5 must be followed by 1 or 3 Among all allowed sequences of length 10.) But we’d like you to use a smarter approach. A message or packet can be sent from one node to another by a sequence of transmissions from node to node. and k ∈ {1. the edges assigned to time-slot 1 are again active. as well as giving the specific answer. . After time period t = K. and K = 3 time-slots. all symbol transitions are allowed). only the edges assigned to time-slot 1 can transmit a message. Each edge is assigned to one of K time-slots. . . which are labeled 1. labeled 1. an edge from node j to node i means that node j can transmit a message directly to node i. . We consider a network with n nodes. A directed graph shows which nodes can send messages (directly) to which other nodes. and can be stored there. . find the most common value for the seventh symbol. the edges assigned to time-slot 2 are active. (We’re interested in the symbol for which this count is largest. This cycle repeats indefinitely: when t = mK + k. Compare this number to the simple code that consists of all sequences from the alphabet (i. To make sure the terminology is clear. specifically.13 Most common symbol in a given position. 5. for i = 1. n. K. at time period t = 2. 5. 2. Two possible transmissions are assigned to different time-slots if they would interfere with each other. we consider the very simple example shown below. at t = K + 2. At time period t = K + 1.. At time period t. Consider the Markov language from exercise 12. . where m is an integer. At time period t = 1. etc. . In addition to giving the answer. we mention some reasons why the possible transmissions are assigned to time-slots. . . It is also possible to store a message at a node during any time period. Although it doesn’t matter for the problem. and counting how many of these have symbol i as the seventh symbol. If a message is sent from node j to node i in period t. then in period t + 1 the message is at node i. 2. presumably for transmission during a later period. or if they would violate some limit (such as on the total power available at a node) if the transmissions occurred simultaneously. the pattern repeats. and K = 3 time-slots. and assigned to time-slot k 0 if transmission from node j to node i is never allowed Aij =  0 i = j. even when the message is transmitted to another node. we assume that once the message reaches a node. we are interested in getting a message that starts at a particular node. 0 3 0 0 The problems below concern the network described in the mfile ts_data. to all others. Be sure that transmissions only occur during the associated time-slots. i.. every node that has the message forwards it to all nodes it is able to transmit to. store it at node 4. This choice has no significance. we allow multi-cast: if during a time period there are multiple active edges emanating from a node that has (a copy of) the message. a copy is kept there.m. = 2). At each time period. 6 . the message is available at the node to be transmitted along any active edge emanating from that node. then transmission can occur during that time period across all (or any subset) of the active edges. so there is no harm is assuming that at each time period. as follows:   k if transmission from node j to node i is allowed. In this part of the problem. = 2). You can check that at each period.e. (a) Minimum-time point-to-point routing. transmit it to node 2. Finally. assigned to the associated time-slot. transmit it to node 3. = 1). we can send a message that starts in node 1 to node 3 as follows: • • • • • During During During During During period period period period period t=1 t=2 t=3 t=4 t=5 (time-slot (time-slot (time-slot (time-slot (time-slot k k k k k = 1). Find the fastest way to get a message that starts at node 5. the transmission used is active. consider the simple example described above. In this part of the problem. The labeled graph that specifies the possible transmissions and the associated time-slot assignments are given in a matrix A ∈ Rn×n . To illustrate this encoding of the graph. you can store a message at any node in any period. the problem. Thus. and not the simple example given above. store it at node 1. Moreover. as well as a description of your approach and method. as in the example above. = 3). Note that we set Aii = 0 for convenience. to node 18.m. at any future period. we have   0 0 0 1  2 0 1 0   Aexample =   0 0 0 2 . Give your solution as a prescription ordered in time from t = 1 to t = T (the last transmission). We consider a specific network with n = 20 nodes. What is the minimum time it takes before all nodes have a message that starts at node 7? For both parts of the problem. with edges and time-slot assignments given in ts_data. For this example. (b) Minimum time flooding. and we attach no cost to storage or transmission. You only need to give one prescription for getting the message from node 5 to node 18 in minimum time. The sequence of transmissions (and storing) described above gets the message from node 1 to node 3 in 5 periods.In this example. give the transmission (as in ‘transmit from node 7 to node 9’) or state that the message is to be stored (as in ‘store at node 13’). you must give the specific solution. transmit it to node 4. ) 2. (c) f (x) = xT Ax.e. Aii = 0 for all i. (Example: y = mx+b is described as ‘linear’ in US high schools. We consider a directed graph with n nodes. and take y ∈ Rn/2 . For simplicity we do not allow self-loops. 2. (This representation is unique: for a given affine function f there is only one A and one b for which f (x) = Ax + b for all x. Express the gradients using matrix notation. For i odd. (Without the restriction α + β = 1. even though in general they are not.15 Gradient of some common functions. A34 = 1 means there is an edge from node 4 to node 3. We consider x ∈ Rn as a signal. i. b ∈ R. is defined as the vector  ∂f  ∂x1 where the partial derivatives are evaluated at the point x.) Hint. We assume here that n is even. Show that the function f (x) = Ax + b is affine. . yi = x(i+1)/2 . with xi the (scalar) value of the signal at (discrete) time period i. at a point x ∈ Rn . (c) 2× down-sampling with averaging.) (a) Suppose that A ∈ Rm×n and b ∈ Rm .e. (b) f (x) = xT Ax. this is a special case of the previous one. For z near x. . that produce a new signal y (whose dimension varies). Below we describe several transformations of the signal x. . For each one... is given by ˆ ftay (z) = f (x) + ∇f (x)T (z − x). For i even. or informally) called linear. where a ∈ Rn . We take y ∈ R2n−1 . 1 ≤ i ≤ n.17 Affine functions. The graph is specified by its node adjacency matrix A ∈ Rn×n . and take y ∈ Rn/2 . Show that the function g(x) = f (x) − f (0) is linear. Find the gradient of the following functions. . This function is affine. (Yes. plus an offset. with yi = (x2i−1 + x2i )/2. (a) 2× up-conversion with linear interpolation. ∂f ∂xn  . Recall that the gradient of a differentiable function f : Rn → R. the Taylor approximation ˆ ftay is very near f . A function f : Rn → Rm is called affine if for any x. where A = AT ∈ Rn×n . defined as Aij = 1 if there is an edge from node j to node i 0 otherwise. (b) 2× down-sampling. We assume here that n is even.2. a linear function plus a constant. this would be the definition of linearity. affine functions are (mistakenly. In some contexts.18 Paths and cycles in a directed graph. near x.. β ∈ R with α + β = 1. (b) Now the converse: Show that any affine function f can be represented as f (x) = Ax + b. Note that the edges are oriented. You can think of an affine function as a linear function. i. y ∈ Rn and any α. yi = (xi/2 + xi/2+1 )/2. for some A ∈ Rm×n and b ∈ Rm . for A ∈ Rn×n . (a) f (x) = aT x + b.  ∇f (x) =  . for i = 1. The first order Taylor approximation of f .16 Some matrices from signal processing. find a matrix A for which y = Ax. i.e. Roughly speaking. we have f (αx + βy) = αf (x) + βf (y). 7 . n. this operation doubles the sample rate. A simple example illustrating this notation is shown below. inserting new samples in between the original ones using linear interpolation.) 2. with yi = x2i . . . We’ll denote the predicted value of z(t + 1) by z (t + 1). In other words. . i. .s0 = 1. 1.e.) (b) What is the length of a shortest path from node 13 to node 17? (If there are no paths from node 13 to node 17.1 2 4 The node adjacency matrix for this example is  3 In this example. . .. with the same starting and ending node. j = 1. .. The rest of this problem concerns a specific graph. Your solution (which includes what you can say about A and B. 2. 1. for i.sl−1 = 1.sk = 1 for k = 0. For example. 2. 0  1 A=  0 0 0 0 1 0 0 1 0 1  1 0  . (a) What is the length of a shortest cycle? (Shortest means minimum length. j which maximize the number of paths of length 10 from i to j. . ˆ 8 si = sj for i = j. z(t). there is an edge from 2 to 3 and also an edge from 3 to 2. find i. j = 0. . Suppose a matrix A ∈ Rn×n . in the graph shown above. that does not pass through node 3? (d) What is the length of a shortest path from node 13 to node 17. n. . find the most common starting node. i. a cycle is a sequence of nodes of the form s0 . s0 . . find the most common ending node. s1 . or predict. As0 . 1. . Aij ≥ 0. and then the justification. . .20 Quadratic extrapolation of a time series. z(t − 1). We are given a series z up to time t. i. and For example. .19 Element-wise nonnegative matrix and inverse. with Ask+1 . (g) Among all paths of length 10. l − 1. sl−1 .e. A path of length l > 0 from node j to node i is a sequence s0 = j. In other words. with As1 .m on the course web site. l − 1. with no repeated nodes other than the endpoints. and z(t − 2). 0  0 . and its inverse B. .) (c) What is the length of a shortest path from node 13 to node 17. given in the file directed_graph. 3. . More precisely.. you can give the answer as ‘infinity’. . enclosed in a box). Bij ≥ 0. s1 . 2. 3. 4. . 1 is a cycle of length 4. have all their elements nonnegative. . find the most common pair of starting and ending nodes. . that does pass through node 9? (e) Among all paths of length 10 that start at node 5. (f) Among all paths of length 10 that end at node 8. nodes 2 and 3 are connected in both directions. For each of the following questions. What can you say must be true of A and B? Please give your answer first. you must give the answer explicitly (for example. A cycle of length l is a path of length l. we want to extrapolate. 2. sl = i of nodes. . . z(t + 1) based on the three previous elements of the series. you will ˆ find z (t + 1) as follows. Using a quadratic model. in the graph shown above.. As2 . 2 is a path of length 3. You must also explain clearly how you arrived at your answer.s1 = 1. as well as your justification) must be short. and f (t − 2) = z(t − 2). . In other words..1*sin(2*t .22 Show that a + b ≥ a − b . n of Y . z = 5*sin(t/10 + 2) + 0. You can assume that all matrices mentioned have appropriate dimensions. t = 1:1000. Show that ˆ   z(t) z (t + 1) = c  z(t − 1)  . . Here is an example: “Every column of C is a linear combination of the columns of B” can be expressed as “C = BF for some matrix F ”. 2. . .21 Express the following statements in matrix language. . (d) Each column of P makes an acute angle with the corresponding column of Q. 1000. which is given by (1/997) 1000 2 z j=4 (ˆ(j) − z(j)) 1000 (1/997) j=4 z(j)2 1/2 (b) Use the following matlab code to generate a time series z: . 1 and 2. . Then the extrapolated value is given by z (t + 1) = f (t + 1). one is good enough for us. 9 . (c) Each column of P makes an acute angle with each column of Q. . Find the ˆ relative root-mean-square (RMS) error. f (t − 1) = z(t − 1).1*sin(t) + 0. (a) For each i. Use the quadratic extrapolation method from part (a) to find z (t) for t = 4.5). There can be several answers. . 2. . ). Find c explicitly. (b) W is obtained from V by permuting adjacent odd and even columns (i. row i of Z is a linear combination of rows i. ˆ z(t − 2) where c ∈ R1×3 . (e) The first k columns of A are orthogonal to the remaining columns of A. the quadratic extrapolator is a linear function. and does not depend on t. 3 and 4. .e.(a) Find the quadratic function f (τ ) = a2 τ 2 + a1 τ + a0 which satisfies f (t) = z(t). . medium. and give an interpretation (in one or two sentences). M.e. Human color perception is based on the responses of three different types of color light receptors.. and. where Df is the derivative or Jacobian of f . etc.). and li . left and right shoes. with different cone response vectors perceived as different colors. and the vector x of relative price changes. In this problem we will divide the visible spectrum into 20 bands. etc.2 Color perception. where δq ≈ Df (p∗ )δp. respectively. train tickets and bus tickets.g. but the basic idea is right. mi and si are nonnegative constants that describe the spectral response of the different cones. The three types of cones have different spectral response characteristics and are called L. the vector (Lcone . Now suppose there is a small price change δp. automobiles and gasoline. S because they respond mainly to long. what can you say about Eij and Eji ? (c) Suppose the price elasticity of demand matrix for two goods is E= −1 −1 −1 −1 . yi = we have the linear model y = Ex. Mcone . where p is the price vector.1 Price elasticity of demand. where pi is the incident power in the ith wavelength band. called cones. The current price and demand are denoted p∗ and q ∗ . Here are the questions: (a) What is a reasonable assumption about the diagonal elements Eii of the elasticity matrix? (b) Goods i and j are called substitutes if they provide a similar service or other satisfaction (e. Scone ). cake and pie. p∗ j Describe the nullspace of E. The perceived color is a complex function of the three cone responses. For each of these two generic situations. ∂pj This is usually rewritten in term of the elasticity matrix E. Scone = i=1 si pi .) 10 .g. and short wavelengths. with entries Df (p∗ )ij = ∂fi ∗ (p ). and model the cones’ response as follows: 20 20 20 Lcone = i=1 li p i .Lecture 3 – Linear algebra review 3. (Actual color perception is a bit more complicated than this. They are called complements if they tend to be used together (e.. This induces a change in demand. to q ≈ q ∗ + δq. and f : Rn → Rn is the demand function. so p = p∗ + δp. i.). What kind of goods could have such an elasticity matrix? 3. qi xj = δpj . Defining the vector y of relative demand changes.. ∂pj 1/p∗ j so Eij gives the relative change in demand for good i per relative change in price j. respectively. δqi ∗ . with entries Eij = ∗ ∂fi ∗ 1/qi (p ) . The demand for n different goods is a function of their prices: q = f (p). q is the demand vector. Mcone = i=1 mi p i . an observer is shown a test light and is asked to change the intensities of three primary lights until the sum of the primary lights looks like the test light. where u. where i = 1. • If AB is full rank then A and B are full rank. These issues can be handled using the material of EE364a. Find weights that achieve the match or explain why no such weights exist. either show that it is true. explain why.e. . however.3 Halfspace. G. An object’s surface can be characterized by its reflectance (i.e. the observer is asked the find a spectrum of the form pmatch = a1 u + a2 v + a3 w. Who is right? If Sally is right. visually indistinguishable? (Visually identical ˜ lights with different spectral power compositions are called metamers.(a) Metamers. . The data for this problem is in color perception. and the reflectance of the object is ri (which is between 0 and 1). Suppose a. • If A and B are onto. then so is AB. In a color matching problem. c. For each of the following statements. L coefficients. p and p. If the object is illuminated with a light spectrum characterized by Ii . and draw a picture showing a. the fraction of light it reflects) for each band of wavelengths. As a trivial example. that is visually indistinguishable from a given test light spectrum ptest . which the material of EE263 doesn’t address. In other words. G phosphor. and test light. 3.m. If Beth is right give an example of two objects that appear identical under one light source and different under another. B phosphor. say an incandescent bulb and sunlight. • If A and B are full rank then AB is full rank. Show that the set of points in Rn that are closer to a than b is a halfspace. Remark. Can this always be done? Discuss briefly. and B. Sally argues that if the two objects look identical when illuminated by a tungsten bulb. w are the spectra of the primary lights. they will look identical when illuminated by sunlight.: {x | x − a ≤ x − b } = { x | cT x ≤ d} for appropriate c ∈ Rn and d ∈ R. Some of the false statements above become true under certain assumptions on the dimensions of A and B. . 20 denotes the wavelength band. Spectra. Running color perception will define and plot the vectors wavelength. R phosphor. and reflectances are all nonnegative quantities. • If A and B have zero nullspace. and ai are the intensities to be found. You can assume that A ∈ Rm×n and B ∈ Rn×p . or give a (specific) counterexample. Give c and d explicitly. You can use the vectors sunlight and tungsten defined in color perception. R. b ∈ Rn are two given points. intensities. (d) Effects of illumination. When are two light spectra. Beth disagrees: she says that two objects can appear identical when illuminated by a tungsten bulb. . v. S coefficients. M coefficients.) (b) Visual color matching. (c) Visual matching with phosphors.m as the light sources. but look different when lit by sunlight.4 Some properties of the product of two matrices. A computer monitor has three phosphors. i. Now consider two objects illuminated (at different times) by two different light sources. It is desired to adjust the phosphor intensities to create a color that looks like a reference test light. So just ignore this while doing this problem. then the reflected light spectrum is given by Ii ri . all of the statements 11 . and the halfspace. then so does AB. 3. b.. y = x − a . (Matlab may still be useful. . (a) Show that the linearized model near x0 can be expressed as δy = k T δx.) (c) Show that every x ∈ Rn can be expressed uniquely as x = v + v ⊥ where v ∈ V. Express V and V ⊥ in terms of the matrix V = v1 v2 · · · vr ∈ Rn×r using common terms (range.. or you can use more informal language such as “A and B are both skinny. and B ∈ R5×7 has rank 3. ..00005. ∀y ∈ V } . To prove this bound you can proceed as follows: • Show that η = −1 + 1 + α2 + 2β − β where β = k T δx/ x0 − a .. of course. and also draw a picture (for n = 2) to demonstrate it. You can use matlab to verify the ranks. i. a specific A and B with the dimensions and ranks given above.. What values can Rank(AB) possibly have? For each value r that is possible. You will do that here. that the absolute value of the relative error is very small provided δx is small. we have η ≤ 0. Please try to give simple examples. that make it easy for you to justify that the ranks of A.e. Consider a single (scalar) measurement y of the distance or range of x ∈ Rn to a fixed point or beacon at a. For example. Try to find the most general conditions you can. where k is the unit vector (i.e. the linearized model is accurate to about 0. Hint: let v be the projection of x on V. e = x0 + δx − a − x0 − a − k T δx.” 3. for which Rank(AB) = r. In fact you will prove that 0≤η≤ α2 2 where α = δx / x0 − a is the relative size of δx. (b) Suppose V is described as the span of some vectors v1 . B. i. m. i.005%.5 Rank of a product. (b) Consider the error e of the linearized approximation.. with length one) pointing from a to x0 . V ⊥ = { x | x. v2 . i.) Explain briefly why the rank of AB must be one of the values you give. etc. We know.7 Orthogonal complement of a subspace.e. give an example.above are true when A and B are scalars. Suppose that A ∈ R7×5 has rank 4. but we don’t recommend it: numerical roundoff errors in matlab’s calculations can sometimes give errors in computing the rank. By maximizing and minimizing g over the interval −α ≤ β ≤ α show that 0≤η≤ α2 . 2 3.e. Derive this analytically. vr . y = 0. If V is a subspace of Rn we define V ⊥ as the set of vectors orthogonal to every element in V. for example.e. nullspace. • Verify that |β| ≤ α.e. You can give your conditions as inequalities involving n. The relative error of the approximation is given by η = e/ x0 − a . and p that make them true. For each of the statements above. . for a relative displacement of α = 1%.6 Linearizing range measurements. transpose.. . and p. 3. and AB are what you claim they are. .. find conditions on n. In many specific applications. 12 (a) Verify that V ⊥ is a subspace of Rn . i. n = m = p = 1. • Consider the function g(β) = −1 + 1 + α2 + 2β − β with |β| ≤ α. you just have to double check that the ranks it finds are correct.e. to derive a bound on how large the error can be. v ⊥ ∈ V ⊥ . m. it is possible and useful to make a stronger statement. i. i. 3.e. y ∈ Zn and α ∈ Z2 . Note that the vector y is ‘redundant’.. Find the conditions under which A has full rank. Although we won’t consider them in this course. for example Z2 . and m > n.9 Suppose that (Ax. (d) When does equality hold? 3. in terms of the relative positions of the unknown coordinates and the beacons). roughly speaking we have coded an n-bit vector as a (larger) m-bit vector. i. range.e.. You will prove the Cauchy-Schwarz inequality. indeed. Linear block codes. x = x.. and for all λ ∈ R. the components of vectors. will usually be the real numbers. This holds if and ˆ only if H is a left inverse of G. and all the operations in the matrix multiplication are Boolean.8 Consider the linearized navigation equations from the lecture notes. i. (e) Show that V ⊆ U implies U ⊥ ⊆ V ⊥ . √ (a) Suppose a ≥ 0. Suppose x ∈ Zn is a Boolean vector we wish to transmit over an 2 m×n is the coding unreliable channel. In a linear block code. independence and rank are all defined in the obvious way for vector spaces over Z2 . f (x) = Ax. HG = In ..10 Proof of Cauchy-Schwarz inequality. 3. v = 0. Consider n = 3. the received signal y is given by ˆ y = y + v. Either design a one-bit error correcting linear block ˆ code with the smallest possible m.e. the vector y = Gx is formed. (a) What is the practical significance of R(G)? (b) What is the practical significance of N (H)? (c) A one-bit error correcting code has the property that for any noise v with one component equal to one. Concepts like nullspace. (By design we mean. where H ∈ Zn×m .11 Vector spaces over the Boolean field.e. we define a function f : Zn → Zm to be linear (over Z2 ) if 2 2 2 f (x + y) = f (x) + f (y) and f (αx) = αf (x) for every x. i. i. which we assume to be the case.. and sometimes the complex numbers. where A ∈ Z2 Boolean matrix. 3. m) code. A vector in Zn is called a Boolean vector. when vi = 1. has only two elements. the field Z2 is finite. then the decoding is perfect. This is called an (n. we still have x = x. In this problem you will explore one simple example: block codes. (b) Given v. with Boolean addition and multiplication (i. i. which consists of the two numbers 0 and 1. In a linear decoder..) 13 . to get the CauchySchwarz inequality: √ √ |v T w| ≤ v T v wT w. x and Ax always point in the same direction.e. i. w ∈ Rn explain why (v + λw)T (v + λw) ≥ 0 for all λ ∈ R. there are many important applications of vector spaces and linear dynamical systems over Z2 .. the received signal is multiplied by another matrix: x = H y . Describe the conditions geometrically (i. The coded vector y is transmitted over the channel.e.(d) Show that dim V ⊥ + dim V = n. It is also possible to consider vector spaces over other fields.e. Unlike R or C. What can you say about the matrix A? Be very specific. This means that when vi = 0. the ith bit is transmitted correctly.e. a + 2bλ + cλ2 ≥ 0. ˆ where v is a noise vector (which usually is zero). x) = 0 for all x ∈ Rn . In this course the scalar field. or explain why it cannot be done. For example. Much of the linear algebra for Rn 2 and Cn carries over to Zn .. (c) Apply (a) to the quadratic resulting when the expression in (b) is expanded. where G ∈ Z2 matrix. give G and H explicitly and verify that they have the required properties. c ≥ 0. It is easy to show that 2 m×n is a every linear function can be expressed as matrix multiplication. 1 + 1 = 0). Show that |b| ≤ ac. One reasonable requirement is that ˆ ˆ 2 if the transmission is perfect.e. in Z2 .. the ith bit is changed during transmission. Bij = 0 for i < j. (We’ll be very angry if we have to type in your 5 × 3 matrix into matlab to check it.12 Right inverses. if f is any unbiased estimator. explain why. We consider the standard measurement setup: y = Ax + v. y ∈ Rm is the vector of measurements we take. but no linear unbiased estimator exists. Explain in the general case why a linear unbiased estimator exists whenever there is a nonlinear one. (d) The second and third rows of B are the same. then we say that f is an unbiased estimator (of x. its rank. In other words: it’s possible to have a nonlinear unbiased estimator. Pick the statement that is true. You may not assume anything about the dimensions of A. (a) The second row of B is zero. or explain why there is no such B. its rank is 3). This problem concerns the specific  −1 0 A= 0 1 1 0 matrix 0 −1 1 0 0 1  1 0 . and also explain why no linear unbiased estimator exists. If you believe this statement.) When there is no right inverse with the given property. then f returns the true parameter value x. briefly explain why there is no such B. If the function f : Rm → Rn satisfies f (Ax) = x for all x ∈ Rn . (b) The nullspace of B has dimension one.. For each of the cases below. C. then f must be a linear function. One of the following statements is true. In fact.e. i. but you don’t need them. there are far better nonlinear ones. 14 . and justify it completely. then give a specific example of a matrix A. f is not linear). What this means is that if f is applied to our measurement vector. so there exists at least one right inverse.. Here. But whenever there is a nonlinear unbiased estimator. we allow the possibility that f is nonlinear (which we take to mean. 3. There is no such thing as a nonlinear unbiased estimator. In cases where there is a right inverse B with the required property. In EE263 we have studied linear unbiased estimators. where A ∈ Rm×n . which are unbiased estimators that are also linear functions. there is also a linear unbiased estimator. either find a specific matrix B ∈ R5×3 that satisfies AB = I and the given property. i. nullspace. x ∈ Rn is the vector of parameters we wish to estimate.Remark: linear decoders are never used in practice. (This statement is taken to be true if there are no unbiased estimators for a particular A. given y). and v ∈ Rm is the vector of measurement errors and noise. though. You can quote any result given in the lecture notes. Nonlinear unbiased estimators do exist. A. You must also attach a printout of some matlab scripts that show the verification that AB = I. which opens the possibility that we can seek right inverses that in addition have other properties. etc. (e) B is upper triangular. and an unbiased nonlinear estimator.) If you believe this statement.13 Nonlinear unbiased estimators. In other words. 0 This matrix is full rank (i. (c) The third column of B is zero. and v = 0. There are cases for which nonlinear unbiased estimators exist. give a specific example of a matrix A. If you believe this statement. 3.e. B..e. Bij = 0 for i > j. you must briefly explain how you found your B. (f) B is lower triangular. there are many right inverses of A. and a nonlinear unbiased estimator. We have exact measurements of the (spherical) distances between each beacon and the unknown point x. d3 . pk under which we can unambiguously determine x.16 Some true/false questions. y) = (x. v ∈ Rm is the (unknown) disturbance signal. . the statement A2 = A is false. You can give your solution algebraically. and rejects the disturbance patterns you claim it does. .) Give the specific set of disturbance patterns that your equalizer rejects. which occurs only when the two points x. If the equalizer rejects a set of disturbance patterns. 3. as the angle between the vectors.. find an equalizer B that rejects as many disturbance patterns as possible. pk ∈ S n are the (known) positions of some beacons on the unit sphere. Show any other calculations needed to verify that your equalizer rejects the maximum number of patterns possible. In this problem we consider the unit sphere in Rn . Show the matlab verification that your B does indeed reconstruct x. y).15 Identifying a point on the unit sphere from spherical distances. .. which is defined as the set of vectors with norm one: S n = {x ∈ Rn | x = 1}.3. As another example. You must justify your answer. and d6 . y ∈ Rm is the (known) received signal. and d7 . measured in radians: If x.14 Channel equalizer with disturbance rejection. whose columns are the individual disturbance patterns. measured along the sphere. . . and no matter what values A and B have. . and A ∈ Rm×n describes the (known) channel. however. range. and d7 (say).’ (We only need one set of disturbances of the maximum size. . . d1 . (The disturbance patterns are given as an m × k matrix D. Determine if the following statements are true or false.) We say the equalizer B rejects the disturbance pattern di if x = x. . No justification or discussion is needed for your answers. be the same). For the problem data given in cedr_data. and let x ∈ S n be an unknown point on the unit sphere. based on this information. the spherical distance between them is sphdist(x. ˆ when v = di . then it can reconstruct the transmitted signal exactly. y ∈ S n .m. when the disturbance v is any linear combination of d1 . A communication channel is described by y = Ax + v where x ∈ Rn is the (unknown) transmitted signal. . for example. i. given the distances ρi .e. Find the conditions on p1 . What we mean by “true” is that the statement is true for all values of the matrices and vectors given. The disturbance v is known to be a linear combination of some (known) disturbance patterns. the statement “A + B = B + A” is true. (There are also matrices for which it does hold. k. . the maximum distance between two points in S n is π. d3 . because there are (square) matrices for which this doesn’t hold. . But that doesn’t make the statement true.) Now suppose p1 . for any x ∈ S n . We consider linear equalizers for the channel. without any ambiguity. we are given the numbers ρi = sphdist(x. which means x = −y.g. the exact position of x. dk ∈ Rm . where we take the angle as lying between 0 and π. . (Thus. no matter what x is.) 15 . pi ). disturbances d1 .e. For example. as in ‘My equalizer rejects three disturbance patterns: d2 . or you can give a geometric condition (involving the vectors pi ). i = 1. more precisely.) Explain how you know that there is no equalizer that rejects more disturbance patterns than yours does. which have the form x = By.. i. 3. .g. . e. We would like to determine. because no matter what the dimensions of A and B (which must. you might say that Bij are the equalizer coefficients. using any of the concepts used in class (e.. nullspace. . but you can assume that the dimensions are such that all expressions make sense. You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated). the statement holds. where B ∈ Rn×m . an identity matrix. y are antipodal. d3 . ˆ (We’ll refer to the matrix B as the equalizer. We define the spherical distance between two vectors on the unit sphere as the distance between them. rank). Here is the problem. then A is onto. We make 4 measurements.) You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated).a. In each experiment we measure and note the temperatures at the two critical locations. j. But that doesn’t make the statement true. f. (h) If AT A is onto. What we mean by “true” is that the statement is true for all values of the matrices and vectors given. . T2 ) (in degrees C). because no matter what the dimensions of A and B (which must. and dissipate 10W. . all cores are idling.) (a) If all coefficients (i. (You can assume the entries of the matrices and vectors are all real. If A and B are onto. because there are (square) matrices for which this doesn’t hold. . uk ∈ Rn are nonzero vectors such that uT uj ≥ 0 for all i. ABC (e) If (f) If A 0 0 B is full rank. then αi = 0 for i = 1. We are concerned with the temperature of a processor at two critical locations. . entries) of the matrices A and B are nonnegative. Then [A B] is skinny and full rank. A+B (b) N  A+B+C   A (c) N  AB  = N (A) ∩ N (B) ∩ N (C). k. As another example. and the other two are idling. then A is onto. denoted T = (T1 . which means if α1 . . but you can assume that the dimensions are such that all expressions make sense. be the same). c. Then the vectors are i k nonnegative independent. denoted P = (P1 . one of the processors is set to full power.e. however. These temperatures.. . P3 ) (in W).18 Temperatures in a multi-core processor.. . an identity matrix. 100W. If the matrix A B A C 0 B A B . then so is the matrix d. Determine if the following statements are true or false. . . is onto. and both A and B are onto. and i=1 αi ui = 0. αk ∈ R are nonnegative scalars. If all coefficients (i. the statement “A + B = B + A” is true. P2 . the statement A2 = A is false. b. then so are the matrices A and B.17 Some true/false questions. In the next three measurements.e. . For example. 16 . full rank matrices that satisfy AT B = 0. then so is the matrix 3. then so are the matrices A and B. (i) Suppose u1 . . In the first. If A and B are onto. 3. If A is full rank and skinny. . If A and B are onto.g. then so is the matrix e. A 2 0 (g) If A is onto. entries) of the matrix A are positive. (j) Suppose A ∈ Rn×k and B ∈ Rn×m are skinny. (There are also matrices for which it does hold. and no matter what values A and B have.. then A is full rank. e. is onto. then A + B must be onto. A B .   A  = N (A) ∩ N (B) ∩ N (C). then A is full rank. . the statement holds. then A + B is onto. (d) N (B T AT AB + B T B) = N (B). are affine functions of the power dissipated by three processor cores. . .) 3. available on the course web site. . . k. Unfortunately. defines A and y (as A and ytilde). .. 17 . k. xk have. you can use the matlab code rank([F g])==rank(F) to check if g ∈ R(F ). For bounding (a. You can assume that the matrices Ai are skinny. . which includes the signals of all transmitters. where Ai ∈ Rm×ni . We say that the ith signal is decodable if the ith receiver can determine the value of xi . . ηab = a−b . . (We will see later a much better way to check if g ∈ R(F ). If all sensors are operating correctly. we have yi = yi for all i = k. while rejecting any interference from the other signals x1 . 3. You must explain your method. according to y = A1 x1 + · · · + Ak xk . xk is decodable. where A ∈ Rm×n is known. How large can we make p. up to one sensor may have failed (but you don’t know which one has failed. xk . 3. . Ak . .20 Single sensor failure detection and identification. . The goal is for transmitter i to transmit a vector signal xi ∈ Rni to the ith receiver. You are given y and not y. where y is the same as y ˜ ˜ in all entries except. We consider a system of k transmitter-receiver pairs that share a common medium. ˜ Determine which sensor has failed (or if no sensors have failed). . p. . . 10%). i. . . of course. or even whether any has failed). For this exercise. Roughly speaking. we have y = y . . . Suppose ηab = 0. possibly. . Here are four statements about decodability: (a) Each of the signals x1 . All receivers have access to the same signal y ∈ Rm . in addition to providing the numerical solution. . Whether or not the ith signal is decodable depends. . b) be? Explain your reasoning. . without interference from the other transmitters. this means that receiver i can process the received signal so as to perfectly recover the ith transmitted signal. no matter what values x1 . . . . one (say. ˜ ˜ The file one_bad_sensor.1 (i.m. you can just draw some pictures. How big and how small can be ηba be? How big and how small can (a. The relative deviation of b from a is defined as the distance between a and b. We have y = Ax. and x ∈ Rn is to be found. you don’t have to give a formal argument.. Each receiver knows the received signal y. on the matrices A1 . divided by the norm of a. (You can also assume that ni > 0 and Ai = 0. Suppose a and b are nonzero vectors of the same size. .21 Vector space multiple access (VSMA). xi−1 . . m ≥ ni for i = 1. .) Since the k transmitters all share the same m-dimensional vector space.e. .P1 10W 100W 10W 10W P2 10W 10W 100W 10W P3 10W 10W 10W 100W T1 27◦ 45◦ 41◦ 35◦ T2 29◦ 37◦ 49◦ 55◦ Suppose we operate all cores at the same power. and submit your code. in general. for i = 1. The relative deviation is not a symmetric function of a and b. without T1 or T2 exceeding 70◦ ? You must fully explain your reasoning and method.e.19 Relative deviation between vectors. xi+1 . . . and the matrices A1 . we call this vector space multiple access. a This is often expressed as a percentage. ηab = ηba . the kth entry). If the kth sensor fails. Ak . . b). let’s not worry about breaking ties.2. You can also give the response “My attorney has advised me not to respond to this question at this time. if we have N = 3 and arecd − a1 = 2. II. You must show how 18 . then the minimum distance decoder would guess that the signal a2 was sent. When the signal ak is sent. .0.3. .” This response will receive partial credit. or any other type of answers. any further explanation or elaboration. recd then the maximum correlation decoder would guess that the signal a3 was sent.e. which collectively are called the signal constellation. we want only your answers. Give the simplest condition you can. rank(A1 ) < n1 . . the received signal. the set of vectors a1 . . There are many ways to do this. the noise v is described by a statistical model. must have a very specific form: it must consist of a conjunction of one or more of the following properties: I. aN ∈ Rn .1.1. possible answers (for each statement) could be “I” or “I and II”.e. but x1 isn’t. xk are decodable when x1 is 0. however. By ‘same’ we mean this: for any received signal arecd .22 Minimum distance and maximum correlation decoding. i. the decoded signal for the two methods is the same. we will accept any correct one. and will not read. For some statements. Choose as the estimate of the decoded signal the one in the constellation that has the largest inner product with the received signal. choose ak that minimizes arecd − ai . necessary and sufficient) conditions under which the statement holds. For example. or “I and II and IV”.2. . (d) The signals x2 . . it does not matter for the problem). you are to give the exact (i. arecd − a3 = 1. but in this problem we explore two methods. These signals. . the receiver has to estimate or guess which of the N signals was sent. nk . rank([A1 · · · Ak ]) = rank(A1 ) + rank([A2 · · · Ak ]). . . We do not want. (c) The signals x2 . . You can just assume that ties never occur. We consider a simple communication system. are known to both the transmitter and receiver. . For both methods. IV. rank([A1 · · · Ak ]) = n1 + rank([A2 · · · Ak ]). where v ∈ Rn is (channel or transmission) noise. in terms of A1 . . arecd − a2 = 0. . In a communications course. or has maximum inner product with. choose ak that maximizes aT ai . We will denote the possible signals as a1 . Choose as the estimate of the decoded signal the one in the constellation that is closest to what is received. one of the signals is always closest to. Give some general conditions on the constellation (i. which receives a version of the signal sent that is corrupted by noise. . .(b) The signal x1 is decodable. rank([A2 · · · Ak ]) = n2 + · · · + nk . based on the received signal arecd . . . III.e. 3. . . recd aT a2 = 0. . . if we have N = 3 and recd aT a1 = −1. For each of these statements. recd aT a3 = 1. . The receiver must make a guess or estimate as to which of the signals was sent. but here we’ll just assume that it is ‘small’ (and in any case. i. in which a sender transmits one of N possible signals to a receiver.e. Each answer.. Based on the corrupted received signal. aN ) under which these two decoding methods are the same. . For (just) this problem. xk are decodable. • Minimum distance decoding. As examples. the received signal is arecd = ak + v. Ak and n1 . • Maximum correlation decoding. For example. We will represent the signals by vectors in Rn .. .. there may be more than one correct answer.. Reactions are also used to model flow of metabolites into and out of the cell. Flux balance analysis is based on a very simple model of the reactions going on in a cell. . . Mm . Each reaction has a (known) stoichiometry. (We are not asking you to show that when your conditions don’t hold. give a specific counterexample. . when your conditions hold. defined as follows: Sij is the rate of production of Mi due to unit reaction rate vj = 1. But then again. 19 . As an example.) (b) Carry out your method on the data found in rev_eng_smooth_data. . A smoothing filter takes an input vector u ∈ Rn and produces an output vector y ∈ Rn . µ.) You might want to check simple cases like n = 1 (scalar signals). and κ. vn ∈ R. it means that |v2 | is the flow rate of the metabolite out of the cell. 2. . . and the production rate of M3 is 2v1 . µ. and κ are positive constants (weights). and n n J track = i=1 (ui − yi )2 . .) This corresponds to a first column of S of the form (−1.m.. .the decoding schemes always give the same answer. Rn . labeled M1 . J norm = i=1 2 yi are the tracking error and norm-squared of y. For example. an input u.24 Flux balance analysis in systems biology. with v2 giving the flow rate. respectively. We focus on m metabolites in a cell. . the production rate of M2 is v1 . . A positive value of vi means the reaction proceeds in the given direction. 1. suppose reaction R1 has the form M1 → M2 + 2M3 . In other words. . labeled R1 . and the associated output y. working from an input-output pair. (a) Explain how to find λ. If vj is negative. 0. means that reaction Rj causes metabolite Mi to be consumed at a rate 2vj . . J smooth = i=2 (yi+1 − 2yi + yi−1 )2 are measures of the continuity and smoothness of y. for example. for which your conditions don’t hold. with reaction rates v1 . 0). and κ. µ. is v1 .) This corresponds to a second column of S of the form (1. . where λ. N = 2 (only two messages in the constellation). . you will reverse engineer the smoothing filter. (You do not need to worry about ensuring that these are positive. . and n n−1 J cont = i=2 (yi − yi−1 )2 . you can assume this will occur automatically. Here we consider consumption of a metabolite as negative production. Your goal is to find the weights λ. 0. . and the methods differ. 3. which tells us the rate of consumption and production of the metabolites per unit of reaction rate.) The output y is obtained as the minimizer of the objective J = J track + λJ norm + µJ cont + κJ smooth . due to this reaction. (We will assume that n ≥ 3. . . Here is the problem: You have access to one input-output pair. 3. then metabolite Mi is produced at the rate 2|vj |. The stoichiometry data is given by the stoichiometry matrix S ∈ Rm×n . . Mm .23 Reverse engineering a smoothing filter. you might not. (The reaction R1 has no effect on metabolites M4 . There are n (reversible) reactions going on. . i. the two decoding schemes differ for some received signal. keeping track only of the gross conservation of various chemical species (metabolites) within the cell. respectively. so Sij = −2. while a negative value of vi means the reaction proceeds in the reverse direction. .e. (When v2 < 0. or draw some pictures. Also. The consumption rate of M1 . Give the values of the weights. 0). . suppose that reaction R2 corresponds to the flow of metabolite M1 into the cell. . . the integer M is called the memory of the system. (b) Carry out your method for the problem data given in flux_balance_bio_data.. You are given the input and output signal values over a time interval t = 1. The last column of S gives the amounts of metabolites used (when the entry is negative) or created (when positive) per unit of cell growth rate. nullspace. vn > 0. 1. You may find the matlab functions toeplitz and fliplr useful. This is a very simple version of the problem. or cell growth. we consider the effect of knocking out a gene. t ∈ Z. . least-squares. (Convolution systems are also called linear time-invariant systems. Now for the problem. t = . we’ll assume that reactions 1. t ∈ Z. You may assume that T > 2M . You can use any concepts from the class. Hint. which can be expressed as the flux balance equation Sv = 0. using any concepts from the course. . shown as shaded rectangles in the figure below..g. that incorporate lower and upper limits on reactions rates and other realistic constraints. 3. This means there are no reaction rates consistent with cell growth. . T : (u1 . . and call gene Gk an essential gene. you do not know h. The goal is to find the smallest memory M consistent with this data. as well as those converted to biomass within the cell. In this case. and of course. and the gene knockout. yT ). .e. Finally. . 20 . −1. given the stoichiometry matrix S. (b) Use your method from part (a) on the data found in mem_lti_data. so the reaction rate vn is the cell growth rate. . . Give the value of M found. (a) Explain how to find all essential genes.m. (a) Explain how to solve this problem. range. uT ).The last reaction. .m. where h = (h1 . we get to the point of all this. .26 Layered medium. .. . (y1 . For simplicity. flux balance. . t ∈ Z (i. Remark. .) When hM = 0. .e. Finally. . List all essential genes. The medium is modeled as a set of n layers. 0. separated by n dividing interfaces. gene Gk controls reaction Rk . . 3.) and output signal yt ∈ R. . e. we have conservation of the metabolites. corresponds to biomass creation. vk = 0. .25 Memory of a linear time-invariant system. In this problem we consider a generic model for (incoherent) transmission in a layered medium. Rn . you’ll see more sophisticated versions of the same problem. . hM ) are the impulse response coefficients of the convolution system. In EE364a. . we predict that knocking out gene Gk will kill the cell. Suppose there is no v ∈ Rn that satisfies Sv = 0. . . Since our reactions include metabolites entering or leaving the cell. Knocking out Gk has the effect of setting the associated reaction rate to zero. . Note that you do not know uτ or yτ for τ ≤ 0 or τ > T . . are related by a convolution operator M yt = τ =1 hτ ut−τ . i. . An input signal (sequence) ut ∈ R. n − 1 are each controlled by an associated gene. . n − 1. which is connected via the wires to other gate inputs and possibly to some external circuitry. . Hint: You may find the matlab function diag(x. . which means that yn = xn . Be sure to give the value of k that is most consistent with the measurement. where ti ∈ [0. The right-traveling wave in the first layer is called the incident wave. . yi = ri xi + ti yi+1 . which means that the output of gate i does not connect to the inputs of any of the gates 1. the lefttraveling wave. . . 3. each with 2 inputs. ri = 0. 2. (a) Explain how to find the scattering coefficient S. We describe the topology of the circuit by the fan-out list for each gate. You may assume that the last (fully reflective) interface is not faulty. and we let yi ∈ R denote the left-traveling wave amplitude in layer i. yi versus i.96. and the left-traveling wave in the first layer is called the reflected wave. . and ti = 0. n}. We model the last interface as totally reflective. 1] is the transmission coefficient of the ith interface. n (presumably the output of gate i connects to some external circuitry). We’ll assume that’s the case here.02. Plot the left. for i = 1. or that gate j is in the fan-out of gate i. ri = 0. Each gate has one or more inputs (typically between one and four). i. . interconnected by wires. . . 4}. A digital circuit consists of a set of n (logic) gates.k) useful. (This means that the gate interconnections form a directed acyclic graph. (b) Carry out your method for a medium with n = 20 layers. and one output. FO(2) = {3}. It also contributes the amplitude ri xi to yi . . so the left-traveling wave with amplitude yi+1 contributes the wave amplitude ti yi+1 to yi (via transmission) and wave amplitude ri yi+1 to xi+1 (via reflection). FO(4) = ∅. . The scattering coefficient for the medium is defined as the ratio S = y1 /x1 (assuming x1 = 0). is shown below. The right-traveling wave of amplitude xi contributes the amplitude ti xi to xi+1 . 21 FO(3) = ∅. . a simple digital circuit with n = 4 gates. rk = 0.02. we say that gate i drives gate j.or right-traveling waves with the faulted interface). The right. n. When the output of gate i is connected to an input of gate j. For this circuit we have FO(1) = {3.96. n}.96. . A fault in interface k results in a reversal: tk = 0. . . Explain how to find which interface is faulted. We will assume that the interfaces are symmetric. .. i = 1. which tells us which other gates the output of a gate connects to. where ri ∈ [0. . We denote the fan-out list of gate i as FO(i) ⊆ {1. .and right-traveling wave amplitudes xi .27 Digital circuit gate sizing. (c) Fault location.x1 y1 x2 y2 x3 y3 x4 y4 xn−1 yn−1 xn yn interface i = 1 2 3 n−1 n We let xi ∈ R denote the right-traveling wave amplitude in layer i. .02 for i = 1. given the transmission and reflection coefficients for the first n − 1 layers. n − 1. . 1] is the reflection coefficient of the ith interface. Carry out your method with S fault = 0.) To illustrate the notation. with all other interfaces having their nominal values ti = 0. . . . we have FO(i) ⊆ {i + 1. It’s common to order the gates in such a way that each gate only drives gates with higher indices. We can have FO(i) = ∅. . and report the value of S you find. You measure the scattering coefficient S = S fault with the fault (but you don’t have access to the left.70.and left-traveling waves on each side of an interface are related by transmission and reflection. Thus we have xi+1 = ti xi + ri yi+1 .e. . Each gate has a delay di . which depends on the scale factor xi as in Ci = αi xi . These scale factors are the design variables in the gate sizing problem. whether or not there is an x that gives di = T . The load capacitance of gate i is given by load ext Ci = Ci + j∈FO(i) ext where Ci is a positive constant that accounts for the capacitance of the interconnect wires and external circuitry.. of course. which can be intepreted as the minimum possible delay of gate i. (You don’t need to know this. where ai are positive constants. . which is given by load di = βi + γi Ci /xi . however. . in Each gate has an input capacitance Ci . to solve this problem. We can assume. and A ≤ Amax . we have di = T . 22 . i. T is larger than the largest minimum delay of the gates.. Be sure to explain how you determine if the design problem is feasible. subject to a given area constraint A ≤ Amax .e. (a) Explain how to find a design x⋆ ∈ Rn that minimizes T . where T > 0 is given. Finally. and Ci is the load capacitance of gate i. we get to the problem. . Note that the gate delay di is always larger than βi . with 1 ≤ xi ≤ xmax ) that yields di = T for i = 1. your job is to find the scale factors xi . achieved only in the limit as the gate scale factor becomes large. i. We will follow a simple design method. a choice of the xi . and the 3 output signals emerging from the right are called primary outputs of the circuit. in Cj . The total area of the circuit has the form n A= i=1 ai xi . For a given value of T . where αi are positive constants. You can assume the fanout lists. which assigns an equal delay T to all gates in the circuit.. there may or may not exist a feasible design (i. load where βi and γi are positive constants.e.e. where xmax is a given maximum allowed gate scale factor (typically on the order of 100). and all constants in the problem description are known. that T > maxi βi . They must satisfy 1 ≤ xi ≤ xmax .) Each gate has a (real) scale factor or size xi .. with 1 ≤ xi ≤ xmax . n.e. i.1 3 2 4 The 3 input signals arriving from the left are called primary inputs. that give the values x1 . Such a function is called a rational function of degree m. . e. which satisfy f (xi ) = yi . . . . 3. . x and y. . with either am = 0 or bm = 0. . bm . . . yN . which should be as small as possible. yN ∈ R. . Explain how you will solve this problem. b1 . You do not need to know anything about digital circuits. xN . (b) Carry out your method on the particular circuit with data given in the file gate_sizing_data. . . am . . . . .m. The problem is to find a rational function of smallest degree that is consistent with this data. and 0 otherwise. everything you need to know is stated above. .Your method can involve any of the methods or concepts we have seen so far in the course. . respectively. and y1 . . bm are parameters. .g. Comment. . In other words. .28 Interpolation with rational functions. The jth entry in the ith row is 1 if gate j is in the fan-out of gate i. . .m. you are to find m. . and a0 . . where yi = f (xi ). and not the simple example shown above. am . It can also involve a simple search procedure. . (This contains two vectors. and zero otherwise. with i. . j entry one if j ∈ FO(i). . 1 + b1 x + · · · + bm xm where a0 . 23 . In other words. . trying (many) different values of T over a range. In this problem we consider a function f : R → R of the form f (x) = a0 + a1 x + · · · + am xm . xN ∈ R and y1 . Note: this problem concerns the general case. b1 . and b1 .) Give the value of m you find. and the coefficients a0 . . bm . . and then carry out your method on the problem data given in ri_data. .. the ith row of F gives the fanout of gate i. . am . We are given data points x1 . . . Please show us your verification that yi = f (xi ) holds (possibly with some small numerical errors). . . The fan-out lists are given as an n×n matrix F. . . then y = P x is the projection of x on R(P ). Verify that the matrix R is orthogonal. Show that U T x ≤ x . not consistent. the point y in S closest to x is called the projection of x on S. Show that A(AT A)−1 AT is a projection matrix. of course. Note that the first requirement says that every consistent y does not trip the alarm. Suppose the columns of U ∈ Rn×k are orthonormal. which has the same distance to z as x does. we can quickly check whether y is consistent. We can exploit the redundancy in our measurements to help us identify whether such a fault has occured. then so is U −1 . (c) Suppose A ∈ Rn×k is full rank. the second requirement states that every inconsistent y does trip the alarm. (b) Suppose that the columns of U ∈ Rn×k are orthonormal.. Show that U U T is a projection matrix. or become faulty. If the system is operating normally then our measurement will. When do we have U T x = x ? 4. We’ll call a measurement y consistent if it has the form Ax for some x ∈ Rn . Find an integrity monitor B for the matrix   1 2 1  1 −1 −2    1 3 .3 Projection matrices. then so is U V . with k ≤ n. Finally. (If we are really unlucky.5 Sensor integrity monitor. When the system is operating normally (which we hope is almost always the case) we have y = Ax.2 Orthogonal matrices. go in the direction z − x through the hyperplane to a point on the opposite side. i. A matrix P ∈ Rn×n is called a projection matrix if P = P T and P 2 = P .1 Bessel’s inequality. If the system becomes faulty. then we no longer have the relation y = Ax.4 Reflection through a hyperplane. A =  −2    1 −1 −2  1 1 0 24 • By = 0 for any y which is consistent. If the system or sensors fail. (To reflect x through the hyperplane means the following: find the point z on the hyperplane closest to x. (a) Show that if U and V are orthogonal. 4. . A suite of m sensors yields measurement y ∈ Rm of some vector of parameters x ∈ Rn . (Which is why such matrices are called projection matrices .e. (c) Suppose that U ∈ R2×2 is orthogonal. Show that if P is a projection matrix.) (d) If S ⊆ Rn and x ∈ Rn . (b) Show that if U is orthogonal. we can send an alarm if By = 0. where m > n. Starting from x. we hope that the resulting measurement y will become inconsistent.Lecture 4 – Orthonormal sets of vectors and QR factorization 4. be consistent. the system will fail in such a way that y is still consistent.) 4. Find the matrix R ∈ Rn×n such that reflection of x through the hyperplane {z | aT z = 0} (with a = 0) is given by Rx.) A matrix B ∈ Rk×m is called an integrity monitor if the following holds: • By = 0 for any y which is inconsistent. . Then we’re out of luck. (Later we will show that the converse is true: every projection matrix can be expressed as U U T for some U with orthonormal columns. the problem. Show that U is either a rotation or a reflection. If we find such a matrix B. Make clear how you decide whether a given orthogonal U is a rotation or reflection. . (a) Show that if P is a projection matrix then so is I − P . ) 4. Verify that R(C) ⊆ R(A) and R(C) ⊆ R(B). 1. This means that the columns of C are a basis for R(A) ∩ R(B). That’s OK. Apply the corresponding Householder reflection to x to find Qx. . . don’t expect to have By exactly zero for a consistent y. really small. Explain how you can find a matrix C ∈ Rn×r . null. (c) Show that det Q = −1. 4. (a) Show that Q is orthogonal. (b) there exists a matrix Y ∈ Rn×n such that B = Y A. For this reason. (d) BA = 0.8 Groups of equivalent statements. Please carefully separate your answers to part (a) (the general case) and part (b) (the specific case). you have to explain what you’re doing. In the list below there are 11 statements about two square matrices A and B in Rn×n . It’s not just a random matrix. You must also verify that the matrix you choose satisfies the requirements. with independent columns. (c) AB = 0.7 Finding a basis for the intersection of ranges. in the sense that it does not accumulate much roundoff error. there are many ways to find such a B. multiplication by Q gives reflection through the plane with normal vector u. it will be really. 4. as well as giving us your explicit matrix B. • Be very careful typing in the matrix A. that is. 4. As usual. Then again. (b) Show that Qu = −u.Your B should have the smallest k (i. because of roundoff errors in computer arithmetic. (a) Suppose you are given two matrices.e. (e) rank([ A B ]) = rank(A). as well as the matlab (or other) code that generated it. ) Compute such a u for x = (3. 25 (a) R(B) ⊆ R(A). Thus. by showing that each column of C is in the range of A. find a unit-length vector u for which Qx lies on the line through e1 .m. Hints: • You might find one or more of the matlab commands orth. 4. Householder reflections are used as building blocks for fast. Be sure to give us your matrix C. (d) Given a vector x ∈ Rn . . A ∈ Rn×p and B ∈ Rn×q . number of rows) as possible.6 Householder reflections. 2. uT u = 1. Hint: Try a u of the form u = v/ v . for which R(C) = R(A) ∩ R(B).. numerically sound algorithms. where u ∈ Rn is normalized. (b) Carry out the method described in part (a) for the particular matrices A and B defined in intersect_range_data. you might not. • When checking that your B works. and also in the range of B. A Householder matrix is defined as Q = I − 2uuT . Note: Multiplication by an orthogonal matrix has very good numerical properties. 5). with v = x + αe1 (find the appropriate α and show that such a u works . for any v such that uT v = 0. Show that Qv = v. or qr useful. What can you say about its determinant? 4. Your job is to collect them into (the largest possible) groups of equivalent statements. An electrical circuit has n nodes and b branches. g. 26 . or the third. the statement ‘A is onto’ is equivalent to ‘N (A) = {0}’ (when A is square. and k are equivalent. k. For example. which we assume here). (j) rank([ A (i) there exists a matrix Z ∈ Rn×n such that B = AZ. and j ∈ Rb the vector of branch currents. Suppose Q ∈ Rn×n is orthogonal. and so on. c. and vice versa. Each node in the circuit has a potential. (k) N (A) ⊆ N (B). and statements f. which will consist of lists of mutually equivalent statements. defined as   +1 edge k leaves node i −1 edge k enters node i Aik =  0 otherwise. with topology described by a directed graph. the sum of the currents on branches entering the node equals the sum of the currents on branches leaving the node. i. Show that this can be expressed v = AT e. List each group of equivalent statements on a line. (g) rank( A B ) = rank(A). Put your answer in the following specific form. For example. because every square matrix that is onto has zero nullspace. v ∈ R(AT ). you might give your answer as a.10 Tellegen’s theorem.(f) R(A) ⊥ N (B T ). d. Two statements are equivalent if each one implies the other. we do not need any justification. You also believe that the first group of statements is not equivalent to the second.e. We let e ∈ Rn denote the vector of node potentials. Two statements are not equivalent if there exist (real) square matrices A and B for which one holds. while current flow in the opposite direction is considered negative. v ∈ Rb the vector of branch voltages.e. i e f. Each new line should start with the first letter not listed above. This means you believe that statements a. 4. j. (b) Kirchhoff ’s voltage law (KVL) states that the voltage across any branch is the difference between the potential at the node the branch leaves and the potential at the node the branch enters. j ∈ N (A). B ]) = rank(B). (a) Kirchhoff ’s current law (KCL) states that. We want just your answer. for each node. (The direction of each edge is the reference flow direction in the branch: current flowing in this direction is considered positive.9 Determinant of an orthogonal matrix.. g. in (alphabetic) order. j. each branch has a voltage and current. and h are equivalent. but the other does not. (h) R(A) ⊆ N (B). statements b and i are equivalent. d. i. c. A group of statements is equivalent if any pair of statements in the group is equivalent.. h b. Show that this can be expressed Aj = 0.) The directed graph is given by the incidence matrix A ∈ Rn×b . The algorithm you will discover is called back substitution. then find xn−1 (remembering that now you know xn ). i. because you are substituting known or computed values of xi into the equations to compute the next xi (in reverse order). then find xn−2 (remembering that now you know xn and xn−1 ). with A ∈ Rn×n nonsingular. Explain how this follows from parts (a) and (b) above. The product vk jk is the power entering (or dissipated by) branch k (when vk jk < 0. In other words. we have v T j = 0.11 Norm preserving implies orthonormal columns.12 Solving linear equations via QR factorization. and then express x as x = A−1 y = R−1 (QT x) = R−1 z. Be sure to explain why the algorithm you describe cannot fail. then for any vector x ∈ Rn we have Ax = x .. We can interpret Tellegen’s theorem as saying that the total power supplied by branches that supply power is equal to the total power dissipated by branches that dissipate power. Show that the converse holds: If A ∈ Rm×n satisfies Ax orthonormal columns (and in particular. 4. Consider the problem of solving the linear equations Ax = y.e. |vk jk | is the power supplied by branch k). Tellegen’s theorem is a power conservation law. 27 . multiplication by such a matrix preserves norm. solving Rx = z.(c) Tellegen’s theorem.. where z = QT y. In other words. then A has Hint. Start with Ax = x 2 . and try x = ei . m ≥ n). i. and also x = ei + ej . when R is upper triangular and nonsingular (which means its diagonal entries are all nonzero). 4. AT A = I. for all i = j. The trick is to first find xn . and so on. you’ll develop a method for computing x = R−1 z. In this exercise.e. and y given. Tellegen’s theorem states that for any circuit. In lecture we saw that if A ∈ Rm×n has orthonormal columns. 2 = x for all x ∈ Rn . We can use the Gram-Schmidt procedure to compute the QR factorization of A. Assuming A is full rank and skinny. the Hermitian conjugate is the same as the transpose. . 5. where A† is the (complex) pseudo-inverse of A. the solution is xls = A† y. αn . Let xls be the least-squares approximate solution of Ax = y. Also. you can leave block matrices in it. (You don’t need to simplify your expression. The complex least-squares problem is to find the x ∈ Cn that minimizes Ax − y 2 . in the real case.1 Least-squares residuals. . For complex matrices (or vectors) we define the Hermitian conjugate as the complex conjugate of the transpose. (a) What is the relation between u. y are complex). vectors. and scalars. . . We define the inner product of two complex vectors u. We associate with a complex vector u ∈ Cn a real vector u ∈ R2n . where A ∈ Cm×n and y ∈ Cm are given. For example.2 Complex linear algebra and least-squares. in general. given by A† = (A∗ A) −1 A∗ . v = u∗ v. such that α1 v1 + · · · + αn vn = 0. Most of the linear algebra you have seen is unchanged when the scalars. explanatory name). Note that these two expressions agree with the definitions you already know when the vectors are real. you simply generalize all the results to work for complex matrices. v ? Note that the first inner product involves complex ˜ ˜ vectors and the second involves real vectors. and then you apply what you already know about real linear algebra. and let yls = Axls . . (Which also reduces to the pseudo-inverse you’ve already seen when A is real). . v and u. given by ˜ u= ˜ ℜu ℑu . There are two general approaches to dealing with complex linear algebra problems. . vn } is dependent if there exist complex scalars α1 . and vectors are complex. which is equal to (A)T . We denote this as A∗ . We’ll explore that idea in this problem. x. we say a set of complex vectors {v1 . have complex entries.e. involve the transpose operator. and maybe a conceptual drawing).Lecture 5 – Least-squares 5. not all zero. Another approach is to represent complex matrices and vectors using real matrices and vectors of twice the dimensions. . There are some slight differences when it comes to the inner product and other expressions that. The Hermitian conjugate of a matrix is sometimes called its conjugate transpose (which is a nice. is a complex number. u = |u1 |2 + · · · + |un |2 1/2 . (b) What is the relation between u and u ? ˜ ˜u (c) What is the relation between Au (complex matrix-vector multiplication) and A˜ (real matrixvector multiplication)? ˜ (d) What is the relation between AT and A∗ ? (e) Using the results above. the ij entry of the matrix A∗ is given by (Aji ). v ∈ Cn as u.) 28 . which. i. matrices. ˜ We associate with a complex matrix A ∈ Cm×n the real matrix A ∈ R2m×2n given by ˜ A= ℜA ℑA −ℑA ℜA . Express A† y in terms of the real and imaginary parts of A and y. verify that A† y solves the complex least-squares problem of minimizing Ax − y (where A. Show that the residual vector r = y − yls satisfies r 2 = y 2 − yls 2 .. The norm of a complex vector is defined as u = u. Note that for a real matrix or vector. Thus. Suppose A is skinny and full-rank. give a brief geometric interpretation of this equality (just a couple of sentences. In the first. y (N ). You will plot the squareroot of the ˜ sum of squares of the difference. B. A vector signal x ∈ Rn is first processed by one channel.. . You must also submit the matlab code and output showing that you checked that AXB = I holds (up to very small numerical errors).e. The toeplitz() command may be helpful.) You will solve a specific instance of this problem. y(1). you can form the signal y (t) = a0 u(t) + b1 y (t − 1) + · · · + bn y (t − n) ˆ ˜ ˜ ˜ for t = n + 1. . and here you are using it only to check that AXB − I is small. b1 . . . and when B = I.) The following interpretation is not needed to solve the problem. The file IOdata. It generalizes the notions of left-inverse and right-inverse: When A = I.2 The middle inverse. . 6. explain why this is the case.Lecture 6 – Least-squares applications 6. . . . 29 . and X ∈ Rp×q . . Briefly discuss the results. and the goal is to find a matrix X that satisfies the equation. In this problem you will use least-squares to develop and validate autoregressive (AR) models of a system from some input/output (I/O) records. Your explanation should be as concrete as possible. bn . X is a right-inverse of A. divided by the squareroot of the sum of squares of y . u(N ).m contains the data for this problem and is available on the class web page. The matrices A and B are given. You will use least-squares to find approximate models of the form y(t) = a0 u(t) + b1 y(t − 1) + · · · + bn y(t − n). . which are the measured input and output of an unknown system. To validate or evaluate your models. y(N ). (When such an X exists. . You must explain how you found it. If you think there is no X that satisfies AXB = I. X is a left-inverse of B. . 35. . Like the norm of a vector. bn that minimize N t=n+1 (y(t) − a0 u(t) − b1 y(t − 1) − · · · − bn y(t − n)) 2 where u. and compare it to the actual output signal. ˜ Compare this to the plot above. . . . The squareroot of this quantity is the residual norm (on the 2 model data). . You are given I/O records u(1). or to determine that no such matrix exists. . ˜ ˜ y (1). . . . where A ∈ Rn×p . . y . y are the given data record. . . for n = 1. u(N ). but it doesn’t matter. . B ∈ Rq×n . You can do this by typing norm(A*X*B-eye(n)) in matlab. you do not have to try to use an efficient method based on one QR factorization. with data (i. you can try them on validation data records N u(1). . please give it.1 AR system identification. 35. . . You’ll plot this as a function of n for n = 1. . Dividing by t=n+1 y(t) yields the relative error. . (We haven’t yet studied the matrix norm. b1 .m. you can use inefficient code that just loops over n. we call it a middle inverse of the pair A. ˜ ˜ To find the predictive ability of an AR model with coefficients a0 . . If you succeed in finding an X that satisfies AXB = I. . One situation where this problem comes up is a nonstandard filtering or equalization problem. . and you must submit the code that you used to generate your solution. it measures size. To develop the models for different values of n. . . Specifically you will choose coefficients a0 . . We give it just to mention a concrete situation where a problem like this one might arise. the matrices A and B) given in the mfile axb_data. In this problem we consider the matrix equation AXB = I. N . and submitting a printout of what matlab prints out. for use in RF circuits. . The file inductor data. . β4 can be negative. given by A. or by fabricating the inductor and measuring the inductance. . β4 ∈ R are constants that characterize the approximate model. d w D The inductor is characterized by four key parameters: • n. is to find α. implemented in CMOS. the fit between Li and Li ). (since L is positive. the number of turns (which is a multiple of 1/4. . we have α > 0. can be used for design of planar spiral inductors. β3 . Thus. . w. not that it would affect how you solve the problem . . and D. the inductance Li (in nH) obtained from measurements. . Li = αnβ1 wi 2 dβ3 Di 4 ≈ Li i i Your solution must include a clear description of how you found your parameters. we give parameters ni . where α. the inner diameter The inductance L of such an inductor is a complicated function of the parameters n. In this problem you will develop a simple approximate inductance model of the form ˆ L = αnβ1 wβ2 dβ3 Dβ4 . 30 . w13 gives the wire width of inductor 13. d.3 Approximate inductance formula. but that needn’t concern us) • w. . the problem.m on the course web site contains data for 50 inductors.e. wi . which takes considerable computer time. . i. 50. β1 . it is acted on by another channel. the outer diameter • d. The figure below shows a planar spiral inductor.e.. Our goal is to do the intermediate processing in such a way that it undoes the effect of the first and last channels. for example. we leave that to your engineering judgment. We define the percentage error between Li and Li as ˆ ei = 100|Li − Li |/Li . .) This simple approximate model. 6. After this processing. β4 so that β β ˆ for i = 1.represented by B. as well as their actual numerical values. (The data is real. But be sure to ˆ tell us what criterion you use. and Di (all in µm). β1 . . At this point the signal is available for some filtering or processing by us.. Note that we have not specified the criterion that you use to judge the approximate ˆ model (i. but the constants β2 . The inductance L can be found by solving Maxwell’s equations. . di . (The data are organized as vectors of length 50. which is represented by the matrix X. .) Your task. if accurate enough. the width of the wire • D. and also. . ) For inductor i. β2 . . or predict. . .Find the average percentage error for your model. Find c explicitly. We consider a square region. to find z (t + 1). we find the quadratic function f (τ ) = a2 τ 2 + a1 τ + a0 for ˆ ˆ which t τ =t−9 (z(τ ) − f (τ )) 2 is minimized. . . z = 5*sin(t/10 + 2) + 0. . In that case you obtained the quadratic extrapolator by interpolating the last three samples. we do not require that your model minimize the average percentage error. . 6.e. z(t − 9)    .5). x1 xn+1 x2 xn x2n xn2 31 . . z(t − 1). . 1000. . . 6. and zint (the ˆ ˆ estimated values using interpolation). . The extrapolated value is then given by z (t + 1) = f (t + 1). (c) In a previous problem you developed a similar predictor for the same time series z. which is given by (1/990) 1000 2 z j=11 (ˆ(j) − z(j)) 1000 (1/990) j=11 z(j)2 1/2 . Restrict your plot to t = 1. More precisely. now you are obtaining it as the least squares fit to the last ten samples. i. (b) Use the following matlab code to generate a time series z: 1×10 t = 1:1000. z(t − 9). which we divide into an n × n array of square pixels. We’ll denote the predicted value of z(t + 1) by z (t + 1). . ˆ (a) Show that   z (t + 1) = c  ˆ   z(t) z(t − 1) . using least-squares fit. We extrapolate. 100.1*sin(t) + 0. z(t + 1) based on a least-squares fit of a quadratic function to the previous ten elements of the series. Use the quadratic extrapolation method from part (a) to find zls (t) for t = 11. Find ˆ the relative root-mean-square (RMS) error.4 Quadratic extrapolation of a time series.5 Image reconstruction from line integrals.) Hint: you might find it easier to work with log L. on the same plot. We are given a series z up to time t. z(t). zls (the estimated values using least-squares). .. (We are only asking you to find the average percentage error for your model. (e1 + · · · + e50 )/50. Compare the RMS error for these methods and plot z (the true values).1*sin(2*t . In this problem we explore a simple version of a tomography problem.  where c ∈ R does not depend on t. as shown below. The problem is to estimate the vector of densities x. . . we have l1 = l6 = l8 = l9 = 0. we want to estimate the vector of densities x. It creates the following variables: • N. associated with lines L1 . as shown above. x ∈ Rn is a vector that describes the density across the rectangular array of pixels. • y. j = 1. 32 • n_pixels. . . . . . . n2 . the number of measurements (N ). . the whole set of measurements forms a vector y ∈ RN whose elements are given by n2 yj = i=1 lij xi + vj . N . . N. Thus. . . each measurement is corrupted by a (small) noise term. . In addition. i=1 where li is the length of the intersection of line L with pixel i (or zero if they don’t intersect). j = 1. LN . Each sensor measurement is a line integral of the density over a line L. The class webpage contains the M-file tomo_data. In other words. From these measurements. we’ll assume that the density is constant inside each pixel. . In this example. . . j = 1. . i = 1. n2 . To simplify things. N. by a single index i ranging from 1 to n2 . Then. line L x1 x4 l4 x2 x5 l5 l2 x3 l3 x6 x9 x8 l7 Now suppose we have N line integral measurements. . . . a vector with the line integrals yj . the sensor measurement for line L is given by n2 li xi + v. And now the problem: you will reconstruct the pixel densities x from the line integral measurements y. and we denote by xi the density in 2 pixel i. . where lij gives the length of the intersection of line Lj with pixel i. . . .m. and v is a (small) measurement noise.The pixels are indexed column first. i = 1. the side length in pixels of the square region (n). . which you should download and run in matlab. The lines are characterized by the intersection lengths lij . from a set of sensor measurements that we now describe. This is illustrated below for a problem with n = 3. We are interested in some physical property such as density (say) which varies over the region. We also provide the function line_pixel_length. . displays the matrix A as an image. in case you’re curious. In a memoryless model. • imagesc(A).• lines_d. j on the image. changes matlab’s image display mode to grayscaled (you’ll want to do this to view the pixel patch). In this problem you will use least-squares to fit several different types of models to a given set of input/output data. • axis image. best known for its application in medical imaging as the CAT scan. . N . jth element corresponds to the intersection length for pixel i. Now define yj = − log zj . .n. j = 1.. (distance from the center of the region in pixels lengths) of each line. redefines the axes of a plot to make the pixels square. a vector containing the displacement dj . That is. moving average (MA): y(t) = a0 u(t) + a1 u(t − 1) + a2 u(t − 2) autoregressive (AR): y(t) = a0 u(t) + b1 y(t − 1) + b2 y(t − 2) autoregressive moving average (ARMA): y(t) = a0 u(t) + a1 u(t − 1) + b1 y(t − 1) 33 . N . • colormap gray. y(t) depends on u(s) for some s = t.. j = 1. . which you do need to use in order to solve the problem. and a scalar output sequence y. u(t). Another common term for such a model is static.6 Least-squares model fitting. of each line.m on the webpage. the output at time t. and • lines_theta. .m. i. . The data consist of a scalar input sequence u. You will develop several different models that relate the signals u and y. . depends only the input at time t.e. If an x-ray gets attenuated at rate xi in pixel i (a little piece of a cross-section of your body). shows how the measurements were computed. . You should take a look.e. .m returns a n × n matrix. given dj and θj (and the side length n). but you don’t need to understand it to solve the problem. and we get n2 yj = i=1 xi lij . You’ll know you have it right. Note: While irrelevant to your solution. This function computes the pixel intersection lengths for a given line. etc. y(t). We consider some simple time-series models (see problem 2 in the reader). constant model: static linear: static affine: static quadratic: y(t) = c0 y(t) = c1 u(t) y(t) = c0 + c1 u(t) y(t) = c0 + c1 u(t) + c2 u(t)2 • Dynamic models. converts the vector v (which must have n*m elements) into an n × m matrix (the first column of A is the first n elements of v. • Memoryless models. Use this information to find x. and display it as an image (of n by n pixels). N . a vector containing the angles θj . line_pixel_length.). which are linear dynamic models. . The file tmeasure. whose i.m). this is actually a simple version of tomography. for t = 1. In a dynamic model. the j-th measurement is n2 zj = i=1 e−xi lij . on the same webpage. . Matlab hints: Here are a few functions that you’ll find useful to display an image: • A=reshape(v. scaled so that its lowest value is black and its highest value is white. 6. i. . with the lij as before. by following the link to matlab datasets... For each of these models. h(n−1) is the impulse response of the channel. .e. ai . etc. we want the least-squares equalizer is g that minimizes the sum-of-squares error (g ∗ h)(t)2 . . pick a0 . u(s) for s < t. z(t) ≈ u(t − D). where z : Z → R is the filter output. and b1 that minimize N t=2 (y(t) − a0 u(t) − a1 u(t − 1) − b1 y(t − 1)) .7 Least-squares deconvolution. In other words. It is generated by a nonlinear recursion.e.. the goal is to make it as close as possible to a D-sample delay. where taps refer to taps on a delay line. . The ˆ ˆ data for this problem are available from the class web page. Since z = g ∗ h ∗ u (discrete-time convolution). y(t) depends indirectly on all previous inputs. Specifically. y. on the other hand.m. where u : Z → R is the channel input sequence. . which has infinite memory. y : Z → R is the channel output. find parameter values that minimize the sum-of-squares of the residuals. i. g(m − 1) is the impulse response of the filter. 2 (Note that we start the sum at t = 2 which ensures that u(t − 1) and y(t − 1) are defined. in an m-file named uy data. (g ∗ h)(t) ≈ 1 t=D We will refer to g ∗ h as the equalized impulse response. . .) Each of these models is specified by its parameters. You will design a deconvolution filter or equalizer which also has FIR form: m−1 z(t) = τ =0 y(t − τ )g(τ ). g(m − 1)) so that the filter output is approximately the channel input delayed by D samples. which we are to design. find the least-squares fit to the given data. .) For each model. has a finite memory: y(t) depends only on the current and two previous inputs. and plot the residual (which is y − y ). For this reason. this means that we’d like 0 t = D. the scalars ci . . The MA model. i. This will create the vectors u and y and the scalar N (the length of the vectors). Plot the output y predicted by your model. In terms of discrete-time convolution we write this as y = h ∗ u. This is shown in the block diagram below. Now you can plot u. i. A communications channel is modeled by a finite-impulse-response (FIR) filter: n−1 τ =0 y(t) = u(t − τ )h(τ ). for the ARMA model. . . y is the channel output. give the root-mean-square (RMS) residual. and g(0). y is not generated by any of the models above. t=D 34 . For example.Note that in the AR and ARMA models. a1 .e. . the squareroot of the mean of the optimal residual squared. y u ∗h ∗g z The goal is to choose g = (g(0). the AR and ARMA models are said to have infinite memory. Copy this file to your working directory and type uy data from within matlab. due to the recursive dependence on y(t − 1). Note: the dataset u. and h(0). . bi . (Another term for this MA model is 3-tap system. 6. Give a histogram plot of the amplitude distribution of both y and z.e. Show how to use least-squares to simultaneously estimate the parameter vector x ∈ Rn . Usually we assume (often implicitly) that the measurement errors vi are random... Matlab hints: The command conv convolves two vectors. For example. aT m i = 1.e.) You can assume that m ≥ n and the measurement matrix  T  a1  aT   2  A= . (Indices in matlab run from 1 to n.e. and starts at t = 0 (i.e. Otherwise describe the conditions (on the matrix A) that must hold for your method to work.8 Estimation with sensor offset and drift. . small. it just makes the interpretation simpler.) In such cases. small. is full rank (i. least-squares estimation of x works well. The signal u is binary. as well as a term that grows linearly with time (called a drift). i. It will define the channel impulse response h as a matlab vector h. to form the signal z. so h(3) in matlab corresponds to h(2).. with delay D = 12. while the argument of the channel impulse response runs from t = 0 to t = n − 1. and wi is part of the sensor error that is unpredictable. . is taken at time t = iT . If we knew the offset α and the drift term β we could just subtract the predictable part of the sensor signal.) Comment on what you find. . If your method always works. Plot the equalized impulse response (g ∗ h). the offset α ∈ R. Clearly explain your method. the command hist plots a histogram (of the amplitude distribution). and centered around 0. (You don’t need to worry about how to make this idea precise. and the drift coefficient β ∈ R. (g ∗ h)(D) = 1. each sensor measurement might include a (common) offset or bias. (This isn’t really material. (b) The vector y (also defined in deconv data.  .   . β is the drift term (again the same for all measurements).) (a) Find the least-squares equalizer g. In some cases. however. starting at time t = T . α + βiT from the sensor signal.subject to the constraint To solve the problem below you’ll need to get the file deconv data. i where • yi is the ith (scalar) measurement • x ∈ Rn is the vector of parameters we wish to estimate from the measurements • vi is the sensor or measurement error of the ith measurement In this problem we assume the measurements yi are taken at times evenly spaced. Plot the impulse responses of the channel (h) and the equalizer (g). and give a simple example where the conditions don’t hold. We model this situation as vi = α + βiT + wi where α is the sensor bias (which is unknown but the same for all sensor measurements). 1}. (You can remove the first and last D samples of z before making the histogram plot. the measurement error includes some predictable terms. m. Pass y through the least-squares equalizer found in part a. But we’re interested in the case where we don’t know the offset α or the drift coefficient β. 35 . 6. We consider the usual estimation setup: yi = aT x + vi . i. Thus.. unpredictable. u(t) ∈ {−1. u(t) = 0 for t < 0)..e. and centered around zero. the ith measurement. T seconds apart. yi .m) contains the channel output corresponding to a signal u passed through the channel (i. has rank n).m from the class web page in the matlab files section. y = h ∗ u). . of length m = 20. say so. We consider here the problem of estimating the matrix A. . and is proportional to the emission rate. . sn ∈ R2 .e. . (c) Now we suppose that one of the spot measurments is faulty. and v ∈ Rm is the noise or disturbance. t1 = [1 1]T .) The emission rates are to be determined. You will use a least-squares criterion to form an estimate A of ˆ the matrix that minimizes the quantity A. y (1) . for which it is impossible to find the emission rates given the spot measurements.10 Identifying a system from input/output data. . t4 = [3 2]T . y ∈ Rm is the output vector. . This file defines three source locations (given as a 2 × 3 matrix. Be sure to tell us which spot measurement you believe to be faulty. Explain your method. It also gives two sets of spot measurements: one for part (b). (These are positive. ˆ based on the given input/output data. . x ∈ R is the input vector. we have si = sj for i = j). . s3 = [1 2]T . at known locations s1 . In other words.6. but remember. possibly corrupted by noise. we let xj denote the emission rate for source j. . and one for part (c). we are given the following: x(1) . There are n sources of a pollutant. you will choose as your estimate A N J= k=1 Ax(k) − y (k) 2 36 . and what your guess of the emission rates is. located at t1 . we have y (k) = Ax(k) + v (k) .) 6. tm ∈ R2 . . We measure the total pollutant level at m spots. In this part.9 Estimating emissions from spot measurements. Estimate the pollutant emission rates. where α is a known (positive) constant.. . we do not have sj = ti for any i or j. and ten spot measurement locations (given as a 2×10 matrix). but the source and spot measurement locations are. m×n n These represent N samples or observations of the input and output. Carry out your method on the data given in the matlab file. respectively. we have ti = tj for i = j) and that none of the source locations is repeated (i. y (N ) ∈ Rm . with 4 sensors and 3 sources.e.e. Specifically. (a) Give a specific example of source and spot measurement locations. . the columns give the locations). The contribution from source j to measurement i is given by αxj / sj − ti 2 . . x(N ) ∈ Rn . . (And similarly for the two emission rates that give the same spot measurements. and then estimate the source emission rates.. they contain small noise and errors. and so on. . Be careful to use the right set of measurements for each problem! The spot measurements are not perfect (as we assumed in part (a)). We consider the standard setup: y = Ax + v. In other words. The problem is to estimate the (coefficients of the) matrix A. the pollutant concentration from a source follows an inverse square law. rank. where A ∈ R .m that is available on the class web site. N. we assume the spot measurements are exactly as described above. . .. . . but to simplify things we won’t concern ourselves with that. Explain how you would identify or guess which one is malfunctioning. . i. or estimated. give two specific different sets of emission rates that yield identical spot measurements.) (b) Get the data from the file emissions_data. We also assume that none of the spot locations is repeated (i. we ignore the issue of noise or sensor errors. To show that your configuration is a valid example. and give your estimate for the emissions rates of the three sources. i. its associated noise or error is far larger than the errors of the other spot measurements. such as as s1 = [0 1]T . . . (The emission rates are not the same as in part (b). k = 1. which are known..e. . we want a specific numerical example. . . . Each source emits the pollutant at some emission rate. nullspace. . given some input/output data. where v (k) are assumed to be small. . You are free to (briefly) explain your example using concepts such as range. The total pollutant measured at spot i is the sum of the contributions from the n sources. . Specifically. We assume that measurement spots do not coincide with the source locations. for j = 1. You can assume that all of the matrices are full rank.) • The estimate then average method. . and N . A(k) found during calibration. given y. since it attempts to take into account the uncertainty in the matrix A. First. state it clearly. i. where x ∈ Rn is a vector we’d like to estimate.11 Robust least-squares estimation methods. n. we find x(j) that minimizes A(j) x − y over x. Here is the twist: we do not know the matrix A exactly. y (N ) . . Now suppose we have a measurement y taken on a day when we did not calibrate the sensor system. we find the least-squares estimate of x for each of the calibration values. . for example. etc. We assume that m > n.. your source code. . There is no pattern to the (small) variations between the matrices. The goal is to estimate x.e. You may want to use the matrices X ∈ Rn×N and Y ∈ Rm×N given by X = x(1) · · · x(N ) . . Three very reasonable proposals for robust estimation are described below. j=1 which is supposed to represent the most typical value of A. • The average then estimate method. If you need to make an assumption about the input/output data to make your method work. i.e. (b) On the course web site you will find some input/output data for an instance of this problem in the mfile sysid_data. y ∈ Rm is the vector of measurements. respectively.. (a) Explain how to do this. and A ∈ Rm×n . ˆ j=1 (Here the subscript ‘ea’ stands for ‘estimate (then) average’. the m × N matrix Y contains the output data y (1) . You must give your final estimate ˆ A. .e. First. the first column of X contains x(1) . on the different days. Similarly..e. A(k) for the matrix A. and found the values A(1) . to minimize Aavg x − y . ˆ We don’t know A exactly. but we can assume that it is close to the known values A(1) . i. . . . . . These matrices are close to each other. We refer to this ˆ value of x as xae . (You can assume that ˆ ˆ Aavg is full rank.) 37 . k. . A method for guessing x in this situtation is called a robust estimation method. The n × N matrix variable X contains the input data x(1) .over A. In fact we calibrated our sensor system on k > 1 different days. we form the average of the calibration values.). We want to form an estimate x. we find that the estimates x(j) are also close but not equal. but is assumed to be small. but not exactly the same. Executing this mfile will assign values to m. where the subscript stands for ‘average (then) estimate’. and also give an explanation of what you did.. . We consider a standard measurement setup.e. . and create two matrices that contain the input and output data. . there is no discernable drift. Aavg = 1 k k A(j) . We then choose our estimate x to ˆ minimize the least squares residual using Aavg . The measurement error v is not known. i. x(N ) (i. 6. Y = y (1) · · · y (N ) in your solution. with y = Ax+v. Since the ˆ ˆ ˆ matrices A(j) are close but not equal. . have rank n. . there are more measurements than parameters to be estimated. based on this measurement.. ˆ We find our final estimate of x as the average of these estimates: xea = ˆ 1 k k x(j) . v ∈ Rm is the vector of measurement errors. .m. the variations appear to be small and random. . . A(k) and y). . so we might as well treat it as noise. if you ˆ believe that xea is a linear function of y. He describes it as follows: We should simultaneously estimate both the signal x and the interference v. then there is no hope of finding x. The noise is unknown. and just ignore it during the estimation process. based on y. He describes it as follows: We don’t know the interference. then the residual. which is our measurement. using the jth ˆ calibrated value of A. we simply ignore our estimate of v. give a formula for the matrix that gives x in terms of y. We have y = Ax + Bv + w. say whether the estimate x is a linear function of y. We should simultaneously estimate both the signal x and the interference v. v ∈ Rp is the interference. m > n. but here we have subtracted off or ˜ cancelled the effect of the estimated interference. This problem concerns three proposed methods for estimating a signal.12 Estimating a signal with interference. then you should give a formula for Bea (in terms of ˆ A(1) . ˆ Here is the problem: (a) For each of these three methods. (Here we have z = Bv + w. along with some informal justification from their proposers. is given by r(j) = A(j) x − y. .) Each of the EE263 TAs proposes a method for estimating x. for the model y = Ax + z (with z a noise) to estimate x. ˆ (b) Are the three methods described above different? If any two are the same (for all possible values of the data A(1) . but cannot be assumed to be small. and that the ranges of A and B intersect only at 0. Then. we then throw away v . We can use the usual least-squares method. Miki proposes the estimate and cancel method. exactly as in Almir’s method. Nikola proposes the ignore and estimate method. and use our estimate of x. based on a measurement that is corrupted by a small noise and also by an interference. If we make the guess x. and can be assumed to be small.) Almir proposes the estimate and ignore method. . If they are different. where A ∈ Rm×n and B ∈ Rm×p are known. we use standard least-squares to estimate x from y . from the simple model ˜ y = Ax + z. In Almir’s method. The RMS value of the collection of residuals ˆ is given by 1/2  k 1 r(j) 2  . You can assume that the matrices A and B are skinny and full rank (i. 6. but I think we should use it. . He describes it as follows: Almir’s method makes sense to me. . that need not be small. but I can improve it.e. even when w = 0. Here y ∈ Rm is the measurement (which is known). We denote this ˆ estimate of x as xrms . . If it is ˆ a linear function. The interference is unknown. Once we’ve estimated x and v. where xea = Bea y. . For example. using a standard least-squares method to estimate [xT v T ]T given y. k j=1 In the minimum RMS residual method. x ∈ Rn is the signal that we want to estimate. . These methods. give a specific example in which the estimates differ. our estimate ˆ of the interference. and w is a noise.) 38 . (This is exactly as in Nikola’s method. m > p). since a nonzero interference can masquerade as a signal. based on y. A(k) ). are given below.. explain why. but that doesn’t matter. (If this last condition does not hold. using a standard least-squares method.• Minimum RMS residuals method. we choose x to minimize this quantity. We can form the “pseudo-measurement” y = y−Bˆ. with the effect of the estimated interference subtracted ˜ v off. . y2 (t + 1). It is known that y1 (t + 1). with one time lag. and the dimensions n.e. You will pick A that minimizes the mean-square prediction error. yn (t). . . show that the formulas given in part (a) are equal (even if they don’t appear to be at first). Justify your answer. Compare your mean-square prediction error to the mean-square value of y. The second component. and submit the code that you use to solve the problem.. to predict y(t + 1). Give your final estimated model parameter A. which is a fancy name for a linear system driven by a noise. using the data found in the vts_data. In general. and also A. y(T ) ∈ Rn . . . T . . A consultant proposes the following model for the time-series: y(t) = Ay(t − 1) + v(t). and the resulting mean-square error. 1 T −1 T e(t) 2 .e. . (We keep the definition of the terms ‘small’ and ‘unpredictable’ vague. ˆ t = 2. . the ith component. . yi (t + 1). since the exact meaning won’t matter. To find the parameter A that defines our model. . The n components of y(t) might represent measurements of different quantities. i. This problem concerns a vector time-series. . . T. for each method give a formula for the estimate x in terms of A. If they are all three the same. given y(1). B.13 Vector time-series modeling. i. . . always give exactly the same results). (c) Which method or methods do you think work best? Give a very brief explanation.. . p. . . Our goal is to develop a model that allows us to predict the next element in the time series. .. part of the problem is to translate their descriptions into more precise algorithms. yn (t). .e. so we simply replace it with zero when we estimate the next time-series sample. Once we have a model of this form. yn (t). which is short for vector auto-regressive. (That is.) ˆ (b) Are the methods really different? Identify any pairs of the methods that coincide (i. y. t=2 over all lower-triangular matrices. The idea here is that v(t) is unpredictable. the parameter in our model. To show two methods are the same. . . Roughly speaking. or prices of different assets. where the matrix A ∈ Rn×n is the parameter of the model. (If your answer to part (b) is “The methods are all the same” then you can simply repeat here.. . . which contains an n × T matrix Y. m. . The prediction error depends on the time-series data. Carry out this method.) This type of model has several names. whose columns are the vector time-series samples at t = 1. It is called an VAR(1) model. does not depend on yi+1 (t). There is one more twist. To show two methods are different. . T. It is also called a Gauss-Markov model. does not depend on y2 (t). at time period t. This means that the matrix A is lower triangular. . The prediction error is given by e(t) = y (t) − y(t). . i. this means that the current time-series component yi (t) only affects the next time-series components y1 (t + 1). yi (t + 1). . . T t=1 39 . y(t). and v(t) ∈ Rn is a signal that is small and unpredictable. . “The methods are all the same”. .These descriptions are a little vague. say so. .m. . . T 1 y(t) 2 . we can predict the next time-series sample using the formula y (t + 1) = Ay(t). y(1). Aij = 0 for i < j. . ˆ t = 1. t = 2. .) 6. T. . . . the first component of the next time-series sample. you will use a least-squares criterion.e. does not depend on y3 (t). . (a) Give an explicit formula for each of the three estimates. Explain your method. give a specific numerical example in which the estimates differ. or all three different. To judge the quality of fit we use the mean-square error. Please bear in mind that a0 . (Not needed to solve the problem. . with some measurement errors. J= 1 N N i=1 |H(si ) − hi |2 . . . with a(k) . whereas many other variables and data in this problem are complex. (a) Explain how to obtain a(k+1) and b(k+1) . For example. The goal is to find a rational transfer function that fits the measured frequency response. . . which complicates solving the least-squares problem to find a(k) and b(k) . . . b that approximately minimize J. H(s) = B(s) where A and B are the polynomials A(s) = a0 + a1 s + · · · + am sm . . N. N . hN ∈ C. .) You can think of H as a rational transfer function. We start by expressing J in the following (strange) way: J= 1 N N i=1 A(si ) − hi B(si ) zi 2 . . . b(k) . . The method works by choosing a and b that minimize the lefthand expression (with zi fixed). hN are complex. bm ) ∈ Rm . . . bm ∈ R are real parameters. and b(k+1) ≈ b(k) . More precisely. and zi denoting the values of these parameters at iteration k. for finding coefficients a.e. am ) ∈ Rm+1 and b = (b1 .e. B(s) = 1 + b1 s + · · · + bm sm . backslash) in your explanation.14 Fitting a rational transfer function to frequency response data. . We are given noisy measurements of H at some points on the imaginary axis. . . We define a = (a0 . The data is a set of frequency response measurements. and hope to choose a and b so that we have H(si ) ≈ hi . h1 . . i. . sN = jωN ∈ C. . To update these parameters from iteration k to (k+1) iteration k + 1. . . which is fixed at one). i = 1. am and b1 . . . .e. . we have a(k+1) ≈ a(k) . . You must explain the math. . (k) Several pathologies can arise in this algorithm. .. i. .. . Interpretation. . . . . if) successive iterates are very close. given z (k+1) . . zi = B(si ). B (k) denoting the associated polynomials. The iteration is stopped when (or more accurately. . Here a0 . and h1 . This problem explores a famous heuristic method.) We can start the iteration with zi = 1. . N (which is what would happen if we set B (0) (s) = 1). . . and A(k) . . am ∈ R and b1 . bm are real. (1) (This can be done using ordinary linear least-squares. . for i = 1. . . 40 . and then repeating. This problem concerns a rational function H : C → C of the form A(s) . Here ω1 . we set zi = B (k) (si ). or a certain matrix can be less than full rank. . you may not refer to any matlab notation or operators (and especially. 6. First. . predict what you think y(T + 1) is. based on solving a sequence of (linear) least-squares problems. Then we choose (k+1) (k+1) a and b that minimize 1 N N i=1 A(k+1) (si ) − hi B (k+1) (si ) zi (k+1) 2 . some data s1 = jω1 . ωN are real and nonnegative. let k denote (k) the iteration number. . a and b are vectors containing the coefficients of A and B (not including the constant coefficient of B.Finally. You can ignore these pathologies. . . however. we can end up with zi = 0. with s the complex frequency variable. we proceed as follows. . then updating the numbers zi using the righthand formula.. . based on the data given. i = 1. . and s ∈ C is the complex independent variable. i. This file contains the data ω1 .(b) Implement the method.s). J(k)). wire k goes from cell I(k) to cell J(k).e.. whose positions are fixed and given. and apply it to the data given in rat_data. We model a cell as a single point in R2 (which gives its location on the IC) and ignore the requirement that the cells must not overlap. Note that the kth row of A is associated with the kth wire. and yi gives the y-coordinate of cell i.. The positions of the cells are (x1 . and free cells. (xN . y1 ). Give the final coefficient vectors a. or external pins on the IC. yn ). (In fact. 41 . yN ). . (x2 . to be the free ones. . . y1 ). i.e. We consider an integrated circuit (IC) that contains N cells or modules that are connected by K wires. The node incidence matrix A ∈ RK×N is defined as   1 wire k goes to cell j. h1 . assuming that the wires are run as straight lines between the cells. . . .e. . Plot J versus iteration k. (xN . This is a bit tricky.) There are K wires that connect pairs of the cells.m. ωN . given by K J= k=1 (xI(k) − xJ(k) )2 + (yI(k) − yJ(k) )2 .e. as well as m and N .. . and k on a linear scale. hN . I(k)). To evaluate a polynomial in matlab. with J on a logarithmic scale. is to place the free cells in order to minimize the the total square wire length. . . Terminate the algorithm when a(k+1) − a(k) b(k+1) − b(k) ≤ 10−6 . To evaluate b(s) you can use polyval([b(m:-1:1). The task of finding good positions for the free cells is called placement. at positions (xn+1 . using the command semilogy. b. and the associated final value of J. . .. and the problem data. yn+1 ). Here I and J are functions that map wire number (i.15 Quadratic placement. and the remaining N − n cells. . k) into the origination cell number (i. i. .) One common method. respectively. you can either write your own (short) code. Note: no credit will be given for implementing any algorithm other than the one described in this problem. and the destination cell number (i. we’ll use the node incidence matrix A for the associated directed graph. We will assign an orientation to each wire (even though wires are physically symmetric). whose positions are to be determined. The goal in placing the free cells is to use the smallest amount of interconnect wire. We will take the first n cells. . y2 ). . 6. To evaluate A(s) in matlab you can use the command polyval(a(m+1:-1:1). or use the matlab command polyval.. Plot |H(jω)| on a logarithmic scale versus ω on a linear scale (using semilogy). We have two types of cells: fixed cells. (The fixed cells correspond to cells that are already placed.s). xi gives the x-coordinate of cell i. j = I(k) Akj =  0 otherwise. (xn . the wires in an IC are not run on straight lines directly between the cells. called quadratic placement. and the jth column of A is associated with the jth cell. . Pretending that the wires do run on straight lines seems to give good placements. To describe the wire/cell topology and the functions I and J. . at positions (x1 . last iteration. .e. i. yN ). Specifically. to be the fixed ones. .e.. since polyval expects the polynomial coefficients to be listed in the reverse order than we use here. for the first iteration.1]. but that’s another story. j = J(k) −1 wire k goes from cell j. or even the system order n. which is K ×N . . .001). We do not assume it is controllable. . It plots the proposed placement. we define the mean-square tracking error as N 1 x(t) − xdes (t) 2 . . predict what y(201). it defines the dimensions n. y1 ). . . N . The mfile qplace_data. the state x(t) (including the initial state x(0)). which gives y(1). y(200). 6. as well as the node incidence matrix that describes the wires.e.) Then. . For a given input u. to predict what y(p + 1). . .yfree. with x(0) = 0. . . (We have p = 200 in this specific problem. (b) In this part you will determine the optimal quadratic placement for a specific set of cells and interconnect topology. i. such as placing all free cells at the origin. .17 Time series prediction. .m.  0. This mfile takes as argument the x.and y-coordinates of the fixed cells. y(t) = Cx(t) + v(t). Plot your estimates 42 . y(p). Here is the problem: get the time series data from the class web site in the file timeseriesdata.16 Least-squares state tracking. 0  6. p: y(1). . Give the optimal locations of the free cells. (c) Find E(uopt ) for the specific system with  0. Check your placement against various others. and N − n vectors xfixed and yfixed. . .. where x(t) ∈ Rn .. 0  1 B =  0 . Briefly justify your answer. Your solution can involve a (general) pseudo-inverse. The goal is to predict y(t) for the next q time steps. and to explain the matlab source code that solves it (which you must submit). there is a unique uopt that minimizes E(u).(a) Explain how to find the positions of the free cells. Consider the system x(t + 1) = Ax(t) + Bu(t) ∈ Rn .. that minimize the total square wire length. The mfile also defines the node incidence matrix A. and N = 10. i. (b) True or false: If the system is controllable. and K. Specifically. i. . We give you two more pieces of information: the system order n is less than 20. . yn ).1 0 A= 1 0 1 xdes (t) = [t 0 0]T . . y(220) are. Suppose xdes (t) ∈ Rn is given for t = 1. and the RMS value of 1/2 p the noise.m defines an instance of the quadratic placement problem.xfixed.yfixed. .and y-coordinates of the free and fixed cells. . is small (on the order of 0. .e. You will also find an mfile that plots a proposed placement in a nice way: view_layout(xfree.1 0 . . Be sure to explain how you solve this problem. You do know the measured output signal for t = 1. you do not know the matrices A ∈ Rn×n or C ∈ R1×n . E(u) = N t=1 (a) Explain how to find uopt that minimizes E (in the general case). .e. . (xn . which give the x. N (and is meant to be the desired or target state trajectory).A). In this problem. We consider an autonomous discrete-time linear system of the form x(t + 1) = Ax(t). (1/p) t=1 v(t)2 . y(p + q) will be. y(t) ∈ R is the measured output signal. and v(t) ∈ R represents an output noise signal.8 0. . . . You may make an assumption about the rank of one or more matrices that arise. (x1 . Plot your optimal placement using view_layout. . from v1 . 2. . The associated sums are v1 = −2. . . this requires that Z is invertible. . give us the numbers. . v4 = u1 + u2 . Sq ). . u6 . v1 = u 2 + u 3 .) Choose one of the following: 43 . . . S8 = {3.18 Reconstructing values from sums over subsets. vq . . If the set of subsets is not informative. v10 = 17. . This corresponds to the subsets S1 = {2. . . p} that defines the partial sum used to form vi . S2 = {1. 6}. 2}. . for i = 1. . v4 = 4. . . . 3. and u1 . 5. . . (Thus. v3 = 6. the subsets S1 . . . . (We know both vi and Si . Now that we know u1 we can find u2 from u2 = v4 − u1 = v4 − v2 + v1 . Then we reconstruct u as u = Z −1 f .) It is extremely important that you explain very clearly how you come up with your prediction. S3 = {1. 5}. As an example with p = 3 and q = 4. . Si denotes the subset of {1. 6}. . S11 is uninformative. but want to find. . . q. S5 = {1. . (a) This subproblem concerns the following specific case. . whose given subset ˜ ˜ sums agree with the given v1 . i = 1. vq . . 2. 6}. p. vi = j∈Si Here. 5. . we show how to reconstruct u1 . .) For each i. we define fi as the sum of all vj . . . • The collection of subsets S1 . . of course. Sq informative if we can determine. 5. how do you make them? 6. . . 4. u3 . . 6}. v11 = 15. . v3 = u1 + u3 . and also. . S11 is informative. 3. 6}. up without ambiguity. Note: this is only an example to illustrate the notation. 5}. over subsets that contain i: fi = i∈Sj vj . where uj . . v7 = 11. We define the subset count matrix Z ∈ Rp×p as follows: Zij is the number of subsets containing both i and j. S4 = {1. . 6}. S3 = {1. . u2 . . . u6 . 2. v6 = −5. v2 = 14. . 4. First we note that u1 = v2 − v1 . (You may also want to plot the whole ˆ ˆ set of data. (b) This subproblem concerns a general method for reconstructing u = (u1 . . u6 . S11 = {1. Justify why you believe this is the case. 4. . . In the same way we can get u3 = v3 − u1 = v3 − v2 + v1 . 5}. vq ) (and of course. What is your method? If you make choices during your procedure. . . . . just to make sure your prediction satisfies the ‘eyeball test’. give two different sets of values u1 . v8 = 9. . we know v1 . 2. Specifically. . The subsets are S6 = {2. 3}. 4. with p = 6 numbers and q = 11 subsets. from t = 1 to t = 220. or reconstruct. . Zii is the number of subsets that contain i. S2 = {1. Choose one of the following: • The collection of subsets S1 . There are real numbers u1 . 2. . To see this. (Of course. .y (201). up ) given v = (v1 . 3}. v2 = u1 + u2 + u3 . S4 = {1. . . . u1 . We do have information about sums of some subsets of the numbers. 4}. . S7 = {2. . we say it is uninformative. and reconstruct u1 . To justify this. . . . . S9 = {3. 3. . 3}. This collection of subsets is informative. S10 = {4. . v5 = 20. 3}. y (220). . . v11 . up that we do not know.) We call the collection of subsets S1 . . v9 = 1. S1 = {1. and that Z −1 f is the unique u with subset sums v.) 6. .5 0 50 100 150 200 250 300 350 400 450 500 1. L) = min{ z − u | u ∈ L}. but the method above fails. described as Li = {pi + tvi | t ∈ R}. . whenever the collection of subsets is informative. . We are given a noisy measurement ymeas (1). with the index corresponding to the time. . Explain how to find y . . . for i = 1. 6. . . or Z −1 f does not have the required subset sums. . with vi = 1. (Please give us the simplest example you can think of. Plot the residual ˆ (the difference between these two signals) on a different graph. where t = 1. . g10 .5 −1 −1. .5 g7 (t) 0 −0. . at least approximately. .. where pi ∈ Rn . 500. . . . . . i = 1. We’ll represent these signals by vectors in R500 .5 0 −0. . Plots of f4 and g7 (as examples) are shown below. .5 0 50 100 150 200 250 300 350 400 450 500 t As our estimate of the original signal. and give its RMS value.m on the course web site. . m. . even when the collection of subsets is informative.5 1 f4 (t) 0. give a specific example. 10. either Z is singular. . We define the distance of a point z ∈ Rn to a line L as dist(z. and vi ∈ Rn .20 Point of closest convergence of a set of lines. We have m lines in Rn . . 44 . and carry out your method on the signal ymeas given in ˆ sig_est_data. . . . ymeas (500)) in the RMS (root-meansquare) sense. g0 . . By ‘works’ we mean that Z is invertible.5 1 0. . gk (t) = t − 50k 10 e−(t−50k) 2 /252 . If you believe this is the case. where the collection of subsets is informative. • The method can fail. . . of a signal y(1). . Plot ymeas and y on the same graph. i. . ymeas (500). f10 . that is closest to ymeas = (ymeas (1). .e. . . y(500) that is thought to be. . y (500)) in the span of ˆ y ˆ f0 . This problem concerns discrete-time signals defined for t = 1. m. explain why. . 500 and k = 0. .19 Signal estimation using least-squares. . . To convince us of this. . . a linear combination of the 22 signals fk (t) = e−(t−50k) 2 /252 . we will use the signal y = (ˆ(1). 1.• The method works. . pm ∈ R. pm and v1 . L) gives the closest distance between the point z and the line L. the beam illuminates the top of the photodetectors. dist(z. (b) Find the point z ⋆ of closest convergence for the lines with data given in the matlab file line_conv_data. . defined as follows:   cos φ cos θ q =  cos φ sin θ  . 6. . Specifically. i.m.) dist(z. The point z ⋆ that minimizes this quantity is called the point of closest convergence. Your job is to estimate the beam direction d ∈ R3 (which is a unit vector). Li )2 .) The beam falls on m ≥ 3 photodetectors. The simpler the method the better. given p1 . . . . . and α is the photodetector sensitivity. point upward).. which varies from 0◦ to 360◦ . the direction is directly up). Please include this plot with your solution. . . where d = 1. say so. not radians. . since all unit vectors in this problem have positive 3rd component. (This means the beam travels in the direction −d. m i=1 (In other words. . . .. where θi is the angle between the beam direction d and the outward normal vector qi of the surface of the ith photodetector. vm ). .21 Estimating direction and amplitude of a light beam. and the noisy photodetector outputs. The file also contains commands to plot the lines and the point of closest convergence (once you have found it). A light beam with (nonnegative) amplitude a comes from a direction d ∈ R3 . each of which generates a scalar signal that depends on the beam amplitude and direction. . the vector of photodetector outputs. sin φ Here φ is the elevation (which will be between 0◦ and 90◦ . (b) Carry out your method on the data given in beam_estim_data. . . given the lines (i.m. and a. . 45 . the azimuth is undefined. .e. You are given the photodetector direction vectors q1 . and the direction in which the photodetector is pointed. . . If some matrix (or matrices) needs to be full rank for your method to work. the photodetector sensitivity α.e. If your method works provided some condition holds (such as some matrix being full rank). To describe unit vectors q1 .. and a vector det_el. pm . If q = e3 (i. Give your final estimate of the beam amplitude a and beam direction d (in azimuth and elevation. respectively. as the direction the ith photodetector is pointed. The numbers vi are small measurement errors.We seek a point z ⋆ ∈ Rn that minimizes the sum of the squares of the distances to the lines. photodetector i generates an output signal pi . This file contains n×m matrices P and V whose columns are the vectors p1 . . . . You can interpret qi ∈ R3 . using a method or methods from this class. . say so. which we assume has norm one. We assume that |θi | < 90◦ . and v1 . qm and d in R3 we will use azimuth and elevation. a vector det_az. gives the direction in the plane spanned by the first and second coordinates. i. in degrees). please do so. which gives the azimuth angles of the photodetector directions.e. qm ∈ R3 . p1 .e. with pi = aα cos θi + vi . . the beam amplitude. (a) Explain how to find the point of closest convergence. . The azimuth angle θ. This mfile defines p. (a) Explain how to do this. . vm .. . Note that both of these are given in degrees. If you can relate this condition to a simple one involving the lines. which gives the elevation angles of the photodetector directions. j − Uij . with i ∈ Iunknown . We’ll choose the values for ui . Such as array can represent an image. These are simple approximations of partial differentiation with respect to the x and y axes. when Uij are all equal. i. and the mn − k unknown values in a vector wunknown ∈ Rmn−k .j+1 − Uij . which satisfies vec(V ) = Dx vec(U ). say). . . respectively. the goal is to fill in or interpolate missing data in a 2D array (an image.. we partition the set of indices {1. . . is defined as Wij = Ui+1.. j = 1. Now we get to the problem. We define the roughness of an array U as We define the matrix Dy ∈ R(m−1)n×mn . . or a sampled description of a function defined on a rectangle. The first function takes as argument an m × n array U and returns an m × (n − 1) array V of forward (rightward) differences: Vij = Ui. . The complete array is obtained by putting the entries of wknown and wunknown into the correct positions of the array. . n. so that the resulting U is as smooth as possible. It will also be convenient to describe such an array by a vector u = vec(U ) ∈ Rmn . . . The roughness measure R is the sum of the squares of the differences of each element in the array and its neighbors. . We can represent this linear mapping as multiplication by a matrix Dx ∈ Rm(n−1)×mn . .. We can describe such an array by a matrix U ∈ Rm×n . so it minimizes R. . or smoothly varying.) The other linear function. We give the k known values in a vector wknown ∈ Rk . n. m. U . from the vector. This problem concerns arrays of real numbers on an m × n grid.e. for i = 1. . i = 1. m and j = 1. the goal is to guess (or estimate or assign) values for ui for i ∈ Iunknown . . vec−1 just arranges the elements in a vector into an array. . We describe these operations using two matrices Zknown ∈ Rmn×k and Zunknown ∈ Rmn×(mn−k) . i = 1. Thus. . . . so the reconstructed array is as smooth as possible. . but isn’t. which is to interpolate some unknown values in an array in the smoothest possible way.e. . The roughness measure R is zero precisely for constant arrays. We will think of the index i as associated with the y axis. and mn − k the number of unknown values (the number of elements in Iunknown ). R = Dx vec(U ) 2 + Dy vec(U ) 2 . To go back to the array representation.  . . the number of elements in Iknown ). given the known values in the array. . . which is a simple approximation of partial differentiation with respect to the y axis. . and the index j as associated with the x axis. 46 . . j = 1. .22 Smooth interpolation on a 2D grid. Small R corresponds to smooth. maps an m × n array U into an (m − 1) × n array W .6. n − 1.  vec(U ) =  . . un where U = [u1 · · · un ]. To define this precisely.e. mn} into two sets: Iknown and Iunknown . j. (This looks scarier than it is—each row of the matrix Dx has exactly one +1 and one −1 entry in it. we have U = vec−1 (u). which satisfies vec(W ) = Dy vec(U ). m − 1. where Uij gives the real number at location i. (This looks complicated. that satisfy vec(U ) = Zknown wknown + Zunknown wunknown . Here vec is the function that stacks the columns of a matrix on top of each other:   u1  . We let k ≥ 1 denote the number of known values (i. i. We are given the values ui for i ∈ Iknown . .) We will need two linear functions that operate on m × n arrays. . wknown . Zknown (which gives the locations of the known values). ). Your job is to find wunknown that minimizes R. using imagesc(). Zknown . which consists of a memoryless nonlinearity φ.n). (a) Explain how to solve this problem. .. If your solution is valid provided some matrix is (or some matrices are) full rank.) What this means is z(t) = M −1 τ =0 h(τ )v(t − τ ). as well as an image containing the known values. matrices. For this problem instance.23 Designing a nonlinear equalizer from I/O data.In summary. with the specific form  −1 ≤ a ≤ 1  a 1 − s + sa a>1 φ(a) =  −1 + s + sa a < −1. and Zunknown (which gives the locations of the unknown array values. which you are welcome to use. Dy . you are given the problem data wknown (which gives the known array values). you can issue the command colormap gray. you cannot use Uorig to create your interpolated array U. The file gives m. vec. around 50% of the array elements are known. 6. The mfile also includes the original array Uorig from which we removed elements to create the problem. This file also creates the matrices Dx and Dy . vec(U ) can be computed as U(:). To visualize the arrays use the matlab command imagesc().) You are welcome to look at the code that generates these matrices. and around 50% are unknown. as well as the plots. Dx . vec−1 . t ∈ Z. Of course. Zknown and Zunknown . Be sure to give the value of roughness R of U . but you do not need to understand it. (Note that these signals are defined for all integer times. with zeros substituted for the unknown locations. This problem concerns the discrete-time system shown below. wknown .m. Hints: • In matlab.m. This is just so you can see how well your smooth reconstruction method does in reconstructing the original array. (This was very nice of us. and the scalar signal z is the output. in some specific order). Hand in complete source code.) Here φ : R → R. followed by a convolution filter with finite impulse response h. Compare Uorig (the original array) and U (the interpolated array found by your method). not just nonnegative times. The scalar signal u is the input. This function is shown below.g. so multiplication with either matrix just stuffs the entries of the w vectors into particular locations in vec(U ). v(t) = φ(u(t)). the matrix [Zknown Zunknown ] is an mn × mn permutation matrix. You are welcome to use any of the operations. . • vec−1 (u) can be computed as reshape(u. Zunknown . u φ v ∗h z (This looks complicated. In fact. but isn’t: Each row of these matrices is a unit vector. 47 where s > 0 is a parameter. with matrix argument. This will allow you to see the pattern of known and unknown array values. or don’t have a color printer. If you prefer a grayscale image. by the way. and vectors defined above in your solution (e. . say so. . The mfile that gives the problem data will plot the original image Uorig. (b) Carry out your method using the data created by smooth_interpolation. n. g(M − 1). and M (the length of g. Now. finally. 48 . . ˆ Our equalizer will have the form shown below. . but these vectors are indexed from 1 to M . . . we have J = 0. We are going to design an equalizer for the system. . z(N ). s is called the saturation gain of the amplifier. z(1). . . an estimate of the saturation gain s. . and also the length of h). . which are unknown. You are given some input/output (I/O) data u(1).t} τ =max{0. ˆ ψ(φ(a)) = a for all a). The convolution system represents the transmission channel.. . . h(M − 1).) The term δ is the Kronecker delta. defined as δ(0) = 1. . that minimize ˆ 1 J= N −M +1 N i=M v (i) − φ(u(i)) ˆ 2 . Note that if ˆ g ∗ h = δ and s = s.h) gives the convolution of g and h. This equalizer will work well provided g ∗ h ≈ δ (in which case v (t) ≈ v(t)). i. recall that (g ∗ h)(t) = min{M −1. . . and gives an output u which is an approximation of the input signal u. u(t) = ψ(ˆ(t)). .t−M +1} g(τ )h(t − τ ).. or the channel impulse response h(0). z(−1).. u(N ). we come to the problem. The nonlinear function φ represents a power amplifier that is nonlinear for input signals larger than one in magnitude. . and v comes from the given output data z. . ˆ We exclude i = 1. .φ(a) s 1 −1 −1 1 a Here is an interpretation (that is not needed to solve the problem). To make sure our (standard) notation here is clear. t = 0. g(1) corresponds to g(0). M − 1 in the sum defining J because these terms depend (through v ) on ˆ z(0). . another system that takes the signal z as input. . You also don’t know u(t). z ∗g v ˆ ψ u ˆ This means v (t) = ˆ M −1 τ =0 g(τ )z(t − τ ). and ψ = φ−1 (i. . 2M − 1. and g(0). δ(i) = 0 for i = 0. (Note: in matlab conv(g. . .. z(t) for t ≤ 0.e. . You do not know the parameter s. i.e. . ˆ v t ∈ Z. . . (a) Explain how to find s. Here u refers to the given input data.e. 20). (This chooses the entries from a normal distribution. ˆ 6. i i 6.m on the course web site.1). and the equalization ˆ error u(t) − u(t). The discrete cosine transformation (DCT) of the signal y ∈ RN is another signal. yN ∈ R. . .24 Simple fitting. generate a measurement vector y = Ax + v. . . . . it is remotely possible that the second method will work better than the first. . (a) Find the best affine fit. if y ∈ RN is a signal. . You’ll generate different different data each time. we use y(t) to denote yt . Since you are generating the data randomly. . but this doesn’t really matter for us. . 2. N. and call it xls . the equalized signal u(t). . Plot the data and the fit in the same figure. k = 2.26 Signal reconstruction for a bandlimited signal. where ‘best’ means minimizing i=1 (yi − (axi + b))2 . As a result. Give us a and b.e. . Generate a measurement vector x of length 20 using x=randn(20. . xN ∈ R and y1 . Plot g using the matlab command stem.. . for ˆ ˆ t = 1. i. Give the values of the parameters s and g(0). . i. N }. . yi ≈ ax3 + bx2 + cxi + d. . and submit the code you used to find a and b. which are just vectors.m. . (c) Now form a 20-long truncated measurement vector y trunc which consists of the first 20 entries of y. . 2N k = 1. . These data are available in simplefitdata. (b) Repeat for the best least-squares cubic fit.1*randn(50. . 2/N . ˆ (c) Using the values of s and g(0). Form an estimate of x from y trunc . N . It is defined as N Y (k) = t=1 y(t)w(k) cos π(2t − 1)(k − 1) .. (This is often called the ’best linear fit’. In this problem you will compare a least-squares estimate of a parameter vector (which uses all available measurements) with a method that uses just enough measurements. (a) First we generate some problem data in matlab. with index interpreted as (discrete) time. N (d) Run your script (i. at least for one run.25 Estimating parameters from noisy measurements. Give a one sentence comment about what you observe. as well as J.(b) Apply your method to the data given in the file nleq_data. for t = 1. . with w(k) = 1/N . . N. Finally. . It is common to write the index for a signal as an argument.. Note. Do not mention the incident to anyone. N . (b) Find the least-squares approximate solution of y = Ax. . (a)–(c)) several times. with t ∈ {1. and you’ll get different numerical results in parts (b) and (c). u(t) − u(t)) for t = 1. If this happens to you. 6. You are given some data x1 . . typically denoted using the corresponding upper case symbol Y ∈ RN . . g(M − 1) found. . where w(k) are weights. 49 k = 1.1). for example.) Generate a 50 × 20 matrix A using A=randn(50. Carry out the following steps. . In this problem we refer to signals. . .e. . . Find the relative error xls − x / x . ˆ we can expect a large equalization error (i. the output signal z(t). yi ≈ axi + b. quickly run your script again. . .e. For the purposes of finding u(t) you can assume that z(t) = 0 for t ≤ 0. M − 1. g(M − 1) found in part (b). ˆ Plot the input signal u(t). .) Set this up and solve it as a least-squares problem. .) Generate a noise vector v of length 50 using v=0. Find the relative error of xjem . . Call this estimate xjem (‘Just Enough Measurements’). rather than as a subscript. find the equalized signal u(t).e.. Another notation you’ll sometimes see in signal processing texts is y[t] for yt . . . . (You’re free to use any other software system instead. Give K. . (We have added the command to do this in bandlimit. N. . . 2N t = 1. 6. tM .) Your job is to • Determine the smallest DCT bandwidth (i. where w(k) are the same weights as those used above in the DCT. . with N y(t) = k=1 Y (k)w(k) cos π(2t − 1)(k − 1) .27 Fitting a model for hourly temperature. and sigma.. . N . yt ∈ R. respectively) over the week. y samp would be called a non-uniformly sampled. . dct(eye(N)) and idct(eye(N)) will return matrices whose columns are the DCT. which has this bandwidth. of the unit vectors. i = 1. 50 has a small RMS value. . .e. While it need not match ˆ exactly (you were told there was a small amount of noise in y samp ). You are given a set of temperature measurements (in degrees C). the smallest K) that y could have. is significantly smaller than N . give us y (129). . to four significant figures. yM − y (tsamp )). . Here. . .The number Y (k) is referred to as the kth DCT coefficient of y. . (In signal processing. M. . taken hourly over one week (so N = 168).. We can interpret a (which has units of degrees C per hour) as the warming or cooling trend (for a > 0 or a < 0. Note. • Find an estimate of y. You don’t know v. It will also plot the sampled signal. to date. your estimate of the DCT bandwidth of y. the signal is called DCT bandlimited.m. ˆ Your estimate y must be consistent with the sample measurements y samp . . Now for the problem. An expert says that over this week. . via the inverse DCT transform. You cannot use (and do not need) any concepts from outside the class. M. The DCT bandwidth of the signal y is the smallest K for which Y (K) = 0. (The term is typically used to refer to the case when the DCT bandwidth. you should ensure that the vector of differences.. . vi are small noises or errors. an appropriate model for the hourly temperature is a trend (i. that you can solve the problem without using these functions. . noise corrupted version of y. Show y on the same plot as the original ˆ sampled signal. When K < N . ˆ where a ∈ R and p ∈ RN satisfies pt+24 = pt . You can use any concepts we have used.m. ysamp. Running this script will define N. where 1 ≤ t1 < t2 < · · · < tM ≤ N : samp yi = y(ti ) + vi . . but commented it out.e. . . . You are given noise-corrupted values of a DCT bandlimited signal y.e. a known constant. samp samp (y1 − y (tsamp ). . K. y . (b) Carry out your method on the data in bandlimit. however. . respectively. on the order of σ (and certainly no more than 3σ). for t = 1. This includes the Fourier transform and other signal processing methods you might know. . ˆ 1 ˆ M (a) Clearly describe how to solve this problem. N − 24.) Also. t = 1. at some (integer) times t1 . and inverse DCT transforms. but you do know that its RMS value is approximately σ. a 24-periodic component): yt = at + pt . . in EE263. tsamp. and Y (k) = 0 for k = K + 1. .) A signal y can be reconstructed from its DCT Y . ˆ You might find the matlab functions dct and idct useful. N . . . a linear function of t) plus a diurnal component (i. T. This i i ˆ fitting cost can be (loosely) interpreted in terms of relative or percentage fit.m also contains a 24 long vector ytom with tomorrow’s temperatures. β. Use the model found in part (b) to predict the temperature for the next 24-hour period (i. . . but different values of T . You know the matrices A. say so. Give the value of the trend parameter a that you find. m. . where α. T . (b) Carry out your method on the data found in empac_data. 51 y(t) − C x(t). If (log(Ti /Ti ))2 ≤ ǫ. . t = 1.m. for example. and the parameters ki .) (c) Temperature prediction. m. We wish to find values of α. B.m. . t = 1. 6. (These are typically integers that give the dimensions of the problem data. . or v(t). based on the model found in part (b). t = 1. It’s possible (and often occurs) that two data records have identical values of k. δ ∈ R are constants that characterize the approximate runtime model. and ni . and δ need not be integers. . and the signal v(t) is the measurement noise. Plot the model y and the measured temperatures ˆ y on the same plot. . Your job is to estimate the state trajectory x(t). . the inputs u(t). (The matlab code to do this is in the file tempdata. T Your task is to find constants α. . say so. γ.) A simple and standard model that shows how T scales with k. 2. . . the corresponding runtimes can be (and often are) a little different. If your method always finds the values that give the true global minimum value of J. and δ you find. Plot tomorrow’s temperature and your prediction of it.29 State trajectory estimation. . δ that minimize J. m. γ. and outputs y(t). ˆ x t = 1.) Now suppose you are given measured runtimes for N executions of the algorithm. β. ˆ (b) Carry out the procedure described in part (a) on the data set found in tempdata. C. For each data record. say so. and the corresponding value of J. If. We will denote your estimate of the state trajectory as x(t). . T . T − 1. . If your algorithm cannot guarantee finding the true global minimum. This means the algorithm was run on two different data sets that had the same dimensions. . If your method requires some matrix (or matrices) to be full rank. γ. . . What is the RMS value of your prediction error for tomorrow’s temperatures? 6. .m. . γ. When we guess the state trajectory x(t). . and n has the form ˆ T = αkβ mγ nδ . with different sets of input data.. which is characterized by three key parameters: k. . then î lies between Ti / exp √ǫ and Ti exp √ǫ. we say that the algorithm has (approximately) cubic complexity in n. t = 1. we have two ˆ ˆ sets of residuals. The runtime T of an algorithm depends on its input data. 2. 2. We define the fitting cost as N J = (1/N ) i=1 ˆ log(Ti /Ti ) 2 . T . .e. . γ. and n. δ ≈ 3. β ˆ where Ti = αki mγ nδ is the runtime predicted by our model. The file tempdata. x(t + 1) − (Aˆ(t) + Bu(t)). . . The signal w(t) is called the process noise. You do not know x(t). and δ for which our model (approximately) fits our measurements. T − 1. Give the values of α. on the same plot. or close to integers. with state x(t) ∈ Rn . . β.28 Empirical algorithm complexity. . . w(t). We consider a discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t) + w(t). mi . (a) Explain how to do this.(a) Explain how to find a ∈ R and p ∈ RN (which is 24-periodic) that minimize the RMS value of y − y. . you are given Ti (the runtime). . y(t) = Cx(t) + v(t). for t = 1. ˆ t = 1. from t = 169 to t = 192). t = 1. using the given parameter values. T . and n. input u(t) ∈ Rm and output y(t) ∈ Rp . β. . . but commented out. the exponents β. (In general. The objective J is a weighted sum of norm squares of our two residuals. x(t). b(K) . where the vehicles are airplanes. (In the general case the output would also be a vector. respectively.) Give the value of J corresponding to your estimate. we will consider model estimation for vehicles in a fleet. t denotes the time. the dimensions n. x(k) (t) ∈ Rn the input. T . we describe a more specific application. C.30 Fleet modeling. . You may then recover X with the command X = reshape(x. called FOQA data. Airlines are required to collect this data. the parameter ρ. (This description is not needed to solve the problem. . We will choose these to minimize the mean square error 1 E= TK T K y (k) (t) ≈ aT x(k) (t) + b(k) . (b) Carry out your method from part (a) using state_traj_estim_data. but for simplicity here we consider the scalar output case. which gives A. B. using any concepts from the ˆ course. K. The input and output trajectories are given as m × T and p × T matrices. (The tth column gives the vector at the tth period. . respectively. . 52 . m. t = 1. You will choose x(t) so as to minimize the overall objective ˆ J= T −1 t=1 T x(t + 1) − (Aˆ(t) + Bu(t)) ˆ x 2 +ρ t=1 y(t) − C x(t) ˆ 2 . where ρ > 0 is a given parameter (related to our guess of the relative sizes of w(t) and v(t)). for example. • You might find the matlab function blkdiag useful. . and the time horizon T . . . t=1 k=1 y (k) (t) − aT x(k) (t) − b(k) 2 .m. (a) Explain how to find the model parameters a and b(1) . In this problem.m). where X is an n by m matrix. The mfile includes the true value of the state trajectory. . We collect input and output data at multiple time instances. . the deflections of various control surfaces and the thrust of the engines. stacks the columns of X into a vector of dimension nm. (a) Explain how to find the state trajectory estimate x(t). T. (of course you may not use it in forming your estimate x(t)). . . and y (k) (t) ∈ R the output. If one or more matrices must satisfy a rank condition for your method to work. . The components of the inputs might be. for every commercial flight. Here k denotes the vehicle number. . k = 1. p.) We will fit a model of the form where a ∈ Rn is the (common) linear model parameter. and b(k) ∈ R is the (individual) offset for the kth vehicle. y (k) (t) ∈ R. for each vehicle in a fleet of vehicles: x(k) (t) ∈ Rn . . say so. .) While it does not affect the problem. 6. t = 1.n. the output might be vertical acceleration.which correspond to (implicit) estimates of w(t) and v(t). Matlab hints: • The matlab command x = X(:). Plot x1 (t) (the true first state component) and x1 (t) (the estimated ˆ ˆ first state component) on the same plot. . and b ∈ Rn . .) Give an explicit expression for k. y (k) (T ).e. −4  53 . t = 1.31 Regulation using ternary inputs. . choose the inputs u(t) so as to drive the state x(t) towards zero.1  −. and the 1 × T row vector y{k} contains y (k) (1). 0. The data is given using cell arrays X and y.2 0 .  . (We don’t care about what happens when there are ties.1   b=  .   1 0 round(a) =  −1 (b) Consider the specific problem with data  1 . . 1}. and round(a) rounds the a > 1/2 |a| ≤ 1/2 a < −1/2. . straightforward question.15 0 1 where k ∈ R1×n . 1}. 100.  Give k. (This is a simple.e. . and report the associated mean square error E. . b = 0.) Our goal is to regulate the system. Use the matlab function stairs to plot u(t). with x(t) ∈ Rn .2 0  −. (The problem title comes from the restriction that the input can only take three possible values. .2 1 0 .m. . Compare E to the (minimum) mean square error E com obtained using a common offset b = b(1) = · · · = b(K) for all vehicles. Give the model parameters a and b(1) . we have arbitrarily broken ties in favor of a = 0. . u(t) ∈ {−1. number a to the closest of {−1. .1   −4  0   x(1) =   0 .) 6.9 0 0 −.   . . b(K) . i. 0. The columns of the n × T matrix X{k} are x(k) (1).15 A=  .2 −. and plot x(t) and u(t) for t = 1. x(k) (T ). . By examining the offsets for the different vehicles. . . . .. we don’t want to hear a long explanation or discussion.(b) Carry out your method on the data given in fleet_mod_data.1  . we will choose u(t) so as to minimize x(t + 1) . suggest a vehicle you might want to have a maintenance crew check out. . (a) Show that u(t) has the form u(t) = round(kx(t)). Consider a discrete-time linear dynamical system x(t + 1) = Ax(t) + bu(t). i. −. We will adopt a greedy strategy: At each time t.. . . 2. . . p = [a µ σ]T . µ ∈ R. . . Note that E is a function of the parameters a.. Explicitly describe the Gauss-Newton steps. We are given a set of data. Here t ∈ R is the independent variable. .) Plot the final Gaussian function obtained along with the data on the same plot. or fails at some step. . µ. . The problem is to estimate the cell phone 54 . i. (a) Work out the details of the Gauss-Newton method for this fitting problem. of course.e.Lecture 7 – Regularized least-squares and Gauss-Newton method 7. You can use the notation ∆p(k) = [∆a(k) ∆µ(k) ∆σ (k) ]T to denote the update to the parameters. i. and σ ∈ R are parameters that affect its shape. 7. y (and N ) from the file gauss_fit_data. (Here k denotes the kth iteration. (It’s possible. . within a few tens of nanoseconds. t1 . where c is the speed of light. You’ll use the Gauss-Newton method.) (b) Get the data t. µ is called its center. y1 .e. i. c i = 1.) Briefly comment on the results you obtain in the three cases. sn ∈ R2 . . Repeat for another reasonable. Repeat for another set of parameters that is not reasonable. located at locations s1 . This problem concerns a simplified version of an E-911 system that uses time of arrival information at a number of base stations to estimate the cell phone location. yN . The parameter a is called the amplitude of the Gaussian.m. (You should plot enough iterations to convince yourself that the algorithm has nearly converged. . that the Gauss-Newton algorithm doesn’t converge.e. . . or estimate them by any other method (but you must explain your method). available on the class website. . You’ll need an initial guess for the parameters. p. p(k+1) = p(k) + ∆p(k) . You can assume that vi is on the order of a few tens of nanseconds. Implement the Gauss-Newton (as outlined in part (a) above). tN .. . Your job is to choose these parameters to minimize E. A cell phone at location x ∈ R2 (we assume that the elevation is zero for simplicity) transmits an emergency signal at time τ . You can visually estimate them (giving a short justification). We can always take σ > 0. n. (This is possible because the base stations are synchronized using the Global Positioning System. We will measure the quality of the fit by the root-mean-square (RMS) fitting error.e. and vi is the noise or error in the measured time of arrival. not a good guess for the parameters. but different initial guess for the parameters. and σ is called the spread or width.1 Fitting a Gaussian function to data. including the matrices and vectors that come up.) The measured times of arrival are ti = 1 si − x + τ + vi . . Each base station can measure the time of arrival of the emergency signal. For convenience we define p ∈ R3 as the vector of the parameters. . This signal is received at n base stations. i. if this occurs.. The federal government has mandated that cellular network operators must have the ability to locate a cell phone from which an emergency call is made. given by E= 1 N N i=1 1/2 (f (ti ) − yi ) 2 . Plot the RMS error E as a function of the iteration number. and our goal is to fit a Gaussian function to the data. . and a ∈ R. A Gaussian function has the form f (t) = ae−(t−µ) 2 /σ 2 . say so. .2 E-911. σ. You will only work with this approximate version of the problem. We are given a function F : [0. where fi = F (i/n). we can choose G to be an affine function (i. At one extreme. . at the other extreme. and that |τ | ≤ 5000 (although all that really matters are the time differences). i. To reduce the problem to a finite-dimensional one. to have G′′ (t) = 0 for all t ∈ [0. which is a smoothed version of F . so we have a problem with two objectives. a 1 × 9 vector t that contains the measured times of arrival.position x ∈ R2 . defined as C= 0 G′′ (t)2 dt. based on the time of arrival measurements t1 . Using this representation we will use the following objectives. which is the speed of light. • The matlab source code you use to solve the problem. defined as 1 d= n n i=1 (fi − gi )2 .e. whose columns give the positions of the 9 basestations. The problem is to identify the optimal trade-off curve between C and D. . In general there will be a trade-off between the two objectives. We’ll judge the smoothed version G of F in two ways: • Mean-square deviation from F .3 Curve-smoothing. Distances are given in meters. times in nanoseconds. including how you will check that your estimate is reasonable. gi+1 − 2gi + gi−1 1/n2 gives a simple approximation of G′′ (i/n).. we will represent the functions F and G (approximately) by vectors f. and the speed of light in meters/nanosecond. it defines a 2 × 9 matrix S. 1 • Mean-square curvature. 7. as well as the time of transmission τ . You can assume that n is chosen large enough to represent the functions well. and explain how to find smoothed functions G on the optimal trade-off curve. we can choose G = F . and the constant c. including the results of any verification you do. which approximate the ones defined for the functions above: • Mean-square deviation. • The numerical results obtained by your method. tn . note that 1 n−2 n−1 i=2 gi+1 − 2gi + gi−1 1/n2 2 . Your solution must contain the following: • An explanation of your approach to solving this problem. You can assume that the position x is somewhere in the box |x1 | ≤ 3000. 1] → R (whose graph gives a curve in R2 ). the vectors f and g and the objectives c and d.m. which makes D = 0. defines the data for this problem. |x2 | ≤ 3000. defined as c= In our definition of c.. gi = G(i/n). We want both D and C to be small. 1] → R. The mfile e911_data. Specifically. 1]). • Mean-square curvature. . . defined as D= 0 1 (F (t) − G(t))2 dt. 55 . available on the course web site. in which case C = 0.e. Our goal is to find another function G : [0. and check the results. g ∈ Rn . .e. which corresponds to minimizing d without regard for c. Submit your matlab code. Find and plot the optimal trade-off curve between d and c. Does your method always work? If there are some assumptions you need to make (say. as we put more and more weight on minimizing curvature). be sure to include also a plot of f . on rank of some matrix. for example. Be sure to identify any critical points (such as. 56 . independence of some vectors. On your plots of g.). and for three values of µ in between (chosen to show the trade-off nicely). Explain how to obtain the two extreme cases: µ = 0.m from the course web site. say with dotted line type. state them clearly. any intersection of the curve with an axis). This file defines a specific vector f that you will use. for reference.(a) Explain how to find g that minimizes d + µc. etc. (b) Get the file curve smoothing. Plot the optimal g for the two extreme cases µ = 0 and µ → ∞. where µ ≥ 0 is a parameter that gives the relative weighting of sum-square curvature compared to sum-square deviation. and also the solution obtained as µ → ∞ (i. figure..2 Smallest input that drives a system to a desired steady-state output. plot(y(1.) The data for this problem can be found in ss_small_input_data..m. while keeping J2 = t=0 10 f (t)2 dt small.e. an input u that results in y(t) = ydes for t = T + 1. with initial state x(1) = 0.e. logspace in matlab. Find y(t).e. If you’ve done everything right.:)). 8.. for t = 1. plot(y(2. .:)).. and p(5) = 0. using. 2. Hint: the parameter µ has to cover a very large range. . x(t) = xss . so it usually works better in practice to give it a logarithmic spacing. i. . subplot(413). . x(1) = 0. . Plot the optimal trade-off curve between J2 and J1 . Plot u and y versus t. (a) Assume the mass has zero initial position and velocity: p(0) = p(0) = 0. Plot the optimal force ˙ f . 20000. at or near rest. subjected ˙ to force f (t). Your solution to this problem should consist of a clear written narrative that explains what you are doing. . Find x that minimizes ˙ 10 f (t)2 dt t=0 subject to the following specifications: p(10) = 1. You don’t need more than 50 or so points on the trade-off curve. (i. you should observe that y(t) appears to be converging to ydes . i. You can use the following matlab code to obtain plots that look like the ones in lecture 1. Make sure the specifications are satisfied. Our goal is to bring the ˙ mass near or to the origin at t = 10. . (b) Simple simulation. where Ad ∈ R16×16 . along with comments explaining it all. −2) as t → ∞ (i.. Cd ∈ R2×16 . . . 57 .e. Give a short intuitive ˙ explanation for what you see.:)). Suppose that the system is in steady-state. exact convergence after T steps).1 Optimal control of unit mass. even better. . . and the final plots produced by matlab. . . t = 1. p(10) = 0. Consider a unit mass at position p(t) with velocity p(t). (We start from initial time t = 1 rather than the more conventional t = 0 since matlab indexes vectors starting from 1. where f (t) = xi for i − 1 < t ≤ i.. subplot(411). not 0. In this problem you will use matlab to solve an optimal control problem for a force acting on a unit mass. with u(t) = uss . We start with the discrete-time model of the system used in lecture 1: x(t + 1) = Ad x(t) + Bd u(t). Find uss and xss . (a) Steady-state analysis for desired constant output. e. u(t) = uss and y(t) = ydes are constant (do not depend on t). subplot(412). y(t) = Cd x(t).Lecture 8 – Least-norm solutions of underdetermined equations 8. . or at least not too large. Check that the end points make sense to you. for i = 1. The goal is to find an input u that results in y(t) → ydes = (1. subplot(414).e. Bd ∈ R16×2 . we want J1 = p(10)2 + p(10)2 . asymptotic convergence to a desired output) or. plot(u(2. plot(u(1. the matlab source code you devise to find the numerical answers. and the resulting p and p. The system starts from the zero state. (b) Assume the mass has initial position p(0) = 0 and velocity p(0) = 1. and gives formulas symbolically. .:)).g. i. ˙ small. 10. . . In the rest of this problem.. Note that if u(t) = u⋆ (t) for t = 1. . w 1 ≥ wT v . x10 for one second each. and the plots in lecture 1 scale t by a factor of 0. . . Explain why we must have λT y .n The 1-norm. which is like the Cauchy-Schwarz inequality (but even easier to prove): for any v. This solution has the minimum (Euclidean) norm among all solutions of Ax = y. There is no simple formula for the least 1-norm or ∞-norm solution of Ax = y. For the three cases T = 100.3 Minimum fuel and minimum peak input solutions. T . which can be done using the command loglog instead of the command plot. Therefore if you can find xmf ∈ Rn (mf stands for minimum fuel) and λ ∈ Rm such that Axmf = y and λT y xmf 1 = .Here we assume that u and y are 2 × 20000 matrices. like there is for the least (Euclidean) norm solution. (d) Plot the RMS value of u⋆ versus T for T between 100 and 1000 (for multiples of 10. . for example.e.1. if you like).. we have exact convergence to the desired output in T steps. which are defined as n x 1 = i=1 |xi |. you’ll use these ideas to verify a statement made during lecture. T = 200. since we can’t just differentiate to verify that we have the minimizer. so there are many x’s that satisfy Ax = y. AT λ ∞ then xmf is a minimum fuel solution. and let λ ∈ Rm be such that AT λ = 0. . In lecture we encountered the least-norm solution given by xln = AT (AAT )−1 y. .. The mass starts at position p(0) = 0 with zero velocity and is required 58 . and then u(t) = uss for t = T + 1. the following inequality holds: wT v ≤ v ∞ w 1 . . . (That’s one of the topics of EE364. w ∈ Rp . For each of these three cases. and T = 500. . for t = 1.) The analysis is a bit trickier as well. (Explain why. First verify the following inequality. From this inequality it follows that whenever v = 0. . . be the input with minimum RMS value 1 T T 1/2 u(t) t=1 2 that yields x(T + 1) = xss (the value found in part (a)). find u⋆ and its associated RMS value. 8.. however. . Now consider the problem from the lecture notes of a unit mass acted on by forces x1 . T + 2. In many applications we want to minimize another norm of x (i. plot u and y versus t.. then y(t) = ydes for t ≥ T + 1. any solution of Az = y must have 1-norm at least as big as the righthand side expression. i=1. (c) Smallest input. In other words. .. There will be two differences between these plots and those in lecture 1: These plots start from t = 1. Let u⋆ (t). They can be computed very easily. v ∞ Now let z be any solution of Az = y. z 1≥ AT λ ∞ Thus. Suppose A ∈ Rm×n is fat and full rank.) Methods for computing xmf and the mysterious vector λ are described in EE364. the ∞-norm is the peak of the vector or signal x. measure of size of x) subject to Ax = y. The plot is probably better viewed on a log-log scale. For example. Two common examples are the 1-norm and ∞-norm. x ∞ = max |xi |. how would you know that a solution of Ax = y has minimum 1-norm? In this problem you will explore this idea. is often a good measure of fuel use. T . an accelerating force at the beginning. is very widely used in practice. In class I stated that the minimum fuel solution is given by xmf = (1/9. i. the (Euclidean) least-norm solution. 8. −1/9). you can see a plot of the least (Euclidean) norm force profile. A sinusoidal plane wave.. 0. and a (braking) force at the end to decelerate the mass to zero velocity at t = 10. ˜ where G ∈ Rm×n . −5). Feel free to use matlab. while respecting a maximum possible current in the disk drive motor coil. We want to find a reconstruction matrix H ∈ Rn×m such that HG = H G = I. Prove this.. with spacing d between elements.. norm(z.5 Phased-array antenna weight design. y gives the measurements with some set of (linear) sensors. ˙ In the lecture notes. i.. G ∈ Rm×n . G =  2 −3  .e. Here x is some variable we wish to estimate or find. It is (basically) the input used in a disk drive to move the head from one track to another. . There are several convenient ways to find the 1. θ d The array consists of n individual antennas (called antenna elements) spaced on a line. We consider the phased-array antenna system shown below. p(10) = 0. x = Hy = H y . impinges on the array. Hints: • The input is called bang-bang. Consider the specific case ˜ ˜     2 3 −3 −1  1 0   −1 0      ˜ G =  0 4  .e.and ∞-norm of a vector z. Consider a system where y = Gx. . . many force vectors that satisfy these requirements. Hint: try λ = (1.g. or explain why there is no such H.to satisfy p(10) = 1. . • Some people drive this way. 8 seconds of coasting.      1 1   −1 −3  −1 2 1 2 8. e. for fun: what do you think is the minimum peak force vector xmp ? How would you verify that a vector xmp (mp for minimum peak) is a minimum ∞-norm solution of Ax = y? This input.inf) or sum(abs(z)) and max(abs(z)). One last question. Such a reconstruction matrix has the nice property that it recovers x perfectly from either set of measurements (y or y ). which yields the output e2πj(k−1)(d/λ) cos θ (which is a complex number) from the kth element.4 Simultaneous left inverse of two matrices. (We’ve chosen the phase center as element 1. 0. ˜ y = Gx ˜ Either find an explicit reconstruction matrix H. Verify that the 1-norm of xmf is less than the 1norm of xln . and y gives the measurements with some alternate set ˜ ˜ of (linear) sensors. the output of element 1 does not depend on the 59 . with wavelength λ and angle of arrival θ. There are.e. i. by the way.1) and norm(z. of course. is called the antenna array gain pattern. but small at other angles. then we must have x1 = x2 + x3 . Give the weights you find.) To solve this problem. and a spacing d = 0.e. which allows you to easily visualize the pattern. The function |y(θ)|. The combined array output depends on the angle of arrival of the wave. You are to choose w ∈ C20 that minimizes 60 180 k=1 |yk |2 + k=80 |yk |2 subject to the constraint y70 = 1. (In the language of antenna arrays. . i. Such a pattern would receive a signal coming from the direction θtarget . . will achieve the desired goal of being relatively insensitive to plane waves arriving at angles more than 10◦ from θ = 70◦ ). 60 .e. . The measurements are good. ). hopefully. the measurement y won’t satisfy the conservation laws (i. We can choose.. . As a simple example. . . this is done using the polar command. then we get a uniform or omnidirectional gain pattern. if A is a complex matrix or vector. . if x1 is the current in a circuit flowing into a node. where m < n. . From physical principles it is known that the quantities x must satisfy some linear equations. we can shape the gain pattern to satisfy some specifications. we want a beamwidth of 20◦ around a target direction of 70◦ .4λ (which is a typical value). i.e. A vector y ∈ Rn contains n measurements of some physical quantities x ∈ Rn . By choosing the weights w1 . and real matrices. 180. In matlab. and we assume that a1 . 8. the real and imaginary parts of the antenna weights). the weights. are called the antenna weights. More generally. . As usual. . am are independent. .. In this problem. versus k (which. In other words. of A. • Very important: in matlab. . and we want |y(θ)| small for 0◦ ≤ θ ≤ 60◦ and 80◦ ≤ θ ≤ 180◦ . The vectors ai and the constants bi are known. We want y(70◦ ) = 1.. .incidence angle θ. Here’s the problem. you will first discretize the angles between 0◦ and 180◦ in 1◦ increments. . or Hermitian conjugate. |yk |. Hints: • You’ll probably want to rewrite the problem as one involving real variables (i. which are the coefficients of the linear combination. . and also a plot of the antenna array response.e. the linear equations might come from various conservation laws. . . The complex numbers w1 . ‘jammers’ or multipath reflections) coming from other directions.. . and attenuate signals (e.6 Modifying measurements to satisfy known conservation laws.. charge .) A (complex) linear combination of these outputs is formed. for k = 1. wn intelligently.e.g. wn . the prime is actually the Hermitian conjugate operator. m. for 0◦ ≤ θ ≤ 180◦ . As a simple example. we want a gain pattern that is one in a given (‘target’) direction θtarget . you might find it fun to also plot your antenna gain pattern on a polar plot.e. aT x = bi . You will design the weights for an array with n = 20 elements. Due to measurement errors. i i = 1. and x2 and x3 are the currents flowing out of the node. if we choose the weights as w1 = 1. In other words. |y(θ)| = 1 for all θ. i. n y(θ) = k=1 wk e2πj(k−1)(d/λ) cos θ . we want the antenna array to be relatively insensitive to plane waves arriving from angles more than 10◦ away from the target direction. with yk equal to y(k ◦ ). and called the combined array output. design. . . . heat. y(πk/180). w2 = · · · = wn = 0. energy. You can then rewrite your solution in a more compact formula that uses complex matrices and vectors (if you like). you must explain how you solve the problem. so we have y ≈ x. but not perfect.. Thus y ∈ C180 will be a (complex) vector. A’ gives the conjugate transpose. i. or balance equations (mass. . • Although we don’t require you to.. and A is full rank. If any matrix inverses appear in your formula. . with   1 0  1 1     1 −1  . If you find such a B. Verify that the resulting adjusted measurement satisfies the conservation laws. in terms of y. . aT yadj = bi . ai .  1 2 −1 1 0   3 3 2 2 1  (b) Now we consider a specific example. (Note that this condition is a stronger condition than the estimator being unbiased. An engineer proposes to adjust i the measurements y by adding a correction term c ∈ Rn . to get an adjusted estimate of x. With each communication link we associate a directed arc.8 Optimal flow on a data collection network. We consider a communications network with m nodes. (We’ll be really annoyed if you just give a matrix and leave the verification to us!)   f1 =      . i. in matlab) that it satisfies the required properties.7 Estimator insensitive to certain measurement errors.) The traffic on the network (i. the flow or traffic on link j is denoted fj .e. you must explain how you found it.e. (The units are bits per second.. . We will assume that the network is connected. i. . from every node (including the special destination node) to every other node.e. information can flow in either direction. or explain in detail why none exists. known vectors. and n communication links. explain why the matrix to be inverted is nonsingular. for any measurement error in V. and verify (say. In this problem we assume that the measurement errors lie in the subspace V = span{f1 .     f2 =      .linear equations above) exactly. we say ˆ that the estimator is insensitive to measurement errors in V.   8. In other words. A=   2 1  −1 2 Either construct a specific B ∈ R2×5 for which the linear estimator x = By is insensitive to ˆ measurement errors in V. then there is no estimator insensitive to measurement errors in V. we have x = x. given by yadj = y + c. In this problem we consider the stronger condition that the estimator predicts x perfectly. the flow in each communication link) is given by a 61 .e. there is a path. plus a special destination node. fk }. We consider the usual measurement setup: y = Ax + v. where • x ∈ Rn is the vector of parameters we wish to estimate • A ∈ Rm×n is the coefficient matrix relating the parameters to the measurements You can assume that m > n. for any x ∈ Rn . . we have x = x. Each communication link connects two (distinct) nodes and is bidirectional. Now consider a linear estimator of the form x = By. although we would expect aT y ≈ bi . and any v ∈ V. for any x ∈ Rn .. but that won’t matter to us.. bi .) (a) Show that if R(A) ∩ V = {0}. i. . . If this condition holds. or sequence of links. She proposes to find the smallest possible correction term (measured by c ) such that the adjusted measurements yadj satisfy the known conservation laws. an unbiased estimator predicts x perfectly when there is no measurement error. In other ˆ words.. Give an explicit formula for the correction term. • v ∈ Rm is the vector of measurement errors • y ∈ Rm is the vector of measurements where f1 . . ˆ Recall that the estimator is called unbiased if whenever v = 0. which defines the direction of information flow that we will call positive. i 8. Using these reference directions. fk ∈ Rm are given. all the external flow entering node 2 goes to node 1. This means that at each node (except the special destination node) the sum of all flows entering the node from communication links connected to that node. The first two terms on the left give the flow entering node 3 on links 4 and 5. an external information flow si (which is nonnegative) enters. f4 = −3 means the flow on link 4 is 3 (bits per second). n 1 f 2. we arbitrarily choose the path through node 4. plus the external flow. The nodes are shown as circles. and the special destination node is shown as a square. The external flows are   1  4   s=  10  . i. (a) The vector of external flows. (That explains why we call it a data collection network.    0     10  20 62 . The vector s ∈ Rm of external flows is sometimes called the source vector. 10 One simple feasible flow is obtained by routing all the external flow entering each node along a shortest path to the destination. The term on the righthand side gives the flow leaving over link 6. (b) Now consider the specific (and very small) network shown below. the network is used to collect information from the nodes and route it through the links to the special destination node.vector f ∈ Rn .e. consider node 3 in the network of part 2. Information flow is conserved. here is the problem. are given. and each column is associated with an arc or link. nodes 1 and 3 are connected by communication link 4. and the associated arc points from node 1 to node 3. flow conservation at node 3 is given by f4 + f5 + s3 = f6 . For example. from node 3 to node 1. the last term on the left gives the external flow entering node 3. As an example.. equals the sum of the flows leaving that node on communication links. and f6 . Note that each row of A is associated with a node on the network (not including the destination node). then to the destination node. and the node incidence matrix A ∈ Rm×n that describes the network topology.) At node i. Note that this equation correctly expresses flow conservation regardless of the signs of f4 . Finally. Links 4 and 5 enter this node. In this example. For node 3. and minimizes the mean-square traffic on the network. External information enters each of the m regular nodes and flows across links to the special destination node. which has two shortest paths to the destination. A small example is shown in part 2 of this problem. s ∈ Rm . Thus f4 = 12 means the flow on that link is 12 (bits per second). In other words. Therefore. and the network topology. The node incidence matrix is defined as   1 arc j enters (or points into) node i −1 arc j leaves (or points out of) node i Aij =  0 otherwise. and link 6 leaves the node. Similarly. n j=1 j Your answer should be in terms of the external flow s. from node 1 to node 3. and you must find the flow f that satisfies the conservation equations. f5 . This simple routing scheme results in the feasible flow   5  4     0    fsimple =  0  . In this problem. is called the antenna array gain pattern. we can shape the gain pattern to satisfy some specifications. . . . wn intelligently.. The combined array output depends on the angle of arrival of the wave. Design the weights for the antenna array. which yields the output ej λ (xk cos θ+yk sin θ) (which is a complex number) from the kth element. whose elements have coordinates given in the file antenna_geom. yk ) The array consists of n individual antennas (called antenna elements) randomly placed on 2d space. for 0◦ ≤ θ < 360◦ . which are the coefficients of the linear combination. . the weights. Such a pattern would receive a signal coming from the direction θtarget . wn . The function |r(θ)|. and called the combined array output. impinges on the array. design. n r(θ) = k=1 wk ej 2π λ (xk cos θ+yk sin θ) . we want the antenna array to be relatively insensitive to plane waves arriving from angles more that 10◦ away from the target direction.g. with wavelength λ 2π and angle of arrival θ.. we want a gain pattern that is one in a given (‘target’) direction θtarget . Here’s the problem. . The complex numbers w1 . are called the antenna weights. f6 2 f4 f5 3 s3 s4 θ (xk . In other words. A sinusoidal plane wave. 63 .e. i. with the coordinates of the kth element being xk and yk . and we want |r(θ)| small for 0◦ ≤ θ ≤ 60◦ and 80◦ ≤ θ < 360◦ . We can choose. Compare the mean square flow of the optimal flow with the mean square flow of fsimple . s2 f2 s1 1 f3 f1 4 f7 D 8. By choosing the weights w1 . We consider the phased-array antenna system shown below. but small at other angles. We want r(70◦ ) = 1. A (complex) linear combination of these outputs is formed.Find the mean square optimal flow for this problem (as in part 1). .m. .9 Random geometry antenna weight design. and attenuate signals (e. (In the language of antenna arrays. . ‘jammers’ or multipath reflections) coming from other directions. you might find it fun to also plot your antenna gain pattern on a polar plot. . 1. you will find the file mtprob4. with rk equal to r(k ◦ ). which allows you to easily visualize the pattern. r(πk/180). Thus r ∈ C360 will be a (complex) vector. (In other words.) You are told that λ = 1. given A and y. this is done using the polar command. 64 .e. (AT A)−1 AT y > 1. (You may assume that the norm of the least-squares approximate solution exceeds one. where the transmitted signal power is known to be equal to one. . or Hermitian conjugate. In matlab.) (a) Explain clearly how would you find the best estimate of x. As usual. at sampling times t = 0.. Give the matlab source you use to compute x. • Although we don’t require you to. Is your estimate ˆ x a linear function of y? ˆ (b) On the EE263 webpage.e. You can then rewrite your solution in a more compact formula that uses complex matrices and vectors (if you like). Hints: • You’ll probably want to rewrite the problem as one involving real variables (i.m. i. Explain how you would compute your estimate x. Carry out the estimation procedure you developed in part (a).10 Estimation with known input norm.. . you must explain how you solve the problem.e. i.11 Minimum energy rendezvous. versus k (which. of A. we add one more piece of prior information: we know that x = 1. x ∈ Rn is the vector we wish to estimate. In other words. where A ∈ Rm×n is a full rank. . and y ∈ Rm is the measurement vector.. you will first discretize the angles between 1◦ and 360◦ in 1◦ increments. will achieve the desired goal of being relatively insensitive to plane waves arriving at angles more than 10◦ from θ = 70◦ ). .) This might occur in a communications system. the vector we are estimating is known ahead of time to have norm one. the real and imaginary parts of the antenna weights). . Give the weights you find. for k = 1. z(t + 1) = F z(t) + gv(t) where • x(t) ∈ Rn is the state of vehicle 1 • z(t) ∈ Rn is the state of vehicle 2 • u(t) ∈ R is the (scalar) input to vehicle 1 • v(t) ∈ R is the (scalar) input to vehicle 2 The initial states of the two vehicles are fixed and given: x(0) = x0 . 2. taking into account the prior information x = 1. A’ gives the conjugate transpose. To solve this problem. skinny matrix.. are given by x(t + 1) = Ax(t) + bu(t). • Very important: in matlab. v ∈ Rm is an unknown noise vector. ˆ ˆ ˆ 8. In this problem. which gives the matrix A and the observed vector y. The dynamics of two vehicles. z(0) = z0 . and also a plot of the antenna array response. We consider a standard estimation setup: y = Ax + v. and real matrices.we want a beamwidth of 20◦ around a target direction of 70◦ . 360. we assume that smaller values of v are more plausible than larger values. |rk |. As usual. You are to choose w ∈ Cn that minimizes 60 360 k=1 |rk |2 + k=80 |rk |2 subject to the constraint r70 = 1. i. the prime is actually the Hermitian conjugate operator. and verify that it satisfies x = 1.e. Give your estimate x. 8. if A is a complex matrix or vector. . hopefully.. you can use the least-norm solution of the linear equations resulting if you ignore the quadratic terms in f . we have z ≥ x . Use the notation x(k) to denote the kth iteration of your method. we don’t need to have the iterates satsfiy the nonlinear equations exactly. and w in terms of the problem data. i. . 65 . 1. .. and y ∈ Rm is a vector. (In particular. Explain clearly how you obtain x(k+1) from x(k) . with the function f : R5 → R2 given by f1 (x) f2 (x) = = −x2 + x3 − x4 + x5 − 0. (You don’t have to worry about what happens if the matrix is not full rank. v. (We repeat that you do not have to prove that the solution you found is really the least-norm one. Suppose f : Rn → Rm is a function. 8.e. F . We say that x ∈ Rn is a least-norm solution of f (x) = y if for any z ∈ Rn that satisfies f (z) = y. f (x) − y (which should be very small). do so.. g. i. x has larger dimension than y). . but be sure to make very clear what you’re assuming. 2x1 − 3x3 + x5 + 0. x0 . u(1). (a) Suggest an iterative method for (approximately) solving the nonlinear least-norm problem.) You can select the inputs to the two vehicles. where m < n (i. . Make sure to turn in your matlab code as well as to identify the least-norm x you find. the equations f (x) = y are linear. .e. however. full rank. it converges to a least-norm solution. Among choices of u. v(1).e. it is an extremely difficult problem to compute a least-norm solution to a set of nonlinear equations.. some good heuristic iterative methods that work well when the function f is not too far from affine. Give explicit formulas for the optimal u. that’s fine. This guess doesn’t necessarily satisfy the equations f (x) = y. z(N ) = w.5x2 x5 . and the equation residual. however. v(N − 1). If you need to assume that one or more matrices that arise in your solution are invertible.1x1 x2 − 0. starting from the initial guess x(0) . b. linear plus a constant). Use the method you invented in part a to find the least-norm solution of f (x) = y = 1 1 .We are interested in finding inputs for the two vehicles over the time interval t = 0.6x1 x4 + 0.e. . .. A. i. . v(0). and we know how to find the least-norm solution for such problems. You may assume that you have a starting guess. Your method should not be complicated or require a long explanation. .e. All you have to do is suggest a sensible. and w that satisfy the rendezvous condition.. and also a quadratic part. u(N − 1). and z0 . . If you need to make any assumptions about rank of some matrix. we want the one that minimizes the total input energy defined as E= N −1 t=0 u(t)2 + N −1 t=0 v(t)2 . which we’ll call x(0) . its nonlinear terms are small compared to its linear and constant part. x(N ) = w.) As initial guess. its norm. (The point w ∈ Rn is called the rendezvous point.. v. and the starting guess x(0) is good. There are. simple method that ought to work well when f is not too nonlinear.) Suggest a name for the method you invent. . When the function f is linear or affine (i. You do not have to prove that the method converges. Note that each component of f consists of a linear part. .) Your method should have the property that f (x(k) ) converges to y as k increases.e.12 Least-norm solution of nonlinear equations. u(0).. In general. N − 1 so that they rendezvous at state w ∈ Rn at time t = N . or that when it converges.3x3 x4 . etc. (b) Now we consider a specific example. i. as well as the rendezvous point w ∈ Rn . . i. . 8. x(0) =  B =  0. subject to way-point constraints p(ki ) = wi . u(1). f (K) ∈ R2 that minimize the cost function K J= k=1 f (k) 2 . . . we have p(1) = 0. h. (These state that at the time ti = hki . where ki are integers between 1 and K.1. . k3 = 40. and the way-point indices k1 . the vehicle must pass through the location wi ∈ R2 . We consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t).e. (b) Carry out your method on the specific problem instance with data h = 0. α. K. and the M = 4 way-points w1 = 2 2 . time index k corresponds to time t = kh. .0 −0..e. m = 1. to simplify indexing. . . and α ∈ (0.0 0.0  . i. m > 0 is the vehicle mass. for k = 1. . m. v(1) = 0.8.14 Minimum energy input with way-point constraints.. . at rest.5 1. and the velocity by v(k) ∈ R2 .) Note that there is no requirement on the vehicle velocity at the way-points. (We take k = 1 as the initial time.25 −25 0. given all the problem data (i. Among the input sequences that yield x(20) = 0. the way-points w1 .e. The position at time index k is denoted by p(k) ∈ R2 .5 0 The goal is to choose an input sequence u(0). . We will use a discrete-time model.1  . .5 0. Plot the smoothest input usmooth . where f (k) ∈ R2 is the force applied to the vehicle at time index k. with time index k = 1. Explain how to solve this problem. .25 0 . i = 1. w3 = 4 −3 . M. . . where h > 0 is the sample interval. k2 = 30. . that minimizes Jsmooth = 1 20 19 t=0 1/2 (u(t) − u(t − 1)) 2 . K. We consider a vehicle that moves in R2 due to an applied force input. Plot f1 (k) and f2 (k) versus k. k = 1. . with       25 1. α = 0.13 The smoothest input that takes the state to zero. . . . but for the purposes of this problem. . u(19) that yields x(20) = 0.) The vehicle starts at the origin. . K + 1. (a) Explain how to solve this problem. we want the one that is smoothest. Give the optimal value of J. .) The problem is to find forces f (1). 0 1. using 66 . where we take u(−1) = 0 in this formula. w2 = −2 3 . K = 100. wM . . 2. v(k + 1) = (1 − α)v(k) + (h/m)f (k). the vehicle velocity decreases by the factor 1 − α in each time index. . kM ). we consider them exact. .. and k4 = 80. with way-point indices k1 = 10. . and give the associated value of Jsmooth .0 1. w4 = −4 −2 . . (These formulas are approximations of more accurate formulas that we will see soon. 1) models drag on the vehicle: In the absence of any other force.1. . . These are related by the equations p(k + 1) = p(k) + hv(k). A =  0.. . the matrices AT A+µI and AAT + µI are both invertible. the number of equality constraints is no more than the number of variables. and (AT A + µI)−1 AT = AT (AAT + µI)−1 . Show that K is singular. p(K + 1). Here p is a 2 × (K + 1) matrix with columns p(1).) (c) Now assume that A is fat and full rank. minimize Ax − b subject to Cx = d. .) The matrix above. using plot(p(1. 8. but not shown. First. .e. 67 . Use this u to show that K is singular.p(2. Then. (b) Suppose that there is a nonzero u ∈ N (A) ∩ N (C). Multiply on the left by z T . (The vector x gives the solution of the norm minimization problem above. A ∈ Rm×n . . Plot the vehicle trajectory. with AT substituted for A. plot(f(2.subplot(211). in the lecture notes on page 8-12. . we found that the solution can be obtained by solving the linear equations AT A C T C 0 x λ = AT b d for x and λ. . Show that as µ tends to zero from above (i. (a) Let’s first show that AT A + µI is invertible.. with columns f (1). for any matrix A. assuming µ > 0.e. multiply on the left by (AT A + µI)−1 . This is what we needed to show. is called the KKT matrix for the problem. We assume here that f is a 2 × K matrix.:). and argue that z = 0. (This is asserted.) (b) Now let’s establish the identity above. µ is positive) we have (AT A + µI)−1 AT → AT (AAT )−1 .:)). and C ∈ Rk×n . (Your job is to fill all details of the argument. and any positive number µ.. . (KKT are the initials of some of the people who came up with the optimality conditions for a more general type of problem. (These inverses exist. . when is the KKT matrix K nonsingular? The answer is: K is nonsingular if and only if C is full rank and N (A) ∩ N (C) = {0}. will show that AAT + µI is invertible.) Suppose that (AT A + µI)z = 0. i. and on the right by (AAT + µI)−1 . We asume that C is fat (k ≤ n). which we will call K ∈ R(n+k)×(n+k) . f (K).:)). (a) Suppose C is not full rank. subplot(212). where the variable is x ∈ Rn .) 8. . Using Lagrange multipliers.15 In this problem you will show that. by part (a). explain why AT (AAT + µI) = (AT A + µI)AT holds.) One question that arises is. plot(f(1. This problem concerns the general norm minimization with equality constraints problem (described in the lectures notes on pages 8-13).:)). You will fill in all details of the argument below. (The same argument.16 Singularity of the KKT matrix. But you must state this clearly. B. tells us how much the price goes up in the current period when we buy one share. means we sold shares in the period t. 8. 8. For this problem.(c) Suppose that K is singular. . so there exists a nonzero vector [uT v T ]T for which AT A C CT 0 u v = 0. . The goal is to minimize the total input energy. . a single time period could be between tens of milliseconds and minutes. . We also don’t require bt to be integers. . The amounts we purchase are large enough to have a noticeable effect on the price of the shares. which implies u ∈ N (A). Here p is the base price of the shares and α and θ are parameters that determine how purchases affect ¯ the prices. You are to do this over T time periods. ¯ pt = θpt−1 + (1 − θ)¯ + αbt . . xdest =  1  . B =  0  . and x(tdest ) = xdest (i. defined as E= T −1 t=0 u(t)2 . for t = 1. u(1). b1 . . Here xdest ∈ Rn is a given destination state. u(T − 1). . The prices change according to the following equations: p1 = p + αb1 . for example.e. . We must choose u(0). . or N (A) ∩ N (C) = {0}. (b) Carry out your method on the particular problem instance with data       1 1 1 0 0 A =  1 1 0  . the time when the state hits the desired state. Finish the argument that leads to the conclusion that either C is not full rank. Multiply AT Au + C T v = 0 on the left by uT .18 Optimal dynamic purchasing. which is positive. which lies between 0 and 1. and use Cu = 0 to conclude that Au = 0. you can just assume they are.e. .17 Minimum energy roundtrip. (The quantities B. with u(t) ∈ R and x(t) ∈ Rn . as well as the input trajectory u(0). Note that you have to find tdest . so we have b1 + · · · + bT = B. . AT Au + C T v = 0 and Cu = 0. (Depending on the circumstances. . .) We let pt denote the price per share in period t.) We will let bt denote the number of shares bought in time period t. . −1 0 0 1 1 Give the optimal value of tdest and the associated value of E. you may assume that n ≤ tdest ≤ T − n. x(0) = 0. it can be any integer between 1 and T − 1. bT can all be any real number. T .. The parameter α. and plot the optimal input trajectory u. after T steps we are back at the zero state). We consider the linear dynamical system x(t + 1) = Ax(t) + Bu(t). T = 30. measures the memory: If θ = 0 the share price has no memory. u(T − 1) so that x(T ) = 0 (i. The parameter θ. so the total cost of purchasing the B shares is C = p1 b1 + · · · + pT bT . . of shares in some company. and the purchase made in period t only affects the price in 68 . Conclude that u ∈ N (C). p t = 2. . bt < 0. If you need some matrix or matrices that arise in your analysis to be full rank. at time tdest the state is equal to xdest ). Write this out as two block equations. . T. The time tdest is not given. You are to complete a large order to buy a certain number. (a) Explain how to do this. .. . If purchases didn’t increase the price.00015. Xtest and ytest. ¯ Find the purchase quantities b1 . . you can use sum((yhat == 1) & (y == -1)). B. (a) Explain (briefly) how to find w and v. C − pB.. Use the w and v you found to make predictions about whether the documents in the test set are interesting. and −1 meaning the document is not (for example. and the transaction cost if the purchases were evenly spread over the periods (i. the cost of purchasing the shares would always be pB. (This is called a binary label. (a) You are given A. etc. u(1). The ¯ difference between the total cost and this cost. with N is as small as possible.. To count false positives. . singular values. . Compare these three quantities. α = 0. the number of occurrences of a certain term in the document. to guess whether an as-yetunread document is interesting. say so. If your method requires that some rank or other conditions to hold.g. . 1. and a label yi ∈ {−1. u(t) ∈ Rm . Give the number of correct predictions (for which yi = yi ). or in EE364a. You must explain your method clearly. for vector arguments. the effect a purchase has on the price decays by a factor of two between periods. Also give the transaction cost if all the shares were purchased in the first period. Your answer can involve any of the concepts used in the course so far. and y.19 Least-squares classification. . For scalar a. Give the optimal transaction cost. so that x(N ) = 0. using any concepts from this class. . such as least-squares. . if 1000 shares were purchased in each period). y y You may find the matlab function sign() useful.20 Minimum time control. of the same size (i. etc. eigenvalues. pseudoinverses. From this data set we construct w ∈ Rn and v ∈ R that minimize N i=1 (wT x(i) + v − yi )2 . u(N − 1). 8. bT that minimize the transaction cost C − pB.e. 69 t = 0. for the particular ¯ problem instance with B = 10000. We consider a discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t). in your matlab code.m. i. Remark. p = 10. 8. If θ = 1. false positives ˆ (î = +1 while yi = −1). e.e. n and N ). for example. ¯ θ = 0.5 (say). rank. least-squares. .. This M-file will also define a second data set. We can now use w and v to predict the label for other documents. is called the transaction cost. and x(0) = xinit .that period. QR factorization. . and false negatives (î = −1 while yi = +1) for the test set. that these conditions are satisfied for the given problem instance. for example.) Each component of the feature vector could be. . whose columns are x(i) . spam). You must also check. . with x(t) ∈ Rn .. with +1 meaning the document is interesting or useful. sign() is taken elementwise. 1}. If you need to make an assumption about the rank of a matrix. which you can learn about in a modern statistics or machine learning course. There are better methods for binary classification. Explain how to find an input sequence u(0). T = 10. the label could be decided by a person working with the documents. But least-squares classification can sometimes work well. the price has perfect memory and the price change will persist for all future periods. .e. we define sign(a) = +1 ˆ for a ≥ 0 and sign(a) = −1 for a < 0. nullspace. . by forming y = sign(wT x + v). which defines X. (b) Find w and v for the data in ls_classify_data. say so. For each of N documents we are given a feature vector x(i) ∈ Rn .8. range. if θ is 0. . receiver 2 forms an estimate of the signal y using the linear decoder ˆ y = Gz.. . in the kth interval. this is just special notation for PWA signals that refers to the condition above.) Receiver 1 forms an estimate of the signal x using the linear decoder x = F z. The transmitter broadcasts the signal z = Ax + By ∈ Rn to each receiver. of course. . This means that if each piecewise affine segment were extrapolated to the next index. This means ˆ ˆ that both decoders are perfect. for k = 1. u(N − 1) that results in x(N ) = 0. (The matrices A and B are called the coding matrices. . . respectively. (c) Find minimum norm perfect decoding matrices for the data (i. since ‘linear’ is sometimes used to mean ‘affine’.. each reconstructs the exact desired signal. Thus. Two vector signals x ∈ Rp and y ∈ Rq are to be transmitted to two receivers.) Your answer can involve any of the concepts we’ve seen so far in EE263.) ˆ The goal is to find F and G so that x = x and y = y. . We call αk and βk the slope and offset. .) We can also add a continuity requirement. if zj = αk j + βk . . . . minimize p n 2 Fij i=1 j=1 q n + i=1 j=1 G2 . .22 Piecewise affine fitting. and are known. iK ..) Finally. xinit =  1  1 1 with data   . K. You must give us the (minimum) value of N . . no matter what values x and y take. . (It is very common to refer to such a signal as piecewise-linear. 8. . .21 Interference cancelling equalizers. (The matrices F ∈ Rp×n and G ∈ Rq×n are called the decoding matrices. to the specific problem instance     1 0  1  1     . for ik ≤ j < ik+1 . i2 . it doesn’t make sense to refer to a discrete signal as continuous.e. How would you find perfect decoding matrices that. K − 1. (Of course. N. (a) When is it possible to find perfect decoding matrices F and G? (The conditions. . When a PWA signal satisfies this condition. . iK+1 . B =   0  . .m. i2 − 1. . and a sequence of inputs u(0). k = 1. We say that a signal z is piecewise-affine (PWA). 70 . . . we say that it is continuous. depend on A and B. . . with kink points i1 . In this problem we refer to vectors in RN as signals. . while completely rejecting the other (undesired) signal. . we get to the problem. ij We call such decoding matrices minimum norm perfect decoding matrices. . For this reason we call decoding matrices with this property perfect. the value would agree with the starting value for the next segment.(b) Apply the method described in part (a)  0 0 1 1  0 1 1 1 A=  1 0 1 0 1 1 1 0 (where n = 4 and m = 1).  8. A and B) given in the M-file mn_perf_dec_data. among all perfect decoding matrices. (b) Suppose that A and B satisfy the conditions in part (a). . . i3 − 1. the signal value is an affine function of the index j (which we might interpret as time in this problem) over the (integer) intervals 1. which are integers satisfying i1 = 1 < i2 < · · · < iK+1 = N + 1. . αk ik+1 + βk = αk+1 ik+1 + βk+1 . but this time. for i = 1. A(K) . .e.23 Robust input design. plot the original signal along with the PWA and continuous PWA approximations.) (ˆj − yj )2  y pwa 1/2 . . you should think of the index for y as denoting a time period. If we knew A. . Use the code we provided as a template for your plots. . day to day. and A ∈ Rm×n characterizes the measurement system. You are to design a linear estimator. (a) Give formulas for xln and xmmse . . Our goal is to choose the input x ∈ Rn so that y ≈ y des . is that we don’t know A exactly. iK+1 . (You can assume that A is full rank. ˆ ˆ 71 . and so on. for i = 1. . . A(K) . that x = x when v = 0. it varies a bit.(a) You are given a signal y ∈ RN . K.. and some kink points i1 . A(1) . We require that the estimator be unbiased. We now define y (i) = A(i) x. v ∈ Rm is the vector of measurement noises. A(:. We consider a standard measurement set up. . we could use standard EE263 methods to choose x. for example. (b) Find xln and xmmse for the problem with data given in rob_inp_des_data. The catch here. • Mean-square error minimization method. so that all m measurements are available in the mth time period. we have more degrees of freedom in our choice of input than specifications for the outcome.e. . in the next time period y2 becomes available. We are given a system. . . which might. the components of y) as arriving sequentially. 8. Write down the values of xln and xmmse you found.m.24 Unbiased estimation with deadlines. ˆ  1 N N j=1 (b) Repeat part (a). with A ∈ Rm×n . K ¯ • Least norm method. given by a matrix B ∈ Rn×m . We will consider two different methods to choose x. though. Running this data file will define y and the kink points. Running this M-file will define ydes and the matrices A(i) (given as a 3 dimensional array. with y = Ax + v. Produce and submit scatter plots of y (i) for xln and xmmse . K. but please state them explicitly. in terms of y des and A(1) . Choose xmmse to minimize the mean-square error 1 K K i=1 y (i) − y des 2 .e.m. where y des ∈ Rm is a given target outcome. with approximation error measured in the RMS sense. . . i. Choose xln to be the least-norm solution of des ¯ ¯ Ax = y . Our goal is to choose x so that y (i) ≈ y des . How would one find the best PWA approximation y pwa of y. Using this template. i. In the first time period. for example. . . x ∈ Rn is the vector of parameters to be estimated.13) is A(13) ). which we know follows y = Ax. . Define A = (1/K) i=1 A(i) . and you should imagine the measurements (i. be obtained by measurements of A taken on different days. . with the estimate of x given by x = By. . Also included in the data file (commented out) is code to produce scatter plots of your results. We’ll assume that m ≤ n. . The data file also includes a commented out code template for plotting your results.:. (c) Carry out your methods from parts (a) and (b) on the data given in pwa_data. Give us the RMS approximation error in both cases. 8. . In this problem. You can make any needed rank assumptions about matrices that come up. say. But we do have some possible values of A.. where y ∈ Rm is the vector of measurements. you are to find the continuous PWA approximation y pwac that ˆ minimizes the RMS deviation from y.. y1 becomes available. transportation. . in terms of the size of estimator matrix. i. . (a) How would you determine whether or not an unbiased linear estimator. A matrix F ∈ Rk×n . First we explain the idea of sector exposure. . which respects the given deadlines. it can involve a sequence of tests. but you must state this clearly. 8. Compare the value of J found for your estimator with the value of J for B = A† . exists? Your answer does not have to be a single condition. . 0 < k1 < k2 < · · · < kn = m. given as Rfact = F x ∈ Rk . i. The increase in J can be thought of as the cost of imposing the deadlines. x1 may only be computed from y1 . . . If Ri is large (in magnitude) our portfolio is exposed to risk fact from changes in that sector. We have a list of k economic sectors (such as manufacturing. we require that xi must be a function of y1 . . The (known) data in this problem are µ ∈ Rn . You are given ˆ increasing deadlines. . which we now explain.. . We say that ki ˆ is the deadline for computing xi . We require that xi can be computed ˆ after ki time periods. in dollars) in asset i. i. called the factor loading matrix. We want to choose x so that r is large.25 Portfolio selection with sector neutrality constraints. The parameter λ. (b) Assume that it is possible to find an unbiased linear estimator that respects the deadlines. . is called the risk aversion parameter. We let xi ∈ R denote the investment (say. The data in this problem are the measurement matrix A and the deadlines k1 .e. If Ri = 0. .) We will choose the portfolio weights x so as to maximize r − λRid .e. Explain how to find the smallest such estimator matrix. F ∈ Rk×n . The idiosyncratic risk is given by n Rid = i=1 2 σi x2 .. you can just assume they are. . the xi are sometimes called portfolio weights. ). (You can take the formula above as a definition. we say that the portfolio is neutral with respect to sector i. which is positive. ym . . σn ) ∈ Rn . which is called the risk-adjusted return.In addition. such as ‘A is skinny and full rank’.m. where 1 is the vector with all entries 1. defense. where µ ∈ Rn is a vector of asset (mean) returns. and λ ∈ R. Rfact = 0. yki only. . our estimate of the ith parameter to be estimated.) The portfolio (mean) return is given by r = µT x. σ = (σ1 . the B that minimizes n m 2 Bij . if it is small. . . . while avoiding risk exposure. . Another type of risk exposure is due to fluctations in the returns of the individual assets. subject to neutrality with respect to all sectors.e. yk1 .. . we are less exposed to risk from that sector. which we explain next. We consider the problem of selecting a portfolio composed of n assets. The number Ri is the portfolio risk fact exposure to the ith economic sector. ˆ Thus. with xi < 0 meaning that we hold a short position in asset i. We normalize our total portfolio as 1T x = 1. 72 . i=1 j=1 J= If your method requires some matrix or matrices to be full rank. . relates the fact portfolio x to the factor exposures. . . Of course we also have the normalization constraint 1T x = 1. while xn may be computed from all of the measureˆ ments y1 . i where σi > 0 are the standard deviations of the asset returns. kn . we have deadline constraints. (With normalization. you do not need to understand the statistical interpretation. (c) Carry out the method described in part (b) on the data found in the M-file unbdl_data. . energy. u(1). . . . x(1) = 0.. . The subtlety here is that we are not told what d is at t = 1. . u(t) ∈ Rm is the input. at time t. . u(k) (M + N ). . . and Rid (the idiosyncratic risk). . . there’s no point using the input (for which we are charged) until then. are given. where x(t) ∈ Rn is the state. using methods from the course.m. . K. you must choose the inputs up to time M .26 Minimum energy control with delayed destination knowledge. with dynamics x(t + 1) = Ax(t) + Bu(t). . the destination (which is one of d(1) . i.) Comment briefly on the following statement: “Since we do not know where we are supposed to go until t = M + 1. .(a) Explain how to find x. for k = 1.m. p(k) (M + N + 1). You are welcome (even encouraged) to express your solution in terms of block matrices.” 73 . Report the optimal cost function value. and the initial state x(1). find the optimal portfolio. Verify that 1T x = 1 (or very close) and Rfact = 0 (or very small). . . p(t) = Cx(t). . . We will choose the inputs to minimize the cost function M u(t) t=1 2 1 + K K M +N u(k) (t) 2 . (a) Explain how to find u(1). . . Report the associated values of r (the return). . k=1 t=M +1 which is the sum of squared-norm costs. d(K) (which we are given). averaged over all destinations. Thus. 2. . (The data file contains commented-out code for producing your plots. u(M ) and u(k) (M + 1). u(k) (M + N ). . . in the case when the final destination is d(k) . and p(t) ∈ R2 is the position of the vehicle. and for each possible destination plot the position of the vehicle p(k) (1). . u(2). (b) Using the data given in sector_neutral_portfolio_data. . . . The vehicle must reach a destination d ∈ R2 at time t = M + N + 1. as u(k) (M + 1). . . d(K) ) will be revealed to you. we simply know that it is one of K possible destinations d(1) . . . We consider a vehicle moving in R2 . The matrices A ∈ Rn×n . . (b) Carry out your method on the data given in delayed_dest_data. independent of the actual final destination. At time t = M + 1. . We will denote the choice of these inputs. but you can choose u(M + 1).e. . . . . . u(M ). u(M + N ) depending on the final destination. . . B ∈ Rn×m and C ∈ R2×n . 8. t = 1. formed from the given data. where M > 0 and N > 0. we must have p(M + N + 1) = d. R5 R6 R7 R1 C1 R2 C2 C3 R3 C4 R4 9. it decreases by one with probability 0. Fish that are less than one year old in the current year (t) bear no offspring.2 Tridiagonal systems.e. Express the population dynamics as an autonomous linear system with state x(t). fish. Tridiagonal matrices arise in many applications. We consider a certain population of fish (say) each (yearly) season. • 40% of the fish less than one year old in the current year (t) die. Fish that are between one and two years old in the current year (t) bear an average of 2 offspring each. ˙ (b) Consider a Markov chain with four states labeled 1. A square matrix A is called tridiagonal if Aij = 0 whenever |i−j| > 1. where A ∈ R4×4 is tridiagonal.. • The number of fish less than one year old in the next year (t + 1) is equal to the total number of offspring born during the current year. The state transition probabilities are described as follows: when z is not 4.3 A distributed congestion control scheme. as follows: • x2 (t) denotes the number of fish between one and two years old • x1 (t) denotes the number of fish less than one year old • x3 (t) denotes the number of fish between two and three years (We will ignore the fact that these numbers are integers. the remaining 60% live on to be between one and two years old in the next year (t + 1). Each route 74 .1 A simple population model. in the form x(t + 1) = Ax(t). (If z neither increases nor decreases. but more sophisticated population dynamics models are very useful and widely used. • All of the two-to-three year old fish in the current year die.2. along one or more links in the network. 9. it stays the same. when z is not 1. x(t) ∈ R3 will describe the population of fish at year t ∈ Z. (c) Find the linear dynamical system description for the circuit shown below.3. Let z(k) denote the state at time k. which is a path from a source node.2. (a) Draw a pretty block diagram of x = Ax. it increases by one with probability 0. i. and 70% live on to be two-to-three year old fish in the next year. Give the discrete time linear system equations that govern the evolution of the state distribution. Draw a graph of this Markov chain as in the lecture notes. Use state x = [v1 v2 v3 v4 ]T . and 50% of the old fish.3.e.Lecture 9 – Autonomous linear dynamical systems 9. where vi is the voltage across the capacitor Ci . The routes are determined and known. to the destination node.4. A data network is modeled as a set of l directed links that connect n nodes. z(k + 1) = z(k)). There are p routes in the network.) The population evolves from year t to year t + 1 as follows. Remark: this example is silly.. Fish that are between two and three years old in the current year (t) bear an average of 1 offspring each. i. • 30% of the one-to-two year old fish in the current year die. We define the congestion on link i as Ti (t) − Titarget . Does the rate x always correspond to zero congestion on every ¯ link? Is it optimal in any way? 9. . bits per second). the rates and traffic are nonnegative. Note that this congestion control method is distributed . for i = 1. We use Ti (t) to denote the total traffic on link i at time t. k − 1. We denote the source rate for route j at time t as xj (t). for example.1. . . Each route monitors the congestion for the links along its route. .) The total traffic on a link is the sum of the source rates for the routes that pass through it. p. In this problem we consider a very simple congestion control protocol. The goal in congestion control is to adjust the source rates in such a way that the traffic levels converge to the target levels if possible. from two different initial source rates. What can you say about x? ¯ ¯ Limit yourself to a few sentences. . t = 0.. i = 1. for i = 1. and −1. we only ask you to hand in the plots for two.e. Thus. Hint: use the matrix R. . the linear dynamical system found in part (a)) has a unique equilibrium point x. The congestion is positive if the traffic exceeds the target rate. . . . however. must satisfy several constraints: • Transition constraint: Consecutive symbols cannot be more than one apart from each other: we must have |xi+1 − xi | ≤ 1. . for each of the initial source rates. . (You are welcome to simulate the system from more than two initial source rates. For this communication system. however. or close to the target levels otherwise.4 Sequence transmission with constraints.) Make a brief comment on the results. l. . xk . we do not take into account the link capacities.e. we have x(t + 1) = Ax(t) + b. It then adjusts its source rate proportional to the sum of the congestion along its route. a valid sequence of symbols x1 . a 0 or a 1 can follow a 1. (a) Show that x(t). Each link has a target traffic level. 2. . . . .has a source rate (in. (We assume the system operates in discrete time. the vector of source rates. . each source only needs to know the congestion along its own route. k − 2. Be as explicit as you can about what A and b are. . and that the rate x(t) converges to it as t → ∞.. using algorithm parameter α = 0. . • Power constraint: The sum of the squares of any three consecutive symbols cannot exceed 2: x2 + x2 + x2 ≤ 2. which we denote Titarget . In real congestion control. . and the traffic on each link must be below a maximum allowed level called the link capacity. say. and all target traffic levels equal to one. i i+1 i+2 for i = 1. we define a matrix that may be useful. . (c) Now we come back to the general case (and not just the specific example from part (b)). j = 1. can be expressed as a linear dynamical system with constant input. In this problem. 1. . is defined as Rij = 1 0 route j utilizes link i otherwise. . and allow the source rates and total traffic levels to become negative. and negative if it is below the target rate. . A communication system is based on 3 symbols: 1. we ignore these effects. i.. . 0. x2 . 75 . Plot the traffic level Ti (t) for each link (on the same plot) versus t. . . Before we get to the questions. Try to use the simplest notation you can. . The route-link matrix R ∈ Rl×p . . This can be expressed as: xj (t + 1) = xj (t) − α (sum of congestion along route j) . but a −1 cannot follow a 1. Assume the congestion control update (i. . and does not directly coordinate its adjustments with the other routes. (b) Simulate the congestion control algorithm for the network shown in figure 1. l. i = 1. where α is a positive scalar that determines how aggressively the source rates react to congestion. . l. • Average constraint: The sum of any three consecutive symbols must not exceed one in absolute value: |xi + xi+1 + xi+2 | ≤ 1. How many different (valid) sequences of length 20 are there? q1 k2 m2 k1 m1 where the state is given by Here qi give the displacements of the masses. route 3 is (3. with links shown darker. 2. and ki are the spring 1 0 0 0  0 1  x 0  0 76 1 0 11 00 1 0 1 0 11 00 1 0 9. . 3). mi are the stiffnesses. Route 1 is (1. route 2 is (1. a sequence that contains 1100 would not be valid. 2. 5. respectively. . and route 4 is (4. 4. .5 Consider the mechanical system shown below: q2 . 5). k − 2. for i = 1.  q1  ˙ q2 ˙  values of the masses. 3. because the sum of the first three consecutive symbols is 2. So. where routes are defined as sequences of links. 1). The dynamics of this system are  0 0  0 0 x =  k1 +k2 ˙  k2 − m1 m1 k k2 − m22 m2  q1  q  x =  2 .4 1 3 2 1 5 4 2 3 Figure 1: Data network for part (b). for example. 4). All traffic and routes flow counterclockwise (although this doesn’t matter). . but the standard methods for computing x = A−1 b (e. the parameter values are m1 = m2 = 1. α2 ∈ R that make the specification hold). after t iterations.) ˆ ˆ This iteration uses only ‘cheap’ calculations: multiplication by A and A−1 . Determine whether each of these specifications is feasible or not (i. to satisfying the specification. ˆ r(t) = Aˆ(t) − b. (For example. has relatively small off-diagonal elements).) This problem concerns selection of the impulsive forces α1 and α2 . find the particular α1 . of the true solution x = A−1 b. Consider the following specifications: (a) q2 (10) = 2 (b) q1 (10) = 1. A common approach is to use an iterative method. 9. We assume that Az can be computed at reasonable cost.. k1 = k2 = 1. There are many. k1 = k2 = 1)..e. for t = 0.. Note that r(t) is the residual after the tth iteration. Either show that this is true. The unit ˙ square in R2 is defined by S = { x | − 1 ≤ x1 ≤ 1.) Be sure to be very clear about which alternative holds for each specification. if you cannot find α1 . whether there exist α1 . q1 (10) = 0. presumably. which computes a sequence x(1). These methods rely on another matrix ˆ ˆ ˆ A.e. ˙ 9. (a) Find the exact conditions on A for which the unit square S is invariant under x = Ax. (b) Consider the following statement: if the eigenvalues of A are real and negative. α2 that satisfy q1 (10) = 1. the matrix A might be the diagonal part of the matrix A (which.(i. m2 = 1.g. in a least-squares sense. each mass starts with zero position and a velocity determined by the impulsive forces. If the specification is feasible. − 1 ≤ x2 ≤ 1 }. 77 . that converges to the solution x = A−1 b. you are able to apply a strong impulsive force αi to mass i.. Give the ˙ conditions as explicitly as possible. where A is nonsingular (square) and x is very large (e.3. find the particular α1 . .6 Invariance of the unit square. . is to set x(0) equal to some approximation of x (e. . In many applications we need to solve a set of linear equations Ax = b. . α2 /m2 (The hat reminds us that x(t) is an approximation. which is supposed to be ‘close’ to A. LU decomposition) are not feasible. q2 (10) = 0 ˙ ˙ (d) q2 (10) = 2 when the parameters have the values used above (i. . it’s just scaling the entries of z. many other examples. As a simple example. Obviously computing ˆ A−1 z is fast. If the specification is infeasible.g. then find αi that minimize (q1 (10) − 1)2 + (q2 (10) − 2)2 .e. q2 (10) = 2. More importantly.. or give an explicit counterexample. q2 (10) = 2 when the parameters have the values m1 = 1. 1. q2 (10) = 2 (c) q1 (10) = 1. For parts a–c below. A simple iterative method. x ∈ R100000 ). x(2) . Consider the linear dynamical system x = Ax with A ∈ R2×2 . A has the property that A−1 z is easily or ˆ cheaply computed for any given z.7 Iterative solution of linear equations. k1 = k2 = 1. then S is invariant under x = Ax. sometimes called relaxation. ˆ ˆ x(0) = A−1 b) and repeat. which results in initial condition   0   0  x(0) =   α1 /m1  . for any z. and also. α2 that come closest. ˆ ˆ Immediately before t = 0. x ˆ x(t + 1) = x(t) − A−1 r(t). α2 that satisfy 2 2 the specification and minimize α1 + α2 . m1 = m2 = 1. q2 (10) = 2. ..g. and the SINRs did not appear to converge. ) Please note: • Your conditions should be as explicit as possible. 2. say. no matter what the initial power was. Consider the linear dynamical system x = A(t)x ˙ where A1 2k ≤ t < 2k + 1. 1. they converge to values for which each SINR exceeds γ. or some combination. i. t→∞ (Note that this holds when x converges to zero .8. then x(t) − x ≤ β t+1 x .. In other words. and simulated it for a specific set of parameters. Your condition should be as explicit as possible. You are going to analyze this. α. α. 78 . Let us consider fixed values of G. Give the simplest expression you can.) ˆ ˆ (b) Find the exact conditions on A and A such that the method works for any starting approximation x(0) and any b. We want you to give us the most general conditions under which the property holds. the powers appeared to diverge. with several values of initial power levels. and doesn’t work for γ > γcrit . Thus if β < 1. k = 0. What are the conditions on A1 and A2 under which the system has a nonzero periodic trajectory.ˆ ˆ ˆ (a) Let β = A−1 (A − A) (which is a measure of how close A and A are). we have ˙ lim x(t + 2) − x(t) = 0. then convergence is pretty fast. A(t + 2) = A(t) for all t ≥ 0. with x(t + 2) = x(t) and x = A(t)x. etc. (And if β < 0. 1. and the two SINR threshold levels γ = 3 and γ = 5. you expressed the power control method as a discrete-time linear dynamical system. ˙ (b) All trajectories are asymptotically periodic. now that you know alot more about linear systems. with period 2? By this we mean: there exists x : R+ → Rn . 9. and σ.8) • Your condition doesn’t guarantee convergence. for example. Of course you must explain how you came up with your expression. What are the conditions on A1 and A2 under which all trajectories of the system are asymptotically 2-periodic? By this we mean: for every x : R+ → Rn with x = A(t)x. α. It turns out that the power control algorithm works provided the SINR threshold γ is less than some critical value γcrit (which might depend on G.e. etc. You can refer to the matrices A1 and A2 .9 Analysis of a power control algorithm. and σ.1 Please refer to this problem for the setup and background. • We do not want you to give us a condition under which the property described holds. the powers converged to values for which each SINR exceeded γ. and two target SINRs. A(t) = A2 2k + 1 ≤ t < 2k + 2. (For example: β = A−1 (A − ˆ A) < 0. σ. σ). You found that for the target SINR value γ = 3.) Find an expression for γcrit in terms of G ∈ Rn×n . 2. A(t) switches between the two values A1 ∈ Rn×n and A2 ∈ Rn×n every second. whereas for the larger target SINR value γ = 5. The matrix A(t) is periodic with period 2. . . (b) Critical SINR threshold level. (‘Works’ means that no matter what the initial powers are. .. Show that if we choose ˆ−1 b. (a) Explain the simulations. for any x(0) = A ˆ ˆ b we have x(t) → x as t → ∞. singular values. i. .e. 9. or any matrices derived from them using standard matrix operations. singular values and singular vectors. their eigenvalues and eigenvectors or Jordan forms. In this problem we consider again the power control method described in homework problem 2. . k = 0. α. and eigenˆ ˆ values of A and A. Explain your simulation results from the problem 1(b) for the given values of G. Try to avoid the following two errors: ˆ • Your condition guarantees convergence but is too restrictive. In that problem. . the iterative method works. x not identically zero. Your condition can involve norms. . it should not include any limits. . (a) Existence of a periodic trajectory. condition number.8 Periodic solution of periodic linear dynamical system. .. a ∈ R .m. i = 2. We consider a thermal system described by an n-element finite-element model.) The parameter bi gives the thermal conductance between element i and element i + 1. we have x(t) → xe as t → ∞. so ai Ti is the heat flow from element i to the environment (i. . (This means that for any x(0). controllability.9. You must provide the source code for these calculations. . √ To √ formalize these requirements.12 Optimal choice of initial temperature profile. . e. so bi (Ti − Ti+1 ) is the heat flow from element i to element i + 1. T (0) ∈ Rn . with the temperature of element i at time t denoted Ti (t). (e) Show that if all eigenvalues of A have negative real part. and the second is the RMS temperature deviation from ambient at t = 0. and n ˙ ci Ti = −ai Ti − bi (Ti − Ti+1 ) − bi−1 (Ti − Ti−1 ). for any trajectory x.g.e. A4 }. least-squares. • The numerical calculations that verify the conditions hold for the given data. 79 ˙ We can interpret this model as follows. Your proof will consist of two parts: • An explanation of how you are going to show that any trajectory converges to zero. We consider a discrete-time linear dynamical system x(t + 1) = A(t)x(t). at t = tdes . The elements are arranged in a line. the direct heat loss from element i.10 Stability of a time-varying system. where A(t) ∈ {A1 . and for any trajectory x(t). tdes ∈ R is a specific time when we want the temperature profile to closely match T des ∈ Rn . We also wish to satisfy a constraint that T (0) should be not be too large. A4 . . then there is exactly one equilibrium point xe . bi−1 (Ti − Ti−1 ) is the heat flow from element i to element i − 1. The parameter ai gives the thermal conductance between element i and the environment. . so ci Ti is the net heat flow into element i. .11 Linear dynamical system with constant input.) ˙ (a) When is there an equilibrium point? (b) When are there multiple equilibrium points? (c) When is there a unique equilibrium point? (d) Now suppose that xe is an equilibrium point. singular values. Temperature is measured in degrees Celsius above ambient. A3 . negative Ti (t) corresponds to a temperature below ambient.. T max is the (given) maximum inital RMS temperature value. A(2). Finally.. The parameter ci is the heat capacity of element i. Show that this system is stable. n ˙ cn Tn = −an Tn − bn−1 (Tn − Tn−1 ). (This means that the constant trajectory x(t) = xe is a solution of x = Ax + b.) You may use any methods or concepts used in the class. From ˙ this. Here. Your argument of course will require certain conditions (that you will find) to hold for the given data A1 . . We consider the system x = Ax + b. . we use the objective (1/ n) T (tdes ) − T des and the constraint (1/ n) T (0) ≤ T max . from the desired value. and for any sequence A(0). exp(tA). A2 . . and so on. The goal of this problem is to choose the initial temperature profile. The first expression is the RMS temperature deviation. A(1). The dynamics of the system are described by ˙ c1 T1 = −a1 T1 − b1 (T1 − T2 ). we have x(t) → 0 as t → ∞. Show that z = Az. These 4 matrices. we have x(t) → 0 as t → ∞. Define z(t) = x(t) − xe . eigenvalues. and b ∈ Rn−1 are given and are all positive. 9.e. so that T (tdes ) ≈ T des . x(0)). which are 4 × 4. and show the results as well. n − 1. . with x(t) ∈ Rn . 9. A ˙ vector xe is an equilibrium point if 0 = Axe + b. . are given in tv_data. . i. give a general formula for x(t) (involving xe . where c ∈ R . (b) Solve the problem instance with the values of n. on one graph. your T (0). T des and T max defined in the file temp_prof_data. a. Give √ RMS temperature error (1/ n) T (tdes )− the T des . √ Plot. and the RMS value of initial temperature (1/ n) T (0) . c. b. T (tdes ) and T des .m.(a) Explain how to find T (0) that minimizes the objective while satisfying the constraint. 80 . tdes . since we are given conditions on the state at two time points instead of the usual single initial point.e. i. Consider the system described by x = Ax. . .. 10.7 1.2 Harmonic oscillator. i. . The system x = ˙ 0 −ω ω 0 x is called a harmonic oscillator. x(t) is constant. (b) Carefully show that d At dt e = AeAt = eAt A. some people refer to the system z = σz + Az as a damped version of x = Ax. AB = BA. . 1. (b) Show that det eA = eTr A . . λn . (This is called a two-point boundary value problem. (b) Suppose x1 (0) = 1 and x2 (1) = 2. Use only the differential equation.e. .4 Two-point boundary value problem. (a) Find the eigenvalues.5 x.1 Suppose x = Ax and z = σz + Az = (A + σI)z where σ ∈ R. and state transition matrix for the harmonic oscillator.5 −0. A leaky integrator satisfies y − σy = u. Find x(2). eλn . you replace every integrator in the original system with a leaky integrator. although it is true in the general case. and Tr Y is the sum of the eigenvalues of Y . .3 Properties of the matrix exponential. 10. to ˙ get the damped system. Verify this fact using the solution from part (a). Hint: det X is the product of the eigenvalues of X. You can assume that A is diagonalizable. Express x(t) in terms of x(0). When σ < 0. . .) 10.4 0. How are z(t) and x(t) ˙ ˙ related? Find the simplest possible expression for z(t) in terms of x(t). where A = ˙ (a) Find eA . 10. resolvent. .Lecture 10 – Solution via Laplace transform and matrix exponential 10. In this problem we consider the specific system x = Ax = ˙ 0. 10. (a) Suppose the eigenvalues of A ∈ Rn×n are λ1 . y2 (t) = sgn(x2 (t)). We have a detector or sensor that gives us the sign of each component of the state x = [x1 x2 ]T each second: y1 (t) = sgn(x1 (t)). Show that the eigenvalues of eA are eλ1 . 2. . Another way to ˙ ˙ think of the damped system is in terms of leaky integrators. and x(0) = z(0). do not use the explicit solution you found in part (a). (b) Sketch the vector field of the harmonic oscillator.6 Linear system with a quadrant detector. Verify that this holds for any trajectory of the harmonic oscillator. (a) Show that eA+B = eA eB if A and B commute. where the function sgn : R → R is defined by sgn(a) =   1 a>0 0 a=0  −1 a < 0 81 . −1 1 −1 1 .. (d) You may remember that circular motion (in a plane) is characterized by the velocity vector being orthogonal to the position vector. Justify your answer. t = 0.5 Determinant of matrix exponential. (c) The state trajectories describe circular orbits. We consider the system x = Ax. you must completely justify and explain your answer. then the eigenvalues of A−1 are 1/λ1 . Of course you must fully explain how you arrive at your conclusions. x(0) is in quadrant IV. what values could y(2) possibly take on? In terms of the quadrants. y(1. . for example: “y(0. −0. and y(3. and y(1. the output of this autonomous linear system is quantized to one-bit precision. (c) If the eigenvalues of A are λ1 .7).3) = −1. ˙ where A= and the sign function is defined as   +1 if a > 0 −1 if a < 0 sign(a) =  0 if a = 0 y(1.8) = +1. and y(3. −1. 82 . .4) = +1.) 10. The following outputs are observed: y(0. but y(3.8). . which means that its leading coefficient is one: X (s) = sn + · · ·.2) = −1. we also have y(0.8) is definitely −1. or 1)”. Consider the characteristic polynomial X (s) = det(sI − A) of the matrix A ∈ Rn×n .1 . . and x(1) is also in quadrant IV.7 Linear system with one-bit quantized output.3) = −1.7) is definitely +1” is: for any trajectory of the system for which y(0.e. .8) = +1 What can you say (if anything) about the following: y(0. −1 Based on these measurements.2) = −1.1 −1 y(t) = sign (cx(t)) 1 0. Finally. (d) The eigenvalues of A and T −1 AT are the same. Or. c= 1 −1 .7) can be anything (i. The question is: which quadrant(s) can x(2) possibly be in? You do not know the initial state x(0). You observe the sensor measurements y(0) = 1 .7) is definitely +1. y(1. (b) A is invertible if and only if A does not have a zero eigenvalue. det A−1 = 1/ det A. and.8 Some basic properties of eigenvalues. 0. y(2. the problem. (a) Show that X is monic.7) = +1. λn and A is invertible.7)? Your response might be. y(3. . Rougly speaking. Show the following: (a) The eigenvalues of A and AT are the same.There are several ways to think of these sensor measurements. y(2. .9 Characteristic polynomial. 10. det(AB) = det A det B. 1/λn . you can think of y(t) as a one-bit quantized measurement of the state at time t. 10. Hint: you’ll need to use the facts that det A = det(AT ). −1 y(1) = 1 . Of course. .4) = +1. if A is invertible. the problem can be stated as follows. You can think of y(t) = [y1 (t) y2 (t)]T as determining which quadrant the state is in at time t (thus the name quadrant detector ).. (What we mean by “y(0. 15 ± j7. Suppose A ∈ Rn×n has n linearly independent eigenvectors p1 . s − λk Note that this is a partial fraction expansion of (sI − A)−1 .12 Using matlab to find an invariant plane.e. .0939 2.) (d) Let λ1 .1387 1. λ3. 10. (c) Plot the individual states constituting the trajectory x(t) of the system starting from an initial point in the invariant plane. n. . (a) Find an orthonormal basis (q1 . say x(0) = q1 . . pT pi = 1.0428 4. Consider the continuous-time system x = Ax with A given ˙ by   −0. so that (c) Show that the constant coefficient of X is given by det(−A). The adjoint system associated with the linear dynamical system x = Ax is ˙ z = AT z..9709 −0. .0481   A=  −2.9859 −3.1164 You can verify that the eigenvalues of A are λ1. (e) Find the residue matrices for A= 1 0 1 −2 both ways described above (i. T (a) Let Rk = pk qk . X (s) = sn + an−1 sn−1 + · · · + a1 s + a0 = (s − λ1 )(s − λ2 ) · · · (s − λn ). What is Ri ? (c) Show that (sI − A)−1 = n k=1 Rk . 10. pn . i T Let qi be the ith row of Q. .8847 −0. for 0 ≤ t ≤ 40. Evidently the adjoint system and the system have the same eigenvalues.1005 1. q2 ) for the invariant plane associated with λ1 and λ2 .4 = −0.0510 −5. . Let P = [p1 · · · pn ] and Q = P −1 . By equating coefficients show that an−1 = − n i=1 λi and a0 = n i=1 (−λi ).10 The adjoint system. i = 1.4575 3. 83 . . λn denote the eigenvalues of A. (d) Show that R1 + · · · + Rn = I. (Tr X is the trace of a matrix: Tr X = n i=1 Xii .10 ± j5. . q4 ∈ R4 such that Q = [q1 q2 q3 q4 ] is orthogonal. . ˙ (a) How are the state-transition matrices of the system and the adjoint system related? (b) Show that z(0)T x(t) = z(t)T x(0).9229  −4. .2 = −0. . What is the range of Rk ? What is the rank of Rk ? Can you describe the null space of Rk ? 2 (b) Show that Ri Rj = 0 for i = j. .1444 5.0753 −1.4599  −1. it computes an orthonormal basis of the null space of a matrix.(b) Show that the sn−1 coefficient of X is given by − Tr A. (b) Find q3 . with associated eigenvalues λi . For this reason the Ri ’s are called the residue matrices of A.0880 −0. and then do a partial fraction expansion of (sI − A)−1 to find the R’s). You might find the matlab command null useful.11 Spectral resolution of the identity. find P and Q and then calculate the R’s. 10. For this reason the residue matrices are said to constitute a resolution of the identity. 9709 −0. Thus. . 84 .e.4575 3. . then the state remains indefinitely in the positive quadrant.1164 (a) What are the eigenvalues of A? Is the system stable? You can use the command eig in matlab.13 Positive quadrant invariance.15 Volume preserving flows. i. (c) Find the matrix Z such that Zx(t) gives x(t + 15). (d) Find the matrix Y such that Y x(t) gives x(t − 20). we have x(t) → 0 as t → ∞ for any x(0). (b) Plot a few trajectories of x(t). Thus. .14 Some matlab exercises. (b) True or False: if x = Ax is PQI. Verify that the qualitative behavior of the system is consistent with the eigenvalues you found in part (14a). We say the system is positive quadrant invariant (PQI) if whenever x1 (T ) ≥ 0 and x2 (T ) ≥ 0. 1. we have x1 (t) ≥ 0 and x2 (t) ≥ 0 for all t ≥ T . 10. 10. S(t) is the image of the set S under the linear transformation etA . Thus Y reconstructs what the state was 20 seconds ago. What are the conditions on A so that the flow preserves volume. n = 0.0880 −0.e. n = 0.1387 1.. Z is the ‘15 seconds forward predictor matrix’. .e..0428 4. . We say that this (timevarying) linear dynamical system is stable if every trajectory converges to zero. (f) Find x(0) such that x(10) = [1 1 1 1]T . We ˙ can propagate S along the ‘flow’ induced by the linear dynamical system by considering S(t) = eAt S = { eAt s | s ∈ S }. A(t) switches between the two values A1 and A2 every second.0481  . Note: The A matrix is available on the class web site in the file inv plane matrix. . x2 (t).. To do this you can use the matrix exponential command in matlab expm (not exp which gives the element-by-element exponential of a matrix). if the state starts inside (or enters) the positive (i. i. x3 (t) and x4 (t). vol S(t) = vol S for all t? Can the flow x = Ax be stable? ˙ Hint: if F ∈ Rn×n then vol(F S) = | det F | vol S. 2. Your conditions should be as explicit as possible.9229  −4.(d) If x(t) is in the invariant plane what can you say about the components of the vector QT x(t)? (e) Using the result of part (12d) verify that the trajectory you found in part (12c) is in the invariant plane. x1 (t). i. 10. Try to express the ˙ conditions in the simplest form.4599  −1.1005 1.1444 5.16 Stability of a periodic system. Consider the linear dynamical system x = A(t)x where ˙ A(t) = A1 A2 2n ≤ t < 2n + 1.e. then the eigenvalues of A are real.m and is equal to   −0. (e) Briefly comment on the size of the elements of the matrices Y and Z. A=  −2..m.0939 2. for a few initial conditions. In other words. Consider the continuous-time system x = Ax where A can be found in ˙ sys_dynamics_matA. 1. 2.0753 −1. Find the conditions on A1 and A2 under which the periodic system is stable. Suppose we have a set S ⊆ Rn and a linear dynamical system x = Ax. (a) Find the precise conditions on A under which the system x = Ax is PQI. We consider a system x = Ax with x(t) ∈ R2 (although the results ˙ of this problem can be generalized to systems of higher dimension).9859 −3. 2n + 1 ≤ t < 2n + 2. In other words. where F S = { F s | s ∈ S }. first) quadrant. ˙ 10.8847 −0.0510 −5. The matrix exponential is defined by the series +∞ eA = I + k=1 1 k A . . write x(t) in terms of the residue matrices Ri and associated eigenvalues λi . Run your simulation from t = 0 to t = 2. time-invariant. 0 −1 Write the solution x(t) for this dynamics matrix.10. plot the first entry. ˙ then the Laplace transform of x(t) is given by X(s) = (sI − A) −1 x(0) = x0 .5. you’ll obtain the sequence resulting from the discrete-time LDS y(k + 1) = (I + hA)y(k). For the number of steps N .. . For each of the four runs. use the values 10. the value of the first entry of x(t) at t = 2. (d) Error in Euler approximation. 100. What is the final error ǫ = z1 (4) − x1 (2)? Note: The matlab function expm uses a much more efficient algorithm to compute the matrix exponential. We have seen in class that if x(t) is the solution to the continuous-time.17 Computing trajectories of a continuous-time LDS. x0 . (The residue matrices are defined in the previous problem. . we can obtain x(t) from the inverse Laplace transform of the resolvent of A: x(t) = L−1 (sI − A) −1 x0 . with the initial condition x0 = [ 2 − 1 ] . . 1000. For each run. . Hence. . . given by ǫ = y1 (N ) − x1 (2). . i = 1. (c) Forward Euler approximation.e. go ahead and compute the corresponding final error ǫ. compute the final error in x1 . k! Compute 4 iterates of the discrete-time LDS z(k + 1) = Bz(k). linear dynamical system x = Ax. . . Plot ǫ as a function of N on a logarithmic scale (hint: use the matlab function loglog). k = 0. expm requires about the same computational effort as is needed to add the first ten terms of the series. i. Compute x1 (2). . with N steps. . with different step-sizes h. y1 (k). n. k! With A as above and h = 0. with z(0) = x0 .) (b) Consider once again the matrix 1 3 A= . (If you’re curious. compute an approximation to the trajectory x(t) by Euler approximation. N − 1 T with y(0) = x0 . (a) Assuming that A ∈ Rn×n has n independent eigenvectors. compute an approximation of the matrix exponential of hA by adding the first ten term of the series: 10 B=I+ k=1 1 (hA)k . For this example.) 85 . of each of the four sequences you obtain (with hk on the horizontal axis). On the same graph. Add z1 (k) to the plot of the y1 (k). How many steps do you estimate you would you need to achieve a precision of 10−6 ? (e) Matrix exponential. k = 0. With this same A and x0 . 3. but the result is much more accurate. and 10000 (with the corresponding step-size h = 2/N ). and the goal is to drive the vehicles to this configuration. . We let v1 . . n− 1.19 Output response envelope for linear system with uncertain initial condition. In the second. We consider the autonomous linear dynamical system x = Ax. the vehicles start out with vehicle 1 in the leftmost position. and y. . i. . Two one-second experiments are performed.m. . with unit spacing between adjacent vehicles. We define the spacing between vehicle i and i + 1 as si (t) = yi+1 (t) − yi (t). .e. and the resulting output. . On the same axes. .. n. • Right looking control is based on the spacing to the vehicle to the right. i. . . . over all possible initial conditions. .20 Alignment of a fleet of vehicles. 86 i = 1. these spacings are all one. we define the minimum output or lower output envelope as y(t) = min{y(t) | x(0) − x0 ≤ r}.) In a similar way. (Here you can choose a different initial condition for each t.e. over all possible initial conditions. (d) More generally. x(0) = [1 2]T and x(1) = [5 − 2]T ... ˙ vi = u i − vi . by first computing the matrix exponential.10. . to align the vehicles.. plot ynom .e. and so on. We define the maximum output or upper output envelope as y(t) = max{y(t) | x(0) − x0 ≤ r}. (b) Carry out your method on the problem data in uie_data. you are not required to find a single initial condition. vn denote the velocities of the vehicles. given x(0) = [3 − 1]T . the nominal output. . we only know that it lies in a ball of radius r centered at the point x0 : x(0) − x0 ≤ r. i. over the range 0 ≤ t ≤ 10. . y. and all stationary. 10. . We call this configuration aligned. given the problem data A. which move along a line with (scalar) positions y1 . We call x0 the nominal initial condition. versus t. . We use the control law ui (t) = si (t) − 1. vi = 0. What conditions must be satisfied for your procedure to work? 10. n − 1.5) or explain why you cannot (x(0) = [3 − 1]T ). and r. first vehicle at position 1. . We consider a fleet of vehicles. (When the vehicles are aligned. n.e. the maximum possible value of the output at time t. The goal is for the vehicles to converge to the configuration yi = i. y(t) = Cx(t). . . . C. i.e. yn . . . x0 . the minimum possible value of the output at time t. In the first.18 Suppose x = Ax with A ∈ Rn×n . and u1 . . .. un the net forces applied to the vehicles. . i. The vehicle motions are governed by the equations y i = vi . . (a) Explain how to find y(t) and y(t). where x(t) ∈ Rn and y(t) ∈ R. with vehicle n in the rightmost position. (a) Find x(1) and x(2). ˙ (c) Either find x(1. (Here we take each vehicle mass to be one. for i = 1. describe a procedure for finding A using experiments ˙ with different initial values. for x = Ax with A ∈ Rn×n . followed by vehicle 2 to its right. .) We assume that y1 (0) < · · · < yn (0). We do not ˙ know the initial condition exactly. labeled 1.) We will investigate three control schemes for aligning the fleet of vehicles. (b) Find A. and include a damping term in the equations. ynom (t) = CetA x0 . i = 1. . x(0) = [1 1]T ˙ and x(1) = [4 − 2]T . . i. • Left and right looking control adjusts the input force based on the spacing errors to the vehicle to the left and the vehicle to the right: ui (t) = si (t) − 1 si−1 (t) − 1 − . (Give the time. say so. Among the schemes that do work.7. . Be sure to give us times of collisions or closest approach with an absolute precision of at least 0. uses a right looking control scheme. un (t) = −(yn (t) − n). This scheme requires all vehicles to have absolute position sensors.7) = 0 means that vehicles 3 and 4 collide at t = 5. s3 (5. say so. 7). say so. . which has no vehicle to its left. . . 87 . the minimum spacing that occurs. In the questions below..) In this part of the problem you can ignore the issue of vehicle collisions. In other words.. determine if a collision occurs. i. the others only need a measurement of the distance to their righthand neighbor. we consider the specific case with n = 5 vehicles. n − 1. spacings that pass through zero.e. . 0).e. . find the earliest collision. The rightmost vehicle uses the control law un (t) = −(yn (t) − n). This scheme requires vehicle n to have an absolute position sensor. and the minimum spacing. the vehicles involved.e.e. n − 1. If the vehicles do not collide. which one gives the fastest asymptotic convergence to alignment? (If there is a tie between two or three schemes. . two pairs of vehicles collide at the same time. two or more pairs of vehicles have the same distance of closest approach. i. i.e. . . i = 1. which applies a force proportional to its position error. 5. 0.e. . . 0. which occur when any spacing between vehicles is equal to zero.) If there is a tie..7. 0. (b) Collisions. giving the time and the vehicles involved.. we apply a force on vehicle i proportional to its spacing error with respect to the vehicle to the right (i. find the point of closest approach. v = (0. between any pair of vehicles. vehicle i + 1).for vehicles i = 1. but the other vehicles only need to measure the distance to their neighbors..1. 2. which corresponds to the vehicles with zero initial velocity. This control law has the advantage that only the rightmost vehicle needs an absolute measurement sensor. n. . but not in the aligned positions. For each of the three schemes above (whether or not they work). u1 (t) = s1 (t) − 1. no matter what the initial positions and velocities are. 3. and the first vehicle. ‘Vehicles 3 and 4 collide at t = 7. 2 2 i = 2. (For example. The rightmost vehicle uses the same absolute error method as in right looking control.) We take the particular starting configuration y = (0. (For example. • Independent alignment is based on each vehicle independently adjusting its position with respect to its required position: ui (t) = −(yi (t) − i). (a) Which of the three schemes work? By ‘work’ we mean that the vehicles converge to the alignment configuration. If a collision does occur.’) If there is a tie. In this problem we analyze vehicle collisions. for t ≥ 0. i. in the opposite direction. The problem data (i.  T = 10.22 Optimal initial conditions for a bioreactor. (b) Carry out your method on the specific instance with data    3. with initial temperature 95◦ C. Component i has (positive) value (or cost) ci . B = 1. for any x(0) with nonnegative components.0 0 0 0. (The vector 20 · 1.3 0. Show that the solution of x(t) = a(t)x(t).  .1 0  .10. c =  1. You can assume that A is such that. At time t = 0 boiling water. ˙ where x(t) ∈ Rn is the state. that maximizes the total value at time T . where x(t) ∈ R. All of these are in degrees C. x(0)i = B/(nci )). with all entries nonnegative. instantaneously). We now give the thermal model used. i.1 2.) (a) Explain how to solve this problem. things you know) are A.3 0  0.6  0 0. The vector x(t) ∈ Rn+1 gives the temperature distribution at time t: x1 (t) is the liquid (water or espresso) temperature at time t. (You can just differentiate this expression. with xi (t) representing the total mass of species or component i at time t.23 Optimal espresso cup pre-heating.1  Give the optimal x(0). T . The dynamics of a bioreactor are given by x(t) = Ax(t). and B.  . represents the ambient temperature.21 Scalar time-varying linear dynamical system. (We ignore any extra cost that would be incurred in separating the components. xn+1 (t) are the temperatures of the elements in the cup.e. The dynamics are d (x(t) − 20 · 1) = A(x(t) − 20 · 1). for any t ≥ 0. under a budget constraint.e.2 0.1 0. . with t in seconds. is poured into an espresso cup. and espresso. Give us the terminal value obtained when the initial state has equal mass in each component. with n > 1.5 0. at 100◦ C. . Also give us the terminal value obtained when the same amount.4 0. after P seconds (the ‘pre-heating time’). you are to choose x(0). Compare this with the optimal terminal value. if and only if the off-diagonal entries of A are nonnegative. 10. dt where A ∈ R(n+1)×(n+1) .  .. x(0) = α1.1 0.) Find a specific ˙ example showing that the analogous formula does not hold when x(t) ∈ Rn . where B is a given positive budget. is poured in. We take the temperature of the liquid in the cup (water or espresso) as one state. is spent on each initial state component (i. .1 0. with α adjusted so that the total initial cost is B. c. Compare this with the optimal terminal value. 20 88 . x(t) will also have all components nonnegative. B/n. .. (You can assume this operation occurs instantaneously. More specifically. ˙ is given by t x(t) = exp 0 a(τ ) dτ x(0).3    A=  0.2 0.) The initial temperature distribution is   100  20    x(0) =  .) The espresso is then consumed exactly 15 seconds later (yes. The problem is to choose the pre-heating time P so as to maximize the temperature of the espresso when it is consumed.e. that satisfies cT x(0) ≤ B. so the total value (or cost) of the components at time t is cT x(t).. and the associated (optimal) terminal value cT x(T ). by the way. and show that it satisfies x(t) = a(t)x(t). the water is poured out.) Your job is to choose the initial state.  . 10. for the cup we use an n-state finite element model. (This occurs. and x2 (t). with all components 20. Note that the dynamics of the system are the same before and after pre-heating (because we assume that water and espresso behave in the same way. In addition to A. You will find it in espressodata. thermally speaking).m. to 95. Te (95).3◦ C. which gives an espresso temperature at consumption of 62. submit your code. of course.) 89 . and give final answers. Explain your method. respectively. Give both to an accuracy of one decimal place. We have very generously derived the matrix A for you. the liquid temperature changes instantly from whatever value it has. espresso and preheat water temperatures Ta (which is 20). the other states do not change.5 s. the file also defines n. which must include the optimal value of P and the resulting optimal espresso temperature when it is consumed. the ambient.’ (This is not the correct answer.At t = P . and Tl (100). as in ‘P = 23. and. .Lecture 11 – Eigenvectors and diagonalization 11. The set { z | α ≤ wT z ≤ β } is referred to as a slab.3 Another formula for the matrix exponential. 1. Suppose w is a left eigenvector of A ∈ Rn×n with real negative eigenvalue λ. each node determines its relative offset with each of its neighboring nodes. (b) Let α < β. . node 2 does not know any of the absolute clock offsets x1 . ˙ 11. Thus we have xi (t + 1) = xi (t) + ai (t).g. 11. n. n. The new offset takes effect at the next interval. Then it computes 2 90 . Draw a picture in R2 . Briefly explain this terminology. . . (a) Find a simple expression for wT eAt . which are labeled 1. .2 Consider the linear dynamical system x = Ax where A ∈ Rn×n is diagonalizable with eigenvalues ˙ λi . eigenvectors vi .) While node i does not know its absolute offset xi .. The shift or offset of clock i. but are not (initially) synchronized. You will establish the matrix analog: for any A ∈ Rn×n . x2 . while xi < 0 means the ith clock is running behind the standard clock. and left eigenvectors wi for i = 1. For example. Through this exchange each node is able to find out the relative time offset of its own clock compared to the clocks of its neighboring nodes. . it is able to adjust it by adding a delay or advance to it. (But remember. . the nodes exchange communications messages. ea = limk→∞ (1 + a/k)k . Describe the trajectories qualitatively. eA = lim (I + A/k)k . . 1 4 3 5 6 Each node has a clock. . Assume that λ1 > 0 and ℜλi < 0 for i = 2. with respect to some absolute clock (e. We refer to one node as a neighbor of another if they are connected by a link. .. you can assume A is diagonalizable. with communication links shown as lines between the nodes. Thus xi > 0 means the clock at node i is running in advance of the standard clock. . 2 .1 Left eigenvector properties.4 Synchronizing a communication network. in terms of x(0). NIST’s atomic clocks or the clock for the GPS system) will be denoted xi . which we denote t = 0. At each interval. where ai (t) is the adjustment made by the ith node to its clock in the tth interval. . The graph below shows a communication network. 6. At discrete intervals. You might remember that for any complex number a ∈ C. we introduce the numbers xi only so we can analyze the system. what happens to x(t) as t → ∞? Give the answer geometrically. or x5 . Specifically. Hint: diagonalize. node 2 is able to find out the differences x1 − x2 and x5 − x2 . The clocks run at the same speed. An engineer suggests the following scheme of adjusting the clock offsets. (c) Show that the slab { z | 0 ≤ wT z ≤ β } is invariant for x = Ax. 11. The nodes do not know their own clock offsets (or the offsets of any of the other clocks). . k→∞ To simplify things. . at year 3. and x1 (t) (the number of 0 year-olds) denotes the number of people born since the last census. Thus. n. Thus the total births during a year is given by x1 (t + 1) = b1 x1 (t) + · · · + bn xn (t). and not on time t.. i = 1. or only for some initial offsets? You are welcome to use matlab to do some relevant numerical computations. i. using a discrete-time linear dynamical system model. e.. • Death rate.g. . and not on time t. she proposes to adjust each node’s clock by only half the average offset with its neighbors. so 0 < sk < 1. x(t) → 0 as t → ∞)? Do the clocks become synchronized with each other (i. For example. this means: a2 (t) = 1 (x1 (t) − x2 (t)) + (x5 (t) − x2 (t)) . then the two nodes would just trade their offsets each time interval. we assume that dn = 1. In this problem we will study how some population distribution (say. We’ll assume that at least one of the bk ’s is positive. To avoid this. The birth rate depends only on age. . . n − 1. of people) evolves over time. January 1). e. you can imagine the units as millions. . n − 1. etc. denote time in years (since the beginning of the study). for t = 0. k = 1. 1. . 2 (c) Would you say this scheme is better or worse than the original one described above? If one is better than the other. but you must explain what you are doing and why. We assume n is large enough that no one lives to age n. . . of age i − 1. The vector x(t) ∈ Rn will give the population distribution at year t (on some fixed census date. . xi (t) is the number of people at year t.) The death rate coefficients satisfy 0 < di < 1. . The coefficient di is the fraction of people of age i − 1 who will die during the year. . (a) What happens? (b) Why? We are interested in questions such as: do all the clocks become synchronized with the standard clock (i. . so synchronization does not occur. . The death rate depends only on age. (If x3 (4) = 1. The coefficient bi is the fraction of people of age i − 1 who will have a child during the year (taking into account multiple births). n − 1.g. The birth rate coefficients satisfy bi ≥ 0. all people who make it to age n − 1 die during the year. age below 11 and over 60.e.. does it achieve synchronization faster. xk+1 (t + 1) = (1 − dk )xk (t). . . . ..) 11. We will not accept simulations of the network as an explanation.the average of these relative offsets. for node 2 we would have the adjustment a2 (t) = Finally.) 91 . • Birth rate.e. i = 1. (As mentioned above. where 1 ∈ Rn is the vector with all components 1. the question. 1. . Thus x5 (3) denotes the number of people of age 4.2 bothers you.) The total population at year t is given by 1T x(t). do all xi (t) − xj (t) converge to zero as t → ∞)? Does the system become synchronized no matter what the initial offsets are. . . Another engineer suggests a modification of the scheme described above. Thus we have. and treat them as real numbers. We’ll also ignore the fact that xi are integers. (Of course you’d expect that bi would be zero for non-fertile ages. 2 2 (x1 (t) − x2 (t)) + (x5 (t) − x2 (t)) . Let t = 0. how is it better? (For example. k = 1. The node then adjusts its offset by this average. . We define the survival rate coefficients as sk = 1 − dk .e.. say. . She notes that if the scheme above were applied to a simple network consisting of two connected nodes. Specifically. does it achieve synchronization from a bigger set of initial offsets. for node 2..5 Population dynamics. . but we won’t make that explicit assumption. .e. For each person born. 92 . if you prefer. . n.) (a) Express the population dynamics model described above as a discrete-time linear dynamical system. (h) Suppose that (b1 + · · · + bn )/n ≤ (d1 + · · · dn )/n. s1 · · · sk make it to age k. . (d) Survival normalized variables. . i. and assume that all components of x(0) and z(0) are positive.) (e) Let x and z both satisfy our population dynamics model. Let KN denote the number of allowed sequences of length N . then 1T x(t) → 0 as t → ∞. n. Determine whether each of the next four statements is true or false. find a matrix A such that x(t + 1) = Ax(t). .) (f) All the eigenvalues of A are real. Find the rate of this code.e. s1 s2 make it to age 2. 5. Compare it to the rate of the code which consists of all sequences from an alphabet of 5 symbols (i. and by ‘false’ we mean false for some choice of coefficients consistent with our assumptions. so long as our initial population distribution x(0) has all positive components. . .. . That is. then xi (t) > 0 for i = 1. (Of course by ‘true’ we mean true for any values of the coefficients consistent with our assumptions.The assumptions imply the following important property of our model: if xi (0) > 0 for i = 1. s1 make it to age 1.e. the population goes extinct. . 2. Then the population that is initially larger will always be larger. with five symbols 1. ..6 Rate of a Markov code. (To use fancy language we’d say the system is positive orthant invariant. (b) Draw a block diagram of the system found in part (a). and in general. n. . and the following symbol transition rules: • 1 must be followed by 2 or 3 • 2 must be followed by 2 or 5 • 3 must be followed by 1 • 4 must be followed by 4 or 2 or 5 • 5 must be followed by 1 or 3 (a) The rate of the code. the ‘average’ birth rate is less than the ‘average’ death rate. . 2. (In words: we consider two populations that satisfy the same dynamics. Consider the Markov language described in exercise 12. The number R = lim log2 KN N (g) If dk ≥ bk for k = 1. (c) Find the characteristic polynomial of the system explicitly in terms of the birth and death rate coefficients (or. We define yk (t) = xk (t) s1 · · · sk−1 (with y1 (t) = x1 (t)) as new population variables that are normalized to the survival rate. Therefore we don’t have to worry about negative xi (t)... That is.e. x(t + 1) = Ax(t) and z(t + 1) = Az(t). . then 1T x(t) > 1T z(t) for t = 1.. in bits per symbol. find ˜ ˜ a matrix A such y(t + 1) = Ay(t). 11. 4. . i. . i. with no restrictions on which symbols can follow which symbols). Express the population dynamics as a linear dynamical system using the variable y(t) ∈ Rn . If 1T x(0) > 1T z(0). N →∞ (if it exists) is called the rate of the code. the birth and survival rate coefficients). 3. Then 1T x(t) → 0 as t → ∞. .i denote the number of allowed sequences of length N that start with symbol i. Let µi satisfy µ2 = λi .1 + · · · + FN. Let Λ = diag(λ1 .  . (If you think there is always a separating hyperplane for a linear system. Find the asymptotic fractions fi = lim FN. so in this case A1/2 is unambiguous.9 Separating hyperplane for a linear dynamical system. .i /KN .i /KN . . we have FN.. 0 0 ··· ··· ··· . λn ). N →∞ Please don’t find your answers by simple simulation or relatively mindless computation. ˙ (b) Find the characteristic polynomial of the system using the block diagram and show that A is nonsingular if and only if an = 0. Find T such that T −1 AT is diagonal. and let GN.5 = KN . Following the idea of part a. . N →∞ gi = lim GN. .(b) Asymptotic fraction of sequences with a given starting or ending symbol. (a) We say B ∈ Rn×n is a squareroot of A if B 2 = A. λn . . .7 Companion matrices. We say that the hyperplane defined by c is a separating hyperplane for this system if no trajectory of the system ever crosses the hyperplane.. then A−1 is a bottom-companion matrix with last row −[1 a1 · · · an−1 ]/an . and we write B = log A. where c ∈ Rn is nonzero. give the conditions on A under ˙ which there is no separating hyperplane. Is the logarithm unique? What if we insist on B being real? 11. left-. µn )T −1 is a squareroot of A. Suppose that A ∈ Rn×n is diagonalizable. 0        is said to be a (top) companion matrix. where ˙ A ∈ Rn×n and x(t) ∈ Rn . the nonnegative squareroot). . Show that B = i T diag(µ1 . (c) Show that if A is nonsingular. Therefore. (d) Find the eigenvector of A associated with the eigenvalue λ. This means it is impossible to have ˜ ˜ cT x(t) > 0 for some t. .e. These are referred to as top-. A matrix A of the form  −a1 −a2  1 0   0 1 A=  .i denote the number of allowed sequences of length N that end with symbol i. . (Note that if β = 0. or right-companion matrices. . the vector c = βc ˜ defines the same hyperplane. .1 + · · · + GN. Explain how to find all separating hyperplanes for the system x = Ax. 93 . There can be four forms of companion matrices depending on whether the ai ’s occur in the first or last row. it is conventional to take µi = λi (i. . we want to see (and understand) your method. Let FN. . . or first or last column. 11.) You can assume that A has distinct eigenvalues (and therefore is diagonalizable). for any trajectory x of the system. When λi are real and nonnegative. ··· −an−1 0 0 1 −an 0 0 . In particular. ˙ (a) Draw a block diagram for the system x = Ax. say so. Let x = Ax where A is top-campanion. find an expression for a logarithm of A (which you can assume is invertible). Thus. (b) We say B is a logarithm of A if eB = A.5 = GN. bottom-. 11. an invertible matrix T ∈ Cn×n and diagonal matrix Λ ∈ Cn×n exist such that A = T ΛT −1 . (e) Suppose that A has distinct eigenvalues λ1 .8 Squareroot and logarithm of a (diagonalizable) matrix. and cT x(t) < 0 for some other t. A squareroot is sometimes denoted A1/2 (but note that there are in general many √ squareroots of a matrix). .) Now consider the autonomous linear dynamic system x = Ax. A hyperplane (passing through 0) in Rn is described by the equation cT x = 0. . u(T − 1) that maximize the norm of the state for large t. . Explain how to do this. i = 1. . however. with n > 2. and then say. for what values of θ (between 0 and π) can you have an equi-angle set on Rn ? The angle θ = 0 always has an equi-angle set (just choose any unit vector u and set x1 = · · · = xn = u). independent eigenvectors v1 . To be more precise. The goal is to choose the inputs u(0). there are equi-angle sets for every value of θ. xn is an equi-angle set in Rn . and.e. i = j. . however.g. . . . To see this. An input u(0). xj ) = θ.11. . vn of A. u). with n > 2. where x(t) ∈ Rn .) 94 . . we’re searching for u(0). Ameas = A + E. e1 . The input is subject to a total energy constraint: u(0) 2 + · · · + u(T − 1) 2 ≤ 1. Be sure to summarize your final description of how to solve the problem. what is the maximum possible value θ can have? (b) Construct a specific equi-angle set in R4 for angle θ = 100◦ = 5π/9. and that the angle between any two of them is 100◦ . . . The question then arises. . satisfies ˜ ˜ x(t) ≥ x(t) for t large enough. for t ≥ T .. that satisfies the total energy constraint. You can assume that A is diagonalizable. u(t) ∈ Rm . . . etc. . “Take the limit as t → ∞” or “Now take t to be really large”. . v) = (v. pseudo-inverse. Also. . you can’t have an equi-angle set with angle θ = π.. . ˜ e. SVD. if xi = 1. . and the angle between any pair of the vectors is θ. you should not use limits in your solution. n. u(T − 1). . u(T − 1) that satisfies the total energy constraint. λn . (Since (u. . In Rn . . . with angle θ. n. x2 = cos θ sin θ . T − 1. each of the vectors has unit norm.10 Equi-angle sets.) 11. 1. . x2 ) = π). . . . . . x(0) = 0. and that it has a single dominant eigenvalue (which here. . Unless you have to. It’s easy to find such sets: just take x1 = 1 0 . . which is assumed to be small. You can use any of the ideas from the class. This problem is about estimating a matrix A ∈ Rn×n .. so (x2 . i. While A is not known. you only have to check 6 angles. . but also x3 = −x1 (since (x1 . But what other angles are possible? For n = 2. . . In R2 .g. . but we do have a noisy measurement of it. Attach matlab output to verify that your four vectors are unit vectors. . are not known. eigenvector decomposition. for any other input sequence u(0). 11. describe the values of θ for which there is an equi-angle set with angle θ. suppose that x1 . We consider the controllable linear system x(t + 1) = Ax(t) + Bu(t). you might find a clever way to find all the angles at once. . . u(T − 1) is applied over time period 0. . We’ll take θ to be between 0 and π. (Its eigenvalues λ1 . . we know the answer: any value of θ between 0 and π is possible. . x3 ) = π). (a) For general n. . for every value of θ there is an equi-angle set with angle θ. The matrix A is not known. . and so does θ = π/2 (just choose any orthonormal basis. e. j = 1. en . we have u(t) = 0. Let x1 . . . i. means that there is one eigenvalue with largest magnitude). we do know real.12 Estimating a matrix with known eigenvectors.11 Optimal control for maximum asymptotic growth. . and (xi . x3 ) = 0. . In other words. Then we have x2 = −x1 (since (x1 . For example you cannot explain how to make x(t) as large as possible (for a specific value of t). xn ∈ Rn . An orthonormal set is a familiar example of an equi-angle set. We say that they form a (normalized) equi-angle set. In particular. Here the matrix E is measurement error. . with θ = π/2. 7   0. SVD. 0. etc.  0. i. for example using A=randn(3).5 with the data  1.ˆ We will combine our measurement of A with our prior knowledge to find an estimate A of A. is done using the function reshape. i. and also by using the spectral mapping theorem.13 Real modal form. we choose A J= 1 n2 n i. .6  . that is consistent with the known eigenvectors. with elements taken from a (column by column).. Therefore norm(A(:)) gives the squareroot of the sum of the squares of entries of the matrix A.0 ˆ Be sure to check that A does indeed have v1 .14 Spectral mapping theorem. 0.) In matlab. you might not.5  . any difference is due to numerical round-off errors in the various compuations. . If by chance they are all real. and λ ∈ C. vn .9 1. if a is an mn vector. v3 as eigenvectors. 0.. Also. Find the eigenvalues of (I + A)(I − A)−1 by first computing this matrix.0 −0.3  11. You might find the following useful (but then again.) For A ∈ Rn×n . then reshape(a. Be sure to say whether your method finds the exact minimizer of J (except. This is called the spectral mapping theorem.) Find the eigenvalues of A. and some code that checks the results (i.e.n) is an m × n matrix. of course. by (numerically) finding its ˆ eigenvectors and eigenvalues. (b) Carry out your method  2. then f (λ) is an eigenvalue of f (A). we’ll just assume that this converges).2 −1. for numerical error due to roundoff). or an approximate solution.) or decompositions (QR. its Frobenius norm. say whether you can guarantee convergence. Generate a matrix A in R10×10 using A=randn(10).7 v1 =  0  . if A is a matrix. generate a random 3 × 3 matrix. low rank approximation.4 −0. eigenvalue decomposition. etc. computes S −1 AS to verify it has the required form). If your method is iterative. To do ˆ as the matrix that minimizes this. The inverse operation.) 95 .m. we define f (A) as f (A) = a0 I + a1 A + a2 A2 + · · · (again. (You should get very close agreement.j=1 ˆ (Ameas − Aij )2 ij ˆ among all matrices which have eigenvectors v1 . Hint. then A(:) is a (column) vector consisting of all the entries of A. 0. give the value of J for A. Show that f (A)v = f (λ)v (ignoring the issue of convergence of series).e. given by a power series expansion f (u) = a0 + a1 u + a2 u2 + · · · (where ai = f (i) (0)/(i!)). 11. where v = 0. Your solution should include a clear explanation of how you will find S. least-norm. (Thus. . .6 v3 =  0.3 v2 =  0.. You can use any of the methods (least-squares.) ˆ (a) Explain how you would find A. Gauss-Newton. A is the matrix closest to our measurement. in the mean-square sense.0 2. a matrix S such that S −1 AS has the real modal form given in lecture 11. (You can assume that we only consider values of u for which this series converges. i. To illustrate this with an example.e. v2 . i.) from the course. We conclude that if λ is an eigenvalue of A.0 Ameas =  0. writing a vector out as a matrix with some given dimensions. For example.. Suppose f : R → R is analytic.e.6  . the source code that you use to find S. Suppose that Av = λv. Find the real modal form of A. please generate a new instance of A..7   0. (The entries of A will be drawn from a unit normal distribution. then finding its eigenvalues. written out column by column.e. is then given by x(t)i = x(t)i − ri x(t)i + si R(t). . using ideas from the course. the spending in sector i is si R(t).15  96 . with ri the tax rate for sector i. i.6  0. . Let x(t) ∈ Rn represent the pre-tax economic activity at the beginning of year t. t = 0. Now find the growth rate with the tax rate set to zero.4 0.and post-tax activity levels are related as follows. at the beginning of year t. 1.2  Find the growth rate.. . We let S(t) = n i=1 x(t)i denote the total economic activity in year t.1 0. . Suppose that A ∈ Rm×n and B ∈ Rn×m . where E ∈ Rn×n is the input˜ output matrix of the economy. n. These rates all satisfy 0 ≤ ri < 1. These spending proportions n satisfy si ≥ 0 and i=1 si = 1. You are welcome (even. . ˜ The pre. The total government revenue is then R(t) = rT x(t). Show that if λ ∈ C is a nonzero eigenvalue of AB. across n sectors. with s ∈ Rn giving the spending proportions in the sectors. λ = 0.16 Tax policies. s. This total revenue is then spent in the sectors proportionally. We let x(t) ∈ Rn denote the post-tax economic ˜ activity. but we want the value using the expression found in part (a). which accounts for the government taxes and spending.3 0.15 Eigenvalues of AB and BA. and we let G = lim S(t + 1) S(t) t→∞ denote the (asymptotic) growth rate (assuming it exceeds one) of the economy.1   0.   0. encouraged) to simulate the economic activity to double-check your answer. Hint: Suppose that ABv = λv. which we assume (for simplicity) takes place at the beginning of each year.e.2  E= 0.4 0. which we rule out as essentially impossible).1 0.2 0. Conclude that the nonzero eigenvalues of AB and BA are the same.11. In this problem we explore a dynamic model of an economy.  . You may assume that a matrix that arises in your analysis is diagonalizable and has a single dominant eigenvalue.3 0.25   r=  0. You can assume that all entries of E are positive. . . with x(t)i being the pre-tax activity level in sector i.3 0. ..15  0. for all t ≥ 0. (a) Explain why the growth rate does not depend on x(0) (unless it exactly satisfies a single linear equation.2 0.2 0. . which will imply that all entries of x(t) and x(t) are positive. r = 0 (in which case s doesn’t matter). 11. Express the growth rate G in terms of the problem data r. then λ is also an eigenvalue of BA. .3   s=  0. i. Economic activity propagates from year to year as x(t + 1) = E x(t).7 0. n.1 0.2 0.45  0.1  . (These assumptions aren’t actually needed—they’re just to simplify the problem for you. and E. 0. . including the effects of government taxes and spending. Construct a w = 0 for which BAw = λw. The post-tax economic activity in sector i. We will assume that all entries of x(0) are positive. 0. . ˜ i = 1. across n sectors. an eigenvalue λ1 that satisfies |λ1 | > |λi | for i = 2.) (b) Consider the problem instance with data  0. The government taxes the sector activities at rates given by r ∈ Rn .4  .e. where v = 0. ) “False” means the statement isn’t true. As another example. B ∈ Rn×n . B ∈ Rn×n are both diagonalizable. . λn . What are the eigenvalues of A? What (if anything) can you say about x(k) for k < n and k ≥ n.) Now consider the (time-invariant) linear dynamical system x = Ax. then AB is diagonalizable. . (There are also matrices for which it does hold. What we mean by “true” is that the statement is true for all values of the matrices and vectors that appear in the statement. (b) If A ∈ R3×3 satisfies A + AT = 0. distinct eigenvalues λ1 . with real. where x(t) ∈ Rn . then A is diagonalizable. (c) If Ak = 0 for some integer k ≥ 1. Here A ∈ Rn×n . but you can assume that the dimensions are such that all expressions make sense. (The period T can appear in your answer. or stable systems (for which all trajectories converge to 0.1 Some true/false questions. No justification or discussion is needed for your answers. 12. and AB = 0. (a) If A ∈ Rm×n and B ∈ Rn×p are both full rank. then every eigenvector of AB is an eigenvector of BA. a brief explanation will suffice. and no matter what values A and B have. . it can fail to hold for some values of the matrices and vectors that appear in it. For example.Lecture 12 – Jordan canonical form 12. be the same). Describe the precise conditions on A under which ˙ all trajectories of x = Ax are asymptotically T -periodic.g.4 Jordan form of a block matrix. λn . hence are asymptotically T -periodic). Give your answer in terms of the Jordan form ˙ of A. to isolate it from any (brief) discussion or explanation. then every eigenvalue of AB is an eigenvalue of BA. We’ll let v1 . and is diagonalizable. the statement A2 = A is false.) Make sure your answer works for ‘silly’ cases like A = 0 (for which all trajectories are constant. 97 . You do not need to formally prove your answer. . (f) If A.. e. the statement “A + B = B + A” is true. an identity matrix. We consider the block 2 × 2 matrix C= A I 0 A . then A is singular. You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated). (g) If A is nonsingular and A2 is diagonalizable. (e) If A.3 Asymptotically periodic trajectories. however. . in other words. vn denote (independent) eigenvectors of A associated with λ1 . But that doesn’t make the statement true. . Determine if the following statements are true or false. . (a) Find the Jordan form J of C. then I − A is nonsingular. without knowing x(0)? 12. 12. the statement holds. (a) Find x(t) in terms of x(0). . . Mark your answer clearly. . We say that x : R+ → Rn is asymptotically T -periodic if x(t + T ) − x(t) converges to 0 as t → ∞. because no matter what the dimensions of A and B are (they must.2 Consider the discrete-time system x(t + 1) = Ax(t). because there are (square) matrices for which this doesn’t hold. (b) Find a matrix T such that J = T −1 CT . Be sure to explicitly describe its block sizes. where A ∈ Rn×n . hence asymptotically T -periodic). B ∈ Rn×n . . then n ≥ m + p. (We assume T > 0 is fixed. (b) Suppose that det(zI − A) = z n . (d) If A. . and feedthrough matrices of the overall system.1 Interconnection of linear systems. output. In this problem you consider the specific interconnection shown below: w1 w2 u S v T y Here.1 0. Note that the subscripts in the matrices above. we consider a specific system with input matrices are    1 0. 2.3  . and the input u(t) is a scalar (hence. Express the overall system as a single linear dynamical system with input. state.5 0.7 0. B =  A=  0  0. Now the problem. 1.0 −0.8 −0. and output given by u v . compatible) dimensions. y = H2 z + Jw2 .Lecture 13 – Linear dynamical systems with inputs and outputs 13.   .7 −0. A ∈ Rn×n and B ∈ Rn×1 ). . refer to different matrices. there are two subsystems S and T . where x(t) ∈ Rn . You can assume all the matrices are the correct (i. Subsystem S is characterized by x = Ax + B1 u + B2 w1 .1  0 0. x z ..  . y.5 98 n = 4. which are themselves linear systems.4 −0. as in B1 and B2 . (a) Find the matrix CT such that  u(T − 1)   .9 −0.6 0. But be sure to let us know what assumptions you are making.7 0. ˙ w1 = H1 z. 13. . ˙ and subsystem T is characterized by z = F z + G1 v + G2 w2 . The initial state is x(0) = 0. Consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t).1 0. .5  1  0.4 −0.2 Minimum energy control. go ahead.e. . respectively. x(T ) = CT  . w2 = Cx + D1 u + D2 w1 . Often a linear system is described in terms of a block diagram showing the interconnections between components or subsystems. If you need to make any assumptions about the rank or invertibility of any matrix you encounter in your derivations. We don’t specify the dimensions of the signals (which can be vectors) or matrices here. Be sure to explicitly give the input. The dynamics and   .  u(1)  u(0)  (b) For the remainder of this problem. dynamics. t = 0. u(T − 1). and the plot of the minimum energy as a function of T . where K ∈ Rm×p is the feedback gain matrix.2 0. .5 −0. with A ∈ Rn×n . u(t) = Ky(t). B =  0  .e.3 What is the smallest T for which we can find inputs u(0). that is. B ∈ Rn×m . . u(T − 1) is E(T ) = T −1 t=0 (u(t)) . .9 99 For a given T (greater than Tmin ) and xdes . . 13. Consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t).. such that x(T ) = xdes ? What are the corresponding inputs that achieve xdes at this minimum time? What is the smallest T for which we can find inputs u(0).7  . y(t) = Cx(t). .0 0. . 1 C= 0 1 0 . find the minimum energy inputs that achieve x(T ) = xdes . evaluate the corresponding input energy. and K.5 1. (b) Consider the single-input. . B. Note: There is a direct way of computing the assymptotic limit of the minimum energy as T → ∞.8  2. 0 0. .Suppose we want the state to be xdes at time T . 2 For each T ranging from Tmin to 30. B. (c) Suppose the energy expended in applying inputs u(0). C. Plot Emin (T ) as a function of T . . . u(T − 1).1  . We’ll cover these ideas in more detail in ee363. such that x(T ) = xdes for any xdes ∈ R4 ? We’ll denote this T by Tmin . The resulting state trajectory is identical to that of an autonomous system. (You should include in your solution a description of how you computed the minimum-energy inputs.3 Output feedback for maximum damping. −0. But you don’t need to list the actual inputs you computed!) (d) You should observe that Emin (T ) is non-increasing in T . for any A.3   xdes =   −0.1 A =  −0. In output feedback control we use an input which is a linear function of the output. Consider the desired state   0. ¯ (a) Write A explicitly in terms of A.1 0. . single-output system with     1 0. C ∈ Rp×n . For each T . Show that this is the case in general (i.0 0. . and xdes ). ¯ x(t + 1) = Ax(t). . how can you compute the inputs which achieve x(T ) = xdes with the minimum expense of energy? Consider now the desired state   −1  1   xdes =   0 . which we denote by Emin (T ). do so.5 Two separate experiments are performed for t ≥ 0 on the single-input single-output (SISO) linear system x = Ax + Bu. y = Cx + Du + g.In this case. and y are the state. u. If not. C1 . and output of a linear dynamical ˜ ˜ ˜ system. In the first experiment. you can assume D1 = 0. u(t) = e−t and the resulting output is y(t) = e−3t + e−2t . C. Affine functions are more general than linear functions. and you can assume that Kopt ∈ [−2. which result when b = 0. u = Gx + Hy. Find state equations for the cascade system: u H1 (s) H2 (s) y Use the state x = [xT xT ]T . Your answer ˙ ˙ will have the form: x = Ex + F y. and output u. (a) Start with x = Ax + Bu. and solve for x and u in terms of x and y. y = Cx + Du. D2 = 0. which have the form x = Ax + Bu + f. In the second. 13. (a) Two linear systems (A1 .01. (To simplify. y = Cx + Du. (b) Can you determine A.) (c) Find the dual of the LDS found in (a). x(0) = [1 2 − 1 ]T ˙ (the initial condition is the same in each experiment). 13. A function f : Rn → Rm is called affine if it is a linear function plus a constant. Assuming A is invertible. D1 ) and (A2 .e. (a) Can you determine the transfer function C(sI − A)−1 B + D from this information? If it is possible. You will find a linear system with transfer function H(s)−1 . ˙ Fortunately we don’t need a whole new theory for (or course on) affine systems. have transfer functions H1 (s) and H2 (s). D2 = 0.7 Inverse of a linear system. By maximally damped.. D2 ) with states x1 and x2 (these are two column vectors. u(t) = e−3t and the resulting output is y(t) = 3e−3t − e−2t . B2 . Draw a block diagram of the dual system as a cascade connection of two systems. we mean that the state goes to zero with the fastest asymptotic decay rate (measured for an initial state x(0) with non-zero coefficient in the slowest mode.) The question is: find the feedback gain Kopt such that the feedback system is maximally damped. the block diagram corresponding to the dual system is the original block diagram.6 Cascade connection of systems. i.” with all arrows reversed.4 Affine dynamical systems. or D? 13. We can generalize linear dynamical systems to affine dynamical systems. a simple shift of coordinates converts it to a linear dynamical system. find two linear systems consistent with all the data given which have different transfer functions. Show that x. 2].) Remark: quite generally. “turned around. C2 . B1 . (To simplify. 13.) Hint: You are only required to give your answer Kopt up to a precision of ±0. of the form f (x) = Ax + b. you can assume D1 = 0. where D is square and invertible. respectively. not two scalar components of one vector). 1 2 (b) Use the state equations above to verify that the cascade system has transfer function H2 (s)H1 (s). define x = x + A−1 f ˜ and y = y − g + CA−1 f . input. 100 . the feedback gain matrix K is a scalar (which we call simply the feedback gain. input y. Suppose H(s) = C(sI − A)−1 B + D. Interpret the result as a linear system with state ˙ x. B. k = 0. where δ is a delay or offset in the input update. are synchronized. .. . 1. which is another mass of value m3 = 3. 13. Consider the mechanical system shown below. Find a discrete-time LDS with ud as input and yd as output.. .(b) Verify that (G(sI − E)−1 F + H)(C(sI − A)−1 B + D) = I. 13. b2 = 2 to a platform.e. In the lecture notes we considered sampling a continuous-time system in which the input update and output sampling occur at the same time. . k = 0.9 Static decoupling. y = Cx + Du.8 Offset or skewed discretization. y2 . k2 = 2 and damping b1 = 1. . Consider the continuous-time LDS x = Ax + Bu. yd (k) = y(kh). where h > 0 (i. 101 . . We define ˙ the sequences xd and yd as xd (k) = x(kh). In this problem we consider what happens when there is a constant time offset or skew between them (which often happens in practice). The input u is given by u(t) = ud (k) for kh + δ ≤ t < (k + 1)h + δ.e. the state and output are sampled every h seconds). 1. Hint: use the following “resolvent identity:” (sI − X)−1 − (sI − Y )−1 = (sI − X)−1 (X − Y )(sI − Y )−1 which can be verified by multiplying by sI − X on the left and sI − Y on the right. i. Forces u1 and u2 are applied to the first two masses. y1 u1 u2 y2 m1 m2 k1 y3 b1 b2 k2 m3 k3 b3 Two masses with values m1 = 1 and m2 = 2 are attached via spring/damper suspensions with stiffnesses k1 = 1. The platform is attached to the ground by a spring/damper suspension with stiffness k3 = 3 and damping b3 = 3. with 0 ≤ δ < h. Give the matrices that describe this LDS. and y3 . The displacements of the masses (with respect to ground) are denoted y1 . (c) Now consider a specific case: A= 0 3 0 0 . Express the matrix F explicitly in terms of A and B.e. (b) Plot the step responses matrix.05B1 (t − 1). and assume they can be bought in any amount. since the norm of the state is made as small as possible at every step. i. In this problem you will analyze this scheme. two-. where A ∈ Rn×n and B ∈ Rn×k . A.e. (a) Find an explicit expression for the proposed input u(t) in terms of x(t). where ycmd : R+ → R2 . y2 and y3 ).06B2 (t − 2). i. Compare the behavior of x(t + 1) = Ax(t) (i. B= 1 1 . as found in (10a)).e. principle plus 7% interest on the amount of three-year CDs bought three years ago. We consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t). Determine whether each of these systems is stable. . (c) Find the DC gain matrix H(0) from inputs u1 and u2 to outputs y1 and y2 ..e. 13.11 Analysis of investment allocation strategies. a total of 1 is to be invested at t = 0. principle plus 5% interest on the amount of one-year CDs bought one year ago. the original system with u(t) chosen by the scheme described above) for a few initial conditions. the orginal system with u(t) = 0) and x(t + 1) = F x(t) (i. k < n. An engineer proposes the following simple method: at time t. u = [u1 u2 ]T . and three-year certificates of deposit (CDs) with interest rates 5%. ˙ ˙ ˙ Ignore the effect of gravity (or you can assume the effect of gravity has already been taken into account in the definition of y1 . 1. i.e.. We assume that B1 (0) + B2 (0) + B3 (0) = 1. i.. y2 and y3 .. is full rank. • B2 (t) denotes the amount of two-year CDs bought at period t. Briefly interpret and explain your plots. 102 . we let u = H(0)−1 ycmd .) • B1 (t) denotes the amount of one-year CDs bought at period t. • 1..10 A method for rapidly driving the state to zero. p(t). Each year or period (denoted t = 0. In order to make the steady-state deflections of masses 1 and 2 independent of each other.07B3 (t − 3)..e. respectively. . Plot the step responses from ycmd to y1 and y2 . at period t is a sum of six terms: • 1. and B. (We ignore minimum purchase requirements. • B3 (t) denotes the amount of three-year CDs bought at period t. 13..) The total payout to the investor. and 7%. principle plus 6% interest on the amount of two-year CDs bought two years ago. The engineer argues that this scheme will work well. 6%.e. (b) Now consider the linear dynamical system x(t+1) = Ax(t)+Bu(t) with u(t) given by the proposed scheme (i.e. (You can take Bj (t) to be zero for t < 0.e. i..06B2 (t − 1). i. the step responses from inputs u1 and u2 to outputs y1 . and compare with the original ones found in part b. choose u(t) that minimizes x(t + 1) . The goal is to choose an input u that causes x(t) to converge to zero as t → ∞. 6% interest on the amount of two-year CDs bought one year ago.) an investor buys certain amounts of one-. • 0. . (d) Design of an asymptotic decoupler.(a) Find matrices A ∈ R6×6 and B ∈ R6×2 such that the dynamics of the mechanical system is given by x = Ax + Bu where ˙ x = [y1 y2 y3 y1 y2 y3 ]T . • 1.. Show that x satisfies an autonomous linear dynamical system equation x(t + 1) = F x(t). 07B3 (t − 1). As feature sizes shrink to well below a micron (i. defined as limt→∞ w(t + 1)/w(t).35. i. The inputs are the voltage sources u1 . or nearly nondiagonalizable. choose some other nonnegative values for B1 (0). and what the matrices A and C are..e. • The 60-20-20 strategy.20.35. simulate the strategies to check your answer. 0.. u3 . 7% interest on the amount of three-year CDs bought two years ago. (You can.) Note: simple numerical simulation of the strategies (e. and B3 (0) that satisfy B1 (0)+B2 (0)+B3 (0) = 1. and 30% in three-year CDs. = B3 (t)/w(t). (d) Suppose you could change the initial investment allocation for the 35-35-30 strategy. For the two strategies above. u2 . • The 35-35-30 strategy. the amount of principle we could get back next year) • B2 (t) + B3 (t − 1) is the amount that matures in two years • B3 (t) is the amount that matures in three years (i. A simple lumped model of three nets is shown below.• 0.20. Don’t just blindly type in matlab commands. We define liquidity ratios as the ratio of these amounts to the total wealth: = (B2 (t) + B3 (t − 1))/w(t).e. i.35.. Be very clear about what the state x(t) is. wires which connect the output of one gate to the inputs of one (or more) other gates are called nets. matrices. must be taken into account. The total payout is re-invested 35% in one-year CDs. 0. 7% interest on the amount of three-year CDs bought one year ago. would the asymptotic growth rate be larger?) How much better is your choice of initial investment allocations? Hint for part d: think very carefully about this one. B2 (0) = 0. say so. to what values? Note: as above. • 0.30. i. and B3 (0) = 0. B2 (0) = 0.. plotting w(t + 1)/w(t) versus t to guess its limit) is not acceptable..e. (If this limit doesn’t exist. The total wealth held by the investor at period t is given by w(t) = B1 (t) + B2 (t) + B2 (t − 1) + B3 (t) + B3 (t − 1) + B3 (t − 2).e. B2 (0).e. 13. Two re-investment allocation strategies are suggested.. check to make sure you’re computing what you think you’re computing. 103 .) (c) Asymptotic liquidity.12 Analysis of cross-coupling in interconnect wiring. of course. You will have two such linear systems: one for the 35-35-30 strategy and one for the 60-20-20 strategy. Hint for whole problem: watch out for nondiagonalizable.g. The total payout is re-invested 60% in one-year CDs. is least liquid) L1 (t) L2 (t) L3 (t) = (B1 (t) + B2 (t − 1) + B3 (t − 2))/w(t). determine the asymptotic growth rate. 20% in two-year CDs. The total wealth at time t can be divided into three parts: • B1 (t) + B2 (t − 1) + B3 (t − 2) is the amount that matures in one year (i. simple numerical simulation alone is not acceptable.e. (b) Asymptotic wealth growth rate. and B3 (0) = 0.30) initial allocation? (For example. In integrated circuits. and 20% in three-year CDs. What allocation would you pick.60..35. The initial investment allocation is the same: B1 (0) = 0. (a) Describe the investments over time as a linear dynamical system x(t + 1) = Ax(t). For each of the two strategies described above. 35% in two-year CDs. and how would it be better than the (0. y(t) = Cx(t) with y(t) equal to the total wealth at time t. ‘deep submicron’) the capacitance of a wire to the substrate (which in a simple analysis can be approximated as ground). do the liquidity ratios converge as t → ∞? If so. The initial investment allocation is B1 (0) = 0. as well as to neighboring wires.07B3 (t − 2). . R1 R2 y1 (t) C2 R4 y2 (t) C4 R6 y3 (t) u3 (t) C5 C6 u1 (t) R3 C1 C7 u2 (t) R5 C3 C8 To simplify the problem we’ll assume that all resistors have value 1 and all capacitors have value 1. which defines these matrices. .m on the course web page. C6 . . . 104   2 −1 0 0  −1 1 0 0     0 0 0  . 1} for all t. To save you the trouble of typing these in. ui (t) ∈ {0. and R5 + R6 . R6 represent the resistance of the wire segments. y2 . . the capacitances C7 and C8 are capacitances between wires 1 and 2.and the outputs are the three voltages labeled y1 . and wires 2 and 3. The inputs (which represent the gates that drive the three nets) are Boolean valued. so we’ve done it for you. ˙ where  2 0 −1 0 0 0 0 1 0 0 0 0 −1 0 0 0 2 0 0 2 0 0 0 −1  1  0   0 F =  0   0 0 y = Kv. respectively) connecting the inputs to the outputs. . the circuit reduces to three wires (with resistance R1 + R2 . The capacitances C1 . . . . 0 0 0 0 2 −1 −1 1 0 0 0 0 0 0 0 0 0 0 0 0 2 −1 −1 1     C=    and v ∈ R6 is the vector of voltages at capacitors C1 . respectively. . (The different locations of the these cross-coupling capacitances models the wire 1 crossing over wire 2 near the driving gate. In this problem we will only consider inputs that switch (change value from 0 to 1 or 1 to 0) at most once.    0 0 0  0 0  1 . K =  0 0 0 1  0 0  0 0 0 0 0 1  0 0    . ) In static conditions. . . C6 are capacitances from the interconnect wires to the substrate.. R3 + R4 . (If you’re an EE student in this category. i. . We recognize that some of you don’t know how to write the equations that govern this circuit. y3 . but you don’t need to know this to do the problem .e.) The equations are C v + Gv = F u. and wire 2 crossing over wire 3 near the end of the wire. we’ve put an mfile interconn. then shame on you. The resistances R1 . . G =  0  0 0 0 −1     0 0 1 0  0 0 0 2  0 0  0 0   0 1 0 0 1 0  . respectively. g3 ? Be sure to give not only the maximum value. 0. u(t) = g. things can get very. that is applied over a fraction θ of each cycle. Here T > 0 is the period. . T3 . A. if your formula involves a matrix inverse.) This phenomenon of y2 deviating from zero (which is what it would be if there were no cross-coupling capacitance) is called bounce (induced by the cross-coupling between the nets). . k = 0.5 for t ≥ T . and θ ∈ [0. A= 0 −1 −2 −1 Plot J. B. Give a formula for x(0) in terms of the problem data. 1] is called the duty cycle of the input. You can think of u as a constant input value one. T . 1. Now suppose that input 2 remains zero.) To be more precise (and also so nobody can say we weren’t clear). and T3 . We define the 50%-threshold delay of the transition as smallest T such that |yi (t) − gi | ≤ 0. 1}. f3 . 1}. f3 . T1 . 105 . g1 . . g1 . x(t + T ) = x(t) for all t ≥ 0. Note: in this problem we don’t consider multiple transitions. i. explain why the matrix to be inverted is nonsingular. Give the largest delay. y2 (t) is large enough to trigger the following gate. what is the maximum possible value of y2 (t). for t ≥ T1 u3 (t) = f3 g3 for t < T3 . very ugly. for t ≥ 0. the input is u(t) = 1 for all t. (b) Maximum bounce due to cross-coupling. .(a) 50%-threshold delay. where gi ∈ {0. g1 .e.. 2. T1 . (If the following gate thresholds were set at 0. since u2 = 0.. and the transitions f1 . and for i = 1. (a) Explain how to find an initial state x(0) for which the resulting state trajectory is T -periodic. 0 (k + θ)T ≤ t < (k + 1)T. in contrast. x(t) 0 dt. where fi ∈ {0.. and also describe which transition gives the largest delay (e. which maximize y(t). u1 (t) = f1 g1 for t < T1 . the mean-square norm of the state. 0. 3. respectively. Try to give the simplest possible formula. ˙ where x(t) ∈ Rn . . the input is u(t) = 0 for all t. where f1 . If for any t. 0)). (But y2 does converge back to zero. over all possible t. explain why the formula you found in part (a) always makes sense and is valid. (b) Explain why there is always exactly one value of x(0) that results in x(t) being T -periodic. J= 1 T T  8 2 . f1 . In addition. then this would be first time after which the outputs would be guaranteed correct. Since the DC gain matrix of this system is I. At t = 0. find the largest (i. for t ≥ T3 u2 (t) = 0 for all t. 13.) Among the 64 possible transitions. but it’s not hard to do so. We consider the stable linear dynamical system x = Ax+Bu.e. the transition with f = (0..e. 1) to g = (1. (In part 1. which lasts T seconds.5. the output converges to the input value: y(t) → g as t → ∞. but also the times t. the system is in static condition. and θ. Note that when θ = 0. .) (c) We now consider the specific system with   0 1 0 0 1 . k = 0.13 Periodic solution with intermittent input. and the inputs have values u(t) = f for t < 0. all transitions occured at t = 0. 2. For t < 0.. g3 ∈ {0. and the system is stable. The transitions in inputs 1 and 3 induce a nonzero response in output 2. the input switches to the Boolean vector g. g3 . i. B= −14 2  T = 5. 2. What is the maximum possible bounce? In other words. i.g. The input has the specific form u(t) = 1 kT ≤ t < (k + θ)T. worst) 50%-threshold delay. 1. f3 .e. and when θ = 1. 1}. but inputs 1 and 3 undergo transitions at times t = T1 and t = T3 . (For example. and u(t) ∈ R. 106 . .versus θ. specify them. (2) We use the symbol ≈ since there may be small measurement errors in the given signal data. for 0 ≤ θ ≤ 1. equivalently.m on the course web site. or rather extends. Given the signal data. If we have ρ = 0. . . the normalized residual ρ). This problem continues. x(N ) ∈ Rn . (a) Explain how to do this. . N − 1. . Does the method always work? If some conditions have to hold. to within 5%.05. x(2). J≈ N i=0 for N large enough (say 1000). where x(0) is the periodic initial condition that you found in part (a). and give the associated value of the normalized residual. Let’s give a quantitative measure of how well the linear dynamical system model (2) holds. for a particular choice of matrices A and B. Here too we need to fit a linear dynamical system model to some given signal data. We are given the time series data.e. . u(1). as S= 1 N −1 N −1 t=1 1/2 x(t + 1) 2 . . we are not told which of the scalar signals are input components and which are state components. (b) Carry out this procedure on the data in lds sysid. a vector signal. t = 1. u(N ) ∈ Rm . z(N ) ∈ Rp . . . z(1). x(1). we will choose the matrices A and B to minimize the RMS residual R (or. we are given some time series values for a discrete-time input vector signal. B as R= 1 N −1 N −1 t=1 1/2 x(t + 1) − Ax(t) − Bu(t) 2 . . . 13. for example. . z(2). . . We define the RMS (root-mean-square) value of the residuals associated with our signal data and a candidate pair of matrices A. u(2). though. To complicate things. i. Give the matrices A and B. You may approximate J as N −1 1 x(iT /N ) 2 .14 System identification of a linear dynamical system.. Estimate the value of θ that maximizes J. roughly speaking. and also a discrete-time state vector signal. as ρ = R/S. it means that the state equation (2) holds. the previous one on system identification. We define the RMS value of x. . so we don’t expect to find matrices A and B for which the linear dynamical system equations hold exactly.15 System identification with selection of inputs & states. . over the same period. We define the normalized residual. Of course you must show your matlab source code and the output it produces. problem 14. In system identification. and we are asked to find matrices A ∈ Rn×n and B ∈ Rn×m such that we have x(t + 1) ≈ Ax(t) + Bu(t). That’s part of what we have to decide. 13. denoted ρ. if z has four components we might assign its first and third to be the state. .We will assign each component of z as either an input component. find matrices A and B) for your choice of x and u. when ρ = 1. A and B).17 FIR filter with small feedback. and be sure to show us verification that the open-loop system is stable and that the closed-loop system is not. As an extreme case. then we have an autonomous linear dynamical system model for the data. when does this control scheme result in the norm squared of the state always decreasing?) (c) Find an example of a system (i. • The matlab code used to solve the problem. here is the problem. so the dimension of the state is always at least one. For example. which contains a vector z(t) ∈ R8 for t = 1. the second term is a penalty for using a large input. • Your matrices A and B. dt where ρ > 0 is a given parameter. u(t) is chosen to minimize the compositive objective above. i. to minimize the quantity d x(t) 2 + ρ u(t) 2 . Once we assign components of z to either x or u. Assign the components of z to either state or input. one with no inputs at all. where K ∈ Rm×n . the one with the smallest dimension of u. for which the normalized RMS residuals is smaller than around 5%.) 13. This scheme is greedy because at each instant t. in the sense that the data are explained by a linear dynamical system driven by only a few inputs. ˙ but the closed-loop system x = Ax + Bu (with u as above) is unstable. that is not too big. (We will not check this for you. Our goal is to choose an input u : R+ → Rm . u(t) = x(t) = z4 (t) z3 (t) You can assume that we always assign at least one component to the state. if all components of z are assigned to x. • Your assignments to state and input. and drives the state x : R+ → Rn of the system x = Ax + Bu to zero quickly. If the dimension of u is small. (a) Show that u(t) can be expressed as u(t) = Kx(t). and attach code and associated output. for ˙ each t. Try to find ˙ the simplest example you can. without regard for the effects such an input might have in the future. . One measure of the complexity of the model is the number of components assigned to the input u. (In other words.) (b) What are the conditions on A. We seek the simplest model. 13. or a state component.e. Explain how you solved the problem. . the larger the dimension of u. and ρ under which we have (d/dt) x(t) 2 < 0 whenever x(t) = 0. Your solution should consist of the following: • Your approach. Please order the components in x and u in the same order as in z.e. To do this. The first term gives the rate of decrease (if it is negative) of the norm-squared of the state vector. the more complex the model.. B. . then we have a compact model. and its second and fourth to be the input.m on the class web server. i.. and its output. for which the open-loop system x = Ax is stable. Give a clear description of what x and u are. and develop a linear dynamical system model (i..e. Get the data given in lds sysid2. . using the scheme described above? (In other words. • The relative RMS residuals obtained by your matrices. Finally. we then proceed as in problem (14): we find matrices A and B that minimize the RMS residuals as defined in problem (14).. 100. the control scheme has the form of a constant linear state feedback.e.e.16 A greedy control scheme. i.. z2 (t) z1 (t) . You must explain how to check this. we will choose u(t). Consider a cascade of 100 one-sample delays: 107 . Give an explicit formula for K. and capacitors. deliver position: (k + 1/2)T ≤ t < (k + 1)T.18 Analysis of a switching power supply. shown in schematic diagram below. when the inductor current is delivered to the load. Here A ∈ Rn×n . (d) What are the eigenvalues of Af ? (e) How different is the impulse response of the system with feedback (α = 10−5 ) and without feedback (α = 0)? 13. B ∈ Rn . which convert one voltage to another.) Don’t worry—you don’t need to know anything about schematic diagrams or circuits to solve this problem! iL deliver i charge vL load y(t) = Cf x(t) + Df u(t) z −1 y y(t) = Cx(t) + Du(t) Vs The switch alternately connects to ground. When the switch is in the deliver position. we have di/dt = (Vs − vL )/L. where Vs > 0 is the (constant) source voltage and L > 0 is the inductance. the inductor current satisfies di/dt = Vs /L. 108 .u z −1 z −1 y (a) Express this as a linear dynamical system x(t + 1) = Ax(t) + Bu(t). The load is described by the linear dynamical system x = Ax + BiL . during which time the inductor is charged. Many of these are built from electronic switches. and to the load. with gain α = 10−5 . and the load current is iL = 0. ˙ vL = Cx. inductors. The switch is operated periodically as follows: charge position: kT ≤ t < (k + 1/2)T. (b) What are the eigenvalues of A? (c) Now we add simple feedback. and iL = i. to the system: u z −1 α Express this as a linear dynamical system x(t + 1) = Af x(t) + Bf u(t). and C ∈ R1×n . Many electronic systems include DC-DC converters or power supplies. (It’s called a boost converter because the load voltage can be larger than the supply voltage. In this problem we consider a standard boost converter. When the switch is in the charge position. where x(t) ∈ Rn is the internal state of the load. 1. The decoupler is also FIR of length 4: g(0). 2-output LDS with impulse matrix g. To eliminate them.19 Dynamic decoupling.. converges to its final value by t = 3.for k = 0. The step response matrix of the system is shown below. 2. which means that its impulse response h is nonzero only for t t = 0. neither of which we want. This is shown below. but g(t) = 0 for t ≥ 4. 3) can be obtained from the class web page in dynamic_dec_h. 1. Here T > 0 is the period of the switching power supply. you can imagine it as a chemical process reactor.m. 2 1 2 1 s11 −1 −2 0 2 4 6 s12 0 0 −1 −2 0 2 4 6 t 2 1 2 1 t s21 −1 −2 0 2 4 6 s22 0 0 −1 −2 0 2 4 6 t t The plots show that u1 has a substantial effect on y2 . This means that its step response matrix. .m. 2. g(2). r decoupler u process y compensated system 109 . An industrial process is described by a 2-input 2-output discrete-time LDS with finite impulse response of length 4. you will explore the design of a dynamic decoupler for this system. no matter what the initial inductor current i(0) and load initial state x(0) are. 3. 1. h2. Give the value of V . y1 as the reactor temperature. where you will find the 2 × 2 impulse response matrices h0. h(t) = 0 for t ≥ 4. We will consider the specific switching power supply with problem data defined in the file boost_data. . The impulse response matrix of the system (for t = 0. We will not accept simulations for some values of iL (0) and x(0) as an answer. and y2 as the reactor pressure. 13. the input to the industrial process. which is another 2-input. 2. vL (kT ) ¯ ¯ converges to a constant value V as k → ∞. and the output of the decoupler is u. h1. h3. Show that. g(3) ∈ R2×2 can be nonzero. If you want to think of this system in concrete terms. u2 as a reactant flow rate. The decoupler is used as a prefilter for the process: the input r (which is called the reference or command input) is applied as the input to the decoupler. g(1). defined as s(t) = τ =0 h(τ ). . with u1 a heater input. and that u2 has a substantial effect on y1 . 110 . say so. This means if the reference input is constant. If there is no such decoupler (i. the process output converges to ˜ the reference input. Find such a decoupler. from input r to output y. The goal is to design the decoupler (i. If there are many decouplers that satisfy the given specifications. choose g(0).We refer to this cascaded system as the compensated system. • The off-diagonal entries of s(t) are all zero (for all t). and do something sensible with any extra degrees of freedom you may have. This means the compensated system is ˜ decoupled : r1 has no effect on y2 .e. • limt→∞ s(t) = I. g(3) ∈ R2×2 ) so that the compensated system satisfies the following specifications. Let s denote the step response matrix of ˜ the compensated system.. g(2). g(1). the problem specifications are not feasible). and plot the compensated system step response matrix.e.. say so. and r2 has no effect on y1 . and explain why. Lecture 15 – Symmetric matrices. . 111 . and SVD 15. n. i. So the system can be described by t = Ax (where A is tall. and one is located in the centre. and vents are at distance rv . which in turn is proportional to x2 . More precisely. and the corresponding RMS error in temperature. Comment: In this problem we ignore the fact that. and the effect of vent j on temperature ti is given by 1 Aij = 2 rij where rij is the distance between vent j and measuring point i. .1 Simplified temperature control. Of those. and |Aij | ≤ (Aii Ajj )1/2 . cooling requires more power (per unit of temperature difference) than heating . with minimal power consumption? (b) Temperature measurement points are at distance rt from the center. r_v. 5 have vents evenly distributed around the perimeter of the room. quadratic forms. obtain exactly the least possible sum of squares error in temperature).2 Find a symmetric matrix A ∈ Rn×n that satisfies Aii ≥ 0. “cost” means the total power spent on the temperature-control systems. j = 1. The temperature in each x2 rv x3 t4 t5 t6 t7 x4 t3 t2 t1 rt x6 t9 t10 x1 t8 x5 cubicle (measured at its center as shown in the figure) is ti . i = 1.. . which is the sum of the power consumed by each heater/cooler. but is not positive semidefinite. as well as the power usage. the inhabitant of cubicle i wants the temperature to be yi hotter than the surrounding air (which is colder if yi < 0!) The objective is then to choose the xj to best match these preferences (i. . Here.) The temperature preferences differ among the inhabitants of the 10 cubicles. . Each vent j blows a stream of air at temperature xj . measured relative to the surrounding air (ambient air temperature. . . Vent 1 lies exactly on the horizontal. It also provides code for computing the distances from each vent to each desired temperature location. matrix norm. . A circular room consists of 10 identical cubicles around a circular shaft. The file temp_control. and a preferences vector y. with minimal cost. j (a) How would you choose the xj to best match the given preferences yi .) The temperatures may be hotter (xj > 0) or colder (xj < 0) than the ambient air temperature. There are 6 temperature-control systems in the room. . Using these data. But this was not meant to be an entirely realistic problem to start with! 15. . find the optimal vent temperatures x. n.e.m on the course webpage defines r_t. in general. choose any subset of {1. Show that T T AT ≥ 0 if and only if A ≥ 0. Show that A B Under what conditions do we have equality? 15.6 A Pythagorean inequality for the matrix norm.8 Properties of symmetric matrices. then the 15. (a) Show that f is positive semidefinite (i. (a) Show that if A and B are PSD and α ∈ R. Aii Ajj .. 15. (b) If P ≥ Q then −P ≤ −Q. . . What is the size of the smallest such F (i. (b) Suppose that T ∈ Rn×n is invertible. (a) Show that G = GT ≥ 0. . T AT T is called a congruence of A and T AT T and A are said to be congruent.5 Positive semidefinite (PSD) matrices. When T is invertible. For each statement below. (b) Show that any (symmetric) submatrix of a PSD matrix is PSD. (a) Let Z ∈ Rn×p be any matrix. (To form a symmetric submatrix. Suppose A = AT ∈ Rn×n . (a) If P ≥ 0 then P + Q ≥ Q.) (d) Show that if A is (symmetric) PSD. in the form f (x) = F x for some appropriate matrices F and G. Explain how to find such an F (when A ≥ 0). i = 1. then |Aij | ≤ entire ith row and column of A are zero. (c) Show that if A ≥ 0. Let f (x) = xT Ax (with A = AT ∈ Rn×n ) be a quadratic form.9 Express − xi )2 in the form xT P x with P = P T . (e) If P ≥ Q then P 2 ≥ Q2 .15. if Aii = 0. then so are αA and A + B.7 Gram matrices. . . 15. Is P ≥ 0? P > 0? 112 .. . Show that Z T AZ ≥ 0 if A ≥ 0. This problem shows that congruences preserve positive semidefiniteness. n} and then throw away all other columns and rows. either give a proof or a specific counterexample. (d) If P ≥ Q > 0 then P −1 ≤ Q−1 . . α ≥ 0. A ≥ 0) if and only if it can be expressed as f (x) = F x 2 for some matrix F ∈ Rk×n . In this problem P and Q are symmetric matrices.4 Congruences and quadratic forms. 15. fn are linearly dependent. . Given functions fi : [a.e. Suppose that A ∈ Rm×n and B ∈ Rp×n . (b) Show that G is singular if and only if the functions f1 . how small can k be)? (b) Show that f can be expressed as a difference of squared norms. n. 2 − Gx 2 . . Gij = a fi (t)fj (t) dt. How small can the sizes of F and G be? 15. (c) If P > 0 then P −1 > 0. Aii ≥ 0. n−1 i=1 (xi+1 Hint: you might find it useful for part (d) to prove Z ≥ I implies Z −1 ≤ I. b] → R. . In particular.3 Norm expressions for quadratic forms. . .e. the Gram matrix G ∈ Rn×n associated with them is defined by b ≤ A 2 + B 2. which satisfies A = AT . n. How are the eigenvalues and singular values related? 15.) You can assume the eigenvalues (and of course singular values) are sorted. B for which the statement does not hold. (a) σmax (X) ≥ max1≤i≤n (b) σmin (X) ≥ min1≤i≤n 1≤j≤n 1≤j≤n |Xij |2 . 15. 15. You can assume (to simplify) that the largest singular value of A is isolated. (The singular values are based on the full SVD: If Rank(A) < n.. . .. Y.12 An invertibility criterion. . then A ≥ B. (In practice it always works. then {x|xT Ax ≤ 1} ⊆ {x|xT Bx ≤ 1}.. σ1 > σ2 . . eAt ≥ eBt . (c) σmax (XY ) ≤ σmax (X)σmax (Y ). . λi (A) = λi (B) for i = 1.. then A ≥ B. then try x = ei + ej . (f) If A ≥ B then for all t ≥ 0. (‘True’ means the statement holds for all A and B. . How large can λmax (A) be? Suppose A = AT ∈ 15. (c) If A ≤ B..e.. . Let z(0) = a ∈ Rn be nonzero. z(t + 1) = AT w(t).e. .e. Here X. . assume that A or B is positive semidefinite. The following method can be used to compute the largest singular value (σ1 ). i. . and let σ1 . . j = 1. (d) If the eigenvalues of A and B are the same. . . j = 1. . . . . i. . with |Aij | ≤ 1. A ≤ B means B − A is positive semidefinite). Show that A = B.e. the symbol ≤ between symmetric matrices denotes matrix inequality (e. (h) If Aij ≥ Bij for i. Show that it ‘usually’ works. . . then there is an orthogonal matrix Q such that A = QT BQ.) (b) If {x|xT Ax ≤ 1} ⊆ {x|xT Bx ≤ 1}.. . Let λ1 .14 Some problems involving matrix inequalities. . . λi (A) = λi (B) for i = 1. Suppose that A ∈ Rn×n . λi (X) will denote its ith eigenvalue. Z ∈ Rn×n . 113 . Interpretation: every matrix whose distance to the identity is less than one is invertible. Analyze this algorithm. i. We do not. and then repeat the iteration w(t) = Az(t).15 Eigenvalues and singular values of a symmetric matrix. ‘false’ means there is at least one pair A. n. for t = 1. Show that A < 1 implies I − A is invertible. σn be the singular values of a matrix A ∈ Rn×n . . λ1 ≥ · · · ≥ λn and σ1 ≥ · · · ≥ σn . n. otherwise give a specific counterexample. |Xij |2 . w(t)/ w(t) ≈ u1 and z(t)/ z(t) ≈ v1 . and also the corresponding left and right singular vectors (u1 and v1 ) of A ∈ Rm×n . Hint: first try x = ei (the ith unit vector) to conclude that the entries of A and B on the diagonal are the same. Be very explicit about when it fails. . then some of the singular values are zero.) 15. 15. For each of the following statements. . prove it if it is true. Rn×n .16 More facts about singular values of matrices. however. i. For large t.e.15. In the following problems you can assume that A = AT ∈ Rn×n and B = B T ∈ Rn×n . (a) A ≥ B if λi (A) ≥ λi (B) for i = 1. xT Ax = xT Bx for all x. . n. i.11 A power method for computing A . . .13 A bound on maximum eigenvalue for a matrix with entries smaller than one. .g. As usual. For X = X T ∈ Rn×n . then the eigenvalues of A and B are the same. .10 Suppose A and B are symmetric matrices that yield the same quadratic form. (g) If A ≥ B then Aij ≥ Bij for i. Decide whether each of the following statements is true or false. sorted so λ1 (X) ≥ λ2 (X) ≥ · · · ≥ λn (X). λn be the eigenvalues. . . (e) If there is an orthogonal matrix Q such that A = QT BQ. n. i. j = 1. 2. n. . . yi = 0. Then show that σmax (A) ≤ A F ≤ rσmax (A).. These two equations ‘center’ our drawn graph.or post. Thus the Frobenius norm 2 2 (c) Show that A F = √σ1 + · · · + σr .j |Aij |2  1/2 . First of all. . So we can assume that n n xi = 0. it can have a small σmin . Can a matrix have all entries small and yet have a large gain in some direction. Ax ≤ A F x for all x. Another problem is that we can minimize J by putting all the nodes at xi = 0. 114 .17 A matrix can have all entries large and yet have small gain in some directions. 15. We let x ∈ Rn be the vector of x.19 Drawing a graph. One goal is that neighboring nodes on the graph (i.and y-coordinates so as to minimize the objective J= i<j. ones connected by an edge) should not be too far apart as drawn. where σ1 . i=1 i=1 yi = 0. which results in J = 0. 15. i. we draw node i at the location (xi . i i=1 xi yi = 0. =  i. To take this into account. This just means assigning an x. To force the nodes to spread out. i=1 i=1 x2 = 1. maximum singular value). . i∼j (xi − xj )2 + (yi − yj )2 .. Note also that it is much easier to compute the Frobenius norm of a matrix than the (spectral) norm (i.. the sum (or mean value) of the x. where i ∼ j means that nodes i and j are connected by an edge.e.e. for example. then U A F = AV F = A is not changed by a pre.and y-coordinates is zero. that is. is to make the drawn graph look good.) (a) Show that A F F = Thus the Frobenius norm is simply the Euclidean norm of the matrix when it is considered as an 2 element of Rn . we will choose the x. .coordinates of the nodes.e. The objective J is precisely the sum of the squares of the lengths (in R2 ) of the drawn edges of the graph.coordinates of the nodes. yi ) ∈ R2 .(d) σmin (XY ) ≥ σmin (X)σmin (Y ). What can you say about σmax (A)? 15. and y ∈ Rn be the vector of y.. (e) σmin (X + Y ) ≥ σmin (X) − σmax (Y ). σr are the singular values of A.orthogonal transformation. . (b) Show that if U and V are orthogonal.coordinate to each node. of course.18 √ Frobenius norm of a matrix. F. We consider the problem of drawing (in two dimensions) a graph with n vertices (or nodes) and m undirected edges (or links). We have to introduce some other constraints into our problem to get a sensible solution. j ≤ n. we impose the constraints n n n 2 yi = 1. that is.e. i.and a y. The problem. In particular. 106 106 A= 106 106 has “large” entries while A[1 − 1]T = 0. the objective J is not affected if we shift all the coordinates by some fixed amount (since J only depends on differences of the x. a large σmax ? Suppose. the sum of the diagonal entries. (Recall Tr is the trace of a matrix. The Frobenius norm of a matrix A ∈ Rn×n is defined as A Tr AT A. that |Aij | ≤ ǫ for 1 ≤ i. For example.and y-coordinates). When we draw the graph. m also contains the vectors x_circ and y_circ.m. The mfile dg_data. . with eigenvalue decomposition A = i=1 λi qi qi . so Aii = 0. . say. and Q ∈ R2×2 is any orthogonal matrix. at least approximately. the last says that the x. x1 . and have the same value of J. You are welcome to use the results described below. . . you may not refer to any matlab commands or operators. your description must be entirely in mathematical terms. n yi ˜ yi satisfy the centering and spreading constraints. You know that a solution of the problem minimize xT Ax subject to xT x = 1. you get another set of coordinates that is just as good. we do not have self-loops.and ycoordinates are uncorrelated. We’ll just live with this ambiguity. Verify that your x and y satisfy the centering and spreading conditions. . This mfile contains the node adjacency matrix of the graph.. Even with these constraints. if x and y are any set of coordinates. [x_circ y_circ]. The radius of the circle is chosen so that x_circ and y_circ satisfy the centering and spread constraints.e. denoted A. The related maximization problem is maximize xT Ax subject to xT x = 1. You can use any method or ideas we’ve encountered in the course.’o-’). (You don’t have to know what variance or uncorrelated mean. . Let A ∈ Rn×n n T be symmetric. by placing the nodes in order on a circle.. finds a set of coordinates with J as small as it can possibly be). qn } orthonormal. which sets the x. with variable x ∈ Rn . [x y]. where the variable is X ∈ Rn×k . You can draw the given graph this way using gplot(A. the objective can be k written in terms of the columns of X as Tr(X T AX) = i=1 xT Axi . If your method is iterative.) Draw your final result using the commands gplot(A. This means that if you have a proposed set of coordinates for the nodes.e. . (The graph is undirected. plot the value of J versus iteration. (b) Implement your method. Here’s the question: (a) Explain how to solve this problem. so A is symmetric.. then the coordinates given by xi ˜ xi =Q . . Hint.coordinates is one. and {q1 . where the variable is x ∈ Rn . given the graph topology. Give the value of J achieved by your choice of x and y. and Aij = 0 otherwise. A solution to this problem is x = q1 . Now consider the following generalization of the first problem: minimize Tr(X T AX) subject to X T X = Ik . but perhaps not the absolute best). . A solution of this problem is i 115 . For example. then by rotating or reflecting them. .e. and Ik denotes the k × k identity matrix. . n. according to our objective. are orthonormal.scales to be equal. is x = qn . . xk . i. with λ1 ≥ · · · ≥ λn . and carry it out for the graph given in dg_data. Be clear as to whether your approach solves the problem exactly (i. Also. . without proving them.’o-’). In describing your method. which plots the graph and axis(’square’).y. and we assume k ≤ n. the coordinates that minimize J are not unique. or whether it’s just a good heuristic (i.The first two say that the variance of the x. These coordinates are obtained using a standard technique for drawing a graph.and y. i = 1. a choice of coordinates that achieves a reasonably small value of J. . . these are just names for the equations given above. and defined as Aij = 1 if nodes i and j are connected by an edge.) The three equations above enforce ‘spreading’ of the drawn graph. how to find x and y that minimize J subject to the centering and spreading constraints. axis(’square’).. for i = 1. The constraint means that the columns of X. . The input u is ground displacement (during an earthquake). B= 1 −1 . . . As usual. B. In other words. The input u has the form u(t) = qv(t). SVD. 15. x(0) = 0. C= 1 2 . y-.21 Finding worst-case inputs. where q ∈ R3 is a (constant) vector with q = 1. single output system x(t + 1) = Ax(t) + Bu(t). .5 0. y(t) = Cx(t). y-.e. 116 .coordinates of the) displacement of base. 15. Give the directions and the associated values of D (Dmax and Dmin .. A suspension system is connected at one end to a base (that can move or vibrate) and at the other to the load (that it is supposed to isolate from vibration of the base). y(t) = Cx(t). is a very simple (discretized and lumped) dynamical model of a building. .20 Approximate left inverse with norm constraints. QR. Give an explicit description of an optimal F . 15. as well as give explicit numerical answers and plots. and y(t) ∈ R3 represents the (x-. The data A. The response of the system is judged by the RMS deviation of the load over a 100 sample interval. the system is described by x(t + 1) = Ax(t) + Bu(t). Your description can involve standard matrix operations and decompositions (eigenvector/eigenvalue. 99 t=0 y(t)2 be? Plot an input u that maximizes y(t)2 . D= 1 100 100 1/2 y(t) t=1 2 ..7 . v(99) are known (and available in the mfile worst susp data. you must explain how you solve the problem. and z.e. i. Suppose A ∈ Rm×n is full rank with m ≥ n. We seek a matrix F ∈ Rn×m that minimizes I − F A subject to the constraint F ≤ α. the driving displacement u is always in the direction q. C. along with the resulting output y. and the direction qmin ∈ R3 that minimizes D.m on the course web site).22 Worst and best direction of excitation for a suspension system. The related maximization problem is maximize Tr(X T AX) subject to X T X = Ik . and z. Note that I − F A gives a measure of how much F fails to be a left inverse of A. x(0) = 0. along with the resulting (b) How large can |y(100)| be? Plot an input u that maximizes |y(100)|. The problem is to find the direction qmax ∈ R3 that maximizes D. i. v(0). The 49 input u is known to satisfy t=0 u(t)2 ≤ 1 and u(t) = 0 for t ≥ 50. Suitably discretized. the earthquake has energy less 99 t=0 than one. respectively). . where α > 0 is given. . . The single-input. where u(t) ∈ R3 represents the (x-. this reduces to the first problem above. (a) How large can output y. with variable X ∈ Rn×k . and only lasts 50 samples. Note that when k = 1.X = [qn−k+1 · · · qn ].9 −0.coordinates of the) displacement of the load. ). with amplitude given by the (scalar) signal v. and y gives the displacement of the top of the building. A solution of this problem is X = [q1 · · · qk ]. and v(t) ∈ R gives the displacement amplitude versus time.5 0. where A= 0. 1 0. we saw two different ways of representing an ellipsoid. You can assume that xi (0) > 0 for i = 1. The second is as the image of a unit ball under a linear mapping: E2 = { y | y = Ax. In the lectures. i. where T satisfies 0 < T < 10. explain how to find all S that yield E1 = E2 . the biomass of each strain changes through several mechanisms including cell division. The total biomass at time t is given by 1T x(t) = x1 (t) + x2 (t) + x3 (t). t = 10.012 . explain how to find an S so that E1 = E2 . x2 (3.e.3 0 0 −0.1. (The estimation method won’t make any use of the information that xi (0) > 0.2 0. We’ll judge accuracy by the maximum value ˆ 2 2 2 that x(0) − x(0) can have. i. for the T you choose).23 Two representations of an ellipsoid. (c) Selection of optimal assay time. x ≤ 1 } . and submit your matlab code as well the output it produces. The three measurements of total biomass (which are called assays) will include a small additive error.) The problem here is to choose the time T of the intermediate assay in such a way that the estimate of x(0).. converge to zero.01 .. called strain 1. ˆ 117 .. v2 2 2 2 (for the assay at t = T and v3 (for the assay at t = 10). Of course you must explain exactly what you are doing.012 . give a plausible story that explains. the result you found.. or biomass (in grams) of the strains present in the population at time t. 2 i. For example.15. 3. strain 2. 15. Find ˆ the optimal value for T (of course. with S T = S > 0. You can also assume that a good method for computing the estimate of x(0). (But you don’t need to know any biology to answer this question!) The population dynamics is given by x = Ax. with non-zero volume. Be sure to say what the optimal T is. denoted x(0). where ˙   −0.e. measured in hours. In 100 words or less. denoted v1 (for the assay at t = 0). given the measurements. A= 0. the value of T that minimizes the maximum value x(0) − x(0) can have. i.1 You can assume that we always have xi (t) > 0. (b) Given A. intuitively. (a) Give a very brief interpretation of the entries of the matrix A. the sum of the squares of the measurement errors is not more than 0. Posterior intuitive explanation. centered at 0. is as accurate as possible. Given A. the vector x(0) ∈ R3 . We are looking for an answer that is accurate to within ˆ ±0.e.e. We consider a population that consists of three strains of a bacterium. explain how to find an A so that E1 = E2 .24 Determining initial bacteria populations.4) denotes the amount of strain 2 (in grams) in the sample at time t = 3. based on measurements of the total biomass taken at t = 0. where 1 ∈ R3 denotes the vector with all components one. cell death. what is the significance of a13 = 0? What is the significance of the sign of a11 ? Limit yourself to 100 words. oscillate)? Explain how you arrive at your conclusion and show any calculations (by hand or matlab) that you need to do. Over time. For example. and strain 3. the biomass of each strain is always positive. You can assume that v1 + v2 + v3 ≤ 0.4 hours.1  . grow without bound). explain how to find all A that yield E1 = E2 . what the maximum value x(0) − x(0) is. The first uses a quadratic form: E1 = x xT Sx ≤ 1 . and mutation. (c) What about uniqueness? Given S. or not converge at all (for example. 2. You may use phrases such as ‘the presence of strain i enhances (or inhibits) growth of strain j’. (b) As t → ∞. will be used. and what the optimal accuracy is (i.e. over all measurement errors that satisfy v1 + v2 + v3 ≤ 0.1 0 −0.. between 0 and 10). does the total biomass converge to ∞ (i. and t = T .e. (a) Given S. with det(A) = 0. The vector x(t) ∈ R3 will denote the amount. A biologist wishes to estimate the original biomass of each of the three strains.. . defined by Aij = 1 if there is a link connecting nodes i and j 0 otherwise. When the limits don’t. . 15. it gives the asymptotic fraction of all (long) paths that go from node i to node j.15.. . . x(N ) . . i. we say that Cij isn’t defined. . j) . A path from node i to node j. i. (Be very clear about which one you choose. which we represent as E = {x ∈ Rn | xT Ax = 1}. We say that each node is connected to itself by a path of length zero.26 Recovering an ellipsoid from boundary points. so A = 0. then the sequence 3. Note that a path can include loops. j) m→∞ when the limits exist. (b) A has distinct eigenvalues. A35 = 1).. connected by links.. . etc. and also that the matrix A you find does. |λ1 | > |λi | for i = 2. k2 . ˙ y = Cx.e. equal to Cij .m. with the property that Ak1 . 15. The number Cij gives a good measure of how “connected” nodes i and j are in the graph.e. where λ1 . (c) A has distinct singular values. . We consider a linear system of the form x = Ax. described by its adjacency matrix A ∈ Rn×n .25 A measure of connectedness in a graph. 118 . When Cij exists. Be sure to explain how you check that the ellipsoid you find is reasonably consistent with the given data. You must explain why your expression is. (d) A is diagonalizable. You can make one of the following assumptions: (a) A is full rank. j) denote the total number of paths of length m from node i to node j. the matrix A. We assume the graph has no self-loops. For example. We consider an undirected graph with n nodes. explain why Cij exists. correspond to an ellipsoid. and then carry it out on the data given in the mfile ellip_bdry_data. . that start at i and end at j. To simplify the explanation.k2 = · · · = Akm . there is no requirement that each node be visited only once.e. you can give it for the case n = 4 (which is the dimension of the given data). Aii = 0. and derive an expression for Cij . We define Cij = lim Pm (i. where A = AT ∈ Rn×n ≥ 0.27 Predicting zero crossings.j=1 Pm (i. . Let Pm (i. n i. (e) A has a dominant eigenvalue. . of length m > 0. You are given a set of vectors x(1) . 5 is a path between node 3 and node 5 of length 3. in fact. .. n. . pseudo-inverse.km+1 = 1. given the observed data x(1) .) Using your assumption. 3. eigenvalues and eigenvectors. λn are the eigenvalues of A. We assume that the graph has at least one link. But it should be clear from your discussion how it works in general. and the denominator is the total number of paths of length m. but you cannot leave a limit in your expression. . in fact. . is an m + 1-long sequence of nodes. You can use any of the concepts from the class. Explain your approach to this problem. . . Note that A = AT . such as singular values and singular vectors. 5. if node 3 and node 5 are connected by a link (i. . km+1 = j. In the fraction in this equation. x(N ) ∈ Rn that are thought to lie on or near the surface of an ellipsoid centered at the origin. More precisely it is a sequence i = k1 . so the ratio gives the fraction of all paths of length m that go between nodes i and j. . Your job is to recover. at least approximately. the numerator is the number of paths of length m between nodes i and j. . i−1 hi = j=1 wj ci−j . You can assume these times are given in increasing order. . i. . .) Note that the zero-crossing times are given to three significant digits. . i = 2. . . w2 .e. (So here we have p = 4. where we take wi and ci to be zero for i ≤ 0 or i > n. i. 15.. . . if that’s part of your procedure. 0 ≤ t1 < · · · < tp . .) Be sure to make it clear which one of these options you choose. but we do not know x(0). that y(t) = 0 for tp < t < tp+1 . i. .where x(t) ∈ Rn and y(t) ∈ R.28 Optimal time compression equalizer. t4 as the first 4 zero-crossings.000. Hint: Be careful estimating rank or determining singularity. (Note that this definition of zero-crossing times doesn’t require the output signal to cross the value zero..e. (To make things simple. . can we predict the next zero-crossing time tp+1 ? (This means. We cannot directly observe the output. We are given the (finite) impulse response of a communications channel. tp at which y(ti ) = 0.000. but have different 5th zero-crossings.e. . wn . . • If you think that you cannot determine the next zero-crossing time t5 . . and y(tp+1 ) = 0. ∗c ∗w 119 ..000. the real numbers c1 . We know A and C. . 2n. and that y(t) = 0 for 0 ≤ t < tp and t = t1 . remember that the zero-crossing times are only given to three significant figures. we are given the zero-crossing times t1 . . t3 = 2.143. . t2 = 1.. and find the next time t5 (to at least two significant figures). explain in detail why. .) We are interested in the following question: given A. . the equalizer has the same length as the channel. cn . . i.. c2 . . it is enough to just have the value zero. and not just in the last significant digit. of course. explain in detail how to do it.e. . . and the zero-crossing times t1 . t = tp . . and find two trajectories of the system which have t1 . the real numbers w1 . . you must do one of the following: • If you think that you can determine the next zero-crossing time t5 .  0 0 0 1  −18 −11 −12 −2 and zero-crossing times t1 = 0. (The zero-crossings should differ substantially.) The equalized channel response h is given by the convolution of w and c.) You will answer this question for the specific system   0 1 0 0  0 0 1 0  . .e. This is shown below. but we do know the times at which the output is zero. tp . . . t4 = 3. Our goal is to design the (finite) impulse response of an equalizer. i. C. Specifically. A= C= 1 1 1 1 . The safe operating region for the state is the unit ball B = { x | x ≤ 1 }. eigenvalues and eigenvectors. and its −1 inverse P = Wc . Your description and justification must be very clear. the controllability Gramian Wc . or DTE. . The input u is beyond our control. i = 1. or even convolution. but we won’t go into that here. . the equalized response h. we first define the total energy of the equalized response as 2n Etot = i=2 h2 . 15.g. You can appeal to any concepts used in the class. . and explain why it works. least-squares. least-norm. . (Emin can also be justified as a system security measure on statistical grounds. where x(t) ∈ Rn and u(t) ∈ Rm . (b) Suppose the safe operating region is the unit cube C = { x | |xi | ≤ 1. Let Emin denote the minimum energy required to drive the state outside the unit cube cube C. it tells us what fraction of the energy in h is contained in the time interval t = n + 1 − k. Repeat part (a) for Emin . The hope is that input u will not drive the state outside the safe operating region. matrix exponential. (Note that we do not specify t. and give the DTE for your w. . e. the time at which the state is outside the safe operating region. (b) Carry out your method for time compression length k = 1 on the data found in time_comp_data. Please note: You do not need to know anything about equalizers. To define this formally. Make sure you give the simplest possible expression for Emin . B. we can be fairly confident that the state will not leave the safe operating region. or other matrices derived from them such as the controllability matrix C.m. x(0) = 0. everything you need to solve this problem is clearly defined in the problem statement.The goal is to choose w so that most of the energy of the equalized impulse response h is concentrated within k samples of t = n + 1. n } instead of the cube unit ball B. as DTE = Edes /Etot . we define the desired to total energy ratio. but we ˙ have some idea of how large its total energy ∞ 0 u(τ ) 2 dτ is likely to be. t = n + 1 + k. we have Etot > 0.) (a) Find Emin explicitly. You can assume that h is such that for any w = 0. communications channels. i and the energy in the desired time interval as n+1+k Edes = i=n+1−k h2 . .29 Minimum energy required to leave safe operating region. Plot your solution w. i For any w for which Etot > 0. 120 . singular values and singular vectors. Thus number is clearly between 0 and 1. One measure of system security that is used is the minimum energy Emin that is required to drive the state outside the safe operating region: t Emin = min 0 u(τ ) 2 dτ x(t) ∈ B . and so on. We consider the stable controllable system x = Ax + Bu. Your solution should be in terms of the matrices A. . ..) If Emin is much larger than the energy of the u’s we can expect. where k < n − 1 is given. (a) How do you find a w = 0 that maximizes DTE? You must give a very clear description of your method. ) The system is controllable. We apply an input sequence u(0). u(T − 1) that reaches position f ∈ R3 at time T (where T ≥ n). y(2N − 1). the output energy is defined as Eout = 2N −1 t=N N −1 t=0 u(t)2 . (Of course E will depend on the data A. Still. This vehicle is to be overtaken (intercepted) by the first vehicle at time T . the energy of the minimum energy input. (We take u(t) = 0 for t ≥ N . u(N − 1). . . the time of intercept (T ) is known to the second vehicle. i. y(t) ∈ R. C. for several reasons: both vehicles start from the zero initial state. H. .. . . . . u(N − 1) to maximize the ratio of output energy to input energy. Express your answer in terms of the data A. .) We define the input energy as Ein = and similarly.e. C. 121 .e. The initial state is zero. F .15. . y(t)2 . . . and u(t) ∈ Rm is the input signal. . .30 Energy storage efficiency in a linear dynamical system. in the sense that is requires the first vehicle to expend the largest amount of input energy to overtake it. and the place of intercept (w(T )) is known ahead of time to the first vehicle. i. and are interested in the output over the next N samples. and standard matrix functions (inverse. . How would you find v(0). We consider the discrete-time linear dynamic system x(t + 1) = Ax(t) + Bu(t). y(T ) = f . . v(T − 1) that maximizes the minimum energy the first vehicle must expend to intercept the second vehicle at time T . y(t) = Cx(t).31 Energy-optimal evasion.) (b) Energy-optimal evasion. Give an expression for E. and u(t). ). subject to a limit on input energy. and v(t) ∈ Rm is the input signal. y(t) ∈ R3 is the vehicle position. where x(t) ∈ Rn . i. given the limit on input energy the second vehicle is allowed to use. y(N ). . A vehicle is governed by the following discrete-time linear dynamical system equations: x(t + 1) = Ax(t) + Bu(t). Remark: This problem is obviously not a very realistic model of a real pursuit-evasion situation. v(0) 2 + · · · + v(T − 1) 2 ≤ 1? The input v is maximally evasive. w(t) ∈ R3 is the vehicle position. and minimizes the input ‘energy’ u(0) 2 + · · · + u(T − 1) 2 . SVD. it’s possible to extend the results of this problem to handle a realistic model of a pursuit/evasion. B. . . z(0) = 0 where z(t) ∈ Rn is the state of the vehicle. y(t) = Cx(t). . Now consider a second vehicle governed by z(t + 1) = F z(t) + Gv(t). The input u is the (energy) optimal input for the vehicle to arrive at the position f at time T . B. G. Here x(t) ∈ Rn is the vehicle state.. i. .e. x(0) = 0. to maximize Eout /Ein ? What is the maximum value the ratio Eout /Ein can have? 15. the equation above is the result of a suitable sampling. . x(0) = 0. This means that w(T ) = y(T ). w(t) = Hz(t). Find the input u(0). transpose.. . and f . . where T ≥ n.. (a) Minimum energy to reach a position. How would you choose the (nonzero) input sequence u(0). (The vehicle dynamics are really continuous.e. Here’s the problem: suppose the system fails for one second. We consider a (time-invariant) linear dynamical system x = Ax + Bu. so it has the form u(t) = gδ(t − Timp ). If either the worst-case impact time or the worst-case impact vector do not depend on some of the problem data (i. At time t = Tf . We are interested in the state trajectory over the time interval [0. for Tf ≤ t ≤ Tf + Tc . and is very commonly used in automatic control systems. Your approach can include a simple numerical search (such as plotting a function of one variable to find its maximum). no matter what x(0) is. We are interested in the deviation D between x(T ) and xnom (T ).m.01). had the impact not occured (i. 9]) and the initial condition x(0) (with x(0) ≤ 1) that maximizes x(10) .) We let xnom (T ) denote the state. We don’t know what x(0) is. B.e. the worst-case impact vector. which means that input u is meant to drive the state to zero as t → ∞. as measured by the norm: D = x(T ) − xnom (T ) . (a) Explain how to find the worst-case impact time. you must explain your approach clearly and completely.32 Worst-case analysis of impact. As usual. i. and T ) say so. In essence. corrected) Tc seconds after it occurs. In this problem the input u represents an impact on the system. We’ll call the choices of Timp and g that maximize D the worst-case impact time and worst-case impact vector.e. (3) (This is called state feedback.15. the system is governed by the equations (3). respectively. In other words. and over all possible impact times between 0 and T− . D measures how far the impact has shifted the state at time T . and T . We assume that 0 ≤ Timp ≤ T− . You can use any of the concepts we have encountered in the class. (b) Get the data from wc_impact_data.e. A.. due to an impact of magnitude no more than one. which defines A. we have 0 ≤ Tf ≤ 9. g ≤ 1).e. at time T . but you can assume that x(0) ≤ 1. xinit . and K. and Timp is the time of the impact.m on the class web site. the worse possible x(0). we would like to know the maximum possible state deviation. 9].. over all possible impact directions and magnitudes no more than one (i. of the linear system xnom = Axnom . xinit .e. at time t = T . B. where g ∈ Rm is a vector that gives the direction and magnitude of the impact. and Tc = 1. and input u(t) ∈ Rm . We would like to know how large D can be. Thus. An accuracy of 0. in the mfile fault_ctrl_sys. The problem is to find the time of failure Tf (in [0. we have x(t) = Ax(t). Your explanation should be as short and clear as possible. xinit . and T . the worst failure time Tf . and the corresponding value of D. 122 . In this problem we consider a system that under normal circumstances is governed by the equations x(t) = Ax(t) + Bu(t). and the worst-case impact vector. and carry out the procedure described in part (a). a fault occurs. and the resulting value of x(10) . B. and the results.01 for Tf is fine. given the problem data A. if needed.) Here the application is a regulator. however. Be sure to give us the worst-case impact time (with absolute precision of 0.. (Timp = T− means that the impact occurs right at the end of the period of interest. You’ll find ˙ the specific matrices A. xnom (0) = xinit . ˙ u(t) = Kx(t). with state x(t) ∈ Rn . and the input signal becomes zero. 15.. You must also give your source code. T ]. B. some time in the time interval [0.33 Worst time for control system failure.. and does affect x(T ). The vector xnom (T ) is what the final state x(T ) ˙ of the system above would have been at time t = T . In other words. you are carrying out a worst-case analysis of the effects of a one second control system failure. ˙ x(0) = xinit . with u = 0). We also don’t know the time of failure Tf . The fault is cleared (i. for t < Tf and t > Tf + Tc . where A ∈ Rn×m . as an extreme case. . . provide a specific counterexample. provide a specific (numerical) counterexample. which might represent a collection of measurements or other data.34 Some proof or counterexample questions. if you think it is false. But these 30-dimensional data points can be ‘explained’ by a much smaller set of 5 ‘factors’ (the components of xi ). for all k ≥ 1. suppose N = 500. and no matter what values A and B have. B } ≤ A B ≤ A 2 + B 2. square) and full rank matrices A and B have the same number of rows.e. . for example. In this case we have 500 vectors. J= 1 N N i=1 1/2 yi − Axi − b 123 2 . . i = 1. we’ll use the RMS explanation error. . C. as usual. we have max{ A . and Akk = 0 for some k (between 1 and n). i. given only the data. You get five points for the correct solution (i. . where κ(Z) denotes. and the goal is to find m. possibly.. any element of A is a (1 × 1) submatrix of ˜ A. A. (Our main interest is in the case when N is much larger than n. . Then we have κ(A) ≤ κ([ A B ]). If the statement is true. and B and D have the same number of columns. (a) Suppose A = AT ∈ Rn×n satisfies A ≥ 0. but you can assume that the dimensions are such that all expressions make sense. and zero points otherwise. and m is smaller than n. Ak ≥ B k . the statement holds. Compatible dimensions means: A and B have the same number of rows. however. because there are (square) matrices for which this doesn’t hold. with A > B . Determine if the following statements are true or false.15. i. or discovering. two points for the right answer (i. (f) Suppose the fat (including. n = 30. This problem is about uncovering. .. because no matter what the dimensions of A and B (which must. with m < n. We refer to x as the vector of factors or underlying causes of the data y. the statement A2 = A is false. B ∈ Rn×n . . yN . For example. Then A is singular. each vector yi consists of 30 scalar measurements or data points. (b) Suppose A. . the condition number of the matrix Z.) Then we say that y = Ax + b is a linear explanation of the data y. Suppose we have yi ≈ Axi + b. C and D have the same number of rows. Consider a set of vectors y1 . As another example. Then. b. A C B D ≤ A C B D . and b ∈ Rn . In other words. the statement “A + B = B + A” is true. (This means A is obtained from A by removing some rows and columns from A. (d) For any A. .. . B. ..e. be the same). 15. What we mean by “true” is that the statement is true for all values of the matrices and vectors given. and x1 . ˜ ˜ (c) Suppose A is a submatrix of a matrix A ∈ Rm×n .) You can’t assume anything about the dimensions of the matrices (unless it’s explicitly stated).e. the right answer and a valid proof or counterexample). . .35 Uncovering a hidden linear explanation. we are given y1 . A and C have the same number of columns. In such a case. To judge the accuracy of a proposed explanation.e. a linear explanation of a set of data. the ratio of the largest to the smallest singular value. (e) For any A and B with the same number of columns. xN so that yi ≈ Axi + b. xi ∈ Rm . . (You can assume the entries of the matrices and vectors are all real. and m = 5. A = 2 (which is a matrix in R1×1 ). true or false). . D with compatible dimensions (see below). prove it. N.) Then A ≤ A . yN ∈ Rn . For example. 420.. the matrix A. Thus. so the RMS explanation error is zero. b. with full SVD C = U ΣV T . . xN . for any A and B with the given sizes and given singular values. . find A and B that achieve the values you give. . But this is not what we’re after.000. a linear explanation of a set of data is not unique. for i = 1. as in (a) 12. . (You don’t need to know what covariance is — it’s just a definition here. with Σ = diag(σ1 . the number of factors in the explanation. have the required properties.. Let C = [A B] ∈ R6×6 .m available on the course web site.) Generally. (This is just an example. value of J. (We allow the possibility that some of these singular values are zero. N . b x1 = F x1 + g. . Suppose A. yi − Axi − b .) Finally. . it is usually assumed that 1 N N xi = 0.e. . (a) Explain clearly how you would find a hidden linear explanation for a set of data y1 .) By explicit computation verify that the vectors x1 . linear plus constant) mapping to the underlying factors xi . let F ∈ Rm×m be invertible. we have yi = Axi + b.000. ˜ . . with xi ∈ Rm .. ˜ = b − AF −1 g. . 3. substantially fewer factors than the dimension of the original data (and for this smaller dimension. 15. .One rather simple linear explanation of the data is obtained with xi = yi .) Briefly justify your answers. (b) Carry out your method on the data in the file linexp_data. 2. the underlying factors have an average value zero.552. Sort these norms in descending order and plot them. To be a useful explanation. and x1 . we want to have m substantially smaller than n. Even if we fix the number of factors as m. and we have another linear explanation of the original data. . ˜ A = AF −1 . and x1 . b. . 5. . subject to the constraint that J is not too large. xN is a linear explanation of our data. with singular values 7. . . and g ∈ Rm . Then we have yi ≈ Axi + b = (AF −1 )(F xi + g) + (b − AF −1 g). . and divide each vector xi by two. . (b) 10. . 1. xN . σ6 ).. In other words. . and b = 0. and the vectors x1 . the data explains itself! In this case. we want m. (c) 0. . . the vector b. i. . (This gives a good picture of the distribution of explanation errors. . Then we can multiply the matrix A by two (say). Suppose A ∈ R6×3 . . the dimension of the underlying causes. but hopefully small. xN obtained. . . Give your answers with 3 digits after the decimal place. The file gives the matrix Y = [y1 · · · yN ]. the problem. In other words. of course. we can apply any affine (i. and unit sample covariance. i=1 1 N N xi xT = I. A = I. and generate another equally good explanation of the original data by appropriately adjusting A and b. 124 . Explain clearly why the vectors x1 . and B ∈ R6×3 . . .e. Give your final A. . More generally. yN . i i=1 In other words. we’ll accept a nonzero. how large (or small) can the specified quantity be. and verify that yi ≈ Axi + b by calculating the norm of the error vector.) (a) How large can σ1 be? (b) How small can σ1 be? (c) How large can σ6 be? (d) How small can σ6 be? What we mean is. to be as small as possible. (d) 0.36 Some bounds on singular values. xN = F xN + g ˜ is another equally good linear explanation of the data. To standardize or normalize the linear explanation. xN have the required properties.. . with singular values 2. Be sure to say how you find m. . . σmin refers to σn (the nth largest singular value).39 Possible values for correlation coefficients. which has all eigenvalues real and positive. You may use the fact that C > 0 when C23 = 0. and κ refers to the condition number. A ∈ Rn×n .8  1 125 . x = 1? What is the resulting value of y T Ax?  0. Suppose that a correlation matrix has the form below:  1 0.4 1 C23 C=  −0. for any n. A . (d) κ(eA ) ≤ e2 A (c) κ(eA ) ≤ eκ(A) . Of course. A correlation matrix C ∈ Rn×n is one that has unit diagonal entries. n..) Whether or not you give an analytical description. Cii = 1. What is the correct way to check whether xT Ax ≥ 0 holds for all x? (You are allowed to find eigenvalues in this process. The only answers we will read are ‘True’.) (b) Find symmetric matrices A and B for which neither A ≥ B nor B ≥ A holds.e.38 Some true-false questions.2 C23 1 0. (a) Find a (square) matrix A. . . matrix exponential.15.) do so. If you can give an analytical solution in terms of any concepts from the class (eigenvalues. you must explain how you find the possible values that C23 can take on. (This last choice will receive partial credit.) If you write anything else. i.) Moral: You cannot use positivity of the eigenvalues of A as a test for whether xT Ax ≥ 0 holds for all x. In the following statements. C23 = 0 is a possible value. and is symmetric and positive semidefinite. (b) σmin (e ) ≥ eσmin (A) . . Of course. give a numerical description (and explain your method.3 −0. but you don’t need to know anything from these fields to solve this problem. that maximize y T Ax. How would you find vectors y and x. (Give A and x. if it differs from the analytical method you gave). .1 0. do not write justification for any answer. pseudoinverse.2  0. (a) eA ≤ e A A . 15. Tell us whether each statement is true or false. and the eigenvalues of A. ‘True’ means that the statement holds for any matrix A ∈ Rn×n . Correlation matrices come up in probability and statistics. and ‘My attorney has advised me to not answer this question at this time’.3 −0. etc. but there is a vector x for which xT Ax < 0. 15. (f) Rank(eA − I) ≤ Rank(A). for i = 1. 15.40 Suppose that A ∈ Rm×n . (In particular. In particular. you will receive no credit for that statement. we’d like the simplest examples in each case. ‘False’. singular values. subject to y = 1.4 −0. or provide any counter-examples. ‘False’ means that the statement is not true.37 Some simple matrix inequality counter-examples.8 What are all possible values of C23 ? Justify your answer. (e) Rank(e ) ≥ Rank(A).1   0. j) ∈ K).) (a) Explain how to find w. you are welcome to flip the sign on w. This doesn’t determine w uniquely. t = 1.e. Once you’ve chosen w. Give your error rate.) ij We let p = |K| denote the number of known entries. . (You are given Aknown for (i. So we can only hope to recover either an approximation of st or of −st . You are told that A ∈ Rm×n has rank ≤ r. T. possibly. for (i. whose columns are yt . you carry out the following steps: 126 . given the received signal. If you ˆ knew the vector a. j) ∈ K. .. along with an upper bound on its rank. where a ∈ Rn and vt ∈ Rn is a noise signal. . Here is one approach. and assuming that we have chosen w so that wT yt ≈ st . . . A(0) = Aknown for ((i. ˜ where sign(u) is +1 for u ≥ 0 and −1 for u < 0. we T can multiply it by −1 and it still minimizes w subject to (1/T ) t=1 (wT yt )2 = 1. which receives the (vector) signal yt = ast + vt ∈ Rn . . Give the weight vector w that you find.m contains the original signal. the vector a) is called blind signal estimation or blind signal detection. sT .50). ˆ You will use an alternating projection method to find an estimate A of A. you are to guess or estimate the remaining entries. (In practice we’d use other methods to determine whether we have recovered st or −st .2 Alternating projections for low rank matrix completion. . After choosing an initial ˆ(0) . as a row vector s. . that has the known correct entries (i. . 1} is transmitted to a receiver. The file bs_det_data. . This leads us to ˆ T choose w to minimize w subject to (1/T ) t=1 (wT yt )2 = 1. you are given some of the entries of a matrix.) The receiver will form an approximation of the transmitted signal as st = wT yt . . . . (This last statement is vague. Plot a histogram of the values of wT yt using hist(w’*Y. (b) Apply the method to the signal in the file bs_det_data. . one of which is described at the end of this problem.. This question investigates a heuristic method for the low rank matrix completion problem. (If ˜ this is more than 50%. . with st ∈ {−1. using concepts from the class. . yT . j) ∈ K. but it will not matter. T. then a reasonable choice for w would be w = a† = a/ a 2 . where K ⊆ ij {1. its negative −st ) is given by st = sign(wT yt ). given the received vector signal y1 . and that Aij = Aknown for (i. one negative and one positive. A binary signal s1 . Your job is to choose the weight vector w so that st ≈ st . we want w to be as small as possible. You’ll know you’re doing well if the result has two peaks. .) 16. if we don’t know a we really can’t do any better.Lecture 16 – SVD applications 16. . You are to estimate or guess the entries Aij .m. t = 1. a reasonable guess of st (or. This arises in several applications. . . the fraction of times for which st = st . . but is otherwise unknown. ˆ t = 1. you will alternate ˆ point A ij ij between two projections. . Since w vt gives the noise contribution to st . . . i. we would have T (1/T ) t=1 T (wT yt )2 ≈ 1. n} is the set of indices of the known entries. We’ll assume that a = 0. where w ∈ Rn is a weight vector. when you don’t know the mapping from transmitted to received signal (in this case. Ignoring the noise signal. Estimating the transmitted signal. .e. . . This choice is the smallest (in norm) vector w for which wT a = 1. For k = 0. which contains a matrix Y. . Here’s the catch: You don’t know the vector a. In the low rank matrix completion problem. j) ∈ K. . m} × {1. . . and that the noise signal is centered around zero.1 Blind signal detection. 1. T . j) for one known entry. ij ij To judge the performance of the algorithm. and the known matrix entries. We have access to some of the ratings. The p-vector Aknown gives the corresponding known values. . The graph is specified by its node incidence matrix A ∈ Rn×m . for k = 1. This file defines the rank upper bound r. Algorithms like this can be used for problems like the Netflix challenge. it is only for your amusement and interest. but overly complicated methods. A network consists of a directed graph with n nodes connected by m directed links. i. Do not use any matlab notation in your answer. Make a very brief comment about how well the algorithm worked on this data set. We will subtract points for technically correct. . Here Aij represents the rating user i gives (or would give) to movie j. that minimizes that is closest to A ˜ ˆ A(k) − A(k) ˜ subject to Rank(A(k) ) ≤ r.e. None of this is needed to solve the problem. for example. j) ∈ K.m.) It is reasonable to assume (and is confirmed with real data) that ratings matrices like A have (approximately) low rank. j) ∈ K. 16. and A = µ for (i. (Of course in ˆ applications.3 Least-squares matching of supply and demand on a network.  m n F = i=1 j=1 ˜(k) ˆ(k) (Aij − Aij )2  1/2 ˆ • Project to the closest matrix with the known entries. j) ∈ K.. You can allude to the fact that you are given only around one sixth of the entries of A. Remark. subject to Aij ij = ˆ (Aij i=1 j=1 (k+1) ˜(k) − Aij )2  1/2 This is a heuristic method: It can fail to converge at all. The known matrix indices are given as a p × 2 matrix K.e. and want to predict other ratings before they are given. the mfile also gives the actual matrix A. you would not have access to the matrix A!) Plot A(k) − A F . This can be interpreted as meaning that a user’s rating is (mostly) determined by a relatively small number of factors. with each row giving (i. Set A(k+1) to be the matrix with the given ˜(k) in Frobenius norm. Aij =  0 otherwise 127 . where   +1 edge j enters node i −1 edge j leaves node i . (a) Clearly explain how to perform each of these projections. depending on the starting point. the entries in the kth right singular vector tell us how much of factor k (positive or negative) is in each movie. The entries in the kth left singular vector tell us how much the user ratings are influenced (positively or negatively) by factor k. . 300. or it can converge to different limit points. (b) Use 300 steps of the alternating projections algorithm to find a low rank matrix completion estimate for the problem defined in lrmc.˜ • Project to the closest matrix satisfying the rank constraint. (This would allow us to make recommendations. i. Let µ denote the mean of all the known entries. Set Aij = ˆ(0) Aknown for (i. the dimensions m and n.. But it often works well. . that minimizes known entries that is closest to A ˆ ˜ A(k+1) − A(k)  m n F ˆ(k+1) = Aknown for (i. ˆ(0) Initialize your method as follows. Set A(k) to be the matrix of rank ≤ r ˆ(k) in Frobenius norm. if this doesn’t hold. the quantity of commodity at node i is equal to the original amount there (i. Here. qi ) plus any amount shipped in from neighboring nodes. We denote the post-shipment quantity at node i as qi . for p = 1. m.) We assume that i=1 qi = i=1 di . k − 1. Explain your method in the general case. do so. . . i. In addition. The data r ∈ Rm and c ∈ Rn must satisfy 1T r = 1T c. If you can give a nice formula for the optimal A. .. and the constraints are A1 = r.e. ri (which give the rows sums) are given. . If your method ˜ j involves matrix inversion (and we’re not saying it must).) 128 where 1 denotes a vector (of appropriate size in each case) with all components one.) Each node i has a quantity qi of some commodity.      32   14     4  34 12 . n. the objective can be written as J = Tr(AT A). . as well as a demand di for the commodity. the total quantity available equals the total demand. and you must limit your argument to one page. Explain why there always exists a shipment vector s ∈ Rm which results in q = d. j=1 i = 1. . You can refer ˜ to any concepts and results from the class. . This can be positive or negative: sj > 0 means that we ship the amount |sj | in the direction of the edge orientation. . perfect matching of supply and demand at each node. i=1 Aij = cj . . Hint: First characterize N (AT ). sj < 0 just means that we ship the amount |sj | in the direction opposite to the edge orientation.. . of nodes i = i1 . . ij subject to the constraints n m Aij = ri .  r= c=  22   28  .e. ˜ (a) Ability to match supply and demand. there is no A that can satisfy the constraints.e. as are cj (which give the column sums). (This means that i. Explain how to find a shipment vector s that m achieves q = d. ik = ˜ with an edge between ip and ip+1 . You can use any concepts from the class. (These n n are typically nonnegative. After shipment. minus any amount shipped out from node i. . . You can assume m n that i=1 ri = j=1 cj . i. Explain how to find the matrix A ∈ Rm×n that minimizes m n J= i=1 j=1 A2 . AT 1 = c.We assume the graph is connected: For any pair of distinct nodes i and ˜ with i = ˜ there is a sequence i. and minimizes j=1 s2 . (b) Least-squares matching of supply and demand. j = 1. but this won’t matter here. with m = 6 and n = 8.. 16. . you’ll need to justify that the inverses exist. We will ship an amount sj along each edge j. between any nodes there is a path. carry out your method for the specific data   24    20  30    18   16       26   8  . (Entries in A do not have to be integers. ignoring edge orientation. . Using matrix notation. . i.4 Smallest matrix with given row and column sums. 16.5 Condition number. Show that κ(A) = 1 if and only if A is a multiple of an orthogonal matrix. Thus the best conditioned matrices are precisely (scaled) orthogonal matrices. 16.6 Tightness of the condition number sensitivity bound. Suppose A is invertible, Ax = y, and A(x + δx) = y + δy. In the lecture notes we showed that δx / x ≤ κ(A) δy / y . Show that this bound is not conservative, i.e., there are x, y, δx, and δy such that equality holds. Conclusion: the bound on relative error can be taken on, if the data x is in a particularly unlucky direction and the data error δx is in (another) unlucky direction. 16.7 Sensitivity to errors in A. This problem concerns the relative error incurred in solving a set of linear equations when there are errors in the matrix A (as opposed to error in the data vector b). Suppose A is invertible, Ax = b, and (A + δA)(x + δx) = b. Show that δx / x + δx ≤ κ(A) δA / A . 16.8 Minimum and maximum RMS gain of an FIR filter. Consider an FIR filter with impulse response h1 = 1, h2 = 0.6, h3 = 0.2, h4 = −0.2, h5 = −0.1. We’ll consider inputs of length 50 (i.e., input signal that are zero for t > 50), so the output will have length (up to) 54, since the FIR filter has length 5. Let u ∈ R50 denote the input signal, and y ∈ R54 denote the resulting output signal. The RMS gain of the filter for a signal u is defined as g= √1 54 √1 50 y u , which is the ratio of the RMS value of the output to the RMS value of the input. Obviously, the gain g depends on the input signal. (a) Find the maximum RMS gain gmax of the FIR filter, i.e., the largest value g can have. Plot the corresponding input signal whose RMS gain is gmax . (b) Find the minimum RMS gain gmin of the FIR filter, i.e., the smallest value g can have. Plot the corresponding input signal whose RMS gain is gmin . (c) Plot the magnitude of the transfer function of the FIR filter, i.e., |H(ejΩ )|, where 5 H(ejΩ ) = k=1 hk e−jkΩ . Find the maximum and minimum absolute values of the transfer function, and the frequencies at which they are attained. Compare to the results from parts a and b. Hint: To plot the magnitude of the transfer function, you may want to use the freqz matlab command. Make sure you understand what freqz does (using help freqz, for example). (d) (This part is for fun.) Make a conjecture about the maximum and minimum singular values of a Toeplitz matrix, and the associated left and right singular vectors. 16.9 Detecting linear relations. Suppose we have N measurements y1 , . . . , yN of a vector signal x1 , . . . , xN ∈ Rn : yi = xi + di , i = 1, . . . , N. Here di is some small measurement or sensor noise. We hypothesize that there is a linear relation among the components of the vector signal x, i.e., there is a nonzero vector q such that q T xi = 0, i = 1, . . . , N . The geometric interpretation is that all of the vectors xi lie in the hyperplane q T x = 0. We will assume that q = 1, which does not affect the linear relation. Even if the xi ’s do lie in a hyperplane q T x = 0, our measurements yi will not; we will have q T yi = q T di . These numbers are small, assuming the measurement noise is small. So the problem of determing whether or not there is 129 a linear relation among the components of the vectors xi comes down to finding out whether or not there is a unit-norm vector q such that q T yi , i = 1, . . . , N , are all small. We can view this problem geometrically as well. Assuming that the xi ’s all lie in the hyperplane q T x = 0, and the di ’s are small, the yi ’s will all lie close to the hyperplane. Thus a scatter plot of the yi ’s will reveal a sort of flat cloud, concentrated near the hyperplane q T x = 0. Indeed, for any z and q = 1, |q T z| is the distance from the vector z to the hyperplane q T x = 0. So we seek a vector q, q = 1, such that all the measurements y1 , . . . , yN lie close to the hyperplane q T x = 0 (that is, q T yi are all small). How can we determine if there is such a vector, and what is its value? We define the following normalized measure: 1 N N ρ= (q T yi )2 i=1 1 N N yi 2 . i=1 This measure is simply the ratio between the root mean square distance of the vectors to the hyperplane q T x = 0 and the root mean square length of the vectors. If ρ is small, it means that the measurements lie close to the hyperplane q T x = 0. Obviously, ρ depends on q. Here is the problem: explain how to find the minimum value of ρ over all unit-norm vectors q, and the unit-norm vector q that achieves this minimum, given the data set y1 , . . . , yN . 16.10 Stability conditions for the distributed congestion control scheme. We consider the congestion control scheme in problem 3, and will use the notation from that problem. In this problem, we study the dynamics and convergence properties of the rate adjustment scheme. To simplify things, we’ll assume that the route matrix R is skinny and full rank. You can also assume that α > 0. Let xls = (RT R)−1 RT T target denote the least-squares approximate solution of the (over-determined) equa¯ tions Rx ≈ T target . (The source rates given by xls minimize the sum of the squares of the congestion ¯ measures on all paths.) (a) Find the conditions on the update parameter α under which the rate adjustment scheme converges to xls , no matter what the initial source rate is. ¯ (b) Find the value of α that results in the fastest possible asymptotic convergence of the rate adjustment algorithm. Find the associated asymptotic convergence rate. We define the convergence rate as the smallest number c for which x(t) − xls ≤ act holds for all trajectories and all t (the ¯ constant a can depend on the trajectory). You can give your solutions in terms of any of the concepts we have studied, e.g., matrix exponential, eigenvalues, singular values, condition number, and so on. Your answers can, of course, depend on R, T target , and xls . If your answer doesn’t depend on some of these (or even all of them) be sure to point ¯ this out. We’ll take points off if you give a solution that is correct, but not as simple as it can be. 16.11 Consider the system x = Ax with ˙  0.3132  −0.0897  A =  0.0845   0.2478 0.1744 0.3566 0.2913 0.2433 −0.1875 0.2315 0.2545 0.1888 −0.5888 0.2233 −0.1004 0.2579 0.4392 −0.0407 0.3126 −0.2111 0.2063 0.1470 0.1744 −0.6711 0.0428  (a) Find the initial state x(0) ∈ R5 satisfying x(0) = 1 such that x(3) is maximum. In other words, find an initial condition of unit norm that produces the largest state at t = 3.   .   (b) Find the initial state x(0) ∈ R5 satisfying x(0) = 1 such that x(3) is minimum. To save you the trouble of typing in the matrix A, you can find it on the web page in the file max min init state.m. 130 16.12 Regularization and SVD. Let A ∈ Rn×n be full rank, with SVD n A= i=1 T σ i u i vi . (We consider the square, full rank case just for simplicity; it’s not too hard to consider the general nonsquare, non-full rank case.) Recall that the regularized approximate solution of Ax = y is defined as the vector xreg ∈ Rn that minimizes the function Ax − y 2 + µ x 2, where µ > 0 is the regularization parameter. The regularized solution is a linear function of y, so it can be expressed as xreg = By where B ∈ Rn×n . (a) Express the SVD of B in terms of the SVD of A. To be more specific, let n B= i=1 σ i u i vi ˜ ˜ ˜T denote the SVD of B. Express σi , ui , vi , for i = 1, . . . , n, in terms of σi , ui , vi , i = 1, . . . , n (and, ˜ ˜ ˜ possibly, µ). Recall the convention that σ1 ≥ · · · ≥ σn . ˜ ˜ (b) Find the norm of B. Give your answer in terms of the SVD of A (and µ). (c) Find the worst-case relative inversion error, defined as max y=0 ABy − y . y Give your answer in terms of the SVD of A (and µ). 16.13 Optimal binary signalling. We consider a communication system given by y(t) = Au(t) + v(t), Here • • • • u(t) ∈ Rn is the transmitted (vector) signal at time t y(t) ∈ Rm is the received (vector) signal at time t v(t) ∈ Rm is noise at time t t = 0, 1, . . . is the (discrete) time t = 0, 1, . . . . Note that the system has no memory: y(t) depends only on u(t). For the noise, we assume that v(t) < Vmax . Other than this maximum value for the norm, we know nothing about the noise (for example, we do not assume it is random). We consider binary signalling, which means that at each time t, the transmitter sends one of two signals, i.e., we have either u(t) = s1 ∈ Rn or u(t) = s2 ∈ Rn . The receiver then guesses which of the two signals was sent, based on y(t). The process of guessing which signal was sent, based on the received signal y(t), is called decoding. In this problem we are only interested in the case when the communication is completely reliable, which means that the receiver’s estimate of which signal was sent is always correct, no matter what v(t) is (provided v(t) < Vmax , of course). Intuition suggests that this is possible when Vmax is small enough. (a) Your job is to design the signal constellation, i.e., the vectors s1 ∈ Rn and s2 ∈ Rn , and the associated (reliable) decoding algorithm used by the receiver. Your signal constellation should minimize the maximum transmitter power, i.e., Pmax = max{ s1 , s2 }. You must describe: 131 • • • • your analysis of this problem, how you come up with s1 and s2 , the exact decoding algorithm used, how you know that the decoding algorithm is reliable, i.e., the receiver’s guess of which signal was sent is always correct. (b) The file opt_bin_data.m contains the matrix A and the scalar Vmax . Using your findings from part 1, determine the optimal signal constellation. 16.14 Some input optimization problems. with  1 0 0  1 1 0 A=  0 1 1 1 0 0 In this problem we consider the system x(t + 1) = Ax(t) + Bu(t),      1 0 1 0  0   0 1  0    , x(0) =  B=  −1  .  1 0 , 0  1 0 0 0 (a) Least-norm input to steer state to zero in minimum time. Find the minimum T , Tmin , such that x(T ) = 0 is possible. Among all (u(0), u(1), . . . u(Tmin − 1)) that steer x(0) to x(Tmin ) = 0, find the one of minimum norm, i.e., the one that minimizes JTmin = u(0) Give the minimum value of JTmin achieved. 2 + · · · + u(Tmin − 1) 2 . (b) Least-norm input to achieve x(10) ≤ 0.1. In lecture we worked out the least-norm input that drives the state exactly to zero at t = 10. Suppose instead we only require the state to be small at t = 10, for example, x(10) ≤ 0.1. Find u(0), u(1), . . . , u(9) that minimize J9 = u(0) 2 + · · · + u(9) 2 subject to the condition x(10) ≤ 0.1. Give the value of J9 achieved by your input. 16.15 Determining the number of signal sources. The signal transmitted by n sources is measured at m receivers. The signal transmitted by each of the sources at sampling period k, for k = 1, . . . , p, is denoted by an n-vector x(k) ∈ Rn . The gain from the j-th source to the i-th receiver is denoted by aij ∈ R. The signal measured at the receivers is then y(k) = A x(k) + v(k), k = 1, . . . , p, where v(k) ∈ Rm is a vector of sensor noises, and A ∈ Rm×n is the matrix of source to receiver gains. However, we do not know the gains aij , nor the transmitted signal x(k), nor even the number of sources present n. We only have the following additional a priori information: • We expect the number of sources to be less than the number of receivers (i.e., n < m, so that A is skinny); • All sources have roughly the same average power, the signal x(k) is unpredictable, and the source signals are unrelated to each other; Hence, given enough samples (i.e., p large) the vectors x(k) will ‘point in all directions’; • The sensor noise v(k) is small relative to the received signal A x(k). Here’s the question: (a) You are given a large number of vectors of sensor measurements y(k) ∈ Rm , k = 1, . . . , p. How would you estimate the number of sources, n? Be sure to clearly describe your proposed method for determining n, and to explain when and why it works. 132 • A is full-rank and well-conditioned; (b) Try your method on the signals given in the file nsources.m. Running this script will define the variables: • m, the number of receivers; • p, the number of signal samples; • Y, the receiver sensor measurements, an array of size m by p (the k-th column of Y is y(k).) What can you say about the number of signal sources present? Note: Our problem description and assumptions are not precise. An important part of this problem is to explain your method, and clarify the assumptions. 16.16 The EE263 search engine. In this problem we examine how linear algebra and low-rank approximations can be used to find matches to a search query in a set of documents. Let’s assume we have four documents: A, B, C, and D. We want to search these documents for three terms: piano, violin, and drum. We know that: in A, the word piano appears 4 times, violin 3 times, and drum 1 time; in B, the word piano appears 6 times, violin 1 time, and drum 0 times; in C, the word piano appears 7 time, violin 4 times, and drum 39 times; and in D, the word piano appears 0 times, violin 0 times, and drum 5 times. We can tabulate this as follows: A B C D piano 4 6 7 0 violin 3 1 4 0 drum 1 0 39 5 This information is used to form a term-by-document matrix A, where Aij specifies the frequency of the ith term in the jth document, i.e.,   4 6 7 0 A =  3 1 4 0 . 1 0 39 5 Now let q be a query vector, with a non-zero entry for each term. The query vector expresses a criterion by which to select a document. Typically, q will have 1 in the entries corresponding to the words we want to search for, and 0 in all other entries (but other weighting schemes are possible.) A simple measure of how relevant document j is to the query is given by the inner product of the jth column of A with q: aT q. j However, this criterion is biased towards large documents. For instance, a query for piano (q = [ 1 0 0 ]T ) by this criterion would return document C as most relevant, even though document B (and even A) is probably much more relevant. For this reason, we use the inner product normalized by the norm of the vectors, aT q j . aj q Note that our criterion for measuring how well a document matches the query is now the cosine of the angle between the document and query vectors. Since all entries are non-negative, the cosine is in [0, 1] ˜ (and the angle is in [−π/2, π/2].) Define A and q as normalized versions of A and q (A is normalized ˜ column-wise, i.e., each column is divided by its norm.) Then, ˜ ˜ c = AT q is a column vector that gives a measure of the relevance of each document to the query. And now, the question. In the file term_by_doc.m you are given m search terms, n documents, and the corresponding term-by-document matrix A ∈ Rm×n . (They were obtained randomly from Stanford’s 133 Portfolio collection of internal documents.) The variables term and document are lists of strings. The string term{i} contains the ith word. Each document is specified by its URL, i.e., if you point your web browser to the URL specified by the string document{j} you can inspect the contents of the jth document. The matrix entry A(i,j) specifies how many times term i appears in document j. ˜ (a) Compute A, the normalized term-by-document matrix. Compute and plot the singular values of ˜ A. ˜ (b) Perform a query for the word students (i = 53) on A. What are the 5 top results? ˜ (c) We will now consider low-rank approximations of A, that is ˆ Ar = ˆ ˆ A, rank(A)≤r min ˜ ˆ A−A . ˆ ˆ ˆ ˆ Compute A32 , A16 , A8 , and A4 . Perform a query for the word students on these matrices. Comment on the results. (d) Are there advantages of using low-rank approximations over using the full-rank matrix? (You can assume that a very large number of searches will be performed before the term-by-document matrix is updated.) Note: Variations and extensions of this idea are actually used in commercial search engines (although the details are closely guarded secrets . . . ) Issues in real search engines include the fact that m and n are enormous and change with time. These methods are very interesting because they can recover documents that don’t include the term searched for. For example, a search for automobile could retrieve a document with no mention of automobile, but many references to cars (can you give a justification for this?) For this reason, this approach is sometimes called latent semantic indexing. Matlab hints: You may find the command sort useful. It sorts a vector in ascending order, and can also return a vector with the original indexes of the sorted elements. Here’s a sample code that sorts the vector c in descending order, and prints the URL of the top document and its corresponding cj . [c,j]=sort(-c); c=-c; disp(document{j(1)}) disp(c(1)) 16.17 Condition number and angles between columns. Suppose A ∈ Rn×n has columns a1 , . . . , an ∈ Rn , each of which has unit norm: A = [a1 a2 · · · an ], ai = 1, i = 1, . . . , n. Suppose that two of the columns have an angle less than 10◦ between them, i.e., aT al ≥ cos 10◦ . Show k that κ(A) ≥ 10, where κ denotes the condition number. (If A is singular, we take κ(A) = ∞, and so κ(A) ≥ 10 holds.) Interpretation: If the columns were orthogonal, i.e., (ai , aj ) = 90◦ for i = j, i, j = 1, . . . , n, then A would be an orthogonal matrix, and therefore its condition number would be one (which is the smallest a condition number can be). At the other extreme, if two columns are the same (i.e., have zero angle between them), the matrix is singular, and the condition number is infinite. Intuition suggests that if some pair of columns has a small angle, such as 10◦ , then the condition number must be large. (Although in many applications, a condition number of 10 is not considered large.) 16.18 Analysis and optimization of a communication network. A communication network is modeled as a set of m directed links connecting nodes. There are n routes in the network. A route is a path, along one 134 or more links in the network, from a source node to a destination node. In this problem, the routes are fixed, and are described by an m × n route-link matrix A, defined as Aij = 1 route j passes through link i 0 otherwise. Over each route we have a nonnegative flow, measured in (say) bits per second. We denote the flow along route j as fj , and we call f ∈ Rn the flow vector. The traffic on a link i, denoted ti , is the sum of the flows on all routes passing through link i. The vector t ∈ Rm is called the traffic vector. handle. We’re Each link has an associated nonnegative delay, measured in (say) seconds. We denote the delay for link i as di , and refer to d ∈ Rm as the link delay vector. The latency on a route j, denoted lj , is the sum of the delays along each link constituting the route, i.e., the time it takes for bits entering the source to emerge at the destination. The vector l ∈ Rn is the route latency vector. The total number of bits in the network at an instant in time is given by B = f T l = tT d. (a) Worst-case flows and delays. Suppose the flows and link delays satisfy n m 2 fj ≤ F 2 , (1/n) j=1 (1/m) i=1 d2 ≤ D 2 , i where F and D are given. What is the maximum possible number of bits in the network? What values of f and d achieve this maximum value? (For this problem you can ignore the constraint that the flows and delays must be nonnegative. It turns out, however, that the worst-case flows and delays can always be chosen to be nonnegative.) (b) Utility maximization. For a flow fj , the network operator derives income at a rate pj fj , where pj is the price per unit flow on route j. The network operator’s total rate of income is thus n j=1 pj fj . (The route prices are known and positive.) The network operator is charged at a rate ci ti for having traffic ti on link i, where ci is the cost m per unit of traffic on link i. The total charge rate for link traffic is i=1 ti ci . (The link costs are known and positive.) The net income rate (or utility) to the network operator is therefore n m U net = j=1 pj fj − ci ti . i=1 Find the flow vector f that maximizes the operator’s net income rate, subject to the constraint that each fj is between 0 and F max , where F max is a given positive maximum flow value. 16.19 A heuristic for MAXCUT. Consider a graph with n nodes and m edges, with the nodes labeled 1, . . . , n and the edges labeled 1, . . . , m. We partition the nodes into two groups, B and C, i.e., B ∩ C = ∅, B ∪ C = {1, . . . , n}. We define the number of cuts associated with this partition as the number of edges between pairs of nodes when one of the nodes is in B and the other is in C. A famous problem, called the MAXCUT problem, involves choosing a partition (i.e., B and C) that maximizes the number of cuts for a given graph. For any partition, the number of cuts can be no more than m. If the number of cuts is m, nodes in group B connect only to nodes in group C and the graph is bipartite. The MAXCUT problem has many applications. We describe one here, although you do not need it to solve this problem. Suppose we have a communication system that operates with a two-phase clock. During periods t = 0, 2, 4, . . ., each node in group B transmits data to nodes in group C that it is connected to; during periods t = 1, 3, 5, . . ., each node in group C transmits to the nodes in group B that it is connected to. The number of cuts, then, is exactly the number of successful transmissions 135 that can occur in a two-period cycle. The MAXCUT problem is to assign nodes to the two groups so as to maximize the overall efficiency of communication. It turns out that the MAXCUT problem is hard to solve exactly, at least if we don’t want to resort to an exhaustive search over all, or most of, the 2n−1 possible partitions. In this problem we explore a sophisticated heuristic method for finding a good (if not the best) partition in a way that scales to large graphs. We will encode the partition as a vector x ∈ Rn , with xi ∈ {−1, 1}. The associated partition has xi = 1 for i ∈ B and xi = −1 for i ∈ C. We describe the graph by its node adjacency matrix A ∈ Rn×n C), with 1 there is an edge between node i and node j Aij = 0 otherwise Note that A is symmetric and Aii = 0 (since we do not have self-loops in our graph). (a) Find a symmetric matrix P , with Pii = 0 for i = 1, . . . , n, and a constant d, for which xT P x + d is the number of cuts encoded by any partitioning vector x. Explain how to calculate P and d from A. Of course, P and d cannot depend on x. The MAXCUT problem can now be stated as the optimization problem maximize xT P x + d subject to x2 = 1, i = 1, . . . , n, i (b) A famous heuristic for approximately solving MAXCUT is to replace the n constraints x2 = 1, i n i = 1, . . . , n, with a single constraint i=1 x2 = n, creating the so-called relaxed problem i maximize xT P x + d n 2 subject to i=1 xi = n. Explain how to solve this relaxed problem (even if you could not solve part (a)). Let x⋆ be a solution to the relaxed problem. We generate our candidate partition with xi = sign(x⋆ ). (This means that xi = 1 if x⋆ ≥ 0, and xi = −1 if x⋆ < 0.) really i i i Remark: We can give a geometric interpretation of the relaxed problem, which will also explain why it’s called relaxed. The constraints in the problem in part (a), that x2 = 1, require x to lie i on the vertices of the unit hypercube. In the relaxed problem, the constraint set is the unit ball of unit radius. Because this constraint set is larger than the original constraint set (i.e., it includes it), we say the constraints have been relaxed. (c) Run the MAXCUT heuristic described in part (b) on the data given in mc_data.m. How many cuts does your partition yield? A simple alternative to MAXCUT is to generate a large number of random partitions, using the random partition that maximizes the number of cuts as an approximate solution. Carry out this method with 1000 random partitions generated by x = sign(rand(n,1)-0.5). What is the largest number of cuts obtained by these random partitions? Note: There are many other heuristics for approximately solving the MAXCUT problem. However, we are not interested in them. In particular, please do not submit any other method for approximately solving MAXCUT. 16.20 Simultaneously estimating student ability and exercise difficulty. Each of n students takes an exam that contains m questions. Student j receives (nonnegative) grade Gij on question i. One simple model ˆ for predicting the grades is to estimate Gij ≈ Gij = aj /di , where aj is a (nonnegative) number that gives the ability of student j, and di is a (positive) number that gives the difficulty of exam question i. 136 with variable x ∈ Rn . If carrying out your method requires some rank or other conditions to hold.. and a set of positive. Finally. mn i=1 j=1 ij (a) Explain how to solve this problem. You can use any of the concepts in the class. without affecting Gij . W) = 0. You can just assume this works out. W) = w∈W. This angle is zero if and only if the two subspaces are equal. normalized question difficulties. If (V. B ∈ Rn×r . In this problem. Explain how you could find or compute (V. so ˆ that Gij ≈ Gij . v=0 max (v. etc. Let V = range(A) and W = range(B). w=0 max (w.m.21 Angle between two subspaces. w).m. least-squares. (b) Carry out your method for the matrices found in angsubdata. Give the numerical value for (range(A). SVD. you are given a complete set of grades (i. Your task is to find a set of nonnegative student abilities. range(B)). (b) Carry out your method on the data found in grademodeldata. norm. W) = max v∈V. we define the angle between two nonzero subspaces V and W as (V. (v. to ensure a unique model. w) = cos−1 vT w v w . If v ∈ W. Thus. Jordan form. J. QR factorization. it means that either there is a vector in V whose minimum angle to any vector of W is 10◦ . If your method is approximate. w=0 min (v. say so. Thus..g. or there is a vector in W whose minimum angle to any vector of V is 10◦ . In particular. 137 .e.  1/2 m n  1 G2  . say so. W). and also express it as a fraction of the RMS value of the grades. Give the difficulties of the 7 problems on the exam. 16. so that the mean exam question difficulty across the m questions is 1. we could simultaneously scale the student abilities and the exam difficulties ˆ by any positive number. pseudo-inverse. or not guaranteed to find the global minimum value of J. 1 J = mn  m n i=1 j=1 ˆ Gij − Gij 2 1/2  . Give the optimal value of J. This can be compared to the RMS value of the grades. We define the angle between a nonzero vector v ∈ Rn and a (nonzero) subspace W ⊆ Rn as (v. e. say. W). (a) Suppose you are given two matrices A ∈ Rn×r . we have (v. The angle between two nonzero vectors v and w in Rn is defined as (v. w∈W. using any concepts from EE263. choose your model to minimize the RMS error. or is easily corrected. each of rank r. Note: You do not have to concern yourself with the requirement that aj are nonnegative and di are positive. V) .Given a particular model. where we take cos−1 (a) as being between 0 and π. we will normalize the exam question difficulties di . W) = 10◦ means that the smallest angle between v and any vector in W is 10◦ . W) = 10◦ . the matrix G ∈ Rm×n ). . . Very roughly.23 One of these vectors doesn’t fit. and has only a small amount of energy contributed by x1 . i. where the tth column of Y is the vector y(t). wT y is a linear combination of the scalar signals x1 . An n-vector valued signal. . x(m) ∈ Rn . xn−1 have an RMS value substantially larger than xn . x(t) ∈ Rn . To find the delay of the gate plus its load. . . y(t) = Ax(t). It is also the signal of interest for this problem. Note that if w were chosen so that wT A = αeT . .g. . One of the vectors doesn’t fit with the others. we have to solve a complex. The file will also define n and T . . t = 1. ˆ range. Find the index of the vector that doesn’t fit. . The file one_of_these_data. nonlinear ordinary differential equation that takes into account 138 . rank. or geometric (or both). The matlab file faintestaudio. . In any case. Your explanation can be algebraic. subject to ˆ w = 1. ˆ The following is not needed to solve the problem.16. . whose columns we denote as x(1) . and a wave file of your estimate xn . xn are unrelated to each other. . along with the associated RMS value of xn . singular values. . is exactly what we want. . Carefully explain your method. where A ∈ Rn×n is invertible. we don’t know A. ˆ t = 1. that’s because the linear combination is ‘rich in xn ’. . The columns are (vector) data collected in some application. sadly. xn is the faintest scalar signal. . 16. If the linear combination has a small norm. xn . We consider a CMOS digital gate that drives a load consisting of interconnect wires that connect to the inputs of other gates. . The signals are actually audio tracks. you don’t need to worry about why the heuristic works (or doesn’t work)—it’s the method you are going to use in this problem.m contains an n × m matrix X. t = 1. the important part of our heuristic: we choose w to minimize the RMS value of xn . But. (b) Carry out your method on the problem instance with n = 4. . eigenvalues. T . Since the question is vague. but we are given a linear transformation of it. If we knew A. and especially. is defined for t = 1. . T. QR factorization. . . We will form our estimate as xn (t) = wT y(t). xn−1 . but it should be simple to state.22 Extracting the faintest signal.24 Extracting RC values from delay data. (a) Explain how to find a w that minimizes the RMS value of xn .e. T. y4 . Submit your code. . where w ∈ Rn is a vector of weights. . You ˆ are welcome to generate and listen to these files. we could easily recover the original signal (and therefore also the faintest scalar signal xn (t)). described in the matlab file faintestdata. and so on). nearly orthogonal). one idea behind the heuristic is that. We aren’t given the vector signal x(t). . . and involve ideas and methods from this course. The scalar signals x1 . . . . . or rambling explanation.m.. . xi (t). . in fact. permuting the columns would make no difference. . we want a nice. . . each 3. in what way the vector you’ve chosen doesn’t fit with the others. . . in general. using concepts from the class (e. T = 26000. for t = 1. 16. . This file will define an n × T matrix Y .. and give us the optimal weight vector w ∈ R4 you find. T . We will assume that the scalar signals x1 . using x(t) = A−1 y(t). . We’ll refer to its ith component. That. Now. clarity in your explanation of your method and approach is very important.e. . In particular. in other words. . then we would have xn (t) = αxn (t). with α = 0 a n constant. as the ith scalar signal.m contains commands to generate wave files of the linear combinations y1 . .25 seconds long and sampled at 8 kHz. complicated. Here is a heuristic method for guessing xn (t). T . In other words. least-squares. a perfect reconstruction except for the scale factor ˆ α. We will not read a long. and so are nearly uncorrelated (i. . . . The ordering of the vectors isn’t relevant.. short explanation. But they are not the same. (For simplicity we’ll take η = 1 in our delay model. . is sometimes called parameter extraction. Rn and C1 . 2 msl msl msl msl (b) Minimum mean-square logarithmic error. and a value of C for each of a set of loads. Emsa = 1 mn m n i=1 j=1 (Dij − Rj Ci ) . which is based on the time it takes the voltage of a simple RC circuit to decay to 1/2 its original value. . . . Cm msa (positive. . and so on. and the load as a simple capacitor with capacitance C. Submit the matlab code used to calculate these values. Rn and C1 .circuit nonlinearities.69. . 139 . A very simple approximation of the delay can be obtained by modeling the gate as a simple resistor with resistance R. We define the peaking factor of the system as the largest possible value of x(t + τ ) / x(t) . • The two criteria (absolute and logarithmic) are clearly close. .) Finding good values of parameters for a simple model. Explain how to find R1 . . In this problem. . parasitic capacitances. Explain how to find R1 . Emsl 1 = mn m n i=1 j=1 (log Dij − log(Rj Ci )) . given accurate delay data (obtained by simulation) for each combination of a gate driving a load. . • In this problem we are more interested in your approach and method than the final numerical results. We will take points off if your method is not the best possible one. . we will fix C1 = 1. Cm . . any t. . . even if your answer is numerically close to the correct one. Cm . . as well as R1 . m. n. 2 (The logarithm here is base e. .) msa msa msa msa msl msl msl msl (c) Find R1 . . and any τ ≥ 0. . for the particular delay data given in rcextract_data. msa msa msa msa (a) Minimum mean-square absolute error. .e.. as well as the values themselves. with C1 = 1) that minimize the mean-square logarithmic error. . we are extracting the gate resistances Rj and the load capacitances Ci . given accurate data. and scale all capacitances by 1/α. all trajectories x(t) converge to zero as ˙ t → ∞).) This simple RC delay model can be used for design. This can be done using a circuit simulator such as SPICE. . which we assume is stable (i. Rn and C1 . By simulation. we obtain the (accurate) delay Dij for gate j driving load i. Rn and C1 .25 Some attributes of a stable system. . but it doesn’t really matter. . . . . . . Rn and C1 . where η is a constant that depends on the precise definition of delay used. with C1 = 1) that minimize the mean-square absolute error. . for any nonzero trajectory x. with x(t) ∈ Rn .) The goal is to find positive numbers R1 . .e.m from the course web site. . i. . . Finally we get to the problem. If we scale all resistances by a constant α > 0. .. or approximate (but very fast) analysis. . (One common choice is η = 0. Cm so that Dij ≈ Rj Ci . . . 16. . To remove this ambiguity. and m loads labeled 1. Cm msl (positive. . . so we expect the extracted Rs and Cs to be similar in the two cases. . . • Peaking factor. given accurate delay data Dij (obtained by simulation). . . . . . Also write down your minimum Emsa and Emsl values. the approximate delays Rj Ci don’t change. We have n digital gates labeled 1. . the delay of the gate has the form ηRC. In this simple model. we will take the first load as our ‘unit load’. Please note the following: • You do not need to know anything about digital circuits to solve this problem. This problem concerns the autonomous linear dynamical system x = Ax. (D is given as an m × n matrix. We address the question of determining a value of R for each of a set of gates. T ]. 1. for all trajectories. you must explain your method. The problem is to find an upper bound on how large x(T ) can be. this means that there are trajectories with arbitrarily large values of x(T ) that are consistent with the given alarm data. . We define the minimum decorrelation time as the smallest possible τ ≥ 0 for which x(t + τ ) ⊥ x(t) can hold for some (nonzero) trajectory x. except as described below. (The threshold levels li are known. 140 . You know A and C. (We will deduct points for solutions that give bounds that are correct. T ] over which yi (t) ≥ li . such as a search over a (fine) grid of values of τ .4176.4 −1 −0. [3. where x(t) ∈ Rn is the state and y(t) ∈ Rp is the output at time t.26 System with level alarms. (If x(t + τ ) ⊥ x(t) never occurs for τ ≥ 0. Of course. . The output is monitored using level alarms with thresholds.0288.0195].) Give your bound on x(T ) .9723] We now consider the specific problem with   −0. while being consistent with the given alarm data. then the minimum decorrelation time is +∞. but higher than they need to be. ˙ y(t) = Cx(t). We define the halving time of the system as the smallest τ ≥ 0 for which x(t+τ ) ≤ x(t) /2 always holds.9210. −6  0 16. .0863].) You have alarm data over the time interval [0. If it is +∞. A linear dynamical system evolves according to x(t) = Ax(t). p. A= 0 1 0 C= 1 0 −1 0 1 1 . These tell us when yi (t) ≥ li . where τ max is known. • Minimum decorrelation time. For each output component i = 1. You can assume that you do not need to search over τ greater than τ max . Your method can involve some numerical simulation. This is the smallest possible time the state can rotate 90◦ .2 −2 1 0 0 . l1 = l2 = 1. and alarm intervals given below: y1 : y2 : [0. 1. We’d like all quantities to an accuracy of around 0. T = 10.9402].  0 0  . you are given the (possibly empty) set of the intervals in [0. of the following format. explain. [6. .6 1 0 6 with τ max = 10.) (a) Explain how to find each of these quantities.• Halving time. (b) Carry out your method for the specific case with  −1 −5 0  5 0 0 A=  0.01. We allow +∞ as an answer here.9 −4. [0. where li is the threshold level for output component i. 4. 6. but not x(t) or y(t). Furthermore. u(t) d m k y1 (t) k y2 (t) d m k y3 (t) y10 (t) d m m Ten masses are connected in series by springs and light dampers. and (iii) the resulting x(tf ) 2 .. For which of the ten possible actuator placements is the system controllable? ii. and d = 0.Lecture 18 – Controllability and state transfer 18. Your answer should include (i) a plot of the minimizing input uopt (t).e. y10 . The mass positions (deviation from rest) are denoted by y1 . You are given the following design specification: any state should be reachable without the need for very large actuation forces. say. and asked to drive the state to as close to zero as possible at time tf = 20 (i. but it could also have been placed in any of the other nine masses. (a) Actuator placement (10 masses).1 This problem has two parts that are mostly independent.) Note: To avoid error propagation in solutions. You are given the initial state of the system. a velocity disturbance in the fourth mass is to be attenuated as much as possible in 20 seconds. we want the smallest one. from among all inputs that achieve the minimum x(tf ) 2 . available at the course web site. the actuator is shown located on the second mass from the left. ˙ i.m. ˙ i. The masses. which constructs the dynamics and input matrices A and B. .01.) A force actuator is placed on the third mass from the left. the one for which the energy tf Eu = u(t)2 dt 0 is minimized. Notes: • We will be happy with an approximate solution (by using.) You may want to discretize the system. and damping constants are all identical and given by m = 1. In the figure. .) • You may (or may not) want to use the result h A 0 eAt B dt = eAh − I B. You must explain and justify how you obtained your solution..e. Is the system controllable? ii. spring constants. t ∈ [0. in which case we suggest you use 100 discretization intervals (or more. that minimizes x(tf ) 2 . As before. Four unit masses are connected in series by springs and light dampers (with k = 1. use state x = [y T y T ]T . d = 0. . (ii) the corresponding energy Eu. k = 1. use the matlab script spring_series. x(0) = e8 = [0 · · · 0 1]T .min . tf ]. . you should clearly explain and justify your decision. 141 . as shown in the figure above. an input that is piece-wise constant in small intervals. i. Where would you place the actuator? (Since the design specification is somewhat vague. Consider now a system with the same characteristics.) In other words.01. you are to choose an input u(t). An actuator is used to apply a force u(t) to one of the masses. (b) Optimal control (4 masses). but with only four masses. Use state x = [y T y T ]T . we mean that the statement holds for all values of the matrices. from t = 0 until t = ta . 0 0 1 0 u(0)2 + · · · + u(N − 1)2 . Consider a controllable discrete-time system x(t + 1) = Ax(t) + Bu(t). (a) Give an explicit formula for uln (t).. You can use a simple method to numerically approximate any integrals you encounter.18. without fixing the final time N . Find the minimum value of N such that EN (z) ≤ 1. You can assume that A is invertible..8 x(t + 1) =  1 0 0  x(t) +  0  u(t).5 Some True/False questions. You must explain what you are doing. x(0) = 0. i. for 0 < ta < tf . Let xdes = [1 0 − 1 0 0 0]T and tf = 6. dimensions. mentioned in the statement. Hint: the matlab command P=dlyap(A. Roughly speaking. 18. Who is right? Justify your answer. from t = ta until the final time tf . You are to determine an input u that results in x(tf ) = xdes . where tf and xdes are given.4 Minimum energy inputs with coasting. etc. Either justify or refute the engineer’s statement..3 Minimum energy required to steer the state to zero. 18. x(0) = x0 . just submitting some code and a plot is not enough. uln will denote the one that minimizes the ‘energy’ ta u(t) 0 2 dt. i. and in the reachability problem we steer the state from zero to a given state. Consider the mechanical system on page 11-9 of the notes. Another engineer disagrees. Therefore E(x0 ) is the same as the minimum energy required for z to reach x0 (a formula for which can be found in the lecture notes). the system ‘coasts’ or ‘drifts’ with u(t) = 0. i.. vectors. The system z(t + 1) = A−1 z(t) − A−1 Bv(t) is the same as the given one.2 Horizon selection. 142 . x(0) = 0. and have the constraint that u(t) = 0 for t > ta . Let E∞ (z) denote the minimum energy required to reach the state x(N ) = z. Consider the (scalar input) system     1 0 0 0.e. By ‘True’. An engineer asserts that the minimum energy required will decrease. (This is the shortest horizon that requires no more than 10% more control energy than infinite horizon control.) (c) Matlab exercise. except time is running backwards. pointing out that the final time has not changed. Among all u that satisfy these specifications. t−1 E(x0 ) = min τ =0 u(τ ) 2 | x(t) = 0 . (It is possible that neither is right. i.W) computes the solution of the Lyapunov equation AP AT + W = P . ˙ where A ∈ Rn×n and B ∈ Rn×m . but ‘turned backwards in time’ since here we steer the state from a given state to zero. 18. Let E(x0 ) denote the minimum energy required to drive the state to zero.e. for any final state).e.e.1E∞ (z) for all z. E∞ (z) = limN →∞ EN (z). where 0 < ta ≤ tf . ‘False’ means that the statement fails to hold in at least one case. An engineer argues as follows: This problem is like the minimum energy reachability problem. You are also given ta . Plot the minimum energy required as a function of ta . We consider the controllable system x = Ax + Bu. you are allowed to apply a (nonzero) input u during the ‘controlled portion’ of the trajectory. For N ≥ 3 let EN (z) denote the minimum input energy. (b) Now suppose that ta is increased (but still less than tf ). of course. the minimum value of required to reach x(N ) = z. . Let x : R+ → Rn be any trajectory of the linear dynamical system x = Ax. . ˙ (e) Suppose P and Q are symmetric n × n matrices. . with leading coefficient one. B = [b1 b2 ] ∈ Rn×2 . 4. where λ ∈ R. u(3) = 0 u2 (3) . n. where A ∈ Rn×n . that satisfies p(A) = 0. t = 0. and B ∈ Rq×r is skinny (i. (a) Suppose we can find an alternating input sequence so that x(2N ) = xdes . where T > n. and suppose v ∈ Rn .. v = 0. Then there is an input that steers the state from xinit at time t = 0 to xfinal at time t = n. Then we have A ≥ max i=1. . We are given a target state xdes ∈ Rn . we have x(t) ≤ x(0) .q ai . . . . . Then A = 0.6 Alternating input reachability. where F. 18.. x(t + 1) = Ax(t) + Bu(t). . and the nullspace of A is the same as the nullspace of A2 . can we always find an input sequence that uses both inputs at each time step. Can we always find an alternating input sequence so that x(2N ) = xdes ? 143 . In contrast... satisfies v T A = λv T . and let {v1 . . q ≥ r) and full rank. . are both ˙ ˙ stable. u2 (t)) ∈ R2 is the input. Can we always find a standard input sequence so that x(N ) = xdes ? In other words. which is stable. (h) Suppose A ∈ Rn×n has all eigenvalues equal to zero. Then T T if we have vi P vi ≥ vi Qvi for i = 1... . Then the linear dynamical system w = (F + G)w is stable. 1.(a) Suppose A ∈ Rn×n and p(s) = sn + a1 sn−1 + · · · + an is polynomial of degree n. 2. . we’ll refer to an input sequence as a standard input sequence if both inputs can be nonzero at each time t. p ≤ q) and full rank. (c) Let A ∈ Rp×q and let ai ∈ Rp denote the ith column of A.. u(1). . u(1) = 0 u2 (1) . v2 . where A ∈ Rn×n . (i) Consider the discrete-time linear dynamical system x(t + 1) = Ax(t) + Bu(t). We say that an input sequence u(0). at time t. . . G ∈ Rn×n . u(0) = u1 (0) 0 . vn } be a basis for Rn . Then p is the characteristic polynomial of A. We assume that x(0) = 0. We consider a linear dynamical system with n states and 2 inputs. (d) Suppose the two linear dynamical systems x = F x and z = Gz. Then AB is full rank. and u(t) = (u1 (t).e. (f) Let A ∈ Rn×n . i.e. . (b) Suppose x : R+ → Rn is a trajectory of the linear dynamical system x = Ax. 3.e. and drives the state to xdes in N steps? (b) Is the converse true? Suppose we can find a standard input sequence so that x(N ) = xdes . and a time horizon N ≥ n. . ˙ Then for any t ≥ 0. . 5.... u(2) = u1 (2) 0 . Suppose there is an input that steers the state from a particular initial state xinit at time t = 0 to a particular final state xfinal at time t = T . and u2 (t) = 0 for t = 0. x(t) ∈ Rn is the state. . . Then at least one of the following ˙ statements hold: • v T x(t) ≥ v T x(0) for all t ≥ 0 • v T x(t) ≤ v T x(0) for all t ≥ 0 (g) Suppose A ∈ Rp×q is fat (i. we must have P ≥ Q. is an alternating input sequence if u1 (t) = 0 for t = 1. if we can drive the state to xdes in 2N steps with an alternating input sequence. . . xdes and N ≥ n. if we can find an alternating input sequence so that x(2N ) = xdes . In your solution for parts (a) and (b) you should first state your answer. and if your answer is ‘No’. Your solution must be short. and N ≥ n. then we can also find a standard input sequence so that x(N ) = xdes . So.g. pseudo-inverse. we mean for any A. we won’t read more than one page. you must provide a counterexample (and you must explain clearly why it is a counterexample).. b2 . b1 .By always. You may use any of the concepts from the class (e. If your answer is ‘Yes’. b2 .). for example. eigenvalues. singular values. b1 . controllability. 144 . you are saying that for any A. which must be either ‘Yes’ or ‘No’. etc. if your answer is ‘Yes’ for part (a). you must provide a justification. xdes . .) (The matrix A is the same as in problem 14. . . . Fk are all zero. there is no other connection between the problems. for which there is an observer using only these sensors. with  0 0 . sets) of sensors. Such observers have the form x(t) = F0 y(t) + F1 dy dk y (t) + · · · + Fk k (t). . . Consider the system x = Ax. .) We consider observers that (exactly and instantaneously) reconstruct the state from the output and its derivatives. . Fk is nonzero. . C= 0 1 1 A=  0 1 1 0  0 0 0 1 0 0 0 = Cx. . . then the observer doesn’t use the ith sensor signal yi (t) to reconstruct the state. The degree gives the highest derivative of y used to reconstruct the state. just to save you typing. .. Fk . the observer has to work!) Consider an observer defined by F0 . If the ith columns of F0 . .e. of this minimum number.1 Sensor selection and observer design. dt dt where F0 . i. We say the observer uses or requires the sensor i if at least one of the ith columns of F0 . .Lecture 19 – Observability and state estimation 19. y ˙    1 0 0 0 1 1 0  1 1 0 0  . (a) What is the minimum number of sensors required for such an observer? List all combinations (i. (Of course we require this formula to hold for any trajectory of the system and any t. 145 ..e. We say the degree of the observer is the largest j such that Fj = 0. (b) What is the minimum degree observer? List all combinations of sensors for which an observer of this minimum degree can be found. 1 (This problem concerns observer design so we’ve simplified things by not even including an input. Fk are matrices that specify the observer. . .

Comments

Description