Primer on Optimal Control Theory

Jason L. Speyer
University of California, Los Angeles, California

David H. Jacobson
PricewaterhouseCoopers LLP, Toronto, Ontario, Canada

Society for Industrial and Applied Mathematics, Philadelphia
SIAM Advances in Design and Control series
Copyright © 2010 by the Society for Industrial and Applied Mathematics. ISBN 978-0-898716-94-8.

To Barbara, a constant source of love and inspiration. To my children, Gil, Gavriel, Rakhel, and Joseph, for giving me so much joy and love. For Celia, Greta, Jonah, Levi, Miles, Thea, with love from Oupa!

Contents

List of Figures
Preface
1 Introduction
  1.1 Control Example
  1.2 General Optimal Control Problem
  1.3 Purpose and General Outline
2 Finite-Dimensional Optimization
  2.1 Motivation
  2.2 Unconstrained Minimization
    2.2.1 Scalar Case
    2.2.2 Numerical Approaches to One-Dimensional Minimization
    2.2.3 Multivariable First-Order Conditions
    2.2.4 Multivariable Second-Order Conditions
    2.2.5 Numerical Optimization Schemes
  2.3 Minimization Subject to Constraints
    2.3.1 Simple Illustrative Example
    2.3.2 General Case: Functions of n-Variables
    2.3.3 Constrained Parameter Optimization Algorithm
    2.3.4 General Form of the Second Variation
    2.3.5 Inequality Constraints: Functions of 2-Variables
3 Optimization of Dynamic Systems with General Performance Criteria
  3.1 Introduction
  3.2 Linear Dynamic Systems
    3.2.1 Linear Ordinary Differential Equation
    3.2.2 Expansion Formula
    3.2.3 Adjoining System Equation
    3.2.4 Expansion of $\hat{J}$
    3.2.5 Necessary Condition for Optimality
    3.2.6 Pontryagin's Necessary Condition for Weak Variations
  3.3 Nonlinear Dynamic System
    3.3.1 Perturbations in the Control and State from the Optimal Path
    3.3.2 Pontryagin's Weak Necessary Condition
    3.3.3 Maximum Horizontal Distance: A Variation of the Brachistochrone Problem
    3.3.4 Two-Point Boundary-Value Problem
  3.4 Strong Variations and Strong Form of the Pontryagin Minimum Principle
    3.4.1 Control Constraints
  3.5 Sufficient Conditions for Optimality
    3.5.1 Derivatives of the Optimal Value Function
    3.5.2 Derivation of the H-J-B Equation
  3.6 Unspecified Final Time $t_f$
4 Terminal Equality Constraints
  4.1 Introduction
  4.2 Linear Dynamic System with Terminal Equality Constraints
    4.2.1 Linear Dynamic System with Linear Terminal Equality Constraints
    4.2.2 Pontryagin Necessary Condition: Special Case
    4.2.3 Linear Dynamics with Nonlinear Terminal Equality Constraints
  4.3 Weak First-Order Optimality with Nonlinear Dynamics and Terminal Constraints
    4.3.1 Sufficient Condition for Weakly First-Order Optimality
  4.4 Strong First-Order Optimality
    4.4.1 Strong First-Order Optimality with Control Constraints
  4.5 Unspecified Final Time $t_f$
  4.6 Minimum Time Problem Subject to Linear Dynamics
  4.7 Sufficient Conditions for Global Optimality
5 Linear Quadratic Control Problem
  5.1 Motivation of the LQ Problem
  5.2 Preliminaries and LQ Problem Formulation
  5.3 First-Order Necessary Conditions for Optimality
  5.4 Transition Matrix Approach without Terminal Constraints
    5.4.1 Symplectic Properties of the Transition Matrix
    5.4.2 Riccati Matrix Differential Equation
    5.4.3 Canonical Transformation
    5.4.4 Necessary and Sufficient Conditions
    5.4.5 Necessary and Sufficient Conditions for Strong Positivity
    5.4.6 Strong Positivity and the Totally Singular Second Variation
    5.4.7 Solving the Two-Point Boundary-Value Problem via the Shooting Method
  5.5 LQ Problem with Linear Terminal Constraints
    5.5.1 Normality and Controllability
    5.5.2 Necessary and Sufficient Conditions
  5.6 Solution of the Matrix Riccati Equation: Additional Properties
  5.7 LQ Regulator Problem
  5.8 Necessary and Sufficient Conditions for Free Terminal Time
  5.9 Summary
6 LQ Differential Games
  6.1 Introduction
  6.2 LQ Differential Game with Perfect State Information
  6.3 Disturbance Attenuation Problem
    6.3.1 The Disturbance Attenuation Problem Converted into a Differential Game
    6.3.2 Solution to the Differential Game Problem Using the Conditions of the First-Order Variations
    6.3.3 Necessary and Sufficient Conditions for the Optimality of the Disturbance Attenuation Controller
    6.3.4 Time-Invariant Disturbance Attenuation Estimator Transformed into the $H_\infty$ Estimator
    6.3.5 $H_\infty$ Measure and $H_\infty$ Robustness Bound
    6.3.6 The $H_\infty$ Transfer-Matrix Bound
A Background
  A.1 Topics from Calculus
    A.1.1 Implicit Function Theorems
    A.1.2 Taylor Expansions
  A.2 Linear Algebra Review
    A.2.1 Subspaces and Dimension
    A.2.2 Matrices and Rank
    A.2.3 Minors and Determinants
    A.2.4 Eigenvalues and Eigenvectors
    A.2.5 Quadratic Forms and Definite Matrices
    A.2.6 Time-Varying Vectors and Matrices
    A.2.7 Gradient Vectors and Jacobian Matrices
    A.2.8 Second Partials and the Hessian
    A.2.9 Vector and Matrix Norms
    A.2.10 Taylor's Theorem for Functions of Vector Arguments
  A.3 Linear Dynamical Systems
Bibliography
Index

List of Figures

1.1 Control-constrained optimization example
2.1 A brachistochrone problem
2.2 Definition of extremal points
2.3 Function with a discontinuous derivative
2.4 Ellipse definition
2.5 Definition of $\bar{V}$
2.6 Definition of tangent plane
2.7 Geometrical description of parameter optimization problem
3.1 Depiction of weak and strong variations
3.2 Bounded control
4.1 Rocket launch example
4.2 Phase portrait for the Bushaw problem
4.3 Optimal value function for the Bushaw problem
5.1 Coordinate frame on a sphere
6.1 Disturbance attenuation block diagram
6.2 Transfer function of square integrable signals
6.3 Transfer matrix from the disturbance inputs to output performance
6.4 Roots of P as a function of $\theta^{-1}$
6.5 System description
A.1 Definition of $F_y(x_0, y_0) > 0$
Preface

This book began when David Jacobson wrote the first draft of Chapters 1, 3, and 4 and Jason Speyer wrote Chapters 2, 5, and 6. Since then the book has constantly evolved as we modified those chapters through interaction with colleagues and students. We owe much to them for this polished version. The objective of the book is to make optimal control theory accessible to a large class of engineers and scientists who are not mathematicians but who have a basic mathematical background and who need to understand and want to appreciate the sophisticated material associated with optimal control theory. Therefore, the material is presented using elementary mathematics, which is sufficient to treat and understand in a rigorous way the issues underlying the limited class of control problems in this text. Furthermore, although many topics that build on this foundation are covered briefly, such as inequality constraints, the singular control problem, and advanced numerical methods, the foundation laid here should be adequate for reading the rich literature on these subjects.

We would like to thank our many students, whose input over the years has been incorporated into this final draft. Our colleagues also have been very influential in the approach we have taken. In particular, we have spent many hours discussing the concepts of optimal control theory with Professor David Hull. Special thanks are extended to Professor David Chichka, who contributed some interesting examples and numerical methods, and Professor Moshe Idan, whose careful and critical reading of the manuscript has led to a much-improved final draft. Finally, the first author must express his gratitude to Professor Bryson, a pioneer in the development of the theory, numerical methods, and application of optimal control theory, as well as a teacher, mentor, and dear friend.

Chapter 1

Introduction

The operation of many physical processes can be enhanced if a more efficient mode of operation can be determined. Systems such as aircraft, chemical processes, and economies place at the disposal of an operator certain controls which can be modulated to enhance some desired property of the system. For example, in commercial aviation, the best fuel usage at cruise is an important consideration in an airline's profitability. Full employment and growth of the gross domestic product are measures of economic system performance; these may be enhanced by proper modulation of such controls as the change in discount rate determined by the Federal Reserve Board or changes in the tax codes devised by Congress.

The essential features of such systems as addressed here are dynamic systems, available controls, measures of system performance, and constraints under which a system must operate. Models of the dynamic system are described by a set of first-order coupled nonlinear differential equations representing the propagation of the state variables as a function of the independent variable, say, time. The state vector may be composed of position, velocity, and acceleration. This motion is influenced by the inclusion of a control vector. For example, the throttle setting and the aerodynamic surfaces influence the motion of an aircraft.
The performance criterion which establishes the effectiveness of the control process on the dynamical system can take many forms. For an aircraft, desired performance might be efficient fuel cruise (fuel per range), endurance (fuel per time), or time to a given altitude. The performance criterion is to be optimized subject to the constraints imposed by the system dynamics and other constraints. An important class of constraints is those imposed at the termination of the path. For example, the path of an aircraft may terminate in minimum time at a given altitude and velocity. Furthermore, path constraints that are functions of the controls or the states, or of both the state and control vectors, may be imposed. Force constraints or maximum-altitude constraints may be imposed for practical implementation.

In this chapter, a simple dynamic example is given to illustrate some of the concepts that are described in later chapters. These concepts, as well as the optimization concepts for the following chapters, are described using elementary mathematical ideas. The objective is to develop a mathematical structure which can be justified rigorously using elementary concepts. If more complex or sophisticated ideas are required, the reader will be directed to appropriate references. Therefore, the treatment here is not the most general but does cover a large class of optimization problems of practical concern.

1.1 Control Example

A control example establishes the notion of control and how it can be manipulated to satisfy given goals. Consider the forced harmonic oscillator described as

$$\ddot{x} + x = u, \qquad x(0),\ \dot{x}(0) \text{ given}, \tag{1.1}$$

where $x$ is the position. The overdot denotes time differentiation; that is, $\dot{x}$ is $dx/dt$. This second-order linear differential equation can be rewritten as two first-order differential equations by identifying $x_1 = x$ and $x_2 = \dot{x}$. Then

$$\dot{x}_1 = x_2, \qquad x_1(0) \text{ given}, \tag{1.2}$$

$$\dot{x}_2 = -x_1 + u, \qquad x_2(0) \text{ given}, \tag{1.3}$$

or

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u. \tag{1.4}$$

Suppose it is desirable to find a control which drives $x_1$ and $x_2$ to the origin from arbitrary initial conditions. Since system (1.4) is controllable (general comments on this issue can be found in [8]), there are many ways that this system can be driven to the origin. For example, suppose the control is proportional to the velocity, such as $u = -Kx_2$, where $K > 0$ is a constant. Then the position and velocity converge asymptotically to zero as $t \to \infty$. Note that the system converges for any positive value of $K$.

It might logically be asked if there is a best value of $K$. This in turn requires some definition of "best." There is a large number of possible criteria. Some common objectives are to minimize the time needed to reach the desired state or to minimize the effort it takes. A criterion that allows the engineer to balance the amount of error against the effort expended is often useful. One particular formulation of this trade-off is the quadratic performance index, specialized here to

$$J_1 = \lim_{t_f \to \infty} \int_0^{t_f} \left( a_1 x_1^2 + a_2 x_2^2 + u^2 \right) dt, \tag{1.5}$$

where $a_1 > 0$ and $a_2 > 0$, and $u = -Kx_2$ is substituted into the performance criterion. The constant parameter $K$ is to be determined such that the cost criterion is minimized subject to the functional form of Equation (1.4). We will not solve this problem here.
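Although the optimal $K$ is not determined until later chapters, the trade-off embodied in (1.5) is easy to explore numerically. The following sketch (not from the book) approximates $J_1$ by a finite-horizon Euler integration; the weights $a_1$, $a_2$, the initial condition, the horizon, and the step size are all illustrative assumptions.

```python
# A minimal sketch (not from the book): simulate (1.2)-(1.3) under the
# feedback u = -K*x2 and approximate the cost J1 of (1.5) by forward-Euler
# integration over a long but finite horizon. All numerical values here
# (a1, a2, x0, tf, dt) are illustrative assumptions.
def cost_J1(K, a1=1.0, a2=1.0, x0=(1.0, 0.0), tf=50.0, dt=1e-3):
    x1, x2 = x0
    J = 0.0
    for _ in range(int(tf / dt)):
        u = -K * x2
        J += (a1 * x1**2 + a2 * x2**2 + u**2) * dt
        x1, x2 = x1 + dt * x2, x2 + dt * (-x1 + u)  # Euler step of (1.2)-(1.3)
    return J

for K in (0.5, 1.0, 2.0, 4.0):
    print(f"K = {K:3.1f}:  J1 approximately {cost_J1(K):.3f}")
```

Scanning $K$ this way shows a cost that is large for both very small and very large $K$, which is the trade-off the optimization theory of Chapter 2 makes precise.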
In Chapter 2, the parameter minimization problem is introduced to develop some of the basic concepts that are used in the solution. However, a point to note is that the control $u$ does not have to be chosen a priori; the best functional form will be produced by the optimization process. That is, the process will (usually) produce a control that is expressed as a function of the state of the system rather than an explicit function of time. This is especially true for the quadratic performance index subject to a linear dynamical system (see Chapters 5 and 6).

Other performance measures are of interest. For example, minimum time has been mentioned for the case where the desired final state was the origin. For this problem to make sense, the control must be limited in some way; otherwise, infinite effort would be expended and the origin reached in zero time. In the quadratic performance index in (1.5), the limitation came from penalizing the use of control (the term $u^2$ inside the integral). Another possibility is to explicitly bound the control. This could represent some physical limit, such as a maximum throttle setting or limits to steering. Here, for illustration, the control variable is bounded as

$$|u| \le 1. \tag{1.6}$$

In later chapters it is shown that the best solution often lies on its bounds.

To produce some notion of the motion of the state variables $(x_1, x_2)$ over time, note that Equations (1.2) and (1.3) can be combined by eliminating time as

$$\frac{dx_1/dt}{dx_2/dt} = \frac{x_2}{-x_1 + u} \quad\Rightarrow\quad (-x_1 + u)\,dx_1 = x_2\,dx_2. \tag{1.7}$$

Assuming $u$ is a constant, both sides can be integrated to get

$$(x_1 - u)^2 + x_2^2 = R^2, \tag{1.8}$$

which translates to a series of concentric circles for any specific value of the control. For $u = 1$ and $u = -1$, the series of concentric circles are as shown in Figure 1.1.

Figure 1.1: Control-constrained optimization example.

There are many possible paths that drive the initial states $(x_1(0), x_2(0))$ to the origin. Starting with $u = 1$ at some arbitrary $(x_1(0), x_2(0))$, the path proceeds to point A or B. From A or B the control changes to $u = -1$ until point C or D is intercepted. From these points, using $u = +1$, the origin is reached. Neither of these paths starting from the initial conditions is a minimum-time path, although starting from point B, the resulting paths are minimum time. The methodology for determining the optimal time paths is given in Chapter 4.
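The circular-arc geometry of (1.8) is easy to confirm numerically. The sketch below (not from the book) integrates (1.2)-(1.3) with the control held constant at $u = 1$ and reports how nearly the quantity $(x_1 - u)^2 + x_2^2$ is conserved; the initial point, step size, and duration are illustrative assumptions.

```python
# A minimal sketch (not from the book): with u held constant, Eq. (1.8)
# says (x1 - u)^2 + x2^2 is constant along solutions of (1.2)-(1.3).
def radius_drift(x1, x2, u=1.0, dt=1e-4, steps=20000):
    R2 = (x1 - u)**2 + x2**2          # squared radius at the start of the arc
    for _ in range(steps):
        x1, x2 = x1 + dt * x2, x2 + dt * (-x1 + u)   # Euler step
    return abs((x1 - u)**2 + x2**2 - R2)

print(radius_drift(2.0, 0.0))   # small drift: the path stays on a circle about (u, 0)
```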
1.2 General Optimal Control Problem

The general form of the optimal control problems we consider begins with a first-order, generally nonlinear, dynamical system of equations,

$$\dot{x} = f(x, u, t), \qquad x(t_0) = x_0, \tag{1.9}$$

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, and $f: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^1 \to \mathbb{R}^n$. Recall that $(\dot{\ })$ denotes $d(\ )/dt$. We write $x$ for $x(t)$, $u$ for $u(t)$, and $x(\cdot)$ and $u(\cdot)$ for the functions themselves. In the example of Section 1.1, the system is given by Equation (1.4).

The performance of the dynamical system is to be modulated to minimize some performance index, which we assume to be of the form

$$J = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x, u, t)\,dt, \tag{1.10}$$

where $\phi: \mathbb{R}^n \times \mathbb{R}^1 \to \mathbb{R}^1$ and $L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^1 \to \mathbb{R}^1$. The terms in the performance index are often driven by considerations of energy use and time constraints. For example, the performance index might be as simple as minimizing the final time (set $\phi(t_f) = t_f$ and $L(\cdot,\cdot,\cdot) \equiv 0$ in (1.10)). It may also attempt to minimize the amount of energy expended in achieving the desired goal or to limit the control effort expended, or any combination of these and many other considerations.

In the formulation of the problem, we limit the class of control functions $\mathcal{U}$ to the class of bounded piecewise continuous functions. The solution is to be such that the functional $J$ takes on its minimum for some $u(\cdot) \in \mathcal{U}$ subject to the differential equations (1.9). There may also be several other constraints. One very common form of constraint, which we treat at length, is on the terminal state of the system:

$$\psi(x(t_f), t_f) = 0, \tag{1.11}$$

where $\psi: \mathbb{R}^n \times \mathbb{R}^1 \to \mathbb{R}^p$. This reflects a common requirement in engineering problems, that of achieving some specified final condition exactly.

The motion of the system and the amount of control available may also be subject to hard limits. These bounds may be written as

$$S(x(t), t) \le 0, \tag{1.12}$$

where $S: \mathbb{R}^n \times \mathbb{R}^1 \to \mathbb{R}^1$ for a bound on the state only, or, more generally, for a mixed state and control space bound,

$$g(x(t), u(t), t) \le 0, \tag{1.13}$$

where $g: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^1 \to \mathbb{R}^1$. These bounds represent physical or other limitations on the system. For an aircraft, for instance, the altitude must always be greater than that of the landscape, and the control available is limited by the physical capabilities of the engines and control surfaces. Many important classes of problems have been left out of our presentation. For example, the state variable inequality constraint given in (1.12) is beyond the scope of this book.

1.3 Purpose and General Outline

This book aims to provide a treatment of control theory using mathematics at the level of the practicing engineer and scientist. The general problem cannot be treated in complete detail using essentially elementary mathematics. However, important special cases of the general problem can be treated in complete detail using elementary mathematics. These special cases are sufficiently broad to solve many interesting and important problems. Furthermore, these special cases suggest solutions to the more general problem. Therefore, complete solutions to the general problem are stated and used. The theoretical gap between the solution to the special cases and the solution to the general problem is discussed, and additional references are given for completeness.

To introduce important concepts, mathematical style, and notation, in Chapter 2 the parameter minimization problem is formulated and conditions for local optimality are determined. By local optimality we mean that optimality can be verified in a small neighborhood of the optimal point. First, the notions of first- and second-order local necessary conditions for unconstrained parameter minimization problems are derived. The first-order necessary conditions are generalized in Chapter 3 to the minimization of a general performance criterion with nonlinear dynamic system constraints. Next, the notion of first- and second-order local necessary conditions for parameter minimization problems is extended to include algebraic constraints. The first-order necessary conditions are generalized in Chapter 4 to the minimization of a general performance criterion with nonlinear dynamic system constraints and terminal equality constraints.
Second-order local necessary conditions for the minimization of a general performance criterion with nonlinear dynamic system constraints, for both unconstrained and terminal equality constrained problems, are given in Chapter 5.

In Chapters 3 and 4, local and global conditions for optimality are given for what are called weak and strong control variations. "Weak control variation" means that at any point, the variation away from the optimal control is very small; however, this small variation may be present everywhere along the path. This gives rise to the classical local necessary conditions of Euler and Lagrange. "Strong control variation" means that the variation is zero over most of the path, but along a very short section it may be arbitrarily large. This leads to the classical Weierstrass local conditions and their more modern generalization called the Pontryagin Maximum Principle.

The local optimality conditions are useful in constructing numerical algorithms for determining the optimal path. Less useful numerically, but sometimes very helpful theoretically, are the global sufficiency conditions. These conditions require the solution of a partial differential equation known as the Hamilton–Jacobi–Bellman equation.

In Chapter 5 the second variation for weak control variations produces local necessary and sufficient conditions for optimality. These conditions are determined by solving what is called the accessory problem in the calculus of variations, which is essentially minimizing a quadratic cost criterion subject to linear differential equations, i.e., the linear quadratic problem. The linear quadratic problem also arises directly and naturally in many applications and is the basis of much control synthesis work. In Chapter 6 the linear quadratic problem of Chapter 5 is generalized to a two-sided optimization problem producing a zero-sum differential game. The solutions to both the linear quadratic problem and the zero-sum differential game problem produce linear feedback control laws, known in the robust control literature as the $H_2$ and $H_\infty$ controllers.

Background material is included in the appendix. The reader is assumed to be familiar with differential equations and standard vector-matrix algebra.

Chapter 2

Finite-Dimensional Optimization

A popular approach to the numerical solution of functional minimization problems, where a piecewise continuous control function is sought, is to convert them to an approximate parameter minimization problem. This motivation for the study of parameter minimization is shown more fully in Section 2.1. Moreover, many of the ideas developed to characterize the parameter optimal solution extend to the functional optimization problem but can be treated from a more transparent viewpoint in this setting. These include the first-order and second-order necessary and sufficient conditions for optimality for both unconstrained and constrained minimization problems.

2.1 Motivation for Considering Parameter Minimization for Functional Optimization

Following the motivation given in Chapter 1, we consider the functional optimization problem of minimizing, with respect to $u(\cdot) \in \mathcal{U}$,¹

$$J(u, x_0) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x(t), u(t), t)\,dt \tag{2.1}$$

subject to

$$\dot{x}(t) = f(x(t), u(t), t), \qquad x_0 \text{ given}. \tag{2.2}$$

¹ $\mathcal{U}$ represents the class of bounded piecewise continuous functions.
This functional optimization problem can be converted to a parameter optimization or function optimization problem by assuming that the control is piecewise linear:

$$u(t) = \hat{u}(u_p, t) = u_i + \frac{t - t_i}{t_{i+1} - t_i}(u_{i+1} - u_i), \qquad t_i \le t \le t_{i+1}, \tag{2.3}$$

where $i = 0, \ldots, N-1$, $t_f = t_N$, and we define the parameter vector as

$$u_p = \{u_i,\ i = 0, \ldots, N-1\}. \tag{2.4}$$

The optimization problem is then as follows. Find the control $\hat{u}(\cdot) \in \mathcal{U}$ that minimizes

$$J(\hat{u}, x_0) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} L(x, \hat{u}(u_p, t), t)\,dt \tag{2.5}$$

subject to

$$\dot{x} = f(x(t), \hat{u}(u_p, t), t), \qquad x(0) = x_0 \text{ given}. \tag{2.6}$$

Thus, the functional minimization problem is transformed into a parameter minimization problem to be solved over the time interval $[t_0, t_f]$. Since the solution to (2.6) is the state as a function of $u_p$, i.e., $x(t) = \hat{x}(u_p, t)$, the cost criterion is

$$J(\hat{u}(u_p), x_0) = \phi(\hat{x}(u_p, t_f)) + \int_{t_0}^{t_f} L(\hat{x}(u_p, t), \hat{u}(u_p, t), t)\,dt. \tag{2.7}$$

The parameter minimization problem is to minimize $J(\hat{u}(u_p), x_0)$ with respect to $u_p$. Because we have made assumptions about the form of the control function, this will produce a result that is suboptimal. However, when care is taken, the result will be close to optimal.

Example 2.1.1 As a simple example, consider a variant of the brachistochrone problem, first proposed by John Bernoulli in 1696. As shown in Figure 2.1, a bead is sliding on a frictionless wire from an initial point O to some point on the wall at a known $r = r_f$. The problem is to find the shape of the wire such that the bead arrives at the wall in minimum time.

Figure 2.1: A brachistochrone problem.

In this problem, the control function is $\theta(t)$, and the system equations are

$$\dot{z} = v\sin\theta, \qquad z(0) = 0, \quad z(t_f) \text{ free},$$
$$\dot{r} = v\cos\theta, \qquad r(0) = 0, \quad r(t_f) = r_f,$$
$$\dot{v} = g\sin\theta, \qquad v(0) = 0, \quad v(t_f) \text{ free},$$

where $g$ is the constant acceleration due to gravity, and the initial point O is taken to be the origin. The performance index to be minimized is simply $J(\hat\theta, O) = t_f$.

The control can be parameterized in this case as a function of $r$ more easily than as a function of time, as the final time is not known. To make the example more concrete, let $r_f = 1$ and assume a simple approximation by dividing the interval into halves, with the parameters being the slopes at the beginning, midpoint, and end,

$$u_p = \{u_0, u_1, u_2\} = \{\theta(0), \theta(0.5), \theta(1)\},$$

so that

$$\hat\theta(r) = \begin{cases} u_0 + \dfrac{r}{0.5}(u_1 - u_0), & 0 \le r \le 0.5, \\[1ex] u_1 + \dfrac{r - 0.5}{0.5}(u_2 - u_1), & 0.5 < r \le 1. \end{cases}$$

The problem is now converted to minimization of the final time over these three independent variables.
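Before developing that theory, it may help to see the reduced problem concretely. The sketch below (not from the book) evaluates the cost $J = t_f$ for a given parameter vector $u_p$ by integrating the bead dynamics with the interpolated slope profile $\hat\theta(r)$; the step size, the guard time, and the sample parameter values are illustrative assumptions. A parameter optimizer of the kind developed in the following sections would then minimize this function over $(u_0, u_1, u_2)$.

```python
import math

# A minimal sketch (not from the book): evaluate the parameterized
# brachistochrone cost J = t_f for a given u_p = (u0, u1, u2). The slope
# theta_hat(r) is the piecewise-linear interpolation of Example 2.1.1.
def final_time(up, g=9.81, dt=1e-4, rf=1.0):
    u0, u1, u2 = up
    z = r = v = t = 0.0
    while r < rf:
        if r <= 0.5:
            th = u0 + (r / 0.5) * (u1 - u0)
        else:
            th = u1 + ((r - 0.5) / 0.5) * (u2 - u1)
        z += dt * v * math.sin(th)          # z is measured downward
        r += dt * v * math.cos(th)
        v += dt * g * math.sin(th)
        t += dt
        if t > 10.0:                        # guard: parameters that never reach the wall
            return math.inf
    return t

print(final_time((1.2, 0.7, 0.3)))          # e.g. steep start, shallow finish
```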
In the next sections, we develop the theory of parameter optimization.

2.2 Unconstrained Minimization

Consider a cost criterion that is a scalar function $\phi(\cdot)$ of a single variable $x$ for $x \in [x_a, x_b]$.

Figure 2.2: Definition of extremal points.

The interior extremal points for this function are, as shown in Figure 2.2: point 2, a relative maximum; point 3, an inflection (saddle) point; and point 4, a relative minimum (absolute for $x \in [x_a, x_b]$). The boundary point extrema are point 1, a relative minimum, and point 5, a relative maximum (absolute for $x \in [x_a, x_b]$).

Remark 2.2.1 We consider first only interior extremal points.

Assumption 2.2.1 Assume that $\phi(x)$ is continuously differentiable everywhere in $[x_a, x_b]$.

The assumption avoids functions such as that shown in Figure 2.3.

Figure 2.3: Function with a discontinuous derivative.

2.2.1 Scalar Case

To focus on the essential notions of determining both local first- and second-order necessary and sufficiency conditions, a scalar problem is used. These ideas are applied throughout the book.

First-Order Necessary Conditions

The following theorem and its proof set the style and notation for the analysis that is used in more complex problems.

Theorem 2.2.1 Let $\Omega = (x_a, x_b)$. Let the cost criterion $\phi: \mathbb{R} \to \mathbb{R}$ be a differentiable function. Let $x^o$ be an optimal solution of the optimization problem

$$\min_x \phi(x) \quad \text{subject to } x \in \Omega. \tag{2.8}$$

Then it is necessary that

$$\frac{\partial\phi}{\partial x}(x^o) = 0. \tag{2.9}$$

Remark 2.2.2 This is a first-order necessary condition for stationarity (a local (relative) minimum, a maximum, or a saddle (inflection) point; see Figure 2.2).

Proof: Since $x^o \in \Omega$ and $\Omega$ is an open interval, there exists an $\epsilon > 0$ such that $x \in \Omega$ whenever $|x - x^o| < \epsilon$. This implies that for all $x^o + \gamma\alpha \in \Omega$, where for any $\alpha \in \mathbb{R}$ and $0 \le \gamma \le \eta$ ($\eta$ is determined by $\alpha$), we have

$$\phi(x^o) \le \phi(x^o + \gamma\alpha). \tag{2.10}$$

Since $\phi$ is differentiable, by Taylor's Theorem (see Appendix A.1.2)

$$\phi(x^o + \gamma\alpha) = \phi(x^o) + \gamma\,\frac{\partial\phi}{\partial x}(x^o)\,\alpha + O(\gamma), \tag{2.11}$$

where $O(\gamma)$ denotes terms of order greater than $\gamma$ such that

$$\frac{O(\gamma)}{\gamma} \to 0 \quad \text{as } \gamma \to 0. \tag{2.12}$$

Substitution of (2.11) into (2.10) yields

$$0 \le \gamma\,\frac{\partial\phi}{\partial x}(x^o)\,\alpha + O(\gamma). \tag{2.13}$$

Dividing this by $\gamma > 0$ gives

$$0 \le \frac{\partial\phi}{\partial x}(x^o)\,\alpha + \frac{O(\gamma)}{\gamma}. \tag{2.14}$$

Let $\gamma \to 0$ to yield

$$0 \le \frac{\partial\phi}{\partial x}(x^o)\,\alpha. \tag{2.15}$$

Since the inequality must hold for all $\alpha$, in particular for both positive and negative values of $\alpha$, this implies

$$\frac{\partial\phi}{\partial x}(x^o) = 0. \tag{2.16}$$

Second Variation

Suppose $\phi$ is twice differentiable and let $x^o \in \Omega$ be an optimal or even a locally optimal solution. Then $\partial\phi/\partial x(x^o) = \phi_x(x^o) = 0$, and by Taylor's Theorem

$$\phi(x^o + \gamma\alpha) = \phi(x^o) + \tfrac{1}{2}\gamma^2\phi_{xx}(x^o)\alpha^2 + O(\gamma^2), \tag{2.17}$$

where

$$\frac{O(\gamma^2)}{\gamma^2} \to 0 \quad \text{as } \gamma \to 0. \tag{2.18}$$

For $\gamma$ sufficiently small,

$$\phi(x^o) \le \phi(x^o + \gamma\alpha) = \phi(x^o) + \tfrac{1}{2}\gamma^2\phi_{xx}(x^o)\alpha^2 + O(\gamma^2) \tag{2.19}$$

$$\Rightarrow\quad 0 \le \tfrac{1}{2}\phi_{xx}(x^o)\alpha^2 + \frac{O(\gamma^2)}{\gamma^2} \tag{2.20}$$

after dividing by $\gamma^2 > 0$. For $\gamma^2 \to 0$, this yields

$$\tfrac{1}{2}\phi_{xx}(x^o)\alpha^2 \ge 0 \tag{2.21}$$

for all $\alpha$. This means that $\phi_{xx}(x^o)$ is nonnegative (see Appendix A.2.5 for a discussion of quadratic forms and definite matrices) and is another necessary condition. Equation (2.21) is known as a second-order necessary condition or a convexity condition.

Sufficient Condition for a Local Minimum

Suppose that $x^o \in \Omega$, $\phi_x(x^o) = 0$, and $\phi_{xx}(x^o) > 0$ (strictly positive). Then from

$$\phi(x^o) < \phi(x^o + \gamma\alpha) \quad \text{for all } x^o + \gamma\alpha \in \Omega, \tag{2.22}$$

we can conclude that $x^o$ is a local minimum (see Figure 2.2).

Remark 2.2.3 If the second variation dominates all other terms in the Taylor series (2.19), then it is called strongly positive, and $\phi_x(x^o) = 0$ and $\phi_{xx}(x^o) > 0$ are necessary and sufficient conditions for a local minimum. This concept becomes nontrivial in functional optimization, discussed in Chapter 5.

Higher-Order Variations

For $\alpha \in \mathbb{R}$ and $0 \le \gamma \le \eta$, denote the change in $\phi$ as

$$\Delta\phi = \phi(x^o + \gamma\alpha) - \phi(x^o). \tag{2.23}$$

Expanding this into a Taylor series gives

$$\Delta\phi = \delta\phi + \frac{1}{2}\delta^2\phi + \frac{1}{3!}\delta^3\phi + \frac{1}{4!}\delta^4\phi + \cdots, \tag{2.24}$$

where

$$\delta\phi = \phi_x(x^o)\gamma\alpha, \qquad \delta^2\phi = \phi_{xx}(x^o)(\gamma\alpha)^2, \quad \text{etc.} \tag{2.25}$$

Suppose that $\phi_x(x^o) = 0$ and also $\phi_{xx}(x^o) = 0$. If $\phi_{xxx}(x^o) \ne 0$, the extremal is a saddle. If $\phi_{xxx}(x^o) = 0$ and $\phi_{xxxx}(x^o) > 0$, the extremal is a local minimum.
These conditions can be seen, respectively, in the examples $\phi(x) = x^3$ and $\phi(x) = x^4$.

Note that the conditions for a maximum can be obtained from those for a minimum by replacing $\phi$ with $-\phi$. Hence $\phi_x(x^o) = 0$, $\phi_{xx}(x^o) \le 0$ are necessary conditions for a local maximum, and $\phi_x(x^o) = 0$, $\phi_{xx}(x^o) < 0$ are sufficient conditions for a local maximum.

2.2.2 Numerical Approaches to One-Dimensional Minimization

In this section we present two common numerical methods for finding the point at which a function is minimized. This will clarify what has just been presented. We make the tacit assumption that the functions involved are well behaved and satisfy continuity and smoothness conditions. For more complete descriptions of numerical optimization, see such specialized texts as [23] and [36].

Golden Section Searches

Suppose that it is known that a minimum of the function $\phi(x)$ exists on the interval $(a, b)$. The only way to be certain that an interval $(a, b)$ contains a minimum is to have some $\bar{x} \in (a, b)$ such that $\phi(\bar{x}) < \phi(a)$ and $\phi(\bar{x}) < \phi(b)$. Assuming that there is only one minimum in the interval, the first step in finding its precise location is to find whether the minimizer is in one of the subintervals $(a, \bar{x}]$ or $[\bar{x}, b)$. (The subintervals are partly closed because it is possible that $\bar{x}$ is the minimizer.) To find out, we apply the same criterion to one of the subintervals. That is, we choose a test point $x_t \in (a, b)$, $x_t \ne \bar{x}$, and evaluate the function at that point. Suppose that $x_t < \bar{x}$. We can then check to see if $\phi(x_t) < \phi(\bar{x})$. If it is, we know that the minimum lies in the interval $(a, \bar{x})$. If $\phi(\bar{x}) < \phi(x_t)$, then the minimum must lie in the interval $(x_t, b)$. Note that due to our strong assumption about a single minimum, $\phi(\bar{x}) = \phi(x_t)$ implies that the minimum is in the interval $(x_t, \bar{x})$.

What is special about the golden section search is the way in which the test points are chosen. The golden ratio has the value

$$\mathcal{G} = \frac{\sqrt{5} - 1}{2} \approx 0.61803\ldots.$$

Given the points $a$ and $b$ bracketing a minimum, we choose two additional points $x_1$ and $x_2$ as

$$x_1 = b - \mathcal{G}(b - a), \qquad x_2 = a + \mathcal{G}(b - a),$$

which gives us four points in the order $a, x_1, x_2, b$. Now suppose that $\phi(x_1) < \phi(x_2)$. Then we know that the location of the minimum is between $a$ and $x_2$. Conversely, if $\phi(x_2) < \phi(x_1)$, the minimum lies between $x_1$ and $b$. In either case, we are left with three points, and the interior one of these points is already in the right position to be used in the next iteration. In the first case, for example, the new interval is $(a, x_2)$, and the point $x_1$ satisfies the relationship $x_1 = a + \mathcal{G}(x_2 - a)$.

This leads to the following algorithm: Given the points $a$, $x_1$, $x_2$, and $b$ and the corresponding values of the function, then

1. If $\phi(x_1) \le \phi(x_2)$, then
   (a) Set $b = x_2$, and $\phi(b) = \phi(x_2)$.
   (b) Set $x_2 = x_1$, and $\phi(x_2) = \phi(x_1)$.
   (c) Set $x_1 = b - \mathcal{G}(b - a)$, and compute $\phi(x_1)$. (Note: Use the value of $b$ after updating as in 1(a).)
2. Else
   (a) Set $a = x_1$, and $\phi(a) = \phi(x_1)$.
   (b) Set $x_1 = x_2$, and $\phi(x_1) = \phi(x_2)$.
   (c) Set $x_2 = a + \mathcal{G}(b - a)$, and compute $\phi(x_2)$. (Note: Use the value of $a$ after updating as in 2(a).)
3. If the length of the interval is sufficiently small, then
   (a) If $\phi(x_1) \le \phi(x_2)$, return $x_1$ as the minimizer.
   (b) Else return $x_2$ as the minimizer.
4. Else go to 1.
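A minimal sketch of this procedure in code (not from the book) follows; $\phi$ is assumed unimodal on $(a, b)$, and the tolerance and test function are illustrative choices.

```python
# A minimal sketch (not from the book) of the golden section algorithm above.
G = (5 ** 0.5 - 1) / 2   # the golden ratio constant used in the text

def golden_section(phi, a, b, tol=1e-8):
    x1 = b - G * (b - a)
    x2 = a + G * (b - a)
    f1, f2 = phi(x1), phi(x2)
    while b - a > tol:
        if f1 <= f2:
            b, x2, f2 = x2, x1, f1      # step 1: minimum is in (a, x2)
            x1 = b - G * (b - a)
            f1 = phi(x1)
        else:
            a, x1, f1 = x1, x2, f2      # step 2: minimum is in (x1, b)
            x2 = a + G * (b - a)
            f2 = phi(x2)
    return x1 if f1 <= f2 else x2       # step 3: return the better point

print(golden_section(lambda x: (x - 0.3) ** 2, 0.0, 1.0))   # -> about 0.3
```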
Note: The assumption that the function is well behaved implies that at least one of $\phi(x_1) < \phi(a)$ or $\phi(x_2) < \phi(b)$ is true. Furthermore, "well behaved" implies that the second derivative $\phi_{xx}(\bar{x}) > 0$ and that $\phi_x = 0$ only at $\bar{x}$ on the interval.

Newton Iteration

The golden section search is simple and reliable. However, it requires knowledge of an interval containing the minimum. It also converges linearly; that is, the size of the interval containing the minimum is reduced by the same ratio (in this case $\mathcal{G}$) at each step.

Consider instead the point $\bar{x}$ and assume that the function can be well approximated by the first few terms of the Taylor expansion about that point. That is,

$$\phi(x) = \phi(\bar{x} + h) \approx \phi(\bar{x}) + \phi_x(\bar{x})h + \frac{\phi_{xx}(\bar{x})}{2}h^2. \tag{2.26}$$

Minimizing this expression over $h$ gives

$$h = -\frac{\phi_x(\bar{x})}{\phi_{xx}(\bar{x})}.$$

The method proceeds iteratively as

$$x_{i+1} = x_i - \frac{\phi_x(x_i)}{\phi_{xx}(x_i)}.$$

It can be shown that near the minimum this method converges quadratically; that is, $|x_{i+1} - x^o| \sim |x_i - x^o|^2$. However, if the assumption (2.26) does not hold, the method will diverge quickly.

2.2.3 Functions of n Independent Variables: First-Order Conditions

In this section the cost criterion $\phi(\cdot)$ to be minimized is a function of an $n$-vector $x$. In order to characterize the length of the vector, the notion of a norm is introduced. (See Appendix A.2.9 for a more complete description.) For example, define the Euclidean norm as

$$\|x\| = (x^T x)^{\frac{1}{2}}.$$

Theorem 2.2.2 Suppose $x \in \mathbb{R}^n$, where $x = [x_1, \ldots, x_n]^T$. Let $\phi(x): \mathbb{R}^n \to \mathbb{R}$ be differentiable. Let $\Omega$ be an open subset of $\mathbb{R}^n$. Let $x^o$ be an optimal solution to the problem²

$$\min_x \phi(x) \quad \text{subject to } x \in \Omega. \tag{2.27}$$

Then

$$\frac{\partial\phi}{\partial x}\bigg|_{x=x^o} = \phi_x(x^o) = 0, \tag{2.28}$$

where $\phi_x(x^o) = [\phi_{x_1}(x^o), \ldots, \phi_{x_n}(x^o)]$ is a row vector.

² This implies that $x^o \in \Omega$.

Proof: Since $x^o \in \Omega$ and $\Omega$ is an open subset of $\mathbb{R}^n$, there exists an $\epsilon > 0$ such that $x \in \Omega$ whenever $x$ belongs to an $n$-dimensional ball $\|x - x^o\| < \epsilon$ (or an $n$-dimensional box $|x_i - x^o_i| < \epsilon_i$, $i = 1, \ldots, n$). Therefore, for every vector $\alpha \in \mathbb{R}^n$ there is an $\eta > 0$ ($\eta$ depends upon $\alpha$) such that

$$(x^o + \gamma\alpha) \in \Omega \quad \text{whenever } 0 \le \gamma \le \eta, \tag{2.29}$$

where $\eta$ is related to $\|\alpha\|$. Since $x^o$ is optimal, we must then have

$$\phi(x^o) \le \phi(x^o + \gamma\alpha) \quad \text{whenever } 0 \le \gamma \le \eta. \tag{2.30}$$

Since $\phi$ is once continuously differentiable, by Taylor's Theorem (Equation (A.48)),

$$\phi(x^o + \gamma\alpha) = \phi(x^o) + \phi_x(x^o)\gamma\alpha + O(\gamma), \tag{2.31}$$

where $O(\gamma)$ is the remainder term, and $O(\gamma)/\gamma \to 0$ as $\gamma \to 0$. Substituting (2.31) into the inequality (2.30) yields

$$0 \le \phi_x(x^o)\gamma\alpha + O(\gamma). \tag{2.32}$$

Dividing this by $\gamma$ and letting $\gamma \to 0$ gives

$$0 \le \phi_x(x^o)\alpha. \tag{2.33}$$

Since the inequality must hold for all $\alpha \in \mathbb{R}^n$, we have

$$\phi_x(x^o) = 0. \tag{2.34}$$

Remark 2.2.4 Note that $\phi_x(x^o) = 0$ gives $n$ nonlinear equations in $n$ unknowns,

$$\phi_{x_1}(x^o) = 0, \ldots, \phi_{x_n}(x^o) = 0. \tag{2.35}$$

These can be solved for $x^o$, but doing so could be a difficult numerical procedure.

Remark 2.2.5 Sometimes, instead of $\gamma\alpha$, the variation can be written as $x - x^o = \delta x = \gamma\alpha$, but instead of dividing by $\gamma$, we can divide by $\|\delta x\|$.

2.2.4 Functions of n Independent Variables: Second-Order Conditions

Suppose $\phi$ is twice differentiable. Let $x^o \in \Omega$ be a local minimum. Then $\phi_x(x^o) = 0$, and by Taylor's expansion (see Appendix A.2.10)

$$\phi(x^o + \gamma\alpha) = \phi(x^o) + \tfrac{1}{2}\gamma^2\alpha^T\phi_{xx}(x^o)\alpha + O(\gamma^2), \tag{2.36}$$

where $O(\gamma^2)/\gamma^2 \to 0$ as $\gamma \to 0$. Note that $\phi_{xx} = (\phi_x^T)_x$ is a symmetric matrix,

$$\phi_{xx} = \begin{bmatrix} \phi_{x_1 x_1} & \cdots & \phi_{x_1 x_n} \\ \vdots & \ddots & \vdots \\ \phi_{x_n x_1} & \cdots & \phi_{x_n x_n} \end{bmatrix}. \tag{2.37}$$

For $\gamma > 0$ sufficiently small,

$$\phi(x^o) \le \phi(x^o + \gamma\alpha) = \phi(x^o) + \tfrac{1}{2}\gamma^2\alpha^T\phi_{xx}(x^o)\alpha + O(\gamma^2) \tag{2.38}$$

$$\Rightarrow\quad 0 \le \tfrac{1}{2}\gamma^2\alpha^T\phi_{xx}(x^o)\alpha + O(\gamma^2). \tag{2.39}$$

Dividing through by $\gamma^2$ and letting $\gamma \to 0$ gives

$$\tfrac{1}{2}\alpha^T\phi_{xx}(x^o)\alpha \ge 0. \tag{2.40}$$

As shown in Appendix A.2.5, this means that

$$\phi_{xx}(x^o) \ge 0 \tag{2.41}$$

(nonnegative definite). This is a necessary condition for a local minimum. The sufficient conditions for a local minimum are

$$\phi_x(x^o) = 0, \qquad \phi_{xx}(x^o) > 0 \tag{2.42}$$

(positive definite). These conditions are sufficient because the second variation dominates the Taylor expansion; i.e., if $\phi_{xx}(x^o) > 0$ there always exists a $\gamma$ such that $O(\gamma^2)/\gamma^2 \to 0$ as $\gamma \to 0$.

Suppose $\phi_{xx}(x^o)$ is positive definite. Then (2.40) is satisfied with strict inequality, and the quadratic form has a nice geometric interpretation as an $n$-dimensional ellipsoid defined by $\alpha^T\phi_{xx}(x^o)\alpha = b$, where $b$ is a given positive scalar constant.

Example 2.2.1 Consider the performance criterion (or performance index)

$$\phi(x_1, x_2) = (x_1^2 + x_2^2)/2.$$

Application of the first-order necessary conditions gives

$$\phi_{x_1} = 0 \Rightarrow x_1^o = 0, \qquad \phi_{x_2} = 0 \Rightarrow x_2^o = 0.$$

Check to see if $(x_1^o, x_2^o)$ is a minimum. Using the second variation conditions,

$$\phi_{xx} = \begin{bmatrix} \phi_{x_1 x_1} & \phi_{x_1 x_2} \\ \phi_{x_2 x_1} & \phi_{x_2 x_2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

is positive definite because the diagonal elements are positive and the determinant of the matrix itself is positive. Alternatively, the eigenvalues of $\phi_{xx}$ must be positive (see Appendix A.2.5).

Example 2.2.2 Consider the performance index

$$\phi(x_1, x_2) = x_1 x_2.$$

Application of the first-order necessary conditions gives

$$\phi_{x_1} = 0 \Rightarrow x_2^o = 0, \qquad \phi_{x_2} = 0 \Rightarrow x_1^o = 0.$$

Check to see if $(x_1^o, x_2^o)$ is a minimum. Using the second variation conditions,

$$\phi_{xx} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\Rightarrow\quad |\phi_{xx} - \lambda I| = \begin{vmatrix} -\lambda & 1 \\ 1 & -\lambda \end{vmatrix} = \lambda^2 - 1 = 0.$$

Since the eigenvalues $\lambda = 1, -1$ are mixed in sign, the matrix $\phi_{xx}$ is called indefinite.
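Conditions (2.41)-(2.42) reduce to an eigenvalue test. The sketch below (not from the book) checks the two example Hessians numerically; it assumes the numpy library.

```python
import numpy as np

# A minimal sketch (not from the book): test the second-order conditions of
# Examples 2.2.1 and 2.2.2 by computing the eigenvalues of phi_xx.
H1 = np.array([[1.0, 0.0], [0.0, 1.0]])   # Hessian of (x1^2 + x2^2)/2
H2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # Hessian of x1*x2

for name, H in (("Example 2.2.1", H1), ("Example 2.2.2", H2)):
    eig = np.linalg.eigvalsh(H)           # eigenvalues of a symmetric matrix
    verdict = "positive definite" if np.all(eig > 0) else "indefinite"
    print(f"{name}: eigenvalues {eig} -> {verdict}")
```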
2.2.5 Numerical Optimization Schemes

Three numerical optimization techniques are described: a first-order method called steepest descent, a second-order method known as the Newton–Raphson method, and a method that is somewhere between these in numerical complexity and rate of convergence, denoted here as the accelerated gradient method.

Steepest Descent (or Gradient) Method

A numerical optimization method is presented based on making small perturbations in the cost criterion function about a nominal value of the state vector. Small improvements are then made iteratively in the value of the cost criterion. These small improvements are constructed by assuming that the functions evaluated with respect to these small perturbations are essentially linear and can thereby predict the improvement. If the actual change and the predicted change do not match within given tolerances, then the size of the small perturbations is adjusted. Consider the problem

$$\min_x \phi(x). \tag{2.43}$$

Let $x_i$ be the value of the $x$ vector at the $i$th iteration. Perturbing $x$ gives

$$\phi(x) - \phi(x_i) = \Delta\phi(x) = \phi_x(x_i)\delta x + O(\|\delta x\|), \qquad \delta x = x - x_i. \tag{2.44}$$

Choose $\delta x_i = -\epsilon_i\phi_x^T(x_i)$, such that $x_{i+1} = x_i + \delta x_i$, and

$$\Delta\phi(x_{i+1}) = -\epsilon_i\phi_x(x_i)\phi_x^T(x_i) + O(\epsilon_i), \tag{2.45}$$

where the value chosen for $\epsilon_i$ is sufficiently small that the assumed linearity remains valid and the cost criterion decreases as shown in (2.45). As the local minimum is approached, the gradient converges as

$$\lim_{i\to\infty} \phi_x(x_i) \to 0. \tag{2.46}$$

For a quadratic function, the steepest descent method converges in an infinite number of steps. This is because the step size, as expressed by its norm $\|\delta x_i\| = \epsilon_i\|\phi_x^T(x_i)\|$, becomes vanishingly small.

Newton–Raphson Method

Assume that near the minimum the gradient method is converging slowly. To correct this, expand $\phi(x)$ to second order about the iteration value $x_i$ as

$$\Delta\phi(x) = \phi_x(x_i)\delta x + \tfrac{1}{2}\delta x^T\phi_{xx}(x_i)\delta x + O(\|\delta x\|^2), \tag{2.47}$$

where $\delta x = x - x_i$. Assuming that $\phi_{xx}(x_i) > 0$, we get

$$\min_{\delta x}\left[\phi_x(x_i)\delta x + \tfrac{1}{2}\delta x^T\phi_{xx}(x_i)\delta x\right] \quad\Rightarrow\quad \delta x_i = -\phi_{xx}^{-1}(x_i)\phi_x^T(x_i), \tag{2.48}$$

giving

$$\Delta\phi(x_{i+1}) = -\tfrac{1}{2}\phi_x(x_i)\phi_{xx}^{-1}(x_i)\phi_x^T(x_i) + O(\|\delta x_i\|^2). \tag{2.49}$$

Note that if $\phi$ is quadratic, the Newton–Raphson method converges to the minimum in one step.

Accelerated Gradient Methods

Since it is numerically inefficient to compute $\phi_{xx}(x_i)$, this second partial derivative can be estimated by constructing $n$ independent directions from a sequence of gradients $\phi_x^T(x_i)$, $i = 1, 2, \ldots, n$. For a quadratic function, this class of numerical optimization algorithms, called accelerated gradient methods, converges in $n$ steps. The most common of these methods are the quasi-Newton methods, so called because as the estimate of $\phi_{xx}(x_i)$, called the Hessian, approaches the actual value, the method approaches the Newton–Raphson method. The first and possibly most famous of these methods is still in popular use for solving unconstrained parameter optimization problems. It is known as the Davidon–Fletcher–Powell method [17] and dates from 1959. The method proceeds as a Newton–Raphson method where the inverse of $\phi_{xx}(x_i)$ is also estimated from the gradients and used as though it were the actual inverse of the Hessian. The most common implementation, described briefly here, uses a modified method of updating the estimate, known as the Broyden–Fletcher–Goldfarb–Shanno, or BFGS, update [9].

Let $B_i$ be the estimate of $\phi_{xx}(x_i)$ at the $i$th iteration and $g_i$ be the gradient $\phi_x^T(x_i)$. The method proceeds by computing the search direction $s_i$ from

$$B_i s_i = -g_i \quad\Rightarrow\quad s_i = -B_i^{-1}g_i.$$

A one-dimensional search (using, possibly, the golden section search, Section 2.2.2) is performed along this direction, and the minimum found is taken as the next nominal set of parameters, $x_{i+1}$. The estimate of the inverse of the Hessian is then updated as

$$H_i = B_i^{-1}, \qquad \Delta g = g_{i+1} - g_i, \tag{2.50}$$

$$H_{i+1} = H_i - \frac{H_i\,\Delta g\,\Delta g^T H_i}{\Delta g^T H_i\,\Delta g} + \frac{s_i s_i^T}{\Delta g^T s_i}, \tag{2.51}$$

where $B_i > 0$. It can be shown that the method converges in $n$ steps for a quadratic function and that, for general functions, $B_i$ converges to $\phi_{xx}(x^o)$ as $x_i \to x^o$ (assuming that $\phi_{xx}(x^o) > 0$).

For larger systems, a class of methods known as conjugate gradient methods requires less storage and also converges in $n$ steps for quadratic functions. These methods converge less quickly for general functions, but since they do not require storing the Hessian estimate, they are preferred for very large systems. Many texts on these and other optimization methods (for example, [23] and [5]) give detailed discussions.
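The following sketch (not from the book) implements the inverse-Hessian update (2.50)-(2.51), with a crude backtracking rule standing in for the one-dimensional search described above and with the actual step taken playing the role of $s_i$; the test quadratic, starting point, tolerance, and iteration limit are illustrative assumptions.

```python
import numpy as np

def quasi_newton(phi, grad, x, iters=25, tol=1e-10):
    """Quasi-Newton iteration sketch using the inverse-Hessian update (2.50)-(2.51)."""
    H = np.eye(len(x))                 # initial inverse-Hessian estimate
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:    # gradient has converged, cf. (2.46)
            break
        s = -H @ g                     # search direction s_i = -B_i^{-1} g_i
        t = 1.0
        while phi(x + t * s) > phi(x): # backtracking stand-in for a line search
            t *= 0.5
        step = t * s                   # the step actually taken along s_i
        g_new = grad(x + step)
        dg = g_new - g                 # Delta g of (2.50)
        Hdg = H @ dg                   # update (2.51)
        H = H - np.outer(Hdg, Hdg) / (dg @ Hdg) + np.outer(step, step) / (dg @ step)
        x, g = x + step, g_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])                  # positive definite test quadratic
phi = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(quasi_newton(phi, grad, np.array([4.0, -3.0])))   # -> near [0, 0]
```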
2.3 Minimization Subject to Constraints

The constrained parameter minimization problem is

$$\min_{x,u} \phi(x, u) \quad \text{subject to } \psi(x, u) = 0, \tag{2.52}$$

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, and $\psi(x, u)$ is a known $n$-dimensional vector of functions. Note that the cost criterion is minimized with respect to $n + m$ parameters. For ease of presentation the parameter vector is arbitrarily decomposed into two vectors $(x, u)$.

The point of this section is to convert a constrained problem to an unconstrained problem and then apply the results on necessity and sufficiency. We often choose $\psi(x, u) = f(x, u) - c = 0$, where $f(x, u)$ is a known $n$-dimensional vector of functions and $c \in \mathbb{R}^n$ is a given vector, so that different levels of the constraint can be examined. To illustrate the ideas and the methodology for obtaining necessary and sufficient conditions for optimality, we begin with a simple example, which is then extended to the general case. In this example the constrained optimization problem is transformed into an unconstrained problem, for which conditions for optimality were given in the previous sections. We will then relate this approach to the classical Lagrange multiplier method.

2.3.1 Simple Illustrative Example

Find the rectangle of maximum area inscribed in an ellipse defined by

$$f(x, u) = \frac{x^2}{a^2} + \frac{u^2}{b^2} = c, \tag{2.53}$$

where $a$, $b$, $c$ are positive constants. The ellipse is shown in Figure 2.4 for $c = 1$.

Figure 2.4: Ellipse definition.

The area of a rectangle is the positive value of $(2x)(2u)$. The optimization problem is

$$\max_u (2x)(2u) = \min_u -4xu = \min_u \phi(x, u) \tag{2.54}$$

subject to

$$(x, u) \in \Omega = \{(x, u) \mid f(x, u) - c = 0\}, \tag{2.55}$$

where this becomes the area when $4xu$ is positive. It is assumed that $x \in \mathbb{R}$, $u \in \mathbb{R}$, $f: \mathbb{R}^2 \to \mathbb{R}$, and $\phi: \mathbb{R}^2 \to \mathbb{R}$.

The choice of $u$ as the minimizing parameter where $x$ satisfies the constraint is arbitrary, and both $x$ and $u$ can be viewed as minimizing the cost criterion $\phi(x, u)$ and satisfying the constraint $\psi = f(x, u) - c = 0$. It is further assumed that $f$ and $\phi$ are once continuously differentiable in each of their arguments. The main difference between the constrained and unconstrained optimization problems is that in the constrained optimization problem, the set $\Omega$ is not an open set. Therefore, if $(x^o, u^o)$ is an extremal, we cannot assert that $\phi(x^o, u^o) \le \phi(x, u)$ for all $(x, u)$ in an open set about $(x^o, u^o)$, since any admissible variation must satisfy the constraint. The procedure is to solve for $x$ in terms of $u$ to give an unconstrained problem in which $u$ belongs to an open set.

Some special considerations must be made on the constraint function $f(x, u)$, which relates $x$ and $u$. Note that for this problem, either $x^o \ne 0$ or $u^o \ne 0$, or both. Let $x^o \ne 0$, so that in a small region about $(x^o, u^o)$, i.e., for $x - x^o = \delta x$ and $u - u^o = \delta u$ with $|\delta x| < \beta$, $|\delta u| < \epsilon$, the change in the constraint is

$$df = f(x^o + \delta x, u^o + \delta u) - f(x^o, u^o) = f_x(x^o, u^o)\delta x + f_u(x^o, u^o)\delta u + O(d) = 0, \tag{2.56}$$

where

$$d = (\delta x^2 + \delta u^2)^{\frac{1}{2}} \quad \text{and} \quad \frac{O(d)}{d} \to 0 \text{ as } d \to 0. \tag{2.57}$$

Then, to first order,

$$\delta f = f_x(x^o, u^o)\delta x + f_u(x^o, u^o)\delta u = 0 \tag{2.58}$$

can be solved as

$$\delta x = -[f_x(x^o, u^o)]^{-1}f_u(x^o, u^o)\delta u \tag{2.59}$$

if $f_x(x^o, u^o) \ne 0$. This implies that $f(x, u) = c$ may be solved for $x$ in terms of $u$. More precisely, if $f(x, u)$ is continuously differentiable and $f_x(x, u)$ is invertible, then the Implicit Function Theorem (see Appendix A.1.1) implies that there exists a rectangle $\bar{V}$, defined by $|x - x^o| < \beta$ and $|u - u^o| < \epsilon$ and shown in Figure 2.5, such that

$$x = g(u), \tag{2.60}$$

where $g(u)$ has continuous derivatives in $|u - u^o| < \epsilon$.

Figure 2.5: Definition of $\bar{V}$.

Obtaining $g(u)$ explicitly may be quite difficult.
However, this turns out not to be necessary, since all that will be required is the implicit representation of $x = g(u)$, which is $f(x, u) = c$. This implies that $x^o = g(u^o)$ and $f(g(u), u) = c$ whenever $|u - u^o| < \epsilon$. Since $(x^o, u^o) = (g(u^o), u^o)$ is an optimal point, it follows that $u^o$ is the optimal solution for

$$\min_u \hat\phi(u) = \min_u \phi(g(u), u) \tag{2.61}$$

subject to $|u - u^o| < \epsilon$. Note that by explicitly eliminating the dependent variable $x$ in terms of the independent variable $u$ through the constraint, the objective function is now minimized on an open set, where $\hat\phi(u)$ is continuously differentiable since $\phi$ and $g(u)$ are continuously differentiable. Therefore, our unconstrained results now apply. That is, using the chain rule,

$$\hat\phi_u(u^o) = \phi_x(x^o, u^o)g_u(u^o) + \phi_u(x^o, u^o) = 0. \tag{2.62}$$

We still need to determine $g_u(u)$. From $f(g(u), u) = c$ we obtain

$$f_x(x^o, u^o)g_u(u^o) + f_u(x^o, u^o) = 0 \tag{2.63}$$

$$\Rightarrow\quad g_u = -\frac{f_u}{f_x}. \tag{2.64}$$

The required first-order necessary condition is obtained by substituting (2.64) into (2.62):

$$\phi_u - \phi_x\frac{f_u}{f_x} = 0 \quad \text{at } (x^o, u^o). \tag{2.65}$$

Note that $g$ need not be determined. The optimal variables $(x^o, u^o)$ are determined from two equations:

$$\phi_u - \phi_x\frac{f_u}{f_x} = 0 \quad\Rightarrow\quad -4x + 4u\,\frac{a^2}{2x}\,\frac{2u}{b^2} = 0, \tag{2.66}$$

$$f(x, u) = c \quad\Rightarrow\quad \frac{x^2}{a^2} + \frac{u^2}{b^2} = c. \tag{2.67}$$

From (2.66) we obtain

$$x - \frac{u^2}{b^2}\frac{a^2}{x} = 0 \quad\Rightarrow\quad \frac{x^2}{a^2} - \frac{u^2}{b^2} = 0 \quad\Rightarrow\quad \frac{x^2}{a^2} = \frac{u^2}{b^2}. \tag{2.68}$$

Then, using (2.67) and (2.68), the extremal parameters satisfy

$$\frac{2u^2}{b^2} = c, \qquad \frac{2x^2}{a^2} = c \tag{2.69}$$

$$\Rightarrow\quad u^o = \pm b\sqrt{\frac{c}{2}}, \qquad x^o = \pm a\sqrt{\frac{c}{2}}. \tag{2.70}$$

There are four extremal solutions, all representing the corners of the same rectangle. The minimum value is

$$\phi^o(c) = \hat\phi^o(c) = -2cab, \tag{2.71}$$

where the dependence of $\phi^o(c)$ on the constraint level $c$ is explicit. The maximum value is $+2cab$.

First-Order Conditions for the Constrained Optimization Problem

We structure the necessary conditions given above by defining a scalar $\lambda$ as

$$\lambda = -\frac{\phi_x}{f_x}\bigg|_{(x^o, u^o)}. \tag{2.72}$$

Then (2.65) becomes

$$\phi_u = -\lambda f_u = -\lambda\psi_u, \tag{2.73}$$

and (2.72) becomes

$$\phi_x = -\lambda f_x = -\lambda\psi_x. \tag{2.74}$$

This means that at the optimal point, the gradient of $\phi$ is normal to the plane tangent to the constraint. This is depicted in Figure 2.6, where the tangent point is at the local minimum $(u^o, x^o)$.

Figure 2.6: Definition of tangent plane.

Finally, note that from (2.72)

$$\lambda = \frac{4u^o}{2x^o/a^2} = \frac{4b\sqrt{\frac{c}{2}}}{\frac{2}{a^2}\left(a\sqrt{\frac{c}{2}}\right)} = 2ab, \tag{2.75}$$

which is related to $\phi^o$ of (2.71) by

$$\lambda = -\frac{\partial\phi^o(c)}{\partial c} = 2ab. \tag{2.76}$$

This shows that $\lambda$ is an influence function relating a change in the optimal cost criterion to a change in the constraint level. We will later show that $\lambda$ is related to the classical Lagrange multiplier.
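A quick numerical check of these conditions (not from the book) is shown below for the illustrative values $a = 2$, $b = 1$, $c = 1$; it confirms the stationarity condition (2.65), the multiplier value $\lambda = 2ab$ of (2.75), and the influence-function property (2.76) via a finite difference on (2.71).

```python
import math

# A minimal sketch (not from the book): verify (2.65), (2.75), and (2.76)
# at the extremal point (2.70) for illustrative constants a, b, c.
a, b, c = 2.0, 1.0, 1.0
x = a * math.sqrt(c / 2)                                   # x^o from (2.70)
u = b * math.sqrt(c / 2)                                   # u^o from (2.70)

constraint = x**2 / a**2 + u**2 / b**2 - c                 # (2.67): should be 0
stationarity = -4 * x - (-4 * u) * (2 * u / b**2) / (2 * x / a**2)   # (2.65)
lam = 4 * u / (2 * x / a**2)                               # (2.75): equals 2ab

dc = 1e-6                                                  # finite-difference check of (2.76)
dphi_dc = ((-2 * (c + dc) * a * b) - (-2 * c * a * b)) / dc
print(constraint, stationarity, lam, -dphi_dc)             # -> 0, 0, 4.0, 4.0
```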
Second-Order Necessary and Sufficient Conditions

In (2.65) the first-order condition for a scalar constraint is given. This, along with the second variation, gives necessary and sufficient conditions for local optimality. Assuming that $\phi$ and $f$ are twice differentiable, then so are $g$ and $\hat{\phi}(u)$. Since $u^o$ lies in an open set, the second-order necessary condition obtained for the unconstrained optimization problem applies here as well. Therefore,
$$\hat{\phi}_{uu}(u^o) = \phi_{uu} + 2\phi_{xu} g_u + \phi_x g_{uu} + \phi_{xx} g_u^2 \ge 0. \qquad (2.77)$$
To determine $g_u$ and $g_{uu}$ we expand $f(g(u), u) = \hat{f}(u) = c$ about $u^o$ and note that the coefficients of $\delta u$ and $\delta u^2$ must be zero,
$$\hat{f}(u^o + \delta u) = \hat{f}(u^o) + \hat{f}_u(u^o)\delta u + \tfrac{1}{2}\hat{f}_{uu}(u^o)\delta u^2 + \cdots = c, \qquad (2.78)$$
where
$$\hat{f}_u = f_x g_u + f_u = 0 \ \Rightarrow\ g_u = -\frac{f_u}{f_x}, \qquad (2.79)$$
$$\hat{f}_{uu} = f_{xx} g_u^2 + f_x g_{uu} + 2 f_{xu} g_u + f_{uu} = 0 \qquad (2.80)$$
$$\Rightarrow\ g_{uu} = -\frac{1}{f_x}\left[f_{uu} - 2 f_{xu}\frac{f_u}{f_x} + f_{xx}\left(\frac{f_u}{f_x}\right)^2\right]. \qquad (2.81)$$
Substitution into $\hat{\phi}_{uu}$ given in (2.77) produces the desired condition
$$\left[\phi_{uu} - \frac{\phi_x}{f_x} f_{uu}\right] - 2\left[\phi_{xu} - \frac{\phi_x}{f_x} f_{xu}\right]\frac{f_u}{f_x} + \left[\phi_{xx} - \frac{\phi_x}{f_x} f_{xx}\right]\left(\frac{f_u}{f_x}\right)^2 \ge 0. \qquad (2.82)$$
By identifying $\lambda = -\phi_x/f_x$,
$$\hat{\phi}_{uu}(u^o) = \left[\phi_{uu} + \lambda f_{uu}\right] - 2\left[\phi_{xu} + \lambda f_{xu}\right]\frac{f_u}{f_x} + \left[\phi_{xx} + \lambda f_{xx}\right]\left(\frac{f_u}{f_x}\right)^2 \ge 0. \qquad (2.83)$$
A necessary condition for a local minimum is that $\hat{\phi}_{uu}(u^o) \ge 0$. The sufficient conditions for a local minimum are $\hat{\phi}_u(u^o) = 0$ and $\hat{\phi}_{uu}(u^o) > 0$.

We now derive the first- and second-order necessary and sufficient conditions by an alternate method, which has sufficient generality that it is used to generate these conditions in the most general case. In particular, for this two-parameter problem we introduce the Lagrange multiplier method. The cost criterion is augmented by the constraint through the Lagrange multiplier as
$$H = \phi(x, u) + \lambda(f(x, u) - c).$$
Expanding the augmented cost $H$ in a Taylor series to second order,
$$H(x^o + \delta x, u^o + \delta u, \lambda + \delta\lambda) - H(x^o, u^o, \lambda) \cong H_x\delta x + H_u\delta u + H_\lambda\delta\lambda + \frac{1}{2}\begin{bmatrix}\delta x & \delta u & \delta\lambda\end{bmatrix}\begin{bmatrix} H_{xx} & H_{xu} & H_{x\lambda} \\ H_{ux} & H_{uu} & H_{u\lambda} \\ H_{\lambda x} & H_{\lambda u} & H_{\lambda\lambda}\end{bmatrix}\begin{bmatrix}\delta x \\ \delta u \\ \delta\lambda\end{bmatrix}, \qquad (2.84)$$
where the quadratic term is denoted $\frac{1}{2}\delta^2 H$, and where from (2.73) $H_u = 0$, from (2.74) $H_x = 0$, and $H_\lambda = f(x, u) - c = 0$. There is no requirement that the second-order term be positive semidefinite for arbitrary variations in $(x, u, \lambda)$. Intuitively, the requirement is that the function $\phi$ take on a minimum value on the tangent plane of the constraint. This is enforced by using the relation between $\delta x$ and $\delta u$,
$$\delta x = -\frac{f_u(x^o, u^o)}{f_x(x^o, u^o)}\,\delta u. \qquad (2.85)$$
If this is substituted into the quadratic form in (2.84), the quadratic form reduces to (note $H_{\lambda\lambda} \equiv 0$)
$$\delta^2 H = \delta u\left[H_{xx}\left(\frac{f_u}{f_x}\right)^2 - 2H_{xu}\frac{f_u}{f_x} + H_{uu}\right]\delta u \ge 0, \qquad (2.86)$$
where the coefficient of $\delta\lambda$ becomes identically zero, i.e., $f_x\delta x + f_u\delta u = 0$. The coefficient of the quadratic in (2.86) is identical to (2.83). For the particular example of finding the largest rectangle inscribed in an ellipse, the second variation of (2.83) is evaluated as $\hat{\phi}_{uu}(u^o) = 16a/b > 0$, ensuring that $\phi$ attains a locally constrained minimum at $u^o$, i.e., that the area $4xu$ is a local maximum.

2.3.2 General Case: Functions of n Variables

Theorem 2.3.1 Let $f_i : \mathbb{R}^{n+m} \to \mathbb{R}$, $i = 1, \ldots, n$, be $n$ continuously differentiable constraints and $\phi : \mathbb{R}^{n+m} \to \mathbb{R}$ be the continuously differentiable performance index. Let $x^o \in \mathbb{R}^n$ and $u^o \in \mathbb{R}^m$ be the optimal variables of the problem
$$\phi^o = \min_{x, u} \phi(x, u) \qquad (2.87)$$
subject to $f_i(x, u) = c_i$, $i = 1, \ldots, n$, or $f(x, u) = c$. Suppose that at $(x^o, u^o)$ the $n \times n$ matrix $f_x(x^o, u^o)$ is nonsingular. Then there exists a vector $\lambda \in \mathbb{R}^n$ such that
$$\phi_x(x^o, u^o) = -\lambda^T f_x(x^o, u^o), \qquad (2.88)$$
$$\phi_u(x^o, u^o) = -\lambda^T f_u(x^o, u^o). \qquad (2.89)$$
Furthermore, if $(x^o(c), u^o(c))$ are once continuously differentiable functions of $c = [c_1, \ldots, c_n]^T$, then $\phi^o(c)$ is a differentiable function of $c$ and
$$\lambda^T = -\frac{\partial \phi^o(c)}{\partial c}. \qquad (2.90)$$
Remark 2.3.1 We choose $\psi(x, u) = f(x, u) - c = 0$ without loss of generality so that different levels of the constraint $c$ can be examined and related to $\phi^o$ as given in (2.90).

Proof: Since $f_x(x^o, u^o)$ is nonsingular, by the Implicit Function Theorem (see Section A.1.1) there exist an $\epsilon > 0$, an open set $V \subset \mathbb{R}^{n+m}$ containing $(x^o, u^o)$, and a differentiable function $g : U \to \mathbb{R}^n$, where $U = \{u : \|u - u^o\| < \epsilon\}$.³ This means that
$$f(x, u) = c, \qquad (x^T, u^T) \in V, \qquad (2.91)$$
implies that
$$x = g(u) \quad \text{for } u \in U. \qquad (2.92)$$
Furthermore, $g(u)$ has a continuous derivative for $u \in U$. Since $(x^o, u^o) = (g(u^o), u^o)$ is optimal, it follows that $u^o$ is an optimal variable for a new optimization problem defined by
$$\min_u \phi(g(u), u) = \min_u \hat{\phi}(u) \quad \text{subject to } u \in U. \qquad (2.93)$$
$U$ is an open subset of $\mathbb{R}^m$ and $\hat{\phi}$ is a differentiable function on $U$, since $\phi$ and $g$ are differentiable. Therefore, Theorem 2.2.2 is applicable, and by the chain rule
$$\hat{\phi}_u(u^o) = \left.\phi_x g_u + \phi_u\right|_{u = u^o,\, x = g(u^o)} = 0. \qquad (2.94)$$

³ $U$ is the set of points $u$ that lie in the ball defined by $\|u - u^o\| < \epsilon$.

Furthermore, $f(g(u), u) = c$ for all $u \in U$. This means that all derivatives of $f(g(u), u)$ are zero, in particular the first derivative evaluated at $(x^o, u^o)$:
$$f_x g_u + f_u = 0. \qquad (2.95)$$
Again, since the matrix $f_x(x^o, u^o)$ is nonsingular, we can evaluate $g_u$ as
$$g_u = -f_x^{-1} f_u. \qquad (2.96)$$
Substitution of $g_u$ into (2.94) gives
$$\left.\phi_u - \phi_x f_x^{-1} f_u\right|_{(x^o, u^o)} = 0. \qquad (2.97)$$
Let us now define the $n$-vector $\lambda$ as
$$\lambda^T = -\left.\phi_x f_x^{-1}\right|_{(x^o, u^o)}. \qquad (2.98)$$
Then (2.97) and (2.98) can be written as
$$[\phi_x, \phi_u] = -\lambda^T [f_x, f_u]. \qquad (2.99)$$
Now we will show that $\lambda^T = -\phi^o_c(c)$. Since by assumption $f(x, u)$ and $(x^o(c), u^o(c))$ are continuously differentiable, it follows that in a neighborhood of $c$, $f_x$ is nonsingular. Then
$$f(x^o(c), u^o(c)) = c, \qquad (2.100)$$
$$\phi_u - \phi_x f_x^{-1} f_u = 0 \qquad (2.101)$$
using the first-order condition. By differentiating $\phi^o(c) = \phi(x^o, u^o)$,
$$\phi^o_c = \phi_x x^o_c + \phi_u u^o_c. \qquad (2.102)$$
Differentiating $f(x^o(c), u^o(c)) = c$ gives
$$f_x x^o_c + f_u u^o_c = I \ \Rightarrow\ x^o_c + f_x^{-1} f_u u^o_c = f_x^{-1}. \qquad (2.103)$$
Multiplying by $\phi_x$ gives
$$\phi_x x^o_c + \phi_x f_x^{-1} f_u u^o_c = \phi_x f_x^{-1} = -\lambda^T. \qquad (2.104)$$
Using the first-order condition (2.89) gives the desired result
$$\phi^o_c = -\lambda^T. \qquad (2.105)$$

Remark 2.3.2 Equation (2.99) shows that the gradient of the cost function, $[\phi_x, \phi_u]$, is orthogonal to the tangent plane of the constraint at $(x^o, u^o)$. Since the rows of $[f_x, f_u]$ form $n$ independent vectors (because $f_x$ is nonsingular), the tangent plane is described by the set of vectors $h$ such that $[f_x, f_u]h = 0$. The set of vectors orthogonal to this tangent surface is any linear combination of the rows of the gradient $[f_x, f_u]$. In particular, if $\lambda^T[f_x, f_u]$ is such that
$$[\phi_x, \phi_u] = -\lambda^T [f_x, f_u], \qquad (2.106)$$
then $[\phi_x, \phi_u]$ is orthogonal to the tangent surface.
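The influence-function property (2.90) can be checked numerically on the ellipse example, where $\phi^o(c) = -2abc$ from (2.71); a finite-difference sketch, with illustrative values of $a$, $b$, $c$, follows.

```python
from math import sqrt

# Finite-difference check of (2.90) on the ellipse example:
# lambda = -d(phi^o)/dc should equal 2ab, as in (2.76).
a, b, c, dc = 2.0, 1.0, 1.0, 1e-6

def phi_opt(c):
    """Optimal cost at constraint level c, from (2.70)-(2.71)."""
    xo, uo = a * sqrt(c / 2), b * sqrt(c / 2)
    return -4.0 * xo * uo

lam = -(phi_opt(c + dc) - phi_opt(c - dc)) / (2 * dc)
print(lam, 2 * a * b)        # both ~4, confirming (2.76) and (2.90)
```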
Lagrange Multiplier Approach

Necessary conditions identical to those obtained above can be derived formally by the Lagrange multiplier approach. By adjoining the constraint $f = c$ to the cost function $\phi$ with an undetermined $n$-vector Lagrange multiplier $\lambda$, a function $H(x, u, \lambda)$ is defined as⁴
$$H(x, u, \lambda) = \phi(x, u) + \lambda^T(f(x, u) - c), \qquad (2.107)$$
and we construct an unconstrained optimization problem in the $2n + m$ variables $x, u, \lambda$. Therefore, we look for the extremal point of $H$ with respect to $x$, $u$, and $\lambda$, where $x$, $u$, and $\lambda$ are considered free and can be arbitrarily varied within some small open set containing $x^o$, $u^o$, and $\lambda$. From our unconstrained optimization results we have
$$H_x = \phi_x + \lambda^T f_x = 0, \qquad (2.108)$$
$$H_u = \phi_u + \lambda^T f_u = 0, \qquad (2.109)$$
$$H_\lambda = f(x^o, u^o) - c = 0. \qquad (2.110)$$
This gives us $2n + m$ equations in the $2n + m$ unknowns $x, u, \lambda$. Note that satisfaction of the constraint is now satisfaction of the necessary condition (2.110).

⁴ Note that $H(x, u, \lambda) = \phi(x, u)$ when the constraint is satisfied.

2.3.3 Constrained Parameter Optimization: An Algorithmic Approach

In this section we extend the steepest descent method of Section 2.2.5 to include equality constraints. The suggested procedure is first to satisfy the constraint, i.e., constraint restoration. Then, a gradient associated with changes in the cost criterion along the tangent plane to the constraint manifold is constructed. This is done by forming a projector that annihilates any component of the gradient of the cost criterion in the direction of the gradient of the constraint function. Since these gradients are determined from the first-order term in a Taylor series of the cost criterion and the constraint functions, the steps used in the iteration process must be sufficiently small to preserve the validity of this assumed linearity.

Suppose $y = [x^T, u^T]^T$. The parameter optimization problem is
$$\min_y \phi(y) \quad \text{subject to } f(y) = c, \qquad (2.111)$$
where $\phi$ and $f$ are assumed to be sufficiently smooth so that for small changes in $y$ away from some nominal value $y^i$, $\phi$ and $f$ can be approximated by the first term of a Taylor series about $y^i$ as ($\delta y = y - y^i$)
$$\delta\phi \cong \phi_y \delta y, \qquad (2.112)$$
$$\delta f \cong f_y \delta y. \qquad (2.113)$$
In the following we describe a numerical optimization algorithm composed of a constraint restoration step followed in turn by a minimization step. Although these steps can be combined, they are separated here for pedagogical reasons.

Constraint Restoration

Since $f = c$ describes a manifold in $y$ space and assuming $y^i$ is a point not on $f = c$, from (2.113) a change in the constraint level is related to a change in $y$. To move in the direction of constraint satisfaction choose $\delta y$ as
$$\delta y = f_y^T (f_y f_y^T)^{-1}\,\delta f, \qquad (2.114)$$
where the choice of $\delta f = -\epsilon\psi = \epsilon(c - f(y^i))$ for small $\epsilon > 0$ forms an iterative step driving $f$ to $c$. Note that $\delta y$ in (2.114) is a least-squares solution to (2.113) where $f_y$ is full rank. At the end of each iteration to satisfy the constraint, set $y^i = y^i + \delta y$. The iteration sequence stops when, for $\epsilon_1 > 0$, $|c - f| < \epsilon_1 \ll 1$.

Constrained Minimization

Since $f = c$ describes a manifold in $y$ space and assuming $y^i$ is a point on $f = c$, $f_y$ is perpendicular to the tangent plane of $f = c$ at $y^i$. To ensure that changes in the cost $\phi(y)$ are made only in the tangent plane, so that the constraint will not be violated (to first order), define the projection operator
$$P = I - f_y^T (f_y f_y^T)^{-1} f_y, \qquad (2.115)$$
which has the properties
$$PP = P, \qquad P f_y^T = 0, \qquad P = P^T. \qquad (2.116)$$
Therefore, the projection operator annihilates components of a vector along $f_y^T$. The object is to use this projector to ensure that if changes are made in improving the cost, they are made only along the tangent line to the constraint. See Figure 2.7 for a geometrical description where $y = [x^T, u^T]^T$ and $f$, $x$, $u$ are scalars.

Figure 2.7: Geometrical description of the parameter optimization problem.
This projected gradient is constructed by choosing the control changes in the steepest descent direction while remaining tangential to the constraint $\psi(x, u) = f(x, u) - c = 0$, i.e.,
$$\delta y = -\epsilon P\phi_y^T, \qquad (2.117)$$
where again $\epsilon$ is a positive number chosen small so as not to violate the assumed linearity. With this choice of $\delta y$, the cost criterion change to first order is
$$\delta\phi = -\epsilon\phi_y P\phi_y^T = -\epsilon\phi_y PP^T\phi_y^T = -\epsilon\|\phi_y P\|^2 \qquad (2.118)$$
(since $PP^T = P$). Note that the constraint is satisfied to first order ($\|\delta y\| < \epsilon \ll 1$),
$$\Delta f = \delta f + O(\|\delta y\|), \qquad (2.119)$$
where $\delta f = f_y\delta y = -\epsilon f_y P\phi_y^T = 0$. The second-order constraint violation is then restored by going back to the constraint restoration step. This iterative process between constraint restoration and constrained minimization is continued until the stationary necessary conditions
$$P\phi_y^T = 0, \qquad f = c \qquad (2.120)$$
are met. Note that the constraint restoration and optimization steps can be combined given the assumed linearity. This optimization algorithm is called steepest descent optimization with constraints.

The Lagrange multiplier technique is consistent with the results of (2.120). The $n$-vector Lagrange multiplier $\lambda$ is now shown to contribute to the structure of the projector. If the constraint variation $\delta f$ is adjoined to $\delta\phi$ by the Lagrange multiplier $\lambda$ in (2.113), the augmented cost variation $\delta\bar{\phi}$ is
$$\delta\bar{\phi} = (\phi_y + \lambda^T f_y)\delta y. \qquad (2.121)$$
If $\delta y$ is chosen as
$$\delta y = -\epsilon(\phi_y + \lambda^T f_y)^T, \qquad (2.122)$$
then by the usual arguments a minimum occurs for the augmented cost when
$$\phi_y + \lambda^T f_y = 0, \qquad (2.123)$$
where $f(y^i) - c = 0$ and $f_y\delta y = 0$. Postmultiplying (2.123) by $f_y^T$ and solving for $\lambda$ at $(x^o, u^o)$ results in
$$\lambda^T = -\phi_y f_y^T (f_y f_y^T)^{-1}. \qquad (2.124)$$
Substituting (2.124) back into (2.123) results in
$$\phi_y - \phi_y f_y^T (f_y f_y^T)^{-1} f_y = \phi_y\left[I - f_y^T (f_y f_y^T)^{-1} f_y\right] = \phi_y P = 0, \qquad (2.125)$$
which is just the first condition of (2.120).

The constraint projection and restoration method described here is an effective method for the numerical solution of minimization problems subject to equality constraints. Several other methods are also in common use, many sharing significant ideas. All such methods are subject to a number of difficulties in actual implementation. These are beyond the scope of this text, but the interested reader may see [23], [5], and [36] for more information.
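A minimal sketch of the restoration/projection iteration, applied to the ellipse example of Section 2.3.1 ($\phi = -4xu$, $f = x^2/a^2 + u^2/b^2 = c$), is given below. The step sizes and starting point are illustrative choices, not prescriptions.

```python
import numpy as np

# Restoration/projection steepest descent on the ellipse example.
# Constants a, b, c, the step sizes, and the starting point are illustrative.
a, b, c = 2.0, 1.0, 1.0
y = np.array([0.5, 0.9])                    # y = (x, u), not on the constraint

def f(y):    return y[0]**2 / a**2 + y[1]**2 / b**2
def fy(y):   return np.array([[2 * y[0] / a**2, 2 * y[1] / b**2]])
def phiy(y): return np.array([-4 * y[1], -4 * y[0]])   # gradient of -4xu

for _ in range(500):
    # constraint restoration, eq. (2.114), with df a fraction of (c - f)
    while abs(c - f(y)) > 1e-10:
        A = fy(y)
        y = y + A.T @ np.linalg.solve(A @ A.T, np.array([0.5 * (c - f(y))]))
    # projected-gradient step, eqs. (2.115) and (2.117)
    A = fy(y)
    P = np.eye(2) - A.T @ np.linalg.solve(A @ A.T, A)
    y = y - 0.01 * P @ phiy(y)

print(y, [a * np.sqrt(c / 2), b * np.sqrt(c / 2)])   # converges toward (2.70)
```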
2.3.4 General Form of the Second Variation

Assume that $\phi$ and $f$ are twice differentiable. Since $f$ is twice differentiable, so are $g$ and, therefore, $\hat{\phi}(u)$. Since $u^o$ lies in an open set, the general second-order necessary condition for an unconstrained optimization problem applies. Producing the inequality by the procedure of the previous section is laborious. Rather, we use the equivalent Lagrange multiplier approach, where
$$H = \phi + \lambda^T\psi. \qquad (2.126)$$
(If $\psi = f(x, u) - c = 0$, then $H = \phi + \lambda^T(f - c)$.) Then, expanding the augmented cost to second order, assuming the first-order necessary conditions hold,
$$H(x^o + \delta x, u^o + \delta u, \lambda + \delta\lambda) - H(x^o, u^o, \lambda) = \frac{1}{2}\delta^2 H + O(d^2) = \frac{1}{2}\begin{bmatrix}\delta x^T & \delta u^T & \delta\lambda^T\end{bmatrix}\begin{bmatrix} H_{xx} & H_{xu} & H_{x\lambda} \\ H_{ux} & H_{uu} & H_{u\lambda} \\ H_{\lambda x} & H_{\lambda u} & H_{\lambda\lambda}\end{bmatrix}\begin{bmatrix}\delta x \\ \delta u \\ \delta\lambda\end{bmatrix} + O(d^2), \qquad (2.127)$$
where the first-order condition $\delta H = 0$ ($H_x = 0$, $H_u = 0$, and $H_\lambda = f - c = 0$) is used and $d = \|(\delta x^T, \delta u^T, \delta\lambda^T)\|$. From $x^o = g(u^o)$ and its properties in $(x, u) \in V$ (see the Implicit Function Theorem, Section A.1.1),
$$\delta x = g_u\delta u + O(\|\delta u\|), \qquad (2.128)$$
where $f(g(u), u) = \hat{f}(u) = c$ requires that all its derivatives be zero. In particular,
$$\hat{f}_u = H_{\lambda x} g_u + H_{\lambda u} = f_x g_u + f_u = 0 \ \Rightarrow\ g_u = -f_x^{-1} f_u. \qquad (2.129)$$
Using (2.128) and (2.129) in (2.127), the second variation reduces to
$$\delta^2 H = \delta u^T\left[g_u^T H_{xx} g_u + H_{ux} g_u + g_u^T H_{xu} + H_{uu}\right]\delta u \ge 0. \qquad (2.130)$$
Therefore, the necessary condition for local optimality is
$$\begin{bmatrix} g_u^T & I \end{bmatrix}\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu}\end{bmatrix}\begin{bmatrix} g_u \\ I\end{bmatrix} \ge 0, \qquad (2.131)$$
and the sufficiency condition is
$$\begin{bmatrix} g_u^T & I \end{bmatrix}\begin{bmatrix} H_{xx} & H_{xu} \\ H_{ux} & H_{uu}\end{bmatrix}\begin{bmatrix} g_u \\ I\end{bmatrix} > 0, \qquad (2.132)$$
along with the first-order conditions.

Note 2.3.1 Again, the coefficients associated with $\delta\lambda$ are zero.

Note 2.3.2 The definiteness conditions hold only in an $m$-dimensional subspace associated with the tangent plane of the constraints evaluated at $(x^o, u^o)$.

2.3.5 Inequality Constraints: Functions of 2 Variables

An approach for handling optimization problems with inequality constraints is to convert the inequality constraint to an equality constraint by using a device called slack variables. Once the problem is in this form, the previous necessary conditions are applicable. We present the 2-variable optimization problem for simplicity, but the extension to $n$ dimensions is straightforward.

Theorem 2.3.2 Let $\phi : \mathbb{R}^2 \to \mathbb{R}$ and $\theta : \mathbb{R}^2 \to \mathbb{R}$ be twice continuously differentiable. Let $(x^o, u^o)$ be the optimal variables of the problem
$$\phi^o = \min_{x, u}\phi(x, u) \quad \text{subject to } \theta(x, u) \le 0, \qquad (2.133)$$
where if $\theta(x^o, u^o) = 0$, then $\theta_x \ne 0$. Then there exists a scalar $\nu \ge 0$ such that
$$(\phi_x, \phi_u) = -\nu(\theta_x, \theta_u). \qquad (2.134)$$

Remark 2.3.3 If $(x^o, u^o)$ lies in the interior of the constraint, then $\nu = 0$ and the necessary conditions become those of an unconstrained minimization problem, i.e., $(\phi_x, \phi_u) = 0$. If $(x^o, u^o)$ lies on the boundary of the constraint, $\nu > 0$ and the necessary conditions of (2.134) hold.

Proof: We first convert the inequality constraint into an equality constraint by introducing a slack variable $\alpha$ such that
$$\theta(x, u) = -\alpha^2. \qquad (2.135)$$
For any real value of $\alpha \in \mathbb{R}$ the inequality constraint is satisfied. An equivalent problem is
$$\phi^o = \min_{x, u}\phi(x, u) \quad \text{subject to } \theta(x, u) + \alpha^2 = 0, \qquad (2.136)$$
which was considered in the last section. For simplicity, we use the Lagrange multiplier approach here. Adjoin the constraint $\theta + \alpha^2 = 0$ to the cost function $\phi$ by an undetermined multiplier $\nu$ as
$$H = \phi(x, u) + \nu(\theta(x, u) + \alpha^2). \qquad (2.137)$$
We look for the extremal point of $H$ with respect to $(x, u, \alpha, \nu)$. From our unconstrained optimization results we have
$$H_x = \phi_x + \nu^o\theta_x = 0 \quad \text{at } (x^o, u^o), \qquad (2.138)$$
$$H_u = \phi_u + \nu^o\theta_u = 0 \quad \text{at } (x^o, u^o), \qquad (2.139)$$
$$H_\nu = \theta(x^o, u^o) + \alpha^{o2} = 0, \qquad (2.140)$$
$$H_\alpha = 2\nu^o\alpha^o = 0. \qquad (2.141)$$
This gives four equations in the four unknowns $x, u, \nu, \alpha$. From (2.138) and (2.139) we obtain the condition shown in (2.134). The objective now is to show that $\nu \ge 0$. If $\alpha^o \ne 0$, then $\nu^o = 0$ off the boundary of the constraint (in the admissible interior). If $\alpha^o = 0$, then the optimal solution is on the constraint boundary, where $\nu^o \ne 0$. To determine whether $\nu^o \ge 0$, the second variation is used:
$$\delta^2 H = \begin{bmatrix}\delta x & \delta u & \delta\alpha & \delta\nu\end{bmatrix}\begin{bmatrix} H_{xx} & H_{xu} & H_{x\alpha} & H_{x\nu} \\ H_{ux} & H_{uu} & H_{u\alpha} & H_{u\nu} \\ H_{\alpha x} & H_{\alpha u} & H_{\alpha\alpha} & H_{\alpha\nu} \\ H_{\nu x} & H_{\nu u} & H_{\nu\alpha} & H_{\nu\nu}\end{bmatrix}\begin{bmatrix}\delta x \\ \delta u \\ \delta\alpha \\ \delta\nu\end{bmatrix}, \qquad (2.142)$$
where the variation of the constraint is used to determine $\delta x$ in terms of $\delta u$ and $\delta\alpha$ as
$$\theta_x\delta x + \theta_u\delta u + 2\alpha\delta\alpha = 0. \qquad (2.143)$$
However, on the constraint, $\alpha^o = 0$. Since by assumption $\theta_x \ne 0$,
$$\delta x = -\frac{\theta_u}{\theta_x}\delta u. \qquad (2.144)$$
In addition,
$$H_{x\alpha} = 0, \qquad H_{u\alpha} = 0, \qquad H_{\nu\nu} = 0, \qquad H_{\alpha\alpha} = 2\nu^o, \qquad H_{\nu\alpha} = 2\alpha^o = 0. \qquad (2.145)$$
Using (2.144) and (2.145) in the second variation, (2.142) reduces to
$$\delta^2 H = \begin{bmatrix}\delta u & \delta\alpha\end{bmatrix}\begin{bmatrix} H_{uu} - 2H_{ux}\dfrac{\theta_u}{\theta_x} + H_{xx}\left(\dfrac{\theta_u}{\theta_x}\right)^2 & 0 \\ 0 & 2\nu\end{bmatrix}\begin{bmatrix}\delta u \\ \delta\alpha\end{bmatrix} \ge 0, \qquad (2.146)$$
and then $\delta^2 H$ is positive semidefinite if
$$H_{uu} - 2H_{ux}\frac{\theta_u}{\theta_x} + H_{xx}\left(\frac{\theta_u}{\theta_x}\right)^2 \ge 0, \qquad (2.147)$$
$$\nu \ge 0. \qquad (2.148)$$

Note: The second variation given above applies only to the case in which the optimal variables lie on the boundary. If the optimal point lies in the interior of the constraint, then the unconstrained results apply.

This simple example can be generalized to the case with $n$ inequality constraints. There are many fine points in the extension of this theory. For example, if all the inequality constraints are feasible at or below zero, then under certain conditions the gradient of the cost criterion at the minimum is contained in a cone constructed from the gradients of the active constraint functions (i.e., those with $\alpha = 0$ in the above two-variable case). This notion is implied by the Kuhn–Tucker theorem [33]. In this chapter we have attempted only to give an introduction that illuminates the principles of optimization theory and the concepts that will be used in the following chapters.

Problems

1. A tin can manufacturer wants to find the dimensions of a cylindrical can (closed top and bottom) such that, for a given amount of tin, the volume of the can is a maximum. If the thickness of the tin stock is constant, a given amount of tin implies a given surface area of the can. Use height and radius as variables and use a Lagrange multiplier.

2. Determine the point $(x_1, x_2)$ at which the function $\phi = x_1 + x_2$ is a minimum, subject to the constraint $x_1^2 + x_1 x_2 + x_2^2 = 1$.

3. Minimize the performance index
$$\phi = \tfrac{1}{2}(x^2 + y^2 + z^2)$$
subject to the constraints
$$x + 2y + 3z = 10, \qquad x - y + 2z = 1.$$
Show that
$$x = \tfrac{19}{59}, \quad y = \tfrac{146}{59}, \quad z = \tfrac{93}{59}, \quad \lambda_1 = -\tfrac{55}{59}, \quad \lambda_2 = \tfrac{36}{59}.$$

4. Minimize the performance index
$$\phi = \tfrac{1}{2}(x^2 + y^2 + z^2)$$
subject to the constraint
$$x + 2y - 3z - 7 = 0.$$

5. Minimize the performance index
$$\phi = x - y + 2z$$
subject to the constraint
$$x^2 + y^2 + z^2 = 2.$$

6. Maximize the performance index
$$\phi = x_1 x_2$$
subject to the constraint
$$x_1 + x_2 - 1 = 0.$$

7. Minimize the performance index
$$\phi = -x_1 x_2 + x_2 x_3 + x_3 x_1$$
subject to the constraint
$$x_1 + x_2 - x_3 + 1 = 0.$$

8. Minimize the performance index
$$\phi = \sqrt{4 - 3x^2}$$
subject to the constraint
$$-1 \le x \le 1.$$

9. Maximize the performance index
$$\phi = xu$$
subject to the inequality constraint
$$x + u \le 1.$$

10. (a) State the necessary and sufficient conditions and underlying assumptions for $(x^*, u^*)$ to be locally minimizing for the problem of minimizing $\phi = \phi(u, x)$, $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, subject to $Ax + Bu = C$.
(b) Find the extremals of $\phi = e^{x_1^2 + x_2}$ subject to $x_1^2 + x_2^2 = \tfrac{1}{2}$.

11. (a) In the two-dimensional $xt$-plane, determine the extremal curve of stationary length which starts on the circle $x^2 + t^2 - 1 = 0$ and terminates on the line $t = T = 2$.
(b) Solve problem (a), but consider that the termination is on the line $-x + t = 2\sqrt{2}$.
Note: Parts (a) and (b) are not to be solved by inspection.
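As a quick numerical check of the solution stated in Problem 3, the following sketch applies the standard least-norm formula $x = A^T(AA^T)^{-1}b$ for minimizing $\frac{1}{2}\|x\|^2$ subject to $Ax = b$, together with the multiplier convention $\phi_x = -\lambda^T f_x$ of this chapter.

```python
import numpy as np

# Check of the answer stated in Problem 3 above.
A = np.array([[1.0, 2.0, 3.0], [1.0, -1.0, 2.0]])
b = np.array([10.0, 1.0])

lam = -np.linalg.solve(A @ A.T, b)   # multipliers, from phi_x + lam^T f_x = 0
x = -A.T @ lam                       # minimizer A^T (A A^T)^{-1} b
print(x * 59)                        # [19, 146, 93]
print(lam * 59)                      # [-55, 36]
```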
Chapter 3

Optimization of Dynamic Systems with General Performance Criteria

3.1 Introduction

In accordance with the theme of this book outlined in Chapter 1, we use linear algebra, elementary differential equation theory, and the definition of the derivative to derive conditions that are satisfied by a control function which optimizes the behavior of a dynamic system relative to a specified performance criterion. In other words, we derive necessary conditions and also a sufficient condition for the optimality of a given control function.

In Section 3.2 we begin with the control of a linear dynamic system relative to a general performance criterion. Restricting attention to a linear system and introducing the notion of weak control perturbations allows an easy derivation of a weak form of the first-order necessary conditions. We then extend these necessary conditions to nonlinear systems with the aid of a theorem by Bliss [6] on the differentiability of the solution of an ordinary differential equation with respect to a parameter. Next, we comment on the two-point boundary-value problem based on these necessary conditions. We then introduce the notion of strong control perturbations, which allows the derivation of a stronger form of the first-order necessary conditions, referred to as Pontryagin's Principle [38]. This result is further strengthened upon the introduction of control variable constraints. Having observed that Pontryagin's Principle is only a necessary condition for optimality, we introduce the Hamilton–Jacobi–Bellman (H-J-B) equation and provide a general sufficient condition for optimality. The dependent variable of the H-J-B partial differential equation is the optimal value function, which is the value of the cost criterion using the optimal control. Using the H-J-B equation, we relate the derivative of the optimal value function to Pontryagin's Lagrange multipliers. Then we derive the H-J-B equation on the assumption that the optimal value function exists and is once continuously differentiable. Finally, we treat the case of unspecified final time and derive an additional necessary condition, called the transversality condition. We illustrate, where appropriate, the conditions developed in this chapter by working out several examples.

Remark 3.1.1 Throughout this book, time (the variable $t$) is considered the independent variable. This need not always be the case. In fact, the choice of what constitutes a state, a control, and a "running variable" can drastically alter the ease with which a problem may be solved. For example, in a rocket launch, the energy of the vehicle (kinetic plus potential) can be considered as a state, a control, or the independent variable. The choice depends on the specifics of the problem at hand. Since once those items are chosen the notation becomes a matter of choice, we will stick to calling the states $x$, the controls $u$, and the independent variable $t$, and the problems in this book are laid out in that notation.

3.2 Linear Dynamic Systems with General Performance Criterion

The linear dynamic system to be controlled is described by the vector linear differential equation
$$\dot{x}(t) = A(t)x(t) + B(t)u(t), \qquad x(t_0) = x_0, \qquad (3.1)$$
where $x(\cdot)$ and $u(\cdot)$ are, respectively, $n$- and $m$-vector functions of time $t$, and where $A(\cdot)$ and $B(\cdot)$ are $n \times n$ and $n \times m$ matrix functions of time $t$.
The initial condition at time $t = t_0$ for (3.1) is $x_0$. We make the following assumptions.

Assumption 3.2.1 The elements $a_{ij}(\cdot)$ and $b_{kl}(\cdot)$ of $A(\cdot)$ and $B(\cdot)$ are continuous functions of $t$ on the interval $[t_0, t_f]$, $t_f > t_0$.

Assumption 3.2.2 The control function $u(\cdot)$ is drawn from the set $\mathcal{U}$ of piecewise continuous $m$-vector functions of $t$ on the interval $[t_0, t_f]$.

The optimal control problem is to find a control function $u^o(\cdot)$ which minimizes the performance criterion
$$J(u(\cdot); x_0) = \phi(x(t_f)) + \int_{t_0}^{t_f} L(x(t), u(t), t)\,dt. \qquad (3.2)$$
Here $L$, the Lagrangian, and $\phi$ are scalar functions of their arguments, and we make the following assumptions concerning these functions.

Note 3.2.1 The notation $(\cdot)$, as used in $L(\cdot)$, denotes the functional form of $L$.

Assumption 3.2.3 The scalar function $L(\cdot, \cdot, \cdot)$ is once continuously differentiable in $x$ and $u$ and is continuous in $t$ on $[t_0, t_f]$.

Assumption 3.2.4 The scalar function $\phi(\cdot)$ is once continuously differentiable in $x$.

Figure 3.1: Depiction of weak and strong variations.

We now suppose that there is a piecewise continuous control function $u^o(\cdot)$ that minimizes (3.2), and we derive first-order conditions which this control will satisfy. In the derivation to come we make use of the properties of (3.1) developed in the next section.

Remark 3.2.1 In this chapter, two types of variations (or perturbations) are considered: strong and weak variations. These are depicted graphically in Figure 3.1. Note that the strong variation is characterized by large variations over a very short interval. Contrast this with the weak variations, which are small perturbations over a large time interval. Also note that under the correct conditions (to be introduced in this chapter), both weak and strong variations in the control can produce "small" state variations.

3.2.1 Linear Ordinary Differential Equation

Under the assumptions made in Section 3.2 it is well known that (3.1) has for each $u(\cdot)$ a unique solution defined on $[t_0, t_f]$. This solution [8] is given by
$$x(t) = \Phi(t, t_0)x_0 + \int_{t_0}^{t}\Phi(t, \tau)B(\tau)u(\tau)\,d\tau, \qquad (3.3)$$
where $\Phi(\cdot, \cdot)$ is an $n \times n$ matrix function of $t$ and $\tau$ (the transition matrix corresponding to $A(\cdot)$) which satisfies the homogeneous ordinary differential equation
$$\frac{d}{dt}\Phi(t, \tau) = A(t)\Phi(t, \tau), \qquad \Phi(t_0, t_0) = I. \qquad (3.4)$$
Let the control function $u^o(\cdot)$ generate the trajectory $x^o(\cdot)$, and suppose we perturb the control function $u^o(\cdot)$ by adding to it an arbitrary piecewise continuous function $\varepsilon\eta(\cdot)$, where $\varepsilon$ is a small positive scalar parameter. Let the trajectory that results as a consequence of the control function $u^o(\cdot) + \varepsilon\eta(\cdot)$ be $x^o(\cdot) + \xi(\cdot; \varepsilon)$. We have from (3.3) that
$$x^o(t) + \xi(t; \varepsilon) = \Phi(t, t_0)x_0 + \int_{t_0}^{t}\Phi(t, \tau)B(\tau)\left[u^o(\tau) + \varepsilon\eta(\tau)\right]d\tau, \qquad (3.5)$$
and it then follows that
$$\xi(t; \varepsilon) = \varepsilon\int_{t_0}^{t}\Phi(t, \tau)B(\tau)\eta(\tau)\,d\tau. \qquad (3.6)$$
Upon defining
$$z(t; \eta(\cdot)) = \int_{t_0}^{t}\Phi(t, \tau)B(\tau)\eta(\tau)\,d\tau, \qquad (3.7)$$
we see that
$$\xi(t; \varepsilon) = \varepsilon z(t; \eta(\cdot)), \qquad (3.8)$$
so that by linearity, perturbing $u^o(\cdot)$ by $\varepsilon\eta(\cdot)$ perturbs $x^o(\cdot)$ by a function of exactly the same form, viz., $\varepsilon z(t; \eta(\cdot))$.

Remark 3.2.2 The notation used in this book can be compared to the variational notation like that used in [11] by noting (for example)
$$u(\cdot) = u^o(\cdot) + \delta u(\cdot) \in \mathcal{U} \ \Rightarrow\ \|\delta u(\cdot)\| \le \varepsilon_1, \qquad \delta u(\cdot) = \varepsilon\eta(\cdot). \qquad (3.9)$$
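The exact linearity expressed by (3.8) is easy to observe numerically: dividing the state perturbation by $\varepsilon$ yields the same $z(t_f; \eta(\cdot))$ for every $\varepsilon$, up to integration tolerance. A minimal sketch follows; the double-integrator matrices and the signals $u$, $\eta$ are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Verifying (3.8): for the linear system (3.1), perturbing u by eps*eta
# perturbs x by exactly eps*z(t; eta).  A, B, u, eta are illustrative.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
u   = lambda t: -1.0 + 0.5 * t          # nominal control
eta = lambda t: np.sin(3.0 * t)         # arbitrary perturbation shape

def x_of(eps, tf=2.0):
    rhs = lambda t, x: A @ x + (B * (u(t) + eps * eta(t))).ravel()
    sol = solve_ivp(rhs, (0.0, tf), [1.0, 0.0], rtol=1e-12, atol=1e-12)
    return sol.y[:, -1]

x_nom = x_of(0.0)
for eps in (1e-1, 1e-2, 1e-3):
    print((x_of(eps) - x_nom) / eps)    # the same z(tf) for every eps
```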
3.2.2 Expansion Formula

The definition of a differentiable scalar function permits us to make the following Taylor expansion (see Appendix A.1.2) in the parameter $\varepsilon$:
$$L(x^o(t) + \varepsilon z(t; \eta(\cdot)), u^o(t) + \varepsilon\eta(t), t) = L(x^o(t), u^o(t), t) + \varepsilon L_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon L_u(x^o(t), u^o(t), t)\eta(t) + O(t; \varepsilon), \qquad (3.10)$$
where the function $O(t; \varepsilon)$ is piecewise continuous in $t$ and $O(t; \varepsilon)/\varepsilon \to 0$ as $\varepsilon \to 0$ for each $t$.

3.2.3 Adjoining the System Equation

We may adjoin (3.1) to the performance criterion (3.2) by means of a continuously differentiable $n$-vector function of time $\lambda(\cdot)$, called in the classical literature the Lagrange multiplier (see Section 2.3.2), as follows:
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = J(u(\cdot); x_0) + \int_{t_0}^{t_f}\lambda^T(t)\left[A(t)x(t) + B(t)u(t) - \dot{x}(t)\right]dt. \qquad (3.11)$$
Note that when the differential constraint is satisfied,
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = J(u(\cdot); x_0) \qquad (3.12)$$
when (3.1) holds, so that nothing is gained or lost by this step. Because of the assumed differentiability of $\lambda(\cdot)$ with respect to $t$, we may integrate by parts in (3.11) to obtain
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = J(u(\cdot); x_0) + \int_{t_0}^{t_f}\left[\dot{\lambda}^T(t)x(t) + \lambda^T(t)A(t)x(t) + \lambda^T(t)B(t)u(t)\right]dt + \lambda^T(t_0)x_0 - \lambda^T(t_f)x_f. \qquad (3.13)$$

3.2.4 Expansion of $\hat{J}$

We now evaluate the change in $\hat{J}$ (i.e., in $J$, since $\hat{J} = J$ when the dynamics are satisfied) brought about by changing $u^o(\cdot)$ to $u^o(\cdot) + \varepsilon\eta(\cdot)$. Defining the change in the performance index as $\Delta\hat{J} = \hat{J}([u^o(\cdot) + \varepsilon\eta(\cdot)]; \lambda(\cdot), x_0) - \hat{J}(u^o(\cdot); \lambda(\cdot), x_0)$ leads to the following expansion:
$$\Delta\hat{J} = \int_{t_0}^{t_f}\Big[\varepsilon L_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon L_u(x^o(t), u^o(t), t)\eta(t) + \varepsilon\dot{\lambda}^T(t)z(t; \eta(\cdot)) + \varepsilon\lambda^T(t)A(t)z(t; \eta(\cdot)) + \varepsilon\lambda^T(t)B(t)\eta(t) + O(t; \varepsilon)\Big]dt - \varepsilon\lambda^T(t_f)z(t_f; \eta(\cdot)) + \varepsilon\phi_x(x^o(t_f))z(t_f; \eta(\cdot)) + O(\varepsilon). \qquad (3.14)$$
Note that the coefficient of a first-order change in $\lambda$ is zero, since it adjoins the system dynamics. Now let us choose
$$-\dot{\lambda}^T(t) = L_x(x^o(t), u^o(t), t) + \lambda^T(t)A(t), \qquad \lambda^T(t_f) = \phi_x(x^o(t_f)). \qquad (3.15)$$
This is a legitimate choice for $\lambda(\cdot)$, as (3.15) is a linear ordinary differential equation in $\lambda$ with piecewise continuous coefficients, having a unique solution. The right-hand side of (3.14) then simplifies to
$$\Delta\hat{J} = \varepsilon\int_{t_0}^{t_f}\left[L_u(x^o(t), u^o(t), t)\eta(t) + \lambda^T(t)B(t)\eta(t)\right]dt + \int_{t_0}^{t_f}O(t; \varepsilon)\,dt + O(\varepsilon) \qquad (3.16)$$
$$= \varepsilon\,\delta\hat{J} + \int_{t_0}^{t_f}O(t; \varepsilon)\,dt + O(\varepsilon). \qquad (3.17)$$

3.2.5 Necessary Condition for Optimality

Let $u^o(\cdot)$ be the minimizing control, so that $x^o(\cdot)$ is the resulting optimal path. As $\eta(\cdot)$ can be any piecewise continuous function on $[t_0, t_f]$, we may, because of our assumptions on $L$, $\lambda$, and $B$, set
$$\eta(t) = -\left[L_u(x^o(t), u^o(t), t) + \lambda^T(t)B(t)\right]^T. \qquad (3.18)$$
Substituting (3.18) into (3.16) yields
$$\Delta\hat{J} = -\varepsilon\int_{t_0}^{t_f}\left\|L_u(x^o(t), u^o(t), t) + \lambda^T(t)B(t)\right\|^2 dt + \int_{t_0}^{t_f}O(t; \varepsilon)\,dt + O(\varepsilon) \ge 0. \qquad (3.19)$$
For $\varepsilon$ small and positive, the first term in (3.19) is dominant, so
$$-\varepsilon\int_{t_0}^{t_f}\left\|L_u(x^o(t), u^o(t), t) + \lambda^T(t)B(t)\right\|^2 dt \ge 0, \qquad (3.20)$$
where the variation is nonnegative because $u^o(\cdot)$ minimizes $J$. The left-hand side of (3.20) is negative if
$$L_u(x^o(t), u^o(t), t) + \lambda^T(t)B(t) \ne 0 \quad \text{for some } t \in [t_0, t_f]. \qquad (3.21)$$
It follows, then, that a necessary condition for $u^o(\cdot)$ to minimize $\hat{J}$, and thus $J$, is that
$$L_u(x^o(t), u^o(t), t) + \lambda^T(t)B(t) = 0 \qquad (3.22)$$
for all $t$ in $[t_0, t_f]$. The multiplier $\lambda(\cdot)$ is obtained from (3.15) and satisfies the assumption that $\lambda(\cdot)$ is continuously differentiable.
3.2.6 Pontryagin's Necessary Condition for Weak Variations

The necessary condition for optimality of $u^o(\cdot)$ derived in Section 3.2.5 can be restated in terms of a particular Hamiltonian function to yield a weak version of Pontryagin's Principle for the optimal control problem formulated in Section 3.2. Let us define the Hamiltonian
$$H(x(t), u(t), \lambda(t), t) = L(x(t), u(t), t) + \lambda^T(t)\left[A(t)x(t) + B(t)u(t)\right]. \qquad (3.23)$$
Then from (3.22) and (3.15) we have the following theorem.

Theorem 3.2.1 Suppose that $u^o(\cdot)$ minimizes the performance criterion (3.2). Then, under Assumptions 3.2.1–3.2.4, the partial derivative of the Hamiltonian with respect to the control $u(t)$ is zero when evaluated at $x^o(t), u^o(t)$, viz.,
$$H_u(x^o(t), u^o(t), \lambda(t), t) = 0, \qquad (3.24)$$
where
$$-\dot{\lambda}^T(t) = L_x(x^o(t), u^o(t), t) + \lambda^T(t)A(t), \qquad \lambda^T(t_f) = \phi_x(x^o(t_f)). \qquad (3.25)$$

Note 3.2.2 These necessary conditions have converted a functional minimization into a function minimization at each point in time.

An Example Illustrating Pontryagin's Necessary Condition

Let us consider the system
$$\dot{x}_1(t) = x_2(t), \qquad x_1(t_0) = x_{10}, \qquad \dot{x}_2(t) = u(t), \qquad x_2(t_0) = x_{20}, \qquad (3.26)$$
and the performance functional
$$J(u(\cdot); x_0) = x_1(t_f) + \frac{1}{2}\int_{t_0}^{t_f}u^2(t)\,dt. \qquad (3.27)$$
We guess that
$$\tilde{u}(t) = -(t_f - t) \qquad (3.28)$$
is an optimizing control. In order to test whether this is a candidate for an optimal control, we check whether the necessary condition developed above is satisfied, although the necessary conditions can explicitly determine this control; see (3.32). The Hamiltonian for this problem is given by
$$H(x, u, \lambda, t) = \frac{1}{2}u^2 + \lambda_1 x_2 + \lambda_2 u, \qquad (3.29)$$
so that
$$-\dot{\lambda}_1(t) = H_{x_1} = 0, \qquad \lambda_1(t_f) = 1, \qquad -\dot{\lambda}_2(t) = H_{x_2} = \lambda_1(t), \qquad \lambda_2(t_f) = 0. \qquad (3.30)$$
Hence we obtain
$$\lambda_1(t) = 1, \qquad \lambda_2(t) = t_f - t. \qquad (3.31)$$
From (3.29) we have that
$$H_u(x, u, \lambda, t) = u + \lambda_2, \qquad (3.32)$$
and it follows from (3.28), (3.31), and (3.32) that
$$H_u(\tilde{x}(t), \tilde{u}(t), \tilde{\lambda}(t), t) = 0. \qquad (3.33)$$
Note that we have not shown that $\tilde{u}(\cdot)$ minimizes $J$ but only that it satisfies a condition that an optimizing control must satisfy, i.e., a necessary condition for optimality.

Using the Lagrange multiplier method we now prove directly that $\tilde{u}(\cdot)$ is indeed minimizing. We adjoin (3.26) to $J$ using $\lambda(\cdot)$ and integrate by parts to obtain
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = \int_{t_0}^{t_f}\left[\frac{1}{2}u^2(t) + \lambda_1(t)x_2(t) + \lambda_2(t)u(t) + \dot{\lambda}_1(t)x_1(t) + \dot{\lambda}_2(t)x_2(t)\right]dt + x_1(t_f) + \lambda_1(t_0)x_{10} + \lambda_2(t_0)x_{20} - \lambda_1(t_f)x_1(t_f) - \lambda_2(t_f)x_2(t_f). \qquad (3.34)$$
Using (3.30) and (3.31) to specify $\lambda(\cdot)$ in (3.34) yields
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = \int_{t_0}^{t_f}\left[\frac{1}{2}u^2(t) + (t_f - t)u(t)\right]dt + \lambda_1(t_0)x_{10} + \lambda_2(t_0)x_{20}. \qquad (3.35)$$
Now the only term that depends upon $u(\cdot)$ is the integral, which we can write as
$$\frac{1}{2}\int_{t_0}^{t_f}\left[(u + t_f - t)^2 - (t_f - t)^2\right]dt, \qquad (3.36)$$
and this clearly takes on its minimum value when we set
$$u(t) = \tilde{u}(t) = -(t_f - t), \qquad (3.37)$$
and the minimum value of $\hat{J}(u(\cdot); \lambda(\cdot), x_0)$ is then
$$\min_u \hat{J}(u(\cdot); \lambda(\cdot), x_0) = -\frac{1}{2}\int_{t_0}^{t_f}(t_f - t)^2\,dt + \lambda_1(t_0)x_{10} + \lambda_2(t_0)x_{20} = -\frac{1}{6}(t_f - t_0)^3 + x_{10} + (t_f - t_0)x_{20}. \qquad (3.38)$$
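This conclusion is also easy to confirm numerically: the cost of $\tilde{u}(t) = -(t_f - t)$ matches (3.38), and perturbed controls do no better. A minimal sketch follows, with illustrative values $t_0 = 0$, $t_f = 2$, $x_{10} = 1$, $x_{20} = 0$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Check that u(t) = -(tf - t) attains the minimum value (3.38).
tf, x10, x20 = 2.0, 1.0, 0.0

def J(u):
    # augment the state with the running cost (1/2)u^2
    rhs = lambda t, s: [s[1], u(t), 0.5 * u(t)**2]
    s = solve_ivp(rhs, (0.0, tf), [x10, x20, 0.0], rtol=1e-10).y[:, -1]
    return s[0] + s[2]                   # x1(tf) + integral cost

u_opt = lambda t: -(tf - t)
print(J(u_opt), -tf**3 / 6 + x10 + tf * x20)              # agree with (3.38)
print(J(lambda t: u_opt(t) + 0.1 * np.cos(t)) > J(u_opt))  # True
```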
Problems

1. In the example above, assume $\lambda(t) = P(t)x(t)$. Find a differential equation for $P$ and a feedback control law in the form $u^o(t) = \Lambda(t)x(t)$, i.e., determine $\Lambda(t)$.

2. Consider the cost criterion
$$J = \phi_1(x(t_f)) + \phi_2(x(t_1)) + \int_{t_0}^{t_f}L(x, u, t)\,dt \qquad (3.39)$$
subject to
$$\dot{x} = Ax + Bu, \qquad x_0 = x(t_0) \ \text{given}, \qquad (3.40)$$
and $t_0 < t_1 < t_f$ with $t_1$ and $t_f$ fixed. Determine the Pontryagin necessary conditions for this problem.

3.3 Nonlinear Dynamic System

We here extend the results obtained thus far to the case where the dynamic system is nonlinear, described by
$$\dot{x}(t) = f(x(t), u(t), t), \qquad x(t_0) = x_0, \qquad (3.41)$$
where $f(\cdot, \cdot, \cdot)$ is an $n$-vector function of its arguments. Our first task is clearly to investigate the behavior of this nonlinear ordinary differential equation in a neighborhood of a trajectory-control pair $(x^o(\cdot), u^o(\cdot))$ in order to obtain an expression analogous to (3.8). We make the following assumptions.

Assumption 3.3.1 The $n$-vector function $f(\cdot, \cdot, \cdot)$ is once continuously differentiable in $x$ and $u$ and continuous in $t$ on the interval $[t_0, t_f]$.

Assumption 3.3.2 Equation (3.41) has a unique solution $x(\cdot)$ defined on $[t_0, t_f]$ for each piecewise continuous control function $u(\cdot)$.

Actually, it is necessary only to assume existence on $[t_0, t_f]$, as uniqueness of solutions follows from Assumption 3.3.1. Note that it is necessary in the case of nonlinear dynamic systems to make an assumption about the existence of a solution on the interval $[t_0, t_f]$. For example, the quadratic equation
$$\dot{x}(t) = x(t)^2, \qquad x(t_0) = x_0 > 0, \qquad (3.42)$$
has the unique solution
$$x(t) = \frac{x_0}{1 - x_0(t - t_0)}, \qquad (3.43)$$
which ceases to exist at $t = \frac{1}{x_0} + t_0$. This implies that (3.42) does not have a solution defined on $[t_0, t_f]$ if $t_f \ge \frac{1}{x_0} + t_0$ (a finite escape time).

3.3.1 Perturbations in the Control and State from the Optimal Path

Let the control function $u^o(\cdot)$ generate the unique trajectory $x^o(\cdot)$, and suppose that we perturb this control function by adding to it a piecewise continuous function $\varepsilon\eta(\cdot)$, where $\varepsilon$ is a positive scalar parameter. Let the trajectory that results as a consequence of the control function $u^o(\cdot) + \varepsilon\eta(\cdot)$ be $x^o(\cdot) + \xi(\cdot; \varepsilon)$, where $\xi(t_0; \varepsilon) = 0$. From (3.41) we have that
$$\dot{x}^o(t) + \dot{\xi}(t; \varepsilon) = f([x^o(t) + \xi(t; \varepsilon)], [u^o(t) + \varepsilon\eta(t)], t), \qquad (3.44)$$
so that
$$\dot{\xi}(t; \varepsilon) = f([x^o(t) + \xi(t; \varepsilon)], [u^o(t) + \varepsilon\eta(t)], t) - f(x^o(t), u^o(t), t). \qquad (3.45)$$
Now because $f(\cdot, \cdot, \cdot)$ satisfies Assumption 3.3.1, it is well known in differential equation theory [6] that $\xi(\cdot; \varepsilon)$ is once continuously differentiable with respect to $\varepsilon$ (Bliss's Theorem). Consequently we can write
$$\xi(t; \varepsilon) = \varepsilon z(t; \eta(\cdot)) + O(t; \varepsilon), \qquad (3.46)$$
where, from (3.45), the partial derivative of $\xi(\cdot; \varepsilon)$ with respect to $\varepsilon$, $\xi_\varepsilon(\cdot)$, is propagated as
$$\dot{\xi}_\varepsilon(t) = f_x(x^o(t), u^o(t), t)\xi_\varepsilon(t) + f_u(x^o(t), u^o(t), t)\eta(t), \qquad (3.47)$$
where $\xi_\varepsilon(t) = z(t; \eta(\cdot))$. Having established this intermediate result, which is analogous to (3.8), we expand $f(\cdot, \cdot, t)$ as follows:
$$f([x^o(t) + \xi(t; \varepsilon)], [u^o(t) + \varepsilon\eta(t)], t) = f(x^o(t), u^o(t), t) + \varepsilon f_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon f_u(x^o(t), u^o(t), t)\eta(t) + O(t; \varepsilon). \qquad (3.48)$$
Adjoin (3.41) to the performance criterion (3.2) as follows:
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = J(u(\cdot); x_0) + \int_{t_0}^{t_f}\lambda^T(t)\left[f(x(t), u(t), t) - \dot{x}(t)\right]dt, \qquad (3.49)$$
where $\lambda(t)$ is an as yet undetermined, continuously differentiable $n$-vector function of time on $[t_0, t_f]$. Integrating by parts, we obtain
$$\hat{J}(u(\cdot); \lambda(\cdot), x_0) = J(u(\cdot); x_0) + \int_{t_0}^{t_f}\left[\dot{\lambda}^T(t)x(t) + \lambda^T(t)f(x(t), u(t), t)\right]dt + \lambda^T(t_0)x_0 - \lambda^T(t_f)x(t_f). \qquad (3.50)$$
Evaluate the change in $\hat{J}$ (i.e., in $J$) brought about by changing $u^o(\cdot)$ to $u^o(\cdot) + \varepsilon\eta(\cdot)$:
$$\Delta\hat{J} = \hat{J}(u^o(\cdot) + \varepsilon\eta(\cdot); \lambda(\cdot), x_0) - \hat{J}(u^o(\cdot); \lambda(\cdot), x_0) = \int_{t_0}^{t_f}\Big[\varepsilon L_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon\lambda^T(t)f_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon L_u(x^o(t), u^o(t), t)\eta(t) + \varepsilon\lambda^T(t)f_u(x^o(t), u^o(t), t)\eta(t) + \varepsilon\dot{\lambda}^T(t)z(t; \eta(\cdot)) + O(t; \varepsilon)\Big]dt - \varepsilon\lambda^T(t_f)z(t_f; \eta(\cdot)) + \varepsilon\phi_x(x^o(t_f))z(t_f; \eta(\cdot)) + O(\varepsilon). \qquad (3.51)$$
Define the variational Hamiltonian as
$$H(x(t), u(t), \lambda(t), t) = L(x(t), u(t), t) + \lambda^T(t)f(x(t), u(t), t), \qquad (3.52)$$
and set
$$-\dot{\lambda}^T(t) = H_x(x^o(t), u^o(t), \lambda(t), t), \qquad \lambda^T(t_f) = \phi_x(x(t_f)), \qquad (3.53)$$
which, as with (3.15), is legitimate. The right-hand side of (3.51) then becomes
$$\Delta J = \varepsilon\int_{t_0}^{t_f}H_u(x^o(t), u^o(t), \lambda(t), t)\eta(t)\,dt + \int_{t_0}^{t_f}O(t; \varepsilon)\,dt + O(\varepsilon), \qquad (3.54)$$
which is identical in form to (3.16). The same reasoning used in Section 3.2.5 then yields the following weak version of Pontryagin's Principle.

3.3.2 Pontryagin's Weak Necessary Condition

Theorem 3.3.1 Suppose $u^o(\cdot)$ minimizes the performance criterion (3.2) subject to the dynamic system (3.41) and that $H(x(t), u(t), \lambda(t), t)$ is defined according to (3.52). Then, under Assumptions 3.2.2–3.2.4 and 3.3.1–3.3.2, the partial derivative of the Hamiltonian with respect to the control $u(t)$ is zero when evaluated at $x^o(t), u^o(t)$, viz.,
$$H_u(x^o(t), u^o(t), \lambda(t), t) = 0, \qquad (3.55)$$
where
$$-\dot{\lambda}^T(t) = H_x(x^o(t), u^o(t), \lambda(t), t), \qquad (3.56)$$
$$\lambda^T(t_f) = \phi_x(x(t_f)). \qquad (3.57)$$

Clearly this theorem is exactly the same as Theorem 3.2.1 with $A(t)x(t) + B(t)u(t)$ replaced by $f(x(t), u(t), t)$. However, it is necessary to invoke Bliss's Theorem on differentiability with respect to a parameter of the solution of nonlinear ordinary differential equations; this was not required in Section 3.2.1, where the linearity of (3.1) sufficed.

Remark 3.3.1 If $H$ as defined in (3.52) is not an explicit function of $t$, i.e., $H(x(t), u(t), \lambda(t), t) = H(x(t), u(t), \lambda(t))$, and $u(\cdot) \in \mathcal{U}_c$, the class of continuously differentiable functions, then $H(x(t), u(t), \lambda(t))$ is a constant of the motion along the optimal path. This is easily shown by noting that $\dot{H}(x(t), u(t), \lambda(t)) = 0$ by using (3.41), (3.55), and (3.56).

3.3.3 Maximum Horizontal Distance: A Variation of the Brachistochrone Problem

We return to the brachistochrone example of Section 2.1. Here we take a slightly modified form, where we maximize the horizontal distance $r(t_f)$ traveled in a fixed time $t_f$. In this formulation we have the dynamics
$$\dot{r} = v\cos\theta, \qquad r(0) = 0, \qquad (3.58)$$
$$\dot{z} = v\sin\theta, \qquad z(0) = 0, \qquad (3.59)$$
$$\dot{v} = g\sin\theta, \qquad v(0) = 0, \qquad (3.60)$$
and the cost criterion is
$$J = \phi(x(t_f)) = -r(t_f), \qquad (3.61)$$
where $v$ is the velocity and $\theta$ is the control variable.

Necessary Conditions

We first note that the value of $z$ appears in neither the dynamics nor the performance index and may therefore be ignored. We thus reduce the state space to
$$x = \begin{bmatrix} r \\ v \end{bmatrix}. \qquad (3.62)$$
The dynamics can be written
$$\dot{x} = f(x, \theta) = \begin{bmatrix} v\cos\theta \\ g\sin\theta \end{bmatrix}. \qquad (3.63)$$
The variational Hamiltonian for this problem is
$$H(x, u, t, \lambda) = \lambda^T f = \lambda_r v\cos\theta + \lambda_v g\sin\theta, \qquad (3.64)$$
where we have taken the elements of the Lagrange multiplier vector to be $\lambda_r$ and $\lambda_v$. Since $H$ is invariant with respect to time, $H = C$, where $C$ is a constant (see Remark 3.3.1).
The necessary conditions from Theorem 3.3.1 are
$$\dot{\lambda}_r = 0, \qquad \lambda_r(t_f) = -1, \qquad (3.65)$$
$$\dot{\lambda}_v = -\lambda_r\cos\theta, \qquad \lambda_v(t_f) = 0, \qquad (3.66)$$
and
$$\frac{\partial H}{\partial\theta} = -\lambda_r v\sin\theta + \lambda_v g\cos\theta = 0. \qquad (3.67)$$
From (3.65) we have
$$\lambda_r(t) \equiv -1, \qquad t \in [0, t_f], \qquad (3.68)$$
and from (3.67),
$$\tan\theta(t) = \frac{\lambda_v(t)g}{\lambda_r v(t)} = -\frac{\lambda_v(t)g}{v(t)}. \qquad (3.69)$$
Note that $v(0) = 0$ implies that $\theta(0) = \pi/2$, and $\lambda_v(t_f) = 0 \Rightarrow \theta(t_f) = 0$. These are characteristics of the classic solution to the problem.

This problem can be solved in closed form. From (3.69) and $H = C$, (3.64) reduces to
$$\cos\theta = -\frac{v}{C}. \qquad (3.70)$$
Using (3.69) and (3.70) together gives
$$\sin\theta = \frac{\lambda_v g}{C}. \qquad (3.71)$$
Introducing (3.70) into (3.58) gives
$$\dot{r} = -\frac{v^2}{C}, \qquad (3.72)$$
and into (3.60) and (3.66) results in an oscillator:
$$\begin{bmatrix} \dot{v} \\ \dot{\lambda}_v \end{bmatrix} = \begin{bmatrix} 0 & g^2/C \\ -1/C & 0 \end{bmatrix}\begin{bmatrix} v \\ \lambda_v \end{bmatrix}, \qquad \begin{bmatrix} v(0) \\ \lambda_v(t_f) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \qquad (3.73)$$
The solution to (3.73) is
$$\lambda_v(t) = a_1\sin\frac{g}{C}t + a_2\cos\frac{g}{C}t, \qquad (3.74)$$
$$-C\dot{\lambda}_v = v(t) = -a_1 g\cos\frac{g}{C}t + a_2 g\sin\frac{g}{C}t. \qquad (3.75)$$
At $t = 0$, $v(0) = 0$ and, therefore, $a_1 = 0$. At $t = t_f$, since $\theta(t_f) = 0$, from (3.70) $C = -v(t_f)$. Since $v(t_f)$ is positive, $C$ is negative, and
$$\lambda_v(t_f) = 0 = a_2\cos\frac{gt_f}{C} \ \Rightarrow\ \frac{gt_f}{C} = -\frac{\pi}{2} \ \Rightarrow\ \sin\frac{gt_f}{C} = -1.$$
Therefore, $v(t_f) = -ga_2 \ne 0$. To determine $C$, note that at $t_f$, $\theta(t_f) = 0$. Then
$$C = -v(t_f) = -\frac{2gt_f}{\pi} = ga_2 \ \Rightarrow\ a_2 = -\frac{2t_f}{\pi}. \qquad (3.76)$$
Since $v(t) = \frac{2gt_f}{\pi}\sin\frac{\pi t}{2t_f}$, then
$$\dot{r} = v\cos\theta = -\frac{v^2}{C} = \frac{2gt_f}{\pi}\sin^2\frac{\pi t}{2t_f}. \qquad (3.77)$$
This integrates to
$$r(t) = \frac{2gt_f}{\pi}\int_0^{t}\sin^2\frac{\pi t'}{2t_f}\,dt' = \frac{g\tau^2}{4}\left[\frac{2t}{\tau} - \sin\frac{2t}{\tau}\right], \qquad (3.78)$$
where $\tau = 2t_f/\pi$. The maximum horizontal distance is $r(t_f) = gt_f^2/\pi$. In the vertical direction,
$$\dot{z} = v\sin\theta = \frac{v\dot{v}}{g} = g\tau\sin\frac{t}{\tau}\cos\frac{t}{\tau} = \frac{g\tau}{2}\sin\frac{2t}{\tau}. \qquad (3.79)$$
This integrates to
$$z(t) = \frac{g\tau^2}{4}\left[1 - \cos\frac{2t}{\tau}\right], \qquad (3.80)$$
where the value at $t_f$ is $z(t_f) = g\tau^2/2 = 2gt_f^2/\pi^2$. Note that (3.78) and (3.80) are the equations for a cycloid.
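The closed-form solution can be checked numerically by differencing (3.78) and (3.80) and comparing against the dynamics (3.58)–(3.60), using $\theta(t) = \pi/2 - t/\tau$ (which follows from $\cos\theta = \sin(t/\tau)$). A sketch, with illustrative values of $g$ and $t_f$, follows.

```python
import numpy as np

# Check of the cycloid solution (3.78), (3.80) against (3.58)-(3.60).
g, tf = 9.81, 1.0
tau = 2 * tf / np.pi

r = lambda t: g * tau**2 / 4 * (2 * t / tau - np.sin(2 * t / tau))
z = lambda t: g * tau**2 / 4 * (1 - np.cos(2 * t / tau))
v = lambda t: 2 * g * tf / np.pi * np.sin(t / tau)
theta = lambda t: np.pi / 2 - t / tau       # theta(0)=pi/2, theta(tf)=0

t, dt = 0.4, 1e-6
print((r(t + dt) - r(t - dt)) / (2 * dt), v(t) * np.cos(theta(t)))  # r_dot
print((z(t + dt) - z(t - dt)) / (2 * dt), v(t) * np.sin(theta(t)))  # z_dot
print((v(t + dt) - v(t - dt)) / (2 * dt), g * np.sin(theta(t)))     # v_dot
print(r(tf), g * tf**2 / np.pi)             # maximum horizontal distance
```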
3.3.4 Two-Point Boundary-Value Problem

In Section 3.2.6 we verified that a certain control function, for a specific example, satisfied the conditions of Theorem 3.2.1. This was possible without resorting to numerical (computer) methods because of the simplicity of the example. For a general nonlinear dynamic system this "verification" is also possible. First, one integrates (3.41), numerically if necessary, with $u(\cdot) = \tilde{u}(\cdot)$, where $\tilde{u}(\cdot)$ is the control function to be tested, to obtain $\tilde{x}(\cdot)$. One then integrates (3.56) backward in time along the path $\tilde{x}(\cdot), \tilde{u}(\cdot)$ from the "initial" value $\lambda^T(t_f) = \phi_x(\tilde{x}(t_f))$. Having done this, one then has in hand $\tilde{x}(\cdot)$, $\tilde{u}(\cdot)$, and $\lambda(\cdot)$, so that (3.55) can be tested.

One might also wish to generate (construct) a control function $\tilde{u}(\cdot)$ which satisfies Pontryagin's Principle. This is more difficult, since the initial condition for (3.41) is known at $t = t_0$, whereas only the final condition for (3.56) is known at $t = t_f$; i.e., we have a so-called two-point boundary-value problem: (3.41) and (3.56) cannot be integrated simultaneously from $t = t_0$ or from $t = t_f$. Usually one has to resort to numerical techniques to solve the two-point boundary-value problem. The so-called linear quadratic optimal control problem is one class of problems in which the two-point boundary-value problem can be converted into an ordinary initial value problem; this class of problems is discussed in Chapter 5.

A numerical optimization method is now presented which may converge to a local optimal path. This method, called steepest descent, iterates on $H_u$ until the optimality condition $H_u = 0$ is approximately satisfied. The difficulty with this approach is that as $H_u$ becomes small, the convergence rate becomes slow. For comparison, a second numerical optimization method, called the shooting method, is described in Section 5.4.7; it satisfies the optimality condition explicitly on each iteration but requires converging to the boundary conditions. In contrast to the steepest descent method, the shooting method converges very fast in the vicinity of the optimal path. However, if the initial choice of the boundary conditions is quite far from that of the optimal path, convergence can be very slow. Steepest descent usually converges well initially from the guessed control sequence. Another difficulty with the shooting method is that it may try to converge to an extremal trajectory that satisfies the first-order necessary conditions but is not locally minimizing (see Chapter 5). The steepest descent method does not converge to such a trajectory.

Solving the Two-Point Boundary-Value Problem via the Steepest Descent Method

The following note is used in the steepest descent algorithm below.

Note 3.3.1 The relationship between a first-order perturbation in the cost and a perturbation in the control is
$$\delta J = \phi_x(x^i(t_f))\delta x(t_f) + \int_{t_0}^{t_f}\left(L_x(x^i(t), u^i(t), t)\delta x(t) + L_u(x^i(t), u^i(t), t)\delta u\right)dt = \int_{t_0}^{t_f}\left(L_u(x^i(t), u^i(t), t) + \lambda^{iT}(t)f_u(x^i(t), u^i(t), t)\right)\delta u\,dt, \qquad (3.81)$$
where the perturbed cost (3.81) is obtained from the first-order perturbation $\delta x$ generated by the linearized differential equation
$$\delta\dot{x}(t) = f_x(x^i(t), u^i(t), t)\delta x(t) + f_u(x^i(t), u^i(t), t)\delta u(t), \qquad (3.82)$$
adding the identically zero term $\int_{t_0}^{t_f}\frac{d}{dt}(\lambda^{iT}\delta x)\,dt - \phi_x(x^i(t_f))\delta x(t_f)$ to the perturbed cost, and assuming that the initial condition is given.

The steps in the steepest descent algorithm are as follows.

1. Choose the nominal control, $u^i \in \mathcal{U}$, at iteration $i = 1$.

2. Generate the nominal state path, $x^i(\cdot)$, i.e., integrate (3.41) forward from $t_0$ to $t_f$.

3. Integrate the adjoint $\lambda^i$ equation backward along the nominal path:
$$\lambda^i(t_f) = \phi_x^T(x^i(t_f)), \qquad (3.83)$$
$$\dot{\lambda}^i(t) = -L_x(x^i(t), u^i(t), t)^T - f_x(x^i(t), u^i(t), t)^T\lambda^i(t). \qquad (3.84)$$

4. From Note 3.3.1, let
$$H_u(x^i(t), u^i(t), t) = L_u(x^i(t), u^i(t), t) + \lambda^{iT}(t)f_u(x^i(t), u^i(t), t). \qquad (3.85)$$
Form the control variation $\delta u(t) = -\epsilon H_u^T(x^i(t), u^i(t), t)$ to determine the control for the $(i+1)$th iteration as
$$u^{i+1}(t) = u^i(t) - \epsilon H_u^T(x^i(t), u^i(t), t) \qquad (3.86)$$
for some choice of $\epsilon > 0$ which preserves the assumed linearity. The perturbed cost criterion (3.81) is approximately
$$\delta J = -\epsilon\int_{t_0}^{t_f}\|H_u(x^i(t), u^i(t), t)\|^2\,dt. \qquad (3.87)$$

5. If $\|H_u(x^i(t), u^i(t), t)\|$ is sufficiently small over the path, then stop. If not, go to step 2. (A minimal sketch implementing these steps is given after the problems below.)

Problems

Solve for the control $u(\cdot) \in \mathcal{U}$ that minimizes
$$J = \frac{1}{2}\left[x(10)^2 + \int_0^{10}(x^2 + 2bxu + u^2)\,dt\right]$$
subject to the scalar state equation
$$\dot{x} = x + u, \qquad x(0) = x_0 = 1,$$

1. analytically with $b = 0, -1, -10$;

2. numerically by steepest descent, using the algorithm in Section 3.3.4.
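The following is the sketch referred to in step 5 above: a minimal implementation of the steepest-descent iteration on a time grid, applied to the example of Section 3.2.6, whose optimal control $-(t_f - t)$ is known. The grid size, step size $\epsilon$, and stopping tolerance are illustrative choices.

```python
import numpy as np

# Steepest descent (steps 1-5) on the example of Section 3.2.6.
tf, N, eps = 2.0, 2000, 0.2
t = np.linspace(0.0, tf, N + 1)
dt = t[1] - t[0]
u = np.zeros(N + 1)                           # step 1: nominal control

for _ in range(200):
    # step 2: integrate x1' = x2, x2' = u forward (Euler); x is not needed
    # by Hu in this particular example, but the pass mirrors the algorithm
    x1, x2 = np.zeros(N + 1), np.zeros(N + 1)
    x1[0], x2[0] = 1.0, 0.0
    for k in range(N):
        x1[k + 1] = x1[k] + dt * x2[k]
        x2[k + 1] = x2[k] + dt * u[k]
    # step 3: adjoint backward; lam1 = 1 identically, lam2' = -1, lam2(tf)=0
    lam2 = np.zeros(N + 1)
    for k in range(N, 0, -1):
        lam2[k - 1] = lam2[k] + dt
    # steps 4-5: Hu = L_u + lam^T f_u = u + lam2; descend until small
    Hu = u + lam2
    if np.max(np.abs(Hu)) < 1e-4:
        break
    u = u - eps * Hu

print(np.max(np.abs(u - (-(tf - t)))))        # close to the known optimum
```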
3.4 Strong Variations and the Strong Form of the Pontryagin Minimum Principle

Theorem 3.3.1 is referred to as a weak Pontryagin Principle because it states only that
$$H_u(x^o(t), u^o(t), \lambda(t), t) = 0. \qquad (3.88)$$
It turns out, however, that a stronger statement is possible. That is, if $u^o(\cdot)$ minimizes $J$, then
$$H(x^o(t), u^o(t), \lambda(t), t) \le H(x^o(t), u(t), \lambda(t), t) \quad \forall t \in [t_0, t_f]. \qquad (3.89)$$
Clearly (3.89) implies (3.88), but (3.89) is stronger, as it states that $u^o(t)$ minimizes the function $H(x(t), u(t), \lambda(t), t)$ with respect to $u(t)$.

In the following derivation the assumption of continuous differentiability with respect to $u$ can be relaxed. Therefore, Assumptions 3.2.3 and 3.3.1 are replaced by the following.

Assumption 3.4.1 The scalar function $L(\cdot, \cdot, \cdot)$ and the $n$-vector function $f(\cdot, \cdot, \cdot)$ are once continuously differentiable in $x$ and continuous in $u$ and $t$, where $t \in [t_0, t_f]$.

In order to prove this strong form of Pontryagin's Principle one introduces the notion of strong perturbations (variations). Here a perturbation $\eta(\cdot)$ is made to $u^o(\cdot)$ which may be large in magnitude but which is nonzero only over a small time interval $\varepsilon$. In other words, instead of introducing a perturbation $\varepsilon\eta(\cdot)$ which can be made small by making $\varepsilon$ small, we introduce a not necessarily small continuous perturbation $\eta(t)$, $\bar{t} \le t \le \bar{t} + \varepsilon$, and set $\eta(\cdot) = 0$ on the intervals $[t_0, \bar{t})$, $(\bar{t} + \varepsilon, t_f]$. As we will now show, this type of perturbation still results in a small change in $x(\cdot)$.

Since $\eta(\cdot) = 0$ on $[t_0, \bar{t})$, we have $x(t) = x^o(t)$ on $[t_0, \bar{t}]$, and $\xi(t; \eta(\cdot)) = x(t) - x^o(t)$ is zero on this interval. On the interval $[\bar{t}, \bar{t} + \varepsilon]$ we have
$$\dot{\xi}(t; \eta(\cdot)) = f(x^o(t) + \xi(t; \eta(\cdot)), u^o(t) + \eta(t), t) - f(x^o(t), u^o(t), t) \qquad (3.90)$$
and $\xi(\bar{t}; \eta(\cdot)) = 0$, so that
$$\xi(t; \eta(\cdot)) = \int_{\bar{t}}^{t}\left[f(x^o(\tau) + \xi(\tau; \eta(\cdot)), u^o(\tau) + \eta(\tau), \tau) - f(x^o(\tau), u^o(\tau), \tau)\right]d\tau \qquad (3.91)$$
for $\bar{t} \le t \le \bar{t} + \varepsilon$. Clearly, the piecewise differentiability of $\xi(\cdot; \eta(\cdot))$ with respect to $t$ allows us to write
$$\xi(t; \eta(\cdot)) = (t - \bar{t})\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t}) \qquad (3.92)$$
and
$$\xi(\bar{t} + \varepsilon; \eta(\cdot)) = \varepsilon\dot{\xi}(\bar{t}; \eta(\cdot)) + O(\varepsilon). \qquad (3.93)$$
It then follows, along the lines indicated in Section 3.3, that for $t \in [\bar{t} + \varepsilon, t_f]$
$$\xi(t; \eta(\cdot)) = \varepsilon z(t; \eta(\cdot)) + O(t; \varepsilon). \qquad (3.94)$$
Because of its differentiability, we can expand $L$ for $\bar{t} \le t \le \bar{t} + \varepsilon$ as
$$L\big(x^o(t) + (t - \bar{t})\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t}), [u^o(t) + \eta(t)], t\big) = L(x^o(t), [u^o(t) + \eta(t)], t) + (t - \bar{t})L_x(x^o(t), [u^o(t) + \eta(t)], t)\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t}), \qquad (3.95)$$
and for $\bar{t} + \varepsilon < t \le t_f$
$$L([x^o(t) + \varepsilon z(t; \eta(\cdot)) + O(t; \varepsilon)], u^o(t), t) = L(x^o(t), u^o(t), t) + \varepsilon L_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + O(t; \varepsilon). \qquad (3.96)$$
Similarly, for $\bar{t} \le t \le \bar{t} + \varepsilon$
$$f\big(x^o(t) + (t - \bar{t})\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t}), [u^o(t) + \eta(t)], t\big) = f(x^o(t), [u^o(t) + \eta(t)], t) + (t - \bar{t})f_x(x^o(t), [u^o(t) + \eta(t)], t)\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t}), \qquad (3.97)$$
and for $\bar{t} + \varepsilon < t \le t_f$
$$f([x^o(t) + \varepsilon z(t; \eta(\cdot)) + O(t; \varepsilon)], u^o(t), t) = f(x^o(t), u^o(t), t) + \varepsilon f_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + O(t; \varepsilon). \qquad (3.98)$$
Using (3.50), we evaluate the change in $\hat{J}$ (i.e., in $J$) brought about by the strong variation from $u^o(\cdot)$ to $u^o(\cdot) + \eta(\cdot)$ over $t \in [\bar{t}, \bar{t} + \varepsilon]$, where $\eta(\cdot) = 0$ for $t \in [t_0, \bar{t})$ and $t \in (\bar{t} + \varepsilon, t_f]$:
</gr-replace>
$$\Delta\hat{J} = \hat{J}(u^o(\cdot) + \eta(\cdot); \lambda(\cdot), x_0) - \hat{J}(u^o(\cdot); \lambda(\cdot), x_0) = \int_{\bar{t}}^{\bar{t}+\varepsilon}\Big[L(x^o(t), [u^o(t) + \eta(t)], t) - L(x^o(t), u^o(t), t) + (t - \bar{t})L_x(x^o(t), [u^o(t) + \eta(t)], t)\dot{\xi}(\bar{t}; \eta(\cdot)) + \lambda^T(t)\left[f(x^o(t), [u^o(t) + \eta(t)], t) - f(x^o(t), u^o(t), t)\right] + (t - \bar{t})\lambda^T(t)f_x(x^o(t), [u^o(t) + \eta(t)], t)\dot{\xi}(\bar{t}; \eta(\cdot)) + (t - \bar{t})\dot{\lambda}^T(t)\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t})\Big]dt + \int_{\bar{t}+\varepsilon}^{t_f}\Big[\varepsilon L_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon\lambda^T(t)f_x(x^o(t), u^o(t), t)z(t; \eta(\cdot)) + \varepsilon\dot{\lambda}^T(t)z(t; \eta(\cdot)) + O(t; \varepsilon)\Big]dt - \varepsilon\lambda^T(t_f)z(t_f; \eta(\cdot)) + \varepsilon\phi_x(x^o(t_f))z(t_f; \eta(\cdot)) + O(\varepsilon). \qquad (3.99)$$
By using the definition of $H$ from (3.52) and applying (3.53) over the interval $(\bar{t} + \varepsilon, t_f]$ in (3.99), we obtain
$$\Delta\hat{J} = \int_{\bar{t}}^{\bar{t}+\varepsilon}\Big[H(x^o(t), [u^o(t) + \eta(t)], \lambda(t), t) - H(x^o(t), u^o(t), \lambda(t), t) + (t - \bar{t})\left[H_x(x^o(t), [u^o(t) + \eta(t)], \lambda(t), t) + \dot{\lambda}^T(t)\right]\dot{\xi}(\bar{t}; \eta(\cdot)) + O(t - \bar{t})\Big]dt + \int_{\bar{t}+\varepsilon}^{t_f}O(t; \varepsilon)\,dt + O(\varepsilon). \qquad (3.100)$$
The first integral in (3.100) can be expanded in terms of $\varepsilon$ around the point $\bar{t} + \varepsilon$ to yield
$$\Delta\hat{J} = \varepsilon\Big[H(x^o(\bar{t} + \varepsilon), [u^o(\bar{t} + \varepsilon) + \eta(\bar{t} + \varepsilon)], \lambda(\bar{t} + \varepsilon), \bar{t} + \varepsilon) - H(x^o(\bar{t} + \varepsilon), u^o(\bar{t} + \varepsilon), \lambda(\bar{t} + \varepsilon), \bar{t} + \varepsilon)\Big] + \varepsilon^2\Big[H_x(x^o(\bar{t} + \varepsilon), [u^o(\bar{t} + \varepsilon) + \eta(\bar{t} + \varepsilon)], \lambda(\bar{t} + \varepsilon), \bar{t} + \varepsilon) + \dot{\lambda}^T(\bar{t} + \varepsilon)\Big]\dot{\xi}(\bar{t}; \eta(\cdot)) + \varepsilon O(\varepsilon) + \int_{\bar{t}+\varepsilon}^{t_f}O(t; \varepsilon)\,dt + O(\varepsilon). \qquad (3.101)$$
For small $\varepsilon$, the dominant term in (3.101) is the first one, and as (3.101) must be nonnegative because $u^o(\cdot)$ minimizes $J$, we have the necessary condition
$$H(x^o(\bar{t} + \varepsilon), u^o(\bar{t} + \varepsilon) + \eta(\bar{t} + \varepsilon), \lambda(\bar{t} + \varepsilon), \bar{t} + \varepsilon) \ge H(x^o(\bar{t} + \varepsilon), u^o(\bar{t} + \varepsilon), \lambda(\bar{t} + \varepsilon), \bar{t} + \varepsilon). \qquad (3.102)$$
As $\bar{t}$ and $\varepsilon$ are arbitrary and as the value of $\eta(\bar{t} + \varepsilon)$ is arbitrary, we conclude that $H(x^o(t), u(t), \lambda(t), t)$ is minimized with respect to $u(t)$ at $u(t) = u^o(t)$. We have thus proved the following theorem.

Theorem 3.4.1 Suppose $u^o(\cdot)$ minimizes the performance criterion (3.2) subject to the dynamic system (3.41) and that $H(x, u, \lambda, t)$ is defined according to (3.52). Then, under Assumptions 3.2.2, 3.2.4, 3.3.2, and 3.4.1, the Hamiltonian is minimized with respect to the control $u(t)$ at $u^o(t)$, viz.,
$$H(x^o(t), u^o(t), \lambda(t), t) \le H(x^o(t), u(t), \lambda(t), t) \quad \forall\, t \in [t_0, t_f], \qquad (3.103)$$
where
$$-\dot{\lambda}^T(t) = H_x(x^o(t), u^o(t), \lambda(t), t), \qquad \lambda^T(t_f) = \phi_x(x^o(t_f)). \qquad (3.104)$$

Remark 3.4.1 Equation (3.103) is classically known as the Weierstrass condition.

Remark 3.4.2 If a minimum of $H(x^o(t), u(t), \lambda(t), t)$ is found through (3.103) at $u^o(t)$ and $H(x^o(t), u(t), \lambda(t), t)$ is twice continuously differentiable with respect to $u$, then necessary conditions for optimality are
$$H_u(x^o(t), u^o(t), \lambda(t), t) = 0 \quad \text{and} \quad H_{uu}(x^o(t), u^o(t), \lambda(t), t) \ge 0. \qquad (3.105)$$
The classical Legendre–Clebsch condition is $H_{uu} \ge 0$, and $H_{uu} > 0$ is the strong form.

3.4.1 Control Constraints: Strong Pontryagin Minimum Principle

So far, we have allowed the control $u(t)$ to take on any real value. However, in many (engineering) applications the size of the controls that can be applied is limited. Specifically, we may have bounded control functions in the class
$$u(t) \in \mathcal{U}_B \quad \forall\, t \in [t_0, t_f], \qquad (3.106)$$
where $\mathcal{U}_B$ is a subset of $m$-dimensional bounded control functions. For example, we may have the scalar control $u(t)$ bounded according to
$$-1 \le u(t) \le 1 \quad \forall\, t \in [t_0, t_f], \qquad (3.107)$$
where the set $\mathcal{U}_B$ is defined as
$$\mathcal{U}_B = \{u(\cdot) : -1 \le u(\cdot) \le 1\}. \qquad (3.108)$$
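A small sketch of what the pointwise minimization of $H$ over such a bounded set looks like is given below, with $s = \lambda^T B$ an illustrative scalar. For a Hamiltonian quadratic in $u$ the constrained minimizer is the unconstrained one clipped to the bounds; for a Hamiltonian linear in $u$ it is bang-bang.

```python
import numpy as np

# Pointwise minimization of H over -1 <= u <= 1, for two common forms of H.
def u_quadratic(s):                   # H = 0.5*u**2 + s*u
    return float(np.clip(-s, -1.0, 1.0))

def u_linear(s):                      # H = s*u  (bang-bang)
    return -np.sign(s) if s != 0 else 0.0   # any u is minimizing when s = 0

grid = np.linspace(-1.0, 1.0, 200001)       # brute-force comparison grid
for s in (-2.0, -0.3, 0.7, 3.0):
    u_bf = grid[np.argmin(0.5 * grid**2 + s * grid)]
    print(s, u_quadratic(s), u_bf, u_linear(s))
```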
Referring to (3.101), for example, we see that the only change required in our derivation when $u(t) \in \mathcal{U}_B$ is that $u^o(t) \in \mathcal{U}_B$ and that $\eta(\cdot)$ is chosen so that $u^o(t) + \eta(t) \in \mathcal{U}_B$ for all $t$ in $[t_0, t_f]$. Therefore, $u^o(t)$ is found as the $u(t) \in \mathcal{U}_B$ which minimizes $H$. An example of $H$ versus $u$ for bounded control is shown in Figure 3.2. With this modification we have proved the following necessary condition for optimality of $u^o(\cdot)$.

Figure 3.2: Bounded control.

Theorem 3.4.2 Suppose that $u^o(t) \in \mathcal{U}_B$, $t \in [t_0, t_f]$, minimizes the performance criterion (3.2) subject to the dynamic system (3.41) and the constraint that $u(t) \in \mathcal{U}_B$ for all $t \in [t_0, t_f]$, and that $H(x(t), u(t), \lambda(t), t)$ is defined according to (3.52). Then, under Assumptions 3.2.2, 3.2.4, 3.3.2, and 3.4.1, the Hamiltonian is minimized with respect to the control $u(t)$, subject to the control constraint $u(t) \in \mathcal{U}_B$, at $u^o(t)$, viz.,
$$H(x^o(t), u^o(t), \lambda(t), t) \le H(x^o(t), u(t), \lambda(t), t) \quad \forall\, t \in [t_0, t_f], \qquad (3.109)$$
where
$$u^o(t), u(t) \in \mathcal{U}_B \qquad (3.110)$$
and
$$-\dot{\lambda}^T(t) = H_x(x^o(t), u^o(t), \lambda(t), t), \qquad \lambda^T(t_f) = \phi_x(x^o(t_f)). \qquad (3.111)$$

Remark 3.4.3 Remark 3.3.1 notes that if $u(\cdot) \in \mathcal{U}_c$, time does not appear explicitly in the Hamiltonian, and the condition $H_u = 0$ holds, then the Hamiltonian is constant along the optimal trajectory. Now let $u(\cdot) \in \mathcal{U}_{cB}$, the class of piecewise continuously differentiable and bounded functions. The optimal control may fail to be differentiable at a finite number of points, because either it goes from being unconstrained to being on its bound or it is discontinuous, jumping between two (or more) equal minima of $H$. The fact that $u^o(\cdot) \in \mathcal{U}_{cB}$ guarantees that $x(t)$ and $\lambda(t)$ are continuous for all $t \in [t_0, t_f]$. Hence, even if the optimal control is not differentiable at, say, a point $t_d$, $x^o(t_d^-) = x^o(t_d^+)$ and $\lambda^o(t_d^-) = \lambda^o(t_d^+)$. Consequently, since $u^o(\cdot)$ is chosen to minimize $H$ at every time, $H(x^o(t_d^-), \lambda^o(t_d^-), u^o(t_d^-)) = H(x^o(t_d^+), \lambda^o(t_d^+), u^o(t_d^+))$; i.e., $H$ is continuous also across $t_d$. Therefore, for constant control bounds, if $H$ is not an explicit function of time, it remains constant where the control is differentiable, as discussed in Remark 3.3.1. The continuity of $H$ across the nondifferentiable points of $u^o(\cdot)$ implies that the Hamiltonian remains constant along the entire optimal solution. In this case it is often referred to as the constant of the motion.

Strong Pontryagin Minimum Principle: Special Case

Let us suppose that we wish to control the linear dynamic system
$$\dot{x}(t) = Ax(t) + Bu(t), \qquad x(t_0) = x_0, \qquad (3.112)$$
so as to minimize a linear function of the final value of $x$, viz.,
$$\min_u \alpha^T x(t_f), \qquad (3.113)$$
subject to the constraint
$$-1 \le u_i(t) \le 1, \qquad i = 1, \ldots, m, \qquad (3.114)$$
where $t_f$ is the known final time. Here $A$ is an $n \times n$ constant matrix, $B$ is an $n \times m$ constant matrix, and $\alpha$ is an $n$-dimensional column vector. We make the following assumption.

Assumption 3.4.2 The matrices $[B_i, AB_i, \ldots, A^{n-1}B_i]$, $i = 1, \ldots, m$, have rank $n$.

This is a controllability assumption, where the system is controllable from each control (see [8]). Under this assumption we have the following result.

Theorem 3.4.3 The controls
$$u_i(t) = -\mathrm{sign}\left[B_i^T e^{-A^T(t - t_f)}\alpha\right], \qquad i = 1, \ldots, m, \qquad (3.115)$$
Here,

$$
\operatorname{sign}(\sigma)=\begin{cases}1&\text{if }\sigma\ge0,\\-1&\text{if }\sigma<0.\end{cases}\tag{3.116}
$$

Proof: First, we show that Assumption 3.4.2 ensures that $B_i^Te^{-A^T(t-t_f)}\alpha$ is not identically zero on a nonzero interval of time, so that the control (3.115) is nonzero except possibly at a finite number of times at which the control switches as $\sigma$ goes through zero in (3.116). Suppose the converse, i.e., that this function is zero on an interval of time. Then in this interval, at time $t$ say, its time derivatives must also be zero, yielding

$$
B_i^Te^{-A^T(t-t_f)}\alpha=0,\quad B_i^TA^Te^{-A^T(t-t_f)}\alpha=0,\ \ \ldots,\ \ B_i^T\big(A^T\big)^{n-1}e^{-A^T(t-t_f)}\alpha=0.\tag{3.117}
$$

As the exponential function is always nonsingular and as $\alpha\ne0$, these equations imply that the rank of $[B_i,AB_i,\ldots,A^{n-1}B_i]$ is less than $n$, a contradiction.

Now we show that the controls (3.115) satisfy Pontryagin's Principle. We have

$$
H(x(t),u(t),\lambda(t),t)=\lambda^T(t)Ax(t)+\lambda^T(t)Bu(t),\tag{3.118}
$$

so that

$$
-\dot\lambda^T(t)=\lambda^T(t)A,\qquad\lambda^T(t_f)=\alpha^T.\tag{3.119}
$$

This equation can be integrated explicitly to yield

$$
\lambda(t)=e^{-A^T(t-t_f)}\alpha,\tag{3.120}
$$

so that

$$
H(x(t),u(t),\lambda(t),t)=\alpha^Te^{-A(t-t_f)}\big(Ax(t)+Bu(t)\big),\tag{3.121}
$$

and $H$ is minimized by choosing

$$
u_i(t)=-\operatorname{sign}\big[B_i^Te^{-A^T(t-t_f)}\alpha\big],\quad i=1,\ldots,m,\tag{3.122}
$$

which is just (3.115). So we have proved that (3.115) satisfies Pontryagin's Principle. However, Pontryagin's Principle is only a necessary condition for optimality, so we have not yet proved the optimality of the control. It turns out that this can be verified by direct calculation (cf. Section 3.2.6), but we leave this exercise to the reader.

Remark 3.4.4 Using (3.122), Assumption 3.4.2 shows that $u_i(t)$ cannot vanish over any finite interval of time.
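The switching law (3.115) is straightforward to evaluate numerically. The following minimal sketch (our own illustration; the system, horizon, and $\alpha$ are arbitrary choices) computes $\lambda(t)$ from (3.120) with a matrix exponential and applies (3.115) and (3.116) for a double integrator:

```python
# A minimal sketch of the bang-bang law (3.115) for a double
# integrator; A, B, alpha, and tf are our own illustrative choices.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
alpha = np.array([1.0, 0.0])   # minimize x1(tf)
tf = 2.0

def u_opt(t):
    lam = expm(-A.T * (t - tf)) @ alpha   # lambda(t) from (3.120)
    sigma = float(B.T @ lam)              # switching function B_i^T lambda(t)
    return -1.0 if sigma >= 0 else 1.0    # u = -sign(sigma), cf. (3.116)

for t in np.linspace(0.0, tf, 5):
    print(t, u_opt(t))
```

For this particular $\alpha$ the switching function is $B^T\lambda(t)=t_f-t$, which never changes sign on $[0,t_f]$, so the sketch returns $u\equiv-1$ over the whole interval.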
3.5 Sufficient Conditions for Global Optimality: The Hamilton–Jacobi–Bellman Equation

Throughout, we have stated and proved the fact that Pontryagin's Principle is a necessary condition for optimality. It turns out that only in special cases (cf. Sections 3.2.6 and 3.4.1) is it sufficient for optimality. This should not surprise the reader, since in all our derivations we have chosen control perturbations that cause only a small change in $x(\cdot)$ away from $x^o(\cdot)$. In order to derive a sufficient condition for optimality via variational (perturbational) means, one would have to allow arbitrary (not only small) deviations from the trajectory $x^o(\cdot)$. If arbitrary expansions were used, this would involve higher-order derivatives of $L$, $f$, and $\phi$ and would be very cumbersome. Therefore, in this section we take a different approach to sufficiency via the Hamilton–Jacobi–Bellman (H-J-B) equation.

Our approach to global sufficiency centers on the H-J-B partial differential equation

$$
-V_t(x(t),t)=\min_{u(t)\in\mathcal U_B}\big[L(x(t),u(t),t)+V_x(x(t),t)f(x(t),u(t),t)\big]\tag{3.123}
$$

with

$$
V(x(t_f),t_f)=\phi(x(t_f)).\tag{3.124}
$$

The function $V(x(t),t):\mathbb R^n\times\mathbb R\to\mathbb R$ is known as the optimal value function, and as we will see, $V(x(t),t)$ is the optimal value of the cost criterion starting at $x(t),t$ and using the optimal control to the terminal time. Note that $x(t)$ and $t$ are independent variables in the first-order partial differential equation (3.123), and $V(x(t),t)$ is the dependent variable. The notation $x(t)$ should be read as $x$ at time $t$; therefore, $x(t_f)$ in (3.124) is read as $x$ at time $t_f$. When we consider integration or differentiation along a path, $x(t)$ is dependent on $t$. Although this equation is rather formidable at first glance, it turns out to be naturally useful in deducing sufficient conditions for optimality and for solving certain classes of optimal control problems. (The implementation of the H-J-B equation is sometimes referred to as dynamic programming.) Note that inequality constraints on the control as well as on the state space become boundary conditions on this first-order partial differential equation. State space constraints are beyond the scope of this book and are not discussed herein.

To develop some facility with the H-J-B partial differential equation, we apply it to the example treated in Section 3.2.6. Here

$$
L(x(t),u(t),t)=\tfrac12u^2(t),\tag{3.125}
$$

$$
f(x(t),u(t),t)=\begin{bmatrix}x_2(t)\\u(t)\end{bmatrix},\tag{3.126}
$$

$$
\phi(x(t_f))=x_1(t_f).\tag{3.127}
$$

Substituting these expressions into (3.123) we obtain

$$
-V_t(x(t),t)=\min_{u(t)\in\mathcal U}\Big[\tfrac12u^2(t)+V_{x_1}(x(t),t)x_2(t)+V_{x_2}(x(t),t)u(t)\Big],\qquad V(x(t_f),t_f)=x_1(t_f).\tag{3.128}
$$

Carrying out the minimization of the quantity in square brackets yields

$$
u^o(t)=-V_{x_2}(x(t),t),\tag{3.129}
$$

and substituting this back into (3.128) yields

$$
-V_t(x(t),t)=V_{x_1}(x(t),t)x_2(t)-\tfrac12\big[V_{x_2}(x(t),t)\big]^2.\tag{3.130}
$$

We assume that (3.130) has a solution

$$
V(x(t),t)=x_1(t)+(t_f-t)x_2(t)-\tfrac16(t_f-t)^3.\tag{3.131}
$$

Then

$$
V_{x_1}(x(t),t)=1,\qquad V_{x_2}(x(t),t)=t_f-t,\tag{3.132}
$$

and

$$
-V_t(x(t),t)=x_2(t)-\tfrac12(t_f-t)^2.\tag{3.133}
$$

Consequently, (3.132) and (3.133) show that (3.131) satisfies (3.128) and, moreover, evaluated at $t=t_f$ it equals $x_1(t_f)$. What is more, we have from (3.129) and (3.132) that

$$
u^o(t)=-(t_f-t)\tag{3.134}
$$

and from (3.131)

$$
V(x(t_0),t_0)=x_{10}+(t_f-t_0)x_{20}-\tfrac16(t_f-t_0)^3,\tag{3.135}
$$

which agrees with (3.38). It is clear from these calculations that the H-J-B equation (3.128) is closely related to the optimal solution of the example in Section 3.2.6.
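The algebra in (3.131) through (3.134) can be checked mechanically. A minimal SymPy verification (our own illustration) confirms that the candidate (3.131) satisfies (3.130), meets the boundary condition in (3.128), and reproduces the control (3.134):

```python
# A minimal SymPy check that the guess (3.131) satisfies the H-J-B
# equation for the double-integrator example.
import sympy as sp

t, tf, x1, x2 = sp.symbols("t t_f x1 x2", real=True)
V = x1 + (tf - t) * x2 - (tf - t)**3 / 6           # candidate (3.131)

lhs = -sp.diff(V, t)                               # -V_t
rhs = sp.diff(V, x1) * x2 - sp.diff(V, x2)**2 / 2  # right side of (3.130)

print(sp.simplify(lhs - rhs))   # 0 -> H-J-B (3.130) satisfied
print(V.subs(t, tf))            # x1 -> boundary condition in (3.128)
print(-sp.diff(V, x2))          # -(t_f - t) -> u^o(t), cf. (3.134)
```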
We now develop this point fully via the following theorem.

Theorem 3.5.1 Suppose there exists a once continuously differentiable scalar function $V(\cdot,\cdot)$ of $x$ and $t$ that satisfies the H-J-B equation (3.123) and (3.124). Suppose further that the control $u^o(x(t),t)$ minimizes

$$
L(x(t),u(t),t)+V_x(x(t),t)f(x(t),u(t),t)\tag{3.136}
$$

subject to the constraint that $u(t)\in\mathcal U_B$. Then, under the further Assumptions 3.2.2 and 3.3.2 and the assumption that $f(\cdot,\cdot,\cdot)$, $L(\cdot,\cdot,\cdot)$, and $\phi(\cdot)$ are continuous in all their arguments, the control function $u^o(x^o(\cdot),\cdot)$ minimizes (3.2) subject to (3.41) and (3.106), and $V(x_0,t_0)$ is equal to the minimum value of (3.2).

Proof: Under the stated assumptions we note that the identity

$$
V(x(t_0),t_0)-V(x(t_f),t_f)+\int_{t_0}^{t_f}\frac{d}{dt}V(x(t),t)\,dt=0\tag{3.137}
$$

holds for all piecewise continuous functions $u(\cdot)\in\mathcal U_B$. Adding this identically zero quantity to $J$ and noting that

$$
\frac{d}{dt}V(x(t),t)=V_t(x(t),t)+V_x(x(t),t)f(x(t),u(t),t)\tag{3.138}
$$

yields

$$
\hat J(u(\cdot);x_0)=V(x_0,t_0)-V(x(t_f),t_f)+\phi(x(t_f))+\int_{t_0}^{t_f}\big[L(x(t),u(t),t)+V_t(x(t),t)+V_x(x(t),t)f(x(t),u(t),t)\big]\,dt.\tag{3.139}
$$

Suppose that $V(x(t),t)$ satisfies (3.123). Then

$$
\hat J(u(\cdot);x_0)=V(x_0,t_0)+\int_{t_0}^{t_f}\big[H(x(t),u(t),V_x(x(t),t),t)-H(x(t),u^o(x(t),t),V_x(x(t),t),t)\big]\,dt,\tag{3.140}
$$

where

$$
H(x(t),u(t),V_x(x(t),t),t)=L(x(t),u(t),t)+V_x(x(t),t)f(x(t),u(t),t).\tag{3.141}
$$

From the minimization in (3.123), the integrand of (3.140) is nonnegative and takes on its minimum value of zero when $u(t)=u^o(x(t),t)$. Assumption 3.3.2 ensures that the trajectory resulting from this control is well defined. This completes the proof of the theorem.

Remark 3.5.1 Some interpretation of Equation (3.140) is required. Note that the integral is taken along a nonoptimal path generated by $u(t)$. If Equation (3.140) is rewritten as

$$
\Delta J=\hat J(u(\cdot);x_0)-V(x_0,t_0)=\int_{t_0}^{t_f}\big[H(x(t),u(t),V_x(x(t),t),t)-H(x(t),u^o(x(t),t),V_x(x(t),t),t)\big]\,dt=\int_{t_0}^{t_f}\Delta H\,dt,\tag{3.142}
$$

then the integral represents the change in cost away from the optimal path. This integral is called Hilbert's integral in the classical calculus of variations literature.

Example 3.5.1 Consider Hilbert's integral given in (3.142). Furthermore, consider also the linear quadratic problem

$$
L(x(t),u(t),t)=\tfrac12\big[x^T(t)Qx(t)+u^T(t)Ru(t)\big],\tag{3.143}
$$

$$
f(x(t),u(t),t)=Ax(t)+Bu(t)\tag{3.144}
$$

with $Q=Q^T$, $R=R^T>0$, $\phi(x(t_f))=0$. Then

$$
\Delta H=\tfrac12\big[u^T(t)Ru(t)-u^{oT}(x(t),t)Ru^o(x(t),t)\big]+V_xB\big(u(t)-u^o(x(t),t)\big).\tag{3.145}
$$

Assume the form of $V(\cdot,\cdot)$ as

$$
V(x(t),t)=\tfrac12x^T(t)S(t)x(t),\qquad V_x(x(t),t)=x^T(t)S(t),\tag{3.146}
$$

$$
u^o(x(t),t)=-R^{-1}B^TS(t)x(t),\qquad S(t)=S^T(t)>0,\tag{3.147}
$$

so that, using $V_x(x(t),t)B=-u^{oT}(x(t),t)R$,

$$
\begin{aligned}
\Delta H&=\tfrac12\big[u^T(t)Ru(t)-u^{oT}(x(t),t)Ru^o(x(t),t)\big]+V_x(x(t),t)BR^{-1}R\big(u(t)-u^o(x(t),t)\big)\\
&=\tfrac12u^T(t)Ru(t)-\tfrac12u^{oT}(x(t),t)Ru^o(x(t),t)+u^{oT}(x(t),t)Ru^o(x(t),t)-u^{oT}(x(t),t)Ru(t)\\
&=\tfrac12\big(u(t)-u^o(x(t),t)\big)^TR\big(u(t)-u^o(x(t),t)\big)\ \ge\ 0.
\end{aligned}\tag{3.148}
$$

This means that $u^o(x^o(t),t)$ is a global minimum for this linear problem with quadratic performance index.
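As a numerical companion to Example 3.5.1 (our own sketch, with arbitrary $A$, $B$, $Q$, $R$, and horizon), one can integrate the Riccati equation for $S(t)$ backward, assuming the standard LQ form $-\dot S=A^TS+SA+Q-SBR^{-1}B^TS$, $S(t_f)=0$ (the text defers this derivation to Chapter 5), and confirm that perturbing the feedback $u^o=-R^{-1}B^TSx$ only increases the cost, as (3.148) predicts:

```python
# A minimal LQ sketch: Riccati integration backward, then cost
# comparison for the optimal and a perturbed feedback.  All matrices,
# the horizon, and x0 are our own illustrative choices.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])
tf, x0 = 5.0, np.array([1.0, 0.0])

def riccati(t, s):
    S = s.reshape(2, 2)   # -Sdot = A^T S + S A + Q - S B R^-1 B^T S
    dS = -(A.T @ S + S @ A + Q - S @ B @ np.linalg.solve(R, B.T) @ S)
    return dS.ravel()

sol = solve_ivp(riccati, [tf, 0.0], np.zeros(4), dense_output=True)

def cost(perturb):
    def dyn(t, z):                     # z = (state, accumulated cost)
        x = z[:2]
        S = sol.sol(t).reshape(2, 2)
        u = -np.linalg.solve(R, B.T @ S @ x) + perturb
        dl = 0.5 * (x @ Q @ x + u @ R @ u)
        return np.concatenate([A @ x + B @ u, [dl]])
    z = solve_ivp(dyn, [0.0, tf], np.concatenate([x0, [0.0]])).y[:, -1]
    return z[2]

print(cost(0.0), cost(0.3))   # optimal cost < perturbed cost
```

For a constant offset $\delta$ added to the optimal feedback, Hilbert's integral (3.142) gives $\Delta J=\tfrac12\delta^TR\delta\,(t_f-t_0)$ exactly ($0.225$ for these numbers), which the simulation reproduces to integration accuracy.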
Theorem 3.5.1 shows that satisfaction of the H-J-B equation implies that the minimum value of the cost criterion starting at the point $(x(t),t)$ and using the optimal control to the terminal time $t_f$ is equal to $V(x(t),t)$, the optimal value function. To develop a conceptual notion of $V(x(t),t)$, let us consider an optimization problem involving only a terminal cost function, i.e., $J=\phi(x(t_f))$. Suppose there exists an optimal trajectory emanating from $(x(t),t)$. Then, at every point along this path, the value of the optimal value function equals the optimal value $\phi(x^o(t_f))$. Therefore, $V(x(t),t)$ is constant along the motion, and its derivative along the path, assuming continuous first partial derivatives of $V(x(t),t)$, is

$$
\frac{dV(x(t),t)}{dt}=\frac{\partial V(x(t),t)}{\partial x(t)}f(x(t),u^o(x(t),t),t)+\frac{\partial V(x(t),t)}{\partial t}=0,\tag{3.149}
$$

where $u^o(x(t),t)$ is the optimal control. If any other control is used, then

$$
\frac{dV(x(t),t)}{dt}\ge0,\tag{3.150}
$$

and therefore (3.149) can be restated as

$$
-\frac{\partial V(x(t),t)}{\partial t}=\min_{u\in\mathcal U_B}\frac{\partial V(x(t),t)}{\partial x(t)}f(x(t),u(t),t).\tag{3.151}
$$

This equation can be generalized to that of Equation (3.123) by making the simple transformation

$$
x_{n+1}(t)=\int_{t_0}^{t}L(x(\tau),u(\tau),\tau)\,d\tau,\tag{3.152}
$$

where $x_{n+1}(\cdot)$ is an element of a new state vector

$$
\tilde x(\cdot)=\big[x(\cdot)^T,\ x_{n+1}(\cdot)\big]^T\in\mathbb R^{n+1}.\tag{3.153}
$$

The new terminal cost criterion is

$$
\tilde\phi(\tilde x(t_f))=\phi(x(t_f))+x_{n+1}(t_f),\tag{3.154}
$$

where

$$
\dot{\tilde x}(t)=\tilde f(x(t),u(t),t)=\begin{bmatrix}f(x(t),u(t),t)\\L(x(t),u(t),t)\end{bmatrix},\qquad\tilde x(t_0)=\begin{bmatrix}x_0\\0\end{bmatrix}.\tag{3.155}
$$

The H-J-B equation is now

$$
-V_t(\tilde x(t),t)=\min_{u\in\mathcal U_B}V_{\tilde x}(\tilde x(t),t)\tilde f(x(t),u(t),t)=\min_{u\in\mathcal U_B}\big[V_x(\tilde x(t),t)f(x(t),u(t),t)+V_{x_{n+1}}(\tilde x(t),t)L(x(t),u(t),t)\big].\tag{3.156}
$$

Since the dynamics are not an explicit function of $x_{n+1}(\cdot)$, a variation of $x_{n+1}(\cdot)$ will not change the optimal solution but is just a simple translation in the value of the optimal cost and hence of the optimal value function $V(\tilde x(t),t)$. Therefore, $V_{x_{n+1}}(\tilde x(t),t)=1$, and with the notational change $V(x(t),x_{n+1}(t),t)=V(x(t),t)$, we arrive at Equation (3.123).

We now illustrate Theorem 3.5.1 by applying it to the special case of Section 3.4.1. Here

$$
L(x(t),u(t),t)=0,\quad f(x(t),u(t),t)=Ax(t)+Bu(t),\quad\mathcal U_B=\{u:-1\le u_i\le1,\ i=1,\ldots,m\},\quad\phi(x(t_f))=\alpha^Tx(t_f).\tag{3.157}
$$

The H-J-B equation is

$$
-V_t(x(t),t)=\min_{u(t)\in\mathcal U_B}\big[V_x(x(t),t)Ax(t)+V_x(x(t),t)Bu(t)\big],\tag{3.158}
$$

which yields

$$
u_i^o(x(t),t)=-\operatorname{sign}\big[B_i^TV_x^T(x(t),t)\big],\quad i=1,\ldots,m,\tag{3.159}
$$

and

$$
-V_t(x(t),t)=V_x(x(t),t)Ax(t)-\sum_{i=1}^m\big|V_x(x(t),t)B_i\big|,\qquad V(x(t_f),t_f)=\alpha^Tx(t_f).\tag{3.160}
$$

The choice

$$
V(x(t),t)=\alpha^Te^{-A(t-t_f)}x(t)-\int_t^{t_f}\sum_{i=1}^m\big|\alpha^Te^{-A(\tau-t_f)}B_i\big|\,d\tau\tag{3.161}
$$

yields

$$
V_x(x(t),t)=\alpha^Te^{-A(t-t_f)},\qquad V_t(x(t),t)=-\alpha^Te^{-A(t-t_f)}Ax(t)+\sum_{i=1}^m\big|\alpha^Te^{-A(t-t_f)}B_i\big|,\qquad V(x(t_f),t_f)=\alpha^Tx(t_f),\tag{3.162}
$$

where, in the expression for $V_t(x(t),t)$, we use the fact that $A$ and $e^{-A(t-t_f)}$ commute. Substituting the expression for $V_x(x(t),t)$ into the right-hand side of (3.160) yields

$$
\alpha^Te^{-A(t-t_f)}Ax(t)-\sum_{i=1}^m\big|\alpha^Te^{-A(t-t_f)}B_i\big|,\tag{3.163}
$$

which by (3.162) is $-V_t(x(t),t)$, thus verifying that the H-J-B equation is satisfied by (3.161). In view of Theorem 3.5.1 we then conclude that the controls given by (3.115) are minimizing and that the minimum value of the performance criterion $\alpha^Tx(t_f)$ is given by (3.161) evaluated at $t=t_0$. Naturally, in more complicated nonlinear control problems, it is more difficult to generate or guess a $V$ function that satisfies the H-J-B equation. Nevertheless, if such a function can be found, the control problem is then completely (globally) solved. It will not have escaped the reader that there is a striking similarity between $V_x(x(t),t)$ and $\lambda^T(t)$ of Pontryagin's Principle. We will develop this relationship in the next subsection.
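The verification of (3.161) can also be carried out numerically at a sample point. A minimal sketch (our own choices of $A$, $B$, and $\alpha$) evaluates the integral in (3.161) by quadrature and $V_t$ by a central difference:

```python
# A minimal numerical check that (3.161) satisfies the H-J-B equation
# (3.160) at a sample point; the system and alpha are our own choices.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
alpha = np.array([1.0, 0.0]); tf = 2.0

def Vx(t):                       # V_x = alpha^T exp(-A (t - tf)), (3.162)
    return alpha @ expm(-A * (t - tf))

def V(x, t):                     # candidate (3.161), m = 1
    integrand = lambda tau: abs(float(Vx(tau) @ B))
    return float(Vx(t) @ x) - quad(integrand, t, tf,
                                   epsabs=1e-12, epsrel=1e-12)[0]

x, t, h = np.array([1.0, -0.5]), 0.7, 1e-5
Vt = (V(x, t + h) - V(x, t - h)) / (2 * h)           # central difference
rhs = float(Vx(t) @ A @ x) - abs(float(Vx(t) @ B))   # right side of (3.160)
print(-Vt - rhs)   # ~0 -> H-J-B satisfied at the sample point
```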
3.5.1 Derivatives of the Optimal Value Function

In certain classes of problems it is possible to show that when Theorem 3.5.1 holds, the gradient (derivative) with respect to $x(t)$ of the optimal value function $V(x(t),t)$ satisfies the same equation as that for $\lambda^T(t)$, thereby giving a geometrical interpretation to the Lagrange multipliers. Here we illustrate this when $\mathcal U$ is the whole $m$-dimensional space, viz., $\mathbb R^m$. This derivation assumes that the second partial derivative of $V(x(t),t)$ with respect to $x(t)$ exists. Furthermore, there are no control constraints, and (3.123) becomes

$$
-V_t(x(t),t)=\min_{u\in\mathcal U}\big[L(x(t),u(t),t)+V_x(x(t),t)f(x(t),u(t),t)\big],\qquad V(x(t_f),t_f)=\phi(x(t_f)).\tag{3.164}
$$

Since we take time derivatives along the optimal path generated by the optimal control, we replace the arbitrary initial state $x(t)$ in the H-J-B equation with $x^o(t)$. Calling the minimizing control $u^o(x^o(t),t)$, we obtain

$$
-V_t(x^o(t),t)=L(x^o(t),u^o(x^o(t),t),t)+V_x(x^o(t),t)f(x^o(t),u^o(x^o(t),t),t)=H(x^o(t),u^o(x^o(t),t),V_x(x^o(t),t),t).\tag{3.165}
$$

First Derivative of the Optimal Value Function

Assuming that the necessary derivatives exist, we differentiate (3.165) with respect to $x^o(t)$ to obtain

$$
\begin{aligned}
-V_{tx}(x^o(t),t)={}&L_x(x^o(t),u^o(x^o(t),t),t)+V_x(x^o(t),t)f_x(x^o(t),u^o(x^o(t),t),t)\\
&+f^T(x^o(t),u^o(x^o(t),t),t)V_{xx}(x^o(t),t)\\
&+\big[L_u(x^o(t),u^o(x^o(t),t),t)+V_x(x^o(t),t)f_u(x^o(t),u^o(x^o(t),t),t)\big]u_x^o(x^o(t),t).
\end{aligned}\tag{3.166}
$$

Note that

$$
V_{tx}(x^o(t),t)=V_{xt}(x^o(t),t)\tag{3.167}
$$

if the second partial derivatives of $V$ exist, and

$$
\frac{d}{dt}V_x(x^o(t),t)=V_{xt}(x^o(t),t)+f^T(x^o(t),u^o(x^o(t),t),t)V_{xx}(x^o(t),t).\tag{3.168}
$$

Combining (3.166) through (3.168) and noting that, because of the minimization with respect to $u(t)$,

$$
L_u(x^o(t),u^o(x^o(t),t),t)+V_x(x^o(t),t)f_u(x^o(t),u^o(x^o(t),t),t)=0,\tag{3.169}
$$

we obtain

$$
-\dot V_x(x^o(t),t)=L_x(x^o(t),u^o(x^o(t),t),t)+V_x(x^o(t),t)f_x(x^o(t),u^o(x^o(t),t),t).\tag{3.170}
$$

It is also clear from (3.164) that

$$
V_x(x(t_f),t_f)=\phi_x(x(t_f)).\tag{3.171}
$$

Comparing (3.170) and (3.171) to (3.56) and (3.57), we see that $V_x(x^o(t),t)$ satisfies the same differential equation and has the same final value as $\lambda^T(t)$, which can therefore be interpreted as the derivative of the optimal value function with respect to the state. There are, however, control problems where there is no continuously differentiable $V(\cdot,\cdot)$ that satisfies the H-J-B equation, but Pontryagin's $\lambda(\cdot)$ is well defined by its differential equation. In such cases, it is awkward to find a physical interpretation for $\lambda(\cdot)$.
Second Derivative of the Optimal Value Function

The second derivative of the optimal value function is now derived; it is, in general, the curvature of the optimal value function with respect to the state. Certain assumptions must first be made for simplicity of the derivation.

Assumption 3.5.1 $u^o(x(\cdot),\cdot)\in\operatorname{Interior}\mathcal U$. If the optimal control function does not lie on a control boundary, then there are no restrictions in assuming an arbitrarily small control variation away from the optimal control. (If $u$ is on a bound, then $u(x(t),t)=u(t)\Rightarrow u_x(x(t),t)=0$. Using this, the following results would have to be modified.)

Assumption 3.5.2

$$
H(x^o(t),u^o(x^o(t),t),V_x(x^o(t),t),t)<H(x^o(t),u(t),V_x(x^o(t),t),t)\tag{3.172}
$$

implies

$$
H_u(x^o(t),u^o(x^o(t),t),V_x(x^o(t),t),t)=0,\qquad H_{uu}(x^o(t),u^o(x^o(t),t),V_x(x^o(t),t),t)>0.\tag{3.173}
$$

Then from the H-J-B equation (3.165), suppressing arguments (every derivative of $H$ below is evaluated at $(x^o(t),u^o(x^o(t),t),V_x(x^o(t),t),t)$ and is taken with respect to each explicit argument of $H$),

$$
\begin{aligned}
-V_{txx}={}&H_{xx}+H_{xu}u_x^o+H_{xV_x}V_{xx}+V_{xx}H_{V_xx}+V_{xx}H_{V_xu}u_x^o\\
&+\sum_{i=1}^nH_{V_{x_i}}V_{x_ixx}+u_x^{oT}H_{ux}+u_x^{oT}H_{uV_x}V_{xx}+u_x^{oT}H_{uu}u_x^o.
\end{aligned}\tag{3.174}
$$

It is assumed here that third derivatives exist. Furthermore, the function $H$ is used to avoid tensor products, i.e., taking second partials of a dynamic vector function $f(x(t),u^o(x(t),t),t)$ with respect to the state vector. The boundary condition for (3.174) is

$$
V_{xx}(x(t_f),t_f)=\phi_{xx}(x(t_f)).\tag{3.175}
$$

Since a differential equation for $V_{xx}$ along the optimal path is being sought, $V_{xx}$ is directly differentiated as

$$
\frac{dV_{xx}(x^o(t),t)}{dt}=\sum_{i=1}^nH_{V_{x_i}}V_{x_ixx}(x^o(t),t)+V_{xxt}(x^o(t),t)=\sum_{i=1}^nf_i(x^o(t),u^o(x^o(t),t),t)V_{x_ixx}(x^o(t),t)+V_{xxt}(x^o(t),t),\tag{3.176}
$$

where the sum is used to construct the tensor product. Substitution of Equation (3.174) into (3.176) gives the differential equation

$$
-\frac{dV_{xx}(x^o(t),t)}{dt}=H_{xx}+H_{xu}u_x^o+H_{xV_x}V_{xx}+V_{xx}H_{V_xx}+V_{xx}H_{V_xu}u_x^o+u_x^{oT}H_{ux}+u_x^{oT}H_{uV_x}V_{xx}+u_x^{oT}H_{uu}u_x^o.\tag{3.177}
$$

Finally, an expression for $u_x^o(x^o(t),t)$ is required. From Assumptions 3.5.1 and 3.5.2, it follows that

$$
H_u(x^o(t),u^o(x^o(t),t),V_x(x^o(t),t),t)=0\tag{3.178}
$$

holds for all $x^o(t)$, and then

$$
H_{ux}+H_{uV_x}V_{xx}+H_{uu}u_x^o=0.\tag{3.179}
$$

This, given Assumption 3.5.2, produces

$$
u_x^o(x^o(t),t)=-H_{uu}^{-1}\big(H_{ux}+H_{uV_x}V_{xx}\big).\tag{3.180}
$$

Substitution of Equation (3.180) into (3.177) gives, after some manipulations and with the arguments removed,

$$
-\frac{dV_{xx}}{dt}=\big(f_x^T-H_{xu}H_{uu}^{-1}f_u^T\big)V_{xx}+V_{xx}\big(f_x-f_uH_{uu}^{-1}H_{ux}\big)+\big(H_{xx}-H_{xu}H_{uu}^{-1}H_{ux}\big)-V_{xx}f_uH_{uu}^{-1}f_u^TV_{xx},\tag{3.181}
$$

where $H_{V_xx}=f_x$ and $H_{V_xu}=f_u$. Note that expanding $H_{xx}$, $H_{xu}$, and $H_{uu}$ produces tensor products. The curvature $V_{xx}(x^o(t),t)$ of $V(x^o(t),t)$ along an optimal path is thus propagated by a Riccati differential equation, and the existence of its solution will play an important role in the development of local sufficiency for a weak minimum. Note that the problem of minimizing a quadratic cost criterion subject to a linear dynamic constraint as given in Example 3.5.1 produces a Riccati differential equation in the symmetric matrix $S(t)$. More results will be given in Chapter 5.
3.5.2 Derivation of the H-J-B Equation

In Theorem 3.5.1 we showed that if there is a solution to the H-J-B equation which satisfies certain conditions, then this solution evaluated at $t=t_0$ is the optimal value of (3.2). Here, we prove the converse, viz., that under certain assumptions on the optimal value function, the H-J-B equation can be deduced. Let us define

$$
V(x(t),t)=\min_{u(\cdot)\in\mathcal U_B}\Big[\phi(x(t_f))+\int_t^{t_f}L(x(\tau),u(\tau),\tau)\,d\tau\Big],\tag{3.182}
$$

where $u(\tau)\in\mathcal U_B$ for all $\tau$ in $[t,t_f]$, and let $u^o(\tau;x(t),t)$, $t\le\tau\le t_f$, be the optimal control function for the dynamic system (3.41) with "initial condition" $x(t)$ at $\tau=t$. Then

$$
V(x(t),t)=\phi(x(t_f))+\int_t^{t_f}L(x(\tau),u^o(\tau;x(t),t),\tau)\,d\tau,\tag{3.183}
$$

so that

$$
\frac{d}{dt}V(x(t),t)=-L(x(t),u^o(t;x(t),t),t)\tag{3.184}
$$

and

$$
V(x(t_f),t_f)=\phi(x(t_f)).\tag{3.185}
$$

We can now proceed to the theorem.

Theorem 3.5.2 Suppose that the optimal value function $V(\cdot,\cdot)$ defined by (3.184) and (3.185) is once continuously differentiable in $x(t)$ and $t$. Then $V(\cdot,\cdot)$ satisfies the H-J-B equation (3.123) and (3.124).

Proof: From the existence of $V(\cdot,\cdot)$, (3.184), and the differentiability of $V(\cdot,\cdot)$ we have that

$$
-V_t(x(t),t)=V_x(x(t),t)f(x(t),u^o(t;x(t),t),t)+L(x(t),u^o(t;x(t),t),t).\tag{3.186}
$$

Furthermore, it follows from (3.182) that

$$
V(x(t),t)\le\int_t^{t+\Delta}L(x(\tau),u(\tau),\tau)\,d\tau+\int_{t+\Delta}^{t_f}L(x(\tau),u^o(\tau;x(t+\Delta),t+\Delta),\tau)\,d\tau+\phi(x(t_f)),\tag{3.187}
$$

where $u(\tau)\in\mathcal U_B$, $t\le\tau\le t+\Delta$, is an arbitrary continuous function for some $\Delta$ with $0<\Delta<t_f-t$. This yields

$$
V(x(t),t)\le\int_t^{t+\Delta}L(x(\tau),u(\tau),\tau)\,d\tau+V(x(t+\Delta),t+\Delta),\tag{3.188}
$$

and expanding the right-hand side in $\Delta$ yields

$$
V(x(t),t)\le L(x(t),u(t),t)\Delta+V(x(t),t)+V_t(x(t),t)\Delta+V_x(x(t),t)f(x(t),u(t),t)\Delta+O(\Delta),\tag{3.189}
$$

which in turn yields

$$
0\le\big[L(x(t),u(t),t)+V_t(x(t),t)+V_x(x(t),t)f(x(t),u(t),t)\big]\Delta+O(\Delta).\tag{3.190}
$$

This inequality holds for all continuous $u(\cdot)$ on $[t,t+\Delta]$ and all $\Delta\ge0$. Hence, for all $u(t)\in\mathcal U_B$,

$$
0\le L(x(t),u(t),t)+V_t(x(t),t)+V_x(x(t),t)f(x(t),u(t),t).\tag{3.191}
$$

Considering (3.186) and (3.191) together, we conclude that

$$
\min_{u(t)\in\mathcal U_B}\big[V_t(x(t),t)+L(x(t),u(t),t)+V_x(x(t),t)f(x(t),u(t),t)\big]=0.\tag{3.192}
$$

Since $V_t(x(t),t)$ does not depend upon $u(t)$, we can rewrite (3.192) as

$$
-V_t(x(t),t)=\min_{u(t)\in\mathcal U_B}\big[L(x(t),u(t),t)+V_x(x(t),t)f(x(t),u(t),t)\big].\tag{3.193}
$$

This, together with (3.185), is just the H-J-B equation (3.123) and (3.124).

Note that the above theorem and proof are based squarely on the assumption that $V(\cdot,\cdot)$ is once continuously differentiable in both arguments. It turns out that there are problems where this smoothness is not present, so that this assumption is violated. In any event, it is a nasty assumption, as one has to solve the optimal control problem to obtain $V(\cdot,\cdot)$ before one can verify it. Consequently, the above theorem is largely of theoretical value (it increases one's insight into the H-J-B equation). The main strength of the H-J-B equation lies in Theorem 3.5.1, which provides sufficient conditions for optimality. In many derivations of the H-J-B equation it is assumed that the second partial derivatives of $V(x(t),t)$ exist. This is done because it is known that in certain classes of optimal control problems optimality is lost (i.e., $J$ can be made arbitrarily large and negative) if $V_{xx}(x(t),t)$ ceases to exist at some time. However, this is a red herring in the derivation of the H-J-B equation; the expansion in (3.189) is clearly valid if one merely assumes that $V(x(t),t)$ exists and is once continuously differentiable.
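The definition (3.182), read backward in time, suggests the numerical procedure mentioned in the footnote above: discretize time, state, and control, and march the minimization backward from the terminal condition. A minimal dynamic-programming grid sketch (our own test problem, $\min\int_0^T(x^2+u^2)\,dt$ with $\dot x=u$, cf. Problem 6 below, whose value function works out to $V(x,t)=\tanh(T-t)\,x^2$) is:

```python
# A minimal backward value-iteration sketch for the H-J-B definition
# (3.182); problem, grid sizes, and bounds are our own choices.
import numpy as np

T, nt, nx = 1.0, 100, 201
dt = T / nt
xs = np.linspace(-2.0, 2.0, nx)
us = np.linspace(-3.0, 3.0, 121)
V = np.zeros(nx)                      # V(x, T) = 0 (no terminal cost)

for _ in range(nt):                   # march backward in time
    Vnew = np.empty(nx)
    for i, x in enumerate(xs):
        xn = np.clip(x + us * dt, xs[0], xs[-1])   # Euler step for each u
        cost = (x**2 + us**2) * dt + np.interp(xn, xs, V)
        Vnew[i] = cost.min()          # minimize over the control grid
    V = Vnew

x0 = 1.0
print(np.interp(x0, xs, V), np.tanh(T) * x0**2)  # DP value vs analytic
```

The grid value should approach the analytic value (about 0.762 here) as the grid is refined; the discrepancy is the discretization error of the Euler step and the interpolation.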
3.6 Unspecified Final Time t_f

So far in this chapter we have assumed that the final time $t_f$ is given. One could, however, treat $t_f$ as a parameter and, along with $u(\cdot)$, choose it to minimize (3.2). Clearly, a necessary condition for the optimality of $t_f=t_f^o$ is that

$$
J_{t_f}(u^o(\cdot);x_0,t_f^o)=0,\tag{3.194}
$$

where

$$
J(u(\cdot);x_0,t_f)=\phi(x(t_f),t_f)+\int_{t_0}^{t_f}L(x(t),u(t),t)\,dt,\tag{3.195}
$$

provided, of course, that $J$ is differentiable at $t_f^o$. Equation (3.194) is known as a transversality condition. Note that in (3.195) we allow a more general $\phi(\cdot,\cdot)$ than previously, viz., one that depends on both $x(t_f)$ and $t_f$. Naturally, in the case where $t_f$ is given, nothing is gained by this explicit dependence of $\phi$ on $t_f$.

Suppose that $u^o(\cdot)$ minimizes (3.195) when $t_f=t_f^o$. Then

$$
J(u^o(\cdot);x_0,t_f^o)=\phi(x^o(t_f^o),t_f^o)+\int_{t_0}^{t_f^o}L(x^o(t),u^o(t),t)\,dt\tag{3.196}
$$

and

$$
J(u^o(\cdot);x_0,t_f^o+\Delta)=\phi(x^o(t_f^o+\Delta),t_f^o+\Delta)+\int_{t_0}^{t_f^o+\Delta}L(x^o(t),u^o(t),t)\,dt,\tag{3.197}
$$

where, if $\Delta>0$, $u^o(\cdot)$ over the interval $(t_f^o,t_f^o+\Delta)$ is any continuous function emanating from $u^o(t_f^o)$ with values in $\mathcal U$. Subtracting (3.196) from (3.197) and expanding the right-hand side of (3.197) yields

$$
J(u^o(\cdot);x_0,t_f^o+\Delta)-J(u^o(\cdot);x_0,t_f^o)=L(x^o(t_f^o),u^o(t_f^o),t_f^o)\Delta+\phi_{t_f}(x^o(t_f^o),t_f^o)\Delta+\phi_x(x^o(t_f^o),t_f^o)f(x^o(t_f^o),u^o(t_f^o),t_f^o)\Delta+O(\Delta).\tag{3.198}
$$

It follows that for $t_f^o$ to be optimal,

$$
J_{t_f}(u^o(\cdot);x_0,t_f^o)=L(x^o(t_f^o),u^o(t_f^o),t_f^o)+\phi_x(x^o(t_f^o),t_f^o)f(x^o(t_f^o),u^o(t_f^o),t_f^o)+\phi_{t_f}(x^o(t_f^o),t_f^o)=0.\tag{3.199}
$$

This condition (the so-called transversality condition) can be written using the Hamiltonian $H$ as

$$
\phi_{t_f}(x^o(t_f^o),t_f^o)+H(x^o(t_f^o),u^o(t_f^o),\lambda(t_f^o),t_f^o)=0.\tag{3.200}
$$

Since, by (3.109), $H$ is minimized by $u^o(t_f^o)$, and since $x^o(\cdot)$, $\lambda(\cdot)$, $L(\cdot,\cdot,\cdot)$, and $f(\cdot,\cdot,\cdot)$ are continuous functions of time, it is clear that $u^o(t_f^o)$ and $u^o(t_f^{o-})$ both yield the same value of $H$. It should be noted that jumps are allowed in the control, where identical values of the minimum Hamiltonian $H$ will occur at different values of $u$ at $t_f$. Consequently, (3.200) holds with $u^o(t_f^o)$ replaced by $u^o(t_f^{o-})$, and because of this it is not necessary to assume for $\Delta<0$ that $u^o(\cdot)$ is continuous from the left at $t_f^o$. We have thus proved the following theorem.

Theorem 3.6.1 Suppose that $\phi(\cdot,\cdot)$ depends explicitly on $t_f$ as in (3.195) and that $t_f$ is unspecified. Then, if the pair $u^o(\cdot)$, $t_f^o$ minimizes (3.195), the condition

$$
\phi_{t_f}(x^o(t_f^o),t_f^o)+H(x^o(t_f^o),u^o(t_f^o),\lambda(t_f^o),t_f^o)=0\tag{3.201}
$$

holds in addition to (3.109).

Remark 3.6.1 It should be noted that free final time problems can always be solved by augmenting the state vector with the time, defining a new independent variable, and solving the resulting problem as a fixed final "time" problem. This is done by using the transformation

$$
t=(t_f-t_0)\tau+t_0\tag{3.202}
$$

$$
\Rightarrow\quad dt=(t_f-t_0)\,d\tau,\tag{3.203}
$$

where $\tau$ goes from 0 to 1. The differential constraints become

$$
\begin{bmatrix}x'\\t'\end{bmatrix}=\begin{bmatrix}(t_f-t_0)f(\tau,x,u,t_f)\\t_f-t_0\end{bmatrix},\tag{3.204}
$$

where $x'=dx/d\tau$. We will not use this approach here.
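A minimal sketch of the recasting in Remark 3.6.1 (our own generic illustration): with $t=(t_f-t_0)\tau+t_0$, the problem is integrated on the fixed interval $\tau\in[0,1]$, and $t_f$ enters the right-hand side (3.204) as an ordinary parameter.

```python
# A minimal sketch of the fixed-"time" recasting (3.202)-(3.204);
# the dynamics f and all numbers are our own illustrative choices.
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x, u):                     # original dynamics xdot = f
    return np.array([u])            # e.g. the scalar system (3.205), b = 1

def transformed(tau, y, u, t0, tf):
    x, t = y[:-1], y[-1]            # state augmented with physical time
    dx = (tf - t0) * f(t, x, u)     # chain rule: dx/dtau = (tf - t0) f
    return np.concatenate([dx, [tf - t0]])   # dt/dtau = tf - t0

t0, tf, u = 0.0, 2.0, -0.5          # tf is now just a parameter
sol = solve_ivp(transformed, [0.0, 1.0], np.array([1.0, t0]),
                args=(u, t0, tf))
print(sol.y[:, -1])                 # x(tf) and t = tf recovered at tau = 1
```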
Example 3.6.1 The simplest example to illustrate (3.201) is the scalar system ($n=m=1$)

$$
\dot x(t)=bu(t),\quad b\ne0,\quad x(t_0)=x_0,\tag{3.205}
$$

with performance criterion

$$
J(u(\cdot);x_0,t_f)=\alpha x(t_f)+\tfrac12t_f^2+\int_{t_0}^{t_f}\tfrac12u^2(t)\,dt.\tag{3.206}
$$

Here

$$
H(x(t),u(t),\lambda(t),t)=\tfrac12u^2(t)+\lambda(t)bu(t),\tag{3.207}
$$

so that

$$
\lambda(t)=\alpha\tag{3.208}
$$

and

$$
u^o(t)=-b\alpha.\tag{3.209}
$$

From (3.201), the optimal final time is determined from

$$
t_f^o-\tfrac12b^2\alpha^2=0\quad\Rightarrow\quad t_f^o=\tfrac12b^2\alpha^2.\tag{3.210}
$$

The optimal cost criterion is

$$
J(u^o(\cdot);x_0,t_f)=\alpha x_0-\tfrac12\alpha^2b^2(t_f-t_0)+\tfrac12t_f^2\tag{3.211}
$$

evaluated at $t_f^o$ given in (3.210). Its derivative with respect to $t_f$ is zero at $t_f^o$, thus verifying (3.199).

Problems

1. Extremize the performance index
$$J=\tfrac12\int_0^1e^{(u-x)^2}\,dt$$
subject to $\dot x=u$, $x(0)=x_0=1$.
(a) Find the state, control, and multipliers as functions of time.
(b) Show that the Hamiltonian is constant along the motion.

2. Minimize the performance index
$$J=\int_0^2\big(|u|-x\big)\,dt$$
subject to $\dot x=u$, $x(0)=1$, $|u|\le1$.
(a) Determine the optimal state and control histories.
(b) Discuss the derivation of the first-order necessary conditions and their underlying assumption. What modifications in the theory are required for this problem? Are there any difficulties making the extension? Explain.

3. Consider the following control problem with
$$\dot x=A(t)x(t)+B(t)u(t),\quad x(t_0)=x_0,\qquad\min_{u(t)}J=\int_{t_0}^{t_f}\Big(a^Tx+\tfrac12u^TRu\Big)\,dt,\quad R(t)>0.$$
Prove that for this problem, satisfaction of Pontryagin's Minimum Principle by a control function $u^*$ is a necessary and sufficient condition for $u^*$ to be the control that minimizes $J$.

4. Consider the problem of minimizing with respect to the control $u(\cdot)$ the cost criterion
$$J=\int_{t_0}^{t_f}\tfrac12u^TRu\,dt+\phi(x(t_f)),\quad R>0,$$
subject to $\dot x=Ax+Bu$, $x(t_0)=x_0$ (given), where $\phi(x(t_f))$ is a twice continuously differentiable function of $x$. Suppose that we require $u(t)$ to be piecewise constant as follows:
$$u(t)=\begin{cases}u_0,&t_0\le t\le t_1,\\u_1,&t_1\le t\le t_f,\end{cases}$$
where $t_1$ is specified and $t_0<t_1<t_f$. Derive from first principles the first-order necessary conditions that must be satisfied by an optimal control function
$$u^*(t)=\begin{cases}u_0^*,&t_0\le t\le t_1,\\u_1^*,&t_1\le t\le t_f.\end{cases}$$
State explicitly all assumptions made.

5. Consider the problem of minimizing with respect to $u(\cdot)$ the cost criterion
$$J=\lim_{t_f\to\infty}\int_0^{t_f}\Big[\frac{qx^4}{4}+\frac{u^2}{2}\Big]\,dt$$
subject to $\dot x=u$. Find the optimal feedback law. Hint: If you use the H-J-B equation to solve this problem, assume that the optimal value function is only an explicit function of $x$. Check to ensure that all conditions are satisfied for H-J-B theory.

6. Minimize the functional
$$\min_{u(\cdot)\in\mathcal U}J=\int_0^{t_f}(x^2+u^2)\,dt$$
subject to $\dot x=u$, $x(0)=x_0$.
(a) Write down the first-order necessary conditions.
(b) Solve the two-point boundary-value problem so that $u$ is obtained as a function of $t$ and $x$.
(c) Show that the Hamiltonian is constant along the extremal path.

7. Consider the problem of the Euler–Lagrange equation
$$\min_{u(\cdot)\in\mathcal U}\int_0^{t_f}L(x,u)\,dt,$$
where $u(t)\in\mathbb R^m$ and $x(t)\in\mathbb R^n$, subject to $\dot x=u$, $x(0)=x_0$.
(a) Show that the first-order necessary conditions reduce to
$$\frac{d}{dt}L_u-L_x=\frac{d}{dt}L_{\dot x}-L_x=0.$$
(b) Show that the Hamiltonian is constant along the motion.

8. Let $T=\tfrac12m\dot x^2$, $U=\tfrac12kx^2$, and $L=T-U$. Use the Euler–Lagrange equations given in Problem 7 to produce the equation of motion.

9. Extremize the performance index
$$J=\tfrac12\Big[cx_f^2+\int_{t_0}^{t_f}u^2\,dt\Big],$$
where $c$ is a positive constant. The dynamic system is defined by $\dot x=u$, and the prescribed boundary conditions are $t_0$, $x_0$, $t_f\equiv$ given.
10. Extremize the performance index
$$J=\tfrac12\int_{t_0}^{t_f}(u^2-x)\,dt$$
subject to the differential constraint $\dot x=0$ and the prescribed boundary conditions $t_0=0$, $x_0=0$, $t_f=1$.

11. Extremize the performance index
$$J=\tfrac12\int_{t_0}^{t_f}\frac{u^2+1}{x^{1/2}}\,dt$$
subject to the differential constraint $\dot x=u$ and the prescribed boundary conditions $t_0=0$, $x_0>0$, $t_f=$ given $>0$.

12. Minimize the performance index
$$J=bx(t_f)+\tfrac12\int_{t_0}^{t_f}\Big[-\big[u(t)-a(t)\big]^2+u^4\Big]\,dt$$
subject to the differential constraint $\dot x=u$ and the prescribed boundary conditions $t_0=0$, $x_0=1$, $a(t)=t$, $t_f=1$. Find a value for $b$ so that the control jumps somewhere in the interval $[0,1]$.

13. Minimize analytically and numerically the performance index
$$J=\alpha_1x_1(t_f)^2+\alpha_2x_2(t_f)^2+\int_{t_0}^{t_f}|u|\,dt$$
subject to $\ddot x=u$, $|u|\le1$, and the prescribed boundary conditions $t_0=0$, $x_1(0)=10$, $x_2(0)=0$, $t_f=10$. Since $L=|u|$ is not continuously differentiable, discuss the theoretical consequences.

14. Minimize the performance index
$$J=\int_0^6\big(|u|-x_1\big)\,dt-2x_1(6)$$
subject to $\dot x_1=x_2$, $x_1(0)=0$; $\dot x_2=u$, $x_2(0)=0$; $|u|\le1$. Determine the optimal state and control histories.

15. Consider the problem of minimizing with respect to the control $u(\cdot)$ the cost criterion
$$J=\phi(x(t_f))+\int_{t_0}^{t_f}L(x,u,t)\,dt$$
subject to $\dot x=f(x,u,t)$, $x(t_0)=x_0$ given, where $\phi$ is continuously differentiable in $x$, and $L$ and $f$ are continuously differentiable in $x$, $u$, and $t$. Suppose that we require $u(t)$ to be piecewise constant as follows:
$$u(t)=\begin{cases}u_0,&t_0\le t\le t_1,\\u_1,&t_1\le t\le t_f,\end{cases}$$
where $t_1$ is specified and $t_0<t_1<t_f$. Derive from first principles the first-order necessary conditions that must be satisfied by an optimal control function, i.e.,
$$u^o(t)=\begin{cases}u_0^o,&t_0\le t\le t_1,\\u_1^o,&t_1\le t\le t_f.\end{cases}$$

16. Consider the problem of minimizing $\phi(x(t_f))=\alpha^Tx(t_f)$, $t_f$ fixed, subject to $\dot x=Ax+Bu$ for $u\in\mathcal U=\{u:-1\le u_i\le1,\ i=1,\ldots,m\}$.
(a) Determine the optimal value function $V(x,t)$.
(b) Write the change in cost from optimal using Hilbert's integral and explain how this change in cost can be computed.

17. Consider the problem of minimizing with respect to $u(\cdot)\in\mathcal U$ the cost criterion
$$J=F(x(t_0))+\phi(x(t_f))$$
subject to $\dot x=Ax+Bu$, $\mathcal U=\{u(\cdot):u(\cdot)\text{ piecewise continuous}\}$, where $t_0$ is the initial time and $t_0<t_f$, with both fixed. Develop and give the conditions for weak and strong local optimality. List the necessary assumptions.

18. Minimize the performance index
$$J=\int_0^4\big(|u|-x\big)\,dt+2x(4)$$
subject to $\dot x=u$, $x(0)=0$, $|u|\le1$. Determine the optimal state and control histories.

19. Derive the H-J-B equation for the optimization problem of minimizing, with respect to $u(\cdot)\in\mathcal U$, the cost criterion
$$J=e^{\,\int_{t_0}^{t_f}L\,dt+\phi(x(t_f))}$$
subject to $\dot x=f(x,u,t)$.

20. Find the control $u(\cdot)\in\mathcal U_B=\{u:|u|\le1\}$ that minimizes
$$J=\big(x_1(50)-10\big)^2q_1+x_2^2(50)\,q_2+\int_0^{50}|u|\,dt$$
subject to
$$\begin{bmatrix}\dot x_1\\\dot x_2\end{bmatrix}=\begin{bmatrix}0&1\\0&0\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}+\begin{bmatrix}0\\1\end{bmatrix}u,\qquad\begin{bmatrix}x_1(0)\\x_2(0)\end{bmatrix}=\begin{bmatrix}0\\0\end{bmatrix}.$$
Choose $q_1$ and $q_2$ large enough to approximately satisfy the terminal constraints $x_1(50)=10$ and $x_2(50)=0$.
Chapter 4

Terminal Equality Constraints

4.1 Introduction

Theorem 3.3.1, a weak form of the Pontryagin Principle, is a condition that is satisfied if $x^o(\cdot)$, $u^o(\cdot)$ are an optimal pair (i.e., it is a necessary condition for optimality). It is also clear from (3.55) that satisfaction of this condition is sufficient for the change in the performance criterion to be nonnegative, to first order, for any weak perturbation of the control away from $u^o(\cdot)$.

In this chapter we first derive a weak form of the Pontryagin Principle as a necessary condition for optimality when linear terminal equality constraints are present and the dynamic system is linear. When nonlinear terminal equality constraints are present and the dynamic system is nonlinear, the elementary constructions used in this book turn out to be inadequate to deduce necessary conditions; indeed, deeper mathematics is required here for rigorous derivations. If deriving necessary conditions in these more involved control problem formulations is out of our reach, we can, nevertheless, rather easily and rigorously show that Pontryagin's Principle is a sufficient condition for "weak or strong first-order optimality." This confirms that if Pontryagin's Principle is satisfied, the change in $J$ can be negative only as a consequence of second- and/or higher-order terms in its expansion. Thus Pontryagin's Principle can be thought of as a first-order optimality condition.

By way of introducing optimal control problems with terminal equality constraints, we begin in Section 4.2.1 with the problem of steering a linear dynamic system from an initial point to the origin in specified time while minimizing the control "energy" consumed. It turns out that the most elementary, direct methods are adequate to handle this problem. We then derive a Pontryagin-type necessary condition for optimality for the case of a general performance criterion and both linear and nonlinear terminal equality constraints. Turning to nonlinear terminal equality constraints and nonlinear dynamic systems, we introduce the notion of weak first-order optimality and state a weak form of the Pontryagin Principle. In line with our above remarks, we prove that the conditions of this principle are sufficient for weak first-order optimality and refer to rigorous proofs in the literature that the principle is a necessary condition for optimality; we comment here also on the notion of normality. We then remark that the two-point boundary-value problem that arises when terminal constraints are present is more involved than that in the unconstrained case; we briefly introduce a penalty function approach which circumvents this. Our results are then extended by allowing control constraints and by introducing strong first-order optimality. In particular, we show that Pontryagin's Principle in strong form is a sufficient condition for strong first-order optimality; again we refer to the literature for a proof that the principle is a necessary condition for optimality. Our next theorem allows the terminal time $t_f$ to be unspecified. Finally, we obtain a sufficient condition for global optimality via a generalized Hamilton–Jacobi–Bellman equation. Throughout, examples are presented which illustrate and clarify the use of the theorems in control problems with terminal constraints.
4.2 Linear Dynamic System with General Performance Criterion and Terminal Equality Constraints

We derive a Pontryagin-type necessary condition for optimality for the case of a general performance criterion and both linear (Section 4.2.1) and nonlinear (Section 4.2.3) terminal equality constraints, but with linear system dynamics. These conditions, using elementary constructions, are adequate to deduce first-order necessary conditions for optimality.

4.2.1 Linear Dynamic System with Linear Terminal Equality Constraints

We return to the problem formulated in Section 3.2, where (3.1) is to be controlled to minimize the performance criterion (3.2). Now, however, we impose the restriction that, at the given final time $t_f$, the linear equality constraint

$$
Dx(t_f)=0\tag{4.1}
$$

be satisfied. Here, $D$ is a $p\times n$ constant matrix. An important special case of the above formulation occurs when

$$
L(x(t),u(t),t)=u^T(t)u(t),\qquad\phi(x(t_f))\equiv0,\tag{4.2}
$$

and $D$ is the $n\times n$ identity matrix. Then the problem is one of steering the initial state of (3.1) from $x_0$ to the origin of the state space at time $t=t_f$ while minimizing the "energy"

$$
J(u(\cdot);x_0)=\int_{t_0}^{t_f}u^T(t)u(t)\,dt.\tag{4.3}
$$

Actually, it is easy to solve this particular problem directly, without appealing to the Pontryagin Principle derived later in this section. We need the following assumptions.

Assumption 4.2.1 The control function $u(\cdot)$ is drawn from the set of piecewise continuous $m$-vector functions of $t$ on the interval $[t_0,t_f]$ that meet the terminal constraints, $u(\cdot)\in\mathcal U_T$.

Assumption 4.2.2 The linear dynamic system (3.1) is controllable (see [8]) from $t=t_0$ to $t=t_f$, viz.,

$$
W(t_0,t_f)=\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)B^T(\tau)\Phi^T(t_f,\tau)\,d\tau\tag{4.4}
$$

is positive definite, where

$$
\frac{d}{dt}\Phi(t,\tau)=A(t)\Phi(t,\tau),\qquad\Phi(\tau,\tau)=I.\tag{4.5}
$$

With this assumption we can set

$$
u^o(t)=-B^T(t)\Phi^T(t_f,t)W^{-1}(t_0,t_f)\Phi(t_f,t_0)x_0,\tag{4.6}
$$

and, substituting this into (3.3), the solution of (3.1), we obtain

$$
x^o(t_f)=\Phi(t_f,t_0)x_0-\Big[\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)B^T(\tau)\Phi^T(t_f,\tau)\,d\tau\Big]W^{-1}(t_0,t_f)\Phi(t_f,t_0)x_0=\Phi(t_f,t_0)x_0-\Phi(t_f,t_0)x_0=0,\tag{4.7}
$$

so that the control function $u^o(\cdot)$ steers $x_0$ to the origin of the state space at time $t=t_f$.

Now suppose that $u(\cdot)$ is any other control function that steers $x_0$ to the origin at time $t_f$. Then, using (3.3), we have

$$
0=\Phi(t_f,t_0)x_0+\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)u(\tau)\,d\tau\tag{4.8}
$$

and

$$
0=\Phi(t_f,t_0)x_0+\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)u^o(\tau)\,d\tau.\tag{4.9}
$$

Subtracting these two equations yields

$$
\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)\big[u(\tau)-u^o(\tau)\big]\,d\tau=0,\tag{4.10}
$$

and premultiplying by $2x_0^T\Phi^T(t_f,t_0)W^{-1}(t_0,t_f)$ and using (4.6) yields

$$
2\int_{t_0}^{t_f}u^{oT}(\tau)\big[u(\tau)-u^o(\tau)\big]\,d\tau=0.\tag{4.11}
$$

Subtracting the optimal control energy from the energy of any comparison control $u(\cdot)\in\mathcal U_T$ and using (4.11), we have

$$
\begin{aligned}
\Delta J&=\int_{t_0}^{t_f}u^T(t)u(t)\,dt-\int_{t_0}^{t_f}u^{oT}(t)u^o(t)\,dt\\
&=\int_{t_0}^{t_f}u^T(t)u(t)\,dt-\int_{t_0}^{t_f}u^{oT}(t)u^o(t)\,dt-2\int_{t_0}^{t_f}u^{oT}(\tau)\big[u(\tau)-u^o(\tau)\big]\,d\tau\\
&=\int_{t_0}^{t_f}\big[u(t)-u^o(t)\big]^T\big[u(t)-u^o(t)\big]\,dt\ \ge\ 0,
\end{aligned}\tag{4.12}
$$

which establishes that $u^o(\cdot)$ is minimizing, since for any $u\ne u^o$ the cost increases.
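For a time-invariant pair $(A,B)$, where $\Phi(t_f,\tau)=e^{A(t_f-\tau)}$, the Gramian (4.4) and the steering law (4.6) can be computed directly. A minimal sketch (our own system, horizon, and $x_0$) verifies that (4.6) drives the state to the origin:

```python
# A minimal sketch of the minimum-energy steering law (4.6) for a
# time-invariant (A, B); system, horizon, and x0 are our own choices.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, quad_vec

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
t0, tf = 0.0, 1.0
x0 = np.array([1.0, 0.0])

def Phi(t, tau):                     # state transition matrix, cf. (4.5)
    return expm(A * (t - tau))

# controllability Gramian (4.4), by vector-valued quadrature
W, _ = quad_vec(lambda tau: Phi(tf, tau) @ B @ B.T @ Phi(tf, tau).T, t0, tf)
eta = np.linalg.solve(W, Phi(tf, t0) @ x0)

def u_opt(t):                        # the steering control (4.6)
    return -B.T @ Phi(tf, t).T @ eta

sol = solve_ivp(lambda t, x: A @ x + (B @ u_opt(t)).ravel(),
                [t0, tf], x0, rtol=1e-10, atol=1e-12)
print(sol.y[:, -1])                  # ~ [0, 0]: x0 steered to the origin
```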
While the above treatment is quite adequate when (4.2) holds, it does not extend readily to the general case of (3.2) and (4.1). We therefore use the approach of Section 3.2.3, adjoining (3.1) by means of a continuously differentiable $n$-vector function of time $\lambda(\cdot)$ and (4.1) by means of a $p$-vector $\nu$, as follows:

$$
\hat J(u(\cdot);\lambda(\cdot),\nu,x_0)=J(u(\cdot);x_0)+\int_{t_0}^{t_f}\lambda^T(t)\big[A(t)x(t)+B(t)u(t)-\dot x(t)\big]\,dt+\nu^TDx(t_f).\tag{4.13}
$$

Note that

$$
\hat J(u(\cdot);\lambda(\cdot),\nu,x_0)=J(u(\cdot);x_0)\tag{4.14}
$$

when (3.1) and (4.1) hold. Integrating by parts we obtain

$$
\begin{aligned}
\hat J(u(\cdot);\lambda(\cdot),\nu,x_0)={}&J(u(\cdot);x_0)+\int_{t_0}^{t_f}\big[\dot\lambda^T(t)x(t)+\lambda^T(t)A(t)x(t)+\lambda^T(t)B(t)u(t)\big]\,dt\\
&+\lambda^T(t_0)x_0-\lambda^T(t_f)x(t_f)+\nu^TDx(t_f).
\end{aligned}\tag{4.15}
$$

Let us suppose that there is a piecewise continuous control function $u^o(t)$ that minimizes (3.2) and causes

$$
Dx^o(t_f)=0.\tag{4.16}
$$

Next, evaluate the change in $\hat J$ brought about by changing $u^o(\cdot)$ to $u^o(\cdot)+\varepsilon\eta(\cdot)$. Note that the change in $\hat J$ is equal to the change in $J$ if the perturbation $\varepsilon\eta(\cdot)$ is such that the perturbation in the trajectory at time $t=t_f$ satisfies

$$
D\big[x^o(t_f)+\xi(t_f,\varepsilon)\big]=0,\tag{4.17}
$$

i.e., $u^o(\cdot)+\varepsilon\eta(\cdot)\in\mathcal U_T$. Using (3.8), it follows that (4.17) holds only if

$$
D\xi(t_f;\varepsilon)=\varepsilon Dz(t_f;\eta(\cdot))=0.\tag{4.18}
$$

We return to this point later. Referring to (3.14), and noting that (4.18) is relaxed below where it is adjoined to the perturbed cost by the Lagrange multiplier $\nu$ through the presence of $\nu^TDx(t_f)$ in (4.15), we have

$$
\begin{aligned}
\Delta\hat J={}&\hat J(u^o(\cdot)+\varepsilon\eta(\cdot);\lambda(\cdot),\nu,x_0)-\hat J(u^o(\cdot);\lambda(\cdot),\nu,x_0)\\
={}&\int_{t_0}^{t_f}\big[\varepsilon L_x(x^o(t),u^o(t),t)z(t;\eta(\cdot))+\varepsilon L_u(x^o(t),u^o(t),t)\eta(t)+\varepsilon\dot\lambda^T(t)z(t;\eta(\cdot))\\
&\quad+\varepsilon\lambda^T(t)A(t)z(t;\eta(\cdot))+\varepsilon\lambda^T(t)B(t)\eta(t)+O(t;\varepsilon)\big]\,dt\\
&-\varepsilon\lambda^T(t_f)z(t_f;\eta(\cdot))+\varepsilon\phi_x(x^o(t_f))z(t_f;\eta(\cdot))+\varepsilon\nu^TDz(t_f;\eta(\cdot))+O(\varepsilon).
\end{aligned}\tag{4.19}
$$

Now let us set

$$
-\dot\lambda^T(t)=L_x(x^o(t),u^o(t),t)+\lambda^T(t)A(t),\qquad\lambda^T(t_f)=\phi_x(x^o(t_f))+\nu^TD.\tag{4.20}
$$

For fixed $\nu$ this is a legitimate choice for $\lambda(\cdot)$, as (4.20) is a linear ordinary differential equation in $\lambda(t)$ with piecewise continuous coefficients, having a unique solution. The right-hand side of (4.19) then becomes

$$
\Delta\hat J=\varepsilon\int_{t_0}^{t_f}\big[L_u(x^o(t),u^o(t),t)+\lambda^T(t)B(t)\big]\eta(t)\,dt+\int_{t_0}^{t_f}O(t;\varepsilon)\,dt+O(\varepsilon).\tag{4.21}
$$

We set

$$
\eta(t)=-\big[L_u(x^o(t),u^o(t),t)+\lambda^T(t)B(t)\big]^T,\tag{4.22}
$$

which yields a piecewise continuous perturbation, and we now show that, under a certain assumption, a $\nu$ can be found such that $\eta(\cdot)$ given by (4.22) causes (4.18) to hold. We first introduce the following assumption.

Assumption 4.2.3 The $p\times p$ matrix $\bar W(t_0,t_f)$ is positive definite, where (see [8])

$$
\bar W(t_0,t_f)=\int_{t_0}^{t_f}D\Phi(t_f,\tau)B(\tau)B^T(\tau)\Phi^T(t_f,\tau)D^T\,d\tau=DW(t_0,t_f)D^T.\tag{4.23}
$$

Note that this assumption is weaker than Assumption 4.2.2 (the two being equivalent when $p=n$) and is called output controllability, where the output is $y=Dx$.

From the linearity of (3.1) we have that $z(t;\eta(\cdot))$ satisfies the equation

$$
\dot z(t;\eta(\cdot))=A(t)z(t;\eta(\cdot))+B(t)\eta(t),\qquad z(t_0;\eta(\cdot))=0,\tag{4.24}
$$

so that

$$
z(t_f;\eta(\cdot))=\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)\eta(\tau)\,d\tau,\tag{4.25}
$$

and, using (4.22),

$$
z(t_f;\eta(\cdot))=-\int_{t_0}^{t_f}\Phi(t_f,\tau)B(\tau)\big[L_u(x^o(\tau),u^o(\tau),\tau)+\lambda^T(\tau)B(\tau)\big]^T\,d\tau.\tag{4.26}
$$

From (4.20) we have

$$
\begin{aligned}
\lambda(t)&=\Phi^T(t_f,t)\lambda(t_f)+\int_t^{t_f}\Phi^T(\bar\tau,t)L_x^T(x^o(\bar\tau),u^o(\bar\tau),\bar\tau)\,d\bar\tau\\
&=\Phi^T(t_f,t)\phi_x^T(x^o(t_f))+\Phi^T(t_f,t)D^T\nu+\int_t^{t_f}\Phi^T(\bar\tau,t)L_x^T(x^o(\bar\tau),u^o(\bar\tau),\bar\tau)\,d\bar\tau.
\end{aligned}\tag{4.27}
$$
Premultiplying (4.26) by $D$ and using (4.27), we obtain

$$
\begin{aligned}
Dz(t_f;\eta(\cdot))={}&-\int_{t_0}^{t_f}D\Phi(t_f,\tau)B(\tau)\Big[L_u^T(x^o(\tau),u^o(\tau),\tau)+B^T(\tau)\Phi^T(t_f,\tau)\phi_x^T(x^o(t_f))\\
&\qquad+B^T(\tau)\int_\tau^{t_f}\Phi^T(\bar\tau,\tau)L_x^T(x^o(\bar\tau),u^o(\bar\tau),\bar\tau)\,d\bar\tau\Big]\,d\tau\\
&-\int_{t_0}^{t_f}D\Phi(t_f,\tau)B(\tau)B^T(\tau)\Phi^T(t_f,\tau)D^T\,d\tau\ \nu.
\end{aligned}\tag{4.28}
$$

Setting the left-hand side of (4.28) equal to zero, we can, in view of Assumption 4.2.3, uniquely solve for $\nu$ in terms of the remaining (all known) quantities in (4.28). Consequently, we have proved that there exists a $\nu$, independent of $\varepsilon$, such that $\eta(\cdot)$ given by (4.22) causes (4.18) to be satisfied. With this choice of $\eta(\cdot)$, (4.14) holds, and the change in $\hat J$ is

$$
\Delta\hat J=-\varepsilon\int_{t_0}^{t_f}\big\|L_u(x^o(t),u^o(t),t)+\lambda^T(t)B(t)\big\|^2\,dt+\int_{t_0}^{t_f}O(t;\varepsilon)\,dt+O(\varepsilon)\ \ge\ 0.\tag{4.29}
$$

Using the usual limiting argument that $O(t;\varepsilon)/\varepsilon\to0$ and $O(\varepsilon)/\varepsilon\to0$ as $\varepsilon\to0$, and since the first-order term dominates as $\varepsilon\to0$, optimality requires

$$
\int_{t_0}^{t_f}\big\|L_u(x^o(t),u^o(t),t)+\lambda^T(t)B(t)\big\|^2\,dt\ \le\ 0;\tag{4.30}
$$

since the integrand is nonnegative, the integral must vanish. It then follows that a necessary condition for $u^o(\cdot)$ to minimize $J$ is that there exists a $p$-vector $\nu$ such that

$$
L_u(x^o(t),u^o(t),t)+\lambda^T(t)B(t)=0\quad\forall\,t\ \text{in}\ [t_0,t_f],\tag{4.31}
$$

where

$$
-\dot\lambda^T(t)=L_x(x^o(t),u^o(t),t)+\lambda^T(t)A(t),\qquad\lambda^T(t_f)=\phi_x(x^o(t_f))+\nu^TD.\tag{4.32}
$$

As in Section 3.2.6, the necessary condition derived above can be restated in terms of the Hamiltonian to yield a weak version of Pontryagin's Principle for the optimal control problem formulated with linear dynamics and linear terminal constraints.

Theorem 4.2.1 Suppose that $u^o(\cdot)$ minimizes the performance criterion (3.2) subject to the dynamic system (3.1) and the terminal constraint (4.1), and that Assumptions 3.2.1, 3.2.3, 3.2.4, 4.2.1, and 4.2.3 hold. Suppose further that $H(x,u,\lambda,t)$ is defined according to (3.23). Then there exists a $p$-vector $\nu$ such that the partial derivative of the Hamiltonian with respect to the control $u(t)$ is zero when evaluated at the optimal state and control $x^o(t)$, $u^o(t)$, viz.,

$$
H_u(x^o(t),u^o(t),\lambda(t),t)=0\quad\forall\,t\ \text{in}\ [t_0,t_f],\tag{4.33}
$$

where

$$
-\dot\lambda^T(t)=H_x(x^o(t),u^o(t),\lambda(t),t),\qquad\lambda^T(t_f)=\phi_x(x^o(t_f))+\nu^TD.\tag{4.34}
$$

4.2.2 Pontryagin Necessary Condition: Special Case

We return to the special case specified by (4.2). From (4.7) it follows that $u^o(\cdot)$ given by (4.6) steers $x_0$ to the origin at time $t_f$, and from (4.12) we concluded that $u^o(\cdot)$ is in fact minimizing. We now show that $u^o(\cdot)$ satisfies Pontryagin's necessary condition. Indeed, by substituting (4.6) into (4.28), $\nu$ becomes, with $D=I$,

$$
\nu=2W^{-1}(t_0,t_f)\Phi(t_f,t_0)x_0.\tag{4.35}
$$

Also, from (4.27) and with $D=I$,

$$
\lambda(t)=2\Phi^T(t_f,t)W^{-1}(t_0,t_f)\Phi(t_f,t_0)x_0.\tag{4.36}
$$

Finally,

$$
H_u(x^o(t),u^o(t),\lambda,t)=2u^{oT}(t)+2x_0^T\Phi^T(t_f,t_0)W^{-1}(t_0,t_f)\Phi(t_f,t)B(t),\tag{4.37}
$$

which, by (4.6), is zero for all $t$ in $[t_0,t_f]$.
4.2.3 Linear Dynamics with Nonlinear Terminal Equality Constraints

We now turn to a more general terminal constraint than (4.1), viz.,

$$
\psi(x(t_f))=0,\tag{4.38}
$$

where $\psi(\cdot)$ is a $p$-dimensional vector function of its $n$-dimensional argument, which satisfies the following assumption.

Assumption 4.2.4 The $p$-dimensional function $\psi(\cdot)$ is once continuously differentiable in $x$.

As a consequence of Assumption 4.2.4 we can write

$$
\psi\big(x^o(t_f)+\varepsilon z(t_f;\eta(\cdot))\big)=\psi(x^o(t_f))+\varepsilon\psi_x(x^o(t_f))z(t_f;\eta(\cdot))+O(\varepsilon).\tag{4.39}
$$

The change in $J$, caused by a perturbation $\varepsilon\eta(\cdot)$ in the control function which is so chosen that

$$
\psi\big(x^o(t_f)+\xi(t_f;\varepsilon)\big)=0,\tag{4.40}
$$

is therefore given by (4.19) with $D$ replaced by $\psi_x(x^o(t_f))$. We can then proceed to set

$$
-\dot\lambda^T(t)=L_x(x^o(t),u^o(t),t)+\lambda^T(t)A(t),\qquad\lambda^T(t_f)=\phi_x(x^o(t_f))+\nu^T\psi_x(x^o(t_f))\tag{4.41}
$$

to yield, as before, the change in $J$ as given by (4.21). If we then specify $\eta(\cdot)$ by (4.22), we cannot directly show (i.e., by solving for $\nu$ using elementary mathematics) that there exists a $\nu$ such that, when $u^o(\cdot)+\varepsilon\eta(\cdot)$ is applied to (3.1), (4.40) is satisfied; deeper mathematics is required. (A form of the Implicit Function Theorem associated with constructing controls that meet nonlinear constraints exactly [38] is needed; [28] gives some insight into the issues for nonlinear programming problems.) However, it follows directly from (4.28) that we could calculate a $\nu$ such that

$$
\psi_x(x^o(t_f))z(t_f;\eta(\cdot))=0\tag{4.42}
$$

by replacing $D$ in (4.28) by $\psi_x(x^o(t_f))$. In other words, we could show that there exists a $\nu$ such that $u^o(\cdot)+\varepsilon\eta(\cdot)$ satisfies (4.40) to first order in $\varepsilon$. If we then replaced the requirement (4.40) by (4.42), we would arrive at Theorem 4.2.1 with $D$ replaced by $\psi_x(x^o(t_f))$. Although this is in fact the correct Pontryagin necessary condition for this problem, we would have arrived at it nonrigorously, viz., by replacing (4.40) with (4.42), without rigorously justifying the satisfaction of the equality constraint to only first order.

In view of the above, we prefer to show that Pontryagin's Principle is a sufficient condition for "first-order optimality"; this permits a rigorous treatment. Therefore, as stated in Section 4.1, the emphasis of this book, as far as proofs are concerned, swings now to developing sufficient conditions for first-order optimality. Reference is made to rigorous proofs, where these exist, that the conditions are also necessary for optimality. As outlined above, it is the nonlinearity of the terminal constraint that makes our straightforward approach inadequate for the derivation of Pontryagin's Principle. Once a nonlinear terminal constraint is present and we have opted for showing that Pontryagin's Principle is sufficient for first-order optimality, nothing is gained by restricting attention to the linear dynamic system (3.1). Consequently, we treat the problem of minimizing (3.2) subject to (3.41) and (4.38).

4.3 Weak First-Order Optimality with Nonlinear Dynamics and Terminal Constraints

We allow weak perturbations of the control of the form $u^o(\cdot)+\varepsilon\eta(\cdot)$, where $\eta(\cdot)$ is a piecewise continuous $m$-vector function on $[t_0,t_f]$. We prove that the conditions of the weak form of the Pontryagin Principle are sufficient for weak first-order optimality, which is defined as follows.

Definition 4.3.1 $J$, given in (3.2), is weakly first-order optimal subject to (3.41) and (4.38) at $u^o(\cdot)$ if the first-order change in $J$, caused by any weak perturbation $\varepsilon\eta(\cdot)$ of $u^o(\cdot)$ which maintains satisfaction of (4.38), i.e., $u^o(\cdot)+\varepsilon\eta(\cdot)\in\mathcal U_T$, is nonnegative.

Definition 4.3.2 $J$, given in (3.2), is weakly locally optimal subject to (3.41) and (4.38) at $u^o(\cdot)$ if there exists an $\bar\varepsilon>0$ such that the change in $J$ is nonnegative for all perturbations $\bar\eta(\cdot)$ of $u^o(\cdot)$ which maintain (4.38) and which satisfy $\|\bar\eta(t)\|\le\bar\varepsilon$ for all $t$ in $[t_0,t_f]$, i.e.,

$$
\Delta\hat J\ge0\quad\forall\ u^o(\cdot)+\bar\eta(\cdot)\in\mathcal U_T.\tag{4.43}
$$
4.3.1 Sufficient Condition for Weak First-Order Optimality

Weak first-order optimality does not usually imply weak local optimality; it merely implies that if $J$ is not weakly locally optimal at $u^o(\cdot)$, this must be owing to higher-order, and not first-order, effects. Following the approach in (4.13) and (4.15), we adjoin the system equation (3.41) and the terminal constraint (4.38) to $J$ and integrate by parts to obtain

$$
\hat J(u(\cdot);\lambda(\cdot),\nu,x_0)=J(u(\cdot);x_0)+\int_{t_0}^{t_f}\big[\dot\lambda^T(t)x(t)+\lambda^T(t)f(x(t),u(t),t)\big]\,dt+\lambda^T(t_0)x_0-\lambda^T(t_f)x(t_f)+\nu^T\psi(x(t_f)),\tag{4.44}
$$

and it follows that

$$
\hat J(u(\cdot);\lambda(\cdot),\nu,x_0)=J(u(\cdot);x_0)\tag{4.45}
$$

whenever (3.41) and (4.38) are satisfied. We now evaluate the change in $\hat J$ (i.e., in $J$) brought about by changing $u^o(\cdot)$ to $u^o(\cdot)+\varepsilon\eta(\cdot)$ while keeping (4.38) satisfied. Using (3.52), this change is

$$
\begin{aligned}
\Delta\hat J={}&\hat J(u^o(\cdot)+\varepsilon\eta(\cdot);\lambda(\cdot),\nu,x_0)-\hat J(u^o(\cdot);\lambda(\cdot),\nu,x_0)\\
={}&\int_{t_0}^{t_f}\big[\varepsilon H_x(x^o(t),u^o(t),\lambda(t),t)z(t;\eta(\cdot))+\varepsilon\dot\lambda^T(t)z(t;\eta(\cdot))+\varepsilon H_u(x^o(t),u^o(t),\lambda(t),t)\eta(t)+O(t;\varepsilon)\big]\,dt\\
&-\varepsilon\lambda^T(t_f)z(t_f;\eta(\cdot))+\varepsilon\phi_x(x^o(t_f))z(t_f;\eta(\cdot))+\varepsilon\nu^T\psi_x(x^o(t_f))z(t_f;\eta(\cdot))+O(\varepsilon).
\end{aligned}\tag{4.46}
$$

Now, if we set

$$
-\dot\lambda^T(t)=H_x(x^o(t),u^o(t),\lambda(t),t),\qquad\lambda^T(t_f)=\phi_x(x^o(t_f))+\nu^T\psi_x(x^o(t_f)),\tag{4.47}
$$

we see that a sufficient condition for the change in $J$ to be nonnegative (in fact, zero) to first order in $\varepsilon$ is that

$$
H_u(x^o(t),u^o(t),\lambda(t),t)=0\quad\forall\,t\ \text{in}\ [t_0,t_f].\tag{4.48}
$$

Remark 4.3.1 The derivation of the change in $\hat J$ is similar to that given in Chapter 3; there, however, no terminal constraints had to be maintained in the presence of arbitrary variations, so we could conclude that the first-order conditions were necessary for weak local optimality. Here, because we have derived a sufficient condition for weak first-order optimality, it was not necessary to actually construct a perturbation $\varepsilon\eta(\cdot)$ which maintains satisfaction of (4.38). All that was necessary was to note that if (4.48) holds, then the change in $J$ is nonnegative to first order for any weak perturbation that maintains satisfaction of (4.38).

Before we state our results, we introduce the notion of normality.

Remark 4.3.2 (Normality) In the derivations of Pontryagin's Principle as a necessary condition for optimality, to which we refer, the Hamiltonian $H$ is defined as

$$
H(x,u,\lambda,\lambda_0,t)=\lambda_0L(x,u,t)+\lambda^Tf(x,u,t),\tag{4.49}
$$

where

$$
\lambda_0\ge0,\tag{4.50}
$$

and whenever $\phi$ or one of its derivatives appears in a formula, it is premultiplied by $\lambda_0$. The statement of the principle is then that if $u^o(\cdot)$ minimizes $J$ subject to the dynamic system, control, and terminal constraints, then there exist $\lambda_0\ge0$ and $\nu$, not all zero, such that certain conditions hold. If on an optimal path the principle can be satisfied upon setting $\lambda_0=1$ (any positive $\lambda_0$ can be normalized to unity because of the homogeneity of the expressions involved), the problem is referred to as "normal." Throughout this book we assume that when we refer to statements and proofs of Pontryagin's Principle as a necessary condition for optimality, the problem is normal; consequently, $\lambda_0$ does not appear in any of our theorems. It is evident from the proofs of the necessary conditions in Chapter 3 that every free end-point control problem is normal, and the control problem with linear dynamics and linear terminal constraints is normal if Assumption 4.2.3 is satisfied.
Theorem 4.3.1 Suppose that Assumptions 3.2.3, 3.2.4, 3.3.1, 3.3.2, 4.2.1, and 4.2.4 are satisfied. Suppose further that $u^o(\cdot)\in\mathcal U_T$ minimizes the performance criterion (3.2) subject to (3.41) and (4.38). Then there exists a $p$-vector $\nu$ such that

$$
H_u(x^o(t),u^o(t),\lambda(t),t)=0\quad\forall\,t\ \text{in}\ [t_0,t_f],\tag{4.51}
$$

where

$$
-\dot\lambda^T(t)=H_x(x^o(t),u^o(t),\lambda(t),t),\tag{4.52}
$$

$$
\lambda^T(t_f)=\phi_x(x^o(t_f))+\nu^T\psi_x(x^o(t_f)),\tag{4.53}
$$

and

$$
H(x,u,\lambda,t)=L(x,u,t)+\lambda^Tf(x,u,t).\tag{4.54}
$$

Moreover, the above condition is sufficient for $J$ to be weakly first-order optimal at $u^o(\cdot)$.

Proof: A rigorous proof that Pontryagin's Principle is a necessary condition for optimality, as stated in Section 4.2.3, is beyond the scope of this book. Rigorous proofs are available in [44, 10, 25, 38]. Upon assuming that the problem is normal, the above conditions result. The second part of the theorem is proved in Section 4.3.1.

Example 4.3.1 (Rocket launch example) Let us consider the problem of maximizing the terminal horizontal velocity component of a rocket in a specified time $[t_0,t_f]$, subject to a specified terminal altitude and a specified terminal vertical velocity component. This launch is depicted in Figure 4.1.

Figure 4.1: Rocket launch example.

A simplified mathematical model of the vehicle is

$$
\dot x_1(t)=\dot r=x_3(t),\qquad\dot x_2(t)=\dot h=x_4(t),\qquad\dot x_3(t)=\dot v=T\cos u(t),\qquad\dot x_4(t)=\dot w=T\sin u(t)-g,\tag{4.55}
$$

where $x_1(t)$ is the horizontal component of the position of the vehicle at time $t$, $t_0\le t\le t_f$, $x_2(t)$ is the altitude or vertical component of its position, $x_3(t)$ is its horizontal component of velocity, $x_4(t)$ is its vertical component of velocity, $u(t)$ is the inclination of the rocket motor's thrust vector to the horizontal, $g$ is the (constant) gravitational acceleration, and $T$ is the constant specific thrust of the rocket motor. We suppose the following initial conditions for the rocket at time $t=t_0$:

$$
x_1(t_0)=x_{10},\quad x_2(t_0)=x_{20},\quad x_3(t_0)=0,\quad x_4(t_0)=0.\tag{4.56}
$$

The problem, then, is to determine $u(\cdot)$ on the interval $[t_0,t_f]$ to minimize

$$
J(u(\cdot);x_0)=-x_3(t_f)\tag{4.57}
$$

subject to the terminal constraints

$$
x_2(t_f)=x_{2d},\qquad x_4(t_f)=x_{4d},\tag{4.58}
$$

where $x_{2d}$ and $x_{4d}$ are, respectively, the desired altitude and vertical velocity. From (4.54) we have

$$
H(x,u,\lambda,t)=\lambda_1x_3+\lambda_2x_4+\lambda_3T\cos u+\lambda_4T\sin u-\lambda_4g,\tag{4.59}
$$

and from (4.52) and (4.53)

$$
-\dot\lambda_1(t)=0,\ \lambda_1(t_f)=0;\qquad-\dot\lambda_2(t)=0,\ \lambda_2(t_f)=\nu_2;\qquad-\dot\lambda_3(t)=\lambda_1(t),\ \lambda_3(t_f)=-1;\qquad-\dot\lambda_4(t)=\lambda_2(t),\ \lambda_4(t_f)=\nu_4.\tag{4.60}
$$

From (4.60) we conclude that

$$
\lambda_1(t)=0,\quad\lambda_2(t)=\nu_2,\quad\lambda_3(t)=-1,\quad\lambda_4(t)=\nu_4+(t_f-t)\nu_2\tag{4.61}
$$

for $t$ in $[t_0,t_f]$. From (4.59) we have

$$
H_u(x,u,\lambda,t)=-\lambda_3T\sin u+\lambda_4T\cos u.\tag{4.62}
$$

Therefore, using (4.62), (4.51) is satisfied if and only if

$$
u^o(t)=\arctan\Big[\frac{\lambda_4(t)}{\lambda_3(t)}\Big]=\arctan\big[-\nu_4-(t_f-t)\nu_2\big].\tag{4.63}
$$

Equation (4.63) gives the form of the control as a function of $t$ which satisfies Theorem 4.3.1. Naturally, the free parameters $\nu_2$ and $\nu_4$ have to be chosen so that when (4.63) is applied to (4.55), the terminal equality constraints (4.58) are satisfied. What is important to note here is that the form of $u(\cdot)$ has been determined, which is necessary for the minimization of (4.57). However, the optimal solution needs to be resolved numerically.
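A minimal shooting sketch for this example (all numerical values, $T$, $g$, the horizon, and the targets $x_{2d}$, $x_{4d}$, are our own choices): the two free parameters $\nu_2$, $\nu_4$ in (4.63) are adjusted with a root finder until the terminal constraints (4.58) are met.

```python
# A minimal shooting sketch for Example 4.3.1: pick (nu2, nu4) so the
# control law (4.63) meets the terminal constraints (4.58).
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

T, g, t0, tf = 2.0, 1.0, 0.0, 1.0
x2d, x4d = 0.3, 0.5                   # desired altitude, vertical velocity

def dynamics(t, x, nu2, nu4):
    u = np.arctan(-nu4 - (tf - t) * nu2)    # the control law (4.63)
    return [x[2], x[3], T * np.cos(u), T * np.sin(u) - g]

def residual(nu):
    xf = solve_ivp(dynamics, [t0, tf], [0.0, 0.0, 0.0, 0.0],
                   args=tuple(nu), rtol=1e-9).y[:, -1]
    return [xf[1] - x2d, xf[3] - x4d]       # constraint violations (4.58)

nu = fsolve(residual, [-1.0, -1.0])
print(nu, residual(nu))               # residual ~ 0 at the solution
```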
Numerical Optimization with Terminal Constraints: The Penalty Function Approach When terminal constraints (4.38) are present, the two-point boundary-value problem outlined in Section 3.3.4 is more complex because of the presence of the unknown parameters ν in the expression for λ(t). Whereas in Section 4.2.2 it was possible to find an explicit expression determining ν, in general this is not possible for nonlinear constraints. We conclude that, given a candidate pair (˜ u(), ˜ x()) which satisfies (4.38), it is not possible to check directly whether (4.51) is satisfied, as was the case 4.3. Weak First-Order Optimality 129 in Section 3.3.4; rather, one would have to determine, numerically, whether there exists a ν which, when used in (4.52) and (4.54), yields (4.51). Similar remarks hold when constructing a control function ˜ u() which satisfies (4.38) and Pontryagin’s Principle. Here, one would have to resort to numerical tech- niques to determine both ˜ u() and ν. As indicated for a special case in Section 4.2.2, the class of linear quadratic optimal control problems allows an explicit solution; this is discussed further in Chapter 5. It turns out that engineers quite often sidestep the difficulty of determining the additional parameters ν by converting the control problem with a terminal con- straint (4.38) into an approximately equivalent unconstrained one by adding the term ρψ T (x(t f ))ψ(x(t f )) to J to form a new performance criterion, ¯ J(u(); ρ, x 0 ) = J(u(); x 0 ) + ρψ T (x(t f ))ψ(x(t f )), (4.64) where ρ > 0. Minimization of (4.64) with respect to u(), for ρ sufficiently large, causes (4.38) to be approximately satisfied. Indeed, it can be shown that under fairly weak assumptions, the minimum of (4.64) with respect to u() tends to the minimum of (3.2) subject to (4.38), as ρ →∞; this is the so-called penalty function method of satisfying terminal equality constraints [11]. Steepest Descent Approach to Terminally Constrained Optimization Problems The following steepest descent algorithm [11] is a method for finding a constrained minimum. Choose a nominal control u N (t) and let x N (t) be the resulting path. This path satisfies the nonlinear dynamic equation ˙ x N (t) = f(x N (t), u N (t), t) ⇒x N (). (4.65) This path may be nonoptimal and the terminal constraint may not be satisfied, ψ(x N (t f )) = 0. 130 Chapter 4. Terminal Equality Constraints Consider perturbations in u N (t) as u N+1 () = u N () + δu(), (4.66) where δu() = εη(). Then, the first-order term in the Taylor expansion of the dy- namics is δ ˙ x(t) = f x (x N (t), u N (t), t)δx(t) + f u (x N (t), u N (t), t)δu(t), (4.67) where δx(t) = εz(t; η()). The objective is to predict how the control perturbation δu(t) will affect the cost criterion and the terminal constraint. To construct these predictions, consider the influence function λ ψ (t) ∈ R n×p associated with the terminal constraint functions as ˙ λ ψ (t) = −f T x (x N (t), u N (t), t)λ ψ (t), λ ψ (t f ) = ψ T x (t f ); (4.68) then, the change in the terminal constraint is δψ(t f ) = ψ x (t f )δx(t f ) = λ ψ T (t 0 )δx(t 0 ) + t f t 0 λ ψ T (t)f u (x N (t), u N (t), t)δu(t) dt, (4.69) where for x(t 0 ), fixed δx(t 0 ) = 0. 
Similarly, the change in the performance index is

J(u_{N+1}(·); x_0) − J(u_N(·); x_0) = δJ
  = φ_x δx(t_f) + ∫_{t_0}^{t_f} [L_x(x_N(t), u_N(t), t)δx(t) + L_u(x_N(t), u_N(t), t)δu(t)] dt    (4.70)
  = λ_φ^T(t_0)δx(t_0) + ∫_{t_0}^{t_f} [L_u(x_N(t), u_N(t), t) + λ_φ^T(t)f_u(x_N(t), u_N(t), t)]δu(t) dt,    (4.71)

where

λ̇_φ(t) = −f_x^T(x_N(t), u_N(t), t)λ_φ(t) − L_x^T(x_N(t), u_N(t), t),   λ_φ(t_f) = φ_x^T(x_N(t_f)),    (4.72)

and the zero term

∫_{t_0}^{t_f} (d/dt)[λ_φ^T(t)δx(t)] dt − ∫_{t_0}^{t_f} [−L_x(x_N(t), u_N(t), t)δx(t) + λ_φ^T(t)f_u(x_N(t), u_N(t), t)δu(t)] dt = 0    (4.73)

was subtracted from (4.70) to obtain, after some manipulations, (4.71). To make an improvement in δJ and decrease the constraint violation, adjoin δψ(t_f) to δJ with a Lagrange multiplier ν ∈ R^p as

δJ + ν^T δψ(t_f) = [λ_φ^T(t_0) + ν^T λ_ψ^T(t_0)]δx(t_0)
  + ∫_{t_0}^{t_f} {[λ_φ^T(t) + ν^T λ_ψ^T(t)]f_u(x_N(t), u_N(t), t) + L_u(x_N(t), u_N(t), t)}δu(t) dt,    (4.74)

where ν is chosen so that a desired change in δψ(t_f) is met. Choose

δu(t) = −{[λ_φ^T(t) + ν^T λ_ψ^T(t)]f_u(x_N(t), u_N(t), t) + L_u(x_N(t), u_N(t), t)}^T.    (4.75)

Substituting (4.75) into (4.69) with δx(t_0) = 0 gives

ψ_x(x_N(t_f))δx(t_f) = −∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)f_u^T(x_N(t), u_N(t), t)[λ_φ^T(t) + ν^T λ_ψ^T(t)]^T dt
  − ∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)L_u^T(x_N(t), u_N(t), t) dt;    (4.76)

then

ψ_x(x_N(t_f))δx(t_f) + ∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)L_u^T(x_N(t), u_N(t), t) dt
  + ∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)f_u^T(x_N(t), u_N(t), t)λ_φ(t) dt
  = −[∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)f_u^T(x_N(t), u_N(t), t)λ_ψ(t) dt] ν.    (4.77)

Solve for ν as

ν = −[∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)f_u^T(x_N(t), u_N(t), t)λ_ψ(t) dt]^{-1}
  × [ψ_x(x_N(t_f))δx(t_f) + ∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)L_u^T(x_N(t), u_N(t), t) dt
  + ∫_{t_0}^{t_f} λ_ψ^T(t)f_u(x_N(t), u_N(t), t)f_u^T(x_N(t), u_N(t), t)λ_φ(t) dt],    (4.78)

where the inverse exists from Assumption 4.2.3 with D = ψ_x(x_N(t_f)). If δψ(x_N(t_f)) = ψ_x δx(t_f) = 0 and δx(t_0) = 0, then note from (4.69) that the functions λ_ψ^T(t)f_u(x_N(t), u_N(t), t) and [λ_φ^T(t) + ν^T λ_ψ^T(t)]f_u(x_N(t), u_N(t), t) + L_u(x_N(t), u_N(t), t) are orthogonal. In any case,

δĴ = δJ + ν^T δψ(t_f) = −∫_{t_0}^{t_f} ‖[λ_φ^T(t) + ν^T λ_ψ^T(t)]f_u(x_N(t), u_N(t), t) + L_u(x_N(t), u_N(t), t)‖² dt,    (4.79)

and therefore the cost becomes smaller on each iteration. The above algorithm is summarized in the following steps:

1. Choose the nominal control u_N(t) over the interval t ∈ [t_0, t_f].

2. Integrate (4.65) forward from the initial conditions to the terminal time t_f to obtain the nominal state x_N(t). Store the values over the trajectory.

3. Integrate (4.68) and (4.72) backward from their terminal conditions to the initial time t_0 and store the values over the trajectory.

4. Choose the desired change in the terminal constraint δψ(t_f) and the desired change in the cost through the choice of ε. These choices are made small enough to retain the assumed linearity but large enough for the algorithm to converge quickly.

5. Compute ν from (4.78). The integrals in (4.78) can be computed along with the influence functions obtained from (4.68) and (4.72).

6. Compute the perturbation δu(t) from (4.75), form a new nominal control as given in (4.66), and repeat step 2. (A numerical sketch of one such pass is given below.)
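The following Python sketch carries out one such pass on an assumed test problem (a double integrator with ẋ_1 = x_2, ẋ_2 = u, L = u²/2, φ = 0, and the single terminal constraint ψ = x_1(t_f) − 1); it is an illustration of steps 1–6 under these assumptions, not the text's own code.

```python
import numpy as np

# One steepest-descent pass for: x1dot = x2, x2dot = u, L = u^2/2, phi = 0,
# psi = x1(tf) - 1 (p = 1, so nu is a scalar). The grid size and the
# requested constraint change are tuning choices, assumed for illustration.
n, tf = 1000, 1.0
t = np.linspace(0.0, tf, n + 1)
dt = t[1] - t[0]
u = np.zeros(n + 1)                            # step 1: nominal control u_N

# Step 2: integrate the dynamics (4.65) forward (Euler, for brevity).
x = np.zeros((n + 1, 2))
for k in range(n):
    x[k + 1] = x[k] + dt * np.array([x[k, 1], u[k]])
psi = x[-1, 0] - 1.0                           # terminal constraint violation

# Step 3: the influence functions (4.68) and (4.72). Here f_x = [[0, 1],
# [0, 0]] and psi_x = [1, 0], so backward integration gives
# lam_psi(t) = (1, tf - t); with phi = 0 and L_x = 0, lam_phi(t) = (0, 0).
lam_psi = np.column_stack((np.ones(n + 1), tf - t))
lam_phi = np.zeros((n + 1, 2))

# Step 4: request that half of the constraint violation be removed.
dpsi = -0.5 * psi
fu = np.array([0.0, 1.0])                      # f_u, constant for this plant
Lu = u.copy()                                  # L_u = u along the nominal

def integral(y):                               # trapezoidal rule on the grid
    return float(np.sum(y[1:] + y[:-1]) * 0.5 * dt)

# Step 5: nu from (4.78); with p = 1 every integral is a scalar.
a = integral((lam_psi @ fu) ** 2)              # int lam_psi^T fu fu^T lam_psi dt
b = integral((lam_psi @ fu) * Lu)              # int lam_psi^T fu L_u^T dt
c = integral((lam_psi @ fu) * (lam_phi @ fu))  # int lam_psi^T fu fu^T lam_phi dt
nu = -(dpsi + b + c) / a

# Step 6: the descent direction (4.75) and the updated control (4.66).
du = -((lam_phi @ fu) + nu * (lam_psi @ fu) + Lu)
u_next = u + du

# Re-integrate to confirm the achieved constraint change matches dpsi.
x_new = np.zeros((n + 1, 2))
for k in range(n):
    x_new[k + 1] = x_new[k] + dt * np.array([x_new[k, 1], u_next[k]])
print("requested:", dpsi, " achieved:", (x_new[-1, 0] - 1.0) - psi)
```

Because the test dynamics are linear, the achieved constraint change matches the requested δψ(t_f) up to integration error; for nonlinear dynamics the agreement is only first order, which is exactly what the check in step 7 below monitors.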
7. Check whether the actual changes in the constraints and the cost are close to the values predicted by the perturbed constraint (4.69) and the perturbed cost (4.71), which are based on the assumed linear perturbation theory. This is where the choices of ε and δψ(t_f) are checked to determine whether they are small enough to retain the assumed linearity but large enough for the algorithm to converge quickly.

4.4 Strong First-Order Optimality

It is clear that if

H(x°(t), u°(t), λ(t), t) ≤ H(x°(t), u(t), λ(t), t) ∀ u(t) ∈ U_T,    (4.80)

then (4.51) is satisfied. Consequently, (4.51)–(4.54) with (4.51) replaced by (4.80) are also sufficient conditions for J to be weakly first-order optimal at u°(·). It is clear also that (4.80) ensures that J is optimal to first order for a perturbation η(·; ε) which is made up of a weak perturbation εη(·) and a strong perturbation (Section 3.4) which maintains (4.38), i.e., a perturbation of the form

η(t; ε) = η(t),   t̄ ≤ t ≤ t̄ + ε,
          εη(t)  in [t_0, t̄) and (t̄ + ε, t_f].    (4.81)

However, there are more elaborate strong perturbations than (4.81). Indeed, we shall permit the larger class of strong perturbations η(·; ε) given by

η(t; ε) = η(t),   t_i ≤ t ≤ t_i + εδ_i,  i = 1, ..., N,
          εη(t),  t ∈ I = [t_0, t_f] − ∪_{i=1}^{N} [t_i, t_i + εδ_i],    (4.82)

where η(·) is continuous on the nonoverlapping intervals [t_i, t_i + εδ_i], i = 1, ..., N, and is piecewise continuous on the remaining subintervals of [t_0, t_f]. This class is much larger than (4.81), since N, δ_i > 0, and η(·) are arbitrary. This richer class is used to satisfy the terminal constraints while optimizing the cost criterion. The intervals [t_i, t_i + εδ_i], although nonoverlapping, may be contiguous. We then have the following definitions.

Definition 4.4.1 J of (3.2) is strongly first-order optimal subject to (3.41) and (4.38) at u°(·) if the first-order change in J, caused by any strong perturbation η(·; ε) of the form (4.82) which maintains (4.38), is nonnegative.

Definition 4.4.2 J of (3.2) is strongly locally optimal subject to (3.41) and (4.38) at u°(·) if there exists an ε̄ > 0 such that the change in J is nonnegative for all perturbations η(·; ε) of u°(·) which maintain (4.38) and which satisfy |ξ(t; η(·; ε))| ≤ ε̄ for all t in [t_0, t_f].

Note that, as with weak first-order optimality, strong first-order optimality does not usually imply strong local optimality. Strong first-order optimality merely implies that if J is not strongly locally optimal at u°(·), this must be because of higher-order, and not first-order, effects. We have the expression

Ĵ(u(·); λ(·), ν, x_0) = ∫_{t_0}^{t_f} [H(x(t), u(t), λ(t), t) + λ̇^T(t)x(t)] dt
  + φ(x(t_f)) + ν^T ψ(x(t_f)) + λ^T(t_0)x_0 − λ^T(t_f)x(t_f),    (4.83)

which is equal to J(u(·); x_0) whenever (3.41) and (4.38) are satisfied. We now evaluate the change in Ĵ (i.e., in J) caused by changing u°(·) to u°(·) + η(·; ε) while keeping (4.38) satisfied:

∆Ĵ = Ĵ(u°(·) + η(·; ε); λ(·), ν, x_0) − Ĵ(u°(·); λ(·), ν, x_0)
  = ∫_{t_0}^{t_f} [H(x°(t) + ξ(t; η(·; ε)), u°(t) + η(t; ε), λ(t), t) − H(x°(t), u°(t), λ(t), t) + λ̇^T(t)ξ(t; η(·; ε))] dt
  + φ(x°(t_f) + ξ(t_f; η(·; ε))) − φ(x°(t_f))
  + ν^T ψ(x°(t_f) + ξ(t_f; η(·; ε))) − ν^T ψ(x°(t_f)) − λ^T(t_f)ξ(t_f; η(·; ε)).    (4.84)

Along the lines of Sections 3.3 and 3.4, one can show that

ξ(t; η(·; ε)) = εz(t; η(·)) + O(t, ε) ∀ t in [t_0, t_f].
(4.85) Using this and (4.52), one can expand (4.84), as in Sections 3.3 and 3.4, to obtain ∆ ˆ J = ε N ¸ i=1 δ i [H(x o (t i + εδ i ), u o (t i + εδ i ) + η(t i + εδ i ), λ(t i + εδ i ), t i + εδ i ) − H(x o (t i + εδ i ), u o (t i + εδ i ), λ(t i + εδ i ), t i + εδ i )] +ε I H u (x o (t), u o (t), λ(t), t)η(t)dt + t f t 0 O(t; ε)dt +O(ε), (4.86) where the interval I is defined in (4.82). All the functions have been assumed contin- uously differentiable in all their arguments. In this way we can find a weak variation as u() +εη() ∈ | T that satisfies the variation in the terminal constraints. It follows that (4.86) is nonnegative to first order in ε if (4.80) holds. Note that (4.80) implies H u (x o (t), u o (t), λ(t), t) = 0. We have thus proved the second part of Theorem 4.4.1, stated below in this section. However, this assumed continuous differentiability of H(x o (t), u o (t), λ(t), t)η(t) is more restrictive than needed to prove strong first-order optimality with terminal constraints. We now assume that H(x, u, λ, t) is differentiable in x, λ, and t but only contin- uous in u. That is, L(x, u, t) and f(x, u, t) are assumed continuous and not differen- tiable in u, but u() + η() ∈ | T . 136 Chapter 4. Terminal Equality Constraints The objective is to remove the assumed differentiability of H with respect to u in Equation (4.86). The change in ˆ J due to changing u o () to u o () +η(; ε) ∈ | T , for η(; ε) defined in (4.82), is given in (4.84). Due to the continuous differentiability of f with respect to x, we again obtain the expansion (4.85). Using this expansion of the state, the change ∆ ˆ J becomes ∆ ˆ J = ε N ¸ i=1 δ i [H(x o (t i + εδ i ), u o (t i + εδ i ) + η(t i + εδ i ), λ(t i + εδ i ), t i + εδ i ) −H(x o (t i + εδ i ), u o (t i + εδ i ), λ(t i + εδ i ), t i + εδ i )] + I [H(x o (t), u o (t) + εη(t), λ(t), t) −H(x o (t), u o (t), λ(t), t) +εH x (x o (t), u o (t), λ(t), t)z(t; η()) + ε ˙ λ T (t)z(t; η())]dt +ε[φ x (x o (t f )) + ν T ψ x (x o (t f )) −λ(t f ) T ]z(t f ; η()) + t f t 0 O(t; ε)dt +O(ε). (4.87) In the above we used the result εH x (x o (t), u o (t) + εη(t), λ(t), t)z(t; η()) = εH x (x o (t), u o (t), λ(t), t)z(t; η()) +O(ε), (4.88) deduced by continuity. Let ˙ λ(t) = −H T x (x o (t), u o (t), λ(t), t), λ(t f ) = φ x (x o (t f )) + ν T ψ x (x o (t f )); (4.89) then δ ˆ J = ε N ¸ i=1 δ i [H(x o (t i + εδ i ), u o (t i + εδ i ) + η(t i + εδ i ), λ(t i + εδ i ), t i + εδ i ) − H(x o (t i + εδ i ), u o (t i + εδ i ), λ(t i + εδ i ), t i + εδ i )] + I [H(x o (t), u o (t) + εη(t), λ(t), t) −H(x o (t), u o (t), λ(t), t)] dt. (4.90) 4.4. Strong First-Order Optimality 137 For u o () + η(; ε) ∈ | T , a sufficient condition for δ ˆ J ≥ 0 is that H(x o (t), u(t), λ(t), t) −H(x o (t), u o (t), λ(t), t) ≥ 0 (4.91) or H(x o (t), u o (t), λ(t), t) ≤ H(x o (t), u(t), λ(t), t) (4.92) for u o (), u() ∈ | T . This completes the proof of Theorem 4.4.1, which is stated below. Theorem 4.4.1 Suppose that Assumptions 3.2.4, 3.3.2, 3.4.1, 4.2.1, and 4.2.4 are satisfied. Suppose further that u o () ∈ | T minimizes the performance criterion (3.2) subject to (3.41), and (4.38). Then there exists a p-vector ν such that H (x o (t), u o (t), λ(t), t) ≤ H (x o (t), u(t), λ(t), t) ∀ t in [t 0 , t f ], (4.93) where − ˙ λ T (t) = H x (x o (t), u o (t), λ(t), t), (4.94) λ T (t f ) = φ x (x o (t f )) + ν T ψ x (x o (t f )), (4.95) and H(x, u, λ, t) = L(x, u, t) + λ T f(x, u, t). (4.96) Moreover, the above condition is sufficient for J to be strongly first-order optimal at u o (). 
Since a sufficiency condition for strong first-order optimality is sought, then only Assumption 4.2.1 is required rather than an explicit assumption on control- lability. Rigorous proofs of the necessity of Pontryagin’s Principle are available in [44, 10, 25, 38], and upon assuming normality, the above conditions result. The second part of the theorem is proved just before the theorem statement. 138 Chapter 4. Terminal Equality Constraints 4.4.1 Strong First-Order Optimality with Control Constraints If u(t) is required to satisfy (3.106), this does not introduce any difficulty into our straightforward variational proof that Pontryagin’s Principle is sufficient for strong first-order optimality. This follows because if (4.93) holds for u o (), u(t) ∈ | T , then the terms under the summation sign in (4.90) are nonnegative. The latter follows from (4.93) because H(x o (t), u o (t) + εη(t), λ(t), t) −H(x o (t), u o (t), λ(t), t) = εH u (x o (t), u o (t), λ(t), t)η(t) +O(ε), (4.97) where u o (t), u o (t) + εη(t) ∈ | T , and the first-order term dominates for small ε. The difficulty, described in Section 4.2.3, of deriving necessary conditions for optimality when terminal constraints are present is increased by the presence of (3.106) and the techniques used in [44, 10, 25, 38] are essential. Again, we define the set of bounded piecewise continuous control that meets the terminal constraints as | BT ⊂ | T . In view of the above, we can state the following theorem. Theorem 4.4.2 Suppose Assumptions 3.2.4, 3.3.2, 3.4.1, 4.2.1, and 4.2.4 are satis- fied. Suppose further that u o () minimizes the performance criterion (3.2) subject to (3.41), (4.38) and (3.106). Then there exists a p-vector ν such that H(x o (t), u o (t), λ(t), t) ≤ H(x o (t), u(t), λ(t), t) ∀ t ∈ [t 0 , t f ], (4.98) where u o (t), u(t) ∈ | BT , (4.99) 4.4. Strong First-Order Optimality 139 and λ() and H are as in (4.94) and (4.96). Moreover, the above condition is sufficient for J to be strongly first-order optimal at u o (). Remark 4.4.1 We have considered bounds only on the control variable. The exten- sion to mixed control and state constraints is straightforward and can be found in [11]. Problems with state-variable inequality constraints are more complex and are beyond the scope of this book. However, necessary conditions for optimality with state-variable inequality constraints can be found in [30]. Example 4.4.1 (Rocket launch example, revisited) We first return to the ex- ample of Section 4.3.1 to see what further insights Theorem 4.4.2 gives. Theorem 4.4.2 states that H should be minimized with respect to the control, subject to (4.99). In this example the control is unrestricted so minimization of H implies the necessary condition H u (x, u, λ, t) = −λ 3 T sin u + λ 4 T cos u = 0, (4.100) which implies that tan u o (t) = λ 4 (t) λ 3 (t) = −λ 4 (t) (4.101) because of (4.61). Now H uu (x, u, λ, t) = −λ 3 T cos u −λ 4 T sin u = T cos u −λ 4 T sin u. (4.102) Using the fact that, from (4.101), sin u o = −λ 4 cos u o , (4.103) 140 Chapter 4. Terminal Equality Constraints we see that H uu (x, u o , λ, t) = T cos u o + Tλ 2 4 cos u o = T(1 + λ 2 4 ) cos u o > 0 for − π 2 < u o < π 2 . (4.104) Inequality (4.104) implies that H has a local minimum with respect to u when − π 2 < u o < π 2 which satisfies (4.100). This local minimum value of H, ignoring the terms in (4.59) that do not depend on u, is obtained when u o is substituted into ˜ H(u) = −T cos u + λ 4 T sin u, (4.105) viz., ˜ H(u o ) = −T(1 + λ 2 4 ) cos u o < 0. 
(4.106)

Now examine

H̃²(u) + H̃_u²(u) = H̃²(u) + H_u²(u) = (−T cos u + λ_4 T sin u)² + (T sin u + λ_4 T cos u)² = T²(1 + λ_4²),    (4.107)

which is independent of u. From (4.107) we deduce that H̃² achieves its global maximum value when H_u = 0, i.e., when (4.100) holds. However, because (4.106) corresponds to H̃ taking on a negative value, u°, given by (4.101), globally minimizes H̃ and hence also H(x, u, λ, t). The conclusion is that (4.63) satisfies (4.98).

Example 4.4.2 Our next illustrative example has a weak first-order optimum but not a strong first-order optimum. Consider

min_u J(u(·); x_0) = ∫_{t_0}^{t_f} u³(t) dt    (4.108)

subject to

ẋ(t) = u(t), x(t_0) = 0,    (4.109)

and

x(t_f) = 0,    (4.110)

where x and u are scalars. Here

H(x, u, λ, t) = u³ + λu    (4.111)

and

−λ̇(t) = 0, λ(t_f) = ν.    (4.112)

From (4.111),

H_u(x, u, λ, t) = 3u² + λ,    (4.113)

and taking ν = 0 we see that

u°(t) = 0 ∀ t in [t_0, t_f]    (4.114)

causes

H_u(x°(t), u°(t), λ(t), t) = 0 ∀ t in [t_0, t_f].    (4.115)

It follows from Theorem 4.3.1 that J is weakly first-order optimal at u°(·). Note, however, that u°(t) does not minimize H(x(t), u(t), λ(t), t), which has no minimum with respect to u(t) because of the cubic term in u(t). Theorem 4.4.2 then implies that J is not strongly first-order optimal at u°(·). Stated explicitly, min_u H is unbounded below (H → −∞ as u → −∞), and hence J can be driven toward −∞. This is easily confirmed directly by using the strong perturbation

η(t; ε) = −(t_f − t_0 − ε)  for t_0 ≤ t ≤ t_0 + ε,
          ε                for t_0 + ε < t ≤ t_f,    (4.116)

which is added to u°(t) = 0. This perturbation clearly causes ξ(t_f; ε) to be zero for all 0 < ε < t_f − t_0, so that (4.110) is maintained. Furthermore,

J(u°(·) + η(·; ε); x_0) = ∫_{t_0}^{t_f} η³(t; ε) dt = −∫_{t_0}^{t_0+ε} (t_f − t_0 − ε)³ dt + ∫_{t_0+ε}^{t_f} ε³ dt
  = −(t_f − t_0 − ε)³ε + (t_f − t_0 − ε)ε³ = (t_f − t_0 − ε)ε[ε² − (t_f − t_0 − ε)²],    (4.117)

which, for small ε > 0, gives J = −ε(t_f − t_0)³ + O(ε²), verifying that (4.108) is not strongly first-order optimal at u°(·). Since (4.110) is satisfied for all values of ε, 0 < ε < t_f − t_0, we can also conclude that (4.108) does not have a strong local minimum at u°(·).

4.5 Unspecified Final Time t_f

As in Section 3.6, we here allow φ and ψ to depend explicitly on the unspecified final time t_f; viz., we write these functions as φ(x(t_f), t_f) and ψ(x(t_f), t_f) and assume that the first partial derivatives are continuous, i.e., as in the next assumption.

Assumption 4.5.1 The function φ(·, ·) and the p-dimensional function ψ(·, ·) are once continuously differentiable in x and t.

We now derive a sufficient condition for

J(u(·); x_0, t_f) = φ(x(t_f), t_f) + ∫_{t_0}^{t_f} L(x(t), u(t), t) dt    (4.118)

to be strongly first-order optimal at u°(·), t_f°, subject to the constraint that (4.38) is satisfied. Note that weak and strong optimality are equivalent in the minimization of J with respect to the parameter t_f. We perturb u°(·) to u°(·) + η(·; ε), with η(·; ε) defined in (4.82), and t_f° to t_f° + ε∆. If ∆ > 0, we define u°(t) + η(t; ε) in the interval (t_f°, t_f° + ε∆] as any continuous function with values in U_T, and we require that the perturbed control causes

ψ(x(t_f), t_f) = 0    (4.119)

at t_f = t_f° + ε∆.
Upon defining ˆ J(u(); λ(), ν, x 0 , t f ) = t f t 0 H(x(t), u(t), λ(t), t) + ˙ λ T (t)x(t) dt (4.120) +φ(x(t f ), t f ) + ν T ψ(x(t f ), t f ) + λ T (t 0 )x 0 −λ T (t f )x(t f ), we obtain, using (4.94), that (remembering η from (4.82)) ˆ J(u o () + η(; ε); λ(), ν, x 0 , t o f + ε∆) − ˆ J(u o (); λ(), ν, x 0 , t o f ) = ε N ¸ i=1 δ i [H(x o (t i + εδ i ), u o (t i + εδ i ) + η(t i + εδ i ), λ(t i + εδ i ), t i + εδ i ) − H(x o (t i + εδ i ), u o (t i + εδ i ), λ(t i + εδ i ), t i + εδ i )] +ε I H u (x o (t), u o (t), λ(t), t)η(t)dt +ε∆ φ t f (x o (t o f+ ), t o f+ ) + ν T ψ t f (x o (t o f+ ), t o f+ ) + H(x o (t o f+ ), u o (t o f+ ) + η(t o f+ ; ε), λ(t o f+ ), t o f+ ) + t f t 0 O(t; ε)dt +O(ε), (4.121) where t o f+ denotes the constant immediately to the right of t o f (i.e., the limit is taken of the functions as t → t o f from above). We can now state and prove the following theorem. 144 Chapter 4. Terminal Equality Constraints Theorem 4.5.1 Suppose that Assumptions 3.3.2, 3.4.1, 4.2.1, and 4.5.1 are satisfied. Suppose further that u o (), t o f minimize the performance criterion (4.118) subject to (3.41), (3.106) and (4.119). Then there exists a p-vector ν such that H(x o (t), u o (t), λ(t), t) ≤ H(x o (t), u(t), λ(t), t) ∀t ∈ [t 0 , t f ] (4.122) and the transversality condition Ω(x o (t o f ), u o (t o f ), ν, t o f ) = φ t f (x o (t o f ), t o f ) + ν T ψ t f (x o (t o f ), t o f ) +H(x o (t o f ), u o (t o f ), λ(t o f ), t o f ) = 0, (4.123) where u o (t), u(t) ∈ | BT (4.124) and H(x(t), u(t), λ(t), t) = L(x(t), u(t), t) + λ T (t)f(x(t), u(t), t), (4.125) − ˙ λ T (t) = H x (x o (t), u o (t), λ(t), t), λ T (t o f ) = φ x (x o (t o f ), t o f ) + ν T ψ x (x o (t o f ), t o f ). (4.126) Moreover, the above conditions are sufficient for J to be strongly first-order optimal at u o (), t o f . Proof: For ∆ > 0, (4.122) implies that the summation and the integral term in (4.121) are nonnegative and that, using (4.122), (4.123), and the continuity of H(x o (), u o (), λ(), ), the term in ε∆ is also nonnegative. For ∆ < 0 (4.121) is slightly different insofar as the ε∆ term is concerned, viz., it becomes −ε∆ φ t f (x o (t o f− ), t o f− ) + ν T ψ t f (x o (t o f− ), t o f− ) + H(x o (t o f− ), u o (t o f− ), λ(t o f− ), t o f− ) . Still, though, the conclusion is unchanged. 4.6. Minimum Time Problem Subject to Linear Dynamics 145 Rigorous proofs of the necessity of Pontryagin’s Principle are available in [44, 10, 25, 38]. Upon assuming normality, the above conditions result. 4.6 Minimum Time Problem Subject to Linear Dynamics Of special interest is the linear minimum time problem: Minimize the time t f (i.e., φ(x(t f ), t f ) = t f ) to reach the terminal point ψ(x(t f ), t f ) = x(t f ) = 0 (4.127) subject to the linear dynamic system ˙ x(t) = A(t)x(t) + B(t)u(t), x(t 0 ) = x 0 , (4.128) and the control constraint −1 ≤ u i (t) ≤ 1 ∀ t in [t 0 , t f ], i = 1, . . . , m. (4.129) In this special case, the variational Hamiltonian is H(x, u, λ, t) = λ T (t)A(t)x(t) +λ T (t)B(t)u(t). (4.130) From (4.130) we obtain − ˙ λ T (t) = λ T (t)A(t), λ T (t o f ) = ν T , (4.131) and u o i (t) = − sign B T i (t)λ(t) ∀ t ∈ [t 0 , t o f ], i = 1, . . . , m, (4.132) where sign [σ] is defined in (3.116). The optimal control u o () which satisfies (4.132) is referred to as a “bang-bang” control. It follows that (4.132) is well defined if (the controllability) Assumption 4.2.2 holds. We also have the important condition that, 146 Chapter 4. 
Terminal Equality Constraints from (4.123), 1 + ν T B(t o f )u o (t o f ) = 0. (4.133) Pontryagin’s Principle thus states that if a pair u o (), t o f minimizes J(u(); x 0 ) = t f (4.134) subject to (4.127)–(4.129), then there exists a ν such that (4.131), (4.132), and (4.133) are satisfied. In general, ν has to be determined numerically. Example 4.6.1 (Minimum time to the origin: The Bushaw problem) By way of illustration we particularize (4.128) and (4.129) to the Bushaw problem [11] ˙ x 1 (t) = x 2 (t), x 1 (0) = x 10 , x 1 (t f ) = 0, ˙ x 2 (t) = u(t), x 2 (0) = x 20 , x 2 (t f ) = 0, (4.135) and −1 ≤ u(t) ≤ 1. (4.136) In this case − ˙ λ 1 (t) = 0, λ 1 (t o f ) = ν 1 , − ˙ λ 2 (t) = λ 1 (t), λ 2 (t o f ) = ν 2 , (4.137) so that u o (t) = − sign λ 2 (t) = − sign [ν 2 + (t o f −t)ν 1 ], (4.138) and 1 + u o (t o f )ν 2 = 0. (4.139) Conditions (4.138) and (4.139) imply that ν 2 is ±1. 4.6. Minimum Time Problem Subject to Linear Dynamics 147 One can also immediately see from (4.138) that if u o () switches during the in- terval [t 0 , t o f ] it can switch only once. Thus, u o () either is constant (±1) for all t in [t 0 , t o f ] or is ±1 in the interval [t 0 , t s ] and ∓1 in the interval (t s , t f ], where t s is the switch time. If the origin cannot be reached from the given initial condition x 0 using a constant control (±1), then this option is ruled out and the one-switch bang-bang control is the only possibility. Upon examining the solutions of (4.135) with this form of switching control, it quickly becomes evident whether u o () should be +1 or −1 on [t 0 , t s ]. The switch time, t s , can then be calculated easily to ensure that x(t o f ) = 0. Then (4.139) can be verified. This is best demonstrated in the phase portrait shown in Figure 4.2, where the switch curves are parabolas (x 1 = −x 2 2 /2 for u = −1 and x 1 = x 2 2 /2 for u = 1). The other trajectories are translated parabolas. x 2 x 1 Region II u = −1 Region I u = 1 u = −1 u = 1 −4 −3 −2 −1 1 2 3 4 −3 −2 −1 1 2 3 Figure 4.2: Phase portrait for the Bushaw problem. 148 Chapter 4. Terminal Equality Constraints 4.7 Sufficient Conditions for Global Optimality: The Hamilton–Jacobi–Bellman Equation When terminal equality constraints (4.119) are present and t f can be specified or free, the Hamilton–Jacobi–Bellman (H-J-B) equation (3.123) can be generalized to −V t (x(t), t) = min u(t)∈U BT [L(x(t), u(t), t) + V x (x(t), t)f(x(t), u(t), t)] , (4.140) V (x(t f ), t f ) = φ(x(t f ), t f ) (4.141) for all x(t f ) and t f such that ψ(x(t f ), t f ) = 0 and is unspecified for all x(t f ) and t f such that ψ(x(t f ), t f ) = 0. We then have the following generalization of Theorem 3.5.1. Theorem 4.7.1 Suppose there exists a once continuously differentiable function V (, ) of x and t, t ∈ [t 0 , t f ], that satisfies (4.140) and is equal to φ(x(t f ), t f ) when ψ(x(t f ), t f ) = 0. Suppose further that the control u o (x(t), t) that minimizes H(x, u, V x , t) = L(x(t), u(t), t) + V x (x(t), t)f(x(t), u(t), t) (4.142) subject to the constraint u(t) ∈ | BT is such that Assumption 3.3.2 is satisfied. Then, under Assumptions 3.4.1, 4.2.1, and 4.5.1 the control function u o (x o (), ) minimizes (4.118) subject to (3.41), (3.106), and (4.119) and V (x 0 , t 0 ) is equal to the minimum value of (4.118). Proof: The proof follows as in Section 3.5, in particular that of Theorem 3.5.1. 
If we now choose any u(·) ∈ U_BT and assume that (4.140) and (4.141) hold, then, as in the proof of Theorem 3.5.1, the value of the cost criterion Ĵ(u(·); x_0) is still expressed by (3.140). Due to (4.140), the integrand of (3.140) is nonnegative and takes on its minimum value of zero when u(t) = u°(x(t), t) ∈ U_BT, thus completing the proof.

We illustrate the above theorem using the formulation in Section 4.2.1 with D = I. In this case the H-J-B equation is

−V_t(x(t), t) = min_{u(t)} { u^T(t)u(t) + V_x(x(t), t)[A(t)x(t) + B(t)u(t)] }.    (4.143)

If we set

V(x(t), t) = 2x^T(t)Φ^T(t_f, t)W^{-1}(t_0, t_f)Φ(t_f, t_0)x_0
  − ∫_t^{t_f} x_0^T Φ^T(t_f, t_0)W^{-1}(t_0, t_f)Φ(t_f, τ)B(τ)B^T(τ)Φ^T(t_f, τ)W^{-1}(t_0, t_f)Φ(t_f, t_0)x_0 dτ,    (4.144)

we see that at t = t_f, V(x(t_f), t_f) = 0 when x(t_f) = 0. As φ(x(t_f)) = 0, this is as required by Theorem 4.7.1. Minimizing the right-hand side of (4.143) with respect to u(t) yields

u°(t) = −B^T(t)Φ^T(t_f, t)W^{-1}(t_0, t_f)Φ(t_f, t_0)x_0,    (4.145)

which by (4.7) drives the linear dynamic system to the origin at t = t_f, also as required by Theorem 4.7.1. The right-hand side of (4.143) then becomes

2x^T(t)A^T(t)Φ^T(t_f, t)W^{-1}(t_0, t_f)Φ(t_f, t_0)x_0
  − x_0^T Φ^T(t_f, t_0)W^{-1}(t_0, t_f)Φ(t_f, t)B(t)B^T(t)Φ^T(t_f, t)W^{-1}(t_0, t_f)Φ(t_f, t_0)x_0,    (4.146)

which by (4.144) is just −V_t(x(t), t); i.e., the H-J-B equation is satisfied. This proves by Theorem 4.7.1 that (4.145) is optimal. The minimum value of

J(u(·); x_0) = ∫_{t_0}^{t_f} u^T(t)u(t) dt    (4.147)

is then given by V(x_0, t_0), which, from (4.144) and (4.4), reduces to

V(x_0, t_0) = x_0^T Φ^T(t_f, t_0)W^{-1}(t_0, t_f)Φ(t_f, t_0)x_0.    (4.148)

Note, for completeness, that V_x(x(t), t) is just the λ^T(t) given by (4.36).

Naturally, in more complicated problems it may be difficult to find an appropriate solution of (4.140) and (4.141). However, this is the price that one pays for attempting to derive a globally optimal control function for nonlinear problems with terminal constraints.

Example 4.7.1 (Minimum time to the origin: Bushaw problem continued) Some additional insights are obtained by calculating the cost to go from a given point to the origin using the optimal control, i.e., the optimal value function V(x_1, x_2, t), for the Bushaw problem presented in Section 4.6. The time t_1 to reach the switch curve from a point (x_01, x_02) off the switch curve in Region I (see Figure 4.2), where u = +1 is applied, is determined from

t_1²/2 + x_02 t_1 + x_01 = −(t_1 + x_02)²/2.    (4.149)

Once t_1 is obtained, we can compute the point on the switching curve, x_s1 = t_1²/2 + x_02 t_1 + x_01 and x_s2 = t_1 + x_02, and then the time t_2 to reach the origin with u = −1, determined from x_s1 + x_s2 t_2 − t_2²/2 = 0, which gives t_2 = x_s2. Summing the two times, t_1 + t_2, we obtain the optimal value function:

in Region I:  V(x_1, x_2, t) = t − x_2 + [2(x_2² − 2x_1)]^{1/2},    (4.150)
in Region II: V(x_1, x_2, t) = t + x_2 + [2(x_2² + 2x_1)]^{1/2},    (4.151)

where x_1, x_2, and t denote an arbitrary starting point. Figure 4.3 gives a plot of the optimal value function where x_2 is held fixed at −1.5, as shown in Figure 4.2, and x_1 is allowed to vary. Note that in going from Region I to Region II at x_1 = 1.125 the optimal value function is continuous, but not differentiable. Therefore, along the switch curve the H-J-B equation is not applicable. However, off the switch curves the H-J-B equation does apply.
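A quick numerical cross-check of (4.150) and (4.151), our own construction rather than the text's: simulate the bang-bang feedback of Figure 4.2 from a starting point in Region II and compare the elapsed time to the origin with the predicted value V. The starting point, integration step, and stopping tolerance are assumptions.

```python
import numpy as np

# Verify the Bushaw value function (4.150)-(4.151) by simulating the
# bang-bang feedback law of Figure 4.2 and timing the run to the origin.
def V(x1, x2):
    if x1 + 0.5 * x2 * abs(x2) < 0.0:                   # Region I: u = +1 first
        return -x2 + np.sqrt(2.0 * (x2 ** 2 - 2.0 * x1))
    return x2 + np.sqrt(2.0 * (x2 ** 2 + 2.0 * x1))     # Region II: u = -1 first

def u_feedback(x1, x2):
    s = x1 + 0.5 * x2 * abs(x2)                         # s = 0 on the switch curve
    return -np.sign(s) if s != 0.0 else -np.sign(x2)

x, t, dt = np.array([2.0, -1.5]), 0.0, 1e-5
for _ in range(10_000_000):                             # guard against chattering
    if np.hypot(x[0], x[1]) < 1e-3:
        break
    x = x + dt * np.array([x[1], u_feedback(*x)])       # Euler step of (4.135)
    t += dt
print("simulated time:", t, " predicted V:", V(2.0, -1.5))
```

For the starting point (2.0, −1.5) the predicted time-to-go is −1.5 + √12.5 ≈ 2.04, and the simulated time agrees to within the Euler integration tolerance.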
Note that a small change in x_1 and x_2 across the switch curve produces a large change in ν_2: the transversality condition 1 + u°ν_2 = 0 implies that in Region I, where u° = +1, ν_2 = −1, while in Region II, where u° = −1, ν_2 = 1.

Figure 4.3: Optimal value function for the Bushaw problem (V(x_1, x_2) plotted against x_1 with x_2 fixed at −1.5).

The optimal value function in the Bushaw example is not differentiable at the switch curves but does satisfy the H-J-B equation everywhere else. In the next chapter a normality condition requires that the system be controllable about any extremal path. Since there are no switches once on the switch curve, the path is not controllable. It has been suggested in [32] that along the switch curve the Lagrange multiplier λ(t) is related to the derivative of the unconstrained optimal value function along the switch curve, and in Regions I and II to the derivative of the constrained optimal value function given in (4.150) and (4.151). However, this lack of differentiability over a manifold in certain classes of optimization problems has been one drawback for the application of H-J-B theory and has contributed to the elevation of Pontryagin's Principle [38], which, although a local theory, does not suffer from this pathological defect.

Problems

1. Consider the problem of finding u(·) that minimizes

J = ∫_0^3 |u| dt

subject to

ẋ_1 = x_2, x_1(0) = 0, x_1(3) = 2,
ẋ_2 = u,  x_2(0) = 0, x_2(3) = 0,
|u| ≤ 1.

Find the minimizing path.

2. Find the minimum time path, control, and terminal time for

ẋ_1 = u,    x_1(0) = 1, x_1(t_f) = 0,
ẋ_2 = u²/2, x_2(0) = 0, x_2(t_f) = 1.

3. Consider a two-stage rocket where, after burnout (t = t_b) of the first stage, the second stage is to go into a prescribed orbit and the first stage must return to a prescribed landing site. Let the dynamic equations during the first-stage boost be

ẋ = f(x, u, t), x(0) = x_0 given, t ∈ [0, t_b].

The dynamics after first-stage burnout (t = t_b) are

ẋ_1 = f_1(x_1, u_1, t), ψ(x_1(t_f1), t_f1) = 0, t ∈ (t_b, t_f1],
ẋ_2 = f_2(x_2, u_2, t), ψ(x_2(t_f2), t_f2) = 0, t ∈ (t_b, t_f2].

The problem is to maximize

J = φ(x_2(t_f2))

subject to the above constraints. At burnout, the states are continuous, i.e., x(t_b⁻) = x_1(t_b⁺) = x_2(t_b⁺). Determine the first-order necessary conditions for a weak local minimum.

4. Consider the Bushaw problem (free final time) with penalty functions on the terminal states [11]:

min_{t_f, u} γ_1 x_1²(t_f) + γ_2 x_2²(t_f) + t_f,    (4.152)

where x_1(0) and x_2(0) are given. The state equations are

ẋ_1 = x_2,    (4.153)
ẋ_2 = u,     (4.154)
|u| ≤ 1.     (4.155)

Determine the optimal final time and the optimal control history. Show that the solution converges to the solution of the Bushaw problem as γ_1 → ∞ and γ_2 → ∞.

Chapter 5

Second-Order Local Optimality: The Linear Quadratic Control Problem

Introduction

Probably the most used result in optimal control theory is the solution to the linear quadratic (LQ) problem, where the dynamic equations and terminal constraints are linear and the performance criterion is a quadratic function of the state and control variables. The solution to this problem produces a control variable as a linear function of the state variables.
This solution forms the basis of modern control synthesis techniques because it produces controllers for multi-input/multi-output systems for both time-varying and time-invariant systems [11, 1, 34, 2, 42]. Furthermore, this problem also forms the basis of the accessory minimum problem in the calculus of variations [11, 22, 6, 26, 19, 32, 20]. The presentation here unites previous results (see [1, 8, 11, 20, 34], among oth- ers) through a derivation which makes explicit use of the symplectic property of Hamiltonian systems. In this way, our derivation of the solution to the LQ problem is very natural. 156 Chapter 5. LQ Control Problem The LQ problem is especially important for determining additional conditions for local minimality. In particular, the second-order term in the expansion of the augmented cost criterion can be converted into an LQ problem called the Accessory Problem in the Calculus of Variations. A significant portion of this chapter is devoted to determining the necessary and sufficient conditions for the second variation in the cost, evaluated along a local optimal path, to be positive. Conditions are also given for the second variation to be strongly positive. This means that the second variation dominates over all other terms in the expansion of the cost. 5.1 Second Variation: Motivation for the Analysis of the LQ Problem Let us return to our original nonlinear dynamical system with general performance index, terminal constraints, and free final time. The equation of motion is ˙ x = f(x, u, t), x(t 0 ) = x 0 given. (5.1) The problem is min u(t)∈U T ,t f J, J = φ(x(t f ), t f ) + t f t 0 L(x(t), u(t), t)dt, (5.2) subject to the constraint ψ(x(t f ), t f ) = 0. As in earlier chapters, we include the equations of motion in the performance index through the use of a vector λ() of Lagrange multiplier functions and we append the final state constraints using a vector ν of Lagrange multipliers. The augmented performance index becomes ˆ J = φ(x(t f ), t f ) + ν T ψ(x(t f ), t f ) + t f t 0 H(x(t), u(t), λ(t), t) −λ T (t) ˙ x dt, (5.3) where the Hamiltonian H(, , , ) is defined as usual as H(x, u, λ, t) = L(x, u, t) + λ T f(x, u, t). 5.1. Motivation of the LQ Problem 157 Assumption 5.1.1 The functions f(, , ), L(, , ), φ(, ), and ψ(, ) are all twice differentiable with respect to their arguments. Assumption 5.1.2 There exists a pair of vector-valued functions and a scalar pa- rameter (x o (), u o (), t o f ) that satisfy Equation (5.1) and minimize the performance index subject to the terminal constraints over all admissible functions u(). We assume that we have generated a locally minimizing path by satisfying the first-order necessary conditions, e.g., Theorem 4.5.1, which produce the extremal values (x o (), u o (), λ o (), ν o , t o f ). 
Letting the variations in the state, control, and mul- tipliers be defined as ∆x = x −x o , ∆λ = λ −λ o , ∆u = u −u o , ∆ν = ν −ν o , ∆t f = t f −t o f , (5.4) where ∆ means a total variation and requiring that u() be an element of the set of admissible control functions, we expand the augmented cost criterion in a Taylor series as ∆ ˆ J = ˆ J(u(), t f ; x 0 , λ(), ν) − ˆ J(u o (), t o f ; x 0 , λ o (), ν o ) = t o f t 0 ∆H −∆(λ T ˙ x) dt + ∆ φ + ν T ψ t f =t o f = t o f t 0 H x ∆x + H u ∆u −λ T ∆˙ x + (H λ − ˙ x) T ∆λ dt + φ x + ν T ψ x t f =t o f ∆x +∆ν T ψ[ t f =t o f + Ω[ t f =t o f ∆t f + 1 2 t o f t 0 ∆x T H xx ∆x + 2∆x T H xu ∆u +∆u T H uu ∆u + 2 ∆x T H xλ ∆λ + ∆λ T H λu ∆u −∆λ T ∆˙ x dt + 1 2 ∆x T ∆ν T ∆t f φ xx + (ν T ψ) xx ψ T x Ω T x ψ x 0 Ω T ν Ω x Ω ν dΩ dt ¸ ¸ ¸ t f =t o f ∆x ∆ν ∆t f ¸ ¸ +H.O.T., (5.5) 158 Chapter 5. LQ Control Problem where Ω is defined in (4.123) and H.O.T. denotes higher-order terms, that is, terms that include three or more variational values multiplied together. Remark 5.1.1 There is no term ∆λ T H λλ ∆λ because H is linear in λ by definition. Recall the first-order necessary conditions for optimality (essentially from Theo- rem 4.5.1): ˙ x = H λ , H u = 0, ψ(x o (t o f ), t o f ) = 0, ˙ λ T = −H x , λ T (t o f ) = φ x + ν T ψ x t=t o f , Ω(x o (t o f ), u o (t o f ), ν, t o f ) = 0. (5.6) These cause the first-order terms in the expansion to be zero. As we have shown for ∆x(), the variations in ∆λ(), ∆ν, and ∆t f can also be expanded as ∆x() = z() +O(, ), ∆λ() = ˜ λ() +O(, ), ∆ν = ˜ ν +O(), ∆t f = ∆, ∆u() = η(). (5.7) Using these expansions and integrating by parts where necessary, the variation in the performance index becomes ∆ ˆ J = λ T (t 0 )z(t 0 ) + 2 1 2 t o f t 0 z T η T ¸ H xx H xu H ux H uu ¸ z η + 2 ˜ λ T (f x z + f u η − ˙ z) dt + 2 1 2 z T ˜ ν T ∆ φ xx + (ν T ψ) xx ψ T x Ω T x ψ x 0 Ω T ν Ω x Ω ν dΩ dt ¸ ¸ ¸ t f =t o f z ˜ ν ∆ ¸ ¸ + t o f t 0 O(t, 2 )dt +O( 2 ). (5.8) 5.1. Motivation of the LQ Problem 159 Since in our original problem the initial state is given, it is clear that ∆x(t 0 ) = z(t 0 ) = 0, making the term involving it disappear. Letting →0, we know that lim →0 O(, 2 ) 2 = 0 (5.9) so that δ 2 ˆ J = lim →0 ∆ ˆ J 2 = 1 2 t o f t 0 z T η T ¸ H xx H xu H ux H uu ¸ z η + 2 ˜ λ T (f x z + f u η − ˙ z) dt + 1 2 z T ˜ ν T ∆ φ xx + (ν T ψ) xx ψ T x Ω T x ψ x 0 Ω T ν Ω x Ω ν dΩ dt ¸ ¸ ¸ t=t o f z ˜ ν ∆ ¸ ¸ . (5.10) This second variation of the cost criterion is required to be positive definite for all variations about the assumed locally minimizing path (x o (), u o (), t o f ). If the second variation can be made negative, then there is another path neighboring to the optimal path that will give a smaller value of the cost criterion. However, this would contradict the assumed optimality of (x o (), u o (), t o f ) which satisfies only the first-order necessary conditions and would be an extremal path. From the second variation, which is equivalent to an LQ problem, conditions for minimality can be determined. We first find conditions for the fixed-time second variation without ter- minal constraints to be positive definite in Section 5.4. Then, conditions for positivity of the fixed-time second variation are given in Section 5.5 with terminal constraints and, finally, in Section 5.8 for free-time and with terminal constraints. As commonly presented, the LQ problem is a fixed-time problem. Therefore, the terminally constrained LQ problem is now presented where ∆ = 0. We will return to the more general problem in Section 5.8. 
Recalling our definitions of Equation (5.7), 160 Chapter 5. LQ Control Problem we expand the equation of motion (5.1) as ˙ x() = ˙ x o () + ˙ z() +O(, ) = f(x, u, t). (5.11) Expanding f about the nominal trajectory gives ˙ x o () + ˙ z() +O(, ) = f(x o , u o , ) + f x (x o , u o , )(z +O(, )) +f u (x o , u o , )η + H.O.T.. Subtracting out the zero quantity ˙ x o () = f(x o , u o , ) gives ˙ z() = f x (x o , u o , )z() + f u (x o , u o , )η() +O(, 2 ), (5.12) where we have noted that all higher-order terms in the expansion include to the second or higher power. Dividing through by and again letting → 0, we get the equation ˙ z() = f x (x o , u o , )z() + f u (x o , u o , )η(), z(t 0 ) = 0. (5.13) The initial condition comes from the requirement that ∆x(t 0 ) = z(t 0 ) +O(t 0 , ) = 0. Consider Equations (5.10) and (5.13). We see that Equation (5.10) is precisely the augmented performance index we would have obtained for the problem min η(·) J(η()), (5.14) where J(η()) = 1 2 t f t 0 z T η T ¸ H xx H xu H ux H uu ¸ z η dt + 1 2 z T (φ xx + (ν T ψ) xx )z t=t f (5.15) subject to the equations of motion (5.13) with the terminal constraint ψ x (x o (t f ))z(t f ) = 0. (5.16) 5.2. Preliminaries and LQ Problem Formulation 161 Further, we note that if the nominal trajectory is truly minimizing, then the solution to this problem must be η() ≡ 0. Otherwise, there would exist some control variation η() and resulting variation z() in the state history for which the second term in the expansion of the performance index would be less than that for the nominal trajectory. Therefore, the performance index as a whole would be less than that of the nominal trajectory. The minimization of Equation (5.15) with respect to Equations (5.13) and (5.16) is known as the Accessory Minimum Problem in the Calculus of Varia- tions and is formulated as an LQ problem. In the next section the LQ problem is stated with a simpler notation but follows from the second variation developed in this section. 5.2 Preliminaries and LQ Problem Formulation The problem of minimizing the quadratic performance criterion J(u(); x(t 0 ), t 0 ) = 1 2 x T (t f )S f x(t f ) + 1 2 t f t 0 [x T (t)Q(t)x(t) + 2u T (t)C(t)x(t) +u T (t)R(t)u(t)]dt (5.17) with respect to u() and where t ∈ [t 0 , t f ], x(t) ∈ R n , and u(t) ∈ R m , subject to the linear dynamic constraint ˙ x(t) = A(t)x(t) + B(t)u(t), (5.18) with initial condition x(t 0 ) = x 0 (5.19) and terminal constraints Dx(t f ) = 0, (5.20) 162 Chapter 5. LQ Control Problem where D is a p n matrix that will be studied in detail. The matrices Q(t), C(t), R(t), A(t), and B(t) are assumed to be piecewise continuous functions of time, and without loss of generality, Q(t) = Q T (t), R(t) = R T (t), and S f = S T f . Remark 5.2.1 Relate the quadratic cost criterion of Equation (5.15) with Equa- tion (5.17), the linear dynamics of Equation (5.13) with Equation (5.18), and the linear terminal constraint of Equation (5.16) with Equation (5.20). Therefore, the solution to the LQ problem of this section is the solution to the second variation or accessory minimum problem, when setting x 0 = 0. Assumption 5.2.1 The matrix R(t) > 0 and bounded for all t in the interval t 0 ≤ t ≤ t f . The implication of relaxing this assumption to positive semidefinite R is discussed in Section 5.4.6. Assumption 5.2.2 The control function u() belongs to the class | of piecewise continuous m-vector functions of t in the interval [t 0 , t f ]. 
Initially, no additional restrictions are required for Q(t) and S f other than sym- metry. However, in later sections, special but important results are obtained by requiring that Q(t) and S f be at least positive semidefinite. Furthermore, the initial time, t 0 , on occasion throughout this chapter, is considered to be a variable and not a fixed value. 5.3 First-Order Necessary Conditions for Optimality In this section the first variation of the LQ problem is established. To include the dynamic and terminal constraints explicitly in the cost criterion, J (u(); x(t 0 ), t 0 ) (given by (5.17)) is augmented by adjoining (5.18) by means of a continuously 5.3. First-Order Necessary Conditions for Optimality 163 differentiable n-vector function of time λ() and (5.20) by means of a p-vector, ν, as ˆ J (u(); λ(), ν, x 0 , t 0 ) = J (u(); x 0 , t 0 ) + t f t 0 λ T (t) [A(t)x(t) + B(t)u(t) − ˙ x(t)] + ν T Dx(t f ). (5.21) Note that ˆ J (u(); λ(), ν, x 0 , t 0 ) = J (u(); x 0 , t 0 ) (5.22) when (5.18) and (5.20) hold. For convenience, define the variational Hamiltonian as H(x(t), u(t), λ(t), t) = 1 2 x T (t)Q(t)x(t) + 2u T (t)C(t)x(t) + u T (t)R(t)u(t) + λ T (t) [A(t)x(t) + B(t)u(t)] . (5.23) Integration of (5.21) by parts and using (5.23) ˆ J (u(); λ(), ν, x 0 , t 0 ) = t f t 0 H(x(t), λ(t)u(t), t) + ˙ λ T (t)x(t) dt + λ T (t 0 )x 0 −λ T (t f )x(t f ) + 1 2 x T (t f )S f x(t f ) + ν T Dx(t f ). (5.24) Suppose there is a control u o () ∈ | that minimizes (5.17) and causes Dx o (t f ) = 0. (5.25) The objective is to determine necessary conditions for which the cost is a minimum. This is done by evaluating changes in ˆ J brought about by changing u o () to u() ∈ |. Denote the change in the control as δu(t) = u(t) −u o (t) and the resulting change in the state as δx(t) = x(t) −x o (t). Following the methodology laid out in section 4.4, we require conditions that guarantee that the change in ˆ J is nonnegative for all admissible variations. First, the variations are limited to those for which δx remains small, i.e., | δx(t) |≤ ¯ ε for all t ∈ [t 0 , t f ]. However, strong variations in δu are 164 Chapter 5. LQ Control Problem allowed. For example, δu() can be chosen as δu(t) = ε(t)η(t), (5.26) where η(t) ∈ | and ε(t) = 1, t i ≤ t ≤ t i + εδ i , i = 1, . . . , n, ε, t ∈ I = [t 0 , t f ] − i [t i , t i + εδ i ] , (5.27) where ε > 0 is sufficiently small and ¦t i , t i + εδ i ¦ ∈ [t 0 , t f ] are arbitrary with small combined length. Therefore, ∆ ˆ J = ∆ ˆ J(u(), u o (); λ(), ν, x 0 , t 0 ) = ˆ J (u(); λ(), ν, x 0 , t 0 ) − ˆ J (u o (); λ(), ν, x 0 , t 0 ) = t f t 0 x o T (t)Q(t)δx(t) + u o T (t)C(t)δx(t) + x o T (t)C T (t)δu(t) + 1 2 δu T (t)R(t)δu(t) + u o T (t)R(t)δu(t) (5.28) + ˙ λ T (t)δx(t) + λ T (t)A(t)δx(t) + λ T (t)B(t)δu(t) dt −λ T (t f )δx(t f ) + x o T (t f )S f δx(t f ) + ν T Dδx(t f ) + t f t 0 O(t; ε)dt +O(ε), where the function O(t; ε) is piecewise continuous in t and O(t; ε) ε →0 as ε →0 for each t. (5.29) Now set − ˙ λ T (t) = x o T (t)Q(t) + u o T C(t) + λ T (t)A(t), λ T (t f ) = x o T S f + ν T D. (5.30) For fixed ν this is a legitimate choice for λ(), since (5.30) is a linear differential equation in λ(t) with continuous coefficients, having a unique solution [15]. 5.3. First-Order Necessary Conditions for Optimality 165 For ε sufficiently small, (5.28), having substituted in (5.30), is rewritten using (5.23) as ∆ ˆ J = t f t 0 ¸ 1 2 δu T (t)R(t)δu(t) + H u (x o (t), u o (t), λ(t), t)δu(t) dt + t f t 0 O(t; ε)dt +O(ε). 
(5.31) Since for t ∈ [t 0 , t f ] H uu = R(t) > 0, the strong form of the classical Legendre–Clebsch condition, the variation in the cost function ∆ ˆ J can be reduced if the second term is made negative. This is easily done by choosing η(t) ∈ | as η(t) = −R(t) −1 H u (x o (t), λ(t), u o (t), t) T . (5.32) Note that in arbitrary small time intervals where ε(t) = 1, this choice minimizes the integral in (5.31). This choice of η is particularly significant since it can be shown under an addi- tional assumption that there exists a ν such that η() given by (5.32) causes (5.25) to hold. (Also see Section 4.3.1.) Assumption 5.3.1 The p p matrix ¯ W(t 0 , t f ) is positive definite where ¯ W(t 0 , t f ) = t f t 0 DΦ(t f , t)B(t)R −1 (t)B T (t)Φ T (t f , t)D T dt. (5.33) Remark 5.3.1 This assumption is equivalent to Assumption 4.2.3 and becomes the controllability condition [8] when p = n and D has rank n. From the linearity of (5.18), z(t; η()) satisfies the equation ˙ z(t; η()) = A(t)z(t; η()) + B(t)η(t), z(t 0 ; η()) = 0 (5.34) so that z(t f ; η()) = t f t 0 Φ(t f , t)B(t)η(t)dt. (5.35) 166 Chapter 5. LQ Control Problem By using (5.23) and (5.32) z(t f ; η()) = − t f t 0 Φ(t f , t)B(t)R −1 (t) R(t)u o (t) + C(t)x o (t) + B T (t)λ(t) dt. (5.36) The linear forced equation (5.30) is solved as λ(t) = Φ T (t f , t)S f x o (t f ) + Φ T (t f , t)D T ν + t f t Φ T (τ, t) Q(τ)x o (τ) + C T (τ)u o (τ) dτ, (5.37) and then, using (5.37) in (5.36), an equation explicit in ν results. Premultiplying (5.36) by D leads to the desired equation that satisfies the terminal constraints in the presence of variations in state and control as Dz(t f ; η()) = − t f t 0 DΦ(t f , t)B(t)R −1 (t) ¸ R(t)u o (t) + C(t)x o (t) + B T (t)Φ T (t f , t)S f x o (t f ) +B T (t) t f t Φ T (τ, t) Q(τ)x o (τ) + C T (τ)u o (τ) dτ dt − ¸ t f t 0 DΦ(t f , t)B(t)R(t) −1 B T (t)Φ T (t f , t)D T dt ν = 0. (5.38) By Assumption 5.3.1, a unique value of ν can be obtained, independent of ε, which satisfies the constraints (5.20). Consequently, with this choice of η(), (5.22) holds, and the change in J is 7 ∆J = − I n 1 2 |H u | 2 R(t) −1dt −ε I [[H u [[ 2 R(t) −1dt + t f t 0 O(t; ε)dt +O(ε), (5.39) where I n = i [t i , t i + εδ i ] and the intervals over which the integrals are taken are given in (5.27). Note that O(t; ε) includes all higher-order variations such that (5.29) holds. Since the intervals are arbitrary, the control variations are not restricted to 7 [[H u [[ 2 R(t) −1 is the norm square of H u weighted by R(t) −1 , i.e., [[H u [[ 2 R(t) −1 = H T u R(t) −1 H u . 5.3. First-Order Necessary Conditions for Optimality 167 only small variations. Therefore, a necessary condition for the variation in the cost criterion, ∆J, to be nonnegative for arbitrary strong variations, δu, is that H u (x o (t), λ(t), u o (t), λ(t), t) = 0. (5.40) The above results are summarized in the following theorem. Theorem 5.3.1 Suppose that Assumptions 5.2.1, 5.2.2, and 5.3.1 are satisfied. Then the necessary conditions for ∆J to be nonnegative to first order for strong perturba- tions in the control (5.26) are that ˙ x o (t) = A(t)x o (t) + B(t)u o (t), x o (t 0 ) = x 0 , (5.41) ˙ λ(t) = −A T (t)λ(t) −Q(t)x o (t) −C T (t)u o (t), (5.42) λ(t f ) = S f x o (t f ) + D T ν, (5.43) 0 = Dx o (t), (5.44) 0 = R(t)u o (t) + C(t)x o (t) + B T (t)λ(t). (5.45) The remaining difficulty resides with the convexity of the cost associated with the neglected second-order terms. 
The objective of the remaining sections is to give necessary and sufficient conditions for optimality and to understand more deeply the character of the optimal solution when it exists. We have derived first-order necessary conditions for the minimization of the quadratic cost criterion. These necessary con- ditions form a two-point boundary-value problem in which the boundaries are linearly related through transition matrices. It will be shown that the Lagrange multipliers are linearly related to the state variables. Furthermore, the optimal cost criterion is shown to be represented by a quadratic function of the state. This form is reminiscent of the optimal value function used in the H-J-B theory of Chapter 3 to solve the LQ problem, and here we show how to extend the LQ problem to terminal constraints 168 Chapter 5. LQ Control Problem and free terminal time for the H-J-B theory given in Chapter 4. In this way we relate the Lagrange multiplier to the derivative of the optimal value function with respect to the state as given explicitly in Section 3.5 for terminally unconstrained optimization problems. This generalization of the optimal value function to include terminal constraints and free terminal time implies that the optimal solution to this general formulation of the LQ problem is not just a local minimum but a global minimum. 5.4 LQ Problem without Terminal Constraints: Transition Matrix Approach By applying the first-order necessary conditions of Theorem 5.3.1 to the problem of Equations (5.17) through (5.19), the resulting necessary conditions for the uncon- strained terminal problem are given by (5.41) to (5.45), where in (5.43) D = 0. By Assumption 5.2.1, the extremal u o (t) can be determined from (5.45) as a function of x o (t) and λ(t) as u o (t) = −R −1 (t) C(t)x o (t) + B T (t)λ(t) . (5.46) Substitution of u o (t) given by (5.46) into (5.41) and (5.42) results in the linear ho- mogeneous 2n-vector differential equation ¸ ˙ x o (t) ˙ λ(t) = ¸ A(t) −B(t)R −1 (t)C(t) −B(t)R −1 (t)B T (t) −Q(t) + C T (t)R −1 (t)C(t) −(A(t) −B(t)R −1 (t)C(t)) T ¸ x o (t) λ(t) . (5.47) Equation (5.47) is solved as a two-point boundary-value problem where n conditions in (5.41) are given at the initial time and n conditions are specified at the final time as λ(t f ) = S f x o (t f ), i.e., (5.43) with D = 0. 5.4. Transition Matrix Approach with No Terminal Constraints 169 For convenience, define the Hamiltonian matrix as H(t) = ¸ A(t) −B(t)R −1 (t)C(t) −B(t)R −1 (t)B T (t) −Q(t) + C T (t)R −1 (t)C(t) −(A(t) −B(t)R −1 (t)C(t)) T ¸ (5.48) and the transition matrix (see Section A.3) associated with the solution of d dt Φ H (t, τ) = ˙ Φ H (t, τ) = H(t)Φ H (t, τ) (5.49) in block-partitioned form is Φ H (t, τ) = ¸ Φ 11 (t, τ) Φ 12 (t, τ) Φ 21 (t, τ) Φ 22 (t, τ) ¸ , (5.50) where t is the output (or solution) time and τ is the input (or initial) time. Using this block-partitioned transition matrix, the solution to (5.47) is represented as ¸ x o (t f ) λ(t f ) ¸ = ¸ x o (t f ) S f x o (t f ) ¸ = ¸ Φ 11 (t f , t 0 ) Φ 12 (t f , t 0 ) Φ 21 (t f , t 0 ) Φ 22 (t f , t 0 ) ¸¸ x o (t 0 ) λ(t 0 ) ¸ . (5.51) The objective is to obtain a unique relation between λ(t 0 ) and x o (t 0 ). From (5.51), the first matrix equation gives x o (t f ) = Φ 11 (t f , t 0 )x o (t 0 ) + Φ 12 (t f , t 0 )λ(t 0 ). (5.52) The second matrix equation of (5.51), using (5.52) to eliminate x o (t f ), becomes S f [Φ 11 (t f , t 0 )x o (t 0 ) + Φ 12 (t f , t 0 )λ(t 0 )] = Φ 21 (t f , t 0 )x o (t 0 ) + Φ 22 (t f , t 0 )λ(t 0 ). 
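Before proceeding, a small numerical illustration may help. The following is our own scalar sketch (q, S_f, and t_f are assumed values), comparing S(t_f, t; S_f) built from the transition matrix as in (5.55) against the backward Riccati integration derived in Section 5.4.2 below.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Scalar check of (5.55) for xdot = u with L = (q x^2 + u^2)/2, i.e.,
# A = 0, B = 1, C = 0, Q = q, R = 1. Values of q, sf, tf are assumptions.
q, sf, tf = 1.0, 0.5, 2.0
H = np.array([[0.0, -1.0],
              [-q,   0.0]])          # Hamiltonian matrix (5.48) for this data

def S_from_transition(t):
    Phi = expm(H * (tf - t))          # Phi_H(tf, t); expm applies since H is constant
    return ((Phi[1, 0] - sf * Phi[0, 0]) /
            (sf * Phi[0, 1] - Phi[1, 1]))     # scalar form of (5.55)

# Backward Riccati equation (5.78) for this data: ds/dt = s^2 - q, s(tf) = sf.
sol = solve_ivp(lambda t, s: s ** 2 - q, (tf, 0.0), [sf],
                dense_output=True, rtol=1e-10, atol=1e-12)
for t in (0.0, 1.0, 1.9):
    print(t, S_from_transition(t), sol.sol(t)[0])   # the two columns should agree
```

For this data the feedback (5.58) reduces to u°(t) = −S(t_f, t; S_f)x°(t), and the two constructions of S agree to integration tolerance.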
(5.53) By solving for λ(t 0 ), assuming the necessary matrix inverse exists, λ(t 0 ) = [S f Φ 12 (t f , t 0 ) −Φ 22 (t f , t 0 )] −1 [Φ 21 (t f , t 0 ) −S f Φ 11 (t f , t 0 )]x o (t 0 ). (5.54) The invertibility of this matrix is crucial to the problem and is discussed in detail in the following sections. The result (5.54) is of central importance to the LQ theory 170 Chapter 5. LQ Control Problem because this allows the optimal control (5.46) to be expressed as an explicit linear function of the state. For convenience, define S(t f , t; S f ) = [S f Φ 12 (t f , t) −Φ 22 (t f , t)] −1 [Φ 21 (t f , t) −S f Φ 11 (t f , t)]. (5.55) If the transition matrix is evaluated over the interval [t 0 , t], where t 0 ≤ t ≤ t f , then ¸ x o (t) λ(t) ¸ = Φ H (t, t 0 ) ¸ I S(t f , t 0 ; S f ) ¸ x o (t 0 ). (5.56) Substitution of (5.56) into (5.46) results in the general optimal control rule u o (t) = −R −1 (t) C(t) B T (t) Φ H (t, t 0 ) ¸ I S(t f , t 0 ; S f ) ¸ x o (t 0 ). (5.57) This control rule can be interpreted as a sampled data controller if x o (t 0 ) is the state measured at the last sample time t 0 and t, the present time, lies within the interval [t 0 , t 0 + ∆], where ∆ is the time between samples. If t 0 is considered to be the present time, t, then (5.57) reduces to the optimal control rule u o (t) = −R(t) −1 [C(t) + B T (t)S(t f , t; S f )]x o (t). (5.58) This linear control rule forms the basis for LQ control synthesis. Our objective is to understand the properties of this control rule. More precisely, the character of S(t f , t; S f ) and the transition matrix Φ H (t, t 0 ) are to be studied. Beginning in the next subsection, some properties peculiar to Hamiltonian systems are presented. 5.4.1 Symplectic Properties of the Transition Matrix of Hamiltonian Systems The transition matrix of the Hamiltonian system [37, 46] has the useful symplectic property, defined as Φ H (t, t 0 )JΦ T H (t, t 0 ) = J, (5.59) 5.4. Transition Matrix Approach with No Terminal Constraints 171 where J = ¸ 0 I −I 0 (5.60) is known as the fundamental symplectic matrix. To show that the transition matrix of our Hamiltonian system satisfies (5.59), we time-differentiate (5.59) as ˙ Φ H (t, t 0 )JΦ T H (t, t 0 ) + Φ H (t, t 0 )J ˙ Φ T H (t, t 0 ) = 0. (5.61) Substitution of the differential equation for Φ H (t, t 0 ) (5.49) into (5.61) results in H(t)Φ H (t, t 0 )JΦ T H (t, t 0 ) + Φ H (t, t 0 )JΦ T H (t, t 0 )H T (t) = 0. (5.62) By using (5.59), (5.62) reduces to H(t)J + JH T (t) = 0, (5.63) where H(t) defined by (5.48) clearly satisfies (5.63). To obtain (5.59) by starting with the transpose of (5.63), multiplying it on the left by Φ H (t, t 0 ) and on the right by Φ T H (t, t 0 ), and then substituting in (5.49), we obtain the exact differential d dt Φ H (t, t 0 )J T Φ T H (t, t 0 ) = 0. (5.64) The integral equals a constant matrix which when evaluated at t 0 is J T . This results in the form Φ T H (t, t 0 )J T Φ H (t, t 0 ) = J T . (5.65) Taking the transpose of (5.65), multiplying it on the left by −Φ H (t, t 0 )J and on the right by −Φ −1 H (t, t 0 )J, and using JJ = −I, we obtain (5.59). 172 Chapter 5. LQ Control Problem We now consider the spectral properties of Φ H (t, t 0 ). Note that since J T = J −1 , then from (5.59) Φ −1 H (t, t 0 ) = J T Φ T H (t, t 0 )J. (5.66) Furthermore, the characteristic equations of Φ H (t, t 0 ) and Φ H (t, t 0 ) −1 are the same since det(Φ −1 H (t, t 0 ) −λI) = det(J T Φ T H (t, t 0 )J −λI) = det(J T Φ T H (t, t 0 )J −λJ T J) = det(Φ H (t, t 0 ) −λI). 
We now consider the spectral properties of $\Phi_H(t,t_0)$. Note that since $J^T = J^{-1}$, then from (5.59)
\[
\Phi_H^{-1}(t,t_0) = J^T\Phi_H^T(t,t_0)J. \tag{5.66}
\]
Furthermore, the characteristic equations of $\Phi_H(t,t_0)$ and $\Phi_H^{-1}(t,t_0)$ are the same, since
\[
\det(\Phi_H^{-1}(t,t_0) - \lambda I) = \det(J^T\Phi_H^T(t,t_0)J - \lambda I) = \det(J^T\Phi_H^T(t,t_0)J - \lambda J^TJ) = \det(\Phi_H(t,t_0) - \lambda I). \tag{5.67}
\]
The implication of this is that since $\Phi_H(t,t_0)$ is a $2n \times 2n$ nonsingular matrix, if $\mu_i$, $i = 1,\ldots,n$, are $n$ eigenvalues of $\Phi_H(t,t_0)$ with $\mu_i \ne 0$ for all $i$, then the remaining $n$ eigenvalues are
\[
\mu_{i+n} = \frac{1}{\mu_i}, \qquad i = 1,\ldots,n. \tag{5.68}
\]
By partitioning $\Phi_H(t,t_0)$ as in (5.50), the following relations are obtained from (5.59):
\[
\Phi_{11}(t,t_0)\Phi_{12}^T(t,t_0) = \Phi_{12}(t,t_0)\Phi_{11}^T(t,t_0), \tag{5.69}
\]
\[
\Phi_{21}(t,t_0)\Phi_{22}^T(t,t_0) = \Phi_{22}(t,t_0)\Phi_{21}^T(t,t_0), \tag{5.70}
\]
\[
\Phi_{11}(t,t_0)\Phi_{22}^T(t,t_0) - \Phi_{12}(t,t_0)\Phi_{21}^T(t,t_0) = I. \tag{5.71}
\]
These identities will be used later.

5.4.2 Riccati Matrix Differential Equation

Instead of forming $S(t_f,t;S_f)$ in (5.55) by calculating the transition matrix, in this section we show that $S(t_f,t;S_f)$ can be propagated directly by a quadratic matrix differential equation called the matrix Riccati equation. This equation plays a key role in all the analyses of the following sections.

A few preliminaries will help simplify this derivation as well as others in the coming sections. First, note that since $\Phi_H(t,\tau)$ is symplectic,
\[
\bar{\Phi}_H(t,\tau) = L\Phi_H(t,\tau) \tag{5.72}
\]
with the symplectic matrix
\[
L = \begin{bmatrix} I & 0 \\ -S_f & I \end{bmatrix} \tag{5.73}
\]
is also a symplectic matrix. Therefore, the partitioned form of $\bar{\Phi}_H(t,\tau)$ satisfies (5.69) to (5.71). Second, by using the propagation equation (5.49) for the transition matrix, differentiation of the identity
\[
\Phi_H(t_f,t)\Phi_H(t,t_0) = \Phi_H(t_f,t_0) \tag{5.74}
\]
with respect to $t$, where $t_f$ and $t_0$ are two fixed times, gives the adjoint form for propagating the transition matrix as
\[
\frac{d}{dt}\Phi_H(t_f,t) = -\Phi_H(t_f,t)H(t), \qquad \Phi_H(t_f,t_f) = I, \tag{5.75}
\]
where the input time is the independent variable and the output time is fixed. Therefore,
\[
\frac{d}{dt}\bar{\Phi}_H(t_f,t) = -\bar{\Phi}_H(t_f,t)H(t), \qquad \bar{\Phi}_H(t_f,t_f) = L. \tag{5.76}
\]
Finally, note from (5.55) and (5.72) that
\[
S(t_f,t;S_f) = -\bar{\Phi}_{22}^{-1}(t_f,t)\bar{\Phi}_{21}(t_f,t). \tag{5.77}
\]

Theorem 5.4.1 Let $\Phi_H(t_f,t)$ be the transition matrix for the dynamic system (5.47) with partitioning given by (5.50). Then a symmetric matrix $S(t_f,t;S_f)$ satisfies
\[
\frac{d}{dt}S(t_f,t;S_f) = -\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T S(t_f,t;S_f) - S(t_f,t;S_f)\left(A(t) - B(t)R^{-1}(t)C(t)\right) - \left(Q(t) - C^T(t)R^{-1}(t)C(t)\right) + S(t_f,t;S_f)B(t)R^{-1}(t)B^T(t)S(t_f,t;S_f), \qquad S(t_f,t_f;S_f) = S_f, \tag{5.78}
\]
if the inverse in (5.77) exists and $S_f$ is symmetric.

Proof: If (5.77) is rewritten as
\[
-\bar{\Phi}_{21}(t_f,t) = \bar{\Phi}_{22}(t_f,t)S(t_f,t;S_f) \tag{5.79}
\]
and differentiated with respect to $t$, then
\[
-\frac{d}{dt}\bar{\Phi}_{21}(t_f,t) = \left[\frac{d}{dt}\bar{\Phi}_{22}(t_f,t)\right]S(t_f,t;S_f) + \bar{\Phi}_{22}(t_f,t)\frac{d}{dt}S(t_f,t;S_f). \tag{5.80}
\]
The derivatives of $\bar{\Phi}_{21}(t_f,t)$ and $\bar{\Phi}_{22}(t_f,t)$ are obtained from the partitioning of (5.76) as
\[
\bar{\Phi}_{21}(t_f,t)\left(A(t) - B(t)R^{-1}(t)C(t)\right) - \bar{\Phi}_{22}(t_f,t)\left(Q(t) - C^T(t)R^{-1}(t)C(t)\right) = \left[\bar{\Phi}_{21}(t_f,t)B(t)R^{-1}(t)B^T(t) + \bar{\Phi}_{22}(t_f,t)\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T\right]S(t_f,t;S_f) + \bar{\Phi}_{22}(t_f,t)\frac{dS(t_f,t;S_f)}{dt}. \tag{5.81}
\]
Since $\bar{\Phi}_{22}(t_f,t)$ is assumed invertible, premultiplying by $\bar{\Phi}_{22}^{-1}(t_f,t)$ and using (5.77) yields the Riccati equation (5.78). Since $S_f$ is symmetric and $\frac{d}{dt}S(t_f,t;S_f) = \dot{S}(t_f,t;S_f) = \dot{S}^T(t_f,t;S_f)$ along the solution, $S(t_f,t;S_f)$ will be symmetric.
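As a numerical cross-check of Theorem 5.4.1 (again a sketch continuing the earlier one, with the same illustrative matrices), the Riccati equation (5.78) can be integrated backward from $S_f$ and compared with the transition-matrix construction (5.55):

```python
# Integrate the matrix Riccati equation (5.78) backward from S(tf, tf; Sf) = Sf.
def riccati_rhs(t, s_flat):
    S = s_flat.reshape(2, 2)
    dS = (-Abar.T @ S - S @ Abar - (Q - C.T @ Ri @ C)
          + S @ B @ Ri @ B.T @ S)
    return dS.ravel()

sol_S = solve_ivp(riccati_rhs, (tf, t0), Sf.ravel(), rtol=1e-10, atol=1e-12)
S_t0 = sol_S.y[:, -1].reshape(2, 2)
# S_t0 agrees (to integration tolerance) with S0 built from (5.55) above,
# and it stays symmetric, as the theorem asserts.
assert np.allclose(S_t0, S_t0.T, atol=1e-8) and np.allclose(S_t0, S0, atol=1e-6)
```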
Remark 5.4.1 By the symplectic property (5.70), using (5.77),
\[
S(t_f,t;S_f) = -\bar{\Phi}_{22}^{-1}(t_f,t)\bar{\Phi}_{21}(t_f,t) = -\bar{\Phi}_{21}^T(t_f,t)\bar{\Phi}_{22}^{-T}(t_f,t) \tag{5.82}
\]
implies that $S(t_f,t;S_f)$ is symmetric, without directly using the Riccati differential equation (5.78).

Remark 5.4.2 Since the boundary condition on $\bar{\Phi}_{22}(t_f,t_f)$ is the identity matrix, a finite interval of time is needed before $\bar{\Phi}_{22}(t_f,t)$ may no longer be invertible and the control law of (5.58) would no longer be meaningful. In the classical calculus of variations literature this is the focal point condition or Jacobi condition [22].

In the next section, it is shown that if the Riccati variable $S(t_f,t;S_f)$ exists, then the Hamiltonian system can be transformed into another similar Hamiltonian system for which the feedback law appears directly in the dynamic system.

5.4.3 Canonical Transformation of the Hamiltonian System

Some additional insight into the character of this Hamiltonian system is obtained by a canonical similarity transformation of the variables $x(t)$ and $\lambda(t)$ into a new set of variables $x^o(t)$ and $\bar{\lambda}(t)$. A transformation $L(t)$ for Hamiltonian systems is said to be canonical if it satisfies the symplectic property
\[
L(t)JL^T(t) = J, \tag{5.83}
\]
where $J$ is defined in (5.60). The canonical transformation that produces the desired result is
\[
L(t) = \begin{bmatrix} I & 0 \\ -S(t_f,t;S_f) & I \end{bmatrix} \tag{5.84}
\]
such that
\[
\begin{bmatrix} x^o(t) \\ \bar{\lambda}(t) \end{bmatrix} = \begin{bmatrix} I & 0 \\ -S(t_f,t;S_f) & I \end{bmatrix}\begin{bmatrix} x^o(t) \\ \lambda(t) \end{bmatrix}. \tag{5.85}
\]
$L(t)$ is a canonical transformation since $S(t_f,t;S_f)$ is symmetric. Note that the state variables are not being transformed. The propagation equation for the new variables is obtained by differentiating (5.85) and using (5.47) and (5.78) as
\[
\begin{bmatrix} \dot{x}^o(t) \\ \dot{\bar{\lambda}}(t) \end{bmatrix} =
\begin{bmatrix}
A - BR^{-1}(C + B^TS) & -BR^{-1}B^T \\
0 & -\left(A - BR^{-1}(C + B^TS)\right)^T
\end{bmatrix}
\begin{bmatrix} x^o(t) \\ \bar{\lambda}(t) \end{bmatrix},
\qquad
\begin{bmatrix} x^o(t_f) \\ \bar{\lambda}(t_f) \end{bmatrix} = L(t_f)\begin{bmatrix} x^o(t_f) \\ S_f x^o(t_f) \end{bmatrix}, \tag{5.86}
\]
where use is made of the inverse of $L(t)$:
\[
L^{-1}(t) = \begin{bmatrix} I & 0 \\ S(t_f,t;S_f) & I \end{bmatrix}. \tag{5.87}
\]
The zero matrix in the coefficient matrix of (5.86) is a direct consequence of $S(t_f,t;S_f)$ satisfying the Riccati equation (5.78). Note from the boundary condition of (5.86) that
\[
\bar{\lambda}(t_f) = \left[S_f - S(t_f,t_f;S_f)\right]x^o(t_f). \tag{5.88}
\]
However, since $S(t_f,t_f;S_f) = S_f$,
\[
\bar{\lambda}(t_f) = 0. \tag{5.89}
\]
Observe in (5.86) that $\bar{\lambda}(t)$ is propagated by a homogeneous differential equation, which, by (5.89), has the trivial solution $\bar{\lambda}(t) = 0$ for all $t$ in the interval $[t_0,t_f]$. Therefore, (5.86) produces the differential equation for the state as
\[
\dot{x}^o(t) = \left[A(t) - B(t)R^{-1}(t)\left(C(t) + B^T(t)S(t_f,t;S_f)\right)\right]x^o(t), \qquad x^o(t_0) = x_0, \tag{5.90}
\]
which is the dynamic equation using the optimal control rule (5.58). The existence of the Riccati variable is necessary for the existence of the linear control rule and the above canonical transformation. In the next section it is shown that the existence of the Riccati variable $S(t_f,t;S_f)$ is a necessary and sufficient condition for the quadratic cost criterion to be positive definite, i.e., it is a necessary and sufficient condition for the linear controller to be the minimizing controller.
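Before moving on, the closed-loop equation (5.90) can be simulated directly once $S(t_f,t;S_f)$ is stored on a grid. A rough sketch, continuing the same illustrative example (the nearest-grid lookup of $S$ is a simplification chosen for brevity, not a recommendation):

```python
# Simulate the closed-loop dynamics (5.90) under the optimal rule (5.58).
ts = np.linspace(t0, tf, 201)
grid = solve_ivp(riccati_rhs, (tf, t0), Sf.ravel(),
                 t_eval=ts[::-1], rtol=1e-10, atol=1e-12)

def closed_loop(t, x):
    i = np.argmin(np.abs(grid.t - t))         # nearest stored S(tf, t; Sf)
    S = grid.y[:, i].reshape(2, 2)
    u = -Ri @ (C + B.T @ S) @ x               # optimal control rule (5.58)
    return A @ x + B @ u

x_traj = solve_ivp(closed_loop, (t0, tf), np.array([1.0, 0.0]), t_eval=ts)
```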
5.4.4 Necessary and Sufficient Condition for the Positivity of the Quadratic Cost Criterion

We show here that the quadratic cost criterion is actually positive for all controls that are not null when the initial condition is $x(t_0) = 0$. The conditions obtained here are applicable to the second variation of Section 5.1 being positive definite for the optimization problem without terminal constraints and fixed terminal time. The essential assumption, besides $R(t) > 0$, is complete controllability. The main ideas in this section were given in [20] and [4].

Assumption 5.4.1 The dynamic system (5.18) is controllable on any interval $[t,t']$ where $t_0 \le t < t' \le t_f$ (completely controllable). If the system is completely controllable, then a bounded control
\[
\bar{u}(t) = B^T(t)\Phi_A^T(t',t)W_A^{-1}(t',t_0)x(t') \tag{5.91}
\]
always exists which transfers the state $x(t_0) = 0$ to any desired state $x(t')$ at $t = t'$, where $W_A(t',t_0)$, the controllability Grammian matrix, is
\[
W_A(t',t_0) = \int_{t_0}^{t'}\Phi_A(t',t)B(t)B^T(t)\Phi_A^T(t',t)\,dt > 0 \tag{5.92}
\]
and
\[
\frac{d}{dt}\Phi_A(t,\sigma) = A(t)\Phi_A(t,\sigma), \qquad \Phi_A(\sigma,\sigma) = I, \tag{5.93}
\]
for all $t'$ in $(t_0,t_f]$.

Definition 5.4.1 $J(u(\cdot);0,t_0)$ is said to be positive definite if for each $u(\cdot)$ in $\mathcal{U}$, $u(\cdot) \ne 0$ (the null function), $J(u(\cdot);0,t_0) > 0$.

We show first that if for all $t$ in $[t_0,t_f]$ there exists an $S(t_f,t;S_f)$ which satisfies (5.78), then $J(u(\cdot);0,t_0)$ is positive definite. Consider adding to $J(u(\cdot);x(t_0),t_0)$ of (5.17) the identically zero quantity
\[
\frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) - \frac{1}{2}x^T(t_f)S(t_f,t_f;S_f)x(t_f) + \int_{t_0}^{t_f}\frac{1}{2}\frac{d}{dt}\left[x^T(t)S(t_f,t;S_f)x(t)\right]dt = 0. \tag{5.94}
\]
This can be rewritten using the Riccati equation (5.78) and the dynamics (5.18) as
\[
\frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) - \frac{1}{2}x^T(t_f)S_f x(t_f) + \int_{t_0}^{t_f}\Big\{-\frac{1}{2}x^T(t)\Big[Q(t) + S(t_f,t;S_f)A(t) + A^T(t)S(t_f,t;S_f) - \left(C(t) + B^T(t)S(t_f,t;S_f)\right)^T R^{-1}(t)\left(C(t) + B^T(t)S(t_f,t;S_f)\right)\Big]x(t) + \frac{1}{2}x^T(t)\left[S(t_f,t;S_f)A(t) + A^T(t)S(t_f,t;S_f)\right]x(t) + x^T(t)S(t_f,t;S_f)B(t)u(t)\Big\}dt = 0. \tag{5.95}
\]
If (5.95) is added to (5.17), the integrand can be manipulated into a perfect square and the cost takes the form
\[
J(u(\cdot);x(t_0),t_0) = \frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) + \frac{1}{2}\int_{t_0}^{t_f}\left[u(t) + R^{-1}(t)\left(C(t) + B^T(t)S(t_f,t;S_f)\right)x(t)\right]^T R(t)\left[u(t) + R^{-1}(t)\left(C(t) + B^T(t)S(t_f,t;S_f)\right)x(t)\right]dt. \tag{5.96}
\]
Therefore, the cost takes on its minimum value when $u(t)$ takes the form of the optimal controller (5.58). If $x(t_0)$ is zero, then the optimal control is the null control. Any other control $u(\cdot) \ne u^o(\cdot)$ will give a positive value to $J(u(\cdot);0,t_0)$. This shows that a sufficient condition for $J(u(\cdot);0,t_0) > 0$ is the existence of $S(t_f,t;S_f)$ over the interval $[t_0,t_f]$.
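Complete controllability (Assumption 5.4.1) is what the necessity argument below leans on, and the Grammian (5.92) gives a direct numerical test. A self-contained sketch with an illustrative constant system (the quadrature is a plain Riemann sum, adequate only for illustration):

```python
import numpy as np
from scipy.linalg import expm

def controllability_grammian(A, B, t0, t1, n=2000):
    # W_A(t1, t0) of (5.92); for constant A, Phi_A(t1, t) = expm(A (t1 - t)).
    ts = np.linspace(t0, t1, n)
    dt = ts[1] - ts[0]
    W = np.zeros((A.shape[0], A.shape[0]))
    for t in ts:
        Phi = expm(A * (t1 - t))
        W += Phi @ B @ B.T @ Phi.T * dt
    return W

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [1.0]])
W = controllability_grammian(A, B, 0.0, 2.0)
print(np.linalg.eigvalsh(W))   # strictly positive eigenvalues => W > 0
```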
Now we consider the necessity of the existence of $S(t_f,t;S_f)$ over the interval $[t_0,t_f]$ for $J(u(\cdot);0,t_0) > 0$. If the cost $J(u(\cdot);0,t_0)$ is positive definite, then the necessity of the existence of the Riccati variable $S(t_f,t;S_f)$ depends on Assumption 5.4.1. First note from Remark 5.4.2 that for some $t'$ close enough to $t_f$, $S(t_f,t;S_f)$ exists for $t' \le t \le t_f$. Therefore, by using the optimal control from $t'$ to $t_f$ for some $x(t')$, the minimizing cost is
\[
J(u^o(\cdot);x(t'),t') = \frac{1}{2}x^T(t')S(t_f,t';S_f)x(t'). \tag{5.97}
\]
We prove the necessity of the existence of $S(t_f,t;S_f)$ by supposing the opposite: that $S(t_f,t;S_f)$ ceases to exist at some escape time $t_e$. (Since the Riccati equation is nonlinear, the solution may approach infinite values at a finite time; this is called the escape time.) If this occurs, then we will show that the cost criterion can be made either as negative as we like, which violates the positive-definite assumption on the cost criterion, or as positive as desired, which violates the assumption of minimality. The cost, using the control defined as
\[
u(t) = \begin{cases} \bar{u}(t), & t_0 \le t \le t', \\ u^o(t), & t' < t \le t_f, \end{cases} \tag{5.98}
\]
can be written as
\[
J(u(\cdot);0,t_0) = \frac{1}{2}x^T(t')S(t_f,t';S_f)x(t') + \int_{t_0}^{t'}\left[\frac{1}{2}x^T(t)Q(t)x(t) + \bar{u}^T(t)C(t)x(t) + \frac{1}{2}\bar{u}^T(t)R(t)\bar{u}(t)\right]dt, \tag{5.99}
\]
where by Assumption 5.4.1 there exists a control $\bar{u}(t)$ for $t \in [t_0,t']$ which transfers $x(t_0) = 0$ to any $x(t')$ at $t' > t_0$ such that $\|x(t')\| = 1$ and $\|\bar{u}(\cdot)\| \le \rho(t') < \infty$, where $\rho(\cdot)$ is a continuous positive function.

Since $\|\bar{u}(\cdot)\|$ is bounded by the controllability assumption, the integral in (5.99) is also bounded. If $x^T(t')S(t_f,t';S_f)x(t') \to -\infty$ as $t' \to t_e$ for some $x(t')$, then $J(u(\cdot);0,t_0) \to -\infty$. But this violates the assumption that $J(u(\cdot);0,t_0) > 0$. Furthermore, $x^T(t')S(t_f,t';S_f)x(t')$ cannot go to positive infinity, since this implies that the minimal cost $J(u^o(\cdot);x(t'),t')$ can be made infinite: the controllability assumption implies that there exists a finite control which gives a finite cost. Therefore, since the integral in (5.99) can be bounded, $x^T(t')S(t_f,t';S_f)x(t')$ cannot go to $\infty$. Our arguments apply to the interval $(t_0,t_f]$. We appeal to [22] to verify that $S(t_f,t;S_f)$ exists for all $t$ in $[t_0,t_f]$. These results are summarized in the following theorem.

Theorem 5.4.2 Suppose that system (5.18) is completely controllable. A necessary and sufficient condition for $J(u(\cdot);0,t_0)$ to be positive definite is that there exist a function $S(t_f,t;S_f)$ for all $t$ in $[t_0,t_f]$ which satisfies the Riccati equation (5.78).

Note that since no smallness requirements are placed on $x(t)$ as in Section 5.4, Theorem 5.4.2 is a statement of the global optimality of the solution to the LQ problem. This was shown in Example 3.5.1, where the optimal value function used in Hilbert's integral was a quadratic form identical to the function given in (5.97).

Remark 5.4.3 The optimal value function, given by (5.97), is
\[
V(x(t),t) = \frac{1}{2}x^T(t)S(t_f,t;S_f)x(t) \tag{5.100}
\]
and is valid as long as the solution $S(t_f,t;S_f)$ to the Riccati equation (5.78) exists. This quadratic form satisfies, from Theorem 3.5.2, the Hamilton–Jacobi–Bellman equation, again showing that the solution to the LQ problem is global. Furthermore, $V_x^T(x(t),t) = S(t_f,t;S_f)x(t)$ satisfies the same differential equation as $\lambda(t)$, and $V_{xx}(x(t),t) = S(t_f,t;S_f)$ satisfies a Riccati equation as given in Section 3.5.1.

5.4.5 Necessary and Sufficient Conditions for Strong Positivity

The positivity of the second variation as represented by $J(u(\cdot);0,t_0)$ is not enough to ensure that the second variation dominates the higher-order terms in the expansion of the cost criterion about the assumed minimizing path. To ensure that the second variation dominates the expansion, it is sufficient that the second variation be strongly positive [22].
Although the LQ problem does not need strong positivity, in the second variational problem strong positivity is required if the second-order term is to dominate the higher-order terms in the expansion (see (5.8)). In this section we prove that a necessary and sufficient condition for strong positivity of the nonsingular ($R = H_{uu} > 0$) second variation is that a solution exist to the matrix Riccati differential equation (5.78). The sufficiency part of this theorem is very well known and documented [8]. Though the necessity part is well known too, it is not, in our opinion, proved convincingly elsewhere except in certain special cases. Strong positivity is defined as follows.

Definition 5.4.2 $J(u(\cdot);0,t_0)$ is said to be strongly positive if for each $u(\cdot)$ in $\mathcal{U}$ and some $k > 0$,
\[
J(u(\cdot);0,t_0) \ge k\|u(\cdot)\|^2, \tag{5.101}
\]
where $\|u(\cdot)\|$ is some suitable norm defined on $\mathcal{U}$.

The extension from showing that the second variation is positive definite (Theorem 5.4.2) to strongly positive is given in the following theorem.

Theorem 5.4.3 A necessary and sufficient condition for $J(u(\cdot);0,t_0)$ to be strongly positive is that for all $t$ in $[t_0,t_f]$ there exist a function $S(t_f,t;S_f)$ which satisfies the Riccati equation (5.78).

Proof: In Theorem 5.4.2 we proved that $J(u(\cdot);0,t_0)$ is positive definite. To show that $J(u(\cdot);0,t_0)$ is strongly positive, consider a new LQ problem with cost criterion
\[
J(u(\cdot),\varepsilon;0,t_0) = \int_{t_0}^{t_f}\left[\frac{1}{2}x^T(t)Q(t)x(t) + u^T(t)C(t)x(t) + \frac{1}{2-\varepsilon}u^T(t)R(t)u(t)\right]dt + \frac{1}{2}x^T(t_f)S_f x(t_f). \tag{5.102}
\]
This functional is positive definite if $2-\varepsilon > 0$ and if and only if
\[
-\dot{S}(t_f,t;S_f,\varepsilon) = Q(t) + S(t_f,t;S_f,\varepsilon)A(t) + A^T(t)S(t_f,t;S_f,\varepsilon) - \left[C(t) + B^T(t)S(t_f,t;S_f,\varepsilon)\right]^T\frac{2-\varepsilon}{2}R^{-1}(t)\left[C(t) + B^T(t)S(t_f,t;S_f,\varepsilon)\right], \tag{5.103}
\]
\[
S(t_f,t_f;S_f,\varepsilon) = S_f, \tag{5.104}
\]
has a solution $S(t_f,t;S_f,\varepsilon)$ defined for all $t$ in $[t_0,t_f]$. Now, since $Q(t)$, $C(t)$, $R(t)$, $A(t)$, $B(t)$ are continuous in $t$ and the right-hand side of (5.103) is analytic in $S(t_f,t;S_f,\varepsilon)$ and $\varepsilon$, and since $S(t_f,\cdot;S_f,0)$ exists, we have that $S(t_f,\cdot;S_f,\varepsilon)$ is a continuous function of $\varepsilon$ at $\varepsilon = 0$ [15]. So, for $\varepsilon$ sufficiently small, $S(t_f,t;S_f,\varepsilon)$ exists for all $t$ in $[t_0,t_f]$. Therefore, for $\varepsilon < 0$ and sufficiently small, $J(u(\cdot),\varepsilon;0,t_0)$ is positive definite. Next, we note that
\[
J(u(\cdot),\varepsilon;0,t_0) = J(u(\cdot);0,t_0) + \frac{\varepsilon}{2(2-\varepsilon)}\int_{t_0}^{t_f}u^T(t)R(t)u(t)\,dt \ge 0, \tag{5.105}
\]
so that
\[
J(u(\cdot);0,t_0) \ge -\frac{\varepsilon}{2(2-\varepsilon)}\int_{t_0}^{t_f}u^T(t)R(t)u(t)\,dt. \tag{5.106}
\]
From Assumption 5.2.1, $R(t) > 0$ with norm bound $0 < k_1 \le \|R(t)\| \le k_2$. Therefore, we conclude from (5.106) that
\[
J(u(\cdot);0,t_0) \ge -\frac{\varepsilon k_1}{2(2-\varepsilon)}\int_{t_0}^{t_f}u^T(t)u(t)\,dt, \qquad k_1 > 0. \tag{5.107}
\]
Hence,
\[
J(u(\cdot);0,t_0) \ge k\|u(\cdot)\|_{L_2}^2, \tag{5.108}
\]
where $\int_{t_0}^{t_f}u^T(t)u(t)\,dt = \|u(\cdot)\|_{L_2}^2$ is the $L_2$ or integral-square norm and
\[
k = -\frac{\varepsilon k_1}{2(2-\varepsilon)} > 0, \tag{5.109}
\]
which implies that $J(u(\cdot);0,t_0)$ is strongly positive. The converse is found in the proof of Theorem 5.4.2, namely, that if $J(u(\cdot);0,t_0)$ is strongly positive, then an $S(t_f,t;S_f)$ exists which satisfies (5.78).

Remark 5.4.4 For $J(u(\cdot);0,t_0)$ strongly positive, the second variation dominates all higher-order terms in the expansion of the cost criterion about the minimizing path with fixed terminal time and without terminal constraints. See Section 5.1.
Example 5.4.1 (Shortest distance between a point and a great circle) This example illustrates that the second variation is no longer positive definite when the solution to the Riccati equation (5.78) escapes, and that other neighboring paths can then produce smaller values of the cost. Let $s$ be the distance along a path on the surface of a sphere. The differential distance is
\[
ds = \left(r^2\,d\theta^2 + r^2\cos^2\theta\,d\phi^2\right)^{1/2} = r\left(u^2 + \cos^2\theta\right)^{1/2}d\phi, \tag{5.110}
\]
where $\phi$ is the longitudinal angle and $\theta$ is the lateral angle, as shown in Figure 5.1.

[Figure 5.1: Coordinate frame on a sphere, showing the lateral angle $\theta$, the longitudinal angle $\phi$, and the terminal great circle.]

The dynamic equation is
\[
\frac{d\theta}{d\phi} = u, \qquad \theta(0) = 0, \tag{5.111}
\]
where $u$ is the control variable and $\phi$ is treated as the independent variable. The problem is to minimize the distance on a unit sphere ($r = 1$) from a given point to a given great circle, i.e., find the control $u$ that minimizes
\[
J(u(\cdot);0,\phi=0) = \int_0^{\phi_1}\left(u^2 + \cos^2\theta\right)^{1/2}d\phi \tag{5.112}
\]
subject to (5.111), fixed terminal $\phi = \phi_1$, and unconstrained $\theta(\phi_1)$. First, a trajectory that satisfies the first-order necessary conditions is determined. Then, about that path, the second-order necessary conditions will be analyzed using the conditions developed in Section 5.4.4. The variational Hamiltonian is
\[
H = \left(u^2 + \cos^2\theta\right)^{1/2} + \lambda u. \tag{5.113}
\]
The first-order necessary conditions are
\[
\frac{d\theta}{d\phi} = H_\lambda = u, \qquad \theta(0) = 0, \tag{5.114}
\]
\[
-\dot{\lambda} = H_\theta = -\left(u^2 + \cos^2\theta\right)^{-1/2}\sin\theta\cos\theta, \qquad \lambda(\phi_1) = 0, \tag{5.115}
\]
\[
H_u = \lambda + u\left(u^2 + \cos^2\theta\right)^{-1/2} = 0. \tag{5.116}
\]
The first-order necessary conditions are satisfied by the trajectory $u(\phi) = 0$, $\lambda(\phi) = 0$, $\theta(\phi) = 0$ for $0 \le \phi \le \phi_1$. About this extremal path, the second-order necessary conditions are generated. The second variational problem given in Section 5.2 is to find the perturbed control $\delta u$ that minimizes the second variational cost criterion
\[
\delta^2 J = \frac{1}{2}\int_0^{\phi_1}\left[(\delta u)^2 - (\delta\theta)^2\right]d\phi \tag{5.117}
\]
subject to $\frac{d\,\delta\theta}{d\phi} = \delta u$, $\delta\theta(0) = 0$, for fixed terminal $\phi = \phi_1$ with no constraint on the terminal value of $\theta(\phi_1)$. From Theorem 5.4.2 it is necessary that the solution of the associated Riccati equation (5.78) exist. Since for this example $A = 0$, $B = 1$, $C = 0$, $Q = -1$, $R = 1$, the associated Riccati equation and solution are
\[
\frac{dS}{d\phi} = S^2 + 1, \qquad S(\phi_1) = 0 \;\Rightarrow\; S(\phi) = -\tan(\phi_1 - \phi). \tag{5.118}
\]
Note that the solution remains finite until it escapes at $\phi_1 - \phi = \pi/2$. At that point any great circle path from the point to the terminal great circle will give the same cost. If the path is longer than $\pi/2$, then there are neighboring paths, which do not even satisfy the first-order necessary conditions, that can give smaller cost. If $\phi_1 - \phi < \pi/2$, then the second variation is positive and can be shown to be strongly positive. Furthermore, the second variational controller is $\delta u = \tan(\phi_1 - \phi)\,\delta\theta$. Note that the gain is positive, indicating that the best neighboring optimum controller is essentially divergent.
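A quick numerical confirmation of (5.118) is a one-line integration. This is a sketch; the terminal angle $\phi_1$ below is an arbitrary illustrative value kept under $\pi/2$ so the solution exists on the whole interval.

```python
import numpy as np
from scipy.integrate import solve_ivp

phi1 = 1.3           # illustrative; phi1 < pi/2 keeps S finite on [0, phi1]
sol = solve_ivp(lambda p, S: S**2 + 1.0, (phi1, 0.0), [0.0],
                dense_output=True, rtol=1e-10, atol=1e-12)
p = np.linspace(0.0, phi1, 50)
# Matches the closed form S(phi) = -tan(phi1 - phi) of (5.118).
assert np.allclose(sol.sol(p)[0], -np.tan(phi1 - p), rtol=1e-6, atol=1e-8)
```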
In the case of nonsingular optimal control problems, where R(t) = H uu > 0 is invertible for all t in [t 0 , t f ], it is known that a sufficient condition for strong positivity of the second variation, and hence for a weak local minimum, is that the matrix Riccati differential equation associated with the second variation should have a solution for all t in [t 0 , t f ]. Clearly, this condition is inap- plicable in the singular case owing to the presence of R −1 (t) = H −1 uu in the matrix Riccati equation. For a long time, therefore, it was felt that no Riccati-like condition existed for the singular case. This has turned out to be not true [4]. The result is that sufficiency conditions for nonnegativity of singular and nonsingular second variations are rather closely related. In Section 5.4.5 we demonstrated that if the matrix Riccati equation has a solu- tion for all t in [t 0 , t f ], then J(u(); 0, t 0 ) is not only positive definite but also strongly positive. Clearly, in a finite-dimensional vector space, positive definiteness is equiv- alent to strong positivity. However, in our space of piecewise continuous control functions this is not so. In this section we illustrate the difference between positive definiteness and strong positivity by means of a simple example. In addition, the ex- ample illustrates the fact that the totally singular second variation cannot be strongly positive. (See [4] for general proofs.) This is, of course, consistent with the fact that in the totally singular case the matrix Riccati equation is undefined because R −1 (t) does not exist. Before presenting the example, we define “ totally singular” precisely as follows. Definition 5.4.3 J (u(); 0, t 0 ) is said to be totally singular if R(t) = 0 ∀ t in [t 0 , t f ]. (5.119) Now, we consider the totally singular functional J (u(); 0, t 0 ) = t f t 0 x 2 (t)dt (5.120) 5.4. Transition Matrix Approach with No Terminal Constraints 187 subject to ˙ x(t) = u(t), x(t 0 ) = 0, (5.121) u() is a member of |. (5.122) Clearly J (u(); 0, t 0 ) is positive definite. Set u(t) = cos ω(t −t 0 ), t 0 ≤ t ≤ t f , (5.123) so that x(t) = 1 ω sin ω(t −t 0 ), t 0 ≤ t ≤ t f . (5.124) With this choice of control J (u(); 0, t 0 ) becomes J (u(); 0, t 0 ) = t f t 0 1 ω 2 sin 2 ω(t −t 0 ) dt, (5.125) and |u()| 2 L 2 = t f t 0 u 2 (t)dt = t f t 0 cos 2 ω(t −t 0 )dt = t f t 0 (1 −sin 2 ω(t −t 0 ))dt. (5.126) By definition, if J (u(); 0, t 0 ) were strongly positive, then for some k > 0 and all ω > 0, 1 ω 2 t f t 0 (sin 2 ω(t −t 0 ))dt ≥ k t f t 0 u 2 (t)dt = k t f t 0 (1 −sin 2 ω(t −t 0 ))dt, (5.127) i.e., k + 1 ω 2 t f t 0 (sin 2 ω(t −t 0 ))dt ≥ k(t f −t 0 ). (5.128) But this is impossible because the left-hand side of (5.128) tends to 1 2 k(t f − t 0 ) as ω →∞. In other words, J (u(); 0, t 0 ) of (5.120) is not strongly positive. 188 Chapter 5. LQ Control Problem Note, however, that the functional J (u(); 0, t 0 ) = t f t 0 (x 2 (t) + u 2 (t))dt (5.129) is strongly positive. This follows directly because x 2 () + u 2 () ≥ u 2 (). (5.130) The fact that the totally singular second variation cannot be strongly positive implies that in the totally singular case we should seek (necessary and) sufficient conditions only for nonnegativity and for positive definiteness of J (u(); 0, t 0 ). For a full treatment of the singular optimal control problem, see [4] and [14]. 5.4.7 Solving the Two-Point Boundary-Value Problem via the Shooting Method A second-order method for finding an optimal control numerically is with a shooting method. 
5.4.7 Solving the Two-Point Boundary-Value Problem via the Shooting Method

A second-order method for finding an optimal control numerically is the shooting method. This method was contrasted with the first-order steepest descent method in Section 3.3.4 but can be described in detail here. To implement the shooting method, guess a value for the initial value of the adjoint vector, $\lambda^i(t_0)$. Integrate this guess forward along with the state to $t_f$ using (3.41) and (3.56) with the optimal control determined from (3.55). Now, perturb the previous guess $\lambda^i(t_0)$ by some function of the errors at the terminal time $t_f$ to get the next guess $\lambda^{i+1}(t_0)$. It should be noted that small variations in $\lambda(t_0)$ can lead to large changes in $\lambda(t)$, depending on the stability properties of the adjoint equation. State equations that are stable with time running forward imply that the associated adjoint equations are unstable when they are integrated forward in time (but stable when they are integrated backward in time) [11].

We follow the development in Section 5.1. Assume $f(x(t),u(t),t)$ and $L(x(t),u(t),t)$ are twice differentiable in $x(t)$ and $u(t)$. Assume also that $H_{uu}(x(t),u(t),\lambda(t),t)$ is nonsingular along all trial trajectories.

1. Choose $\lambda^i(t_0)$, where $i$ is the iteration index. Define
\[
y^i(t) = \begin{bmatrix} x^i(t) \\ \lambda^i(t) \end{bmatrix}. \tag{5.131}
\]
The initial condition, $x(t_0) = x_0$, is given. The control is calculated from
\[
H_u(x^i(t),u^i(t),\lambda^i(t),t) = 0 \;\Rightarrow\; u^i(t) = g(x^i(t),\lambda^i(t),t) \tag{5.132}
\]
by the Implicit Function Theorem. This trajectory does not satisfy the given $\lambda(t_f)$.

2. Numerically integrate $y^i$ forward with the assumption that $H_{uu}(y^i(t),t) > 0$.

3. At the terminal boundary, let
\[
\lambda^i(t_f) - \phi_x(y_f^i) = \beta. \tag{5.133}
\]

4. Linearize the first-order necessary conditions. For the control, this is
\[
\Delta H_u(y^i(t),u^i(t),t) = H_{ux}(y^i(t),u^i(t),t)\delta x(t) + H_{u\lambda}(y^i(t),u^i(t),t)\delta\lambda(t) + H_{uu}(y^i(t),u^i(t),t)\delta u(t) + \text{higher-order terms}, \tag{5.134}
\]
so that
\[
\delta H_u(y^i(t),u^i(t),t) = H_{ux}(y^i(t),u^i(t),t)\delta x(t) + H_{u\lambda}(y^i(t),u^i(t),t)\delta\lambda(t) + H_{uu}(y^i(t),u^i(t),t)\delta u(t) = 0. \tag{5.135}
\]
The dynamic equation is linearized as
\[
\delta\dot{x}(t) = f_x(x^i(t),u^i(t),t)\delta x(t) + f_u(x^i(t),u^i(t),t)\delta u(t). \tag{5.136}
\]
For the Lagrange multipliers,
\[
-\delta\dot{\lambda}(t) = H_{xx}(y^i(t),u^i(t),t)\delta x(t) + H_{xu}(y^i(t),u^i(t),t)\delta u(t) + H_{x\lambda}(y^i(t),u^i(t),t)\delta\lambda(t). \tag{5.137}
\]
From (5.135), solve for the change in control as
\[
\delta u(t) = -H_{uu}^{-1}(y^i(t),u^i(t),t)\left[H_{ux}(y^i(t),u^i(t),t)\delta x(t) + H_{u\lambda}(y^i(t),u^i(t),t)\delta\lambda(t)\right]. \tag{5.138}
\]
(Recall that $H_{uu}(y^i(t),u^i(t),t) > 0$ is assumed, so the inverse exists.) Combining (5.136), (5.137), and (5.138) gives a linear matrix ordinary differential equation, the Hamiltonian system, in the form of (5.48):
\[
\begin{bmatrix} \delta\dot{x} \\ \delta\dot{\lambda} \end{bmatrix} =
\begin{bmatrix}
A(t) - B(t)R^{-1}(t)C(t) & -B(t)R^{-1}(t)B^T(t) \\
-Q(t) + C^T(t)R^{-1}(t)C(t) & -\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T
\end{bmatrix}
\begin{bmatrix} \delta x \\ \delta\lambda \end{bmatrix}. \tag{5.139}
\]

5. Solve numerically the Hamiltonian system from (5.139) as
\[
\begin{bmatrix} \delta x(t_f) \\ \delta\lambda(t_f) \end{bmatrix} =
\begin{bmatrix}
\Phi_{11}(t_f,t_0) & \Phi_{12}(t_f,t_0) \\
\Phi_{21}(t_f,t_0) & \Phi_{22}(t_f,t_0)
\end{bmatrix}
\begin{bmatrix} \delta x(t_0) \\ \delta\lambda(t_0) \end{bmatrix}, \tag{5.140}
\]
where, since $x(t_0)$ is given, $\delta x(t_0) = 0$. Use
\[
\delta\lambda(t_f) - \phi_{xx}\,\delta x(t_f) = d\beta \tag{5.141}
\]
for $|\beta| > |\beta + d\beta|$, so that $d\beta$ is chosen such that $\beta$ contracts on each iteration. Solving for the partitioned elements of (5.140),
\[
\delta x(t_f) = \Phi_{12}(t_f,t_0)\delta\lambda(t_0), \tag{5.142}
\]
\[
\delta\lambda(t_f) = \Phi_{22}(t_f,t_0)\delta\lambda(t_0). \tag{5.143}
\]
Substitute (5.142) and (5.143) into (5.141) to give
\[
\left(\Phi_{22}(t_f,t_0) - \phi_{xx}\Phi_{12}(t_f,t_0)\right)\delta\lambda(t_0) = d\beta, \tag{5.144}
\]
\[
\Rightarrow\; \delta\lambda(t_0) = \left[\Phi_{22}(t_f,t_0) - \phi_{xx}\Phi_{12}(t_f,t_0)\right]^{-1}d\beta. \tag{5.145}
\]

6. Update the guess of the initial Lagrange multipliers:
\[
\lambda^{i+1}(t_0) = \lambda^i(t_0) + \delta\lambda(t_0). \tag{5.146}
\]

7. If $\|\beta\| < \epsilon$ for some $\epsilon$ small enough, then stop. If not, go to step 2.

Sophisticated numerical optimization algorithms based on the shooting method can be found in [43].

An Application of the Shooting Method

We apply the shooting method to the Brachistochrone problem of Section 3.3.3. Since an analytic solution has been obtained, the convergence rate of the numerical scheme can be accurately assessed. For this problem,
\[
f_x = \begin{bmatrix} 0 & \cos\theta \\ 0 & 0 \end{bmatrix}, \qquad
f_u = \begin{bmatrix} -v\sin\theta \\ g\cos\theta \end{bmatrix}, \qquad
H_x = \begin{bmatrix} 0 & \lambda_r\cos\theta \end{bmatrix}, \qquad
H_{xu} = \begin{bmatrix} 0 \\ -\lambda_r\sin\theta \end{bmatrix} = H_{ux}^T,
\]
\[
H_{xx} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
H_{uu} = -\lambda_r v\cos\theta - \lambda_v g\sin\theta.
\]
The terminal penalty function is $\phi(x(t_f)) = -r(t_f)$, giving $\phi_x = [-1\ \ 0]$ and $\phi_{xx} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$. Note that using (3.68), we could remove the differential equation for $\lambda_r$. It is left in here for simplicity. An initial guess of $\lambda_r^0(0) = -1$ and $\lambda_v^0(0) = -0.8$ provides the results $r(t_f) = 9.350$ and $\lambda_v(t_f) = 0.641$. The transition matrix of (5.140) is computed for $t_f = 2$, taking $g = 9.80665$, to be
\[
\Phi(2,0) = \begin{bmatrix}
1.0000 & 0.2865 & 13.0710 & -16.3410 \\
0.0000 & -0.8011 & 20.4083 & -25.5103 \\
0.0000 & 0.0000 & 1.0000 & 0.0000 \\
0.0000 & 0.0610 & -1.1969 & 0.6950
\end{bmatrix}.
\]
Using $d\beta = -\beta$, the desired change in the initial values of the Lagrange multipliers is computed as
\[
\delta\lambda^0 = \begin{bmatrix} 0 \\ -0.9221309 \end{bmatrix}.
\]
Note that since we already knew the correct value for $\lambda_r$, the desired update is zero. Three more iterations provide

  i    $\lambda_v(0)$    $\lambda_v(t_f)$    $r(t_f)$
  0    -0.8               0.6409             9.3499
  1    -1.7221           -0.6856            11.5777
  2    -1.2537            0.0307            12.4832
  3    -1.2732           -0.0000            12.4862

On each step, the computed update for $\lambda_r(0)$ is zero, reflecting its invariance. Moreover, as expected, $\lambda_v(0)$ and $\lambda_v(2)$ get closer to their analytical initial and terminal values of $-\frac{4}{\pi}$ and $0$, respectively.
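A compact sketch of the shooting idea on this same Brachistochrone can be written in a few lines (dynamics $\dot{r} = v\cos\theta$, $\dot{v} = g\sin\theta$, with $\lambda_r \equiv -1$ and $H_u = 0$ giving $\tan\theta = \lambda_v g/(\lambda_r v)$). For brevity this sketch replaces the transition-matrix correction (5.145) with a scalar root-finder on the terminal error $\beta = \lambda_v(t_f)$, with the bracket suggested by the iteration table above; the guess-integrate-correct structure of steps 1-7 is otherwise the same.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

g, tf = 9.80665, 2.0

def terminal_error(lam_v0):
    def rhs(t, y):
        r, v, lam_v = y
        # H_u = 0 with lambda_r = -1: tan(theta) = -lambda_v * g / v.
        theta = np.arctan2(-lam_v * g, v)
        return [v * np.cos(theta),            # r' = v cos(theta)
                g * np.sin(theta),            # v' = g sin(theta)
                np.cos(theta)]                # lambda_v' = -lambda_r cos(theta)
    y = solve_ivp(rhs, (0.0, tf), [0.0, 0.0, lam_v0], rtol=1e-9).y[:, -1]
    return y[2]                               # beta = lambda_v(tf); want zero

lam_v0 = brentq(terminal_error, -2.0, -0.5)
print(lam_v0)    # should approach the analytic value -4/pi ≈ -1.2732
```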
5.5 LQ Problem with Linear Terminal Constraints: Transition Matrix Approach

The first-order necessary conditions of Theorem 5.3.1 are now used for the optimal control problem of (5.17) to (5.20), where the terminal constraints (5.20) and the boundary condition (5.43) are explicitly included. The two-point boundary-value problem of Theorem 5.3.1 is solved in a manner similar to that of Section 5.4 by using the transition matrix (5.50) as
\[
x^o(t_f) = \Phi_{11}(t_f,t_0)x(t_0) + \Phi_{12}(t_f,t_0)\lambda(t_0), \tag{5.147}
\]
\[
\lambda(t_f) = \Phi_{21}(t_f,t_0)x(t_0) + \Phi_{22}(t_f,t_0)\lambda(t_0) = D^T\nu + S_f x^o(t_f). \tag{5.148}
\]
From (5.148) the relation between $\lambda(t_0)$ and $(\nu, x(t_0))$, assuming that $\bar{\Phi}_{22}(t_f,t_0)$ defined in (5.72) is invertible, is
\[
\lambda(t_0) = -\bar{\Phi}_{22}^{-1}(t_f,t_0)\bar{\Phi}_{21}(t_f,t_0)x(t_0) + \bar{\Phi}_{22}^{-1}(t_f,t_0)D^T\nu. \tag{5.149}
\]
This is introduced into (5.147), which in turn is introduced into (5.20), producing
\[
Dx^o(t_f) = D\left[\bar{\Phi}_{11}(t_f,t_0) - \bar{\Phi}_{12}(t_f,t_0)\bar{\Phi}_{22}^{-1}(t_f,t_0)\bar{\Phi}_{21}(t_f,t_0)\right]x(t_0) + D\bar{\Phi}_{12}(t_f,t_0)\bar{\Phi}_{22}^{-1}(t_f,t_0)D^T\nu = 0. \tag{5.150}
\]
The symplectic property of the Hamiltonian transition matrix is used to reduce the coefficient of $x_0$ in (5.150). By using the symplectic identity (5.71) for the symplectic matrix $\bar{\Phi}_H(t_f,t_0)$ defined in (5.72) and the assumed invertibility of $\bar{\Phi}_{22}(t_f,t_0)$, we obtain
\[
\bar{\Phi}_{11}(t_f,t_0) - \bar{\Phi}_{12}(t_f,t_0)\bar{\Phi}_{21}^T(t_f,t_0)\bar{\Phi}_{22}^{-T}(t_f,t_0) = \bar{\Phi}_{22}^{-T}(t_f,t_0). \tag{5.151}
\]
By premultiplying and postmultiplying the symplectic identity (5.70) by $\bar{\Phi}_{22}^{-1}(t_f,t_0)$ and $\bar{\Phi}_{22}^{-T}(t_f,t_0)$, respectively, we obtain
\[
\bar{\Phi}_{22}^{-1}(t_f,t_0)\bar{\Phi}_{21}(t_f,t_0) = \bar{\Phi}_{21}^T(t_f,t_0)\bar{\Phi}_{22}^{-T}(t_f,t_0). \tag{5.152}
\]
Using (5.152) in (5.151), the coefficient of $x_0$ in (5.150) reduces to
\[
\bar{\Phi}_{11}(t_f,t_0) - \bar{\Phi}_{12}(t_f,t_0)\bar{\Phi}_{22}^{-1}(t_f,t_0)\bar{\Phi}_{21}(t_f,t_0) = \bar{\Phi}_{22}^{-T}(t_f,t_0). \tag{5.153}
\]
Finally, by solving the matrix equation (5.153) for $\bar{\Phi}_{11}(t_f,t_0)$, then eliminating $\bar{\Phi}_{11}(t_f,t_0)$ in (5.69), and using the symmetric property established in (5.152), the symmetric property of $\bar{\Phi}_{12}(t_f,t_0)\bar{\Phi}_{22}^{-1}(t_f,t_0)$ is also established. Therefore, (5.149) and (5.150) can be written in the symmetric form
\[
\begin{bmatrix} \lambda(t_0) \\ 0 \end{bmatrix} =
\begin{bmatrix}
-\bar{\Phi}_{22}^{-1}(t_f,t_0)\bar{\Phi}_{21}(t_f,t_0) & \bar{\Phi}_{22}^{-1}(t_f,t_0)D^T \\
D\bar{\Phi}_{22}^{-T}(t_f,t_0) & D\bar{\Phi}_{12}(t_f,t_0)\bar{\Phi}_{22}^{-1}(t_f,t_0)D^T
\end{bmatrix}
\begin{bmatrix} x_0 \\ \nu \end{bmatrix}. \tag{5.154}
\]
At this point, $\lambda(t_0)$ and $\nu$ can be determined as a function of $x(t_0)$. Substitution of this result into (5.46) will produce an optimal control rule for the terminally constrained optimal control problem not only for $t_0$ but for all $t \in [t_0,t_f]$. Before explicitly doing this, the elements of the coefficient matrix of (5.154), identified as
\[
S(t_f,t;S_f) = -\bar{\Phi}_{22}^{-1}(t_f,t)\bar{\Phi}_{21}(t_f,t), \tag{5.155}
\]
\[
F^T(t_f,t) = \bar{\Phi}_{22}^{-1}(t_f,t)D^T, \tag{5.156}
\]
\[
G(t_f,t) = D\bar{\Phi}_{12}(t_f,t)\bar{\Phi}_{22}^{-1}(t_f,t)D^T, \tag{5.157}
\]
are to be analyzed. Our objective is to find the propagation equations for these elements and discuss their properties. In particular, these elements will be seen to combine into a Riccati variable for the constrained control problem, and the coefficient matrix of (5.154) is a symmetric matrix.

As given in Theorem 5.4.1, $S(t_f,t;S_f)$ satisfies the Riccati differential equation (5.78). The differential equation for $F(t_f,t)$ is now developed by determining the differential equation for $\bar{\Phi}_{22}^{-1}(t_f,t)$, noting that
\[
\left[\frac{d}{dt}\bar{\Phi}_{22}^{-1}(t_f,t)\right]\bar{\Phi}_{22}(t_f,t) + \bar{\Phi}_{22}^{-1}(t_f,t)\frac{d}{dt}\bar{\Phi}_{22}(t_f,t) = 0. \tag{5.158}
\]
From the adjoint differential equation (5.76) for $\bar{\Phi}_H(t_f,t)$, (5.158) becomes
\[
\left[\frac{d}{dt}\bar{\Phi}_{22}^{-1}(t_f,t)\right]\bar{\Phi}_{22}(t_f,t) = -\bar{\Phi}_{22}^{-1}(t_f,t)\left[\bar{\Phi}_{21}(t_f,t)B(t)R^{-1}(t)B^T(t) + \bar{\Phi}_{22}(t_f,t)\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T\right]. \tag{5.159}
\]
Then, by the assumed invertibility of $\bar{\Phi}_{22}(t_f,t)$ and using (5.155),
\[
\frac{d}{dt}\bar{\Phi}_{22}^{-1}(t_f,t) = -\left[\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T - S(t_f,t;S_f)B(t)R^{-1}(t)B^T(t)\right]\bar{\Phi}_{22}^{-1}(t_f,t). \tag{5.160}
\]
Noting that $\frac{d}{dt}F^T(t_f,t) = \left[\frac{d}{dt}\bar{\Phi}_{22}^{-1}(t_f,t)\right]D^T$, then
\[
\frac{d}{dt}F^T(t_f,t) = -\left[\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T - S(t_f,t;S_f)B(t)R^{-1}(t)B^T(t)\right]F^T(t_f,t), \qquad F^T(t_f,t_f) = D^T. \tag{5.161}
\]
In a similar manner, the differential equation for $G(t_f,t)$ is obtained by direct differentiation of $\bar{\Phi}_{12}(t_f,t)\bar{\Phi}_{22}^{-1}(t_f,t)$ as
\[
\frac{d}{dt}\left[\bar{\Phi}_{12}(t_f,t)\bar{\Phi}_{22}^{-1}(t_f,t)\right] = \left[\frac{d}{dt}\bar{\Phi}_{12}(t_f,t)\right]\bar{\Phi}_{22}^{-1}(t_f,t) + \bar{\Phi}_{12}(t_f,t)\frac{d}{dt}\bar{\Phi}_{22}^{-1}(t_f,t)
\]
\[
= \left\{\bar{\Phi}_{11}(t_f,t)B(t)R^{-1}(t)B^T(t) + \bar{\Phi}_{12}(t_f,t)\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T - \bar{\Phi}_{12}(t_f,t)\left[\left(A(t) - B(t)R^{-1}(t)C(t)\right)^T - S(t_f,t;S_f)B(t)R^{-1}(t)B^T(t)\right]\right\}\bar{\Phi}_{22}^{-1}(t_f,t)
\]
\[
= \left[\bar{\Phi}_{11}(t_f,t) - \bar{\Phi}_{12}(t_f,t)\bar{\Phi}_{22}^{-1}(t_f,t)\bar{\Phi}_{21}(t_f,t)\right]B(t)R^{-1}(t)B^T(t)\bar{\Phi}_{22}^{-1}(t_f,t). \tag{5.162}
\]
Equation (5.162) is reduced further, by using the symplectic identity (5.153), to
\[
\frac{d}{dt}\left[\bar{\Phi}_{12}(t_f,t)\bar{\Phi}_{22}^{-1}(t_f,t)\right] = \bar{\Phi}_{22}^{-T}(t_f,t)B(t)R^{-1}(t)B^T(t)\bar{\Phi}_{22}^{-1}(t_f,t). \tag{5.163}
\]
By premultiplying and postmultiplying (5.163) by $D$ and $D^T$, respectively, and using the definitions of $F(t_f,t)$ and $G(t_f,t)$, the differential equation for $G(t_f,t)$ is
\[
\dot{G}(t_f,t) = F(t_f,t)B(t)R^{-1}(t)B^T(t)F^T(t_f,t), \qquad G(t_f,t_f) = 0. \tag{5.164}
\]
Note that $G(t_f,t)$ generated by (5.164) is symmetric. Since it was already shown that $S(t_f,t;S_f)$ is symmetric, the coefficient matrix of (5.154) is symmetric.

Our objective is to determine $\nu$ in terms of $x(t_0)$. This can occur only if $G(t_f,t_0)$ is invertible. For this to happen it is necessary for $D$ to be of full rank. Assuming $G(t_f,t_0)$ is invertible,
\[
\nu = -G^{-1}(t_f,t_0)F(t_f,t_0)x(t_0). \tag{5.165}
\]
The invertibility of $G(t_f,t_0)$ is known as a normality condition, ensuring a finite $\nu$ for a finite $x(t_0)$. By using (5.165) to eliminate $\nu$ in (5.149),
\[
\lambda(t_0) = \bar{S}(t_f,t_0)x(t_0), \tag{5.166}
\]
where
\[
\bar{S}(t_f,t) = S(t_f,t;S_f) - F^T(t_f,t)G^{-1}(t_f,t)F(t_f,t). \tag{5.167}
\]
The optimal control can now be written as an explicit function of the initial state. If $t_0$ is considered to be the present time $t$, then introducing (5.166) and (5.167) into (5.46) results in the optimal control rule for the terminally constrained optimal control problem as
\[
u^o(t) = -R^{-1}(t)\left[C(t) + B^T(t)\bar{S}(t_f,t)\right]x(t). \tag{5.168}
\]
$\bar{S}(t_f,t)$ satisfies the same Riccati differential equation as (5.78). This can be verified by time differentiation of $\bar{S}(t_f,t)$ defined in (5.167). Furthermore, if all the terminal states are constrained, i.e., $D = I$, and (5.155) to (5.157) are used in (5.167), then by the symplectic identities $\bar{S}(t_f,t)$ reduces to
\[
\bar{S}(t_f,t) = -\bar{\Phi}_{12}^{-1}(t_f,t)\bar{\Phi}_{11}(t_f,t). \tag{5.169}
\]
The major difficulty with propagating $\bar{S}(t_f,t)$ directly is in applying the proper boundary conditions at $t_f$. From (5.167), $\bar{S}(t_f,t_f)$ is not defined because $G(t_f,t_f)$ is not invertible. The integration of $S(t_f,t;S_f)$, $F(t_f,t)$, and $G(t_f,t)$ may have a computational savings over the integration of the transition matrix. Furthermore, $G(t_f,t)$ and $F(t_f,t)$ do not have to be integrated over the entire interval $[t_0,t_f]$ but only until $G(t_f,t)$ is invertible, usually some very small time step away from $t_f$. This allows a proper initialization for $\bar{S}(t_f,t)$. Once $\bar{S}(t_f,t)$ is formed, only $\bar{S}(t_f,t)$ need be propagated backward in time. The behavior of $\bar{S}(t_f,t)$ is reflected in the behavior of $u^o(t)$ near $t_f$. For large deviations away from the terminal manifold, $u^o(t)$ reacts by emphasizing the satisfaction of the constraints rather than the reduction of the performance criterion.
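Numerically, $S$, $F$, and $G$ can be propagated backward together and combined into $\bar{S}$ by (5.167). A self-contained sketch using the same illustrative matrices as the Section 5.4 sketches, with a hypothetical single-row constraint matrix $D$ (all values are placeholders):

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.zeros((1, 2)); Q = np.eye(2); R = np.array([[1.0]])
Ri = np.linalg.inv(R); Abar = A - B @ Ri @ C
Sf = np.eye(2); D = np.array([[1.0, 0.0]])    # constrain x1(tf) = 0 (illustrative)
t0, tf = 0.0, 2.0

def rhs(t, z):
    S, F, G = z[:4].reshape(2, 2), z[4:6].reshape(1, 2), z[6:].reshape(1, 1)
    dS = (-Abar.T @ S - S @ Abar - (Q - C.T @ Ri @ C)
          + S @ B @ Ri @ B.T @ S)                       # Riccati (5.78)
    dF = -F @ (Abar - B @ Ri @ B.T @ S)                 # transpose of (5.161)
    dG = F @ B @ Ri @ B.T @ F.T                         # (5.164)
    return np.concatenate([dS.ravel(), dF.ravel(), dG.ravel()])

zf = np.concatenate([Sf.ravel(), D.ravel(), [0.0]])     # boundary values at tf
z0 = solve_ivp(rhs, (tf, t0), zf, rtol=1e-10, atol=1e-12).y[:, -1]
S0, F0, G0 = z0[:4].reshape(2, 2), z0[4:6].reshape(1, 2), z0[6:].reshape(1, 1)
S_bar = S0 - F0.T @ np.linalg.inv(G0) @ F0              # Equation (5.167)
```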
In the next subsection, we demonstrate that the invertibility of $G(t_f,t)$ is equivalent to a controllability requirement associated only with the required terminal boundary restriction (5.20).

5.5.1 Normality and Controllability for the LQ Problem

We show here that the normality condition assumed in (5.165) is actually a controllability requirement. This is done by converting the original problem to one in which the quadratic cost criterion is a function of a control variable only. The minimization of this new performance criterion subject to given initial conditions and the terminal constraints (5.20) requires a controllability condition, which is just the invertibility of $G(t_f,t_0)$. The following theorem is similar to that of Brockett [8] and others.

Theorem 5.5.1 Assume that the symmetric matrix $S(t_f,t;S_f)$, which is a solution to the Riccati equation (5.78), exists on the interval $t_0 \le t \le t_f$. Then there exists a control $u(\cdot)$ on the interval $t_0 \le t \le t_f$ that minimizes (5.17) subject to the differential constraint (5.18) and the boundary conditions (5.19) and (5.20) if and only if there exists a $v(\cdot)$ on the interval $t_0 \le t \le t_f$ which minimizes
\[
J_1(v(\cdot);x_0,t_0) = \frac{1}{2}\int_{t_0}^{t_f}v^T(t)R(t)v(t)\,dt \tag{5.170}
\]
subject to the differential constraint
\[
\dot{x} = \bar{A}(t)x(t) + B(t)v(t), \tag{5.171}
\]
where
\[
\bar{A}(t) = A(t) - B(t)R^{-1}(t)\left[C(t) + B^T(t)S(t_f,t;S_f)\right], \tag{5.172}
\]
with the boundary conditions $x(t_0) = x_0$ and $Dx(t_f) = 0$.

Proof: By proceeding exactly as was done to obtain (5.96), we note that if we let
\[
v(t) = u(t) + R^{-1}(t)\left[C(t) + B^T(t)S(t_f,t;S_f)\right]x(t), \tag{5.173}
\]
then the cost specified in (5.96) can be written as
\[
J(v(\cdot);x_0,t_0) = \frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) + \frac{1}{2}\int_{t_0}^{t_f}v^T(t)R(t)v(t)\,dt. \tag{5.174}
\]
Since $x(t_0) = x_0$ is given, the cost function upon which $v(\cdot)$ has influence is
\[
J_1(v(\cdot);x_0,t_0) = \frac{1}{2}\int_{t_0}^{t_f}v^T(t)R(t)v(t)\,dt, \tag{5.175}
\]
which is subject to the differential constraint (5.171) when (5.173) is substituted into (5.18).

We now proceed to solve this accessory problem of minimizing (5.170) subject to (5.171) using the technique given at the beginning of this section. First, the Riccati variable $S^*(t_f,t;0)$ is propagated by (5.78) as
\[
\dot{S}^*(t_f,t;0) = -\bar{A}^T(t)S^*(t_f,t;0) - S^*(t_f,t;0)\bar{A}(t) + S^*(t_f,t;0)B(t)R^{-1}(t)B^T(t)S^*(t_f,t;0), \qquad S^*(t_f,t_f;0) = 0, \tag{5.176}
\]
where the $*$ superscript is used to denote dependence on the Riccati variable $S(t_f,t;S_f)$. For this problem, $Q(t)$ and $C(t)$ are now zero. The solution to this homogeneous Riccati equation with zero terminal condition is $S^*(t_f,t;0) = 0$ over the interval $t_0 \le t \le t_f$. The propagation of the linear differential equation (5.161) then gives
\[
F^T(t_f,t) = \Phi_{\bar{A}}^T(t_f,t)D^T. \tag{5.177}
\]
Using (5.177) in (5.164), the solution for $G(t_f,t_0)$ is
\[
G(t_f,t_0) = -DW_{\bar{A}}(t_f,t_0)D^T, \tag{5.178}
\]
where
\[
W_{\bar{A}}(t_f,t_0) = \int_{t_0}^{t_f}\Phi_{\bar{A}}(t_f,t)B(t)R^{-1}(t)B^T(t)\Phi_{\bar{A}}^T(t_f,t)\,dt \tag{5.179}
\]
is the controllability Grammian. Therefore, the invertibility of $G(t_f,t_0)$ does not depend upon controllability of the entire state space but depends only on the controllability to the desired terminal manifold $Dx(t_f) = 0$. Clearly, the invertibility of $G(t_f,t_0)$ is a controllability condition for $v(t)$ to reach the terminal manifold. Then, by (5.173), invertibility of $G(t_f,t_0)$ is also a controllability condition on $u(t)$.
Note that $F(t_f,t)$ and $G(t_f,t_0)$ are the same as those that would be generated by the original problem.

Theorem 5.5.2 If
\[
\dot{x}(t) = A(t)x(t) + B(t)u(t) \tag{5.180}
\]
is controllable on the interval $[t_0,t_f]$ and if
\[
u(t) = v(t) + \Lambda(t)x(t), \tag{5.181}
\]
then
\[
\dot{x}(t) = \left(A(t) + B(t)\Lambda(t)\right)x(t) + B(t)v(t) = \bar{A}(t)x(t) + B(t)v(t) \tag{5.182}
\]
is also controllable for any finite piecewise continuous $\Lambda(t)$ on the interval $[t_0,t_f]$.

Proof: We need to show that $x_0^T\Phi_{\bar{A}}(t_0,t)B(t) = 0$ for all $t$ in the interval implies that $x_0^T\Phi_A(t_0,t)B(t) = 0$, a contradiction to the controllability assumption. To do this, note that the controllability Grammian
\[
W_A(t_0,t_f) = \int_{t_0}^{t_f}\Phi_A(t_0,t)B(t)B^T(t)\Phi_A^T(t_0,t)\,dt \tag{5.183}
\]
can be obtained by integrating the linear matrix equation
\[
\dot{W}_A(t,t_f) = A(t)W_A(t,t_f) + W_A(t,t_f)A^T(t) - B(t)B^T(t), \qquad W_A(t_f,t_f) = 0. \tag{5.184}
\]
Similarly, the controllability Grammian for $\bar{A}(t)$ is obtained from
\[
\dot{W}_{\bar{A}}(t,t_f) = \bar{A}(t)W_{\bar{A}}(t,t_f) + W_{\bar{A}}(t,t_f)\bar{A}^T(t) - B(t)B^T(t), \qquad W_{\bar{A}}(t_f,t_f) = 0. \tag{5.185}
\]
Form the matrix
\[
E(t) = W_{\bar{A}}(t,t_f) - W_A(t,t_f). \tag{5.186}
\]
It has a linear differential equation
\[
\dot{E}(t) = \bar{A}(t)E(t) + E(t)\bar{A}^T(t) + B(t)\Lambda(t)W_A(t,t_f) + W_A(t,t_f)\Lambda^T(t)B^T(t), \qquad E(t_f) = 0. \tag{5.187}
\]
Then
\[
E(t_0) = -\int_{t_0}^{t_f}\Phi_{\bar{A}}(t_0,t)\left[B(t)\Lambda(t)W_A(t,t_f) + W_A(t,t_f)\Lambda^T(t)B^T(t)\right]\Phi_{\bar{A}}^T(t_0,t)\,dt. \tag{5.188}
\]
Since by hypothesis $x_0^T\Phi_{\bar{A}}(t_0,t)B(t) = 0$, then $x_0^TE(t_0)x_0 = 0$ by (5.188) and $x_0^TW_{\bar{A}}(t_0,t_f)x_0 = 0$ by the Grammian definition. But from (5.186) evaluated at $t_0$ this implies that
\[
x_0^TW_A(t_0,t_f)x_0 = 0, \tag{5.189}
\]
which is a contradiction of the assumed controllability of
\[
\dot{x}(t) = A(t)x(t) + B(t)u(t). \tag{5.190}
\]

From Theorem 5.5.2 we see that the controllability Assumption 5.3.1 implies the normality condition $G(t_f,t_0) < 0$. This observation implies that state feedback as given in (5.181) does not change controllability. When the controllability Assumption 5.3.1 for reaching the manifold (5.20) is restricted to complete controllability, then
\[
G(t_f,t) < 0 \quad \forall\, t \in [t_0,t_f). \tag{5.191}
\]

5.5.2 Necessary and Sufficient Conditions for the Positivity of the Terminally Constrained Quadratic Cost Criterion

The objective of this section is to show that a necessary and sufficient condition for the quadratic cost criterion (5.17) with linear terminal constraints (5.20) to be positive definite when $x(t_0) = 0$ is that the Riccati variable $\bar{S}(t_f,t)$ exist for all $t$ in the interval $[t_0,t_f)$. The results of this section closely parallel those of Section 5.4.4 for the unconstrained problem.

First, it is shown that if there exists an $\bar{S}(t_f,t)$ defined by (5.167) for all $t$ in the interval $[t_0,t_f)$, then $J(u(\cdot);0,t_0)$ is positive definite. Consider adding to $J(u(\cdot);0,t_0)$ of (5.17) the identically zero quantity
\[
-\frac{1}{2}\left[x^T(t),\ \nu^T\right]\begin{bmatrix} S(t_f,t;S_f) & F^T(t_f,t) \\ F(t_f,t) & G(t_f,t) \end{bmatrix}\begin{bmatrix} x(t) \\ \nu \end{bmatrix}\Bigg|_{t_0}^{t_f} + \frac{1}{2}\int_{t_0}^{t_f}\frac{d}{dt}\left\{\left[x^T(t),\ \nu^T\right]\begin{bmatrix} S(t_f,t;S_f) & F^T(t_f,t) \\ F(t_f,t) & G(t_f,t) \end{bmatrix}\begin{bmatrix} x(t) \\ \nu \end{bmatrix}\right\}dt = 0. \tag{5.192}
\]
By using the Riccati equation (5.78), the propagation equation (5.161) for $F(t_f,t)$, and (5.164) for $G(t_f,t)$, the cost criterion (5.17) can be manipulated into a perfect square as
\[
J(u(\cdot);x(t_0),t_0) = \frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) + x^T(t_0)F^T(t_f,t_0)\nu + \frac{1}{2}\nu^TG(t_f,t_0)\nu - x^T(t_f)D^T\nu + \frac{1}{2}\int_{t_0}^{t_f}\left\|u(t) + R^{-1}(t)\left[\left(C(t) + B^T(t)S(t_f,t;S_f)\right)x(t) + B^T(t)F^T(t_f,t)\nu\right]\right\|_{R(t)}^2 dt. \tag{5.193}
\]
One difficulty occurs at $t = t_f$, where $\nu$ cannot be determined in terms of $x(t_f)$. Since the system is assumed completely controllable by Assumption 5.4.1, $G(t_f,t_f-\Delta)$ is invertible for any $\Delta > 0$. For small enough $\Delta$, $S(t_f,t_f-\Delta;S_f)$ exists. Therefore, the optimal control in the interval $t_f - \Delta \le t \le t_f$ is open-loop over that interval, given by
\[
u^o(t) = -R^{-1}(t)\left\{\left[C(t)\ \ B^T(t)\right]\Phi_H(t,t_f-\Delta)\begin{bmatrix} I \\ S(t_f,t_f-\Delta;S_f) \end{bmatrix} - B^T(t)F^T(t_f,t)G^{-1}(t_f,t_f-\Delta)F(t_f,t_f-\Delta)\right\}x(t_f-\Delta), \tag{5.194}
\]
where $\nu$ is determined by (5.165) evaluated at $t_0 = t_f - \Delta$. Given Assumption 5.4.1, this open-loop control will satisfy the terminal boundary conditions. Since all the factors in (5.194) remain finite in $[t_f-\Delta, t_f]$, $u^o(t)$ remains finite in that interval. In the interval $t_0 \le t \le t_f - \Delta$ the optimal control is given by (5.168). By using the optimal control given by (5.194) for $t_f - \Delta \le t \le t_f$ and (5.168) for $t_0 \le t \le t_f - \Delta$ to replace the control $u(\cdot)$, the integral part of the cost in (5.193) becomes zero and is at its minimum value. With $x(t_0) = 0$, the optimal control is the null control. Any other control which satisfies the terminal boundary condition $Dx(t_f) = 0$ and is unequal to $u^o(\cdot)$ will give a positive value to $J(u(\cdot);0,t_0)$.

The arguments for the existence of $\bar{S}(t_f,t)$ over the interval $[t_0,t_f)$, given that the cost $J(u(\cdot);0,t_0)$ is positive definite, are the same as those given in Section 5.4.4. That is, for some $t'$ close enough to $t_f$, $\bar{S}(t_f,t)$ exists for $t' \le t < t_f$. Therefore, the optimal control laws (5.194) or (5.168) apply, and the optimal cost starting at some $x(t')$ is
\[
J(u^o(\cdot);x(t'),t') = \frac{1}{2}x^T(t')\bar{S}(t_f,t')x(t'). \tag{5.195}
\]
By applying a control suggested by (5.98), controllability, and positivity of the cost for $x(t_0) = 0$, the arguments of Section 5.4.4 imply that since $x^T(t')\bar{S}(t_f,t')x(t')$ can go to neither positive nor negative infinity for any finite $x(t')$ and for any $t'$ in the interval $[t_0,t_f)$, then $\bar{S}(t_f,t')$ exists for all $t'$ in $[t_0,t_f)$. These results are summarized as Theorem 5.5.3.

Theorem 5.5.3 Given Assumption 5.4.1, a necessary and sufficient condition for $J(u(\cdot);0,t_0)$ to be positive definite for the class of controls which satisfy the terminal constraint (5.20) is that there exist a function $\bar{S}(t_f,t)$ for all $t$ in $[t_0,t_f)$ which satisfies the Riccati equation (5.78).

Remark 5.5.1 Construction of $\bar{S}(t_f,t)$ by (5.167) from its component parts may not be possible, since $S(t_f,t;S_f)$ may not exist where $\bar{S}(t_f,t)$ does exist.

Remark 5.5.2 Theorem 5.5.3 can be extended to show that $J(u(\cdot);0,t_0)$ is strongly positive by a proof similar to that used in Theorem 5.4.3 of Section 5.4.5.

Remark 5.5.3 Note that the optimal value function for the LQ problem with terminal constraints is given as
\[
V(x(t),t) = \frac{1}{2}x^T(t)\bar{S}(t_f,t)x(t). \tag{5.196}
\]
This satisfies the Hamilton–Jacobi–Bellman equation given in Theorem 4.7.1 of Section 4.7.

Example 5.5.1 (Shortest distance between two points on a sphere) In this example the minimum distance problem is generalized from the unconstrained terminal problem, given in Example 5.4.1, to the terminally constrained problem of reaching a given terminal point. The problem is to find the control $u$ that minimizes
\[
J = \int_0^{\phi_1}\left(u^2 + \cos^2\theta\right)^{1/2}d\phi \tag{5.197}
\]
subject to (5.111) at fixed terminal $\phi = \phi_1$ with the constraint $\theta(\phi_1) = 0$.
The first-order necessary conditions are
\[
\frac{d\theta}{d\phi} = H_\lambda = u, \qquad \theta(0) = 0, \quad \theta(\phi_1) = 0, \tag{5.198}
\]
\[
-\dot{\lambda} = H_\theta = -\left(u^2 + \cos^2\theta\right)^{-1/2}\sin\theta\cos\theta, \qquad \lambda(\phi_1) = \nu, \tag{5.199}
\]
\[
H_u = \lambda + u\left(u^2 + \cos^2\theta\right)^{-1/2} = 0. \tag{5.200}
\]
The first-order necessary conditions are satisfied by the trajectory $u(\phi) = 0$, $\lambda(\phi) = 0$ ($\nu = 0$), $\theta(\phi) = 0$ for $0 \le \phi \le \phi_1$. About this extremal path, the second-order necessary conditions are generated. The second variational problem given in Section 5.2 is to find the perturbed control $\delta u$ that minimizes
\[
2\delta^2 J = \int_0^{\phi_1}\left[(\delta u)^2 - (\delta\theta)^2\right]d\phi \tag{5.201}
\]
subject to $\frac{d\,\delta\theta}{d\phi} = \delta u$, $\delta\theta(0) = 0$, for fixed terminal $\phi = \phi_1$ with the terminal constraint $\delta\theta(\phi_1) = 0$. From Theorem 5.5.3 it is necessary that the solution of the associated Riccati equation exist. Since for this example $A = 0$, $B = 1$, $C = 0$, $Q = -1$, $R = 1$, the solution of the associated Riccati equation is constructed from the following equations and solutions:
\[
\frac{dS}{d\phi} = S^2 + 1, \qquad S(\phi_1) = 0 \;\Rightarrow\; S(\phi) = -\tan(\phi_1 - \phi), \tag{5.202}
\]
\[
\frac{dF}{d\phi} = SF, \qquad F(\phi_1) = 1 \;\Rightarrow\; F(\phi) = \sec(\phi_1 - \phi), \tag{5.203}
\]
\[
\frac{dG}{d\phi} = F^2, \qquad G(\phi_1) = 0 \;\Rightarrow\; G(\phi) = -\tan(\phi_1 - \phi). \tag{5.204}
\]
From these solutions for $S(\phi)$, $F(\phi)$, and $G(\phi)$ the associated Riccati solution is constructed as
\[
\bar{S}(\phi) = S(\phi) - F^2(\phi)G^{-1}(\phi) = \cot(\phi_1 - \phi). \tag{5.205}
\]
Note that the solution remains finite until it escapes at $\phi_1 - \phi = \pi$. At that point any great circle path from the point to the terminal point will give the same cost. If the path is longer than $\pi$, then there are paths, which do not even satisfy the first-order necessary conditions, that can give smaller cost. The second variational controller is $\delta u = -\cot(\phi_1 - \phi)\,\delta\theta$. Note that the gain is initially positive, indicating that the best neighboring optimum controller is essentially divergent for $\pi/2 < \phi_1 - \phi < \pi$, but becomes convergent over the interval $0 < \phi_1 - \phi \le \pi/2$.
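The construction (5.202)-(5.205) is easily verified numerically. A self-contained sketch with an illustrative $\phi_1$; note that per Remark 5.5.1, for $\phi_1 - \phi > \pi/2$ the component $S$ escapes even though $\bar{S}$ exists, so a direct check of the components needs $\phi_1 < \pi/2$:

```python
import numpy as np
from scipy.integrate import solve_ivp

phi1 = 1.2     # illustrative; below pi/2 so S, F, G themselves stay finite
def rhs(p, z):
    S, F, G = z
    return [S**2 + 1.0, S * F, F**2]          # (5.202), (5.203), (5.204)

sol = solve_ivp(rhs, (phi1, 0.0), [0.0, 1.0, 0.0],
                dense_output=True, rtol=1e-10, atol=1e-12)
p = np.linspace(0.0, phi1 - 0.05, 40)          # stay away from G(phi1) = 0
S, F, G = sol.sol(p)
# S_bar = S - F^2 / G reproduces cot(phi1 - phi), Equation (5.205).
assert np.allclose(S - F**2 / G, 1.0 / np.tan(phi1 - p), rtol=1e-6, atol=1e-8)
```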
5.6 Solution of the Matrix Riccati Equation: Additional Properties

Three important properties of the solution to the matrix Riccati equation are given here. First, it is shown that if the terminal weight in the cost criterion is increased, the solution to the corresponding Riccati equation is also increased. This implies, for example, that $S(t_f,t;S_f) \le \bar{S}(t_f,t)$. Second, by restricting certain matrices to be positive semidefinite, $S(t_f,t;S_f)$ is shown to be nonnegative definite and bounded. Finally, if $S_f = 0$ and the above restrictions hold, $S(t_f,t;0)$ is monotonically increasing as $t_f$ increases. This is extremely important for the next section, where we analyze the infinite-time LQ problem with constant coefficients.

Theorem 5.6.1 If $S_f^1$ and $S_f^2$ are two terminal weights in the cost criterion (5.17) such that $S_f^1 - S_f^2 \ge 0$, then the difference $S(t_f,t;S_f^1) - S(t_f,t;S_f^2) \ge 0$ for all $t \le t_f$ where $S(t_f,t;S_f^2)$ exists, and where $S(t_f,t;S_f^1)$ and $S(t_f,t;S_f^2)$ are solutions of (5.78) for $S_f^1$ and $S_f^2$, respectively.

Proof: First, the unconstrained optimization problem of (5.17) and (5.18) is converted, as in Theorem 5.5.1, to an equivalent problem. This is done precisely in the manner used to obtain (5.96). However, now we are concerned with two problems having terminal weights $S_f^1$ and $S_f^2$, respectively. Therefore, (5.95) is indexed with a superscript 2 and added to a cost criterion having a terminal weight $S_f^1$. The result is that the optimal cost for weight $S_f^1$ is
\[
\frac{1}{2}x^T(t_0)S(t_f,t_0;S_f^1)x(t_0) = \frac{1}{2}x^T(t_0)S(t_f,t_0;S_f^2)x(t_0) + \min_{u(\cdot)}\Big\{\frac{1}{2}x^T(t_f)\left(S_f^1 - S_f^2\right)x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\left[u(t) + R^{-1}(t)\left(C(t) + B^T(t)S(t_f,t;S_f^2)\right)x(t)\right]^T R(t)\left[u(t) + R^{-1}(t)\left(C(t) + B^T(t)S(t_f,t;S_f^2)\right)x(t)\right]dt\Big\}. \tag{5.206}
\]
Clearly, a new problem results if again
\[
v(t) = u(t) + R^{-1}(t)\left[C(t) + B^T(t)S(t_f,t;S_f^2)\right]x(t), \tag{5.207}
\]
where it is assumed that $S(t_f,t;S_f^2)$ exists, as
\[
\frac{1}{2}x_0^T\left[S(t_f,t_0;S_f^1) - S(t_f,t_0;S_f^2)\right]x_0 = \min_{v(\cdot)}\left\{\frac{1}{2}x^T(t_f)\left(S_f^1 - S_f^2\right)x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}v^T(t)R(t)v(t)\,dt\right\}. \tag{5.208}
\]
Since $S_f^1 - S_f^2$ is nonnegative definite, the optimal cost must be nonnegative, implying that
\[
x_0^T\left[S(t_f,t_0;S_f^1) - S(t_f,t_0;S_f^2)\right]x_0 \ge 0 \tag{5.209}
\]
for all $x_0$ and $t_0 \le t_f$.

Remark 5.6.1 Since this new control problem (5.208) has an optimal cost $\frac{1}{2}x_0^TS^*(t_f,t_0;S_f^1 - S_f^2)x_0$, a relationship exists as
\[
S(t_f,t_0;S_f^1) = S(t_f,t_0;S_f^2) + S^*(t_f,t_0;S_f^1 - S_f^2), \tag{5.210}
\]
where $S^*(t_f,t;S_f^1 - S_f^2)$ satisfies the homogeneous Riccati equation
\[
\frac{d}{dt}S^* = -\bar{A}^T(t)S^* - S^*\bar{A}(t) + S^*B(t)R^{-1}(t)B^T(t)S^*, \qquad S^*(t_f,t_f;S_f^1 - S_f^2) = S_f^1 - S_f^2, \tag{5.211}
\]
with
\[
\bar{A}(t) = A(t) - B(t)R^{-1}(t)\left[C(t) + B^T(t)S(t_f,t;S_f^2)\right]. \tag{5.212}
\]
Note that if $S^*(t_f,t;S_f^1 - S_f^2)$ has an inverse, then a linear matrix differential equation for $S^{*-1}(t_f,t;S_f^1 - S_f^2)$ results by simply differentiating $S^*(t_f,t;S_f^1 - S_f^2)S^{*-1}(t_f,t;S_f^1 - S_f^2) = I$.

Remark 5.6.2 From (5.167), the difference between the constrained Riccati matrix $\bar{S}(t_f,t)$ and the unconstrained Riccati matrix $S(t_f,t;S_f)$ is
\[
\bar{S}^*(t_f,t) = -F^T(t_f,t)G^{-1}(t_f,t)F(t_f,t) \ge 0 \tag{5.213}
\]
for $t \in [t_0,t_f)$, since $G(t_f,t) < 0$ for $t \in [t_0,t_f)$ by virtue of Theorem 5.5.2. Furthermore, the differential equation for $\bar{S}^*(t_f,t)$ is that of (5.211), but the boundary condition at $t_f$ is not defined. Intuitively, the constrained problem can be thought of as a limit of the unconstrained problem in which certain elements of the terminal weighting are allowed to go to infinity.

Existence of the solution to the Riccati equation is of central importance. Results using the following restrictive but useful assumption guarantee not only that $S(t_f,t;S_f)$ exists but that it is nonnegative definite.

Assumption 5.6.1 $S_f \ge 0$ and $Q(t) - C^T(t)R^{-1}(t)C(t) \ge 0$ for all $t$ in the interval $[t_0,t_f]$.

Theorem 5.6.2 Given Assumptions 5.2.1, 5.4.1, and 5.6.1, the solution $S(t_f,t;S_f)$ to the Riccati equation (5.78) exists on the interval $t_0 \le t \le t_f$, regardless of $t_0$, and is nonnegative definite.

Proof: From (5.97), the minimum cost is related to the Riccati variable, if it exists, for an arbitrary initial state as
\[
\frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) = \min_{u(\cdot)}\left\{\frac{1}{2}x^T(t_f)S_f x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\left[x^T(t)Q(t)x(t) + 2u^T(t)C(t)x(t) + u^T(t)R(t)u(t)\right]dt\right\}. \tag{5.214}
\]
Let us make a change of controls of the form
\[
u(t) = v(t) - R^{-1}(t)C(t)x(t). \tag{5.215}
\]
The cost can now be converted to the equivalent form
\[
\frac{1}{2}x^T(t_0)S(t_f,t_0;S_f)x(t_0) = \min_{v(\cdot)}\left\{\frac{1}{2}x^T(t_f)S_f x(t_f) + \frac{1}{2}\int_{t_0}^{t_f}\left[x^T(t)\left(Q(t) - C^T(t)R^{-1}(t)C(t)\right)x(t) + v^T(t)R(t)v(t)\right]dt\right\}, \tag{5.216}
\]
where the cross term between $u(t)$ and $x(t)$ is eliminated. Since $R(t)$ is positive definite by Assumption 5.2.1, and by Assumption 5.6.1 $S_f$ and $Q(t) - C^T(t)R^{-1}(t)C(t)$ are nonnegative definite, the cost must be nonnegative for all $x(t_0)$, i.e.,
\[
x^T(t_0)S(t_f,t_0;S_f)x(t_0) \ge 0 \;\Rightarrow\; S(t_f,t_0;S_f) \ge 0, \tag{5.217}
\]
regardless of $t_0$. Furthermore, since the original system is controllable with respect to $u$, then by Theorem 5.5.2 the new system
\[
\dot{x}(t) = \left[A(t) - B(t)R^{-1}(t)C(t)\right]x(t) + B(t)v(t) \tag{5.218}
\]
must be controllable with respect to $v(t)$. Therefore, the cost is bounded from above and below, even if $t_0$ goes to $-\infty$.

The final two theorems deal with the monotonic and asymptotic behavior of the Riccati equation under Assumptions 5.2.1, 5.4.1, and 5.6.1. First, it is shown that when $S_f = 0$, $S(t_f,t;S_f)$ is a monotonically increasing function of $t_f$. Then, it is shown that $S^*(t_f,t;S_f)$ goes asymptotically to zero as a function of $t$ as $t \to -\infty$, regardless of the boundary condition $S_f$. This means that $S(t_f,t_0;S_f)$ and $\bar{S}(t_f,t_0)$ approach $S(t_f,t_0;0)$ as the difference $t_f - t_0$ goes to infinity.

Theorem 5.6.3 Given Assumptions 5.2.1, 5.4.1, and 5.6.1,
\[
S(t_f,t_0;0) \ge S(t_1,t_0;0) \quad \text{for } t_0 \le t_1 \le t_f. \tag{5.219}
\]
Proof: The optimal cost criterion can be written as
\[
\frac{1}{2}x^T(t_0)S(t_f,t_0;0)x(t_0) = \min_{u(\cdot)}\frac{1}{2}\int_{t_0}^{t_f}\left[x^T(t)Q(t)x(t) + 2u^T(t)C(t)x(t) + u^T(t)R(t)u(t)\right]dt
\]
\[
= \min_{u(\cdot)}\left\{\frac{1}{2}\int_{t_0}^{t_1}\left[x^T(t)Q(t)x(t) + 2u^T(t)C(t)x(t) + u^T(t)R(t)u(t)\right]dt + \frac{1}{2}x^T(t_1)S(t_f,t_1;0)x(t_1)\right\}
\]
\[
= \frac{1}{2}x^T(t_0)S(t_1,t_0;S(t_f,t_1;0))x(t_0). \tag{5.220}
\]
From (5.210),
\[
x^T(t_0)S(t_f,t_0;0)x(t_0) = x^T(t_0)S(t_1,t_0;S(t_f,t_1;0))x(t_0) = x^T(t_0)\left[S(t_1,t_0;0) + S^*(t_1,t_0;S(t_f,t_1;0))\right]x(t_0). \tag{5.221}
\]
Since $S(t_f,t_1;0) \ge 0$ by Theorem 5.6.2, then $S^*(t_1,t_0;S(t_f,t_1;0)) \ge 0$. Therefore,
\[
x_0^TS(t_f,t_0;0)x_0 \ge x_0^TS(t_1,t_0;0)x_0. \tag{5.222}
\]
By Assumption 5.4.1, $x_0^TS(t_f,t_0;0)x_0$ is bounded for all $x_0$ regardless of $t_0 \le t_f$, implying
\[
S(t_f,t_0;0) \ge S(t_1,t_0;0). \tag{5.223}
\]

In the second theorem we show that $S^*(t_f,t_0;S_f)$ goes to zero for all $S_f \ge 0$ as $t_0$ goes to $-\infty$ by requiring an observability assumption. If the cost with zero terminal weight is nonzero for all $x(t_0) \ne 0$, $t_0 < t_f$, then $x^T(t_0)S(t_f,t_0;0)x(t_0) > 0$ for all $x(t_0) \ne 0$. This condition will be shown to be guaranteed by ensuring that $y(t)$ given by
\[
y(t) = N(t)x(t) \tag{5.224}
\]
is observable, where $N(t)$ is the square root of the matrix $\tilde{Q}(t) = Q(t) - C^T(t)R^{-1}(t)C(t)$, assumed to be nonnegative, so that $y^T(t)y(t) = x^T(t)\tilde{Q}(t)x(t)$.

Assumption 5.6.2 The dynamic system
\[
\dot{x}(t) = A(t)x(t), \quad x(t_0) = x_0, \qquad y(t) = N(t)x(t) \tag{5.225}
\]
is completely observable on the interval $[t,t']$, where $t_0 < t < t' \le t_f$. If the system is completely observable, then the initial state can be determined as
\[
x(t_0) = M^{-1}(t_0,t')\int_{t_0}^{t'}\Phi_A^T(t,t_0)N^T(t)y(t)\,dt, \tag{5.226}
\]
where the observability Grammian matrix
\[
M(t_0,t') = \int_{t_0}^{t'}\Phi_A^T(t,t_0)N^T(t)N(t)\Phi_A(t,t_0)\,dt \tag{5.227}
\]
is invertible for all $t'$ in the interval $t_0 < t' \le t_f$.
Theorem 5.6.4 Given Assumptions 5.4.1, 5.6.1, and 5.6.2, and \bar A(t) defined in (5.172) with S_f = 0, the following hold:

(a) S(t_f,t_0;0) > 0 for all t_0 in -\infty < t_0 < t_f;
(b) \dot x(t) = \bar A(t) x(t), x(t_0) = x_0, is asymptotically stable;
(c) S^*(t_f,t_0;S_f) \to 0 as t_0 \to -\infty for all S_f \ge 0.

Proof: From Theorem 5.6.2, S(t_f,t_0;0) \ge 0 and bounded. By Assumption 5.6.2, using (5.216), for all x_0 \ne 0,

x_0^T S(t_f,t_0;0) x_0 > 0   (5.229)

for all t_0 in -\infty < t_0 < t_f. This results in S(t_f,t_0;0) > 0.

Let us now use the optimal cost function (5.229) as a Lyapunov function to determine whether x(t) in condition (b) is asymptotically stable. First, determine whether the rate of change of the optimal cost function is negative definite:

\frac{d}{dt}\big[x^T(t) S(t_f,t;0) x(t)\big] = x^T(t)\big[\bar A^T(t) S(t_f,t;0) + S(t_f,t;0)\bar A(t) + \dot S(t_f,t;0)\big] x(t).   (5.230)

By using (5.78) with S_f = 0,

\frac{d}{dt}\big[x^T(t) S(t_f,t;0) x(t)\big] = -x^T(t)\big[\bar Q(t) + S(t_f,t;0) B(t) R^{-1}(t) B^T(t) S(t_f,t;0)\big] x(t),   (5.231)

where \bar Q = Q - C^T R^{-1} C as above. Therefore, \frac{d}{dt}\big[x^T(t) S(t_f,t;0) x(t)\big] \le 0 and the optimal cost function (5.229) is a Lyapunov function for t < t_f. By integrating (5.231),

x_0^T S(t_f,t_0;0) x_0 - x^T(t_1) S(t_f,t_1;0) x(t_1) = \int_{t_0}^{t_1} x^T(t)\big[\bar Q(t) + S(t_f,t;0) B(t) R^{-1}(t) B^T(t) S(t_f,t;0)\big] x(t)\, dt,   (5.232)

where t_0 < t < t_f such that S(t_f,t;0) > 0. If |x(t_1)| \ne 0, then |x(t)| \ne 0 for all t \in [t_0,t_1], since x^T(t) S(t_f,t;0) x(t) is a Lyapunov function. Therefore, by Assumption 5.6.2, as t_1 - t_0 \to \infty the right-hand side of (5.232) goes to \infty, implying that x_0^T S(t_f,t_0;0) x_0 \to \infty. But this contradicts the fact that S(t_f,t_0;0) < \infty by Theorem 5.6.2 with x_0 given and assumed finite. Therefore, |x(t_1)| \to 0 as t_1 - t_0 \to \infty for any S_f \ge 0.

We now consider condition (c). In Remark 5.6.2 after Theorem 5.6.1, it was noted that \bar S^*(t_f,t), defined in (5.213), satisfies the homogeneous Riccati equation (5.211). However, if the boundary conditions for F(t_f,t_f) and G(t_f,t_f) are chosen to be consistent with S^*(t_f,t_f;S_f) = S_f, then the solution and the behavior of S^*(t_f,t;S_f) can be determined from F(t_f,t) and G(t_f,t). By writing

S_f = K K^T,   (5.233)

then

F(t_f,t_f) = K, \qquad G(t_f,t_f) = -I.   (5.234)

Note that F(t_f,t) satisfies, from (5.161), the differential equation

\dot F(t_f,t) = -F(t_f,t)\,\bar A(t), \qquad F(t_f,t_f) = K.   (5.235)

From condition (b) of the theorem, F(t_f,t) is stable and approaches zero as t goes to -\infty. From (5.164) and (5.234), G(t_f,t) must remain negative definite for t \le t_f, implying that G^{-1}(t_f,t) exists for all t \le t_f. Since F(t_f,t) \to 0 as t \to -\infty, S^*(t_f,t;S_f) \to 0 as t \to -\infty.

Remark 5.6.3 Note that S(t_f,t;S_f) \to S(t_f,t;0) as t \to -\infty for all S_f \ge 0.

Remark 5.6.4 The assumptions of controllability and observability are stronger conditions than are usually needed. For example, if certain states are not controllable but naturally decay, then stabilizability of x(t) is sufficient for condition (b) of Theorem 5.6.4 to still hold.
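Both the monotone growth of Theorem 5.6.3 and the stability of condition (b) are visible in a short computation. The sketch below (reusing the illustrative A, B, Q, R from above, with C = 0, all assumptions) integrates the Riccati equation backward over increasing horizons.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (assumptions), C = 0 so Abar = A - B R^{-1} B^T S.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R_inv = np.eye(2), np.eye(1)

def rhs(t, s_flat):
    S = s_flat.reshape(2, 2)
    return (-Q - S @ A - A.T @ S + S @ B @ R_inv @ B.T @ S).ravel()

x0 = np.array([1.0, 1.0])
for horizon in [1.0, 2.0, 4.0, 8.0, 16.0]:
    sol = solve_ivp(rhs, [horizon, 0.0], np.zeros(4), rtol=1e-10, atol=1e-12)
    S = sol.y[:, -1].reshape(2, 2)
    print(horizon, x0 @ S @ x0)    # monotonically increasing and saturating

Abar = A - B @ R_inv @ B.T @ S     # feedback dynamics for the longest horizon
print(np.linalg.eigvals(Abar))     # negative real parts: condition (b)
```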
5.7 LQ Regulator Problem

The LQ control problem is restricted in this section to a constant-coefficient dynamic system and cost criterion. Furthermore, the time interval over which the cost criterion is to be minimized is assumed to be infinite. As might be suspected from the previous results, a linear constant-gain controller results from this restricted formulation. This specialized problem is sometimes referred to as the linear quadratic regulator (LQR) problem. The optimal control problem of Section 5.2 is specialized to the linear regulator problem by requiring that t_f be infinite and that A, B, Q, C, and R all be constant matrices.

Theorem 5.7.1 For the LQR problem, given Assumptions 5.4.1, 5.6.1, and 5.6.2, there is a unique, symmetric, positive-definite solution S to the algebraic Riccati equation (ARE)

(A - BR^{-1}C)^T S + S(A - BR^{-1}C) + (Q - C^T R^{-1} C) - S B R^{-1} B^T S = 0   (5.236)

such that A - BR^{-1}C - BR^{-1}B^T S has only eigenvalues with negative real parts.

Proof: From Theorem 5.6.3, S(t_f,t_0;0) is monotonic in t_f. Since the parameters are not time dependent, S(t_f,t_0;0) depends only upon t_f - t_0 and is monotonically increasing with respect to t_f - t_0. Since from Theorem 5.6.2 S(t_f,t_0;0) is bounded for all t_f - t_0, then as t_f - t_0 \to \infty, S(t_f,t_0;0) reaches an upper limit S. As S(t_f,t_0;0) approaches S, \dot S(t_f,t;0) approaches zero, implying that S must satisfy the ARE (5.236). That is, for some \Delta > 0,

S(t_f, t_0 - \Delta; 0) = S\big(t_f - \Delta, t_0 - \Delta; S(t_f, t_f - \Delta; 0)\big) = S\big(t_f, t_0; S(t_0, t_0 - \Delta; 0)\big),   (5.237)

where the time invariance of the system is used to shift the time, i.e., S(t_f, t_f - \Delta; 0) = S(t_0, t_0 - \Delta; 0) and t_f - \Delta, t_0 - \Delta become t_f, t_0. Continuity of the solution with respect to the initial conditions implies that as \Delta \to \infty, S(t_0, t_0 - \Delta; 0) and S(t_f, t_0 - \Delta; 0) go to S, such that (5.237) becomes

S = S(t_f, t_0; S),   (5.238)

and S is a fixed-point solution to the autonomous (time-invariant) Riccati equation. Furthermore, by condition (c) of Theorem 5.6.4, S(t_f,t_0;S_f) approaches the same limit regardless of S_f \ge 0 and, therefore, S is unique. By conditions (a) and (b) of Theorem 5.6.4, S is positive definite and x is asymptotically stable. Since S is a constant, this implies that the eigenvalues of the constant matrix

\bar A = A - BR^{-1}C - BR^{-1}B^T S   (5.239)

have only negative real parts.

The relationship between the Hamiltonian matrix H of (5.48) and the feedback dynamic matrix \bar A of (5.239) is vividly obtained by using the canonical transformation introduced in Section 5.4.3. First, additional properties of Hamiltonian systems are obtained for the constant Hamiltonian matrix by rewriting (5.63) as

H = -J^T H^T J.   (5.240)

The characteristic equations for H and -H can be obtained by subtracting \lambda I from both sides of (5.240) and taking the determinant:

\det(H - \lambda I) = \det(-J^T H^T J - \lambda I) = \det(-J^T H^T J - \lambda J^T J) = \det J^T \det(-H - \lambda I) \det J = \det(-H - \lambda I).   (5.241)

Since the characteristic equations for H and -H are equal, the eigenvalues of the 2n \times 2n matrix H are symmetric not only about the real axis but also about the imaginary axis. If n of the eigenvalues are \lambda_i, i = 1, \dots, n, then the remaining n eigenvalues are

\lambda_{i+n} = -\lambda_i, \quad i = 1, \dots, n.   (5.242)

From (5.242), it is seen that there are just as many stable eigenvalues as unstable ones. Furthermore, there is a question as to how many eigenvalues lie on the imaginary axis.
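The symmetry (5.242) is easy to confirm numerically. A small check follows, assuming the same illustrative data as above and C = 0, so that the Hamiltonian matrix takes the block form with A and -A^T on its diagonal.

```python
import numpy as np

# Illustrative data (assumptions); with C = 0 the Hamiltonian matrix is
# H = [[A, -B R^{-1} B^T], [-Q, -A^T]].
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R_inv = np.eye(2), np.eye(1)

H = np.block([[A, -B @ R_inv @ B.T],
              [-Q, -A.T]])
eigs = np.linalg.eigvals(H)
print(np.sort_complex(eigs))   # every eigenvalue lambda is paired with -lambda
```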
To better understand the spectral content of H, the canonical transformation of (5.84) is used with the steady-state S, which is the solution to the ARE (5.236). By using this canonical transformation, similar to (5.84), the transformed Hamiltonian matrix is

L H L^{-1} = \bar H = \begin{bmatrix} \bar A & -BR^{-1}B^T \\ 0 & -\bar A^T \end{bmatrix}.   (5.243)

This form is particularly interesting because \bar A and -\bar A^T contain all the spectral information of both \bar H and H, since L is a similarity transformation. Note that from Theorem 5.7.1 the real parts of the eigenvalues of \bar A are negative. Therefore, the feedback dynamic matrix \bar A associated with S > 0 contains all the left-half-plane poles of H. The numerous solutions to the ARE, of which it will be shown that S > 0 is only one of many, give various groupings of the eigenvalues of the Hamiltonian matrix. If the Hamiltonian matrix has no eigenvalues on the imaginary axis, then there is at most one solution to the matrix Riccati equation which decomposes the matrix such that the eigenvalues of \bar A have only negative real parts. This is the case even if we are not restricted to Assumptions 5.4.1, 5.6.1, and 5.6.2. The following theorem from Brockett [8] demonstrates directly that if the real parts of the eigenvalues of \bar A are negative, then S is unique.

Theorem 5.7.2 For the LQR problem, there is at most one symmetric solution of S A + A^T S - S B R^{-1} B^T S + Q = 0 having the property that the eigenvalues of A - B R^{-1} B^T S have only negative real parts.

Proof: Assume, to the contrary, that there are two symmetric solutions, S_1 and S_2, such that the eigenvalues of A_1 = A - BR^{-1}B^T S_1 and A_2 = A - BR^{-1}B^T S_2 both have only negative real parts. By proceeding as in (5.94) to (5.96), the cost can be written as

J(u(\cdot); x(t_0), t_0) = \frac{1}{2}\big[x^T(t_0) S_1 x(t_0) - x^T(t_f) S_1 x(t_f)\big] + \int_{t_0}^{t_f} \frac{1}{2}\big[u + R^{-1}(C + B^T S_1) x\big]^T R \big[u + R^{-1}(C + B^T S_1) x\big]\, dt
  = \frac{1}{2}\big[x^T(t_0) S_2 x(t_0) - x^T(t_f) S_2 x(t_f)\big] + \int_{t_0}^{t_f} \frac{1}{2}\big[u + R^{-1}(C + B^T S_2) x\big]^T R \big[u + R^{-1}(C + B^T S_2) x\big]\, dt.   (5.244)

Since S_1 and S_2 are symmetric and S_1 \ne S_2, there exists an x_0 such that x_0^T S_1 x_0 \ne x_0^T S_2 x_0. Suppose that x_0^T S_1 x_0 \ge x_0^T S_2 x_0 and let u(t) = -R^{-1}(C + B^T S_2) x(t). Taking the limit as t_f goes to infinity, x(t_f) goes to zero by the assumption that A_1 and A_2 are stable, and the cost can be written as

J(u(\cdot); x(t_0), t_0) = \frac{1}{2} x^T(t_0) S_2 x(t_0) = \frac{1}{2} x^T(t_0) S_1 x(t_0) + \int_{t_0}^{\infty} \frac{1}{2}\big[B^T(S_1 - S_2) x(t)\big]^T R^{-1} \big[B^T(S_1 - S_2) x(t)\big]\, dt,   (5.245)

which contradicts the hypothesis that x_0^T S_1 x_0 \ge x_0^T S_2 x_0 and, therefore, the hypothesis that there can be two distinct solutions to the ARE which both produce stable dynamic matrices A_1 and A_2.

Remark 5.7.1 Since Assumptions 5.4.1, 5.6.1, and 5.6.2 are not required, the solution to the ARE is not necessarily nonnegative definite. Furthermore, requiring Assumptions 5.4.1, 5.6.1, and 5.6.2 implies that as t \to -\infty, \bar S(t_f,t) \to S(t_f,t;0). This is not generally the case. From Theorem 5.5.3, \bar S(t_f,t) \ge S(t_f,t;0). It can well occur that S(t_f,t;0) has a finite escape time, whereas \bar S(t_f,t) remains bounded even as t \to -\infty. For the regulator problem, the unconstrained terminal control problem may not have a finite cost. However, by requiring that \lim_{t_f \to \infty} x(t_f) \to 0, the terminally constrained problem may have a finite cost, where \lim_{t \to -\infty} \bar S(t_f,t) \to S.
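In practice the stabilizing ARE solution of Theorems 5.7.1 and 5.7.2 is computed directly. A minimal sketch, using scipy's continuous-time ARE solver on the same illustrative data (an assumption) with C = 0:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative data (assumptions), C = 0.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

S = solve_continuous_are(A, B, Q, R)           # unique stabilizing solution
residual = S @ A + A.T @ S - S @ B @ np.linalg.inv(R) @ B.T @ S + Q
print(np.linalg.norm(residual))                # ~ 0: S satisfies the ARE
print(np.linalg.eigvalsh(S))                   # S > 0
print(np.linalg.eigvals(A - B @ np.linalg.inv(R) @ B.T @ S))  # Re(lambda) < 0
```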
Remark 5.7.2 It is clear from the canonical transformation of the Hamiltonian matrix H into \bar H of (5.243) that a necessary condition for all the eigenvalues of \bar A to have negative real parts is that H have no eigenvalues with zero real parts.

5.8 Necessary and Sufficient Conditions for Free Terminal Time

In this section, necessary and sufficient conditions are determined for the second variation to be positive and strongly positive for the terminally constrained, free terminal time problem. We return to Section 5.1, where in (5.10) the second variational augmented cost criterion \delta^2 \hat J, with augmented variation in the terminal constraints and free terminal time, is given. In (5.10) the linear dynamics in z are adjoined by the Lagrange multiplier \tilde\lambda in the integrand, and the terminal constraints are adjoined by \tilde\nu in the quadratic terminal function. For consistency with the notation of Section 5.2, and using Remark 5.2.1, where the augmented cost (5.10) is considered rather than (5.15), the additional notation \tilde D^T = d\psi/dt, E = \Omega_x, and \tilde E = d\Omega/dt is used for this free terminal time problem. In particular, (5.24) is extended for free terminal time, having integrated \lambda^T(t) x(t) by parts and using the Hamiltonian given in (5.23), as

\hat J(u(\cdot), \Delta; x_0, \lambda(\cdot), \nu) = \int_{t_0}^{t_f^o} \big[H(x(t), \lambda(t), u(t), t) + \dot\lambda^T(t) x(t)\big]\, dt + \lambda^T(t_0) x_0 - \lambda^T(t_f^o) x(t_f^o)
  + \frac{1}{2} \begin{bmatrix} x(t_f^o) \\ \nu \\ \Delta \end{bmatrix}^T \begin{bmatrix} S_f & D^T & E^T \\ D & 0 & \tilde D^T \\ E & \tilde D & \tilde E \end{bmatrix} \begin{bmatrix} x(t_f^o) \\ \nu \\ \Delta \end{bmatrix},   (5.246)

where \tilde\lambda and \tilde\nu have been replaced by \lambda and \nu, respectively. The notation is now consistent with that given for the LQ problem except for the inclusion of \Delta, the variation in the terminal time, which is a control parameter entering only in the terminal function.

To determine the first-order necessary conditions for variations in \hat J to be nonnegative, variations in the control are made as \delta u(t) = u(t) - u^o(t), producing a variation in x(t) as \delta x(t) = x(t) - x^o(t). Furthermore, a variation in \Delta must also be considered, \delta\Delta = \Delta - \Delta^o. The change in \hat J is made using these variations, and thereby the expansion of (5.28) is extended for free terminal time as

\Delta \hat J = \Delta \hat J(u(\cdot), u^o(\cdot), \Delta, \Delta^o; x_0, \lambda(\cdot), \nu) = \hat J(u(\cdot), \Delta; x_0, \lambda(\cdot), \nu) - \hat J(u^o(\cdot), \Delta^o; x_0, \lambda(\cdot), \nu)
  = \int_{t_0}^{t_f^o} \big[ x^{oT} Q \delta x + u^{oT} C \delta x + x^{oT} C^T \delta u + \tfrac{1}{2}\delta u^T R \delta u + u^{oT} R \delta u + \dot\lambda^T \delta x + \lambda^T A \delta x + \lambda^T B \delta u \big]\, dt
  + \big(-\lambda^T(t_f^o) + x^{oT}(t_f^o) S_f + \nu^T D + \Delta^o E\big)\delta x(t_f^o)
  + \big(x^{oT}(t_f^o) E^T + \nu^T \tilde D^T + \Delta^o \tilde E\big)\delta\Delta + \int_{t_0}^{t_f^o} O(t;\varepsilon)\, dt + O(\varepsilon),   (5.247)

where variations of \lambda(t) and \nu multiply the dynamic constraints and the terminal constraint, which we assume are satisfied. Following a procedure similar to that which produced Theorem 5.3.1, we summarize these results for the free terminal time problem as Theorem 5.8.1.

Theorem 5.8.1 Suppose that Assumptions 5.2.1, 5.2.2, and 5.3.1 are satisfied. Then the necessary conditions for \Delta J to be nonnegative to first order for strong perturbations in the control (5.26) are

\dot x^o(t) = A(t) x^o(t) + B(t) u^o(t), \quad x^o(t_0) = x_0,   (5.248)
\dot\lambda(t) = -A^T(t)\lambda(t) - Q(t) x^o(t) - C^T(t) u^o(t),   (5.249)
\lambda(t_f^o) = S_f x^o(t_f^o) + D^T \nu + E^T \Delta^o,   (5.250)
0 = D x^o(t_f^o) + \tilde D^T \Delta^o,   (5.251)
0 = E x^o(t_f^o) + \tilde D \nu + \tilde E \Delta^o,   (5.252)
0 = R(t) u^o(t) + C(t) x^o(t) + B^T(t) \lambda(t).   (5.253)
For free terminal time, the boundary condition for the Lagrange multiplier is (5.250), the terminal constraint is (5.251), and the transversality condition is (5.252). Using the transition matrix of (5.50) to relate the initial and terminal values of (x, \lambda) in the above equations (5.250), (5.251), (5.252), and simplifying the notation by using (5.72), we obtain a new expression similar to (5.154) except that the variation in terminal time is included. The generalization of (5.154) for the free terminal time problem has the form

\begin{bmatrix} \lambda(t_0) \\ 0 \\ 0 \end{bmatrix} =
\begin{bmatrix}
-\bar\Phi_{22}^{-1}\bar\Phi_{21} & \bar\Phi_{22}^{-1} D^T & \bar\Phi_{22}^{-1} E^T \\
D \bar\Phi_{22}^{-T} & D \bar\Phi_{12}\bar\Phi_{22}^{-1} D^T & D \bar\Phi_{12}\bar\Phi_{22}^{-1} E^T + \tilde D^T \\
E \bar\Phi_{22}^{-T} & E \bar\Phi_{12}\bar\Phi_{22}^{-1} D^T + \tilde D & E \bar\Phi_{12}\bar\Phi_{22}^{-1} E^T + \tilde E
\end{bmatrix}
\begin{bmatrix} x_0 \\ \nu \\ \Delta^o \end{bmatrix},   (5.254)

where each \bar\Phi_{ij} is evaluated at (t_f^o, t_0). We now make the following identifications with t = t_0, as done in (5.155), (5.156), and (5.157), with the elements in (5.254), yielding the symmetric form

\begin{bmatrix} \lambda(t) \\ 0 \\ 0 \end{bmatrix} =
\begin{bmatrix}
S(t_f^o,t;S_f) & F^T(t_f^o,t) & m^T(t_f^o,t) \\
F(t_f^o,t) & G(t_f^o,t) & n^T(t_f^o,t) \\
m(t_f^o,t) & n(t_f^o,t) & s(t_f^o,t)
\end{bmatrix}
\begin{bmatrix} x(t) \\ \nu \\ \Delta^o \end{bmatrix}.   (5.255)

The differential properties of -\bar\Phi_{22}^{-1}(t_f^o,t)\bar\Phi_{21}(t_f^o,t), \bar\Phi_{22}^{-1}(t_f^o,t), and \bar\Phi_{12}(t_f^o,t)\bar\Phi_{22}^{-1}(t_f^o,t) are given in Theorem 5.4.1, (5.160), and (5.163); therefore the differential equations for S(t_f^o,t;S_f), F(t_f^o,t), and G(t_f^o,t) are given in (5.78), (5.161), and (5.164). The dynamics of m(t_f^o,t), n(t_f^o,t), and s(t_f^o,t) are determined from (5.160) and (5.163) as

\frac{d}{dt} m^T(t_f^o,t) = -\big[\big(A(t) - B(t)R^{-1}(t)C(t)\big)^T - S(t_f^o,t;S_f) B(t) R^{-1}(t) B^T(t)\big] m^T(t_f^o,t), \quad m(t_f^o,t_f^o) = E,   (5.256)
\frac{d}{dt} n^T(t_f^o,t) = F(t_f^o,t) B(t) R^{-1}(t) B^T(t) m^T(t_f^o,t), \quad n(t_f^o,t_f^o) = \tilde D,   (5.257)
\frac{d}{dt} s(t_f^o,t) = m(t_f^o,t) B(t) R^{-1}(t) B^T(t) m^T(t_f^o,t), \quad s(t_f^o,t_f^o) = \tilde E.   (5.258)

This approach is sometimes called the sweep method because the boundary conditions at t = t_f^o are swept backward to the initial time. For free terminal time the quadratic cost can be shown to reduce to a form

J(u^o(\cdot); x(t'), t') = \frac{1}{2} x^T(t') \tilde S(t_f^o,t') x(t'),   (5.259)

where t' is close enough to t_f^o that \tilde S(t_f^o,t') exists. \tilde S(t_f^o,t') is determined by relating \lambda(t') to x(t') as

\lambda(t') = \tilde S(t_f^o,t') x(t'),   (5.260)

where \nu and \Delta^o are eliminated in (5.255) in terms of x(t) as

-\begin{bmatrix} G(t_f^o,t) & n^T(t_f^o,t) \\ n(t_f^o,t) & s(t_f^o,t) \end{bmatrix} \begin{bmatrix} \nu \\ \Delta^o \end{bmatrix} = \begin{bmatrix} F(t_f^o,t) \\ m(t_f^o,t) \end{bmatrix} x(t)
\;\Rightarrow\;
\begin{bmatrix} \nu \\ \Delta^o \end{bmatrix} = -\begin{bmatrix} G(t_f^o,t) & n^T(t_f^o,t) \\ n(t_f^o,t) & s(t_f^o,t) \end{bmatrix}^{-1} \begin{bmatrix} F(t_f^o,t) \\ m(t_f^o,t) \end{bmatrix} x(t).   (5.261)

Substituting back into (5.255) results in (5.260), where

\tilde S(t_f^o,t) = S(t_f^o,t;S_f) - \begin{bmatrix} F^T(t_f^o,t) & m^T(t_f^o,t) \end{bmatrix} \begin{bmatrix} G(t_f^o,t) & n^T(t_f^o,t) \\ n(t_f^o,t) & s(t_f^o,t) \end{bmatrix}^{-1} \begin{bmatrix} F(t_f^o,t) \\ m(t_f^o,t) \end{bmatrix}.   (5.262)

Remark 5.8.1 The proof for positivity of the second variational cost remains the same as given in Theorem 5.5.3. Furthermore, \tilde S(t_f^o,t) satisfies the same Riccati equation as S(t_f^o,t;S_f), given in Theorem 5.4.1. Similarly, the feedback control with free terminal time and terminal constraints is given by (5.168), where \bar S(t_f,t) is replaced by \tilde S(t_f^o,t). A numerical sketch of the sweep construction is given below.
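The following is a minimal sketch of the sweep machinery in its simplest specialization: fixed final time, full terminal constraint x(t_f) = 0 (so D = I and the free-terminal-time quantities m, n, s do not enter), with double-integrator data, Q = 0, C = 0, and S_f = 0 as assumptions. Then S(t_f,t;S_f) = 0 identically, and the constrained cost matrix reduces to \bar S = -F^T G^{-1} F, as in Remark 5.6.2.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (assumptions): double integrator, R = I.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
tf = 1.0

def rhs(t, y):
    F = y[:4].reshape(2, 2)
    dF = -(F @ A)                     # (5.161) with C = 0 and S = 0
    dG = F @ B @ B.T @ F.T            # (5.164) with R = I
    return np.concatenate([dF.ravel(), dG.ravel()])

yf = np.concatenate([np.eye(2).ravel(), np.zeros(4)])   # F(tf) = D = I, G(tf) = 0
sol = solve_ivp(rhs, [tf, 0.2], yf, rtol=1e-10, atol=1e-12)
F = sol.y[:4, -1].reshape(2, 2)
G = sol.y[4:, -1].reshape(2, 2)
Sbar = -F.T @ np.linalg.solve(G, F)
print(np.linalg.eigvalsh(G))          # < 0 for t < tf: the inverse exists
print(np.linalg.eigvalsh(Sbar))       # >= 0: cost of the constrained problem
```

The same backward integration, augmented with the m, n, and s equations (5.256)-(5.258) and the boundary data E, \tilde D, \tilde E, produces \tilde S(t_f^o,t) through (5.262) for the free-terminal-time case.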
For the problem to be normal the inverse in (5.262) must exist. Remark 5.8.2 Construction of ˜ S(t o f , t) by (5.262) using its component parts may not be possible since S(t f , t; S f ) may not exist, whereas ˜ S(t o f , t) can exist. Remark 5.8.3 Theorem 5.5.3 can be extended to show that J u(), t o f ; 0, t 0 is strongly positive definite by a proof similar to that used in Theorem 5.4.3. Remark 5.8.4 Note that the optimal value function for the LQ problem with terminal constraints is given as V (x(t), t) = 1 2 x T (t) ˜ S(t o f , t)x(t). (5.263) This satisfies the Hamilton–Jacobi–Bellman equation given in Theorem 4.7.1. 222 Chapter 5. LQ Control Problem Example 5.8.1 (Second variation with variable terminal time) The objective of this example is to determine the first- and second-order necessary conditions for a free terminal time optimal control problem. The problem statement is as follows: find the control scheme which minimizes J = t f = φ(x f , t f ) (5.264) with the dynamic system equations ˙ x = ¸ ˙ x 1 ˙ x 2 = ¸ v cos β v sin β = f, x(t 0 ) = ¸ x 10 x 20 = x 0 (5.265) and the terminal boundary condition ψ(x f , t f ) = ¸ x 1f x 2f = x f = 0, (5.266) where v is constant and β is the control variable. The augmented performance index is J = t f + ν T ψ + t f t 0 H −λ T ˙ x dt, (5.267) where H = λ T f = λ 1 v cos β + λ 2 v sin β, (5.268) ˜ φ(t f , x f , ν) = t f + ν T x f . (5.269) First-order necessary conditions from Chapter 4 are ˙ x = H T λ = f, x(t 0 ) = x 0 , (5.270) ˙ λ = −H T x = 0, λ(t f ) = ˜ φ T x f = ν ⇒λ(t) = ν, (5.271) 0 = H u = H β = λ 2 v cos β −λ 1 v sin β, (5.272) and the transversality condition is 0 = H(t f ) + ˜ φ t f = λ 1 v cos β f + λ 2 v sin β f + 1. (5.273) 5.8. Necessary and Sufficient Conditions for Free Terminal Time 223 From the optimality condition (5.272) λ 2 cos β(t) = λ 1 sin β(t) ⇒ tan β(t) = λ 2 λ 1 = constant ⇒ β o (t) = constant. (5.274) Integrating the dynamics (5.265) while keeping β o (t) = β o = constant, we obtain x 1 (t) = x 10 + (t −t 0 )v cos β o , x 2 (t) = x 20 + (t −t 0 )v sin β o . (5.275) Substituting (5.275) into (5.266), we obtain x 10 + (t o f −t 0 )v cos β o = 0, x 20 + (t o f −t 0 )v sin β o = 0. (5.276) Solving (5.276) for β o , we obtain tan β o = x 20 x 10 ⇒β o = tan −1 x 20 x 10 . (5.277) Next we determine t o f . If we restrict x 10 > 0 and x 20 > 0, then β satisfies π < β o < 3π 2 and (5.277) has only one solution. Thus cos β o = −x 10 x 2 10 + x 2 20 , sin β o = −x 20 x 2 10 + x 2 20 . (5.278) Therefore, from (5.276) t o f = t 0 + x 2 10 + x 2 20 v . (5.279) Substituting (5.278) into (5.274) and the transversality condition (5.273), we obtain λ 2 (−x 10 ) = λ 1 (−x 20 ), 0 = −λ 1 x 10 x 2 10 + x 2 20 + −λ 2 x 20 x 2 10 + x 2 20 + 1 v . (5.280) 224 Chapter 5. LQ Control Problem Solving these equations for λ 1 and λ 2 , we obtain λ 1 = ν 1 = x 10 v x 2 10 + x 2 20 , λ 2 = ν 2 = x 20 v x 2 10 + x 2 20 . (5.281) Next, the second variation necessary and sufficient conditions are to be developed. The necessary conditions for the variation in the cost criterion to be nonnegative are given in Theorem 5.8.1. For the example they are ˙ x = Bu, where B = f β = ¸ f 1β f 2β = ¸ −v sin β o v cos β o , (5.282) ˙ λ = 0, λ(t o f ) = Dν = Iν ⇒λ(t) = ν, (5.283) 0 = Dx(t o f ) + ˜ D T ∆ o = x(t o f ) + f∆ o , (5.284) 0 = ˜ Dν = f T ν, where E = 0 and ˜ E = 0, (5.285) 0 = v(λ 2 cos β o −λ 1 sin β o ) + u, (5.286) . . . u = −v(λ 2 cos β o −λ 1 sin β o ). (5.287) H ββ = R = 1. 
(5.288) Therefore, the terminal boundary conditions for (5.255) are given by (5.283), (5.284), and (5.285) as λ(t o f ) 0 0 ¸ ¸ = 0 I 0 I 0 f 0 f T 0 ¸ ¸ x(t o f ) ν ∆ o ¸ ¸ . (5.289) From (5.255) the differential equations (5.78), (5.161), (5.164), (5.256), (5.257), and (5.258) are solved, where A = 0, Q = 0, C = 0, and B and R are given above. Therefore, S(t o f , t; S f ) = 0, F(t o f , t) = I, m(t o f , t) = 0, G(t o f , t) = f β f T β ∆t, n(t o f , t) = f T , and s(t o f , t) = 0, where ∆t = t f −t. Therefore, (5.255) reduces to λ(t) 0 0 ¸ ¸ = 0 I 0 I f β f T β ∆t f 0 f T 0 ¸ ¸ ¸ x(t) ν ∆ o ¸ ¸ . (5.290) 5.9. Summary 225 From (5.290) ¸ f β f T β ∆t f f T 0 ¸ ¸ ν ∆ o = ¸ −I 0 x(t). (5.291) The coefficient matrix of (5.291) is invertible for ∆t > 0, i.e., the determinant is −v 4 ∆t, so that ¸ ν ∆ o = − ¸ f β f T β ∆t f f T 0 ¸ −1 ¸ I 0 x(t) = − sin 2 β o v 2 ∆t −sin β o cos β o v 2 ∆t −cos β o sin β o v 2 ∆t cos 2 β o v 2 ∆t cos β o v sin β o v ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ ¸ x(t). (5.292) Substitution of (5.292) into (5.290) gives λ(t) = ˜ S(t f , t)x(t), where ˜ S(t o f , t) = − 1 v 2 ∆t ¸ sin β o −cos β o sin β o −cos β o . (5.293) ˜ S(t o f , t) is well behaved for all t < t o f , where ∆t > 0. Therefore, the extremal path satisfies the condition for positivity and is a local minimum. Note that ˜ S(t o f , t) ≥ 0 is only positive semidefinite. This is because perturbation orthogonal to the extremal path does not affect the cost. 5.9 Summary A consistent theory is given for a rather general formulation of the LQ problem. The theory has emphasized the time-varying formulation of the LQ problem with linear terminal constraints. The relationship between the transition matrix of the Hamiltonian system and the solution to the matrix Riccati differential equation is 226 Chapter 5. LQ Control Problem vividly shown using the symplectic property of Hamiltonian systems. Initially, no requirement was placed on the state weighting in the cost function, although the control weighting was assumed to be positive definite. By using this cost criterion, the existence of the solution to the Riccati differential equation is required for the cost criterion to be positive definite and strongly positive definite. If this problem were interpreted as the accessory problem in the calculus of variations [6], then the requirement that the cost criterion be positive definite is not enough. As shown in Section 5.4.5 and [22, 4] the cost criterion is required to be strongly positive to ensure that the second variation dominates over higher-order terms. The second-order necessary and sufficiency conditions are for extremal paths which satisfy weak or strong first-order conditions. However, it is assumed that there are no discontinuous changes in the optimal control history. If there are disconti- nuities, then the variation of the control about the discontinuity must be a strong variation. For example, see the Bushaw problem in Sections 4.6.1 and 4.7.1, where the control is bang-bang. This is not included in the current theory. In fact, the devel- opment of second-order conditions is done by converting the problem from explicitly considering strong variations to assuming that the times that the control switches are control parameters. With respect to these control parameters, the local optimality of the path is determined. First-order optimality of the cost criterion with respect to the switch time is zero and corresponds to the switch condition for the control with respect to the Hamiltonian. 
For example, in the Bushaw problem, the switch occurs when H u goes through zero. Second-order optimality of the cost criterion with respect to the switch time can be found in [32] and [20]. Additional extensions of the LQ theory to several classes of nonlinear systems and other advanced topics are given in [29] and [31]. 5.9. Summary 227 Problems 1. If two square matricies are symplectic, show that their product is symplectic. 2. A matrix H is “Hamiltonian” if it satisfies J −1 H T J = −H, (5.294) where J = ¸ 0 I −I 0 (5.295) is the fundamental symplectic matrix. Show that the matrix U −1 HU is also Hamiltonian, where U is a symplectic matrix. 3. Minimize the performance index J = t f 0 ¸ cxu + 1 2 u 2 dt with respect to u() and subject to ˙ x = u, x(0) = 1. (a) What are the first- and second-order necessary conditions for optimality? Don’t bother to solve them. (b) For c = 1 what are the values of t f for which the second variation is non- negative? (c) For c = −1 what are the values of t f for which the second variation is nonnegative? 228 Chapter 5. LQ Control Problem 4. Consider the problem of minimizing with respect to the control u() ∈ | TB the cost criterion J = t f 0 (a T x + u T Ru)dt, R > 0, subject to ˙ x = Ax + Bu; x(0) = x 0 (given), x(t f ) = 0. (a) By using the results of the accessory minimum problem, show that the extremal is locally minimizing. (b) Relate the results of the second variation to controllability. (c) Show that the second variation is strongly positive. (d) Show that by extremizing the cost with respect to the Lagrange multiplier associated with the terminal constraints, the multipliers maximize the perfor- mance index. 5. Show that ¯ S(t f , t), defined in (5.169), satisfies the same Riccati differential equation as (5.78) by performing time differentiation of ¯ S(t f , t) using (5.167). 6. Consider the problem of finding u() ∈ | that minimizes J = t f 0 ¸ 1 + u 2 1 + x 2 1/2 dt subject to ˙ x = u, x(0) = 0, x(t f ) = 0. a. Show that the extremal path and control x o (t) = 0, u o (t) = 0 for t ∈ [0, t f ] satisfies the first-order necessary conditions. b. By using the second variation, determine if x o and u o are locally minimizing. 5.9. Summary 229 7. Consider the frequency domain input/output relation y(s) = 1 s(s + 2) u(s), (5.296) where y = output and u = input. (a) Obtain a minimal-order state representation for (5.296). (b) Consider the cost functional J = ∞ 0 (y 2 + u 2 )dt. (5.297) A feedback law of the form u = −kx (5.298) is desired. Determine the constant vector k such that J is minimized. (c) Calculate the transfer function k T (sI −A) −1 B. (5.299) What is the interpretation of this transfer function? Prove that this trans- fer function will be the same regardless of the realization used. 8. Solve the optimization problem min u J = x 2 (t f ) + t f 0 u 2 (t)dt, u ∈ |, (5.300) such that ˙ x = u and x(0) = x 0 x, u ∈ R 1 . Determine u o as a function of the time t and initial condition x 0 . Verify that the Hamiltonian is constant along the path. 9. Derive the first-order necessary conditions of optimality for the following problem: min u J(t 0 ) = 1 2 x T (t f )S(t f )x(t f ) + 1 2 t f t 0 x T (t)Q(t)x(t) + u T (t)R(t)u(t) dt 230 Chapter 5. LQ Control Problem such that ˙ x(t) = A(t)x(t) + B(t)u(t) + d(t) and d(t) ∈ R n is a known disturbance. 10. 
Solve the following tracking problem using the steady state solution for the Riccati equation min u J = 1 2 p [x(t f ) −r(t f )] 2 + 1 2 t f 0 [q(x −r) 2 + u 2 ]dt (5.301) subject to ˙ x = ax + bu (x, u ∈ R 1 ) (5.302) and a reference input r(t) = 0, t < 0, e −t , t > 0. Determine the optimum control u o , the optimum trajectory, and the tracking error. Examine the influence of the weighting parameter q. 231 Chapter 6 Linear Quadratic Differential Games 6.1 Introduction In the previous developments, the cost criterion was minimized with respect to the control. In this section a generalization is introduced where there are two controls, one designed to minimize the cost criterion and the other to maximize it. We first consider the LQ problem and develop necessary conditions for optimality. In particular, we show that the optimal solution satisfies a saddle point inequality. That is, if either player does not play his optimal strategy, then the other player gains. Since the problem is linear in the dynamics and quadratic in the cost criterion, we show that this saddle point can be obtained as a perfect square in the adversaries strategies. By following the procedure given in Section 5.4.4, a quadratic function of the state is assumed to be the optimal value of the cost. This function is then used to complete the squares, whereby an explicit form is obtained in a perfect square of the adversaries strategies. Note that in the game problem strategies that are functions of the state as well as time are sought. The issue of feedback control when the state is not perfectly known is considered next. Our objective is to derive a control synthesis method called H ∞ synthesis 232 Chapter 6. LQ Differential Games and as a parameter goes to zero the H 2 synthesis method is recovered. To this end a disturbance attenuation function is defined as an input–output transfer function representing the ratio of a quadratic norm of the desired system outputs over the quadratic norm of the input disturbances. It is shown that under certain conditions the disturbance attenuation function can be bounded. This is shown by formulating a related differential game problem, the solution to which satisfies the desired bound for the disturbance attenuation problem. The results produce a linear controller based upon a measurement sequence. This controller is then specialized to show an explicit full state feedback and a linear estimator for the state. Relationships with current synthesis algorithms are then made. 6.2 LQ Differential Game with Perfect State Information The linear dynamic equation is extended to include an additional control vector w as ˙ x(t) = A(t)x(t) + B(t)u(t) + Γ(t)w(t) , x(t 0 ) = x 0 . (6.1) The problem is to find control u() ∈ | which minimizes, and w() ∈ ¼ (| and ¼ are similar admissible sets defined in Assumption 3.2.2) which maximizes the performance criterion J(u(), w(); x 0 , t 0 ) = 1 2 x T (t f )Q f x(t f ) + 1 2 t f t 0 (x T (t)Q(t)x(t) +u T (t)R(t)u(t) −θw T (t)W −1 (t)w(t))dt, (6.2) where x(t) ∈ R n , u(t) ∈ R m , w(t) ∈ R p , Q T (t) = Q(t) ≥ 0, R T (t) = R(t) > 0, W T (t) = W(t) > 0, and Q T f = Q f . Note that if the parameter θ < 0, the players u() and w() are cooperative (w() minimizes the cost criterion (6.2)), and we revert 6.2. LQ Differential Game with Perfect State Information 233 back to the results of the last chapter. If θ > 0, then the players u() and w() are adversarial where u() minimizes and w() maximizes J(u(), w(); x 0 , t 0 ). 
Note the negative weight in the cost penalizes large excursions in w(). We consider only θ > 0. Unlike the earlier minimization problems, a saddle point inequality is sought such that J(u ◦ (), w(); x 0 , t 0 ) ≤ J(u ◦ (), w ◦ (); x 0 , t 0 ) ≤ J(u(), w ◦ (); x 0 , t 0 ). (6.3) The functions (u ◦ (), w ◦ ()) are called saddle point controls or strategies. If either player deviates from this strategy, the other player gains. This is also called a zero sum game, since whatever one player loses, the other gains. For these strategies to be useful, the strategies should be functions of both state and time. We assume that the saddle point value of the cost is given by the optimal value function J(u ◦ (), w ◦ (); x 0 , t 0 ) = V (x(t), t) = 1 2 x T (t)S G (t f , t; Q f )x(t), where S G (t f , t; Q f ) will be generated by a Riccati differential equation consistent with the game formulation. Our objective is to determine the form of the Riccati differential equation. This choice for the optimal value functions seems natural, given that the optimal value function for the LQ problem in Chapter 5 are all quadratic functions of the state. Note that since only the symmetric part of S G (t f , t; Q f ) contributes to the quadratic form, only the symmetric part is assumed. We use a procedure suggested in Section 5.4.4 to complete the square of a quadratic form. We add the identity − t f t 0 x T (t)S G (t f , t; Q f ) ˙ x(t)dt = 1 2 t f t 0 x T (t) ˙ S G (t f , t; Q f )x(t)dt − 1 2 x T (t)S G (t f , t; Q f )x(t) t f t 0 234 Chapter 6. LQ Differential Games to (6.2) as ˆ J(u(), w(); S G (t f , t; Q f ), x 0 , t 0 ) = t f t 0 ¸ 1 2 (x T (t)Q(t)x(t) + u T (t)R(t)u(t) −θw T (t)W −1 (t)w(t)) +x T (t)S G (t f , t; Q f )(A(t)x(t) +B(t)u(t) + Γ(t)w(t)) + 1 2 x T (t) ˙ S G (t f , t; Q f )x(t) dt − 1 2 x T (t)S G (t f , t; Q f )x(t) t f t 0 + 1 2 x T (t f )Q f x(t f ) = t f t 0 ¸ 1 2 x T (t)(Q(t) + S G (t f , t; Q f )A(t) + A T (t)S G (t f , t; Q f ) + ˙ S G (t f , t; Q f ))x(t) + 1 2 u T (t)R(t)u(t) − 1 2 θw T (t)W −1 (t)w(t) + x T (t)S G (t f , t; Q f )(B(t)u(t) + Γ(t)w(t)) dt − 1 2 x T (t)S G (t f , t; Q f )x(t) t f t 0 + 1 2 x T (t f )Q f x(t f ). (6.4) By choosing S G (t f , t; Q f ) to satisfy the matrix Riccati equation Q(t) + S G (t f , t; Q f )A(t) + A T (t)S G (t f , t; Q f ) + ˙ S G (t f , t; Q f ) = S G (t f , t; Q f )(B(t)R −1 (t)B T (t) −θ −1 Γ(t)W(t)Γ T (t))S G (t f , t; Q f ) , S G (t f , t f ; Q f ) = Q f , (6.5) ˆ J(u(), w(); S G (t f , t; Q f ), x 0 , t 0 ) reduces to squared terms in the strategies of u() and w() as ˆ J(u(), w(); S G (t f , t; Q f ), x 0 , t 0 ) = 1 2 t f t 0 (u(t) + R −1 (t)B T (t)S G (t f , t; Q f )x(t)) T R(t)(u(t) + R −1 (t)B T (t) S G (t f , t; Q f )x(t)) −(w(t) −θ −1 W(t)Γ T (t)S G (t f , t; Q f )x(t)) T θW −1 (w(t)θ −1 W(t)Γ T (t)S G (t f , t; Q f )x(t)) dt + 1 2 x T (t 0 )S G (t f , t 0 ; Q f )x(t 0 ). (6.6) 6.3. Disturbance Attenuation Problem 235 The saddle point inequality (6.3) is satisfied if u ◦ (t) = −R −1 (t)B T (t)S G (t f , t; Q f )x(t), w ◦ (t) = θ −1 W(t)Γ T (t)S G (t f , t; Q f )x(t), (6.7) and the the solution to the Riccati equation (6.5) remains bounded. Since we have completed the square in the cost criterion, Equation (6.6) produces a sufficiency condition for saddle point optimality. Remark 6.2.1 The solution to the controller Riccati equation (6.5) can have a finite escape time because the matrix (B(t)R −1 (t)B T (t) −θ −1 Γ(t)W(t)Γ T (t)) can be indefi- nite. 
Note that the cost criterion (6.2) may be driven to large positive values by w(), since the cost criterion is not a concave functional with respect to x(t) and w(t). Therefore, if S G (t f , t; Q f ) escapes as t → t e , where t e is the escape time, then for some x(t) as t → t e , the cost criterion approaches infinity. There exists a solution u ◦ () ∈ | and w ◦ () ∈ ¼ given in (6.7) to the differential game problem if and only if S G (t f , t; Q f ) exists for all t ∈ [t 0 , t f ]. This is proved for a more general problem in Theorem 6.3.1. 6.3 Disturbance Attenuation Problem We now use the game theoretic results of the last section to develop a controller which is to some degree insensitive to input process and measurement disturbances. Consider the setting in Figure 6.1. The objective is to design a compensator based only on the measurement history, such that the transmission from the disturbances to the performance outputs are limited in some sense. To make these statements more explicit, consider the dynamic system ˙ x(t) = A(t)x(t) + B(t)u(t) + Γ(t)w(t), x(t 0 ) = x o , (6.8) z(t) = H(t)x(t) + v(t), (6.9) 236 Chapter 6. LQ Differential Games Plant Compensator Disturbance (w, v, x 0 ) Performance Outputs (y) Measurements (z) Control (u) Figure 6.1: Disturbance attenuation block diagram. where z(t) ∈ R q is the measurement, w(t) ∈ R m is the process disturbance error, v(t) ∈ R q is the measurement disturbance error, and x 0 ∈ R n is an unknown initial condition. The matrices A(t), B(t), Γ(t), and H(t) are known functions of time. The performance outputs are measures of desired system performance, such as good tracking error or low actuation inputs to avoid saturation. The general performance measure can be written as y(t) = C(t)x(t) + D(t)u(t), (6.10) where y(t) ∈ R r . A general representation of the input-output relationship between disturbances ˜ w(t) = [w T (t), v T (t), x T 0 ] T and output performance measure y(t) is the disturbance attenuation function D a = |y()| 2 2 | ˜ w()| 2 2 , (6.11) 6.3. Disturbance Attenuation Problem 237 where we have extended the norm |y(u())| 2 2 to include a quadratic terminal func- tion as |y()| 2 2 ∆ = 1 2 ¸ x T (t f )Q f x(t f ) + t f t 0 (x T (t)Q(t)x(t) + u T (t)R(t)u(t))dt , (6.12) where the integrand is |y(t)| 2 2 ∆ = y T (t)y(t) = x T (t)C T (t)C(t)x(t) + u T (t)D T (t)D(t)u(t), C T (t)C(t) = Q(t), C T (t)D(t) = 0, D T (t)D(t) = R(t), (6.13) and ˜ w(t) ∆ = [w T (t), v T (t), x T 0 ] T , where, with V (t) = V T (t) > 0 and P 0 = P T 0 > 0, | ˜ w()| 2 2 ∆ = 1 2 ¸ t f t 0 (w T (t)W −1 (t)w(t) + v T (t)V −1 (t)v(t))dt + x T 0 P −1 0 x 0 . (6.14) In this formulation, we assume that the cross terms for the disturbances w(t) and v(t) in (6.14) are zero. The disturbance attenuation problem is to find a controller u(t) = u(Z t ) ∈ | ⊂ |, where the measurement history is Z t ∆ = ¦z(s) : 0 ≤ s ≤ t¦, so that the disturbance attenuation problem is bounded as D a ≤ θ, θ > 0, (6.15) for all admissible processes of w(t) and v(t) and initial condition x 0 R n . | is the class of admissible controls and | ⊂ | is the subset of controllers that are linear functions of Z t . The choice of θ cannot be completely arbitrary. There exists a θ c where if θ ≤ θ c , the solution to the problem does not exist. As will be shown, the solutions to associated Riccati equations may have a finite escape time. 238 Chapter 6. 
LQ Differential Games 6.3.1 The Disturbance Attenuation Problem Converted into a Differential Game This disturbance attenuation problem is converted to a differential game problem with performance index obtained from manipulating Equations (6.11) and (6.15) as J(u(), ˜ w(); t 0 , t f ) = |y()| 2 2 −θ| ˜ w()| 2 2 . (6.16) For convenience define a process for a function ˆ w() as ˆ w b a () ∆ = ¦ ˆ w(t) : a ≤ t ≤ b¦ . (6.17) The differential game is then to find the minimax solution as J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = min u t f t 0 (·) max w t f t 0 (·),v t f t 0 (·),x 0 J(u(), ˜ w(); t 0 , t f ). (6.18) We first assume that the min and max operations are interchangeable. It can be shown [40] that the solution has a saddle point, and therefore this interchange of the min and max operations is valid. The saddle point condition is validated in Section 6.3.3. This problem is solved by dividing the problem into a future part, τ > t, and past part, τ < t, and joining them together with a connection condition, where t is the “current” time. Therefore, expand Equation (6.18) as J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = min u t t 0 (·) max w t t 0 (·),v t t 0 (·),x 0 ¸ J(u(), ˜ w(); t 0 , t) + min u t f t (·) max w t f t (·),v t f t (·) J(u(), ˜ w(); t, t f ) ¸ . (6.19) Note that for the future time interval no measurements are available. Therefore, maximizing with respect to v t f t () given the form of the performance index (6.14) and (6.16) produces the worst future process if v t f t () is given as v(τ) = 0, t < τ ≤ t f . (6.20) 6.3. Disturbance Attenuation Problem 239 Therefore, the game problem associated with the future reduces to a game between only u t f t () and w t f t (). The results given in Section 6.2 are now applicable. In partic- ular, the controller of Equation (6.7) is u ◦ (t) = −R −1 (t)B T (t)S G (t f , t; Q f )x(t), (6.21) where x(t) is not known. The objective is to determine x(t) as a function of the measurement history by solving the problem associated with the past. Note that the optimal value function at t is V (x(t), t) = 1 2 x(t) T S G (t f , t; Q f )x(t) (6.22) from Equation (6.6), where t, the current time, rather than t 0 , is considered the initial time. 6.3.2 Solution to the Differential Game Problem Using the Conditions of the First-Order Variations Using Equation (6.22) to replace the second term in (6.19), the problem reduces to J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = min u t t 0 (·) max w t t 0 (·),v t t 0 (·),x 0 [J(u(), ˜ w(); t 0 , t) + V (x(t), t)]. (6.23) This game problem can be reduced further by noting that u t t 0 () has already occurred and therefore, the minimization is meaningless. Second, if the cost criterion is maxi- mized with respect to w t t 0 and x 0 , then the resulting state is determined. Since the state history Z t is known, then the value of v t t 0 is a result and is eliminated in the cost criterion by using Equation (6.9) as v(τ) = z(τ) −H(τ)x(τ), t 0 ≤ τ < t. (6.24) 240 Chapter 6. LQ Differential Games The optimization problem now reduces to J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = max w t t 0 ,x 0 1 2 ¸ t t 0 (x T (τ)Q(τ)x(τ) + u T (τ)R(τ)u(τ))dτ −θ t t 0 (w T (τ)W −1 (τ)w(τ) + (z(τ) −H(τ)x(τ)) T V −1 (τ)(z(τ) −H(τ)x(τ)))dτ −θx T 0 P −1 0 x 0 + x(t) T S G (t f , t; Q f )x(t) (6.25) subject to the dynamic equations ˙ x(t) = A(t)x(t) + B(t)u(t) + Γ(t)w(t), x(0) = x 0 . (6.26) In a manner similar to that of Section 3.3.1, the dynamics (6.26) are augmented to the performance index (6.25) with the Lagrange multiplier λ(t). 
The augmented performance index is ˆ J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = max w t t 0 ,x 0 ¸ t t 0 1 2 x T (τ)Q(τ)x(τ) +u T (τ)R(τ)u(τ) −θ(w T (τ)W −1 (τ)w(τ) +(z(τ) −H(τ)x(τ)) T V −1 (τ)(z(τ) −H(τ)x(τ))) + 2λ T (τ)(A(τ)x(τ) +B(τ)u(τ) + Γ(τ)w(τ) − ˙ x(τ)) dτ −θ 1 2 x T 0 P −1 0 x 0 + 1 2 x T (t)S G (t f , t; Q f )x(t) . (6.27) Integrate by parts to give ˆ J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = max w t t 0 ,x 0 ¸ t t 0 1 2 x T (τ)Q(τ)x(τ) +u T (τ)R(τ)u(τ) −θ(w T (τ)W −1 (τ)w(τ) +(z(τ) −H(τ)x(τ)) T V −1 (τ)(z(τ) −H(τ)x(τ))) + 2λ T (τ)(A(τ)x(τ) +B(τ)u(τ) + Γ(τ)w(τ)) + 2 ˙ λ T (τ)x(τ) dτ − λ T x t t 0 −θ 1 2 x T 0 P −1 0 x 0 + 1 2 x T (t)S G (t f , t; Q f )x(t) . (6.28) 6.3. Disturbance Attenuation Problem 241 By taking the first variation of ˆ J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) as was done in Section 3.3.1, we obtain δ ˆ J = ˆ J(u ◦ (), ˜ w(); t 0 , t f ) − ˆ J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = t t 0 x T (τ)Q(τ)δx(τ) −θ w T (τ)W −1 (τ)δw(τ) −(z(τ) −H(τ)x(τ)) T V −1 (τ)H(τ)δx(τ) +λ T (τ)A(τ)δx(τ) + λ T (τ)Γ(τ)δw(τ) + ˙ λ T (τ)δx(τ) dτ − λ T δx t t 0 −θx T 0 P −1 0 δx 0 + x T (t)S G (t f , t; Q f )δx(t), (6.29) and then the first-order necessary conditions are ˙ λ T (τ) + λ T (τ)A(τ) + θ(z(τ) −H(τ)x(τ)) T V −1 (τ)H(τ) + x T (τ)Q(τ) = 0, (6.30) λ T (t 0 ) −θx T 0 P −1 0 = 0 ⇒λ(t 0 ) = θP −1 0 x 0 , (6.31) −λ T (t) + x T (t)S G (t f , t; Q f ) = 0. (6.32) w(τ) = θ −1 W(τ)Γ T (τ)λ(τ), (6.33) where τ is the running variable. The dynamic equations for the Hamiltonian system are ˙ x(τ) = A(τ)x(τ) + B(τ)u(τ) + θ −1 Γ(τ)W(τ)Γ T (τ)λ(τ) , x(t 0 ) = x 0 , (6.34) ˙ λ(τ) = −A T (τ)λ(τ) −Q(τ)x(τ) −H T (τ)θV −1 (τ)(z(τ) −H(τ)x(τ)) , λ(t 0 ) = θP −1 0 x 0 , (6.35) where over the interval t 0 ≤ τ ≤ t, u(τ) and z(τ) are known processes. By examination of (6.31), where ˆ x(t 0 ) = 0, we conjecture and verify in Sec- tion 6.3.3 that the following form can be swept forward: λ(τ) = θP −1 (t 0 , τ; P 0 )(x(τ) − ˆ x(τ)) (6.36) 242 Chapter 6. LQ Differential Games or that the solution for x(τ) satisfies x(τ) = ˆ x(τ) + θ −1 P(t 0 , τ; P 0 )λ(τ). (6.37) To determine the differential equations for ˆ x(τ) and P(t 0 , τ; P 0 ), we differentiate (6.37) to get ˙ x(τ) = ˙ ˆ x(τ) + θ −1 ˙ P(t 0 , τ; P 0 )λ(τ) + θ −1 P(t 0 , τ; P 0 ) ˙ λ(τ). (6.38) Substitute (6.34) and (6.35) into (6.38) we obtain A(τ)x(τ) + B(τ)u(τ) + θ −1 Γ(τ)W(τ)Γ T (τ)λ(τ) = ˙ ˆ x(τ) + θ −1 ˙ P(t 0 , τ; P 0 )λ(τ) + θ −1 P(t 0 , τ; P 0 )(−A T (τ)λ(τ) −Q(τ)x(τ) −θH T (τ)V −1 (τ)(z(τ) −H(τ)x(τ))). (6.39) Replace x(τ) using Equation (6.37) in (6.39). Then A(τ)ˆ x(τ) + θ −1 A(τ)P(t 0 , τ; P 0 )λ(τ) + B(τ)u(τ) + θ −1 Γ(τ)W(τ)Γ T (τ)λ(τ) = ˙ ˆ x(τ) + θ −1 ˙ P(t 0 , τ; P 0 )λ(τ) −θ −1 P(t 0 , τ; P 0 )A T (τ)λ(τ) −θ −1 P(t 0 , τ; P 0 )Q(τ)ˆ x(τ) −θ −1 P(t 0 , τ; P 0 )Q(τ)θ −1 P(t 0 , τ; P 0 )λ(τ) −θ −1 P(t 0 , τ; P 0 )θH T (τ)V −1 (τ)(z(τ) −H(τ)ˆ x(τ)) +θ −1 P(t 0 , τ; P 0 )θH T (τ)V −1 (τ)H(τ)θ −1 P(t 0 , τ; P 0 )λ(τ). (6.40) Rewriting so that all terms multiplying λ(τ) are on one side of the equal sign, A(τ)ˆ x(τ) − ˙ ˆ x(τ) + B(τ)u(τ) + θ −1 P(t 0 , τ; P 0 )Q(τ)ˆ x(τ) +θ −1 θP(t 0 , τ; P 0 )H T (τ)V −1 (τ)(z(τ) −H(τ)ˆ x(τ)) = −θ −1 A(τ)P(t 0 , τ; P 0 )λ(τ) −θ −1 Γ(τ)W(τ)Γ T (τ)λ(τ) + θ −1 ˙ P(t 0 , τ; P 0 )λ(τ) −θ −1 P(t 0 , τ; P 0 )A T (τ)λ(τ) −θ −1 P(t 0 , τ; P 0 )Q(τ)θ −1 P(t 0 , τ; P 0 )λ(τ) +θ −1 P(t 0 , τ; P 0 )θH T (τ)V −1 (τ)θ −1 H(τ)P(t 0 , τ; P 0 )λ(τ). (6.41) 6.3. 
Disturbance Attenuation Problem 243 If we choose ˆ x(τ) to satisfy ˙ ˆ x(τ) = A(τ)ˆ x(τ) + B(τ)u(τ) + θ −1 P(t 0 , τ; P 0 )Q(τ)ˆ x(τ) +P(t 0 , τ; P 0 )H T (τ)V −1 (τ)(z(τ) −H(τ)ˆ x(τ)), ˆ x(t 0 ) = 0, (6.42) and P(t 0 , τ; P 0 ) to satisfy −A(τ)P(t 0 , τ; P 0 ) −Γ(τ)W(τ)Γ T (τ) + ˙ P(t 0 , τ; P 0 ) −P(t 0 , τ; P 0 )A T (τ) −θ −1 P(t 0 , τ; P 0 )Q(τ)P(t 0 , τ; P 0 ) + P(t 0 , τ; P 0 )H T (τ)V −1 (τ)H(τ)P(t 0 , τ; P 0 ) = 0 (6.43) or ˙ P(t 0 , τ; P 0 ) = A(τ)P(t 0 , τ; P 0 ) + P(t 0 , τ; P 0 )A T (τ) + Γ(τ)W(τ)Γ T (τ) −P(t 0 , τ; P 0 )(H T (τ)V −1 (τ)H(τ) −θ −1 Q(τ))P(t 0 , τ; P 0 ) , P(t 0 ) = P 0 , (6.44) then (6.38) becomes an identity. Remark 6.3.1 The solution to the estimation Riccati equation (6.44) may have a fi- nite escape time because the matrix (H T (τ)V −1 (τ)H(τ) −θ −1 Q(τ)) can be indefinite. Some additional properties of the Riccati differential equation of (6.5) and (6.44) are given in [40]. At the current time t from Equation (6.32), λ(t) = S G (t f , t; Q f )x(t). Therefore, the worst-case state x ◦ (t) is equivalent to x(t) when using Equation (6.37), so that x ◦ (t) = ˆ x(t) + θ −1 P(t 0 , t; P 0 )S G (t f , t; Q f )x ◦ (t). (6.45) This is explicitly shown in Section 6.3.3. In factored form from (6.45), the estimate ˆ x(t) and the worst state x ◦ (t) are related as ˆ x(t) = (I −θ −1 P(t 0 , t; P 0 )S G (t f , t; Q f ))x ◦ (t), x ◦ (t) = (I −θ −1 P(t 0 , t; P 0 )S G (t f , t; Q f )) −1 ˆ x(t). (6.46) 244 Chapter 6. LQ Differential Games The state x ◦ (t) in Equation (6.46) is the saddle value of the state that is used in the controller. The optimal controller u(t) = u(Z t ) is now written as u ◦ (t) = −R −1 (t)B T (t)S G (t f , t; Q f )(I −θ −1 P(t 0 , t; P 0 )S G (t f , t; Q f )) −1 ˆ x(t), (6.47) where S G (t f , t; Q f ) is determined by integrating (6.5) backward fromt f and P(t 0 , t; P 0 ) is determined by integrating (6.44) forward from t 0 . Note 6.3.1 ˆ x(t) summaries the measurement history in an n-vector. It should be noted that if all adversaries play their saddle point strategy, then x ◦ (t) = 0, v ◦ (t) = 0, ˆ x(t) = 0, u ◦ (t) = 0, w ◦ (t) = 0, (6.48) which implies that J(u ◦ (), ˜ w(); t 0 , t f ) ≤ J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ) = 0. (6.49) From Equation (6.49) we see that Equation (6.15) is satisfied and the game solution also provides the solution to the disturbance attenuation problem. Remark 6.3.2 Note that when θ →∞ the disturbance attenuation or H ∞ controller given by (6.47) reduce to the H 2 controller 10 u ◦ (t) = −R −1 (t)B T (t)S G (t f , t; Q f )ˆ x(t), (6.50) where the controller gains are obtained from the Riccati equation (6.5) with θ = ∞ and is identical to the Riccati equation in Theorem 5.6.2. The state is reconstructed from the filter with ˆ x = x ◦ ˙ ˆ x(t) = A(t)ˆ x(t) + B(t)u(t) + P(t 0 , t; P 0 )H T (t), V −1 (t) (z(t) −H(t)ˆ x(t)) , (6.51) 10 In the stochastic setting this is known as the linear-quadratic-Gaussian (LQG) controller. 6.3. Disturbance Attenuation Problem 245 where the filter gains are determined from (6.44) with θ = ∞. The properties of this Riccati equation are again similar to those given in Theorem 5.6.2. 6.3.3 Necessary and Sufficient Conditions for the Optimality of the Disturbance Attenuation Controller In this section are given necessary and sufficient conditions that guarantee that the controller (6.47) satisfies J(u ◦ (), ˜ w(); t 0 , t f ) ≤ 0, i.e., that J(u ◦ (), ˜ w(); t 0 , t f ) is con- cave with respect to ˜ w(). 
To do this, the cost criterion is written as the sum of two quadratic forms evaluated at the current time t. One term is the optimal value function x T (t)S(t f , t; Q f )x(t)/2, where the cost criterion is swept backward from the terminal time to the current time t. For the second term, we verify that the optimal value function which sweeps the initial boundary function −θx T 0 P −1 0 x 0 /2 forward is ˜ V (e(t), t) = θ 2 e T (t)P −1 (t 0 , t; P 0 )e(t), (6.52) where e(t) = x(t) −ˆ x(t) and ˜ V x (e(t), t) = λ(t) = θP −1 (t 0 , t; P 0 )e(t) as given in (6.36). The dynamic equation for e(t) is found by subtracting (6.42) from (6.26) as ˙ e(t) = A(t)e(t) + Γ(t)w(t) −θ −1 P(t 0 , t; P 0 )Q(t)ˆ x(t) −P(t 0 , t; P 0 )H T (t)V −1 (t)(z(t) −H(t)ˆ x(t)). (6.53) As in Section 6.2 we complete the squares of the cost criterion (6.25) by adding the identically zero quantity 0 = t t 0 ¸ θe T (τ)P −1 (t 0 , τ; P 0 ) ˙ e(τ) + θ 2 e T (τ) ˙ P −1 (t 0 , τ; P 0 )e(τ) dτ − θ 2 e T (τ)P −1 (t 0 , τ; P 0 )e(τ) t t 0 (6.54) 246 Chapter 6. LQ Differential Games to (6.25) as J(u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) = max w t t 0 ,x 0 1 2 ¸ t t 0 (x T (τ)Q(τ)x(τ) + u T (τ)R(τ)u(τ)) −θ(w T (τ)W −1 (τ)w(τ) +(z(τ) −H(τ)x(τ)) T V −1 (τ)(z(τ) −H(τ)x(τ)) −2e T (τ)P −1 (t 0 , τ; P 0 ) ˙ e(τ) −e T (τ) ˙ P −1 (t 0 , τ; P 0 )e(τ))dτ −θx T 0 P −1 0 x 0 − θe T (τ)P −1 (t 0 , τ; P 0 )e(τ) t t 0 + x(t) T S G (t f , t; Q f )x(t) , (6.55) where the terms evaluated at t = t 0 cancel in (6.55). The state x(t) is still free and unspecified. In (6.46) the worst-case state is captured by the boundary conditions. Here, we will maximize J(u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) with respect to x(t). Note that e(t 0 ) = x 0 , since we assumed ˆ x(t 0 ) = 0 only for convenience. Furthermore, the control in the quadratic term is a given process over the interval [t 0 , t f ). From (6.44) the RDE for P −1 (t 0 , t; P 0 ) is determined as ˙ P −1 (t 0 , t; P 0 ) = −P −1 (t 0 , t; P 0 )A(t) −A T (t)P −1 (t 0 , t; P 0 ) −P −1 (t 0 , t; P 0 )Γ(t)W(t)Γ T (t)P −1 (t 0 , t; P 0 ) +(H T (t)V −1 (t)H(t) −θ −1 Q(t)), P −1 (t 0 ) = P −1 0 . (6.56) First, (6.53) is substituted into (6.55) to obtain J(u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) = max w t t 0 ,x 0 1 2 ¸ t t 0 (x T (τ)Q(τ)x(τ) + u T (τ)R(τ)u(τ) −θw T (τ)W −1 (τ)w(τ) −θ(z(τ) −H(τ)x(τ)) T V −1 (τ)(z(τ) −H(τ)x(τ)) +2θe T (τ)P −1 (t 0 , τ; P 0 )(A(τ)e(τ) + Γ(τ)w(τ) −θ −1 P(t 0 , (τ); P 0 )Q(τ)ˆ x(τ) −P(t 0 , (τ); P 0 )H T (τ)V −1 (τ)(z(τ) −H(τ)ˆ x(τ)) 6.3. Disturbance Attenuation Problem 247 +θe T (τ) ˙ P −1 (t 0 , τ; P 0 )e(τ))dτ −θe T (t)P −1 (t 0 , t; P 0 )e(t) +x(t) T S G (t f , t; Q f )x(t) , (6.57) By adding and subtracting the term θe T (τ)P −1 (t 0 , τ; P 0 )Γ(τ)W(τ)Γ T (τ)P −1 (t 0 , τ; P 0 )e(τ) and substituting in x(τ) = ˆ x(τ) + e(τ), (6.57) becomes J(u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) = max w t t 0 ,x 0 1 2 ¸ t t 0 (e T (τ)Q(τ)e(τ) + ˆ x T (τ)Q(τ)ˆ x(τ) + u T (τ)R(τ)u(τ) −θ(z(τ) −H(τ)ˆ x(τ)) T V −1 (τ)(z(τ) −H(τ)ˆ x(τ)) −θ(w T (τ) −e T (τ)P −1 (t 0 , τ; P 0 )Γ(τ)W(τ))W −1 (τ)(w(τ) −W(τ)Γ T (τ)P −1 (t 0 , τ; P 0 )e(τ)) +θe T (τ)P −1 (t 0 , τ; P 0 )Γ(τ)W(τ)Γ T (τ)P −1 (t 0 , τ; P 0 )e(τ) +2θe T (τ)P −1 (t 0 , τ; P 0 )A(τ)e(τ) −θe T (τ)H T (τ)V −1 (τ)H(τ)e(τ) +θe T (τ) ˙ P −1 (t 0 , τ; P 0 )e(τ))dτ −θe T (t)P −1 (t 0 , t; P 0 )e(t) +x(t) T S G (t f , t; Q f )x(t) . 
(6.58) Using (6.56) in (6.58) and maximizing with respect to w t t 0 , the optimal cost criterion reduces to J(u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) = 1 2 I(Z t ) −θe T (t)P −1 (t 0 , t; P 0 )e(t) + x T (t)S G (t f , t; Q f )x(t) , (6.59) where I(Z t ) = t t 0 ˆ x T (τ)Q(τ)ˆ x(τ) + u T (τ)R(τ)u(τ) − θ(z(τ) −H(τ)ˆ x(τ)) T V −1 (τ)(z(τ) −H(τ)ˆ x(τ)) dτ, (6.60) the maximizing w(τ) is w ◦ (τ) = W(τ)Γ T (τ)P −1 (t 0 , τ; P 0 )e(τ), (6.61) 248 Chapter 6. LQ Differential Games and maximizing (6.60) with respect to v(τ) using (6.9) gives v ◦ (τ) = −H(τ)e(τ). (6.62) Since the terms under the integral are functions only of the given measurement pro- cess, they are known functions over the past time interval. Therefore, the deter- mination of the worst-case state is found by maximizing over the last two terms in J(u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) of (6.59). Thus, J x(t) (u ◦ (), ˜ w ◦ (), x(t); t 0 , t f ) = 0 gives (6.46) and the second variation condition for a maximum gives P −1 (t 0 , t; P 0 ) −θ −1 S G (t f , t; Q f ) > 0. (6.63) This inequality is known as the spectral radius condition. Note that J(u ◦ (), ˜ w ◦ (), x ◦ (t); t 0 , t f ) = J ◦ (u ◦ (), ˜ w ◦ (); t 0 , t f ). We now show that a necessary and sufficient condition for J(u ◦ (), ˜ w(), x(t); t 0 , t f ) to be concave with respect to ˜ w(), x(t) is that the following assumption be satisfied. Assumption 6.3.1 ˜ 1. There exists a solution P(t 0 , t; P 0 ) to the Riccati differential equation (6.44) over the interval [t 0 , t f ]. 2. There exists a solution S G (t f , t; Q f ) to the Riccati differential equation (6.5) over the interval [t 0 , t f ]. 3. P −1 (t 0 , t; P 0 ) −θ −1 S G (t f , t; Q f ) > 0 over the interval [t 0 , t f ]. Remark 6.3.3 In [40] some properties of this class of Riccati differential equation are presented. For example, if Q f ≥ 0 (> 0), then S G (t f , t; Q f ) ≥ 0 (> 0) for t f ≥ t ≥ t 0 . If S G (t f , t; Q f ) has an escape time in the interval t f ≥ t ≥ t 0 , then some eigenvalues of S G (t f , t; Q f ) must go off to positive infinity. 6.3. Disturbance Attenuation Problem 249 Theorem 6.3.1 There exists a solution u ◦ (Z t ) ∈ | to the finite-time disturbance attenuation problem if and only if Assumption 6.3.1 holds. If Assumption 6.3.1 holds, u ◦ (t) = −R −1 (t)B T (t)S G (t f , t; Q f )x ◦ (t). Proof: Sufficiency: Suppose that Assumption 6.3.1 holds. For the strategies u ◦ (t) = −R −1 (t)B T (t)S G (t f , t; Q f )x ◦ (t), w ◦ (t) = θ −1 W(t)Γ T (t)S G (t f , t; Q f )x ◦ (t), and v ◦ (t) = −H(t)e(t), Equation (6.49) holds. Furthermore, using (6.45) and (6.46), ˆ x(t) = 0, where the strategies are used in (6.42), as well as e(t) = 0, where the strategies are used in (6.53). Therefore, J(u ◦ (), ˜ w ◦ (), x ◦ (t f ); t 0 , t f ) = 1 2 I(Z t f ) −θe T (t f )P −1 (t 0 , t f ; P 0 )e(t f ) = 0, (6.64) where (u ◦ (t) = 0, ˜ w ◦ (t) = 0) and for any other strategy, where ( ˜ w(t)) = 0, J(u ◦ (), ˜ w(), x(t f ); t 0 , t f ) < 0 and thus the cost is strictly concave with respect to ˜ w(). Necessity: Suppose 1 is violated, but 2 and 3 are not, and P(t 0 , t; P 0 ) has an escape time t e ∈ [t 0 , t f ]. For t s < t e we choose the strategies for ˜ w(t) as w ◦ (t) = W(t)Γ T (t)P −1 (t 0 , t; P 0 )e(t), v ◦ (t) = −H(t)e(t) for t 0 ≤ t < t s and, using (6.32) and (6.33), w ◦ (t) = θ −1 W(t)Γ T (t)S G (t f , t; Q f )x(t), v ◦ (t) = 0 for t s ≤ t ≤ t f . Therefore, ˆ x(t) = 0 and u ◦ (t) = 0 over t ∈ [t 0 , t s ]. 
Furthermore, we assume some x(t 0 ) = 0 such that e(t s ) = x(t s ) = 0 and coincides with the eigen- vector associated with the largest eigenvalue of P(t 0 , t s ; P 0 ). Then, as t s → t e , e T (t s )P −1 (t 0 , t s ; P 0 )e(t s ) → 0 and the cost criterion J(u ◦ (), ˜ w(), x(t s ); t 0 , t f ) = 1 2 x T (t s )S G (t f , t s ; Q f )x(t s ) > 0, which is a contradiction to the optimality of u ◦ (). Note that I(Z t s ) = 0. Suppose 2 is violated, but 1 and 3 are not, and S G (t f , t; Q f ) has an escape time t e ∈ [t 0 , t f ]. For t s > t e we choose the strategies of ˜ w(t) as w(t) = 0, 250 Chapter 6. LQ Differential Games v ◦ (t) = −H(t)e(t) for t 0 ≤ t ≤ t s and w ◦ (t) = θ −1 W(t)Γ T (t)S G (t f , t; Q f )x(t), v ◦ (t) = 0 for t s ≤ t ≤ t f . Furthermore, we assume some x(t 0 ) = 0 such that e(t s ) = x(t s ) = 0 and coincides with the eigenvector associated with the largest eigenvalue of S G (t f , t s ; Q f ). Then, as t s →t e , J(u ◦ (), ˜ w(), x(t); t 0 , t f ) = 1 2 I(Z t s ) −θe T (t t s )P −1 (t 0 , t t s ; P 0 )e(t t s ) + x T (t t s )S G (t f , t t s ; Q f )x(t t s ) > 0, (6.65) since as t s →t e , S G (t f , t t s ; Q f ) goes to positive infinity and the third term dom- inates and produces a contradiction to the optimality of u ◦ (). Note that the above is true for any finite control process u() over [t 0 , t s ] as long as x(t s ) = 0. Suppose 3 is violated at t = t s , where t s ∈ [t 0 , t f ], but 1 and 2 are not. Choose the strategies of ˜ w(t) as w ◦ (t) = W(t)Γ T (t)P −1 (t 0 , t; P 0 )e(t), v ◦ (t) = −H(t)e(t) for t 0 ≤ t ≤ t s and w ◦ (t) = θ −1 W(t)Γ T (t)S G (t f , t; Q f )x(t), v ◦ (t) = 0 for t s ≤ t ≤ t f . Furthermore, choose x(t 0 ) = 0 so that e(t s ) = x(t s ) = 0 so that e(t s ) = x(t s ) is an eigenvector of a negative eigenvalue of P −1 (t 0 , t s ; P 0 ) −θ −1 S G (t f , t s ; Q f ) so that e T (t s )(S G (t f , t s ; Q f ) −θP −1 (t 0 , t s ; P 0 ))e(t s ) > 0. This is again a contradiction to the optimality of u ◦ (). Note that I(Z t s ) = 0. 6.3.4 Time-Invariant Disturbance Attenuation Estimator Transformed into the H ∞ Estimator For convenience, we consider the infinite-time, time-invariant problem. We assume that P(t 0 , t; P 0 ) has converged to a steady state value denoted as P and S G (t f , t; Q f ) has converged to a steady state value denoted as S. We first make a transformation of our estimate ˆ x(t) to a new estimate x ◦ (t), which is essentially the worst-case state estimate as x ◦ (t) = I −θ −1 PS −1 ˆ x(t) = L −1 ˆ x(t), (6.66) 6.3. Disturbance Attenuation Problem 251 where the estimator propagation is written as a standard differential equation as ˙ ˆ x(t) = A + θ −1 PQ ˆ x(t) + Bu + PH T V −1 (z(t) −Hˆ x(t)) , (6.67) where we are assuming that all the coefficients are time invariant and the matrices P and S are determined from the algebraic Riccati equations (ARE) as 0 = AP + PA T −P(H T V −1 H −θ −1 Q)P + ΓWΓ T , (6.68) 0 = A T S + SA + Q−S BR −1 B T −θ −1 ΓWΓ T S. (6.69) Substitution of the transformation (6.66) into the estimator (6.67) gives L −1 ˙ ˆ x(t) = ˙ x ◦ (t) = L −1 A + θ −1 PQ Lx ◦ (t)+L −1 Bu+L −1 PH T V −1 (z(t) −HLx ◦ (t)) . (6.70) Remark 6.3.4 The convergence of the Riccati differential equation to an ARE is shown in [40] and follows similar notions given in Theorem 5.7.1 of Section 5.7 for the convergence to the ARE for the linear-quadratic problem. Also, note that the solutions to ARE of (6.68) and (6.69) can have more than one positive definite so- lution. 
The elements of the transformation $L^{-1}$ can be manipulated into the following forms, which are useful for deriving the dynamic equation for $x^\circ(t)$:
$$ E = S\big(I - \theta^{-1}PS\big)^{-1} = \big[(I - \theta^{-1}PS)S^{-1}\big]^{-1} = \big(S^{-1} - \theta^{-1}P\big)^{-1} = \big[S^{-1}(I - \theta^{-1}SP)\big]^{-1} = \big(I - \theta^{-1}SP\big)^{-1}S. \tag{6.71} $$
Furthermore, from (6.71),
$$ L^{-1} = S^{-1}E = I + \theta^{-1}PE = \big(I - \theta^{-1}PS\big)^{-1} = \big(P^{-1} - \theta^{-1}S\big)^{-1}P^{-1}. \tag{6.72} $$
Substitution of the transformations $L$ and $L^{-1}$ from (6.66) and (6.72) into (6.70) gives
$$ \begin{aligned} \dot x^\circ(t) ={}& \big(I + \theta^{-1}PE\big)\big(A + \theta^{-1}PQ\big)\big(I - \theta^{-1}PS\big)x^\circ(t) + \big(I + \theta^{-1}PE\big)Bu \\ &+ MH^TV^{-1}\big(z(t) - H(I - \theta^{-1}PS)x^\circ(t)\big) \\ ={}& \big(I + \theta^{-1}PE\big)\big(A + \theta^{-1}PQ\big)\big(I - \theta^{-1}PS\big)x^\circ(t) + \big(I + \theta^{-1}PE\big)Bu \\ &+ MH^TV^{-1}\big(z(t) - Hx^\circ(t)\big) + \theta^{-1}\big(I + \theta^{-1}PE\big)PH^TV^{-1}HPSx^\circ(t) \\ ={}& \big(I + \theta^{-1}PE\big)\big(A + \theta^{-1}PQ\big)x^\circ(t) + \big(I + \theta^{-1}PE\big)Bu + MH^TV^{-1}\big(z(t) - Hx^\circ(t)\big) \\ &- \theta^{-1}\big(I + \theta^{-1}PE\big)\big[\big(A + \theta^{-1}PQ\big)P - PH^TV^{-1}HP\big]Sx^\circ(t) \\ ={}& \big(I + \theta^{-1}PE\big)\big(A + \theta^{-1}PQ\big)x^\circ(t) + \big(I + \theta^{-1}PE\big)Bu + MH^TV^{-1}\big(z(t) - Hx^\circ(t)\big) \\ &+ \theta^{-1}\big(I + \theta^{-1}PE\big)\big(PA^T + \Gamma W\Gamma^T\big)Sx^\circ(t), \end{aligned} \tag{6.73} $$
where $M = L^{-1}P$ and the last line results from using (6.68) in the previous equality. To continue to reduce this equation, substitute the optimal controller $u^\circ = -R^{-1}B^TSx^\circ(t)$ into (6.73). Then, (6.73) becomes
$$ \begin{aligned} \dot x^\circ(t) ={}& Ax^\circ(t) - BR^{-1}B^TSx^\circ(t) + \theta^{-1}\Gamma W\Gamma^TSx^\circ(t) + MH^TV^{-1}\big(z(t) - Hx^\circ(t)\big) \\ &+ \theta^{-1}\big(I + \theta^{-1}PE\big)PQx^\circ(t) + \theta^{-1}\big(I + \theta^{-1}PE\big)PA^TSx^\circ(t) \\ &- \theta^{-1}PE\big[-A + BR^{-1}B^TS - \theta^{-1}\Gamma W\Gamma^TS\big]x^\circ(t). \end{aligned} \tag{6.74} $$
Noting that
$$ PE = P\big(I - \theta^{-1}SP\big)^{-1}S = \big(P^{-1} - \theta^{-1}S\big)^{-1}S, \qquad I + \theta^{-1}PE = \big(P^{-1} - \theta^{-1}S\big)^{-1}P^{-1}, \tag{6.75} $$
substituting (6.75) into (6.74) and using (6.69), the estimator in terms of $x^\circ(t)$ becomes
$$ \dot x^\circ(t) = Ax^\circ(t) - BR^{-1}B^TSx^\circ(t) + \theta^{-1}\Gamma W\Gamma^TSx^\circ(t) + MH^TV^{-1}\big(z(t) - Hx^\circ(t)\big). \tag{6.76} $$
The term $w^\circ = \theta^{-1}\Gamma W\Gamma^TSx^\circ(t)$, the optimal strategy of the process noise, appears explicitly in the estimator. This estimator equation is unchanged if the system matrices are time varying, and it is equivalent to that given in [18] for their time-invariant problem.

The dynamic equation for the matrix $M(t)$ in the filter gain can be obtained by differentiating $M(t)$:
$$ M(t) = L^{-1}(t)P(t_0,t;P_0) = \big[P^{-1}(t_0,t;P_0) - \theta^{-1}S_G(t_f,t;Q_f)\big]^{-1} \;\Rightarrow\; \dot M(t) = -M(t)\Big[\tfrac{d}{dt}P^{-1}(t_0,t;P_0) - \theta^{-1}\dot S_G(t_f,t;Q_f)\Big]M(t). \tag{6.77} $$
Substitution of (6.5) and (6.44) into (6.77) produces the Riccati equation
$$ \begin{aligned} \dot M(t) ={}& M(t)\big(A + \theta^{-1}\Gamma W\Gamma^TS_G(t_f,t;Q_f)\big)^T + \big(A + \theta^{-1}\Gamma W\Gamma^TS_G(t_f,t;Q_f)\big)M(t) \\ &- M(t)\big(H^TV^{-1}H - \theta^{-1}S_G(t_f,t;Q_f)BR^{-1}B^TS_G(t_f,t;Q_f)\big)M(t) + \Gamma W\Gamma^T, \\ M(t_0) ={}& \big[I - \theta^{-1}P_0S_G(t_f,t_0;Q_f)\big]^{-1}P_0. \end{aligned} \tag{6.78} $$
For the infinite-time, time-invariant system, $\dot M = 0$ and (6.78) becomes an ARE. Relating this back to the disturbance attenuation controller of the previous section ((6.5), (6.42), (6.44), (6.47)), the H∞ form of the disturbance attenuation controller is
$$ u^\circ(t) = -R^{-1}B^TSx^\circ(t), \tag{6.79} $$
where $x^\circ(t)$ is given by (6.76), in which $M$ is given by (6.77), and the controller and filter gains require the smallest positive definite (see [40]) solutions $P$ and $S$ to the AREs (6.68) and (6.69).
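The push-through identities (6.71), (6.72), and (6.75) are easy to confirm numerically on random data. A minimal sketch (numpy assumed; $\theta$ is chosen large enough that all the inverses involved exist for the sampled $P$ and $S$):

```python
import numpy as np
rng = np.random.default_rng(0)

n, theta = 3, 1.0e3
G = rng.standard_normal((n, n)); P = G @ G.T + n * np.eye(n)  # P > 0
G = rng.standard_normal((n, n)); S = G @ G.T + n * np.eye(n)  # S > 0
I, inv = np.eye(n), np.linalg.inv

E = S @ inv(I - P @ S / theta)                  # (6.71), first form
L_inv = inv(I - P @ S / theta)                  # L^{-1}
# (6.71): the alternative forms agree with E.
assert np.allclose(E, inv(inv(S) - P / theta))
assert np.allclose(E, inv(I - S @ P / theta) @ S)
# (6.72): L^{-1} = S^{-1}E = I + theta^{-1}PE = (P^{-1}-theta^{-1}S)^{-1}P^{-1}.
assert np.allclose(L_inv, inv(S) @ E)
assert np.allclose(L_inv, I + P @ E / theta)
assert np.allclose(L_inv, inv(inv(P) - S / theta) @ inv(P))
# (6.75): PE = (P^{-1}-theta^{-1}S)^{-1}S; note also M = L^{-1}P.
assert np.allclose(P @ E, inv(inv(P) - S / theta) @ S)
print("identities verified")
```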
Remark 6.3.5 Note that as $\theta^{-1} \to 0$ the H∞ controller given by (6.47) or (6.79) reduces to the H2 controller (in the stochastic setting it is known as the linear-quadratic-Gaussian controller)
$$ u(t) = -R^{-1}B^TS\hat x(t), \tag{6.80} $$
where the controller gains are obtained from the ARE (6.69) with $\theta^{-1} = 0$, which is identical to the ARE in Theorem 5.7.2. The state is reconstructed from the filter with $\hat x(t) = x^\circ(t)$:
$$ \dot{\hat x}(t) = A\hat x(t) + Bu(t) + PH^TV^{-1}\big(z(t) - H\hat x(t)\big), \tag{6.81} $$
where the filter gains are determined from (6.68) with $\theta^{-1} = 0$. The properties of this new ARE are similar to those given in Theorem 5.7.2.

6.3.5 H∞ Measure and H∞ Robustness Bound

First, we show that the $L_2$ norms on the input and output of a system induce the H∞ norm on the resulting transfer matrix. (The "L" in $L_2$, by the way, stands for "Lebesgue.") Consider Figure 6.2, where the disturbance input $d$ is square integrable, i.e., an $L_2$ function. We are interested in the conditions on $G$ that will make the output performance measure $y$ square integrable as well. Because of Parseval's Theorem, a square integrable $y$ is isomorphic (i.e., equivalent) to a square integrable transfer function $Y(s)$:
$$ \|y\|_2^2 = \int_{-\infty}^{\infty} \|y(\tau)\|^2\, d\tau = \sup_{\alpha>0} \frac{1}{2\pi}\int_{-\infty}^{\infty} \|Y(\alpha + j\omega)\|^2\, d\omega. \tag{6.82} $$

[Figure 6.2: Transfer function $G(s)$ mapping the square integrable disturbance $d$ to the output $y$.]

We can use the properties of norms and vector spaces to derive our condition on $G$:
$$ \begin{aligned} \|y\|_2^2 &= \sup_{\alpha>0} \frac{1}{2\pi}\int_{-\infty}^{\infty} \|G(\alpha+j\omega)d(\alpha+j\omega)\|^2\, d\omega &\text{(6.83)} \\ &\le \sup_{\alpha>0} \frac{1}{2\pi}\int_{-\infty}^{\infty} \|G(\alpha+j\omega)\|^2\, \|d(\alpha+j\omega)\|^2\, d\omega &\text{(6.84)} \\ &= \sup_{\alpha>0} \frac{1}{2\pi}\int_{-\infty}^{\infty} \bar\sigma\big(G(\alpha+j\omega)\big)^2\, \|d(\alpha+j\omega)\|^2\, d\omega &\text{(6.85)} \\ &\le \Big[\sup_{\alpha>0}\sup_\omega \bar\sigma\big(G(\alpha+j\omega)\big)^2\Big] \frac{1}{2\pi}\int_{-\infty}^{\infty} \|d(\alpha+j\omega)\|^2\, d\omega &\text{(6.86)} \\ &= \Big[\sup_{\alpha>0}\sup_\omega \bar\sigma\big(G(\alpha+j\omega)\big)^2\Big] \|d\|_2^2. \end{aligned} $$
We use the Schwarz inequality to get from the first line to the second. The symbol $\bar\sigma$ denotes the largest singular value of the matrix transfer function $G(\cdot)$; since $G$ is a function of the complex number $s$, so is $\bar\sigma$. Now, since $\|d\|_2^2 < \infty$ by definition, $\|y\|_2^2 < \infty$ if and only if
$$ \sup_{\alpha>0}\sup_\omega \bar\sigma\big(G(\alpha+j\omega)\big) < \infty. \tag{6.87} $$
The quantity on the left describes the largest possible gain that $G(s)$ can apply to any possible input. Thus, we define the infinity norm of $G$ to be
$$ \|G\|_\infty := \sup_{\alpha>0}\sup_\omega \bar\sigma\big(G(\alpha+j\omega)\big). \tag{6.88} $$
We should note that from our development it is clear that $\|G\|_\infty$ describes the worst-case ratio of the norms of $y$ and $d$:
$$ \|G\|_\infty = \sup_{d\ne 0}\frac{\|y\|_2}{\|d\|_2}. \tag{6.89} $$
The body of theory that comprises H∞ describes the application of the $\infty$ norm to control problems. Examples of these are the model matching problem and the robust stability and performance problems.
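For a stable transfer matrix the supremum over $\alpha > 0$ in (6.88) is approached on the imaginary axis, so a crude estimate of $\|G\|_\infty$ is the largest $\bar\sigma(G(j\omega))$ over a frequency grid. A minimal sketch (numpy assumed; the state-space data is illustrative, not taken from the text, and a Hamiltonian-based bisection would be the accurate alternative):

```python
import numpy as np

def hinf_norm_grid(A, B, C, D, w_grid):
    """Estimate ||G||_inf = sup_w sigma_max(G(jw)) on a frequency grid.

    This is a lower bound: a fine grid (and refinement near peaks)
    is needed for accuracy.
    """
    n = A.shape[0]
    worst = 0.0
    for w in w_grid:
        G = C @ np.linalg.inv(1j * w * np.eye(n) - A) @ B + D
        worst = max(worst, np.linalg.svd(G, compute_uv=False)[0])
    return worst

# Illustrative stable system (assumed data, not from the text).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
w = np.logspace(-2, 2, 2000)
print(hinf_norm_grid(A, B, C, D, w))   # peak gain near the resonance
```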
6.3.6 The H∞ Transfer-Matrix Bound

In this section, the H∞ norm of the closed-loop transfer matrix of the disturbance attenuation problem is bounded, where it is now assumed that the disturbance inputs of measurement and process noise are $L_2$ functions. To construct the closed-loop transfer matrix between the disturbance and the performance output, the dynamic system coupled to the optimal H∞ compensator is written as
$$ \begin{aligned} \dot x(t) &= Ax(t) + Bu^\circ(t) + \Gamma w(t) = Ax(t) - BR^{-1}B^TSx^\circ(t) + \Gamma w(t), \\ \dot x^\circ(t) &= F_cx^\circ(t) + G_cz(t) = F_cx^\circ(t) + G_cHx(t) + G_cv(t), \end{aligned} \tag{6.90} $$
where
$$ F_c = A - BR^{-1}B^TS + \theta^{-1}\Gamma W\Gamma^TS - MH^TV^{-1}H, \qquad G_c = MH^TV^{-1}. $$
Define a new state vector which combines $x(t)$ and $x^\circ(t)$ as
$$ \rho(t) = \begin{bmatrix} x(t) \\ x^\circ(t) \end{bmatrix} \tag{6.91} $$
with dynamics
$$ \dot\rho(t) = F_{CL}\rho(t) + \Gamma_{CL}d(t), \qquad y(t) = C_{CL}\rho(t), $$
where
$$ F_{CL} = \begin{bmatrix} A & -BR^{-1}B^TS \\ G_cH & F_c \end{bmatrix}, \qquad d(t) = \begin{bmatrix} w(t) \\ v(t) \end{bmatrix}, \tag{6.92} $$
$$ \Gamma_{CL} = \begin{bmatrix} \Gamma & 0 \\ 0 & G_c \end{bmatrix}, \qquad C_{CL} = \begin{bmatrix} C & -DR^{-1}B^TS \end{bmatrix}. \tag{6.93} $$

[Figure 6.3: Transfer matrix from the disturbance inputs $d$ to the output performance $y$: the compensator closes the loop around the plant from the measurements $z$ to the control $u$.]

The transfer matrix of the closed-loop system from the disturbances $d$ to the output $y$ is depicted in Figure 6.3. The transfer matrix $T_{yd}$ is
$$ T_{yd}(s) = C_{CL}\big[sI - F_{CL}\big]^{-1}\Gamma_{CL}. \tag{6.94} $$
The following result, proved in [40], shows how the closed-loop transfer matrix is bounded.

Theorem 6.3.2 The closed-loop system is stable and
$$ \|T_{yd}(s)\|_\infty \le \theta. \tag{6.95} $$

Example 6.3.1 (Scalar dynamic system) The characteristics of the ARE and the closed-loop system dynamics are illustrated. Consider the scalar dynamic system
$$ \dot x(t) = -1.5x(t) + u(t) + w(t), \qquad z(t) = x(t) + v(t), $$
where $Q = 4$, $R = 2$, $\theta^{-1} = 1$, $V = 1/14$, $W = 1$, $B = 1$, $\Gamma = 1$, $H = 1$. The corresponding AREs are
$$ -3S + 0.5S^2 + 4 = 0 \;\Rightarrow\; S = 2, 4, \qquad\quad -3P - 10P^2 + 1 = 0 \;\Rightarrow\; P = 0.2, $$
where we compute $M = 1/3$ and $G_c = MH^TV^{-1} = 14/3$. A plot of the roots of $P$ as a function of $\theta^{-1}$ is shown in Figure 6.4. One family of roots starts on the negative reals and continues to decrease as $\theta^{-1}$ decreases. It then passes through $-\infty$ to $+\infty$ and continues to decrease as $\theta^{-1}$ decreases until it meets the other family of roots. At that point it breaks onto the imaginary axis and its solution is no longer valid; there the eigenvalues of the Hamiltonian associated with the ARE reach the imaginary axis and then split along it if $\theta^{-1}$ continues to change. Note that there can be two positive solutions. In [40] it is shown that only the smallest positive definite solution of the $S$ and $P$ AREs produces the optimal controller. Here, the smallest positive solution of the ARE is associated with the root starting at $\theta^{-1} = 0$, that is, the LQG solution.

[Figure 6.4: Roots of $P$ as a function of $\theta^{-1}$.]

The closed-loop matrix (6.92) for $S = 2$, $P = 0.2$ is
$$ F_{CL} = \begin{bmatrix} -1.5 & -1 \\ 14/3 & -4.2 \end{bmatrix} \;\Rightarrow\; \lambda = -2.8 \pm 1.7i, \tag{6.96} $$
where $\lambda$ is an eigenvalue of $F_{CL}$, and for $S = 4$, $P = 0.2$,
$$ F_{CL} = \begin{bmatrix} -1.5 & -2 \\ 14 & -13.5 \end{bmatrix} \;\Rightarrow\; \lambda = -4.7, -10.3. \tag{6.97} $$
Note that the complex eigenvalues in (6.96) induced by this approach could not be generated by LQG design for this scalar problem.

Problems

1. Show the results given in Equation (6.48).

2. Find a differential equation for the propagation of $x^\circ(t)$ in Equation (6.46).

3. Consider the system shown in Figure 6.5. Assume $t_f \to \infty$ and all parameters are time invariant. Assume $\dot S \to 0$ and $\dot P \to 0$, which means using the ARE. The system equations are
$$ \dot x(t) = ax(t) + u + w(t), \qquad z(t) = x(t) + v(t), \qquad y = \begin{bmatrix} 1 \\ 0 \end{bmatrix}x(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix}u, \tag{6.98} $$
where
$$ Q = 1, \quad R = 1, \quad W = 1, \quad V = 1. \tag{6.99} $$

[Figure 6.5: System description.]

(a) Plot $S$ as a function of $\theta^{-1}$ for $a = 1$ and $a = -1$.
(b) Plot $P$ as a function of $\theta^{-1}$ for $a = 1$ and $a = -1$.
(c) For some choice of $\theta^{-1}$ show that all necessary conditions are satisfied: i. $P > 0$; ii. $S \ge 0$; iii. $I - \theta^{-1}PS > 0$.
(d) Write down the compensator. (A sketch of the computations for parts (a)–(c) follows the problem set.)
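A sketch of the computations behind parts (a)–(c) of Problem 3 (numpy assumed; for this scalar problem the AREs (6.68) and (6.69) reduce to quadratics, and we keep the smallest positive root in line with the discussion of Example 6.3.1):

```python
import numpy as np

def smallest_pos_root(c2, c1, c0):
    """Smallest positive real root of c2*x^2 + c1*x + c0 = 0 (None if none)."""
    r = [z.real for z in np.roots([c2, c1, c0])
         if abs(z.imag) < 1e-9 and z.real > 0]
    return min(r) if r else None

# Problem 3 data: B = Gamma = H = 1 and Q = R = W = V = 1.
for a in (1.0, -1.0):
    for th_inv in (0.1, 0.5, 0.9):
        # S-ARE (6.69): 0 = 2aS + Q - (B^2/R - th_inv*Gamma^2*W) S^2.
        S = smallest_pos_root(-(1.0 - th_inv), 2.0 * a, 1.0)
        # P-ARE (6.68): 0 = 2aP - (H^2/V - th_inv*Q) P^2 + Gamma^2*W.
        P = smallest_pos_root(-(1.0 - th_inv), 2.0 * a, 1.0)
        ok = S is not None and P is not None and 1.0 - th_inv * P * S > 0
        print(f"a={a:+.0f}, theta^-1={th_inv}: S={S:.3f}, P={P:.3f}, "
              f"condition iii holds: {ok}")
```

Sweeping `th_inv` over a fine grid produces the plots asked for in parts (a) and (b); for part (d) the compensator is the estimator (6.76) together with the control law (6.79).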
Appendix A

Background

A.1 Topics from Calculus

In this section we present some results from calculus that will be of use in the main text. Note that a basic familiarity with such topics as continuity, differentiation, and integration is assumed.

A.1.1 Implicit Function Theorems

It should be noted that this section is adapted from Friedman's book Advanced Calculus [21]. Let $F(x,y)$ be a function defined in an open set $G$ of $R^2$. We are concerned with the set of points $(x,y)$ in $G$ satisfying $F(x,y) = 0$. Can we write the points of this set in the form $y = g(x)$? If so, then we say that the equation $F(x,y) = 0$ defines the function $y = g(x)$ implicitly, and we call the equation
$$ F(x,y) = 0 \tag{A.1} $$
the implicit equation for $y = g(x)$.

If $F(x,y) = x^2 + y^2$ in $R^2$, then the only solution of Equation (A.1) is $(0,0)$. Thus there is no function $y = g(x)$ that is defined on some interval of positive length. If $F(x,y) = x^2 + y^2 - 1$ in $R^2$, then there are two functions $y = g(x)$ for which $(x, g(x))$ satisfies Equation (A.1) for $-1 \le x \le 1$, namely,
$$ y = g(x) = \pm\sqrt{1 - x^2}. \tag{A.2} $$
Note, however, that if we restrict $(x,y)$ to belong to a small neighborhood of a solution $(x_0,y_0)$ of Equation (A.1) with $|x_0| < 1$, then there is a unique solution $y = g(x)$ satisfying $g(x_0) = y_0$.

We shall now prove a general theorem asserting that the solutions $(x,y)$ of Equation (A.1) in a small neighborhood of a point $(x_0,y_0)$ have the form $y = g(x)$. This theorem is called the Implicit Function Theorem for a function of two variables.

Theorem A.1.1 Let $F(x,y)$ be a function defined in an open set $G$ of $R^2$ having continuous first derivatives, and let $(x_0,y_0)$ be a point of $G$ for which
$$ F(x_0,y_0) = 0, \tag{A.3} $$
$$ F_y(x_0,y_0) \ne 0. \tag{A.4} $$
Then there exists a rectangle $R$ in $G$ defined by
$$ |x - x_0| < \alpha, \quad |y - y_0| < \beta \tag{A.5} $$
such that the points $(x,y)$ in $R$ that satisfy Equation (A.1) have the form $(x, g(x))$, where $g(x)$ is a function having a continuous derivative in $|x - x_0| < \alpha$. Furthermore,
$$ g'(x) = -\frac{F_x(x, g(x))}{F_y(x, g(x))} \quad \text{if } |x - x_0| < \alpha. \tag{A.6} $$
The condition (A.4) cannot be omitted, as shown by the example of $F(x,y) = x^2 + y^2$, $(x_0,y_0) = (0,0)$.

Proof: We may suppose that $F_y(x_0,y_0) > 0$. By continuity, $F_y(x,y) > 0$ if $|x - x_0| \le d$, $|y - y_0| \le d$, where $d$ is a small positive number. Consider the function $\phi(y) = F(x_0,y)$. It is strictly monotone increasing since $\phi'(y) = F_y(x_0,y) > 0$. Since $\phi(y_0) = F(x_0,y_0) = 0$, it follows that
$$ F(x_0, y_0 - d) = \phi(y_0 - d) < 0 < \phi(y_0 + d) = F(x_0, y_0 + d). \tag{A.7} $$
Using the continuity of $F(x, y_0 - d)$ and of $F(x, y_0 + d)$ we deduce that
$$ F(x, y_0 - d) < 0, \quad F(x, y_0 + d) > 0 \tag{A.8} $$
if $|x - x_0|$ is sufficiently small, say, if $|x - x_0| \le d_1$. See Figure A.1.

[Figure A.1: The sign of $F$ near $(x_0,y_0)$ when $F_y(x_0,y_0) > 0$: $F < 0$ along $y = y_0 - d$, $F > 0$ along $y = y_0 + d$, with a unique zero $(x, y_x)$ in between.]

Consider, in the interval $|y - y_0| \le d$, the continuous function $\psi(y) = F(x,y)$ for $x$ fixed, $|x - x_0| \le d_2$, $d_2 = \min(d, d_1)$. It is strictly monotone increasing since $\psi_y = F_y(x,y) > 0$. Also, by Equation (A.8), $\psi(y_0 - d) < 0$, $\psi(y_0 + d) > 0$. From the continuity of $\psi(\cdot)$, there is a point $y_x$ in the interval $(y_0 - d, y_0 + d)$ satisfying $\psi(y_x) = 0$. Since $\psi$ is strictly monotone, $y_x$ is unique. Writing $y_x = g(x)$, we have proved that the solutions of Equation (A.1) for $|x - x_0| < d_2$, $|y - y_0| < d$ have the form $(x, g(x))$.
We shall next prove that $g(x)$ is continuous. For any $\epsilon > 0$, $\epsilon \le d$, $F(x_0, y_0 + \epsilon) > 0 > F(x_0, y_0 - \epsilon)$. Repeating the argument given above with $y_0 \pm d$ replaced by $y_0 \pm \epsilon$, we see that there exists a number $d_2(\epsilon)$ such that if $|x - x_0| < d_2(\epsilon)$, then there is a unique $y = \bar y_x$ in $(y_0 - \epsilon, y_0 + \epsilon)$ satisfying $F(x, \bar y_x) = 0$. By uniqueness of the solution $y$ of $F(x,y) = 0$ in $(y_0 - d, y_0 + d)$, it follows that $\bar y_x = g(x)$. Hence
$$ |g(x) - y_0| < \epsilon \quad \text{if } |x - x_0| \le d_2(\epsilon). \tag{A.9} $$
This proves the continuity of $g(x)$ at $x_0$. Now let $x_1$ be any point in $|x - x_0| < d_2$. Then $F(x_1,y_1) = 0$, where $y_1 = g(x_1)$, and $F_y(x_1,y_1) > 0$. We therefore can apply the proof of the continuity of $g(x)$ at $x_0$ and deduce the continuity of $g(x)$ at $x_1$.

We proceed to prove that $g(x)$ is differentiable. We begin with the relation
$$ F(x_0 + h, g(x_0 + h)) - F(x_0, g(x_0)) = 0, \tag{A.10} $$
where $|h| < d_2$. Writing $g(x_0 + h) = g(x_0) + \Delta g$ and using the differentiability of $F$, we get
$$ hF_x(x_0,y_0) + \Delta g\, F_y(x_0,y_0) + \eta\sqrt{h^2 + (\Delta g)^2} = 0, \tag{A.11} $$
where $\eta = \eta(h, \Delta g) \to 0$ if $(h, \Delta g) \to 0$. Since $g(x)$ is continuous, $\Delta g \to 0$ if $h \to 0$. Hence $\eta \to 0$ if $h \to 0$. Writing $F_x(x_0,y_0) = F_x$, $F_y(x_0,y_0) = F_y$, and dividing both sides of Equation (A.11) by $hF_y$, we find that
$$ \frac{\Delta g}{h} = -\frac{F_x}{F_y} + \frac{\eta}{hF_y}\sqrt{h^2 + (\Delta g)^2}. \tag{A.12} $$
If $|h|$ is sufficiently small, $|\eta/F_y| < 1/2$; hence
$$ \frac{|\Delta g|}{|h|} \le \frac{|F_x|}{|F_y|} + \frac12\,\frac{|\Delta g|}{|h|}. $$
It follows that $|\Delta g|/|h| \le C$, $C$ constant. Using this in Equation (A.12) and taking $h \to 0$, we conclude that
$$ \lim_{h\to 0} \frac{\Delta g}{h} \tag{A.13} $$
exists and is equal to $-F_x/F_y$. We have thus proved that $g'(x)$ exists at $x_0$ and that Equation (A.6) holds at $x_0$. The same argument can be applied at any point $x$ in $|x - x_0| < d_2$. Thus $g'(x)$ exists and satisfies Equation (A.6). Since the right-hand side of Equation (A.6) is continuous, the same is true of $g'(x)$. This completes the proof of the theorem.
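A quick numerical illustration of Theorem A.1.1 (sketch, numpy assumed): for $F(x,y) = x^2 + y^2 - 1$ near $(x_0,y_0) = (0.6, 0.8)$ the implicit function is the branch $g(x) = \sqrt{1-x^2}$, and formula (A.6) matches a direct difference quotient:

```python
import numpy as np

def F(x, y):  return x**2 + y**2 - 1.0
def Fx(x, y): return 2.0 * x
def Fy(x, y): return 2.0 * y

x0, y0 = 0.6, 0.8                    # F(x0, y0) = 0, Fy(x0, y0) = 1.6 != 0
g = lambda x: np.sqrt(1.0 - x**2)    # explicit branch through (x0, y0)

implicit = -Fx(x0, y0) / Fy(x0, y0)            # formula (A.6): -0.75
h = 1e-6
direct = (g(x0 + h) - g(x0 - h)) / (2 * h)     # central difference
print(implicit, direct)                        # both approximately -0.75
```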
If instead of Equation (A.4) we assume that $F_x(x_0,y_0) \ne 0$, then we can prove an analogue of Theorem A.1.1 with the roles of $x$ and $y$ interchanged. The proof of Theorem A.1.1 extends to the case where $x = (x_1, x_2, \ldots, x_n)$. The result is the following implicit function theorem for a function of several variables.

Theorem A.1.2 Let $F(x,y) = F(x_1, \ldots, x_n, y)$ be a function defined in an open set $G$ of $R^{n+1}$ and let $(x^0, y^0) = (x_1^0, \ldots, x_n^0, y^0)$ be a point of $G$. Assume that
$$ F(x^0, y^0) = 0, \qquad F_y(x^0, y^0) \ne 0. \tag{A.14} $$
Then there exists an $(n+1)$-dimensional rectangle $R$ defined by
$$ |x_i - x_i^0| < \alpha \ (1 \le i \le n), \quad |y - y^0| < \beta \tag{A.15} $$
such that for any $x$ in Equation (A.15) there exists a unique solution $y = y_x$ of $F(x,y) = 0$ in the interval $|y - y^0| < \beta$. Writing $y_x = g(x)$, the function $g(x)$ is continuously differentiable, and
$$ \frac{\partial}{\partial x_i}g(x) = -\frac{(\partial F/\partial x_i)(x, g(x))}{(\partial F/\partial y)(x, g(x))} \quad (1 \le i \le n). \tag{A.16} $$

Consider next a more complicated situation in which we want to solve two equations simultaneously:
$$ F(x, y, z, u, v) = 0, \tag{A.17} $$
$$ G(x, y, z, u, v) = 0. \tag{A.18} $$
We introduce the determinant
$$ J = \frac{\partial(F, G)}{\partial(u, v)} = \begin{vmatrix} F_u & F_v \\ G_u & G_v \end{vmatrix}, \tag{A.19} $$
called the Jacobian of $F, G$ with respect to $u, v$. (This Jacobian determinant should not be confused with the Jacobian matrix of Definition A.2.4.)

Theorem A.1.3 Let $F$ and $G$ have continuous first derivatives in an open set $D$ of $R^5$ containing a point $P_0 = (x_0, y_0, z_0, u_0, v_0)$. Assume that
$$ F(x_0, y_0, z_0, u_0, v_0) = 0, \quad G(x_0, y_0, z_0, u_0, v_0) = 0, \tag{A.20} $$
and
$$ \frac{\partial(F, G)}{\partial(u, v)}\bigg|_{(x_0,y_0,z_0,u_0,v_0)} \ne 0, \tag{A.21} $$
which means the Jacobian is nonzero. Then there exists a cube
$$ R:\ |x - x_0| < \alpha, \quad |y - y_0| < \alpha, \quad |z - z_0| < \alpha, \tag{A.22} $$
and a rectangle
$$ S:\ |u - u_0| < \beta_1, \quad |v - v_0| < \beta_2 \tag{A.23} $$
such that for any $(x,y,z)$ in $R$ there is a unique pair $(u,v)$ in $S$ for which Equations (A.17) and (A.18) hold. Writing
$$ u = f(x, y, z), \qquad v = g(x, y, z), \tag{A.24} $$
the functions $f$ and $g$ have continuous first derivatives in $R$, and
$$ f_x = -\frac{1}{J}\frac{\partial(F, G)}{\partial(x, v)} = -\frac{1}{J}\begin{vmatrix} F_x & F_v \\ G_x & G_v \end{vmatrix}, \tag{A.25} $$
$$ g_x = -\frac{1}{J}\frac{\partial(F, G)}{\partial(u, x)} = -\frac{1}{J}\begin{vmatrix} F_u & F_x \\ G_u & G_x \end{vmatrix}. \tag{A.26} $$
Similar formulas hold for $f_y$, $f_z$, $g_y$, $g_z$.

Proof: Since $J \ne 0$ at $P_0$, either $F_v \ne 0$ or $G_v \ne 0$ at $P_0$. Suppose $F_v \ne 0$ at $P_0$. By Theorem A.1.2, if $(x,y,z,u)$ lies in a small rectangle $T$ with center $(x_0,y_0,z_0,u_0)$, then there exists a unique solution $v = \phi(x,y,z,u)$ of Equation (A.17) in some small interval $|v - v_0| < \beta_2$. Let
$$ H(x,y,z,u) = G(x,y,z,u,\phi(x,y,z,u)). \tag{A.27} $$
Then $(u,v)$ is a solution of Equations (A.17) and (A.18) (when $(x,y,z,u) \in T$, $|v - v_0| < \beta_2$) if and only if $v = \phi(x,y,z,u)$ and
$$ H(x,y,z,u) = 0. \tag{A.28} $$
$\phi$ has continuous first derivatives and $\phi_u = -F_u/F_v$. Hence
$$ H_u = G_u + G_v\phi_u = G_u - \frac{G_vF_u}{F_v} = \frac{G_uF_v - G_vF_u}{F_v} = -\frac{J}{F_v} \ne 0 \tag{A.29} $$
at $P_0$. We therefore can apply Theorem A.1.2. We conclude that for any $(x,y,z)$ in a small cube $R$ with center $(x_0,y_0,z_0)$ there is a unique solution $u$ of Equation (A.28) in some interval $|u - u_0| < \beta_1$; the points $(x,y,z,u)$ belong to $T$. Furthermore, this solution $u$ has the form
$$ u = f(x, y, z), \tag{A.30} $$
where $f$ has continuous first derivatives. It follows that
$$ v = \phi(x, y, z, f(x, y, z)) = g(x, y, z) \tag{A.31} $$
also has continuous first derivatives. It remains to prove Equations (A.25) and (A.26). To do this we differentiate the equations $F(x,y,z,f(x,y,z),g(x,y,z)) = 0$, $G(x,y,z,f(x,y,z),g(x,y,z)) = 0$ with respect to $x$ and get
$$ F_x + F_uf_x + F_vg_x = 0, \qquad G_x + G_uf_x + G_vg_x = 0. \tag{A.32} $$
Solving for $f_x$, $g_x$, we get Equations (A.25) and (A.26).

We conclude this section with a statement of the most general implicit function theorem for a system of functions. Let
$$ F_i(x, u) = F_i(x_1, \ldots, x_n, u_1, \ldots, u_r) \quad (1 \le i \le r) \tag{A.33} $$
be functions having continuous first derivatives in an open set containing a point $(x^0, u^0)$. The matrix
$$ \Big[\frac{\partial F_i}{\partial u_j}\Big], \quad i, j = 1, \ldots, r, \tag{A.34} $$
or briefly $(\partial F_i/\partial u_j)$, is called the Jacobian matrix of $(F_1, \ldots, F_r)$ with respect to $(u_1, \ldots, u_r)$. The determinant of this matrix is called the Jacobian of $(F_1, \ldots, F_r)$ with respect to $(u_1, \ldots, u_r)$ and is denoted by
$$ J = \frac{\partial(F_1, \ldots, F_r)}{\partial(u_1, \ldots, u_r)}. \tag{A.35} $$

Theorem A.1.4 Let $F_1, \ldots, F_r$ have continuous first derivatives in a neighborhood of a point $(x^0, u^0)$. Assume that
$$ F_i(x^0, u^0) = 0 \quad (1 \le i \le r), \qquad \frac{\partial(F_1, \ldots, F_r)}{\partial(u_1, \ldots, u_r)} \ne 0 \ \text{ at } (x^0, u^0). \tag{A.36} $$
Then there is a $\delta$-neighborhood $R$ of $x^0$ and a $\gamma$-neighborhood $S$ of $u^0$ such that for any $x$ in $R$ there is a unique solution $u$ of
$$ F_i(x, u) = 0 \quad (1 \le i \le r) \tag{A.37} $$
in $S$. The vector-valued function $u(x) = (u_1(x), \ldots, u_r(x))$ thus defined has continuous first derivatives in $R$.
In order to compute $\partial u_i/\partial x_j$ ($1 \le i \le r$) for a fixed $j$, we differentiate the equations
$$ F_1(x, u(x)) = 0, \ \ldots, \ F_r(x, u(x)) = 0 \tag{A.38} $$
with respect to $x_j$. We obtain the system of linear equations for $\partial u_i/\partial x_j$:
$$ \frac{\partial F_k}{\partial x_j} + \sum_{i=1}^r \frac{\partial F_k}{\partial u_i}\frac{\partial u_i}{\partial x_j} = 0 \quad (1 \le k \le r). \tag{A.39} $$
The system of linear equations (A.39) in the unknowns $\partial u_i/\partial x_j$ can be uniquely solved, since the determinant of the coefficient matrix, which is precisely the Jacobian $\partial(F_1, \ldots, F_r)/\partial(u_1, \ldots, u_r)$, is different from 0.

We briefly give the proof of Theorem A.1.4. It is based upon induction on $r$. Without loss of generality we may assume that
$$ \frac{\partial(F_1, \ldots, F_{r-1})}{\partial(u_1, \ldots, u_{r-1})} \ne 0. \tag{A.40} $$
Therefore, by the inductive assumption, the solution of
$$ F_i(x, u) = 0 \quad (1 \le i \le r - 1) \tag{A.41} $$
in a neighborhood of $(x^0, u^0)$ is given by $u_i = \phi_i(x, u_r)$ with $1 \le i \le r - 1$. Let
$$ G(x, u_r) = F_r(x, \phi_1(x, u_r), \ldots, \phi_{r-1}(x, u_r), u_r). \tag{A.42} $$
If we show that
$$ \frac{\partial G}{\partial u_r} \ne 0 \ \text{ at } (x^0, u^0), \tag{A.43} $$
then we can use Theorem A.1.2 to solve the equation $G(x, u_r) = 0$. To prove Equation (A.43), differentiate the equations
$$ F_i(x, \phi_1(x, u_r), \ldots, \phi_{r-1}(x, u_r), u_r) = 0 \quad (1 \le i \le r - 1) \tag{A.44} $$
with respect to $u_r$ to obtain
$$ \sum_{j=1}^{r-1} \frac{\partial F_i}{\partial u_j}\frac{\partial \phi_j}{\partial u_r} + \frac{\partial F_i}{\partial u_r} = 0 \quad (1 \le i \le r - 1). \tag{A.45} $$
Differentiate also Equation (A.42) with respect to $u_r$ to obtain
$$ \sum_{j=1}^{r-1} \frac{\partial F_r}{\partial u_j}\frac{\partial \phi_j}{\partial u_r} - \frac{\partial G}{\partial u_r} + \frac{\partial F_r}{\partial u_r} = 0. \tag{A.46} $$
Solving the linear system of Equations (A.45) and (A.46) (in the unknowns $\partial\phi_j/\partial u_r$, $\partial G/\partial u_r$) for $\partial G/\partial u_r$, we obtain
$$ \frac{\partial G}{\partial u_r} = \frac{\partial(F_1, \ldots, F_r)/\partial(u_1, \ldots, u_r)}{\partial(F_1, \ldots, F_{r-1})/\partial(u_1, \ldots, u_{r-1})}. \tag{A.47} $$
This gives Equation (A.43).

A.1.2 Taylor Expansions

We first introduce the definition of order.

Definition A.1.1 Consider a function $R(x)$. We say that $R(\cdot)$ is of higher order than $x^n$ and write $R(x) \sim O(x^n)$ if
$$ \lim_{x\to 0}\frac{R(x)}{x^n} = \lim_{x\to 0}\frac{O(x^n)}{x^n} = 0. $$

Taylor's Expansion for Functions of a Single Variable

We begin by reviewing the definition and the simplest properties of the Taylor expansion for functions of one variable. If $f(x)$ has an $N$th derivative at $x_0$, its Taylor expansion of degree $N$ about $x_0$ is the polynomial
$$ f(x_0) + \frac{1}{1!}f'(x_0)(x - x_0) + \frac{1}{2!}f''(x_0)(x - x_0)^2 + \cdots + \frac{1}{N!}f^{(N)}(x_0)(x - x_0)^N. \tag{A.48} $$
The relation between $f$ and its Taylor expansion can be expressed conveniently by the following integral remainder formula.

Theorem A.1.5 If $f$ has a continuous $N$th derivative in a neighborhood of $x_0$, then in that neighborhood
$$ f(x) = f(x_0) + \frac{1}{1!}f'(x_0)(x - x_0) + \cdots + \frac{1}{N!}f^{(N)}(x_0)(x - x_0)^N + R_N, \tag{A.49} $$
where
$$ R_N = \frac{1}{(N-1)!}\int_{x_0}^{x}(x - t)^{N-1}\big[f^{(N)}(t) - f^{(N)}(x_0)\big]\,dt, \tag{A.50} $$
and where $R_N$ is of order $O((x - x_0)^N)$, that is,
$$ \lim_{(x-x_0)\to 0}\frac{O((x - x_0)^N)}{|(x - x_0)^N|} \to 0. \tag{A.51} $$
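Before turning to the proof, a quick numerical check of (A.49)–(A.51) (sketch, numpy assumed): for $f(x) = e^x$ about $x_0 = 0$ with $N = 3$, the ratio $R_N/(x - x_0)^N$ shrinks as $x \to x_0$:

```python
import numpy as np

f = np.exp
# Degree-3 Taylor polynomial of e^x about x0 = 0: 1 + x + x^2/2 + x^3/6.
taylor = lambda x: 1.0 + x + x**2 / 2.0 + x**3 / 6.0

N = 3
for dx in (1e-1, 1e-2, 1e-3):
    R_N = f(dx) - taylor(dx)
    print(f"dx={dx:.0e}: R_N/dx^N = {R_N / dx**N:.3e}")  # tends to 0 with dx
```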
Proof: The remainder can be written as the difference
$$ R_N = \frac{1}{(N-1)!}\int_{x_0}^{x}(x - t)^{N-1}f^{(N)}(t)\,dt - \frac{f^{(N)}(x_0)}{(N-1)!}\int_{x_0}^{x}(x - t)^{N-1}\,dt. \tag{A.52} $$
The second of these integrals is directly computed to be
$$ \frac{f^{(N)}(x_0)}{(N-1)!}\int_{x_0}^{x}(x - t)^{N-1}\,dt = \frac{1}{N!}f^{(N)}(x_0)(x - x_0)^N, \tag{A.53} $$
which is just the last term of the Taylor expansion. The first integral can be integrated by parts, which together with (A.53) leads to
$$ \frac{1}{(N-2)!}\int_{x_0}^{x}(x - t)^{N-2}\big[f^{(N-1)}(t) - f^{(N-1)}(x_0)\big]\,dt = R_{N-1}. \tag{A.54} $$
We therefore obtain
$$ R_N = -\frac{1}{N!}f^{(N)}(x_0)(x - x_0)^N + R_{N-1}. \tag{A.55} $$
If we substitute the preceding equation into Equation (A.49), we get Equation (A.49) back again with $N$ replaced by $N - 1$. The induction is completed by noticing that, for $N = 1$, Equation (A.49) is just
$$ f(x) = f(x_0) + f'(x_0)(x - x_0) + \int_{x_0}^{x}\big[f'(t) - f'(x_0)\big]\,dt \tag{A.56} $$
and that this is a valid equation. Finally, the remainder $R_N$ in (A.50) is shown to be $O((x - x_0)^N)$. The following inequality can be constructed:
$$ |R_N| = \frac{1}{(N-1)!}\left|\int_{x_0}^{x}(x - t)^{N-1}\big[f^{(N)}(t) - f^{(N)}(x_0)\big]\,dt\right| \le \max_{x'\in(x_0,x)}\big|f^{(N)}(x') - f^{(N)}(x_0)\big|\,\frac{1}{N!}\,|x - x_0|^N. \tag{A.57} $$
Since it is assumed that $f^{(N)}$ is continuous, using (A.57),
$$ \lim_{(x-x_0)\to 0}\frac{R_N}{|(x - x_0)^N|} \to 0, \tag{A.58} $$
which by Definition A.1.1 implies that $R_N$ is $O((x - x_0)^N)$.

Remark A.1.1 A vector representation for Taylor's Expansion is given in Section A.2.10, after the linear algebra review of Appendix A.2.

A.2 Linear Algebra Review

It is assumed that the reader is familiar with the concepts of vectors, matrices, and inner products. We offer here a brief review of some of the definitions and concepts that are of particular interest to the material in the main text.

A.2.1 Subspaces and Dimension

A subspace $\mathcal{S}$ of $R^n$ is a set of vectors in $R^n$ such that for any $s_1, s_2 \in \mathcal{S}$ and any two scalars $\alpha, \beta \in R$, $\alpha s_1 + \beta s_2 \in \mathcal{S}$. The span of a subset of $R^n$ is defined as the collection of all finite linear combinations of its elements. Suppose we have a set of vectors $\{e_1, e_2, \ldots, e_m\}$ in $R^n$. Then the set of vectors $\mathcal{S} = \{s \mid s = \alpha_1e_1 + \alpha_2e_2 + \cdots + \alpha_me_m,\ \alpha_i \in R\}$ (read, "the set of all $s$ such that $s = \alpha_1e_1 + \cdots$") is a subspace, spanned by $\{e_1, \ldots, e_m\}$.

A set of vectors $\{e_1, e_2, \ldots, e_m\}$ in $R^n$ is said to be linearly independent if
$$ \alpha_1e_1 + \alpha_2e_2 + \cdots + \alpha_me_m = 0 \iff \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0. \tag{A.59} $$
The set of vectors $\{e_1, e_2, \ldots, e_m\}$ is a basis for the subspace if they span $\mathcal{S}$ and are independent.

The dimension of a subspace is the smallest number of vectors required to form a basis. It can be shown that any set of $m$ linearly independent vectors in an $m$-dimensional subspace is a basis for the space. That is, if $\{s_1, s_2, \ldots, s_m\}$ are linearly independent vectors in the $m$-dimensional subspace $\mathcal{S}$, then any $s \in \mathcal{S}$ can be written as $s = \alpha_1s_1 + \alpha_2s_2 + \cdots + \alpha_ms_m$, where $\alpha_i$, $i = 1, 2, \ldots, m$, are real scalars.

Example A.2.1 Consider the vectors
$$ e_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}. $$
These span a subspace $\mathcal{S} \subset R^3$. However, the vectors are not linearly independent, as $e_1 - e_2 - e_3 = 0$, so that the requirement of (A.59) above is not satisfied. It can be seen that at most two of the three vectors are linearly independent, so that $\mathcal{S}$ has dimension 2. Any two linearly independent vectors in $\mathcal{S}$ can serve as a basis. In particular, both $\{e_1, e_2\}$ and $\{e_2, e_3\}$ are bases for the space.
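The dependence in Example A.2.1 is easy to confirm numerically (sketch, numpy assumed):

```python
import numpy as np

e1 = np.array([1.0, 1.0, 2.0])
e2 = np.array([1.0, 0.0, 1.0])
e3 = np.array([0.0, 1.0, 1.0])

E = np.column_stack([e1, e2, e3])
print(np.linalg.matrix_rank(E))   # 2: the subspace S has dimension 2
print(e1 - e2 - e3)               # [0. 0. 0.]: the dependence relation
```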
A.2.2 Matrices and Rank

Given a real matrix $A \in R^{m\times n}$, the range space of the matrix is the set of all vectors $y \in R^m$ that can be written as $Ax$, where $x \in R^n$. That is, $\mathrm{Range}(A) = \{y \mid y = Ax \text{ for some } x \in R^n\}$. The null space of $A$ is the set of vectors that when multiplied by $A$ produce zero: $\mathrm{Null}(A) = \{w \mid 0 = Aw\}$. The rank of the matrix $A$ is the dimension of the range space. It is obvious from the definition that the largest possible rank of an $m$ by $n$ matrix is the smaller of $m$ and $n$. For a square matrix $A \in R^{n\times n}$, the dimensions of the range and null spaces sum to $n$.

The inverse of the square matrix $A$ is the matrix $A^{-1}$ such that $AA^{-1} = I$. The inverse exists if and only if the matrix is of full rank, that is, the rank of the $n \times n$ matrix is $n$. A matrix for which the inverse does not exist is called singular.

A.2.3 Minors and Determinants

Here we make some statements about the determinants of square matrices. For a more complete treatment, see, for instance, Cullen [16]. For a matrix with a single element, the determinant is declared to be the value of the element. That is, let $a_{11}$ be the only element of the $1\times 1$ matrix $A$. Then the determinant $|A|$ of $A$ is $|A| = a_{11}$.

Laplace Expansion Let $M_{ij}$ be the matrix created by deleting row $i$ and column $j$ from $A$. Then the determinant can be computed from the Laplace expansion as
$$ |A| = \sum_{j=1}^n a_{ij}(-1)^{i+j}|M_{ij}| $$
for any row $i$. The expansion also holds over columns, so that the order of the subscripts in the expansion can be reversed and the summation taken over any column in the matrix. The value $C_{ij} = (-1)^{i+j}|M_{ij}|$ is generally called the cofactor of the element $a_{ij}$. The determinant $|M_{ij}|$ is known as the minor of element $a_{ij}$. (Some authors refer to the matrix $M_{ij}$ itself as the minor, rather than its determinant.)

Trivially, the Laplace expansion for a matrix of dimension 2 is
$$ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}, $$
where we have used the definition of the determinant of a $1\times 1$ matrix to evaluate the minors. The cofactors for a matrix of dimension 3 involve matrices of dimension 2, so this result can be used, along with the Laplace expansion, to compute the determinant for a $3\times 3$ matrix, etc.

The determinant has several useful properties. Among these are the following:
1. $|AB| = |A||B|$.
2. If $A$ is $n\times n$, then $|A| = 0$ if and only if the rank of $A$ is less than $n$. That is, $|A| = 0$ is identical to saying that $A$ is singular.
3. $|A^{-1}| = 1/|A|$.

A.2.4 Eigenvalues and Eigenvectors

An eigenvalue of a square matrix $A$ is a scalar $\lambda$ (in general complex) such that the determinant
$$ |A - \lambda I| = 0, \tag{A.60} $$
where $I$ is the appropriately dimensioned identity matrix. Equation (A.60) is called the characteristic equation of the matrix; the values of $\lambda$ for which it is satisfied are sometimes known as the characteristic roots or characteristic values of $A$, as well as eigenvalues. An eigenvector is a vector $v \ne 0$ such that $Av = \lambda v$ or, equivalently, $[A - \lambda I]v = 0$. Clearly, such a $v$ can exist only if $|A - \lambda I| = 0$, so that $\lambda$ is an eigenvalue of the matrix. Note that even for real $A$, the eigenvectors are in general complex. If $v$ is an eigenvector of $A$, then $\alpha v$ is also an eigenvector (corresponding to the same eigenvalue) for all $\alpha \in C$.
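A short numerical illustration (sketch, numpy assumed): a real matrix can have complex eigenvalues and eigenvectors, and each computed eigenpair satisfies $Av = \lambda v$:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # real matrix; characteristic equation l^2 + 1 = 0
lam, V = np.linalg.eig(A)
print(lam)                       # [0.+1.j, 0.-1.j]: complex even though A is real
for i in range(2):
    # Each column of V is an eigenvector: A v = lambda v.
    print(np.allclose(A @ V[:, i], lam[i] * V[:, i]))    # True, True
```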
A.2.5 Quadratic Forms and Definite Matrices

This section introduces certain definitions and concepts which are of fundamental importance in the study of networks and systems. Some of what follows is adapted from the book Systems, Networks, and Computation by Athans et al. [3]. Suppose that $x$ is a column $n$-vector with components $x_i$ and that $A$ is a real $n\times n$ symmetric matrix, that is, $A = A^T$, with elements $a_{ij}$. Let us consider the scalar-valued function $f(x)$ defined by the scalar product
$$ f(x) = \langle x, Ax\rangle = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j. \tag{A.61} $$
This is called a quadratic form because it involves multiplication by pairs of the elements $x_i$ of $x$. For example, if
$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}, \tag{A.62} $$
then
$$ f(x) = \langle x, Ax\rangle = x_1^2 + 4x_1x_2 + 3x_2^2, \tag{A.63} $$
which involves terms in the square of the components of $x$ and their cross products. We now offer certain definitions.

Definition A.2.1 If for all $x$,
(a) $f(x) = \langle x, Ax\rangle \ge 0$, then $f(x)$ is called a nonnegative definite form and $A$ is called a nonnegative definite matrix;
(b) $f(x) = \langle x, Ax\rangle > 0$ for all $x \ne 0$, then $f(x)$ is called a positive definite form and $A$ is called a positive definite matrix;
(c) $f(x) = \langle x, Ax\rangle \le 0$, then $f(x)$ is called a nonpositive definite form and $A$ is called a nonpositive definite matrix;
(d) $f(x) = \langle x, Ax\rangle < 0$ for all $x \ne 0$, then $f(x)$ is called a negative definite form and $A$ is called a negative definite matrix.

We now give a procedure for testing whether a given matrix is positive definite. The basic technique is summarized in the following theorem.

Theorem A.2.1 Suppose that $A$ is a real symmetric $n\times n$ matrix with elements $a_{ij}$ ($a_{ij} = a_{ji}$), and let $A_k$ be the $k\times k$ leading submatrix of $A$, defined, for $k = 1, 2, \ldots, n$, by
$$ A_k = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{12} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1k} & a_{2k} & \cdots & a_{kk} \end{bmatrix}. \tag{A.65} $$
Then $A$ is positive definite if and only if
$$ \det A_k > 0 \tag{A.66} $$
for each $k = 1, 2, \ldots, n$.

There is a host of additional properties of definite and semidefinite symmetric matrices, which we give below as theorems. Some of the proofs are easy, but others are very difficult. Suppose that $A$ is a real symmetric matrix of dimension $n$. The characteristic value problem is to determine the scalar $\lambda$ and the nonzero vectors $v \in R^n$ which simultaneously satisfy the equation
$$ Av = \lambda v \quad \text{or} \quad (A - \lambda I)v = 0. \tag{A.67} $$
This system of $n$ linear equations in the unknown vector $v$ has a nontrivial solution if and only if $\det(A - \lambda I) = 0$, the characteristic equation.

Theorem A.2.2 The characteristic roots or eigenvalues of a symmetric matrix are all real.

Proof: Let $A$ be a symmetric matrix and let $\lambda$ be any root of the characteristic equation. Then
$$ Av = \lambda v, \tag{A.68} $$
where $v$, the eigenvector, may be a complex vector. The conjugate transpose of $v$ is denoted $v^*$. Then
$$ v^*Av = \lambda v^*v. \tag{A.69} $$
Since $v^*Av$ is a scalar and $A$ is real and symmetric, i.e., $A^* = A$,
$$ (v^*Av)^* = v^*Av; \tag{A.70} $$
that is, $v^*Av$ equals its own conjugate and hence must be real. Since $v^*Av$ and $v^*v$ are real, $\lambda$ must be real.

Theorem A.2.3 For a symmetric matrix, all the $n$ vectors $v$ associated with the $n$ eigenvalues $\lambda$ are real.

Proof: Since $(A - \lambda I)$ is real, the solution $v$ of $(A - \lambda I)v = 0$ must be real.

Theorem A.2.4 If $v_1$ and $v_2$ are eigenvectors associated with the distinct eigenvalues $\lambda_1$ and $\lambda_2$ of a symmetric matrix $A$, then $v_1$ and $v_2$ are orthogonal.

Proof: We know that $Av_1 = \lambda_1v_1$ and $Av_2 = \lambda_2v_2$. This implies that $v_2^TAv_1 = \lambda_1v_2^Tv_1$ and $v_1^TAv_2 = \lambda_2v_1^Tv_2$. Taking the transpose of the first equation gives
$$ v_1^TAv_2 = \lambda_1v_1^Tv_2. \tag{A.71} $$
Subtract the second equation to obtain
$$ (\lambda_1 - \lambda_2)(v_1^Tv_2) = 0. \tag{A.72} $$
Since $\lambda_1 - \lambda_2 \ne 0$, it follows that $v_1^Tv_2 = 0$.

Suppose all the eigenvalues are distinct and the eigenvectors are normalized. Then
$$ V = [v_1, \ldots, v_n] \tag{A.73} $$
is an orthogonal matrix. This means that since
$$ V^TV = I, \tag{A.74} $$
then
$$ V^T = V^{-1}. \tag{A.75} $$
Even if the eigenvalues are repeated, the eigenmatrix $V$ is still orthogonal [27]. Therefore,
$$ AV = VD, \tag{A.76} $$
where $D$ is a diagonal matrix of the eigenvalues $\lambda_1, \ldots, \lambda_n$. Therefore,
$$ D = V^TAV, \tag{A.77} $$
where $V$ is the orthogonal matrix forming the similarity transformation.
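The leading-minor test of Theorem A.2.1 can be checked against the eigenvalue characterization of definiteness (stated next as Theorem A.2.5) on the matrix of (A.62). A minimal sketch (numpy assumed):

```python
import numpy as np

def leading_minors_positive(A):
    """Test of Theorem A.2.1: det(A_k) > 0 for every leading submatrix A_k."""
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])           # the matrix of (A.62)
print(leading_minors_positive(A))    # False: det(A_2) = 3 - 4 = -1
print(np.linalg.eigvalsh(A))         # one negative eigenvalue, approx [-0.24, 4.24]

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # a positive definite comparison case
print(leading_minors_positive(B), np.linalg.eigvalsh(B))   # True, [1., 3.]
```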
Theorem A.2.5 $A$ is a positive definite matrix if and only if all its eigenvalues are positive. $A$ is a negative definite matrix if and only if all its eigenvalues are negative. In either case, the eigenvectors of $A$ are real and mutually orthogonal.

Theorem A.2.6 If $A$ is a positive semidefinite or negative semidefinite matrix, then at least one of its eigenvalues must be zero. If $A$ is positive (negative) definite, then $A^{-1}$ is positive (negative) definite.

Theorem A.2.7 If both $A$ and $B$ are positive (negative) definite, and if $A - B$ is also positive (negative) definite, then $B^{-1} - A^{-1}$ is positive (negative) definite.

Quadratic Forms with Nonsymmetric Matrices

Quadratic forms generally involve symmetric matrices. However, it is clear that Equation (A.61) is well defined even when $A$ is not symmetric. The form $x^TAx = 0\ \forall\, x \in R^n$ for some $A \in R^{n\times n}$ occasionally occurs in derivations and deserves some attention. Before continuing, we note the following.

Theorem A.2.8 If $A \in R^{n\times n}$ is symmetric and $x^TAx = 0\ \forall\, x \in R^n$, then $A = 0$.

Proof:
$$ x^TAx = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j = \sum_{i=1}^n a_{ii}x_i^2 + 2\sum_{i=2}^n\sum_{j=1}^{i-1} a_{ij}x_ix_j = 0. $$
For this to be true for arbitrary $x$, all coefficients $a_{ij}$ must be zero.

Definition A.2.2 The real matrix $A$ is skew-symmetric if $A = -A^T$.

For any skew-symmetric $A$, the diagonal elements are zero, since $a_{ii} = -a_{ii}$ only for $a_{ii} = 0$.

Theorem A.2.9 For $A$ skew-symmetric and any vector $x$ of appropriate dimension,
$$ x^TAx = 0. \tag{A.78} $$

Proof: $x^TAx = x^TA^Tx$, since a scalar equals its own transpose; but by definition $A^T = -A$, so $x^TAx = -x^TAx$, and this can be true only if $x^TAx = 0$.

The proof of the following is trivial.

Theorem A.2.10 Any square matrix $A$ can be written uniquely as the sum of a symmetric part $A_s$ and a skew-symmetric part $A_w$.

Given the above, we have $x^TAx = x^T(A_s + A_w)x = x^TA_sx$ and $A + A^T = A_s + A_w + A_s^T + A_w^T = 2A_s$. As a result of these statements and Theorem A.2.8, we note the following.

Theorem A.2.11
$$ x^TAx = 0\ \forall\, x \in R^n \;\Longrightarrow\; A + A^T = 0. \tag{A.79} $$
It is not true, however, that $x^TAx = 0\ \forall\, x \in R^n \Rightarrow A = 0$, as the matrix may have a nonzero skew-symmetric part.

A.2.6 Time-Varying Vectors and Matrices

A time-varying column vector $x(t)$ is defined as a column vector whose components are themselves functions of time, i.e., $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T$ (A.80), while a time-varying matrix $A(t)$ is defined as a matrix whose elements are time functions, i.e., $A(t) = [a_{ij}(t)]$, $i = 1, \ldots, n$, $j = 1, \ldots, m$ (A.81). The addition of time-varying vectors and matrices, their multiplication, and the scalar-product operations are defined as before.

Time Derivatives The time derivative of the vector $x(t)$ is denoted by $\frac{d}{dt}x(t)$ or $\dot x(t)$ and is defined componentwise by $\dot x(t) = [\dot x_1(t), \dot x_2(t), \ldots, \dot x_n(t)]^T$ (A.82); the time derivative of the matrix $A(t)$ is denoted by $\frac{d}{dt}A(t)$ or $\dot A(t)$ and is defined elementwise by $\dot A(t) = [\dot a_{ij}(t)]$ (A.83). Of course, in order for $\dot x(t)$ or $\dot A(t)$ to make sense, the derivatives $\dot x_i(t)$ and $\dot a_{ij}(t)$ must exist.

Integration We can define the integrals of vectors and matrices in a similar manner, componentwise and elementwise:
$$ \int_{t_0}^{t_f} x(t)\,dt = \Big[\int_{t_0}^{t_f} x_1(t)\,dt,\ \ldots,\ \int_{t_0}^{t_f} x_n(t)\,dt\Big]^T, \tag{A.84} $$
$$ \int_{t_0}^{t_f} A(t)\,dt = \Big[\int_{t_0}^{t_f} a_{ij}(t)\,dt\Big]. \tag{A.85} $$
A.2.7 Gradient Vectors and Jacobian Matrices

Let us suppose that $x_1, x_2, \ldots, x_n$ are real scalars which are the components of the column $n$-vector $x = [x_1, x_2, \ldots, x_n]^T$ (A.86). Now consider a scalar-valued function of the $x_i$, $f(x_1, x_2, \ldots, x_n) = f(x)$ (A.87). Clearly, $f$ is a function mapping $n$-dimensional vectors to scalars: $f: R^n \to R$ (A.88).

Definition A.2.3 The gradient of $f$ with respect to the column $n$-vector $x$ is denoted $\partial f(x)/\partial x$ and is defined by
$$ \frac{\partial f}{\partial x} = \frac{\partial}{\partial x}f(x) = \Big[\frac{\partial f}{\partial x_1}\ \ \frac{\partial f}{\partial x_2}\ \ \cdots\ \ \frac{\partial f}{\partial x_n}\Big], \tag{A.89} $$
so that the gradient is a row $n$-vector.

Note A.2.1 We will also use the notation $f_x = \partial f/\partial x$ to denote the partial derivative.

Example A.2.2 Suppose $f: R^3 \to R$ and is defined by
$$ f(x) = f(x_1, x_2, x_3) = x_1^2x_2e^{-x_3}; \tag{A.90} $$
then
$$ \frac{\partial f}{\partial x} = \big[\,2x_1x_2e^{-x_3}\ \ x_1^2e^{-x_3}\ \ {-x_1^2x_2e^{-x_3}}\,\big]. \tag{A.91} $$

Again let us suppose that $x \in R^n$, and let us consider a function $g: R^n \to R^m$ (A.92) such that $y = g(x)$, $x \in R^n$, $y \in R^m$ (A.93). By this we mean $y_i = g_i(x_1, x_2, \ldots, x_n) = g_i(x)$ for $i = 1, \ldots, m$ (A.94)–(A.97).

Definition A.2.4 The Jacobian matrix of $g$ with respect to $x$ is denoted by $\partial g(x)/\partial x$ and is defined as
$$ \frac{\partial g(x)}{\partial x} = \Big[\frac{\partial g_i}{\partial x_j}\Big], \quad i = 1, \ldots, m,\ j = 1, \ldots, n. \tag{A.98} $$
Thus, if $g: R^n \to R^m$, its Jacobian matrix is an $m \times n$ matrix.

As an immediate consequence of the definition of a gradient vector, we have
$$ \frac{\partial}{\partial x}\langle x, y\rangle = \frac{\partial (y^Tx)}{\partial x} = y^T, \tag{A.99} $$
$$ \frac{\partial}{\partial x}\langle x, Ay\rangle = (Ay)^T, \tag{A.100} $$
$$ \frac{\partial}{\partial x}\langle Ax, y\rangle = \frac{\partial}{\partial x}\langle x, A^Ty\rangle = y^TA. \tag{A.101} $$
The definition of a Jacobian matrix yields the relation
$$ \frac{\partial}{\partial x}(Ax) = A. \tag{A.102} $$
Now suppose that $x(t)$ is a time-varying vector and that $f(x)$ is a scalar-valued function of $x$. Then by the chain rule
$$ \frac{d}{dt}f(x) = \frac{\partial f}{\partial x_1}\dot x_1 + \frac{\partial f}{\partial x_2}\dot x_2 + \cdots + \frac{\partial f}{\partial x_n}\dot x_n = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\dot x_i, \tag{A.103} $$
which yields
$$ \frac{d}{dt}f(x) = \Big\langle \Big(\frac{\partial f}{\partial x}\Big)^T, \dot x(t) \Big\rangle. \tag{A.104} $$
Similarly, if $g: R^n \to R^m$ and $x(t)$ is a time-varying column vector, then applying (A.104) row by row gives
$$ \frac{d}{dt}g(x) = \frac{\partial g}{\partial x}\,\dot x(t). \tag{A.105} $$
It should be clear that gradient vectors and matrices can be used to compute mixed time and partial derivatives.

A.2.8 Second Partials and the Hessian

Consider once more a scalar function of a vector argument $f: R^n \to R$. The Hessian of $f$ is the matrix of second partial derivatives of $f$ with respect to the elements of $x$:
$$ f_{xx} = \Big[\frac{\partial^2 f}{\partial x_i\,\partial x_j}\Big], \quad i, j = 1, \ldots, n. \tag{A.106} $$
It is clear from the definition that the Hessian is symmetric.

Consider the function $f(x,u): R^{n+m} \to R$. As above, the partial of the function with respect to one of the vector arguments is a row vector, as in $f_x = \partial f/\partial x = [\partial f/\partial x_1\ \cdots\ \partial f/\partial x_n]$. The matrix of second partials of this with respect to the vector $u$ is the $n \times m$ matrix given by
$$ f_{xu} = \frac{\partial}{\partial u}\big(f_x^T\big) = \Big[\frac{\partial^2 f}{\partial x_i\,\partial u_j}\Big], \quad i = 1, \ldots, n,\ j = 1, \ldots, m. \tag{A.107} $$
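The gradient (A.91) and the corresponding Hessian (A.106) for the function of Example A.2.2 can be reproduced symbolically. A minimal sketch (sympy assumed):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * x2 * sp.exp(-x3)          # the function of Example A.2.2

grad = sp.Matrix([f]).jacobian([x1, x2, x3])   # row vector, Definition A.2.3
print(grad)   # [[2*x1*x2*exp(-x3), x1**2*exp(-x3), -x1**2*x2*exp(-x3)]]

hess = sp.hessian(f, (x1, x2, x3))             # symmetric 3x3 matrix (A.106)
print(hess[0, 1])                              # d^2f/dx1dx2 = 2*x1*exp(-x3)
```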
A.2.9 Vector and Matrix Norms

We conclude our brief introduction to column vectors, matrices, and their operations by discussing the concept of the norm of a column vector and the norm of a matrix. The norm is a generalization of the familiar magnitude, or Euclidean length, of a vector. Thus, the norm is used to decide how large a vector is and also how large a matrix is; in this manner it is used to attach a scalar magnitude to such multivariable quantities as vectors and matrices.

Norms for Column Vectors Let us consider a column $n$-vector $x = [x_1, x_2, \ldots, x_n]^T$ (A.108). The Euclidean norm of $x$, denoted by $\|x\|_2$, is simply defined by
$$ \|x\|_2 = \big(x_1^2 + x_2^2 + \cdots + x_n^2\big)^{1/2} = \sqrt{\langle x, x\rangle}. \tag{A.109} $$
It should be clear that the value of $\|x\|_2$ provides us with an idea of how big $x$ is. We recall that the Euclidean norm of a column $n$-vector satisfies the following conditions:
$$ \|x\|_2 \ge 0 \ \text{ and } \ \|x\|_2 = 0 \text{ if and only if } x = 0, \tag{A.110} $$
$$ \|\alpha x\|_2 = |\alpha|\,\|x\|_2 \ \text{ for all scalars } \alpha, \tag{A.111} $$
$$ \|x + y\|_2 \le \|x\|_2 + \|y\|_2, \ \text{ the triangle inequality.} \tag{A.112} $$
For many applications, the Euclidean norm is not the most convenient to use in algebraic manipulations, although it has the most natural geometric interpretation. For this reason, one can generalize the notion of a norm in the following way.

Definition A.2.5 Let $x$ and $y$ be column $n$-vectors. Then a scalar-valued function of $x$ qualifies as a norm $\|x\|$ of $x$ provided that the following three properties hold:
$$ \|x\| > 0 \quad \forall\ x \ne 0, \tag{A.113} $$
$$ \|\alpha x\| = |\alpha|\,\|x\| \quad \forall\ \alpha \in R, \tag{A.114} $$
$$ \|x + y\| \le \|x\| + \|y\| \quad \forall\ x, y. \tag{A.115} $$
The reader should note that Equations (A.113) to (A.115) represent a consistent generalization of the properties of the Euclidean norm given in Equations (A.110) to (A.112). In addition to the Euclidean norm, there are two other common norms:
$$ \|x\|_1 = \sum_{i=1}^n |x_i|, \tag{A.116} $$
$$ \|x\|_\infty = \max_i |x_i|. \tag{A.117} $$
We encourage the reader to verify that the norms defined by (A.116) and (A.117) indeed satisfy the properties given in Equations (A.110) to (A.112).

Example A.2.3 Suppose that $x$ is the column vector
$$ x = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}; \tag{A.118} $$
then $\|x\|_1 = |2| + |-1| + |3| = 6$, $\|x\|_2 = (4 + 1 + 9)^{1/2} = \sqrt{14}$, and $\|x\|_\infty = \max\{|2|, |-1|, |3|\} = 3$.

Matrix Norms Next we turn our attention to the concept of a norm of a matrix. To motivate the definition we simply note that a column $n$-vector can also be viewed as an $n \times 1$ matrix. Thus, if we are to extend the properties of vector norms to those of matrix norms, they should be consistent. For this reason, we have the following definition.

Definition A.2.6 Let $A$ and $B$ be real $n \times m$ matrices with elements $a_{ij}$ and $b_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, m$). Then the scalar-valued function $\|A\|$ of $A$ qualifies as the norm of $A$ if the following properties hold:
$$ \|A\| > 0 \ \text{ provided not all } a_{ij} = 0, \tag{A.119} $$
$$ \|\alpha A\| = |\alpha|\,\|A\| \quad \forall\ \alpha \in R, \tag{A.120} $$
$$ \|A + B\| \le \|A\| + \|B\|. \tag{A.121} $$
As with vector norms, there are many convenient matrix norms, e.g.,
$$ \|A\|_1 = \sum_{i=1}^n\sum_{j=1}^m |a_{ij}|, \tag{A.122} $$
$$ \|A\|_2 = \Big(\sum_{i=1}^n\sum_{j=1}^m a_{ij}^2\Big)^{1/2}, \tag{A.123} $$
$$ \|A\|_\infty = \max_i \sum_{j=1}^m |a_{ij}|. \tag{A.124} $$
Once more we encourage the reader to prove that these matrix norms do indeed satisfy the defining properties of Equations (A.119) to (A.121).
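The three vector norms of Example A.2.3 and the bound $\|Ax\| \le \|A\|\,\|x\|$ of Theorem A.2.12 (established next) are quickly exercised numerically. A sketch (numpy assumed; note that $\|A\|_1$ and $\|A\|_2$ here are the entrywise norms (A.122)–(A.123) defined above, not the induced norms that numerical libraries often return by default):

```python
import numpy as np

x = np.array([2.0, -1.0, 3.0])
print(np.abs(x).sum(), np.sqrt((x**2).sum()), np.abs(x).max())
# 6.0, 3.7416... (= sqrt(14)), 3.0 -- as in Example A.2.3

A = np.array([[1.0, -2.0, 0.0],
              [3.0,  1.0, 1.0]])
norm1   = np.abs(A).sum()              # (A.122): sum of |a_ij|
norm2   = np.sqrt((A**2).sum())        # (A.123): square root of sum of a_ij^2
norminf = np.abs(A).sum(axis=1).max()  # (A.124): maximum row sum
y = A @ x
print(np.abs(y).sum() <= norm1 * np.abs(x).sum())               # (A.126)
print(np.sqrt((y**2).sum()) <= norm2 * np.sqrt((x**2).sum()))   # (A.127)
print(np.abs(y).max() <= norminf * np.abs(x).max())             # (A.128)
```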
Properties

Two important properties that hold between norms, involving multiplication of a matrix with a vector and multiplication of two matrices, are summarized in the following two theorems.

Theorem A.2.12 Let $A$ be an $n \times m$ matrix with real elements $a_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, m$), and let $x$ be a column $m$-vector with elements $x_j$ ($j = 1, \ldots, m$). Then
$$ \|Ax\| \le \|A\|\,\|x\| \tag{A.125} $$
in the sense that
(a) $\|Ax\|_1 \le \|A\|_1\|x\|_1$, (A.126)
(b) $\|Ax\|_2 \le \|A\|_2\|x\|_2$, (A.127)
(c) $\|Ax\|_\infty \le \|A\|_\infty\|x\|_\infty$. (A.128)

Proof: Let $y = Ax$; then $y$ is a column vector with $n$ components $y_1, y_2, \ldots, y_n$.
(a)
$$ \|Ax\|_1 = \|y\|_1 = \sum_{i=1}^n |y_i| = \sum_{i=1}^n \Big|\sum_{j=1}^m a_{ij}x_j\Big| \le \sum_{i=1}^n\sum_{j=1}^m |a_{ij}|\,|x_j| \le \Big(\sum_{i=1}^n\sum_{j=1}^m |a_{ij}|\Big)\|x\|_1 = \|A\|_1\|x\|_1, $$
since $\|x\|_1 \ge |x_j|$.
(b)
$$ \|Ax\|_2 = \|y\|_2 = \Big(\sum_{i=1}^n y_i^2\Big)^{1/2} = \Big[\sum_{i=1}^n\Big(\sum_{j=1}^m a_{ij}x_j\Big)^2\Big]^{1/2} \le \Big[\sum_{i=1}^n\Big(\sum_{j=1}^m a_{ij}^2\Big)\Big(\sum_{j=1}^m x_j^2\Big)\Big]^{1/2} = \Big(\sum_{i=1}^n\sum_{j=1}^m a_{ij}^2\Big)^{1/2}\|x\|_2 = \|A\|_2\|x\|_2, $$
where the inequality is the Schwarz inequality.
(c)
$$ \|Ax\|_\infty = \|y\|_\infty = \max_i \Big|\sum_{j=1}^m a_{ij}x_j\Big| \le \max_i \sum_{j=1}^m |a_{ij}|\,|x_j| \le \Big(\max_i \sum_{j=1}^m |a_{ij}|\Big)\|x\|_\infty = \|A\|_\infty\|x\|_\infty, $$
because $\|x\|_\infty \ge |x_j|$.

We shall leave it to the reader to verify the following theorem by imitating the proofs of Theorem A.2.12.

Theorem A.2.13 Let $A$ be a real $n \times m$ matrix and let $B$ be a real $m \times q$ matrix; then
$$ \|AB\| \le \|A\|\,\|B\| \tag{A.129} $$
in the sense that
(a) $\|AB\|_1 \le \|A\|_1\|B\|_1$, (A.130)
(b) $\|AB\|_2 \le \|A\|_2\|B\|_2$, (A.131)
(c) $\|AB\|_\infty \le \|A\|_\infty\|B\|_\infty$. (A.132)

A multitude of additional results concerning the properties of norms are available.

Spectral Norm A very useful norm of a matrix, called the spectral norm, is denoted by $\|A\|_s$. Let $A$ be a real $n \times m$ matrix. Then $A^T$ is an $m \times n$ matrix, and the product matrix $A^TA$ is an $m \times m$ real matrix. Let us compute the eigenvalues of $A^TA$, denoted by $\lambda_i(A^TA)$, $i = 1, \ldots, m$. Since the matrix $A^TA$ is symmetric and positive semidefinite, it has real nonnegative eigenvalues, i.e.,
$$ \lambda_i(A^TA) \ge 0, \quad i = 1, 2, \ldots, m. \tag{A.133} $$
Then the spectral norm of $A$ is defined by
$$ \|A\|_s = \max_i \big[\lambda_i(A^TA)\big]^{1/2}, \tag{A.134} $$
i.e., it is the square root of the maximum eigenvalue of $A^TA$.

Remark A.2.1 The singular values of $A$ are given by $\lambda^{1/2}(A^TA)$.

A.2.10 Taylor's Theorem for Functions of Vector Arguments

We consider the Taylor expansion of a function of a vector argument. Using the definitions developed above, we have the following.

Theorem A.2.14 Let $f(x): R^n \to R$ be $N$ times continuously differentiable in all of its arguments at a point $x_0$. Consider $f(x_0 + \epsilon h)$, where $\|h\| = 1$. (It is not important which specific norm is used; for simplicity, we may assume it is the Euclidean.) Then
$$ f(x_0 + \epsilon h) = f(x_0) + \epsilon\,\frac{\partial f}{\partial x}\Big|_{x_0}h + \frac{\epsilon^2}{2!}\,h^T\frac{\partial^2 f}{\partial x^2}\Big|_{x_0}h + \cdots + R_N, \quad \text{where } \lim_{\epsilon\to 0}\frac{R_N}{\epsilon^N} \to 0. $$
The coefficients of the first two terms in the expansion are the gradient vector and the Hessian matrix.
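A numerical check of Theorem A.2.14 (sketch, numpy assumed; the test function is illustrative): truncating after the gradient and Hessian terms ($N = 2$), the remainder divided by $\epsilon^2$ tends to zero:

```python
import numpy as np

f    = lambda x: x[0]**2 * x[1] + x[1]**3
grad = lambda x: np.array([2*x[0]*x[1], x[0]**2 + 3*x[1]**2])   # row vector
hess = lambda x: np.array([[2*x[1], 2*x[0]],
                           [2*x[0], 6*x[1]]])                   # Hessian

x0 = np.array([1.0, 1.0])
h = np.array([3.0, 4.0]) / 5.0       # unit vector, ||h||_2 = 1

for eps in (1e-1, 1e-2, 1e-3):
    second_order = f(x0) + eps * grad(x0) @ h + 0.5 * eps**2 * h @ hess(x0) @ h
    R = f(x0 + eps * h) - second_order
    print(f"eps={eps:.0e}: R/eps^2 = {R / eps**2:.3e}")   # tends to 0 with eps
```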
A.3 Linear Dynamical Systems

In this section we review briefly some results in linear systems theory that will be used in the main text. For a more complete treatment, see [8] or [12]. Consider the continuous-time linear system
$$ \dot x(t) = A(t)x(t) + B(t)u(t), \quad x(t_0) = x_0 \ \text{given}, \tag{A.135} $$
where $x(\cdot) \in R^n$, $u(\cdot) \in R^m$, and $A$ and $B$ are appropriately dimensioned real matrices. The functions $a_{ij}(t)$, $i = 1, \ldots, n$, $j = 1, \ldots, n$, that make up $A(t)$ are continuous, as are the elements of $B(t)$. The control functions in $u(\cdot)$ will be restricted to being piecewise continuous and everywhere defined. In most cases, for both clarity and convenience, we will drop the explicit dependence of the variables on time; when it is important, it will be included.

Some Terminology. The vector $x$ will be termed the state vector or simply the state, and $u$ is the control vector. The matrix $A$ is usually known as the plant or the system matrix, and $B$ is the control coefficient matrix. We will assume that the system is always controllable. This means that given $x_0 = 0$ and some desired final state $x_1$ at some final time $t_1 > t_0$, there exists some control function $u(t)$ on the interval $[t_0, t_1]$ such that $x(t_1) = x_1$ using that control input. This is discussed in detail in [8].

Fact: Under the given assumptions, the solution $x(\cdot)$ associated with a particular $x_0$ and control input $u(\cdot)$ is unique. In particular, for $u(\cdot) \equiv 0$ and any specified $t_1$ and $x_1$, there is exactly one initial condition $x_0$ and one associated solution of (A.135) such that $x(t_1) = x_1$.

State Transition Matrix. Consider the system with the control input identically zero, so that $\dot x(t) = A(t)x(t)$. Under this condition, we can show that the solution $x(t)$ is given by the relation $x(t) = \Phi(t, t_0)x_0$, where $\Phi(\cdot,\cdot)$ is known as the state transition matrix, or simply the transition matrix, of the system. The state transition matrix obeys the differential equation
$$ \frac{d}{dt}\Phi(t, t_0) = A(t)\Phi(t, t_0), \quad \Phi(t_0, t_0) = I, \tag{A.136} $$
and has a couple of obvious properties:
1. $\Phi(t_2, t_0) = \Phi(t_2, t_1)\Phi(t_1, t_0)$,
2. $\Phi(t_2, t_1) = \Phi^{-1}(t_1, t_2)$.
It is important to note that the transition matrix is independent of the initial state of the system. In the special case of $A(t) = A$ constant, the state transition matrix is given by
$$ \Phi(t, t_0) = e^{A(t - t_0)}, \tag{A.137} $$
where the matrix exponential is defined by the series
$$ e^{A(t - t_0)} = I + A(t - t_0) + \frac{1}{2!}A^2(t - t_0)^2 + \cdots + \frac{1}{k!}A^k(t - t_0)^k + \cdots. $$
It is easy to see that the matrix exponential satisfies (A.136).

Fundamental Matrix. A fundamental matrix of the system (A.135) is any matrix $X(t)$ such that
$$ \frac{d}{dt}X(t) = A(t)X(t). \tag{A.138} $$
The state transition matrix is the fundamental matrix that satisfies the initial condition $X(t_0) = I$.

Fact: If there is any time $t_1$ such that a fundamental matrix is nonsingular, then it is nonsingular for all $t$. An obvious corollary is that the state transition matrix is nonsingular for all $t$ (since the initial condition $I$ is nonsingular).

Fact: If $X(t)$ is any fundamental matrix of $\dot x = Ax$, then for all $t, t_1 \in R$, $\Phi(t, t_1) = X(t)X^{-1}(t_1)$.

Solution to the Linear System. The solution to the linear system for some given control function $u(\cdot)$ is given by
$$ x(t) = \Phi(t, t_0)x_0 + \int_{t_0}^{t}\Phi(t, \tau)B(\tau)u(\tau)\,d\tau. \tag{A.139} $$
The result is proven by taking the derivative and showing that it agrees with (A.135).
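For constant $A$ the transition matrix (A.137) is a matrix exponential, and the solution formula (A.139) can be exercised directly. A minimal sketch (numpy and scipy assumed; the control is held constant so the result can be cross-checked against an exact augmented-matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])
u, t = 1.0, 1.5                      # constant control and final time

# Solution formula (A.139) with Phi from (A.137); the integral is
# approximated by a left Riemann sum.
Phi = expm(A * t)
taus, dt = np.linspace(0.0, t, 1501), t / 1500
integral = sum(expm(A * (t - tau)) @ B[:, 0] * u for tau in taus[:-1]) * dt
x_formula = Phi @ x0 + integral

# Cross-check: augmented-matrix trick, exact for constant u.
Aug = np.block([[A, B], [np.zeros((1, 3))]])
x_exact = (expm(Aug * t) @ np.append(x0, u))[:2]
print(x_formula, x_exact)            # agree to quadrature accuracy
```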
Bibliography

[1] Anderson, B. D. O., and Moore, J. B., Linear Optimal Control, Prentice-Hall, Englewood Cliffs, NJ, 1971.
[2] Athans, M., ed., Special Issue on the Linear-Quadratic-Gaussian Problem, IEEE Transactions on Automatic Control, Vol. AC-16, December 1971.
[3] Athans, M., Dertouzos, M. L., Spann, R. N., and Mason, S. J., Systems, Networks, and Computation, McGraw-Hill, New York, 1974.
[4] Bell, D. J., and Jacobson, D. H., Singular Optimal Control Problems, Academic Press, New York, 1975.
[5] Betts, J., Practical Methods for Optimal Control Using Nonlinear Programming, SIAM, Philadelphia, 2001.
[6] Bliss, G. A., Lectures on the Calculus of Variations, University of Chicago Press, Chicago, 1946.
[7] Breakwell, J. V., Speyer, J. L., and Bryson, A. E., Optimization and Control of Nonlinear Systems Using the Second Variation, SIAM Journal on Control, Series A, Vol. 1, No. 2, 1963, pp. 193–223.
[8] Brockett, R. W., Finite Dimensional Linear Systems, John Wiley, New York, 1970.
[9] Broyden, C. G., The Convergence of a Class of Double-Rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications, Vol. 6, 1970, pp. 76–90.
[10] Bryant, G. F., and Mayne, D. Q., The Maximum Principle, International Journal of Control, Vol. 20, No. 6, 1974, pp. 1021–1054.
[11] Bryson, A. E., and Ho, Y. C., Applied Optimal Control, Hemisphere Publishing, Washington, DC, 1975.
[12] Callier, F. M., and Desoer, C. A., Linear Systems Theory, Springer Texts in Electrical Engineering, Springer, New York, 1994.
[13] Cannon, M. D., Cullum, C. D., and Polak, E., Theory of Optimal Control and Mathematical Programming, McGraw-Hill, New York, 1970.
[14] Clements, D. J., and Anderson, B. D. O., Singular Optimal Control: The Linear Quadratic Problem, Springer, New York, 1978.
[15] Coddington, E., and Levinson, N., Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1958.
[16] Cullen, C. G., Matrices and Linear Transformations, 2nd ed., Dover, New York, 1990.
[17] Davidon, W. C., Variable Metric Method for Minimization, SIAM Journal on Optimization, Vol. 1, 1991, pp. 1–17.
[18] Doyle, J. C., Glover, K., Khargonekar, P. P., and Francis, B., State-Space Solutions to Standard H2 and H∞ Control Problems, IEEE Transactions on Automatic Control, Vol. 34, No. 8, August 1989, pp. 831–847.
[19] Dreyfus, S. E., Dynamic Programming and the Calculus of Variations, Academic Press, New York, 1965.
[20] Dyer, P., and McReynolds, S. R., The Computation and Theory of Optimal Control, Academic Press, New York, 1970.
[21] Friedman, A., Advanced Calculus, Holt, Rinehart and Winston, New York, 1971.
[22] Gelfand, I. M., and Fomin, S. V., Calculus of Variations, Prentice-Hall, Englewood Cliffs, NJ, 1963.
[23] Gill, P. E., Murray, W., and Wright, M. H., Practical Optimization, Academic Press, New York, 1981.
[24] Grötschel, M., Krumke, S. O., and Rambau, J., eds., Online Optimization of Large Scale Systems, Springer-Verlag, Berlin, 2001.
[25] Halkin, H., Mathematical Foundations of System Optimization, in Topics in Optimization, G. Leitmann, ed., Academic Press, New York, 1967.
[26] Hestenes, M. R., Calculus of Variations and Optimal Control Theory, John Wiley, New York, 1965.
[27] Hohn, F. E., Elementary Matrix Algebra, 3rd ed., Macmillan, New York, 1973.
[28] Jacobson, D. H., A Tutorial Introduction to Optimality Conditions in Nonlinear Programming, 4th National Conference of the Operations Research Society of South Africa, November 1972.
[29] Jacobson, D. H., Extensions of Linear-Quadratic Control, Optimization and Matrix Theory, Academic Press, New York, 1977.
[30] Jacobson, D. H., Lele, M. M., and Speyer, J. L., New Necessary Conditions of Optimality for Control Problems with State-Variable Inequality Constraints, Journal of Mathematical Analysis and Applications, Vol. 35, No. 2, 1971, pp. 255–284.
[31] Jacobson, D. H., Martin, D. H., Pachter, M., and Geveci, T., Extensions of Linear-Quadratic Control Theory, Lecture Notes in Control and Information Sciences, Vol. 27, Springer-Verlag, Berlin, 1980.
[32] Jacobson, D. H., and Mayne, D. Q., Differential Dynamic Programming, Elsevier, New York, 1970.
[33] Kuhn, H., and Tucker, A. W., Nonlinear Programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951.
[34] Kwakernaak, H., and Sivan, R., Linear Optimal Control Systems, Wiley-Interscience, New York, 1972.
[35] Mangasarian, O. L., Nonlinear Programming, McGraw-Hill, New York, 1969.
[36] Nocedal, J., and Wright, S. J., Numerical Optimization, Springer Series in Operations Research, P. Glynn and S. M. Robinson, eds., Springer-Verlag, New York, 2000.
[37] Pars, L., A Treatise on Analytical Dynamics, John Wiley, New York, 1965.
[38] Pontryagin, L. S., Boltyanskii, V. G., Gamkrelidze, R. V., and Mishchenko, E. F., The Mathematical Theory of Optimal Processes, L. W. Neustadt, ed., Wiley-Interscience, New York, 1962.
[39] Rall, L. B., Computational Solution of Nonlinear Operator Equations, Wiley, New York, 1969.
[40] Rhee, I., and Speyer, J. L., A Game-Theoretic Approach to a Finite-Time Disturbance Attenuation Problem, IEEE Transactions on Automatic Control, Vol. AC-36, No. 9, September 1991, pp. 1021–1032.
[41] Rodriguez-Canabal, J., The Geometry of the Riccati Equation, Stochastics, Vol. 1, 1975, pp. 347–351.
[42] Sain, M., ed., Special Issue on Multivariable Control, IEEE Transactions on Automatic Control, Vol. AC-26, No. 1, February 1981.
[43] Stoer, J., and Bulirsch, R., Introduction to Numerical Analysis, 3rd ed., Texts in Applied Mathematics 12, Springer, New York, 2002.
[44] Varaiya, P. P., Notes on Optimization, Van Nostrand, New York, 1972.
[45] Willems, J. C., Least Squares Stationary Optimal Control and the Algebraic Riccati Equation, IEEE Transactions on Automatic Control, Vol. AC-16, December 1971, pp. 621–634.
[46] Wintner, A., The Analytical Foundations of Celestial Mechanics, Princeton University Press, Princeton, NJ, 1947.
[47] Zangwill, W., Nonlinear Programming: A Unified Approach, Prentice-Hall, Englewood Cliffs, NJ, 1969.
Index

accelerated gradient methods, 27; accessory minimum problem, 155, 161, 162; asymptotic, 208; asymptotically stable, 211; augmented performance index, 156; autonomous (time-invariant) Riccati equation, 214; bang-bang, 145; basis, 273; Bliss's Theorem, 53, 68; bounded control functions, 79; brachistochrone problem, 13; calculus of variations, 161; canonical similarity transformation, 175; canonical transformation, 215; chain rule, 286; characteristic equation, 172, 276, 279; characteristic value, see eigenvalue; cofactor, 276; complete controllability, 177; completed the square, 235; completely observable, 210; conjugate gradient, 28; constrained Riccati matrix, 207; continuous differentiability, 136; control weighting, 226; controllability, 180; controllability Grammian, 177, 199; determinant, 275; differentiable function, 15; differential game problem, 238; dimension, 274; disturbance attenuation function, 232, 236; disturbances, 235; eigenvalue, 215, 276, 279, 281; eigenvector, 276, 279, 281; elementary differential equation theory, 53; escape time, 179; Euclidean norm, 288, 289; extremal points, 14; first-order necessary condition, 32; first-order optimality, 111, 122; focal point condition, 175; free end-point control problem, 125; functional optimization problem, 11; fundamental matrix, 295; game theoretic results, 235; general second-order necessary condition, 44; general terminal constraint, 120; global optimality, 180; gradient, 285; Hamilton–Jacobi–Bellman (H-J-B) equation, 83, 86, 93, 96, 99, 148; Hamiltonian, 61, 119, 124; Hamiltonian matrix, 169; Hessian, 287; Hilbert's integral, 87; homogeneous ordinary differential equation, 57; homogeneous Riccati equation, 212; identity matrix, 175; Implicit Function Theorem, 37, 45, 121, 262; inequality constraint, 45; influence function, 130; initial value problem, 72; integrals of vectors and matrices, 284; integrate by parts, 59; Jacobi condition, 175; Jacobian matrix, 286; Lagrange multiplier, 35, 39, 43, 44, 47, 58; Laplace expansion, 275; Legendre–Clebsch, 78; linear algebra, 53; linear control rule, 170; linear dynamic constraint, 161; linear dynamic system, 55; linear independence, 273; linear minimum time problem, 145; linear ordinary differential equation, 117; linear-quadratic optimal control, 72; linear-quadratic regulator problem, 213; Lyapunov function, 211.

matrix inverse, 275; matrix norms, 290; matrix Riccati differential equation, 186; matrix Riccati equation, 173, 186; maximize, 231; minimax, 238; minimize, 231; minor, 276; modern control synthesis, 155; monotonic, 208; monotonically increasing function, 208; multi-input/multi-output systems, 155; necessary and sufficient condition, 180; necessary and sufficient condition for the quadratic cost criterion, 201; necessary condition, 119; necessary condition for optimality, 111; negative definite matrix, 278; Newton–Raphson method, 27; nonlinear control problems, 91; nonlinear terminal equality constraints, 111; nonnegative definite matrix, 278; nonpositive definite matrix, 278; norm, 289, 290; norm of a column vector, 288; norm of a matrix, 288, 290; normality, 124, 125; normality condition, 195; null space, 274; observability Grammian matrix, 210; optimal control rule, 196; optimal value function, 84; order, 271; parameter optimization, 12; Parseval's Theorem, 254; penalty function, 112, 129; perfect square, 178; piecewise continuous, 162; piecewise continuous control, 56; piecewise continuous perturbation, 117; piecewise differentiability, 75; Pontryagin's Principle, 61, 83, 111; positive definite, 177; positive definite matrix, 278; propagation equation, 176, 194; quadratic form, 277; quadratic matrix differential equation, 172; quadratic performance criterion, 161; quasi-Newton methods, 27; range space, 274; rank, 274; Riccati equation, 174, 178, 235; saddle point inequality, 233; saddle point optimality, 235; sampled data controller, 170; saturation, 236; second-order necessary condition, 34; singular control problem, 185; singular matrix, 275, 276; singular values, 255; skew-symmetric matrix, 282; slack variables, 46; spectral norm, 292; spectral radius condition, 248; stabilizability, 212; state weighting, 226; steepest descent, 74; steepest descent algorithm, 129; steepest descent method, 26; steepest descent optimization with constraints, 43; strong form of Pontryagin's Principle, 75; strong perturbations, 75, 133; strong positivity, 186; strong variations, 167; strongly first-order optimal, 134, 137; strongly locally optimal, 134; strongly positive, 17, 181, 185, 203, 221, 226; subspace, 273; sufficiency conditions, 186; sufficient condition for optimality, 83; switch time, 147; symplectic property, 170, 193; syntheses: H2, 232, 244; syntheses: H∞, 231, 253; Taylor expansion, 271; terminal constraint, 119, 150; terminal equality constraints, 111; terminal manifold, 199; time derivative of the matrix, 284; totally singular, 186; tracking error, 236; trajectory-control pair, 64; transition matrix, 57, 169, 170, 294; two-point boundary-value problem, 53, 72, 112, 128; unconstrained Riccati matrix, 207; variational Hamiltonian, 163; weak first-order optimality, 122; weak perturbation, 111; weak perturbations in the control, 122; weak Pontryagin's Principle, 74, 119; zero sum game, 233.
Jason L. Speyer (University of California, Los Angeles) and David H. Jacobson (PricewaterhouseCoopers LLP, Toronto, Ontario, Canada). Copyright © 2010 by the Society for Industrial and Applied Mathematics, Philadelphia. ISBN 978-0-898716-94-8.

To Barbara, a constant source of love and inspiration.
To my children, Gil, Gavriel, Rakhel, and Joseph, for giving me so much joy and love.
For Celia, Greta, Jonah, Levi, Miles, Thea, with love from Oupa!

Contents
Preface
1. Introduction
2. Finite-Dimensional Optimization
3. Optimization of Dynamic Systems with General Performance Criteria
4. Terminal Equality Constraints
5. Linear Quadratic Control Problem
6. LQ Differential Games
A. Background
Bibliography
Index

Preface

This book began when David Jacobson wrote the first draft of Chapters 1, 3, and 4 and Jason Speyer wrote Chapters 2, 5, and 6. Since then the book has constantly evolved by modification of those chapters as we interacted with colleagues and students.

The objective of the book is to make optimal control theory accessible to a large class of engineers and scientists who are not mathematicians, although they have a basic mathematical background, but who need to understand and want to appreciate the sophisticated material associated with optimal control theory. Therefore, the material is presented using elementary mathematics, which is sufficient to treat and understand in a rigorous way the issues underlying the limited class of control problems in this text. Although many topics that build on this foundation are covered briefly, such as inequality constraints, the singular control problem, and advanced numerical methods, the foundation laid here should be adequate for reading the rich literature on these subjects.

We would like to thank our many students whose input over the years has been incorporated into this final draft. Our colleagues also have been very influential in the approach we have taken. In particular, we have spent many hours discussing the concepts of optimal control theory with Professor David Hull, who contributed some interesting examples and numerical methods. Special thanks are extended to Professor David Chichka and Professor Moshe Idan, whose careful and critical reading of the manuscript has led to a much-improved final draft. We owe much to them for this polished version. Finally, the first author must express his gratitude to Professor Bryson, a pioneer in the development and application of optimal control theory as well as a teacher, mentor, and dear friend.
Chapter 1
Introduction

The operation of many physical processes can be enhanced if more efficient operation can be determined. Such systems as aircraft, chemical processes, and economies have at the disposal of an operator certain controls which can be modulated to enhance some desired property of the system. For example, in commercial aviation, the best fuel usage at cruise is an important consideration in an airline's profitability. Full employment and growth of the gross domestic product are measures of economic system performance; these may be enhanced by proper modulation of such controls as the change in discount rate determined by the Federal Reserve Board or changes in the tax codes devised by Congress.

The essential features of such systems as addressed here are dynamic systems, available controls, measures of system performance, and constraints under which a system must operate. Models of the dynamic system are described by a set of first-order coupled nonlinear differential equations representing the propagation of the state variables as a function of the independent variable, say, time. The state vector may be composed of position, velocity, and acceleration. This motion is influenced by the inclusion of a control vector. For example, the throttle setting and the aerodynamic surfaces influence the motion of the aircraft.

The performance criterion which establishes the effectiveness of the control process on the dynamical system can take many forms. For an aircraft, desired performance might be efficient fuel cruise (fuel per range), endurance (fuel per time), or time to a given altitude. The performance criterion is to be optimized subject to the constraints imposed by the system dynamics and other constraints. For example, path constraints that are functions of the controls or the states, or are functions of both the state and control vectors, may be imposed. Force constraints or maximum-altitude constraints may be imposed for practical implementation. An important class of constraints are those imposed at the termination of the path. For example, the path of an aircraft may terminate in minimum time at a given altitude and velocity.

In this chapter, a simple dynamic example is given to illustrate some of the concepts that are described in later chapters. These concepts, as well as the optimization concepts for the following chapters, are described using elementary mathematical ideas. The objective is to develop a mathematical structure which can be justified rigorously using elementary concepts. Therefore, the treatment here is not the most general but does cover a large class of optimization problems of practical concern. If more complex or sophisticated ideas are required, the reader will be directed to appropriate references.

1.1 Control Example

A control example establishes the notion of control and how it can be manipulated to satisfy given goals. Consider the forced harmonic oscillator described as

    ẍ + x = u,    x(0), ẋ(0) given,    (1.1)

where x is the position, ẋ is dx/dt, and the overdot denotes time differentiation. This second-order linear differential equation can be rewritten as two first-order differential equations
by identifying x1 = x and x2 = ẋ. Then

    ẋ1 = x2,    x1(0) given,    (1.2)
    ẋ2 = −x1 + u,    x2(0) given,    (1.3)

or

    [ẋ1; ẋ2] = [0 1; −1 0][x1; x2] + [0; 1]u.    (1.4)

Suppose it is desirable to find a control which drives x1 and x2 to the origin from arbitrary initial conditions. Some common objectives are to minimize the time needed to reach the desired state or to minimize the effort it takes. This in turn requires some definition for "best." There is a large number of possible criteria. Since system (1.4) is controllable (general comments on this issue can be found in [8]), there are many ways that this system can be driven to the origin. For example, suppose the control is proportional to the velocity, such as u = −Kx2, where K > 0 is a constant. Then, asymptotically, the position and velocity converge to zero as t → ∞. Note that the system converges for any positive value of K. It might logically be asked if there is a best value of K.

A criterion that allows the engineer to balance the amount of error against the effort expended is often useful. One particular formulation of this trade-off is the quadratic performance index, specialized here to

    J1 = lim_{tf→∞} ∫₀^{tf} (a1 x1² + a2 x2² + u²) dt,    (1.5)

where a1 > 0 and a2 > 0. The constant parameter K is to be determined such that the cost criterion is minimized when u = −Kx2 is substituted into the performance criterion, subject to the functional form of Equation (1.4).

We will not solve this problem here. In Chapter 2, the parameter minimization problem is introduced to develop some of the basic concepts that are used in the solution. However, a point to note is that the control u does not have to be chosen a priori. The functional form was chosen here for illustration, but the best functional form will be produced by the optimization process. That is, the process will (usually) produce a control that is expressed as a function of the state of the system rather than an explicit function of time. This is especially true for the quadratic performance index subject to a linear dynamical system (see Chapters 5 and 6).
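The question of a best value of K is easy to explore numerically. The sketch below (an illustration added here, not part of the original text) integrates the closed-loop system (1.2)–(1.3) with u = −Kx2 and evaluates a finite-horizon approximation of the cost (1.5); the weights a1 = a2 = 1, the horizon tf = 50, the step size, and the initial condition are all assumed values chosen only for the demonstration.

```python
import numpy as np

def cost_of_gain(K, x0=(1.0, 0.0), a1=1.0, a2=1.0, tf=50.0, dt=1e-3):
    """Integrate x1' = x2, x2' = -x1 + u with u = -K*x2 and accumulate (1.5)."""
    x1, x2 = x0
    J = 0.0
    for _ in range(int(tf / dt)):
        u = -K * x2
        J += (a1 * x1**2 + a2 * x2**2 + u**2) * dt   # rectangle-rule quadrature
        x1, x2 = x1 + dt * x2, x2 + dt * (-x1 + u)   # forward-Euler step
    return J

for K in (0.2, 0.5, 1.0, 2.0, 4.0):
    print(f"K = {K:3.1f}   J1 ~ {cost_of_gain(K):7.3f}")
```

The sweep shows a cost that first falls and then rises with K: small gains leave the oscillation lightly damped, while very large gains create a slow overdamped mode, so an interior best gain exists even though every positive K is stabilizing.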
Other performance measures are of interest. For example, minimum time has been mentioned for the case where the desired final state was the origin. For this problem to make sense, the control must be limited in some way; otherwise, infinite effort would be expended and the origin reached in zero time. In the quadratic performance index in (1.5), the limitation came from penalizing the use of control (the term u² inside the integral). Another possibility is to explicitly bound the control. This could represent some physical limit, such as a maximum throttle setting or limits to steering. Here, the control variable is bounded as

    |u| ≤ 1.    (1.6)

In later chapters it is shown that the best solution often lies on its bounds.

To produce some notion of the motion of the state variables (x1, x2) over time, note that Equations (1.2) and (1.3) can be combined by eliminating time as

    (dx1/dt)/(dx2/dt) = x2/(−x1 + u)  ⇒  (−x1 + u) dx1 = x2 dx2.

Assuming u is a constant, both sides can be integrated to get

    (x1 − u)² + x2² = R²,    (1.7)

which translates to a series of concentric circles for any specific value of the control. For u = 1 and u = −1, the series of concentric circles are as shown in Figure 1.1.

Figure 1.1: Control-constrained optimization example.

There are many possible paths that drive the initial states (x1(0), x2(0)) to the origin. Starting with u = 1 at some arbitrary (x1(0), x2(0)), the path proceeds to point A or B. From A or B the control changes to u = −1 until point C or D is intercepted. From these points, using u = +1, the origin is obtained. Neither of these paths starting from the initial conditions is a minimum time path, although starting from point B the resulting paths are minimum time. The methodology for determining the optimal time paths is given in Chapter 4.
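As a quick numerical check of (1.7), the following sketch (an illustration added here, with arbitrary initial data and step size) integrates the state equations with the control held constant and verifies that the quantity (x1 − u)² + x2² is preserved, so each constant-control segment of the paths in Figure 1.1 is indeed a circular arc about (u, 0).

```python
import numpy as np

def step(x1, x2, u, dt):
    """One midpoint (RK2) step of x1' = x2, x2' = -x1 + u."""
    m1, m2 = x1 + 0.5 * dt * x2, x2 + 0.5 * dt * (-x1 + u)
    return x1 + dt * m2, x2 + dt * (-m1 + u)

x1, x2, u, dt = 2.0, 0.5, 1.0, 1e-4
radius2 = lambda x1, x2: (x1 - u)**2 + x2**2
r0 = radius2(x1, x2)
for _ in range(20000):            # two time units of motion
    x1, x2 = step(x1, x2, u, dt)
print(r0, radius2(x1, x2))        # the squared radii agree to integration accuracy
```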
1.2 General Optimal Control Problem

The general form of the optimal control problems we consider begins with a first-order, likely nonlinear, dynamical system of equations as

    ẋ = f(x, u, t),    x(t0) = x0,    (1.9)

where x ∈ Rⁿ, u ∈ Rᵐ, and f : Rⁿ × Rᵐ × R¹ → Rⁿ. Recall that the overdot denotes d( )/dt. Denote x(t) as x, u(t) as u, and the functions of x and u as x(·) and u(·). In the formulation of the problem, we limit the class of control functions U to the class of bounded piecewise continuous functions.

The performance of the dynamical system is to be modulated to minimize some performance index, which we assume to be of the form

    J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x, u, t) dt,    (1.10)

where φ : Rⁿ × R¹ → R¹ and L : Rⁿ × Rᵐ × R¹ → R¹. The terms in the performance index are often driven by considerations of energy use and time constraints. For example, the performance index might be as simple as minimizing the final time (set φ(tf) = tf and L(·, ·, ·) ≡ 0 in (1.10)). It may also attempt to minimize the amount of energy expended in achieving the desired goal or to limit the control effort expended, or any combination of these and many other considerations. The solution is to be such that the functional J takes on its minimum for some u(·) ∈ U subject to the differential equations (1.9).

There may also be several other constraints. One very common form of constraint, which we treat at length, is on the terminal state of the system:

    ψ(x(tf), tf) = 0,    (1.11)

where ψ : Rⁿ × R¹ → Rᵖ. This reflects a common requirement in engineering problems, that of achieving some specified final condition exactly. In the example of Section 1.1, the system is given by Equation (1.4).

The motion of the system and the amount of control available may also be subject to hard limits. These bounds may be written as

    S(x(t), t) ≤ 0,    (1.12)

where S : Rⁿ × R¹ → R¹ for a bound on the state only, or, more generally, for a mixed state and control space bound,

    g(x(t), u(t), t) ≤ 0,    (1.13)

where g : Rⁿ × Rᵐ × R¹ → R¹. These bounds represent physical or other limitations on the system. For an aircraft, for instance, the altitude must always be greater than that of the landscape, and the control available is limited by the physical capabilities of the engines and control surfaces.

1.3 Purpose and General Outline

This book aims to provide a treatment of control theory using mathematics, mathematical style, and notation at the level of the practicing engineer and scientist. The general problem cannot be treated in complete detail using essentially elementary mathematics. For example, the state variable inequality constraint given in (1.12) is beyond the scope of this book, and many important classes of problems have been left out of our presentation. However, important special cases of the general problems can be treated in complete detail using elementary mathematics. These special cases are sufficiently broad to solve many interesting and important problems. Furthermore, these special cases suggest solutions to the more general problem. To introduce important concepts, complete solutions to the general problem are stated and used, and additional references are given for completeness. The theoretical gap between the solution to the special cases and the solution to the general problem is discussed.

In Chapter 2 the parameter minimization problem is formulated and conditions for local optimality are determined. By local optimality we mean that optimality can be verified about a small neighborhood of the optimal point. First, the notions of first- and second-order local necessary conditions for unconstrained parameter minimization problems are derived. Next, the notion of first- and second-order local necessary conditions for parameter minimization problems is extended to include algebraic constraints.

The first-order necessary conditions are generalized in Chapter 3 to the minimization of a general performance criterion with nonlinear dynamic system constraints. In Chapters 3 and 4, local and global conditions for optimality are given for what are called weak and strong control variations. "Weak control variation" means that at any point, the variation away from the optimal control is very small; however, this small variation may be everywhere along the path. This gives rise to the classical local necessary conditions of Euler and Lagrange. "Strong control variation" means that the variation is zero over most of the path, but along a very short section it may be arbitrarily large. This leads to the classical Weierstrass local conditions and its more modern generalization called the Pontryagin Maximum Principle. The local optimality conditions are useful in constructing numerical algorithms for determining the optimal path. Less useful numerically, but sometimes very helpful theoretically, are the global sufficiency conditions. These conditions require the solution to a partial differential equation known as the Hamilton–Jacobi–Bellman equation.

The first-order necessary conditions are generalized in Chapter 4 to the minimization of a general performance criterion with nonlinear dynamic system constraints and terminal equality constraints. Second-order local necessary conditions, for both unconstrained and terminal equality constrained problems, are given in Chapter 5. In Chapter 5 the second variation for weak control variations produces local necessary and sufficient conditions for optimality. These conditions are determined by solving what is called the accessory problem in the calculus of variations, i.e., the linear quadratic problem, which is essentially minimizing a quadratic cost criterion subject to linear differential equations. The linear quadratic problem also arises directly and naturally in many applications and is the basis of much control synthesis work. In Chapter 6 the linear quadratic problem of Chapter 5 is generalized to a two-sided optimization problem producing a zero-sum differential game. The solutions to both the linear quadratic problem and the zero-sum differential game problem produce linear feedback control laws, known in the robust control literature as the H2 and H∞ controllers.

Background material is included in the appendix.
This includes the first-order and second-order necessary and sufficient conditions for optimality for both unconstrained and constrained minimization problems. The reader is assumed to be familiar with differential equations and standard vector-matrix algebra.

Chapter 2
Finite-Dimensional Optimization

A popular approach to the numerical solution of functional minimization problems, where a piecewise continuous control function is sought, is to convert them to an approximate parameter minimization problem. However, many of the ideas developed to characterize the parameter optimal solution extend to the functional optimization problem, and they can be treated from a more transparent viewpoint in this setting. This motivation for the study of parameter minimization is shown more fully in Section 2.1.

2.1 Motivation for Considering Parameter Minimization for Functional Optimization

Following the motivation given in Chapter 1, we consider the functional optimization problem of minimizing with respect to u(·) ∈ U¹ the cost criterion

    J(u, x0) = φ(x(tf), tf) + ∫_{t0}^{tf} L(x(t), u(t), t) dt    (2.1)

¹U represents the class of bounded piecewise continuous functions.
subject to

    ẋ(t) = f(x(t), u(t), t),    x0 given.    (2.2)

This functional optimization problem can be converted to a parameter optimization or function optimization problem by assuming that the control is piecewise linear as

    u(t) = û(up, t) = ui(ti) + ((t − ti)/(ti+1 − ti))(ui+1 − ui),    ti ≤ t ≤ ti+1,    (2.3)

where i = 0, ..., N − 1, tf = tN, and we define the parameter vector as

    up = {ui, i = 0, ..., N − 1}.    (2.4)

The optimization problem is then as follows. Find the control û(·) ∈ U that minimizes

    J(û, x0) = φ(x(tf), tf) + ∫_{t0}^{tf} L(x, û(up, t), t) dt    (2.5)

subject to

    ẋ = f(x(t), û(up, t), t),    x(0) = x0 given.    (2.6)

Since the solution to (2.6) is the state as a function of up, x(t) = x̂(up, t), the cost criterion is

    J(x̂(up), x0) = φ(x̂(up, tf)) + ∫_{t0}^{tf} L(x̂(up, t), û(up, t), t) dt.    (2.7)

The parameter minimization problem is to minimize J(x̂(up), x0) with respect to up. Because we have made assumptions about the form of the control function, this will produce a result that is suboptimal. However, when care is taken, the result will be close to optimal. Thus, the functional minimization problem is transformed into a parameter minimization problem to be solved over the time interval [t0, tf].

Example 2.1 As a simple example, consider a variant of the brachistochrone problem, first proposed by John Bernoulli in 1696. As shown in Figure 2.1, a bead is sliding on a wire from an initial point O to some point on the wall at a known r = rf. The wire is frictionless, and the initial point O is taken to be the origin. The problem is to find the shape of the wire such that the bead arrives at the wall in minimum time. The system equations are

    ż = v sin θ,    z(0) = 0,    z(tf) free,
    ṙ = v cos θ,    r(0) = 0,    r(tf) = rf,
    v̇ = g sin θ,    v(0) = 0,    v(tf) free,

where g is the constant acceleration due to gravity.

Figure 2.1: A brachistochrone problem.

In this problem, the control function is θ(t). Since the final time is not known, the control can be parameterized more easily as a function of r than as a function of time. The performance index to be minimized is simply J(θ̂, O) = tf. To make the example more concrete, let rf = 1 and assume a simple approximation by dividing the interval into halves, with the parameters being the slopes at the beginning, midpoint, and end, up = {u0, u1, u2} = {θ(0), θ(0.5), θ(1)}, so that

    θ̂(r) = u0 + (r/0.5)(u1 − u0),    0 ≤ r ≤ 0.5,
    θ̂(r) = u1 + ((r − 0.5)/0.5)(u2 − u1),    0.5 < r ≤ 1.

The problem is now converted to minimization of the final time over these three independent variables.
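Example 2.1 can be transcribed directly into a small program, assuming SciPy is available. The dynamics are integrated in time with θ given by the piecewise-linear θ̂(r) above, and the arrival time at the wall is minimized over the three slopes with a derivative-free search. The step size, the penalty for paths that never reach r = rf, and the initial guess for up are choices made for this sketch rather than values from the text.

```python
import numpy as np
from scipy.optimize import minimize

g_acc, rf = 9.81, 1.0

def theta_hat(r, up):
    """Piecewise-linear control with knots at r = 0, 0.5, 1 as in Example 2.1."""
    u0, u1, u2 = up
    if r <= 0.5:
        return u0 + (r / 0.5) * (u1 - u0)
    return u1 + ((r - 0.5) / 0.5) * (u2 - u1)

def arrival_time(up, dt=1e-3, t_max=5.0):
    """March r' = v cos(theta), v' = g sin(theta) forward until r = rf.
    z' = v sin(theta) is omitted: z(tf) is free and does not affect the cost."""
    r = v = t = 0.0
    while t < t_max:
        th = theta_hat(min(r, rf), up)
        r += dt * v * np.cos(th)
        v += dt * g_acc * np.sin(th)
        t += dt
        if r >= rf:
            return t                      # bead has reached the wall
    return t_max + 10.0 * (rf - r)        # penalty if the wall is never reached

res = minimize(arrival_time, x0=np.radians([80.0, 45.0, 20.0]),
               method="Nelder-Mead")
print("slopes (deg):", np.degrees(res.x), "  tf ~", res.fun)
```

As the text cautions, the answer is suboptimal (only three parameters describe the wire), but refining the grid in (2.3)–(2.4) brings it arbitrarily close to the true brachistochrone time.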
In the next sections, we develop the theory of parameter optimization.

2.2 Unconstrained Minimization

Consider that the cost criterion is a scalar function φ(·) of a single variable x for x ∈ [xa, xb]. The extremal points of such a function are shown in Figure 2.2. The interior extremal points are 2, a relative minimum; 3, a relative maximum; and 4, an inflection (saddle) point. The boundary point extrema are 1, a relative minimum (absolute for x ∈ [xa, xb]), and 5, a relative maximum (absolute for x ∈ [xa, xb]).

Figure 2.2: Definition of extremal points.

Assumption 2.2.1 Assume that φ(x) is continuously differentiable everywhere in [xa, xb]. This assumption avoids functions as shown in Figure 2.3.

Figure 2.3: Function with a discontinuous derivative.

Remark 2.2.1 We consider first only interior extremal points.

2.2.1 Scalar Case

To focus on the essential notions of determining both local first- and second-order necessary and sufficiency conditions, a scalar problem is used. These ideas are applied throughout the book.

First-Order Necessary Conditions

The following theorem and its proof set the style and notation for the analysis that is used in more complex problems.

Theorem 2.2.1 Let Ω = (xa, xb), and let the cost criterion φ : R → R be a differentiable function. Let x° be an optimal solution of the optimization problem

    min_x φ(x) subject to x ∈ Ω.    (2.8)

Then it is necessary that

    (∂φ/∂x)(x°) = 0.    (2.9)

Remark 2.2.2 This is a first-order necessary condition for stationarity (a local (relative) minimum, a maximum, or a saddle (inflection) point; see Figure 2.2).

Proof: Since x° ∈ Ω and Ω is an open interval, there exists an ε > 0 such that x ∈ Ω whenever |x − x°| < ε. This implies that x° + γα ∈ Ω for any α ∈ R and 0 ≤ γ ≤ η, where η is determined by α. Since x° is optimal, we have

    φ(x°) ≤ φ(x° + γα).    (2.10)

Since φ is differentiable, by Taylor's Theorem (see Appendix A.2)

    φ(x° + γα) = φ(x°) + γ (∂φ/∂x)(x°) α + O(γ),    (2.11)

where O(γ) denotes terms of order greater than γ such that

    O(γ)/γ → 0 as γ → 0.    (2.12)

Substitution of (2.11) into (2.10) yields

    0 ≤ γ (∂φ/∂x)(x°) α + O(γ).    (2.13)

Dividing this by γ > 0 gives

    0 ≤ (∂φ/∂x)(x°) α + O(γ)/γ.    (2.14)

Let γ → 0 to yield

    0 ≤ (∂φ/∂x)(x°) α.    (2.15)

Since the inequality must hold for all α, in particular for both positive and negative values of α, this implies

    (∂φ/∂x)(x°) = 0.    (2.16)

Second Variation

Suppose φ is twice differentiable, and let x° ∈ Ω be an optimal or even a locally optimal solution. Then φx(x°) = 0, and by Taylor's Theorem

    φ(x° + γα) = φ(x°) + (1/2) γ² φxx(x°) α² + O(γ²),    (2.17)

where

    O(γ²)/γ² → 0 as γ → 0.    (2.18)

For γ sufficiently small,

    φ(x°) ≤ φ(x° + γα) = φ(x°) + (1/2) γ² φxx(x°) α² + O(γ²)    (2.19)
    ⇒ 0 ≤ (1/2) φxx(x°) α² + O(γ²)/γ²    (2.20)

after dividing by γ² > 0. For γ² → 0, this yields

    (1/2) φxx(x°) α² ≥ 0    (2.21)

for all α. This means that φxx(x°) is nonnegative (see Appendix A.2.5 for a discussion on quadratic forms and definite matrices) and is another necessary condition. Equation (2.21) is known as a second-order necessary condition or a convexity condition.

Remark 2.2.3 If the second variation dominates all other terms in the Taylor series (2.19), then it is called strongly positive.

Sufficient Condition for a Local Minimum

Suppose that x° ∈ Ω, φx(x°) = 0, and φxx(x°) > 0 (strictly positive). Then by Taylor's Theorem

    φ(x° + γα) = φ(x°) + (1/2) γ² φxx(x°) α² + O(γ²).    (2.22)

Since the second variation dominates the remaining terms for γ sufficiently small, φ(x°) < φ(x° + γα) for all x° + γα ∈ Ω, and we can conclude that x° is a local minimum (see Figure 2.2). Hence φx(x°) = 0 and φxx(x°) > 0 are necessary and sufficient conditions for a local minimum.

Note that the conditions for a maximum can be obtained from those for a minimum by replacement of φ by −φ. Therefore, φx(x°) = 0 and φxx(x°) ≤ 0 are necessary conditions for a local maximum, and φx(x°) = 0 and φxx(x°) < 0 are sufficient conditions for a local maximum.

Higher-Order Variations

For α ∈ R and 0 ≤ γ ≤ η, denote the change in φ as

    Δφ = φ(x° + γα) − φ(x°).    (2.23)

Expanding this into a Taylor series gives

    Δφ = δφ + (1/2) δ²φ + (1/3!) δ³φ + (1/4!) δ⁴φ + · · ·,    (2.24)

where

    δφ = φx(x°) γα,    δ²φ = φxx(x°)(γα)²,    etc.    (2.25)

Suppose that φx(x°) = 0 and also φxx(x°) = 0. If φxxx(x°) ≠ 0, the extremal is a saddle. If φxxx(x°) = 0 and φxxxx(x°) > 0, the extremal is a local minimum. These conditions can be seen, respectively, in the examples φ(x) = x³ and φ(x) = x⁴.

2.2.2 Numerical Approaches to One-Dimensional Minimization

In this section we present two common numerical methods for finding the point at which a function is minimized. We make tacit assumptions that the functions involved are well behaved and satisfy continuity and smoothness conditions. For more complete descriptions of numerical optimization, see such specialized texts as [23] and [36].

Golden Section Searches

Suppose that it is known that a minimum of the function φ(x) exists on the interval (a, b). The only way to be certain that an interval (a, b) contains a minimum is to have some x̄ ∈ (a, b) such that φ(x̄) < φ(a) and φ(x̄) < φ(b). Assuming that there is only one minimum in the interval, the first step in finding its precise location is to find whether the minimizer is in one of the subintervals (a, x̄] or [x̄, b). (The subintervals are partly closed because it is possible that x̄ is the minimizer.)

To find out, we apply the same criterion to one of the subintervals. That is, we choose a test point xt ∈ (a, b), xt ≠ x̄, and evaluate the function at that point. Suppose that xt < x̄. We can then check to see if φ(xt) < φ(x̄). If it is, we know that the minimum lies in the interval (a, x̄). If φ(x̄) < φ(xt), then the minimum must lie in the interval (xt, b). Note that due to our strong assumption about a single minimum, φ(x̄) = φ(xt) implies that the minimum is in the interval (xt, x̄).

What is special about the golden section search is the way in which the test points are chosen. The golden ratio has the value

    G = (√5 − 1)/2 ≈ 0.61803....

Given the points a and b bracketing a minimum, we choose two additional points x1 and x2 as

    x1 = b − G(b − a),    x2 = a + G(b − a),

which gives us four points in the order a, x1, x2, b. Now suppose that φ(x1) < φ(x2). Then we know that the location of the minimum is between a and x2. Conversely, if φ(x2) < φ(x1), the minimum lies between x1 and b. In either case, we are left with three points, and the interior one is already in the right position to be used in the next iteration. In the first case, for example, the new interval is (a, x2), and the point x1 satisfies the relationship x1 = a + G(x2 − a). This leads to the following algorithm. Given the points a, x1, x2, and b and the corresponding values of the function:

1. If φ(x1) ≤ φ(x2), then
   (a) Set b = x2, and φ(b) = φ(x2).
   (b) Set x2 = x1, and φ(x2) = φ(x1).
   (c) Set x1 = b − G(b − a), and compute φ(x1). (Note: use the value of b after updating as in 1(a).)
2. Else
   (a) Set a = x1, and φ(a) = φ(x1).
   (b) Set x1 = x2, and φ(x1) = φ(x2).
   (c) Set x2 = a + G(b − a), and compute φ(x2). (Note: use the value of a after updating as in 2(a).)
3. If the length of the interval is sufficiently small, then
   (a) If φ(x1) ≤ φ(x2), return x1 as the minimizer.
   (b) Else return x2 as the minimizer.
4. Else go to 1.

Note: The assumption that the function is well behaved implies that at least one of φ(x1) < φ(a) or φ(x2) < φ(b) is true. Furthermore, "well behaved" implies that the second derivative φxx(x̄) > 0 and that φx = 0 only at x̄ on the interval.
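The four steps above translate directly into code; the following is a straightforward implementation, with the stopping tolerance an assumed choice.

```python
import math

G = (math.sqrt(5.0) - 1.0) / 2.0        # the golden ratio ~ 0.61803

def golden_section(phi, a, b, tol=1e-8):
    """Locate the minimizer of a unimodal phi on (a, b); steps 1-4 above."""
    x1, x2 = b - G * (b - a), a + G * (b - a)
    f1, f2 = phi(x1), phi(x2)
    while (b - a) > tol:
        if f1 <= f2:                     # minimum is in (a, x2)
            b, x2, f2 = x2, x1, f1
            x1 = b - G * (b - a)
            f1 = phi(x1)
        else:                            # minimum is in (x1, b)
            a, x1, f1 = x1, x2, f2
            x2 = a + G * (b - a)
            f2 = phi(x2)
    return x1 if f1 <= f2 else x2

print(golden_section(lambda x: (x - 2.0) ** 2, 0.0, 5.0))   # ~ 2.0
```

Because G² = 1 − G, the surviving interior point already sits at the golden position of the shrunken interval, so only one new function evaluation is needed per iteration.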
Newton Iteration

The golden section search is simple and reliable. However, it requires knowledge of an interval containing the minimum. It also converges linearly; that is, the size of the interval containing the minimum is reduced by the same ratio (in this case G) at each step.

Consider instead the point x̄, and assume that the function can be well approximated by the first few terms of the Taylor expansion about that point. That is,

    φ(x) = φ(x̄ + h) ≈ φ(x̄) + φx(x̄) h + (φxx(x̄)/2) h².    (2.26)

Minimizing this expression over h gives

    h = −φx(x̄)/φxx(x̄).

The method proceeds iteratively as

    x_{i+1} = x_i − φx(x_i)/φxx(x_i).

It can be shown that near the minimum this method converges quadratically; that is, |x_{i+1} − x°| ∼ |x_i − x°|². However, if the assumption (2.26) does not hold, the method will diverge quickly.
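For comparison, the Newton iteration in code, with the first and second derivatives supplied by the caller. The iteration cap and tolerance are assumptions of this sketch, and, as noted above, the method can diverge if started where the quadratic model (2.26) is poor.

```python
def newton_min(phi_x, phi_xx, x, iters=20, tol=1e-12):
    """One-dimensional Newton iteration: x <- x - phi'(x)/phi''(x)."""
    for _ in range(iters):
        step = phi_x(x) / phi_xx(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# phi(x) = x**4 - 3x: phi' = 4x^3 - 3, phi'' = 12x^2; minimizer (3/4)**(1/3)
print(newton_min(lambda x: 4 * x**3 - 3, lambda x: 12 * x**2, x=1.0))
```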
2.2.3 Functions of n Independent Variables: First-Order Conditions

In this section the cost criterion φ(·) to be minimized is a function of an n-vector x. In order to characterize the length of the vector, the notion of a norm is introduced (see Appendix A.2.9 for a more complete description). For example, define the Euclidean norm as

    ‖x‖ = (xᵀx)^{1/2}.

Theorem 2.2.2 Suppose x ∈ Rⁿ, where x = [x1, ..., xn]ᵀ. Let φ(x) : Rⁿ → R be differentiable, and let Ω be an open subset of Rⁿ. Let x° be an optimal solution to the problem²

    min_x φ(x) subject to x ∈ Ω.    (2.27)

Then

    φx(x°) = 0,    (2.28)

where φx(x°) = [φx1(x°), ..., φxn(x°)] is a row vector.

²This implies that x° ∈ Ω.

Proof: Since Ω is an open subset of Rⁿ and x° ∈ Ω, there exists an ε > 0 such that x ∈ Ω whenever x belongs to an n-dimensional ball ‖x − x°‖ < ε (or an n-dimensional box |xi − xi°| < εi, i = 1, ..., n). Therefore, for every vector α ∈ Rⁿ there is a γ > 0 (γ depends upon α) such that

    (x° + γα) ∈ Ω whenever 0 ≤ γ ≤ η,    (2.29)

where η is related to ‖α‖. Since x° is optimal, we must then have

    φ(x°) ≤ φ(x° + γα) whenever 0 ≤ γ ≤ η.    (2.30)

Since φ is once continuously differentiable, by Taylor's Theorem (Equation (A.48)),

    φ(x° + γα) = φ(x°) + φx(x°)γα + O(γ),    (2.31)

where O(γ) is the remainder term and O(γ)/γ → 0 as γ → 0. Substituting (2.31) into the inequality (2.30) yields

    0 ≤ φx(x°)γα + O(γ).    (2.32)

Dividing this by γ and letting γ → 0 gives

    0 ≤ φx(x°)α.    (2.33)

Since the inequality must hold for all α ∈ Rⁿ, we have

    φx(x°) = 0.    (2.34)

Remark 2.2.4 Note that φx(x°) = 0 gives n nonlinear equations with n unknowns, φx1(x°) = 0, ..., φxn(x°) = 0. This can be solved for x°, but it could be a difficult numerical procedure.

Remark 2.2.5 Sometimes, instead of γα, the variation can be written as

    x − x° = δx = γα,    (2.35)

but instead of dividing by γ, we can divide by ‖δx‖.

2.2.4 Functions of n Independent Variables: Second-Order Conditions

Suppose φ is twice differentiable. Let x° ∈ Ω be a local minimum. Then φx(x°) = 0, and by Taylor's expansion (see Appendix A.2.10)

    φ(x° + γα) = φ(x°) + (1/2) γ² αᵀ φxx(x°) α + O(γ²),    (2.36)

where O(γ²)/γ² → 0 as γ → 0. Note that φxx = (φxᵀ)x is the symmetric matrix

    φxx = [φx1x1 · · · φx1xn; ⋮ ; φxnx1 · · · φxnxn].    (2.37)

For γ > 0 sufficiently small,

    φ(x°) ≤ φ(x° + γα) = φ(x°) + (1/2) γ² αᵀ φxx(x°) α + O(γ²)    (2.38)
    ⇒ 0 ≤ (1/2) γ² αᵀ φxx(x°) α + O(γ²).    (2.39)

Dividing through by γ² and letting γ → 0 gives

    (1/2) αᵀ φxx(x°) α ≥ 0.    (2.40)

As shown in Appendix A.2.5, this means that

    φxx(x°) ≥ 0    (2.41)

(nonnegative definite). This is a necessary condition for a local minimum. The sufficient conditions for a local minimum are

    φx(x°) = 0,    φxx(x°) > 0    (2.42)

(positive definite). These conditions are sufficient because the second variation dominates the Taylor expansion; i.e., if φxx(x°) > 0, there always exists a γ such that O(γ²)/γ² → 0 as γ → 0. Suppose φxx(x°) is positive definite. Then (2.40) is satisfied with strict inequality, and the quadratic form has a nice geometric interpretation as an n-dimensional ellipsoid defined by αᵀφxx(x°)α = b, where b is a given positive scalar constant.

Example 2.2.1 Consider the performance criterion (or performance index)

    φ(x1, x2) = (x1² + x2²)/2.

Application of the first-order necessary conditions gives

    φx1 = 0 ⇒ x1° = 0,    φx2 = 0 ⇒ x2° = 0.

Check to see if (x1°, x2°) is a minimum. Using the second variation conditions,

    φxx = [φx1x1 φx1x2; φx2x1 φx2x2] = [1 0; 0 1]

is positive definite because the diagonal elements are positive and the determinant of the matrix itself is positive. Alternately, the eigenvalues of φxx must be positive (see Appendix A.2.5). Hence (x1°, x2°) is a minimum.

Example 2.2.2 Consider the performance index φ(x1, x2) = x1x2. Application of the first-order necessary conditions gives

    φx1 = 0 ⇒ x2° = 0,    φx2 = 0 ⇒ x1° = 0.

Check to see if (x1°, x2°) is a minimum. Using the second variation conditions,

    φxx = [0 1; 1 0] ⇒ |φxx − λI| = λ² − 1 = 0.

Since the eigenvalues λ = 1, −1 are mixed in sign, the matrix φxx is called indefinite, and the extremal point is a saddle.
2.2.5 Numerical Optimization Schemes

Three numerical optimization techniques are described: a first-order method called steepest descent; a second-order method known as the Newton–Raphson method; and a method that is somewhere between these in numerical complexity and rate of convergence, denoted here as the accelerated gradient method.

Steepest Descent (or Gradient) Method

A numerical optimization method is presented based on making small perturbations in the cost criterion function about a nominal value of the state vector. Small improvements are made iteratively in the value of the cost criterion. These small improvements are constructed by assuming that the functions evaluated with respect to these small perturbations are essentially linear and, thereby, predict the improvement. Consider the problem

    min_x φ(x).    (2.43)

Let xⁱ be the value of the x vector at the ith iteration. Perturbing x gives

    φ(x) − φ(xⁱ) = Δφ(x) = φx(xⁱ)δx + O(‖δx‖),    δx = x − xⁱ.

Choose

    δxⁱ = −εⁱ φxᵀ(xⁱ),    (2.44)

such that xⁱ⁺¹ = xⁱ + δxⁱ and

    Δφ(xⁱ⁺¹) = −εⁱ φx(xⁱ)φxᵀ(xⁱ) + O(εⁱ),    (2.45)

where the value chosen for εⁱ is sufficiently small so that the assumed linearity remains valid and the cost criterion decreases, as shown in (2.45). If the actual change and the predicted change do not match within given tolerances, then the size of the small perturbations is adjusted. As the local minimum is approached, the gradient converges as

    lim_{i→∞} φx(xⁱ) → 0.    (2.46)

For a quadratic function, the steepest descent method converges in an infinite number of steps. This is because the step size, as expressed by its norm ‖δxⁱ‖ = εⁱ‖φxᵀ(xⁱ)‖, becomes vanishingly small.
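A minimal sketch of the gradient iteration (2.44)–(2.45): the step-size adjustment described above is implemented here as simple halving of εⁱ whenever the predicted decrease is not realized. The test function, starting point, and constants are assumptions chosen for illustration.

```python
import numpy as np

def steepest_descent(phi, grad, x, eps=1.0, iters=500, tol=1e-8):
    """x_{i+1} = x_i - eps * grad(x_i), halving eps when phi fails to drop."""
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # (2.46): gradient has converged
            break
        while phi(x - eps * g) >= phi(x):    # predicted decrease not realized
            eps *= 0.5
        x = x - eps * g
    return x

phi  = lambda x: 0.5 * (x[0]**2 + 10.0 * x[1]**2)
grad = lambda x: np.array([x[0], 10.0 * x[1]])
print(steepest_descent(phi, grad, np.array([3.0, 1.0])))   # -> near (0, 0)
```

The elongated level sets of this test function make the slow, zig-zagging convergence of steepest descent easy to observe by printing the iterates.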
Newton–Raphson Method

Assume that near the minimum the gradient method is converging slowly. To correct this, expand φ(x) to second order about the iteration value xⁱ as

    Δφ(x) = φx(xⁱ)δx + (1/2) δxᵀ φxx(xⁱ) δx + O(‖δx‖²),    (2.47)

where δx = x − xⁱ. Assuming that φxx(xⁱ) > 0, we get

    min_δx [φx(xⁱ)δx + (1/2) δxᵀ φxx(xⁱ) δx] ⇒ δxⁱ = −φxx⁻¹(xⁱ)φxᵀ(xⁱ),    (2.48)

giving

    Δφ(xⁱ⁺¹) = −(1/2) φx(xⁱ)φxx⁻¹(xⁱ)φxᵀ(xⁱ) + O(‖δxⁱ‖²).    (2.49)

Note that if φ is quadratic, the Newton–Raphson method converges to a minimum in one step.

Accelerated Gradient Methods

Since it is numerically inefficient to compute φxx(xⁱ), this second partial derivative, called the Hessian, can be estimated by constructing n independent directions from a sequence of gradients. For a quadratic function, this class of numerical optimization algorithms, called accelerated gradient methods, converges in n steps. The most common of these methods are the quasi-Newton methods. The method proceeds as a Newton–Raphson method where the inverse of φxx(xⁱ) is estimated from the gradients and used as though it were the actual inverse of the Hessian; as the estimate of φxx(xⁱ) approaches the actual value, the method approaches the Newton–Raphson method.

The first and possibly most famous of these methods is still in popular use for solving unconstrained parameter optimization problems. It is known as the Davidon–Fletcher–Powell method [17] and dates from 1959. The most common implementation, described briefly here, is known as the Broyden–Fletcher–Goldfarb–Shanno, or BFGS, update [9], so called because it uses a modified method of updating the estimate. Let Bⁱ be the estimate of φxx(xⁱ) at the ith iteration and gⁱ be the gradient φxᵀ(xⁱ). The method proceeds by computing the search direction sⁱ from

    Bⁱ sⁱ = −gⁱ ⇒ sⁱ = −(Bⁱ)⁻¹ gⁱ,    (2.50)

where Bⁱ > 0. A one-dimensional search (using, possibly, the golden section search, Section 2.2.2) is performed along this direction, and the minimum found is taken as the next nominal set of parameters, xⁱ⁺¹. The estimate of the inverse of the Hessian, Hⁱ = (Bⁱ)⁻¹, is then updated as

    Hⁱ⁺¹ = Hⁱ − (Hⁱ Δg Δgᵀ Hⁱ)/(Δgᵀ Hⁱ Δg) + (sⁱ sⁱᵀ)/(Δgᵀ sⁱ),    Δg = gⁱ⁺¹ − gⁱ.    (2.51)

It can be shown that the method converges in n steps for a quadratic function and that, for general functions, Bⁱ converges to φxx(x°) as xⁱ → x° (assuming that φxx(x°) > 0). Many texts on these and other optimization methods (for example, [23] and [5]) give detailed discussions.

For larger systems, a class of methods known as conjugate gradient methods requires less storage and also converges in n steps for quadratic functions. They converge less quickly for general functions, but since they do not require storing the Hessian estimate, they are preferred for very large systems.
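The quasi-Newton recipe above can be sketched as follows, with a crude backtracking step standing in for the one-dimensional search along sⁱ. That substitution, the starting estimate H⁰ = I, and the quadratic test function are simplifications for illustration, not the production BFGS implementation one would take from [23] or [36].

```python
import numpy as np

def quasi_newton(phi, grad, x, iters=50, tol=1e-10):
    """Direction s = -H g, inverse-Hessian H updated by the rank-two rule (2.51)."""
    H = np.eye(x.size)                    # initial inverse-Hessian estimate
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        s = -H @ g                        # from B s = -g with H = B^{-1}
        t = 1.0
        while phi(x + t * s) > phi(x):    # crude backtracking line search
            t *= 0.5
        s = t * s
        x_new = x + s
        g_new = grad(x_new)
        dg = g_new - g
        H = (H - np.outer(H @ dg, H @ dg) / (dg @ H @ dg)
               + np.outer(s, s) / (dg @ s))
        x, g = x_new, g_new
    return x

phi  = lambda x: 0.5 * x[0]**2 + 2.0 * x[1]**2 + x[0] * x[1]
grad = lambda x: np.array([x[0] + x[1], x[0] + 4.0 * x[1]])
print(quasi_newton(phi, grad, np.array([2.0, -1.0])))   # -> near (0, 0)
```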
2.3 Minimization Subject to Constraints

The constrained parameter minimization problem is

    min_{x,u} φ(x, u)  subject to  ψ(x, u) = 0,   (2.52)

where x ∈ R^n, u ∈ R^m, and ψ(x, u) is a known n-dimensional vector of functions. We often choose ψ(x, u) = f(x, u) − c = 0, where f(x, u) is a known n-dimensional function and c ∈ R^n is a given vector, so that different levels of the constraint can be examined. Note that the cost criterion is minimized with respect to n + m parameters. For ease of presentation the parameter vector is arbitrarily decomposed into two vectors (x, u), and both x and u can be viewed as minimizing the cost criterion φ(x, u) subject to the constraint ψ = f(x, u) − c = 0; the choice of u as the minimizing parameter, where x satisfies the constraint, is an arbitrary one. The point of this section is to convert the constrained problem to an unconstrained problem, for which conditions for optimality were given in the previous sections, and then to apply those results on necessity and sufficiency. We will then relate this approach to the classical Lagrange multiplier method.

2.3.1 Simple Illustrative Example

To illustrate the ideas and the methodology for obtaining necessary and sufficient conditions for optimality, we begin with a simple example, which is then extended to the general case. In this example the constrained optimization problem is transformed into an unconstrained problem. Find the rectangle of maximum area inscribed in an ellipse defined by

    f(x, u) = x²/a² + u²/b² = c,   (2.53)

where a, b, c are positive constants, f : R² → R, and φ : R² → R. It is assumed that x ∈ R, u ∈ R, and that f and φ are once continuously differentiable in each of their arguments. The ellipse is shown in Figure 2.4 for c = 1. The area of a rectangle with a corner at (x, u) on the ellipse is the positive value of (2x)(2u). The optimization problem is

    max (2x)(2u) = min −4xu = min φ(x, u)   (2.54)

subject to

    (x, u) ∈ Ω = {(x, u) | f(x, u) − c = 0},   (2.55)

where (2.54) becomes the area when 4xu is positive.

Figure 2.4: Ellipse definition.

The main difference between the constrained and the unconstrained optimization problem is that in the constrained problem the set Ω is not an open set. Therefore, we cannot assert that φ(xo, uo) ≤ φ(x, u) for all (x, u) in an open set about (xo, uo), since any admissible variation must satisfy the constraint. Some special considerations must be made on the function φ(x, u). Note that for this problem, if (xo, uo) is an extremal, then either xo ≠ 0 or uo ≠ 0, or both. Let xo ≠ 0, so that in a small region about (xo, uo), for x − xo = δx and u − uo = δu with |δx| < β, |δu| < ε, the change in the constraint is

    df = f(xo + δx, uo + δu) − f(xo, uo) = fx(xo, uo)δx + fu(xo, uo)δu + O(d) = 0,   (2.56)

which relates x and u, where

    d = (δx² + δu²)^{1/2}   (2.57)

and O(d)/d → 0 as d → 0. Then, to first order,

    δf = fx(xo, uo)δx + fu(xo, uo)δu = 0   (2.58)

can be solved as

    δx = −[fx(xo, uo)]^{-1} fu(xo, uo)δu   (2.59)

if fx(xo, uo) ≠ 0. This implies that f(x, u) = c may be solved for x in terms of u, such that x = g(u). Obtaining g(u) explicitly may be quite difficult; however, this turns out not to be necessary, since all that will be required is the implicit representation of x = g(u). More precisely, if f(x, u) is continuously differentiable and fx(x, u) is invertible, then the Implicit Function Theorem (see Appendix A.1) implies that there exists a rectangle V̄, |x − xo| < β and |u − uo| < ε, shown in Figure 2.5, such that x = g(u), where g(u) has continuous derivatives in |u − uo| < ε and

    f(g(u), u) = c  whenever  |u − uo| < ε.   (2.60)

Figure 2.5: Definition of V̄.

The procedure is thus to solve for x in terms of u to give an unconstrained problem in which u belongs to an open set. Note that by explicitly eliminating the dependent variables x in terms of the independent variables u through the constraint, the objective function is now minimized on an open set, and our unconstrained results apply. This implies that xo = g(uo), and since (xo, uo) = (g(uo), uo) is an optimal point, it follows that uo is the optimal solution for

    min_u φ̂(u) = min_u φ(g(u), u)   (2.61)

subject to |u − uo| < ε, where φ̂(u) is continuously differentiable since φ and g(u) are continuously differentiable. Since uo lies in an open set, using the chain rule,

    φ̂u(uo) = φx(xo, uo)gu(uo) + φu(xo, uo) = 0.   (2.62)

We still need to determine gu(u). From f(g(u), u) = c we obtain

    fx(xo, uo)gu(uo) + fu(xo, uo) = 0,   (2.63)

    ⇒  gu = −fu/fx.   (2.64)

Note that g itself need not be determined. The required first-order necessary condition is obtained by substituting (2.64) into (2.62) as

    φu − φx (fu/fx) = 0  at  (xo, uo).   (2.65)

The optimal variables (xo, uo) are therefore determined from the two equations

    φu − φx (fu/fx) = 0  ⇒  −4x + 4u (a²u)/(b²x) = 0,   (2.66)
    f(x, u) = c  ⇒  x²/a² + u²/b² = c.   (2.67)

From (2.66) we obtain

    x − (a²/b²)(u²/x) = 0  ⇒  x²/a² − u²/b² = 0  ⇒  x²/a² = u²/b².   (2.68)

Then, using (2.67) and (2.68), the extremal parameters satisfy

    2u²/b² = c  ⇒  uo = ±b (c/2)^{1/2},   (2.69)
    2x²/a² = c  ⇒  xo = ±a (c/2)^{1/2}.   (2.70)

There are four extremal solutions, all representing the corners of the same rectangle. The maximum value of the area is +2cab. The minimum value of the cost is

    φ̂o(c) = φo(c) = −2cab,   (2.71)

where the dependence of φo(c) on the constraint level c is explicit.

First-Order Conditions for the Constrained Optimization Problem

We structure the necessary conditions given in Section 2.1 by defining a scalar λ as

    λ = −φx/fx |_(xo, uo).   (2.72)

Then (2.65) becomes

    φu = −λfu = −λψu,   (2.73)

and (2.72) becomes

    φx = −λfx = −λψx.   (2.74)

This means that at the optimal point the gradient of φ is normal to the plane tangent to the constraint. This is depicted in Figure 2.6, where the tangent point is at the local minimum (uo, xo).

Figure 2.6: Definition of tangent plane.

For this example, using (2.72),

    λ = 4uo/(2xo/a²) = 4b(c/2)^{1/2} a²/(2a(c/2)^{1/2}) = 2ab.   (2.75)

Finally, note that from (2.71),

    λ = −∂φo(c)/∂c = 2ab.   (2.76)

This shows that λ is an influence function relating a change in the optimal cost criterion to a change in the constraint level.
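A quick numeric check of the extremal (2.69)–(2.70) is sketched below (the values of a, b, c are illustrative assumptions). It confirms that the constraint and the first-order condition (2.65) are satisfied, that the area 4 xo uo equals 2cab, and that λ = −φx/fx = 2ab as in (2.75):

    import numpy as np

    a, b, c = 2.0, 1.0, 1.5                          # illustrative ellipse constants
    xo, uo = a * np.sqrt(c / 2), b * np.sqrt(c / 2)  # extremal corner (2.69)-(2.70)

    f_x, f_u = 2 * xo / a**2, 2 * uo / b**2          # constraint gradients
    phi_x, phi_u = -4 * uo, -4 * xo                  # gradients of phi = -4xu

    print("constraint residual :", xo**2 / a**2 + uo**2 / b**2 - c)   # ~ 0
    print("condition (2.65)    :", phi_u - phi_x * f_u / f_x)         # ~ 0
    print("area 4*xo*uo        :", 4 * xo * uo, " vs 2cab =", 2 * c * a * b)
    print("lambda = -phi_x/f_x :", -phi_x / f_x, " vs 2ab =", 2 * a * b)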
We will later show that λ is related to the classical Lagrange multiplier. This, along with the second variation, gives necessary and sufficient conditions for local optimality.

Second-Order Necessary and Sufficient Conditions

In (2.65) the first-order condition for a scalar constraint was given. Assuming that φ and f are twice differentiable, then so are g and φ̂(u). Since uo lies in an open set, the second-order necessary condition obtained for the unconstrained optimization problem applies here as well:

    φ̂uu(uo) = φuu + 2φxu gu + φx guu + φxx gu² ≥ 0.   (2.77)

The sufficient conditions for a local minimum are φ̂u(uo) = 0 and φ̂uu(uo) > 0. To determine gu and guu we expand f(g(u), u) = f̂(u) = c about uo,

    f̂(uo + δu) = f̂(uo) + f̂u(uo)δu + (1/2)f̂uu(uo)δu² + · · · = c,   (2.78)

and note that the coefficients of δu and δu² must be zero:

    f̂u = fx gu + fu = 0  ⇒  gu = −fu/fx,   (2.79)

    f̂uu = fxx gu² + fx guu + 2fxu gu + fuu = 0,   (2.80)

    ⇒  guu = −(1/fx)[ fuu − 2fxu (fu/fx) + fxx (fu/fx)² ].   (2.81)

Substitution into φ̂uu given in (2.77) produces the desired condition

    [φuu − (φx/fx) fuu] − 2[φxu − (φx/fx) fxu](fu/fx) + [φxx − (φx/fx) fxx](fu/fx)² ≥ 0.   (2.82)

By identifying λ = −φx/fx, then

    φ̂uu(uo) = [φuu + λfuu] − 2[φxu + λfxu](fu/fx) + [φxx + λfxx](fu/fx)² ≥ 0.   (2.83)

A necessary condition for a local minimum is that φ̂uu(uo) ≥ 0.

Lagrange Multiplier Approach for the Two-Parameter Problem

We now derive the first- and second-order necessary and sufficient conditions by an alternate method, the Lagrange multiplier method, which has sufficient generality that it is used to generate these conditions in the most general case. The cost criterion is augmented by the constraint through the Lagrange multiplier λ as

    H = φ(x, u) + λ(f(x, u) − c).

Expanding the augmented cost H in a Taylor series to second order,

    H(xo + δx, uo + δu, λ + δλ) − H(xo, uo, λ)
      ≅ Hx δx + Hu δu + Hλ δλ
        + (1/2)[δx δu δλ][ Hxx Hxu Hxλ ; Hux Huu Huλ ; Hλx Hλu Hλλ ][δx ; δu ; δλ]
      = δH + (1/2)δ²H,   (2.84)

where from (2.73) Hu = 0, from (2.74) Hx = 0, and Hλ = f(x, u) − c = 0 on the constraint. There is no requirement that the second-order term be positive semidefinite for arbitrary variations in (x, u, λ); intuitively, the requirement is only that the function φ take on a minimum value on the tangent plane of the constraint. This is imposed by using the relation between δx and δu,

    fx δx + fu δu = 0  ⇒  δx = −(fu(xo, uo)/fx(xo, uo))δu.   (2.85)

If this is substituted into the quadratic form in (2.84), the quadratic form reduces to (note Hλλ ≡ 0, and the coefficient of δλ becomes identically zero)

    δ²H = δu[ Hxx (fu/fx)² − 2Hxu (fu/fx) + Huu ]δu ≥ 0,   (2.86)

which is identical to (2.83). For the particular example of finding the largest rectangle in an ellipse, the second variation (2.83) is verified as φ̂uu(uo) = 16a/b > 0, ensuring that φ is a locally constrained minimum at uo, i.e., that the area is a local constrained maximum.

2.3.2 General Case: Functions of n-Variables

Theorem 2.3.1 Let fi : R^{n+m} → R, i = 1, . . . , n, be n continuously differentiable constraints and φ : R^{n+m} → R be the continuously differentiable performance index. Let xo ∈ R^n and uo ∈ R^m be the optimal variables of the problem

    φo = min_{x,u} φ(x, u)   (2.87)

subject to

    fi(x, u) = ci,  i = 1, . . . , n.   (2.88)

Suppose that at (xo, uo) the n × n matrix fx(xo, uo) is nonsingular. Then there exists a vector λ ∈ R^n such that

    φx(xo, uo) = −λ^T fx(xo, uo),  φu(xo, uo) = −λ^T fu(xo, uo).   (2.89)

Furthermore, if (xo(c), uo(c)) are once continuously differentiable functions of c = [c1, . . . , cn]^T, then φo(c) is a differentiable function of c and

    λ^T = −∂φo(c)/∂c.   (2.90)

Remark 2.3.1 We choose ψ(x, u) = f(x, u) − c = 0 without loss of generality, so that different levels of the constraint c can be examined and related to φo as given in (2.90).

Proof: Since fx(xo, uo) is nonsingular, by the Implicit Function Theorem (see Section A.1) there exist ε > 0, an open set V ⊂ R^{n+m} containing (xo, uo), and a differentiable function g : U → R^n, where U = [u : ||u − uo|| < ε], such that f(x, u) = c with [x^T, u^T]^T ∈ V implies that x = g(u) for u ∈ U. This means that f(g(u), u) = c for all u ∈ U. Since (xo, uo) = (g(uo), uo) is optimal,
it follows that uo is an optimal variable for the new optimization problem

    min_u φ̂(u) = min_u φ(g(u), u)  subject to  u ∈ U.   (2.91)

U is an open subset of R^m, g(u) has a continuous derivative for u ∈ U, and φ̂ is a differentiable function on U since φ and g are differentiable. Therefore, the unconstrained results of Section 2.2 are applicable, and by the chain rule,

    φ̂u(uo) = φx gu + φu |_{u=uo, x=g(uo)} = 0.   (2.92)

Since by assumption f(g(u), u) = c for all u ∈ U, all derivatives of f(g(u), u) are zero; in particular, the first derivative evaluated at (xo, uo):

    fx gu + fu = 0.   (2.95)

Since the matrix fx(xo, uo) is nonsingular, we can evaluate gu as

    gu = −fx^{-1} fu.   (2.96)

Substitution of gu into (2.92) gives

    φu − φx fx^{-1} fu |_(xo, uo) = 0.   (2.97)

Let us now define the n-vector λ as

    λ^T = −φx fx^{-1} |_(xo, uo).   (2.98)

Then (2.97) and (2.98) can be written as

    [φx, φu] = −λ^T [fx, fu].   (2.99)

Now we show that λ^T = −φo_c(c). Since (xo(c), uo(c)) are continuously differentiable in a neighborhood of c, differentiating φo(c) = φ(xo(c), uo(c)) gives

    φo_c = φx xo_c + φu uo_c.   (2.100)

Differentiating f(xo(c), uo(c)) = c gives

    fx xo_c + fu uo_c = I,   (2.101)

    ⇒  xo_c + fx^{-1} fu uo_c = fx^{-1}.   (2.102)

Multiplying by φx gives

    φx xo_c + φx fx^{-1} fu uo_c = φx fx^{-1} = −λ^T.   (2.103)

Using the first-order condition (2.97), φx fx^{-1} fu uo_c = φu uo_c,   (2.104)

so that (2.100) and (2.103) give the desired result

    φo_c = −λ^T.   (2.105)

Remark 2.3.2 Equation (2.99) shows that the gradient of the cost function [φx, φu] is orthogonal to the tangent plane of the constraint at (xo, uo). If the tangent plane is described by the set of vectors h such that [fx, fu]h = 0, then the set of vectors orthogonal to this tangent surface is any linear combination of the rows of the gradient [fx, fu]. Since the rows of [fx, fu] form n independent vectors (because fx is nonsingular), the relation [φx, φu] = −λ^T [fx, fu] states precisely that [φx, φu] is orthogonal to the tangent plane; in particular, the gradient of φ is normal to the plane tangent to the constraint.
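The influence-function property (2.105) can also be checked numerically. The sketch below uses illustrative constants and a brute-force inner minimization (an assumption made purely for simplicity): it computes φo(c) for the ellipse example of Section 2.3.1 by direct search over the ellipse and differences it with respect to c:

    import numpy as np

    a, b = 2.0, 1.0                            # illustrative constants

    def phi_o(c):                              # min of -4xu on the ellipse, by search
        x = np.linspace(1e-6, a * np.sqrt(c), 20001)
        u = b * np.sqrt(c - x**2 / a**2)       # upper half of the ellipse
        return np.min(-4 * x * u)

    c, dc = 1.5, 1e-4
    lam = -(phi_o(c + dc) - phi_o(c - dc)) / (2 * dc)   # -d(phi_o)/dc, cf. (2.105)
    print("numerical -d(phi_o)/dc :", lam)
    print("analytic  lambda = 2ab :", 2 * a * b)

The printed finite-difference value approaches 2ab, consistent with (2.76).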
Lagrange Multiplier Approach

Necessary conditions identical to those obtained above are derived formally by the Lagrange multiplier approach. By adjoining the constraint f = c to the cost function φ with an undetermined n-vector Lagrange multiplier λ, a function H(x, u, λ) is defined as

    H(x, u, λ) = φ(x, u) + λ^T (f(x, u) − c).   (2.106)

Note that H(x, u, λ) = φ(x, u) when the constraint is satisfied, so that nothing is gained or lost by this step. We construct an unconstrained optimization problem in the 2n + m variables x, u, and λ; that is, we look for the extremal point of H with respect to x, u, and λ, where x, u, and λ are considered free and can be arbitrarily varied within some small open set containing xo, uo, and λ. From our unconstrained optimization results we have

    Hx = φx + λ^T fx = 0,   (2.108)
    Hu = φu + λ^T fu = 0,   (2.109)
    Hλ = f(xo, uo) − c = 0.   (2.110)

This gives 2n + m equations in the 2n + m unknowns x, u, and λ. Note that satisfaction of the constraint is now satisfaction of the necessary condition (2.110).

2.3.3 Constrained Parameter Optimization: An Algorithmic Approach

In this section we extend the steepest descent method of Section 2.2 to include equality constraints. The procedure suggested is to first satisfy the constraint, i.e., constraint restoration. Then a gradient associated with changes in the cost criterion along the tangent plane to the constraint manifold is constructed. This is done by forming a projector that annihilates any component of the gradient of the cost criterion in the direction of the gradient of the constraint function.

Suppose y = [x^T, u^T]^T. The parameter optimization problem is

    min_y φ(y)  subject to  f(y) = c,   (2.111)

where φ and f are assumed to be sufficiently smooth so that for small changes in y away from some nominal value yi, φ and f can be approximated by the first terms of a Taylor series about yi (δy = y − yi):

    δφ ≅ φy δy,   (2.112)
    δf ≅ fy δy.   (2.113)

Since these gradients are determined from the first-order terms in a Taylor series of the cost criterion and the constraint functions, the steps used in the iteration process must be sufficiently small to preserve the validity of this assumed linearity. In the following we describe a numerical optimization algorithm composed of a constraint restoration step followed in turn by a minimization step. Although these steps can be combined, they are separated here for pedagogical reasons.

Constraint Restoration

Since f = c describes a manifold in y space, suppose yi is a point not on f = c. From (2.113) a change in the constraint level is related to a change in y. To move in the direction of constraint satisfaction choose δy as

    δy = fy^T (fy fy^T)^{-1} δf,   (2.114)

where the choice of δf = −ε1 ψ = ε1 (c − f(yi)) for small ε1 > 0 forms an iterative step driving f toward c. Note that δy in (2.114) is a least-squares solution of (2.113), where fy is assumed to be of full rank. At the end of each iteration set yi+1 = yi + δy. The iteration sequence stops when |c − f| is smaller than a given tolerance.

Constrained Minimization

Since f = c describes a manifold in y space, suppose now that yi is a point on f = c; then fy^T is perpendicular to the tangent plane of f = c at yi. To ensure that changes in the cost φ(y) are made only in the tangent plane, so that the constraint will not be violated (to first order), define the projection operator

    P = I − fy^T (fy fy^T)^{-1} fy,   (2.115)
which has the properties that

    P P = P,  P = P^T,  P fy^T = 0.   (2.116)

Therefore, the projection operator annihilates the components of a vector along fy^T. The object is to use this projector to ensure that if changes are made in improving the cost, they are made only along the tangent plane to the constraint ψ(x, u) = f(x, u) − c = 0. The projected gradient step is constructed by choosing the parameter changes in the steepest descent direction while tangential to the constraint:

    δy = −ε P φy^T,   (2.117)

where again ε > 0 is a positive number chosen small so as not to violate the assumed linearity. With this choice of δy, the first-order change in the cost criterion is

    δφ = −ε φy P φy^T = −ε φy P P^T φy^T = −ε ||φy P||² ≤ 0   (2.118)

(since P P^T = P), while the constraint is satisfied to first order, since

    δf = fy δy = −ε fy P φy^T = 0.   (2.119)

The second-order constraint violation incurred by this step is then removed by returning to the constraint restoration step. See Figure 2.7 for a geometrical description in the case where y = [x^T, u^T]^T and x, u are scalars.

Figure 2.7: Geometrical description of the parameter optimization problem.

This iterative process between constraint restoration and constrained minimization is continued until the stationary necessary conditions

    P φy^T = 0,  f − c = 0   (2.120)

are met. Note that the constraint restoration and optimization steps can be combined, given the assumed linearity. This optimization algorithm is called steepest descent optimization with constraints.

The n-vector Lagrange multiplier λ is now shown to contribute to the structure of the projector. If the constraint variation δf is adjoined to δφ by the Lagrange multiplier λ, the augmented cost variation δφ̄ is

    δφ̄ = (φy + λ^T fy)δy,   (2.121)

and by the usual arguments a minimum occurs for the augmented cost when

    φy + λ^T fy = 0.   (2.122)

If δy is chosen as

    δy = −ε (φy + λ^T fy)^T,   (2.123)

where f(yi) − c = 0 and fy δy = 0 to first order, then postmultiplying (2.122) by fy^T and solving for λ at (xo, uo) results in

    λ^T = −φy fy^T (fy fy^T)^{-1}.   (2.124)

Substituting (2.124) back into (2.122) results in

    φy [ I − fy^T (fy fy^T)^{-1} fy ] = φy P = 0,   (2.125)

which is just the first condition of (2.120); the Lagrange multiplier technique is thus consistent with (2.120). The constraint projection and restoration method described here is an effective method for the numerical solution of minimization problems subject to equality constraints. Several other methods are also in common use, with many sharing significant ideas. All such methods are subject to a number of difficulties in actual implementation; these are beyond the scope of this text, but the interested reader may see [23], [5], and [36] for more information.
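The following sketch (step sizes, tolerances, and the starting point are illustrative assumptions) implements the two steps just described, restoration via (2.114) followed by descent along the projected gradient (2.117), for the ellipse example of Section 2.3.1:

    import numpy as np

    a, b, c = 2.0, 1.0, 1.5                        # illustrative ellipse problem
    phi_y = lambda y: np.array([-4.0 * y[1], -4.0 * y[0]])   # gradient of phi = -4xu
    f     = lambda y: y[0]**2 / a**2 + y[1]**2 / b**2        # constraint value f(y)
    f_y   = lambda y: np.array([2.0 * y[0] / a**2, 2.0 * y[1] / b**2])

    y, eps = np.array([1.0, 0.3]), 0.02
    for _ in range(3000):
        for _ in range(50):                        # constraint restoration, (2.114)
            r = c - f(y)
            if abs(r) < 1e-12:
                break
            fy = f_y(y)
            y = y + fy * r / (fy @ fy)
        fy = f_y(y)                                # projected-gradient step, (2.117)
        Pg = phi_y(y) - fy * (fy @ phi_y(y)) / (fy @ fy)     # P phi_y^T from (2.115)
        y = y - eps * Pg
    print("y =", y)
    print("expected corner:", [a * np.sqrt(c / 2), b * np.sqrt(c / 2)])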
2.3.4 General Form of the Second Variation

Assume that φ and f are twice differentiable. Since f is twice differentiable, so is g and, therefore, φ̂(u). Whereas uo lies in an open set, the general second-order necessary condition for an unconstrained optimization problem applies. Producing the inequality by the procedure of the previous section is laborious; rather, we use the equivalent Lagrange multiplier approach, where

    H = φ + λ^T ψ   (2.126)

(if ψ = f(x, u) − c, then H = φ + λ^T (f − c)). Expanding the augmented cost to second order, assuming the first-order necessary conditions hold,

    H(xo + δx, uo + δu, λ + δλ) − H(xo, uo, λ) = (1/2)δ²H + O(d²)
      = (1/2)[δx^T δu^T δλ^T][ Hxx Hxu Hxλ ; Hux Huu Huλ ; Hλx Hλu Hλλ ][δx ; δu ; δλ] + O(d²),   (2.127)

where the first-order condition δH = 0 (Hx = 0, Hu = 0, and Hλ = f − c = 0) has been used and d = ||(δx^T, δu^T, δλ^T)||. From xo = g(uo) and its properties in (x, u) ∈ V (see the Implicit Function Theorem, Section A.1),

    δx = gu δu + O(||δu||),   (2.128)

where f(g(u), u) = f̂(u) = c requires that all its derivatives be zero; in particular,

    f̂u = Hλx gu + Hλu = fx gu + fu = 0  ⇒  gu = −fx^{-1} fu.   (2.129)

Using (2.128) and (2.129) in (2.127), and noting that the coefficients associated with δλ are zero, the second variation reduces to

    δ²H = δu^T [ gu^T Hxx gu + Hux gu + gu^T Hxu + Huu ]δu ≥ 0.   (2.130)

Therefore, the necessary condition for local optimality is

    [ gu^T  I ][ Hxx Hxu ; Hux Huu ][ gu ; I ] ≥ 0,   (2.131)

and the sufficiency condition, along with the first-order conditions, is

    [ gu^T  I ][ Hxx Hxu ; Hux Huu ][ gu ; I ] > 0.   (2.132)

Note 2.3.1 Again, only the implicit representation (2.129) of the constraint is needed; g itself need not be found explicitly.

Note 2.3.2 The definiteness conditions are required only in the m-dimensional subspace associated with the tangent plane of the constraints evaluated at (xo, uo).
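As a check of (2.131) on the ellipse example of Section 2.3.1 (constants illustrative), the projected second variation can be evaluated directly; it should reproduce the value 16a/b > 0 found with (2.83):

    import numpy as np

    a, b, c = 2.0, 1.0, 1.5                       # illustrative constants
    xo, uo = a * np.sqrt(c / 2), b * np.sqrt(c / 2)
    lam = 2 * a * b                               # multiplier found for the example

    # H = phi + lam*(f - c) with phi = -4xu and f = x^2/a^2 + u^2/b^2
    Hxx, Huu, Hxu = 2 * lam / a**2, 2 * lam / b**2, -4.0
    gu = -(2 * uo / b**2) / (2 * xo / a**2)       # gu = -fu/fx

    proj = gu**2 * Hxx + 2 * gu * Hxu + Huu       # (2.131) for scalar x and u
    print("projected second variation:", proj, "  expected 16a/b =", 16 * a / b)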
2.3.5 Inequality Constraints: Functions of 2-Variables

An approach for handling optimization problems with inequality constraints is to convert the inequality constraint to an equality constraint by using a device called slack variables. Once the problem is in this form, the previous necessary conditions are applicable. We present the 2-variable optimization problem for simplicity, but the extension to n dimensions is straightforward.

Theorem 2.3.2 Let φ : R² → R and θ : R² → R be twice continuously differentiable. Let (xo, uo) be the optimal variables of the problem

    φo = min_{x,u} φ(x, u)  subject to  θ(x, u) ≤ 0.   (2.133)

Then there exists a scalar ν ≥ 0 such that

    (φx, φu) = −ν(θx, θu),   (2.134)

where if θ(xo, uo) = 0, it is assumed that θx ≠ 0.

Remark 2.3.3 If (xo, uo) lies in the interior of the constraint region, then ν = 0 and the necessary conditions become those of an unconstrained minimization problem, (φx, φu) = 0. If (xo, uo) lies on the boundary of the constraint, i.e., θ(xo, uo) = 0, then in general ν > 0 and the necessary conditions of (2.134) hold.

Proof: We first convert the inequality constraint into an equality constraint by introducing a slack variable α such that

    θ(x, u) + α² = 0.   (2.135)

For any real value of α ∈ R the inequality constraint is satisfied. An equivalent problem is therefore

    φo = min_{x,u} φ(x, u)  subject to  θ(x, u) + α² = 0,   (2.136)

which was considered in the last section. For simplicity, we use the Lagrange multiplier approach here. Adjoin the constraint θ + α² = 0 to the cost function φ by an undetermined multiplier ν as

    H = φ(x, u) + ν(θ(x, u) + α²).   (2.137)

We look for the extremal point of H with respect to (x, u, α, ν). From our unconstrained optimization results we have, at (xo, uo),

    Hx = φx + νo θx = 0,   (2.138)
    Hu = φu + νo θu = 0,   (2.139)
    Hα = 2νo αo = 0,   (2.140)
    Hν = θ(xo, uo) + αo² = 0.   (2.141)

This gives four equations in the four unknowns x, u, α, ν. If αo ≠ 0, i.e., the optimal solution is off the boundary of the constraint (in the admissible interior), then νo = 0. If αo = 0, the optimal solution is on the constraint boundary, where in general νo ≠ 0. From (2.138) and (2.139) we obtain the condition shown in (2.134).

The objective now is to show that ν ≥ 0. To determine the sign of νo, the second variation is used:

    δ²H = [δx δu δα δν][ Hxx Hxu Hxα Hxν ; Hux Huu Huα Huν ; Hαx Hαu Hαα Hαν ; Hνx Hνu Hνα Hνν ][δx ; δu ; δα ; δν],   (2.142)

where the variation of the constraint is used to determine δx in terms of δu and δα as

    θx δx + θu δu + 2αδα = 0  ⇒  δx = −(θu/θx)δu − (2α/θx)δα   (2.143)

(since by assumption θx ≠ 0). On the constraint boundary αo = 0, and in addition

    Hxα = 0,  Huα = 0,  Hαα = 2νo,   (2.144)
    Hνα = 2αo = 0,  Hνν = 0.   (2.145)

Using (2.143)–(2.145) in the second variation, (2.142) reduces to

    δ²H = [δu δα][ Huu − 2Hux(θu/θx) + Hxx(θu/θx)²  0 ; 0  2ν ][δu ; δα] ≥ 0,   (2.146)

and δ²H is positive semidefinite if

    Huu − 2Hux(θu/θx) + Hxx(θu/θx)² ≥ 0,   (2.147)
    ν ≥ 0.   (2.148)

Note: The second variation given above is only for the case in which the optimal variables lie on the boundary (αo = 0 in the above two-variable case).

This simple example can be generalized to the case with n inequality constraints. If the optimal point lies in the interior of all constraints, then the unconstrained results apply. If some of the inequality constraints are active, then under certain conditions the gradient of the cost criterion is constrained at the minimum to lie in a cone constructed from the gradients of the active constraint functions (i.e., those with α = 0 in the above two-variable case). This notion is made precise by the Kuhn–Tucker theorem [33]. There are many fine points in the extension of this theory. In this chapter, we have attempted only to give an introduction that illuminates the principles of optimization theory and the concepts that will be used in the following chapters.

Problems

1. Determine the point x1, x2 at which the function φ = x1 + x2 is a minimum, subject to the constraint x1² + x1x2 + x2² = 1.

2. A tin can manufacturer wants to find the dimensions of a cylindrical can (closed top and bottom) such that, for a given amount of tin, the volume of the can is a maximum. If the thickness of the tin stock is constant, a given amount of tin implies a given surface area of the can. Use height and radius as variables and use a Lagrange multiplier.

3. Minimize the performance index φ = (1/2)(x² + y² + z²) subject to the constraints

    x + 2y + 3z = 10,  x − y + 2z = 1.

Show that x = 19/59, y = 146/59, z = 93/59, λ1 = −55/59, λ2 = 36/59.

4. Minimize the performance index φ = x − y + 2z subject to the constraint x² + y² + z² = 2.

5. Minimize the performance index φ = (1/2)(x² + y² + z²) subject to the constraint x + 2y − 3z − 7 = 0.

6. Maximize the performance index φ = x1 x2 subject to the constraint x1 + x2 − 1 = 0.

7. Minimize the performance index φ = (4 − 3x²)^{1/2} subject to the constraint −1 ≤ x ≤ 1.

8. Minimize the performance index φ = −x1x2 + x2x3 + x3x1 subject to the constraint x1 + x2 − x3 + 1 = 0.

9. Maximize the performance index φ = xu subject to the inequality constraint x + u ≤ 1.

10. (a) In the two-dimensional xt-plane, determine the extremal curve of stationary length which starts on the circle x² + t² − 1 = 0 and terminates on the line t = T = 2.
(b) Solve problem (a), but consider that the termination is on the line −x + t = 2√2.
Note: Parts (a) and (b) are not to be solved by inspection.

11. (a) State the necessary and sufficient conditions and underlying assumptions for x*, u* to be locally minimizing for the problem of minimizing φ = φ(u, x), x ∈ R^n, u ∈ R^m, subject to Ax + Bu = C.
(b) Find the extremals of φ = e^{x1 + x2} subject to x1² + x2² = 1/2.

Chapter 3

Optimization of Dynamic Systems with General Performance Criteria

3.1 Introduction

In accordance with the theme of this book outlined in Chapter 1, we use linear algebra,
elementary differential equation theory, and the definition of the derivative to derive conditions that are satisfied by a control function which optimizes the behavior of a dynamic system relative to a specified performance criterion. In other words, we derive necessary conditions and also a sufficient condition for the optimality of a given control function.

In Section 3.2 we begin with the control of a linear dynamic system relative to a general performance criterion. Restricting attention to a linear system and introducing the notion of weak control perturbations allows an easy derivation of a weak form of the first-order necessary conditions. Next, we extend these necessary conditions to nonlinear systems with the aid of a theorem by Bliss [6] on the differentiability of the solution of an ordinary differential equation with respect to a parameter. Then, we comment upon the two-point boundary-value problem based on these necessary conditions. We then introduce the notion of strong control perturbations, which allows the derivation of a stronger form of the first-order necessary conditions. This result is further strengthened upon the introduction of control variable constraints. These conditions are referred to as Pontryagin's Principle [38]. Along the way, we treat the case of unspecified final time and derive an additional necessary condition, called the transversality condition.

After having observed that Pontryagin's Principle is only a necessary condition for optimality, we introduce the Hamilton–Jacobi–Bellman (H-J-B) equation and provide a general sufficient condition for optimality. The dependent variable of the H-J-B partial differential equation is the optimal value function, which is the value of the cost criterion using the optimal control. We derive the H-J-B equation on the assumption that the optimal value function exists and is once continuously differentiable. Using the H-J-B equation, we relate the derivative of the optimal value function to Pontryagin's Lagrange multipliers. We illustrate, where necessary, the conditions that we develop in this chapter by working out several examples.

Remark 3.1.1 Throughout this book, we will stick to calling the states x, the controls u, and the independent variable t, and the problems in this book are laid out in that notation. This need not always be the case: the choice of what constitutes a state, a control, and a "running variable" can drastically alter the ease with which a problem may be solved, and depends on the specifics of the problem at hand. For example, in a rocket launch, time (the variable t) is usually considered the independent variable, and the energy of the vehicle (kinetic plus potential) can be considered as a state.
3.2 Linear Dynamic Systems with General Performance Criterion

The linear dynamic system to be controlled is described by the vector linear differential equation

    ẋ(t) = A(t)x(t) + B(t)u(t),  x(t0) = x0,   (3.1)

where x(·) and u(·) are, respectively, n and m vector functions of time t, and where A(·) and B(·) are n × n and n × m matrix functions of time t. The initial condition at time t = t0 for (3.1) is x0. The optimal control problem is to find a control function uo(·) which minimizes the performance criterion

    J(u(·), x0) = φ(x(tf)) + ∫_{t0}^{tf} L(x(t), u(t), t) dt,  tf > t0.   (3.2)

Here L, the Lagrangian, and φ are scalar functions of their arguments, and we make the following assumptions concerning these functions.

Assumption 3.2.1 The elements aij(·) and bkl(·) of A(·) and B(·) are continuous functions of t on the interval [t0, tf].

Assumption 3.2.2 The control function u(·) is drawn from the set U of piecewise continuous m-vector functions of t on the interval [t0, tf].

Note 3.2.1 The notation (·), as used in L(·), is used to denote the functional form of L.

Assumption 3.2.3 The scalar function L(·, ·, ·) is once continuously differentiable in x and u and is continuous in t on [t0, tf].

Assumption 3.2.4 The scalar function φ(·) is once continuously differentiable in x.

We now suppose that there is a piecewise continuous control function uo(·) that minimizes (3.2), and we derive first-order conditions which this control must satisfy.

Remark 3.2.1 In this chapter, two types of variations (or perturbations) are considered: strong and weak variations. These are depicted graphically in Figure 3.1. Note that the strong variation is characterized by large variations over a very short interval; contrast this with the weak variations, which are small perturbations over a large time interval. Note also that under the correct conditions (to be introduced in this chapter), both weak and strong variations in the control can produce "small" state variations.

Figure 3.1: Depiction of weak and strong variations.

3.2.1 Linear Ordinary Differential Equation

Under the assumptions made in Section 3.2 it is well known that (3.1) has for each u(·) a unique solution defined on [t0, tf]. This solution [8] is given by

    x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ)dτ,   (3.3)

where Φ(·, ·) is an n × n matrix function of t and τ (the transition matrix corresponding to A(·)) which satisfies the homogeneous ordinary differential equation

    (d/dt)Φ(t, τ) = A(t)Φ(t, τ),   (3.4)
    Φ(t0, t0) = I.   (3.5)

Let the control function uo(·) generate the trajectory xo(·), and suppose we perturb the control function uo(·) by adding to it an arbitrary piecewise continuous function εη(·), where ε is a small positive scalar parameter. Let the trajectory that results as a consequence of the control function uo(·) + εη(·) be xo(·) + ξ(·, ε). We have from (3.3) that

    xo(t) + ξ(t, ε) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, τ)B(τ)[uo(τ) + εη(τ)]dτ,   (3.6)

and it then follows that

    ξ(t, ε) = ε ∫_{t0}^{t} Φ(t, τ)B(τ)η(τ)dτ.   (3.7)

Upon defining

    z(t, η(·)) = ∫_{t0}^{t} Φ(t, τ)B(τ)η(τ)dτ,   (3.8)

we see that ξ(t, ε) = εz(t, η(·)), so that by linearity perturbing uo(·) by εη(·) perturbs xo(·) by a function of exactly the same form.
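For constant A the transition matrix is Φ(t, τ) = e^{A(t−τ)}, and (3.3) can be evaluated by quadrature. The sketch below (the system, control, and grid size are illustrative assumptions; the double-integrator example of Section 3.2.6 below is used) checks the formula against the exact values:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [0.0, 0.0]])     # double integrator: x1' = x2, x2' = u
    B = np.array([[0.0], [1.0]])
    x0, tf = np.array([1.0, 0.0]), 2.0
    u = lambda t: -(tf - t)                    # control used in the example below

    Phi = lambda t, tau: expm(A * (t - tau))   # transition matrix for constant A
    taus = np.linspace(0.0, tf, 2001)
    integrand = np.array([(Phi(tf, tau) @ B)[:, 0] * u(tau) for tau in taus])
    x_tf = Phi(tf, 0.0) @ x0 + np.trapz(integrand, taus, axis=0)   # formula (3.3)
    print("x(tf) via (3.3):", x_tf, "   exact for these values: [-5/3, -2]")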
Remark 3.2 The notation used in this book can be compared to the variational notation like that used in [11] by noting (for example) u(·) = uo (·) + δu(·) ∈ U ⇒ δu(·) ≤ ε1 . ε) is piecewise continuous in t and O(t.1) to the performance criterion (3.10) 3. η(·)). as follows: ˆ J (u(·). we may integrate by parts in (3. so that nothing is gained or lost by this step. in J.15) is a linear ordinary differential equation in λ with piecewise continuous coefficients. x0 ) 59 (3.1) holds.14) Note that the coefficient of a first-order change in λ is zero..3. 3. λ(·). Defining the change in the performance index as ˆ ˆ ˆ ∆J = J([uo (·) + εη(·)]. ˆ J (u(·). The right-hand (3.2. uo (t). x0 ) = J (u(·). having a unique solution. x0 ) + T tf t0 T ˙ λT (t)x(t) + λT (t)A(t)x(t) + λT (t)B(t)u(t) dt (3. t)η(t) + ελT z(t. uo (t).e. η(·)) + εLu (xo (t). x0 ) = J (u(·).11) to obtain ˆ J (u(·).13) + λ (t0 )x0 − λ (tf )xf . η(·)) + ελT (t)B(t)η(t) + O(t. λ(·).12) when (3. λT (tf ) = φx (xo (tf )). η(·)) + O(ε). Now let us choose ˙ −λT (t) = Lx (xo (t). η(·)) + ελT (t)A(t)z(t.2. λ(·). x0 ) − J(uo . Linear Dynamic Systems Note that when the differential constraint is satisfied. (3. Because of the assumed differentiability of λ(·) with respect to t. λ(·). t)z(t.4 Expansion of J ˆ ˆ We now evaluate the change in J (i. This is a legitimate choice for λ(·) as (3. x0 ) leads to the following expansion: ˆ ∆J = tf t0 ˙ εLx (xo (t). η(·)) + εφx (xo (tf ))z(tf . uo (t).15) . since J = J when the dynamics are satisfied) brought about by changing uo to uo (·) + εη(·). ε) dt − ελT (tf )z(tf . since it adjoins the system dynamics. t) + λT (t)A(t). The left-hand side of (3. because of our assumptions on L. so −ε tf t0 Lu (xo (t).16) yields ˆ ∆J = −ε tf t0 T (3. that a necessary condition for uo (·) to minimize J and thus J is that Lu (xo (t).19) is dominant.20) where the variation is nonnegative because uo (·) minimizes J. then. 3. Therefore. The multiplier λ(·) is obtained by (3.2.21) ˆ It follows. t)η(t) + λT (t)B(t)η(t) dt O(t. ε)dt + O(ε). uo (t). t) + λT (t)B(t) 2 tf dt + t0 O(t. the first term in (3.5 Necessary Condition for Optimality Let uo (·) be the minimizing control.14) then simplifies to ˆ ∆J = ε + t0 tf t0 tf Lu (xo (t). tf ]. t) + λT (t)B(t) = 0 (3.16) (3.18) Lu (xo (t). tf ]. uo (t). uo (t). Substituting (3.22) for all t in [t0 .18) into (3. t) + λT (t)B(t) = 0 ∀t ∈ [t0 . As η(·) can be any piecewise continuous function on [t0 . ε)dt + O(ε) ≥ 0. t) + λT (t)B(t) . .60 Chapter 3. (3. and B. (3. xo (·) is the resulting optimal path.15) and satisfies the assumption that λ(·) is continuously differentiable. uo (t).19) For ε small and positive. uo (t). (3. Systems with General Performance Criteria side of (3. tf ] we may. set η(t) = − Lu (xo (t). t) + λT (t)B(t) 2 dt ≥ 0. ε)dt + O(ε) tf t0 (3. λ.20) is negative if Lu (xo (t). uo (t).17) ˆ = εδ J + O(t. uo (t). where ˙ −λT (t) = Lx (xo (t).2. ˙ x1 (t0 ) = x10 . Note 3. viz.2.2.. uo (t).2 These necessary conditions have converted a functional minimization into a function minimization at each point in time. u(t). t) = L(x(t).2.15) we have the following theorem. x2 (t0 ) = x20 (3. t) + λT (t)[A(t)x(t) + B(t)u(t)]. t) + λT (t)A(t). λT (tf ) = φx (xo (tf )).1 Suppose that uo (·) minimizes the performance criterion (3.2. An Example Illustrating Pontryagin’s Necessary Condition Let us consider the system x1 (t) = x2 (t). u(t).2).2. the partial derivative of the Hamiltonian with respect to the control u(t) is zero when evaluated at xo (t).2.3. uo (t).2.24) (3. Then.22) and (3. 
Theorem 3.26) (3. λ(t).4. under Assumptions 3. ˙ x2 (t) = u(t). Let us define the Hamiltonian H(x(t).25) (3. Then from (3. t) = 0.1–3.5 can be restated in terms of a particular Hamiltonian function to yield a weak version of Pontryagin’s Principle for the optimal control problem formulated in Section 3. Linear Dynamic Systems 61 3. Hu (xo (t).6 Pontryagin’s Necessary Condition for Weak Variations The necessary condition for optimality of uo (·) derived in Section 3.23) . λ(t). 62 Chapter 3. and it follows from Equations (3.29) Note that we have not shown that u(·) minimizes J but only that it satisfies a condi˜ tion that an optimizing control must satisfy. u. ˙ 2 (t) = Hx = λ1 (t). x ˜ (3. λ1 (tf ) = 1.27) is an optimizing control.31).33) (3. The Hamiltonian for this problem is given by 1 H(x. that it satisfies a necessary condition for optimality. t) = 0. λ. (3. λ. . (3. u(t). we check whether the necessary condition developed above is satisfied. i.32) that ˜ Hu (˜(t). u. Systems with General Performance Criteria and the performance functional J(u(·).30) (3. λ(t).. (3. although the necessary conditions can explicitly determine this control. x0 ) = x1 (tf ) + We guess that u(t) = −(tf − t) ˜ (3.29) we have that Hu (x. i. t) = u2 + λ1 x2 + λ2 u 2 so that ˙ −λ1 (t) = Hx1 = 0. and (3. In order to test whether this is a candidate for an optimal control.32) λ2 (t) = (tf − t).28). λ2 (tf ) = 0.28) 1 2 tf t0 u2 (t)dt.e. From (3.e..32).31) (3. −λ 2 Hence we obtain λ1 (t) = 1. t) = u + λ2 . see (3. ˜ ˆ and the minimum value of J(u(·).31) to specify λ(·) in (3. In the example above. 6 (3. λ(·).2. λ(·).34) + x1 (tf ) + λ1 (t0 )x10 + λ2 (t0 )x20 − λ1 (tf )x1 (tf ) − λ2 (tf )x2 (tf ). determine Λ(t).30) and (3. x0 ) is then ˆ min J(u(·). Using (3. λ(·).34) yields ˆ J(u(·).38) Problems 1. x0 ) = − u (3. i.3.36) and this clearly takes on its minimum value when we set u(t) = u(t) = −(tf − t). assume λ(t) = P (t)x(t). Find a differential equation for P and a feedback control law in the form uo (t) = Λ(t)x(t). .e. Now the only term that depends upon u(·) is the integral.26) to J using λ(·) and integrate by parts to obtain ˆ J(u(·).35) + λ1 (t0 )x10 + λ2 (t0 )x20 . x0 ) tf = t0 1 2 ˙ ˙ u (t) + λ1 (t)x2 (t) + λ2 (t)u(t) + λ1 (t)x1 (t) + λ2 (t)x2 (t) dt 2 (3.. which we can write as 1 2 tf t0 (u + tf − t)2 − (tf − t)2 dt. (3. x0 ) = tf t0 1 2 u (t) + (tf − t)u(t) dt 2 (3. λ(·). Linear Dynamic Systems 63 Using the Lagrange multiplier method we now prove directly that u(·) is indeed ˜ minimizing. We adjoin (3.37) 1 tf (tf − t)2 dt + λ1 (0)x10 + λ2 (0)x20 2 t0 1 = − (tf − t0 )3 + x10 + (tf − t0 )x20 . ·) is once continuously differentiable in x and u and continuous in t on the interval [t0 . t). Determine the Pontryagin necessary conditions for this problem. Consider the cost criterion tf J = φ1 (x(tf )) + φ2 (x(t1 )) + t0 L(x. described by x(t) = f (x(t). ˙ x(t0 ) = x0 . where f (·. Assumption 3. .39) subject to x = Ax + Bu.41) has a unique solution x(·) defined on [t0 .3. u(t). ·) is an n vector function of its arguments. (3. uo (·)) in order to obtain an expression analogous to (3. 3.40) and t0 < t1 < tf with t1 and tf fixed. ·. t)dt (3. Systems with General Performance Criteria 2.1 The n vector function f (·. ˙ x0 = x(t0 ) given.2 Equation (3. We make the following assumptions.3. u. tf ]. (3. Our first task is clearly to investigate the behavior of this nonlinear ordinary differential equation in a neighborhood of a trajectory-control pair (xo (·). 
tf ] for each piecewise continuous control function u(·).41) Assumption 3.8).3 Nonlinear Dynamic System We here extend the results thus far obtained to the case where the dynamic system is nonlinear. ·.64 Chapter 3. [uo (t) + εη(t)]. the quadratic equation x(t) = x(t)2 . tf ].1 Perturbations in the Control and State from the Optimal Path Let the control function uo (·) generate the unique trajectory xo (·).3.3. where ξ(t0 . Nonlinear Dynamic System 65 Actually. (3. uo (t). ˙ has the unique solution x(t) = which ceases to exist at t = defined on [t0 . t).44) Now because f (·. ˙ so that ˙ ξ(t. This implies that (3.41) we have that ˙ xo (t) + ξ(t. [uo (t) + εη(t)].3. Let the trajectory that results as a consequence of the control function uo (·) + εη(·) be xo (·) + ξ(·. ε) = f ([xo (t) + ξ(t.42) x0 . For example. t). where ε is a positive scalar parameter. it is necessary only to assume existence on [t0 . ·) satisfies Assumption 3. it is well known in differential equation theory [6] that ξ(·.43) + t0 . and suppose that we perturb this control function by adding to it a piecewise continuous function εη(·). tf ] if tf ≥ 1 x0 1 x0 x(t0 ) = x0 > 0.1.42) does not have a solution + t0 (a finite escape time). From (3.3. t) − f (xo (t). ε)]. tf ]. Note that it is necessary in the case of nonlinear dynamic systems to make an assumption about existence of a solution on the interval [t0 . 1 − x0 (t − t0 ) (3.1. 3.3. ·.45) (3. ε). ε) = 0. ε) is once continuously differentiable with respect to ε (Bliss’s . as uniqueness of solutions follows from Assumption 3. (3. ε) = f ([xo (t) + ξ(t. ε)]. λ(·). t) as follows: f ([xo (t) + ξ(t. η(·)) + O(t. Adjoin (3.45). x0 ) = J (u(·). which is analogous to (3.e. ε) with respect to ε. continuously differentiable. uo (t). (3. (3. from (3. ε). we expand f (·..41) to the performance criterion (3. ξε (·) is propagated as ˙ ξε (t) = fξ (xo (t). ε) dt − ελT (tf )z(tf . t)η(t). uo (t). t)z(t. [uo (t) + εη(t)] .66 Chapter 3. ˆ Evaluate the change in J (i.46) where. x0 ) tf = t0 εLx (xo (t). uo (t). t)z(t. t)η(t) + O(t. Consequently we can write ξ(t. x0 ) = J (u(·). η(·)). tf ]. t)η(t) ˙ + ελT (t)z(t. ε) = εz(t. (3. the partial derivative of ξ(·. η(·)) + εφx (xo (tf ))z(tf . t) = f (xo (t).49) where λ(t) is an as yet undetermined. t) dt (3. x0 ) − J (uo (·). λ(·). Systems with General Performance Criteria Theorem). uo (t). t) − x(t)] dt. η(·)) + εfu (xo (t). t)η(t) + ελT (t)fu (xo (t). Having established this intermediate result. t) + εfx (xo (t). n vector function of time on [t0 . ε). ˙ (3. η(·)) + O(t.50) + λ (t0 )x0 − λ (tf )x(tf ).48) λT (t) [f (x(t). λ(·). t)z(t. uo (t).51) . u(t). u(t). uo (t). uo (t). λ(·). η(·)) + εLu (xo (t). x0 ) + T tf t0 T ˙ λT (t)x(t) + λT (t)f (x(t). uo (t). η(·)) + O(ε).8).47) where ξε (·) = z(t. in J) brought about by changing uo (·) to uo (·) + εη(·): ˆ ˆ ˆ ∆J = J (uo (·) + εη(·). ε)] . t)ξε (t) + fu (xo (t).2) as follows: ˆ J (u(·). ·. Integrating by parts we obtain ˆ J (u(·). η(·)) + ελT (t)fx (xo (t). uo (t). x0 ) + tf t0 (3. (3.3. λ(t). t) + λT (t)f (x(t). which.1 Suppose uo (·) minimizes the performance criterion (3. t) .53) ∆J = ε t0 [Hu (x (t). Nonlinear Dynamic System Define the variational Hamiltonian as H (x(t). λ(t). The right-hand side of (3.5 then yields the following weak version of Pontryagin’s Principle. λ(t). λ(t).3..55) . u(t).2. as with (3. and set ˙ −λT (t) = Hx (xo (t). 3.3. (3.3. is legitimate. uo (t).51) then becomes tf 67 (3. uo (t).2–3.1–3. λ(t). u(t). uo (t). u(t). 
the partial derivative of the Hamiltonian with respect to the control u(t) is zero when evaluated at xo (t). t) . λT (tf ) = φx (x(tf )).54) which is identical in form to (3. λT (tf ) = φx (x(tf )).2.15). where ˙ −λT (t) = Hx (xo (t).41) and that H(x(t).52). The same reasoning used in Section 3. t) is defined according to (3. ε)dt + O(ε).2 Pontryagin’s Weak Necessary Condition Theorem 3. t) = L (x(t).16).52) (3. λ(t).2) subject to the dynamic system (3.3. t) = 0. t) . t) η(t)] dt + t0 o o tf O(t. uo (t). Then. under Assumptions 3.56) (3.57) (3. u(t).4 and 3.2.3. Hu (xo (t).2. u (t). viz. ˙ v = g sin θ.58) (3.41). t) = H (x(t).2. then H (x(t). u(t). λ(t)) = 0 by using (3. we take a slightly modified form.60) . 3.59) (3.52) is not an explicit function of t. (3. In this formulation. the class of continuously differentiable functions. v(0) = 0. i. ˙ z = v sin θ. However.56). and (3.3. where v is the velocity and θ is the control variable. λ(t)) and u(·) ∈ Uc . H(x(t). Here. this was not required in Section 3. Systems with General Performance Criteria Clearly this Theorem is exactly the same as Theorem 3. ˙ and the cost criterion is J = φ(x(tf )) = −r(tf ). λ(t).2.1. u(t). z(0) = 0. u(t).e.1. u(t).3.68 Chapter 3. we have the dynamics r = v cos θ. (3. where the linearity of (3. t). λ(t)) is a constant of the motion along the ˙ optimal path. Remark 3..1 If H as defined in (3. (3. it is necessary to invoke Bliss’s Theorem on differentiability with respect to a parameter of the solution of nonlinear ordinary differential equations. This is easily shown by noting that H (x(t).3 Maximum Horizontal Distance: A Variation of the Brachistochrone Problem We return to the brachistochrone example of Section 2. where we maximize the horizontal distance r(tf ) traveled in a fixed time tf . u(t).1 with A(t)x(t) + B(t)u(t) replaced by f (x(t).61) r(0) = 0.1) sufficed.55). (3. and λv (tf ) = 0 =⇒ θ(tf ) = 0. H = C. we have λr (t) ≡ −1. Since H is invariant with respect to time. The necessary conditions from Theorem 3.67) λr (tf ) = −1. (3. and ∂H = −λr v sin θ + λv g cos θ = 0.66) Note that v(0) = 0 implies that θ(0) = π/2.3.63) r v .3. (3. λv (tf ) = 0. tan θ(t) = λv (t)g λv (t)g =− . These are characteristics of the classic solution to the problem. θ) = ˙ v cos θ g sin θ . . tf ]. λ) = λT f = λr v cos θ + λv g sin θ.1 are ˙ λr = 0. ∂θ From (3.68) (3.64) where we have taken the elements of the Lagrange multiplier vector to be λr and λv .65) (3. u.3.69) t ∈ [0.3.67). (3. We thus reduce the state space to x= The dynamics can be written x = f (x. ˙ λv = −λr cos θ. t. λr v(t) v(t) (3.62) The variational Hamiltonian for this problem is H(x. and from (3. where C is a constant (see Remark 3. (3. Nonlinear Dynamic System Necessary Conditions 69 We first note that the value of z appears in neither the dynamics nor the performance index and may therefore be ignored.65).1). a1 = 0.74) (3.69) and (3. π π (3. v(0) = 0 and. At t = tf .58) gives r=− ˙ v2 C (3. λv (tf ) = 0 = a2 cos( gtf C gtf ) C ⇒ = − π ⇒ sin 2 gtf C = −1.71) (3. C C λv (t) = a1 sin (3.76) sin gt . C C g g ˙ −C λv = v(t) = −a1 g cos t + a2 g sin t.64) reduces to v cos θ = − . (3.75) At t = 0. (3.73) is g g t + a2 cos t. then C r = v cos θ = − ˙ πt 2gtf v2 = sin2 . Then C = −v(tf ) = Since v(t) = 2gtf π −2gtf 2tf = ga2 ⇒ a2 = − .70 Chapter 3.66) results in an oscillator as v ˙ ˙v λ = 0 g 2 /C −1/C 0 v λv . from (3. therefore.73) The solution to (3.70) into (3.70) and into (3.70) C = −v(tf ). Since v(tf ) is positive and thus C is negative. 
v(0) λv (tf ) = 0 0 .77) .69) and H = C. Therefore. θ(tf ) = 0.60) and (3.70) together gives sin θ = Introducing (3. C Using (3. note that at tf . C (3. To determine C. v(tf ) = −ga2 = 0. From (3. Systems with General Performance Criteria This problem can be solved in closed form.72) λv g . C π 2tf (3. since θ(tf ) = 0. One might also wish to generate (construct) a control function u(·) which satisfies ˜ Pontryagin’s Principle. and λ(·) so that (3. 3.6 we verified that a certain control function.55) can be ˜ ˜ tested. to obtain x(·).1.41).41) is . τ t t gτ 2t vv ˙ = gτ sin cos = sin . for a specific example. dt = 2tf 4 τ τ gt2 f . For a general nonlinear dynamic system this “verification” is also possible. numerically if necessary.4 Two-Point Boundary-Value Problem In Section 3. This was possible without resorting to numerical (computer) methods because of the simplicity of the example. ˜ ˜ Having done this. π (3.79) (3. Nonlinear Dynamic System This integrates to 2gtf r(t) = π where τ = direction. First.80) where the value at tf is z(tf ) = 2g. u(·) from the “initial” value λT (tf ) = φx (˜(tf )). g τ τ 2 τ 2tf .2.56) back- x ward in time along the path x(·).78) In the vertical The maximum horizontal distance is r(tf ) = (3.3.78) and (3. ˜ One then integrates (3. π tf 0 71 sin2 πt 2t gτ 2 2t − sin . This is more difficult.3. one then has in hand x(·). one integrates (3. since the initial condition for (3. with u(·) = u(·).3. Note that (3. where u(·) is the ˜ ˜ control function to be tested. satisfied the conditions of Theorem 3.80) are the equations for a cycloid. z = v sin θ = ˙ This integrates to z(t) = g 1 − cos 2t . u(·).2. .56) cannot be integrated simultaneously from t = t0 or from t = tf . we have a so-called two-point boundary-value problem: (3. the converges rate becomes slow. whereas only the final condition for (3. Solving the Two-Point Boundary-Value Problem via the Steepest Descent Method The following is used in the procedure of the following steepest descent algorithm. called steepest descent. The steepest descent method does not converge to an extremal trajectory that satisfies the first-order necessary conditions but is not locally minimizing. iterates on Hu until the optimality condition Hu = 0 is approximately satisfied. The difficulty with this approach is that as Hu becomes small. The socalled linear quadratic optimal control problem is one class of problems in which the two-point boundary-value problem can be converted into an ordinary initial value problem.7. In contrast to the steepest descent method. Systems with General Performance Criteria known at t = t0 .41) and (3. convergence can be very slow.e.4. This method. A numerical optimization method is presented which may converge to a local optimal path. Steepest descent usually converges initially well from the guessed control sequence. Another difficulty with the shooting method is that it may try to converge to an extremal trajectory that satisfies the first-order necessary conditions but is not locally minimizing (see Chapter 5). the shooting method converges very fast in the vicinity of the the optimal path. i. a second numerical optimization method called the shooting method is described in Section 5. For comparison. Usually one has to resort to numerical techniques to solve the two-point boundary-value problem. this class of problems is discussed in Chapter 5. if the initial choice of the boundary conditions is quite far from that of the optimal path. 
it satisfies the optimality condition explicitly on each iteration but requires converging to the boundary conditions. However. .72 Chapter 3.56) is known at t = tf . 1 The relationship between a first-order perturbation in the cost and a perturbation in the control is δJ = φx (x (tf ))δx(tf ) + t0 tf i tf (Lx (xi (t).86) . Choose the nominal control. t)δu)dt (3. ui (t).3. Integrate the adjoint λi equation backward along the nominal path λi (tf ) = φT (xi (tf )). t)δu(t). From Note 3. ui (t). 2. ui (t). ui (t). ui (t).81) is obtained from the first-order perturbation δx generated by the linearized differential equation δ x(t) = fx (xi (t).. i. xi (·). ui (t). t)δx(t) + Lu (xi (t). t))δudt.3. t) (3. t)T − fx (xi (t). t) to determine the control for the (i + 1)th iteration as T ui+1 (t) = ui (t) − Hu (xi (t).3. t) + λiT (t)fu (xi (t). ui (t). t) = Lu (xi (t). Nonlinear Dynamic System 73 Note 3. t). ui (t). integrate forward from t0 to tf .3. 4. where the perturbed cost (3. ui (t).81) = t0 (Lu (xi (t). ˙ adding the identically zero term tf d (λiT δx)dt t0 dt (3. ui (t).84) T Form the control variation δu(t) = − Hu (xi (t). ui (t).85) (3. 3. at iteration i = 1. Generate the nominal state path.e.83) (3. t)T λi (t). λi . t) + λiT (t)fu (xi (t). ui (t). 1. The steps in the steepest descent algorithm are given in the following procedure.1 let Hu (xi (t).82) − φx (xi (tf ))δx(tf ) to the perturbed cost. (3. ui ∈ U. t)δx(t) + fu (xi (t). ui (t). and by assuming that the initial condition is given. x ˙ λi (t) = −Lx (xi (t). u(t). 2. If not. tf ]. Systems with General Performance Criteria for some choice of > 0. if uo (·) minimizes J. t) ∀t ∈ [t0 .3. ˙ 1. t) = 0. The perturbed cost criterion (3.88) but (3. λ(t). x(0) = x0 = 1 3. λ(t). Problems Solve for the control u(·) ∈ U that minimizes J= 1 x(10)2 + 2 10 0 (x2 + 2bxu + u2 )dt subject to the scalar state equation x = x + u. which preserves the assumed linearity. however. analytically with b = 0. .87) 5. (3.89) is stronger as it states that uo (t) minimizes with respect to u(t) the function H(x(t). t) ≤ H (xo (t).74 Chapter 3. t) 2 dt. then H (xo (t). −10. numerically by steepest descent by using the algorithm in Section 3. uo (t).4 Strong Variations and the Strong Form of the Pontryagin Minimum Principle Theorem 3. (3. ui (t). That is. then stop. u(t).4. that a stronger statement is possible.81) is approximately δJ = − tf t0 Hu (xi (t). ui (t).1 is referred to as a weak Pontryagin Principle because it states only that Hu (xo (t).89) implies (3. −1.88) It turns out. t) is sufficiently small over the path. λ(t). If Hu (xi (t).3. uo (t). go to 2. λ(t).89) Clearly (3. t). (3. As we will now show. t ). instead of introducing a perturbation εη(·) which can be made small by making ε small. η(·) = 0. In order to prove this strong form of Pontryagin’s Principle one introduces the notion of strong perturbations (variations). Here a perturbation η(·) is made to uo (·) which may be large in magnitude but which is nonzero only over a small time interval ε. ·) is once continuously differentiable in x and continuous in u and t where t ∈ [t0 . η(·)) with respect to t allows us to write ˙ ξ (t. ¯ ¯ Since η(·) = 0 on [t0 .92) . (t + ε. η(·)) = f (xo (t) + ξ(t.3 and 3. t) and ξ t.3. η(·)) = (t − t)ξ t.90) ξ (t. the assumption of continuous differentiability with respect to u can be relaxed. η(·) + O(t − t) (3. this type of perturbation still results in a small change in x(·). so that t (3. In other words. η(·)). uo (τ ) + η(τ ). t + ε] we have ˙ ξ (t. ·. 
η(·)) = t [f (xo (τ ) + ξ(t. η(·)) = x(t)−xo (t) ¯¯ is zero on this interval. Assumption 3. tf ]. ·. the piecewise differentiability of ξ(·. uo (τ ). τ )] dτ (3. Assumptions 3. t ≤ t ≤ t + ε and set η(·) = 0 on the intervals [t0 . uo (t) + η(t). we have x(t) = xo (t) on [t0 . Strong Variations and Strong Form of Pontryagin Minimum Principle 75 In the following derivation.3. uo (t). τ ) − f (xo (τ ). tf ]. t) − f (xo (t).2. η(·)).4. Clearly. t ] and ξ(t. Therefore. t ). ·) and the n vector function f (·.91) ¯ ¯ for t ≤ t ≤ t + ε.1 are replaced by the following. we introduce a not necessarily small continuous ¯ ¯ ¯ ¯ perturbation η(t).1 The scalar function L(·.4. On the interval [t. [uo (t) + η(t)] . η(·)) + O(t. [uo (t) + η(t)] . η(·)) = εz (t. t (3.96) (3. η(·) = εξ t. t) z (t.98) (3. [uo (t) + η(t)] . η(·)) + O(t. t) + (t − t)fx (xo (t).97) (3. η(·) + O(t − t) . uo (t). ε). [uo (t) + η(t)] .94) ˙ = L (xo (t). t) and ¯ t ∈ (t + ε. Systems with General Performance Criteria ˙ ξ t + ε.95) ˆ Using (3. uo (t).50) we evaluate the change in J (i. uo (t). η(·) + O(ε). we can expand L for t ≤ t ≤ t + ε as L ˙ xo (t) + (t − t)ξ t.76 and Chapter 3. t) + εfx (xo (t). tf ]: . t) z (t.e. η(·)) + O(t.3 that for t ∈ [t + ε. t) = L (xo (t). t ˙ = f (xo (t). ¯ ¯ Similarly. t) ξ t. [uo (t) + η(t)] . ε)] . uo (t). t) + (t − t)Lx (xo (t). η(·)) + O(t. uo (t).. for t ≤ t ≤ t + ε f ˙ xo (t) + (t − t)ξ t. uo (t). whereas η(·) = 0 for t ∈ [t0 . [uo (t) + η(t)] . ¯ and for t + ε < t ≤ tf L ([xo (t) + εz (t. ¯ ¯ Because of its differentiability. ε)] . η(·)) + O(t. ε). η(·) + O(t − t) . t) + εLx (xo (t). tf ] ξ (t. η(·) + O(t − t). t) ξ t. (3. ε). ¯ and for t + ε < t ≤ tf f ([xo (t) + εz (t. in J) brought about by the strong ¯ ¯ ¯ variation uo (·) to uo (·) + η(·) over t ∈ [t.93) (3. t) = f (xo (t). η(·) + O(t − t). ¯ It then follows along the lines indicated in Section 3. t + ε]. t) − L (xo (t). λ(t). (t + ε)) ˙ ˙ ¯ + λT (t + ε) ξ t. [uo (t) + η(t)] . [uo (t) + η(t)] . η(·) + O(t − t) dt + tf ¯ t+ε O(t. uo (t). η(·) + λT (t) [f (xo (t). tf ] in (3. Strong Variations and Strong Form of Pontryagin Minimum Principle ˆ ˆ ˆ ∆J(·. (t + ε)) ¯ ¯ ¯ ¯ − H (xo (t + ε). uo (t). x0 ) t+ε 77 = t L (xo (t). [uo (t + ε) + η(t + ε)] . (3. t) z(t. λ(t). uo (t). uo (t). λ(t + ε). ε)dt + O(ε). t) ˙ +(t − t)Lx (xo (t). ε)dt + O(ε). t) ˙ ¯˙ + (t − t)Hx (xo (t). λ(t + ε). t) − H (xo (t). η(·)) + ελT (t)fx (xo (t).99) ¯ By using the definition of H from (3. (t + ε)) ¯ ¯ ¯ ¯ ¯ + ε2 Hx (xo (t + ε).101) . [uo (t) + η(t)] . η(·)) + O(ε). [uo (t + ε) + η(t + ε)] . x0 ) − J(uo (·).100) ¯ The first integral in (3. t) z(t. t) − f (xo (t). λ(·). η(·) + εO(ε) + tf ¯ t+ε O(t. (3.100) can be expanded in terms of ε around the point t + ε to yield ˆ ¯ ¯ ¯ ¯ ¯ ∆J = ε H (xo (t + ε). t) ξ t. ·. ·) = J(uo (·) + η(·). λ(t + ε). η(·) + O(t − t) dt tf + t+ε εLx (xo (t). η(·)) + εφx (xo (tf ))z(tf . η(·) ˙ ˙ + (t − t)λT (t)ξ t. λ(·). t)] ˙ + (t − t)λT (t)fx (xo (t).52) and applying (3. [uo (t) + η(t)] .3. η(·)) + O(t. η(·)) ˙ + ελT (t)z(t.4. uo (t). uo (t + ε). (3. [uo (t) + η(t)] .53) over the interval (t +ε. [uo (t) + η(t)] .99). λ(t). ε) dt + ελT (tf )z(tf . we obtain t+ε ˆ ∆J = t H (xo (t). t) ξ t. t) + (t − t)λT (t) ξ t. 4.103) Remark 3. we conclude that H(xo (t).1.102) ¯ ¯ As t and ε are arbitrary and as the value of η(t + ε) is arbitrary.52).103) is classically known as the Weierstrass condition.1 Suppose uo (·) minimizes the performance criteria (3. λ(t). Theorem 3. uo (t). t) ≥ 0. Then.3. (3.2. we have the necessary condition ¯ ¯ ¯ ¯ ¯ H(xo (t + ε). 
u(t).4. Systems with General Performance Criteria For small ε.104) ∀ t in [t0 . then necessary conditions for optimality are Hu (xo (t). 3.2 If a minimum of H(xo (t). u(t). . and 3. λ(t + ε). t) is minimized with respect to u(t) at u(t) = uo (t). λ(t). Remark 3.2. the dominant term in (3. t) is twice continuously differentiable with respect to u. uo (t). uo (t). λT (tf ) = φx (xo (tf )).. and as (3. t + ε) . t) is defined according to (3. uo (t + ε). u(t). u. t).2. under Assumptions 3.1 Equation (3. t) is found through (3. λ(t). (3. 3.103) at uo (t) and H(xo (t). We have thus proved the following theorem. uo (t).4. u(t). λ(t).101) is the first one. t) = 0 and Huu (xo (t).105) The classical Legendre–Clebsch condition is Huu ≥ 0 and Huu > 0 is the strong form.4.4. λ(t). (3. (3. λ(t + ε). the Hamiltonian is minimized with respect to the control u(t) at uo (t). λ.99) to be nonnegative. t + ε) ¯ ¯ ¯ ¯ ≥ H(xo (t + ε). λ(t).101) must be nonnegative for the left-hand side of (3. λ(t). viz. H(xo (t).2) subject to the dynamic system (3.2.78 Chapter 3. λ(t). tf ]. t) where ˙ −λT (t) = Hx (xo (t). t) ≤ H(xo (t).41) and that H(x. uo (t + ε) + η(t + ε). (3. (3. Specifically. we may have the scalar control u(t) bounded according to −1 ≤ u(t) ≤ 1 where the set UB is defined as UB = {u(·) : −1 ≤ u(·) ≤ 1} . we see that the only change required in our derivation when u(t) ∈ UB is that uo (t) ∈ UB and that η(·) is chosen so that uo (t) + η(t) ∈ UB for all t in [t0 . in many (engineering) applications the size of the controls that can be applied is limited.2: Bounded control. tf ]. Therefore. .3.4. However. for example.1 Control Constraints: Strong Pontryagin Minimum Principle So far. we have allowed the control u(t) to take on any real value.4.107) Referring to (3. uo (t) is found for u(t) ∈ UB which Figure 3. Strong Variations and Strong Form of Pontryagin Minimum Principle 79 3. tf ]. tf ].101).106) where UB is a subset of m-dimensional bounded control functions.108) ∀ t in [t0 . For example. we may have bounded control functions in the class u(t) ∈ UB ∀ t in [t0 . (3. H(xo (t). uo (t). (3. tf ].3.1 notes that if u(·) ∈ Uc . 3. where uo (t). tf ]. t). xo (t− ) = xo (t+ ). λo (t− ) = λo (t+ ).110) (3.4.111) (3. λ(t). The optimal control may be not differentiable at a finite number of points because either it goes from being unconstrained to being on its bound or it is discontinuous by jumping between two (or more) equal minima of H. tf ]. λo (t− ). i. minimizes the performance criterion (3. An example of H versus u for bounded control is shown in Figure 3. u(t) ∈ UB and ˙ −λT (t) = Hx (xo (t). Consequently. e.41) and the constraint that u(t) ∈ UB for all t ∈ [t0 . Hence. λT (tf ) = φx (xo (tf )).52). λ(t). λ(t).2) subject to the dynamic system (3. Then.2. the Hamiltonian is minimized with respect to the control u(t).1.2.80 Chapter 3. uo (t− )) = H(xo (t+ ).2 Suppose that uo (t) ∈ UB . subject to the control constraint u(t) ∈ UB . With this modification we have proved the following necessary condition for optimality of uo (·). is continuous d d d d d d .3. the class of piecewise continuously differentiable and bounded functions. Systems with General Performance Criteria minimizes H. t ∈ [t0 .e. Theorem 3.. Now let u(·) ∈ UcB .4.2.109) Remark 3.2.g. H(xo (t− ). and that H(x(t). t) ≤ H(xo (t). viz. u(t). λo (t+ ). u(t). tf ].3 Remark 3. λ(t). since u0 (·) is chosen to minimize H d d d d at any time.4. and 3. even if the optimal control is not differentiable at. t) ∀ t in [t0 . 3.4.. under Assumptions 3. 
3.4.1 Control Constraints: Strong Pontryagin Minimum Principle

So far we have allowed the control u(t) to take on any real value. However, in many (engineering) applications the size of the controls that can be applied is limited. We may therefore restrict attention to bounded control functions in the class

  u(t) ∈ UB  ∀ t in [t0, tf],  (3.106)

where UB is a subset of m-dimensional bounded control functions. For example, we may have the scalar control u(t) bounded according to −1 ≤ u(t) ≤ 1, where the set UB is defined as

  UB = {u(·) : −1 ≤ u(·) ≤ 1}.  (3.107)

Referring to (3.101), we see that the only change required in our derivation when u(t) ∈ UB is that uo(t) ∈ UB and that η(·) is chosen so that uo(t) + η(t) ∈ UB for all t in [t0, tf]. Therefore,

  H(xo(t), uo(t), λ(t), t) ≤ H(xo(t), u(t), λ(t), t),  u(t) ∈ UB,  ∀ t in [t0, tf].  (3.108)

An example of H versus u for bounded control, where uo(t) minimizes H over UB, is shown in Figure 3.2 (Figure 3.2: Bounded control). With this modification we have proved the following necessary condition for optimality of uo(·).

Theorem 3.4.2. Suppose that uo(t) ∈ UB, t ∈ [t0, tf], minimizes the performance criterion (3.2) subject to the dynamic system (3.41) and the constraint that u(t) ∈ UB for all t ∈ [t0, tf], and that H(x(t), u(t), λ(t), t) is defined according to (3.52). Then, under Assumptions 3.2.3, 3.2.4, and 3.4.1, the Hamiltonian is minimized with respect to the control u(t), subject to the control constraint u(t) ∈ UB, viz.,

  H(xo(t), uo(t), λ(t), t) ≤ H(xo(t), u(t), λ(t), t),  u(t) ∈ UB,  ∀ t in [t0, tf],  (3.109)

where

  −λ̇ᵀ(t) = Hx(xo(t), uo(t), λ(t), t),  λᵀ(tf) = φx(xo(tf)).  (3.110)

Remark 3.4.3. Now let u(·) ∈ UcB, the class of piecewise continuously differentiable and bounded functions. The optimal control may be nondifferentiable at a finite number of points because either it passes from being unconstrained to lying on its bound or it is discontinuous, jumping between two (or more) equal minima of H. The fact that uo(·) ∈ UcB guarantees that x(t) and λ(t) are continuous for all t ∈ [t0, tf]; i.e., even if the optimal control is not differentiable at, say, a point td,

  xo(td−) = xo(td+),  λo(td−) = λo(td+),  (3.111)

and, since uo(·) is chosen to minimize H at any time,

  H(xo(td−), λo(td−), uo(td−)) = H(xo(td+), λo(td+), uo(td+)),

so the Hamiltonian is continuous also across td. Furthermore, if H is not an explicit function of time, it remains constant when the control is differentiable, and the continuity of H across the nondifferentiable points of uo(·) implies that the Hamiltonian remains constant along the entire optimal solution. Therefore, if time does not appear explicitly in the Hamiltonian, it is often referred to as the constant of motion.

Strong Pontryagin Minimum Principle: Special Case

Let us suppose that we wish to control the linear dynamic system

  ẋ(t) = Ax(t) + Bu(t),  x(t0) = x0,  (3.112)

so as to minimize a linear function of the final value of x, viz.,

  min over u of αᵀx(tf),  (3.113)

subject to the constraint

  −1 ≤ ui(t) ≤ 1,  i = 1, …, m,  (3.114)

where tf is the known final time, A is an n × n constant matrix, B is an n × m constant matrix, and α is an n-dimensional column vector. We make the following assumption.

Assumption 3.4.2. The matrices [Bi, ABi, …, A^(n−1)Bi], i = 1, …, m, have rank n. This is a controllability assumption, where the system is controllable from each control (see [8]).

Under this assumption we have the following result.

Theorem 3.4.3. The controls

  ui(t) = −sign[Biᵀ e^(−Aᵀ(t−tf)) α],  i = 1, …, m,  (3.115)

are well defined and minimize the performance criterion αᵀx(tf) in the class of piecewise continuous controls that satisfy (3.114). Here,

  sign(σ) = 1 if σ ≥ 0,  −1 if σ < 0.  (3.116)

Proof: First, we show that Assumption 3.4.2 ensures that Biᵀe^(−Aᵀ(t−tf))α is not identically zero on a nonzero interval of time, so that the control (3.115) is nonzero except possibly at a finite number of times at which the control switches as σ goes through zero. Suppose the converse, that this function is zero on an interval of time. Then in this interval its time derivatives must also be zero, yielding

  Biᵀe^(−Aᵀ(t−tf))α = 0,  BiᵀAᵀe^(−Aᵀ(t−tf))α = 0,  …,  Biᵀ(Aᵀ)^(n−1)e^(−Aᵀ(t−tf))α = 0.  (3.117)

As the exponential function is always nonsingular and as α ≠ 0, at time t, say, these equations imply that the rank of [Bi, ABi, …, A^(n−1)Bi] is less than n, which is a contradiction.

Now we show that the controls (3.115) satisfy Pontryagin's Principle. We have

  H(x(t), u(t), λ(t), t) = λᵀ(t)Ax(t) + λᵀ(t)Bu(t),  (3.118)

so that

  −λ̇ᵀ(t) = λᵀ(t)A,  λᵀ(tf) = αᵀ.  (3.119)

This equation can be integrated explicitly to yield

  λ(t) = e^(−Aᵀ(t−tf))α,  (3.120)

so that

  H(x(t), u(t), λ(t), t) = αᵀe^(−A(t−tf))(Ax(t) + Bu(t)),  (3.121)

and H is minimized by choosing

  ui(t) = −sign[Biᵀe^(−Aᵀ(t−tf))α],  i = 1, …, m,  (3.122)

which is just (3.115). Assumption 3.4.2 shows that ui(t) = 0 cannot hold for any finite period. So we have proved that (3.115) satisfies Pontryagin's Principle.
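Theorem 3.4.3 is simple enough to exercise numerically. Below is a minimal sketch in which the double-integrator matrices A, B and the vector α are hypothetical data, not from the text: it forms the bang-bang control (3.115) with the matrix exponential and checks that random admissible controls never beat it on the terminal cost αᵀx(tf).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical problem data for illustrating eq. (3.115).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
alpha = np.array([1.0, 0.5])
t0, tf, x0 = 0.0, 2.0, np.array([1.0, -1.0])

def u_opt(t):
    lam = expm(-A.T * (t - tf)) @ alpha        # lambda(t), eq. (3.120)
    return -np.sign(B.T @ lam)                 # bang-bang control, eq. (3.115)

def terminal_cost(u_fun, n=2000):
    dt, x = (tf - t0) / n, x0.copy()
    for k in range(n):                         # Euler simulation of (3.112)
        x = x + dt * (A @ x + B @ u_fun(t0 + k * dt)).ravel()
    return alpha @ x                           # performance index alpha^T x(tf)

Jo = terminal_cost(u_opt)
rng = np.random.default_rng(0)
for _ in range(5):                             # random admissible comparisons
    levels = rng.uniform(-1, 1, size=40)
    u_rand = lambda t, lv=levels: np.array([lv[min(int((t - t0) / (tf - t0) * 40), 39)]])
    assert terminal_cost(u_rand) >= Jo - 1e-9
print("optimal terminal cost:", Jo)
```

One small caveat: numpy's sign returns 0 at σ = 0 rather than the +1 of (3.116); the discrepancy can matter only on the measure-zero set of switch times.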
However, Pontryagin's Principle is only a necessary condition for optimality, so we have not yet proved the optimality of the control. This should not surprise the reader, since in all our derivations we have chosen control perturbations that cause only a small change in x(·) away from xo(·). It turns out that only in special cases is Pontryagin's Principle also sufficient for optimality; for the present problem this can be verified by direct calculation, but we leave this exercise to the reader. In order to derive a sufficient condition for optimality via variational (perturbational) means, one would have to allow arbitrary (not only small) deviations from the trajectory xo(·). If arbitrary expansions were used, this would involve higher-order derivatives of L, f, and φ and would be very cumbersome. Therefore, in the next section we take a different approach to sufficiency via the Hamilton–Jacobi–Bellman (H-J-B) equation.

3.5 Sufficient Conditions for Global Optimality: The Hamilton–Jacobi–Bellman Equation

Our approach to global sufficiency centers on the H-J-B partial differential equation

  −Vt(x(t), t) = min over u(t)∈UB of [ L(x(t), u(t), t) + Vx(x(t), t)f(x(t), u(t), t) ]  (3.123)

with the boundary condition

  V(x(tf), tf) = φ(x(tf)).  (3.124)

The function V(x(t), t) : Rⁿ × R → R is known as the optimal value function: V(x(t), t) is the optimal value of the cost criterion starting at x(t), t and using the optimal control to the terminal time. Note that x(t) and t are independent variables in the first-order partial differential equation (3.123), and V(x(t), t) is the dependent variable. The notation x(t) should be read as x at time t; when we consider integration or differentiation along a path, x(t) is dependent on t, and x(tf) in (3.124) is read as x at time tf. (Inequality constraints on the control, as well as on the state space, become boundary conditions on this first-order partial differential equation; state space constraints are beyond the scope of this book and are not discussed herein.)

Although this equation is rather formidable at first glance, it turns out to be naturally useful in deducing sufficient conditions for optimality and for solving certain classes of optimal control problems. The implementation of the H-J-B equation is sometimes referred to as dynamic programming.

To develop some facility with the H-J-B partial differential equation, we apply it to the example treated in Section 3.2. Here

  L(x(t), u(t), t) = ½u²(t),  f(x(t), u(t), t) = [x2(t), u(t)]ᵀ,  φ(x(tf)) = x1(tf).

Substituting these expressions into (3.123) yields

  −Vt(x(t), t) = min over u of [ ½u²(t) + Vx1(x(t), t)x2(t) + Vx2(x(t), t)u(t) ],  (3.128)

with V(x(tf), tf) = x1(tf). Carrying out the minimization of the quantity in square brackets yields

  uo(t) = −Vx2(x(t), t),  (3.129)

and substituting this back into (3.128) yields

  −Vt(x(t), t) = Vx1(x(t), t)x2(t) − ½[Vx2(x(t), t)]².  (3.130)

We assume that (3.130) has a solution

  V(x(t), t) = x1(t) + (tf − t)x2(t) − (1/6)(tf − t)³.  (3.131)

Then

  Vx1(x(t), t) = 1,  (3.132)
  Vx2(x(t), t) = tf − t,  (3.133)

so that from (3.129) and (3.133) we have uo(t) = −(tf − t), which agrees with (3.38). Consequently, (3.132) and (3.133) show that (3.131) satisfies (3.128) and its boundary condition, and

  V(x(t0), t0) = x10 + (tf − t0)x20 − (1/6)(tf − t0)³  (3.134)

is equal to the minimum value of the cost criterion.
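Since (3.131) is only asserted to solve (3.130), a quick symbolic check is worthwhile. The following sketch, using the sympy computer-algebra package, verifies that (3.131) satisfies the H-J-B equation (3.130) and the boundary condition (3.124), and that it reproduces the minimizing control (3.129).

```python
import sympy as sp

# Candidate optimal value function of eq. (3.131).
x1, x2, t, tf = sp.symbols('x1 x2 t tf')
V = x1 + (tf - t) * x2 - (tf - t)**3 / 6

# Residual of the H-J-B equation (3.130): -Vt - (Vx1*x2 - Vx2^2/2).
hjb_residual = -sp.diff(V, t) - (sp.diff(V, x1) * x2 - sp.diff(V, x2)**2 / 2)
print(sp.simplify(hjb_residual))        # -> 0, the equation is satisfied
print(sp.simplify(V.subs(t, tf)))       # -> x1, boundary condition (3.124)

uo = -sp.diff(V, x2)                    # minimizing control, eq. (3.129)
print(uo)                               # -> t - tf, i.e. uo(t) = -(tf - t)
```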
It is clear from these calculations that the H-J-B equation (3.128) is closely related to the optimal solution of the example in Section 3.2. We now develop this point fully via the following theorem.

Theorem 3.5.1. Suppose there exists a once continuously differentiable scalar function V(·, ·) of x and t that satisfies the H-J-B equation (3.123) with the boundary condition (3.124), and that f(·, ·, ·), L(·, ·, ·), and φ(·) are continuous in all their arguments. Suppose further that uo(x(t), t) is the control that minimizes

  L(x(t), u(t), t) + Vx(x(t), t)f(x(t), u(t), t)  (3.136)

subject to the constraint that u(t) ∈ UB. Then, under the further Assumption 3.2.2, the control function uo(xo(·), ·) minimizes (3.2) subject to (3.41) and (3.106), and V(x0, t0) is equal to the minimum value of the cost criterion.

Proof: Under the stated assumptions we note that the identity

  V(x(t0), t0) − V(x(tf), tf) + ∫ from t0 to tf (d/dt)V(x(t), t) dt = 0  (3.137)

holds for all piecewise continuous functions u(·) ∈ UB. Adding this identically zero quantity to J and noting that

  (d/dt)V(x(t), t) = Vt(x(t), t) + Vx(x(t), t)f(x(t), u(t), t)  (3.138)

yields

  Ĵ(u(·), x0) = V(x0, t0) − V(x(tf), tf) + φ(x(tf))
    + ∫ from t0 to tf [ L(x(t), u(t), t) + Vt(x(t), t) + Vx(x(t), t)f(x(t), u(t), t) ] dt.  (3.139)

Suppose that V(x(t), t) satisfies (3.123) and (3.124). Then

  Ĵ(u(·), x0) = V(x0, t0) + ∫ from t0 to tf [ H(x(t), u(t), Vx(x(t), t), t) − H(x(t), uo(x(t), t), Vx(x(t), t), t) ] dt,  (3.140)

where

  H(x(t), u(t), Vx(x(t), t), t) = L(x(t), u(t), t) + Vx(x(t), t)f(x(t), u(t), t).  (3.141)

From the minimization in (3.123), the integrand of (3.140) is nonnegative and takes on its minimum value of zero when u(t) = uo(x(t), t). This completes the proof of the theorem; Assumption 3.2.2 ensures that the trajectory resulting from this control is well defined.
Remark 3.5.1. Some interpretation of Equation (3.140) is required. If (3.140) is rewritten as

  ΔJ = Ĵ(u(·), x0) − V(x0, t0)
     = ∫ from t0 to tf [ H(x(t), u(t), Vx(x(t), t), t) − H(x(t), uo(x(t), t), Vx(x(t), t), t) ] dt
     = ∫ from t0 to tf ΔH dt,  (3.142)

then the integral represents the change in cost away from the optimal path. This integral is called Hilbert's integral in the classical calculus of variations literature. Note that the integral is taken along a nonoptimum path generated by u(t).

Example 3.5.1. Consider Hilbert's integral given in (3.142); consider also the linear quadratic problem

  L(x(t), u(t), t) = ½[xᵀ(t)Qx(t) + uᵀ(t)Ru(t)],  (3.143)
  f(x(t), u(t), t) = Ax(t) + Bu(t),  (3.144)

with Q = Qᵀ and R = Rᵀ > 0. Assume the form of V(·, ·) as

  V(x(t), t) = ½xᵀ(t)S(t)x(t),  S(t) = Sᵀ(t) > 0,  (3.145)

so that

  Vx(x(t), t) = xᵀ(t)S(t),  (3.146)
  uo(x(t), t) = −R⁻¹BᵀS(t)x(t).  (3.147)

Then, using Vx(x(t), t)B = xᵀ(t)S(t)B = −uoᵀ(x(t), t)R,

  ΔH = ½[uᵀ(t)Ru(t) − uoᵀ(x(t), t)Ruo(x(t), t)] + Vx(x(t), t)B(u(t) − uo(x(t), t))
     = ½uᵀ(t)Ru(t) − ½uoᵀ(x(t), t)Ruo(x(t), t) − uoᵀ(x(t), t)R(u(t) − uo(x(t), t))
     = ½(u(t) − uo(x(t), t))ᵀR(u(t) − uo(x(t), t)) ≥ 0.  (3.148)

This means that uo(x(t), t) is a global minimum for this linear problem with quadratic performance index.

Theorem 3.5.1 shows that satisfaction of the H-J-B equation implies that the minimum value of the cost criterion starting at the point (x(t), t) and using the optimal control to the terminal time tf is equal to V(x(t), t), the optimal value function.
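Hilbert's integral (3.142) can be observed directly in a scalar linear quadratic simulation. In the sketch below, all problem data a, b, q, r and the perturbation shape are hypothetical: the Riccati equation associated with V = ½S(t)x² is swept backward, a deliberately nonoptimal control is simulated, and the excess cost J − V(x0, t0) is compared with ∫ΔH dt = ∫½r(u − uo)² dt accumulated along the perturbed path.

```python
import numpy as np

# Hypothetical scalar LQ data: xdot = a x + b u, L = (q x^2 + r u^2)/2, phi = 0.
a, b, q, r = -0.3, 1.0, 2.0, 1.0
t0, tf, x0, n = 0.0, 3.0, 1.5, 6000
dt = (tf - t0) / n

# Backward Riccati sweep: -dS/dt = 2aS + q - S^2 b^2 / r, S(tf) = 0.
S = np.zeros(n + 1)
for k in range(n, 0, -1):
    S[k - 1] = S[k] + dt * (2 * a * S[k] + q - S[k]**2 * b**2 / r)

def cost_and_hilbert(u_extra):
    """Simulate u = uo(x,t) + u_extra(t); return (J, Hilbert's integral)."""
    x, J, hilbert = x0, 0.0, 0.0
    for k in range(n):
        uo = -(b * S[k] / r) * x                # optimal feedback (3.147)
        u = uo + u_extra(t0 + k * dt)
        J += 0.5 * (q * x**2 + r * u**2) * dt
        hilbert += 0.5 * r * (u - uo)**2 * dt   # dH of (3.148), perturbed path
        x += (a * x + b * u) * dt
    return J, hilbert

V0 = 0.5 * S[0] * x0**2                         # V(x0, t0) = S(t0) x0^2 / 2
J, hil = cost_and_hilbert(lambda t: 0.4 * np.sin(3 * t))
print(J - V0, hil)   # the two agree up to discretization error, eq. (3.142)
```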
To develop a conceptual notion of V(x(t), t) and its derivative along the path, let us consider an optimization problem involving only a terminal cost function, J = φ(x(tf)), i.e., L(x(t), u(t), t) ≡ 0. Suppose there exists an optimal trajectory emanating from (x(t), t). Then, at every point along this path, the value of the optimal value function equals the optimal value φ(xo(tf)); i.e., V(x(t), t) is constant along the motion. Assuming continuous first partial derivatives of V(x(t), t), its derivative along the path is

  (d/dt)V(x(t), t) = (∂V/∂x)(x(t), t) f(x(t), uo(x(t), t), t) + (∂V/∂t)(x(t), t) = 0,  (3.149)

where uo(x(t), t) is the optimal control. If any other control is used, then

  (d/dt)V(x(t), t) ≥ 0,  (3.150)

and therefore (3.149) can be restated as

  −(∂V/∂t)(x(t), t) = min over u∈UB of (∂V/∂x)(x(t), t) f(x(t), u(t), t).  (3.151)

This equation can be generalized to that of Equation (3.123) by making the simple transformation

  xₙ₊₁(t) = ∫ from t0 to t L(x(τ), u(τ), τ) dτ,  (3.152)

where xₙ₊₁(·) is an element of a new state vector x̃(·) = [x(·)ᵀ, xₙ₊₁(·)]ᵀ ∈ R^(n+1), whose dynamics are

  x̃̇(t) = f̃(x̃(t), u(t), t) = [ f(x(t), u(t), t) ; L(x(t), u(t), t) ],  x̃(t0) = [ x0 ; 0 ].  (3.153)

The new terminal cost criterion is

  φ̃(x̃(tf)) = φ(x(tf)) + xₙ₊₁(tf).  (3.154)

The H-J-B equation is now

  −Ṽt(x̃(t), t) = min over u∈UB of [ Ṽx(x̃(t), t)f(x(t), u(t), t) + Ṽxₙ₊₁(x̃(t), t)L(x(t), u(t), t) ].  (3.155)

Since the dynamics are not an explicit function of xₙ₊₁(·), a variation of xₙ₊₁(·) will not change the optimal solution but is just a simple translation in the value of the optimal cost and hence of the optimal value function Ṽ(x̃(t), t). Therefore,

  Ṽxₙ₊₁(x̃(t), t) = 1,  (3.156)

and, with the obvious notational change, we arrive at Equation (3.123).

We now illustrate Theorem 3.5.1 by applying it to the special case of Section 3.4.1, viz.,

  L(x(t), u(t), t) = 0,  f(x(t), u(t), t) = Ax(t) + Bu(t),
  UB = {u : −1 ≤ ui ≤ 1, i = 1, …, m},  φ(x(tf)) = αᵀx(tf).  (3.157)

The H-J-B equation is

  −Vt(x(t), t) = min over u(t)∈UB of [ Vx(x(t), t)Ax(t) + Vx(x(t), t)Bu(t) ]
    = Vx(x(t), t)Ax(t) − Σ from i=1 to m | Vx(x(t), t)Bi |,  (3.159)

which yields

  uio(x(t), t) = −sign[Biᵀ Vxᵀ(x(t), t)],  i = 1, …, m,  V(x(tf), tf) = αᵀx(tf).  (3.160)

The choice

  V(x(t), t) = αᵀe^(−A(t−tf))x(t) − ∫ from t to tf Σ from i=1 to m | αᵀe^(−A(τ−tf))Bi | dτ  (3.161)

yields

  Vx(x(t), t) = αᵀe^(−A(t−tf)),  V(x(tf), tf) = αᵀx(tf).  (3.162)

Substituting the expression for Vx(x(t), t) into the right-hand side of (3.159) yields

  αᵀe^(−A(t−tf))Ax(t) − Σ from i=1 to m | αᵀe^(−A(t−tf))Bi |,  (3.163)

which, upon differentiating (3.161) with respect to t, is exactly −Vt(x(t), t); here we use the fact that A and e^(−A(t−tf)) commute. This verifies that the H-J-B equation is satisfied by (3.161). In view of Theorem 3.5.1 we then conclude that the controls given by (3.115) are minimizing and that the minimum value of the performance criterion αᵀx(tf) is given by (3.161) evaluated at t = t0. Naturally, if such a function can be found, the control problem is then completely (globally) solved. However, in more complicated nonlinear control problems it is more difficult to generate or guess a V function that satisfies the H-J-B equation.

It will not have escaped the reader that there is a striking similarity between Vx(x(t), t) and λᵀ(t) of Pontryagin's Principle. We will develop this relationship in the next subsection.
3.5.1 Derivatives of the Optimal Value Function

In certain classes of problems it is possible to show that when Theorem 3.5.1 holds, the gradient (derivative) with respect to x(t) of the optimal value function V(x(t), t) satisfies the same equation as that for λᵀ(t) and thereby gives a geometrical interpretation to the Lagrange multipliers. Here we illustrate this when U is the whole m-dimensional space Rm. Naturally, there are then no control constraints, and (3.123) becomes

  −Vt(x(t), t) = min over u of [ L(x(t), u(t), t) + Vx(x(t), t)f(x(t), u(t), t) ],  V(x(tf), tf) = φ(x(tf)).  (3.164)

This derivation assumes that the second partial derivative of V(x(t), t) with respect to x(t) exists. Calling the minimizing control uo(xo(t), t), we replace the arbitrary initial state x(t) in the H-J-B equation with xo(t) to obtain

  −Vt(xo(t), t) = L(xo(t), uo(xo(t), t), t) + Vx(xo(t), t)f(xo(t), uo(xo(t), t), t)
    = H(xo(t), uo(xo(t), t), Vx(xo(t), t), t).  (3.165)

First Derivative of the Optimal Value Function

Assuming that the necessary derivatives exist, we differentiate (3.165) with respect to xo(t) to obtain

  −Vtx(xo(t), t) = Lx + Lu uox + Vx fx + Vx fu uox + fᵀVxx,  (3.166)

where all functions are evaluated along (xo(t), uo(xo(t), t), t). Since we take time derivatives along the optimal path generated by the optimal control,

  (d/dt)Vx(xo(t), t) = Vxt(xo(t), t) + fᵀ(xo(t), uo(xo(t), t), t)Vxx(xo(t), t),  (3.167)

if the second partial derivatives of V exist; note that Vtx(xo(t), t) = Vxt(xo(t), t). Using (3.166) and (3.167), together with the fact that, because of the minimization with respect to u(t),

  Lu(xo(t), uo(xo(t), t), t) + Vx(xo(t), t)fu(xo(t), uo(xo(t), t), t) = 0,  (3.168)

we obtain

  −V̇x(xo(t), t) = Lx(xo(t), uo(xo(t), t), t) + Vx(xo(t), t)fx(xo(t), uo(xo(t), t), t).  (3.170)

It is also clear from (3.164) that

  Vx(x(tf), tf) = φx(x(tf)).  (3.171)

Comparing (3.170) and (3.171) to (3.56) and (3.57), we see that Vx(xo(t), t), which can be interpreted as the derivative of the optimal value function with respect to the state, satisfies the same differential equation and has the same final value as λᵀ(t). (If the optimal control function does not lie on a control boundary, then there are no restrictions on an arbitrarily small control variation away from the optimal control; if u is on a bound, the following results would have to be modified.) There are, however, control problems where there is no continuously differentiable V(·, ·) that satisfies the H-J-B equation, but where Pontryagin's λ(·) is well defined by its differential equation. In such cases it is awkward to find a physical interpretation for λ(·).
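The identification Vx(xo(t), t) = λᵀ(t) can be confirmed symbolically for the example of this section, where integrating −λ̇ᵀ = Hx backward from λᵀ(tf) = φx = (1, 0) gives λ1(t) = 1 and λ2(t) = tf − t. A minimal sympy sketch:

```python
import sympy as sp

# Optimal value function (3.131) for the double-integrator example.
x1, x2, t, tf = sp.symbols('x1 x2 t tf')
V = x1 + (tf - t) * x2 - (tf - t)**3 / 6

Vx = sp.Matrix([sp.diff(V, x1), sp.diff(V, x2)])
print(Vx.T)                 # -> [1, tf - t], identical to lambda(t)^T
print(Vx.subs(t, tf).T)     # -> [1, 0] = phi_x(x(tf)), matching (3.171)
```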
Second Derivative of the Optimal Value Function

The second derivative of the optimal value function is now derived; it is, in general, the curvature of the optimal value function with respect to the state. For simplicity of the derivation, certain assumptions must be made first.

Assumption 3.5.1. uo(x(·), ·) ∈ Interior U; i.e., the optimal control does not lie on a control boundary.

Assumption 3.5.2. Huu(xo(t), uo(xo(t), t), Vx(xo(t), t), t) > 0; i.e., H(xo(t), uo(xo(t), t), Vx(xo(t), t), t) < H(xo(t), u(t), Vx(xo(t), t), t) for all u(t) ≠ uo(xo(t), t).

It is assumed here that third derivatives of V exist. Taking two partial derivatives of the H-J-B equation (3.165) with respect to the state vector gives

  −Vtxx = Hxx + Hxu uox + uoxᵀHux + uoxᵀHuu uox + HxVx Vxx + Vxx HVxx + Vxx HVxu uox + uoxᵀHuVx Vxx
    + Σ from i=1 to n HVxi Vxixx,  (3.174)

where the derivatives of H(xo(t), uo(xo(t), t), Vx(xo(t), t), t) are taken with respect to each of its explicit arguments; the function H is used to avoid the tensor products that would arise in taking second partials of the dynamic vector function f with respect to the state vector. The boundary condition for (3.174) is

  Vxx(x(tf), tf) = φxx(x(tf)).  (3.175)

Since a differential equation for Vxx along the optimal path is being sought, Vxx is directly differentiated as

  (d/dt)Vxx(xo(t), t) = Σ from i=1 to n Vxixx(xo(t), t) fi(xo(t), uo(xo(t), t), t) + Vxxt(xo(t), t),  (3.176)

where the sum is used to construct the tensor product. Substitution of Equation (3.174) into (3.176) gives the differential equation

  −(d/dt)Vxx = Hxx + Hxu uox + uoxᵀHux + uoxᵀHuu uox + HxVx Vxx + Vxx HVxx + Vxx HVxu uox + uoxᵀHuVx Vxx.  (3.177)

Finally, an expression for uox(xo(t), t) is required. From Assumptions 3.5.1 and 3.5.2, Hu(xo(t), uo(xo(t), t), Vx(xo(t), t), t) = 0 is true for all xo(t); differentiating with respect to the state,

  Hux + Huu uox + HuVx Vxx = 0,  (3.179)

produces

  uox(xo(t), t) = −Huu⁻¹(Hux + HuVx Vxx).  (3.180)

Substitution of Equation (3.180) into (3.177) gives, after some manipulations and removing the arguments,

  −(d/dt)Vxx = (fxᵀ − Hxu Huu⁻¹fuᵀ)Vxx + Vxx(fx − fu Huu⁻¹Hux) + (Hxx − Hxu Huu⁻¹Hux) − Vxx fu Huu⁻¹fuᵀVxx,  (3.181)

where HVxx = fx and HVxu = fu. Note that expanding Hxx, Hxu, and Huu produces tensor products. Thus, along an optimal path, the curvature Vxx(xo(t), t) of the optimal value function is propagated by a Riccati differential equation, and the existence of its solution will play an important role in the development of local sufficiency for a weak minimum. More results will be given in Chapter 5. Note that the problem of minimizing a quadratic cost criterion subject to a linear dynamic constraint, as given in Example 3.5.1, produces a Riccati differential equation in the symmetric matrix S(t).
3.5.2 Derivation of the H-J-B Equation

In Theorem 3.5.1 we showed that if there is a solution to the H-J-B equation which satisfies certain conditions, then this solution evaluated at t = t0 is the optimal value of (3.2). Here, we prove the converse, viz., that under certain assumptions on the optimal value function, the H-J-B equation can be deduced. Let us define

  V(x(t), t) = min over u(·)∈UB of [ φ(x(tf)) + ∫ from t to tf L(x(τ), u(τ), τ) dτ ],  (3.182)

where u(τ) ∈ UB for all τ in [t, tf]. We can now proceed to the theorem.

Theorem 3.5.2. Suppose that the optimal value function V(·, ·) defined by (3.182) is once continuously differentiable in x(t) and t, and let uo(τ; x(t), t), t ≤ τ ≤ tf, be the optimal control function for the dynamic system (3.41) with "initial condition" x(t) at τ = t. Then V(·, ·) satisfies the H-J-B equation (3.123) and (3.124).

Proof: From the existence of V(·, ·) we have that

  V(x(t), t) = φ(x(tf)) + ∫ from t to tf L(x(τ), uo(τ; x(t), t), τ) dτ,  (3.183)

so that

  (d/dt)V(x(t), t) = −L(x(t), uo(t; x(t), t), t) = Vx(x(t), t)f(x(t), uo(t; x(t), t), t) + Vt(x(t), t)  (3.184)

and

  V(x(tf), tf) = φ(x(tf)).  (3.186)

Furthermore, it follows from (3.182) that

  V(x(t), t) ≤ ∫ from t to t+Δ L(x(τ), u(τ), τ) dτ + V(x(t+Δ), t+Δ),  (3.187)

where u(τ) ∈ UB, t ≤ τ ≤ t + Δ, is an arbitrary continuous function for some positive Δ much smaller than tf − t. Expanding the right-hand side in Δ yields

  V(x(t), t) ≤ L(x(t), u(t), t)Δ + V(x(t), t) + Vt(x(t), t)Δ + Vx(x(t), t)f(x(t), u(t), t)Δ + O(Δ),  (3.189)

which in turn yields

  0 ≤ [ L(x(t), u(t), t) + Vt(x(t), t) + Vx(x(t), t)f(x(t), u(t), t) ] Δ + O(Δ).  (3.190)

This inequality holds for all continuous u(·) on [t, t + Δ] and all Δ ≥ 0. Hence, for all u(t) ∈ UB,

  0 ≤ L(x(t), u(t), t) + Vt(x(t), t) + Vx(x(t), t)f(x(t), u(t), t).  (3.191)

Considering (3.184) and (3.191) together, we conclude that

  min over u(t)∈UB of [ Vt(x(t), t) + L(x(t), u(t), t) + Vx(x(t), t)f(x(t), u(t), t) ] = 0.  (3.192)

Since Vt(x(t), t) does not depend upon u(t), we can rewrite (3.192) as

  −Vt(x(t), t) = min over u(t)∈UB of [ L(x(t), u(t), t) + Vx(x(t), t)f(x(t), u(t), t) ],  (3.193)

which, together with (3.186), is just the H-J-B equation (3.123) and (3.124).

Note that the above theorem and proof are based squarely on the assumption that V(·, ·) is once continuously differentiable in both arguments. It turns out that there are problems where this smoothness is not present, so that this assumption is violated. In many derivations of the H-J-B equation it is assumed that the second partial derivatives of V(x(t), t) exist. This is done because it is known that in certain classes of optimal control problems optimality is lost (i.e., J can be made arbitrarily large and negative) if Vxx(x(t), t) ceases to exist at a certain time. Clearly, however, this is a red herring in the derivation of the H-J-B equation: the expansion in (3.189) is valid if one merely assumes that V(x(t), t) exists and is once continuously differentiable. It is, however, a nasty assumption, as one has to solve the optimal control problem to obtain V(·, ·) before one can verify it. Consequently, the above theorem is largely only of theoretical value (it increases one's insight into the H-J-B equation). The main strength of the H-J-B equation lies in Theorem 3.5.1, which provides sufficient conditions for optimality.
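Definition (3.182) translates directly into a backward dynamic-programming recursion on a grid. The sketch below is hypothetical throughout: the scalar problem ẋ = u with L = ½(x² + u²) and φ = 0, the state and control grids, and the horizon are all chosen only for illustration. It tabulates V(x, t) backward in time and compares the result with the exact Riccati solution V = ½ tanh(tf − t) x².

```python
import numpy as np

# Hypothetical scalar problem: xdot = u, L = (x^2 + u^2)/2, phi = 0, |u| <= 4.
t0, tf, nt = 0.0, 2.0, 400
xs = np.linspace(-2, 2, 161)                 # state grid
us = np.linspace(-4, 4, 81)                  # control grid
dt = (tf - t0) / nt

V = np.zeros_like(xs)                        # V(x, tf) = phi(x) = 0
for _ in range(nt):                          # backward recursion from (3.182)
    x_next = xs[:, None] + us[None, :] * dt  # x + f(x,u) dt for every (x,u)
    stage = 0.5 * (xs[:, None]**2 + us[None, :]**2) * dt
    V_next = np.interp(x_next, xs, V)        # V(x + f dt, t + dt)
    V = np.min(stage + V_next, axis=1)       # minimize over the control grid

# Riccati check: -pdot = 1 - p^2, p(tf) = 0  =>  p(t) = tanh(tf - t).
p0 = np.tanh(tf - t0)
print(V[np.searchsorted(xs, 1.0)], 0.5 * p0)  # approximately equal at x = 1
```

The agreement is only approximate (first order in the grid spacings), which is precisely the "implementation of the H-J-B equation" referred to above as dynamic programming.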
3.6 Unspecified Final Time tf

So far in this chapter we have assumed that the final time tf is given. One could, however, treat tf as a parameter and, along with u(·), choose it to minimize (3.2). Consequently, a necessary condition for the optimality of tf = tfo is that

  Jtf(uo(·), x0, tfo) = 0,  (3.194)

where

  J(u(·), x0, tf) = φ(x(tf), tf) + ∫ from t0 to tf L(x(t), u(t), t) dt,  (3.195)

provided, of course, that J is differentiable at tfo. Note that in (3.195) we allow a more general φ(·, ·) than previously, one that depends on both x(tf) and tf; naturally, in the case where tf is given, nothing is gained by this explicit dependence of φ on tf.

Suppose that uo(·) minimizes (3.195) when tf = tfo. Then

  J(uo(·), x0, tfo) = φ(xo(tfo), tfo) + ∫ from t0 to tfo L(xo(t), uo(t), t) dt  (3.196)

and

  J(uo(·), x0, tfo + Δ) = φ(xo(tfo + Δ), tfo + Δ) + ∫ from t0 to tfo+Δ L(xo(t), uo(t), t) dt,  (3.197)

where, if Δ > 0, uo(·) over the interval (tfo, tfo + Δ) is any continuous function emanating from uo(tfo) with values in U. Subtracting (3.196) from (3.197) and expanding the right-hand side of (3.197) yields

  J(uo(·), x0, tfo + Δ) − J(uo(·), x0, tfo)
    = [ L(xo(tfo), uo(tfo), tfo) + φx(xo(tfo), tfo)f(xo(tfo), uo(tfo), tfo) + φtf(xo(tfo), tfo) ] Δ + O(Δ).  (3.198)

It follows that for tfo to be optimal,

  φtf(xo(tfo), tfo) + φx(xo(tfo), tfo)f(xo(tfo), uo(tfo), tfo) + L(xo(tfo), uo(tfo), tfo) = 0.  (3.199)

This condition (the so-called transversality condition) can be written using the Hamiltonian H as

  φtf(xo(tfo), tfo) + H(xo(tfo), uo(tfo), λ(tfo), tfo) = 0.  (3.200)

Since H is minimized by uo(tfo), and since xo(·) and λ(·) are continuous functions of time, uo(tfo) and uo(tfo−) both yield the same value of H; hence (3.200) holds with uo(tfo) replaced by uo(tfo−), and because of this it is not necessary to assume for Δ < 0 that uo(·) is continuous from the left at tfo. It should be noted that jumps are allowed in the control where identical values of the minimum Hamiltonian H occur at different values of u at tf. We have thus proved the following theorem.

Theorem 3.6.1. Suppose that φ(·, ·) depends explicitly on tf as in (3.195) and that tf is unspecified. Then, if the pair uo(·), tfo minimizes (3.195), the condition

  φtf(xo(tfo), tfo) + H(xo(tfo), uo(tfo), λ(tfo), tfo) = 0

holds in addition to (3.109).

Remark 3.6.1. It should be noted that free final time problems can always be solved by augmenting the state vector with the time, defining a new independent variable, and solving the resulting problem as a fixed final "time" problem. This is done by using the transformation

  t = (tf − t0)τ + t0  ⇒  dt = (tf − t0)dτ,  (3.201)

where τ goes from 0 to 1. The differential constraints become

  x′ = (tf − t0)f(τ, x, u, tf),  t′ = tf − t0,  (3.202)

where x′ = dx/dτ. We will not use this approach here.

Example 3.6.1. The simplest example to illustrate (3.200) is the scalar system (n = m = 1)

  ẋ(t) = bu(t),  x(t0) = x0,  b ≠ 0,  (3.205)

with the performance criterion

  J(u(·), x0, tf) = αx(tf) + ½tf² + ∫ from t0 to tf ½u²(t) dt.  (3.206)

Here

  H(x(t), u(t), λ(t), t) = ½u²(t) + λ(t)bu(t),  (3.208)

so that λ(t) = α and uo(t) = −bα. From (3.200), with φtf = tf and H = ½b²α² − b²α² = −½b²α², the optimal final time is determined as

  tfo − ½b²α² = 0  ⇒  tfo = ½b²α².  (3.210)

The optimal cost criterion is

  J(uo(·), x0, tf) = αx0 − ½α²b²(tf − t0) + ½tf²,  (3.211)

evaluated at tfo given in (3.210). Its derivative with respect to tf is zero at tfo, thus verifying (3.199).
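A one-line numerical check of Example 3.6.1 is possible; the parameter values b = 2, α = 0.5, x0 = 1, t0 = 0 below are hypothetical. Minimizing the closed-form cost (3.211) over a grid of candidate final times recovers tfo = ½b²α².

```python
import numpy as np

# Hypothetical data for Example 3.6.1; inner problem gives uo = -b*alpha.
b, alpha, x0, t0 = 2.0, 0.5, 1.0, 0.0

tfs = np.linspace(0.01, 3.0, 3000)
J = alpha * x0 - 0.5 * (alpha * b)**2 * (tfs - t0) + 0.5 * tfs**2   # eq. (3.211)
print(tfs[np.argmin(J)], 0.5 * b**2 * alpha**2)    # both ~ 0.5, eq. (3.210)
```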
(c) Show that the Hamiltonian is a constant along the extremal path.6. Consider the problem of the Euler–Lagrange equation tf min u(·)∈U 0 L(x. ˙ dt dt (b) Show that the Hamiltonian is a constant along the motion. The dynamic system is defined by x = u. 2 u2 dt . u)dt. where u(t) ∈ Rm and x(t) ∈ Rn subject to x = u.3. (a) Show that the first-order necessary conditions reduce to d d Lu − Lx = Lx − Lx = 0. 2 and L = T − U . ˙ x(0) = x0 . 7. Extremize the performance index J= 1 cx2 + f 2 tf t0 1 U = kx2 . where c is a positive constant. Systems with General Performance Criteria and the prescribed boundary conditions are t0 .106 Chapter 3. x0 . 11. x0 > 0. (u2 + 1) x 1/2 dt subject to the differential constraint x=u ˙ and the prescribed boundary conditions t0 = 0. Extremize the performance index 1 J= 2 tf t0 x0 = 0. Extremize the performance index 1 J= 2 tf t0 (u2 − x)dt subject to the differential constraint x=0 ˙ and the prescribed boundary conditions t0 = 0. 12. tf = given > 0. 10. tf ≡ given. tf = 1. Minimize the performance index J = bx (tf ) + 1 2 tf t0 −[u(t) − a(t)]2 + u4 dt subject to the differential constraint x=u ˙ . tf = 1. ˙ x2 = u. tf t0 |u| dt and the prescribed boundary conditions t0 = 0. Minimize the performance index 6 J= 0 (|u| − x1 )dt − 2x1 (6) subject to x1 = x2 . Determine the optimal state and control histories. ˙ |u| ≤ 1. t)dt . x0 = 1. u. 1]. x2 (0) = 0. Unspecified Final Time tf and the prescribed boundary conditions t0 = 0. a(t) = t. x1 (0) = 0. x2 (0) = 0. discuss the theoretical consequences. ¨ |u| ≤ 1. tf = 10. 14. Since L = |u| is not continuously differentiable. 13. Consider the problem of minimizing with respect to the control u(·) the cost criterion tf J = φ(x(tf )) + t0 L(x. 15. 107 Find a value for b so that the control jumps somewhere in the interval [0.3. x1 (0) = 10.6. Minimize analytically and numerically the performance index J = α1 x1 (tf )2 + α2 x2 (tf )2 + subject to x = u. t1 ≤ t ≤ tf . Systems with General Performance Criteria x = f (x. . u1 . t). . where t1 is specified and t0 < t1 < tf .. Suppose that we require u(t) to be piecewise constant as follows: u(t) = u0 . and t. 17. u. u. . (a) Determine the optimal value function V (x. Derive from first principles the first-order necessary conditions that must be satisfied by an optimal control function. and L and f are continuously differentiable in x. t). .e. t1 ≤ t ≤ tf . i = 1. 0 uo . x(t0 ) = x0 given. (b) Write the change in cost from optimal using Hilbert’s integral and explain how this change in cost can be computed. . Consider the problem of minimizing with respect to u(·) ∈ U J = F (x(t0 ) + φ(x(tf )) tf fixed. 1 16. t0 ≤ t ≤ t1 . ˙ where F is continuously differentiable in x. Consider the problem of minimizing φ(x(tf )) = αT x(tf ). subject to x = Ax + Bu ˙ for u ∈ U = {u : −1 ≤ ui ≤ 1. t0 ≤ t ≤ t1 . i. m}.108 subject to Chapter 3. uo (t) = uo . 3. where t0 is the initial time. u. 109 Develop and give the conditions for weak and strong local optimality. Find the control u(·) ∈ UB = {u : |u| ≤ 1} that minimizes J = (x1 (50) − 10)2 q1 + x2 (50)q2 + 2 50 0 tf t0 Ldt+φ(x(tf )) |u|dt . Unspecified Final Time tf subject to x = Ax + Bu. List the necessary assumptions. ˙ U = {u(·) : u(·) are piecewise continuous functions}. t). 18. ˙ 20.6. Derive the H-J-B equation for the optimization problem of minimizing. Minimize the performance index 4 J= 0 (|u| − x)dt + 2x(4) subject to x = u ˙ x(0) = 0. 19. with both fixed. the cost criterion J =e subject to x = f (x. with respect to u(·) ∈ U. and t0 < tf . 
9. Extremize the performance index

  J = ½cx²(tf) + ∫ from t0 to tf ½u² dt,

where c is a positive constant. The dynamic system is defined by ẋ = u, x(0) = x0, and the prescribed boundary conditions are t0, x0, tf ≡ given.

10. Extremize the performance index

  J = ½ ∫ from t0 to tf [(u² + 1)x]^(1/2) dt

subject to the differential constraint ẋ = u and the prescribed boundary conditions t0 = 0, x0 = 0, tf = 1.

11. Extremize the performance index

  J = ½ ∫ from t0 to tf (u² − x) dt

subject to the differential constraint ẋ = u and the prescribed boundary conditions t0 = 0, x0 > 0, tf = given > 0.

12. Minimize the performance index

  J = bx(tf) + ½ ∫ from t0 to tf ( −[u(t) − a(t)]² + u⁴ ) dt

subject to the differential constraint ẋ = u and the prescribed boundary conditions t0 = 0, x0 = 1, a(t) = t, tf = 1. Find a value for b so that the control jumps somewhere in the interval [0, 1].

13. Minimize the performance index

  J = ∫ from 0 to 6 (|u| − x1) dt − 2x1(6)

subject to ẋ1 = x2, ẋ2 = u, |u| ≤ 1, x1(0) = 0, x2(0) = 0. Determine the optimal state and control histories. Since L = |u| is not continuously differentiable, discuss the theoretical consequences.

14. Minimize analytically and numerically the performance index

  J = α1x1(tf)² + α2x2(tf)² + ∫ from t0 to tf |u| dt

subject to ẍ = u, |u| ≤ 1, with x1(0) = 10, x2(0) = 0, t0 = 0, and tf = 10.

15. Consider the problem of minimizing with respect to the control u(·) the cost criterion

  J = φ(x(tf)) + ∫ from t0 to tf L(x, u, t) dt

subject to ẋ = f(x, u, t), x(t0) = x0 given, where L and f are continuously differentiable in x, u, and t. Suppose that we require u(t) to be piecewise constant as follows: u(t) = u0, t0 ≤ t ≤ t1; u1, t1 ≤ t ≤ tf, where t1 is specified and t0 < t1 < tf. Derive from first principles the first-order necessary conditions that must be satisfied by an optimal control function uo(t) = u0o, t0 ≤ t ≤ t1; u1o, t1 ≤ t ≤ tf.

16. Consider the problem of minimizing

  φ(x(tf)) = αᵀx(tf)

subject to ẋ = Ax + Bu for u ∈ U = {u : −1 ≤ ui ≤ 1, i = 1, …, m}, x(t0) = x0 given.
(a) Determine the optimal value function V(x, t).
(b) Write the change in cost from optimal using Hilbert's integral and explain how this change in cost can be computed.

17. Consider the problem of minimizing with respect to u(·) ∈ U

  J = F(x(t0)) + φ(x(tf)),  tf fixed,

where t0 is the initial time and F is continuously differentiable in x, subject to ẋ = Ax + Bu, with U = {u(·) : u(·) are piecewise continuous functions}. Develop and give the conditions for weak and strong local optimality. List the necessary assumptions.

18. Derive the H-J-B equation for the optimization problem of minimizing, with respect to u(·) ∈ U, the cost criterion

  J = e^( ∫ from t0 to tf L dt + φ(x(tf)) )

subject to ẋ = f(x, u, t), with t0 and tf both fixed and t0 < tf.

19. Minimize the performance index

  J = ∫ from 0 to 4 (|u| − x) dt + 2x(4)

subject to ẋ = u, x(0) = 0, |u| ≤ 1. Determine the optimal state and control histories.

20. Find the control u(·) ∈ UB = {u : |u| ≤ 1} that minimizes

  J = q1(x1(50) − 10)² + q2x2(50)² + ∫ from 0 to 50 |u| dt

subject to

  [ẋ1; ẋ2] = [[0, 1], [0, 0]][x1; x2] + [0; 1]u,  [x1(0); x2(0)] = [0; 0].

Choose q1 and q2 large enough to approximately satisfy the terminal constraints x1(50) = 10 and x2(50) = 0.
Chapter 4

Terminal Equality Constraints

4.1 Introduction

Theorem 3.3.1, a weak form of the Pontryagin Principle, is a condition that is satisfied if xo(·), uo(·) are an optimal pair (i.e., it is a necessary condition for optimality). It is also clear from (3.55) that satisfaction of this condition is sufficient for the change in the performance criterion to be nonnegative, to first order, for any weak perturbation in the control away from uo(·); the change in J can be negative only as a consequence of second- and/or higher-order terms in its expansion. Thus Pontryagin's Principle can be thought of as a first-order optimality condition.

When nonlinear terminal equality constraints are present and when the dynamic system is nonlinear, the elementary constructions used in this book turn out to be inadequate to deduce necessary conditions; deeper mathematics is required here for rigorous derivations. If deriving necessary conditions in these more involved control problem formulations is out of our reach, we can, nevertheless, rather easily and rigorously show that Pontryagin's Principle is a sufficient condition for "weak or strong first-order optimality." This confirms that if Pontryagin's Principle is satisfied, the change in J is, to first order, nonnegative.

In this chapter we first derive a weak form of the Pontryagin Principle as a necessary condition for optimality when linear terminal equality constraints are present and when the dynamic system is linear. Turning to nonlinear terminal equality constraints and nonlinear dynamic systems, we introduce the notion of weak first-order optimality and state a weak form of the Pontryagin Principle; we prove that the conditions of this principle are sufficient for weak first-order optimality and refer to rigorous proofs in the literature that the principle is a necessary condition for optimality. Our results are then extended by allowing control constraints and by introducing strong first-order optimality. In particular, we show that Pontryagin's Principle in strong form is a sufficient condition for strong first-order optimality; again we refer to the literature for a proof that the principle is a necessary condition for optimality. Finally, we obtain a sufficient condition for global optimality via a generalized Hamilton–Jacobi–Bellman equation.

By way of introducing optimal control problems with terminal equality constraints, we begin in Section 4.2.1 with the problem of steering a linear dynamic system from an initial point to the origin in specified time while minimizing the control "energy" consumed. It turns out that the most elementary, direct methods are adequate to handle this problem. We then derive a Pontryagin-type necessary condition for optimality for the case of a general performance criterion and both linear and nonlinear terminal equality constraints. We then remark that the two-point boundary-value problem that arises when terminal constraints are present is more involved than that in the unconstrained case, and we briefly introduce a penalty function approach which circumvents this. Our next theorem allows the terminal time tf to be unspecified. Throughout, examples are presented which illustrate and clarify the use of the theorems in control problems with terminal constraints; in line with our above remarks, we comment here also on the notion of normality.
4.2 Linear Dynamic System with General Performance Criterion and Terminal Equality Constraints

We derive a Pontryagin-type necessary condition for optimality for the case of a general performance criterion and both linear (Section 4.2.1) and nonlinear (Section 4.2.3) terminal equality constraints, but with linear system dynamics.

4.2.1 Linear Dynamic System with Linear Terminal Equality Constraints

We return to the problem formulated in Section 3.2, where (3.1) is to be controlled to minimize the performance criterion (3.2). Now, however, we impose the restriction that, at the given final time tf, the linear equality constraint

  Dx(tf) = 0  (4.1)

be satisfied. Here, D is a p × n constant matrix. An important special case of the above formulation occurs when

  L(x(t), u(t), t) = uᵀ(t)u(t),  φ(x(tf)) ≡ 0,  (4.2)

and D is the n × n identity matrix. Then, the problem is one of steering the initial state of (3.1) from x0 to the origin of the state space at time t = tf while minimizing the "energy"

  J(u(·), x0) = ∫ from t0 to tf uᵀ(t)u(t) dt.  (4.3)

Actually, it is easy to solve this particular problem directly, without appealing to the Pontryagin Principle derived later in this section. We need the following assumptions.

Assumption 4.2.1. The control function u(·) is drawn from the set UT of piecewise continuous m-vector functions of t on the interval [t0, tf] that meet the terminal constraints.

Assumption 4.2.2. The linear dynamic system (3.1) is controllable (see [8]) from t = t0 to t = tf, viz.,

  W(t0, tf) = ∫ from t0 to tf Φ(tf, τ)B(τ)Bᵀ(τ)Φᵀ(tf, τ) dτ  (4.4)

is positive definite, where

  (d/dt)Φ(t, τ) = A(t)Φ(t, τ),  Φ(τ, τ) = I.  (4.5)

With this assumption we can set

  uo(t) = −Bᵀ(t)Φᵀ(tf, t)W⁻¹(t0, tf)Φ(tf, t0)x0.  (4.6)

Then, using the solution of (3.1) and substituting (4.6), we obtain

  x(tf) = Φ(tf, t0)x0 + ∫ from t0 to tf Φ(tf, τ)B(τ)uo(τ) dτ  (4.7)
        = Φ(tf, t0)x0 − Φ(tf, t0)x0 = 0,  (4.8)

so that the control function uo(·) steers x0 to the origin of the state space at time t = tf.
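The construction (4.4) through (4.8) is directly computable. The following sketch uses a double integrator with hypothetical data (A, B, tf, x0 are not from the text): it builds the controllability Gramian by quadrature, forms the minimum-energy control (4.6), and confirms by simulation that it steers x0 to the origin. For constant A, B the transition matrix is Φ(tf, t) = e^(A(tf−t)).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical double-integrator data.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
t0, tf, x0, n = 0.0, 1.0, np.array([1.0, 0.0]), 2000
dt = (tf - t0) / n
ts = t0 + dt * np.arange(n)

Phi = lambda t: expm(A * (tf - t))                   # Phi(tf, t)
# Controllability Gramian W(t0, tf), eq. (4.4), by the rectangle rule.
W = sum(Phi(t) @ B @ B.T @ Phi(t).T * dt for t in ts)
c = np.linalg.solve(W, Phi(t0) @ x0)                 # W^{-1} Phi(tf,t0) x0

x = x0.copy()
for t in ts:                                         # forward simulation
    u = -(B.T @ Phi(t).T @ c)                        # uo(t), eq. (4.6)
    x = x + dt * (A @ x + B @ u).ravel()
print(x)    # ~ [0, 0]: the control steers x0 to the origin at tf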
Now suppose that u(·) is any other control function that steers x0 to the origin at time tf. Then

  0 = Φ(tf, t0)x0 + ∫ from t0 to tf Φ(tf, τ)B(τ)u(τ) dτ  (4.9)

and

  0 = Φ(tf, t0)x0 + ∫ from t0 to tf Φ(tf, τ)B(τ)uo(τ) dτ.

Subtracting these two equations yields

  ∫ from t0 to tf Φ(tf, τ)B(τ)[u(τ) − uo(τ)] dτ = 0,  (4.10)

and premultiplying by 2x0ᵀΦᵀ(tf, t0)W⁻¹(t0, tf) yields

  2 ∫ from t0 to tf uoᵀ(τ)[u(τ) − uo(τ)] dτ = 0.  (4.11)

Subtracting the optimal control energy from the control energy associated with any comparison control u(·) ∈ UT and using (4.11), we have

  ΔJ = ∫ uᵀ(t)u(t) dt − ∫ uoᵀ(t)uo(t) dt
     = ∫ [ uᵀ(t)u(t) − uoᵀ(t)uo(t) − 2uoᵀ(t)(u(t) − uo(t)) ] dt
     = ∫ [u(t) − uo(t)]ᵀ[u(t) − uo(t)] dt ≥ 0,  (4.12)

which establishes that uo(·) is minimizing, since for any u ≠ uo the cost increases.

While the above treatment is quite adequate when (4.2) holds, it does not extend readily to the general case of (3.2). We therefore use the approach of Section 3.2, adjoining (3.1) to the performance criterion by means of a continuously differentiable n-vector function of time λ(·) and (4.1) by means of a p-vector ν, as follows:

  Ĵ(u(·), λ(·), ν, x0) = J(u(·), x0) + ∫ from t0 to tf λᵀ(t)[A(t)x(t) + B(t)u(t) − ẋ(t)] dt + νᵀDx(tf).  (4.13)

Note that

  Ĵ(u(·), λ(·), ν, x0) = J(u(·), x0)  (4.14)

when (3.1) and (4.1) hold. Integrating by parts, we obtain

  Ĵ(u(·), λ(·), ν, x0) = J(u(·), x0) + ∫ from t0 to tf [ λ̇ᵀ(t)x(t) + λᵀ(t)A(t)x(t) + λᵀ(t)B(t)u(t) ] dt
    + λᵀ(t0)x0 − λᵀ(tf)x(tf) + νᵀDx(tf).  (4.15)

Let us suppose that there is a piecewise continuous control function uo(t) that minimizes (3.2) subject to (3.1) and (4.1), i.e., which causes Dxo(tf) = 0. Next, we evaluate the change in Ĵ brought about by changing uo(·) to uo(·) + εη(·). Note that the change in Ĵ is equal to the change in J if the perturbation εη(·) is such that the perturbed trajectory at time t = tf satisfies

  D[xo(tf) + ξ(tf; ε)] = 0,  (4.17)

i.e., uo(·) + εη(·) ∈ UT. Since Dxo(tf) = 0, it follows that (4.17) holds only if

  Dξ(tf; ε) = εDz(tf; η(·)) = 0.  (4.18)

This requirement is relaxed below, where it is adjoined to the perturbed cost by the Lagrange multiplier ν through the presence of νᵀDx(tf) in (4.15); we return to this point later. Expanding to first order in ε as in Chapter 3, the change in Ĵ is

  ΔĴ = Ĵ(uo(·) + εη(·), λ(·), ν, x0) − Ĵ(uo(·), λ(·), ν, x0)
    = ε ∫ from t0 to tf [ Lx(xo(t), uo(t), t)z(t; η(·)) + Lu(xo(t), uo(t), t)η(t) + λ̇ᵀ(t)z(t; η(·))
        + λᵀ(t)A(t)z(t; η(·)) + λᵀ(t)B(t)η(t) ] dt
      + εφx(xo(tf))z(tf; η(·)) + ενᵀDz(tf; η(·)) − ελᵀ(tf)z(tf; η(·)) + O(ε).  (4.19)

Now let us set

  −λ̇ᵀ(t) = Lx(xo(t), uo(t), t) + λᵀ(t)A(t),  λᵀ(tf) = φx(xo(tf)) + νᵀD.  (4.20)

For fixed ν this is a legitimate choice for λ(·), as (4.20) is a linear ordinary differential equation in λ(t) with piecewise continuous coefficients, having a unique solution. The right-hand side of (4.19) then becomes

  ΔĴ = ε ∫ from t0 to tf [ Lu(xo(t), uo(t), t) + λᵀ(t)B(t) ] η(t) dt + ∫ from t0 to tf O(t; ε) dt + O(ε).  (4.21)

We set

  η(t) = −[ Lu(xo(t), uo(t), t) + λᵀ(t)B(t) ]ᵀ,  (4.22)

which yields a piecewise continuous perturbation, and we now show that, under a certain assumption, a ν can be found such that η(·) given by (4.22) causes (4.18) to hold. We first introduce the following assumption.

Assumption 4.2.3. The p × p matrix

  W̄(t0, tf) = ∫ from t0 to tf DΦ(tf, τ)B(τ)Bᵀ(τ)Φᵀ(tf, τ)Dᵀ dτ = DW(t0, tf)Dᵀ  (4.23)

is positive definite (see [8]). Note that this assumption is weaker than Assumption 4.2.2, being equivalent when p = n; it is called output controllability, where the output is y = Dx.

From the linearity of (3.1) we have that z(t; η(·)) satisfies the equation

  ż(t; η(·)) = A(t)z(t; η(·)) + B(t)η(t),  z(t0; η(·)) = 0,  (4.24)

so that

  z(tf; η(·)) = ∫ from t0 to tf Φ(tf, τ)B(τ)η(τ) dτ,  (4.25)

and, using (4.22),

  z(tf; η(·)) = −∫ from t0 to tf Φ(tf, τ)B(τ)[ Lu(xo(τ), uo(τ), τ) + λᵀ(τ)B(τ) ]ᵀ dτ.  (4.26)

From (4.20) we have

  λ(t) = Φᵀ(tf, t)λ(tf) + ∫ from t to tf Φᵀ(τ, t)Lxᵀ(xo(τ), uo(τ), τ) dτ
       = Φᵀ(tf, t)φxᵀ(xo(tf)) + Φᵀ(tf, t)Dᵀν + ∫ from t to tf Φᵀ(τ̄, t)Lxᵀ(xo(τ̄), uo(τ̄), τ̄) dτ̄.  (4.27)
Premultiplying (4.26) by D and using (4.27), we obtain

  Dz(tf; η(·)) = −∫ from t0 to tf DΦ(tf, τ)B(τ) [ Luᵀ(xo(τ), uo(τ), τ) + Bᵀ(τ)Φᵀ(tf, τ)φxᵀ(xo(tf))
      + Bᵀ(τ) ∫ from τ to tf Φᵀ(τ̄, τ)Lxᵀ(xo(τ̄), uo(τ̄), τ̄) dτ̄ ] dτ
    − [ ∫ from t0 to tf DΦ(tf, τ)B(τ)Bᵀ(τ)Φᵀ(tf, τ)Dᵀ dτ ] ν.  (4.28)

Setting the left-hand side of (4.28) equal to zero, we can, in view of Assumption 4.2.3, uniquely solve for ν in terms of the remaining (all known) quantities in (4.28). Consequently, we have proved that there exists a ν, independent of ε, such that η(·) given by (4.22) causes (4.18) to be satisfied. With this choice of η(·), (4.14) holds, and the change in Ĵ is

  ΔĴ = −ε ∫ from t0 to tf ‖ Lu(xo(t), uo(t), t) + λᵀ(t)B(t) ‖² dt + ∫ from t0 to tf O(t; ε) dt + O(ε) ≥ 0.  (4.29)

Using the usual limiting argument that O(t; ε)/ε → 0 and O(ε)/ε → 0 as ε → 0, the first-order term dominates as ε → 0, so optimality requires

  ∫ from t0 to tf ‖ Lu(xo(t), uo(t), t) + λᵀ(t)B(t) ‖² dt = 0.  (4.30)

Since the integrand is nonnegative, it then follows that a necessary condition for uo(·) to minimize J is that there exists a p-vector ν such that

  Lu(xo(t), uo(t), t) + λᵀ(t)B(t) = 0  ∀ t in [t0, tf],  (4.31)

where

  −λ̇ᵀ(t) = Lx(xo(t), uo(t), t) + λᵀ(t)A(t),  λᵀ(tf) = φx(xo(tf)) + νᵀD.  (4.32)

As in Section 3.2, the necessary condition derived above can be restated in terms of the Hamiltonian to yield a weak version of Pontryagin's Principle for the optimal control problem formulated with linear dynamics and linear terminal constraints.

Theorem 4.2.1. Suppose that uo(·) minimizes the performance criterion (3.2) subject to the dynamic system (3.1) and the terminal constraint (4.1), and that Assumptions 3.2.1, 3.2.3, 3.2.4, 4.2.1, and 4.2.3 hold. Suppose further that H(x, u, λ, t) is defined according to (3.23). Then there exists a p-vector ν such that the partial derivative of the Hamiltonian with respect to the control u(t) is zero when evaluated at the optimal state and control xo(t), uo(t), viz.,

  Hu(xo(t), uo(t), λ(t), t) = 0  ∀ t in [t0, tf],  (4.33)

where

  −λ̇ᵀ(t) = Hx(xo(t), uo(t), λ(t), t),  λᵀ(tf) = φx(xo(tf)) + νᵀD.  (4.34)

4.2.2 Pontryagin Necessary Condition: Special Case

We return to the special case specified by (4.2). From (4.7) it follows that uo(·) given by (4.6) steers x0 to the origin at time tf, and from (4.12) we concluded that uo(·) is in fact minimizing. We now show that uo(·) satisfies Pontryagin's necessary condition. Indeed, by substituting (4.6) into (4.28), ν becomes, with D = I,

  ν = 2W⁻¹(t0, tf)Φ(tf, t0)x0.  (4.35)

Also, from (4.27) and with D = I,

  λ(t) = 2Φᵀ(tf, t)W⁻¹(t0, tf)Φ(tf, t0)x0.  (4.36)

Finally,

  Huᵀ(xo(t), uo(t), λ(t), t) = 2uo(t) + 2Bᵀ(t)Φᵀ(tf, t)W⁻¹(t0, tf)Φ(tf, t0)x0,  (4.37)

which, by (4.6), is zero for all t in [t0, tf].
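The special-case formulas tie together numerically as well. The sketch below (same hypothetical double-integrator data as before) forms ν from (4.35), λ(t) from (4.36), and checks that Hu in (4.37) vanishes along the path; the cancellation is exact by construction, so this is purely a consistency illustration of how (4.6), (4.35), and (4.37) interlock.

```python
import numpy as np
from scipy.linalg import expm

# Same hypothetical data as in the Gramian sketch above.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
t0, tf, x0, n = 0.0, 1.0, np.array([1.0, 0.0]), 500
dt = (tf - t0) / n
ts = t0 + dt * np.arange(n)

Phi = lambda t: expm(A * (tf - t))
W = sum(Phi(t) @ B @ B.T @ Phi(t).T * dt for t in ts)
nu = 2.0 * np.linalg.solve(W, Phi(t0) @ x0)          # eq. (4.35)

worst = 0.0
for t in ts:
    lam = Phi(t).T @ nu                              # lambda(t), eq. (4.36)
    uo = -0.5 * (B.T @ lam)                          # eq. (4.6) rewritten
    Hu = 2.0 * uo + B.T @ lam                        # eq. (4.37)
    worst = max(worst, float(np.abs(Hu).max()))
print("max |Hu| along the path:", worst)             # ~ 0
```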
x0 ) + T tf t0 T ˙ λT (t)x(t) + λT (t)f (x(t). effects.38) satisfied. uo (t). Using (3. η(·)) + O(ε).46) + εHu (xo (t). . x0 ) − J(uo (·). we adjoin the system equation (3. in J) brought about by changing uo (·) to uo (·) + εη(·) keeping (4. (4. x0 ) whenever (3.13) and (4. x0 ) tf = t0 ˙ [εHx (xo (t). ν.38) and which satisfy η (·) ≤ ε for ¯ ¯ ¯ all t in [t0 .e. Weakly first-order optimal merely implies that if J is not weakly locally optimal at uo (·). Following the approach in (4. λ(·).44) + λ (t0 )x0 − λ (tf )x(tf ) + ν T ψ(x(tf )).52) this change is ˆ ˆ ˆ ∆J = J(uo (·) + εη(·). ε)] dt − ελT (tf )z(tf . λ(·)..38) to J and integrate by parts to obtain ˆ J(u(·).1 Sufficient Condition for Weakly First-Order Optimality Weakly first-order optimal does not usually imply weakly locally optimal. t)z(t.38) are satisfied. and it follows that ˆ J(u(·).41) and (4. or ˆ ¯ ∆J ≥ 0 ∀ uo (·) + η (·) ∈ UT . η(·)) + εφx (xo (tf ))z(tf . ν. λ(t).45) ˆ We now evaluate the change in J (i. uo (t).41) and the terminal constraint (4. λ(·). Weak First-Order Optimality 123 all perturbations η (·) of uo (·) which maintain (4.15). λ(t). t) dt (4. tf ]. η(·)) (4.3. Here. zero) to first order in ε is that Hu (xo (t).49) and whenever φ or one of its derivatives appears in a formula.1 Although the derivation of the change in J is similar to that given in Chapter 3. where λ0 ≥ 0. if we set Chapter 4.3. u. All that was necessary was to note that if (4.3. λT (tf ) = φx (xo (tf )) + ν T ψx (xo (tf )). The statement of the principle is then that if uo (·) minimizes J subject to the . because we have derived a sufficient condition for weak first-order optimality. λ. t) = 0 ∀ t in [t0 .47) we see that a sufficient condition for the change in J to be nonnegative (in fact. t).48) ˆ Remark 4. it was not necessary to actually construct a perturbation εη(·) which maintains satisfaction of (4. we introduce the notion of normality.38). Remark 4. Before we state our results. λ(t). then J is nonnegative to first order for any weak perturbation that maintains satisfaction of (4. because no terminal constraints had to be maintained in the presence of arbitrary variations. uo (t). we could conclude that the first-order conditions were necessary for weakly local optimality. (4. λ0 .124 Now. t) + λT f (x. t) = λ0 L(x.50) (4. (4. (4.48) holds. tf ]. λ(t). the Hamiltonian H is defined as H(x. uo (t). Terminal Equality Constraints ˙ −λT (t) = Hx (xo (t). it is premultiplied by λ0 .38). t). u. to which we refer.2 (Normality) In the derivations of Pontryagin’s Principle as a necessary condition for optimality. u. 4 are satisfied. (4.51) Moreover.3. the above condition is sufficient for J to be weakly first-order optimal at uo (·). the problem is referred to as “normal. tf ]. (4. It is evident from the proofs of the necessary conditions in Chapter 3 that every free end-point control problem is normal and the control problem with linear dynamics and linear terminal constraints is normal if Assumption 4. uo (t). 3. the problem is normal.3. λ(t).53) ∀ t in [t0 . and H(x. uo (t).1.41) and (4.1 Suppose that Assumptions 3. Suppose further that uo (·) ∈ UT minimizes the performance criterion (3. λT (tf ) = φx (xo (tf )) + ν T ψx (xo (tf )).3.4.1. and 4. Then there exists a p-vector ν such that Hu (xo (t).3 is satisfied. λ(t). then there exist λ0 ≥ 0 and ν not all zero. t).3.” Throughout this book we assume that when we refer to statements and proofs of Pontryagin’s Principle as a necessary condition for optimality. 
Now, if we set

  −λ̇ᵀ(t) = Hx(xo(t), uo(t), λ(t), t),  λᵀ(tf) = φx(xo(tf)) + νᵀψx(xo(tf)),  (4.47)

we see that a sufficient condition for the change in J to be nonnegative (in fact, zero) to first order in ε is that

  Hu(xo(t), uo(t), λ(t), t) = 0  ∀ t in [t0, tf].  (4.48)

Remark 4.3.1. Although the derivation of the change in Ĵ is similar to that given in Chapter 3, here, because we have derived a sufficient condition for weak first-order optimality, it was not necessary to actually construct a perturbation εη(·) which maintains satisfaction of (4.38). All that was necessary was to note that if (4.48) holds, then the change in J is nonnegative to first order for any weak perturbation that maintains satisfaction of (4.38). In Chapter 3, because no terminal constraints had to be maintained in the presence of arbitrary variations, we could conclude that the first-order conditions were necessary for weakly local optimality.

Before we state our results, we introduce the notion of normality.

Remark 4.3.2 (Normality). In the derivations of Pontryagin's Principle as a necessary condition for optimality, to which we refer, the Hamiltonian H is defined as

  H(x, u, λ0, λ, t) = λ0L(x, u, t) + λᵀf(x, u, t),  (4.49)

where λ0 ≥ 0, and whenever φ or one of its derivatives appears in a formula, it is premultiplied by λ0. The statement of the principle is then that if uo(·) minimizes J subject to the dynamic system, control, and terminal constraints, then there exist λ0 ≥ 0 and ν, not all zero, such that certain conditions hold. If on an optimal path the principle can be satisfied upon setting λ0 = 1 (any positive λ0 can be normalized to unity because of the homogeneity of the expressions involved), the problem is referred to as "normal." Throughout this book we assume that when we refer to statements and proofs of Pontryagin's Principle as a necessary condition for optimality, the problem is normal; consequently, λ0 does not appear in any of our theorems. It is evident from the proofs of the necessary conditions in Chapter 3 that every free end-point control problem is normal, and the control problem with linear dynamics and linear terminal constraints is normal if Assumption 4.2.3 is satisfied.

Theorem 4.3.1. Suppose that Assumptions 3.2.1, 3.2.2, 4.2.1, and 4.2.4 are satisfied. Suppose further that uo(·) ∈ UT minimizes the performance criterion (3.2) subject to (3.41) and (4.38). Then there exists a p-vector ν such that

  Hu(xo(t), uo(t), λ(t), t) = 0  ∀ t in [t0, tf],  (4.51)

where

  −λ̇ᵀ(t) = Hx(xo(t), uo(t), λ(t), t),  λᵀ(tf) = φx(xo(tf)) + νᵀψx(xo(tf)),  (4.52)

and

  H(x, u, λ, t) = L(x, u, t) + λᵀf(x, u, t).  (4.54)

Moreover, the above condition is sufficient for J to be weakly first-order optimal at uo(·).

Proof: A rigorous proof that Pontryagin's Principle is a necessary condition for optimality is beyond the scope of this book; rigorous proofs are available in [44, 25, 38]. Upon assuming that the problem is normal, the above conditions result. The second part of the theorem is proved by the derivation leading to (4.48).

Example 4.3.1 (Rocket launch example). Let us consider the problem of maximizing the terminal horizontal velocity component of a rocket in a specified time [t0, tf], subject to a specified terminal altitude and a specified terminal vertical velocity component. This launch is depicted in Figure 4.1 (Figure 4.1: Rocket launch example). A simplified mathematical model of the vehicle is

  ẋ1(t) = ṙ = x3(t),
  ẋ2(t) = ḣ = x4(t),
  ẋ3(t) = v̇ = T cos u(t),
  ẋ4(t) = ẇ = T sin u(t) − g,  (4.55)

where x1(t) is the horizontal component of the position of the vehicle, x2(t) is the altitude or vertical component of its position, x3(t) is its horizontal component of velocity, x4(t) is its vertical component of velocity, u(t) is the inclination of the rocket motor's thrust vector to the horizontal, g is the (constant) gravitational acceleration, and T is the constant specific thrust of the rocket motor. We suppose the following initial conditions for the rocket at time t = t0:

  x1(t0) = x10,  x2(t0) = x20,  x3(t0) = 0,  x4(t0) = 0.

The problem, then, is to determine u(·) in the interval [t0, tf] to minimize

  J(u(·), x0) = −x3(tf)

subject to the terminal constraints

  x2(tf) = x2d,  x4(tf) = x4d,  (4.56)

where x2d and x4d are, respectively, the desired altitude and vertical velocity. From (4.54) we have

  H(x, u, λ, t) = λ1x3 + λ2x4 + λ3T cos u + λ4T sin u − λ4g,  (4.57)

and from (4.52),

  −λ̇1(t) = 0,  λ1(tf) = 0,
  −λ̇2(t) = 0,  λ2(tf) = ν2,
  −λ̇3(t) = λ1(t),  λ3(tf) = −1,
  −λ̇4(t) = λ2(t),  λ4(tf) = ν4.  (4.58)

From (4.58) we conclude that

  λ1(t) = 0,  λ2(t) = ν2,  λ3(t) = −1,  λ4(t) = ν4 + (tf − t)ν2  (4.61)

for t in [t0, tf]. From (4.51),

  Hu(x, u, λ, t) = −λ3T sin u + λ4T cos u = 0,  (4.62)

which is satisfied if and only if

  uo(t) = arctan( λ4(t)/λ3(t) ) = arctan[ −ν4 − (tf − t)ν2 ].  (4.63)

Equation (4.63) gives the form of the control as a function of t. What is important to note here is that the form of u(·) has been determined; the free parameters ν2 and ν4 have to be chosen so that, when (4.63) is applied to (4.55), the terminal equality constraints (4.56) are satisfied.
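In practice, (4.63) reduces the rocket problem to a two-parameter shooting problem in (ν2, ν4). A minimal sketch follows; the values of T, g, tf, the targets x2d, x4d, and the initial guess are hypothetical, chosen only so that the targets are reachable. It uses scipy's fsolve to adjust (ν2, ν4) until the terminal constraints (4.56) are met.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical rocket data: thrust T, gravity g, horizon tf, targets.
T, g, tf, n = 2.0, 1.0, 10.0, 4000
x2d, x4d = 30.0, 5.0
dt = tf / n

def terminal_error(nu):
    nu2, nu4 = nu
    x = np.zeros(4)                                # (r, h, v, w), start at rest
    for k in range(n):
        t = k * dt
        u = np.arctan(-nu4 - (tf - t) * nu2)       # control law, eq. (4.63)
        x += dt * np.array([x[2], x[3], T * np.cos(u), T * np.sin(u) - g])
    return [x[1] - x2d, x[3] - x4d]                # altitude / climb-rate errors

nu2, nu4 = fsolve(terminal_error, x0=[-0.1, -0.5])  # hand-picked initial guess
print("nu2, nu4:", nu2, nu4, "residual:", terminal_error([nu2, nu4]))
```

The solver returns a "linear tangent" pitch program: tan uo decreases linearly in time, lofting the vehicle early and then trading climb for horizontal acceleration.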
λ4 (t) = ν4 + (tf − t)ν2 for t in [t0 .51) is satisfied if and only if uo (t) = arctan λ4 (t) λ3 (t) (4. the optimal solution needs to be resolved numerically.59) we have Hu (x.58) are satisfied. Naturally. which satisfies Theorem 4.63) Equation (4. From (4. the free parameters ν2 and ν4 have to be chosen so that when (4. Terminal Equality Constraints λ2 (t) = ν2 . Chapter 4. Therefore. u. tf ].63) is applied to (4.3. this is the so-called penalty function method of satisfying terminal equality constraints [11].54).64) with respect to u(·).38).38) into an approximately equivalent unconstrained one by adding the term ρψ T (x(tf ))ψ(x(tf )) to J to form a new performance criterion. causes (4. Similar remarks hold when constructing a control function u(·) which satisfies ˜ (4.52) and (4. Choose a nominal control uN (t) and let xN (t) be the resulting path. x0 ) + ρψ T (x(tf ))ψ(x(tf )). yields (4. for ρ sufficiently large. numerically.64) with respect to u(·) tends to the minimum of (3. As indicated for a special case in Section 4. it can be shown that under fairly weak assumptions. ρ. t) ⇒ xN (·). as ρ → ∞. x0 ) = J(u(·). ψ(xN (tf )) = 0. ˙ (4. rather.3.51). Weak First-Order Optimality 129 in Section 3.2.4. one would have to determine. It turns out that engineers quite often sidestep the difficulty of determining the additional parameters ν by converting the control problem with a terminal constraint (4. . Indeed.65) This path may be nonoptimal and the terminal constraint may not be satisfied. This path satisfies the nonlinear dynamic equation xN (t) = f (xN (t).2) subject to (4.38) and Pontryagin’s Principle. one would have to resort to numerical techniques to determine both u(·) and ν. this is discussed further in Chapter 5.64) where ρ > 0. ˜ the class of linear quadratic optimal control problems allows an explicit solution. Here. (4.4. whether there exists a ν which. when used in (4. ¯ J(u(·).38) to be approximately satisfied. the minimum of (4. uN (t). Minimization of (4.2. Steepest Descent Approach to Terminally Constrained Optimization Problems The following steepest descent algorithm [11] is a method for finding a constrained minimum.3. (4. t)λψ (t). fixed δx(t0 ) = 0.66) where δu(·) = εη(·). uN (t). uN (t).72) . uN (t). t)λφ (t) − LT (xN (t).67) where δx(t) = εz(t. the change in the performance index is J(uN +1 (·). t))δu(t) dt. xo ) = δJ tf = φx δx(tf ) + t0 (Lx (xN (t). (4. t) t0 + λφ (t)T fu (xN (t). To construct these predictions. T λψ (tf ) = ψx (tf ). Then. uN (t). The objective is to predict how the control perturbation δu(t) will affect the cost criterion and the terminal constraint. xo ) − J(uN (·). η(·)). t). uN (t). t)δx(t) + fu (xN (t). (4. uN (t).130 Consider perturbations in uN (t) as Chapter 4.69) T where for x(t0 ). x (4.68) then. t)δu(t). t)δx(t) + Lu (xN (t). (4. Similarly. uN (t). t)δu(t))dt (4. t)δu(t) dt. Terminal Equality Constraints uN +1 (·) = uN (·) + δu(·). the first-order term in the Taylor expansion of the dynamics is δ x(t) = fx (xN (t).71) λφ (tf ) = φx (tf )T . where T ˙ λφ (t) = −fx (xN (t). the change in the terminal constraint is δψ(tf ) = ψx (tf )δx(tf ) = λψ (t0 )δx(t0 ) + T tf t0 λψ (t)fu (xN (t). uN (t). ˙ (4. uN (t). uN (t).70) tf = λφ (t0 )T δx(t0 ) + (Lu (xN (t). consider the influence function λψ (t) ∈ Rn×p associated with the terminal constraint functions as T ˙ λψ (t) = −fx (xN (t). uN (t).3.71).74) T T where ν is chosen so that a desired change in δψ(tf ) is met.73) was subtracted from (4.76) ψx (xN (tf ))δx(tf ) + t0 tf λψ (t)fu (xN (t). 
uN (t).77) .70) to obtain. Choose δu(t) =− λφ (t) + ν T λψ (t) fu (xN (t). uN (t). t) T T T . t)λφ (t)dt T λψ (t)fu (xN (t). (4. (4. uN (t). t)fu (xN (t). uN (t). t)δu(t))dt = 0 (4. after some manipulations. t)dt. t) + Lu (xN (t). uN (t). uN (t). t)δx(t) t0 + λφ (t)fu (xN (t). adjoin δψ(tf ) to δJ with a Lagrange multiplier ν ∈ Rp as δJ + ν T δψ(tf ) = λφ (t0 ) + ν T λψ (t0 ) δx(t0 ) tf T T + t0 λφ (t) + ν T λψ (t) (t)fu (xN (t). uN (t). uN (t). u (4. uN (t). t)LT (xN (t). uN (t). t) δu(t) dt. t)fu (xN (t).75) ψx (xN (tf ))δx(tf ) =− − then tf tf t0 tf t0 T λψ (t)fu (xN (t). uN (t). T T (4. To make an improvement in δJ and decrease the constraint violation. t) λφ (t) + ν T λψ (t) T T T T dt λψ (t)T fu (xN (t). uN (t). Weak First-Order Optimality and the zero term tf t0 131 d T λφ (t)δx(t) dt − dt T tf (−Lx (xN (t). t) + Lu (xN (t). t)LT (xN (t). t)dt u T + =− t0 tf t0 T λψ (t)fu (xN (t). uN (t). (4. uN (t). uN (t). t)λψ (t)dt ν. t)fu (xN (t).4. t) + Lu (xN (t). Choose the desired change in the terminal constraint δψ(tf ) and the desired change in the cost through the choice of . t) are orthogonal. T where the inverse exists from Assumption 4.72) backward from its terminal condition to the initial time t0 and store the values over the trajectory. The above algorithm is summarized in the following steps: 1. t) + Lu (xN (t). tf ]. uN (t). If δψ(xN (tf )) = ψx δx(tf ) = 0 and δx(t0 ) = 0. . uN (t). then note from (4. uN (t). These choices are made small enough to retain the assumed linearity but large enough for the algorithm to converge quickly. uN (t).69) that the functions λψ (t)fu (xN (t). uN (t). t)dt u (4. t) and (λφ (t) + ν T λψ (t))fu (xN (t).132 Solve for ν as ν = − tf Chapter 4. uN (t). uN (t). Choose the nominal control uN (t) over the interval t ∈ [t0 .68) and (4. uN (t). Store the values over the trajectory.2. t)λψ (t)dt tf t0 × ψx (xN (tf ))δx(tf )/ + tf λψ (t)fu (xN (t). ˆ δ J = δJ + νδψ(tf ) = − tf t0 T T T λφ (t) + ν T λψ (t) fu (xN (t). 4. t)fu (xN (t). t)λφ (t)dt . Integrate (4.79) and therefore the cost becomes smaller on each iteration.65) forward from the initial conditions to the terminal time tf to obtain the nominal state xN (t). uN (t). t)fu (xN (t).78) T + t0 T λψ (t)fu (xN (t). uN (t). 2 T T (4.3 and D = ψx (xN (tf )). uN (t). 2. Terminal Equality Constraints λ t0 ψT −1 T (t)fu (xN (t). In any case. t) dt. t)LT (xN (t). 3. Integrate (4. 81) However.75) and form a new nominal control as given in (4. t) ≤ H(xo (t).51) replaced by (4. (4. ¯) and (t + ε. ε) =  εη(t).e.4) which maintains (4. Indeed.82) .. λ(t).4.4 Strong First-Order Optimality It is clear that if H(xo (t).80) are also sufficient conditions for J to be weakly first-order optimal at uo (·). uo (t). It is clear also that (4. Compute the perturbation δu(t) from (4. Consequently. ti + εδi ] . λ(t). Check to see if the actual change in the constraints and cost are close to the predicted values of the perturbed constraint (4.4. This is where the choices of and δψ(tf ) are checked to determine if they are made small enough to retain the assumed linearity but large enough for the algorithm to converge quickly. u(t). t ≤ t ≤ t + ε. (4. t) ∀ u(t) ∈ UT .51)–(4. t (4.68) and (4.54) with (4. i=1 (4. N. we shall permit the larger class of strong perturbations η(·. 4.38). i.72). ε) given by   η(t). n η(t. ti ≤ t ≤ ti + εδi . 6.69) and perturbed cost (4. The integrals in (4. . . 
ε) which is made up of a weak perturbation εη(·) and a strong perturbation (Section 3.66) and repeat step 2.78) can be computed along with the influence functions obtained from (4.80) then (4.71) based on the assumed linear perturbation theory.81). there are more elaborate strong perturbations than (4. tf ]. ε) = ¯ ¯ η(t). i = 1. ¯ εη(t) in [t0 . .80) ensures that J is optimal to first order for a perturbation η(·. t ∈ I = [t0 . Compute ν from (4.51) is satisfied. . Strong First-Order Optimality 133 5. a perturbation of the form η(t. tf ] − [ti .78). Strong first-order optimality merely implies that if J is not strongly locally optimal at uo (·). Definition 4. is nonnegative.. λ(·).4. This richer class is used to satisfy the terminal constraints while optimizing the cost criterion. x0 ) whenever (3.41) and (4. The intervals [ti . since N. t) + λT (t)x(t) dt + φ(x(tf )) + ν T ψ(x(tf )) + λT (t0 )x0 − λT (tf )x(tf ).2) is strongly first-order optimal subject to (3. Note that. and is piecewise continuous on the remaining subintervals of [t0 .2 J of (3. x0 ) = tf t0 ˙ H(x(t).4. λ(t). (4. strong first-order optimality does not usually imply strong local optimality. We now evaluate ˆ the change in J (i.41) and (4. We have the expression ˆ J(u(·). ν. may be contiguous.1 J of (3. δi > 0.38) at ¯ uo (·) if there exists an ε > 0 such that the change in J is nonnegative for all pertur¯ bations η(·.38) are satisfied.82). in J) caused by changing uo (·) to uo (·) + η(·. Definition 4.81). . tf ]. effects. .83) which is equal to J(u(·). tf ]. although nonoverlapping. Terminal Equality Constraints where η(·) is continuous on the nonoverlapping intervals [ti . η(·. and which satisfy ξ(t. We then have the following definitions.38) at uo (·) if the first-order change in J. ε)) ≤ ε. ti +εδi ]. u(t).38). N . ε) of the form (4. and not first-order. similar to the weak first-order optimality.41) and (4. and η(·) are arbitrary. which maintains (4. . . ε) of uo (·) which maintain (4. caused by any strong perturbation η(·. ti + εδi ].e. This class is much larger than (4. ε) while keeping . i = 1.134 Chapter 4.38). this must be because of higher-order. for all t in [t0 .2) is strongly locally optimal subject to (3. u. . ε)) − φ(xo (tf )) + ν T ψ(xo (tf ) + ξ(tf .4. ε)). (4. However. λ(t). t) + λT (t)ξ(t. That is. ν. but u(·) + η(·) ∈ UT . ε)) = εz(t. ti + εδi )] +ε I Hu (xo (t). λ(ti + εδi ). ε) ∀ t in [t0 . stated below in this section. λ(·).84). t) ˙ − H(xo (t).4. x0 ) − J(uo (·). uo (ti + εδi ). ε). this assumed continuous differentiability of H(xo (t). λ(·). Along the lines of Sections 3. All the functions have been assumed continuously differentiable in all their arguments. η(·)) + O(t. uo (ti + εδi ) + η(ti + εδi ). λ(t).86) is nonnegative to first order in ε if (4.84) (4. t) and f (x. to obtain N ˆ ∆J = ε i=1 δi [H(xo (ti + εδi ). λ. ε). λ. tf ].80) implies Hu (xo (t). uo (t). ε)).1. uo (t). and t but only continuous in u.38) satisfied. In this way we can find a weak variation as u(·) + εη(·) ∈ UT that satisfies the variation in the terminal constraints. L(x. as in Sections 3. It follows that (4. u. λ(t). uo (t). ε)) dt + φ(xo (tf ) + ξ(tf . one can expand (4.80) holds.4. uo (t) + η(t. t)η(t)dt + tf t0 O(t. x0 ) tf 135 = t0 H(xo (t) + ξ(t. We now assume that H(x. λ(t). ν. We have thus proved the second part of Theorem 4. η(·. ˆ ˆ ˆ ∆J = J(uo (·) + η(·.86) where the interval I is defined in (4. t) is differentiable in x. uo (t). one can show that ξ(t. (4. ε)) − ν T ψ(xo (tf )) − λT (tf )ξ(tf .4. η(·. 
ti + εδi ) − H(xo (ti + εδi ). Note that (4. λ(t). Strong First-Order Optimality (4.3 and 3.3 and 3. t) are assumed continuous and not differentiable in u.4. t) = 0.85) Using this and (4. ε)dt + O(ε). η(·. η(·. η(·. u.52). λ(ti + εδi ).82). t)η(t) is more restrictive than needed to prove strong first-order optimality with terminal constraints. the change ∆J becomes N ˆ ∆J = ε i=1 δi [H(xo (ti + εδi ). λ(t). λ(ti + εδi ).89) then N ˆ δJ = ε i=1 δi [H(xo (ti + εδi ). t).85). uo (ti + εδi ). uo (t). (4. ti + εδi )] + I [H(xo (t). uo (ti + εδi ). (4. uo (t) + εη(t). uo (t). λ(tf ) = φx (xo (tf )) + ν T ψx (xo (tf )). t) ˙ + εHx (xo (t). (4. we again obtain the expansion (4. t) − H(xo (t). η(·)) = εHx (xo (t). λ(t). Let T ˙ λ(t) = −Hx (xo (t). ti + εδi )] + I [H(xo (t). η(·)) tf + t0 O(t. η(·)) + O(ε). ti + εδi ) − H(xo (ti + εδi ). λ(t). ε) ∈ UT . for η(·. The change in J due to changing uo (·) to uo (·) + η(·. Using this expansion of the ˆ state. η(·)) + ελT (t)z(t. λ(t). uo (t). λ(t). uo (t) + εη(t). uo (t). λ(t). λ(ti + εδi ). uo (ti + εδi ) + η(ti + εδi ). ε)dt + O(ε). t)z(t.90) .84). ti + εδi ) − H(xo (ti + εδi ). (4.136 Chapter 4. λ(ti + εδi ). t)] dt. Due to the continuous differentiability of f with respect to x. λ(ti + εδi ). t)z(t. λ(t).82). η(·))]dt + ε[φx (xo (tf )) + ν T ψx (xo (tf )) − λ(tf )T ]z(tf . uo (ti + εδi ) + η(ti + εδi ). uo (t). ε) defined in (4. Terminal Equality Constraints The objective is to remove the assumed differentiability of H with respect to u ˆ in Equation (4.87) In the above we used the result εHx (xo (t). t)z(t. λ(t). t) − H(xo (t). is given in (4.86).88) deduced by continuity. uo (t) + εη(t). 2.4. λ(t). the above conditions result. and 4.2. t). and upon assuming normality. u. u(t). t) ≤ H (xo (t). a sufficient condition for δ J ≥ 0 is that H(xo (t). t).4. λ(t). u(·) ∈ UT . (4. Strong First-Order Optimality ˆ For uo (·) + η(·. t) ≤ H(xo (t). t) − H(xo (t). t) ≥ 0 or H(xo (t). and (4.4 are satisfied. Since a sufficiency condition for strong first-order optimality is sought. Rigorous proofs of the necessity of Pontryagin’s Principle are available in [44. Then there exists a p-vector ν such that H (xo (t).4. t) 137 (4. λ(t). λ(t). 3.4. u(t). The second part of the theorem is proved just before the theorem statement. λ(t).1. This completes the proof of Theorem 4. 38]. u. uo (t). u. λ(t). . and H(x. Theorem 4. uo (t).38).1. u(t). then only Assumption 4. t) + λT f (x.2.96) (4. tf ].4. λT (tf ) = φx (xo (tf )) + ν T ψx (xo (tf )).4. Suppose further that uo (·) ∈ UT minimizes the performance criterion (3. the above condition is sufficient for J to be strongly first-order optimal at uo (·). 25. 10. 3.93) Moreover. which is stated below.2.41).3. 4.95) ∀ t in [t0 . (4. λ.94) (4.2. uo (t).2) subject to (3. t) = L(x.1 is required rather than an explicit assumption on controllability.1 Suppose that Assumptions 3. uo (t). λ(t).92) for uo (·). t) where ˙ −λT (t) = Hx (xo (t).91) (4.1. ε) ∈ UT . 2. (4. Then there exists a p-vector ν such that H(xo (t). t) ≤ H(xo (t).38) and (3. (4. u(t). then the terms under the summation sign in (4. 4.2. uo (t).1. t) = εHu (xo (t).2. Terminal Equality Constraints 4. this does not introduce any difficulty into our straightforward variational proof that Pontryagin’s Principle is sufficient for strong first-order optimality. t)η(t) + O(ε).4. Again. 3. uo (t) + εη(t). described in Section 4. uo (t). λ(t). u(t) ∈ UBT . u(t) ∈ UT .4. The latter follows from (4.2) subject to (3. (4.2. we can state the following theorem. uo (t) + εη(t) ∈ UT . 
This follows because if (4. In view of the above. we define the set of bounded piecewise continuous control that meets the terminal constraints as UBT ⊂ UT . uo (t). 38] are essential.3.97) where uo (t).138 Chapter 4.4 are satisfied. λ(t).93) because H(xo (t). 3. Theorem 4. and the first-order term dominates for small ε.2 Suppose Assumptions 3.4. Suppose further that uo (·) minimizes the performance criterion (3.1 Strong First-Order Optimality with Control Constraints If u(t) is required to satisfy (3. 10. and 4. λ(t).99) ∀ t ∈ [t0 . t) where uo (t). The difficulty. of deriving necessary conditions for optimality when terminal constraints are present is increased by the presence of (3.106). λ(t).93) holds for uo (·). λ(t).98) . (4.2.106).4. tf ].3.41).90) are nonnegative. 25.1.106) and the techniques used in [44. t) − H(xo (t). 96). Problems with state-variable inequality constraints are more complex and are beyond the scope of this book.103) (4.1 to see what further insights Theorem 4.4. Strong First-Order Optimality 139 and λ(·) and H are as in (4.2 gives. t) = −λ3 T sin u + λ4 T cos u = 0.3.4. The extension to mixed control and state constraints is straightforward and can be found in [11].4.1 We have considered bounds only on the control variable. sin uo = −λ4 cos uo . Remark 4.2 states that H should be minimized with respect to the control. which implies that tan uo (t) = because of (4. Theorem 4. In this example the control is unrestricted so minimization of H implies the necessary condition Hu (x. from (4. Now Huu (x.4.94) and (4.99). However.102) λ4 (t) = −λ4 (t) λ3 (t) (4. the above condition is sufficient for J to be strongly first-order optimal at uo (·). (4.1 (Rocket launch example.101).4.4. Using the fact that. necessary conditions for optimality with state-variable inequality constraints can be found in [30]. λ.101) (4.61). u. Moreover. t) = −λ3 T cos u − λ4 T sin u = T cos u − λ4 T sin u. λ. subject to (4. Example 4. revisited) We first return to the example of Section 4.100) . u. globally minimizes H and hence also H(x.. λ.106) = (−T cos u + λ4 T sin u)2 + (T sin u + λ4 T cos u)2 = T 2 (1 + λ2 ).100). i. t) = T cos uo + T λ2 cos uo 4 = T (1 + λ2 ) cos uo > 0 for − 4 π π < uo < . uo .104) implies that H has a local minimum with respect to u when − π < uo < 2 π 2 which satisfies (4. 2 2 (4. because (4.100) holds. given by (4.e. ˜ H(uo ) = −T (1 + λ2 ) cos uo < 0. However. Terminal Equality Constraints Huu (x. This local minimum value of H. viz. From (4.101). x0 ) = min u t0 u3 (t)dt (4. λ.104) Inequality (4.108) . The conclusion is that (4.107) we deduce that H 2 achieves its global maximum value when Hu = 0.105) (4.2 Our next illustrative example has a weak first-order optimum but not a strong first-order optimum. is obtained when uo is substituted into ˜ H(u) = −T cos u + λ4 T sin u. (4.140 we see that Chapter 4.106) corresponds ˜ ˜ to H taking on a negative value.107) ˜ which is independent of uo .4. uo .59) that do not depend on u.63) satisfies (4.98). u.. Example 4. t). 4 (4. 4 Now examine 2 ˜ ˜2 ˜ H 2 + Hu = H 2 + Hu (4. ignoring the terms in (4. Consider tf J(u(·). u. uo (t). t).113) (4. u(t). λ(t). minu H .114) (4.111) (4. Here H(x. x(t0 ) = 0.110) (4. Strong First-Order Optimality subject to x(t) = u(t).3.2 then implies that J is not strongly first-order optimal at uo (·). ˙ and x(tf ) = 0. where x and u are scalars. however. Stated explicitly. λ. u. 141 (4.4. λ. t) = 0 ∀ t in [t0 . t) = 3u2 + λ.112) (4. λ(t). and taking ν = 0 we see that uo (t) = 0 ∀ t in [t0 .4. 
It follows from Theorem 4.1 that J is weakly first-order optimal at uo (·).4. From (4.115) Note. that u(t) does not minimize H(x(t). Theorem 4. t) = u3 + λu and ˙ −λ(t) = 0. which has no minimum with respect to u(t) because of the cubic term in u(t).111) Hu (x. tf ] causes Hu (xo (t). λ(tf ) = ν.109) (4. tf ]. x0 ) = = − tf t0 η 3 (t. ·) and the p-dimensional function ψ(·. ε) to be zero for all 0 < ε < tf − t0 so that (4.110) is maintained.6. tf ) and assume that the first partial derivatives are continuous.142 Chapter 4. 4. t)dt (4. ε). we can also conclude that (4.. This is easily confirmed directly by using the strong perturbation η(t. ε) = −(tf − t0 − ε) for t0 ≤ t ≤ t0 + ε. tf ) and ψ(x(tf ).118) . J(uo (·) + η(·. tf ) = φ(x(tf ). i. Terminal Equality Constraints as u → −∞ ⇒ J → −∞. x0 . ε)dt t0 +ε (tf − t0 − ε)3 dt + 3 tf t0 +ε ε3 dt = −(tf − t0 − ε) ε + (tf − t0 − ε)ε3 = (tf − t0 − ε)ε[ε2 − (tf − t0 − ε)2 ].108) is not strongly first-order optimal at uo (·). Furthermore. 0 < ε < tf − t0 . for small 0 < ε (4. we here allow φ and ψ to depend explicitly on the unspecified final time tf .1 The function φ(·. As (4.e. ·) are once continuously differentiable in x and t. tf ) + t0 L(x(t). ε for t0 + ε < t ≤ tf (4. viz. This perturbation clearly causes ξ(tf . which. we write these functions as φ(x(tf ).108) does not have a strong local minimum at uo (·). verifying that (4.5 Unspecified Final Time tf As in Section 3.116) that is added to uo (t) = 0.. We now derive a sufficient condition for tf J(u(·).110) is satisfied for all values of ε.5. u(t). Assumption 4. as in the next assumption. J = −ε(tf − t0 )3 + O(ε).117) t0 1. ti + εδi )] +ε I Hu (xo (t).38) is f satisfied. λ(ti + εδi ). using (4. ν. uo (ti + εδi ). λ(t). x0 . ε) with η(·.. λ(t). x0 . λ(·). ε). tf ) + λT (t0 )x0 − λT (tf )x(tf ).94). uo (t). x0 . to + ε∆) − J(uo (·). we obtain. to + ) + ν T ψtf (xo (to + ).82) and to to to + ε∆. tf ) = tf t0 (4. f Upon defining ˆ J(u(·).121) where to + denotes the constant immediately to the right of to (i. ν.e. λ(ti + εδi ).4. ti + εδi ) − H(xo (ti + εδi ). to + ε∆] as any continuous f f function with values in UT . we define uo (t) + η(t.5.119) ˙ H(x(t). . to + ) f f f f + H(xo (to + ). to + ) f f f f f tf + t0 O(t. f f If ∆ > 0. t) + λT (t)x(t) dt (4. the limit is taken f f of the functions as t → to from above). u(t). ε) defined in (4. uo (ti + εδi ) + η(ti + εδi ). ε)dt + O(ε).120) + φ(x(tf ). and we require that the perturbed control causes ψ(x(tf ). Note that weak and strong optimality are equivalent in the minimization of J with respect to the parameter tf . to ) f f N =ε i=1 δi [H(xo (ti + εδi ). subject to the constraint that (4. λ(·). (4. t)η(t)dt + ε∆ φtf (xo (to + ). We perturb uo (·) to uo (·) + η(·. ε) in the interval (to . Unspecified Final Time tf 143 to be strongly first-order optimal at uo (·).82)) ˆ ˆ J(uo (·) + η(·. λ(·). We can now state and prove the following f theorem. uo (to + ) + η(to + . to . tf ) + ν T ψ(x(tf ). tf ) = 0 at tf = to + ε∆. ε). ν. that (remembering η from (4. λ(to + ). u(t). u(t) ∈ UBT and H(x(t).1.119). using (4. uo (·). (3. (4.124) (4. u(t).125) (4.1. uo (t). .. t). (4. f f f f f (4. λ(t). and the continuity of H(xo (·). viz. For ∆ < 0 (4. ν.3. λ(to − ). it becomes −ε∆ φtf (xo (to − ).121) are nonnegative and that. λ(t). u(t). λ(to ). f f f f where uo (t). to ).126) (4. to − ) . to ) + ν T ψtf (xo (to ). to ) + ν T ψx (xo (to ).122) implies that the summation and the integral term in (4. uo (to ).122) Moreover.118) subject to f (3. 
to minimize the performance criterion (4. the conclusion is unchanged. to ) = φtf (xo (to ). 4.123) ∀t ∈ [t0 . Suppose further that uo (·). Then there exists a p-vector ν such that H(xo (t). t). tf ] (4. the above conditions are sufficient for J to be strongly first-order optimal at uo (·).2. uo (to ).122). ·). uo (t). t) ≤ H(xo (t). and 4.2.106) and (4.4. t) and the transversality condition Ω(xo (to ).5. λT (to ) = φx (xo (to ). λ(t).1 are satisfied. λ(t). t) + λT (t)f (x(t). Terminal Equality Constraints Theorem 4. ˙ −λT (t) = Hx (xo (t). f f f f f f f f Still. though. to ) = 0.1 Suppose that Assumptions 3.123). f Proof: For ∆ > 0. to .41). to − ) + H(xo (to − ). 3.121) is slightly different insofar as the ε∆ term is concerned. u(t).144 Chapter 4. uo (to − ). the term in ε∆ is also nonnegative. to − ) + ν T ψtf (xo (to − ). t) = L(x(t).5. λ(·). to ) f f f f f f f + H(xo (to ). 2 holds.4. (4.2. We also have the important condition that.6 Minimum Time Problem Subject to Linear Dynamics Of special interest is the linear minimum time problem: Minimize the time tf (i. f (4.129) x(t0 ) = x0 .132) is well defined if (the controllability) Assumption 4.128) (4.127) In this special case.132) λT (to ) = ν T . tf ]. Minimum Time Problem Subject to Linear Dynamics 145 Rigorous proofs of the necessity of Pontryagin’s Principle are available in [44. . tf ) = tf ) to reach the terminal point ψ(x(tf ). It follows that (4. the variational Hamiltonian is H(x. 10. λ. . tf ) = x(tf ) = 0 subject to the linear dynamic system x(t) = A(t)x(t) + B(t)u(t). the above conditions result. Upon assuming normality. i = 1. t) = λT (t)A(t)x(t) + λT (t)B(t)u(t).130) we obtain ˙ −λT (t) = λT (t)A(t). . u..130) where sign [σ] is defined in (3. φ(x(tf ). . .132) is referred to as a “bang-bang” control. From (4.e. m. 4. The optimal control uo (·) which satisfies (4. 38].131) (4. i f i = 1. and uo (t) = − sign BiT (t)λ(t) ∀ t ∈ [t0 . 25. to ]. . ˙ and the control constraint −1 ≤ ui (t) ≤ 1 ∀ t in [t0 . . (4.6. . (4. m.116). . f Conditions (4. (4. ν has to be determined numerically.135) ˙ −λ2 (t) = λ1 (t). then there exists a ν such that (4. In general. In this case ˙ −λ1 (t) = 0.139) imply that ν2 is ±1.1 (Minimum time to the origin: The Bushaw problem) By way of illustration we particularize (4. x2 (tf ) = 0.133) (4. (4.133) are satisfied. ˙ x2 (t) = u(t). f so that uo (t) = − sign λ2 (t) = − sign [ν2 + (to − t)ν1 ].128) and (4.134) subject to (4.129) to the Bushaw problem [11] x1 (t) = x2 (t).132). ˙ and −1 ≤ u(t) ≤ 1. f and 1 + uo (to )ν2 = 0.146 from (4.138) and (4.137) (4. f (4.138) (4. Chapter 4. Terminal Equality Constraints 1 + ν T B(to )uo (to ) = 0. f f Pontryagin’s Principle thus states that if a pair uo (·).131).127)–(4.123). x1 (0) = x10 . to minimizes f J(u(·). x0 ) = tf (4.139) . λ1 (to ) = ν1 . Example 4.6. x1 (tf ) = 0.129). and (4. λ2 (to ) = ν2 .136) x2 (0) = x20 . (4. f Then (4. can then be calculated easily to ensure that x(to ) = 0. ts ].6.2. to ] it can switch only once.4.135) with this form of switching control. it quickly becomes evident whether uo (·) should be +1 or −1 on [t0 . If the origin cannot be reached from the given initial condition x0 using a constant control (±1). where ts is the f switch time. tf ]. The other trajectories are translated parabolas. to ] or is ±1 in the interval [t0 . ts ] and ∓1 in the interval (ts .138) that if uo (·) switches during the interval [t0 . Minimum Time Problem Subject to Linear Dynamics 147 One can also immediately see from (4. 
Upon examining the solutions of (4. . where the switch curves are parabolas (x1 = −x2 /2 for u = −1 and 2 x1 = x2 /2 for u = 1). 2 u = −1 x2 3 2 1 Region II u = −1 −4 −3 −2 −1 −1 1 2 3 4 x1 Region I u=1 −2 −3 u=1 Figure 4. uo (·) either is constant (±1) for all t in f [t0 .2: Phase portrait for the Bushaw problem. The switch time. then this option is ruled out and the one-switch bang-bang control is the only possibility. This is best demonstrated in the phase portrait shown in Figure 4.139) can be verified. ts . Thus. u(t). tf ) = 0 and is unspecified for all x(tf ) and tf such that ψ(x(tf ). assuming (4. Then. t)] . We then have the following generalization of Theorem 3.148 Chapter 4.4.5.2.1.142) subject to the constraint u(t) ∈ UBT is such that Assumption 3. t) = min [L(x(t).118) subject to (3. 4. ·) minimizes (4. u. t) (4.5. If we now draw u(·). that satisfies (4. t ∈ [t0 .7 Sufficient Conditions for Global Optimality: The Hamilton–Jacobi–Bellman Equation When terminal equality constraints (4.119) are present and tf can be specified or free. t)f (x(t). . Due to (4.1.2 is satisfied.41).140) and (4. u(t).1 Suppose there exists a once continuously differentiable function V (·. u(t).140).140) (4. t) + Vx (x(t). tf ) for all x(tf ) and tf such that ψ(x(tf ).5. u(t).1.3. x0 ) is still expressed by (3.141) u(t)∈UBT V (x(tf ). (3.140). Proof: The proof follows as in Section 3. thus completing the proof.7. under Assumptions 3. the value of the cost criterion J(u(·).5. the Hamilton–Jacobi–Bellman (H-J-B) equation (3. and 4.140) and is equal to φ(x(tf ). in particular that of Theorem 3.123) can be generalized to −Vt (x(t).1.118). Vx . Terminal Equality Constraints 4. Theorem 4. u(·) ∈ UBT .5. tf ) when ψ(x(tf ).141) is nonnegative and takes on its minimum value of zero when u(t) = uo (x(t). the integrand of (4. tf ].106).1 the control function uo (xo (·). ·) of x and t. tf ) = 0.119) and V (x0 . t)f (x(t).141) hold. and (4.1. t) ∈ UBT . then in the ˆ proof of Theorem 3. t) that minimizes H(x. t) = L(x(t). Suppose further that the control uo (x(t). t) + Vx (x(t). tf ) = 0. t0 ) is equal to the minimum value of (4. tf ) = φ(x(tf ). (4. (4. t) = 2xT (t)ΦT (tf . V (x(tf ).145) is optimal. t)W −1 (t0 . for completeness. t0 )x0 .7. i. (4. t0 )x0 . this is as required by Theorem 4.7. reduces to V (x0 . also as required by Theorem 4. t0 )W −1 (t0 . This proves by Theorem 4.7) drives the linear dynamic system to the origin at t = tf . t0 )x0 .146) − xT ΦT (tf . tf )Φ(tf .7.1 that (4.147) is then given by V (x0 .2.4.144) xT ΦT (tf . u(t) (4. The right-hand side of (4. t0 )W −1 (t0 . t0 )x0 − t tf (4. t)W −1 (t0 .1 with D = I. τ )W −1 (t0 . tf )Φ(tf . t) = min uT (t)u(t) + Vx (x(t). t)B(t)B T (t)ΦT (tf .143) If we set V (x(t). that Vx (x(t). As φ(x(tf )) = 0.36). τ )B(τ )B T (τ )ΦT (tf . t0 )x0 (4. x0 ) = t0 uT (t)u(t)dt (4. t).e. t0 ) = xT ΦT (tf . from (4.143) with respect to u(t) yields uo (t) = −B T (t)ΦT (tf . t0 )W −1 (t0 . t)W −1 (t0 . tf )Φ(tf . t)W −1 (t0 . 0 we see that at t = tf .145) which by (4.. tf )Φ(tf . tf )Φ(tf . t) is just the λT (t) given by (4.1.143) then becomes 2xT (t)AT (t)ΦT (tf . t) [A(t)x(t) + B(t)u(t)] .148) . tf ) = 0 when x(tf ) = 0.4).144) and (4. 0 Note. In this case the H-J-B equation is −Vt (x(t). Minimizing the right-hand side of (4. Sufficient Conditions for Global Optimality 149 We illustrate the above theorem using the formulation in Section 4. the H-J-B equation is satisfied. tf )Φ(tf . The minimum value of tf J(u(·). t0 ) which. 0 which by (4.7. t0 )x0 dτ. 
tf )Φ(tf .1.144) is just −Vt (x(t). tf )Φ(tf . Figure 4.1.149) + x01 t1 + x02 and xs2 = t1 + x02 on + xs1 t2 + xs2 = 0. in more complicated problems it may be difficult to find an appropriate solution of (4. t) = t + x2 + [2(x2 + 2x1 )] 2 .2) is determined from t2 (t1 + x02 )2 1 + x01 t1 + x02 = − .6..125 the optimal value function is continuous. then ν2 = −1.141). t) = t − x2 + [2(x2 − 2x1 )] 2 .7. Since the transversality condition 1 + uo ν2 = 0 implies that in Region I with uo = +1. Terminal Equality Constraints Naturally.1 (Minimum time to the origin: Bushaw problem continued) Some additional insights are obtained by calculating the cost to go from a given point to the origin using the optimal control. this is the price that one pays for attempting to derive a globally optimal control function for nonlinear problems with terminal constraints. but not differentiable.5 as shown in Figure 4. t1 . i.2 and x1 is allowed to vary. we obtain the optimal value function: in Region I : in Region II : V (x1 . t1 + t2 . 2 1 1 (4. However. x2 . Note that in going from Region I to II at x1 = 1. for the Bushaw problem presented in Section 4. . t). 2 2 Once t1 is obtained. and then the time t2 to the origin given by Summing the two times. The time. off the switch curves the H-J-B equation does apply. x2 .3 gives a plot of the optimal value function where x2 is held fixed at −1.151) where x1 . and in Region II with uo = −1. and t denote an arbitrary starting point. Therefore. then ν2 = 1. the switching curve. the optimal value function V (x1 . Note that for small changes in x1 and x2 across the switch curve produces a large change in ν2 .e. However. we can obtain the xs1 = t2 1 2 t2 2 2 (4. Example 4. x2 . to reach the switch curve from a point (x01 . 2 V (x1 . along the switch curve the H-J-B equation is not applicable.150 Chapter 4. x02 ) off the switch curve in Region I (see Figure 4.150) (4. x2 .140) and (4. However.151). the path is not controllable.5 x1 2 2. x2 ) 3 2 1 0 0 0. Sufficient Conditions for Global Optimality 151 5 4 V (x1 .5 3 Figure 4. In the next chapter a normality condition requires that the system be controllable about any extremal path. .5 1 1.3: Optimal value function for the Bushaw problem. It has been suggested in [32] that along the switch curve the Lagrange multiplier λ(t) is related to the derivative of the unconstrained optimal value function along the switch curve and related to the derivative of the constrained optimal value function in Regions I and II given in (4. Since there are no switches once on the switch curve.7.4. which although a local theory does not suffer from this pathological defect.150) and (4. The optimal value function in the Bushaw example is not differentiable at the switch curves but does satisfy the H-J-B equation everywhere else. this lack of differentability over a manifold in certain classes of optimization problems has been one drawback for the application of H-J-B theory and the elevation of Pontryagin’s Principle [38]. t ∈ [0.152 Chapter 4. Find the minimizing path. 2 x2 (0) = 0. The dynamics after first-stage burnout (t = tb ) are x1 = f1 (x1 . Consider the problem of finding u(·) that minimizes 3 J= 0 |u|dt subject to x1 = x2 . ˙ |u| ≤ 1. x1 (tf ) = 0. ψ(x1 (tf 1 ). Consider a two-stage rocket where. x2 (0) = 0. Terminal Equality Constraints Problems 1. ˙ The problem is to maximize J = φ(x2 (tf 2 )) . t ∈ (tb . ψ(x2 (tf 2 ). tf 1 ]. t). 2. tb ]. after burnout (t = tb ) of the first stage. t). Find the minimum time path. ˙ x2 = f2 (x2 . u2 . 
˙ x2 = ˙ u2 . tf 2 ) = 0. tf 2 ]. x1 (3) = 2. x2 (3) = 0. the second stage is to go into a prescribed orbit and the first stage must return to a prescribed landing site. u. and terminal time for x1 = u. control. t). tf 1 ) = 0. x1 (0) = 1. 3. t ∈ (tb . ˙ x2 = u. u1 . ˙ x(0) = x0 = given. x2 (tf ) = 1. Let the dynamic equations during the first-stage boost be x = f (x. x1 (0) = 0. 153) (4. 1 2 tf . the states are continuous. .152) where x10 and x20 are given. ˙ |u| ≤ 1. x(t− ) = x1 (t+ ) = x2 (t+ ). (4. i. Show that the solution converges to the solution to the Bushaw problem as γ1 → ∞ and γ2 → ∞. ˙ x2 = u. Sufficient Conditions for Global Optimality 153 subject to the above constraints. b b b Determine the first-order necessary conditions for a weak local minimum.4. 4. The state equations are x1 = x 2 . Consider the Bushaw problem (free final time) with penalty functions on the terminal states [11]: min γ1 x2f + γ2 x2f + tf ..154) (4. At burnout.155) Determine the optimal final time and the optimal control history.7.e.u (4. . 8. where the dynamic equations and terminal constraints are linear and the performance criterion is a quadratic function of the state and control variables. This solution forms the basis of modern control synthesis techniques because it produces controllers for multi-input/multi-output systems for both time-varying and time-invariant systems [11. 2. 32. 34. 42]. 11. our derivation of the solution to the LQ problem is very natural. 20].155 Chapter 5 Second-Order Local Optimality: The Linear Quadratic Control Problem Introduction Probably the most used result in optimal control theory is that of the solution to the linear quadratic (LQ) problem. 34]. 20. 19. among others) through a derivation which makes explicit use of the symplectic property of Hamiltonian systems. 1. In this way. . Furthermore. The presentation here unites previous results (see [1. 22. 6. The solution to this problem produces a control variable as a linear function of the state variables. this problem also forms the basis of the accessory minimum problem in the calculus of variations [11. 26. t). λ(t). ˙ (5. J = φ(x(tf ).tf J. the second-order term in the expansion of the augmented cost criterion can be converted into an LQ problem called the Accessory Problem in the Calculus of Variations. tf ) + ν T ψ(x(tf ). The equation of motion is x = f (x. t) + λT f (x. This means that the second variation dominates over all other terms in the expansion of the cost. tf ) + tf t0 H(x(t). t) − λT (t)x dt. Conditions are also given for the second variation to be strongly positive. . (5. and free final time. evaluated along a local optimal path. terminal constraints. ˙ The problem is tf x(t0 ) = x0 given. t) = L(x.1) min u(t)∈UT .2) subject to the constraint ψ(x(tf ). tf ) + t0 L(x(t). LQ Control Problem The LQ problem is especially important for determining additional conditions for local minimality. As in earlier chapters. u. u.156 Chapter 5. (5. t). In particular. t)dt. ·. u. ·) is defined as usual as H(x. we include the equations of motion in the performance index through the use of a vector λ(·) of Lagrange multiplier functions and we append the final state constraints using a vector ν of Lagrange multipliers. tf ) = 0. 5. to be positive. u. λ. ·. u(t).3) where the Hamiltonian H(·. u(t).1 Second Variation: Motivation for the Analysis of the LQ Problem Let us return to our original nonlinear dynamical system with general performance index. The augmented performance index becomes ˆ J = φ(x(tf ). 
A significant portion of this chapter is devoted to determining the necessary and sufficient conditions for the second variation in the cost. ·) are all twice differentiable with respect to their arguments. ·). and mulf tipliers be defined as ∆x = x − xo . L(·. ∆λ = λ − λo . which produce the extremal values (xo (·).. x0 .2 There exists a pair of vector-valued functions and a scalar parameter (xo (·). ν o ) f = = to f t0 to f t0 ∆H − ∆(λT x) dt + ∆ φ + ν T ψ ˙ tf =to f Hx ∆x + Hu ∆u − λT ∆x + (Hλ − x)T ∆λ dt + φx + ν T ψx ˙ ˙ T + ∆ν ψ|tf =to f 1 + Ω|tf =to ∆tf + f 2 T to f t0 tf =to f ∆x ∆xT Hxx ∆x + 2∆xT Hxu ∆u dt + 1 2 ∆xT + ∆u Huu ∆u + 2 ∆xT Hxλ ∆λ + ∆λT Hλu ∆u − ∆λT ∆x ˙     T φxx + (ν T ψ)xx ψx ΩT x ∆x    ∆ν  ψx 0 ΩT  ∆ν T ∆tf  ν ∆tf Ωx Ων dΩ dt tf =to f + H.1) and minimize the performance f index subject to the terminal constraints over all admissible functions u(·).5.1. (5. we expand the augmented cost criterion in a Taylor series as ˆ ˆ ˆ ∆J = J(u(·). ν o .1. λ(·). ·. tf .T.5. λo (·). ·). x0 .1 The functions f (·. ν) − J(uo (·). and ψ(·.g. to ) that satisfy Equation (5. to .5) . e. Letting the variations in the state.. ·). Assumption 5. ·. uo (·). (5. Theorem 4. uo (·).1. ∆tf = tf − to . control. to ). ∆ν = ν − ν o . Motivation of the LQ Problem 157 Assumption 5. We assume that we have generated a locally minimizing path by satisfying the first-order necessary conditions.O.1.4) f where ∆ means a total variation and requiring that u(·) be an element of the set of admissible control functions. ∆u = u − uo . φ(·. λo (·). 1): x = Hλ . Using these expansions and integrating by parts where necessary. to ) = 0. ∆u(·) = η(·). that is. Recall the first-order necessary conditions for optimality (essentially from Theorem 4. (5. LQ Control Problem where Ω is defined in (4.123) and H. ∆ν. ˙ Hu = 0. (5. denotes higher-order terms. ). ψ(xo (to ).8) + O(t. Ω(xo (to ).1 There is no term ∆λT Hλλ ∆λ because H is linear in λ by definition. ν + O( ). terms that include three or more variational values multiplied together.6) These cause the first-order terms in the expansion to be zero. ). the variations in ∆λ(·). and ∆tf can also be expanded as ∆x(·) = ∆ν = z(·) + O(·. f f ˙ λT = −Hx .5. ˜ ˜ ∆λ(·) = λ(·) + O(·.O. ν. to ) = 0. Remark 5. the variation in the performance index becomes ˆ ∆J = λT (t0 )z(t0 ) + 21 to f t0 2 21 zT ηT  Hxx Hxu Hux Huu ψx Ωx z η 0 Ων ˜ + 2λT (fx z + fu η − z) dt ˙   ΩT  ν dΩ dt tf =to f + 2 to f t0  ˜ zT ν T ∆  T φxx + (ν T ψ)xx ψx ΩT x  z  ν  ˜ ∆  (5. .T. uo (to ). As we have shown for ∆x(·). f f f λT (to ) = φx + ν T ψx f t=to f .158 Chapter 5.7) ∆tf = ∆. 2 )dt + O( 2 ).1. 7). finally. which is equivalent to an LQ problem. in Section 5. we know that lim →0 O(·. ˜ ∆  (5.5 with terminal constraints and.8. Letting → 0.10) This second variation of the cost criterion is required to be positive definite for all variations about the assumed locally minimizing path (xo (·). 2 ) 2 =0 (5. Recalling our definitions of Equation (5. the terminally constrained LQ problem is now presented where ∆ = 0. to ) which satisfies only the f first-order necessary conditions and would be an extremal path. conditions for minimality can be determined. the LQ problem is a fixed-time problem. Then. this would contradict the assumed optimality of (xo (·).5.9) so that ˆ δ 2 J = lim →0 ˆ ∆J 2 to f t0 = 1 2 + zT ηT   ˜ zT ν T ∆  Hxx Hxu Hux Huu ψx Ωx z η 0 ˜ + 2λT (fx z + fu η − z) dt ˙   ΩT  ν dΩ dt t=to f 1 2 T φxx + (ν T ψ)xx ψx ΩT x Ων  z  ν . 
We first find conditions for the fixed-time second variation without terminal constraints to be positive definite in Section 5. uo (·). it is clear that ∆x(t0 ) = z(t0 ) = 0. uo (·). . Motivation of the LQ Problem 159 Since in our original problem the initial state is given. Therefore.1. making the term involving it disappear. conditions for positivity of the fixed-time second variation are given in Section 5.4. As commonly presented. From the second variation. to ). We will return to the more general problem in Section 5. If the f second variation can be made negative.8 for free-time and with terminal constraints. However. then there is another path neighboring to the optimal path that will give a smaller value of the cost criterion. 13) The initial condition comes from the requirement that ∆x(t0 ) = z(t0 ) + O(t0 . uo . ) = 0. (5. ˙ ˙ ˙ Expanding f about the nominal trajectory gives xo (·) + z(·) + O(·. uo . Consider Equations (5.13). Dividing through by equation z(·) = fx (xo .13) with the terminal constraint ψx (xo (tf ))z(tf ) = 0.16) .1) as Chapter 5.10) is precisely the augmented performance index we would have obtained for the problem min J(η(·)).12) to the → 0..T. ·) η + H. and again letting (5. we get the (5. uo . ·)η(·) + O(·. ·)η(·). ·)z(·) + fu (xo . t).14) where J(η(·)) = 1 2 + tf t0 zT ηT Hxx Hxu Hux Huu t=tf z η dt (5. ·) gives ˙ z(·) = fx (xo . uo . uo . LQ Control Problem x(·) = xo (·) + z(·) + O(·.160 we expand the equation of motion (5.11) (5.15) 1 T z (φxx + (ν T ψ)xx )z 2 subject to the equations of motion (5. ·)( z + O(·. We see that Equation (5. uo . ) = f (x. uo . )) ˙ ˙ + fu (xo . η(·) (5.10) and (5. ·)z(·) + fu (xo . u. ) = f (xo . ˙ where we have noted that all higher-order terms in the expansion include second or higher power.O. ·) + fx (xo . ˙ z(t0 ) = 0. uo . Subtracting out the zero quantity xo (·) = f (xo . 2 ). tf ]. In the next section the LQ problem is stated with a simpler notation but follows from the second variation developed in this section.15) with respect to Equations (5.18) .19) (5. Therefore. the performance index as a whole would be less than that of the nominal trajectory. x(t0 ). then the solution to this problem must be η(·) ≡ 0. subject to the linear dynamic constraint x(t) = A(t)x(t) + B(t)u(t). we note that if the nominal trajectory is truly minimizing. The minimization of Equation (5. 5. Otherwise.20) (5.2. (5. ˙ with initial condition x(t0 ) = x0 and terminal constraints Dx(tf ) = 0.2 Preliminaries and LQ Problem Formulation The problem of minimizing the quadratic performance criterion 1 T 1 J(u(·). x(t) ∈ Rn . and u(t) ∈ Rm . Preliminaries and LQ Problem Formulation 161 Further.17) with respect to u(·) and where t ∈ [t0 .13) and (5.5. t0 ) = x (tf )Sf x(tf ) + 2 2 tf t0 [xT (t)Q(t)x(t) + 2uT (t)C(t)x(t) + uT (t)R(t)u(t)]dt (5.16) is known as the Accessory Minimum Problem in the Calculus of Variations and is formulated as an LQ problem. there would exist some control variation η(·) and resulting variation z(·) in the state history for which the second term in the expansion of the performance index would be less than that for the nominal trajectory. in later sections. t0 . R(t).17)) is augmented by adjoining (5. tf ]. Assumption 5.18) by means of a continuously .18). no additional restrictions are required for Q(t) and Sf other than symmetry.20).4.6. on occasion throughout this chapter.2. and the linear terminal constraint of Equation (5. 
C(t).3 First-Order Necessary Conditions for Optimality In this section the first variation of the LQ problem is established. the linear dynamics of Equation (5. Assumption 5.17). LQ Control Problem where D is a p × n matrix that will be studied in detail. t0 ) (given by (5.15) with Equation (5. and B(t) are assumed to be piecewise continuous functions of time.162 Chapter 5. A(t). However.2 The control function u(·) belongs to the class U of piecewise continuous m-vector functions of t in the interval [t0 . special but important results are obtained by requiring that Q(t) and Sf be at least positive semidefinite. the initial time. R(t) = RT (t). the solution to the LQ problem of this section is the solution to the second variation or accessory minimum problem. when setting x0 = 0. Furthermore. The implication of relaxing this assumption to positive semidefinite R is discussed in Section 5. x(t0 ).1 Relate the quadratic cost criterion of Equation (5. and Sf = Sf .2. Remark 5. and T without loss of generality.16) with Equation (5. Initially.13) with Equation (5. is considered to be a variable and not a fixed value. To include the dynamic and terminal constraints explicitly in the cost criterion. The matrices Q(t). Q(t) = QT (t).1 The matrix R(t) > 0 and bounded for all t in the interval t0 ≤ t ≤ tf . J (u(·). Therefore. 5.2. ν. the variations are limited to those for which δx remains small.e.23) Integration of (5.. Denote the change in the control as δu(t) = u(t) − uo (t) and the resulting change in the state as δx(t) = x(t) − xo (t). t) = 1 T x (t)Q(t)x(t) + 2uT (t)C(t)x(t) 2 + uT (t)R(t)u(t) + λT (t) [A(t)x(t) + B(t)u(t)] .21) ˙ Note that ˆ J (u(·). (5. λ(t)u(t). define the variational Hamiltonian as H(x(t).3. x0 .18) and (5. 2 Suppose there is a control uo (·) ∈ U that minimizes (5. (5. t0 ) + t0 λT (t) [A(t)x(t) + B(t)u(t) − x(t)] + ν T Dx(tf ). (5. t0 ) when (5.22) = t0 ˙ H(x(t). ˆ we require conditions that guarantee that the change in J is nonnegative for all admissible variations. as ˆ J (u(·).17) and causes Dxo (tf ) = 0. Following the methodology laid out in section 4. δx(t) ≤ ε for all t ∈ [t0 . First-Order Necessary Conditions for Optimality 163 differentiable n-vector function of time λ(·) and (5.20) hold.23) ˆ J (u(·). ν.21) by parts and using (5.24) + 1 T x (tf )Sf x(tf ) + ν T Dx(tf ). ˆ This is done by evaluating changes in J brought about by changing uo (·) to u(·) ∈ U. t) + λT (t)x(t) dt + λT (t0 )x0 − λT (tf )x(tf ) (5. λ(·). strong variations in δu are ¯ . For convenience. ν. x0 . t0 ) = J (u(·). u(t). ν. t0 ) tf (5. λ(·). λ(t). i.5. x0 . However. t0 ) tf = J (u(·).4. x0 . λ(·). x0 . First.20) by means of a p-vector. tf ].25) The objective is to determine necessary conditions for which the cost is a minimum. ν. .26)   1.28) T T T − λT (tf )δx(tf ) + xo (tf )Sf δx(tf ) + ν T Dδx(tf ) tf T + t0 O(t. having a unique solution [15]. λ(·). LQ Control Problem (5. since (5. t0 ) ˆ ˆ = J (u(·). t ∈ I = [t0 .30) is a linear differential equation in λ(t) with continuous coefficients. .29) (5. i (5. [ti . λT (tf ) = xo Sf + ν T D. λ(·). x0 . For example. Therefore. . ε)dt + O(ε). λ(·). t0 ) − J (uo (·). .27) where ε > 0 is sufficiently small and {ti . n. ν. ˆ ˆ ∆J = ∆J(u(·). uo (·).30) For fixed ν this is a legitimate choice for λ(·).  ε. ε Now set ˙ −λT (t) = xo (t)Q(t) + uo C(t) + λT (t)A(t). ti ≤ t ≤ ti + εδi . ε) is piecewise continuous in t and O(t. ν. ti + εδi ] . T T T (5. x0 . ε) → 0 as ε → 0 for each t. 
t0 ) tf = t0 xo (t)Q(t)δx(t) + uo (t)C(t)δx(t) + xo (t)C T (t)δu(t) 1 T + δuT (t)R(t)δu(t) + uo (t)R(t)δu(t) 2 ˙ + λT (t)δx(t) + λT (t)A(t)δx(t) + λT (t)B(t)δu(t) dt (5. x0 . tf ] − i = 1. tf ] are arbitrary with small combined length. ti + εδi } ∈ [t0 . δu(·) can be chosen as δu(t) = ε(t)η(t). where η(t) ∈ U and ε(t) = Chapter 5.164 allowed. where the function O(t. . t)B(t)η(t)dt. z(t0 .3.30). λ(t). η(·)) + B(t)η(t). η(·)) = t0 Φ(tf . (Also see Section 4.23) as ˆ ∆J = + t0 tf t0 1 T δu (t)R(t)δu(t) + Hu (xo (t).) ¯ Assumption 5. tf ] Huu = R(t) > 0. This is easily done by choosing η(t) ∈ U as η(t) = −R(t)−1 Hu (xo (t).34) z(tf .28). tf ) is positive definite where ¯ W (t0 . z(t.25) to hold. uo (t). tf ) = tf t0 DΦ(tf .32) Note that in arbitrary small time intervals where ε(t) = 1. (5.18).31). λ(t). t)DT dt.1 The p × p matrix W (t0 . First-Order Necessary Conditions for Optimality 165 For ε sufficiently small. t)δu(t) dt 2 tf O(t. From the linearity of (5. t)B(t)R−1 (t)B T (t)ΦT (tf .2. this choice minimizes the integral in (5. is rewritten using (5. the strong form of the classical Legendre–Clebsch ˆ condition. the variation in the cost function ∆J can be reduced if the second term is made negative. (5. t)T . having substituted in (5. η(·)) = A(t)z(t.35) .3 and becomes the controllability condition [8] when p = n and D has rank n.1. (5.3. uo (t). This choice of η is particularly significant since it can be shown under an additional assumption that there exists a ν such that η(·) given by (5. η(·)) satisfies the equation z(t. η(·)) = 0 ˙ so that tf (5.5. ε)dt + O(ε).31) Since for t ∈ [t0 . (5.32) causes (5.3.3. (5.1 This assumption is equivalent to Assumption 4.33) Remark 5. 1. t)B(t)R−1 (t) R(t)uo (t) + C(t)xo (t) + B T (t)λ(t) dt. η(·)) = − tf t0 Chapter 5.e.36) The linear forced equation (5. (5. ε) includes all higher-order variations such that (5. ti + εδi ] and the intervals over which the integrals are taken are given in (5. Premultiplying (5. η(·)) =− tf t0 DΦ(tf .20). t) Q(τ )xo (τ ) + C T (τ )uo (τ ) dτ.37) in (5. t)DT dt ν = 0. R(t) R(t) . By Assumption 5. t)DT ν + t tf ΦT (τ. i.39) where In = i [ti . ||Hu ||2 −1 = Hu R(t)−1 Hu . LQ Control Problem Φ(tf .32) z(tf . t)Sf xo (tf ) + ΦT (tf .38) DΦ(tf .36).37) and then.. and the change in J is7 ∆J = − In 1 Hu 2 2 R(t)−1 dt −ε I ||Hu ||2 −1 dt + R(t) tf t0 O(t. independent of ε. with this choice of η(·). the control variations are not restricted to 7 T ||Hu ||2 −1 is the norm square of Hu weighted by R(t)−1 . an equation explicit in ν results. t)Sf xo (tf ) tf t + B T (t) − tf t0 ΦT (τ. t)B(t)R(t)−1 B T (t)ΦT (tf . Since the intervals are arbitrary. a unique value of ν can be obtained.29) holds.23) and (5. using (5. (5. t)B(t)R−1 (t) R(t)uo (t) + C(t)xo (t) + B T (t)ΦT (tf .3. Note that O(t.166 By using (5.30) is solved as λ(t) = ΦT (tf . (5. ε)dt + O(ε). t) Q(τ )xo (τ ) + C T (τ )uo (τ ) dτ dt (5. which satisfies the constraints (5.22) holds.27). (5. Consequently.36) by D leads to the desired equation that satisfies the terminal constraints in the presence of variations in state and control as Dz(tf . The remaining difficulty resides with the convexity of the cost associated with the neglected second-order terms. the optimal cost criterion is shown to be represented by a quadratic function of the state. λ(t).2. a necessary condition for the variation in the cost criterion. and here we show how to extend the LQ problem to terminal constraints . to be nonnegative for arbitrary strong variations.45) (5. Therefore. Theorem 5. 
˙ xo (t0 ) = x0 . It will be shown that the Lagrange multipliers are linearly related to the state variables.43) (5. δu. 5.40) ˙ λ(t) = −AT (t)λ(t) − Q(t)xo (t) − C T (t)uo (t).1.3. (5. Furthermore. and 5. λ(t).41) (5. 0 = R(t)uo (t) + C(t)xo (t) + B T (t)λ(t).5. Then the necessary conditions for ∆J to be nonnegative to first order for strong perturbations in the control (5. 0 = Dxo (t). This form is reminiscent of the optimal value function used in the H-J-B theory of Chapter 3 to solve the LQ problem. uo (t).2. is that Hu (xo (t). ∆J. The objective of the remaining sections is to give necessary and sufficient conditions for optimality and to understand more deeply the character of the optimal solution when it exists. These necessary conditions form a two-point boundary-value problem in which the boundaries are linearly related through transition matrices.44) (5. λ(tf ) = Sf xo (tf ) + DT ν.3.1 Suppose that Assumptions 5. We have derived first-order necessary conditions for the minimization of the quadratic cost criterion.3.1 are satisfied. First-Order Necessary Conditions for Optimality 167 only small variations.26) are that xo (t) = A(t)xo (t) + B(t)uo (t).42) (5. t) = 0.2. The above results are summarized in the following theorem. 3.45) as a function of xo (t) and λ(t) as uo (t) = −R−1 (t) C(t)xo (t) + B T (t)λ(t) . (5. i. the extremal uo (t) can be determined from (5. the resulting necessary conditions for the unconstrained terminal problem are given by (5.4 LQ Problem without Terminal Constraints: Transition Matrix Approach By applying the first-order necessary conditions of Theorem 5. 5.46) Substitution of uo (t) given by (5.41) are given at the initial time and n conditions are specified at the final time as λ(tf ) = Sf xo (tf ).41) to (5..1.19). This generalization of the optimal value function to include terminal constraints and free terminal time implies that the optimal solution to this general formulation of the LQ problem is not just a local minimum but a global minimum. LQ Control Problem and free terminal time for the H-J-B theory given in Chapter 4. By Assumption 5.47) is solved as a two-point boundary-value problem where n conditions in (5. (5.42) results in the linear homogeneous 2n-vector differential equation xo (t) ˙ ˙ λ(t) = A(t) − B(t)R−1 (t)C(t) −B(t)R−1 (t)B T (t) −Q(t) + C T (t)R−1 (t)C(t) −(A(t) − B(t)R−1 (t)C(t))T xo (t) λ(t) .45).2.43) with D = 0.168 Chapter 5.5 for terminally unconstrained optimization problems. .17) through (5.43) D = 0.46) into (5.41) and (5. In this way we relate the Lagrange multiplier to the derivative of the optimal value function with respect to the state as given explicitly in Section 3.1 to the problem of Equations (5. where in (5. (5.47) Equation (5.e. τ ) = H(t)ΦH (t.49) where t is the output (or solution) time and τ is the input (or initial) time. t0 ) − Φ22 (tf . (5. t0 ) Φ12 (tf . define the Hamiltonian matrix as H(t) = A(t) − B(t)R−1 (t)C(t) −B(t)R−1 (t)B T (t) 169 −Q(t) + C T (t)R−1 (t)C(t) −(A(t) − B(t)R−1 (t)C(t))T (5.3) associated with the solution of d ˙ ΦH (t.53) The invertibility of this matrix is crucial to the problem and is discussed in detail in the following sections. t0 ) Φ22 (tf . t0 ) xo (t0 ) λ(t0 ) .50) (5. (5. the solution to (5. the first matrix equation gives xo (tf ) = Φ11 (tf .4. τ ) . t0 )xo (t0 ) + Φ12 (tf . t0 )]xo (t0 ). t0 )λ(t0 ). τ ) Φ12 (t.54) is of central importance to the LQ theory . becomes Sf [Φ11 (tf .52) The second matrix equation of (5. Using this block-partitioned transition matrix. 
t0 ) − Sf Φ11 (tf . λ(t0 ) = [Sf Φ12 (tf .47) is represented as xo (tf ) λ(tf ) = xo (tf ) Sf xo (tf ) = Φ11 (tf . The result (5.54) (5. (5. Transition Matrix Approach with No Terminal Constraints For convenience.5. t0 )xo (t0 ) + Φ22 (tf . τ ) dt in block-partitioned form is ΦH (t. using (5.51). τ ) = Φ11 (t. t0 )λ(t0 ). τ ) Φ22 (t.51). t0 )]−1 [Φ21 (tf . t0 )λ(t0 )] = Φ21 (tf . τ ) Φ21 (t. τ ) = ΦH (t. From (5. t0 ) Φ21 (tf . (5. By solving for λ(t0 ). assuming the necessary matrix inverse exists.51) The objective is to obtain a unique relation between λ(t0 ) and xo (t0 ). t0 )xo (t0 ) + Φ12 (tf .52) to eliminate xo (tf ).48) and the transition matrix (see Section A. t) − Φ22 (tf . t. then xo (t) λ(t) = ΦH (t.58) 5. the present time.4.55) If the transition matrix is evaluated over the interval [t0 .57) reduces to the optimal control rule uo (t) = −R(t)−1 [C(t) + B T (t)S(tf . t0 . t0 . Our objective is to understand the properties of this control rule. t)]−1 [Φ21 (tf . t0 ) I S(tf . 46] has the useful symplectic property. some properties peculiar to Hamiltonian systems are presented. (5.57) This control rule can be interpreted as a sampled data controller if xo (t0 ) is the state measured at the last sample time t0 and t.170 Chapter 5.46) results in the general optimal control rule uo (t) = −R−1 (t) C(t) B T (t) ΦH (t. Sf ) xo (t0 ). (5. (5. where t0 ≤ t ≤ tf . Sf )]xo (t).59) . t. lies within the interval [t0 . t0 ) I S(tf . More precisely. the character of S(tf .56) Substitution of (5. t. defined as ΦH (t. Beginning in the next subsection. For convenience. Sf ) and the transition matrix ΦH (t. t0 )JΦT (t.56) into (5. define S(tf . where ∆ is the time between samples. then (5. t)]. t) − Sf Φ11 (tf . t0 ) = J. If t0 is considered to be the present time. This linear control rule forms the basis for LQ control synthesis. t]. H (5. t0 ) are to be studied. t.46) to be expressed as an explicit linear function of the state. t0 + ∆].1 Symplectic Properties of the Transition Matrix of Hamiltonian Systems The transition matrix of the Hamiltonian system [37. (5. LQ Control Problem because this allows the optimal control (5. Sf ) = [Sf Φ12 (tf . Sf ) xo (t0 ). 59) as ˙ ˙H ΦH (t.59). and using JJ = −I. (5. t0 )J T ΦT (t. H . t0 ) = 0. H dt (5.4. H H By using (5.60) is known as the fundamental symplectic matrix. (5. t0 ) and on the right by ΦT (t. This results in the form ΦT (t. t0 )J T ΦH (t. t0 ) = 0.62) reduces to H(t)J + JH T (t) = 0. t0 ) (5.5. t0 )J and on the right by −Φ−1 (t. multiplying it on the left by ΦH (t. multiplying it on the left by −ΦH (t.49) into (5.63). t0 ).63).48) clearly satisfies (5.62) where H(t) defined by (5. H (5. t0 ) + ΦH (t. t0 ) + ΦH (t. t0 )JΦT (t. we obtain (5. t0 ) = J T .49).65).63) (5. we time-differentiate (5.59).64) The integral equals a constant matrix which when evaluated at t0 is J T . t0 )JΦT (t.59). t0 )H T (t) = 0.59) by starting with the transpose of (5. t0 )JΦT (t. To show that the transition matrix of our Hamiltonian system satisfies (5. t0 )J.61) results in H(t)ΦH (t. we obtain the exact differential H d ΦH (t.61) Substitution of the differential equation for ΦH (t. and then substituting in (5. t0 )J ΦT (t.65) Taking the transpose of (5. Transition Matrix Approach with No Terminal Constraints where J= 0 I −I 0 171 (5. H (5. To obtain (5. 70) (5. (5. t0 ).59) Φ−1 (t. t0 ) as in (5. t.59): Φ11 (t. Sf ) can be propagated directly by a quadratic matrix . t0 )ΦT (t. are n eigenvalues of ΦH (t. t0 )ΦT (t. t0 ). t0 ). t0 )J. t0 ) − λI). Sf ) in (5. . n. 
t0 )−1 are the same since det(Φ−1 (t. t0 ) is a 2n × 2n nonsingular matrix. . .4. 12 11 Φ21 (t.172 Chapter 5.71) 5. µi i = 1. t0 ) = J T ΦT (t. . t0 ) − Φ12 (t. t. . t0 )ΦT (t. then the remaining n eigenvalues are µi+n = 1 . in this section we show that S(tf .69) (5. i = 1. t0 )J − λJ T J) H = det(ΦH (t. t0 ) = Φ12 (t. 22 21 Φ11 (t.2 Riccati Matrix Differential Equation Instead of forming S(tf . t0 )ΦT (t. (5. n.66) Furthermore.55) by calculating the transition matrix. LQ Control Problem We now consider the spectral properties of ΦH (t. (5. t0 ) = Φ22 (t.68) By partitioning ΦH (t. t0 ) = I. H H (5. t0 )ΦT (t. the following relations are obtained from (5. .50). if µi . t0 ) with µi = 0 for all i. t0 )J − λI) H H = det(J T ΦT (t. t0 )ΦT (t. t0 ) − λI) = det(J T ΦT (t. t0 ) and ΦH (t.67) The implication of this is that since ΦH (t. . then from (5. 22 21 These identities will be used later. the characteristic equations of ΦH (t. . Note that since J T = J −1 . then ¯ ΦH (t. dt ΦH (tf . t0 ) = ΦH (tf . by using the propagation equation for the transition matrix (5. τ ) = LΦH (t. t)ΦH (t. d¯ ¯ ΦH (tf .75) where input time is the independent variable and the output time is fixed. note from (5. (5. (5. t. Sf ) = −Φ−1 (tf .71). where tf and t0 are two fixed times. t)H(t). t)Φ21 (tf . τ ) is symplectic.72) that ¯ ¯ S(tf . Therefore. 22 (5.4. t0 ) (5. t).73) (5.74) with respect to t. t)H(t). tf ) = I. τ ) satisfies (5.49).69) to (5. note that since ΦH (t. differentiation of the identity ΦH (tf . This equation plays a key role in all the analyses of the following sections. gives the adjoint form for propagating the transition matrix as d ΦH (tf . the partitioned form of ΦH (t.55) and (5. Transition Matrix Approach with No Terminal Constraints 173 differential equation called the matrix Riccati equation. tf ) = L. Second. t) = −ΦH (tf .77) ¯ ΦH (tf . t) = −ΦH (tf . First. A few preliminaries will help simplify this derivation as well as others in the coming sections.76) . τ ) with the symplectic matrix L= I 0 −Sf I (5. Therefore.5.72) ¯ is also a symplectic matrix. dt Finally. Sf )B(t)R−1 (t)B T (t)S(tf . t) and Φ22 (tf .77) is rewritten as ¯ ¯ −Φ21 (tf . Sf ) will be . t) is assumed invertible. Sf ) dt ˙ ˙ = S(tf . t)(Q(t) − C T (t)R−1 (t)C(t)) ¯ ¯ = Φ21 (tf .76) as ¯ ¯ Φ21 (tf . LQ Control Problem Theorem 5. t. Then a symmetric matrix S(tf . t) S(tf .1 Let ΦH (tf . S(tf . t. Sf ) satisfies d S(tf . t. t. the Riccati equation of (5. dt (5. t. t. Sf ). tf .50). Sf ) = S T (tf . then S(tf . t. and symmetric. Sf ) A(t) − B(t)R−1 (t)C(t) − Q(t) − C T (t)R−1 (t)C(t) + S(tf .81) ¯ ¯ Since Φ22 (tf . then by premultiplying by Φ22 (tf . Sf ) = Sf if the inverse in (5. Since Sf is symmetric. t)−1 and using (5.4.79) (5.174 Chapter 5. t.47) with partitioning given by (5. then − d ¯ Φ21 (tf . t)B(t)R−1 (t)B T (t) + Φ22 (tf . f . t) [S(tf . Sf ) .77). t) dt = d ¯ Φ22 (tf . Sf )] . t)(A(t) − B(t)R−1 (t)C(t)) − Φ22 (tf . t) dS(tf . Sf ) and differentiated with respect to t. Proof: If (5. Sf ). f . t) be the transition matrix for the dynamic system (5. Sf ) − S(tf . Sf ) = − A(t) − B(t)R−1 (t)C(t) dt T S(tf .77) exists and Sf is symmetric. t) = Φ22 (tf . t. t. Sf ) + Φ22 (tf .78) (5. t. t. dt (5. t)S(tf . Sf ) dt d ¯ + Φ22 (tf . d S(tf .80) ¯ ¯ The derivatives for Φ21 (tf . t) are obtained from the partitioning of (5.78) is obtained. t)(A(t) ¯ −B(t)R−1 (t)C(t))T S(tf . t) may no longer be invertible and the control law of (5.1 By the symplectic property (5.2 Since the boundary condition on Φ22 (tf . ¯ Remark 5. t. 
Remark 5.4.1 By the symplectic property (5.70), Φ̄21(tf, t)Φ̄22^T(tf, t) = Φ̄22(tf, t)Φ̄21^T(tf, t), so that

S(tf, t, Sf) = −Φ̄22^{-1}(tf, t)Φ̄21(tf, t) = −Φ̄21^T(tf, t)Φ̄22^{-T}(tf, t),   (5.82)

which implies that S(tf, t, Sf) is symmetric, without directly using the Riccati differential equation (5.78).

Remark 5.4.2 Since the boundary condition on Φ̄22(tf, tf) is the identity matrix, a finite interval of time is needed before Φ̄22(tf, t) can lose invertibility. If Φ̄22(tf, t) is no longer invertible, then S(tf, t, Sf) ceases to exist and the control law (5.58) would no longer be meaningful. In the classical calculus of variations literature this is the focal point condition or Jacobi condition [22].

5.4.3 Canonical Transformation of the Hamiltonian System

Some additional insight into the character of this Hamiltonian system is obtained by a canonical similarity transformation of the variables x(t) and λ(t) into a new set of variables x°(t) and λ̄(t). A transformation L(t) for Hamiltonian systems is said to be canonical if it satisfies the symplectic property

L(t) J L^T(t) = J,   (5.84)

where J is defined in (5.60). The canonical transformation that produces the desired result is

L(t) = [I  0 ; −S(tf, t, Sf)  I],   (5.83)

such that

[x°(t) ; λ̄(t)] = [I  0 ; −S(tf, t, Sf)  I] [x°(t) ; λ(t)],   (5.85)

where use is made of the inverse of L(t):

L^{-1}(t) = [I  0 ; S(tf, t, Sf)  I].   (5.87)

L(t) is a canonical transformation since S(tf, t, Sf) is symmetric. Note that the state variables are not being transformed. The propagation equation for the new variables is obtained by differentiating (5.85) and using (5.47) and (5.78):

[ẋ°(t) ; λ̄'(t)] = [A − BR^{-1}(C + B^T S)   −BR^{-1}B^T ; 0   −(A − BR^{-1}(C + B^T S))^T] [x°(t) ; λ̄(t)],   (5.88)

with terminal values x°(tf) and

λ̄(tf) = [Sf − S(tf, tf, Sf)] x°(tf) = 0,

since S(tf, tf, Sf) = Sf. The zero matrix in the lower-left block of the coefficient matrix of (5.88) is a direct consequence of S(tf, t, Sf) satisfying the Riccati equation (5.78). Observe in (5.88) that λ̄(t) is propagated by a homogeneous differential equation which, with λ̄(tf) = 0, has the trivial solution λ̄(t) = 0 for all t in the interval [t0, tf]. Therefore, λ̄(t) = 0 in (5.88) produces the differential equation for the state,

ẋ°(t) = [A(t) − B(t)R^{-1}(t)(C(t) + B^T(t)S(tf, t, Sf))] x°(t),  x°(t0) = x0,   (5.89)-(5.90)

which is the dynamic equation using the optimal control rule (5.58). The existence of the Riccati variable is necessary for the existence of the linear control rule and the above canonical transformation. In the next section it is shown that the existence of the Riccati variable S(tf, t, Sf) is a necessary and sufficient condition for the quadratic cost criterion to be positive definite.
5.4.4 Necessary and Sufficient Condition for the Positivity of the Quadratic Cost Criterion

We show here that the quadratic cost criterion is actually positive for all controls which are not null when the initial condition is x(t0) = 0. Conditions obtained here are applicable to the second variation of Section 5.3, Definition 5.4.1 being positive definite for the optimization problem without terminal constraints and fixed terminal time. The main ideas in this section were given in [20] and [4].

Definition 5.4.1 J(u(·), 0, t0) is said to be positive definite if for each u(·) in U, u(·) ≠ 0 (null function), J(u(·), 0, t0) > 0.

The essential assumption, besides R(t) > 0, is complete controllability.

Assumption 5.4.1 The dynamic system (5.18) is controllable on any interval [t, t'] where t0 ≤ t < t' ≤ tf (completely controllable).

If the system is completely controllable, then a bounded control

u(t) = B^T(t) Φ_A^T(t', t) W_A^{-1}(t', t0) x(t')   (5.91)

always exists which transfers the state x(t0) = 0 to any desired state x(t') at t = t', where W_A(t', t0), the controllability Grammian matrix, is

W_A(t', t0) = ∫_{t0}^{t'} Φ_A(t', t) B(t) B^T(t) Φ_A^T(t', t) dt > 0   (5.92)

for all t' in (t0, tf], and

d/dt Φ_A(t, σ) = A(t) Φ_A(t, σ),  Φ_A(σ, σ) = I.   (5.93)
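The Grammian (5.92) can be evaluated without quadrature by noting that, as a function of its first argument, it satisfies the linear matrix equation dW/dt = A(t)W + WA^T(t) + B(t)B^T(t) with W(t0) = 0. A minimal sketch under the same hypothetical double-integrator data (illustrative only):

    import numpy as np
    from scipy.integrate import solve_ivp

    # hypothetical LTI pair (not from the text)
    A = np.array([[0.0, 1.0], [0.0, 0.0]]); B = np.array([[0.0], [1.0]])
    t0, t1, n = 0.0, 1.0, 2

    def gram_rhs(t, w):
        W = w.reshape(n, n)
        # differential form of the Grammian in (5.92)
        return (A @ W + W @ A.T + B @ B.T).ravel()

    sol = solve_ivp(gram_rhs, (t0, t1), np.zeros(n * n), rtol=1e-10)
    W = sol.y[:, -1].reshape(n, n)
    print(np.linalg.eigvalsh(W))   # all eigenvalues positive: controllable on [t0, t1]

For this pair the exact Grammian over [0, 1] is [[1/3, 1/2], [1/2, 1]], whose eigenvalues are positive, confirming complete controllability of the example.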
We show first that if for all t in [t0, tf] there exists an S(tf, t, Sf) which satisfies (5.78), then J(u(·), 0, t0) is positive definite. Consider adding to J(u(·), x(t0), t0) of (5.17) the identically zero quantity

(1/2) x^T(t0) S(tf, t0, Sf) x(t0) − (1/2) x^T(tf) S(tf, tf, Sf) x(tf) + ∫_{t0}^{tf} (1/2) d/dt [x^T(t) S(tf, t, Sf) x(t)] dt = 0.   (5.95)

If (5.95) is added to (5.17), then by using the Riccati equation (5.78) and the dynamics (5.18), the integrand can be manipulated into a perfect square and the cost takes the form

J(u(·), x(t0), t0) = (1/2) x^T(t0) S(tf, t0, Sf) x(t0)
  + (1/2) ∫_{t0}^{tf} [u(t) + R^{-1}(t)(C(t) + B^T(t)S(tf, t, Sf)) x(t)]^T R(t) [u(t) + R^{-1}(t)(C(t) + B^T(t)S(tf, t, Sf)) x(t)] dt.   (5.96)

Therefore, the cost takes on its minimum value when u(t) takes the form of the optimal controller (5.58), and the minimizing cost is

J(u°(·), x(t0), t0) = (1/2) x^T(t0) S(tf, t0, Sf) x(t0).   (5.97)

If x(t0) is zero, then the optimal control is the null control, and any other control u(·) ≠ u°(·) will give a positive value to J(u(·), 0, t0). This shows that a sufficient condition for J(u(·), 0, t0) > 0 is the existence of S(tf, t, Sf) over the interval [t0, tf].
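The completed square (5.96) is also a convenient numerical test. The sketch below, under the same hypothetical double-integrator data as earlier (illustrative, not from the text), simulates an arbitrary control and the feedback (5.58) and checks that the cost of any control exceeds (1/2) x0^T S(tf, t0, Sf) x0, with equality for the optimal feedback.

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0., 1.], [0., 0.]]); B = np.array([[0.], [1.]])
    Q = np.eye(2); R = np.array([[1.]]); Sf = np.zeros((2, 2))
    t0, tf, n = 0., 2., 2
    Rinv = np.linalg.inv(R)

    # backward Riccati sweep (C = 0 here) with dense output so S(t) is
    # available inside the feedback law
    def ric(t, s):
        S = s.reshape(n, n)
        return (-(A.T @ S + S @ A) - Q + S @ B @ Rinv @ B.T @ S).ravel()
    Ssol = solve_ivp(ric, (tf, t0), Sf.ravel(), dense_output=True, rtol=1e-10)
    S = lambda t: Ssol.sol(t).reshape(n, n)

    def cost(u_of_tx):
        def rhs(t, z):
            x, u = z[:n], u_of_tx(t, z[:n])
            dJ = 0.5 * (x @ Q @ x + u @ R @ u)      # integrand of (5.17), C = 0
            return np.concatenate([A @ x + B @ u, [dJ]])
        zf = solve_ivp(rhs, (t0, tf), np.concatenate([x0, [0.0]]),
                       rtol=1e-10).y[:, -1]
        return zf[n] + 0.5 * zf[:n] @ Sf @ zf[:n]

    x0 = np.array([1.0, 0.0])
    u_opt = lambda t, x: -Rinv @ (B.T @ S(t)) @ x    # the rule (5.58), C = 0
    u_other = lambda t, x: u_opt(t, x) + np.array([0.5 * np.sin(3 * t)])
    print(cost(u_opt), 0.5 * x0 @ S(t0) @ x0)        # equal to tolerance
    print(cost(u_other))                             # strictly larger

Any perturbation of the optimal feedback raises the cost by exactly the weighted square in (5.96).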
Now we consider the necessity of the existence of S(tf, t, Sf) over the interval [t0, tf] for J(u(·), 0, t0) > 0. We prove the necessity by supposing the opposite, that S(tf, t, Sf) ceases to exist at some escape time t_e. Since the Riccati equation is nonlinear, the solution may approach infinite values for finite values of time; this was shown in Example 3.8. If this occurs, then we will show that the cost criterion can be made either as negative as we like, or as positive as desired, which violates the positive-definite assumption of the cost criterion.

First note from Remark 5.4.2 that for some t' close enough to tf, S(tf, t, Sf) exists for t' ≤ t ≤ tf. By Assumption 5.4.1 there exists a control ū(t) for t ∈ [t0, t'] which transfers x(t0) = 0 to any x(t') at t' > t0 such that ||x(t')|| = 1 and ||ū(·)|| ≤ ρ(t') < ∞, where ρ(·) is a continuous positive function. Using the control defined as

u(t) = ū(t), t0 ≤ t ≤ t';  u(t) = u°(t), t' < t ≤ tf,   (5.98)

the cost can be written as

J(u(·), 0, t0) = (1/2) x^T(t') S(tf, t', Sf) x(t') + ∫_{t0}^{t'} [(1/2) x^T(t)Q(t)x(t) + ū^T(t)C(t)x(t) + (1/2) ū^T(t)R(t)ū(t)] dt,   (5.99)

where, by using the optimal control from t' to tf for some x(t'), the minimizing cost from t' onward is

J(u°(·), x(t'), t') = (1/2) x^T(t') S(tf, t', Sf) x(t').   (5.100)

Since ū(·) is bounded by the controllability assumption, the integral in (5.99) is bounded. Now x^T(t') S(tf, t', Sf) x(t') cannot go to positive infinity as t' → t_e, since this would imply that the minimal cost J(u°(·), x(t'), t') can be made infinite, while the controllability assumption implies that there exists a finite control which gives a finite cost, violating minimality. If x^T(t') S(tf, t', Sf) x(t') → −∞ as t' → t_e for some x(t'), then J(u(·), 0, t0) → −∞, which violates the assumption that J(u(·), 0, t0) > 0. Therefore, S(tf, t, Sf) exists for all t in [t0, tf]. We appeal to [22] to verify that S(tf, t, Sf) satisfies the Riccati equation (5.78). These results are summarized in the following theorem.

Theorem 5.4.2 Suppose that system (5.18) is completely controllable. A necessary and sufficient condition for J(u(·), 0, t0) to be positive definite is that there exists a function S(tf, t, Sf) for all t in [t0, tf] which satisfies the Riccati equation (5.78).

Note that since no smallness requirements are placed on x(t) as in Section 5.3, Theorem 5.4.2 is a statement of the global optimality of the solution to the LQ problem.

Remark 5.4.3 The optimal value function, given by (5.97), is

V(x(t), t) = (1/2) x^T(t) S(tf, t, Sf) x(t).

This quadratic form satisfies the Hamilton-Jacobi-Bellman equation of Theorem 3.5.2: V_x^T(x(t), t) = S(tf, t, Sf) x(t) satisfies the same differential equation as λ(t), and V_xx(x(t), t) = S(tf, t, Sf) satisfies a Riccati equation as given in Section 3.5, where the optimal value function used in Hilbert's integral was a quadratic form identical to the function given above, again showing that the solution to the LQ problem is global.

5.4.5 Necessary and Sufficient Conditions for Strong Positivity

The positivity of the second variation as represented by J(u(·), 0, t0) being positive definite is not enough to ensure that the second variation dominates the higher-order terms in the expansion of the cost criterion about the assumed minimizing path. To ensure that the second variation dominates the expansion, strong positivity is required; it is sufficient that the second variation be strongly positive [22]. Although the LQ problem itself does not need strong positivity, in the second variational problem strong positivity is required if the second-order term is to dominate over higher-order terms in the expansion (see (5.8)). Strong positivity is defined as follows.

Definition 5.4.2 J(u(·), 0, t0) is said to be strongly positive if for each u(·) in U, and some k > 0,

J(u(·), 0, t0) ≥ k ||u(·)||^2,   (5.101)

where ||u(·)|| is some suitable norm defined on U.

In this section we prove that a necessary and sufficient condition for strong positivity of the nonsingular (R = H_uu > 0) second variation is that a solution exists to the matrix Riccati differential equation (5.78). The sufficiency part of this theorem is very well known and documented [8]. Though the necessity part is well known too, it is not, in our opinion, proved convincingly elsewhere except in certain special cases.

Theorem 5.4.3 A necessary and sufficient condition for J(u(·), 0, t0) to be strongly positive is that for all t in [t0, tf] there exists a function S(tf, t, Sf) which satisfies the Riccati equation (5.78).
Proof: In Theorem 5.4.2 we proved that J(u(·), 0, t0) is positive definite if and only if S(tf, t, Sf) = S(tf, t, Sf, 0) exists. Next, consider a new LQ problem with cost criterion

J(u(·), 0, t0, ε) = ∫_{t0}^{tf} [(1/2) x^T(t)Q(t)x(t) + u^T(t)C(t)x(t) + (1/(2 − ε)) u^T(t)R(t)u(t)] dt + (1/2) x^T(tf) Sf x(tf).   (5.102)

This functional is positive definite if 2 − ε > 0 and if and only if

−Ṡ(tf, t, Sf, ε) = Q(t) + S(tf, t, Sf, ε) A(t) + A^T(t) S(tf, t, Sf, ε)
  − [C(t) + B^T(t) S(tf, t, Sf, ε)]^T ((2 − ε)/2) R^{-1}(t) [C(t) + B^T(t) S(tf, t, Sf, ε)],  S(tf, tf, Sf, ε) = Sf,   (5.103)-(5.104)

has a solution S(tf, t, Sf, ε) defined for all t in [t0, tf]. Now, since Q(t), R(t), C(t), A(t), B(t) are continuous in t and the right-hand side of (5.103) is analytic in S(tf, t, Sf, ε) and ε, S(tf, t, Sf, ε) is a continuous function of ε at ε = 0 [15]. Therefore, since S(tf, t, Sf, 0) exists by hypothesis, for ε sufficiently small S(tf, t, Sf, ε) exists for all t in [t0, tf], and J(u(·), 0, t0, ε) is positive definite. To show that J(u(·), 0, t0) is strongly positive, we note that

J(u(·), 0, t0) = J(u(·), 0, t0, ε) − (ε/(2(2 − ε))) ∫_{t0}^{tf} u^T(t)R(t)u(t) dt,   (5.105)

so that, for ε < 0 and sufficiently small in magnitude,

J(u(·), 0, t0) ≥ −(ε/(2(2 − ε))) ∫_{t0}^{tf} u^T(t)R(t)u(t) dt ≥ 0.   (5.106)
From Assumption 5.2.1, R(t) > 0 with norm bound 0 < k1 ≤ ||R(t)|| ≤ k2, so

J(u(·), 0, t0) ≥ −(ε k1/(2(2 − ε))) ∫_{t0}^{tf} u^T(t)u(t) dt.   (5.107)

Hence,

J(u(·), 0, t0) ≥ k ||u(·)||^2_{L2},  where  k = −ε k1/(2(2 − ε)) > 0   (5.108)-(5.109)

and

∫_{t0}^{tf} u^T(t)u(t) dt = ||u(·)||^2_{L2}   (5.110)

is the L2 or integral square norm. This establishes that J(u(·), 0, t0) is strongly positive. The converse, that if J(u(·), 0, t0) is strongly positive then an S(tf, t, Sf) exists which satisfies (5.78), is found in the proof of Theorem 5.4.2, since strong positivity implies positive definiteness.

Remark 5.4.4 For J(u(·), 0, t0) strongly positive, the second variation dominates all higher-order terms in the expansion of the cost criterion about the minimizing path with fixed terminal time and without terminal constraints. See Section 5.2.

Example 5.4.1 (Shortest distance between a point and a great circle) This example illustrates that the second variation is no longer positive definite when the solution to the Riccati equation (5.78) escapes, and that other neighboring paths can then produce smaller values of the cost. Let s be the distance along a path on the surface of a sphere, where φ is the longitudinal angle and θ is the lateral angle as shown in Figure 5.1. [Figure 5.1: Coordinate frame on a sphere, showing the terminal great circle.] The differential distance is

ds = (r^2 dθ^2 + r^2 cos^2θ dφ^2)^{1/2} = r (u^2 + cos^2θ)^{1/2} dφ,

where u is the control variable, defined through the dynamic equation

dθ/dφ = u,  θ(0) = 0,   (5.111)

and φ is treated as the independent variable. The problem is to minimize the distance on a unit sphere (r = 1) from a given point to a given great circle: find the control u that minimizes

J(u(·), θ(0) = 0, φ = 0) = ∫_0^{φ1} (u^2 + cos^2θ)^{1/2} dφ   (5.112)

subject to (5.111), fixed terminal φ = φ1, and unconstrained θ(φ1). First, a trajectory that satisfies the first-order necessary conditions is determined. Then the second-order necessary conditions will be analyzed using the conditions developed in Section 5.4.4. The variational Hamiltonian is

H = (u^2 + cos^2θ)^{1/2} + λu.   (5.113)

The first-order necessary conditions are

dθ/dφ = H_λ = u,  θ(0) = 0,   (5.114)
−λ' = H_θ = −(u^2 + cos^2θ)^{-1/2} sinθ cosθ,  λ(φ1) = 0,   (5.115)
H_u = λ + u (u^2 + cos^2θ)^{-1/2} = 0.   (5.116)

The first-order necessary conditions are satisfied by the trajectory u(φ) = 0, θ(φ) = 0, λ(φ) = 0 for 0 ≤ φ ≤ φ1. About this extremal path, the second-order necessary conditions are generated. The second variational problem given in Section 5.2 is to find the perturbed control δu that minimizes

δ^2 J = (1/2) ∫_0^{φ1} [(δu)^2 − (δθ)^2] dφ   (5.117)

subject to

d(δθ)/dφ = δu,  δθ(0) = 0,   (5.118)

for fixed terminal φ = φ1 with no constraint on the terminal value of θ(φ1). Since for this example A = 0, B = 1, Q = −1, R = 1, C = 0, the associated Riccati equation and solution are

dS/dφ = S^2 + 1,  S(φ1) = 0  ⇒  S(φ) = −tan(φ1 − φ).

Note that the solution remains finite until it escapes at φ1 − φ = π/2. From Theorem 5.4.2 it is necessary that the solution of the associated Riccati equation (5.78) exist. If φ1 − φ < π/2, then the second variation is positive and can be shown to be strongly positive. At φ1 − φ = π/2 any great circle path from the point to the terminal great circle will give the same cost. If the path is longer than π/2, then there are neighboring paths, which can give smaller cost, that do not even satisfy the first-order necessary conditions. Furthermore, the second variational controller is

δu = tan(φ1 − φ) δθ.

Note that the gain is positive, indicating that the best neighboring optimum controller is essentially divergent.
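The escape of the Riccati solution in this example is easy to observe numerically. The sketch below (a minimal illustration; the horizon values are arbitrary) integrates dS/dφ = S^2 + 1 backward from S(φ1) = 0. For a horizon shorter than π/2 the sweep reaches φ = 0 and matches −tan(φ1 − φ); for a longer horizon the integrator stalls at the escape point φ1 − φ = π/2.

    import numpy as np
    from scipy.integrate import solve_ivp

    def backward(phi1):
        # dS/dphi = S^2 + 1, S(phi1) = 0; exact solution S = -tan(phi1 - phi)
        return solve_ivp(lambda p, S: S**2 + 1.0, (phi1, 0.0), [0.0], rtol=1e-10)

    ok = backward(1.5)        # horizon < pi/2: reaches phi = 0
    print(ok.status, ok.y[0, -1], -np.tan(1.5))     # status 0, values agree
    bad = backward(2.0)       # horizon > pi/2: solution escapes
    print(bad.status, bad.t[-1], 2.0 - np.pi / 2)   # stalls near the escape point

The failed integration terminates almost exactly at φ = φ1 − π/2, the focal point of Remark 5.4.2.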
5.4.6 Strong Positivity and the Totally Singular Second Variation

Unfortunately, it turns out that if only R(t) = H_uu ≥ 0 holds, the so-called singular control problem, then the second variation cannot be strongly positive [4], and so different tests for sufficiency have to be devised. This is, of course, consistent with the fact that in the totally singular case the matrix Riccati equation is undefined because R^{-1}(t) = H_uu^{-1} does not exist. In the case of nonsingular optimal control problems, where R(t) = H_uu > 0 is invertible for all t in [t0, tf], it is known that a sufficient condition for strong positivity of the second variation, and hence for a weak local minimum, is that the matrix Riccati differential equation associated with the second variation have a solution for all t in [t0, tf]. In Section 5.4.5 we demonstrated that if the matrix Riccati equation has a solution for all t in [t0, tf], then positive definiteness is equivalent to strong positivity. For a long time, therefore, it was felt that no Riccati-like condition existed for the singular case. This has turned out to be not true [4]: sufficiency conditions for nonnegativity of singular and nonsingular second variations are rather closely related. (See [4] for general proofs.)

In this section we illustrate the difference between positive definiteness and strong positivity by means of a simple example. Before presenting the example, we define "totally singular" precisely as follows.

Definition 5.4.3 J(u(·), 0, t0) is said to be totally singular if R(t) = 0 for all t in [t0, tf].

Clearly, in a finite-dimensional vector space positive definiteness is equivalent to strong positivity; in our space of piecewise continuous control functions this is not so. Consider the totally singular functional

J(u(·), 0, t0) = ∫_{t0}^{tf} x^2(t) dt   (5.120)

subject to

ẋ(t) = u(t),  x(t0) = 0.   (5.121)

Clearly, J(u(·), 0, t0) is positive definite. Now, if J(u(·), 0, t0) were strongly positive, then for some k > 0 and all ω > 0, J(u(·), 0, t0) ≥ k ∫_{t0}^{tf} u^2(t) dt. Set

u(t) = cos ω(t − t0),  t0 ≤ t ≤ tf,   (5.123)

which is a member of U, so that

x(t) = (1/ω) sin ω(t − t0),  t0 ≤ t ≤ tf.   (5.124)

With this choice of control, J(u(·), 0, t0) becomes

J(u(·), 0, t0) = ∫_{t0}^{tf} (1/ω^2) sin^2 ω(t − t0) dt,   (5.125)

and

||u(·)||^2_{L2} = ∫_{t0}^{tf} cos^2 ω(t − t0) dt = ∫_{t0}^{tf} (1 − sin^2 ω(t − t0)) dt.   (5.126)

In other words, strong positivity would require

(1/ω^2) ∫_{t0}^{tf} sin^2 ω(t − t0) dt ≥ k ∫_{t0}^{tf} (1 − sin^2 ω(t − t0)) dt,   (5.127)

i.e.,

(k + 1/ω^2) ∫_{t0}^{tf} sin^2 ω(t − t0) dt ≥ k (tf − t0).   (5.128)

But this is impossible because the left-hand side of (5.128) tends to (1/2) k (tf − t0) as ω → ∞. Clearly, J(u(·), 0, t0) of (5.120) is not strongly positive. Note, however, that the functional

J(u(·), 0, t0) = ∫_{t0}^{tf} (x^2(t) + u^2(t)) dt   (5.129)

is strongly positive. This follows directly because x^2(·) + u^2(·) ≥ u^2(·).   (5.130)

The fact that the totally singular second variation cannot be strongly positive implies that in the totally singular case we should seek (necessary and) sufficient conditions only for nonnegativity and for positive definiteness of J(u(·), 0, t0). For a full treatment of the singular optimal control problem, see [4] and [14].

5.4.7 Solving the Two-Point Boundary-Value Problem via the Shooting Method

A second-order method for finding an optimal control numerically is a shooting method. This method was contrasted with the first-order steepest descent method in Section 3.4 but can be described in detail here. We follow the development in Section 5.3. Assume f(x(t), u(t), t) and L(x(t), u(t), t) are twice differentiable in x(t) and u(t). Assume also that H_uu(x(t), u(t), λ(t), t) is nonsingular along all trial trajectories.

To implement the shooting method, guess a value for the initial value of the adjoint vector, λ^i(t0). Integrate this guess forward along with the state to tf using (3.41) and (3.56), with the optimal control determined from (3.55). Then perturb the previous guess λ^i(t0) by some function of the errors at the terminal time tf to get the next guess λ^{i+1}(t0). It should be noted that small variations in λ(t0) can lead to large changes in λ(t), depending on the stability properties of the adjoint equation: state equations that are stable with time running forward imply that the associated adjoint equations are unstable when they are integrated forward in time (but stable when they are integrated backward in time) [11].

Define y^i(t) = [x^i(t); λ^i(t)]. The algorithm is as follows.

1. Choose λ^i(t0), where i is the iteration index. The initial condition x(t0) = x0 is given.

2. Numerically integrate y^i forward. The control is calculated from H_u(x^i(t), u^i(t), λ^i(t), t) = 0 ⇒ u^i(t) = g(x^i(t), λ^i(t), t) by the Implicit Function Theorem, under the assumption that H_uu(y^i(t), u^i(t), t) > 0.   (5.131)-(5.132)

3. At the terminal boundary, form the error

β = λ^i(tf) − φ_x(y_f^i).   (5.133)

4. Linearize the first-order necessary conditions about the trial trajectory:

δẋ(t) = f_x(x^i, u^i, t) δx(t) + f_u(x^i, u^i, t) δu(t),  δx(t0) = 0,   (5.134)
−δλ'(t) = H_xx(y^i, u^i, t) δx(t) + H_xu(y^i, u^i, t) δu(t) + H_xλ(y^i, u^i, t) δλ(t),   (5.135)
δH_u = H_ux(y^i, u^i, t) δx(t) + H_uλ(y^i, u^i, t) δλ(t) + H_uu(y^i, u^i, t) δu(t) = 0,   (5.136)-(5.137)

where the higher-order terms O(ε1, ε2, ε3) are neglected. From (5.137), solve for the change in control as

δu(t) = −H_uu^{-1}(y^i, u^i, t) [H_ux(y^i, u^i, t) δx(t) + H_uλ(y^i, u^i, t) δλ(t)].   (5.138)

(Recall that H_uu(y^i(t), u^i(t), t) > 0 is assumed, so the inverse exists.)

5. Substitute (5.138) into (5.134) and (5.135) to obtain a linear matrix ordinary differential equation, the Hamiltonian system as given in (5.48):

[δẋ ; δλ'] = [A(t) − B(t)R^{-1}(t)C(t)   −B(t)R^{-1}(t)B^T(t) ; −Q(t) + C^T(t)R^{-1}(t)C(t)   −(A(t) − B(t)R^{-1}(t)C(t))^T] [δx ; δλ].   (5.139)

Solve numerically the Hamiltonian system for the transition matrix Φ(tf, t0) of (5.139), giving

[δx(tf) ; δλ(tf)] = [Φ11(tf, t0)  Φ12(tf, t0) ; Φ21(tf, t0)  Φ22(tf, t0)] [δx(t0) ; δλ(t0)],   (5.140)

where, since x(t0) is given, δx(t0) = 0. Solving for the partitioned elements of (5.140),

δx(tf) = Φ12(tf, t0) δλ(t0),  δλ(tf) = Φ22(tf, t0) δλ(t0).   (5.142)-(5.143)

6. Use δλ(tf) − φ_xx δx(tf) = dβ, combined with (5.142) and (5.143), to give

(Φ22(tf, t0) − φ_xx Φ12(tf, t0)) δλ(t0) = dβ  ⇒  δλ(t0) = [Φ22(tf, t0) − φ_xx Φ12(tf, t0)]^{-1} dβ,   (5.144)-(5.145)

where dβ is chosen such that |β| > |β + dβ|, so that β contracts on each iteration.

7. Update the guess of the initial Lagrange multipliers:

λ^{i+1}(t0) = λ^i(t0) + δλ(t0).   (5.146)

If ||β|| < ε for some ε small enough, then stop. If not, go to 2.

Sophisticated numerical optimization algorithms based on the shooting method can be found in [43].
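Before turning to the nonlinear application below, the iteration is easiest to see on a linear problem, where one Newton step is exact. The following is a minimal Python sketch of steps 1-7 applied to the hypothetical double-integrator LQ problem used earlier, with the illustrative choice φ = (1/2) x^T Sf x so that φ_x = Sf x and φ_xx = Sf (these data are assumptions for the sketch, not the text's example):

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.linalg import expm

    A = np.array([[0., 1.], [0., 0.]]); B = np.array([[0.], [1.]])
    Q = np.eye(2); n = 2
    Sf = np.zeros((2, 2))                # phi = (1/2) x' Sf x  (assumed)
    t0, tf = 0., 2.; x0 = np.array([1., 0.])
    H = np.block([[A, -B @ B.T], [-Q, -A.T]])   # (5.139) with C = 0, R = I

    def propagate(lam0):                 # step 2: forward integration
        z0 = np.concatenate([x0, lam0])
        z = solve_ivp(lambda t, z: H @ z, (t0, tf), z0,
                      rtol=1e-12, atol=1e-12).y[:, -1]
        return z[:n], z[n:]

    lam = np.zeros(n)                    # step 1: initial guess
    for i in range(5):
        xf, lamf = propagate(lam)
        beta = lamf - Sf @ xf            # step 3: terminal error (5.133)
        if np.linalg.norm(beta) < 1e-8:
            break
        Phi = expm(H * (tf - t0))        # steps 4-5: exact sensitivities here
        P12, P22 = Phi[:n, n:], Phi[n:, n:]
        # step 6 with the full-step choice dbeta = -beta, per (5.145)
        dlam = np.linalg.solve(P22 - Sf @ P12, -beta)
        lam = lam + dlam                 # step 7 update (5.146)
    print(i, beta)                       # converged in one Newton step

For a nonlinear problem the same loop applies, except that the Hamiltonian system (5.139) must be re-linearized and its transition matrix re-integrated along each trial trajectory.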
An Application of the Shooting Method

We apply the shooting method to the Brachistochrone problem of Section 3.3. The terminal penalty function is φ(x(tf)) = −r(tf), giving φ_x = [−1  0] and φ_xx = [0  0 ; 0  0]. For this problem,

f_x = [0  cosθ ; 0  0],  f_u = [−v sinθ ; g cosθ],
H_x = [0  λ_r cosθ],  H_xx = [0  0 ; 0  0],  H_xu = [0 ; −λ_r sinθ] = H_ux^T,
H_uu = −λ_r v cosθ − λ_v g sinθ.

Note that since λ_r is constant along the trajectory, we could remove the differential equation for λ_r; it is left in here for simplicity. The transition matrix of (5.140) is computed numerically for tf = 2 with g = 9.80665. Since an analytic solution has been obtained in Section 3.3, the convergence rate of the numerical scheme can be accurately assessed.

An initial guess of λ_r^0(0) = −1 and λ_v^0(0) = −0.8 is used. Using dβ = −β, the desired change in the initial values of the Lagrange multipliers is computed from (5.145); since the correct value of λ_r(0) is already known exactly, its computed update is zero on each step, reflecting its invariance. Three more iterations drive β toward zero: on each step λ_v(0) and λ_v(2) get closer to their analytical initial and terminal values of −4/π and 0, respectively, and r(tf) increases toward its optimal value. (The tabulated iterates of λ_v(0), λ_v(tf), and r(tf) in the original are not reproduced here.)

5.5 LQ Problem with Linear Terminal Constraints: Transition Matrix Approach

The first-order necessary conditions of Theorem 5.3.1 are now used for the optimal control problem of (5.17) to (5.20), where the terminal constraints (5.19) and (5.20) are explicitly included. The two-point boundary-value problem of Theorem 5.3.1 is solved in a manner similar to that of Section 5.4 by using the transition matrix (5.50), where the boundary condition (5.43) is λ(tf) = D^T ν + Sf x°(tf):

x°(tf) = Φ11(tf, t0) x(t0) + Φ12(tf, t0) λ(t0),   (5.147)
λ(tf) = Φ21(tf, t0) x(t0) + Φ22(tf, t0) λ(t0) = D^T ν + Sf x°(tf).   (5.148)

From (5.148), the relation between λ(t0) and (ν, x(t0)), assuming that Φ̄22(tf, t0) of (5.72) is invertible, is

λ(t0) = −Φ̄22^{-1}(tf, t0) Φ̄21(tf, t0) x(t0) + Φ̄22^{-1}(tf, t0) D^T ν.   (5.149)
This is introduced into (5.147), which in turn is introduced into the terminal constraint (5.20), producing

D x°(tf) = D [Φ11(tf, t0) − Φ12(tf, t0) Φ̄22^{-1}(tf, t0) Φ̄21(tf, t0)] x(t0) + D Φ12(tf, t0) Φ̄22^{-1}(tf, t0) D^T ν = 0.   (5.150)

The symplectic property of the Hamiltonian transition matrix is used to reduce the coefficient of x0 in (5.150). By premultiplying and postmultiplying the symplectic identity (5.70) for the symplectic matrix Φ̄_H(tf, t0) by Φ̄22^{-1}(tf, t0) and Φ̄22^{-T}(tf, t0), respectively, we obtain

Φ̄22^{-1}(tf, t0) Φ̄21(tf, t0) = Φ̄21^T(tf, t0) Φ̄22^{-T}(tf, t0),   (5.151)-(5.152)

and by using the symplectic identity (5.71) for Φ̄_H(tf, t0),

Φ̄11(tf, t0) − Φ̄12(tf, t0) Φ̄22^{-1}(tf, t0) Φ̄21(tf, t0) = Φ̄22^{-T}(tf, t0).   (5.153)

Since Φ̄11 = Φ11 and Φ̄12 = Φ12, the coefficient of x0 in (5.150) reduces to D Φ̄22^{-T}(tf, t0). Therefore, (5.149) and (5.150) can be written together in the symmetric form

[λ(t0) ; 0] = [−Φ̄22^{-1}(tf, t0) Φ̄21(tf, t0)   Φ̄22^{-1}(tf, t0) D^T ; D Φ̄22^{-T}(tf, t0)   D Φ̄12(tf, t0) Φ̄22^{-1}(tf, t0) D^T] [x0 ; ν].   (5.154)

By using the symplectic identity (5.69), the symmetric property of Φ̄12(tf, t0) Φ̄22^{-1}(tf, t0) is also established; therefore, the coefficient matrix of (5.154) is a symmetric matrix.

At this point, by solving the matrix equation (5.154), λ(t0) and ν can be determined as functions of x(t0). Substitution of this result into (5.46) will produce an optimal control rule for the terminal constrained optimal control problem, not only for t0 but for all t ∈ [t0, tf]. Before explicitly doing this, the elements of the coefficient matrix of (5.154) are to be analyzed. Our objective is to find the propagation equations for these elements and discuss their properties. In particular, these elements will be seen to combine into a Riccati variable for the constrained control problem.

As given in Theorem 5.4.1,

S(tf, t, Sf) = −Φ̄22^{-1}(tf, t) Φ̄21(tf, t)   (5.155)

satisfies the Riccati differential equation (5.78). Define

F^T(tf, t) = Φ̄22^{-1}(tf, t) D^T,   (5.156)
G(tf, t) = D Φ̄12(tf, t) Φ̄22^{-1}(tf, t) D^T.   (5.157)

The differential equation for F(tf, t) is developed by determining the differential equation for Φ̄22^{-1}(tf, t): from d/dt Φ̄22^{-1} = −Φ̄22^{-1} (dΦ̄22/dt) Φ̄22^{-1}, the adjoint differential equation (5.76) for Φ̄_H(tf, t), and (5.155), one obtains

d/dt F^T(tf, t) = −[A(t) − B(t)R^{-1}(t)C(t) − B(t)R^{-1}(t)B^T(t) S(tf, t, Sf)]^T F^T(tf, t),  F^T(tf, tf) = D^T,   (5.158)-(5.161)

so F propagates through the adjoint of the closed-loop dynamics of (5.89). In a similar manner, the differential equation for G(tf, t) is obtained by direct differentiation of (5.157); using (5.153) the terms combine, and G(tf, t) satisfies

Ġ(tf, t) = F(tf, t) B(t) R^{-1}(t) B^T(t) F^T(tf, t),  G(tf, tf) = 0.   (5.162)-(5.164)

Note that G(tf, t) generated by (5.164) is symmetric. Our objective is to determine ν in terms of x(t0). Assuming G(tf, t0) is invertible,

ν = −G^{-1}(tf, t0) F(tf, t0) x(t0).   (5.165)

The invertibility of G(tf, t0) is known as a normality condition, ensuring a finite ν for a finite x(t0). For this to happen it is necessary for D to be full rank.
By using (5.165) to eliminate ν in (5.149),

λ(t0) = S̄(tf, t0) x(t0),   (5.166)

where

S̄(tf, t) = S(tf, t, Sf) − F^T(tf, t) G^{-1}(tf, t) F(tf, t).   (5.167)

If t0 is considered to be the present time t, then introducing (5.167) into (5.46) results in the optimal control rule for the terminal constrained optimal control problem:

u°(t) = −R^{-1}(t) [C(t) + B^T(t) S̄(tf, t)] x(t).   (5.168)

S̄(tf, t) satisfies the same Riccati differential equation (5.78) as S(tf, t, Sf); this can be verified by time differentiation of S̄(tf, t) in (5.167) using (5.78), (5.161), and (5.164). However, S̄(tf, tf) is not defined, because G(tf, tf) is not invertible. Once S̄(tf, t) is formed, usually some very small time step away from tf, only S̄(tf, t) need be propagated backward in time; the integration of S̄(tf, t) directly may offer a computational savings over the integration of the transition matrix. The major difficulty with propagating S̄(tf, t) directly is in applying the proper boundary conditions at tf. Alternatively, S(tf, t, Sf), F(tf, t), and G(tf, t) do not have to be integrated over the entire interval [t0, tf] but only until G(tf, t) is invertible; this allows a proper initialization for S̄(tf, t).

The behavior of S̄(tf, t) is reflected in the behavior of u°(t) near tf. For large deviations away from the terminal manifold, u°(t) reacts by emphasizing the satisfaction of the constraints rather than reducing the performance criterion. Furthermore, if all the terminal states are constrained, i.e., D = I, then by the symplectic identities S̄(tf, t) reduces to

S̄(tf, t) = −Φ12^{-1}(tf, t) Φ11(tf, t).   (5.169)
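The joint backward sweep of S, F, and G is straightforward to implement. The following minimal sketch uses the hypothetical double-integrator data from earlier with the assumed constraint matrix D = [1 0] (pinning the first state to zero at tf); it integrates (5.78), (5.161), and (5.164) together and forms S̄ and the constrained gain of (5.168).

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0., 1.], [0., 0.]]); B = np.array([[0.], [1.]])
    Q = np.eye(2); R = np.array([[1.]]); C = np.zeros((1, 2))
    Sf = np.zeros((2, 2)); D = np.array([[1., 0.]])   # assumed constraint
    t0, tf, n, m = 0., 2., 2, 1
    Rinv = np.linalg.inv(R)
    Abar = A - B @ Rinv @ C
    Qbar = Q - C.T @ Rinv @ C

    def rhs(t, z):
        S = z[:n*n].reshape(n, n)
        Ft = z[n*n:n*n + n*m].reshape(n, m)       # F^T(tf, t), n x m
        G = z[n*n + n*m:].reshape(m, m)
        dS = -(Abar.T @ S + S @ Abar) - Qbar + S @ B @ Rinv @ B.T @ S   # (5.78)
        Acl = A - B @ Rinv @ (C + B.T @ S)        # closed-loop matrix of (5.89)
        dFt = -Acl.T @ Ft                         # (5.161)
        dG = Ft.T @ B @ Rinv @ B.T @ Ft           # (5.164)
        return np.concatenate([dS.ravel(), dFt.ravel(), dG.ravel()])

    z_tf = np.concatenate([Sf.ravel(), D.T.ravel(), np.zeros(m*m)])
    z = solve_ivp(rhs, (tf, t0), z_tf, rtol=1e-10).y[:, -1]
    S = z[:n*n].reshape(n, n)
    Ft = z[n*n:n*n + n*m].reshape(n, m)
    G = z[n*n + n*m:].reshape(m, m)               # negative definite for t < tf
    Sbar = S - Ft @ np.linalg.solve(G, Ft.T)      # (5.167)
    K = Rinv @ (C + B.T @ Sbar)                   # constrained gain (5.168)
    print(Sbar)
    print(K)

Propagating the sweep with dense output and closing the loop with the resulting time-varying gain drives D x(tf) to zero, as the theory requires; note that G comes out negative definite before tf, consistent with the normality discussion below.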
5.5.1 Normality and Controllability for the LQ Problem

We show here that the normality condition assumed in (5.165) is actually a controllability requirement. More precisely, the invertibility of G(tf, t) is equivalent to a controllability requirement associated only with the required terminal boundary restriction (5.20). This is done by converting the original problem to one in which the quadratic cost criterion is only a function of a control variable. The following theorem is similar to that of Brockett [8] and others.

Theorem 5.5.1 Assume that the symmetric matrix S(tf, t, Sf), which is a solution to the Riccati equation (5.78), exists on the interval t0 ≤ t ≤ tf. Then there exists a control u(·) on the interval t0 ≤ t ≤ tf that minimizes (5.17) subject to the differential constraint (5.18) and the boundary conditions (5.19) and (5.20) if and only if there exists a v(·) on the interval t0 ≤ t ≤ tf which minimizes

J1(v(·), x0, t0) = (1/2) ∫_{t0}^{tf} v^T(t) R(t) v(t) dt   (5.170)-(5.171)

subject to the differential constraint

ẋ = Ã(t) x(t) + B(t) v(t),  where  Ã(t) = A(t) − B(t)R^{-1}(t)[C(t) + B^T(t) S(tf, t, Sf)],   (5.172)-(5.173)

with the boundary conditions x(t0) = x0 and D x(tf) = 0.

Proof: By proceeding exactly as was done to obtain (5.96), we note that if we let

v(t) = u(t) + R^{-1}(t)[C(t) + B^T(t) S(tf, t, Sf)] x(t),   (5.174)

then the cost specified in (5.17), augmented as in (5.96), can be written as

J(v(·), x0, t0) = (1/2) x^T(t0) S(tf, t0, Sf) x(t0) + (1/2) ∫_{t0}^{tf} v^T(t) R(t) v(t) dt.   (5.175)

Since x(t0) = x0 is given, the cost upon which v(·) has influence is J1 of (5.171), which is subject to the differential constraint (5.172) when (5.174) is substituted into (5.18).

We now proceed to solve this accessory problem of minimizing (5.171) subject to (5.172) using the technique given at the beginning of this section, where the ∗ superscript is used to denote dependence on the Riccati variable S(tf, t, Sf) through Ã(t). For this problem, Q(t) and C(t) are now zero, and the Riccati variable S*(tf, t, 0) is propagated by (5.78) as

Ṡ*(tf, t, 0) = −Ã^T(t) S*(tf, t, 0) − S*(tf, t, 0) Ã(t) + S*(tf, t, 0) B(t)R^{-1}(t)B^T(t) S*(tf, t, 0),  S*(tf, tf, 0) = 0.   (5.176)

The solution to this homogeneous Riccati equation with zero terminal condition is S*(tf, t, 0) = 0 over the interval t0 ≤ t ≤ tf. The propagation of the linear differential equation (5.161) is then

Ḟ*^T(tf, t) = −Ã^T(t) F*^T(tf, t),  F*^T(tf, tf) = D^T  ⇒  F*(tf, t) = D Φ_Ã(tf, t),   (5.177)

and the solution to (5.164) for G(tf, t) is

G(tf, t0) = −D W_Ã(tf, t0) D^T,   (5.178)

where

W_Ã(tf, t0) = ∫_{t0}^{tf} Φ_Ã(tf, t) B(t) R^{-1}(t) B^T(t) Φ_Ã^T(tf, t) dt   (5.179)

is the controllability Grammian of the accessory system. Clearly, the invertibility of G(tf, t0) does not depend upon controllability of the entire state space but depends only on controllability to the desired terminal manifold D x(tf) = 0: the minimization of this new performance criterion subject to the given initial condition and the terminal constraint (5.20) requires exactly this controllability condition. Since the original system is controllable with respect to u, the new system (5.172) must be controllable with respect to v, by the following theorem; therefore, the invertibility of G(tf, t0) is also a controllability condition on u(t).

Theorem 5.5.2 If

ẋ(t) = A(t) x(t) + B(t) u(t)   (5.180)

is controllable on the interval [t0, tf], and if u(t) = v(t) + Λ(t) x(t), then

ẋ(t) = (A(t) + B(t)Λ(t)) x(t) + B(t) v(t) = Ã(t) x(t) + B(t) v(t)   (5.181)-(5.182)

is also controllable for any finite piecewise continuous Λ(t) on the interval [t0, tf].

Proof: We need to show that x0^T Φ_Ã(t0, t) B(t) = 0 for all t in the interval implies x0 = 0. Note that the controllability Grammian

W_A(t0, tf) = ∫_{t0}^{tf} Φ_A(t0, t) B(t) B^T(t) Φ_A^T(t0, t) dt   (5.183)

can be obtained by integrating the linear matrix equation

Ẇ_A(t, tf) = A(t) W_A(t, tf) + W_A(t, tf) A^T(t) − B(t) B^T(t),  W_A(tf, tf) = 0,   (5.184)

and similarly the Grammian (5.179) for Ã(t) is obtained from

Ẇ_Ã(t, tf) = Ã(t) W_Ã(t, tf) + W_Ã(t, tf) Ã^T(t) − B(t) B^T(t),  W_Ã(tf, tf) = 0.   (5.185)

Form the matrix

E(t) = W_Ã(t, tf) − W_A(t, tf),  E(tf) = 0.   (5.186)

It has a linear differential equation, obtained by writing A = Ã − BΛ,

Ė(t) = Ã(t) E(t) + E(t) Ã^T(t) + B(t)Λ(t) W_A(t, tf) + W_A(t, tf) Λ^T(t) B^T(t),   (5.187)

whose solution at t0 is

E(t0) = −∫_{t0}^{tf} Φ_Ã(t0, t) [B(t)Λ(t) W_A(t, tf) + W_A(t, tf) Λ^T(t) B^T(t)] Φ_Ã^T(t0, t) dt.   (5.188)

Suppose x0^T Φ_Ã(t0, t) B(t) = 0 for all t in the interval. Then x0^T W_Ã(t0, tf) x0 = 0, and x0^T E(t0) x0 = 0 by (5.188), since every term in the integrand carries a factor x0^T Φ_Ã(t0, t) B(t) or its transpose. Hence

x0^T W_A(t0, tf) x0 = x0^T W_Ã(t0, tf) x0 − x0^T E(t0) x0 = 0,   (5.189)-(5.190)

a contradiction of the assumed controllability of ẋ = A(t)x + B(t)u unless x0 = 0.

From Theorem 5.5.2 we see that the controllability Assumption 5.4.1 implies the normality condition G(tf, t0) < 0. This observation also implies that state feedback as given in (5.181) does not change controllability. When the controllability Assumption 5.4.1 for reaching the manifold (5.20) is restricted to complete controllability, then

G(tf, t) < 0  for all t ∈ [t0, tf).   (5.191)
5.5.2 Necessary and Sufficient Conditions for the Positivity of the Terminally Constrained Quadratic Cost Criterion

The objective of this section is to show that a necessary and sufficient condition for the quadratic cost criterion (5.17) with linear terminal constraints (5.20) to be positive definite when x(t0) = 0 is that the Riccati variable S̄(tf, t) in (5.167) exist for all t in the interval [t0, tf). The results of this section closely parallel those of Section 5.4.4 for the unconstrained problem.

First, it is shown that if S(tf, t, Sf), F(tf, t), and G(tf, t) defined by (5.78), (5.161), and (5.164), and hence S̄(tf, t) of (5.167), exist for all t in [t0, tf), then J(u(·), 0, t0) is positive definite. Consider adding to J(u(·), x(t0), t0) of (5.17) the identically zero quantity

−(1/2) [x^T(t), ν^T] [S(tf, t, Sf)  F^T(tf, t) ; F(tf, t)  G(tf, t)] [x(t) ; ν] |_{t=tf}
+ (1/2) [x^T(t), ν^T] [S(tf, t, Sf)  F^T(tf, t) ; F(tf, t)  G(tf, t)] [x(t) ; ν] |_{t=t0}
+ ∫_{t0}^{tf} (1/2) d/dt { [x^T(t), ν^T] [S(tf, t, Sf)  F^T(tf, t) ; F(tf, t)  G(tf, t)] [x(t) ; ν] } dt = 0.   (5.192)

By using the Riccati equation (5.78), the propagation equations (5.161) and (5.164) for F(tf, t) and G(tf, t), and the dynamics (5.18), the cost criterion (5.17) can be manipulated into a perfect square:

J(u(·), x(t0), t0) = (1/2) x^T(t0) S(tf, t0, Sf) x(t0) + x^T(t0) F^T(tf, t0) ν + (1/2) ν^T G(tf, t0) ν − x^T(tf) D^T ν
+ (1/2) ∫_{t0}^{tf} || u(t) + R^{-1}(t) [ (C(t) + B^T(t)S(tf, t, Sf)) x(t) + B^T(t) F^T(tf, t) ν ] ||^2_{R(t)} dt,   (5.193)

where the term x^T(tf) D^T ν vanishes for any control satisfying the terminal constraint D x(tf) = 0.

One difficulty occurs at t = tf, where ν cannot be determined in terms of x(tf) because G(tf, tf) is singular. Since the system is assumed completely controllable by Assumption 5.4.1, G(tf, tf − ∆) is invertible for any ∆ > 0. Therefore, the optimal control in the interval tf − ∆ ≤ t ≤ tf is open-loop over that interval, given by

u°(t) = −R^{-1}(t) [C(t)  B^T(t)] Φ_H(t, tf − ∆) [I ; S̄(tf, tf − ∆)] x(tf − ∆),   (5.194)

where ν is determined by (5.165) evaluated at t0 = tf − ∆, i.e., ν = −G^{-1}(tf, tf − ∆) F(tf, tf − ∆) x(tf − ∆). For small enough ∆, u°(t) remains finite in that interval; since all the factors in (5.194) remain finite in [tf − ∆, tf], this open-loop control will satisfy the terminal boundary conditions. In the interval t0 ≤ t ≤ tf − ∆ the optimal control is given by the feedback law (5.168).

By applying the control (5.194) for tf − ∆ ≤ t ≤ tf and (5.168) for t0 ≤ t ≤ tf − ∆ in place of u(·), the integral part of the cost in (5.193) becomes zero and the cost is at its minimum value. With x(t0) = 0, and hence ν = 0, the optimal control is the null control. Any other control which satisfies the terminal boundary condition D x(tf) = 0 and is unequal to u°(·) will give a positive value to J(u(·), 0, t0). Therefore, the existence of S̄(tf, t) on [t0, tf) is sufficient for positivity.

The arguments for the necessity of the existence of S̄(tf, t) over the interval [t0, tf), given Assumption 5.4.1, controllability, and positivity of the cost for x(t0) = 0, are the same as those given in Section 5.4.4: the optimal cost starting at some x(t') is

J(u°(·), x(t'), t') = (1/2) x^T(t') S̄(tf, t') x(t'),   (5.195)

and since x^T(t') S̄(tf, t') x(t') can go to neither positive nor negative infinity for all finite x(t') and any t' in the interval [t0, tf), S̄(tf, t') exists for all t' in [t0, tf). These results are summarized as Theorem 5.5.3.
Theorem 5.5.3 Given Assumption 5.4.1, a necessary and sufficient condition for J(u(·), 0, t0) to be positive definite for the class of controls which satisfy the terminal constraint (5.20) is that there exist a function S̄(tf, t) for all t in [t0, tf) which satisfies the Riccati equation (5.78).

Remark 5.5.1 Theorem 5.5.3 can be extended to show that J(u(·), 0, t0) is strongly positive by a proof similar to that used in Theorem 5.4.3.

Remark 5.5.2 Note that the optimal value function for the LQ problem with terminal constraints is given as

V(x(t), t) = (1/2) x^T(t) S̄(tf, t) x(t).   (5.196)

This satisfies the Hamilton-Jacobi-Bellman equation given in Chapter 4.

Remark 5.5.3 Construction of S̄(tf, t) in (5.167) from its component parts may not always be possible, since S(tf, t, Sf) may not exist over the whole interval, whereas S̄(tf, t) can exist.

Example 5.5.1 (Shortest distance between two points on a sphere) In this section the minimum distance problem of Example 5.4.1 is generalized from the unconstrained terminal problem to the terminal constrained problem of reaching a given terminal point. The problem is to find the control u that minimizes

J = ∫_0^{φ1} (u^2 + cos^2θ)^{1/2} dφ   (5.197)

subject to (5.111) at fixed terminal φ = φ1 with the terminal constraint θ(φ1) = 0. The first-order necessary conditions are

dθ/dφ = H_λ = u,  θ(0) = 0,   (5.198)
−λ' = H_θ = −(u^2 + cos^2θ)^{-1/2} sinθ cosθ,  λ(φ1) = ν,   (5.199)
H_u = λ + u (u^2 + cos^2θ)^{-1/2} = 0.   (5.200)
The first-order necessary conditions are satisfied by the trajectory u(φ) = 0, θ(φ) = 0, λ(φ) = 0 (ν = 0) for 0 ≤ φ ≤ φ1. About this extremal path, the second-order necessary conditions are generated. The second variational problem given in Section 5.2 is to find the perturbed control δu that minimizes

2 δ^2 J = ∫_0^{φ1} [(δu)^2 − (δθ)^2] dφ   (5.201)

subject to

d(δθ)/dφ = δu,  δθ(0) = 0,   (5.202)

for fixed terminal φ = φ1 with the terminal constraint δθ(φ1) = 0. From Theorem 5.5.3 it is necessary that the solution of the associated Riccati equation exist. Since for this example A = 0, B = 1, Q = −1, R = 1, C = 0, the solution of the associated Riccati equation is obtained from the following equations and solutions:

dS/dφ = S^2 + 1,  S(φ1) = 0  ⇒  S(φ) = −tan(φ1 − φ),   (5.203)
dF/dφ = S F,  F(φ1) = 1  ⇒  F(φ) = sec(φ1 − φ),   (5.204)
dG/dφ = F^2,  G(φ1) = 0  ⇒  G(φ) = −tan(φ1 − φ).   (5.205)

From these solutions of S(φ), F(φ), and G(φ) the associated Riccati solution is constructed as

S̄(φ) = S(φ) − F^2(φ) G^{-1}(φ) = cot(φ1 − φ).

Note that the solution remains finite until it escapes at φ1 − φ = π. At that point any great circle path from the initial point to the terminal point will give the same cost. If the path is longer than π, then there are paths that do not even satisfy the first-order necessary conditions that can give smaller cost. The second variational controller is

δu = −cot(φ1 − φ) δθ.

Note that the gain is initially positive, indicating that the best neighboring optimum controller is essentially divergent for π/2 < φ1 − φ < π, but becomes convergent over the interval 0 < φ1 − φ ≤ π/2.
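The closed forms (5.203)-(5.205) and the identity S̄ = cot(φ1 − φ) can be checked directly. A minimal sketch (arbitrary horizon value, for illustration only):

    import numpy as np
    from scipy.integrate import solve_ivp

    # Example 5.5.1: integrate dS = S^2+1, dF = SF, dG = F^2 backward from
    # S(phi1) = 0, F(phi1) = 1, G(phi1) = 0 and compare with the closed forms
    # S = -tan(s), F = sec(s), G = -tan(s), Sbar = S - F^2/G = cot(s),
    # where s = phi1 - phi.
    phi1 = 2.5
    def rhs(p, z):
        S, F, G = z
        return [S * S + 1.0, S * F, F * F]

    sol = solve_ivp(rhs, (phi1, phi1 - 1.2), [0.0, 1.0, 0.0],
                    dense_output=True, rtol=1e-11)
    for s in [0.3, 0.7, 1.1]:
        S, F, G = sol.sol(phi1 - s)
        print(s, S - F * F / G, 1.0 / np.tan(s))   # the two columns agree

Note the phenomenon of Remark 5.5.3 here: the component sweep itself breaks down at s = π/2, where S, F, and G all escape, even though S̄ = cot(s) remains perfectly finite there and only escapes at s = π.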
5.6 Solution of the Matrix Riccati Equation: Additional Properties

Three important properties of the solution to the matrix Riccati equation are given here. First, it is shown that if the terminal weight in the cost criterion is increased, the solution to the corresponding Riccati equation is also increased. Second, by restricting certain matrices to be positive semidefinite, S(tf, t, Sf) is shown to be nonnegative definite and bounded. Finally, it is shown that if Sf = 0 and the above restrictions hold, then S(tf, t, 0) is monotonically increasing as tf increases. This is extremely important for the next section, where we analyze the infinite-time LQ problem with constant coefficients.

Theorem 5.6.1 If Sf^1 and Sf^2 are two terminal weights in the cost criterion (5.17) such that Sf^1 − Sf^2 ≥ 0, then the difference S(tf, t, Sf^1) − S(tf, t, Sf^2) ≥ 0 for all t ≤ tf, where S(tf, t, Sf^1) and S(tf, t, Sf^2) are solutions of (5.78) with terminal conditions Sf^1 and Sf^2, respectively.

Proof: First, the unconstrained optimization problem of (5.17) and (5.18) is converted, precisely in the manner used to obtain (5.96), to an equivalent problem; however, now we are concerned with two problems having terminal weights Sf^1 and Sf^2. The identically zero quantity (5.95), indexed with a superscript 2, is added to the cost criterion having terminal weight Sf^1. The result is that the optimal cost for weight Sf^1 is

(1/2) x^T(t0) S(tf, t0, Sf^1) x(t0) = (1/2) x^T(t0) S(tf, t0, Sf^2) x(t0)
+ min_{u(·)} { (1/2) x^T(tf)(Sf^1 − Sf^2) x(tf) + (1/2) ∫_{t0}^{tf} || u(t) + R^{-1}(t)[C(t) + B^T(t)S(tf, t, Sf^2)] x(t) ||^2_{R(t)} dt },   (5.206)

where it is assumed that S(tf, t, Sf^2) exists. If again

v(t) = u(t) + R^{-1}(t)[C(t) + B^T(t)S(tf, t, Sf^2)] x(t),   (5.208)

a new problem results, with

x0^T [S(tf, t0, Sf^1) − S(tf, t0, Sf^2)] x0 = min_{v(·)} { x^T(tf)(Sf^1 − Sf^2) x(tf) + ∫_{t0}^{tf} v^T(t)R(t)v(t) dt }.   (5.207)

Since Sf^1 − Sf^2 is nonnegative definite, the optimal cost must be nonnegative, implying

x0^T [S(tf, t0, Sf^1) − S(tf, t0, Sf^2)] x0 ≥ 0   (5.209)

for all x0 and t0 ≤ tf.

Remark 5.6.1 Since this new control problem (5.208) has no state weighting and no cross term, a relationship exists as

S(tf, t, Sf^1) − S(tf, t, Sf^2) = S*(tf, t, Sf^1 − Sf^2),   (5.210)

where S*(tf, t, Sf^1 − Sf^2) satisfies the homogeneous Riccati equation

d/dt S*(tf, t, Sf^1 − Sf^2) = −Ã2^T(t) S* − S* Ã2(t) + S* B(t)R^{-1}(t)B^T(t) S*,  S*(tf, tf, Sf^1 − Sf^2) = Sf^1 − Sf^2,   (5.211)

with

Ã2(t) = A(t) − B(t)R^{-1}(t)[C(t) + B^T(t) S(tf, t, Sf^2)].   (5.212)

Remark 5.6.2 From (5.211), if S*(tf, t, Sf^1 − Sf^2) has an inverse, then a linear matrix differential equation for S*^{-1}(tf, t, Sf^1 − Sf^2) results by simply differentiating S* S*^{-1} = I. Intuitively, the constrained problem can be thought of as a limit of the unconstrained problem, where certain elements of the terminal weighting are allowed to go to infinity. Furthermore, from (5.167), the difference between the constrained Riccati matrix S̄(tf, t) and the unconstrained Riccati matrix S(tf, t, Sf) is

S̄(tf, t) − S(tf, t, Sf) = −F^T(tf, t) G^{-1}(tf, t) F(tf, t) ≥ 0   (5.213)

for t ∈ [t0, tf), since G(tf, t) < 0 for t ∈ [t0, tf) by virtue of Theorem 5.5.2.

Existence of the solution to the Riccati equation is of central importance. Results using the following restrictive but useful assumption guarantee not only that S(tf, t, Sf) exists but that it is nonnegative definite.

Assumption 5.6.1 Sf ≥ 0 and Q(t) − C^T(t)R^{-1}(t)C(t) ≥ 0 for all t in the interval [t0, tf].

Theorem 5.6.2 Given Assumptions 5.2.1, 5.4.1, and 5.6.1, the solution to the Riccati equation (5.78), S(tf, t, Sf), exists on the interval t0 ≤ t ≤ tf and is nonnegative definite. Furthermore, the cost is bounded from above and below, even if t0 goes to −∞.

Proof: From (5.97), the minimum cost is related to the Riccati variable, if it exists, for an arbitrary initial state as

(1/2) x^T(t0) S(tf, t0, Sf) x(t0) = min_{u(·)} { (1/2) x^T(tf) Sf x(tf) + (1/2) ∫_{t0}^{tf} [x^T Q x + 2u^T C x + u^T R u] dt }.   (5.214)

Let us make a change in controls of the form u(t) = v(t) − R^{-1}(t)C(t)x(t), which eliminates the cross term between u(t) and x(t). The cost can now be converted to the equivalent form

(1/2) x^T(t0) S(tf, t0, Sf) x(t0) = min_{v(·)} { (1/2) x^T(tf) Sf x(tf) + (1/2) ∫_{t0}^{tf} [x^T (Q − C^T R^{-1} C) x + v^T R v] dt },   (5.216)-(5.217)

subject to

ẋ(t) = [A(t) − B(t)R^{-1}(t)C(t)] x(t) + B(t) v(t).   (5.218)

Since R(t) is positive definite by Assumption 5.2.1 and, by Assumption 5.6.1, Sf and Q(t) − C^T(t)R^{-1}(t)C(t) are nonnegative definite, the cost is nonnegative for all x(t0):

x^T(t0) S(tf, t0, Sf) x(t0) ≥ 0  ⇒  S(tf, t0, Sf) ≥ 0,   (5.215)

regardless of t0. Furthermore, since the original system is controllable with respect to u, the new system (5.218) is controllable with respect to v by Theorem 5.5.2; a control therefore exists that drives the state to the origin over an interval of fixed length with finite cost and is null thereafter, so the cost is also bounded from above, even if t0 goes to −∞.

Theorem 5.6.3 Given Assumptions 5.2.1, 5.4.1, and 5.6.1,

S(tf, t0, 0) ≥ S(t1, t0, 0)  for t0 ≤ t1 ≤ tf,   (5.219)

i.e., S(tf, t0, 0) is a monotonically increasing function of tf.

Proof: The optimal cost criterion can be written as

(1/2) x^T(t0) S(tf, t0, 0) x(t0) = min_{u(·)} (1/2) ∫_{t0}^{tf} [x^T Q x + 2u^T C x + u^T R u] dt
= min_{u(·)} { (1/2) ∫_{t0}^{t1} [x^T Q x + 2u^T C x + u^T R u] dt + (1/2) x^T(t1) S(tf, t1, 0) x(t1) }
= (1/2) x^T(t0) S(t1, t0, S(tf, t1, 0)) x(t0).   (5.220)-(5.221)

From (5.210),

S(t1, t0, S(tf, t1, 0)) = S(t1, t0, 0) + S*(t1, t0, S(tf, t1, 0)),   (5.222)

and since S(tf, t1, 0) ≥ 0 by Theorem 5.6.2, S*(t1, t0, S(tf, t1, 0)) ≥ 0 by Theorem 5.6.1. Therefore,

x0^T S(tf, t0, 0) x0 ≥ x0^T S(t1, t0, 0) x0   (5.223)

for all x0, implying S(tf, t0, 0) ≥ S(t1, t0, 0).
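The monotone, bounded growth of S(tf, t0, 0) asserted by Theorems 5.6.2 and 5.6.3 is visible even in a scalar example. The sketch below uses the hypothetical data A = 0, B = Q = R = 1, C = 0 (chosen for illustration), for which the backward Riccati sweep has the closed form S(tf, t0, 0) = tanh(tf − t0), monotone in the horizon and bounded by the algebraic-equation root 1:

    import numpy as np
    from scipy.integrate import solve_ivp

    def S_of_horizon(T):
        # scalar (5.78): dS/dt = -1 + S^2, S(T) = 0, integrated back to t = 0
        sol = solve_ivp(lambda t, S: -1.0 + S * S, (T, 0.0), [0.0], rtol=1e-11)
        return sol.y[0, -1]

    for T in [0.5, 1.0, 2.0, 4.0, 8.0]:
        print(T, S_of_horizon(T), np.tanh(T))   # monotone in T, saturating at 1

The saturation value is exactly the steady-state solution exploited in the regulator problem of the next section.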
In the final theorem we show that S(tf, t0, 0) > 0 and that S*(tf, t0, Sf) goes to zero for all Sf ≥ 0 as t0 goes to −∞, by requiring an observability assumption. This condition will be shown to be guaranteed by ensuring that y(t) given by

y(t) = N(t) x(t)   (5.224)

is observable, where N(t) is the square root of the matrix Q̄(t) = Q(t) − C^T(t)R^{-1}(t)C(t), assumed to be nonnegative definite, so that y^T(t)y(t) = x^T(t)Q̄(t)x(t).

Assumption 5.6.2 The dynamic system

ẋ(t) = Ā(t) x(t),  y(t) = N(t) x(t),  Ā(t) = A(t) − B(t)R^{-1}(t)C(t),   (5.225)

is completely observable on any interval [t, t'], where t0 < t < t' ≤ tf.

If the system is completely observable, then the initial state can be determined as

x(t0) = M^{-1}(t0, t') ∫_{t0}^{t'} Φ_Ā^T(t, t0) N^T(t) y(t) dt,   (5.226)

where the observability Grammian matrix

M(t0, t') = ∫_{t0}^{t'} Φ_Ā^T(t, t0) N^T(t) N(t) Φ_Ā(t, t0) dt   (5.227)

is invertible for all t' in the interval t0 < t' ≤ tf. This means that, with u(·) = 0,

∫_{t0}^{t'} x^T(t) Q̄(t) x(t) dt = ∫_{t0}^{t'} y^T(t) y(t) dt = x0^T M(t0, t') x0 > 0   (5.228)

for all x0 ≠ 0.

Theorem 5.6.4 Given Assumptions 5.2.1, 5.4.1, 5.6.1, and 5.6.2, the following hold:

(a) S(tf, t0, 0) > 0 for all t0 in −∞ < t0 < tf;
(b) the closed-loop system ẋ(t) = Â(t)x(t), x(t0) = x0, using the optimal control rule, is asymptotically stable;
(c) S*(tf, t0, Sf) → 0 as t0 → −∞ for all Sf ≥ 0.

Proof: From Theorem 5.6.2, S(tf, t0, 0) ≥ 0 and bounded. If the cost in the form (5.216) were zero for some x(t0) ≠ 0, then v(·) would be the null control and, by (5.228), the accumulated output energy would be positive, a contradiction; hence

x0^T S(tf, t0, 0) x0 > 0   (5.229)

for all t0 in −∞ < t0 < tf, and x^T(t) S(tf, t, 0) x(t) qualifies as a Lyapunov function for t < tf.

Let us now use the optimal cost function (5.229) as a Lyapunov function to determine whether x(t) in condition (b) is asymptotically stable. First, determine the rate of change of the cost function along the closed loop. Using the Riccati equation (5.78) with Sf = 0,

d/dt [x^T(t) S(tf, t, 0) x(t)] = −x^T(t) [Q̄(t) + S(tf, t, 0) B(t)R^{-1}(t)B^T(t) S(tf, t, 0)] x(t) ≤ 0.   (5.230)-(5.231)

By integrating (5.231),

x0^T S(tf, t0, 0) x0 − x^T(t1) S(tf, t1, 0) x(t1) = ∫_{t0}^{t1} x^T(t) [Q̄(t) + S B R^{-1} B^T S] x(t) dt,   (5.232)

where t0 < t1 < tf. If x(t1) did not approach zero, then by Assumption 5.6.2 the right-hand side of (5.232) would go to ∞ as t1 − t0 → ∞; but this contradicts the fact that S(tf, t0, 0) is bounded. Therefore x(t1) → 0 as t1 − t0 → ∞, establishing (b).

We now consider condition (c). S*(tf, t, Sf), defined in (5.210), satisfies the homogeneous Riccati equation (5.211). By writing Sf = K K^T, the solution and the behavior of S*(tf, t, Sf) can be determined from F(tf, t) and G(tf, t) of (5.161) and (5.164): the differential equation

Ḟ(tf, t) = −F(tf, t) Ã(t)^T,  F(tf, tf) = K^T,   (5.233)-(5.234)

together with G(tf, tf) = −I, where the boundary conditions for F(tf, t) and G(tf, t) are chosen to be consistent with S*(tf, tf, Sf) = −F^T G^{-1} F |_{tf} = Sf.   (5.235)

From condition (b) of the theorem, F(tf, t) = K^T Φ_Ã(tf, t) is stable and approaches zero as t goes to −∞. Since Ġ(tf, t) = F B R^{-1} B^T F^T ≥ 0 and G(tf, tf) = −I, G(tf, t) must always be negative definite for t ≤ tf, so G(tf, t)^{-1} exists and is bounded for all t ≤ tf. Therefore S*(tf, t, Sf) = −F^T(tf, t) G^{-1}(tf, t) F(tf, t) → 0 as t → −∞.

Remark 5.6.3 Note that S(tf, t, Sf) = S(tf, t, 0) + S*(tf, t, Sf) by (5.210), so condition (c) implies that S(tf, t, Sf) → S(tf, t, 0) as t → −∞ for all Sf ≥ 0, regardless of the boundary condition.

Remark 5.6.4 The assumptions of controllability and observability are stronger conditions than are usually needed. For example, if certain states are not controllable but naturally decay, then stabilizability of ẋ(t) = Â(t)x(t) is sufficient for condition (b) of Theorem 5.6.4 to still hold.
5.7 LQ Regulator Problem

The LQ control problem is restricted in this section to a constant coefficient dynamic system and cost criterion, and the time interval over which the cost criterion is to be minimized is assumed to be infinite. That is, the optimal control problem of Section 5.2 is specialized by requiring that tf be infinite and A, B, C, Q, and R all constant matrices. This specialized problem is sometimes referred to as the linear quadratic regulator (LQR) problem. As might be suspected from the previous results, a linear constant gain controller results from this restricted formulation.

Theorem 5.7.1 For the LQR problem, given Assumptions 5.2.1, 5.4.1, 5.6.1, and 5.6.2, there is a unique, symmetric, positive-definite solution, S, to the algebraic Riccati equation (ARE)

(A − BR^{-1}C)^T S + S(A − BR^{-1}C) + (Q − C^T R^{-1} C) − S B R^{-1} B^T S = 0   (5.236)

such that Â = A − BR^{-1}C − BR^{-1}B^T S has only eigenvalues with negative real parts.

Proof: From Theorem 5.6.3, given Assumptions 5.2.1, 5.4.1, and 5.6.1, S(tf, t0, 0) is monotonic in tf. Since the parameters are not time dependent, S(tf, t0, 0) depends only upon tf − t0 and is monotonically increasing with respect to tf − t0. Since from Theorem 5.6.2, S(tf, t0, 0) is bounded for all tf − t0, then as tf − t0 → ∞, S(tf, t0, 0) reaches an upper limit S. As S(tf, t0, 0) approaches S, Ṡ(tf, t0, 0) approaches zero, implying that S must satisfy the ARE (5.236). Furthermore, for some ∆ > 0,

S(tf, t0 − ∆, 0) = S(t0, t0 − ∆, S(tf, t0, 0)),   (5.237)

where the time invariance of the system is used to shift the time. By taking the limit as tf − t0 → ∞, both S(tf, t0 − ∆, 0) and S(tf, t0, 0) go to S, such that (5.237) becomes

S = S(t0, t0 − ∆, S),   (5.238)

and S is a fixed-point solution to the autonomous (time-invariant) Riccati equation. Continuity of the solution with respect to the initial conditions implies that as ∆ → ∞, S(tf, t0, Sf) approaches the same limit regardless of Sf ≥ 0 and, therefore, S is unique. By conditions (a) and (b) of Theorem 5.6.4, S is positive-definite and x is asymptotically stable. Since S is a constant, this implies that the eigenvalues of the constant matrix Â = A − BR^{-1}C − BR^{-1}B^T S have only negative real parts.
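For constant-coefficient data the ARE can be solved directly with a library routine. A minimal sketch under the hypothetical double-integrator data used throughout (with C = 0, so (5.236) reduces to A^T S + S A + Q − S B R^{-1} B^T S = 0), using SciPy's solve_continuous_are:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0., 1.], [0., 0.]]); B = np.array([[0.], [1.]])
    Q = np.eye(2); R = np.array([[1.]])

    S = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ S)              # constant regulator gain

    resid = A.T @ S + S @ A + Q - S @ B @ np.linalg.solve(R, B.T) @ S
    print(np.max(np.abs(resid)))                 # ARE residual, ~ machine zero
    print(np.linalg.eigvals(A - B @ K))          # eigenvalues in the open LHP

The returned S is symmetric positive definite and the closed-loop eigenvalues lie strictly in the left half plane, in accordance with Theorem 5.7.1. It also agrees with the saturated limit of the finite-horizon sweep computed earlier.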
To better understand the spectral content of H, the canonical transformation of (5.84) is used with the steady-state S, which is the solution to the ARE (5.236). The transformed Hamiltonian matrix is

    LHL^{-1} = H̄ = [ Ā    −BR^{-1}B^T ]
                    [ 0    −Ā^T        ].    (5.243)

This form is particularly interesting because Ā and −Ā^T contain all the spectral information of both H and H̄, since L is a similarity transformation. Note that from Theorem 5.7.1 the real parts of the eigenvalues of Ā are negative; the feedback dynamic matrix Ā with S > 0 contains all the left-half plane poles of H. If the Hamiltonian matrix has no eigenvalues on the imaginary axis, then there is at most one solution to the matrix Riccati equation which decomposes the matrix such that the eigenvalues of Ā have only negative real parts. It will be shown below that S > 0 is only one of many solutions: the numerous solutions to the ARE, through canonical transformations similar to (5.84), give the various possible groupings of the eigenvalues of the Hamiltonian matrix between Ā and −Ā^T. This is the case even if we are not restricted to Assumptions 5.2.1 and 5.2.2, although there is then a question as to how many eigenvalues lie on the imaginary axis.

The following theorem from Brockett [8] demonstrates directly that if the real parts of the eigenvalues of A − BR^{-1}B^TS are negative, then S is unique.

Theorem 5.7.2  For the LQR problem, there is at most one symmetric solution of

    SA + A^TS − SBR^{-1}B^TS + Q = 0

having the property that the eigenvalues of A − BR^{-1}B^TS have only negative real parts.

Proof: Assume, to the contrary, that there are two symmetric solutions, S1 and S2, such that the eigenvalues of both Ā1 = A − BR^{-1}B^TS1 and Ā2 = A − BR^{-1}B^TS2 have only negative real parts. Since S1 and S2 are symmetric and S1 ≠ S2, there exists an x0 such that x0^TS1x0 ≠ x0^TS2x0; suppose that x0^TS1x0 ≥ x0^TS2x0, and let u(t) = −R^{-1}(C + B^TS2)x(t). By proceeding as in (5.94) to (5.96), the cost can be written in two ways:

    J(u(·), x(t0), t0) = ½[x^T(t0)S1x(t0) − x^T(tf)S1x(tf)]
        + ∫_{t0}^{tf} ½[u(t) + R^{-1}(C + B^TS1)x(t)]^T R [u(t) + R^{-1}(C + B^TS1)x(t)] dt
      = ½[x^T(t0)S2x(t0) − x^T(tf)S2x(tf)]
        + ∫_{t0}^{tf} ½[u(t) + R^{-1}(C + B^TS2)x(t)]^T R [u(t) + R^{-1}(C + B^TS2)x(t)] dt.    (5.244)

Taking the limit as tf goes to infinity, x(tf) goes to zero by the assumption that Ā1 and Ā2 are stable. With the chosen control the second form has a vanishing squared term, so

    J(u(·), x(t0), t0) = ½x^T(t0)S2x(t0)
      = ½x^T(t0)S1x(t0) + ∫_{t0}^{∞} ½[B^T(S1 − S2)x(t)]^T R^{-1} [B^T(S1 − S2)x(t)] dt,    (5.245)

which contradicts the hypothesis that x0^TS1x0 ≥ x0^TS2x0 with S1x0 ≠ S2x0. Therefore the hypothesis that there can be two distinct stabilizing solutions to the ARE is false.
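The eigenvalue bookkeeping in (5.242)-(5.243) is easy to see numerically. A sketch with assumed data (cross term C = 0): build H, confirm the ±λ symmetry, and recover the stabilizing ARE solution from the stable invariant subspace, which is the "stable grouping" of eigenvalues discussed above.

```python
# Sketch: Hamiltonian matrix of the LQR problem (C = 0, assumed data).
# Eigenvalues come in +/- pairs (5.242); choosing the n stable ones and
# forming S = X2 X1^{-1} from their eigenvectors gives the stabilizing S.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.array([[1.0]])

H = np.block([[A, -B @ Rinv @ B.T],
              [-Q, -A.T]])
lam, V = np.linalg.eig(H)
print(np.sort_complex(lam))               # symmetric about both axes

stable = lam.real < 0                     # the stable grouping
X = V[:, stable]
X1, X2 = X[:2, :], X[2:, :]
S = np.real(X2 @ np.linalg.inv(X1))       # stabilizing ARE solution
print(S)
```

Other groupings of n eigenvalues (mixing stable and unstable ones) yield the other, nonstabilizing solutions of the same ARE.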
Remark 5.7.1  Since Assumptions 5.2.1 and 5.2.2 are not required here, the solution to the ARE is not necessarily nonnegative definite, and the unconstrained terminal control problem may not have a finite cost; S(tf, t, 0) can have a finite escape time. However, the terminally constrained problem, obtained by requiring that lim_{tf→∞} x(tf) → 0, may have a finite cost, where lim_{t→−∞} S̄(tf, t) → S̄ with S̄(tf, t) ≥ S(tf, t, 0). It can well occur that S(tf, t, 0) has a finite escape time, whereas S̄(tf, t) remains bounded even as t → −∞; the converse is not generally the case.

Remark 5.7.2  It is clear from the canonical transformation of the Hamiltonian matrix H into H̄ of (5.243) that a necessary condition for all the eigenvalues of Ā to have negative real parts is for H to have no eigenvalues with zero real parts.

5.8  Necessary and Sufficient Conditions for Free Terminal Time

In this section necessary and sufficient conditions are determined for the second variation to be positive and strongly positive for the terminally constrained, free terminal time problem. We return to the augmented cost (5.10), which is considered rather than (5.15) for this free terminal time problem; in (5.10) the linear dynamics in z are adjoined by the Lagrange multiplier λ̃ in the integrand and the terminal constraints are adjoined by ν̃ in the quadratic terminal function. The notation is now consistent with that given for the LQ problem except for the inclusion of ∆, a control parameter entering only in the terminal function, which represents the variation in the terminal time, and the additional notation D̃ = dψ/dt, E = Ω_x, and Ẽ = dΩ/dt.

With augmented variation in the terminal constraints and free terminal time, the augmented cost (here λ̃ and ν̃ have been replaced by λ and ν) is, having integrated λ^T(t)ẋ(t) by parts and using the Hamiltonian given in (5.23),

    Ĵ(u(·), λ(·), x0, ν, ∆) = ∫_{t0}^{t_f°} [H(x(t), u(t), λ(t), t) + λ̇^T(t)x(t)] dt + λ^T(t0)x0 − λ^T(t_f°)x(t_f°)
        + ½ [x^T(t_f°)  ν^T  ∆] [ Sf   D^T  E^T ]
                                 [ D    0    D̃  ]
                                 [ E    D̃^T  Ẽ  ] [x(t_f°); ν; ∆].    (5.246)

To determine the first-order necessary conditions for variations in Ĵ to be nonnegative, variations in the control are made as δu(t) = u(t) − u°(t), producing a variation in x(t) as δx(t) = x(t) − x°(t); for free terminal time a variation in ∆ must also be considered as δ∆ = ∆ − ∆°. The change in Ĵ using these variations extends the expansion of (5.24) to free terminal time as

    ∆Ĵ = Ĵ(u(·), λ(·), x0, ν, ∆) − Ĵ(u°(·), λ(·), x0, ν, ∆°)
       = ∫_{t0}^{t_f°} [ x°^T(t)Q(t)δx(t) + u°^T(t)C(t)δx(t) + x°^T(t)C^T(t)δu(t) + ½δu^T(t)R(t)δu(t)
            + u°^T(t)R(t)δu(t) + λ̇^T(t)δx(t) + λ^T(t)A(t)δx(t) + λ^T(t)B(t)δu(t) ] dt
         + (−λ^T(t_f°) + x°^T(t_f°)Sf + ν^TD + ∆°E)δx(t_f°)
         + (x°^T(t_f°)E^T + ν^TD̃^T(?) being the constraint-rate term: ν^TD̃ + ∆°Ẽ)δ∆ + ∫_{t0}^{t_f°} O(t, ε)dt + O(ε),    (5.247)

where variations of λ(t) and ν multiply the dynamic constraints and the terminal constraint, which we assume are satisfied. Following a procedure similar to that used for the fixed terminal time problem, the first-order conditions are summarized next.
Theorem 5.8.1  Suppose that Assumptions 5.2.1 and 5.2.2 are satisfied. Then the necessary conditions for ∆Ĵ to be nonnegative to first order for strong perturbations in the control (5.26) are

    ẋ°(t) = A(t)x°(t) + B(t)u°(t),    x°(t0) = x0,    (5.248)
    λ̇(t) = −A^T(t)λ(t) − Q(t)x°(t) − C^T(t)u°(t),    (5.249)
    0 = R(t)u°(t) + C(t)x°(t) + B^T(t)λ(t),    (5.250)
    λ(t_f°) = Sf x°(t_f°) + D^Tν + E^T∆°,    (5.251)
    0 = D x°(t_f°) + D̃∆°,    (5.252)
    0 = E x°(t_f°) + D̃^Tν + Ẽ∆°.    (5.253)

For free terminal time, the boundary condition for the Lagrange multiplier is (5.251), the terminal constraint is (5.252), and the transversality condition is (5.253).

Using the transition matrix of (5.50) to relate the initial and terminal values of (x, λ) in the above equations (5.248)-(5.253), we obtain a new expression similar to (5.154) except that the variation in terminal time is included. The generalization of (5.154) for the free terminal time problem has the form

    [ λ(t0) ]   [ −Φ̄22^{-1}Φ̄21          Φ̄22^{-1}D^T                 Φ̄22^{-1}E^T                ] [ x0 ]
    [   0   ] = [  DΦ̄22^{-T}             DΦ̄12Φ̄22^{-1}D^T + D̃        DΦ̄12Φ̄22^{-1}E^T + D̃       ] [ ν  ]    (5.254)
    [   0   ]   [  EΦ̄22^{-T}             EΦ̄12Φ̄22^{-1}D^T + D̃^T      EΦ̄12Φ̄22^{-1}E^T + Ẽ       ] [ ∆° ],

where all partitions Φ̄_{ij} of the transition matrix of the Hamiltonian system are evaluated at (t_f°, t0).
We now make the following identifications with t = t0, as done in (5.155)-(5.157) with the elements in (5.160), (5.161), (5.163), and (5.164), yielding the symmetric form

    [ λ(t) ]   [ S(t_f°, t, Sf)   m^T(t_f°, t)    n^T(t_f°, t) ] [ x(t) ]
    [  0   ] = [ m(t_f°, t)       F(t_f°, t)      G(t_f°, t)   ] [  ν   ]    (5.255)
    [  0   ]   [ n(t_f°, t)       G^T(t_f°, t)    s(t_f°, t)   ] [  ∆°  ],

with boundary conditions m(t_f°, t_f°) = D, n(t_f°, t_f°) = E, F(t_f°, t_f°) = 0, G(t_f°, t_f°) = D̃, and s(t_f°, t_f°) = Ẽ. The differential properties of −Φ̄22^{-1}Φ̄21 and Φ̄12Φ̄22^{-1} are given earlier, and therefore the differential equation for S(t_f°, t, Sf) is the Riccati equation of (5.160). The dynamics of m(t_f°, t), n(t_f°, t), F(t_f°, t), G(t_f°, t), and s(t_f°, t) are determined from (5.163) as

    d/dt m^T(t_f°, t) = −[(A(t) − B(t)R^{-1}(t)C(t))^T − S(t_f°, t, Sf)B(t)R^{-1}(t)B^T(t)] m^T(t_f°, t),    (5.256)
    d/dt n^T(t_f°, t) = −[(A(t) − B(t)R^{-1}(t)C(t))^T − S(t_f°, t, Sf)B(t)R^{-1}(t)B^T(t)] n^T(t_f°, t),    (5.257)
    d/dt F(t_f°, t) = m(t_f°, t)B(t)R^{-1}(t)B^T(t)m^T(t_f°, t),
    d/dt G(t_f°, t) = m(t_f°, t)B(t)R^{-1}(t)B^T(t)n^T(t_f°, t),
    d/dt s(t_f°, t) = n(t_f°, t)B(t)R^{-1}(t)B^T(t)n^T(t_f°, t).    (5.258)

This approach is sometimes called the sweep method because the boundary conditions at t = t_f° are swept backward to the initial time. The quadratic cost is determined by relating λ(t) to x(t) as λ(t) = S̃(t_f°, t)x(t) (5.259); for free terminal time the quadratic cost can be shown to reduce to the form

    J̃(u°(·), x(t), t) = ½ x^T(t) S̃(t_f°, t) x(t).    (5.260)

The multipliers ν and ∆° are eliminated from (5.255) using its last two rows,

    [ ν; ∆° ] = −[ F(t_f°, t)   G(t_f°, t) ; G^T(t_f°, t)   s(t_f°, t) ]^{-1} [ m(t_f°, t); n(t_f°, t) ] x(t),    (5.261)

and substituting back into the first row of (5.255) gives

    S̃(t_f°, t) = S(t_f°, t, Sf) − [m^T(t_f°, t)  n^T(t_f°, t)] [ F(t_f°, t)   G(t_f°, t) ; G^T(t_f°, t)   s(t_f°, t) ]^{-1} [ m(t_f°, t); n(t_f°, t) ].    (5.262)

For the problem to be normal, the inverse in (5.262) must exist.

Remark 5.8.1  The proof of positivity of the second variational cost remains the same as for the fixed terminal time problem, with S(tf, t) replaced by S̃(t_f°, t); the earlier theorem can be extended to show that J̃(u(·), x0, t0) is strongly positive definite by a similar proof.

Remark 5.8.2  Construction of S̃(t_f°, t) from its component parts in (5.262) may not be possible, since S(tf, t, Sf) may not exist, whereas S̃(t_f°, t) can exist. Note that S̃(t_f°, t) satisfies the same Riccati equation as S(t_f°, t, Sf).

Remark 5.8.3  The earlier feedback results extend to this setting: the feedback control with free terminal time and terminal constraints is given by the form of (5.168) with S̃(t_f°, t) in place of S̄.

Remark 5.8.4  Note that the optimal value function for the LQ problem with terminal constraints and free terminal time is

    V(x(t), t) = ½ x^T(t) S̃(t_f°, t) x(t).    (5.263)

This satisfies the Hamilton-Jacobi-Bellman equation given in Chapter 4.
Example 5.8.1 (Second variation with variable terminal time)  The objective of this example is to determine the first- and second-order necessary conditions for a free terminal time optimal control problem. The problem statement is as follows: find the control scheme which minimizes

    J = tf = φ(xf, tf)    (5.264)

with the dynamic system equations

    ẋ = [ẋ1; ẋ2] = [v cos β; v sin β] = f,    x(t0) = [x10; x20] = x0,    (5.265)

and the terminal boundary condition

    ψ(xf, tf) = [x1f; x2f] = xf = 0,    (5.266)

where v is constant and β is the control variable. The augmented performance index is

    J̄ = tf + ν^Tψ + ∫_{t0}^{tf} (H − λ^Tẋ) dt,  where  H = λ^Tf = λ1 v cos β + λ2 v sin β,    (5.267)

and φ̃(tf, xf, ν) = tf + ν^Txf.    (5.268)

First-order necessary conditions from Chapter 4 are

    ẋ = H_λ^T = f,    x(t0) = x0,    (5.269)
    λ̇ = −H_x^T = 0,    λ(tf) = φ̃_{xf}^T = ν  ⇒  λ(t) = ν,    (5.270)-(5.271)
    0 = H_u = H_β = λ2 v cos β − λ1 v sin β,    (5.272)
    0 = H(tf) + φ̃_{tf} = λ1 v cos βf + λ2 v sin βf + 1.    (5.273)

From the optimality condition (5.272), λ2 cos β(t) = λ1 sin β(t), so tan β(t) = λ2/λ1 = constant, and therefore β°(t) = β° = constant.    (5.274)

Integrating the dynamics (5.265) while keeping β° constant, we obtain

    x1(t) = x10 + (t − t0)v cos β°,    x2(t) = x20 + (t − t0)v sin β°.    (5.275)

Applying the terminal constraint (5.266),

    x10 + (t_f° − t0)v cos β° = 0,    x20 + (t_f° − t0)v sin β° = 0,    (5.276)

and eliminating (t_f° − t0) gives λ2(−x10) = λ1(−x20), i.e.,

    tan β° = x20/x10  ⇒  β° = tan^{-1}(x20/x10).    (5.277)

If we restrict x10 > 0 and x20 > 0, then β° satisfies π < β° < 3π/2 and (5.277) has only one solution:

    cos β° = −x10/√(x10² + x20²),    sin β° = −x20/√(x10² + x20²).    (5.278)

Therefore, from (5.276),

    t_f° = t0 + √(x10² + x20²)/v.    (5.279)

Next we determine ν = λ. Substituting (5.278) into the transversality condition (5.273),

    0 = −λ1 x10/√(x10² + x20²) − λ2 x20/√(x10² + x20²) + 1,    (5.280)

and solving together with (5.272) for λ1 and λ2,

    λ1 = ν1 = x10/(v√(x10² + x20²)),    λ2 = ν2 = x20/(v√(x10² + x20²)).    (5.281)

Next, the second variation necessary and sufficient conditions are developed; the conditions are those of Theorem 5.8.1 and the sweep equations above. For the example the accessory problem has

    δẋ = B δu,  where  B = f_β = [f1β; f2β] = [−v sin β°; v cos β°],    (5.282)

with A = 0, Q = 0, C = 0, and R = H_ββ = 1.    (5.283)-(5.284)

The accessory first-order conditions are λ̇ = 0, 0 = v(λ2 cos β° − λ1 sin β°) + u, so u = −v(λ2 cos β° − λ1 sin β°), with boundary conditions (here D = I, D̃ = f, E = 0, and Ẽ = 0)

    λ(t_f°) = D^Tν = Iν  ⇒  λ(t) = ν,    (5.285)
    0 = Dx(t_f°) + D̃∆° = x(t_f°) + f∆°,    (5.286)
    0 = D̃^Tν = f^Tν.    (5.287)

Therefore, with S(t_f°, t, Sf) = 0, m(t_f°, t) = I, n(t_f°, t) = 0, F(t_f°, t) = −f_β f_β^T ∆t (where ∆t = t_f° − t), G(t_f°, t) = f, and s(t_f°, t) = 0, the symmetric form (5.255) reduces to

    [ λ(t) ]   [ 0      I            0 ] [ x(t) ]
    [  0   ] = [ I   −f_β f_β^T ∆t   f ] [  ν   ]    (5.288)-(5.290)
    [  0   ]   [ 0      f^T          0 ] [  ∆°  ].

From (5.290),

    [ ν; ∆° ] = −[ −f_β f_β^T ∆t   f ; f^T   0 ]^{-1} [ I; 0 ] x(t).    (5.291)

The coefficient matrix of (5.291) is invertible for ∆t > 0: its determinant is −v⁴∆t. Carrying out the inversion,

    [ ν; ∆° ] = −(1/(v²∆t)) [ sin²β°      −cos β° sin β°    ... ] x(t),    (5.292)

and substitution of (5.292) into (5.290) gives λ(t) = S̃(t_f°, t)x(t), where

    S̃(t_f°, t) = (1/(v²∆t)) [ sin²β°            −cos β° sin β° ]
                             [ −cos β° sin β°     cos²β°        ].    (5.293)

Note that S̃(t_f°, t) ≥ 0 is only positive semidefinite: its null direction lies along f. This is because a perturbation along the extremal path merely shifts the arrival time and does not affect the cost to second order. Since S̃(t_f°, t) is well behaved for all t < t_f°, the extremal path satisfies the condition for positivity and is a local minimum.
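A quick numerical check of the first-order solution (assumed values x10 = 3, x20 = 4, v = 1): the optimal heading points from x0 straight at the origin, and the minimum time is the distance divided by v.

```python
# Check of Example 5.8.1 (assumed data): beta* from (5.278), tf* from
# (5.279), and the integrated dynamics (5.275) reach the origin at tf*.
import numpy as np

x10, x20, v, t0 = 3.0, 4.0, 1.0, 0.0
r = np.hypot(x10, x20)
beta = np.arctan2(-x20, -x10)            # cos b = -x10/r, sin b = -x20/r
tf = t0 + r / v                          # minimum time (5.279)

x1 = x10 + (tf - t0) * v * np.cos(beta)  # dynamics (5.275) at t = tf
x2 = x20 + (tf - t0) * v * np.sin(beta)
print(beta, tf, x1, x2)                  # x1, x2 ~ 0 at the final time
```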
5.9  Summary

A consistent theory is given for a rather general formulation of the LQ problem. The theory has emphasized the time-varying formulation of the LQ problem with linear terminal constraints. No requirement was placed on the state weighting in the cost function, although the control weighting was assumed to be positive definite. The relationship between the transition matrix of the Hamiltonian system and the solution to the matrix Riccati differential equation is vividly shown using the symplectic property of Hamiltonian systems. The existence of the solution to the Riccati differential equation is required for the cost criterion to be positive definite and strongly positive definite. If this problem were interpreted as the accessory problem in the calculus of variations [6], then by using this cost criterion the local optimality of the path is determined.

The second-order necessary and sufficiency conditions are for extremal paths which satisfy weak or strong first-order conditions. In fact, as in [22, 4], the cost criterion is required to be strongly positive to ensure that the second variation dominates over higher-order terms. Initially, it is assumed that there are no discontinuous changes in the optimal control history. If there are discontinuities, then the requirement that the cost criterion be positive definite is not enough: the variation of the control about the discontinuity must be a strong variation. For example, in the Bushaw problem (see Sections 4.1 and 4.7), where the control is bang-bang, the switch occurs when H_u goes through zero. This is not included in the current theory. However, the development of second-order conditions can be done by converting the problem from explicitly considering strong variations to assuming that the times at which the control switches are control parameters. With respect to these control parameters, first-order optimality of the cost criterion with respect to the switch time is zero and corresponds to the switch condition for the control with respect to the Hamiltonian. Second-order optimality of the cost criterion with respect to the switch time can be found in [32] and [20].

Additional extensions of the LQ theory to several classes of nonlinear systems and other advanced topics are given in [29] and [31].
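The symplectic and Hamiltonian definitions used above (and in the problems that follow) are easy to illustrate numerically. This is not a solution to the exercises, just a sketch with assumed random data: a matrix H with the Hamiltonian block structure satisfies J^{-1}H^TJ = −H, and matrix exponentials of Hamiltonian matrices are symplectic, so their products are symplectic too.

```python
# Illustration of the definitions: H Hamiltonian <=> J^{-1} H^T J = -H,
# and U^T J U = J for the symplectic matrices U = exp(H) (assumed data).
import numpy as np
from scipy.linalg import expm

n = 2
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])

rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
Qs = rng.standard_normal((n, n)); Qs = Qs + Qs.T      # symmetric block
Rs = rng.standard_normal((n, n)); Rs = Rs + Rs.T      # symmetric block
H = np.block([[A, Rs], [Qs, -A.T]])                   # Hamiltonian structure

print(np.allclose(np.linalg.inv(J) @ H.T @ J, -H))    # True

U1, U2 = expm(H), expm(0.5 * H)                       # two symplectic matrices
for U in (U1, U2, U1 @ U2):
    print(np.allclose(U.T @ J @ U, J))                # True: product is too
```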
Problems

1. If two square matrices are symplectic, show that their product is symplectic.

2. A matrix H is "Hamiltonian" if it satisfies J^{-1}H^TJ = −H, where

       J = [ 0   I ; −I   0 ]    (5.294)

   is the fundamental symplectic matrix. Show that the matrix U^{-1}HU is also Hamiltonian, where U is a symplectic matrix.

3. Minimize the performance index

       J = ∫_0^{tf} (cxu + ½u²) dt

   with respect to u(·) and subject to ẋ = u, x(0) = 1.
   (a) What are the first- and second-order necessary conditions for optimality? Don't bother to solve them.
   (b) For c = 1, what are the values of tf for which the second variation is nonnegative?
   (c) For c = −1, what are the values of tf for which the second variation is nonnegative?

4. Consider the problem of minimizing with respect to the control u(·) the cost criterion

       J = ∫_0^{tf} (a^Tx + u^TRu) dt,    R > 0,

   subject to ẋ = Ax + Bu, u ∈ U, x(0) = x0 (given), x(tf) = 0.
   (a) By using the results of the accessory minimum problem, determine if x° and u° are locally minimizing.
   (b) Relate the results of the second variation to controllability.
   (c) Show that the second variation is strongly positive.
   (d) Show that by extremizing the cost with respect to the Lagrange multiplier associated with the terminal constraints, the multipliers maximize the performance index.

5. Show that S̄(tf, t), defined in (5.167), satisfies the same Riccati differential equation as (5.78) by performing time differentiation of S̄(tf, t) using (5.169).

6. Consider the problem of finding u(·) ∈ U that minimizes

       J = ∫_0^{tf} [(1 + u²)/(1 + x²)]^{1/2} dt

   subject to ẋ = u, x(0) = 0, x(tf) = 0. Show that the extremal path and control x°(t) = 0, u°(t) = 0 for t ∈ [0, tf] satisfy the first-order necessary conditions. By using the second variation, show that the extremal is locally minimizing.

7. Solve the optimization problem

       min_u J = x²(tf) + ∫_0^{tf} u²(t) dt,    (5.300)

   such that ẋ = u and x(0) = x0 (x, u ∈ R¹). Determine u° as a function of the time t and initial condition x0. Verify that the Hamiltonian is constant along the path.

8. Consider the frequency domain input/output relation

       y(s) = [1/(s(s + 2))] u(s),    (5.296)

   where y = output and u = input.
   (a) Obtain a minimal-order state representation for (5.296).
   (b) Consider the cost functional

       J = ∫_0^{∞} (y² + u²) dt.    (5.297)

   A feedback law of the form u = −kx (5.298) is desired. Determine the constant vector k such that J is minimized.
   (c) Calculate the transfer function k^T(sI − A)^{-1}B.    (5.299)
   What is the interpretation of this transfer function? Prove that this transfer function will be the same regardless of the realization used.

9. Solve the following tracking problem using the steady-state solution of the Riccati equation:

       min_u J = ½p[x(tf) − r(tf)]² + ½∫_0^{tf} [q(x − r)² + u²] dt    (5.302)-(5.303)

   subject to ẋ = ax + bu and a reference input r(t) = 0 for t < 0, r(t) = e^{−t} for t > 0 (x, u ∈ R¹). Determine the optimum control u°, the optimum trajectory, and the tracking error. Examine the influence of the weighting parameter q.

10. Derive the first-order necessary conditions of optimality for the following problem:

       min_u J(t0) = ½x^T(tf)S(tf)x(tf) + ½∫_{t0}^{tf} [x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)] dt    (5.301)

    such that ẋ(t) = A(t)x(t) + B(t)u(t) + d(t), where d(t) ∈ R^n is a known disturbance.

Chapter 6

Linear Quadratic Differential Games

6.1  Introduction

In the previous developments, the cost criterion was minimized with respect to the control. In this section a generalization is introduced where there are two controls, one designed to minimize the cost criterion and the other to maximize it. We first consider the LQ problem and develop necessary conditions for optimality. We show that the optimal solution satisfies a saddle point inequality: if either player does not play his optimal strategy, then the other player gains. Since the problem is linear in the dynamics and quadratic in the cost criterion, a quadratic function of the state is assumed to be the optimal value of the cost. By following the procedure given in Section 5.4, this function is used to complete the squares, whereby an explicit form is obtained in a perfect square of the adversaries' strategies; we show that the saddle point can be obtained in this way. Note that in the game problem, strategies that are functions of the state as well as time are sought.

Our objective is to derive a control synthesis method called H∞ synthesis; as a parameter goes to zero, the H2 synthesis method is recovered. The issue of feedback control when the state is not perfectly known is considered next. To this end, a disturbance attenuation function is defined as an input-output transfer function representing the ratio of a quadratic norm of the desired system outputs over the quadratic norm of the input disturbances. It is shown that under certain conditions the disturbance attenuation function can be bounded. This is shown by formulating a related differential game problem, the solution to which satisfies the desired bound for the disturbance attenuation problem. The results produce a linear controller based upon a measurement sequence. This controller is then specialized to show an explicit full state feedback and a linear estimator for the state. Relationships with current synthesis algorithms are then made.

6.2  LQ Differential Game with Perfect State Information

The linear dynamic equation is extended to include an additional control vector w as

    ẋ(t) = A(t)x(t) + B(t)u(t) + Γ(t)w(t),    x(t0) = x0,    (6.1)

where x(t) ∈ R^n, u(t) ∈ R^m, w(t) ∈ R^p. The problem is to find the control u(·) ∈ U which minimizes, and w(·) ∈ W which maximizes (U and W are similar admissible sets, defined as in Assumption 3.2), the performance criterion

    J(u(·), w(·), x0, t0) = ½x^T(tf)Qf x(tf) + ½∫_{t0}^{tf} (x^T(t)Q(t)x(t) + u^T(t)R(t)u(t) − θw^T(t)W^{-1}(t)w(t)) dt,    (6.2)

where Q^T(t) = Q(t) ≥ 0, R^T(t) = R(t) > 0, W^T(t) = W(t) > 0, and Qf^T = Qf. Note that the negative weight penalizes large excursions in w(·). Note also that if the parameter θ < 0, the players u(·) and w(·) are cooperative (w(·) also minimizes the cost criterion (6.2)); if θ > 0, the players are adversarial, where u(·) minimizes and w(·) maximizes J. We consider only θ > 0.
Unlike the earlier minimization problems, a saddle point inequality is sought such that

    J(u°(·), w(·), x0, t0) ≤ J(u°(·), w°(·), x0, t0) ≤ J(u(·), w°(·), x0, t0).    (6.3)

The functions (u°(·), w°(·)) are called saddle point controls or strategies. If either player deviates from his saddle point strategy, the other gains. This is also called a zero sum game, since whatever one player loses, the other player gains. For these strategies to be useful, the strategies should be functions of both state and time.

We assume that the saddle point value of the cost is given by the optimal value function

    J(u°(·), w°(·), x0, t0) = V(x(t), t) = ½ x^T(t) S_G(tf, t, Qf) x(t),

where S_G(tf, t, Qf) will be generated by a Riccati differential equation consistent with the game formulation. This choice for the optimal value function seems natural, given that the optimal value functions for the LQ problem in Chapter 5 are all quadratic functions of the state. Note that since only the symmetric part of S_G(tf, t, Qf) contributes to the quadratic form, only the symmetric part is assumed. Our objective is to determine the form of the Riccati differential equation. We use the procedure suggested in Section 5.4 to complete the square of a quadratic form. We add the identity

    0 = ½∫_{t0}^{tf} d/dt[x^T(t)S_G(tf, t, Qf)x(t)] dt − ½x^T(tf)S_G(tf, tf, Qf)x(tf) + ½x^T(t0)S_G(tf, t0, Qf)x(t0)

to (6.2), giving

    Ĵ(u(·), w(·), x0, t0) = ∫_{t0}^{tf} { ½x^T(t)[Q(t) + Ṡ_G + S_G A(t) + A^T(t)S_G]x(t) + ½u^T(t)R(t)u(t)
        − ½θw^T(t)W^{-1}(t)w(t) + x^T(t)S_G(B(t)u(t) + Γ(t)w(t)) } dt + ½x^T(t0)S_G(tf, t0, Qf)x(t0),    (6.4)

where S_G(tf, tf, Qf) = Qf cancels the terminal cost term. By choosing S_G(tf, t, Qf) to satisfy the matrix Riccati equation

    Ṡ_G + S_G A(t) + A^T(t)S_G + Q(t) − S_G (B(t)R^{-1}(t)B^T(t) − θ^{-1}Γ(t)W(t)Γ^T(t)) S_G = 0,
    S_G(tf, tf, Qf) = Qf,    (6.5)

the cost criterion (6.4) reduces to squared terms in the strategies of u(·) and w(·) as

    Ĵ(u(·), w(·), x0, t0) = ½∫_{t0}^{tf} { [u(t) + R^{-1}B^T S_G x(t)]^T R [u(t) + R^{-1}B^T S_G x(t)]
        − [w(t) − θ^{-1}WΓ^T S_G x(t)]^T θW^{-1} [w(t) − θ^{-1}WΓ^T S_G x(t)] } dt
        + ½x^T(t0)S_G(tf, t0, Qf)x(t0).    (6.6)

Since we have completed the square in the cost criterion, the saddle point inequality (6.3) is satisfied if

    u°(t) = −R^{-1}(t)B^T(t)S_G(tf, t, Qf)x(t),    w°(t) = θ^{-1}W(t)Γ^T(t)S_G(tf, t, Qf)x(t),    (6.7)

and the solution to the Riccati equation (6.5) remains bounded. Thus (6.6) produces a sufficiency condition for saddle point optimality: there exists a solution u°(·) ∈ U and w°(·) ∈ W given in (6.7) to the differential game problem if and only if S_G(tf, t, Qf) exists for all t ∈ [t0, tf]. This is proved for a more general problem in Theorem 6.3.1.

Remark 6.2.1  The solution to the controller Riccati equation (6.5) can have a finite escape time because the matrix (B(t)R^{-1}(t)B^T(t) − θ^{-1}Γ(t)W(t)Γ^T(t)) can be indefinite. The cost criterion (6.2) may be driven to large positive values by w(·), since the cost criterion is then not a concave functional with respect to w(t). Therefore, if S_G(tf, t, Qf) escapes as t → t_e, where t_e is the escape time, then for some x(t) the cost criterion approaches infinity as t → t_e.
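Both regimes in Remark 6.2.1 are visible in a one-dimensional computation. A sketch with assumed scalar data: for large θ the backward solution of (6.5) settles; for small θ the quadratic coefficient changes sign and S_G escapes before reaching t0.

```python
# Sketch (assumed scalar data): backward integration of the game Riccati
# equation (6.5).  For theta = 10 the coefficient B^2/R - W/theta > 0 and
# S_G settles; for theta = 0.2 it is negative and S_G escapes (Rem. 6.2.1).
import numpy as np
from scipy.integrate import solve_ivp

A, B, Gam, Q, R, W, Qf, tf = 0.5, 1.0, 1.0, 4.0, 1.0, 1.0, 0.0, 5.0

def riccati(t, S, theta):
    # dS_G/dt = -(2 A S_G + Q) + (B^2/R - Gam^2 W/theta) S_G^2, scalar (6.5)
    return -(2 * A * S + Q) + (B**2 / R - Gam**2 * W / theta) * S**2

def escape(t, S, theta):          # stop the solver if S_G blows up
    return abs(S[0]) - 1e6
escape.terminal = True

for theta in (10.0, 0.2):
    sol = solve_ivp(riccati, (tf, 0.0), [Qf], args=(theta,),
                    events=escape, max_step=0.01)
    print(theta, sol.t[-1], sol.y[0, -1])   # theta = 0.2 stops before t = 0
```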
6.3  Disturbance Attenuation Problem

We now use the game theoretic results of the last section to develop a controller which is to some degree insensitive to input process and measurement disturbances. Consider the setting in Figure 6.1. The objective is to design a compensator, based only on the measurement history, such that the transmission from the disturbances to the performance outputs is limited in some sense.

[Figure 6.1: Disturbance attenuation block diagram. The plant is driven by the disturbances (w, v, x0) and the control u; it produces the performance outputs y and the measurements z, and the compensator closes the loop from z to u.]

To make these statements more explicit, consider the dynamic system

    ẋ(t) = A(t)x(t) + B(t)u(t) + Γ(t)w(t),    x(t0) = x0,    (6.8)
    z(t) = H(t)x(t) + v(t),    (6.9)

where z(t) ∈ R^q is the measurement, w(t) ∈ R^m is the process disturbance error, v(t) ∈ R^q is the measurement disturbance error, and x0 ∈ R^n is an unknown initial condition. The matrices A(t), B(t), Γ(t), and H(t) are known functions of time. The performance outputs are measures of desired system performance, such as good tracking error or low actuation inputs to avoid saturation. The general performance measure can be written as

    y(t) = C(t)x(t) + D(t)u(t),    y(t) ∈ R^r.    (6.10)

A general representation of the input-output relationship between the disturbances w̃(t) = [w^T(t), v^T(t), x0^T]^T and the output performance measure y(t) is the disturbance attenuation function

    D_a = ‖y(·)‖₂² / ‖w̃(·)‖₂².    (6.11)
Here we have extended the norm ‖y(·)‖₂² to include a quadratic terminal function as

    ‖y(·)‖₂² ≝ ½x^T(tf)Qf x(tf) + ½∫_{t0}^{tf} (x^T(t)Q(t)x(t) + u^T(t)R(t)u(t)) dt,    (6.12)

where the integrand is

    ‖y(t)‖² = y^T(t)y(t) = x^T(t)C^T(t)C(t)x(t) + u^T(t)D^T(t)D(t)u(t),
    C^T(t)C(t) = Q(t),    D^T(t)D(t) = R(t),    C^T(t)D(t) = 0,    (6.13)

and

    ‖w̃(·)‖₂² ≝ ½∫_{t0}^{tf} (w^T(t)W^{-1}(t)w(t) + v^T(t)V^{-1}(t)v(t)) dt + ½x0^T P0^{-1} x0,    (6.14)

with V(t) = V^T(t) > 0 and P0 = P0^T > 0. In this formulation we assume that the cross terms between the disturbances w(t) and v(t) in (6.14) are zero.

The disturbance attenuation problem is to find a controller u(t) = u(Z_t) ∈ Ū ⊂ U, where the measurement history is Z_t = {z(s) : 0 ≤ s ≤ t}, such that the disturbance attenuation problem is bounded as

    D_a ≤ θ,    θ > 0,    (6.15)

for all admissible processes w(t) and v(t) and initial conditions x0 ∈ R^n. U is the class of admissible controls and Ū ⊂ U is the subset of controllers that are linear functions of Z_t. The choice of θ cannot be completely arbitrary. As will be shown, the solutions to the associated Riccati equations may have a finite escape time: there exists a θ_c such that if θ ≤ θ_c, the solution to the problem does not exist.

6.3.1  The Disturbance Attenuation Problem Converted into a Differential Game

This disturbance attenuation problem is converted to a differential game problem with performance index obtained by manipulating Equations (6.11) and (6.15):

    J(u(·), w̃(·), t0, tf) = ‖y(·)‖₂² − θ‖w̃(·)‖₂².    (6.16)

For convenience, define a process for a function ŵ(·) as ŵ_a^b(·) = {ŵ(t) : a ≤ t ≤ b}. The differential game is then to find the minimax solution as

    J°(u°(·), w̃°(·), t0, tf) = min_{u_{t0}^{tf}(·)} max_{w_{t0}^{tf}(·), v_{t0}^{tf}(·), x0} J(u(·), w̃(·), t0, tf).    (6.17)

We first assume that the min and max operations are interchangeable. It can be shown [40] that the solution has a saddle point, and therefore this interchange of the min and max operations is valid; the saddle point condition is validated in Section 6.3.3. This problem is solved by dividing the problem into a future part, t < τ ≤ tf, and a past part, t0 ≤ τ < t, where t is the "current" time, and joining them together with a connection condition:

    J°(u°(·), w̃°(·), t0, tf) = min_{u_t^{tf}(·)} max_{w_t^{tf}(·), v_t^{tf}(·)} max_{w_{t0}^t(·), v_{t0}^t(·), x0} J(u(·), w̃(·), t0, tf).    (6.18)

Note that for the future time interval no measurements are available. Therefore, maximizing with respect to v_t^{tf}(·), given the form of the performance index (6.16), produces the worst future process if v_t^{tf}(·) is given as

    v(τ) = 0,    t < τ ≤ tf.    (6.19)

Therefore,

    J°(u°(·), w̃°(·), t0, tf) = min_{u_t^{tf}(·)} max_{w_t^{tf}(·)} max_{w_{t0}^t(·), x0} J(u(·), w̃(·), t0, tf).    (6.20)
The problem therefore reduces to

    J°(u°(·), w̃°(·), t0, tf) = min_{u_t^{tf}(·)} max_{w_t^{tf}(·)} max_{w_{t0}^t(·), x0} [J(u(·), w̃(·), t0, t) + V(x(t), t)],    (6.21)

where x(t) is not known. Note that the optimal value function of the game associated with the future is, from Section 6.2,

    V(x(t), t) = ½ x^T(t) S_G(tf, t, Qf) x(t),    (6.22)

so the game problem associated with the future reduces to a game between only u_t^{tf}(·) and w_t^{tf}(·). The results given in Section 6.2 are now applicable; in particular, the controller of Equation (6.7) is

    u°(t) = −R^{-1}(t)B^T(t)S_G(tf, t, Qf)x(t),    (6.24)

where t, the current time, rather than t0, is considered the initial time.

The game problem can be reduced further by noting that u_{t0}^t(·) has already occurred, and therefore the minimization over it is meaningless. Since the state history Z_t is known, if the cost criterion is maximized with respect to w_{t0}^t and x0, then the resulting state is determined and v_{t0}^t is eliminated from the cost criterion by using (6.9) as

    v(τ) = z(τ) − H(τ)x(τ),    t0 ≤ τ < t.    (6.23)

The objective is to determine x(t) as a function of the measurement history by solving the problem associated with the past:

    J°(u°(·), w̃°(·), t0, tf) = max_{w_{t0}^t(·), x0} { ½∫_{t0}^t [x^T(τ)Q(τ)x(τ) + u^T(τ)R(τ)u(τ)
        − θ(w^T(τ)W^{-1}(τ)w(τ) + (z(τ) − H(τ)x(τ))^T V^{-1}(τ)(z(τ) − H(τ)x(τ)))] dτ
        − ½θ x0^T P0^{-1} x0 + ½x^T(t)S_G(tf, t, Qf)x(t) }    (6.25)

subject to the dynamic equations

    ẋ(τ) = A(τ)x(τ) + B(τ)u(τ) + Γ(τ)w(τ),    x(t0) = x0.    (6.26)

6.3.2  Solution to the Differential Game Problem Using the Conditions of the First-Order Variations

In a manner similar to that of Section 3.3, the dynamics (6.26) are adjoined to the performance index (6.25) with the Lagrange multiplier λ(τ), giving the augmented performance index (6.27); integrating λ^T(τ)ẋ(τ) by parts gives

    Ĵ° = max_{w_{t0}^t(·), x0} { ∫_{t0}^t [ ½x^TQx + ½u^TRu − ½θ(w^TW^{-1}w + (z − Hx)^TV^{-1}(z − Hx))
        + λ^T(Ax + Bu + Γw) + λ̇^Tx ] dτ − λ^Tx|_{t0}^t − ½θx0^TP0^{-1}x0 + ½x^T(t)S_G(tf, t, Qf)x(t) }.    (6.28)

Taking the first variation of Ĵ° (6.29) and setting it to zero yields the first-order necessary conditions

    λ̇^T(τ) + λ^T(τ)A(τ) + θ(z(τ) − H(τ)x(τ))^TV^{-1}(τ)H(τ) + x^T(τ)Q(τ) = 0,    (6.30)
    λ^T(t0) − θx0^TP0^{-1} = 0  ⇒  λ(t0) = θP0^{-1}x0,    (6.31)
    w(τ) = θ^{-1}W(τ)Γ^T(τ)λ(τ),    (6.32)
    −λ^T(t) + x^T(t)S_G(tf, t, Qf) = 0  (boundary term at the current time t).    (6.33)

The dynamic equations for the Hamiltonian system are

    ẋ(τ) = A(τ)x(τ) + B(τ)u(τ) + θ^{-1}Γ(τ)W(τ)Γ^T(τ)λ(τ),    x(t0) = x0,    (6.34)
    λ̇(τ) = −A^T(τ)λ(τ) − Q(τ)x(τ) − θH^T(τ)V^{-1}(τ)(z(τ) − H(τ)x(τ)),    λ(t0) = θP0^{-1}x0,    (6.35)

where over the interval t0 ≤ τ ≤ t, u(τ) and z(τ) are known processes. By examination of (6.34) and (6.35), we conjecture, and verify in Section 6.3.3, that the following form can be swept forward:

    λ(τ) = θP^{-1}(t0, τ, P0)(x(τ) − x̂(τ)),    (6.36)

or, equivalently, that the solution for x(τ) satisfies

    x(τ) = x̂(τ) + θ^{-1}P(t0, τ, P0)λ(τ).    (6.37)

To determine the differential equations for x̂(τ) and P(t0, τ, P0), we differentiate (6.37) to get

    ẋ(τ) = x̂̇(τ) + θ^{-1}Ṗ(t0, τ, P0)λ(τ) + θ^{-1}P(t0, τ, P0)λ̇(τ).    (6.38)

Substituting (6.34) and (6.35) into (6.38) and replacing x(τ) using (6.37) (equations (6.39)-(6.40)), then rewriting so that all terms multiplying λ(τ) are on one side of the equal sign, gives

    A(τ)x̂(τ) − x̂̇(τ) + B(τ)u(τ) + θ^{-1}P(t0, τ, P0)Q(τ)x̂(τ) + P(t0, τ, P0)H^T(τ)V^{-1}(τ)(z(τ) − H(τ)x̂(τ))
      = θ^{-1}[Ṗ − A(τ)P − PA^T(τ) + P(H^T(τ)V^{-1}(τ)H(τ) − θ^{-1}Q(τ))P − Γ(τ)W(τ)Γ^T(τ)]λ(τ).    (6.41)
If we choose x̂(τ) to satisfy

    x̂̇(τ) = A(τ)x̂(τ) + B(τ)u(τ) + θ^{-1}P(t0, τ, P0)Q(τ)x̂(τ)
        + P(t0, τ, P0)H^T(τ)V^{-1}(τ)(z(τ) − H(τ)x̂(τ)),    x̂(t0) = 0,    (6.42)

and P(t0, τ, P0) to satisfy

    Ṗ − A(τ)P − PA^T(τ) + P(H^T(τ)V^{-1}(τ)H(τ) − θ^{-1}Q(τ))P − Γ(τ)W(τ)Γ^T(τ) = 0,    (6.43)

or

    Ṗ = A(τ)P + PA^T(τ) − P(H^T(τ)V^{-1}(τ)H(τ) − θ^{-1}Q(τ))P + Γ(τ)W(τ)Γ^T(τ),    P(t0) = P0,    (6.44)

then (6.38) becomes an identity. This is explicitly shown in Section 6.3.3.

Remark 6.3.1  The solution to the estimation Riccati equation (6.44) may have a finite escape time because the matrix (H^T(τ)V^{-1}(τ)H(τ) − θ^{-1}Q(τ)) can be indefinite. Some additional properties of the Riccati differential equation (6.44) are given in [40].

At the current time t, from Equation (6.33), λ(t) = S_G(tf, t, Qf)x(t), where x(t) here is the worst-case state x°(t) produced by the maximizing disturbances. Therefore, from (6.37), the estimate x̂(t) and the worst-case state x°(t) are related as

    x̂(t) = (I − θ^{-1}P(t0, t, P0)S_G(tf, t, Qf))x°(t),    (6.45)

so that

    x°(t) = (I − θ^{-1}P(t0, t, P0)S_G(tf, t, Qf))^{-1}x̂(t).    (6.46)
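The filter Riccati equation (6.44) is easy to propagate numerically. A sketch with assumed scalar data: for finite θ the correction term H^TV^{-1}H − θ^{-1}Q is smaller than in the Kalman filter limit (θ → ∞), so the H∞ filter carries a larger steady-state P.

```python
# Sketch (assumed scalar data): forward integration of (6.44) from
# P(t0) = P0.  Compare theta = inf (Kalman filter) with a finite theta.
import numpy as np
from scipy.integrate import solve_ivp

A, Gam, H, Q, V, W, P0 = -1.5, 1.0, 1.0, 4.0, 1.0 / 14.0, 1.0, 1.0

def pdot(t, P, theta):
    # dP/dt = 2 A P - (H^2/V - Q/theta) P^2 + Gam^2 W, scalar (6.44)
    return 2 * A * P - (H**2 / V - Q / theta) * P**2 + Gam**2 * W

for theta in (np.inf, 2.5):
    sol = solve_ivp(pdot, (0.0, 10.0), [P0], args=(theta,), max_step=0.01)
    print(theta, sol.y[0, -1])     # steady-state P grows as theta shrinks
```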
The state x°(t) in Equation (6.46) is the saddle value of the state that is used in the controller. The optimal controller u(t) = u(Z_t) is now written as

    u°(t) = −R^{-1}(t)B^T(t)S_G(tf, t, Qf)(I − θ^{-1}P(t0, t, P0)S_G(tf, t, Qf))^{-1}x̂(t),    (6.47)

where S_G(tf, t, Qf) is determined by integrating (6.5) backward from tf, P(t0, t, P0) is determined by integrating (6.44) forward from t0, and the estimate x̂(t) is propagated by the filter (6.42).    (6.48)

It should be noted that if all adversaries play their saddle point strategy, then

    x°(t) = 0,  x̂(t) = 0,  u°(t) = 0,  w°(t) = 0,  v°(t) = 0,    (6.49)

and J°(u°(·), w̃°(·), t0, tf) = 0; for any other disturbance strategy,

    J(u°(·), w̃(·), t0, tf) ≤ J°(u°(·), w̃°(·), t0, tf) = 0,    (6.50)

so inequality (6.15) is satisfied and the game solution also provides the solution to the disturbance attenuation problem.

Remark 6.3.2  Note that when θ → ∞, the disturbance attenuation or H∞ controller given by (6.47) reduces to the H2 controller (in the stochastic setting this is known as the linear-quadratic-Gaussian (LQG) controller)

    u°(t) = −R^{-1}(t)B^T(t)S_G(tf, t, Qf)x̂(t),

where the controller gains are obtained from the Riccati equation (6.5) with θ = ∞, identical to the Riccati equation of Chapter 5, and the state is reconstructed from the filter with x̂(t) = x°(t):

    x̂̇(t) = A(t)x̂(t) + B(t)u(t) + P(t0, t, P0)H^T(t)V^{-1}(t)(z(t) − H(t)x̂(t)),    (6.51)

where the filter gains are determined from (6.44) with θ = ∞, i.e., the Kalman filter Riccati equation.

Note 6.3.1  x̂(t) summarizes the measurement history in an n-vector.

6.3.3  Necessary and Sufficient Conditions for the Optimality of the Disturbance Attenuation Controller

In this section are given necessary and sufficient conditions that guarantee that the controller (6.47) satisfies J(u°(·), w̃(·), t0, tf) ≤ 0, i.e., that J(u°(·), w̃(·), t0, tf) is concave with respect to w̃(·). To do this, the cost criterion is written as the sum of two quadratic forms evaluated at the current time t. One term is the optimal value function x^T(t)S_G(tf, t, Qf)x(t)/2, where the cost criterion is swept backward from the terminal time to the current time t. For the second term, we verify that the optimal value function which sweeps the initial boundary function −θx0^TP0^{-1}x0/2 forward is

    Ṽ(e(t), t) = −(θ/2) e^T(t)P^{-1}(t0, t, P0)e(t),    (6.52)

where e(t) = x(t) − x̂(t) and Ṽ_e(e(t), t) = −θP^{-1}(t0, t, P0)e(t) = −λ(t), as given in (6.36). Note that e(t0) = x0, since we assumed x̂(t0) = 0 only for convenience.

The dynamic equation for e(t) is found by subtracting (6.42) from (6.26):

    ė(t) = A(t)e(t) + Γ(t)w(t) − θ^{-1}P(t0, t, P0)Q(t)x̂(t) − P(t0, t, P0)H^T(t)V^{-1}(t)(z(t) − H(t)x̂(t)).    (6.53)

As in Section 6.2, we complete the squares of the cost criterion (6.25) by adding the identically zero quantity

    0 = ∫_{t0}^t (θ/2) d/dτ [e^T(τ)P^{-1}(t0, τ, P0)e(τ)] dτ − (θ/2)e^T(t)P^{-1}(t0, t, P0)e(t) + (θ/2)x0^TP0^{-1}x0.    (6.54)
The terms evaluated at t = t0 cancel in (6.54)-(6.55). From (6.44), the Riccati differential equation for P^{-1}(t0, t, P0) is

    d/dt P^{-1} = −P^{-1}A(t) − A^T(t)P^{-1} − P^{-1}Γ(t)W(t)Γ^T(t)P^{-1} + (H^T(t)V^{-1}(t)H(t) − θ^{-1}Q(t)),
    P^{-1}(t0) = P0^{-1}.    (6.56)

Substituting the error dynamics (6.53) and (6.56) into (6.55), adding and subtracting the term θe^T(τ)P^{-1}Γ(τ)W(τ)Γ^T(τ)P^{-1}e(τ), and substituting x(τ) = x̂(τ) + e(τ) (6.57)-(6.58), the past cost collects into a perfect square in w:

    J(u°(·), w̃(·), x(t), t0, tf) = max_{w_{t0}^t(·), x0} ½∫_{t0}^t [ x̂^T(τ)Q(τ)x̂(τ) + u^T(τ)R(τ)u(τ)
        − θ(z(τ) − H(τ)x̂(τ))^TV^{-1}(τ)(z(τ) − H(τ)x̂(τ))
        − θ(w(τ) − W(τ)Γ^T(τ)P^{-1}e(τ))^TW^{-1}(τ)(w(τ) − W(τ)Γ^T(τ)P^{-1}e(τ)) ] dτ
        − (θ/2)e^T(t)P^{-1}(t0, t, P0)e(t) + ½x^T(t)S_G(tf, t, Qf)x(t).

The maximizing process disturbance is therefore

    w°(τ) = W(τ)Γ^T(τ)P^{-1}(t0, τ, P0)e(τ),    (6.61)

and the optimal cost criterion reduces to

    J(u°(·), w̃°(·), x(t), t0, tf) = ½I(Z_t) − (θ/2)e^T(t)P^{-1}(t0, t, P0)e(t) + ½x^T(t)S_G(tf, t, Qf)x(t),    (6.59)

where

    I(Z_t) = ∫_{t0}^t [ x̂^T(τ)Q(τ)x̂(τ) + u^T(τ)R(τ)u(τ)
        − θ(z(τ) − H(τ)x̂(τ))^TV^{-1}(τ)(z(τ) − H(τ)x̂(τ)) ] dτ.    (6.60)

Similarly, maximizing (6.60) with respect to v(τ) using (6.9) gives v°(τ) = −H(τ)e(τ). Since the terms under the integral are functions only of the given measurement process, they are known functions over the past time interval. The state x(t) is still free and unspecified; the determination of the worst-case state is found by maximizing over the last two terms of (6.59). Setting J_{x(t)}(u°(·), w̃°(·), x(t), t0, tf) = 0 gives (6.46), and the second variation condition for a maximum gives

    P^{-1}(t0, t, P0) − θ^{-1}S_G(tf, t, Qf) > 0.    (6.62)

This inequality is known as the spectral radius condition.

We now show that a necessary and sufficient condition for J(u°(·), w̃(·), x(t), t0, tf) to be concave with respect to w̃(·) and x(t) is that the following assumption be satisfied.

Assumption 6.3.1
1. There exists a solution S_G(tf, t, Qf) of the Riccati differential equation (6.5) over the interval [t0, tf].
2. There exists a solution P(t0, t, P0) of the Riccati differential equation (6.44) over the interval [t0, tf].
3. P^{-1}(t0, t, P0) − θ^{-1}S_G(tf, t, Qf) > 0 over the interval [t0, tf].

Remark 6.3.3  In [40] some properties of this class of Riccati differential equations are presented. For example, if Qf ≥ 0 (> 0), then S_G(tf, t, Qf) ≥ 0 (> 0) for tf ≥ t ≥ t0. If S_G(tf, t, Qf) has an escape time in the interval tf ≥ t ≥ t0, then some eigenvalues of S_G(tf, t, Qf) must go off to positive infinity.
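Assumption 6.3.1 can be checked on a time grid once the two Riccati solutions are available. A sketch with assumed scalar data (the P0 used here is chosen small enough that condition 3 holds; a larger P0 or smaller θ can violate it):

```python
# Sketch (assumed scalar data): verify Assumption 6.3.1 on a grid by
# integrating (6.5) backward and (6.44) forward, then testing the
# spectral radius condition 1/P - S_G/theta > 0 pointwise.
import numpy as np
from scipy.integrate import solve_ivp

A, B, Gam, H, Q, R, V, W = -1.5, 1.0, 1.0, 1.0, 4.0, 2.0, 1.0 / 14.0, 1.0
Qf, P0, t0, tf, theta = 0.0, 0.25, 0.0, 5.0, 1.0

ts = np.linspace(t0, tf, 501)

def sg_dot(t, S):   # scalar (6.5)
    return -(2 * A * S + Q) + (B**2 / R - Gam**2 * W / theta) * S**2

def p_dot(t, P):    # scalar (6.44)
    return 2 * A * P - (H**2 / V - Q / theta) * P**2 + Gam**2 * W

S_G = solve_ivp(sg_dot, (tf, t0), [Qf], t_eval=ts[::-1], max_step=0.01).y[0][::-1]
P = solve_ivp(p_dot, (t0, tf), [P0], t_eval=ts, max_step=0.01).y[0]

print("condition 1 (S_G bounded):", np.all(np.isfinite(S_G)))
print("condition 2 (P bounded):  ", np.all(np.isfinite(P)))
print("condition 3 (1/P - S_G/theta > 0):", np.all(1.0 / P - S_G / theta > 0))
```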
Theorem 6.3.1  There exists a solution u°(Z_t) ∈ Ū to the finite-time disturbance attenuation problem if and only if Assumption 6.3.1 holds.

Proof: Sufficiency: Suppose that Assumption 6.3.1 holds. For the strategies

    u°(t) = −R^{-1}(t)B^T(t)S_G(tf, t, Qf)x°(t),    w°(t) = θ^{-1}W(t)Γ^T(t)S_G(tf, t, Qf)x°(t),    v°(t) = −H(t)e(t),

used in (6.32) and (6.33), all signals vanish (x(t) = 0, e(t) = 0) and J°(u°(·), w̃°(·), t0, tf) = 0; for any other disturbance strategy, the completed-square form (6.59) shows that

    J(u°(·), w̃(·), t0, tf) < J°(u°(·), w̃°(·), t0, tf) = 0,    (6.64)

and thus the cost is strictly concave with respect to w̃(·). Hence the bound (6.15) holds.

Necessity: Suppose 2 is violated, but 1 and 3 are not: P(t0, t, P0) has an escape time t_e ∈ [t0, tf]. For t_s < t_e we choose the strategies w°(t) = W(t)Γ^T(t)P^{-1}(t0, t, P0)e(t) and v°(t) = −H(t)e(t) for t0 ≤ t < t_s, and we assume some x(t0) ≠ 0 such that e(t_s) = x(t_s) ≠ 0 coincides with the eigenvector associated with the largest eigenvalue of P(t0, t_s, P0). Note that I(Z_{t_s}) = 0, x̂(t) = 0, and u°(t) = 0 over t ∈ [t0, t_s]. Then

    J̃(u°(·), w̃(·), x(t_s), t0, tf) = ½[−θe^T(t_s)P^{-1}(t0, t_s, P0)e(t_s) + x^T(t_s)S_G(tf, t_s, Qf)x(t_s)].

As t_s → t_e, e^T(t_s)P^{-1}(t0, t_s, P0)e(t_s) → 0 and the cost criterion becomes ½x^T(t_s)S_G(tf, t_s, Qf)x(t_s) > 0, which contradicts the optimality of u°(·).

Suppose 1 is violated, but 2 and 3 are not: S_G(tf, t, Qf) has an escape time t_e ∈ [t0, tf]. For t_s > t_e choose w°(t) = W(t)Γ^T(t)P^{-1}(t0, t, P0)e(t), v°(t) = −H(t)e(t) for t0 ≤ t ≤ t_s, and w°(t) = θ^{-1}W(t)Γ^T(t)S_G(tf, t, Qf)x(t), v°(t) = 0 for t_s ≤ t ≤ tf, with x(t0) ≠ 0 chosen so that e(t_s) = x(t_s) ≠ 0 coincides with the eigenvector associated with the largest eigenvalue of S_G(tf, t_s, Qf), so that x^T(t_s)S_G(tf, t_s, Qf)x(t_s) > 0. Then, as t_s → t_e, S_G(tf, t_s, Qf) goes to positive infinity and the third term of

    J̃(u°(·), w̃(·), x(t_s), t0, tf) = ½[I(Z_{t_s}) − θe^T(t_s)P^{-1}(t0, t_s, P0)e(t_s) + x^T(t_s)S_G(tf, t_s, Qf)x(t_s)]    (6.65)

dominates, producing a contradiction to the optimality of u°(·).

Suppose 3 is violated at t = t_s, where t_s ∈ [t0, tf], but 1 and 2 are not. Choose the strategies of w(t) as above and x(t0) ≠ 0 so that e(t_s) = x(t_s) ≠ 0 is an eigenvector of a negative eigenvalue of P^{-1}(t0, t_s, P0) − θ^{-1}S_G(tf, t_s, Qf), so that e^T(t_s)(S_G(tf, t_s, Qf) − θP^{-1}(t0, t_s, P0))e(t_s) > 0. Then the cost at t_s is positive, which is again a contradiction to the optimality of u°(·). Note that the above is true for any finite control process u(·) over [t0, t_s] as long as x(t_s) ≠ 0. ∎

6.3.4  Time-Invariant Disturbance Attenuation Estimator Transformed into the H∞ Estimator

For convenience, we consider the infinite-time, time-invariant problem. We assume that P(t0, t, P0) has converged to a steady-state value denoted P and that S_G(tf, t, Qf) has converged to a steady-state value denoted S. We first make a transformation of our estimate x̂(t) to a new estimate x°(t), which is essentially the worst-case state estimate:

    x°(t) = (I − θ^{-1}PS)^{-1}x̂(t) = L^{-1}x̂(t).    (6.66)
The estimator propagation is written as a standard differential equation as

    x̂̇(t) = (A + θ^{-1}PQ)x̂(t) + Bu + PH^TV^{-1}(z(t) − Hx̂(t)),    (6.67)

where we are assuming that all the coefficients are time invariant, and the matrices P and S are determined from the algebraic Riccati equations (ARE)

    0 = AP + PA^T − P(H^TV^{-1}H − θ^{-1}Q)P + ΓWΓ^T,    (6.68)
    0 = A^TS + SA + Q − S(BR^{-1}B^T − θ^{-1}ΓWΓ^T)S.    (6.69)

Note that the AREs (6.68) and (6.69) can have more than one positive definite solution. However, the minimal positive definite solution captures the stable eigenvalues; minimality of a solution S_min means that for any other solution S, S − S_min ≥ 0.

Remark 6.3.4  The convergence of the Riccati differential equation to an ARE is shown in [40] and follows notions similar to those given in Theorem 5.7.1 of Section 5.7 for the convergence to the ARE for the linear-quadratic problem.

The elements of the transformation L^{-1} can be manipulated into the following forms, which are useful for deriving the dynamic equation for x°(t):

    E ≝ S(I − θ^{-1}PS)^{-1} = (I − θ^{-1}SP)^{-1}S = (S^{-1} − θ^{-1}P)^{-1},    (6.70)
    L^{-1} = (I − θ^{-1}PS)^{-1} = I + θ^{-1}PE = (P^{-1} − θ^{-1}S)^{-1}P^{-1},    (6.71)-(6.72)

and consequently

    M ≝ L^{-1}P = (P^{-1} − θ^{-1}S)^{-1}.    (6.73)

Substitution of the transformation (6.66) into the estimator (6.67) gives

    L^{-1}x̂̇(t) = ẋ°(t) = L^{-1}(A + θ^{-1}PQ)Lx°(t) + L^{-1}Bu + L^{-1}PH^TV^{-1}(z(t) − HLx°(t)).    (6.74)

To continue to reduce this equation, substitute the optimal controller u° = −R^{-1}B^TSx°(t) into (6.74). Repeated use of the identities (6.70)-(6.73) together with the AREs (6.68) and (6.69) then collects all terms into the closed-loop form of the next section: the terms involving θ^{-1}PQ and θ^{-1}PA^TS combine with the ARE (6.69) to produce the term θ^{-1}ΓWΓ^TSx°(t), and the measurement-update gain becomes L^{-1}PH^TV^{-1} = MH^TV^{-1}.    (6.75)
Carrying out the substitution, the estimator in terms of x°(t) becomes

    ẋ°(t) = Ax°(t) − BR^{-1}B^TSx°(t) + θ^{-1}ΓWΓ^TSx°(t) + MH^TV^{-1}(z(t) − Hx°(t)).    (6.76)

The appearance of the term w° = θ^{-1}ΓWΓ^TSx°(t), the optimal strategy of the process noise, is explicit in the estimator. This estimator equation has the same form if the system matrices are time varying, and it is equivalent to that given in [18] for the time-invariant problem. Relating this back to the disturbance attenuation controller (6.47) of the previous section, the H∞ form of the disturbance attenuation controller is

    u°(t) = −R^{-1}B^TSx°(t),    (6.79)

where x°(t) is given by (6.76) and the controller and filter gains require the smallest positive definite (see [40]) solutions P and S to the AREs (6.68) and (6.69).

The dynamic equation for the matrix M(t) in the filter gain can be obtained by differentiating M(t) = L^{-1}(t)P(t0, t, P0) = (P^{-1}(t0, t, P0) − θ^{-1}S_G(tf, t, Qf))^{-1}. Substituting (6.44) and (6.5) produces the Riccati equation

    Ṁ(t) = (A + θ^{-1}ΓWΓ^TS_G(tf, t, Qf))M(t) + M(t)(A + θ^{-1}ΓWΓ^TS_G(tf, t, Qf))^T
        − M(t)(H^TV^{-1}H − θ^{-1}S_G(tf, t, Qf)BR^{-1}B^TS_G(tf, t, Qf))M(t) + ΓWΓ^T,    (6.77)
    M(t0) = (I − θ^{-1}P0 S_G(tf, t0, Qf))^{-1}P0.    (6.78)

For the infinite-time, time-invariant system, Ṁ = 0 and (6.77) becomes an ARE.

Remark 6.3.5  Note that when θ^{-1} → 0 (i.e., θ → ∞), the H∞ controller given by (6.47) or (6.79) reduces to the H2 controller (in the stochastic setting it is known as the linear-quadratic-Gaussian controller)

    u(t) = −R^{-1}B^TS̄x̂(t),    (6.80)

where the controller gains are obtained from the ARE (6.69) with θ^{-1} = 0, which is identical to the ARE in Theorem 5.7.1; the state is reconstructed from the filter with x̂(t) = x°(t):

    x̂̇(t) = Ax̂(t) + Bu(t) + P̄H^TV^{-1}(z(t) − Hx̂(t)),    (6.81)

where the filter gains are determined from (6.68) with θ^{-1} = 0. The properties of this new ARE are similar to those given in Theorem 5.7.1.

6.3.5  H∞ Measure and H∞ Robustness Bound

First, we show that the L2 norms on the input and output of a system induce the H∞ norm on the resulting transfer matrix. (The "L" in L2, by the way, stands for "Lebesgue.") Consider Figure 6.2, where the disturbance input d is a square integrable (L2) function.

[Figure 6.2: Transfer function G(s) mapping a square integrable input d to an output y.]

We are interested in the conditions on G that will make the output performance measure y square integrable as well. Because of Parseval's Theorem, a square integrable y is isomorphic (i.e., equivalent) to a square integrable transfer function Y(s):

    ‖y‖₂² = ∫_{−∞}^{∞} ‖y(τ)‖² dτ = sup_{α>0} (1/2π)∫_{−∞}^{∞} ‖Y(α + jω)‖² dω.    (6.82)

We can use the properties of norms and vector spaces to derive our condition on G:

    ‖y‖₂² = sup_{α>0} (1/2π)∫_{−∞}^{∞} ‖G(α + jω)d(α + jω)‖² dω    (6.83)
          ≤ sup_{α>0} (1/2π)∫_{−∞}^{∞} ‖G(α + jω)‖² ‖d(α + jω)‖² dω    (6.84)
          = sup_{α>0} (1/2π)∫_{−∞}^{∞} σ̄(G(α + jω))² ‖d(α + jω)‖² dω    (6.85)
          ≤ sup_{α>0} sup_ω σ̄(G(α + jω))² · (1/2π)∫_{−∞}^{∞} ‖d(α + jω)‖² dω    (6.86)
          = [sup_{α>0} sup_ω σ̄(G(α + jω))²] ‖d‖₂².    (6.87)

We use Schwartz's Inequality to get from the first line to the second. The symbol σ̄ denotes the largest singular value of the matrix transfer function; since G is a function of the complex number s, so is σ̄. Thus ‖y‖₂² < ∞ if and only if sup_{α>0} sup_ω σ̄(G(α + jω)) < ∞, since ‖d‖₂² < ∞ by definition. Now we define the infinity norm of G to be

    ‖G‖_∞ := sup_{α>0} sup_ω σ̄(G(α + jω)),    (6.88)

which describes the largest possible gain that G(s) can apply to any possible input. We should note that from our development it is clear that ‖G‖_∞ describes the ratio of the two norms of d and y:

    ‖G‖_∞ = sup ‖y‖₂ / ‖d‖₂.    (6.89)

The body of theory that comprises H∞ describes the application of the ∞-norm to control problems. Examples of these are the model matching problem and the robust stability and performance problems.

6.3.6  The H∞ Transfer-Matrix Bound

In this section, the H∞ norm of the transfer matrix from the disturbance attenuation problem is computed, where it is assumed now that the disturbance inputs of measurement and process noise are L2 functions. To construct the closed-loop transfer matrix between the disturbance and performance output, the dynamic system coupled to the optimal H∞ compensator is written together as

    ẋ(t) = Ax(t) + Bu°(t) + Γw(t) = Ax(t) − BR^{-1}B^TSx°(t) + Γw(t),
    ẋ°(t) = F_c x°(t) + G_c z(t) = F_c x°(t) + G_c Hx(t) + G_c v(t),    (6.90)

where F_c = A − BR^{-1}B^TS + θ^{-1}ΓWΓ^TS − MH^TV^{-1}H and G_c = MH^TV^{-1}. Define a new state vector which combines x(t) and x°(t) as ρ(t) = [x(t); x°(t)], with dynamics

    ρ̇(t) = F_CL ρ(t) + Γ_CL d(t),    y(t) = C_CL ρ(t),    d(t) = [w(t); v(t)],    (6.91)-(6.93)

where, with Λ = R^{-1}B^TS,

    F_CL = [ A      −BΛ ]       Γ_CL = [ Γ    0   ]       C_CL = [C,  −DR^{-1}B^TS].
           [ G_cH    F_c ],             [ 0    G_c ],

The transfer matrix of the closed-loop system from the disturbances d to the output y is depicted in Figure 6.3.

[Figure 6.3: Transfer matrix from the disturbance inputs to output performance.]

The transfer matrix T_yd is

    T_yd(s) = C_CL [sI − F_CL]^{-1} Γ_CL.    (6.94)

The following result, proved in [40], shows how the closed-loop transfer matrix is bounded.

Theorem 6.3.2  The closed-loop system is stable and ‖T_yd(s)‖_∞ ≤ θ^{1/2}.    (6.95)
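The H∞ norm in (6.88) can be estimated by a frequency sweep. A sketch using the scalar data of Example 6.3.1 below (θ = 1, S = 2, P = 0.2): the disturbance weightings W and V are absorbed into Γ_CL so that the computed peak gain is directly comparable with the bound of Theorem 6.3.2; this scaling convention is an assumption of the sketch, not a proof.

```python
# Frequency-sweep estimate of ||T_yd||_inf for the closed loop
# (6.90)-(6.94), scalar data of Example 6.3.1 (assumed theta = 1).
import numpy as np

A, B, Gam, H = -1.5, 1.0, 1.0, 1.0
Q, R, W, V, theta = 4.0, 2.0, 1.0, 1.0 / 14.0, 1.0
S, P = 2.0, 0.2                       # smallest positive ARE solutions
M = 1.0 / (1.0 / P - S / theta)       # = 1/3
Gc = M * H / V                        # = 14/3
Fc = A - B**2 / R * S + Gam**2 * W * S / theta - Gc * H

F_CL = np.array([[A, -B * (B / R) * S], [Gc * H, Fc]])
C_CL = np.array([[np.sqrt(Q), -np.sqrt(R) * (B / R) * S]])   # C = 2, D = sqrt(2)
G_CL = np.array([[Gam * np.sqrt(W), 0.0], [0.0, Gc * np.sqrt(V)]])

peak = 0.0
for w in np.logspace(-3, 3, 4000):
    T = C_CL @ np.linalg.inv(1j * w * np.eye(2) - F_CL) @ G_CL
    peak = max(peak, np.linalg.svd(T, compute_uv=False)[0])
print(peak, np.sqrt(theta))           # peak gain vs. the theorem's bound
```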
Example 6.3.1 (Scalar dynamic system) The characteristics of the ARE and the closed-loop system dynamics are illustrated. Consider the scalar dynamic system

    ẋ(t) = −1.5x(t) + u(t) + w(t),
    z(t) = x(t) + v(t),

where Q = 4, R = 2, W = 1, V = 1/14, B = 1, Γ = 1, H = 1, and θ⁻¹ = 1.2. The corresponding AREs are

    −3S + 0.5S² + 4 = 0 ⇒ S = 2, 4,
    −3P − 10P² + 1 = 0 ⇒ P = 0.2.

Note that there can be two positive solutions. In [40] it is shown that only the smallest positive definite solution to the S and P AREs produces the optimal controller. For the smallest solutions we compute M = 1/3, so that G_c = MHᵀV⁻¹ = 14/3. The closed-loop matrix (6.94) for S = 2 is

    F_CL = [−1.5, −1; 14/3, −4.2] ⇒ λ = −2.8 ± 1.7i,   (6.96)

where λ is an eigenvalue of F_CL, and for S = 4,

    F_CL = [−1.5, −2; 14, −13.5] ⇒ λ ≈ −4.7, −10.3.   (6.97)

Note that the complex eigenvalues in (6.96) induced by this approach could not be generated by LQG design for this scalar problem.

A plot of P as a function of θ⁻¹ is shown in Figure 6.4; it is shown there that the smallest positive solution to the ARE is associated with the root starting at θ⁻¹ = 0, that is, the LQG solution. The ×'s start on the negative reals and continue to decrease as θ decreases; they then go through −∞ to +∞ and continue to decrease until they meet the ◦'s. At that point the eigenvalues of the Hamiltonian associated with the ARE meet and then split along the imaginary axis if θ⁻¹ continues to change; the root breaks onto the imaginary axis and its solution is no longer valid.

[Figure 6.4: Roots of P as a function of θ⁻¹.]

Problems

1. Assume tf → ∞ and all parameters are time invariant. Assume Ṡ → 0 and Ṗ → 0, which means using the AREs. Show the results given in Equation (6.46).

2. Find a differential equation for the propagation of x°(t) in Equation (6.48).

3. Consider the system shown in Figure 6.5.

[Figure 6.5: System description.]

The system equations are

    ẋ(t) = ax(t) + u + w(t),   z(t) = x(t) + v(t),   (6.98)
    y = [1; 0] x(t) + [0; 1] u,   (6.99)

where Q = 1, R = 1, W = 1, V = 1.

(a) Plot S as a function of θ⁻¹ for a = 1 and a = −1.
(b) Plot P as a function of θ⁻¹ for a = 1 and a = −1.
(c) For some choice of θ⁻¹ show that all necessary conditions are satisfied:
    i. S ≥ 0;
    ii. P > 0;
    iii. I − θ⁻¹PS > 0.
(d) Write down the compensator.
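As a quick check on Example 6.3.1 (and a template for Problem 3), here is a small sketch that recovers the roots of the two scalar AREs quoted in the example and the eigenvalues of the two closed-loop matrices; only coefficients quoted above are used.

```python
import numpy as np

# The two quadratics from Example 6.3.1 (scalar AREs reduce to quadratics).
S_roots = np.roots([0.5, -3.0, 4.0])    # 0.5 S^2 - 3 S + 4 = 0  ->  S = 4, 2
P_roots = np.roots([-10.0, -3.0, 1.0])  # -10 P^2 - 3 P + 1 = 0  ->  P = 0.2, -0.5
print("S roots:", S_roots, "  P roots:", P_roots)

# Closed-loop matrices quoted in (6.96) and (6.97).
for S, Fcl in [(2.0, np.array([[-1.5, -1.0], [14/3, -4.2]])),
               (4.0, np.array([[-1.5, -2.0], [14.0, -13.5]]))]:
    print(f"S = {S}: eigenvalues of F_CL =", np.linalg.eigvals(Fcl))
```

The first case returns a complex pair near −2.8 ± 1.7i and the second two real eigenvalues near −4.7 and −10.3, consistent with the discussion above.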
yx) F <0 (x. y0 + d) > 0 (A. F (x. See Figure A. |x − xo | ≤ d2 . y0 + d). y0 − d) = φ(y0 − d) < 0 < φ(y0 + d) = F (x0 . It is strictly monotone increasing since ψy = Fy (x. y) (x. y0 − d) and of F (x. y) > 0. Since ψ is strictly monotone.1. where d is a small positive number. y) > 0 if |x − x0 | ≤ d. in the interval |y − y0 | ≤ d. ψ(y0 + d) > 0.A.7) if |x − x0 | is sufficiently small. d1 ). y+d) F >0 Figure A. Topics from Calculus 263 Proof: We may suppose that Fy (x0 . say. Consider. Consider the function φ(y) = F (x0 . By continuity. Using the continuity of F (x. y-d) F <0 (x+d. Since φ(y0 ) = F (x0 . y+d) (x. we have proved that the solutions of Equation (A. y) > 0.8) (A. y+d) (x. g(x)). it follows that F (x0 . y-d) (x-d.1. y0 − d) < 0. by Equation (A.1: Definition of Fy (x0 . |y − y0 | ≤ d. Also. |y − y0 | < d have the form (x. if |x − x0 | ≤ d1 . y0 ) > 0. y) for x fixed.8). y0 ) = 0. From the continuity of ψ(·). y). ψ(y0 − d) < 0. the continuous function ψ(y) = F (x. y-d) (x+d. Writing yx = g(x). We therefore can apply the proof of the continuity of g(x) at x0 and deduce the continuity of g(x) at x1 . We begin with the relation F (x0 + h. Writing g(x0 + h) = g(x0 ) + ∆g and using the differentiability of F . Fy (x0 . F (x0 . y0 + ) satisfying F (x. yx ) = 0.264 Appendix A: Background We shall next prove that g(x) is continuous. where y1 = g(x1 ). y1) > 0. we find that ∆g Fx η =− + h Fy hFy h2 + (∆g)2 . y0 ) = Fy . y0 ) + η h2 + (∆g)2 = 0. ∆g) → 0 if (h. y0 + d).11) by hFy . it follows that yx = g(x). y0 − ). (A.12) . y0 ) = Fx . g(x0 )) = 0.11) where η = η(h. ∆g → 0 if h → 0. Since g(x) is continuous. (A. we get hFx (x0 . and dividing both sides of Equation (A. Now let x1 be any point in |x − x0 | < d2 . g(x0 + h)) − F (x0 . ∆g) → 0. We proceed to prove that g(x) is differentiable. (A. Repeating the argument given above with y0 ± d replaced by y0 ± . Hence η → 0 if h → 0. For any > 0. By uniqueness ¯ ¯ of the solution y of F (x. then there is a unique y = yx in (y0 − . (A. y0 + ) > 0 > F (x0 . and Fy (x1. y) = 0 in (y0 − d. Writing Fx (x0 . ≤ d.10) where |h| < d2 . we see that there exists a number d2 ( ) such that if |x − x0 | < d2 ( ). y0 ) + ∆g · Fy (x0 . y1 ) = 0.9) This proves the continuity of g(x) at x0 . Then F (x1 . ¯ Hence |g(x) − y0 | < if |x − x0 | ≤ d2 ( ). y) be a function defined in an open set of G of Rn+1 and let (x0 . We have thus proved that g (x) exists at x0 and that Equation (A. .6) is continuous. x0 . Fy (x0 .6) holds at x0 . . 265 Using this in Equation (A.2 Let F (x. i |y − y 0 | < β (A. If instead of Equation (A. . . . y) = F (x1 . |η/Fy | < 1/2.A. xn . . the same is true of g (x). The result is the following implicit function theorem for a function of several variables. we conclude that ∆g h→0 h lim exists and is equal to −Fx /Fy . . hence |∆g| |Fx | 1 |∆g| ≤ + .1. This completes the proof of the theorem. Topics from Calculus If |h| is sufficiently small. y 0 ) = 0. |h| |Fy | 2 |h| It follows that |∆g| ≤ C.6). |h| C constant. . y 0 ) = (x0 . The same argument can be applied at any point x in |x − x0 | < d2 . . x2 .1. y0 ) = 0.1. xn ). then we can prove an analogue of Theorem A. y 0 ) be a point of G. Assume that 1 n F (x0 . Since the right-hand side of Equation (A.13) Then there exists an (n + 1)-dimensional rectangle R defined by |xi − x0 | < α(1 ≤ i ≤ n).15) . Thus g (x) exists and it satisfies Equation (A.1 extends to the case where x = (x1 . Theorem A.1 with the roles of x and y interchanged.4) we assume that Fx (x0 . . . (A. y 0 ) = 0. 
If instead of Equation (A.4) we assume that

    Fx(x0, y0) ≠ 0,   (A.13)

then we can prove an analogue of Theorem A.1.1 with the roles of x and y interchanged.

The proof of Theorem A.1.1 extends to the case where x = (x1, x2, ..., xn). The result is the following implicit function theorem for a function of several variables.

Theorem A.1.2 Let F(x, y) be a function defined in an open set G of R^{n+1} having continuous first derivatives, and let (x⁰, y⁰) be a point of G. Assume that

    F(x⁰, y⁰) = 0,   Fy(x⁰, y⁰) ≠ 0.   (A.14)

Then there exists an (n+1)-dimensional rectangle R defined by

    |xi − xi⁰| < α (1 ≤ i ≤ n),   |y − y⁰| < β   (A.15)

such that for any x in Equation (A.15) there exists a unique solution y = yx of F(x, y) = 0 in the interval |y − y⁰| < β. Writing yx = g(x), the function g(x) is continuously differentiable, and

    (∂g/∂xi)(x) = −(∂F/∂xi)(x, g(x)) / (∂F/∂y)(x, g(x))   (1 ≤ i ≤ n).   (A.16)

Consider next a more complicated situation in which we want to solve two equations simultaneously:

    F(x, y, z, u, v) = 0,   (A.17)
    G(x, y, z, u, v) = 0.   (A.18)

We introduce the determinant

    J = ∂(F, G)/∂(u, v) = |Fu, Fv; Gu, Gv|,   (A.19)

called the Jacobian of F, G with respect to u, v.¹³

Theorem A.1.3 Let F and G have continuous first derivatives in an open set D of R⁵ containing a point P0 = (x0, y0, z0, u0, v0). Assume that

    F(x0, y0, z0, u0, v0) = 0,   G(x0, y0, z0, u0, v0) = 0,   (A.20)

and

    ∂(F, G)/∂(u, v) ≠ 0 at P0,   (A.21)

which means the Jacobian is nonzero. Then there exist a cube

    R: |x − x0| < α, |y − y0| < α, |z − z0| < α   (A.22)

and a rectangle

    S: |u − u0| < β1, |v − v0| < β2   (A.23)

such that for any (x, y, z) in R there is a unique pair (u, v) in S for which Equations (A.17) and (A.18) hold. Writing

    u = f(x, y, z),   v = g(x, y, z),   (A.24)

the functions f and g have continuous first derivatives in R, and

    fx = −(1/J) ∂(F, G)/∂(x, v) = −(1/J) |Fx, Fv; Gx, Gv|,   (A.25)
    gx = −(1/J) ∂(F, G)/∂(u, x) = −(1/J) |Fu, Fx; Gu, Gx|.   (A.26)

Similar formulas hold for fy, fz, gy, gz.

¹³ The Jacobian determinant should not be confused with the Jacobian matrix from Definition A.2.4.

Proof: Since J ≠ 0 at P0, either Fv ≠ 0 or Gv ≠ 0 at P0. Suppose Fv ≠ 0 at P0. By Theorem A.1.2, if (x, y, z, u) lies in a small rectangle T with center (x0, y0, z0, u0), then there exists a unique solution v = φ(x, y, z, u) of Equation (A.17) in some small interval

    |v − v0| < β2.   (A.27)

Furthermore, φ has continuous first derivatives and φu = −Fu/Fv. Let

    H(x, y, z, u) = G(x, y, z, u, φ(x, y, z, u)).   (A.28)

Hence

    Hu = Gu + Gv φu = Gu − Gv Fu/Fv = (Gu Fv − Gv Fu)/Fv = −J/Fv ≠ 0   (A.29)

at P0. We therefore can apply Theorem A.1.2 to solve the equation

    H(x, y, z, u) = 0:   (A.30)

for (x, y, z) in a small cube R with center (x0, y0, z0) there is a unique solution u of Equation (A.30) in some interval |u − u0| < β1, and this solution has the form u = f(x, y, z), where f has continuous first derivatives.
It follows that v = φ(x, y, z, f(x, y, z)) ≡ g(x, y, z) also has continuous first derivatives. We conclude that for any (x, y, z) in R there is a unique pair (u, v) in S for which Equations (A.17) and (A.18) hold, namely u = f(x, y, z) and v = g(x, y, z).

It remains to prove Equations (A.25) and (A.26). To do this we differentiate the equations

    F(x, y, z, f(x, y, z), g(x, y, z)) = 0,   G(x, y, z, f(x, y, z), g(x, y, z)) = 0   (A.31)

with respect to x and get

    Fx + Fu fx + Fv gx = 0,   Gx + Gu fx + Gv gx = 0.   (A.32)

Solving for fx and gx, we get Equations (A.25) and (A.26).

We conclude this section with a statement of the most general implicit function theorem for a system of functions. Let

    Fi(x, u) = Fi(x1, ..., xn, u1, ..., ur)   (1 ≤ i ≤ r)   (A.33)

be functions having continuous first derivatives in an open set containing a point (x⁰, u⁰). The matrix

    [∂F1/∂u1, ∂F1/∂u2, ..., ∂F1/∂ur;
     ∂F2/∂u1, ∂F2/∂u2, ..., ∂F2/∂ur;
     ...;
     ∂Fr/∂u1, ∂Fr/∂u2, ..., ∂Fr/∂ur],   (A.34)

or briefly (∂Fi/∂uj), is called the Jacobian matrix of (F1, ..., Fr) with respect to (u1, ..., ur). The determinant of this matrix is called the Jacobian of (F1, ..., Fr) with respect to (u1, ..., ur) and is denoted by

    J = ∂(F1, ..., Fr)/∂(u1, ..., ur).   (A.35)

Theorem A.1.4 Let F1, ..., Fr have continuous first derivatives in a neighborhood of a point (x⁰, u⁰). Assume that Fi(x⁰, u⁰) = 0 (1 ≤ i ≤ r) and

    ∂(F1, ..., Fr)/∂(u1, ..., ur) ≠ 0 at (x⁰, u⁰).   (A.36)

Then there is a δ-neighborhood R of x⁰ and a γ-neighborhood S of u⁰ such that for any x in R there is a unique solution u of

    Fi(x, u) = 0   (1 ≤ i ≤ r)   (A.37)

in S. The vector-valued function u(x) = (u1(x), ..., ur(x)) thus defined has continuous first derivatives in R.

In order to compute ∂ui/∂xj (1 ≤ i ≤ r) for a fixed j, we differentiate the equations

    F1(x, u(x)) = 0, ..., Fr(x, u(x)) = 0   (A.38)

with respect to xj. We obtain the system of linear equations for the ∂ui/∂xj:

    ∂Fk/∂xj + Σᵢ₌₁ʳ (∂Fk/∂ui)(∂ui/∂xj) = 0   (1 ≤ k ≤ r).   (A.39)

The system of linear equations (A.39) in the unknowns ∂ui/∂xj can be uniquely solved, since the determinant of the coefficient matrix, which is precisely the Jacobian ∂(F1, ..., Fr)/∂(u1, ..., ur), is different from 0.

We briefly give the proof of Theorem A.1.4. It is based upon induction on r. Without loss of generality we may assume that

    ∂(F1, ..., Fr−1)/∂(u1, ..., ur−1) ≠ 0.   (A.40)

Therefore, by the inductive assumption, the solution of

    Fi(x, u) = 0   (1 ≤ i ≤ r − 1)   (A.41)

in a neighborhood of (x⁰, u⁰) is given by

    ui = φi(x, ur)   (1 ≤ i ≤ r − 1).   (A.42)

Let G(x, ur) = Fr(x, φ1(x, ur), ..., φr−1(x, ur), ur). If we show that

    ∂G/∂ur ≠ 0 at (x⁰, u⁰),   (A.44)

then we can use Theorem A.1.2 to solve the equation G(x, ur) = 0 for ur, which together with (A.42) completes the induction. In fact, we shall prove that

    ∂G/∂ur = [∂(F1, ..., Fr)/∂(u1, ..., ur)] / [∂(F1, ..., Fr−1)/∂(u1, ..., ur−1)].   (A.43)

To prove Equation (A.43), differentiate the equations Fi(x, φ1(x, ur), ..., φr−1(x, ur), ur) = 0 (1 ≤ i ≤ r − 1) with respect to ur to obtain

    Σⱼ₌₁^{r−1} (∂Fi/∂uj)(∂φj/∂ur) + ∂Fi/∂ur = 0   (1 ≤ i ≤ r − 1).   (A.45)

Differentiate also the definition of G with respect to ur to obtain

    Σⱼ₌₁^{r−1} (∂Fr/∂uj)(∂φj/∂ur) − ∂G/∂ur + ∂Fr/∂ur = 0.   (A.46)

Solving the linear system of Equations (A.45) and (A.46) (in the unknowns ∂φj/∂ur and ∂G/∂ur) for ∂G/∂ur by Cramer's rule, we obtain

    ∂G/∂ur = ∂(F1, ..., Fr)/∂(u1, ..., ur) ÷ ∂(F1, ..., Fr−1)/∂(u1, ..., ur−1).   (A.47)

This gives Equation (A.43).
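The linear system (A.39) is exactly how one differentiates an implicitly defined u(x) in practice: solve (∂F/∂u) du/dx = −∂F/∂x. Below is a minimal sketch under assumed data: a hypothetical two-equation system (r = 2, n = 1) whose exact solution is u1(x) = u2(x) = √x, so the computed derivative can be checked by hand.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical system F(x, u) = 0, chosen only to illustrate the method.
def F(x, u):
    u1, u2 = u
    return np.array([u1**2 + u2**2 - 2.0 * x,   # F1(x, u) = 0
                     u1 - u2])                   # F2(x, u) = 0

x0 = 1.0
u0 = fsolve(lambda u: F(x0, u), [0.8, 1.2])      # root near (1, 1)

# Jacobians of F with respect to u and x at (x0, u0).
Ju = np.array([[2 * u0[0], 2 * u0[1]],
               [1.0,       -1.0     ]])          # dF/du, nonsingular by (A.36)
Jx = np.array([-2.0, 0.0])                       # dF/dx

# Solve the linear system (A.39): dF/dx + (dF/du) du/dx = 0.
du_dx = np.linalg.solve(Ju, -Jx)
print(du_dx)    # ~ [0.5, 0.5], matching d sqrt(x)/dx at x = 1
```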
A.1.2 Taylor Expansions

We first introduce the definition of order.

Definition A.1.1 Consider a function R(x). We say that R(·) is of higher order than xⁿ and write R(x) ∼ O(xⁿ) if

    lim_{x→0} R(x)/xⁿ = lim_{x→0} O(xⁿ)/xⁿ = 0.

Taylor's Expansion for Functions of a Single Variable

We begin by reviewing the definition and the simplest properties of the Taylor expansion for functions of one variable. If f(x) has an Nth derivative at x0, its Taylor expansion of degree N about x0 is the polynomial

    f(x0) + (1/1!) f′(x0)(x − x0) + (1/2!) f″(x0)(x − x0)² + ··· + (1/N!) f^{(N)}(x0)(x − x0)^N.   (A.48)

The relation between f and its Taylor expansion can be expressed conveniently by the following integral remainder formula.

Theorem A.1.5 If f has a continuous Nth derivative in a neighborhood of x0, then in that neighborhood

    f(x) = f(x0) + (1/1!) f′(x0)(x − x0) + ··· + (1/N!) f^{(N)}(x0)(x − x0)^N + R_N,   (A.49)

where

    R_N = (1/(N−1)!) ∫_{x0}^{x} (x − t)^{N−1} [f^{(N)}(t) − f^{(N)}(x0)] dt   (A.50)

and R_N is of order O((x − x0)^N), that is,

    lim_{(x−x0)→0} O((x − x0)^N) / |(x − x0)^N| = 0.   (A.51)

Proof: The remainder can be written as the difference

    R_N = (1/(N−1)!) ∫_{x0}^{x} (x − t)^{N−1} f^{(N)}(t) dt − (f^{(N)}(x0)/(N−1)!) ∫_{x0}^{x} (x − t)^{N−1} dt.   (A.52)

The second of these integrals is directly computed to be

    (f^{(N)}(x0)/(N−1)!) ∫_{x0}^{x} (x − t)^{N−1} dt = (1/N!) f^{(N)}(x0)(x − x0)^N,   (A.53)

which is just the last term of the Taylor expansion. The first integral can be integrated by parts, which together with (A.53) leads to

    (1/(N−2)!) ∫_{x0}^{x} (x − t)^{N−2} [f^{(N−1)}(t) − f^{(N−1)}(x0)] dt = R_{N−1}.   (A.54)

We therefore obtain

    R_N = −(1/N!) f^{(N)}(x0)(x − x0)^N + R_{N−1}.   (A.55)

If we substitute the preceding equation into Equation (A.49), we get Equation (A.49) back again with N replaced by N − 1. The induction is completed by noticing that, for N = 1, Equation (A.49) is just

    f(x) = f(x0) + f′(x0)(x − x0) + ∫_{x0}^{x} [f′(t) − f′(x0)] dt,   (A.56)

and that this is a valid equation.

Finally, the remainder R_N in (A.50) is shown to be O((x − x0)^N). The following inequality can be constructed:

    |R_N| = |(1/(N−1)!) ∫_{x0}^{x} (x − t)^{N−1} [f^{(N)}(t) − f^{(N)}(x0)] dt|
          ≤ max_{x′∈(x0,x)} |f^{(N)}(x′) − f^{(N)}(x0)| · (1/(N−1)!) |∫_{x0}^{x} (x − t)^{N−1} dt|
          = max_{x′∈(x0,x)} |f^{(N)}(x′) − f^{(N)}(x0)| · (1/N!) |x − x0|^N.   (A.57)

Since it is assumed that f^{(N)} is continuous, then using (A.57),

    lim_{(x−x0)→0} |R_N| / |(x − x0)^N| = 0,

which by Definition A.1.1 implies that R_N is O((x − x0)^N).

Remark A.1.1 A vector representation for Taylor's Expansion is given in Section A.2.10, after the linear algebra review of Appendix A.2.
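The order statement (A.51) is easy to observe numerically. A small sketch (ours, with f = exp and x0 = 0 as assumed test data):

```python
import math
import numpy as np

# Check that the degree-N Taylor remainder of f(x) = exp(x) about x0 = 0
# satisfies R_N / (x - x0)^N -> 0 as x -> x0, per (A.51).
N = 3
taylor = lambda x: sum(x**k / math.factorial(k) for k in range(N + 1))

for x in (1e-1, 1e-2, 1e-3):
    RN = np.exp(x) - taylor(x)
    print(x, RN / x**N)   # ratio shrinks roughly like x/(N+1)!, hence -> 0
```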
A.2 Linear Algebra Review

It is assumed that the reader is familiar with the concepts of vectors, matrices, and inner products. We offer here a brief review of some of the definitions and concepts that are of particular interest to the material in the main text.

A.2.1 Subspaces and Dimension

A subspace S of Rⁿ is a set of vectors in Rⁿ such that for any s1, s2 ∈ S and any two scalars α, β ∈ R, αs1 + βs2 ∈ S. The span of a set of vectors in Rⁿ is defined as the collection of all finite linear combinations of its elements. That is, given vectors {e1, e2, ..., em} in Rⁿ, the set

    S = {s | s = α1e1 + α2e2 + ··· + αmem, αi ∈ R}   (A.58)

(read, "the set of all s such that s = α1e1 + ···") is a subspace and is spanned by {e1, ..., em}.

A set of vectors {e1, e2, ..., em} in Rⁿ is said to be linearly independent if

    α1e1 + α2e2 + ··· + αmem = 0 ⇔ α1 = α2 = ··· = αm = 0.   (A.59)

The set of vectors {e1, e2, ..., em} is a basis for the subspace if they span S and are independent. The dimension of a subspace is the smallest number of vectors required to form a basis. That is, if {s1, s2, ..., sm} are linearly independent vectors in the m-dimensional subspace S, then any s ∈ S can be written as

    s = α1s1 + α2s2 + ··· + αmsm,

where the αi, i = 1, 2, ..., m, are real scalars. It can be shown that any set of m linearly independent vectors in the subspace is a basis for the space.

Example A.2.1 Consider the vectors

    e1 = [1; 1; 2],   e2 = [1; 0; 1],   e3 = [0; 1; 1].

These span a subspace S ⊂ R³. However, the vectors are not linearly independent, as e1 − e2 − e3 = 0, so that the requirement of (A.59) above is not satisfied. It can be seen that at most two of the three vectors are linearly independent, so that S has dimension 2. Any two linearly independent vectors in S can serve as a basis; in particular, both {e1, e2} and {e2, e3} are bases for the space.

A.2.2 Matrices and Rank

Given a real matrix A ∈ R^{m×n}, the range space of the matrix is the set of all vectors y ∈ Rᵐ that can be written as Ax, where x ∈ Rⁿ. That is,

    Range(A) = {y | y = Ax for some x ∈ Rⁿ}.

The null space of A is the set of vectors that when multiplied by A produce zero:

    Null(A) = {w | 0 = Aw}.

The rank of the matrix A is the dimension of the range space. It is obvious from the definition that the largest possible rank of an m by n matrix is the smaller of m and n. For a square matrix A ∈ R^{n×n}, the dimensions of the range and null spaces sum to n. If the rank of the n × n matrix is n, the matrix is said to be of full rank. The inverse of the square matrix A is the matrix A⁻¹ such that AA⁻¹ = I. The inverse exists if and only if the matrix is of full rank. A matrix for which the inverse does not exist is called singular.

A.2.3 Minors and Determinants

Here we make some statements about the determinants of square matrices. For a more complete treatment, see, for instance, Cullen [16]. For a matrix with a single element, the determinant is declared to be the value of the element; that is, if a11 is the only element of the 1 × 1 matrix A, then the determinant |A| of A is |A| = a11.

Laplace Expansion. Let Mij be the matrix created by deleting row i and column j from A. The determinant |Mij| is known as the minor of element aij.¹⁴ The value Cij = (−1)^{i+j}|Mij| is generally called the cofactor of the element aij. Then the determinant can be computed from the Laplace expansion as

    |A| = Σⱼ₌₁ⁿ aij (−1)^{i+j} |Mij|   (A.60)

for any row i. The expansion also holds over columns, so that the order of the subscripts in the expansion can be reversed and the summation taken over any column in the matrix. Trivially, the Laplace expansion for a matrix of dimension 2 is

    |a11, a12; a21, a22| = a11 a22 − a12 a21,

where we have used the definition of the determinant of a 1 × 1 matrix to evaluate the minors. The cofactors for a matrix of dimension 3 involve matrices of dimension 2, so this result can be used, along with the Laplace expansion, to compute the determinant for a 3 × 3 matrix, etc.

¹⁴ Some authors refer to the matrix Mij itself as the minor, rather than its determinant.

The determinant has several useful properties. Among these are the following:

1. |AB| = |A||B|.
2. |A⁻¹| = 1/|A|.
3. If A is n × n, then |A| = 0 if and only if the rank of A is less than n. That is, |A| = 0 is identical to saying that A is singular.
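These notions are all one-liners numerically. The sketch below (illustration only) stacks the vectors of Example A.2.1 as the columns of A and confirms the rank, null space, determinant, and the rank-nullity statement above.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 1.0]])    # columns e1, e2, e3 with e1 - e2 - e3 = 0

print(np.linalg.matrix_rank(A))    # 2: only two columns are independent
print(null_space(A).ravel())       # spans multiples of (1, -1, -1)
print(np.linalg.det(A))            # ~0, so A is singular (rank < n)
# rank + dim Null(A) = 2 + 1 = 3 = n, as stated above.
```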
A.2.4 Eigenvalues and Eigenvectors

An eigenvalue of a square matrix A is a scalar λ (in general complex) such that the determinant |A − λI| = 0, where I is the appropriately dimensioned identity matrix. This relation is called the characteristic equation of the matrix, and the values of λ for which it is satisfied are sometimes known as the characteristic roots or characteristic values of A. An eigenvector is a vector v ≠ 0 such that Av = λv or, equivalently, [A − λI]v = 0. Clearly, v can exist only if |A − λI| = 0, so that λ is an eigenvalue of the matrix. If v is an eigenvector of A, then αv is also an eigenvector (corresponding to the same eigenvalue) for all α ∈ C. Note that even for real A, the eigenvalues, as well as the eigenvectors, are in general complex.

A.2.5 Quadratic Forms and Definite Matrices

This section introduces certain definitions and concepts which are of fundamental importance in the study of networks and systems. Some of what follows is adapted from the book Systems, Networks, and Computation by Athans et al. [3].

Suppose that x is a column n-vector with components xi and that A is a real n × n symmetric matrix, that is, A = Aᵀ, with elements aij. Let us consider the scalar-valued function f(x) defined by the scalar product

    f(x) = ⟨x, Ax⟩ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aij xi xj.   (A.61)

This is called a quadratic form because it involves multiplication by pairs of the elements xi of x, so it contains terms in the squares of the components of x and their cross products. For example, if

    x = [x1; x2],   A = [1, 2; 2, 3],   (A.62)

then

    f(x) = ⟨x, Ax⟩ = x1² + 4x1x2 + 3x2².   (A.63)

We now offer certain definitions.

Definition A.2.1
(a) If f(x) = ⟨x, Ax⟩ > 0 for all x ≠ 0, f(x) is called a positive definite form and A is called a positive definite matrix.
(b) If f(x) = ⟨x, Ax⟩ ≥ 0 for all x, f(x) is called a nonnegative definite form and A is called a nonnegative definite matrix.
(c) If f(x) = ⟨x, Ax⟩ < 0 for all x ≠ 0, f(x) is called a negative definite form and A is called a negative definite matrix.
(d) If f(x) = ⟨x, Ax⟩ ≤ 0 for all x, f(x) is called a nonpositive definite form and A is called a nonpositive definite matrix.

We now give a procedure for testing whether a given matrix is positive definite. The basic technique is summarized in the following theorem.

Theorem A.2.1 Suppose that A is the real symmetric n × n matrix

    A = [a11, a12, ..., a1n; a12, a22, ..., a2n; ...; a1n, a2n, ..., ann].   (A.64)

Let Ak be the k × k matrix, defined in terms of A, by

    Ak = [a11, a12, ..., a1k; a12, a22, ..., a2k; ...; a1k, a2k, ..., akk]   (A.65)

for k = 1, 2, ..., n. Then A is positive definite if and only if det Ak > 0 for each k = 1, 2, ..., n.

There is a host of additional properties of definite and semidefinite symmetric matrices, which we give below as theorems. Some of the proofs are easy, but others are very difficult.

The characteristic value problem is to determine the scalar λ and the nonzero vectors v ∈ Rⁿ which simultaneously satisfy the equation

    Av = λv or (A − λI)v = 0.   (A.66)

This system of n linear equations in the unknown vector v has a nontrivial solution if and only if

    det(A − λI) = 0,   (A.67)

the characteristic equation.

Theorem A.2.2 The characteristic roots or eigenvalues of a symmetric matrix are all real.

Proof: Suppose that A is a real symmetric matrix of dimension n, and let λ be any root of the characteristic equation. The eigenvector v, which may be a complex vector, satisfies

    Av = λv.   (A.68)

The conjugate transpose of v is denoted v*. Then

    v*Av = λ v*v.   (A.69)

Since v*Av is a scalar, and A is real and symmetric, i.e., A* = A,

    (v*Av)* = v*Av;   (A.70)

that is, v*Av satisfies its own conjugate and hence must be real. Since v*Av and v*v are real, λ must be real.

Theorem A.2.3 For a symmetric matrix, all the n eigenvectors v associated with the n eigenvalues λ are real.

Proof: Since (A − λI) is real, the solution v of (A − λI)v = 0 must be real.

Theorem A.2.4 If v1 and v2 are eigenvectors associated with the distinct eigenvalues λ1 and λ2 of a symmetric matrix A, then v1 and v2 are orthogonal.

Proof: We know that Av1 = λ1v1 and Av2 = λ2v2. This implies that v2ᵀAv1 = λ1 v2ᵀv1 and v1ᵀAv2 = λ2 v1ᵀv2. Taking the transpose of the first equation gives

    v1ᵀAv2 = λ1 v1ᵀv2.   (A.71)

Subtract the second equation to obtain

    (λ1 − λ2)(v1ᵀv2) = 0.   (A.72)

Since λ1 − λ2 ≠ 0, then v1ᵀv2 = 0.
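Theorems A.2.1 and A.2.2 can be checked on the matrix of (A.62). A minimal sketch (ours):

```python
import numpy as np

def leading_minors_positive(A):
    """Theorem A.2.1 test: det(A_k) > 0 for every leading principal minor."""
    return all(np.linalg.det(A[:k + 1, :k + 1]) > 0 for k in range(A.shape[0]))

A = np.array([[1.0, 2.0], [2.0, 3.0]])   # the matrix from (A.62)
print(leading_minors_positive(A))        # False: det A_2 = 3 - 4 = -1 < 0
print(np.linalg.eigvalsh(A))             # ~[-0.24, 4.24]: real (Theorem A.2.2),
                                         # one negative, so (A.63) is indefinite
```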
Suppose all the eigenvalues are distinct. Then

    V = [v1, ..., vn],   (A.73)

the matrix whose columns are the eigenvectors (normalized to unit length), is an orthogonal matrix. This means that since

    VᵀV = I,   (A.74)

then

    Vᵀ = V⁻¹.   (A.75)

Even if the eigenvalues are repeated, the eigenmatrix V is still orthogonal [27]. Therefore,

    AV = VD,   (A.76)

where D is a diagonal matrix of the eigenvalues λ1, ..., λn. Therefore,

    D = VᵀAV,   (A.77)

where V is the orthogonal matrix forming the similarity transformation.

Theorem A.2.5 A is a positive definite matrix if and only if all its eigenvalues are positive. A is a negative definite matrix if and only if all its eigenvalues are negative. In either case, the eigenvectors of A are real and mutually orthogonal.

Theorem A.2.6 If A is a positive semidefinite or negative semidefinite matrix, then at least one of its eigenvalues must be zero. If A is positive (negative) definite, then A⁻¹ is positive (negative) definite.

Theorem A.2.7 If both A and B are positive (negative) definite, and if A − B is also positive (negative) definite, then B⁻¹ − A⁻¹ is positive (negative) definite.

Quadratic Forms with Nonsymmetric Matrices

Quadratic forms generally involve symmetric matrices. However, it is clear that Equation (A.61) is well defined even when A is not symmetric. The form xᵀAx = 0 ∀ x ∈ Rⁿ for some A ∈ R^{n×n} occasionally occurs in derivations and deserves some attention. Before continuing, we note the following.

Theorem A.2.8 If A ∈ R^{n×n} is symmetric and xᵀAx = 0 ∀ x ∈ Rⁿ, then A = 0.

Proof:

    xᵀAx = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aij xi xj = Σᵢ₌₁ⁿ aii xi² + 2 Σᵢ₌₂ⁿ Σⱼ₌₁^{i−1} aij xi xj = 0.

For this to be true for arbitrary x, all coefficients aij must be zero.

Definition A.2.2 The real matrix A is skew-symmetric if A = −Aᵀ.

For any skew-symmetric A, the diagonal elements are zero, since aii = −aii only for aii = 0.

Theorem A.2.9 For A skew-symmetric and any vector x of appropriate dimension, xᵀAx = 0.

Proof: xᵀAx = xᵀAᵀx, since a scalar equals its own transpose; but by definition Aᵀ = −A, so xᵀAx = −xᵀAx, and this can be true only if xᵀAx = 0.

Theorem A.2.10 Any square matrix A can be written uniquely as the sum of a symmetric part As and a skew-symmetric part Aw.

The proof of this is trivial. Given the above, we have

    xᵀAx = xᵀ(As + Aw)x = xᵀAs x   (A.78)

and

    A + Aᵀ = As + Aw + Asᵀ + Awᵀ = 2As.

As a result of these statements and Theorem A.2.8, we note the following.

Theorem A.2.11

    xᵀAx = 0 ∀ x ∈ Rⁿ ⟹ A + Aᵀ = 0.   (A.79)

It is not true, however, that xᵀAx = 0 ∀ x ∈ Rⁿ ⟹ A = 0, as the matrix may have a nonzero skew-symmetric part.
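The decomposition of Theorem A.2.10 and the identities around (A.78)-(A.79) are easy to verify numerically; the random test matrix below is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))           # arbitrary (nonsymmetric) matrix
As, Aw = (A + A.T) / 2, (A - A.T) / 2     # unique split of Theorem A.2.10

x = rng.standard_normal(4)
print(np.allclose(x @ A @ x, x @ As @ x)) # True: (A.78)
print(np.isclose(x @ Aw @ x, 0.0))        # True: Theorem A.2.9
```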
A.2.6 Time-Varying Vectors and Matrices

A time-varying column vector x(t) is defined as a column vector whose components are themselves functions of time, i.e.,

    x(t) = [x1(t); x2(t); ...; xn(t)],   (A.80)

while a time-varying matrix A(t) is defined as a matrix whose elements are time functions, i.e.,

    A(t) = [a11(t), a12(t), ..., a1m(t); a21(t), a22(t), ..., a2m(t); ...; an1(t), an2(t), ..., anm(t)].   (A.81)

The addition of time-varying vectors and matrices, their multiplication, and the scalar-product operations are defined as before.

Time Derivatives. The time derivative of the vector x(t) is denoted by (d/dt)x(t) or ẋ(t) and is defined by

    ẋ(t) = [ẋ1(t); ẋ2(t); ...; ẋn(t)].   (A.82)

The time derivative of the matrix A(t) is denoted by (d/dt)A(t) or Ȧ(t) and is defined by

    Ȧ(t) = [ȧ11(t), ȧ12(t), ..., ȧ1m(t); ȧ21(t), ȧ22(t), ..., ȧ2m(t); ...; ȧn1(t), ȧn2(t), ..., ȧnm(t)].   (A.83)

Of course, in order for ẋ(t) or Ȧ(t) to make sense, the derivatives ẋi(t) and ȧij(t) must exist.

Integration. We can define the integrals of vectors and matrices in a similar manner. Thus,

    ∫_{t0}^{tf} x(t) dt = [∫_{t0}^{tf} x1(t) dt; ...; ∫_{t0}^{tf} xn(t) dt],   (A.84)

    ∫_{t0}^{tf} A(t) dt = [∫_{t0}^{tf} a11(t) dt, ..., ∫_{t0}^{tf} a1m(t) dt; ...; ∫_{t0}^{tf} an1(t) dt, ..., ∫_{t0}^{tf} anm(t) dt].   (A.85)

A.2.7 Gradient Vectors and Jacobian Matrices

Let us suppose that x1, x2, ..., xn are real scalars which are the components of the column n-vector

    x = [x1; x2; ...; xn].   (A.86)

Now consider a scalar-valued function of the xi,

    f(x1, x2, ..., xn) = f(x).   (A.87)

Clearly, f is a function mapping n-dimensional vectors to scalars:

    f: Rⁿ → R.   (A.88)

Definition A.2.3 The gradient of f with respect to the column n-vector x is denoted ∂f(x)/∂x and is defined by

    ∂f/∂x = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn],   (A.89)

so that the gradient is a row n-vector.

Note A.2.1 We will also use the notation

    fx = ∂f/∂x   (A.90)

to denote the partial derivative.

Example A.2.2 Suppose f: R³ → R and is defined by f(x) = f(x1, x2, x3) = x1² x2 e^{−x3}; then

    ∂f/∂x = [2x1 x2 e^{−x3},  x1² e^{−x3},  −x1² x2 e^{−x3}].   (A.91)

Again let us suppose that x ∈ Rⁿ. Let us consider a function g, g: Rⁿ → Rᵐ, such that

    y = g(x),   x ∈ Rⁿ, y ∈ Rᵐ.   (A.92)

By this we mean

    y1 = g1(x1, x2, ..., xn) = g1(x),
    y2 = g2(x1, x2, ..., xn) = g2(x),
    ...
    ym = gm(x1, x2, ..., xn) = gm(x).   (A.93)

Definition A.2.4 The Jacobian matrix of g with respect to x is denoted by ∂g(x)/∂x and is defined as

    ∂g(x)/∂x = [∂g1/∂x1, ∂g1/∂x2, ..., ∂g1/∂xn; ∂g2/∂x1, ∂g2/∂x2, ..., ∂g2/∂xn; ...; ∂gm/∂x1, ∂gm/∂x2, ..., ∂gm/∂xn].   (A.98)

Thus, if g: Rⁿ → Rᵐ, its Jacobian matrix is an m × n matrix.

As an immediate consequence of the definition of a gradient vector, we have

    ∂⟨x, y⟩/∂x = ∂(yᵀx)/∂x = yᵀ,   (A.99)
    ∂⟨x, Ay⟩/∂x = (Ay)ᵀ = yᵀAᵀ,   (A.100)
    ∂⟨Ax, y⟩/∂x = ∂⟨x, Aᵀy⟩/∂x = yᵀA.   (A.101)

The definition of a Jacobian matrix yields the relation

    ∂(Ax)/∂x = A.   (A.102)

Now suppose that x(t) is a time-varying vector and that f(x) is a scalar-valued function of x. Then by the chain rule

    (d/dt) f(x) = (∂f/∂x1) ẋ1 + (∂f/∂x2) ẋ2 + ··· + (∂f/∂xn) ẋn = Σᵢ₌₁ⁿ (∂f/∂xi) ẋi,   (A.104)

which yields

    (d/dt) f(x) = ⟨(∂f/∂x)ᵀ, ẋ(t)⟩.   (A.105)

Similarly, if g: Rⁿ → Rᵐ and if x(t) is a time-varying column vector, then

    (d/dt) g(x) = [⟨(∂g1/∂x)ᵀ, ẋ(t)⟩; ⟨(∂g2/∂x)ᵀ, ẋ(t)⟩; ...; ⟨(∂gm/∂x)ᵀ, ẋ(t)⟩] = (∂g/∂x) ẋ(t).

It should be clear that gradient vectors and Jacobian matrices can be used to compute mixed time and partial derivatives.
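A small sketch (ours) checking the chain rule (A.104) for the function of Example A.2.2 along a made-up trajectory x(t) = (t, t², sin t):

```python
import numpy as np

f  = lambda x: x[0]**2 * x[1] * np.exp(-x[2])
fx = lambda x: np.array([2 * x[0] * x[1] * np.exp(-x[2]),
                         x[0]**2 * np.exp(-x[2]),
                         -x[0]**2 * x[1] * np.exp(-x[2])])   # gradient (A.91)

x    = lambda t: np.array([t, t**2, np.sin(t)])
xdot = lambda t: np.array([1.0, 2 * t, np.cos(t)])

t, h = 0.7, 1e-6
lhs = (f(x(t + h)) - f(x(t - h))) / (2 * h)   # finite-difference d/dt f(x(t))
rhs = fx(x(t)) @ xdot(t)                      # chain rule (A.104)
print(lhs, rhs)                               # agree to roundoff
```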
    2 2 2  ∂ f ∂ f ∂ f  ··· ∂xn ∂x1 ∂xn ∂x2 ∂x2 n It is clear from the definition that the Hessian is symmetric. x(t) . As above. u) : Rn+m → R..    .104) Similarly. Consider the function f (x. as in fx = ∂f = ∂x ∂f ∂x1 ∂f ∂f ··· ∂x2 ∂xn .A. then       =      (A. . .       d T  ∂gm gm (x) . A. . and if x(t) is a time-varying  T   ∂g1 d . . ˙ (A. .2. . . the norm is used to decide how large a vector is and also how large a matrix is. (A.110) . . and their operations by discussing the concept of the norm of a column vector and the norm of a matrix. matrices. denoted by ||x||2 . .108)   x=  x1 x2 . The norm is a generalization of the familiar magnitude of Euclidean length of a vector. . xn The Euclidean norm of x. Norms for Column Vectors Let us consider a column n-vector x.2.109) It should be clear that the value of ||x||2 provides us with an idea of how big x is.9 Vector and Matrix Norms We conclude our brief introduction to column vectors. 2 2 ∂ f ∂ f ··· ∂xn ∂u2 ∂xn ∂um       .     fxu (A. Thus.288 Appendix A: Background The matrix of second partials of this with respect to the vector u is the n × m matrix given by     ∂ T   fx =  =  ∂u    ∂ 2f ∂x1 ∂u1 ∂2f ∂x2 ∂u1 . . (A.     . in this manner it is used to attach a scalar magnitude to such multivariable quantities as vectors and matrices. We recall that the Euclidean norm of a column n-vector satisfies the following conditions: ||x||2 ≥ 0 and ||x||2 = 0 if and only if x = 0. . . x .107) A. is simply defined by ||x||2 = (x2 + x2 + · · · + x2 )1/2 = 1 2 n x.. 2 ∂ f ∂xn ∂u1 ∂2f ∂2f ··· ∂x1 ∂u2 ∂x1 ∂um 2 ∂ f ∂2f ··· ∂x2 ∂u2 ∂x2 ∂um . .  (A. . . ∀ x. Linear Algebra Review ||αx||2 = |α| · ||x||2 for all scalars α. 3 then ||x||1 = |2| + | − 1| + |3| = 6.114) (A.2.112). there are two other common norms: n ||x||1 = i=1 i |xi |. We encourage the reader to verify that the norms defined by (A.110) to (A. (A.5 Let x and y be column n-vectors. ∀ α ∈ R.113) (A. the triangle inequality. For this reason. y.3 Suppose that x is the column vector   2 x =  −1  .115) ||αx|| = |α| · ||x|| ||x + y|| ≤ ||x|| + ||y|| The reader should note that Equations (A. Definition A. one can generalize the notion of a norm in the following way. (A.A. Example A.118) √ 14. (A.112). In addition to the Euclidean norm.116) (A. |3|} = 3.110) to (A. | − . the Euclidean norm is not the most convenient to use in algebraic manipulations. For many applications.113) to (A. Then a scalar-valued function of x qualifies as a norm ||x|| of x provided that the following three properties hold: ||x|| > 0 ∀ x = 0.112) ||x + y||2 ≤ ||x||2 + ||y||2 .116) and (A.117) ||x||∞ = max |xi |. ||x||2 = (4 + 1 + 9)1/2 = 1|.111) (A.117) indeed satisfy the properties given in Equations (A. although it has the most natural geometric interpretation.2. ||x||∞ = max{|2|.115) represent a consistent generalization of the properties of the Euclidean norm given in Equations (A. 289 (A.2. 290 Matrix Norms Appendix A: Background Next we turn our attention to the concept of a norm of a matrix. To motivate the definition we simply note that a column n-vector can also be viewed as an n × 1 matrix. Thus, if we are to extend the properties of vector norms to those of the matrix norms, they should be consistent. For this reason, we have the following definition. Definition A.2.6 Let A and B be real n × m matrices with elements aij and bij (i = 1, 2, . . . , n: j = 1, 2, . . . , m). 
Matrix Norms. Next we turn our attention to the concept of a norm of a matrix. To motivate the definition we simply note that a column n-vector can also be viewed as an n × 1 matrix. Thus, if we are to extend the properties of vector norms to those of matrix norms, they should be consistent. For this reason, we have the following definition.

Definition A.2.6 Let A and B be real n × m matrices with elements aij and bij (i = 1, 2, ..., n; j = 1, 2, ..., m). Then the scalar-valued function ||A|| of A qualifies as the norm of A if the following properties hold:

    ||A|| > 0 provided not all aij = 0,   (A.119)
    ||αA|| = |α| · ||A|| ∀ α ∈ R,   (A.120)
    ||A + B|| ≤ ||A|| + ||B||.   (A.121)

As with vector norms, there are many convenient matrix norms, e.g.,

    ||A||₁ = Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ |aij|,   (A.122)
    ||A||₂ = (Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ aij²)^{1/2},   (A.123)
    ||A||∞ = maxᵢ Σⱼ₌₁ᵐ |aij|.   (A.124)

Once more we encourage the reader to prove that these matrix norms do indeed satisfy the defining properties of Equations (A.119) to (A.121).

Properties. Two important properties that hold between norms which involve multiplication of a matrix with a vector and multiplication of two matrices are summarized in the following two theorems.

Theorem A.2.12 Let A be an n × m matrix with real elements aij (i = 1, 2, ..., n; j = 1, 2, ..., m). Let x be a column m-vector with elements xj (j = 1, 2, ..., m). Then

    ||Ax|| ≤ ||A|| · ||x||   (A.125)

in the sense that

    (a) ||Ax||₁ ≤ ||A||₁ · ||x||₁,   (A.126)
    (b) ||Ax||₂ ≤ ||A||₂ · ||x||₂,   (A.127)
    (c) ||Ax||∞ ≤ ||A||∞ · ||x||∞.   (A.128)

Proof: Let y = Ax; then y is a column vector with n components y1, y2, ..., yn.

(a) ||Ax||₁ = ||y||₁ = Σᵢ |yi| = Σᵢ |Σⱼ aij xj| ≤ Σᵢ Σⱼ |aij xj| = Σᵢ Σⱼ |aij| |xj|
           ≤ Σᵢ Σⱼ |aij| · ||x||₁ (since ||x||₁ ≥ |xj|) = ||A||₁ · ||x||₁.

(b) ||Ax||₂ = ||y||₂ = (Σᵢ |yi|²)^{1/2} = (Σᵢ |Σⱼ aij xj|²)^{1/2}
           ≤ (Σᵢ (Σⱼ aij²)(Σⱼ xj²))^{1/2}, by the Schwartz inequality,
           = (Σᵢ Σⱼ aij²)^{1/2} ||x||₂ = ||A||₂ · ||x||₂.

(c) ||Ax||∞ = ||y||∞ = maxᵢ |yi| = maxᵢ |Σⱼ aij xj| ≤ maxᵢ Σⱼ |aij| |xj|
           ≤ maxᵢ Σⱼ |aij| · ||x||∞ (because ||x||∞ ≥ |xj|) = ||A||∞ · ||x||∞.

We shall leave it to the reader to verify the following theorem by imitating the proofs of Theorem A.2.12.

Theorem A.2.13 Let A be a real n × m matrix and let B be a real m × q matrix; then

    ||AB|| ≤ ||A|| · ||B||   (A.129)

in the sense that

    (a) ||AB||₁ ≤ ||A||₁ · ||B||₁,   (A.130)
    (b) ||AB||₂ ≤ ||A||₂ · ||B||₂,   (A.131)
    (c) ||AB||∞ ≤ ||A||∞ · ||B||∞.   (A.132)

A multitude of additional results concerning the properties of norms are available.

Spectral Norm. A very useful norm of a matrix, called the spectral norm, is denoted by ||A||s. Let A be a real n × m matrix. Then Aᵀ is an m × n matrix, and the product matrix AᵀA is an m × m real matrix. Let us compute the eigenvalues of AᵀA, denoted by λi(AᵀA), i = 1, 2, ..., m. Since the matrix AᵀA is symmetric and positive semidefinite, it has real nonnegative eigenvalues, i.e.,

    λi(AᵀA) ≥ 0,   i = 1, 2, ..., m.   (A.133)

Then the spectral norm of A is defined by

    ||A||s = maxᵢ [λi(AᵀA)]^{1/2},   (A.134)

i.e., it is the square root of the maximum eigenvalue of AᵀA.

Remark A.2.1 The singular values of A are given by λ^{1/2}(AᵀA).
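A short sketch (ours) relating (A.134) to singular values. One caution on conventions: NumPy's matrix 2-norm is this spectral norm, whereas the book's ||A||₂ in (A.123) is the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))          # arbitrary test matrix (assumption)

lam = np.linalg.eigvalsh(A.T @ A)        # real, nonnegative, per (A.133)
print(np.sqrt(lam.max()))                # spectral norm ||A||_s from (A.134)
print(np.linalg.svd(A, compute_uv=False)[0])  # = largest singular value
print(np.linalg.norm(A, 2))              # NumPy's 2-norm agrees with ||A||_s
print(np.linalg.norm(A, 'fro'))          # the book's (A.123) is this one
```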
A.2.10 Taylor's Theorem for Functions of Vector Arguments

We consider the Taylor expansion of a function of a vector argument. Using the definitions developed above, we have the following.

Theorem A.2.14 Let f(x): Rⁿ → R be N times continuously differentiable in all of its arguments at a point x0. Consider f(x0 + εh), where ||h|| = 1. (It is not important which specific norm is used; for simplicity, we may assume it is the Euclidean.) Then

    f(x0 + εh) = f(x0) + ε (∂f/∂x)|_{x0} h + (ε²/2!) hᵀ (∂²f/∂x²)|_{x0} h + ··· + R_N,

where

    lim_{ε→0} R_N / ε^N = 0.

The coefficients of the first two terms in the expansion are the gradient vector and the Hessian matrix.

A.3 Linear Dynamical Systems

In this section we review briefly some results in linear systems theory that will be used in the main text. For a more complete treatment, see [8] or [12]. Consider the continuous-time linear system

    ẋ(t) = A(t)x(t) + B(t)u(t),   x(t0) = x0 given,   (A.135)

where x(·) ∈ Rⁿ, u(·) ∈ Rᵐ, and A and B are appropriately dimensioned real matrices. The functions aij(t), i = 1, ..., n, j = 1, ..., n, that make up A(t) are continuous, as are the elements of B(t). The control functions in u(·) will be restricted to being piecewise continuous and everywhere defined. In most cases, for both clarity and convenience, we will drop the explicit dependence of the variables on time. When it is important, it will be included.

Some Terminology. The vector x will be termed the state vector or simply the state, and u is the control vector. The matrix A is usually known as the plant or the system matrix, and B is the control coefficient matrix. We will assume that the system is always controllable. This means that given x0 = 0 and some desired final state x1 at some final time t1 > t0, there exists some control function u(t) on the interval [t0, t1] such that x(t1) = x1 using that control input. This is discussed in detail in [8].

Fact: Under the given assumptions, the solution x(·) associated with a particular x0 and control input u(·) is unique. In particular, for u(·) ≡ 0 and any specified t1 and x1, there is exactly one initial condition x0 and one associated solution of (A.135) such that x(t1) = x1.

State Transition Matrix. Consider the system with the control input identically zero. Then we have

    ẋ(t) = A(t)x(t).

Under this condition, we can show that the solution x(t) is given by the relation

    x(t) = Φ(t, t0)x0,

where Φ(·, ·) is known as the state transition matrix, or simply the transition matrix of the system. The state transition matrix obeys the differential equation

    (d/dt)Φ(t, t0) = A(t)Φ(t, t0),   Φ(t0, t0) = I,   (A.136)

and has a couple of obvious properties:

1. Φ(t2, t0) = Φ(t2, t1)Φ(t1, t0),
2. Φ(t2, t1) = Φ⁻¹(t1, t2).

It is important to note that the transition matrix is independent of the initial state of the system. In the special case of A(t) = A constant, the state transition matrix is given by Φ(t, t0) = e^{A(t−t0)}, where the matrix exponential is defined by the series

    e^{A(t−t0)} = I + A(t − t0) + (1/2!)A²(t − t0)² + ··· + (1/k!)Aᵏ(t − t0)ᵏ + ···.   (A.137)

It is easy to see that the matrix exponential satisfies (A.136).

Fundamental Matrix. A fundamental matrix of the system (A.135) is any matrix X(t) such that

    (d/dt)X(t) = A(t)X(t).   (A.138)

The state transition matrix is the fundamental matrix that satisfies the initial condition X(t0) = I.

Fact: If there is any time t1 such that a fundamental matrix is nonsingular, then it is nonsingular for all t. An obvious corollary is that the state transition matrix is nonsingular for all t (since the initial condition I is nonsingular).

Fact: If X(t) is any fundamental matrix of ẋ = Ax, then for all t, t1 ∈ R,

    Φ(t, t1) = X(t) · X⁻¹(t1).

Solution to the Linear System. The solution to the linear system for some given control function u(·) is given by

    x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, τ)B(τ)u(τ) dτ.   (A.139)

The result is proven by taking the derivative and showing that it agrees with (A.135).
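The formula (A.139) can be exercised numerically. The sketch below (our illustration; the constant A, B and a unit-step input are assumptions) evaluates Φ(t, t0) = e^{A(t−t0)} with SciPy's expm, approximates the convolution integral by the trapezoidal rule, and compares against direct integration of (A.135).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A  = np.array([[0.0, 1.0], [-2.0, -3.0]])
B  = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 0.0])
t  = 1.5

# x(t) = Phi(t,0) x0 + int_0^t Phi(t,tau) B u(tau) dtau, with u = 1.
taus = np.linspace(0.0, t, 401)
vals = np.array([expm(A * (t - s)) @ B @ np.array([1.0]) for s in taus])
dt = taus[1] - taus[0]
integral = dt * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])
x_formula = expm(A * t) @ x0 + integral

sol = solve_ivp(lambda s, x: A @ x + B @ np.array([1.0]), (0.0, t), x0,
                rtol=1e-10, atol=1e-12)
print(x_formula, sol.y[:, -1])   # the two agree to quadrature accuracy
```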
Bibliography

[1] Anderson, B. D. O., and Moore, J. B., Linear Optimal Control, Prentice Hall, Englewood Cliffs, NJ, 1971.
[2] Athans, M., Ed., Special Issue on the Linear-Quadratic-Gaussian Problem, IEEE Transactions on Automatic Control, Vol. AC-16, December 1971.
[3] Athans, M., Dertouzos, M. L., Spann, R. N., and Mason, S. J., Systems, Networks, and Computation, McGraw-Hill, New York, 1974.
[4] Bell, D. J., and Jacobson, D. H., Singular Optimal Control Problems, Academic Press, New York, 1975.
[5] Betts, J. T., Practical Methods for Optimal Control Using Nonlinear Programming, SIAM, Philadelphia, 2001.
[6] Bliss, G. A., Lectures on the Calculus of Variations, University of Chicago Press, Chicago, 1946.
[7] Breakwell, J. V., Speyer, J. L., and Bryson, A. E., Optimization and Control of Nonlinear Systems Using the Second Variation, SIAM Journal on Control, Series A, Vol. 1, No. 2, 1963, pp. 193–223.
[8] Brockett, R. W., Finite Dimensional Linear Systems, John Wiley, New York, 1970.
[9] Broyden, C. G., The Convergence of a Class of Double-Rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications, Vol. 6, 1970, pp. 76–90.
[10] Bryant, C., and Mayne, D. Q., The Maximum Principle, International Journal of Control, Vol. 20, 1974, pp. 1021–1054.
[11] Bryson, A. E., and Ho, Y. C., Applied Optimal Control, Hemisphere Publishing, Washington, D.C., 1975.
[12] Callier, F. M., and Desoer, C. A., Linear System Theory, Springer Texts in Electrical Engineering, Springer, New York, 1991.
[13] Cannon, M. D., Cullum, C. D., and Polak, E., Theory of Optimal Control and Mathematical Programming, McGraw-Hill, New York, 1970.
[14] Clements, D. J., and Anderson, B. D. O., Singular Optimal Control: The Linear Quadratic Problem, Springer-Verlag, New York, 1978.
[15] Coddington, E. A., and Levinson, N., Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955.
[16] Cullen, C. G., Matrices and Linear Transformations, 2nd Edition, Dover, New York, 1990.
[17] Davidon, W. C., Variable Metric Method for Minimization, SIAM Journal on Optimization, Vol. 1, No. 1, 1991, pp. 1–17.
[18] Doyle, J. C., Glover, K., Khargonekar, P. P., and Francis, B. A., State-Space Solutions to Standard H-2 and H-Infinity Control Problems, IEEE Transactions on Automatic Control, Vol. 34, No. 8, August 1989, pp. 831–847.
[19] Dreyfus, S. E., Dynamic Programming and the Calculus of Variations, Academic Press, New York, 1965.
[20] Dyer, P., and McReynolds, S. R., The Computation and Theory of Optimal Control, Academic Press, New York, 1970.
[21] Friedman, A., Advanced Calculus, Holt, Rinehart and Winston, New York, 1971.
[22] Gelfand, I. M., and Fomin, S. V., Calculus of Variations, Prentice Hall, Englewood Cliffs, NJ, 1963.
[23] Gill, P. E., Murray, W., and Wright, M. H., Practical Optimization, Academic Press, New York, 1981.
[24] Grötschel, M., Krumke, S. O., and Rambau, J., Eds., Online Optimization of Large Scale Systems, Springer-Verlag, Berlin, 2001.
[25] Halkin, H., Mathematical Foundations of System Optimization, in Topics in Optimization, Leitmann, G., Ed., Academic Press, New York, 1967.
[26] Hestenes, M. R., Calculus of Variations and Optimal Control Theory, John Wiley, New York, 1966.
[27] Hohn, F. E., Elementary Matrix Algebra, 3rd Edition, Macmillan, New York, 1973.
[28] Jacobson, D. H., A Tutorial Introduction to Optimality Conditions in Nonlinear Programming, 4th National Conference of the Operations Research Society of South Africa, November 1972.
[29] Jacobson, D. H., Extensions of Linear-Quadratic Control, Optimization and Matrix Theory, Academic Press, New York, 1977.
[30] Jacobson, D. H., Lele, M. M., and Speyer, J. L., New Necessary Conditions of Optimality for Control Problems with State Variable Inequality Constraints, Journal of Mathematical Analysis and Applications, Vol. 35, 1971, pp. 255–284.
[31] Jacobson, D. H., Martin, D. H., Pachter, M., and Geveci, T., Extensions of Linear-Quadratic Control Theory, Lecture Notes in Control and Information Sciences, Springer-Verlag, Berlin, 1980.
[32] Jacobson, D. H., and Mayne, D. Q., Differential Dynamic Programming, Elsevier, New York, 1970.
[33] Kuhn, H. W., and Tucker, A. W., Nonlinear Programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951.
[34] Kwakernaak, H., and Sivan, R., Linear Optimal Control Systems, Wiley Interscience, New York, 1972.
[35] Mangasarian, O. L., Nonlinear Programming, McGraw-Hill, New York, 1969.
[36] Nocedal, J., and Wright, S. J., Numerical Optimization, Springer Series in Operations Research, Springer-Verlag, New York, 2000.
[37] Pars, L. A., A Treatise on Analytical Dynamics, John Wiley, New York, 1965.
[38] Pontryagin, L. S., Boltyanskii, V. G., Gamkrelidze, R. V., and Mischenko, E. F., The Mathematical Theory of Optimal Processes, Wiley Interscience, New York, 1962.
[39] Rall, L. B., Computational Solution of Nonlinear Operator Equations, Wiley, New York, 1969.
[40] Rhee, I., and Speyer, J. L., A Game-Theoretic Approach to a Finite-Time Disturbance Attenuation Problem, IEEE Transactions on Automatic Control, Vol. AC-36, No. 9, September 1991, pp. 1021–1032.
[41] Rodriguez-Canabal, J., The Geometry of the Riccati Equation, Stochastics, Vol. 1, 1975.
[42] Sain, M. K., Ed., Special Issue on Multivariable Control, IEEE Transactions on Automatic Control, Vol. AC-26, No. 1, February 1981.
[43] Stoer, J., and Bulirsch, R., Introduction to Numerical Analysis, 3rd Edition, Texts in Applied Mathematics 12, Springer, New York, 2002.
[44] Varaiya, P., Notes on Optimization, Van Nostrand, New York, 1972.
[45] Willems, J. C., Least Squares Stationary Optimal Control and the Algebraic Riccati Equation, IEEE Transactions on Automatic Control, Vol. AC-16, No. 6, December 1971, pp. 621–634.
[46] Wintner, A., The Analytical Foundations of Celestial Mechanics, Princeton University Press, Princeton, NJ, 1947.
[47] Zangwill, W. I., Nonlinear Programming: A Unified Approach, Prentice-Hall, Englewood Cliffs, NJ, 1969.