Chapter 4
Several-variable calculus

4.1 Derivatives of Functions of Several Variables

4.1.1 Functions of Several Variables

• A function $f$ of $n$ variables $(x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ is an entity that operates on these variables to produce another real number $y = f(x_1, x_2, \ldots, x_n)$.
• $x_1, x_2, \ldots, x_n$ are called the independent variables, $y$ the dependent variable.
• We write $f : \mathbb{R}^n \to \mathbb{R}$ to indicate that $f$ maps $\mathbb{R}^n$ (or a domain within $\mathbb{R}^n$) into $\mathbb{R}$.

4.1.2 Geometric Interpretation

For a function of two variables, $f(x, y)$, consider $(x, y)$ as defining a point $P$ in the $xy$-plane. Let the value of $f(x, y)$ be taken as the length $PP'$ drawn parallel to the $z$-axis (or the height of point $P'$ above the plane). Then as $P$ moves in the $xy$-plane, $P'$ maps out a surface in space whose equation is $z = f(x, y)$.

Just as a function of one variable has a graph which is cut only once by each vertical line (constant $x$), here the surface can only be cut once by each vertical line (constant $x$ and $y$).

Example: $f(x, y) = 6 - 2x - 3y$

The surface $z = 6 - 2x - 3y$, i.e. $2x + 3y + z = 6$, is a plane with intercepts:
the $x$-axis where $y = z = 0$, i.e. $x = 3$;
the $y$-axis where $x = z = 0$, i.e. $y = 2$;
the $z$-axis where $x = y = 0$, i.e. $z = 6$.

Example: $f(x, y) = x^2 - y^2$

In the plane $x = 0$, there is a maximum at $y = 0$; in the plane $y = 0$, there is a minimum at $x = 0$. The whole surface is shaped like a horse's saddle, and the point $(0, 0)$ is called a saddle point (of which, more later).

[Figure: the surface $z = x^2 - y^2$, with axes $x$ and $y$.]

Example: $f(x, y) = x^2 + y^2$

The intersection with the plane $x = 0$ is the parabola $z = y^2$ and with the plane $y = 0$ is the parabola $z = x^2$. This surface is symmetric about the $z$-axis, and is a paraboloid (parabolic bowl).
[Figure: the surface $z = x^2 + y^2$, with axes $x$ and $y$.]

4.1.3 Partial Derivatives

Given a function of several variables, we could choose to hold all but one of these variables fixed at arbitrarily chosen values, thereby obtaining a function of one variable (the remaining one), which could then be differentiated.

Definition

Given a function $f(x_1, \ldots, x_n)$ of $n$ variables and an integer $k$ between 1 and $n$, the partial derivative
$$\frac{\partial f}{\partial x_k} = f_{x_k} = \partial_{x_k} f$$
of $f$ with respect to the variable $x_k$ is the derivative of $f$ with respect to $x_k$ only, while the remaining $n - 1$ variables are all held fixed. Explicitly
$$\frac{\partial f}{\partial x_k} \equiv f_{x_k} = \lim_{\delta x_k \to 0} \left[ \frac{f(x_1, \ldots, x_{k-1}, x_k + \delta x_k, x_{k+1}, \ldots, x_n) - f(x_1, \ldots, x_n)}{\delta x_k} \right], \tag{4.1}$$
itself a function of $(x_1, \ldots, x_n)$.

In practice the variables held fixed act as constants:
$$f(x) = 3x^4 + \sin(2x) \;\Rightarrow\; \frac{df}{dx} = 12x^3 + 2\cos(2x),$$
$$f(x, y) = yx^4 + \sin(yx) \;\Rightarrow\; \frac{\partial f}{\partial x} = 4yx^3 + y\cos(yx).$$

Geometrical interpretation of partial derivatives in the case $n = 2$

Recall that the graph of $f$ is the surface $z = f(x, y)$ with the $z$ coordinate measured vertically upwards. The cross section of this surface cut by a vertical plane $y$ = constant is a curve whose slope (gradient) is the partial derivative $f_x$ (see figure). Similarly $f_y$ is the slope of the cross section of the graph by a vertical plane $x$ = constant.

One may interpret the partial derivatives $f_x$ and $f_y$ as the slope encountered by "walking" over the surface in the $x$ and $y$ directions respectively.

Remark

It is obvious from the definition that the partial derivative with respect to a particular variable obeys the same sum, product and quotient rules D II - D IV as the ordinary (single variable) derivative, i.e., if $u$ and $v$ are both functions of $x_1, \ldots, x_n$, then, for $k = 1, \ldots, n$,
$$\frac{\partial}{\partial x_k}(u + v) = \frac{\partial u}{\partial x_k} + \frac{\partial v}{\partial x_k}, \tag{4.2}$$
$$\frac{\partial}{\partial x_k}(uv) = u\frac{\partial v}{\partial x_k} + v\frac{\partial u}{\partial x_k}, \tag{4.3}$$
$$\frac{\partial}{\partial x_k}\left(\frac{u}{v}\right) = \frac{1}{v^2}\left(v\frac{\partial u}{\partial x_k} - u\frac{\partial v}{\partial x_k}\right) \quad (v \neq 0). \tag{4.4}$$

Corresponding to D I, we have the result that
$$f(x_1, \ldots, x_n) \text{ is independent of } x_k \text{ iff } \frac{\partial f}{\partial x_k} \text{ is zero for all } (x_1, \ldots, x_n), \tag{4.5}$$
and the consequent result
$$f(x_1, \ldots, x_n) = \text{constant iff the } n \text{ partial derivatives are all zero for all } (x_1, \ldots, x_n). \tag{4.6}$$

Corresponding to the chain rule D V, we have the result that, if $g$ is a function of $x_1, \ldots, x_n$ and $f$ a function of a single variable, then
$$\frac{\partial}{\partial x_k}\left[f\{g(x_1, \ldots, x_n)\}\right] = f'\{g(x_1, \ldots, x_n)\}\,\frac{\partial g}{\partial x_k}. \tag{4.7}$$
For example, if $f(g) = \sin g$ and $g(x, y) = x^2 + xy$ then
$$f(x, y) = \sin\left(x^2 + xy\right), \qquad f_x = (2x + y)\cos\left(x^2 + xy\right).$$
A more powerful and very important generalization of the chain rule is coming up later in this chapter.

Example

Calculate the partial derivatives of the functions:
(a) $f(x, y) = x^2 + 2xy^2 + y^3$;
(b) $f(x, y, z) = xz + e^{yz} + \sin(xy)$.

Solution

(a) Holding $y$ constant gives $\dfrac{\partial f}{\partial x} = 2x + 2y^2 + 0$.
Holding $x$ constant gives $\dfrac{\partial f}{\partial y} = 0 + 4xy + 3y^2$.

(b) Holding both $y$ and $z$ constant gives $f_x = z + 0 + y\cos(xy)$.
Holding both $x$ and $z$ constant gives $f_y = 0 + ze^{yz} + x\cos(xy)$.
Holding both $x$ and $y$ constant gives $f_z = x + ye^{yz} + 0$.

Example (implicit partial differentiation)

If $z$ is a function of two independent variables $x$ and $y$, and $z$ satisfies $xz + \ln z = 2x + 3y$, find $\dfrac{\partial z}{\partial x}$ in terms of $x$, $y$ and $z$.

Solution

Differentiating each term in the equation with respect to $x$, holding $y$ constant, and treating $z$ as a function of $x$, we obtain
$$x\frac{\partial z}{\partial x} + z + \frac{1}{z}\frac{\partial z}{\partial x} = 2 \quad\text{so that}\quad \frac{\partial z}{\partial x} = \frac{z(2 - z)}{1 + xz}.$$

4.1.4 Second and Higher Order Partial Derivatives

Since $\dfrac{\partial f}{\partial x} = f_x$ and $\dfrac{\partial f}{\partial y} = f_y$ are themselves functions of $x$ and $y$, they themselves have partial derivatives, for which we use the notations
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = (f_x)_x = f_{xx}, \tag{4.8}$$
$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = (f_x)_y = f_{xy}, \tag{4.9}$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = (f_y)_x = f_{yx}, \tag{4.10}$$
$$\frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = (f_y)_y = f_{yy}. \tag{4.11}$$
This notation extends obviously to higher order derivatives and to functions of three or more variables.
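The notation (4.8) to (4.11) can be illustrated numerically. The sketch below is not part of the original notes: it applies central differences twice to example (a), $f(x, y) = x^2 + 2xy^2 + y^3$, for which differentiating by hand gives $f_{xx} = 2$, $f_{xy} = f_{yx} = 4y$ and $f_{yy} = 4x + 6y$. The helper names `d_dx` and `d_dy` and the step size `h` are assumptions made for illustration only.

```python
# Illustrative sketch: second partial derivatives by nested central differences.
# d_dx, d_dy and the step size h are hypothetical choices, not from the notes.

def f(x, y):
    # Example (a) from the text: f(x, y) = x^2 + 2xy^2 + y^3
    return x**2 + 2 * x * y**2 + y**3

def d_dx(g, h=1e-3):
    # Returns a function approximating the partial derivative of g w.r.t. x
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-3):
    # Returns a function approximating the partial derivative of g w.r.t. y
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_x, f_y = d_dx(f), d_dy(f)
f_xx = d_dx(f_x)   # (f_x)_x, as in (4.8)
f_xy = d_dy(f_x)   # (f_x)_y, as in (4.9)
f_yx = d_dx(f_y)   # (f_y)_x, as in (4.10)
f_yy = d_dy(f_y)   # (f_y)_y, as in (4.11)

x0, y0 = 1.5, -0.5  # arbitrary test point
print("f_xx:", f_xx(x0, y0), "hand value:", 2.0)
print("f_xy:", f_xy(x0, y0), "hand value:", 4 * y0)
print("f_yx:", f_yx(x0, y0), "hand value:", 4 * y0)
print("f_yy:", f_yy(x0, y0), "hand value:", 4 * x0 + 6 * y0)
```

The two mixed estimates agree with each other to within rounding error at the chosen point.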
For obvious reasons, $f_{xy}$ and $f_{yx}$ are called mixed derivatives.

Example

If $f(x, y) = x^4y^2 - x^2y^6$ then
$$\frac{\partial f}{\partial x} = 4x^3y^2 - 2xy^6, \qquad \frac{\partial f}{\partial y} = 2x^4y - 6x^2y^5,$$
$$\frac{\partial^2 f}{\partial x^2} = 12x^2y^2 - 2y^6, \qquad \frac{\partial^2 f}{\partial y\,\partial x} = 8x^3y - 12xy^5,$$
$$\frac{\partial^2 f}{\partial y^2} = 2x^4 - 30x^2y^4, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 8x^3y - 12xy^5.$$

Mixed Derivatives Theorem

If $f_x$, $f_y$ and $f_{xy}$ exist and are continuous, then $f_{yx}$ exists and $f_{xy} = f_{yx}$.

We will not prove this theorem (we have not fully defined the word continuous), but for reasonable functions it will always apply. This means that to calculate a mixed derivative we can calculate in either order. For third-order derivatives the mixed derivatives theorem gives
$$f_{xxy} = f_{xyx} = f_{yxx}$$
and so on (check for yourself in the last example).

Example

Verify the Mixed Derivatives Theorem for the function $f(x, y) = xy^3 + x\sin xy$.

Solution

Using the sum, product and chain rules, we see that $f_x = y^3 + \sin xy + xy\cos xy$, and hence that
$$f_{xy} = (f_x)_y = 3y^2 + x\cos xy + (x\cos xy - x^2y\sin xy) = 3y^2 + 2x\cos xy - x^2y\sin xy.$$
Similarly, $f_y = 3xy^2 + x^2\cos xy$, so
$$f_{yx} = (f_y)_x = 3y^2 + (2x\cos xy - x^2y\sin xy) = f_{xy}.$$

Example

In 3 dimensions, the distance $r$ of a point from the origin is given in terms of its Cartesian coordinates $x$, $y$ and $z$ by
$$r = \sqrt{x^2 + y^2 + z^2} = (x^2 + y^2 + z^2)^{1/2}.$$
Show that the function $\phi(x, y, z) = 1/r = (x^2 + y^2 + z^2)^{-1/2}$ obeys Laplace's equation
$$\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} + \frac{\partial^2 \phi}{\partial z^2} = 0$$
(except at the origin).

Solution

By the chain rule,
$$\frac{\partial \phi}{\partial x} = -\tfrac{1}{2}(x^2 + y^2 + z^2)^{-3/2}(2x) = -x(x^2 + y^2 + z^2)^{-3/2}.$$
Therefore, by the product and chain rules,
$$\frac{\partial^2 \phi}{\partial x^2} = -(x^2 + y^2 + z^2)^{-3/2} + (-x)\left[-\tfrac{3}{2}(x^2 + y^2 + z^2)^{-5/2}(2x)\right] = -(x^2 + y^2 + z^2)^{-3/2} + 3x^2(x^2 + y^2 + z^2)^{-5/2}.$$
Similarly, by symmetry,
$$\frac{\partial^2 \phi}{\partial y^2} = -(x^2 + y^2 + z^2)^{-3/2} + 3y^2(x^2 + y^2 + z^2)^{-5/2},$$
$$\frac{\partial^2 \phi}{\partial z^2} = -(x^2 + y^2 + z^2)^{-3/2} + 3z^2(x^2 + y^2 + z^2)^{-5/2}.$$
Adding the three above equations now gives
$$\frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} + \frac{\partial^2 \phi}{\partial z^2} = -3(x^2 + y^2 + z^2)^{-3/2} + 3(x^2 + y^2 + z^2)(x^2 + y^2 + z^2)^{-5/2}$$
$$= -3(x^2 + y^2 + z^2)^{-3/2} + 3(x^2 + y^2 + z^2)^{-3/2} = 0.$$

4.2 Linear Approximations and Tangents

4.2.1 Tangent to Graph of a Function of One Variable

The tangent to the curve $y = f(x)$ at $A = (a, f(a))$ is the straight line through $A$ with slope $f'(a)$, i.e. it has the equation
$$y = f(a) + (x - a)f'(a). \tag{4.12}$$
NB1. For this line, $\dfrac{dy}{dx} = f'(a)$ and $y = f(a)$ at $x = a$.
NB2. The RHS consists of the first two terms of the Taylor expansion of $f$ about $x = a$ (i.e. it is the best linear approximation to $f(x)$ near $x = a$).

Example

Find the linear approximation to $f(x) = 1 + x^2$ near $x = 2$.

Solution

If $f(x) = 1 + x^2$ then $f'(x) = 2x$. At the point $x = 2$ we have $f = 5$ and $f' = 4$. Therefore the linear approximation is
$$f(x) \approx 5 + 4(x - 2) = 4x - 3.$$

4.2.2 Tangent Plane to Graph of a Function of Two Variables

By analogy with the above, this is the (best) linear approximation to $f$ near $(a, b)$, as given by the first two terms of the two-variable Taylor series (appendix F). It is the plane whose equation is
$$z = f(a, b) + (x - a)f_x(a, b) + (y - b)f_y(a, b). \tag{4.13}$$
NB: For this plane, $\dfrac{\partial z}{\partial x} = f_x(a, b)$, $\dfrac{\partial z}{\partial y} = f_y(a, b)$ and $z = f(a, b)$ at $x = a$ and $y = b$, i.e. we have matched the first derivatives and the value of the function at $(a, b)$.

Example

Find the tangent plane to the surface $z = f(x, y) = x^2 + y^2$ at the point $x = 1$, $y = 2$.

Solution

If $f(x, y) = x^2 + y^2$ then $f_x = 2x$ and $f_y = 2y$. At the point $(1, 2)$ we have $f = 5$, $f_x = 2$ and $f_y = 4$. Thus the tangent plane is
$$z = 5 + 2(x - 1) + 4(y - 2) = 2x + 4y - 5.$$

4.3 Directional derivatives and the gradient vector

For $f(x, y)$, $f_x$ and $f_y$ measure the rates of change of $f$ along the $x$ and $y$ directions. How can we calculate the rate of change of $f$ in any direction?

We need to know how much $f$ changes when both $x$ and $y$ change by small amounts.
Near $x = a$ and $y = b$, $f(x, y)$ is approximately given by equation (4.13) for the tangent plane. Let $x$ change by a (vanishingly) small amount $dx$, and $y$ by $dy$ (i.e. $x = a + dx$, $y = b + dy$); then
$$f(x, y) \approx f(a, b) + (x - a)f_x(a, b) + (y - b)f_y(a, b),$$
$$f(a + dx, b + dy) \approx f(a, b) + (dx)f_x(a, b) + (dy)f_y(a, b).$$
The change in $f$ is $df = f(a + dx, b + dy) - f(a, b)$, so
$$df = (dx)f_x(a, b) + (dy)f_y(a, b) = \nabla f \cdot d\mathbf{r},$$
where we have defined the two-dimensional vector representing the change in $x$ and $y$,
$$d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} = (dx, dy),$$
and the two-dimensional gradient vector
$$\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right). \tag{4.14}$$
We can, additionally, write $d\mathbf{r} = \mathbf{u}\,dr$ where $\mathbf{u}$ is a unit vector in the direction of $d\mathbf{r}$ and $dr$ is the magnitude of the change. Then $df = \nabla f \cdot \mathbf{u}\,dr$ and so
$$\text{Rate of change of } f \text{ in the direction of } \mathbf{u} = \frac{df}{dr} = \nabla f \cdot \mathbf{u}.$$
The above generalises to functions of more than two variables. E.g. for a function of three variables, $f(x, y, z)$, the three-dimensional gradient vector is
$$\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right). \tag{4.15}$$

4.3.1 Two properties of the gradient

The change $df$ in $f$ due to a change in the position by $d\mathbf{r} = \mathbf{u}\,dr$ is given by
$$df = \nabla f \cdot d\mathbf{r} = \nabla f \cdot \mathbf{u}\,dr = |\nabla f|\,dr\cos\theta, \tag{4.16}$$
where $\theta$ is the angle between the vectors $d\mathbf{r}$ and $\nabla f$. We look at cases where $d\mathbf{r}$ is parallel or perpendicular to $\nabla f$.

Property 1. From (4.16) the direction $d\mathbf{r}$ for which $df$ is a maximum is that for which $\cos\theta = 1$, or $\theta = 0$, i.e. $d\mathbf{r}$ in the direction of $\nabla f$. Thus:

At any point, $\nabla f$ points in the direction in which $f$ is increasing most rapidly and its magnitude $|\nabla f|$ gives this maximum rate of change.

i.e. $\nabla f$ "points uphill".

Property 2. From (4.16), $df = 0$ corresponds to $\theta = \pi/2$, when $\nabla f$ and $d\mathbf{r}$ are perpendicular. But $df = 0$ means that $f$ has not changed, so $d\mathbf{r}$ is along the surface $f$ = constant. Thus:

At any point, $\nabla f$ is perpendicular to the surface $f$ = constant through that point.

NB $f$ = constant is a contour of the function $f$.
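Both properties can be checked numerically. The sketch below is an illustration added here, not part of the original notes: it uses $f(x, y) = x^2 + y^2$ at the point $(1, 2)$, where $\nabla f = (2, 4)$, estimates the rate of change $df/dr$ in a direction at angle $\theta$ by a central difference, and scans over $\theta$. The function names, the step size `h` and the angular grid are assumptions made for illustration.

```python
import math

# Illustrative sketch: rate of change of f in a direction u = (cos θ, sin θ),
# compared with the gradient. Names and step sizes are hypothetical choices.

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    # Gradient worked out by hand: (f_x, f_y) = (2x, 2y)
    return (2 * x, 2 * y)

def directional_derivative(x, y, theta, h=1e-6):
    # Central-difference estimate of df/dr along the unit vector at angle theta
    ux, uy = math.cos(theta), math.sin(theta)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

a, b = 1.0, 2.0
gx, gy = grad_f(a, b)
grad_norm = math.hypot(gx, gy)    # |∇f|
grad_angle = math.atan2(gy, gx)   # direction of ∇f

# Property 1: scan directions and find the one with the largest rate of change
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda th: directional_derivative(a, b, th))
max_rate = directional_derivative(a, b, grad_angle)

# Property 2: the rate of change perpendicular to ∇f (along the contour)
perp_rate = directional_derivative(a, b, grad_angle + math.pi / 2)

print("steepest direction (scan):", best_angle, "direction of grad f:", grad_angle)
print("rate along grad f:", max_rate, "|grad f|:", grad_norm)
print("rate perpendicular to grad f:", perp_rate)
```

The scan only resolves the steepest direction to within the grid spacing of the angles, so the two directions agree approximately rather than exactly.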
For a function of two variables, these two properties are illustrated in the following picture:

Example

If $f(x, y, z) = z^3 + 3x^2y^2 + \sin z$, find $\nabla f$.

Solution

The three partial derivatives are
$$\frac{\partial f}{\partial x} = 0 + 6xy^2 + 0 = 6xy^2,$$
$$\frac{\partial f}{\partial y} = 0 + 6x^2y + 0 = 6x^2y,$$
$$\frac{\partial f}{\partial z} = 3z^2 + 0 + \cos z = 3z^2 + \cos z,$$
so
$$\nabla f = \left(6xy^2,\; 6x^2y,\; 3z^2 + \cos z\right).$$

Example

If $f(x, y, z) = x^2 + xy + z$, find $\nabla f$. What is the rate of change of $f$ along the direction $\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$ at the point $P(1, 1, 1)$? What is the magnitude of the maximum rate of change of $f$ at this point?

Solution

$$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = (2x + y,\; x,\; 1).$$
Now, at the point $P(1, 1, 1)$, $\nabla f = (3, 1, 1)$. To find the rate of change of $f$ along a vector $\mathbf{v} = (1, 2, 2)$, we need the unit vector along this direction, which is
$$\hat{\mathbf{v}} = \frac{1}{\sqrt{1^2 + 2^2 + 2^2}}(1, 2, 2) = \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right).$$
So, the rate of change of $f$ in this direction is
$$\nabla f \cdot \hat{\mathbf{v}} = (3, 1, 1) \cdot \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right) = 1 + \frac{2}{3} + \frac{2}{3} = \frac{7}{3}.$$
The maximum rate of change of $f$ at the point $P$ is $|\nabla f| = \sqrt{11}$.

Example

Find a unit vector perpendicular to the surface $z = x^2 + y^2$ at the point $A(1, 2, 5)$.

Solution A (via the tangent plane)

Earlier, we found the tangent plane to this surface at this point to be $2x + 4y - z = 5$. The vector equation of a plane can be written as $\mathbf{r} \cdot \mathbf{n} = a$ where $\mathbf{r} = (x, y, z)$ and $\mathbf{n}$ is a vector perpendicular to the plane. By inspection, we see that $\mathbf{n} = (2, 4, -1)$ is such a vector, and so a unit vector in this direction is
$$\hat{\mathbf{n}} = \frac{1}{\sqrt{2^2 + 4^2 + 1^2}}(2, 4, -1) = \frac{1}{\sqrt{21}}(2, 4, -1).$$

Solution B (treat the surface as a contour of a function of three variables)

The equation of the surface can be written as $x^2 + y^2 - z = 0$, so if we define a function $f(x, y, z) = x^2 + y^2 - z$ we can say the surface is the contour of the function $f$ given by $f(x, y, z) = 0$ = constant. We know, from Property 2 above, that $\nabla f$ is perpendicular to the surface $f$ = constant.
$$\nabla f = (2x, 2y, -1) = (2, 4, -1) \text{ at the point } A(1, 2, 5).$$
So, as before, a unit vector perpendicular to the surface is
$$\hat{\mathbf{n}} = \frac{1}{\sqrt{2^2 + 4^2 + 1^2}}(2, 4, -1) = \frac{1}{\sqrt{21}}(2, 4, -1).$$

4.4 Stationary (Critical) Points of a Function of Two Variables

4.4.1 Definition

For a function of two variables, $f(x, y)$, a stationary point $(x^*, y^*)$ is defined to be a point at which the gradient vector is zero:
$$\nabla f\big|_{(x^*, y^*)} = \left(f_x(x^*, y^*),\; f_y(x^*, y^*)\right) = (0, 0), \tag{4.17}$$
i.e. both of the partial derivatives $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ are zero at that point. The value $z^* = f(x^*, y^*)$ of $f$ at $(x^*, y^*)$ is the corresponding stationary value (SV).

4.4.2 Classification of SP's of a Function of Two Variables

There are three main types of stationary point for a function of two variables: maximum, minimum and saddle points. These are sketched as follows:

Maximum: A local peak in the function. To get a peak, we must ensure that when point $(x, y)$ moves away from $(x^*, y^*)$ a small distance in any direction, the value of $f(x, y)$ always decreases.

Minimum: A local trough in the function. To get a trough, we must ensure that when point $(x, y)$ moves away from $(x^*, y^*)$ a small distance in any direction, the value of $f(x, y)$ always increases.

Saddle point: Looks like a horse's saddle! Moving off in some directions away from $(x^*, y^*)$ leads to an increase in $f$, while moving off in other directions leads to a decrease in $f$.

Contours: We can represent the "landscape" of the surface $z = f(x, y)$ by contour lines, which are curves in the $(x, y)$ plane on which $f(x, y)$ takes different constant values.

Around a maximum, the value of $f(x, y)$ is always smaller than its value $z^*$ at the maximum. The contours are closed loops around the stationary point.

Around a minimum, $f(x, y) > z^*$ and again the contours are closed loops around the stationary point.

The representation of a saddle point by contour lines has the characteristic appearance depicted below. At the level of the saddle there are two contour lines which cross at the saddle.
These two crossing contour lines separate two regions in which $f > z^*$ from two regions in which $f < z^*$. Thus, as we move away from the saddle in different directions, there are two pairs of opposite directions in which $f$ stays fixed (along the crossing contour lines), and these directions separate two opposite ranges of direction in which $f$ increases from two opposite ranges of direction in which $f$ decreases.

To investigate what type a given stationary point is, we must look at what values $f$ takes close to this point. Consider the Taylor expansion (from appendix F) of $f(x, y)$ about a point $(x^*, y^*)$:
$$f(x, y) = f(x^*, y^*) + (x - x^*)f_x(x^*, y^*) + (y - y^*)f_y(x^*, y^*)$$
$$\qquad + \tfrac{1}{2}(x - x^*)^2 f_{xx}(x^*, y^*) + (x - x^*)(y - y^*)f_{xy}(x^*, y^*) + \tfrac{1}{2}(y - y^*)^2 f_{yy}(x^*, y^*) + \text{higher order terms}$$
(NB this matches the first and second derivatives of $f(x, y)$ at the point $(x^*, y^*)$).

Suppose that $(x^*, y^*)$ is a stationary point. Then $f_x(x^*, y^*) = f_y(x^*, y^*) = 0$, and if we label the values
$$f(x^*, y^*) = z^*, \quad x - x^* = \delta x, \quad y - y^* = \delta y, \quad f_{xx}(x^*, y^*) = A, \quad f_{xy}(x^*, y^*) = B, \quad f_{yy}(x^*, y^*) = C, \tag{4.18}$$
we can rewrite the Taylor series in the form
$$f(x, y) = z^* + \tfrac{1}{2}\left[A\,\delta x^2 + 2B\,\delta x\,\delta y + C\,\delta y^2\right] + \text{higher order terms}, \tag{4.19}$$
where it is convenient to write
$$Q(\delta x, \delta y) = A\,\delta x^2 + 2B\,\delta x\,\delta y + C\,\delta y^2 \tag{4.20}$$
for the quadratic expression in the square brackets.

Let's look at the values of $Q$ around a circle surrounding the stationary point, i.e. let
$$\delta x = \delta s\cos\theta \quad\text{and}\quad \delta y = \delta s\sin\theta,$$
where $\theta$ is an angle we can vary. Note that:

• For a minimum, $Q$ will always be positive ($f > z^*$).
• For a maximum, $Q$ will always be negative ($f < z^*$).
• For a saddle, $Q$ will change sign around the circle.

Substituting in, we get:
$$Q(\delta s\cos\theta, \delta s\sin\theta) = \delta s^2\left(A\cos^2\theta + 2B\cos\theta\sin\theta + C\sin^2\theta\right)$$
$$= \delta s^2\left[\tfrac{1}{2}A(1 + \cos 2\theta) + B\sin 2\theta + \tfrac{1}{2}C(1 - \cos 2\theta)\right].$$
After a few more trig identities (see Appendix G) we get
$$Q(\delta s\cos\theta, \delta s\sin\theta) = \delta s^2\left[\tfrac{1}{2}(A + C) + R\cos(2\theta - \phi)\right],$$
where $R > 0$ and
$$R^2 = \tfrac{1}{4}(A + C)^2 + (B^2 - AC),$$
and the angle $\phi$ is such that $\tfrac{1}{2}(A - C) = R\cos\phi$ and $B = R\sin\phi$.

Consider
$$\frac{Q}{\delta s^2} = \left[\tfrac{1}{2}(A + C) + R\cos(2\theta - \phi)\right].$$
As $\theta$ varies, this oscillates with amplitude $R$ about an average value of $\tfrac{1}{2}(A + C)$. Hence, if $R > \tfrac{1}{2}|A + C|$ then the oscillations are large enough to change the sign of $Q$ as $\theta$ is varied, giving a saddle point. This condition simplifies to
$$R^2 > \tfrac{1}{4}(A + C)^2, \quad\text{i.e.}\quad AC - B^2 < 0.$$
Hence, the condition for a saddle point is:
$$f_{xx}f_{yy} - f_{xy}^2 < 0 \qquad \text{Condition for a saddle point.}$$
If it is not a saddle point, then it is a maximum or a minimum. We can determine which by looking at the sign of $f_{xx}$ (or $f_{yy}$). Appendix G gives more details. Hence:
$$f_{xx}f_{yy} - f_{xy}^2 > 0, \quad f_{xx} > 0 \qquad \text{Condition for a minimum.}$$
$$f_{xx}f_{yy} - f_{xy}^2 > 0, \quad f_{xx} < 0 \qquad \text{Condition for a maximum.}$$
Note that in BOTH cases the function $f_{xx}f_{yy} - f_{xy}^2$ must be POSITIVE at the SP.

Example: Locate and classify the stationary points of the function $f(x, y) = 12x^3 + y^3 + 12x^2y - 75y$.

Solution

$$f_x = 36x^2 + 24xy = 12x(3x + 2y), \qquad f_y = 3y^2 + 12x^2 - 75 = 3(4x^2 + y^2 - 25).$$
SP's are given by $f_x = 0$, $f_y = 0$. For $f_x = 0$ we have $x(3x + 2y) = 0$, so EITHER $x = 0$ OR $3x + 2y = 0$, i.e. $y = -\tfrac{3}{2}x$.

If $x = 0$ then $f_y = 3(y^2 - 25) = 0 \Rightarrow y = \pm 5$.

If $y = -\tfrac{3}{2}x$ then
$$f_y = 3\left(4x^2 + y^2 - 25\right) = 3\left(4x^2 + \tfrac{9}{4}x^2 - 25\right) = 3\left(\tfrac{25}{4}x^2 - 25\right) = \tfrac{75}{4}\left(x^2 - 4\right),$$
and so $x = \pm 2$, $y = -\tfrac{3}{2}x = \mp 3$.

So there are 4 SP's, $(0, 5)$, $(0, -5)$, $(2, -3)$ and $(-2, 3)$, with respective SV's $-250$, $250$, $150$ and $-150$.

The 2nd order partial derivatives are
$$f_{xx} = 72x + 24y = 24(3x + y), \qquad f_{xy} = 24x, \qquad f_{yy} = 6y.$$
At $(0, 5)$: $f_{xx} = 120 > 0$, $f_{xy} = 0$, $f_{yy} = 30$, $H^* = f_{xx}f_{yy} - f_{xy}^2 = 3600 > 0$, so this SP is a minimum.

At $(0, -5)$: $f_{xx} = -120 < 0$, $f_{xy} = 0$, $f_{yy} = -30$, $H^* = 3600 > 0$, so this SP is a maximum.
At $(2, -3)$: $f_{xx} = 72$, $f_{xy} = 48$, $f_{yy} = -18$, $H^* = -72 \times 18 - 48^2 < 0$, so this SP is a saddle point.

At $(-2, 3)$: $f_{xx} = -72$, $f_{xy} = -48$, $f_{yy} = 18$, $H^* = -72 \times 18 - 48^2 < 0$, so this SP is a saddle point.

This is a sketch of the contours. For the connectivity, it helps to note the stationary values.

4.4.3 Definition: Hessian

The function $f_{xx}f_{yy} - f_{xy}^2$ is called the Hessian $H(x, y)$ of $f$. It may be written as a $2 \times 2$ determinant:
$$H(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix}.$$

4.4.4 Definition: Degenerate stationary point

A stationary point $(x^*, y^*)$ at which $H^* = H(x^*, y^*) = 0$ is said to be degenerate. Such stationary points will be excluded from this course. They require further investigation, involving cubic or higher order terms in the Taylor expansion.

4.5 Lagrange Multipliers

4.5.1 Introductory example

Suppose we want to find the area of the smallest circle centred on the origin which touches the line $y = -3x + 4$.

The diagram shows three candidate circles: the smallest is too small as it fails to touch the line, the largest too large (we can do better); the ideal circle just touches the line in one place. Note that this means the line is the tangent line to the circle at that point, and the normal to the circle is also normal to the line.

Now we can write the question as: minimise $f(x, y) = \pi(x^2 + y^2)$ such that $g(x, y) = y + 3x - 4 = 0$.

Each candidate circle is a contour $f(x, y)$ = constant and the line is of the form $g(x, y)$ = constant, so to make the two normals parallel we put
$$\nabla f = \lambda\nabla g$$
and we retain the constraint $g(x, y) = 0$. This procedure locates the points at which $f$ can have a maximum or minimum subject to the constraint $g = 0$. The quantity $\lambda$ is called a Lagrange multiplier.

In this case we have $\nabla f = (2\pi x, 2\pi y)$ and $\nabla g = (3, 1)$, so we put
$$2\pi x = 3\lambda \quad\text{and}\quad 2\pi y = \lambda,$$
and the constraint $g(x, y) = 0$ gives
$$\frac{\lambda}{2\pi} + \frac{9\lambda}{2\pi} - 4 = 0 \;\Rightarrow\; \lambda = \frac{4\pi}{5},$$
and so $x = 6/5$, $y = 2/5$. Hence, the area of the circle is $\pi(36/25 + 4/25) = \pi(40/25) = 8\pi/5$.
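The introductory example can be cross-checked without Lagrange multipliers. The sketch below is an illustration added here, not part of the original notes: it confirms that $\nabla f = \lambda\nabla g$ holds at $(6/5, 2/5)$ with $\lambda = 4\pi/5$, and then searches along the line $y = -3x + 4$ directly for the smallest value of $f$. The sampling grid and the variable names are assumptions made for illustration.

```python
import math

# Illustrative sketch: check the Lagrange-multiplier solution of the
# circle-and-line example. The grid and names are hypothetical choices.

def f(x, y):
    return math.pi * (x**2 + y**2)   # area of the circle through (x, y)

def g(x, y):
    return y + 3 * x - 4             # constraint g(x, y) = 0 is the line

# Values found in the text
x_star, y_star, lam = 6 / 5, 2 / 5, 4 * math.pi / 5

grad_f = (2 * math.pi * x_star, 2 * math.pi * y_star)
grad_g = (3.0, 1.0)

# Components of grad f - lambda * grad g; both should vanish
residual = [grad_f[i] - lam * grad_g[i] for i in range(2)]

# Brute-force check: sample points on the line and keep the smallest f
xs = [k / 1000 for k in range(-5000, 5001)]
x_best = min(xs, key=lambda x: f(x, -3 * x + 4))
y_best = -3 * x_best + 4

print("residual of grad f = lambda grad g:", residual)
print("constraint at (x*, y*):", g(x_star, y_star))
print("brute-force minimiser:", (x_best, y_best), "area:", f(x_best, y_best))
print("area from the text, 8*pi/5:", 8 * math.pi / 5)
```

The brute-force search is only as fine as its grid; here the grid happens to contain $x = 6/5$, so the two answers coincide.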
4.5.2 General principle

In general, to minimise or maximise $f(x_1, \ldots, x_n)$ subject to a constraint $g(x_1, \ldots, x_n)$ = constant, we set
$$\nabla f = \lambda\nabla g,$$
where the unknown $\lambda$ is called the Lagrange multiplier. $\nabla f = \lambda\nabla g$ and the constraint $g$ = constant give $n + 1$ equations altogether, enough in principle to solve for the $n + 1$ unknowns, $x_1, \ldots, x_n$ and $\lambda$.

Why does this work?

With reference to the picture above, we can make the following two comments:

1. The smallest value of $f$ on the contour $g$ = constant is where this contour just touches a contour $f$ = constant. For contours to just touch, the perpendiculars to the contours must be parallel, so $\nabla f = \lambda\nabla g$.

2. To minimise $f$ along the contour $g$ = constant we require only that the component of $\nabla f$ along the contour is zero; we are only interested in changes of $f$ along the contour. So, $\nabla f$ is allowed to have a component perpendicular to the contour and we can set $\nabla f = \lambda\nabla g$ for some unknown $\lambda$.

Example 1

Minimize $f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $x - 2y + z = 3$.

Solution

We write the constraint condition as
$$g(x, y, z) = x - 2y + z = 3$$
so we can calculate
$$\nabla f = (2x, 2y, 2z), \qquad \nabla g = (1, -2, 1),$$
and to have $\nabla f = \lambda\nabla g$ requires:
$$2x = \lambda; \quad 2y = -2\lambda; \quad 2z = \lambda.$$
We substitute these into the constraint condition:
$$0 = x - 2y + z - 3 = \tfrac{1}{2}\lambda + 2\lambda + \tfrac{1}{2}\lambda - 3 = 3\lambda - 3 \;\Rightarrow\; \lambda = 1$$
to determine the point $(x, y, z) = (\tfrac{1}{2}, -1, \tfrac{1}{2})$, at which
$$f(x, y, z) = \left(\tfrac{1}{2}\right)^2 + 1^2 + \left(\tfrac{1}{2}\right)^2 = \tfrac{3}{2}.$$

Warning: This procedure does not tell us the difference between a minimum and a maximum of $f$, so you may need to check some other values to verify you have the correct solution. In this case, taking $(3, 0, 0)$ (which satisfies the constraint) gives $f = 9$, which is greater than the value at our stationary point, so we have found a minimum.

Example 2

Find the maximum area of a rectangle with perimeter $P$.

Solution

We need to maximise the area $A = xy$ subject to the constraint $g(x, y) = 2x + 2y = P$. So $\nabla A = \lambda\nabla g$ gives
$$y = 2\lambda, \qquad x = 2\lambda,$$
so $y = x$.
Substituting into the constraint, we have
$$4x = P \;\Rightarrow\; x = \frac{P}{4} = y.$$
Hence, $A = P^2/16$. This is a maximum, as can be seen by e.g. choosing $x = P/6$ and $y = P/3$ (which satisfies the constraint), which gives $A = P^2/18$.

Extension. If there is more than one constraint, e.g. $g = 0$, $h = 0$, then we use more than one Lagrange multiplier: solve
$$\nabla f = \lambda\nabla g + \mu\nabla h$$
for $x$, $y$, $z$, $\lambda$ and $\mu$ subject to $g = 0$, $h = 0$.

4.6 The Chain Rule

We have seen (section 4.1.3) the chain rule for $f\{g(x_1, \ldots, x_n)\}$.

Consider $f(x, y)$ where $x$ and $y$ are functions of another variable $t$ (i.e. $x(t)$ and $y(t)$). If $t$ increases by $\Delta t$ then $x$ increases by $\Delta x = \Delta t\,\dfrac{dx}{dt}$ and $y$ increases by $\Delta y = \Delta t\,\dfrac{dy}{dt}$, so the change in position is
$$\Delta\mathbf{r} = (\Delta x, \Delta y) = \Delta t\left(\frac{dx}{dt}, \frac{dy}{dt}\right) = \Delta t\,\frac{d\mathbf{r}}{dt},$$
where $\dfrac{d\mathbf{r}}{dt} = \left(\dfrac{dx}{dt}, \dfrac{dy}{dt}\right)$ is the rate of change of position with $t$.

Now, recall from our work on directional derivatives: the change in $f$ for this small change in position is
$$\Delta f = \nabla f \cdot \Delta\mathbf{r} = \nabla f \cdot \frac{d\mathbf{r}}{dt}\,\Delta t.$$
If we rearrange and let $\Delta t \to 0$ we obtain the chain rule for a function of two variables, which is
$$\frac{df}{dt} = \nabla f \cdot \frac{d\mathbf{r}}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{4.21}$$
NB $f$ depends on $x$ and $y$ [so partial derivatives $\dfrac{\partial f}{\partial x}$, $\dfrac{\partial f}{\partial y}$] whilst $x$ and $y$ depend on just $t$ [so ordinary derivatives $\dfrac{dx}{dt}$, $\dfrac{dy}{dt}$]. Thus $f$ depends on $t$ and has the ordinary derivative $\dfrac{df}{dt}$ given by the chain rule (4.21).

Example

If $f(x, y) = x^2 + y^2$, where $x = \sin t$, $y = t^3$, then
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 2x\cos t + 2y \cdot 3t^2 = 2\sin t\cos t + 6t^5.$$
Of course in this simple example we can check the result by substituting for $x$ and $y$ before differentiation to give $f(t) = (\sin t)^2 + (t^3)^2$, so
$$\frac{df}{dt} = 2\sin t\cos t + 6t^5$$
as before.

The chain rule extends directly to functions of three or more variables, and to include implicit differentiation.
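The chain rule (4.21) can also be checked numerically for the example $f = x^2 + y^2$, $x = \sin t$, $y = t^3$. The sketch below is an illustration added here, not part of the original notes: it evaluates the chain-rule expression term by term and compares it with the closed form $2\sin t\cos t + 6t^5$ and with a central-difference derivative of the composite function. The function names, the step size `h` and the test value of `t` are assumptions made for illustration.

```python
import math

# Illustrative sketch: numerical check of the chain rule (4.21).
# Function names, step size and test point are hypothetical choices.

def chain_rule_dfdt(t):
    # df/dt = f_x dx/dt + f_y dy/dt with f = x^2 + y^2, x = sin t, y = t^3
    x, y = math.sin(t), t**3
    f_x, f_y = 2 * x, 2 * y
    dx_dt, dy_dt = math.cos(t), 3 * t**2
    return f_x * dx_dt + f_y * dy_dt

def closed_form_dfdt(t):
    # Result quoted in the text
    return 2 * math.sin(t) * math.cos(t) + 6 * t**5

def numeric_dfdt(t, h=1e-6):
    # Central difference applied to the composite f(x(t), y(t))
    composite = lambda s: math.sin(s)**2 + (s**3)**2
    return (composite(t + h) - composite(t - h)) / (2 * h)

t0 = 0.7  # arbitrary test value
print("chain rule:        ", chain_rule_dfdt(t0))
print("closed form:       ", closed_form_dfdt(t0))
print("central difference:", numeric_dfdt(t0))
```

The first two values agree to rounding error, and the finite-difference value agrees to within the accuracy of the difference quotient.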
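Example

If $f(x, y, z) = \ln(2x - 3y + 4z)$, where $x = e^t$, $y = \ln t$, $z = \cosh t$, then
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = \frac{2e^t}{2x - 3y + 4z} - \frac{3(1/t)}{2x - 3y + 4z} + \frac{4\sinh t}{2x - 3y + 4z} = \frac{2e^t - 3/t + 4\sinh t}{2e^t - 3\ln t + 4\cosh t}.$$

4.6.1 Extended chain rule

For $f(x, y)$ suppose that $x$ and $y$ depend on two variables $s$ and $t$ (e.g. polar co-ordinates, $x = s\cos t$, $y = s\sin t$). Then
$$\frac{\partial\mathbf{r}}{\partial s} = \left(\frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}\right) \quad\text{and}\quad \frac{\partial\mathbf{r}}{\partial t} = \left(\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}\right) \tag{4.22}$$
are two vectors representing the rate of change of position with $s$ and $t$ respectively.

Changing either $s$ or $t$ changes $x$ and $y$, and so changes $f$, producing $\dfrac{\partial f}{\partial s}$ and $\dfrac{\partial f}{\partial t}$ according to the extended chain rule
$$\frac{\partial f}{\partial s} = \nabla f \cdot \frac{\partial\mathbf{r}}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, \tag{4.23}$$
$$\frac{\partial f}{\partial t} = \nabla f \cdot \frac{\partial\mathbf{r}}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}. \tag{4.24}$$

Example

$f(x, y) = x^2y^3$, where $x = s - t^2$, $y = s + 2t$. Then $\dfrac{\partial f}{\partial x} = 2xy^3$ and $\dfrac{\partial f}{\partial y} = 3x^2y^2$, and
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} = 2xy^3 \cdot 1 + 3x^2y^2 \cdot 1 = xy^2(2y + 3x) = (s - t^2)(s + 2t)^2(5s + 4t - 3t^2),$$
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} = 2xy^3(-2t) + 3x^2y^2(2) = 2xy^2(-2ty + 3x) = 2(s - t^2)(s + 2t)^2(3s - 2st - 7t^2).$$

Examples

(i) If $f$ is a function of $x$ and $y$, where $x = e^s\cos t$, $y = e^s\sin t$, prove that
$$\sin t\,\frac{\partial f}{\partial s} + \cos t\,\frac{\partial f}{\partial t} = e^s\frac{\partial f}{\partial y}.$$
(ii) If $f$ is a function of $z/x$ and $x/y$, prove that
$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = 0.$$

Solutions

(i) If $x = e^s\cos t$ and $y = e^s\sin t$ then
$$\frac{\partial x}{\partial s} = e^s\cos t, \quad \frac{\partial y}{\partial s} = e^s\sin t, \quad \frac{\partial x}{\partial t} = -e^s\sin t, \quad \frac{\partial y}{\partial t} = e^s\cos t.$$
It follows that
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}e^s\cos t + \frac{\partial f}{\partial y}e^s\sin t,$$
$$\frac{\partial f}{\partial t} = -\frac{\partial f}{\partial x}e^s\sin t + \frac{\partial f}{\partial y}e^s\cos t.$$
Combining these two equations (multiply the first by $\sin t$, the second by $\cos t$, and add) we have
$$\sin t\,\frac{\partial f}{\partial s} + \cos t\,\frac{\partial f}{\partial t} = e^s\frac{\partial f}{\partial y}.$$

(ii) Let $u = z/x$ and $v = x/y$. Then $\dfrac{\partial u}{\partial x} = -z/x^2$, $\dfrac{\partial u}{\partial y} = 0$, $\dfrac{\partial u}{\partial z} = 1/x$, $\dfrac{\partial v}{\partial x} = 1/y$, $\dfrac{\partial v}{\partial y} = -x/y^2$ and $\dfrac{\partial v}{\partial z} = 0$.
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} = -\frac{\partial f}{\partial u}\frac{z}{x^2} + \frac{\partial f}{\partial v}\frac{1}{y},$$
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = -\frac{\partial f}{\partial v}\frac{x}{y^2},$$
$$\frac{\partial f}{\partial z} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} = \frac{\partial f}{\partial u}\frac{1}{x},$$
and so
$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = -\frac{z}{x}\frac{\partial f}{\partial u} + \frac{x}{y}\frac{\partial f}{\partial v} - \frac{x}{y}\frac{\partial f}{\partial v} + \frac{z}{x}\frac{\partial f}{\partial u} = 0$$
as required.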
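The extended chain rule (4.23) and (4.24) can be checked numerically on the example $f = x^2y^3$, $x = s - t^2$, $y = s + 2t$. The sketch below is an illustration added here, not part of the original notes: it compares the factorised results from the text with central differences of the composite function of $s$ and $t$. The function names, the step size `h` and the test point are assumptions made for illustration.

```python
# Illustrative sketch: check the extended chain rule on f = x^2 y^3
# with x = s - t^2, y = s + 2t. Names and step size are hypothetical.

def composite(s, t):
    # f expressed directly as a function of s and t
    x, y = s - t**2, s + 2 * t
    return x**2 * y**3

def df_ds_text(s, t):
    # Factorised result for ∂f/∂s quoted in the text
    return (s - t**2) * (s + 2 * t)**2 * (5 * s + 4 * t - 3 * t**2)

def df_dt_text(s, t):
    # Factorised result for ∂f/∂t quoted in the text
    return 2 * (s - t**2) * (s + 2 * t)**2 * (3 * s - 2 * s * t - 7 * t**2)

def df_ds_numeric(s, t, h=1e-6):
    # Central difference in s, holding t fixed
    return (composite(s + h, t) - composite(s - h, t)) / (2 * h)

def df_dt_numeric(s, t, h=1e-6):
    # Central difference in t, holding s fixed
    return (composite(s, t + h) - composite(s, t - h)) / (2 * h)

s0, t0 = 1.0, 0.5  # arbitrary test point (x = 0.75, y = 2)
print("df/ds text vs numeric:", df_ds_text(s0, t0), df_ds_numeric(s0, t0))
print("df/dt text vs numeric:", df_dt_text(s0, t0), df_dt_numeric(s0, t0))
```

At this test point the formulas from the text evaluate to 18.75 and 1.5, which can be confirmed by hand from $2xy^3 + 3x^2y^2$ and $-4txy^3 + 6x^2y^2$.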
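4.6.2 Definition of the Jacobian

We could write the extended chain rule (4.23) and (4.24) in matrix-vector form as follows:
$$\begin{pmatrix} \dfrac{\partial f}{\partial s} \\[2ex] \dfrac{\partial f}{\partial t} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2ex] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{pmatrix} \begin{pmatrix} \dfrac{\partial f}{\partial x} \\[2ex] \dfrac{\partial f}{\partial y} \end{pmatrix}, \tag{4.25}$$
which leads us naturally to the Jacobian matrix
$$J = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2ex] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{pmatrix}, \tag{4.26}$$
whose determinant is the Jacobian of the transformation from $x, y$ to $s, t$:
$$\frac{\partial(x, y)}{\partial(s, t)} = \begin{vmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2ex] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{vmatrix} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}. \tag{4.27}$$
NB The rows of the matrix $J$ are the vectors $\dfrac{\partial\mathbf{r}}{\partial s} = \left(\dfrac{\partial x}{\partial s}, \dfrac{\partial y}{\partial s}\right)$ and $\dfrac{\partial\mathbf{r}}{\partial t} = \left(\dfrac{\partial x}{\partial t}, \dfrac{\partial y}{\partial t}\right)$ expressing the rate of change of position $(x, y)$ with $s$ and $t$ respectively. Geometrically, the Jacobian is
$$\frac{\partial(x, y)}{\partial(s, t)} = \left|\frac{\partial\mathbf{r}}{\partial s} \times \frac{\partial\mathbf{r}}{\partial t}\right| = \left|\frac{\partial\mathbf{r}}{\partial s}\right|\left|\frac{\partial\mathbf{r}}{\partial t}\right|\sin\theta,$$
where $\theta$ is the angle between $\dfrac{\partial\mathbf{r}}{\partial s}$ and $\dfrac{\partial\mathbf{r}}{\partial t}$. Hence, the Jacobian is (up to sign) the area of the parallelogram whose sides are $\dfrac{\partial\mathbf{r}}{\partial s}$ and $\dfrac{\partial\mathbf{r}}{\partial t}$.

4.6.3 Change of Variables

Suppose we have $x$ and $y$ expressed in terms of two other variables $s$ and $t$. How would we go about finding expressions for $s$ and $t$ in terms of $x$ and $y$? Is this always possible?

Carrying out a change of variables. The procedure for reversing a change of variables is to look for combinations of $x$ and $y$ which eliminate all dependence on one of $s$ and $t$. This is best shown by example.

Example

If $x = s^2t$ and $y = t^2/s$, find $s$ and $t$ in terms of $x$ and $y$.

Solution

We begin by looking for a combination of $x$ and $y$ that has no $t$-dependence. From the first equation, we can write $t = x/s^2$, so we substitute this into the second equation:
$$y = (x/s^2)^2/s = x^2/s^5$$
and manipulate this result to give $s$:
$$s^5 = x^2/y \;\Rightarrow\; s = x^{2/5}y^{-1/5}.$$
We can then substitute this into either of the definitions to eliminate $s$; we choose the second:
$$y = t^2x^{-2/5}y^{1/5} \;\Rightarrow\; t = x^{1/5}y^{2/5},$$
so the full solution is $s = x^{2/5}y^{-1/5}$ and $t = x^{1/5}y^{2/5}$.

Example

If $x = s\cos t$ and $y = s\sin t$, find $s$ and $t$ in terms of $x$ and $y$.

Solution

Here we start by eliminating $t$.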
The simplest way to do this is to use the identity $\sin^2 t + \cos^2 t = 1$:
$$x^2 + y^2 = s^2\cos^2 t + s^2\sin^2 t = s^2 \;\Rightarrow\; s = (x^2 + y^2)^{1/2},$$
and to eliminate $s$ we simply divide the two expressions:
$$y/x = s\sin t/(s\cos t) = \tan t \;\Rightarrow\; t = \tan^{-1}(y/x).$$

Example

If $x = s^2t$ and $y = s^4t^2 + 2s^2t + 4$, express $s$ and $t$ in terms of $x$ and $y$.

Solution

We try to eliminate $t$ by using the $x$-equation:
$$x = s^2t \;\Rightarrow\; t = xs^{-2}$$
in the $y$-equation:
$$y = s^4t^2 + 2s^2t + 4 = s^4[xs^{-2}]^2 + 2s^2[xs^{-2}] + 4 = x^2 + 2x + 4,$$
and we find that we have no $s$-dependence in this equation so we can't rearrange to find $s$.

In this case it is not possible to determine $s$ and $t$ from values of $x$ and $y$. Since $y$ can be written in terms of $x$ only, $y$ and $x$ are not independent.

Jacobian and change of variables

In the example above, we could not find $s$ and $t$ from $x$ and $y$. The critical quantity here is the Jacobian:
$$\frac{\partial(x, y)}{\partial(s, t)} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t} = (2st)(2s^4t + 2s^2) - (4s^3t^2 + 4st)(s^2) = 4s^5t^2 + 4s^3t - 4s^5t^2 - 4s^3t = 0.$$
In general, it is only possible to change variables and change back again if the Jacobian of the transformation is not zero.

When the Jacobian is zero, the area of the parallelogram whose sides are $\dfrac{\partial\mathbf{r}}{\partial s}$ and $\dfrac{\partial\mathbf{r}}{\partial t}$ is zero, which means $\dfrac{\partial\mathbf{r}}{\partial s}$ and $\dfrac{\partial\mathbf{r}}{\partial t}$ are parallel. Changes in either $s$ or $t$ give changes in the position $(x, y)$ in the same direction, so $y$ and $x$ are not independent, and $s$ and $t$ cannot be uniquely determined from $x$ and $y$.
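The role of the Jacobian in the three change-of-variables examples can be explored numerically. The sketch below is an illustration added here, not part of the original notes: it estimates $\partial(x, y)/\partial(s, t)$ from (4.27) by central differences for each transformation. The helper name `jacobian`, the step size `h` and the test point are assumptions made for illustration; the values $5t^2$ and $s$ used for comparison come from differentiating the first two transformations by hand.

```python
import math

# Illustrative sketch: estimate the Jacobian (4.27) by central differences.
# The helper name, step size and test point are hypothetical choices.

def jacobian(transform, s, t, h=1e-6):
    # transform(s, t) returns the pair (x, y)
    x_sp, y_sp = transform(s + h, t)
    x_sm, y_sm = transform(s - h, t)
    x_tp, y_tp = transform(s, t + h)
    x_tm, y_tm = transform(s, t - h)
    x_s, y_s = (x_sp - x_sm) / (2 * h), (y_sp - y_sm) / (2 * h)
    x_t, y_t = (x_tp - x_tm) / (2 * h), (y_tp - y_tm) / (2 * h)
    return x_s * y_t - y_s * x_t

def invertible_map(s, t):
    # First example: x = s^2 t, y = t^2 / s
    return (s**2 * t, t**2 / s)

def polar_map(s, t):
    # Second example: x = s cos t, y = s sin t
    return (s * math.cos(t), s * math.sin(t))

def degenerate_map(s, t):
    # Third example: x = s^2 t, y = s^4 t^2 + 2 s^2 t + 4 = x^2 + 2x + 4
    return (s**2 * t, s**4 * t**2 + 2 * s**2 * t + 4)

s0, t0 = 1.3, 0.7  # arbitrary test point
J_inv = jacobian(invertible_map, s0, t0)
J_polar = jacobian(polar_map, s0, t0)
J_deg = jacobian(degenerate_map, s0, t0)

print("x = s^2 t, y = t^2/s:        ", J_inv, "(hand value 5 t^2 =", 5 * t0**2, ")")
print("polar co-ordinates:          ", J_polar, "(hand value s =", s0, ")")
print("x = s^2 t, y = x^2 + 2x + 4: ", J_deg, "(text value 0)")
```

Only the third estimate is zero to within rounding error, consistent with that transformation being the one that cannot be reversed.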