Stochastic Processes - Ross

STOCHASTIC PROCESSES
Second Edition

Sheldon M. Ross
University of California, Berkeley

JOHN WILEY & SONS, INC.
New York • Chichester • Brisbane • Toronto • Singapore

ACQUISITIONS EDITOR: Brad Wiley II
MARKETING MANAGER: Debra Riegert
SENIOR PRODUCTION EDITOR: Tony VenGraitis
MANUFACTURING MANAGER: Dorothy Sinclair
TEXT AND COVER DESIGN: A Good Thing, Inc.
PRODUCTION COORDINATION: Elm Street Publishing Services, Inc.

This book was set in Times Roman by Bi-Comp, Inc. and printed and bound by Courier/Stoughton. The cover was printed by Phoenix Color. Recognizing the importance of preserving what has been written, it is a policy of John Wiley & Sons, Inc. to have books of enduring value published in the United States printed on acid-free paper, and we exert our best efforts to that end. The paper in this book was manufactured by a mill whose forest management programs include sustained yield harvesting of its timberlands. Sustained yield harvesting principles ensure that the number of trees cut each year does not exceed the amount of new growth.

Copyright © 1996 by John Wiley & Sons, Inc. All rights reserved. Published simultaneously in Canada. Reproduction or translation of any part of this work beyond that permitted by Sections 107 and 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Cataloging-in-Publication Data:
Ross, Sheldon M. Stochastic processes / Sheldon M. Ross. — 2nd ed. Includes bibliographical references and index. ISBN 0-471-12062-6 (cloth: alk. paper). 1. Stochastic processes. I. Title. QA274.R65 1996 519.2—dc20 95-38012 CIP

Printed in the United States of America. 10 9 8 7 6 5 4 3 2

On March 30, 1980, a beautiful six-year-old girl died. This book is dedicated to the memory of Nichole Pomaras.

Preface to the First Edition

This text is a nonmeasure theoretic introduction to stochastic processes, and as such assumes a knowledge of calculus and elementary probability. In it we attempt to present some of the theory of stochastic processes, to indicate its diverse range of applications, and also to give the student some probabilistic intuition and insight in thinking about problems. We have attempted, wherever possible, to view processes from a probabilistic instead of an analytic point of view. This attempt, for instance, has led us to study most processes from a sample path point of view.

I would like to thank Mark Brown, Cyrus Derman, Shun-Chen Niu, Michael Pinedo, and Zvi Schechner for their helpful comments.

SHELDON M. ROSS

Preface to the Second Edition

The second edition of Stochastic Processes includes the following changes:

(i) Additional material in Chapter 2 on compound Poisson random variables, including an identity that can be used to efficiently compute moments, and which leads to an elegant recursive equation for the probability mass function of a nonnegative integer valued compound Poisson random variable;
(ii) A separate chapter (Chapter 6) on martingales, including sections on the Azuma inequality; and
(iii) A new chapter (Chapter 10) on Poisson approximations, including both the Stein–Chen method for bounding the error of these approximations and a method for improving the approximation itself.

In addition, we have added numerous exercises and problems throughout the text.
Additions to individual chapters follow. In Chapter 1, we have new examples on the probabilistic method, the multivariate normal distribution, random walks on graphs, and the complete match problem. Also, we have new sections on probability inequalities (including Chernoff bounds) and on Bayes estimators (showing that they are almost never unbiased). A proof of the strong law of large numbers is given in the Appendix to this chapter. New examples on patterns and on memoryless optimal coin tossing strategies are given in Chapter 3. There is new material in Chapter 4 covering the mean time spent in transient states, as well as examples relating to the Gibbs sampler, the Metropolis algorithm, and the mean cover time in star graphs. Chapter 5 includes an example on a two-sex population growth model. Chapter 6 has additional examples illustrating the use of the martingale stopping theorem. Chapter 7 includes new material on Spitzer's identity and using it to compute mean delays in single-server queues with gamma-distributed interarrival and service times. Chapter 8 on Brownian motion has been moved to follow the chapter on martingales to allow us to utilize martingales to analyze Brownian motion. Chapter 9 on stochastic order relations now includes a section on associated random variables, as well as new examples utilizing coupling in coupon collecting and bin packing problems.

We would like to thank all those who were kind enough to write and send comments about the first edition, with particular thanks to He Sheng-wu, Stephen Herschkorn, Robert Kertz, James Matis, Erol Pekoz, Maria Rieders, and Tomasz Rolski for their many helpful comments.

SHELDON M. ROSS

Contents

CHAPTER 1. PRELIMINARIES
1.1. Probability
1.2. Random Variables
1.3. Expected Value
1.4. Moment Generating, Characteristic Functions, and Laplace Transforms
1.5. Conditional Expectation
  1.5.1. Conditional Expectations and Bayes Estimators
1.6. The Exponential Distribution, Lack of Memory, and Hazard Rate Functions
1.7. Some Probability Inequalities
1.8. Limit Theorems
1.9. Stochastic Processes
Problems
References
Appendix

CHAPTER 2. THE POISSON PROCESS
2.1. The Poisson Process
2.2. Interarrival and Waiting Time Distributions
2.3. Conditional Distribution of the Arrival Times
  2.3.1. The M/G/1 Busy Period
2.4. Nonhomogeneous Poisson Process
2.5. Compound Poisson Random Variables and Processes
  2.5.1. A Compound Poisson Identity
  2.5.2. Compound Poisson Processes
2.6. Conditional Poisson Processes
Problems
References

CHAPTER 3. RENEWAL THEORY
3.1. Introduction and Preliminaries
3.2. Distribution of N(t)
3.3. Some Limit Theorems
  3.3.1. Wald's Equation
  3.3.2. Back to Renewal Theory
3.4. The Key Renewal Theorem and Applications
  3.4.1. Alternating Renewal Processes
  3.4.2. Limiting Mean Excess and Expansion of m(t)
  3.4.3. Age-Dependent Branching Processes
3.5. Delayed Renewal Processes
3.6. Renewal Reward Processes
  3.6.1. A Queueing Application
3.7. Regenerative Processes
  3.7.1. The Symmetric Random Walk and the Arc Sine Laws
3.8. Stationary Point Processes
Problems
References

CHAPTER 4. MARKOV CHAINS
4.1. Introduction and Examples
4.2. Chapman–Kolmogorov Equations and Classification of States
4.3. Limit Theorems
4.4. Transitions among Classes, the Gambler's Ruin Problem, and Mean Times in Transient States
4.5. Branching Processes
4.6. Applications of Markov Chains
  4.6.1. A Markov Chain Model of Algorithmic Efficiency
  4.6.2. An Application to Runs—A Markov Chain with a Continuous State Space
  4.6.3. List Ordering Rules—Optimality of the Transposition Rule
4.7. Time-Reversible Markov Chains
4.8. Semi-Markov Processes
Problems
References

CHAPTER 5. CONTINUOUS-TIME MARKOV CHAINS
5.1. Introduction
5.2. Continuous-Time Markov Chains
5.3. Birth and Death Processes
5.4. The Kolmogorov Differential Equations
  5.4.1. Computing the Transition Probabilities
5.5. Limiting Probabilities
5.6. Time Reversibility
  5.6.1. Tandem Queues
  5.6.2. A Stochastic Population Model
5.7. Applications of the Reversed Chain to Queueing Theory
  5.7.1. Network of Queues
  5.7.2. The Erlang Loss Formula
  5.7.3. The M/G/1 Shared Processor System
5.8. Uniformization
Problems
References

CHAPTER 6. MARTINGALES
Introduction
6.1. Martingales
6.2. Stopping Times
6.3. Azuma's Inequality for Martingales
6.4. Submartingales, Supermartingales, and the Martingale Convergence Theorem
6.5. A Generalized Azuma Inequality
Problems
References

CHAPTER 7. RANDOM WALKS
Introduction
7.1. Duality in Random Walks
7.2. Some Remarks Concerning Exchangeable Random Variables
7.3. Using Martingales to Analyze Random Walks
7.4. Applications to G/G/1 Queues and Ruin Problems
  7.4.1. The G/G/1 Queue
  7.4.2. A Ruin Problem
7.5. Blackwell's Theorem on the Line
Problems
References

CHAPTER 8. BROWNIAN MOTION AND OTHER MARKOV PROCESSES
8.1. Introduction and Preliminaries
8.2. Hitting Times, Maximum Variable, and Arc Sine Laws
8.3. Variations on Brownian Motion
  8.3.1. Brownian Motion Absorbed at a Value
  8.3.2. Brownian Motion Reflected at the Origin
  8.3.3. Geometric Brownian Motion
  8.3.4. Integrated Brownian Motion
8.4. Brownian Motion with Drift
  8.4.1. Using Martingales to Analyze Brownian Motion
8.5. Backward and Forward Diffusion Equations
8.6. Applications of the Kolmogorov Equations to Obtaining Limiting Distributions
  8.6.1. Semi-Markov Processes
  8.6.2. The M/G/1 Queue
  8.6.3. A Ruin Problem in Risk Theory
8.7. A Markov Shot Noise Process
8.8. Stationary Processes
Problems
References

CHAPTER 9. STOCHASTIC ORDER RELATIONS
Introduction
9.1. Stochastically Larger
9.2. Coupling
  9.2.1. Stochastic Monotonicity Properties of Birth and Death Processes
  9.2.2. Exponential Convergence in Markov Chains
9.3. Hazard Rate Ordering and Applications to Counting Processes
9.4. Likelihood Ratio Ordering
9.5. Stochastically More Variable
9.6. Applications of Variability Orderings
  9.6.1. Comparison of G/G/1 Queues
  9.6.2. A Renewal Process Application
  9.6.3. A Branching Process Application
9.7. Associated Random Variables
Problems
References

CHAPTER 10. POISSON APPROXIMATIONS
Introduction
10.1. Brun's Sieve
10.2. The Stein–Chen Method for Bounding the Error of the Poisson Approximation
10.3. Improving the Poisson Approximation
Problems
References

ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS

INDEX

CHAPTER 1
Preliminaries

1.1 PROBABILITY

A basic notion in probability theory is the random experiment: an experiment whose outcome cannot be determined in advance. The set of all possible outcomes of an experiment is called the sample space of that experiment, and we denote it by S.
An event is a subset of the sample space and is said to occur if the outcome of the experiment is an element of that subset. We shall suppose that for each event $E$ of the sample space $S$ a number $P(E)$ is defined and satisfies the following three axioms. (Strictly speaking, $P(E)$ will only be defined for the so-called measurable events of $S$, but this restriction need not concern us.)

Axiom (1): $0 \le P(E) \le 1$.

Axiom (2): $P(S) = 1$.

Axiom (3): For any sequence of events $E_1, E_2, \ldots$ that are mutually exclusive, that is, events for which $E_iE_j = \emptyset$ when $i \ne j$ (where $\emptyset$ is the null set),
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$

We refer to $P(E)$ as the probability of the event $E$. Some simple consequences of axioms (1), (2), and (3) are:

1.1.1. If $E \subset F$, then $P(E) \le P(F)$.
1.1.2. $P(E^c) = 1 - P(E)$, where $E^c$ is the complement of $E$.
1.1.3. $P(\bigcup_{i=1}^{n} E_i) = \sum_{i=1}^{n} P(E_i)$ when the $E_i$ are mutually exclusive.
1.1.4. $P(\bigcup_{i=1}^{\infty} E_i) \le \sum_{i=1}^{\infty} P(E_i)$.

The inequality (1.1.4) is known as Boole's inequality.

An important property of the probability function $P$ is that it is continuous. To make this more precise, we need the concept of a limiting event, which we define as follows. A sequence of events $\{E_n, n \ge 1\}$ is said to be increasing if $E_n \subset E_{n+1}$, $n \ge 1$, and decreasing if $E_n \supset E_{n+1}$, $n \ge 1$. If $\{E_n, n \ge 1\}$ is an increasing sequence of events, then we define a new event, denoted by $\lim_{n\to\infty} E_n$, by
$$\lim_{n\to\infty} E_n = \bigcup_{i=1}^{\infty} E_i \quad \text{when } E_n \subset E_{n+1},\ n \ge 1.$$
Similarly, if $\{E_n, n \ge 1\}$ is decreasing, then define $\lim_{n\to\infty} E_n$ by
$$\lim_{n\to\infty} E_n = \bigcap_{i=1}^{\infty} E_i \quad \text{when } E_n \supset E_{n+1},\ n \ge 1.$$

We may now state the following:

PROPOSITION 1.1.1
If $\{E_n, n \ge 1\}$ is either an increasing or decreasing sequence of events, then
$$\lim_{n\to\infty} P(E_n) = P\left(\lim_{n\to\infty} E_n\right).$$

Proof. Suppose, first, that $\{E_n, n \ge 1\}$ is increasing, and define events $F_n$, $n \ge 1$, by
$$F_1 = E_1, \qquad F_n = E_n\left(\bigcup_{i=1}^{n-1} E_i\right)^c = E_n E_{n-1}^c, \quad n > 1.$$
That is, $F_n$ consists of those points in $E_n$ that are not in any of the earlier $E_i$, $i < n$. It is easy to verify that the $F_n$ are mutually exclusive events such that
$$\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i \quad \text{and} \quad \bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i \ \text{ for all } n \ge 1.$$
Thus
$$P\left(\bigcup_{1}^{\infty} E_i\right) = P\left(\bigcup_{1}^{\infty} F_i\right) = \sum_{1}^{\infty} P(F_i) \quad \text{(by Axiom 3)}$$
$$= \lim_{n\to\infty}\sum_{1}^{n} P(F_i) = \lim_{n\to\infty} P\left(\bigcup_{1}^{n} F_i\right) = \lim_{n\to\infty} P\left(\bigcup_{1}^{n} E_i\right) = \lim_{n\to\infty} P(E_n),$$
which proves the result when $\{E_n, n \ge 1\}$ is increasing.

If $\{E_n, n \ge 1\}$ is a decreasing sequence, then $\{E_n^c, n \ge 1\}$ is an increasing sequence; hence,
$$P\left(\bigcup_{1}^{\infty} E_n^c\right) = \lim_{n\to\infty} P(E_n^c).$$
But, as $\bigcup_{1}^{\infty} E_n^c = \left(\bigcap_{1}^{\infty} E_n\right)^c$, we see that
$$1 - P\left(\bigcap_{1}^{\infty} E_n\right) = \lim_{n\to\infty}\left[1 - P(E_n)\right],$$
or, equivalently,
$$P\left(\bigcap_{1}^{\infty} E_n\right) = \lim_{n\to\infty} P(E_n),$$
which proves the result.

EXAMPLE 1.1(A). Consider a population consisting of individuals able to produce offspring of the same kind. The number of individuals initially present, denoted by $X_0$, is called the size of the zeroth generation. All offspring of the zeroth generation constitute the first generation, and their number is denoted by $X_1$. In general, let $X_n$ denote the size of the $n$th generation. Since $X_n = 0$ implies that $X_{n+1} = 0$, it follows that $P\{X_n = 0\}$ is increasing and thus $\lim_{n\to\infty} P\{X_n = 0\}$ exists. What does it represent? To answer this, use Proposition 1.1.1 as follows:
$$\lim_{n\to\infty} P\{X_n = 0\} = P\left(\lim_{n\to\infty}\{X_n = 0\}\right) = P\left(\bigcup_{n=1}^{\infty}\{X_n = 0\}\right) = P\{\text{the population ever dies out}\}.$$
That is, the limiting probability that the $n$th generation is void of individuals is equal to the probability of eventual extinction of the population.
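The convergence in Example 1.1(A) is easy to watch numerically. Below is a minimal Python sketch (our own illustration, not from the text), assuming a Poisson offspring distribution with mean 0.8; the estimates of $P\{X_n = 0\}$ increase in $n$ toward the extinction probability, which here equals 1 since the mean number of offspring is less than 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def extinct_by(n, mu):
    """True if a branching process with Poisson(mu) offspring, started
    from a single individual, has died out by generation n."""
    x = 1
    for _ in range(n):
        if x == 0:
            return True
        x = rng.poisson(mu, size=x).sum()   # offspring of current generation
    return x == 0

for n in (1, 2, 5, 20):
    p = np.mean([extinct_by(n, mu=0.8) for _ in range(2000)])
    print(f"P(X_{n} = 0) ~= {p:.3f}")       # increases toward 1
```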
Proposition 1.1.1 can also be used to prove the Borel–Cantelli lemma.

PROPOSITION 1.1.2 The Borel–Cantelli Lemma
Let $E_1, E_2, \ldots$ denote a sequence of events. If
$$\sum_{i=1}^{\infty} P(E_i) < \infty,$$
then
$$P\{\text{an infinite number of the } E_i \text{ occur}\} = 0.$$

Proof. The event that an infinite number of the $E_i$ occur, called $\limsup_{i\to\infty} E_i$, can be expressed as
$$\limsup_{i\to\infty} E_i = \bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty} E_i.$$
This follows since if an infinite number of the $E_i$ occur, then $\bigcup_{i=n}^{\infty} E_i$ occurs for each $n$, and thus $\bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty} E_i$ occurs. On the other hand, if $\bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty} E_i$ occurs, then $\bigcup_{i=n}^{\infty} E_i$ occurs for each $n$, and thus for each $n$ at least one of the $E_i$ with $i \ge n$ occurs; hence, an infinite number of the $E_i$ occur.

As $\bigcup_{i=n}^{\infty} E_i$, $n \ge 1$, is a decreasing sequence of events, it follows from Proposition 1.1.1 that
$$P\left(\bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty} E_i\right) = \lim_{n\to\infty} P\left(\bigcup_{i=n}^{\infty} E_i\right) \le \lim_{n\to\infty}\sum_{i=n}^{\infty} P(E_i) = 0,$$
and the result is proven.

EXAMPLE 1.1(B). Let $X_1, X_2, \ldots$ be such that
$$P\{X_n = 0\} = 1/n^2 = 1 - P\{X_n = 1\}, \quad n \ge 1.$$
If we let $E_n = \{X_n = 0\}$, then, as $\sum_{n=1}^{\infty} P(E_n) < \infty$, it follows from the Borel–Cantelli lemma that the probability that $X_n$ equals 0 for an infinite number of $n$ is equal to 0. Hence, for all $n$ sufficiently large, $X_n$ must equal 1, and so we may conclude that, with probability 1,
$$\lim_{n\to\infty} X_n = 1.$$

For a converse to the Borel–Cantelli lemma, independence is required.

PROPOSITION 1.1.3 Converse to the Borel–Cantelli Lemma
If $E_1, E_2, \ldots$ are independent events such that
$$\sum_{n=1}^{\infty} P(E_n) = \infty,$$
then
$$P\{\text{an infinite number of the } E_n \text{ occur}\} = 1.$$

Proof.
$$P\{\text{an infinite number of the } E_n \text{ occur}\} = P\left(\bigcap_{n=1}^{\infty}\bigcup_{i=n}^{\infty} E_i\right) = \lim_{n\to\infty} P\left(\bigcup_{i=n}^{\infty} E_i\right).$$
Now,
$$P\left(\bigcup_{i=n}^{\infty} E_i\right) = 1 - P\left(\bigcap_{i=n}^{\infty} E_i^c\right) = 1 - \prod_{i=n}^{\infty}\left(1 - P(E_i)\right) \quad \text{(by independence)}$$
$$\ge 1 - \prod_{i=n}^{\infty} e^{-P(E_i)} \quad \text{(by the inequality } 1 - x \le e^{-x}\text{)}$$
$$= 1 - \exp\left(-\sum_{i=n}^{\infty} P(E_i)\right) = 1, \quad \text{since } \sum_{i=n}^{\infty} P(E_i) = \infty \text{ for all } n.$$
Hence the result follows.

EXAMPLE 1.1(C). Let $X_1, X_2, \ldots$ be independent and such that
$$P\{X_n = 0\} = 1/n = 1 - P\{X_n = 1\}, \quad n \ge 1.$$
If we let $E_n = \{X_n = 0\}$, then, as $\sum_n P(E_n) = \infty$, it follows from Proposition 1.1.3 that $E_n$ occurs infinitely often. Also, as $\sum_n P(E_n^c) = \sum_n (1 - 1/n) = \infty$, it also follows that $E_n^c$ occurs infinitely often. Hence, with probability 1, $X_n$ will equal 0 infinitely often and will also equal 1 infinitely often. Hence, with probability 1, $X_n$ will not approach a limiting value as $n \to \infty$.
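Example 1.1(B) can likewise be checked by simulation. The sketch below (our own illustration; the horizon $10^6$ is an arbitrary choice) samples the $X_n$ independently with $P\{X_n = 0\} = 1/n^2$; in line with the Borel–Cantelli lemma, only a handful of indices, all small, have $X_n = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)

# P{X_n = 0} = 1/n^2.  Since sum(1/n^2) < infinity, Borel-Cantelli says
# only finitely many X_n equal 0, so X_n -> 1 with probability 1.
n = np.arange(1, 1_000_001)
x = (rng.random(n.size) >= 1.0 / n**2).astype(int)
zeros = np.flatnonzero(x == 0) + 1        # indices n with X_n = 0
print("indices n with X_n = 0:", zeros)   # typically a few small values of n
```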
1.2 RANDOM VARIABLES

Consider a random experiment having sample space $S$. A random variable $X$ is a function that assigns a real value to each outcome in $S$. For any set of real numbers $A$, the probability that $X$ will assume a value that is contained in the set $A$ is equal to the probability that the outcome of the experiment is contained in $X^{-1}(A)$. That is,
$$P\{X \in A\} = P(X^{-1}(A)),$$
where $X^{-1}(A)$ is the event consisting of all points $s \in S$ such that $X(s) \in A$.

The distribution function $F$ of the random variable $X$ is defined for any real number $x$ by
$$F(x) = P\{X \le x\} = P\{X \in (-\infty, x]\}.$$
We shall denote $1 - F(x)$ by $\bar{F}(x)$, and so $\bar{F}(x) = P\{X > x\}$.

A random variable $X$ is said to be discrete if its set of possible values is countable. For discrete random variables,
$$F(x) = \sum_{y \le x} P\{X = y\}.$$
A random variable is called continuous if there exists a function $f(x)$, called the probability density function, such that
$$P\{X \in B\} = \int_B f(x)\,dx$$
for every set $B$. Since $F(x) = \int_{-\infty}^{x} f(y)\,dy$, it follows that
$$f(x) = \frac{d}{dx}F(x).$$

The joint distribution function $F$ of two random variables $X$ and $Y$ is defined by
$$F(x, y) = P\{X \le x, Y \le y\}.$$
The distribution functions of $X$ and $Y$, $F_X(x) = P\{X \le x\}$ and $F_Y(y) = P\{Y \le y\}$, can be obtained from $F(x, y)$ by making use of the continuity property of the probability operator. Specifically, let $y_n$, $n \ge 1$, denote an increasing sequence converging to $\infty$. Then, as the events $\{X \le x, Y \le y_n\}$, $n \ge 1$, are increasing and
$$\lim_{n\to\infty}\{X \le x, Y \le y_n\} = \bigcup_{n=1}^{\infty}\{X \le x, Y \le y_n\} = \{X \le x\},$$
it follows from the continuity property that
$$\lim_{n\to\infty} P\{X \le x, Y \le y_n\} = P\{X \le x\},$$
or, equivalently,
$$F_X(x) = \lim_{y\to\infty} F(x, y).$$
Similarly, $F_Y(y) = \lim_{x\to\infty} F(x, y)$.

The random variables $X$ and $Y$ are said to be independent if
$$F(x, y) = F_X(x)F_Y(y) \quad \text{for all } x \text{ and } y.$$
The random variables $X$ and $Y$ are said to be jointly continuous if there exists a function $f(x, y)$, called the joint probability density function, such that
$$P\{X \in A, Y \in B\} = \int_A\int_B f(x, y)\,dy\,dx$$
for all sets $A$ and $B$.

The joint distribution of any collection $X_1, X_2, \ldots, X_n$ of random variables is defined by
$$F(x_1, \ldots, x_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}.$$
Furthermore, the $n$ random variables are said to be independent if
$$F(x_1, \ldots, x_n) = \prod_{r=1}^{n} F_{X_r}(x_r),$$
where
$$F_{X_r}(x_r) = \lim_{x_j\to\infty,\ j\ne r} F(x_1, \ldots, x_n).$$

1.3 EXPECTED VALUE

The expectation or mean of the random variable $X$, denoted by $E[X]$, is defined by

(1.3.1)
$$E[X] = \int_{-\infty}^{\infty} x\,dF(x) = \begin{cases} \int_{-\infty}^{\infty} x f(x)\,dx & \text{if } X \text{ is continuous} \\ \sum_x x P\{X = x\} & \text{if } X \text{ is discrete,} \end{cases}$$
provided the above integral exists.

Equation (1.3.1) also defines the expectation of any function of $X$, say $h(X)$. Since $h(X)$ is itself a random variable, it follows from (1.3.1) that
$$E[h(X)] = \int_{-\infty}^{\infty} x\,dF_h(x),$$
where $F_h$ is the distribution function of $h(X)$. However, it can be shown that this is identical to $\int_{-\infty}^{\infty} h(x)\,dF(x)$. That is,

(1.3.2)
$$E[h(X)] = \int_{-\infty}^{\infty} h(x)\,dF(x).$$

The variance of the random variable $X$ is defined by
$$\operatorname{Var} X = E[(X - E[X])^2] = E[X^2] - E^2[X].$$

Two jointly distributed random variables $X$ and $Y$ are said to be uncorrelated if their covariance, defined by
$$\operatorname{Cov}(X, Y) = E[(X - EX)(Y - EY)] = E[XY] - E[X]E[Y],$$
is zero. It follows that independent random variables are uncorrelated. However, the converse need not be true. (The reader should think of an example.)

An important property of expectations is that the expectation of a sum of random variables is equal to the sum of the expectations:

(1.3.3)
$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i].$$
The corresponding property for variances is that

(1.3.4)
$$\operatorname{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j).$$

EXAMPLE 1.3(A) The Matching Problem. At a party $n$ people put their hats in the center of a room, where the hats are mixed together. Each person then randomly selects one. We are interested in the mean and variance of $X$, the number that select their own hat. To solve, we use the representation
$$X = X_1 + X_2 + \cdots + X_n,$$
where
$$X_i = \begin{cases} 1 & \text{if the } i\text{th person selects his or her own hat} \\ 0 & \text{otherwise.} \end{cases}$$
Now, as the $i$th person is equally likely to select any of the $n$ hats, it follows that $P\{X_i = 1\} = 1/n$, and so
$$E[X_i] = \frac{1}{n}, \qquad \operatorname{Var}(X_i) = \frac{1}{n}\left(1 - \frac{1}{n}\right) = \frac{n-1}{n^2}.$$
Also,
$$X_iX_j = \begin{cases} 1 & \text{if the } i\text{th and } j\text{th party goers both select their own hats} \\ 0 & \text{otherwise,} \end{cases}$$
and thus
$$E[X_iX_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{1}{n}\cdot\frac{1}{n-1}.$$
Hence,
$$\operatorname{Cov}(X_i, X_j) = \frac{1}{n(n-1)} - \left(\frac{1}{n}\right)^2 = \frac{1}{n^2(n-1)}.$$
Therefore, from (1.3.3) and (1.3.4),
$$E[X] = 1 \quad \text{and} \quad \operatorname{Var}(X) = \frac{n-1}{n} + 2\binom{n}{2}\frac{1}{n^2(n-1)} = 1.$$
Thus both the mean and variance of the number of matches are equal to 1. (See Example 1.5(F) for an explanation as to why these results are not surprising.)
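A quick numerical check of Example 1.3(A): a random selection of hats is a uniformly random permutation, and the number of matches is its number of fixed points. The sketch below (our own, with the arbitrary choices $n = 20$ and $10^5$ trials) gives sample mean and variance both close to 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def matches(n):
    """Number of fixed points of a uniformly random permutation of n hats."""
    perm = rng.permutation(n)
    return np.sum(perm == np.arange(n))

samples = np.array([matches(20) for _ in range(100_000)])
print("mean:", samples.mean())   # ~ 1
print("var :", samples.var())    # ~ 1
```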
EXAMPLE 1.3(B) Some Probability Identities. Let $A_1, A_2, \ldots, A_n$ denote events and define the indicator variables $I_j$, $j = 1, \ldots, n$, by
$$I_j = \begin{cases} 1 & \text{if } A_j \text{ occurs} \\ 0 & \text{otherwise.} \end{cases}$$
Letting
$$N = \sum_{j=1}^{n} I_j,$$
then $N$ is the number of the $A_j$, $1 \le j \le n$, that occur. A useful identity can be obtained by noting that

(1.3.5)
$$(1 - 1)^N = \sum_{i=0}^{n}\binom{N}{i}(-1)^i,$$
since $\binom{m}{i} = 0$ when $i > m$. But by the binomial theorem,

(1.3.6)
$$(1 - 1)^N = \begin{cases} 1 & \text{if } N = 0 \\ 0 & \text{if } N > 0. \end{cases}$$
Hence, if we let
$$I = \begin{cases} 1 & \text{if } N > 0 \\ 0 & \text{if } N = 0, \end{cases}$$
then (1.3.5) and (1.3.6) yield
$$1 - I = \sum_{i=0}^{n}\binom{N}{i}(-1)^i,$$
or

(1.3.7)
$$I = \sum_{i=1}^{n}\binom{N}{i}(-1)^{i+1}.$$
Taking expectations of both sides of (1.3.7) yields

(1.3.8)
$$E[I] = E[N] - E\left[\binom{N}{2}\right] + \cdots + (-1)^{n+1}E\left[\binom{N}{n}\right].$$
However,
$$E[I] = P\{N > 0\} = P\{\text{at least one of the } A_j \text{ occurs}\},$$
and
$$E[N] = E\left[\sum_j I_j\right] = \sum_i P(A_i), \qquad E\left[\binom{N}{2}\right] = E[\text{number of pairs of the } A_j \text{ that occur}] = \sum_{i<j} P(A_iA_j),$$
and, in general, by the same reasoning,
$$E\left[\binom{N}{i}\right] = E[\text{number of sets of size } i \text{ that occur}] = \sum_{j_1 < \cdots < j_i} P(A_{j_1}\cdots A_{j_i}).$$
Hence, (1.3.8) is a statement of the inclusion–exclusion identity
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_i P(A_i) - \sum_{i<j} P(A_iA_j) + \sum_{i<j<k} P(A_iA_jA_k) - \cdots + (-1)^{n+1}P(A_1A_2\cdots A_n).$$

Other useful identities can also be derived by this approach. For instance, suppose we want a formula for the probability that exactly $r$ of the events $A_1, \ldots, A_n$ occur. Then define
$$I_r = \begin{cases} 1 & \text{if } N = r \\ 0 & \text{otherwise} \end{cases}$$
and use the identity $\binom{N}{r}(1-1)^{N-r} = I_r$, or
$$I_r = \binom{N}{r}\sum_{i=0}^{n-r}\binom{N-r}{i}(-1)^i = \sum_{i=0}^{n-r}\binom{N}{r+i}\binom{r+i}{r}(-1)^i,$$
where the last equality uses $\binom{N}{r}\binom{N-r}{i} = \binom{N}{r+i}\binom{r+i}{r}$. Taking expectations of both sides of the above yields
$$E[I_r] = \sum_{i=0}^{n-r}(-1)^i\binom{r+i}{r}E\left[\binom{N}{r+i}\right],$$
or

(1.3.9)
$$P\{\text{exactly } r \text{ of the events } A_1, \ldots, A_n \text{ occur}\} = \sum_{i=0}^{n-r}(-1)^i\binom{r+i}{r}\sum_{j_1 < \cdots < j_{r+i}} P(A_{j_1}A_{j_2}\cdots A_{j_{r+i}}).$$

As an application of (1.3.9), suppose that $m$ balls are randomly put in $n$ boxes in such a way that, independent of the locations of the other balls, each ball is equally likely to go into any of the $n$ boxes. Let us compute the probability that exactly $r$ of the boxes are empty. By letting $A_i$ denote the event that the $i$th box is empty, we see from (1.3.9) that
$$P\{\text{exactly } r \text{ of the boxes are empty}\} = \sum_{i=0}^{n-r}(-1)^i\binom{r+i}{r}\binom{n}{r+i}\left(1 - \frac{r+i}{n}\right)^m,$$
where the above follows since $\sum_{j_1 < \cdots < j_{r+i}}$ consists of $\binom{n}{r+i}$ terms and each term in the sum is equal to the probability that a given set of $r + i$ boxes is empty.
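As a sanity check on (1.3.9), the following sketch (our own; the function name and the parameter values $n = 6$, $m = 10$, $r = 2$ are arbitrary) evaluates the empty-box formula and compares it against direct simulation.

```python
import numpy as np
from math import comb

def p_exactly_r_empty(n, m, r):
    """P{exactly r of n boxes empty} when m balls are placed uniformly,
    per identity (1.3.9)."""
    return sum((-1)**i * comb(r + i, r) * comb(n, r + i)
               * (1 - (r + i) / n)**m for i in range(n - r + 1))

rng = np.random.default_rng(3)
n, m, r = 6, 10, 2
sim = np.mean([len(set(rng.integers(n, size=m))) == n - r
               for _ in range(100_000)])       # exactly n-r boxes occupied
print(p_exactly_r_empty(n, m, r), sim)         # the two should agree closely
```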
Our next example illustrates what has been called the probabilistic method. This method, much employed and popularized by the mathematician Paul Erdős, attempts to solve deterministic problems by first introducing a probability structure and then employing probabilistic reasoning.

EXAMPLE 1.3(C). A graph is a set of elements, called nodes, and a set of (unordered) pairs of nodes, called edges. For instance, Figure 1.3.1 illustrates a graph with the set of nodes $N = \{1, 2, 3, 4, 5\}$ and the set of edges $E = \{(1,2), (1,3), (1,5), (2,3), (2,4), (3,4), (3,5)\}$. Show that for any graph there is a subset of nodes $A$ such that at least one-half of the edges have one of their nodes in $A$ and the other in $A^c$. (For instance, in the graph illustrated in Figure 1.3.1 we could take $A = \{1, 2, 4\}$.)

Figure 1.3.1. A graph.

Solution. Suppose that the graph contains $m$ edges, and arbitrarily number them as $1, 2, \ldots, m$. For any set of nodes $B$, if we let $C(B)$ denote the number of edges that have exactly one of their nodes in $B$, then the problem is to show that $\max_B C(B) \ge m/2$. To verify this, let us introduce probability by randomly choosing a set of nodes $S$ so that each node of the graph is independently in $S$ with probability 1/2. If we now let $X$ denote the number of edges in the graph that have exactly one of their nodes in $S$, then $X$ is a random variable whose set of possible values is all of the possible values of $C(B)$. Now, letting $X_i$ equal 1 if edge $i$ has exactly one of its nodes in $S$ and letting it be 0 otherwise, then
$$E[X] = E\left[\sum_{i=1}^{m} X_i\right] = \sum_{i=1}^{m} E[X_i] = m/2.$$
Since at least one of the possible values of a random variable must be at least as large as its mean, we can thus conclude that $C(B) \ge m/2$ for some set of nodes $B$. (In fact, provided that the graph is such that $C(B)$ is not constant, we can conclude that $C(B) > m/2$ for some set of nodes $B$.)

Problems 1.9 and 1.10 give further applications of the probabilistic method.
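The probabilistic-method argument of Example 1.3(C) also suggests an algorithm: sample random node subsets and keep the best cut. The sketch below (our own, using the edge list of Figure 1.3.1) confirms that the average cut size is $m/2$ and that some sampled cut attains at least $m/2$.

```python
import numpy as np

rng = np.random.default_rng(4)

# The graph of Figure 1.3.1, nodes 1..5.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5)]

def cut_size(S):
    """Number of edges with exactly one endpoint in S."""
    return sum((u in S) != (v in S) for u, v in edges)

sizes = []
for _ in range(10_000):
    # Each node is independently in S with probability 1/2.
    S = {v for v in range(1, 6) if rng.random() < 0.5}
    sizes.append(cut_size(S))

print("E[X] ~", np.mean(sizes), " (m/2 =", len(edges) / 2, ")")
print("largest cut found:", max(sizes))   # >= m/2
```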
1.4 MOMENT GENERATING, CHARACTERISTIC FUNCTIONS, AND LAPLACE TRANSFORMS

The moment generating function of $X$ is defined by
$$\psi(t) = E[e^{tX}] = \int e^{tx}\,dF(x).$$
All the moments of $X$ can be successively obtained by differentiating $\psi$ and then evaluating at $t = 0$. That is,
$$\psi'(t) = E[Xe^{tX}], \qquad \psi''(t) = E[X^2e^{tX}], \qquad \ldots, \qquad \psi^{(n)}(t) = E[X^ne^{tX}],$$
so that evaluating at $t = 0$ gives $\psi^{(n)}(0) = E[X^n]$, $n \ge 1$. It should be noted that we have assumed that it is justifiable to interchange the differentiation and integration operations. This is usually the case.

When a moment generating function exists, it uniquely determines the distribution. This is quite important because it enables us to characterize the probability distribution of a random variable by its moment generating function. Tables 1.4.1 and 1.4.2 list the moment generating functions, means, and variances of some common distributions.

Table 1.4.1 Discrete Probability Distributions

Binomial with parameters $n, p$, $0 \le p \le 1$: probability mass function $p(x) = \binom{n}{x}p^x(1-p)^{n-x}$, $x = 0, 1, \ldots, n$; moment generating function $\phi(t) = (pe^t + 1 - p)^n$; mean $np$; variance $np(1-p)$.

Poisson with parameter $\lambda > 0$: $p(x) = e^{-\lambda}\lambda^x/x!$, $x = 0, 1, 2, \ldots$; $\phi(t) = \exp\{\lambda(e^t - 1)\}$; mean $\lambda$; variance $\lambda$.

Geometric with parameter $0 \le p \le 1$: $p(x) = p(1-p)^{x-1}$, $x = 1, 2, \ldots$; $\phi(t) = \dfrac{pe^t}{1 - (1-p)e^t}$; mean $1/p$; variance $(1-p)/p^2$.

Negative binomial with parameters $r, p$: $p(x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}$, $x = r, r+1, \ldots$; $\phi(t) = \left(\dfrac{pe^t}{1 - (1-p)e^t}\right)^r$; mean $r/p$; variance $r(1-p)/p^2$.

Table 1.4.2 Continuous Probability Distributions

Uniform over $(a, b)$: probability density function $f(x) = 1/(b-a)$, $a < x < b$, and 0 otherwise; $\phi(t) = \dfrac{e^{tb} - e^{ta}}{t(b-a)}$; mean $(a+b)/2$; variance $(b-a)^2/12$.

Exponential with parameter $\lambda > 0$: $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$; $\phi(t) = \dfrac{\lambda}{\lambda - t}$; mean $1/\lambda$; variance $1/\lambda^2$.

Gamma with parameters $(n, \lambda)$, $\lambda > 0$: $f(x) = \dfrac{\lambda e^{-\lambda x}(\lambda x)^{n-1}}{(n-1)!}$, $x \ge 0$; $\phi(t) = \left(\dfrac{\lambda}{\lambda - t}\right)^n$; mean $n/\lambda$; variance $n/\lambda^2$.

Normal with parameters $(\mu, \sigma^2)$: $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/2\sigma^2}$, $-\infty < x < \infty$; $\phi(t) = \exp\{\mu t + \sigma^2t^2/2\}$; mean $\mu$; variance $\sigma^2$.

Beta with parameters $a, b > 0$: $f(x) = Cx^{a-1}(1-x)^{b-1}$, $0 < x < 1$, where $C = \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$; mean $\dfrac{a}{a+b}$; variance $\dfrac{ab}{(a+b)^2(a+b+1)}$.

EXAMPLE 1.4(A). Let $X$ and $Y$ be independent normal random variables with means $\mu_1$ and $\mu_2$ and respective variances $\sigma_1^2$ and $\sigma_2^2$. The moment generating function of their sum is given by
$$\psi_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]E[e^{tY}] \quad \text{(by independence)}$$
$$= \psi_X(t)\psi_Y(t) = \exp\{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2\},$$
where the last equality comes from Table 1.4.2. Thus the moment generating function of $X + Y$ is that of a normal random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. By uniqueness, this is the distribution of $X + Y$.

As the moment generating function of a random variable $X$ need not exist, it is theoretically convenient to define the characteristic function of $X$ by
$$\phi(t) = E[e^{itX}], \quad -\infty < t < \infty,$$
where $i = \sqrt{-1}$. It can be shown that $\phi$ always exists and, like the moment generating function, uniquely determines the distribution of $X$.

We may also define the joint moment generating function of the random variables $X_1, \ldots, X_n$ by
$$\psi(t_1, \ldots, t_n) = E\left[\exp\left\{\sum_{j=1}^{n} t_jX_j\right\}\right],$$
or the joint characteristic function by
$$\phi(t_1, \ldots, t_n) = E\left[\exp\left\{i\sum_{j=1}^{n} t_jX_j\right\}\right].$$
It may be proven that the joint moment generating function (when it exists) or the joint characteristic function uniquely determines the joint distribution.

EXAMPLE 1.4(B) The Multivariate Normal Distribution. Let $Z_1, \ldots, Z_n$ be independent standard normal random variables. If for some constants $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,
$$X_i = a_{i1}Z_1 + \cdots + a_{in}Z_n + \mu_i, \qquad i = 1, \ldots, m,$$
then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal distribution. Let us now consider the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that since $\sum_{i=1}^{m} t_iX_i$ is itself a linear combination of the independent normal random variables $Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are
$$E\left[\sum_{i=1}^{m} t_iX_i\right] = \sum_{i=1}^{m} t_i\mu_i$$
and
$$\operatorname{Var}\left(\sum_{i=1}^{m} t_iX_i\right) = \operatorname{Cov}\left(\sum_{i=1}^{m} t_iX_i,\ \sum_{j=1}^{m} t_jX_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{m} t_it_j\operatorname{Cov}(X_i, X_j).$$
Now, if $Y$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then
$$E[e^Y] = \psi_Y(t)\big|_{t=1} = e^{\mu + \sigma^2/2}.$$
Thus, we see that
$$\psi(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^{m} t_i\mu_i + \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} t_it_j\operatorname{Cov}(X_i, X_j)\right\},$$
which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined from a knowledge of the values of $E[X_i]$ and $\operatorname{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$.

When dealing with random variables that only assume nonnegative values, it is sometimes more convenient to use Laplace transforms rather than characteristic functions. The Laplace transform of the distribution $F$ is defined by
$$\tilde{F}(s) = \int_0^{\infty} e^{-sx}\,dF(x).$$
This integral exists for complex variables $s = a + bi$, where $a \ge 0$. As in the case of characteristic functions, the Laplace transform uniquely determines the distribution.

We may also define Laplace transforms for arbitrary functions in the following manner: the Laplace transform of the function $g$, denoted $\tilde{g}$, is defined by
$$\tilde{g}(s) = \int_0^{\infty} e^{-sx}\,dg(x),$$
provided the integral exists. It can be shown that $\tilde{g}$ determines $g$ up to an additive constant.
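The moment-from-MGF recipe at the start of this section is easy to mechanize with symbolic differentiation. The sketch below (our own illustration, using the sympy library) differentiates the Poisson moment generating function of Table 1.4.1 at $t = 0$ and recovers mean $\lambda$ and variance $\lambda$.

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)

# MGF of a Poisson(lam) random variable, from Table 1.4.1.
phi = sp.exp(lam * (sp.exp(t) - 1))

m1 = sp.diff(phi, t).subs(t, 0)        # E[X]   = phi'(0)
m2 = sp.diff(phi, t, 2).subs(t, 0)     # E[X^2] = phi''(0)
print(sp.simplify(m1))                 # lam
print(sp.simplify(m2 - m1**2))         # variance = lam
```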
1.5 CONDITIONAL EXPECTATION

If $X$ and $Y$ are discrete random variables, the conditional probability mass function of $X$, given $Y = y$, is defined, for all $y$ such that $P\{Y = y\} > 0$, by
$$P\{X = x \mid Y = y\} = \frac{P\{X = x, Y = y\}}{P\{Y = y\}}.$$
The conditional distribution function of $X$ given $Y = y$ is defined by
$$F(x \mid y) = P\{X \le x \mid Y = y\},$$
and the conditional expectation of $X$ given $Y = y$ by
$$E[X \mid Y = y] = \int x\,dF(x \mid y) = \sum_x xP\{X = x \mid Y = y\}.$$

If $X$ and $Y$ have a joint probability density function $f(x, y)$, the conditional probability density function of $X$, given $Y = y$, is defined for all $y$ such that $f_Y(y) > 0$ by
$$f(x \mid y) = \frac{f(x, y)}{f_Y(y)},$$
and the conditional probability distribution function of $X$, given $Y = y$, by
$$F(x \mid y) = P\{X \le x \mid Y = y\} = \int_{-\infty}^{x} f(u \mid y)\,du.$$
The conditional expectation of $X$, given $Y = y$, is defined, in this case, by
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} xf(x \mid y)\,dx.$$
Thus all definitions are exactly as in the unconditional case, except that all probabilities are now conditional on the event that $Y = y$.

Let us denote by $E[X \mid Y]$ that function of the random variable $Y$ whose value at $Y = y$ is $E[X \mid Y = y]$. An extremely useful property of conditional expectation is that for all random variables $X$ and $Y$

(1.5.1)
$$E[X] = E[E[X \mid Y]] = \int E[X \mid Y = y]\,dF_Y(y)$$
when the expectations exist. If $Y$ is a discrete random variable, then Equation (1.5.1) states
$$E[X] = \sum_y E[X \mid Y = y]P\{Y = y\},$$
while if $Y$ is continuous with density $f(y)$, then Equation (1.5.1) says
$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]f(y)\,dy.$$
We now give a proof of Equation (1.5.1) in the case where $X$ and $Y$ are both discrete random variables.

Proof of (1.5.1) when X and Y Are Discrete. We wish to show
$$E[X] = \sum_y E[X \mid Y = y]P\{Y = y\}.$$
We write the right-hand side of the above as
$$\sum_y E[X \mid Y = y]P\{Y = y\} = \sum_y\sum_x xP\{X = x \mid Y = y\}P\{Y = y\} = \sum_y\sum_x xP\{X = x, Y = y\}$$
$$= \sum_x x\sum_y P\{X = x, Y = y\} = \sum_x xP\{X = x\} = E[X],$$
and the result is obtained.

Thus from Equation (1.5.1) we see that $E[X]$ is a weighted average of the conditional expected values of $X$ given that $Y = y$, each of the terms $E[X \mid Y = y]$ being weighted by the probability of the event on which it is conditioned.

EXAMPLE 1.5(A) The Sum of a Random Number of Random Variables. Let $X_1, X_2, \ldots$ denote a sequence of independent and identically distributed random variables, and let $N$ denote a nonnegative integer valued random variable that is independent of the sequence $X_1, X_2, \ldots$. We shall compute the moment generating function of $Y = \sum_{i=1}^{N} X_i$ by first conditioning on $N$. Now
$$E\left[\exp\left\{t\sum_{1}^{N} X_i\right\}\ \Big|\ N = n\right] = E\left[\exp\left\{t\sum_{1}^{n} X_i\right\}\ \Big|\ N = n\right] = E\left[\exp\left\{t\sum_{1}^{n} X_i\right\}\right] \quad \text{(by independence)} = (\psi_X(t))^n,$$
where $\psi_X(t) = E[e^{tX}]$ is the moment generating function of $X$. Hence
$$E[e^{tY} \mid N] = (\psi_X(t))^N,$$
and so
$$\psi_Y(t) = E[e^{tY}] = E[(\psi_X(t))^N].$$
To compute the mean and variance of $Y = \sum_{1}^{N} X_i$, we differentiate $\psi_Y(t)$ as follows:
$$\psi_Y'(t) = E[N(\psi_X(t))^{N-1}\psi_X'(t)],$$
$$\psi_Y''(t) = E[N(N-1)(\psi_X(t))^{N-2}(\psi_X'(t))^2 + N(\psi_X(t))^{N-1}\psi_X''(t)].$$
Evaluating at $t = 0$ gives
$$E[Y] = E[NE[X]] = E[N]E[X]$$
and
$$E[Y^2] = E[N(N-1)E^2[X] + NE[X^2]] = E[N]\operatorname{Var}(X) + E[N^2]E^2[X].$$
Hence,
$$\operatorname{Var}(Y) = E[Y^2] - E^2[Y] = E[N]\operatorname{Var}(X) + E^2[X]\operatorname{Var}(N).$$

EXAMPLE 1.5(B). A miner is trapped in a mine containing three doors. The first door leads to a tunnel that takes him to safety after two hours of travel. The second door leads to a tunnel that returns him to the mine after three hours of travel. The third door leads to a tunnel that returns him to the mine after five hours. Assuming that the miner is at all times equally likely to choose any one of the doors, let us compute the moment generating function of $X$, the time when the miner reaches safety.

Let $Y$ denote the door initially chosen. Then

(1.5.2)
$$E[e^{tX}] = \frac{1}{3}\left(E[e^{tX} \mid Y = 1] + E[e^{tX} \mid Y = 2] + E[e^{tX} \mid Y = 3]\right).$$
Now given that $Y = 1$, it follows that $X = 2$, and so
$$E[e^{tX} \mid Y = 1] = e^{2t}.$$
Also, given that $Y = 2$, it follows that $X = 3 + X'$, where $X'$ is the number of additional hours to safety after returning to the mine. But once the miner returns to his cell the problem is exactly as before, and thus $X'$ has the same distribution as $X$. Therefore,
$$E[e^{tX} \mid Y = 2] = e^{3t}E[e^{tX}].$$
Similarly,
$$E[e^{tX} \mid Y = 3] = e^{5t}E[e^{tX}].$$
Substitution back into (1.5.2) yields
$$E[e^{tX}] = \frac{1}{3}\left(e^{2t} + e^{3t}E[e^{tX}] + e^{5t}E[e^{tX}]\right),$$
or
$$E[e^{tX}] = \frac{e^{2t}}{3 - e^{3t} - e^{5t}}.$$
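The mean and variance formulas of Example 1.5(A) can be checked by simulation. In the sketch below (our own; the particular choices $N \sim$ Poisson(3) and $X_i$ exponential with rate 2 are arbitrary), the sample variance of $Y$ matches $E[N]\operatorname{Var}(X) + E^2[X]\operatorname{Var}(N)$.

```python
import numpy as np

rng = np.random.default_rng(5)

lam, rate = 3.0, 2.0   # N ~ Poisson(lam), X_i ~ exponential(rate)

def sample_Y():
    n = rng.poisson(lam)
    return rng.exponential(1 / rate, size=n).sum()

ys = np.array([sample_Y() for _ in range(200_000)])
EX, VarX = 1 / rate, 1 / rate**2
EN, VarN = lam, lam     # Poisson: mean = variance
print("Var(Y) simulated:", ys.var())
print("Var(Y) theory   :", EN * VarX + EX**2 * VarN)
```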
Not only can we obtain expectations by first conditioning upon an appropriate random variable, but we may also use this approach to compute probabilities. To see this, let $E$ denote an arbitrary event and define the indicator random variable $X$ by
$$X = \begin{cases} 1 & \text{if } E \text{ occurs} \\ 0 & \text{if } E \text{ does not occur.} \end{cases}$$
It follows from the definition of $X$ that
$$E[X] = P(E), \qquad E[X \mid Y = y] = P(E \mid Y = y) \quad \text{for any random variable } Y.$$
Therefore, from Equation (1.5.1) we obtain that
$$P(E) = \int P(E \mid Y = y)\,dF_Y(y).$$

EXAMPLE 1.5(C). Suppose in the matching problem, Example 1.3(A), that those choosing their own hats depart, while the others (those without a match) put their selected hats in the center of the room, mix them up, and then reselect. If this process continues until each individual has his or her own hat, find $E[R_n]$, where $R_n$ is the number of rounds that are necessary.

We will now show that $E[R_n] = n$. The proof will be by induction on $n$, the number of individuals. As it is obvious for $n = 1$, assume that $E[R_k] = k$ for $k = 1, \ldots, n-1$. To compute $E[R_n]$, start by conditioning on $M$, the number of matches that occur in the first round. This gives
$$E[R_n] = \sum_{i=0}^{n} E[R_n \mid M = i]P\{M = i\}.$$
Now, given a total of $i$ matches in the initial round, the number of rounds needed will equal 1 plus the number of rounds that are required when $n - i$ people remain to be matched with their hats. Therefore,
$$E[R_n] = \sum_{i=0}^{n}(1 + E[R_{n-i}])P\{M = i\}$$
$$= 1 + E[R_n]P\{M = 0\} + \sum_{i=1}^{n} E[R_{n-i}]P\{M = i\}$$
$$= 1 + E[R_n]P\{M = 0\} + \sum_{i=1}^{n}(n - i)P\{M = i\} \quad \text{(by the induction hypothesis)}$$
$$= 1 + E[R_n]P\{M = 0\} + n(1 - P\{M = 0\}) - E[M]$$
$$= E[R_n]P\{M = 0\} + n(1 - P\{M = 0\}) \quad \text{(since } E[M] = 1\text{)},$$
which, since $P\{M = 0\} \ne 1$, gives $E[R_n] = n$ and proves the result.

EXAMPLE 1.5(D). Suppose that $X$ and $Y$ are independent random variables having respective distributions $F$ and $G$. Then the distribution of $X + Y$—which we denote by $F * G$ and call the convolution of $F$ and $G$—is given by
$$(F * G)(a) = P\{X + Y \le a\} = \int_{-\infty}^{\infty} P\{X + Y \le a \mid Y = y\}\,dG(y)$$
$$= \int_{-\infty}^{\infty} P\{X + y \le a \mid Y = y\}\,dG(y) = \int_{-\infty}^{\infty} F(a - y)\,dG(y).$$
We denote $F * F$ by $F_2$ and, in general, $F * F_{n-1} = F_n$. Thus $F_n$, the $n$-fold convolution of $F$ with itself, is the distribution of the sum of $n$ independent random variables, each having distribution $F$.
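For discrete distributions, the convolution of Example 1.5(D) reduces to the finite sum $P\{X + Y = a\} = \sum_y P\{X = a - y\}P\{Y = y\}$, which is exactly what a numerical convolution computes. A small sketch of our own, for two fair six-sided dice:

```python
import numpy as np

px = np.full(6, 1 / 6)                 # mass of one die, on values 1..6
psum = np.convolve(px, px)             # mass of the sum, on values 2..12

for total, p in zip(range(2, 13), psum):
    print(total, round(p, 4))          # peaks at 7 with probability 6/36
```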
EXAMPLE 1.5(E) The Ballot Problem. In an election, candidate A receives $n$ votes and candidate B receives $m$ votes, where $n > m$. Assuming that all orderings of the votes are equally likely, show that the probability that A is always ahead in the count of votes is $(n - m)/(n + m)$.

Solution. Let $P_{n,m}$ denote the desired probability. By conditioning on which candidate receives the last vote counted, we have
$$P_{n,m} = P\{\text{A always ahead} \mid \text{A receives last vote}\}\frac{n}{n+m} + P\{\text{A always ahead} \mid \text{B receives last vote}\}\frac{m}{n+m}.$$
Now it is easy to see that, given that A receives the last vote, the probability that A is always ahead is the same as if A had received a total of $n - 1$ and B a total of $m$ votes. As a similar result is true when we are given that B receives the last vote, we see from the above that

(1.5.3)
$$P_{n,m} = \frac{n}{n+m}P_{n-1,m} + \frac{m}{m+n}P_{n,m-1}.$$
We can now prove that
$$P_{n,m} = \frac{n-m}{n+m}$$
by induction on $n + m$. As it is obviously true when $n + m = 1$—that is, $P_{1,0} = 1$—assume it whenever $n + m = k$. Then when $n + m = k + 1$ we have, by (1.5.3) and the induction hypothesis,
$$P_{n,m} = \frac{n}{n+m}\cdot\frac{n-1-m}{n-1+m} + \frac{m}{m+n}\cdot\frac{n-m+1}{n+m-1} = \frac{n-m}{n+m}.$$
The result is thus proven.

The ballot problem has some interesting applications. For example, consider successive flips of a coin that always lands on "heads" with probability $p$, and let us determine the probability distribution of the first time, after beginning, that the total number of heads is equal to the total number of tails. The probability that the first time this occurs is at time $2n$ can be obtained by first conditioning on the total number of heads in the first $2n$ trials. This yields
$$P\{\text{first time equal} = 2n\} = P\{\text{first time equal} = 2n \mid n \text{ heads in first } 2n\}\binom{2n}{n}p^n(1-p)^n.$$
Now given a total of $n$ heads in the first $2n$ flips, it is easy to see that all possible orderings of the $n$ heads and $n$ tails are equally likely, and thus the above conditional probability is equivalent to the probability that in an election in which each candidate receives $n$ votes, one of the candidates is always ahead in the counting until the last vote (which ties them). But by conditioning on whoever receives the last vote, we see that this is just the probability in the ballot problem when $m = n - 1$. Hence
$$P\{\text{first time equal} = 2n\} = \frac{\binom{2n}{n}p^n(1-p)^n}{2n-1}.$$
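A direct check of the ballot problem (our own sketch; the values $n = 7$, $m = 4$ are arbitrary): count the fraction of random vote orderings in which A stays strictly ahead throughout.

```python
import numpy as np

rng = np.random.default_rng(6)

def A_always_ahead(n, m):
    """One random counting order of n A-votes (+1) and m B-votes (-1);
    True if A's running lead is positive after every vote."""
    votes = rng.permutation(np.r_[np.ones(n), -np.ones(m)])
    return np.all(np.cumsum(votes) > 0)

n, m = 7, 4
p = np.mean([A_always_ahead(n, m) for _ in range(100_000)])
print(p, (n - m) / (n + m))   # both ~ 3/11 = 0.2727
```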
EXAMPLE 1.5(F) The Matching Problem Revisited. Let us reconsider Example 1.3(A), in which $n$ individuals mix their hats up and then randomly make a selection. We shall compute the probability of exactly $k$ matches.

First let $E$ denote the event that no matches occur, and to make explicit the dependence on $n$ write $P_n = P(E)$. Upon conditioning on whether or not the first individual selects his or her own hat—call these events $M$ and $M^c$—we obtain
$$P_n = P(E) = P(E \mid M)P(M) + P(E \mid M^c)P(M^c).$$
Clearly, $P(E \mid M) = 0$, and so

(1.5.4)
$$P_n = P(E \mid M^c)\frac{n-1}{n}.$$
Now, $P(E \mid M^c)$ is the probability of no matches when $n - 1$ people select from a set of $n - 1$ hats that does not contain the hat of one of them. This can happen in either of two mutually exclusive ways: either there are no matches and the extra person does not select the extra hat (this being the hat of the person that chose first), or there are no matches and the extra person does select the extra hat. The probability of the first of these events is $P_{n-1}$, which is seen by regarding the extra hat as "belonging" to the extra person. Since the second event has probability $[1/(n-1)]P_{n-2}$, we have
$$P(E \mid M^c) = P_{n-1} + \frac{1}{n-1}P_{n-2},$$
and thus, from Equation (1.5.4),
$$P_n = \frac{n-1}{n}P_{n-1} + \frac{1}{n}P_{n-2},$$
or, equivalently,

(1.5.5)
$$P_n - P_{n-1} = -\frac{1}{n}(P_{n-1} - P_{n-2}).$$
However, clearly $P_1 = 0$ and $P_2 = 1/2$. Thus, from Equation (1.5.5),
$$P_3 - P_2 = -\frac{P_2 - P_1}{3} = -\frac{1}{3!}, \quad \text{so } P_3 = \frac{1}{2!} - \frac{1}{3!},$$
$$P_4 - P_3 = -\frac{P_3 - P_2}{4} = \frac{1}{4!}, \quad \text{so } P_4 = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!},$$
and, in general, we see that
$$P_n = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}.$$
To obtain the probability of exactly $k$ matches, we consider any fixed group of $k$ individuals. The probability that they, and only they, select their own hats is
$$\frac{1}{n}\cdot\frac{1}{n-1}\cdots\frac{1}{n-(k-1)}\,P_{n-k} = \frac{(n-k)!}{n!}P_{n-k},$$
where $P_{n-k}$ is the conditional probability that the other $n - k$ individuals, selecting among their own hats, have no matches. As there are $\binom{n}{k}$ choices of a set of $k$ individuals, the desired probability of exactly $k$ matches is
$$\binom{n}{k}\frac{(n-k)!}{n!}P_{n-k} = \frac{P_{n-k}}{k!} = \frac{\dfrac{1}{2!} - \dfrac{1}{3!} + \cdots + \dfrac{(-1)^{n-k}}{(n-k)!}}{k!},$$
which, for $n$ large, is approximately equal to $e^{-1}/k!$. Thus for $n$ large the number of matches has approximately the Poisson distribution with mean 1.

To understand this result better, recall that the Poisson distribution with mean $\lambda$ is the limiting distribution of the number of successes in $n$ independent trials, each resulting in a success with probability $p_n$, when $np_n \to \lambda$ as $n \to \infty$. Now if we let
$$X_i = \begin{cases} 1 & \text{if the } i\text{th person selects his or her own hat} \\ 0 & \text{otherwise,} \end{cases}$$
then the number of matches, $X$, can be regarded as the number of successes in $n$ trials when each is a success with probability $1/n$. Now, whereas the above result is not immediately applicable because these trials are not independent, it is true that the dependence is rather weak since, for example, $P\{X_i = 1\} = 1/n$ and $P\{X_i = 1 \mid X_j = 1\} = 1/(n-1)$, $j \ne i$. Hence we would certainly hope that the Poisson limit would still remain valid under this type of weak dependence. The results of this example show that it does.

EXAMPLE 1.5(G) A Packing Problem. Suppose that $n$ points are arranged in linear order, and suppose that a pair of adjacent points is chosen at random. That is, the pair $(i, i+1)$ is chosen with probability $1/(n-1)$, $i = 1, 2, \ldots, n-1$. We then continue to randomly choose pairs, disregarding any pair having a point previously chosen, until only isolated points remain. We are interested in the mean number of isolated points. For instance, if $n = 8$ and the random pairs are, in order of appearance, (2,3), (7,8), (3,4), and (4,5), then there will be two isolated points (the pair (3,4) is disregarded), as shown in Figure 1.5.1.

Figure 1.5.1. The points 1 through 8 with the accepted pairs (2,3), (4,5), and (7,8); points 1 and 6 are isolated.

If we let
$$I_{i,n} = \begin{cases} 1 & \text{if point } i \text{ is isolated} \\ 0 & \text{otherwise,} \end{cases}$$
then $\sum_{i=1}^{n} I_{i,n}$ represents the number of isolated points. Hence
$$E[\text{number of isolated points}] = \sum_{i=1}^{n} E[I_{i,n}] = \sum_{i=1}^{n} P_{i,n},$$
where $P_{i,n}$ is defined to be the probability that point $i$ is isolated when there are $n$ points. Let
$$P_n = P_{1,n} = P_{n,n}.$$
That is, $P_n$ is the probability that the extreme point $n$ (or 1) will be isolated. To derive an expression for $P_{i,n}$, note that we can consider the $n$ points as consisting of two contiguous segments, namely $1, 2, \ldots, i$ and $i, i+1, \ldots, n$. Since point $i$ will be isolated if and only if the right-hand point of the first segment and the left-hand point of the second segment are both isolated, we see that

(1.5.6)
$$P_{i,n} = P_iP_{n-i+1}.$$
Hence the $P_{i,n}$ will be determined if we can calculate the corresponding probabilities that extreme points will be isolated. To derive an expression for $P_n$, condition on the initial pair—say $(i, i+1)$—and note that this choice breaks the line into two independent segments, $1, 2, \ldots, i-1$ and $i+2, \ldots, n$. That is, if the initial pair is $(i, i+1)$, then the extreme point $n$ will be isolated if the extreme point of a set of $n - i - 1$ points is isolated. Hence we have
$$P_n = \sum_{i=1}^{n-1}\frac{P_{n-i-1}}{n-1} = \frac{P_1 + \cdots + P_{n-2}}{n-1},$$
or
$$(n-1)P_n = P_1 + \cdots + P_{n-2}.$$
Substituting $n - 1$ for $n$ gives
$$(n-2)P_{n-1} = P_1 + \cdots + P_{n-3},$$
and subtracting these two equations gives
$$(n-1)P_n - (n-2)P_{n-1} = P_{n-2},$$
or
$$P_n - P_{n-1} = -\frac{P_{n-1} - P_{n-2}}{n-1}.$$
Since $P_1 = 1$ and $P_2 = 0$, this yields
$$P_3 - P_2 = -\frac{P_2 - P_1}{2} = \frac{1}{2}, \quad \text{or } P_3 = \frac{1}{2!},$$
$$P_4 - P_3 = -\frac{P_3 - P_2}{3} = -\frac{1}{3!}, \quad \text{or } P_4 = \frac{1}{2!} - \frac{1}{3!},$$
and, in general,
$$P_n = \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{n-1}}{(n-1)!} = \sum_{j=0}^{n-1}\frac{(-1)^j}{j!}.$$
Thus, from (1.5.6),
$$P_{i,n} = \begin{cases} \displaystyle\sum_{j=0}^{n-1}\frac{(-1)^j}{j!} & i = 1, n \\[6pt] 0 & i = 2, n-1 \\[6pt] \displaystyle\left(\sum_{j=0}^{i-1}\frac{(-1)^j}{j!}\right)\left(\sum_{j=0}^{n-i}\frac{(-1)^j}{j!}\right) & 2 < i < n-1. \end{cases}$$
For $i$ and $n - i$ large we see from the above that $P_{i,n} \approx e^{-2}$, and, in fact, it can be shown from the above that $\sum_{i=1}^{n} P_{i,n}$—the expected number of isolated points—is approximately given by
$$\sum_{i=1}^{n} P_{i,n} \approx (n+2)e^{-2} \quad \text{for large } n.$$
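The packing example can be simulated as well. One hedged implementation detail in the sketch below (our own): processing the $n - 1$ adjacent pairs in a uniformly random order and skipping any pair that clashes with an earlier choice is equivalent in law to the sequential random choice described above.

```python
import numpy as np

rng = np.random.default_rng(7)

def isolated_points(n):
    """Accept adjacent pairs in uniformly random order, skipping clashes,
    until no adjacent free pair remains; return # of isolated points."""
    taken = np.zeros(n + 1, dtype=bool)        # 1-indexed points 1..n
    for i in rng.permutation(n - 1) + 1:       # pairs (i, i+1) in random order
        if not taken[i] and not taken[i + 1]:
            taken[i] = taken[i + 1] = True
    return n - int(taken.sum())

n = 50
est = np.mean([isolated_points(n) for _ in range(20_000)])
print(est, (n + 2) * np.exp(-2))   # simulation vs the (n+2)e^{-2} approximation
```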
EXAMPLE 1.5(H) A Reliability Example. Consider an $n$ component system that is subject to randomly occurring shocks. Suppose that each shock has a value that, independent of all else, is chosen from a distribution $G$. If a shock of value $x$ occurs, then each component that was working at the moment the shock arrived will, independently, instantaneously fail with probability $x$. We are interested in the distribution of $N$, the number of shocks necessary until all components have failed.

To compute $P\{N > k\}$, let $E_i$, $i = 1, \ldots, n$, denote the event that component $i$ has survived the first $k$ shocks. Then
$$P\{N > k\} = P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_i P(E_i) - \sum_{i<j} P(E_iE_j) + \cdots.$$
To compute the above probability, let $p_j$ denote the probability that a given set of $j$ components will all survive some arbitrary shock. Conditioning on the shock's value gives
$$p_j = \int P\{\text{set of } j \text{ survive} \mid \text{value is } x\}\,dG(x) = \int (1 - x)^j\,dG(x).$$
Since
$$P(E_i) = p_1^k, \qquad P(E_iE_j) = p_2^k, \qquad \ldots,$$
we see that
$$P\{N > k\} = np_1^k - \binom{n}{2}p_2^k + \binom{n}{3}p_3^k - \cdots = \sum_{i=1}^{n}\binom{n}{i}(-1)^{i+1}p_i^k.$$
The mean of $N$ can be computed from the above as follows:
$$E[N] = \sum_{k=0}^{\infty} P\{N > k\} = \sum_{i=1}^{n}\binom{n}{i}(-1)^{i+1}\sum_{k=0}^{\infty} p_i^k = \sum_{i=1}^{n}\binom{n}{i}\frac{(-1)^{i+1}}{1 - p_i}.$$
The reader should note that we have made use of the identity $E[N] = \sum_{k=0}^{\infty} P\{N > k\}$, valid for all nonnegative integer-valued random variables $N$ (see Problem 1.1).

EXAMPLE 1.5(I) Classifying a Poisson Number of Events. Suppose that we are observing events, and that $N$, the total number that occur, is a Poisson random variable with mean $\lambda$. Suppose also that each event that occurs is, independent of other events, classified as a type $j$ event with probability $p_j$, $j = 1, \ldots, k$, $\sum_{j=1}^{k} p_j = 1$. Let $N_j$ denote the number of type $j$ events that occur, $j = 1, \ldots, k$, and let us determine their joint probability mass function.

For any nonnegative integers $n_j$, $j = 1, \ldots, k$, let $n = \sum_{j=1}^{k} n_j$. Then, since $N = \sum_j N_j$, we have
$$P\{N_j = n_j,\ j = 1, \ldots, k\} = P\{N_j = n_j,\ j = 1, \ldots, k \mid N = n\}P\{N = n\} + P\{N_j = n_j,\ j = 1, \ldots, k \mid N \ne n\}P\{N \ne n\}$$
$$= P\{N_j = n_j,\ j = 1, \ldots, k \mid N = n\}P\{N = n\},$$
because the event is impossible when $N \ne n$. Now, given that there are a total of $N = n$ events, it follows, since each event is independently a type $j$ event with probability $p_j$, $1 \le j \le k$, that $N_1, N_2, \ldots, N_k$ has a multinomial distribution with parameters $n$ and $p_1, p_2, \ldots, p_k$. Therefore,
$$P\{N_j = n_j,\ j = 1, \ldots, k\} = \frac{n!}{n_1!\cdots n_k!}p_1^{n_1}\cdots p_k^{n_k}\,e^{-\lambda}\frac{\lambda^n}{n!} = \prod_{j=1}^{k} e^{-\lambda p_j}\frac{(\lambda p_j)^{n_j}}{n_j!}.$$
Thus we can conclude that the $N_j$ are independent Poisson random variables with respective means $\lambda p_j$, $j = 1, \ldots, k$.
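Example 1.5(I) is easy to verify empirically: thinning a Poisson number of events multinomially should give coordinate-wise Poisson counts that are uncorrelated. A sketch of our own, with the arbitrary choices $\lambda = 5$ and $p = (0.5, 0.3, 0.2)$:

```python
import numpy as np

rng = np.random.default_rng(8)

lam, p = 5.0, np.array([0.5, 0.3, 0.2])
counts = np.empty((200_000, 3), dtype=int)
for i in range(counts.shape[0]):
    n = rng.poisson(lam)
    counts[i] = rng.multinomial(n, p)   # classify each of the n events

print("means:", counts.mean(axis=0), " vs lam*p =", lam * p)
print("vars :", counts.var(axis=0))    # Poisson: variance = mean
print("corr(N1, N2):", np.corrcoef(counts[:, 0], counts[:, 1])[0, 1])  # ~ 0
```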
Conditional expectations given that $Y = y$ satisfy all of the properties of ordinary expectations, except that now all probabilities are conditioned on the event that $\{Y = y\}$. Hence, we have that
$$E\left[\sum_i X_i \mid Y = y\right] = \sum_i E[X_i \mid Y = y],$$
implying that
$$E\left[\sum_i X_i \mid Y\right] = \sum_i E[X_i \mid Y].$$
Also, from the equality $E[X] = E[E[X \mid Y]]$ we can conclude that
$$E[X \mid W = w] = E[E[X \mid W = w, Y] \mid W = w],$$
or, equivalently,
$$E[X \mid W] = E[E[X \mid W, Y] \mid W].$$
Also, we should note that the fundamental result $E[X] = E[E[X \mid Y]]$ remains valid even when $Y$ is a random vector.

1.5.1 Conditional Expectations and Bayes Estimators

Conditional expectations have important uses in the Bayesian theory of statistics. A classical problem in this area arises when one is to observe data $\mathbf{X} = (X_1, \ldots, X_n)$ whose distribution is determined by the value of a random variable $\theta$, which has a specified probability distribution (called the prior distribution). Based on the value of the data $\mathbf{X}$, a problem of interest is to estimate the unseen value of $\theta$. An estimator of $\theta$ can be any function $d(\mathbf{X})$ of the data, and in Bayesian statistics one often wants to choose $d(\mathbf{X})$ to minimize $E[(d(\mathbf{X}) - \theta)^2 \mid \mathbf{X}]$, the conditional expected squared distance between the estimator and the parameter. Using the facts that

(i) conditional on $\mathbf{X}$, $d(\mathbf{X})$ is a constant, and
(ii) for any random variable $W$, $E[(W - c)^2]$ is minimized when $c = E[W]$,

it follows that the estimator that minimizes $E[(d(\mathbf{X}) - \theta)^2 \mid \mathbf{X}]$, called the Bayes estimator, is given by
$$d(\mathbf{X}) = E[\theta \mid \mathbf{X}].$$
An estimator $d(\mathbf{X})$ is said to be an unbiased estimator of $\theta$ if
$$E[d(\mathbf{X}) \mid \theta] = \theta.$$
An important result in Bayesian statistics is that the only time that a Bayes estimator is unbiased is in the trivial case where it is equal to $\theta$ with probability 1. To prove this, we start with the following lemma.

Lemma 1.5.1
For any random variable $Y$ and random vector $\mathbf{Z}$,
$$E[(Y - E[Y \mid \mathbf{Z}])E[Y \mid \mathbf{Z}]] = 0.$$

Proof.
$$E[YE[Y \mid \mathbf{Z}]] = E[E[YE[Y \mid \mathbf{Z}] \mid \mathbf{Z}]] = E[E[Y \mid \mathbf{Z}]E[Y \mid \mathbf{Z}]],$$
where the final equality follows because, given $\mathbf{Z}$, $E[Y \mid \mathbf{Z}]$ is a constant, and so
$$E[YE[Y \mid \mathbf{Z}] \mid \mathbf{Z}] = E[Y \mid \mathbf{Z}]E[Y \mid \mathbf{Z}].$$
Since the final equality is exactly what we wanted to prove, the lemma follows.

PROPOSITION 1.5.2
If $P\{E[\theta \mid \mathbf{X}] = \theta\} \ne 1$, then the Bayes estimator $E[\theta \mid \mathbf{X}]$ is not unbiased.

Proof. Letting $Y = \theta$ and $\mathbf{Z} = \mathbf{X}$ in Lemma 1.5.1 yields

(1.5.7)
$$E[(\theta - E[\theta \mid \mathbf{X}])E[\theta \mid \mathbf{X}]] = 0.$$
Now let $Y = E[\theta \mid \mathbf{X}]$ and suppose that $Y$ is an unbiased estimator of $\theta$, so that $E[Y \mid \theta] = \theta$. Letting $\mathbf{Z} = \theta$, we obtain from Lemma 1.5.1 that

(1.5.8)
$$E[(E[\theta \mid \mathbf{X}] - \theta)\theta] = 0.$$
Upon adding Equations (1.5.7) and (1.5.8) we obtain
$$E[(\theta - E[\theta \mid \mathbf{X}])E[\theta \mid \mathbf{X}]] + E[(E[\theta \mid \mathbf{X}] - \theta)\theta] = 0,$$
or
$$E[(\theta - E[\theta \mid \mathbf{X}])(E[\theta \mid \mathbf{X}] - \theta)] = 0,$$
or
$$-E[(\theta - E[\theta \mid \mathbf{X}])^2] = 0,$$
implying that, with probability 1, $\theta - E[\theta \mid \mathbf{X}] = 0$.
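Proposition 1.5.2 can be seen concretely in the simplest conjugate setup. The sketch below (our own illustration, assuming Bernoulli($\theta$) data with a uniform prior, for which the Bayes estimator is the posterior mean $(\sum_i X_i + 1)/(n + 2)$) conditions on a fixed $\theta$ and exhibits the bias: $E[d(\mathbf{X}) \mid \theta] = (n\theta + 1)/(n + 2) \ne \theta$.

```python
import numpy as np

rng = np.random.default_rng(9)

n, theta = 10, 0.3          # condition on a fixed parameter value
est = []
for _ in range(100_000):
    x = rng.random(n) < theta                # Bernoulli(theta) sample
    est.append((x.sum() + 1) / (n + 2))      # posterior mean, uniform prior

# (n*theta + 1)/(n + 2) = 0.3333..., not theta = 0.3
print("E[d(X) | theta] ~", np.mean(est))
```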
1.6 THE EXPONENTIAL DISTRIBUTION, LACK OF MEMORY, AND HAZARD RATE FUNCTIONS

A continuous random variable $X$ is said to have an exponential distribution with parameter $\lambda$, $\lambda > 0$, if its probability density function is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0, \end{cases}$$
or, equivalently, if its distribution is
$$F(x) = \int_{-\infty}^{x} f(y)\,dy = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0. \end{cases}$$
The moment generating function of the exponential distribution is given by

(1.6.1)
$$\psi(t) = E[e^{tX}] = \int_0^{\infty} e^{tx}\lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda.$$
All the moments of $X$ can now be obtained by differentiating (1.6.1), and we leave it to the reader to verify that
$$E[X] = 1/\lambda, \qquad \operatorname{Var}(X) = 1/\lambda^2.$$
The usefulness of exponential random variables derives from the fact that they possess the memoryless property, where a random variable $X$ is said to be without memory, or memoryless, if

(1.6.2)
$$P\{X > s + t \mid X > t\} = P\{X > s\} \quad \text{for } s, t \ge 0.$$
If we think of $X$ as being the lifetime of some instrument, then (1.6.2) states that the probability that the instrument lives for at least $s + t$ hours, given that it has survived $t$ hours, is the same as the initial probability that it lives for at least $s$ hours. In other words, if the instrument is alive at time $t$, then the distribution of its remaining life is the original lifetime distribution. The condition (1.6.2) is equivalent to
$$\bar{F}(s + t) = \bar{F}(s)\bar{F}(t),$$
and since this is satisfied when $F$ is the exponential, we see that such random variables are memoryless.

EXAMPLE 1.6(A). Consider a post office having two clerks, and suppose that when A enters the system he discovers that B is being served by one of the clerks and C by the other. Suppose also that A is told that his service will begin as soon as either B or C leaves. If the amount of time a clerk spends with a customer is exponentially distributed with mean $1/\lambda$, what is the probability that, of the three customers, A is the last to leave the post office? The answer is obtained by reasoning as follows. Consider the time at which A first finds a free clerk. At this point either B or C would have just left and the other one would still be in service. However, by the lack of memory of the exponential, it follows that the amount of additional time that this other person has to spend in the post office is exponentially distributed with mean $1/\lambda$. That is, it is the same as if his service were just starting at this point. Hence, by symmetry, the probability that he finishes before A must equal 1/2.

EXAMPLE 1.6(B). Let $X_1, X_2, \ldots$ be independent and identically distributed continuous random variables with distribution $F$. We say that a record occurs at time $n$, $n > 0$, and has value $X_n$ if $X_n > \max(X_1, \ldots, X_{n-1})$, where $X_0 = -\infty$. That is, a record occurs each time a new high is reached. Let $T_i$ denote the time between the $i$th and the $(i+1)$th record. What is its distribution?

As a preliminary to computing the distribution of $T_i$, let us note that the record times of the sequence $X_1, X_2, \ldots$ will be the same as for the sequence $F(X_1), F(X_2), \ldots$, and since $F(X)$ has a uniform (0, 1) distribution (see Problem 1.2), it follows that the distribution of $T_i$ does not depend on the actual distribution $F$ (as long as it is continuous). So let us suppose that $F$ is the exponential distribution with parameter $\lambda = 1$.

To compute the distribution of $T_i$, we will condition on $R_i$, the $i$th record value. Now $R_1 = X_1$ is exponential with rate 1. $R_2$ has the distribution of an exponential with rate 1 given that it is greater than $R_1$. But by the lack of memory property of the exponential this means that $R_2$ has the same distribution as $R_1$ plus an independent exponential with rate 1. Hence $R_2$ has the same distribution as the sum of two independent exponential random variables with rate 1. The same argument shows that $R_i$ has the same distribution as the sum of $i$ independent exponentials with rate 1. But it is well known (see Problem 1.29) that such a random variable has the gamma distribution with parameters $(i, 1)$. That is, the density of $R_i$ is given by
$$f_{R_i}(t) = e^{-t}\frac{t^{i-1}}{(i-1)!}, \quad t \ge 0.$$
Hence, conditioning on $R_i$ yields
$$P\{T_i > k\} = \int_0^{\infty} P\{T_i > k \mid R_i = t\}e^{-t}\frac{t^{i-1}}{(i-1)!}\,dt = \int_0^{\infty}(1 - e^{-t})^k e^{-t}\frac{t^{i-1}}{(i-1)!}\,dt, \quad i \ge 1,$$
where the last equation follows since if the $i$th record value equals $t$, then none of the next $k$ values will be records if they are all less than $t$.
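The gamma distribution of the record values $R_i$ in Example 1.6(B) can be checked by scanning simulated exp(1) sequences. In the sketch below (our own; the scan length 20,000 is an arbitrary truncation, and the rare runs with fewer than three records are dropped), $R_3$ shows the gamma(3, 1) mean and variance of 3.

```python
import numpy as np

rng = np.random.default_rng(10)

def third_record(n=20_000):
    """Scan an iid exp(1) sequence and return the 3rd record value
    (nan in the rare case fewer than 3 records appear in n draws)."""
    x = rng.exponential(size=n)
    prev_max = np.maximum.accumulate(np.r_[-np.inf, x[:-1]])
    rec = x[x > prev_max]                  # record values, in order
    return rec[2] if rec.size >= 3 else np.nan

r3 = np.array([third_record() for _ in range(2000)])
print(np.nanmean(r3), np.nanvar(r3))       # gamma(3,1): mean 3, variance 3
```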
It turns out that not only is the exponential distribution "memoryless," but it is the unique distribution possessing this property. To see this, suppose that $X$ is memoryless and let $\bar{F}(x) = P\{X > x\}$. Then
$$\bar{F}(s + t) = \bar{F}(s)\bar{F}(t).$$
That is, $\bar{F}$ satisfies the functional equation
$$g(s + t) = g(s)g(t).$$
However, the only solutions of the above equation that satisfy any sort of reasonable condition (such as monotonicity, right or left continuity, or even measurability) are of the form
$$g(x) = e^{-\lambda x}$$
for some suitable value of $\lambda$. [A simple proof when $g$ is assumed right continuous is as follows. Since $g(s + t) = g(s)g(t)$, it follows that $g(2/n) = g(1/n + 1/n) = g^2(1/n)$. Repeating this yields $g(m/n) = g^m(1/n)$. Also, $g(1) = g(1/n + \cdots + 1/n) = g^n(1/n)$. Hence $g(m/n) = (g(1))^{m/n}$, which implies, since $g$ is right continuous, that $g(x) = (g(1))^x$. Since $g(1) = g^2(1/2) \ge 0$, we obtain $g(x) = e^{-\lambda x}$, where $\lambda = -\log(g(1))$.] Since a distribution function is always right continuous, we must have
$$\bar{F}(x) = e^{-\lambda x},$$
and so $X$ is exponentially distributed.

The memoryless property of the exponential is further illustrated by the failure rate function (also called the hazard rate function) of the exponential distribution. Consider a continuous random variable $X$ having distribution function $F$ and density $f$. The failure (or hazard) rate function $\lambda(t)$ is defined by

(1.6.3)
$$\lambda(t) = \frac{f(t)}{\bar{F}(t)}.$$
To interpret $\lambda(t)$, think of $X$ as being the lifetime of some item, and suppose that $X$ has survived for $t$ hours and we desire the probability that it will not survive for an additional time $dt$. That is, consider $P\{X \in (t, t + dt) \mid X > t\}$. Now
$$P\{X \in (t, t + dt) \mid X > t\} = \frac{P\{X \in (t, t + dt), X > t\}}{P\{X > t\}} = \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \approx \frac{f(t)\,dt}{\bar{F}(t)} = \lambda(t)\,dt.$$
That is, $\lambda(t)$ represents the probability intensity that a $t$-year-old item will fail.

Suppose now that the lifetime distribution is exponential. Then, by the memoryless property, it follows that the distribution of remaining life for a $t$-year-old item is the same as for a new item. Hence $\lambda(t)$ should be constant. This checks out, since
$$\lambda(t) = \frac{f(t)}{\bar{F}(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.$$
Thus, the failure rate function for the exponential distribution is constant. The parameter $\lambda$ is often referred to as the rate of the distribution. (Note that the rate is the reciprocal of the mean, and vice versa.)

It turns out that the failure rate function $\lambda(t)$ uniquely determines the distribution $F$. To prove this, we note that
$$\lambda(t) = \frac{-\dfrac{d}{dt}\bar{F}(t)}{\bar{F}(t)}.$$
Integration yields
$$\log\bar{F}(t) = -\int_0^t \lambda(s)\,ds + k,$$
or
$$\bar{F}(t) = c\exp\left\{-\int_0^t \lambda(s)\,ds\right\}.$$
Letting $t = 0$ shows that $c = 1$, and so
$$\bar{F}(t) = \exp\left\{-\int_0^t \lambda(s)\,ds\right\}.$$

1.7 SOME PROBABILITY INEQUALITIES

We start with an inequality known as Markov's inequality.

Lemma 1.7.1 Markov's Inequality
If $X$ is a nonnegative random variable, then for any $a > 0$
$$P\{X \ge a\} \le E[X]/a.$$

Proof. Let $I\{X \ge a\}$ be 1 if $X \ge a$ and 0 otherwise. Then, since $X \ge 0$, it is easy to see that
$$aI\{X \ge a\} \le X.$$
Taking expectations yields the result.

PROPOSITION 1.7.2 Chernoff Bounds
Let $X$ be a random variable with moment generating function $M(t) = E[e^{tX}]$. Then for $a > 0$
$$P\{X \ge a\} \le e^{-ta}M(t) \quad \text{for all } t > 0,$$
$$P\{X \le a\} \le e^{-ta}M(t) \quad \text{for all } t < 0.$$

Proof. For $t > 0$,
$$P\{X \ge a\} = P\{e^{tX} \ge e^{ta}\} \le E[e^{tX}]e^{-ta},$$
where the inequality follows from Markov's inequality. The proof for $t < 0$ is similar.

Since the Chernoff bounds hold for all $t$ in either the positive or negative quadrant, we obtain the best bound on $P\{X \ge a\}$ by using the $t$ that minimizes $e^{-ta}M(t)$.

EXAMPLE 1.7(A) Chernoff Bounds for Poisson Random Variables. If $X$ is Poisson with mean $\lambda$, then $M(t) = e^{\lambda(e^t - 1)}$. Hence, the Chernoff bound for $P\{X \ge j\}$ is
$$P\{X \ge j\} \le e^{\lambda(e^t - 1)}e^{-tj}, \quad t > 0.$$
The value of $t$ that minimizes the preceding is that value for which $e^t = j/\lambda$. Provided that $j/\lambda > 1$, this minimizing value will be positive, and so we obtain in this case that
$$P\{X \ge j\} \le e^{\lambda(j/\lambda - 1)}\left(\frac{\lambda}{j}\right)^j = \frac{e^{-\lambda}(e\lambda)^j}{j^j}.$$

Our next inequality relates to expectations rather than probabilities.

PROPOSITION 1.7.3 Jensen's Inequality
If $f$ is a convex function, then
$$E[f(X)] \ge f(E[X])$$
provided the expectations exist.

Proof. We will give a proof under the supposition that $f$ has a Taylor series expansion. Expanding about $\mu = E[X]$ and using the Taylor series with a remainder formula yields
$$f(x) = f(\mu) + f'(\mu)(x - \mu) + f''(\xi)(x - \mu)^2/2 \ge f(\mu) + f'(\mu)(x - \mu),$$
since $f''(\xi) \ge 0$ by convexity. Hence,
$$f(X) \ge f(\mu) + f'(\mu)(X - \mu).$$
Taking expectations gives
$$E[f(X)] \ge f(\mu) + f'(\mu)E[X - \mu] = f(\mu) = f(E[X]).$$
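To see how the Chernoff bound of Example 1.7(A) compares with the exact tail, the sketch below (our own; $\lambda = 4$ and the cutoffs are arbitrary) evaluates $e^{-\lambda}(e\lambda)^j/j^j$ against $P\{X \ge j\}$ computed by summing the Poisson mass function.

```python
import numpy as np
from math import exp, factorial

lam = 4.0
for j in (8, 12, 16):
    # Exact upper tail, summed far enough out for the mass to be negligible.
    exact = sum(exp(-lam) * lam**k / factorial(k) for k in range(j, j + 200))
    bound = exp(-lam) * (np.e * lam / j)**j       # e^{-lam}(e lam)^j / j^j
    print(j, f"exact={exact:.3e}  chernoff={bound:.3e}")
```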
Our next inequality relates to expectations rather than probabilities.

PROPOSITION 1.7.3 Jensen's Inequality
If $f$ is a convex function, then
$$E[f(X)] \geq f(E[X]),$$
provided the expectations exist.

Proof We will give a proof under the supposition that $f$ has a Taylor series expansion. Expanding about $\mu = E[X]$ and using the Taylor series with a remainder formula yields
$$f(x) = f(\mu) + f'(\mu)(x - \mu) + f''(\xi)(x - \mu)^2/2 \geq f(\mu) + f'(\mu)(x - \mu),$$
since $f''(\xi) \geq 0$ by convexity. Hence, taking expectations gives
$$E[f(X)] \geq f(\mu) + f'(\mu)E[X - \mu] = f(E[X]).$$

1.8 LIMIT THEOREMS

Some of the most important results in probability theory are in the form of limit theorems. The two most important are:

Strong Law of Large Numbers. If $X_1, X_2, \ldots$ are independent and identically distributed with mean $\mu$, then
$$P\left\{\lim_{n\to\infty}(X_1 + \cdots + X_n)/n = \mu\right\} = 1.$$
(A proof of the Strong Law of Large Numbers is given in the Appendix to this chapter.)

Central Limit Theorem. If $X_1, X_2, \ldots$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$, then
$$\lim_{n\to\infty} P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \leq a\right\} = \int_{-\infty}^a \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\, dx.$$

Thus if we let $S_n = \sum_{i=1}^n X_i$, where $X_1, X_2, \ldots$ are independent and identically distributed, then the Strong Law of Large Numbers states that, with probability 1, $S_n/n$ will converge to $E[X_1]$; whereas the Central Limit Theorem states that $S_n$ will have an asymptotic normal distribution as $n \to \infty$.
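Both statements are easy to see in action. The following Monte Carlo sketch (our own illustration, with exponential(1) summands so that $\mu = \sigma^2 = 1$; the parameter choices are arbitrary) shows $S_n/n$ settling near $\mu$ and the normalized sum matching the standard normal probability $\Phi(1) \approx 0.8413$.

```python
import random, statistics

random.seed(2)

def sample_mean(n):
    """S_n / n for n i.i.d. exponential(1) random variables."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Strong law: S_n / n approaches mu = 1
for n in (10, 1000, 100000):
    print(n, sample_mean(n))

# Central limit theorem: P{(S_n - n)/sqrt(n) <= 1} approaches Phi(1) ~ 0.8413
n, trials = 400, 20000
hits = sum((sum(random.expovariate(1.0) for _ in range(n)) - n) / n**0.5 <= 1.0
           for _ in range(trials))
print(hits / trials)
```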
1.9 STOCHASTIC PROCESSES

A stochastic process $X = \{X(t), t \in T\}$ is a collection of random variables. That is, for each $t$ in the index set $T$, $X(t)$ is a random variable. We often interpret $t$ as time and call $X(t)$ the state of the process at time $t$. If the index set $T$ is a countable set, we call $X$ a discrete-time stochastic process, and if $T$ is a continuum, we call it a continuous-time process.

Any realization of $X$ is called a sample path. For instance, if events are occurring randomly in time and $X(t)$ represents the number of events that occur in $[0, t]$, then Figure 1.9.1 gives a sample path of $X$ which corresponds to the initial event occurring at time 1, the next event at time 3, and the third at time 4, and no events anywhere else.

[Figure 1.9.1. A sample path of X(t) = number of events in [0, t].]

A continuous-time stochastic process $\{X(t), t \in T\}$ is said to have independent increments if for all $t_0 < t_1 < t_2 < \cdots < t_n$, the random variables
$$X(t_1) - X(t_0),\; X(t_2) - X(t_1),\; \ldots,\; X(t_n) - X(t_{n-1})$$
are independent. It is said to possess stationary increments if $X(t+s) - X(t)$ has the same distribution for all $t$. That is, it possesses independent increments if the changes in the process's value over nonoverlapping time intervals are independent; and it possesses stationary increments if the distribution of the change in value between any two points depends only on the distance between those points.

EXAMPLE 1.9(A) Consider a particle that moves along a set of $m+1$ nodes, labelled $0, 1, \ldots, m$, that are arranged around a circle (see Figure 1.9.2). At each step the particle is equally likely to move one position in either the clockwise or counterclockwise direction. That is, if $X_n$ is the position of the particle after its $n$th step, then
$$P\{X_{n+1} = i+1 \mid X_n = i\} = P\{X_{n+1} = i-1 \mid X_n = i\} = 1/2,$$
where $i+1 \equiv 0$ when $i = m$, and $i-1 \equiv m$ when $i = 0$. Suppose now that the particle starts at 0 and continues to move around according to the above rules until all the nodes $1, 2, \ldots, m$ have been visited. What is the probability that node $i$, $i = 1, \ldots, m$, is the last one visited?

[Figure 1.9.2. Particle moving around a circle.]

Solution. Surprisingly enough, the probability that node $i$ is the last node visited can be determined without any computations. To do so, consider the first time that the particle is at one of the two neighbors of node $i$, that is, the first time that the particle is at one of the nodes $i-1$ or $i+1$ (with $m+1 \equiv 0$). Suppose it is at node $i-1$ (the argument in the alternative situation is identical). Since neither node $i$ nor $i+1$ has yet been visited, it follows that $i$ will be the last node visited if, and only if, $i+1$ is visited before $i$. This is so because in order to visit $i+1$ before $i$ the particle will have to visit all the nodes on the counterclockwise path from $i-1$ to $i+1$ before it visits $i$. But the probability that a particle at node $i-1$ will visit $i+1$ before $i$ is just the probability that a particle will progress $m-1$ steps in a specified direction before progressing one step in the other direction. That is, it is equal to the probability that a gambler who starts with 1 unit, and wins 1 when a fair coin turns up heads and loses 1 when it turns up tails, will have his fortune go up by $m-1$ before he goes broke. Hence, as the preceding implies that the probability that node $i$ is the last node visited is the same for all $i$, and as these probabilities must sum to 1, we obtain
$$P\{i \text{ is the last node visited}\} = 1/m, \qquad i = 1, \ldots, m.$$

Remark. The argument used in the preceding example also shows that a gambler who is equally likely to either win or lose 1 unit on each gamble will be down $n$ before being up 1 with probability $1/(n+1)$; or, equivalently,
$$P\{\text{gambler is up 1 before being down } n\} = \frac{n}{n+1}.$$
Suppose now we want the probability that the gambler is up 2 before being down $n$. Upon conditioning on whether he reaches up 1 before down $n$, we obtain
$$\begin{aligned}
P\{\text{up 2 before down } n\} &= P\{\text{up 2 before down } n \mid \text{up 1 before down } n\}\frac{n}{n+1} \\
&= P\{\text{up 1 before down } n+1\}\frac{n}{n+1} = \frac{n+1}{n+2}\cdot\frac{n}{n+1} = \frac{n}{n+2}.
\end{aligned}$$
Repeating this argument yields that
$$P\{\text{gambler is up } k \text{ before being down } n\} = \frac{n}{n+k}.$$
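Both the $1/m$ answer and the gambler probabilities above can be checked by direct simulation; here is a minimal sketch (ours, with arbitrary parameter choices).

```python
import random

def last_node_visited(m):
    """Simulate the symmetric walk on nodes 0..m arranged in a circle,
    starting at 0; return the last of nodes 1..m to be visited."""
    pos, unvisited = 0, set(range(1, m + 1))
    while len(unvisited) > 1:
        pos = (pos + random.choice((1, -1))) % (m + 1)
        unvisited.discard(pos)
    return unvisited.pop()           # the one node never yet visited

random.seed(3)
m, trials = 5, 20000
counts = {i: 0 for i in range(1, m + 1)}
for _ in range(trials):
    counts[last_node_visited(m)] += 1
print({i: c / trials for i, c in counts.items()})  # each close to 1/m = 0.2
```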
EXAMPLE 1.9(B) Suppose in Example 1.9(A) that the particle is not equally likely to move in either direction, but rather moves at each step in the clockwise direction with probability $p$ and in the counterclockwise direction with probability $q = 1 - p$. If $1/2 < p < 1$, then we will show that the probability that state $i$ is the last state visited is a strictly increasing function of $i$, $i = 1, \ldots, m$.

To determine the probability that state $i$ is the last state visited, condition on whether $i-1$ or $i+1$ is visited first. Now, if $i-1$ is visited first, then the probability that $i$ will be the last state visited is the same as the probability that a gambler who wins each 1-unit bet with probability $q$ will have her cumulative fortune increase by $m-1$ before it decreases by 1. Note that this probability does not depend on $i$; let its value be $P_1$. Similarly, if $i+1$ is visited before $i-1$, then the probability that $i$ will be the last state visited is the same as the probability that a gambler who wins each 1-unit bet with probability $p$ will have her cumulative fortune increase by $m-1$ before it decreases by 1. Call this probability $P_2$, and note that since $p > q$, $P_1 < P_2$. Hence, we have
$$\begin{aligned}
P\{i \text{ is last state}\} &= P_1 P\{i-1 \text{ before } i+1\} + P_2(1 - P\{i-1 \text{ before } i+1\}) \\
&= (P_1 - P_2)P\{i-1 \text{ before } i+1\} + P_2.
\end{aligned}$$
Now, since the event that $i-1$ is visited before $i+1$ implies the event that $i-2$ is visited before $i$, it follows that
$$P\{i-1 \text{ before } i+1\} < P\{i-2 \text{ before } i\},$$
and thus we can conclude that
$$P\{i-1 \text{ is last state}\} < P\{i \text{ is last state}\}.$$

EXAMPLE 1.9(C) A graph consisting of a central vertex, labeled 0, and rays emanating from that vertex is called a star graph (see Figure 1.9.3). Let $r$ denote the number of rays of a star graph and let ray $i$ consist of $n_i$ vertices, $i = 1, \ldots, r$. Suppose that a particle moves along the vertices of the graph so that it is equally likely to move from whichever vertex it is presently at to any of the neighbors of that vertex, where two vertices are said to be neighbors if they are joined by an edge. Thus, for instance, when at vertex 0 the particle is equally likely to move to any of its $r$ neighbors. The vertices at the far ends of the rays are called leafs. What is the probability that, starting at node 0, the first leaf visited is the one on ray $i$, $i = 1, \ldots, r$?

[Figure 1.9.3. A star graph.]

Solution. Let $L$ denote the first leaf visited. Conditioning on $R$, the first ray visited, yields

(1.9.1) $$P\{L = i\} = \frac{1}{r}\sum_{j=1}^r P\{L = i \mid \text{first ray visited is } j\}.$$

Now, if $j$ is the first ray visited (that is, the first move of the particle is from vertex 0 to its neighboring vertex on ray $j$), then it follows, from the remark following Example 1.9(A), that with probability $1/n_j$ the particle will visit the leaf at the end of ray $j$ before returning to 0 (for this is the complement of the event that the gambler will be up 1 before being down $n_j - 1$). Also, if it does return to 0 before reaching the end of ray $j$, then the problem in essence begins anew. Hence, we obtain, upon conditioning on whether the particle reaches the end of ray $j$ before returning to 0, that
$$P\{L = i \mid \text{first ray visited is } i\} = 1/n_i + (1 - 1/n_i)P\{L = i\},$$
$$P\{L = i \mid \text{first ray visited is } j\} = (1 - 1/n_j)P\{L = i\}, \qquad j \neq i.$$
Substituting the preceding into Equation (1.9.1) yields
$$P\{L = i\} = \frac{1/n_i}{\sum_{j=1}^r 1/n_j}, \qquad i = 1, \ldots, r.$$
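The closed-form answer for the star graph is easy to test by simulation. The following sketch (our own illustration; state encoding and names are ours) tracks the particle as a pair (ray, distance from center).

```python
import random

def first_leaf(ray_lengths):
    """Simulate the walk on a star graph; ray j has ray_lengths[j] vertices
    (not counting the center).  Return the ray whose leaf is reached first."""
    ray, k = None, 0                       # k = distance from the center
    while True:
        if k == 0:
            ray = random.randrange(len(ray_lengths))  # center: pick a ray
            k = 1
        else:
            k += random.choice((1, -1))    # interior vertex: step in or out
        if k == ray_lengths[ray]:
            return ray                     # reached the leaf of this ray

random.seed(4)
lengths, trials = [2, 3, 5], 30000
counts = [0] * len(lengths)
for _ in range(trials):
    counts[first_leaf(lengths)] += 1
total = sum(1 / n for n in lengths)
print([c / trials for c in counts])
print([(1 / n) / total for n in lengths])  # theory: (1/n_i) / sum_j (1/n_j)
```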
PROBLEMS

1.1. Let $N$ denote a nonnegative integer-valued random variable. Show that
$$E[N] = \sum_{k=1}^\infty P\{N \geq k\} = \sum_{k=0}^\infty P\{N > k\}.$$
In general show that if $X$ is nonnegative with distribution $F$, then
$$E[X] = \int_0^\infty \bar F(x)\, dx \qquad \text{and} \qquad E[X^n] = \int_0^\infty n x^{n-1}\bar F(x)\, dx.$$

1.2. If $X$ is a continuous random variable having distribution $F$, show that:
(a) $F(X)$ is uniformly distributed over $(0, 1)$;
(b) if $U$ is a uniform $(0, 1)$ random variable, then $F^{-1}(U)$ has distribution $F$, where $F^{-1}(x)$ is that value of $y$ such that $F(y) = x$.

1.3. Let $X_n$ denote a binomial random variable with parameters $(n, p_n)$, $n \geq 1$. If $np_n \to \lambda$ as $n \to \infty$, show that
$$P\{X_n = i\} \to e^{-\lambda}\lambda^i/i! \qquad \text{as } n \to \infty.$$

1.4. Compute the mean and variance of a binomial random variable with parameters $n$ and $p$.

1.5. Suppose that $n$ independent trials, each of which results in outcome $1, 2, \ldots, r$ with respective probabilities $p_1, p_2, \ldots, p_r$, $\sum_{i=1}^r p_i = 1$, are performed. Let $N_i$ denote the number of trials resulting in outcome $i$.
(a) Compute the joint distribution of $N_1, \ldots, N_r$. This is called the multinomial distribution.
(b) Compute $\mathrm{Cov}(N_i, N_j)$.
(c) Compute the mean and variance of the number of outcomes that do not occur.

1.6. Let $X_1, X_2, \ldots$ be independent and identically distributed continuous random variables. We say that a record occurs at time $n$, $n > 0$, and has value $X_n$ if $X_n > \max(X_1, \ldots, X_{n-1})$, where $X_0 = -\infty$.
(a) Let $N_n$ denote the total number of records that have occurred up to (and including) time $n$. Compute $E[N_n]$ and $\mathrm{Var}(N_n)$.
(b) Let $T = \min\{n : n > 1 \text{ and a record occurs at } n\}$. Compute $P\{T > n\}$ and show that $P\{T < \infty\} = 1$ and $E[T] = \infty$.
(c) Let $T_y$ denote the time of the first record value greater than $y$. That is, $T_y = \min\{n : X_n > y\}$. Show that $T_y$ is independent of $X_{T_y}$. That is, the time of the first record value greater than $y$ is independent of that value. (It may seem more intuitive if you turn this last statement around.)

1.7. Let $X$ denote the number of white balls selected when $k$ balls are chosen at random from an urn containing $n$ white and $m$ black balls. Compute $E[X]$ and $\mathrm{Var}(X)$.

1.8. Let $X_1$ and $X_2$ be independent Poisson random variables with means $\lambda_1$ and $\lambda_2$.
(a) Find the distribution of $X_1 + X_2$.
(b) Compute the conditional distribution of $X_1$ given that $X_1 + X_2 = n$.

1.9. A round-robin tournament of $n$ contestants is one in which each of the $\binom{n}{2}$ pairs of contestants plays each other exactly once, with the outcome of any play being that one of the contestants wins and the other loses. Suppose the players are initially numbered $1, 2, \ldots, n$. The permutation $i_1, \ldots, i_n$ is called a Hamiltonian permutation if $i_1$ beats $i_2$, $i_2$ beats $i_3, \ldots,$ and $i_{n-1}$ beats $i_n$. Show that there is an outcome of the round-robin for which the number of Hamiltonians is at least $n!/2^{n-1}$. (Hint: Use the probabilistic method.)

1.10. Consider a round-robin tournament having $n$ contestants, and let $k$, $k < n$, be a positive integer such that $\binom{n}{k}[1 - 1/2^k]^{n-k} < 1$. Show that it is possible for the tournament outcome to be such that for every set of $k$ contestants there is a contestant who beat every member of this set.

1.11. If $X$ is a nonnegative integer-valued random variable, then the function $P(z)$, defined for $|z| \leq 1$ by
$$P(z) = E[z^X] = \sum_{j=0}^\infty z^j P\{X = j\},$$
is called the probability generating function of $X$.
(a) Show that
$$\frac{d^k}{dz^k}P(z)\Big|_{z=0} = k!\, P\{X = k\}.$$
(b) With 0 being considered even, show that
$$P\{X \text{ is even}\} = \frac{P(-1) + P(1)}{2}.$$
(c) If $X$ is binomial with parameters $n$ and $p$, show that
$$P\{X \text{ is even}\} = \frac{1 + (1 - 2p)^n}{2}.$$
(d) If $X$ is Poisson with mean $\lambda$, show that
$$P\{X \text{ is even}\} = \frac{1 + e^{-2\lambda}}{2}.$$
(e) If $X$ is geometric with parameter $p$, show that
$$P\{X \text{ is even}\} = \frac{1 - p}{2 - p}.$$
(f) If $X$ is a negative binomial random variable with parameters $r$ and $p$, show that
$$P\{X \text{ is even}\} = \frac{1}{2}\left[1 + (-1)^r\left(\frac{p}{2 - p}\right)^r\right].$$

1.12. If $P\{0 \leq X \leq a\} = 1$, show that $\mathrm{Var}(X) \leq a^2/4$.
1.13. Consider the following method of shuffling a deck of $n$ playing cards, numbered 1 through $n$. Take the top card from the deck and then replace it so that it is equally likely to be put under exactly $k$ cards, for $k = 0, 1, \ldots, n-1$. Continue doing this operation until the card that was initially on the bottom of the deck is now on top. Then do it one more time and stop.
(a) Suppose that at some point there are $k$ cards beneath the one that was originally on the bottom of the deck. Given this set of $k$ cards, explain why each of the possible $k!$ orderings is equally likely to be the ordering of the last $k$ cards.
(b) Conclude that the final ordering of the deck is equally likely to be any of the $n!$ possible orderings.
(c) Find the expected number of times the shuffling operation is performed.

1.14. A fair die is continually rolled until an even number has appeared on 10 distinct rolls. Let $X_i$ denote the number of rolls that land on side $i$. Determine
(a) $E[X_1]$;
(b) $E[X_2]$;
(c) the probability mass function of $X_1$;
(d) the probability mass function of $X_2$.

1.15. Let $F$ be a continuous distribution function and let $U$ be a uniform $(0, 1)$ random variable.
(a) If $X = F^{-1}(U)$, show that $X$ has distribution function $F$.
(b) Show that $-\log(U)$ is an exponential random variable with mean 1.

1.16. Let $f(x)$ and $g(x)$ be probability density functions, and suppose that for some constant $c$, $f(x) \leq cg(x)$ for all $x$. Suppose we can generate random variables having density function $g$, and consider the following algorithm.
Step 1: Generate $Y$, a random variable having density function $g$.
Step 2: Generate $U$, a uniform $(0, 1)$ random variable.
Step 3: If $U \leq \frac{f(Y)}{cg(Y)}$, set $X = Y$. Otherwise, go back to Step 1.
Assuming that successively generated random variables are independent, show that:
(a) $X$ has density function $f$;
(b) the number of iterations of the algorithm needed to generate $X$ is a geometric random variable with mean $c$.

1.17. Let $X_1, \ldots, X_n$ be independent and identically distributed continuous random variables having distribution $F$. Let $X_{i,n}$ denote the $i$th smallest of $X_1, \ldots, X_n$ and let $F_{i,n}$ be its distribution function. Show that:
(a) $F_{i,n}(x) = F(x)F_{i-1,n-1}(x) + \bar F(x)F_{i,n-1}(x)$;
(b) $F_{i,n-1}(x) = \dfrac{i}{n}F_{i+1,n}(x) + \dfrac{n-i}{n}F_{i,n}(x)$.
(Hints: For part (a) condition on whether $X_n \leq x$, and for part (b) start by conditioning on whether $X_n$ is among the $i$ smallest of $X_1, \ldots, X_n$.)

1.18. A coin, which lands on heads with probability $p$, is continually flipped. Compute the expected number of flips that are made until a string of $r$ heads in a row is obtained.

1.19. An urn contains $a$ white and $b$ black balls. After a ball is drawn, it is returned to the urn if it is white; but if it is black, it is replaced by a white ball from another urn. Let $M_n$ denote the expected number of white balls in the urn after the foregoing operation has been repeated $n$ times.
(a) Derive the recursive equation
$$M_{n+1} = \left(1 - \frac{1}{a+b}\right)M_n + 1.$$
(b) Use part (a) to prove that
$$M_n = a + b - b\left(1 - \frac{1}{a+b}\right)^n.$$
(c) What is the probability that the $(n+1)$st ball drawn is white?
1.20. A Continuous Random Packing Problem. Consider the interval $(0, x)$ and suppose that we pack in this interval random unit intervals (whose left-hand points are all uniformly distributed over $(0, x-1)$) as follows. Let the first such random interval be $I_1$. If $I_1, \ldots, I_k$ have already been packed in the interval, then the next random unit interval will be packed if it does not intersect any of the intervals $I_1, \ldots, I_k$, and the interval will be denoted by $I_{k+1}$. If it does intersect any of the intervals $I_1, \ldots, I_k$, we disregard it and look at the next random interval. The procedure is continued until there is no more room for additional unit intervals (that is, all the gaps between packed intervals are smaller than 1). Let $N(x)$ denote the number of unit intervals packed in $[0, x]$ by this method. For instance, if $x = 5$ and the successive random intervals are $(0.5, 1.5)$, $(3.1, 4.1)$, $(4, 5)$, $(1.7, 2.7)$, then $N(5) = 3$, the packed intervals being $(0.5, 1.5)$, $(3.1, 4.1)$, and $(1.7, 2.7)$. Let $M(x) = E[N(x)]$. Show that $M$ satisfies
$$M(x) = 0, \quad x < 1; \qquad M(x) = \frac{2}{x-1}\int_0^{x-1} M(y)\, dy + 1, \quad x > 1.$$

1.21. Let $U_1, U_2, \ldots$ be independent uniform $(0, 1)$ random variables, and let $N$ denote the smallest value of $n$, $n \geq 0$, such that
$$\prod_{i=1}^n U_i \geq e^{-\lambda} > \prod_{i=1}^{n+1} U_i, \qquad \text{where } \prod_{i=1}^0 U_i \equiv 1.$$
Show that $N$ is a Poisson random variable with mean $\lambda$. (Hint: Show by induction on $n$, conditioning on $U_1$, that $P\{N = n\} = e^{-\lambda}\lambda^n/n!$.)

1.22. The conditional variance of $X$, given $Y$, is defined by
$$\mathrm{Var}(X \mid Y) = E[(X - E[X \mid Y])^2 \mid Y].$$
Prove the conditional variance formula, namely,
$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y]).$$
Use this to obtain $\mathrm{Var}(X)$ in Example 1.5(B) and check your result by differentiating the generating function.

1.23. Consider a particle that moves along the set of integers in the following manner. If it is presently at $i$ then it next moves to $i+1$ with probability $p$ and to $i-1$ with probability $1-p$. Starting at 0, let $\alpha$ denote the probability that it ever reaches 1.
(a) Argue that $\alpha = p + (1-p)\alpha^2$.
(b) Show that
$$\alpha = \begin{cases} 1 & \text{if } p \geq 1/2 \\ p/(1-p) & \text{if } p < 1/2. \end{cases}$$
(c) Find the probability that the particle ever reaches $n$, $n > 0$.
(d) Suppose that $p < 1/2$ and also that the particle eventually reaches $n$, $n > 0$. If the particle is presently at $i$, $i < n$, and $n$ has not yet been reached, show that the particle will next move to $i+1$ with probability $1-p$ and to $i-1$ with probability $p$. That is, show that
$$P\{\text{next at } i+1 \mid \text{at } i \text{ and will reach } n\} = 1 - p.$$
(Note that the roles of $p$ and $1-p$ are interchanged when it is given that $n$ is eventually reached.)

1.24. In Problem 1.23, let $E[T]$ denote the expected time until the particle reaches 1.
(a) Show that
$$E[T] = \begin{cases} 1/(2p-1) & \text{if } p > 1/2 \\ \infty & \text{if } p \leq 1/2. \end{cases}$$
(b) Show that, for $p > 1/2$,
$$\mathrm{Var}(T) = \frac{4p(1-p)}{(2p-1)^3}.$$
(c) Find the expected time until the particle reaches $n$, $n > 0$.
(d) Find the variance of the time at which the particle reaches $n$, $n > 0$.

1.25. Consider a gambler who on each gamble is equally likely to either win or lose 1 unit. Starting with $i$, show that the expected time until the gambler's fortune is either 0 or $k$ is $i(k-i)$, $i = 0, \ldots, k$. (Hint: Let $M_i$ denote this expected time and condition on the result of the first gamble.)

1.26. In the ballot problem compute the probability that $A$ is never behind in the count of the votes.

1.27. Consider a gambler who wins or loses 1 unit on each play with respective probabilities $p$ and $1-p$. What is the probability that, starting with $n$ units, the gambler will play exactly $n + 2i$ games before going broke? (Hint: Make use of the ballot theorem.)

1.28. Verify the formulas given for the mean and variance of an exponential random variable.

1.29. If $X_1, X_2, \ldots, X_n$ are independent and identically distributed exponential random variables with parameter $\lambda$, show that $\sum_{i=1}^n X_i$ has a gamma distribution with parameters $(n, \lambda)$. That is, show that the density function of $\sum_{i=1}^n X_i$ is given by
$$f(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}, \qquad t \geq 0.$$

1.30. In Example 1.6(A) if server $i$ serves at an exponential rate $\lambda_i$, $i = 1, 2$, compute the probability that Mr. A is the last one out.

1.31. If $X$ and $Y$ are independent exponential random variables with respective means $1/\lambda_1$ and $1/\lambda_2$, compute the distribution of $Z = \min(X, Y)$. What is the conditional distribution of $Z$ given that $Z = X$?

1.32. Show that the only continuous solution of the functional equation $g(s+t) = g(s) + g(t)$ is $g(s) = cs$.
1.33. Derive the distribution of the $i$th record value for an arbitrary continuous distribution $F$ (see Example 1.6(B)).

1.34. If $X_1$ and $X_2$ are independent nonnegative continuous random variables, show that
$$P\{X_1 < X_2 \mid \min(X_1, X_2) = t\} = \frac{\lambda_1(t)}{\lambda_1(t) + \lambda_2(t)},$$
where $\lambda_i(t)$ is the failure rate function of $X_i$.

1.35. Let $X$ be a random variable with probability density function $f(x)$, and let $M(t) = E[e^{tX}]$ be its moment generating function. The tilted density function $f_t$ is defined by
$$f_t(x) = \frac{e^{tx}f(x)}{M(t)}.$$
Let $X_t$ have density function $f_t$.
(a) Show that for any function $h(x)$,
$$E[h(X)] = M(t)E[\exp\{-tX_t\}h(X_t)].$$
(b) Show that, for $t > 0$,
$$P\{X > a\} \leq M(t)e^{-ta}P\{X_t > a\}.$$
(c) Show that if $P\{X_{t^*} > a\} = 1/2$ then
$$\min_t M(t)e^{-ta} = M(t^*)e^{-t^*a}.$$

1.36. Use Jensen's inequality to prove that the arithmetic mean is at least as large as the geometric mean. That is, for nonnegative $x_i$, show that
$$\frac{1}{n}\sum_{i=1}^n x_i \geq \left(\prod_{i=1}^n x_i\right)^{1/n}.$$

1.37. Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed continuous random variables. Say that a peak occurs at time $n$ if $X_{n-1} < X_n > X_{n+1}$. Argue that the proportion of time that a peak occurs is, with probability 1, equal to 1/3.

1.38. In Example 1.9(A), determine the expected number of steps until all the states $1, 2, \ldots, m$ are visited. (Hint: Let $X_i$ denote the number of additional steps after $i$ of these states have been visited until a total of $i+1$ of them have been visited, $i = 0, 1, \ldots, m-1$, and make use of Problem 1.25.)

1.39. A particle moves along the vertices $0, 1, \ldots, n$ of a linear graph so that at each step it is equally likely to move to any of its neighbors. Starting at 0, show that the expected number of steps it takes to reach $n$ is $n^2$. (Hint: Let $T_i$ denote the number of steps it takes to go from vertex $i-1$ to vertex $i$, $i = 1, \ldots, n$. Determine $E[T_i]$ recursively, first for $i = 1$, then $i = 2$, and so on.)

1.40. Suppose that $r = 3$ in Example 1.9(C) and find the probability that the leaf on the ray of size $n_1$ is the last leaf to be visited.

1.41. Consider a star graph consisting of a central vertex and $r$ rays, with one ray consisting of $m$ vertices and the other $r-1$ all consisting of $n$ vertices. Let $P_r$ denote the probability that the leaf on the ray of $m$ vertices is the last leaf visited by a particle that starts at 0 and at each step is equally likely to move to any of its neighbors.
(a) Find $P_2$.
(b) Express $P_r$ in terms of $P_{r-1}$.

1.42. Let $Y_1, Y_2, \ldots$ be independent and identically distributed with
$$P\{Y_n = 0\} = \alpha, \qquad P\{Y_n > y\} = (1-\alpha)e^{-y}, \quad y > 0.$$
Define the random variables $X_n$, $n \geq 0$, by $X_0 = 0$ and
$$X_{n+1} = \begin{cases} X_n & \text{if } Y_{n+1} = 0 \\ Y_{n+1} & \text{if } Y_{n+1} > 0. \end{cases}$$
Prove that
$$P\{X_n = 0\} = \alpha^n, \qquad P\{X_n > x\} = (1 - \alpha^n)e^{-x}, \quad x > 0.$$

1.43. For a nonnegative random variable $X$, show that for $a > 0$,
$$P\{X \geq a\} \leq E[X^n]/a^n.$$
Then use this result to show that $n! \geq (n/e)^n$.

REFERENCES

References 6 and 7 are elementary introductions to probability and its applications. Rigorous approaches to probability and stochastic processes, based on measure theory, are given in References 2, 4, 5, and 8. Example 1.3(C) is taken from Reference 1. Proposition 1.5.2 is due to Blackwell and Girshick (Reference 3).

1. N. Alon, J. Spencer, and P. Erdős, The Probabilistic Method, Wiley, New York, 1992.
2. P. Billingsley, Probability and Measure, 3rd ed., Wiley, New York, 1995.
3. D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, Wiley, New York, 1954.
4. L. Breiman, Probability, Addison-Wesley, Reading, MA, 1968.
5. R. Durrett, Probability: Theory and Examples, Brooks/Cole, California, 1991.
6. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, Wiley, New York, 1957.
7. S. M. Ross, A First Course in Probability, 4th ed., Macmillan, New York, 1994.
8. D. Williams, Probability with Martingales, Cambridge University Press, Cambridge, England, 1991.

APPENDIX
The Strong Law of Large Numbers

If $X_1, X_2, \ldots$ is a sequence of independent and identically distributed random variables with mean $\mu$, then
$$P\left\{\lim_{n\to\infty}\frac{X_1 + X_2 + \cdots + X_n}{n} = \mu\right\} = 1.$$

Although the theorem can be proven without this assumption, our proof of the strong law of large numbers will assume that the random variables $X_i$ have a finite fourth moment. That is, we will suppose that $E[X_i^4] = K < \infty$.

Proof of the Strong Law of Large Numbers. To begin, assume that $\mu$, the mean of the $X_i$, is equal to 0. Let $S_n = \sum_{i=1}^n X_i$ and consider
$$E[S_n^4] = E[(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)].$$
Expanding the right side of the above will result in terms of the form
$$X_i^4, \quad X_i^3 X_j, \quad X_i^2 X_j^2, \quad X_i^2 X_j X_k, \quad X_i X_j X_k X_l,$$
where $i, j, k, l$ are all different. As all the $X_i$ have mean 0, it follows by independence that
$$E[X_i^3 X_j] = E[X_i^3]E[X_j] = 0,$$
$$E[X_i^2 X_j X_k] = E[X_i^2]E[X_j]E[X_k] = 0,$$
$$E[X_i X_j X_k X_l] = 0.$$
Now, for a given pair $i$ and $j$ there will be $\binom{4}{2} = 6$ terms in the expansion that will equal $X_i^2 X_j^2$. Hence, it follows upon expanding the above product and taking expectations term by term that
$$E[S_n^4] = nE[X_i^4] + 6\binom{n}{2}E[X_i^2 X_j^2] = nK + 3n(n-1)E[X_i^2]E[X_j^2],$$
where we have once again made use of the independence assumption. Now, since
$$0 \leq \mathrm{Var}(X_i^2) = E[X_i^4] - (E[X_i^2])^2,$$
we see that
$$(E[X_i^2])^2 \leq E[X_i^4] = K.$$
Therefore, from the preceding we have that
$$E[S_n^4] \leq nK + 3n(n-1)K,$$
which implies that
$$E[S_n^4/n^4] \leq \frac{K}{n^3} + \frac{3K}{n^2}.$$
Therefore, it follows that

(*) $$\sum_{n=1}^\infty E[S_n^4/n^4] < \infty.$$

Now, for any $\varepsilon > 0$ it follows from the Markov inequality that
$$P\{S_n^4/n^4 > \varepsilon\} \leq \frac{E[S_n^4/n^4]}{\varepsilon},$$
and thus from (*)
$$\sum_{n=1}^\infty P\{S_n^4/n^4 > \varepsilon\} < \infty,$$
which implies by the Borel-Cantelli lemma that, with probability 1, $S_n^4/n^4 > \varepsilon$ for only finitely many $n$. As this is true for any $\varepsilon > 0$, we can thus conclude that, with probability 1,
$$\lim_{n\to\infty} S_n^4/n^4 = 0.$$
But if $S_n^4/n^4 = (S_n/n)^4$ goes to 0 then so must $S_n/n$, and so we have proven that, with probability 1,
$$S_n/n \to 0 \qquad \text{as } n \to \infty.$$
When $\mu$, the mean of the $X_i$, is not equal to 0, we can apply the preceding argument to the random variables $X_i - \mu$ to obtain that, with probability 1,
$$\lim_{n\to\infty}\sum_{i=1}^n (X_i - \mu)/n = 0,$$
or equivalently,
$$\lim_{n\to\infty}\sum_{i=1}^n X_i/n = \mu,$$
which proves the result.

CHAPTER 2
The Poisson Process

2.1 THE POISSON PROCESS

A stochastic process $\{N(t), t \geq 0\}$ is said to be a counting process if $N(t)$ represents the total number of "events" that have occurred up to time $t$. Hence, a counting process $N(t)$ must satisfy:
(i) $N(t) \geq 0$.
(ii) $N(t)$ is integer valued.
(iii) If $s < t$, then $N(s) \leq N(t)$.
(iv) For $s < t$, $N(t) - N(s)$ equals the number of events that have occurred in the interval $(s, t]$.

A counting process is said to possess independent increments if the numbers of events that occur in disjoint time intervals are independent. For example, this means that the number of events that have occurred by time $t$ (that is, $N(t)$) must be independent of the number of events occurring between times $t$ and $t+s$ (that is, $N(t+s) - N(t)$).

A counting process is said to possess stationary increments if the distribution of the number of events that occur in any interval of time depends only on the length of the time interval. In other words, the process has stationary increments if the number of events in the interval $(t_1 + s, t_2 + s]$ (that is, $N(t_2+s) - N(t_1+s)$) has the same distribution as the number of events in the interval $(t_1, t_2]$ (that is, $N(t_2) - N(t_1)$) for all $t_1 < t_2$ and $s > 0$.

One of the most important types of counting processes is the Poisson process, which is defined as follows.
Definition 2.1.1
The counting process $\{N(t), t \geq 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda > 0$, if
(i) $N(0) = 0$.
(ii) The process has independent increments.
(iii) The number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$. That is, for all $s, t \geq 0$,
$$P\{N(t+s) - N(s) = n\} = e^{-\lambda t}\frac{(\lambda t)^n}{n!}, \qquad n = 0, 1, \ldots.$$

Note that it follows from condition (iii) that a Poisson process has stationary increments and also that $E[N(t)] = \lambda t$, which explains why $\lambda$ is called the rate of the process.

In order to determine if an arbitrary counting process is actually a Poisson process, we must show that conditions (i), (ii), and (iii) are satisfied. Condition (i), which simply states that the counting of events begins at time $t = 0$, and condition (ii) can usually be directly verified from our knowledge of the process. However, it is not at all clear how we would determine that condition (iii) is satisfied, and for this reason an equivalent definition of a Poisson process would be useful.

As a prelude to giving a second definition of a Poisson process, we shall define the concept of a function $f$ being $o(h)$.

Definition
The function $f$ is said to be $o(h)$ if
$$\lim_{h\to 0}\frac{f(h)}{h} = 0.$$

We are now in a position to give an alternative definition of a Poisson process.

Definition 2.1.2
The counting process $\{N(t), t \geq 0\}$ is said to be a Poisson process with rate $\lambda$, $\lambda > 0$, if
(i) $N(0) = 0$.
(ii) The process has stationary and independent increments.
(iii) $P\{N(h) = 1\} = \lambda h + o(h)$.
(iv) $P\{N(h) \geq 2\} = o(h)$.

THEOREM 2.1.1
Definitions 2.1.1 and 2.1.2 are equivalent.

Proof We first show that Definition 2.1.2 implies Definition 2.1.1. To do this, let
$$P_n(t) = P\{N(t) = n\}.$$
We derive a differential equation for $P_0(t)$ in the following manner:
$$\begin{aligned}
P_0(t+h) &= P\{N(t+h) = 0\} \\
&= P\{N(t) = 0, N(t+h) - N(t) = 0\} \\
&= P\{N(t) = 0\}P\{N(t+h) - N(t) = 0\} \\
&= P_0(t)[1 - \lambda h + o(h)],
\end{aligned}$$
where the final two equations follow from Assumption (ii) and the fact that (iii) and (iv) imply that $P\{N(h) = 0\} = 1 - \lambda h + o(h)$. Hence,
$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda P_0(t) + \frac{o(h)}{h}.$$
Letting $h \to 0$ yields
$$P_0'(t) = -\lambda P_0(t) \qquad \text{or} \qquad \frac{P_0'(t)}{P_0(t)} = -\lambda,$$
which implies, by integration,
$$\log P_0(t) = -\lambda t + c \qquad \text{or} \qquad P_0(t) = Ke^{-\lambda t}.$$
Since $P_0(0) = P\{N(0) = 0\} = 1$, we arrive at

(2.1.1) $$P_0(t) = e^{-\lambda t}.$$

Similarly, for $n \geq 1$,
$$\begin{aligned}
P_n(t+h) &= P\{N(t+h) = n\} \\
&= P\{N(t) = n, N(t+h) - N(t) = 0\} + P\{N(t) = n-1, N(t+h) - N(t) = 1\} \\
&\quad + P\{N(t+h) = n, N(t+h) - N(t) \geq 2\}.
\end{aligned}$$
However, by (iv), the last term in the above is $o(h)$; hence, by using (ii), we obtain
$$\begin{aligned}
P_n(t+h) &= P_n(t)P_0(h) + P_{n-1}(t)P_1(h) + o(h) \\
&= (1 - \lambda h)P_n(t) + \lambda h P_{n-1}(t) + o(h).
\end{aligned}$$
Thus,
$$\frac{P_n(t+h) - P_n(t)}{h} = -\lambda P_n(t) + \lambda P_{n-1}(t) + \frac{o(h)}{h}.$$
Letting $h \to 0$,
$$P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t),$$
or, equivalently,
$$e^{\lambda t}(P_n'(t) + \lambda P_n(t)) = \lambda e^{\lambda t}P_{n-1}(t).$$
Hence,

(2.1.2) $$\frac{d}{dt}\left(e^{\lambda t}P_n(t)\right) = \lambda e^{\lambda t}P_{n-1}(t).$$

Now by (2.1.1) we have, when $n = 1$,
$$\frac{d}{dt}\left(e^{\lambda t}P_1(t)\right) = \lambda,$$
or
$$P_1(t) = (\lambda t + c)e^{-\lambda t},$$
which, since $P_1(0) = 0$, yields
$$P_1(t) = \lambda t e^{-\lambda t}.$$
To show that $P_n(t) = e^{-\lambda t}(\lambda t)^n/n!$, we use mathematical induction and hence first assume it for $n-1$. Then by (2.1.2),
$$\frac{d}{dt}\left(e^{\lambda t}P_n(t)\right) = \frac{\lambda(\lambda t)^{n-1}}{(n-1)!},$$
implying that
$$e^{\lambda t}P_n(t) = \frac{(\lambda t)^n}{n!} + c,$$
or, since $P_n(0) = P\{N(0) = n\} = 0$,
$$P_n(t) = e^{-\lambda t}\frac{(\lambda t)^n}{n!}.$$
Thus Definition 2.1.2 implies Definition 2.1.1. We will leave it for the reader to prove the reverse.
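The system of differential equations derived in the proof is easy to check numerically. The following sketch (ours, not from the text; step size and parameters are arbitrary) integrates $P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t)$ with a crude Euler scheme and compares the result with the Poisson probabilities $e^{-\lambda t}(\lambda t)^n/n!$.

```python
import math

lam, t_end, dt, nmax = 2.0, 1.5, 1e-4, 8
P = [1.0] + [0.0] * nmax            # P[n] approximates P_n(t); P_0(0) = 1
t = 0.0
while t < t_end:
    newP = [P[0] + dt * (-lam * P[0])]                   # P_0' = -lam P_0
    for n in range(1, nmax + 1):
        newP.append(P[n] + dt * (-lam * P[n] + lam * P[n - 1]))
    P, t = newP, t + dt

for n in range(5):
    exact = math.exp(-lam * t_end) * (lam * t_end) ** n / math.factorial(n)
    print(n, round(P[n], 5), round(exact, 5))
```

The two columns agree to the accuracy of the discretization, as the induction in the proof guarantees they must.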
Remark. The result that $N(t)$ has a Poisson distribution is a consequence of the Poisson approximation to the binomial distribution. To see this, subdivide the interval $[0, t]$ into $k$ equal parts where $k$ is very large (Figure 2.1.1). First we note that the probability of having 2 or more events in any subinterval goes to 0 as $k \to \infty$. This follows from
$$\begin{aligned}
P\{\text{2 or more events in some subinterval}\} &\leq \sum_{i=1}^k P\{\text{2 or more events in the } i\text{th subinterval}\} \\
&= k\, o(t/k) = t\,\frac{o(t/k)}{t/k} \to 0 \qquad \text{as } k \to \infty.
\end{aligned}$$
Hence, $N(t)$ will (with a probability going to 1) just equal the number of subintervals in which an event occurs. However, by stationary and independent increments this number will have a binomial distribution with parameters $k$ and $p = \lambda t/k + o(t/k)$. Hence by the Poisson approximation to the binomial we see, by letting $k$ approach $\infty$, that $N(t)$ will have a Poisson distribution with mean equal to
$$\lim_{k\to\infty} k\left[\frac{\lambda t}{k} + o\left(\frac{t}{k}\right)\right] = \lambda t.$$

[Figure 2.1.1. The interval [0, t] subdivided into k equal parts.]

2.2 INTERARRIVAL AND WAITING TIME DISTRIBUTIONS

Consider a Poisson process, and let $X_1$ denote the time of the first event. Further, for $n > 1$, let $X_n$ denote the time between the $(n-1)$st and the $n$th event. The sequence $\{X_n, n \geq 1\}$ is called the sequence of interarrival times.

We shall now determine the distribution of the $X_n$. To do so we first note that the event $\{X_1 > t\}$ takes place if, and only if, no events of the Poisson process occur in the interval $[0, t]$, and thus
$$P\{X_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}.$$
Hence, $X_1$ has an exponential distribution with mean $1/\lambda$. To obtain the distribution of $X_2$, condition on $X_1$. This gives
$$\begin{aligned}
P\{X_2 > t \mid X_1 = s\} &= P\{0 \text{ events in } (s, s+t] \mid X_1 = s\} \\
&= P\{0 \text{ events in } (s, s+t]\} \qquad \text{(by independent increments)} \\
&= e^{-\lambda t} \qquad \text{(by stationary increments).}
\end{aligned}$$
Therefore, from the above we conclude that $X_2$ is also an exponential random variable with mean $1/\lambda$, and furthermore, that $X_2$ is independent of $X_1$. Repeating the same argument yields the following.

PROPOSITION 2.2.1
$X_n$, $n = 1, 2, \ldots$, are independent identically distributed exponential random variables having mean $1/\lambda$.

Remark. The proposition should not surprise us. The assumption of stationary and independent increments is equivalent to asserting that, at any point in time, the process probabilistically restarts itself. That is, the process from any point on is independent of all that has previously occurred (by independent increments), and also has the same distribution as the original process (by stationary increments). In other words, the process has no memory, and hence exponential interarrival times are to be expected.
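Proposition 2.2.1 also gives a standard way to simulate a Poisson process: generate exponential interarrival times and count how many arrivals land in $[0, t]$. The following sketch (our own illustration, with arbitrary parameters) checks that $N(t)$ then has Poisson mean and variance $\lambda t$.

```python
import random

def poisson_count(lam, t):
    """Count arrivals in [0, t] by summing exponential(lam) interarrival times."""
    n, time = 0, random.expovariate(lam)
    while time <= t:
        n, time = n + 1, time + random.expovariate(lam)
    return n

random.seed(5)
lam, t, trials = 3.0, 2.0, 20000
counts = [poisson_count(lam, t) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(mean, var)    # both should be close to lam * t = 6
```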
Another quantity of interest is $S_n$, the arrival time of the $n$th event, also called the waiting time until the $n$th event. Since
$$S_n = \sum_{i=1}^n X_i, \qquad n \geq 1,$$
it is easy to show, using moment generating functions, that Proposition 2.2.1 implies that $S_n$ has a gamma distribution with parameters $n$ and $\lambda$. That is, its probability density is
$$f(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}, \qquad t \geq 0.$$
The above could also have been derived by noting that the $n$th event occurs prior to or at time $t$ if, and only if, the number of events occurring by time $t$ is at least $n$. That is,
$$N(t) \geq n \iff S_n \leq t.$$
Hence,
$$P\{S_n \leq t\} = P\{N(t) \geq n\} = \sum_{j=n}^\infty e^{-\lambda t}\frac{(\lambda t)^j}{j!},$$
which upon differentiation yields that the density function of $S_n$ is
$$f(t) = -\sum_{j=n}^\infty \lambda e^{-\lambda t}\frac{(\lambda t)^j}{j!} + \sum_{j=n}^\infty \lambda e^{-\lambda t}\frac{(\lambda t)^{j-1}}{(j-1)!} = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}.$$

Remark. Another way of obtaining the density of $S_n$ is to use the independent increment assumption as follows:
$$\begin{aligned}
P\{t < S_n < t + dt\} &= P\{N(t) = n-1, 1 \text{ event in } (t, t+dt)\} + o(dt) \\
&= P\{N(t) = n-1\}P\{1 \text{ event in } (t, t+dt)\} + o(dt) \\
&= e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}\lambda\, dt + o(dt),
\end{aligned}$$
which yields, upon dividing by $dt$ and then letting it approach 0, that
$$f_{S_n}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}.$$

Proposition 2.2.1 also gives us another way of defining a Poisson process. For suppose that we start out with a sequence $\{X_n, n \geq 1\}$ of independent identically distributed exponential random variables each having mean $1/\lambda$. Now let us define a counting process by saying that the $n$th event of this process occurs at time $S_n$, where
$$S_n = X_1 + X_2 + \cdots + X_n.$$
The resultant counting process $\{N(t), t \geq 0\}$ will be Poisson with rate $\lambda$.

2.3 CONDITIONAL DISTRIBUTION OF THE ARRIVAL TIMES

Suppose we are told that exactly one event of a Poisson process has taken place by time $t$, and we are asked to determine the distribution of the time at which the event occurred. Since a Poisson process possesses stationary and independent increments, it seems reasonable that each interval in $[0, t]$ of equal length should have the same probability of containing the event. In other words, the time of the event should be uniformly distributed over $[0, t]$. This is easily checked since, for $s \leq t$,
$$\begin{aligned}
P\{X_1 < s \mid N(t) = 1\} &= \frac{P\{X_1 < s, N(t) = 1\}}{P\{N(t) = 1\}} \\
&= \frac{P\{1 \text{ event in } [0, s), 0 \text{ events in } [s, t]\}}{P\{N(t) = 1\}} \\
&= \frac{P\{1 \text{ event in } [0, s)\}P\{0 \text{ events in } [s, t]\}}{P\{N(t) = 1\}} \\
&= \frac{\lambda s e^{-\lambda s} e^{-\lambda(t-s)}}{\lambda t e^{-\lambda t}} = \frac{s}{t}.
\end{aligned}$$

This result may be generalized, but before doing so we need to introduce the concept of order statistics.

Let $Y_1, Y_2, \ldots, Y_n$ be $n$ random variables. We say that $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ are the order statistics corresponding to $Y_1, Y_2, \ldots, Y_n$ if $Y_{(k)}$ is the $k$th smallest value among $Y_1, \ldots, Y_n$, $k = 1, 2, \ldots, n$. If the $Y_i$'s are independent identically distributed continuous random variables with probability density $f$, then the joint density of the order statistics $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ is given by
$$f(y_1, y_2, \ldots, y_n) = n!\prod_{i=1}^n f(y_i), \qquad y_1 < y_2 < \cdots < y_n.$$
The above follows since (i) $(Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)})$ will equal $(y_1, y_2, \ldots, y_n)$ if $(Y_1, Y_2, \ldots, Y_n)$ is equal to any of the $n!$ permutations of $(y_1, y_2, \ldots, y_n)$, and (ii) the probability density that $(Y_1, Y_2, \ldots, Y_n)$ is equal to $(y_{i_1}, y_{i_2}, \ldots, y_{i_n})$ is $f(y_{i_1})f(y_{i_2})\cdots f(y_{i_n}) = \prod_{i=1}^n f(y_i)$ when $(y_{i_1}, y_{i_2}, \ldots, y_{i_n})$ is a permutation of $(y_1, y_2, \ldots, y_n)$.

If the $Y_i$, $i = 1, \ldots, n$, are uniformly distributed over $(0, t)$, then it follows from the above that the joint density function of the order statistics $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ is
$$f(y_1, y_2, \ldots, y_n) = \frac{n!}{t^n}, \qquad 0 < y_1 < y_2 < \cdots < y_n < t.$$
We are now ready for the following useful theorem.

THEOREM 2.3.1
Given that $N(t) = n$, the $n$ arrival times $S_1, \ldots, S_n$ have the same distribution as the order statistics corresponding to $n$ independent random variables uniformly distributed on the interval $(0, t)$.

Proof We shall compute the conditional density function of $S_1, \ldots, S_n$ given that $N(t) = n$. So let $0 < t_1 < t_2 < \cdots < t_{n+1} = t$ and let $h_i$ be small enough so that $t_i + h_i < t_{i+1}$, $i = 1, \ldots, n$. Now,
$$\begin{aligned}
&P\{t_i \leq S_i \leq t_i + h_i, i = 1, \ldots, n \mid N(t) = n\} \\
&\quad = \frac{P\{\text{exactly 1 event in } [t_i, t_i + h_i], i = 1, \ldots, n, \text{ no events elsewhere in } [0, t]\}}{P\{N(t) = n\}} \\
&\quad = \frac{\lambda h_1 \cdots \lambda h_n\, e^{-\lambda t}}{e^{-\lambda t}(\lambda t)^n/n!} = \frac{n!}{t^n}\, h_1 h_2 \cdots h_n.
\end{aligned}$$
Hence,
$$\frac{P\{t_i \leq S_i \leq t_i + h_i, i = 1, \ldots, n \mid N(t) = n\}}{h_1 h_2 \cdots h_n} = \frac{n!}{t^n},$$
and by letting the $h_i \to 0$, we obtain that the conditional density of $S_1, \ldots, S_n$ given that $N(t) = n$ is
$$f(t_1, \ldots, t_n) = \frac{n!}{t^n}, \qquad 0 < t_1 < \cdots < t_n < t,$$
which completes the proof.

Remark. Intuitively, we usually say that under the condition that $n$ events have occurred in $(0, t)$, the times $S_1, \ldots, S_n$ at which events occur, considered as unordered random variables, are distributed independently and uniformly in the interval $(0, t)$.

EXAMPLE 2.3(A) Suppose that travelers arrive at a train depot in accordance with a Poisson process with rate $\lambda$. If the train departs at time $t$, let us compute the expected sum of the waiting times of travelers arriving in $(0, t)$. That is, we want $E\left[\sum_{i=1}^{N(t)}(t - S_i)\right]$, where $S_i$ is the arrival time of the $i$th traveler. Conditioning on $N(t)$ yields
$$E\left[\sum_{i=1}^{N(t)}(t - S_i)\,\Big|\, N(t) = n\right] = nt - E\left[\sum_{i=1}^n S_i\,\Big|\, N(t) = n\right].$$
Now if we let $U_1, \ldots, U_n$ denote a set of $n$ independent uniform $(0, t)$ random variables, then
$$E\left[\sum_{i=1}^n S_i\,\Big|\, N(t) = n\right] = E\left[\sum_{i=1}^n U_{(i)}\right] \qquad \text{(by Theorem 2.3.1)}$$
$$= E\left[\sum_{i=1}^n U_i\right] = \frac{nt}{2} \qquad \left(\text{since } \sum_{i=1}^n U_{(i)} = \sum_{i=1}^n U_i\right).$$
Hence,
$$E\left[\sum_{i=1}^{N(t)}(t - S_i)\,\Big|\, N(t) = n\right] = nt - \frac{nt}{2} = \frac{nt}{2},$$
and, taking expectations,
$$E\left[\sum_{i=1}^{N(t)}(t - S_i)\right] = \frac{tE[N(t)]}{2} = \frac{\lambda t^2}{2}.$$

As an important application of Theorem 2.3.1 suppose that each event of a Poisson process with rate $\lambda$ is classified as being either a type-I or type-II event, and suppose that the probability of an event being classified as type-I depends on the time at which it occurs. Specifically, suppose that if an event occurs at time $s$, then, independently of all else, it is classified as being a type-I event with probability $P(s)$ and a type-II event with probability $1 - P(s)$. By using Theorem 2.3.1 we can prove the following proposition.

PROPOSITION 2.3.2
If $N_i(t)$ represents the number of type-$i$ events that occur by time $t$, $i = 1, 2$, then $N_1(t)$ and $N_2(t)$ are independent Poisson random variables having respective means $\lambda t p$ and $\lambda t(1-p)$, where
$$p = \frac{1}{t}\int_0^t P(s)\, ds.$$

Proof We compute the joint distribution of $N_1(t)$ and $N_2(t)$ by conditioning on $N(t)$:
$$\begin{aligned}
P\{N_1(t) = n, N_2(t) = m\} &= \sum_{k=0}^\infty P\{N_1(t) = n, N_2(t) = m \mid N(t) = k\}P\{N(t) = k\} \\
&= P\{N_1(t) = n, N_2(t) = m \mid N(t) = n + m\}P\{N(t) = n + m\}.
\end{aligned}$$
Now consider an arbitrary event that occurred in the interval $[0, t]$. If it had occurred at time $s$, then the probability that it would be a type-I event would be $P(s)$. Hence, since by Theorem 2.3.1 this event will have occurred at some time uniformly distributed on $(0, t)$, it follows that the probability that it will be a type-I event is
$$p = \frac{1}{t}\int_0^t P(s)\, ds,$$
independently of the other events. Hence, $P\{N_1(t) = n, N_2(t) = m \mid N(t) = n + m\}$ will just equal the probability of $n$ successes and $m$ failures in $n+m$ independent trials when $p$ is the probability of success on each trial. That is,
$$P\{N_1(t) = n, N_2(t) = m \mid N(t) = n + m\} = \binom{n+m}{n}p^n(1-p)^m.$$
Consequently,
$$\begin{aligned}
P\{N_1(t) = n, N_2(t) = m\} &= \frac{(n+m)!}{n!\,m!}p^n(1-p)^m e^{-\lambda t}\frac{(\lambda t)^{n+m}}{(n+m)!} \\
&= e^{-\lambda tp}\frac{(\lambda tp)^n}{n!}\, e^{-\lambda t(1-p)}\frac{(\lambda t(1-p))^m}{m!},
\end{aligned}$$
which completes the proof.
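Proposition 2.3.2 is easy to verify empirically. The sketch below (ours; the choice $P(s) = s/t$, which gives $p = 1/2$, is arbitrary) simulates the classified process and checks the two means and the near-zero covariance.

```python
import random

def classified_counts(lam, t, P):
    """One realization of (N1(t), N2(t)): events arrive at rate lam on [0, t],
    and an event at time s is type I with probability P(s)."""
    n1 = n2 = 0
    s = random.expovariate(lam)
    while s <= t:
        if random.random() < P(s):
            n1 += 1
        else:
            n2 += 1
        s += random.expovariate(lam)
    return n1, n2

random.seed(6)
lam, t, trials = 2.0, 3.0, 20000
P = lambda s: s / t                        # illustrative choice; p = 1/2 here
samples = [classified_counts(lam, t, P) for _ in range(trials)]
m1 = sum(a for a, _ in samples) / trials
m2 = sum(b for _, b in samples) / trials
cov = sum(a * b for a, b in samples) / trials - m1 * m2
print(m1, m2, cov)   # expect lam*t*p = 3, lam*t*(1-p) = 3, covariance near 0
```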
The importance of the above proposition is illustrated by the following example.

EXAMPLE 2.3(B) The Infinite Server Poisson Queue (M/G/∞). Suppose that customers arrive at a service station in accordance with a Poisson process with rate $\lambda$. Upon arrival the customer is immediately served by one of an infinite number of possible servers, and the service times are assumed to be independent with a common distribution $G$.

To compute the joint distribution of the number of customers that have completed their service and the number that are in service at time $t$, call an entering customer a type-I customer if it completes its service by time $t$ and a type-II customer if it does not. Now, if the customer enters at time $s$, $s \leq t$, then it will be a type-I customer if its service time is less than $t - s$, and since the service time distribution is $G$, the probability of this will be $G(t-s)$. Hence,
$$P(s) = G(t - s), \qquad s \leq t,$$
and thus from Proposition 2.3.2 we obtain that the distribution of $N_1(t)$ (the number of customers that have completed service by time $t$) is Poisson with mean
$$E[N_1(t)] = \lambda\int_0^t G(t - s)\, ds = \lambda\int_0^t G(y)\, dy.$$
Similarly, $N_2(t)$, the number of customers being served at time $t$, is Poisson distributed with mean
$$E[N_2(t)] = \lambda\int_0^t \bar G(y)\, dy.$$
Further, $N_1(t)$ and $N_2(t)$ are independent.

The following example further illustrates the use of Theorem 2.3.1.

EXAMPLE 2.3(C) Suppose that a device is subject to shocks that occur in accordance with a Poisson process having rate $\lambda$. The $i$th shock gives rise to a damage $D_i$. The $D_i$, $i \geq 1$, are assumed to be independent and identically distributed and also to be independent of $\{N(t), t \geq 0\}$, where $N(t)$ denotes the number of shocks in $[0, t]$. The damage due to a shock is assumed to decrease exponentially in time. That is, if a shock has an initial damage $D$, then a time $t$ later its damage is $De^{-\alpha t}$. If we suppose that the damages are additive, then $D(t)$, the damage at $t$, can be expressed as
$$D(t) = \sum_{i=1}^{N(t)} D_i e^{-\alpha(t - S_i)},$$
where $S_i$ represents the arrival time of the $i$th shock. We can determine $E[D(t)]$ as follows:
$$\begin{aligned}
E[D(t) \mid N(t) = n] &= E\left[\sum_{i=1}^n D_i e^{-\alpha(t - S_i)}\,\Big|\, N(t) = n\right] \\
&= \sum_{i=1}^n E[D_i \mid N(t) = n]\, E[e^{-\alpha(t - S_i)} \mid N(t) = n] \\
&= E[D]\, e^{-\alpha t}\, E\left[\sum_{i=1}^n e^{\alpha S_i}\,\Big|\, N(t) = n\right].
\end{aligned}$$
Now, letting $U_1, \ldots, U_n$ be independent and identically distributed uniform $(0, t)$ random variables, then, by Theorem 2.3.1,
$$E\left[\sum_{i=1}^n e^{\alpha S_i}\,\Big|\, N(t) = n\right] = E\left[\sum_{i=1}^n e^{\alpha U_{(i)}}\right] = E\left[\sum_{i=1}^n e^{\alpha U_i}\right] = \frac{n}{t}\int_0^t e^{\alpha x}\, dx = \frac{n}{\alpha t}(e^{\alpha t} - 1).$$
Hence,
$$E[D(t) \mid N(t)] = N(t)\,\frac{(1 - e^{-\alpha t})}{\alpha t}E[D],$$
and, taking expectations,
$$E[D(t)] = \frac{\lambda E[D]}{\alpha}(1 - e^{-\alpha t}).$$

Remark. Another approach to obtaining $E[D(t)]$ is to break up the interval $(0, t)$ into nonoverlapping intervals of length $h$ and then add the contribution at time $t$ of shocks originating in these intervals. More specifically, let $h$ be given and define $X_i$ as the sum of the damages at time $t$ of all shocks arriving in the interval $I_i = (ih, (i+1)h)$, $i = 0, 1, \ldots, [t/h]$, where $[a]$ denotes the largest integer less than or equal to $a$. Then we have the representation
$$D(t) = \sum_{i=0}^{[t/h]} X_i,$$
and so
$$E[D(t)] = \sum_{i=0}^{[t/h]} E[X_i].$$
To compute $E[X_i]$, condition on whether or not a shock arrives in the interval $I_i$. This yields
$$E[D(t)] = \sum_{i=0}^{[t/h]}\left(\lambda h E[De^{-\alpha(t - L_i)}] + o(h)\right),$$
where $L_i$ is the arrival time of the shock in the interval $I_i$. Hence,

(2.3.1) $$E[D(t)] = \lambda E[D]\, E\left[\sum_{i=0}^{[t/h]} h e^{-\alpha(t - L_i)}\right] + \frac{t}{h}o(h).$$

But, since $L_i \in I_i$, it follows upon letting $h \to 0$ that
$$\sum_{i=0}^{[t/h]} h e^{-\alpha(t - L_i)} \to \int_0^t e^{-\alpha(t - y)}\, dy = \frac{1 - e^{-\alpha t}}{\alpha},$$
and thus from (2.3.1) upon letting $h \to 0$,
$$E[D(t)] = \frac{\lambda E[D]}{\alpha}(1 - e^{-\alpha t}).$$
It is worth noting that the above is a more rigorous version of the following argument: Since a shock occurs in the interval $(y, y + dy)$ with probability $\lambda\, dy$, and since its damage at time $t$ will equal $e^{-\alpha(t-y)}$ times its initial damage, it follows that the expected damage at $t$ from shocks originating in $(y, y + dy)$ is $\lambda E[D]e^{-\alpha(t-y)}\, dy$, and so
$$E[D(t)] = \lambda E[D]\int_0^t e^{-\alpha(t-y)}\, dy = \frac{\lambda E[D]}{\alpha}(1 - e^{-\alpha t}).$$
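The shock model lends itself to a direct simulation check of the formula for $E[D(t)]$. Here is a minimal sketch (ours; exponential(1) initial damages are an arbitrary choice, so $E[D] = 1$).

```python
import math, random

def damage_at(t, lam, alpha, draw_damage):
    """One realization of D(t): Poisson(lam) shock times on [0, t],
    i.i.d. initial damages, exponential decay at rate alpha."""
    total, s = 0.0, random.expovariate(lam)
    while s <= t:
        total += draw_damage() * math.exp(-alpha * (t - s))
        s += random.expovariate(lam)
    return total

random.seed(7)
lam, alpha, t, trials = 2.0, 0.5, 4.0, 20000
draw = lambda: random.expovariate(1.0)     # E[D] = 1
est = sum(damage_at(t, lam, alpha, draw) for _ in range(trials)) / trials
exact = lam * 1.0 / alpha * (1 - math.exp(-alpha * t))
print(est, exact)   # estimate close to (lam E[D]/alpha)(1 - e^{-alpha t})
```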
a 2.3.1 The MIGII Busy Period 73 Consider the queueing system, known as MIG/l, in which customers arrive in accordance with a Poisson process with rate A Upon arrival they either enter service if the server is free or else they join the queue. The successive service times are independent and identically distributed according to G, and are also independent of the arrival process When an arrival finds the server free, we say that a busy period begins It ends when there are no longer any customers in the system We would like to compute the distribution of the length of a busy period. Suppose that a busy period has just begun at some time, which we shall designate as time ° Let Sk denote the time until k additional customers have arrived. (Thus, for instance, S k has a gamma distribution with parameters k, A ) Also let Y 1 , Y 2 , • denote the sequence of service times Now the busy period will last a time t and will consist of n services if, and only if, (i) Sk <: Y 1 + ... + Yk, k = 1, .. , n - 1 (ii) Y 1 + ... + Y n = t. (iii) There are n - 1 arrivals in (0, t). Equation (i) is necessary for, if Sk > Y 1 + ... + Yk. then the kth arrival after the initial customer will find the system empty of customers and thus the busy period would have ended prior to k + 1 (and thus prior to n) services The reasoning behind (ii) and (iii) is straightforward and left to the reader Hence, reasoning heuristically (by treating densities as if they were proba- bilities) we see from the above that (2.3.2) P{busy period is of length t and consists of n services} = P{Y 1 + .. + Y n = t, n - 1 arrivals in (0, t), Sk ~ Y 1 + ... + Yk, k = 1, ... ,n-l} ,n - lin - 1 arrivals in (0, t), Y 1 + .. + Y n = t} X P{n - 1 arrivals in (0, t), Y 1 + ... + Y n = t} 74 THE POISSON PROCESS Now the arrival process is independent of the service times and thus (233) P{n - 1 arrivals in (0, t), Y 1 + + Y n = t} _ -Ai (At)n-I - e (n _ 1)' dGn(t), where G n is the n-fold convolution of G with itself In addition, we have from Theorem 23 1 that, given n - 1 arnvals in (0, t), the ordered arnval times are distributed as the ordered values of a set of n - 1 independent uniform (0, t) random variables Hence using this fact along with (23.3) and (2.3.2) yields (234) P{busy period is of length t and consists of n services} ( At)n-I = e- M dG (t) (n - 1)' n X P{ 1'k ~ Y 1 + .. + Y k , k = 1, , n - ] I Y 1 + .. + Y n = t}, where 1'1, , 1'n-1 are independent of {Y 1 , • , Y n } and represent the ordered values of a set of n - 1 uniform (0, t) random vanables. To compute the remaining probability in (2.3 4) we need some lemmas Lemma 2 3 3 is elementary and its proof is left as an exercise. 
Lemma 2.3.3
Let $Y_1, Y_2, \ldots, Y_n$ be independent and identically distributed nonnegative random variables. Then
$$E[Y_1 + \cdots + Y_k \mid Y_1 + \cdots + Y_n = t] = \frac{k}{n}t, \qquad k = 1, \ldots, n.$$

Lemma 2.3.4
Let $\tau_1, \ldots, \tau_n$ denote the ordered values from a set of $n$ independent uniform $(0, t)$ random variables. Let $Y_1, Y_2, \ldots$ be independent and identically distributed nonnegative random variables that are also independent of $\{\tau_1, \ldots, \tau_n\}$. Then

(2.3.5) $$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n \mid Y_1 + \cdots + Y_n = y\} = \begin{cases} 1 - y/t & 0 < y < t \\ 0 & \text{otherwise.} \end{cases}$$

Proof The proof is by induction on $n$. When $n = 1$ we must compute $P\{Y_1 < \tau_1 \mid Y_1 = y\}$ when $\tau_1$ is uniform $(0, t)$. But
$$P\{Y_1 < \tau_1 \mid Y_1 = y\} = P\{\tau_1 > y\} = 1 - \frac{y}{t}, \qquad 0 < y < t.$$
So assume the lemma when $n$ is replaced by $n-1$ and now consider the $n$ case. Since the result is obvious for $y \geq t$, suppose that $y < t$. To make use of the induction hypothesis we will compute the left-hand side of (2.3.5) by conditioning on the values of $Y_1 + \cdots + Y_{n-1}$ and $\tau_n$, and then using the fact that, conditional on $\tau_n = u$, the variables $\tau_1, \ldots, \tau_{n-1}$ are distributed as the order statistics from a set of $n-1$ uniform $(0, u)$ random variables (see Problem 2.18). Doing so, we have, for $s < y$,

(2.3.6)
$$\begin{aligned}
&P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n \mid Y_1 + \cdots + Y_{n-1} = s, \tau_n = u, Y_1 + \cdots + Y_n = y\} \\
&\quad = \begin{cases} P\{Y_1 + \cdots + Y_k < \tau_k^*, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_{n-1} = s\} & \text{if } y < u \\ 0 & \text{if } y \geq u, \end{cases}
\end{aligned}$$

where $\tau_1^*, \ldots, \tau_{n-1}^*$ are the ordered values from a set of $n-1$ independent uniform $(0, u)$ random variables. Hence, by the induction hypothesis, the right-hand side of (2.3.6) is equal to $1 - s/u$ when $y < u$, and thus
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n \mid Y_1 + \cdots + Y_{n-1} = s, \tau_n = u, Y_1 + \cdots + Y_n = y\} = \begin{cases} 1 - s/u & y < u \\ 0 & \text{otherwise.} \end{cases}$$
Taking expectations with respect to $Y_1 + \cdots + Y_{n-1}$ yields, for $y < u$,
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n \mid \tau_n = u, Y_1 + \cdots + Y_n = y\} = 1 - \frac{E[Y_1 + \cdots + Y_{n-1} \mid Y_1 + \cdots + Y_n = y]}{u} = 1 - \frac{n-1}{n}\cdot\frac{y}{u},$$
where we have made use of Lemma 2.3.3 in the above. Taking expectations once more yields

(2.3.7)
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n \mid Y_1 + \cdots + Y_n = y\} = E\left[1 - \frac{n-1}{n}\cdot\frac{y}{\tau_n}\,\Big|\, y < \tau_n\right]P\{y < \tau_n\}.$$

Now the distribution of $\tau_n$ is given by
$$P\{\tau_n < x\} = P\left\{\max_{1\leq i\leq n} U_i < x\right\} = P\{U_i < x, i = 1, \ldots, n\} = (x/t)^n, \qquad 0 < x < t,$$
where $U_i$, $i = 1, \ldots, n$, are independent uniform $(0, t)$ random variables. Hence its density is given by
$$f_{\tau_n}(x) = \frac{n}{t}\left(\frac{x}{t}\right)^{n-1}, \qquad 0 < x < t,$$
and thus

(2.3.8)
$$E\left[\frac{1}{\tau_n}\,\Big|\, y < \tau_n\right]P\{y < \tau_n\} = \int_y^t \frac{1}{x}\cdot\frac{n}{t}\left(\frac{x}{t}\right)^{n-1} dx = \frac{n}{n-1}\cdot\frac{t^{n-1} - y^{n-1}}{t^n}.$$

Thus from (2.3.7) and (2.3.8), when $y < t$,
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n \mid Y_1 + \cdots + Y_n = y\} = 1 - \left(\frac{y}{t}\right)^n - y\,\frac{t^{n-1} - y^{n-1}}{t^n} = 1 - \frac{y}{t},$$
and the proof is complete.

We need one additional lemma before going back to the busy period problem.

Lemma 2.3.5
Let $\tau_1, \ldots, \tau_{n-1}$ denote the ordered values from a set of $n-1$ independent uniform $(0, t)$ random variables, and let $Y_1, Y_2, \ldots$ be independent and identically distributed nonnegative random variables that are also independent of $\{\tau_1, \ldots, \tau_{n-1}\}$. Then
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} = \frac{1}{n}.$$

Proof To compute the above probability we will make use of Lemma 2.3.4 by conditioning on $Y_1 + \cdots + Y_{n-1}$. That is, by Lemma 2.3.4,
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_{n-1} = y, Y_1 + \cdots + Y_n = t\} = \begin{cases} 1 - y/t & 0 < y < t \\ 0 & \text{otherwise.} \end{cases}$$
Hence, as $Y_1 + \cdots + Y_{n-1} \leq Y_1 + \cdots + Y_n$, we have that
$$P\{Y_1 + \cdots + Y_k < \tau_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} = E\left[1 - \frac{Y_1 + \cdots + Y_{n-1}}{t}\,\Big|\, Y_1 + \cdots + Y_n = t\right] = 1 - \frac{n-1}{n} = \frac{1}{n}$$
(by Lemma 2.3.3), which proves the result.
Returning to the joint distribution of the length of a busy period and the number of customers served, we must, from (2.3.4), compute
$$P\{\tau_k \leq Y_1 + \cdots + Y_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\}.$$
Now since $t - U$ will also be a uniform $(0, t)$ random variable whenever $U$ is, it follows that $\tau_1, \ldots, \tau_{n-1}$ has the same joint distribution as $t - \tau_{n-1}, \ldots, t - \tau_1$. Hence, upon replacing $\tau_k$ by $t - \tau_{n-k}$ throughout, $1 \leq k \leq n-1$, we obtain
$$\begin{aligned}
&P\{\tau_k \leq Y_1 + \cdots + Y_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} \\
&\quad = P\{t - \tau_{n-k} \leq Y_1 + \cdots + Y_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} \\
&\quad = P\{t - \tau_{n-k} \leq t - (Y_{k+1} + \cdots + Y_n), k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} \\
&\quad = P\{\tau_{n-k} \geq Y_{k+1} + \cdots + Y_n, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} \\
&\quad = P\{\tau_{n-k} \geq Y_{n-k} + \cdots + Y_1, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\},
\end{aligned}$$
where the last equality follows since $Y_1, \ldots, Y_n$ has the same joint distribution as $Y_n, \ldots, Y_1$, and so any probability statement about the $Y_i$'s remains valid if $Y_1$ is replaced by $Y_n$, $Y_2$ by $Y_{n-1}, \ldots, Y_k$ by $Y_{n-k+1}, \ldots, Y_n$ by $Y_1$. Hence, we see that
$$\begin{aligned}
P\{\tau_k \leq Y_1 + \cdots + Y_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} &= P\{\tau_k > Y_1 + \cdots + Y_k, k = 1, \ldots, n-1 \mid Y_1 + \cdots + Y_n = t\} \\
&= \frac{1}{n} \qquad \text{(from Lemma 2.3.5).}
\end{aligned}$$
Hence, from (2.3.4), if we let
$$B(t, n) = P\{\text{busy period is of length} \leq t \text{ and consists of } n \text{ services}\},$$
then
$$B(t, n) = \int_0^t e^{-\lambda s}\frac{(\lambda s)^{n-1}}{n!}\, dG_n(s).$$
The distribution of the length of a busy period, call it $B(t) = \sum_{n=1}^\infty B(t, n)$, is thus given by
$$B(t) = \sum_{n=1}^\infty \int_0^t e^{-\lambda s}\frac{(\lambda s)^{n-1}}{n!}\, dG_n(s).$$

2.4 NONHOMOGENEOUS POISSON PROCESS

In this section we generalize the Poisson process by allowing the arrival rate at time $t$ to be a function of $t$.

Definition 2.4.1
The counting process $\{N(t), t \geq 0\}$ is said to be a nonstationary or nonhomogeneous Poisson process with intensity function $\lambda(t)$, $t \geq 0$, if
(i) $N(0) = 0$.
(ii) $\{N(t), t \geq 0\}$ has independent increments.
(iii) $P\{N(t+h) - N(t) \geq 2\} = o(h)$.
(iv) $P\{N(t+h) - N(t) = 1\} = \lambda(t)h + o(h)$.

If we let $m(t) = \int_0^t \lambda(s)\, ds$, then it can be shown that

(2.4.1) $$P\{N(t+s) - N(t) = n\} = \exp\{-(m(t+s) - m(t))\}\frac{[m(t+s) - m(t)]^n}{n!}, \qquad n \geq 0.$$

That is, $N(t+s) - N(t)$ is Poisson distributed with mean $m(t+s) - m(t)$. The proof of (2.4.1) follows along the lines of the proof of Theorem 2.1.1 with a slight modification. Fix $t$ and define
$$P_n(s) = P\{N(t+s) - N(t) = n\}.$$
Then,
$$\begin{aligned}
P_0(s+h) &= P\{N(t+s+h) - N(t) = 0\} \\
&= P\{0 \text{ events in } (t, t+s], 0 \text{ events in } (t+s, t+s+h]\} \\
&= P\{0 \text{ events in } (t, t+s]\}P\{0 \text{ events in } (t+s, t+s+h]\} \\
&= P_0(s)[1 - \lambda(t+s)h + o(h)],
\end{aligned}$$
where the next-to-last equality follows from Axiom (ii) and the last from Axioms (iii) and (iv). Hence,
$$\frac{P_0(s+h) - P_0(s)}{h} = -\lambda(t+s)P_0(s) + \frac{o(h)}{h}.$$
Letting $h \to 0$ yields
$$P_0'(s) = -\lambda(t+s)P_0(s),$$
or
$$\log P_0(s) = -\int_0^s \lambda(t+u)\, du,$$
or
$$P_0(s) = e^{-(m(t+s) - m(t))}.$$
The remainder of the verification of (2.4.1) follows similarly and is left as an exercise.

The importance of the nonhomogeneous Poisson process resides in the fact that we no longer require stationary increments, and so we allow for the possibility that events may be more likely to occur at certain times than at other times.
When the intensity function $\lambda(t)$ is bounded, we can think of the nonhomogeneous process as being a random sample from a homogeneous Poisson process. Specifically, let $\lambda$ be such that
$$\lambda(t) \leq \lambda \qquad \text{for all } t \geq 0,$$
and consider a Poisson process with rate $\lambda$. Now if we suppose that an event of the Poisson process that occurs at time $t$ is counted with probability $\lambda(t)/\lambda$, then the process of counted events is a nonhomogeneous Poisson process with intensity function $\lambda(t)$. This last statement easily follows from Definition 2.4.1. For instance, (i), (ii), and (iii) follow since they are also true for the homogeneous Poisson process. Axiom (iv) follows since
$$\begin{aligned}
P\{\text{one counted event in } (t, t+h)\} &= P\{\text{one event in } (t, t+h)\}\frac{\lambda(t)}{\lambda} + o(h) \\
&= \lambda h\,\frac{\lambda(t)}{\lambda} + o(h) = \lambda(t)h + o(h).
\end{aligned}$$
The interpretation of a nonhomogeneous Poisson process as a sampling from a homogeneous one also gives us another way of understanding Proposition 2.3.2 (or, equivalently, it gives us another way of proving that $N_1(t)$ of Proposition 2.3.2 is Poisson distributed).

EXAMPLE 2.4(A) Record Values. Let $X_1, X_2, \ldots$ denote a sequence of independent and identically distributed nonnegative continuous random variables whose hazard rate function is given by $\lambda(t)$. (That is, $\lambda(t) = f(t)/\bar F(t)$, where $f$ and $F$ are respectively the density and distribution function of the $X_i$.) We say that a record occurs at time $n$ if $X_n > \max(X_1, \ldots, X_{n-1})$, where $X_0 = 0$. If a record occurs at time $n$, then $X_n$ is called a record value. Let $N(t)$ denote the number of record values less than or equal to $t$. That is, $N(t)$ is a counting process of events where an event is said to occur at time $x$ if $x$ is a record value.

We claim that $\{N(t), t \geq 0\}$ will be a nonhomogeneous Poisson process with intensity function $\lambda(t)$. To verify this claim, note that there will be a record value between $t$ and $t+h$ if, and only if, the first $X_i$ whose value is greater than $t$ lies between $t$ and $t+h$. But we have (conditioning on which $X_i$ this is, say, $i = n$), by the definition of a hazard rate function,
$$P\{X_n \in (t, t+h) \mid X_n > t\} = \lambda(t)h + o(h),$$
which proves the claim.
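The sampling ("thinning") interpretation just described doubles as a simulation method. The following sketch (ours; the intensity $1 + \sin^2 t$ is an arbitrary bounded choice) thins a rate-2 homogeneous process and checks $E[N(t)] = m(t)$.

```python
import math, random

def nonhomogeneous_arrivals(rate_fn, lam_max, t_end):
    """Sample arrival times on [0, t_end] of a nonhomogeneous Poisson process
    by thinning a rate-lam_max homogeneous process (requires rate_fn <= lam_max)."""
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam_max)
        if t > t_end:
            return times
        if random.random() < rate_fn(t) / lam_max:   # keep with prob lambda(t)/lam_max
            times.append(t)

random.seed(8)
rate = lambda t: 1 + math.sin(t) ** 2
t_end, trials = 10.0, 10000
mean_count = sum(len(nonhomogeneous_arrivals(rate, 2.0, t_end))
                 for _ in range(trials)) / trials
print(mean_count, 15 - math.sin(20) / 4)   # m(10) = 15 - sin(20)/4 ~ 14.77
```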
EXAMPLE 2.4(B) The Output Process of an Infinite Server Poisson Queue (M/G/∞). It turns out that the output process of the M/G/∞ queue, that is, of the infinite server queue having Poisson arrivals and general service distribution $G$, is a nonhomogeneous Poisson process having intensity function $\lambda(t) = \lambda G(t)$. To prove this we shall first argue that:
(1) the number of departures in $(s, s+t]$ is Poisson distributed with mean $\lambda\int_s^{s+t} G(y)\, dy$; and
(2) the numbers of departures in disjoint time intervals are independent.

To prove statement (1), call an arrival type I if it departs in the interval $(s, s+t]$. Then an arrival at $y$ will be type I with probability
$$P(y) = \begin{cases} G(s+t-y) - G(s-y) & \text{if } y < s \\ G(s+t-y) & \text{if } s < y < s+t \\ 0 & \text{if } y > s+t. \end{cases}$$
Hence, from Proposition 2.3.2 the number of such departures will be Poisson distributed with mean
$$\lambda\int_0^s (G(s+t-y) - G(s-y))\, dy + \lambda\int_s^{s+t} G(s+t-y)\, dy = \lambda\int_s^{s+t} G(y)\, dy.$$
To prove statement (2), let $I_1$ and $I_2$ denote disjoint time intervals and call an arrival type I if it departs in $I_1$, call it type II if it departs in $I_2$, and call it type III otherwise. Again, from Proposition 2.3.2, or, more precisely, from its generalization to three types of customers, it follows that the numbers of departures in $I_1$ and $I_2$ (that is, the numbers of type-I and type-II arrivals) are independent Poisson random variables.

Using statements (1) and (2) it is a simple matter to verify all the axiomatic requirements for the departure process to be a nonhomogeneous Poisson process (it is much like showing that Definition 2.1.1 of a Poisson process implies Definition 2.1.2). Since $\lambda(t) \to \lambda$ as $t \to \infty$, it is interesting to note that the limiting output process after time $t$ (as $t \to \infty$) is a Poisson process with rate $\lambda$.

2.5 COMPOUND POISSON RANDOM VARIABLES AND PROCESSES

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables having distribution function $F$, and suppose that this sequence is independent of $N$, a Poisson random variable with mean $\lambda$. The random variable
$$W = \sum_{i=1}^N X_i$$
is said to be a compound Poisson random variable with Poisson parameter $\lambda$ and component distribution $F$.

The moment generating function of $W$ is obtained by conditioning on $N$. This gives
$$E[e^{tW}] = \sum_{n=0}^\infty E[e^{tW} \mid N = n]P\{N = n\} = \sum_{n=0}^\infty E[e^{t(X_1 + \cdots + X_n)} \mid N = n]e^{-\lambda}\lambda^n/n!$$

(2.5.1) $$= \sum_{n=0}^\infty E[e^{t(X_1 + \cdots + X_n)}]e^{-\lambda}\lambda^n/n!$$

(2.5.2) $$= \sum_{n=0}^\infty (E[e^{tX_1}])^n e^{-\lambda}\lambda^n/n!,$$

where (2.5.1) follows from the independence of $\{X_1, X_2, \ldots\}$ and $N$, and (2.5.2) follows from the independence of the $X_i$. Hence, letting $\phi_X(t) = E[e^{tX_1}]$ denote the moment generating function of the $X_i$, we have from (2.5.2) that

(2.5.3) $$E[e^{tW}] = \sum_{n=0}^\infty (\phi_X(t))^n e^{-\lambda}\lambda^n/n! = \exp\{\lambda(\phi_X(t) - 1)\}.$$

It is easily shown, either by differentiating (2.5.3) or by directly using a conditioning argument, that
$$E[W] = \lambda E[X], \qquad \mathrm{Var}(W) = \lambda E[X^2],$$
where $X$ has distribution $F$.

EXAMPLE 2.5(A) Aside from the way in which they are defined, compound Poisson random variables often arise in the following manner. Suppose that events are occurring in accordance with a Poisson process having rate (say) $\alpha$, and that whenever an event occurs a certain contribution results. Specifically, suppose that an event occurring at time $s$ will, independent of the past, result in a contribution whose value is a random variable with distribution $F_s$. Let $W$ denote the sum of the contributions up to time $t$, that is,
$$W = \sum_{i=1}^{N(t)} X_i,$$
where $N(t)$ is the number of events occurring by time $t$, and $X_i$ is the contribution made when event $i$ occurs. Then, even though the $X_i$ are neither independent nor identically distributed, it follows that $W$ is a compound Poisson random variable with parameters $\lambda = \alpha t$ and
$$F(x) = \frac{1}{t}\int_0^t F_s(x)\, ds.$$
This can be shown by calculating the distribution of $W$ by first conditioning on $N(t)$, and then using the result that, given $N(t)$, the unordered set of $N(t)$ event times are independent uniform $(0, t)$ random variables (see Section 2.3).

When $F$ is a discrete probability distribution function there is an interesting representation of $W$ as a linear combination of independent Poisson random variables. Suppose that the $X_i$ are discrete random variables such that
$$P\{X_i = j\} = p_j, \quad j = 1, \ldots, k, \qquad \sum_{j=1}^k p_j = 1.$$
If we let $N_j$ denote the number of the $X_i$'s that are equal to $j$, $j = 1, \ldots, k$, then we can express $W$ as

(2.5.4) $$W = \sum_{j=1}^k j N_j,$$

where, using the results of Example 1.5(I), the $N_j$ are independent Poisson random variables with respective means $\lambda p_j$, $j = 1, \ldots, k$. As a check, let us use the representation (2.5.4) to compute the mean and variance of $W$:
$$E[W] = \sum_j jE[N_j] = \sum_j j\lambda p_j = \lambda E[X],$$
$$\mathrm{Var}(W) = \sum_j j^2\mathrm{Var}(N_j) = \sum_j j^2\lambda p_j = \lambda E[X^2],$$
which check with our previous results.
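The moment formulas $E[W] = \lambda E[X]$ and $\mathrm{Var}(W) = \lambda E[X^2]$ are easy to confirm by simulation. The sketch below (ours; the uniform component distribution on $\{1,2,3,4\}$ is arbitrary) samples $N$ via exponential spacings so as to stay within the standard library.

```python
import random

def compound_poisson(lam, draw_x):
    """One draw of W = X_1 + ... + X_N with N ~ Poisson(lam)."""
    n, s = 0, random.expovariate(1.0)   # N = arrivals of a rate-1 process before lam
    while s < lam:
        n, s = n + 1, s + random.expovariate(1.0)
    return sum(draw_x() for _ in range(n))

random.seed(9)
lam, trials = 4.0, 40000
draw_x = lambda: random.randint(1, 4)       # P{X = j} = 1/4, j = 1..4
ws = [compound_poisson(lam, draw_x) for _ in range(trials)]
mean = sum(ws) / trials
var = sum((w - mean) ** 2 for w in ws) / trials
ex, ex2 = 2.5, (1 + 4 + 9 + 16) / 4         # E[X], E[X^2]
print(mean, lam * ex)                       # both near 10
print(var, lam * ex2)                       # both near 30
```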
2.5.1 A Compound Poisson Identity

As before, let $W = \sum_{i=1}^N X_i$ be a compound Poisson random variable with $N$ being Poisson with mean $\lambda$ and the $X_i$ having distribution $F$. We now present a useful identity concerning $W$.

PROPOSITION 2.5.2 Let $X$ be a random variable having distribution $F$ that is independent of $W$. Then, for any function $h(x)$,

$$E[Wh(W)] = \lambda E[Xh(W + X)].$$

Proof

$$E[Wh(W)] = \sum_{n=0}^\infty E[Wh(W) \mid N = n]\, e^{-\lambda}\frac{\lambda^n}{n!} = \sum_{n=0}^\infty e^{-\lambda}\frac{\lambda^n}{n!} E\Big[\Big(\sum_{i=1}^n X_i\Big) h\Big(\sum_{i=1}^n X_i\Big)\Big] = \sum_{n=1}^\infty e^{-\lambda}\frac{\lambda^n}{n!}\, n E\Big[X_n h\Big(\sum_{i=1}^n X_i\Big)\Big],$$

where the preceding equation follows because the $n$ random variables $X_j h(\sum_{i=1}^n X_i)$, $j = 1, \ldots, n$, all have the same distribution. Hence, from the above we obtain, upon conditioning on $X_n$, that

$$E[Wh(W)] = \sum_{n=1}^\infty e^{-\lambda}\frac{\lambda^n}{(n-1)!}\int E\Big[X_n h\Big(\sum_{i=1}^n X_i\Big) \Big| X_n = x\Big] dF(x)$$
$$= \lambda \sum_{n=1}^\infty e^{-\lambda}\frac{\lambda^{n-1}}{(n-1)!}\int x E\Big[h\Big(\sum_{i=1}^{n-1} X_i + x\Big)\Big] dF(x)$$
$$= \lambda \int x \sum_{m=0}^\infty E[h(W + x) \mid N = m] P\{N = m\}\, dF(x)$$
$$= \lambda \int x E[h(W + x)]\, dF(x) = \lambda \int E[Xh(W + X) \mid X = x]\, dF(x) = \lambda E[Xh(W + X)].$$

Proposition 2.5.2 gives an easy way of computing the moments of $W$.

Corollary 2.5.3 If $X$ has distribution $F$, then for any positive integer $n$,

$$E[W^n] = \lambda E[X(W + X)^{n-1}].$$

Proof Let $h(x) = x^{n-1}$ and apply Proposition 2.5.2.

Thus, by starting with $n = 1$ and successively increasing the value of $n$, we see that

$$E[W] = \lambda E[X],$$
$$E[W^2] = \lambda(E[X^2] + E[W]E[X]) = \lambda E[X^2] + \lambda^2 (E[X])^2,$$
$$E[W^3] = \lambda(E[X^3] + 2E[W]E[X^2] + E[W^2]E[X]) = \lambda E[X^3] + 3\lambda^2 E[X]E[X^2] + \lambda^3 (E[X])^3,$$

and so on.

We will now show that Proposition 2.5.2 yields an elegant recursive formula for the probability mass function of $W$ when the $X_i$ are positive integer valued random variables. Suppose this is the situation, and let $a_j = P\{X_i = j\}$, $j \ge 1$, and $P_j = P\{W = j\}$. The successive values of $P_n$ can be obtained by making use of the following.

Corollary 2.5.4

$$P_0 = e^{-\lambda}, \qquad P_n = \frac{\lambda}{n}\sum_{j=1}^n j a_j P_{n-j}, \quad n \ge 1.$$

Proof That $P_0 = e^{-\lambda}$ is immediate (since the $X_i$ are positive, $W = 0$ exactly when $N = 0$), so take $n > 0$. Let

$$h(x) = \begin{cases} 0 & \text{if } x \ne n \\ 1/n & \text{if } x = n. \end{cases}$$

Since $Wh(W) = I\{W = n\}$, which is defined to equal 1 if $W = n$ and 0 otherwise, we obtain upon applying Proposition 2.5.2 that

$$P\{W = n\} = \lambda E[Xh(W + X)] = \lambda \sum_j E[Xh(W + X) \mid X = j]\, a_j = \lambda \sum_j j E[h(W + j)]\, a_j = \lambda \sum_j j\,\frac{1}{n} P\{W + j = n\}\, a_j = \frac{\lambda}{n}\sum_{j=1}^n j a_j P_{n-j}.$$

Remark When the $X_i$ are identically equal to 1, the preceding recursion reduces to the well-known identity for Poisson probabilities

$$P\{N = 0\} = e^{-\lambda}, \qquad P\{N = n\} = \frac{\lambda}{n} P\{N = n - 1\}, \quad n \ge 1.$$

EXAMPLE 2.5(B) Let $W$ be a compound Poisson random variable with Poisson parameter $\lambda = 4$ and with $P\{X_i = i\} = 1/4$, $i = 1, 2, 3, 4$. To determine $P\{W = 5\}$, we use the recursion of Corollary 2.5.4 as follows (note $a_5 = 0$):

$$P_0 = e^{-\lambda} = e^{-4}$$
$$P_1 = \lambda a_1 P_0 = e^{-4}$$
$$P_2 = \frac{\lambda}{2}\{a_1 P_1 + 2a_2 P_0\} = \tfrac{3}{2} e^{-4}$$
$$P_3 = \frac{\lambda}{3}\{a_1 P_2 + 2a_2 P_1 + 3a_3 P_0\} = \tfrac{13}{6} e^{-4}$$
$$P_4 = \frac{\lambda}{4}\{a_1 P_3 + 2a_2 P_2 + 3a_3 P_1 + 4a_4 P_0\} = \tfrac{73}{24} e^{-4}$$
$$P_5 = \frac{\lambda}{5}\{a_1 P_4 + 2a_2 P_3 + 3a_3 P_2 + 4a_4 P_1 + 5a_5 P_0\} = \tfrac{381}{120} e^{-4}.$$
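The recursion of Corollary 2.5.4 is immediate to implement. The sketch below reproduces Example 2.5(B); the function name and layout are ours.

```python
import math

def compound_poisson_pmf(lam, a, nmax):
    """P{W = n}, n = 0..nmax, for W compound Poisson with Poisson mean lam and
    P{X = j} = a[j] (a[0] unused), via the recursion of Corollary 2.5.4."""
    p = [math.exp(-lam)]
    for n in range(1, nmax + 1):
        p.append((lam / n) * sum(j * a[j] * p[n - j]
                                 for j in range(1, min(n, len(a) - 1) + 1)))
    return p

# Example 2.5(B): lam = 4, X uniform on {1, 2, 3, 4}
p = compound_poisson_pmf(4.0, [0, 0.25, 0.25, 0.25, 0.25], 5)
print(p[5], (381 / 120) * math.exp(-4))   # both approximately 0.0582
```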
2.5.2 Compound Poisson Processes

A stochastic process $\{X(t), t \ge 0\}$ is said to be a compound Poisson process if it can be represented, for $t \ge 0$, by

$$X(t) = \sum_{i=1}^{N(t)} X_i,$$

where $\{N(t), t \ge 0\}$ is a Poisson process, and $\{X_i, i = 1, 2, \ldots\}$ is a family of independent and identically distributed random variables that is independent of the process $\{N(t), t \ge 0\}$. Thus, if $\{X(t), t \ge 0\}$ is a compound Poisson process then $X(t)$ is a compound Poisson random variable.

As an example of a compound Poisson process, suppose that customers arrive at a store at a Poisson rate $\lambda$. Suppose, also, that the amounts of money spent by each customer form a set of independent and identically distributed random variables that is independent of the arrival process. If $X(t)$ denotes the total amount spent in the store by all customers arriving by time $t$, then $\{X(t), t \ge 0\}$ is a compound Poisson process.

2.6 CONDITIONAL POISSON PROCESSES

Let $\Lambda$ be a positive random variable having distribution $G$ and let $\{N(t), t \ge 0\}$ be a counting process such that, given that $\Lambda = \lambda$, $\{N(t), t \ge 0\}$ is a Poisson process having rate $\lambda$. Thus, for instance,

$$P\{N(t + s) - N(s) = n\} = \int_0^\infty e^{-\lambda t}\frac{(\lambda t)^n}{n!}\, dG(\lambda).$$

The process $\{N(t), t \ge 0\}$ is called a conditional Poisson process since, conditional on the event that $\Lambda = \lambda$, it is a Poisson process with rate $\lambda$. It should be noted, however, that $\{N(t), t \ge 0\}$ is not a Poisson process. For instance, whereas it does have stationary increments, it does not have independent ones. (Why not?)

Let us compute the conditional distribution of $\Lambda$ given that $N(t) = n$. For $d\lambda$ small,

$$P\{\Lambda \in (\lambda, \lambda + d\lambda) \mid N(t) = n\} = \frac{P\{N(t) = n \mid \Lambda \in (\lambda, \lambda + d\lambda)\} P\{\Lambda \in (\lambda, \lambda + d\lambda)\}}{P\{N(t) = n\}} = \frac{e^{-\lambda t}\frac{(\lambda t)^n}{n!}\, dG(\lambda)}{\int_0^\infty e^{-\lambda t}\frac{(\lambda t)^n}{n!}\, dG(\lambda)},$$

and so the conditional distribution of $\Lambda$, given that $N(t) = n$, is given by

$$P\{\Lambda \le x \mid N(t) = n\} = \frac{\int_0^x e^{-\lambda t}(\lambda t)^n\, dG(\lambda)}{\int_0^\infty e^{-\lambda t}(\lambda t)^n\, dG(\lambda)}.$$

EXAMPLE 2.6(A) Suppose that, depending on factors not at present understood, the average rate at which seismic shocks occur in a certain region over a given season is either $\lambda_1$ or $\lambda_2$. Suppose also that it is $\lambda_1$ for $100p$ percent of the seasons and $\lambda_2$ the remaining time. A simple model for such a situation would be to suppose that $\{N(t), 0 \le t < \infty\}$ is a conditional Poisson process such that $\Lambda$ is either $\lambda_1$ or $\lambda_2$ with respective probabilities $p$ and $1 - p$. Given $n$ shocks in the first $t$ time units of a season, the probability that it is a $\lambda_1$ season is

$$P\{\Lambda = \lambda_1 \mid N(t) = n\} = \frac{p e^{-\lambda_1 t}(\lambda_1 t)^n}{p e^{-\lambda_1 t}(\lambda_1 t)^n + (1 - p) e^{-\lambda_2 t}(\lambda_2 t)^n}.$$

Also, by conditioning on whether $\Lambda = \lambda_1$ or $\Lambda = \lambda_2$, we see that the time from $t$ until the next shock, given $N(t) = n$, has the distribution

$$P\{\text{time from } t \text{ until next shock} \le x \mid N(t) = n\} = \frac{p(1 - e^{-\lambda_1 x}) e^{-\lambda_1 t}(\lambda_1 t)^n + (1 - p)(1 - e^{-\lambda_2 x}) e^{-\lambda_2 t}(\lambda_2 t)^n}{p e^{-\lambda_1 t}(\lambda_1 t)^n + (1 - p) e^{-\lambda_2 t}(\lambda_2 t)^n}.$$

PROBLEMS

2.1. Show that Definition 2.1.1 of a Poisson process implies Definition 2.1.2.

2.2. For another approach to proving that Definition 2.1.2 implies Definition 2.1.1:
(a) Prove, using Definition 2.1.2, that $P_0(t + s) = P_0(t)P_0(s)$.
(b) Use (a) to infer that the interarrival times $X_1, X_2, \ldots$ are independent exponential random variables with rate $\lambda$.
(c) Use (b) to show that $N(t)$ is Poisson distributed with mean $\lambda t$.

2.3. For a Poisson process show, for $s < t$, that

$$P\{N(s) = k \mid N(t) = n\} = \binom{n}{k}\left(\frac{s}{t}\right)^k \left(1 - \frac{s}{t}\right)^{n-k}, \quad k = 0, 1, \ldots, n.$$

2.4. Let $\{N(t), t \ge 0\}$ be a Poisson process with rate $\lambda$. Calculate $E[N(t) \cdot N(t + s)]$.

2.5. Suppose that $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are independent Poisson processes with rates $\lambda_1$ and $\lambda_2$. Show that $\{N_1(t) + N_2(t), t \ge 0\}$ is a Poisson process with rate $\lambda_1 + \lambda_2$. Also, show that the probability that the first event of the combined process comes from $\{N_1(t), t \ge 0\}$ is $\lambda_1/(\lambda_1 + \lambda_2)$, independently of the time of the event.

2.6. A machine needs two types of components in order to function. We have a stockpile of $n$ type-1 components and $m$ type-2 components. Type-$i$ components last for an exponential time with rate $\mu_i$ before failing. Compute the mean length of time the machine is operative if a failed component is replaced by one of the same type from the stockpile; that is, compute $E[\min(\sum_{i=1}^n X_i, \sum_{i=1}^m Y_i)]$, where the $X_i$ ($Y_i$) are exponential with rate $\mu_1$ ($\mu_2$).

2.7. Compute the joint distribution of $S_1, S_2, S_3$.
2.8. Generating a Poisson Random Variable. Let $U_1, U_2, \ldots$ be independent uniform $(0, 1)$ random variables.
(a) If $X_i = (-\log U_i)/\lambda$, show that $X_i$ is exponentially distributed with rate $\lambda$.
(b) Use part (a) to show that $N$ is Poisson distributed with mean $\lambda$ when $N$ is defined to equal that value of $n$ such that

$$\prod_{i=1}^n U_i \ge e^{-\lambda} > \prod_{i=1}^{n+1} U_i,$$

where $\prod_{i=1}^0 U_i \equiv 1$. Compare with Problem 1.21 of Chapter 1.

2.9. Suppose that events occur according to a Poisson process with rate $\lambda$. Each time an event occurs we must decide whether or not to stop, with our objective being to stop at the last event to occur prior to some specified time $T$. That is, if an event occurs at time $t$, $0 \le t \le T$, and we decide to stop, then we lose if there are any events in the interval $(t, T]$, and win otherwise. If we do not stop when an event occurs, and no additional events occur by time $T$, then we also lose. Consider the strategy that stops at the first event that occurs after some specified time $s$, $0 \le s \le T$.
(a) If the preceding strategy is employed, what is the probability of winning?
(b) What value of $s$ maximizes the probability of winning?
(c) Show that the probability of winning under the optimal strategy is $1/e$.

2.10. Buses arrive at a certain stop according to a Poisson process with rate $\lambda$. If you take the bus from that stop then it takes a time $R$, measured from the time at which you enter the bus, to arrive home. If you walk from the bus stop then it takes a time $W$ to arrive home. Suppose that your policy when arriving at the bus stop is to wait up to a time $s$, and if a bus has not yet arrived by that time then you walk home.
(a) Compute the expected time from when you arrive at the bus stop until you reach home.
(b) Show that if $W < 1/\lambda + R$ then the expected time of part (a) is minimized by letting $s = 0$; if $W > 1/\lambda + R$ then it is minimized by letting $s = \infty$ (that is, you continue to wait for the bus); and when $W = 1/\lambda + R$ all values of $s$ give the same expected time.
(c) Give an intuitive explanation of why we need only consider the cases $s = 0$ and $s = \infty$ when minimizing the expected time.

2.11. Cars pass a certain street location according to a Poisson process with rate $\lambda$. A person wanting to cross the street at that location waits until she can see that no cars will come by in the next $T$ time units. Find the expected time that the person waits before starting to cross. (Note, for instance, that if no cars will be passing in the first $T$ time units then the waiting time is 0.)

2.12. Events, occurring according to a Poisson process with rate $\lambda$, are registered by a counter. However, each time an event is registered the counter becomes inoperative for the next $b$ units of time and does not register any new events that might occur during that interval. Let $R(t)$ denote the number of events that occur by time $t$ and are registered.
(a) Find the probability that the first $k$ events are all registered.
(b) For $t \ge (n - 1)b$, find $P\{R(t) \ge n\}$.

2.13. Suppose that shocks occur according to a Poisson process with rate $\lambda$, and suppose that each shock, independently, causes the system to fail with probability $p$. Let $N$ denote the number of shocks that it takes for the system to fail and let $T$ denote the time of failure. Find $P\{N = n \mid T = t\}$.

2.14. Consider an elevator that starts in the basement and travels upward. Let $N_i$ denote the number of people that get in the elevator at floor $i$. Assume the $N_i$ are independent and that $N_i$ is Poisson with mean $\lambda_i$. Each person entering at $i$ will, independent of everything else, get off at $j$ with probability $P_{ij}$, $\sum_{j > i} P_{ij} = 1$. Let $O_j$ denote the number of people getting off the elevator at floor $j$.
(a) Compute $E[O_j]$.
(b) What is the distribution of $O_j$?
(c) What is the joint distribution of $O_j$ and $O_k$?
2.15. Consider an $r$-sided coin and suppose that on each flip of the coin exactly one of the sides appears: side $i$ with probability $P_i$, $\sum_{i=1}^r P_i = 1$. For given numbers $n_1, \ldots, n_r$, let $N_i$ denote the number of flips required until side $i$ has appeared for the $n_i$th time, $i = 1, \ldots, r$, and let

$$N = \min_{i=1,\ldots,r} N_i.$$

Thus $N$ is the number of flips required until side $i$ has appeared $n_i$ times for some $i = 1, \ldots, r$.
(a) What is the distribution of $N_i$?
(b) Are the $N_i$ independent?
Now suppose that the flips are performed at random times generated by a Poisson process with rate $\lambda = 1$. Let $T_i$ denote the time until side $i$ has appeared for the $n_i$th time, $i = 1, \ldots, r$, and let

$$T = \min_{i=1,\ldots,r} T_i.$$

(c) What is the distribution of $T_i$?
(d) Are the $T_i$ independent?
(e) Derive an expression for $E[T]$.
(f) Use (e) to derive an expression for $E[N]$.

2.16. The number of trials to be performed is a Poisson random variable with mean $\lambda$. Each trial has $n$ possible outcomes and, independent of everything else, results in outcome number $i$ with probability $P_i$, $\sum_{i=1}^n P_i = 1$. Let $X_j$ denote the number of outcomes that occur exactly $j$ times, $j = 0, 1, \ldots$. Compute $E[X_j]$ and $\operatorname{Var}(X_j)$.

2.17. Let $X_1, X_2, \ldots, X_n$ be independent continuous random variables with common density function $f$. Let $X_{(i)}$ denote the $i$th smallest of $X_1, \ldots, X_n$.
(a) Note that in order for $X_{(i)}$ to equal $x$, exactly $i - 1$ of the $X$'s must be less than $x$, one must equal $x$, and the other $n - i$ must all be greater than $x$. Using this fact, argue that the density function of $X_{(i)}$ is given by

$$f_{X_{(i)}}(x) = \frac{n!}{(n - i)!(i - 1)!}\,(F(x))^{i-1}(1 - F(x))^{n-i} f(x).$$

(b) $X_{(i)}$ will be less than $x$ if, and only if, how many of the $X$'s are less than $x$?
(c) Use (b) to obtain an expression for $P\{X_{(i)} \le x\}$.
(d) Using (a) and (c) establish the identity, for $0 \le y \le 1$,

$$\sum_{k=i}^n \binom{n}{k} y^k (1 - y)^{n-k} = \frac{n!}{(n - i)!(i - 1)!}\int_0^y x^{i-1}(1 - x)^{n-i}\, dx.$$

(e) Let $S_i$ denote the time of the $i$th event of the Poisson process $\{N(t), t \ge 0\}$. Find $E[S_i \mid N(t) = n]$, both for $i \le n$ and for $i > n$.

2.18. Let $U_{(1)}, \ldots, U_{(n)}$ denote the order statistics of a set of $n$ uniform $(0, 1)$ random variables. Show that given $U_{(n)} = y$, $U_{(1)}, \ldots, U_{(n-1)}$ are distributed as the order statistics of a set of $n - 1$ uniform $(0, y)$ random variables.

2.19. Busloads of customers arrive at an infinite server queue at a Poisson rate $\lambda$. Let $G$ denote the service distribution. A bus contains $j$ customers with probability $\alpha_j$, $j = 1, \ldots$. Let $X(t)$ denote the number of customers that have been served by time $t$.
(a) $E[X(t)] = ?$
(b) Is $X(t)$ Poisson distributed?

2.20. Suppose that each event of a Poisson process with rate $\lambda$ is classified as being either of type $1, 2, \ldots, k$. If the event occurs at $s$, then, independently of all else, it is classified as type $i$ with probability $P_i(s)$, $i = 1, \ldots, k$, $\sum_{i=1}^k P_i(s) = 1$. Let $N_i(t)$ denote the number of type $i$ arrivals in $[0, t]$. Show that the $N_i(t)$, $i = 1, \ldots, k$, are independent and $N_i(t)$ is Poisson distributed with mean $\lambda \int_0^t P_i(s)\, ds$.

2.21. Individuals enter the system in accordance with a Poisson process having rate $\lambda$. Each arrival independently makes its way through the states of the system. Let $\alpha_i(s)$ denote the probability that an individual is in state $i$ a time $s$ after it arrived. Let $N_i(t)$ denote the number of individuals in state $i$ at time $t$. Show that the $N_i(t)$, $i \ge 1$, are independent and $N_i(t)$ is Poisson with mean equal to

$$\lambda E[\text{amount of time an individual is in state } i \text{ during its first } t \text{ units in the system}].$$
2.22. Suppose cars enter a one-way infinite highway at a Poisson rate $\lambda$. The $i$th car to enter chooses a velocity $V_i$ and travels at this velocity. Assume that the $V_i$'s are independent positive random variables having a common distribution $F$. Derive the distribution of the number of cars that are located in the interval $(a, b)$ at time $t$. Assume that no time is lost when one car overtakes another car.

2.23. For the model of Example 2.3(C), find
(a) $\operatorname{Var}[D(t)]$.
(b) $\operatorname{Cov}[D(t), D(t + s)]$.

2.24. Suppose that cars enter a one-way highway of length $L$ in accordance with a Poisson process with rate $\lambda$. Each car travels at a constant speed that is randomly determined, independently from car to car, from the distribution $F$. When a faster car encounters a slower one, it passes it with no loss of time. Suppose that a car enters the highway at time $t$. Show that as $t \to \infty$ the speed of the car that minimizes the expected number of encounters with other cars, where we say an encounter occurs when a car is either passed by or passes another car, is the median of the distribution $F$.

2.25. Suppose that events occur in accordance with a Poisson process with rate $\lambda$, and that an event occurring at time $s$, independent of the past, contributes a random amount having distribution $F_s$, $s \ge 0$. Show that $W$, the sum of all contributions by time $t$, is a compound Poisson random variable. That is, show that $W$ has the same distribution as $\sum_{i=1}^N X_i$, where the $X_i$ are independent and identically distributed random variables and are independent of $N$, a Poisson random variable. Identify the distribution of the $X_i$ and the mean of $N$.

2.26. Compute the conditional distribution of $S_1, S_2, \ldots, S_{n-1}$ given that $S_n = t$.

2.27. Compute the moment generating function of $D(t)$ in Example 2.3(C).

2.28. Prove Lemma 2.3.3.

2.29. Complete the proof that for a nonhomogeneous Poisson process $N(t + s) - N(t)$ is Poisson with mean $m(t + s) - m(t)$.

2.30. Let $T_1, T_2, \ldots$ denote the interarrival times of events of a nonhomogeneous Poisson process having intensity function $\lambda(t)$.
(a) Are the $T_i$ independent?
(b) Are the $T_i$ identically distributed?
(c) Find the distribution of $T_1$.
(d) Find the distribution of $T_2$.

2.31. Consider a nonhomogeneous Poisson process $\{N(t), t \ge 0\}$, where $\lambda(t) > 0$ for all $t$. Let $N^*(t) = N(m^{-1}(t))$. Show that $\{N^*(t), t \ge 0\}$ is a Poisson process with rate 1.

2.32. (a) Let $\{N(t), t \ge 0\}$ be a nonhomogeneous Poisson process with mean value function $m(t)$. Given $N(t) = n$, show that the unordered set of arrival times has the same distribution as $n$ independent and identically distributed random variables having distribution function

$$F(x) = \begin{cases} m(x)/m(t) & x \le t \\ 1 & x > t. \end{cases}$$

(b) Suppose that workers incur accidents in accordance with a nonhomogeneous Poisson process with mean value function $m(t)$. Suppose further that each injured person is out of work for a random amount of time having distribution $F$. Let $X(t)$ be the number of workers who are out of work at time $t$. Compute $E[X(t)]$ and $\operatorname{Var}(X(t))$.

2.33. A two-dimensional Poisson process is a process of events in the plane such that (i) for any region of area $A$, the number of events in $A$ is Poisson distributed with mean $\lambda A$, and (ii) the numbers of events in nonoverlapping regions are independent. Consider a fixed point, and let $X$ denote the distance from that point to its nearest event, where distance is measured in the usual Euclidean manner. Show that
(a) $P\{X > t\} = e^{-\lambda \pi t^2}$.
(b) $E[X] = 1/(2\sqrt{\lambda})$.
Let $R_i$, $i \ge 1$, denote the distance from an arbitrary point to the $i$th closest event to it. Show that, with $R_0 = 0$,
(c) $\pi R_i^2 - \pi R_{i-1}^2$, $i \ge 1$, are independent exponential random variables, each with rate $\lambda$.

2.34. Repeat Problem 2.25 when the events occur according to a nonhomogeneous Poisson process with intensity $\lambda(t)$, $t \ge 0$.
2.35. Let $\{N(t), t \ge 0\}$ be a nonhomogeneous Poisson process with intensity function $\lambda(t)$, $t \ge 0$. However, suppose one starts observing the process at a random time $T$ having distribution function $F$. Let $N^*(t) = N(T + t) - N(T)$ denote the number of events that occur in the first $t$ time units of observation.
(a) Does the process $\{N^*(t), t \ge 0\}$ possess independent increments?
(b) Repeat (a) when $\{N(t), t \ge 0\}$ is a Poisson process.

2.36. Let $C$ denote the number of customers served in an M/G/1 busy period. Find
(a) $E[C]$.
(b) $\operatorname{Var}(C)$.

2.37. Let $\{X(t), t \ge 0\}$ be a compound Poisson process with $X(t) = \sum_{i=1}^{N(t)} X_i$, and suppose that the $X_i$ can only assume a finite set of possible values. Argue that, for $t$ large, the distribution of $X(t)$ is approximately normal.

2.38. Let $\{X(t), t \ge 0\}$ be a compound Poisson process with $X(t) = \sum_{i=1}^{N(t)} X_i$, and suppose that $\lambda = 1$ and $P\{X_1 = j\} = j/10$, $j = 1, 2, 3, 4$. Calculate $P\{X(4) = 20\}$.

2.39. Compute $\operatorname{Cov}(X(s), X(t))$ for a compound Poisson process.

2.40. Give an example of a counting process $\{N(t), t \ge 0\}$ that is not a Poisson process but which has the property that conditional on $N(t) = n$ the first $n$ event times are distributed as the order statistics from a set of $n$ independent uniform $(0, t)$ random variables.

2.41. For a conditional Poisson process:
(a) Explain why a conditional Poisson process has stationary but not independent increments.
(b) Compute the conditional distribution of $\Lambda$ given $\{N(s), 0 \le s \le t\}$, the history of the process up to time $t$, and show that it depends on the history only through $N(t)$. Explain why this is true.
(c) Compute the conditional distribution of the time of the first event after $t$ given that $N(t) = n$.
(d) Compute

$$\lim_{h \to 0} \frac{P\{N(h) \ge 1\}}{h}.$$

(e) Let $X_1, X_2, \ldots$ denote the interarrival times. Are they independent? Are they identically distributed?

2.42. Consider a conditional Poisson process where the distribution of $\Lambda$ is the gamma distribution with parameters $m$ and $a$; that is, the density is given by

$$g(\lambda) = a e^{-a\lambda}\frac{(a\lambda)^{m-1}}{(m-1)!}, \quad 0 < \lambda < \infty.$$

(a) Show that

$$P\{N(t) = n\} = \binom{m + n - 1}{n}\left(\frac{a}{a + t}\right)^m \left(\frac{t}{a + t}\right)^n, \quad n \ge 0.$$

(b) Show that the conditional distribution of $\Lambda$ given $N(t) = n$ is again gamma with parameters $m + n$, $a + t$.
(c) What is $\lim_{h \to 0} P\{N(t + h) - N(t) = 1 \mid N(t) = n\}/h$?

REFERENCES

Reference 3 provides an alternate and mathematically easier treatment of the Poisson process. Corollary 2.5.4 was originally obtained in Reference 1 by using a generating function argument. Another approach to the Poisson process is given in Reference 2.

1. R. M. Adelson, "Compound Poisson Distributions," Operations Research Quarterly, 17, 73-75 (1966).
2. E. Çinlar, Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, NJ, 1975.
3. S. M. Ross, Introduction to Probability Models, 5th ed., Academic Press, Orlando, FL, 1993.

CHAPTER 3

Renewal Theory

3.1 INTRODUCTION AND PRELIMINARIES

In the previous chapter we saw that the interarrival times for the Poisson process are independent and identically distributed exponential random variables. A natural generalization is to consider a counting process for which the interarrival times are independent and identically distributed with an arbitrary distribution. Such a counting process is called a renewal process.

Formally, let $\{X_n, n = 1, 2, \ldots\}$ be a sequence of nonnegative independent random variables with a common distribution $F$, and to avoid trivialities suppose that $F(0) = P\{X_n = 0\} < 1$. We shall interpret $X_n$ as the time between the $(n-1)$st and $n$th event. Let $\mu = E[X_n] = \int_0^\infty x\, dF(x)$ denote the mean time between successive events and note that from the assumptions that $X_n \ge 0$ and $F(0) < 1$, it follows that $0 < \mu \le \infty$. Letting

$$S_0 = 0, \qquad S_n = \sum_{i=1}^n X_i, \quad n \ge 1,$$

it follows that $S_n$ is the time of the $n$th event. As the number of events by time $t$ will equal the largest value of $n$ for which the $n$th event occurs before or at time $t$, we have that $N(t)$, the number of events by time $t$, is given by

$$(3.1.1)\qquad N(t) = \sup\{n: S_n \le t\}.$$

Definition 3.1.1 The counting process $\{N(t), t \ge 0\}$ is called a renewal process.
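A renewal process is also trivial to simulate: generate interarrivals, accumulate them, and count how many renewals fall in $[0, t]$. A minimal sketch with an illustrative uniform $(0, 2)$ interarrival distribution (so $\mu = 1$); the final print anticipates the limit theorems of Section 3.3.

```python
import numpy as np

rng = np.random.default_rng(3)

def renewal_count(interarrival_sampler, t):
    """N(t): number of renewals by time t, built from S_n = X_1 + ... + X_n."""
    total, n = 0.0, 0
    while True:
        total += interarrival_sampler()
        if total > t:
            return n
        n += 1

# Illustrative choice (not from the text): uniform(0, 2) interarrivals, mu = 1.
draws = [renewal_count(lambda: rng.uniform(0.0, 2.0), 100.0) for _ in range(2000)]
print(np.mean(draws) / 100.0)   # close to 1/mu = 1
```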
We will use the terms events and renewals interchangeably, and so we say that the $n$th renewal occurs at time $S_n$. Since the interarrival times are independent and identically distributed, it follows that at each renewal the process probabilistically starts over.

The first question we shall attempt to answer is whether an infinite number of renewals can occur in a finite time. To show that it cannot, we note, by the strong law of large numbers, that with probability 1,

$$\frac{S_n}{n} \to \mu \quad \text{as } n \to \infty.$$

But since $\mu > 0$, this means that $S_n$ must be going to infinity as $n$ goes to infinity. Thus, $S_n$ can be less than or equal to $t$ for at most a finite number of values of $n$. Hence, by (3.1.1), $N(t)$ must be finite, and we can write

$$N(t) = \max\{n: S_n \le t\}.$$

3.2 DISTRIBUTION OF N(t)

The distribution of $N(t)$ can be obtained, at least in theory, by first noting the important relationship that the number of renewals by time $t$ is greater than or equal to $n$ if, and only if, the $n$th renewal occurs before or at time $t$. That is,

$$(3.2.1)\qquad N(t) \ge n \iff S_n \le t.$$

From (3.2.1) we obtain

$$(3.2.2)\qquad P\{N(t) = n\} = P\{N(t) \ge n\} - P\{N(t) \ge n + 1\} = P\{S_n \le t\} - P\{S_{n+1} \le t\}.$$

Now since the random variables $X_i$, $i \ge 1$, are independent and have a common distribution $F$, it follows that $S_n = \sum_{i=1}^n X_i$ is distributed as $F_n$, the $n$-fold convolution of $F$ with itself. Therefore, from (3.2.2) we obtain

$$P\{N(t) = n\} = F_n(t) - F_{n+1}(t).$$

Let $m(t) = E[N(t)]$. $m(t)$ is called the renewal function, and much of renewal theory is concerned with determining its properties. The relationship between $m(t)$ and $F$ is given by the following proposition.

PROPOSITION 3.2.1

$$(3.2.3)\qquad m(t) = \sum_{n=1}^\infty F_n(t).$$

Proof Let

$$I_n = \begin{cases} 1 & \text{if the } n\text{th renewal occurred in } [0, t] \\ 0 & \text{otherwise.} \end{cases}$$

Hence $N(t) = \sum_{n=1}^\infty I_n$, and so

$$E[N(t)] = E\Big[\sum_{n=1}^\infty I_n\Big] = \sum_{n=1}^\infty P\{I_n = 1\} = \sum_{n=1}^\infty P\{S_n \le t\} = \sum_{n=1}^\infty F_n(t),$$

where the interchange of expectation and summation is justified by the nonnegativity of the $I_n$.

The next proposition shows that $N(t)$ has finite expectation.

PROPOSITION 3.2.2 $m(t) < \infty$ for all $0 \le t < \infty$.

Proof Since $P\{X_n = 0\} < 1$, it follows by the continuity property of probabilities that there exists an $\alpha > 0$ such that $P\{X_n \ge \alpha\} > 0$. Now define a related renewal process $\{\bar X_n, n \ge 1\}$ by

$$\bar X_n = \begin{cases} 0 & \text{if } X_n < \alpha \\ \alpha & \text{if } X_n \ge \alpha, \end{cases}$$

and let $\bar N(t) = \sup\{n: \bar X_1 + \cdots + \bar X_n \le t\}$. Then it is easy to see that for the related process, renewals can only take place at times $t = n\alpha$, $n = 0, 1, 2, \ldots$, and also the numbers of renewals at each of these times are independent geometric random variables with mean $1/P\{X_n \ge \alpha\}$. Thus,

$$E[\bar N(t)] \le \frac{t/\alpha + 1}{P\{X_n \ge \alpha\}} < \infty,$$

and the result follows since $\bar X_n \le X_n$ implies that $N(t) \le \bar N(t)$.

Remark The above proof also shows that $E[N^r(t)] < \infty$ for all $t \ge 0$, $r \ge 0$.
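Proposition 3.2.1 can be checked numerically in a case where $m(t)$ is known in closed form: for exponential interarrivals, $F_n$ is the gamma (Erlang) distribution and $m(t) = \lambda t$. A short sketch, with parameters that are ours:

```python
from scipy.stats import gamma

# For exponential(rate lam) interarrivals, F_n is gamma(n, scale=1/lam),
# and the renewal function is known exactly: m(t) = lam * t.
lam, t = 2.0, 5.0
m = sum(gamma.cdf(t, a=n, scale=1.0 / lam) for n in range(1, 200))
print(m, lam * t)   # both approximately 10
```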
3.3 SOME LIMIT THEOREMS

If we let $N(\infty) = \lim_{t \to \infty} N(t)$ denote the total number of renewals that occurs, then it is easy to see that $N(\infty) = \infty$ with probability 1. This follows since the only way in which $N(\infty)$—the total number of renewals that occurs—can be finite is for one of the interarrival times to be infinite. Therefore,

$$P\{N(\infty) < \infty\} = P\{X_n = \infty \text{ for some } n\} \le \sum_{n=1}^\infty P\{X_n = \infty\} = 0.$$

[Figure 3.3.1: a renewal sample path on the time axis; x's mark renewal epochs, with $S_{N(t)}$ the last renewal before or at $t$ and $S_{N(t)+1}$ the first renewal after $t$.]

Thus $N(t)$ goes to infinity as $t$ goes to infinity. However, it would be nice to know the rate at which $N(t)$ goes to infinity. That is, we would like to be able to say something about $\lim_{t \to \infty} N(t)/t$.

As a prelude to determining the rate at which $N(t)$ grows, let us first consider the random variable $S_{N(t)}$. In words, just what does this random variable represent? Proceeding inductively, suppose, for instance, that $N(t) = 3$. Then $S_{N(t)} = S_3$ represents the time of the third event. Since there are only three events that have occurred by time $t$, $S_3$ also represents the time of the last event prior to (or at) time $t$. This is, in fact, what $S_{N(t)}$ represents—namely, the time of the last renewal prior to or at time $t$. Similar reasoning leads to the conclusion that $S_{N(t)+1}$ represents the time of the first renewal after time $t$ (see Figure 3.3.1). We may now prove the following.

PROPOSITION 3.3.1 With probability 1,

$$\frac{N(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty.$$

Proof Since $S_{N(t)} \le t < S_{N(t)+1}$, we see that

$$(3.3.1)\qquad \frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{S_{N(t)+1}}{N(t)}.$$

However, since $S_{N(t)}/N(t)$ is the average of the first $N(t)$ interarrival times, it follows by the strong law of large numbers that $S_{N(t)}/N(t) \to \mu$ as $N(t) \to \infty$. But since $N(t) \to \infty$ when $t \to \infty$, we obtain $S_{N(t)}/N(t) \to \mu$ as $t \to \infty$. Furthermore, writing

$$\frac{S_{N(t)+1}}{N(t)} = \left[\frac{S_{N(t)+1}}{N(t) + 1}\right]\left[\frac{N(t) + 1}{N(t)}\right],$$

we have, by the same reasoning, $S_{N(t)+1}/N(t) \to \mu$. The result now follows by (3.3.1) since $t/N(t)$ is between two numbers, each of which converges to $\mu$ as $t \to \infty$.

EXAMPLE 3.3(A) A container contains an infinite collection of coins. Each coin has its own probability of landing heads, and these probabilities are the values of independent random variables that are uniformly distributed over $(0, 1)$. Suppose we are to flip coins sequentially, at any time either flipping a new coin or one that had previously been used. If our objective is to maximize the long-run proportion of flips that lands on heads, how should we proceed?

Solution. We will exhibit a strategy that results in the long-run proportion of heads being equal to 1. To begin, let $N(n)$ denote the number of tails in the first $n$ flips, and so the long-run proportion of heads, call it $P_h$, is given by

$$P_h = \lim_{n \to \infty} \frac{n - N(n)}{n} = 1 - \lim_{n \to \infty} \frac{N(n)}{n}.$$

Consider the strategy that initially chooses a coin and continues to flip it until it comes up tails. At this point that coin is discarded (never to be used again) and a new one is chosen. The process is then repeated. To determine $P_h$ for this strategy, note that the times at which a flipped coin lands on tails constitute renewals. Hence, by Proposition 3.3.1,

$$\lim_{n \to \infty} \frac{N(n)}{n} = 1/E[\text{number of flips between successive tails}].$$

But, given its probability $p$ of landing heads, the number of flips of a coin until it lands tails is geometric with mean $1/(1 - p)$. Hence, conditioning gives

$$E[\text{number of flips between successive tails}] = \int_0^1 \frac{1}{1 - p}\, dp = \infty,$$

implying that, with probability 1,

$$\lim_{n \to \infty} \frac{N(n)}{n} = 0.$$
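The strategy of Example 3.3(A) is easy to simulate. The sketch below (all choices ours) draws a fresh coin per cycle and flips geometrically many times; the observed fraction of tails is small and tends to 0 as the horizon grows, though convergence is slow since the mean cycle length is infinite.

```python
import numpy as np

rng = np.random.default_rng(4)

# Flip a fresh coin (heads prob p ~ uniform(0,1)) until it lands tails, discard it.
n_flips = n_tails = 0
while n_flips < 10**6:
    p = rng.random()                      # this coin's heads probability
    run = rng.geometric(1.0 - p)          # flips until (and including) first tail
    n_flips += run
    n_tails += 1
print(n_tails / n_flips)                  # small; -> 0 as the horizon grows
```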
Thus Proposition 3.3.1 states that with probability 1, the long-run rate at which renewals occur will equal $1/\mu$. For this reason $1/\mu$ is called the rate of the renewal process. We will show that the expected average rate of renewals $m(t)/t$ also converges to $1/\mu$. However, before presenting the proof, we will find it useful to digress to stopping times and Wald's equation.

3.3.1 Wald's Equation

Let $X_1, X_2, \ldots$ denote a sequence of independent random variables. We have the following definition.

Definition An integer-valued random variable $N$ is said to be a stopping time for the sequence $X_1, X_2, \ldots$ if the event $\{N = n\}$ is independent of $X_{n+1}, X_{n+2}, \ldots$ for all $n = 1, 2, \ldots$.

Intuitively, we observe the $X_n$'s in sequential order and $N$ denotes the number observed before stopping. If $N = n$, then we have stopped after observing $X_1, \ldots, X_n$ and before observing $X_{n+1}, X_{n+2}, \ldots$.

EXAMPLE 3.3(B) Let $X_n$, $n = 1, 2, \ldots$, be independent and such that $P\{X_n = 0\} = P\{X_n = 1\} = \tfrac12$. If we let $N = \min\{n: X_1 + \cdots + X_n = 10\}$, then $N$ is a stopping time. We may regard $N$ as being the stopping time of an experiment that successively flips a fair coin and then stops when the number of heads reaches 10.

EXAMPLE 3.3(C) Let $X_n$, $n = 1, 2, \ldots$, be independent and such that $P\{X_n = -1\} = P\{X_n = 1\} = \tfrac12$. Then $N = \min\{n: X_1 + \cdots + X_n = 1\}$ is a stopping time. It can be regarded as the stopping time for a gambler who on each play is equally likely to either win or lose 1 unit and who decides to stop the first time he is ahead. (It will be shown in the next chapter that $N$ is finite with probability 1.)

THEOREM 3.3.2 (Wald's Equation). If $X_1, X_2, \ldots$ are independent and identically distributed random variables having finite expectations, and if $N$ is a stopping time for $X_1, X_2, \ldots$ such that $E[N] < \infty$, then

$$E\Big[\sum_{n=1}^N X_n\Big] = E[N]E[X].$$

Proof Letting

$$I_n = \begin{cases} 1 & \text{if } N \ge n \\ 0 & \text{if } N < n, \end{cases}$$

we have that $\sum_{n=1}^N X_n = \sum_{n=1}^\infty X_n I_n$. Hence,

$$(3.3.2)\qquad E\Big[\sum_{n=1}^N X_n\Big] = E\Big[\sum_{n=1}^\infty X_n I_n\Big] = \sum_{n=1}^\infty E[X_n I_n].$$

However, $I_n = 1$ if, and only if, we have not stopped after successively observing $X_1, \ldots, X_{n-1}$. Therefore, $I_n$ is determined by $X_1, \ldots, X_{n-1}$ and is thus independent of $X_n$. From (3.3.2), we thus obtain

$$E\Big[\sum_{n=1}^N X_n\Big] = \sum_{n=1}^\infty E[X_n]E[I_n] = E[X]\sum_{n=1}^\infty P\{N \ge n\} = E[X]E[N].$$

Remark In Equation (3.3.2) we interchanged expectation and summation without justification. To justify this interchange, replace $X_i$ by its absolute value throughout. In this case, the interchange is justified as all terms are nonnegative. However, this implies that the original interchange is allowable by Lebesgue's dominated convergence theorem.

For Example 3.3(B), Wald's equation implies

$$E[X_1 + \cdots + X_N] = E[N]E[X] = E[N]/2.$$

However, $X_1 + \cdots + X_N = 10$ by definition of $N$, and so $E[N] = 20$.

An application of the conclusion of Wald's equation to Example 3.3(C) would yield $E[X_1 + \cdots + X_N] = E[N]E[X]$. However, $X_1 + \cdots + X_N = 1$ and $E[X] = 0$, and so we would arrive at a contradiction. Thus Wald's equation is not applicable, which yields the conclusion that $E[N] = \infty$.

3.3.2 Back to Renewal Theory

Let $X_1, X_2, \ldots$ denote the interarrival times of a renewal process and let us stop at the first renewal after $t$—that is, at the $N(t) + 1$ renewal. To verify that $N(t) + 1$ is indeed a stopping time for the sequence of $X_i$, note that

$$N(t) + 1 = n \iff N(t) = n - 1 \iff X_1 + \cdots + X_{n-1} \le t,\ X_1 + \cdots + X_n > t.$$

Thus the event $\{N(t) + 1 = n\}$ depends only on $X_1, \ldots, X_n$ and is thus independent of $X_{n+1}, \ldots$; hence $N(t) + 1$ is a stopping time. From Wald's equation we obtain that, when $E[X] < \infty$,

$$E[X_1 + \cdots + X_{N(t)+1}] = E[X]E[N(t) + 1],$$

or, equivalently, we have the following.

Corollary 3.3.3 If $\mu < \infty$, then

$$(3.3.3)\qquad E[S_{N(t)+1}] = \mu(m(t) + 1).$$
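Wald's equation is easy to verify by simulation for Example 3.3(B), where $X_i \in \{0, 1\}$ with mean $\tfrac12$ and stopping at the tenth head forces $\sum X_i = 10$, so $E[N]$ must be 20. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

def trial():
    """Flip a fair coin until 10 heads; return the number of flips N."""
    n = heads = 0
    while heads < 10:
        heads += rng.integers(0, 2)   # X_n in {0, 1}, each with probability 1/2
        n += 1
    return n

# Wald: 10 = E[sum X_i] = E[N] E[X] = E[N] / 2, so E[N] = 20.
print(np.mean([trial() for _ in range(50_000)]))   # approximately 20
```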
We are now in position to prove the following.

THEOREM 3.3.4 (The Elementary Renewal Theorem).

$$\frac{m(t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty.$$

Proof Suppose first that $\mu < \infty$. Now (see Figure 3.3.1) $S_{N(t)+1} > t$. Taking expectations and using Corollary 3.3.3 gives $\mu(m(t) + 1) > t$, implying that

$$(3.3.4)\qquad \liminf_{t \to \infty} \frac{m(t)}{t} \ge \frac{1}{\mu}.$$

To go the other way, we fix a constant $M$ and define a new renewal process $\{\bar X_n, n = 1, 2, \ldots\}$ by letting

$$\bar X_n = \begin{cases} X_n & \text{if } X_n \le M \\ M & \text{if } X_n > M, \end{cases} \quad n = 1, 2, \ldots.$$

Let $\bar S_n = \sum_{i=1}^n \bar X_i$ and $\bar N(t) = \sup\{n: \bar S_n \le t\}$. Since the interarrival times for this truncated renewal process are bounded by $M$, we obtain $\bar S_{\bar N(t)+1} \le t + M$. Hence by Corollary 3.3.3,

$$(\bar m(t) + 1)\mu_M \le t + M,$$

where $\mu_M = E[\bar X_n]$. Thus

$$\limsup_{t \to \infty} \frac{\bar m(t)}{t} \le \frac{1}{\mu_M}.$$

Now, since $\bar X_n \le X_n$, it follows that $\bar N(t) \ge N(t)$ and $\bar m(t) \ge m(t)$; thus

$$(3.3.5)\qquad \limsup_{t \to \infty} \frac{m(t)}{t} \le \frac{1}{\mu_M}.$$

Letting $M \to \infty$ yields

$$(3.3.6)\qquad \limsup_{t \to \infty} \frac{m(t)}{t} \le \frac{1}{\mu},$$

and the result follows from (3.3.4) and (3.3.6). When $\mu = \infty$, we again consider the truncated process; since $\mu_M \to \infty$ as $M \to \infty$, the result follows from (3.3.5).

Remark At first glance it might seem that the elementary renewal theorem should be a simple consequence of Proposition 3.3.1. That is, since the average renewal rate will, with probability 1, converge to $1/\mu$, should this not imply that the expected average renewal rate also converges to $1/\mu$? We must, however, be careful; consider the following example. Let $U$ be a random variable that is uniformly distributed on $(0, 1)$. Define the random variables $Y_n$, $n \ge 1$, by

$$Y_n = \begin{cases} 0 & \text{if } U > 1/n \\ n & \text{if } U \le 1/n. \end{cases}$$

Now, since, with probability 1, $U$ will be greater than 0, it follows that $Y_n$ will equal 0 for all sufficiently large $n$. That is, $Y_n$ will equal 0 for all $n$ large enough so that $1/n < U$. Hence, with probability 1,

$$Y_n \to 0 \quad \text{as } n \to \infty.$$

However, $E[Y_n] = n P\{U \le 1/n\} = 1$. Therefore, even though the sequence of random variables $Y_n$ converges to 0, the expected values of the $Y_n$ are all identically 1.

We will end this section by showing that $N(t)$ is asymptotically, as $t \to \infty$, normally distributed. To prove this result we make use both of the central limit theorem (to show that $S_n$ is asymptotically normal) and the relationship

$$(3.3.7)\qquad N(t) < n \iff S_n > t.$$

THEOREM 3.3.5 Let $\mu$ and $\sigma^2$, assumed finite, represent the mean and variance of an interarrival time. Then

$$P\left\{\frac{N(t) - t/\mu}{\sigma\sqrt{t/\mu^3}} < y\right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^y e^{-x^2/2}\, dx \quad \text{as } t \to \infty.$$

Proof Let $r_t = t/\mu + y\sigma\sqrt{t/\mu^3}$. Then

$$P\left\{\frac{N(t) - t/\mu}{\sigma\sqrt{t/\mu^3}} < y\right\} = P\{N(t) < r_t\} = P\{S_{r_t} > t\} \quad \text{(by (3.3.7))}$$
$$= P\left\{\frac{S_{r_t} - r_t\mu}{\sigma\sqrt{r_t}} > \frac{t - r_t\mu}{\sigma\sqrt{r_t}}\right\} = P\left\{\frac{S_{r_t} - r_t\mu}{\sigma\sqrt{r_t}} > -y\left(1 + \frac{y\sigma}{\sqrt{t\mu}}\right)^{-1/2}\right\}.$$

Now, by the central limit theorem, $(S_{r_t} - r_t\mu)/(\sigma\sqrt{r_t})$ converges to a normal random variable having mean 0 and variance 1 as $t$ (and thus $r_t$) approaches $\infty$. Also, since

$$-y\left(1 + \frac{y\sigma}{\sqrt{t\mu}}\right)^{-1/2} \to -y,$$

we see that

$$P\{N(t) < r_t\} \to \frac{1}{\sqrt{2\pi}}\int_{-y}^\infty e^{-x^2/2}\, dx,$$

and since $\int_{-y}^\infty e^{-x^2/2}\, dx = \int_{-\infty}^y e^{-x^2/2}\, dx$, the result follows.

Remarks (i) There is a slight difficulty in the above argument since $r_t$ should be an integer for us to use the relationship (3.3.7). It is not, however, too difficult to make the above argument rigorous.
(ii) Theorem 3.3.5 states that $N(t)$ is asymptotically normal with mean $t/\mu$ and variance $t\sigma^2/\mu^3$.
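A numerical illustration of Theorem 3.3.5, with parameters of ours: for uniform $(0, 2)$ interarrivals, $\mu = 1$ and $\sigma^2 = 1/3$, so $N(t)$ should be approximately normal with mean $t$ and variance $t/3$ for large $t$.

```python
import numpy as np

rng = np.random.default_rng(6)

t = 400.0
def n_t():
    s = n = 0.0, 0
    s, n = 0.0, 0
    while True:
        s += rng.uniform(0.0, 2.0)
        if s > t:
            return n
        n += 1

x = np.array([n_t() for _ in range(20_000)])
print(x.mean(), t)        # approximately 400 (= t / mu)
print(x.var(), t / 3)     # approximately 133.3 (= t sigma^2 / mu^3)
```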
3.4 THE KEY RENEWAL THEOREM AND APPLICATIONS

A nonnegative random variable $X$ is said to be lattice if there exists $d \ge 0$ such that $\sum_{n=0}^\infty P\{X = nd\} = 1$. That is, $X$ is lattice if it only takes on integral multiples of some nonnegative number $d$. The largest $d$ having this property is said to be the period of $X$. If $X$ is lattice and $F$ is the distribution function of $X$, then we say that $F$ is lattice. We shall state without proof the following theorem.

THEOREM 3.4.1 (Blackwell's Theorem).
(i) If $F$ is not lattice, then

$$m(t + a) - m(t) \to a/\mu \quad \text{as } t \to \infty, \text{ for all } a \ge 0.$$

(ii) If $F$ is lattice with period $d$, then

$$E[\text{number of renewals at } nd] \to d/\mu \quad \text{as } n \to \infty.$$

Thus Blackwell's theorem states that if $F$ is not lattice, then the expected number of renewals in an interval of length $a$, far from the origin, is approximately $a/\mu$. This is quite intuitive, for as we go further away from the origin it would seem that the initial effects wear away and thus

$$(3.4.1)\qquad g(a) \equiv \lim_{t \to \infty}[m(t + a) - m(t)]$$

should exist. However, if the above limit does in fact exist, then, as a simple consequence of the elementary renewal theorem, it must equal $a/\mu$. To see this note first that

$$g(a + h) = \lim_{t \to \infty}[m(t + a + h) - m(t)] = \lim_{t \to \infty}[m(t + a + h) - m(t + a) + m(t + a) - m(t)] = g(h) + g(a).$$

However, the only (increasing) solution of $g(a + h) = g(a) + g(h)$ is

$$g(a) = ca, \quad a > 0,$$

for some constant $c$. To show that $c = 1/\mu$ define

$$x_1 = m(1) - m(0),\ x_2 = m(2) - m(1),\ \ldots,\ x_n = m(n) - m(n - 1).$$

Then

$$\frac{x_1 + \cdots + x_n}{n} \to c,$$

implying that $m(n)/n \to c$. Hence, by the elementary renewal theorem, $c = 1/\mu$.

When $F$ is lattice with period $d$, then the limit in (3.4.1) cannot exist. For now renewals can only occur at integral multiples of $d$, and thus the expected number of renewals in an interval far from the origin would clearly depend not on the interval's length per se but rather on how many points of the form $nd$, $n \ge 0$, it contains. Thus in the lattice case the relevant limit is that of the expected number of renewals at $nd$ and, again, if $\lim_{n \to \infty} E[\text{number of renewals at } nd]$ exists, then by the elementary renewal theorem it must equal $d/\mu$. If interarrivals are always positive, then part (ii) of Blackwell's theorem states that, in the lattice case,

$$\lim_{n \to \infty} P\{\text{renewal at } nd\} = \frac{d}{\mu}.$$

Let $h$ be a function defined on $[0, \infty)$. For any $a > 0$ let $\bar m_n(a)$ be the supremum, and $\underline m_n(a)$ the infimum, of $h(t)$ over the interval $(n - 1)a \le t \le na$. We say that $h$ is directly Riemann integrable if $\sum_{n=1}^\infty \bar m_n(a)$ and $\sum_{n=1}^\infty \underline m_n(a)$ are finite for all $a > 0$ and

$$\lim_{a \to 0} a\sum_{n=1}^\infty \underline m_n(a) = \lim_{a \to 0} a\sum_{n=1}^\infty \bar m_n(a).$$

A sufficient condition for $h$ to be directly Riemann integrable is that:
(i) $h(t) \ge 0$ for all $t \ge 0$,
(ii) $h(t)$ is nonincreasing,
(iii) $\int_0^\infty h(t)\, dt < \infty$.

The following theorem, known as the key renewal theorem, will be stated without proof.

THEOREM 3.4.2 (The Key Renewal Theorem). If $F$ is not lattice, and if $h(t)$ is directly Riemann integrable, then

$$\lim_{t \to \infty}\int_0^t h(t - x)\, dm(x) = \frac{1}{\mu}\int_0^\infty h(t)\, dt,$$

where $m(x) = \sum_{n=1}^\infty F_n(x)$ and $\mu = \int_0^\infty \bar F(t)\, dt$.

To obtain a feel for the key renewal theorem, start with Blackwell's theorem and reason as follows. By Blackwell's theorem we have that

$$\lim_{t \to \infty} \frac{m(t + a) - m(t)}{a} = \frac{1}{\mu},$$

and hence

$$\lim_{a \to 0}\lim_{t \to \infty} \frac{m(t + a) - m(t)}{a} = \frac{1}{\mu}.$$

Now, assuming that we can justify interchanging the limits, we obtain

$$\lim_{t \to \infty} \frac{dm(t)}{dt} = \frac{1}{\mu}.$$

The key renewal theorem is a formalization of the above.

Blackwell's theorem and the key renewal theorem can be shown to be equivalent. Problem 3.12 asks the reader to deduce Blackwell from the key renewal theorem; and the reverse can be proven by approximating a directly Riemann integrable function with step functions. In Section 9.3 a probabilistic proof of Blackwell's theorem is presented when $F$ is continuous and has a failure rate function bounded away from 0 and $\infty$. The key renewal theorem is a very important and useful result.
It is used when one wants to compute the limiting value of $g(t)$, some probability or expectation at time $t$. The technique we shall employ for its use is to derive an equation for $g(t)$ by first conditioning on the time of the last renewal prior to (or at) $t$. This, as we will see, will yield an equation of the form

$$g(t) = h(t) + \int_0^t h(t - x)\, dm(x).$$

We start with a lemma that gives the distribution of $S_{N(t)}$, the time of the last renewal prior to (or at) time $t$.

Lemma 3.4.3

$$P\{S_{N(t)} \le s\} = \bar F(t) + \int_0^s \bar F(t - y)\, dm(y), \quad t \ge s \ge 0.$$

Proof

$$P\{S_{N(t)} \le s\} = \sum_{n=0}^\infty P\{S_n \le s, S_{n+1} > t\}$$
$$= \bar F(t) + \sum_{n=1}^\infty \int_0^s P\{S_n \le s, S_{n+1} > t \mid S_n = y\}\, dF_n(y)$$
$$= \bar F(t) + \sum_{n=1}^\infty \int_0^s \bar F(t - y)\, dF_n(y)$$
$$= \bar F(t) + \int_0^s \bar F(t - y)\, d\Big(\sum_{n=1}^\infty F_n(y)\Big)$$
$$= \bar F(t) + \int_0^s \bar F(t - y)\, dm(y),$$

where the interchange of integral and summation is justified since all terms are nonnegative.

Remarks
(1) It follows from Lemma 3.4.3 that

$$P\{S_{N(t)} = 0\} = \bar F(t), \qquad dF_{S_{N(t)}}(y) = \bar F(t - y)\, dm(y), \quad 0 < y \le t.$$

(2) To obtain an intuitive feel for the above, suppose that $F$ is continuous with density $f$. Then $m(y) = \sum_{n=1}^\infty F_n(y)$, and so, for $y > 0$,

$$dm(y) = \sum_{n=1}^\infty f_n(y)\, dy = \sum_{n=1}^\infty P\{n\text{th renewal occurs in } (y, y + dy)\} = P\{\text{renewal occurs in } (y, y + dy)\}.$$

So, the probability density of $S_{N(t)}$ is

$$f_{S_{N(t)}}(y)\, dy = P\{\text{renewal in } (y, y + dy), \text{ next interarrival} > t - y\} = dm(y)\bar F(t - y).$$

We now present some examples of the utility of the key renewal theorem. Once again the technique we employ will be to condition on $S_{N(t)}$.

3.4.1 Alternating Renewal Processes

Consider a system that can be in one of two states: on or off. Initially it is on and it remains on for a time $Z_1$; it then goes off and remains off for a time $Y_1$; it then goes on for a time $Z_2$, then off for a time $Y_2$; then on, and so forth. We suppose that the random vectors $(Z_n, Y_n)$, $n \ge 1$, are independent and identically distributed. Hence, both the sequence of random variables $\{Z_n\}$ and the sequence $\{Y_n\}$ are independent and identically distributed; but we allow $Z_n$ and $Y_n$ to be dependent. In other words, each time the process goes on everything starts over again, but when it goes off we allow the length of the off time to depend on the previous on time.

Let $H$ be the distribution of $Z_n$, $G$ the distribution of $Y_n$, and $F$ the distribution of $Z_n + Y_n$, $n \ge 1$. Furthermore, let $P(t) = P\{\text{system is on at time } t\}$.

THEOREM 3.4.4 If $E[Z_n + Y_n] < \infty$ and $F$ is nonlattice, then

$$\lim_{t \to \infty} P(t) = \frac{E[Z]}{E[Z] + E[Y]}.$$

Proof Say that a renewal takes place each time the system goes on. Conditioning on the time of the last renewal prior to (or at) $t$ yields

$$P(t) = P\{\text{on at } t \mid S_{N(t)} = 0\}P\{S_{N(t)} = 0\} + \int_0^t P\{\text{on at } t \mid S_{N(t)} = y\}\, dF_{S_{N(t)}}(y).$$

Now

$$P\{\text{on at } t \mid S_{N(t)} = 0\} = P\{Z_1 > t \mid Z_1 + Y_1 > t\} = \bar H(t)/\bar F(t),$$

and, for $y \le t$,

$$P\{\text{on at } t \mid S_{N(t)} = y\} = P\{Z > t - y \mid Z + Y > t - y\} = \bar H(t - y)/\bar F(t - y).$$

Hence, using Lemma 3.4.3,

$$P(t) = \bar H(t) + \int_0^t \bar H(t - y)\, dm(y),$$

where $m(y) = \sum_{n=1}^\infty F_n(y)$. Now $\bar H(t)$ is clearly nonnegative, nonincreasing, and $\int_0^\infty \bar H(t)\, dt = E[Z] < \infty$. Since this last statement also implies that $\bar H(t) \to 0$ as $t \to \infty$, we have, upon application of the key renewal theorem,

$$\lim_{t \to \infty} P(t) = \frac{\int_0^\infty \bar H(t)\, dt}{E[Z + Y]} = \frac{E[Z]}{E[Z] + E[Y]}.$$

If we let $Q(t) = P\{\text{off at } t\} = 1 - P(t)$, then

$$Q(t) \to \frac{E[Y]}{E[Z] + E[Y]}.$$

We note that the fact that the system was initially on makes no difference in the limit.

Theorem 3.4.4 is quite important because many systems can be modelled by an alternating renewal process.
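A quick check of Theorem 3.4.4 by simulation, under illustrative assumptions of ours: exponential on times with mean 2 and uniform $(0, 1)$ off times, so the predicted limit is $2/2.5 = 0.8$.

```python
import numpy as np

rng = np.random.default_rng(7)

def state_at(t):
    """Return 1 if the alternating renewal system is on at time t, else 0."""
    clock = 0.0
    while True:
        z, y = rng.exponential(2.0), rng.uniform(0.0, 1.0)   # on, off durations
        if clock + z > t:
            return 1           # t falls in an on period
        if clock + z + y > t:
            return 0           # t falls in an off period
        clock += z + y

print(np.mean([state_at(50.0) for _ in range(20_000)]))   # approximately 0.8
```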
For instance, consider a renewal process and let $Y(t)$ denote the time from $t$ until the next renewal, and let $A(t)$ be the time from $t$ since the last renewal. That is,

$$Y(t) = S_{N(t)+1} - t, \qquad A(t) = t - S_{N(t)}.$$

$Y(t)$ is called the excess or residual life at $t$, and $A(t)$ is called the age at $t$. If we imagine that the renewal process is generated by putting an item in use and then replacing it upon failure, then $A(t)$ would represent the age of the item in use at time $t$ and $Y(t)$ its remaining life.

Suppose we want to derive $P\{A(t) \le x\}$. To do so let an on-off cycle correspond to a renewal and say that the system is "on" at time $t$ if the age at $t$ is less than or equal to $x$. In other words, the system is "on" the first $x$ units of a renewal interval and "off" the remaining time. Then, if the renewal distribution is not lattice, we have by Theorem 3.4.4 that

$$\lim_{t \to \infty} P\{A(t) \le x\} = E[\min(X, x)]/E[X] = \int_0^\infty P\{\min(X, x) > y\}\, dy \big/ E[X] = \int_0^x \bar F(y)\, dy/\mu.$$

Similarly, to obtain the limiting value of $P\{Y(t) \le x\}$, say that the system is "off" the last $x$ units of a renewal cycle and "on" otherwise. Thus the off time in a cycle is $\min(x, X)$, and so

$$\lim_{t \to \infty} P\{Y(t) \le x\} = \lim_{t \to \infty} P\{\text{off at } t\} = E[\min(x, X)]/E[X] = \int_0^x \bar F(y)\, dy/\mu.$$

Thus summing up we have proven the following.

PROPOSITION 3.4.5 If the interarrival distribution is nonlattice and $\mu < \infty$, then

$$\lim_{t \to \infty} P\{Y(t) \le x\} = \lim_{t \to \infty} P\{A(t) \le x\} = \int_0^x \bar F(y)\, dy/\mu.$$

Remark To understand why the limiting distributions of excess and age are identical, consider the process after it has been in operation for a long time; for instance, suppose it started at $t = -\infty$. Then if we look backwards in time, the time between successive events will still be independent and have distribution $F$. Hence, looking backwards we see an identically distributed renewal process. But looking backwards the excess life at $t$ is exactly the age at $t$ of the original process. We will find this technique of looking backward in time to be quite valuable in our studies of Markov chains in Chapters 4 and 5. (See Problem 3.14 for another way of relating the distributions of excess and age.)

Another random variable of interest is $X_{N(t)+1} = S_{N(t)+1} - S_{N(t)}$, or, equivalently, $X_{N(t)+1} = A(t) + Y(t)$. Thus $X_{N(t)+1}$ represents the length of the renewal interval that contains the point $t$. In Problem 3.3 we prove that

$$P\{X_{N(t)+1} > x\} \ge \bar F(x).$$

That is, for any $x$ it is more likely that the length of the interval containing the point $t$ is greater than $x$ than it is that an ordinary renewal interval is greater than $x$. This result, which at first glance may seem surprising, is known as the inspection paradox.

We will now use alternating renewal process theory to obtain the limiting distribution of $X_{N(t)+1}$. Again let an on-off cycle correspond to a renewal interval, and say that the on time in the cycle is the total cycle time if that time is greater than $x$ and is zero otherwise. That is, the system is either totally on during a cycle (if the renewal interval is greater than $x$) or totally off otherwise. Then

$$P\{X_{N(t)+1} > x\} = P\{\text{length of renewal interval containing } t > x\} = P\{\text{on at time } t\}.$$

Thus by Theorem 3.4.4, provided $F$ is not lattice, we obtain

$$\lim_{t \to \infty} P\{X_{N(t)+1} > x\} = \frac{E[\text{on time in cycle}]}{\mu} = E[X \mid X > x]\bar F(x)/\mu = \int_x^\infty y\, dF(y)/\mu,$$

or, equivalently,

$$(3.4.2)\qquad \lim_{t \to \infty} P\{X_{N(t)+1} \le x\} = \int_0^x y\, dF(y)/\mu.$$
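The inspection paradox is easy to see numerically. For exponential(1) interarrivals, $E[X] = 1$, but by (3.4.2) the limiting mean of the interval covering $t$ is $\int_0^\infty y^2\, dF(y)/\mu = E[X^2]/\mu = 2$. A sketch with parameters of ours:

```python
import numpy as np

rng = np.random.default_rng(8)

def covering_interval(t):
    """Length of the renewal interval containing the point t."""
    s = 0.0
    while True:
        x = rng.exponential(1.0)
        if s + x > t:
            return x
        s += x

# Ordinary intervals have mean 1; the one covering t = 30 has mean about 2.
print(np.mean([covering_interval(30.0) for _ in range(50_000)]))
```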
Remark To better understand the inspection paradox, reason as follows. Since the line is covered by renewal intervals, is it not more likely that a larger interval—as opposed to a shorter one—covers the point $t$? In fact, in the limit (as $t \to \infty$) it is exactly true that an interval of length $y$ is $y$ times more likely to cover $t$ than one of length 1. For if this were the case, then the density of the interval containing the point $t$, call it $g$, would be $g(y) = y\, dF(y)/c$ (since $dF(y)$ is the probability that an arbitrary interval is of length $y$ and $y/c$ the conditional probability that it contains the point; the normalizing constant is $c = \mu$). But by (3.4.2) we see that this is indeed the limiting density.

For another illustration of the varied uses of alternating renewal processes consider the following example.

EXAMPLE 3.4(A) An Inventory Example. Suppose that customers arrive at a store, which sells a single type of commodity, in accordance with a renewal process having nonlattice interarrival distribution $F$. The amounts desired by the customers are assumed to be independent with a common distribution $G$. The store uses the following $(s, S)$ ordering policy: if the inventory level after serving a customer is below $s$, then an order is placed to bring it up to $S$; otherwise no order is placed. Thus if the inventory level after serving a customer is $x$, then the amount ordered is

$$S - x \text{ if } x < s, \qquad 0 \text{ if } x \ge s.$$

The order is assumed to be instantaneously filled.

Let $X(t)$ denote the inventory level at time $t$, and suppose we desire $\lim_{t \to \infty} P\{X(t) \ge x\}$. If $X(0) = S$, then if we say that the system is "on" whenever the inventory level is at least $x$ and "off" otherwise, the above is just an alternating renewal process. Hence, from Theorem 3.4.4,

$$\lim_{t \to \infty} P\{X(t) \ge x\} = \frac{E[\text{amount of time the inventory} \ge x \text{ in a cycle}]}{E[\text{time of a cycle}]}.$$

Now if we let $Y_1, Y_2, \ldots$ denote the successive customer demands and

$$(3.4.3)\qquad N_x = \min\{n: Y_1 + \cdots + Y_n > S - x\},$$

then it is the $N_x$th customer in the cycle that causes the inventory level to fall below $x$, and it is the $N_s$th customer that ends the cycle. Hence, if $X_i$, $i \ge 1$, denote the interarrival times of customers, then

$$\text{amount of "on" time in cycle} = \sum_{i=1}^{N_x} X_i, \qquad \text{time of cycle} = \sum_{i=1}^{N_s} X_i.$$

Assuming that the interarrival times are independent of the successive demands, we thus have, upon taking expectations and using Wald's equation,

$$(3.4.4)\qquad \lim_{t \to \infty} P\{X(t) \ge x\} = \frac{E\big[\sum_{i=1}^{N_x} X_i\big]}{E\big[\sum_{i=1}^{N_s} X_i\big]} = \frac{E[N_x]}{E[N_s]}.$$

However, as the $Y_i$, $i \ge 1$, are independent and identically distributed, it follows from (3.4.3) that we can interpret $N_x - 1$ as the number of renewals by time $S - x$ of a renewal process with interarrival times $Y_i$, $i \ge 1$. Hence,

$$E[N_x] = m_G(S - x) + 1, \qquad E[N_s] = m_G(S - s) + 1,$$

where $G$ is the customer demand distribution and $m_G(t) = \sum_{n=1}^\infty G_n(t)$. Hence, from (3.4.4), we arrive at

$$\lim_{t \to \infty} P\{X(t) \ge x\} = \frac{1 + m_G(S - x)}{1 + m_G(S - s)}, \quad s \le x \le S.$$
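A quick numerical check of this formula, assuming exponential(rate $\nu$) demands—for which $m_G(t) = \nu t$, since the demand renewal process is then Poisson—and illustrative parameter values of ours. Because the interarrival times are independent of the demands, the long-run fraction of interarrival intervals during which the level is at least $x$ equals the long-run fraction of time, so it suffices to track the level seen between successive customers:

```python
import numpy as np

rng = np.random.default_rng(9)

nu, s, S, x = 1.0, 2.0, 10.0, 6.0   # predicted limit: (1 + 4) / (1 + 8) = 5/9
level, count, N = S, 0, 200_000
for _ in range(N):
    level -= rng.exponential(1.0 / nu)   # serve one customer's demand
    if level < s:
        level = S                        # (s, S) policy: reorder up to S
    count += (level >= x)                # level prevailing until next arrival
print(count / N, (1 + nu * (S - x)) / (1 + nu * (S - s)))   # both about 0.556
```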
3.4.2 Limiting Mean Excess and the Expansion of m(t)

Let us start by computing the mean excess of a nonlattice renewal process. Conditioning on $S_{N(t)}$ yields (by Lemma 3.4.3)

$$E[Y(t)] = E[Y(t) \mid S_{N(t)} = 0]\bar F(t) + \int_0^t E[Y(t) \mid S_{N(t)} = y]\bar F(t - y)\, dm(y).$$

[Figure 3.4.1: given $S_{N(t)} = y$, the next interarrival time $X$ exceeds $t - y$, and the excess $Y(t)$ runs from $t$ to $y + X$; x's mark renewals.]

Now,

$$E[Y(t) \mid S_{N(t)} = 0] = E[X - t \mid X > t], \qquad E[Y(t) \mid S_{N(t)} = y] = E[X - (t - y) \mid X > t - y],$$

where the above follows since $S_{N(t)} = y$ means that there is a renewal at $y$ and the next interarrival time—call it $X$—is greater than $t - y$ (see Figure 3.4.1). Hence,

$$E[Y(t)] = E[X - t \mid X > t]\bar F(t) + \int_0^t E[X - (t - y) \mid X > t - y]\bar F(t - y)\, dm(y).$$

Now it can be shown that the function $h(t) = E[X - t \mid X > t]\bar F(t)$ is directly Riemann integrable provided $E[X^2] < \infty$, and so by the key renewal theorem

$$E[Y(t)] \to \int_0^\infty E[X - t \mid X > t]\bar F(t)\, dt \big/ \mu = \int_0^\infty \int_t^\infty (x - t)\, dF(x)\, dt \big/ \mu$$
$$= \int_0^\infty \int_0^x (x - t)\, dt\, dF(x) \big/ \mu \quad \text{(by interchange of order of integration)}$$
$$= \int_0^\infty x^2\, dF(x)/2\mu = E[X^2]/2\mu.$$

Thus we have proven the following.

PROPOSITION 3.4.6 If the interarrival distribution is nonlattice and $E[X^2] < \infty$, then

$$\lim_{t \to \infty} E[Y(t)] = E[X^2]/2\mu.$$

Now $S_{N(t)+1}$, the time of the first renewal after $t$, can be expressed as

$$S_{N(t)+1} = t + Y(t).$$

Taking expectations and using Corollary 3.3.3, we have

$$\mu(m(t) + 1) = t + E[Y(t)],$$

or

$$m(t) - \frac{t}{\mu} = \frac{E[Y(t)]}{\mu} - 1.$$

Hence, from Proposition 3.4.6 we obtain the following.

Corollary 3.4.7 If $E[X^2] < \infty$ and $F$ is nonlattice, then

$$m(t) - \frac{t}{\mu} \to \frac{E[X^2]}{2\mu^2} - 1 \quad \text{as } t \to \infty.$$

3.4.3 Age-Dependent Branching Processes

Suppose that an organism at the end of its lifetime produces a random number of offspring in accordance with the probability distribution $\{p_j, j = 0, 1, 2, \ldots\}$. Assume further that all offspring act independently of each other and produce their own offspring in accordance with the same probability distribution $\{p_j\}$. Finally, let us assume that the lifetimes of the organisms are independent random variables with some common distribution $F$. Let $X(t)$ denote the number of organisms alive at $t$. The stochastic process $\{X(t), t \ge 0\}$ is called an age-dependent branching process. We shall concern ourselves with determining the asymptotic form of $M(t) = E[X(t)]$, when $m = \sum_j j p_j > 1$.

THEOREM 3.4.8 If $X_0 = 1$, $m > 1$, and $F$ is not lattice, then

$$M(t)e^{-\alpha t} \to \frac{m - 1}{\alpha m^2 \int_0^\infty x e^{-\alpha x}\, dF(x)} \quad \text{as } t \to \infty,$$

where $\alpha$ is the unique positive number such that

$$\int_0^\infty e^{-\alpha x}\, dF(x) = \frac{1}{m}.$$

Proof By conditioning on $T_1$, the lifetime of the initial organism, we obtain

$$M(t) = \int_0^\infty E[X(t) \mid T_1 = s]\, dF(s).$$

However,

$$(3.4.5)\qquad E[X(t) \mid T_1 = s] = \begin{cases} 1 & \text{if } s > t \\ m M(t - s) & \text{if } s \le t. \end{cases}$$

To see why (3.4.5) is true, suppose that $T_1 = s$, $s \le t$, and suppose further that the organism has $j$ offspring. Then the number of organisms alive at $t$ may be written as $Y_1 + \cdots + Y_j$, where $Y_i$ is the number of descendants (including itself) of the $i$th offspring that are alive at $t$. Clearly, $Y_1, \ldots, Y_j$ are independent with the same distribution as $X(t - s)$. Thus, $E[Y_1 + \cdots + Y_j] = jM(t - s)$, and (3.4.5) follows by taking the expectation (with respect to $j$) of $jM(t - s)$. Thus, from the above we obtain

$$(3.4.6)\qquad M(t) = \bar F(t) + m\int_0^t M(t - s)\, dF(s).$$

Now, let $\alpha$ denote the unique positive number such that $\int_0^\infty e^{-\alpha x}\, dF(x) = 1/m$, and define the distribution $G$ by

$$G(s) = m\int_0^s e^{-\alpha y}\, dF(y), \quad 0 \le s < \infty$$

($G$ is a distribution since $G(\infty) = m \cdot 1/m = 1$). Upon multiplying both sides of (3.4.6) by $e^{-\alpha t}$ and using the fact that $dG(s) = m e^{-\alpha s}\, dF(s)$, we obtain

$$(3.4.7)\qquad e^{-\alpha t}M(t) = e^{-\alpha t}\bar F(t) + \int_0^t e^{-\alpha(t - s)}M(t - s)\, dG(s).$$

Letting $f(t) = e^{-\alpha t}M(t)$ and $h(t) = e^{-\alpha t}\bar F(t)$, we have from (3.4.7), in convolution notation,
$$f = h + f * G = h + G * h + G_2 * f = h + G * h + G_2 * h + \cdots + G_n * h + G_{n+1} * f.$$

Now since $G$ is the distribution of some nonnegative random variable, it follows that $G_{n+1}(t) \to 0$ as $n \to \infty$ for each fixed $t$ (why?), and so letting $n \to \infty$ in the above yields

$$f = h + h * m_G, \quad \text{or} \quad f(t) = h(t) + \int_0^t h(t - s)\, dm_G(s),$$

where $m_G(t) = \sum_{n=1}^\infty G_n(t)$. Now it can be shown that $h(t)$ is directly Riemann integrable, and thus by the key renewal theorem

$$(3.4.8)\qquad f(t) \to \frac{\int_0^\infty h(t)\, dt}{\mu_G} = \frac{\int_0^\infty e^{-\alpha t}\bar F(t)\, dt}{\int_0^\infty x\, dG(x)}.$$

Now

$$(3.4.9)\qquad \int_0^\infty e^{-\alpha t}\bar F(t)\, dt = \int_0^\infty e^{-\alpha t}\int_t^\infty dF(x)\, dt = \int_0^\infty \int_0^x e^{-\alpha t}\, dt\, dF(x) = \frac{1}{\alpha}\int_0^\infty (1 - e^{-\alpha x})\, dF(x) = \frac{1}{\alpha}\left(1 - \frac{1}{m}\right)$$

(by the definition of $\alpha$). Also,

$$(3.4.10)\qquad \int_0^\infty x\, dG(x) = m\int_0^\infty x e^{-\alpha x}\, dF(x).$$

Thus from (3.4.8), (3.4.9), and (3.4.10) we obtain

$$M(t)e^{-\alpha t} \to \frac{m - 1}{\alpha m^2 \int_0^\infty x e^{-\alpha x}\, dF(x)} \quad \text{as } t \to \infty.$$
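In general $\alpha$ (the Malthusian parameter) must be found numerically from $\int_0^\infty e^{-\alpha x}\, dF(x) = 1/m$. A sketch for exponential lifetimes, where the transform is $\lambda/(\lambda + \alpha)$ and the root is known in closed form ($\alpha = \lambda(m - 1)$), making it a convenient test case; the parameters are ours:

```python
from scipy.optimize import brentq

# Exponential(rate lam) lifetimes: int e^{-ax} dF(x) = lam / (lam + a).
lam, m = 1.0, 2.0
lhs = lambda a: lam / (lam + a) - 1.0 / m       # zero at the Malthusian alpha
alpha = brentq(lhs, 1e-9, 100.0)                # bracketed root search
print(alpha, lam * (m - 1))                     # both equal 1.0
```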
3.5 DELAYED RENEWAL PROCESSES

We often consider a counting process for which the first interarrival time has a different distribution from the remaining ones. For instance, we might start observing a renewal process at some time $t > 0$. If a renewal does not occur at $t$, then the distribution of the time we must wait until the first observed renewal will not be the same as the remaining interarrival distributions.

Formally, let $\{X_n, n = 1, 2, \ldots\}$ be a sequence of independent nonnegative random variables with $X_1$ having distribution $G$, and $X_n$ having distribution $F$, $n > 1$. Let $S_0 = 0$, $S_n = \sum_{i=1}^n X_i$, $n \ge 1$, and define

$$N_D(t) = \sup\{n: S_n \le t\}.$$

Definition The stochastic process $\{N_D(t), t \ge 0\}$ is called a general or a delayed renewal process.

When $G = F$, we have, of course, an ordinary renewal process. As in the ordinary case, we have

$$P\{N_D(t) = n\} = P\{S_n \le t\} - P\{S_{n+1} \le t\} = G * F_{n-1}(t) - G * F_n(t).$$

Let $m_D(t) = E[N_D(t)]$. Then it is easy to show that

$$(3.5.1)\qquad m_D(t) = \sum_{n=1}^\infty G * F_{n-1}(t),$$

and by taking transforms of (3.5.1), we obtain

$$(3.5.2)\qquad \tilde m_D(s) = \frac{\tilde G(s)}{1 - \tilde F(s)}.$$

By using the corresponding results for the ordinary renewal process, it is easy to prove similar limit theorems for the delayed process. We leave the proof of the following proposition for the reader. Let $\mu = \int_0^\infty x\, dF(x)$.

PROPOSITION 3.5.1
(i) With probability 1, $N_D(t)/t \to 1/\mu$ as $t \to \infty$.
(ii) $m_D(t)/t \to 1/\mu$ as $t \to \infty$.
(iii) If $F$ is not lattice, then $m_D(t + a) - m_D(t) \to a/\mu$.
(iv) If $F$ and $G$ are lattice with period $d$, then $E[\text{number of renewals at } nd] \to d/\mu$ as $n \to \infty$.
(v) If $F$ is not lattice, $\mu < \infty$, and $h$ is directly Riemann integrable, then

$$\int_0^t h(t - x)\, dm_D(x) \to \int_0^\infty h(t)\, dt \big/ \mu.$$

EXAMPLE 3.5(A) Suppose that a sequence of independent and identically distributed discrete random variables $X_1, X_2, \ldots$ is observed, and suppose that we keep track of the number of times that a given subsequence of outcomes, or pattern, occurs. That is, suppose the pattern is $x_1, x_2, \ldots, x_k$ and say that it occurs at time $n$ if $X_n = x_k, X_{n-1} = x_{k-1}, \ldots, X_{n-k+1} = x_1$. For instance, if the pattern is 0, 1, 0, 1 and the sequence is $(X_1, X_2, \ldots) = (1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, \ldots)$, then the pattern occurred at times 5, 7, 13. If we let $N(n)$ denote the number of times the pattern occurs by time $n$, then $\{N(n), n \ge 1\}$ is a delayed renewal process. The distribution until the first renewal is the distribution of the time until the pattern first occurs, whereas the subsequent interarrival distribution is the distribution of time between replications of the pattern.

Suppose we want to determine the rate at which the pattern occurs. By the strong law for delayed renewal processes (part (i) of Proposition 3.5.1) this will equal the reciprocal of the mean time between patterns. But by Blackwell's theorem (part (iv) of Proposition 3.5.1) this is just the limiting probability of a renewal at time $n$. That is,

$$(E[\text{time between patterns}])^{-1} = \lim_{n \to \infty} P\{\text{pattern at time } n\} = \prod_{i=1}^k P\{X = x_i\}.$$

Hence the rate at which the pattern occurs is $\prod_{i=1}^k P\{X = x_i\}$ and the mean time between patterns is $(\prod_{i=1}^k P\{X = x_i\})^{-1}$. For instance, if each random variable is 1 with probability $p$ and 0 with probability $q$, then the mean time between patterns of 0, 1, 0, 1 is $p^{-2}q^{-2}$.

Suppose now that we are interested in the expected time at which the pattern 0, 1, 0, 1 first occurs. Since the expected time to go from 0, 1, 0, 1 to 0, 1, 0, 1 is $p^{-2}q^{-2}$, it follows that starting with 0, 1 the expected number of additional outcomes to obtain 0, 1, 0, 1 is $p^{-2}q^{-2}$. But since in order for the pattern 0, 1, 0, 1 to occur we must first obtain 0, 1, it follows that

$$E[\text{time to } 0,1,0,1] = E[\text{time to } 0,1] + p^{-2}q^{-2}.$$

By using the same logic on the pattern 0, 1 we see that the expected time between occurrences of this pattern is $1/(pq)$, and as this is equal to the expected time of its first occurrence we obtain that

$$E[\text{time to } 0,1,0,1] = (pq)^{-1} + p^{-2}q^{-2}.$$

The above argument can be used to compute the expected number of outcomes needed for any specified pattern to appear. For instance, if a coin with probability $p$ of coming up heads is successively flipped, then

$$E[\text{time until } HTHHTHH] = E[\text{time until } HTHH] + p^{-5}q^{-2} = E[\text{time until } H] + p^{-3}q^{-1} + p^{-5}q^{-2} = p^{-1} + p^{-3}q^{-1} + p^{-5}q^{-2}.$$

Also, by the same reasoning,

$$E[\text{time until } k \text{ consecutive heads appear}] = (1/p)^k + E[\text{time until } k - 1 \text{ consecutive heads appear}] = \sum_{i=1}^k (1/p)^i.$$

Suppose now that we want to compute the probability that a given pattern, say pattern $A$, occurs before a second pattern, say pattern $B$. For instance, consider independent flips of a coin that lands on heads with probability $p$, and suppose we are interested in the probability that $A = HTHT$ occurs before $B = THTT$. To obtain this probability we will find it useful to first consider the expected additional time, after a given one of these patterns occurs, until the other one does. Let $N_{B|A}$ denote the number of additional flips needed for $B$ to appear starting with $A$, and similarly for $N_{A|B}$. Also let $N_A$ denote the number of flips until $A$ occurs. Then

$$E[N_{B|A}] = E[\text{additional number to } THTT \text{ starting with } HTHT] = E[\text{additional number to } THTT \text{ starting with } THT].$$

But since

$$E[N_{THTT}] = E[N_{THT}] + E[N_{THTT|THT}],$$

we see that

$$E[N_{B|A}] = E[N_{THTT}] - E[N_{THT}].$$

But

$$E[N_{THTT}] = E[N_T] + p^{-1}q^{-3} = q^{-1} + p^{-1}q^{-3}, \qquad E[N_{THT}] = E[N_T] + p^{-1}q^{-2} = q^{-1} + p^{-1}q^{-2},$$

and so

$$E[N_{B|A}] = p^{-1}q^{-3} - p^{-1}q^{-2}.$$

Also, since no suffix of $B = THTT$ is a prefix of $A = HTHT$,

$$E[N_{A|B}] = E[N_A] = p^{-2}q^{-2} + p^{-1}q^{-1}.$$

To compute $P_A = P\{A \text{ before } B\}$, let $M = \min(N_A, N_B)$. Then

$$E[N_A] = E[M] + E[N_A - M] = E[M] + E[N_A - M \mid B \text{ before } A](1 - P_A) = E[M] + E[N_{A|B}](1 - P_A).$$

Similarly,

$$E[N_B] = E[M] + E[N_{B|A}]P_A.$$

Solving these equations yields

$$P_A = \frac{E[N_B] + E[N_{A|B}] - E[N_A]}{E[N_{B|A}] + E[N_{A|B}]}, \qquad E[M] = E[N_B] - E[N_{B|A}]P_A.$$

For instance, suppose that $p = 1/2$. Then as

$$E[N_{A|B}] = E[N_A] = 2^4 + 2^2 = 20, \qquad E[N_B] = 2 + 2^4 = 18, \qquad E[N_{B|A}] = 2^4 - 2^3 = 8,$$

we obtain that

$$P_A = \frac{18 + 20 - 20}{8 + 20} = 9/14, \qquad E[M] = 18 - 8 \cdot \frac{9}{14} = 90/7.$$

Therefore, the expected number of flips until the pattern $A$ appears is 20, the expected number until the pattern $B$ appears is 18, the expected number until either appears is 90/7, and the probability that pattern $A$ appears first is 9/14 (which is somewhat counterintuitive since $E[N_A] > E[N_B]$).
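These pattern identities are easy to check by brute-force simulation. A minimal sketch for the $p = 1/2$ case; the helper function and all choices are ours, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(10)

def first_hit(pattern, stop_other=None):
    """Flip a fair coin until `pattern` (or `stop_other`) first appears.
    Returns (number of flips, True if `pattern` appeared first)."""
    seq = ""
    while True:
        seq += "HT"[rng.integers(0, 2)]
        if seq.endswith(pattern):
            return len(seq), True
        if stop_other and seq.endswith(stop_other):
            return len(seq), False

ta = [first_hit("HTHT")[0] for _ in range(30_000)]
race = [first_hit("HTHT", "THTT")[1] for _ in range(30_000)]
print(np.mean(ta))     # approximately 20 = E[N_A]
print(np.mean(race))   # approximately 9/14 = 0.643 = P_A
```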
EXAMPLE 3.5(B) A system consists of n independent components, each of which acts like an exponential alternating renewal process. More specifically, component i, i = 1, ..., n, is up for an exponential time with mean λ_i and then goes down, in which state it remains for an exponential time with mean μ_i, before going back up and starting anew. Suppose that the system is said to be functional at any time if at least one component is up at that time (such a system is called parallel). If we let N(t) denote the number of times the system becomes nonfunctional (that is, breaks down) in [0, t], then {N(t), t ≥ 0} is a delayed renewal process.

Suppose we want to compute the mean time between system breakdowns. To do so let us first look at the probability of a breakdown in (t, t + h) for large t and small h. Now one way for a breakdown to occur in (t, t + h) is to have exactly one component up at time t and all others down, and then have that component fail. Since all other possibilities taken together clearly have probability o(h), we see that

lim_{t→∞} P{breakdown in (t, t + h)} = Σ_{i=1}^n [λ_i/(λ_i + μ_i)] [Π_{l≠i} μ_l/(λ_l + μ_l)] (1/λ_i) h + o(h).

But by Blackwell's theorem the above is just h times the reciprocal of the mean time between breakdowns, and so upon letting h → 0 we obtain

E[time between breakdowns] = (Π_{l=1}^n [μ_l/(λ_l + μ_l)] Σ_{i=1}^n 1/μ_i)^{-1}.

As the expected length of a breakdown period is (Σ_{i=1}^n 1/μ_i)^{-1}, we can compute the average length of an up (or functional) period from

E[length of up period] = E[time between breakdowns] - (Σ_{i=1}^n 1/μ_i)^{-1}.

As a check of the above, note that the system may be regarded as a delayed alternating renewal process whose limiting probability of being down is

lim_{t→∞} P{system is down at t} = Π_{i=1}^n μ_i/(λ_i + μ_i).

We can now verify that the above is indeed equal to the expected length of a down period divided by the expected length of a cycle (or time between breakdowns).
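The breakdown-rate formula can also be checked by simulation. Below is a rough Python sketch (an illustration only; the rates are arbitrary choices, not from the text) that simulates a two-component parallel system and compares the observed mean time between breakdowns with the formula above.

```python
import random

# Simulation sketch of Example 3.5(B) with n = 2 components: estimate the
# mean time between system breakdowns and compare with
# (prod_l mu_l/(lambda_l+mu_l) * sum_i 1/mu_i)^(-1).

random.seed(2)
lam = [3.0, 2.0]   # mean up times
mu = [1.0, 0.5]    # mean down times
T = 200_000.0

t = 0.0
up = [True, True]
next_change = [random.expovariate(1 / lam[i]) for i in range(2)]
breakdowns = 0
while t < T:
    i = 0 if next_change[0] < next_change[1] else 1
    t = next_change[i]
    up[i] = not up[i]
    if not any(up):                 # the second component just went down
        breakdowns += 1
    mean = lam[i] if up[i] else mu[i]
    next_change[i] = t + random.expovariate(1 / mean)

prod = 1.0
for i in range(2):
    prod *= mu[i] / (lam[i] + mu[i])
theory = 1 / (prod * sum(1 / m for m in mu))
print(T / breakdowns, theory)       # the two numbers should nearly agree
```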
EXAMPLE 3.5(C) Consider two coins, and suppose that each time coin i is flipped it lands on tails with some unknown probability p_i, i = 1, 2. Our objective is to continually flip among these coins so as to make the long-run proportion of tails equal to min(p_1, p_2). The following strategy, having a very small memory requirement, will accomplish this objective. Start by flipping coin 1 until a tail occurs, at which point switch to coin 2 and flip it until a tail occurs. Say that cycle 1 ends at this point. Now flip coin 1 until two tails in a row occur, and then switch and do the same with coin 2. Say that cycle 2 ends at this point. In general, when cycle n ends, return to coin 1 and flip it until n + 1 tails in a row occur and then flip coin 2 until this occurs, which ends cycle n + 1.

To show that the preceding policy meets our objective, let p = max(p_1, p_2) and αp = min(p_1, p_2), where α ≤ 1 (if α = 1 then all policies meet the objective). Call the coin with tail probability p the bad coin and the one with tail probability αp the good one. Let B_m denote the number of flips in cycle m that use the bad coin and let G_m be the number that use the good coin. We will need the following lemma.

Lemma For any ε > 0,

P{B_m ≥ εG_m for infinitely many m} = 0.

Proof We will show that

Σ_{m=1}^∞ P{B_m ≥ εG_m} < ∞,

which, by the Borel-Cantelli lemma (see Section 1.1), will establish the result. Now, since B_m and G_m are independent,

P{B_m ≥ εG_m} = Σ_i P{B_m ≥ εi}P{G_m = i} ≤ (αp)^m Σ_i P{B_m ≥ εi} ≤ (αp)^m (1 + E[B_m]/ε),

where the preceding inequality follows from the fact that G_m = i implies that i ≥ m and that good flips numbered i - m + 1 to i of cycle m must all be tails, so that P{G_m = i} ≤ (αp)^m. Thus, we see that

Σ_m P{B_m ≥ εG_m} ≤ Σ_m (αp)^m (1 + E[B_m]/ε).

But, from Example 3.3(A),

E[B_m] = Σ_{r=1}^m (1/p)^r = ((1/p)^m - 1)/(1 - p).

Therefore, (αp)^m E[B_m] ≤ α^m/(1 - p), and so

Σ_m P{B_m ≥ εG_m} ≤ Σ_m [(αp)^m + α^m/(ε(1 - p))] < ∞,

which proves the lemma.

Thus, using the lemma, we see that in all but a finite number of cycles the proportion B/(B + G) of flips that use the bad coin will be less than ε/(1 + ε) < ε. Hence, supposing that the first coin used in each cycle is the good coin, it follows, with probability 1, that the long-run proportion of flips that use the bad coin will be less than or equal to ε. Since ε is arbitrary, this implies, from the continuity property of probabilities (Proposition 1.1.1), that the long-run proportion of flips that use the bad coin is 0. As a result, it follows that, with probability 1, the long-run proportion of flips that land tails is equal to the long-run proportion of good coin flips that land tails, and this, by the strong law of large numbers, is αp. (Whereas the preceding argument supposed that the good coin is used first in each cycle, this is not necessary, and a similar argument can be given to show that the result holds without this assumption.)

In the same way we proved the result in the case of an ordinary renewal process, it follows that the distribution of the time of the last renewal before (or at) t is given by

(3.5.3)  P{S_{N(t)} ≤ s} = Ḡ(t) + ∫_0^s F̄(t - y) dm_D(y), s ≤ t.

When μ < ∞, the distribution function

F_e(x) = ∫_0^x F̄(y) dy / μ, x ≥ 0,

is called the equilibrium distribution of F. Its Laplace transform is given by

(3.5.4)  F̃_e(s) = ∫_0^∞ e^{-sx} dF_e(x)
  = ∫_0^∞ e^{-sx} F̄(x) dx / μ
  = ∫_0^∞ e^{-sx} ∫_x^∞ dF(y) dx / μ
  = ∫_0^∞ ∫_0^y e^{-sx} dx dF(y) / μ
  = (1/(sμ)) ∫_0^∞ (1 - e^{-sy}) dF(y)
  = (1 - F̃(s))/(μs).
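For particular choices of F the equilibrium distribution is easy to compute numerically, as in the following sketch (an illustration only, not from the text): for an exponential F one gets F_e = F, in accord with the memoryless property, while a deterministic interarrival time c gives F_e uniform on (0, c).

```python
import math

# Numeric illustration of F_e(x) = int_0^x (1 - F(y)) dy / mu.

def F_e(F, mu, x, steps=100_000):
    """Midpoint-rule approximation of the equilibrium distribution at x."""
    h = x / steps
    return sum(1 - F((i + 0.5) * h) for i in range(steps)) * h / mu

F_exp = lambda y: 1 - math.exp(-y)          # exponential, rate 1, mu = 1
print(F_e(F_exp, 1.0, 2.0), F_exp(2.0))     # both ~ 0.8647

c = 3.0
F_det = lambda y: 0.0 if y < c else 1.0     # deterministic at c, mu = c
print(F_e(F_det, c, 1.5), 1.5 / c)          # both = 0.5
```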
The delayed renewal process with G = F_e is called the equilibrium renewal process and is extremely important. For suppose that we start observing a renewal process at time t. Then the process we observe is a delayed renewal process whose initial distribution is the distribution of Y(t). Thus, for t large, it follows from Proposition 3.4.5 that the observed process is the equilibrium renewal process. The stationarity of this process is proven in the next theorem. Let Y_D(t) denote the excess at t for a delayed renewal process.

THEOREM 3.5.2
For the equilibrium renewal process:
(i) m_D(t) = t/μ;
(ii) P{Y_D(t) ≤ x} = F_e(x) for all t ≥ 0;
(iii) {N_D(t), t ≥ 0} has stationary increments.

Proof
(i) From (3.5.2) and (3.5.4), we have that

m̃_D(s) = F̃_e(s)/(1 - F̃(s)) = 1/(μs).

However, simple calculus shows that 1/(μs) is the Laplace transform of the function h(t) = t/μ, and thus by the uniqueness of transforms, we obtain

m_D(t) = t/μ.

(ii) For a delayed renewal process, upon conditioning on S_{N(t)} we obtain, using (3.5.3),

P{Y_D(t) > x} = P{Y_D(t) > x | S_{N(t)} = 0}Ḡ(t) + ∫_0^t P{Y_D(t) > x | S_{N(t)} = s}F̄(t - s) dm_D(s).

Now,

P{Y_D(t) > x | S_{N(t)} = 0} = P{X_1 > t + x | X_1 > t} = Ḡ(t + x)/Ḡ(t),
P{Y_D(t) > x | S_{N(t)} = s} = P{X > t + x - s | X > t - s} = F̄(t + x - s)/F̄(t - s).

Hence,

P{Y_D(t) > x} = Ḡ(t + x) + ∫_0^t F̄(t + x - s) dm_D(s).

Now, letting G = F_e and using part (i) yields

P{Y_D(t) > x} = F̄_e(t + x) + ∫_0^t F̄(t + x - s) ds/μ = F̄_e(t + x) + ∫_x^{t+x} F̄(y) dy/μ = F̄_e(x).

(iii) To prove (iii) we note that N_D(t + s) - N_D(s) may be interpreted as the number of renewals in time t of a delayed renewal process, where the initial distribution is the distribution of Y_D(s). The result then follows from (ii).

3.6 RENEWAL REWARD PROCESSES

A large number of probability models are special cases of the following model. Consider a renewal process {N(t), t ≥ 0} having interarrival times X_n, n ≥ 1, with distribution F, and suppose that each time a renewal occurs we receive a reward. We denote by R_n the reward earned at the time of the nth renewal. We shall assume that the R_n, n ≥ 1, are independent and identically distributed. However, we do allow for the possibility that R_n may (and usually will) depend on X_n, the length of the nth renewal interval, and so we assume that the pairs (X_n, R_n), n ≥ 1, are independent and identically distributed. If we let

R(t) = Σ_{n=1}^{N(t)} R_n,

then R(t) represents the total reward earned by time t. Let

E[R] = E[R_n], E[X] = E[X_n].

THEOREM 3.6.1
If E[R] < ∞ and E[X] < ∞, then
(i) with probability 1, R(t)/t → E[R]/E[X] as t → ∞;
(ii) E[R(t)]/t → E[R]/E[X] as t → ∞.

Proof To prove (i) write

R(t)/t = (Σ_{n=1}^{N(t)} R_n)/t = (Σ_{n=1}^{N(t)} R_n / N(t)) (N(t)/t).

By the strong law of large numbers we obtain that

Σ_{n=1}^{N(t)} R_n / N(t) → E[R] as t → ∞,

and by the strong law for renewal processes

N(t)/t → 1/E[X].

Thus (i) is proven.

To prove (ii) we first note that since N(t) + 1 is a stopping time for the sequence X_1, X_2, ..., it is also a stopping time for R_1, R_2, .... (Why?) Thus, by Wald's equation,

E[Σ_{n=1}^{N(t)} R_n] = E[Σ_{n=1}^{N(t)+1} R_n] - E[R_{N(t)+1}] = (m(t) + 1)E[R] - E[R_{N(t)+1}],

and so

E[R(t)]/t = ((m(t) + 1)/t)E[R] - E[R_{N(t)+1}]/t,

and the result will follow from the elementary renewal theorem if we can show that E[R_{N(t)+1}]/t → 0 as t → ∞. So, towards this end, let g(t) ≡ E[R_{N(t)+1}]. Then conditioning on S_{N(t)} yields

g(t) = E[R_{N(t)+1} | S_{N(t)} = 0]F̄(t) + ∫_0^t E[R_{N(t)+1} | S_{N(t)} = s]F̄(t - s) dm(s).

However,

E[R_{N(t)+1} | S_{N(t)} = 0] = E[R_1 | X_1 > t],
E[R_{N(t)+1} | S_{N(t)} = s] = E[R_n | X_n > t - s],

and so

(3.6.1)  g(t) = E[R_1 | X_1 > t]F̄(t) + ∫_0^t E[R_n | X_n > t - s]F̄(t - s) dm(s).

Now, let

h(t) = E[|R_1| | X_1 > t]F̄(t),

and note that since E|R_1| = ∫_0^∞ E[|R_1| | X_1 = x] dF(x) < ∞, it follows that h(t) → 0 as t → ∞ and h(t) ≤ E|R_1| for all t, and thus we can choose T so that |h(t)| < ε whenever t ≥ T. Hence, from (3.6.1),

|g(t)|/t ≤ |h(t)|/t + (1/t)∫_0^{t-T} |h(t - x)| dm(x) + (1/t)∫_{t-T}^t |h(t - x)| dm(x)
  ≤ ε/t + (ε/t)m(t - T) + E|R_1| (m(t) - m(t - T))/t
  → ε/E[X] as t → ∞

by the elementary renewal theorem. Since ε is arbitrary, it follows that g(t)/t → 0, and the result follows.
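Theorem 3.6.1(i) is easy to check by simulation. In the following sketch (an arbitrary illustration, not from the text) the cycle lengths are uniform (0, 2) and the cycle reward is the squared cycle length, so R(t)/t should approach E[X²]/E[X] = (4/3)/1 = 4/3.

```python
import random

# A minimal check of Theorem 3.6.1(i) with X ~ uniform(0,2) and R = X^2.

random.seed(3)
t_end = 1_000_000.0
t = reward = 0.0
while True:
    x = random.uniform(0, 2)
    if t + x > t_end:       # the reward is collected at the END of a cycle,
        break               # so stop before the incomplete final cycle
    t += x
    reward += x * x
print(reward / t_end)        # ~ 1.3333 = E[X^2]/E[X]
```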
Remarks If we say that a cycle is completed every time a renewal occurs, then the theorem states that the (expected) long-run average return is just the expected return earned during a cycle divided by the expected time of a cycle.

In the proof of the theorem it is tempting to say that E[R_{N(t)+1}] = E[R_1], and thus that E[R_{N(t)+1}]/t trivially converges to zero. However, R_{N(t)+1} is related to X_{N(t)+1}, and X_{N(t)+1} is the length of the renewal interval containing the point t. Since larger renewal intervals have a greater chance of containing t, it (heuristically) follows that X_{N(t)+1} tends to be larger than an ordinary renewal interval (see Problem 3.3), and thus the distribution of R_{N(t)+1} is not that of R_1.

Also, up to now we have assumed that the reward is earned all at once at the end of the renewal cycle. However, this is not essential, and Theorem 3.6.1 remains true if the reward is earned gradually during the renewal cycle. To see this, let R̄(t) denote the reward earned by t, and suppose first that all returns are nonnegative. Then

Σ_{n=1}^{N(t)} R_n ≤ R̄(t) ≤ Σ_{n=1}^{N(t)+1} R_n,

and (ii) of Theorem 3.6.1 follows since

E[Σ_{n=1}^{N(t)} R_n]/t → E[R]/E[X] and E[Σ_{n=1}^{N(t)+1} R_n]/t → E[R]/E[X].

Part (i) of Theorem 3.6.1 follows by noting that both Σ_{n=1}^{N(t)} R_n/t and Σ_{n=1}^{N(t)+1} R_n/t converge to E[R]/E[X] by the argument given in the proof. A similar argument holds when the returns are nonpositive, and the general case follows by breaking up the returns into their positive and negative parts and applying the above argument separately to each.

EXAMPLE 3.6(A) Alternating Renewal Process. For an alternating renewal process (see Section 3.4.1) suppose that we earn at a rate of one per unit time when the system is on (and thus the reward for a cycle equals the on time of that cycle). Then the total reward earned by t is just the total on time in [0, t], and thus by Theorem 3.6.1, with probability 1,

(amount of on time in [0, t])/t → E[X]/(E[X] + E[Y]),

where X is an on time and Y an off time in a cycle. Thus, by Theorem 3.4.4, when the cycle distribution is nonlattice the limiting probability of the system being on is equal to the long-run proportion of time it is on.

EXAMPLE 3.6(B) Average Age and Excess. Let A(t) denote the age at t of a renewal process, and suppose we are interested in computing

lim_{t→∞} ∫_0^t A(s) ds / t.

To do so assume that we are being paid money at any time at a rate equal to the age of the renewal process at that time. That is, at time s we are being paid at a rate A(s), and so ∫_0^t A(s) ds represents our total earnings by time t. As everything starts over again when a renewal occurs, it follows that, with probability 1,

∫_0^t A(s) ds / t → E[reward during a renewal cycle]/E[time of a renewal cycle].

Now since the age of the renewal process a time s into a renewal cycle is just s, we have

reward during a renewal cycle = ∫_0^X s ds = X²/2,

where X is the time of the renewal cycle. Hence, with probability 1,

lim_{t→∞} ∫_0^t A(s) ds / t = E[X²]/(2E[X]).

Similarly, if Y(t) denotes the excess at t, we can compute the average excess by supposing that we are earning rewards at a rate equal to the excess at that time. Then the average value of the excess will, by Theorem 3.6.1, be given by

lim_{t→∞} ∫_0^t Y(s) ds / t = E[reward during a renewal cycle]/E[X] = E[∫_0^X (X - t) dt]/E[X] = E[X²]/(2E[X]).

Thus the average values of the age and excess are equal. (Why was this to be expected?) The quantity X_{N(t)+1} = S_{N(t)+1} - S_{N(t)} represents the length of the renewal interval containing the point t. Since it may also be expressed by

X_{N(t)+1} = A(t) + Y(t),

we see that its average value is given by

lim_{t→∞} ∫_0^t X_{N(s)+1} ds / t = E[X²]/E[X].

Since

E[X²]/E[X] ≥ E[X]

(with equality only when Var(X) = 0), we see that the average value of X_{N(t)+1} is greater than E[X]. (Why is this not surprising?)
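As a quick check of Example 3.6(B), the following sketch (an illustration; the exponential interarrival choice is arbitrary and not from the text) accumulates ∫A(s) ds cycle by cycle, using the fact that the age rises linearly from 0 over each cycle.

```python
import random

# Simulation sketch of Example 3.6(B): the time average of the age A(t)
# converges to E[X^2]/(2E[X]).  For exponential interarrivals with rate 1,
# E[X^2]/(2E[X]) = 2/(2*1) = 1.

random.seed(4)
t_end = 500_000.0
t = 0.0
area = 0.0           # integral of A(s) ds, accumulated cycle by cycle
while t < t_end:
    x = min(random.expovariate(1.0), t_end - t)  # truncate the last cycle
    area += x * x / 2                            # age rises linearly 0 -> x
    t += x
print(area / t_end)   # ~ 1.0 = E[X^2]/(2 E[X])
```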
EXAMPLE 3.6(C) Suppose that travelers arrive at a train depot in accordance with a renewal process having a mean interarrival time μ. Whenever there are N travelers waiting in the depot, a train leaves. If the depot incurs a cost at the rate of nc dollars per unit time whenever there are n travelers waiting, and an additional cost of K each time a train is dispatched, what is the average cost per unit time incurred by the depot?

If we say that a cycle is completed whenever a train leaves, then the above is a renewal reward process. The expected length of a cycle is the expected time required for N travelers to arrive, and, since the mean interarrival time is μ, this equals

E[length of cycle] = Nμ.

If we let X_n denote the time between the nth and (n + 1)st arrival in a cycle, then the expected cost of a cycle may be expressed as

E[cost of a cycle] = E[cX_1 + 2cX_2 + ⋯ + (N - 1)cX_{N-1}] + K = cμN(N - 1)/2 + K.

Hence the average cost incurred is

E[cost of a cycle]/E[length of cycle] = c(N - 1)/2 + K/(Nμ).
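Given numerical values of c, K, and μ, the dispatch level minimizing the above average cost is easily found; the following sketch (with arbitrary illustrative numbers, not from the text) does so, and calculus applied to the formula suggests the minimizer is near √(2K/(cμ)).

```python
# Evaluate the average-cost formula c(N-1)/2 + K/(N mu) of Example 3.6(C)
# and pick the dispatch level N that minimizes it.

def avg_cost(N, c, K, mu):
    return c * (N - 1) / 2 + K / (N * mu)

c, K, mu = 1.0, 50.0, 0.5
best = min(range(1, 100), key=lambda N: avg_cost(N, c, K, mu))
print(best, avg_cost(best, c, K, mu))
# The minimizer is near sqrt(2K/(c mu)) = sqrt(200) ~ 14.
```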
3.6.1 A Queueing Application

Suppose that customers arrive at a single-server service station in accordance with a nonlattice renewal process. Upon arrival, a customer is immediately served if the server is idle, and he or she waits in line if the server is busy. The service times of customers are assumed to be independent and identically distributed, and are also assumed independent of the arrival stream. Let X_1, X_2, ... denote the interarrival times between customers, and let Y_1, Y_2, ... denote the service times of successive customers. We shall assume that

(3.6.2)  E[Y_i] < E[X_i] < ∞.

Suppose that the first customer arrives at time 0 and let n(t) denote the number of customers in the system at time t. Define

L = lim_{t→∞} ∫_0^t n(s) ds / t.

To show that L exists and is constant, with probability 1, imagine that a reward is being earned at time s at rate n(s). If we let a cycle correspond to the start of a busy period (that is, a new cycle begins each time an arrival finds the system empty), then it is easy to see that the process restarts itself each cycle. As L represents the long-run average reward, it follows from Theorem 3.6.1 that

(3.6.3)  L = E[reward during a cycle]/E[time of a cycle] = E[∫_0^T n(s) ds]/E[T],

where T denotes the length of a cycle. Also, let W_i denote the amount of time the ith customer spends in the system and define

W = lim_{n→∞} (W_1 + ⋯ + W_n)/n.

To argue that W exists with probability 1, imagine that we receive a reward W_i on day i. Since the queueing process begins anew after each cycle, it follows that if we let N denote the number of customers served in a cycle, then W is the average reward per unit time of a renewal process in which the cycle time is N and the cycle reward is W_1 + ⋯ + W_N, and, hence,

(3.6.4)  W = E[Σ_{i=1}^N W_i]/E[N].

We should remark that it can be shown (see Proposition 7.1.1 of Chapter 7) that (3.6.2) implies that E[N] < ∞. The following theorem is quite important in queueing theory.

THEOREM 3.6.2
Let λ = 1/E[X_i] denote the arrival rate. Then

L = λW.

Proof We start with the relationship between T, the length of a cycle, and N, the number of customers served in that cycle. If n customers are served in a cycle, then the next cycle begins when the (n + 1)st customer arrives; hence,

T = Σ_{i=1}^N X_i.

Now it is easy to see that N is a stopping time for the sequence X_1, X_2, ..., since the event {N ≥ n} is determined by X_1, ..., X_{n-1} and the service times Y_1, ..., Y_{n-1}, and thus {N ≥ n} is independent of X_n, X_{n+1}, .... Hence by Wald's equation

E[T] = E[N]E[X] = E[N]/λ,

and so by (3.6.3) and (3.6.4)

(3.6.5)  L = λE[∫_0^T n(s) ds]/E[N].

But by imagining that each customer pays at a rate of 1 per unit time while in the system (and so the total amount paid by the ith arrival is just W_i), we see that

∫_0^T n(s) ds = Σ_{i=1}^N W_i = total paid during a cycle,

and so the result follows from (3.6.5).

Remarks (1) The proof of Theorem 3.6.2 does not depend on the particular queueing model we have assumed. The proof goes through without change for any queueing system that contains times at which the process probabilistically restarts itself and where the mean time between such cycles is finite. For example, if in our model we suppose that there are k available servers, then it can be shown that a sufficient condition for the mean cycle time to be finite is that E[Y_i] < kE[X_i] and P{Y_i < X_i} > 0.

(2) Theorem 3.6.2 states that the (time) average number in "the system" = λ × (average time a customer spends in "the system"). By replacing "the system" by "the queue," the same proof shows that the average number in the queue = λ × (average time a customer spends in the queue); or, by replacing "the system" by "service," we have that the average number in service = λE[Y].
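Little's formula is easy to observe in simulation. The sketch below (an illustration only, not from the text) runs a single-server first-come first-served queue with Poisson arrivals and exponential services and compares the time-average number in system with λ times the average sojourn time.

```python
import random

# A compact single-server FIFO simulation checking L = lambda * W of
# Theorem 3.6.2, with Poisson arrivals (rate lam) and exponential services.

random.seed(5)
lam, mu, n_cust = 1.0, 2.0, 200_000

arrival = departure = 0.0
events = []                      # (arrival time, departure time) per customer
for _ in range(n_cust):
    arrival += random.expovariate(lam)
    start = max(arrival, departure)          # wait for the previous departure
    departure = start + random.expovariate(mu)
    events.append((arrival, departure))

t_end = departure                # departures are monotone under FIFO
W = sum(d - a for a, d in events) / n_cust            # average sojourn time
L = sum(d - a for a, d in events) / t_end             # time-average number
print(L, lam * W)                                     # nearly equal
```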
3.7 REGENERATIVE PROCESSES

Consider a stochastic process {X(t), t ≥ 0} with state space {0, 1, 2, ...} having the property that there exist time points at which the process (probabilistically) restarts itself. That is, suppose that with probability 1, there exists a time S_1 such that the continuation of the process beyond S_1 is a probabilistic replica of the whole process starting at 0. Note that this property implies the existence of further times S_2, S_3, ... having the same property as S_1. Such a stochastic process is known as a regenerative process.

From the above, it follows that {S_1, S_2, ...} constitute the event times of a renewal process. We say that a cycle is completed every time a renewal occurs. Let N(t) = max{n: S_n ≤ t} denote the number of cycles by time t.

The proof of the following important theorem is a further indication of the power of the key renewal theorem.

THEOREM 3.7.1
If F, the distribution of a cycle, has a density over some interval, and if E[S_1] < ∞, then

P_j = lim_{t→∞} P{X(t) = j} = E[amount of time in state j during a cycle]/E[time of a cycle].

Proof Let P_j(t) = P{X(t) = j}. Conditioning on the time of the last cycle before t yields

P_j(t) = P{X(t) = j | S_{N(t)} = 0}F̄(t) + ∫_0^t P{X(t) = j | S_{N(t)} = s}F̄(t - s) dm(s).

Now,

P{X(t) = j | S_{N(t)} = 0} = P{X(t) = j | S_1 > t},
P{X(t) = j | S_{N(t)} = s} = P{X(t - s) = j | S_1 > t - s},

and thus

P_j(t) = P{X(t) = j, S_1 > t} + ∫_0^t P{X(t - s) = j, S_1 > t - s} dm(s).

Hence, as it can be shown that h(t) ≡ P{X(t) = j, S_1 > t} is directly Riemann integrable, we have by the key renewal theorem that

P_j(t) → ∫_0^∞ P{X(t) = j, S_1 > t} dt / E[S_1].

Now, letting

I(t) = 1 if X(t) = j and S_1 > t, and 0 otherwise,

then ∫_0^∞ I(t) dt represents the amount of time in the first cycle that X(t) = j. Since

E[∫_0^∞ I(t) dt] = ∫_0^∞ E[I(t)] dt = ∫_0^∞ P{X(t) = j, S_1 > t} dt,

the result follows.

EXAMPLE 3.7(A) Queueing Models with Renewal Arrivals. Most queueing processes in which customers arrive in accordance with a renewal process (such as those in Section 3.6) are regenerative processes with cycles beginning each time an arrival finds the system empty. Thus, for instance, in the single-server queueing model with renewal arrivals, X(t), the number in the system at time t, constitutes a regenerative process provided the initial customer arrives at t = 0 (if not, then it is a delayed regenerative process and Theorem 3.7.1 remains valid).

From the theory of renewal reward processes it follows that P_j also equals the long-run proportion of time that X(t) = j. In fact, we have the following.

PROPOSITION 3.7.2
For a regenerative process with E[S_1] < ∞, with probability 1,

lim_{t→∞} [amount of time in j during (0, t)]/t = E[time in j during a cycle]/E[time of a cycle].

Proof Suppose that a reward is earned at rate 1 whenever the process is in state j. This generates a renewal reward process, and the proposition follows directly from Theorem 3.6.1.

3.7.1 The Symmetric Random Walk and the Arc Sine Laws

Let Y_1, Y_2, ... be independent and identically distributed with

P{Y_i = 1} = P{Y_i = -1} = 1/2,

and define

Z_0 = 0, Z_n = Σ_{i=1}^n Y_i.

The process {Z_n, n ≥ 0} is called the symmetric random walk process. If we now define X_n by

X_n = 1 if Z_n > 0, X_n = 0 if Z_n = 0, X_n = -1 if Z_n < 0,

then {X_n, n ≥ 1} is a regenerative process that regenerates whenever X_n takes value 0. To obtain some of the properties of this regenerative process, we will first study the symmetric random walk {Z_n, n ≥ 0}. Let

u_n = P{Z_{2n} = 0} = C(2n, n)(1/2)^{2n}

and note that

(3.7.1)  u_n = ((2n - 1)/(2n)) u_{n-1}.

Now let us recall from the results of Example 1.5(E) of Chapter 1 (the ballot problem example) the expression for the probability that the first visit to 0 in the symmetric random walk occurs at time 2n. Namely,

(3.7.2)  P{Z_1 ≠ 0, Z_2 ≠ 0, ..., Z_{2n-1} ≠ 0, Z_{2n} = 0} = C(2n, n)(1/2)^{2n}/(2n - 1) = u_n/(2n - 1).

We will need the following lemma, which states that u_n (the probability that the symmetric random walk is at 0 at time 2n) is also equal to the probability that the random walk does not hit 0 by time 2n.

Lemma 3.7.3

u_n = P{Z_1 ≠ 0, Z_2 ≠ 0, ..., Z_{2n} ≠ 0}.

Proof From (3.7.2) we see that

P{Z_1 ≠ 0, ..., Z_{2n} ≠ 0} = 1 - Σ_{k=1}^n u_k/(2k - 1).

Hence we must show that

(3.7.3)  u_n = 1 - Σ_{k=1}^n u_k/(2k - 1),

which we will do by induction on n. When n = 1, the above identity holds since u_1 = 1/2. So assume (3.7.3) for n - 1. Now,

1 - Σ_{k=1}^n u_k/(2k - 1) = 1 - Σ_{k=1}^{n-1} u_k/(2k - 1) - u_n/(2n - 1)
  = u_{n-1} - u_n/(2n - 1) (by the induction hypothesis)
  = u_n (by (3.7.1)).

Thus the proof is complete.

Since u_n = C(2n, n)(1/2)^{2n}, it follows upon using an approximation due to Stirling, which states that

n! ~ n^{n+1/2} e^{-n} √(2π),

that

u_n ~ (2n)^{2n+1/2} e^{-2n} √(2π) / [n^{2n+1} e^{-2n} (2π) 2^{2n}] = 1/√(πn),

and so u_n → 0 as n → ∞. Thus from Lemma 3.7.3 we see that, with probability 1, the symmetric random walk will return to the origin.

The next proposition gives the distribution of the time of the last visit to 0 up to and including time 2n.

PROPOSITION 3.7.4
For k = 0, 1, ..., n,

P{Z_{2k} = 0, Z_{2k+1} ≠ 0, ..., Z_{2n} ≠ 0} = u_k u_{n-k}.

Proof

P{Z_{2k} = 0, Z_{2k+1} ≠ 0, ..., Z_{2n} ≠ 0} = P{Z_{2k} = 0}P{Z_{2k+1} ≠ 0, ..., Z_{2n} ≠ 0 | Z_{2k} = 0} = u_k u_{n-k},

where we have used Lemma 3.7.3 to evaluate the second term on the right in the above.
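Proposition 3.7.4 can be verified exactly for small n, as in the following sketch (an illustration only, not from the text), which also confirms that the probabilities u_k u_{n-k} sum to 1.

```python
from fractions import Fraction
from math import comb

# Exact check of Proposition 3.7.4: the probabilities u_k u_{n-k} of the last
# visit to 0 by time 2n occurring at time 2k sum to 1 and match a direct
# enumeration over all equally likely walks of length 2n.

def u(n):
    return Fraction(comb(2 * n, n), 4 ** n)   # P{Z_{2n} = 0}

n = 6
dist = [u(k) * u(n - k) for k in range(n + 1)]
print(sum(dist))        # 1

counts = [0] * (n + 1)
for bits in range(1 << (2 * n)):               # enumerate all 2^(2n) walks
    z, last = 0, 0
    for step in range(2 * n):
        z += 1 if (bits >> step) & 1 else -1
        if z == 0:
            last = (step + 1) // 2             # last zero is at time 2*last
    counts[last] += 1
total = 1 << (2 * n)
print(all(Fraction(c, total) == d for c, d in zip(counts, dist)))  # True
```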
(For the sample path presented in Figure 371, of the first eight time units the random walk was positive for six and negative for two) z" Figure 3.7. J. A sample path for the random walk. 146 RENEWAL THEORY THEOREM 3.7.5 Let Ek n denote the event that by time 2n the symmetric random walk wil( be positive for 2k time units and negative for 2n - 2k time units, and let b kn = P(E kn ) Then (374) Proof The proof is by induction on n Since Uo = 1, it follows that (374) is valid when n = 1 So assume that b km = UkUm-k for all values of m such that m < n. To prove (374) we first consider the case where k = n Then, conditioning on T, the time of the first return to 0, yields n b nn = L P{Enni T = 2r}P{T = 2r} + P{EnniT> 2n}P{T > 2n}. ,=1 Now given that T = 2r, it is equally likely that the random walk has always been positive or always negative in (0, 2r) and it is at 0 at 2r Hence, P{EnniT> 2n} = t and so n b n n = ~ L b n -, n_,P{T = 2r} + !P{T > 2n} ,=1 n = ~ L un-,P{T= 2r} + ~ P { T > 2n}, ,=1 where the last equality, bn-,n-, = Un-,Un, follows from the induction hypothesis Now, n n L un-,P{T= 2r} = L P{Zz,,-2, = O}P{T= 2r} ,=1 ,=1 n = L P{Zz" = OiT= 2r}P{T= 2r} ,==1 == Un, and so (by Lemma 3 7 3) REGENERATIVE PROCESSES 147 Hence (374) is valid for k = n and, in fact, by symmetry, also for k = 0 The proof that (374) is valid for 0 < k < n follows in a similar fashion Again, conditioning on T yields /I b kll = L P{Ehl T 2r}P{T = 2r} ,-1 Now given that T 2r, it is equally likely that the random walk has either always been positive or always negative in (0, 2r) Hence in order for Ek n to occur, the continuation from time 2r to 2n would need 2k - 2r positive units in the former case and 2k in the latter Hence, /I n bkn = L bk-rll-rP{T + t L bkn_rP{T = 2r} r-I /I II = L uk_rP{T 2r} + L U,,-r-kP{T = 2r}, r=l where the last equality follows from the induction hypothesis As /I L uk-rP{T = 2r} :::: Uk, II L U" r-kP{T = 2r} = UII _ .. ,=1 we see that which completes the proof The probability distribution given by Theorem 37.5, namely, is called the discrete arc sine distribution. We call it such since for large k and n we have, by Stirling's approximation, that 1 UIr;U II Ie --;;=== 1TYk(n - k) 148 RENEWAL THEORY Hence for any x, 0 < x < 1, we see that the probability that the proportion of time in (0, 2n) that the symmetric random is positive is less than x, is given by (3.75) nX 1 nX 1 2: UkUn-k :::::: f dy k -0 1l 0 v' Y (n - y) 1 fX -;==:=1 == dw 1l 0 2 . 4 I = arcsme vx 1l Thus we see from the above that for n large the proportion of time that the symmetric random walk is positive in the first 2n time units has approxi- mately the arc sine distribution given by (3 7 5) Thus, for instance, the proba- bility that it is positive less than one-half of the time is (2111) arc sine Vi = ~ . One interesting consequence of the above is that it tells us that the propor- tion of time the symmetric random walk is positive is not converging to the constant value! (for if it were then the limiting distribution rather than being arc sine would be the distribution of the constant random variable) Hence if we consider the regenerative process {X n }, which keeps track of the sign of the symmetric random walk, it follows that the proportion of time that Xn equals 1 does not converge to some constant On the other hand. it is clear by symmetry and the fact that Un --+ 0 and n --+ 00 that P{X" = I} = P{Zn > O} --+ i as n --+ 00. 
Remark It follows from Proposition 3.7.4 and Stirling's approximation that, for 0 < x < 1 and n large,

P{no zeroes between 2nx and 2n} = 1 - Σ_{k=nx}^n u_k u_{n-k} = Σ_{k=0}^{nx-1} u_k u_{n-k} ≈ (2/π) arc sine √x,

where the final approximation follows from (3.7.5).

3.8 STATIONARY POINT PROCESSES

A counting process {N(t), t ≥ 0} that possesses stationary increments is called a stationary point process. We note from Theorem 3.5.2 that the equilibrium renewal process is one example of a stationary point process.

THEOREM 3.8.1
For any stationary point process, excluding the trivial one with P{N(t) = 0} = 1 for all t ≥ 0,

(3.8.1)  lim_{t→0} P{N(t) > 0}/t = λ > 0,

where λ = ∞ is not excluded.

Proof Let f(t) = P{N(t) > 0} and note that f(t) is nonnegative and nondecreasing. Also,

f(s + t) = P{N(s + t) - N(s) > 0 or N(s) > 0}
  ≤ P{N(s + t) - N(s) > 0} + P{N(s) > 0}
  = f(t) + f(s).

Hence

f(t) ≤ 2f(t/2),

and by induction

f(t) ≤ nf(t/n) for all n = 1, 2, ....

Thus, letting a be such that f(a) > 0, we have

(3.8.2)  f(a)/a ≤ f(a/n)/(a/n), n = 1, 2, ....

Now define λ = lim sup_{t→0} f(t)/t. By (3.8.2) we obtain λ ≥ f(a)/a > 0.

To show that λ = lim_{t→0} f(t)/t, we consider two cases. First suppose that λ < ∞. In this case, fix ε > 0, and let s > 0 be such that f(s)/s > λ - ε. Now, for any t ∈ (0, s) there is an integer n such that

s/n ≤ t < s/(n - 1).

From the monotonicity of f(t) and from (3.8.2), we obtain that for all t in this interval

(3.8.3)  f(t)/t ≥ f(s/n)/(s/(n - 1)) = ((n - 1)/n) f(s/n)/(s/n) ≥ ((n - 1)/n) f(s)/s.

Hence,

f(t)/t ≥ ((n - 1)/n)(λ - ε).

Since ε is arbitrary, and since n → ∞ as t → 0, it follows that lim_{t→0} f(t)/t = λ.

Now assume λ = ∞. In this case, fix any large A > 0 and choose s such that f(s)/s > A. Then, from (3.8.3), it follows that for all t ∈ (0, s)

f(t)/t ≥ ((n - 1)/n) f(s)/s > ((n - 1)/n) A,

which implies lim_{t→0} f(t)/t = ∞, and the proof is complete.
The probability of a simultaneous occurrence of events is less than the probability of two or more events in any of the intervals j = 0, 1, ,n 1, and thus this probability is bounded by nP{N(lIn) ~ 2}, which by (3.8.4) goes to zero as n ~ 00. When c < 00, the converse of this is also true That is, any stationary point process for which c is finite and for which there is zero probability of simultaneous events, is necessanly regular. The proof in this direction is more complicated and will not be given We win end this section by proving that c = A for a regular stationary point process. This is known as Korolyook's theorem. 152 RENEWAL THEORY KOROLYOOK'S THEOREM For a regular stationary point process, c, the mean number of events per unit time and the intensity A, defined by (38 I), are equal The case A = c = 00 is not excluded Proof Let us define the following notation A, for the event {N(I) > k}, B.;fortbe even+ C: 1) - N W "'2}. n-l Bn = U Bn!' }:O C •• fortbeevent {NC: 1) - N W '" 1.N(1) -N C: 1) = k} Let B > 0 and a positive integer m be given From the assumed regularity of the process, it follows that B P(Bn}) < n(m + 1)' j=O,I,. ,n-I for all sufficiently large n Hence, Therefore, (385) where Bn is the complement of Bn However, a little thought reveals that n-l A.Iin = U c,lin, }:O and hence n-1 P(A,Bn) S L P(C n ,}), }:O PROBLEMS which together with (3 85) implies that (386) m n-I m L P(Ak ) ::; L L P( Cnk) + 8 k=O rO k-O == nP{N(lIn) ::: 1} + 8 ::; ,\ + 28 for all n sufficiently large Now since (386) is true for all m, it follows that 00 L P(Ad ::; ,\ + 28 k=O Hence, 00 00 c == £ [N(1)l = L P{N(1) > k} == L P(A k ) ::;,\ + 28 and the result is obtained as 8 is arbitrary and it is already known that c ;:::: ,\ PROBLEMS 3.1. Is it true that· (a) N(t) < n if and only if Sn > t? (b) N(t) < n if and only if Sn ;> t? (c) N(t) > n if and only if Sn < t? 153 3.2. In defining a renewal process we suppose that F( (0), the probability that an interarrival time is finite, equals 1 If F( (0) < 1, then after each renewal there is a positive probability 1 - F(oo) that there will be no further renewals. Argue that when F( (0) < 1 the total number of renew- als, call it N( (0), is such that 1 + N( (0) has a geometric distribution with mean 1/(1 - F( (0)) 3.3. Express in words what the random variable XN(I}+I represents (Hint It is the length of which renewal interval?) Show that P{XN(!}+I ~ x} ;:::: F(x). Compute the above exactly when F(x) = 1 - e-AJ:. 154 RENEWAL THEORY 3.4. Prove the renewal equation met) = F(t) + f: met - x) dF(x). 3.5. Prove that the renewal function met), 0 ::5 t < 00 uniquely determines the interarrival distribution F 3.6. Let {N(t), t ;2:: O} be a renewal process and suppose that for all nand t, conditional on the event that N(t) = n, the event times SI" , Sn are distnbuted as the order statistics of a set of independent uniform (0, t) random variables Show that {N(t), t :> O} is a Poisson process (Hint Consider E[N(s) I N(t)] and then use the result of Problem 3.5.) 3.7. If F is the uniform (0, 1) distribution function show that met) = e l - 1, Now argue that the expected number of uniform (0, 1) random variables that need to be added until their sum exceeds 1 has mean e 3.8. 
The random variables Xl' , Xn are said to be exchangeable if X'I' , X, has the same joint distribution as XI" ,X n whenever ii, i 2 , • in is a permutation of 1, 2, ,n That is, they are exchangeable if the joint distribution function P{XI :s XI, X 2 <: Xl> ••• , Xn ::s; xn} is a symmetric function of (XI, X2, •• xn). Let Xl, X 2 , denote the interar- rival times of a renewal process (8) Argue that conditional on N(t) = n, Xl' . , Xn are exchangeable Would XI"" ,X n , Xn+l be exchangeable (conditional on N(t) = n)? (b) Use (a) to prove that for n > 0 E l X, + N ( ~ ; X NI " N(I) ~ n ] = E[XoIN(I) = n). (c) Prove that E [ Xl + 'N'('t)+ XN(I) ] N(t»O =E[X1IXI<t]. 3.9. Consider a single-server bank in which potential customers arrive at a Poisson rate A However, an arrival only enters the bank if the server is free when he or she arrives Let G denote the service distribution (8) At what rate do customers enter the bank? (b) What fraction of potential customers enter the bank? (c) What fraction of time is the server busy? PROBLEMS 155 3.10. Let XI> X 2 ", be independent and identically distributed with E r X,] < 00, Also let NI , N 2 , " be independent and identically distnbuted stopping times for the sequence XI> X 2 , , " with ErN.] < 00, Observe the Xr sequentially, stopping at N I , Now start sampling the remaining X.-acting as if the sample was just beginning with X N1 + I -stopping after an additional N2 (Thus, for instance, XI + ... + X NI has the same distnbution as X N1 + 1 + .. , + X N1 + N ). Now start sampling the remaining Xr-again acting as if the sample was just beginning-and stop after an additional N 3 , and so on. (3) Let Nl+N2 N 1 + +Nm 52 = L X" . , 5m= L Xr' .:N 1 +1 r:N 1 + +Nm_I+1 Use the strong law of large numbers to compute r I m ( 5 + ... +5) m 1 !!!, NI + ... + N m ' (b) Writing 51 + ... + 5 m 51 + ... + 5 m m NI + .. + N m m N, + ... + N m ' derive another expression for the limit in part (a). (c) Equate the two expressions to obtain Wald's equation 3.11. Consider a miner trapped in a room that contains three doors. Door 1 leads her to freedom after two-days' travel; door 2 returns her to her room after four-days' journey, and door 3 returns her to her room after eight-days' journey. Suppose at all times she is equally to choose any of the three doors, and let T denote the time it takes the miner to become free. (3) Define a sequence of independent and identically distnbuted ran- dom vanables XI> X 2 , •• , and a stopping time N such that N T=LX .. r"l Note You may have to imagine that the miner continues to ran- domly choose doors even after she reaches safety. (b) Use Wald's equation to find ErT]. 156 RENEWAL THEORY (c) Compute E [ ~ ~ I X,IN = n] and note that it is not equal to E [ ~ ~ ~ I X,], (d) Use part (c) for a second derivation of E [T] 3.ll. Show how Blackwell's theorem follows from the key renewal theorem. 3.13. A process is in one of n states, 1,2, . , n Initially it is in state 1, where it remains for an amount of time having distribution Fl' After leaving state 1 it goes to state 2, where it remains for a time having distribution F 2 • When it leaves 2 it goes to state 3, and so on. From state n it returns to 1 and starts over. Find lim P{process is in sta te i a t time t}. , ..... '" Assume that H, the distribution of time between entrances to state 1, is nonlattice and has finite mean. 3.14. Let A(t) and Y(t) denote the age and excess at t of a renewal-process Fill in the missing terms: (a) A(t) > x +-+ 0 events in the interval __ ? 
(b) Y(t) > x +-+ 0 events in the interval __ ? (c) P{Y(t) > x} = P{A( ) > }. (d) Compute the joint distribution of A(t) and Y(t) for a Poisson process 3.15. Let A(t) and Y(t) denote respectively the age and excess at t. Find: (a) P{Y(t) > x IA(t) = s}. (b) P{Y(t) > x IA(t + x12) = s}. (c) P{Y(t) > x IA(t + x) > s} for a Poisson process. (d) P{Y(t) > x, A(t) > y}. (e) If I-L < 00, show that, with probability 1, A(t)lt ---+ 0 as t --+ 00 3.16. Consider a renewal process whose interarnval distnbution is the gamma distnbution with parameters (n, A). Use Proposition 3.4 6 to show that '" lim E[Y(t)] = n 2 + 1. , ..... '" A Now explain how this could have been obtained without any computa- tions 3.17. An equation of the form g(t) = h(t) + J:g(t - x) dF(x) PROBLEMS 157 is called a renewal-type equation. In convolution notation the above states that g = h + g * F. Either iterate the above or use Laplace transforms to show that a re- newal-type equation has the solution g(t) = h(t) + t h(t - x) dm(x), where m(x) = ~ : ~ I Fn(x). If h is directly Riemann integrable and F nonlattice with finite mean, one can then apply the key renewal theorem to obtain r h(t) dt Iimg(t) = ~ _ . 1-'" fo F(t) dt Renewal-type equations for g(t) are obtained by conditioning on the time at which the process probabilistically starts over. Obtain a renewal- type equation for: (a) P(t), the probability an alternating renewal process is on at time t; (b) g(t) = E[A(t)], the expected age of a renewal process at t. Apply the key renewal theorem to obtain the limiting values in (a) and (b) 3.1S. In Problem 3.9 suppose that potential customers arrive in accordance with a renewal process having interarrival distnbution F. Would the number of events by time t constitute a (possibly delayed) renewal process if an event corresponds to a customer: (a) entering the bank? (b) leaving the bank? What if F were exponential? 3.19. Prove Equation (3.5.3). 3.20. Consider successive flips of a fair coin. (a) Compute the mean number of flips until the pattern HHTHHIT ap- pears. (b) Which pattern requires a larger expected time to occur: HHIT or HTHT? 158 RENEWAL THEORY 3.2L On each bet a gambler, independently of the past, either wins or loses 1 unit with respective probabilities p and] - p Suppose the gambler's strategy is to quit playing the first time she wins k consecutive bets. At the moment she quits (0) find her expected winnings (b) find the expected number of bets that she has won. 3.22. Consider successive flips of a coin having probability p of landing heads. Find the expected number of flips until the following sequences appear. (0) A = HHTIHH (b) B = HTHTI. Suppose now that p = 112. (c) Find P{A occurs before B} (d) Find the expected number of flips until either A or B occurs 3.23. A coin having probability p of landing heads is flipped k times Additional flips of the coin are then made until the pattern of the first k is repeated (possibly by using some of the first k flips). Show that the expected number of additional flips after the initial k is 2k. 3.24. Draw cards one at a time, with replacement, from a standard deck of playing cards Find the expected number of draws until four successive cards of the same suit appear. 3.25. 
3.25. Consider a delayed renewal process {N_D(t), t ≥ 0} whose first interarrival has distribution G and the others have distribution F. Let m_D(t) = E[N_D(t)].
(a) Prove that

m_D(t) = G(t) + ∫_0^t m(t - x) dG(x),

where m(t) = Σ_{n=1}^∞ F_n(t).
(b) Let A_D(t) denote the age at time t. Show that if F is nonlattice with ∫ x² dF(x) < ∞ and tḠ(t) → 0 as t → ∞, then

E[A_D(t)] → ∫_0^∞ x² dF(x) / (2 ∫_0^∞ x dF(x)).

(c) Show that if G has a finite mean, then tḠ(t) → 0 as t → ∞.

3.26. Prove Blackwell's theorem for renewal reward processes. That is, assuming that the cycle distribution is not lattice, show that, as t → ∞,

E[reward in (t, t + a)] → a E[reward in cycle]/E[time of cycle].

Assume that any relevant function is directly Riemann integrable.

3.27. For a renewal reward process show that

lim_{t→∞} E[R_{N(t)+1}] = E[R_1 X_1]/E[X_1].

Assume the distribution of X_1 is nonlattice and that any relevant function is directly Riemann integrable. When the cycle reward is defined to equal the cycle length, the above yields

lim_{t→∞} E[X_{N(t)+1}] = E[X²]/E[X],

which is always greater than E[X] except when X is constant with probability 1. (Why?)

3.28. In Example 3.6(C) suppose that the renewal process of arrivals is a Poisson process with mean μ. Let N* denote that value of N that minimizes the long-run average cost if a train leaves whenever there are N travelers waiting. Another type of policy is to have a train depart every T time units. Compute the long-run average cost under this policy and let T* denote the value of T that minimizes it. Show that the policy that departs whenever N* travelers are waiting leads to a smaller average cost than the one that departs every T* time units.

3.29. The life of a car is a random variable with distribution F. An individual has a policy of trading in his car either when it fails or reaches the age of A. Let R(A) denote the resale value of an A-year-old car. There is no resale value of a failed car. Let C_1 denote the cost of a new car and suppose that an additional cost C_2 is incurred whenever the car fails.
(a) Say that a cycle begins each time a new car is purchased. Compute the long-run average cost per unit time.
(b) Say that a cycle begins each time a car in use fails. Compute the long-run average cost per unit time.
Note: In both (a) and (b) you are expected to compute the ratio of the expected cost incurred in a cycle to the expected time of a cycle. The answer should, of course, be the same in both parts.

3.30. Suppose in Example 3.3(A) that a coin's probability of landing heads is a beta random variable with parameters n and m, that is, the probability density is

f(p) = C p^{n-1}(1 - p)^{m-1}, 0 < p < 1.

Consider the policy that flips each newly chosen coin until m consecutive flips land tails, then discards that coin and does the same with a new one. For this policy show that, with probability 1, the long-run proportion of flips that land heads is 1.

3.31. A system consisting of four components is said to work whenever both at least one of components 1 and 2 work and at least one of components 3 and 4 work. Suppose that component i alternates between working and being failed in accordance with a nonlattice alternating renewal process with distributions F_i and G_i, i = 1, 2, 3, 4. If these alternating renewal processes are independent, find lim_{t→∞} P{system is working at time t}.
3.32. Consider a single-server queueing system having Poisson arrivals at rate λ and service distribution G with mean μ_G. Suppose that λμ_G < 1.
(a) Find P_0, the proportion of time the system is empty.
(b) Say the system is busy whenever it is nonempty (and so the server is busy). Compute the expected length of a busy period.
(c) Use part (b) and Wald's equation to compute the expected number of customers served in a busy period.

3.33. For the queueing system of Section 3.6.1 define V(t), the work in the system at time t, as the sum of the remaining service times of all customers in the system at t. Let

V = lim_{t→∞} ∫_0^t V(s) ds / t.

Also, let D_i denote the amount of time the ith customer spends waiting in queue and define

W_Q = lim_{n→∞} (D_1 + ⋯ + D_n)/n.

(a) Argue that V and W_Q exist and are constant with probability 1.
(b) Prove the identity

V = λE[Y]W_Q + λE[Y²]/2,

where 1/λ is the mean interarrival time and Y has the distribution of a service time.

3.34. In a k-server queueing model with renewal arrivals, show by counterexample that the condition E[Y] < kE[X], where Y is a service time and X an interarrival time, is not sufficient for a cycle time to be necessarily finite. (Hint: Give an example where the system is never empty after the initial arrival.)

3.35. Packages arrive at a mailing depot in accordance with a Poisson process having rate λ. Trucks, picking up all waiting packages, arrive in accordance with a renewal process with nonlattice interarrival distribution F. Let X(t) denote the number of packages waiting to be picked up at time t.
(a) What type of stochastic process is {X(t), t ≥ 0}?
(b) Find an expression for lim_{t→∞} P{X(t) = i}, i ≥ 0.

3.36. Consider a regenerative process satisfying the conditions of Theorem 3.7.1. Suppose that a reward at rate r(j) is earned whenever the process is in state j. If the expected reward during a cycle is finite, show that the long-run average reward per unit time is, with probability 1, given by

lim_{t→∞} ∫_0^t r(X(s)) ds / t = Σ_j P_j r(j),

where P_j is the limiting probability that X(t) equals j.

REFERENCES

References 1, 6, 7, and 11 present renewal theory at roughly the same mathematical level as the present text. A simpler, more intuitive approach is given in Reference 8. Reference 10 has many interesting applications. Reference 9 provides an illuminating review paper in renewal theory. For a proof of the key renewal theorem under the most stringent conditions, the reader should see Volume 2 of Feller (Reference 4). Theorem 3.6.2 is known in the queueing literature as "Little's formula." Our approach to the arc sine law is similar to that given in Volume 1 of Feller (Reference 4); the interested reader should also see Reference 3. Examples 3.3(A) and 3.5(C) are from Reference 5.

1. D. R. Cox, Renewal Theory, Methuen, London, 1962.
2. H. Cramer and M. Leadbetter, Stationary and Related Stochastic Processes, Wiley, New York, 1966.
3. B. DeFinetti, Theory of Probability, Vol. 1, Wiley, New York, 1970.
4. W. Feller, An Introduction to Probability Theory and Its Applications, Vols. I and II, Wiley, New York, 1957 and 1966.
5. S. Herschkorn, E. Pekoz, and S. M. Ross, "Policies Without Memory for the Infinite-Armed Bernoulli Bandit Under the Average Reward Criterion," Probability in the Engineering and Informational Sciences, Vol. 10, 1, 1996.
6. D. Heyman and M. Sobel, Stochastic Models in Operations Research, Volume I, McGraw-Hill, New York, 1982.
7. S. Karlin and H. Taylor, A First Course in Stochastic Processes, 2nd ed., Academic Press, Orlando, FL, 1975.
8. S. M. Ross, Introduction to Probability Models, 5th ed., Academic Press, Orlando, FL, 1993.
9. W. Smith, "Renewal Theory and its Ramifications," Journal of the Royal Statistical Society, Series B, 20 (1958), pp. 243-302.
10. H. C. Tijms, Stochastic Models: An Algorithmic Approach, Wiley, New York, 1994.
11. R. Wolff, Stochastic Modeling and the Theory of Queues, Prentice-Hall, NJ, 1989.

CHAPTER 4
Markov Chains

4.1 INTRODUCTION AND EXAMPLES

Consider a stochastic process {X_n, n = 0, 1, 2, ...} that takes on a finite or countable number of possible values. Unless otherwise mentioned, this set of possible values of the process will be denoted by the set of nonnegative integers {0, 1, 2, ...}. If X_n = i, then the process is said to be in state i at time n. We suppose that whenever the process is in state i, there is a fixed probability P_{ij} that it will next be in state j. That is, we suppose that

(4.1.1)  P{X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0} = P_{ij}

for all states i_0, i_1, ..., i_{n-1}, i, j and all n ≥ 0. Such a stochastic process is known as a Markov chain. Equation (4.1.1) may be interpreted as stating that, for a Markov chain, the conditional distribution of any future state X_{n+1}, given the past states X_0, X_1, ..., X_{n-1} and the present state X_n, is independent of the past states and depends only on the present state. This is called the Markovian property. The value P_{ij} represents the probability that the process will, when in state i, next make a transition into state j. Since probabilities are nonnegative and since the process must make a transition into some state, we have that

P_{ij} ≥ 0, i, j ≥ 0; Σ_{j=0}^∞ P_{ij} = 1, i = 0, 1, ....

Let P denote the matrix of one-step transition probabilities P_{ij}, so that

P = | P_00  P_01  P_02  ⋯ |
    | P_10  P_11  P_12  ⋯ |
    |  ⋮     ⋮     ⋮      |
    | P_i0  P_i1  P_i2  ⋯ |
    |  ⋮     ⋮     ⋮      |
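Simulating a Markov chain from its transition matrix requires only sampling the next state from the current row, as in the following Python sketch (an illustration only; the two-state matrix is an arbitrary choice, not from the text):

```python
import random

# Minimal sketch of simulating a Markov chain from its one-step transition
# matrix P: the next state is drawn according to the weights in row P[i].

def simulate(P, i, n_steps, rng=random.Random(0)):
    path = [i]
    for _ in range(n_steps):
        i = rng.choices(range(len(P)), weights=P[i])[0]
        path.append(i)
    return path

P = [[0.9, 0.1],
     [0.4, 0.6]]
path = simulate(P, 0, 10_000)
print(path[:10], sum(path) / len(path))  # fraction of time in state 1 -> 0.2

# For this chain the stationary probability of state 1 solves
# pi_1 = 0.1 pi_0 + 0.6 pi_1 with pi_0 + pi_1 = 1, giving pi_1 = 0.2.
```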
EXAMPLE 4.1(A) The M/G/1 Queue. Suppose that customers arrive at a service center in accordance with a Poisson process with rate λ. There is a single server, and those arrivals finding the server free go immediately into service; all others wait in line until their service turn. The service times of successive customers are assumed to be independent random variables having a common distribution G, and they are also assumed to be independent of the arrival process.

The above system is called the M/G/1 queueing system. The letter M stands for the fact that the interarrival distribution of customers is exponential, G for the service distribution; the number 1 indicates that there is a single server.

If we let X(t) denote the number of customers in the system at t, then {X(t), t ≥ 0} would not possess the Markovian property that the conditional distribution of the future depends only on the present and not on the past. For if we knew the number in the system at time t, then, to predict future behavior, whereas we would not care how much time had elapsed since the last arrival (since the arrival process is memoryless), we would care how long the person in service had already been there (since the service distribution G is arbitrary and therefore not memoryless).

As a means of getting around the above dilemma let us only look at the system at moments when customers depart. That is, let X_n denote the number of customers left behind by the nth departure, n ≥ 1. Also, let Y_n denote the number of customers arriving during the service period of the (n + 1)st customer. When X_n > 0, the nth departure leaves behind X_n customers, of which one enters service and the other X_n - 1 wait in line. Hence, at the next departure the system will contain the X_n - 1 customers that were in line in addition to any arrivals during the service time of the (n + 1)st customer. Since a similar argument holds when X_n = 0, we see that

(4.1.2)  X_{n+1} = X_n - 1 + Y_n if X_n > 0, and X_{n+1} = Y_n if X_n = 0.

Since Y_n, n ≥ 1, represent the number of arrivals in nonoverlapping service intervals, it follows, the arrival process being a Poisson process, that they are independent and

(4.1.3)  P{Y_n = j} = ∫_0^∞ e^{-λx} (λx)^j/j! dG(x), j = 0, 1, ....

From (4.1.2) and (4.1.3) it follows that {X_n, n = 1, 2, ...} is a Markov chain with transition probabilities given by

P_{0j} = ∫_0^∞ e^{-λx} (λx)^j/j! dG(x), j ≥ 0,
P_{ij} = ∫_0^∞ e^{-λx} (λx)^{j-i+1}/(j - i + 1)! dG(x), j ≥ i - 1, i ≥ 1,
P_{ij} = 0 otherwise.

EXAMPLE 4.1(B) The G/M/1 Queue. Suppose that customers arrive at a single-server service center in accordance with an arbitrary renewal process having interarrival distribution G. Suppose further that the service distribution is exponential with rate μ. If we let X_n denote the number of customers in the system as seen by the nth arrival, it is easy to see that the process {X_n, n ≥ 1} is a Markov chain.

To compute the transition probabilities P_{ij} for this Markov chain, let us first note that, as long as there are customers to be served, the number of services in any length of time t is a Poisson random variable with mean μt. This is true since the time between successive services is exponential and, as we know, this implies that the number of services thus constitutes a Poisson process. Therefore,

P_{i,i+1-j} = ∫_0^∞ e^{-μt} (μt)^j/j! dG(t), j = 0, 1, ..., i,

which follows since if an arrival finds i in the system, then the next arrival will find i + 1 minus the number served, and the probability that j will be served is easily seen (by conditioning on the time between the successive arrivals) to equal the right-hand side of the above.

The formula for P_{i0} is a little different (it is the probability that at least i + 1 Poisson events occur in a random length of time having distribution G) and thus is given by

P_{i0} = ∫_0^∞ Σ_{k=i+1}^∞ e^{-μt} (μt)^k/k! dG(t), i ≥ 0.

Remark The reader should note that in the previous two examples we were able to discover an embedded Markov chain by looking at the process only at certain time points, and by choosing these time points so as to exploit the lack of memory of the exponential distribution. This is often a fruitful approach for processes in which the exponential distribution is present.
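For a concrete service distribution the transition probabilities of Example 4.1(A) are easy to tabulate. The sketch below (an illustration only, not from the text) takes G deterministic at d, so that P{Y_n = j} is Poisson with mean λd:

```python
from math import exp, factorial

# Embedded M/G/1 transition probabilities when the service distribution G is
# deterministic at d: a_j = P{Y_n = j} is Poisson(lambda*d), and
# P_{0j} = a_j, P_{ij} = a_{j-i+1} for j >= i-1.

lam, d, size = 1.0, 0.8, 6

def a(j):
    return exp(-lam * d) * (lam * d) ** j / factorial(j)

P = [[0.0] * size for _ in range(size)]
for j in range(size):
    P[0][j] = a(j)
for i in range(1, size):
    for j in range(i - 1, size):
        P[i][j] = a(j - i + 1)

for row in P:
    print([round(x, 3) for x in row])   # rows sum to ~1, up to truncation
```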
then no matter what its previous values the probability that Sn equals i (as opposed to -i) is p'/(p' + q '). PROPOSITION 4.1.1 If {Sn, n :?: 1} is a simple random walk, then Proof If we let i 0 = 0 and define j = max{k 0 s. k s. n. i k = O}. then, since we know the actual value of S" it is clear that P{Sn = i IISnl = i.ISn -II = in _ I'" .. IS, I = i I} = P{Sn = illSnl = i .... , IS,+II = i,+IoIS,1 = O} Now there are two possible values of the sequence S, + I' , Sn for which IS, + II = 1,+ I' , ISnl = i The first of which results in Sn = i and has probability ,.. n-, j n-j , -+- --- P 2 2 q 2 2, and the second results in Sn = - i and has probability n -, , n -, , --- -+- p2 2q2 2 CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES 167· Hence, n-I i n-j i -+- --- I p2 2q2 2 P{Sn = i ISnl = i, . .. ,ISII = il} = __ -I +-!.- p2 2q2 2+p2 2q2 2 = pi pi +q' and the proposition is proven. From Proposition 4.1.1 it follows upon conditioning on whether Sn = +i or -i that _. _. qr _ pr+1 + qi+1 + P{Sn+1 - -(I + l)I Sn - -I} r r - i r· p +q P +q Hence, {ISnl, n I} is a Markov chain with transition probabilities _ p,+1 + qi+1 _ Pi i + I - r r - 1 - P r r - I , i > 0, p + q 4.2 CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES We have already defined the one-step transition probabilities P il • We now define the transition probabilities to be the probability that a process in state i will be in state j after n additional transitions. That is, n 0, i,j 0. Of course P 11 = P'I The Chapman-KoLmogorov equations provide a method for computing these n-step transition probabilities. These equations are (4.2.1) co pn+m = pn pm II L.J Ik kl k=O for all n, m 0, all i, j, 168 MARKOV CHAINS and are established by observing that pn+m = P{X =J'IX = i} II n +m 0 '" = 2, P{Xn+m = j,X n = klXo = i} k=O '" = 2, P{Xn+m = jlX n = k. Xo = i}P{Xn = klXo = i} k=O '" = 2, P ; I P ~ k ' k=O If we let p(n) denote the matrix of n-step transition probabilities P ~ I ' then Equation (4.2 1) asserts that where the dot represents matrix multiplication. Hence, P (n) = P . P (n -I) = P . P . P (n - 2) = ... = P n • and thus p(n) may be calculated by multiplying the matrix P by itself n times State j is said to be accessible from state i if for some n :> 0, P ~ I > 0. Two states i and j accessible to each other are said to communicate, and we write i ++ j. PROPOSITION 4.2.1 Communication is an equivalence relation That is (i) i++i. (0) if i ++ j, then j ++ i, (iii) if i ++ j and j ++ k, then i ++ k. Proof The first two parts follow tnvially from the definition of communication To prove (iii), suppose that i ++ j and j ++ k. then there exists m, n such that P ~ > 0, P7k > O. Hence. '" P m In - '" pmpn :> pmpn > 0 'k - L.J rr ,k - 'I Ik • ,=0 Similarly, we may show there exists an s for which Pi" > 0 CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES 169 Two states that communicate are said to be in the same ciass, and by Proposition 4.2 ], any two classes are either disjoint or identical We say that the Markov chain is irreducible if there is only one class-that is, if all states communicate with each other State i is said to have period d if P ~ , = ° whenever n is not divisible by d and d is the greatest integer with this property. (If P ~ , = ° for all n > 0, then define the period of i to be infinite) A state with period ] is said to be aperiodic Let d(i) denote the period of i. 
We now show that periodicity is a class property.

PROPOSITION 4.2.2 If $i \leftrightarrow j$, then $d(i) = d(j)$.

Proof Let $m$ and $n$ be such that $P^n_{ji} P^m_{ij} > 0$, and suppose that $P^s_{ii} > 0$. Then

$$P^{n+m}_{jj} \ge P^n_{ji} P^m_{ij} > 0, \qquad P^{n+s+m}_{jj} \ge P^n_{ji} P^s_{ii} P^m_{ij} > 0,$$

where the second inequality follows, for instance, since the left-hand side represents the probability that starting in $j$ the chain will be back in $j$ after $n + s + m$ transitions, whereas the right-hand side is the probability of the same event subject to the further restriction that the chain is in $i$ both after $n$ and after $n + s$ transitions. Hence, $d(j)$ divides both $n + m$ and $n + s + m$, and thus it divides $(n + s + m) - (n + m) = s$ whenever $P^s_{ii} > 0$. Therefore, $d(j)$ divides $d(i)$. A similar argument yields that $d(i)$ divides $d(j)$; thus $d(i) = d(j)$.

For any states $i$ and $j$ define $f^n_{ij}$ to be the probability that, starting in $i$, the first transition into $j$ occurs at time $n$. Formally,

$$f^0_{ij} = 0, \qquad f^n_{ij} = P\{X_n = j, X_k \ne j, k = 1, \ldots, n - 1 \mid X_0 = i\}.$$

Let

$$f_{ij} = \sum_{n=1}^\infty f^n_{ij}.$$

Then $f_{ij}$ denotes the probability of ever making a transition into state $j$, given that the process starts in $i$. (Note that for $i \ne j$, $f_{ij}$ is positive if, and only if, $j$ is accessible from $i$.) State $j$ is said to be recurrent if $f_{jj} = 1$, and transient otherwise.

PROPOSITION 4.2.3 State $j$ is recurrent if, and only if,

$$\sum_{n=1}^\infty P^n_{jj} = \infty.$$

Proof State $j$ is recurrent if, with probability 1, a process starting at $j$ will eventually return. However, by the Markovian property it follows that the process probabilistically restarts itself upon returning to $j$. Hence, with probability 1, it will return again to $j$. Repeating this argument, we see that, with probability 1, the number of visits to $j$ will be infinite and will thus have infinite expectation. On the other hand, suppose $j$ is transient. Then each time the process returns to $j$ there is a positive probability $1 - f_{jj}$ that it will never again return; hence the number of visits is geometric with finite mean $1/(1 - f_{jj})$. By the above argument we see that state $j$ is recurrent if, and only if,

$$E[\text{number of visits to } j \mid X_0 = j] = \infty.$$

But, letting

$$I_n = \begin{cases} 1 & \text{if } X_n = j \\ 0 & \text{otherwise,} \end{cases}$$

it follows that $\sum_{n=1}^\infty I_n$ denotes the number of visits to $j$. Since

$$E\left[\sum_{n=1}^\infty I_n \,\Big|\, X_0 = j\right] = \sum_{n=1}^\infty P\{X_n = j \mid X_0 = j\} = \sum_{n=1}^\infty P^n_{jj},$$

the result follows.

The argument leading to the above proposition is doubly important, for it also shows that a transient state will only be visited a finite number of times (hence the name transient). This leads to the conclusion that in a finite-state Markov chain not all states can be transient. To see this, suppose the states are $0, 1, \ldots, M$ and suppose that they are all transient. Then after a finite amount of time (say after time $T_0$) state 0 will never be visited, and after a time (say $T_1$) state 1 will never be visited, and after a time (say $T_2$) state 2 will never be visited, and so on. Thus, after a finite time $T = \max\{T_0, T_1, \ldots, T_M\}$ no states will be visited. But as the process must be in some state after time $T$, we arrive at a contradiction, which shows that at least one of the states must be recurrent.

We will use Proposition 4.2.3 to prove that recurrence, like periodicity, is a class property.

COROLLARY 4.2.4 If $i$ is recurrent and $i \leftrightarrow j$, then $j$ is recurrent.

Proof Let $m$ and $n$ be such that $P^n_{ij} > 0$, $P^m_{ji} > 0$. Now for any $s \ge 0$,

$$P^{m+s+n}_{jj} \ge P^m_{ji} P^s_{ii} P^n_{ij},$$

and thus

$$\sum_s P^{m+s+n}_{jj} \ge P^m_{ji} P^n_{ij} \sum_s P^s_{ii} = \infty,$$

and the result follows from Proposition 4.2.3.

EXAMPLE 4.2(A) The Simple Random Walk.
The Markov chain whose state space is the set of all integers and that has transition probabilities

$$P_{i,i+1} = p = 1 - P_{i,i-1}, \qquad i = 0, \pm 1, \ldots,$$

where $0 < p < 1$, is called the simple random walk. One interpretation of this process is that it represents the wanderings of a drunken man as he walks along a straight line. Another is that it represents the winnings of a gambler who on each play of the game either wins or loses one dollar.

Since all states clearly communicate, it follows from Corollary 4.2.4 that they are either all transient or all recurrent. So let us consider state 0 and attempt to determine if $\sum_{n=1}^\infty P^n_{00}$ is finite or infinite. Since it is impossible to be even (using the gambling model interpretation) after an odd number of plays, we must, of course, have that

$$P^{2n+1}_{00} = 0, \qquad n = 0, 1, 2, \ldots.$$

On the other hand, the gambler would be even after $2n$ trials if, and only if, he won $n$ of these and lost $n$ of these. As each play of the game results in a win with probability $p$ and a loss with probability $1 - p$, the desired probability is thus the binomial probability

$$P^{2n}_{00} = \binom{2n}{n} p^n (1-p)^n = \frac{(2n)!}{n!\,n!} (p(1-p))^n, \qquad n = 1, 2, 3, \ldots.$$

By using an approximation, due to Stirling, which asserts that

$$n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi},$$

where we say that $a_n \sim b_n$ when $\lim_{n\to\infty} a_n/b_n = 1$, we obtain

$$P^{2n}_{00} \sim \frac{(4p(1-p))^n}{\sqrt{\pi n}}.$$

Now it is easy to verify that if $a_n \sim b_n$, then $\sum_n a_n < \infty$ if, and only if, $\sum_n b_n < \infty$. Hence $\sum_{n=1}^\infty P^n_{00}$ will converge if, and only if,

$$\sum_{n=1}^\infty \frac{(4p(1-p))^n}{\sqrt{\pi n}}$$

does. However, $4p(1-p) \le 1$ with equality if, and only if, $p = \frac{1}{2}$. Hence, $\sum_n P^n_{00} = \infty$ if, and only if, $p = \frac{1}{2}$. Thus, the chain is recurrent when $p = \frac{1}{2}$ and transient if $p \ne \frac{1}{2}$.

When $p = \frac{1}{2}$, the above process is called a symmetric random walk. We could also look at symmetric random walks in more than one dimension. For instance, in the two-dimensional symmetric random walk the process would, at each transition, either take one step to the left, right, up, or down, each having probability $\frac{1}{4}$. Similarly, in three dimensions the process would, with probability $\frac{1}{6}$, make a transition to any of the six adjacent points. By using the same method as in the one-dimensional random walk it can be shown that the two-dimensional symmetric random walk is recurrent, but all higher-dimensional random walks are transient.

COROLLARY 4.2.5 If $i \leftrightarrow j$ and $j$ is recurrent, then $f_{ij} = 1$.

Proof Suppose $X_0 = i$, and let $n$ be such that $P^n_{ij} > 0$. Say that we miss opportunity 1 if $X_n \ne j$. If we miss opportunity 1, then let $T_1$ denote the next time we enter $i$ ($T_1$ is finite with probability 1 by Corollary 4.2.4). Say that we miss opportunity 2 if $X_{T_1+n} \ne j$. If opportunity 2 is missed, let $T_2$ denote the next time we enter $i$ and say that we miss opportunity 3 if $X_{T_2+n} \ne j$, and so on. It is easy to see that the opportunity number of the first success is a geometric random variable with mean $1/P^n_{ij}$, and is thus finite with probability 1. The result follows since $i$ being recurrent implies that the number of potential opportunities is infinite.

Let $N_j(t)$ denote the number of transitions into $j$ by time $t$. If $j$ is recurrent and $X_0 = j$, then as the process probabilistically starts over upon transitions into $j$, it follows that $\{N_j(t), t \ge 0\}$ is a renewal process with interarrival distribution $\{f^n_{jj}, n \ge 1\}$. If $X_0 = i$, $i \leftrightarrow j$, and $j$ is recurrent, then $\{N_j(t), t \ge 0\}$ is a delayed renewal process with initial interarrival distribution $\{f^n_{ij}, n \ge 1\}$.
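The dichotomy in this example is easy to inspect numerically. The sketch below (ours, not from the text) computes the exact $P^{2n}_{00}$ in log space, compares it with the Stirling approximation $(4p(1-p))^n/\sqrt{\pi n}$, and contrasts the partial sums for $p = 0.5$ and an assumed $p = 0.6$.

```python
import math

def p2n(n, p):
    """Exact P^{2n}_{00} = C(2n, n) (p(1-p))^n, computed in log space."""
    logc = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)
    return math.exp(logc + n * math.log(p * (1 - p)))

for p in (0.5, 0.6):
    terms = [p2n(n, p) for n in range(1, 5001)]
    stirling = (4 * p * (1 - p)) ** 5000 / math.sqrt(math.pi * 5000)
    print(p, terms[-1] / stirling, sum(terms))
# p = 0.5: the ratio is ~1 and the partial sums keep growing like
# 2*sqrt(n/pi) (recurrence); p = 0.6: the series has converged (transience).
```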
4.3 LIMIT THEOREMS

It is easy to show that if state $j$ is transient, then

$$\sum_{n=1}^\infty P^n_{ij} < \infty \qquad \text{for all } i,$$

meaning that, starting in $i$, the expected number of transitions into state $j$ is finite. As a consequence it follows that for $j$ transient, $P^n_{ij} \to 0$ as $n \to \infty$.

Let $\mu_{jj}$ denote the expected number of transitions needed to return to state $j$. That is,

$$\mu_{jj} = \begin{cases} \infty & \text{if } j \text{ is transient} \\ \sum_{n=1}^\infty n f^n_{jj} & \text{if } j \text{ is recurrent.} \end{cases}$$

By interpreting transitions into state $j$ as being renewals, we obtain the following theorem from Propositions 3.3.1, 3.3.4, and 3.4.1 of Chapter 3.

THEOREM 4.3.1 If $i$ and $j$ communicate, then:
(i) $P\{\lim_{t\to\infty} N_j(t)/t = 1/\mu_{jj} \mid X_0 = i\} = 1$.
(ii) $\lim_{n\to\infty} \sum_{k=1}^n P^k_{ij}/n = 1/\mu_{jj}$.
(iii) If $j$ is aperiodic, then $\lim_{n\to\infty} P^n_{ij} = 1/\mu_{jj}$.
(iv) If $j$ has period $d$, then $\lim_{n\to\infty} P^{nd}_{jj} = d/\mu_{jj}$.

If state $j$ is recurrent, then we say that it is positive recurrent if $\mu_{jj} < \infty$ and null recurrent if $\mu_{jj} = \infty$. If we let

$$\pi_j = \lim_{n\to\infty} P^{nd(j)}_{jj},$$

it follows that a recurrent state $j$ is positive recurrent if $\pi_j > 0$ and null recurrent if $\pi_j = 0$. The proof of the following proposition is left as an exercise.

PROPOSITION 4.3.2 Positive (null) recurrence is a class property.

A positive recurrent, aperiodic state is called ergodic. Before presenting a theorem that shows how to obtain the limiting probabilities in the ergodic case, we need the following definition.

Definition A probability distribution $\{P_j, j \ge 0\}$ is said to be stationary for the Markov chain if

$$P_j = \sum_{i=0}^\infty P_i P_{ij}, \qquad j \ge 0.$$

If the probability distribution of $X_0$, say $P_j = P\{X_0 = j\}$, $j \ge 0$, is a stationary distribution, then

$$P\{X_1 = j\} = \sum_{i=0}^\infty P\{X_1 = j \mid X_0 = i\} P\{X_0 = i\} = \sum_{i=0}^\infty P_{ij} P_i = P_j,$$

and, by induction,

(4.3.1)
$$P\{X_n = j\} = \sum_{i=0}^\infty P\{X_n = j \mid X_{n-1} = i\} P\{X_{n-1} = i\} = \sum_{i=0}^\infty P_{ij} P_i = P_j.$$

Hence, if the initial probability distribution is the stationary distribution, then $X_n$ will have the same distribution for all $n$. In fact, as $\{X_n, n \ge 0\}$ is a Markov chain, it easily follows from this that for each $m \ge 0$, $X_n, X_{n+1}, \ldots, X_{n+m}$ will have the same joint distribution for each $n$; in other words, $\{X_n, n \ge 0\}$ will be a stationary process.

THEOREM 4.3.3 An irreducible aperiodic Markov chain belongs to one of the following two classes:
(i) Either the states are all transient or all null recurrent; in this case, $P^n_{ij} \to 0$ as $n \to \infty$ for all $i, j$ and there exists no stationary distribution.
(ii) Or else, all states are positive recurrent, that is,

$$\pi_j = \lim_{n\to\infty} P^n_{jj} > 0.$$

In this case, $\{\pi_j, j = 0, 1, 2, \ldots\}$ is a stationary distribution and there exists no other stationary distribution.

Proof We will first prove (ii). To begin, note that

$$\sum_{j=0}^M P^n_{ij} \le \sum_{j=0}^\infty P^n_{ij} = 1 \qquad \text{for all } M.$$

Letting $n \to \infty$ yields

$$\sum_{j=0}^M \pi_j \le 1 \qquad \text{for all } M,$$

implying that

$$\sum_{j=0}^\infty \pi_j \le 1.$$

Now

$$P^{n+1}_{ij} = \sum_{k=0}^\infty P^n_{ik} P_{kj} \ge \sum_{k=0}^M P^n_{ik} P_{kj} \qquad \text{for all } M.$$

Letting $n \to \infty$ yields

$$\pi_j \ge \sum_{k=0}^M \pi_k P_{kj} \qquad \text{for all } M,$$

implying that

$$\pi_j \ge \sum_{k=0}^\infty \pi_k P_{kj}, \qquad j \ge 0.$$

To show that the above is actually an equality, suppose that the inequality is strict for some $j$. Then upon adding these inequalities we obtain

$$\sum_{j=0}^\infty \pi_j > \sum_{j=0}^\infty \sum_{k=0}^\infty \pi_k P_{kj} = \sum_{k=0}^\infty \pi_k \sum_{j=0}^\infty P_{kj} = \sum_{k=0}^\infty \pi_k,$$

which is a contradiction. Therefore,

$$\pi_j = \sum_{k=0}^\infty \pi_k P_{kj}, \qquad j = 0, 1, 2, \ldots.$$

Putting $P_j = \pi_j / \sum_k \pi_k$, we see that $\{P_j, j = 0, 1, 2, \ldots\}$ is a stationary distribution, and hence at least one stationary distribution exists. Now let $\{P_j, j = 0, 1, 2, \ldots\}$ be any stationary distribution. Then if $\{P_j, j = 0, 1, 2, \ldots\}$ is the probability distribution of $X_0$,
then by (4.3.1),

(4.3.2)
$$P_j = P\{X_n = j\} = \sum_{i=0}^\infty P\{X_n = j \mid X_0 = i\} P\{X_0 = i\} = \sum_{i=0}^\infty P^n_{ij} P_i.$$

From (4.3.2) we see that

$$P_j \ge \sum_{i=0}^M P^n_{ij} P_i \qquad \text{for all } M.$$

Letting $n$ and then $M$ approach $\infty$ yields

(4.3.3)
$$P_j \ge \sum_{i=0}^\infty \pi_j P_i = \pi_j.$$

To go the other way and show that $P_j \le \pi_j$, use (4.3.2) and the fact that $P^n_{ij} \le 1$ to obtain

$$P_j \le \sum_{i=0}^M P^n_{ij} P_i + \sum_{i=M+1}^\infty P_i \qquad \text{for all } M,$$

and letting $n \to \infty$ gives

$$P_j \le \sum_{i=0}^M \pi_j P_i + \sum_{i=M+1}^\infty P_i \qquad \text{for all } M.$$

Since $\sum_i P_i = 1$, we obtain upon letting $M \to \infty$ that

(4.3.4)
$$P_j \le \sum_{i=0}^\infty \pi_j P_i = \pi_j.$$

Combining (4.3.3) and (4.3.4) shows that $P_j = \pi_j$, so the stationary distribution is unique. If the states are transient or null recurrent and $\{P_j, j = 0, 1, 2, \ldots\}$ is a stationary distribution, then Equations (4.3.2) hold and $P^n_{ij} \to 0$, which would force $P_j = 0$ for all $j$ and is clearly impossible. Thus, for case (i), no stationary distribution exists and the proof is complete.

Remarks
(1) When the situation is as described in part (ii) of Theorem 4.3.3 we say that the Markov chain is ergodic.
(2) It is quite intuitive that if the process is started with the limiting probabilities, then the resultant Markov chain is stationary. For in this case the Markov chain at time 0 is equivalent to an independent Markov chain with the same $P$ matrix at time $\infty$. Hence the original chain at time $t$ is equivalent to the second one at time $\infty + t = \infty$, and is therefore stationary.
(3) In the irreducible, positive recurrent, periodic case we still have that the $\pi_j$, $j \ge 0$, are the unique nonnegative solution of

$$\pi_j = \sum_k \pi_k P_{kj}, \qquad \sum_j \pi_j = 1.$$

But now $\pi_j$ must be interpreted as the long-run proportion of time that the Markov chain is in state $j$ (see Problem 4.17). Thus, $\pi_j = 1/\mu_{jj}$, whereas the limiting probability of going from $j$ to $j$ in $nd(j)$ steps is, by (iv) of Theorem 4.3.1, given by

$$\lim_{n\to\infty} P^{nd}_{jj} = \frac{d}{\mu_{jj}} = d\pi_j,$$

where $d$ is the period of the Markov chain.

EXAMPLE 4.3(A) Limiting Probabilities for the Embedded M/G/1 Queue. Consider the embedded Markov chain of the M/G/1 system as in Example 4.1(A) and let

$$a_j = \int_0^\infty e^{-\lambda x} \frac{(\lambda x)^j}{j!}\, dG(x).$$

That is, $a_j$ is the probability of $j$ arrivals during a service period. The transition probabilities for this chain are

$$P_{0j} = a_j, \qquad P_{ij} = a_{j-i+1}, \quad i > 0,\ j \ge i - 1, \qquad P_{ij} = 0, \quad j < i - 1.$$

Let $\rho = \sum_j j a_j$. Since $\rho$ equals the mean number of arrivals during a service period, it follows, upon conditioning on the length of that period, that $\rho = \lambda E[S]$, where $S$ is a service time having distribution $G$. We shall now show that the Markov chain is positive recurrent when $\rho < 1$ by solving the system of equations

$$\pi_j = \sum_i \pi_i P_{ij}, \qquad \sum_j \pi_j = 1.$$

These equations take the form

(4.3.5)
$$\pi_j = \pi_0 a_j + \sum_{i=1}^{j+1} \pi_i a_{j-i+1}, \qquad j \ge 0.$$

To solve, we introduce the generating functions

$$\pi(s) = \sum_{j=0}^\infty \pi_j s^j, \qquad A(s) = \sum_{j=0}^\infty a_j s^j.$$

Multiplying both sides of (4.3.5) by $s^j$ and summing over $j$ yields

$$\pi(s) = \pi_0 A(s) + \sum_{j=0}^\infty \sum_{i=1}^{j+1} \pi_i a_{j-i+1} s^j$$
$$= \pi_0 A(s) + s^{-1} \sum_{i=1}^\infty \pi_i s^i \sum_{j=i-1}^\infty a_{j-i+1} s^{j-i+1}$$
$$= \pi_0 A(s) + (\pi(s) - \pi_0) A(s)/s,$$

or

$$\pi(s) = \frac{(s - 1)\pi_0 A(s)}{s - A(s)}.$$

To compute $\pi_0$ we let $s \to 1$ in the above. As

$$\lim_{s\to 1} A(s) = \sum_j a_j = 1,$$

this gives

$$\lim_{s\to 1} \pi(s) = \pi_0 \lim_{s\to 1} \frac{s - 1}{s - A(s)} = \pi_0 (1 - A'(1))^{-1},$$

where the last equality follows from L'Hôpital's rule. Now

$$A'(1) = \sum_j j a_j = \rho,$$

and thus

$$\lim_{s\to 1} \pi(s) = \frac{\pi_0}{1 - \rho}.$$

However, since $\lim_{s\to 1} \pi(s) = \sum_j \pi_j = 1$, this implies that $\pi_0 = 1 - \rho$; thus stationary probabilities exist if and only if $\rho < 1$, and in this case,

$$\pi_0 = 1 - \rho = 1 - \lambda E[S].$$

Hence, when $\rho < 1$, or, equivalently, when $E[S] < 1/\lambda$,

$$\pi(s) = \frac{(1 - \lambda E[S])(s - 1)A(s)}{s - A(s)}.$$

EXAMPLE 4.3(B) Limiting Probabilities for the Embedded G/M/1 Queue.
Consider the embedded Markov chain for the G/M/1 queueing system as presented in Example 4.1(B). The limiting probabilities $\pi_k$, $k = 0, 1, \ldots$, can be obtained as the unique solution of

$$\pi_k = \sum_i \pi_i P_{ik}, \qquad k \ge 0, \qquad \sum_k \pi_k = 1,$$

which in this case reduces to

(4.3.6)
$$\pi_k = \sum_{i=k-1}^\infty \pi_i \int_0^\infty e^{-\mu t} \frac{(\mu t)^{i+1-k}}{(i+1-k)!}\, dG(t), \qquad k \ge 1.$$

(We've not included the equation $\pi_0 = \sum_i \pi_i P_{i0}$ since one of the equations is always redundant.)

To solve the above, let us try a solution of the form $\pi_k = c\beta^k$. Substitution into (4.3.6) leads to

(4.3.7)
$$c\beta^k = c \sum_{i=k-1}^\infty \beta^i \int_0^\infty e^{-\mu t} \frac{(\mu t)^{i+1-k}}{(i+1-k)!}\, dG(t) = c\beta^{k-1} \int_0^\infty e^{-\mu t} \sum_{j=0}^\infty \frac{(\beta\mu t)^j}{j!}\, dG(t),$$

or

(4.3.8)
$$\beta = \int_0^\infty e^{-\mu t(1-\beta)}\, dG(t).$$

The constant $c$ can be obtained from $\sum_k \pi_k = 1$, which implies that $c = 1 - \beta$. Since the $\pi_k$ are the unique solution to (4.3.6), and $\pi_k = (1 - \beta)\beta^k$ satisfies it, it follows that

$$\pi_k = (1 - \beta)\beta^k, \qquad k = 0, 1, \ldots,$$

where $\beta$ is the solution of Equation (4.3.8). (It can be shown that if the mean of $G$ is greater than the mean service time $1/\mu$, then there is a unique value of $\beta$ satisfying (4.3.8) that is between 0 and 1.) The exact value of $\beta$ can usually only be obtained by numerical methods.
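Equation (4.3.8) exhibits $\beta$ as a fixed point, so for a concrete $G$ it can be found by simple iteration. In the sketch below (ours, not from the text), $G$ is assumed deterministic at $d = 1.5$ with $\mu = 1$, so the mean interarrival time exceeds $1/\mu$ and the integral in (4.3.8) is just $e^{-\mu d(1-\beta)}$; all parameter values are illustrative assumptions.

```python
import math

mu, d = 1.0, 1.5   # service rate and assumed fixed interarrival time, mu*d > 1

def phi(beta):
    """Right side of (4.3.8) when G is the point mass at d."""
    return math.exp(-mu * d * (1 - beta))

beta = 0.5
for _ in range(200):       # fixed-point iteration
    beta = phi(beta)
print(beta)                # ~0.417 for these parameters

pi = [(1 - beta) * beta**k for k in range(5)]   # pi_k = (1 - beta) beta^k
print(pi)
```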
EXAMPLE 4.3(C) The Age of a Renewal Process. Initially an item is put into use, and when it fails it is replaced at the beginning of the next time period by a new item. Suppose that the lives of the items are independent and each will fail in its $i$th period of use with probability $P_i$, $i \ge 1$, where the distribution $\{P_i\}$ is aperiodic and $\sum_i iP_i < \infty$. Let $X_n$ denote the age of the item in use at time $n$, that is, the number of periods (including the $n$th) it has been in use. If we let

$$\lambda(i) = \frac{P_i}{\sum_{j\ge i} P_j}$$

denote the probability that an $i$ unit old item fails, then $\{X_n, n \ge 0\}$ is a Markov chain with transition probabilities given by

$$P_{i,1} = \lambda(i) = 1 - P_{i,i+1}, \qquad i \ge 1.$$

Hence the limiting probabilities are such that

(4.3.9)
$$\pi_1 = \sum_i \pi_i \lambda(i),$$

(4.3.10)
$$\pi_{i+1} = \pi_i (1 - \lambda(i)), \qquad i \ge 1.$$

Iterating (4.3.10) yields

$$\pi_{i+1} = \pi_i (1 - \lambda(i)) = \pi_{i-1}(1 - \lambda(i))(1 - \lambda(i-1)) = \cdots = \pi_1 (1 - \lambda(1))(1 - \lambda(2)) \cdots (1 - \lambda(i)) = \pi_1 P\{X \ge i + 1\},$$

where $X$ is the life of an item. Using $\sum_{i=1}^\infty \pi_i = 1$ yields

$$1 = \pi_1 \sum_{i=1}^\infty P\{X \ge i\},$$

or $\pi_1 = 1/E[X]$, and

(4.3.11)
$$\pi_i = \frac{P\{X \ge i\}}{E[X]}, \qquad i \ge 1,$$

which is easily seen to satisfy (4.3.9). It is worth noting that (4.3.11) is as expected, since the limiting distribution of age in the nonlattice case is the equilibrium distribution (see Section 3.4 of Chapter 3), whose density is $\bar F(x)/E[X]$.

Our next two examples illustrate how the stationary probabilities can sometimes be determined not by algebraically solving the stationary equations, but by reasoning directly that when the initial state is chosen according to a certain set of probabilities the resulting chain is stationary.

EXAMPLE 4.3(D) Suppose that during each time period, every member of a population independently dies with probability $p$, and also that the number of new members that join the population in each time period is a Poisson random variable with mean $\lambda$. If we let $X_n$ denote the number of members of the population at the beginning of period $n$, then it is easy to see that $\{X_n, n = 1, \ldots\}$ is a Markov chain.

To find the stationary probabilities of this chain, suppose that $X_0$ is distributed as a Poisson random variable with parameter $\alpha$. Since each of these $X_0$ individuals will independently be alive at the beginning of the next period with probability $1 - p$, it follows that the number of them that are still in the population at time 1 is a Poisson random variable with mean $\alpha(1 - p)$. As the number of new members that join the population by time 1 is an independent Poisson random variable with mean $\lambda$, it thus follows that $X_1$ is a Poisson random variable with mean $\alpha(1 - p) + \lambda$. Hence, if

$$\alpha = \alpha(1 - p) + \lambda,$$

that is, if $\alpha = \lambda/p$, then the chain would be stationary. Hence, by the uniqueness of the stationary distribution, we can conclude that the stationary distribution is Poisson with mean $\lambda/p$. That is,

$$\pi_j = e^{-\lambda/p} (\lambda/p)^j / j!, \qquad j = 0, 1, \ldots.$$

EXAMPLE 4.3(E) The Gibbs Sampler. Let $p(x_1, \ldots, x_n)$ be the joint probability mass function of the random vector $X_1, \ldots, X_n$. In cases where it is difficult to directly generate the values of such a random vector, but where it is relatively easy to generate, for each $i$, a random variable having the conditional distribution of $X_i$ given all of the other $X_j$, $j \ne i$, we can generate a random vector whose probability mass function is approximately $p(x_1, \ldots, x_n)$ by using the Gibbs sampler. It works as follows.

Let $\mathbf{X}^0 = (x_1^0, \ldots, x_n^0)$ be any vector for which $p(x_1^0, \ldots, x_n^0) > 0$. Then generate a random variable whose distribution is the conditional distribution of $X_1$ given that $X_j = x_j^0$, $j = 2, \ldots, n$, and call its value $x_1^1$. Then generate a random variable whose distribution is the conditional distribution of $X_2$ given that $X_1 = x_1^1$, $X_j = x_j^0$, $j = 3, \ldots, n$, and call its value $x_2^1$. Then continue in this fashion until you have generated a random variable whose distribution is the conditional distribution of $X_n$ given that $X_j = x_j^1$, $j = 1, \ldots, n - 1$, and call its value $x_n^1$.

Let $\mathbf{X}^1 = (x_1^1, \ldots, x_n^1)$, and repeat the process, this time starting with $\mathbf{X}^1$ in place of $\mathbf{X}^0$, to obtain the new vector $\mathbf{X}^2$, and so on. It is easy to see that the sequence of vectors $\mathbf{X}^j$, $j \ge 0$, is a Markov chain, and the claim is that its stationary probabilities are given by $p(x_1, \ldots, x_n)$.

To verify the claim, suppose that $\mathbf{X}^0$ has probability mass function $p(x_1, \ldots, x_n)$. Then it is easy to see that at any point in this algorithm the vector $(x_1^1, \ldots, x_{i-1}^1, x_i^0, \ldots, x_n^0)$ will be the value of a random vector with mass function $p(x_1, \ldots, x_n)$. For instance, letting $X_1^1$ be the random variable that takes on the value denoted by $x_1^1$, then

$$P\{X_1^1 = x_1, X_j^0 = x_j, j = 2, \ldots, n\} = P\{X_1^1 = x_1 \mid X_j^0 = x_j, j = 2, \ldots, n\} P\{X_j^0 = x_j, j = 2, \ldots, n\}$$
$$= P\{X_1 = x_1 \mid X_j = x_j, j = 2, \ldots, n\} P\{X_j = x_j, j = 2, \ldots, n\}$$
$$= p(x_1, \ldots, x_n).$$

Thus $p(x_1, \ldots, x_n)$ is a stationary probability distribution, and so, provided that the Markov chain is irreducible and aperiodic, we can conclude that it is the limiting probability vector for the Gibbs sampler.

It also follows from the preceding that $p(x_1, \ldots, x_n)$ would be the limiting probability vector even if the Gibbs sampler were not systematic in first changing the value of $X_1$, then $X_2$, and so on. Indeed, even if the component whose value was to be changed was always randomly determined, then $p(x_1, \ldots, x_n)$ would remain a stationary distribution, and would thus be the limiting probability mass function provided that the resulting chain is aperiodic and irreducible.
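A minimal sketch of the Gibbs sampler (ours, not from the text): the joint mass function $p$ below on $\{0,1\}^2$ is an arbitrary illustration, and the conditional distributions are read directly from its table. The empirical frequencies of the chain approach $p$, as the example asserts.

```python
import random

# A toy joint pmf p(x1, x2) on {0,1}^2 (values are our own illustration).
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def sample_conditional(i, other):
    """Draw component i from its conditional pmf given the other component."""
    w = [p[(v, other)] if i == 0 else p[(other, v)] for v in (0, 1)]
    return random.choices((0, 1), weights=w)[0]

x = [0, 0]                        # any starting state with p > 0
counts = {k: 0 for k in p}
for _ in range(200_000):
    x[0] = sample_conditional(0, x[1])   # update X1 given X2
    x[1] = sample_conditional(1, x[0])   # update X2 given the new X1
    counts[tuple(x)] += 1

print({k: round(v / 200_000, 3) for k, v in counts.items()})  # ~ p
```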
Now, consider an irreducible, positive recurrent Markov chain with stationary probabilities $\pi_j$, $j \ge 0$; that is, $\pi_j$ is the long-run proportion of transitions that are into state $j$. Consider a given state of this chain, say state 0, and let $N$ denote the number of transitions between successive visits to state 0. Since visits to state 0 constitute renewals, it follows from Theorem 3.3.5 that the number of visits by time $n$ is, for large $n$, approximately normally distributed with mean $n/E[N] = n\pi_0$ and variance $n \mathrm{Var}(N)/(E[N])^3 = n \pi_0^3 \mathrm{Var}(N)$. It remains to determine $\mathrm{Var}(N) = E[N^2] - (E[N])^2$.

To derive an expression for $E[N^2]$, let us first determine the average number of transitions until the Markov chain next enters state 0. That is, let $T_n$ denote the number of transitions from time $n$ onward until the Markov chain enters state 0, and consider $\lim_{n\to\infty} (T_1 + T_2 + \cdots + T_n)/n$. By imagining that we receive a reward at any time that is equal to the number of transitions from that time onward until the next visit to state 0, we obtain a renewal reward process in which a new cycle begins each time a transition into state 0 occurs, and for which the average reward per unit time is

$$\text{Long-Run Average Reward Per Unit Time} = \lim_{n\to\infty} \frac{T_1 + T_2 + \cdots + T_n}{n}.$$

By the theory of renewal reward processes this long-run average reward per unit time is equal to the expected reward earned during a cycle divided by the expected time of a cycle. But if $N$ is the number of transitions between successive visits into state 0, then

$$E[\text{Reward Earned during a Cycle}] = E[N + (N - 1) + \cdots + 1] = E\left[\frac{N(N+1)}{2}\right],$$
$$E[\text{Time of a Cycle}] = E[N].$$

Thus,

(4.3.12)
$$\text{Average Reward Per Unit Time} = \frac{E[N^2] + E[N]}{2E[N]}.$$

However, since the average reward is just the average number of transitions it takes the Markov chain to make a transition into state 0, and since the proportion of time that the chain is in state $i$ is $\pi_i$, it follows that

(4.3.13)
$$\text{Average Reward Per Unit Time} = \sum_i \pi_i \mu_{i0},$$

where $\mu_{i0}$ denotes the mean number of transitions until the chain enters state 0 given that it is presently in state $i$. By equating the two expressions for the average reward given by Equations (4.3.12) and (4.3.13), and using that $E[N] = 1/\pi_0$, we obtain that

$$\pi_0 E[N^2] + 1 = 2 \sum_i \pi_i \mu_{i0},$$

or

(4.3.14)
$$E[N^2] = \frac{2}{\pi_0} \sum_{i\ne 0} \pi_i \mu_{i0} + \frac{1}{\pi_0},$$

where the final equation used that $\mu_{00} = E[N] = 1/\pi_0$, so that $\pi_0 \mu_{00} = 1$. The values $\mu_{i0}$ can be obtained by solving the following set of linear equations (obtained by conditioning on the next state visited):

$$\mu_{i0} = 1 + \sum_{j\ne 0} P_{ij} \mu_{j0}, \qquad i > 0.$$
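For a finite chain the whole computation in (4.3.14) is a pair of linear solves. The sketch below (ours, not from the text) does this for an arbitrary illustrative three-state matrix $P$; it computes $\pi$, the $\mu_{i0}$, and then $E[N^2]$ and $\mathrm{Var}(N)$.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],     # an assumed transition matrix
              [0.4, 0.4, 0.2],
              [0.1, 0.6, 0.3]])
n = P.shape[0]

# Stationary probabilities: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# mu_{i0} = 1 + sum_{j != 0} P_{ij} mu_{j0}, i.e. (I - Q) mu = 1 with
# Q the transitions among the nonzero states.
Q = P[1:, 1:]
mu = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))

EN2 = (2 * pi[1:] @ mu + 1) / pi[0]            # equation (4.3.14)
varN = EN2 - (1 / pi[0]) ** 2
print(pi[0], EN2, varN)
```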
EXAMPLE 4.3(F) Consider the two-state Markov chain with

$$P_{00} = \alpha = 1 - P_{01}, \qquad P_{10} = \beta = 1 - P_{11}.$$

For this chain,

$$\pi_0 = \frac{\beta}{1 - \alpha + \beta}, \qquad \pi_1 = \frac{1 - \alpha}{1 - \alpha + \beta}, \qquad \mu_{10} = 1/\beta.$$

Hence, from (4.3.14),

$$E[N^2] = \frac{2\pi_1 \mu_{10}}{\pi_0} + \frac{1}{\pi_0} = \frac{2(1 - \alpha)}{\beta^2} + \frac{1 - \alpha + \beta}{\beta},$$

and so

$$\mathrm{Var}(N) = \frac{2(1 - \alpha)}{\beta^2} + \frac{1 - \alpha + \beta}{\beta} - \frac{(1 - \alpha + \beta)^2}{\beta^2} = \frac{1 - \beta + \alpha\beta - \alpha^2}{\beta^2}.$$

Hence, for $n$ large, the number of transitions into state 0 by time $n$ is approximately normal with mean $n\pi_0 = n\beta/(1 - \alpha + \beta)$ and variance

$$n\pi_0^3 \mathrm{Var}(N) = \frac{n\beta(1 - \beta + \alpha\beta - \alpha^2)}{(1 - \alpha + \beta)^3}.$$

For instance, if $\alpha = \beta = 1/2$, then the number of visits to state 0 by time $n$ is for large $n$ approximately normal with mean $n/2$ and variance $n/4$.

4.4 TRANSITIONS AMONG CLASSES, THE GAMBLER'S RUIN PROBLEM, AND MEAN TIMES IN TRANSIENT STATES

We begin this section by showing that a recurrent class is a closed class in the sense that once entered it is never left.

PROPOSITION 4.4.1 Let $R$ be a recurrent class of states. If $i \in R$, $j \notin R$, then $P_{ij} = 0$.

Proof Suppose $P_{ij} > 0$. Then, as $i$ and $j$ do not communicate (since $j \notin R$), $P^n_{ji} = 0$ for all $n$. Hence if the process starts in state $i$, there is a positive probability of at least $P_{ij}$ that the process will never return to $i$. This contradicts the fact that $i$ is recurrent, and so $P_{ij} = 0$.

Let $j$ be a given recurrent state and let $T$ denote the set of all transient states. For $i \in T$, we are often interested in computing $f_{ij}$, the probability of ever entering $j$ given that the process starts in $i$. The following proposition, by conditioning on the state after the initial transition, yields a set of equations satisfied by the $f_{ij}$.

PROPOSITION 4.4.2 If $j$ is recurrent, then the set of probabilities $\{f_{ij}, i \in T\}$ satisfies

$$f_{ij} = \sum_{k\in T} P_{ik} f_{kj} + \sum_{k\in R} P_{ik}, \qquad i \in T,$$

where $R$ denotes the set of states communicating with $j$.

Proof

$$f_{ij} = P\{N_j(\infty) > 0 \mid X_0 = i\} = \sum_k P\{N_j(\infty) > 0 \mid X_0 = i, X_1 = k\} P\{X_1 = k \mid X_0 = i\}$$
$$= \sum_{k\in T} f_{kj} P_{ik} + \sum_{k\in R} P_{ik},$$

where we have used Corollary 4.2.5 in asserting that $f_{kj} = 1$ for $k \in R$, and Proposition 4.4.1 in asserting that $f_{kj} = 0$ for $k \notin T$, $k \notin R$.

EXAMPLE 4.4(A) The Gambler's Ruin Problem. Consider a gambler who at each play of the game has probability $p$ of winning 1 unit and probability $q = 1 - p$ of losing 1 unit. Assuming successive plays of the game are independent, what is the probability that, starting with $i$ units, the gambler's fortune will reach $N$ before reaching 0?

If we let $X_n$ denote the player's fortune at time $n$, then the process $\{X_n, n = 0, 1, 2, \ldots\}$ is a Markov chain with transition probabilities

$$P_{00} = P_{NN} = 1, \qquad P_{i,i+1} = p = 1 - P_{i,i-1}, \qquad i = 1, 2, \ldots, N - 1.$$

This Markov chain has three classes, namely, $\{0\}$, $\{1, 2, \ldots, N - 1\}$, and $\{N\}$, the first and third class being recurrent and the second transient. Since each transient state is only visited finitely often, it follows that, after some finite amount of time, the gambler will either attain her goal of $N$ or go broke.

Let $f_i \equiv f_{iN}$ denote the probability that, starting with $i$, $0 \le i \le N$, the gambler's fortune will eventually reach $N$. By conditioning on the outcome of the initial play of the game (or, equivalently, by using Proposition 4.4.2), we obtain

$$f_i = p f_{i+1} + q f_{i-1}, \qquad i = 1, 2, \ldots, N - 1,$$

or, equivalently, since $p + q = 1$,

$$f_{i+1} - f_i = \frac{q}{p}(f_i - f_{i-1}), \qquad i = 1, 2, \ldots, N - 1.$$

Since $f_0 = 0$, we see from the above that

$$f_{i+1} - f_i = (q/p)^i f_1.$$

Adding the first $i - 1$ of these equations yields

$$f_i = f_1 \left[1 + (q/p) + \cdots + (q/p)^{i-1}\right],$$

or

$$f_i = \begin{cases} f_1 \dfrac{1 - (q/p)^i}{1 - (q/p)} & \text{if } \dfrac{q}{p} \ne 1 \\[2ex] i f_1 & \text{if } \dfrac{q}{p} = 1. \end{cases}$$

Using $f_N = 1$ yields

$$f_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N} & \text{if } p \ne \tfrac{1}{2} \\[2ex] \dfrac{i}{N} & \text{if } p = \tfrac{1}{2}. \end{cases}$$
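As a check (ours, not from the text), the closed form for $f_i$ can be compared with a direct numerical solve of the conditioning equations $f_i = p f_{i+1} + q f_{i-1}$, $f_0 = 0$, $f_N = 1$; the values $p = 0.4$ and $N = 6$ are assumptions.

```python
import numpy as np

p, N = 0.4, 6
q = 1 - p
A = np.zeros((N + 1, N + 1)); b = np.zeros(N + 1)
A[0, 0] = A[N, N] = 1.0; b[N] = 1.0       # boundary conditions f_0=0, f_N=1
for i in range(1, N):
    A[i, i], A[i, i + 1], A[i, i - 1] = 1.0, -p, -q
f = np.linalg.solve(A, b)

closed = [(1 - (q / p) ** i) / (1 - (q / p) ** N) for i in range(N + 1)]
print(np.allclose(f, closed))              # True
```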
I-I Since the XI are independent random variables with mean E[XI] = l(p) - 1(1 - p) = 2p - 1, and N is a stopping time for the XI' it follows by Wald's equation that But, if we let a = [1 - (q/p)']/[l - (q/p)n] be the probability that n is reached before 0, then with probability a with probability 1 - a. Hence, we obtain that (2p - 1)E[B] = na - ; or E [B] = 1 {n[1 - (q/p)'] - i} 2p - 1 1 - (q/p)n TRANSITIONS AMONG CLASSES, THe GAMBLER'S RUIN PROBLEM 189 Consider now a finite state Markov chain and suppose that the states are numbered so that T = {1, 2, ", t} denotes the set of transient states Let PI,] P" P II and note that since Q specifies only the transition probabilities from transient states into transient states, some of its row sums are less than 1 (for otherwise, T would be a closed class of states) For transient states i and j, let mil denote the expected total number of time periods spent in state j given that the chain starts in state i Conditioning on the initial transition yields' (44,1 ) mil = 8(;,j) + L P,km kl k I = 8(;,j) + L P,km kl k ~ 1 where 8(;, j) is equal to 1 when i = j and is 0 otherwise, and where the final equality follows from the fact that m kl = 0 when k is a recurrent state Let M denote the matrix of values mil' i, j = 1" ,t, that is, In matrix notation, (4.41) can be written as M = 1 + QM where 1 is the identity matrix of size t As the preceding equation is equiva- lent to (/- Q)M = 1 we obtain, upon mUltiplying both sides by (I - Qtl, that That is, the quantities mil' ie T, je T, can be obtained by inverting the matrix 1 - Q (The existence of the inverse is easily established) 190 MARKOV CHAINS For ieT, jeT, the quantitY!.J> equal to the probability of ever making a transition into state j given that the chain starts in state i, is easily determined from M To determine the relationship, we start by deriving an expression for mil by conditioning on whether state j is ever entered. mil = E [number of transitions into state j Istart in i] = m ll !./ where mil is the expected number of time periods spent in state j given that it is eventually entered from state i. Thus, we see that EXAMPLE 4.4(a) Consider the gambler's ruin problem with p = 4 and n = 6. Starting in state 3, determine' (a) the expected amount of time spent in state 3. (b) the expected number of visits to state 2 (c) the probability of ever visiting state 4. Solution The matrix Q, which specifies PI}' i, je{1, 2, 3, 4, 5} is as follows. 1 2 3 4 5 1 0 .4 0 0 0 2 .6 0 .4 0 0 Q=3 0 .6 0 .4 0 4 0 0 .6 0 .4 5 0 0 0 .6 0 Inverting (I Q) gives that 1.5865 0.9774 05714 03008 01203 14662 24436 14286 07519 03008 M= (I Qt l = 12857 21429 27143 14286 05714 1.0150 16917 21429 24436 09774 0.6090 1.0150 12857 14662 15865 Hence, m33 = 2.7143, m 3 • 2 = 2.1429 /34 = m 34 /m 44 = 1428612.4436 = .5846. BRANCHING PROCESSES As a check, note that 13 4 is just the probability, starting with 3, of visiting 4 before 0, and thus 13.4 = ~ = ~ : : ~ : : ~ : = 38/65 = .5846. 4.5 BRANCHING PROCESSES 191 Consider a population consisting of individuals able to produce offspring of the same kind Suppose that each individual will, by the end of its lifetime, have produced j new offspring with probability P" j ~ 0, independently of the number produced by any other individual. The number of individuals initially present, denoted by X o , is called the size of the zeroth generation. 
All offspring of the zeroth generation constitute the first generation and their number is denoted by Xl' In general, let Xn denote the size of the nth genera- tion. The Markov chain {X n , n ~ O} is called a branching process. Suppose that Xo = 1. We can calculate the mean of Xn by noting that X n _ 1 Xn = L Z" , po:; 1 where Z, represents the number of offspring of the ith individual of the (n - l)st generation Conditioning on X n - I yields E [Xn1 = E [E [XnIX n -.]] = JLE [Xn-.1 = JL 2 E [Xn - 21 where JL is the mean number of offspring per individual. Let 1To denote the probability that, starting with a single individual, the population ever dies out An equation determining 1To may be derived by conditioning on the number of offspring of the initial individual, as follows. 1To = P{population dies out} ., = L P{population dies outl Xl = j}P, ,=0 Now, given that Xl = j, the population will eventually die out if, and only if, each of the j families started by the members of the first generation eventually die out. Since each family is assumed to act independently, and since the 192 MARKOV CHAINS probability that any particular family dies out is just 7To, this yields (4.5.1) In fact we can prove the following. THEOREM 4.5.1 Suppose that Po > 0 and Po + PI < 1. Then (i) 1To is the smallest positive number satisfying (ii) 1To = 1 if, and only if, II. :s; 1. Proof To show that 1To is the smallest solution of (4.5.1), let 1T ~ 0 satisfy (45.1). We'll first show by induction that 1T ~ P{X" O} for all n. Now 1T = 2: 1T J P J ~ 1Top o = Po = P{X 1 = O} J and assume that 1T ~ P{X" = O} Then P{X"+I = O} 2: P{X"+1 = OlX 1 = j}P I J 2: (P{X" = On i PI I (by the induction hypothesis) 1T Hence, 1T ~ P{X" = O} for all n, and letting n _ 00, 1T ~ lim P{X" O} P{population dies out} = 1To " To prove (ii) define the generating function APPLICATIONS OF MARKOV CHAINS (1, 1) / / / Po t--___ ~ / / 45° / / / / / / / / Figure 4.5.1 Figure 4.5.2 Since Po + PI < I, it follows that .. c/>"(S) = L j(j - l)sJ- Z P J > 0 J ~ O (1, 1) / 193 for all s E (0, 1) Hence, c/>(s) is a strictly convex function in the open interval (0, 1) We now distinguish two cases (Figures 45 1 and 452) In Figure 4.5 1 c/>(s) > s for all s E (0, 1), and in Figure 452, c/>(s) = s for some s E (0, 1). It is geometncally clear that Figure 45 1 represents the appropnate picture when C/>' (1) :::; 1. and Figure 452 is appropriate when c/>'(1) > 1 Thus, since c/>(17 0 ) = 17 0 , 170 = 1 if, and only if, c/>'(1) :::; 1 The result follows, since c/>'(1) = L ~ jP J = IJ. 4.6 ApPLICATIONS OF MARKOV CHAINS 4.6.1 A Markov Chain Model of Algorithmic Efficiency Certain algorithms in operations research and computer science act in the following manner the objective is to determine the best of a set of N ordered elements The algorithm starts with one of the elements and then successively moves to a better element until it reaches the best. (The most important example is probably the simplex algorithm of linear programming, which attempts to maximize a linear function subject to linear constraints and where an element corresponds to an extreme point of the feasibility region.) If one looks at the algorithm's efficiency from a "worse case" point of view, then examples can usually be constructed that require roughly N - 1 steps to reach the optimal element In this section, we will present a simple probabilistic model for the number of necessary steps. Specifically. 
we consider a Markov chain that when transiting from any state is equally likely to enter any of the better ones 194 Consider a Markov chain for which PI I = 1 and 1 p'J=-' -1' 1- j=I, ... ,i-l, i>l, MARKOV CHAINS and let T, denote the number of transitions to go from state i to state 1. A recursive formula for E{T;] can be obtained by conditioning on the initial tran- sition: ( 4.6.1) 1 ,-1 E[T,] = 1 + i-I J ~ E[TJ Starting with E [TI] = 0, we successively see that E{T2] = 1, E{T)] = 1 + l, E{T4] = 1 +1(1 + 1 +i) = 1 +i+1, and it is not difficult to guess and then prove inductively that ,-1 1 E[T,] = 2:-:. J= I J However, to obtain a more complete description of TN' we will use the representation where I = {I J 0 N-l TN = 2: I j , j= I if the process ever enters j otherwise. The importance of the above representation stems from the following. Lemma 4.6.1 f., • f N - 1 are independent and P{f J = l} = 1/j, l:=:;;j:=:;;N-l APPLICATIONS OF MARKOV CHAINS 195 proof Given ',+1> , 'N, let n = min{i i> j. " = I} Then P{',= 11',-'-1' , } = lI(n - 1) = ! , N j/(n-I) j PROPOSITION 4.6.2 N-I 1 (i) £[TNJ = 2: ""7 , ~ 1 ] N-1I( 1) (ii) Var(TN) = 2: -: 1 - -: ,- 1 ] ] (iii) For N large. TN has approximately a Poisson distribution with mean log N Proof Parts (i) and (ii) follow from Lemma 46 1 and the representation TN = ' 2 : , ~ 1 1 ',. Since the sum of a large number of independent Bernoulli random variables. each having a small probability of being nonzero, is approximately Poisson distributed, part (iii) follows since or and so f NdX ~ I 1 fN-1 dx -<L,,-:<I+ - 1 XI] 1 X N-I 1 10gN< 2: -:< 1 + 10g(N-I), 1 ] N-I 1 log N::::,: 2: -:. , ~ 1 ] 4.6.2 An Application to Runs-A Markov Chain with a Continuous State Space Consider a sequence of numbers XI, X2, •••• If we place vertical Jines before XI and between x, and x,+ I whenever x, > X,-'-I' then we say that the runs are the segments between pairs of lines For example, the partial sequence 3, 5, 8, 2, 4, 3, 1 contains four runs as indicated 13, 5, 812,41311. Thus each run is an increasing segment of the sequence 196 MA RKOV CHAI NS Suppose now that XI. X 2 ,. are independent and identically distributed uniform (0, 1) random variables, and suppose we are interested in the distribu_ tion of the length of the successive runs For instance, if we let L I denote the length of the initial run, then as L I will be at least m if, and only if, the first m values are in increasing order. we see that 1 m. m = 1.2,. The distribution of a later run can be easily obtained if we know x. the initial value in the run. For if a run starts with x, then (4.62) (1 )m-I P{L>mlx}= -x (m - 1)' since in order for the run length to be at least m, the next m - 1 values must all be greater than x and they must be in increasing order. To obtain the unconditional distribution of the length of a given run, let In denote the initial value of the nth run. Now it is easy to see that {In. n I} is a Markov chain having a continuous state space To compute p(Ylx), the probability density that the next run begins with the value y given that a run has just begun with initial value x. reason as follows· co E (y,y + dY)l/n = x} = L P{ln+1 E (Y.y + dy). Ln = ml/ n = x}. m&1 where Ln is the length of the nth run. Now the run will be of length m and the next one will start with value y if· (i) the next m - 1 values are in increasing order and are all greater thanx; (ii) the mth value must equal y; (iii) the maximum of the first m - 1 values must exceed y. 
Hence, P{ln+ I E (y, Y + dy), Ln = ml/ n = x} _(1_x)m-1 - (m-l)' dyP{max(X1,. ,X m _ I »yIX,>x,i=I •... ,m-l} (1 - x)m-I (m -1)' dy ify< x = (l-x)m-1 [ (y_x)m-I] (m - 1)' dy 1 - 1 - x if Y > x. APPLICATIONS OF MARKOV CHAINS summing over m yields { I-x p(ylx) = e l _ x Y-.t e -e ify <x ify>x. 197 That is. {In' n ~ I} is a Markov chain having the continuous state space (0. 1) and a transition probability density P ( ylx) given by the above. To obtain the limiting distnbution of In we will first hazard a guess and then venfy our guess by using the analog of Theorem 4.3.3. Now II. being the initial value ofthe first run, is uniformly distributed over (0, 1). However, the later runs begin whenever a value smaller than the previous one occurs. So it seems plausible that the long-run proportion of such values that are less than y would equal the probability that a uniform (0, 1) random variable is less than y given that it is smaller than a second, and independent, uniform (0, 1) random vanable. Since it seems plausible that 17'( y), the limiting density of In' is given by 1T(Y) = 2(1 y), O<y<l [A second heuristic argument for the above limiting density is as follows: Each X, of value y will be the starting point of a new run if the value prior to it is greater than y. Hence it would seem that the rate at which runs occur, beginning with an initial value in (y, y + dy), equals F(y)f(y) dy = (1 - y) dy, and so the fraction of all runs whose initial value is in (y, y + dy) will be (1 - y) d y / f ~ (1 y) dy = 2(1 y) dy.] Since a theorem analogous to Theorem 4 3.3 can be proven in the case of a Markov chain with a continuous state space, to prove the above we need to verify that In this case the above reduces to showing that or 1 Y = f ~ e l - X (1 x) dx - f ~ ey-x(1 - x) dx, 198 MARKOV CHAINS which is easily shown upon application of the identity Thus we have shown that the limiting density of In' the initial value of the nth run, is 1T(X) = 2(1 - x), 0 < x < l. Hence the limiting distribution of Ln, the length of the nth run, is given by (4.6.3) . :> _ JI (1 - x)m-I P{Ln-m} - 0 (m -1)! 2(1-x)dx 2 (m + 1)(m - 1)! To compute the average length of a run note from (4.6.2) that E[LII= xl = i ___ m=1 (m - 1)! and so = 2. The above could also have been computed from (4.6.3) as follows: . '" 1 E[Lnl = 2 m2;1 (m + 1)(m - 1)! yielding the interesting identity '" 1 1 = ];1 (m + 1)(m -1)" 4.6.3 List Ordering Rules-Optimality of the Transposition Rule Suppose that we are given a set of n elements e l , ..• ,en that are to be arranged in some order. At each unit of time a request is made to retrieve one of these APPLICATIONS OF MARKOV CHAINS 199 elements-e, being requested (independently of the past) with probability PI' P, ~ 0, ~ ~ PI = 1 The problem of interest is to determine the optimal ordering so as to minimize the long-run average position of the element requested. Oearly if the P, were known, the optimal ordering would simply be to order the elements in decreasing order of the P,'S. 
In fact, even if the PI'S were unknown we could do as weJl asymptotically by ordering the elements at each unit of time in decreasing order of the number of previous requests for them However, the problem becomes more interesting if we do not allow such memory storage as would be necessary for the above rule, but rather restrict ourselves to reordering rules in which the reordered permutation of elements at any time is only allowed to depend on the present ordering and the position of the element requested For_ a gi'{en reqrderi!1g rule, the average position of the element requested can be obtained, at least in theory, by analyzing the Markov chain of n! states where the state at any time is the ordering at that time. However, for such a large number of states, the analysis quickly becomes intractable and so we shall simplify the problem by assuming that the probabilities satisfy I-p P 2 = ... = P n = --1 = q. n- For such probabilities, since all the elements 2 through n are identical in the sense that they have the same probability of being requested, we can obtain the average position of the element requested by analyzing the much simpler Markov chain of n states with the state being the position of element e t • We will now show that for such probabilities and among a wide class of rules the transposition rule, which always moves the requested element one closer to the front of the line, is optimal. Consider the following restricted class of rules that, when an element is requested and found in position i, move the element to position j, and leave the rei ative positions of the other elements unchanged. In addition, we suppose that i, < i for i > 1, it = 1, and i, ;::: i,-t, i ::;:: 2, .. , n. The set {fl' i = 1, ... , n} characterizes a rule in this class For a given rule in the above class, let K(i) = max{l.it+1 <: f}. In other words, for any i, an element in any of the positions i, i + 1, ... , i + K(i) will, if requested, be moved to a position less than or equal to i. For a specified rule R in the above class-say the one having K(i) = k(i), i 1, ... , n-Iet us denote the stationary probabilities when this rule is employed by 'IT, = P",{e l is in position f}, i ;::: 1. 200 MARKOV CHAINS In addition, let n 0 , = 2: "/ = P",{el is in a position greater than I}, i:> 0 /;1+ 1 with the notation P"" signifying that the above are limiting probabilities. Before writing down the steady-state equations, it may be worth noting the following (I) Any element moves toward the back of the list at most one position at a time. (ii) If an element is in position i and neither it nor any of the elements in the following k(i) positions are requested, it will remain in position i. (iii) Any element in one of the positions i, i + 1, ... , i + k (i) will be moved to a position <:: i if requested. The steady-state probabilities can now easily be seen to be 0, = O,+k(,) + (0, - O,tk(/))(1 - p) + (0'_1 - O,)qk(i). The above follows since element 1 will be in a position higher than i if it was either in a position higher than i + k(i) in the previous period, or it was in a position less than i + k(i) but greater than i and was not selected, or if it was in position i and if any of the elements in positions i + 1,. .• i + k(i) were selected. The above equations are equivalent to (4.6.4) 0, = a,O, , + (1 - a,)O,+k(')1 i 1, ... , n - 1, where 0 0 = 1, On = 0, a, qk(i) qk(i) + p' Now consider a special rule of the above class, namely, the transposition rule, which has j, i-I, i 2, ... 
, n, j, = L Let the corresponding 0, be denoted by Of for the transposition rule. Then from Equation (4.6.4), we have, since K(i) 1 for this rule, or, equivalently, - q - - 0, = -(0, - 0,_,) p APPLICATIONS OF MARKOV CHAINS 201 unplying Summing the above equations from j = 1, .. , , r yields n - n - (n - n ) [.2. + . , . + (q)'] i+, , - , ,- 1 P p' Letting r = k(i), where k(i) is the value of K(i) for a given rule R in our class, we see that or, equivalently, (4.6.5) n, = b,n'_1 + (1 - b,)n,+k(t)' i=l, .. ,n 1, nn = 0, where b = (qlp) + . , . + (qlp)k(') , 1 + (qlp) + . , . + (qlpl('l' i=l, ... ,n-l We may now prove the following. PROPOSITION 4.6.3 If p ;:: lin, then ", ~ ", for all i If p ~ lin. then ", ~ ", for all i Proof Consider the case p ~ lin, which is equivalent to p;:: q, and note that in this case 1 1 a=l- >1- =b , 1 + (k(i)lp)q - 1 + qlp + .. + (qlp)lrlll " 202 MARKOV CHAINS Now define a Markov chain with states O. 1. , n and transition probabilities (466) P,,= { c, 1 - c, ifj=i-l. ifj = i + k(i). i = 1, ,n - 1 Let f, denote the probability that this Markov chain ever enters state 0 given that it starts in state i Then f, satisfies f, = C,f,-I + (1 - c,)f, f k(l). 1= 1. ,n - 1. fo = 1, Hence, as it can be shown that the above set of equations has a unique solution. it follows from (464) that if we take c, equal to a, for all i, then f, will equal the IT, of rule R, and from (465) if we let c, = b" then f, equals TI, Now it is intuitively clear (and we defer a formal proof until Chapter 8) that the probability that the Markov chain defined by (466) will ever enter 0 is an increasing function of the vector f =: (c 1 • • Cn_l) Hence. since a, ;::: b" i = 1. . n, we see that IT,;::: n, for all i When p :s; lin, then a, ::s; b" i = 1, . n - 1. and the above inequality is reversed THEOREM 4.6.4 Among the rules considered, the limiting expected position of the element requested is minimized by the transposition rule Proof Letting X denote the position of e l • we have upon conditioning on whether or not e l is requested that the expected position of the requested element can be expressed as E "' ..] E[] ( ) E[1 + 2 + + n - X] LPosltlon = p X + 1 - P n _ 1 = (p _ 1 - p) E[X] + (1- p)n(n + 1) n - 1 2(n - 1) Thus. if p ;::: lin. the expected position is minimized by minimizing E[X], and if p ~ lin. by maximizing E[X] Since E[X] = 2 . ; ~ 0 PiX > it, the result follows from Proposition 46 3 TIME_REVERSIBLE MARKOV CHAINS 203 4.7 TIME-REVERSIBLE MARKOV CHAINS An irreducible positive recurrent Markov chain is stationary if the initial state is chosen according to the stationary probabilities. (In the case of an ergodic chain this is equivalent to imagining that the process begins at time t = -00.) We say that such a chain is in steady state. Consider now a stationary Markov chain having transition probabilities p,J and stationary probabilities 1T,. and suppose that starting at some time we trace the sequence of states going backwards in time. That is. starting at time n consider the sequence of states X n • Xn -I, . .. It turns out that this sequence of states is itself a Markov chain with transition probabilities P ~ defined by P ~ P{Xm = jlX m + 1 = i} _ P{Xm+1 = ilX m = j}P{Xm = j} - P{Xm+1 = i} To prove that the reversed process is indeed a Markov chain we need to verify that To see that the preceding is true, think of the present time as being time m + 1. 
Then, since X n , n ::> 1 is a Markov chain it follows that given the present state Xm+1 the past state Xm and the future states X m + 2 • X m + 3 , ••• are independent. But this is exactly what the preceding equation states. Thus the reversed process is also a Markov chain with transition probabili- ties given by 1T,P" P *- '/---' 1T, If P ~ = p,J for all i, j, then the Markov chain is said to be time reversible. The condition for time reversibility, namely, that (4.7.1) for all i. j, can be interpreted as stating that, for all states i and j, the rate at which the process goes from i to j (namely, 1T,P'J) is equal to the rate at which it goes from j to i (namely, 1T,P,,). It should be noted that this is an obvious necessary condition for time reversibility since a transition from i to j going backward 204 MARKOV CHAINS in time is equivalent to a transition from j to i going forward in time; that is if Xm = i and Xm I = j, then a transition from i to j is observed if we looking backward in time and one from j to i if we are looking forward in time If we can find nonnegative numbers, summing to 1, which satisfy (4.71) then it follows that the Markov chain is time reversible and the represent the stationary probabilities. This is so since if - (47.2) for all i, j 2: X t 1, then summing over i yields 2: Xl = 1 Since the stationary probabilities 'Fr t are the unique solution of the above, it follows that X, = 'Fr, for all i. ExAMPLE 4.7(A) An Ergodic Random Walk. We can argue, with- out any need for computations, that an ergodic chain with p/., + I + P tr I = 1 is time reversible. This follows by noting that the number of transitions from i to i + 1 must at all times be within 1 of the number from i + 1 to i. This is so since between any two transitions from ito i + 1 there must be one from i + 1 to i (and conversely) since the only way to re-enter i from a higher state is by way of state i + 1. Hence it follows that the rate of transitions from i to i + 1 equals the rate from i + 1 to i, and so the process is time reversible. ExAMP1E 4.7(a) The Metropolis Algorithm. Let a p j = 1, ... , m m be positive numbers, and let A = 2: a, Suppose that m is large ,= 1 and that A is difficult to compute, and suppose we ideally want to simulate the values of a sequence of independent random variables whose probabilities are P, = ajlA, j = 1, ... , m. One way of simulating a sequence of random variables whose distributions converge to {PI' j = 1, .. , m} is to find a Markov chain that is both easy to simulate and whose limiting probabilities are the PI' The Metropolis algorithm provides an approach for accomplishing this task Let Q be any irreducible transition probability matrix on the integers 1, ... , n such that q/, = q/, for all i and j. Now define a Markov chain {X n , n ::> O} as follows. If Xn = i, then generiite a random variable that is equal to j with probability qlJ' i, j 1; .. , m. If this random variable takes on the value j, then set Xn+ 1 equal to j with probability min{l, a,la t }, and set it equal to i otherwise. TIME-REVERSIBLE MARKOV CHAINS That is, the transition probabilities of {X n , n ~ O} are ifj * i if j = i. We will now show that the limiting probabilities of this Markov chain are precisely the PI" To prove that the p} are the limiting probabilities, we will first show that the chain is time reversible with stationary probabilities PI' j = 1, ... 
, m by showing that To verify the preceding we must show that Now, q,} = q}, and a/a, = pJpl and so we must verify that P, min(l, pJp,) = p} min(l, p,Ip). However this is immediate since both sides of the equation are eq ual to mine P" p}). That these stationary probabilities are also limiting probabilities follows from the fact that since Q is an irreduc- ible transition probability matrix, {X n } will also be irreducible, and as (except in the trivial case where Pi == lin) P II > 0 for some i, it is also aperiodic By choosing a transition probability matrix Q that is easy to simulate-that is, for each i it is easy to generate the value of a random variable that is equal to j with probability q,}, j = 1, ... , n-we can use the preceding to generate a Markov chain whose limiting probabilities are a/A, j = 1, ... , n. This can also be accomplished without computing A. 205 Consider a graph having a positive number w,} associated with each edge (i, j), and suppose that a particle moves from vertex to vertex in the following manner· If the particle is presently at vertex i then it will next move to vertex j with probability where w,} is 0 if (i,j) is not an edge of the graph. The Markov chain describing the sequence of vertices visited by the particle is called a random walk on an edge weighted graph. 206 MARKOV CHAINS PROPOSITION 4.7.:1. Consider a random walk on an edge weighted graph with a finite number of vertices If this Markov chain is irreducible then it is, in steady state, time reversible with stationary probabilities given by 1 7 , = ~ = - - I I Proof The time reversibility equations reduce to or, equivalently, since w II WI/ implying that which, since L 17, = 1, proves the result. ExAMPLE 4.7(0) Consider a star graph consisting of r rays, with each ray consisting of n vertices. (See Example 1.9(C) for the definition of a star graph.) Let leaf i denote the leaf on ray i. Assume that a particle moves along the vertices of the graph in the following manner Whenever it is at the central vertex O. it is then equally likely to move to any ofits neighbors Whenever it is on an internal (nonJeaf) vertex of a ray. then it moves towards the leaf of that ray with probability p and towardsOwith probability 1 - p. Wheneverit is at a leaf. it moves to its neighbor vertex with probability 1. Starting at vertex O. we are interested in finding the expected number of transitions that it takes to visit an the vertices and then return to O. TIME-REVERSIBLE MARKOV CHAINS Figure 4.7.1. A star graph with weights: w := p/(l - p). To begin, let us determine the expected number of transitions between returns to the central vertex O. To evaluate this quantity, note the Markov chain of successive vertices visited is of the type considered in Proposition 4.7.1. To see this, attach a weight equal to 1 with each edge connected to 0, and a weight equal to w' on an edge connecting the ith and (i + 1 )st vertex (from 0) of a ray, where w = pl(l - p) (see Figure 47.1) Then, with these edge weights, the probability that a particle at a vertex i steps from 0 moves towards its leaf is w'/(w' + WI-I) p. 
Since the total of the sum of the weights on the edges out of each of the vertices is and the sum of the weights on the edges out of vertex 0 is r, we see from Proposition 4 7 1 that Therefore, /Loo, the expected number of steps between returns to vertex 0, is Now, say that a new cycle begins whenever the particle returns to vertex 0, and let XI be the number of transitions in the jth cycle, j ;;::: 1 Also, fix i and let N denote the number of cycles that it 207 MARKOV CHAINS takes for the particle to visit leaf i and then return to 0 With these N definitions, 2: X, is equal to the number of steps it takes to visit 1= 1 leaf i and then return to O. As N is clearly a stopping time for the X" we obtain from Wald's equation that E [f X,] = ILooE[N] ::: --'-1-w-<" E[N]. /=1 To determine E[N], the expected number of cycles needed to reach leaf i, note that each cycle will independently reach leaf i with probability where 1/r is the probability that the transition from 0 is onto ray i, and 1 (11:;)n is the (gambler's ruin) probability that a particle on the first vertex of ray i will reach the leaf of that ray (that is, increase by n - 1) before returning to O. Therefore N, the number of cycles needed to reach leaf i. is a geometric random variable with mean r[1 - (1/w)"]1(1 - 1/w). and so E [f X] = 2r(1 - wn)[1 - (l/w)n] = 2r[2 - w n - (l/w)n] ,=1 I (1 - w)(1 - l/w) 2 - w - 1/w Now, let T denote the number of transitions that it takes to visit all the vertices of the graph and then return to vertex 0 To deter- mine E[T] we will use the representation T = T, + T2 +. . + T" where T, is the time to visit the leaf 1 and then return to 0, T z is the additional time from T, until both the leafs 1 and 2 have been visited and the process returned to vertex 0; and, in general, T, is the additional time from T,-t until all of the leafs 1, .. , i have been visited and the process returned to 0 Note that if leaf i is not the last of the leafs 1, , i to be visited, then T, will equal 0, and if it is the last of these leafs to be visited, then T, will have the same distribution as the time until a specified leaf is first visited and the process then returned to 0 Hence, upon conditioning on whether leaf i is the last of leafs I, ... , i to be visited (and the probability of this event is clearly 1 N), we obtain from the preced- ing that E[T] = 2r{2 - w n - l/wn] i Iii. 2-w-1/w pi TIME-REVERSIBLE MARKOV CHAINS 209 If we try to solve Equations (4.7.2) for an arbitrary Markov chain, it will usually turn out that no solution exists. For example, from (4.7.2) X, P" = X,P,i' XkPk, = X,P,k' implying (if P"P,k > 0) that which need not in general equal P kl / P ,k • Thus we see that a necessary condition for time reversibility is that (4.7.3) for all i, j, k, which is equivalent to the statement that, starting in state i, the path i --+ k -+ j --+ i has the same probability as the reversed path i --+ j --+ k --+ i. To understand the necessity of this, note that time reversibility implies that the rate at which a sequence of transitions from ito k to j to i occur must equal the rate of ones from i to j to k to i (why?), and so we must have implying (4.7.3). In fact we can show the following. THEOREM 4.7.2 A stationary Markov chain is time reversible if. and only if, starting in state i. any path back to i has the same probability as the reversed path. for all i That is, if (474) for all states i, i .. Proof The proof of necessity is as indicated. 
If we try to solve Equations (4.7.2) for an arbitrary Markov chain, it will usually turn out that no solution exists. For example, from (4.7.2),

x_i P_{ij} = x_j P_{ji},
x_k P_{kj} = x_j P_{jk},

implying (if P_{ij} P_{jk} > 0) that

x_i/x_k = P_{ji} P_{kj} / (P_{ij} P_{jk}),

which need not in general equal P_{ki}/P_{ik}. Thus we see that a necessary condition for time reversibility is that

(4.7.3)  P_{ik} P_{kj} P_{ji} = P_{ij} P_{jk} P_{ki} for all i, j, k,

which is equivalent to the statement that, starting in state i, the path i → k → j → i has the same probability as the reversed path i → j → k → i. To understand the necessity of this, note that time reversibility implies that the rate at which a sequence of transitions from i to k to j to i occurs must equal the rate of ones from i to j to k to i (why?), and so we must have

π_i P_{ik} P_{kj} P_{ji} = π_i P_{ij} P_{jk} P_{ki},

implying (4.7.3). In fact, we can show the following.

THEOREM 4.7.2 A stationary Markov chain is time reversible if, and only if, starting in state i, any path back to i has the same probability as the reversed path, for all i. That is, if

(4.7.4)  P_{i,i_1} P_{i_1,i_2} ⋯ P_{i_k,i} = P_{i,i_k} P_{i_k,i_{k−1}} ⋯ P_{i_1,i}

for all states i, i_1, ..., i_k.

Proof. The proof of necessity is as indicated. To prove sufficiency, fix states i and j and rewrite (4.7.4) as

P_{i,i_1} P_{i_1,i_2} ⋯ P_{i_k,j} P_{ji} = P_{ij} P_{j,i_k} ⋯ P_{i_1,i}.

Summing the above over all states i_1, ..., i_k yields

P_{ij}^{k+1} P_{ji} = P_{ij} P_{ji}^{k+1}.

Hence,

(Σ_{k=1}^n P_{ij}^{k+1} / n) P_{ji} = P_{ij} (Σ_{k=1}^n P_{ji}^{k+1} / n).

Letting n → ∞ now yields

π_j P_{ji} = P_{ij} π_i,

which establishes the result.

EXAMPLE 4.7(D) A List Problem. Suppose we are given a set of n elements, numbered 1 through n, that are to be arranged in some ordered list. At each unit of time a request is made to retrieve one of these elements, element i being requested (independently of the past) with probability P_i. After being requested, the element is then put back, but not necessarily in the same position. In fact, let us suppose that the element requested is moved one closer to the front of the list; for instance, if the present list ordering is 1, 3, 4, 2, 5 and element 2 is requested, then the new ordering becomes 1, 3, 2, 4, 5. For any given probability vector P = (P_1, ..., P_n), the above can be modeled as a Markov chain with n! states, with the state at any time being the list order at that time. By using Theorem 4.7.2 it is easy to show that this chain is time reversible. For instance, suppose n = 3 and consider the following path from state (1, 2, 3) to itself:

(1, 2, 3) → (2, 1, 3) → (2, 3, 1) → (3, 2, 1) → (3, 1, 2) → (1, 3, 2) → (1, 2, 3).

The products of the transition probabilities in the forward direction and in the reverse direction are both equal to P_1² P_2² P_3². Since a similar result holds in general, the Markov chain is time reversible. In fact, time reversibility and the limiting probabilities can also be verified by noting that, for any permutation (i_1, i_2, ..., i_n), the probabilities given by

π(i_1, ..., i_n) = C P_{i_1}^{n−1} P_{i_2}^{n−2} ⋯ P_{i_{n−1}}

satisfy Equation (4.7.1), where C is chosen so that Σ π(i_1, ..., i_n) = 1, the sum being over all permutations. Hence, we have a second argument that the chain is reversible, and the stationary probabilities are as given above.

The concept of the reversed chain is useful even when the process is not time reversible. To illustrate this we start with the following theorem.

THEOREM 4.7.3 Consider an irreducible Markov chain with transition probabilities P_{ij}. If one can find nonnegative numbers π_i, i ≥ 0, summing to unity, and a transition probability matrix P* = [P*_{ij}] such that

(4.7.5)  π_i P_{ij} = π_j P*_{ji},

then the π_i, i ≥ 0, are the stationary probabilities and the P*_{ij} are the transition probabilities of the reversed chain.

Proof. Summing the above equality over all i yields

Σ_i π_i P_{ij} = π_j Σ_i P*_{ji} = π_j.

Hence, the π_i's are the stationary probabilities of the forward chain (and also of the reversed chain; why?). Since

P*_{ji} = π_i P_{ij} / π_j,

it follows that the P*_{ij} are the transition probabilities of the reversed chain.

The importance of Theorem 4.7.3 is that we can sometimes guess at the nature of the reversed chain and then use the set of equations (4.7.5) to obtain both the stationary probabilities and the P*_{ij}.
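Theorem 4.7.3's equations are easy to exploit numerically. The sketch below (illustrative only; the test chain and all names are mine) computes the stationary vector of a small chain, forms the reversed-chain matrix P*_{ij} = π_j P_{ji}/π_i, and checks whether the chain is time reversible (P* = P).

```python
import numpy as np

def stationary(P):
    """Stationary probabilities of an irreducible transition matrix P:
    solve pi P = pi together with sum(pi) = 1 (least squares)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def reversed_chain(P):
    """P*_ij = pi_j P_ji / pi_i, i.e., equation (4.7.5) rearranged."""
    pi = stationary(P)
    return (P.T * pi) / pi[:, None], pi

# A random walk on {0, 1, 2} with reflecting ends: it should be time reversible.
P = np.array([[0.0, 1.0, 0.0],
              [0.3, 0.0, 0.7],
              [0.0, 1.0, 0.0]])
Pstar, pi = reversed_chain(P)
print("pi =", np.round(pi, 4))
print("time reversible:", np.allclose(P, Pstar))
```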
EXAMPLE 4.7(E) Let us reconsider Example 4.3(C), which deals with the age of a discrete time renewal process. That is, let X_n denote the age at time n of a renewal process whose interarrival times are all integers. Since the state of this Markov chain always increases by one until it hits a value chosen by the interarrival distribution and then drops to 1, it follows that the reversed process will always decrease by one until it hits state 1, at which time it jumps to a state chosen by the interarrival distribution. Hence it seems that the reversed process is just the excess or residual life process. Thus, letting P_i denote the probability that an interarrival time is i, i ≥ 1, it seems likely that

P*_{1i} = P_i,    P*_{i,i−1} = 1, i > 1.

For the reversed chain to be as given above, we would need from (4.7.5) that

π_i P_{i1} = π_1 P*_{1i},

that is,

π_i P_i / Σ_{j≥i} P_j = π_1 P_i,

or

π_i = π_1 P{X ≥ i},

where X is an interarrival time. Since Σ_i π_i = 1, the above would necessitate that

1 = π_1 Σ_{i≥1} P{X ≥ i} = π_1 E[X],

and so, for the reversed chain to be as conjectured, we would need that

(4.7.6)  π_i = P{X ≥ i} / E[X].

To complete the proof that the reversed process is the excess and the limiting probabilities are given by (4.7.6), we need verify that

π_i P_{i,i+1} = π_{i+1} P*_{i+1,i},

or, equivalently,

(P{X ≥ i}/E[X]) (P{X > i}/P{X ≥ i}) = P{X ≥ i + 1}/E[X],

which is immediate.

Thus, by looking at the reversed chain we are able to show that it is the excess renewal process and to obtain at the same time the limiting distribution (of both excess and age). In fact, this example yields additional insight as to why the renewal excess and age have the same limiting distribution.

The technique of using the reversed chain to obtain limiting probabilities will be further exploited in Chapter 5, where we deal with Markov chains in continuous time.
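The conjecture-and-verify pattern of this example can also be checked numerically. A small sketch (illustrative; the interarrival distribution below is an arbitrary test case): build the age chain for interarrival probabilities P_1, ..., P_4, compute its stationary vector directly, and compare with π_i = P{X ≥ i}/E[X] from (4.7.6).

```python
import numpy as np

# Interarrival distribution P{X = i}, i = 1, ..., 4 (arbitrary test values).
p = np.array([0.2, 0.3, 0.4, 0.1])
m = len(p)
tail = np.array([p[i - 1:].sum() for i in range(1, m + 1)])  # P{X >= i}

# Age chain on states 1..m: from i go to i+1 w.p. P{X > i}/P{X >= i},
# and back to 1 w.p. P{X = i}/P{X >= i}.
P = np.zeros((m, m))
for i in range(1, m + 1):
    P[i - 1, 0] = p[i - 1] / tail[i - 1]       # renewal: drop back to age 1
    if i < m:
        P[i - 1, i] = tail[i] / tail[i - 1]    # survive: age i -> age i + 1

# Solve pi = pi P together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(m), np.ones(m)])
b = np.zeros(m + 1); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]

EX = (np.arange(1, m + 1) * p).sum()
print("stationary:      ", np.round(pi, 4))
print("P{X >= i}/E[X]:  ", np.round(tail / EX, 4))   # equation (4.7.6)
```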
4.8 SEMI-MARKOV PROCESSES

A semi-Markov process is one that changes states in accordance with a Markov chain but takes a random amount of time between changes. More specifically, consider a stochastic process with states 0, 1, ..., which is such that, whenever it enters state i, i ≥ 0:

(i) The next state it will enter is state j with probability P_{ij}, i, j ≥ 0.
(ii) Given that the next state to be entered is state j, the time until the transition from i to j occurs has distribution F_{ij}.

If we let Z(t) denote the state at time t, then {Z(t), t ≥ 0} is called a semi-Markov process. Thus a semi-Markov process does not possess the Markovian property that, given the present state, the future is independent of the past; for in predicting the future we would want to know not only the present state but also the length of time that has been spent in that state. Of course, at the moment of transition, all we would need to know is the new state (and nothing about the past).

A Markov chain is a semi-Markov process in which

F_{ij}(t) = 0 if t < 1;  = 1 if t ≥ 1.

That is, all transition times of a Markov chain are identically 1.

Let H_i denote the distribution of the time that the semi-Markov process spends in state i before making a transition. That is, by conditioning on the next state, we see that

H_i(t) = Σ_j P_{ij} F_{ij}(t),

and let μ_i denote its mean; that is,

μ_i = ∫_0^∞ x dH_i(x).

If we let X_n denote the nth state visited, then {X_n, n ≥ 0} is a Markov chain with transition probabilities P_{ij}; it is called the embedded Markov chain of the semi-Markov process. We say that the semi-Markov process is irreducible if the embedded Markov chain is irreducible. Let T_{ii} denote the time between successive transitions into state i and let μ_{ii} = E[T_{ii}]. By using the theory of alternating renewal processes, it is a simple matter to derive an expression for the limiting probabilities of a semi-Markov process.

PROPOSITION 4.8.1 If the semi-Markov process is irreducible, and if T_{ii} has a nonlattice distribution with finite mean, then

P_i ≡ lim_{t→∞} P{Z(t) = i | Z(0) = j}

exists and is independent of the initial state. Furthermore,

P_i = μ_i / μ_{ii}.

Proof. Say that a cycle begins whenever the process enters state i, and say that the process is "on" when in state i and "off" when not in i. Thus we have a (delayed, when Z(0) ≠ i) alternating renewal process whose on time has distribution H_i and whose cycle time is T_{ii}. Hence, the result follows from Proposition 3.4.4 of Chapter 3.

As a corollary we note that P_i is also equal to the long-run proportion of time that the process is in state i.

Corollary 4.8.2 If the semi-Markov process is irreducible and μ_{ii} < ∞, then, with probability 1,

μ_i / μ_{ii} = lim_{t→∞} (amount of time in i during [0, t]) / t.

That is, μ_i/μ_{ii} equals the long-run proportion of time in state i.

Proof. Follows from Proposition 3.7.2 of Chapter 3.

While Proposition 4.8.1 gives us an expression for the limiting probabilities, it is not, however, the way one actually computes the P_i. To do so, suppose that the embedded Markov chain {X_n, n ≥ 0} is irreducible and positive recurrent, and let its stationary probabilities be π_j, j ≥ 0. That is, the π_j, j ≥ 0, are the unique solution of

π_j = Σ_i π_i P_{ij},  Σ_j π_j = 1,

and π_j has the interpretation of being the proportion of the X_n's that equal j. (If the Markov chain is aperiodic, then π_j is also equal to lim_{n→∞} P{X_n = j}.) Now, as π_j equals the proportion of transitions that are into state j, and μ_j is the mean time spent in state j per transition, it seems intuitive that the limiting probabilities should be proportional to π_j μ_j. We now prove this.

THEOREM 4.8.3 Suppose the conditions of Proposition 4.8.1, and suppose further that the embedded Markov chain {X_n, n ≥ 0} is positive recurrent. Then

P_i = π_i μ_i / Σ_j π_j μ_j.

Proof. Define the notation as follows:

Y_i(j) = amount of time spent in state i during the jth visit to that state, i, j ≥ 0;
N_i(m) = number of visits to state i in the first m transitions of the semi-Markov process.

In terms of this notation we see that the proportion of time in i during the first m transitions, call it P_{i,m}, is as follows:

(4.8.1)  P_{i,m} = Σ_{j=1}^{N_i(m)} Y_i(j) / Σ_l Σ_{j=1}^{N_l(m)} Y_l(j)
       = [(N_i(m)/m) Σ_{j=1}^{N_i(m)} Y_i(j)/N_i(m)] / [Σ_l (N_l(m)/m) Σ_{j=1}^{N_l(m)} Y_l(j)/N_l(m)].

Now, since N_i(m) → ∞ as m → ∞, it follows from the strong law of large numbers that

Σ_{j=1}^{N_i(m)} Y_i(j) / N_i(m) → μ_i,

and, by the strong law for renewal processes, that

N_i(m)/m → (E[number of transitions between visits to i])^{−1} = π_i.

Hence, letting m → ∞ in (4.8.1) shows that

P_i = π_i μ_i / Σ_l π_l μ_l,

and the proof is complete.

From Theorem 4.8.3 it follows that the limiting probabilities depend only on the transition probabilities P_{ij} and the mean times μ_i, i, j ≥ 0.
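Theorem 4.8.3 translates directly into a few lines of code. A minimal sketch (the function name and test values are mine): given the embedded transition matrix and the mean sojourn times, it returns the limiting probabilities P_i = π_i μ_i / Σ_j π_j μ_j.

```python
import numpy as np

def semi_markov_limits(P, mu):
    """Limiting probabilities of a semi-Markov process (Theorem 4.8.3).
    P  -- embedded-chain transition matrix (irreducible, positive recurrent)
    mu -- mean time spent in each state per visit."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]   # stationary probs of {X_n}
    weights = pi * np.asarray(mu)
    return weights / weights.sum()

# Tiny usage example: a two-state alternating chain with mean sojourns 3 and 1.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(semi_markov_limits(P, [3.0, 1.0]))   # -> [0.75, 0.25]
```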
EXAMPLE 4.8(A) Consider a machine that can be in one of three states: good condition, fair condition, or broken down. Suppose that a machine in good condition will remain this way for a mean time μ_1 and will then go to either the fair condition or the broken condition with respective probabilities 3/4 and 1/4. A machine in the fair condition will remain that way for a mean time μ_2 and will then break down. A broken machine will be repaired, which takes a mean time μ_3, and when repaired will be in the good condition with probability 2/3 and the fair condition with probability 1/3. What proportion of time is the machine in each state?

Solution. Letting the states be 1, 2, 3, we have that the π_i satisfy

π_1 + π_2 + π_3 = 1,
π_1 = (2/3) π_3,
π_2 = (3/4) π_1 + (1/3) π_3,
π_3 = (1/4) π_1 + π_2.

The solution is

π_1 = 4/15,  π_2 = 1/3,  π_3 = 2/5.

Hence, P_i, the proportion of time the machine is in state i, is given by

P_1 = 4μ_1 / (4μ_1 + 5μ_2 + 6μ_3),  P_2 = 5μ_2 / (4μ_1 + 5μ_2 + 6μ_3),  P_3 = 6μ_3 / (4μ_1 + 5μ_2 + 6μ_3).

The problem of determining the limiting distribution of a semi-Markov process is not completely solved by deriving the P_i. For we may ask for the limit, as t → ∞, of the probability of being in state i at time t, of making the next transition after time t + x, and of this next transition being into state j. To express this probability let

Y(t) = time from t until the next transition,
S(t) = state entered at the first transition after t.

To compute

lim_{t→∞} P{Z(t) = i, Y(t) > x, S(t) = j},

we again use the theory of alternating renewal processes.

THEOREM 4.8.4 If the semi-Markov process is irreducible and not lattice, then

(4.8.2)  lim_{t→∞} P{Z(t) = i, Y(t) > x, S(t) = j | Z(0) = k} = P_{ij} ∫_x^∞ F̄_{ij}(y) dy / μ_{ii}.

Proof. Say that a cycle begins each time the process enters state i, and say that it is "on" if the state is i and it will remain i for at least the next x time units and the next state is j; say it is "off" otherwise. Thus we have an alternating renewal process. Conditioning on whether the state after i is j or not, we see that

E["on" time in a cycle] = P_{ij} E[(X_{ij} − x)^+],

where X_{ij} is a random variable having distribution F_{ij} and representing the time to make a transition from i to j, and y^+ = max(0, y). Hence,

E["on" time in cycle] = P_{ij} ∫_0^∞ P{X_{ij} − x > a} da = P_{ij} ∫_0^∞ F̄_{ij}(a + x) da = P_{ij} ∫_x^∞ F̄_{ij}(y) dy.

As E[cycle time] = μ_{ii}, the result follows from alternating renewal processes.

By the same technique (or by summing (4.8.2) over j) we can prove the following.

Corollary 4.8.5 If the semi-Markov process is irreducible and not lattice, then

(4.8.3)  lim_{t→∞} P{Z(t) = i, Y(t) > x | Z(0) = k} = ∫_x^∞ H̄_i(y) dy / μ_{ii}.

Remarks

(1) Of course the limiting probabilities in Theorem 4.8.4 and Corollary 4.8.5 also have interpretations as long-run proportions. For instance, the long-run proportion of time that the semi-Markov process is in state i, will spend the next x time units without a transition, and will then go to state j is given by Theorem 4.8.4.

(2) Multiplying and dividing (4.8.3) by μ_i, and using P_i = μ_i/μ_{ii}, gives

lim_{t→∞} P{Z(t) = i, Y(t) > x} = P_i H̄_{i,e}(x),

where H_{i,e} is the equilibrium distribution of H_i. Hence, the limiting probability of being in state i is P_i, and, given that the state at t is i, the time until transition (as t approaches ∞) has the equilibrium distribution of H_i.
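Corollary 4.8.2's time-proportion interpretation can be checked by simulation. The sketch below (illustrative only) simulates the machine of Example 4.8(A), assuming for concreteness that the sojourn times are exponential with the given means (only the means matter for the limiting fractions, so this assumption is harmless here), and compares the time fractions with P_i = π_i μ_i / Σ_j π_j μ_j.

```python
import random

def simulate_machine(mu, T, seed=3):
    """Simulate the semi-Markov machine of Example 4.8(A) for total time T.
    States: 0 = good, 1 = fair, 2 = broken.  Sojourns are drawn as
    exponentials with means mu[i] (an assumption made for the simulation)."""
    rng = random.Random(seed)
    P = [[0.0, 0.75, 0.25], [0.0, 0.0, 1.0], [2 / 3, 1 / 3, 0.0]]
    time_in = [0.0, 0.0, 0.0]
    state, clock = 0, 0.0
    while clock < T:
        stay = rng.expovariate(1.0 / mu[state])
        time_in[state] += min(stay, T - clock)
        clock += stay
        state = rng.choices([0, 1, 2], weights=P[state])[0]
    return [x / T for x in time_in]

mu = [10.0, 4.0, 2.0]                      # hypothetical mean times
denom = 4 * mu[0] + 5 * mu[1] + 6 * mu[2]
theory = [4 * mu[0] / denom, 5 * mu[1] / denom, 6 * mu[2] / denom]
print("theory:   ", [round(x, 4) for x in theory])
print("simulated:", [round(x, 4) for x in simulate_machine(mu, T=200_000)])
```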
PROBLEMS

4.1. A store that stocks a certain commodity uses the following (s, S) ordering policy: if its supply at the beginning of a time period is x, then it orders

0 if x ≥ s;  S − x if x < s.

The order is immediately filled. The daily demands are independent and equal j with probability α_j. All demands that cannot be immediately met are lost. Let X_n denote the inventory level at the end of the nth time period. Argue that {X_n, n ≥ 1} is a Markov chain and compute its transition probabilities.

4.2. For a Markov chain prove that

P{X_{n_k} = j | X_{n_1} = i_1, ..., X_{n_{k−1}} = i_{k−1}} = P{X_{n_k} = j | X_{n_{k−1}} = i_{k−1}}

whenever n_1 < n_2 < ⋯ < n_k.

4.3. Prove that if the number of states is n, and if state j is accessible from state i, then it is accessible in n or fewer steps.

4.4. Show that

P_{ij}^n = Σ_{k=0}^n f_{ij}^k P_{jj}^{n−k}.

4.5. For states i, j, k, k ≠ j, let

P_{ij/k}^n = P{X_n = j, X_l ≠ k, l = 1, ..., n − 1 | X_0 = i}.

(a) Explain in words what P_{ij/k}^n represents.
(b) Prove that, for i ≠ j,

P_{ij}^n = Σ_{k=0}^n f_{ii}^k P_{ij/i}^{n−k}.

4.6. Show that the symmetric random walk is recurrent in two dimensions and transient in three dimensions.

4.7. For the symmetric random walk starting at 0:
(a) What is the expected time to return to 0?
(b) Let N_n denote the number of returns by time n. Show that

E[N_{2n}] = (2n + 1) C(2n, n) (1/2)^{2n} − 1,

where C(a, b) denotes the binomial coefficient.
(c) Use (b) and Stirling's approximation to show that for n large E[N_n] is proportional to √n.

4.8. Let X_1, X_2, ... be independent random variables such that P{X_i = j} = α_j, j ≥ 0. Say that a record occurs at time n if X_n > max(X_1, ..., X_{n−1}), where X_0 = −∞, and if a record does occur at time n call X_n the record value. Let R_i denote the ith record value.
(a) Argue that {R_i, i ≥ 1} is a Markov chain and compute its transition probabilities.
(b) Let T_i denote the time between the ith and (i + 1)st record. Is {T_i, i ≥ 1} a Markov chain? What about {(R_i, T_i), i ≥ 1}? Compute transition probabilities where appropriate.
(c) Let S_n = Σ_{i=1}^n T_i, n ≥ 1. Argue that {S_n, n ≥ 1} is a Markov chain and find its transition probabilities.

4.9. For a Markov chain {X_n, n ≥ 0}, show that

P{X_k = i_k | X_j = i_j, j ≠ k} = P{X_k = i_k | X_{k−1} = i_{k−1}, X_{k+1} = i_{k+1}}.

4.10. At the beginning of every time period, each of N individuals is in one of three possible conditions: infectious, infected but not infectious, or noninfected. If a noninfected individual becomes infected during a time period, then he or she will be in an infectious condition during the following time period, and from then on will be in an infected (but not infectious) condition. During every time period each of the C(N, 2) pairs of individuals are independently in contact with probability p. If a pair is in contact and one of the members of the pair is infectious and the other is noninfected, then the noninfected person becomes infected (and is thus in the infectious condition at the beginning of the next period). Let X_n and Y_n denote the number of infectious and the number of noninfected individuals, respectively, at the beginning of time period n.
(a) If there are i infectious individuals at the beginning of a time period, what is the probability that a specified noninfected individual will become infected in that period?
(b) Is {X_n, n ≥ 0} a Markov chain? If so, give its transition probabilities.
(c) Is {Y_n, n ≥ 0} a Markov chain? If so, give its transition probabilities.
(d) Is {(X_n, Y_n), n ≥ 0} a Markov chain? If so, give its transition probabilities.

4.11. If f_{ii} < 1 and f_{jj} < 1, show that:

(a) Σ_{n=1}^∞ P_{ij}^n < ∞;  (b) f_{ij} = Σ_{n=1}^∞ P_{ij}^n / (1 + Σ_{n=1}^∞ P_{jj}^n).

4.12. A transition probability matrix P is said to be doubly stochastic if

Σ_i P_{ij} = 1 for all j.

That is, the column sums all equal 1. If a doubly stochastic chain has n states and is ergodic, calculate its limiting probabilities.

4.13. Show that positive and null recurrence are class properties.

4.14. Show that in a finite Markov chain there are no null recurrent states and not all states can be transient.

4.15. In the M/G/1 system (Example 4.3(A)) suppose that ρ < 1 and thus the stationary probabilities exist. Compute Π(s) = Σ_j π_j s^j and find, by taking the limit as s → 1, Σ_j j π_j.

4.16. An individual possesses r umbrellas, which she employs in going from her home to office and vice versa. If she is at home (the office) at the beginning (end) of a day and it is raining, then she will take an umbrella with her to the office (home), provided there is one to be taken. If it is not raining, then she never takes an umbrella. Assume that, independent of the past, it rains at the beginning (end) of a day with probability p.
(a) Define a Markov chain with r + 1 states that will help us determine the proportion of time that our individual gets wet. (Note: She gets wet if it is raining and all umbrellas are at her other location.)
(b) Compute the limiting probabilities.
(c) What fraction of time does the individual become wet?

4.17. Consider a positive recurrent irreducible periodic Markov chain and let π_j denote the long-run proportion of time in state j, j ≥ 0. Prove that the π_j, j ≥ 0, satisfy

π_j = Σ_i π_i P_{ij},  Σ_j π_j = 1.

4.18. Jobs arrive at a processing center in accordance with a Poisson process with rate λ. However, the center has waiting space for only N jobs, and so an arriving job finding N others waiting goes away. At most 1 job per day can be processed, and the processing of this job must start at the beginning of the day. Thus, if there are any jobs waiting for processing at the beginning of a day, then one of them is processed that day, and if no jobs are waiting at the beginning of a day, then no jobs are processed that day. Let X_n denote the number of jobs at the center at the beginning of day n.
(a) Find the transition probabilities of the Markov chain {X_n, n ≥ 0}.
(b) Is this chain ergodic? Explain.
(c) Write the equations for the stationary probabilities.

4.19. Let π_j, j ≥ 0, be the stationary probabilities for a specified Markov chain.
(a) Complete the following statement: π_i P_{ij} is the proportion of all transitions that ....
Let A denote a set of states and let A^c denote the remaining states.
(b) Finish the following statement: Σ_{j∈A^c} Σ_{i∈A} π_i P_{ij} is the proportion of all transitions that ....
(c) Let N_n(A, A^c) denote the number of the first n transitions that are from a state in A to one in A^c; similarly, let N_n(A^c, A) denote the number that are from a state in A^c to one in A. Argue that

|N_n(A, A^c) − N_n(A^c, A)| ≤ 1.

(d) Prove and interpret the following result:

Σ_{j∈A^c} Σ_{i∈A} π_i P_{ij} = Σ_{j∈A} Σ_{i∈A^c} π_i P_{ij}.

4.20. Consider a recurrent Markov chain starting in state 0. Let m_i denote the expected number of time periods it spends in state i before returning to 0. Use Wald's equation to show that m_0 = 1. Now give a second proof that assumes the chain is positive recurrent and relates the m_i to the stationary probabilities.

4.21. Consider a Markov chain with states 0, 1, 2, ... such that

P_{i,i+1} = p_i = 1 − P_{i,i−1},

where p_0 = 1. Find the necessary and sufficient condition on the p_i's for this chain to be positive recurrent, and compute the limiting probabilities in this case.

4.22. Compute the expected number of plays, starting in i, in the gambler's ruin problem, until the gambler reaches either 0 or N.

4.23. In the gambler's ruin problem show that

P{she wins the next gamble | present fortune is i, she eventually reaches N}
= p[1 − (q/p)^{i+1}] / [1 − (q/p)^i] if p ≠ 1/2;  = (i + 1)/(2i) if p = 1/2.

4.24. Let T = {1, ..., t} denote the transient states of a Markov chain, and let Q be, as in Section 4.4, the matrix of transition probabilities from states in T to states in T. Let m_{ij}(n) denote the expected amount of time spent in state j during the first n transitions given that the chain begins in state i, for i and j in T. Let M_n be the matrix whose element in row i, column j, is m_{ij}(n).
(a) Show that M_n = I + Q + Q² + ⋯ + Q^n.
(b) Show that M_n − I + Q^{n+1} = Q[I + Q + Q² + ⋯ + Q^n].
(c) Show that M_n = (I − Q)^{−1}(I − Q^{n+1}).

4.25. Consider the gambler's ruin problem with N = 6 and p = 0.7. Starting in state 3, determine:
(a) the expected number of visits to state 5;
(b) the expected number of visits to state 1.
(c) the expected number of visits to state 5 in the first 7 transitions;
(d) the probability of ever visiting state 1.

4.26. Consider the Markov chain with states 0, 1, ..., n and transition probabilities

P_{01} = 1,  P_{i,i+1} = p = 1 − P_{i,i−1}, 0 < i < n.

Let μ_{i,n} denote the mean time to go from state i to state n.
(a) Derive a set of linear equations for the μ_{i,n}.
(b) Let m_i denote the mean time to go from state i to state i + 1. Derive a set of equations for the m_i, i = 0, ..., n − 1, and show how they can be solved recursively, first for i = 0, then i = 1, and so on.
(c) What is the relation between μ_{i,n} and the m_i?

Starting at state 0, say that an excursion ends when the chain either returns to 0 or reaches state n. Let X_j denote the number of transitions in the jth excursion (that is, the one that begins at the jth return to 0), j ≥ 1.
(d) Find E[X_j]. (Hint: Relate it to the mean time of a gambler's ruin problem.)
(e) Let N denote the first excursion that ends in state n, and find E[N].
(f) Find μ_{0,n}.
(g) Find μ_{i,n}.

4.27. Consider a particle that moves along a set of m + 1 nodes, labeled 0, 1, ..., m, arranged around a circle. At each move it either goes one step in the clockwise direction with probability p or one step in the counterclockwise direction with probability 1 − p. It continues moving until all the nodes 1, 2, ..., m have been visited at least once. Starting at node 0, find the probability that node i is the last node visited, i = 1, ..., m.

4.28. In Problem 4.27, find the expected number of additional steps it takes to return to the initial position after all nodes have been visited.

4.29. Each day one of n possible elements is requested, the ith one with probability P_i, i ≥ 1, Σ_1^n P_i = 1. These elements are at all times arranged in an ordered list that is revised as follows: the element selected is moved to the front of the list, with the relative positions of all the other elements remaining unchanged. Define the state at any time to be the list ordering at that time.
(a) Argue that the above is a Markov chain.
(b) For any state i_1, ..., i_n (which is a permutation of 1, 2, ..., n), let π(i_1, ..., i_n) denote the limiting probability. Argue that

π(i_1, ..., i_n) = P_{i_1} (P_{i_2}/(1 − P_{i_1})) ⋯ (P_{i_{n−1}}/(1 − P_{i_1} − ⋯ − P_{i_{n−2}})).

4.30. Suppose that two independent sequences X_1, X_2, ... and Y_1, Y_2, ... are coming in from some laboratory and that they represent Bernoulli trials with unknown success probabilities P_1 and P_2. That is, P{X_i = 1} = 1 − P{X_i = 0} = P_1, P{Y_i = 1} = 1 − P{Y_i = 0} = P_2, and all random variables are independent. To decide whether P_1 > P_2 or P_2 > P_1, we use the following test: choose some positive integer M and stop at N, the first value of n such that either

X_1 + ⋯ + X_n − (Y_1 + ⋯ + Y_n) = M

or

X_1 + ⋯ + X_n − (Y_1 + ⋯ + Y_n) = −M.

In the former case we then assert that P_1 > P_2, and in the latter that P_2 > P_1. Show that when P_1 ≠ P_2, the probability of making an error (that is, of asserting that P_2 > P_1 when P_1 > P_2) is

P{error} = 1/(1 + λ^M),

and, also, that the expected number of pairs observed is

E[N] = M(λ^M − 1) / ((P_1 − P_2)(λ^M + 1)),

where

λ = P_1(1 − P_2) / (P_2(1 − P_1)).

(Hint: Relate this to the gambler's ruin problem.)

4.31. A spider hunting a fly moves between locations 1 and 2 according to a Markov chain with transition matrix

0.7  0.3
0.3  0.7

starting in location 1. The fly, unaware of the spider, starts in location 2 and moves according to a Markov chain with transition matrix

0.4  0.6
0.6  0.4

The spider catches the fly and the hunt ends whenever they meet in the same location.
Show that the progress of the hunt, except for knowing the location where it ends, can be described by a three-state Markov chain where one absorbing state represents hunt ended and the other two that the spider and fly are at different locations. Obtain the transition matrix for this chain.
(a) Find the probability that at time n the spider and fly are both at their initial locations.
(b) What is the average duration of the hunt?

4.32. Consider a simple random walk on the integer points in which at each step a particle moves one step in the positive direction with probability p, one step in the negative direction with probability p, and remains in the same place with probability q = 1 − 2p (0 < p < 1/2). Suppose an absorbing barrier is placed at the origin (that is, P_{00} = 1) and a reflecting barrier at N (that is, P_{N,N−1} = 1), and that the particle starts at n (0 < n < N). Show that the probability of absorption is 1, and find the mean number of steps.

4.33. Given that {X_n, n ≥ 0} is a branching process:
(a) Argue that either X_n converges to 0 or to infinity.
(b) Show that

Var(X_n | X_0 = 1) = σ²μ^{n−1}(μ^n − 1)/(μ − 1) if μ ≠ 1;  = nσ² if μ = 1,

where μ and σ² are the mean and variance of the number of offspring an individual has.

4.34. In a branching process the number of offspring per individual has a binomial distribution with parameters 2, p. Starting with a single individual, calculate:
(a) the extinction probability;
(b) the probability that the population becomes extinct for the first time in the third generation.
Suppose that, instead of starting with a single individual, the initial population size Z_0 is a random variable that is Poisson distributed with mean λ. Show that, in this case, the extinction probability is given, for p > 1/2, by

exp{λ(1 − 2p)/p²}.

4.35. Consider a branching process in which the number of offspring per individual has a Poisson distribution with mean λ, λ > 1. Let π_0 denote the probability that, starting with a single individual, the population eventually becomes extinct. Also, let α, α < 1, be such that

αe^{−α} = λe^{−λ}.

(a) Show that α = λπ_0.
(b) Show that, conditional on eventual extinction, the branching process follows the same probability law as the branching process in which the number of offspring per individual is Poisson with mean α.

4.36. For the Markov chain model of Section 4.6.1, namely,

P_{ij} = 1/(i − 1), j = 1, ..., i − 1, i > 1,

suppose that the initial state is N = C(n, m), where n > m. Show that when n, m, and n − m are large, the number of steps to reach 1 from state N has approximately a Poisson distribution with mean

m[c log(c/(c − 1)) + log(c − 1)],

where c = n/m. (Hint: Use Stirling's approximation.)

4.37. For any infinite sequence x_1, x_2, ..., we say that a new long run begins each time the sequence changes direction. That is, if the sequence starts 5, 2, 4, 5, 6, 9, 3, 4, then there are three long runs, namely, (5, 2), (4, 5, 6, 9), and (3, 4). Let X_1, X_2, ... be independent uniform (0, 1) random variables and let I_n denote the initial value of the nth long run. Argue that {I_n, n ≥ 1} is a Markov chain having a continuous state space, and determine its transition probability density.

4.38. Suppose in Example 4.7(B) that if the Markov chain is in state i and the random variable distributed according to q_i takes on the value j, then the next state is set equal to j with probability a_j/(a_j + a_i) and equal to i otherwise. Show that the limiting probabilities for this chain are

π_i = a_i / Σ_j a_j.
4.39. Find the transition probabilities for the Markov chain of Example 4.3(D) and show that it is time reversible.

4.40. Let {X_n, n ≥ 0} be a Markov chain with stationary probabilities π_j, j ≥ 0. Suppose that X_0 = 0 and define T = min{n : n > 0 and X_n = 0}. Let Y_j = X_{T−j}, j = 0, 1, ..., T. Argue that {Y_j, j = 0, ..., T} is distributed as the states of the reverse Markov chain (with transition probabilities P*_{ij} = π_j P_{ji}/π_i) starting in state 0 until it returns to 0.

4.41. A particle moves among n locations that are arranged in a circle (with the neighbors of location n being n − 1 and 1). At each step, it moves one position either in the clockwise direction with probability p or in the counterclockwise direction with probability 1 − p.
(a) Find the transition probabilities of the reverse chain.
(b) Is the chain time reversible?

4.42. Consider the Markov chain with states 0, 1, ..., n and with transition probabilities

P_{01} = P_{n,n−1} = 1,  P_{i,i+1} = p_i = 1 − P_{i,i−1}, i = 1, ..., n − 1.

Show that this Markov chain is of the type considered in Proposition 4.7.1 and find its stationary probabilities.

4.43. Consider the list model presented in Example 4.7(D). Under the one-closer rule show, by using time reversibility, that the limiting probability that element j precedes element i (call it P{j precedes i}) is such that

P{j precedes i} > P_j/(P_j + P_i) when P_j > P_i.

4.44. Consider a time-reversible Markov chain with transition probabilities P_{ij} and limiting probabilities π_i, and now consider the same chain truncated to the states 0, 1, ..., M. That is, for the truncated chain its transition probabilities P̄_{ij} are

P̄_{ij} = P_{ii} + Σ_{k>M} P_{ik} if 0 ≤ i ≤ M, j = i;
      = P_{ij} if 0 ≤ i ≠ j ≤ M;
      = 0 otherwise.

Show that the truncated chain is also time reversible and has limiting probabilities given by

π̄_i = π_i / Σ_{j=0}^M π_j.

4.45. Show that a finite state, ergodic Markov chain such that P_{ij} > 0 for all i ≠ j is time reversible if, and only if, for all i, j, k,

P_{ij} P_{jk} P_{ki} = P_{ik} P_{kj} P_{ji}.

4.46. Let {X_n, n ≥ 1} denote an irreducible Markov chain having a countable state space. Now consider a new stochastic process {Y_n, n ≥ 0} that only accepts values of the Markov chain that are between 0 and N. That is, we define Y_n to be the nth value of the Markov chain that is between 0 and N. For instance, if N = 3 and X_1 = 1, X_2 = 3, X_3 = 5, X_4 = 6, X_5 = 2, then Y_1 = 1, Y_2 = 3, Y_3 = 2.
(a) Is {Y_n, n ≥ 0} a Markov chain? Explain briefly.
(b) Let π_j denote the proportion of time that {X_n, n ≥ 1} is in state j. If π_j > 0 for all j, what proportion of time is {Y_n, n ≥ 0} in each of the states 0, 1, ..., N?
(c) Suppose {X_n} is null recurrent and let π_i(N), i = 0, 1, ..., N, denote the long-run proportions for {Y_n, n ≥ 0}. Show that

π_j(N) = π_i(N) E[time the X process spends in j between returns to i], j ≠ i.

(d) Use (c) to argue that in a symmetric random walk the expected number of visits to state i before returning to the origin equals 1.
(e) If {X_n, n ≥ 0} is time reversible, show that {Y_n, n ≥ 0} is also.

4.47. M balls are initially distributed among m urns. At each stage one of the balls is selected at random, taken from whichever urn it is in, and placed, at random, in one of the other m − 1 urns. Consider the Markov chain whose state at any time is the vector (n_1, ..., n_m), where n_i denotes the number of balls in urn i. Guess at the limiting probabilities for this Markov chain and then verify your guess and show at the same time that the Markov chain is time reversible.

4.48. For an ergodic semi-Markov process:
(a) Compute the rate at which the process makes a transition from i into j.
(b) Show that

Σ_i P_{ij}/μ_{ii} = 1/μ_{jj}.

(c) Show that the proportion of time that the process is in state i and headed for state j is P_{ij} η_{ij}/μ_{ii}, where η_{ij} = ∫_0^∞ F̄_{ij}(t) dt.
(d) Show that the proportion of time that the state is i and will next be j within a time x is

(P_{ij} η_{ij}/μ_{ii}) F^e_{ij}(x),

where F^e_{ij} is the equilibrium distribution of F_{ij}.

4.49. For an ergodic semi-Markov process derive an expression, as t → ∞, for the limiting conditional probability that the next state visited after t is state j, given X(t) = i.

4.50. A taxi alternates between three locations. When it reaches location 1 it is equally likely to go next to either 2 or 3. When it reaches 2 it will next go to 1 with probability 1/3 and to 3 with probability 2/3. From 3 it always goes to 1. The mean times between locations i and j are t_{12} = 20, t_{13} = 30, t_{23} = 30 (t_{ij} = t_{ji}).
(a) What is the (limiting) probability that the taxi's most recent stop was at location i, i = 1, 2, 3?
(b) What is the (limiting) probability that the taxi is heading for location 2?
(c) What fraction of time is the taxi traveling from 2 to 3?
Note: Upon arrival at a location the taxi immediately departs.

REFERENCES

References 1, 2, 3, 4, 6, 9, 10, 11, and 12 give alternate treatments of Markov chains. Reference 5 is a standard text on branching processes. References 7 and 8 have nice treatments of time reversibility.

1. N. Bhat, Elements of Applied Stochastic Processes, Wiley, New York, 1984.
2. C. L. Chiang, An Introduction to Stochastic Processes and Their Applications, Krieger, 1980.
3. E. Cinlar, Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, NJ, 1975.
4. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes, Methuen, London, 1965.
5. T. Harris, The Theory of Branching Processes, Springer-Verlag, Berlin, 1963.
6. S. Karlin and H. Taylor, A First Course in Stochastic Processes, 2nd ed., Academic Press, Orlando, FL, 1975.
7. J. Keilson, Markov Chain Models - Rarity and Exponentiality, Springer-Verlag, Berlin, 1979.
8. F. Kelly, Reversibility and Stochastic Networks, Wiley, Chichester, England, 1979.
9. J. Kemeny, L. Snell, and A. Knapp, Denumerable Markov Chains, Van Nostrand, Princeton, NJ, 1966.
10. S. Resnick, Adventures in Stochastic Processes, Birkhauser, Boston, MA, 1992.
11. S. M. Ross, Introduction to Probability Models, 5th ed., Academic Press, Orlando, FL, 1993.
12. H. C. Tijms, Stochastic Models: An Algorithmic Approach, Wiley, Chichester, England, 1994.

CHAPTER 5

Continuous-Time Markov Chains

5.1 INTRODUCTION

In this chapter we consider the continuous-time analogs of discrete-time Markov chains. As in the case of their discrete-time analogs, they are characterized by the Markovian property that, given the present state, the future is independent of the past.

In Section 5.2 we define continuous-time Markov chains and relate them to the discrete-time Markov chains of Chapter 4. In Section 5.3 we introduce an important class of continuous-time Markov chains known as birth and death processes. These processes can be used to model populations whose size changes at any time by a single unit. In Section 5.4 we derive two sets of differential equations (the forward and backward equations) that describe the probability laws for the system. The material in Section 5.5 is concerned with determining the limiting (or long-run) probabilities connected with a continuous-time Markov chain. In Section 5.6 we consider the topic of time reversibility.
Among other things, we show that all birth and death processes are time reversible, and then illustrate the importance of this observation to queueing systems. Applications of time reversibility to stochastic population models are also presented in this section. In Section 5.7 we illustrate the importance of the reversed chain, even when the process is not time reversible, by using it to study queueing network models, to derive the Erlang loss formula, and to analyze the shared processor system. In Section 5.8 we show how to "uniformize" Markov chains, a technique useful for numerical computations.

5.2 CONTINUOUS-TIME MARKOV CHAINS

Consider a continuous-time stochastic process {X(t), t ≥ 0} taking on values in the set of nonnegative integers. In analogy with the definition of a discrete-time Markov chain, given in Chapter 4, we say that the process {X(t), t ≥ 0} is a continuous-time Markov chain if, for all s, t ≥ 0 and nonnegative integers i, j, x(u), 0 ≤ u < s,

P{X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s} = P{X(t + s) = j | X(s) = i}.

In other words, a continuous-time Markov chain is a stochastic process having the Markovian property that the conditional distribution of the future state at time t + s, given the present state at s and all past states, depends only on the present state and is independent of the past. If, in addition, P{X(t + s) = j | X(s) = i} is independent of s, then the continuous-time Markov chain is said to have stationary or homogeneous transition probabilities. All Markov chains we consider will be assumed to have stationary transition probabilities.

Suppose that a continuous-time Markov chain enters state i at some time, say time 0, and suppose that the process does not leave state i (that is, a transition does not occur) during the next s time units. What is the probability that the process will not leave state i during the following t time units? To answer this, note that as the process is in state i at time s, it follows, by the Markovian property, that the probability it remains in that state during the interval [s, s + t] is just the (unconditional) probability that it stays in state i for at least t time units. That is, if we let T_i denote the amount of time that the process stays in state i before making a transition into a different state, then

P{T_i > s + t | T_i > s} = P{T_i > t}

for all s, t ≥ 0. Hence, the random variable T_i is memoryless and must thus be exponentially distributed.

In fact, the above gives us a way of constructing a continuous-time Markov chain. Namely, it is a stochastic process having the properties that each time it enters state i:

(i) the amount of time it spends in that state before making a transition into a different state is exponentially distributed with rate, say, v_i; and
(ii) when the process leaves state i, it will next enter state j with some probability, call it P_{ij}, where P_{ii} = 0 and Σ_j P_{ij} = 1.

A state i for which v_i = ∞ is called an instantaneous state, since when entered it is instantaneously left. Whereas such states are theoretically possible, we shall assume throughout that 0 ≤ v_i < ∞ for all i. (If v_i = 0, then state i is called absorbing, since once entered it is never left.)
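The two-property description is exactly how one simulates such a chain. A minimal sketch in Python (illustrative only; the rates and names are mine, and all v_i are assumed positive): hold in state i an exponential time with rate v_i, then jump according to the embedded transition probabilities P_{ij}.

```python
import random

def simulate_ctmc(v, P, t_end, start=0, seed=5):
    """Simulate a continuous-time Markov chain from its holding rates v_i and
    embedded jump matrix P_ij; returns the fraction of time spent in each state."""
    rng = random.Random(seed)
    n = len(v)
    time_in = [0.0] * n
    state, t = start, 0.0
    while t < t_end:
        hold = rng.expovariate(v[state])          # exponential sojourn, rate v_i
        time_in[state] += min(hold, t_end - t)
        t += hold
        state = rng.choices(range(n), weights=P[state])[0]
    return [x / t_end for x in time_in]

# A three-state example with made-up rates.
v = [1.0, 2.0, 3.0]
P = [[0.0, 0.5, 0.5],
     [0.6, 0.0, 0.4],
     [1.0, 0.0, 0.0]]
print(simulate_ctmc(v, P, t_end=100_000))
```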
Hence, for our purposes a continuous-time Markov chain is a stochastic process that moves from state to state in accordance with a (discrete-time) Markov chain, but is such that the amount of time it spends in each state, before proceeding to the next state, is exponentially distributed. In addition, the amount of time the process spends in state i, and the next state visited, must be independent random variables. For if the next state visited were dependent on T_i, then information as to how long the process has already been in state i would be relevant to the prediction of the next state, and this would contradict the Markovian assumption.

A continuous-time Markov chain is said to be regular if, with probability 1, the number of transitions in any finite length of time is finite. An example of a nonregular Markov chain is the one having

P_{i,i+1} = 1,  v_i = i².

It can be shown that this Markov chain (which always goes from state i to i + 1, spending an exponentially distributed amount of time with mean 1/i² in state i) will, with positive probability, make an infinite number of transitions in any time interval of length t, t > 0. We shall, however, assume from now on that all Markov chains considered are regular (some sufficient conditions for regularity are given in the Problems section).

Let q_{ij} be defined by

q_{ij} = v_i P_{ij}, all i ≠ j.

Since v_i is the rate at which the process leaves state i and P_{ij} is the probability that it then goes to j, it follows that q_{ij} is the rate, when in state i, at which the process makes a transition into state j; and in fact we call q_{ij} the transition rate from i to j.

Let us denote by P_{ij}(t) the probability that a Markov chain, presently in state i, will be in state j after an additional time t. That is,

P_{ij}(t) = P{X(t + s) = j | X(s) = i}.

5.3 BIRTH AND DEATH PROCESSES

A continuous-time Markov chain with states 0, 1, ..., for which q_{ij} = 0 whenever |i − j| > 1, is called a birth and death process. Thus a birth and death process is a continuous-time Markov chain with states 0, 1, ..., for which transitions from state i can only go to either state i − 1 or state i + 1. The state of the process is usually thought of as representing the size of some population, and when the state increases by 1 we say that a birth occurs, and when it decreases by 1 we say that a death occurs. Let λ_i and μ_i be given by

λ_i = q_{i,i+1},  μ_i = q_{i,i−1}.

The values {λ_i, i ≥ 0} and {μ_i, i ≥ 1} are called, respectively, the birth rates and the death rates. Since Σ_j q_{ij} = v_i, we see that

v_i = λ_i + μ_i.

Hence, we can think of a birth and death process by supposing that whenever there are i people in the system, the time until the next birth is exponential with rate λ_i and is independent of the time until the next death, which is exponential with rate μ_i.

EXAMPLE 5.3(A) Two Birth and Death Processes.

(i) The M/M/s Queue. Suppose that customers arrive at an s-server service station in accordance with a Poisson process having rate λ. That is, the times between successive arrivals are independent exponential random variables having mean 1/λ. Each customer, upon arrival, goes directly into service if any of the servers are free, and if not, then the customer joins the queue (that is, he waits in line). When a server finishes serving a customer, the customer leaves the system, and the next customer in line, if there are any waiting, enters service.
The successive service times are assumed to be independent exponential random variables having mean 1/μ. If we let X(t) denote the number in the system at time t, then {X(t), t ≥ 0} is a birth and death process with

μ_n = nμ, 1 ≤ n ≤ s;  μ_n = sμ, n > s;  λ_n = λ, n ≥ 0.

(ii) A Linear Growth Model with Immigration. A model in which

μ_n = nμ, n ≥ 1,  λ_n = nλ + θ, n ≥ 0,

is called a linear growth process with immigration. Such processes occur naturally in the study of biological reproduction and population growth. Each individual in the population is assumed to give birth at an exponential rate λ; in addition, there is an exponential rate of increase θ of the population due to an external source such as immigration. Hence, the total birth rate when there are n persons in the system is nλ + θ. Deaths are assumed to occur at an exponential rate μ for each member of the population, and hence μ_n = nμ.

A birth and death process is said to be a pure birth process if μ_n = 0 for all n (that is, if death is impossible). The simplest example of a pure birth process is the Poisson process, which has a constant birth rate λ_n = λ, n ≥ 0.

A second example of a pure birth process results from a population in which each member acts independently and gives birth at an exponential rate λ. If we suppose that no one ever dies, then, with X(t) representing the population size at time t, {X(t), t ≥ 0} is a pure birth process with

λ_n = nλ, n ≥ 0.

This pure birth process is called a Yule process.

Consider a Yule process starting with a single individual at time 0, and let T_i, i ≥ 1, denote the time between the (i − 1)st and ith birth. That is, T_i is the time it takes for the population size to go from i to i + 1. It easily follows from the definition of a Yule process that the T_i, i ≥ 1, are independent and T_i is exponential with rate iλ. Now

P{T_1 ≤ t} = 1 − e^{−λt},

P{T_1 + T_2 ≤ t} = ∫_0^t P{T_1 + T_2 ≤ t | T_1 = x} λe^{−λx} dx
               = ∫_0^t (1 − e^{−2λ(t−x)}) λe^{−λx} dx = (1 − e^{−λt})²,

P{T_1 + T_2 + T_3 ≤ t} = ∫_0^t P{T_1 + T_2 + T_3 ≤ t | T_1 + T_2 = x} dF_{T_1+T_2}(x)
                     = ∫_0^t (1 − e^{−3λ(t−x)}) 2λe^{−λx}(1 − e^{−λx}) dx = (1 − e^{−λt})³,

and, in general, we can show by induction that

P{T_1 + ⋯ + T_j ≤ t} = (1 − e^{−λt})^j.

Hence, as P{T_1 + ⋯ + T_j ≤ t} = P{X(t) ≥ j + 1 | X(0) = 1}, we see that, for a Yule process,

P_{1j}(t) = (1 − e^{−λt})^{j−1} − (1 − e^{−λt})^j = e^{−λt}(1 − e^{−λt})^{j−1}, j ≥ 1.

Thus, we see from the above that, starting with a single individual, the population size at time t will have a geometric distribution with mean e^{λt}. Hence, if the population starts with i individuals, it follows that its size at t will be the sum of i independent and identically distributed geometric random variables, and will thus have a negative binomial distribution. That is, for the Yule process,

P_{ij}(t) = C(j − 1, i − 1) e^{−iλt}(1 − e^{−λt})^{j−i}, j ≥ i ≥ 1,

where C(a, b) denotes the binomial coefficient.
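The geometric form of P_{1j}(t) is easy to confirm by simulation. A sketch (illustrative; the parameter values are arbitrary): generate the birth times of a Yule process as independent exponentials with rates λ, 2λ, 3λ, ..., and compare the empirical mean of X(t) with e^{λt} and the empirical P{X(t) = 1} with e^{−λt}.

```python
import math
import random

def yule_size(lam, t, rng):
    """Population size at time t of a Yule process with X(0) = 1: the time to
    go from size i to i + 1 is exponential with rate i * lam."""
    size, clock = 1, 0.0
    while True:
        clock += rng.expovariate(size * lam)
        if clock > t:
            return size
        size += 1

lam, t, sims = 1.0, 2.0, 100_000
rng = random.Random(11)
samples = [yule_size(lam, t, rng) for _ in range(sims)]
mean = sum(samples) / sims
p1 = samples.count(1) / sims
print("mean:", round(mean, 2), "vs e^{lam t} =", round(math.exp(lam * t), 2))
print("P{X(t)=1}:", round(p1, 4), "vs e^{-lam t} =", round(math.exp(-lam * t), 4))
```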
Another interesting result about the Yule process, starting with a single individual, concerns the conditional distribution of the times of birth given the population size at t. Since the ith birth occurs at time S_i = T_1 + ⋯ + T_i, let us compute the conditional joint distribution of S_1, ..., S_n given that X(t) = n + 1. Reasoning heuristically and treating densities as if they were probabilities yields that, for 0 ≤ s_1 ≤ s_2 ≤ ⋯ ≤ s_n < t,

f(s_1, ..., s_n | n + 1) = P{T_1 = s_1, T_2 = s_2 − s_1, ..., T_n = s_n − s_{n−1}, T_{n+1} > t − s_n} / P{X(t) = n + 1}
                       = C Π_{i=1}^n λe^{−λ(t−s_i)},

where C is some constant that does not depend on s_1, ..., s_n. Hence, we see that the conditional density of S_1, ..., S_n given that X(t) = n + 1 is given by

(5.3.1)  f(s_1, ..., s_n | n + 1) = n! Π_{i=1}^n f(s_i), 0 < s_1 < ⋯ < s_n < t,

where f is the density function

(5.3.2)  f(x) = λe^{−λ(t−x)} / (1 − e^{−λt}) if 0 < x < t;  = 0 otherwise.

But since (5.3.1) is the joint density function of the order statistics of a sample of n random variables from the density f (see Section 2.3 of Chapter 2), we have thus proven the following.

PROPOSITION 5.3.1 Consider a Yule process with X(0) = 1. Then, given that X(t) = n + 1, the birth times S_1, ..., S_n are distributed as the ordered values of a sample of size n from a population having density (5.3.2).

Proposition 5.3.1 can be used to establish results about the Yule process in the same way that the corresponding result for Poisson processes is used.

EXAMPLE 5.3(B) Consider a Yule process with X(0) = 1. Let us compute the expected sum of the ages of the members of the population at time t. The sum of the ages at time t, call it A(t), can be expressed as

A(t) = a_0 + t + Σ_{i=1}^{X(t)−1} (t − S_i),

where a_0 is the age at t = 0 of the initial individual. To compute E[A(t)], condition on X(t):

E[A(t) | X(t) = n + 1] = a_0 + t + n ∫_0^t (t − x) λe^{−λ(t−x)} / (1 − e^{−λt}) dx,

or

E[A(t) | X(t)] = a_0 + t + (X(t) − 1)(1 − e^{−λt} − λte^{−λt}) / (λ(1 − e^{−λt})).

Taking expectations and using the fact that X(t) has mean e^{λt} yields

E[A(t)] = a_0 + t + (e^{λt} − 1 − λt)/λ = a_0 + (e^{λt} − 1)/λ.

Other quantities related to A(t), such as its generating function, can be computed by the same method. The above formula for E[A(t)] can be checked by making use of the following identity, whose proof is left as an exercise:

(5.3.3)  A(t) = a_0 + ∫_0^t X(s) ds.

Taking expectations gives

E[A(t)] = a_0 + E[∫_0^t X(s) ds]
       = a_0 + ∫_0^t E[X(s)] ds (since X(s) ≥ 0)
       = a_0 + ∫_0^t e^{λs} ds
       = a_0 + (e^{λt} − 1)/λ.

The following example provides another illustration of a pure birth process.

EXAMPLE 5.3(C) A Simple Epidemic Model. Consider a population of m individuals that at time 0 consists of one "infected" and m − 1 "susceptibles." Once infected, an individual remains in that state forever, and we suppose that in any time interval h any given infected person will cause, with probability αh + o(h), any given susceptible to become infected. If we let X(t) denote the number of infected individuals in the population at time t, then {X(t), t ≥ 0} is a pure birth process with

λ_n = (m − n)nα, n = 1, ..., m − 1;  = 0 otherwise.

The above follows since, when there are n infected individuals, each of the m − n susceptibles becomes infected at rate nα. If we let T denote the time until the total population is infected, then T can be represented as

T = Σ_{i=1}^{m−1} T_i,

where T_i is the time to go from i infectives to i + 1 infectives. As the T_i are independent exponential random variables with respective rates λ_i = (m − i)iα, i = 1, ..., m − 1, we see that

E[T] = (1/α) Σ_{i=1}^{m−1} 1/(i(m − i)),
Var(T) = (1/α²) Σ_{i=1}^{m−1} (1/(i(m − i)))².

For reasonably sized populations, E[T] can be approximated as follows:

E[T] = (1/(mα)) Σ_{i=1}^{m−1} (1/i + 1/(m − i)) ≈ (1/(mα)) ∫_1^{m−1} (1/t + 1/(m − t)) dt = (2/(mα)) log(m − 1).
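The quality of the logarithmic approximation is easy to gauge numerically. A short sketch (the parameter values are arbitrary): compare the exact sum for E[T] with 2 log(m − 1)/(mα).

```python
import math

def epidemic_mean_time(m, alpha):
    """Exact E[T] = (1/alpha) * sum of 1/(i(m-i)) for the simple epidemic model."""
    return sum(1.0 / (i * (m - i)) for i in range(1, m)) / alpha

alpha = 0.01
for m in (10, 100, 1000):
    exact = epidemic_mean_time(m, alpha)
    approx = 2 * math.log(m - 1) / (m * alpha)
    print(m, round(exact, 3), round(approx, 3))
```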
5.4 THE KOLMOGOROV DIFFERENTIAL EQUATIONS

Recall that

P_{ij}(t) = P{X(t + s) = j | X(s) = i}

represents the probability that a process presently in state i will be in state j a time t later. By exploiting the Markovian property, we will derive two sets of differential equations for P_{ij}(t), which may sometimes be explicitly solved. However, before doing so we need the following lemmas.

Lemma 5.4.1

(i) lim_{t→0} (1 − P_{ii}(t))/t = v_i;
(ii) lim_{t→0} P_{ij}(t)/t = q_{ij}, i ≠ j.

Lemma 5.4.2 For all s, t,

P_{ij}(t + s) = Σ_{k=0}^∞ P_{ik}(t)P_{kj}(s).

Lemma 5.4.1 follows from the fact (which must be proven) that the probability of two or more transitions in time t is o(t); whereas Lemma 5.4.2, which is the continuous-time version of the Chapman-Kolmogorov equations of discrete-time Markov chains, follows directly from the Markovian property. The details of the proofs are left as exercises.

From Lemma 5.4.2 we obtain

P_{ij}(t + h) = Σ_k P_{ik}(h)P_{kj}(t),

or, equivalently,

P_{ij}(t + h) − P_{ij}(t) = Σ_{k≠i} P_{ik}(h)P_{kj}(t) − [1 − P_{ii}(h)]P_{ij}(t).

Dividing by h and then taking the limit as h → 0 yields, upon application of Lemma 5.4.1,

(5.4.1)  lim_{h→0} (P_{ij}(t + h) − P_{ij}(t))/h = lim_{h→0} Σ_{k≠i} (P_{ik}(h)/h) P_{kj}(t) − v_i P_{ij}(t).

Assuming that we can interchange the limit and summation on the right-hand side of (5.4.1), we thus obtain, again using Lemma 5.4.1, the following.

THEOREM 5.4.3 (Kolmogorov's Backward Equations). For all i, j, and t ≥ 0,

P'_{ij}(t) = Σ_{k≠i} q_{ik}P_{kj}(t) − v_i P_{ij}(t).

Proof. To complete the proof we must justify the interchange of limit and summation on the right-hand side of (5.4.1). Now, for any fixed N,

liminf_{h→0} Σ_{k≠i} (P_{ik}(h)/h) P_{kj}(t) ≥ liminf_{h→0} Σ_{k≠i, k<N} (P_{ik}(h)/h) P_{kj}(t) = Σ_{k≠i, k<N} q_{ik}P_{kj}(t).

Since the above holds for all N, we see that

(5.4.2)  liminf_{h→0} Σ_{k≠i} (P_{ik}(h)/h) P_{kj}(t) ≥ Σ_{k≠i} q_{ik}P_{kj}(t).

To reverse the inequality, note that for N > i, since P_{kj}(t) ≤ 1,

limsup_{h→0} Σ_{k≠i} (P_{ik}(h)/h) P_{kj}(t) ≤ limsup_{h→0} [Σ_{k≠i, k<N} (P_{ik}(h)/h) P_{kj}(t) + Σ_{k≥N} P_{ik}(h)/h]
  = limsup_{h→0} [Σ_{k≠i, k<N} (P_{ik}(h)/h) P_{kj}(t) + (1 − P_{ii}(h))/h − Σ_{k≠i, k<N} P_{ik}(h)/h]
  = Σ_{k≠i, k<N} q_{ik}P_{kj}(t) + v_i − Σ_{k≠i, k<N} q_{ik},

where the last equality follows from Lemma 5.4.1. As the above inequality is true for all N > i, we obtain, upon letting N → ∞ and using the fact that Σ_{k≠i} q_{ik} = v_i,

limsup_{h→0} Σ_{k≠i} (P_{ik}(h)/h) P_{kj}(t) ≤ Σ_{k≠i} q_{ik}P_{kj}(t).

The above, combined with (5.4.2), shows that

lim_{h→0} Σ_{k≠i} (P_{ik}(h)/h) P_{kj}(t) = Σ_{k≠i} q_{ik}P_{kj}(t),

which completes the proof of Theorem 5.4.3.

The set of differential equations for P_{ij}(t) given in Theorem 5.4.3 are known as the Kolmogorov backward equations. They are called the backward equations because in computing the probability distribution of the state at time t + h we conditioned on the state (all the way) back at time h. That is, we started our calculation with

P_{ij}(t + h) = Σ_k P{X(t + h) = j | X(0) = i, X(h) = k} P{X(h) = k | X(0) = i}.

We may derive another set of equations, known as the Kolmogorov forward equations, by now conditioning on the state at time t. This yields

P_{ij}(t + h) = Σ_k P_{ik}(t)P_{kj}(h),

or

P_{ij}(t + h) − P_{ij}(t) = Σ_k P_{ik}(t)P_{kj}(h) − P_{ij}(t) = Σ_{k≠j} P_{ik}(t)P_{kj}(h) − [1 − P_{jj}(h)]P_{ij}(t).

Therefore,

lim_{h→0} (P_{ij}(t + h) − P_{ij}(t))/h = lim_{h→0} [Σ_{k≠j} P_{ik}(t)(P_{kj}(h)/h) − ((1 − P_{jj}(h))/h) P_{ij}(t)].

Assuming that we can interchange limit with summation, we obtain by Lemma 5.4.1 that

P'_{ij}(t) = Σ_{k≠j} q_{kj}P_{ik}(t) − v_j P_{ij}(t).

Unfortunately, we cannot always justify the interchange of limit and summation, and thus the above is not always valid. However, it does hold in most models, including all birth and death processes and all finite-state models. We thus have the following.

THEOREM 5.4.4 (Kolmogorov's Forward Equations). Under suitable regularity conditions,

P'_{ij}(t) = Σ_{k≠j} q_{kj}P_{ik}(t) − v_j P_{ij}(t).
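When no closed form is available, the forward equations can be integrated numerically. A sketch using scipy (illustrative; the birth and death rates are arbitrary): write the forward equations for the row P_{0j}(t) as p'(t) = p(t)R, where R holds the transition rates, and integrate.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Rate matrix R of a small birth-death chain on {0,1,2,3}:
# off-diagonal entries q_ij, diagonal entries -v_i.
lam = [1.0, 1.0, 1.0]          # birth rates lambda_0, lambda_1, lambda_2
mu = [2.0, 2.0, 2.0]           # death rates mu_1, mu_2, mu_3
n = 4
R = np.zeros((n, n))
for i in range(n - 1):
    R[i, i + 1] = lam[i]
    R[i + 1, i] = mu[i]
R -= np.diag(R.sum(axis=1))    # diagonal = -v_i

def forward(t, p):             # Kolmogorov forward equations: p'(t) = p(t) R
    return p @ R

t = 2.0
sol = solve_ivp(forward, (0.0, t), y0=np.eye(n)[0], rtol=1e-10, atol=1e-12)
print("P_0j(t):", np.round(sol.y[:, -1], 5), " row sum:", sol.y[:, -1].sum())
```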
EXAMPLE 5.4(A) The Two-State Chain. Consider a two-state continuous-time Markov chain that spends an exponential time with rate λ in state 0 before going to state 1, where it spends an exponential time with rate μ before returning to state 0. The forward equations yield

P'_{00}(t) = μP_{01}(t) − λP_{00}(t) = −(λ + μ)P_{00}(t) + μ,

where the last equation follows from P_{01}(t) = 1 − P_{00}(t). Hence,

e^{(λ+μ)t}[P'_{00}(t) + (λ + μ)P_{00}(t)] = μe^{(λ+μ)t},

or

d/dt [e^{(λ+μ)t}P_{00}(t)] = μe^{(λ+μ)t}.

Thus,

e^{(λ+μ)t}P_{00}(t) = (μ/(λ + μ))e^{(λ+μ)t} + c.

Since P_{00}(0) = 1, we see that c = λ/(λ + μ), and thus

P_{00}(t) = μ/(λ + μ) + (λ/(λ + μ))e^{−(λ+μ)t}.

Similarly (or by symmetry),

P_{11}(t) = λ/(λ + μ) + (μ/(λ + μ))e^{−(λ+μ)t}.

EXAMPLE 5.4(B) The Kolmogorov forward equations for the birth and death process are

P'_{i0}(t) = μ_1 P_{i1}(t) − λ_0 P_{i0}(t),
P'_{ij}(t) = λ_{j−1}P_{i,j−1}(t) + μ_{j+1}P_{i,j+1}(t) − (λ_j + μ_j)P_{ij}(t), j ≠ 0.

EXAMPLE 5.4(C) For a pure birth process, the forward equations reduce to

(5.4.3)  P'_{ii}(t) = −λ_i P_{ii}(t),
        P'_{ij}(t) = λ_{j−1}P_{i,j−1}(t) − λ_j P_{ij}(t), j > i.

Integrating the top equation of (5.4.3) and then using P_{ii}(0) = 1 yields

P_{ii}(t) = e^{−λ_i t}.

The above is of course true, as P_{ii}(t) is the probability that the time until a transition from state i is greater than t. The other quantities P_{ij}(t), j > i, can be obtained recursively from (5.4.3) as follows. From (5.4.3) we have, for j > i,

e^{λ_j t}[P'_{ij}(t) + λ_j P_{ij}(t)] = e^{λ_j t}λ_{j−1}P_{i,j−1}(t),

or

d/dt [e^{λ_j t}P_{ij}(t)] = λ_{j−1}e^{λ_j t}P_{i,j−1}(t).

Integration, using P_{ij}(0) = 0, yields

P_{ij}(t) = λ_{j−1}e^{−λ_j t} ∫_0^t e^{λ_j s}P_{i,j−1}(s) ds.

In the special case of a Yule process, where λ_j = jλ, we can use the above to verify the result of Section 5.3, namely,

P_{ij}(t) = C(j − 1, i − 1) e^{−iλt}(1 − e^{−λt})^{j−i}, j ≥ i.
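The closed form for the two-state chain provides a convenient numerical test case. A sketch (illustrative values λ = 1, μ = 2): as will be seen in Section 5.4.1, P(t) can also be expressed as a matrix exponential of the rate matrix, so the formula above can be compared with scipy's expm.

```python
import numpy as np
from scipy.linalg import expm

lam, mu, t = 1.0, 2.0, 0.7
R = np.array([[-lam, lam],
              [mu, -mu]])                  # rate matrix of the two-state chain

closed = mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu) * t)
print("closed form P_00(t):", closed)
print("expm(R t)[0, 0]    :", expm(R * t)[0, 0])
```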
In many models of interest the Kolmogorov differential equations cannot be explicitly solved for the transition probabilities. However, we can often compute other quantities of interest, such as mean values, by first deriving and then solving a differential equation. Our next example illustrates the technique.

EXAMPLE 5.4(D) A Two-Sex Population Growth Model. Consider a population of males and females, and suppose that each female in the population independently gives birth at an exponential rate λ, and that each birth is (independently of all else) a female with probability p and a male with probability 1 − p. Individual females die at an exponential rate μ and males at an exponential rate ν. Thus, if X(t) and Y(t) denote, respectively, the numbers of females and males in the population at time t, then {[X(t), Y(t)], t ≥ 0} is a continuous-time Markov chain with infinitesimal rates

q[(n, m), (n + 1, m)] = nλp,
q[(n, m), (n − 1, m)] = nμ,
q[(n, m), (n, m + 1)] = nλ(1 − p),
q[(n, m), (n, m − 1)] = mν.

If X(0) = i and Y(0) = j, find (a) E[X(t)], (b) E[Y(t)], and (c) Cov[X(t), Y(t)].

Solution. (a) To find E[X(t)], note that {X(t), t ≥ 0} is itself a continuous-time Markov chain. Letting M_X(t) = E[X(t) | X(0) = i], we will derive a differential equation satisfied by M_X(t). To begin, note that, given X(t),

(5.4.4)  X(t + h) = X(t) + 1 with probability λpX(t)h + o(h);
                 = X(t) − 1 with probability μX(t)h + o(h);
                 = X(t) with probability 1 − (μ + λp)X(t)h + o(h).

Hence, taking expectations yields

E[X(t + h) | X(t)] = X(t) + (λp − μ)X(t)h + o(h),

and taking expectations once again yields

M_X(t + h) = M_X(t) + (λp − μ)M_X(t)h + o(h),

or

(M_X(t + h) − M_X(t))/h = (λp − μ)M_X(t) + o(h)/h.

Letting h → 0 gives

M'_X(t) = (λp − μ)M_X(t),

or M'_X(t)/M_X(t) = λp − μ, and, upon integration,

log M_X(t) = (λp − μ)t + C,

or

M_X(t) = Ke^{(λp−μ)t}.

Since M_X(0) = K = i, we obtain

(5.4.5)  E[X(t)] = M_X(t) = ie^{(λp−μ)t}.

(b) Since knowledge of Y(s), 0 ≤ s < t, is informative about the number of females in the population at time t, the conditional distribution of Y(t + s) given Y(u), 0 ≤ u ≤ t, does not depend solely on Y(t). Hence, {Y(t), t ≥ 0} is not a continuous-time Markov chain, and so we cannot derive a differential equation for M_Y(t) = E[Y(t) | Y(0) = j] in the same manner as we did for M_X(t). However, we can compute an expression for M_Y(t + h) by conditioning on both X(t) and Y(t). Indeed, conditional on X(t) and Y(t),

Y(t + h) = Y(t) + 1 with probability λ(1 − p)X(t)h + o(h);
         = Y(t) − 1 with probability νY(t)h + o(h);
         = Y(t) with probability 1 − [λ(1 − p)X(t) + νY(t)]h + o(h).

Taking expectations gives

E[Y(t + h) | X(t), Y(t)] = Y(t) + [λ(1 − p)X(t) − νY(t)]h + o(h),

and taking expectations once again yields

M_Y(t + h) = M_Y(t) + [λ(1 − p)M_X(t) − νM_Y(t)]h + o(h).

Upon subtracting M_Y(t), dividing by h, and letting h → 0, we obtain

M'_Y(t) = λ(1 − p)M_X(t) − νM_Y(t) = iλ(1 − p)e^{(λp−μ)t} − νM_Y(t).

Adding νM_Y(t) to both sides and then multiplying by e^{νt} gives

d/dt [e^{νt}M_Y(t)] = iλ(1 − p)e^{(λp−μ+ν)t}.

Integrating gives

e^{νt}M_Y(t) = iλ(1 − p)e^{(λp−μ+ν)t}/(λp − μ + ν) + C.

Evaluating at t = 0 gives

C = j − iλ(1 − p)/(λp + ν − μ),

and so

(5.4.6)  M_Y(t) = (iλ(1 − p)/(λp + ν − μ))e^{(λp−μ)t} + (j − iλ(1 − p)/(λp + ν − μ))e^{−νt}.

(c) Before finding Cov[X(t), Y(t)], let us first compute E[X²(t)]. To begin, note that from (5.4.4) we have

E[X²(t + h) | X(t)] = [X(t) + 1]²λpX(t)h + [X(t) − 1]²μX(t)h + X²(t)[1 − (μ + λp)X(t)h] + o(h)
                  = X²(t) + 2(λp − μ)hX²(t) + (λp + μ)hX(t) + o(h).

Now, letting M_2(t) = E[X²(t)] and taking expectations of the preceding yields

M_2(t + h) = M_2(t) + 2(λp − μ)hM_2(t) + (λp + μ)hM_X(t) + o(h).

Dividing through by h and taking the limit as h → 0 shows, upon using Equation (5.4.5), that

M'_2(t) = 2(λp − μ)M_2(t) + i(λp + μ)e^{(λp−μ)t}.

Hence,

d/dt [e^{−2(λp−μ)t}M_2(t)] = i(λp + μ)e^{−(λp−μ)t},

or

e^{−2(λp−μ)t}M_2(t) = (i(λp + μ)/(μ − λp))e^{−(λp−μ)t} + C,

or, equivalently,

M_2(t) = (i(λp + μ)/(μ − λp))e^{(λp−μ)t} + Ce^{2(λp−μ)t}.

As M_2(0) = i², we obtain the result

(5.4.7)  M_2(t) = (i(λp + μ)/(μ − λp))e^{(λp−μ)t} + (i² − i(λp + μ)/(μ − λp))e^{2(λp−μ)t}.

Now, let M_{XY}(t) = E[X(t)Y(t)]. Since the probability of two or more transitions in a time h is o(h), we have

X(t + h)Y(t + h) = (X(t) + 1)Y(t) with prob. λpX(t)h + o(h);
                = X(t)(Y(t) + 1) with prob. λ(1 − p)X(t)h + o(h);
                = (X(t) − 1)Y(t) with prob. μX(t)h + o(h);
                = X(t)(Y(t) − 1) with prob. νY(t)h + o(h);
                = X(t)Y(t) with prob. 1 − the above.

Hence,

E[X(t + h)Y(t + h) | X(t), Y(t)] = X(t)Y(t) + hX(t)Y(t)[λp − μ − ν] + X²(t)λ(1 − p)h + o(h).

Taking expectations gives

M_{XY}(t + h) = M_{XY}(t) + h(λp − μ − ν)M_{XY}(t) + λ(1 − p)hE[X²(t)] + o(h),

implying that

M'_{XY}(t) = (λp − μ − ν)M_{XY}(t) + λ(1 − p)M_2(t),

or

d/dt {e^{−(λp−μ−ν)t}M_{XY}(t)} = λ(1 − p)e^{−(λp−μ−ν)t}M_2(t)
  = λ(1 − p)(i(μ + λp)/(μ − λp))e^{νt} + λ(1 − p)(i² − i(μ + λp)/(μ − λp))e^{(λp−μ+ν)t}.

Integration now yields

(5.4.8)  M_{XY}(t) = (iλ(1 − p)(μ + λp)/(ν(μ − λp)))e^{(λp−μ)t}
       + (λ(1 − p)/(λp − μ + ν))(i² − i(μ + λp)/(μ − λp))e^{2(λp−μ)t} + Ce^{(λp−μ−ν)t}.

Hence, from Equations (5.4.5), (5.4.6), and (5.4.8) we obtain, after some algebraic simplifications,

Cov[X(t), Y(t)] = [C − ij + i²λ(1 − p)/(λp + ν − μ)]e^{(λp−μ−ν)t}
       + (iλ(1 − p)(μ + λp)/(ν(μ − λp)))e^{(λp−μ)t}
       − (λ(1 − p)/(λp − μ + ν))(i(μ + λp)/(μ − λp))e^{2(λp−μ)t}.

Using that Cov[X(0), Y(0)] = 0 gives

C = ij − i²λ(1 − p)/(λp + ν − μ) − iλ(1 − p)(μ + λp)/(ν(μ − λp)) + (λ(1 − p)/(λp − μ + ν))(i(μ + λp)/(μ − λp)).

It also follows from Equations (5.4.5) and (5.4.7) that

Var[X(t)] = (i(μ + λp)/(μ − λp))e^{(λp−μ)t} − (i(μ + λp)/(μ − λp))e^{2(λp−μ)t}.

5.4.1 Computing the Transition Probabilities

If we define r_{ij} by

r_{ij} = q_{ij} if i ≠ j;  r_{ii} = −v_i,

then the Kolmogorov backward equations can be written as

P'_{ij}(t) = Σ_k r_{ik}P_{kj}(t),

and the forward equations as

P'_{ij}(t) = Σ_k r_{kj}P_{ik}(t).

These equations have a particularly nice form in matrix notation. If we define the matrices R, P(t), and P'(t) by letting the element in row i, column j of these matrices be, respectively, r_{ij}, P_{ij}(t), and P'_{ij}(t), then the backward equations can be written as

P'(t) = RP(t),

and the forward equations as

P'(t) = P(t)R.

This suggests the solution

(5.4.9)  P(t) = e^{Rt} ≡ Σ_{i=0}^∞ (Rt)^i / i!,

where R⁰ is the identity matrix I; and in fact it can be shown that (5.4.9) is valid provided that the v_i are bounded.
5.5 LIMITING PROBABILITIES

Since a continuous-time Markov chain is a semi-Markov process with

    F_ij(t) = 1 - e^{-ν_i t},

it follows, from the results of Section 4.8 of Chapter 4, that if the discrete-time Markov chain with transition probabilities P_ij is irreducible and positive recurrent, then the limiting probabilities P_j = lim_{t→∞} P_ij(t) are given by

(5.5.1)
    P_j = (π_j/ν_j) / Σ_i (π_i/ν_i),

where the π_j are the unique nonnegative solution of

(5.5.2)
    π_j = Σ_i π_i P_ij,    Σ_j π_j = 1.

From (5.5.1) and (5.5.2) we see that the P_j are the unique nonnegative solution of

    ν_j P_j = Σ_i ν_i P_i P_ij,    Σ_j P_j = 1,

or, equivalently, using q_ij = ν_i P_ij,

(5.5.3)
    ν_j P_j = Σ_{i≠j} P_i q_ij,    Σ_j P_j = 1.

Remarks

(1) It follows from the results given in Section 4.8 of Chapter 4 for semi-Markov processes that P_j also equals the long-run proportion of time the process is in state j.

(2) If the initial state is chosen according to the limiting probabilities {P_j}, then the resultant process will be stationary. That is,

    Σ_i P_ij(t)P_i = P_j    for all t.

The above is proven as follows:

    Σ_i P_ij(t)P_i = Σ_i P_ij(t) lim_{s→∞} P_ki(s)
                   = lim_{s→∞} Σ_i P_ki(s)P_ij(t)
                   = lim_{s→∞} P_kj(t + s)
                   = P_j.

The above interchange of limit and summation is easily justified, and is left as an exercise.

(3) Another way of obtaining Equations (5.5.3) is by way of the forward equations

    P'_ij(t) = Σ_{k≠j} q_kj P_ik(t) - ν_j P_ij(t).

If we assume that the limiting probabilities P_j = lim_{t→∞} P_ij(t) exist, then P'_ij(t) would necessarily converge to 0 as t → ∞. (Why?) Hence, assuming that we can interchange limit and summation in the above, we obtain upon letting t → ∞

    0 = Σ_{k≠j} P_k q_kj - ν_j P_j.

It is worth noting that the above is a more formal version of the following heuristic argument, which yields an equation for P_j, the probability of being in state j at t = ∞, by conditioning on the state h units prior in time:

    P_j = Σ_i P_ij(h)P_i
        = Σ_{i≠j} (q_ij h + o(h))P_i + (1 - ν_j h + o(h))P_j,

or

    0 = Σ_{i≠j} P_i q_ij - ν_j P_j + o(h)/h,

and the result follows by letting h → 0.

(4) Equation (5.5.3) has a nice interpretation, which is as follows. In any interval (0, t) the number of transitions into state j must equal to within 1 the number of transitions out of state j. (Why?) Hence, in the long run, the rate at which transitions into state j occur must equal the rate at which transitions out of state j occur. Now when the process is in state j it leaves at rate ν_j, and, since P_j is the proportion of time it is in state j, it thus follows that

    ν_j P_j = rate at which the process leaves state j.
Similarly, when the process is in state i it departs to j at rate q_ij, and, since P_i is the proportion of time in state i, we see that the rate at which transitions from i to j occur is equal to q_ij P_i. Hence

    Σ_{i≠j} P_i q_ij = rate at which the process enters state j.

Therefore, (5.5.3) is just a statement of the equality of the rates at which the process enters and leaves state j. Because it balances (that is, equates) these rates, Equations (5.5.3) are sometimes referred to as balance equations.

(5) When the continuous-time Markov chain is irreducible and P_j > 0 for all j, we say that the chain is ergodic.

Let us now determine the limiting probabilities for a birth and death process. From Equations (5.5.3), or, equivalently, by equating the rate at which the process leaves a state with the rate at which it enters that state, we obtain

    State       Rate Process Leaves     Rate Process Enters
    0           λ_0 P_0                 μ_1 P_1
    n, n > 0    (λ_n + μ_n)P_n          λ_{n-1}P_{n-1} + μ_{n+1}P_{n+1}

Rewriting these equations gives

    λ_0 P_0 = μ_1 P_1,
    λ_n P_n = μ_{n+1}P_{n+1} + (λ_{n-1}P_{n-1} - μ_n P_n),   n ≥ 1,

or, equivalently,

    λ_0 P_0 = μ_1 P_1,
    λ_1 P_1 = μ_2 P_2 + (λ_0 P_0 - μ_1 P_1) = μ_2 P_2,
    λ_2 P_2 = μ_3 P_3 + (λ_1 P_1 - μ_2 P_2) = μ_3 P_3,
    ...
    λ_n P_n = μ_{n+1}P_{n+1} + (λ_{n-1}P_{n-1} - μ_n P_n) = μ_{n+1}P_{n+1}.

Solving in terms of P_0 yields

    P_1 = (λ_0/μ_1)P_0,
    P_2 = (λ_1/μ_2)P_1 = (λ_1 λ_0/(μ_2 μ_1))P_0,
    P_3 = (λ_2/μ_3)P_2 = (λ_2 λ_1 λ_0/(μ_3 μ_2 μ_1))P_0,
    ...
    P_n = (λ_{n-1} ··· λ_1 λ_0/(μ_n ··· μ_2 μ_1))P_0.

Using Σ_{n=0}^∞ P_n = 1 we obtain

    1 = P_0 + P_0 Σ_{n=1}^∞ (λ_{n-1} ··· λ_1 λ_0)/(μ_n ··· μ_2 μ_1),

or

    P_0 = [ 1 + Σ_{n=1}^∞ (λ_{n-1} ··· λ_0)/(μ_n ··· μ_1) ]^{-1},

and hence

(5.5.4)
    P_n = (λ_0 λ_1 ··· λ_{n-1}) / ( μ_1 μ_2 ··· μ_n [ 1 + Σ_{n=1}^∞ (λ_0 ··· λ_{n-1})/(μ_1 ··· μ_n) ] ),   n ≥ 1.

The above equations also show us what condition is needed for the limiting probabilities to exist, namely,

    Σ_{n=1}^∞ (λ_0 λ_1 ··· λ_{n-1})/(μ_1 μ_2 ··· μ_n) < ∞.

EXAMPLE 5.5(A) The M/M/1 Queue. In the M/M/1 queue λ_n = λ, μ_n = μ, and thus from (5.5.4)

    P_n = (λ/μ)^n (1 - λ/μ),   n ≥ 0,

provided that λ/μ < 1. It is intuitive that λ must be less than μ for limiting probabilities to exist. Customers arrive at rate λ and are served at rate μ, and thus if λ > μ they will arrive at a faster rate than they can be served and the queue size will go to infinity. The case λ = μ behaves much like the symmetric random walk of Section 4.3 of Chapter 4, which is null recurrent and thus has no limiting probabilities.

EXAMPLE 5.5(B) Consider a job shop consisting of M machines and a single repairman, and suppose that the amount of time a machine runs before breaking down is exponentially distributed with rate λ and the amount of time it takes the repairman to fix any broken machine is exponential with rate μ. If we say that the state is n whenever there are n machines down, then this system can be modeled as a birth and death process with parameters

    μ_n = μ,           n ≥ 1,
    λ_n = (M - n)λ,    n ≤ M,
        = 0,           n > M.

From Equation (5.5.4) we have that P_n, the limiting probability that n machines will not be in use, is given by

    P_0 = [ 1 + Σ_{n=1}^M (M!/(M - n)!)(λ/μ)^n ]^{-1},

    P_n = (M!/(M - n)!)(λ/μ)^n / [ 1 + Σ_{n=1}^M (M!/(M - n)!)(λ/μ)^n ],   n = 0, ..., M.

Hence, the average number of machines not in use is given by

    Σ_{n=0}^M n P_n = Σ_{n=0}^M n (M!/(M - n)!)(λ/μ)^n / [ 1 + Σ_{n=1}^M (M!/(M - n)!)(λ/μ)^n ].

Suppose we wanted to know the long-run proportion of time that a given machine is working. To determine this, we compute the equivalent limiting probability of its working:

    P{machine is working} = Σ_{n=0}^M P{machine is working | n not working} P_n
                          = Σ_{n=0}^M ((M - n)/M) P_n.
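The machine-repair probabilities above are straightforward to evaluate. The following sketch computes (5.5.4) for this model together with the average number of machines down and the proportion of time a given machine works; the function name and the parameter values are illustrative choices.

```python
from math import factorial

def repair_model(M, lam, mu):
    """Limiting probabilities (5.5.4) for the machine-repair model:
    birth rates (M - n)*lam for n <= M, constant death rate mu."""
    weights = [factorial(M) // factorial(M - n) * (lam / mu) ** n
               for n in range(M + 1)]         # weight of state n relative to P_0
    total = sum(weights)
    P = [w / total for w in weights]
    avg_down = sum(n * p for n, p in enumerate(P))
    prop_working = sum((M - n) / M * p for n, p in enumerate(P))
    return P, avg_down, prop_working

P, avg_down, prop_working = repair_model(M=5, lam=0.5, mu=2.0)
print(P)
print(avg_down, prop_working)
```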
Consider a positive recurrent, irreducible continuous-time Markov chain, and suppose we are interested in the distribution of the number of visits to some state, say state 0, by time t. Since visits to state 0 constitute renewals, it follows from Theorem 3.3.5 that for t large the number of visits is approximately normally distributed with mean t/E[T_00] and variance t Var(T_00)/E³[T_00], where T_00 denotes the time between successive entrances into state 0. E[T_00] can be obtained by solving for the stationary probabilities and then using the identity

    P_0 = (1/ν_0)/E[T_00],   that is,   E[T_00] = 1/(ν_0 P_0).

To calculate E[T²_00], suppose that, at any time t, a reward is being earned at a rate equal to the time from t until the next entrance into state 0. It then follows from the theory of renewal reward processes that, with a new cycle beginning each time state 0 is entered, the long-run average reward per unit time is given by

    Average Reward = E[reward earned during a cycle]/E[cycle time]
                   = E[ ∫_0^{T_00} x dx ]/E[T_00]
                   = E[T²_00]/(2E[T_00]).

But as P_i is the proportion of time the chain is in state i, it follows that the average reward per unit time can also be expressed as

    Average Reward = Σ_i P_i E[T_i0],

where T_i0 is the time to enter 0 given that the chain is presently in state i. Therefore, equating the two expressions for the average reward gives

    E[T²_00] = 2E[T_00] Σ_i P_i E[T_i0].

By expressing T_i0 as the time it takes to leave state i plus any additional time after it leaves i before it enters state 0, we obtain that the quantities E[T_i0] satisfy the following set of linear equations:

    E[T_i0] = 1/ν_i + Σ_{j≠0} P_ij E[T_j0],   i ≥ 0.
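For a concrete chain these are a small linear system. The sketch below solves them with numpy for an illustrative three-state chain (the rate matrix is an arbitrary example), checks the two expressions for E[T_00] against each other, and evaluates E[T²_00] from the renewal-reward identity.

```python
import numpy as np

# Illustrative 3-state chain: q_ij off the diagonal, -v_i on the diagonal.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -2.0,  1.0],
              [ 2.0,  2.0, -4.0]])
v = -np.diag(Q)
Pemb = Q / v[:, None]
np.fill_diagonal(Pemb, 0.0)          # embedded-chain transition probabilities

# Stationary probabilities: solve P Q = 0 with sum(P) = 1.
A = np.vstack([Q.T, np.ones(3)])
P = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

# E[T_i0] = 1/v_i + sum_{j != 0} P_ij E[T_j0]; solve first for i = 1, 2.
m = np.linalg.solve(np.eye(2) - Pemb[1:, 1:], 1.0 / v[1:])
ET = np.array([1.0 / v[0] + Pemb[0, 1:] @ m, m[0], m[1]])

print(ET[0], 1.0 / (v[0] * P[0]))    # both expressions for E[T_00] agree
print(2.0 * ET[0] * (P @ ET))        # E[T_00^2] from the identity above
```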
5.6 TIME REVERSIBILITY

Consider an ergodic continuous-time Markov chain and suppose that it has been in operation an infinitely long time; that is, suppose that it started at time -∞. Such a process will be stationary, and we say that it is in steady state. (Another approach to generating a stationary continuous-time Markov chain is to let the initial state of the chain be chosen according to the stationary probabilities.) Let us consider this process going backwards in time. Now, since the forward process is a continuous-time Markov chain, it follows that given the present state, call it X(t), the past state X(t - s) and the future states X(y), y > t, are independent. Therefore,

    P{X(t - s) = j | X(t) = i, X(y), y > t} = P{X(t - s) = j | X(t) = i},

and so we can conclude that the reverse process is also a continuous-time Markov chain. Also, since the amount of time spent in a state is the same whether one is going forward or backward in time, it follows that the amount of time the reverse chain spends in state i on a visit is exponential with the same rate ν_i as in the forward process. To check this formally, suppose that the process is in i at time t. Then the probability that its (backwards) time in i exceeds s is as follows:

    P{process is in state i throughout [t - s, t] | X(t) = i}
        = P{process is in state i throughout [t - s, t]}/P{X(t) = i}
        = P{X(t - s) = i} e^{-ν_i s}/P{X(t) = i}
        = e^{-ν_i s},

since P{X(t - s) = i} = P{X(t) = i} = P_i. In other words, going backwards in time, the amount of time that the process spends in state i during a visit is also exponentially distributed with rate ν_i. In addition, as was shown in Section 4.7, the sequence of states visited by the reverse process constitutes a discrete-time Markov chain with transition probabilities P*_ij given by

    P*_ij = π_j P_ji/π_i,

where {π_j, j ≥ 0} are the stationary probabilities of the embedded discrete-time Markov chain with transition probabilities P_ij. Hence, we see that the reverse process is also a continuous-time Markov chain with the same transition rates out of each state as the forward-time process and with one-stage transition probabilities P*_ij. Let

    q*_ij = ν_i P*_ij

denote the infinitesimal rates of the reverse chain. Using the preceding formula for P*_ij, we see that

    q*_ij = ν_i π_j P_ji/π_i.

However, recalling that

    π_k = ν_k P_k/C,   where C = Σ_i ν_i P_i,

we see that

    π_j P_ji/π_i = ν_j P_j P_ji/(ν_i P_i),

and so

    q*_ij = ν_j P_j P_ji/P_i = P_j q_ji/P_i.

That is,

(5.6.1)
    P_i q*_ij = P_j q_ji.

Equation (5.6.1) has a very nice and intuitive interpretation, but to understand it we must first argue that the P_j are not only the stationary probabilities for the original chain but also for the reverse chain. This follows because if P_j is the proportion of time that the Markov chain is in state j when looking in the forward direction of time, then it will also be the proportion of time when looking backwards in time. More formally, we show that P_j, j ≥ 0, are the stationary probabilities for the reverse chain by showing that they satisfy the balance equations for this chain. This is done by summing Equation (5.6.1) over all states i, to obtain

    Σ_i P_i q*_ij = P_j Σ_i q_ji = P_j ν_j,

and so the {P_j} do indeed satisfy the balance equations and so are the stationary probabilities of the reversed chain.

Now, since q*_ij is the rate, when in state i, that the reverse chain makes a transition into state j, and P_i is the proportion of time this chain is in state i, it follows that P_i q*_ij is the rate at which the reverse chain makes a transition from i to j. Similarly, P_j q_ji is the rate at which the forward chain makes a transition from j to i. Thus, Equation (5.6.1) states that the rate at which the forward chain makes a transition from j to i is equal to the rate at which the reverse chain makes a transition from i to j. But this is rather obvious, because every time the chain makes a transition from j to i in the usual (forward) direction of time, the chain when looked at in the reverse direction is making a transition from i to j.

The stationary continuous-time Markov chain is said to be time reversible if the reverse process follows the same probabilistic law as the original process. That is, it is time reversible if

    q*_ij = q_ij   for all i and j,

which is equivalent to

    P_i q_ij = P_j q_ji   for all i, j.

Since P_i is the proportion of time in state i, and since when in state i the process goes to j with rate q_ij, the condition of time reversibility is that the rate at which the process goes directly from state i to state j is equal to the rate at which it goes directly from j to i. It should be noted that this is exactly the same condition needed for an ergodic discrete-time Markov chain to be time reversible (see Section 4.7 of Chapter 4).

An application of the above condition for time reversibility yields the following proposition concerning birth and death processes.
PROPOSITION 5.6.1
An ergodic birth and death process is, in steady state, time reversible.

Proof To prove the above we must show that the rate at which a birth and death process goes from state i to state i + 1 is equal to the rate at which it goes from i + 1 to i. Now, in any length of time t the number of transitions from i to i + 1 must equal to within 1 the number from i + 1 to i (since between each transition from i to i + 1 the process must return to i, and this can only occur through i + 1, and vice versa). Hence, as the number of such transitions goes to infinity as t → ∞, it follows that the rate of transitions from i to i + 1 equals the rate from i + 1 to i.

Proposition 5.6.1 can be used to prove the important result that the output process of an M/M/s queue is a Poisson process. We state this as a corollary.

Corollary 5.6.2
Consider an M/M/s queue in which customers arrive in accordance with a Poisson process having rate λ and are served by any one of s servers, each having an exponentially distributed service time with rate μ. If λ < sμ, then the output process of customers departing is, in steady state, a Poisson process with rate λ.

Proof Let X(t) denote the number of customers in the system at time t. Since the M/M/s process is a birth and death process, it follows from Proposition 5.6.1 that {X(t), t ≥ 0} is time reversible. Now, going forward in time, the time points at which X(t) increases by 1 constitute a Poisson process, since these are just the arrival times of customers. Hence, by time reversibility, the time points at which X(t) increases by 1 when we go backwards in time also constitute a Poisson process. But these latter points are exactly the points of time when customers depart. (See Figure 5.6.1.) Hence the departure times constitute a Poisson process with rate λ.

Figure 5.6.1. The marked points are the times at which, going backwards in time, X(t) increases; they are also the times at which, going forward in time, X(t) decreases.

As in the discrete-time situation, if a set of probabilities {x_j} satisfy the time reversibility equations, then the stationary Markov chain is time reversible and the x_j's are the stationary probabilities. To verify this, suppose that the x_j's are nonnegative and satisfy

    Σ_j x_j = 1,
    x_i q_ij = x_j q_ji   for all i, j.

Summing the bottom set of equations over all i gives

    Σ_i x_i q_ij = x_j Σ_i q_ji = x_j ν_j.

Thus the {x_j} satisfy the balance equations and so are the stationary probabilities, and since x_i q_ij = x_j q_ji, the chain is time reversible.

Consider a continuous-time Markov chain whose state space is S. We say that the Markov chain is truncated to the set A ⊂ S if q_ij is changed to 0 for all i ∈ A, j ∉ A. All other q_ij remain the same. Thus transitions out of the class of states A are not allowed. A useful result is that a truncated time-reversible chain remains time reversible.
PROPOSITION 5.6.3
A time-reversible chain with limiting probabilities P_j, j ∈ S, that is truncated to the set A ⊂ S and remains irreducible is also time reversible and has limiting probabilities

(5.6.2)
    P_j^A = P_j / Σ_{i∈A} P_i,   j ∈ A.

Proof We must show that

    P_i^A q_ij = P_j^A q_ji   for i ∈ A, j ∈ A,

or, equivalently,

    P_i q_ij = P_j q_ji   for i ∈ A, j ∈ A.

But the above follows since the original chain is, by assumption, time reversible.

EXAMPLE 5.6(A) Consider an M/M/1 queue in which arrivals finding N in the system do not enter but rather are lost. This finite-capacity M/M/1 system can be regarded as a truncated version of the M/M/1 queue and so is time reversible with limiting probabilities given by

    P_j = (λ/μ)^j / Σ_{i=0}^N (λ/μ)^i,   0 ≤ j ≤ N,

where we have used the results of Example 5.5(A) in the above.
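Example 5.6(A) amounts to renormalizing the geometric weights (λ/μ)^n over the truncated state space, as the short sketch below does. Note that, unlike the infinite-capacity queue, the truncated chain is ergodic even when λ ≥ μ; the parameter values are illustrative.

```python
def mm1_truncated(lam, mu, N):
    """Limiting probabilities of the finite-capacity M/M/1 queue, obtained by
    normalizing the (lam/mu)**n weights over states 0..N (Proposition 5.6.3)."""
    rho = lam / mu
    weights = [rho ** n for n in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

print(mm1_truncated(lam=2.0, mu=1.5, N=4))   # valid even though lam > mu
```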
5.6.1 Tandem Queues

The time reversibility of the M/M/s queue has other important implications for queueing theory. For instance, consider a two-server system in which customers arrive at a Poisson rate λ at server 1. After being served by server 1 they then join the queue in front of server 2. We suppose there is infinite waiting space at both servers. Each server serves one customer at a time, with server i taking an exponential time with rate μ_i for a service, i = 1, 2. Such a system is called a tandem, or sequential, system (Figure 5.6.2).

Figure 5.6.2. A tandem queue: arrivals at rate λ pass through server 1 and then server 2 before leaving the system.

Since the output from server 1 is a Poisson process, it follows that what server 2 faces is also an M/M/1 queue. However, we can use time reversibility to obtain much more. We need first the following lemma.

Lemma 5.6.3
In an ergodic M/M/1 queue in steady state:
(i) the number of customers presently in the system is independent of the sequence of past departure times;
(ii) the waiting time spent in the system (waiting in queue plus service time) by a customer is independent of the departure process prior to his departure.

Proof (i) Since the arrival process is Poisson, it follows that the sequence of future arrivals is independent of the number presently in the system. Hence, by time reversibility, the number presently in the system must also be independent of the sequence of past departures (since, looking backwards in time, departures are seen as arrivals).
(ii) Consider a customer that arrives at time T_1 and departs at time T_2. Because the system is first come first served and has Poisson arrivals, it follows that the waiting time of the customer, T_2 - T_1, is independent of the arrival process after the time T_1. Now, looking backward in time, we will see that a customer arrives at time T_2 and the same customer departs at time T_1 (why the same customer?). Hence, by time reversibility, we see, by looking at the reversed process, that T_2 - T_1 will be independent of the (backward) arrival process after (in the backward direction) time T_2. But this is just the departure process before time T_2.

THEOREM 5.6.4
For the ergodic tandem queue in steady state:
(i) the numbers of customers presently at server 1 and at server 2 are independent, and

    P{n at server 1, m at server 2} = (λ/μ_1)^n (1 - λ/μ_1)(λ/μ_2)^m (1 - λ/μ_2);

(ii) the waiting time of a customer at server 1 is independent of its waiting time at server 2.

Proof (i) By part (i) of Lemma 5.6.3 we have that the number of customers at server 1 is independent of the past departure times from server 1. Since these past departure times constitute the arrival process to server 2, the independence of the numbers of customers in the two systems follows. The formula for the joint probability follows from independence and the formula for the limiting probabilities of an M/M/1 queue given in Example 5.5(A).
(ii) By part (ii) of Lemma 5.6.3 we see that the time spent by a given customer at server 1 is independent of the departure process prior to his departing server 1. But this latter process, in conjunction with the service times at server 2, clearly determines the customer's wait at server 2. Hence the result follows.

Remarks
(1) For the formula in Theorem 5.6.4(i) to represent the joint probability, it is clearly necessary that λ/μ_i < 1, i = 1, 2. This is the necessary and sufficient condition for the tandem queue to be ergodic.
(2) Though the waiting times of a given customer at the two servers are independent, it turns out, somewhat surprisingly, that the waiting times in queue of a customer are not independent. For a counterexample, suppose that λ is very small with respect to μ_1 = μ_2, and thus almost all customers have zero wait in queue at both servers. However, given that the wait in queue of a customer at server 1 is positive, his wait in queue at server 2 will also be positive with probability at least as large as 1/2. (Why?) Hence, the waiting times in queue are not independent.
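Theorem 5.6.4(i) gives the joint distribution in product form, which the following sketch evaluates directly; the parameter values are illustrative, and the assertion simply guards the ergodicity condition of Remark (1).

```python
def tandem_joint(lam, mu1, mu2, n, m):
    """Joint limiting probability of n customers at server 1 and m at server 2
    in the tandem queue (Theorem 5.6.4): a product of two M/M/1 distributions."""
    rho1, rho2 = lam / mu1, lam / mu2
    assert rho1 < 1 and rho2 < 1, "ergodicity requires lam < mu_i for i = 1, 2"
    return (1 - rho1) * rho1 ** n * (1 - rho2) * rho2 ** m

print(tandem_joint(lam=1.0, mu1=2.0, mu2=3.0, n=2, m=1))
```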
for some set (XI' (X2, We will now determine the (x, and show at the same time that the process is time reversible For of the form (5.64), '" n (X' = A II e- a , -\ ' n, (56.5) Similarly (56.4) gives that, for j ;::= 1, Using (565), this yields that Hence, for given by (564) with (x, = A(vlJ.L)'ljv, we have shown We can show a similar result for state!! and D,'!:. by writing!! as and then using the above. We thus have the following 266 CONTINUOUS TIME MARKOV CHAINS - THEOREM 5.6.5 The continuous-time Markov chain {N(t), t ::! O} is, in steady state, time reversible with limiting probabilities, where In other words the Iimiiing number offamilies that consist ofi individuals are independent Poisson random variables with respective means ~ (!:)', i::! I IV f.L There is an interesting interpretation of the (x, aside from it being the limiting mean number of families of size i. From (5 6.3) we see that E [N,(r)] = A t q(r- s) ds = A tq(s)ds, where q(s) is the probability that a family consists of i individuals a time s after its origination Hence, (566) ~ ~ n ; E [N,(r)] = A t q(s) ds But consider an arbitrary family and let Then the family contains i members a time s after its ongination otherwise f ~ q(s) ds = J ~ E [1(s)] ds = E [amount of time the family has i members] TIME REVERSIBILITY Hence, from (5.66), lim E[N,(t)] = AE[amount of time a family has i members], /-'" and since we see that A (V)' E[N,(t)] -;. a, =. - IV /-L E [amount of time a family has i members] = (v:/-L)' IV 267 Consider now the population model in steady state and suppose that the present state is !!..*. We would like to determine the probability that a given family of size i is the oldest (in the sense that it originated earliest) family in the population It would seem that we would be able to use the time reversibility of the process to infer that this is the same as the probability that the given family will be the last surviving family of those presently in existence. Unfortunately, however, this conclusion does not immediately follow, for with our state space it is not possible, by observing the process throughout, to determine the exact time a given family becomes extinct Thus we will need a more informative state that enables us to follow the progress of a given family throughout time For technical reasons, it will be easiest to start by truncating the model and not allowing any more than M families to exist, where M ;::: nt. That is, whenever there are M families in the population, no additional mutants are allowed. Note that, by Proposition 5 6.3, the truncated process with states remains time reversible and has stationary probabilities where a , = A(vl/-L)'liv To keep track of a given family as time progresses, we will have to label the different families Let us use the labels 1,2, , M and agree that whenever a new family originates (that is, a mutant appears) its label is uniformly chosen from the set of labels not being used at that time If we let Sf denote the number of individuals in the family labeled i, i = 1, , M (with Si = 0 meaning that there is at present no family labeled i), then we can consider the process has states S (s" , SM), Si :> 0 For a given s, let n(s) = " ), where is, as before, the number of families-oC size 268 CONTINUOUS-TIME MARKOV CHAINS i. That is, = number of j s, = i. To obtain the stationary distribution of the Markov chain having states s , note that - for = ()O n ex' = P(sln)Crr e-a,-' . -- 1=1 n,! 
Consider now the population model in steady state and suppose that the present state is n*. We would like to determine the probability that a given family of size i is the oldest (in the sense that it originated earliest) family in the population. It would seem that we would be able to use the time reversibility of the process to infer that this is the same as the probability that the given family will be the last surviving family of those presently in existence. Unfortunately, however, this conclusion does not immediately follow, for with our state space it is not possible, by observing the process throughout, to determine the exact time a given family becomes extinct. Thus we will need a more informative state that enables us to follow the progress of a given family throughout time.

For technical reasons, it will be easiest to start by truncating the model and not allowing any more than M families to exist, where M ≥ Σ_i n*_i. That is, whenever there are M families in the population, no additional mutants are allowed. Note that, by Proposition 5.6.3, the truncated process remains time reversible and has stationary probabilities

    P(n) = C Π_i α_i^{n_i}/n_i!,   Σ_i n_i ≤ M,

where α_i = (λ/(iν))(ν/μ)^i and C is a normalizing constant.

To keep track of a given family as time progresses, we will have to label the different families. Let us use the labels 1, 2, ..., M and agree that whenever a new family originates (that is, a mutant appears) its label is uniformly chosen from the set of labels not being used at that time. If we let S_i denote the number of individuals in the family labeled i, i = 1, ..., M (with S_i = 0 meaning that there is at present no family labeled i), then we can consider the process with states s = (s_1, ..., s_M), s_i ≥ 0. For a given s, let n(s) = (n_1(s), n_2(s), ...), where n_i(s) is, as before, the number of families of size i. That is,

    n_i(s) = number of j such that s_j = i.

To obtain the stationary distribution of the Markov chain having states s, note that, for n = n(s),

    P(s) = P(s | n)P(n) = P(s | n) C Π_i α_i^{n_i}/n_i!.

Since all labelings are chosen at random, it is intuitive that, for a given vector n, all of the

    M! / ( (M - Σ_i n_i)! Π_i n_i! )

possible vectors s that are consistent with n are equally likely. (For instance, if M = 3, n_1 = n_2 = 1, n_i = 0, i ≥ 3, then there are two families, one of size 1 and one of size 2, and it seems intuitive that the six possible states, the permutations of (0, 1, 2), are equally likely.) Hence it seems intuitive that

(5.6.7)
    P(s) = C (M - Σ_i n_i)! Π_i α_i^{n_i} / M!,

where n_i = n_i(s) and α_i = (λ/(iν))(ν/μ)^i. We will now verify the above formula and show at the same time that the chain with states s is time reversible.

PROPOSITION 5.6.6
The chain with states s = (s_1, ..., s_M) is time reversible and has stationary probabilities given by (5.6.7).

Proof For a vector s = (s_1, ..., s_i, ..., s_M), let B_i(s) = (s_1, ..., s_i + 1, ..., s_M); that is, B_i(s) is the state following s if a member of the family labeled i gives birth. Now, for s_i > 0,

    q(s, B_i(s)) = s_i ν,
    q(B_i(s), s) = (s_i + 1)μ.

Also, if n(s) = (..., n_{s_i}, n_{s_i+1}, ...), then

    n(B_i(s)) = (..., n_{s_i} - 1, n_{s_i+1} + 1, ...).

Hence, for P(s) given by (5.6.7) and for s_i > 0, the time reversibility equation

(5.6.8)
    P(s)q(s, B_i(s)) = P(B_i(s))q(B_i(s), s)

is equivalent to

    s_i ν α_{s_i} = (s_i + 1)μ α_{s_i+1},

which, as α_j = (λ/(jν))(ν/μ)^j, is clearly true.

Since there are M - Σ_j n_j(s) labels available for a mutant born when the state is s, we see that

    q(s, B_i(s)) = λ/(M - Σ_j n_j(s))   if s_i = 0,
    q(B_i(s), s) = μ                    if s_i = 0.

Equation (5.6.8) is easily shown to also hold in this case, and the proof is thus complete.

Corollary 5.6.7
If in steady state there are n_i families of size i, i ≥ 1, then the probability that a given family of size i is the oldest family in the population is i / Σ_j jn_j.

Proof Consider the truncated process with states s, and suppose that the state s is such that n(s) = (n_1, n_2, ...). A given family of size i will be the oldest if, going backwards in time, it has been in existence the longest. But by time reversibility the process going backward in time has the same probability laws as the one going forward, and hence this is the same as the probability that the specified family will survive the longest among all those presently in the population. But each individual has the same probability of having his or her descendants survive the longest, and since there are Σ_j jn_j individuals in the population, of which i belong to the specified family, the result follows in this case. The proof now follows in general by letting M → ∞.

Remark We have chosen to work with a truncated process since it makes it easy to guess at the limiting probabilities of the labeled process with states s.

5.7 APPLICATIONS OF THE REVERSED CHAIN TO QUEUEING THEORY

The reversed chain can be quite a useful concept even when the process is not time reversible. To see this we start with the following continuous-time analogue of Theorem 4.7.2 of Chapter 4.

THEOREM 5.7.1
Let q_ij denote the transition rates of an irreducible continuous-time Markov chain. If we can find a collection of numbers q*_ij, i, j ≥ 0, i ≠ j, and a collection of nonnegative numbers P_i, i ≥ 0, summing to unity, such that

    P_i q_ij = P_j q*_ji,   i ≠ j,

and

    Σ_{j≠i} q_ij = Σ_{j≠i} q*_ij,   i ≥ 0,

then the q*_ij are the transition rates for the reversed chain and the P_i are the limiting probabilities (for both chains).

The proof of Theorem 5.7.1 is left as an exercise. Thus, if we can guess at the reversed chain and the limiting probabilities, then we can use Theorem 5.7.1 to validate our guess. To illustrate this approach, consider the following model of a network of queues that substantially generalizes the tandem queueing model of the previous section.
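Theorem 5.7.1 lends itself to a mechanical check once a guess is in hand. The sketch below verifies both conditions for a guessed pair (q*, P); it is applied to a small birth and death chain, for which the correct guess is q* = q (time reversibility) and P given by (5.5.4). The chain and its rates are an illustrative construction.

```python
import numpy as np

def check_reversed_guess(Q, Qstar, P, tol=1e-9):
    """Check the two conditions of Theorem 5.7.1 for guessed reversed rates
    Qstar and guessed probabilities P:
      P_i q_ij = P_j q*_ji for all i != j, and equal total rates out of i."""
    n = Q.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j and abs(P[i] * Q[i, j] - P[j] * Qstar[j, i]) > tol:
                return False
    out = Q.copy(); np.fill_diagonal(out, 0.0)
    out_star = Qstar.copy(); np.fill_diagonal(out_star, 0.0)
    return np.allclose(out.sum(axis=1), out_star.sum(axis=1), atol=tol)

lam, mu, N = 1.0, 2.0, 3
Q = np.zeros((N + 1, N + 1))
for n in range(N):
    Q[n, n + 1], Q[n + 1, n] = lam, mu       # birth and death rates
rho = lam / mu
P = np.array([rho ** n for n in range(N + 1)])
P /= P.sum()                                  # (5.5.4), truncated
print(check_reversed_guess(Q, Q, P))          # True: the chain is reversible
```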
5.7.1 Network of Queues

We consider a system of k servers in which customers arrive, from outside the system, to each server i, i = 1, ..., k, in accordance with independent Poisson processes at rate r_i; they then join the queue at i until their turn at service comes. Once a customer is served by server i, he then joins the queue in front of server j with probability P_ij, where Σ_{j=1}^k P_ij ≤ 1, and 1 - Σ_{j=1}^k P_ij represents the probability that a customer departs the system after being served by server i. The service times at server i are exponential with rate μ_i. If we let λ_j denote the total arrival rate of customers to server j, then the λ_j can be obtained as the solution of

(5.7.1)
    λ_j = r_j + Σ_{i=1}^k λ_i P_ij,   j = 1, ..., k.

Equation (5.7.1) follows since r_j is the arrival rate of customers to j coming from outside the system and, since λ_i is the rate at which customers depart server i (rate in must equal rate out), λ_i P_ij is the arrival rate to j of those coming from server i.

This model can be analyzed as a continuous-time Markov chain with states (n_1, n_2, ..., n_k), where n_i denotes the number of customers at server i. In accordance with the tandem queue results, we might hope that the numbers of customers at each server are independent random variables. That is, letting the limiting probabilities be denoted by P(n_1, n_2, ..., n_k), let us start by attempting to show that

(5.7.2)
    P(n_1, n_2, ..., n_k) = Π_{i=1}^k P_i(n_i),

where P_i(n_i) is thus the limiting probability that there are n_i customers at server i.

To prove that the probabilities are indeed of the above form, and to obtain the P_i(n_i), i = 1, ..., k, we will need to first digress and speculate about the reversed process. Now, in the reversed process, when a customer leaves server i the customer will go to j with some probability that will hopefully not depend on the past. If that probability, call it P*_ij, indeed does not depend on the past, what would its value be? To answer this, note first that since the arrival rate to a server must equal the departure rate from the server, it follows that in both the forward and reverse processes the arrival rate at server j is λ_j. Since the rate at which customers go from j to i in the forward process must equal the rate at which they go from i to j in the reverse process, this means that

    λ_i P*_ij = λ_j P_ji,

or

(5.7.3)
    P*_ij = λ_j P_ji/λ_i.

Thus we would hope that in the reversed process, when a customer leaves server i, he will go to server j with probability P*_ij = λ_j P_ji/λ_i. Also, arrivals to i from outside the system in the reversed process correspond to departures from i that leave the system in the forward process, and hence occur at rate λ_i(1 - Σ_j P_ij). The nicest possibility would be if this were a Poisson process; thus we make the following conjecture.

Conjecture The reversed stochastic process is a network process of the same type as the original. It has Poisson arrivals from outside the system to server i at rate λ_i(1 - Σ_j P_ij), and a departure from i goes to j with probability P*_ij as given by (5.7.3). The service rate at i is exponential with rate μ_i. In addition, the limiting probabilities satisfy

    P(n_1, ..., n_k) = Π_{i=1}^k P_i(n_i).
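The traffic equations (5.7.1) are linear and can be solved by writing them as λ(I - P) = r. A minimal sketch, with an illustrative three-server routing matrix:

```python
import numpy as np

def arrival_rates(r, P):
    """Solve the traffic equations (5.7.1): lambda_j = r_j + sum_i lambda_i P_ij,
    i.e. lambda = r (I - P)^{-1} in row-vector form."""
    r = np.asarray(r, dtype=float)
    P = np.asarray(P, dtype=float)
    return np.linalg.solve(np.eye(len(r)) - P.T, r)

P = [[0.0, 0.5, 0.3],
     [0.2, 0.0, 0.4],
     [0.0, 0.0, 0.0]]     # each row sums to < 1; the remainder leaves the system
r = [1.0, 0.5, 0.0]
print(arrival_rates(r, P))
```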
,n" ,n,+l, .,nk)Now qn n' = '" and, if the conjecture is true, J.L ,,, = A, (from (5 7 1» and = II P, (n,), = P,(n, + 1) II P, (n,) , i,,*1 Hence from Theorem 5 7.1 we need that APPLICATIONS OF THE REVERSED CHA1N TO QUEUEING lHEORY or That is, Pi(n + 1) A, -P,(n) J.Li and using that L:" 0 P, (n) = 1 yields (57.4) 273 Thus A;I J.L / must be less than unity and the P, must be as given above for the conjecture to be true. To continue with our proof of the conjecture, consider those transitions that result from a departure from server j going to server i That is, let n = (n l , .. , n" , nl" , n,...) and !!' = (nl. . ,n, + 1, .. , n, 1, ..• n,,), where n, > O. Since and the conjecture yields we need to show that or, using (5.7.4), that which is the definition of P". Since the addition verifications needed for Theorem 5.7.1 follow in the same manner, we have thus proven the following 274 CONTINUOUS TIME MARKOV CHAINS THEOREM 5.7.2 - Assuming that A, < f.L, for all i, in steady state, the number of customers at service i ar independent and the limiting probabilities are given by e Also from the reversed chain we have the following Corollary 5.7.3 The processes of customers departing the system from server i, i = I, independent Poisson processes having respective rates A, (I - ~ I P,) • , k, are Proof We've shown that in the reverse process, customers arrive to server i from outside the system according to independent Poisson processes having rates A, (l - ~ I P,), i ~ 1 Since an arrival from outside to server i in the reverse process corresponds to a departure out of the system from server i in the forward process, the result follows Remarks (1) The result embodied in Theorem 5.7.2 is rather remarkable in that it says that the distribution of the number of customers at server i is the same as in an MIMll system with rates A, and IL,. What is remarkable is that in the network model the arrival process at node i need not be a Poisson process For if there is a possibility that a customer may visit a server more than once (a situation called feedback), then the arrival process will not be Poisson An easy example illustrating this is to suppose that there is a single server whose service rate is very large with respect to the small arrival rate from outside Suppose also that with probability p = 9 a customer upon completion of service is fed back into the system Hence, at an arrival time epoch there is a large probability of another arrival in a short time (namely, the feedback arrival), whereas at an arbitrary time point there will be only a very slight chance of an arrival occurring shortly (since A is small). Hence the arrival process does not possess independent increments and so cannot be Poisson I CATIONS OFTHE REVERSED CHAIN TO QUEUEING THEORY APPL 275 (2) The model can be generalized to allow each service station to be a multi-server system (that is, server i operates as an M/M /k, rather than an M/Mll system) The limiting numbers of customers at each station will again be independent and e e number of customers at a server will have the same limiting distribution as if its arrival process was Poisson. 
5.7.2 The Erlang Loss Formula

Consider a queueing loss model in which customers arrive at a k-server system in accordance with a Poisson process having rate λ. It is a loss model in the sense that any arrival that finds all k servers busy does not enter but rather is lost to the system. The service times of the servers are assumed to have distribution G. We shall suppose that G is a continuous distribution having density g and hazard rate function λ(t). That is, λ(t) = g(t)/Ḡ(t) is, loosely speaking, the instantaneous probability intensity that a t-unit-old service will end.

We can analyze the above system by letting the state at any time be the ordered ages of the customers in service at that time. That is, the state will be x = (x_1, x_2, ..., x_n), x_1 ≤ x_2 ≤ ··· ≤ x_n, if there are n customers in service, the most recent one having arrived x_1 time units ago, the next most recent arrival being x_2 time units ago, and so on. The process of successive states will be a Markov process in the sense that the conditional distribution of any future state, given the present and all the past states, will depend only on the present state. Even though the process is not a continuous-time Markov chain, the theory we've developed for chains can be extended to cover this process, and we will analyze the model on this basis.

We will attempt to use the reverse process to obtain the limiting probability density p(x_1, x_2, ..., x_n), 1 ≤ n ≤ k, x_1 ≤ x_2 ≤ ··· ≤ x_n, and P(φ), the limiting probability that the system is empty. Now, since the age of a customer in service increases linearly from 0 upon her arrival to her service time upon her departure, it is clear that if we look backwards we will be following her excess or additional service time. As there will never be more than k in the system, we make the following conjecture.

Conjecture The reverse process is also a k-server loss system with service distribution G in which arrivals occur according to a Poisson process with rate λ. The state at any time represents the ordered residual service times of customers presently in service.

We shall now attempt to prove the above conjecture and at the same time obtain the limiting distribution. For any state x = (x_1, ..., x_i, ..., x_n), let e_i(x) = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_n). Now, in the original process, when the state is x it will instantaneously go to e_i(x) with a probability intensity λ(x_i), since the person whose time in service is x_i would have to instantaneously complete her service. Similarly, in the reversed process, if the state is e_i(x), then it will instantaneously go to x if a customer having service time x_i instantaneously arrives. So we see that

    in forward time: x → e_i(x) with probability intensity λ(x_i);
    in reverse time: e_i(x) → x with (joint) probability intensity λg(x_i).

Hence, if p(x) represents the limiting density, then in accordance with Theorem 5.7.1 we would need that

    p(x)λ(x_i) = p(e_i(x))λg(x_i),

or, since λ(x_i) = g(x_i)/Ḡ(x_i),

    p(x) = p(e_i(x))λḠ(x_i).

Letting i = 1 and iterating the above yields

(5.7.5)
    p(x) = λḠ(x_1)p(e_1(x)) = ··· = λ^n Π_{i=1}^n Ḡ(x_i) P(φ).

Integrating over all vectors x yields

(5.7.6)
    P{n in the system} = P(φ)λ^n ∫∫···∫_{x_1<x_2<···<x_n} Π_{i=1}^n Ḡ(x_i) dx_1 dx_2 ··· dx_n
                       = P(φ)(λ^n/n!) ∫∫···∫ Π_{i=1}^n Ḡ(x_i) dx_1 dx_2 ··· dx_n
                       = P(φ)(λE[S])^n/n!,   n = 1, 2, ..., k,
where E[S] = ∫_0^∞ Ḡ(x) dx is the mean service time. And, upon using

    P(φ) + Σ_{n=1}^k P{n in the system} = 1,

we obtain

(5.7.7)
    P{n in the system} = ((λE[S])^n/n!) / ( Σ_{i=0}^k (λE[S])^i/i! ),   n = 0, 1, ..., k.

From (5.7.5) we have that

(5.7.8)
    p(x) = λ^n Π_{i=1}^n Ḡ(x_i) P(φ),

and we see that the conditional distribution of the ordered ages, given that there are n in the system, is

    p(x | n in the system) = p(x)/P{n in the system} = n! Π_{i=1}^n Ḡ(x_i)/E[S].

As Ḡ(x)/E[S] is just the density of G_e, the equilibrium distribution of G, we see that, if the conjecture is valid, then the limiting distribution of the number in the system depends on G only through its mean and is given by (5.7.7); and, given that there are n in the system, their (unordered) ages are independent and are identically distributed according to the equilibrium distribution G_e of G.

To complete our proof of the conjecture we must consider transitions of the forward process from x to (0, x) = (0, x_1, x_2, ..., x_n) when n < k. Now,

    in forward time: x → (0, x) with instantaneous intensity λ;
    in reverse time: (0, x) → x with probability 1.

Hence, in conjunction with Theorem 5.7.1, we must verify that

    p(x)λ = p(0, x),

which follows from (5.7.8) since Ḡ(0) = 1. Therefore, assuming the validity of the analogue of Theorem 5.7.1, we have thus proven the following.

THEOREM 5.7.4
The limiting distribution of the number of customers in the system is given by

(5.7.9)
    P{n in system} = ((λE[S])^n/n!) / ( Σ_{i=0}^k (λE[S])^i/i! ),   n = 0, 1, ..., k,

and given that there are n in the system, the ages (or the residual times) of these n are independent and identically distributed according to the equilibrium distribution of G.

The model considered is often called the Erlang loss system, and Equation (5.7.9) the Erlang loss formula. By using the reversed process we also have the following corollary.

Corollary 5.7.5
In the Erlang loss model, the departure process (including both customers completing service and those that are lost) is a Poisson process at rate λ.

Proof The above follows since in the reversed process arrivals of all customers (including those that are lost) constitute a Poisson process.
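Numerically, (5.7.9) is best computed by accumulating the terms a^n/n! recursively rather than with factorials. A minimal sketch; the offered load value is illustrative, and by the insensitivity property only the product λE[S] matters:

```python
def erlang_loss(offered_load, k):
    """Erlang loss probabilities (5.7.9), with offered_load = lam * E[S].
    Only the mean of the service distribution enters."""
    a = offered_load
    weights, w = [1.0], 1.0
    for n in range(1, k + 1):
        w *= a / n                 # running value of a**n / n!
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

P = erlang_loss(offered_load=3.0, k=5)
print(P)     # P[n] = P{n in system}; P[k] is the probability an arrival is lost
```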
5.7.3 The M/G/1 Shared Processor System

Suppose that customers arrive in accordance with a Poisson process having rate λ. Each customer requires a random amount of work, distributed according to G. The server can process work at a rate of one unit of work per unit time, and divides his time equally among all of the customers presently in the system. That is, whenever there are n customers in the system, each will receive service work at a rate of 1/n per unit time. Let λ(t) = g(t)/Ḡ(t) denote the failure rate function of the service distribution, and suppose that λE[S] < 1, where E[S] is the mean of G.

To analyze the above, let the state at any time be the ordered vector of the amounts of work already performed on customers still in the system. That is, the state is x = (x_1, x_2, ..., x_n), x_1 ≤ x_2 ≤ ··· ≤ x_n, if there are n customers in the system and x_1, ..., x_n are the amounts of work performed on these n customers. Let p(x) and P(φ) denote the limiting probability density and the limiting probability that the system is empty. We make the following conjecture regarding the reverse process.

Conjecture The reverse process is a system of the same type, with customers arriving at a Poisson rate λ, having workloads distributed according to G, and with the state representing the ordered residual workloads of customers presently in the system.

To verify the above conjecture, and at the same time obtain the limiting distribution, let e_i(x) = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_n) when x = (x_1, ..., x_n), x_1 ≤ x_2 ≤ ··· ≤ x_n. Note that

    in forward time: x → e_i(x) with probability intensity λ(x_i)/n;
    in reverse time: e_i(x) → x with (joint) probability intensity λg(x_i).

The above follows as in the previous section, with the exception that if there are n in the system, then a customer who already had the amount of work x_i performed on it will instantaneously complete service with probability intensity λ(x_i)/n. Hence, if p(x) is the limiting density, then in accordance with Theorem 5.7.1 we need that

    p(x)λ(x_i)/n = p(e_i(x))λg(x_i),

or, equivalently,

(5.7.10)
    p(x) = nλḠ(x_i)p(e_i(x)) = ··· = n!λ^n P(φ) Π_{i=1}^n Ḡ(x_i).

Integrating over all vectors x yields, as in (5.7.6),

    P{n in system} = (λE[S])^n P(φ).

Using

    P(φ) + Σ_{n=1}^∞ P{n in system} = 1

gives

    P{n in system} = (λE[S])^n (1 - λE[S]),   n ≥ 0.

Also, the conditional distribution of the ordered amounts of work already performed, given n in the system, is, from (5.7.10),

    p(x | n) = p(x)/P{n in system} = n! Π_{i=1}^n Ḡ(x_i)/E[S].

That is, given n customers in the system, the unordered amounts of work already performed are distributed independently according to G_e, the equilibrium distribution of G.

All of the above is based on the assumption that the conjecture is valid. To complete the proof of its validity we must verify that

    p(x)λ = p(0, x)/(n + 1),

the above being the relevant equation since the reverse process, when in state (ε, x), will go to state x in time (n + 1)ε. Since the above is easily verified, we have thus shown the following.

THEOREM 5.7.6
For the processor sharing model, the number of customers in the system has the distribution

    P{n in system} = (λE[S])^n(1 - λE[S]),   n ≥ 0.

Given n in the system, the completed (or residual) workloads are independent and have distribution G_e. The departure process is a Poisson process with rate λ.

If we let L denote the average number in the system, and W the average time a customer spends in the system, then

    L = Σ_{n=0}^∞ n(λE[S])^n(1 - λE[S]) = λE[S]/(1 - λE[S]).

We can obtain W from the formula L = λW (see Section 3.6.1 of Chapter 3), and so

    W = L/λ = E[S]/(1 - λE[S]).
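Theorem 5.7.6 and the formulas for L and W are one-liners to evaluate; the sketch below packages them, with illustrative inputs.

```python
def processor_sharing(lam, ES):
    """Steady-state quantities for the M/G/1 shared-processor system
    (Theorem 5.7.6); everything depends on G only through its mean ES."""
    rho = lam * ES
    assert rho < 1, "need lam * E[S] < 1"
    pn = lambda n: rho ** n * (1 - rho)    # P{n in system}
    L = rho / (1 - rho)                    # average number in system
    W = ES / (1 - rho)                     # average time in system (L = lam*W)
    return pn, L, W

pn, L, W = processor_sharing(lam=0.5, ES=1.0)
print(pn(0), pn(3), L, W)
```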
Remark It is quite interesting that the average time a customer spends in the system depends on G only through its mean. For instance, consider two such systems, one in which the workloads are identically 1 and the other where they are exponentially distributed with mean 1. Then from Theorem 5.7.6 the distribution of the number in the system as seen by an arrival is the same for both. However, in the first system the remaining workloads of those found by an arrival are uniformly distributed over (0, 1) (this being the equilibrium distribution of a deterministic random variable with mean 1), whereas in the second system the remaining workloads are exponentially distributed with mean 1. Hence it is quite surprising that the mean time in system for an arrival is the same in both cases. Of course the distribution of time spent by a customer in the system, as opposed to the mean of this distribution, depends on the entire distribution G and not just on its mean.

Another interesting computation in this model is the conditional mean time an arrival spends in the system given that its workload is y. To compute this quantity, fix y and say that a customer is "special" if its workload is between y and y + ε. By L = λW we thus have that

    average number of special customers in the system
        = (average arrival rate of special customers)
          × (average time a special customer spends in the system).

To determine the average number of special customers in the system, let us first determine the density of the total workload of an arbitrary customer presently in the system. Suppose such a customer has already received the amount of work x. Then the conditional density of the customer's workload is

    f(w | has received x) = g(w)/Ḡ(x),   w > x.

But, from Theorem 5.7.6, the amount of work an arbitrary customer in the system has already received has the distribution G_e. Hence the density of the total workload of someone present in the system is

    f(w) = ∫_0^w (g(w)/Ḡ(x)) dG_e(x) = ∫_0^w (g(w)/E[S]) dx = wg(w)/E[S]

    ( since dG_e(x) = (Ḡ(x)/E[S]) dx ).

Hence the average number of special customers in the system is

    E[number in system having workload between y and y + ε] = Lf(y)ε + o(ε) = Lyg(y)ε/E[S] + o(ε).

In addition, the average arrival rate of customers whose workload is between y and y + ε is

    average arrival rate = λg(y)ε + o(ε).

Hence we see that

    E[time in system | workload in (y, y + ε)] = Lyg(y)ε/E[S] / (λg(y)ε) + o(ε)/ε.

Letting ε → 0 we obtain

(5.7.11)
    E[time in system | workload is y] = Ly/(λE[S]) = y/(1 - λE[S]).

Thus the average time in the system of a customer needing y units of work also depends on the service distribution only through its mean. As a check of the above formula, note that

    W = E[time in system] = ∫ E[time in system | workload is y] dG(y) = E[S]/(1 - λE[S])   (from (5.7.11)),

which checks.

5.8 UNIFORMIZATION

Consider a continuous-time Markov chain in which the mean time spent in a state is the same for all states. That is, suppose that ν_i = ν for all states i. In this case, since the amount of time spent in each state during a visit is exponentially distributed with rate ν, it follows that if we let N(t) denote the number of state transitions by time t, then {N(t), t ≥ 0} will be a Poisson process with rate ν.

To compute the transition probabilities P_ij(t), we can condition on N(t) as follows:

    P_ij(t) = P{X(t) = j | X(0) = i}
            = Σ_{n=0}^∞ P{X(t) = j | X(0) = i, N(t) = n} P{N(t) = n | X(0) = i}
            = Σ_{n=0}^∞ P{X(t) = j | X(0) = i, N(t) = n} e^{-νt}(νt)^n/n!.

The fact that there have been n transitions by time t gives us some information about the amounts of time spent in each of the first n states visited, but as the distribution of time spent in each state is the same for all states, it gives us no information about which states were visited. Hence,

    P{X(t) = j | X(0) = i, N(t) = n} = P^n_ij,

where P^n_ij is just the n-stage transition probability associated with the discrete-time Markov chain with transition probabilities P_ij; and so when ν_i ≡ ν,

(5.8.1)
    P_ij(t) = Σ_{n=0}^∞ P^n_ij e^{-νt}(νt)^n/n!.
The above equation is quite useful from a computational point of view, since it enables us to approximate P_ij(t) by taking a partial sum and then computing (by matrix multiplication of the transition probability matrix) the relevant n-stage probabilities P^n_ij.

Whereas the applicability of Equation (5.8.1) would appear to be quite limited, since it supposes that ν_i ≡ ν, it turns out that most Markov chains can be put in that form by the trick of allowing fictitious transitions from a state to itself. To see how this works, consider any Markov chain for which the ν_i are bounded, and let ν be any number such that

(5.8.2)
    ν_i ≤ ν   for all i.

Now, when in state i the process actually leaves at rate ν_i; but this is equivalent to supposing that transitions occur at rate ν, but only the fraction ν_i/ν are real transitions out of i and the remainder are fictitious transitions that leave the process in state i. In other words, any Markov chain satisfying condition (5.8.2) can be thought of as being a process that spends an exponential amount of time with rate ν in state i and then makes a transition to j with probability P*_ij, where

(5.8.3)
    P*_ii = 1 - ν_i/ν,
    P*_ij = q_ij/ν,   j ≠ i.

Hence, from (5.8.1), we have that the transition probabilities can be computed by

    P_ij(t) = Σ_{n=0}^∞ P*^n_ij e^{-νt}(νt)^n/n!,

where the P*^n_ij are the n-stage transition probabilities corresponding to (5.8.3). This technique of uniformizing the rate at which a transition occurs from each state, by introducing transitions from a state to itself, is known as uniformization.

EXAMPLE 5.8(A) Let us reconsider the two-state chain of Example 5.4(A), which has

    P_01 = P_10 = 1,
    ν_0 = λ,   ν_1 = μ.

Letting ν = λ + μ, the uniformized version of the above is to consider it a continuous-time Markov chain with

    P*_00 = μ/(λ + μ) = 1 - P*_01,
    P*_10 = μ/(λ + μ) = 1 - P*_11,
    ν_i = λ + μ,   i = 0, 1.

Since P*_00 = P*_10, it follows that the probability of a transition into state 0 is equal to μ/(λ + μ) no matter what the present state. Since a similar result is true for state 1, it follows that the n-stage transition probabilities are given by

    P*^n_i0 = μ/(λ + μ),   n ≥ 1, i = 0, 1.

Hence,

    P_00(t) = e^{-(λ+μ)t} + Σ_{n=1}^∞ e^{-(λ+μ)t}((λ + μ)t)^n/n! · μ/(λ + μ)
            = e^{-(λ+μ)t} + (1 - e^{-(λ+μ)t})μ/(λ + μ)
            = μ/(λ + μ) + (λ/(λ + μ))e^{-(λ+μ)t}.

Similarly,

    P_10(t) = (1 - e^{-(λ+μ)t})μ/(λ + μ) = μ/(λ + μ) - (μ/(λ + μ))e^{-(λ+μ)t}.
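The uniformization recipe (5.8.3) translates directly into code: pick ν = max_i ν_i, form P* = I + Q/ν, and accumulate the Poisson-weighted powers. The sketch below (the truncation level and rate values are illustrative assumptions) reproduces the closed-form P_00(t) of Example 5.8(A).

```python
import numpy as np

def uniformized_P(Q, t, terms=200):
    """Approximate P(t) by uniformization: with v >= max v_i, set
    P*_ij = q_ij/v (j != i), P*_ii = 1 - v_i/v, and sum the Poisson series."""
    v = max(-np.diag(Q))
    Pstar = np.eye(Q.shape[0]) + Q / v
    result = np.zeros_like(Q)
    term = np.eye(Q.shape[0])          # P*^0
    weight = np.exp(-v * t)            # e^{-vt} (vt)^n / n! for n = 0
    for n in range(terms):
        result += weight * term
        term = term @ Pstar
        weight *= v * t / (n + 1)
    return result

lam, mu = 2.0, 3.0
Q = np.array([[-lam, lam], [mu, -mu]])
print(uniformized_P(Q, 1.0)[0, 0])
print(mu / (lam + mu) + lam / (lam + mu) * np.exp(-(lam + mu)))  # closed form
```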
Another quantity of interest is the total time up to t that the process has been in a given state. That is, if X(t) denotes the state at t (either 0 or 1), then if we let S_i(t), i = 0, 1, be given by

(5.8.4)
    S_1(t) = ∫_0^t X(s) ds,
    S_0(t) = ∫_0^t (1 - X(s)) ds,

then S_i(t) denotes the time spent in state i by time t. The stochastic process {S_i(t), t ≥ 0} is called the occupation time process for state i. Let us now suppose that X(0) = 0 and determine the distribution of S_0(t). Its mean is computed from (5.8.4):

    E[S_0(t) | X(0) = 0] = ∫_0^t E[1 - X(s)] ds = ∫_0^t P_00(s) ds.

We will not attempt to compute the variance of S_0(t) (see Problem 5.36), but rather will directly consider its distribution. To determine the conditional distribution of S_0(t) given that X(0) = 0, we shall make use of the uniformized representation of {X(t), t ≥ 0} and start by conditioning on N(t) to obtain, for s < t,

    P{S_0(t) ≤ s} = Σ_{n=1}^∞ e^{-(λ+μ)t}((λ + μ)t)^n/n! P{S_0(t) ≤ s | N(t) = n},

where the n = 0 term drops out since N(t) = 0 implies that S_0(t) = t (since X(0) = 0). Now, given that N(t) = n, the interval (0, t) is broken into the n + 1 subintervals (0, X_(1)), (X_(1), X_(2)), ..., (X_(n-1), X_(n)), (X_(n), t), where X_(1) ≤ X_(2) ≤ ··· ≤ X_(n) are the n event times by t of {N(s)}. The process {X(t)} will be in the same state throughout any given subinterval. It will be in state 0 for the first subinterval, and afterwards it will, independently for each subinterval, be in state 0 with probability μ/(λ + μ). Hence, given N(t) = n, the number of subintervals in which the process is in state 0 equals 1 plus a Binomial (n, μ/(λ + μ)) random variable. If this sum equals k (that is, if 1 + Bin(n, μ/(λ + μ)) = k), then S_0(t) will equal the sum of the lengths of k of the above subintervals. However, as X_(1), ..., X_(n) are distributed as the order statistics of a set of n independent uniform (0, t) random variables, it follows that the joint distribution of the n + 1 subinterval lengths Y_1, Y_2, ..., Y_{n+1}, where

    Y_i = X_(i) - X_(i-1),   i = 1, ..., n + 1, with X_(0) = 0, X_(n+1) = t,

is exchangeable. That is, P{Y_i ≤ y_i, i = 1, ..., n + 1} is a symmetric function of the vector (y_1, ..., y_{n+1}). (See Problem 5.37.) Therefore, the distribution of the sum of any k of the Y_i's is the same no matter which ones are in the sum. Finally, as

    Y_1 + ··· + Y_k = X_(k),

we see that when k ≤ n this sum has the same distribution as the kth smallest of a set of n independent uniform (0, t) random variables. (When k = n + 1 the sum equals t.) Hence, for s < t,

    P{S_0(t) ≤ s | N(t) = n} = Σ_{k=1}^n C(n, k - 1)(μ/(λ + μ))^{k-1}(λ/(λ + μ))^{n-k+1} P{X_(k) ≤ s},

where

    P{X_(k) ≤ s} = Σ_{j=k}^n C(n, j)(s/t)^j(1 - s/t)^{n-j}.

Putting the above together gives, for s < t,

    P{S_0(t) ≤ s | X(0) = 0}
      = Σ_{n=1}^∞ e^{-(λ+μ)t} ((λ + μ)t)^n/n! Σ_{k=1}^n C(n, k - 1)(μ/(λ + μ))^{k-1}(λ/(λ + μ))^{n-k+1} Σ_{j=k}^n C(n, j)(s/t)^j(1 - s/t)^{n-j}.

Also,

    P{S_0(t) = t | X(0) = 0} = e^{-λt}.

PROBLEMS

5.1. A population of organisms consists of both male and female members. In a small colony, any particular male is likely to mate with any particular female in any time interval of length h with probability λh + o(h). Each mating immediately produces one offspring, equally likely to be male or female. Let N_1(t) and N_2(t) denote the numbers of males and females in the population at t. Derive the parameters of the continuous-time Markov chain {(N_1(t), N_2(t))}.

5.2. Suppose that a one-celled organism can be in one of two states, either A or B. An individual in state A will change to state B at an exponential rate α; an individual in state B divides into two new individuals of type A at an exponential rate β. Define an appropriate continuous-time Markov chain for a population of such organisms and determine the appropriate parameters for this model.

5.3. Show that a continuous-time Markov chain is regular, given (a) that ν_i ≤ M < ∞ for all i, or (b) that the discrete-time Markov chain with transition probabilities P_ij is irreducible and recurrent.

5.4. For a pure birth process with birth parameters λ_n, n ≥ 0, compute the mean, variance, and moment generating function of the time it takes the population to go from size 0 to size N.

5.5. Consider a Yule process with X(0) = i. Given that X(t) = i + k, what can be said about the conditional distribution of the birth times of the k individuals born in (0, t)?

5.6. Verify the formula

    A(t) = a_0 + ∫_0^t X(s) ds,

given in Example 5.3(B).

5.7. Consider a Yule process starting with a single individual, and suppose that with probability p(s) an individual born at time s will be robust. Compute the distribution of the number of robust individuals born in (0, t).

5.8. Prove Lemma 5.4.1.

5.9. Prove Lemma 5.4.2.

5.10. Let P(t) = P_00(t).
(a) Find lim_{t→0} (1 - P(t))/t.
(b) Show that

    P(t)P(s) ≤ P(t + s) ≤ 1 - P(s) + P(s)P(t).

(c) Show that

    |P(t) - P(s)| ≤ 1 - P(t - s),   s < t,

and conclude that P is continuous.

5.11. For the Yule process:
(a) Verify that

    P_ij(t) = C(j - 1, i - 1)e^{-iλt}(1 - e^{-λt})^{j-i},   j ≥ i,

satisfies the forward and backward equations.
(b) Suppose that X(0) = 1 and that at time T the process stops and is replaced by an emigration process in which departures occur in a Poisson process of rate μ. Let τ denote the time taken after T for the population to vanish. Find the density function of τ and show that E[τ] = e^{λT}/μ.

5.12. Suppose that the "state" of the system can be modeled as a two-state continuous-time Markov chain with transition rates ν_0 = λ, ν_1 = μ. When the state of the system is i, "events" occur in accordance with a Poisson process with rate α_i, i = 0, 1. Let N(t) denote the number of events in (0, t).
(a) Find lim_{t→∞} N(t)/t.
(b) If the initial state is state 0, find E[N(t)].

5.13. Consider a birth and death process with birth rates {λ_n} and death rates {μ_n}. Starting in state i, find the probability that the first k events are all births.

5.14. Consider a population of size n, some of whom are infected with a certain virus. Suppose that in an interval of length h any specified pair of individuals will independently interact with probability λh + o(h). If exactly one of the individuals involved in the interaction is infected, then the other one becomes infected with probability α. If there is a single infected individual at time 0, find the expected time at which the entire population is infected.

5.15. Consider a population in which each individual independently gives birth at an exponential rate λ and dies at an exponential rate μ. In addition, new members enter the population in accordance with a Poisson process with rate θ. Let X(t) denote the population size at time t.
(a) What type of process is {X(t), t ≥ 0}?
(b) What are its parameters?
(c) Find E[X(t) | X(0) = i].

5.16. In Example 5.4(D), find the variance of the number of males in the population at time t.

5.17. Let A be a specified set of states of a continuous-time Markov chain, and let T_i(t) denote the amount of time spent in A during the time interval [0, t] given that the chain begins in state i. Let Y_1, ..., Y_n be independent exponential random variables with mean λ. Suppose the Y_i are independent of the Markov chain, and set t_i(n) = E[T_i(Y_1 + ··· + Y_n)].
(a) Derive a set of linear equations for t_i(1), i ≥ 0.
(b) Derive a set of linear equations for t_i(n) in terms of the other t_j(n) and the t_j(n - 1).
(c) When n is large, for what value of λ is t_i(n) a good approximation of E[T_i(t)]?

5.18. Consider a continuous-time Markov chain with X(0) = 0. Let A denote a set of states that does not include 0, and set T = min{t > 0: X(t) ∈ A}. Suppose that T is finite with probability 1. Set q_i = Σ_{j∈A} q_ij, and consider the random variable H = ∫_0^T q_{X(t)} dt, called the random hazard.
(a) Find the hazard rate function of H. That is, find lim_{h→0} P{s < H < s + h | H > s}/h. (Hint: Condition on the state of the chain at the time τ at which ∫_0^τ q_{X(t)} dt = s.)
(b) What is the distribution of H?

5.19. Consider a continuous-time Markov chain with stationary probabilities {P_i, i ≥ 0}, and let T denote the first time that the chain has been in state 0 for t consecutive time units. Find E[T | X(0) = 0].

5.20. Each individual in a biological population is assumed to give birth at an exponential rate λ and to die at an exponential rate μ. In addition, there is an exponential rate of increase θ due to immigration. However, immigration is not allowed when the population size is N or larger.
(a) Set this up as a birth and death model.
(b) If N = 3, λ = θ = 1, μ = 2, determine the proportion of time that immigration is restricted.

5.21. A small barbershop, operated by a single barber, has room for at most two customers. Potential customers arrive at a Poisson rate of three per hour, and the successive service times are independent exponential random variables with mean ¼ hour.
(a) What is the average number of customers in the shop?
(b) What is the proportion of potential customers that enter the shop?
(c) If the barber could work twice as fast, how much more business would she do?

5.22. Find the limiting probabilities for the M/M/s system and determine the condition needed for these to exist.

5.23. If {X(t), t ≥ 0} and {Y(t), t ≥ 0} are independent time-reversible Markov chains, show that the process {(X(t), Y(t)), t ≥ 0} is also.

5.24. Consider two M/M/1 queues with respective parameters λ_i, μ_i, where λ_i < μ_i, i = 1, 2. Suppose they both share the same waiting room, which has finite capacity N. (That is, whenever this room is full all potential arrivals to either queue are lost.) Compute the limiting probability that there will be n people at the first queue (1 being served and n − 1 in the waiting room when n > 0) and m at the second. (Hint: Use the result of Problem 5.23.)

5.25. What can you say about the departure process of the stationary M/M/1 queue having finite capacity?

5.26. In the stochastic population model of Section 5.6.2:
(a) Show that P(n) is as given by (5.6.4) with α_j = (λ/jν)(ν/μ)^j.
(b) Let D(t) denote the number of families that die out in (0, t). Assuming that the process is in steady state at time t = 0, what type of stochastic process is {D(t), t ≥ 0}? What if the population is initially empty at t = 0?

5.27. Complete the proof of the conjecture in the queueing network model of Section 5.7.1.

5.28. N customers move about among r servers. The service times at server i are exponential at rate μ_i, and when a customer leaves server i it joins the queue (if there are any waiting, or else it enters service) at server j, j ≠ i, with probability 1/(r − 1). Let the state be (n_1, ..., n_r) when there are n_i customers at server i, i = 1, ..., r. Show that the corresponding continuous-time Markov chain is time reversible and find the limiting probabilities.

5.29. Consider a time-reversible continuous-time Markov chain having parameters v_i, P_{ij} and having limiting probabilities P_j, j ≥ 0. Choose some state, say state 0, and consider the new Markov chain, which makes state 0 an absorbing state. That is, reset v_0 to equal 0. Suppose now that at time points chosen according to a Poisson process with rate λ, Markov chains, all of the above type (having 0 as an absorbing state), are started with the initial states chosen to be j with probability P_{0j}. All the existing chains are assumed to be independent. Let N_j(t) denote the number of chains in state j, j > 0, at time t.
(a) Argue that if there are no initial chains, then N_j(t), j > 0, are independent Poisson random variables.
(b) In steady state argue that the vector process {(N_1(t), N_2(t), ...), t ≥ 0} is time reversible with stationary probabilities

    P(n) = Π_{j=1}^{∞} e^{−α_j} α_j^{n_j}/n_j!.

5.30. Consider an M/M/∞ queue with channels (servers) numbered 1, 2, .... On arrival, a customer will choose the lowest numbered channel that is free.
Thus, we can think of all arrivals as occurring at channel 1. Those who find channel 1 busy overflow and become arrivals at channel 2. Those finding both channels 1 and 2 busy overflow channel 2 and become arrivals at channel 3, and so on.
(a) What fraction of time is channel 1 busy?
(b) By considering the corresponding M/M/2 loss system, determine the fraction of time that channel 2 is busy.
(c) Write down an expression for the fraction of time channel c is busy for arbitrary c.
(d) What is the overflow rate from channel c to channel c + 1? Is the corresponding overflow process a Poisson process? A renewal process? Explain briefly.
(e) If the service distribution were general rather than exponential, which (if any) of your answers to (a)-(d) would change? Briefly explain.

5.31. Prove Theorem 5.7.1.

5.32. (a) Prove that a stationary Markov process is reversible if, and only if, its transition rates satisfy

    q(j_1, j_2)q(j_2, j_3) ··· q(j_{n−1}, j_n)q(j_n, j_1) = q(j_1, j_n)q(j_n, j_{n−1}) ··· q(j_3, j_2)q(j_2, j_1)

for any finite sequence of states j_1, j_2, ..., j_n.
(b) Argue that it suffices to verify that the equality in (a) holds for sequences of distinct states.
(c) Suppose that the stream of customers arriving at a queue forms a Poisson process of rate ν and that there are two servers who possibly differ in efficiency. Specifically, suppose that a customer's service time at server i is exponentially distributed with rate μ_i, for i = 1, 2, where μ_1 + μ_2 > ν. If a customer arrives to find both servers free, he is equally likely to be allocated to either server. Define an appropriate continuous-time Markov chain for this model, show that it is time reversible, and determine the limiting probabilities.

5.33. The work in a queueing system at any time is defined as the sum of the remaining service times of all customers in the system at that time. For the M/G/1 in steady state compute the mean and variance of the work in the system.

5.34. Consider an ergodic continuous-time Markov chain, with transition rates q_{ij}, in steady state. Let P_j, j ≥ 0, denote the stationary probabilities. Suppose the state space is partitioned into two subsets B and B^c = G.
(a) Compute the probability that the process is in state i, i ∈ B, given that it is in B. That is, compute P{X(t) = i | X(t) ∈ B}.
(b) Compute the probability that the process is in state i, i ∈ B, given that it has just entered B. That is, compute P{X(t) = i | X(t) ∈ B, X(t^−) ∈ G}.
(c) For i ∈ B, let T_i denote the time until the process enters G given that it is in state i, and let F_i(s) = E[e^{−sT_i}]. Argue that

    F_i(s) = (v_i/(v_i + s))[Σ_{j∈B} F_j(s)P_{ij} + Σ_{j∈G} P_{ij}],

where P_{ij} = q_{ij}/Σ_j q_{ij}.
(d) Argue that

    Σ_{i∈B} Σ_{j∈G} P_i q_{ij} = Σ_{i∈G} Σ_{j∈B} P_i q_{ij}.

(e) Show by using (c) and (d) that

    s Σ_{i∈B} P_i F_i(s) = Σ_{i∈G} Σ_{j∈B} P_i q_{ij}(1 − F_j(s)).

(f) Given that the process has just entered B from G, let T_V denote the time until it leaves B. Use part (b) to conclude that

    E[e^{−sT_V}] = Σ_{i∈G} Σ_{j∈B} P_i q_{ij} F_j(s) / Σ_{i∈G} Σ_{j∈B} P_i q_{ij}.

(g) Using (e) and (f) argue that

    E[e^{−sT_V}] = 1 − s Σ_{i∈B} P_i F_i(s) / Σ_{i∈G} Σ_{j∈B} P_i q_{ij}.

(h) Given that the process is in a state in B, let T_1 denote the time until it leaves B. Use (a), (e), (f), and (g) to show that

    E[e^{−sT_1}] = (1 − E[e^{−sT_V}]) / (s E[T_V]).

(i) Use (h) and the uniqueness of Laplace transforms to conclude that

    P{T_1 ≤ t} = ∫_0^t P{T_V > s} ds / E[T_V].

(j) Use (i) to show that

    E[T_1] = E[T_V^2]/(2E[T_V]) ≥ E[T_V]/2.

The random variable T_V is called the visit or sojourn time in the set of states B. It represents the time spent in B during a visit. The random variable T_1, called the exit time from B, represents the remaining time in B given that the process is presently in B.
The results of the above problem show that the distributions of T_1 and T_V possess the same structural relation as the distribution of the excess or residual life of a renewal process at steady state and the distribution of the time between successive renewals.

5.35. Consider a renewal process whose interarrival distribution F is a mixture of two exponentials; that is,

    F̄(x) = p e^{−λ_1 x} + q e^{−λ_2 x},    q = 1 − p.

Compute the renewal function E[N(t)].
Hint: Imagine that at each renewal a coin, having probability p of landing heads, is flipped. If heads appears, the next interarrival is exponential with rate λ_1, and if tails, it is exponential with rate λ_2. Let R(t) = i if the exponential rate at time t is λ_i. Then:
(a) determine P{R(t) = i}, i = 1, 2;
(b) argue that

    E[N(t)] = E[∫_0^t Λ(s) ds] = Σ_{i=1}^{2} λ_i ∫_0^t P{R(s) = i} ds,

where Λ(s) = λ_{R(s)}.

5.36. Consider the two-state Markov chain of Example 5.8(A), with X(0) = 0.
(a) Compute Cov(X(s), X(t)).
(b) Let S_0(t) denote the occupation time of state 0 by t. Use (a) and (5.8.4) to compute Var(S_0(t)).

5.37. Let Y_i = X_{(i)} − X_{(i−1)}, i = 1, ..., n + 1, where X_{(0)} = 0, X_{(n+1)} = t, and X_{(1)} ≤ X_{(2)} ≤ ··· ≤ X_{(n)} are the ordered values of a set of n independent uniform (0, t) random variables. Argue that P{Y_i ≤ y_i, i = 1, ..., n + 1} is a symmetric function of y_1, ..., y_{n+1}.

REFERENCES

References 1, 2, 3, 7, and 8 provide many examples of the uses of continuous-time Markov chains. For additional material on time reversibility the reader should consult References 5, 6, and 10. Additional material on queueing networks can be found in References 6, 9, 10, and 11.

1. N. Bailey, The Elements of Stochastic Processes with Application to the Natural Sciences, Wiley, New York, 1964.
2. D. J. Bartholomew, Stochastic Models for Social Processes, 2nd ed., Wiley, London, 1973.
3. M. S. Bartlett, An Introduction to Stochastic Processes, 3rd ed., Cambridge University Press, Cambridge, England, 1978.
4. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes, Chapman and Hall, London, 1965.
5. J. Keilson, Markov Chain Models - Rarity and Exponentiality, Springer-Verlag, Berlin, 1979.
6. F. Kelly, Reversibility and Stochastic Networks, Wiley, New York, 1979.
7. E. Renshaw, Modelling Biological Populations in Space and Time, Cambridge University Press, Cambridge, England, 1991.
8. H. C. Tijms, Stochastic Models: An Algorithmic Approach, Wiley, Chichester, England, 1994.
9. N. Van Dijk, Queueing Networks and Product Forms, Wiley, New York, 1993.
10. J. Walrand, Introduction to Queueing Networks, Prentice-Hall, Englewood Cliffs, NJ, 1988.
11. P. Whittle, Systems in Stochastic Equilibrium, Wiley, Chichester, England, 1986.

CHAPTER 6

Martingales

INTRODUCTION

In this chapter we consider a type of stochastic process, known as a martingale, whose definition formalizes the concept of a fair game. As we shall see, not only are these processes inherently interesting, but they also are powerful tools for analyzing a variety of stochastic processes.

Martingales are defined and examples are presented in Section 6.1. In Section 6.2 we introduce the concept of a stopping time and prove the useful martingale stopping theorem. In Section 6.3 we derive and apply Azuma's inequality, which yields bounds on the tail probabilities of a martingale whose incremental changes can be bounded. In Section 6.4 we present the important martingale convergence theorem and, among other applications, show how it can be utilized to prove the strong law of large numbers. Finally, in Section 6.5 we derive an extension of Azuma's inequality.
6.1 MARTINGALES

A stochastic process {Z_n, n ≥ 1} is said to be a martingale process if E[|Z_n|] < ∞ for all n and

(6.1.1)    E[Z_{n+1} | Z_1, ..., Z_n] = Z_n.

A martingale is a generalized version of a fair game. For if we interpret Z_n as a gambler's fortune after the nth gamble, then (6.1.1) states that his expected fortune after the (n + 1)st gamble is equal to his fortune after the nth gamble no matter what may have previously occurred. Taking expectations of (6.1.1) gives

    E[Z_{n+1}] = E[Z_n],

and so E[Z_n] = E[Z_1] for all n.

SOME EXAMPLES OF MARTINGALES

(1) Let X_1, X_2, ... be independent random variables with 0 mean, and let Z_n = Σ_{i=1}^{n} X_i. Then {Z_n, n ≥ 1} is a martingale since

    E[Z_{n+1} | Z_1, ..., Z_n] = E[Z_n + X_{n+1} | Z_1, ..., Z_n]
                              = E[Z_n | Z_1, ..., Z_n] + E[X_{n+1} | Z_1, ..., Z_n]
                              = Z_n + E[X_{n+1}] = Z_n.

(2) If X_1, X_2, ... are independent random variables with E[X_i] = 1, then {Z_n, n ≥ 1} is a martingale when Z_n = Π_{i=1}^{n} X_i. This follows since

    E[Z_{n+1} | Z_1, ..., Z_n] = E[Z_n X_{n+1} | Z_1, ..., Z_n]
                              = Z_n E[X_{n+1} | Z_1, ..., Z_n]
                              = Z_n E[X_{n+1}] = Z_n.

(3) Consider a branching process (see Section 4.5 of Chapter 4) and let X_n denote the size of the nth generation. If m is the mean number of offspring per individual, then {Z_n, n ≥ 1} is a martingale when

    Z_n = X_n/m^n.

We leave the verification as an exercise.

Since the conditional expectation E[X | U] satisfies all the properties of ordinary expectations, except that all probabilities are now computed conditional on the value of U, it follows from the identity E[X] = E[E[X | Y]] that

(6.1.2)    E[X | U] = E[E[X | Y, U] | U].

It is sometimes convenient to show that {Z_n, n ≥ 1} is a martingale by considering the conditional expectation of Z_{n+1} given not only Z_1, ..., Z_n but also some other random vector Y. If we can then show that

    E[Z_{n+1} | Z_1, ..., Z_n, Y] = Z_n,

then it follows that Equation (6.1.1) is satisfied. This is so because if the preceding equation holds, then

    E[Z_{n+1} | Z_1, ..., Z_n] = E[E[Z_{n+1} | Z_1, ..., Z_n, Y] | Z_1, ..., Z_n] = E[Z_n | Z_1, ..., Z_n] = Z_n.

ADDITIONAL EXAMPLES OF MARTINGALES

(4) Let X, Y_1, Y_2, ... be arbitrary random variables such that E[|X|] < ∞, and let

    Z_n = E[X | Y_1, ..., Y_n].

It then follows that {Z_n, n ≥ 1} is a martingale. To show this we will compute the conditional expectation of Z_{n+1} given not Z_1, ..., Z_n but the more informative random variables Y_1, ..., Y_n (which is equivalent to conditioning on Z_1, ..., Z_n, Y_1, ..., Y_n). This yields

    E[Z_{n+1} | Y_1, ..., Y_n] = E[E[X | Y_1, ..., Y_n, Y_{n+1}] | Y_1, ..., Y_n]
                               = E[X | Y_1, ..., Y_n]    (from (6.1.2)),

and the result follows. This martingale, often called a Doob type martingale, has important applications. For instance, suppose X is a random variable whose value we want to predict and suppose that data Y_1, Y_2, ... are accumulated sequentially. Then, as shown in Section 1.9, the predictor of X that minimizes the expected squared error given the data Y_1, ..., Y_n is just E[X | Y_1, ..., Y_n], and so the sequence of optimal predictors constitutes a Doob type martingale.

(5) Our next example generalizes the fact that the partial sums of independent random variables having mean 0 form a martingale. For any random variables X_1, X_2, ..., the random variables X_i − E[X_i | X_1, ..., X_{i−1}], i ≥ 1, have mean 0. Even though they need not be independent, their partial sums constitute a martingale. That is, if

    Z_n = Σ_{i=1}^{n} (X_i − E[X_i | X_1, ..., X_{i−1}]),

then, provided E[|Z_n|] < ∞ for all n, {Z_n, n ≥ 1} is a martingale with mean 0. To verify this, note that

    Z_{n+1} = Z_n + X_{n+1} − E[X_{n+1} | X_1, ..., X_n].

Conditioning on X_1, ..., X_n, which is more informative than Z_1, ..., Z_n (since the latter are all functions of X_1, ..., X_n), yields

    E[Z_{n+1} | X_1, ..., X_n] = Z_n + E[X_{n+1} | X_1, ..., X_n] − E[X_{n+1} | X_1, ..., X_n] = Z_n,

thus showing that {Z_n, n ≥ 1} is a martingale.
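The construction in example (5) is easy to check numerically. The following minimal Python sketch (not from the text; the transition probabilities are arbitrary choices) generates a dependent 0-1 sequence and verifies that the centered partial sums have mean 0 for every n, as a zero-mean martingale must.

    import random

    # Dependent 0/1 sequence: X_i is Bernoulli with a success probability
    # depending on the previous value (an arbitrary illustrative choice).
    def cond_p(prev):            # E[X_i | X_1, ..., X_{i-1}] = cond_p(X_{i-1})
        return 0.8 if prev == 1 else 0.3

    def sample_Z(n):
        """One realization of Z_n = sum_i (X_i - E[X_i | X_1, ..., X_{i-1}])."""
        prev, z = 0, 0.0
        for _ in range(n):
            p = cond_p(prev)
            x = 1 if random.random() < p else 0
            z += x - p
            prev = x
        return z

    for n in (1, 5, 25, 100):
        est = sum(sample_Z(n) for _ in range(50_000)) / 50_000
        print(f"n = {n:3d}: estimated E[Z_n] = {est:+.4f}  (martingale => exactly 0)")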
6.2 STOPPING TIMES

Definition
The positive integer-valued, possibly infinite, random variable N is said to be a random time for the process {Z_n, n ≥ 1} if the event {N = n} is determined by the random variables Z_1, ..., Z_n. That is, knowing Z_1, ..., Z_n tells us whether or not N = n. If P{N < ∞} = 1, then the random time N is said to be a stopping time.

Let N be a random time for the process {Z_n, n ≥ 1} and let

    Z̄_n = Z_n if n ≤ N,    Z̄_n = Z_N if n > N.

{Z̄_n, n ≥ 1} is called the stopped process.

PROPOSITION 6.2.1
If N is a random time for the martingale {Z_n}, then the stopped process {Z̄_n} is also a martingale.

Proof. Let

    I_n = 1 if N ≥ n,    I_n = 0 if N < n.

That is, I_n equals 1 if we haven't yet stopped after observing Z_1, ..., Z_{n−1}. We claim that

(6.2.1)    Z̄_n = Z̄_{n−1} + I_n(Z_n − Z_{n−1}).

To verify (6.2.1) consider two cases:
(i) N ≥ n. In this case, Z̄_n = Z_n, Z̄_{n−1} = Z_{n−1}, and I_n = 1, and (6.2.1) follows.
(ii) N < n. In this case, Z̄_{n−1} = Z̄_n = Z_N, I_n = 0, and (6.2.1) follows.

Now

(6.2.2)    E[Z̄_n | Z_1, ..., Z_{n−1}] = E[Z̄_{n−1} + I_n(Z_n − Z_{n−1}) | Z_1, ..., Z_{n−1}]
                                      = Z̄_{n−1} + I_n E[Z_n − Z_{n−1} | Z_1, ..., Z_{n−1}]
                                      = Z̄_{n−1},

where the next to last equality follows since both Z̄_{n−1} and I_n are determined by Z_1, ..., Z_{n−1}, and the last since {Z_n} is a martingale. We must prove that E[Z̄_n | Z̄_1, ..., Z̄_{n−1}] = Z̄_{n−1}. However, (6.2.2) implies this result since, if we know the values of Z_1, ..., Z_{n−1}, then we also know the values of Z̄_1, ..., Z̄_{n−1}.

Since the stopped process is also a martingale, and since Z̄_1 = Z_1, we have

(6.2.3)    E[Z̄_n] = E[Z_1] for all n.

Now let us suppose that the random time N is a stopping time, that is, P{N < ∞} = 1. Since Z̄_n = Z_n if n ≤ N and Z̄_n = Z_N if n > N, it follows that Z̄_n equals Z_N when n is sufficiently large. Hence, as n → ∞,

    Z̄_n → Z_N with probability 1.

Is it also true that

(6.2.4)    E[Z̄_n] → E[Z_N] as n → ∞?

Since E[Z̄_n] = E[Z_1] for all n, (6.2.4) states that

    E[Z_N] = E[Z_1].

It turns out that, subject to some regularity conditions, (6.2.4) is indeed valid. We state the following theorem without proof.

THEOREM 6.2.2 The Martingale Stopping Theorem
If either
(i) Z̄_n are uniformly bounded, or
(ii) N is bounded, or
(iii) E[N] < ∞, and there is an M < ∞ such that

    E[|Z_{n+1} − Z_n| | Z_1, ..., Z_n] < M,

then (6.2.4) is valid. Thus

    E[Z_N] = E[Z_1].

Theorem 6.2.2 states that in a fair game if a gambler uses a stopping time to decide when to quit, then his expected final fortune is equal to his expected initial fortune. Thus, in the expected value sense, no successful gambling system is possible provided one of the sufficient conditions of Theorem 6.2.2 is satisfied.

The martingale stopping theorem provides another proof of Wald's equation (Theorem 3.3.2).

Corollary 6.2.3 (Wald's Equation)
If X_i, i ≥ 1, are independent and identically distributed (iid) with E[|X|] < ∞ and if N is a stopping time for X_1, X_2, ... with E[N] < ∞, then

    E[Σ_{i=1}^{N} X_i] = E[N]E[X].

Proof. Let μ = E[X]. Since

    Z_n = Σ_{i=1}^{n} (X_i − μ)

is a martingale, it follows, if Theorem 6.2.2 is applicable, that

    E[Z_N] = E[Z_1] = 0.

But

    E[Z_N] = E[Σ_{i=1}^{N} (X_i − μ)] = E[Σ_{i=1}^{N} X_i − Nμ] = E[Σ_{i=1}^{N} X_i] − E[N]μ.

To show that Theorem 6.2.2 is applicable, we verify condition (iii). Now Z_{n+1} − Z_n = X_{n+1} − μ, and thus

    E[|Z_{n+1} − Z_n| | Z_1, ..., Z_n] = E[|X_{n+1} − μ|] ≤ E[|X|] + |μ|.
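Wald's equation is also easy to check by simulation. The short Python sketch below (an illustrative aside; the exponential step distribution, threshold c, and sample sizes are arbitrary choices) uses the stopping time N = min{n: S_n > c} and compares E[Σ_{i≤N} X_i] with E[N]E[X].

    import random

    # Stopping rule: stop the first time the running sum exceeds a threshold c.
    # This N is a legitimate stopping time (it depends only on X_1, ..., X_n).
    mu, c, reps = 2.0, 10.0, 100_000
    sumN = sumS = 0.0
    for _ in range(reps):
        s, n = 0.0, 0
        while s <= c:
            s += random.expovariate(1.0 / mu)   # X_i ~ exponential, E[X] = mu
            n += 1
        sumN += n
        sumS += s
    print("E[sum_{i<=N} X_i] ~", sumS / reps)
    print("E[N]E[X]          ~", (sumN / reps) * mu)   # Wald: the two agree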
In Example 3.5(A) we showed how to use Blackwell's theorem to compute the expected time until a specified pattern appears. The next example presents a martingale approach to this problem.

EXAMPLE 6.2(A) Computing the Mean Time Until a Given Pattern Occurs. Suppose that a sequence of independent and identically distributed discrete random variables is observed sequentially, one at each day. What is the expected number that must be observed until some given sequence appears? More specifically, suppose that each outcome is either 0, 1, or 2 with respective probabilities 1/2, 1/3, and 1/6, and we desire the expected time until the run 0 2 0 occurs. For instance, if the sequence of outcomes is 2, 1, 2, 0, 2, 1, 0, 1, 0, 0, 2, 0, then the required number N would equal 12.

To compute E[N] imagine a sequence of gamblers, each initially having 1 unit, playing at a fair gambling casino. Gambler i begins betting at the beginning of day i and bets her 1 unit that the value on that day will equal 0. If she wins (and thus has 2 units), she then bets the 2 units on the next outcome being 2, and if she wins this bet (and thus has 12 units), then all 12 units are bet on the next outcome being 0. Hence each gambler will lose 1 unit if any of her bets fails and will win 23 if all three of her bets succeed. At the beginning of each day another gambler starts playing. If we let X_n denote the total winnings of the casino after the nth day, then since all bets are fair it follows that {X_n, n ≥ 1} is a martingale with mean 0. Let N denote the time until the sequence 0 2 0 appears. Now at the end of day N each of the gamblers 1, ..., N − 3 would have lost 1 unit, gambler N − 2 would have won 23, gambler N − 1 would have lost 1, and gambler N would have won 1 (since the outcome on day N is 0). Hence,

    X_N = N − 3 − 23 + 1 − 1 = N − 26,

and, since E[X_N] = 0 (it is easy to verify condition (iii) of Theorem 6.2.2), we see that

    E[N] = 26.

In the same manner as in the above, we can compute the expected time until any given pattern of outcomes appears. For instance, in coin tossing the mean time until HHTTHH occurs is p^{−4}q^{−2} + p^{−2} + p^{−1}, where p = P{H} = 1 − q.

EXAMPLE 6.2(B) Consider an individual who starts at position 0 and at each step either moves 1 position to the right with probability p or one to the left with probability 1 − p. Assume that the successive movements are independent. If p > 1/2, find the expected number of steps it takes until the individual reaches position i, i > 0.

Solution. Let X_j equal 1 or −1 depending on whether step j is to the right or the left. If N is the number of steps it takes to reach i, then

    Σ_{j=1}^{N} X_j = i.

Hence, since E[X_j] = 2p − 1, we obtain from Wald's equation that

    E[N](2p − 1) = i,

or

    E[N] = i/(2p − 1).
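A quick simulation check of Example 6.2(A)'s answer E[N] = 26 (a sketch, not from the text; the sample size is an arbitrary choice):

    import random

    def time_until_020():
        """Number of observations until the pattern 0,2,0 first appears,
        with outcomes 0, 1, 2 having probabilities 1/2, 1/3, 1/6."""
        last3, n = [], 0
        while last3 != [0, 2, 0]:
            u = random.random()
            x = 0 if u < 1/2 else (1 if u < 1/2 + 1/3 else 2)
            last3 = (last3 + [x])[-3:]
            n += 1
        return n

    reps = 200_000
    print("simulated E[N]:", sum(time_until_020() for _ in range(reps)) / reps,
          " (the martingale argument gives 26)")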
EXAMPLE 6.2(C) Players X, Y, and Z contest the following game. At each stage two of them are randomly chosen in sequence, with the first one chosen being required to give 1 coin to the other. All of the possible choices are equally likely and successive choices are independent of the past. This continues until one of the players has no remaining coins. At this point that player departs and the other two continue playing until one of them has all the coins. If the players initially have x, y, and z coins, find the expected number of plays until one of them has all the s = x + y + z coins.

Solution. Suppose that the game does not end when one of the players has all s coins but rather that the final two contestants continue to play, keeping track of their winnings by allowing for negative fortunes. Let X_n, Y_n, and Z_n denote, respectively, the amount of money that X, Y, and Z have after the nth stage of play. Thus, for instance, X_n = 0, Y_n = −4, Z_n = s + 4 indicates that X was the first player to go broke and that after the nth play Y had lost 4 more than he began with. If we let T denote the first time that two of the values X_n, Y_n, and Z_n are equal to 0, then the problem is to find E[T]. To find E[T] we will show that

    M_n = X_n Y_n + X_n Z_n + Y_n Z_n + n

is a martingale. It will then follow from the martingale stopping theorem (condition (iii) is easily shown to be satisfied) that

    E[M_T] = E[M_0] = xy + xz + yz.

But, since two of X_T, Y_T, and Z_T are 0, it follows that M_T = T and so

    E[T] = xy + xz + yz.

To show that {M_n, n ≥ 0} is a martingale, consider E[M_{n+1} | X_i, Y_i, Z_i, i = 0, 1, ..., n] and consider two cases.

Case 1: X_n Y_n Z_n > 0. In this case X, Y, and Z are all in competition after stage n. Hence,

    E[X_{n+1}Y_{n+1} | X_n = x, Y_n = y] = [(x + 1)y + (x + 1)(y − 1) + x(y + 1) + x(y − 1) + (x − 1)y + (x − 1)(y + 1)]/6 = xy − 1/3.

As the conditional expectations of X_{n+1}Z_{n+1} and Y_{n+1}Z_{n+1} are similar, we see that in this case

    E[M_{n+1} | X_i, Y_i, Z_i, i = 0, 1, ..., n] = X_nY_n + X_nZ_n + Y_nZ_n − 1 + n + 1 = M_n.

Case 2: One of the players has been eliminated by stage n, say player X. In this case X_{n+1} = X_n = 0 and

    E[Y_{n+1}Z_{n+1} | Y_n = y, Z_n = z] = [(y + 1)(z − 1) + (y − 1)(z + 1)]/2 = yz − 1.

Hence, in this case we again obtain that

    E[M_{n+1} | X_i, Y_i, Z_i, i = 0, 1, ..., n] = M_n.

Therefore, {M_n, n ≥ 0} is a martingale and

    E[T] = xy + xz + yz.

EXAMPLE 6.2(D) In Example 1.5(C) we showed, by mathematical induction, that the expected number of rounds that it takes for all n people in the matching problem to obtain their own hats is equal to n. We will now present a martingale argument for this result. Let R denote the number of rounds until all people have a match. Let X_i denote the number of matches on the ith round, for i = 1, ..., R, and define X_i to equal 1 for i > R. We will use the zero-mean martingale {Z_k, k ≥ 1}, where

    Z_k = Σ_{i=1}^{k} (X_i − E[X_i | X_1, ..., X_{i−1}]) = Σ_{i=1}^{k} (X_i − 1),

where the last equality follows since, whenever any number of individuals randomly choose among their own hats, the expected number of matches is 1. As R is a stopping time for this martingale (it is the smallest k for which Σ_{i=1}^{k} X_i = n), we obtain by the martingale stopping theorem that

    0 = E[Z_R] = E[Σ_{i=1}^{R} (X_i − 1)] = E[Σ_{i=1}^{R} X_i] − E[R] = n − E[R],

where the final equality used the identity Σ_{i=1}^{R} X_i = n.
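The identity E[T] = xy + xz + yz of Example 6.2(C) above can be checked directly by simulation, as in the following sketch (initial fortunes and sample size are arbitrary illustrative choices):

    import random

    def play(x, y, z):
        """Number of stages until one player has all x+y+z coins; a player
        with 0 coins is out, and the ordered giver/receiver pair is chosen
        uniformly among the remaining players."""
        coins, t = [x, y, z], 0
        while sorted(coins)[1] > 0:          # stop when two players are at 0
            alive = [i for i in range(3) if coins[i] > 0]
            giver, receiver = random.sample(alive, 2)   # uniform ordered pair
            coins[giver] -= 1
            coins[receiver] += 1
            t += 1
        return t

    x, y, z, reps = 3, 4, 5, 50_000
    est = sum(play(x, y, z) for _ in range(reps)) / reps
    print("simulated E[T]:", est, " vs  xy+xz+yz =", x*y + x*z + y*z)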
6.3 AZUMA'S INEQUALITY FOR MARTINGALES

Let Z_i, i ≥ 0, be a martingale sequence. In situations where these random variables do not change too rapidly, Azuma's inequality enables us to obtain useful bounds on their tail probabilities. Before stating it, we need some lemmas.

Lemma 6.3.1
Let X be such that E[X] = 0 and P{−α ≤ X ≤ β} = 1. Then for any convex function f,

    E[f(X)] ≤ (β/(α + β)) f(−α) + (α/(α + β)) f(β).

Proof. Since f is convex it follows that, in the region −α ≤ x ≤ β, it is never above the line segment connecting the points (−α, f(−α)) and (β, f(β)). (See Figure 6.3.1.) As the formula for this line segment is

    y = (β/(α + β)) f(−α) + (α/(α + β)) f(β) + (1/(α + β))[f(β) − f(−α)]x,

it follows, since −α ≤ X ≤ β, that

    f(X) ≤ (β/(α + β)) f(−α) + (α/(α + β)) f(β) + (1/(α + β))[f(β) − f(−α)]X.

Taking expectations gives the result.

Figure 6.3.1. A convex function.

Lemma 6.3.2
For 0 ≤ θ ≤ 1,

    θe^{(1−θ)x} + (1 − θ)e^{−θx} ≤ e^{x²/8}.

Proof. Letting θ = (1 + α)/2 and x = 2β, we must show that, for −1 ≤ α ≤ 1,

    (1 + α)e^{β(1−α)} + (1 − α)e^{−β(1+α)} ≤ 2e^{β²/2},

or, equivalently,

    e^{β} + e^{−β} + α(e^{β} − e^{−β}) ≤ 2 exp{αβ + β²/2}.

Now the preceding inequality is true when α = −1 or +1 and when |β| is large (say when |β| ≥ 100). Thus, if Lemma 6.3.2 were false, then the function

    f(α, β) = e^{β} + e^{−β} + α(e^{β} − e^{−β}) − 2 exp{αβ + β²/2}

would assume a strictly positive maximum in the interior of the region R = {(α, β): |α| ≤ 1, |β| ≤ 100}. Setting the partial derivatives of f equal to 0 gives

    e^{β} − e^{−β} + α(e^{β} + e^{−β}) = 2(α + β) exp{αβ + β²/2},
    e^{β} − e^{−β} = 2β exp{αβ + β²/2}.

Assuming a solution in which β ≠ 0 implies, upon division, that

    α (e^{β} + e^{−β})/(e^{β} − e^{−β}) = α/β.

As it can be shown that there is no solution for which α = 0 and β ≠ 0 (see Problem 6.14), we see that

    β(e^{β} + e^{−β}) = e^{β} − e^{−β},

or, expanding in a Taylor series,

    Σ_{i=0}^{∞} β^{2i+1}/(2i)! = Σ_{i=0}^{∞} β^{2i+1}/(2i + 1)!,

which is clearly not possible when β ≠ 0. Hence, if the lemma is not true, we can conclude that the strictly positive maximal value of f(α, β) occurs when β = 0. However, f(α, 0) = 0, and thus the lemma is proven.

THEOREM 6.3.3 Azuma's Inequality
Let Z_n, n ≥ 1, be a martingale with mean μ = E[Z_n]. Let Z_0 = μ and suppose that for nonnegative constants α_i, β_i, i ≥ 1,

    −α_i ≤ Z_i − Z_{i−1} ≤ β_i.

Then for any n ≥ 0, a > 0:

(i) P{Z_n − μ ≥ a} ≤ exp{−2a²/Σ_{i=1}^{n}(α_i + β_i)²},
(ii) P{Z_n − μ ≤ −a} ≤ exp{−2a²/Σ_{i=1}^{n}(α_i + β_i)²}.

Proof. Suppose first that μ = 0. Now, for any c > 0,

(6.3.1)    P{Z_n ≥ a} = P{exp{cZ_n} ≥ e^{ca}} ≤ E[exp{cZ_n}]e^{−ca}    (by Markov's inequality).

To obtain a bound on E[exp{cZ_n}], let W_n = exp{cZ_n}. Note that W_0 = 1 and that, for n > 0,

    W_n = W_{n−1} exp{c(Z_n − Z_{n−1})}.

Therefore,

    E[W_n | Z_{n−1}] = exp{cZ_{n−1}} E[exp{c(Z_n − Z_{n−1})} | Z_{n−1}]
                     ≤ W_{n−1}[β_n exp{−cα_n} + α_n exp{cβ_n}]/(α_n + β_n),

where the inequality follows from Lemma 6.3.1 since (i) f(x) = e^{cx} is convex, (ii) −α_n ≤ Z_n − Z_{n−1} ≤ β_n, and (iii) E[Z_n − Z_{n−1} | Z_{n−1}] = E[Z_n | Z_{n−1}] − E[Z_{n−1} | Z_{n−1}] = 0. Taking expectations gives

    E[W_n] ≤ E[W_{n−1}][β_n exp{−cα_n} + α_n exp{cβ_n}]/(α_n + β_n).

Iterating this inequality gives, since E[W_0] = 1, that

    E[W_n] ≤ Π_{i=1}^{n} {(β_i exp{−cα_i} + α_i exp{cβ_i})/(α_i + β_i)}.

Thus, from Equation (6.3.1), we obtain that for any c > 0,

(6.3.2)    P{Z_n ≥ a} ≤ e^{−ca} Π_{i=1}^{n} {(β_i exp{−cα_i} + α_i exp{cβ_i})/(α_i + β_i)}
                       ≤ e^{−ca} Π_{i=1}^{n} exp{c²(α_i + β_i)²/8},

where the preceding inequality follows from Lemma 6.3.2 upon setting θ = α_i/(α_i + β_i) and x = c(α_i + β_i). Hence, for any c > 0,

    P{Z_n ≥ a} ≤ exp{−ca + c² Σ_{i=1}^{n} (α_i + β_i)²/8}.

Letting c = 4a/Σ_{i=1}^{n}(α_i + β_i)² (which is the value that minimizes −ca + c² Σ_{i=1}^{n}(α_i + β_i)²/8) gives

    P{Z_n ≥ a} ≤ exp{−2a²/Σ_{i=1}^{n}(α_i + β_i)²}.

Parts (i) and (ii) of Azuma's inequality now follow from applying the preceding, first to the zero-mean martingale {Z_n − μ} and second to the zero-mean martingale {μ − Z_n}.

Remark. From a computational point of view, one should make use of the inequality (6.3.2) rather than the final Azuma inequality. For instance, evaluating the right side of (6.3.2) with c = 4a/Σ_{i=1}^{n}(α_i + β_i)² will give a sharper bound than the Azuma inequality.

EXAMPLE 6.3(A) Let X_1, ..., X_n be random variables such that E[X_1] = 0 and E[X_i | X_1, ..., X_{i−1}] = 0, i ≥ 1. Then, from example (5) of Section 6.1, we see that Z_j = Σ_{i=1}^{j} X_i, j = 1, ..., n, is a zero-mean martingale. Hence, if −α_i ≤ X_i ≤ β_i for all i, then we obtain from Azuma's inequality that for a > 0,

    P{Σ_{i=1}^{n} X_i ≥ a} ≤ exp{−2a²/Σ_{i=1}^{n}(α_i + β_i)²}

and

    P{Σ_{i=1}^{n} X_i ≤ −a} ≤ exp{−2a²/Σ_{i=1}^{n}(α_i + β_i)²}.
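For centered Bernoulli summands, as in Example 6.3(A), the bound is easy to compare with simulation. The following Python sketch (illustrative only; p, n, a are arbitrary choices) uses X_i = B_i − p with B_i Bernoulli(p), so that α_i + β_i = 1 and the bound becomes exp(−2a²/n).

    import math, random

    # Z_n = sum of n centered Bernoulli(p) terms: -p <= X_i <= 1-p, so
    # alpha_i + beta_i = 1 and Azuma gives P{Z_n >= a} <= exp(-2 a^2 / n).
    p, n, a, reps = 0.5, 100, 15.0, 200_000
    hits = 0
    for _ in range(reps):
        z = sum((random.random() < p) - p for _ in range(n))
        hits += z >= a
    print("simulated P{Z_n >= a}:", hits / reps)
    print("Azuma bound          :", math.exp(-2 * a * a / n))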
Azuma's inequality is often applied in conjunction with a Doob type martingale.

EXAMPLE 6.3(B) Suppose that n balls are put in m urns in such a manner that each ball, independently, is equally likely to go into any of the urns. We will use Azuma's inequality to obtain bounds on the tail probabilities of X, the number of empty urns. To begin, letting I{A} be the indicator variable for the event A, we have

    X = Σ_{i=1}^{m} I{urn i is empty},

and thus

    E[X] = m P{urn i is empty} = m(1 − 1/m)^n ≡ μ.

Now, let X_j denote the urn in which the jth ball is placed, j = 1, ..., n, and define a Doob type martingale by letting Z_0 = E[X] and, for i > 0,

    Z_i = E[X | X_1, ..., X_i].

We will now obtain our result by using Azuma's inequality along with the fact that X = Z_n. To determine a bound on |Z_i − Z_{i−1}|, first note that |Z_1 − Z_0| = 0. Now, for i ≥ 2, let D denote the number of distinct values in the set X_1, ..., X_{i−1}. That is, D is the number of urns having at least one ball after the first i − 1 balls are distributed. Then, since each of the m − D urns that are presently empty will end up empty with probability (1 − 1/m)^{n−i+1}, we have

    E[X | X_1, ..., X_{i−1}] = (m − D)(1 − 1/m)^{n−i+1}.

On the other hand,

    E[X | X_1, ..., X_i] = (m − D)(1 − 1/m)^{n−i} if X_i ∈ (X_1, ..., X_{i−1}),
    E[X | X_1, ..., X_i] = (m − D − 1)(1 − 1/m)^{n−i} if X_i ∉ (X_1, ..., X_{i−1}).

Hence, the two possible values of Z_i − Z_{i−1}, i ≥ 2, are

    ((m − D)/m)(1 − 1/m)^{n−i} and −(D/m)(1 − 1/m)^{n−i}.

Since 1 ≤ D ≤ min(i − 1, m), we thus obtain that −α_i ≤ Z_i − Z_{i−1} ≤ β_i, where

    β_i = ((m − 1)/m)(1 − 1/m)^{n−i},    α_i = (min(i − 1, m)/m)(1 − 1/m)^{n−i}.

From Azuma's inequality we thus obtain that for a > 0,

    P{X − m(1 − 1/m)^n ≥ a} ≤ exp{−2a²/Σ_{i=2}^{n}(α_i + β_i)²}

and

    P{X − m(1 − 1/m)^n ≤ −a} ≤ exp{−2a²/Σ_{i=2}^{n}(α_i + β_i)²},

where

    Σ_{i=2}^{n}(α_i + β_i)² = Σ_{i=2}^{min(n, m+1)} ((m + i − 2)²/m²)(1 − 1/m)^{2(n−i)} + Σ_{i=m+2}^{n} (2 − 1/m)²(1 − 1/m)^{2(n−i)}.

Azuma's inequality is often applied to a Doob type martingale for which |Z_i − Z_{i−1}| ≤ 1. The following corollary gives a sufficient condition for a Doob type martingale to satisfy this condition.

Corollary 6.3.4
Let h be a function such that if the vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) differ in at most one coordinate (that is, for some k, x_i = y_i for all i ≠ k), then |h(x) − h(y)| ≤ 1. Let X_1, ..., X_n be independent random variables. Then, with X = (X_1, ..., X_n), we have for a > 0 that

(i) P{h(X) − E[h(X)] ≥ a} ≤ e^{−a²/2n},
(ii) P{h(X) − E[h(X)] ≤ −a} ≤ e^{−a²/2n}.

Proof. Consider the martingale Z_i = E[h(X) | X_1, ..., X_i], i = 1, ..., n. Now,

    |E[h(X) | X_1 = x_1, ..., X_i = x_i] − E[h(X) | X_1 = x_1, ..., X_{i−1} = x_{i−1}]|
    = |E[h(x_1, ..., x_i, X_{i+1}, ..., X_n)] − E[h(x_1, ..., x_{i−1}, X_i, ..., X_n)]|
    = |E[h(x_1, ..., x_i, X_{i+1}, ..., X_n) − h(x_1, ..., x_{i−1}, X_i, X_{i+1}, ..., X_n)]| ≤ 1.

Hence, |Z_i − Z_{i−1}| ≤ 1 and so the result follows from Azuma's inequality with α_i = β_i = 1.
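The bounded-differences condition of Corollary 6.3.4 is satisfied, for instance, by the number of empty urns (moving one ball changes the count by at most 1). The following sketch (parameters are arbitrary illustrative choices) compares the simulated upper tail with the bound e^{−a²/2n}.

    import math, random

    # h(X_1,...,X_n) = number of empty urns; moving one ball changes this
    # count by at most 1, so Corollary 6.3.4 applies with bound e^{-a^2/2n}.
    n, m, a, reps = 60, 30, 6.0, 100_000
    mu = m * (1 - 1/m) ** n
    hits = 0
    for _ in range(reps):
        occupied = set(random.randrange(m) for _ in range(n))
        empty = m - len(occupied)
        hits += (empty - mu) >= a
    print("E[X] =", mu)
    print("simulated P{X - EX >= a}:", hits / reps)
    print("bounded-difference bound:", math.exp(-a * a / (2 * n)))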
EXAMPLE 6.3(C) Suppose that n balls are to be placed in m urns, with each ball independently going into urn j with probability p_j, j = 1, ..., m. Let Y_k denote the number of urns with exactly k balls, 0 ≤ k < n, and use the preceding corollary to obtain a bound on its tail probabilities.

Solution. To begin, note that

    E[Y_k] = E[Σ_{i=1}^{m} I{urn i has exactly k balls}] = Σ_{i=1}^{m} C(n, k) p_i^k (1 − p_i)^{n−k}.

Now, let X_i denote the urn in which ball i is put, i = 1, ..., n. Also, let h_k(x_1, x_2, ..., x_n) denote the number of urns with exactly k balls when X_i = x_i, i = 1, ..., n, and note that Y_k = h_k(X_1, ..., X_n). Suppose first that k = 0. In this case it is easy to see that h_0 satisfies the condition that if x and y differ in at most one coordinate, then |h_0(x) − h_0(y)| ≤ 1. (That is, suppose n x-balls and n y-balls are put in m urns so that the ith x-ball and the ith y-ball are put in the same urn for all but one i. Then the number of urns empty of x-balls and the number empty of y-balls can clearly differ by at most 1.) Therefore, we see from Corollary 6.3.4 that

    P{Y_0 − Σ_{i=1}^{m}(1 − p_i)^n ≥ a} ≤ exp{−a²/2n},
    P{Y_0 − Σ_{i=1}^{m}(1 − p_i)^n ≤ −a} ≤ exp{−a²/2n}.

Now, suppose that 0 < k < n. In this case if x and y differ in at most one coordinate then it is not necessarily true that |h_k(x) − h_k(y)| ≤ 1, for the one different value could result in one of the vectors having 1 more and the other 1 less urn with k balls than they would have had if that coordinate was not included. But from this we can see that if x and y differ in at most one coordinate, then

    |h_k(x) − h_k(y)| ≤ 2.

Hence, h_k^*(x) = h_k(x)/2 satisfies the condition of Corollary 6.3.4, and so we can conclude that for 0 < k < n, a > 0,

    P{Y_k − Σ_{i=1}^{m} C(n, k) p_i^k (1 − p_i)^{n−k} ≥ 2a} ≤ exp{−a²/2n}

and

    P{Y_k − Σ_{i=1}^{m} C(n, k) p_i^k (1 − p_i)^{n−k} ≤ −2a} ≤ exp{−a²/2n}.

EXAMPLE 6.3(D) Consider a set of n components that are to be used in performing certain experiments. Let X_i equal 1 if component i is in functioning condition and let it equal 0 otherwise, and suppose that the X_i are independent with E[X_i] = p_i. Suppose that in order to perform experiment j, j = 1, ..., m, all of the components in the set A_j must be functioning. If any particular component is needed in at most three experiments, show that for a > 0,

    P{X − Σ_{j=1}^{m} Π_{i∈A_j} p_i ≤ −3a} ≤ exp{−a²/2n},

where X is the number of experiments that can be performed.

Solution. Since

    X = Σ_{j=1}^{m} I{experiment j can be performed},

we see that

    E[X] = Σ_{j=1}^{m} Π_{i∈A_j} p_i.

If we let h(X) equal the number of experiments that can be performed, then h itself does not satisfy the condition of Corollary 6.3.4 because changing the value of one of the X_i can change the value of h by as much as 3. However, h(X)/3 does satisfy the conditions of the corollary, and so we obtain that

    P{X/3 − E[X]/3 ≥ a} ≤ exp{−a²/2n},
    P{X/3 − E[X]/3 ≤ −a} ≤ exp{−a²/2n}.

6.4 SUBMARTINGALES, SUPERMARTINGALES, AND THE MARTINGALE CONVERGENCE THEOREM

A stochastic process {Z_n, n ≥ 1} having E[|Z_n|] < ∞ for all n is said to be a submartingale if

(6.4.1)    E[Z_{n+1} | Z_1, ..., Z_n] ≥ Z_n,

and is said to be a supermartingale if

(6.4.2)    E[Z_{n+1} | Z_1, ..., Z_n] ≤ Z_n.

Hence, a submartingale embodies the concept of a superfair, and a supermartingale a subfair, game. From (6.4.1) we see that for a submartingale

    E[Z_{n+1}] ≥ E[Z_n],

with the inequality reversed for a supermartingale. The analogues of Theorem 6.2.2, the martingale stopping theorem, remain valid for submartingales and supermartingales. That is, the following result, whose proof is left as an exercise, can be established.

THEOREM 6.4.1
If N is a stopping time for {Z_n, n ≥ 1} such that any one of the sufficient conditions of Theorem 6.2.2 is satisfied, then

    E[Z_N] ≥ E[Z_1] for a submartingale,
    E[Z_N] ≤ E[Z_1] for a supermartingale.

The most important martingale result is the martingale convergence theorem. Before presenting it we need some preliminaries.

Lemma 6.4.2
If {Z_i, i ≥ 1} is a submartingale and N a stopping time such that P{N ≤ n} = 1, then

    E[Z_1] ≤ E[Z_N] ≤ E[Z_n].

Proof. It follows from Theorem 6.4.1 that, since N is bounded, E[Z_N] ≥ E[Z_1]. Now,

    E[Z_n | Z_1, ..., Z_N, N = k] = E[Z_n | Z_1, ..., Z_k, N = k]
                                  = E[Z_n | Z_1, ..., Z_k]    (Why?)
                                  ≥ Z_k = Z_N.

Hence the result follows by taking expectations of the above.

Lemma 6.4.3
If {Z_n, n ≥ 1} is a martingale and f a convex function, then {f(Z_n), n ≥ 1} is a submartingale.

Proof.

    E[f(Z_{n+1}) | Z_1, ..., Z_n] ≥ f(E[Z_{n+1} | Z_1, ..., Z_n])    (by Jensen's inequality)
                                  = f(Z_n).

THEOREM 6.4.4 (Kolmogorov's Inequality for Submartingales)
If {Z_n, n ≥ 1} is a nonnegative submartingale, then

    P{max(Z_1, ..., Z_n) > a} ≤ E[Z_n]/a for a > 0.

Proof. Let N be the smallest value of i, i ≤ n, such that Z_i > a, and define it to equal n if Z_i ≤ a for all i = 1, ..., n. Note that max(Z_1, ..., Z_n) > a is equivalent to Z_N > a. Therefore,

    P{max(Z_1, ..., Z_n) > a} = P{Z_N > a}
                              ≤ E[Z_N]/a    (by Markov's inequality)
                              ≤ E[Z_n]/a,

where the last inequality follows from Lemma 6.4.2 since N ≤ n.
Corollary 6.4.5
Let {Z_n, n ≥ 1} be a martingale. Then, for a > 0:

(i) P{max(|Z_1|, ..., |Z_n|) > a} ≤ E[|Z_n|]/a,
(ii) P{max(|Z_1|, ..., |Z_n|) > a} ≤ E[Z_n²]/a².

Proof. Parts (i) and (ii) follow from Lemma 6.4.3 and Kolmogorov's inequality since the functions f(x) = |x| and f(x) = x² are both convex.
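Kolmogorov's inequality is simple to see in action. The sketch below (parameters are arbitrary choices) applies it to the nonnegative submartingale Z_n = |S_n| obtained from a simple symmetric random walk via Lemma 6.4.3.

    import random

    # Nonnegative submartingale: Z_n = |S_n| for a symmetric simple random
    # walk (|x| is convex, so Lemma 6.4.3 makes |S_n| a submartingale).
    n, a, reps = 100, 25, 100_000
    exceed, ezn = 0, 0.0
    for _ in range(reps):
        s, m = 0, 0
        for _ in range(n):
            s += random.choice((-1, 1))
            m = max(m, abs(s))
        exceed += m > a
        ezn += abs(s)
    print("simulated P{max Z_i > a} :", exceed / reps)
    print("Kolmogorov bound E[Z_n]/a:", ezn / reps / a)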
+ e)el(l"+t) - 'lT1(t)el(I"+<ll _ g (0) - 'lT2(t) - e > 0, I ~ O there exists a value to > 0 such that g(t o ) > 1 We now show that S,Jn can be as large as p.. + e only finitely often For, note that (64.5) However, e'os'/'lTn(t o ), being the product of independent random variables with unit means (the ith being e'ox'/'IT(t», is a martingale Since it is also nonnegative, the 318 MARTINGALES convergence theorem shows that, with probability 1, lim e'Os" 1 '1'" (to) exists and is finite "_00 Hence, since g(to) > 1, it follows from (6.4 5) that P{Snln > p. + e for an infinite number of n} 0 Similarly, by defining the function f(t) = e'(I'-£) 1'I'(t) and noting that since f(O) = 1, 1'(0) == -e, there exists a value to < 0 such thatf(ta) > 1, we can prove in the same manner that P{Snln :0:; P. - e for an infinite number of n} 0 Hence. P{p. - e :5 Sn1n :0:; p. + e for all but a finite number of n} = 1, or, since the above is true for all e > 0, ? We will end this section by characterIZing Doob martingales To begin we need a definition. Definition The sequence of random variables X n • n 2: 1. is said to be uniformly integrable if for every e > 0 there is a y. such that J ' Ixl d f ~ ( x ) < e, Ixl>Y£ for all n where f ~ is the distnbution function of Xn Lemma 6.4.9 If X n , n 2: 1, is uniformly integrable then there exists M < 00 such that E[lXnll < M for aU n A GENERALIZED AZUMA INEQUALITY 319 Proof Let Yl be as in the definition of uniformly integrable. Then E[IXnll = J' Ixl + I Ixl Ixls" Ixl>" Thus, using the preceding, it follows from the martingale convergence theorem that any uniformly integrable martingale has a finite limit. Now, let Zn = E[XI Y 1 , • , Yn], n ;;::: 1, be a Doob martingale. As it can be shown that a Doob martingale is always uniformly integrable, it thus follows that lim n _", E[XI Y 1 ,. ,Y n ] exists As might be expected, this limiting value is equal to the conditional expectation of X given the entire sequence of the Y/. That is, Not only is a Doob martingale uniformly integrable but it turns out that every uniformly integrable martingale can be represented as a Doob martin- gale. For suppose that {Zn, n ;;::: I} is a uniformly integrable martingale Then, by the martingale convergence theorem, it has a limit, say limn Zn Z. Now, consider the Doob martingale {E[ZIZ1,. , Zd. k > I} and note that E[zIZ" . ,Zd = E[lim ZnlZI , = lim E[ZnIZ ..... 
, Zk] where the interchange of expectation and limit can be shown to be justified by the uniformly integrable assumption Thus, we see that any uniformly integrable martingale can be represented as a Doob martingale Remark Under the conditions of the martingale convergence theorem, if we let Z = Z" then it can be shown that E[lZI] < 00 6.5 A GENERALIZED AZUMA INEQUALITY The supermartingale stopping time theorem can be used to generalize Azuma's inequality when we have the same bound on all the Z, - Z,_I We start with the following proposition which is of independent interest 320 MARTINGALES PROPOSITION 6.5.1 Let {Zn' n ~ I} be a martingale with mean Zo = 0, for which for all n :> 1 Then, for any positive values a and b P{Zn ~ a + bn for some n}:5 exp{-Sab/(a + (J)2} Proof Let, for n ~ 0, Wn = exp{c(Zn - a - bn)} and note that, for n ~ 1, Using the preceding, plus the fact that knowledge of WI, ., W n - 1 is equivalent to that of ZI , , Zn I, we obtain that , Wn-d = Wn_1e- cb E[exp{c(Zn - Zn_I)}IZI, :5 Wn_le-Cb[(Je-ca + ae cll ] /(a + (J) where the first inequality follows from Lemma 6.3 1, and the second from applying Lemma 632 with () = a/(a + fJ), x = c(a + fJ) Hence, fixing the value of cas c = Sb /(a + fJ)2 yields and so {Wn' n ~ O} is a supermartingale For a fixed positive integer k, define the bounded stopping time N by N = Minimum{n either Zn ~ a + bn or n = k} Now, P { Z N ~ a + bN} = P{WN ~ I} :5 E[W N ] (by Markov's inequality) :5 E[Wo] where the final eq uality follows from the supermartingale stopping theorem But the preceding is equivalent to P{Zn ~ a + bn for some n :S k]} :5 e -Sab/(a t Il)z Letting k ~ 00 gives the result = A GENERALIZED AZUMA INEQUALITY 321 THEOREM 6.5.2 The Generalized Azuma Inequality Let {Zn' n 2: I} be a martingale with mean Zo = O. If -a :5 Zn - Zn_1 :5 fJ for all n 2: 1 then, for any positive constant c and positive integer m' (i) P{Zn 2: nc for some n 2: m} ~ exp{-2mc 2 /(a + fJ)2} (ii) P{Zn ~ -nc for some n 2: m} ~ exp{-2mc 2 /(a + fJY}. Proof To begin, note that if there is an n such that n 2: m and Zn 2: nc then, for that n, Zn 2: nc 2: mc/2 + nc/2 Hence, P{Zn 2: nc for some n 2: m} ~ P{Zn 2: mcl2 + (c/2)n for some n} :s exp{ -8(mcl2)(cl2) / (a + fJ)2} where the final inequality follows from Proposition 65 1 Part (ii) follows from part (i) upon consideration of the martingale - Zn, n 2: 0 Remark Note that Azuma's inequality states that the probability that Zm1m is at least c is such that whereas the generalized Azuma gives the same bound for the larger probability that Zn1n is at least c for any n :=:: m EXAMPLE 6.S(A) Let Sn equal the number of heads in the first n independent flips of a coin that lands heads with probability p, and let us consider the probability that after some specified number of flips the proportion of heads will ever differ from p by more than e. That is, consider P{ISnln - pi > e for some n ~ m} Now, if we let X, equal 1 if the ith flip lands heads and 0 other- wise, then n Zn == Sn - np = 2: (X, - p) 1=1 is a martingale with 0 mean As, -p ::S; Zn - Zn-l S 1 - p it follows that {Zn' n ~ O} is a zero-mean martingale that satisfies the conditions of Theorem 6.52 with a = p, {3 = 1 - P Hence, P{Zn:=:: ne for some n :=:: m}:s; exp{ -2me 2 } 322 MARTINGALES or, equivalently, P{Snln - p:> e for some n ~ m}:5 exp{-2me 2 }. 
Similarly, P{Snln - p:5 -e for some n ~ m} -< exp{-2me 2 } and thus, For instance, the probability that after the first 99 flips the propor- tion of heads ever differs from p by as much as .1 satisfies P{ISnln - pi ~ 1 for some n ~ loo}:5 2e- 2 = .2707. PROBLEMS 6.1. If {Zn, n ~ I} is a martingale show that, for 1 :5 k < n, 6.2. For a martingale {Zn, n ~ I}, let X, = Z, - Z,_I, i :> 1, where Zo == 0 Show that n Var(Zn) = L Var(X,) j ~ 1 6.3. Verify that Xnlmn, n ~ 1, is a martingale when Xn is the size of the nth generation of a branching process whose mean number of offspnng per individual is m 6.4. Consider the Markov chain which at each transition either goes up 1 with probability p or down 1 with probability q = 1 - p. Argue that (qlp)So, n ~ 1, is a martingale 6.5. Consider a Markov chain {X n , n :> O} with P NN = 0 Let PO) denote the probability that this chain eventually enters state N given that it starts in state i Show that {P(X n ), n ~ O} is a martingale. 6.6. Let X(n) denote the size of the nth generation of a branching process, and let 1To denote the probability that such a process, starting with a single individual, eventually goes extinct. Show that {1T';(n), n ~ O} is a martingale PROBLEMS 323 6.7. Let Xl, ., . be a sequence of independent and identically distributed n random variables with mean 0 and variance (T2. Let Sn = 2: Xl and show I ~ l that {Zn' n ;;:: I} is a martingale when 6.8. If {Xn' n ::> O} and {Y n , n ::> O} are independent martingales, is {Zn, n ;;:: O} a martingale when (a) Zn = Xn + Yn? (b) Zn = XnYn? Are these results true without the independence assumption? In each case either present a proof or give a counterexample. 6.9. A process {Zn, n ::> I} is said to be a reverse, or backwards, martingale if EIZn I < 00 for an nand Show that if X,, i ;;:: 1, are independent and identically distributed random vanables with finite expectation, then Zn = (XI + " + Xn}ln, n ~ 1, is a reverse martingale. 6.10. Consider successive flips of a coin having probability p of landing heads. Use a martingale argument to compute the expected number of flips until the following sequences appear: (a) HHTTIlHT (b) HTHTHTH 6.11. Consider a gambler who at each gamble is equaUy likely to either win or lose 1 unit. Suppose the gambler wiJ) quit playing when his winnings are either A or -B, A > 0, B > O. Use an appropriate martingale to show that the expected number of bets is AB 6.U. In Example 6.2(C), find the expected number of stages until one of the players is eliminated. n 6.13. Let Zn = II XI> where XI' i::> 1 are independent random variables with 1=1 P{X, = 2} = P{X, = O} = 112 Let N = Min{n. Zn O}. Is the martingale stopping theorem applicable? If so, what would you conclude? If not, why not? 324 MARTINGALES 6.14. Show that the equation has no solution when f3 =f=. O. (Hint. Expand in a power series.) 6.15. Let X denote the number of heads in n independent flips of a fair coin. Show that: (a) P{X - nl2 :> a} < exp{ -2a 2 In} (b) P{X - nl2 < -a} :::;; exp{-2a 2 In}. 6.16. Let X denote the number of successes in n independent Bernoulli trials, with trial i resulting in a success with probability P. Give an upper bound for P { I X - ~ P. I ~ a }. 6.17. Suppose that 100 balls are to be randomly distributed among 20 urns Let X denote the number of urns that contain at least five balls Derive an upper bound for P{X ~ I5}. 6.18. 
Let p denote the probability that a random selection of 88 people will contain at least three with the same birthday Use Azuma's inequality to obtain an upper bound on p (It can be shown that p = 50) 6.19. For binary n-vectors x and y (meaning that each coordinate of these vectors is either 0 or 1) define the distance between them by n p(x,y) = L: Ix. - y.1 .=1 (This is called the Hamming distance) Let A be a finite set of such vectors, and let Xl ,. ., Xn be independent random variables that are each equally likely to be either 0 or 1. Set D = min p(X,y) yEA and let I-" = E [D]. J n terms of 1-", find an upper bound for P{D ~ b} when b > I-" 6.20. Let Xl, . ,X n be independent random vectors that are all uniformly distributed in the circle of radius 1 centered at the origin Let T == PROBLEMS 325 T(X 1 , ••• , X,,) denote the length of the shortest path connecting these n points. Argue that P{IT E[T]I:> a} -< 2 exp{-a 2 1(32n)}. 6.21. A group of 2n people, consisting of n men and n women, are to be independently distributed among m rooms Each woman chooses room j with probability P, while each man chooses it with probability q" j = 1, .. , m Let X denote the number of rooms that will contain exactly one man and one woman. (a) Find,u = E [X]. (b) Bound P{IX - ,ul > b} for b > O. 6.22. Let {X", n :> O} be a Markov process for which Xo is uniform on (0, 1) and, conditional on X,,, X n +, ::;;:: { aX" + 1 - a aX" with probability Xn with probability 1 Xn where 0 < a < 1. Discuss the limiting properties of the sequence X n , n ~ L 6.23. An urn initially contains one white and one black ball At each stage a ball is drawn and is then replaced in the urn along with another ball of the same color. Let Zn denote the fraction of balls in the urn that are white after the nth replication. (a) Show that {Zn' n 2: 1} is a martingale. (b) Show that the probability that the fraction of white balls in the urn is ever as large as 3/4 is at most 2/3. 6.24. Consider a sequence of independent tosses of a coin and let P{head} be the probability of a head on any toss. Let A be the hypothesis that P{head} = a and let B be the hypothesis that P{head} = b, 0 < a, b < 1 Let XI denote the outcome of the jth toss and let Z =P{X" ... ,XnIA} n P{X" .. ,XnIB} Show that if B is true, then: (a) Zn is a martingale, and (b) lim n _", Z" exists with probability 1 (c) If b -=1= a, what is limn_to Zn? 326 MARTINGALES 6.25. Let Zn, n :> 1, be a sequence of random variables such that ZI ~ 1 and given Zl, . , Zn-I. Zn is a Poisson random variable with mean Zn _I. n > 1 What can we say about Zn for n large? 6.26. Let XI • X2 • be independent and such that P{X, = -I} = 1 - 112', P{X, = 2' - I} = 112', i ~ 1. Use this sequence to construct a zero mean martingale Zn such that limn_a> Zn = - 00 with probability 1 (Hint· Make use of the Borel- Cantelli lemma.) A continuous-time process {X(t). t ~ O} is said to be a martingale if E[IX(t)l] < 00 for all t and, for all s < t, E[X(t)IX(u), 0 <: U :s; s] :::: X(s) 6.27. Let {X(t), t 2: O} be a continuous-time Markov chain with infinitesimal transition rates q'I' i -=f=. j. Give the conditions on the q'l that result in {X(t). t ~ O} being a continuous-time martingale Do Problems 6 28-6 30 under the assumptions that (a) the continu- ous-time analogue of the martingale stopping theorem is valid, and (b) any needed regularity conditions are satisfied 6.28. Let {N(t), t 2: O} be a nonhomogeneous Poisson process with intensity function A(t), t 2: O. 
Let T denote the time at which the nth event occurs Show that 6.29. Let {X(t). t 2: O} be a continuous-time Markov chain that will, in finite expected time. enter an absorbing state N Suppose that X(O) = 0 and let m, denote the expected time the chain is in state i. Show that for j -=f=. O.j -=f=. N. (a) E [number of times the chain leaves state j] = vIm" where 11 VI is the mean time the chain spends in j during a visit (b) E[number of times it enters state j] :::: 2: m,q'I' (c) Argue that d-I vlm l = 2: m,q'J! j -=f=. 0 dl vomo = 1 + 2: m,q,o '*0 REFERENCES 327 6.30. Let {X(l), I :> O} be a compound Poisson process with Poisson rate A and component distribution F. Define a continuous-time martingale related to this process REFERENCES Martingales were developed by Doob and his text (Reference 2) remains a standard reference References 6 and 7 give nice surveys of martingales at a slightly more advanced level than that of the present text Example 6 2(A) is taken from Reference 3 and Example 62(C) from Reference 5 The version of the Azuma inequality we have presented is, we believe, more general than has previously appeared in the literature Whereas the usual assumption on the increments is the symmetric condition Ix - X,-II :$ a" we have allowed for a nonsymmetric bound The treatment of the generalized Azuma inequality presented in Section 65 is taken from Reference 4 Additional material on Azuma's inequality can be found in Reference I N Alon, J Spencer, and P Erdos, The Probabilistic Method, John Wiley, New York, 1992 2 J Doob, Stochastic Processes, John Wiley, New York, 1953 3 S Y R Li, "A Martingale Approach to the Study of the Occurrence of Pattern in Repeated Experiments," Annals of Probability, 8, No 6 (1980), pp 1171-1175 4 S M Ross, "Generalizing Blackwell's Extension of the Azuma Inequality," Proba- bility in the Engineering and Informational Science, 9, No 3 (1995), pp 493-496 5 0 Stirzaker, "Tower Problems and Martingales," The Mathematical Scientist, 19, No I (1994), pp 52-59 6 P Whittle, Probability via Expectation, 3rd ed , Spnnger-Verlag, Berlin, 1992 7. D. Williams, Probability with Martingales, Cambridge University Press, Cambridge, England, 1991 CHAPTER 7 Random Walks INTRODUCTION Let Xl, X 2 , ••• be independent and identically distributed (iid) with E[Ix,11 < 00. Let So = 0, Sn = ~ ; Xn n ~ 1 The process {Sn, n ;:::: O} is called a random walk process. Random walks are quite useful for modeling various phenomena For in- stance, we have previously encountered the simple random walk-P{X, = l} = P = 1 - P{x, = -l}-in which Sn can be interpreted as the winnings after the nth bet of a gambler who either wins or loses 1 unit on each bet. We may also use random walks to model more general gambling situations; for instance, many people believe that the successive prices of a given company listed on the stock market can be modeled as a random walk. As we will see, random walks are also useful in the analysis of queueing and ruin systems. In Section 7.1 we present a duality principle that is quite useful in obtaining vanous probabilities concerning random walks. One of the examples in this section deals with the GIG/l queueing system, and in analyzing it we are led to the consideration of the probability that a random walk whose mean step is negative will ever exceed a given constant. Before dealing with this probability we, however, digress in Section 72 to a discussion of exchangeability, which is the condition that justifies the duality principle. 
We present De Finetti's theorem which provides a characterization of an infinite sequence of exchangeable Bernoulli random variables In Section 7.3 we return to our discussion of random walks and show how martingales can be effectively utilized For instance, using martingales we show how to approximate the probability that a random walk with a negative drift ever exceeds a fixed positive value. In Section 7.4 we apply the results of the preceding section to GIG/l queues and to certain ruin problems Random walks can also be thought of as generalizations of renewal pro- cesses. For if XI is constrained to be a nonnegative random variable, then Sn could be interpreted as the time of the nth event of a renewal process In Section 7.5 we present a generalization of Blackwell's theorem when the XI 328 DUALITY IN RANDOM WALKS 329 are not required to be nonnegative and indicate a proof based on results in renewal theory. 7. 1 DUALITY IN RANDOM WALKS Let n Sn = 2: X" n 1 denote a random walk. In computing probabilities concerning {Sn, n I}, there is a duality principle that, though obvious, is quite useful. Duality Principle (XI, X 2 , •• , Xn) has the same joint distributions as (Xn' X n - h ••• , XI)' The validity of the duality principle is immediate since the X" i > 1, are independent and identically distributed. We shall now illustrate its use in a series of proposi- tions. Proposition 7.11 states that if £(X) > 0, then the random walk will become positive in a finite expected number of steps. PROPOSl"rJON 7.1.1 Suppose Xl, X 2 , are independent and identically distnbuted random variables with E[X] > 0 If N = min{n Xl + .. + Xn > O}, then E[N] < 00 Proof ., E[N] = 2: P{N > n} ., = 2: P{Xl + X 2 ,Xl + ncO ., .,Xn+ . ncO 330 RANDOM WALKS where the last equality follows from duality Therefore, ., (711 ) ErN] 2: P{Sn:S: SII-1. Sn:S: Sn-2, , SI!:S O} n ~ O Now let us say that a renewal takes place at time n if SIt :s: Sn-1o Sn :S Sn-2, , S. :s: 0, that is, a renewal takes place each time the random walk hits a low (A little thought should convince us that the times between successive renewals are indeed independent and identically distributed) Hence, from Equation (711), ., ErN] = 2: P{renewal occurs at time n} n-O = 1 + E[number of renewals that occur] Now by the strong law of large numbers it follows, since E[X] > 0, that SII _ 00, and so the number of renewals that occurs will be finite (with probability 1) But the number of renewals that occurs is either infinite with probability 1 if F( 00 )-the probability that the time between successive renewals is finite-equals 1, or has a geometric distribution with finite mean if F(oo) < 1 Hence it follows that E[number of renewals that occurs] < 00 and so, E[N] < 00 Our next proposition deals with the expected rate at which a random walk assumes new values. Let us define R", called the range of (So, SI, ... ,Sn), by the following Definition Rn is the number of distinct values of (So, , SIt) PROPOSITION 7.1.2 lim E[Rn] = P{random walk never returns to O} n_«J n DUALITY IN RANDOM WALKS Proof Letting if Sk ¥- Sk-h Sk ¥- Sk-h ,Sk ¥- So otherwise, then and SO • E[R.1 = 1 + 2: P{h = 1} k:l " = 1 + 2: P{Sk ¥- Sk-lt Sk ¥- Sk-2," , Sk ¥- O} k ~ l " = 1 + 2: P{Xk ¥- 0, X k + X k - l ¥- 0, , Xk + Xk-j + . + Xl ¥- O} k ~ l • = 1 + 2: P{XI ¥- 0, XI + X 2 ¥- 0, ,Xl + . 
Our next proposition deals with the expected rate at which a random walk assumes new values. Let us define R_n, called the range of (S_0, S_1, ..., S_n), by the following.

Definition: R_n is the number of distinct values of (S_0, ..., S_n).

PROPOSITION 7.1.2

    lim_{n→∞} E[R_n]/n = P{random walk never returns to 0}.

Proof Letting

    I_k = 1 if S_k ≠ S_{k−1}, S_k ≠ S_{k−2}, ..., S_k ≠ S_0;   I_k = 0 otherwise,

then

    R_n = 1 + Σ_{k=1}^n I_k,

and so

    E[R_n] = 1 + Σ_{k=1}^n P{S_k ≠ S_{k−1}, S_k ≠ S_{k−2}, ..., S_k ≠ 0}
           = 1 + Σ_{k=1}^n P{X_k ≠ 0, X_k + X_{k−1} ≠ 0, ..., X_k + X_{k−1} + ··· + X_1 ≠ 0}
           = 1 + Σ_{k=1}^n P{X_1 ≠ 0, X_1 + X_2 ≠ 0, ..., X_1 + ··· + X_k ≠ 0},

where the last equality follows from duality. Hence,

(7.1.2)    E[R_n] = 1 + Σ_{k=1}^n P{S_1 ≠ 0, S_2 ≠ 0, ..., S_k ≠ 0} = Σ_{k=0}^n P{T > k},

where T is the time of the first return to 0. Now, as k → ∞,

    P{T > k} → P{T = ∞} = P{no return to 0},

and so, from (7.1.2), we see that E[R_n]/n → P{no return to 0}.

EXAMPLE 7.1(A) The Simple Random Walk. In the simple random walk, P{X_i = 1} = p = 1 − P{X_i = −1}. When p = 1/2 (the symmetric simple random walk), the random walk is recurrent, and thus P{no return to 0} = 0; hence E[R_n]/n → 0 when p = 1/2. When p > 1/2, let α = P{return to 0 | X_1 = 1}. Since P{return to 0 | X_1 = −1} = 1 (why?), we have

    P{return to 0} = αp + 1 − p.

Also, by conditioning on X_2,

    α = α²p + 1 − p,

or, equivalently, (α − 1)(αp − 1 + p) = 0. Since α < 1 by transience, we see that α = (1 − p)/p, and so P{return to 0} = 2(1 − p). Hence

    E[R_n]/n → 2p − 1 when p > 1/2.

Similarly, E[R_n]/n → 2(1 − p) − 1 when p < 1/2.

Our next application of the utility of duality deals with the symmetric random walk.

PROPOSITION 7.1.3

In the symmetric simple random walk, the expected number of visits to state k before returning to the origin is equal to 1 for all k ≠ 0.

Proof For k > 0, let Y denote the number of visits to state k prior to the first return to the origin. Then Y can be expressed as Y = Σ_{n=1}^∞ I_n, where I_n = 1 if a visit to state k occurs at time n and there is no return to the origin before n, and I_n = 0 otherwise; or, equivalently,

    I_n = 1 if S_n > 0, S_{n−1} > 0, ..., S_1 > 0, S_n = k;   I_n = 0 otherwise.

Thus,

    E[Y] = Σ_{n=1}^∞ P{S_n > 0, S_{n−1} > 0, ..., S_1 > 0, S_n = k}
         = Σ_{n=1}^∞ P{X_n + ··· + X_1 > 0, X_{n−1} + ··· + X_1 > 0, ..., X_1 > 0, X_n + ··· + X_1 = k}
         = Σ_{n=1}^∞ P{X_1 + ··· + X_n > 0, X_2 + ··· + X_n > 0, ..., X_n > 0, X_1 + ··· + X_n = k},

where the last equality follows from duality. Hence,

    E[Y] = Σ_{n=1}^∞ P{S_n > 0, S_n > S_1, ..., S_n > S_{n−1}, S_n = k}
         = Σ_{n=1}^∞ P{symmetric random walk hits k for the first time at time n}
         = P{symmetric random walk ever hits k}
         = 1   (by recurrence),

and the proof is complete.*

* The reader should compare this proof with the one outlined in Problem 4.46 of Chapter 4.

Our final application of duality is to the G/G/1 queueing model. This is a single-server model that assumes customers arrive in accordance with a renewal process having an arbitrary interarrival distribution F, and that the service distribution is G. Let the interarrival times be X_1, X_2, ..., let the service times be Y_1, Y_2, ..., and let D_n denote the delay (or wait) in queue of the nth customer. Since customer n spends a time D_n + Y_n in the system and customer n + 1 arrives a time X_{n+1} after customer n, it follows (see Figure 7.1.1) that

    D_{n+1} = D_n + Y_n − X_{n+1}   if D_n + Y_n ≥ X_{n+1},
    D_{n+1} = 0                     if D_n + Y_n < X_{n+1},

or, equivalently, letting U_n = Y_n − X_{n+1}, n ≥ 1,

(7.1.3)    D_{n+1} = max{0, D_n + U_n}.

Figure 7.1.1. D_{n+1} = max{D_n + Y_n − X_{n+1}, 0}: the timing of the arrivals, service entry, and departure of customers n and n + 1.

Iterating the relation (7.1.3) yields

    D_{n+1} = max{0, D_n + U_n}
            = max{0, U_n + max{0, D_{n−1} + U_{n−1}}}
            = max{0, U_n, U_n + U_{n−1} + D_{n−1}}
            = max{0, U_n, U_n + U_{n−1} + max{0, D_{n−2} + U_{n−2}}}
            = max{0, U_n, U_n + U_{n−1}, U_n + U_{n−1} + U_{n−2} + D_{n−2}}
            ⋮
            = max{0, U_n, U_n + U_{n−1}, ..., U_n + U_{n−1} + ··· + U_1},

where the last step uses the fact that D_1 = 0. Hence, for c > 0,

    P{D_{n+1} ≥ c} = P{max(0, U_n, U_n + U_{n−1}, ..., U_n + ··· + U_1) ≥ c}
                   = P{max(0, U_1, U_1 + U_2, ..., U_1 + ··· + U_n) ≥ c},

where the last equality follows from duality (the U_i themselves being independent and identically distributed).
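The distributional identity just derived—that D_{n+1} has the same distribution as max(0, S_1, ..., S_n)—is easy to check by simulation. In the sketch below (an added illustration; the exponential rates and sample sizes are arbitrary choices), one set of runs iterates the recursion (7.1.3) while an independent set computes the maximum of the partial sums:

```python
import numpy as np

rng = np.random.default_rng(1)
n, runs = 25, 20_000
lam, mu = 1.0, 1.5          # interarrival rate lam, service rate mu (so E[Y] < E[X])

def sample_U(size):
    # U_i = Y_i - X_{i+1}
    return rng.exponential(1/mu, size) - rng.exponential(1/lam, size)

# D_{n+1} via the recursion D <- max(0, D + U), starting from D_1 = 0
D = np.zeros(runs)
for _ in range(n):
    D = np.maximum(0.0, D + sample_U(runs))

# max(0, S_1, ..., S_n) from independent samples
S = np.cumsum(sample_U((runs, n)), axis=1)
M = np.maximum(0.0, S.max(axis=1))

for c in (0.5, 1.0, 2.0):
    print(c, (D >= c).mean(), (M >= c).mean())   # the two tail estimates agree
```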
Hence we have the following.

PROPOSITION 7.1.4

If D_n is the delay in queue of the nth customer in a G/G/1 queue with interarrival times X_i, i ≥ 1, and service times Y_i, i ≥ 1, then

(7.1.4)    P{D_{n+1} ≥ c} = P{the random walk S_j, j ≥ 1, crosses c by time n},

where

    S_j = Σ_{i=1}^j (Y_i − X_{i+1}).

We also note from Proposition 7.1.4 that P{D_{n+1} ≥ c} is nondecreasing in n. Letting

    P{D_∞ ≥ c} = lim_{n→∞} P{D_n ≥ c},

we have from (7.1.4)

(7.1.5)    P{D_∞ ≥ c} = P{the random walk S_j, j ≥ 1, ever crosses c}.

If E[U] = E[Y] − E[X] is positive, then the random walk will, by the strong law of large numbers, converge to infinity, and so P{D_∞ ≥ c} = 1 for all c if E[Y] > E[X]. The above will also be true when E[Y] = E[X], and thus it is only when E[Y] < E[X] that a limiting delay distribution exists. To compute P{D_∞ ≥ c} in this case we need to compute the probability that a random walk whose mean change is negative will ever exceed a constant. Before attacking this problem, however, we present a result, known as Spitzer's identity, that will enable us to explicitly compute E[D_n] in certain special cases. Spitzer's identity is concerned with the expected value of the maximum of the random walk up to a specified time. Let

    M_n = max(0, S_1, ..., S_n),   n ≥ 1.

PROPOSITION 7.1.5 Spitzer's Identity

    E[M_n] = Σ_{k=1}^n (1/k) E[S_k⁺],

where x⁺ = max(x, 0).

Proof For any event A, let I(A) equal 1 if A occurs and 0 otherwise. We will use the representation

    M_n = I(S_n > 0)M_n + I(S_n ≤ 0)M_n.

Now, on the event that S_n > 0 the maximum is attained at a positive partial sum, and so

(7.1.6)    I(S_n > 0)M_n = I(S_n > 0) max_{1≤i≤n} S_i
                         = I(S_n > 0)[X_1 + max(0, X_2, X_2 + X_3, ..., X_2 + ··· + X_n)].

But, since X_1, X_2, ..., X_n and X_n, X_1, X_2, ..., X_{n−1} have the same joint distribution, we see that

(7.1.7)    E[I(S_n > 0) max(0, X_2, ..., X_2 + ··· + X_n)] = E[I(S_n > 0)M_{n−1}].

Also, since (X_i, S_n) has the same joint distribution for all i,

(7.1.8)    E[I(S_n > 0)X_1] = (1/n) E[I(S_n > 0)S_n] = (1/n) E[S_n⁺].

Thus, from (7.1.6), (7.1.7), and (7.1.8), we have

    E[I(S_n > 0)M_n] = E[I(S_n > 0)M_{n−1}] + (1/n) E[S_n⁺].

In addition, since S_n ≤ 0 implies that M_n = M_{n−1}, it follows that

    I(S_n ≤ 0)M_n = I(S_n ≤ 0)M_{n−1},

which, combined with the preceding, yields

    E[M_n] = E[M_{n−1}] + (1/n) E[S_n⁺].

Reusing the preceding equation, this time with n − 1 substituting for n, gives

    E[M_n] = (1/n) E[S_n⁺] + (1/(n − 1)) E[S_{n−1}⁺] + E[M_{n−2}],

and, upon continual repetition of this argument, we obtain

    E[M_n] = Σ_{k=2}^n (1/k) E[S_k⁺] + E[M_1],

which proves the result since M_1 = S_1⁺.

It follows from Proposition 7.1.4 that, with M_n = max(0, S_1, ..., S_n),

    P{D_{n+1} ≥ c} = P{M_n ≥ c},

which implies, upon integrating over c from 0 to ∞, that

    E[D_{n+1}] = E[M_n].

Hence, from Spitzer's identity we see that

    E[D_{n+1}] = Σ_{k=1}^n (1/k) E[S_k⁺].

Using the preceding, an explicit formula for E[D_{n+1}] can be obtained in certain special cases.
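Spitzer's identity itself is simple to verify numerically. The sketch below (an added illustration, with an arbitrary choice of normal steps having negative mean) compares a Monte Carlo estimate of E[M_n] with Σ_{k=1}^n E[S_k⁺]/k:

```python
import numpy as np

rng = np.random.default_rng(2)
n, runs = 20, 200_000
X = rng.normal(-0.1, 1.0, (runs, n))     # step distribution N(-0.1, 1), arbitrary
S = np.cumsum(X, axis=1)

lhs = np.maximum(0.0, S.max(axis=1)).mean()            # E[max(0, S_1, ..., S_n)]
rhs = sum(np.maximum(0.0, S[:, k]).mean() / (k + 1)    # sum over k of E[S_k^+]/k
          for k in range(n))
print(lhs, rhs)   # the two estimates agree up to sampling error
```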
EXAMPLE 7.1(B) Consider a single-server queueing model where arrivals occur according to a renewal process with interarrival distribution G(s, λ) and the service times have distribution G(r, μ), where G(a, b) is the gamma distribution with parameters a and b (and thus has mean a/b). We will use Spitzer's identity to derive a formula for E[D_{n+1}] in the case where at least one of s or r is integral.

To begin, suppose that r is an integer. To compute E[S_k⁺], first condition on W_k = Σ_{i=1}^k X_{i+1}. Then use the fact that Σ_{i=1}^k Y_i is distributed as the sum of kr independent exponentials with rate μ—that is, as the time of the kr-th event of a Poisson process with rate μ—to obtain, upon conditioning on the number N(W_k) of events of this Poisson process that occur by time W_k, that

    E[S_k⁺] = Σ_{i=0}^{kr−1} ((kr − i)/μ) P{N(W_k) = i}.

Since W_k is gamma with parameters ks and λ, a simple computation yields

    P{N(W_k) = i} = C(ks + i − 1, i) (λ/(λ + μ))^{ks} (μ/(λ + μ))^i,

where C(a + b, a) denotes the binomial coefficient (a + b)!/(a! b!), with a! = ∫_0^∞ e^{−x} x^a dx for nonintegral a. Hence, we have that when r is integral,

    E[D_{n+1}] = Σ_{k=1}^n (1/k) Σ_{i=0}^{kr−1} ((kr − i)/μ) C(ks + i − 1, i) (λ/(λ + μ))^{ks} (μ/(λ + μ))^i.

When s is an integer, we use the identity S_k⁺ = S_k + (−S_k)⁺ to obtain

    E[S_k⁺] = k(E[Y] − E[X]) + E[(Σ_{i=1}^k X_{i+1} − Σ_{i=1}^k Y_i)⁺].

We can now use the preceding analysis, with the roles of the two gamma variables reversed, to obtain

    E[D_{n+1}] = n(r/μ − s/λ) + Σ_{k=1}^n (1/k) Σ_{i=0}^{ks−1} ((ks − i)/λ) C(kr + i − 1, i) (μ/(λ + μ))^{kr} (λ/(λ + μ))^i.

7.2 SOME REMARKS CONCERNING EXCHANGEABLE RANDOM VARIABLES

It is not necessary to assume that the random variables X_1, ..., X_n are independent and identically distributed to obtain the duality relationship. A weaker general condition is that the random variables be exchangeable, where we say that X_1, ..., X_n is exchangeable if X_{i_1}, ..., X_{i_n} has the same joint distribution for all permutations (i_1, ..., i_n) of (1, 2, ..., n).

EXAMPLE 7.2(A) Suppose balls are randomly selected, without replacement, from an urn originally consisting of n balls of which k are white. If we let

    X_i = 1 if the ith selection is white;   X_i = 0 otherwise,

then X_1, ..., X_n will be exchangeable but not independent.

As an illustration of the use of exchangeability, suppose X_1 and X_2 are exchangeable and let f(x) and g(x) be increasing functions. Then for all x_1, x_2,

    (f(x_1) − f(x_2))(g(x_1) − g(x_2)) ≥ 0,

implying that

    E[(f(X_1) − f(X_2))(g(X_1) − g(X_2))] ≥ 0.

But as exchangeability implies that

    E[f(X_1)g(X_1)] = E[f(X_2)g(X_2)],   E[f(X_1)g(X_2)] = E[f(X_2)g(X_1)],

we see upon expanding the above inequality that

    E[f(X_1)g(X_1)] ≥ E[f(X_1)g(X_2)].

Specializing to the case where X_1 and X_2 are independent with the common distribution of X, we have the following.

PROPOSITION 7.2.1

If f and g are both increasing functions, then

    E[f(X)g(X)] ≥ E[f(X)] E[g(X)].

The infinite sequence of random variables X_1, X_2, ... is said to be exchangeable if every finite subsequence X_1, ..., X_n is exchangeable.

EXAMPLE 7.2(B) Let Λ denote a random variable having distribution G, and suppose that, conditional on the event Λ = λ, the random variables X_1, X_2, ... are independent and identically distributed with distribution F_λ; that is,

    P{X_1 ≤ x_1, ..., X_n ≤ x_n | Λ = λ} = Π_{i=1}^n F_λ(x_i).

The random variables X_1, X_2, ... are exchangeable, since

    P{X_1 ≤ x_1, ..., X_n ≤ x_n} = ∫ Π_{i=1}^n F_λ(x_i) dG(λ),

which is symmetric in (x_1, ..., x_n). They are, however, not independent.
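A simulation makes the distinction between exchangeability and independence in Example 7.2(B) concrete. In the sketch below (an added illustration taking G to be the uniform distribution on (0, 1) and F_λ the Bernoulli(λ) distribution), X_1 and X_2 have the same marginal distribution but are positively correlated:

```python
import numpy as np

rng = np.random.default_rng(3)
runs = 500_000
lam = rng.uniform(0.0, 1.0, runs)            # Lambda ~ G = Uniform(0, 1)
X1 = (rng.uniform(size=runs) < lam) * 1.0    # given Lambda, X_i are iid Bernoulli(Lambda)
X2 = (rng.uniform(size=runs) < lam) * 1.0

print("P{X1=1} =", X1.mean(), " P{X2=1} =", X2.mean())   # both about 1/2
print("Cov(X1, X2) =", np.cov(X1, X2)[0, 1])             # about 1/12, not 0
```

That the covariance comes out positive is no accident: here Cov(X_1, X_2) = E[Λ²] − (E[Λ])² = Var(Λ) = 1/12 (compare Problem 7.5).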
There is a famous result known as De Finetti's theorem, which states that every infinite sequence of exchangeable random variables is of the form specified by Example 7.2(B). We will present a proof when the X_i are 0–1 (that is, Bernoulli) random variables.

THEOREM 7.2.2 (De Finetti's Theorem)

To every infinite sequence of exchangeable random variables X_1, X_2, ... taking values either 0 or 1, there corresponds a probability distribution G on [0, 1] such that, for all 0 ≤ k ≤ n,

(7.2.1)    P{X_1 = X_2 = ··· = X_k = 1, X_{k+1} = ··· = X_n = 0} = ∫_0^1 λ^k (1 − λ)^{n−k} dG(λ).

Proof Let m ≥ n. We start by computing the above probability by first conditioning on S_m = Σ_{j=1}^m X_j. This yields

(7.2.2)    P{X_1 = ··· = X_k = 1, X_{k+1} = ··· = X_n = 0}
           = Σ_{j=0}^m P{X_1 = ··· = X_k = 1, X_{k+1} = ··· = X_n = 0 | S_m = j} P{S_m = j}
           = Σ_j [j(j − 1)···(j − k + 1)(m − j)(m − j − 1)···(m − j − (n − k) + 1)] / [m(m − 1)···(m − n + 1)] · P{S_m = j}.

This last equation follows since, given S_m = j, by exchangeability each subset of size j of X_1, ..., X_m is equally likely to be the one consisting of all 1's. If we let Y_m = S_m/m, then (7.2.2) may be written as

(7.2.3)    P{X_1 = ··· = X_k = 1, X_{k+1} = ··· = X_n = 0}
           = E[ (mY_m)(mY_m − 1)···(mY_m − k + 1) · m(1 − Y_m)(m(1 − Y_m) − 1)···(m(1 − Y_m) − (n − k) + 1) / (m(m − 1)···(m − n + 1)) ].

The above is, for large m, roughly equal to E[Y_m^k (1 − Y_m)^{n−k}], and the theorem should follow upon letting m → ∞. Indeed it can be shown, by a result called Helly's theorem, that for some subsequence m′ converging to ∞, the distribution of Y_{m′} converges to a distribution G, and (7.2.3) converges to

    ∫_0^1 λ^k (1 − λ)^{n−k} dG(λ).

Remark De Finetti's theorem is not valid for a finite sequence of exchangeable random variables. For instance, if n = 2, k = 1 in Example 7.2(A), then P{X_1 = 1, X_2 = 0} = P{X_1 = 0, X_2 = 1} = 1/2, which cannot be put in the form (7.2.1).

7.3 USING MARTINGALES TO ANALYZE RANDOM WALKS

Let

    S_n = Σ_{i=1}^n X_i,   n ≥ 1,

denote a random walk. Our first result shows that if the X_i are finite integer-valued random variables, then S_n is recurrent if E[X] = 0.

THEOREM 7.3.1

Suppose X_i can take on only the values 0, ±1, ..., ±M for some M < ∞. Then {S_n, n ≥ 0} is a recurrent Markov chain if, and only if, E[X] = 0.

Proof It is clear that the random walk is transient when E[X] ≠ 0, since it will either converge to +∞ (if E[X] > 0) or to −∞ (if E[X] < 0). So suppose E[X] = 0 and note that this implies that {S_n, n ≥ 1} is a martingale. Let A denote the set of states from −M up to 0—that is, A = {−M, −(M − 1), ..., 0}. Suppose the process starts in state i, where i ≥ 0. For j > i, let A_j denote the set of states

    A_j = {j, j + 1, ..., j + M},

and let N denote the first time that the process is in either A or A_j. By Theorem 6.2.2, E[S_N] = i, and so

    i = E[S_N | S_N ∈ A] P{S_N ∈ A} + E[S_N | S_N ∈ A_j] P{S_N ∈ A_j}
      ≥ −M P{S_N ∈ A} + j(1 − P{S_N ∈ A}).

Hence,

    P{process ever enters A} ≥ P{S_N ∈ A} ≥ (j − i)/(j + M),

and letting j → ∞, we see that

    P{process ever enters A | start at i} = 1,   i ≥ 0.

Now let B = {1, 2, ..., M}. By the same argument we can show that

    P{process ever enters B | start at i} = 1,   i ≤ 0.

Therefore,

    P{process ever enters A ∪ B | start at i} = 1 for all i.

It is easy to see that the above implies that the finite set of states A ∪ B will be visited infinitely often. However, if the process were transient, then any finite set of states would be visited only finitely often. Hence the process is recurrent.

Once again, let S_n = Σ_{i=1}^n X_i denote a random walk, and suppose μ = E[X] ≠ 0. For given A, B > 0, we shall attempt to compute P_A, the probability that S_n reaches a value at least A before it reaches a value less than or equal to −B. To start, let θ ≠ 0 be such that

    E[e^{θX}] = 1.

We shall suppose that such a θ exists (it is usually unique). Since

    Z_n ≡ e^{θS_n} = Π_{i=1}^n e^{θX_i}

is the product of independent unit-mean random variables, it follows that {Z_n} is a martingale with mean 1. Define the stopping time N by

    N = min{n: S_n ≥ A or S_n ≤ −B}.

Since condition (iii) of Theorem 6.2.2 can be shown to be satisfied, we have

    1 = E[Z_N] = E[e^{θS_N}].

Therefore,

(7.3.1)    1 = E[e^{θS_N} | S_N ≥ A] P_A + E[e^{θS_N} | S_N ≤ −B] (1 − P_A).

We can use Equation (7.3.1) to obtain an approximation for P_A as follows. If we neglect the excess (or overshoot past A or −B), we have the approximations

    E[e^{θS_N} | S_N ≥ A] ≈ e^{θA},   E[e^{θS_N} | S_N ≤ −B] ≈ e^{−θB}.

Hence, from (7.3.1),

(7.3.2)    P_A ≈ (1 − e^{−θB}) / (e^{θA} − e^{−θB}).

We can also approximate E[N] by using Wald's equation and then neglecting the excess. That is, using the approximations

    E[S_N | S_N ≥ A] ≈ A,   E[S_N | S_N ≤ −B] ≈ −B,

and since Wald's equation gives E[S_N] = E[N] E[X], we see that

    E[N] ≈ (A P_A − B(1 − P_A)) / E[X].

Using the approximation (7.3.2) for P_A, we obtain

(7.3.3)    E[N] ≈ [A(1 − e^{−θB}) − B(e^{θA} − 1)] / [(e^{θA} − e^{−θB}) E[X]].
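These approximations require the nonzero root θ of E[e^{θX}] = 1, which in general must be found numerically. The sketch below (an added Python illustration, using the two-point step distribution of the example that follows, for which the root is known to be θ = log(q/p)) shows one way to locate it:

```python
import numpy as np
from scipy.optimize import brentq

p, q = 0.4, 0.6   # P{X = 1} = p, P{X = -1} = q, so E[X] = p - q < 0

def mgf_minus_one(theta):
    return p * np.exp(theta) + q * np.exp(-theta) - 1.0

# Since E[X] < 0, the function is negative just above 0 and tends to infinity,
# so the nonzero root lies in a bracket such as (1e-9, 10).
theta = brentq(mgf_minus_one, 1e-9, 10.0)
print(theta, np.log(q / p))   # both equal log(q/p)
```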
EXAMPLE 7.3(A) The Gambler's Ruin Problem. Suppose

    P{X_i = 1} = p = 1 − P{X_i = −1},

and let q = 1 − p. We leave it as an exercise to show that E[(q/p)^{X_i}] = 1, and so e^θ = q/p. If we assume that A and B are integers, there is no overshoot, and so the approximations (7.3.2) and (7.3.3) are exact. Therefore,

    P_A = (1 − (q/p)^{−B}) / ((q/p)^A − (q/p)^{−B}) = ((q/p)^B − 1) / ((q/p)^{A+B} − 1)

and

    E[N] = [A(1 − (q/p)^{−B}) − B((q/p)^A − 1)] / [((q/p)^A − (q/p)^{−B})(2p − 1)].

Suppose now that E[X] < 0 and we are interested in the probability that the random walk ever crosses A.* We will attempt to compute this by using the results obtained so far and then letting B → ∞. Equation (7.3.1) states

(7.3.4)    1 = E[e^{θS_N} | S_N ≥ A] P{process crosses A before −B}
             + E[e^{θS_N} | S_N ≤ −B] P{process crosses −B before A}.

Recall that θ ≠ 0 was defined to be such that E[e^{θX}] = 1. Since E[X] < 0, it can be shown (see Problem 7.9) that θ > 0. Hence, from (7.3.4), we have

    1 ≥ e^{θA} P{process crosses A before −B},

and, upon letting B → ∞, we obtain

(7.3.5)    P{random walk ever crosses A} ≤ e^{−θA}.

* By "crosses A" we mean either hits or exceeds A.

7.4 APPLICATIONS TO G/G/1 QUEUES AND RUIN PROBLEMS

7.4.1 The G/G/1 Queue

For the G/G/1 queue, the limiting distribution of the delay in queue of a customer is, by (7.1.5), given by

(7.4.1)    P{D_∞ ≥ A} = P{S_n ≥ A for some n},

where

    S_n = Σ_{i=1}^n U_i,

U_i = Y_i − X_{i+1}, Y_i is the service time of the ith customer, and X_{i+1} is the interarrival time between the ith and (i + 1)st arrivals. Hence, when E[U] = E[Y] − E[X] < 0, letting θ > 0 be such that

    E[e^{θU}] = E[e^{θ(Y−X)}] = 1,

we have from (7.3.5)

(7.4.2)    P{D_∞ ≥ A} ≤ e^{−θA}.
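The exponential bound (7.4.2) is easy to probe numerically. In the sketch below (an added illustration; normal steps are chosen because the root of E[e^{θU}] = 1 is then available in closed form, namely θ = −2m/σ² for U ~ N(m, σ²) with m < 0, and a long finite horizon stands in for "ever"):

```python
import numpy as np

rng = np.random.default_rng(4)
m, sigma = -0.25, 1.0
theta = -2 * m / sigma**2          # solves E[exp(theta * U)] = 1 for U ~ N(m, sigma^2)

runs, horizon = 10_000, 1_000
S = np.cumsum(rng.normal(m, sigma, (runs, horizon)), axis=1)
M = S.max(axis=1)                  # maximum over a long horizon approximates the all-time maximum

for A in (1.0, 2.0, 4.0):
    print(A, (M >= A).mean(), np.exp(-theta * A))   # estimated tail <= the bound
```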
There is one situation in which we can obtain the exact distribution of D_∞, and that is when the service distribution is exponential. So suppose Y_i is exponential with rate μ. Recall that for N defined as the first time S_n either crosses A or −B, we showed in Equation (7.3.4) that

(7.4.3)    1 = E[e^{θS_N} | S_N ≥ A] P{S_n crosses A before −B}
             + E[e^{θS_N} | S_N ≤ −B] P{S_n crosses −B before A}.

Now S_n = Σ_{i=1}^n (Y_i − X_{i+1}), and let us consider the conditional distribution of S_N given that S_N ≥ A; that is, the conditional distribution of

(7.4.4)    Σ_{i=1}^N (Y_i − X_{i+1})   given that   Σ_{i=1}^N (Y_i − X_{i+1}) ≥ A.

Conditioning on the value of N (say N = n) and on the value of

    X_{n+1} − Σ_{i=1}^{n−1} (Y_i − X_{i+1})   (say it equals c),

note that the conditional distribution given by (7.4.4) is just the conditional distribution of Y_n − c given that Y_n − c ≥ A. But by the lack of memory of the exponential, the conditional distribution of Y given that Y > c + A is just that of c + A plus an exponential with rate μ. Hence the conditional distribution of Y_n − c given that Y_n − c ≥ A is just the distribution of A plus an exponential with rate μ. Since this is true for all n and c, we see that

    E[e^{θS_N} | S_N ≥ A] = E[e^{θ(A+Y)}] = e^{θA} ∫_0^∞ e^{θy} μe^{−μy} dy = (μ/(μ − θ)) e^{θA}.

Hence, from (7.4.3),

    1 = (μ/(μ − θ)) e^{θA} P{S_n crosses A before −B}
      + E[e^{θS_N} | S_N ≤ −B] P{S_n crosses −B before A}.

Since θ > 0, we obtain, by letting B → ∞,

    1 = (μ/(μ − θ)) e^{θA} P{S_n ever crosses A},

and thus from (7.4.1)

    P{D_∞ ≥ A} = ((μ − θ)/μ) e^{−θA},   A ≥ 0.

Summing up, we have shown the following.

THEOREM 7.4.1

For the G/G/1 queue with iid service times Y_i, i ≥ 1, and iid interarrival times X_1, X_2, ..., when E[Y] < E[X],

    P{D_∞ ≥ A} ≤ e^{−θA},   A > 0,

where θ > 0 is such that

    E[e^{θ(Y−X)}] = 1.

In addition, if Y_i is exponential with rate μ, then

    P{D_∞ ≥ A} = ((μ − θ)/μ) e^{−θA},   A > 0,   and so   P{D_∞ = 0} = θ/μ,

where in this case θ is the same root of E[e^{θ(Y−X)}] = 1.
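For exponential service the theorem is exact, and this is easy to check. In the sketch below (an added illustration) the interarrival times are also taken exponential with rate λ < μ, in which case E[e^{θ(Y−X)}] = (μ/(μ − θ))(λ/(λ + θ)) = 1 gives θ = μ − λ, and the theorem reduces to the familiar M/M/1 result P{D_∞ ≥ A} = (λ/μ)e^{−(μ−λ)A}:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, mu = 1.0, 1.5
theta = mu - lam                     # root of E[exp(theta*(Y - X))] = 1 in this case

# approximate D_infinity by iterating D <- max(0, D + Y - X) for a long burn-in
runs, burn = 20_000, 1_000
D = np.zeros(runs)
for _ in range(burn):
    D = np.maximum(0.0, D + rng.exponential(1/mu, runs) - rng.exponential(1/lam, runs))

for A in (0.5, 1.0, 2.0):
    exact = (1 - theta / mu) * np.exp(-theta * A)    # ((mu - theta)/mu) e^{-theta A}
    print(A, exact, (D >= A).mean())
```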
7.4.2 A Ruin Problem

Suppose that claims are made to an insurance company in accordance with a renewal process with interarrival times X_1, X_2, .... Suppose that the values of the successive claims are iid and are independent of the renewal process of when they occur, and let Y_i denote the value of the ith claim. Thus, if N(t) denotes the number of claims by time t, then the total value of the claims made to the insurance company by time t is Σ_{i=1}^{N(t)} Y_i. On the other hand, suppose that the company receives money at a constant rate of c per unit time, c > 0. We are interested in determining the probability that the insurance company, starting with an initial capital of A, will eventually be wiped out. That is, we want

    p(A) = P{Σ_{i=1}^{N(t)} Y_i > ct + A for some t ≥ 0}.

Now it is clear that the company will eventually be wiped out with probability 1 if E[Y] ≥ cE[X]. (Why is that?) So we'll assume E[Y] < cE[X]. It is also fairly obvious that if the company is to be wiped out, that event will occur when a claim occurs (since it is only when claims occur that the insurance company's assets decrease). Now, at the moment after the nth claim occurs, the company's fortune is

    A + c Σ_{i=1}^n X_i − Σ_{i=1}^n Y_i.

Thus the probability we want, call it p(A), is

    p(A) = P{A + c Σ_{i=1}^n X_i − Σ_{i=1}^n Y_i < 0 for some n},

or, equivalently,

    p(A) = P{S_n > A for some n},

where

    S_n = Σ_{i=1}^n (Y_i − cX_i)

is a random walk. From (7.4.1), we see that

(7.4.5)    p(A) = P{D_∞ > A},

where D_∞ is the limiting delay in queue of a G/G/1 queue with interarrival times cX_i and service times Y_i. Thus from Theorem 7.4.1 we have the following.

THEOREM 7.4.2

(i) The probability of the insurance company ever being wiped out, call it p(A), satisfies

    p(A) ≤ e^{−θA},

where θ > 0 is such that

    E[exp{θ(Y_i − cX_i)}] = 1.

(ii) If the claim values are exponentially distributed with rate μ, then

    p(A) = ((μ − θ)/μ) e^{−θA},

where θ is such that E[exp{θ(Y_i − cX_i)}] = 1.

(iii) If the arrival process of claims is a Poisson process with rate λ, then

    p(0) = λE[Y]/c.

Proof Parts (i) and (ii) follow immediately from Theorem 7.4.1. In part (iii), cX will be exponential with rate λ/c; thus from (7.4.5), p(0) will equal the probability that the limiting customer delay in an M/G/1 queue is positive. But this is just the probability that an arriving customer in an M/G/1 queue finds the system nonempty. Since it is a system with Poisson arrivals, the limiting distribution of what an arrival sees is identical to the limiting distribution of the system state at time t. (This is so since the distribution of the system state at time t, given that a customer has arrived at t, is, due to the Poisson arrival assumption, identical to the unconditional distribution of the state at t.) Hence the (limiting) probability that an arrival finds the system nonempty is equal to the limiting probability that the system is nonempty, and that, as we have shown in many different ways, is equal to the arrival rate times the mean service time (see Example 4.3(A) of Chapter 4).
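A quick numerical check is possible in the case covered by both parts (ii) and (iii). In the sketch below (an added illustration using Poisson claim arrivals with rate λ and exponential claim sizes with rate μ, where the root works out to θ = μ − λ/c and (ii) gives p(A) = (λ/(cμ))e^{−θA}; a long but finite horizon stands in for "ever"):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, mu, c = 1.0, 1.0, 1.5          # claim rate, claim-size rate, premium rate (E[Y] < c E[X])
theta = mu - lam / c
runs, n_claims = 2_000, 2_000

# S_k = sum_{i<=k} (Y_i - c X_i); ruin occurs iff the walk ever exceeds A
steps = (rng.exponential(1/mu, (runs, n_claims))
         - c * rng.exponential(1/lam, (runs, n_claims)))
M = np.cumsum(steps, axis=1).max(axis=1)

for A in (0.0, 1.0, 3.0):
    exact = (lam / (c * mu)) * np.exp(-theta * A)
    print(A, exact, (M > A).mean())
```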
7.5 BLACKWELL'S THEOREM ON THE LINE

Let {S_n, n ≥ 1} denote a random walk for which 0 < μ = E[X] < ∞. Let U(t) denote the number of n for which S_n ≤ t. That is,

    U(t) = Σ_{n=1}^∞ I_n,   where I_n = 1 if S_n ≤ t and I_n = 0 otherwise.

If the X_i were nonnegative, then U(t) would just be N(t), the number of renewals by time t. Let u(t) = E[U(t)]. In this section we will prove the analog of Blackwell's theorem.

BLACKWELL'S THEOREM

If μ > 0 and the X_i are not lattice, then

    u(t + a) − u(t) → a/μ as t → ∞, for a > 0.

Before presenting a proof, we will find it useful to introduce the concept of ascending and descending ladder variables. We say that an ascending ladder variable of ladder height S_n occurs at time n if

    S_n > max(S_0, S_1, ..., S_{n−1}),

where S_0 ≡ 0. That is, an ascending ladder variable occurs whenever the random walk reaches a new high; for instance, the initial one occurs the first time the random walk becomes positive. If a ladder variable of height S_n occurs at time n, then the next ladder variable occurs at the first value of n + j for which S_{n+j} > S_n or, equivalently, at the first n + j for which

    X_{n+1} + ··· + X_{n+j} > 0.

Since the X_i are independent and identically distributed, it thus follows that the changes in the random walk between ladder variables are all probabilistic replicas of each other. That is, if N_i denotes the time between the (i − 1)st and ith ladder variables, and H_i denotes the difference between the ith and (i − 1)st ladder heights (with the 0th ladder height taken to be 0), then the random vectors (N_i, H_i), i ≥ 1, are independent and identically distributed.

Similarly, we can define the concept of descending ladder variables by saying that they occur when the random walk hits a new low. Let p (p₊) denote the probability of ever achieving an ascending (descending) ladder variable. That is,

    p = P{S_n > 0 for some n},   p₊ = P{S_n < 0 for some n}.

Now at each ascending (descending) ladder variable there will again be the same probability p (p₊) of ever achieving another one. Hence there will be exactly n ascending [descending] ladder variables, n ≥ 0, with probability pⁿ(1 − p) [p₊ⁿ(1 − p₊)]. Therefore, the number of ascending [descending] ladder variables will be finite and have a finite mean if, and only if, p [p₊] is less than 1. Since E[X] > 0, it follows by the strong law of large numbers that, with probability 1, S_n → ∞ as n → ∞; so, with probability 1, there will be an infinite number of ascending ladder variables but only a finite number of descending ones. Thus p = 1 and p₊ < 1.

We are now ready to prove Blackwell's theorem. The proof will be in parts. First we will argue that u(t + a) − u(t) approaches a limit as t → ∞; then we will show that this limiting value is equal to a constant times a; and finally we will prove the generalization of the elementary renewal theorem, which will enable us to identify this constant as 1/μ.

PROOF OF BLACKWELL'S THEOREM The successive ascending ladder heights constitute a renewal process. Let Y(t) denote the excess at t of this renewal process; that is, t + Y(t) is the first value of the random walk that exceeds t. Now it is easy to see that, given the value of Y(t), say Y(t) = y, the distribution of U(t + a) − U(t) does not depend on t. That is, if we know that the first point of the random walk that exceeds t occurs at a distance y past t, then the number of points in (t, t + a] has the same distribution as the number of points in (0, a] given that the first positive value is y. Hence, for some function g,

    E[U(t + a) − U(t) | Y(t)] = g(Y(t)),

and so, taking expectations,

    u(t + a) − u(t) = E[g(Y(t))].

Now Y(t), being the excess at t of a nonlattice renewal process, converges to a limiting distribution (namely, the equilibrium interarrival distribution). Hence E[g(Y(t))] will converge to E[g(Y(∞))], where Y(∞) has the limiting distribution of Y(t). Thus we have shown the existence of

    lim_{t→∞} [u(t + a) − u(t)].

Now let

    h(a) = lim_{t→∞} [u(t + a) − u(t)].

Then

    h(a + b) = lim_{t→∞} [u(t + a + b) − u(t + b) + u(t + b) − u(t)]
             = lim_{t→∞} [u(t + b + a) − u(t + b)] + lim_{t→∞} [u(t + b) − u(t)]
             = h(a) + h(b),

which implies that for some constant c

(7.5.1)    h(a) = lim_{t→∞} [u(t + a) − u(t)] = ca.

To identify the value of c, let N_t denote the first n for which S_n > t. If the X_i are bounded, say by M, then

    t < S_{N_t} ≤ t + M.

Taking expectations and using Wald's equation (E[N_t] < ∞ by an argument similar to that used in Proposition 7.1.1) yields

    t < E[N_t] μ ≤ t + M,

and so

(7.5.2)    E[N_t]/t → 1/μ.

If the X_i are not bounded, then we can use a truncation argument (exactly as in the proof of the elementary renewal theorem) to establish (7.5.2). Now U(t) can be expressed as

(7.5.3)    U(t) = N_t − 1 + N_t*,

where N_t* is the number of times S_n lands in (−∞, t] after having gone past t (the n < N_t all satisfy S_n ≤ t). Since the random variable N_t* will be no greater than the number of points occurring after time N_t for which the random walk is less than S_{N_t}, it follows that

(7.5.4)    E[N_t*] ≤ E[number of n for which S_n < 0].

We will now argue that the right-hand side of the above is finite, and so from (7.5.2) and (7.5.3)

(7.5.5)    u(t)/t → 1/μ.

The argument that the right-hand side of (7.5.4) is finite runs as follows. We note from Proposition 7.1.1 that E[N] < ∞ when N is the first value of n for which S_n > 0. At time N there is a positive probability 1 − p₊ that no future value of the random walk will ever fall below S_N. If a future value does fall below S_N, then, again by an argument similar to that used in Proposition 7.1.1, the expected additional time until the random walk once more becomes positive is finite.
At that point there will again be a positive probability 1 − p₊ that no future value falls below the present positive one, and so on. We can use this as the basis of a proof showing that

    E[number of n for which S_n < 0] ≤ E[N | X_1 < 0] / (1 − p₊) < ∞.

Thus we have shown (7.5.5). We will now complete our proof by appealing to (7.5.1) and (7.5.5). From (7.5.1) we have

    u(i + 1) − u(i) → c as i → ∞,

implying that

    (1/n) Σ_{i=1}^n [u(i + 1) − u(i)] → c as n → ∞,

or, equivalently,

    [u(n + 1) − u(1)]/n → c,

which, from (7.5.5), implies that c = 1/μ, and the proof is complete.

Remark The proof given lacks rigor in one place. Namely, even though the distribution of Y(t) converges to that of Y(∞), it does not necessarily follow that E[g(Y(t))] converges to E[g(Y(∞))]. We should have proven this convergence directly.

PROBLEMS

7.1. Consider the following model for the flow of water in and out of a dam. Suppose that, during day n, Y_n units of water flow into the dam from outside sources such as rainfall and river flow. At the end of each day water is released from the dam according to the following rule: if the water content of the dam is greater than a, then the amount a is released; if it is less than or equal to a, then the total contents of the dam are released. The capacity of the dam is C, and once at capacity any additional water that attempts to enter the dam is assumed lost. Thus, for instance, if the water level at the beginning of day n is x, then the level at the end of the day (before any water is released) is min(x + Y_n, C). Let S_n denote the amount of water in the dam immediately after the water has been released at the end of day n. Assuming that the Y_n, n ≥ 1, are independent and identically distributed, show that {S_n, n ≥ 1} is a random walk with reflecting barriers at 0 and C − a.

7.2. Let X_1, ..., X_n be equally likely to be any of the n! permutations of (1, 2, ..., n). Argue that

7.3. For the simple random walk compute the expected number of visits to state k.

7.4. Let X_1, X_2, ..., X_n be exchangeable. Compute E[X_1 | X_(1), X_(2), ..., X_(n)], where X_(1) ≤ X_(2) ≤ ··· ≤ X_(n) are the X_i in ordered arrangement.

7.5. If X_1, X_2, ... is an infinite sequence of exchangeable random variables with E[X_1²] < ∞, show that Cov(X_1, X_2) ≥ 0. (Hint: Look at Var(Σ_1^n X_i).) Give a counterexample when the set of exchangeable random variables is finite.

7.6. An ordinary deck of cards is randomly shuffled and then the cards are exposed one at a time. At some time before all the cards have been exposed you must say "next," and if the next card exposed is a spade then you win, and if not then you lose. For any strategy, show that at the moment you call "next" the conditional probability that you win is equal to the conditional probability that the last card is a spade. Conclude from this that the probability of winning is 1/4 for all strategies.

7.7. Argue that the random walk for which X_i only assumes the values 0, ±1, ..., ±M and E[X_i] = 0 is null recurrent.

7.8. Let S_n, n ≥ 0, denote a random walk for which the X_i are not identically 0. Let, for A > 0, B > 0,

    N = min{n: S_n ≥ A or S_n ≤ −B}.

Show that E[N] < ∞. (Hint: Argue that there exists a value k such that P{S_k > A + B} > 0. Then show that E[N] ≤ kE[G], where G is an appropriately defined geometric random variable.)

7.9. Use Jensen's inequality, which states that E[f(X)] ≥ f(E[X]) whenever f is convex, to prove that if θ ≠ 0, E[X] < 0, and E[e^{θX}] = 1, then θ > 0.

7.10. In the insurance ruin problem of Section 7.4 explain why the company will eventually be ruined with probability 1 if E[Y] ≥ cE[X].
7.11. In the ruin problem of Section 7.4, let F denote the interarrival distribution of claims and let G be the distribution of the size of a claim. Show that p(A), the probability that a company starting with A units of assets is ever ruined, satisfies

    p(A) = ∫_0^∞ ∫_0^{A+ct} p(A + ct − x) dG(x) dF(t) + ∫_0^∞ Ḡ(A + ct) dF(t),

where Ḡ = 1 − G.

7.12. For a random walk with μ = E[X] > 0 argue that, with probability 1,

    u(t)/t → 1/μ,

where u(t) equals the number of n for which 0 ≤ S_n ≤ t.

7.13. Let S_n = Σ_1^n X_i be a random walk and let λ_i, i > 0, denote the probability that a ladder height equals i—that is, λ_i = P{first positive value of S_n equals i}.
(a) Show that if

    P{X_i = −1} = q,   P{X_i = j} = a_j, j ≥ 1,

then the λ_i satisfy

    λ_i = a_i + q(λ_1 λ_i + λ_{i+1}),   i ≥ 1.

(b) If P{X_i = j} = 1/5, j = −2, −1, 0, 1, 2, show that

    λ_1 = (1 + √5)/(3 + √5),   λ_2 = 2/(3 + √5).

7.14. Let S_n, n ≥ 0, denote a random walk in which X_i has distribution F. Let G(t, s) denote the probability that the first value of S_n that exceeds t is less than or equal to t + s. That is,

    G(t, s) = P{first sum exceeding t is ≤ t + s}.

Show that

    G(t, s) = F(t + s) − F(t) + ∫_{−∞}^t G(t − y, s) dF(y).

REFERENCES

Reference 5 is the standard text on random walks. Useful chapters on this subject can be found in References 1, 2, and 4. Reference 3 should be consulted for additional results both on exchangeable random variables and on some special random walks. Our proof of Spitzer's identity is from Reference 1.

1. S. Asmussen, Applied Probability and Queues, Wiley, New York, 1985.
2. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes, Methuen, London, 1965.
3. B. de Finetti, Theory of Probability, Vols. I and II, Wiley, New York, 1970.
4. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New York, 1966.
5. F. Spitzer, Principles of Random Walk, Van Nostrand, Princeton, NJ, 1964.

CHAPTER 8

Brownian Motion and Other Markov Processes

8.1 INTRODUCTION AND PRELIMINARIES

Let us start by considering the symmetric random walk, which in each time unit is equally likely to take a unit step either to the left or to the right. Now suppose that we speed up this process by taking smaller and smaller steps in smaller and smaller time intervals. If we go to the limit in the correct manner, what we obtain is Brownian motion.

More precisely, suppose that each Δt time units we take a step of size Δx either to the left or to the right with equal probabilities. If we let X(t) denote the position at time t, then

(8.1.1)    X(t) = Δx (X_1 + ··· + X_{⌊t/Δt⌋}),

where

    X_i = +1 if the ith step of length Δx is to the right, and X_i = −1 if it is to the left,

and where the X_i are assumed independent with P{X_i = 1} = P{X_i = −1} = 1/2. Since E[X_i] = 0 and Var(X_i) = E[X_i²] = 1, we see from (8.1.1) that

(8.1.2)    E[X(t)] = 0,   Var(X(t)) = (Δx)² ⌊t/Δt⌋.

We shall now let Δx and Δt go to 0. However, we must do it in a way that keeps the resulting limiting process nontrivial. (For instance, if we let Δx = Δt and then let Δt → 0, then from the above we see that E[X(t)] and Var(X(t)) would both converge to 0, and thus X(t) would equal 0 with probability 1.) If we let Δx = c√Δt for some positive constant c, then from (8.1.2) we see that as Δt → 0,

    E[X(t)] = 0,   Var(X(t)) → c²t.
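This limiting construction is easy to visualize numerically. The sketch below (an added illustration with c = 1; the grid size and sample count are arbitrary choices) builds paths from ±√Δt steps and checks that X(1) has mean near 0 and variance near 1:

```python
import numpy as np

rng = np.random.default_rng(7)
dt, T, runs = 1e-3, 1.0, 5_000
n = int(T / dt)

# each step is +/- sqrt(dt) with equal probability; X(T) is the sum of the steps
steps = np.sqrt(dt) * rng.choice([-1.0, 1.0], size=(runs, n))
XT = steps.sum(axis=1)

print("mean:", XT.mean(), " variance:", XT.var())   # approximately 0 and T = 1
```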
We now list some intuitive properties of this limiting process, obtained by taking Δx = c√Δt and then letting Δt → 0. From (8.1.1) and the central limit theorem we see that:

(i) X(t) is normal with mean 0 and variance c²t.

In addition, as the changes of value of the random walk in nonoverlapping time intervals are independent, we have:

(ii) {X(t), t ≥ 0} has independent increments.

Finally, as the distribution of the change in position of the random walk over any time interval depends only on the length of that interval, it would appear that:

(iii) {X(t), t ≥ 0} has stationary increments.

We are now ready for the following definition.

Definition A stochastic process {X(t), t ≥ 0} is said to be a Brownian motion process if:
(i) X(0) = 0;
(ii) {X(t), t ≥ 0} has stationary independent increments;
(iii) for every t > 0, X(t) is normally distributed with mean 0 and variance c²t.

The Brownian motion process, sometimes called the Wiener process, is one of the most useful stochastic processes in applied probability theory. It originated in physics as a description of Brownian motion. This phenomenon, named after the English botanist Robert Brown who discovered it, is the motion exhibited by a small particle that is totally immersed in a liquid or gas. Since its discovery, the process has been used beneficially in such areas as statistical testing of goodness of fit, analyzing the price levels on the stock market, and quantum mechanics.

The first explanation of the phenomenon of Brownian motion was given by Einstein in 1905. He showed that Brownian motion could be explained by assuming that the immersed particle was continually being subjected to bombardment by the molecules of the surrounding medium. However, the above concise definition of the stochastic process underlying Brownian motion was given by Wiener in a series of papers originating in 1918.

When c = 1, the process is often called standard Brownian motion. As any Brownian motion can always be converted to the standard process by looking at X(t)/c, we shall suppose throughout that c = 1.

The interpretation of Brownian motion as the limit of the random walks (8.1.1) suggests that X(t) should be a continuous function of t. This turns out to be the case, and it may be proven that, with probability 1, X(t) is indeed a continuous function of t. This fact is quite deep, and no proof shall be attempted. Also, we should note in passing that while the sample path X(t) is always continuous, it is in no way an ordinary function. For, as we might expect from its limiting random walk interpretation, X(t) is always pointy and thus never smooth; in fact, it can be proven (though it is deep) that, with probability 1, X(t) is nowhere differentiable.

The independent increment assumption implies that the change in position between time points s and t + s—that is, X(t + s) − X(s)—is independent of all process values before time s. Hence

    P{X(t + s) ≤ a | X(s) = x, X(u), 0 ≤ u < s}
      = P{X(t + s) − X(s) ≤ a − x | X(s) = x, X(u), 0 ≤ u < s}
      = P{X(t + s) − X(s) ≤ a − x}
      = P{X(t + s) ≤ a | X(s) = x},

which states that the conditional distribution of a future state X(t + s), given the present X(s) and the past X(u), 0 ≤ u < s, depends only on the present. A process satisfying this condition is called a Markov process.

Since X(t) is normal with mean 0 and variance t, its density function is given by

    f_t(x) = (1/√(2πt)) e^{−x²/(2t)}.

From the stationary independent increment assumption, it easily follows that the joint density of X(t_1), ..., X(t_n) is given by

(8.1.3)    f(x_1, ..., x_n) = f_{t_1}(x_1) f_{t_2 − t_1}(x_2 − x_1) ··· f_{t_n − t_{n−1}}(x_n − x_{n−1}).
Since X(t) is normal with mean 0 and variance t, its density function is given by 1 2 !rex) = --e- X 121 V21ii From the stationary independent increment assumption, it easily follows that the joint density of X(t l ), • , X(t,,) is given by INTRODUCTION AND PRELIMINARIES 359 By using (8.1 3), we may compute in principle any desired probabilities For instance suppose we require the conditional distribution of X(s) given that X(t) = B, where s < t. The conditional density is + ( I B) = t(x)/'-s(B - x) lslt x /,(B) _ {-x 2 (B-X)2} -K)exp b- 2(I-s) _ K {_ I(X - BSII)2} - 2 exp 2 ( ) . s 1- s Hence the conditional distribution of X(s) given that X(I) = B is, for s < I, normal with mean and variance given by (81.4a) (814b) E[X(s) I X(t) = B] = Bslt, Var(X(s) I X(I) = B) = S(I - s)ll. It is interesting to note that the conditional variance of X(s), given that X(I) = B, s < t, does not depend on B! That is, if we let sit = a,O < a < 1, then the conditional distribution of X(s) given X(t) is normal with mean aX(I) and variance a(1 - a)l. It also follows from (81.3) that X(t),. ,X(tn) has a joint distribution that is multivariate normal, and thus a Brownian motion process is a Gaussian process where we have made use of the following definition. Definition A stochastic process {X(t), t 2: O} is called a Gaussian process if X(t,). , X(tn) has a multivariate normal distnbution for all t 1 , , tn Since a multivariate normal distribution is completely determined by the marginal mean values and the covariance values, it follows that Brownian motion could also be defined as a Gaussian process having E[X(I)] = 0 and, for s ::; I, Cov(X(s), X(t» = Cov(X(s), X(s) + X(I) - X(s» = Cov(X(s), X(s» + Cov(X(s), X(I) - X(s» =s, where the last equality follows from independent increments and Var(X(s» = s 360 BROWNIAN MOTION AND OTHER MARKOV PROCESSES Let {X(t), t ;::= O} be a Brownian motion process and consider the process values between 0 and 1 conditional on X(1) = O. That is, consider the condi- tional stochastic process {X( t), 0 ~ 1 :s; 11 X(1) = O}. By the same argument as we used in establishing (8 1.4), we can show that this process, known as the Brownian Bridge (as it is tied down both at 0 and l), is a Gaussian process Let us compute its covariance function. Since, from (8 1.4), E[X(s) I X(I) = 0] = 0 for s < 1, we have that, for s < t < 1, Cov[(X(s), X(t» I X(I) = 0] = E[X(s)X(t) I X(1) = 0] = E[E[X(s)X(I)IX(I), X(1) = O]IX(1) = 0] = E[X(t)E[X(s)IX(t)]IX(1) = 0] = E [x(t) 7 X(I) I X(1) = 0] s = -1(1 - I) t = s(1 - I). (by (8 l.4b» (by (8 1.4a» Thus the Brownian Bridge can be defined as a Gaussian process with mean value 0 and covariance function s(1 - t), s :5 t. This leads to an alternative approach to obtaining such a process. PROPOSITION 8.1.1 If {X(t), t ~ O} is Brownian motion, then {Z(t), 0 :5 t:5 1} is a Brownian Bridge process when Z(t) = X(t) - tX(1) Proof Since it is immediate that {Z(t), t ~ O} is a Gaussian process, all we need venfy is that E[Z(t)J = 0 and Cov(Z(s), Z(t)) = s(1 - t) when s:5 t The former is immediate and the latter follows from Cov(Z(s), Z(t)) = Cov(X(s) - sX(l), X(t) - tX(1)) = Cov(X(s), X(t)) - t Cov(X(s), X(1)) - s Cov(X(1), X(t)) + st Cov(X( 1), X(1)) = s - st - st + st = s(1 - t), and the proof is complete INTRODUcrlON AND PRELIMINARIES 361 The Brownian Bridge plays a pivotal role in the study of empirical distribu- tion functions. 
The Brownian bridge plays a pivotal role in the study of empirical distribution functions. To see this, let X_1, X_2, ... be independent uniform (0, 1) random variables and define N_n(s), 0 ≤ s ≤ 1, as the number of the first n that are less than or equal to s. That is,

    N_n(s) = Σ_{i=1}^n I_i(s),   where I_i(s) = 1 if X_i ≤ s and I_i(s) = 0 otherwise.

The random function F_n(s) = N_n(s)/n, 0 ≤ s ≤ 1, is called the empirical distribution function. Let us study its limiting properties as n → ∞.

Since N_n(s) is a binomial random variable with parameters n and s, it follows from the strong law of large numbers that, for fixed s,

    F_n(s) → s as n → ∞, with probability 1.

In fact, it can be proven (the so-called Glivenko–Cantelli theorem) that this convergence is uniform in s. That is, with probability 1,

    sup_{0≤s≤1} |F_n(s) − s| → 0 as n → ∞.

It also follows, by the central limit theorem, that for fixed s, √n(F_n(s) − s) has an asymptotic normal distribution with mean 0 and variance s(1 − s). That is,

    P{α_n(s) ≤ x} → (1/√(2πs(1 − s))) ∫_{−∞}^x exp{−y²/(2s(1 − s))} dy,

where

    α_n(s) = √n(F_n(s) − s).

Let us study the limiting properties, as n → ∞, of the stochastic process {α_n(s), 0 ≤ s ≤ 1}. To start with, note that, for s < t, the conditional distribution of N_n(t) − N_n(s), given N_n(s), is just the binomial distribution with parameters n − N_n(s) and (t − s)/(1 − s). Hence it would seem, using the central limit theorem, that the asymptotic joint distribution of α_n(s) and α_n(t) should be a bivariate normal distribution; in fact, similar reasoning makes it plausible to expect the limiting process (if one exists) to be a Gaussian process. To see which one, we need to compute E[α_n(s)] and Cov(α_n(s), α_n(t)). Now

    E[α_n(s)] = 0,

and, for 0 ≤ s < t ≤ 1,

    Cov(α_n(s), α_n(t)) = n Cov(F_n(s), F_n(t))
      = (1/n) Cov(N_n(s), N_n(t))
      = (1/n) (E[E[N_n(s)N_n(t) | N_n(s)]] − n²st)
      = s(1 − t),

where the last equality follows, upon simplification, from the fact that N_n(s) is binomial with parameters n and s together with the conditional distribution of N_n(t) − N_n(s) noted above. Hence it seems plausible (and indeed can be rigorously shown) that the limiting stochastic process is a Gaussian process having mean value function 0 and covariance function s(1 − t), 0 ≤ s ≤ t ≤ 1. But this is just the Brownian bridge process.

Whereas the above analysis has been done under the assumption that the X_i have a uniform (0, 1) distribution, its scope can be widened by noting that, when the distribution function F is continuous, the random variables F(X_i) are uniformly distributed over (0, 1). For instance, suppose we want to study the limiting distribution of

    √n sup_x |F_n(x) − F(x)|

for an arbitrary continuous distribution F, where F_n(x) is the proportion of the first n of the X_i—independent random variables each having distribution F—that are less than or equal to x. From the preceding it follows that if we let

    α_n(s) = √n[(number of X_i, i = 1, ..., n, with F(X_i) ≤ s)/n − s]
           = √n[(number of X_i, i = 1, ..., n, with X_i ≤ F^{−1}(s))/n − s]
           = √n[F_n(F^{−1}(s)) − s]
           = √n[F_n(y_s) − F(y_s)],

where y_s = F^{−1}(s), then {α_n(s), 0 ≤ s ≤ 1} converges to the Brownian bridge process. Hence the limiting distribution of √n sup_x (F_n(x) − F(x)) is that of the supremum (or maximum, by continuity) of the Brownian bridge. Thus,

    lim_{n→∞} P{√n sup_x (F_n(x) − F(x)) ≤ a} = P{max_{0≤t≤1} Z(t) ≤ a},

where {Z(t), 0 ≤ t ≤ 1} is the Brownian bridge process.
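This convergence underlies the Kolmogorov–Smirnov test. The sketch below (an added illustration, with arbitrary sample sizes and a discretized bridge standing in for the continuous one) compares the simulated distribution of the one-sided statistic √n sup_s (F_n(s) − s) for uniform samples with that of the maximum of a Brownian bridge:

```python
import numpy as np

rng = np.random.default_rng(9)
n, runs = 1_000, 5_000

# sqrt(n) * sup_s (F_n(s) - s): on the order statistics, the sup is max_i (i/n - U_(i))
U = np.sort(rng.uniform(size=(runs, n)), axis=1)
stat = np.sqrt(n) * ((np.arange(1, n + 1) / n) - U).max(axis=1)

# maximum of a Brownian bridge on a fine grid, via Z(t) = X(t) - t X(1)
m = 1_000
dB = rng.normal(0.0, np.sqrt(1 / m), (runs, m))
X = np.cumsum(dB, axis=1)
t = np.arange(1, m + 1) / m
Zmax = (X - t * X[:, -1:]).max(axis=1)

print(np.quantile(stat, [0.5, 0.9, 0.99]))
print(np.quantile(Zmax, [0.5, 0.9, 0.99]))   # the two distributions nearly coincide
```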
8.2 HITTING TIMES, MAXIMUM VARIABLE, AND ARC SINE LAWS

Let us denote by T_a the first time the Brownian motion process hits a. When a > 0, we will compute P{T_a ≤ t} by considering P{X(t) ≥ a} and conditioning on whether or not T_a ≤ t. This gives

(8.2.1)    P{X(t) ≥ a} = P{X(t) ≥ a | T_a ≤ t} P{T_a ≤ t} + P{X(t) ≥ a | T_a > t} P{T_a > t}.

Now if T_a ≤ t, then the process hits a at some point in [0, t] and, by symmetry, it is just as likely to be above a as below a at time t. That is,

    P{X(t) ≥ a | T_a ≤ t} = 1/2.

Since the second right-hand term of (8.2.1) is clearly equal to 0 (by continuity, the process value cannot be greater than a without having yet hit a), we see that

(8.2.2)    P{T_a ≤ t} = 2P{X(t) ≥ a} = (2/√(2πt)) ∫_a^∞ e^{−x²/(2t)} dx
                      = (2/√(2π)) ∫_{a/√t}^∞ e^{−y²/2} dy,   a > 0.

Hence, we see that

    P{T_a < ∞} = lim_{t→∞} P{T_a ≤ t} = (2/√(2π)) ∫_0^∞ e^{−y²/2} dy = 1.

In addition, using (8.2.2), we obtain

    E[T_a] = ∫_0^∞ P{T_a > t} dt = ∞.

Thus T_a, though finite with probability 1, has an infinite expectation. That is, with probability 1, the Brownian motion process will eventually hit a, but its mean time is infinite. (Is this intuitive? Think of the symmetric random walk.)

For a < 0, the distribution of T_a is, by symmetry, the same as that of T_{−a}. Hence, from (8.2.2) we obtain

    P{T_a ≤ t} = (2/√(2π)) ∫_{|a|/√t}^∞ e^{−y²/2} dy.

Another random variable of interest is the maximum value the process attains in [0, t]. Its distribution is obtained as follows: for a > 0,

    P{max_{0≤s≤t} X(s) ≥ a} = P{T_a ≤ t}   (by continuity)
                            = 2P{X(t) ≥ a}   (from (8.2.2))
                            = (2/√(2π)) ∫_{a/√t}^∞ e^{−y²/2} dy.

Let O(t_1, t_2) denote the event that the Brownian motion process takes on the value 0 at least once in the interval (t_1, t_2). To compute P{O(t_1, t_2)}, we condition on X(t_1) as follows:

    P{O(t_1, t_2)} = ∫_{−∞}^∞ P{O(t_1, t_2) | X(t_1) = x} f_{t_1}(x) dx.

Using the symmetry of Brownian motion about the origin and its path continuity gives

    P{O(t_1, t_2) | X(t_1) = x} = P{T_{|x|} ≤ t_2 − t_1}.

Hence, using (8.2.2), we obtain

    P{O(t_1, t_2)} = ∫_{−∞}^∞ P{T_{|x|} ≤ t_2 − t_1} f_{t_1}(x) dx.

The above integral can be explicitly evaluated to yield

    P{O(t_1, t_2)} = 1 − (2/π) arc sine √(t_1/t_2).

Hence we have shown the following.

PROPOSITION 8.2.1

For 0 < x < 1,

    P{no zeros in (xt, t)} = (2/π) arc sine √x.

Remarks Proposition 8.2.1 does not surprise us. For we know, by the remark at the end of Section 3.7 of Chapter 3, that for the symmetric random walk

    P{no zeros in (nx, n)} ≈ (2/π) arc sine √x,

with the approximation becoming exact as n → ∞. Since Brownian motion is the limiting case of the symmetric random walk when the jumps come quicker and quicker (and have smaller and smaller sizes), it seems intuitive that for Brownian motion the above approximation should hold with equality. Proposition 8.2.1 verifies this.

The other arc sine law of Section 3.7 of Chapter 3—namely, that the proportion of time the symmetric random walk is positive obeys, in the limit, the arc sine distribution—can also be shown to be exactly true for Brownian motion. That is, the following proposition can be proven.

PROPOSITION 8.2.2

For Brownian motion, let A(t) denote the amount of time in [0, t] the process is positive. Then, for 0 < x < 1,

    P{A(t)/t ≤ x} = (2/π) arc sine √x.
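The reflection argument behind (8.2.2) can be probed with a discretized Brownian path. A minimal sketch (an added illustration; the finite grid slightly undercounts hits, so the estimate sits a little below the exact value):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
a, t, m, runs = 1.0, 1.0, 1_000, 10_000

dB = rng.normal(0.0, np.sqrt(t / m), (runs, m))
X = np.cumsum(dB, axis=1)

hit = (X.max(axis=1) >= a).mean()            # estimate of P{T_a <= t}
exact = 2 * (1 - norm.cdf(a / np.sqrt(t)))   # 2 P{X(t) >= a}, from (8.2.2)
print(hit, exact)
```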
Define Z(t) by { X(t) Z(t) = x ift < T, then {Z(t), t ::> O} is Brownian motion that when it hits x remains there forever The random variable Z(t) has a distribution that has both discrete and continuous parts The discrete part is P{Z(t) x} P{T x <: t} = 2 f: e- y2J21 dy (from (82.2». VARIATIONS ON BROWNIAN MOTION 367 For the continuous part we have for y < x (831) P{Z(t) < y} = P {X(t) y, X(s) < x} = P{X(t) y} - P {X(t) < y, X(s) > x} We compute the second term on the right-hand side as follows (83.2) P {X(t) y, X(s) > x} = p{X(t) X(s) > x} P X(s) >x} Now the event that maXO';S:5d X(s) > x is equivalent to the event that Tx < t; and if the Brownian motion process hits x at time T x , where Tx < t, then in order for it to be below y at time t it must decrease by at least x - y in the additional time t - Tx By symmetry it is just as likely to increase by that amount Hence, (8.33) P {X(t) X(s) > x} = P {X(t) 2x - X(s) > x} From (83.2) and (833) we have P {X(t) y, X(s) >x} = p{X(t) 2x - y, X(s) > x} = P{X(t) :> 2 x - y} (since y < x), and from (8.3.1) P{Z(t) y} = P{X(t) y} - P{X(t) 2x - y} = P{X(t) y} - P{X(t) Y - 2x} =-- e- u 2rdu 1 JY 2, vI2m y-2x (by symmetry of the normal distribution) 368 BROWNIAN MOTION AND OTHER MARKOV PROCESSES 8.3.2 Brownian Motion Reflected at the Origin If {X(t), t :> O} is Brownian motion, then the process {Z(t), t :> O}, where Z(t) = IX(t)!. t ~ 0, is called Brownian motion reflected at the origin. The distribution of Z(t) is easily obtained For y > 0, P{Z(t) ~ y} = P{X(t) ~ y} P{X(t) ~ - y} = 2P{X(t) <: y} 1 where the next to last equality follows since X(t) is normal with mean O. The mean and variance of Z(t) are easily computed and (8.34) E[Z(t)] = V2t!TT, Var(Z(t)) = (1 -;) t 8.3.3 Geometric Brownian Motion If {X(t), t ~ O} is Brownian motion, then the process {yet), t ~ O}, defined by yet) = e XUI , is called geometric Brownian motion Since X(t) is normal with mean 0 and variance t, its moment generating function is given by and so E[Y(t)] = E[eX(II] e l12 Var(Y(/)) = E[y2(/)] (E[Y(t)])2 E[e 2X (/I] e 1 Geometric Brownian motion is useful in modeling when one thinks that the percentage changes (and not the absolute changes) are independent and identically distributed For instance, suppose that Yen) is the price of some commodity at time n Then it might be reasonable to suppose that yen)! Yen 1) (as opposed to Y" - Y,,-l) are independent and identically distrib- VARIATIONS ON BROWNIAN MOTION 369 uted. Letting Xn = Y(n)/Y(n - 1), then, taking Y(O) = 1, and so n log Y(n) = L log X" ,= I Since the X, are independent and identically distributed, log Y(n), when suit- ably normalized, would be approximately Brownian motion, and so {Y(n)} would be approximately geometric Brownian motion. EXAMPLE 8.3(A) The Value of a Stock Option. Suppose one has the option of purchasing, at a time T in the future, one unit of a stock at a fixed price K. Supposing that the present value of the stock is y and that its price varies according to geometric Brownian motion, let us compute the expected worth of owning the option. 
As the option will be exercised if the stocks pnce at time T is K or higher, its expected worth is E[max(Y(T) - K,O)] = J: P{Y(T) - K > a} da = J: P{ ye X( T) - K > a} da J a> { K + a} = a P X(T) > log-y- da 1 Ja> Ja> 2 = -- e- X f2T dx da V2rrT a log[(K + a)ly] 8.3.4 Integrated Brownian Motion If {X(t), t ~ O} is Brownian motion, then the process {Z(t), t ~ O} defined by (83.5) Z(t) = t X(s) ds is called integrated Brownian motion As an illustration of how such a process may arise in practice, suppose we are interested in modeling the price of a commodity throughout time Letting Z(t) denote the price at t, then, rather than assuming that {Z(t)} is Brownian motion (or geometric Brownian motion), we might want to assume that the rate of change of Z(t) follows a Brownian motion. For instance, we might suppose that the rate of change of the commod- 370 BROWNIAN MOTION AND OTHER MARKOV PROCESSES ity's price is the current inflation rate, which is imagined to vary as Brownian motion Hence, d dl Z(I) = X(I) or Z(I) = Z(O) + J ~ X(s) ds It follows from the fact that Brownian motion is a Gaussian process that {Z(I), 1 ~ O} is also Gaussian To prove this first recall that WI, .. , Wn are said to have a joint normal distribution if they can be represented as m W, = 2: a"U" ,=1 i = 1, ... , n, where U"j = 1, .. ,m are independent normal random variables. From this it follows that any set of partial sums of WI,. ,W n are also jointly normal. The fact that Z(II), , Z(tn) is jointly normal can then be shown by writing the integral in (8.3 5) as a limit of approximating sums. Since {Z(I), 1 ~ O} is Gaussian, it follows that its distribution is characterized by its mean value and covariance function. We now compute these' For s ~ I, (8.3 6) £[Z(I)] = £ [t X(s) dSJ = t £[X(s)] ds =0. Cov[Z(s), Z(I)] = £[Z(s)Z(I)] = £ [tX(Y)d Y tX(u)du ] = £ [ J ~ t X(y)X(u) dy dUJ = t t £[X(y)X(u)] dy du = t t min(y, u) dy du = t (1: y dy + t u dY) du = S2 ( ~ - ~ ) VARIATIONS ON BROWNIAN MOTION 371 The process {Z(t), t ;::: O} defined by (8 3 5) is not a Markov process. (Why not?) However, the vector process {(Z(t), X(t», t :> O} is a Markov process. (Why?) We can compute the joint distribution of Z(t), X(t) by first noting, by the same reasoning as before, that they are jointly normal. To compute their covariance we use (8 3 6) as follows' Cov(Z(t), Z(t) - Z(t - h» = Cov(Z(t), Z(t» - Cov(Z(t), Z(t - h» However, and so = ~ - (t h)2 [i t 6 h] t 2 hl2 + o(h) Cov(Z(t), Z(I) - Z(t h» Cov (Z(I), J:-h Xes) tis) = Cov(Z(t), hX(I) + o(h» = h Cov(Z(t), X(I» + o(h), Cov(Z(t), X(t» = t 1 12. Hence, Z(t), X(t) has a bivariate normal distribution with £[Z(I)] = £[X(t)] 0, Cov(X(t), Z(t» = t 2 /2. Another form of integrated Brownian motion is obtained if we suppose that the percent rate of change of price follows a Brownian motion process. That is, if Wet) denotes the price at t, then :t Wet) = X(t) Wet) or Wet) = W(O) exp {f: Xes) dS}, where {X(t)} is Brownian motion. Taking W(O) = 1, we see that 372 BROWNIAN MOTION AND 0 fHER MARKOV PROCESSES Since Z(I) is normal with mean 0 and variance 12(112 - 116) ;:::: 1 3 /3, we see that E[W(I)] = exp{16/6} 8.4 BROWNIAN MOTION WITH DRIFT We say that {X(I), I O} is a Brownian motion process with drift coefficient j.t if. 
(i) X(O) = 0, (ii) {X(I), I ::> O} has stationary and independent increments; (iii) X(I) is normally distributed with mean j.tl and variance I We could also define it by saying that X(I) = B(I) + j.tl, where {B(I)} is standard Brownian motion Thus a Brownian motion with drift is a process that tends to drift off at a rate j.t. It can, as Brownian motion, also be defined as a limit of random walks. To see this, suppose that for every dl time unit the process either goes one step of length dx in the positive direction or in the negative direction, with respective probabilities p and 1 - P If we let x, = { 1 -1 if the ith step is in the positive direction otherwise, then X(I), the position at time I, is Now E[X(I)] = dx [II dl](2p - 1), Var(X(I» = (dX)2[lldl][1 - (2p - 1)2]. Thus if we let dx ;:::: = + and let dl -? 0, then E[X(I)] -? j.tl, Var(X(I» -? I, and indeed {X(I)} converges to Brownian motion with drift coefficient j.t We now compute some quantities of interest for this process. We start with the probability that the process will hit A before -B, A, B > 0 Let P(x) denote this probability conditionally on the event that we are now at x, - B < x < A That is, P(x) = P{X(I) hits A before - BIX(O) = x} BROWNIAN MOTION WITH DRIFT 373 We shall obtain a differential equation by conditioning on Y = X(h) - X(O), the change in the process between time 0 and time h This yields P(x) = E[ P(x + Y)] + o(h), where o(h) in the above refers to the probability that the process would have already hit one of the barriers, A or -8, by time h. Proceeding formally and assuming that P( y) has a Taylor series expansion about x yields P(x) = E[P(x) + P'(x)Y + P"(x)y 2 /2 + .. ] + o(h) Since Y is normal with mean J.th and variance h, we obtain (841) 2h 2 + h P(x) = P(x) + P'(x)J.th + P"(x) J.t 2 + o(h) since the sum of the means of all the terms of differential order greater than 2 is o(h). From (84.1) we have P '( ) P"(x) _ o(h) x J.t + -2-- h' and letting h ~ 0, P '( ) P"(x) _ 0 xJ.t+--- 2 Integrating the above we obtain or, equivalently, or or, upon integration, Thus 374 BROWNIAN MOTION AND OTHER MARKOV PROCESSES Using the boundary conditions that P(A) = 1, P( - B) = 0, we can solve for C I and C 2 to obtain and thus (842) Starting at x = 0, P(O), the probability of reaching A before - B is thus (843) e 2 /1. B - 1 P{process goes up A before down B} = 2/1. H -2 A e - e /I. Remarks (1) Equation (843) could also have been obtained by using the limiting random walk argument For by the gamblers ruin problem (see Exam- ple 4 4(A) of Chapter 4) it follows that the probability of going up A before going down B, when each gamble goes up or down ~ x units with respective probabilities p and 1 - p, is (844) ( 1 - p)HIIlX 1- -- P{ up A before down B} = ----p--- ( 1 - p)(A +B)lllx 1- -- p When p = (n(1 + J . L ~ x ) , we have hm -- = hm J.L . (1 -p)IIIlX . (1 - ~ x ) Illlx Ilx-O p Ilx-O 1 + J . L ~ x Hence, by letting ~ x ~ 0 we see from (844) that 1 - e- 2 /1.B P{ up A before down B} = 1 _ e -2/1.(A + B)' which agrees with (843) BROWNIAN MOTION WITH DRIFT 375 (2) If J.L < 0 we see from (S.4.3), by letting B approach infinity that (S.4.5) P{process ever goes up A} = e 2 1'A Thus, in this case, the process drifts off to negative infinity and its maximum is an exponential random variable with rate -2J.L. EXAMPLE 8.4(A) Exercising a Stock Option. Suppose we have the option of buying, at some time in the future, one unit of a stock at a fixed price A, independent of its current market price. 
The current market price of the stock is taken to be 0, and we suppose that it changes in accordance with a Brownian motion process having a negative drift coefficient -d, where d > 0 The question is, when, if ever, should we exercise our option? Let us consider the policy that exercises the option when the market price is x. Our expected gain under such a policy is P(x)(x - A), where P(x) is the probability that the process will ever reach x. From (S.4 5) we see that x> 0, the optimal value of x is the one maximizing (x - A )e- 2d x, and this is easily seen to be x = A + 1/2d For Brownian motion we obtain by letting J.L -7 0 in Equation (S.4.3) (S.4.6) P{Brownian motion goes up A before down B} = B B A+ EXAMPLE 8.4{a) Optimal Doubling in Backgammon. Consider two individuals that for a stake are playing some game of chance that eventually ends with one of the players being declared the winner Initially one of the players is designated as the "doubling player," which means that at any time he has the option of doubling the stakes If at any time he exercises his option, then the other player can either quit and pay the present stake to the doubling player or agree to continue playing for twice the old stakes If the other player decides to continue playing, then that player becomes the "doubling player" In other words, each time the doubling player exercises the option, the option then switches to the other player A popular game often played with a doubling option is backgammon 376 BROWNIAN MOTION AND OTHER MARKOV PROCESSES We will suppose that the game consists of watching Brownian motion starting at the value t If it hits 1 before 0, then player I wins, and if the reverse occurs, then player II wins. From (8.4.6) it follows that if the present state is x, then, if the game is to be continued until conclusion, player I will win with probability x Also each player's objective is to maximize his expected return, and we will suppose that each player plays optimally in the game theory sense (this means, for instance, that a player can announce his strategy and the other player could not do any better even knowing this information) It is intuitively clear that the optimal strategies should be of the following type. OPTIMAL STRATEGIES Suppose player I(II) has the option of doubling, then player I(II) should double at time t iff (if and only if) X(t) 2: p* (X(t) ~ 1 - p"') Player II's (I's) optimal strategy is to accept a double at tiff X(t) ~ p** (X(t) 2: 1 - p**) It remains to compute p* and p** Lemma I p* ~ p** Proof For any p > p**, it follows that player II would quit if X(t) = p and player I doubles Hence at X(t) = p player I can guarantee himself an expected gain of the present stake by doubling, and since player II can always guarantee that I never receives more (by quitting if player I ever doubles), it follows that it must be optimal for player I to double Hence p* ~ p** Lemma 2 p* = p** Proof Suppose p* < p** We will obtain a contradiction by showing that player I has a better strategy than p* Specifically, rather than doubling at p*, player I can do better by waiting for X(t) to either hit 0 or p** If it hits p**, then he can double, and since player II will accept, player I will be in the same position as if he doubled at p* On the other hand. if 0 is hit before p**. 
For Brownian motion without drift we obtain, by letting $\mu \to 0$ in Equation (8.4.3),

(8.4.6) $$P\{\text{Brownian motion goes up } A \text{ before down } B\} = \frac{B}{A+B}.$$

EXAMPLE 8.4(B) Optimal Doubling in Backgammon. Consider two individuals that, for a stake, are playing some game of chance that eventually ends with one of the players being declared the winner. Initially one of the players is designated as the "doubling player," which means that at any time he has the option of doubling the stakes. If at any time he exercises his option, then the other player can either quit and pay the present stake to the doubling player, or agree to continue playing for twice the old stakes. If the other player decides to continue playing, then that player becomes the "doubling player." In other words, each time the doubling player exercises the option, the option then switches to the other player. A popular game often played with a doubling option is backgammon.

We will suppose that the game consists of watching Brownian motion starting at the value 1/2. If it hits 1 before 0, then player I wins, and if the reverse occurs, then player II wins. From (8.4.6) it follows that if the present state is $x$, then, if the game is to be continued until its conclusion, player I will win with probability $x$. Also, each player's objective is to maximize his expected return, and we will suppose that each player plays optimally in the game theory sense (this means, for instance, that a player could announce his strategy and the other player could not do any better even knowing this information).

It is intuitively clear that the optimal strategies should be of the following type.

OPTIMAL STRATEGIES

If player I (II) has the option of doubling, then player I (II) should double at time $t$ iff (if and only if) $X(t) \ge p^*$ ($X(t) \le 1 - p^*$). Player II's (I's) optimal strategy is to accept a double at $t$ iff $X(t) \le p^{**}$ ($X(t) \ge 1 - p^{**}$).

It remains to compute $p^*$ and $p^{**}$.

Lemma 1: $p^* \le p^{**}$.

Proof: For any $p > p^{**}$, it follows that player II would quit if $X(t) = p$ and player I doubles. Hence at $X(t) = p$ player I can guarantee himself an expected gain of the present stake by doubling, and since player II can always guarantee that I never receives more (by quitting if player I ever doubles), it follows that it must be optimal for player I to double. Hence $p^* \le p^{**}$.

Lemma 2: $p^* = p^{**}$.

Proof: Suppose $p^* < p^{**}$. We will obtain a contradiction by showing that player I has a better strategy than doubling at $p^*$. Specifically, rather than doubling at $p^*$, player I can do better by waiting for $X(t)$ to either hit 0 or $p^{**}$. If it hits $p^{**}$ first, then he can double, and since player II will accept, player I will be in the same position as if he had doubled at $p^*$. On the other hand, if 0 is hit before $p^{**}$, then under the new policy he will only lose the original stake, whereas under the $p^*$ policy he would have lost double the stake.

Thus from Lemmas 1 and 2 there is a single critical value $p^*$ such that if player I has the option, then his optimal strategy is to double at $t$ iff $X(t) \ge p^*$. Similarly, player II's optimal strategy is to accept at $t$ iff $X(t) \le p^*$. By continuity it follows that both players are indifferent as to their choices when the state is $p^*$.

To compute $p^*$, we will take advantage of this indifference. Let the stake be 1 unit. Now if player I doubles at $p^*$, then player II is indifferent between quitting and accepting the double. Hence, since player I wins 1 under the former alternative, we have

$$1 = E[\text{gain to player I if player II accepts at } p^*].$$

Now if player II accepts at $p^*$, then II has the option of the next double, which he will exercise if $X(t)$ ever hits $1 - p^*$. If it never hits $1 - p^*$ (that is, if it hits 1 first), then II will lose 2 units. Hence, since the probability of hitting $1 - p^*$ before 1, when starting at $p^*$, is, by (8.4.6), $(1 - p^*)/p^*$, we have

$$1 = E[\text{gain to I} \mid \text{hits } 1 - p^*]\,\frac{1 - p^*}{p^*} + 2\left(1 - \frac{1 - p^*}{p^*}\right).$$

Now if it hits $1 - p^*$, then II will double the stakes to 4 units and I will be indifferent about accepting or not. Hence, as I will lose 2 if he quits, we have

$$E[\text{gain to I} \mid \text{hits } 1 - p^*] = -2,$$

and so

$$1 = -2\,\frac{1 - p^*}{p^*} + 2\left(1 - \frac{1 - p^*}{p^*}\right),$$

or

$$p^* = \tfrac{4}{5}.$$

EXAMPLE 8.4(C) Controlling a Production Process. In this example, we consider a production process that tends to deteriorate with time. Specifically, we suppose that the production process changes its state in accordance with a Wiener process with drift coefficient $\mu$, $\mu > 0$. When the state of the process is $B$, the process is assumed to break down and a cost $R$ must be paid to return the process back to state 0. On the other hand, we may attempt to repair the process before it reaches the breakdown point $B$. If the state is $x$ and an attempt to repair the process is made, then this attempt will succeed with probability $\alpha_x$ and fail with probability $1 - \alpha_x$. If the attempt is successful, then the process returns to state 0, and if it is unsuccessful, then we assume that the process goes to $B$ (that is, it breaks down). The cost of attempting a repair is $C$.

We shall attempt to determine the policy that minimizes the long-run average cost per unit time, and in doing so we will restrict attention to policies that attempt a repair when the state of the process is $x$, $0 < x < B$. For these policies it is clear that returns to state 0 constitute renewals, and thus by Theorem 3.6.1 of Chapter 3 the average cost is just

(8.4.7) $$\frac{E[\text{cost of a cycle}]}{E[\text{length of a cycle}]} = \frac{C + R(1 - \alpha_x)}{E[\text{time to reach } x]}.$$

Let $f(x)$ denote the expected time that it takes the process to reach $x$. We derive a differential equation for $f(x)$ by conditioning on $Y = X(h) - X(0)$, the change in time $h$. This yields

$$f(x) = h + E[f(x - Y)] + o(h),$$

where the $o(h)$ term accounts for the possibility that the process has already hit $x$ by time $h$. Expanding in a Taylor series gives

$$f(x) = h + E\left[f(x) - Yf'(x) + \frac{Y^2}{2}f''(x) + \cdots\right] + o(h) = h + f(x) - \mu h f'(x) + \frac{h}{2}f''(x) + o(h),$$

or, equivalently,

$$1 = \mu f'(x) - \frac{f''(x)}{2} + \frac{o(h)}{h},$$

and letting $h \to 0$,

(8.4.8) $$1 = \mu f'(x) - f''(x)/2.$$

Rather than attempting to solve the above directly, note that

$$f(x + y) = E[\text{time to reach } x + y \text{ from } 0] = E[\text{time to reach } x] + E[\text{time to reach } x + y \text{ from } x] = f(x) + f(y).$$

Hence, $f(x)$ is of the form $f(x) = cx$, and from (8.4.8) we see that $c = 1/\mu$. Therefore,

$$f(x) = x/\mu.$$

Hence, from (8.4.7), the policy that attempts a repair when the state is $x$, $0 < x < B$, has a long-run average cost of

$$\frac{\mu[C + R(1 - \alpha_x)]}{x},$$

while the policy that never attempts to repair has a long-run average cost of $R\mu/B$. For a given function $\alpha_x$, we can then use calculus to determine the repair point $x$ that minimizes the long-run average cost.
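For a concrete instance, the sketch below minimizes the average cost numerically. The success function $\alpha_x = e^{-x^2}$ and all parameter values are assumed illustrative choices, not ones from the text; they are picked so that an interior repair point beats never repairing.

```python
import numpy as np
from scipy.optimize import minimize_scalar

mu, B, C, R = 0.1, 2.0, 0.2, 50.0
alpha = lambda x: np.exp(-x**2)          # assumed repair-success probability

# Long-run average cost of "repair at state x", from (8.4.7) with f(x) = x/mu.
cost = lambda x: mu * (C + R * (1 - alpha(x))) / x

res = minimize_scalar(cost, bounds=(1e-4, B), method="bounded")
never = R * mu / B                       # cost of never attempting a repair
print(f"repair at x={res.x:.3f}: cost {cost(res.x):.4f};  never-repair cost {never:.4f}")
```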
Let $T_x$ denote the time it takes the Brownian motion process with drift coefficient $\mu$ to hit $x$, where $\mu > 0$. We shall compute $E[e^{-\theta T_x}]$, $\theta > 0$, its moment generating function, for $x > 0$, in much the same manner as $E[T_x]$ was computed in Example 8.4(C). We start by noting that

(8.4.9) $$E[\exp\{-\theta T_{x+y}\}] = E[\exp\{-\theta(T_x + T_{x+y} - T_x)\}] = E[\exp\{-\theta T_x\}]E[\exp\{-\theta(T_{x+y} - T_x)\}] = E[\exp\{-\theta T_x\}]E[\exp\{-\theta T_y\}],$$

where the last equality follows from stationary increments and the next to last from independent increments. But (8.4.9) implies that

$$E[\exp\{-\theta T_x\}] = e^{-cx}$$

for some $c > 0$. To determine $c$, let

$$f(x) = E[\exp\{-\theta T_x\}].$$

We will obtain a differential equation satisfied by $f$ by conditioning on $Y = X(h) - X(0)$. This yields

$$f(x) = E[\exp\{-\theta(h + T_{x-Y})\}] + o(h) = e^{-\theta h}E[f(x - Y)] + o(h),$$

where the $o(h)$ term results from the possibility that the process hits $x$ by time $h$. Expanding the above in a Taylor series about $x$ yields

$$f(x) = e^{-\theta h}E\left[f(x) - Yf'(x) + \frac{Y^2}{2}f''(x) + \cdots\right] + o(h) = e^{-\theta h}\left[f(x) - \mu h f'(x) + \frac{h}{2}f''(x)\right] + o(h).$$

Using $e^{-\theta h} = 1 - \theta h + o(h)$ now gives

$$f(x) = f(x)(1 - \theta h) - \mu h f'(x) + \frac{h}{2}f''(x) + o(h).$$

Dividing by $h$ and letting $h \to 0$ yields

$$\theta f(x) = -\mu f'(x) + \tfrac{1}{2}f''(x),$$

and, using $f(x) = e^{-cx}$,

$$\theta = \mu c + \frac{c^2}{2},$$

or

$$c^2 + 2\mu c - 2\theta = 0.$$

Hence, we see that either

(8.4.10) $$c = -\mu + \sqrt{\mu^2 + 2\theta} \qquad \text{or} \qquad c = -\mu - \sqrt{\mu^2 + 2\theta}.$$

And, since $c > 0$, we see that $c = \sqrt{\mu^2 + 2\theta} - \mu$ when $\mu \ge 0$. Thus we have the following.

PROPOSITION 8.4.1

Let $T_x$ denote the time that Brownian motion with drift coefficient $\mu$ hits $x$. Then for $\theta > 0$, $x > 0$,

(8.4.11) $$E[\exp\{-\theta T_x\}] = \exp\{-x(\sqrt{\mu^2 + 2\theta} - \mu)\} \qquad \text{if } \mu \ge 0.$$

We will end this section by studying the limiting average value of the maximum variable. Specifically, we have the following.

PROPOSITION 8.4.2

If $\{X(t), t \ge 0\}$ is a Brownian motion process with drift coefficient $\mu$, $\mu \ge 0$, then, with probability 1,

$$\lim_{t\to\infty}\frac{\max_{0\le s\le t}X(s)}{t} = \mu.$$

Proof: Let $T_0 = 0$, and for $n > 0$ let $T_n$ denote the time at which the process hits $n$. It follows, from the assumption of stationary, independent increments, that $T_n - T_{n-1}$, $n \ge 1$, are independent and identically distributed. Hence, we may think of the $T_n$ as being the times at which events occur in a renewal process. Letting $N(t)$ be the number of such renewals by time $t$, we have

(8.4.12) $$N(t) \le \max_{0\le s\le t}X(s) \le N(t) + 1.$$

Now, from the results of Example 8.4(C), we have $E[T_1] = 1/\mu$, and hence the result follows from (8.4.12) and the well-known renewal result $N(t)/t \to 1/E[T_1]$.
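Proposition 8.4.1 can be checked by simulating hitting times on a discrete grid; the sketch below uses arbitrary parameter values, and the finite time step introduces a small discretization bias.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, x, theta, dt, n = 1.0, 1.0, 0.5, 1e-3, 2000

def hit_time(mu, x, dt):
    """Time for mu*t + B(t) to first reach level x (simulated on a grid)."""
    t, pos = 0.0, 0.0
    while pos < x:
        pos += mu * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t

mgf_sim = np.mean([np.exp(-theta * hit_time(mu, x, dt)) for _ in range(n)])
mgf_exact = np.exp(-x * (np.sqrt(mu**2 + 2*theta) - mu))
print(f"simulated {mgf_sim:.4f}  vs  (8.4.11) {mgf_exact:.4f}")
```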
8.4.1 Using Martingales to Analyze Brownian Motion

Brownian motion with drift can also be analyzed by using martingales. There are three important martingales associated with standard Brownian motion.

PROPOSITION 8.4.3

Let $\{B(t), t \ge 0\}$ be standard Brownian motion. Then $\{Y(t), t \ge 0\}$ is a martingale when

(a) $Y(t) = B(t)$;
(b) $Y(t) = B^2(t) - t$; and
(c) $Y(t) = \exp\{cB(t) - c^2t/2\}$, where $c$ is any constant.

The martingales in parts (a) and (b) have mean 0, and the one in part (c) has mean 1.

Proof: In all cases we write $B(t)$ as $B(s) + [B(t) - B(s)]$ and utilize independent increments.

(a)
$$E[B(t) \mid B(u), 0 \le u \le s] = E[B(s) \mid B(u), 0 \le u \le s] + E[B(t) - B(s) \mid B(u), 0 \le u \le s] = B(s) + E[B(t) - B(s)] = B(s),$$

where the next to last equality made use of the independent increments property of Brownian motion.

(b)
$$E[B^2(t) \mid B(u), 0 \le u \le s] = E[\{B(s) + B(t) - B(s)\}^2 \mid B(u), 0 \le u \le s]$$
$$= B^2(s) + 2B(s)E[B(t) - B(s) \mid B(u), 0 \le u \le s] + E[\{B(t) - B(s)\}^2 \mid B(u), 0 \le u \le s]$$
$$= B^2(s) + 2B(s)E[B(t) - B(s)] + E[\{B(t) - B(s)\}^2]$$
$$= B^2(s) + E[B^2(t - s)] = B^2(s) + t - s,$$

which verifies that $B^2(t) - t$ is a martingale. We leave the verification that $\exp\{cB(t) - c^2t/2\}$, $t \ge 0$, is a martingale as an exercise.

Now, let $X(t) = B(t) + \mu t$, so that $\{X(t), t \ge 0\}$ is Brownian motion with drift $\mu$. For positive $A$ and $B$, define the stopping time $T$ by

$$T = \min\{t: X(t) = A \text{ or } X(t) = -B\}.$$

We will find $P_A \equiv P\{X(T) = A\}$ by making use of the martingale in Proposition 8.4.3(c), namely, $Y(t) = \exp\{cB(t) - c^2t/2\}$. Since this martingale has mean 1, it follows from the martingale stopping theorem that

$$E[\exp\{cB(T) - c^2T/2\}] = 1,$$

or, since $B(T) = X(T) - \mu T$,

$$E[\exp\{cX(T) - c\mu T - c^2T/2\}] = 1.$$

Letting $c = -2\mu$ makes the coefficient of $T$ vanish and gives

$$E[\exp\{-2\mu X(T)\}] = 1.$$

But $X(T)$ is either $A$ or $-B$, and so we obtain

$$e^{-2\mu A}P_A + e^{2\mu B}(1 - P_A) = 1,$$

and so

$$P_A = \frac{e^{2\mu B} - 1}{e^{2\mu B} - e^{-2\mu A}},$$

thus verifying the result of Equation (8.4.3).

If we now use the fact that $\{B(t), t \ge 0\}$ is a zero-mean martingale, then, by the stopping theorem,

$$0 = E[B(T)] = E[X(T) - \mu T] = E[X(T)] - \mu E[T] = AP_A - B(1 - P_A) - \mu E[T].$$

Using the preceding formula for $P_A$ gives

$$E[T] = \frac{AP_A - B(1 - P_A)}{\mu}.$$
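The following sketch (arbitrary parameters, small discretization bias) checks the expected exit time $E[T] = (AP_A - B(1 - P_A))/\mu$ by simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, A, B, dt, n = 0.5, 1.0, 2.0, 1e-3, 2000

def exit_time(mu, A, B, dt):
    """First time mu*t + B(t) leaves the interval (-B, A), on a discrete grid."""
    t, x = 0.0, 0.0
    while -B < x < A:
        x += mu * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t

sim = np.mean([exit_time(mu, A, B, dt) for _ in range(n)])
PA = (np.exp(2*mu*B) - 1) / (np.exp(2*mu*B) - np.exp(-2*mu*A))
exact = (A * PA - B * (1 - PA)) / mu
print(f"simulated E[T] {sim:.3f}  vs  formula {exact:.3f}")
```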
8.5 BACKWARD AND FORWARD DIFFUSION EQUATIONS

The derivation of differential equations is a powerful technique for analyzing Markov processes. There are two general techniques for obtaining differential equations: the backward and the forward technique. For instance, suppose we want the density of the random variable $X(t)$. The backward approach conditions on the value of $X(h)$; that is, it looks all the way back to the process at time $h$. The forward approach conditions on $X(t - h)$.

As an illustration, consider a Brownian motion process with drift coefficient $\mu$ and let $p(x, t; y)$ denote the probability density of $X(t)$ given $X(0) = y$. That is,

$$p(x, t; y) = \lim_{\Delta x\to 0} P\{x < X(t) < x + \Delta x \mid X(0) = y\}/\Delta x.$$

The backward approach is to condition on $X(h)$. Acting formally, as if $p(x, t; y)$ were actually a probability, we have

$$p(x, t; y) = E[P\{X(t) = x \mid X(0) = y, X(h)\}].$$

Now

$$P\{X(t) = x \mid X(0) = y, X(h) = x_h\} = P\{X(t - h) = x \mid X(0) = x_h\},$$

and so

$$p(x, t; y) = E[p(x, t - h; X(h))],$$

where the expectation is with respect to $X(h)$, which is normal with mean $\mu h + y$ and variance $h$. Assuming that we can expand the right-hand side of the above in a Taylor series about $(x, t; y)$, we obtain

$$p(x,t;y) = E\left[p(x,t;y) - h\frac{\partial}{\partial t}p(x,t;y) + (X(h) - y)\frac{\partial}{\partial y}p(x,t;y) + \frac{(X(h) - y)^2}{2}\frac{\partial^2}{\partial y^2}p(x,t;y) + \cdots\right]$$
$$= p(x,t;y) - h\frac{\partial}{\partial t}p(x,t;y) + \mu h\frac{\partial}{\partial y}p(x,t;y) + \frac{h}{2}\frac{\partial^2}{\partial y^2}p(x,t;y) + o(h).$$

Dividing by $h$ and then letting it approach 0 gives

(8.5.1) $$\frac{1}{2}\frac{\partial^2}{\partial y^2}p(x,t;y) + \mu\frac{\partial}{\partial y}p(x,t;y) = \frac{\partial}{\partial t}p(x,t;y).$$

Equation (8.5.1) is called the backward diffusion equation.

The forward equation is obtained by conditioning on $X(t - h)$. Now

$$P\{X(t) = x \mid X(0) = y, X(t - h) = a\} = P\{X(h) = x \mid X(0) = a\} = P\{W = x - a\},$$

where $W$ is a normal random variable with mean $\mu h$ and variance $h$. Letting its density be $f_W$, we thus have

$$p(x,t;y) = \int f_W(x - a)\,p(a, t - h; y)\,da$$
$$= \int\left[p(x,t;y) + (a - x)\frac{\partial}{\partial x}p(x,t;y) - h\frac{\partial}{\partial t}p(x,t;y) + \frac{(a - x)^2}{2}\frac{\partial^2}{\partial x^2}p(x,t;y) + \cdots\right]f_W(x - a)\,da$$
$$= p(x,t;y) - \mu h\frac{\partial}{\partial x}p(x,t;y) - h\frac{\partial}{\partial t}p(x,t;y) + \frac{h}{2}\frac{\partial^2}{\partial x^2}p(x,t;y) + o(h).$$

Dividing by $h$ and then letting it go to zero yields

(8.5.2) $$\frac{1}{2}\frac{\partial^2}{\partial x^2}p(x,t;y) - \mu\frac{\partial}{\partial x}p(x,t;y) = \frac{\partial}{\partial t}p(x,t;y).$$

Equation (8.5.2) is called the forward diffusion equation.
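Since $X(t)$ given $X(0) = y$ is normal with mean $y + \mu t$ and variance $t$, its density should satisfy both (8.5.1) and (8.5.2) (this is Problem 8.23). A quick symbolic check, as a sketch:

```python
import sympy as sp

x, y, mu = sp.symbols('x y mu')
t = sp.symbols('t', positive=True)

# Transition density of Brownian motion with drift: N(y + mu*t, t).
p = sp.exp(-(x - y - mu*t)**2 / (2*t)) / sp.sqrt(2*sp.pi*t)

backward = sp.diff(p, y, 2)/2 + mu*sp.diff(p, y) - sp.diff(p, t)
forward  = sp.diff(p, x, 2)/2 - mu*sp.diff(p, x) - sp.diff(p, t)
print(sp.simplify(backward), sp.simplify(forward))   # both print 0
```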
8.6 APPLICATIONS OF THE KOLMOGOROV EQUATIONS TO OBTAINING LIMITING DISTRIBUTIONS

The forward differential equation approach, which was first employed in obtaining the distribution of $N(t)$ for a Poisson process, is useful for obtaining limiting distributions in a wide variety of models. This approach derives a differential equation by computing the probability distribution of the system state at time $t + h$ in terms of its distribution at time $t$, and then lets $t \to \infty$. We will now illustrate its use in some models, the first of which has previously been studied by other methods.

8.6.1 Semi-Markov Processes

A semi-Markov process is one that, when it enters state $i$, spends a random time having distribution $H_i$ and mean $\mu_i$ in that state before making a transition. If the time spent in state $i$ is $x$, then the transition will be into state $j$ with probability $P_{ij}(x)$, $i, j \ge 0$. We shall suppose that all the distributions $H_i$ are continuous, and we define the hazard rate function $\lambda_i(t)$ by

$$\lambda_i(t) = h_i(t)/\bar H_i(t),$$

where $h_i$ is the density of $H_i$. Thus the conditional probability that the process will make a transition within the next $dt$ time units, given that it has spent $t$ units in state $i$, is $\lambda_i(t)\,dt + o(dt)$.

We can analyze the semi-Markov process as a Markov process by letting the "state" at any time be the pair $(i, x)$, with $i$ being the present state and $x$ the amount of time the process has spent in state $i$ since entering it. Let

$$p_t(i,x) = \lim_{h\to 0}\frac{P\{\text{at time } t \text{ the state is } i \text{ and the time since entering is between } x - h \text{ and } x\}}{h}.$$

That is, $p_t(i,x)$ is the probability density that the state at time $t$ is $(i,x)$. For $x > 0$ we have

(8.6.1) $$p_{t+h}(i, x+h) = p_t(i,x)(1 - \lambda_i(x)h) + o(h),$$

since in order to be in state $(i, x+h)$ at time $t + h$ the process must have been in state $(i,x)$ at time $t$ and no transitions must have occurred in the next $h$ time units. Assuming that the limiting density $p(i,x) = \lim_{t\to\infty}p_t(i,x)$ exists, we have from (8.6.1), by letting $t \to \infty$,

$$\frac{p(i,x+h) - p(i,x)}{h} = -\lambda_i(x)p(i,x) + \frac{o(h)}{h}.$$

And letting $h \to 0$,

$$\frac{\partial}{\partial x}p(i,x) = -\lambda_i(x)p(i,x).$$

Dividing by $p(i,x)$ and integrating yields

$$\log\left(\frac{p(i,x)}{p(i,0)}\right) = -\int_0^x \lambda_i(y)\,dy,$$

or

$$p(i,x) = p(i,0)\exp\left(-\int_0^x \lambda_i(y)\,dy\right).$$

The identity (see Section 1.6 of Chapter 1)

$$\bar H_i(x) = \exp\left(-\int_0^x \lambda_i(y)\,dy\right)$$

thus yields

(8.6.2) $$p(i,x) = p(i,0)\bar H_i(x).$$

In addition, since the process will instantaneously go from state $(j,x)$ to state $(i,0)$ with probability intensity $\lambda_j(x)P_{ji}(x)$, we also have

$$p(i,0) = \sum_j\int p(j,x)\lambda_j(x)P_{ji}(x)\,dx = \sum_j p(j,0)\int h_j(x)P_{ji}(x)\,dx \qquad \text{(from (8.6.2))}.$$

Now $\int h_j(x)P_{ji}(x)\,dx$ is just the probability that when the process enters state $j$ it will next enter $i$. Hence, calling that probability $P_{ji}$, we have

$$p(i,0) = \sum_j p(j,0)P_{ji}.$$

If we now suppose that the Markov chain of successive states, which has transition probabilities $P_{ji}$, is ergodic and has limiting probabilities $\pi_i$, $i \ge 0$, then since the $p(i,0)$, $i \ge 0$, satisfy the stationarity equations, it follows that, for some constant $c$,

(8.6.3) $$p(i,0) = c\pi_i, \qquad \text{all } i.$$

From (8.6.2) we obtain, by integrating over $x$,

(8.6.4) $$P\{\text{state is } i\} = \int p(i,x)\,dx = p(i,0)\mu_i = c\pi_i\mu_i,$$

where the last two equalities use (8.6.2) and (8.6.3). Since $\sum_i P\{\text{state is } i\} = 1$, we see that

$$c = \frac{1}{\sum_i \pi_i\mu_i},$$

and so, from (8.6.2) and (8.6.3),

(8.6.5) $$p(i,x) = \frac{\pi_i\mu_i}{\sum_j \pi_j\mu_j}\cdot\frac{\bar H_i(x)}{\mu_i}.$$

From (8.6.4) we note that

(8.6.6) $$P\{\text{state is } i\} = \frac{\pi_i\mu_i}{\sum_j \pi_j\mu_j},$$

and, from (8.6.5),

$$P\{\text{time in state} \le y \mid \text{state is } i\} = \frac{\int_0^y \bar H_i(u)\,du}{\mu_i}.$$

Thus the limiting probability of being in state $i$ is as given by (8.6.6) and agrees with the result of Chapter 4; and, given that the state is $i$, the time already spent in that state has the equilibrium distribution of $H_i$.
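A small simulation makes (8.6.6) concrete. The two-state process below, with uniform and exponential sojourn times and deterministic alternation, is an assumed example, not one from the text; the embedded chain alternates, so $\pi = (1/2, 1/2)$ and (8.6.6) reduces to $\mu_i/(\mu_0 + \mu_1)$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sojourn in state 0 ~ Uniform(0, 2) (mean 1); in state 1 ~ Exponential(mean 3).
mu = [1.0, 3.0]
draw = [lambda: rng.uniform(0, 2), lambda: rng.exponential(3.0)]

time_in, state = [0.0, 0.0], 0
for _ in range(200000):
    time_in[state] += draw[state]()
    state = 1 - state

total = sum(time_in)
for i in (0, 1):
    print(f"state {i}: simulated {time_in[i]/total:.4f}  vs  (8.6.6) {mu[i]/sum(mu):.4f}")
```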
8.6.2 The M/G/1 Queue

Consider the M/G/1 queue, where arrivals occur at a Poisson rate $\lambda$ and there is a single server whose service distribution is $G$; suppose that $G$ is continuous and has hazard rate function $\lambda(t)$. This model can be analyzed as a Markov process by letting the state at any time be the pair $(n, x)$, with $n$ denoting the number in the system at that time and $x$ the amount of time the person being served has already been in service. Letting $p_t(n,x)$ denote the density of the state at time $t$, we have, when $n \ge 1$,

(8.6.7) $$p_{t+h}(n, x+h) = p_t(n,x)(1 - \lambda(x)h)(1 - \lambda h) + p_t(n-1, x)\lambda h + o(h).$$

The above follows since the state at time $t + h$ will be $(n, x+h)$ if either (a) the state at time $t$ is $(n,x)$ and in the next $h$ time units there are no arrivals and no service completions, or (b) the state at time $t$ is $(n-1, x)$ and there is a single arrival and no departures in the next $h$ time units. Assuming that the limiting density $p(n,x) = \lim_{t\to\infty}p_t(n,x)$ exists, we obtain from (8.6.7)

$$\frac{p(n,x+h) - p(n,x)}{h} = -(\lambda + \lambda(x))p(n,x) + \lambda p(n-1,x) + \frac{o(h)}{h},$$

and upon letting $h \to 0$,

(8.6.8) $$\frac{d}{dx}p(n,x) = -(\lambda + \lambda(x))p(n,x) + \lambda p(n-1,x), \qquad n \ge 1,$$

where we take $p(0,x) \equiv 0$. Let us now define the generating function $G(s,x)$ by

$$G(s,x) = \sum_{n=1}^\infty s^n p(n,x).$$

Differentiation yields

$$\frac{\partial}{\partial x}G(s,x) = \sum_{n=1}^\infty s^n\frac{d}{dx}p(n,x) = \sum_{n=1}^\infty s^n[(-\lambda - \lambda(x))p(n,x) + \lambda p(n-1,x)] \qquad \text{(from (8.6.8))}$$
$$= (\lambda s - \lambda - \lambda(x))G(s,x).$$

Dividing both sides by $G(s,x)$ and integrating yields

(8.6.9) $$G(s,x) = G(s,0)e^{-\lambda(1-s)x}\bar G(x),$$

where the above has made use of the identity $\bar G(x) = \exp\{-\int_0^x\lambda(y)\,dy\}$.

To obtain $G(s,0)$, note that the equation for $p(n,0)$, $n \ge 1$, is

$$p(n,0) = \begin{cases}\displaystyle\int p(n+1,x)\lambda(x)\,dx & n > 1 \\[2mm] \displaystyle\int p(2,x)\lambda(x)\,dx + \lambda P(0) & n = 1,\end{cases}$$

where

$$P(0) = P\{\text{system is empty}\}.$$

Thus

$$\sum_{n=1}^\infty s^{n+1}p(n,0) = \sum_{n=1}^\infty\int s^{n+1}p(n+1,x)\lambda(x)\,dx + s^2\lambda P(0),$$

or

(8.6.10) $$sG(s,0) = \int(G(s,x) - sp(1,x))\lambda(x)\,dx + s^2\lambda P(0) = G(s,0)\int e^{-\lambda(1-s)x}g(x)\,dx - s\int p(1,x)\lambda(x)\,dx + s^2\lambda P(0),$$

where the last equality is a result of Equation (8.6.9), together with $\lambda(x)\bar G(x) = g(x)$.

To evaluate the second term on the right-hand side of (8.6.10), we derive an equation for $P(0)$ as follows:

$$P\{\text{empty at } t+h\} = P\{\text{empty at } t\}(1 - \lambda h) + \int\lambda(x)h\,p_t(1,x)\,dx + o(h).$$

Letting $t \to \infty$ and then letting $h \to 0$ yields

(8.6.11) $$\lambda P(0) = \int\lambda(x)p(1,x)\,dx.$$

Substituting this back into (8.6.10), we obtain

$$sG(s,0) = G(s,0)\tilde G(\lambda(1-s)) - s\lambda(1-s)P(0),$$

or

(8.6.12) $$G(s,0) = \frac{s\lambda(1-s)P(0)}{\tilde G(\lambda(1-s)) - s},$$

where $\tilde G(a) = \int e^{-ax}\,dG(x)$ is the Laplace transform of the service distribution.

To obtain the marginal probability generating function of the number in the system, integrate (8.6.9) as follows:

$$\sum_{n=1}^\infty s^nP\{n \text{ in system}\} = \int_0^\infty G(s,x)\,dx = G(s,0)\int_0^\infty e^{-\lambda(1-s)x}\bar G(x)\,dx$$
$$= G(s,0)\int_0^\infty e^{-\lambda(1-s)x}\int_x^\infty dG(y)\,dx = G(s,0)\int_0^\infty\int_0^y e^{-\lambda(1-s)x}\,dx\,dG(y)$$
$$= \frac{G(s,0)}{\lambda(1-s)}\int_0^\infty(1 - e^{-\lambda(1-s)y})\,dG(y) = \frac{G(s,0)(1 - \tilde G(\lambda(1-s)))}{\lambda(1-s)}.$$

Hence, from (8.6.12),

$$\sum_{n=0}^\infty s^nP\{n \text{ in system}\} = P(0) + \frac{sP(0)(1 - \tilde G(\lambda(1-s)))}{\tilde G(\lambda(1-s)) - s} = \frac{P(0)(1-s)\tilde G(\lambda(1-s))}{\tilde G(\lambda(1-s)) - s}.$$

To obtain the value of $P(0)$, let $s$ approach 1 in the above. This yields

$$1 = \sum_{n=0}^\infty P\{n \text{ in system}\} = P(0)\lim_{s\to 1}\frac{(1-s)\tilde G(\lambda(1-s))}{\tilde G(\lambda(1-s)) - s} = P(0)\lim_{s\to 1}\frac{\frac{d}{ds}[(1-s)\tilde G(\lambda(1-s))]}{\frac{d}{ds}[\tilde G(\lambda(1-s)) - s]} = \frac{P(0)}{1 - \lambda E[S]}$$

(by L'Hopital's rule, since $\tilde G(0) = 1$ and $\frac{d}{ds}\tilde G(\lambda(1-s))\big|_{s=1} = \lambda E[S]$), or

$$P(0) = 1 - \lambda E[S],$$

where $E[S] = \int x\,dG(x)$ is the mean service time.

Remarks

(1) We can also attempt to obtain the functions $p(n,x)$ recursively, starting with $n = 1$ and then using (8.6.8) for the recursion. For instance, when $n = 1$ the second right-hand term in (8.6.8) disappears and so we end up with

$$\frac{d}{dx}p(1,x) = -(\lambda + \lambda(x))p(1,x).$$

Solving this equation yields

(8.6.13) $$p(1,x) = p(1,0)e^{-\lambda x}\bar G(x).$$

Thus

$$\int\lambda(x)p(1,x)\,dx = p(1,0)\int e^{-\lambda x}g(x)\,dx,$$

and, using (8.6.11), we obtain

$$\lambda P(0) = p(1,0)\tilde G(\lambda).$$

Since $P(0) = 1 - \lambda E[S]$, we see that

$$p(1,0) = \frac{\lambda(1 - \lambda E[S])}{\tilde G(\lambda)}.$$

Finally, using (8.6.13), we have

(8.6.14) $$p(1,x) = \frac{\lambda e^{-\lambda x}(1 - \lambda E[S])\bar G(x)}{\tilde G(\lambda)}.$$

This formula may now be substituted into (8.6.8) and the differential equation for $p(2,x)$ solved, at least in theory. We could then attempt to use $p(2,x)$ to solve for $p(3,x)$, and so on.

(2) It follows from (8.6.14) that $p(y|1)$, the conditional density of the time the person being served has already spent in service, given a single customer in the system, is given by

$$p(y|1) = \frac{e^{-\lambda y}\bar G(y)}{\int e^{-\lambda y}\bar G(y)\,dy}.$$

In the special case where $G(y) = 1 - e^{-\mu y}$, we have

$$p(y|1) = (\lambda + \mu)e^{-(\lambda+\mu)y}.$$

Hence, the conditional distribution is exponential with rate $\lambda + \mu$ and thus is not the equilibrium distribution (which, of course, is exponential with rate $\mu$).

(3) Of course, the above analysis requires that $\lambda E[S] < 1$ for the limiting distributions to exist.
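The identity $P(0) = 1 - \lambda E[S]$ is easy to check by simulating the queue. The sketch below uses illustrative parameters and gamma-distributed service times; it tracks the server's total busy time under first-come first-served service.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, T = 0.5, 200000.0
ES = 3 * 0.4                      # mean of Gamma(shape=3, scale=0.4) services

t, busy, workload_end = 0.0, 0.0, 0.0
while t < T:
    t += rng.exponential(1/lam)             # next arrival epoch
    start = max(t, workload_end)            # service begins when server frees up
    s = rng.gamma(3, 0.4)
    busy += s
    workload_end = start + s

print(f"simulated P(empty) {1 - busy/T:.4f}  vs  1 - lam*E[S] = {1 - lam*ES:.4f}")
```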
Suppose also that the dollar amount of successive claims are independent and have distribution G If we assume that cash is received by the insurance company at a constant rate of 1 per unit time, then its cash balance at time t can be expressed as ~ cash balance at t =:: X + t - ~ Y " I ~ 1 where x is the initial capital of the company and Y" i ~ 1, are the successive claims. We are interested in the probability, as a function of the initial capital x, that the company always remains solvent That is, we wish to determine R(x) = P {x + t - , ~ Y , > 0 for all t} To obtain a differential equation for R(x) we shall use the backward approach and condition on what occurs the first h time units If no claims are made, then the company's assets are x + h, whereas if a single claim is made, they A MARKOV SHOT NOISE PROCESS are x + h - Y. Hence, R(x) === R(x + h)(1 - Ah) + E[R(x + h - Y)]Ah + o(h), and so R(x + h) ~ R(x) = AR(x + h) AE[R(x + h h Y)] + o(h) h Letting h -I> 0 yields R'(x) = AR(x) AE[R(x Y)] or R'(x) = AR(x) ~ A I: R(x y) dG(y) This differential equation can sometimes be solved for R. 8.7 A MARKOV SHOT NOISE PROCESS 393 Suppose that shocks occur in accordance to a Poisson process with rate A. Associated with the ith shock is a random variable X" i 2:!: 1, which represents the "value" of that shock The values are assumed to be additive and we also suppose that they decrease over time at a deterministic exponential rate That is, let us denote by N( t), the number of shocks by t, X" the value of the ith shock, S" the time of the ith shock Then the total shock value at time t, call it X(t), can be expressed as where a is a constant that determines the exponential rate of decrease When the X" i 2:!: 1, are assumed to be independent and identically distrib- uted and {X" i 2:!: I} is independent of the Poisson process {N(t), t 2:!: O}, we call {X(t), t 2:!: O} a shot noise process A shot noise process possesses the Markovian property that given the present state the future is conditionally independent of the past. We can compute the moment generating function of X(t) by first condition- ing on N(t) and then using Theorem 23.1 of Chapter 2, which states that 394 BROWNIAN MOTION AND OTHER MARKOV PROCESSES given N(t) = n, the unordered set of arrival times are independent uniform (0, t) random variables This gives E[exp{sX(t)}IN(t) = n] = E [exp {s Ita x,e-a(r-u,j J. where VI, , Vn are independent uniform (0, t) random variables Continuing the equality and using independence gives E[exp{sX(t)}IN(t) = n] == (E[exp{sX, e-a(r- U')}J)n = [t 4>(se- ay ) dY1tr == fr, where 4>(u) = E[e"xJ is the moment generating function of X. Hence, (871) '" (At)n E[exp{sX(t)}J = L fre- Ar -, n ~ O n == exp { A J ~ [q,(se- ay ) - 1] dY} The moments of X(t) can be obtained by differentiation of the above, and we leave it for the reader to verify that (872) E[X(t)] == AE[X](1 ~ e-a')/a, Yar[X(t)J == AE[X2](1 - e- 2ar )/2a. To obtain Cov(X(t), X(t + s», we use the representation X(t + s) = e-a'X(t) + X(s), where X(s) has the same distribution as X(s) and is independent of X(t). That is, X(s) is the contribution at time t + s of events arising in (t, t + s) Hence, Cov(X(t), X(t + s)) = e-asYar(X(t)) = e- as AE[X 2 J(1 - e- 2a ')/2a The limiting distribution of X(t) can be obtained by letting t _ 00 in (871). This gives lim E[exp{sX(t)}] == exp {A J'" [4>(se- ay ) ~ 1] d Y } /_'" 0 A MARKOV SHOT NOISE PROCESS 395 Let us consider now the special case where the X, are exponential random variables with rate O. 
8.7 A MARKOV SHOT NOISE PROCESS

Suppose that shocks occur in accordance with a Poisson process with rate $\lambda$. Associated with the $i$th shock is a random variable $X_i$, $i \ge 1$, which represents the "value" of that shock. The values are assumed to be additive, and we also suppose that they decrease over time at a deterministic exponential rate. That is, letting $N(t)$ denote the number of shocks by time $t$, $X_i$ the value of the $i$th shock, and $S_i$ the time of the $i$th shock, the total shock value at time $t$, call it $X(t)$, can be expressed as

$$X(t) = \sum_{i=1}^{N(t)}X_ie^{-\alpha(t - S_i)},$$

where $\alpha$ is a constant that determines the exponential rate of decrease. When the $X_i$, $i \ge 1$, are assumed to be independent and identically distributed and $\{X_i, i \ge 1\}$ is independent of the Poisson process $\{N(t), t \ge 0\}$, we call $\{X(t), t \ge 0\}$ a shot noise process. A shot noise process possesses the Markovian property that, given the present state, the future is conditionally independent of the past.

We can compute the moment generating function of $X(t)$ by first conditioning on $N(t)$ and then using Theorem 2.3.1 of Chapter 2, which states that given $N(t) = n$, the unordered set of arrival times are independent uniform $(0,t)$ random variables. This gives

$$E[\exp\{sX(t)\} \mid N(t) = n] = E\left[\exp\left\{s\sum_{i=1}^n X_ie^{-\alpha(t - U_i)}\right\}\right],$$

where $U_1, \ldots, U_n$ are independent uniform $(0,t)$ random variables. Continuing the equality and using independence gives, upon the substitution $y = t - u$,

$$E[\exp\{sX(t)\} \mid N(t) = n] = \left(E[\exp\{sX_1e^{-\alpha(t - U_1)}\}]\right)^n = \left[\frac{1}{t}\int_0^t\phi(se^{-\alpha y})\,dy\right]^n \equiv \bar\phi^n,$$

where $\phi(u) = E[e^{uX}]$ is the moment generating function of $X$. Hence,

(8.7.1) $$E[\exp\{sX(t)\}] = \sum_{n=0}^\infty\bar\phi^ne^{-\lambda t}\frac{(\lambda t)^n}{n!} = \exp\left\{\lambda\int_0^t[\phi(se^{-\alpha y}) - 1]\,dy\right\}.$$

The moments of $X(t)$ can be obtained by differentiating the above, and we leave it for the reader to verify that

(8.7.2) $$E[X(t)] = \lambda E[X](1 - e^{-\alpha t})/\alpha, \qquad \operatorname{Var}[X(t)] = \lambda E[X^2](1 - e^{-2\alpha t})/2\alpha.$$

To obtain $\operatorname{Cov}(X(t), X(t+s))$, we use the representation

$$X(t+s) = e^{-\alpha s}X(t) + \bar X(s),$$

where $\bar X(s)$ has the same distribution as $X(s)$ and is independent of $X(t)$. That is, $\bar X(s)$ is the contribution at time $t+s$ of events arising in $(t, t+s)$. Hence,

$$\operatorname{Cov}(X(t), X(t+s)) = e^{-\alpha s}\operatorname{Var}(X(t)) = e^{-\alpha s}\lambda E[X^2](1 - e^{-2\alpha t})/2\alpha.$$

The limiting distribution of $X(t)$ can be obtained by letting $t \to \infty$ in (8.7.1). This gives

$$\lim_{t\to\infty}E[\exp\{sX(t)\}] = \exp\left\{\lambda\int_0^\infty[\phi(se^{-\alpha y}) - 1]\,dy\right\}.$$

Let us consider now the special case where the $X_i$ are exponential random variables with rate $\theta$. Hence,

$$\phi(u) = \frac{\theta}{\theta - u},$$

and so, in this case,

$$\lim_{t\to\infty}E[\exp\{sX(t)\}] = \exp\left\{\lambda\int_0^\infty\left(\frac{\theta}{\theta - se^{-\alpha y}} - 1\right)dy\right\} = \exp\left\{\frac{\lambda}{\alpha}\int_0^s\frac{dx}{\theta - x}\right\} = \left(\frac{\theta}{\theta - s}\right)^{\lambda/\alpha},$$

where the middle equality uses the change of variables $x = se^{-\alpha y}$. But $(\theta/(\theta - s))^{\lambda/\alpha}$ is the moment generating function of a gamma random variable with parameters $\lambda/\alpha$ and $\theta$. Hence the limiting density of $X(t)$, when the $X_i$ are exponential with rate $\theta$, is

(8.7.3) $$f(y) = \frac{\theta e^{-\theta y}(\theta y)^{\lambda/\alpha - 1}}{\Gamma(\lambda/\alpha)}, \qquad 0 < y < \infty.$$

Let us suppose for the remainder of this section that the $X_i$ are indeed exponential with rate $\theta$ and that the process is in steady state. For this latter requirement, we can either imagine that $X(0)$ is chosen according to the distribution (8.7.3) or (better yet) that the process originated at $t = -\infty$.

Suppose that $X(t) = y$. An interesting computation is to determine the distribution of the time since the last increase, that is, the time since the last Poisson event prior to $t$. Calling this random variable $A(t)$, we have

(8.7.4)
$$P\{A(t) > s \mid X(t) = y\} = \lim_{h\to 0}P\{A(t) > s \mid y < X(t) < y + h\}$$
$$= \lim_{h\to 0}\frac{P\{ye^{\alpha s} < X(t - s) < (y + h)e^{\alpha s},\ 0 \text{ events in }(t - s, t)\}}{P\{y < X(t) < y + h\}}$$
$$= \lim_{h\to 0}\frac{f(ye^{\alpha s})e^{\alpha s}h\,e^{-\lambda s} + o(h)}{f(y)h + o(h)}$$
$$= \exp\{-\theta y(e^{\alpha s} - 1)\},$$

where the final equality follows upon substituting the gamma density (8.7.3). It should be noted that we have made use of the assumption that the process is in steady state to conclude that the distributions of $X(t)$ and $X(t - s)$ are both given by (8.7.3).

From (8.7.4), we see that the conditional hazard rate function of $A(t)$ given $X(t) = y$, call it $\lambda(s|y)$, is given by

$$\lambda(s|y) = \frac{\frac{d}{ds}P\{A(t) \le s \mid X(t) = y\}}{P\{A(t) > s \mid X(t) = y\}} = \theta y\alpha e^{\alpha s}.$$

From this we see that, given $X(t) = y$, the hazard rate of the backwards time to the last event starts at rate $\theta\alpha y$ (that is, $\lambda(0|y) = \theta\alpha y$) and increases exponentially as we go backwards in time until an event is encountered. It should be noted that this differs markedly from the time at $t$ until the next event (which, of course, is independent of $X(t)$ and is exponential with rate $\lambda$).
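A simulation check of the moment formulas (8.7.2), using arbitrary parameter values; for the large $t$ used here, $X(t)$ is also approximately gamma with parameters $\lambda/\alpha = 4$ and $\theta = 1$ (mean 4, variance 4).

```python
import numpy as np

rng = np.random.default_rng(6)
lam, alpha, theta, t, n = 2.0, 0.5, 1.0, 30.0, 20000

def shot_noise(t):
    """X(t): exponentially decayed shock values summed over Poisson shock times."""
    m = rng.poisson(lam * t)
    S = rng.uniform(0, t, m)                 # shock times (unordered is fine)
    X = rng.exponential(1/theta, m)          # shock values
    return np.sum(X * np.exp(-alpha * (t - S)))

vals = np.array([shot_noise(t) for _ in range(n)])
EX, EX2 = 1/theta, 2/theta**2
print(f"mean {vals.mean():.3f}  vs  {lam*EX*(1 - np.exp(-alpha*t))/alpha:.3f}")
print(f"var  {vals.var():.3f}  vs  {lam*EX2*(1 - np.exp(-2*alpha*t))/(2*alpha):.3f}")
```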
8.8 STATIONARY PROCESSES

A stochastic process $\{X(t), t \ge 0\}$ is said to be a stationary process if for all $n, s, t_1, \ldots, t_n$ the random vectors $X(t_1), \ldots, X(t_n)$ and $X(t_1 + s), \ldots, X(t_n + s)$ have the same joint distribution. In other words, a process is stationary if, choosing any fixed point as the origin, the ensuing process has the same probability law. Some examples of stationary processes are:

(i) an ergodic continuous-time Markov chain $\{X(t), t \ge 0\}$ when $P\{X(0) = j\} = P_j$, $j \ge 0$, where $\{P_j, j \ge 0\}$ are the stationary probabilities;
(ii) $\{X(t), t \ge 0\}$ when $X(t)$ is the age at time $t$ of an equilibrium renewal process;
(iii) $\{X(t), t \ge 0\}$ when $X(t) = N(t + L) - N(t)$, $t \ge 0$, where $L > 0$ is a fixed constant and $\{N(t), t \ge 0\}$ is a Poisson process having rate $\lambda$.

The first two of the above processes are stationary for the same reason: they are Markov processes whose initial state is chosen according to the limiting state distribution (and thus they can be thought of as ergodic Markov processes that have already been in operation an infinite time). That the third example, where $X(t)$ represents the number of events of a Poisson process that occur between $t$ and $t + L$, is stationary follows from the stationary and independent increment assumption of the Poisson process.

The condition for a process to be stationary is rather stringent, and so we define the process $\{X(t), t \ge 0\}$ to be a second-order stationary, or a covariance stationary, process if $E[X(t)] = c$ and $\operatorname{Cov}(X(t), X(t+s))$ does not depend on $t$. That is, a process is second-order stationary (a further name sometimes seen in the literature is weakly stationary) if the first two moments of $X(t)$ are the same for all $t$ and the covariance between $X(s)$ and $X(t)$ depends only on $|t - s|$. For a second-order stationary process, let

$$R(s) = \operatorname{Cov}(X(t), X(t+s)).$$

As the finite-dimensional distributions of a Gaussian process (being multivariate normal) are determined by their means and covariances, it follows that a second-order stationary Gaussian process is stationary. However, there are many examples of second-order stationary processes that are not stationary.

EXAMPLE 8.8(A) An Autoregressive Process. Let $Z_0, Z_1, Z_2, \ldots$ be uncorrelated random variables with $E[Z_n] = 0$, $n \ge 0$, and

$$\operatorname{Var}(Z_n) = \begin{cases}\sigma^2/(1 - \lambda^2) & n = 0 \\ \sigma^2 & n \ge 1,\end{cases}$$

where $\lambda^2 < 1$. Define

(8.8.1) $$X_0 = Z_0,$$
(8.8.2) $$X_n = \lambda X_{n-1} + Z_n, \qquad n \ge 1.$$

The process $\{X_n, n \ge 0\}$ is called a first-order autoregressive process. It says that the state at time $n$ (that is, $X_n$) is a constant multiple of the state at time $n - 1$ plus a random error term $Z_n$. Iterating (8.8.2) yields

$$X_n = \lambda(\lambda X_{n-2} + Z_{n-1}) + Z_n = \lambda^2X_{n-2} + \lambda Z_{n-1} + Z_n = \cdots = \sum_{i=0}^n\lambda^{n-i}Z_i,$$

and so

$$\operatorname{Cov}(X_n, X_{n+m}) = \sum_{i=0}^n\lambda^{n-i}\lambda^{n+m-i}\operatorname{Cov}(Z_i, Z_i) = \sigma^2\lambda^m\left[\frac{\lambda^{2n}}{1 - \lambda^2} + \sum_{i=1}^n\lambda^{2(n-i)}\right] = \frac{\sigma^2\lambda^m}{1 - \lambda^2},$$

where the above uses the fact that $Z_i$ and $Z_j$ are uncorrelated when $i \ne j$. Since $E[X_n] = 0$, we see that $\{X_n, n \ge 0\}$ is weakly stationary (the definition for a discrete-time process is the obvious analogue of that given for continuous-time processes).

EXAMPLE 8.8(B) A Moving Average Process. Let $W_0, W_1, W_2, \ldots$ be uncorrelated with $E[W_n] = \mu$ and $\operatorname{Var}(W_n) = \sigma^2$, $n \ge 0$, and for some positive integer $k$ define

$$X_n = \frac{W_n + W_{n-1} + \cdots + W_{n-k}}{k + 1}, \qquad n \ge k.$$

The process $\{X_n, n \ge k\}$, which at each time keeps track of the arithmetic average of the most recent $k + 1$ values of the $W$'s, is called a moving average process. Using the fact that the $W_n$, $n \ge 0$, are uncorrelated, we see that

$$\operatorname{Cov}(X_n, X_{n+m}) = \begin{cases}\dfrac{(k + 1 - m)\sigma^2}{(k+1)^2} & \text{if } 0 \le m \le k \\[2mm] 0 & \text{if } m > k.\end{cases}$$

Hence, $\{X_n, n \ge k\}$ is a second-order stationary process.
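A quick empirical check of the autoregressive covariance $R(m) = \sigma^2\lambda^m/(1 - \lambda^2)$; normal errors are used for convenience, though the formula requires only uncorrelatedness.

```python
import numpy as np

rng = np.random.default_rng(7)
lam_, sigma, n, m = 0.6, 1.0, 200000, 3

# Start X_0 = Z_0 with the stationary variance sigma^2/(1 - lam^2).
X = np.empty(n)
X[0] = rng.normal(0, sigma/np.sqrt(1 - lam_**2))
for i in range(1, n):
    X[i] = lam_ * X[i-1] + rng.normal(0, sigma)

emp = np.mean(X[:-m] * X[m:])                 # sample Cov(X_n, X_{n+m}); mean is 0
print(f"empirical R({m}) {emp:.4f}  vs  exact {sigma**2*lam_**m/(1 - lam_**2):.4f}")
```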
Let $\{X_n, n \ge 1\}$ be a second-order stationary process with $E[X_n] = \mu$. An important question is when, if ever, $\bar X_n \equiv \sum_{i=1}^nX_i/n$ converges to $\mu$. The following proposition shows that $E[(\bar X_n - \mu)^2] \to 0$ if, and only if, $\sum_{i=1}^nR(i)/n \to 0$. That is, the expected square of the difference between $\bar X_n$ and $\mu$ will converge to 0 if, and only if, the limiting average value of $R(i)$ converges to 0.

PROPOSITION 8.8.1

Let $\{X_n, n \ge 1\}$ be a second-order stationary process having mean $\mu$ and covariance function $R(i) = \operatorname{Cov}(X_n, X_{n+i})$, and let $\bar X_n = \sum_{i=1}^nX_i/n$. Then

$$\lim_{n\to\infty}E[(\bar X_n - \mu)^2] = 0$$

if, and only if,

$$\lim_{n\to\infty}\sum_{i=1}^n\frac{R(i)}{n} = 0.$$

Proof: Let $Y_i = X_i - \mu$ and $\bar Y_n = \sum_{i=1}^nY_i/n$, and suppose that $\sum_{i=1}^nR(i)/n \to 0$. We want to show that this implies $E[\bar Y_n^2] \to 0$. Now

$$E[\bar Y_n^2] = \frac{1}{n^2}E\left[\sum_{i=1}^nY_i^2 + 2\sum_{i<j\le n}\!\!\sum Y_iY_j\right] = \frac{R(0)}{n} + \frac{2\sum_{i<j\le n}\!\sum R(j - i)}{n^2}.$$

We leave it for the reader to verify that the right-hand side of the above goes to 0 when $\sum_{i=1}^nR(i)/n \to 0$. To go the other way, suppose that $E[\bar Y_n^2] \to 0$. Then

$$\left(\sum_{i=0}^{n-1}\frac{R(i)}{n}\right)^2 = \left[\frac{1}{n}\sum_{i=1}^n\operatorname{Cov}(Y_1, Y_i)\right]^2 = [\operatorname{Cov}(Y_1, \bar Y_n)]^2 = [E(Y_1\bar Y_n)]^2 \le E[Y_1^2]E[\bar Y_n^2] \to 0,$$

which shows that $\sum_{i=0}^{n-1}R(i)/n \to 0$ as $n \to \infty$. The reader should note that the above makes use of the Cauchy-Schwarz inequality, which states that for random variables $X$ and $Y$

$$(E[XY])^2 \le E[X^2]E[Y^2]$$

(see Problem 8.28).
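For the autoregressive process of Example 8.8(A), $R(i) = \sigma^2\lambda^i/(1 - \lambda^2)$ is summable, so $\sum_{i=1}^nR(i)/n \to 0$ and the proposition guarantees mean-square convergence of the sample mean. A rough numerical illustration:

```python
import numpy as np

rng = np.random.default_rng(8)
lam_, sigma = 0.6, 1.0

def sample_mean(n):
    x, s = rng.normal(0, sigma/np.sqrt(1 - lam_**2)), 0.0
    for _ in range(n):
        s += x
        x = lam_ * x + rng.normal(0, sigma)
    return s / n

for n in (100, 1000, 10000):
    msq = np.mean([sample_mean(n)**2 for _ in range(200)])  # E[(X̄_n - mu)^2], mu = 0
    print(f"n={n:6d}   mean-square error ~= {msq:.5f}")
```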
PROBLEMS

In Problems 8.1, 8.2, and 8.3, let $\{X(t), t \ge 0\}$ denote a Brownian motion process.

8.1. Let $Y(t) = tX(1/t)$.
(a) What is the distribution of $Y(t)$?
(b) Compute $\operatorname{Cov}(Y(s), Y(t))$.
(c) Argue that $\{Y(t), t \ge 0\}$ is also Brownian motion.
(d) Let $T = \inf\{t > 0: X(t) = 0\}$. Using (c), present an argument that $P\{T = 0\} = 1$.

8.2. Let $W(t) = X(a^2t)/a$ for $a > 0$. Verify that $\{W(t)\}$ is also Brownian motion.

8.3. Compute the conditional distribution of $X(s)$ given that $X(t_1) = A$ and $X(t_2) = B$, where $t_1 < s < t_2$.

8.4. Let $\{Z(t), t \ge 0\}$ denote a Brownian bridge process. Show that if $X(t) = (t + 1)Z(t/(t + 1))$, then $\{X(t), t \ge 0\}$ is a Brownian motion process.

8.5. A stochastic process $\{X(t), t \ge 0\}$ is said to be stationary if $X(t_1), \ldots, X(t_n)$ has the same joint distribution as $X(t_1 + a), \ldots, X(t_n + a)$ for all $n, a, t_1, \ldots, t_n$.
(a) Prove that a necessary and sufficient condition for a Gaussian process to be stationary is that $\operatorname{Cov}(X(s), X(t))$ depends only on $t - s$, $s \le t$, and $E[X(t)] = c$.
(b) Let $\{X(t), t \ge 0\}$ be Brownian motion and define $V(t) = e^{-t/2}X(e^t)$. Show that $\{V(t), t \ge 0\}$ is a stationary Gaussian process. It is called the Ornstein-Uhlenbeck process.

8.6. Let $\{X(t), t \ge 0\}$ denote a birth and death process that is allowed to go negative and that has constant birth and death rates $\lambda_n = \lambda$, $\mu_n = \mu$, $n = 0, \pm 1, \pm 2, \ldots$. Define $\mu$ and $\varepsilon$ as functions of $\lambda$ in such a way that $\{\varepsilon X(t), t \ge 0\}$ converges to Brownian motion as $\lambda \to \infty$.

In Problems 8.7 through 8.12, let $\{X(t), t \ge 0\}$ denote Brownian motion.

8.7. Find the distribution of
(a) $|X(t)|$;
(b) $\max_{0\le s\le t}X(s)$;
(c) $\max_{0\le s\le t}X(s) - X(t)$.

8.8. Suppose $X(1) = B$. Characterize, in the manner of Proposition 8.1.1, $\{X(t), 0 \le t \le 1\}$ given that $X(1) = B$.

8.9. Let $M(t) = \max_{0\le s\le t}X(s)$ and show that

$$P\{M(t) > a \mid M(t) = X(t)\} = e^{-a^2/2t}, \qquad a > 0.$$

8.10. Compute the density function of $T_x$, the time until Brownian motion hits $x$.

8.11. Let $T_1$ denote the largest zero of $X(t)$ that is less than $t$ and let $T_2$ be the smallest zero greater than $t$. Show that
(a) $P\{T_2 < s\} = (2/\pi)\arccos\sqrt{t/s}$, $s > t$;
(b) $P\{T_1 < s, T_2 > y\} = (2/\pi)\arcsin\sqrt{s/y}$, $s < t < y$.

8.12. Verify the formulas given in (8.3.4) for the mean and variance of $|X(t)|$.

8.13. For Brownian motion with drift coefficient $\mu$, show that for $x > 0$

8.14. Let $T_x$ denote the time until Brownian motion hits $x$. Compute

8.15. For a Brownian motion process with drift coefficient $\mu$ let

$$f(x) = E[\text{time to hit either } A \text{ or } -B \mid X_0 = x],$$

where $A > 0$, $B > 0$, $-B < x < A$.
(a) Derive a differential equation for $f(x)$.
(b) Solve this equation.
(c) Use a limiting random walk argument (see Problem 4.22 of Chapter 4) to verify the solution in part (b).

8.16. Let $T_a$ denote the time the Brownian motion process with drift coefficient $\mu$ hits $a$.
(a) Derive a differential equation that $f(a, t) \equiv P\{T_a \le t\}$ satisfies.
(b) For $\mu > 0$, let $g(x) = \operatorname{Var}(T_x)$ and derive a differential equation for $g(x)$, $x > 0$.
(c) What is the relationship between $g(x)$, $g(y)$, and $g(x + y)$ in (b)?
(d) Solve for $g(x)$.
(e) Verify your solution by differentiating (8.4.11).

8.17. In Example 8.4(B), suppose $X_0 = x$ and player I has the doubling option. Compute the expected winnings of player I in this situation.

8.18. Let $\{X(t), t \ge 0\}$ be a Brownian motion with drift coefficient $\mu$, $\mu < 0$, which is not allowed to become negative. Find the limiting distribution of $X(t)$.

8.19. Consider Brownian motion with reflecting barriers at $-B$ and $A$, $A > 0$, $B > 0$. Let $p_t(x)$ denote the density function of $X_t$.
(a) Compute a differential equation satisfied by $p_t(x)$.
(b) Obtain $p(x) = \lim_{t\to\infty}p_t(x)$.

8.20. Prove that, with probability 1, for Brownian motion with drift $\mu$,

$$\frac{X(t)}{t} \to \mu \qquad \text{as } t \to \infty.$$

8.21. Verify that if $\{B(t), t \ge 0\}$ is standard Brownian motion, then $\{Y(t), t \ge 0\}$ is a martingale with mean 1 when $Y(t) = \exp\{cB(t) - c^2t/2\}$.

8.22. In Problem 8.16, find $\operatorname{Var}(T_a)$ by using a martingale argument.

8.23. Show that

$$p(x, t; y) = \frac{1}{\sqrt{2\pi t}}e^{-(x - y - \mu t)^2/2t}$$

satisfies the backward and forward diffusion equations (8.5.1) and (8.5.2).

8.24. Verify Equation (8.7.2).

8.25. Verify that $\{X(t) = N(t + L) - N(t), t \ge 0\}$ is stationary when $\{N(t)\}$ is a Poisson process.

8.26. Let $U$ be uniformly distributed over $(-\pi, \pi)$, and let $X_n = \cos(nU)$. By using the trigonometric identity

$$\cos x\cos y = \tfrac{1}{2}[\cos(x + y) + \cos(x - y)],$$

verify that $\{X_n, n \ge 1\}$ is a second-order stationary process.

8.27. Show that

$$\sum_{i=1}^n\frac{R(i)}{n} \to 0 \qquad \text{implies} \qquad \sum_{i<j\le n}\!\!\sum\frac{R(j - i)}{n^2} \to 0,$$

thus completing the proof of Proposition 8.8.1.

8.28. Prove the Cauchy-Schwarz inequality:

$$(E[XY])^2 \le E[X^2]E[Y^2].$$

(Hint: Start with the inequality $2|xy| \le x^2 + y^2$ and then substitute $x/\sqrt{E[X^2]}$ for $x$ and $y/\sqrt{E[Y^2]}$ for $y$.)

8.29. For a second-order stationary process with mean $\mu$ for which $\sum_{i=0}^{n-1}R(i)/n \to 0$, show that for any $\varepsilon > 0$

$$P\{|\bar X_n - \mu| > \varepsilon\} \to 0 \qquad \text{as } n \to \infty.$$
REFERENCES

For a rigorous proof of the convergence of the empirical distribution function to the Brownian bridge process, the interested (and mathematically advanced) reader should see Reference 1. Both References 3 and 5 provide nice treatments of Brownian motion; the former emphasizes differential equations and the latter martingales in many of their computations. For additional material on stationary processes the interested reader should see Reference 2 or Reference 4.

1. P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
2. G. Box and G. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1970.
3. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes, Methuen, London, 1965.
4. H. Cramer and M. Leadbetter, Stationary and Related Stochastic Processes, Wiley, New York, 1966.
5. S. Karlin and H. Taylor, A First Course in Stochastic Processes, 2nd ed., Academic Press, New York, 1975.

CHAPTER 9

Stochastic Order Relations

INTRODUCTION

In this chapter we introduce some stochastic order relations between random variables. In Section 9.1 we consider the concept of one random variable being stochastically larger than another. Applications to random variables having a monotone hazard rate function are presented. We continue our study of the stochastically larger concept in Section 9.2, where we introduce the coupling approach and illustrate its usefulness. In particular, we use coupling to establish, in Section 9.2.1, some stochastic monotonicity properties of birth and death processes, and to prove, in Section 9.2.2, that the $n$-step transition probabilities in a finite-state ergodic Markov chain converge exponentially fast to their limiting probabilities.

In Section 9.3 we consider hazard rate orderings, which are stronger than stochastically larger orderings, between random variables. We show how to use this idea to compare certain counting processes and, in fact, we use it to prove Blackwell's theorem of renewal theory when the interarrival distribution is continuous. Some monotonicity properties of renewal processes, having interarrival distributions that are decreasing failure rate, are also presented. In Section 9.4 we consider likelihood ratio orderings. In Section 9.5 we consider the concept of one random variable having more variability than another, and, in Section 9.6, we present applications of this to comparisons of (i) queueing systems (Section 9.6.1), (ii) a renewal process and a Poisson process (Section 9.6.2), and (iii) branching processes. In Section 9.7 we consider associated random variables.

9.1 STOCHASTICALLY LARGER

We say that the random variable $X$ is stochastically larger than the random variable $Y$, written $X \ge_{st} Y$, if

(9.1.1) $$P\{X > a\} \ge P\{Y > a\} \qquad \text{for all } a.$$

If $X$ and $Y$ have distributions $F$ and $G$, respectively, then (9.1.1) is equivalent to

$$\bar F(a) \ge \bar G(a) \qquad \text{for all } a.$$

Lemma 9.1.1

If $X \ge_{st} Y$, then $E[X] \ge E[Y]$.

Proof: Assume first that $X$ and $Y$ are nonnegative random variables. Then

$$E[X] = \int_0^\infty P\{X > a\}\,da \ge \int_0^\infty P\{Y > a\}\,da = E[Y].$$

In general, we can write any random variable $Z$ as the difference of two nonnegative random variables as follows:

$$Z = Z^+ - Z^-,$$

where

$$Z^+ = \begin{cases}Z & \text{if } Z \ge 0 \\ 0 & \text{if } Z < 0,\end{cases} \qquad Z^- = \begin{cases}0 & \text{if } Z \ge 0 \\ -Z & \text{if } Z < 0.\end{cases}$$

Now we leave it as an exercise to show that

$$X \ge_{st} Y \Rightarrow X^+ \ge_{st} Y^+,\ X^- \le_{st} Y^-.$$

Hence,

$$E[X] = E[X^+] - E[X^-] \ge E[Y^+] - E[Y^-] = E[Y].$$

The next proposition gives an alternative definition of stochastically larger.

PROPOSITION 9.1.2

$$X \ge_{st} Y \Leftrightarrow E[f(X)] \ge E[f(Y)] \qquad \text{for all increasing functions } f.$$

Proof: Suppose first that $X \ge_{st} Y$ and let $f$ be an increasing function. We show that $f(X) \ge_{st} f(Y)$ as follows. Letting $f^{-1}(a) = \inf\{x: f(x) > a\}$, then

$$P\{f(X) > a\} = P\{X > f^{-1}(a)\} \ge P\{Y > f^{-1}(a)\} = P\{f(Y) > a\}.$$

Hence $f(X) \ge_{st} f(Y)$, and so, from Lemma 9.1.1, $E[f(X)] \ge E[f(Y)]$.

Now suppose that $E[f(X)] \ge E[f(Y)]$ for all increasing functions $f$. For any $a$ let $f_a$ denote the increasing function

$$f_a(x) = \begin{cases}1 & \text{if } x > a \\ 0 & \text{if } x \le a.\end{cases}$$

Then

$$E[f_a(X)] = P\{X > a\}, \qquad E[f_a(Y)] = P\{Y > a\},$$

and we see that $X \ge_{st} Y$.
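A small numerical illustration of Proposition 9.1.2, with arbitrary choices: $X$ exponential with mean 2, $Y$ exponential with mean 1 (so $X \ge_{st} Y$), and a few increasing test functions.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.exponential(2.0, 100000)    # survival curves are ordered, so X >=st Y
Y = rng.exponential(1.0, 100000)

for name, f in [("x", lambda x: x), ("min(x,3)", lambda x: np.minimum(x, 3)),
                ("1{x>2}", lambda x: (x > 2).astype(float))]:
    print(f"f = {name:9s}  E[f(X)] = {f(X).mean():.3f} >= E[f(Y)] = {f(Y).mean():.3f}")
```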
EXAMPLE 9.1(A) Increasing and Decreasing Failure Rate. Let $X$ be a nonnegative random variable with distribution $F$ and density $f$. Recall that the failure (or hazard) rate function of $X$ is defined by

$$\lambda(t) = \frac{f(t)}{\bar F(t)}.$$

We say that $X$ is an increasing failure rate (IFR) random variable if $\lambda(t)$ is increasing in $t$, and we say that it is a decreasing failure rate (DFR) random variable if $\lambda(t)$ is decreasing in $t$. If we think of $X$ as the life of some item, then since $\lambda(t)\,dt$ is the probability that a $t$-unit-old item fails in the interval $(t, t + dt)$, we see that $X$ being IFR (DFR) means that the older the item is, the more (less) likely it is to fail in a small time $dt$. Suppose now that the item has survived to time $t$, and let $X_t$ denote its additional life from $t$ onward. $X_t$ will have distribution $F_t$ given by

(9.1.2) $$\bar F_t(a) = P\{X_t > a\} = P\{X - t > a \mid X > t\} = \frac{\bar F(t + a)}{\bar F(t)}.$$

PROPOSITION 9.1.3

$X$ is IFR $\Leftrightarrow$ $X_t$ is stochastically decreasing in $t$;
$X$ is DFR $\Leftrightarrow$ $X_t$ is stochastically increasing in $t$.

Proof: It can be shown that the hazard rate function of $X_t$, call it $\lambda_t$, is given by

(9.1.3) $$\lambda_t(a) = \lambda(t + a),$$

where $\lambda$ is the hazard rate function of $X$. Equation (9.1.3) can be formally proven by using (9.1.2), or can, more intuitively, be argued as follows:

$$\lambda_t(a) = \lim_{h\to 0}P\{a < X_t < a + h \mid X_t \ge a\}/h = \lim_{h\to 0}P\{a < X - t < a + h \mid X \ge t, X - t \ge a\}/h$$
$$= \lim_{h\to 0}P\{t + a < X < t + a + h \mid X \ge t + a\}/h = \lambda(t + a).$$

Since

(9.1.4) $$\bar F_t(s) = \exp\left\{-\int_0^s\lambda_t(a)\,da\right\} = \exp\left\{-\int_t^{t+s}\lambda(y)\,dy\right\},$$

it follows that if $\lambda(y)$ is increasing (decreasing), then $\bar F_t(s)$ is decreasing (increasing) in $t$. Similarly, if $\bar F_t(s)$ is decreasing (increasing) in $t$, then (9.1.4) implies that $\lambda(y)$ is increasing (decreasing) in $y$.

Thus the lifetime of an item is IFR (DFR) if the older the item is, then the stochastically smaller (larger) is its additional life.

A common class of DFR distributions is the one consisting of mixtures of exponentials, where we say that the distribution $F$ is a mixture of the distributions $F_\alpha$, $0 < \alpha < \infty$, if, for some distribution $G$,

$$F(x) = \int_0^\infty F_\alpha(x)\,dG(\alpha).$$

Mixtures occur when we sample from a population made up of distinct types. The value of an item from the type characterized by $\alpha$ has the distribution $F_\alpha$; $G$ is the distribution of the characterizing quantity.

Consider now a mixture of two exponential distributions having rates $\lambda_1$ and $\lambda_2$, where $\lambda_1 < \lambda_2$. To show that this mixture distribution is DFR, note that if the item selected has survived up to time $t$, then its distribution of remaining life is still a mixture of the two exponential distributions. This is so since its remaining life will still be exponential with rate $\lambda_1$ if it is a type-1 item or with rate $\lambda_2$ if it is a type-2 item. However, the probability that it is a type-1 item is no longer the (prior) probability $p$ but is now a conditional probability given that it has survived to time $t$. In fact, its probability of being a type-1 item is

$$P\{\text{type 1} \mid \text{life} > t\} = \frac{P\{\text{type 1, life} > t\}}{P\{\text{life} > t\}} = \frac{pe^{-\lambda_1t}}{pe^{-\lambda_1t} + (1 - p)e^{-\lambda_2t}}.$$

Since the above is increasing in $t$, it follows that the larger $t$ is, the more likely it is that the item in use is of type 1 (the better one, since $\lambda_1 < \lambda_2$). Hence, the older the item is, the less likely it is to fail, and thus the mixture of exponentials is DFR.
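The decreasing hazard rate of an exponential mixture is easy to see numerically; illustrative values $p = 0.3$, $\lambda_1 = 1$, $\lambda_2 = 5$ are assumed below.

```python
import numpy as np

p, l1, l2 = 0.3, 1.0, 5.0
t = np.linspace(0, 3, 7)

surv = p*np.exp(-l1*t) + (1-p)*np.exp(-l2*t)
dens = p*l1*np.exp(-l1*t) + (1-p)*l2*np.exp(-l2*t)
print(np.round(dens/surv, 4))   # hazard rate: strictly decreasing from 3.8 toward 1.0
```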
It turns out that the class of DFR distributions is closed under mixtures (which implies the above, since the exponential distribution, as it has a constant hazard rate function, is both IFR and DFR). To prove this we need the following well-known lemma, whose proof is left as an exercise.

Lemma 9.1.4 The Cauchy-Schwarz Inequality

For any distribution $G$ and functions $h(t)$, $k(t)$, $t \ge 0$,

$$\left(\int h(t)k(t)\,dG(t)\right)^2 \le \int h^2(t)\,dG(t)\int k^2(t)\,dG(t),$$

provided the integrals exist.

We may now state the following.

PROPOSITION 9.1.5

If $F_\alpha$ is a DFR distribution for all $0 < \alpha < \infty$ and $G$ is a distribution function on $(0, \infty)$, then $F$ is DFR, where

$$F(t) = \int_0^\infty F_\alpha(t)\,dG(\alpha).$$

Proof: The failure rate function of $F$ is

$$\lambda_F(t) = \frac{-\frac{d}{dt}\bar F(t)}{\bar F(t)} = \frac{\int f_\alpha(t)\,dG(\alpha)}{\bar F(t)}.$$

We will argue that $\lambda_F(t)$ decreases in $t$ by first assuming that all derivatives exist and then showing that $(d/dt)\lambda_F(t) \le 0$. Now

$$\frac{d}{dt}\lambda_F(t) = \frac{\bar F(t)\int f'_\alpha(t)\,dG(\alpha) + \left(\int f_\alpha(t)\,dG(\alpha)\right)^2}{\bar F^2(t)}.$$

Since $\bar F(t) = \int\bar F_\alpha(t)\,dG(\alpha)$, it follows from the above that to prove $(d/dt)\lambda_F(t) \le 0$ we need to show

(9.1.5) $$\left(\int f_\alpha(t)\,dG(\alpha)\right)^2 \le \int(-f'_\alpha(t))\,dG(\alpha)\int\bar F_\alpha(t)\,dG(\alpha).$$

By letting $h(\alpha) = (\bar F_\alpha(t))^{1/2}$, $k(\alpha) = f_\alpha(t)/(\bar F_\alpha(t))^{1/2}$ and applying the Cauchy-Schwarz inequality, we see that

$$\left(\int f_\alpha(t)\,dG(\alpha)\right)^2 \le \int\bar F_\alpha(t)\,dG(\alpha)\int\frac{f_\alpha^2(t)}{\bar F_\alpha(t)}\,dG(\alpha).$$

Hence, to prove (9.1.5) it suffices to show

(9.1.6) $$\int\frac{f_\alpha^2(t)}{\bar F_\alpha(t)}\,dG(\alpha) \le \int(-f'_\alpha(t))\,dG(\alpha).$$

Now $F_\alpha$ is, by assumption, DFR, and thus

$$0 \ge \frac{d}{dt}\lambda_\alpha(t) = \frac{d}{dt}\frac{f_\alpha(t)}{\bar F_\alpha(t)} = \frac{f'_\alpha(t)\bar F_\alpha(t) + f_\alpha^2(t)}{\bar F_\alpha^2(t)},$$

implying

$$\frac{f_\alpha^2(t)}{\bar F_\alpha(t)} \le -f'_\alpha(t),$$

which proves (9.1.6) and establishes the result. (The above also shows that $-f'_\alpha(t) \ge 0$, so the integrals in (9.1.6) are well defined.)

9.2 COUPLING

If $X \ge_{st} Y$, then there exist random variables $X^*$ and $Y^*$ having the same distributions as $X$ and $Y$, respectively, and such that $X^*$ is, with probability 1, at least as large as $Y^*$. Before proving this we need the following lemma.

Lemma 9.2.1

Let $F$ and $G$ be continuous distribution functions. If $X$ has distribution $F$, then the random variable $G^{-1}(F(X))$ has distribution $G$.

Proof:

$$P\{G^{-1}(F(X)) \le a\} = P\{F(X) \le G(a)\} = P\{X \le F^{-1}(G(a))\} = F(F^{-1}(G(a))) = G(a).$$

PROPOSITION 9.2.2

If $F$ and $G$ are distributions such that $\bar F(a) \ge \bar G(a)$ for all $a$, then there exist random variables $X$ and $Y$ having distributions $F$ and $G$, respectively, such that $P\{X \ge Y\} = 1$.

Proof: We'll present a proof when $F$ and $G$ are continuous distribution functions. Let $X$ have distribution $F$ and define $Y$ by $Y = G^{-1}(F(X))$. Then, by Lemma 9.2.1, $Y$ has distribution $G$. But as $F \le G$, it follows that $F^{-1} \ge G^{-1}$, and so

$$Y = G^{-1}(F(X)) \le F^{-1}(F(X)) = X,$$

which proves the result.

Oftentimes the easiest way to prove $F \ge_{st} G$ is to let $X$ be a random variable having distribution $F$ and then define a random variable $Y$ in terms of $X$ such that

(i) $Y$ has distribution $G$, and
(ii) $Y \le X$.

We illustrate this method, known as coupling, by some examples.
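In code, the construction of Proposition 9.2.2 looks as follows (a sketch with assumed marginals: $X$ exponential with mean 2, $Y = G^{-1}(F(X))$ exponential with mean 1). The coupled pairs satisfy $Y \le X$ pointwise while $Y$ keeps the correct marginal distribution.

```python
import numpy as np

rng = np.random.default_rng(10)

X = rng.exponential(2.0, 100000)
U = 1 - np.exp(-X / 2.0)            # U = F(X) is uniform on (0,1)
Y = -np.log(1 - U)                  # Y = G^{-1}(U) is exponential with mean 1

print("P{X >= Y} =", np.mean(X >= Y))   # 1.0
print("E[Y] =", Y.mean())               # about 1, as required
```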
,tn' It follows from Example 9.2(A) that if X and Yare vectors of independent components such that X, ;::::51 }'., then X >51 Y. We leave it as an exercise for the reader to present a counterexample when the independence assumption is dropped. In proving that one stochastic process is stochastically larger than another, coupling is once again often the key. EXAMPLE 9.2(c) Comparing Renewal Processes. Let N, = {N,(t), t ~ O}, i = 1,2, denote renewal p r ~ e s s e s having interarrival distri- butions F and G, respectively. If F ~ G, then {M(t), t ~ O} :::; {N 2 (t), t ~ O}. 51 To prove the above we use coupling as follows. Let XI, X 2 , ••• be independent and distnbuted according to F. Then the renewal process generated by the X,-call it N'( -has the same probability distributions as N I • Now generate independent random variables Y I , Y 2 , ••• having distribution G and such that Y, :5 X,. Then the renewal process generated by the interarrival times Y,-call it Nt -has the same probability distributions as N 2 • However, as Y, :5 X, for all i, it follows that N('(t) S Ni(t) for all t, which proves the result. Our next example uses coupling in conjunction with the strong law of large numbers. EXAMPlE 9.2(D) Let XI. Xl,. . be a sequence of independent Bernoulli random variables, and let P, = P{X, = I}, i ~ 1 If P, > P for all i, show that with probability 1 If lim inf 2: XJn ~ p. If ,=1 COUPLING 413 n (The preceding means that for any s > 0, 2: X/n :::;; P e for i=1 only a finite number of n.) Solution We start by coupling the sequence X I1 i :> 1 with a sequence of independent and identically distributed Bernoulli ran- dom variables Y" i ~ 1, such that P{Y I = 1} = P and XI ~ Y I for all i. To accomplish this, let V" i ~ 1, be an independent sequence of uniform (0, 1) random variables. Now, for i = 1, , n, set x, G if VI <: PI otherwise, and Since P :::;; PI it follows that Y/ :::;; X" Hence, n n lim inf 2: X/n ~ lim inf 2: YJn. n 1=1 n 1=1 if V, <:p otherwise But it follows from the strong law of large numbers that, with n probability 1, lim inf 2: YJn = p. n ,wi ExAMPLE 9.2(E) Bounds on the Coupon Collector's Problem. Suppose that there are m distinct types of coupons and that each coupon collected is type j with probability PI' j = 1, . , m. Letting N denote the number of coupons one needs to collect in order to have at least one of each type, we are interested in obtaining bounds for E[N]. To begin, let ii' . ,im be a permutation of 1, , m Let TI denote the number of coupons it takes to obtain a type ii, and for j> 1, let ~ denote the number of additional coupons after having at least one of each type ii' .. ,i,-l until one also has at least one of type i l (Thus, if a type i, coupon is obtained before at least one of the types ii, . , ir I, then ~ = 0, and if not then T j is a geometric random variable with parameter P,) Now, I and so, m (9.2 1) E[N] 2: P{i j is the last ofit. . , iJI PI( 1=1 Now, rather than supposing that a coupon is collected at fixed time points, it clearly would make no difference if we supposed that 414 STOCHAST[C ORDER RELATIONS they are collected at random times distributed according to a Pois- son process with rate 1. Given this assumption, we can assert that the times until the first occurrences of the different types of coupons are independent exponential random variables with respective (for a type j coupon) rates Pi' j = 1, .. ,m. Hence, if we let Xi' j 1,. 
,m be independent exponential random variables with rates 1, then X/ p) will be independent exponential random vari- ables with rates Pi' j = 1, .. , m, and so P{i) is the last of ii, . , i) to be collected} (9.2.2) = P{X,IP, = max(X,IP, , .. , X,IP, )}. J I 1 1 I I Now, renumber the coupon types so that PI :5. P 2 < .. :5. Pm We will use a coupling argument to show that (9.2.3) P{j is the last of 1, .. ,j} < Ilj, (9.24) P{j is the last of m, m - 1, .. ,j}:> 1/(m - j + 1). To verify the inequality (9.2.3), note the following' P{ j is the last of 1, .. ,j to be obtained} = P{X/ PI = max X/ P,} IS'''''i !5 P{X/ p) = max XJ p)} IS,s) since P, :5. p) l/j. Thus, inequality (9.2.3) is proved. By a similar argument (left as an exercise) inequality (9.2.4) is also established. Hence, upon utihzing Equation (9.2.1), first with the permuta- tion 1, 2, , m (to obtain an upper bound) and then with m, m 1, , 1 (to obtain a lower bound) we see that Another lower bound for E[N] is given in Problem 9.17. ExAMPLE 9.2(F) A Bin Packing Problem. Suppose that n items, whose weights are independent and uniformly distributed on (0, 1), are to be put into a sequence of bins that can each hold at most one unit of weight. Items are successively put in bin 1 until an item is reached whose additional weight when added to those presently in the bin would exceed the bin capacity of one unit. At that point, bin 1 is packed away, the item is put in bin 2, and the process continues. Thus, for instance, if the weights of the first four items COUPLING are 45, 32, .92, and .11 then items one and two would be in bin 1, item three would be the only item in bin 2, and item four would be the initial item in bin 3 We are interested in £[8], the expected number of bins needed. To begin, suppose that there are an infinite number of items, and let N, denote the number of items that go in bin i Now, if W, denotes the weight of the initial item in bin i (that is, the item that would nut fit in bin i-I) then (9.25) d N, = max{j W, + VI + .. + V/_ I <: I}, d where X Y means that X and Y have the same distribution, and where VI> V h . is a sequence of independent uniform (0, 1) random variables that are independent of W. Let A.- J be the amount of unused capacity in bin i-I, that is, the weight of all items in that bin is 1 - A,_I Now, the conditional distribution of W" given A.-I, is the same as the conditional distribution of a uniform (0, 1) random variable given that it exceeds A,_I' That is, P{W. > xlA,-d = P{V > xlv> A.-I} where V is a uniform (0, 1) random variable As P{V > xlv> A-I} > P{V > x}, we see that W. is stochastically larger than a uniform (0, 1) random variable. Hence, from (9.2.5) independently of N 1 , , N._ 1 (926) N,<:max{j VI + ... + V l sl} ,[ Note that the right-hand side of the preceding has the same distribu- tion as the number of renewals by time 1 of the renewal process whose interarrival distribution is the uniform (0, 1) distribution. The number of bins needed to store the n items can be ex- pressed as However, if we let X" i :> 1, be a sequence of independent random variables having the same distribution as N(1), the number of renewals by time 1 of the uniform (0, 1) renewal process, then from (92.6) we obtain that 82: N, '[ 415 416 STOCHASTIC ORDER RELATIONS where N== min {m. i XI:> n}. But, by Wald's equation, E = E[N]E[XI]. 
Also, Problem 37 (whose solution can be found in the Answers and Solutions to Selected Problems Appendix) asked one to show that E[X I ] = E[N(1)] = e - 1 N and thus, since L X, n, we can conclude that n E[N] 2:-. e-1 Finally, using that B :> N, yields that Sl n e - 1 If the successive weights have an arbitrary distribution F concen- trated on [0, 1] then the same argument shows that n E[B] m(1) where m(1) is the expected number of renewals by time 1 of the renewal process with interarrival distribution F. 9.2.1 Stochastic Monotonicity Properties of Birth and Death Processes Let {X(t), t O} be a birth and death process. We will show two stochastic monotonicity properties of {X(t), t :> O}. The first is that the birth and death process is stochastically increasing in the initial state X(O). COUPLING 417 PROPOSITION 9.2.3 {X(t), t ;;:: O} is stochastically increasing in X(O) That is, E[f(X(td. ,X(tn»1 X(O) = i] is increasing in i for all t), , tn and increasing functions f Proof Let {X)(t), t ~ O} and {X 2 (t), t ~ O} be independent birth and death processes having identical birth and death rates, and suppose X1(0) i + 1 and X 2 (O) = i Now since X)(O) > X 2 (0), and the two processes always go up or down only by 1, and never at the same time (since this possibility has probability 0), it follows that either the X1(/) process is always larger than the X 2 (/) process or else they are equal at some time. Let us denote by T the first time at which they become equal That is, { oo T- 1stt. if Xt(t) > X 2 (t) foraHt X\(/) = X 2 (t) otherwise Now if T < 00, then the two processes are equal at time T and so, by the Markovian property, their continuation after time T has the same probabilistic structure Thus if we define a third stochastic process-call it {X 3 (t)}-by ift < T ift;;:: T, then {X 3 (t)} will also be a birth and death process, having the same parameters as the other two processes, and with X 3 (0) == X 1 (0) = i + 1 However, since by the definition of T for t < T, we see for all I, which proves the result Our second stochastic monotonicity property says that if the initial state is 0, then the state at time t increases stochastically in t. PROPOSITION 9.2.4 P{X(t) ~ jIX(O) = O} increases in t for all j 418 STOCHASTIC ORDER RELATIONS Proof For s < t P{X(t) iIX(O) = O} == 2: P{X(t) = 0, X(t - s) == i}P{X(t s):::: iIX(O) = O} = 2: P{X(t) ilX(t - s) = i}Po,(t - s) = 2: P{X(s) iIX(O) = i} Po,(t - s) , 2: P{X(s) iIX(O) O} Po,(t - s) == P{X(s) iIX(O) :::: O} 2: POt(t - s) I = P{X(s) iIX(O) == O} (by the Markovian property) (by Proposition 9 23) Remark Besides providing a nice qualitative property about the transition probabilities of a birth and death process, Proposition 9.2 4 is also useful in applications. For whereas it is often quite difficult to the values POj(t) for fixed t, it is a simple matter to obtain the limiting probabilities Pj' Now from Proposition 924 we have P{X(t) :> jIX(O) = O} <:: lim P{X(t) 2: jIX(O) = O} 1-"" which says that X(t) is stochastically smaller than the random variable-call it X( 00 )-having the limiting distribution, thus supplying a bound on the distribution of X(t). 
9.2.2 Exponential Convergence in Markov Chains Consider a finite-state irreducible Markov chain having transition probabilities We will use a coupling argument to show that converges exponentially fast, as n 00, to a limit that is independent of i To prove this we will make use of the result that if an ergodic Markov chain has a finite number-say M-of states, then there must exist N, e > 0, such that (9.2.7) e for all i, j Consider now two independent versions of the Markov chain, say {X n , n 2: O} and n 2: O}, where one starts in i, say P{Xo = i} = 1, and the COUPLING 419 other such that P{Xo == j} = 1T" j = 1, ... , M, where the 1T J are a set of stationary probabilities. That is, they are a nonnegative solution of M 1T J = L 1TI P ,l , 1=1 1 1, .. "M, Let T denote the first time both processes are in the same state. That is, T = min{n: Xn X ~ } . Now and so where A, is the event that X Nl -:/= X NI . From (9.2.7) it follows that no matter what the present state is, there is a probability of at least e that the state a time N in the future will be j. Hence no matter what the past, the probability that the two chains will both be in state j a time N in the future is at least e 2 , and thus the probability that they will be in the same state is at least Me 2 • Hence all the conditional probabilities on the right-hand side of (9.2.8) are no greater than 1 - Mel. Thus (92.9) where ex == Mel, Let us now define a third Markov chain-call it {X n , n :> O}-that is equal to X' up to time T and is equal to X thereafter That is. for n:5 T forn:> T Since Xr = X T , it is clear that {X n , n ~ O} is a Markov chain with transition probabilities P 'J whose initial state is chosen according to a set of stationary probabilities Now P{Xn = j} = P{Xn = jlT:5 n}P{T:5 n} + P{Xn == liT> n}P{T> n} = P{Xn = jlT:5 n}P{T:5 n} + P{Xn == j,T> n}. 420 STOCHASTIC ORDER RELATIONS Similarly, P{X,,:= j} = P{X" = n} + P{X II j, T> n} Hence - P{X II = j} = P{X II = j, T> n} - P{X II = j, T> n} implying that - P{X II = nl < P{T> n} < (1 - a)"JN-l (by (9.2.9». But it is easy to verify (say by induction on n) that and thus we see that I P " 1< 13" IJ - 1T J --1-' -a where 13 = (1 - a)IJN Hence indeed converges exponentially fast to a limit not depending on i. (In addition the above also shows that there cannot be more than one set of stationary probabilities.) Remark We will use an argument similar to the one given in the above theorem to prove, in Section 9.3, Blackwell's theorem for a renewal process whose interarrival distribution is continuous. 9.3 HAZARD RATE ORDERING AND ApPLICATIONS TO COUNTING PROCESSES The random variable X has a larger hazard (or failure) rate function than does Y if (931) for all t:> 0, where Ax{t) and Ay(t) are the hazard rate functions of X and Y. Equation (9.3.1) states that, at the same age, the unit whose life is X is more likely to instantaneously perish than the one whose life is Y. In fact, since P{X> t + six> t} = exp {-J:+s A(y) d Y }, HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 421 it follows that (9.3.1) is equivalent to P{X> t + siX> t} <: P{Y > t + slY> t}, or, equivalently, XIS; Y 1 for all t ~ 0, 51 where X, and Y, are, respectively, the remaining lives of a t-unit-old item having the same distributions as X and Y. Hazard rate ordering can be quite useful in comparing counting processes. 
To illustrate this let us begin by considering a delayed renewal process whose first renewal has distnbution G and whose other interarrivals have distribution F, where both F and G are continuous and have failure rate functions AF(t) and Adt) Let J.L(t) be such that We first show how the delayed renewal process can be generated by a random sampling from a nonhomogeneous Poisson process having intensity func- tion J.L(t). Let Sl, S2,' denote the times at which events occur in the nonhomoge- neous Poisson process {N(t), t :> O} with intensity function J.L(t). We will now define a counting process-which, we will then argue, is a delayed renewal process with initial renewal distribution G and interarrival distribution F- such that events can only occur at times SI, Sz, '" Let if an event of the counting process occurs at time S. otherwise. Hence, to define the counting process we need specify the joint distribution of the I., i ~ 1. This is done as follows: Given Sl, Sz, .. ,take (93.2) and, for i > 1, (9.3.3) P{/. = 11/1>' . ,It-!} AG(SJ J.L(S.) = if II = ... = 1,-1 = 0 ifj = max{k. k < i, Ik I} 422 STOCHASTIC ORDER RELATIONS To obtain a feel for the above, let A(t) denote the age of the counting process at time t; that is, it is the time at t since the last event of the counting process that occurred before t. Then A(SI) = SI, and the other values of A(S.) are recursively obtained from II, . ,1'-1 For instance, if II = 0, then A(S2) = S2, whereas if II = 1, then A(S2) = S2 - SI Then (9.32) and (933) are equivalent to AC(S,) J.L(S,) if A(S,) = s, P{/, = III), . AF(A(S,» J.L(S.) if A(S,) < S,. We claim that the counting process defined by the I" i :> 1, constitutes the desired delayed renewal process To see this note that for the counting process the probability intensity of an event at any time t, given the past, is given by P{event in (t, t + h)lhistory up to t} = P{an event of the nonhomogeneous Poisson process occurs in (t, t + h), and it is countedlhistory up to t} = (J.L(t)h + o(h»P{it is counted I history up to t} [J.L(t)h + o(h)] ~ ( ~ j = Ac(t)h + o(h) if A(t) = t = [J.L(t)h + o(h)] A F ~ ~ g » = AF(A(t»h + o(h) if A(t) < t Hence, the probability (intensity) of an event at any time t depends only on the age at that time and is equal to Ac(t) if the age is t and to AF(A(t» otherwise But such a counting process is clearly a delayed renewal process with interarrival distribution F and with initial distribution G. We now use this representation of a delayed renewal process as a random sampling of a nonhomogeneous Poisson process to give a simple probabilistic proof of Blackwell's theorem when the interarrival distribution is continuous THEOREM (Blackwell's Theorem) Let {N*(t), t ~ O} denote a renewal process with a continuous interarrival distribution F Then a m(t + a) - m(t)-- fJ. ast- 00, where m(t) = E[N*(t)J and fJ., assumed finite, is the mean interarrival time HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 423 Proof We will prove Blackwell's theorem under the added simplifying assumption that AF(t), the failure rate function of F, is bounded away from 0 and 00 That is, we will assume there is 0 < AI < A2 < 00 such that (934) for all t Also let G be a distribution whose failure rate function also lies between AI and A2 Consider a Poisson process with rate A2 Let its event times be 51, 52, Now let Ii, Ii, . be generated by (9 3 3) with p.(t) =A2' and let 1 1 ,1 2 , be conditionally independent (given 5t. 52, ) of the sequence It, Ii,. 
and generated by (93 3) with G = F and p.(t) = A 2 • Thus, the counting process in which events occur at those times 5, for which 1,* = I-call this process {No(t), t 2:: OJ-is a delayed renewal process with interarrival distnbution F, and the counting process in which events occur at those times 5, for which I, = I-call it {N(t), t 2:: OJ-is a renewal process with interarrival distribution F Let N = mm{i. I, = I:' = I}. That is, N is the first event of the Poisson process that is counted by both generated processes Since independently of anything else each Poisson event will be counted by a given generated process with probability at least AI/ A 2 , it follows that P{I, = I:' = llIl' and hence P{N < oo} = 1 Now define a third sequence I" i 2:: 1, by _ {Il* 1,= I, for i:s N for i 2:: N Thus the counting process {N(t), t 2:: O} in which events occur at those values of 5, for which i = 1 is a delayed renewal process with initial distribution G and interarnval distnbution F whose event times starting at time 5 N are identical to those of {N(t), t 2:: O} Letting N(t, t + a) = N(t + a) - N(t) and similarly for N, we have E[N(t. t + a)] = E[N(t. t + a)15 N :s t]P{5 N :s t} + E[N(t, t + a)15N > t] P{5N > t} = E[N(t, t + a)15 N :s t]P{5 N :S t} + E[N(t, t + a)15N > t]P{5N > t} = E[N(t, t + a)] + (E[N(t, t + a)15N > t] - E[N(t, t + a)15 N > t])P{5 N > t} 424 STOCHASTIC ORDER RELATIONS Now it easily follows that E[N(t, t + a)ISN > t] :s A 2 a and similarly for the term with N replacing N Hence, since N < co implies that P{SN > t} _ 0 as t _ co, we see from the preceding that (935) E[N(t, t + a)] E[N(t, t + a)] _ 0 ast- co But if we now take G = F" where Ft(t) f ~ F ( Y ) dylp., it is easy to establish (see the proof of part (i) of Theorem 352 of Chapter 3) that when G = F t , E[N(!)] = tIp., and so, from (935), which completes the proof a E[N(t,t +a)]-- p. ast- co, We will also find the approach of generating a renewal process by a random sampling of a Poisson process useful in obtaining certain monotonicity results about renewal processes whose interarrival distributions have decreasing fail- ure rates. As a prelude we define some additional notation Deflnltron For a counting process {N(t), t ;:::: O} define, for any set of time points T, N(T) to be the number of events occurring in T We start with a lemma Lemma 9.3.1 Let N = {N(t), t ;:::: O} denote a renewal process whose interamval distribution F is decreasing failure rate Also, let Ny = {Ny(t), t ;:::: O} denote a delayed renewal process whose first interamval time has the distnbution Hp where Hv is the distribution of the excess at time y of the renewal process N, and the others have interarrival distnbution F (That is, Ny can be thought of as a continuation, starting at y, of a renewal process having the same interarrival distribution as N) Then, for any sets of time points TJ, , Tn, HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 425 Proof Let N* = {N*(t), t s y} denote the first y time units of a renewal process, independent of N, but also having interarrival distribution F We will interpret Ny as the continuation of N* from time y onward Let A*(y) be the age at time y of the renewal process N* Consider a Poisson process with rate p. 
A(O) and let SI> Sh denote the times at which events occur Use the Poisson process to generate a counting process-call it N-in which events can only occur at times S" ; 2: 1 If I, equals 1 when an event occurs at SI and 0 otherwise, then we let where A(SI) = SI and, for i > 1, where A(S,) is the time at SI since the last counted event pnor to S" where the Poisson event at S, is said to be counted if I, = 1. Then, as before, this generated counting process will be a renewal process with interarnval distnbution F We now define counting process that also can have events or.!!Y at times Sit i 2: 1, and we let II indicate whether or not there is an event at SI Let A(t) denote the time at t since the last event of this process, or, if there have been no events by t, define it to be t + A*(y) Let the L be such that if 1 1 = 0, then II 0, if I, = 1, then l. G with probability A(A(S,»/ A(A(S,» otherwise The well defined since, as L can equal 1 o!!!.y when I, 1, we will always have that A(t) 2: A(t), and since A is nonincreasing A(A(S,» s A(A(S,» Hence we see .II-I} = P{ll = 11/1, • , I,-I}P{/. = 111, = 1} = A(A(S,»/p., and so the counting process generated by the L, i 2: l-call it Ny-is a delayed (since A(O) = A*(y» renewal process whose initial distnbution is Hy Since events of Ny can only occur at time points where events of N occur (l s I, for all i), it follows that N(T) 2: Ny(T) for all sets T, and the lemma follows PROPOSITION 9.3.2 Monotoniclty Properties of DFR Renewal Processes Let A(t) and yet) denote the age and excess at time t of a renewal process N = {N(t), t 2: O} whose interarrival distribution is DFR Then both A(t) and yet) increase stochastically in t 'Ibat is, P{A(t} > a} and P{Y(I) > a} are both increasing in t for all a 426 STOCHASTIC ORDER RELATIONS Proof Suppose we want to show P{A(t + y) > a} ~ P{A(t) > a} To do so we interpret A(t) as the age at time t of the renewal process Nand A(t + y) as the age at time t of the renewal process NY' of Lemma 93 1 Then letting T = [t - a, t] we have from Lemma 93 1 P{N(T) 2: I} 2: P{Ny(T) 2: I}, or, equivalently, P{A(t) :s a} ~ P{A(t + y) :s a} The proof for the excess is similar except we now let T = [t. t + a] Proposition 9.3 2 can be used to obtain some nice bounds on the renewal function of a DFR renewal process and on the distribution of a DFR ran- dom variable Corollary 9.3.3 Let F denote a DFR distribution whose first two moments are ILl = J x dF(x), 1L2 == J x 2 dF(x). (i) If m(t) is the renewal function of a renewal process having interamval distnbu- tion F. then t t 1L2 -:s m(t) :s - + -2 - 1, ILl ILl 21L1 (ii) - {t 1L2} F(t) ~ exp - - - -2 + 1 ILl 21L1 Proof (i) Let XI> X 2 • denote the interarrival times of a renewal process {N(t). t 2: O} having interarrival distribution F Now N(I)+I 2: x. "" t + Y(t). , ~ I where Y(t) is the excess at t By taking expectations. 
and using Wald's equation we obtain ILI(m(t) + 1) = t + E[Y(t)] HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 427 But by Proposition 932 E[Y(t)] is increasing in t, and since E[Y(O)] = ILl and (see Proposition 34.6 of Chapter 3) we see or lim E[Y(t)] = 2IL2 , /-+"' ILl IL2 I + ILl :s; ILI(m(t) + 1) :s; t + -, 2ILI (ij) Differentiating the identity m(/) = 2::"1 Fn(t) yields "' m'(t) dl 2: F ~ ( t ) dt n=1 "' = 2: P{nth renewal occurs in (t, t + dt)} + o[(dt)] n"l P{a renewal occurs in (t, t + dt)} + o[d(t)], and thus m'(t) is equal to the probability (intensity) of a renewal at time t But since A(A(t» is the probability (intensity) of a renewal at t given the history up to that time, we thus have m'(/) = E[A(A(t»] ~ A(t), where the above inequality follows since A is decreasing and A(t} :s; I Integrating the above inequality yields met) ~ f: A(s) ds Since F(t) = exp ( - f: A(s) ds) , we see and the result follows from part (i) 428 STOCHASTIC ORDER RELATIONS 9.4 LIKELIHOOD RATIO ORDERING Let X and Y denote continuous nonnegative random variables having respec- tive densities f and g We say that X is larger than Y in the sense of likelihood ratio, and write if X ~ Y LR for all x ~ y Hence X ~ L R Y if the ratio of their respective densities, f(x)/g(x), is nonde- creasing in x We start by noting that this is a stronger ordering than failure rate ordering (which is itself stronger than stochastic ordering) PROPOSITION 9.4.1 Let X and Y be nonnegative random variables having densities f and g and hazard rate functions Ax and Ay If X:::::Y, LR then for aliI::::: 0 Proof Since X :::::LR Y, it follows that, for x ::::: I, f(x) ::::: g(x)f(/)lg(/) Hence, AAI) = 0: f(/) 1. f(x) dx <: f(/) - f' g(x)f(/)lg(/) dx == g(/) f' g(x)dx LIKELIHOOD RATIO ORDERING EXAMPLE 9.4(A) If X is exponential with rate A and Y exponential with rate jL, then f(x) = A e(I'-A)x g(x) jL , and so X 2::LR Y when A :::; jL. ExAMPLE 9.4(a) A Statistical Inference Problem. A central prob- lem in statistics is that of making inferences about the unknown distribution of a given random variable In the simplest case, we suppose that X is a continuous random variable having a density function known to be either f or g Based on the observed value of X, we must decide on either f or g A decision rule for the above problem is a function 4>(x), which takes on either value 0 or value 1 with the interpretation that if X is observed to equal x, then we decide on f if 4>(x) = 0 and on g if 4>(x) 1 To help us decide upon a good decision rule, let us first note that Ix 4>(x)=1 f(x) dx f f(x) 4>(x) dx represents the probability of rejecting f when it is in fact the true density. The classical approach to obtaining a decision rule is to fix a constant a, 0 :::; a ::5 1, and then restrict attention to decision rules 4> such that (941) f f(x)4>(x) dx -< a Among such rules it then attempts to choose the one that maximizes the probability of rejecting f when it is false That is, it maximizes The optimal decision rule according to this criterion is given by the following proposition, known as the Neyman-Pearson lemma. Neyman-Pearson Lemma 429 Among all decision rules tjJ satisfying (94 1), the one that maximizes f g(x)tjJ(x) dx is tjJ* given by if f(x)/g(x) ~ c if f(x)/g(x) < c, 430 STOCHASTIC ORDER RELATIONS where c is chosen so that f f(x) </J*(x) dx = Ot Proof Let </J satisfy (94 1) For any x (</J*(x) - </J(x))(cg(x) - f(x)) 2: O. 
The above inequality follows since if </J*(x) = 1, then both terms in the product are nonnegative, and if </J*(x) = 0, then both are nonpositive Hence, f (</J*(x) - </J(x))(cg(x) - f(x)) dx 2: 0 and so c [J </J*(x)g(x) dx - f </J(x)g(x) dx ] 2: f </J*(x)f(x) dx - f </J(x)f(x) dx 2: 0, which proves the result If we suppose now thatf and g have a monotone likelihood ratio order-that is, f(x)lg(x) is nondecreasing in x-then the optimal decision rule can be written as where k is such that ~ ' ( x ) = G if x :> k if x < k, J:" f(x) dx = cx. That is, the optimal decision rule is to decide on f when the observed value is greater than some critical number and to decide on g otherwise. Likelihood ratio orderings have important applications in optimization theory The following proposition is quite useful PROPOSITION 9.4.2 Suppose that X and Yare independent, with densities f and g, and suppose that X2:Y LR LlKELlHOOD RATIO ORDERING 431 If hex, y) is a real-valued function satisfying hex, y) 2: hey, x) whenever x 2: y, then heX, Y) 2: heY, X) ,! Proof Let U = max(X, Y), V = min(X, Y) Then conditional on U = u, V = v, U 2: v, the conditional distnbution of heX, Y) is concentrated on the two points h(u, v) and h(v, u), assigning probability AI == P{X max(X, Y). Y = min(X, Y)IU u, V = v} f(u)g(v) f(u)g(v) + f(v)g(u) to the larger value h(u, v) Similarly, conditional on U u and V = v, heY, X) is also concentrated on the two points h(u, v) and h (v, u), assigning probability A2 == P{Y= max(X, Y), X= min(X, Y)IU = u, V = v} = g(u)f(v) g(u)f(v) + f(u)g(v) Since u 2: V, f(u)g(v) 2: g(u )f(v), and so, conditional on U = u and V = v, heX, Y) is stochastically larger than heY, X) That is, P{h(X, Y) 2: al U, V} 2: P{h(Y, X) 2: al U, V}, and the result now follows by taking expectations of both sides of the above Remark It is perhaps somewhat surprising that the above does not necessar- ily hold when we only assume that X :> sl Y. For a counterexample, note that 2x + y ~ x + 2y whenever x ~ y However, if X and Yare independent with x=G y=G with probability 2 with probability 8, with probability .2 with probability 8, 432 STOCHASTlC ORDER RELA TlONS then X >SI Y, but P{2X + Y > ll} = .8 and P{2 Y + X > Il} = 8 + (.2)(.8) = 96. Thus 2X + Y is not stochastically larger than 2Y + X Proposition 94.2 has important applications in optimal scheduling prob- lems. For instance, suppose that n items, each having some measurable charac- teristic, are to be scheduled in some order For instance, the measurable characteristic of an item might be the time it takes to complete work on that item Suppose that if XI is the measurable characteristic of item i, and if the order chosen is ii, ... , in, a permutation of 1, 2, .. ,n, then the return is given by h(Xi ,XI' •.• ,X, ). Let us now suppose that the measurable characteris- I z ~ tic of item i is a random variable, say X" i = 1, ... , n. If XI ~ X 2 > ... ~ X n , LR LR LR and if h satisfies h(YI,'" ,Yr-I,Yr,Yi+l, .. ,Yn) > h(y], ... ,YoYI-I,Y,+I, . . ,Yn) whenever Y, > Yi-I, then it follows from Proposition 94.2 that the ordering 1, 2, .. ,n (n, n - 1, .. ,1) stochastically maximizes (minimizes) the return. To see this consider any ordering that does not start with item I-say 01, i h 1, i'!" ... , in-I)' By conditioning on the values X, ,Xi, " ,XI ,we can usc I) .-1 Proposition 9.4.2 to show that the ordering (i1' 1, i 2 , i 3 , . , in-I) leads to a stochastically larger return. Continuing with such interchanges leads to the conclusion that 1, 2, ... 
, n stochastically maximizes the return. (A similar argument shows that n, n - 1, ... , 1 stochastically minimizes return.) The continuous random variable X having density fis said to have increasing likelihood ratio iflog(f(x» is concave, and is said to have decreasing likelihood ratio if logf(x) is convex. To motivate this terminology, note that the random variable c + X has density f(x - c), and so ¢:::> logf(x - Ct) - logf(x - C2) i X ¢:::> logf(x) is concave. for all CI > C2 Hence, X has increasing likelihood ratio if c + X increases in likelihood ratio as c increases For a second interpretation, recall the notation X, as the remaining life from t onward of a unit, having lifetime X, which has reached the age of t Now, F,(a) := P{X, > a} = F(t + a)/F(t), STOCHASTiCALLY MORE VARIABLE and so the density of X, is given by Hence, f,(a) = f(1 + a)/F(/). for all s $. I ~ f(s + a) i a f(1 + a) for all s $. I ~ logf(x) is concave 433 Therefore, X has increasing likelihood ratio if Xs decreases in likelihood ratio as s increases Similarly, it has decreasing likelihood ratio if Xs increases in likelihood ratio as s increases PROPOSITION 9.4.3 If X has increasing likelihood ratio, then X is I FR Similarly, if X has decreasing likelihood ratio, then X is DFR Proof Xs ~ X, ~ Ax ::s Ax LR " (from Proposition 9 41) ~ X s ~ X , ,I Remarks (1) A density function f such that log f(x) is concave is called a Polya frequency of order 2. (2) The likelihood ratio ordering can also be defined for discrete random variables that are defined over the same set of values We say that X ~ L R Y if P{X = x}/P{Y = x} increases in x 9.5 STOCHASTICALLY MORE VARIABLE Recall that a function h is said to be convex if for all 0 < A < 1, X I, X2, 434 STOCHASTIC ORDER RELATIONS We say that X is more variable than Y, and write X ~ y Y, if (9.5.1) £[h(X)] ~ £[h(Y)] for all increasing, convex h. If X and Y have respective distributions F and G, then we also say that F 2:y G when (9.5.1) holds. We will defer an explanation as to why we call X more variable than Y when (9.5.1) holds until we prove the following results PROPOSITION 9.5.1 If X and Yare nonnegative random variables with distributions F and G re<;pectively, then X ~ v Y if, and only if, (952) f F(x) dx ~ f G(x) dx for all a ~ 0 Proof Suppose first that X ~ v Y Let ha be defined by ha(x) == (x - aV == {O , x-a x ~ a x>a Since hQ i<; a convex increasing function, we have But And, similarly, E[hQ(X)] ~ E[hQ(Y)] E [hQ(X)] = f: P{(X - aV > x} dx = f: P{X> Ii + x}dx = fa'" F(y) dy E[hAY)] = t G(y) dy, thu<; establishing (952) To go the other way, suppose (952) is valid for all a ~ 0 and let h denote a convex increasing function that we shall suppose i<; twice differenti- able Since h convex is equivalent to h" ~ 0, we have from (952) (953) f: h"(a) 1: F(x) dx da;;>: 1: h"(a) f G(x) dx da STOCHASTICALLY MORE VARIABLE Working with the left-hand side of the above J " h"(a) Je: F(x) dx da = J'" JX h"(a) da F(x) dx o Q 0 II = J: h'(x)F(x) dx - h'(O)E[X] = t h'(x) r dF(y) dx - h'(O)E[X] = t J: h'(x) dx dF(y) - h'(O)E[X] = J: h(y) dF(y) - h(O) - h'(O)E[X] = E[h(X)] - h(O) - h'(O)E[X] Since a similar identity is valid when Fis replaced by G, we see from (953) (954) E[h(X)] - E[h(Y)] 2: h'(O)(E[XJ - E[Y]) 435 The right-hand side of the above inequality is nonnegative since h'(O) 2: 0 (h is increasing) and <;ince E[X] 2: E[Y], which follow<; from (952) by setting a = 0 Corollary 9.5.2 If X and Yare nonnegative random variables such that E[X] = E[Y], then X 2:. 
Y if, and only if, E[h(X)] 2: E[h(Y)J for all convex h Proof Let h be convex and suppo<;e that X 2:. Y Then as E[XJ = E[Y], the inequality (954), which was obtained under the a<;sumption that h is convex, reduces to E[h(X)] 2: E[h(Y)], and the result is proven Thus for two nonnegative random variables having the same mean we have that X ~ y Y if E[h(X)] ;2: E[h(Y)] for all convex functions h. It is for this reason we say that X ;2:y Y means that X has more variability than Y That is, intuitively X will be more variable than Y if it gives more weight to the extreme values, and one way of guaranteeing this is to require that E[h(X)] ~ E[h(Y)] whenever h is convex (For instance, since E[X] = E[Y] and since h(x) = x 2 is convex, we would have that Var(X) > Var(Y) ) 436 STOCHASTIC ORDER RELATIONS Corollary 9.5.3 If X and Yare nonnegative with E[X] = E[Y], then X;:::. Yimplies that - X;:::. - Y. Proof Let h denote an increasing convex function We must show that E[h(-X)] 2: E[h(-Y)] This, however, follows from Corollary 952 since the function f(x) = h( - x) is convex Our next result deals with the preservation of the variability ordering. PROPOSITION 9.5.4 If XI, 1, , Xn are independent and Y I , • , n, then • Yn are independent, and X/ ;:::. Y" i = g(XI' ,Xn);:::g(YI'" Yn) y for all increasing convex functions g that are convex in each argument Proof Start by assuming that the set of 2n random vanables is independent The proof is by induction on n When n = 1 we must show that when g and h are increasing and convex and Xl ;:::. Y I This follows from the definition of XI ;:::y Y I as the function h(g(x» is increasing and convex since d dx h(g(x» = h'(g(x»g'(x) ;::: 0, d 2 dx 2 h(g(x» = h"(g(x»(g'(x»)2 + h'(g(x»g"(x);::: ° Assume the result for vectors of size n - 1 Again let g and h be increasing and convex Now E[h(g(X h X 2 , , Xn»IX I = x] = E[h(g(x. X2> , Xn»IX I = x] = E[h(g(x, X 2 , , Xn»] (by independence of XI. ,X n ) ;::: E[h(g(x, Y 2 • = E[h(g(XI' Y 2 , (by the induction hypothesis) (by the independence of XI, Y 2 , APPLICATIONS OF VARIABILITY ORDERINGS 437 Taking expectations gives that But. by conditioning on Y 2 , , Yn and using the result for n = 1, we can show that which proves the result. Since assuming that the set of 2n random variables is indepen- dent does not affect the distributions of g(X h • •• Xn) and g(Y h ,Y n ) the result remains true under the weaker hypothesis that the two sets of n random van abies are independent 9.6 ApPLICATIONS OF VARIABILITY ORDERINGS Before presenting some applications of variability orderings we will determine a class of random variables that are more (less) variable than an exponential. Definit[on The nonnegative random van able X is said to be new better than used in expectation (NBUE) if E[X - alX> a] $ E[X] for all a 2:: 0 It is said to be new worse than used in expectation (NWUE) if E[X - alX> a] 2:: E[X] for all a 2:: 0 If we think of X as being the lifetime of some unit, then X being NBUE (NWUE) means that the expected additional life of any used item is less (greater) than or equal to the expected life of a new item. If X is NBUE and F is the distribution of X, then we say that F is an NBUE distribution, and similarly for NWUE. PROPOSITION 9.6.1 If F is an NBUE distnbution having mean IL, then F$ exp(IL), v where exp(IL) is the exponential distribution with mean IL The inequality is reversed if F is NWUE. 438 STOCHASTIC ORDER RELATIONS Proof Suppose F is NBUE with mean p. 
By Proposition 95.1 we must show that (961) for all c ~ 0 Now if X has distnbution F, then E[X - alX> a] = J: P{X - a> xiX> a}dx = r F(a +x) dx o F(a) = r ~ ( y ) dy a F(a) Hence, for F NBUE with mean p., we have or which implies r ( F(a) ) da ~ E.- o t F(y)dy P. We can evaluate the left-hand side by making the change of variables x = Ia"' F(y) dy,dx = -F(a) da to obtain _ JX(C) dx ~ E.- ,. x p. where x(c) = Ic"' F(y) dy Integrating yields or which proves (96 1) When F is NWUE the proof is similar APPLICATIONS OF VARIABILITY ORDERINGS 439 9.6.1 Comparison of G / G /1 Queues The GIG/1 system supposes that the interarrival times between customers, X n , n 1, are independent and identically distributed as are the successive service times Sf!' n :> 1 There is a single server and the service discipline is "first come first served." If we let Dn denote the delay in queue of the nth customer, then it is easy to verify (see Section 7 1 of Chapter 7 if you cannot verify it) the following recursion formula. D! =0, (96.2) Dn+! = max{O, D" + Sf! Xn+I }. THEOREM 9.6.2 Consider two G/G/l systems The ith, i = 1,2, having interarrival times and seruice times n ;;:: 1 Let denote the delay in queue of the nth customer in system i, i = 1,2 If (i) and (0) then for all n Proof The proof is by induction. Since it is obvious for n = 1, assume it for n. Now (by the induction hypothesis). v (by assumption), v - X! ;;:: - (by Corollary 9 5 3) v Thus, by Proposition 95 4, DiO + ;;:: + - v Since h(x) = max(O, x) is an increasing convex function, it follows from the recursion (962) and Proposition 954 that D (i) :> D(2) n+1 - 11+1 v thus completing the proof 440 STOCHASTIC ORDER RELATIONS Let W Q = lim E[Dn] denote the average time a customer waits in queue Corollary 9.6.3 For a GIG/l queue with E[S] < E[X] (i) If the interarnval distribution is NBUE with mean lIA, then <: AE[S2] W Q - 2(1 - AE[S]) The inequality is reversed if the distribution is NWUE (ii) If the service distribution is NBUE with mean 1I1L, then where {3 is the solution of and G is the interarrival distribution The inequality is reversed if G is NWUE. Proof From Proposition 961 we have that an NBUE distribution is less vanable than an exponential distnbution with the same mean Hence in (i) we can compare the system with an MIG/1 and in (ii) with a GIMII The result follows from Theorem 962 since the right-hand side of the inequalities in (i) and (ii) are respectively the average customer delay in queue in the systems MIG/I and GIM/1 (see Examples 4 3(A) and 4 3(B) of Chapter 4) 9.6.2 A Renewal Process Application Let {Np(t), t ;:::: O} denote a renewal process having interarrival distribution F. Our objective in this section is to prove the following theorem. THEOREM 9.6.4 If F is NBUE with mean IL, then NF(t) ::Su N(t), where {N(t)} is a Poisson process with rate IIIL, the inequality being reversed when F is NWUE Interestingly enough it turns out to be easier to prove a more general result. (We'll have more to say about that later) APPLICATIONS OF VARIABILITY ORDERINGS 441 Lemma 9.S.5 Let F., i ~ 1, be NBUE distributions, each having mean IL, and let G denote the exponential distnbution with mean IL Then for each k :2: 1, ., (963) * F.)(t) ~ 2: G(I)(t), I-k where * represents convolution, and G(I) is the i-fold convolution of G with itself Proof The proof is by induction on k To prove it when k 1 let Xl. X 2 , • be independent with X, having distribution F. 
and let Now (964) [ N.(t)+ 1 ] E ~ X, = E[X]E[N*(t) + 1] by Wald's equation, which can be shown to hold when the X, are independent, nonnega- tive, and all have the same mean (even though they are not identically distnbuted) However, we also have that N'(/I+ 1 2: X, I + time from I until N* increases ,=1 But E[time from t until N* increases] is equal to the expected excess life of one of the X, and is thus by NBUE, less than or equal to IL That is, [ N'(I)+1 ] E ~ X, ~ t + I L and, from (964), (96.5) E[N*(t)] ~ tIlL However. ., E[N*(t)] = 2: P{N*(t) ~ i} .=1 ., = 2: P{X I + . + X. :S t} ,=1 OJ = 2: (FI * * F.)(t) ,"'I 442 STOCHASTIC ORDER RELATIONS Since the right-hand side of (9 (j 3) is, si milarly, when k = 1, the mean of the exponential renewal process at 1 (or 11p.), we see from (9 (j 5) that (9 (j 3) is established when k = 1 Now assume (9 (j 3) for k We have * F.-d(1 - x) dF,(x) (by the induction hypothesis) ., = L (G(/) * F; ... d(l) = GU- 1l * F; ... I) * G) (I) (by the induction hypothesis) ., ::= L Gu ... IJ(I) I",k ., ::= L G(lJ(I), which completes the proof We are now ready to prove Theorem 9 6.4 Proof of Theorem 9.64 By Proposition 951 we must show that for all k 1 L P{NF(t) i}:5 L P{N(t) i} '=k But the left-hand side is equal to the left-hand side of (963) with each F; equaling F and the right-hand side is just the right-hand side of (96.3) Remark Suppose we would have tried to prove directly that L F(I)(t).:S L G(I)(t) ,=k whenever F is NBUE and G exponential with the same mean. Then we could have tried to use induction on k The proof when k = 1 would have been APPLICATIONS OF VARIABILITY ORDERINGS 443 identical with the one we used in Lemma 965 However, when we then tried to go from assuming the result for k to proving it for k + 1, we would have reached a point in the proof where we had shown that However, at this point the induction hypothesis would not have been strong enough to let us conclude that the right-hand side was less than or equal to (Lj:k G{J») * F. The moral is that sometimes when using induction it is easier to prove a stronger result since the induction hypothesis gives one more to work with. 9.6.3 A Branching Process Application· Consider two branching processes and let FI and F2 denote respectively the distributions of the number of offspring per individual in the two processes. Suppose that That is, we suppose that the number of offspring per individual is more variable in the first process. Let i = 1,2, denote the size of the nth generation of the ith process. THEOREM 9.6.6 If = 1, i = 1, 2, then for all n Proof The proof is by induction on n Since it is true for n = 0, assume it for n. Now and Z(I) X n+1 L.J I 1=1 Z (2) - y n+1 - L..J J' 1=1 where the are independent and have distribution FI (XI representing the number of offspring of the jth person of the nth generation of process 1) and the are ... To review this model, the reader should refer to Section 4 5 of Chapter 4 444 STOCHASTIC ORDER RELATIONS independent and have distribution F2 Since (by the hypothesis) and (by the induction hypothesis), the result follows from the subsequent lemma Lemma 9.6.7 Let Xl, X 2 , be a sequence of nonnegative independent and identically distnbuted random variables, and similarly Y l , Y 2 , • Let Nand M be integer-valued nonnegative random van abies that are independent of the X, and Y, sequences. Then N M X, 2! 
Y;, , N2!M=:>2:X,2!2: Y, y , ~ 1 Y ,= 1 Proof We wi1l first show that (96.6) Let h denote an increasing convex function To prove (9.66) we must show that (967) Since N 2!, M, and they are independent of the x" the above will follow if we can show that the function g(n), defined by g(n) = £[h(XI + .. + Xn)], is an increasing convex function of n Since it is clearly increasing because h is and each X, is nonnegative, it remains to show that g is convex, or, equivalently, that (96.8) g(n + 1) - g(n) is increasing in n To prove this let Sn = L ~ X" and note that g(n + 1) - g(n) = £[h(Sn + Xn+ l ) - h(Sn)] APPLICATIONS OF V ARIABILITY ORDERINGS Now, E[h(Sn + Xn+ l ) - h(Sn)ISn = t] = E[h(t + X n + l ) - h(t)] = f(t} (say). 445 Since h is convex, it follows that f(t) is increasing in t Also, since Sn increases in n. we see that E[f(Sn)] increases in n But E[f(Sn)] = g(n + 1) - g(n), and thus (968) and (96.7) are satisfied We have thus proven that and the proof will be completed by showing that or, equivalently, that for increasing, convex h But E [h jM = m] E [h ] (by independence) m m mJ. since 2: X, 2 2: Y, I • I The result follows by taking expectations of both sides of the above Thus we have proven Theorem 9.6.6, which states that the population size of the nth generation is more variable in the first process than it is in the second process. We will end this section by showing that if the second (less variable) process has the same mean number of offspring per individual as does the first, then it is less likely at each generation to become extinct. 446 STOCHASTIC ORDER RELATIONS Corollary 9.6.8 Let ,.,.1 and ""2 denote respectively the mean of F, and F 2 , the offspring distnbutions If zg) == 1, i == 1,2,,.,., == ""2 == ,.,., and F, F 2 , then == O} == O} for aU n Proof From Theorem 9 66 we have that and thus from Proposition 95 1 ., ., 2: i} 2: i}, p2 p2 or, equivalently, since ., E[Zn1 == 2: P{Zn i} == ,.,.n, ,=, we have that or which proves the result 9.7 ASSOCIATED RANDOM VARIABLES The set of random variables XI, X 2 , •• ,X n is said to be associated if for all increasing functions f and g E[f(X)g(X)] > E[f(X)]E[g(X)] where X = (XI, , X n ), and where we say that h is an increasing function if h(xl' .. , Xn) -< h(YI,' ., Yn) whenever X, ::s; y, for i = 1, , n PROPOSITION 9.7.1 Independent Random Variables Are Associated Proof Suppose that X" , Xn are independent The proof that they are associated is by induction As the result has already been proved when n == 1 (see Proposition 721), assume that any set of n - 1 independent random variables are associated and ASSOCIATED RANDOM VARIABLES 447 let J and g be increasing functions Let X (Xl, , Xn) Now, E[J(X)g(x)lxn=x]=E[J(Xh ,Xn .,X)g(Xh ,Xn-!,x)IXn=x] = E[J(XIo , X n - 1 , x)g(X b ,X n - I , x)] ~ E[J(X 1 , ,Xn_l,x)]E[g(X h ,Xn_j,x)] = E[J(X)IX n = x]E[g(X)IXn x] where the last two eq ualities follow from the independence of the X,, and the ineq uality from the induction hypothesis Hence, and so, E[J(X)g(X)] ~ E[E[J(X)fXn]E[g(X)IXnll However, E[J(X)IX n ] and E[g(X)IX n ] are both increasing functions of Xn and so Proposition 72 1 yields that E[J(X)g(X)] ~ E[E[J(X)IXnll' E[E[g(X)IXnll = E[J(X)]E[g(X)] which proves the result It follows from the definition of association that increasing functions of associated random variables are also associated. 
Hence, from Proposition 97.1 we see that increasing functions of independent random variables are asso- ciated EXAMPLE 9.7(A) Consider a system composed of n components, each of which is either working or failed. Let XI equal 1 if component i is working and 0 if it is failed, and suppose that the X, are independent and that P{X, = I} PI, i = 1, .. , n. In addition suppose that there is a family of component subsets C 1 ,. ,C r such that the system works if, and only if, at least one of the components in each of these subsets is working. (These subsets are called cut sets and when they are chosen so that none is a proper subset of another they are called minimal cut sets.) Hence, if we let S equal 1 if the system works and 0 otherwise, then where r Y,=maxX,. jEC, 448 STOCHASTIC ORDER RELATIONS As Y I , •• , Y, are all increasing functions of the independent ran- dom variables XI, .. , Xn , they are associated Hence, P{S = I} = £[S] = £ [n y,] ~ £[Y.]£ [II y,] ~ " ~ n £[Y,] where the inequalities follow from the association property. Since Y i is equal to 1 if at least one of the components in C, works, we obtain that P{system works} '" IT {I - ,ll (1 - P,)}. Often the easiest way of showing that random vanables are associated is by representing each of them as an increasing function of a specified set of independent random variables. Definition The stochastic process {X(/), I 2:: O} is said to be associated if for all nand IJ, , In the random vanables X(/I). • X(ln) are associated EXAMPLE 9.7(8) Any stochastic process {XC t), t ~ O} having indepen- dent increments and X(O) = 0, such as a Poisson process or a Brownian motion process, is associated. To venfy this assertion, let tl < t2 < ... < tn Then as X(tl)' .. , X(t n ) are all increasing functions of the independent random variables X(t,) - X(t,-I). i = 1, .. , n (where to = 0) it follows that they are associated. The Markov process {Xn' n ~ O} is said to be stochastically monotone if P{Xn ::s; alX n - 1 = x} is a decreasing function of x for all n and a That is, it is stochastically monotone if the state at time n is stochastically increasing in the state at time n - 1 PROPOSITION 9.7.2 A stochastically monotone Markov process is associated Proof We will show that Xu, • Xk are associated by representing them as increasing functions of a set of independent random variables Let Fn J denote the conditional distribution function of X. given that X._ I = x Let VI' • Vn be independent uniform PROBLEMS 449 (0, 1) random vanables that are independent of {Xn. n 2: O} Now, let Xo have the distnbution specified by the process, and for i = I, . n, define successively X, F;l (V,) == infIx V,::; F, x (x)} i-I i-t It is easy to check that, given X,_I. X, has the appropnate conditional distribution Since F, x, ,(x) is decreasing in X,- l (by the stochastic monotonicity assumption) we see that X, is increasing in X,-J. Also, since F, x (x) is increasing in x, it follows that , , X, is increasing in V" Hence, XI = is an increasing function of Xo and Vj, X 2 = F i\ (V 2 ) is an increasing function of XI and V 2 and thus is an increasing function of X o • VI, and V 2 , and so on As each of the X,. i = 1, .. , n, is an increasing function of the independent random variables Xo• VI, . , Vn it follows that XI,' ,X n are associated Our next example is of interest in the Bayesian theory of statistics. 
It states that if a random sample is stochastically increasing in the value of a parameter having an a priori distribution then this random sample is, unconditionally, as- sociated. EXAMPLE 9.7(c) Let A be a random vanable and suppose that, conditional on A A, the random variables Xl,' " Xn are indepen- dent with a common distribution FA If FA(x) is decreasing in A for all x then XI,' ., XII are associated This result can be verified by an argument similar to the one used in Proposition 9.7.2. Namely, let VI, .. , Vn be independent uniform (0, 1) random variables that are independent of A. Then define Xl, . ,X n recursively by Xl = FAI (V,). i 1, ... ,n As the X, are increasing functions of the independent random variables A, VI,. ., Vn they are asso- ciated. PROBLEMS 9.1. If X Y, prove that X+ Y+ and Y- :>st X- 9.2. Suppose X, Y j , i = 1, 2 Show by counterexample that it is not necessanly true that Xl + X 2 Y1 + Y2• 9.3. (a) If X Y, show that P{X:> Y} 2!: t Assume independence (b) If P {X;:::: Y} 2!: !, P{Y;:::: Z} ;::::!, and X, Y, Z are independent. does this imply that P{X ;:::: Z} 2!: !? Prove or give a counterexample 9.4. One of n elements will be requested-it will be i with probability Pi. i = 1, . ,n If the elements are to be arranged in an ordered list, 450 STOCHASTIC ORDER RELATIONS find the arrangement that stochastically minimizes the position of the element requested. 9.S. A random variable is said to have a gamma distribution if its density is given, for some A > 0, a > 0, by where rea) = fro e-.1X"-1 dx. , 0 Show that this is IFR when a :> 1 and DFR when a $ 1 9.6. If X,. i = 1, ... , n, are independent IFR random variables, show that min(Xl' " , Xn) is also IFR. Give a counterexample to show that max (Xl , ... , Xn) need not be. 9.7. The following theorem concerning IFR distributions can be proven. Theorem: If F and G are IFR, then so if F * G, the convolution of F and G Show that the above theorem is not true when DFR replaces IFR 9.8. A random variable taking on nonnegative integer values is said to be discrete IFR if P{X = klX:> k} is nondecreasing in k, k = 0, 1, .. Show that (a) binomial random variables, (b) Poisson random variables, and (c) negative binomial random variables are all discrete IFR random variables. (Hint The proof is made easier by using the preservation of IFR distribution both under convolution (the theorem stated in Problem 9.7), which remains true for discrete IFR distributions, and under limits) 9.9. Show that a binomial n, p distnbution Bn,p increases stochastically both as n increases and as p increases. That is, Bn,p increases in n and in p. 9.10. Consider a Markov chain with states 0, 1, ... , n and with transition probabilities { C, P,,= 1 - c, if} = i - 1 ifj = i + k(i), (i = 1, .. , n - 1; Po,o = Pn,n = 1), PROBLEMS 451 where k(i) ~ O. Let J, denote the probability that this Markov chain ever enters state 0 given that it starts in state i Show that J, is an increasing function of (el' .. , en-I)' (Hint· Consider two such chains, one having (e l , . , en-I) and the other (el,' " Cn-I), where C, >- e, Sup- pose both start in state i Couple the processes so that the next state of the first is no less than that of the second. Then let the first chain run (keeping the second one fixed) until it is either in the same state as the second one or in state n. If it is in the same state, start the procedure over.) 9.11. Prove that a normal distnbution with mean J.L and variance (J2 increases stochastically as J.L increases. 
What about as (J2 increases? 9.12. Consider a Markov chain with transition probability matrix P" and sup- pose that L,:k P" increases in i for all k. (a) Show that, for all increasing functions f, L, p,JU) increases in i. (b) Show that L;k P ~ increases in i for all k, where P ~ , are the n-step transition probabilities, n >- 2. 9.13. Let {N,(r), r 2 O}, i = 1,2, denote two renewal processes with respective interarrival distributions Fl and F 2 • Let A,(r) denote the hazard rate function of F" i = 1, 2. If for all s, r, show that for any sets T l , ••• , Tn, 9.14. Consider a conditional Poisson process (see Section 2.6 of Chapter 2). That is, let A be a nonnegative random variable having distribution G and let {N(r), r ~ O} be a counting process that, given that A = A, is a Poisson process with rate A. Let G"n denote the conditional distribution of A given that N(r) = n. (a) Derive an expression for G 1 n (b) Does G"n increase stochastically in n and decrease stochastically in r? (c) Let Y1,n denote the time from r until the next event, given that N(r) = n Does Y1,n increase stochastically in r and decrease stochas- tically in n? 9.15. For random variables X and Y and any set of values A, prove the coupling bound I P{X E A} - P{Y E A}I <: P{X ¥- Y} 452 STOCHASTIC ORDER RELATIONS 9.16. Verify the inequality (924) in Example 92 (E). 9.17. Let R I , • ,Rm be a random permutation of the numbers 1,. , m in the sense that for any permutation iI, , i m , P{R, = if, for j = 1, . m} = 11m' (a) In the coupon collectors problem described in Example 9 2(E), argue that E[N] = Lj:1 j-I E[1/ P R IR, is the last of types R I ,. ., R, to be collected], where N is the n ~ m b e r of coupons needed to have at least one of each type (b) Argue that the conditional distribution of P R given that R, is the last of types R I , •• ,R, to be obtained, is stochastically smaller than the unconditional distribution of P R • ! (c) Prove that 9.18. Let XI and X 2 have respective hazard rate functions AI(t) and A2(t). Show that AI(t) :> A 2 (t) for all t if, and only if, P{XI > t} P{X2> t} ------::.. -< ------::.. P{XI > s} - P{X2 > s} for all s < t. 9.19. Let X and Y have respective hazard rate functions Ax(t) ::s; Ay(t) for all t Define a random variable X as follows: If Y = t, then _ {t X- t+ XI with probability Ax(t)/ Ay(t) with probability 1- Ax(t)/Ay(t), where XI is a random variable, independent of all else, with distribution P{X > } = P{X> t + s} I S P{X> t} Show that X has the same distribution as X 9.20. Let F and G have hazard rate functions AF and AG Show that AF(t) ~ Adt) for all t if, and only if, there exist independent continuous random variables Y and Z such that Y has distribution G and min(Y, Z) has distribution F PROBLEMS 453 9.21. A family of random variables {XII, 0 E [a, b J} is said to be a monotone likelihood ratio family if XII :5 LR XII when 0 1 :5 O 2 • Show that the follow- I 2 ing families have monotone likelihood ratio· (a) XII is binomial with parameters n, 0, n fixed. (b) XII is Poisson with mean 0 (c) XII is uniform (0, 0) (d) XII is gamma with parameters (n, 11 0), n fixed (e) XII is gamma with parameters (0, ...\), ...\ fixed 9.22. Consider the statistical inference problem where a random variable X is known to have density either lor g The Bayesian approach is to postulate a prior probability p that I is the true density The hypothesis that I were the true density would then be accepted if the postenor probability given the value of X is greater than some critical number. 
Show that if/(x)/g(x) is non decreasing inx, this is equivalent to accepting I whenever the observed value of X is greater than some critical value. 9.23. We have n jobs with the ith requiring a random time XI to process. The jobs must be processed sequentially and the objective is to stochastically maximize the number of jobs that are processed by a (fixed) time t (a) Determine the optimal strategy if x. < XI+ I ' i = 1, , n - l. LR (b) What if we only assumed that X I :5 X I + I , i = 1, . ,n - 1? st 9.24. A stockpile consists of n items Associated with the ith item is a random variable XI' i = 1, , n. If the ith item is put into the field at time t, its field life is X,e- a ,. If X, X,+I, i = 1, . , n - 1, what ordering stochastically maximizes the total field life of all items? Note that if n = 2 and the ordering 1, 2 is used, then the total field life is XI + X 2 e- aXl . 9.25. Show that the random variables, having the following densities, have increasing likelihood ratio; that is, the log of the densities are concave. (a) Gamma· I(x) = ...\e-Al(...\x)a-I/f(a), a 1 (b) Weibull· fix) = a...\(...\x)a-le-(Al)Q, a ;;::: 1 (c) Normal truncated to be positive, 1 2 2 I(x) = f2rr, aV'2iia x>o. 454 STOCHASTIC ORDER RELATIONS 9.26. Suppose XI, ' , , , Xn are independent and Y I , .. , Y n are independent. If )(, ~ v Y" i = 1, ... , n, prove that Give a counterexample when independence is not assumed. 9.27. Show that F is the distribution of an NBUE random variable if, and only if, for all a, where Fe, the equilibrium distribution of F, is defined by with p.. = f; F(x) dx 9.28. Prove or give a counterexample to the following: If F :::> v G, then NF(t) :::>.N G(t), where {Nu(t) t ~ O} is the renewal process with interarrival distribution H, H = F, G. 9.29. Let Xl, ... , Xn be independent with P{X i = I} = P, = 1 P{)(, = OJ, i = 1, ... , n. Show that n LX,::s Bin(n,p), ,:1 • where Bin(n, p) is a binomial random variable with parameters nand p= L;"I P,In. (Hint' Prove it first for n = 2.) Use the above to show that L;=l X, <. Y, where Y is Poisson with mean np. Hence, for instance, a Poisson random variable is more variable than a binomial having the same mean. 9.30. Suppose P{X = OJ = 1 M12, P{X = 2} = MI2 where 0 < M < 2. If Y is a nonnegative integer-valued random variable such that P{Y = I} =: 0 and E[Y] = M, then show that X <. Y. 9.31. Jensen's inequality states that for a convex function f, E[f(X)] ~ f(E[X]). (8) Prove Jensen's inequality PROBLEMS 455 (b) If X has mean E[X], show that X::> E[X], v where E[X] is the constant random variable. (c) Suppose that there exists a random variable Z such that E[ZiY] ::> o and such that X has the same distribution as Y + Z. Show that X Y. In fact, it can be shown (though the other direction is difficult to prove) that this is a necessary and sufficient condition for X Y. 9.32. If E[X] 0, show that eX ::>y X when e ::> 1 9.33. Suppose that P{O s X s I} = 1, and let (J = E[X]. Show that X Y where Y is a Bernoulli random variable with parameter (J. (That is, P{Y I} (J = 1 - p{Y = O}.) 9.34. Show that E[YIX] Sy Y. Use this to show that: (a) if X and Yare independent then XE[Y] Sy XY. (b) if X and Yare independent and E[Y] = 0 then X X + Y. (c) if X" i ::> 1, are independent and identically distnbuted then 1 n 1 n+1 - L X ' ~ - L X , n ,.::1 v n + 1 ,=1 (d) X + E[X] Sy 2X. 9.35. Show that if XI, .. ,X n are all decreasing functions of a specified set of associated random variables then they are associated. 9.36. 
In the model of Example 9.7(A), one can always find a family of compo- nent subsets M I, ... , M m such that the system will work if, and only if, all of the components of at least one of these subsets work (If none of the MJ is a proper subset of another then they are called minimal path sets.) Show that p{S = I} < 1 - If (1 9.37. Show that an infinite sequence of exchangeable Bernoulli random vari- ables is associated 456 STOCHASTIC ORDER RELATIONS REFERENCES Reference 8 is an important reference for all the results of this chapter In addition, on the topic of stochastically larger, the reader should consult Reference 4 Reference 1 should be seen for additional material on IFR and DFR random vanables Coupling is a valuable and useful technique, which goes back to Reference 5, indeed, the approach used in Section 922 is from this reference The approach leading up to and establishing Proposition 9 32 and Corollary 9 3 3 is taken from References 2 and 3 Reference 4 presents some useful applications of variability orderings R E Barlow, and F Proschan, Statistical Theory of Reliability and Life Testing, Holt, New York, 1975 2. M. Brown, "Bounds, Inequalities and Monotonicity Properties for Some Specialized Renewal Processes," Annals of Probability, 8 (1980), pp. 227-240 3 M. Brown, "Further Monotonicity Properties for Specialized Renewal Processes," Annals of Probability (1981) 4 S. L Brumelle, and R G Vickson, "A Unified Approach to Stochastic Dominance," in Stochastic Optimization Models in Finance, edited by W Ziemba and R Vickson, Academic Press, New York, 1975 5 W Doeblin, "Sur Deux Problemes de M Kolmogoroff Concernant Les Chaines Denombrables," Bulletin Societe Mathematique de France, 66 (1938), pp 210-220 6 T Lindvall, Lectures on the Coupling Method, Wiley, New York, 1992 7. A Marshall and [ Olkin, Inequalities Theory of Majorization and Its Applications, Academic Press, Orlando, FL, 1979 8 J G Shanthikumar and M Shaked, Stochastic Orders and Their Applications, Academic Press, San Diego. CA, 1994 C HAP T E R 10 Poisson Approximations INTRODUCTION Let XI, . , Xn be Bernoulli random variables with P{X, = 1} = A, = 1 - P{X, = O}, i = 1, ., n n and let W = 2: X, The" Poisson paradigm" states that if the A, are all small I ~ I and if the X, are either independent or at most "weakly" dependent, then W n will have a distribution that is approximately Poisson with mean A = 2: A" 1=1 and so In Section 10.1 we present an approach, based on a result known as Brun's sieve, for establishing the validity of the preceding approximation. In Section 10.2 we give the Stein-Chen method for bounding the error of the Poisson approximation. In Section 10 3, we consider another approximation for P{W = k}, that is often an improvement on the one specified above. 10.1 BRUNts SIEVE The validity of the Poisson approximation can often be established upon applying the following result, known as Brun's sieve PROPOSITION 10.1.1 Let W be a bounded nonnegative integer-valued random variable If, for all i 2:: 0, E [ ( ~ ) ] =A'/i' 457 458 POISSON APPROXIMATIONS then Proof Let I" J 2: 0, be defined by 1 = {I J 0 ifW j otherwise Then, with the under.;tanding that G) is eq ual to 0 for r < 0 or k > r, we have that ( W) W-J (W -j) = . L (-I)k J k ~ O k "" (W)(W-j) = 2: (_1)k k-O J k = f ( . W ) (j + k) (_I)k k-O J + k k Taking expectations gives, P{W = j} = f E [( . 
W )] (j + k) ( -1 )k k-O J + k k "" AI+k (j + k) """ ~ (j + k )1 k (_1)k AI'" =-2: (-A)klk l Jf k.O EXAMPLE 10.1(A) Suppose that W is a binomial random variable with parameters nand p, where n is large and p is small. Let A = np and interpret W as the number of successes in n independent trials where each is a success with probability p; and so ( ~ ) BRUN'S SIEVE represents the number of sets of i successful trials. Now, for each of the (:) sets of; trials, define an indicator variable X, equal to 1 if all these trials result in successes and equal to 0 otherwise. Then, and thus _ n(n - 1)·· (n - i + l)pl i' Hence, if i is small in relation to n, then lfi is not small in relation to n, then E [ (7) ] = 0 = A'I;!, and so we obtain the classical Poisson approximation to the binomial. EXAMPLE 10.1(8) Let us reconsider the matching problem, which was considered in Example 1 3(A) In this problem, n individuals mix their hats and then each randomly chooses one Let W denote the number of individuals that choose his or her own hat and note that (7) represents the number of sets of ; individuals that all select their own hats For each of the (:) sets of ; individuals define an indicator variable that is equal to 1 if all the members of this set select their own hats and equal to 0 otherwise Then if X" j ~ 1, , t) is the indicator for the jth set of ; individuals we have 459 460 POiSSON APPROXIMATIONS and so, for i <:: n, [( W)] (n) 1 E i = i -n(n --:--., '-(n -i + 1) = IIi! Thus, from Brun's sieve, the number of matches approximately has a Poisson distribution with mean 1. EXAMPLE :10.1(c) Consider independent trials in which each tnal is equally likely to result in any of r possible outcomes, where r is large. Let X denote the number of trials needed until at least one of the outcomes has occurred k times. We will approximate the probability that X is larger than n by making use of the Poisson approximation. Letting W denote the number of outcomes that have occurred at least k times in the first n trials, then P{X> n} P{W = O}. Assuming that it is unlikely that any specified outcome would have occurred at least k times in the first n tnals, we will now argue that the distnbution of W is approximately Poisson. To begin, note that ( ~ ) is equal to the number of sets of i outcomes that have all occurred k or more times in the first n trials Upon defining an indicator vanable for each of the (;) sets of i outcomes we see that where p is the probability that, in the first n tnals, each of i specified outcomes has occurred k or more times. Now, each trial will result in anyone of i specified outcomes with probability ilr, and so if ilr is small, it follows from the Poisson approximation to the bino- mial that Y, the number of the first n trials that result in any of i specified outcomes, is approximately Poisson distributed with mean nil r. Consider those Y trials that result in one of the i specified outcomes, and categorize each by specifying which of the i out- comes resulted. If ~ is the number that resulted in the jth of the i specified outcomes, then by Example 1.5(H) it follows that the Y" j = 1, ... , i, are approximately independent Poisson random variables, each with mean ~ (nilr) = nlr. Therefore, from the l BRUN'S SIEVE independence of the V " ~ we see that p = (an)', where t-l an = 1 - 2: e-nlr(nl,)'lj! 
FO Thus, when iI, is small """ (ran)'m In addition, since we have supposed that the probability that any specified outcome occurs k or more times in the n trials is small it follows that an (the Poisson approximation to this binomial proba- bility) is also small and so when ii, is not small, E [ C) ] = 0 = (ra.)'f;! Hence, in all cases we see that E r (;) ] = (ra.)'f;1 Therefore, W is approximately Poisson with mean ,an, and so P{X> n} :::: P{W = O} = exp{-ra n } As an application of the preceding consider the birthday problem in which one is interested in determining the number of individuals needed until at least three have the same birthday. This is in fact the preceding with, = 365, k = 3 Thus, P{X> n} """ exp{-365a n } where an is the probability that a Poisson random variable with mean nl365 is at least 3 For instance, with n = 88, ass = 001952 and so P{X> 88} = exp{-365ass} """ 490. That is, with 88 people in a room there is approximately a 51 percent chance that at least 3 have the same birthday 461 462 POISSON APPROXIMATIONS 10.2 THE STEIN-CHEN METHOD FOR BOUNDING THE ERROR OF THE POISSON ApPROXIMATION n As in the previous section, let W = 2: x,. where the XI are Bernoulli random 1=1 n variables with respective means Ao i = 1, ... , n. Set A = 2: AI and let A I=l denote a set of nonnegative integers. In this section we present an approach, known as the Stein-Chen method, for bounding the error when approximating P{W E A} by 2: e-A,AI/i!. lEA To begin, for fixed A and A, we define a function g for which E[Ag(W + 1) - Wg(W)] = P{W E A} - 2: e-A,Alli! lEA This is done by recursively defining g as follows: g(O) = 0 and for j ~ 0, g(j + 1) = 1. [IU E A} - 2: e-A,Alli' + jg(j)] A lEA where I{j E A} is 1 if j E A and 0 otherwise. Hence, for j ~ 0, Ag(j + 1) - jg(j) = IU E A} - 2: e-AAlli! lEA and so, Ag(W + 1) - Wg(W) = I{W E A} - 2: e-A,Alli! lEA Taking expectations gives that (10.2 1) E[Ag(W + 1) - Wg(W)] = P{W E A} - 2: e-A,Alli' lEA The following property of g will be needed. We omit its proof. Lemma 10.2.1 For any A and A Ig(j) - g(j - 1)1 ~ 1 - e- A ~ min(l, II A) A THE STEIN-CHEN METHOD FOR BOUNDING THE ERROR 463 Since, for i < j g(j) - g(i) = g(j) - g(j - 1) + g(j 1) g(j - 2) + + g(i + 1) g(i), we obtain from Lemma 10 21 and the triangle inequality that, (1022) Ig(j) - g(i)1 s Ii iI min(l, 1IA) To continue our analysis we will need the following lemma Lemma 10.2.2 For any random van able R n E[WR] = L A,E[RIX, = 1] ,=1 Proof n L E[RX,] ,-I n = L E[RX,IX, = 1]A, .-1 " = L A,E[RIX, = 1] / ~ I If, in Lemma 10 2 2, we let R = g( W) then we obtain that n E[Wg(W)] = L A,E[g(W)IX, = 11 " (1023) L A.E[g(V. + 1)], .=1 where V, is any random variable whose distribution is the same as the conditional distribution of L X, given that X, = 1 That is, V, is any random variable such that 1#/ P{V, k} P{LX,=kIX/=l} fj., 464 POISSON APPROXIMATIONS n Since E[Ag(W + 1)] L A,E[g(W + 1)] we obtain from Equations (10 2 1) and ,-I (10 2 3) that, (10 24) I P{W E A} - e ANli11 W + 1)] - E[g(V, + 1)])/ = A,E[g(W + 1) - g(VI + 1)]/ n :::; L AtE[lg(W + 1) - g(V t + 1)1] n .$; L At min(l, l/A)E[IW - V,I] 1=1 w here the final ineq uality used Equation (10 2.2) Therefore, we have proven the fol- lowing THEOREM 10.2.3 Let V, be any random variable that is distributed as the conditional distribution of L given that X, = I, i = 1, ,n That is. 
for each i, V, is such that J*' P{V, = k} = P{L = klxt = I}, for all k Then for any set A of nonnegative integers 1 1 P{W E A} - L e-AA'/i11 :::; min(I, I/A):i A,E[lW - V,I] lEA ,=1 Remark The proof of Theorem 10.2 3 strengthens our intuition as to the validity of the Poisson paradigm From Equation (10.2.4) we see that the Poisson approximation will be precise if Wand VI have approximately the same distribution for all i That is, if the conditional distribution of 2: X; given rl-I that X, = 1 is approximately the distribution of W, for all i But this will be the case if THE STEIN CHEN METHOD FOR BOUNDING THE ERROR 465 d where:::::: means "approximately has the same distribution." That is, the depen- dence between the X, must be weak enough so that the information that a specified one of them is equal to 1 does not much affect the distribution of the sum of the others, and also Al = P{X, = I} must be small for all i so that L x, and L Xi approxnnately have the same distribution. Hence, the Poisson 1*' I approximation will be precise provided the A, are small and the X; are only weakly dependent. ExAMPLE 10.2(A) If the X, are independent Bernoulli random vari- ables then we can let V, = L x,. Therefore, 1*' and so, L e-AA'li! I ~ min(1, 11 A) ± A;. lEA ,-I Since the bound on the Poisson approximation given in Theorem 10.2.3 is in terms of any random variables VI' i = 1, .. ,n, having the distributions specified, we can attempt to couple V, with W so as to simplify the computation of £[1 W - VII] For instance, in many cases where the X, are negatively correlated, and so the information that X; = 1 makes it less likely for other X, to equal 1, there is a coupling that results in VI being less than or equal to W. If Wand VI are so coupled in this manner, it follows that ExAMPLE 10.2(8) Suppose that m balls are placed among n urns, with each ball independently going into urn i with probability PI' i = 1,. ,n. Let XI be the indicator for the event that urn i is n empty and let W = L x, denote the number of empty urns. If m ,=1 is large enough so that each of the A, = P{X, = I} = (1 - p,)m is small, then we might expect that W should approximately have n a Poisson distribution with mean A = LA,. 1=1 As the XI have a negative dependence, since urn i being empty makes it less likely that other urns will also be empty, we might 466 POISSON APPROXIMATIONS suppose that we can construct a coupling that results in W V r • To show that this is indeed possible, imagine that the m balls are distributed according to the specified probabilities and let W denote the number of empty urns. Now take each of the balls that are in urn i and redistribute them, independently, among the other urns according to the probabilities p/(l - PI}' j =I=- i. If we now let VI denote the number of urns j, j =I=- i, that are empty, then it is easy to see that VI has the appropriate distribution specified in Theorem 10 2.3. Since W >- V, it follows that £[lW - V,I] = £[W - VI] = £[W] - £[V r ] A- 2: 1*1 1 - PI Thus, for any set A, lEA :S; min(1, 11 A) A, [ A - ( 1 P, )m] 1 P, Added insight about when the error of the Poisson approximation is small is obtained when there is a way of coupling V, and W so that W VI' For in this case we have that 2: A,£[lW - V,I] = 2: AI£[W] 2: AI£[V I ] I I I = A2 - 2: AI(£[l + V,] I) = A2 + A AI £ [1 + X,I 1] = A2 + A 2: = 1] = A2 + A £[W2] = A - Var(W}, where the next to last equality follows by setting R = W in Lemma 10 2 2 Thus, we have shown the following. 
IMPROVING THE POISSON APPROXIMATION 467 PROPOSITION 10.2.4 If, for each i. i = 1, , n. there is a way of coupling V, and W so that W 2: V" then for any set A I P{W E A} - ~ e-A,\'/i! I :s min(1, 1/ A) [A - Var(W)] Thus, loosely speaking, in cases where the occurrence of one of the X, makes it less likely for the other X, to occur, the Poisson approximation will be quite precise provided Var(W) :::::: E[W]. 10.3 IMPROVING THE POISSON ApPROXIMATION Again, let XI,. ., Xn be Bernoulli random variables with A, = P{X, = I}. In this section we present another approach to approximating the probability n mass function of W = LX,. It is based on the following proposition 1=1 PROPOSITION 10.3.1 With V, distnbuted as the conditional distribution of 2: X, given that X, = 1, ,,,, n (a) P{W> O} = 2: A,E[l/(1 + V,)] ,=1 (b) P{W = k} = i ~ A,P{V, = k - 1}, k 2: 1. Proof Both parts follow from Lemma 10 2 2 To prove (a), let Then from Lemma 10 2 2, we obtain that ifW>O ifW=O n n P{W> O} = 2: A,E[l/wlx, = 1] = 2: A,E[I/(1 + V,)] ' ~ I ,=1 468 POISSON APPROXIMATIONS which proves (a) To prove (b), let ifW= k otherwise, and apply Lemma 10.2.2 to obtain n 1 1 n P{W == k} = 2: ,\(- P{W = klx, = 1] = - 2: '\,P{V( = k - 1} k k Let at = E[VJ = 2: E[X,IX, = 1]. Now, if the a, are all small and if, conditional on X, = 1, the remaining j *' i, are only "weakly" dependent then the conditional distribution of VI will approximately be Poisson with mean a,. Assuming this is so, then and P{V( = k - 1}::::: exp{-a.}a:-I/(k -1)!, k:=: 1 E[l/(l + V,)] ::::: 2: (1 + j)-I exp{ -a,}a;lj! ) =.! exp{ -aJ 2: a:+I/U + I)! a , I = exp{ -a,}(exp{a,} - l)la, = (1 - exp{-a,})la, Using the preceding in conjunction with Proposition 10.3.1 leads to the follow- ing approximations' n P{W> O}::::: 2: A,(l - exp{-a,})la" (10.3.1) P{W 1 n k}::::: -k 2: A, exp{ - I)!, I-I If the X, are "negatively" dependent then given that X, = 1 it becomes less likely for the other X j to equal 1. Hence, Vj is the sum of smaller mean IMPROVING THE POISSON APPROXIMATION 469 Bernoullis than is Wand so we might expect that its distribution is even closer to Poisson than is that of W. In fact, it has been shown in Reference [4] that the preceding approximation tends to be more precise than the ordinary Poisson approximation when the X, are negatively dependent, provided that A < 1 The following examples provide some numerical evidence for this statement EXAMPLE 10.3(A) (Bernoulli Convolutions). Suppose that X" i = 1, .. , 10, are independent Bernoulli random variables with E[X,] = ;11000. The following table compares the new approxima- tion given in this section with the usual Poisson approximation. k P{W = k} New Approx. Usual Approx 0 946302 946299 946485 1 .052413 052422 052056 2 001266 001257 001431 3 000018 .000020 000026 EXAMPLE 10.3(8) Example 10 2(B) was concerned with the number of multinomial outcomes that never occur when each of m indepen- dent tnals results in any of n possible outcomes with respective probabilities PI, .. ,pn Let X, equal 1 if none of the trials result in outcome i, i = 1, .. , n, and let it be 0 otherwise. Since the nonoccurrence of outcome i will make it even less likely for other outcomes not to occur there is a negative dependence between the X,. 
Thus we might expect that the approximations given by Equa- tion (10.3.1), where ( pj)m 1-- rt-, jt-, 1 - P, are more precise than the straight Poisson approximations when A < 1 The following table compares the two approximations when n = 4, m = 20, and p, = il10 W is the number of outcomes that never occur. k o 1 2 P{W = k} 86690 13227 .00084 New Approx 86689 13230 .00080 Usual Approx 87464 11715 00785 EXAMPLE 10.3(c) Again consider m independent trials, each of which results in any of n possible outcomes with respective proba- bilities PI, ... ,Pn, but now let W denote the number of outcomes 470 POISSON APPROXIMATIONS that occur at least r times. In cases where it is unlikely that any specified outcome occurs at least r times, W should be roughly Poisson. In addition, as the indicators of Ware negatively depen- dent (since outcome i occurring at least r times makes it less likely that other outcomes have also occurred that often) the new approxi- mation should be quite precise when A < 1. The following table compares the two approximations when n = 6, m = 10, and p, = 1/6. W is the number of outcomes that occur at least five times k o 1 2 P{W = k} 9072910 0926468 0000625 New Approx. 9072911 0926469 0000624 Usual Approx 91140 .08455 00392 Proposition lO 3 1(a) can also be used to obtain a lower bound for P{W > O}. Corollary 10.3.2 P{W > O} 2: f "-,/(1 + a,) .=L Proof Since f(x) = l/x is a convex function for x 2: 0 the result follows from Proposition 10 31 (a) upon applying Jensen's inequality The bound given in Corollary lO.32 can be quite precise For instance, in Example lO.3(A) it provides the upper bound P{W = O} !5 .947751 when the exact probability is 946302 In Examples lO 3(8) and lO.3(C) it provides upper bounds, P{W = O} < 86767 and p{W = 0}!5 90735, that are less than the straight Poisson approximations PROBLEMS 10.1. A set of n components, each of which independently fails with probability p, are in a linear arrangement. The system is said to fail if k consecutive components fail (a) For i < n + 1 - k, let Y, be the event that components i, i + 1,. ,i + k - 1 all fail. Would you expect the distribution of 2: Y, to be approximately Poisson? Explain. PROBLEMS 471 (b) Define indicators X" i = 1,. ., n + 1 - k, such that for all i ¥- j the events {X, = I} and {X, = I} are either mutually exclusive or independent, and for which P(system failure) = P {X + ~ X, > o}, where X is the indicator for the event that all components are failed. (c) Approximate P(system failure). 10.2. Suppose that m balls are randomly chosen from a set of N, of which n are red. Let W denote the number of red balls selected. If nand m are both small in companson to N, use Brun's sieve to argue that the distribution of W is approximately Poisson. 10.3. Suppose that the components in Problem 10.1 are arranged in a circle. Approximate P(system failure) and use Brun's sieve to justify this ap- proximation. 10.4. If X is a Poisson random variable with mean A, show that E[Xh(X)] = AE[h(X + 1)] provided these expectations exist. 10.5. A group of2n individuals, consisting of n couples, are randomly arranged at a round table. Let W denote the number of couples seated next to each other. (a) Approximate P{W = k} (b) Bound this approximation 10.6. Suppose that N people toss their hats in a circle and then randomly make selections Let W denote the number of the first n people that select their own hat (a) Approximate P{W = o} by the usual Poisson approximation. 
(b) Approximate P{W = o} by using the approximation of Section 10.3. (c) Determine the bound on P{W = o} that is provided by Corollary 10.3 2. (d) For N = 20, n = 3, compute the exact probability and compare with the preceding bound and approximations. 472 POISSON APPROXIMATIONS 10.7. Given a set of n vertices suppose that between each of the ( ~ ) pairs of vertices there is (independently) an edge with probability p. Call the sets of vertices and edges a graph. Say that any set of three vertices constitutes a tnangle if the graph contains all of the G) edges connecting these vertices. Let W denote the number of triangles in the graph. Assuming that p is small, approximate P{W = k} by (a) using the Poisson approximation. (b) using the approximation in Section 10.3. 10.8. Use the approximation of Section 10 3 to approximate P( system failure) in Problem 10.3. Also determine the bou'nd provided by Corollary 10.3.2. 10.9. Suppose that W is the sum of independent Bernoulli random variables. Bound the difference between P{W = k} and its approximation as given in Equation (10.3.1). Compare this bound with the one obtained in Example 1O.2(A) for the usual Poisson approximation REFERENCES For additional results on the Poisson paradigm and Brun's sieve, including a proof of the limit theorem version of Proposition 10 1 1, the reader should see Reference 2 For learning more about the Stein-Chen method, we recommend Reference 3. The approximation method of Section 10.3 is taken from Reference 4 Many nice applica- tions regarding Poisson approximations can be found in Reference 1 1 D. Aldous, Probability Approximations via the Poisson Clumping Heuristic, Spnnger-Verlag, New York, 1989. 2 N Alon, J Spencer, and P Erdos, The Probabilistic Method, Wiley, New York, 1992. 3 A D. Barbour, L Holst, and S Janson, Poisson Approximations, Clarendon Press, Oxford, England, 1992 4 E Pekoz and S. M Ross, "Improving Poisson Approximations," Probability in the Engineering and Informational Sciences, Vol 8, No 4, December, 1994, pp. 449-462 Answers and Solutions to Selected Problems CHAPTER 1 loS. (0) P N N (n l , •• I 1-1 n1 r , nr-a = -r- IT P ~ " IT 1 ' ~ I n, ,=1 r where n, = 0, , nand L n, = n (b) E[N,] = nP" E[N?] = nP, - nP: + n 2 p;, E[N)] nP p E[Ni] nP, - nPi + n 2 p;, E[N,N J ] = E[E[N,NJINlll, E[N,NJINI = m] = mE[N,INJ = m] = m(n - m) 1 p, nmP,-m 2 p, = 1 - PI E[N N] = nE[NI]P, - E[N7]P, , J t P J n 2 p)p, - nP'P J + nP;P, - n 2 p;p, 1- P J n 2 P J P,(1 - PJ) - nP,P,(1 - P) 2 P n P'P J - nP, I' 1 P J Cov(N" N) = E[N,N J ] E[N,]E[N J ] = -nP,PI' i # j { 1 if outcome j never occurs (c) Let I) = 0 otherwise E[I)] = (1 - P,)", Var[/ J ] = (1 - PJ)n(t - (t - Pin, E[IJ I ] (1 - P, - PlY' i # j, 473 474 ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS , Number of outcomes that do not occur = 2: II' E = (1 - PI)\ Var [i IJ] = i Var[II] + 2:2: Cov[l,. II]' 1 ''''' Cov[I,Ill == E[l,l,] E[I,]E[I;] (1 - P, - p;)n - (1 - P,Y(1 - Pit. Var [i II] = i (1 - P;t(1 - (1 Pin 1.6. (a) Leq = + 2:2: [(1 - P, - PlY (1 - p,)n(1 - PlY] ''''I ifthere is a record at time j otherwise Var[Nn ] = i Var[I,] i (1 I since the II are independent. 1=1 J=I J J (b) Let T = min{n n > 1 and a record occurs at n}. T> n ¢:> XI = largest of XI, X 2 , , Xn, 1 E[T] = 2: P{T > n} = 2: - = (Xl, n"l n P{T = (Xl} lim P{T > n} 0 n-'" (c) Let Ty denote the time of the first record value greater than y Let XTy be the record value at time P{XTy > xl Ty = n} P{Xn > xlx l < y, X2 < y, . ,X,,_I < y. 
Xn > y} = P{Xn > xlXn > y} { 1 x<y -= F(x)/F(y) x> y Since P{XTy > xl Ty = n} does not depend on n, we conclude Ty is independent of XTy 1.10. Suppose that the outcome of each contest is independent, with either contes- tant having probability 112 of winning For any set S of k contestants let A(S) ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 475 be the event that no member of S< beats every member of S Then P { Y A(S)) < f PlAtS)) = (:) [1 - where the eq ual;ty follows s; nee there are (:) sets of s;ze k and PIA (S)} ; s the probability that each of the n - k contestants not in S lost at least one of h;s or her k matches w;th players ;n S Hence.;f (:) [1 - < 1 then there is a positive probability that none of the events A(S) occurs 1.14. (a) Let denote the number of 1 's that occur between the (j - 1 )st and jth even number, j = 1, , 10 By conditioning on whether the first appear- ance of either a 1 or an even number is 1, we obtain E[Y J ] = before 1]3/4 + E[YJI1 before even] 114 = (1 + E [ Y J ]) 1/4, implying that As, it follows that E[Xd = 10/3 (b) Let W J denote the number of 2's that occur between the (j - 1 )st and jth even number, j = 1, , 10 Then, = E[WJleven number is 2]1/3 + E[WJleven number is not 2]2/3 = 1/3 and so, (c) Since each outcome that is either 1 or even is even with probability 3/4, it follows that 1 + is a geometric random variable with mean 4/3 476 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS iO Hence, 2: (1 + VI) 10 + XI is a negative binomial random vanable r l with parameters n = 10 and p = 314 Therefore, ( 9+ ") P{X 1 i} P{Neg Bin(1O, 314) = 10 + i} = 9 t (314)1°(114)' (d) Since there are 10 outcomes that are even, and each is independently 2 with probability 113, it follows that X 2 is a binomial random variable with parameters n = ] 0, P == 1 13 That is, PIX i} en (1/3)'(2/3)'0-· 1.17. (8) F.n(x) = P{ithsmallest xlx,,=sx}F(x) + P{ith smallest s xix" > x}F(x) = P{K.-In-I =sx}F(x) + P{K."-I sx}F(x) (b) Fwl(x) = P{K.n-1 =sxlX n is among i smallest} iJn + P{K."-I :s xiX" is not among i smallest}(1 - iJn) P{K.+1" =s xix" is among i smallest}iln + P{K.,,:sxIXnisnotamongismallest}(1 - iJn) = P{K.+I " =s x}iI n + P{K." :s x}( 1 - iJn) where the final equality follows since whether or not X" is among the i smallest of X it , Xn does not change the joint distribution of X,n , i = ], , n 1.21. P{N = O} = P{UI < e- A } = e- A Suppose that P{N = n} = P{UI ~ e-A, UIU2;;:: e-A, • Ul = e- A A n ln' Hence, = II e-(A + 10gl) (A + log x)" dx e-· n" ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 477 where the last equality follows from the induction hypothesis since e-)./x = e-(H)ogx) From the above we thus have P{N= n + l} 1 II e ).(A + log x)" dx n' eo' x e-). I). = y"dy n' 0 (byy =: A + log x) = (n+l)I' which completes the induction 1.22. Var(XI Y) = E[(X E(XI y»21 Y] = E[X 2 - 2XE(XI Y) + (E(XI Y)YI Y] = E[X2IY] 2[E(XIY)]l + [E(XIY)f = E[X 2 IY] (E[XIY]f. Var(X) = E[X2] (E[X]Y = E(E[X21 V]) (E[E[XI Y]W = E[Var(XIY) + (E[XIY]Y] - (E[E[XIY]])2 = E[Var(XI V)] + E[(E[XIYW] - (E[E[XI y]])2 = E[Var(XI V)] + Var(E[XI V]) } PiX) < X 2t min(X), Xl) t} 1.34. P{XJ < Xllmin(XJ, Xz) t { } P min(X I • X 2 ) t P{X z = t} = A Z (t)P{X 2 > t}, P{X I =: t} P{X I > t} = Al(t) • P{XJ = t,X z > t} P{X 1 = t, Xz > t} + P{X 2 = I, Xl> I} P{X) = t}P{X z > t} = ~ - - ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ P{X) = t}P{X 2 > t} + P{X2 t}P{X) > t}' A2(t) P{X2 = t}P{XI > I} = AI(t) PiX) = t}P{Xz > I} Hence P{X1 < Xzlmin(XJ' Xz) = t} = ~ (I) 1 +_2_ A,(t) AI(t) 478 ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS 1.37. 
Let In equal 1 if a peak occurs at time n, and let it be 0 otherwise, and note, since each of X n - I , X n , or X n + 1 is equally likely to be the largest of the three, that E[ln] = 113 As {/2' 1 5 , Is, }, {f3, 1 6 , 1 9 , .}, {f4, 1 7 , 1 10 , } are all independent and identically distributed sequences, it follows from the strong law of large numbers that the average of the first n terms in each sequence converges, with probability 1, to 1/3 But this implies that, with probability n 1, lim 2: I'+l/n = 113 n_CC I_I 1.39. E[Td = 1 For i > 1, E[T,] = 1 + 1/2(E[time to go from i - 2 to i] = 1 + 1I2(E[T,-d + E[T,D and so, Hence, E[T,] = 2 + E[T,-d, i> 1 E[T2] = 3 E[T3] = 5 E[T,] = 2; - 1, i = 1, ,n If TOn is the expected number of steps to go from 0 to n, then E[Ton] = E [ ~ T,] = 2n(n + 1)/2 - n = n 2 CHAPTER 2 2.6. Letting N denote the number of components that fail, the desired answer is E[N]/(ILI + 1L2), where n+m I [(k - 1) ( IL )n ( J.Lz ) k-n E[N] = 2: k I k ~ n 1 l n ( n m) n - 1 ILl + 1L2 ILl + 1L2 and where (k ~ 1) = 0 when i > k - 1 2.7. Since (Slo S2, S3) = (SI' S2, S3) is equivalent to (Xl' X 2 , X 3 ) = (SJ, S2 - Sl, S3 - S2), it follows that the joint density of SI, S2, S3 is given by ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 2.S. (a) p{ -10; V':5X} = p{IOg SAX} = P{lI VI S e AX } = P{V, e- AX } ::= 1 - e- Ax 479 (b) Letting X, denote the interarrival times of a Poisson process, then N(1)- the number of events by time I-will equal that value of n such that n n+l 2:X<l<2:X" ! ! or, equivalently, that value of n such that n n+I -2: log V, < A < - 2: log V" ! ! or, equivalently, n 11:+1 2: log V,> -A> 2: log V, I ! or, n n+l 11 V, > e- A > 11 V, 1 1 The result now follows since N(l) is a Poisson random vanable with mean 1. 2.9. (a) Ps{winning} = P{1 event in [s, Tn = A(T s)e-A(T-SJ. (b) ! Ps{winning} = Ae-AT[(T - s)Ae"" - e""] Setting this equal to 0 gives that the maximal occurs at s =: T - 1/ A (c) Substituting s = T - lIA into part (a) gives the maximal probability e- I • 2.14. Let N; J denote the number of people that get on at floor i and off at floor j Then N,! is Poisson with mean A,P,} and all N,,. i O,j i, are independent Hence: (8) £[01] = £ = A,P'I; (b) O} 2: N,} is Poisson with mean 2: A,P,,. , , (c) 0, and Ok are independent 2.15. (a) Ni is negative binomial That is, ( k -1) P{N; = k} = - p,)k-n" n,-l 480 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS (b) No (c) T, is gamma with parameters n, and P, (d) Yes (e) E[T] : J: P{T > t} dt :J:P{T,>t,i:l, ,r}dt J: (U P{T, > t}) dt (by independence) = JA., (iI J'" P,e- P ,z(p,x)n,-l dx) dt (I (n, - 1)1 (f) T XI> where X, is the time between the (i - l)st and ith flip Since N is independent of the sequence of X" we obtain E[T] = E[E[TIN]] = E[NE[X]] = E[N] since E[X] 1 2.22. Say that an entering car is type 1 if it will be between a and b at time tHence a car entenng at times s, s < t, will be type 1 if its velocity V is such that a < (t s)V < b, and so it will be type 1 with probability F (.l!.-) -F . t-s I s Thus the number of type-l cars is Poisson with mean 2.24. Let v be the speed of the car entenng at time t, and let tv = Llv denote the resulting travel time If we let G denote the distnbution of travel time, then as T :5 LI X is the travel time when X is the speed, it follows that G(x) == F(Llx). Let an event correspond to a car entenng the road and say that the event is counted if that car encounters the one entenng at time I. 
Now, independent of other cars, an event occurnng at time s, s < I, will be counted with probability P{s + T > t + tv}, whereas one occurnng at time s, s > I, will be counted with probability P{s + T < t + t.}. That is, an event occurring at time s is, independently of other events, counted with probability p(s), where { G(t + tv - s) p(s) = :(t + t. -s) ifs < I, if 1< s < t + tv, otherwise ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Hence, the number of encounters is Poisson with mean A f; p(s) ds = A f: G(I + lu - s) ds + A r'o G(I + lu - s) ds = A f'+" G(y) dy + A f'u G(y) dy I. 0 481 To choose the value of lu that minimizes the preceding, differentiate to obtain. Setting this equal to 0 gives, since G(I + lu) = 0 when I is large, that or, Thus, the optimal travel time I ~ is the median of the distnbution, which im- plies that t ~ optimal speed v* = L l I ~ is such that G(Llv*) :=0 112 As G(Llv*) = F(v*) this gives the result 2.26. It can be shown in the same manner as Theorem 231 that, given Sn = I, SI' , Sn-1 are distributed as the order statistics from a set of n - 1 uniform (0, I) random variables A second, somewhat heunstic, argument is SJ, ,Sn-IISn = I = SI, ,Sn-IIN(r) == n - 1, N(/) = n (by independent increments) = SI, ,Sn-IIN(/) = n-l 2.41. (a) It cannot have independent increments since knowledge of the number of events in any interval changes the distribution of A (b) Knowing {N(s), 0 :;:;: s :;:;: I} is equivalent to knowing N(/) and the arnval times SJ, ,SN(I) Now (arguing heuristically) for 0 < Sl < . < Sn < I, PtA :=0 A, N(/) == n, SI == Sl, ,Sn = sn} = PtA = A}P{N(/) = nlA = A}P{SI = SJ, ,Sn = snl A, N(/) = n} (by Theorem 2 3 1) Hence, PtA E (A, A + dA)IN(/) = n, S) == SJ, ,Sn = sn} _ e-AI(At)n dG(A) -1: e-AI(AW dG(A) 482 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Hence the conditional distribution of A depends only on N(t) This is so since, given the value of N(t), SI, . , SN(I) will be distributed the same (as the order statistics from a uniform (0, t) population) no matter what the value of A (c) P{time of first event after t is greater than t + sIN(t) = n} e-·'e-·'(At)" dG(}.) e-'\/(At)" dG(}.) (d) lim f"'_1 _e-_· h dG(}.) ::= f'" lim (1 -e- M ) dG(}.) h_OO h Oh_O h ::= f:O }. dG(}.) (I (e) Identically distributed but not independent. 2.42. (a) P{N(t) = n} f'" e- A1 (At)" ae-a• (a}.)m-I d)' o n' (m - 1)' amf'(m + n - 1)1 f'" «a + t)}.)mt"-I = (a + t)e-ratl)A d)' n'(m -1)'(a + t)"'+" 0 (m + n - 1)' ( m + n - 1) )" n a+t a+t (b) PiA = }.IN(t) n} == P{N(l) = nlA = }'}P{A = }.} P{N(t) = n} -AI (At)" -a. (a}.)m-l e ",ae (m - 1)1 ( m + n - )" n a+t a+t (a + t)e-(atl)A «a + t)}.)m+n-l . (m + n - 1)1 (c) By part (d) of Problem 241, the answer is the mean of the conditional distnbution which by (b) is (m + n)/(a + t) CHAPTER 3 3.7. The renewal equation is, for t ::::; 1, m(t) t + J:m(t - s) ds = t + J: m(y) dy Differentiating yields that m'(t) = 1 + m(t) Letting h(t) = 1 + m(t) gives h'(t) h(t) or h(t) = ee' ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS 483 Upon evaluating at t = 0, we see that c = 1 and so, m(t) = e' - 1 As N(I) + 1, the time of the first renewal after 1, is the number of interarrival times that we need add until their sum exceeds 1, it follows that the expected number of uniform (0, 1) random vanables that need be summed to exceed 1 is equal to e 3.17. 
g = h + g * F = h + (h + g * F) * F = h + h * F + g * F2 = h + h * F + (h + g * F) * F2 = h + h * F + h * F2 + g * FJ =h+h*F+h*F2+ + h * F" + g * F"+I Letting n _ 00 and using that F" _ ° yields '" g = h + h * 2: F" = h + h * m "=1 (a) P(t) = f: P{on at tlZ I + Y 1 = s} dF(s) = f ~ P(t - s) dF(s) + r P{ZI > Iizl + Y 1 = s} dF(s) = t P(t - s) dF(s) + P{ZI > t} (b) g(t) = f; E[A(t)IX1 = s] dF(s) = i:g(t - s) dF(s) + r tdF(s) = i:g(t - s) dF(s) + tF(t) f: P{ZI > t} dt _ E[Z] P(t) - fJ.r - E[Z] + E[Y]' f a> tF(t) dt fa> t fa> dF(s) dt J'" JS t dt dF(s) () 0 0 , 0 0 g t _ = = '---'----- fJ. fJ. fJ. _ f>2 dF(s) _ E[X2] - 2fJ. - 2E[X] 3.24. Let T denote the number of draws needed until four successive cards of the same suit appear. Assume that you continue drawing cards indefinitely and any time that the last four cards have all been of the same suit say that a renewal has occurred The expected time between renewals is then E[time between renewals] = 1 + ~ (E[T] - 1) 484 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS The preceding equation being true since if the next card is of the same suit then the time between renewals is 1, and if it is of a different suit then it is like the first card of the initial cycle By Blackwell's theorem, E[timebetweenrenewals] = (lim P{renewal atn}t l = (114)-3 64 Hence, E[T] = 85 3.27. E[RN(I)+d = J: E[RM,)+tiSN(') = s]F(t- s) dm(s) + E[RN(II+dSNIlI = O]F(/) J: E[RdX, > 1- s]F(t - s) dm(s) + E[RIIXI > 1]17(/) -'I> f: E[RlIXI > I]F(t) dtlp. J; !:" E[RlIXI = s] dF(s) dtlp. f: f: dt E[RIIX, = s] dF(s)/p. f: sE[RdXl s] dF(s)/p. E[R,Xd/p.. where p. E[Xl] We have assumed that E[R,Xd < 00, which implies that E[X2] > E2[X] since Var(X) > o except when Xis constant with probability 1 3.33. (a) Both processes are regenerative (b) ]f T is the time of a cycle and N the number of customers served, then Imagine that each pays, at any time, at a rate equal to the customer's remaining service time Then reward in cycle = f ~ V(s) ds Also, letting Y, denote the service time of customer i, reward in cycle = ~ [ D, Y, + f:' ( Y , t) dl] N N y2 =2:D,Y,+2: 2' ,aj .=1 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Hence, E[reward in cycle] Now E[N] . E[D1Y1+···+D.Y.] = 11m n ....... ilI/> n 485 (by Wald's equation) E[DI + + D.] = E[Y] lim since D, and Y, are independent n ....... oo n = E[Y]WQ Therefore from the above E[rewardincycle] = E[N] (E[Y]W Q + E[;2]). Equating this to E[f; V(s) ds] = VE[T] and using the relationship E[T] = E[N]/A establishes the identity. 3.34. Suppose P{X < Y} 1 For instance, P{X = I} = 1, and Y is uniform (2, 3) and k = 3 3.35. (a) Regenerative (b) E[time during cycle i packages are where E[time i packages are waiting] f; E[ti me I cycle of length x] dF(x) f '" ., (Ax) I = 2: E[timellength isx, N(x) :; j]e-· ... -.,-dF(x) (J J:' J The last equality follows from Problem 217 of Chapter 2. ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 3.36. lim I: r(X(s» ds/I:::; E [I; r(X(s» ds Jj E[T] E (amount of time injduring T) J CHAPTER 4 4.10. (0) a, == 1 - (1 - p)' (b) No (c) No (d) Yes, the transition probabilities are given by P{X" +1 k, Y,,+I = j - klX" = i, Y" = j} = U) - a,);-l, 0 :5 k :5 j 4.13. Suppose i +-+ j and i is positive recurrent. Let m be such that > 0 Let Nk denote the kth time the chain is in state i and let ifX N +m = j • otherwise By the strong law of large numbers n I >0. l:1 n Hence . numberofvisitstojbytimeNn+m n m 1 n N" + m 2: P,! 
E[T II ] > 0, where T" is the time between visits to i Hence j is also positive recurrent If i is null recurrent, then as i +-+ j, and recurrence is a class property, j is recurrent. If j were positive recurrent, then by the above so would be i Hence j is null recurrent. 4.14. Suppose i is null recurrent and let C denote the class of states communicating with i. Then all states in C are null recurrent implying that =0 for allj E C. ...... ., But this is a contradiction since L 1EC = 1 and C is a finite set For the same reason not all states in a finite-state chain can be transient. ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 487 4.16. (a) Let the state be the number of umbrellas she has at her present location The transition probabilities are i = I, ,r (b) The equations for the limiting probabilities are 'Tf,='TfO+'TflP, TT J = TT,_/(1 - p) + TT'-J+IP, j I, ,r-l, TTO = 'Tf,(1 p) It is easily verified that they are satisfied by 'Tf={r!q I 1 r+ q ifi = 0 if i = 1, , r, where q = 1 - P (c) p'Tf = ~ o r+ q 4.17. Let I. (jl ~ G ifX n = j otherwise Then n = lim 2: 2: E[Il-J(i)Ik(j)]/n n k=1 , n = lim 2: 2: E[h-l(i)]P,/n n k=1 , n = lim 2: P" 2: E[I k-1 (i)]/n n I k ~ 1 488 ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS 4.20. Let X,/(n) equal 1 if the nth transition out of state i is into state j, and let it be 0 otherwise Also, let N, denote the number of time penods the chain is in state i before returning to 0 Then, for j > 0 But, by Wald's equation For a second proof, by regarding visits to state 0 as cycles, it follows that 11/, the long-run proportion of time in state j, satisfies Hence, from the stationarity equations 11} L, 11 I P,} we obtain for j > 0 that 4.21. This Markov chain is positive recurrent if, and only if, the system of equations possesses a solution such that Y/ ~ 0, L, Y/ =::: 1 We may now rewnte these equations to obtain Yo = Ylql, Y!+ I q/+l - Y,P! y/q/ - Y!-IP,-t> F ; ~ 1 From the above it follows that Hence. Therefore, a necessary and sufficient condition for the random walk to be a positive recurrent is for ANSWERS AND SOLUTIONS TO SELEcrED PROBLEMS 489 4.30. Since P{X, - Y, = I} = P 1 (1 - P 2 ) and P{X, Y, = only look when a pair X" Y, occurs for which X, observed is a simple random walk with I} = (1 P I )P 2 if we Y, "" 0, then what is Hence from gambler's ruin results P{error} = P{down M before up M} 1 _ 1 - (1 - (q/p)M) = (q/p)M(l - (q/p)M) = (q/p)M 1 - (q/pfM 1 - (q/p)2M 1 + (q/p)M 1 1 ...,..-----,.,.---- = -- (p/q)M + 1 1 + AM By Wald's equation E [ ~ (X, - Y,) ] = E[N](P1 P2), or, 4.40. Consider the chain in steady state. and note that the reverse process is a Markov chain with the transition probabilities P ~ = 1T J P J ,/1T r Now, given that the chain has just entered state i, the sequence of (reverse) states until it again enters state i (after T transitions) has exactly the same distnbution as Yp j 0, • T For a more formal argument, let io :::: in = 0 Then, n P{Y (io, , in)} = P{X == (in, , io)} = n P'A' iH k ~ 1 n n P,* 1T, 11f t = n P ~ t-l '. "'-1 i. 1'-1 '. k=1 hi 4.43. For any permutation i l • i], , in of 1, 2. 
, n, let 1T(iJ> i2, , in) denote the limiting probability under the one-closer rule By time reversibility, we have 490 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS for all permutations Now the average position of the element requested can be expressed as average position 2: P,E[position of element i] 2: P, [1 + 2: P{elementj precedes element i}] I !,,"r "= 1 + 2: 2: P, Pie} precedes e,l I I'" = 1 + 2: [P,P{elprecedese,} + PIP{e,precedese l }] ,<) = 1 + 2: [P,P{e,precedese,} + P}(l- P{eJprecedese,})] = 1 + 2:2: (P, - PI)P{e l precedes e,l + 2:2: PI Hence, to minimize the average position of the element requested, we would want to make Pie} precedes e,} as large as possible when P J > P, and as small as possible when P, > PI Now under the front-of-the-line rule PI Pie} precedes e,} = -p. P 1+ I since under the front-of-the-line rule element j will precede element i if, and only if, the last request for either i or j was for j Therefore, to show that the one-closer rule is better than the front-of·the-line rule, it suffices to show that under the one-closer rule P P{e J precedes e,} > p'p J+ I when PI> P, Now consider any state where element i precedes element j, say ( ,i, ii' ,ibj, ) By successive transpositions using (*), we have 1T( ,i",j, ( P)hl ) =...! 1T( PI Now when PI > P" the above implies that 1T( p. ) < pi 1T( I Letting a(i, j) Pie; precedes el}' we see by summing over all states for which i precedes j and by using the above that P a(i,j) < p: a(j, i). } ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS which, since a(i, j) = 1 - aU, i), yields 4.46. (8) Yes. P aU, i) > PI; P,' (b) Proportion of time in j = 11" 0 :$; j :$; N (c) Note that by thinking of visits to i as renewals 11,(N) = (E[ number of V-transitions between V-visits to i])-I 11 (N) = E[ number of V-visits to j between V-visits to i] J E[ number of V-transitions between V-visits to i] _ E[ number of X-visits to j between X-visits to i] - 1111,(N) (d) For the symmetric random walk, the V-transition probabilities are P,.i+1 =! = P, i:= 1, .. , N - 1, P OO ;::: POI, This transition probability matrix is doubly stochastic and thus i =::; 0, 1, . ,N Hence, from (b), E[time the X process spends in j between visits to i] ;: 1 (e) Use Theorem 4 7 1 CHAPTER 5 491 5.3. (8) Let N(t) denote the number of transitions by t It is easy to show in this case that '" (Mt)i P{N(t) 2: n}:5 L e- M ,-.,- J and thus P{N(t) < co} = 1 5.5. Rather than imagining a single Yule process with X(O) := i, imagine i indepen- dent Yule process each with X(O) 1 By conditioning on the sIZes of each of these i populations at t, we see that the conditional distnbution of the k births is that of k independent random variables each having distribution given by (5 32) 492 ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS .5.S. We start by showing that the probability of 2 or more transitions in a time t is o(t). Conditioning on the next state visited gives P{;::::2 transitions by tlX o = i} = 2: P,}P{T, + ~ : S t}, } where T, and T J are independent exponentials with respective rates Vr and vI' they represent the times to leave i and j, respectively. (Note that T, is indepen- dent of the information that j is the state visited from it) Hence, lim P { ~ 2 transitions by tlX o = i}/t 1-0 Now PIT, + ~ : $ t} s PIT + T' :S t}, T, T' = independent exponentials with rates v = max(v" v J ) P{N(t) ;:::: 2}, N(t) = Poisson process with rate v o(t) and Hence, from the above lim P { ~ 2 transitions by tlXo == f}lt::S Vr (1 - 2: PI!) ,-0 J"'M Letting M ...... 
00 now gives the desired result Now, P,,(t) P{X(t) == ilX(O) :::0 i} P{X(t) =:; ;, no transitions by tjX(O) = i} for all M + P{X(t) = i, at least 2 transitions by tIX(O) = i} == e-U,' + 0(1) Similarly for i yf j, p'J(t) P{first state visited isj, transition time stiX(O) = i} + P{X(t) = j, first state visited yf jIX(O) == i} ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 493 Hence, or 5.14. With the state equal to the number of infected members of the population, this is a pure birth process with birth rates Ak = k(n - k)A. The expected time to go from state ] to state n is n-I n-I 2: lIAk = 2: lI{k(n - k)A} k ~ 1 k ~ l 5.21. Let qX and px denote the transition rate and stationary probability functions for the X(t) process and sinularly qr, pr for the Y(t) process For the chain {(X(t), Y(t», t ;::: O}, we have We claim that the limiting probabilities are To venfy this claim and at the same time prove time reversibility, all we need do is check the reversibility equations Now (by time reversibility of X(t» Since the verification for transitions of the type (i, j) to (i, j') is similar, the result follows. 5.32. (0) p/2: P J JEB . _ P{X(t) = i, X(r) E G} (b) P{X(t) = ,IX(t) E B, X(t ) E G} = P{X(t) E B, X(t ) E G} 2: P{X(r) = j}P{X(t) = iIX(r) == j} JEG 2: P{X(r) = j}P{X(t) E BIX(t-) = j} JEG 494 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS (c) Let T denote the time to leave state i and T' the additional time after leaving i until G is entered Using the independence of T and T', we obtain upon conditioning on the state visited after i, Now which proves (c) FI(s) = E[e-S{T+T')] = E[e-ST]E[e- ST ] v = _1-2: E[e- ST Inext isj]P I } VI + s I E[e- ST Inext isj] = { ~ ~ ( s ) ifjE G ifj E B, (d) In any time t, the number of transitions from G to B must equal to within 1 the number from B to G Hence the long-run rate at which transitions from G to B occur (the left-hand side of (d» must equal the long-run rate of ones from B to G (the right-hand side of (d» (e) It follows from (c) that (s + v,)f;(s) = 2: F;(s)q,} + 2: q,} JEB lEG Multiplying by P, and summing over all i E B yields 2: P,(s + v')F;(s) = 2: 2: P,F;(s)q'l + 2: 2: P,q,} IEB IEB IEB ,EB lEG ::= 2: F;(s) 2: P,qll + 2: 2: P,q'l IEB IEB rEG IEB where the last equality follows from and the next to last from (d) From this we see that s 2: P,F;(s) = 2: 2: P.q,/(l - F](s» ,EB rEGIEB (f) ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 495 (g) From (f), we obtain upon using (e) s 2: P,F,(s) /EB Dividing by s and then letting s _ 0 yields 2: P,F,(s) (h) ,EB (from (a» (from(e» (from (f» _ 1 - E[e- Sr ,] - sE[Tu] (by (g» (i) If P{T;r s t} is as given. then (j) E[e- Sr ,] = I: e-S'P{T. > t} dtlEfT u ] = I'" e- Sl I'" dF r (y) dtl ErT v ] () I ' = J; I ~ e-$1 dt dFr.(Y)I E[Tu] _ 1 - E[e- sT ,] - sE[TuJ Hence. from (h) the hypothesized distribution yields the correct Laplace tran..<;form The result then follows from the one-to-one correspondence between transforms and distributions E[T;r] == I: t dFr,(t) = I'" tI'" dFr(y)dtIErTv] o , ' I '" IY t dt dF r (y)1 E[T Il ] o 0 • E[n] = 2E[Tv] E[n] ;:: (E[T u J)2 since Var(T.) ;:: 0 496 ANSWERS AND SOLUTIONS TO SELEcrED PROBLEMS 5.33. 
{R(t), t ~ O} is a two-state Markov chain that leaves state 1(2) to enter state 2(1) at any exponential rate A J q(A2P) Since P{R(O) = I} = p, it follows from Example 58(A) (or 5 4(A» that P{R(t) = I} = pe- X ' + (1 - e- r ')A2pIA, To prove (b), let 1\(t) == AR(" and write for s > 0 " ~ E N(t) = 2: [N(ns) - N«n - l)s)] + o(s) n ~ J Now, E[N(ns) - N«n - l)s)IA«n - l)s)] = A«n - l)s)s + o(s) Thus, E[N(ns) - N«n - l)s)] = sE[A«n - l)s)] + o(s) Hence, from the above, [ ,IE to(S)] E[N(t)] == E ~ sA«n - l)s) + -s- Now let s ~ 0 to obtain E[N(t)] == E [ J ~ A(s) dS] = ~ A, f ~ P{R(s) = i} ds pq(AJ - A2r ( _ ~ A J A 2 t = 1 - e AI) +-- A2 A ' where the last equality follows from (*) Another (somewhat more heunstic) argument for (b) is as follows P{renewal in (s, s + h)} = hE[A(s)] + o(h), and so, E[A(s)] == P{renewal in (s, s + h)}/h + o(h)lh ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 497 Letting h __ 0 gives E[A(s)] =: m'(s), and so, m(t) = E[A(s)] ds CHAPTER 6 6.2. Var Zn = x,) n "" 2: Var(X.) + 2:2: COV(X" X,) I '''', But, for i < j, Cov(X" X,) =: E[X,X,] : E[(Z, - Z,_I)(Z,- Z,_I)] = E[E[(Z, - Z,-d(Z, - Z,-I)IZI, . ,Z,]] = E[(Z, - Z,_I)(E(Z,IZ1, . I Z.) E[Zi-dZI. • Z,])] = E[(Z, - Zi-I)(Z. Z,)] =0 6.7. = .• + 2E[XnSn- 1 Isi. • + ISL . =: + 2E(X n )E[Sn-IISi, . ,S;-d + Hence, - nu 2 1SL == - (n - 1 )u 2 6.8. (0) E[Xn + Y.IX. + Y .. i =: I, ... ,n - 1] = E[x.lx, + Y" i =: 1,. ,n - 1] + E[Y.IX, + Y" i 1, .. , n - 1] (lit) Now, E[X.IX, + y.,;= I, ... n -1] =:E[E[Xnlx.+y',X"i 1, .n-ll1Xi +Y.,i=I •. ,n 1] = E[E[X.lx"i =: I, . ,n l]IX, + Y"i = 1, . ,n - 1] (by independence) = E[Xn-dX, + Y..i: I, ,n - 1] 498 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Similarly, E[Y"IX.+Y"i=l, .,n-l]==E[Y"_IIX, +Y"i==l, ,n-l] Hence, from (*) E[X" + Y"Ix. + Y"i == 1,. ,n -1] = E[X"_, + Y"-dX, + Y"i = 1, .. ,n - 1] = X"_I + Y"-I The proof of (b) is similar. Without independence, both are false Let V,, i ;::: 1, be independent " and equally likely to be either 1 or -1 Set X" = 2: V,, and set Y I = 0, "-I and for n > 1, Y" = 2: V, It is easy to see that neither {X" Y", n ;::: O} nor {X" + Y", n ;::: O} is a martingale 6.20. Let X, ,X, , , X,, . ,X, denote the shortest path and let h (x) denote its 1 2 • length Now if y differs from x by only a single component, say the ith one, then the shortest path connecting all the y points is less than or equal to the length of the path X, ,X" . ,y" . ,X, But the length of this path differs 1 2 • from h(x) by at most 4 Hence, hl4 satisfies the conditions of Corollary 634 and so, with T = h(X) P{ITl4 - E[T]/41 ;::: b} ::=; 2 exp{-b Z /2n}. Letting a = 4b gives the result CHAPTER 7 7.4. O::=; Var (* XI) = n Var(XJ) + n(n - 1) Cov(X] , X z ) Var(X I ) + (n - 1) COV(Xl' Xz) ;::: 0 ~ Cov(X" X z) ;::: 0 for all n For a counterexample in the finite case, let X] be normal with mean 0 and let Xz = -XI 7.8. Use the fact that f(x) = e 9x is convex. Then, by Jensen's inequality, Since E[X] < 0, the above implies that () > 0 CHAPTER 8 8.3. It follows from (8 1 4) that, starting at some given time, the conditional distn- bution of the change in the value of Brownian motion in a time s - II given ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS 499 that the process changes by B - A in a time 12 - II is normal with mean (B - A)(s - tl)/(t2 - II) and variance (s - II) X (t2 - S)/(t2 - II), II < s < t 2 . Hence, given X(II) = A, X(t 2 ) = B, X(s) is normal with mean and variance 8.4. 
For s :::; t, Cov(X(s), X(t» = (s + 1)(t + 1) Cov(Z(t/(1 + 1», Z(s/(s + 1))) Since {Z(t)} has the same probability law as {W(t) - IW(I)}, where {Wet)} is Brownian motion, we see that Cov(X(s) , X(t» (s + 1)(t + 1) [ Coy ( W C: 1)' W C : 1)) -t: COY (W(l), we: 1)) s : 1 COY ( W C : 1)' W( 1 ) ) + (s + l ; ~ t + 1) Cov(W(I), W(l» ] == set + 1) - Sf - sf + sf s, ss I Since it is easy to see that {X(f)} is Gaussian with E[X(t)] = 0, it foUows that it is Brownian motion 8.6. 1.1. A, C Vl/2A. 8.7. All three have the same density 8.14. i. A - x e- 2 f'A - e-2p.x 8.15. [(x) = --+ (B + A) (2 B -2 A) 1.1. l.I.e"'-ef' 8.16. (b) E[T.IX(h)] E[h + Tx-x(hll + o(h) h + x - X(h) + o(h) since E[Tx] = ~ . 1.1. 1.1. Var(TxIX(h» Var(h + Tx-X(h) + o(h) g(x - X(h» + o(h). 500 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Hence, from the conditional variance formula, g(x) = Var(E[TxIX(h)]) + E[Var(TxIX(h»] h = 2 + E[g(x - X(h))] + o(h) /L = :2 + E [g(X) - X(h)g'(x) + Xi h ) g//(x) + . ] + o(h) h h = /L 2 + g(x) - /Lhg'(x) + 2" gH(X) + o(h) Upon dividing by h and letting h _ 0, o = -/Lg'(x) + g"(x)12 + 1I/L2 (c) Var(T x +y) = Var(Tx + Tx+> - Tx) So, = Var(Tx) + Var(Tx+y - Tx) = Var(Tx) + Var(Ty) (by independence) g(x + y) = g(x) + g(y) implying that, g(x) = ex (d) By (c), we see that g(x) = ex and by (b) this implies g(x) == x//LJ 8.17. min(1.5x/4) CHAPTER 9 9.8. (a) Use the representation where XI. • Xn are independent with P{X, = 1} = 1 - P{X, = O} = p. i = 1. , n Since the Xi are discrete IFR, so is X from the convolution result ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 501 (b) Let X" be the binomial with parameters n,p", where np" = A By preserva- tion of the IFR property under limits, the result follows since X" converges to a Poisson random vanable with mean A (c) Let X" i ~ 1, be independent geometric random variables-that is, P{X, = k} = p(1 - p)k-I, k ~ 1 Since P{X, = klx, ?: k} = p, it follows that X, is discrete IFR, i ~ 1 Hence ~ ~ X, is IFR, which gives the result 9.12. Let T, denote the next state from i in the Markov chain The hypothesis yields that T, ::::::SI T.+h i ~ 1 Hence (a) follows since ~ J p,J(s) = E[f(T,)]. Prove (b) by induction on n as follows writing P, and E, to mean probability and expectation conditional on Xo = i, we have P,{X" 2!: k} = E,[P,{X" 2!: klX I }] = E,[PX.{X n - 1 ~ k}] Now the induction hypothesis states that P,{X n - 1 ~ k} is increasing in i and thus PX.{X"-l ~ k} is an increasing function of Xl. From (a), Xl is stochastically increasing in the initial state i, and so for any increasing function g(Xl)-in particular for g(X l ) = PX.{X"-l ~ k}-we have 9.15. Start with the inequality I{X E A} - I{Y E A} :5 I{X oF Y}, which follows because if the left-hand side is equal to 1 then X is in A but Y is not, and so the right-hand side is also 1 Take expectations to obtain P{X E A} - P{Y E A} :::::: P{X oF Y} But reversing the roles of X and Y establishes that P{Y E A} - P{X E A} :5 P{X oF- Y} which proves the result 9.20. Suppose Y, Z are independent and Y - G, min(Y, Z) - F Using the fact that the hazard rate function of the minimum of two independent random variables is equal to the sum of their hazard rate functions gives 502 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS To go the other way suppose AI'(t) ~ AG(t) Let Y have distnbution F and define Z to be independent of Yand have hazard rate function The random vanables Y. Z satisfy the required conditions 9.29. 
Let n 2 Then 2 2 2: P{X 1 + X 2 ~ i} = P 1 P 2 + P l + g - P I P 2 = 2: P{Bin(2,p) ~ i} showing that XI + X 2 ~ Bin(2,p) • Now consider the n case and suppose PI ~ P 2 :5 . ~ P" Letting f be increasing and convex, we see, using the result for n = 2, that where XI, X" are independent of all else and are Bernoulli random variables with - - PI + P" P{X] = I} = P{X" = I} = 2 Taking expectations of the above shows n n-l 2: x, :5 XI + X" + 2: X, 1";;1 Ii 1 ~ 2 Repeating this argument (and thus continually showing that the sum of the present set of X's is less vanable than it would be if the X having the largest P and the X having the smallest P were replaced with two X's having the average of their P's) and then going to the limit yields " 2: X, :5 Bin(n,p). I v Now write n II n+m 2:x, 2:x,+2:x,. I I ,,+1 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 503 where X, == 0, n + 1 $; i $; n + m Then from the preceding " ( "p) 2: X, $; Bin n + m, 2: _'_ I v I n + m Letting m _ 00 yields ix, $; Poisson (i p,) . I v I Note. We are assuming here that the variability holds in the limit. 9.32. Let fbe a convex function. A Taylor senes expansion of f(Xe) about X gives f(eX) = f(X) + f'(X)(e - l)X + J"(Z)(eX - Z)2/2, where X < Z < eX Taking expectations and using the convexity of f gives E[f(eX)] 2: E[f(X)] + (e - l)E[Xf'(X)] Now the functions g(X) = X and g(X) = f'(X) are both increasing in X for X ~ 0 (the latter by the convexity of f) and so by Proposition 72 1. E[Xf'(X)] 2: E[X]E[f'(X)] = 0, which shows that E[f(eX)] 2: E[f(X)] 9.34. Let f be convex Then E[f(Y)] = E[E[f(Y)IX]] 2: E[f(E[YIX])] where the inequality follows from Jensen's inequality applied to a random vanable whose distribution is the conditional distnbution of Y given X (a) follows from the preceding since, XY2: E[XYIX] = XE[YIX] v (b) X + Y 2: E[X + YIX] = X + E[YIX] = X + E[Y] = X. v " [ " 1"+1] n "+1 (c) ~ X , ~ E ~ X , ~ X , = n+ 1 ~ X , . 
(d) Let X and Y be independent and identically distributed Then, 2X2: X + Y2: E[X + YIX] = X + E[YIX] == X + E[X] v v where the first inequality follows from (c) Index Absorbing state, 232 Age-dependent branching process, 121 Age of a renewal process, 116-117,136,180-181,211-213 Alternating renewal process, 114, 135-136 Aperiodic, 169 Arc sine distribution, 147-148 Arc sine laws, 148, 150 Arithmetic-geometnc mean inequality, 54 Associated random variables, 446-449 Associated stochastic process, 448 Auto regressive process, 397 Axioms of probability, 1 Azuma's inequality, 307 generalized form of, 321 Backward diffusion equation, 383-384 Balance equations, 253 Ballot problem, 25 applications of, 26 Bayes estimators, 33-35 are not unbiased, 34 Beta random variable, 17 Bin packing problem, 414-416 Binomial random variable, 16, 46, 48,450,453,454 Birth and death process, 233 limiting probabilities, 253-254 stochastic monotonicity properties, 416-418 time reversibility of, 259-260 Birth rates, 234 Birthday problem, 461 Blackwell's theorem, 110-111, 422-424 applied to mean times for patterns, 125-126 for random walks, 349-352 for renewal reward processes, 159 Boole's inequality, 1 Borel Cantelli lemma, 4, 57, 130 converse to, 5 Branching process, 191, 226, 296, 316,443-446 Brownian bridge process, 360, 399 and empirical distribution functions, 361-363 Brownian motion process, 357, 399 absorbed at a value, 366-367 arc sine law for, 365 as limit of symmetric random walk, 356-357 distribution of first passage times, 363-364 distribution of maximum, 364, 400 reflected at the origin, 368 505 506 Brownian motion with drift, 372-375, 379-381 Brun's sieve, 457-458 Cauchy-Schwarz inequality, 399, 402,408 Central limit theorem, 41 for the number of renewals by time t, 108-109 Chapman-Kolmogorovequations, 167, 239-240 Characteristic function, 18 Chernoff bounds, 39-40 for Poisson random variables, 40 Class of states of a Markov chain, 169 Closed class, 185 Communicating states of a Markov chain, 168 Compound Poisson process, 87 asymptotical normality, 96 Compound Poisson random variable, 82-87 moment generating function, 82 probability identity, 84 recursive formula for moments, 85 recursive formula for probabilities, 86 Conditional distribution function, 20 Conditional distribution of the arrival times of a Poisson process, 67 Conditional expectation, 20, 21,33 Conditional Poisson process, 88, 96 Conditional probability density function, 20 Conditional probability mass function, 20 Conditional variance, 51 Conditional variance formula, 51 Continuity property of probability, 2-3,8 INDEX Continuous-time Markov chain, 231-232 Convolution, 25 Counting process, 59 Coupling, 409-418 Coupon collecting problem, 413-414, 452 Covariance, 9 Covariance stationary process, see second-order stationary De Finetti's theorem, 340 Death rates, 234 Decreasing failure rate random variables (DFR). 406-407, 408-409,426-427,433 DecreaSing failure rate renewal processes, 424, 426-427 monotonicity properties of age and excess, 425-426 Decreasing likelihood ratio, 432-433 Decreasing sequence of events, 2 Delayed renewal process, 124, 125 Directly Riemann integrable function, 111-112 Distnbution function, 7 Doob type martingale, 297, 309, 310-311, 318-319 Doubly stochastic transition probability matrix, 221 Duality principle of random walks, 329 Einstein, 358 Elementary renewal theorem, 107 Embedded Markov chain, 165 of a semi-Markov process, 214 Equilibrium distribution, 131 Equilibrium renewal process, 131 Erdos, p. 
14 Ergodic Markov chain, 177, 253 Ergodic state of a Markov chain, 174 INDEX Erlang loss formula, 275-278 Event, 1 Excess life of a renewal process, 116-117, 119-120, 136,211-213 Exchangeable random vanables, 154,338-341,353 Expectation, see expected value Expectation identities, 46 Expected value, 9 of sums of random variables, 9 Exponential random variable, 17, 35,49 Failure rate function, 38, 39, 53 Failure rate ordenng, see hazard rate ordering Forward diffusion equation, 384-385 GIGIl queue, 333- 335, 344-346, 439-440 G/M/1 queue, 165, 179-180 GambIer'S ruin problem, 186-188, 190,223,224-225,343-344 Gambling result, 316 Gamma random variable, 17, 450, 453 relation to exponential, 52, 64-65 Gaussian process, 359 General renewal process, see delayed renewal process Geometric Brownian motion, 368-369 Geometric random variable, 16, 48 Gibbs sampler, 182-183 Glivenko-Cantelli theorem, 361 Graph,14 Hamiltonian, 47 Hamming distance, 324 Hastings-Metropolis algorithm, 204-205 Hazard rate function, see failure rate function 507 Hazard rate ordering, 420-421, 452 Increasing failure rate random variable (IFR), 406, 433, 450 Increasing IikeHhood ratio, 432-433 Increasing sequence of events, 2 Independent increments, 42, 59 Independent random variables, 8 Index set, 41 Infinite server Poisson queue, 70 output process, 81 Inspection paradox, 117, 118 Instantaneous state, 232 Instantaneous transition rAte, see transition rate Integrated Brownian motion, 369-372 Interarrival times of a Poisson process, 64 Interrecord times, 37 Inventory model, 118-119 Irreducible Markov chain, 169 Jensen's inequality, 40, 54, 354 Joint characteristic function, 18 Joint distribution function, 7 Joint moment generating function, 18 Joint probability density function, 8 Jointly continuous random vanables,8 Key renewal theorem, 112 relation to Blackwell's theorem, 112-113 Kolmogorov's backward equations, 240 Kolmogorov's forward equations, 241-242 Kolmogorov's inequality for submartingales, 314 Korolyook's theorem, 152 508 Ladder height, 349 Ladder variable, 349 ascending, 349 descending. 350 Laplace transform, 19 Lattice random variable, 109 Likelihood ratio ordering, 428 relation to hazard rate ordering, 428-433 Limiting event, 2 Limiting probabilities of continuous-time Markov chain, 251 Limiting probabilities of discrete- time Markov chain, 175 Limiting probabilities of a semi- Markov process, 214, 215 Linear growth model. 
234 List ordering rules, 198-202, 210-211, 224, 228 M/GIl queue, 73,164,177-179, 388-392 busy period, 73-78 M/G/l shared processor system, 278-282 M/MIl queue, 254-255, 262 with finite capacity, 261 M/M/s queue, 234, 260 Markov chain, 163 exponential convergence of limiting probabilities, 418-420 Markov chain model of algorithmic efficiency, 193-195, 227 Markov process, 358 Markovian property, 232 Markov's inequality, 39, 57 Martingale, 295 use in analyzing Brownian motion processes, 381-383 use in analyzing random walks, 341-344 Martingale convergence theorem, 315 INDEX Martingale stopping theorem, 300 Matching problem, 10, 24, 26-28, 304, 459-460, 471 Mean of a random vanable, see expected val ue Mean transition times of a Markov chain, 223-224 Memoryless property, 35 of the exponential, 36, 37-38 Mixture of distributions, 407, 408-409 Moment generating function, 15 Monotone likelihood ratio family, 453 Moving average process, 398 Multinomial distribution, 47, 465, 469,470 Multivariate normal, 18-19 n-Step transition probabilities, 167 Negative binomial random variable, 16, 48 Network of queues, 271--275 New better than used in expectation (NBUE), 437, 454 less variable than exponential, 437-438 renewal process comparison with Poisson process, 440-442 New worse than used in expectation (NWUE), 437 more variable than exponential, 437-438 renewal process comparison with Poisson process, 440-442 Neyman-Pearson lemma, 429-430 Nonhomogeneous Poisson process, 78-79 as a random sample from a Poisson process, 80 Nonstationary Poisson process, see nonhomogeneous Poisson process Normal random variable, 16, 17, 451,453 INDEX Null recurrent, 173-174,353 Number of transitions between visits to a state, 183-185 Occupation time, 285 Order statistics, 66-67, 92, 93 Ornstein-Uhlenbeck process, 400 Parallel system, 128-129 Patterns, 125-128,301 Period of a lattice random variable, 110 Period of a state of a Markov chain, 169 Poisson approximation to the binomial, 63, 458-459, 465, 469 Poisson paradigm, 457 Poisson process, 59-61 Poisson random variable, 16, 28, 32, 48, 51, 90, 471 stochastic ordering of, 411, 453 Polya frequency of order 2,433 Positive recurrent, 173-174 Probabilistic method, 14 Probability density function, 7 Probability generating function, 48 Probability identities, 11-14 Pure birth process, 235 Queueing identity, 138-140, 160 Random experiment, 1 Random hazard, 289 Random time, 298 Random variable, 7 discrete, 9 continuous, 7 Random walk, 166, 328 Random walk on a circle, 42-43, 44 Random walk on a graph, 205-206 Random walk on a star graph, 45, 206-208 Range of a random walk, 330-331 509 Range of the simple random walk, 331-332 Rate of a Poisson process, 60 Rate of a renewal process, 104 Rate of the exponential, 38 Records, 36, 47, 80-81, 220 Recurrent state, 169, 170, 171, 341 Regenerative process, 140-142, 161 Regular continuous-time Markov chain, 233, 287 Reliability bounds, 447-448, 451 Renewal equation, 154 Renewal function, 99, 100, 121, 154 Renewal process, 98 Renewal reward process, 132-133 Renewal type equation, 157 Reverse martingale, 323 Reversed process, 203, 211, 257-258, 270 Round-robin tournament, 47 Ruin problem, 347-348, 354, 392-393 Runs, 195, 227 Sample path, 41 Sample space, 1 Second-order stationary process, 396 Semi-Markov process, 213, 385-387 Shot noise process, 393-396 Shuffling cards, 49 Simple random walk, 166-167, 171-172 Spitzer's identity, 335-337 application to mean queueing delays, 337-338 Standard Brownian motion process, 358 Stationary distribution, 
174 Stationary increments, 42, 59 Stationary point process, 149 regular, or ordinary, 151 510 Stationary process, 396-399 Statistical inference problem, 429, 453 Steady state, 203, 257 Stein-Chen method for bounding Poisson approximatipns, 462-467 Stirling's approximation, 144, 171 Stochastic ordering of renewal processes, 412 of stochastic processes, 411-412 of vectors, 410-412 Stochastic population model, 263-270 Stochastic process, 41 continuous time, 41 discrete time, 41 Stochastically larger, 404-406 Stochastically monotone Markov process, 448-449 Stochastically more variable, 433-437 Stochastically smaller, see stochastically larger Stopped process, 298 Stopping time, 104, 298 Strong law for renewal processes, 102 Strong law of large numbers, 41, 56-58,317-318 Submartingale,313 Sum of a random number of random vanables, 22 Sum of independent Poisson processes, 89 Supermartingale, 313 Symmetnc random walk, 143, 145-147,148-149,172,219,220, 332-333 Table of continuous random variables, 17 Table of discrete random variables, 16 Tandem queue, 262-263 INDEX Tilted density function, 53 Time-reversible Markov chains, 203,209,228,259,290,291 Transient state, 169, 170 Transition rate of a continuous- time Markov chain, 233 of the reverse chain, 258 Truncated chain, 228, 261 Two-dimensional Poisson process, 95 Two sex population growth model, 244-249 Two-state continuous-time Markov chain, 242-243, 284-286 Unbiased estimator, 34 Uncorrelated random variables, 9 Uniform random vanable, 17,46, 453 Uniformization, 282-284 Uniformly integrable, 318 Variance, 9 of sums of random variables, 10 Waiting times of a Poisson process, 64-65 Wald's equation, 105, 155, 300 Weakly stationary process, see second order stationary process Weibull random vanable, 453 Wiener process, see Brownian motion process Yule process, 235-238, 288 ACQUISITIONS EDITOR Brad Wiley II MARKETING MANAGER Debra Riegert SENIOR PRODUCfION EDITOR Tony VenGraitis MANUFACfURING MANAGER Dorothy Sinclair TEXT AND COVER DESIGN A Good Thing, Inc PRODUCfION COORDINATION Elm Street Publishing Services, Inc This book was set in Times Roman by Bi-Comp, Inc and printed and bound by Courier/Stoughton The cover was printed by Phoenix Color Recognizing the importance of preserving what has been written, it is a policy of John Wiley & Sons, Inc to have books of enduring value published in the United States printed on acid-free paper, and we exert our best efforts to that end The paper in this book was manufactured by a mill whose forest management programs include sustained yield harvesting of its timberlands Sustained yield harvesting principles ensure that the number of trees cut each year does not exceed the amount of new growth Copyright © 1996, by John Wiley & Sons, Inc All rights reserved Published simultaneously in Canada Reproduction or translation of any part of this work beyond that permitted by Sections 107 and 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful Requests for permission or further information shOUld be addressed to the Permissions Department, John Wiley & Sons, Inc Library of Congress Cataloging-in-Publication Data: Ross, Sheldon M Stochastic processes/Sheldon M Ross -2nd ed p cm Includes bibliographical references and index ISBN 0-471-12062-6 (cloth alk paper) 1 Stochastic processes I Title QA274 R65 1996 5192-dc20 Printed in the United States of America 10 9 8 7 6 5 4 3 2 95-38012 CIP On March 30, 1980, a beautiful six-year-old girl died. 
Additions to individual chapters follow. In Chapter 1, we have new examples on the probabilistic method, the multivariate normal distribution, random walks on graphs, and the complete match problem. Also, we have new sections on probability inequalities (including Chernoff bounds) and on Bayes estimators (showing that they are almost never unbiased). A proof of the strong law of large numbers is given in the Appendix to this chapter. New examples on patterns and on memoryless optimal coin tossing strategies are given in Chapter 3. There is new material in Chapter 4 covering the mean time spent in transient states, as well as examples relating to the Gibbs sampler, the Metropolis algorithm, and the mean cover time in star graphs. Chapter 5 includes an example on a two-sex population growth model. Chapter 6 has additional examples illustrating the use of the martingale stopping theorem. Chapter 7 includes new material on Spitzer's identity and using it to compute mean delays in single-server queues with gamma-distributed interarrival and service times. Chapter 8 on Brownian motion has been moved to follow the chapter on martingales to allow us to utilize martingales to analyze Brownian motion. Chapter 9 on stochastic order relations now includes a section on associated random variables, as well as new examples utilizing coupling in coupon collecting and bin packing problems.

We would like to thank all those who were kind enough to write and send comments about the first edition, with particular thanks to He Sheng-wu, Stephen Herschkorn, Robert Kertz, James Matis, Erol Pekoz, Maria Rieders, and Tomasz Rolski for their many helpful comments.

SHELDON M. ROSS

Contents
CHAPTER 1. PRELIMINARIES 1
1.1. Probability 1
1.2. Random Variables 7
1.3. Expected Value 9
1.4. Moment Generating, Characteristic Functions, and Laplace Transforms 15
1.5. Conditional Expectation 20
  1.5.1. Conditional Expectations and Bayes Estimators 33
1.6. The Exponential Distribution, Lack of Memory, and Hazard Rate Functions 35
1.7. Some Probability Inequalities 39
1.8. Limit Theorems 41
1.9. Stochastic Processes 41
Problems 46
References 55
Appendix 56

CHAPTER 2. THE POISSON PROCESS 59
2.1. The Poisson Process 59
2.2. Interarrival and Waiting Time Distributions 64
2.3. Conditional Distribution of the Arrival Times 66
  2.3.1. The M/G/1 Busy Period 73
2.4. Nonhomogeneous Poisson Process 78
2.5. Compound Poisson Random Variables and Processes 82
  2.5.1. A Compound Poisson Identity 84
  2.5.2. Compound Poisson Processes 87
2.6. Conditional Poisson Processes 88
Problems 89
References 97

CHAPTER 3. RENEWAL THEORY 98
3.1. Introduction and Preliminaries 98
3.2. Distribution of N(t) 99
3.3. Some Limit Theorems 101
  3.3.1. Wald's Equation 104
  3.3.2. Back to Renewal Theory 106
3.4. The Key Renewal Theorem and Applications 109
  3.4.1. Alternating Renewal Processes 114
  3.4.2. Limiting Mean Excess and Expansion of m(t) 119
  3.4.3. Age-Dependent Branching Processes 121
3.5. Delayed Renewal Processes 123
3.6. Renewal Reward Processes 132
  3.6.1. A Queueing Application 138
3.7. Regenerative Processes 140
  3.7.1. The Symmetric Random Walk and the Arc Sine Laws 142
3.8. Stationary Point Processes 149
Problems 153
References 161

CHAPTER 4. MARKOV CHAINS 163
4.1. Introduction and Examples 163
4.2. Chapman-Kolmogorov Equations and Classification of States 167
4.3. Limit Theorems 173
4.4. Transitions among Classes, the Gambler's Ruin Problem, and Mean Times in Transient States 185
4.5. Branching Processes 191
4.6. Applications of Markov Chains 193
  4.6.1. A Markov Chain Model of Algorithmic Efficiency 193
  4.6.2. An Application to Runs, A Markov Chain with a Continuous State Space 195
  4.6.3. List Ordering Rules, Optimality of the Transposition Rule 198
4.7. Time-Reversible Markov Chains 203
4.8. Semi-Markov Processes 213
Problems 219
References 230

CHAPTER 5. CONTINUOUS-TIME MARKOV CHAINS 231
5.1. Introduction 231
5.2. Continuous-Time Markov Chains 231
5.3. Birth and Death Processes 233
5.4. The Kolmogorov Differential Equations 239
  5.4.1. Computing the Transition Probabilities 249
5.5. Limiting Probabilities 251
5.6. Time Reversibility 257
  5.6.1. Tandem Queues 262
  5.6.2. A Stochastic Population Model 263
5.7. Applications of the Reversed Chain to Queueing Theory 270
  5.7.1. Network of Queues 271
  5.7.2. The Erlang Loss Formula 275
  5.7.3. The M/G/1 Shared Processor System 278
5.8. Uniformization 282
Problems 286
References 294

CHAPTER 6. MARTINGALES 295
6.1. Introduction 295
6.2. Martingales 295
6.3. Stopping Times 298
6.4. Azuma's Inequality for Martingales 305
6.5. Submartingales, Supermartingales, and the Martingale Convergence Theorem 313
6.6. A Generalized Azuma Inequality 319
Problems 322
References 327

CHAPTER 7. RANDOM WALKS 328
Introduction 328
7.1. Duality in Random Walks 329
7.2. Some Remarks Concerning Exchangeable Random Variables 338
7.3. Using Martingales to Analyze Random Walks 341
7.4. Applications to G/G/1 Queues and Ruin Problems 344
  7.4.1. The G/G/1 Queue 344
  7.4.2. A Ruin Problem 347
7.5. Blackwell's Theorem on the Line 349
Problems 352
References 355

CHAPTER 8. BROWNIAN MOTION AND OTHER MARKOV PROCESSES 356
8.1. Introduction and Preliminaries 356
8.2. Hitting Times, Maximum Variable, and Arc Sine Laws 363
8.3. Variations on Brownian Motion 366
  8.3.1. Brownian Motion Absorbed at a Value 366
  8.3.2. Brownian Motion Reflected at the Origin 368
  8.3.3. Geometric Brownian Motion 368
  8.3.4. Integrated Brownian Motion 369
8.4. Brownian Motion with Drift 372
  8.4.1. Using Martingales to Analyze Brownian Motion 381
8.5. Backward and Forward Diffusion Equations 383
8.6. Applications of the Kolmogorov Equations to Obtaining Limiting Distributions 385
  8.6.1. Semi-Markov Processes 385
  8.6.2. The M/G/1 Queue 388
  8.6.3. A Ruin Problem in Risk Theory 392
8.7. A Markov Shot Noise Process 393
8.8. Stationary Processes 396
Problems 399
References 403

CHAPTER 9. STOCHASTIC ORDER RELATIONS 404
Introduction 404
9.1. Stochastically Larger 404
9.2. Coupling 409
  9.2.1. Stochastic Monotonicity Properties of Birth and Death Processes 416
  9.2.2. Exponential Convergence in Markov Chains 418
9.3. Hazard Rate Ordering and Applications to Counting Processes 420
9.4. Likelihood Ratio Ordering 428
9.5. Stochastically More Variable 433
9.6. Applications of Variability Orderings 437
  9.6.1. Comparison of G/G/1 Queues 439
  9.6.2. A Renewal Process Application 440
  9.6.3. A Branching Process Application 443
9.7. Associated Random Variables 446
Problems 449
References 456

CHAPTER 10. POISSON APPROXIMATIONS 457
Introduction 457
10.1. Brun's Sieve 457
10.2. The Stein-Chen Method for Bounding the Error of the Poisson Approximation 462
10.3. Improving the Poisson Approximation 467
Problems 470
References 472

ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 473
INDEX 505

CHAPTER 1
Preliminaries

1.1 PROBABILITY

A basic notion in probability theory is that of the random experiment: an experiment whose outcome cannot be determined in advance. The set of all possible outcomes of an experiment is called the sample space of that experiment, and we denote it by S. An event is a subset of a sample space, and is said to occur if the outcome of the experiment is an element of that subset. We shall suppose that for each event E of the sample space S a number P(E) is defined and satisfies the following three axioms*:

Axiom (1) $0 \le P(E) \le 1$.
Axiom (2) $P(S) = 1$.
Axiom (3) For any sequence of events $E_1, E_2, \ldots$ that are mutually exclusive, that is, events for which $E_i E_j = \emptyset$ when $i \ne j$ (where $\emptyset$ is the null set), $P(\bigcup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} P(E_i)$.

We refer to P(E) as the probability of the event E.

Some simple consequences of axioms (1), (2), and (3) are:

(1.1.1) $P(E^c) = 1 - P(E)$, where $E^c$ is the complement of E.
(1.1.2) $P(\bigcup_{i=1}^{n} E_i) = \sum_{i=1}^{n} P(E_i)$ when the $E_i$ are mutually exclusive.
(1.1.3) If $E \subset F$, then $P(E) \le P(F)$.
(1.1.4) $P(\bigcup_i E_i) \le \sum_i P(E_i)$.

The inequality (1.1.4) is known as Boole's inequality.

* Actually P(E) will only be defined for the so-called measurable events of S. But this restriction need not concern us.

An important property of the probability function P is that it is continuous. To make this more precise, we need the concept of a limiting event, which we define as follows: A sequence of events $\{E_n, n \ge 1\}$ is said to be an increasing sequence if $E_n \subset E_{n+1}$, $n \ge 1$, and is said to be decreasing if $E_n \supset E_{n+1}$, $n \ge 1$. If $\{E_n, n \ge 1\}$ is an increasing sequence of events, then we define a new event, denoted by $\lim_{n\to\infty} E_n$, by

$$\lim_{n\to\infty} E_n = \bigcup_{i=1}^{\infty} E_i \quad \text{when } E_n \subset E_{n+1}.$$

Similarly, if $\{E_n, n \ge 1\}$ is a decreasing sequence, then define $\lim_{n\to\infty} E_n$ by

$$\lim_{n\to\infty} E_n = \bigcap_{i=1}^{\infty} E_i \quad \text{when } E_n \supset E_{n+1}.$$

We may now state the following:

PROPOSITION 1.1.1 If $\{E_n, n \ge 1\}$ is either an increasing or decreasing sequence of events, then

$$\lim_{n\to\infty} P(E_n) = P(\lim_{n\to\infty} E_n).$$

Proof. Suppose, first, that $\{E_n, n \ge 1\}$ is an increasing sequence, and define events $F_n$, $n \ge 1$, by

$$F_1 = E_1, \qquad F_n = E_n \left(\bigcup_{i=1}^{n-1} E_i\right)^c = E_n E_{n-1}^c, \quad n > 1.$$

That is, $F_n$ consists of those points in $E_n$ that are not in any of the earlier $E_i$, $i < n$. It is easy to verify that the $F_n$ are mutually exclusive events such that

$$\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i \quad \text{and} \quad \bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i \quad \text{for all } n \ge 1.$$

Thus

$$P\left(\bigcup_{1}^{\infty} E_i\right) = P\left(\bigcup_{1}^{\infty} F_i\right) = \sum_{1}^{\infty} P(F_i) \quad \text{(by Axiom 3)}$$
$$= \lim_{n\to\infty} \sum_{1}^{n} P(F_i) = \lim_{n\to\infty} P\left(\bigcup_{1}^{n} F_i\right) = \lim_{n\to\infty} P\left(\bigcup_{1}^{n} E_i\right) = \lim_{n\to\infty} P(E_n),$$

which proves the result when $\{E_n, n \ge 1\}$ is increasing. If $\{E_n, n \ge 1\}$ is a decreasing sequence, then $\{E_n^c, n \ge 1\}$ is an increasing sequence; hence, $P(\bigcup_1^\infty E_i^c) = \lim_n P(E_n^c)$. But, as $\bigcup_1^\infty E_i^c = (\bigcap_1^\infty E_i)^c$, we see that

$$1 - P\left(\bigcap_{1}^{\infty} E_i\right) = \lim_{n\to\infty} [1 - P(E_n)],$$

or, equivalently, $P(\bigcap_1^\infty E_n) = \lim_{n\to\infty} P(E_n)$, which proves the result.

EXAMPLE 1.1(A) Consider a population consisting of individuals able to produce offspring of the same kind. The number of individuals initially present, denoted by $X_0$, is called the size of the zeroth generation. All offspring of the zeroth generation constitute the first generation and their number is denoted by $X_1$. In general, let $X_n$ denote the size of the nth generation. Since $X_n = 0$ implies that $X_{n+1} = 0$, it follows that $P\{X_n = 0\}$ is increasing and thus $\lim_{n\to\infty} P\{X_n = 0\}$ exists. What does it represent? To answer this, use Proposition 1.1.1 as follows:

$$\lim_{n\to\infty} P\{X_n = 0\} = P\left(\lim_{n\to\infty} \{X_n = 0\}\right) = P\left(\bigcup_{n} \{X_n = 0\}\right) = P\{\text{the population ever dies out}\}.$$

That is, the limiting probability that the nth generation is void of individuals is equal to the probability of eventual extinction of the population.
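The monotone convergence of $P\{X_n = 0\}$ up to the extinction probability is easy to observe numerically. The following is a minimal simulation sketch (not from the text); the Poisson(0.8) offspring distribution and the use of Python/NumPy are illustrative assumptions, not part of the book's development:

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_extinct_by(n, offspring_mean=0.8, trials=10_000):
    """Monte Carlo estimate of P{X_n = 0} for a branching process with X_0 = 1."""
    extinct = 0
    for _ in range(trials):
        x = 1
        for _ in range(n):
            if x == 0:
                break
            # Each of the x individuals independently has a Poisson number of offspring.
            x = int(rng.poisson(offspring_mean, size=x).sum())
        extinct += (x == 0)
    return extinct / trials

# P{X_n = 0} is nondecreasing in n (Proposition 1.1.1) and converges to the
# probability of eventual extinction, which is 1 when the offspring mean is below 1.
for n in (1, 2, 5, 10, 20):
    print(n, prob_extinct_by(n))
```

Because the assumed mean number of offspring is less than 1, the printed estimates increase with n toward an extinction probability of 1, in line with Example 1.1(A).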
Proposition 1.1.1 can also be used to prove the Borel-Cantelli lemma. The event that an infinite number of the $E_i$ occur, called $\limsup_{i\to\infty} E_i$, can be expressed as

$$\limsup_{i\to\infty} E_i = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} E_i.$$

This follows since if an infinite number of the $E_i$ occur, then $\bigcup_{i=n}^{\infty} E_i$ occurs for each $n$, and thus $\bigcap_n \bigcup_{i=n}^{\infty} E_i$ occurs. On the other hand, if $\bigcap_n \bigcup_{i=n}^{\infty} E_i$ occurs, then $\bigcup_{i=n}^{\infty} E_i$ occurs for each $n$, and so for each $n$ at least one of the $E_i$ with $i \ge n$ occurs; that is, an infinite number of the $E_i$ occur.

PROPOSITION 1.1.2 The Borel-Cantelli Lemma. Let $E_1, E_2, \ldots$ denote a sequence of events. If

$$\sum_{i=1}^{\infty} P(E_i) < \infty,$$

then P{an infinite number of the $E_i$ occur} = 0.

Proof.

$$P\{\text{an infinite number of the } E_i \text{ occur}\} = P\left(\bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} E_i\right) = \lim_{n\to\infty} P\left(\bigcup_{i=n}^{\infty} E_i\right),$$

where the last equality follows from Proposition 1.1.1, since $\bigcup_{i=n}^{\infty} E_i$, $n \ge 1$, is a decreasing sequence of events. Hence, by Boole's inequality,

$$P\{\text{an infinite number of the } E_i \text{ occur}\} \le \lim_{n\to\infty} \sum_{i=n}^{\infty} P(E_i) = 0,$$

since $\sum_{i=1}^{\infty} P(E_i) < \infty$, and the result is proven.

EXAMPLE 1.1(B) Let $X_1, X_2, \ldots$ be such that

$$P\{X_n = 0\} = 1/n^2 = 1 - P\{X_n = 1\}, \quad n \ge 1.$$

If we let $E_n = \{X_n = 0\}$, then, as $\sum_n P(E_n) < \infty$, it follows from the Borel-Cantelli lemma that the probability that $X_n$ equals 0 for an infinite number of $n$ is equal to 0. Hence, with probability 1, $X_n = 1$ for all $n$ sufficiently large, and so, with probability 1, $\lim_{n\to\infty} X_n = 1$.

For a converse to the Borel-Cantelli lemma, independence is required.

PROPOSITION 1.1.3 Converse to the Borel-Cantelli Lemma. If $E_1, E_2, \ldots$ are independent events such that

$$\sum_{n=1}^{\infty} P(E_n) = \infty,$$

then P{an infinite number of the $E_n$ occur} = 1.

Proof.

$$P\{\text{an infinite number of the } E_n \text{ occur}\} = P\left(\bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} E_i\right) = \lim_{n\to\infty} P\left(\bigcup_{i=n}^{\infty} E_i\right).$$

Now

$$P\left(\bigcup_{i=n}^{\infty} E_i\right) = 1 - P\left(\bigcap_{i=n}^{\infty} E_i^c\right) = 1 - \prod_{i=n}^{\infty} (1 - P(E_i)) \quad \text{(by independence)}$$
$$\ge 1 - \prod_{i=n}^{\infty} e^{-P(E_i)} \quad \text{(by the inequality } 1 - x \le e^{-x}\text{)}$$
$$= 1 - \exp\left(-\sum_{i=n}^{\infty} P(E_i)\right) = 1,$$

since $\sum_{i=n}^{\infty} P(E_i) = \infty$ for all $n$. Hence the result follows.

EXAMPLE 1.1(C) Let $X_1, X_2, \ldots$ be independent and such that

$$P\{X_n = 0\} = 1/n = 1 - P\{X_n = 1\}, \quad n \ge 1.$$

If we let $E_n = \{X_n = 0\}$, then, as $\sum_n P(E_n) = \infty$, it follows from Proposition 1.1.3 that $E_n$ occurs infinitely often. Also, as $\sum_n P(E_n^c) = \infty$, it also follows that $E_n^c$ occurs infinitely often. Hence, with probability 1, $X_n$ will equal 0 infinitely often and will also equal 1 infinitely often, and so, with probability 1, $X_n$ will not approach a limiting value as $n \to \infty$.

1.2 RANDOM VARIABLES

Consider a random experiment having sample space S. A random variable X is a function that assigns a real value to each outcome in S. For any set of real numbers A, the probability that X will assume a value that is contained in the set A is equal to the probability that the outcome of the experiment is contained in $X^{-1}(A)$. That is,

$$P\{X \in A\} = P(X^{-1}(A)),$$

where $X^{-1}(A)$ is the event consisting of all points $s \in S$ such that $X(s) \in A$.

The distribution function F of the random variable X is defined for any real number x by

$$F(x) = P\{X \le x\} = P\{X \in (-\infty, x]\}.$$

We shall denote $1 - F(x)$ by $\bar{F}(x)$, and so $\bar{F}(x) = P\{X > x\}$.

A random variable X is said to be discrete if its set of possible values is countable. For discrete random variables,

$$F(x) = \sum_{y \le x} P\{X = y\}.$$

A random variable is called continuous if there exists a function $f(x)$, called the probability density function, such that

$$P\{X \text{ is in } B\} = \int_B f(x)\,dx$$

for every set B. Since $F(x) = \int_{-\infty}^{x} f(y)\,dy$, it follows that $f(x) = \frac{d}{dx}F(x)$.

The joint distribution function F of two random variables X and Y is defined by

$$F(x, y) = P\{X \le x, Y \le y\}.$$

The distribution functions of X and Y, $F_X(x) = P\{X \le x\}$ and $F_Y(y) = P\{Y \le y\}$, can be obtained from $F(x, y)$ by making use of the continuity property of the probability operator. Specifically, let $y_n$, $n \ge 1$, denote an increasing sequence converging to $\infty$. Then as the events $\{X \le x, Y \le y_n\}$, $n \ge 1$, are increasing and

$$\lim_{n\to\infty} \{X \le x, Y \le y_n\} = \bigcup_{n=1}^{\infty} \{X \le x, Y \le y_n\} = \{X \le x\},$$

it follows from the continuity property that

$$\lim_{y\to\infty} F(x, y) = F_X(x).$$

Similarly, $F_Y(y) = \lim_{x\to\infty} F(x, y)$.

The random variables X and Y are said to be independent if

$$F(x, y) = F_X(x)F_Y(y) \quad \text{for all } x \text{ and } y.$$

The random variables X and Y are said to be jointly continuous if there exists a function $f(x, y)$, called the joint probability density function, such that

$$P\{X \text{ is in } A, Y \text{ is in } B\} = \int_B \int_A f(x, y)\,dx\,dy \quad \text{for all sets } A \text{ and } B.$$

The joint distribution of any collection $X_1, \ldots, X_n$ of random variables is defined by

$$F(x_1, \ldots, x_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}.$$

Furthermore, the n random variables are said to be independent if

$$F(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n),$$

where $F_{X_j}(x_j) = \lim F(x_1, \ldots, x_n)$ as $x_i \to \infty$ for all $i \ne j$.

1.3 EXPECTED VALUE

The expectation or mean of the random variable X, denoted by E[X], is defined by

$$(1.3.1) \qquad E[X] = \int_{-\infty}^{\infty} x\,dF(x) = \begin{cases} \int x f(x)\,dx & \text{if } X \text{ is continuous} \\ \sum_x x P\{X = x\} & \text{if } X \text{ is discrete,} \end{cases}$$

provided the above integral exists. Equation (1.3.1) also defines the expectation of any function of X, say $h(X)$. Since $h(X)$ is itself a random variable, it follows from (1.3.1) that $E[h(X)] = \int_{-\infty}^{\infty} x\,dF_h(x)$, where $F_h$ is the distribution function of $h(X)$. However, it can be shown that this is identical to $\int_{-\infty}^{\infty} h(x)\,dF(x)$. That is,

$$(1.3.2) \qquad E[h(X)] = \int_{-\infty}^{\infty} h(x)\,dF(x).$$

The variance of the random variable X is defined by

$$\text{Var } X = E[(X - E[X])^2] = E[X^2] - E^2[X].$$

Two jointly distributed random variables X and Y are said to be uncorrelated if their covariance, defined by

$$\text{Cov}(X, Y) = E[(X - EX)(Y - EY)] = E[XY] - E[X]E[Y],$$

is zero. It follows that independent random variables are uncorrelated. However, the converse need not be true. (The reader should think of an example.)

An important property of expectations is that the expectation of a sum of random variables is equal to the sum of the expectations:

$$(1.3.3) \qquad E\left[\sum_i X_i\right] = \sum_i E[X_i].$$

The corresponding property for variances is that

$$(1.3.4) \qquad \text{Var}\left[\sum_i X_i\right] = \sum_i \text{Var}(X_i) + 2\sum\sum_{i<j} \text{Cov}(X_i, X_j).$$

EXAMPLE 1.3(A) The Matching Problem. At a party n people put their hats in the center of a room where the hats are mixed together. Each person then randomly selects one. We are interested in the mean and variance of X, the number that select their own hat. To solve, we use the representation

$$X = X_1 + \cdots + X_n,$$

where $X_i = 1$ if the ith person selects his or her own hat and 0 otherwise. Now, as the ith person is equally likely to select any of the n hats, it follows that $P\{X_i = 1\} = 1/n$, and so

$$E[X_i] = 1/n, \qquad \text{Var}(X_i) = \frac{1}{n}\left(1 - \frac{1}{n}\right).$$

Also, for $i \ne j$,

$$E[X_i X_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{1}{n(n-1)},$$

and thus

$$\text{Cov}(X_i, X_j) = \frac{1}{n(n-1)} - \frac{1}{n^2} = \frac{1}{n^2(n-1)}.$$

Hence, from (1.3.3) and (1.3.4),

$$E[X] = 1 \quad \text{and} \quad \text{Var}(X) = \frac{n-1}{n} + 2\binom{n}{2}\frac{1}{n^2(n-1)} = \frac{n-1}{n} + \frac{1}{n} = 1.$$

Thus both the mean and variance of the number of matches are equal to 1. (See Example 1.5(F) for an explanation as to why these results are not surprising.)

EXAMPLE 1.3(B) Some Probability Identities. Let $A_1, A_2, \ldots, A_n$ denote events and define the indicator variables $I_j$, $j = 1, \ldots, n$, by $I_j = 1$ if $A_j$ occurs and 0 otherwise. Letting $N = \sum_j I_j$, then N denotes the number of the $A_j$, $1 \le j \le n$, that occur. Then define

$$I = \begin{cases} 1 & \text{if } N > 0 \\ 0 & \text{if } N = 0. \end{cases}$$

A useful identity can be obtained by noting that

$$(1.3.5) \qquad I = 1 - (1-1)^N.$$

But by the binomial theorem,

$$(1.3.6) \qquad (1-1)^N = \sum_{i=0}^{N} \binom{N}{i}(-1)^i = \sum_{i=0}^{n} \binom{N}{i}(-1)^i,$$

since $\binom{m}{i} = 0$ when $i > m$. Hence (1.3.5) and (1.3.6) yield

$$(1.3.7) \qquad I = \sum_{i=1}^{n} \binom{N}{i}(-1)^{i+1}.$$

Taking expectations of both sides of (1.3.7) gives

$$(1.3.8) \qquad E[I] = E[N] - E\left[\binom{N}{2}\right] + \cdots + (-1)^{n+1}E\left[\binom{N}{n}\right].$$

Now $E[I] = P\{N > 0\} = P\{\text{at least one of the } A_j \text{ occurs}\} = P(\bigcup_j A_j)$, while $E[N] = \sum_j P(A_j)$ and, in general,

$$E\left[\binom{N}{i}\right] = E[\text{number of sets of size } i \text{ of the } A_j \text{ that all occur}] = \sum_{j_1 < \cdots < j_i} P(A_{j_1} \cdots A_{j_i}).$$

Hence, (1.3.8) is a statement of the well-known inclusion-exclusion identity

$$P\left(\bigcup_i A_i\right) = \sum_i P(A_i) - \sum\sum_{i<j} P(A_i A_j) + \cdots + (-1)^{n+1}P(A_1 A_2 \cdots A_n).$$

Other useful identities can also be derived by this approach. For instance, suppose we want a formula for the probability that exactly r of the events $A_1, \ldots, A_n$ occur. Then define $I_r = 1$ if $N = r$ and 0 otherwise, and use the identity

$$\binom{N}{r}(1-1)^{N-r} = I_r,$$

which, upon expanding by the binomial theorem and using $\binom{N}{r}\binom{N-r}{i} = \binom{r+i}{r}\binom{N}{r+i}$, and then taking expectations, yields

$$(1.3.9) \qquad P\{\text{exactly } r \text{ of the events } A_1, \ldots, A_n \text{ occur}\} = \sum_{i=0}^{n-r} (-1)^i \binom{r+i}{r} \sum_{j_1 < \cdots < j_{r+i}} P(A_{j_1} \cdots A_{j_{r+i}}).$$

As an application of (1.3.9), suppose that m balls are randomly put in n boxes in such a way that, independent of the locations of the other balls, each ball is equally likely to go into any of the n boxes. Let us compute the probability that exactly r of the boxes are empty. By letting $A_i$ denote the event that the ith box is empty, we see from (1.3.9) that

$$P\{\text{exactly } r \text{ of the boxes are empty}\} = \sum_{i=0}^{n-r} (-1)^i \binom{r+i}{r} \binom{n}{r+i} \left(1 - \frac{r+i}{n}\right)^m,$$

where the above follows since the inner sum in (1.3.9) consists of $\binom{n}{r+i}$ terms and each term is equal to the probability that a given set of $r + i$ boxes is empty, namely $(1 - (r+i)/n)^m$.

Our next example illustrates what has been called the probabilistic method. This method, much employed and popularized by the mathematician Paul Erdos, attempts to solve deterministic problems by first introducing a probability structure and then employing probabilistic reasoning.

EXAMPLE 1.3(C) A graph is a set of elements, called nodes, and a set of (unordered) pairs of nodes, called edges. For instance, Figure 1.3.1 illustrates a graph with the set of nodes N = {1, 2, 3, 4, 5} and a set of edges joining certain pairs of these nodes. Show that for any graph there is a subset of nodes A such that at least one-half of the edges have one of their nodes in A and the other in $A^c$.

Figure 1.3.1. A graph.

Solution. Suppose that the graph contains m edges, and arbitrarily number them as $1, 2, \ldots, m$. For any set of nodes B, let C(B) denote the number of edges that have exactly one of their nodes in B; then the problem is to show that $\max_B C(B) \ge m/2$. To verify this, let us introduce probability by randomly choosing a set of nodes S so that each node of the graph is independently in S with probability 1/2. If we now let X denote the number of edges in the graph that have exactly one of their nodes in S, then X is a random variable whose set of possible values is all of the possible values of C(B). Now, letting $X_i$ equal 1 if edge i has exactly one of its nodes in S and 0 otherwise, we have $P\{X_i = 1\} = 1/2$ and so

$$E[X] = E\left[\sum_{i=1}^{m} X_i\right] = \sum_{i=1}^{m} E[X_i] = m/2.$$

Since at least one of the possible values of a random variable must be at least as large as its mean, we can thus conclude that $C(B) \ge m/2$ for some set of nodes B. (In fact, provided that the graph is such that C(B) is not constant, we can conclude that $C(B) > m/2$ for some set of nodes B.)

Problems 1.9 and 1.10 give further applications of the probabilistic method.
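To make the random-cut argument of Example 1.3(C) concrete, here is a small simulation sketch (not part of the original text); the five-node edge list is a hypothetical example graph, and Python is an illustrative choice:

```python
import random

random.seed(1)

# Hypothetical five-node example graph; the argument works for any edge list.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5), (2, 5)]
m = len(edges)

def cut_size(s):
    # Number of edges with exactly one endpoint in the node set s.
    return sum((u in s) != (v in s) for u, v in edges)

# Put each node in S independently with probability 1/2; then E[C(S)] = m/2,
# so at least one realized set must achieve C(S) >= m/2.
cuts = [cut_size({v for v in range(1, 6) if random.random() < 0.5})
        for _ in range(10_000)]
print("average cut:", sum(cuts) / len(cuts), " m/2 =", m / 2)
print("best cut found:", max(cuts))
```

The sample average of the cut sizes hovers near m/2, and the best cut found is at least m/2, exactly as the expectation argument guarantees.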
1.4 MOMENT GENERATING, CHARACTERISTIC FUNCTIONS, AND LAPLACE TRANSFORMS

The moment generating function of X is defined by

$$\psi(t) = E[e^{tX}] = \int e^{tx}\,dF(x).$$

All the moments of X can be successively obtained by differentiating $\psi$ and then evaluating at $t = 0$. That is,

$$\psi'(t) = E[Xe^{tX}], \quad \psi''(t) = E[X^2 e^{tX}], \quad \ldots, \quad \psi^{(n)}(t) = E[X^n e^{tX}],$$

and evaluating at $t = 0$ yields

$$\psi^{(n)}(0) = E[X^n], \quad n \ge 1.$$

It should be noted that we have assumed that it is justifiable to interchange the differentiation and integration operations. This is usually the case. When a moment generating function exists, it uniquely determines the distribution. This is quite important because it enables us to characterize the probability distribution of a random variable by its generating function. Tables 1.1 and 1.2 list the mass or density functions, moment generating functions, means, and variances of some common distributions.

Table 1.1 Discrete Probability Distributions

Binomial with parameters n, p, $0 \le p \le 1$: $p(x) = \binom{n}{x}p^x(1-p)^{n-x}$, $x = 0, 1, \ldots, n$; $\psi(t) = (pe^t + 1 - p)^n$; mean $np$; variance $np(1-p)$.
Poisson with parameter $\lambda > 0$: $p(x) = e^{-\lambda}\lambda^x/x!$, $x = 0, 1, \ldots$; $\psi(t) = \exp\{\lambda(e^t - 1)\}$; mean $\lambda$; variance $\lambda$.
Geometric with parameter $0 \le p \le 1$: $p(x) = p(1-p)^{x-1}$, $x = 1, 2, \ldots$; $\psi(t) = pe^t/[1 - (1-p)e^t]$; mean $1/p$; variance $(1-p)/p^2$.
Negative binomial with parameters r, p: $p(x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}$, $x = r, r+1, \ldots$; $\psi(t) = \left[\frac{pe^t}{1-(1-p)e^t}\right]^r$; mean $r/p$; variance $r(1-p)/p^2$.

Table 1.2 Continuous Probability Distributions

Uniform over (a, b): $f(x) = 1/(b-a)$, $a < x < b$; $\psi(t) = \frac{e^{tb} - e^{ta}}{t(b-a)}$; mean $(a+b)/2$; variance $(b-a)^2/12$.
Exponential with parameter $\lambda > 0$: $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$; $\psi(t) = \lambda/(\lambda - t)$; mean $1/\lambda$; variance $1/\lambda^2$.
Gamma with parameters $(n, \lambda)$, $\lambda > 0$: $f(x) = \lambda e^{-\lambda x}(\lambda x)^{n-1}/(n-1)!$, $x \ge 0$; $\psi(t) = [\lambda/(\lambda - t)]^n$; mean $n/\lambda$; variance $n/\lambda^2$.
Normal with parameters $(\mu, \sigma^2)$: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/2\sigma^2}$, $-\infty < x < \infty$; $\psi(t) = \exp\{\mu t + \sigma^2 t^2/2\}$; mean $\mu$; variance $\sigma^2$.
Beta with parameters a, b > 0: $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}$, $0 < x < 1$; mean $a/(a+b)$; variance $\frac{ab}{(a+b)^2(a+b+1)}$.

EXAMPLE 1.4(A) Let X and Y be independent normal random variables with respective means $\mu_1$ and $\mu_2$ and respective variances $\sigma_1^2$ and $\sigma_2^2$. The moment generating function of their sum is given by

$$\psi_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]E[e^{tY}] \quad \text{(by independence)}$$
$$= \psi_X(t)\psi_Y(t) = \exp\{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2\},$$

where the last equality comes from Table 1.2. Thus the moment generating function of X + Y is that of a normal random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. By uniqueness, this is the distribution of X + Y.

As the moment generating function of a random variable X need not exist, it is theoretically convenient to define the characteristic function of X by

$$\phi(t) = E[e^{itX}], \quad -\infty < t < \infty,$$

where $i = \sqrt{-1}$. It can be shown that $\phi$ always exists and, like the moment generating function, uniquely determines the distribution of X.

We may also define the joint moment generating function of the random variables $X_1, \ldots, X_n$ by

$$\psi(t_1, \ldots, t_n) = E\left[\exp\left\{\sum_{j=1}^{n} t_j X_j\right\}\right],$$

or the joint characteristic function by

$$\phi(t_1, \ldots, t_n) = E\left[\exp\left\{i\sum_{j=1}^{n} t_j X_j\right\}\right].$$

It may be proven that the joint moment generating function (when it exists) or the joint characteristic function uniquely determines the joint distribution.

EXAMPLE 1.4(B) The Multivariate Normal Distribution. Let $Z_1, \ldots, Z_n$ be independent standard normal random variables. If for some constants $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,

$$X_i = a_{i1}Z_1 + \cdots + a_{in}Z_n + \mu_i, \quad i = 1, \ldots, m,$$

then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal distribution. Let us now consider the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that since $\sum_i t_i X_i$ is itself a linear combination of the independent normal random variables $Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are

$$E\left[\sum_i t_i X_i\right] = \sum_i t_i \mu_i \quad \text{and} \quad \text{Var}\left(\sum_i t_i X_i\right) = \sum_{i=1}^{m}\sum_{j=1}^{m} t_i t_j \text{Cov}(X_i, X_j).$$

Now, if Y is a normal random variable with mean $\mu$ and variance $\sigma^2$, then $E[e^Y] = \psi_Y(1) = \exp\{\mu + \sigma^2/2\}$. Thus we see that

$$\psi(t_1, \ldots, t_m) = \exp\left\{\sum_i t_i \mu_i + \frac{1}{2}\sum_i \sum_j t_i t_j \text{Cov}(X_i, X_j)\right\},$$

which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined from a knowledge of the values of $E[X_i]$ and $\text{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$.

When dealing with random variables that only assume nonnegative values, it is sometimes more convenient to use Laplace transforms rather than characteristic functions. The Laplace transform of the distribution F is defined by

$$\tilde{F}(s) = \int_0^{\infty} e^{-sx}\,dF(x).$$

This integral exists for complex variables $s = a + bi$, where $a \ge 0$. As in the case of characteristic functions, the Laplace transform uniquely determines the distribution. We may also define Laplace transforms for arbitrary functions in the following manner: the Laplace transform of the function g, denoted $\tilde{g}$, is defined by

$$\tilde{g}(s) = \int_0^{\infty} e^{-sx}\,dg(x),$$

provided the integral exists. It can be shown that $\tilde{g}$ determines g up to an additive constant.

1.5 CONDITIONAL EXPECTATION

If X and Y are discrete random variables, the conditional probability mass function of X, given Y = y, is defined, for all y such that $P\{Y = y\} > 0$, by

$$P\{X = x \mid Y = y\} = \frac{P\{X = x, Y = y\}}{P\{Y = y\}}.$$

The conditional distribution function of X given Y = y is defined by $F(x \mid y) = P\{X \le x \mid Y = y\}$, and the conditional expectation of X given Y = y by

$$E[X \mid Y = y] = \int x\,dF(x \mid y) = \sum_x x P\{X = x \mid Y = y\}.$$

If X and Y have a joint probability density function $f(x, y)$, the conditional probability density function of X, given Y = y, is defined for all y such that $f_Y(y) > 0$ by

$$f(x \mid y) = \frac{f(x, y)}{f_Y(y)},$$

and the conditional probability distribution function and the conditional expectation of X, given Y = y, by

$$F(x \mid y) = \int_{-\infty}^{x} f(u \mid y)\,du, \qquad E[X \mid Y = y] = \int_{-\infty}^{\infty} x f(x \mid y)\,dx.$$

Thus all definitions are exactly as in the unconditional case except that all probabilities are now conditional on the event that Y = y.

Let us denote by $E[X \mid Y]$ that function of the random variable Y whose value at Y = y is $E[X \mid Y = y]$. An extremely useful property of conditional expectation is that for all random variables X and Y,

$$(1.5.1) \qquad E[X] = E[E[X \mid Y]]$$

when the expectations exist. If Y is a discrete random variable, then (1.5.1) states

$$E[X] = \sum_y E[X \mid Y = y]P\{Y = y\},$$

while if Y is continuous with density $f(y)$, then (1.5.1) says

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]f(y)\,dy.$$

We now give a proof of Equation (1.5.1) in the case where X and Y are both discrete random variables.

Proof of (1.5.1) when X and Y Are Discrete. To show $E[X] = \sum_y E[X \mid Y = y]P\{Y = y\}$, we write the right-hand side as

$$\sum_y E[X \mid Y = y]P\{Y = y\} = \sum_y \sum_x x P\{X = x \mid Y = y\}P\{Y = y\}$$
$$= \sum_y \sum_x x P\{X = x, Y = y\} = \sum_x x \sum_y P\{X = x, Y = y\} = \sum_x x P\{X = x\} = E[X],$$

and the result is obtained.

Thus from Equation (1.5.1) we see that E[X] is a weighted average of the conditional expected value of X given that Y = y, each of the terms $E[X \mid Y = y]$ being weighted by the probability of the event on which it is conditioned.

EXAMPLE 1.5(A) The Sum of a Random Number of Random Variables. Let $X_1, X_2, \ldots$ denote a sequence of independent and identically distributed random variables, and let N denote a nonnegative integer valued random variable that is independent of the sequence $X_1, X_2, \ldots$. We shall compute the moment generating function of $Y = \sum_{1}^{N} X_i$ by first conditioning on N. Now

$$E\left[\exp\left\{t\sum_{1}^{N} X_i\right\} \,\Big|\, N = n\right] = E\left[\exp\left\{t\sum_{1}^{n} X_i\right\}\right] \quad \text{(by independence)} = (\psi_X(t))^n,$$

where $\psi_X(t) = E[e^{tX}]$ is the moment generating function of X. Hence

$$\psi_Y(t) = E[(\psi_X(t))^N].$$

To compute the mean and variance of $Y$, we differentiate $\psi_Y(t)$ as follows:

$$\psi_Y'(t) = E[N(\psi_X(t))^{N-1}\psi_X'(t)],$$
$$\psi_Y''(t) = E[N(N-1)(\psi_X(t))^{N-2}(\psi_X'(t))^2 + N(\psi_X(t))^{N-1}\psi_X''(t)].$$

Evaluating at $t = 0$ gives

$$E[Y] = E[N]E[X] \quad \text{and} \quad E[Y^2] = E[N(N-1)]E^2[X] + E[N]E[X^2].$$

Hence,

$$\text{Var}(Y) = E[Y^2] - E^2[Y] = E[N]\text{Var}(X) + E^2[X]\text{Var}(N).$$

EXAMPLE 1.5(B) A miner is trapped in a mine containing three doors. The first door leads to a tunnel that takes him to safety after two hours of travel. The second door leads to a tunnel that returns him to the mine after three hours of travel. The third door leads to a tunnel that returns him to his mine after five hours. Assuming that the miner is at all times equally likely to choose any one of the doors, let us compute the moment generating function of X, the time when the miner reaches safety. Let Y denote the door initially chosen. Then

$$(1.5.2) \qquad E[e^{tX}] = \frac{1}{3}\left(E[e^{tX} \mid Y = 1] + E[e^{tX} \mid Y = 2] + E[e^{tX} \mid Y = 3]\right).$$

Now given that Y = 1, it follows that X = 2, and so $E[e^{tX} \mid Y = 1] = e^{2t}$. Also, given that Y = 2, it follows that $X = 3 + X'$, where X' is the number of additional hours to safety after returning to the mine. But once the miner returns to his cell the problem is exactly as before, and thus X' has the same distribution as X. Therefore,

$$E[e^{tX} \mid Y = 2] = e^{3t}E[e^{tX}],$$

and, similarly, $E[e^{tX} \mid Y = 3] = e^{5t}E[e^{tX}]$. Substitution back into (1.5.2) yields

$$E[e^{tX}] = \frac{1}{3}(e^{2t} + (e^{3t} + e^{5t})E[e^{tX}]),$$

or

$$E[e^{tX}] = \frac{e^{2t}}{3 - e^{3t} - e^{5t}}.$$

Not only can we obtain expectations by first conditioning upon an appropriate random variable, but we may also use this approach to compute probabilities. To see this, let E denote an arbitrary event and define the indicator random variable X by X = 1 if E occurs and X = 0 if E does not occur. It follows from the definition of X that $E[X] = P(E)$ and $E[X \mid Y = y] = P(E \mid Y = y)$ for any random variable Y. Therefore, from Equation (1.5.1) we obtain that

$$P(E) = \int P(E \mid Y = y)\,dF_Y(y).$$

EXAMPLE 1.5(C) Suppose in the matching problem, Example 1.3(A), that those choosing their own hats depart, while the others (those without a match) put their selected hats in the center of the room, mix them up, and then reselect. If this process continues until each individual has his or her own hat, find $E[R_n]$, where $R_n$ is the number of rounds that are necessary. We will show that $E[R_n] = n$. The proof is by induction on n. As it is obvious for n = 1, assume that $E[R_k] = k$ for $k = 1, \ldots, n - 1$. To compute $E[R_n]$, start by conditioning on M, the number of matches that occur in the first round. This gives

$$E[R_n] = \sum_{i=0}^{n} E[R_n \mid M = i]P\{M = i\} = 1 + E[R_n]P\{M = 0\} + \sum_{i=1}^{n} E[R_{n-i}]P\{M = i\},$$

since, given a total of i matches in the initial round, the number of rounds needed will equal 1 plus the number of rounds that are required when $n - i$ people remain to be matched with their hats. Therefore, by the induction hypothesis,

$$E[R_n] = 1 + E[R_n]P\{M = 0\} + \sum_{i=1}^{n} (n - i)P\{M = i\}$$
$$= 1 + E[R_n]P\{M = 0\} + n(1 - P\{M = 0\}) - E[M]$$
$$= E[R_n]P\{M = 0\} + n(1 - P\{M = 0\}) \qquad \text{(since } E[M] = 1\text{)},$$

which proves the result.

EXAMPLE 1.5(D) Suppose that X and Y are independent random variables having respective distributions F and G. Then the distribution of X + Y, which we denote by F * G and call the convolution of F and G, is given by

$$(F * G)(a) = P\{X + Y \le a\} = \int_{-\infty}^{\infty} P\{X + Y \le a \mid Y = y\}\,dG(y)$$
$$= \int_{-\infty}^{\infty} P\{X \le a - y \mid Y = y\}\,dG(y) = \int_{-\infty}^{\infty} F(a - y)\,dG(y).$$

We denote F * F by $F_2$ and, in general, $F * F_{n-1}$ by $F_n$. Thus $F_n$, the n-fold convolution of F with itself, is the distribution of the sum of n independent random variables each having distribution F.

EXAMPLE 1.5(E) The Ballot Problem. In an election, candidate A receives n votes and candidate B receives m votes, where n > m. Assuming that all orderings are equally likely, show that the probability that A is always ahead in the count of votes is $(n - m)/(n + m)$.

Solution. Let $P_{n,m}$ denote the desired probability. By conditioning on which candidate receives the last vote counted we have

$$P_{n,m} = P\{\text{A always ahead} \mid \text{A receives last vote}\}\frac{n}{n+m} + P\{\text{A always ahead} \mid \text{B receives last vote}\}\frac{m}{n+m}.$$

Now it is easy to see that, given that A receives the last vote, the probability that A is always ahead is the same as if A had received a total of $n - 1$ and B a total of m votes. As a similar result is true when we are given that B receives the last vote, we see from the above that

$$(1.5.3) \qquad P_{n,m} = \frac{n}{n+m}P_{n-1,m} + \frac{m}{n+m}P_{n,m-1}.$$

We can now prove that $P_{n,m} = (n - m)/(n + m)$ by induction on $n + m$. As it is obviously true when $n + m = 1$, that is, $P_{1,0} = 1$, assume it whenever $n + m = k$. Then when $n + m = k + 1$ we have by (1.5.3) and the induction hypothesis

$$P_{n,m} = \frac{n}{n+m}\cdot\frac{n-1-m}{n-1+m} + \frac{m}{n+m}\cdot\frac{n-m+1}{n+m-1} = \frac{n-m}{n+m}.$$

The result is thus proven.

The ballot problem has some interesting applications. For example, consider successive flips of a coin that always lands on "heads" with probability p, and let us determine the probability distribution of the first time, after beginning, that the total number of heads is equal to the total number of tails. The probability that the first time this occurs is at time 2n can be obtained by first conditioning on the total number of heads in the first 2n trials. This yields

$$P\{\text{first time equal} = 2n\} = P\{\text{first time equal} = 2n \mid n \text{ heads in first } 2n\}\binom{2n}{n}p^n(1-p)^n.$$

Now given a total of n heads in the first 2n flips, it is easy to see that all possible orderings of the n heads and n tails are equally likely, and thus the above conditional probability is equivalent to the probability that in an election in which each candidate receives n votes, one of the candidates is always ahead in the counting until the last vote (which ties them). But by conditioning on whoever receives the last vote, we see that this is just the probability in the ballot problem when $m = n - 1$. Hence,

$$P\{\text{first time equal} = 2n\} = \frac{1}{2n-1}\binom{2n}{n}p^n(1-p)^n.$$

EXAMPLE 1.5(F) The Matching Problem Revisited. Let us reconsider Example 1.3(A), in which n individuals mix their hats up and then randomly make a selection. We shall compute the probability of exactly k matches. First let E denote the event that no matches occur, and to make explicit the dependence on n write $P_n = P(E)$. Upon conditioning on whether or not the first individual selects his or her own hat, call these events M and $M^c$, we obtain

$$P_n = P(E) = P(E \mid M)P(M) + P(E \mid M^c)P(M^c).$$

Clearly,

$$(1.5.4) \qquad P(E \mid M) = 0,$$

and so

$$P_n = P(E \mid M^c)\frac{n-1}{n}.$$

Now, $P(E \mid M^c)$ is the probability of no matches when $n - 1$ people select from a set of $n - 1$ hats that does not contain the hat of one of them. This can happen in either of two mutually exclusive ways: either there are no matches and the extra person does not select the extra hat (this being the hat of the person that chose first), or there are no matches and the extra person does select the extra hat. The probability of the first of these events is $P_{n-1}$, which is seen by regarding the extra hat as "belonging" to the extra person. Since the second event has probability $[1/(n-1)]P_{n-2}$, we have

$$(1.5.5) \qquad P(E \mid M^c) = P_{n-1} + \frac{1}{n-1}P_{n-2},$$

and thus

$$P_n = \frac{n-1}{n}P_{n-1} + \frac{1}{n}P_{n-2},$$

or, equivalently,

$$P_n - P_{n-1} = -\frac{1}{n}(P_{n-1} - P_{n-2}).$$

Since $P_1 = 0$ and $P_2 = 1/2$, this yields

$$P_3 - P_2 = -\frac{P_2 - P_1}{3} = -\frac{1}{3!}, \quad \text{or} \quad P_3 = \frac{1}{2!} - \frac{1}{3!},$$
$$P_4 - P_3 = \frac{1}{4!}, \quad \text{or} \quad P_4 = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!},$$

and, in general,

$$P_n = \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^n}{n!}.$$

To obtain the probability of exactly k matches, we consider any fixed group of k individuals. The probability that they, and only they, select their own hats is

$$\frac{1}{n}\cdot\frac{1}{n-1}\cdots\frac{1}{n-k+1}\,P_{n-k} = \frac{(n-k)!}{n!}P_{n-k},$$

where $P_{n-k}$ is the conditional probability that the other $n - k$ individuals, selecting among their own hats, have no matches. As there are $\binom{n}{k}$ choices of a set of k individuals, the desired probability of exactly k matches is

$$\binom{n}{k}\frac{(n-k)!}{n!}P_{n-k} = \frac{P_{n-k}}{k!} = \frac{\frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{n-k}}{(n-k)!}}{k!},$$

which, for n large, is approximately equal to $e^{-1}/k!$. Thus for n large the number of matches has approximately the Poisson distribution with mean 1. To understand this result better, recall that the Poisson distribution with mean $\lambda$ is the limiting distribution of the number of successes in n independent trials, each resulting in a success with probability $p_n$, when $np_n \to \lambda$ as $n \to \infty$. Now if we let

$$X_i = \begin{cases} 1 & \text{if the ith person selects his or her own hat} \\ 0 & \text{otherwise,} \end{cases}$$

then the number of matches, $\sum_{i=1}^{n} X_i$, can be regarded as the number of successes in n trials when each is a success with probability 1/n. Whereas the above limit result is not immediately applicable because these trials are not independent, it is a rather weak dependence since, for example,

$$P\{X_i = 1\} = 1/n \quad \text{and} \quad P\{X_i = 1 \mid X_j = 1\} = 1/(n-1), \quad j \ne i.$$

Hence we would certainly hope that the Poisson limit would still remain valid under this type of weak dependence. The results of this example show that it does.

EXAMPLE 1.5(G) A Packing Problem. Suppose that n points are arranged in linear order, and suppose that a pair of adjacent points is chosen at random. We then continue to randomly choose pairs, disregarding any pair having a point previously chosen, until only isolated points remain. For instance, if n = 8, then one possible sequence of choices leaves two isolated points, a later candidate pair such as (3, 4) being disregarded because one of its points had already been chosen (see Figure 1.5.1). We are interested in the mean number of isolated points.

Figure 1.5.1. (Eight points in linear order, illustrating the pairing process.)

Let

$$I_{i,n} = \begin{cases} 1 & \text{if point } i \text{ is isolated} \\ 0 & \text{otherwise,} \end{cases}$$

so that $\sum_{i=1}^{n} I_{i,n}$ represents the number of isolated points. Hence

$$E[\text{number of isolated points}] = \sum_{i=1}^{n} P_{i,n},$$

where $P_{i,n}$ is defined to be the probability that point i is isolated when there are n points. To derive an expression for $P_{i,n}$, note that we can consider the n points as consisting of two contiguous segments, namely $1, \ldots, i-1, i$ and $i, i+1, \ldots, n$: the first chosen pair that touches point i comes from one segment or the other, and point i is isolated if and only if the right-hand extreme point of the first segment and the left-hand extreme point of the second segment are both left vacant. Since these segments are otherwise independent, we see that

$$(1.5.6) \qquad P_{i,n} = P_i P_{n-i+1},$$

where $P_n$ is the probability that the extreme point n (or, by symmetry, 1) of a line of n points will be isolated. To derive an expression for $P_n$, condition on the initial pair, say $(i, i+1)$, and note that this choice breaks the line into two independent segments, $1, \ldots, i-1$ and $i+2, \ldots, n$. Hence, since the extreme point n will be isolated if the extreme point of a set of $n - i - 1$ points is isolated,

$$(n-1)P_n = P_1 + \cdots + P_{n-2}, \quad n \ge 2,$$

with $P_1 = 1$. Substituting $n - 1$ for n gives $(n-2)P_{n-1} = P_1 + \cdots + P_{n-3}$, and subtracting these two equations gives

$$(n-1)P_n - (n-2)P_{n-1} = P_{n-2}, \quad \text{or} \quad P_n - P_{n-1} = -\frac{P_{n-1} - P_{n-2}}{n-1}.$$

Since $P_1 = 1$ and $P_2 = 1$, this yields, in general,

$$P_n = \sum_{i=0}^{n-1} \frac{(-1)^i}{i!}, \qquad P_{i,n} = \sum_{j=0}^{i-1} \frac{(-1)^j}{j!} \sum_{j=0}^{n-i} \frac{(-1)^j}{j!}, \quad 2 \le i \le n-1.$$

For i and $n - i$ large we see from the above that $P_{i,n} \approx e^{-2}$, and it can be shown that the expected number of isolated points satisfies

$$\sum_{i=1}^{n} P_{i,n} \approx n e^{-2} \quad \text{for large } n.$$

EXAMPLE 1.5(H) A Reliability Example. Consider an n component system that is subject to randomly occurring shocks. Suppose that each shock has a value that, independent of all else, is chosen from a distribution G. If a shock of value x occurs, then each component that was working at the moment the shock arrived will, independently, instantaneously fail with probability x. We are interested in the distribution of N, the number of necessary shocks until all components are failed. To compute $P\{N > k\}$, let $E_i$, $i = 1, \ldots, n$, denote the event that component i has survived the first k shocks. Then

$$P\{N > k\} = P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_i P(E_i) - \sum\sum_{i<j} P(E_i E_j) + \cdots.$$

To compute the above probability, let $p_j$ denote the probability that a given set of j components will all survive some arbitrary shock. Conditioning on the shock's value gives

$$p_j = \int P\{\text{set of } j \text{ survive} \mid \text{value is } x\}\,dG(x) = \int (1-x)^j\,dG(x).$$

Since $P(E_{i_1} \cdots E_{i_j}) = p_j^k$, we see that

$$P\{N > k\} = n p_1^k - \binom{n}{2}p_2^k + \binom{n}{3}p_3^k - \cdots + (-1)^{n+1}p_n^k.$$

The mean of N can be computed from the above as follows:

$$E[N] = \sum_{k=0}^{\infty} P\{N > k\} = \sum_{j=1}^{n} (-1)^{j+1}\binom{n}{j}\sum_{k=0}^{\infty} p_j^k = \sum_{j=1}^{n} (-1)^{j+1}\binom{n}{j}\frac{1}{1 - p_j}.$$

The reader should note that we have made use of the identity $E[N] = \sum_{k=0}^{\infty} P\{N > k\}$, valid for all nonnegative integer-valued random variables N (see Problem 1.1).

EXAMPLE 1.5(I) Classifying a Poisson Number of Events. Suppose that we are observing events, and that N, the total number that occur, is a Poisson random variable with mean $\lambda$. Suppose also that each event that occurs is, independent of other events, classified as a type j event with probability $p_j$, $j = 1, \ldots, k$, $\sum_{j=1}^{k} p_j = 1$. Let $N_j$ denote the number of type j events that occur, $j = 1, \ldots, k$, and let us determine their joint probability mass function. For any nonnegative integers $n_j$, $j = 1, \ldots, k$, let $n = \sum_{j=1}^{k} n_j$. Then

$$P\{N_j = n_j, j = 1, \ldots, k\} = P\{N_j = n_j, j = 1, \ldots, k \mid N = n\}P\{N = n\},$$

since $P\{N_j = n_j, j = 1, \ldots, k \mid N \ne n\} = 0$. Now, given that there are a total of N = n events, it follows, since each event is independently a type j event with probability $p_j$, that $N_1, \ldots, N_k$ has a multinomial distribution with parameters n and $p_1, \ldots, p_k$. Therefore,

$$P\{N_j = n_j, j = 1, \ldots, k\} = \frac{n!}{n_1! \cdots n_k!}p_1^{n_1}\cdots p_k^{n_k}\,e^{-\lambda}\frac{\lambda^n}{n!} = \prod_{j=1}^{k} e^{-\lambda p_j}\frac{(\lambda p_j)^{n_j}}{n_j!}.$$

Thus we can conclude that the $N_j$ are independent Poisson random variables with respective means $\lambda p_j$, $j = 1, \ldots, k$.

Before closing this section we should note that the fundamental result $E[X] = E[E[X \mid Y]]$ remains valid even when Y is a random vector. Conditional expectations given that Y = y satisfy all of the properties of ordinary expectations, except that now all probabilities are conditioned on the event that $\{Y = y\}$. Hence, for instance,

$$E\left[\sum_i X_i \,\Big|\, Y = y\right] = \sum_i E[X_i \mid Y = y],$$

implying that $E[\sum_i X_i \mid Y] = \sum_i E[X_i \mid Y]$. Also, for any random variable W, the equality $E[X] = E[E[X \mid Y]]$ applied conditionally gives

$$E[X \mid W = w] = E[E[X \mid W = w, Y] \mid W = w],$$

or, equivalently, $E[X \mid W] = E[E[X \mid W, Y] \mid W]$.

1.5.1 Conditional Expectations and Bayes Estimators

Conditional expectations have important uses in the Bayesian theory of statistics. A classical problem in this area arises when one is to observe data $X = (X_1, \ldots, X_n)$ whose distribution is determined by the value of a random variable $\theta$, which has a specified probability distribution (called the prior distribution). Based on the value of the data X, a problem of interest is to estimate the unseen value of $\theta$. An estimator of $\theta$ can be any function d(X) of the data, and in Bayesian statistics one often wants to choose d(X) to minimize

$$E[(d(X) - \theta)^2 \mid X],$$

the conditional expected squared distance between the estimator and the parameter. Using the facts that (i) conditional on X, d(X) is a constant, and (ii) for any random variable W, $E[(W - c)^2]$ is minimized when $c = E[W]$, it follows that the estimator that minimizes $E[(d(X) - \theta)^2 \mid X]$, called the Bayes estimator, is given by

$$d(X) = E[\theta \mid X].$$

An estimator d(X) is said to be an unbiased estimator of $\theta$ if $E[d(X) \mid \theta] = \theta$. An important result in Bayesian statistics is that the only time that a Bayes estimator is unbiased is in the trivial case where it is equal to $\theta$ with probability 1. To prove this, we start with the following lemma.

Lemma 1.5.1 For any random variable Y and random vector Z,

$$E[(Y - E[Y \mid Z])E[Y \mid Z]] = 0.$$

Proof.

$$E[Y E[Y \mid Z]] = E[E[Y E[Y \mid Z] \mid Z]] = E[E[Y \mid Z]E[Y \mid Z]],$$

where the final equality follows because, given Z, $E[Y \mid Z]$ is a constant and so $E[Y E[Y \mid Z] \mid Z] = E[Y \mid Z]E[Y \mid Z]$. Since the final equality is exactly what we wanted to prove, the lemma follows.

PROPOSITION 1.5.2 If $P\{E[\theta \mid X] = \theta\} \ne 1$, then the Bayes estimator $E[\theta \mid X]$ is not unbiased.

Proof. Letting $Y = \theta$ and $Z = X$ in Lemma 1.5.1 yields

$$(1.5.7) \qquad E[(\theta - E[\theta \mid X])E[\theta \mid X]] = 0.$$

Now let $Y = E[\theta \mid X]$ and suppose that Y is an unbiased estimator of $\theta$, so that $E[Y \mid \theta] = \theta$. Letting $Z = \theta$, we obtain from Lemma 1.5.1 that

$$(1.5.8) \qquad E[(E[\theta \mid X] - \theta)\theta] = 0.$$

Upon adding Equations (1.5.7) and (1.5.8) we obtain

$$E[(\theta - E[\theta \mid X])E[\theta \mid X] + (E[\theta \mid X] - \theta)\theta] = 0,$$

or

$$-E[(\theta - E[\theta \mid X])^2] = 0,$$

implying that, with probability 1, $\theta = E[\theta \mid X]$.

1.6 THE EXPONENTIAL DISTRIBUTION, LACK OF MEMORY, AND HAZARD RATE FUNCTIONS

A continuous random variable X is said to have an exponential distribution with parameter $\lambda$, $\lambda > 0$, if its probability density function is given by

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0, \end{cases}$$

or, equivalently, if its distribution is

$$F(x) = \int_{-\infty}^{x} f(y)\,dy = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0. \end{cases}$$

The moment generating function of the exponential distribution is given by

$$(1.6.1) \qquad \psi(t) = E[e^{tX}] = \frac{\lambda}{\lambda - t}, \quad t < \lambda.$$

All the moments of X can now be obtained by differentiating (1.6.1), and we leave it to the reader to verify that $E[X] = 1/\lambda$ and $\text{Var}(X) = 1/\lambda^2$.

The usefulness of exponential random variables derives from the fact that they possess the memoryless property, where a random variable X is said to be without memory, or memoryless, if

$$(1.6.2) \qquad P\{X > s + t \mid X > t\} = P\{X > s\} \quad \text{for } s, t \ge 0.$$

If we think of X as being the lifetime of some instrument, then (1.6.2) states that the probability that the instrument lives for at least $s + t$ hours, given that it has survived t hours, is the same as the initial probability that it lives for at least s hours. In other words, if the instrument is alive at time t, then the distribution of its remaining life is the original lifetime distribution. The condition (1.6.2) is equivalent to

$$\bar{F}(s + t) = \bar{F}(s)\bar{F}(t),$$

and since this is satisfied when F is the exponential, we see that such random variables are memoryless.

EXAMPLE 1.6(A) Consider a post office having two clerks, and suppose that when A enters the system he discovers that B is being served by one of the clerks and C by the other. Suppose also that A is told that his service will begin as soon as either B or C leaves. If the amount of time a clerk spends with a customer is exponentially distributed with mean $1/\lambda$, what is the probability that, of the three customers, A is the last to leave the post office? The answer is obtained by reasoning as follows: Consider the time at which A first finds a free clerk. At this point either B or C would have just left and the other one would still be in service. However, by the lack of memory of the exponential, it follows that the amount of additional time that this other person has to spend in the post office is exponentially distributed with mean $1/\lambda$. That is, it is the same as if his service was just starting at this point. Hence, by symmetry, the probability that he finishes before A must equal 1/2.

EXAMPLE 1.6(B) Let $X_1, X_2, \ldots$ be independent and identically distributed continuous random variables with distribution F. We say that a record occurs at time n, $n > 0$, and has value $X_n$ if $X_n > \max(X_1, \ldots, X_{n-1})$, where $X_0 = \infty$. That is, a record occurs each time a new high is reached. Let $T_i$ denote the time between the ith and the (i+1)th record. What is its distribution? As a preliminary to computing the distribution of $T_i$, let us note that the record times of the sequence $X_1, X_2, \ldots$ will be the same as for the sequence $F(X_1), F(X_2), \ldots$, and since $F(X_n)$ has a uniform (0, 1) distribution (see Problem 1.2), it follows that the distribution of $T_i$ does not depend on the actual distribution F (as long as it is continuous). So let us suppose that F is the exponential distribution with parameter $\lambda = 1$.

To compute the distribution of $T_i$, we will condition on $R_i$, the ith record value. Now $R_1 = X_1$ is exponential with rate 1. By the lack of memory property of the exponential, $R_2$ has the distribution of an exponential with rate 1 given that it is greater than $R_1$; that is, $R_2$ has the same distribution as $R_1$ plus an independent exponential with rate 1. The same argument shows that $R_i$, the ith record value, has the same distribution as the sum of i independent exponentials with rate 1. But it is well known (see Problem 1.29) that such a random variable has the gamma distribution with parameters (i, 1). That is, the density of $R_i$ is given by

$$f_{R_i}(t) = e^{-t}\frac{t^{i-1}}{(i-1)!}, \quad t \ge 0.$$

Conditioning on $R_i$ yields

$$P\{T_i > k\} = \int_0^{\infty} P\{T_i > k \mid R_i = t\}\,e^{-t}\frac{t^{i-1}}{(i-1)!}\,dt = \int_0^{\infty} (1 - e^{-t})^k\,e^{-t}\frac{t^{i-1}}{(i-1)!}\,dt, \quad k \ge 1,$$

where the last equation follows since if the ith record value equals t, then none of the next k values will be records if they are all less than t.

It turns out that not only is the exponential distribution memoryless, but it is the unique distribution possessing this property. To see this, suppose that X is memoryless and let $g(x) = \bar{F}(x) = P\{X > x\}$. Then, by (1.6.2), g satisfies the functional equation

$$g(s + t) = g(s)g(t).$$

However, the only solutions of the above equation that satisfy any sort of reasonable condition (such as monotonicity, right or left continuity, or even measurability) are of the form $g(x) = e^{-\lambda x}$ for some suitable value of $\lambda$. [A simple proof when g is assumed right continuous is as follows. Since $g(s + t) = g(s)g(t)$, it follows that $g(2/n) = g(1/n + 1/n) = g^2(1/n)$. Repeating this yields $g(m/n) = g^m(1/n)$. Also $g(1) = g(1/n + \cdots + 1/n) = g^n(1/n)$, so $g(m/n) = (g(1))^{m/n}$, which implies, since g is right continuous, that $g(x) = (g(1))^x$. Since $g(1) = g^2(1/2) \ge 0$, we obtain $g(x) = e^{-\lambda x}$, where $\lambda = -\log(g(1))$.] Since a distribution function is always right continuous, we must have

$$\bar{F}(x) = e^{-\lambda x}.$$

The memoryless property of the exponential is further illustrated by the failure rate function (also called the hazard rate function) of the exponential distribution. Consider a continuous random variable X having distribution function F and density f. The failure (or hazard) rate function $\lambda(t)$ is defined by

$$(1.6.3) \qquad \lambda(t) = \frac{f(t)}{\bar{F}(t)}.$$

To interpret $\lambda(t)$, think of X as being the lifetime of some item, and suppose that X has survived for t hours and we desire the probability that it will not survive for an additional time dt. That is, consider $P\{X \in (t, t + dt) \mid X > t\}$. Now

$$P\{X \in (t, t + dt) \mid X > t\} = \frac{P\{X \in (t, t + dt), X > t\}}{P\{X > t\}} = \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \approx \frac{f(t)\,dt}{\bar{F}(t)} = \lambda(t)\,dt.$$

That is, $\lambda(t)$ represents the probability intensity that a t-year-old item will fail. Suppose now that the lifetime distribution is exponential. Then, by the memoryless property, it follows that the distribution of remaining life for a t-year-old item is the same as for a new item. Hence $\lambda(t)$ should be constant, and this checks out since

$$\lambda(t) = \frac{f(t)}{\bar{F}(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.$$

Thus, the failure rate function for the exponential distribution is constant. The parameter $\lambda$ is often referred to as the rate of the distribution. (Note that the rate is the reciprocal of the mean, and vice versa.)

It turns out that the failure rate function $\lambda(t)$ uniquely determines the distribution F. To prove this, we note that

$$\lambda(t) = \frac{-\frac{d}{dt}\bar{F}(t)}{\bar{F}(t)}.$$

Integration yields

$$\log \bar{F}(t) = -\int_0^t \lambda(s)\,ds + k.$$

Letting $t = 0$ shows that $k = 0$, and so

$$\bar{F}(t) = \exp\left\{-\int_0^t \lambda(s)\,ds\right\}.$$

1.7 SOME PROBABILITY INEQUALITIES

We start with an inequality known as Markov's inequality.

PROPOSITION 1.7.1 Markov's Inequality. If X is a nonnegative random variable, then for any $a > 0$,

$$P\{X \ge a\} \le \frac{E[X]}{a}.$$

Proof. Let $I\{X \ge a\}$ be 1 if $X \ge a$ and 0 otherwise. Then, since $X \ge 0$, it is easy to see that

$$aI\{X \ge a\} \le X.$$

Taking expectations yields the result.

PROPOSITION 1.7.2 Chernoff Bounds. Let X be a random variable with moment generating function $M(t) = E[e^{tX}]$. Then

$$P\{X \ge a\} \le e^{-ta}M(t) \quad \text{for all } t > 0,$$
$$P\{X \le a\} \le e^{-ta}M(t) \quad \text{for all } t < 0.$$

Proof. For $t > 0$,

$$P\{X \ge a\} = P\{e^{tX} \ge e^{ta}\} \le E[e^{tX}]e^{-ta},$$

where the inequality follows from Markov's inequality. The proof for $t < 0$ is similar.

Since the Chernoff bounds hold for all t in either the positive or negative quadrant, we obtain the best bound on $P\{X \ge a\}$ by using that t that minimizes $e^{-ta}M(t)$.

EXAMPLE 1.7(A) Chernoff Bounds for Poisson Random Variables. If X is Poisson with mean $\lambda$, then $M(t) = e^{\lambda(e^t - 1)}$. Hence, the Chernoff bound for $P\{X \ge j\}$ is

$$P\{X \ge j\} \le e^{\lambda(e^t - 1)}e^{-tj}, \quad t > 0.$$

The value of t that minimizes the preceding is that value for which $e^t = j/\lambda$. Provided that $j/\lambda > 1$, this minimizing value will be positive, and so we obtain in this case that

$$P\{X \ge j\} \le e^{-\lambda}\frac{(\lambda e)^j}{j^j}.$$

Our next inequality relates to expectations rather than probabilities.

PROPOSITION 1.7.3 Jensen's Inequality. If f is a convex function, then

$$E[f(X)] \ge f(E[X]),$$

provided the expectations exist.

Proof. We will give a proof under the supposition that f has a Taylor series expansion. Expanding about $\mu = E[X]$ and using the Taylor series with a remainder formula yields

$$f(x) = f(\mu) + f'(\mu)(x - \mu) + f''(\xi)(x - \mu)^2/2 \ge f(\mu) + f'(\mu)(x - \mu),$$

since $f''(\xi) \ge 0$ by convexity. Hence

$$f(X) \ge f(\mu) + f'(\mu)(X - \mu).$$

Taking expectations gives

$$E[f(X)] \ge f(\mu) + f'(\mu)E[X - \mu] = f(E[X]).$$

1.8 LIMIT THEOREMS

Some of the most important results in probability theory are in the form of limit theorems. The two most important are:

Strong Law of Large Numbers.* If $X_1, X_2, \ldots$ are independent and identically distributed with mean $\mu$, then

$$P\left\{\lim_{n\to\infty}(X_1 + \cdots + X_n)/n = \mu\right\} = 1.$$

Central Limit Theorem. If $X_1, X_2, \ldots$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$, then

$$\lim_{n\to\infty} P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx.$$

Thus if we let $S_n = \sum_{i=1}^{n} X_i$, where $X_1, X_2, \ldots$ are independent and identically distributed, then the Strong Law of Large Numbers states that, with probability 1, $S_n/n$ will converge to $E[X_i]$, whereas the Central Limit Theorem states that $S_n$ will have an asymptotic normal distribution as $n \to \infty$.

* A proof of the Strong Law of Large Numbers is given in the Appendix to this chapter.

1.9 STOCHASTIC PROCESSES

A stochastic process $X = \{X(t), t \in T\}$ is a collection of random variables. That is, for each t in the index set T, X(t) is a random variable. We often interpret t as time and call X(t) the state of the process at time t. If the index set T is a countable set, we call X a discrete-time stochastic process, and if T is a continuum, we call it a continuous-time process. Any realization of X is called a sample path. For instance, if events are occurring randomly in time and X(t) represents the number of events that occur in [0, t], then Figure 1.9.1 gives a sample path of X which corresponds to the initial event occurring at time 1, the next event at time 3 and the third at time 4, and no events anywhere else.

Figure 1.9.1. A sample path of X(t) = number of events in [0, t].

A continuous-time stochastic process $\{X(t), t \in T\}$ is said to have independent increments if for all $t_0 < t_1 < t_2 < \cdots < t_n$, the random variables

$$X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$$

are independent. It is said to possess stationary increments if $X(t + s) - X(t)$ has the same distribution for all t. That is, it possesses independent increments if the changes in the process's value over nonoverlapping time intervals are independent, and it possesses stationary increments if the distribution of the change in value between any two points depends only on the distance between those points.

EXAMPLE 1.9(A) Consider a particle that moves along a set of $m + 1$ nodes, labelled $0, 1, \ldots, m$, that are arranged around a circle (see Figure 1.9.2). At each step the particle is equally likely to move one position in either the clockwise or counterclockwise direction. That is, if $X_n$ is the position of the particle after its nth step, then

$$P\{X_{n+1} = i + 1 \mid X_n = i\} = P\{X_{n+1} = i - 1 \mid X_n = i\} = 1/2,$$

where $i + 1 \equiv 0$ when $i = m$ and $i - 1 \equiv m$ when $i = 0$. Suppose now that the particle starts at 0 and continues to move around according to the above rules until all the nodes $1, 2, \ldots, m$ have been visited. What is the probability that node i, $i = 1, \ldots, m$, is the last one visited?

Figure 1.9.2. Particle moving around a circle.

Solution. Surprisingly enough, the probability that node i is the last node visited can be determined without any computations. To do so, consider the first time that the particle is at one of the two neighbors of node i, that is, the first time that the particle is at one of the nodes $i - 1$ or $i + 1$ (with $m + 1 \equiv 0$). Suppose it is at node $i - 1$ (the argument in the alternative situation is identical). Since neither node i nor $i + 1$ has yet been visited, it follows that i will be the last node visited if, and only if, $i + 1$ is visited before i. This is so because in order to visit $i + 1$ before i the particle will have to visit all the nodes on the counterclockwise path from $i - 1$ to $i + 1$ before it visits i. But the probability that a particle at node $i - 1$ will visit $i + 1$ before i is just the probability that a particle will progress $m - 1$ steps in a specified direction before progressing one step in the other direction. That is, it is equal to the probability that a gambler who starts with 1 unit, and wins 1 when a fair coin turns up heads and loses 1 when it turns up tails, will have his fortune go up by $m - 1$ before he goes broke. Hence, as the preceding implies that the probability that node i is the last node visited is the same for all i, and as these probabilities must sum to 1, we obtain

$$P\{i \text{ is the last node visited}\} = 1/m.$$

Remark. The argument used in the preceding example also shows that a gambler who is equally likely to either win or lose 1 unit on each gamble will be losing n before he is winning 1 with probability $1/(n + 1)$, or, equivalently,

$$P\{\text{gambler is up 1 before being down } n\} = \frac{n}{n+1}.$$

Suppose now we want the probability that the gambler is up 2 before being down n. Upon conditioning on whether he reaches up 1 before down n, we obtain

$$P\{\text{up 2 before down } n\} = P\{\text{up 2 before down } n \mid \text{up 1 before down } n\}\frac{n}{n+1}$$
$$= P\{\text{up 1 before down } n + 1\}\frac{n}{n+1} = \frac{n+1}{n+2}\cdot\frac{n}{n+1} = \frac{n}{n+2}.$$

Repeating this argument yields

$$P\{\text{gambler is up } k \text{ before being down } n\} = \frac{n}{n+k}.$$

EXAMPLE 1.9(B) Suppose in Example 1.9(A) that the particle is not equally likely to move in either direction but rather moves at each step in the clockwise direction with probability p and in the counterclockwise direction with probability $q = 1 - p$. If $1/2 < p < 1$, then we will show that the probability that state i is the last state visited is a strictly increasing function of i, $i = 1, \ldots, m$. To determine the probability that state i is the last state visited, condition on whether $i - 1$ or $i + 1$ is visited first. If $i + 1$ is visited first, then i is last if and only if a particle at $i + 1$ reaches $i - 1$ before i; that is, if a gambler who wins each 1 unit bet with probability p has her cumulative fortune increase by $m - 1$ before it decreases by 1. Let its value be $P_1$, and note that it does not depend on i. Similarly, if $i - 1$ is visited first, then i is last with the probability $P_2$ that a gambler who wins each 1 unit bet with probability q will have her cumulative fortune increase by $m - 1$ before it decreases by 1. Since $p > q$, we have $P_1 > P_2$. Hence,

$$P\{i \text{ is last state}\} = P_1 P\{i + 1 \text{ before } i - 1\} + P_2(1 - P\{i + 1 \text{ before } i - 1\})$$
$$= P_2 + (P_1 - P_2)P\{i + 1 \text{ before } i - 1\}.$$

Now, since the event that $i - 1$ is visited before $i + 1$ implies the event that $i - 2$ is visited before i, we have

$$P\{i - 1 \text{ before } i + 1\} \le P\{i - 2 \text{ before } i\},$$

and thus $P\{i + 1 \text{ before } i - 1\}$ is increasing in i. Since $P_1 > P_2$, we can conclude that

$$P\{i - 1 \text{ is last state}\} < P\{i \text{ is last state}\}.$$

EXAMPLE 1.9(C) A graph consisting of a central vertex and rays emanating from that vertex is called a star graph (see Figure 1.9.3). Let r denote the number of rays of a star graph and let ray i consist of $n_i$ vertices, $i = 1, \ldots, r$. Suppose that a particle moves along the vertices of the graph so that it is equally likely to move from whichever vertex it is presently at to any of the neighbors of that vertex, where two vertices are said to be neighbors if they are joined by an edge. Thus, for instance, when at vertex 0 (the central vertex) the particle is equally likely to move to any of its r neighbors. The vertices at the far ends of the rays are called leafs. What is the probability that, starting at node 0, the first leaf visited is the one on ray i, $i = 1, \ldots, r$?

Figure 1.9.3. A star graph.

Solution. Let L denote the first leaf visited. Conditioning on the first ray visited yields

$$(1.9.1) \qquad P\{L = i\} = \frac{1}{r}\sum_{j=1}^{r} P\{L = i \mid \text{first ray visited is } j\}.$$

Now, if j is the first ray visited (that is, the first move of the particle is from vertex 0 to its neighboring vertex on ray j), then it follows, from the remark following Example 1.9(A), that with probability $1/n_j$ the particle will visit the leaf at the end of ray j before returning to 0 (for this is the complement of the event that the gambler will be up 1 before being down $n_j - 1$). Moreover, if it does return to 0 before reaching the end of ray j, then the problem in essence begins anew. Hence,

$$P\{L = i \mid \text{first ray visited is } i\} = \frac{1}{n_i} + \left(1 - \frac{1}{n_i}\right)P\{L = i\},$$
$$P\{L = i \mid \text{first ray visited is } j\} = \left(1 - \frac{1}{n_j}\right)P\{L = i\}, \quad j \ne i.$$

Substituting the preceding into Equation (1.9.1) yields

$$P\{L = i\} = \frac{1}{r n_i} + P\{L = i\}\frac{1}{r}\sum_{j=1}^{r}\left(1 - \frac{1}{n_j}\right),$$

or

$$P\{L = i\} = \frac{1/n_i}{\sum_{j=1}^{r} 1/n_j}, \quad i = 1, \ldots, r.$$

PROBLEMS

1.1. Let N denote a nonnegative integer-valued random variable. Show that

$$E[N] = \sum_{k=1}^{\infty} P\{N \ge k\} = \sum_{k=0}^{\infty} P\{N > k\}.$$

In general show that if X is nonnegative with distribution F, then

$$E[X] = \int_0^{\infty} \bar{F}(x)\,dx \quad \text{and} \quad E[X^n] = \int_0^{\infty} n x^{n-1}\bar{F}(x)\,dx.$$

1.2. If X is a continuous random variable having distribution F, show that:
(a) F(X) is uniformly distributed over (0, 1);
(b) if U is a uniform (0, 1) random variable, then $F^{-1}(U)$ has distribution F, where $F^{-1}(x)$ is that value of y such that $F(y) = x$.

1.3. Let $X_n$ denote a binomial random variable with parameters $(n, p_n)$, $n \ge 1$. If $np_n \to \lambda$ as $n \to \infty$, show that, as $n \to \infty$,

$$P\{X_n = i\} \to e^{-\lambda}\lambda^i/i!.$$

1.4. Compute the mean and variance of a binomial random variable with parameters n and p.

1.5. Suppose that n independent trials, each of which results in outcome i with probability $P_i$, $i = 1, \ldots, r$, $\sum_{i=1}^{r} P_i = 1$, are performed. Let $N_i$ denote the number of trials resulting in outcome i.
(a) Compute the joint distribution of $N_1, \ldots, N_r$. This is called the multinomial distribution.
(b) Compute $\text{Cov}(N_i, N_j)$.
(c) Compute the mean and variance of the number of outcomes that do not occur.

1.6. Let $X_1, X_2, \ldots$ be independent and identically distributed continuous random variables. We say that a record occurs at time n, $n > 0$, and has value $X_n$ if $X_n > \max(X_1, \ldots, X_{n-1})$, where $X_0 = \infty$.
(a) Let $N_n$ denote the total number of records that have occurred up to (and including) time n. Compute $E[N_n]$ and $\text{Var}(N_n)$.
(b) Let $T = \min\{n : n > 1 \text{ and a record occurs at } n\}$. Compute $P\{T > n\}$ and show that $P\{T < \infty\} = 1$ and $E[T] = \infty$.
(c) Let $T_y$ denote the time of the first record value greater than y. That is,

$$T_y = \min\{n : X_n > y\}.$$

Show that $T_y$ is independent of $X_{T_y}$. That is, the time of the first value greater than y is independent of that value. (It may seem more intuitive if you turn this last statement around.)

1.7. Let X denote the number of white balls selected when k balls are chosen at random from an urn containing n white and m black balls. Compute E[X] and Var(X).

1.8. Let $X_1$ and $X_2$ be independent Poisson random variables with means $\lambda_1$ and $\lambda_2$.
(a) Find the distribution of $X_1 + X_2$.
(b) Compute the conditional distribution of $X_1$ given that $X_1 + X_2 = n$.

1.9. A round-robin tournament of n contestants is one in which each of the $\binom{n}{2}$ pairs of contestants plays each other exactly once, with the outcome of any play being that one of the contestants wins and the other loses. Suppose the players are initially numbered $1, 2, \ldots, n$. The permutation $i_1, \ldots, i_n$ is called a Hamiltonian permutation if $i_1$ beats $i_2$, $i_2$ beats $i_3, \ldots,$ and $i_{n-1}$ beats $i_n$. Show that there is an outcome of the round-robin for which the number of Hamiltonian permutations is at least $n!/2^{n-1}$. (Hint: Use the probabilistic method.)

1.10. Consider a round-robin tournament having n contestants, and let k, $k < n$, be a positive integer such that $\binom{n}{k}[1 - 1/2^k]^{n-k} < 1$. Show that it is possible for the tournament outcome to be such that for every set of k contestants there is a contestant who beat every member of that set.
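As a closing numerical check on Example 1.9(A), the following simulation sketch (not part of the original text; Python is an illustrative choice) estimates the probability that each node of the circle is the last one visited; with m = 6, each of nodes 1 through 6 should be last about 1/6 of the time:

```python
import random

def last_new_node(m, rng):
    """Random walk on nodes 0, ..., m arranged in a circle, started at node 0.
    Returns the last node to be visited for the first time."""
    pos, visited, last = 0, {0}, 0
    while len(visited) < m + 1:
        # Step one position clockwise or counterclockwise with equal probability.
        pos = (pos + rng.choice((1, -1))) % (m + 1)
        if pos not in visited:
            visited.add(pos)
            last = pos
    return last

rng = random.Random(2)
m, trials = 6, 60_000
counts = [0] * (m + 1)
for _ in range(trials):
    counts[last_new_node(m, rng)] += 1
# By Example 1.9(A), each of nodes 1, ..., m is last with probability 1/m.
print([round(c / trials, 3) for c in counts[1:]], " 1/m =", round(1 / m, 3))
```

The estimated frequencies come out essentially equal across nodes, which is the surprising uniformity the example establishes without computation.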
where Fh is the distribution function of h(X) However. (1. = 11X. it follows that P{X.. Var(X.. = 1} 1 1 -.] = lIn. and so E[X.~ = 1} = P{X.) Also =. XI] 2: Var(X.r1) n 1 Now.. = 1. At a party n people put their hats in the center of a room where the hats are mixed together Each person then randomly selects one We are interested in the mean and variance of X~the number that select their own hat To solve. E[X. + XII' where X = .n-l' .) + 2 2:2: Cov(X.X.] = P{X. as the ith person is equally likely to select any of the n hats.}X) 1=1 II EXAMPLE 1. 1(1 -.= { 0 and thus 1 if the ith and jth party goers both select their own hats otherwise.3{A) The Matching Problem.10 PRELIMINARIES The corresponding property for variances is that (134) Var [t. =--.. = l}P{X. = 1} = lin. we use the representation X = XI + X 2 + .:. X/X. {1 if the ith person selects his or her own hat otherwise 0 Now. that occur. ifN= 0 if N> O.3. .~) = n(n . Let AI.EXPECfED VALUE 11 Hence.3. . .5(f) for an explanation as to why these results are not surpnsing ) EXAMPLE 1.5) But by the binomial theorem.3. (1.4)... (~) n (n 1 2 1) Thus both the mean and variance of the number of matches are equal to 1.6) (I-I)"~ ~ (~) (-I)' = i 1=0 (~) (-1)' 1 since (7) = 0 when i > m. A useful identity can be obtained by noting that (1. Cov(X. from (1.3) and (1. ••• .. E[X] = 1 and Var(X) n: 1 + 2 =1.. 1 :S j :s.n by I = 0 } {I if A} occurs otherwise.1) - 1 (1)2 . Letting then N d~notes the number of the A}. (See Example 1. A 2 .. j 1. n. All denote events and define the indicator variables I}.3.1) 1 Therefore.3(a) Some Probability Identities. = n 2(n . .3.3.7) yields (1.). E [(~)] = E[number of pairs of the A.3. that occur] = LL P(A..A). However.3.I or (1.8) E[Il = E[N] . 1<. + (-1)n+IE [(:)].6) yield 1.5) and (1.u Hence. E[I] = P{N> O} == P{at least one of the A..3.E [(:)] + . if we let ifN>O PRELIMINARIES if N = 0. then (1.7) =~ (~) (-1)' 1= ~ (~) (-1)1+1 Taking expectations of both sides of (1. occurs} and E[N] = E [~~] = ~ P(A. E [(~)] = E[number of sets of sIZe i that occur] Hence. . Taking expectations of both sides of the above yields E[I.A.3. ( .8) is a statement of the well-k~own identity P( 9 Ai) = ~ peA.] = n-' (-1)' (' + 2. = I. An occur.') (-1)'.=0 n-' =2.+ 1 i) (-1)'.A •• k) . . 2. .) - ~<f P(A. i) E [(.=0 . Then define I = . Other useful identities can also be denved by this approach.. (N . ..)] N 1 . in general. suppose we want a formula for the probability that exactly . . + (-1)n+lp(A IA 2 ' An). {1 if N=. = N-. (N) (N . otherwise 0 and use the identity (~) (1 or I.) (' + 2 N .+ . by the same reasoning.. .A.. (:) 1)N-.) + ~~f: P(A. (1. . For instance.=0' 1 = n-..EXPECTED VALUE 13 and.') (-1)' .=0 . of the events AI. 1 '. 2. 4. (1.5). we see from (1 3. (3.3.. and arbitrarily number them as 1. Show that for any graph there is a subset of nodes A such that at least one-half of the edges have one of their nodes in A and the other in AC (For instance. 5)}. To verify B this. called nodes. (1. L P(A"A . 2. + i boxes is empty. .3). EXAMPLE 1.4). Let us compute the probability that exactly' of the boxes are empty.3).+ i and each term in the sum is equal to the probability that a given set of . Suppose that the graph contains m edges. if we let C(B) denote the number of edges that have exactly one of their nodes in B. of the events AI.9) that P{exactly. - n i)m . attempts to solve deterministic problems by first introducing a probability structure and then employing probabilistic reasoning. 5} and the set of edges E = {(1. 
much employed and popularized by the mathematician Paul Erdos.3 1 we could take A = {l. (3. let us introduce probability by randomly choosing a set of nodes S so that each node of the graph is independently in S with probability 112 If we now let X denote the number of edges in .14 PRELIMINARIES or (1 39) P{exactly. independent of the locations of the other balls.~O . By letting A. each ball is equally likely to go into any of the n boxes. As an application of (1 3 9) suppose that m balls are randomly put in n boxes in such a way that.. 4).=0 (' + i) ( ~I I "+ 1 < <I '" n .. 2). (2.z . Figure 1. 3. ". then the problem is to show that max C(B) ~ m12. and a set of (unordered) pairs of nodes. 4} ) where the above follows since Solution. AI. consists of ( n ) terms . Our next example illustrates what has been called the p. m For any set of nodes B. • An occur} = ~ (-I)' (' + i) L .) ( 1 -+ .1 illustrates a graph with the set of nodes N = {I. of the boxes are empty} n-r =L(-I)' . (2.obabilistic method This method. 2.3(c) A graph is a set of elements. in the graph illustrated in Figure 1. denote the event that the ith box is empty.) r 11<lz< <I. called edges For instance. 4 MOMENT GENERATING. CHARACTERISTIC FUNCflONS. CHARACTERISTIC FUNCTIONS. then E[X] = E [~XI] = ~ E[XI] = ml2 Since at least one of the possible values of a random variable must be at least as large as its mean. AND LAPLACE TRANSFORMS The moment generating function of X is defined by rfJ(t) = E [e rX ] = f rx e dF(x) .MOMENT GENERATING.3. letting XI equal 1 if edge i has exactly one of its nodes in S and letting it be 0 otherwise. provided that the graph is such that C(8) is not constant. we can thus conclude that C(8) ~ ml2 for some set of nodes 8 (In fact. the graph that have exactly one of their nodes in S. AND LAPLACE 15 Figure 1.1.9 and 1 10 give further applications of the probabilistic method 1. then X is a random variable whose set of possible values is all of the possible values of C(8) Now. A graph. we can conclude that C(8) > ml2 for some set of nodes 8 ) Problems 1. rjJ'(t) = E[X 2e'X ] ".(I . and then evaluating at t = 0 That is.(] _ p)' " r.p))n np np(1 . cb(t) Mean Variance (:) pI(1 .l and /J. O:5p::S1 Poisson with parameter A > 0 Probability Mass Function. I.x -= 1. .I)} e xl' x 0. Thi~ i~ quite important because it enables us to characterize the probability di~tribution of a random vanable by its generating function EXAMPLE 1.p) = 0.n exp{A(e ' .p)n-. p(x) Moment Generating Function. x -A (pel + (I . Geometric with parameter 0 ::s p:51 Negative binomial with parameters r.I)p. pe' I . p. 2.p)e ' pe' - I p I-p pl (x ..2 and respective variances Table 1. ".16 PRELIMINARIES All the moment~ of X can be successively obtained by differentiating".p)e' ( )' I . p p(1 _ py-l.n(t) Evaluating at t = = E [X"e IX ].p) pl = r. >.4(A) Let X able~ with re~pective and Y be independent normal random varimeans /J. r + 1.(1 . 0 yield~ n~l It should be noted that we have assumed that it is justifiable to interchange the differentiation and integration operations.4.I(t) = E[Xe 'X ].] x - r P r(1 . x = 1. This is usually the case When a moment generating function exists.. .2. it uniquely determines the di~tnbution.1 Discrete Probability Distribution Binomial with parameters n. Thus the moment generating function of X + Y is that of a normal random vanable with mean ILl + 1L2 and variance at + a~. It can be shown that <p always exists and. like the moment generating function. X 2 = a21 Z 1 + . CHARACTERISTIC FUNCTIONS. 
If for some constants a'l' 1 -< i :=s m.. where the last equality comes from Table 1. and IL" 1 :=s i :=s m.. it is theoretically convenient to define the characteristic function of X by -oo<t<oo. this is the distribution of X + Y. . Let ZI.4(a) The Multivariate Normal Distribution. The moment generating function of their sum is given by r/Jx+y(t) = E [el(x+Y)] = E[eIX]E[e 1Y ] r/Jx(t)r/Jy(t) (by independence) = = exp{(1L1 + 1L2)t + (at + aDt 2/2}. where i = -v=l. By uniqueness.. 1 -< j ~ n.MOMENT GENERATING. Xn by or the joint characteristic function by It may be proven that the joint moment generating function (when it exists) or the joint characteristic function uniquely determines the joint distribution. 1... XI = allZI . We may also define the joint moment generating of the random variables XI. + a2n Z n + 1L2.4. . + alnZn + ILl. uniquely determines the distribution of X. AND LAPLACE 17 at and a~.Zn be independent standard normal random variables.' .2. As the moment generating function of a random vanable X need not exist. EXAMPLE + .. 4>(t) etb - Mean a+b Vanance (b .a) 2 A 12 A2 n A2 0 A A.x~O .4. f(x) ~. b) Exponential with parameter A > 0 Gamma with parameters (n. A).. A > 0 Normal with parameters (p" ( Beta with parameters a.2 Probability Density FunctIOn.a<x<b Ae-. b > 0 cx 1(1 C- _ rea + b) f(a)f(b) a a+b + b)Z(a + b + I) .lr(Ax).~ QIIO Table 1.e-(X-") 12fT C ~ 00 t)" A P.X)H.a)Z 1 ehl t(b .. n ) 1 V2'lT u 4 - 1 ! - <x< < 1 00 ' .t Ae-.t + 2 u2 ab (a > 0. a 2 Moment Generatmg Function. b. 0 < X uZt2} exp { p.-1 (n-I)' .lr.x~ Contmuous Probablhty Dlstnbutlon Umform over (a. t. tm ) = exp {~ t. As in the case of characteristic functions. ate normal distribution. it is sometimes more convenient to use Lap/ace transforms rather than characteristic functions. .l and Cov(X" ~).t. . When dealing with random vanables that only assume nonnegative values.= I the independent normal random variables ZI.~I m m Now. Cov(X" XJ . . i. which shows that the joint distribution of XI. we see that !/I(tl. " . AND LAPLACE . Cov(X" XJ }.Zn it is also normally distributed. X m • The first thing to note is that since ~ t.~I . Let us now consider .MOMENT GENERATING. e- U dF(x). the Laplace transform uniquely determines the distribution. '" . CHARACfERISTIC FUNCfIONS.m. . The Laplace transform of the distribution F is defined by Pes) = f. . This in tegral exists for complex variables s = a + bi.. j = 1.. Xm are said to have a multivari- the joint moment generating function of XI. .X. . + 1/2 ~ t.. . Its mean and variance are and = 2: 2: tit.19 then the random variables XI.lL.. Xm is completely determined from a knowledge of the values of E[X. if Y is a normal random variable with mean IL and variance u 2 then Thus.. where a ~ O. is itself a linear combination of m . 20 PRELIMINARIES We may also define Laplace transforms for arbitrary functions in the following manner: The Laplace transform of the function g. is defined by g(s) = I~ e- SX dg(x) provided the integral exists It can be shown that g determines g up to an additive constant. by P{X = x IY = y } = P{X = x. Y = y} P{Y=y} The conditional distribution function of X given Y = y is defined by F(xly) = P{X < xl Y = y} and the conditional expectation of X given Y = y. in this case. denoted g. by F(xly) = P{X ~ xl Y = y} = to: f(xly) dx The conditional expectation of X.5 CONDITIONAL EXPECTATION If X and Yare discrete random variables. fy(y) and the conditional probability distribution function of X. 1. is defined. 
by E[xIY=Y]= IxdF(xIY)=LxP{X=xIY=y} x If X and Y have a joint probability density function f(x. given Y = y. given Y = y.xf(xly) dx. for all y such that P{Y = y} > 0. is defined for all y such that fy(y) > 0 by f(xly) = f(x. Thus all definitions are exactly as in the unconditional case except that all probabilities are now conditional on the event that Y = y .y). y). the conditional probability mass function of X. by E[XI Y= y] = r". the conditional probability density function of X. given Y = y. is defined. given Y = y. CONDITION AL EXPECTATION '21 Let us denote by £[XI Y] that function of the random variable Y whose value at Y = y is £[XI Y = y].1) states £[X] = 2: £[XI Y= y]P{Y= y}.5 1) says £[X] = J:". then Equation (1. Y = y} y x :::: 2:x x 2: P{X= x. An extremely useful property of conditional expectation is that for all random variables X and Y (151) £[X] = £[£[XI Y]] f £[XI Y= y] dFy(y) when the expectations exist If Y is a discrete random variable. Y= y} y :::: 2:xP{X ::: £[X].1) when X and Y Are Discrete To show £[X] = 2: £[XI Y y y]P{Y = y}. We write the right-hand side of the above as 2: £[XI Y y = y]P{Y = y} = 2: 2:xP{X = xl Y= y}P{Y = y} y x = 2: 2: xP{X = x.51) we see that £[X] is a weighted average of the conditional expected value of X given that Y = y. x} and the result is obtained. then Equation (1 5. each of the terms £[XI Y y] being weighted by the probability of the event on which it is conditioned . Thus from Equation (1. £[XI Y = y]f(y) dy We now give a proof of Equation (1 5 1) in the case where X and Yare both discrete random van abies Proof of (1 5. y While if Y is continuous with density fey). . Let X .. We shall compute the moment generating function of Y = ~I Xr by first conditioning on N. E[e rX ] is the moment generating function of X.22 ExAMPLE PRELIMINARIES The Sum of a Random Number of Random Variables.. Now 1.1)E2[X] + NE[X2]] = E[N] Var(X) + E[N2]E2[X].5(A) E [ exp {t ~ xj IN exp = n] = = E [ = = {t ~ Xr} IN E [exp {t ~ Xr}] n] (by independence) ('I'x(t))n. = and so r/ly(t) = E [exp {t ~ X}] = E[(r/lX(t))N]. X). Evaluating at t == 0 gives E[Y] = E[NE[X]] = E[N]E[X] and E[y2] = E[N(N . ••• denote a sequence of independent and identically distnbuted random vanables. To compute the mean and vanance of Y r/ly (t) as follows: r/ly(t) = ~7 X" we differentiate = E[N(r/lx(t))N-Ir/lX(t)]. and let N denote a nonnegative integer valued random variable that is independent of the sequence XI.. . 1)(r/lx(t))"~-2(l/Ix(t))2 r/lHt) = E[N(N - + N(r/lX(t))N-Ir/I'X(t)]. . where 'l'x(t) Hence. X 2 . and thus X' has the same distnbution as X. Let Y denote the door initially chosen Then Now given that Y = 1. the time when the miner reaches safety. it follows that X E[eIXI Y = 1] = 2. WMPLE1. But once the miner returns to his cell the problem is exactly as before.5.CONDITIONAL EXPECTATION 23 Hence. Therefore. where X' is the number of additional hours to safety after returning to the mine. let E denote an arbitrary event and define the indicator . The first door leads to a tunnel that takes him to safety after two hours of travel The second door leads to a tunnel that returns him to the mine after three hours of traveL The third door leads to a tunnel that returns him to his mine after five hours.2) yields or E[ IX] _ el' e -3-~I-e5l Not only can we obtain expectations by first conditioning upon an appro· priate random variable. let us compute the moment generating function of X. Similarly. E[e'XI Y 2] = E[e '(3+X)] = e3'E[e IX ]. 
Var(Y) = E[y2] E2[y] = E[N] Var(X) + E2{X] Var(N). it follows that X 3 + X'. given that Y = 2. and so = e21• Also.5(S) A miner is trapped in a mine containing three doors. but we may also use this approach to compute proba· bilities. Substitution back into (1. To see this. Assuming that the miner is at all times equally likely to choose anyone of the doors. while the others (those without a match) put their selected hats in the center of the room.1 To compute E[Rn]. that those choosing their own hats depart. mix them up. Therefore.n .5. The proof will be by induction on n.=0 n 1 + E[Rn]P{M 1 + E[Rn]P{M = O} + L E[Rn-.]P{M = .P{M = O}) (since E [M] = = E[Rn]P{M = O} + n(1 which proves the result. ExAMPLE 1. from Equation (1.3(a). 1) . . find E[Rn] where Rn is the number of rounds that are necessary.])P{M = i} . the number of individuals. the number of matches that occur in the first round This gives E[Rn] = L E[R nIM •=0 n = i]P{M = I} .S(c) Suppose in the matching problem.=1 n i} = = O} + L (n . As it is obvious for n = 1 assume that E[Rkl = k for k = 1. the number of rounds needed will equal 1 plus the number of rounds that are req uired when n .P{M = O}) . Example 1. start by conditioning on M.24 PRELIMINARIES random vanable X by x=G E[X] = if E occurs if E does not occur.i people remain to be matched with their hats Therefore. We will now show that E [Rn] = n.1) we obtain that peE) = f peE IY = y) dFy(y). . It follows from the definition of X that peE) E[XI y= y] = p(EI y= y) for any random variable Y. Now. given a total of i matches in the initial round. and then reselect.=1 n i)P{M = i} (by the induction hypothesis) = 1 + E[Rn]P{M = O} + n(1.E[M] . If this process continues until each individual has his or her own hat. E[Rnl = = L (1 + E[Rn_. 1 and B a total of m votes As a similar result is true when we are given that B receives the last vote.m denote the desired probability By conditioning on which candidate receives the last vote counted we have Pn.by induction on n + m As it is obviously true when n + m = I-that is.I = Fn. we see from the above that (153) Pn.Pn.5(D) Suppose that X and Yare independent random variables having respective distributions F and G Then the distribution of X + Y -which we denote by F * G. a I Y = y} dG(y) = roo P{X + y < a I Y = y} dG(y) = roo F(a .. In an election...Pnm -1 • n+m m+n n m We can now prove that Pnm = . where n > m.y) dG(y) = We denote F * F by F2 and in general F * Fn.5(E) The Ballot Problem. candidate A receives n votes and candidate B receives m votes. and call the convolution of F and G-is given by (F* G)(a) = P{X + Y:s a} roo P{X + Y =s. show that the probability that A is always ahead in the count of votes is (n - m)/(n + m) Solution. the n-fold convolution of F with itself. P I •D = I-assume it whenever n + m = k Then when n-m n+m .CONDITIONAL EXPECTATION 2S EXAMPLE 1. Thus Fn.m = .m = P{A always ahead IA receives last vote} _n_ n+m + P{A always ahead IB receives last vote} ~ n+m Now it is easy to see that. Assuming that all orderings are equally likely. given that A receives the last vote. the probability that A is always ahead is the same as if A had received a total of n .lm + . Let Pn. is the distribution of the sum of n independent random variables each having distribution F EXAMPLE 1. 1 EXAMPLE 1.1 = n+m n-m The result is thus proven.p )n. consider successive flips of a coin that always lands on "heads" with probability p. Hence. 
we see that this is just the probability in the ballot problem when m == n .we obtain Pn = P(E) = P(EI M)P(M) + p(EI MC)P(MC) .3) and the induction hypothesis P n. after beginning. The probability that the first time this occurs is at time 2n can be obtained by first conditioning on the total number of heads in the first 2n trials.5.5(F) The Matching Problem Revisited Let us reconsider Example 1 3(a) in which n individuals mix their hats up and then randomly make a selection We shall compute the probability of exactly k matches First let E denote the event that no matches occur. This yields P{first time equal = 2n} = P{first time equal = 2n In heads in first 2n} (2:) pn(1 .. it is easy to see that all possible orderings of the n heads and n tails are equally likely and thus the above conditional probability is equivalent to the probability that in an election in which each candidate receives n votes. I (~ ) p"(l . and let us determine the probability distnbution of the first time.p)" (~) p"(l . P{lirst time equal = 2n} = p". Now given a total of n heads in the first 2n flips.p)' = 2n . one of the candidates is always ahead in the counting until the last vote (which ties them) But by conditioning on whoever receives the last vote.m =-- n n-l-m m n-m+l +-----n +mn . and to make explicit the dependence on n write Pn = P(E) Upon conditioning on whether or not the first individual selects his or her own hat-call these events M and MC. The ballot problem has some interesting applications For example.26 n +m = k PRELIMIN ARIES + 1 we have by (1. that the total number of heads is equal to the total number of tails.1+m m +nn +m .1. The probability of the first of these events is Pn .1 people select from a set of n . clearly PI = 0.2 .55) However.5.5).4).I .Pn . equivalently.Pn .4! and. from Equation (1 5.1 4 3 4 .P2) _ .I n + .1 hats that does not contain the hat of one of them.+-- ( -l)n n' To obtain the probability of exactly k matches. we have and thus. in general. n. from Equation (1. P(E I Me) is the probability of no matches when n . we see that . which is seen by regarding the extra hat as "belonging" to the extra person Since the second event has probability [l/(n . or there are no matches and the extra person does select the extra hat.4) = 0. This can happen in either of two mutually exclusive ways Either there are no matches and the extra person does not select the extra hat (this being the hat of the person that chose first). (1. and so Now. P( ElM) (1 5.2 .CONDITIONAL EXPECTATION 27 Clearly.. we consider any fixed group of k individuals The probability that they. Thus.1 Pn = . n 1 or. P _ P __ (P 3 .1)]Pn . and only . is approximately equal to e-1/kL Thus for n large the number of matches has approximately the Poisson distribution with mean 1. = {1 if the ith person selects his or her own hat otherwise. '" n . A as n .. j ¥: i.1). can be regarded as the number of successes in n tnals when each is a success with probability lin.1.1).. 1 1 +~. it is true that it is a rather weak dependence since. To understand this result better recall that the Poisson distribution with mean A is the limiting distnbution of the number of successes in n independent trials.. when npl'! .. Hence we would certainly hope that the Poisson limit would still remain valid under this type of weak dependence. and suppose that a pair of adjacent points is chosen at random. 
whereas the above result is not immediately applicable because these trials are not independent.5(6) Suppose that n points are arranged in linear order.-n---1 ••. i + 1) is chosen with probability lI(n . Now if we let x. have no matches. 00. = 11X/ = 1} = 1/{n . each resulting in a success with probability PI'!. the pair (i. where Pn . Now. the desired probability of exactly k matches is ---+ .!-':"":' {_1)n-k 3t kt {n .k)! - PRELIM IN ARIES ..(k ..k)t which.1) Pn . We then continue to randomly choose pairs. for n large. for example.. n) (n k)t P = 2' I n-k (k n.k n! Pn .. 2. i = 1. ExAMPLE 1.k is the conditional probability that the other n .... .k individuals. That is.) choices of a set of k individuals. disregarding any pair having a point A Packing Problem.k .... P{X. n .. = 1} = lin and P{X. select their own hats is 1 1 1 _ (n . As there are (. The results of this example show that it does. selecting among their own hats. 0 then the number of matches.28 they. ~~=I X. n.n is defined to be the probability that point i is isolated when there are n points. 5).0 J { if point i is isolated otherwise. (3.1 previously chosen.1.a 2 3 4 5 6 7 Figure 1...n. 2. if n appearance. ..5. i + 1. " i and i..5.. n. (7.. 2. namely.. Since point i will be vacant if and only if the right-hand point of the first segment and the left-hand point of the second segment are both vacant. . 4) is disregarded) as shown in Figure 1 5.JI represents the number of isolated points. in order of For instance.. . and (4... until only isolated points remain. we see that (1.. then L:"1 f. 3). 1. 8).. To denve an expression for the Pn condition on the initial pair-say (i.I1> note that we can consider the n points as consisting of two contiguous segments.. i + 1)-and note that this choice breaks the line into two independent segments-I.. We are interested in the mean number of isolated points. Let That is.CONDITIONAL EXPEL'TATION 29 . .. i-I and i + 2. To derive an expression for P. (2. That is. 4). Pn is the probability that the extreme point n (or 1) will be isolated.-_.. 8 and the random pairs are. .-_.n] 1=1 n L P. Hence E[number of isolated points] = = L E[l. If we let I f IJl . then there will be two isolated points (the pair (3. . .6) Hence the P'JI will be determined if we can calculate the corresponding probabilities that extreme points will be vacant.. . •.. if the initial .. 1=1 n where P. -_. 30 PRELIMINARIES pair is (i.1 = P/I-2 or P _ P /I /I-I = _ Pn ."0 " ' n~2. ~(-1)' L. .I1. then the extreme point n will be isolated if the extreme point of a set of n .1 or (n .1 = PI + ' . 'L-. nThus.l)P/I = PI + .2)Pn .I . this yields P2 . /"0 1'''0 . Since PI = 1 and P3 P4 - P2 = 0.i .= 1 0 = 2.J ._ _ _ /I - 1"1 n . i + 1).l)P/I . in general. J.PI 2 P3 P2 P2= P3 = - 1 2 1 3! or or 1 P3 = 2! 1 1 P4 = 2l . . + P/I-2.J .'L _/1_-1_-1 _PI _.31' 3 and.5. Hence we have /I-I P + _ + P/1_-2 P .3 and subtracting these two equations gives (n . n i P .1 for n gives (n . + Pn . J.-I (-1)//1-' (-1)1 2<i<n-1. + (1)1 ' 3' 2."0 1 1 (-1)11-1 (-I)' ="-. n - 1 'L-. " i = 1.. .(n . Substituting n ..6).. from (1.. J. /I-I L.. J.-.I points is isolated.2 n -1 .1- n..2)Pn .-. ---+ .Pn . 5(H) A Reliability Example. it can be shown from the above that L~: I P. n-the expected number of vacant points-is approximately given by L P. Then P{N> k} = P( Y E.i large we see from the above that P.) = P{set ofj survive Ivalue is x} dG(x) dG(x) Since pf. then each component that was working at the moment the shock arrived will.) p1 .) - = L P(E. 
independently.X)I P(E. Consider an n component system that is subject to randomly occurring shocks Suppose that each shock has a value that.n :::::: e. . is chosen from a distribution G If a shock of value x occurs. instantaneously fail with probability x We are interested in the distribution of N.) L P(E. and.<I To compute the above probability let PI denote the probability that a given set of j components will all survive some arbitrary shock Conditioning on the shock's value gives PI = f = f (1 . n.Et ) we see that = p~. P(E.Et ) .CONDITIONAL EXPECTATION 31 For i and n .n :::::: (n + 2)e. independent of all else. in fact.=1 n 2 for large n ExAMPLE 1. denote the event that component i has survived the first k shocks. the number of necessary shocks until all components are failed To compute P{N > k} let E" i = 1. P{N> k} = npt - (~) p1 + (.2.. . we have that . Therefore. kiN = n}P{N = n}. valid for all nonnegative integer·valued random variables N (see Problem 1.k IN = n}P{N = n} + P{NI = nl. . k. = P{NI = n"j = 1. that N . and that N. independent of other events. k.. E [N] The reader should note that we have made use of the identity = ~.k} ~ n}P{N ¥= n} P{NI = n"j= 1. Now. is a Poisson random variable with mean A. since N 1. . Nk has a multinomial distribution with parameters Pl .. nand . Pk..32 PRELIMINARIES The mean of N can be computed from the above as follows: E[N]:=o L P{N > k} k=O ex> :=0 ~ i (-1)1+1 ~P~ n ex> (n) I == ~ (~) 1=1 (-1)'+1. Suppose also that each event that occurs is.... 1= 1 ~ PI k = 1 Let NI denote the number of type j events that occur. . . function For any nonnegative integers n" j Then.. .. . classified as a type j event with probability PI' j = 1. Suppose that we are observing events. the total number that occur. given that there are a total of N = n events it follows.P2' .=o P{N > k}. k. .k IN = P{NI = n"j = 1. ..5(1) Classifying a Poisson Number of Events.j = 1. since each event is independently a type j event with probability PI' 1 :S j < k.P.. N 2 . 1 . . . let n = =~ I 1= 1 ~ nl • k NI .1) EXAMPLE 1. and let us determine their joint probability mass = j = 1. 1 Conditional Expectations and Bayes Estimators Conditional expectations have important uses in the Bayesian theory of statistics A classical problem in this area arises when one is to observe data X = (Xl. equivalently. and in Bayesian statistics one often wants to choose d(X) to minimize E [(d(X) .5.. 1. An estimator of 8 can be any function d(X) of the data. are independent Poisson random variables with respective means API' j = 1. = w] E[E[XI W = w. and (ii) for any random variable W. d(X) is a constant. . .I Y = y] Also. Xn) whose distribution is determined by the value of a random variable 8. the conditional expected squared distance between the estimator and the parameter Using the facts that (I) conditional on X. Y] I W = w] E[XI W] = E[E[XI W. . Y] IW] Also.C)2] is minimtzed when c = E[W] . from the equality E [X] = E[E [Xl Y]] = we can conclude that E[XI W or..8)21 X]. Conditional expectations given that Y = y satisfy all of the properties of ordinary expectations.. we have that E implying that [~XII Y = y] = ~ E[X. E[(W . which has a specified probability distribution (called the prior distribution) Based on the value of the data X a problem of interest is to estimate the unseen value of 8. k.CONDITIONAL EXPECTATION Thus we can conclude that the N. we should note that the fundamental result E[X] = E[E[XI Y]] remains valid even when Y is a random vector. 
except that now all probabilities are conditioned on the event that {Y = y} Hence. . 5.2 If P{E[OIX] :::: O} -=1= 1 then the Bayes estimator E[OIX] is not unbiased Proof Letting Y = 0 and Z (15. PROPOSITION :1. Lemma :1.8) E[(E[O!X] 8)0] =0 Upon adding Equations (1. An important result in Bayesian statistics is that the only time that a Bayes estimator is unbiased is in the tnvial case where it is equal to (1 with probability 1.5.34 PRELIMINARIES it follows that the estimator that minimizes E[(d(X) .0)0] = 0 .5. the lemma follows.:1 For any random variable Y and random vector Z E[(Y E[YI Z])E[YI Z]] =0 Proof E[YE[YIZ]] = E[E[YE[YI Z] IZ]] = E[E[YI Z]E[YI Z]] where the final equality follows because. is given by d(X) = E[OIX].(1)2IX].7) = X in Lemma 151 yields that E[(O E[OIX])E[OIX]] =0 Now let Y = E[ 01 X] and suppose that Y is an unbiased estimator of 0 so that E[Y! 0] = 0 Letting Z = 0 we obtain from Lemma 151 that ( 1. given Z. called the Bayes estimator. (1 An estimator d(X) is said to be an unbiased estimator of E [d(X) 10] if o.E[OIX])E[O!X]] + E[(E[OIX] . E[YI Z] is a constant and so E[YE[YI Z] IZ] = E[YIZ]E[YI Z] Since the final equality is exactly what we wanted to prove. we start with the following lemma. To prove this.57) and (1 58) we obtain that E[(8 . if its distribution is I .E[8IX] =0 1.:: F(x) = IX -00 0 f(y) dy = { 0 x< O. E[(O . or. with probability 1.e.Ax x.6 THE EXPONENTIAL DISTRIBUTION. LACK OF MEMORY.EXPONENTIAL DISTRIBUTION. The moment generating function of the exponential distribution is given by (1. 8 . A > O. if (16. equivalently. LACK OF MEMORY.:: O..1) All the moments of X can now be obtained by differentiating (1. t . where a random variable X is said to be without memory. Yar(X) = 11 A2.2) P{X> s + tlX> t} = P{X> s} for s.E[8IX]f] = 0 implying that.1). -E[(O .0)0] =0 or. The usefulness of exponential random variables denves from the fact that they possess the memoryless property. AND HAZARD RATE FUNCTIONS A continuous random vanable X is said to have an exponential distribution with parameter A. . if its probability density function is given by A -Ax f(x) = { oe x2::0 x<O.6..6. or memoryless.E[OIX])E[OIX] + (E[OIX] . HAZARD RATE FUNCTIONS 35 or. and we leave it to the reader to verify that E[X] = l/A. let us note that the record times of the sequence X" X 2 .and since F(X) has a uniform (0. the ith record value. A is the last to leave the post office? The answer is obtained by reasoning as follows: Consider the time at which A first finds a free clerk. we see that such random variables are memoryless ExAMPlE 1.2) is equivalent to F(s + t) F(s)F(t). However. it follows that the amount of additional time that this other person has to spend in the post office is exponentially distributed with mean 11 A. Let 1i denote the time between the ith and the (i + l)th record.2). a record occurs each time a new high is reached. Suppose also that A is told that his service will begin as soon as either B or C leaves. then (1. •• be independent and identically distributed continuous random variables with distribution F. and has value Xn if Xn > max(XI! . So let us suppose that Fis the exponential distribution with parameter A 1 To compute the distribution of T" we will condition on R. by the lack of memory of the exponential. . That is. X 2 . We say that a record occurs at time n. what is the probability that.. by symmetry. • will be the same as for the sequence F(X. What is its distribution? As a preliminary to computing the distribution of 1i. 
if the instrument is alive at time t.).). of the three customers. n > 0. given that it has survived t hours. In other words. A t this point either B or C would have just left and the other one would still be in service. and since this is satisfied when F is the exponential. and suppose that when A enters the system he discovers that B is being served by one of the clerks and C by the other. Now RI = XI is exponential with rate 1. it is the same as if he was just starting his service at this point Hence. The condition (1. But by the lack of memory property of the exponential this ExAMPlE ..1) distribution (see Problem 1. then the distribution of its remaining life is the original lifetime distribution.6. it follows that the distribution of 1i does not depend on the actual distribution F (as long as it is continuous). If the amount of time a clerk spends with a customer is exponentially distributed with mean 11 A. F(X2).Xn -..6(A) Consider a post office having two cler ks. the probability that he finishes before A must equal ~ 1.36 PRELiMINARIES If we think of X as being the lifetime of some instrument. where Xo = 00. R2 has the distribution of an exponential with rate 1 given that it is greater than R. . is the same as the initial probability that it lives for at least s hours.6(B) Let XI.62) states that the probability that the instrument lives for at least s + t hours. That is. the density of R. it follows that g(2In) = g(lIn + lin) = g2(lIn). To see this. g(mln) = (g(l))mln. + lin) = gn(lIn). where A = -log(g(l))] Since a distribution function is always . yields i ~ 1. Hence R2 has the same distribution as the sum of two independent exponential random variables with rate 1. we obtain g(x) = e A. has the same distribution as the sum of i independent exponentials with rate 1. Repeating this yields g(mln) = g"'(lIn). Also g(l) = g(lIn + . HAZARD RAIE FUNCTIONS 37 means that R2 has the same distribution as RI plus an independent exponential with rate 1.. right or left continuity. which implies. Hence. suppose that X is memoryless and let F(x) = P{X> x} Then F(s + t) = F(s)F(t). But it is well known (see Problem 1. That is..EXPONENTIAL DIS I RIBUIION. Hence." but it is the unique distribution possessing this property. It turns out that not only is the exponential distribution "memoryless. That is. or even measurability) are of the form g(x) = e. since g is right continuous.. the only solutions of the above equation that satisfy any sort of reasonable condition (such as monotonicity. that g(x) = (g(l)).29) that such a random variable has the gamma distribution with parameters (i..Ax for some suitable value of A. where the last equation follows since if the ith record value equals t. Since g(l) = g2(1I2) ~ 0. [A simple proof when g is assumed right continuous is as follows. The same argument shows that R. 1).. conditioning on R. then none of the next k values will be records if they are all less than t. Fsatisfies the functional equation g(s + t) = g(s)g(t). LACK OF MEMORY. However. Since g(s + t) = g(s)g(t). is given by t ~O. and vice versa) It turns out that the failure rate function A(t) uniquely determines the distribution F.t + dt)} P{X> t} ~---- f(t) dt F(t) = A(t) dt That is. think of X as being the lifetime of some item.38 PRELIMINARIES right continuous. t + dt). To prove this.3) A(t) = [(f) F(t) To interpret A(t). t = P{X E (t. the failure rate function for the exponential distribution is constant The parameter A is often referred to as the rate of the distribution. 
This checks out since Thus. (Note that the rate is the reciprocal of the mean.. we must have The memoryless property of the exponential is further illustrated by the failure rate function (also called the hazard rate function) of the exponential distribution Consider a continuous random variable X having distribution function F and density f The failure (or hazard) rate function A(t) is defined by (1 6. and suppose that X has survived for t hours and we desire the probability that it will not survive for an additional time dt That is. X > t} + dt ) IX > t} P{X> t} _ P{XE (t. by the memoryless property. it follows that the distribution of remaining life for a t-year-old item is the same as for a new item. we note that A( t) = ---=d::--t_ F(t) .F(t) d - . A(t) represents the probability intensity that a t-year-old item will faiL Suppose now that the lifetime distribution is exponential Then. consider P{X E (t. Hence A(t) should be constant. t + dt) IX> t} Now P{X E ( t. ::: a and 0 otherwise.SOME PROBABILITY INEQUALITIES 39 Integration yields log F(t) = or f>\(I) dt + k Letting t == 0 shows that c = 1 and so F(t) ~ exp{ .:: a} =s. .7. E[X]/a Proof Let I{X .::: a} be 1 jf X.7 SOME PROBABILITY INEQUALITIES We start with an inequality known as Markov's inequality. X Taking expectations yields the result PROPOSITION 1.1 Markov's Inequality If X is a nonnegative random variable.::: a} =s. a} =s.2 Chernoff Bounds Let X be a random variable with moment generating function M(t) for a> 0 P{X. Lemma 1.::: a} =s. e-lUM(t) E[e'X ] Then foral! t > 0 for all t < O.f~ A(t) dt} 1. then for any a > 0 P{X:. Then.::: o that aI{X . e-IOM(I) P{X =s. it is easy to see since X .7. this minimizing value will be positive and so we obtain in this case that Our next inequality relates to expectations rather than probabilities. then M(t) bound for P{X ~ j} is Hence. PROPOSITION 1. then E[f(X)] ~ f(E[X]) provided the expectations exist.IJ-) since f"(~) ~ 0 by convexity Hence. The proof for t < 0 is similar Since the Chernoff bounds hold for all t in either the positive or negative quadrant.40 Proof For t > 0 PRELIMINARIES where the inequality follows from Markov's inequality.3 Jensen's Inequality If f is a convex function. the Chernoff The value of t that minimizes the preceding is that value for which e' := j/ A.7. Proof We will give a proof under the supposition that fhas a Taylor series expansion Expanding about IJ. .7(A) Chemoff Bounds/or Poisson Random Variables.= E[X] and using the Taylor series with a remainderformula yields f(x) = f(lJ-) + f'(IJ-)(x .IJ-) + f"(fJ(x ~ ~)2/2 f(lJ-) + f'(IJ-)(x . := eA(e'-I) If Xis Poisson with mean A. Provided that j/ A > 1. we obtain the best bound on P{X :> a} by using that t that minimizes r'DM(t) ExAMPLE 1. t E T} is a collection of random variables. 1.. if events are occurring randomly in time and X(t) represents the number of events that occur in [0. For instance. t]. + X . We often interpret t as time and call X(t) the state of the process at time t. S.X2 .9 STOCHASTIC PROCESSES A stochastic process X = {X(t). If the index set T is a countable set. Any realization of X is called a sample path. That is. X 2 . for each t in the index set T. and if T is a continuum. we call X a discrete-time stochastic process. then lim P XI + . Th us if we let S" = :2::=1 X" where XI . Central Limit Theorem If XI. + X/I)/n = IL} 1.. then the Strong Law of Large Numbers states that. 
then Figure 1 91 gives a sample path of X which corresponds • A proof of the Strong Law of Large Numhers is given in the Appendix to this chapter . we call it a continuous-time process. X(t) is a random variable..j. then P {lim(XI /1-+'" + . •• are independent and identically distnbuted with mean IL and variance a 2./n will converge to E[X. ••• are independent and identically distnbuted with mean IL. whereas the central limit theorem states that S" will have an asymptotic normal distribution as n ~ 00.STOCHASTIC PROCESSES 41 Taking expectations gives that 1. with probability 1.8 LIMIT THEOREMS Some of the most important results in probability theory are in the form of limit theorems.. X 2 . •• are independent and identically distributed.x212 dx.nIL $a } = n-+'" { aVn /I f 1 II -'" e.. The two most important are: StrongJLaw of Large Numbers· If XI. .9(A) P{Xn+ 1 i + tlXn = i} = P{Xn+1 = i -tlXn = i} = 112 where i + 1 0 when i m. the random variables are independent. 1.1. . . the next event at time 3 and the third at time 4. That is. Suppose now that the particle starts at 0 and continues to move around according to the above rules until all the nodes t. it possesses independent increments if the changes in the processes' value over nonoverlapping time intervals are independent.X(t) has the same distribution for all t. 2.. the probability that node i is the last node visited can be determined without any computations... That is. and no events anywhere else A continuous-time stochastic process {X(t). m. It is said to possess stationary increments if X(t + s) . consider the first time that the particle is at one of the two .I = m when i = O. i = 1. < tn. if Xn is the position of the particle after its nth step then EXAMPLE 1.9. that are arranged around a circle (see Figure 1 92) At each step the particle is equally likely to move one position in either the clockwise or counterclockwise direction. . . to the initial event occumng at time 1. .42 PRELIMINARIES 3 2 2 3 t 4 Figure 1. .m have been visited What is the probability that node i.m. labelled 0.. IS the last one visited? Solution Surprisingly enough. A sample palh oj X(t) = number oj events in [0. I]. To do so. and i . t E T} is said to have independent increments if for all to < tl < t2 < . and it possesses stationary increments if the distribution of the change in value between any two points depends only on the distance between those points Consider a particle that moves along a set of m + t nodes. 9. But the probability that a particle at node i . it is equal to the probability that a gambler who starts with 1 unit. .1 steps in a specified direction before progressing one step in the other direction That is. Since neither node i nor i + 1 has yet been visited it follows that i will be the last node visited if. Upon conditioning upon whether he reaches up 1 before down n we . i + 1 is visited before i. This is so because in order to visit i + 1 before i the particle will have to visit all the nodes on the counterclockwise path from i . we obtain P{i is the last node visited} = 11m. i = 1.1 (the argument in the alternative situation is identical).1 before he goes broke.m. that is.1 to i + 1 before it visits i. and wins 1 when a fair coin turns up heads and loses 1 when it turns up tails. Hence.1 or i + 1 (with m + 1 == 0) Suppose it is at node i .2. the first time that the particle is at one of the nodes i . n+1 Suppose now we want the probability that the gambler is up 2 before being down n. 
or equivalently,

P{gambler is up 1 before being down n} = n/(n + 1).

Suppose now we want the probability that the gambler is up 2 before being down n. Upon conditioning on whether he reaches up 1 before down n, we obtain that

P{gambler is up 2 before being down n}
= P{up 2 before down n | up 1 before down n} · n/(n + 1)
= P{up 1 before down n + 1} · n/(n + 1)
= [(n + 1)/(n + 2)] · n/(n + 1) = n/(n + 2).

Repeating this argument yields that

P{gambler is up k before being down n} = n/(n + k).

[Figure 1.9.2. Particle moving around a circle.]

EXAMPLE 1.9(B) Suppose in Example 1.9(A) that the particle is not equally likely to move in either direction, but rather moves at each step in the clockwise direction with probability p and in the counterclockwise direction with probability q = 1 - p. If 1/2 < p < 1, then we will show that the probability that state i is the last state visited is a strictly increasing function of i, i = 1, ..., m.

To determine the probability that state i is the last state visited, condition on whether i - 1 or i + 1 is visited first. Now, if i - 1 is visited before i + 1, then i will be the last state visited if, and only if, i + 1 is reached (from i - 1, going counterclockwise) before i. This is the same as the probability that a gambler who wins each 1 unit bet with probability q will have her cumulative fortune increase by m - 1 before it decreases by 1. Note that this probability does not depend on i, and call it P1. Similarly, if i + 1 is visited before i - 1, then the probability that i will be the last state visited is the same as the probability that a gambler who wins each 1 unit bet with probability p will have her cumulative fortune increase by m - 1 before it decreases by 1. Call this probability P2, and note that, since p > q,

P1 < P2.

Hence we have

P{i is last state} = P1 P{i - 1 before i + 1} + P2 (1 - P{i - 1 before i + 1}).

Now, since the event that i - 1 is visited before i + 1 implies the event that i - 2 is visited before i, we have

P{i - 1 before i + 1} ≤ P{i - 2 before i},

and thus we can conclude that

P{i - 1 is last state} < P{i is last state}.
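Example 1.9(A)'s answer is easy to confirm by simulation. The sketch below is an added illustration in plain Python (the choice m = 5 is arbitrary); each of the nodes 1, ..., m should be last with probability 1/m.

    import random

    def last_node_visited(m):
        # Symmetric walk on nodes 0..m arranged in a circle, started at 0;
        # return the last new node to be visited.
        visited = {0}
        pos = 0
        while len(visited) < m + 1:
            pos = (pos + random.choice((1, -1))) % (m + 1)
            if pos not in visited:
                visited.add(pos)
                last = pos
        return last

    m, trials = 5, 100_000
    counts = [0] * (m + 1)
    for _ in range(trials):
        counts[last_node_visited(m)] += 1
    print([c / trials for c in counts[1:]])   # each entry close to 1/m = 0.2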
EXAMPLE 1.9(C) A graph consisting of a central vertex and rays emanating from that vertex is called a star graph (see Figure 1.9.3). Let r denote the number of rays of a star graph and let ray i consist of n_i vertices, i = 1, ..., r. Suppose that a particle moves along the vertices of the graph so that it is equally likely to move from whichever vertex it is presently at to any of the neighbors of that vertex, where two vertices are said to be neighbors if they are joined by an edge. Thus, for instance, when at the central vertex 0 the particle is equally likely to move to any of its r neighbors. The vertices at the far ends of the rays are called leafs. What is the probability that, starting at node 0, the first leaf visited is the one on ray i, i = 1, ..., r?

[Figure 1.9.3. A star graph.]

Solution. Let L denote the first leaf visited. Conditioning on R, the first ray visited, yields

(1.9.1)   P{L = i} = (1/r) Σ_{j=1}^r P{L = i | first ray visited is j}.

Now, if j is the first ray visited (that is, the first move of the particle is from vertex 0 to its neighboring vertex on ray j), then it follows, from the remark following Example 1.9(A), that with probability 1/n_j the particle will visit the leaf at the end of ray j before returning to 0 (for this is the complement of the event that the gambler will be up 1 before being down n_j - 1). On the other hand, if it does return to 0 before reaching the end of ray j, then the problem in essence begins anew. Hence,

P{L = i | first ray visited is i} = 1/n_i + (1 - 1/n_i)P{L = i},
P{L = i | first ray visited is j} = (1 - 1/n_j)P{L = i}   for j ≠ i.

Substituting the preceding into Equation (1.9.1) yields that

P{L = i} = 1/(r n_i) + P{L = i}(1/r) Σ_{j=1}^r (1 - 1/n_j),

or

P{L = i} = (1/n_i) / Σ_{j=1}^r (1/n_j),   i = 1, ..., r.
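The star-graph formula can likewise be verified by simulation. The sketch below is an added illustration (the ray sizes are arbitrary); it encodes the particle's position as a (ray, depth) pair, with depth 0 meaning the hub.

    import random

    def first_leaf(rays):
        # rays[i] = number of vertices on ray i; the hub is vertex 0.
        ray, depth = random.randrange(len(rays)), 1
        while True:
            if depth == rays[ray]:            # reached the leaf of this ray
                return ray
            if depth == 0:                    # back at the hub: pick a ray uniformly
                ray, depth = random.randrange(len(rays)), 1
            else:                             # interior vertex: step out or in
                depth += random.choice((1, -1))

    rays, trials = [2, 3, 5], 100_000
    counts = [0] * len(rays)
    for _ in range(trials):
        counts[first_leaf(rays)] += 1
    total = sum(1 / n for n in rays)
    print([c / trials for c in counts])        # simulated
    print([(1 / n) / total for n in rays])     # (1/n_i) / sum_j (1/n_j)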
PROBLEMS

1.1. Let N denote a nonnegative integer-valued random variable. Show that

E[N] = Σ_{k=1}^∞ P{N ≥ k} = Σ_{k=0}^∞ P{N > k}.

In general show that if X is nonnegative with distribution F, then

E[X] = ∫_0^∞ F̄(x) dx   and   E[X^n] = ∫_0^∞ n x^{n-1} F̄(x) dx.

1.2. If X is a continuous random variable having distribution F, show that:
(a) F(X) is uniformly distributed over (0, 1);
(b) if U is a uniform (0, 1) random variable, then F^{-1}(U) has distribution F, where F^{-1}(x) is that value of y such that F(y) = x.

1.3. Let X_n denote a binomial random variable with parameters (n, p_n), n ≥ 1. If np_n → λ as n → ∞, show that

P{X_n = i} → e^{-λ}λ^i/i!   as n → ∞.

1.4. Compute the mean and variance of a binomial random variable with parameters n and p.

1.5. Suppose that n independent trials, each of which results in outcome i with probability P_i, i = 1, ..., r, Σ_{i=1}^r P_i = 1, are performed. Let N_i denote the number of trials resulting in outcome i.
(a) Compute the joint distribution of N_1, ..., N_r. This is called the multinomial distribution.
(b) Compute Cov(N_i, N_j).
(c) Compute the mean and variance of the number of outcomes that do not occur.

1.6. Let X_1 and X_2 be independent Poisson random variables with means λ_1 and λ_2.
(a) Find the distribution of X_1 + X_2.
(b) Compute the conditional distribution of X_1 given that X_1 + X_2 = n.

1.7. Let X denote the number of white balls selected when k balls are chosen at random from an urn containing n white and m black balls. Compute E[X] and Var(X).

1.8. Let X_1, X_2, ... be independent and identically distributed continuous random variables. We say that a record occurs at time n, n > 0, and has value X_n if X_n > max(X_1, ..., X_{n-1}), where X_0 = -∞.
(a) Let N_n denote the total number of records that have occurred up to (and including) time n. Compute E[N_n] and Var(N_n).
(b) Let T = min{n: n > 1 and a record occurs at n}. Compute P{T > n} and show that P{T < ∞} = 1 and E[T] = ∞.
(c) Let T_y denote the time of the first record value greater than y. That is,

T_y = min{n: X_n > y}.

Show that T_y is independent of X_{T_y}. (It may seem more intuitive if you turn this last statement around: the time of the first value greater than y is independent of that value.)

1.9. A round-robin tournament of n contestants is one in which each of the (n choose 2) pairs of contestants plays each other exactly once, with the outcome of any play being that one of the contestants wins and the other loses. Suppose the players are initially numbered 1, 2, ..., n. The permutation i_1, ..., i_n is called a Hamiltonian permutation if i_1 beats i_2, i_2 beats i_3, ..., and i_{n-1} beats i_n. Show that there is an outcome of the round-robin for which the number of Hamiltonians is at least n!/2^{n-1}. (Hint: Use the probabilistic method.)

1.10. Consider a round-robin tournament having n contestants, and let k, k < n, be a positive integer such that (n choose k)[1 - 1/2^k]^{n-k} < 1. Show that it is possible for the tournament outcome to be such that for every set of k contestants there is a contestant who beat every member of this set.

1.11. If X is a nonnegative integer-valued random variable then the function P(z), defined for |z| ≤ 1 by

P(z) = E[z^X] = Σ_{j=0}^∞ z^j P{X = j},

is called the probability generating function of X.
(a) Show that ...
(b) With 0 being considered even, show that

P{X is even} = [P(-1) + P(1)]/2.

(c) If X is binomial with parameters n and p, show that

P{X is even} = [1 + (1 - 2p)^n]/2.

(d) If X is Poisson with mean λ, show that

P{X is even} = [1 + e^{-2λ}]/2.

(e) If X is geometric with parameter p, show that

P{X is even} = (1 - p)/(2 - p).

(f) If X is a negative binomial random variable with parameters r and p, show that

P{X is even} = (1/2)[1 + (-1)^r (p/(2 - p))^r].

1.12. If P{0 ≤ X ≤ a} = 1, show that Var(X) ≤ a²/4.
1.13. Consider the following method of shuffling a deck of n playing cards, numbered 1 through n. Take the top card from the deck and then replace it so that it is equally likely to be put under exactly k cards, for k = 0, 1, ..., n - 1. Continue doing this operation until the card that was initially on the bottom of the deck is now on top. Then do it one more time and stop.
(a) Suppose that at some point there are k cards beneath the one that was originally on the bottom of the deck. Given this set of k cards, explain why each of the possible k! orderings is equally likely to be the ordering of the last k cards.
(b) Conclude that the final ordering of the deck is equally likely to be any of the n! possible orderings.
(c) Find the expected number of times the shuffling operation is performed.

1.14. A fair die is continually rolled until an even number has appeared on 10 distinct rolls. Let X_i denote the number of rolls that land on side i. Determine
(a) E[X_1];
(b) E[X_2];
(c) the probability mass function of X_1;
(d) the probability mass function of X_2.

1.15. Let F be a continuous distribution function and let U be a uniform (0, 1) random variable.
(a) If X = F^{-1}(U), show that X has distribution function F.
(b) Show that -log(U) is an exponential random variable with mean 1.

1.16. Let f(x) and g(x) be probability density functions, and suppose that for some constant c, f(x) ≤ cg(x) for all x. Suppose we can generate random variables having density function g, and consider the following algorithm.

Step 1: Generate Y, a random variable having density function g.
Step 2: Generate U, a uniform (0, 1) random variable.
Step 3: If U ≤ f(Y)/(cg(Y)), set X = Y. Otherwise, go back to Step 1.

Assuming that successively generated random variables are independent, show that:
(a) X has density function f;
(b) the number of iterations of the algorithm needed to generate X is a geometric random variable with mean c.
(An executable sketch of this algorithm appears at the end of this problem set.)

1.17. Let X_1, ..., X_n be independent and identically distributed continuous random variables having distribution F. Let X_{i,n} denote the ith smallest of X_1, ..., X_n and let F_{i,n} be its distribution function. Show that:
(a) F_{i,n}(x) = F(x)F_{i-1,n-1}(x) + F̄(x)F_{i,n-1}(x);
(b) F_{i,n-1}(x) = (i/n)F_{i+1,n}(x) + ((n - i)/n)F_{i,n}(x).
(Hints: For part (a) condition on whether X_n ≤ x, and for part (b) start by conditioning on whether X_n is among the i smallest of X_1, ..., X_n.)

1.18. A Continuous Random Packing Problem. Consider the interval (0, x) and suppose that we pack in this interval random unit intervals, whose left-hand points are all uniformly distributed over (0, x - 1), as follows. Let the first such random interval be I_1. If I_1, ..., I_k have already been packed in the interval, then the next random unit interval will be packed if it does not intersect any of the intervals I_1, ..., I_k, and the interval will be denoted by I_{k+1}. If it does intersect any of the intervals I_1, ..., I_k, we disregard it and look at the next random interval. The procedure is continued until there is no more room for additional unit intervals (that is, all the gaps between packed intervals are smaller than 1). Let N(x) denote the number of unit intervals packed in [0, x] by this method. For instance, if x = 5 and the successive random intervals are (0.5, 1.5), (3.1, 4.1), (4, 5), (1.7, 2.7), ..., then N(5) = 3, the packed intervals being (0.5, 1.5), (1.7, 2.7), and (3.1, 4.1).

Let M(x) = E[N(x)]. Show that M satisfies

M(x) = 0,   x < 1;
M(x) = (2/(x - 1)) ∫_0^{x-1} M(y) dy + 1,   x > 1.

1.19. An urn contains a white and b black balls. After a ball is drawn, it is returned to the urn if it is white; but if it is black, it is replaced by a white ball from another urn. Let M_n denote the expected number of white balls in the urn after the foregoing operation has been repeated n times.
(a) Derive the recursive equation

M_{n+1} = (1 - 1/(a + b))M_n + 1.

(b) Use part (a) to prove that

M_n = a + b - b(1 - 1/(a + b))^n.

(c) What is the probability that the (n + 1)st ball drawn is white?

1.20. A coin, which lands on heads with probability p, is continually flipped. Compute the expected number of flips that are made until a string of r heads in a row is obtained.
1.21. Let U_1, U_2, ... be independent uniform (0, 1) random variables, and let N denote the value of n, n ≥ 0, such that

Π_{i=1}^n U_i ≥ e^{-λ} > Π_{i=1}^{n+1} U_i,

where Π_{i=1}^0 U_i ≡ 1. Show that N is a Poisson random variable with mean λ. (Hint: Show by induction on n that P{N = n} = e^{-λ}λ^n/n!.)

1.22. The conditional variance of X, given Y, is defined by

Var(X | Y) = E[(X - E[X | Y])² | Y].

Prove the conditional variance formula

Var(X) = E[Var(X | Y)] + Var(E[X | Y]).

Use this to obtain Var(X) in Example 1.5(B), and check your result by differentiating the generating function.

1.23. Consider a particle that moves along the set of integers in the following manner: if it is presently at i, then it next moves to i + 1 with probability p and to i - 1 with probability 1 - p. Starting at 0, let α denote the probability that it ever reaches 1.
(a) Argue that α = p + (1 - p)α².
(b) Show that

α = 1 if p ≥ 1/2,   α = p/(1 - p) if p < 1/2.

(c) Find the probability that the particle ever reaches n, n > 0.
(d) Suppose that p < 1/2 and also that the particle eventually reaches n, n > 0. If the particle is presently at i, i < n, and n has not yet been reached, show that the particle will next move to i + 1 with probability 1 - p and to i - 1 with probability p. That is, show that

P{next at i + 1 | at i and will reach n} = 1 - p.

(Note that the roles of p and 1 - p are interchanged when it is given that n is eventually reached.)

1.24. In Problem 1.23, let E[T] denote the expected time until the particle reaches 1.
(a) Show that

E[T] = 1/(2p - 1) if p > 1/2,   E[T] = ∞ if p ≤ 1/2.

(b) Show that, for p > 1/2,

Var(T) = 4p(1 - p)/(2p - 1)³.

(c) Find the expected time until the particle reaches n, n > 0.
(d) Find the variance of the time at which the particle reaches n, n > 0.

1.25. Consider a gambler who on each gamble is equally likely to either win or lose 1 unit. Starting with i, show that the expected time until the gambler's fortune is either 0 or k is i(k - i), i = 0, 1, ..., k. (Hint: Let M_i denote this expected time and condition on the result of the first gamble.)

1.26. In the ballot problem compute the probability that A is never behind in the count of the votes.

1.27. Consider a gambler who wins or loses 1 unit on each play with respective probabilities p and 1 - p. What is the probability that, starting with n units, the gambler will play exactly n + 2i games before going broke? (Hint: Make use of the ballot theorem.)

1.28. If X_1, X_2, ..., X_n are independent and identically distributed exponential random variables with parameter λ, show that Σ_{i=1}^n X_i has a gamma distribution with parameters (n, λ). That is, show that the density function of Σ_{i=1}^n X_i is given by

f(t) = λe^{-λt}(λt)^{n-1}/(n - 1)!,   t > 0.

1.29. Verify the formulas given for the mean and variance of an exponential random variable.

1.30. If X and Y are independent exponential random variables with respective means 1/λ_1 and 1/λ_2, compute the distribution of Z = min(X, Y). What is the conditional distribution of Z given that Z = X?

1.31. In Example 1.6(A), if server i serves at an exponential rate λ_i, i = 1, 2, compute the probability that Mr. A is the last one out.

1.32. Show that the only continuous solution of the functional equation

g(s + t) = g(s) + g(t)

is g(s) = cs.

1.33. Derive the distribution of the ith record value for an arbitrary continuous distribution F (see Example 1.6(B)).

1.34. Let X be a random variable with probability density function f(x), and let M(t) = E[e^{tX}] be its moment generating function. The tilted density function f_t is defined by

f_t(x) = e^{tx} f(x)/M(t).

Let X_t have density function f_t.
(a) Show that for any function h(x),

E[h(X)] = M(t)E[exp{-tX_t}h(X_t)].

(b) Show that, for t > 0,

P{X > a} ≤ M(t)e^{-ta} P{X_t > a}.

(c) Show that if P{X_{t*} > a} = 1/2, where t* is the value of t that minimizes M(t)e^{-ta}, then

P{X > a} ≤ (1/2) min_t M(t)e^{-ta} = (1/2)M(t*)e^{-t*a}.

1.35. If X_1 and X_2 are independent nonnegative continuous random variables, show that

P{X_1 < X_2 | min(X_1, X_2) = t} = λ_1(t)/(λ_1(t) + λ_2(t)),

where λ_i(t) is the failure rate function of X_i.

1.36. In Example 1.9(A), determine the expected number of steps until all the states 1, 2, ..., m are visited. (Hint: Let X_i denote the number of additional steps after i of these states have been visited until a total of i + 1 of them have been visited, i = 1, ..., m - 1, and make use of Problem 1.25.)

1.37. A particle moves along the graph shown [a line graph with vertices 0, 1, ..., n] so that at each step it is equally likely to move to any of its neighbors. Starting at 0, show that the expected number of steps it takes to reach n is n². (Hint: Let T_i denote the number of steps it takes to go from vertex i - 1 to vertex i. Determine E[T_i] recursively, first for i = 1, then i = 2, and so on, and make use of Problem 1.25.)

1.38. Consider a star graph consisting of a central vertex and r rays, with one ray consisting of m vertices and the other r - 1 all consisting of n vertices. Let P_m denote the probability that the leaf on the ray of m vertices is the last leaf visited by a particle that starts at 0 and at each step is equally likely to move to any of its neighbors.
(a) Find P_2.
(b) Express P_m in terms of P_{m-1}.

1.39. Suppose that r = 3 in Example 1.9(C) and find the probability that the leaf on the ray of size n_1 is the last leaf to be visited.

1.40. Let X_1, X_2, ... be a sequence of independent and identically distributed continuous random variables. Say that a peak occurs at time n if X_{n-1} < X_n > X_{n+1}. Argue that the proportion of time that a peak occurs is, with probability 1, equal to 1/3.

1.41. Use Jensen's inequality to prove that the arithmetic mean is at least as large as the geometric mean: for nonnegative x_i,

(1/n) Σ_{i=1}^n x_i ≥ (Π_{i=1}^n x_i)^{1/n}.

1.42. Let Y_1, Y_2, ... be independent and identically distributed with

P{Y_n = 0} = α,   P{Y_n > y} = (1 - α)e^{-y}, y > 0.

Define the random variables X_n, n ≥ 0, by X_0 = 0 and ... . Prove that

P{X_n = 0} = α_n,   P{X_n > x} = (1 - α_n)e^{-x}, x > 0.

1.43. For a nonnegative random variable X, show that for a > 0,

P{X ≥ a} ≤ E[X^n]/a^n.

Then use this result to show that n! ≥ (n/e)^n.
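Here is the acceptance-rejection scheme of Problem 1.16 in executable form, as promised above. This is an added sketch, not part of the text; the target f is the Beta(2, 2) density 6x(1 - x) on (0, 1), with g the uniform density and c = 1.5 (since f ≤ 1.5 there), all illustrative choices.

    import random

    def rejection_sample(f, g_sample, g_density, c):
        # Problem 1.16: propose Y ~ g, accept with probability f(Y)/(c g(Y));
        # returns the accepted value and the number of iterations used.
        iterations = 0
        while True:
            iterations += 1
            y = g_sample()
            if random.random() <= f(y) / (c * g_density(y)):
                return y, iterations

    f = lambda x: 6 * x * (1 - x)   # Beta(2,2) density on (0,1)
    samples = [rejection_sample(f, random.random, lambda x: 1.0, 1.5)
               for _ in range(100_000)]
    print(sum(n for _, n in samples) / len(samples))   # close to c = 1.5, per part (b)
    print(sum(x for x, _ in samples) / len(samples))   # close to the Beta(2,2) mean 0.5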
REFERENCES

References 6 and 7 are elementary introductions to probability and its applications. Rigorous approaches to probability and stochastic processes, based on measure theory, are given in References 2, 4, 5, and 8. Example 1.3(C) is taken from Reference 1. Proposition 1.5.2 is due to Blackwell and Girshick (Reference 3).

1. N. Alon, J. Spencer, and P. Erdős, The Probabilistic Method, Wiley, New York, 1992.
2. P. Billingsley, Probability and Measure, 3rd ed., Wiley, New York, 1995.
3. D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, Wiley, New York, 1954.
4. L. Breiman, Probability, Addison-Wesley, Reading, MA, 1968.
5. R. Durrett, Probability: Theory and Examples, Brooks/Cole, California, 1991.
6. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, Wiley, New York, 1957.
7. S. M. Ross, A First Course in Probability, 4th ed., Macmillan, New York, 1994.
8. D. Williams, Probability with Martingales, Cambridge University Press, Cambridge, England, 1991.
APPENDIX: THE STRONG LAW OF LARGE NUMBERS

Strong Law of Large Numbers. If X_1, X_2, ... is a sequence of independent and identically distributed random variables with mean μ, then

P{ lim_{n→∞} (X_1 + X_2 + ... + X_n)/n = μ } = 1.

Proof. To begin, assume that μ, the mean of the X_i, is equal to 0. Although the theorem can be proven without this assumption, our proof of the strong law of large numbers will also assume that the random variables X_i have a finite fourth moment; that is, we will suppose that E[X_i^4] = K < ∞. Let S_n = Σ_{i=1}^n X_i, and consider

E[S_n^4] = E[(X_1 + ... + X_n)(X_1 + ... + X_n)(X_1 + ... + X_n)(X_1 + ... + X_n)].

Expanding the right side of the above will result in terms of the form

X_i³X_j,   X_i²X_jX_k,   and   X_iX_jX_kX_l,

where i, j, k, l are all different. As all the X_i have mean 0, it follows by independence that

E[X_i³X_j] = E[X_i³]E[X_j] = 0,
E[X_i²X_jX_k] = E[X_i²]E[X_j]E[X_k] = 0,
E[X_iX_jX_kX_l] = 0.

Hence the only terms in the expansion having nonzero expectation are those of the form X_i^4 and X_i²X_j². Now, for a given pair i and j there will be (4 choose 2) = 6 terms in the expansion that will equal X_i²X_j². Hence, upon expanding the product and taking expectations term by term,

E[S_n^4] = nE[X_i^4] + 6 (n choose 2) E[X_i²X_j²] = nK + 3n(n - 1)E[X_i²]E[X_j²],

where we have once again made use of the independence assumption. Now, since

0 ≤ Var(X_i²) = E[X_i^4] - (E[X_i²])²,

we see that

(E[X_i²])² ≤ E[X_i^4] = K.

Therefore, from the preceding we have that

E[S_n^4] ≤ nK + 3n(n - 1)K,

which implies that

E[S_n^4/n^4] ≤ K/n³ + 3K/n².

Therefore, it follows that

(*)   Σ_{n=1}^∞ E[S_n^4/n^4] < ∞.

Now, for any ε > 0 it follows from the Markov inequality that

P{S_n^4/n^4 > ε} ≤ E[S_n^4/n^4]/ε,

and thus from (*)

Σ_{n=1}^∞ P{S_n^4/n^4 > ε} < ∞,

which implies, by the Borel-Cantelli lemma, that with probability 1, S_n^4/n^4 > ε for only finitely many n. As this is true for any ε > 0, we can thus conclude that, with probability 1,

lim_{n→∞} S_n^4/n^4 = 0.

But if S_n^4/n^4 = (S_n/n)^4 goes to 0 then so must S_n/n, and so we have proven that, with probability 1,

S_n/n → 0   as n → ∞.

When μ, the mean of the X_i, is not equal to 0, we can apply the preceding argument to the random variables X_i - μ to obtain that, with probability 1,

lim_{n→∞} Σ_{i=1}^n (X_i - μ)/n = 0,

or equivalently,

lim_{n→∞} Σ_{i=1}^n X_i/n = μ,

which proves the result.
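As an added numerical companion to the proof (not part of the text), a single simulated sample path shows the averages S_n/n settling at the mean; the Exponential(1) choice is arbitrary.

    import random

    random.seed(1)
    total = 0.0
    for n in range(1, 10**6 + 1):
        total += random.expovariate(1.0)      # X_i ~ Exponential(1), mean 1
        if n in (10, 100, 10_000, 1_000_000):
            print(n, total / n)               # drifts toward the mean, 1.0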
CHAPTER 2. THE POISSON PROCESS

2.1 THE POISSON PROCESS

A stochastic process {N(t), t ≥ 0} is said to be a counting process if N(t) represents the total number of 'events' that have occurred up to time t. Hence, a counting process N(t) must satisfy:

(i) N(t) ≥ 0;
(ii) N(t) is integer valued;
(iii) if s < t, then N(s) ≤ N(t);
(iv) for s < t, N(t) - N(s) equals the number of events that have occurred in the interval (s, t].

A counting process is said to possess independent increments if the numbers of events that occur in disjoint time intervals are independent. For example, this means that the number of events that have occurred by time t (that is, N(t)) must be independent of the number of events occurring between times t and t + s (that is, N(t + s) - N(t)).

A counting process is said to possess stationary increments if the distribution of the number of events that occur in any interval of time depends only on the length of the time interval. In other words, the process has stationary increments if the number of events in the interval (t_1 + s, t_2 + s] (that is, N(t_2 + s) - N(t_1 + s)) has the same distribution as the number of events in the interval (t_1, t_2] (that is, N(t_2) - N(t_1)) for all t_1 < t_2 and s > 0.

One of the most important types of counting processes is the Poisson process, which is defined as follows.

Definition 2.1.1 The counting process {N(t), t ≥ 0} is said to be a Poisson process having rate λ, λ > 0, if

(i) N(0) = 0;
(ii) the process has independent increments;
(iii) the number of events in any interval of length t is Poisson distributed with mean λt. That is, for all s, t ≥ 0,

P{N(t + s) - N(s) = n} = e^{-λt}(λt)^n/n!,   n = 0, 1, ....

Note that it follows from condition (iii) that a Poisson process has stationary increments and also that E[N(t)] = λt, which explains why λ is called the rate of the process.

In order to determine if an arbitrary counting process is actually a Poisson process, we must show that conditions (i), (ii), and (iii) are satisfied. Condition (i), which simply states that the counting of events begins at time t = 0, and condition (ii) can usually be directly verified from our knowledge of the process. However, it is not at all clear how we would determine that condition (iii) is satisfied, and for this reason an equivalent definition of a Poisson process would be useful.

As a prelude to giving a second definition of a Poisson process, we shall define the concept of a function f being o(h).

Definition The function f is said to be o(h) if

lim_{h→0} f(h)/h = 0.

We are now in a position to give an alternative definition of a Poisson process.

Definition 2.1.2 The counting process {N(t), t ≥ 0} is said to be a Poisson process with rate λ, λ > 0, if

(i) N(0) = 0;
(ii) the process has stationary and independent increments;
(iii) P{N(h) = 1} = λh + o(h);
(iv) P{N(h) ≥ 2} = o(h).

THEOREM 2.1.1 Definitions 2.1.1 and 2.1.2 are equivalent.

Proof. We first show that Definition 2.1.2 implies Definition 2.1.1. To do this, let P_n(t) = P{N(t) = n}. We derive a differential equation for P_0(t) in the following manner:

P_0(t + h) = P{N(t + h) = 0}
= P{N(t) = 0, N(t + h) - N(t) = 0}
= P{N(t) = 0}P{N(t + h) - N(t) = 0}
= P_0(t)[1 - λh + o(h)],

where the final two equations follow from Assumption (ii) and the fact that (iii) and (iv) imply that P{N(h) = 0} = 1 - λh + o(h). Hence,

[P_0(t + h) - P_0(t)]/h = -λP_0(t) + o(h)/h.

Letting h → 0 yields

P_0′(t) = -λP_0(t),

which implies, by integration,

log P_0(t) = -λt + c,

and since P_0(0) = P{N(0) = 0} = 1, we arrive at

(2.1.1)   P_0(t) = e^{-λt}.

Similarly, for n ≥ 1,

P_n(t + h) = P{N(t + h) = n}
= P{N(t) = n, N(t + h) - N(t) = 0} + P{N(t) = n - 1, N(t + h) - N(t) = 1}
  + P{N(t + h) = n, N(t + h) - N(t) ≥ 2}.

However, by (iv), the last term in the above is o(h); hence, by using (ii), we obtain

P_n(t + h) = P_n(t)P_0(h) + P_{n-1}(t)P_1(h) + o(h)
= (1 - λh)P_n(t) + λhP_{n-1}(t) + o(h).

Thus,

[P_n(t + h) - P_n(t)]/h = -λP_n(t) + λP_{n-1}(t) + o(h)/h.

Letting h → 0 yields

P_n′(t) = -λP_n(t) + λP_{n-1}(t),

or, equivalently,

(2.1.2)   d/dt [e^{λt}P_n(t)] = λe^{λt}P_{n-1}(t).

Now by (2.1.1) we have, when n = 1,

d/dt [e^{λt}P_1(t)] = λ,

which, since P_1(0) = 0, yields

P_1(t) = λte^{-λt}.

To show that P_n(t) = e^{-λt}(λt)^n/n!, we use mathematical induction and hence first assume it for n - 1. Then by (2.1.2),

d/dt [e^{λt}P_n(t)] = λ(λt)^{n-1}/(n - 1)!,

implying that

e^{λt}P_n(t) = (λt)^n/n! + C,

or, since P_n(0) = P{N(0) = n} = 0,

P_n(t) = e^{-λt}(λt)^n/n!.

Thus Definition 2.1.2 implies Definition 2.1.1. We will leave it for the reader to prove the reverse.

Remark The result that N(t) has a Poisson distribution is a consequence of the Poisson approximation to the binomial distribution. To see this, subdivide the interval [0, t] into k equal parts where k is very large (Figure 2.1.1). First we note that the probability of having 2 or more events in any subinterval goes to 0 as k → ∞. This follows from

P{2 or more events in any subinterval}
≤ Σ_{i=1}^k P{2 or more events in the ith subinterval}
= k·o(t/k)
= t·[o(t/k)/(t/k)] → 0   as k → ∞.

[Figure 2.1.1. The interval [0, t] subdivided into k equal parts.]

Hence, N(t) will (with a probability going to 1) just equal the number of subintervals in which an event occurs. However, by stationary and independent increments this number will have a binomial distribution with parameters k and p = λt/k + o(t/k). Hence, by the Poisson approximation to the binomial, we see by letting k approach ∞ that N(t) will have a Poisson distribution with mean equal to

lim_{k→∞} k[λt/k + o(t/k)] = λt + lim_{k→∞} t·[o(t/k)/(t/k)] = λt.
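The subdivision argument of the remark can be watched numerically. The sketch below, an added illustration, shows the binomial(k, λt/k) probabilities converging to the Poisson probability as k grows.

    from math import comb, exp, factorial

    lam, t, n = 2.0, 1.0, 8
    poisson = exp(-lam * t) * (lam * t) ** n / factorial(n)
    for k in (10, 100, 10_000):
        p = lam * t / k                    # per-subinterval success probability
        print(k, comb(k, n) * p**n * (1 - p) ** (k - n))
    print("poisson", poisson)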
2.2 INTERARRIVAL AND WAITING TIME DISTRIBUTIONS

Consider a Poisson process, and let X_1 denote the time of the first event. Further, for n ≥ 1, let X_n denote the time between the (n - 1)st and the nth event. The sequence {X_n, n ≥ 1} is called the sequence of interarrival times.

We shall now determine the distribution of the X_n. To do so we first note that the event {X_1 > t} takes place if, and only if, no events of the Poisson process occur in the interval [0, t], and thus

P{X_1 > t} = P{N(t) = 0} = e^{-λt}.

Hence, X_1 has an exponential distribution with mean 1/λ. To obtain the distribution of X_2, condition on X_1. This gives

P{X_2 > t | X_1 = s} = P{0 events in (s, s + t] | X_1 = s}
= P{0 events in (s, s + t]}   (by independent increments)
= e^{-λt}   (by stationary increments).

Therefore, from the above we conclude that X_2 is also an exponential random variable with mean 1/λ and, furthermore, that X_2 is independent of X_1. Repeating the same argument yields the following.

PROPOSITION 2.2.1 X_n, n = 1, 2, ..., are independent identically distributed exponential random variables having mean 1/λ.

Remark The proposition should not surprise us. The assumption of stationary and independent increments is equivalent to asserting that, at any point in time, the process probabilistically restarts itself. That is, the process from any point on is independent of all that has previously occurred (by independent increments), and also has the same distribution as the original process (by stationary increments). In other words, the process has no memory, and hence exponential interarrival times are to be expected.

Another quantity of interest is S_n, the arrival time of the nth event, also called the waiting time until the nth event. Since

S_n = Σ_{i=1}^n X_i,   n ≥ 1,

it is easy to show, using moment generating functions, that Proposition 2.2.1 implies that S_n has a gamma distribution with parameters n and λ. That is, its probability density is

f(t) = λe^{-λt}(λt)^{n-1}/(n - 1)!,   t ≥ 0.

The above could also have been derived by noting that the nth event occurs prior to or at time t if, and only if, the number of events occurring by time t is at least n. That is,

N(t) ≥ n ⟺ S_n ≤ t.

Hence,

P{S_n ≤ t} = P{N(t) ≥ n} = Σ_{j=n}^∞ e^{-λt}(λt)^j/j!,

which upon differentiation yields that the density function of S_n is

f(t) = -Σ_{j=n}^∞ λe^{-λt}(λt)^j/j! + Σ_{j=n}^∞ λe^{-λt}(λt)^{j-1}/(j - 1)!
= λe^{-λt}(λt)^{n-1}/(n - 1)!.

Remark Another way of obtaining the density of S_n is to use the independent increment assumption as follows:

P{t < S_n < t + dt} = P{N(t) = n - 1, 1 event in (t, t + dt)} + o(dt)
= P{N(t) = n - 1}P{1 event in (t, t + dt)} + o(dt)
= e^{-λt}[(λt)^{n-1}/(n - 1)!] λ dt + o(dt),

which yields the density upon dividing by dt and then letting it approach 0.

Proposition 2.2.1 also gives us another way of defining a Poisson process. For suppose that we start out with a sequence {X_n, n ≥ 1} of independent identically distributed exponential random variables each having mean 1/λ. Now let us define a counting process by saying that the nth event of this process occurs at time S_n, where

S_n = X_1 + X_2 + ... + X_n.

The resultant counting process {N(t), t ≥ 0} will be Poisson with rate λ.
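This last construction translates directly into a simulator. Here is a minimal sketch (an added illustration in plain Python) that builds event times from exponential interarrivals and checks that E[N(t)] = λt.

    import random

    def poisson_process_times(lam, horizon):
        # Event times S_n = X_1 + ... + X_n with X_i ~ Exponential(lam).
        times, s = [], 0.0
        while True:
            s += random.expovariate(lam)
            if s > horizon:
                return times
            times.append(s)

    lam, t, trials = 3.0, 2.0, 20_000
    counts = [len(poisson_process_times(lam, t)) for _ in range(trials)]
    print(sum(counts) / trials)   # should be close to lam * t = 6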
. Y(2) • .CONDITIONAL DISTRIBUTION OF THE ARRIVAL TIMES 67 and (ii) the probability density that (Y h Yz . Y'2' . . t).) = II. 5 n given that N(t) = n So let 0 < tl < t2 < . ---+ 0. We are now ready for the following useful theorem... . =7' . . YIn) is r' n! 0< Y' < Y2 < . P{t. given n' and by letting the h. . no events elsewhere in [0. + h. .. . that N(t) n is . /(y. If the Y" i = 1. . /(y.~51~t.. .1 Given that N(t) = n. the n arrival times 51. • t) =n' n t ' which completes the proof . .]. is a permutation of (Yl> Y2.. . we obtain that the conditional density of 51.n Now. be small enough so that t. < t'+I.+h"i=I.nIN(t)=n} _ P{exactly 1 eventin [tf> I. < t. t). let us compute the expected sum of the waiting times of travelers arriving in (0. . That is.)IN(t) = n] ::::: nt . we usually say that under the condition that n events have occurred in (0. V" denote a set of n independent uniform (0. ... independently of all else.1 suppose that each event of a Poisson process with rate A is classified as being either a type-lor type-II event.IN(t) = n ] Now if we let VI> .. are distributed independently and uniformly in the interval (0. and suppose that the probability of an event being classified as type-I depends on the time at which it occurs. E [ ~ (t . t) random variables. then (by Theorem 2 3. we want E[~~(. the times SI.68 THE POISSON PROCESS Remark Intuitively.3(A) N(I) E [ 2:(t 1=1 n] ::::: E [~ (t - S. S" at which events occur.E [~ S. . t).S.)IN(t) N(I) ] = n = nt nt nt -"2 ="2 and As an important application of Theorem 23. it is classified as being a type-I event with probability P(s) and a type-II event with probability 1 . By using Theorem 2. Specifically.1 we can prove the following proposition.Sf)].P(s). . considered as unordered random variables.. suppose that if an event occurs at time s.3. t) Suppose that travelers arrive at a train depot in accordance with a Poisson process with rate A If the train departs at time t. where SI is the arnval time of the ith traveler Conditioning on N(t) yields EXAMPlE 2.) (t .1) ( since ~ V(i) = ~ VI) nt 2' Hence. then. p). it follows that the probability that it will be a type-I event is 1 p=.3. where P = -1 fl P(s) ds t 0 Proof We compute the joint distribution of N 1(t) and N 2 (t) by conditioning on N(t) = 2: P{N.(t) = n. If N.2 = (n + m)' p"(1 n'm' p )me-M ----'---'----- (AI)"+m (n+m)' = e. 2. then N 1(t) and N 2 (t) are independent Poisson random variables having respective means Alp and AI(1 .p)m P{N (t) • = n N (t) = m} .P ))m which completes the proof The importance of the above proposition is illustrated by the following example . t). That is. since by Theorem 2 3 1 this event will have occurred at some time uniformly distributed on (0. . i = 1. N 2 (t) = mIN(t) = k}P{N(t) = k} k.(t) represents the number of type-i events that occur by time t. N 2(t) = mIN(t) = n + m}P{N(t) = n + m} Now consider an arbitrary event that occurred in the interval [0.(t) = n. P{N. = mIN(t) = n + m} = ( n+m) n p"(1 .O '" = P{N.2 .CONDITIONAL DISTRIBUTION OF THE ARRIVAL TIMES 69 a PROPOSITION 2. then the probability that it would be a type-I event would be P(s) Hence.(t) = n. t] If it had occurred at time s.(t) = n. N 2 (t) = mIN(t) = n + m} will just equal the probability of n successes and m failures in n + m independent trials when p is the probability of success on each trial.Mp (Alp )" e-M(. P{N.-p) (AI(1 n' m" . N 2(t) Consequently.fl P(s)ds t u independently of the other events Hence. 
Remark Intuitively, we usually say that, under the condition that n events have occurred in (0, t), the times S_1, ..., S_n at which events occur, considered as unordered random variables, are distributed independently and uniformly in the interval (0, t).

EXAMPLE 2.3(A) Suppose that travelers arrive at a train depot in accordance with a Poisson process with rate λ. If the train departs at time t, let us compute the expected sum of the waiting times of travelers arriving in (0, t). That is, we want E[Σ_{i=1}^{N(t)} (t - S_i)], where S_i is the arrival time of the ith traveler. Conditioning on N(t) yields

E[Σ_{i=1}^{N(t)} (t - S_i) | N(t) = n] = nt - E[Σ_{i=1}^n S_i | N(t) = n].

Now if we let U_1, ..., U_n denote a set of n independent uniform (0, t) random variables, then (by Theorem 2.3.1)

E[Σ_{i=1}^n S_i | N(t) = n] = E[Σ_{i=1}^n U_(i)] = E[Σ_{i=1}^n U_i] = nt/2   (since Σ U_(i) = Σ U_i).

Hence,

E[Σ_{i=1}^{N(t)} (t - S_i) | N(t) = n] = nt - nt/2 = nt/2,

and taking expectations gives

E[Σ_{i=1}^{N(t)} (t - S_i)] = tE[N(t)]/2 = λt²/2.

As an important application of Theorem 2.3.1, suppose that each event of a Poisson process with rate λ is classified as being either a type-I or type-II event, and suppose that the probability of an event being classified as type-I depends on the time at which it occurs. Specifically, suppose that if an event occurs at time s, then, independently of all else, it is classified as being a type-I event with probability P(s) and a type-II event with probability 1 - P(s). By using Theorem 2.3.1 we can prove the following proposition.

PROPOSITION 2.3.2 If N_i(t), i = 1, 2, represents the number of type-i events that occur by time t, then N_1(t) and N_2(t) are independent Poisson random variables having respective means λtp and λt(1 - p), where

p = (1/t) ∫_0^t P(s) ds.

Proof. We compute the joint distribution of N_1(t) and N_2(t) by conditioning on N(t):

P{N_1(t) = n, N_2(t) = m}
= Σ_{k=0}^∞ P{N_1(t) = n, N_2(t) = m | N(t) = k}P{N(t) = k}
= P{N_1(t) = n, N_2(t) = m | N(t) = n + m}P{N(t) = n + m}.

Now consider an arbitrary event that occurred in the interval [0, t]. If it had occurred at time s, then the probability that it would be a type-I event would be P(s). Hence, since by Theorem 2.3.1 this event will have occurred at some time uniformly distributed on (0, t), it follows that the probability that it will be a type-I event is

p = (1/t) ∫_0^t P(s) ds,

independently of the other events. Hence, P{N_1(t) = n, N_2(t) = m | N(t) = n + m} will just equal the probability of n successes and m failures in n + m independent trials when p is the probability of success on each trial. That is,

P{N_1(t) = n, N_2(t) = m | N(t) = n + m} = ((n + m) choose n) p^n (1 - p)^m.

Consequently,

P{N_1(t) = n, N_2(t) = m} = [(n + m)!/(n! m!)] p^n (1 - p)^m e^{-λt}(λt)^{n+m}/(n + m)!
= [e^{-λtp}(λtp)^n/n!] · [e^{-λt(1-p)}(λt(1 - p))^m/m!],

which completes the proof.

The importance of the above proposition is illustrated by the following example.

EXAMPLE 2.3(B) The Infinite Server Poisson Queue (M/G/∞). Suppose that customers arrive at a service station in accordance with a Poisson process with rate λ. Upon arrival the customer is immediately served by one of an infinite number of possible servers, and the service times are assumed to be independent with a common distribution G. To compute the joint distribution of the number of customers that have completed their service and the number that are in service at t, call an entering customer a type-I customer if it completes its service by time t and a type-II customer if it does not complete its service by time t. Now, if the customer enters at time s, s ≤ t, then it will be a type-I customer if its service time is less than t - s, and since the service time distribution is G, the probability of this will be G(t - s). That is,

P(s) = G(t - s),   s ≤ t,

and thus from Proposition 2.3.2 we obtain that the distribution of N_1(t), the number of customers that have completed service by time t, is Poisson with mean

E[N_1(t)] = λ ∫_0^t G(t - s) ds = λ ∫_0^t G(y) dy.

Similarly N_2(t), the number of customers being served at time t, is Poisson distributed with mean

E[N_2(t)] = λ ∫_0^t Ḡ(y) dy.

Further, N_1(t) and N_2(t) are independent.

The following example further illustrates the use of Theorem 2.3.1.
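As an added check on Example 2.3(B), the sketch below simulates an M/G/∞ queue with Exponential(μ) service times (an assumption made only for this illustration, so that the integral is explicit) and compares the mean number in service at t with λ∫_0^t Ḡ(y) dy.

    import math
    import random

    lam, mu, t, trials = 5.0, 1.0, 2.0, 20_000
    in_service = 0
    for _ in range(trials):
        s, count = 0.0, 0
        while True:
            s += random.expovariate(lam)             # next arrival
            if s > t:
                break
            if s + random.expovariate(mu) > t:       # still in service at time t
                count += 1
        in_service += count
    # lam * integral_0^t e^{-mu*y} dy = (lam/mu)(1 - e^{-mu*t})
    print(in_service / trials, lam / mu * (1 - math.exp(-mu * t)))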
EXAMPLE 2.3(C) Suppose that a device is subject to shocks that occur in accordance with a Poisson process having rate λ. The ith shock gives rise to a damage D_i. The D_i, i ≥ 1, are assumed to be independent and identically distributed and also to be independent of {N(t), t ≥ 0}, where N(t) denotes the number of shocks in [0, t]. The damage due to a shock is assumed to decrease exponentially in time. That is, if a shock has an initial damage D, then a time t later its damage is De^{-αt}. If we suppose that the damages are additive, then D(t), the damage at t, can be expressed as

D(t) = Σ_{i=1}^{N(t)} D_i e^{-α(t - S_i)},

where S_i represents the arrival time of the ith shock. We can determine E[D(t)] as follows:

E[D(t) | N(t) = n] = Σ_{i=1}^n E[D_i e^{-α(t - S_i)} | N(t) = n]
= Σ_{i=1}^n E[D_i | N(t) = n] E[e^{-α(t - S_i)} | N(t) = n]
= E[D] e^{-αt} E[Σ_{i=1}^n e^{αS_i} | N(t) = n].

Now, letting U_1, ..., U_n be independent and identically distributed uniform (0, t) random variables, then, by Theorem 2.3.1,

E[Σ_{i=1}^n e^{αS_i} | N(t) = n] = E[Σ_{i=1}^n e^{αU_(i)}] = E[Σ_{i=1}^n e^{αU_i}]
= (n/t) ∫_0^t e^{αx} dx = (n/αt)(e^{αt} - 1).

Hence,

E[D(t) | N(t)] = N(t)(1 - e^{-αt})E[D]/(αt),

and, taking expectations,

E[D(t)] = λE[D](1 - e^{-αt})/α.

Remark Another approach to obtaining E[D(t)] is to break up the interval (0, t) into nonoverlapping intervals of length h and then add the contributions at time t of shocks arriving in these intervals. More specifically, let h be given and define X_i as the sum of the damages at time t of all shocks arriving in the interval I_i ≡ (ih, (i + 1)h), i = 0, 1, ..., [t/h], where [a] denotes the largest integer less than or equal to a. Then we have the representation

D(t) = Σ_{i=0}^{[t/h]} X_i,

and so

E[D(t)] = Σ_{i=0}^{[t/h]} E[X_i].

To compute E[X_i], condition on whether or not a shock arrives in the interval I_i. This yields

E[D(t)] = Σ_{i=0}^{[t/h]} (λhE[De^{-α(t - L_i)}] + o(h)),

where L_i is the arrival time of the shock in the interval I_i. Hence,

E[D(t)] = λE[D] E[Σ_{i=0}^{[t/h]} he^{-α(t - L_i)}] + ([t/h] + 1)o(h).

But, since L_i ∈ I_i, it follows upon letting h → 0 that

Σ_{i=0}^{[t/h]} he^{-α(t - L_i)} → ∫_0^t e^{-α(t - y)} dy = (1 - e^{-αt})/α,

while ([t/h] + 1)o(h) → 0, and so

E[D(t)] = λE[D](1 - e^{-αt})/α.

It is worth noting that the above is a more rigorous version of the following argument: since a shock occurs in the interval (y, y + dy) with probability λ dy, and since its damage at time t will equal e^{-α(t - y)} times its initial damage, it follows that the expected damage at t from shocks originating in (y, y + dy) is λE[D]e^{-α(t - y)} dy, and so

E[D(t)] = λE[D] ∫_0^t e^{-α(t - y)} dy = λE[D](1 - e^{-αt})/α.
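A direct simulation of the shock model confirms the formula. This is an added sketch; the Exponential(1) damages are an arbitrary choice giving E[D] = 1.

    import math
    import random

    lam, alpha, t, trials = 4.0, 0.5, 3.0, 20_000
    total = 0.0
    for _ in range(trials):
        s, damage = 0.0, 0.0
        while True:
            s += random.expovariate(lam)     # next shock time
            if s > t:
                break
            damage += random.expovariate(1.0) * math.exp(-alpha * (t - s))
        total += damage
    print(total / trials)                                    # simulated E[D(t)]
    print(lam * (1 - math.exp(-alpha * t)) / alpha)          # lam E[D](1 - e^{-alpha t})/alpha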
2.3.1 The M/G/1 Busy Period

Consider the queueing system, known as M/G/1, in which customers arrive in accordance with a Poisson process with rate λ. Upon arrival they either enter service if the server is free or else they join the queue. The successive service times are independent and identically distributed according to G, and are also independent of the arrival process. When an arrival finds the server free, we say that a busy period begins. It ends when there are no longer any customers in the system. We would like to compute the joint distribution of the length of a busy period and the number of customers it serves.

Suppose that a busy period has just begun at some time, which we shall designate as time 0. Let S_k denote the time until k additional customers have arrived. (Thus S_k has a gamma distribution with parameters k, λ.) Also let Y_1, Y_2, ... denote the sequence of service times. Now the busy period will last a time t and will consist of n services if, and only if,

(i) S_k ≤ Y_1 + ... + Y_k, k = 1, ..., n - 1;
(ii) Y_1 + ... + Y_n = t;
(iii) there are n - 1 arrivals in (0, t).

Equation (i) is necessary for, if S_k > Y_1 + ... + Y_k, then the kth arrival after the initial customer would find the system empty of customers, and thus the busy period would have ended prior to k + 1 (and thus prior to n) services. The reasoning behind (ii) and (iii) is straightforward and left to the reader. Hence, reasoning heuristically (by treating densities as if they were probabilities), we see from the above that

(2.3.2)   P{busy period is of length t and consists of n services}
= P{S_k ≤ Y_1 + ... + Y_k, k = 1, ..., n - 1, Y_1 + ... + Y_n = t, n - 1 arrivals in (0, t)}.

Now the arrival process is independent of the service times, and thus

(2.3.3)   P{n - 1 arrivals in (0, t) | Y_1 + ... + Y_n = t} = e^{-λt}(λt)^{n-1}/(n - 1)!.

Moreover, given that there are n - 1 arrivals in (0, t), we have from Theorem 2.3.1 that the ordered arrival times are distributed as the ordered values of a set of n - 1 independent uniform (0, t) random variables. Hence, using this fact along with (2.3.3), Equation (2.3.2) yields

(2.3.4)   P{busy period is of length t and consists of n services}
= e^{-λt}[(λt)^{n-1}/(n - 1)!] P{τ_k ≤ Y_1 + ... + Y_k, k = 1, ..., n - 1 | Y_1 + ... + Y_n = t} dG_n(t),

where G_n is the n-fold convolution of G with itself, and where τ_1, ..., τ_{n-1} are independent of {Y_1, ..., Y_n} and represent the ordered values of a set of n - 1 independent uniform (0, t) random variables.

To compute the remaining probability in (2.3.4) we need some lemmas. Lemma 2.3.3 is elementary and its proof is left as an exercise.

Lemma 2.3.3 Let Y_1, ..., Y_n be independent and identically distributed nonnegative random variables. Then

E[Y_1 + ... + Y_k | Y_1 + ... + Y_n = y] = (k/n)y,   k = 1, ..., n.

Lemma 2.3.4 Let τ_1, ..., τ_n denote the ordered values from a set of n independent uniform (0, t) random variables, and let Y_1, ..., Y_n be independent and identically distributed nonnegative random variables that are also independent of {τ_1, ..., τ_n}. Then

(2.3.5)   P{Y_1 + ... + Y_k < τ_k, k = 1, ..., n | Y_1 + ... + Y_n = y}
= 1 - y/t   if 0 < y < t,   and 0 otherwise.

Proof. The proof is by induction on n. When n = 1 we must compute P{Y_1 < τ_1 | Y_1 = y}, where τ_1 is uniform (0, t). But

P{Y_1 < τ_1 | Y_1 = y} = P{τ_1 > y} = 1 - y/t,   0 < y < t.

So assume the lemma when n is replaced by n - 1, and now consider the n case. Since the result is obvious for y ≥ t, suppose that y < t. To make use of the induction hypothesis we compute the left-hand side of (2.3.5) by conditioning on the values of Y_1 + ... + Y_{n-1} and τ_n, and then using the fact that, conditional on τ_n = u, the variables τ_1, ..., τ_{n-1} are distributed as the order statistics from a set of n - 1 independent uniform (0, u) random variables (see Problem 2.18). Doing so, we have for s < y

(2.3.6)   P{Y_1 + ... + Y_k < τ_k, k = 1, ..., n | Y_1 + ... + Y_{n-1} = s, τ_n = u, Y_1 + ... + Y_n = y}
= 1 - s/u   if y < u,   and 0 if y ≥ u,

where the case y < u follows from the induction hypothesis applied to the n - 1 uniforms on (0, u), the requirement Y_1 + ... + Y_n < τ_n reducing to y < u. Hence, taking expectations over s and using Lemma 2.3.3 (which gives E[Y_1 + ... + Y_{n-1} | Y_1 + ... + Y_n = y] = (n - 1)y/n), we obtain, for y < u,

P{Y_1 + ... + Y_k < τ_k, k = 1, ..., n | τ_n = u, Y_1 + ... + Y_n = y} = 1 - [(n - 1)/n](y/u).

Now the density of τ_n is given by

f_{τ_n}(x) = (n/t)(x/t)^{n-1},   0 < x < t,

since

P{τ_n < x} = P{max_{1≤i≤n} U_i < x} = P{U_1 < x, ..., U_n < x} = (x/t)^n,   0 < x < t.

Hence,

P{Y_1 + ... + Y_k < τ_k, k = 1, ..., n | Y_1 + ... + Y_n = y}
= ∫_y^t [1 - ((n - 1)/n)(y/x)] (n/t)(x/t)^{n-1} dx
= 1 - (y/t)^n - (y/t)[1 - (y/t)^{n-1}]
= 1 - y/t,

and the proof is complete.

We need one additional lemma before going back to the busy period problem.
Lemma 2.3.5 Let τ_1, ..., τ_{n-1} denote the ordered values from a set of n - 1 independent uniform (0, t) random variables, and let Y_1, ..., Y_n be independent and identically distributed nonnegative random variables that are also independent of {τ_1, ..., τ_{n-1}}. Then

P{τ_k ≤ Y_1 + ... + Y_k, k = 1, ..., n - 1 | Y_1 + ... + Y_n = t} = 1/n.

Proof. To compute the above probability we will make use of Lemma 2.3.4. Note first that, given Y_1 + ... + Y_n = t,

τ_k ≤ Y_1 + ... + Y_k ⟺ Y_{k+1} + ... + Y_n ≤ t - τ_k,

and that t - τ_{n-1} < t - τ_{n-2} < ... < t - τ_1 are themselves distributed as the ordered values of a set of n - 1 independent uniform (0, t) random variables. Hence, writing τ̃_j = t - τ_{n-j}, j = 1, ..., n - 1,

P{τ_k ≤ Y_1 + ... + Y_k, k = 1, ..., n - 1 | Y_1 + ... + Y_n = t}
= P{Y_n + Y_{n-1} + ... + Y_{n-j+1} ≤ τ̃_j, j = 1, ..., n - 1 | Y_1 + ... + Y_n = t}.

Now Y_1, ..., Y_n has the same joint distribution as Y_n, ..., Y_1, and so any probability statement about the Y_i remains valid if Y_1 is replaced by Y_n, Y_2 by Y_{n-1}, ..., and Y_n by Y_1. Hence the preceding equals

P{Y_1 + ... + Y_j ≤ τ̃_j, j = 1, ..., n - 1 | Y_1 + ... + Y_n = t}.

Conditioning on Y_1 + ... + Y_{n-1} = y (< t) and applying Lemma 2.3.4 to the n - 1 ordered uniforms τ̃_1, ..., τ̃_{n-1} gives the conditional probability 1 - y/t; hence, by Lemma 2.3.3,

P{τ_k ≤ Y_1 + ... + Y_k, k = 1, ..., n - 1 | Y_1 + ... + Y_n = t}
= E[1 - (Y_1 + ... + Y_{n-1})/t | Y_1 + ... + Y_n = t]
= 1 - (n - 1)/n = 1/n,

which proves the result.

Returning to the joint distribution of the length of a busy period and the number of customers served, let

B(t, n) = P{busy period is of length ≤ t, n customers served in a busy period}.

Then, from (2.3.4) and Lemma 2.3.5,

(d/dt)B(t, n) = (1/n) e^{-λt}[(λt)^{n-1}/(n - 1)!] (d/dt)G_n(t),

or

B(t, n) = ∫_0^t e^{-λs}[(λs)^{n-1}/n!] dG_n(s).

The distribution of the length of a busy period, call it B(t), is thus given by

B(t) = Σ_{n=1}^∞ B(t, n) = Σ_{n=1}^∞ ∫_0^t e^{-λs}[(λs)^{n-1}/n!] dG_n(s).
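For a concrete check of B(t, n), an added aside: if the service distribution G is deterministic at 1, then G_n concentrates at n, and letting t → ∞ gives P{n customers served} = e^{-λn}(λn)^{n-1}/n!, which is the Borel distribution. The sketch below simulates M/D/1 busy periods and compares.

    import math
    import random

    def busy_period_services(lam, service=1.0):
        # Simulate one M/D/1 busy period; return the number of customers served.
        work, served, clock = service, 1, 0.0
        while True:
            t_end = clock + work                      # when the system would empty
            arrival = clock + random.expovariate(lam)
            if arrival >= t_end:
                return served
            work -= arrival - clock                   # work done before the arrival
            work += service                           # the new customer's service
            served += 1
            clock = arrival

    lam, trials = 0.5, 50_000
    counts = {}
    for _ in range(trials):
        n = busy_period_services(lam)
        counts[n] = counts.get(n, 0) + 1
    for n in (1, 2, 3, 4):
        borel = math.exp(-lam * n) * (lam * n) ** (n - 1) / math.factorial(n)
        print(n, counts.get(n, 0) / trials, borel)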
then the process of counted events is a nonhomogeneous Poisson process with intensity function A(t) This last statement easily follows from Definition 24 1 For instance (i). Let Xl.80 or Po(s) = e-1m(I+S)-m(I)1 THE POISSON PROCESS The remainder of the verification of (24. denote a sequence of independent and identically distributed nonnegative continuous random variab~s whose hal:ard rate function is given by A(t). we can think of the nonhomogeneous process as being a random sample from a homogeneous Poisson process. (That is. Hence. and only if. from its generahzation to three types of customers.y) if y < s if s < y < s + t if y > s o + t. P{Xn E (t.G(s . . the first X. EXAMPLE = A(t)h + o(h). t ~ O} will be a nonhomogeneous Poisson process with intensity function A(t) To verify this claim note that there will be a record value between t and t + h if.y) dy = A £:+1 G(y) dy. Again.3. and call it type III otherwise. and with mean A (2) the numbers of departures in disjoint time intervals are inde- f:+ pendent. whose value is greater than t lies between t and t + h But we have (conditioning on which X. s + t) Then an arrival at y will be type I with probability G(S P(y) = { + t . call an arnval type I if it departs in the interval (s.G(s .3 2.2 the number of such departures will be Poisson distributed with mean A I~ P(y) dy = A I~ (G(s + t - y) . To prove statement (2) let II and 12 denote disjoint time intervals and call an arrival type I if it departs in II.NONHOMOGENEOUS POISSON PROCESS 81 We claim that {N(t). To prove statement (1). t + h) IXn > t} which proves the claim.y) . it follows that the number of departures in II and 12 (that is.y) G(s + t .4(B) (1) the number of departures in (s. this is. call it type II if it departs in 12 . of the infinite server queue having Poisson arrivals and general service distribution G-is a nonhomogeneous Poisson process having intensity function A(t) = AG(t). s + t) is Poisson distributed 1 G(y) dy. by the definition of a hazard rate function. To prove this we shall first argue that 2. i = n). from Proposition 2. It turns out that the output process of the M/ G/oo queue-that is.y)) dy + A £:+1 G(s + t . or. from Proposition 2. The Output Process of an Infinite Server Poisson Queue (MIG/oo). the number of type-I and type-II arrivals) are independent Poisson random variables. say. more precisely. + +Xnl]e-At{At)"ln I n=O = 2: E[e'X']"e-A/{At)"ln! n=O ex> where (2..5.. X 2 .5..2) that E [e 'W ] = 2: [4>x{t)]"e.. it is interesting to note that the limiting output process after time t (as t . 00.3) = exp{At{4>x{t) .82 THE POISSON PROCESS Using statements (1) and (2) it is a simple matter to verify all the axiomatic requirements for the departure process to be a nonhomogeneous Poisson process (it is much like showing that Definition 2. = (2.1)] . and (2..2) follows from the independence of the X. A as t .. The moment generating function of W is obtained by conditioning on N... (2. This gives E[e 'W ] = 2: E[e'WI N = n]P{N == n} "=0 0.+ +Xnl IN = "~O ex> ex> n]e-A/{At)"lnl 2: E[e/(X .. and suppose that this sequence is independent of N..5 COMPOUND POISSON RAN'nOM VARIABLES " AND PROCESSES Let XI' X 2 ..5.1) follows from the independence of {XI. • •• } and N..2) = 2: E[e/(X . letting denote the moment generating function of the X" we have from (2.1 of a Poisson process implies Definition 2 1.1 ) (25... 00) is a Poisson process with rate A 2..5..5.2) Since A(t) ..A/ {At)"ln l n~O 0. 
EXAMPLE 2.4(B) The Output Process of an Infinite Server Poisson Queue (M/G/∞). It turns out that the output process of the M/G/∞ queue, that is, of the infinite server queue having Poisson arrivals and general service distribution G, is a nonhomogeneous Poisson process having intensity function λ(t) = λG(t). To prove this we shall first argue that

(1) the number of departures in (s, s + t) is Poisson distributed with mean λ∫_s^{s+t} G(y) dy; and
(2) the numbers of departures in disjoint time intervals are independent.

To prove statement (1), call an arrival type I if it departs in the interval (s, s + t). Then an arrival at y will be type I with probability

P(y) = G(s + t - y) - G(s - y)   if y < s,
P(y) = G(s + t - y)   if s < y < s + t,
P(y) = 0   if y > s + t.

Hence, from Proposition 2.3.2, the number of such departures will be Poisson distributed with mean

λ ∫_0^{s+t} P(y) dy = λ ∫_0^s [G(s + t - y) - G(s - y)] dy + λ ∫_s^{s+t} G(s + t - y) dy = λ ∫_s^{s+t} G(y) dy.

To prove statement (2), let I_1 and I_2 denote disjoint time intervals and call an arrival type I if it departs in I_1, call it type II if it departs in I_2, and call it type III otherwise. Again from Proposition 2.3.2, or, more precisely, from its generalization to three types of customers, it follows that the numbers of departures in I_1 and I_2 (that is, the numbers of type-I and type-II arrivals) are independent Poisson random variables.

Using statements (1) and (2) it is a simple matter to verify all the axiomatic requirements for the departure process to be a nonhomogeneous Poisson process with intensity function λ(t) = λG(t) (it is much like showing that Definition 2.1.1 of a Poisson process implies Definition 2.1.2). Since λG(t) → λ as t → ∞, it is interesting to note that the limiting output process after time t (as t → ∞) is a Poisson process with rate λ.

2.5 COMPOUND POISSON RANDOM VARIABLES AND PROCESSES

Let X_1, X_2, ... be a sequence of independent and identically distributed random variables having distribution function F, and suppose that this sequence is independent of N, a Poisson random variable with mean λ. The random variable

W = Σ_{i=1}^N X_i

is said to be a compound Poisson random variable with Poisson parameter λ and component distribution F.

The moment generating function of W is obtained by conditioning on N. This gives

E[e^{tW}] = Σ_{n=0}^∞ E[e^{tW} | N = n]P{N = n}

(2.5.1)   = Σ_{n=0}^∞ E[e^{t(X_1 + ... + X_n)}] e^{-λ}λ^n/n!

(2.5.2)   = Σ_{n=0}^∞ (E[e^{tX_1}])^n e^{-λ}λ^n/n!,

where (2.5.1) follows from the independence of {X_1, X_2, ...} and N, and (2.5.2) follows from the independence of the X_i. Hence, letting

φ_X(t) = E[e^{tX_1}]

denote the moment generating function of the X_i, we have from (2.5.2) that

(2.5.3)   E[e^{tW}] = exp{λ(φ_X(t) - 1)}.

It is easily shown, either by differentiating (2.5.3) or by directly using a conditioning argument, that

E[W] = λE[X],   Var(W) = λE[X²],

where X has distribution F.
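A small Monte Carlo check of these moment formulas, added as an illustration; the Uniform(0, 1) components are an arbitrary choice, for which E[X] = 1/2 and E[X²] = 1/3.

    import random

    lam, trials = 3.0, 200_000
    samples = []
    for _ in range(trials):
        n = 0                                   # N ~ Poisson(lam), via exponentials
        s = random.expovariate(lam)
        while s < 1.0:
            n += 1
            s += random.expovariate(lam)
        samples.append(sum(random.random() for _ in range(n)))

    mean = sum(samples) / trials
    var = sum((w - mean) ** 2 for w in samples) / trials
    print(mean, lam * 0.5)       # E[W] = lam E[X] = 1.5
    print(var, lam / 3)          # Var(W) = lam E[X^2] = 1.0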
EXAMPLE 2.5(A) Aside from the way in which they are defined, compound Poisson random variables often arise in the following manner. Suppose that events are occurring in accordance with a Poisson process having rate (say) α, and that whenever an event occurs a certain contribution results. Specifically, suppose that an event occurring at time s will, independent of the past, result in a contribution whose value is a random variable with distribution F_s. Let W denote the sum of the contributions up to time t; that is,

W = Σ_{i=1}^{N(t)} X_i,

where N(t) is the number of events occurring by time t, and X_i is the contribution made when event i occurs. Then, even though the X_i are neither independent nor identically distributed, it follows that W is a compound Poisson random variable with parameters

λ = αt   and   F(x) = (1/t) ∫_0^t F_s(x) ds.

This can be shown by calculating the distribution of W by first conditioning on N(t), and then using the result that, given N(t), the unordered set of N(t) event times are independent uniform (0, t) random variables (see Section 2.3).

When F is a discrete probability distribution function there is an interesting representation of W as a linear combination of independent Poisson random variables. Suppose that the X_i are discrete random variables such that

P{X_i = j} = p_j,   j = 1, ..., k,   Σ_{j=1}^k p_j = 1.

If we let N_j denote the number of the X_i, i = 1, ..., N, that are equal to j, then we can express W as

(2.5.4)   W = Σ_{j=1}^k j N_j,

where, using the results of Example 1.5(I), the N_j are independent Poisson random variables with respective means λp_j, j = 1, ..., k. As a check, let us use the representation (2.5.4) to compute the mean and variance of W:

E[W] = Σ_j jE[N_j] = Σ_j jλp_j = λE[X],
Var(W) = Σ_j j²Var(N_j) = Σ_j j²λp_j = λE[X²],

which check with our previous results.

2.5.1 A Compound Poisson Identity

As before, let W = Σ_{i=1}^N X_i be a compound Poisson random variable with N being Poisson with mean λ and the X_i having distribution F. We now present a useful identity concerning W.

PROPOSITION 2.5.2 Let X be a random variable having distribution F that is independent of W. Then, for any function h(x),

E[Wh(W)] = λE[Xh(W + X)].

Proof.

E[Wh(W)] = Σ_{n=0}^∞ E[Wh(W) | N = n] e^{-λ}λ^n/n!
= Σ_{n=0}^∞ e^{-λ}(λ^n/n!) E[Σ_{i=1}^n X_i h(Σ_{j=1}^n X_j)]
= Σ_{n=0}^∞ e^{-λ}(λ^n/n!) n E[X_n h(Σ_{j=1}^n X_j)],

where the preceding follows because the random variables X_i h(Σ_j X_j), i = 1, ..., n, all have the same distribution. Hence, upon conditioning on X_n,

E[Wh(W)] = Σ_{n=1}^∞ e^{-λ} [λ^n/(n - 1)!] ∫ x E[h(Σ_{j=1}^{n-1} X_j + x)] dF(x)
= λ ∫ x Σ_{m=0}^∞ E[h(Σ_{j=1}^m X_j + x)] e^{-λ}λ^m/m! dF(x)   (setting m = n - 1)
= λ ∫ x E[h(W + x)] dF(x)
= λE[Xh(W + X)].

Proposition 2.5.2 gives an easy way of computing the moments of W.

Corollary 2.5.3 If X has distribution F, then for any positive integer n,

E[W^n] = λE[X(W + X)^{n-1}].

Proof. Let h(x) = x^{n-1} and apply Proposition 2.5.2.

Thus, for instance,

E[W] = λE[X],
E[W²] = λ(E[X²] + E[W]E[X]) = λE[X²] + λ²(E[X])²,
E[W³] = λ(E[X³] + 2E[W]E[X²] + E[W²]E[X]) = λE[X³] + 3λ²E[X]E[X²] + λ³(E[X])³,

and so on.

We will now show that Proposition 2.5.2 yields an elegant recursive formula for the probability mass function of W when the X_i are positive integer valued random variables. Suppose this is the situation, and let

α_j = P{X_i = j}, j ≥ 1,   and   P_n = P{W = n}.

The successive values of P_n, n ≥ 1, can be obtained by starting with n = 1 and successively increasing the value of n, making use of the following.

Corollary 2.5.4

P_0 = e^{-λ};   P_n = (λ/n) Σ_{j=1}^n jα_j P_{n-j},   n ≥ 1.

Proof. That P_0 = e^{-λ} is immediate, so take n > 0. Let

h(x) = 1/n if x = n,   h(x) = 0 if x ≠ n.

Since Wh(W) = I{W = n}, which is defined to equal 1 if W = n and 0 otherwise, we obtain upon applying Proposition 2.5.2 that

P{W = n} = E[Wh(W)] = λE[Xh(W + X)]
= λ Σ_j E[Xh(W + X) | X = j]α_j
= λ Σ_j jE[h(W + j)]α_j
= λ Σ_j j(1/n)P{W + j = n}α_j
= (λ/n) Σ_{j=1}^n jα_j P_{n-j}.
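Corollary 2.5.4 is, for the Poisson case, the recursion known in the insurance literature as Panjer's algorithm. A direct transcription (an added sketch, not part of the text):

    import math

    def compound_poisson_pmf(lam, a, n_max):
        # Corollary 2.5.4: a[j-1] = P{X = j} for j = 1..len(a); returns P_0..P_{n_max}.
        P = [math.exp(-lam)]
        for n in range(1, n_max + 1):
            s = sum(j * aj * P[n - j] for j, aj in enumerate(a, start=1) if j <= n)
            P.append(lam * s / n)
        return P

    # Example 2.5(B) below: lam = 4, X uniform on {1, 2, 3, 4}
    P = compound_poisson_pmf(4.0, [0.25] * 4, 5)
    print(P[5], (381 / 120) * math.exp(-4))   # the two agree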
'" e-AI (At)n dG(A) Jo n! and so the conditional distribution of A. For dA small. given that N(t) = n. Given n shocks in the first t time units of a season.88 THE POISSON PROCESS the total amount spent in the store by all customers arriving by time t.6(A) Suppose that. for instance. Thus. however. depending on factors not at present understood. . that Po(t + s) Po(t)Po(s) (b) Use (a)to inferthatthe interarnvaltimesXI • X 2 . that P{N(s) = k IN(t) = n} = (:) (~r( 1 ~)II-k t ' k 0. N(t + s)] 2. We have a stockpile of n type-l components and m type-2 components.2 implies Definition 2. Calculate E[N(t) . For a Poisson process show. 2.pROBLEMS 89 it is a AI season is P{A = All N(t) = n} Also.e. For another approach to proving that Definition 2. for s < t.1. ..1. Show that Definition 2. Suppose that {N1(t).2.()e-All(A2t)II(1 p) ='~----~--~~--~----~~~~--~ pe.1. • •• are independent exponential random variables with rate A.no 2. show that the probability that the first event of the combined process comes from {N1(t). 2. t 2: O} is AI/(AI + A2).1.. t > O} be a Poisson process with rate A.3.1. has the distribution P{time from t until next shock is ~ x IN(t) n} /(A p(1 .()e. we see that the time from t until the next shock.1' (a) Prove.A11(Alt)1I + e-~I(A2tt(1 p) PROBLEMS 2. (c) Use (b) to show that N(t) is Poisson distributed with mean At. given N(t) = n. Let {N(t). independently of the time of the event. t 2: O} and {N2(t).1.1 of a Poisson process implies Definition 2. t > O} is a Poisson process with rate AI + A2 • Also. A machine needs two types of components in order to function.A 1t)1I + (1 e-Al.2.1. . using Definition 2. by conditioning on whether A::::: AI or A = A2 .5.At .6. 2.4. t 2: O} are independent Poisson processes with rates Al and A2 • Show that {N1(t) + N2(t).2. If you take the bus from that stop then it takes a time R.10. (a) If the preceding strategy is employed. to arrive home. what is the probability of winning? (b) What value of s maximizes the probability of winning? (c) Show that the probability of winning under the optimal strategy is lie. Compute the mean length of time the machine is operative if a failed component is replaced by one of the same type from the stockpile. with our objective being to stop at the last event to occur prior to some specified time T. (b) Use part (a) to show that N is Poisson distributed with mean A when N is defined to equal that value of n such that n n+1 II V. 53. Compute the joint distribution of 51. 52. then we lose if there are any events in the interval (t. Each time an event occurs we must decide whether or not to stop.1 > II V" 1~1 where n~~1 V.)/A. That is. (a) Compute the expected time from when you arrive at the bus stop until you reach home. then we also lose Consider the strategy that stops at the first event that occurs after some specified time s.(Y. (a) If X.9. where the X. 2. Suppose that your policy when arriving at the bus stop is to wait up to a time s. is exponentially distnbuted with rate A. Suppose that events occur according to a Poisson process with rate A. 2. ~ e-. 2. if an event occurs at time t. -< s ~ T. Generating a Poisson Random Variable. T]. ~ 1.90 TH E POISSON PROCESS Type-i components last for an exponential time with rate ILl before failing. that is. Compare with Problem 1 21 of Chapter 1. and if a bus has not yet arrived by that time then you walk home.) are exponential with rate ILl (IL2). and no additional events occur by time T. ° ° 2.7. Let VI! V 2 . 
2.9. Buses arrive at a certain stop according to a Poisson process with rate λ. If you take the bus from that stop then it takes a time R, measured from the time at which you enter the bus, to arrive home. If you walk from the bus stop then it takes a time W to arrive home. Suppose that your policy when arriving at the bus stop is to wait up to a time s, and if a bus has not yet arrived by that time then you walk home.
(a) Compute the expected time from when you arrive at the bus stop until you reach home.
(b) Show that if W < 1/λ + R then the expected time of part (a) is minimized by letting s = 0; if W > 1/λ + R then it is minimized by letting s = ∞ (that is, you continue to wait for the bus); and when W = 1/λ + R all values of s give the same expected time.
(c) Give an intuitive explanation of why we need only consider the cases s = 0 and s = ∞ when minimizing the expected time.

2.10. Suppose that events occur according to a Poisson process with rate λ. Each time an event occurs we must decide whether or not to stop, with our objective being to stop at the last event to occur prior to some specified time T. That is, if an event occurs at time t, 0 ≤ t ≤ T, and we decide to stop, then we lose if there are any events in the interval (t, T], and win otherwise. If we do not stop when an event occurs, and no additional events occur by time T, then we also lose. Consider the strategy that stops at the first event that occurs after some specified time s, 0 ≤ s ≤ T.
(a) If the preceding strategy is employed, what is the probability of winning?
(b) What value of s maximizes the probability of winning?
(c) Show that the probability of winning under the optimal strategy is 1/e.

2.11. Cars pass a certain street location according to a Poisson process with rate λ. A person wanting to cross the street at that location waits until she can see that no cars will come by in the next T time units. Find the expected time that the person waits before starting to cross. (Note, for instance, that if no cars will be passing in the first T time units then the waiting time is 0.)

2.12. Events, occurring according to a Poisson process with rate λ, are registered by a counter. However, each time an event is registered the counter becomes inoperative for the next b units of time and does not register any new events that might occur during that interval. Let R(t) denote the number of events that occur by time t and are registered.
(a) Find the probability that the first k events are all registered.
(b) For t ≥ (n − 1)b, find P{R(t) ≥ n}.

2.13. Suppose that shocks occur according to a Poisson process with rate λ, and suppose that each shock, independently, causes the system to fail with probability p. Let N denote the number of shocks that it takes for the system to fail and let T denote the time of failure. Find P{N = n | T = t}.

2.14. Consider an elevator that starts in the basement and travels upward. Let N_i denote the number of people that get in the elevator at floor i. Assume the N_i are independent and that N_i is Poisson with mean λ_i. Each person entering at i will, independently of everything else, get off at j with probability P_ij, where Σ_{j>i} P_ij = 1. Let O_j denote the number of people getting off the elevator at floor j.
(a) Compute E[O_j].
(b) What is the distribution of O_j?
(c) What is the joint distribution of O_j and O_k?

2.15. Consider an r-sided coin and suppose that on each flip of the coin exactly one of the sides appears: side i with probability P_i, Σ_{i=1}^{r} P_i = 1. For given numbers n_1, ..., n_r, let N_i denote the number of flips required until side i has appeared for the n_i-th time, i = 1, ..., r, and let

N = min_{i=1,...,r} N_i.

Thus N is the number of flips required until some side i has appeared n_i times.
(a) What is the distribution of N_i?
(b) Are the N_i independent?
Now suppose that the flips are performed at random times generated by a Poisson process with rate λ = 1. Let T_i denote the time until side i has appeared for the n_i-th time and let

T = min_{i=1,...,r} T_i.

(c) What is the distribution of T_i?
(d) Are the T_i independent?
(e) Derive an expression for E[T].
(f) Use (e) to derive an expression for E[N].

2.16. The number of trials to be performed is a Poisson random variable with mean λ. Each trial has n possible outcomes and, independent of everything else, results in outcome number i with probability P_i, Σ_{i=1}^{n} P_i = 1. Let X_j denote the number of outcomes that occur exactly j times, j = 0, 1, .... Compute E[X_j] and Var(X_j).

2.17. Let X_1, X_2, ..., X_n be independent continuous random variables with common density function f. Let X_(i) denote the i-th smallest of X_1, ..., X_n.
(a) Note that in order for X_(i) to equal x, exactly i − 1 of the X's must be less than x, one must equal x, and the other n − i must all be greater than x. Using this fact argue that the density function of X_(i) is given by

f_{X_(i)}(x) = [n!/((i − 1)!(n − i)!)] (F(x))^{i−1} (1 − F(x))^{n−i} f(x).

(b) X_(i) will be less than x if, and only if, how many of the X's are less than x?
(c) Use (b) to obtain an expression for P{X_(i) ≤ x}.
(d) Using (a) and (c) establish the identity

Σ_{k=i}^{n} C(n, k) y^k (1 − y)^{n−k} = ∫_0^y [n!/((i − 1)!(n − i)!)] x^{i−1}(1 − x)^{n−i} dx, 0 ≤ y ≤ 1.

(e) Let S_i denote the time of the i-th event of the Poisson process {N(t), t ≥ 0}. Find E[S_i | N(t) = n], treating separately the cases i ≤ n and i > n.

2.18. Let U_(1), ..., U_(n) denote the order statistics of a set of n uniform (0, 1) random variables. Show that given U_(n) = y, the variables U_(1), ..., U_(n−1) are distributed as the order statistics of a set of n − 1 uniform (0, y) random variables.
2.19. Busloads of customers arrive at an infinite server queue at a Poisson rate λ. Let G denote the service distribution. A bus contains j customers with probability a_j, j = 1, .... Let X(t) denote the number of customers that have been served by time t.
(a) E[X(t)] = ?
(b) Is X(t) Poisson distributed?

2.20. Suppose that each event of a Poisson process with rate λ is classified as being either of type 1, 2, ..., k. If the event occurs at s, then, independently of all else, it is classified as type i with probability P_i(s), i = 1, ..., k, Σ_{i=1}^{k} P_i(s) = 1. Let N_i(t) denote the number of type i arrivals in [0, t]. Show that the N_i(t), i = 1, ..., k, are independent, and that N_i(t) is Poisson distributed with mean λ ∫_0^t P_i(s) ds.

2.21. Individuals enter the system in accordance with a Poisson process having rate λ. Each arrival independently makes its way through the states of the system. Let α_i(s) denote the probability that an individual is in state i a time s after it arrived. Let N_i(t) denote the number of individuals in state i at time t. Show that the N_i(t), i ≥ 1, are independent and that N_i(t) is Poisson with mean equal to

λ E[amount of time an individual is in state i during its first t units in the system].

2.22. Suppose cars enter a one-way infinite highway at a Poisson rate λ. The i-th car to enter chooses a velocity V_i and travels at this velocity. Assume that the V_i's are independent positive random variables having a common distribution F. Derive the distribution of the number of cars that are located in the interval (a, b) at time t. Assume that no time is lost when one car overtakes another car.

2.23. For the model of Example 2.3(C), find
(a) Var[D(t)];
(b) Cov[D(t), D(t + s)].

2.24. Suppose that cars enter a one-way highway of length L in accordance with a Poisson process with rate λ. Each car travels at a constant speed that is randomly determined, independently from car to car, from the distribution F. When a faster car encounters a slower one, it passes it with no loss of time. Suppose that a car enters the highway at time t. Show that as t → ∞ the speed that minimizes the expected number of encounters with other cars, where we say an encounter occurs when a car is either passed by or passes another car, is the median of the distribution F.

2.25. Suppose that events occur in accordance with a Poisson process with rate λ, and that an event occurring at time s, independent of the past, contributes a random amount having distribution F_s, s ≥ 0. Show that W, the sum of all contributions by time t, is a compound Poisson random variable. That is, show that W has the same distribution as Σ_{i=1}^{N} X_i, where the X_i are independent and identically distributed random variables and are independent of N, a Poisson random variable. Identify the distribution of the X_i and the mean of N.

2.26. Compute the moment generating function of D(t) in Example 2.3(C).

2.27. Prove Lemma 2.3.3.

2.28. Complete the proof that for a nonhomogeneous Poisson process N(t + s) − N(t) is Poisson with mean m(t + s) − m(t).

2.29. Let T_1, T_2, ... denote the interarrival times of events of a nonhomogeneous Poisson process having intensity function λ(t).
(a) Are the T_i independent?
(b) Are the T_i identically distributed?
(c) Find the distribution of T_1.
(d) Find the distribution of T_2.

2.30. Consider a nonhomogeneous Poisson process {N(t), t ≥ 0}, where λ(t) > 0 for all t. Let

N*(t) = N(m^{−1}(t)).

Show that {N*(t), t ≥ 0} is a Poisson process with rate λ = 1.
2.31. (a) Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process with mean value function m(t). Given N(t) = n, show that the unordered set of arrival times has the same distribution as n independent and identically distributed random variables having distribution function

F(x) = m(x)/m(t) if x ≤ t; 1 if x > t.

(b) Suppose that workers incur accidents in accordance with a nonhomogeneous Poisson process with mean value function m(t). Suppose further that each injured person is out of work for a random amount of time having distribution F. Let X(t) be the number of workers who are out of work at time t. Compute E[X(t)] and Var(X(t)).

2.32. Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process with intensity function λ(t), t ≥ 0. However, suppose one starts observing the process at a random time τ having distribution function F. Let N*(t) = N(τ + t) − N(τ) denote the number of events that occur in the first t time units of observation.
(a) Does the process {N*(t), t ≥ 0} possess independent increments?
(b) Repeat (a) when {N(t), t ≥ 0} is a Poisson process.

2.33. Let C denote the number of customers served in an M/G/1 busy period. Find
(a) E[C];
(b) Var(C).

2.34. Let {X(t), t ≥ 0} be a compound Poisson process with X(t) = Σ_{i=1}^{N(t)} X_i, and suppose that the X_i can only assume a finite set of possible values. Argue that, for t large, the distribution of X(t) is approximately normal.

2.35. Let {X(t), t ≥ 0} be a compound Poisson process with X(t) = Σ_{i=1}^{N(t)} X_i, and suppose that λ = 1 and P{X_i = j} = j/10, j = 1, 2, 3, 4. Calculate P{X(4) = 20}.

2.36. Compute Cov(X(s), X(t)) for a compound Poisson process.

2.37. Give an example of a counting process {N(t), t ≥ 0} that is not a Poisson process but which has the property that, conditional on N(t) = n, the first n event times are distributed as the order statistics from a set of n independent uniform (0, t) random variables.

2.38. A two-dimensional Poisson process is a process of events in the plane such that (i) for any region of area A, the number of events in A is Poisson distributed with mean λA, and (ii) the numbers of events in nonoverlapping regions are independent. Consider a fixed point, and let X denote the distance from that point to its nearest event, where distance is measured in the usual Euclidean manner. Show that
(a) P{X > t} = e^{−λπt²};
(b) E[X] = 1/(2√λ).
(c) Let R_i, i ≥ 1, denote the distance from an arbitrary point to the i-th closest event to it. Show that, with R_0 = 0, the quantities πR_i² − πR_{i−1}², i ≥ 1, are independent exponential random variables, each with rate λ.

2.39. Repeat Problem 2.25 when the events occur according to a nonhomogeneous Poisson process with intensity function λ(t), t ≥ 0.

2.40. Let {N(t), t ≥ 0} be a conditional Poisson process.
(a) Explain why a conditional Poisson process has stationary but not independent increments.
(b) Compute the conditional distribution of Λ given {N(s), 0 ≤ s ≤ t}, the history of the process up to time t, and show that it depends on the history only through N(t). Explain why this is true.
(c) Compute the conditional distribution of the time of the first event after t, given that N(t) = n.
(d) Compute lim_{h→0} P{N(h) ≥ 1}/h.
(e) Let X_1, X_2, ... denote the interarrival times. Are they independent? Are they identically distributed?

2.41. Consider a conditional Poisson process in which the distribution of Λ is the gamma distribution with parameters m and a; that is, the density is given by

g(λ) = a e^{−aλ}(aλ)^{m−1}/(m − 1)!, 0 < λ < ∞.

(a) Show that

P{N(t) = n} = C(m + n − 1, n) (a/(a + t))^m (t/(a + t))^n, n ≥ 0.

(b) Show that the conditional distribution of Λ given N(t) = n is again gamma, with parameters m + n and a + t.
(c) What is

lim_{h→0} P{N(t + h) − N(t) = 1 | N(t) = n}/h?

REFERENCES

Reference 3 provides an alternate and mathematically easier treatment of the Poisson process. Corollary 2.5.4 was originally obtained in Reference 1 by using a generating function argument. Another approach to the Poisson process is given in Reference 2.

1. R. M. Adelson, "Compound Poisson Distributions," Operations Research Quarterly, 17, 73–75, 1966.
2. E. Cinlar, Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, NJ, 1975.
3. S. M. Ross, Introduction to Probability Models, 5th ed., Academic Press, Orlando, FL, 1993.
CHAPTER 3

Renewal Theory

3.1 INTRODUCTION AND PRELIMINARIES

In the previous chapter we saw that the interarrival times for the Poisson process are independent and identically distributed exponential random variables. A natural generalization is to consider a counting process for which the interarrival times are independent and identically distributed with an arbitrary distribution. Such a counting process is called a renewal process.

Formally, let {X_n, n = 1, 2, ...} be a sequence of nonnegative independent random variables with a common distribution F, and to avoid trivialities suppose that F(0) = P{X_n = 0} < 1. We shall interpret X_n as the time between the (n − 1)st and nth event. Let

μ = E[X_n] = ∫_0^∞ x dF(x)

denote the mean time between successive events, and note that from the assumptions that X_n ≥ 0 and F(0) < 1, it follows that 0 < μ ≤ ∞. Letting

S_0 = 0, S_n = Σ_{i=1}^{n} X_i, n ≥ 1,

it follows that S_n is the time of the nth event. As the number of events by time t will equal the largest value of n for which the nth event occurs before or at time t, we have that N(t), the number of events by time t, is given by

(3.1.1) N(t) = sup{n: S_n ≤ t}.

Definition 3.1.1 The counting process {N(t), t ≥ 0} is called a renewal process.

We will use the terms events and renewals interchangeably, and so we say that the nth renewal occurs at time S_n. Since the interarrival times are independent and identically distributed, it follows that at each renewal the process probabilistically starts over.

The first question we shall attempt to answer is whether an infinite number of renewals can occur in a finite time. To show that it cannot, we note, by the strong law of large numbers, that with probability 1,

S_n/n → μ as n → ∞.

But since μ > 0, this means that S_n must be going to infinity as n goes to infinity. Thus, S_n can be less than or equal to t for at most a finite number of values of n. Hence, N(t) must be finite, and we can write

N(t) = max{n: S_n ≤ t}.

3.2 DISTRIBUTION OF N(t)

The distribution of N(t) can be obtained, at least in theory, by first noting the important relationship that the number of renewals by time t is greater than or equal to n if, and only if, the nth renewal occurs before or at time t. That is,

(3.2.1) N(t) ≥ n ⟺ S_n ≤ t.

From (3.2.1) we obtain

(3.2.2) P{N(t) = n} = P{N(t) ≥ n} − P{N(t) ≥ n + 1} = P{S_n ≤ t} − P{S_{n+1} ≤ t}.

Now since the random variables X_i, i ≥ 1, are independent and have a common distribution F, it follows that S_n = Σ_{i=1}^{n} X_i is distributed as F_n, the n-fold convolution of F with itself. Therefore, from (3.2.2) we obtain

P{N(t) = n} = F_n(t) − F_{n+1}(t).

Let m(t) = E[N(t)].
m(t) is called the renewal function, and much of renewal theory is concerned with determining its properties. The relationship between m(t) and F is given by the following proposition.

PROPOSITION 3.2.1

(3.2.3) m(t) = Σ_{n=1}^{∞} F_n(t).

Proof Write N(t) = Σ_{n=1}^{∞} I_n, where

I_n = 1 if the nth renewal occurred in [0, t]; 0 otherwise.

Hence,

E[N(t)] = E[Σ_{n=1}^{∞} I_n] = Σ_{n=1}^{∞} P{I_n = 1} = Σ_{n=1}^{∞} P{S_n ≤ t},

where the interchange of expectation and summation is justified by the nonnegativity of the I_n.

The next proposition shows that N(t) has finite expectation.

PROPOSITION 3.2.2

m(t) < ∞ for all 0 ≤ t < ∞.

Proof Since P{X_n = 0} < 1, it follows by the continuity property of probabilities that there exists an α > 0 such that P{X_n ≥ α} > 0. Now define a related renewal process {X̄_n, n ≥ 1} by

X̄_n = 0 if X_n < α; α if X_n ≥ α,

and let N̄(t) = sup{n: X̄_1 + ... + X̄_n ≤ t}. Then it is easy to see that for the related process renewals can only take place at times t = nα, n = 0, 1, ..., and also that the numbers of renewals at each of these times are independent geometric random variables with mean 1/P{X_n ≥ α}. Thus,

E[N̄(t)] ≤ (t/α + 1) · 1/P{X_n ≥ α} < ∞,

and the result follows since X̄_n ≤ X_n implies that N(t) ≤ N̄(t).

Remark The above proof also shows that all moments of N(t) are finite for every t ≥ 0, since N̄(t), being a sum of independent geometric random variables, has finite moments of all orders.
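The renewal function is easy to estimate by simulation. The sketch below (ours; any interarrival sampler may be substituted for the exponential one used here) estimates m(t) = E[N(t)] and compares it with the exact value m(t) = λt that holds when F is exponential, since N(t) is then Poisson:

    import random
    random.seed(2)

    def count_renewals(t, draw):
        # the number of indices n with S_n = X_1 + ... + X_n <= t
        n, s = 0, draw()
        while s <= t:
            n += 1
            s += draw()
        return n

    lam, t, reps = 2.0, 10.0, 20000
    est = sum(count_renewals(t, lambda: random.expovariate(lam))
              for _ in range(reps)) / reps
    print(est, lam * t)   # estimate versus the exact value 20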
3.3 SOME LIMIT THEOREMS

If we let N(∞) = lim_{t→∞} N(t) denote the total number of renewals that occurs, then it is easy to see that N(∞) = ∞ with probability 1. This follows since the only way in which N(∞) can be finite is for one of the interarrival times to be infinite. Therefore,

P{N(∞) < ∞} = P{X_n = ∞ for some n} ≤ Σ_{n=1}^{∞} P{X_n = ∞} = 0.

Thus N(t) goes to infinity as t goes to infinity. However, it would be nice to know the rate at which N(t) goes to infinity; that is, we would like to be able to say something about lim_{t→∞} N(t)/t.

As a prelude to determining the rate at which N(t) grows, let us first consider the random variable S_{N(t)}. In words, just what does this random variable represent? Proceeding inductively, suppose, for instance, that N(t) = 3. Then S_{N(t)} = S_3 represents the time of the third event. Since there are only three events that have occurred by time t, S_3 also represents the time of the last event prior to (or at) time t. This is, in fact, what S_{N(t)} represents, namely, the time of the last renewal prior to or at time t. Similar reasoning leads to the conclusion that S_{N(t)+1} represents the time of the first renewal after time t (see Figure 3.3.1).

[Figure 3.3.1: The renewals near time t, showing S_{N(t)} ≤ t < S_{N(t)+1}.]

We may now prove the following.

PROPOSITION 3.3.1

With probability 1, N(t)/t → 1/μ as t → ∞.

Proof Since S_{N(t)} ≤ t < S_{N(t)+1}, we see that

(3.3.1) S_{N(t)}/N(t) ≤ t/N(t) < S_{N(t)+1}/N(t).

However, since S_{N(t)}/N(t) is the average of the first N(t) interarrival times, it follows by the strong law of large numbers that S_{N(t)}/N(t) → μ as N(t) → ∞. But since N(t) → ∞ when t → ∞, we obtain S_{N(t)}/N(t) → μ as t → ∞. Furthermore, writing

S_{N(t)+1}/N(t) = [S_{N(t)+1}/(N(t) + 1)] · [(N(t) + 1)/N(t)],

we have, by the same reasoning, that S_{N(t)+1}/N(t) → μ as t → ∞. The result now follows by (3.3.1) since t/N(t) is between two numbers, each of which converges to μ as t → ∞.

Thus Proposition 3.3.1 states that, with probability 1, the long-run rate at which renewals occur equals 1/μ. For this reason 1/μ is called the rate of the renewal process.

EXAMPLE 3.3(A) A container contains an infinite collection of coins. Each coin has its own probability of landing heads, and these probabilities are the values of independent random variables that are uniformly distributed over (0, 1). Suppose we are to flip coins sequentially, at any time either flipping a new coin or one that had previously been used. If our objective is to maximize the long-run proportion of flips that land on heads, how should we proceed?

Solution. We will exhibit a strategy that results in the long-run proportion of heads being equal to 1. Consider the strategy that initially chooses a coin and continues to flip it until it comes up tails. At this point that coin is discarded (never to be used again) and a new one is chosen. The process is then repeated. To determine the long-run proportion of heads, call it P_h, for this strategy, note that the times at which a flipped coin lands on tails constitute renewals. Hence, if we let N(n) denote the number of tails in the first n flips, then by Proposition 3.3.1

lim_{n→∞} N(n)/n = 1/E[number of flips between successive tails].

But, given its probability p of landing heads, the number of flips of a coin until it lands tails is geometric with mean 1/(1 − p). Hence, conditioning gives

E[number of flips between successive tails] = ∫_0^1 1/(1 − p) dp = ∞,

implying that, with probability 1,

lim_{n→∞} N(n)/n = 0,

and so the long-run proportion of heads is

P_h = 1 − lim_{n→∞} N(n)/n = 1.
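The strategy of Example 3.3(A) is simple to simulate; in the sketch below (ours) the proportion of heads creeps toward 1 quite slowly, as one would expect given that the mean number of flips between tails is infinite:

    import random
    random.seed(3)

    def proportion_heads(n_flips):
        heads, p = 0, random.random()      # p is the current coin's heads probability
        for _ in range(n_flips):
            if random.random() < p:
                heads += 1                  # head: keep flipping the same coin
            else:
                p = random.random()         # tail: discard the coin, draw a new one
        return heads / n_flips

    for n in (10**3, 10**5, 10**6):
        print(n, proportion_heads(n))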
We will now show that the expected average rate of renewals, m(t)/t, also converges to 1/μ. However, before presenting the proof, we will find it useful to digress to stopping times and Wald's equation.

3.3.1 Wald's Equation

Let X_1, X_2, ... denote a sequence of independent random variables. We have the following definition.

Definition An integer-valued random variable N is said to be a stopping time for the sequence X_1, X_2, ... if the event {N = n} is independent of X_{n+1}, X_{n+2}, ... for all n = 1, 2, ....

Intuitively, we observe the X_n's in sequential order and N denotes the number observed before stopping. If N = n, then we have stopped after observing X_1, ..., X_n and before observing X_{n+1}, X_{n+2}, ....

EXAMPLE 3.3(B) Let X_n, n = 1, 2, ..., be independent and such that

P{X_n = 0} = P{X_n = 1} = 1/2.

If we let N = min{n: X_1 + ... + X_n = 10}, then N is a stopping time. We may regard N as being the stopping time of an experiment that successively flips a fair coin and then stops when the number of heads reaches 10.

EXAMPLE 3.3(C) Let X_n, n = 1, 2, ..., be independent and such that

P{X_n = −1} = P{X_n = 1} = 1/2.

Then N = min{n: X_1 + ... + X_n = 1} is a stopping time. It can be regarded as the stopping time for a gambler who on each play is equally likely to either win or lose 1 unit and who decides to stop the first time he is ahead. (It will be shown in the next chapter that N is finite with probability 1.)

THEOREM 3.3.2 (Wald's Equation) If X_1, X_2, ... are independent and identically distributed random variables having finite expectations, and if N is a stopping time for X_1, X_2, ... such that E[N] < ∞, then

E[Σ_{n=1}^{N} X_n] = E[N]E[X].

Proof Letting

I_n = 1 if N ≥ n; 0 if N < n,

we have

Σ_{n=1}^{N} X_n = Σ_{n=1}^{∞} X_n I_n.

Hence,

(3.3.2) E[Σ_{n=1}^{N} X_n] = E[Σ_{n=1}^{∞} X_n I_n] = Σ_{n=1}^{∞} E[X_n I_n].

However, I_n = 1 if, and only if, we have not stopped after successively observing X_1, ..., X_{n−1}. Therefore, I_n is determined by X_1, ..., X_{n−1} and is thus independent of X_n. From (3.3.2) we thus obtain

E[Σ_{n=1}^{N} X_n] = Σ_{n=1}^{∞} E[X_n]E[I_n] = E[X] Σ_{n=1}^{∞} P{N ≥ n} = E[X]E[N].

Remark In Equation (3.3.2) we interchanged expectation and summation without justification. To justify this interchange, replace X_n by its absolute value throughout. In this case the interchange is justified as all terms are nonnegative; and when E|X| < ∞ this implies that the original interchange is allowable by Lebesgue's dominated convergence theorem.

An application of the conclusion of Wald's equation to Example 3.3(C) would yield

E[X_1 + ... + X_N] = E[N]E[X].

However, X_1 + ... + X_N = 1 by definition of N, and E[X] = 0, and so we would arrive at a contradiction. Thus Wald's equation is not applicable, which yields the conclusion that E[N] = ∞. For Example 3.3(B), on the other hand, X_1 + ... + X_N = 10 by definition of N, and so Wald's equation gives E[N] = 10/E[X] = 20.

3.3.2 Back to Renewal Theory

Let X_1, X_2, ... denote the interarrival times of a renewal process and let us stop at the first renewal after t; that is, at the (N(t) + 1)st renewal. To verify that N(t) + 1 is indeed a stopping time for the sequence of X_n, note that

N(t) + 1 = n ⟺ N(t) = n − 1 ⟺ X_1 + ... + X_{n−1} ≤ t, X_1 + ... + X_n > t.

Thus the event {N(t) + 1 = n} depends only on X_1, ..., X_n and is thus independent of X_{n+1}, ...; hence N(t) + 1 is a stopping time. From Wald's equation we obtain, when μ < ∞, that

E[X_1 + ... + X_{N(t)+1}] = E[X]E[N(t) + 1],

or, equivalently, we have the following.

Corollary 3.3.3 If μ < ∞, then

(3.3.3) E[S_{N(t)+1}] = μ(m(t) + 1).
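Wald's equation is easily checked empirically on Example 3.3(B). In the following sketch (ours), the sum at the stopping time is 10 by construction, so the sample mean of N should be close to E[N] = 10/E[X] = 20:

    import random
    random.seed(4)

    reps, tot_n = 20000, 0
    for _ in range(reps):
        s = n = 0
        while s < 10:                 # stop when X_1 + ... + X_n = 10
            s += random.randint(0, 1)
            n += 1
        tot_n += n
    print(tot_n / reps)               # approximately 20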
THEOREM 3.3.4 (The Elementary Renewal Theorem)

m(t)/t → 1/μ as t → ∞.

Proof Suppose first that μ < ∞. Now (see Figure 3.3.1)

S_{N(t)+1} > t.

Taking expectations and using Corollary 3.3.3 gives

μ(m(t) + 1) > t,

implying that

(3.3.4) lim inf_{t→∞} m(t)/t ≥ 1/μ.

To go the other way, we fix a constant M and define a truncated renewal process {X̄_n, n = 1, 2, ...} by letting

X̄_n = X_n if X_n ≤ M; M if X_n > M, n = 1, 2, ....

Let S̄_n = Σ_{i=1}^{n} X̄_i and N̄(t) = sup{n: S̄_n ≤ t}. Since the interarrival times for this truncated renewal process are bounded by M, we obtain

S̄_{N̄(t)+1} ≤ t + M.

Hence, by Corollary 3.3.3,

(m̄(t) + 1)μ_M ≤ t + M,

where μ_M = E[X̄_n]. Thus

(3.3.5) lim sup_{t→∞} m̄(t)/t ≤ 1/μ_M.

Now, since S̄_n ≤ S_n, it follows that N̄(t) ≥ N(t) and m̄(t) ≥ m(t); thus

lim sup_{t→∞} m(t)/t ≤ 1/μ_M.

Letting M → ∞ yields

(3.3.6) lim sup_{t→∞} m(t)/t ≤ 1/μ,

and the result follows from (3.3.4) and (3.3.6). When μ = ∞, we again consider the truncated process, and since μ_M → ∞ as M → ∞, the result follows from (3.3.5).

Remark At first glance it might seem that the elementary renewal theorem should be a simple consequence of Proposition 3.3.1. That is, since the average renewal rate will, with probability 1, converge to 1/μ, should this not imply that the expected average renewal rate also converges to 1/μ? We must, however, be careful; consider the following example. Let U be a random variable that is uniformly distributed on (0, 1), and define the random variables Y_n, n ≥ 1, by

Y_n = 0 if U > 1/n; n if U ≤ 1/n.

Now, since, with probability 1, U will be greater than 0, it follows that Y_n will equal 0 for all sufficiently large n. That is, Y_n will equal 0 for all n large enough so that 1/n < U. Hence, with probability 1,

Y_n → 0 as n → ∞.

However,

E[Y_n] = n P{U ≤ 1/n} = n(1/n) = 1.

Therefore, even though the sequence of random variables Y_n converges to 0, the expected values of the Y_n are all identically 1.

We will end this section by showing that N(t) is asymptotically, as t → ∞, normally distributed. To prove this result we make use both of the central limit theorem (to show that S_n is asymptotically normal) and the relationship

(3.3.7) N(t) < n ⟺ S_n > t.

THEOREM 3.3.5

Let μ and σ², assumed finite, represent the mean and variance of an interarrival time. Then

P{(N(t) − t/μ)/(σ√(t/μ³)) < y} → (1/√(2π)) ∫_{−∞}^{y} e^{−x²/2} dx as t → ∞.

Proof Let r_t = t/μ + yσ√(t/μ³). Then

P{(N(t) − t/μ)/(σ√(t/μ³)) < y} = P{N(t) < r_t}
  = P{S_{r_t} > t}    (by (3.3.7))
  = P{(S_{r_t} − r_t μ)/(σ√r_t) > (t − r_t μ)/(σ√r_t)}
  = P{(S_{r_t} − r_t μ)/(σ√r_t) > −y(1 + yσ/√(tμ))^{−1/2}}.

Now, by the central limit theorem, (S_{r_t} − r_t μ)/(σ√r_t) converges to a normal random variable having mean 0 and variance 1 as t (and thus r_t) approaches ∞. Also, since

−y(1 + yσ/√(tμ))^{−1/2} → −y as t → ∞,

we see that

P{(S_{r_t} − r_t μ)/(σ√r_t) > −y(1 + yσ/√(tμ))^{−1/2}} → (1/√(2π)) ∫_{−y}^{∞} e^{−x²/2} dx,

and since ∫_{−y}^{∞} e^{−x²/2} dx = ∫_{−∞}^{y} e^{−x²/2} dx, the result follows.

Remarks (i) There is a slight difficulty in the above argument since r_t should be an integer for us to use the relationship (3.3.7). It is not, however, too difficult to make the above argument rigorous. (ii) Theorem 3.3.5 states that N(t) is asymptotically normal with mean t/μ and variance tσ²/μ³.
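Theorem 3.3.5 can be checked numerically. With uniform (0, 1) interarrival times we have μ = 1/2 and σ² = 1/12, so N(t) should be approximately normal with mean t/μ = 2t and variance tσ²/μ³ = 2t/3; the sketch below (ours) estimates both moments at t = 50:

    import random
    random.seed(5)

    def N(t):
        n, s = 0, random.random()
        while s <= t:
            n += 1
            s += random.random()
        return n

    t, reps = 50.0, 20000
    xs = [N(t) for _ in range(reps)]
    mean = sum(xs) / reps
    var = sum((x - mean) ** 2 for x in xs) / reps
    print(mean, var)   # approximately 100 and 33.3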
3.4 THE KEY RENEWAL THEOREM AND APPLICATIONS

A nonnegative random variable X is said to be lattice if there exists d ≥ 0 such that Σ_{n=0}^{∞} P{X = nd} = 1. That is, X is lattice if it only takes on integral multiples of some nonnegative number d. The largest d having this property is said to be the period of X. If X is lattice and F is the distribution function of X, then we say that F is lattice.

We shall state without proof the following theorem.

THEOREM 3.4.1 (Blackwell's Theorem)

(i) If F is not lattice, then

m(t + a) − m(t) → a/μ as t → ∞, for all a ≥ 0.

(ii) If F is lattice with period d, then

E[number of renewals at nd] → d/μ as n → ∞.

Thus Blackwell's theorem states that if F is not lattice, then the expected number of renewals in an interval of length a, far from the origin, is approximately a/μ. This is quite intuitive, for as we go further away from the origin it would seem that the initial effects wear away and thus

(3.4.1) g(a) ≡ lim_{t→∞} [m(t + a) − m(t)]

should exist. However, if the above limit does in fact exist, then, as a simple consequence of the elementary renewal theorem, it must equal a/μ. To see this, note first that

g(a + h) = lim_{t→∞} [m(t + a + h) − m(t)]
        = lim_{t→∞} [m(t + a + h) − m(t + a) + m(t + a) − m(t)]
        = g(h) + g(a).

However, the only (increasing) solution of g(a + h) = g(a) + g(h) is

g(a) = ca, a > 0,

for some constant c. To show that c = 1/μ, define

X_1 = m(1) − m(0), X_2 = m(2) − m(1), ..., X_n = m(n) − m(n − 1), ....

Then X_n → g(1) = c, implying that

(X_1 + ... + X_n)/n → c, or m(n)/n → c.

Hence, by the elementary renewal theorem, c = 1/μ.

When F is lattice with period d, the limit in (3.4.1) cannot exist. For now renewals can only occur at integral multiples of d, and thus the expected number of renewals in an interval far from the origin would clearly depend not on the interval's length per se but rather on how many points of the form nd, n ≥ 0, it contains. Thus in the lattice case the relevant limit is that of the expected number of renewals at nd, and, again, if lim_{n→∞} E[number of renewals at nd] exists, then by the elementary renewal theorem it must equal d/μ. If interarrivals are always positive, then part (ii) of Blackwell's theorem states that, in the lattice case,

lim_{n→∞} P{renewal at nd} = d/μ.

Let h be a function defined on [0, ∞). For any a > 0 let m̄_n(a) be the supremum, and m_n(a) the infimum, of h(t) over the interval (n − 1)a ≤ t ≤ na. We say that h is directly Riemann integrable if Σ_{n=1}^{∞} m̄_n(a) and Σ_{n=1}^{∞} m_n(a) are finite for all a > 0 and

lim_{a→0} a Σ_{n=1}^{∞} m̄_n(a) = lim_{a→0} a Σ_{n=1}^{∞} m_n(a).

A sufficient condition for h to be directly Riemann integrable is that:

(i) h(t) ≥ 0 for all t ≥ 0;
(ii) h(t) is nonincreasing;
(iii) ∫_0^∞ h(t) dt < ∞.

The following theorem, known as the key renewal theorem, will be stated without proof.

THEOREM 3.4.2 (The Key Renewal Theorem)

If F is not lattice, and if h(t) is directly Riemann integrable, then

lim_{t→∞} ∫_0^t h(t − x) dm(x) = (1/μ) ∫_0^∞ h(t) dt,

where m(x) = Σ_{n=1}^{∞} F_n(x) and μ = ∫_0^∞ F̄(t) dt, with F̄ = 1 − F.

To obtain a feel for the key renewal theorem, start with Blackwell's theorem and reason as follows. By Blackwell's theorem we have that

lim_{t→∞} [m(t + a) − m(t)]/a = 1/μ,

and, hence,

lim_{a→0} lim_{t→∞} [m(t + a) − m(t)]/a = 1/μ.

Now, assuming that we can justify interchanging the limits, we obtain

lim_{t→∞} dm(t)/dt = 1/μ.

The key renewal theorem is a formalization of the above.

Blackwell's theorem and the key renewal theorem can be shown to be equivalent. Problem 3.12 asks the reader to deduce Blackwell from the key renewal theorem; the reverse can be proven by approximating a directly Riemann integrable function with step functions. In Section 9.3 a probabilistic proof of Blackwell's theorem is presented when F is continuous and has a failure rate function bounded away from 0 and ∞.

The key renewal theorem is a very important and useful result. It is used when one wants to compute the limiting value of g(t), some probability or expectation at time t. The technique we shall employ for its use is to derive an equation for g(t) by first conditioning on the time of the last renewal prior to (or at) t. This, as we will see, will yield an equation of the form

g(t) = h(t) + ∫_0^t h(t − x) dm(x).

We start with a lemma that gives the distribution of S_{N(t)}, the time of the last renewal prior to (or at) time t.

Lemma 3.4.3

P{S_{N(t)} ≤ s} = F̄(t) + ∫_0^s F̄(t − y) dm(y), t ≥ s ≥ 0.

Proof

P{S_{N(t)} ≤ s} = Σ_{n=0}^{∞} P{S_n ≤ s, S_{n+1} > t}
  = F̄(t) + Σ_{n=1}^{∞} P{S_n ≤ s, S_{n+1} > t}
  = F̄(t) + Σ_{n=1}^{∞} ∫_0^s P{S_n ≤ s, S_{n+1} > t | S_n = y} dF_n(y)
  = F̄(t) + Σ_{n=1}^{∞} ∫_0^s F̄(t − y) dF_n(y)
  = F̄(t) + ∫_0^s F̄(t − y) d(Σ_{n=1}^{∞} F_n(y))
  = F̄(t) + ∫_0^s F̄(t − y) dm(y),

where the interchange of integral and summation is justified since all terms are nonnegative.

Remarks

(1) It follows from Lemma 3.4.3 that

P{S_{N(t)} = 0} = F̄(t),  dF_{S_{N(t)}}(y) = F̄(t − y) dm(y), 0 < y ≤ t.

(2) To obtain an intuitive feel for the above, suppose that F is continuous with density f.
Then m(y) = Σ_{n=1}^{∞} F_n(y), and so, for y > 0,

dm(y) = Σ_{n=1}^{∞} f_n(y) dy = Σ_{n=1}^{∞} P{nth renewal occurs in (y, y + dy)} = P{renewal occurs in (y, y + dy)}.

So, the probability density of S_{N(t)} is

f_{S_{N(t)}}(y) dy = P{renewal in (y, y + dy), next interarrival > t − y} = dm(y) F̄(t − y).

We now present some examples of the utility of the key renewal theorem. Once again the technique we employ will be to condition on S_{N(t)}.

3.4.1 Alternating Renewal Processes

Consider a system that can be in one of two states: on or off. Initially it is on and it remains on for a time Z_1; it then goes off and remains off for a time Y_1; it then goes on for a time Z_2; then off for a time Y_2; then on, and so forth. We suppose that the random vectors (Z_n, Y_n), n ≥ 1, are independent and identically distributed. Hence, both the sequence of random variables {Z_n} and the sequence {Y_n} are independent and identically distributed; but we allow Z_n and Y_n to be dependent. In other words, each time the process goes on everything starts over again, but when it goes off we allow the length of the off time to depend on the previous on time.

Let H be the distribution of Z_n, G the distribution of Y_n, and F the distribution of Z_n + Y_n, n ≥ 1. Furthermore, let

P(t) = P{system is on at time t}.

THEOREM 3.4.4

If E[Z_n + Y_n] < ∞ and F is nonlattice, then

lim_{t→∞} P(t) = E[Z]/(E[Z] + E[Y]).

Proof Say that a renewal takes place each time the system goes on. Conditioning on the time of the last renewal prior to (or at) t yields

P(t) = P{on at t | S_{N(t)} = 0} P{S_{N(t)} = 0} + ∫_0^t P{on at t | S_{N(t)} = y} dF_{S_{N(t)}}(y).

Now

P{on at t | S_{N(t)} = 0} = P{Z_1 > t | Z_1 + Y_1 > t} = H̄(t)/F̄(t),

and, for y < t,

P{on at t | S_{N(t)} = y} = P{Z > t − y | Z + Y > t − y} = H̄(t − y)/F̄(t − y).

Hence, using Lemma 3.4.3,

P(t) = H̄(t) + ∫_0^t H̄(t − y) dm(y),

where m(y) = Σ_{n=1}^{∞} F_n(y). Now H̄(t) is clearly nonnegative and nonincreasing, and ∫_0^∞ H̄(t) dt = E[Z] < ∞. Since this last statement also implies that H̄(t) → 0 as t → ∞, we have, upon application of the key renewal theorem,

P(t) → ∫_0^∞ H̄(t) dt / ∫_0^∞ F̄(t) dt = E[Z]/(E[Z] + E[Y]).

If we let Q(t) = P{off at t} = 1 − P(t), then

Q(t) → E[Y]/(E[Z] + E[Y]).

We note that the fact that the system was initially on makes no difference in the limit.

Theorem 3.4.4 is quite important because many systems can be modelled by an alternating renewal process.
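A quick simulation (ours; the particular on and off distributions are chosen only for illustration) confirms Theorem 3.4.4 even when the off time depends on the preceding on time. Taking Z exponential with mean 1 and Y = Z/2 gives a limiting on-probability of E[Z]/(E[Z] + E[Y]) = 2/3:

    import random
    random.seed(6)

    def on_at(t):
        s = 0.0
        while True:
            z = random.expovariate(1.0)
            if s + z > t:
                return 1          # t falls inside an on interval
            s += z
            y = z / 2             # the off time depends on the previous on time
            if s + y > t:
                return 0
            s += y

    t, reps = 200.0, 20000
    print(sum(on_at(t) for _ in range(reps)) / reps)   # approximately 0.667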
"" = J; J: P{min(X,x) > y} dy/ErX] F(y) dy/}L· Similarly to obtain the limiting value of P{Y(t) x}, say that the system is "off" the last x units of a renewal cycle and "on" otherwise Thus the off time in a cycle is min(x, X), and so lim P{Y(t) s x} 1-+00 lim P{off at t} /-+3l = E[min(x, X)]/ E[X] = J: F(y) dy/}L· Thus summing up we have proven the following PROPOSITION 3.4.5 If the interarrival distribution is non lattice and IL < 00, then ~~~ P{Y(t) =:; x} = !~~ P{A(t) =:; x} J; F(y) dyllL We will find this technique of looking backward in time to be quite valuable in our studies of Markov chains in Chapters 4 and 5. We will now use alternating renewal process theory to obtain the limiting distribution of X N(r)+1 Again let an on-off cycle correspond to a renewal interval. (See Problem 3. is known as the inspection paradox. Thus by Theorem 3.) Another random variable of interest is XN(/)+I lently. But looking backwards the excess life at t is exactly the age at t of the original process. . XN(/)+I = A{t) = SN(/)+I - SN(r)' or. r E[XIX > xlF{x)/p. the time between successive events will still be independent and have distribution F Hence. for any x it is more likely that the length of the interval containing the point t is greater than x than it is that an ordinary renewal interval is greater than x.----"------=------''-'" = = p. That is. suppose it started at t = -00 Then if we look backwards in time. This result. equiva- + Y{t) Thus X N(r)+1 represents the length of the renewal interval that contains the point t In Problem 3 3 we prove that P{XN(r)+1 > x} ~ F{x) That is.4 4. we obtain E[on time in cycle] · I1m P{XN(I)+I > X } .THE KEY RENEWAL THEOREM AND APPLICATIONS 117 Remark To understand why the limiting distribution of excess and age are identical.14 for another way of relating the distributions of excess and age. consider the process after it has been in operation for a long time.. provided F is not lattice. for instance. y dF{y)/p. Then P{X N(. the system is either totally on during a cycle (if the renewal interval is greater than x) or totally off otherwise.)+I > x} = P{length of renewal interval containing t > x} = P{on at time t}. looking backwards we see an identically distributed renewal process. and say that the on time in the cycle is the total cycle time if that time is greater than x and is zero otherwise. which at first glance may seem surprising. Suppose that customers arrive at a store. then the amount ordered is EXAMPLE 3. ~ P{X(t) > x}.... then the density of the interval containing the point t. Y 2 .2) we see that this is indeed the limiting density. lim P{X(t) .4. Hence.x if x < s.? x in a cycle]. o The order is assumed to be instantaneously filled.. and suppose we desire lim(. N x = min{n: + . then an order is placed to bring it up to S Otherwise no order is placed Thus if the inventory level after serving a customer is x. equivalently. E[tlme of a cycle] •• . would be g(y) = y dF(y)/c (since dF(y) is the probability that an arbitrary interval is of length y and ylc the conditional probability that it contains the point).4.2) lim P{XN(r)+. Now if we let Y.118 RENEWAL THEORY or. in accordance with a renewal process having nonlattice interamval distribution F. For if this were the case. ifx. Remark To better understand the inspection paradox. from Theorem 3..4(A) S. ~ = E [amount of time. . then if we say that the system is "on" whenever the inventory level is at least x and "off" otherwise. reason as follows: Since the line is covered by renewal intervals. 
For another illustration of the varied uses of alternating renewal processes, consider the following example.

EXAMPLE 3.4(A) An Inventory Example. Suppose that customers arrive at a store, which sells a single type of commodity, in accordance with a renewal process having nonlattice interarrival distribution F. The amounts desired by the customers are assumed to be independent with a common distribution G. The store uses the following (s, S) ordering policy: if the inventory level after serving a customer is below s, then an order is placed to bring it up to S; otherwise no order is placed. Thus if the inventory level after serving a customer is x, then the amount ordered is

S − x if x < s; 0 if x ≥ s.

The order is assumed to be instantaneously filled.

Let X(t) denote the inventory level at time t, suppose that X(0) = S, and suppose we desire lim_{t→∞} P{X(t) ≥ x} for s ≤ x ≤ S. To determine it, say that the system is "on" whenever the inventory level is at least x and "off" otherwise; the above is then just an alternating renewal process, a cycle beginning each time the inventory level is brought up to S. If we let Y_1, Y_2, ... denote the successive customer demands and let

(3.4.3) N_x = min{n: Y_1 + ... + Y_n > S − x},

then it is the N_x-th customer in the cycle that causes the inventory level to fall below x, and it is the N_s-th customer that ends the cycle. Hence, if X_i, i ≥ 1, denote the interarrival times of customers, then

on time in a cycle = Σ_{i=1}^{N_x} X_i,  time of a cycle = Σ_{i=1}^{N_s} X_i.

Assuming that the interarrival times are independent of the successive demands, we thus have upon taking expectations and using Wald's equation

(3.4.4) lim_{t→∞} P{X(t) ≥ x} = E[Σ_{i=1}^{N_x} X_i]/E[Σ_{i=1}^{N_s} X_i] = E[N_x]/E[N_s].

However, as the Y_i, i ≥ 1, are independent and identically distributed, it follows from (3.4.3) that we can interpret N_x − 1 as the number of renewals by time S − x of a renewal process with interarrival distribution G. Hence,

E[N_x] = m_G(S − x) + 1,  E[N_s] = m_G(S − s) + 1,

where m_G(t) = Σ_{n=1}^{∞} G_n(t). Thus, from (3.4.4), we arrive at

lim_{t→∞} P{X(t) ≥ x} = (1 + m_G(S − x))/(1 + m_G(S − s)), s ≤ x ≤ S.

3.4.2 Limiting Mean Excess and the Expansion of m(t)

Let us start by computing the mean excess of a nonlattice renewal process. Conditioning on S_{N(t)} yields (by Lemma 3.4.3)

E[Y(t)] = E[Y(t) | S_{N(t)} = 0] F̄(t) + ∫_0^t E[Y(t) | S_{N(t)} = y] F̄(t − y) dm(y).

Now,

E[Y(t) | S_{N(t)} = 0] = E[X − t | X > t],
E[Y(t) | S_{N(t)} = y] = E[X − (t − y) | X > t − y],

where the above follows since S_{N(t)} = y means that there is a renewal at y and the next interarrival time, call it X, is greater than t − y. Hence,

E[Y(t)] = E[X − t | X > t] F̄(t) + ∫_0^t E[X − (t − y) | X > t − y] F̄(t − y) dm(y).

Now it can be shown that the function h(t) = E[X − t | X > t] F̄(t) is directly Riemann integrable provided E[X²] < ∞, and so by the key renewal theorem

E[Y(t)] → (1/μ) ∫_0^∞ E[X − t | X > t] F̄(t) dt
        = (1/μ) ∫_0^∞ ∫_t^∞ (x − t) dF(x) dt
        = (1/μ) ∫_0^∞ ∫_0^x (x − t) dt dF(x)    (by interchange of the order of integration)
        = (1/μ) ∫_0^∞ (x²/2) dF(x)
        = E[X²]/2μ.

Thus we have proven the following.

PROPOSITION 3.4.6

If the interarrival distribution is nonlattice and E[X²] < ∞, then

lim_{t→∞} E[Y(t)] = E[X²]/2μ.

Now S_{N(t)+1}, the time of the first renewal after t, can be expressed as

S_{N(t)+1} = t + Y(t)

(see Figure 3.4.1).

[Figure 3.4.1: The renewal interval X = X_{N(t)+1} straddling t, with excess Y(t) = S_{N(t)+1} − t.]
Taking expectations and using Corollary 3.3.3, we have

μ(m(t) + 1) = t + E[Y(t)],

or

m(t) − t/μ = E[Y(t)]/μ − 1.

Hence, from Proposition 3.4.6 we obtain the following.

Corollary 3.4.7

If E[X²] < ∞ and F is not lattice, then as t → ∞

m(t) − t/μ → E[X²]/2μ² − 1.

3.4.3 Age-Dependent Branching Processes

Suppose that an organism at the end of its lifetime produces a random number of offspring in accordance with the probability distribution {P_j, j = 0, 1, 2, ...}. Assume further that all offspring act independently of each other and produce their own offspring in accordance with the same probability distribution {P_j}. Finally, let us assume that the lifetimes of the organisms are independent random variables with some common distribution F.

Let X(t) denote the number of organisms alive at t. The stochastic process {X(t), t ≥ 0} is called an age-dependent branching process. We shall concern ourselves with determining the asymptotic form of M(t) = E[X(t)] when m = Σ_{j=0}^{∞} j P_j > 1.

THEOREM 3.4.8

If X(0) = 1, m > 1, and F is not lattice, then

e^{−αt} M(t) → (m − 1)/(αm² ∫_0^∞ x e^{−αx} dF(x)) as t → ∞,

where α is the unique positive number such that

∫_0^∞ e^{−αx} dF(x) = 1/m.

Proof By conditioning on T_1, the lifetime of the initial organism, we obtain

M(t) = ∫_0^∞ E[X(t) | T_1 = s] dF(s).

Now,

(3.4.5) E[X(t) | T_1 = s] = 1 if s > t; m M(t − s) if s ≤ t.

To see why (3.4.5) is true, suppose that T_1 = s, s ≤ t, and suppose further that the organism has j offspring. Then the number of organisms alive at t may be written as Y_1 + ... + Y_j, where Y_i is the number of descendants (including himself) of the ith offspring that are alive at t. Clearly, Y_1, ..., Y_j are independent with the same distribution as X(t − s). Thus E[Y_1 + ... + Y_j] = jM(t − s), and (3.4.5) follows by taking the expectation (with respect to j) of jM(t − s). Thus, from the above, we obtain

(3.4.6) M(t) = F̄(t) + m ∫_0^t M(t − s) dF(s).

Now, let α denote the unique positive number such that m ∫_0^∞ e^{−αs} dF(s) = 1, and define the distribution G by

G(s) = m ∫_0^s e^{−αy} dF(y), 0 ≤ s < ∞.

Upon multiplying both sides of (3.4.6) by e^{−αt} and using the fact that dG(s) = m e^{−αs} dF(s), we obtain

(3.4.7) e^{−αt}M(t) = e^{−αt}F̄(t) + ∫_0^t e^{−α(t−s)} M(t − s) dG(s).

Letting f(t) = e^{−αt}M(t) and h(t) = e^{−αt}F̄(t), we have, from (3.4.7), using convolution notation,

f = h + f * G = h + G * f
  = h + G * (h + G * f) = h + G * h + G_2 * f
  = h + G * h + G_2 * (h + G * f) = h + G * h + G_2 * h + G_3 * f
  = h + G * h + G_2 * h + ... + G_n * h + G_{n+1} * f.

Now since G is the distribution of some nonnegative random variable, it follows that G_n(t) → 0 as n → ∞ (why?), and so letting n → ∞ in the above yields

f = h + h * m_G,

or

(3.4.8) f(t) = h(t) + ∫_0^t h(t − s) dm_G(s),

where m_G(s) = Σ_{n=1}^{∞} G_n(s). Now it can be shown that h(t) is directly Riemann integrable, and thus by the key renewal theorem

f(t) → ∫_0^∞ h(t) dt / μ_G,

where

(3.4.9) ∫_0^∞ h(t) dt = ∫_0^∞ e^{−αt}F̄(t) dt = ∫_0^∞ e^{−αt} ∫_t^∞ dF(x) dt = ∫_0^∞ ∫_0^x e^{−αt} dt dF(x) = (1/α) ∫_0^∞ (1 − e^{−αx}) dF(x) = (1/α)(1 − 1/m)    (by the definition of α),

and

(3.4.10) μ_G = ∫_0^∞ x dG(x) = m ∫_0^∞ x e^{−αx} dF(x).

Thus, from (3.4.8), (3.4.9), and (3.4.10), we obtain

e^{−αt}M(t) → (m − 1)/(αm² ∫_0^∞ x e^{−αx} dF(x)) as t → ∞.

3.5 DELAYED RENEWAL PROCESSES

We often consider a counting process for which the first interarrival time has a different distribution from the remaining ones. For instance, we might start observing a renewal process at some time t > 0. If a renewal does not occur at t, then the distribution of the time we must wait until the first observed renewal will not be the same as the remaining interarrival distributions.

Formally, let {X_n, n = 1, 2, ...} be a sequence of independent nonnegative random variables with X_1 having distribution G, and X_n having distribution F, n > 1. Let S_0 = 0, S_n = Σ_{i=1}^{n} X_i, n ≥ 1, and define

N_D(t) = sup{n: S_n ≤ t}.

Definition The stochastic process {N_D(t), t ≥ 0} is called a general or a delayed renewal process. When G = F we have, of course, an ordinary renewal process.

As in the ordinary case, we have

P{N_D(t) = n} = P{S_n ≤ t} − P{S_{n+1} ≤ t} = G * F_{n−1}(t) − G * F_n(t).

Let m_D(t) = E[N_D(t)]. Then it is easy to show that

(3.5.1) m_D(t) = Σ_{n=1}^{∞} G * F_{n−1}(t),

and by taking transforms of (3.5.1) we obtain

(3.5.2) m̃_D(s) = G̃(s)/(1 − F̃(s)).

By using the corresponding results for the ordinary renewal process, it is easy to prove similar limit theorems for the delayed process. We leave the proof of the following proposition for the reader.
PROPOSITION 3.5.1

Let μ = ∫_0^∞ x dF(x). Then:

(i) With probability 1, N_D(t)/t → 1/μ as t → ∞.
(ii) m_D(t)/t → 1/μ as t → ∞.
(iii) If F is not lattice, then m_D(t + a) − m_D(t) → a/μ for all a ≥ 0.
(iv) If F and G are lattice with period d, then E[number of renewals at nd] → d/μ as n → ∞.
(v) If F is not lattice, μ < ∞, and h is directly Riemann integrable, then

∫_0^t h(t − x) dm_D(x) → (1/μ) ∫_0^∞ h(t) dt.

EXAMPLE 3.5(A) Suppose that a sequence of independent and identically distributed discrete random variables X_1, X_2, ... is observed, and suppose that we keep track of the number of times that a given subsequence of outcomes, or pattern, occurs. That is, suppose the pattern is x_1, x_2, ..., x_k, and say that it occurs at time n if X_n = x_k, X_{n−1} = x_{k−1}, ..., X_{n−k+1} = x_1. For instance, if the pattern is 0, 1, 0, 1 and the sequence is (X_1, X_2, ...) = (1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, ...), then the pattern occurred at times 5, 7, and 13.

If we let N(n) denote the number of times the pattern occurs by time n, then {N(n), n ≥ 1} is a delayed renewal process. The distribution until the first renewal is the distribution of the time until the pattern first occurs, whereas the subsequent interarrival distribution is the distribution of the time between replications of the pattern.

Suppose we want to determine the rate at which the pattern occurs. By the strong law for delayed renewal processes (part (i) of Proposition 3.5.1) this will equal the reciprocal of the mean time between patterns. But by Blackwell's theorem (part (iv) of Proposition 3.5.1) this reciprocal is just the limiting probability of a renewal at time n. That is,

(E[time between patterns])^{−1} = lim_{n→∞} P{pattern at time n} = Π_{i=1}^{k} P{X = x_i}.

Hence the rate at which the pattern occurs is Π_{i=1}^{k} P{X = x_i}, and the mean time between patterns is (Π_{i=1}^{k} P{X = x_i})^{−1}.

For instance, if each random variable is 1 with probability p and 0 with probability q = 1 − p, then the mean time between patterns of 0, 1, 0, 1 is p^{−2}q^{−2}. Suppose now that we are interested in the expected time at which the pattern 0, 1, 0, 1 first occurs. Since in order for 0, 1, 0, 1 to occur we must first obtain 0, 1, it follows that

E[time to 0, 1, 0, 1] = E[time to 0, 1] + E[additional time to go from 0, 1 to 0, 1, 0, 1].

But immediately after the pattern 0, 1, 0, 1 occurs, its final two outcomes 0, 1 are exactly what is needed to begin the next occurrence; hence the time to go from 0, 1 to 0, 1, 0, 1 has the same distribution as the time between successive occurrences of the pattern, and so has mean p^{−2}q^{−2}. By the same logic applied to the pattern 0, 1 (an occurrence of which leaves nothing in hand toward the next occurrence), the expected time of its first occurrence equals the expected time between its occurrences, namely 1/(pq). Hence

E[time to 0, 1, 0, 1] = 1/(pq) + p^{−2}q^{−2}.

The above argument can be used to compute the expected number of outcomes needed for any specified pattern to appear. For instance, if a coin with probability p (q = 1 − p) of coming up heads is successively flipped, then

E[time until HTHHTHH] = E[time until HTHH] + p^{−5}q^{−2}
  = E[time until H] + p^{−3}q^{−1} + p^{−5}q^{−2}
  = p^{−1} + p^{−3}q^{−1} + p^{−5}q^{−2}.

Also,

E[time until k consecutive heads appear] = E[time until k − 1 consecutive heads appear] + (1/p)^k = Σ_{i=1}^{k} (1/p)^i.
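The reasoning just given amounts to the following rule: the expected time until a pattern appears is the sum, over every prefix of the pattern that is also a suffix (the pattern itself included), of the reciprocal of that prefix's probability. A short Python sketch of this rule (ours):

    def expected_time(pattern, prob):
        # sum of 1/P(prefix) over prefixes that are also suffixes
        total = 0.0
        for j in range(1, len(pattern) + 1):
            if pattern[:j] == pattern[-j:]:
                p = 1.0
                for c in pattern[:j]:
                    p *= prob[c]
                total += 1.0 / p
        return total

    print(expected_time('HTHHTHH', {'H': 0.5, 'T': 0.5}))   # 2 + 2**4 + 2**7 = 146

Here the qualifying prefixes of HTHHTHH are H, HTHH, and HTHHTHH itself, in agreement with the decomposition displayed above.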
Suppose now that we want to compute the probability that a given pattern, say pattern A, occurs before a second pattern, say pattern B. For instance, suppose a coin with probability p (q = 1 − p) of coming up heads is successively flipped, and that we are interested in the probability that A = HTHT occurs before B = THTT. To obtain this probability we will find it useful to first consider the expected additional time, after a given one of these patterns occurs, until the other one does. Let N_A denote the number of flips until A occurs, and let N_{B|A} denote the number of additional flips needed for B to appear starting with A (that is, given that A has just occurred); define N_B and N_{A|B} similarly.

Since the final three outcomes THT of A = HTHT are the first three outcomes of B = THTT,

E[N_{B|A}] = E[additional number to THTT starting with HTHT]
          = E[additional number to THTT starting with THT]
          = E[N_{THTT}] − E[N_{THT}],

where the last equality follows because E[N_{THTT}] = E[N_{THT}] + E[N_{THTT|THT}]. Now

E[N_{THTT}] = E[N_T] + q^{−3}p^{−1} = q^{−1} + q^{−3}p^{−1},
E[N_{THT}] = E[N_T] + q^{−2}p^{−1} = q^{−1} + q^{−2}p^{−1},

and so E[N_{B|A}] = q^{−3}p^{−1} − q^{−2}p^{−1}. Also, since no suffix of B = THTT is a prefix of A = HTHT,

E[N_{A|B}] = E[N_A] = p^{−2}q^{−2} + p^{−1}q^{−1}.

To compute P_A = P{A before B}, let M = min(N_A, N_B). Then

E[N_A] = E[M] + E[N_A − M]
      = E[M] + E[N_A − M | B before A](1 − P_A)
      = E[M] + E[N_{A|B}](1 − P_A).

Similarly,

E[N_B] = E[M] + E[N_{B|A}] P_A.

Solving these equations yields

P_A = (E[N_B] + E[N_{A|B}] − E[N_A]) / (E[N_{A|B}] + E[N_{B|A}]),
E[M] = E[N_B] − E[N_{B|A}] P_A.

For instance, suppose that p = 1/2. Then

E[N_A] = 2^4 + 2^2 = 20,  E[N_B] = 2 + 2^4 = 18,  E[N_{B|A}] = 2^4 − 2^3 = 8,  E[N_{A|B}] = E[N_A] = 20.

Thus

P_A = (18 + 20 − 20)/(20 + 8) = 9/14,  E[M] = 18 − 8(9/14) = 90/7.

Therefore, the expected number of flips until the pattern A appears is 20, the expected number until the pattern B appears is 18, the expected number until either appears is 90/7, and the probability that pattern A appears first is 9/14 (which is somewhat counterintuitive since E[N_A] > E[N_B]).
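These values are easily corroborated by simulation. The sketch below (ours) estimates both P_A and E[M] for a fair coin:

    import random
    random.seed(8)

    def race(a='HTHT', b='THTT'):
        s = ''
        while True:
            s += random.choice('HT')
            if s.endswith(a):
                return 1, len(s)
            if s.endswith(b):
                return 0, len(s)

    reps = 20000
    results = [race() for _ in range(reps)]
    print(sum(w for w, _ in results) / reps,    # approximately 9/14 = 0.643
          sum(n for _, n in results) / reps)    # approximately 90/7 = 12.86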
EXAMPLE 3.5(B) A system consists of n independent components, each of which acts like an exponential alternating renewal process. More specifically, component i, i = 1, ..., n, is up for an exponential time with mean λ_i, and then goes down, in which state it remains for an exponential time with mean μ_i, before going back up and starting anew. Suppose that the system is said to be functional at any time if at least one component is up at that time (such a system is called parallel). If we let N(t) denote the number of times the system becomes nonfunctional (that is, breaks down) in [0, t], then {N(t), t ≥ 0} is a delayed renewal process.

Suppose we want to compute the mean time between system breakdowns. To do so, let us first look at the probability of a breakdown in (t, t + h) for large t and small h. Now one way for a breakdown to occur in (t, t + h) is to have exactly 1 component up at time t and all others down, and then have that component fail. Since all other possibilities taken together clearly have probability o(h), we see that

lim_{t→∞} P{breakdown in (t, t + h)} = Σ_{i=1}^{n} [λ_i/(λ_i + μ_i)] Π_{j≠i} [μ_j/(λ_j + μ_j)] (h/λ_i) + o(h)
  = (Π_{i=1}^{n} [μ_i/(λ_i + μ_i)] Σ_{i=1}^{n} 1/μ_i) h + o(h).

But by Blackwell's theorem the above is just h times the reciprocal of the mean time between breakdowns, and so upon letting h → 0 we obtain

E[time between breakdowns] = (Π_{i=1}^{n} [μ_i/(λ_i + μ_i)] Σ_{i=1}^{n} 1/μ_i)^{−1}.

As the expected length of a breakdown period is (Σ_{i=1}^{n} 1/μ_i)^{−1} (when all components are down, the down period ends when the first component comes back up, and component i completes its repair at rate 1/μ_i), we can compute the average length of an up (or functional) period from

E[length of up period] = E[time between breakdowns] − (Σ_{i=1}^{n} 1/μ_i)^{−1}
  = (1 − Π_{i=1}^{n} μ_i/(λ_i + μ_i)) / (Π_{i=1}^{n} [μ_i/(λ_i + μ_i)] Σ_{i=1}^{n} 1/μ_i).

As a check of the above, note that the system may be regarded as a delayed alternating renewal process whose limiting probability of being down is

lim_{t→∞} P{system is down at t} = Π_{i=1}^{n} μ_i/(λ_i + μ_i).

We can now verify that the above is indeed equal to the expected length of a down period divided by the expected length of a cycle (or time between breakdowns).

EXAMPLE 3.5(C) Consider two coins, and suppose that each time coin i is flipped it lands on tails with some unknown probability p_i, i = 1, 2. Our objective is to continually flip among these coins so as to make the long-run proportion of tails equal to min(p_1, p_2). The following strategy, having a very small memory requirement, will accomplish this objective. Start by flipping coin 1 until a tail occurs, at which point switch to coin 2 and flip it until a tail occurs. Say that cycle 1 ends at this point. Now flip coin 1 until two tails in a row occur, and then switch and do the same with coin 2. Say that cycle 2 ends at this point. In general, when cycle n ends, return to coin 1 and flip it until n + 1 tails in a row occur and then flip coin 2 until this occurs, which ends cycle n + 1.

To show that the preceding policy meets our objective, let p = max(p_1, p_2) and ap = min(p_1, p_2), where a ≤ 1 (if a = 1 then all policies meet the objective, so suppose a < 1). Call the coin with tail probability p the bad coin and the one with probability ap the good one. Let B_m denote the number of flips in cycle m that use the bad coin and let G_m be the number that use the good coin. We will need the following lemma.

Lemma For any ε > 0,

P{B_m ≥ εG_m for infinitely many m} = 0.

Proof We will show that

Σ_m P{B_m ≥ εG_m} < ∞,

which, by the Borel-Cantelli lemma (see Section 1.1), will establish the result. Now, supposing that the first coin used in each cycle is the good coin, we have, since B_m and G_m are independent, that

P{B_m ≥ εG_m} = Σ_{i≥m} P{G_m = i} P{B_m ≥ εi} ≤ (ap)^m Σ_{i≥1} P{B_m ≥ εi} ≤ (ap)^m E[B_m]/ε,

where the first inequality follows from the fact that G_m = i implies that i ≥ m and that cycle m good flips numbered i − m + 1 to i must all be tails, so that P{G_m = i} ≤ (ap)^m. But, from Example 3.5(A),

E[B_m] = Σ_{r=1}^{m} (1/p)^r ≤ m(1/p)^m.

Therefore,

P{B_m ≥ εG_m} ≤ (1/ε) m a^m,

which is summable since a < 1, and the lemma is proved.

Thus, using the lemma, we see that, with probability 1, B_m < εG_m in all but a finite number of cycles; and so in all but a finite number of cycles the proportion B_m/(B_m + G_m) of flips that use the bad coin will be less than ε/(1 + ε) < ε. Since ε is arbitrary, this implies that, with probability 1, the long-run proportion of flips that use the bad coin is 0. As a result, the long-run proportion of flips that land tails is equal to the long-run proportion of good-coin flips that land tails, and this, by the strong law of large numbers, is, with probability 1, equal to ap = min(p_1, p_2). (Whereas the preceding argument supposed that the good coin is used first in each cycle, this is not necessary, and a similar argument can be given to show that the result holds without this assumption.)
e- SX dF.2) and (3.5 that the observed process is the equilibrium renewal process The stationarity of this process is proven in the next theorem.tS . X 2: 0. THEOREM 3. For suppose that we start observing a renewal process at time t. t ~ O} has stationary increments Proof (0 From (35. x} = Ft(x) for all t ~ 0. Joo (1 - e.(x) = J: F(y) dy/p. Let YD(t) denote the excess at t for a delayed renewal process.. P{SN(ll ~ s} = G(t) + J: F(r - y) dmD(y).DELA YED RENEWAL PROCESSES 131 In the same way we proved the result in the case of an ordinary renewal process.(x) SX r SX dF(y)dx/p.5. for t large.2 For the equilibrium renewal process (0 mD(t) = till-. (0) P{YD(t) :s. using (353). letting G f: F(t + x . and thus by the uniqueness of transforms. where the initial distribution is the distribution of Y D(S) The result then follows from (ii) 3.(t + x) + r I F(y) dyllJ- = F. > t} _G(t+x) G(t) P{YD(t) >xlsN(I) = s} = P{X> t + X _F(t+x-s) F(t .ND(S) may be interpreted as the number of renewals in time t of a delayed renewal process.s} Hence. P{YD(t) > x} = G(t + x) + Now. We denote by Rn the reward earned at the time of the nth renewal. and suppose that each time a renewal occurs we receive a reward. We shall assume that the R n .132 RENEWAL THEORY However.(x). . upon conditioning on SN(r) we obtain. n 2! I.(t + x) + = F. Consider a renewal process {N(t).s) dsllJ- = F.s) dmD(s) J: F(t + x . we obtain mD(t) = tllJ(ii) For a delayed renewal process. n 2! 1 with distribution F. = O} = P{X. > t + xiX. simple calculus shows that 1IJ-LS is the Laplace transform of the function h(t) = tllJ-. I.s) six> t .6 RENEWAL REWARD PROCESSES A large number of probability models are special cases of the following model. (iii) To prove (iii) we note that ND(t + s) . t ::> O} having interarrival times X n . are independent and identically distributed. and using part (i) yields P{YD(t) > x} = F. P{YD(t) > x} = P{YD(t) > XISN(I) = O}G (t) + Now P{YD(t) > XISN(I) f~ P{YD(t) > XISN(I) = s}F(t - s) dmD(s). (ii) _E. n ~ 1. and so we assume that the pairs (Xn' R n). If we let N(I) R(t) = ns \ 2: Rn.1 If E[R] < 00 and E[X] < 00. are independent and identically distributed.u. ---'jo-- R(I) I E[R] E[Xl as I --'jo as t --'jo 00.6.) [_R t E[Xl 00 Proof To prove (i) write N(II = ( ~ Rn N(t) ) (N(t») t By the strong law of large numbers we obtain that as I --'jo and by the strong law for renewal processes ---'jo-- 00..(t. then R(t) represents the total reward earned by time t.-[ 1--'jo _E ] R->. then (i) with probability 1. Let THEOREM 3. N(t) t 1 E[X] Thus (i) is proven . we do allow for the possibility that Rn may (and usually will) depend on X n . the length of the nth renewal interval.RENEWAL REWARD PROCESSES 133 However.. s.T) + EIRd m(t) .s. + 1-.] = E [N~' Ri] . let = 0] =: E[R" X. E[RN(I)_IISN(I) E[RN(I)-dSN(..]lt~ 0 as t ~ 00 So.x)1 dm(x) tOt . let g(t) == E[RN(I)+'] Then conditioning on SN!II yields g(t) = E[RN(II-dSN(1l = O]F(t) + J: E[RN(I)+dSN(I) s]F(t . To prove (ii) we first note that since N(t) + 1 is a stopping time for the sequence .) and so (361) Now.T) i t ast~ 00 t ~- s EX . Ig(t)1 . EfiRd IX. and thus we can choose T so that Ih(t)1 < s whenever t ~ T Hence. > 1]. 1-. s] and note that since EIRd = f.-1 . Ih(/)1 + f'-Tlh(1 .-T t t :::.I for all I. (Why?) Thus. towards this end.x)1 dm(x) + f' Ih(t . from (361).1 E[RNwd E[R N (I)+I] = (m(t) + 1)E[R] and so E[R(t)] t = m(t) + 1 E[R] t _ E[RN(!l+.s) dm(s) However.134 RENEWAL THEORY X" X 2 .m(t .s].it is also a stopping time for R" Rz. E[Rnl Xn > t . 
h(t) ~ 0 as t ~ 00 and h(t). by Wald's equation. it follows that x] dF(x) < 00.J t ' and the result will follow from the elementary renewal theorem if we can show that E[RN(I)_. E [~ R. EIR.m(t . 1) suppose that we earn at a rate of one per unit time when the system is on (and thus the reward for a cycle equals the on time of that cycle). RN(I)+I is related to XN(I) +-I . In the proof of the theorem it is tempting to say that E[RN(I)+d = E[Rd and thus lItE[R N (I)+d tnvially converges to zero.6. • • ~N(r) ~N(I)+I . t]. Also. and thus by Theorem 36. then the theorem states that the (expected) long-run average return is just the expected return earned during a cycle. and suppose first that all returns are nonnegative Then and (ii) of Theorem 3. this is not essential. and the result follows Remarks If we say that a cycle is completed every time a renewal occurs. it (heunstically) follows that XN(I)+I tends to be larger than an ordinary renewal interval (see Problem 3 3). and the general case follows by breaking up the returns into their positive and negative parts and applying the above argument separately to each.6 1 follows since Part (I) of Theorem 361 follows by noting that both ""n~1 Rn1t and ""n~1 Rn1t converge to E [R]I E [X] by the argument given in the proof. . M amount 0 f on time In t ---+ E [X] + E [Y] . However. with probability 1. Then the total reward earned by t is just the total on time in [0. and Theorem 3. up to now we have assumed that the reward is earned all at once at the end of the renewal cycle. ExAMPLE 3. it follows that g(t)lt --+ 0.1 remains true if the reward is earned gradually during the renewal cycle. To see this.1.and XN(I)+-I is the length of the renewal interval containing the point t Since larger renewal intervals have a greater chance of containing t. E[X] . and thus the distnbution of RN(I)+I is not that of R . let R(t) denote the reward earned by t.RENEWAL REWARD PROCESSES 135 by the elementary renewal theorem Since e is arbitrary. A similar argument holds when the returns are nonpositive. However. divided by the expected time of a cycle. For an alternating renewal process (see Section 34.6{A) Alternating Renewal Process. . by Theorem 3. f:A(s) ds _ E[X2] !:~ t . Similarly if Y(t) denotes the excess at t. Now since the age of the renewal process a time s into a renewal cycle is just s. and so J~ A(s) ds represents our total earnings by time t As everything starts over again when a renewal occurs.1.4.6(8) Average Age and Excess. it follows that. and suppose we are interested in computing To do so assume that we are being paid money at any time at a rate equal to the age of the renewal process at tpat time. Then the average value of the excess will.t) dt] E[X] _ E[X2] . Hence.6.4 when the cycle distribution is nonlattice the limiting probability of the system being on is equal to the long-run proportion of time it is on. Let A (t) denote the age at t of a renewal process. ExAMPLE 3. at time s we are being paid at a rate A(s). we have reward during a renewal cycle = I x 0 s ds = 2'"' X2 where X is the time of the renewal cycle. I: A (s) ds t E [reward dunng a renewal cycle] --+ E [time of a renewal cycle] . with probability 1. Thus by Theorem 3. be given by \ u I· L~ I' Y(s s t = E[reward during a renewal cycle] )d / E[X] 0 E [f: (X . That is. with probability 1.2E[X]' .2E[X)' . 
we can compute the average excess by supposing that we are earning rewards at a rate equal to the excess at that time.136 RENEWAL THEORY where X is an on time and Y an off time in a cycle. 6(c) Suppose that travelers arrive at a train depot in accordance with a renewal process having a mean interarrival time IL. + (N . The expected length of a cycle is the expected time required for N travelers to arrive. then the expected cost of a cycle may be expressed as E[cost ofa cycle] = E[cX. this equals E[length of cycle] = NIL.SN(t) represents the length of the renewal interVal containing the point t. since the mean interarrival time is IL.. Whenever there are N travelers waiting in the depot. It 0 _ E[X2] XN(s)+1 dslt . and. Since it may also be expressed by XN(I)+I = A(t) + yet). then the above is a renewal reward process. If the depot incurs a cost at the rate of nc dollars per unit time whenever there are n travelers waiting and an additional cost of K each time a train is dispatched. (Why is this not surprising?) ExAMPLE 3.l)cXN-d + K 1) + K = CIL N (N - 2 Hence the average cost incurred is . If we let Xn denote the time between the nth and (n + 1)st amval in a cycle. a train leaves.RENEWAL REWARD PROCESSES 137 Thus the average values of the age and excess are equal. + 2cX2 + . (Why was this to be expected?) The quantity XN(I) + I = SN(I)+I .E[X]' E[X2] ~ E[X] E[X] (with equality only when Var(X) = 0) we see that the average value of XN(I)+I is greater than E[X].. we see that its average value is given by !:~ Since . what is the average cost per unit time incurred by the depot? If we say that a cycle is completed whenever a train leaves. . Y2 .) n To argue that Wexists with probability 1. X 2 .6. a new cycle begins each time an amval finds the system empty).2) E[Y. and let YI .6... it follows from Theorem 361 that (3. with probability 1.1 A Queueing Application Suppose that customers arrive at a single-server service station in accordance with a nonlattice renewal process Upon amval.. a customer is immediately served if the server is idle. • denote the service times of successive customers..6. o:.. Since the queueing process begins anew after each cycle. OJ To show that L exists and is constant.138 RENEWAL THEORY 3. let W. on day i. denote the amount of time the ith customer spends in 'the system and define W · WI + . imagine that we receive a reward W. r.. imagine that a reward is being earned at time s at rate n(s) If we let a cycle correspond to the start of a busy period (that is. We shall assume that (3. + Wn = I1m ----'-----'" n . Define L = Jim J'0 n(s) ds/t.] < 00 Suppose that the first customer arrives at time 0 and let n(t) denote the number of customers in the system at time t. As L represents the long-run average reward..3) L = E [reward during a cycle] E [time of a cycle] E [J. ••• denote the interarrival times between customers..] < E[X. and he or she waits in line if the server is busy. and are also assumed independent of the arrival stream Let XI. n(s) dS] E[T] Also. then it is easy to see that the process restarts itself each cycle. it follows that if we let N denote the number of customers served in a cycle. The service times of customers are assumed to be independent and identically distributed. then W is the average reward per unit time of a renewal process in which the cycle time .. The following theorem is quite important in queueing theory THEOREM 3.I .] denote the arrival rate Then L = AW Proof We start with the relationship between T.1 of Chapter 7) that (36. 
since .: : n} is independent of X n .. + W N.RENEWAL REWARD PROCESSES 139 is N and the cycle reward is WI + .2) implies that E [N] < 00. Xl. k 1.2 Let A = lIE[X. the length of a cycle. hence. and N. X n . T= LX.6. the number of customers served in that cycle If n customers are served in a cycle. hence. ~ (364) W = E[reward during a cycle] E [time of a cycle] E[~ W. then the next cycle begins when the (n + 1)st customer arrives.] = E[N] We should remark that it can be shown (see Proposition 7 1. I~I N Now it is easy to see that N is a stopping time for the sequence XJ. and.Z.n and thus {N : . Hence by Wald's equation E[T] and so by (363) and (364) = E[N1E[Xl :::: E[N]/A (365) . by replacing "the system" by "service" we have that the average number in service = AE[Y].} > G (time) average number in "the system" = A' (average time a customer spends in "the system"). By replacing "the system" by "the queue" the same proof shows that the average number in the queue =A (average time a customer spends in the queue).140 RENEWAL THEORY But by imagining that each customer pays at a rate of 1 per unit time while in the system (and so the total amount paid by the ith arrival is just W.] (2) Theorem 362 states that the and P{Y. 7 REG EN ERATIVE PROCESSES Consider a stochastic process {X(t). = total paid during a cycle. we see f: n (s) ds = ~ W.62 does not depend on the particular queueing model we have assumed The proof goes through without change for any queueing system that contains times at which the process probabilistically restarts itself and where the mean time between such cycles is finite For example. then it can be shown that a sufficient condition for the mean cycle time to be finite is that E[Y. 3. or.2. t ~ G} with state space {G. . N and so the result follows from (395) Remarks (1) The proof of Theorem 3. if in our model we suppose that there are k available servers. 1.} having the property that there exist time points at which the process (probabilistically) .). < X.] < kE[X. H"'} Proof Let pet) :::: P{X(t) = j} Now. as it can be . directly Riemann integrable. SI > t} i. it follows that {51.tatejduringacyclel c[timeofacyclel Conditioning on the time of the la.j Now.. S3... . SI >t s} dm(s) Hence. P{X(t) P{X(t) = jISN(. the distribution of a cycle.) = O} = P{X(t) = j lSI> t}. has a density ouer some interval. ..SI > t}dtIEIS. = jISN(I)::= s} P{X(t s) jlSI >t s}.. . t} denote the number of cycles by time t The proof of the following important theorem is a further indication of the power of the key renewal theorem THEOREM 3. and thus pet) = P{X(t) = j. restarts itself That is. having the same property as 51 Such a stochastic process is known as a regenerative process From the above.7. e. letting if X(t) == j. SI > t} + J: P{X(t s) j. suppose that with probability 1. hown that h(t) == P{X(t) = j.t cycle before t yield. and if E[SII < 00. then P =limP{X(t) . there exists a time 51 such that the continuation of the process beyond 51 is a probabilistic replica of the whole process starting at 0 Note that this property implies the existence of further times S2..REGENERATIVE PROCESSES 141.} constitute the event times of a renewal process We say that a cycle is completed every time a renewal occurs Let N(t) = max{n. 5n ::. = '}= Elamountoftimein<. 5h .Sl > t otherwi .1 If F. we have by the key renewal theorem that P(t)--+ • 0 roo P{X(t) = j. 1 The Symmetric Random Walk and the Arc Sine Laws be independent and identically distributed with P{Y. 
constitutes a regenerative process provided the initial customer arrives at t = 0 (if not then it is a delayed regenerative process and Theorem 37. = SI > t} dt. From the theory of renewal reward processes it follows that PI also equals the long-run proportion of time that X(t) = j In fact. I(t)dt] I: E[/(t)l dt I: P{X(t) = j. = I} = P{Y.7. we have the following. with prohability 1. the result follows EXAMPLE 3.1 remains valid). X(t). = -I} = t . t)l I_m t = E[time inj during a cycle1 E[time of a cycle1 Proof Suppose that a reward is earned at rate 1 whenever the process is in state j This generates a renewal reward process and the proposition follows directly from Theorem 361 3. PROPOSITION 3.142 then RENEWAL THEORY f.7. the number in the system at time t. lim [amount of time inj during (0.2 For a regenerative process with E[ S11 < 00. Most queueing processes in which customers arrive in accordance to a renewal process (such as those in Section 36) are regenerative processes with cycles beginning each time an arrival finds the system empty Thus for instance in the single-server queueing model with renewal arrivals.7(AJ Queueing Models with Renewal Arrivals. I(t) dt represents the amount of time in the first cycle that X(t) E j Since [I. . then {Xn' n :> 1} is a regenerative process that regenerates whenever Xn takes value O. If we now define Xn by Xn = { -1 ~ if Zn =0 if Zn > 0 if Zn < 0. we will first study the symmetric random walk {Zn.. . Namely. The process {Zn..Z2n-1 =1= 0. which states that un-the probability that the symmetnc random walk is at 0 at time 2n-is also equal to the probability that the random walk does not hit 0 by time 2n. Let Un = P{Z2n = O} and note that (3.S(E) of Chapter 1 (the ballot problem example) the expression for the probability that the first visit to 0 in the symmetric random walk occurs at time 2n. To obtain some of the properties of this regenerative process.::: O} is called the symmetnc random walk process. Z2n = O} = e)mh 2n _ 1 == 2n -1' We will need the following lemma. n 2 O}. Z2 =1= 0.72) P{ZI =1= 0. (3. . n .REGENERATIVE PROCESSES 143 and define Zo = 0.7.1) 2n -1 2n Un = Un-I' Now let us recall from the results of Example l. . 3) which we will do by induction on n.1 Un 2n .7.. .7.nV2ii-that (2n)2n+IJ2 e -2nV2ii 1 2n +e. it follows upon using an approximation due to Stirling-which states that n! .n-1 2:_U_k___ n _ U_ k~1 2k - 1 k=1 2k . with probability 1.1 Now 1- = 1.n and so Un ~O as n ~ 00.nn+ 1J2 e.3 Proof From (3 7 2) we see that Hence we must show that (37. Thus from Lemma 3. the symmetric random walk will return to the origin.1 = Un_I = Un Thus the proof is complete - (by the induction hypothesis) (by (371». the above identity holds since UI =! n 2:_U_k_= 1.' 1 Un . ~ (~) (i)'".3 we see that.1 2n .144 RENEW AL THEORy Lemma 3. Since u. When n So assume (373) for n .2n (21T)2 2n = V. which is that if we plot the symmetric random walk (starting with Zo = 0) by connecting Zk and Zk+1 by a straight line (see Figure 3.1).2k time units is the same as the probability given in Proposition 3. . of the first eight time units the random walk was positive for six and negative for two) z" Figure 3.REGENERATIVE PROCESSES 145 o up The next proposition gives the distribution of the time of the last visit to to and including time 2n PROPOSITION 3.7. J. where we have used Lemma 3 7 3 to evaluate the second term on the right in the above We are now ready for our major result. ZU+I -:/= 0. Proof P{Z2. .4.U n _. 1.4 For k = 0. = P{Z2.7. = 0}P{Z2HI = U. 
Z2n -:/= O} -:/= 0. = 0. (For the sample path presented in Figure 371. n.7. A sample path for the random walk. then the probability that up to time 2n the process has been positive for 2k time units and negative for 2n . .7. . = O}P{T= 2r} .Un. it is equally likely that the random walk has always been positive or always negative in (0. the time of the first return to 0. and let bkn = P(E kn ) Then (374) Proof The proof is by induction on n Since Uo = 1.=1 n n = ~ L un-. n == and so (by Lemma 3 7 3) . bn-.=1 where the last equality.=1 = L P{Zz" = OiT= 2r}P{T= 2r} . n L un-.==1 Un. it follows that (374) is valid when n = 1 So assume that bkm = UkUm-k for all values of m such that m < n.-2. n_.7. conditioning on T.=1 n Now given that T = 2r. . follows from the induction hypothesis Now. To prove (374) we first consider the case where k = n Then.=1 .n-. n = Un-.5 RENEWAL THEORY Let Ek n denote the event that by time 2n the symmetric random walk wil( be positive for 2k time units and negative for 2n . . yields bnn = L P{Enni T = 2r}P{T = 2r} + P{EnniT> 2n}P{T > 2n}. P{EnniT> 2n} = t and so bnn = ~ L bn-.2k time units.P{T = 2r} + !P{T > 2n} .P{T= 2r} + ~P{T> 2n}. 2r) and it is at 0 at 2r Hence.146 THEOREM 3.P{T= 2r} = L P{Zz. r=l where the last equality follows from the induction hypothesis As L uk-rP{T = 2r} :::: Uk. r~l /I L U" . by Stirling's approximation.k) . also for k = 0 The proof that (374) is valid for 0 < k < n follows in a similar fashion Again. namely. that UIr.=== 1TYk(n . 2r) Hence in order for Ek n to occur.=1 we see that II r-kP{T = 2r} = UII _ . which completes the proof The probability distribution given by Theorem 37. it is equally likely that the random walk has either always been positive or always negative in (0. bkn = ~ L bk-rll-rP{T r~1 /I /I ~r} + t L bkn_rP{T = 2r} r-I II n = ~Un-k L uk_rP{T 2r} + ~Uk L U.REGENERATIVE PROCESSES 147 Hence (374) is valid for k = n and. by symmetry.U II Ie ~ 1 --.-r-kP{T = 2r}. conditioning on T yields bkll = L P{Ehl T .. .2r positive units in the former case and 2k in the latter Hence.5. in fact. the continuation from time 2r to 2n would need 2k . We call it such since for large k and n we have.. is called the discrete arc sine distribution.-1 /I 2r}P{T = 2r} Now given that T 2r. The reason why the above (the fact that the limiting probability that a regenerative process is in some state does not equal the long-run proportion of time it spends in that state) is not a contradiction.y) 1 fX -.-0 (from Lemma 3. the mean time between visits of the symmetric random walk to state 0 is such that E[Tl 00 The above must be true. It also "..3) OIl and as Un -- (Vn"1T)-1 we see that ElTJ 00 . we see that the probability that the proportion of time in (0. follows directly from Lemma 3. 0 < x < 1. it is clear by symmetry and the fact that Un --+ 0 and n --+ 00 that P{X" = I} = P{Zn > O} --+ i as n --+ 00.7.==:=1== dw 1l 0 = 2 1l . for instance. I arcsme vx 4 Thus we see from the above that for n large the proportion of time that the symmetric random walk is positive in the first 2n time units has approximately the arc sine distribution given by (3 7 5) Thus.> 2: P{T> 2n} . 2n) that the symmetric random is positive is less than x.7 3 since E[TJ. it follows that the proportion of time that Xn equals 1 does not converge to some constant On the other hand.148 RENEWAL THEORY Hence for any x. is given by (3. is that the expected cycle time is infinite That is. the probability that it is positive less than one-half of the time is (2111) arc sine Vi = ~. 
for otherwise we would have a contradiction. which keeps track of the sign of the symmetric random walk. One interesting consequence of the above is that it tells us that the proportion of time the symmetric random walk is positive is not converging to the constant value! (for if it were then the limiting distribution rather than being arc sine would be the distribution of the constant random variable) Hence if we consider the regenerative process {Xn }.75) k -0 2: nX 1 UkUn-k :::::: 1l f v' nX 1 dy 0 Y(n . 5. lim P{N(t) > O} 1_0 t (381) where . r ::::::. THEOREM 3. for 0 < x < 1 and n large.5).8 STATIONARY POINT PROCESSES A counting process {N{t).2. 2f(tI2) and by induction f(t) ::.2 that the equilibrium renewal process is one example of a stationary point process. 3. k:ru ru-} UkUn-k = 2.7. n P{no zeroes between 2nx and 2n} = 1 . 1T UkUn-k k:O 2 '. Hence f(t) ::.7 4 and Stirling's approximation that.STATlONARY POINT PROCESSES 149 Remark It follows from Proposition 3. 2.1 For any stationary point process.8.-arcsme vx. . t ~ O} that possesses stationary increments is called a stationary point process We note from Theorem 3..N(s) > O} + P{N(s) > O} = f(t) + f(s). excluding the trivial one with P{N(t) = O} = 1 for all t ~ 0.\ > 0 ' = 00 is not excluded Letf(t) = P{N(t) > O} and note thatf(t) is nonnegative and non decreasing Also f(s + t) = P{N(s + t) . . P{N(s + t) .N(s) > 0 or N(s) > O} ::.\ Proof = . where the final approximation follows from (3. nf(tln) for all n = 1. 1 A.8. IL t HO IL . s) ° let) =::: n . using L'hospital's rule. it fOIlOWS that for all t E (0.. from (383). and the proof is complete ExAMPLE 3.1 [(s) t n s > n .. [(a) < [(aln) a .o [(t)lt. fix any large A > and choose s such that [(s)ls > A Then... for any t E (0. it follows that Iim.1 [(sin) n .82).8(A) For the equilibnum renewal process P{N(t) > O} = Fe(t) = f~ F(y) dyllL Hence. First. we obtain that for all t in this interval (383) Hence.2) we obtain ...1 [(s) ------=:::. fix 6 > 0.aln Now define A = lim sup [(t)lt By (3. and let s > be such that [(s)ls > A . n which implies lim.---sl(n .11 n sin n s Since 6 is arbitrary.2...150 Thus. suppose that A < 00 In this case.6.1 From the mono tonicity of let) and from (3.0 To show that A = lim. -=::: let) t [(sin) _ n .... letting a be such that [(a) (382) RENEWAL THEORY > 0. we have aIln=I. In this case. and since n ~ 00 as t ~ 0. we consider two cases..o [(t)lt = A Now assume A = 00. A = lim P{N(t) > O} = lim F(t) HO = 1.~o [(t)lt == 00.. Now. s) there is an integer n such that ° -~t~-­ s n s n. To see this. A stationary point process is said to be regular or orderly if (38. In order to determine when c = A. the converse of this is also true That is.5. . any stationary point process for which c is finite and for which there is zero probability of simultaneous events. .4) goes to zero as n ~ 00. When c < 00. The proof in this direction is more complicated and will not be given We win end this section by proving that c = Afor a regular stationary point process.4) implies that the probability that two or more events will occur simultaneously at any point is O. we have that + s)] == E[N(t + s) . for the equilibrium renewal process.4) P{N(t) ::> 2} = o(t) It should be noted that. This is known as Korolyook's theorem. we need the following concept.n 1. A is the rate of the renewal process. E[N(t)] = ct What is the relationship between c and A? (In the case of the equilibrium renewal process.8. it follows from Example 3 8(A) and Theorem 3. 
which by (3.STATIONARY POINT PROCESSES 151 Thus.8. (3. The probability of a simultaneous occurrence of events is less than the probability of two or more events in any of the intervals j = 0. divide the interval [0. for some constant c. 1.N(s)] + E[N(s)] = E[N(t)] + E[N(s)] implying that. t E[N(t ~ O}. and thus this probability is bounded by nP{N(lIn) ~ 2}. is necessanly regular. For any stationary point process {N(t). for a stationary point process. 1] into n equal parts.2 that A = c = 111-') In general we note that since c= i ~i n nP{N(t) t = n} n} =I n=1 P{N(t) t t _ P{N(t) > O} it follows that c ~ A. for the event {N(I) > k}.N 1) = k} Let B > 0 and a positive integer m be given From the assumed regularity of the process. "' W1.fortbe even+ C: W2}. defined by (38 I). 1) .152 KOROLYOOK'S THEOREM RENEWAL THEORY For a regular stationary point process. Therefore...I.N(1) -NC: '" j=O.n-I Bn = U Bn!' }:O n-l C•• fortbeevent {NC: 1) .}). the mean number of events per unit time and the intensity A.lin. }:O n-1 P(A. (385) where Bn is the complement of Bn However.N B. are equal The case A = c = 00 is not excluded Proof Let us define the following notation A. .Iin and hence = U c. c. a little thought reveals that n-l A. }:O .Bn) S L P(Cn. it follows that B P(Bn}) < n(m + 1)' for all sufficiently large n Hence. :::: . Express in words what the random variable XN(I}+I represents (Hint is the length of which renewal interval?) Show that P{XN(!}+I It ~ x} . In defining a renewal process we suppose that F( (0). Compute the above exactly when F(x) = 1 . . Argue that when F( (0) < 1 the total number of renewals. then after each renewal there is a positive probability 1 .e-AJ:.1.\ + 28 and the result is obtained as is arbitrary and it is already known that c .\ + 28 k=O 00 Hence. Is it true that· (a) N(t) < n if and only if Sn > t? (b) N(t) < n if and only if Sn . .\ PROBLEMS 3. equals 1 If F( (0) < 1. the probability that an interarrival time is finite. .F(oo) that there will be no further renewals. c == £ [N(1)l = L P{N(1) > k} == L P(A k ) 8 00 00 ::.:::: F(x). is such that 1 + N( (0) has a geometric distribution with mean 1/(1 .> t? (c) N(t) > n if and only if Sn < t? 3.2. call it N( (0). L L P( C rO k-O m nk ) +8 == nP{N(lIn) ::: 1} + 8 ::.\ + 28 for all n sufficiently large Now since (386) is true for all m. it follows that L P(Ad ::..F( (0)) 3.PROBLEMS 153 which together with (3 85) implies that (386) L P(A k=O m n-I k ) ::.3. • in is a permutation of 1. E [ Xl + 'N'('t)+ XN(I) N(t»O ] =E[X1IXI<t].Xn . t . ::5 3. X NI " N(I) ~ n ] = E[XoIN(I) = n). the event times SI" . 1) distribution function show that met) =e l - 1. .2:: O} be a renewal process and suppose that for all nand t. t :> O} is a Poisson process (Hint Consider E[N(s) IN(t)] and then use the result of Problem 3. Xn ::s. X. conditional on the event that N(t) = n.n That is. an arrival only enters the bank if the server is free when he or she arrives Let G denote the service distribution (8) At what rate do customers enter the bank? (b) What fraction of potential customers enter the bank? (c) What fraction of time is the server busy? .5.8. Prove the renewal equation met) = F(t) RENEWAL THEORY + f: met . i2 . has the same joint distribution as XI" . X2. Now argue that the expected number of uniform (0. 3. Consider a single-server bank in which potential customers arrive at a Poisson rate A However. X 2 . Prove that the renewal function met). Let Xl. Let {N(t). rival times of a renewal process (8) Argue that conditional on N(t) = n. 
0 the interarrival distribution F t < 00 uniquely determines 3. 1) random variables that need to be added until their sum exceeds 1 has mean e 3.7. X 2 <: Xl> ••• . Xn are exchangeable Would XI"" . The random variables Xl' . Xl' . they are exchangeable if the joint distribution function P{XI :s XI. Xn are said to be exchangeable if X'I' .4. Xn+l be exchangeable (conditional on N(t) = n)? (b) Use (a) to prove that for n > 0 E X. 2.9.) 3.Xn whenever ii.5. •• xn). xn} is a denote the interarsymmetric function of (XI.6. (c) Prove that l + N(~.x) dF(x). . Sn are distnbuted as the order statistics of a set of independent uniform (0.154 3. If F is the uniform (0. t) random variables Show that {N(t). . Let XI> X 2" .. Door 1 leads her to freedom after two-days' travel. N 2 .. door 2 returns her to her room after four-days' journey. + 5m m m N. and let T denote the time it takes the miner to become free. N1+ +Nm . + X N1 + N). (c) Equate the two expressions to obtain Wald's equation 3. Suppose at all times she is equally to choose any of the three doors.. + X NI has the same distnbution as X N1 + 1 + ..11. and so on. " be independent and identically distnbuted X. (b) Use Wald's equation to find ErT].:N +1 X" L 1 . (3) Define a sequence of independent and identically distnbuted random vanables XI> X 2 . Now start sampling the remaining X..] stopping times for the sequence XI> X2 . and a stopping time N such that N T=LX. Now start sampling the remaining Xr-again acting as if the sample was just beginning-and stop after an additional N 3 . (3) Let Nl+N2 52 = .. .10. +5) m ( NI + . . stopping at N I . for instance. and door 3 returns her to her room after eight-days' journey. Also let NI . + N m ' derive another expression for the limit in part (a).-acting as if the sample was just beginning with XN1 + I -stopping after an additional N2 (Thus. + N m ' (b) Writing 51 + . " with ErN. 5m= r:N 1+ L Xr' +Nm_I+1 Use the strong law of large numbers to compute m r1 !!!.] < 00. 5 I + . Observe the Xr sequentially.... XI + . + 5m NI + . + N m 51 + . •• .... + . . be independent and identically distributed with E r < 00. Consider a miner trapped in a room that contains three doors. r"l Note You may have to imagine that the miner continues to randomly choose doors even after she reaches safety..PROBLEMS 155 3.. Show how Blackwell's theorem follows from the key renewal theorem. with probability 1. (d) Use part (c) for a second derivation of E [T] 3. the distribution of time between entrances to state 1. Let A(t) and Y(t) denote respectively the age and excess at t. (e) If I-L + x12) = s}.4 6 to show that '" lim E[Y(t)] . (b) P{Y(t) > x IA(t (c) P{Y(t) > x IA(t (d) P{Y(t) > x. Consider a renewal process whose interarnval distnbution is the gamma distnbution with parameters (n.IN = n] and note that it is not equal to E[~~~I X. '" Assume that H. 2A Now explain how this could have been obtained without any computations 3. 1.14. show that.. A(t) > y}.. (d) Compute the joint distribution of A(t) and Y(t) for a Poisson process 3. Find lim P{process is in sta te i a t time t}. 3.. .. '" = n + 1. where it remains for a time having distribution F2 • When it leaves 2 it goes to state 3. A(t)lt ---+ 0 as t 3.13. --+ 00 < 00.156 RENEWAL THEORY (c) Compute E[~~I X.15..16. From state n it returns to 1 and starts over.17. Let A(t) and Y(t) denote the age and excess at t of a renewal-process Fill in the missing terms: (a) A(t) > x +-+ 0 events in the interval _ _ ? (b) Y(t) > x +-+ 0 events in the interval _ _ ? 
(c) P{Y(t) > x} = P{A( ) > }. + x) > s} for a Poisson process. An equation of the form g(t) = h(t) + J:g(t - x) dF(x) . . and so on... is nonlattice and has finite mean. ... A process is in one of n states. where it remains for an amount of time having distribution Fl' After leaving state 1 it goes to state 2.. 3.]. n Initially it is in state 1. A). Find: (a) P{Y(t) > x IA(t) = s}.ll. Use Proposition 3.2. PROBLEMS 157 is called a renewal-type equation. Prove Equation (3. (a) Compute the mean number of flips until the pattern HHTHHIT ap- pears.3).5. 3.20.19. (b) Which pattern requires a larger expected time to occur: HHIT or HTHT? . Would the number of events by time t constitute a (possibly delayed) renewal process if an event corresponds to a customer: (a) entering the bank? (b) leaving the bank? What if F were exponential? 3.1S. Obtain a renewaltype equation for: (a) P(t). where m(x) = ~:~I Fn(x). Apply the key renewal theorem to obtain the limiting values in (a) and (b) 3. In Problem 3. one can then apply the key renewal theorem to obtain Iimg(t) = 1-'" r ~ h(t) dt _ .9 suppose that potential customers arrive in accordance with a renewal process having interarrival distnbution F. In convolution notation the above states that g = h + g * F.x) dm(x). Either iterate the above or use Laplace transforms to show that a renewal-type equation has the solution g(t) h(t) = + t h(t . the expected age of a renewal process at t. fo F(t) dt Renewal-type equations for g(t) are obtained by conditioning on the time at which the process probabilistically starts over. If h is directly Riemann integrable and F nonlattice with finite mean. (b) g(t) = E[A(t)]. the probability an alternating renewal process is on at time t. Consider successive flips of a fair coin. Find the expected number of flips until the following sequences appear. either wins or loses 1 unit with respective probabilities p and] . t 2: O} whose first interarnval has distribution G and the others have distribution F Let mo(t) = E[No(t)] (0) Prove that mo(t) = G(t) + f: m(t . 3. (0) A = HHTIHH (b) B = HTHTI. Draw cards one at a time. o ----+ x dF(x) 2 (c) Show that if G has a finite mean.158 RENEWAL THEORY 3. 3.22.23.25.24. A coin having probability p of landing heads is flipped k times Additional flips of the coin are then made until the pattern of the first k is repeated (possibly by using some of the first k flips). (c) Find P{A occurs before B} (d) Find the expected number of flips until either A or B occurs 3. Consider a delayed renewal process {No(t).2L On each bet a gambler. independently of the past. where m(t) = ~:. then f: E[AD(t)] ----+ 2 J'" xdF(x) . Consider successive flips of a coin having probability p of landing heads.p Suppose the gambler's strategy is to quit playing the first time she wins k consecutive bets. 3. from a standard deck of playing cards Find the expected number of draws until four successive cards of the same suit appear. Show that the expected number of additional flips after the initial k is 2k. At the moment she quits (0) find her expected winnings (b) find the expected number of bets that she has won.x) dG(x). then tG(t) 0 as t ----+ 00 .I Fn(t) (b) Let Ao(t) denote the age at time t Show that if F is nonlattice with fx 2 dF(x) < 00 and tG(t) ----+ 0 as t ----+ 00. with replacement. Suppose now that p = 112. _ E[X2] . 
In Example 36(C) suppose that the renewal process of arrivals is a Poisson process with mean IL Let N* denote that value of N that minimIZes the long-run average cost if a train leaves whenever there are N travelers waiting Another type of policy is to have a train depart every T time units Compute the long-run average cost under this policy and let T* denote the value of T that minimizes it. 3. The life of a car is a random variable with distribution F An individudl has a policy of trading in his car either when it fails or reaches the age of A.E[X] . When the cycle reward is defined to equal the cycle length. · E[ reward In ( I. the above yields ~~~ E[XN(I)+d . be the same in both parts .27. 3.28. Show that the policy that departs whenever N* travelers are waiting leads to a smaller average cost than the one that departs every T* time units. Let R(A) denote the resale value of an A-year-old car. as I ~ 00. Prove Blackwell's theorem for renewal reward processes That is. I +)] ~ a E[reward in cycle] a E[ . assuming that the cycle distribution is not lattice. show that. There is no resale value of a failed car Let C1 denote the cost of a new car and suppose that an additional cost C2 is incurred whenever the car fails. time of cycle] Assume that any relevant function is directly Riemann integrable. For a renewal reward process show that Assume the distribution of Xl is nonlattice and that any relevant function is directly Riemann integrable. (a) Say that a cycle begins each time a new car is purchased Compute the long-run average cost per unit time (b) Say that a cycle begins each time a car in use fails Compute the long-run average cost per unit time No Ie.29.PROBLEMS 159 3.26. of course. which is always greater than E[X] except when X is constant with probability] (Why?) 3. In both (a) and (b) you are expected to compute the ratio of the expected cost incurred in a cycle to the expected time of a cycle The answer should. A system consisting of four components is said to work whenever both at least one of components 1 and 2 work and at least one of components 3 and 4 work Suppose that component i alternates between working and being failed in accordance with a nonlattice alternating renewal process with distributions F.6 1 define Vet). the work in the system at time t. denote the amount of time the ith customer spends waiting in queue and define W Q = lim (D] n-+'" + . that is. Consider the policy that flips each newly chosen coin until m consecutive flips land tails. . + Dn)ln (a) Argue that V and W Q exist and are constant with probability 1. as the sum ofthe remaining service times of all customers in the system at t Let v = lim '-+'" II V(s) dslt. (a) Find 3.31. and G" i = 1.. 0 Also.30.32 Consider a single-server queueing system having Poisson arrivals at rate A and service distribution G with mean ILG. 4 If these alternating renewal processes are independent. For the queueing system of Section 3. (b) Prove the identity V = AE[Y]WQ + AE[y2]12. Suppose that AILG < 1 Po. the long-run proportion of flips that land heads is 1 3. 2.33. with probability 1. the proportion of time the system is empty. the probability density is O<p<1. 3. (b) Say the system is busy whenever it is nonempty (and so the server is busy) Compute the expected length of a busy period (c) Use part (b) and Wald's'equation to compute the expected number of customers served in a busy period.. find lim P{system is working at /->'" time t} 3.160 RENEWAL THEORY 3. let D. 
Suppose in Example 3 3(A) that a coin's probability of landing heads is a beta random variable with parameters nand m. then discards that coin and does the same with a new one For this policy show that. picking up all waiting packages. Packages arnve at a mailing depot in accordance with a Poisson process having rate A. Reference 9 provides an illuminating review paper in renewal theory For a proof of the key renewal theorem under the most stringent conditions. the interested reader should also see Reference 3 Examples 3 3(A) and 3 5(C) are from Reference 5.J4. 3. arnve in accordance to a renewal process with nonlattice interarrival distribution F Let X(t) denote the number of packages waiting to be picked up at time t. 1966 . Methuen. Consider a regenerative process satisfying the conditions of Theorem 3.7 1. 1962 2 H. with probability 1. Trucks. 1 D. Renewal Theory. is the limiting probability that X(t) equals j. If the expected reward dunng a cycle is finite. show that the long-run average reward per unit time is. J where P. R Cox.REFERENCES 161 where llA is the mean interarnval time and Y has the distnbution of a service time. 3.35. Leadbetter. i . where Y is a service time and X an interamval time. Suppose that a reward at rate r(j) is earned whenever the process is in state j.:> O. (Hint: Give an example where the system is never empty after the initial arrival. Wiley. London. (a) What type of stochastic process is {X(t). is not sufficient for a cycle time to be necessarily finite. REFERENCES References 1. more intuitive approach is given in Reference 8 Reference 10 has many interesting applications. and 11 present renewal theory at roughly the same mathematical level as the present text A simpler.r(j). t . Cramer and M. In a k server queueing model with renewal arrivals show by counterexample that the condition E[Y] < kE[X].) 3.6. Stationary and Related Stochastic Processes. New York.36. the reader should see Volume 20fFeller (Reference 4) Theorem 362 is known in the queueing literature as "Little's Formula" Our approach to the arc sine law is similar to that given in Volume 1 of Feller (Reference 4). 7. given by lim H'" II r(X(s» ds 0 = t L P.:> O}? (b) Find an expression for ' lim P{X(t) 1-+'" := i}. FL. New York. Pekoz. 1994 11. Orlando. Vol 1. Introduction to Probability Models. Vols I and II. pp 243-302 ( 10 H C Tijms. Theory of Probability. Vol 10. Academic Press.1993 9 W Smith. E. McGraw-Hili. Orlando. New York. A First Course in Stochastic Processes. Stochastic Models. 1957 and 1966 5 S Herschkorn. NJ. and S. 1970 4 W Feller. 1. 1975 8 S M. 20. 5th ed • Academic Press. Wiley. Wolff. New York. Stochastic Models in Operations Research. FL. 2nd ed . 1996 6 D Heyman and M Sobel. Wiley. Wiley. 1982 7 S Karlin and H Taylor." Journal of the Royal Statistical Society. An Algorithmic Approach. 1989 ." Probability in the Engineering and Informational Sciences. M Ross. "Policies Without Memory for the Infinite-Armed Bernoulli Bandit Under the Average Reward Cnterion. Volume I. An Introduction to Probability Theory and Its Applications. "Renewal Theory and its Ramifications. Prentice-Hall. Senes B.162 RENEWAL THEORy 3 B DeFinetti. (1958). R. Ross. New York. Stochastic Modeling and the Theory of Queues. we have that i.i n _ I' i. this set of possible values of the process will be denoted by the set of nonnegative integers {O. . 1. next make a transition into state j. That is. This is called the Markovian property. 
the conditional distribution of any future state Xn + I' given the past states X o. Unless otherwise mentioned. then the process is said to be in state i at time n. 2. " Let P denote the matrix of one-step transition probabilities P". . ii' " . we suppose that (4. n = 0.1.. j and all n ~ 0. Since probabilities are nonnegative and since the process must make a transition into some state. 1. so that Poo POI P02 PIO PII PI2 P= P. i = 0.O P. 1. Xn _I = in -I'" • . for all states i 0. there is a fixed probability P" that it will next be in state j. Such a stochastic process is known as a Markov chain. .Xn_I and the present state X n.1 1) may be interpreted as stating that. The value P" represents the probability that the process will.CHAPTER 4 Markov Chains 4. when in state i.1 INTRODUCTION AND EXAMPLES Consider a stochastic process {X n .! P..} that takes on a finite or countable number of possible values.}.2 163 ...j ~O.2. for a Markov chain. We suppose that whenever the process is in state i. Equation (4.1) P{Xn + 1= j IXn = i. is independent of the past states and depends only on the present state.XI = i . Xo = io} = Pi.. XI'" .. If Xn = i. n . we would care how long the person in service had already been there (since the service distribution G is arbitrary and therefore not memoryless) As a means of getting around the above dilemma let us only look at the system at moments when customers depart That is. There is a single server and those arrivals finding the server free go immediately into service.3) P{Yn = j} = f ea: AX (Ax) I -. the nth departure leaves behind Xn customers-of which one enters service and the other Xn .1.-. that they are independent and (4. o J. at the next departure the system will contain the Xn . the number 1 indicates that there is a single server.2) and (41. then {X(t).} is a Markov chain with transition probabilities given by POI = f'" e. The above system is called the MIG/1 queueing system. From (4. let Xn denote the number of customers left behind by the nth departure.1.2) if Xn > 0 if Xn = O.(A~)I dG(x). n ~ 1. (j _ i + 1)' dG(x). it follows.1 customers tha t were in line in addition to any arrivals during the service time of the (n + 1)st customer Since a similar argument holds when Xn = 0. and they are also assumed to be independent of the arrival process. '. c If we let X(t) denote the number of customers in the system at t.164 EXAMPLE 4.(A) MARKOV: CHAINS The M/G/1 Queue.1. j = 0. to predict future behavior. let Y n denote the number of customers arriving during the service period of the (n + 1)st customer When Xn > 0. The letter M stands for the fact that the interarrival distribution of customers is exponential.1 wait in line. all others wait in line until their service turn The service times of successive customers are assumed to be independent random variables having a common distribution G. . otherwise . Since Y n . .1. G for the service distribution. the arrival process being a Poisson process. Suppose that customers arrive at a service center in accordance with a Poisson process with rate A. we see that ( 4.3) it follows that {Xn' n = 1. Hence.::=: 0. represent the number of arrivals in nonoverla pping service intervals. whereas we would not care how much time had elapsed since the last arrival (since the arrival process is memoryless). AX o eo P'I = P'I =0 f e0 AX J (Ax)r 1+ I j. t ~ O} would not possess the Markovian property that the conditional distribution of the future depends only on the present and not on the past. dG(x). 2.1. 
For if we knew the number in the system at time t. then..::=: 1 Also. this implies that the number of services thus constitutes a Poisson process. ~ 0 k~. So 0 and S" " 2: X" .. and by choosing these time points so as to exploit the lack of memory of the exponential distribution This is often a fruitful approach for processes in which the exponential distribution is present EXAMPLE4. .I.. Remark The reader should note that in the previous two examples we were able to discover an embedded Markov chain by looking at the process only at certain time points.1(c) Sums ofIndependent. = j} If we let = a" J 0. Identically Distributed Random Variables. ::tl. then the next arrival will find i + 1 minus the number served. If we let XII denote the number of customers in the system as seen by the nth arrival.. as we know. and the probability that j will be served is easily seen (by conditioning on the time between the successive arrivals) to equal the right-hand side of the above The formula for P. it is easy to see that the process {X"' n 2:: I} is a Markov chain To compute the transition probabilities P" for this Markov chain.1(B) f '» e-I'I o (p::)' dG(t).INTRODUCfION AND EXAMPLES 165 The G/M/l Queue. The General Random Walk.t This is true since the time between successive services is exponential and. Suppose further that the service distribution is exponential with rate p. let us first note that.i. as long as there are customers to be served. be independent and identically distributed with P{X. the number of services in any length of time t is a Poisson random variable with mean p. Therefore.-I then {SlIt n > O} is a Markov chain for which .. which follows since if an arrival finds i in the system. e-1'1 +I (p. J.o is little different (it is the probability that at least i + 1 Poisson events occur in a random length of time having distribution G) and thus is given by P = 10 f'" L. EXAMPLE 4. Suppose that customers arrive at a single-server service center in accordance with an arbitrary renewal process having interarrival distribution G. .t)k dG(t) kl ' i 2:: O. j=O. Let X" i > 1. 166 MARKOV CHAINS {Sn' n . P{X.1. 0 < P < 1. then no matter what its previous values the probability that Sn equals i (as opposed to -i) is p'/(p' + q '). . IS. n-j . j -+2 2 IS. = -I} = q == 1 P{X. ISnl = i The first of which results in Sn = i and has probability . n ~ I}.. . i k = O}.+ I' Now there are two possible values of the sequence S.-+p2 2q2 2 n -.2. Now consider ISnl.+II = i.. p. n ~ I} measures at each time unit the absolute distance of the simple random walk from the origin. To prove this we will first show that if Snl = i. The random walk {Sn.+IoIS. ExAMPLE 4. n :?: 1} is a simple random walk. then Proof If we let i 0 = 0 and define j = max{k 0 s. + I' .. n. Sn for which .. = I} = p. then. the absolute value of the simple random walk. where Sn = ~~ X" is said to be a simple random walk if for some p. n -. I = i I} = P{Sn = illSnl = i. The process {ISnl.. Somewhat surprisingly {ISn'} is itself a Markov chain.> O} is called the general random walk and will be studied in Chapter 7. since we know the actual value of S" it is clear that P{Sn = i.1 If {Sn. PROPOSITION 4. IS.ISn -II = in _I'" .1(D) The Absolute Value of the Simple Random Walk.. and the second results in Sn = . + II = q-2.i and has probability --. Thus in the simple random walk the process always either goes up one step (with probability p) or down one step (with probability q). P n-. . k s. .1 = O} = i IISnl 1. 
""'.1 it follows upon conditioning on whether or -i that Sn = +i + P{Sn+1 Hence. .CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES 167· Hence. n ~ 0.'n-I-_-!.J ~ pn pm Ik kl for all n. From Proposition 4. i. That is.. _.-n--~i+-!. .-I} _. . Of course P 11 = P'I The Chapman-KoLmogorov equations provide a method for computing these n-step transition probabilities.j ~ 0.'-n---I.Pr r - I.__ -I p2 i -+- n-I 2q2 --- n-j i 2 p2 2q2 2+p2 2q2 2 pi pi +q' and the proposition is proven. all i. m ~ 0.2..1) pn+m = II k=O L. P{Sn = I i ISnl = i.-""". These equations are co (4. .1.-(I + l)I Sn .+1 Pi i + I - p r + qi+1 r +q _ - 1 .. n ~ I} is a Markov chain with transition probabilities _ p. i > 0.ISII = il} = = ~n-I-+-:'!. j. 4. qr _ pr+1 r - + qi+1 r· p +q r P +q i {ISnl. .2 CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES We have already defined the one-step transition probabilities Pil • We now define the n~step transition probabilities P~I to be the probability that a process in state i will be in state j after n additional transitions. P (n) = P . then i ++ k.IP~k' k=O '" If we let p(n) denote the matrix of n-step transition probabilities Equation (4. Xo = i}P{Xn = klXo = i} k=O '" = 2. P (n -I) = P . Hence. P~I > 0. P{Xn+m = jlXn = k.168 and are established by observing that pn+m II = MARKOV CHAINS P{Xn +m =J'IX0 = i} '" = 2. (0) if i ++ j. = P n • and thus p(n) may be calculated by multiplying the matrix P by itself n times State j is said to be accessible from state i if for some n :> 0.. Proof The first two parts follow tnvially from the definition of communication To prove (iii). and we write i ++ j. suppose that i ++ j and j ++ k. Two states i and j accessible to each other are said to communicate. Hence. (iii) if i ++ j and j ++ k. n such that P~ > 0. k=O P{Xn+m = j.2. P7k > O. PROPOSITION 4.k 'I Ik Similarly.1 Communication is an equivalence relation That is (i) i++i. then there exists m. then j ++ i.Xn = klXo = i} = 2. m P 'k In -- L. we may show there exists an s for which Pi" > 0 . P . P.=0 '" '" pmpn :> pmpn > 0• rr .2) = .2 1) asserts that P~I' then where the dot represents matrix multiplication. P (n .J .. 2 ]. . whereas the right-hand side is the probability of the same event subject to the further restriction that the chain is in i both after nand n + s transitions Hence.2. t. (Note that for i =. > 0 Then pn pm > 0 I' I' 'I pn pn II m :> IJ m:> j. whenever P. starting in i. and by Proposition 4. k = ]. then d(i) = dO) p~ P7. and transient otherwise. Then t'l denotes the probability of ever making a transition into state j. dO) divides both n + m and n + s + m. til = P{Xn = j.6 j.n . then define the period of i to be infinite) A state with period ] is said to be aperiodic Let d(i) denote the period of i. if all states communicate with each other State i is said to have period d if P~. the first transition into j occurs at time n Formally.2 If i ~ j.] IXo = i}. for instance.CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES 169 Two states that communicate are said to be in the same ciass. j - pn P' pm > 0 "'I ' where the second inequality follows. j is accessible from i. any two classes are either disjoint or identical We say that the Markov chain is irreducible if there is only one class-that is. We now show that periodicity is a class property ° ° PROPOSITION 4.. j Proof Let m and n be such that > 0. t?1 = 0. and suppose that P.. and only if. thus d(i) = dO) For any states i and j define t~1 to be the probability that. thus n + s + m . = for all n > 0. 
Therefore. given that the process starts in i. Let .(n + m) = s. dO) divides d(i) A similar argument yields that d(i) divides dO). since the left-hand side represents the probability that starting in j the chain will be back in j after n + s + m transitions. = whenever n is not divisible by d and d is the greatest integer with this property. (If P~. X k=. > O.) State j is said to be recurrent if ~I = ].1 is positive if.6 j. suppose the states are 0. MA~KOV CHAINS '" /I L. with probability 1. .3 State] is recurrent if. But as the process must be in some state after time T.. hence the number of visits is geometric with finite mean 1/(1 .}) By the above argument we see that state j is recurrent if. and only if. M and suppose that they are all transient Then aft~r a finite amount of time (say after time To) state 0 will never be visited. we see that. This leads to the conclusion that in a finite-state Markov chain not all states can be transient To see this. and only if. letting I n = 00 = {1 0 otherwise. which shows that at least one of the states must be recurrent We will use Proposition 423 to prove that recurrence. with probability 1. it will return again to j Repeating this argument.lbabilistically restarts itself upon returning to j Hence. the number of visits to j will be infinite and will thus have infinite expectation On the other hand. by the Markovian property it follows that the process pr<.. Since it follows that L. after a finite time T = max{To. 1. with probability 1. pn = 00 n~1 Proof State] is recurrent if. In denotes the number of visits to j the result follows The argument leading to the above proposition is doubly important for it also shows that a transient state will only be visited a finite number of times (hence the name transient).. is a class property . T 1 . we arrive at a contradiction. E[number of visits to jlXo = j] But. . .]. like penodicity. and after a time (say T 1) state 1 will never be visited.2. and so on Thus. . a process starting at j will eventually return However. . sUPP9se j is transient Then each time the process returns to j there is a positive probability 1 f/l that it will never again return. and after a time (say T2) state 2 will never be visited. T M} no states will be visited.170 PROPOSITION 4. 2.2. .. p~ > 0 Now for any s pm+n+r:> pm P' pn }/ JI II 1/ 0 and thus " L. the gambler would be even after 2n trials if.-I' i = 0. then j is recurrent ~ Proof Let m and n be such that P7} > 0.4 ++ If i is recurrent and i j. he won n of these and lost n of these As each play of the game results in a win with probability p and a loss with probability 1 .. where 0 < P < 1. .. which asserts that .p..4 that they are either all transient or all recurrent So let us consider state 0 and attempt to determine if ~:=I P oo is finite or infinite Since it is impossible to be even (using the gambling model interpretation) after an odd number of plays. The Markov chain whose state space is the set of all integers and has transition probabilities EXAMPLE 4. By using an approximation. pm+n+r 2: pm pn " 11 I' '1 L._I =P = 1 . ::t:1. and only if.2(A) P. Since all states clearly communicate it follows from Corollary 4. we must. n = 1.- CHAPMAN-KOLMOGOROV EQUATIONS AND CLASSIFICATION OF STATES 171 corollary 4. due to Stirling.2.. of course.3. . . the desired probability is thus the binomial probability n = 1.J .2. have that 2n+I_0 P 00 . 
is called the simple random walk One interpretation of this process is that it represents the wanderings of a drunken man as he walks along a straight line..J r pr It = 00 ' and the result follows from Proposition 423 The Simple Random Walk. Another is that it represents the winnings of a gambler who on each play of the game either wins or loses one dollar.P. On the other hand. .. . 172 where we say that a" - b" when lim" .... "'(a"lb,,) MARKOV CHAINS = 1, we obtain P 00 2" - (4p(1 - p))" V;;; Now it is easy to verify that if a" - b", then L" a" < 00, if, and I P~o will converge if, and only if, only if, L" b" < 00. Hence L:= f " " I (4p(1 - p))" V;;; However, 4pQ - p) :5 1 ",:ith equality ~oldinglif, and only If, p = l. Hence, ~,,= I P 00 = 00 If, and only If, p = ?l. Thus, the chain is recurrent when p = ! and transient if p :;6 ~. When p = !, the above process is called a symmetric random walk. We could also look at symmetric random walks in more than one dimension. For instance, in the two-dimensional symmetric random walk the process would, at each transition, either take one step to the left, nght, up, or down, each having probability 1. Similarly, in three dimensions the process would, with probability t make a transition to any of the six adjacent points. By using the same method as in the one-dimensional random walk it can be shown that the two-dimensional symmetric random walk is recurrent, but all higher-dimensional random walks are transient. ~oes. Corollary 4.2.5 If i ++ j and j is recurrent, then = I., = 1 i, and let n be such that P7, > 0 Say that we miss opportunity 1 if X" ¥- j If we miss opportunity 1, then let T[ denote the next time we enter i (T[ Proof Suppose Xo is finite with probability 1 by Corollary 424) Say that we miss opportunity 2 if X T I +" ¥- j. If opportunity 2 is nussed, let T2 denote the next time we enter i and say that we miss opportunity 3 if X T1 +" ¥- j, and so on. It is easy to see that the opportunity number of the first success is a geometnc random vanable with mean lIP71' and is thus finite with probability 1. The result follows since i being recurrent implies that the number of potential opportunities is infinite Let N, (t) denote the number of transitions into j by time t. If j is recurrent and Xo = j, then as the process probabilistically starts over upon transitions into j, it follows that {N/t), t 2: O} is a renewal process with interarrival distribution {f7" n 2: 1}. If Xo = i, i H j, and j is recurrent, then {N,(t), t 2: O} is a delayed renewal process with initial interamval distribution {f7" n 2: 1}. LIMIT THEOREMS 173 4.3 LIMIT THEOREMS It is easy to show that if state j is transient, then ex> " pn n=1 L.J " < 00 for all i, meaning that, starting in i, the expected number of transitions into state j is finite As a consequence it follows that for j transient p~, ----+ 0 as n ----+ 00. Let f.L" denote the expected number of transitions needed to return to state j. That is, if j is transient if j is recurrent. By interpreting transitions into state j as being renewals, we obtain the following theorem from Propositions 3.3.1, 3.3.4, and 3.4.1 of Chapter 3. THEOREM 4.3.1 If i and j communicate, then' (i) p{lim N,(t) It = lIlLI/IX 0 1-'" (ii) lim n_CXl k::::1 = i} = 1 L P:,ln = 1/1L1/ n~'" n (iii) If j is aperiodic. 
Corollary 4.2.5
If i <-> j and j is recurrent, then f_{ij} = 1.

Proof Suppose X_0 = i, and let n be such that P^n_{ij} > 0. Say that we miss opportunity 1 if X_n != j. If we miss opportunity 1, then let T_1 denote the next time we enter i (T_1 is finite with probability 1 by Corollary 4.2.4). Say that we miss opportunity 2 if X_{T_1 + n} != j. If opportunity 2 is missed, let T_2 denote the next time we enter i and say that we miss opportunity 3 if X_{T_2 + n} != j, and so on. It is easy to see that the opportunity number of the first success is a geometric random variable with mean 1/P^n_{ij}, and is thus finite with probability 1. The result follows since i being recurrent implies that the number of potential opportunities is infinite.

Let N_j(t) denote the number of transitions into j by time t. If j is recurrent and X_0 = j, then as the process probabilistically starts over upon transitions into j, it follows that {N_j(t), t >= 0} is a renewal process with interarrival distribution {f^n_{jj}, n >= 1}. If X_0 = i, i <-> j, and j is recurrent, then {N_j(t), t >= 0} is a delayed renewal process with initial interarrival distribution {f^n_{ij}, n >= 1}.

4.3 LIMIT THEOREMS

It is easy to show that if state j is transient, then

    sum_{n=1}^inf P^n_{ij} < inf for all i,

meaning that, starting in i, the expected number of transitions into state j is finite. As a consequence it follows that for j transient, P^n_{ij} -> 0 as n -> inf.

Let mu_{jj} denote the expected number of transitions needed to return to state j. That is,

    mu_{jj} = inf if j is transient,
    mu_{jj} = sum_{n=1}^inf n f^n_{jj} if j is recurrent.

By interpreting transitions into state j as being renewals, we obtain the following theorem from Propositions 3.3.1, 3.3.4, and 3.4.1 of Chapter 3.

THEOREM 4.3.1
If i and j communicate, then:
(i) P{lim_{t->inf} N_j(t)/t = 1/mu_{jj} | X_0 = i} = 1.
(ii) lim_{n->inf} (1/n) sum_{k=1}^n P^k_{ij} = 1/mu_{jj}.
(iii) If j is aperiodic, then lim_{n->inf} P^n_{ij} = 1/mu_{jj}.
(iv) If j has period d, then lim_{n->inf} P^{nd}_{jj} = d/mu_{jj}.

If state j is recurrent, then we say that it is positive recurrent if mu_{jj} < inf and null recurrent if mu_{jj} = inf. If we let

    pi_j = lim_{n->inf} P^{n d(j)}_{jj},

it follows that a recurrent state j is positive recurrent if pi_j > 0 and null recurrent if pi_j = 0. The proof of the following proposition is left as an exercise.

PROPOSITION 4.3.2
Positive (null) recurrence is a class property.

A positive recurrent, aperiodic state is called ergodic. Before presenting a theorem that shows how to obtain the limiting probabilities in the ergodic case, we need the following definition.

Definition
A probability distribution {P_j, j >= 0} is said to be stationary for the Markov chain if

    P_j = sum_{i=0}^inf P_i P_{ij},    j >= 0.

If the probability distribution of X_0, say P_j = P{X_0 = j}, j >= 0, is a stationary distribution, then

    P{X_1 = j} = sum_{i=0}^inf P{X_1 = j | X_0 = i} P{X_0 = i} = sum_{i=0}^inf P_{ij} P_i = P_j,

and, by induction,

(4.3.1)    P{X_n = j} = sum_{i=0}^inf P{X_n = j | X_{n-1} = i} P{X_{n-1} = i} = sum_{i=0}^inf P_{ij} P_i = P_j.

Hence, if the initial probability distribution is the stationary distribution, then X_n will have the same distribution for all n. In fact, as {X_n, n >= 0} is a Markov chain, it easily follows from this that for each m >= 0, the vector X_n, X_{n+1}, ..., X_{n+m} will have the same joint distribution for each n; in other words, {X_n, n >= 0} will be a stationary process.

THEOREM 4.3.3
An irreducible aperiodic Markov chain belongs to one of the following two classes:
(i) Either the states are all transient or all null recurrent; in this case, P^n_{ij} -> 0 as n -> inf for all i, j, and there exists no stationary distribution.
(ii) Or else, all states are positive recurrent, that is,

    pi_j = lim_{n->inf} P^n_{jj} > 0.

In this case, {pi_j, j = 0, 1, 2, ...} is a stationary distribution and there exists no other stationary distribution.

Proof We will first prove (ii). To begin, note that for all M,

    sum_{j=0}^M P^n_{ij} <= sum_{j=0}^inf P^n_{ij} = 1.

Letting n -> inf yields sum_{j=0}^M pi_j <= 1 for all M, implying that

    sum_{j=0}^inf pi_j <= 1.

Now

    P^{n+1}_{ij} = sum_{k=0}^inf P^n_{ik} P_{kj} >= sum_{k=0}^M P^n_{ik} P_{kj} for all M.

Letting n -> inf yields pi_j >= sum_{k=0}^M pi_k P_{kj} for all M, implying that

    pi_j >= sum_{k=0}^inf pi_k P_{kj},    j >= 0.

To show that the above is actually an equality, suppose that the inequality is strict for some j. Then upon adding these inequalities we obtain

    sum_{j=0}^inf pi_j > sum_{j=0}^inf sum_{k=0}^inf pi_k P_{kj} = sum_{k=0}^inf pi_k sum_{j=0}^inf P_{kj} = sum_{k=0}^inf pi_k,

which is a contradiction. Therefore,

    pi_j = sum_{k=0}^inf pi_k P_{kj},    j = 0, 1, 2, ....

Putting P_j = pi_j / sum_k pi_k, we see that {P_j, j = 0, 1, 2, ...} is a stationary distribution, and hence at least one stationary distribution exists. Now let {P_j, j = 0, 1, 2, ...} be any stationary distribution. Then if {P_j} is the probability distribution of X_0, then by (4.3.1),

(4.3.2)    P_j = P{X_n = j} = sum_{i=0}^inf P{X_n = j | X_0 = i} P{X_0 = i} = sum_{i=0}^inf P^n_{ij} P_i.

From (4.3.2) we see that

    P_j >= sum_{i=0}^M P^n_{ij} P_i for all M.

Letting n and then M approach inf yields

(4.3.3)    P_j >= sum_{i=0}^inf pi_j P_i = pi_j.

To go the other way and show that P_j <= pi_j, use (4.3.2) and the fact that P^n_{ij} <= 1 to obtain

    P_j <= sum_{i=0}^M P^n_{ij} P_i + sum_{i=M+1}^inf P_i for all M,

and letting n -> inf gives

    P_j <= sum_{i=0}^M pi_j P_i + sum_{i=M+1}^inf P_i for all M.

Since sum_i P_i = 1, we obtain upon letting M -> inf that

(4.3.4)    P_j <= sum_{i=0}^inf pi_j P_i = pi_j.

Hence, by (4.3.3) and (4.3.4), P_j = pi_j, and the stationary distribution is unique. If the states are transient or null recurrent and {P_j, j = 0, 1, 2, ...} is a stationary distribution, then Equations (4.3.2) hold and P^n_{ij} -> 0, which is clearly impossible.
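For a finite-state chain the stationary equations of Theorem 4.3.3 can be solved directly as a linear system. A minimal sketch (the three-state transition matrix is an arbitrary illustration, not one from the text): replace one balance equation by the normalization condition and solve.

    import numpy as np

    P = np.array([[0.5, 0.4, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.3, 0.5]])

    n = P.shape[0]
    # pi (P - I) = 0 plus sum(pi) = 1, with the last balance
    # equation replaced by the normalization row.
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n); b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    print(pi, pi @ P)   # pi @ P reproduces pi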
Thus, for case (i), no stationary distribution exists and the proof is complete.

Remarks
(1) When the situation is as described in part (ii) of Theorem 4.3.3 we say that the Markov chain is ergodic.
(2) It is quite intuitive that if the process is started with the limiting probabilities, then the resultant Markov chain is stationary. For in this case the Markov chain at time 0 is equivalent to an independent Markov chain with the same P matrix at time inf. Hence the original chain at time t is equivalent to the second one at time inf + t = inf, and is therefore stationary.
(3) In the irreducible, positive recurrent, periodic case we still have that the pi_j, j >= 0, are the unique nonnegative solution of

    pi_j = sum_i pi_i P_{ij},    sum_j pi_j = 1.

But now pi_j must be interpreted as the long-run proportion of time that the Markov chain is in state j (see Problem 4.17). Thus pi_j = 1/mu_{jj}, whereas the limiting probability of going from j to j in n d(j) steps is, by (iv) of Theorem 4.3.1, given by

    lim_{n->inf} P^{nd}_{jj} = d/mu_{jj} = d pi_j,

where d is the period of the Markov chain.

EXAMPLE 4.3(A) Limiting Probabilities for the Embedded M/G/1 Queue. Consider the embedded Markov chain of the M/G/1 system as in Example 4.1(A) and let

    a_j = integral_0^inf e^{-lambda x} (lambda x)^j / j! dG(x).

That is, a_j is the probability of j arrivals during a service period. The transition probabilities for this chain are

    P_{0j} = a_j,    j >= 0,
    P_{ij} = a_{j-i+1},    j >= i - 1, i >= 1,
    P_{ij} = 0,    j < i - 1.

Let rho = sum_j j a_j. Since rho equals the mean number of arrivals during a service period, it follows, upon conditioning on the length of that period, that rho = lambda E[S], where S is a service time having distribution G. We shall now show that the Markov chain is positive recurrent when rho < 1 by solving the system of equations

    pi_j = sum_i pi_i P_{ij},    sum_j pi_j = 1.

These equations take the form

(4.3.5)    pi_j = pi_0 a_j + sum_{i=1}^{j+1} pi_i a_{j-i+1},    j >= 0.

To solve, we introduce the generating functions

    pi(s) = sum_{j=0}^inf pi_j s^j,    A(s) = sum_{j=0}^inf a_j s^j.

Multiplying both sides of (4.3.5) by s^j and summing over j yields

    pi(s) = pi_0 A(s) + s^{-1} sum_{i=1}^inf pi_i s^i sum_{j=i-1}^inf a_{j-i+1} s^{j-i+1}
          = pi_0 A(s) + (pi(s) - pi_0) A(s)/s,

or

    pi(s) = (s - 1) pi_0 A(s) / (s - A(s)).

To compute pi_0 we let s -> 1 in the above. As

    lim_{s->1} A(s) = sum_{j=0}^inf a_j = 1,

this gives

    lim_{s->1} pi(s) = pi_0 lim_{s->1} (s - 1)/(s - A(s)) = pi_0 (1 - A'(1))^{-1},

where the last equality follows from L'Hospital's rule. Now A'(1) = sum_j j a_j = rho, and thus

    lim_{s->1} pi(s) = pi_0 / (1 - rho).

However, since lim_{s->1} pi(s) = sum_{j=0}^inf pi_j, this implies that sum_{j=0}^inf pi_j = pi_0/(1 - rho); thus stationary probabilities exist if and only if rho < 1, and in this case

    pi_0 = 1 - rho = 1 - lambda E[S].

Hence, when rho < 1, or, equivalently, when E[S] < 1/lambda,

    pi(s) = (1 - lambda E[S])(s - 1) A(s) / (s - A(s)).

EXAMPLE 4.3(B) Limiting Probabilities for the Embedded G/M/1 Queue. Consider the embedded Markov chain for the G/M/1 queueing system as presented in Example 4.1(B). The limiting probabilities pi_k, k = 0, 1, ..., can be obtained as the unique solution of

    pi_k = sum_i pi_i P_{ik},    k >= 0,    sum_k pi_k = 1,

which in this case reduce to

(4.3.6)    pi_k = sum_{i=k-1}^inf pi_i integral_0^inf e^{-mu t} (mu t)^{i+1-k} / (i + 1 - k)! dG(t),    k >= 1.

(We've not included the equation pi_0 = sum_i pi_i P_{i0} since one of the equations is always redundant.) To solve the above let us try a solution of the form pi_k = c beta^k. Substitution into (4.3.6) leads to

(4.3.7)    c beta^k = c beta^{k-1} integral_0^inf e^{-mu t} sum_{j=0}^inf (beta mu t)^j / j! dG(t),

or

(4.3.8)    beta = integral_0^inf e^{-mu t (1 - beta)} dG(t).

The constant c can be obtained from sum_k pi_k = 1, which implies that c = 1 - beta. Since the pi_k are the unique solution to (4.3.6), and pi_k = (1 - beta) beta^k satisfies it, it follows that

    pi_k = (1 - beta) beta^k,    k = 0, 1, ...,

where beta is the solution of Equation (4.3.8). (It can be shown that if the mean of G is greater than the mean service time 1/mu, then there is a unique value of beta satisfying (4.3.8) that is between 0 and 1.) The exact value of beta can usually only be obtained by numerical methods.
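Equation (4.3.8) is a fixed-point equation in beta and lends itself to simple iteration. A minimal sketch, assuming for concreteness that G is deterministic with interarrival time d (so the integral reduces to e^{-mu d (1 - beta)}); the parameter values are arbitrary and satisfy d > 1/mu, so the root lies in (0, 1):

    from math import exp

    mu, d = 1.0, 1.5     # service rate; deterministic interarrival time
    beta = 0.5           # any starting value in (0, 1)
    for _ in range(200):
        beta = exp(-mu * d * (1 - beta))   # iterate (4.3.8)
    print(beta)          # then pi_k = (1 - beta) * beta**k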
EXAMPLE 4.3(C) The Age of a Renewal Process. Initially an item is put into use, and when it fails it is replaced at the beginning of the next time period by a new item. Suppose that the lives of the items are independent and each will fail in its ith period of use with probability P_i, i >= 1, where the distribution {P_i} is aperiodic and sum_i i P_i < inf. Let X_n denote the age of the item in use at time n, that is, the number of periods (including the nth) it has been in use. If we let

    lambda(i) = P_i / sum_{j>=i} P_j

denote the probability that an i unit old item fails, then {X_n, n >= 0} is a Markov chain with transition probabilities given by

    P_{i,1} = lambda(i) = 1 - P_{i,i+1},    i >= 1.

Hence the limiting probabilities are such that

(4.3.9)     pi_1 = sum_i pi_i lambda(i),
(4.3.10)    pi_{i+1} = pi_i (1 - lambda(i)),    i >= 1.

Iterating (4.3.10) yields

    pi_{i+1} = pi_1 (1 - lambda(1))(1 - lambda(2)) ... (1 - lambda(i)) = pi_1 P{X >= i + 1},

where X is the life of an item. Using sum_{i=1}^inf pi_i = 1 yields

    1 = pi_1 sum_{i=1}^inf P{X >= i} = pi_1 E[X],

or pi_1 = 1/E[X], and

(4.3.11)    pi_i = P{X >= i} / E[X],    i >= 1,

which is easily seen to satisfy (4.3.9). It is worth noting that (4.3.11) is as expected, since the limiting distribution of age in the nonlattice case is the equilibrium distribution (see Section 3.4 of Chapter 3), whose density is F-bar(x)/E[X].

Our next two examples illustrate how the stationary probabilities can sometimes be determined not by algebraically solving the stationary equations but by reasoning directly that when the initial state is chosen according to a certain set of probabilities the resulting chain is stationary.

EXAMPLE 4.3(D) Suppose that during each time period, every member of a population independently dies with probability p, and also that the number of new members that join the population in each time period is a Poisson random variable with mean lambda. If we let X_n denote the number of members of the population at the beginning of period n, then it is easy to see that {X_n, n = 1, 2, ...} is a Markov chain.

To find the stationary probabilities of this chain, suppose that X_0 is distributed as a Poisson random variable with parameter alpha. Since each of these X_0 individuals will independently be alive at the beginning of the next period with probability 1 - p, it follows that the number of them that are still in the population at time 1 is a Poisson random variable with mean alpha(1 - p). As the number of new members that join the population by time 1 is an independent Poisson random variable with mean lambda, it thus follows that X_1 is a Poisson random variable with mean alpha(1 - p) + lambda. Hence, if

    alpha = alpha(1 - p) + lambda,

then the chain would be stationary. Hence, by the uniqueness of the stationary distribution, we can conclude that the stationary distribution is Poisson with mean lambda/p.
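This conclusion is easy to check by simulation; the following minimal sketch (parameter values arbitrary) runs the population chain for many periods and compares the sample mean of X_n with lambda/p:

    import numpy as np

    rng = np.random.default_rng(0)
    p, lam = 0.2, 3.0
    x, total, periods = 0, 0, 100_000
    for _ in range(periods):
        x = rng.binomial(x, 1 - p)   # survivors among current members
        x += rng.poisson(lam)        # new arrivals this period
        total += x
    print(total / periods, lam / p)  # both near lam/p = 15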
EXAMPLE 4.3(E) The Gibbs Sampler. Let p(x_1, ..., x_n) be the joint probability mass function of the random vector X_1, ..., X_n. In cases where it is difficult to directly generate the values of such a random vector, but where it is relatively easy to generate, for each i, a random variable having the conditional distribution of X_i given all of the other X_j, j != i, we can generate a random vector whose probability mass function is approximately p(x_1, ..., x_n) by using the Gibbs sampler. It works as follows.

Let X^0 = (x^0_1, ..., x^0_n) be any vector for which p(x^0_1, ..., x^0_n) > 0. Then generate a random variable whose distribution is the conditional distribution of X_1 given that X_j = x^0_j, j = 2, ..., n, and call its value x^1_1. Then generate a random variable whose distribution is the conditional distribution of X_2 given that X_1 = x^1_1 and X_j = x^0_j, j = 3, ..., n, and call its value x^1_2. Then continue in this fashion until you have generated a random variable whose distribution is the conditional distribution of X_n given that X_j = x^1_j, j = 1, ..., n - 1. Let X^1 = (x^1_1, ..., x^1_n), and repeat the process, this time starting with X^1 in place of X^0, to obtain the new vector X^2, and so on.

It is easy to see that the sequence of vectors X^j, j >= 0, is a Markov chain, and the claim is that its stationary probabilities are given by p(x_1, ..., x_n). To verify the claim, suppose that X^0 has probability mass function p(x_1, ..., x_n). Then it is easy to see that at any point in this algorithm the vector x^1_1, ..., x^1_i, x^0_{i+1}, ..., x^0_n will be the value of a random variable with mass function p(x_1, ..., x_n). For instance, letting X^1_1 be the random variable that takes on the value denoted by x^1_1,

    P{X^1_1 = x_1, X^0_j = x_j, j = 2, ..., n}
        = P{X^1_1 = x_1 | X^0_j = x_j, j = 2, ..., n} P{X^0_j = x_j, j = 2, ..., n}
        = P{X_1 = x_1 | X_j = x_j, j = 2, ..., n} P{X_j = x_j, j = 2, ..., n}
        = p(x_1, ..., x_n).

Thus p(x_1, ..., x_n) is a stationary probability distribution, and so, provided that the Markov chain is irreducible and aperiodic, we can conclude that it is the limiting probability vector for the Gibbs sampler.

It also follows from the preceding that p(x_1, ..., x_n) would be the limiting probability vector even if the Gibbs sampler were not systematic in first changing the value of X_1, then X_2, and so on. Indeed, even if the component whose value was to be changed was always randomly determined, p(x_1, ..., x_n) would remain a stationary distribution, and would thus be the limiting probability mass function provided that the resulting chain is aperiodic and irreducible.
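A minimal sketch for a two-component chain (the 2 x 2 joint mass function below is an arbitrary, strictly positive illustration): each step resamples one coordinate from its exact conditional distribution, and the long-run frequencies approach the joint pmf.

    import random

    # joint pmf p[x1][x2] on {0,1} x {0,1}
    p = [[0.1, 0.3],
         [0.4, 0.2]]

    def sample_x1_given(x2):
        # conditional P{X1 = 1 | X2 = x2}
        q = p[1][x2] / (p[0][x2] + p[1][x2])
        return 1 if random.random() < q else 0

    def sample_x2_given(x1):
        q = p[x1][1] / (p[x1][0] + p[x1][1])
        return 1 if random.random() < q else 0

    x1, x2, counts = 0, 0, [[0, 0], [0, 0]]
    n = 200_000
    for _ in range(n):
        x1 = sample_x1_given(x2)   # systematic sweep: first X1,
        x2 = sample_x2_given(x1)   # then X2
        counts[x1][x2] += 1
    print([[c / n for c in row] for row in counts])   # approx p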
For our next example, consider an irreducible, positive recurrent Markov chain with stationary probabilities pi_j, j >= 0; that is, pi_j is the long-run proportion of transitions that are into state j. Consider a given state of this chain, say state 0, and let N denote the number of transitions between successive visits to state 0. Since visits to state 0 constitute renewals, it follows from Theorem 3.3.5 that the number of visits by time n is, for large n, approximately normally distributed with mean n/E[N] = n pi_0 and variance n Var(N)/(E[N])^3 = n Var(N) pi_0^3. It remains to determine

    Var(N) = E[N^2] - 1/pi_0^2.

To derive an expression for E[N^2], let us first determine the average number of transitions until the Markov chain next enters state 0. That is, let T_n denote the number of transitions from time n onward until the Markov chain enters state 0, and consider

    lim_{n->inf} (T_1 + T_2 + ... + T_n)/n.

By imagining that we receive a reward at any time that is equal to the number of transitions from that time onward until the next visit to state 0, we obtain a renewal reward process in which a new cycle begins each time a transition into state 0 occurs, and for which the long-run average reward per unit time is the above limit. By the theory of renewal reward processes this long-run average reward per unit time is equal to the expected reward earned during a cycle divided by the expected time of a cycle. But if N is the number of transitions between successive visits into state 0, then the rewards earned during the successive times of a cycle are N, N - 1, ..., 1, so

    E[Reward Earned during a Cycle] = E[N + (N - 1) + ... + 1] = E[N(N + 1)/2].

Thus,

(4.3.12)    Average Reward Per Unit Time = E[N(N + 1)/2]/E[N] = (E[N^2] + E[N])/(2E[N]).

However, since the average reward at a given time is just the average number of transitions it takes the Markov chain to make a transition into state 0 from its present state, and since the proportion of time that the chain is in state i is pi_i, it follows that

(4.3.13)    Average Reward Per Unit Time = sum_i pi_i mu_{i0},

where mu_{i0} denotes the mean number of transitions until the chain enters state 0 given that it is presently in state i (with mu_{00} = E[N] = 1/pi_0, the mean recurrence time of state 0). By equating the two expressions for the average reward given by Equations (4.3.12) and (4.3.13), and using E[N] = 1/pi_0, we obtain

    pi_0 E[N^2] + 1 = 2 sum_i pi_i mu_{i0},

or

(4.3.14)    E[N^2] = (2/pi_0) sum_{i != 0} pi_i mu_{i0} + 1/pi_0,

where the final equation used that mu_{00} = E[N] = 1/pi_0. The values mu_{i0}, i > 0, can be obtained by solving the following set of linear equations (obtained by conditioning on the next state visited):

    mu_{i0} = 1 + sum_{j != 0} P_{ij} mu_{j0},    i > 0.

EXAMPLE 4.3(F) Consider the two-state Markov chain with

    P_00 = alpha = 1 - P_01,    P_10 = beta = 1 - P_11.

For this chain,

    pi_0 = beta/(1 - alpha + beta),    pi_1 = (1 - alpha)/(1 - alpha + beta),    mu_10 = 1/beta.

Hence, from (4.3.14),

    E[N^2] = 2 pi_1 mu_10 / pi_0 + 1/pi_0 = 2(1 - alpha)/beta^2 + (1 - alpha + beta)/beta,

and so

    Var(N) = 2(1 - alpha)/beta^2 + (1 - alpha + beta)/beta - (1 - alpha + beta)^2/beta^2
           = (1 - beta + alpha beta - alpha^2)/beta^2.

Hence, for n large, the number of transitions into state 0 by time n is approximately normal with mean n pi_0 = n beta/(1 - alpha + beta) and variance

    n Var(N) pi_0^3 = n beta (1 - beta + alpha beta - alpha^2)/(1 - alpha + beta)^3.

For instance, if alpha = beta = 1/2, then the number of visits to state 0 by time n is, for n large, approximately normal with mean n/2 and variance n/4.
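The variance formula is easy to sanity-check by simulating cycles of the two-state chain and comparing the sample variance of N against (1 - beta + alpha*beta - alpha^2)/beta^2. A minimal sketch with arbitrary alpha, beta:

    import random, statistics

    alpha, beta = 0.3, 0.6

    def next_state(i):
        down = alpha if i == 0 else beta   # probability of moving to 0
        return 0 if random.random() < down else 1

    lengths, state, n = [], 0, 0
    for _ in range(200_000):
        state = next_state(state)
        n += 1
        if state == 0:            # a cycle ends on each transition into 0
            lengths.append(n)
            n = 0
    print(statistics.variance(lengths))
    print((1 - beta + alpha * beta - alpha**2) / beta**2)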
4.4 TRANSITIONS AMONG CLASSES, THE GAMBLER'S RUIN PROBLEM, AND MEAN TIMES IN TRANSIENT STATES

We begin this section by showing that a recurrent class is a closed class, in the sense that once entered it is never left.

PROPOSITION 4.4.1
Let R be a recurrent class of states. If i in R, j not in R, then P_{ij} = 0.

Proof Suppose P_{ij} > 0. Then, as i and j do not communicate (since j is not in R), P^n_{ji} = 0 for all n. Hence if the process starts in state i, there is a positive probability of at least P_{ij} that the process will never return to i. This contradicts the fact that i is recurrent, and so P_{ij} = 0.

Let j be a given recurrent state and let T denote the set of all transient states. For i in T, we are often interested in computing f_{ij}, the probability of ever entering j given that the process starts in i. The following proposition, by conditioning on the state after the initial transition, yields a set of equations satisfied by the f_{ij}.

PROPOSITION 4.4.2
If j is recurrent, then the set of probabilities {f_{ij}, i in T} satisfies

    f_{ij} = sum_{k in T} P_{ik} f_{kj} + sum_{k in R} P_{ik},    i in T,

where R denotes the set of states communicating with j.

Proof
    f_{ij} = P{N_j(inf) > 0 | X_0 = i}
           = sum_k P{N_j(inf) > 0 | X_0 = i, X_1 = k} P{X_1 = k | X_0 = i}
           = sum_{k in T} f_{kj} P_{ik} + sum_{k in R} P_{ik},

where we have used Corollary 4.2.5 in asserting that f_{kj} = 1 for k in R, and Proposition 4.4.1 in asserting that f_{kj} = 0 for k neither in R nor in T.

EXAMPLE 4.4(A) The Gambler's Ruin Problem. Consider a gambler who at each play of the game has probability p of winning 1 unit and probability q = 1 - p of losing 1 unit. Assuming successive plays of the game are independent, what is the probability that, starting with i units, the gambler's fortune will reach N before reaching 0?

If we let X_n denote the player's fortune at time n, then the process {X_n, n = 0, 1, 2, ...} is a Markov chain with transition probabilities

    P_00 = P_NN = 1,
    P_{i,i+1} = p = 1 - P_{i,i-1},    i = 1, 2, ..., N - 1.

This Markov chain has three classes, namely {0}, {1, 2, ..., N - 1}, and {N}, the first and third class being recurrent and the second transient. Since each transient state is only visited finitely often, it follows that, after some finite amount of time, the gambler will either attain her goal of N or go broke.

Let f_i = f_{iN} denote the probability that, starting with i, 0 <= i <= N, the gambler's fortune will eventually reach N. By conditioning on the outcome of the initial play of the game (or, equivalently, by using Proposition 4.4.2), we obtain

    f_i = p f_{i+1} + q f_{i-1},    i = 1, 2, ..., N - 1,

or, equivalently, since p + q = 1,

    f_{i+1} - f_i = (q/p)(f_i - f_{i-1}),    i = 1, 2, ..., N - 1.

Since f_0 = 0, we see from the above that

    f_{i+1} - f_i = (q/p)^i f_1.

Adding the first i - 1 of these equations yields

    f_i = f_1 [1 + (q/p) + ... + (q/p)^{i-1}].

Using f_N = 1 then gives

    f_i = (1 - (q/p)^i)/(1 - (q/p)^N) if p != 1/2,
    f_i = i/N if p = 1/2.

It is interesting to note that as N -> inf,

    f_i -> 1 - (q/p)^i if p > 1/2,
    f_i -> 0 if p <= 1/2.

Thus, if p > 1/2, there is a positive probability that the gambler's fortune will converge to infinity, whereas if p <= 1/2, then, with probability 1, the gambler will eventually go broke when playing against an infinitely rich adversary.

Suppose now that we want to determine the expected number of bets that the gambler, starting at i, makes before reaching either 0 or n. Whereas we could let m_i denote this quantity and derive a set of linear equations for m_i, i = 1, ..., n - 1, we will obtain a more elegant solution by using Wald's equation along with the preceding. Let X_j be her winnings on the jth bet, j >= 1, and let B denote the number of bets until the gambler's fortune reaches either 0 or n. That is,

    B = min{m : sum_{j=1}^m X_j = -i or sum_{j=1}^m X_j = n - i}.

Since the X_j are independent random variables with mean E[X_j] = 1(p) - 1(1 - p) = 2p - 1, and B is a stopping time for the X_j, it follows by Wald's equation that

    E[sum_{j=1}^B X_j] = (2p - 1)E[B].

But, letting alpha = (1 - (q/p)^i)/(1 - (q/p)^n) be the probability that n is reached before 0,

    sum_{j=1}^B X_j = n - i with probability alpha, and -i with probability 1 - alpha.

Hence (2p - 1)E[B] = n alpha - i, or

    E[B] = (1/(2p - 1)) { n (1 - (q/p)^i)/(1 - (q/p)^n) - i }.
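A minimal numeric sketch of these two formulas (arbitrary p != 1/2, i, n), together with a simulation check:

    import random

    def ruin(p, i, n):
        r = (1 - p) / p
        alpha = (1 - r**i) / (1 - r**n)       # P(reach n before 0)
        EB = (n * alpha - i) / (2 * p - 1)    # expected number of bets
        return alpha, EB

    p, i, n = 0.4, 3, 6
    print(ruin(p, i, n))

    wins = bets = 0
    reps = 100_000
    for _ in range(reps):
        x = i
        while 0 < x < n:
            x += 1 if random.random() < p else -1
            bets += 1
        wins += (x == n)
    print(wins / reps, bets / reps)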
Consider now a finite-state Markov chain and suppose that the states are numbered so that T = {1, 2, ..., t} denotes the set of transient states. Let

    Q = [P_{ij}],    i, j in T,

and note that since Q specifies only the transition probabilities from transient states into transient states, some of its row sums are less than 1 (for otherwise, T would be a closed class of states).

For transient states i and j, let m_{ij} denote the expected total number of time periods spent in state j given that the chain starts in state i. Conditioning on the initial transition yields

(4.4.1)    m_{ij} = delta(i, j) + sum_{k in T} P_{ik} m_{kj},

where delta(i, j) is equal to 1 when i = j and is 0 otherwise, and where the sum is over transient states only because m_{kj} = 0 when k is a recurrent state. Let M denote the matrix of values m_{ij}, i, j = 1, ..., t. In matrix notation, (4.4.1) can be written as

    M = I + QM,

where I is the identity matrix of size t. As the preceding equation is equivalent to (I - Q)M = I, we obtain, upon multiplying both sides by (I - Q)^{-1}, that

    M = (I - Q)^{-1}.

That is, the quantities m_{ij}, i, j in T, can be obtained by inverting the matrix I - Q. (The existence of the inverse is easily established.)

For i, j in T, the quantity f_{ij}, equal to the probability of ever making a transition into state j given that the chain starts in state i, is easily determined from M. To determine the relationship, derive an expression for m_{ij}, i != j, by conditioning on whether state j is ever entered:

    m_{ij} = f_{ij} m_{jj},

where m_{jj} is the expected number of time periods spent in state j given that it is eventually entered from state i. Hence

    f_{ij} = m_{ij} / m_{jj}.

EXAMPLE 4.4(B) Consider the gambler's ruin problem with p = 0.4 and n = 6. Starting in state 3, determine:
(a) the expected amount of time spent in state 3;
(b) the expected number of visits to state 2;
(c) the probability of ever visiting state 4.

Solution The matrix Q, which specifies P_{ij}, i, j in {1, 2, 3, 4, 5}, is as follows:

           1     2     3     4     5
    1      0    .4     0     0     0
    2     .6     0    .4     0     0
    3      0    .6     0    .4     0
    4      0     0    .6     0    .4
    5      0     0     0    .6     0

Inverting I - Q gives

    M = (I - Q)^{-1} =

        1.5865  0.9774  0.5714  0.3008  0.1203
        1.4662  2.4436  1.4286  0.7519  0.3008
        1.2857  2.1429  2.7143  1.4286  0.5714
        1.0150  1.6917  2.1429  2.4436  0.9774
        0.6090  1.0150  1.2857  1.4662  1.5865

Hence,

    m_33 = 2.7143,    m_32 = 2.1429,    f_34 = m_34/m_44 = 1.4286/2.4436 = 0.5846.
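The inversion is a one-liner in practice; a minimal sketch reproducing M for this example:

    import numpy as np

    p, q, t = 0.4, 0.6, 5
    Q = np.zeros((t, t))
    for i in range(t):
        if i + 1 < t: Q[i, i + 1] = p    # win a bet
        if i - 1 >= 0: Q[i, i - 1] = q   # lose a bet
    M = np.linalg.inv(np.eye(t) - Q)
    print(M.round(4))
    # m_33, m_32, f_34 (states 1..5 map to indices 0..4)
    print(M[2, 2], M[2, 1], M[2, 3] / M[3, 3])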
Then (i) 1To is the smallest positive number satisfying (ii) 1To = 1 if.192 MARKOV CHAINS probability that any particular family dies out is just 7To. 1T and letting n _ 00. 1T ~ lim P{X" O} P{population dies out} = 1To " To prove (ii) define the generating function .1).5. let 1T We'll first show by induction that 1T ~ P{X" O} for all n. and assume that 1T ~ P{X" = O} J Then 1 P{X"+I = O} 2: P{X"+1 = OlX = j}PI 2: (P{X" = On PI i I (by the induction hypothesis) 1T Hence. ~ Proof To show that 1To is the smallest solution of (4.1). :s. Now 1T = 2: 1T JPJ ~ 1Topo = Po = P{X1 = O} J 0 satisfy (45. 1. II.1) In fact we can prove the following. ~ P{X" = O} for all n. and only if.5.5. 1 The result follows. it follows that c/>"(S) = L j(j J~O .APPLICATIONS OF MARKOV CHAINS 193 (1. then examples can usually be constructed that require roughly N . we will present a simple probabilistic model for the number of necessary steps.1 A Markov Chain Model of Algorithmic Efficiency Certain algorithms in operations research and computer science act in the following manner the objective is to determine the best of a set of N ordered elements The algorithm starts with one of the elements and then successively moves to a better element until it reaches the best. Specifically. 1) / Po t--___~ / / / / / / / / / / / / / 45° Figure 4. 4. 1).2 Since Po + PI < I. 1) (1.5. 170 = 1 if. 1) Hence. since c/>(170) = 170.ZPJ> 0 for all s E (0. and in Figure 452.. l)sJ. c/>(s) is a strictly convex function in the open interval (0.) If one looks at the algorithm's efficiency from a "worse case" point of view. we consider a Markov chain that when transiting from any state is equally likely to enter any of the better ones . 1) We now distinguish two cases (Figures 45 1 and 452) In Figure 4.1 steps to reach the optimal element In this section. (The most important example is probably the simplex algorithm of linear programming. It is geometncally clear that Figure 45 1 represents the appropnate picture when C/>' (1) :::. c/>'(1) :::.1 Figure 4.5. and only if. 1). since c/>'(1) = L~ jPJ = IJ. 1. and Figure 452 is appropriate when c/>'(1) > 1 Thus. which attempts to maximize a linear function subject to linear constraints and where an element corresponds to an extreme point of the feasibility region.6.5 1 c/>(s) > s for all s E (0. c/>(s) = s for some s E (0.6 ApPLICATIONS OF MARKOV CHAINS 4. E{T4] = 1 +1(1 + 1 +i) = 1 +i+1. . 0 The importance of the above representation stems from the following. and let T..-1 0..N-l .-1 E[T. i>l.6.1 are independent and P{fJ = l} = 1/j. to obtain a more complete description of TN' we will use the representation TN = 2: N-l Ij .1) Starting with E [TI] = E[T. denote the number of transitions to go from state i to state 1. A recursive formula for E{T. and it is not difficult to guess and then prove inductively that .. j= I where I J = {I if the process ever enters j otherwise.] = 1 + i-I J~ E[TJ 1 . = 1 + l. • f N .] = 2:-:. Lemma 4. l:=:. we successively see that = E{T2] E{T)] 1.1 f. J 1 J= I However..i-l. ..] can be obtained by conditioning on the initial transition: (4.j:=:.194 Consider a Markov chain for which PI I 1 p'J=-'-1' 1- MARKOV CHAINS = 1 and j=I.6. = 11'. Since the sum of a large number of independent Bernoulli random variables. 1 4. 'N.APPLICATIONS OF MARKOV CHAINS 195 proof Given '. 3. 1 contains four runs as indicated 13. " = I} Then P{'. each having a small probability of being nonzero. 
2.+ I whenever x.2 An Application to Runs-A Markov Chain with a Continuous State Space XI Consider a sequence of numbers XI.: . 1 ] 1 N-I log N::::. TN has approximately a Poisson distribution with mean log N Proof Parts (i) and (ii) follow from Lemma 46 1 and the representation TN = '2:. 8. 4.~ 1 ] 2: -:..41311.~11 '. > X. the partial sequence 3. •••• If we place vertical Jines before and between x.-'-I' then we say that the runs are the segments between pairs of lines For example. X2.6.-:1) 2: -: . is approximately Poisson distributed.1 ] ] (iii) For N large.-:<I+ 1 -X XI] or N-I 10gN< and so 2: -:< 1 + 10g(N-I).-'-1' .6. N 1) = ! j/(n-I) j PROPOSITION 4. Thus each run is an increasing segment of the sequence . part (iii) follows since NdX ~I 1 fN-1 dx f 1 -<L.2 (i) £[TNJ = 2: ""7 .~ 1 ] N-I 1 (ii) Var(TN) = N-1I( 1 . 812.+1> .. and x.. } = lI(n . 5. let n = min{i i> j. 5. 1 . the next m . we see that P{LI~m}=-" 1 m.1 values must exceed y. Now the run will be of length m and the next one will start with value y if· (i) the next m . if we let L I denote the length of the initial run.m-l} (1 . (iii) the maximum of the first m .Xm_I »yIX. 1) random variables. Now it is easy to see that {In. (ii) the mth value must equal y. Ln = ml/n = x} _(1_x)m-1 . and suppose we are interested in the distribu_ tion of the length of the successive runs For instance. Y + dy). Ln = ml/n = x}. P{ln+ I E (y. . (l-x)m-1 [ (y_x)m-I] (m .1 values are in increasing order and are all greater thanx. m = 1.x ..x)m-I (m -1)' dy = ify< x if Y > x. and only if. For if a run starts with x.196 MA RKOV CHAI NS Suppose now that XI.62) P{L>mlx}= (1 (m . where Ln is the length of the nth run.>x.1)' dy 1 . . are independent and identically distributed uniform (0.1 values must all be greater than x and they must be in increasing order. n ~ I} is a Markov chain having a continuous state space To compute p(Ylx).. the first m values are in increasing order. reason as follows· P{I~+I E (y.1)' -x )m-I since in order for the run length to be at least m.i=I •. X 2 . then as L I will be at least m if.. the initial value in the run. let In denote the initial value of the nth run. the probability density that the next run begins with the value y given that a run has just begun with initial value x.2.y + dy).(m-l)' dyP{max(X 1. The distribution of a later run can be easily obtained if we know x. Hence. .. then (4. To obtain the unconditional distribution of the length of a given run.y + dY)l/n = x} = L m&1 co P{ln+1 E (Y. the limiting density of In' is given by 1T(Y) = 2(1 y). To obtain the limiting distnbution of In we will first hazard a guess and then venfy our guess by using the analog of Theorem 4. Hence it would seem that the rate at which runs occur.APPLICATIONS OF MARKOV CHAINS 197 summing over m yields I-x ify <x p(ylx) = { el _x Y-. 1).3.3. Now II. beginning with an initial value in (y. y + dy). However. is uniformly distributed over (0. O<y<l [A second heuristic argument for the above limiting density is as follows: Each X. {In' n ~ I} is a Markov chain having the continuous state space (0. y + dy) will be (1 . So it seems plausible that the long-run proportion of such values that are less than y would equal the probability that a uniform (0. Since it seems plausible that 17'( y). being the initial value ofthe first run.3 can be proven in the case of a Markov chain with a continuous state space. That is. and independent.y) dy/f~ (1 y) dy = 2(1 y) dy. . uniform (0. 
1) random variable is less than y given that it is smaller than a second.] Since a theorem analogous to Theorem 4 3. 1) random vanable.t e -e ify>x. of value y will be the starting point of a new run if the value prior to it is greater than y.y) dy. equals F(y)f(y) dy = (1 . to prove the above we need to verify that In this case the above reduces to showing that or 1 Y = f~ e l X (1 x) dx - f~ ey-x(1 - x) dx. and so the fraction of all runs whose initial value is in (y. the later runs begin whenever a value smaller than the previous one occurs. 1) and a transition probability density P ( ylx) given by the above. At each unit of time a request is made to retrieve one of these .198 which is easily shown upon application of the identity MARKOV CHAINS Thus we have shown that the limiting density of In' the initial value of the nth run.6.3) as follows: !~"! . The above could also have been computed from (4.3 List Ordering Rules-Optimality of the Transposition Rule Suppose that we are given a set of n elements e l .1 (m + 1 1)(m . . '" E[Lnl = 2 m2.6.1 '" 1 (m + 1)(m -1)" 4. 0 < x < l.x). the length of the nth run.6.1)! yielding the interesting identity 1= ]. • . is given by (4. :> _ JI (1(m. .3) !~ P{Ln-m} - .-1)! 2(1-x)dx x)m-I 0 2 (m + 1)(m .en that are to be arranged in some order.2) that E[LII= xl = i ~(1(m -x~)m_-1 ___ m=1 1)! and so = 2.1)! To compute the average length of a run note from (4.6. is 1T(X) = 2(1 . Hence the limiting distribution of Ln. we suppose that i. Consider the following restricted class of rules that.. = Pn = --1 = q. . . for any i. it = 1.. . . = P". but rather restrict ourselves to reordering rules in which the reordered permutation of elements at any time is only allowed to depend on the present ordering and the position of the element requested For_ a gi'{en reqrderi!1g rule. However. i + K(i) will. be moved to a position less than or equal to i.it+1 <: f}. for such a large number of states. n.. at least in theory.. were known. i 1.:: 2.. The set {fl' i = 1. i + 1. the average position of the element requested can be obtained. . move the element to position j. the optimal ordering would simply be to order the elements in decreasing order of the P. when an element is requested and found in position i. an element in any of the positions i.::: 1.. since all the elements 2 through n are identical in the sense that they have the same probability of being requested. being requested (independently of the past) with probability PI' P. . In other words. the problem becomes more interesting if we do not allow such memory storage as would be necessary for the above rule.-t. For a specified rule R in the above class-say the one having K(i) = k(i). let K(i) I-p = max{l.APPLICATIONS OF MARKOV CHAINS 199 elements-e. and leave the rei ative positions of the other elements unchanged. i ::. even if the PI'S were unknown we could do as weJl asymptotically by ordering the elements at each unit of time in decreasing order of the number of previous requests for them However.{e l is in position f}... Oearly if the P. n-Iet us denote the stationary probabilities when this rule is employed by 'IT. by analyzing the Markov chain of n! states where the state at any time is the ordering at that time. In fact. which always moves the requested element one closer to the front of the line. < i for i > 1. nFor such probabilities. n} characterizes a rule in this class For a given rule in the above class. 
~~ PI = 1 The problem of interest is to determine the optimal ordering so as to minimize the long-run average position of the element requested.'S.::: i. i . ~ 0. if requested. is optimal. the analysis quickly becomes intractable and so we shall simplify the problem by assuming that the probabilities satisfy P2 = . . we can obtain the average position of the element requested by analyzing the much simpler Markov chain of n states with the state being the position of element e t • We will now show that for such probabilities and among a wide class of rules the transposition rule. . . and i.. In addition. . ..)O.. . = O.tk(/))(1 . be denoted by Of for the transposition rule.1. q 0. j.O. we have.6. 0 0 = 1.. . .O. which has j. or. .200 MARKOV CHAINS In addition.+k(. The steady-state probabilities can now easily be seen to be 0 . n . i + 1.4) where a. qk(i) qk(i) + p' 0. Now consider a special rule of the above class. The above follows since element 1 will be in a position higher than i if it was either in a position higher than i + k(i) in the previous period.) + (0. . n. = L Let the corresponding 0. since K(i) 1 for this rule. = -(0.1+ 1 n P". Then from Equation (4.O. = a. or if it was in position i and if any of the elements in positions i + 1.6. it may be worth noting the following (I) Any element moves toward the back of the list at most one position at a time.p) + (0'_1 ..0. or it was in a position less than i + k(i) but greater than i and was not selected. the transposition rule.= 2: "/ = /.a. (ii) If an element is in position i and neither it nor any of the elements in the following k(i) positions are requested._.. On = 0. i-I. . Before writing down the steady-state equations. it will remain in position i. i:> 0 with the notation P"" signifying that the above are limiting probabilities.+k(')1 i 1. . i + k (i) will be moved to a position <:: i if requested. + (1 . .)qk(i). let 0.• i + k(i) were selected.4). .) p . equivalently. i 2.{el is in a position greater than I}. The above equations are equivalent to (4. . (iii) Any element in one of the positions i.. namely. and note that in this case a=l>1=b . equivalently. PROPOSITION 4. b = (qlp) + . which is equivalent to p. for all i If p ~ lin.n'_1 + (1 .1) [. .2. ~ ". .)n.. . . . 1 + (qlp) + . .. + . .+k(t)' i=l. . .n-l We may now prove the following.b.APPLICATIONS OF MARKOV CHAINS 201 unplying Summing the above equations from j = 1.n.. for all i Proof Consider the case p ~ lin. then ". + (q)']' P p Letting r = k(i). we see that or.(n . + (qlp)lrlll " 1 1 .5) where n. . . + (qlpl('l' i=l. 1 + (k(i)lp)q 1 + qlp + .6.n 1. where k(i) is the value of K(i) for a given rule R in our class. + (qlp)k(') . nn = 0. r yields n i+.. = b. ~ ". . (4.3 If p .:: q. then ". . .:: lin.n.. . .6.. it follows from (464) that if we take c.1. Now it is intuitively clear (and we defer a formal proof until Chapter 8) that the probability that the Markov chain defined by (466) will ever enter 0 is an increasing function of the vector f =: • Cn_l) Hence.] E"'LPosltlon = p E[] + ( 1 . of rule R. since a. n and transition probabilities (466) P.= { 1 - c.202 Now define a Markov chain with states O. = C. (c 1 • . then a. n . if p . i = 1. we see that IT. and from (465) if we let c . fo = 1. will equal the IT. equals TI. = b" then f. as it can be shown that the above set of equations has a unique solution.~0 PiX > it.p) E[X] + (1. b" i for all i = 1. lin. MARKOV CHAINS .c.1 2(n .p)n(n + 1) n.n .4 Among the rules considered. ifj=i-l.P ) E[1 + 2 + _ 1 + n . n.n . 1. 
1= 1..::: lin. .::: n.X] X n = (p _ 1 . and if p ~ lin. for all i.-I + (1 . f k(l). equal to a.1) Thus. . . the expected position is minimized by minimizing E[X]. When p :s. then f... . ifj = i + k(i). and the above inequality is reversed THEOREM 4.f. . the limiting expected position of the element requested is minimized by the transposition rule Proof Letting X denote the position of e l • we have upon conditioning on whether or not e l is requested that the expected position of the requested element can be expressed as . satisfies f.6. c.)f. denote the probability that this Markov chain ever enters state 0 given that it starts in state i Then f. ::s. Hence. the result follows from Proposition 46 3 .1. by maximizing E[X] Since E[X] = 2.::: b" i = 1.1 Let f. think of the present time as being time m + 1.P'J) is equal to the rate at which it goes from j to i (namely.J for all i. that (4. namely. It should be noted that this is an obvious necessary condition for time reversibility since a transition from i to j going backward .7 TIME-REVERSIBLE MARKOV CHAINS An irreducible positive recurrent Markov chain is stationary if the initial state is chosen according to the stationary probabilities. then the Markov chain is said to be time reversible. Consider now a stationary Markov chain having transition probabilities p. j. n ::> 1 is a Markov chain it follows that given the present state Xm+1 the past state Xm and the future states X m+2 • X m+3 . for all states i and j.P" P' / . Thus the reversed process is also a Markov chain with transition probabilities given by *..' 1T.). since X n . The condition for time reversibility. starting at time n consider the sequence of states X n• Xn -I. 1T. j. ••• are independent.1T. If P~ = p.1) for all i. .TIME_REVERSIBLE MARKOV CHAINS 203 4. Then. That is.7.) We say that such a chain is in steady state...P.. can be interpreted as stating that. 1T.J and stationary probabilities 1T. and suppose that starting at some time we trace the sequence of states going backwards in time. (In the case of an ergodic chain this is equivalent to imagining that the process begins at time t = -00. But this is exactly what the preceding equation states. It turns out that this sequence of states is itself a Markov chain with transition probabilities P~ defined by P~ P{Xm = jlXm+1= i} _ P{Xm+1 = ilXm = j}P{Xm P{Xm+1 = i} = j} To prove that the reversed process is indeed a Markov chain we need to verify that To see that the preceding is true. the rate at which the process goes from i to j (namely. .. + I + Ptr I = 1 is time reversible. Hence it follows that the rate of transitions from i to i + 1 equals the rate from i + 1 to i. If this random variable takes on the value j.204 MARKOV CHAINS in time is equivalent to a transition from j to i going forward in time. that is if Xm = i and Xm I = j.= 1 2: a. n such that q/. This follows by noting that the number of transitions from i to i + 1 must at all times be within 1 of the number from i + 1 to i. If Xn = i. m Let a p j = 1. This is so since if (47. and set it equal to i otherwise. for all i and j... and so the process is time reversible. = ajlA. . j = 1.2) for all i. for all i. be positive numbers. . m. . that an ergodic chain with p/. . .. . . .. then set Xn+ 1 equal to j with probability min{l. then a transition from i to j is observed if we ar~ looking backward in time and one from j to i if we are looking forward in time If we can find nonnegative numbers. n ::> O} as follows. 
'Frt Xl =1 are the unique solution of the above.la t }. without any need for computations. then generiite a random variable that is equal to j with probability qlJ' i. ExAMP1E 4.. summing to 1. One way of simulating a sequence of random variables whose distributions converge to {PI' j = 1. m.. a.. .7(A) An Ergodic Random Walk. .71) then it follows that the Markov chain is time reversible and the numbe~ represent the stationary probabilities. Now define a Markov chain {Xn.. m Suppose that m is large . it ExAMPLE 4. and suppose we ideally want to simulate the values of a sequence of independent random variables whose probabilities are P. = q/. This is so since between any two transitions from ito i + 1 there must be one from i + 1 to i (and conversely) since the only way to re-enter i from a higher state is by way of state i + 1.. m} is to find a Markov chain that is both easy to simulate and whose limiting probabilities are the PI' The Metropolis algorithm provides an approach for accomplishing this task Let Q be any irreducible transition probability matrix on the integers 1. which satisfy (4. = 'Fr. We can argue. j 1.7(a) The Metropolis Algorithm. and let A = and that A is difficult to compute. then summing over i yields 2: Since the stationary probabilities follows that X. j 2: X t 1. . II Consider a graph having a positive number w... . the transition probabilities of {X n . j = 1. j).) = p} min(l. for each i it is easy to generate the value of a random variable that is equal to j with probability q. We will now show that the limiting probabilities of this Markov chain are precisely the PI" To prove that the p} are the limiting probabilities. . q. = pJpl and so we must verify that P.}. {X n } will also be irreducible.TIME-REVERSIBLE MARKOV CHAINS 205 ~ That is. and suppose that a particle moves from vertex to vertex in the following manner· If the particle is presently at vertex i then it will next move to vertex j with probability where w.Ip). . n.} is 0 if (i. min(l.} associated with each edge (i. m by showing that To verify the preceding we must show that Now. .j) is not an edge of the graph. That these stationary probabilities are also limiting probabilities follows from the fact that since Q is an irreducible transition probability matrix.} = q}. j = 1. pJp. . we will first show that the chain is time reversible with stationary probabilities PI' j = 1. it is also aperiodic By choosing a transition probability matrix Q that is easy to simulate-that is.. n O} are ifj *i if j = i.. n-we can use the preceding to generate a Markov chain whose limiting probabilities are a/A. . p. and a/a. .. The Markov chain describing the sequence of vertices visited by the particle is called a random walk on an edge weighted graph. However this is immediate since both sides of the equation are eq ual to mine P" p}). This can also be accomplished without computing A.. and as (except in the trivial case where Pi == lin) P > 0 for some i. since wII WI/ implying that which.=~=-I I Proof The time reversibility equations reduce to or. proves the result. Consider a star graph consisting of r rays. = 1. Starting at vertex O. then it moves towards the leaf of that ray with probability p and towardsOwith probability 1 .) Let leaf i denote the leaf on ray i. Wheneverit is at a leaf.206 MARKOV CHAINS PROPOSITION 4. (See Example 1. ExAMPLE 4. Consider a random walk on an edge weighted graph with a finite number of vertices If this Markov chain is irreducible then it is. in steady state. 
with each ray consisting of n vertices. equivalently. since L 17. we are interested in finding the expected number of transitions that it takes to visit an the vertices and then return to O.:1.9(C) for the definition of a star graph. Assume that a particle moves along the vertices of the graph in the following manner Whenever it is at the central vertex O.p.7.7(0) . time reversible with stationary probabilities given by 17. it is then equally likely to move to any ofits neighbors Whenever it is on an internal (nonJeaf) vertex of a ray. it moves to its neighbor vertex with probability 1. Since the total of the sum of the weights on the edges out of each of the vertices is and the sum of the weights on the edges out of vertex 0 is r. To see this. and let XI be the number of transitions in the jth cycle.7.. the probability that a particle at a vertex i steps from 0 moves towards its leaf is w'/(w' + WI-I) p.1. /Loo.1.TIME-REVERSIBLE MARKOV CHAINS 207 Figure 4. the expected number of steps between returns to vertex 0. attach a weight equal to 1 with each edge connected to 0. say that a new cycle begins whenever the particle returns to vertex 0.p). with these edge weights.1) Then. we see from Proposition 4 7 1 that Therefore.p) (see Figure 47. A star graph with weights: w := p/(l . where w = pl(l . fix i and let N denote the number of cycles that it . j . To begin.::: 1 Also. is Now. let us determine the expected number of transitions between returns to the central vertex O.7. note the Markov chain of successive vertices visited is of the type considered in Proposition 4. and a weight equal to w' on an edge connecting the ith and (i + 1)st vertex (from 0) of a ray. To evaluate this quantity. note that each cycle will independently reach leaf i with probability r[11_-(~::)"] where 1/r is the probability that the transition from 0 is onto ray i. Therefore N. and. i to be visited (and the probability of this event is clearly 1N). and if it is the last of these leafs to be visited. . increase by n .1/w). . . let T denote the number of transitions that it takes to visit all the vertices of the graph and then return to vertex 0 To determine E[T] we will use the representation T = T.l/w) = 2r[2 . and 1 ~ (11:. the expected number of cycles needed to reach leaf i. the number of cycles needed to reach leaf i.w n .)n is the (gambler's ruin) probability that a particle on the first vertex of ray i will reach the leaf of that ray (that is.(1/w)"]1(1 .(l/w)n] 2 . T.=1 I = 2r(1 . + T2 +. until both the leafs 1 and 2 have been visited and the process returned to vertex 0.1/w Now. upon conditioning on whether leaf i is the last of leafs I.1) before returning to O. and so E [f X] . then T. .wn)[1 .w)(1 . 2-w-1/w p i i .w .-t until all of the leafs 1. will equal 0. will have the same distribution as the time until a specified leaf is first visited and the process then returned to 0 Hence.] /=1 = ILooE[N] ::: --'-1-w-<" E[N].(l/w)n] (1 . we obtain from the preceding that E[T] = 2r{2 - n w . . 2: X. As N is clearly a stopping time for the X" we obtain from Wald's equation that E [f X. . is the additional time from T..l/wn] Iii. i to be visited.MARKOV CHAINS takes for the particle to visit leaf i and then return to 0 With these definitions. then T. in general.. Tz is the additional time from T. is a geometric random variable with mean r[1 .. i have been visited and the process returned to 0 Note that if leaf i is not the last of the leafs 1. To determine E[N]. is the time to visit the leaf 1 and then return to 0. 
is equal to the number of steps it takes to visit 1= 1 N leaf i and then return to O. + T" where T. j.P. I I I' . if (474) for all states i. starting in state i. and only if. i .i' XkPk.7.k' X. In fact we can show the following. p I' + 1 . which is equivalent to the statement that.1 2 " IPII = P"PI. for all i That is.2) = X.2 A stationary Markov chain is time reversible if. To prove sufficiency fix states i and j and rewnte (474) as PI I I P. = X. • i k yields k k+1 P 'I P11 -_ PI.7.7..2) for an arbitrary Markov chain. P" implying (if P"P.TIME-REVERSIBLE MARKOV CHAINS 209 If we try to solve Equations (4.. P. k.k • Thus we see that a necessary condition for time reversibility is that (4.3) for all i. starting in state i.7..P. THEOREM 4. the path i --+ k -+ j --+ i has the same probability as the reversed path i --+ j --+ k --+ i. Proof The proof of necessity is as indicated. . and so we must have implying (4.3). P. it will usually turn out that no solution exists.k > 0) that which need not in general equal Pkl / P. For example.7. To understand the necessity of this. Summing the above over all states i l • i z.t . any path back to i has the same probability as the reversed path.. note that time reversibility implies that the rate at which a sequence of transitions from ito k to j to i occur must equal the rate of ones from i to j to k to i (why?). from (4. 1 it is easy to show that this chain is time reversible. suppose n == 3. 2. • 4. 1. For instance. 3). the Markov chain is time reversible.· ')-cpnpn-I .2. the element is then put back. For any given probability vector == (PI' ...J /1 k..2.3) ---+ (2. let us suppose that the element requested is moved one closer to the front of the list. where C is chosen so that 2: (II 1[(i1' .'n I I ••• I 12ft P satisfy Equation (4. 4.5 and element 2 is requested. Suppose we are given a set of n elements-numbered 1 through n-that are to be arranged in some ordered list./ft) . By using Theorem 4 7. Since a similar result holds in general.1) ---+ (3. The products of the transition probabilities in the forward direction and in the reverse direction are both equal to P~ P~ P... 3..J 1/ k~ I '" pH I L. In fact.. In fact.. Pn).-~. (1.3. ·' in) == 1.7. . if the present list ordering is 1. but not necessarily in the same position. ... time reversibility and the limiting probabilities can also be verified by noting that for any permutation (iI.2) ---+ (1. in).3) ---+ (2. .2. 2. . 3. for instance.4.2) ---+ (1.2.7(D) A List Problem. After being requested. the above can be modeled as a Markov chain with n! states with the state at any time being the list order at that time. then the new ordering becomes 1.I_ _ MARKOV CHAINS P /1 n =P 1/ n Letting n ~ co now yields which establishes the result.2.. 3) to itself: ExAMPLE r.3.1) ---+ (3. 1.1). ( 'I.210 Hence n n '" pH I L..:. At each unit of time a request is made to retrieve one of these elements-element i being requested (independently of the past) with probability P. . Consider the following path from state (I.5. the probabilities given by 1[ . . are the stationary probabilities and P.'S are the stationary probabilities of the forward chain (and also of the reversed chain./ = 11'. Proof Summing the above equality over all i yields L 11'. and a transition probability matrix P* = [P/~] such that (4 7. That is.5) to obtain both the stationary probabilities and the P ~ . I I Hence.7. P./ *P /1---' 11'/ it follows that the P ~ are the transition probabilities of the reversed chain. 
the 11'.TIME REVERSIBLE MARKOV CHAINS 211 Hence.7. it follows that the reverse process will always decrease by one until it hits state 1 at which time it jumps to a . we have a second argument that the chain is reversible. To illustrate this we start with the following theorem. EXAMPLE 4.j are the transition probabilities of the reversed chain.3 Consider an irreducible Markov chain with transition probabilities P./ If one can find nonnegative numbers 11'" i 2: 0. Since the state of this Markov chain always increases by one until it hits a value chosen by the interarrival distribution and then drops to 1. why?) Since 11'. which deals with the age of a discrete time renewal process.7(E) Let us reconsider Example 4. The concept of the reversed chain is useful even when the process is not urne reversible.5) then the 11'" i 2: 0. P.7 2 is that we can sometimes guess at the nature of the reversed chain and then use the set of equations (4.L PI. summing to unity. let Xn denote the age at time n of a renewal process whose interarrival times are all integers. and the stationary probabilities are as given above. The importance of Theorem 4.3( C). & THEOREM 4. 212 MARKOV CHAINS state chosen by the interarrival distnbution. Hence it seems that the reversed process is just the excess or residual bye process. Thus letting P, denote the probability that an interarrival is i, i ~ I, it seems likely that P~ = Pi' P':,-l=I, i> 1 Since for the reversed chain to be as given above we would need from (4.7.5) that -",--1T 1 J 1T,P, _ j P , LP =, or 1T, = 1T1 P{X > i}, where X is an interarrival time. Since ~i 1T, = 1, the above would necessitate that 1= 1T1 L P{X ~ i} i= 1 = 1TIE[X], and so for the reversed chain to be as conjectured we would need that (4.7.6) P{X~ i} 1T, = E[X] To complete the proof that the reversed process is the excess and the limiting probabilities are given by (4.7 6), we need verify that 1T, p,., + 1 = 1T, +1 P ,*+ 1 ;, or, equivalently, which is immediate. SEMI MARKOV PROCESSES 213 Thus by looking at the reversed chain we are able to show that it is the excess renewal process and obtain at the same time the limiting distribution (of both excess and age). In fact, this example yields additional insight as to why the renewal excess and age have the same limiting distnbution. The technique of using the reversed chain to obtain limiting probabilities will be further exploited in Chapter 5, where we deal with Markov chains in continuous time. 4.8 SEMI-MARKOV PROCESSES I A semi-Markov process is one that changes states in accordance with a Markov chain but takes a random amount of time between changes. More specifically consider a stochastic process with states 0, 1, .. , which is such that, whenever it enters state i, i ;;:: 0: (i) The next state it will enter is state j with probability p,/, i, j ;;:: O. (ii) Given that the next state to be entered is state j, the time until the transition from i to j occurs has distribution F,/. ~ O} is called a semiMarkov process. Thus a semi-Markov process does not possess the Markovian property that given the present state the future is independent of the past For in predicting the future not only would we want to know the present state, but also the length of time that has been spent in that state. Of course, at the moment of transition, all we would need to know is the new state (and nothing about the past). 
A Markov chain is a semi-Markov process in which If we let Z(t) denote the state at time t, then {Z(t), t F,/I) = {~ t< 1 t~ 1 That is, all transition times of a Markov chain are identically L Let H, denote the distribution of time that the semi-Markov process spends in state i before making a transition That is, by conditioning on the next state, we see H,(t) = L PIJF,/ (t), / and let /1-, denote its mean. That is, /1-, = f; x dH,(x) 214 MARKOV CHAINS If we let Xn denote the nth state visited, then {Xn' n ~ O} is a Markov chain with transition probabilities P,io It is called the embedded Markov chain of the semi-Markov process We say that the semi-Markov process is irreduc_ ible if the embedded Markov chain is irreducible as well. Let T" denote the time between successive transitions into state i and let Jl.;1 = E[T;J By using the theory of alternating renewal processes, it is a simple matter to derive an expression for the limiting probabilities of a semiMarkov process. PROPOSITION 4.8.1 If the semi-Markov process is irreducible and if Tii has a noni~ttice distnbution with finite mean, then P, == lim P{Z(t) 1--+'" = i IZ(O) = j} exists and is independent of the initial state Furthermore, Proof Say that a cycle begins whenever the process enters state i, and say that the process is "on" when in state i and "off" when not in i Thus we have a (delayed when Z(O) ~ i) alternating renewal process whose on time has distribution H, and whose cycle time is T", Hence, the result follows from Proposition 3 4 4 of Chapter 3 As a corollary we note that Pi is also equal to the long-run proportion of time that the process is in state i. Corollary 4.8.2 If the semi-Markov process is irreducible and Il-ll < 00, then, with probability 1, & Il-ll = lim amount of time in i dunng [0, d. 1--+'" t That is, 1l-,11l-11 equals the long-run proportion of time in state i. Proof Follows from Proposition 372 of Chapter 3 SEMI MARKOV PROCESSES 215 While Proposition 4 8.1 gives us an expression for the limiting probabilities, it is not, however, the way one actually computes the Pi To do so suppose that the embedded Markov chain {X n , n ::> O} is irreducible and positive recurrent, and let its stationary probabilities be 1TJ , j ~ O. That is, the 1Tj , j 2! 0, is the unique solution of, and 1Tj has the interpretation of being the proportion of the X/s that equals j (If the Markov chain is aperiodic, then 1Tj is also equal to limn_ex> P{Xn = j}.) Now as 1TJ equals the proportion of transitions that are into state j, and /Lj is the mean time spent in state j per transition, it seems intuitive that the limiting probabilities should be proportional to 1TJ /Lr We now prove this. 
THEOREM 4.8.3 Suppose the conditions of Proposition 481 and suppose further that the embedded Markov chain {Xn' n 2:: O} is positive recurrent Then Proof Define the notation as follows Y, U) = amount of time spent in state i dunng the jth visit to that state, i,j::::: 0 N, (m) = number of visits to state i in the first m transitions of the semi-Markov process In terms of the above notation we see that the proportion of time in i dunng the first m transitions, call it P,-m, is as follows N,(m) L (481) P, = m = J= 1 Y,(j) ---'--N-c,(--'m >,--Y,(j) L J=I L , m - - L.J-/=1 N,(m) N:;> Y,(j) /=1 N,(m) L N,(m)N£) , m Y,U) N,(m) 216 Now since N,(m) _ 00 MARKOV CHAINS as m _ 00, it follows from the strong law of large numbers that N,(m) Y,U) 2: , m ) ,_I -N( -ILl' and, by the strong law for renewal processes, that N, - _ (E[ num ber 0 f transItions between VISitS to I'])-1 . . . . - (m) m Hence, letting m _ 00 = 1f, in (48 1) shows that and the proof is complete. From Theorem 4.8.3 it follows that the limiting probabilities depend only on the transition probabilities P,) and the mean times ILl> i, j ~ O. Consider a machine that can be in one of three states: good condition, fair condition, or broken down. Suppose that a machine in good condition will remain this way for a mean time ILl and will then go to either the fair condition or the broken condition with respective probabilities t and t. A machine in the fair condition will remain that way for a mean time ILz and will then break down. A broken machine will be repaired, which takes a mean time IL3, and when repaired will be in the good condition with probability i and the fair condition with probability 1. What proportion of time is the machine in each state? ExAMPLE 4.8(A) Solution. Letting the states be 1, 2, 3, we have that the TTl TT, satisfy + TT2 + TT3 = 1, TTl TT z = iTT3 , = tTTI + iTT3 , + TT 2• TT3 = tTTI The solution is SEMI-MARKOV PROCESSES 217 Hence, P" the proportion of time the machine is in state i, is given by The problem of determining the limiting distribution of a semiMarkov process is not completely solved by deriving the P,. For we may ask for the limit, as t ~ 00, of being in state i at time t of making the next transition after time t + x, and of this next transition being into state j To express this probability let yet) = time from t until the next transition, Set) = state entered at the first transition after t. To compute lim P{Z(t) ,~'" = i, yet) > x, Set) = n, we again use the theory of alternating renewal processes THEOREM 4.8.4 If the semi-Markov process is irreducible and not lattice, then (482) lim ,_00 P{Z(t) = i, Y(t) > x, S(t) = j IZ(O) = k} = Proof Say that "on" if the state state is j Say it Conditioning on p,J~ F,,(y)dy a cycle begins each time the process enters state i and say that it is is i and it will remain i for at least the next x time units and the next is "off" otherwise Thus we have an alternating renewal process whether the state after i is j or not, we see that £["on" time in a cycle] = P"£[(X,, - x)'], 218 MARKOV CHAINS where X,) is a random vanable having distribution F,) and representing the time to make a transition from i to j, and y+ = max(O, y) Hence E["on" time in cycle] = P" f~ P{X,) - X > a} da = P,) f~ F,)(a + x) da = P,) r F" ( y) dy As E[cycle time] = iJ-,,, the result follows from alternating renewal processes By the same technique (or by summing (4.8.2) over j) we can prove the following. 
Corollary 4.8.5 If the semi-Markov process is irreducible and not lattice, then (483) !~~ P{Z(t) = i, Y(t) > xIZ(O) = k} = r H, (y) dy1iJ-/1 Remarks (1) Of course the limiting probabilities in Theorem 4.8.4 and Corollary 4.8.5 also have interpretations as long-run proportions For instance, the long-run proportion of time that the semi-Markov process is in state i and will spend the next x time units without a transition and will then go to state j is given by Theorem 4.8.4. (2) Multiplying and dividing (4.8.3) by JLiJ and using P, = JL,IJLii' gives 1-'" lim P{Z(t) = i, yet) > x} = P, H,.e(X), where H, e is the equilibrium distribution of H, Hence the limiting probability of being in state i is P" and, given that the state at t is i, the time until transition (as t approaches 00) has the equilibrium distribution of H" PROBLEMS 219 PROBLEMS 4.1. A store that stocks a certain commodity uses the following (s, S) ordering policy; if its supply at the beginning of a time penod is x, then it orders o S-x if x ~ s, if x < s. The order is immediately filled. The daily demands are independent and equal j with probability (Xi' All demands that cannot be immediately met are lost. Let Xn denote the inventory level at the end of the nth time penod. Argue that {X n , n ~ I} is a Markov chain and compute its transition probabilities. - - 4.2. For a Markov chain prove that whenever n. < n2 < ... < nk < n. 4.3. Prove that if the number of states is n, and if state j is accessible from state i, then it is accessible in n or fewer steps 4.4. Show that Pn ii 4.5. For states i, j, k, k "" j, let P~'/k k=O L.J ~ n fk pn-k ii il . = P{Xn = j, Xl"" n k, e = 1,. ., n - llXo = I}. (a) Explain in words what P~/k represents (b) Prove that, for i "" j, p~. = L P~iP~7/. k=O 4.6. Show that the symmetric random walk is recurrent in two dimensions and transient in three dimensions. 4.7. For the symmetnc random walk starting at 0: (a) What is the expected time to return to O? 220 MARKOV CHAINS (b) Let N n denote the number of returns by time n. Show that E[N,.] ~ (2n + 1) c:) (~r - 1. (c) Use (b) and Stirling's approximation to show that for n large E[Nn ] is proportional to Vn. 4.8. Let XI' X h al'i;;:;: O. Say that a record occurs at time n if Xn be independent random variables such that P{X, = j} == > max(X I , ..• ,Xn_I ), where Xo = -00, and if a record does occur at time n call Xn the record value. Let R, denote the ith record value. (0) Argue that {R" i ;;:;: I} is a Markov chain and compute its transition probabilities. (b) Let T, denote the time between the ith and (i + l)st record. Is {T" i ;;:;: 1} a Markov chain? What about {(R" T,), i ;;:;: I}? Compute transition probabilities where appropriate. ... (c) Let Sn = 2: T" , I n n > 1. Argue that {Sn' n ;;:;: I} is a Markov chain and find its transition probabilities. 4.9. For a Markov chain {Xn , n > O}, show that 4.10. At the beginning of every time period, each of N individuals is in one of three possible conditions: infectious, infected but not infectious, or noninfected. If a noninfected individual becomes infected during a time period then he or she will be in an infectious condition during the following time period, and from then on will be in an infected (but not infectious) condition. During every time period each of the (:) pairs of individuals are independently in contact with probability p. 
If a pair is in contact and one of the members of the pair is infectious and the other is noninfected then the noninfected person becomes infected (and is thus in the infectious condition at the beginning of the next period). Let Xn and Y n denote the number of infectious and the number of noninfected individuals, respectively, at the beginning of time period n. (0) If there are i infectious individuals at the beginning of a time period, what is the probability that a specified noninfected individuid will become infected in that penod? (b) Is {X n , n > O} a Markov chain? If so, give its transition probabilities. PROBLEMS 221 (c) Is {Yn , n 2: O} a Markov chain? If so, give its transition probabilities. (d) Is {(Xn' Y n), n 2: O} a Markov chain? If so, give its transition probabil- ities. 4.11. If /" < 1 and ~J < 1, show that: (a) n=1 '" L P~ < 00; (b) !.) = LP~ n=l., '" 1+ n=1 L P~ 4.U. A transition probability matnx P is said to be doubly stochastic if for allj. That is, the column sums all equal 1. If a doubly stochastic chain has n states and is ergodic, calculate its limiting probabilities. 4.13. Show that positive.and null recurrence are class properties. 4.14. Show that in a finite Markov chain there are no null recurrent states and not all states can be transient. 4.15. In the MIGIl system (Example 4.3(A» suppose that p < 1 and thus the stationary probabilities exist. Compute l1'(S) and find, by taking the limit as s -+ 1, ~; il1t • 4.16. An individual possesses r umbrellas, which she employs in going from her home to office and vice versa If she is at home (the office) at the beginning (end) of a day and it is raining, then she will take an umbrella with her to the office (home), provided there is one to be taken. If it is not raining, then she never takes an umbrella Assume that, independent of the past, it rains at the begInning (end) of a day with probability p. (a) Define a Markov chain with r + 1 states that will help us determine the proportion of time that our individual gets wet. (Note: She gets wet if it is raining and all umbrellas are at her other location.) (b) Compute the limiting probabilities. (c) What fraction of time does the individual become wet? 4.17. Consider a positive recurrent irreducible penodic Markov chain and let l1J denote the long-run proportion of time in state j, j 2: O. Prove that "), j 2: 0, satisfy "1 ~I "1 P,), ~, l1J = 1. = 1 - P. .222 MARKOV CHAINS 4. Argue that C (d) Prove and interpret the following result: JEA' . 2..EA 2: 2: 7T.18. be the stationary probabilities for a specified Markov chain. (b) Finish the following statement: ~ .+I = P.EA transitions that . (a) Find the transition probabilities of the Markov chain {X n ... Let A denote a set of states and let A denote the remaining states. and such that P".21. (c) Let Nn{A. Consider a recurrent Markov chain starting in state O. However. (a) Complete the following statement: 7T.P'j is the proportion of all JEAr .P'j is the proportion of all transitions that . let Nn{A" A) denote the number that are from a state in AC to one in A. Let m. Jobs arrive at a processing center in accordance with a Poisson process with rate A. then one of them is processed that day. n ~ O}. 4. Let 7Tj . denote the expected number of time periods it spends in state i before returning to O. the center has waiting space for only N jobs and so an arriving job finding N others waiting goes away. 4. (c) Write the equations for the stationary probabilities.. similarly. 
Let Xn denote the number of jobs at the center at the beginning of day n...~ 7T.P'J 4. Th us. At most 1 job per day can be processed.20.. A C denote the number of the first n transitions that are ) from a state in A to one in AC. Now give a second proof that assumes the chain is positive recurrent and relates the m I to the stationary probabilities. if there are any jobs waiting for processing at the beginning of a day. and if no jobs are waiting at the beginning of a day then no jobs are processed that day..19. (b) Is this chain ergodic? Explain.. 1.j 2:: 0..-I' . and the processing of this job must start at the beginning of th e day. Consider a Markov chain with states 0. Use Wald's equation to show that mo = 1. . Let T = {I.'S for this chain to be positive recurrent.l-. 4. until the gambler reaches either 0 or N.. the matrix of transition probabilities from states in T to states in T. (b) Show that Mn ./(n) denote the expected amount of time spent in state j during the first n transitions given that the chain begins in state i. column j. as in Section 44. can be solved recursively. n and the m? . and show how they a set of equations for the m" i =:: 0. first for i 0. Let J.... 0< i < n. Consider the Markov chain with states 0. 4. is m.-I.1 + Qn"1 = Q[I + Q + Q2 + .n denote the mean time to go from state i to state n.25.l-.Qf'(1 . 4. denote the mean time to go from state i to state i + 1 Derive .. Let m. and so on. Find the necessary and sufficient condition on the P./(n) (a) Show that Mn = 1 + Q + Q2 + . (c) Show that Mn = (I . + Qn]. . . 4. 1. starting in i.26. Let M n be the matrix whose element in row i. (d) the probability of ever visting state 1. n (b) Let m. in the gambler's ruin problem. Starting 4.l-. + Qn. then i = 1.24. and let Q be. determine (a) the expected number of visits to state 5 t:" (b) the expected number of visits to state 1.22.(q/p)'] { _ (i ifp "" ~ ifp = ~ + 1)/2.PROBLEMS 223 where Po = 1. (c) What is the relation between J. for i and j in T. Compute the expected number of plays.. abilities PIt f • n and transition prob- 1 =P = 1 P.23. (a) Derive a set of linear equations for the J. In the gambler's ruin problem show that P{she wins the next garnblelpresent fortune is i. (c) the expected number of visits to state 5 in the first 7 transitions. she eventually reaches N} P[l .n 1. Consider the gambler's ruin problem with N in state 3.Qn f I) 6 and p = 7.. + 1]/[1 . and compute the limiting probabilities in this case.(q/p).• t} denote the transient states of a Markov chain. . and YI ...224 MARKOV CHAINS Starting at state 0. + Y) = M n .(YI + . Consider a particle that moves along a set of m + 1 nodes. (b) For any state i l ..27..27. . n) let 1T(iI. find the expected number of additional steps it takes to return to the initial position after all nodes have been visited. = O} = P2 . In Problem 4.. Suppose that two independent sequences XI' Xz.. It continues moving until all the nodes 1. . .' . -p 'I In - . = I} = 1 . 2. . labeled 0. Choose some positive integer M and stop at N.m.P{X. 1. . (a) Argue that the above is a Markov chain. Relate it to the mean time of a gambler's ruin problem) (e) Let N denote the first excursion that ends in state n. 2. i = 1. say that an excursion ends when the chain either returns to 0 or reaches state n Let X J denote the number of transitions in the jth excursion (that is. . (f) Find /-tOn' (g) Find/-t. j~1 (d) Find E[XJ ] (Hint. " -1 I-P .P{Y. = 1 These elements are at all times arranged in an ordered list that is revised as follows. 
= O} = PI' P{Y. 2 4.p. and find E[N]. ~~ P.. . in) denote the limiting probability Argue that P. . = I} = 1 . + X n .29. 4. Each day one of n possible elements is requested. Yz. 4.m... m have been visited at least once Starting at node 0. ..28. the first value of n such that either X I + . At each move it either goes one step in the clockwise direction with probability p or one step in the counterclockwise direction with probability 1 .. the ith one with probability P" i ~ 1. P{X. and all random variables are independent To decide whether PI > P2 or P2 > PI' we use the following test... the element selected is moved to the front of the list with the relative positions of all the other elements remaining unchanged Define the state at any time to be the list ordering at that time. .30. . in (which is a permutation of 1. are coming in from some laboratory and that they represent Bernoulli trials with unknown success probabilities PI and P2 • That is... the one that begins at the jth return to 0).n 4. . find the probability that node i is the last node visited. PROBLEMS 225 or x I + .6] to a Markov chain with transition matrix [ . except for knowing the location where it ends. A spider hunting a fly moves between locations land 2 according to a 0. + X n .PI)' (Hint. that the expected number of pairs observed is where PI (1 .. The spider catches 06 0.P2) A = Pil ..7 0.31.32. also.(YI n -M + . Relate this to the gambler'S ruin problem) 4. and in the latter that P2 > 'P I Show that when PI ~ Ph the probability of making an error (that is. unaware of the spider. Obtain the transition matnx for this chain (a) Find the probability that at time n the spider and fly are both at their initial locations (b) What is the average duration of the hunt? 4.3 0. can be described by a three-state Markov chain where one absorbing state represents hunt ended and the other two that the spider and fly are at different locations. 0.3] Markov chain with transition matrix [ starting in location 1. of asserting that P2 > PI) is 1 P{error} = 1 + AM' and.4 the fly and the hunt ends whenever they meet in the same location..4 0. starts in location 2 and moves according 0. Show that the progress of the hunt.. Consider a simple random walk on the integer points in which at each step a particle moves one step in the positive direction with probability .7 The fly. + Y) = ' In the former case we then assert that PI > P2 . Consider a branching process in which the number of offspring per individual has a Poisson distribution with mean A. A > 1 Let 110 denote the probability that. instead of starting with a single individual. by \/ 4.L { n I Var(XnlXo = 1) = if f. PN N-I = I-and that the particle starts at n (0 < n < N). for p > 1. Show that. Poo = I-and a reflecting barrier at N-that is. the extinction probability is given. (b) the probability that the population becomes extinct for the first time in the third generation Suppose that.35.L f. calculate: (8) the extinction probability.L =I' 1 if f. and find the mean number of steps /I' 4. the branching process follows the same probability law as the branching process in which the number of offspring per individual is Poisson with mean a . and remains in the same place with probability q = 1 .2p(0 < P < ~). Given that {X n > O} is a branching process: (8) Argue that either Xn converges to 0 or to infinity (b) Show that 211_If. the population eventually becomes extinct Also. one step in the negative direction with probability p. let a. Starting with a single individual. 
In a branching process the number of offspring per individual has a binomial distribution with parameters 2. a < 1.L 1 nu 2 = I. p.33. starting with a single individual. Suppose an absorbing barrier is placed at the origin-that is. the initial population sIze Zo is a random variable that is Poisson distributed with mean A.226 MARKOV CHAINS p. be such that (8) Show that a A11o- (b) Show that. in this case. where f. conditional on eventual extinction.L and 0'2 are the mean and variance of the number of offspring an individual has 4. Show that the probability of absorption is 1.34.L Uf. where n > m. = Xr-I. Suppose that Xo = i and define T = Min{n: n > 0 and Xn = Let }".) 4.3(D) and show that it is time reversible. For the Markov chain model of Section 46.6. n ~ O} be a Markov chain with stationary probabilities 1T1' j ~ O.~ = 1TI PI . (5.j = 0. 4.1.41. For any infinite sequence XI' X 2 . Find the transition probabilities for the Markov chain of Example 4. Argue that {In' n ~ I} is a Markov chain having a continuous state space with transition probability density given by 4.36. 5.. 2. . That is. (Hint· Use Stirling's approximation. At each step. suppose that the initial state is N ~ (:). 1 Pil=-'-I' 1- j=l. namely. . . i>l. • we say that a new long run begins each time the sequence changes direction. T} is distributed as the states of the reverse Markov chain (with transition probabilities P.40.1 and 1). Argue that {}. 4.. takes on the value j. 1) random variables and let In denote the initial value of the nth long run.4) Let XI' X 2 . . and n .9). 4. T. 5. m.. 3. (4.) starting in state 0 until it returns to 0 n. and (3.i-l.38..37.j = 0. 2). Let {X n . Show that when n..m are large the number of steps to reach 1 from state N has approximately a Poisson distribution with mean m [ clog c where c = ~ 1 + logec - 1) l nlm. 9. A particle moves among n locations that are arranged in a circle (with the neighbors of location n being n . 1. then there are three long runs-namely. if the sequence starts 5. 6. Show that the limiting probabilities for this chain are 1TI = a l 12. it moves . • •• be independent uniform (0. a j • I 4.I1T..7(B) that if the Markov chain is in state i and the random variable distributed according to q.PROBLEMS 227 4.39. 4.) and equal to i otherwise. Suppose in Example 4. then the next state is set equal to j with probability a/(a j + a. .228 MARKOV CHAINS one position either in the clockwise position with probability p or in the counterclockwise position with probability 1 .. 1. by using time reversibility. j. . for all i. Show that this Markov chain is of the type considered in Proposition 4.. 1f. 4.. . . Show that the truncated chain is also time reversible and has limiting probabilities given by _ 1f. Consider the list model presented in Example 4..j = i P.44. (b) Is the chain time reversible? 4.n-l.42. Consider a time-reversible Markov chain with transition probabilities p') and limiting probabilities 1f" and now consider the same chain truncated to the states 0.. '=0 4. .43. I = Pnn .p.) i oF j is time reversible if.p (a) Find the transition probabilities of the reverse chain. n and with transition probabilities POI P.. otherwise. k > 0 for all . That is.M . that the limiting probability that element j precedes element i-call it P{ j precedes i}-is such that P{jprecedes i} > P )P )+ . M.7(D) Under the onecloser rule show. P when~ >P. ergodic Markov chain such that p.7 1 and find its stationary probabilities...I .I = 1 + I = p. 4.' 2. and only if. 1f.45. = 1 . 
Show that a finite state. i=l. for the truncated chain its transition probabilities p..) are o ~i~ M. = . 1.. Consider the Markov chain with states 0.= P ')' 05:i=l=j<M 0. if N = 3 and XI = 1.. (b) Show that 2: I PI) IlL It = 11ILw (c) Show that the proportion of time that the process is in state i and headed for state j is P.1 urns Consider the Markov chain .(N). N? (c) Suppose {X n } is null recurrent and let 1T. where n denotes the whose state at any time is the vector (nl' number of balls in urn i Guess at the limiting probabilities for this Markov chain and then verify your guess and show at the same time that the Markov chain is time reversible l 4.(N)E[time the X process spends in j between returns to i]. n ::> I} is in state j If 11J > 0 for all j. show that {Yn ./ x.) 71'J F' ( ) IL" '.: O} in each of the states 0. (d) Use (c) to argue that in a symmetric random walk the expected number of visits to state i before returning to the origin equals 1 (e) If {X n .) . Y3 = 2.) (I) dl (d) Show that the proportion of time that the state is i and will next be I. we define Yn to be the nth value of the Markov chain that is between o and N For instance. Now consider a new stochastic process {Yn .. n m ). 1.) 71'J IlL II where 71.47. in one of the other m . what proportion of time is {Yn . X5 2. For an ergodic semi-Markov process. j within a time x is P.: O} is also 4. where F~J is the equilibrium distribution of F. n ~ O} that only accepts values of the Markov chain that are between 0 and N That is.) = F.PROBLEMS 229 4.. Xl = 3. X 4 = 6. . Let {Xn' n ~ 1} denote an irreducible Markov chain having a countable state space. n . At each stage one of the balls is selected at random.48. . Y2 = 3. n . then Y1 :..:: 1. n . i = 0. 1. taken from whichever urn it is in.:.. j =1= i.:. n ..46. M balls are initially distributed among m urns.:.:. and placed. at random. N denote the long-run proportions for {Yn . n .: O} is time reversible. (a) Is {Yn . . (a) Compute the rate at which the process makes a transition from i into j.2 O} Show that 1T) (N) = 11.: O} a Markov chain? Explain briefly (b) Let 11) denote the proportion of time that {X n . X3 = 5. Springer-Verlag. Prentice-Hal.50. t 23 = 30 (t. Boston. Methuen. FL. Introduction to Probability Models. Markov Chain Models-Rarity and Exponentiality. Birkhauser.) = t). Springer-Verlag. A taxi alternates between three locations When it reaches location 1 it is equally likely to go next to either 2 or 3 When it reaches 2 it will next go to 1 with probability ~ and to 3 with probability i. 1980 3 E Cinlar. Reversibility and Stochastic Networks. Orlando. Berlin. Wiley.!. Elements of Applied Stochastic Processes. i = 1. Englewood Cliffs. Wiley. NJ. London. Chichester. 2. A First Course in Stochastic Processes. England. For an ergodic semi-Markov process derive an expression.!.) (a) What is the (limiting) probability that the taxi's most recent stop was at location i. The Theory of Stochastic Processes. The Theory of Branching Processes. 1984 2 C L Chiang. An Algorithmic Approach. 1979 9 J Kemeny. Academic Press. England. 11. REFERENCES References 1. 1979 8 F Kelly. Princeton. 3. 1975 4 0 R Cox and H 0 Miller. Krieger. 1992 11 S M Ross. 3? (b) What is the (limiting) probability that the taxi is heading for location 2? (c) What fraction of time is the taxi traveling from 2 to 3? Note. Denumerable Markov Chains. From 3 it always goes to 1 The mean times between locations i and j are tl2 "" 20. tl3 = 30. 1993 12 H C Tijms. 1965 5 T Harris. FL. 10. 9. 4. 
1963 6 S Karlin and H Taylor. Introduction to Stochastic Processes. and 12 give alternate treatments of Markov chains Reference 5 is a standard text in branching processes References 7 and 8 have nice treatments of time reversibility 1 N Bhat. Upon arrival at a location the taxi immediately departs. 5th ed . as t ~ 00 for the limiting conditional probability that the next state visited afte. 2. Academic Press. New York. 1994 . Wiley. NJ. Adventures in Stochastic Processes. Van Nostrand.49.230 MARKOV CHA INS 4. MA. Orlando. 6. Chichester. t is state j. given X(t) = i 4. L Snel. 2nd ed . An Introduction to Stochastic Processes and Their Applications. 1966 lOS Resnick. and A Knapp. 1975 7 J Keilson. Berlin. Stochastic Models. given in Chapter 4. to derive the Erlang loss formula. t 2: 0.7 we illustrate the importance of the reversed chain. and to analyze the shared processor system In Section 5. In Section 5 3 we introduce an important class of continuous-time Markov chains known as birth and death processes. t 2: O} taking on values in the set of nonnegative integers In analogy with the definition of a discretetime Markov chain. In Section 5.8 we show how to "uniformize" Markov chains-a technique useful for numerical computations 5. they are characterized by the Markovian property that. t 2: O} is a continuous-time Markov chain if for all s. the future is independent of the past. and nonnegative integers 231 . As in the case of their discrete-time analogs. In Section 5. we say that the process {X(t). even when the process is not time reversible.2 we define continuous-time Markov chains and relate them to the discrete-time Markov chains of Chapter 4. by using it to study queueing network models.6 we consider the topic of time reversibility. given the present state. Applications of time reversibility to stochastic population models are also presented in this section In Section 5. The material in Section 5 5 is concerned with determining the limiting (or long-run) probabilities connected with a continuous-time Markov chain.2 CONTINUous-TIME MARKOV CHAINS Consider a continuous-time stochastic process {X(t).1 INTRODUCTION In this chapter we consider the continuous-time analogs of discrete-time Markov chains. and then illustrate the importance of this observation to queueing systems. we show that all birth and death processes are time reversible.CHAPTER 5 Continuous-Time Markov Chains 5. Among other things. In Section 5 4 we derive two sets of differential equations-the forward and backward equation-that describe the probability laws for the system. These processes can be used to model popUlations whose size changes at any time by a single unit. t :> O. > t} for all s. before proceeding to the next . by the Markovian property. P{X(t + s) = jIX(s) = i}isindependentofs. V. then state i is called absorbing since once entered it is never left) Hence. note that as the process is in state i at time s. > s + tiT. . = 0. 0 ~ u < s} = P{X(t + s) j IX(s) i} In other words. a transition does not occur) during the next s time units What is the probability that the process will not leave state i during the following t time units? To answer this. where L p ". then P{7". ()O<: <: xu. the random variable 7". that the probability it remains in that state during the interval {s. denote the amount of time that the process stays in state i before making a transition into a different state. we shall assume throughout that 0 s V. (If V.. 
All Markov chains we consider will be assumed to have stationary transition probabilities.s. Namely. if we let T. it will next enter state j with some probability. then the continuous-time Markov chain is said to have stationary or homogeneous transition probabilities. call it P. given the present state at s and all past states depends only on the present state and is independent of the past If.232 " I. = 00 is called an instantaneous state since when entered it is instantaneously left. X(u) = x(u). > s} = P{T. ]. but is such that the amount of time it spends in each state. Whereas such states are theoretically possible. A state i for which V. and (ii) when the process leaves state i. In fact. < 00 for all i. and suppose that the process does not leave state i (that is. Suppose that a continuous-time Markov chain enters state i at some time. it follows. say. P. in addition. for our purposes a continuous-time Markov chain is a stochastic process that moves from state to state in accordance with a (discrete-time) Markov chain. Hence.u . a continuous-time Markov chain is a stochastic process having the Markovian property that the conditional distribution of the future state at time t + s. s + t] is just the (unconditional) probability that it stays in state i for at least t time units. it is a stochastic process having the properties that each time it enters state i: (i) the amount of time it spends in that state before making a transition into a different state is exponentially distributed with rate. That is. the above gives us a way of constructing a continuous-time Markov chain.! = 1. say time 0. is memoryless and must thus be exponentially distributed.!. P{X(t CONTINUOUS TIME MARKOV CHAINS + s) = jlX(s) = i. assume from now on that all Markov chains considered are regular (some sufficient conditions for regularity are given in the Problem section). the amount of time the process spends in state i. however. Let q'J be defined by all i =F j Since v. and when the state increases by 1 we say that a birth occurs. = q.j I > 1 is called a birth and death process Thus a birth and death process is a continuous-time Markov chain with states 0.I or state i + 1.3 BIRTH AND DEATH PROCESSES A continuous-time Markov chain with states 0.-I· . for which transitions from state i can only go to either state i .1. Let us denote by p'J(t) the probability that a Markov chain. " forwhichq'J = owhenever Ii . is exponentially distributed. is the rate at which the process leaves state i and p'J is the probability that it then goes to j. In addition. will be in state j after an additional time t That is. it follows that q'J is the rate when in state i that the process makes a transition into state j. We shall. and Ii. and the next state visited. A continuous-time Markov chain is said to be regular if. 1. the number of transitions in any finite length of time is finite An example of a nonregular Markov chain is the one having P. and when it decreases by 1 we say that a death occurs Let A. and in fact we call q'J the transition rate from i to j.. p'J(t) = P{X(t + s) = j IX(s) = i} 5. with probability 1. with positive probability. presently in state i.. It can be shown that this Markov chain-which always goes from state i to i + I. t > O.. must be independent random variables For if the next state visited were dependent on T" then information as to how long the process has already been in state i would be relevant to the prediction of the next state-and this would contradict the Markovian assumption. 
be given by Ii.BIRTH AND DEATH PROCESSES 233 state. The state of the process is usually thought of as representing the SlZe of some population. make an infinite number of transitions in any time interval of length t. spending an exponentially distributed amount of time with mean Ih' 2 in state i-will.+I = 1. he waits in line). When a server finishes serving a customer. we can think of a birth and death process by supposing that whenever there are i people in the system the time until the next birth is exponential with rate A. t 2: O} is a birth and death process with ExAMPLE ~n = {n~ s~ n >s. Each customer. the times between successive arrivals are independent exponential random variables having mean 11 A. If we let X(t) denote the number in the system at time t. n::>O. n 2: 0. (ii) A Linear Growth Model with Immigration. which is exponential with rate ~ I' S. + ~" Hence.3(A) Two Birth and Death Processes. (i) The MIMls Queue. and the next customer in line. there is an exponential rate of increase 8 of the population due to an external source such as immigration Hence. is called a linear growth process with immigration Such processes occur naturally in the study of biological reproduction and population growth. Each individual in the population is assumed to give birth at an exponential rate A. upon arrival. That is. An = nA + 8.234 CONTINUOUS TIME MARKOV CHAINS The values {A" i :> O} and {~I' i and the death rates. in addition./ V. A model in which ~n = n~. Deaths are assumed to occur at an exponential rate ~ for each member of the population. enters the service. and hence ~n = n~ . 2: I} are called respectively the birth rates v" we see that A. Since L/ q. and if not. The successive service times are assumed to be independent exponential random variables having mean 11~. n::> 1. the customer leaves the system. Suppose that customers arrive at an s-server service station in accordance with a Poisson process having rate A. goes directly into service if any of the servers are free. if there are any waiting. the total birth rate where there are n persons in the system is nA + 8. and is independent of the time until the next death. then {X(t). then the customer joins the queue (that is. If we suppose that no one ever dies. . if X(t) represents the population size at time t. P{T] + Tz<t}= tp{T.BIRTH AND DEATH PROCESSES 235 A birth and death process is said to be a pure birth process if JLn = 0 for all n (that is.AX dx = f: (1 .. if death is impossible) The simplest example of a pure birth process is the Poisson process. is the time it takes for the population size to go from i to i + L It easily follows from the definition of a Yule process that the Tn i 2: 1.e. 1 Thus. +. t 2: O} is a pure birth process with n>O This pure birth process is called a Yule process Consider a Yule process starting with a single individual at time 0. is exponential with rate iA Now P{T] ::s.. and. A second example of a pure birth process results from a population in which each member acts independently and gives birth at an exponential rate A. P]I(t) = (1 . we see from the above that. T. j"2::.x dx .e-). and let T" i 2: 1.l)st and ith birth That is. which has a constant birth rate An = A.(1 .e-A')J-I.')Z.A/ )3. then.t.e-).Ax (1 e. + Tz + T3 tiT] + Tz = x}dF. = x}Ae. starting with a single individual.AX ) dx 3A (1 = (1 . in general. +. we see that.e-At)1 e-).e-At)I. the population size at time t will have a geometric distribution with mean eM Hence if .e-At)J-1 . t} = 1 . + T z + T3 ~ t} = f'oP{T. 
denote the time between the (i . ~ ~ t} = (1 2: .e-u(t-'»Ae-).'(1 .eX»2Ae. n 2: O. we can show by induction that P{T1 + . + Tz<tlT. (x) = f: (1 . {X(t).+ ~ S t} = P{X(t) j + lIX(O) = I}. + Hence. S I 2 = (1 P{T. as P{T. are independent and T. for a Yule process. 3 1) is the joint density function of the order statistics of a sample of n random variables from the density f (see Section 23 of Chapter 2). . + T. and will thus have a negative binomial distribution.S.e. we have thus proven the following . . j2=i>l Another interesting result about the Yule process.1) f(S" .236 CONTINUOUS-TlME MARKOV CHAINS the population starts with i individuals... <S" < t _ P{T .S1' . = SI.. ._I.= I where f is the density function Ae-A(I-X) (532) f(x) = { 1 . + . it follows that its size at t will be the sum of i independent and identically distributed geometric random variables.. . But since (5.. Tn = Sn . starting with a single individual. That is.).5n given that X(t) = n + 1 Reasoning heuristically and treating densities as if they were probabilities yields that for 0 :s: SI ~ S2 <.s"ln + 1) = n~ IIf(s. Since the ith birth occurs at time 5.AI o otherwise. 5" given that X(t) = n + 1 is given by n (53. concerns the conditional distribution of the times of birth given the population size at t. for the Yule process.. == T. .• .. let us compute the conditional joint distribution of 51. 'Sn Hence we see that the conditional density of 51 . T2 = S2 . Tn+l >t- sn} - P{X(t)=n+1} P{X(t) = n + I} where C is some constant that does not depend on SI' . 3.e. . the birth times 51! ' 5 n are distributed as the ordered values from a sample of size n from a population having density (532) Proposition 53.)IX(t) = n + 1] = ao +t +n f Ae. call it A (t).1 Consider a Yule process with X(O) = 1 Then.33) A(t) = ao + f~ X(s) ds .xl (t .. can be computed by the same method.\ A (t) = ao +t+ L (t .1 .A(.. Let us compute the expected sum of the ages of the members of the population at time t The sum of the ages at time t. can be expressed as X(i).AI ( -A/) A 1.e l or E [A (t)IX(t)] = ao + t + (X(t) .\ where ao is the age at t = 0 of the initial individual To compute E [A (t)] condition on X(t). given that X(t) = n + 1.At A = ao + 1 A Other quantities related to A (t).).e Taking expectations and using the fact that X(t) has mean eM yields E [A (t)] = ao +t+ e AI - eAt .S.1) 1 .AI dx o .Ate.AI .3(8) Consider a Yule process with X(O) = 1. E[A(t)IX(t) =n + 1] = ao + t + E [~(t .BIRTH AND DEATH PROCESSES 237 & PROPOSITION 5.S. The above formula for E [A (t)] can be checked by making use of the following identity whose proof is left as an exercise (5. such as its generating function.1 can be used to establish results about the Yule process in the same way that the corresponding result for Poisson processes is used EXAMPLE 5.x) 1 . with probability ah + o(h).(m 1. are independent exponential random variables with respec. If we let X(t) denote the number of infected individuals in the population at time t.1 otherwise The above follows since when there are n infected individuals.i)ia. then each of the m ..238 Taking expectations gives CONTINUOUS-TIME MARKOV CHAINS E [A (t)] = ao + E [I~ X(s) dS] = ao + I~ E [X(s)] ds since X(s) ~ 0 = ao + = I' eJ.1 A The following example provides another illustration of a pure bi~th process S.) )2 .1.s ds 0 ao + eAr .1 "susceptibles " Once infected an individual remains in that state forever and we suppose that in any time interval h any given infected person will cause.i) "V -----::- 1 Var(T) = 2 m- a 1( L. . 
= (m . Consider a population of m individuals that at time 0 consists of one "infected" and m . we see that tive rates A. i = 1. t ~ O} is a pure birth process with EXAMPLE A n = {(m .n susceptibles will become infected at rate na. . . m . then T can be represented as m -I T= L I~ 1 T" where T.m . is the time to go from i infectives to i + 1 infectives.n)na 0 n = 1. E [T] and =- 1 m-I 1 a ~ i(m . As the T. the {X(t).3(c) A Simple Epidemic Model. If we let T denote the time until the total population is infected. any given susceptible to become infected.= 1 I I . t t ma 5. 1r~0 '" Lemma 5 4. before doing so we need the following lemmas Lemma 5.1 follows from the fact (which must be proven) that the probability of two or more transitions in time t is o(t). By exploiting the Markovian property.(s)._1 m I 1 (1 1) f (_1_+!) ma mm-I - 1 dt= 210g(m-l).4 THE KOLMOGOROV DIFFERENTIAL EQUATIONS Recall that PI' (t) P{X(t + s) j IX(s) i} I represents the probability that a process presently in state i will be in state j a time t later.4.21 m I I ".k(t)PIr. I. whereas Lemma 5.THE KOLMOGOROV DIFFERENTIAL EQUATIONS 239 For reasonably sized populations E [T] can be approximated as follows. .2 For all s._0 1 ") (n I' PII (I) Im--=q'l' 1-. E(T]=-L -.1 ') 1m (I I' 1 .0 t i=l=j Lemma 5.=. ma . which may sometimes be explicitly solved. we will derive two sets of differential equations for PI' (t). However.4. PII(1 + s) = 2: P.2.Pit (I) = v.4. which .+--. 3 (Kolmogorov's Backward Equations). Now.4.1. follows directly from the Markovian property The details of the proof are left as exercises. Pi/ (t) Proof To complete the proof we must justify the interchange of limit and summation on the right-hand side of (5 4 1). we thus obtain.2 we obtain P. Dividing by h and then taking the limit as h Lemma 54.} (I + h) ./ (t) = h I' '" P'i«h) P (t) P (t) 1m L.. = 2: Since the above holds for all N we see that k k<N *.4. for any fixed N.CONTINUOUS-TIME MARKOV CHAINS is the continuous-time version of the Chapman-Kolmogorov equations of discrete-time Markov chains. P. the following THEOREM 5. * or. equivalently.} (t + h) = 2.[1 i<#.Jt h-O 0 yields. v./ (t).i«h)Pi</(t).4.P./ (t) = 2. For all i.1) . j. upon application of + h) ~ p. I 1m p. (5. q. again using Lemma 54. 0. and t 2? P:} (t) = 2: q. 1/ • h-OkH Assuming that we can interchange the limit and summation on the nght-hand side of (5.k Pk} (t) k *. Pik(h) Pi</ (t) .4. P.k Pk/ (I) (542) . -+ PIl(h)]P.1. From Lemma 5.i h */ v.1). we started our calculation with P" (t + h) L. (h) . i . known as the Kolmogorov's forward equations..k' k<N k4'1 k<N *.k(h)} h-O HI h k .Plk(h)] L.k () + 1 . P. since P*.i P.(t) S I.. .THE KOLMOGOROV DIFFERENTIAL EQUATIONS 241 To reverse the inequality note that for N > i.' where the last equality follows from Lemma 5 4.!Xl and using the fact ~H' q.43 are known as the Kolmogorov backward equations They are called the backward equations because in computing the probability distribution of the state at time t + h we conditioned on the state (all the way) back at time h.k(t)P k k.0 HI h' h H.(t) + L P. k<N h k<N L q. SlimsuP[L P'k(h) Pk. That is.1 As the above inequality is true for all N> i.Imsup [". by now conditioning on the state at time t This yields P..L .(h) . P{X(t + h) = iIX(O) = i.k Pk.. . X(h) = k k}P{X(h) == kIX(O) = i} We may derive another set of equations..P.} (t + h) = L. 
N h k<N J _I' '" ..k = V" The above combined with (5 42) shows that which completes the proof of Theorem 5 43 The set of differential equations for PI} (t) given in Theorem 5. we obtain upon letting N .*(h)p t h.L q. (t) + V. we cannot always justify the interchange of limit and summation.(t CONTINUOUS TIME MARKOV CHAINS + h) P.(t) == Therefore.k(t)Pk. However.APoo(t) -(A + p. Prk(t) k*1 VI P" (t) Unfortunately.4 (Kolmogorov's Forward Equations).(h) H-.PJI(h)]P. Under suitable regularity conditions.1 that P:. 2: P.(t) == 2: P. they do hold in most models-including all birth and death processes and all finite-state models. Consider a two-state continuous-time Markov chain that spends an exponential time with rate A in state 0 before going to state 1. (t) = 2: qk. EXAMPLE 5.242 or P. and thus the above is not always valid. [1.(t) . where it spends an exponential time with rate p. We thus have THEOREM 5. before returning to state 0 The forward equations yield P~o(t) = p..4(A) The Two-State Chain..4.k(t)Pk/h) k P.... we obtain by Lemma 54...(t) Assuming that we can interchange limit with summation.Po.)Poo(t) + p." . and thus P (t) 00 = ~ + _A_e-(H/ll' A + /L A + /L Similarly (or by symmetry). j+l(t) .4(a) P:O(t) P:j(t) EXAMPLE = /L I Pt1 (t) = . P (t) II = _A_ + ~ e-(H/llt A + /L A + /L The Kolmogorov forward equations for the birth and death process are EXAMPLE 5.t (t) P.AjP'j(t). j>i. ArlP. r l (t) + /Lj+1 P. of course. or Thus. = Aj_IP'j_l(t). Hence.AoP.Poo(t). Since POD (0) = 1.(Aj + /Lj)p/j(t). we see that c = A/(A + /L).THE KOLMOGOROV DIFFERENTIAL EQUATIONS 243 where the last equation follows from POI (t) = 1 .j(t) = . is true as P II (t) is the probability that the time until a transition from state i is greater than t The other .A. the forward equations reduce to (543) P.4(c) For a pure birth process. j oF 0 5. Integrating the top equation of (543) and then using PII (0) yields =1 The above. PII(t).o(t). /O) d = 0. where A} = jA. eA/ Ar I PlJ . m). t ~ O} is itself a continuous-time Markov chain.p) q[(n. 5. m .3) as follows' From (5 4. the numbers of females and males in the population at time t. Consider a population of males and females and suppose that each female in the population independently gives birth at an exponential rate A. we can often compute other quantities of interest. yields In the special case of a Yule process. If X(O) = i. m)] = nfL q[(n. (n + 1.p.244 CONTINUOUS TIME MARKOV CHAINS quantities P. (b) E [Y(t)]. (n. Y(t)]. Individual females die at an exponential rate fL and males at an exponential rate v.1.1 (t) == e A / [P:/t) + A} P. In many models of interest the Kolmogorov differential equations cannot be explicitly solved for the transition probabilities However. respectively. for j > i. t ::> O} is a continuous-time Markov chain with infinitesimal rates' EXAMPLE q[(n.4. m). Letting Mx(t) = E [X(t)IX(O) = i] we will derive a differential equation satisfied by Mx(t). and Y(O) = j. m). by first deriving and then solving a differential equation Our next example illustrates the technique. (n.3) we have. Y(t)] Solution (a) To find E [X(t)].4(D) A Two Sex Population Growth Model.} (t). find (a) E{X(t)]. and that each birth is (independently of all else) a female with probability p and a male with probability 1 . can be obtained recursively from (5. To begin. and (c) Cov[X(t). Thus.1)] = mv. m)] = nAp. we can use the above to verify the result of Section 5 3. m). using P. (n . namely. j > i. note that {X(t). if X(t) and Y(t) denote. then {[X(t). m + 1)] = nA(1 . 
q[(n./t)] == dt [eA/p.} (t)] Integration. . such as mean values. upon integration log Mx(t) or.(p... Ap . 0 < U S t. X(t) + 1 with probability ApX(t)h + o(h) X(t + h) = X(t) .5) E[X(t)] . is informative about the number of females in the population at time t.Mx(t) = (A _ )M () +. does .)Mx(t)h + o(h) Mx(t + h) .. 0 s s < t. and taking expectations once again yields Mx(t + h) or. (Ap p.p.o(h) h p p. and.)Mx(t) or. given X(t). M/(t)/ Mx(t) :::. it follows that the conditional distribution of Y(t + s) given Y(u). xt h' Letting h ---+ 0 gives M/(t) (Ap p.)t + C Since M(O) = K = i.: Mx(t) (b) Since knowledge of Y(s). we obtain that ie(Ap-.)I (54.p. = Mx(t) + (Ap .)X(t)h + o(h).p.1 with probability p. + Ap)X(t)h + o(h).THE KOLMOGOROV DIFFERENTIAL EQUATIONS 245 note that. taking expectations yields E [X(t + h)IX(t)] X(t) + (Ap . (544) Hence..X(t)h + o(h) { X(t) with probability 1 . yet)] yet) + [A(1 p)X(t) . Hence.p)Mx(t) .vMy(t) == iA(1 . and taking expectations once again.CONTINUOUS TIME MARKOV CHAINS not depend solely on Yet). yet + h) = yet) + 1 with probability A(1 p)X(t)h + o(h) yet) . and so we cannot derive a differential equation j] in the same manner as we did for for My(t) = E[Y(t)IY(O) Mx(t) However.p)X(t) . yields My(t + h) = My(t) + [A(1 - p)Mx(t) .vY(t)]h + o(h).p)e(AP-fL)1 . we can compute an expression for My(t + h) by conditioning on both X(t) and Yet). {Yet). Indeed. dividing by h. t 2: O} is not a continuous~ time Markov chain.vY(t)]h + o(h).[A(1 .1 with probability vY(t)h { yet) + o(h) with probability 1 .vMy(t)]h + o(h). Upon subtracting My (t).J1- p) +C . and letting h --+ 0. we obtain My'(t) == A(1 . j My (0) = iA(1 - Ap + v . Adding vMy(t) to both sides and then multiplying by e VI gives or Integrating gives Evaluating at t == 0 gives. Taking expectations gives E[Y(t + h)IX(t).vMy(t). conditional on X(t) and Yet). To begin.p) ) eAp V/ + II - P.THE KOLMOGOROV DIFFERENTIAL EQUATIONS 247 and so. or. note that from (5. and taking expectations of the preceding yields that Dividing through by h and taking the limit as h using Equation (5 4.P.4. Now. or As M2 (0) = .2. that M ~ (t) ---+ 0 shows. Hence.)hX2(t) + (Ap + p.6) My(t) = iA(l . we obtain the result (54. (c) Before finding Cov[X(t).p. upon = 2(Ap . letting M 2 (t) = E [X 2(t)].7) .5).)e(AP-I')I. or. let us first compute E [X2(t)].1fp. equivalently.)hX(t) + o(h). + Ap)X(t)h] + o(h) = X2(t) + 2(Ap .(p. yet)]. (j - iA(l . (54.4) we have that: E[X2(t + h)IX(t)] = [X(t) + 1]2ApX(t)h + [X(t) .p) e(AP-I')1 Ap + II - + P.)M2(t) + i(Ap + p.X(t)h + X2(t)[1 . 248 CONTINUOUS-TIME MARKOV CHAINS Now.' Ii.+ Ap) e.+ AP)] Ii.1) X(t)Y(t) Hence. :t {e-(AP-/L.p)hE[X2(t)] + o(h) implying that.}I M2 (t) = A(l .i(1i.v] + X2(t)A(1 .the above == (X(t) .}1 Integration now yields .p) [i2 ..Ii. A(l .p)h + o(h) Taking expectations gives Mxy{t + h) = Mxy{t) + heAp .p) i(1i. E[X(t + h)Y(t + h)IX(t).. or...}I M xy (tH = A(l . Yet)] = X(t)Y(t) + hX(t)Y(t)[Ap . 1 ..P )e-(AP -/L.l)Y(t) X(t)(Y(t) .p)X(t)h + o(h) with prob li-X(t)h + o(h) with prob vY(t)h + o(h) with prob.Ii.Ap + A(l . ApX(t)h + o(h) with prob.. Since the probability of two or more transitions in a time h is o(h).v)Mxy(t) + A(1. let Mxy(t) = E[X(t)Y(t)].Ap e(AP-/L+. we have that· X(t + h)Y(t + h) (X(t) + l)Y(t) X(t)(Y(t) + 1) with prob. p) Ap -p. + v + CelAp -p.. p)(p..)1 Hence.. (546). Cov[X(t). + Ap) eIAP-I')1 p..p) 1')1 i(p. Mxy(t) = £A(1 p)(p..VII....Ap +v It also follows from Equations (545) and (5. and (5.. ( 548) ..Ap 1')1 5. + Ap) e(Ap v(p. i(p. 2(Ap-p..... + Ap) v(p. [.4. 
.+v p.2 _.1 Computing the Transition Probabilities If we define "} by iii =1= j if i j [hen the Kolmogorov backward equations can be written as .. + Ap) e2(AP "1')1 Ap Ap-p.p..THE KOLMOGOROV DIFFERENTIAL EQUATIONS 249 or. . e(Ap 1'-11)1 + iA(1 ..(p. + Ap e AP)] p.Ap _ i(p.p)(p. Using that Cov[X(O). Ap) + A(1-p) Ap -p.. .4 7) that.. .p) Ap + v . from Equations (545). + Ap) e2(AP p. Y(t)] = [C _ ij + .A(1 = ij _ FA(1 .(p. . Y(O)] C = 0.. Var[X(t)] = . + Ap) e(AP-I')1 v(p.48) we obtain after some algebraic simplifications that.Ap) A(1 .+Ap) p..Ap) + A(1 ...2A(1 P)] Ap + v-p. gives . 0 '" where RO is the identity matrix I..250 CONTINUOUS TIME MARKOV CH. and pr(/) by le~ting the element in row i. Secondly.9) to compute P(t) turns out to be very inefficient for two reasons First. by choosing n large enough. however. since the matrix R contains both positive and negative values (recall that the off-diagonal elements are the q. ". By letting n be a power of 2.v.. .\INS and the forward equations as These equations have a particularly nice form in matrix notation.} while the ith diagonal element is = . If we define the matrices R. say n = 2k. we often have to compute many of the terms in the infinite sum (549) to arrive at a good approximation. since only the diagonal elements of R are negative and the diagonal elements of 1 are 1 we can. guarantee that all of the elements of M are nonnegative.. use (5 49) to efficiently approximate the matrix P(t) by utilizing the matrix equivalent of the identity 'Ii eX = lim (1 n . Prj (I).) there is a problem of computer roundoff error when we compute the powers of R. column j of these matrices be.. P(/). We can.. we can thus approximate P(/) by raising the matrix M = 1 + Rtln to the nth power.9) is valid provided that the v. and in fact it can be shown that (54. and P:} (t).. _are bounded The direct use of Equation (54. which can be accomplished by k matrix multiplications In addition. respectively. then the backwards equations can be written as "I' and the forward equations as P'(t) This suggests the solution (549) P(/)R P(/) = eRr == 2: (Rt)'li' . '" + xlnt which states that e RI = lim (I + Rtln)n n .. also equals the long-run proportion of time the process is in state j (2) If the initial state is chosen according to the limiting probabilities {PI}' then the resultant process will be stationary.5 LIMITING PROBABILITIES Since a continuous-time Markov chain is a semi-Markov process with it follows. That is. P. that if the discretetime Markov chain with transition probabilities P" is irreducible and positive recurrent. (553) Remarks (1) It follows from the results given in Section 4. then the limiting probabilities P. .1) where the 1T.LIMITING PROBABILITIES 251 5. P". = 2: I V. from the results of Section 4 8 of Chapter 4. for all t.52) we see that the P. using q" = V.8 of Chapter 4 for semi- Markov processes that P.2) From (5. or. are the unique nonnegative solution of V. equivalently. P. P.5 1) and (5.!. = lim( _ '" P" (t) are given by (55. are the unique nonnegative solution of (55. we obtain upon letting t ---7 00.. (t) would necessarily converge to 0 as t ---7 00.P. and is left as an exercise.~. (4) Equation (553) has a nice interpretation. L (q. ~ o(h) . L qkjP.k(t) - V. = L P. + (1 '''' vjh + o(h»P. the number of transitions into state j must equal to within . = lim Pkj(t + s) J_OO = P. 0= k"'. == = L P. The above interchange of limit and summation is easily justified.(t) == k"'. then assuming that we can interchange limit and summation in the above. 
It is worth noting that the above is a more formal version of the following heuristic argument-which yields an equation for P..(t)Pk/(s) J ..v.252 The above is proven as follows: CONTINUOUS-TIME MARKOV CHAINS L Pi. .. t). P. or _ 0- 1::.. L Pkqk....5..(t) lim P . (t) exist.P.(t)P.. the probability of being in state j at t = oo-by conditioning on the state h units prior in time: P..q" .. which is as follows: In any interval (0. {1J = lim L Pi. + h ' and the result follows by letting h ---7 O. CJJ .. (Why?) Hence.j(h)P. (t)... If we assume that the limiting probabilities P. P:. - v. I kr (s) J .. Pi. (3) Another way of obtaining Equations (5. = limr-oo P.3) is by way of the forward equations P:.h + o(h»P. = rate at which the process leaves state j. Because it balances (that is. AoPo = /-LIP I . AI PI /-L2 P 2 + (AoPo ./-LI PI) = /-L2 P 2. since P. > 0 for all j.5. Now when the process is in state j it leaves at rate v" and. is the proportion of time in state i. Similarly..J' and. is the proportion of time it is in state j. Hence.3) are sometimes referred to as balance equations. (Why?) Hence. A 2P 2 = /-L3 P 3 + (AIP 1 .3). equates) these rates. equivalently.. it thus follows that ""P. (5. AnPn = /-LntlPntl + (An-IPn. Let us now determine the limiting probabilities for a birth and death process. by equating the rate at which the process leaves a state with the rate at which it enters that state. we say that the chain is ergodic. (5) When the continuous-time Markov chain is irreducible and P.LIMITING PROBABILITIES 253 1 the number of transitions out of state j.1 - /-LnPn) = /-LntlPntl' .P.n > 0 Rewnting these equations gives n ~ 1. 2: P. or. when the process is in state i it departs to j at rate q. we see that the rate at which transitions from i to j occur is equal to q. we obtain State Rate Process Leaves Rate Process Enters o n. in the long run the rate at which transitions into state j occur must equal the rate at which transitions out of state j occur.5. equivalently.q" rate at which the process enters state j. since P.3) is just a statement of the equality of the rate at which the process enters and leaves state j./-L2 P 2) = /-L3 P 3. From Equations (5. or.5. Therefore. Equations (5. I ···A A I 0 n~ I J.LI Ao P2 = -PI J. J.:: A.4) n~l The above equations also show us what condition is needed for the limiting probabilities to exist Namely. J.Ln· . J.L2J.L.L1 Po Using L:~o Pn = 1 we obtain 1 = Po + Po L OC A n. n~O .L2 AI = --Po. and thus from (55. The MIM/1 Queue.L3J. EXAMPLE 5.L1 or and hence (55.Ln = J.L2J.L1 AIAo P 3 = -P2 = J.L3 A2 A2AI Ao J.L2J.4) In the MIMIl queue An::.254 CONTINUOUS TIME MARKOV CHAINS Solving in terms of Po yields PI = -Po.5(A) J. Customers arrive at rate A and are served at rate J.L (M .L for limiting probabilities to exist.. . From Equation (554) we have that Pn . M! (M . M' (M .M (M .3 of Chapter 4.Ln :: J.L (A)n 1 + ~I (A)n -. and suppose that the amount of time a machine runs before breaking down is exponentially distributed with rate A and the amount of time it takes the repairman to fix any broken machine is exponential with rate J.n)' J. then this system can be modeled as a birth and death process with parameters ExAMPLE: J.n n=O M M' .n)A n 0 n>M.L...L.n)! P = n -":"'M-(-A-)-n~M-'1 + ~l -. the average number of machines not in use is given by M M L.?: 1 A = {(M . which is null recurrent and thus has no limiting probabilities 5.nP n=O n = L.L < 1 It is intuitive that A must be less than J.n)' Hence. 
If we say that the state is n whenever there are n machines down. they will arrive at a faster rate than they can be served and the queue size will go to infinity.L behaves much like the symmetric random walk of Section 4. (M-n) J.L. the limiting probability that n machines will not be in use. n.n)' Suppose we wanted to know the long-run proportion of time that a given machine is working..LIMITING PROBABILITIES 255 provided that AI J.L.L (A)n n:: 0. To determine this.5(a} Consider a job shop consisting of M machines and a single repairman. The case A = J. is given by Po =--------1+ (~)n M! 1 f n=1 J. and thus if A > J. . we compute the . say state 0. £ [1-:0] 1 . irreducible continuous-time Markov chain and suppose we are interested in the distribution of the number of visits to some state. at any time t. equating the two expressions for the average reward gives that £ [T~] = 2£ [Too] 2: P. with a new cycle beginning each time state 0 is entered.256 CONTINUOUS TIME MARKOV CHAINS equivalent limiting probability of its working. by time t.3 5 that for t large the number of visits is approximately normally distributed with meantl £ [Too] and variance tVar(T00)1 £3[Too ]' where Too denotes the time between successive entrances into state 0 E [Too] can be obtained by solving for the stationary probabilities and then using the identity To calculate £ [T~o] suppose that.0 is the time to enter 0 given that the chain is presently in state i Therefore. P{machine is working} 2: P{machine is working In not working}Pn n M M 0 = n 2:0 M-n M Pn • Consider a positive recurrent. a reward is being earned at a rate equal to the time from t until the next entrance into state O. it follows that the average reward per unit time can also be expressed as Average Reward = 2: P. It then follows from the theory of renewal reward processes that. Since visits to state 0 constitute renewals.o] I where 7. it follows from Theorem 3. is the proportion of time the chain is in state i. the long-run average reward per unit time is given by Average Reward £ [reward earned during a cycle] £ [cycle time] £ [f~oo XdX] £ [Too] = £ [T~o]/(2 £ [TooD· But as P. £ [T. Y > t are independent. = Tr. since the forward process is a continuous-time Markov chain it follows that given the present state. P{X(t . that is.. going backwards in time.s. Therefore.P'1 . In addition.s. suppose that it started at time . 5. call it X(t).. P{process is in state i throughout [t . tn! P{X(t) = i} since P{X(t s) = i} = P{X(t) = i} = PI' In other words.S P{X(t) = i} = i} :.7. and we say that it is in steady stote.s) = i}e-v. as was shown in Section 4.o as the time it takes to leave state i plus any additional time after it leaves i before it enters state 0.s) = j IX(t) i} and so we can conclude that the reverse process is also a continuous-time Markov chain.::: P{process is in state i throughout [t . Y > t} = P{X(t . t] IX(t) P{X(t .s) and the future states X(y). the past state X(t . since the amount of time spent in a state is the same whether one is going forward or backward in time it follows that the amount of time the reverse chain spends in state i on a visit is exponential with the same rate VI as in the forward process To check this formally.6 TIME REVERSIBILITY consider an ergodic continuous-time Markov chain and suppose that it has been in operation an infinitely long time.00 Such a process will be stationary. Also. X(y). suppose that the process is in i at time t Then the probability that its (backwards) time in i exceeds s is as follows.0 i:> 0. 
Consider now a positive recurrent, irreducible continuous-time Markov chain, and suppose we are interested in the distribution of the number of visits to some state, say state 0, by time t. Since visits to state 0 constitute renewals, it follows from Theorem 3.3.5 that for t large the number of visits is approximately normally distributed with mean t/E[T_00] and variance t Var(T_00)/E^3[T_00], where T_00 denotes the time between successive entrances into state 0. E[T_00] can be obtained by solving for the stationary probabilities and then using the identity

P_0 = (1/v_0)/E[T_00].

To calculate E[T_00^2], suppose that, at any time t, a reward is being earned at a rate equal to the time from t until the next entrance into state 0. With a new cycle beginning each time state 0 is entered, it follows from the theory of renewal reward processes that the long-run average reward per unit time is given by

Average Reward = E[reward earned during a cycle]/E[cycle time] = E[∫_0^{T_00} x dx]/E[T_00] = E[T_00^2]/(2E[T_00]).

But as P_i is the proportion of time the chain is in state i, it follows that the average reward per unit time can also be expressed as

Average Reward = Σ_i P_i E[T_i0],

where T_i0 is the time to enter 0 given that the chain is presently in state i. Therefore, equating the two expressions for the average reward gives

E[T_00^2] = 2E[T_00] Σ_i P_i E[T_i0].

By expressing T_i0 as the time it takes to leave state i plus any additional time after it leaves i before it enters state 0, we obtain that the quantities E[T_i0] satisfy the following set of linear equations:

E[T_i0] = 1/v_i + Σ_{j≠0} P_ij E[T_j0].
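The identities above reduce the computation of E[T_00^2] to linear algebra: solve the linear system for the E[T_i0], obtain E[T_00] from P_0 = (1/v_0)/E[T_00], and combine. A minimal sketch, assuming a finite chain specified by its rates v_i, embedded transition matrix P_ij, and stationary probabilities (all inputs here are illustrative placeholders):

```python
import numpy as np

def second_moment_T00(v, P, pi):
    """E[T_00^2] = 2 E[T_00] * sum_i pi_i E[T_i0] for a finite chain.
    v: rates v_i; P: embedded transition matrix; pi: stationary probs."""
    n = len(v)
    idx = list(range(1, n))                      # unknowns: E[T_i0], i != 0
    A = np.eye(n - 1) - np.array([[P[i][j] for j in idx] for i in idx])
    b = np.array([1.0 / v[i] for i in idx])
    T = dict(zip(idx, np.linalg.solve(A, b)))
    # time to re-enter 0 from 0: leave 0, then reach 0 from the next state
    T[0] = 1.0 / v[0] + sum(P[0][j] * T[j] for j in idx)
    ET00 = (1.0 / v[0]) / pi[0]                  # mean time between visits to 0
    return 2 * ET00 * sum(pi[i] * T[i] for i in range(n))
```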
5.6 TIME REVERSIBILITY

Consider an ergodic continuous-time Markov chain and suppose that it has been in operation an infinitely long time; that is, suppose that it started at time -∞. Such a process will be stationary, and we say that it is in steady state. (Another approach to generating a stationary continuous-time Markov chain is to let the initial state of this chain be chosen according to the stationary probabilities.) Let us consider this process going backwards in time. Now, since the forward process is a continuous-time Markov chain, it follows that given the present state, call it X(t), the past state X(t - s) and the future states X(y), y > t, are independent. Therefore,

P{X(t - s) = j | X(t) = i, X(y), y > t} = P{X(t - s) = j | X(t) = i},

and so we can conclude that the reverse process is also a continuous-time Markov chain. Also, since the amount of time spent in a state is the same whether one is going forward or backward in time, it follows that the amount of time the reverse chain spends in state i on a visit is exponential with the same rate v_i as in the forward process. To check this formally, suppose that the process is in i at time t. Then the probability that its (backwards) time in i exceeds s is

P{process is in state i throughout [t - s, t] | X(t) = i}
= P{process is in state i throughout [t - s, t]}/P{X(t) = i}
= P{X(t - s) = i} e^{-v_i s}/P{X(t) = i}
= e^{-v_i s},

since P{X(t - s) = i} = P{X(t) = i} = P_i. In other words, going backwards in time, the amount of time that the process spends in state i during a visit is also exponentially distributed with rate v_i. In addition, as was shown in Section 4.7 of Chapter 4, the sequence of states visited by the reverse process constitutes a discrete-time Markov chain with transition probabilities P*_ij given by

P*_ij = π_j P_ji / π_i,

where {π_j, j ≥ 0} are the stationary probabilities of the embedded discrete-time Markov chain with transition probabilities P_ij. Hence we see that the reverse process is also a continuous-time Markov chain with the same transition rates out of each state as the forward-time process and with one-stage transition probabilities P*_ij. Let

q*_ij = v_i P*_ij = v_i π_j P_ji / π_i

denote the infinitesimal rates of the reverse chain. Recalling that P_k = (π_k/v_k)/C, where C = Σ_j π_j/v_j, we see that π_j/π_i = v_j P_j/(v_i P_i), and so, since q_ji = v_j P_ji,

q*_ij = v_j P_j P_ji / P_i = P_j q_ji / P_i.

That is,

(5.6.1) P_i q*_ij = P_j q_ji.

Equation (5.6.1) has a very nice and intuitive interpretation, but to understand it we must first argue that the P_j, j ≥ 0, are not only the stationary probabilities for the original chain but also for the reverse chain. This follows because if P_j is the proportion of time that the Markov chain is in state j when looking in the forward direction of time, then it will also be the proportion of time when looking backwards in time. More formally, we show that P_j, j ≥ 0, are the stationary probabilities for the reverse chain by showing that they satisfy the balance equations for this chain. This is done by summing Equation (5.6.1) over all states i, to obtain

Σ_i P_i q*_ij = P_j Σ_i q_ji = P_j v_j,

and, since the reverse chain leaves state j at the same rate v_j as the forward chain, this is exactly the set of balance equations for the reverse chain; so the {P_j} are indeed its stationary probabilities.

Now since P_i is the proportion of time the chain is in state i, and since when in state i the process goes to j with rate q_ij, it follows that P_i q_ij is the rate at which the forward chain makes a transition from i to j; similarly, P_i q*_ij is the rate at which the reverse chain makes a transition from i to j. Thus Equation (5.6.1) states that the rate at which the forward chain makes a transition from j to i is equal to the rate at which the reverse chain makes a transition from i to j. But this is rather obvious, because every time the chain makes a transition from j to i in the usual (forward) direction of time, the chain when looked at in the reverse direction is making a transition from i to j.

The stationary continuous-time Markov chain is said to be time reversible if the reverse process follows the same probabilistic law as the original process. That is, it is time reversible if

P*_ij = P_ij for all i and j,

which by (5.6.1) is equivalent to

P_i q_ij = P_j q_ji for all i, j.

That is, the condition of time reversibility is that the rate at which the process goes directly from state i to state j is equal to the rate at which it goes directly from j to i. It should be noted that this is exactly the same condition needed for an ergodic discrete-time Markov chain to be time reversible (see Section 4.7 of Chapter 4).

An application of the above condition for time reversibility yields the following proposition concerning birth and death processes.

PROPOSITION 5.6.1 An ergodic birth and death process is, in steady state, time reversible.

Proof To prove the above we must show that the rate at which a birth and death process goes from state i to state i + 1 is equal to the rate at which it goes from i + 1 to i. Now in any length of time t, the number of transitions from i to i + 1 must equal to within 1 the number from i + 1 to i (since between each transition from i to i + 1 the process must return to i, and this can only occur through i + 1, and vice versa). Hence, as the number of such transitions goes to infinity as t → ∞, it follows that the rate of transitions from i to i + 1 equals the rate from i + 1 to i.

Proposition 5.6.1 can be used to prove the important result that the output process of an M/M/s queue is a Poisson process. We state this as a corollary.

COROLLARY 5.6.2 Consider an M/M/s queue in which customers arrive in accordance with a Poisson process having rate λ and are served by any one of s servers, each having an exponentially distributed service time with rate μ. If λ < sμ, then the output process of customers departing is, in steady state, a Poisson process with rate λ.

Proof Let X(t) denote the number of customers in the system at time t. Since the M/M/s process is a birth and death process, it follows from Proposition 5.6.1 that {X(t), t ≥ 0} is time reversible. Now going forward in time, the time points at which X(t) increases by 1 constitute a Poisson process, since these are just the arrival times of customers. Hence, by time reversibility, the time points at which X(t) increases by 1 when we go backwards in time also constitute a Poisson process. But these latter points are exactly the points of time when customers depart. (See Figure 5.6.1, which marks the times at which X(t) changes; read backwards in time, the increase points are the departure epochs.) Hence the departure times constitute a Poisson process with rate λ.
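Proposition 5.6.1 can be checked numerically for any finite-state ergodic birth and death chain: compute the stationary distribution from the birth and death rates and verify the detailed-balance identity P_i q_{i,i+1} = P_{i+1} q_{i+1,i}. A small sketch with made-up rates (not a model from the text):

```python
def birth_death_stationary(birth, death):
    """birth[i] = rate i -> i+1; death[i-1] = rate i -> i-1, i >= 1."""
    n = len(birth) + 1
    w = [1.0]
    for i in range(1, n):
        w.append(w[-1] * birth[i - 1] / death[i - 1])
    s = sum(w)
    return [x / s for x in w]

birth = [3.0, 2.0, 1.0]          # illustrative rates
death = [1.0, 2.0, 4.0]
P = birth_death_stationary(birth, death)
for i, b in enumerate(birth):    # detailed balance: P_i q_{i,i+1} = P_{i+1} q_{i+1,i}
    assert abs(P[i] * b - P[i + 1] * death[i]) < 1e-12
```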
As in the discrete-time situation, if a set of probabilities {x_j} satisfies the time reversibility equations, then the stationary Markov chain is time reversible and the x_j are the stationary probabilities. To verify this, suppose that the x_j are nonnegative and satisfy Σ_j x_j = 1 and

x_i q_ij = x_j q_ji for all i, j.

Summing the bottom set of equations over all i gives

Σ_i x_i q_ij = x_j Σ_i q_ji = x_j v_j.

Thus the {x_j} satisfy the balance equations and so are the stationary probabilities, and since x_i q_ij = x_j q_ji for all i, j, the chain is time reversible.

Consider a continuous-time Markov chain whose state space is S. We say that the Markov chain is truncated to the set A ⊂ S if q_ij is changed to 0 for all i ∈ A, j ∉ A. All other q_ij remain the same. Thus transitions out of the class of states A are not allowed. A useful result is that a truncated time-reversible chain remains time reversible.

PROPOSITION 5.6.3 A time-reversible chain with limiting probabilities P_j, j ∈ S, that is truncated to the set A ⊂ S and remains irreducible is also time reversible and has limiting probabilities

(5.6.2) P_j / Σ_{k∈A} P_k, j ∈ A.

Proof We must show that, for i ∈ A, j ∈ A,

(P_i / Σ_{k∈A} P_k) q_ij = (P_j / Σ_{k∈A} P_k) q_ji.

But the above follows since the original chain is time reversible by assumption.

EXAMPLE 5.6(A) Consider an M/M/1 queue in which arrivals finding N in the system do not enter but rather are lost. This finite capacity M/M/1 system can be regarded as a truncated version of the M/M/1 and so is time reversible with limiting probabilities given by

P_j = (λ/μ)^j / Σ_{n=0}^{N} (λ/μ)^n, 0 ≤ j ≤ N,

where we have used the results of Example 5.5(A) in the above.
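The truncated chain of Example 5.6(A) keeps the geometric weights of the unrestricted M/M/1, renormalized over {0, ..., N}. The sketch below (parameter values are illustrative) evaluates the P_j and the fraction of time the system is full:

```python
def mm1_finite(N, lam, mu):
    """Limiting probabilities of the M/M/1/N queue (truncated M/M/1)."""
    rho = lam / mu
    w = [rho ** j for j in range(N + 1)]
    s = sum(w)
    return [x / s for x in w]

P = mm1_finite(N=4, lam=2.0, mu=3.0)
print(P[-1])   # fraction of time the system is full (these arrivals are lost)
```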
5.6.1 Tandem Queues

The time reversibility of the M/M/s queue has other important implications for queueing theory. For instance, consider a two-server system in which customers arrive at a Poisson rate λ at server 1; after being served by server 1, they then join the queue in front of server 2. We suppose there is infinite waiting space at both servers. Each server serves one customer at a time, with server i taking an exponential time with rate μ_i, i = 1, 2, for a service. Such a system is called a tandem, or sequential, system (see Figure 5.6.2, a tandem queue: customers enter at server 1 and leave the system after being served at server 2).

Since the output from server 1 is a Poisson process, it follows that what server 2 faces is also an M/M/1 queue. However, by looking at the reversed process, we can use time reversibility to obtain much more. We need first the following lemma.

Lemma 5.6.3 In an ergodic M/M/1 queue in steady state:
(i) the number of customers presently in the system is independent of the sequence of past departure times;
(ii) the waiting time spent in the system (waiting in queue plus service time) by a customer is independent of the departure process prior to his departure.

Proof (i) Since the arrival process is Poisson, it follows that the sequence of future arrivals is independent of the number presently in the system. Hence, by time reversibility, the number presently in the system must also be independent of the sequence of past departures (since, looking backwards in time, departures are seen as arrivals).
(ii) Consider a customer that arrives at time T_1 and departs at time T_2. Because the system is first come first served and has Poisson arrivals, it follows that the waiting time of the customer, T_2 - T_1, is independent of the arrival process after the time T_1. Now looking backward in time, we will see that a customer arrives at time T_2 and the same customer departs at time T_1 (why the same customer?). Hence, by time reversibility, T_2 - T_1 will be independent of the (backward) arrival process after (in the backward direction) time T_2. But this is just the departure process before time T_2, which proves (ii).

THEOREM 5.6.4 For the ergodic tandem queue in steady state:
(i) the numbers of customers presently at server 1 and at server 2 are independent, and

P{n at server 1, m at server 2} = (λ/μ_1)^n (1 - λ/μ_1) (λ/μ_2)^m (1 - λ/μ_2);

(ii) the waiting time of a customer at server 1 is independent of its waiting time at server 2.

Proof (i) By part (i) of Lemma 5.6.3 we see that the number of customers at server 1 is independent of the past departure times from server 1. Since these past departure times constitute the arrival process to server 2, the independence of the numbers of customers in the two systems follows. The formula for the joint probability follows from independence and the formula for the limiting probabilities of an M/M/1 queue given in Example 5.5(A).
(ii) By part (ii) of Lemma 5.6.3 we see that the time spent by a given customer at server 1 is independent of the departure process prior to his departing server 1. But this latter process, in conjunction with the service times at server 2, clearly determines the customer's wait at server 2. Hence the result follows.

Remarks (1) For the formula in Theorem 5.6.4(i) to represent the joint probability, it is clearly necessary that λ/μ_i < 1, i = 1, 2. This is the necessary and sufficient condition for the tandem queue to be ergodic.
(2) Though the waiting times of a given customer at the two servers are independent, it turns out, somewhat surprisingly, that the waiting times in queue are not independent. For a counterexample, suppose that λ is very small with respect to μ_1 = μ_2, and thus almost all customers have zero wait in queue at both servers. However, given that the wait in queue of a customer at server 1 is positive, his wait in queue at server 2 will also be positive with probability at least as large as 1/2 (why?). Hence the waiting times in queue are not independent.
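The joint distribution in Theorem 5.6.4(i) is a product of two M/M/1 marginals, so joint probabilities factor exactly. A small illustrative sketch (the parameter values are made up) that also checks the independence numerically:

```python
def tandem_joint(n, m, lam, mu1, mu2):
    """P{n at server 1, m at server 2} for the ergodic tandem queue."""
    assert lam < mu1 and lam < mu2, "ergodicity requires lam < mu_i"
    r1, r2 = lam / mu1, lam / mu2
    return r1 ** n * (1 - r1) * r2 ** m * (1 - r2)

p = tandem_joint(2, 3, lam=1.0, mu1=2.0, mu2=3.0)
p1 = sum(tandem_joint(2, m, 1.0, 2.0, 3.0) for m in range(200))  # marginal at 1
p2 = sum(tandem_joint(n, 3, 1.0, 2.0, 3.0) for n in range(200))  # marginal at 2
assert abs(p - p1 * p2) < 1e-12   # joint = product of marginals
```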
5.6.2 A Stochastic Population Model

Suppose that mutant individuals enter a population in accordance with a Poisson process with rate λ. Upon entrance, each mutant becomes the initial ancestor of a family, where we say that a family is in state j if it consists of j individuals. All individuals in the population act independently and give birth at an exponential rate ν and die at an exponential rate μ, where we assume that ν < μ. Let N_j(t) denote the number of families at time t that consist of exactly j members, and let N(t) = (N_1(t), N_2(t), ...). Then {N(t), t ≥ 0} is a continuous-time Markov chain. For any state n = (n_1, n_2, ...) with n_j ≥ 0, define the states

B_j n = (n_1, ..., n_j - 1, n_{j+1} + 1, ...), j ≥ 1,
D_j n = (n_1, ..., n_{j-1} + 1, n_j - 1, ...), j ≥ 2, with D_1 n = (n_1 - 1, n_2, ...),

and also define B_0 n = (n_1 + 1, n_2, ...). Thus B_j n and D_j n represent the next state from n if there is, respectively, a birth or a death in a family of size j, whereas B_0 n is the next state when a mutant appears. If we let q(n, n') denote the transition rates of the Markov chain, then the only nonzero rates are

q(n, B_0 n) = λ,
q(n, B_j n) = j n_j ν,
q(n, D_j n) = j n_j μ, j ≥ 1.

Now let us suppose that the population is initially void, that is, N(0) = 0, and call an arriving mutant type j if its family will consist of j individuals at time t. Then it follows from the generalization of Proposition 2.3.2 of Chapter 2 to more than two types that {N_j(t), j ≥ 1} are independent Poisson random variables with respective means

(5.6.3) E[N_j(t)] = λ ∫_0^t P_j(s) ds,

where P_j(s) is the probability that a family consists of j individuals a time s after its origination. It follows, from the fact that the N_j(t), j ≥ 1, are independent Poisson random variables when N(0) = 0, that the limiting probabilities will be of the form

(5.6.4) P(n) = Π_{j=1}^∞ e^{-α_j} α_j^{n_j} / n_j!

for some set of constants α_1, α_2, .... We will now determine the α_j, and show at the same time that the process is time reversible. For P(n) of the form (5.6.4), the time reversibility equation between n and B_0 n, namely

P(n) q(n, B_0 n) = P(B_0 n) q(B_0 n, n),

states that λ P(n) = (n_1 + 1) μ P(B_0 n), which, upon using (5.6.4), reduces to

(5.6.5) λ = μ α_1.

Similarly, by writing n as B_{j-1}(D_j(n)) and equating the rates between states differing by a birth in a family of size j - 1, Equation (5.6.4) gives, for j ≥ 2,

(5.6.6) (j - 1) ν α_{j-1} = j μ α_j.

Iterating (5.6.6), starting from α_1 = λ/μ, yields

α_j = λ ν^{j-1} / (j μ^j) = (λ/(jν)) (ν/μ)^j.

We thus have the following.

THEOREM 5.6.5 The continuous-time Markov chain {N(t), t ≥ 0} is, in steady state, time reversible, with limiting probabilities

P(n) = Π_{j=1}^∞ e^{-α_j} α_j^{n_j} / n_j!, where α_j = (λ/(jν)) (ν/μ)^j.

In other words, the limiting numbers of families that consist of i individuals are independent Poisson random variables with respective means

α_i = (λ/(iν)) (ν/μ)^i, i ≥ 1.
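Theorem 5.6.5 makes the steady-state law easy to evaluate: the number of size-i families is Poisson with mean α_i. A short illustrative sketch (the rates are made up), which also tabulates the expected total population size:

```python
from math import exp

def alpha(i, lam, nu, mu):
    """Mean number of families of size i in steady state."""
    return lam / (i * nu) * (nu / mu) ** i

def pmf_families(i, n, lam, nu, mu):
    """P{exactly n families of size i}: Poisson with mean alpha_i."""
    a = alpha(i, lam, nu, mu)
    p = exp(-a)
    for k in range(1, n + 1):   # multiply in the a^n / n! factor stably
        p *= a / k
    return p

lam, nu, mu = 2.0, 1.0, 3.0
# expected population size: sum_i i*alpha_i = (lam/nu) * sum_i (nu/mu)^i
mean_pop = sum(i * alpha(i, lam, nu, mu) for i in range(1, 200))
```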
There is an interesting interpretation of the α_i, aside from α_i being the limiting mean number of families of size i. From (5.6.3),

E[N_i(t)] = λ ∫_0^t q(s) ds,

where q(s) is the probability that a family consists of i individuals a time s after its origination. But consider an arbitrary family and let

I(s) = 1 if the family contains i members a time s after its origination, and 0 otherwise.

Then

∫_0^∞ q(s) ds = ∫_0^∞ E[I(s)] ds = E[amount of time the family has i members].

Hence,

lim_{t→∞} E[N_i(t)] = α_i = λ E[amount of time a family has i members],

and since α_i = (λ/(iν))(ν/μ)^i, we see that

E[amount of time a family has i members] = (ν/μ)^i / (iν).

Consider now the population model in steady state and suppose that the present state is n, where n_i is the number of families of size i. We would like to determine the probability that a given family of size i is the oldest (in the sense that it originated earliest) family in the population. It would seem that we would be able to use the time reversibility of the process to infer that this is the same as the probability that the given family will be the last surviving family of those presently in existence. Unfortunately, this conclusion does not immediately follow, for with our state space it is not possible, by observing the process throughout, to determine the exact time a given family becomes extinct. Thus we will need a more informative state space, one that enables us to follow the progress of a given family throughout time. For technical reasons, it will be easiest to start by truncating the model and not allowing any more than M families to exist. That is, whenever there are M families in the population, no additional mutants are allowed. Note that, by Proposition 5.6.3, the truncated process with states n remains time reversible and has stationary probabilities

P(n) = C Π_i e^{-α_i} α_i^{n_i} / n_i!, Σ_i n_i ≤ M,

where α_i = (λ/(iν))(ν/μ)^i and C is a normalizing constant.

To follow the progress of the families we will label them. Let us use the labels 1, 2, ..., M and agree that whenever a new family originates (that is, a mutant appears) its label is uniformly chosen from the set of labels not being used at that time. If we let S_i denote the number of individuals in the family labeled i (with S_i = 0 meaning that there is at present no family labeled i), then we can consider the process with states s = (s_1, ..., s_M), s_i ≥ 0. For a given s, let n(s) = (n_1(s), n_2(s), ...), where n_j(s) is, as before, the number of families of size j; that is, n_j(s) = number of i such that s_i = j. Since all labelings are chosen at random, it is intuitive that, for a given vector n, all of the

M! / [(M - Σ_i n_i)! Π_i n_i!]

possible s vectors that are consistent with n are equally likely. (That is, if M = 3 and n_1 = n_2 = 1, then there are two families, one of size 1 and one of size 2, and it seems intuitive that the six possible states s, namely all permutations of 0, 1, 2, are equally likely.) Hence it seems intuitive that

(5.6.7) P(s) = P(n(s)) (M - Σ_i n_i(s))! Π_i n_i(s)! / M!.

We will now verify the above formula and show at the same time that the chain with states s is time reversible.

PROPOSITION 5.6.6 The chain with states s = (s_1, ..., s_M) is time reversible and has stationary probabilities given by (5.6.7).

Proof For a vector s, let B_i(s) = (s_1, ..., s_i + 1, ..., s_M); that is, B_i(s) is the state following s if a member of the family labeled i gives birth (or, when s_i = 0, if a mutant acquiring label i appears). Now, for s_i > 0,

q(s, B_i(s)) = s_i ν, q(B_i(s), s) = (s_i + 1) μ,

whereas for s_i = 0,

q(s, B_i(s)) = λ / (M - Σ_j n_j(s)), q(B_i(s), s) = μ,

since a new mutant's label is equally likely to be any of the M - Σ_j n_j(s) unused labels. We must verify the time reversibility equations

(5.6.8) P(s) q(s, B_i(s)) = P(B_i(s)) q(B_i(s), s).

When s_i = j > 0, a computation with (5.6.7) shows that P(B_i(s))/P(s) = α_{j+1}/α_j, so that (5.6.8) reduces to

j ν α_j = (j + 1) μ α_{j+1},

which, as α_j = (λ/(jν))(ν/μ)^j, is clearly true. When s_i = 0, we find P(B_i(s))/P(s) = α_1/(M - Σ_j n_j(s)), so that Equation (5.6.8) reduces to λ = μ α_1, which again holds; the proof is thus complete.
COROLLARY 5.6.7 If in steady state there are n_j families of size j, j ≥ 1, then the probability that a given family of size i is the oldest family in the population is i / Σ_j j n_j.

Proof Consider the truncated process with states s. A given family of size i is the oldest if, going backwards in time, it has been in existence the longest. But by time reversibility, the process going backward in time has the same probability laws as the one going forward, and hence this is the same as the probability that the specified family will survive the longest among all those presently in the population. But each individual has the same probability of having his or her descendants survive the longest, and since there are Σ_j j n_j individuals in the population, of which i belong to the specified family, the result follows in this case. The proof now follows in general by letting M → ∞.

Remark We have chosen to work with a truncated process since it makes it easy to guess at the limiting probabilities of the labeled process with states s.

5.7 APPLICATIONS OF THE REVERSED CHAIN TO QUEUEING THEORY

The reversed chain can be quite a useful concept even when the process is not time reversible. To see this, we start with the following continuous-time analogue of Theorem 4.7.2 of Chapter 4.

THEOREM 5.7.1 Let q_ij, i ≠ j, denote the transition rates of an irreducible continuous-time Markov chain. If we can find a collection of numbers q*_ij, i ≠ j, and a collection of nonnegative numbers P_i, i ≥ 0, summing to unity, such that

P_i q_ij = P_j q*_ji, i ≠ j,

and

Σ_j q_ij = Σ_j q*_ij, i ≥ 0,

then q*_ij are the transition rates for the reversed chain and P_i, i ≥ 0, are the limiting probabilities (for both chains).

The proof of Theorem 5.7.1 is left as an exercise. Thus, if we can guess at the reversed chain and the limiting probabilities, then we can use Theorem 5.7.1 to validate our guess. To illustrate this approach, consider the following model of a network of queues, which substantially generalizes the tandem queueing model of the previous section.

5.7.1 Network of Queues

We consider a system of k servers in which customers arrive, in accordance with independent Poisson processes, at rate r_i to each server i, i = 1, ..., k; they then join the queue at i until their turn at service comes. Once a customer is served by server i, he then joins the queue in front of server j with probability P_ij, where Σ_{j=1}^k P_ij ≤ 1, and thus 1 - Σ_{j=1}^k P_ij represents the probability that a customer departs the system after being served by server i. The service times at server i are exponential with rate μ_i. If we let λ_j denote the total arrival rate of customers to server j, then the λ_j can be obtained as the solution of

(5.7.1) λ_j = r_j + Σ_{i=1}^k λ_i P_ij, j = 1, ..., k.

Equation (5.7.1) follows since r_j is the arrival rate of customers to j coming from outside the system and, since λ_i is the rate at which customers depart server i (rate in must equal rate out), λ_i P_ij is the arrival rate to j of those coming from server i.
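The traffic equations (5.7.1) are linear, so the λ_j can be computed by solving (I - Pᵀ)λ = r. A minimal sketch with made-up routing data (the names r, P are placeholders for a concrete network):

```python
import numpy as np

def traffic_rates(r, P):
    """Solve lambda_j = r_j + sum_i lambda_i P_ij, i.e. (I - P^T) lambda = r."""
    r = np.asarray(r, dtype=float)
    P = np.asarray(P, dtype=float)
    return np.linalg.solve(np.eye(len(r)) - P.T, r)

r = [1.0, 0.5]                       # outside arrival rates (illustrative)
P = [[0.0, 0.3],                     # routing matrix: row i gives P_ij
     [0.2, 0.0]]
lam = traffic_rates(r, P)            # total arrival rate at each server
```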
This model can be analyzed as a continuous-time Markov chain with states (n_1, n_2, ..., n_k), where n_i denotes the number of customers at server i. In accordance with the tandem queue results, we might hope that the numbers of customers at each server are independent random variables. That is, letting the limiting probabilities be denoted by P(n_1, ..., n_k), we will attempt to show that

(5.7.2) P(n_1, ..., n_k) = Π_{i=1}^k P_i(n_i),

where P(n_1, ..., n_k) is the limiting probability that there are n_i customers at server i, i = 1, ..., k. To prove that the probabilities are indeed of the above form, and to obtain the P_i, we will need to first digress and speculate about the reversed process. Note first that since the arrival rate to a server must equal the departure rate from the server, it follows that in both the forward and reverse processes the arrival rate at server j is λ_j. Now, in the reversed process, when a customer leaves server i, the customer will go to server j with some probability that will hopefully not depend on the past. If that probability, call it P*_ij, indeed does not depend on the past, what would its value be? Since the rate at which customers go from j to i in the forward process must equal the rate at which they go from i to j in the reverse process, this means that

λ_j P_ji = λ_i P*_ij,

or

(5.7.3) P*_ij = λ_j P_ji / λ_i.

Also, arrivals to i from outside the system in the reversed process correspond to departures from i that leave the system in the forward process, and hence occur at rate λ_i (1 - Σ_j P_ij). Thus we make the following conjecture.

Conjecture The reversed stochastic process is a network process of the same type as the original. It has Poisson arrivals from outside the system to server i at rate λ_i (1 - Σ_j P_ij), and a departure from i goes to j with probability P*_ij = λ_j P_ji / λ_i, as given by (5.7.3). The service rate at i is exponential with rate μ_i. In addition, the limiting probabilities satisfy

P(n_1, ..., n_k) = Π_{i=1}^k P_i(n_i).

To prove the conjecture and obtain the P_i, consider first transitions resulting from an outside arrival. That is, consider the states n = (n_1, ..., n_i, ..., n_k) and n' = (n_1, ..., n_i + 1, ..., n_k). Now

q_{n,n'} = r_i,

and, in the reversed process, the transition from n' to n corresponds to a customer at server i completing service and leaving the system; hence, if the conjecture is true,

q*_{n',n} = μ_i (1 - Σ_j P*_ij).

Hence, from Theorem 5.7.1, we need that

P(n) r_i = P(n') μ_i (1 - Σ_j P*_ij).

But from (5.7.3) and (5.7.1),

1 - Σ_j P*_ij = 1 - Σ_j λ_j P_ji / λ_i = (λ_i - Σ_j λ_j P_ji)/λ_i = r_i/λ_i,

and so the requirement reduces to

P_i(n_i + 1)/P_i(n_i) = λ_i/μ_i.

That is, P_i(n) = (λ_i/μ_i)^n P_i(0), and using Σ_{n=0}^∞ P_i(n) = 1 yields

(5.7.4) P_i(n) = (λ_i/μ_i)^n (1 - λ_i/μ_i).

Thus λ_i/μ_i must be less than unity for all i, and the P_i must be as given above for the conjecture to be true. To continue with our proof of the conjecture, consider those transitions that result from a departure from server j going to server i. That is, let n = (n_1, ..., n_i, ..., n_j, ..., n_k), where n_j > 0, and n' = (n_1, ..., n_i + 1, ..., n_j - 1, ..., n_k), i ≠ j. Since q_{n,n'} = μ_j P_ji and q*_{n',n} = μ_i P*_ij, we need that

P(n) μ_j P_ji = P(n') μ_i P*_ij,

or, using (5.7.2) and (5.7.4) to write P(n')/P(n) = (λ_i/μ_i)(μ_j/λ_j), that

λ_j P_ji = λ_i P*_ij,

which is the definition of P*_ij. Since the additional verifications needed for Theorem 5.7.1 follow in the same manner, we have thus proven the following.
THEOREM 5.7.2 Assuming that λ_i < μ_i for all i = 1, ..., k, in steady state the numbers of customers at the servers are independent, and the limiting probabilities are given by

P(n_1, ..., n_k) = Π_{i=1}^k (λ_i/μ_i)^{n_i} (1 - λ_i/μ_i).

Also, from the reversed chain we have the following.

COROLLARY 5.7.3 The processes of customers departing the system from server i, i = 1, ..., k, are independent Poisson processes having respective rates λ_i (1 - Σ_j P_ij).

Proof We've shown that in the reverse process customers arrive to server i from outside the system according to independent Poisson processes having rates λ_i (1 - Σ_j P_ij), i ≥ 1. Since an arrival from outside to server i in the reverse process corresponds to a departure out of the system from server i in the forward process, the result follows.

Remarks (1) The result embodied in Theorem 5.7.2 is rather remarkable in that it says that the distribution of the number of customers at server i is the same as in an M/M/1 system with rates λ_i and μ_i. What is remarkable is that in the network model the arrival process at node i need not be a Poisson process. For if there is a possibility that a customer may visit a server more than once (a situation called feedback), then the arrival process will not be Poisson. An easy example illustrating this is to suppose that there is a single server whose service rate is very large with respect to the small arrival rate from outside. Suppose also that with probability p = .9 a customer upon completion of service is fed back into the system. Hence, at an arrival time epoch there is a large probability of another arrival in a short time (namely, the feedback arrival), whereas at an arbitrary time point there will be only a very slight chance of an arrival occurring shortly (since λ is small). Hence the arrival process does not possess independent increments and so cannot be Poisson.
(2) The model can be generalized to allow each service station to be a multi-server system (that is, server i operates as an M/M/k_i rather than an M/M/1 system). The limiting numbers of customers at each station will again be independent, and the number of customers at a server will have the same limiting distribution as if its arrival process were Poisson.
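Combining the traffic-rate computation sketched after (5.7.1) with Theorem 5.7.2 gives the full stationary law. The following illustrative sketch (all rates are made up) evaluates a product-form probability:

```python
import numpy as np

def network_prob(n, r, P, mu):
    """P(n_1,...,n_k) from Theorem 5.7.2; requires lambda_i < mu_i for all i."""
    lam = np.linalg.solve(np.eye(len(r)) - np.asarray(P, float).T,
                          np.asarray(r, float))
    rho = lam / np.asarray(mu, float)
    assert (rho < 1).all(), "need lambda_i < mu_i for all i"
    return float(np.prod(rho ** np.asarray(n) * (1 - rho)))

# each marginal is geometric-type, so E[number at server i] = rho_i/(1 - rho_i)
p = network_prob([1, 2], r=[1.0, 0.5], P=[[0.0, 0.3], [0.2, 0.0]],
                 mu=[3.0, 2.0])
```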
5.7.2 The Erlang Loss Formula

Consider a queueing loss model in which customers arrive at a k-server system in accordance with a Poisson process having rate λ. It is a loss model in the sense that any arrival that finds all k servers busy does not enter but rather is lost to the system. The service times of the servers are assumed to have distribution G. We shall suppose that G is a continuous distribution having density g and hazard rate function λ(t); that is, λ(t) = g(t)/Ḡ(t) is, loosely speaking, the instantaneous probability intensity that a t-unit-old service will end.

We can analyze the above system by letting the state at any time be the ordered ages of the customers in service at that time. That is, if there are n customers in service, 1 ≤ n ≤ k, the state will be x = (x_1, x_2, ..., x_n), x_1 ≤ x_2 ≤ ... ≤ x_n, the most recent one having arrived x_1 time units ago, the next most recent arrival being x_2 time units ago, and so on. The process of successive states will be a Markov process in the sense that the conditional distribution of any future state, given the present and all the past states, will depend only on the present state. Even though the process is not a continuous-time Markov chain, the theory we've developed for chains can be extended to cover this process, and we will analyze the model on this basis.

Now since the age of a customer in service increases linearly from 0 upon her arrival to her service time upon her departure, it is clear that if we look backwards we will be following her excess or additional service time. As there will never be more than k in the system, we make the following conjecture.

Conjecture The reverse process is also a k-server loss system with service distribution G in which arrivals occur according to a Poisson process with rate λ. The state at any time represents the ordered residual service times of customers presently in service.

We shall now attempt to prove the above conjecture and at the same time obtain the limiting distribution. For any state x = (x_1, ..., x_n), let

e_i(x) = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_n).

Now, in the original process, when the state is x it will instantaneously go to e_i(x) with a probability intensity equal to λ(x_i), since the person whose time in service is x_i would have to instantaneously complete her service. In the reversed process, if the state is e_i(x), then it will instantaneously go to x if a customer having service time x_i instantaneously arrives, which has (joint) probability intensity λ g(x_i). So we see that

in forward: x → e_i(x) with probability intensity λ(x_i),
in reverse: e_i(x) → x with (joint) probability intensity λ g(x_i).

Hence, if p(x) represents the limiting density, then in accordance with Theorem 5.7.1 we would need that

p(x) λ(x_i) = p(e_i(x)) λ g(x_i),

or, since λ(x_i) = g(x_i)/Ḡ(x_i),

(5.7.5) p(x) = p(e_i(x)) λ Ḡ(x_i).

Letting P(φ) denote the limiting probability that the system is empty and iterating (5.7.5) yields

p(x) = λ Ḡ(x_1) p(e_1(x)) = λ Ḡ(x_1) λ Ḡ(x_2) p(e_1(e_1(x))) = ... = P(φ) Π_{i=1}^n λ Ḡ(x_i).

Integrating over all vectors x yields

P{n in the system} = P(φ) λ^n ∫∫_{x_1<x_2<...<x_n} Π_{i=1}^n Ḡ(x_i) dx_1 dx_2 ... dx_n = P(φ) λ^n (∫_0^∞ Ḡ(x) dx)^n / n!,

or

(5.7.6) P{n in the system} = P(φ) (λE[S])^n / n!, n = 1, ..., k,

where E[S] = ∫_0^∞ Ḡ(x) dx is the mean service time.
Upon using P(φ) + Σ_{n=1}^k P{n in the system} = 1, we obtain

(5.7.7) P{n in the system} = [(λE[S])^n / n!] / [Σ_{i=0}^k (λE[S])^i / i!], n = 0, 1, ..., k.

Also, the conditional distribution of the ordered ages given that there are n in the system is

(5.7.8) p(x | n in the system) = p(x)/P{n in the system} = n! Π_{i=1}^n Ḡ(x_i)/E[S].

As Ḡ(x)/E[S] is just the density of G_e, the equilibrium distribution of G, we see that, given that there are n in the system, their (unordered) ages are independent and are identically distributed according to the equilibrium distribution G_e of G.

To complete our proof of the conjecture, we must also consider transitions of the forward process from x to (0, x) = (0, x_1, ..., x_n), n < k, corresponding to an arrival. Now, in forward, x → (0, x) with instantaneous intensity λ, whereas in reverse, (0, x) → x with probability 1, since the customer with zero residual service time instantaneously departs. Hence, in conjunction with Theorem 5.7.1, we must verify that

p(x) λ = p(0, x),

which follows from (5.7.5) since Ḡ(0) = 1. Therefore, assuming the validity of the analogue of Theorem 5.7.1, we have thus proven the following.

THEOREM 5.7.4 The limiting distribution of the number of customers in the system is given by

(5.7.9) P{n in system} = [(λE[S])^n / n!] / [Σ_{i=0}^k (λE[S])^i / i!], n = 0, 1, ..., k,

and, given that there are n in the system, the ages (or the residual times) of these n are independent and identically distributed according to the equilibrium distribution of G. Thus the limiting distribution of the number in the system depends on G only through its mean.

The model considered is often called the Erlang loss system, and Equation (5.7.9) the Erlang loss formula. By using the reversed process we also have the following corollary.

COROLLARY 5.7.5 In the Erlang loss model, the departure process (including both customers completing service and those that are lost) is a Poisson process at rate λ.

Proof The above follows since in the reversed process arrivals of all customers (including those that are lost) constitute a Poisson process.
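Since (5.7.9) depends on G only through the offered load a = λE[S], the blocking probability P{k in system} is easy to compute; the recursion B(n) = aB(n-1)/(n + aB(n-1)) is the standard numerically stable way to evaluate it. A brief sketch (parameter values are illustrative):

```python
def erlang_loss_pmf(k, a):
    """P{n in system}, n = 0..k, for offered load a = lambda * E[S]."""
    w = [1.0]
    for n in range(1, k + 1):
        w.append(w[-1] * a / n)          # accumulates a^n / n!
    s = sum(w)
    return [x / s for x in w]

def erlang_B(k, a):
    """Blocking probability P{k in system} via the stable recursion."""
    B = 1.0
    for n in range(1, k + 1):
        B = a * B / (n + a * B)
    return B

pmf = erlang_loss_pmf(5, a=3.0)
assert abs(pmf[-1] - erlang_B(5, 3.0)) < 1e-12
```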
5.7.3 The M/G/1 Shared Processor System

Suppose that customers arrive in accordance with a Poisson process having rate λ. Each customer requires a random amount of work, distributed according to G. The server can process work at a rate of one unit of work per unit time, and divides his time equally among all of the customers presently in the system. That is, whenever there are n customers in the system, each will receive service work at a rate of 1/n per unit time. Let λ(t) = g(t)/Ḡ(t) denote the failure rate function of the service distribution, and suppose that λE[S] < 1, where E[S] is the mean of G.

To analyze the above, let the state at any time be the ordered vector of the amounts of work already performed on customers still in the system. That is, the state is x = (x_1, x_2, ..., x_n), x_1 ≤ x_2 ≤ ... ≤ x_n, if there are n customers in the system and x_1, ..., x_n are the amounts of work performed on these n customers. Let p(x) and P(φ) denote the limiting probability density and the limiting probability that the system is empty. We make the following conjecture regarding the reverse process.

Conjecture The reverse process is a system of the same type, with customers arriving at a Poisson rate λ, having workloads distributed according to G, and with the state representing the ordered residual workloads of customers presently in the system.

To verify the above conjecture, and at the same time obtain the limiting distribution, let e_i(x) = (x_1, ..., x_{i-1}, x_{i+1}, ..., x_n). The analysis follows as in the previous section, with the exception that if there are n in the system, then a customer who has already had the amount of work x_i performed on it will instantaneously complete service with probability intensity λ(x_i)/n, since it is being served at rate 1/n. Hence

in forward: x → e_i(x) with probability intensity λ(x_i)/n,
in reverse: e_i(x) → x with (joint) probability intensity λ g(x_i).

Thus, in accordance with Theorem 5.7.1, we need

p(x) λ(x_i)/n = p(e_i(x)) λ g(x_i),

or, equivalently, p(x) = n λ Ḡ(x_i) p(e_i(x)). Iterating this relation yields

(5.7.10) p(x) = n! λ^n P(φ) Π_{i=1}^n Ḡ(x_i).

Integrating over all ordered vectors x yields, as in (5.7.6),

P{n in system} = n! λ^n P(φ) (E[S])^n / n! = (λE[S])^n P(φ),

and using P(φ) + Σ_{n≥1} P{n in system} = 1 gives P(φ) = 1 - λE[S]. Also, given n in the system, the conditional density of the ordered amounts of work already performed is

p(x | n) = p(x)/P{n in system} = n! Π_{i=1}^n Ḡ(x_i)/E[S].

That is, given n customers in the system, the unordered amounts of work already performed are distributed independently according to G_e, the equilibrium distribution of G.

All of the above is based on the assumption that the conjecture is valid. To complete the proof of its validity we must also verify that

p(x) λ = p(0, x)/(n + 1),

the above being the relevant equation since the reverse process, when in state (ε, x), will go to state x in time (n + 1)ε, the customer with residual workload ε being served at rate 1/(n + 1). Since the above is easily verified from (5.7.10), we have thus shown the following.

THEOREM 5.7.6 For the Processor Sharing Model, the number of customers in the system has the distribution

P{n in system} = (λE[S])^n (1 - λE[S]), n ≥ 0.

Given n in the system, the completed (or residual) workloads are independent and have distribution G_e. The departure process is a Poisson process with rate λ.

If we let L denote the average number in the system, then

L = Σ_{n=0}^∞ n (λE[S])^n (1 - λE[S]) = λE[S]/(1 - λE[S]).

We can obtain W, the average time a customer spends in the system, from the formula L = λW (see Section 3.6.1 of Chapter 3), and so

W = L/λ = E[S]/(1 - λE[S]).

Remark It is quite interesting that the average time a customer spends in the system depends on G only through its mean. For instance, consider two such systems, one in which the workloads are identically 1 and the other where they are exponentially distributed with mean 1. Then, from Theorem 5.7.6, the distribution of the number in the system as seen by an arrival is the same for both. However, in the first system the remaining workloads of those found by an arrival are uniformly distributed over (0, 1) (this being the equilibrium distribution for the distribution of a deterministic random variable with mean 1), whereas in the second system the remaining workloads are exponentially distributed with mean 1. Hence it is quite surprising that the mean time in system for an arrival is the same in both cases. Of course, the distribution of time spent by a customer in the system, as opposed to the mean of this distribution, depends on the entire distribution G and not just on its mean.

Another interesting computation in this model is the conditional mean time an arrival spends in the system given that its workload is y. To compute this quantity, fix y and say that a customer is "special" if its workload is between y and y + ε. By L = λW we thus have that

average number of special customers in the system = (average arrival rate of special customers) × (average time a special customer spends in the system).

To determine the average number of special customers in the system, let us first determine the density of the total workload of an arbitrary customer presently in the system. Suppose such a customer has already received the amount of work x. Then the conditional density of the customer's total workload is

f(w | has received x) = g(w)/Ḡ(x), w > x.

Since the amount of work an arbitrary customer in the system has already received has the distribution G_e, with dG_e(x) = Ḡ(x) dx/E[S], the density of the total workload of someone present in the system is

f(w) = ∫_0^w [g(w)/Ḡ(x)] dG_e(x) = ∫_0^w g(w)/E[S] dx = w g(w)/E[S].
Hence the average number of special customers in the system is

E[number in system having workload between y and y + ε] = L f(y) ε + o(ε) = L y g(y) ε/E[S] + o(ε).

In addition, the average arrival rate of customers whose workload is between y and y + ε is

average arrival rate = λ g(y) ε + o(ε).

Hence we see that

E[time in system | workload in (y, y + ε)] = [L y g(y) ε/E[S] + o(ε)] / [λ g(y) ε + o(ε)].

Letting ε → 0, we obtain

(5.7.11) E[time in system | workload is y] = L y/(λE[S]) = y/(1 - λE[S]).

Thus the average time in the system of a customer needing y units of work also depends on the service distribution only through its mean. As a check of the above formula, note that

W = E[time in system] = ∫ E[time in system | workload is y] dG(y) = ∫ y dG(y)/(1 - λE[S]) = E[S]/(1 - λE[S]) (from (5.7.11)),

which checks.
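The two processor-sharing formulas above are simple enough to tabulate directly. The following sketch computes W and the conditional mean sojourn time (5.7.11) for illustrative parameter values:

```python
def ps_mean_time(lam, ES):
    """Average time in system, W = E[S]/(1 - lam*E[S]); needs lam*E[S] < 1."""
    assert lam * ES < 1, "stability requires lam * E[S] < 1"
    return ES / (1 - lam * ES)

def ps_mean_time_given_workload(y, lam, ES):
    """E[time in system | workload is y] = y/(1 - lam*E[S])."""
    return y / (1 - lam * ES)

W = ps_mean_time(lam=0.5, ES=1.0)                          # = 2.0
assert abs(ps_mean_time_given_workload(1.0, 0.5, 1.0) - W) < 1e-12
```

Note that only E[S] enters; two workload distributions with the same mean give identical values, in line with the remark above.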
5.8 UNIFORMIZATION

Consider a continuous-time Markov chain in which the mean time spent in a state is the same for all states. That is, suppose that v_i = v for all states i. In this case, since the amount of time spent in each state during a visit is exponentially distributed with rate v, it follows that if we let N(t) denote the number of state transitions by time t, then {N(t), t ≥ 0} will be a Poisson process with rate v.

To compute the transition probabilities P_ij(t), we can condition on N(t) as follows:

P_ij(t) = P{X(t) = j | X(0) = i}
= Σ_{n=0}^∞ P{X(t) = j | X(0) = i, N(t) = n} P{N(t) = n | X(0) = i}
= Σ_{n=0}^∞ P{X(t) = j | X(0) = i, N(t) = n} e^{-vt}(vt)^n/n!.

The fact that there have been n transitions by time t gives us some information about the amounts of time spent in each of the first n states visited; but as the distribution of time spent in each state is the same for all states, it gives us no information about which states were visited. Hence

P{X(t) = j | X(0) = i, N(t) = n} = P^n_ij,

where P^n_ij is just the n-stage transition probability associated with the discrete-time Markov chain with transition probabilities P_ij, and so

(5.8.1) P_ij(t) = Σ_{n=0}^∞ P^n_ij e^{-vt}(vt)^n/n!.

The above equation is quite useful from a computational point of view, since it enables us to approximate P_ij(t) by taking a partial sum and then computing (by matrix multiplication of the transition probability matrix) the relevant n-stage probabilities P^n_ij.

Whereas the applicability of Equation (5.8.1) would appear to be quite limited since it supposes that v_i ≡ v, it turns out that most Markov chains can be put in that form by the trick of allowing fictitious transitions from a state to itself. To see how this works, consider any Markov chain for which the v_i are bounded, and let v be any number such that

(5.8.2) v_i ≤ v for all i.

Now when in state i the process actually leaves at rate v_i; but this is equivalent to supposing that transitions occur at rate v, but only the fraction v_i/v are real transitions out of i and the remainder are fictitious transitions that leave the process in state i. In other words, any Markov chain satisfying condition (5.8.2) can be thought of as being a process that spends an exponential amount of time with rate v in state i and then makes a transition to j with probability P*_ij, where

(5.8.3) P*_ii = 1 - v_i/v, P*_ij = (v_i/v) P_ij, j ≠ i.

Hence, from (5.8.1), we have that the transition probabilities can be computed by

P_ij(t) = Σ_{n=0}^∞ P*^n_ij e^{-vt}(vt)^n/n!,

where P*^n_ij are the n-stage transition probabilities corresponding to (5.8.3). This technique of uniformizing the rate in which a transition occurs from each state by introducing transitions from a state to itself is known as uniformization.

EXAMPLE 5.8(A) Let us reconsider the two-state chain of Example 5.4(A), which has

P_01 = P_10 = 1, v_0 = λ, v_1 = μ.

Letting v = λ + μ, the uniformized version of the above is to consider it a continuous-time Markov chain with

P*_00 = 1 - λ/(λ + μ) = μ/(λ + μ) = P*_10,
P*_01 = λ/(λ + μ) = P*_11,

and v_i = λ + μ, i = 0, 1. Since P*_00 = P*_10, it follows that the probability of transition into state 0 is equal to μ/(λ + μ) no matter what the present state. Since a similar result is true for state 1, it follows that the n-stage transition probabilities are given by

P*^n_i0 = μ/(λ + μ), n ≥ 1, i = 0, 1.

Hence, from (5.8.1),

P_00(t) = Σ_{n=0}^∞ P*^n_00 e^{-(λ+μ)t}((λ+μ)t)^n/n!
= e^{-(λ+μ)t} + Σ_{n=1}^∞ [μ/(λ+μ)] e^{-(λ+μ)t}((λ+μ)t)^n/n!
= μ/(λ+μ) + [λ/(λ+μ)] e^{-(λ+μ)t}.
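Equation (5.8.1) with the uniformized matrix (5.8.3) gives a practical algorithm: truncate the Poisson sum once its tail mass is negligible. A minimal sketch, assuming the chain is given by its rates v_i and an embedded transition matrix P_ij with zero diagonal (the two-state example below recovers the closed form just derived):

```python
import numpy as np

def transition_probs(v, P, t, tol=1e-12):
    """P_ij(t) by uniformization: sum_n P*^n e^{-vt}(vt)^n/n!.
    Assumes the embedded matrix P has zero diagonal."""
    v = np.asarray(v, float)
    P = np.asarray(P, float)
    vmax = v.max()
    Pstar = (v[:, None] / vmax) * P          # (v_i/v) P_ij for j != i ...
    np.fill_diagonal(Pstar, 0.0)
    Pstar += np.diag(1.0 - v / vmax)         # ... and P*_ii = 1 - v_i/v
    out = np.zeros_like(P)
    term, weight = np.eye(len(v)), np.exp(-vmax * t)
    n, acc = 0, 0.0
    while acc < 1 - tol:                     # stop once the Poisson mass is spent
        out += weight * term
        acc += weight
        n += 1
        weight *= vmax * t / n
        term = term @ Pstar
    return out

M = transition_probs([2.0, 3.0], [[0, 1], [1, 0]], t=0.7)
```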
Another quantity of interest is the total time up to t that the process has been in a given state. That is, if X(t) denotes the state at t (either 0 or 1), let S_i(t) denote the time spent in state i by time t; thus

(5.8.4) S_1(t) = ∫_0^t X(s) ds, S_0(t) = t - S_1(t) = ∫_0^t (1 - X(s)) ds.

The stochastic process {S_i(t), t ≥ 0} is called the occupation time process for state i. Let us now suppose that X(0) = 0 and determine the distribution of S_0(t). Its mean is computed from (5.8.4):

E[S_0(t) | X(0) = 0] = ∫_0^t E[1 - X(s)] ds = ∫_0^t P_00(s) ds.

We will not attempt to compute the variance of S_0(t) (see Problem 5.36) but rather will directly consider its distribution. To determine the conditional distribution of S_0(t) given that X(0) = 0, we shall make use of the uniformized representation of {X(t), t ≥ 0} and start by conditioning on N(t) to obtain, for s < t,

P{S_0(t) ≤ s} = Σ_{n=1}^∞ e^{-(λ+μ)t}((λ+μ)t)^n/n! P{S_0(t) ≤ s | N(t) = n},

the n = 0 term vanishing since N(t) = 0 implies that S_0(t) = t (since X(0) = 0). Now, given that N(t) = n, the interval (0, t) is broken into the n + 1 subintervals (0, X_(1)), (X_(1), X_(2)), ..., (X_(n-1), X_(n)), (X_(n), t), where X_(1) ≤ X_(2) ≤ ... ≤ X_(n) are the n event times by t of {N(s)}. The process {X(t)} will be in the same state throughout any given subinterval. It will be in state 0 for the first subinterval and afterwards it will, independently for each subinterval, be in state 0 with probability μ/(λ+μ). Hence, given N(t) = n, the number of subintervals in which the process is in state 0 equals 1 plus a Binomial(n, μ/(λ+μ)) random variable. If this sum equals k (that is, if 1 + Bin(n, μ/(λ+μ)) = k), then S_0(t) will equal the sum of the lengths of k of the above subintervals. However, given that N(t) = n, X_(1), ..., X_(n) are distributed as the order statistics of a set of n independent uniform (0, t) random variables, and it follows that the joint distribution of the n + 1 subinterval lengths Y_1, Y_2, ..., Y_{n+1}, where Y_i = X_(i) - X_(i-1), i = 1, ..., n + 1, with X_(0) = 0, X_(n+1) = t, is exchangeable. That is, P{Y_i ≤ y_i, i = 1, ..., n + 1} is a symmetric function of the vector (y_1, ..., y_{n+1}). (See Problem 5.37.) Therefore, the distribution of the sum of any k of the Y_i is the same no matter which ones are in the sum. Finally, we see that when k ≤ n, this sum has the same distribution as Y_1 + ... + Y_k = X_(k), the kth smallest of a set of n independent uniform (0, t) random variables. (When k = n + 1 the sum equals t.) Putting the above together gives, for s < t,

P{S_0(t) ≤ s | X(0) = 0} = Σ_{n=1}^∞ e^{-(λ+μ)t}((λ+μ)t)^n/n! Σ_{k=1}^n C(n, k-1)(μ/(λ+μ))^{k-1}(λ/(λ+μ))^{n-k+1} P{X_(k) ≤ s},

where P{X_(k) ≤ s} = Σ_{j=k}^n C(n, j)(s/t)^j (1 - s/t)^{n-j}. Also,

P{S_0(t) = t | X(0) = 0} = e^{-λt},

since S_0(t) = t if and only if the process never leaves state 0 by time t.

PROBLEMS

5.1. A population of organisms consists of both male and female members. In a small colony, any particular male is likely to mate with any particular female in any time interval of length h with probability λh + o(h). Each mating immediately produces one offspring, equally likely to be male or female. Let N_1(t) and N_2(t) denote the number of males and females in the population at t. Derive the parameters of the continuous-time Markov chain {N_1(t), N_2(t)}.

5.2. Suppose that a one-celled organism can be in one of two states, either A or B. An individual in state A will change to state B at an exponential rate α; an individual in state B divides into two new individuals of type A at an exponential rate β. Define an appropriate continuous-time Markov chain for a population of such organisms and determine the appropriate parameters for this model.

5.3. Consider a Yule process starting with a single individual, and suppose that with probability P(s) an individual born at time s will be robust. Compute the distribution of the number of robust individuals born in (0, t).

5.4. Consider a Yule process with X(0) = i. Given that X(t) = i + k, what can be said about the conditional distribution of the birth times of the k individuals born in (0, t)?

5.5. Verify the formula A(t) = a_0 + ∫_0^t X(s) ds given in Example 5.3(B).

5.6. Prove Lemma 5.4.1.

5.7. Prove Lemma 5.4.2.

5.8. Show that a continuous-time Markov chain is regular, given (a) that v_i < M < ∞ for all i, or (b) that the discrete-time Markov chain with transition probabilities P_ij is irreducible and recurrent.

5.9. For a pure birth process with birth parameters λ_n, n ≥ 0, compute the mean, variance, and moment generating function of the time it takes the population to go from size 0 to size N.

5.10. Let P(t) = P_00(t). (a) Find lim_{t→0} (1 - P(t))/t. (b) Show that P(t)P(s) ≤ P(t + s) ≤ 1 - P(s) + P(s)P(t). (c) Show that |P(t) - P(s)| ≤ 1 - P(t - s), s < t, and conclude that P is continuous.

5.11. For the Yule process: (a) verify that P_ij(t) satisfies the forward and backward equations; (b) suppose that X(0) = 1 and that at time T the process stops and is replaced by an emigration process in which departures occur in a Poisson process of rate μ. Let τ denote the time taken after T for the population to vanish. Find the density function of τ.

5.12. Consider a birth and death process with birth rates {λ_n} and death rates {μ_n}. Starting in state i, find the probability that the first k events are all births.

5.13. Consider a population of size n, some of whom are infected with a certain virus. Suppose that in an interval of length h any specified pair of individuals will independently interact with probability λh + o(h). If exactly one of the individuals involved in the interaction is infected, then the other one becomes infected with probability α. If there is a single individual infected at time 0, find the expected time at which the entire population is infected.

5.14. In Example 5.4(D), find the variance of the number of males in the population at time t.

5.15. Consider a population in which each individual independently gives birth at an exponential rate λ and dies at an exponential rate μ. In addition, new members enter the population in accordance with a Poisson process with rate θ. Let X(t) denote the population size at time t. (a) What type of process is {X(t), t ≥ 0}? (b) What are its parameters? (c) Find E[X(t) | X(0) = i].

5.16. Suppose that the "state" of the system can be modeled as a two-state continuous-time Markov chain with transition rates v_0 = λ, v_1 = μ. When the state of the system is i, "events" occur in accordance with a Poisson process with rate α_i, i = 0, 1. Let N(t) denote the number of events in (0, t). (a) Find lim_{t→∞} N(t)/t. (b) If the initial state is state 0, find E[N(t)].

5.17. Each individual in a biological population is assumed to give birth at an exponential rate λ and to die at an exponential rate μ. In addition, there is an exponential rate of increase θ due to immigration. However, immigration is not allowed when the population size is N or larger. (a) Set this up as a birth and death model. (b) If N = 3, θ = λ = 1, μ = 2, determine the proportion of time that immigration is restricted.

5.18. A small barbershop, operated by a single barber, has room for at most two customers. Potential customers arrive at a Poisson rate of three per hour, and the successive service times are independent exponential random variables with mean 1/4 hour. (a) What is the average number of customers in the shop? (b) What is the proportion of potential customers that enter the shop? (c) If the barber could work twice as fast, how much more business would she do?

5.19. Let A be a specified set of states of a continuous-time Markov chain, and let T_i(t) denote the amount of time spent in A during the time interval [0, t] given that the chain begins in state i. Let Y_1, ..., Y_n be independent exponential random variables with mean λ, which are independent of the Markov chain, and set t_i(n) = E[T_i(Y_1 + ... + Y_n)]. (a) Derive a set of linear equations for t_i(1), i ≥ 0. (b) Derive a set of linear equations for t_i(n) in terms of the other t_i(n - 1). (c) When n is large, for what value of λ is t_i(n) a good approximation of E[T_i(t)]?

5.20. Consider a continuous-time Markov chain with X(0) = 0, and let T denote the first time that the chain has been in state 0 for t consecutive time units. Find E[T | X(0) = 0].

5.21. Consider a continuous-time Markov chain with X(0) = 0. Let A denote a set of states that does not include 0, and set T = min{t > 0: X(t) ∈ A}. Suppose that T is finite with probability 1. Set q_i = Σ_{j∈A} q_ij, and consider the random variable H = ∫_0^T q_{X(t)} dt, called the random hazard. (a) Find the hazard rate function of H; that is, find lim_{h→0} P{s < H < s + h | H > s}/h. (Hint: Condition on the state of the chain at the time τ when ∫_0^τ q_{X(t)} dt = s.) (b) What is the distribution of H?
5.22. Consider two M/M/1 queues with respective parameters λ_i, μ_i, i = 1, 2. Suppose they share the same waiting room, which has finite capacity N. (That is, whenever this room is full, all potential arrivals to either queue are lost.) Compute the limiting probability that there will be n customers at the first queue (1 being served and n - 1 in the waiting room when n > 0) and m at the second. (Hint: Use the result of Problem 5.23.)

5.23. Show that if {X(t), t ≥ 0} and {Y(t), t ≥ 0} are independent time-reversible Markov chains, then the process {(X(t), Y(t)), t ≥ 0} is also a time-reversible Markov chain.

5.24. Suppose that N customers move about among r servers. The service times at server i are exponential at rate μ_i, and when a customer leaves server i it joins the queue (if there are any waiting, or else it enters service) at server j, j ≠ i, with probability 1/(r - 1). Let the state be (n_1, ..., n_r) when there are n_i customers at server i, i = 1, ..., r. Show that the corresponding continuous-time Markov chain is time reversible and find the limiting probabilities.

5.25. Find the limiting probabilities for the M/M/s system and determine the condition needed for these to exist.

5.26. What can you say about the departure process of the stationary M/M/1 queue having finite capacity?

5.27. Prove Theorem 5.7.1.

5.28. Complete the proof of the conjecture in the queueing network model of Section 5.7.1.

5.29. In the stochastic population model of Section 5.6.2: (a) show that when P(n) is as given by (5.6.4) with α_j = (λ/(jν))(ν/μ)^j, the time reversibility equations are satisfied; (b) let D(t) denote the number of families that die out in (0, t). Assuming that the process is in steady state at time t = 0, what type of stochastic process is {D(t), t ≥ 0}? What if the population is initially empty at t = 0?

5.30. Consider a time-reversible continuous-time Markov chain having transition rates q_ij and limiting probabilities {P_i, i ≥ 0}. Choose some state, say state 0, and consider the new Markov chain, which makes state 0 an absorbing state. That is, reset v_0 to equal 0. Suppose now that at time points chosen according to a Poisson process with rate λ, Markov chains, all of the above type (having 0 as an absorbing state), are started with the initial states chosen to be j with probability P_0j. All the existing chains are assumed to be independent. Let N_j(t) denote the number of chains in state j, j > 0, at time t. (a) Argue that if there are no initial chains, then N_j(t), j > 0, are independent Poisson random variables. (b) In steady state, argue that the vector process {(N_1(t), N_2(t), ...)} is time reversible with stationary probabilities

P(n) = Π_j e^{-α_j} α_j^{n_j} / n_j!

for appropriate constants α_j.
where J-LI + J-L2 > v.(s)P. Let PI' j ~ 0. (8) Compute the probability that the process is in state i.+s [L F.292 CONTINUOUS-TIME MARKOV CHAINS (c) Suppose that the stream of customers arriving at a queue forms a Poisson process of rate vand that there are two servers who possibly differ in efficiency Specifically. and determine the limiting probabilities 5. denote the time until the process enters G given that it is in state i. compute P{X(t) = iIX(t) E B}.(s)) .) + jEB )EG L p. given that it has just entered B That is.33.) = q. i E B. suppose that a customer's service time at server i is exponentially distributed with rate J-L" for i = 1. v. show that it is time reversible. For the MIG/1 in steady state compute the mean and variance of the work in the system. Consider an ergodic continuous-time Markov chain.ST.(s) = _v_. where p.(s) = L L P./L) q.F. Use (a). Then: . (e). let Tv denote the time until it leaves B.] 2E[TJ 1 :> E[Tvl 2' The random variable Tv is called the visit or sojourn time in the set of states B. Hint· Imagine that at each renewal a coin. The random variable T called the exit time from B. 1 . Use part (b) to conclude that (g) Using (e) and (f) argue that (h) Given that the process is in a state in B. That is. Consider a renewal process whose interarrival distribution Fis a mixture of two exponentials.PROBLEMS 293 (f) Given that the process has just entered B from G. having probability p of landing heads. the next interarrival is exponential with rate A" and if tails. The results of the above problem show that the distributions of T1 and Tv possess the same structural relation as the distributions of the excess or residual life of a renewal process at steady state and the distribution of the time between successive renewals.35. (f). it is exponential with rate A2 Let R(t) = i if the exponential rate at time t is A. is flipped If heads appears. represents the remaining time in B given that the process is presently in B. and (g) to show that (i) Use (h) and the uniqueness of Laplace transforms to conclude that P{T ~ t} 1 = JI P{Tv> s} ds 0 E[Tv] (j) Use (i) to show that E[T] = E[T. F(x) = pe-~11 + qe-~2\ q = 1 . 5. let T1 denote the time until it leaves B.p Compute the renewal function E[N(t)]. It represents the time spent in B during a visit. Stochastic Models for Social Processes. The Theory of Stochastic Processes. X(y» (b) Let So(t) denote the occupation time of state 0 by t. Wiley. i 1. Wiley. London. 1964 D. England. Systems in Stochastic Equilibrium. .3. X(n + l) = t. Introduction to Queueing Networks. i 1. England. England. Chapman and Hall. . with X(O) = 0 (a) Compute Cov(X(s).. Let Y. 10.2. and 11 N Bailey.9. 2nd ed . . Wiley. The Elements of Stochastic Processes with Application to the Natural Sciences. 6. Chichester.294 (a) determine P{R(t) CONTINUOUS TIME MARKOV CHAINS i}. 3rd ed. . Berlin. New York. Wiley. J Bartholomew.. New York. Springer-Verlag. t) random variables.37. Stochastic Models. NJ. A.36.. 1988 P Whittle. Queueing Networks and Product Forms.Yn' REFERENCES References I. Cambridge University Press. 1 II P{R(s) 0 i} ds where A(s) AR(s)' 5. 1965 J Keilson. Use (a) and (5. n + 1 where X(O) = 0. n + I} is a symmetric function of Yl> . An Algorithmic Approach. 1978 0 R Cox and H 0 Miller. Argue that P{Y. Prentice-Hall. Wiley. Markov Chain Models-Rarity and Exponentiality. 1991 H C Tijms. 1973 M S Bartlett. and 10 Additional material on queueing networks can be found in References 6. Cambridge. An Introduction to Stochastic Processes. . 
Modelling Biological Populations in Space and Time. 2. 5. England. (b) argue that E [N(t)] 2 2. . Cambridge.84) to compute Var(S(t». New York. Consider the two-state Markov chain of Example 5. Reversibility and Stochastic Networks.8(A). Wiley. Cambridge University Press. Englewood Cliffs. 1979 F Kelly. and 8 provide many examples of the uses of continuous-time Markov chains For additional material on time reversibility the reader should consult References 5. and X(I) ~ X(2) <: •• ~ X(n) are the ordered values of a set of n independent uniform (0. 1986 2 3 4 5 6 7 8 9 10 11 . 1979 Renshaw. 1994 N Van Dijk. London. <: y" i = 1. = X(i) X(I l). Chichester. . 7. 1993 J Walrand. 6. whose definition formalizes the concept of a fair game. in Section 65 we derive an extension of Azuma's inequality. which yields bounds on the tail probabilities of a martingale whose incremental changes can be bounded In Section 6 4 we present the important martingale convergence theorem and.3 we derive and apply Azuma's inequality.1. show how it can be utilized to prove the strong law of large numbers. As we shall see. 1 MARTINGALES ~ A stochastic process {Zn' n I} is said to be a martingale process if for all n and (61. not only are these processes inherently interesting. known as a martingale.2 we introduce the concept of a stopping time and prove the useful martingale stopping theorem. Finally. but they also are powerful tools for analyzing a variety of stochastic processes Martingales are defined and examples are presented in Section 6. In Section 6. In Section 6. then (6.CHAPTER 6 Martingales INTRODUCTION In this chapter we consider a type of stochastic process. among other applications.1) gives 295 .1) A martingale is a generalized version of a fair game For if we interpret Zn as a gambler's fortune after the nth gamble.1) states that his expected fortune after the (n + 1)st gamble is equal to his fortune after the nth gamble no matter what may have previously occurred Taking expectations of (6 1.1. .1.] = = = E[Z"X. .. n :> I} is a martingale by considering the conditional expectation of Z.] = E[Zd SOME ExAMPLES OF MARTINGALES for all n. If we can then show that ..+l given not only ZI . n ~ I} is a martingale since E[ZIl+IIZh" .296 MARTINGALES and so E[Z.+dZh' . •• be independent random variables with 0 mean.. . . Z" but also some other random vector Y.Zn] (2) If Xl. ..] 1. This follows since E[Z.. .] Z"E[X. Since the conditional expectation E[XIU] satisfies all the properties of ordinary expectations.] + X. n ~ I} is a martingale when Z" n~=l X.Z.. and let Z" = 'L:=1 XI' Then {Z".+dZl.Z. .'" .] + E[X..Z. Z.. (1) Let XI> X 2 .] Z"E[X.2) E[XIU] = E[E[XIY.. are independent random variables with E[X. . it follows from the identity E[X] (6..'" .+dZl. X 2 ... then {Z"..5 of Chapter 4) and let Xn denote the size of the nth generation If m is the mean number of offspring per individual..+d =Z" (3) Consider a branching process (see Section 4.+1IZ1. U]Iu]· It is sometimes convenient to show that {Z".. . then {Zn... n :> I} is a martingale when We leave the verification as an exercise. Z" + E[Xn+1] E[Z" .+dZI.. Zn] E[Z"IZI" . . Z. except that all probabilities are now computed condiE[E[XIY]] that tional on the value of U. . the predictor of X that minimizes the expected squared error given the data Yl . note that .2» and the result follows This martingale.Yn. {Zn. . have mean O. Zn. Yn] (from (6. Yn is just E [XI Yl . .. • •• . . ••• be arbitrary random variables such that E[lXIl < 00. Yn] E[XIY" .Yn). 
their partial sums constitute a martingale That is. has important applications For instance.Yn (which is equivalent to conditioning variables Y" on Z" . . if Zn = L {XI 1=1 n E[XIIX1 ..1.9. Yl . •• . Zn] E[ZnI Z ]. ...E[XIIX1 . .Yn] = E[E[XIY" .1) is satisfied This is so because if the preceding equation holds. . .XI-d. .. and let It then follows that {Zn' n ::::-: I} is a martingale. Y" . . Even though they need not be independent. .. Y 2 . . .MARTINGALES 297 then it follows that Equation (6. then. as shown in Section 1.. i::::-: 1.. X 2 . = ..1. Y 2 • • are accumulated sequentially Then. n ::::-: I} is a martingale with mean 0 To verify this.. Y]IZJ. often called a Duub type martingale. This yields E[Zn+dYJ. the random variables XI . Yn+dIY" = .Zn but the more informative random . . suppose X is a random variable whose value we want to predict and suppose that data Yl .. . Zn] = E[E[Zn+dZ}. Zn. To show this we will compute the conditional expectation of Zn+' given not Z" . provided E[lZnll < 00 for all n. .. then E[Zn+dZ]. Zn] ADDITIONAL EXAMPLES OF MARTINGALES (4) Let X. Yn ] and so the sequence of optimal predictors constitutes a Doob type martingale (5) Our next example generalizes the fact that the partial sums of independent random variables having mean 0 is a martingale For any random variables Xl. . Zn tells us whether or not N = n If P{N < co} = I. . .1 If N is a random time for the martingale {Z. n I} is a martingale 6.2 n ifN<n 0 That is. . random variable N is said to be a random time for the process {Zn.. In equals 1 if we haven't yet stopped after observing Zl.2 STOPPING TIMES Definition The positive integer-valued. . Zn (since the latter are all functions of Xl. then the stopped process {Zn} is also a martingale Proof Let I = • {I if N". Xn] = Zn + E lXn+dXh Zn ~ . then the random time N is said to be a stopping time Let N be a random time for the process {Zn.}. Xn). n ifn ~ ~ I} and let N ifn > N {Zn. n ~ I} if the event {N = n} is determined by the random variables Zl. yields that E[Zn+llxl .E[Xn+IIXJ. knowing ZI' . Z. . .' " Zn-l We claim that (621) ..2. n ~ I} is called the stopped process. X n. That is. . Xn] = thus showing that {Zn.298 MARTINGALES Conditioning on XI . which is more informative than Zl. possibly infinite.Xn] . . PROPOSITION 6. ] co. with probability 1 asn ~ co E[Zd for all n. (6.. and since Z.zn-llzj.2. . if we know the values of Zl .1) follows < n.-!.2. and (62. .1) follows Now (622) E[Z"IZI. In = 0. Zn = Zn. In this case.Zn-l = Zn-h and In = 1. co} Now let us suppose that the random time N is a stopping time. Zn-l> and the last since {Zn} is a martingale We must prove that E[ZnIZl' . Zn-l = Zn = ZN. as n ~ Is it also true that (624) Since E[Z. Z.' . that is. and (6. (624) states that It turns out that... Since the stopped process is also a martingale. subject to some regularity conditions. then we also know the values of Zl. . Hence.Zn-l)IZI.2) implies this result since. (6 24) is indeed valid. . Z. P{N < 1 Since = if n :5 N ifn > N. However. . Zn-d where the next to last equality follows since both Zn-l and In are determined by Zl' .-l] = Zn-l . . . Zn-d = E[Zn-l + In(Zn .STOPPING TlMES 299 To verify (621) consider two cases: (i) N 2: n (ii) N In this case. . Zn-l. it follows that Zn equals ZN when n is sufficiently large. Zl' we have (623) for alln...Zn-d = Zn-l + InE[Zn . . We state the following theorem without proof. • with E[N] < 00. that . 
it follows.2 is satisfied The martingale stopping theorem provides another proof of Wald's equation (Theorem 3. are independent and identically distributed (iid) with N is a stopping time for Xl' X 2 . 1=1 p.2.2.2.2 If either The Martingale Stopping Theorem (i) Zn are uniformly bounded. if Theorem 6 2 2 is applicable. and there is an M < 00 such that then (62.2 states that in a fair game if a gambler uses a stopping time to decide when to quit. Corollary 6.2. = E[X] Since n Zn = L (X.3.) is a martingale.300 MARTINGALES THEOREM 6. (iii) E[N] < 00. or.3 (Wald's Equation) If X" i ~ 1.4) is valid. Thus Theorem 6. no successful gambling system is possible provided one of the sufficient conditions of Theorem 6. then E[lXI1 < 00 and if E [~XI] E[N]E[X] Proof Let p. or.2). then his expected final fortune is equal to his expected initial fortune. Thus in the expected value sense. \ii) N is bounded. each initially having 1 unit.2(A} Computing the Mean Time Until a Given Pattern Occurs.5(A) we showed how to use Blackwell's theorem to compute the expected time until a specified pattern appears. 0. 0. . 0.. 1. playing at a fair gambling casino Gambler i begins betting at the beginning of day i and bets her 1 unit that the value on that day will equal O. suppose that each outcome is either 0...llz". then since all bets are fair it follows that {Xn' n ~ I} is a martingale with mean 0 Let N denote the time until the seq uence 0 2 0 appears Now at the end of day N each of the gamblers 1.1 In Example 3. then the required number N would equal 12.. and thus Zn+1 - Zn = E[lzn+I-ZnIIZJ. one at each day. 1. 2. ExAMPLE 6..E[N]..)] = E[~X/ . What is the expected number that must be observed until some given sequence appears? More specifically.STOPPING TIMES 301 But E[ZN] = E [~ (X/ - . we venfy condition (iii) Now X n +1 .. ] = E [± XI] .. /~I To show that Theorem 6.. If we let Xn denote the total winnings of the casino after the nth day. .0. if the sequence of outcomes is 2." . !.. 2. she then bets the 2 units on the next outcome being 2.. The next example presents a martingale approach to this problem. .N.Zn] = E[lXn+' . At the beginning of each day another gambler starts playing. 1..-. N .. then all 12 units are bet on the next outcome being O. or 2 with respective probabilities ~. To compute £[ N] imagine a sequence of gamblers....3 would . and t and we desire the expected time until the run 02 0 occurs For instance.. 2....Zn]=E[lxn+.0. and if she wins this bet (and thus has 12 units).. 1....22 is applicable. If she wins (and thus has 2 units). Hence each gambler will lose 1 unit if any of her bets fails and will win 23 if all three of her bets succeed. Suppose that a sequence of independent and identically distributed discrete random variables is observed sequentially..11 ~ E[lXI] + 1. P Assume that the successive movements are independent. and z coins. Let X. and Z contest the following game At each stage two of them are randomly chosen in sequence. with the first one chosen being req uired to give 1 coin to the other All of the possible choices are eq ually likely and successive choices are independent of the past This continues until one of the players has no remaining coins At this point that player departs and the other two continue playing until one of them has all the coins. Solution. then Hence.1. since E[XN ] = 0 (it is easy to verify condition (iii) of Theorem 62. we obtain from Wald's equation that E[N](2p .302 MARTINGALES have lost 1 unit.q ExAMPLE 6.- . 
If N is the number of steps it takes to reach i. we can compute the expected time until any given pattern of outcomes appears For instance in coin tossing the mean time until HHTIHH occurs is p-4q -2 + p-2 + p-I. Y.] = 2p . and gambler N would have won 1 (since the outcome on day N is 0) Hence. X N = N .2) we see that E[N] = 26 In the same manner as in the above.3 .2 would have won 23. find the expected number of plays until one of them has all the s = x + y + z coins. where p = P{H} = 1 . 1 E[N] = 2p-l ExAMPLE 6.1) = i or. gambler N . If p > 1/2 find the expected number of steps it takes until the individual reaches position i. since E[X. If the players initially have x. gambler N 1 would have lost 1. i > O. equal 1 or -1 depending on whether step j is to the right or the left.1 =N .2(a) Consider an individual who starts at position 0 and at each step either moves 1 position to the right with probability p or one to the left with probability 1 .26 and. y.2(c) Players X.23 + 1. and Z have after the nth stage of play Thus. Y T . E[Xn+l Y n+ 1Xn == 1 X. Suppose that the game does not end when one of the players has all s coins but rather that the final two contestants continue to play. it follows that MT = T and so. If we let T denote the first time that two of the values X n . To find E[T] we will show that is a martingale It will then follow from the martingale stopping theorem (condition (iii) is easily shown to be satisfied) that E[M T ] = E[Mo] ZT xy + xz + yz But. and Zn are equal to 0. Yn = y] 1) + x(y + 1) + x(y . + (x-l)y+ (x-1)(y+ 1)]/6 As the conditional expectations of Xn+1Zn+l and Yn+1Zn+1are similar. respectively. = 0. E[T] = xy + XZ + yz. Y.1) = [(x + l)y + (x + 1)(y =xy-1I3.4. Xn = 0. O} is a martingale. and Zn denote. Zn = s + 4 indicates that X was the first player to go broke and that after the nth play Y had lost 4 more than he began with. since two of X T . Yn . and Z are all in competition after stage n. Y. and are 0. n 2!:: E[Mn+llx" Y 1 . Yn.STOPPING TIMES 303 Solution. Yn . Z" i and consider two cases. the amount of money that X. consider To show that {Mn. Hence.n] Case 1: Xn YnZn > 0 In this case X. for instance. then the problem is to find E[T]. keeping track of their winnings by allowing for negative fortunes Let X n. 1. we see that in this case . . 1=1 . and define X. Let R denote the number of rounds until all people have a match. " " Therefore. In this case Xn+ I = Xn = 0 and E[Yn+IZn+dYn = y.. when any number of individuals are to randomly choose from among their own hats. the expected number of matches is 1 As R is a stopping time for this martingale (it is the smallest k for which stopping theorem that 0= E[ZR] = E " L X. by mathematical induction. = n. {Mn' n :> O} is a martingale and EfT] = xy + xz + yz In Example 1 5(C) we showed. to equal 1 for i > R.X.lxl •. denote the number of matches on the ith round. say player X. where EXAMPlE 6. - 1)] EfR] R = E[~ XI] - =n-EfR] where the final equality used the identity 2: X. Zn = z] = f(y + l)(z .304 MARTINGALES Case 2: One of the players has been eliminated by stage n.-I]) where the last equality follows since. for i = 1. I~I .l)(z + 1)]/2 = yz-1 Hence. Let X. R. = n). we obtain by the martingale 1=1 [~ (X. 1.1) + (y . We will use the zero-mean martingale {Zko k ~ I}. in this case we again obtain that EfMn+1Ix Y Z" i = 0. that the expected number of rounds that it takes for all n people in the matching problem to obtain their own hats is equal to n We will now present a martingale argument for this result. 
.2(D) Z" " = L (X. .EfX. /( -a» and (.8».8) + -1. Lemma 6.8)· Proof Since f is convex it follows that.1.f(. .1 Let X be such that E[X] == 0 and P{ -a ~ X ~.3.3.8.8) a+. A convexfunction.8 a+.8.8.8 + -f(. since -a f(X) ~-f(-a) . Figure 6.8 a+.3 AZUMA'S INEQUALITY FOR MARTINGALES Let Z" i ~ 0 be a martingale sequence.8) .8 a+.8} =1 Then for any convex function f E[f(X)] 5 a ~. . (See Figure 63.8 1 a+.8 ~ X 5./(.[J(. Taking expectations gives the result.8 f(. we need some lemmas.f( -a) + -a.f(-a)]X a a+. In situations where these random variables do not change too rapidly.1 ) As the formula for this line segment is y = -. that f( -a)]x it follows. it is never above the line segment connecting the points (-a.8.8.AZUMA'S INEQUALITY FOR MARTINGALES 305 6. -(1. Azuma's inequality enables us to obtain useful bounds on their probabilities Before stating it. in the region -a ~ x ~ .8) + -[J(.8 f( -a) + a:. (3) 10:1 ::s 1. 0) = 0 and thus the lemma is proven .e. [(0:. (3) = e P + e P + o:(e P . or. However. {321+1/(2i + 1)1 1=0 which is clearly not possible when {3 ~ 0 Hence.306 MARTINGALES Lemma 6. gives that e P . that As it can be shown that there is no solution for which 0: 614) we see that =0 and {3 ~ 0 (see Problem or. we can conclude that the stnctly positive maximal value of [(0:. expanding in a Taylor series. ~ ~ 2. we must show that for -1 ::s 0: ::s 1. equivalently. would assume a strictly positive maximum in the intenor of the region R = teo:.3.. + (1 - ()e.P + o:(e P + e.. Now the preceding inequality is true when 0: := -lor + 1 and when {3 is large (say when 1{31 ~ 100) Thus. 1{31 ::s 100} Setting the partial denvatives of [equal to 0. upon division. if the lemma is not true.6.2 exp{o:{3 + {32/2}.P = 2{3 exp{o:{3 + {32/2} Assuming a solution in which {3 ~ 0 implies. ::s e"1/8 Proof Letting () = (1 + 0:)/2 and x = 2{3.P ) eP - = 2(0: + (3) exp{o:{3 + {32/2} e.P ) . if Lemma 63 2 were false then the function [(0:.{321+1/(2i)1 = 2. (3) occurs when {3 = O.e.2 For 0 :s () ::s 1 ()e (1-6). ~ a} ~ exp{ -2a2/~ ~ -a} (a. .r + f3n) where the inequality follows from Lemma 631 since (i) I(x) = e(X is convex. exp{ -ca.)2} ~ exp{ -2a2/~ Proal Suppose first that JJ. and suppose that -a. since E[WoJ n = 1. 0. and (iii) E[Zn . = E[Znl ~ Let Zo = JJ.S.AZUMA'S INEQUALITY FOR MARTINGALES 307 THEOREM 6.}) I (a. E[WnIZn-d = exp{cZn-I}E[exp{c(Zn ~ Zn-t)}IZn-d Wn. a > 0 (i) P{Zn .J[f3n exp{ -can} + an exp{cf3n}l/ (a. . (ii) P{Zn JJ.)2} (at + f3.s.)} I~l . n ~ 1 be a martingale with mean JJ. = 0 Now. for any c > 0 P{Zn ~ a} = P{exp{cZn} ~ eea} ~ (631) E[exp{cZn}]e. let Wn for n > 0 Therefore.E[Zn-IIZn-d (ii) -an. and that To obtain a bound on E[exp{cZn}]. Then for any n ~ ~ Z. + f3. + f3. for nonnegative constants a" f3" i ~ I. Zn .Z.JJ.3.} + at exp{cf3.Zn-dZn-d = E[ZnIZn-tl . f3n.Zn-I 0 Taking expectations gives Iterating this inequality gives.ca (by Markov's inequality) exp{cZn } Note that Wo = 1.3 Azuma's Inequality Let Zn. that E[Wn] ~ II {(f3._I f3. 32) with c = 4a /~ (al + (31)2 will give a sharper bound than the Azuma inequality ExAMPLE 6. from Example 5 of Section 6 1.} + a. + t3. we obtain that for any c > 0 MARTINGALES (632) P{Zn ~ a} $ e. + t3. .Zn} Remark From a computational point of view.308 Thus.exp{-ca.-d = 0. + t3.exp{ct3.) and x = c(a.3(A) E[Xd = 0 and E[X./(a. i:2: I Then. )2/8) gives that Parts (i) and (ii) of A7uma's inequality now follow from applying the preceding.)2/8}. For instance. I~l n where the preceding inequality follows from Lemma 632 upon setting 8 a. j = 1. . . 
from Equation (631). • Let Xl. . Hence. P{Zn ~ a}:5 exp { -ca + c2 n ~ (a. X j <: {31 for all i.n is a zero-mean martingale. if -a. + t3.~l n $ e- CO I1 exp{d(a. + t3. + t3Y /8} n Letting c = 4a / ~ (a.)2 (which is the value that minimi7es -ca + c2 ~ (a.ca ll {(t3. evaluating the nght side of (6. one should make use of the inequality (632) rather than the final Azuma inequality.} and secondly to the 7ero-mean martingale {IJ. Xn be random variables such that X. first to the 7ero-mean martingale {Zn . then we obtain from Azuma's inequality that for a > 0.)} . + t3.-1 J j . we see that 2: X .IX1 ."" • .IJ.) Hence for any c > 0. :::..})/(a. X. are m and . letting I{A} be the indicator variable for the event A. X = 2: I{urn i is empty} ._II first note that IZI . .E(XI' ..3(8) Suppose that n balls are put in m urns in such a manner that each ball._I) em .limY == J. we have that.~I m and thus. the number of empty urns To begin. X. let D denote the number of distinct values in the set XI.-d = (m . Z. let Xl denote the urn in which the jth ball is placed. X.. .D urns that are presently empty will end up empty with probability (1 .D (1 . .1/m)n-.D .1/m)n-' - if X. is equally likely to go into any of the urns. j = 1. We will use Azuma's inequality to obtain bounds on the tail probabilities of X. To determine a bound on IZ. (XI. .D)(l .] ifX. since each of the m . On the other hand. and for i > 0._I)' ~ Hence. for i ~ 2.Z. and define a Doob type martingale by letting Zo = E [X]. X.Zol = o. D is the number of urns having at least one ball after the first i .D)(l . .X.n.1/mY-'+l. the two possible values of Z.1 bans are distributed Then.+I. = E[XIXI . independently. . E[XIXI .] We wi1l now obtain our result by using Azuma's inequa1ity along with the fact that X = Zn. . t!..AZUMA'S INEQUALITY FOR MARTINGALES 309 and Azuma's inequality is often applied in conjunction with a Doob type martingale EXAMPLE 6.Z. we have that E[XIXI' .._I That is.lIm)n-' { (m ._I. E[X] = mP{urn i is empty} = m(l .X.lImY-' 2. .1)(1 . . . i m .L Now. . Now. (i) P{h(X) . for SOme k. X. where !£ Z. . we thus obtain that -lX.£[h(X)] .. we have for a > 0 that :s e. xn) and y = (YI" .~2 n m 2)2(1 lIm)2(n-'l/m 2 /=2 + L •=m+! n (1 . m). IZ.1/m)2(2 .3. Azuma's inequality is often applied to a Doob type martingale for which Corollary 6.£[h(X)] :s -a} :s e-hln. minU . L (lX. . where. .a2rln (ii) P{h(X) .z'_11 !£ 1. and.)2 = L (m + i . + {3.:: a} . Y.310 MARTINGALES Since 1 !£ D :5.lIm)2 . Xn be independent random variables Then. .1.4 Let h be a function such that if the vectors x == (XI. with X = (Xl. X n ).h(y)1 :s 1 Let X" . The following corollary gives a sufficient condition for a Doob type martingale to satisfy this condition. Z. for all i yf:. k) then Ih(x) ._I!£ {3" From Azuma's inequality we thus obtain that for a > 0. Yn) differ in at most one coordinate (that is. . ••• . X. k 1=1 Now. suppose that 0 < k < n In this case if x and y differ in at most one coordinate then it is not necessarily true that Ihk(x) hk(y)1 $ 1. i = 1.x.Xn )JI::Sl Hence..+I._"X" .)n ~ a} < PiY:$ exp{-a 2 /2n} exp{ ~ (1 - -a} < -a 2 /2n}. X. =1 Z. i = 1. .] . .E[h(X)IX.4 that p { Yo p { Yo - ~ (1. Also.p.)n-k.. Solution To begin. = XIo =IE[h(X" =IE[h(X". suppose n x-balls and ny-balls are put in m urns so that the ith x-ball and the ith y-ball are put in the same urn for all but one i Then the number of urns empty of x-balls and the number empty of y-balls can clearly differ by at most 1 ) Therefore. = XI. let hk(x 1 . we see from Corollary 6.X. 
note that E [Yk ] = E [~ I{ urn i has exactly k balls} ] = f (n) p~(1 _ p.xn ) denote the number of urns with exactly k balls when X. Now. and note that Y k = h k (X1 . Suppose first that k = 0 In this case it is easy to see that ho satisfies the condition that if x and y differ in at most one coordinate then Iho(x) .n.. i = 1. (That is. • •• . Let Y k denote the number of urns with exactly k balls.. = (J. and use the preceding corollary to obtain a bound on its tail probabilities. ._I Proof Consider the martingale Z. . for the one different value could result in one of the . Now. . ..ho(y)1 $ 1.-11 ::s 1 and so the result follows from Azuma's inequality with ExAMPLE 6. m. Iz. denote the urn in which ball i is put. with each ball independently going into urn j with probability P" j = 1.]. ..X.+I' .n. = x.AZUMA'S INEQUALITY FOR MARTINGALES 311 " .Xn)]-E[h(X" .-IoX" . 0 :$ k < n.3. X2. . . n. let X. = Xi. IE[h(X)IX.X"X.Xn)]1 . = E[h(X)IX = x.Xn)-h(XI' . Xn). .3(c) Suppose that n balls are to be placed in m urns. a..-dl . . .x"x. · .4 and so we can conclude that for 0 < k < n. Suppose that in order to perform experiment j.3(D) Consider a set of n components that are to be used in performing certain expenments Let X.. . show that for a > 0 ExAMPLE p{X. X we see that = L I{experiment j can be performed}..] = P. h:(x) = hk (x)/2 satisfies the condition of Corollary 6. and suppose that the X. m If we let heX) equal the number of experiments that can be performed. . a > 0. must be functioning If any particular component is needed in at most three experiments.3. are independent with £[X.=1 lEA IIp.f Solution Since . . and.. . But from this we can see that if x and y differ in at most one coordinate then Hence. where X is the number of experiments that can be performed. j = 1. then h itself does not satisfy the condition of Corollary .312 MARTINGALES vectors having 1 more and the other 1 less urn with k balls than they would have had if that coordinate was not included. m. equal 1 if component i is in functioning condition and let it equal 0 otherwise. 6. all of the components in the set A.=1 m £[X] = L IIp.=1 lEA. ~ -3a} sexp{-a !2n} 2 . 2.E[X]/3 > a}::S.1 If N is a stopping time for {Zn. the following result.MARTINGALE CONVERGENCE THEOREM 313 63 4 because changing the value of one of the X. the martingale stopping theorem.4 1) . h(X)/3 does satisfy the conditions of the coroJlary and so we obtain that P{X/3 .2! 1} having E[lZnll < 00 for all n is said to be a and is said to be a supermartingale if (6. exp{-a 2/2n} P{X/3 . remain valid for submartingales and supermartingales. AND THE MARTINGALE CONVERGENCE THEOREM A stochastic process {Zn. can be established THEOREM 6.4. SUPERMARTINGALES. whose proof is left as an exercise.4 SUBMARTINGALES. n Theorem 6 2 2 is satisfied. That is.E[X]/3 ~ -a}::S. exp{-a 2/2n} 6. n sub martingale if (6. then ~ 1} such that anyone of the sufficient conditions of E[ZN] and ~ S E[Zd E[Zd for a submartingale for a supermartingale E[ZN] The most important martingale result is the martingale convergence theorem Before presenting it we need some preliminaries . can change the value of h by as much as 3 However.2.4 2) Hence. a sub martingale embodies the concept of a superfair and a supermartingale a subfair game From (641) we see that for a submartingale with the inequality reversed for a supermartingale The analogues of Theorem 6. i ::.314 MARTINGALES Lemma 6. n tingale 2! I} is a martingale and fa convex function. N = k] (Why?) 
• Zk] Hence the result follows by taking expectations of the above Lemma 6. Zn) > a is equivalent to ZN> a Therefore.: a for all i = 1.4.: E[Znl a fora >0 Proof Let N be the smallest value of i.4. E[ZN] • ZN.3 If {Zn. then P{max(ZI' . and define it to equal n if Z. • Zk.. = k] = E[ZnIZIl = E[ZnI Z 1. n 2! I} is a submar- Proof E(f(Zn+. P{max(ZI' • Zn) > a} = P{ZN> a} ::. since N is bounded. N 2! E[Zd Now. . • n Note that max(ZI. ::. Zn] 2! f(E[Zn+1Iz].4.: n • .)lz. such that Z. then {f(Zn).Zn]) (by Jensen's inequality) = f(Zn) THEOREM 6. > a.2 If {Z" i 2! I} is a submartingale and N a stopping time such that P{N s n} = 1 then Proof It follows from Theorem 641 that. .Zn) > a}::.: E[ZN] a (by Markov's inequality) where the last inequality follows from Lemma 642 since N ::.: n.4 (Kolmogorov's Inequality for Submartingales) If {Zn' n 2! I} is a nonnegative submartingale. . a Cauchy sequence That is. hence E[Z~] is nondecreasing Since E[Z~] is bounded.MART~GALECONVERGENCETHEOREM 315 Corollary 6. it follows from Lemma 643 that {Z~. for all n then. n 2: (The Martingale Convergence Theorem) 00 I} is a martingale such that for some M < E[IZnl1 ~ M. with probability 1. with probability 1. IZnl) > a} ~ E[Z~]/a2 (i) P{max(IZll.6 If {Zn. THEOREM 6.5 Let {Zn' n 2: I} be a martingale Then. . Proof Parts (i) and (ii) follow from Lemma 644 and Kolmogorov's inequality since the functions f(x) = Ixl and f(x) = x 2 are both convex We are now ready for the martingale convergence theorem. (ii) P{max(IZt!. n 2: I} is a submartingale. n 2: I} is.4. IZnl) > a} ~ E[IZnl1/a. limn_oo Zn exist'i and is finite Proof We shall prove the theorem under the stronger assumption that E[Z~] is bounded (stronger since EIIZnl] ~ (E[ZmIf2) Since f(x) = x 2 is convex. m Now ~ 00 (643) P{IZm+k - zml > s for some k ~ n} (by Kolmogorov's inequality) But E[ZmZm+n] = E[E[ZmZm+nIZm11 = E[ZmE[Zm+nIZm11 = E[Z~] . we will show that with probability 1 as k.4. it follows that it converges as n ~ 00 Let IL < 00 be given by IL = lim E[Z~] n_:o We shall show that limn Zn exists and is finite by showing that {Zn. for a > 0 . .E[Z~] 2 8 and recalling the definition of p. 6.) .4 3).Zm I > Letting n _ 00 8 for some k :$ n}:$ E[Z~+n] . n ~ O} is a nonnegative martingale.-. MARTINGALES P{ IZm+k . and so the gambler's fortune is not allowed to become negative.7 it will converge as n ~ 00._E[Z2] 2 m 8 P{lZm+k .Zn+! = 0. yields p{lzm+k . limn~o> Zn exists and is finite Proof Since Zn is nonnegative. with probability 1.Zml > 8 for some k} - 0 as m _ 00 Thus. therefore.. Hence from Corollary 64.4(A) Branching Processes.7 If {Zn. if Zn is the gambler's fortune after the nth play. From this we can conclude that either Xn ~ 0 or else it goes to 00 at an exponential rate. Consider a gambler playing a fair game. then. with probability 1.o> Zn will exist and be finite Corollary 6.4. EXAMPLE ExAMPLE 6. then {Zn' n ~ I} is a martingale Now suppose that no credit is given.Zml > And. she did not gamble on the n + 1 play. 8 for some k}:$ p. Zn will be a Cauchy sequence. and on each gamble at least 1 unit is either won or lost. then Zn = Xnlmn is a nonnegative martingale. from (6. Let denote the number of plays until the gambler is forced to quit (Since Zn . If Xn is the population size of the nth generation in a branching process whose mean number of offspring per individual is m.4(a) A Gambling Result. that is.316 Hence. and thus lim. e'os'/'lTn(to).Jn can be as large as p. X 2 .Znl N<oo That is. _ g (0) - 'IT(t)(p.4. 
is a martingale Since it is also nonnegative.MARTINGALE CONVERGENCE THEOREM 317 Since {Zn} is a nonnegative martingale we see by the convergence theorem that (6. " The Strong Law of Large Numbers be a sequence of independent and identically distributed random variables having a finite mean p. For a given e > 0. the . being the product of independent random variables with unit means (the ith being e'ox'/'IT(t». I~I • Then Proof We will prove the theorem under the assumption that the moment generating function 'IT(t) = E[e 'X ] exists. and so (64. note that (64.4) n. with probability 1.'lT1(t)el(I"+<ll 'lT2(t) I~O _ . with probability 1 1 for n < N. and let Sn = L X. THEOREM 6. there exists a value to > 0 such that g(to) > 1 We now show that S..4.. let get) be defined by Since g(O) I = 1. But IZn'l .e > 0.. the gambler will eventually go broke..5) However. We will now use the martingale convergence theorem to prove the strong law of large numbers.. + e only finitely often For. oo lim Zn :> exists and is finite... + e)el(l"+t) .4) implies that with probability 1.8 Let Xl . or.e for an infinite number of n} 0 P{p. . 1'(0) == -e. :0:. ? We will end this section by characterIZing Doob martingales To begin we need a definition.e :5 Sn1n :0:. + e for an infinite number of n} 0 Similarly. P. is said to be uniformly integrable if for ' JIxl>Y£ Ixl df~(x) < e. + e for all but a finite number of n} = 1. it follows from (6. is uniformly integrable then there exists M < for aU n 00 such that E[lXnll < M . there exists a value to < 0 such thatf(ta) > 1. since g(to) > 1. by defining the function f(t) = e'(I'-£) 1'I'(t) and noting that since f(O) = 1. lim e'Os" 1'1'" (to) "_00 exists and is finite MARTINGALES Hence.4. Definition The sequence of random variables X n• n every e > 0 there is a y. since the above is true for all e > 0.318 convergence theorem shows that. we can prove in the same manner that P{Snln Hence. with probability 1.9 If Xn. .4 5) that P{Snln > p. such that 2: 1. n 2: 1. where f~ for all n is the distnbution function of Xn Lemma 6. p. Zk] where the interchange of expectation and limit can be shown to be justified by the uniformly integrable assumption Thus. Z" then it can be shown that E[lZI] < 00 6.Yn ] exists As might be expected.. it thus follows that lim n_". be a Doob martingale.::: I} is a uniformly integrable martingale Then. n . . Zd. n~'" ..Zd = E[lim ZnlZI . say limn Zn Z. For suppose that {Zn. • .Z~J = lim . . it has a limit. .. if we let Z = lim~". Now. let Zn = E[XI Y1 . .5 A GENERALIZED AZUMA INEQUALITY The supermartingale stopping time theorem can be used to generalize Azuma's inequality when we have the same bound on all the Z. we see that any uniformly integrable martingale can be represented as a Doob martingale Remark Under the conditions of the martingale convergence theorem. it follows from the martingale convergence theorem that any uniformly integrable martingale has a finite limit. n ... E[XI Y 1 ... by the martingale convergence theorem. Then E[IXnll = J'Ixls" Ixl df~(x) + I Ixl>" Ixl df~(x) Thus.~'" E[ZnIZ . Yn]. Now.Z.::: 1. . consider the Doob martingale {E[ZIZ1. . That is._I We start with the following proposition which is of independent interest . k > I} and note that E[zIZ" . using the preceding. As it can be shown that a Doob martingale is always uniformly integrable.A GENERALIZED AZUMA INEQUALITY 319 Proof Let Yl be as in the definition of uniformly integrable. 
Not only is a Doob martingale uniformly integrable but it turns out that every uniformly integrable martingale can be represented as a Doob martingale.. this limiting value is equal to the conditional expectation of X given the entire sequence of the Y/. a . and the second from applying Lemma 632 with () = a/(a + fJ).320 PROPOSITION 6. .1 Let {Zn' n ~ I} be a martingale with mean Zo MARTINGALES = 0.3 1.cb E[exp{c(Zn :5 Wn_le-Cb[(Je-ca Zn_I)}IZI. Using the preceding. + ae cll ] /(a + (J) where the first inequality follows from Lemma 6. x = c(a + fJ) Hence.1 is equivalent to .. for any positive values a and b P{Zn ~ a + bn for some n}:5 exp{-Sab/(a + (J)2} Proof Let. Wn = exp{c(Zn . Zn I. P{ZN~ ~ a + bn or n = k} a + bN} = P{WN ~ I} :5 E[WN ] :5 E[Wo] (by Markov's inequality) where the final eq uality follows from the supermartingale stopping theorem But the preceding is equivalent to P{Zn ~ a + bn for some n :S k]} :5 e -Sab/(a t Il)z Letting k ~ 00 gives the result = . that of ZI . define the bounded stopping time N by N = Minimum{n either Zn Now. fixing the value of cas c = Sb /(a + fJ)2 yields and so {Wn' n ~ O} is a supermartingale For a fixed positive integer k.5. W n . for n ~ 0. plus the fact that knowledge of WI. Wn-d = Wn_1e. for n ~ 1.bn)} and note that. for which for all n :> 1 Then. we obtain that . S(A) Let Sn equal the number of heads in the first n independent flips of a coin that lands heads with probability p. .A GENERALIZED AZUMA INEQUALITY 321 THEOREM 6. -p ::S.p it follows that {Zn' n ~ O} is a zero-mean martingale that satisfies the conditions of Theorem 6. if we let X. (i) P{Zn 2: :5 fJ for all Proof To begin.2 The Generalized Azuma Inequality Let {Zn' n 2: I} be a martingale with mean Zo = O. n 0 2: Remark Note that Azuma's inequality states that the probability that Zm1m is at least c is such that whereas the generalized Azuma gives the same bound for the larger probability that Zn1n is at least c for any n :=:: m EXAMPLE 6.Zn. If -a :5 Zn . for nc for some n 2: m} ~ P{Zn 2: mcl2 + (c/2)n for some n} :s exp{ -8(mcl2)(cl2) / (a + fJ)2} where the final inequality follows from Proposition 65 1 Part (ii) follows from part (i) upon consideration of the martingale .p) 1=1 Zn . Zn 2: nc 2: mc/2 + nc/2 Hence. = 2: (X.np is a martingale with 0 mean As.Zn_1 n 2: 1 then. note that if there is an n such that n that n.Zn-l S 1. That is. and let us consider the probability that after some specified number of flips the proportion of heads will ever differ from p by more than e.5. equal 1 if the ith flip lands heads and 0 otherwise. consider P{ISnln - pi > e for some n ~ m} n Now. P{Zn:=:: ne for some n :=:: m}:s. then Zn == Sn .52 with a = p. {3 = 1 . exp{ -2me 2} .P Hence. for any positive constant c and positive integer m' nc for some n 2: m} ~ exp{-2mc 2/(a + fJ)2} (ii) P{Zn ~ -nc for some n 2: m} ~ exp{-2mc 2/(a + fJY}. P{Zn 2: 2: m and Zn 2: nc then. starting with a single individual. n Show that ~ I}. P{Snln . Consider the Markov chain which at each transition either goes up 1 with probability p or down 1 with probability q = 1 .1 satisfies P{ISnln .pi ~ 1 for some n ~ loo}:5 2e.322 MARTINGALES or. the probability that after the first 99 flips the proportion of heads ever differs from p by as much as . and let 1To denote the probability that such a process. is a martingale 6. is a martingale when Xn is the size of the nth generation of a branching process whose mean number of offspnng per individual is m 6. Show that {1T'. 6. PROBLEMS 6. For instance. n ~ 1.Z. n ~ 1.p.4. 
let X.p:5 -e for some n ~ m} -< exp{-2me 2 } and thus._I. For a martingale {Zn. i :> 1.(n). where Zo == 0 Var(Zn) = L Var(X.6.2.5. Consider a Markov chain {Xn .) j~1 n 6.2 = . . for 1 :5 k < n. Verify that Xnlmn. P{Snln . n ~ O} is a martingale.2707. Argue that (qlp)So.1. n :> O} with PNN = 0 Let PO) denote the probability that this chain eventually enters state N given that it starts in state i Show that {P(Xn ).3.p:> e for some n ~ m}:5 exp{-2me 2 }. n ~ I} is a martingale show that. If {Zn. n ~ O} is a martingale . Let X(n) denote the size of the nth generation of a branching process. = Z. equivalently. 6. eventually goes extinct. Similarly. Use an appropriate martingale to show that the expected number of bets is AB 6.. n .:: O} a martingale when (a) Zn = Xn + Yn? (b) Zn = XnYn? Are these results true without the independence assumption? In each case either present a proof or give a counterexample. Let Zn = II XI> where XI' i::> 1 are independent random variables with 1=1 n P{X. . Use a martingale argument to compute the expected number of flips until the following sequences appear: (a) HHTTIlHT (b) HTHTHTH 6..:: I} is a martingale when 2: Xl and show I~l n 6. A > 0. is {Zn. Suppose the gambler wiJ) quit playing when his winnings are either A or -B. n ~ 1.PROBLEMS 323 6.:: 1. If {Xn' n ::> O} and {Yn.. Let Sn = that {Zn' n . n ::> I} is said to be a reverse. then Zn = (XI + " + Xn}ln. or backwards.13.U. i .7.11. B > O. Is the martingale stopping theorem applicable? If so.2(C). In Example 6. is a reverse martingale. why not? . .. are independent and identically distributed random vanables with finite expectation. A process {Zn. what would you conclude? If not. 6. find the expected number of stages until one of the players is eliminated.9.8. Consider a gambler who at each gamble is equaUy likely to either win or lose 1 unit. martingale if EIZn I < 00 for an nand Show that if X. n ::> O} are independent martingales. = 2} = P{X. Zn O}. Let Xl. be a sequence of independent and identically distributed random variables with mean 0 and variance (T2. Consider successive flips of a coin having probability p of landing heads.10. 6. = O} = 112 Let N = Min{n. 6.. =1 n (This is called the Hamming distance) Let A be a finite set of such vectors.14. and let Xl .15.Xn be independent random vectors that are all uniformly distributed in the circle of radius 1 centered at the origin Let T == . Let X denote the number of heads in n independent flips of a fair coin. .18. .17. Set D = min p(X.nl2 < -a} :::.. Expand in a power series... 6. (Hint. Show that: (a) P{X .19.nl2 :> a} < exp{ -2a 2In} (b) P{X .1 . 6.y.324 MARTINGALES 6. I ~ a }. O. Suppose that 100 balls are to be randomly distributed among 20 urns Let X denote the number of urns that contain at least five balls Derive an upper bound for P{X ~ I5}. Let p denote the probability that a random selection of 88 people will contain at least three with the same birthday Use Azuma's inequality to obtain an upper bound on p (It can be shown that p = 50) 6. Xn be independent random variables that are each equally likely to be either 0 or 1.y) = L: Ix. Let X denote the number of successes in n independent Bernoulli trials. Give an upper bound for P { IX - ~ P. . For binary n-vectors x and y (meaning that each coordinate of these vectors is either 0 or 1) define the distance between them by p(x. Jn terms of 1-". 6. exp{-2a 2In}. find an upper bound for P{D ~ b} when b > I-" 6. with trial i resulting in a success with probability P.16.20. 
.y) yEA and let I-" = E [D]. Show that the equation has no solution when f3 =f=.) 6. Let Xl. :.ul > b} for b > O. conditional on X. 6. Argue that P{IT E[T]I:> a} -< 2 exp{-a 21(32n)}. (a) Show that {Zn' n 2: 1} is a martingale. .) denote the length of the shortest path connecting these n points.: { aX" + 1 .XnIA} n P{X" . A group of 2n people. (b) Bound P{IX .22. while each man chooses it with probability q" j = 1. n~L 6... X. n :> O} be a Markov process for which Xo is uniform on (0. 0 < a. consisting of n men and n women. Let Zn denote the fraction of balls in the urn that are white after the nth replication.XnIB} Show that if B is true. An urn initially contains one white and one black ball At each stage a ball is drawn and is then replaced in the urn along with another ball of the same color.a aX" with probability Xn with probability 1 Xn where 0 < a < 1. (a) Find.u = E [X]. . are to be independently distributed among m rooms Each woman chooses room j with probability P. Z" exists with probability 1 (c) If b -=1= a. Consider a sequence of independent tosses of a coin and let P{head} be the probability of a head on any toss. X n +. what is limn_to Zn? . (b) Show that the probability that the fraction of white balls in the urn is ever as large as 3/4 is at most 2/3. 6. Let {X". .. m Let X denote the number of rooms that will contain exactly one man and one woman.21. 6.. Discuss the limiting properties of the sequence X n .PROBLEMS 325 T(X1 .. 1) and. and (b) limn_". .23. b < 1 Let XI denote the outcome of the jth toss and let Z =P{X" . Let A be the hypothesis that P{head} = a and let B be the hypothesis that P{head} = b... ••• . . then: (a) Zn is a martingale.24. q. n > 1 What can we say about Zn for n large? 6.25. Let {X(t). N. Use this sequence to construct a zero mean martingale Zn such that limn_a> Zn = . = 2' . (a) E [number of times the chain leaves state j] = vIm" where 11 VI is the mean time the chain spends in j during a visit (b) E[number of times it enters state j] :::: (c) Argue that vlm l = 2: m. t 2: O} be a nonhomogeneous Poisson process with intensity function A(t).q'J! dl j -=f=. 0 vomo = 1 + 2: m.29. 0 <: U :s.o '*0 . Show that for j -=f=. t ~ O} being a continuous-time martingale Do Problems 6 28-6 30 under the assumptions that (a) the continuous-time analogue of the martingale stopping theorem is valid. Let T denote the time at which the nth event occurs Show that 6.26.00 with probability 1 (Hint· Make use of the BorelCantelli lemma. denote the expected time the chain is in state i. P{X.112'. for all s < t. j.q'I' d-I 2: m. in finite expected time. be a sequence of random variables such that ZI ~ 1 and given Zl. t 2: O} be a continuous-time Markov chain that will. Zn is a Poisson random variable with mean Zn _I. . n :> 1.28. Give the conditions on the q'l that result in {X(t). .27. Let {X(t).326 MARTINGALES 6. i ~ 1.) A continuous-time process {X(t). Let XI • X 2 • be independent and such that P{X. E[X(t)IX(u). t 2: O} be a continuous-time Markov chain with infinitesimal transition rates q'I' i -=f=. s] :::: X(s) 6. Let Zn. t ~ O} is said to be a martingale if E[IX(t)l] < 00 for all t and.I} = 112'. t 2: O. enter an absorbing state N Suppose that X(O) = 0 and let m. Let {N(t). = -I} = 1 .j -=f=. and (b) any needed regularity conditions are satisfied 6. O. Zn-I. Probability with Martingales. Let {X(l)." Annals of Probability. John Wiley. No 6 (1980). Probability via Expectation.REFERENCES 327 I :> 6. pp 1171-1175 S M Ross. we believe. 
Define a continuous-time martingale related to this process REFERENCES Martingales were developed by Doob and his text (Reference 2) remains a standard reference References 6 and 7 give nice surveys of martingales at a slightly more advanced level than that of the present text Example 6 2(A) is taken from Reference 3 and Example 62(C) from Reference 5 The version of the Azuma inequality we have presented is. 8. O} be a compound Poisson process with Poisson rate A and component distribution F. Williams.X. Spnnger-Verlag. 3rd ed . Cambridge University Press. The Probabilistic Method.-II :$ a" we have allowed for a nonsymmetric bound The treatment of the generalized Azuma inequality presented in Section 65 is taken from Reference 4 Additional material on Azuma's inequality can be found in Reference I N Alon. and P Erdos. "Tower Problems and Martingales. 1992 D. "A Martingale Approach to the Study of the Occurrence of Pattern in Repeated Experiments. Stochastic Processes. England. New York.30. New York. No I (1994). 1992 J Doob. Berlin. No 3 (1995). 1991 2 3 4 5 6 7. John Wiley. more general than has previously appeared in the literature Whereas the usual assumption on the increments is the symmetric condition Ix . 9. pp 52-59 P Whittle." The Mathematical Scientist. "Generalizing Blackwell's Extension of the Azuma Inequality. 19. . pp 493-496 0 Stirzaker. 1953 S Y R Li. J Spencer." Probability in the Engineering and Informational Science. Cambridge. X 2 . for instance.CHAPTER 7 Random Walks INTRODUCTION Let Xl.4 we apply the results of the preceding section to GIG/l queues and to certain ruin problems Random walks can also be thought of as generalizations of renewal processes. n . digress in Section 72 to a discussion of exchangeability. Let So = 0. Random walks are quite useful for modeling various phenomena For instance.11 < 00. random walks are also useful in the analysis of queueing and ruin systems. however. ••• be independent and identically distributed (iid) with E[Ix.P{x. In Section 7. Sn = ~. we have previously encountered the simple random walk-P{X.:::: O} is called a random walk process. One of the examples in this section deals with the GIG/l queueing system. As we will see. many people believe that the successive prices of a given company listed on the stock market can be modeled as a random walk. We may also use random walks to model more general gambling situations.1 we present a duality principle that is quite useful in obtaining vanous probabilities concerning random walks. Xn n ~ 1 The process {Sn. and in analyzing it we are led to the consideration of the probability that a random walk whose mean step is negative will ever exceed a given constant. For if XI is constrained to be a nonnegative random variable. which is the condition that justifies the duality principle. = -l}-in which Sn can be interpreted as the winnings after the nth bet of a gambler who either wins or loses 1 unit on each bet.5 we present a generalization of Blackwell's theorem when the XI 328 . using martingales we show how to approximate the probability that a random walk with a negative drift ever exceeds a fixed positive value. = l} = P = 1 . We present De Finetti's theorem which provides a characterization of an infinite sequence of exchangeable Bernoulli random variables In Section 7. Before dealing with this probability we. then Sn could be interpreted as the time of the nth event of a renewal process In Section 7. 
In Section 7.3 we return to our discussion of random walks and show how martingales can be effectively utilized For instance. ncO .1 Suppose Xl.h ••• . PROPOSl"rJON 7. XI)' The validity of the duality principle is immediate since the X" i > 1. I}. then the random walk will become positive in a finite expected number of steps. = 2: P{Xl ~O. . 1 Let DUALITY IN RANDOM WALKS Sn = 2: X" n n~1 ~ denote a random walk. We shall now illustrate its use in a series of propositions.Xn+ .. is quite useful. ncO . Duality Principle (XI.Xl + X 2 ~O. •• . =2:P{Xn~O.. are independent and identically distributed. .DUALITY IN RANDOM WALKS 329 are not required to be nonnegative and indicate a proof based on results in renewal theory. n there is a duality principle that.11 states that if £(X) > 0. + Xn > O}. 7. though obvious. E[X] > 0 If are independent and identically distnbuted random variables with N = min{n Xl + . In computing probabilities concerning {Sn.. Proposition 7.Xl + .Xn+Xn-l~O. X 2 . then E[N] < 00 Proof E[N] = 2: P{N > n} n~O .1. Xn) has the same joint distributions as (Xn' X n. +Xl~O}. .. X 2 . SIt) PROPOSITION 7. a renewal takes place each time the random walk hits a low (A little thought should convince us that the times between successive renewals are indeed independent and identically distributed) Hence. (711 ) RANDOM WALKS ErN] 2: P{Sn:S: SII-1.1. by the following Definition Rn is the number of distinct values of (So... and so the number of renewals that occurs will be finite (with probability 1) But the number of renewals that occurs is either infinite with probability 1 if F( 00 )-the probability that the time between successive renewals is finite-equals 1. that is.2 n_«J lim E[Rn] n = P{random walk never returns to O} . from Equation (711). ErN] = 2: P{renewal occurs at time n} n-O . SI!:S O} Now let us say that a renewal takes place at time n if SIt :s: Sn-1o Sn :S Sn-2. S. 00 E[N] < 00 Our next proposition deals with the expected rate at which a random walk assumes new values. :s: 0. since E[X] > 0. that SII _ 00.Sn). Sn:S: Sn-2. . . Let us define R". or has a geometric distribution with finite mean if F(oo) < 1 Hence it follows that E[number of renewals that occurs] < and so. SI. . called the range of (So.. . = 1 + E[number of renewals that occur] Now by the strong law of large numbers it follows.. n~O .330 where the last equality follows from duality Therefore. . OJ. where the last equality follows from duality Hence.Xl + . and so." . oo} = P{no return to OJ. XI + X 2 ¥.Sk-lt Sk ¥. then .Sk ¥. as k _ P{T> k} _ P{T 00.1/n - P{no return to O} EXAMPLE 7. X k + Xk-j + . + Xl ¥.1(A) The Simple Random Walk. k=O • where T is the time of the first return to 0 Now.0.0.Sk-2. X k + X k . from (7 1 2) we see that E[R.O} . I} Now when p = ! (the symmetric simple random walk).P{X.0. k=l • .1 = 1 + 2: P{SI ¥. Sz ¥. Sk ¥.O} k~l " = 1 + 2: P{Xk ¥.0.O} = 2: P{T> k}.Sk-h otherwise. the random walk is recurrent and thus P{no return to O} = 0 when p = ~ . + X k ¥.So and SO E[R. k~l . In the simple random walk P{Xc = I} = p = 1 . (712) E[R.1 = 1 + 2: P{h = k:l • " 1} = 1 + 2: P{Sk ¥.l ¥.0.0.Sk-h Sk ¥.DUALITY IN RANDOM WALKS 331 Proof Letting if Sk ¥.Sk ¥. • = 1 + 2: P{XI k~l ¥. by conditioning on X 2 . Also. E[R"ln] ~ 2(1 .ln] ~ 0 when p = ~ When p > L let a = P{return to 0IXl 0IX1 = -I} = 1 (why?). . = a 2p +1- p. a or.p) . RAN DOM WALKS E[R.1 + p) = 0 Since a < 1 from transience. equivalently.1.p)lp.l)(ap .1 Similarly. we see that a = (1 . we have P{return to o} = ap = I}.332 Hence. 
and so E[R"ln] ~ 2p . (a .p. whenp>~. Since P{return to + 1 .1 when p ~ ~ Our next application of the utility of duality deals with the symmetric random walk PROPOSITION 7.3 In the symmetric simple random walk the expected number of visits to state k before returning to the origin is equal to 1 for all k oF 0 Proof For k > 0 let Y denote the number of visits to state k prior to the first return to the origin Then Y can be expressed as where if a visit to state k occurs at time n and there is no return to the origin before n otherwise. letting Un (7 13) Y II - Xn+ (.Xn + .Xl >O..DUALITY IN RANDOM WALKS 333 or.. I = n {I irSn > 0. equivalently. . . Sn~l >0. Sn k} + Xl > 0.Sl > 0. Sn = k 0 otherwise Thus.. = LP{XI + n=1 + X" >0. +X1>O. S~~2 > 0. E[Y] = n=l L P{Sn > O. .._. Xn I +. it follows (see Figure 7 1 1) that D +1 " Dn + Y n . and let Dn denote the delay (or wait) in queue of the nth customer Since customer n spends a time D" + Yn in the system and customer n + 1 arrives a time X/HI after customer n.>O. SI! = k} = = '" L P{symmetnc random walk hits k for the first time at time n} n-I P{symmetric random walk ever hits k} (by recurrence). E[Y] = L P{Sn > 0. Y2 .. times be Y I .X2 + where the last equality follows from duality Hence.-I. . .S" > SI' '" . X 2 .X.SI > 0.==k}.>O. The reader should compare this proof with the one outlined in Problem 446 of Chapter 4 . n ::::= 1.X. equivalently. . {0 = or.S" > S. +X. and let the service distribution is G Let the interarrival times be Xl.XI + + XI = k} +X.-j = if D" + Yn ::> X n+1 if Dn + Y" < Xnd .~l = n=l '" L P{Xn + . =1 and the proof is complete * Our final application of duality is to the GIGI1 queueing model This is a single-server model that assumes customers arrive in accordance with a renewal process having an arbitrary interarrival distribution F. Sn-l > 0. . and the service .. Un. where the last step uses the fact that Dl .2} max{O.x-------. for e > 0. D n.Un + Un.. where the last equality follows from duality Hence we have the following PROPOSITION 7.3) yields Dn-'-I = max{O.1 + .4 If Dn is the delay in queue of the nth customer in a GIG/1 queue with interarrival times X" i ~ 1.---. Un. then (71 4) P{D n'l ~ c} P{the random walk S" j 2: 1.+I' O}.1. Un + .1 .. Dn+l = max{Dn + Y. Un + V n. UI + V h • .I + D n. Un. Vn + V n. + Un):> e}. Un + Un-I> .---y.X. Un.---y.334 RAN DOM WALKS ----D.2 + Dn..2 }} max{O. crosses c by times n}. V n-2 + D n.1..---- -x-------.. ..I + Vn-I}} = = + V n. Vn + max{O. =0 Hence..I + max{O. Dn + Vn} = max{O. ..1.I} max{O. Un = max{O..x---n arrives n enters service n departs X· n +1 arrives and enters service Figure 7.---- arrives -x----------x-------x-------x-n n n+1 n enters arrives departs servfce or ----------------X"+l-----------------D. and service times Y" i ~ 1. + Ul }. + UI ):> e} P{max(O. Un.--. . . VI. P{D"-'-I :> e} = P{max(O.I + V n. Iterating the relation (7 1. Un + Un-I... VI + . Vn + V n. Let M" = max(O.1. converge to infinity.. and so P{D"... >- ~ c} is nondecreasing in c} = lim P{D" ..X2 + . > c} in this case we need to compute the probability that a random walk whose mean change is negative will ever exceed a constant However. If E[ U] = E[Y] .5. = 2: (YI . + X") . and thus it is only when E[Y] < E[X] that a limiting delay distribution exists. by the strong law of large numbers.5 Spitzer's Identity E[M. = I(S" > O)(XI + max(X2 . before attacking this problem we present a result.. .-'-I n. ever crosses c}. 
Spitzer's identity is concerned with the expected value of the maximum of the random walk up to a specified time. that will enable us to explicitly compute E[D. .4 that P{D.1 PROPOSITION 7.] in certain special cases. ISIs" .15) P{D". known as Spitzer's identity. 51. ~ c} = P{the random walk 5" j ~ 1.. let I(A) equal 1 if A occurs and 0 otherwise We will use the representation M" = I(S" > O)M" + I(S" :s O)M" Now.DUALITY IN RANDOM WALKS 335 where S. I(S" > O)M" = I(S" > 0) max S.--+'" >- c}. n >.] " 1 Proof For any event A. we have from (714) (7.] = ~ k E[S. To compute P{D".. then the random walk will. ~ c} = 1 for all c if E[Y] > E[X] The above will also be true when E[Y] = E[X]. Letting P{D".).E[X] is positive.XIII) 1=1 / We also note from Proposition 71. 8) Thus.7). we see that . implying that (71.] n =St . O)Mn_1 which. from (716). (71. Xn and X". E[S:] n 0 implies that Mn = Mn. yields that E[M.] Reusing the preceding equation. X 2 . we obtain E[M.. (717) Also. and (71 8) we have that E[/(Sn > O)Mn] = E[/(Sn > O)Mn-d In addition.RANDOM WALKS Hence. this time with n . But. • .1 substituting for n. upon continual repetition of this argument. gives that E[Mn] =! E[S:] + ~1 E[S:-d + E[Mn-2 ] n n- and. XI> X 2 • .!. since XI.. k E[St] + E[M.1 it follows that I(Sn S O)Mn = I(Sn ::. X n.] = which proves the result since MI 1 &. combined with the preceding. since Sn S +. since X" Sn has the same joint distribution for all i.I have the same joint distnbution. 1(8) Consider a single server queueing model where arrivals occur according to a renewal process with interarrival distribution G(s. To compute ExAMPLE first condition on L~~l X . A) and the service times have distribution G(r.L). where G(a.+1 • Then. ~ c} that Hence. an explicit formula for £[Dn+d can be obtained in certain special cases. b) is the gamma distribution with parameters a and b (and thus has mean a/b) We will use Spitzer's identity to calculate a formula for £[Dn+d in the case where at least one of s or r is integral To begin. 51. is distributed as the sum of kr independent exponentials with rate J. 7. P{D n +1 ~ c} which implies. use the fact that L~~l Y. that Hence. J.L that occur by time L~_l Xd1 . upon conditioning on the number of events of a Poisson process with rate J. from Spitzer's identity we see that Using the preceding.L to obtain. letting we have that . upon integrating from 0 to = P{Mn 00.DUAliTY IN RANDOM WALKS 337 It follows from Proposition 7 ] 4 that. suppose that r is an integer. with Mn = max(O. EXAMPLE 7. ..L A 7.2(A) Suppose balls are randomly selected.A J.) +] We can now use the preceding analysis to obtain the following E[D n + l ] = n(rlJ.+ J.. k~1 k .+ J. .338 RANDOM WALKS Since W k is gamma with parameters ks and A a simple computation yields that where a' = e-xxodx for non integral a. n).Xnis exchangeable if X 1 . .L A J.L where ( + a b) = (a + b)!/(a!b '). Xn will be exchangeable but not independent . we use the identity Sf obtain that E[St] = k(E[Y] . (-Sk)+ to When s is an integer. 2.. from an urn originally consisting of n balls of which k are white If we let x I = {I if the ith selection is white otherwise.=0 i-I) (. . . . . 0 then XI. .2 SOME REMARKS CONCERNING EXCHANGEABLE RANDOM VARIABLES It is not necessary to assume that the random variables XI .L A J..s/A) b-) 1 ks +2:-2:-i (kr +i k A . Xn are independent and identically distnbuted to obtain the duality relationship.=0 J.n has the same joint distribution for all permutations (i1> .(kS + E[Dn+d=2:-2:..+I . 
.L 1 a i i-I) ( + )b ( + = Sk + A J.L n k=1 .E[XD + E [(~ X. .~ y.. in) of (1.X.L )' . without replacement. Hence.L )kr( A )' . we have that when r is integral f: n 1 kr-I kr . A weaker general condition is that the random variables are exchangeable. where we say that XI. . however. = E[!(X2 )g(XI )]. X2 implying that But as exchangeability implies that E[!(X1)g(XI )] E[!(X1)g(X2 )] E[!(X2)g(X2)]. P{XI S XI.XII S XII} (Xl. Xn is exchangeable EXAMPLE 7.(x." .. X 2 .) II dG(A)..2(8) Let A denote a random variable having distribution G and suppose that conditional on the event that A = A. .SOME REMARKS CONCERNING EXCHANGEABLE RANDOM VARIABLES 339 As an illustration of the use of exchangeability. we see upon expanding the above inequality that Specializing to the case where XI and X 2 are independent. then E[f(X)g(X)] 2 E[f(X)] E[g(X)] The infinite seq uence of random variables XI . we have the following PROPOSITION 7. is said to be exchangeable if every finite subsequence XI. Then for all XI. are independent and identically distributed with distribution F.) . ..2.Xn S xnlA = A} = II F..(x. P{X1S Xl. xn)' They are. .1 If f and g are both increasing functions.=1 which is symmetric in pendent.=1 n The random variables Xl' X 2 . Xl.-that is. X2. not inde- . are exchangeable since f II F. suppose Xl and X 2 are exchangeable and let !(x) and g(x) be increasing functions. ". and the theorem should follow upon letting m _ 00 Indeed it can be shown by a result called Helly's theorem that for some subsequence m' converging to 00.)J[m(1 . which states that every infinite sequence of exchangeable random variables is of the form specified by Example 7 2(B). This last equation follows since. = 1.k + 1)[m(1 . Xl> on [0. there corresponds a probability distribution G XI. roughly equal to E[Y~(1 . the distribution of Ym ' will converge to a distribution G and (72 3) will converge to .. Bernoulli) random variables. for large m.j)(m .Y.1)' . (m . X ••• = 1) = X.. - (mY. . = = X.y.I) m(m . by exchangeability each subset of size .2.1) (m . = O} = E [(my.j . 1] such that. (j . given Sm == j.Xk+I==' =Xn=O} == I~ Ak(1 - A)n-k dG(A) Proof Let m ~ n We start by computing the above probability by first conditioning on This yields (722) m P{XI = I = Xn = O} = 2: P{X )=0 == ••• == X k == 1.(n .k = Xn == Olsm == j} P{Sm == j} == 2: j(j ) + l)(m .n + 1) m J of Xl.340 RANDOM WALKS There is a famous result known as De Finetti's theorem.) .ym)"-k].n + 1) The above is.1) . Xk+l = 1)· .k) + 1) P{S = j} m(m . THEOREM 7. (722) P{X1 ==X2 =· ==Xk==l.)(mY. (m .2 (De Finetti's Theorem) To every infinite sequence of exchangeable random variables taking values either 0 or 1. then (722) may be written as (723) P{X. We will present a proof when the X j are 0 .. for all 0 ~ k :$ n. Xm is equally likely to be the one consisting of all 1's If we let Ym == Smim.j .1 (that is... -(M 1). X 2 O} = P{X1 = 0. A ~ {-M. j + M} and let N denote the first time that the process is in either A or A J By Theorem 6. n 2: I} is a martingale Let A denote the set of states from -M up to I-that is. or -00 (if E[X] < 0) So suppose E[X] = 0 and note that this implies that is".1) 7. which cannot be put in the form (7. if n 2. X 2 = I} = t. .. P{process ever enters A} 2: P{SN E A} 2: j + M' j-i . Our first result is to show that if the Xi are finite integer-valued random variables.1 Suppose XI can only take on one of the values 0. j + 1. let AI denote the set of states A J = {j. n 2: O} is a recurrent Markov chain if. 
E[X] = 0 < 00 Then Proof It is clear that the random walk is transient when E[X] oF 0. where i 2: 0 For j > i. :t M for some M is".:::: 1 denote a random walk.2. .8 Let USING MARTINGALES TO ANALYZE RANDOM WALB:S " 2: Xl' 1=1 S" = n.2. .2.2(A). since it will either converge to +00 (if E[X] > 0).USING MARTINGALES TO ANALYZE RANDOM WALKS 341 Remark De Finetti's theorem is not valid for a finite sequence of exchangeable random variables For instance.3. . and only if. then P{X1 1. :t I.I} Suppose the process starts in state i. then SrI is recurrent if £[X] = 0 THEOREM 7. and so i = E[SNISN E A]P{SN E A} + E[SNISN E AJ]P{SN E A 2:-MP{SNEA}+j(1 P{SNEA}) J} or Hence. k 1 in Example 7. for all i It is easy to see that the above implies that the finite set of states A U B will be visited infinitely often However.. To start. P{process ever enters A UBI start at i} = 1.2 can be shown to be satisfied.3. 2. B > 0. 0 P{process ever enters Blstart at i} Therefore. M} By the same argument we can show that for i 1.. we have Therefore. if the process is transient.1) ..L = E[X] :.. the probability that Sn reaches a value at least A before it reaches a value less than or equal to B.o!: O. then any finite set of states is only visited finitely often Hence. i . the process is recurrent. Once again. .2. is O.?: 0 S = { 1. let denote a random walk and suppose J. it follows that {Zn} is a martingale with mean 1 Define the stopping time N by N min{n: Sn . let 8 :.342 and letting j _ 00. RANDOM WALKS we see that P{process ever enters A Istart at i} Now let B = 1. For given A.?: A or Sn < -B} Since condition (iii) of Theorem 6.o!: 0 be such that We shall suppose that such a 8 exists (and is usually unique) Since is the product of independent unit mean random variables. we shall attempt to compute PA . (7. That is. ( we see that E '] = APA [N - B(1 .p .3 2) for P A . If we neglect the excess (or overshoot past A or .IIB IIA -e -liB' We can also approximate E[N] by using Wald's equation and then neglecting the excess. we have the following approximations: E[eIlSNISN:> A] = ellA. with probability p with probability q = Suppose x = { -1 1 I 1. or (73.e.PA ) E[X]' Using the approximation (7. we obtain (733) EXAMPLE 7. E[eIlSNISN <: -B] = e. Using the approximation E[SNISN~A] =A.1).3(A) The Gambier's Ruin Problem.B).USING MARTINGALES TO ANALYZE RANDOM WALKS 343 We can use Equation (7 3 1) to obtain an approximation for PA as follows. E[SNISN $ -B] = -B. we have and since E[SN] = E[N]E[X]. from (73.2) P A =e 1 .IIB Hence. ((q/p)A . we have 1 and.(q/ptB) . and so the approximations (7. If we assume that A and B are integers.B} + E[eHNISN ~ -B]P{process crosses -B before A}.3 3) are exact.OA . * We will attempt to compute this by using the results so far obtained and then letting B ~ 00 Equation (73.3 2) and (7.4 7.5) ~ ~ eOAP{process crosses A before -B}. 9 ~ 0 was defined to be such that E[e OX ] = 1.B((q/p)A -1). there is no overshoot.----::-- and E[N] = A(1 .(q/ptB)(2p .4.1 ) where Sn = P{D". PA 1(q/p)B = (q/p)A (q/ptB = (q/p)AtR. from (7. upon letting B (73.11 .4. 00. LV" 1=1 n • By crosses A we mean "either hits or exceeds A " .1) Suppose now that E[X] < 0 and we are interested in the probability that the random walk ever crosses A. Therefore. 7.34) 1 = E [eHNI SN~ A] P{process crosses A before ..9) that 9> O. we obtain P{random walk ever crosses A} :S e. Since E[X] < 0.(q/ptR ---'-'---=----<.344 RANDOM WALKS eO We leave it as an exercise to show that E[(q/p)X] = 1 and so = q/p.34). Hence. 
~ A} = P{Sn ~ A for some n}.1) states (7. it can be shown (see Problem 7. the limiting distnbution of the delay in queue of a customer is by (7 1 5) given by (7.1 ApPLICATIONS TO GIG/l QUEUES AND RUIN PROBLEMS The GIGII Queue For the GIG/1 queue. is the service time of the ith customer and X.X.+l) given that L (Y. Recall that for N defined as the time it takes for SIT to either cross A or .5) (74.c > A is just the distribution of A plus an exponential with rate p.4) X. Now.B.. This is the conditional distribution of (74. we see E[e 9SN ISN> A] = E[e 9(A+Y)] efiA f eflYp. . SIT = ~~=1 (Yi .. note that the conditional distribution given by (7.4 4) is just the conditional distribution of Y" . Hence. Since this is true for all nand c.e-/LYdy .c given that Y" .3. it follows that the conditional distribution of Y given that Y > c + A is just c + A plus an exponential with rate p.c > A But by the lack of memory of the exponential. when E[U] E[Y] .3) 1 = E[e 8\'NISN>A]P{S"crossesA before -B} + E[e 9SN ISN S -B]P{S"crosses -BbeforeA}.+l) and let us consider the conditional distribution of SN given that SN .E[X] < 0.::: A.. the conditional distribution of Y" . we showed in Equation (7 34) that (7. we have from (7.-I X 1 +1 ) (say it equals c). is exponential with rate p."'I N XJ+I) > A Conditioning on the value of N (say N . . letting () > be such that ° E[e I1U ] = E[e l1 (Y-X)] = 1.4.+I the interarrival time between the ith and (i + 1)st arrival Hence.2) There is one situation in which we can obtain the exact distribution of D"" and that is when the service distribution is exponential So suppose Y.-1 = n) and on the value of X"+l - L (Y.APPLICATIONS TO GIGII QUEUES AND RUIN PRORLEMS 345 and where Y.c given that Y" . O} =-.346 RANDOM WALKS Hence.L 8 ellA P{S" crosses A before B} + E[e 9SN ISN< -B]P{S" crosses -BbeforeA}. ~ A} A >0 Summing up. 1 = ~ ellAp{S" ever crosses A}. Since 8 > 0. f. I~ exponential with rate f. we obtain. by letting B -+ 00. J. when E[Y] < E[X].L.L o where in rhi-r ca-re 0 i~ -ruch that .4 1) P{D". we have shown the following THEOREM 7. X 2. P{D". 1 = J. then A >0. If Y. time~ Y" i ~ 1.L-8 and thus from (7.1 For the GIGl1 queue with lid -rervice . from (743).4. and lid interarrival time~ Xl> where 0 > 0 i-r mch that In addition. we want p P {~ Y. will eventually be wiped out.=1 n Thus the probability we want. where SIr = 2: (y. > et + A for some t O}. suppose that the company receives money at a constant rate of e per unit time. equivalently. Suppose that the values of the successive claims are iid and are independent of the renewal process of when they occurred Let Y. . . starting with an initial capital of A.4.. e > o We are interested in determining the probability that the insurance company.2 A Ruin Problem Suppose that claims are made to an insurance company in accordance with a renewal process with interarrival times XI."I Now it is clear that the company will eventually be wiped out with probability 1 if E[Y] . then the total value of claims made to the insurance company by time t is L~(I) YI" On the other hand.. call it p(A). > A for some n}..1 n 2: Y. < 0 for some n }. .) .::= eE[X] (Why is that?) So we'll assume E[Y] < eE[X]. that event will occur when a claim occurs (since it is only when claims occur that the insurance company's assets decrease) Now at the moment after the nth claim occurs the company's fortune is A + e 2: x. or.APPLICATIONS TO GIGII QUEUES AND RUIN PROBLEMS 347 7. is P (A) = P {A + e ~ X. . eX. - ~ Y.. 
denote the value of the ith claim Thus if N(t) denotes the number of claims by time t. :> . That is.=1 . It is also fairly obvious that if the company is to be wiped out. . peA) = P{S. Xl. then where 0 is such that (iii) If the arrival process of claims is a Poisson process with rate A.348 RANDOM WALKS is a random walk. identical to the unconditional distribution of the state at t) Hence the (limiting) probability that an arrival will find the system nonempty is equal to the limiting probability that the system is nonempty and that. Thus from Theorem 7. the limiting distribution of what an amval sees is identical to the limiting distnbution of the system state at time t (This is so since the distnbution of the system state at time t. call it p(A). cX will be exponential with rate Aic. is equal to the arrival rate times the mean service time (see Example 4 3(A) of Chapter 4) . From (7 4 1).1 we have the following.. due to the Poisson arrival assumption. . as we have shown by many different ways. > A} where D". is the limiting delay in queue of a GIG/l queue with interarrival times eX. thus from (745) p(O) will equal the probability that the limiting customer delay in an MIG/1 queue is positive But this is just the probability that an arriving customer in an MIG/1 finds the system nonempty Since it is a system with Poisson arnvals. we see that (745) p(A) = P{D".)] =1 (ii) If the claim values are exponentially distributed with rate /J. is such that where e is such that E[exp{O(Y. then p(O) = AE[Y]/c Proof Parts (i) and (ii) follow immediately from Theorem 7 4 1 In part (iii).4.. given that a customer has arrived at t is. and service times Y. THEOREM 7.cX.4.2 (i) The probability of the insurance company ever being wiped out. 5 BLACKWELL'S THEOREM ON THE LINE 00. Let Vet) = L I". . equivalently.. That is. Sn-l).u(t) .. the initial one occurs the first time the random walk becomes positive If a ladder vanable of height S" occurs at time n.L > 0 and the X. For instance. are not lattice.l)st and ith ladder variable.SN. were nonnegative. n ~ I} denote a random walk for which 0 < J. BLACKWELL'S THEOREM If J.a/ J. denotes the time between the (i . Let {S".L = E[X] < Vet) denote the number of n for which Sn :5 t That is.. Let u(t) = E [V(t)]. where So == o. at the first n + j for which X" + I + .J.L as t - 00 for a > 0 Before presenting a proof of the above. then the random vectors (N" 'SN.+! > O. i ~ 1. an ascending ladder variable occurs whenever the random walk reaches a new high. then the next ladder variable will occur at the first value of n + j for which or. we will find it useful to introduce the concept of ascending and descending ladder variables We say that an ascending ladder variable of ladder height S" occurs at time n if S" > max(So. That is. then Vet) would just be N(t). are independent and identically distributed (where SNo == 0) .. are independent and identically distributed. n=l '" where I" = {I if S" :5 t otherwise 0 If the X.. Since the X. it thus follows that the changes in the random walk between ladder variables are all probabilistic replicas of each other. the number of renewals by time t. In this section we will prove the analog of Blackwell's theorem. Sl.BLACKWELL'S THEOREM ON THE LINE 349 7. then u(t + a) . . + X. if N. . ] is less than 1 Since ErX] > 0. Therefore. converges to a limiting distnbution (namely. Hence there will be exactly n such ascending [descending] ladder variables. with probability 1. Sn ~ 00 as n ~ 00. 
which will enable us to identify this constant as 11 J. the equilibnum interarrival distribution) Hence. with probability 1 there will be an infinite number of ascending ladder variables but only a finite number of descending ones Thus. it follows by the strong law of large numbers that. p = 1 and P. PROOF OF BLACKWELL'S THEOREM The successive ascending ladder heights constitute a renewal process Let Y(t) denote the excess at 1 of this renewal process That is. = P{Sn < 0 for some n}. for some function g. Now at each ascending (descending) ladder variable there will again be the same probability p(P. and so. a) given that the first positive value is y Hence.L. the distribution of U(t + a) . u(t + a) .) denote the probability of ever achieving an ascending (descending) ladder variable That is.) of ever achieving another one. with probability pn(1 . taking expectations.u(t)] . P.)].P. and finally we will prove the generalization of the elementary renewal theorem.u(t) approaches a limit as t ~ 00 Then we will show that this limiting value is equal to a constant times a. 1 + Y(/) is the first value of the random walk that exceeds 1 Now it is easy to see that given the value of Y(t). say Y(/) = y.U(/) = E[g(Y(t»J Now Y(/). then the number of points in (/. and so.1 + a) has the same distribution as the number of points in (0.-'" . n 2!:: 0.350 RANDOM WALKS Similarly we can define the concept of descending ladder variables by saying that they occur when the random walk hits a new low Let p(P. P = P{Sn > 0 for some n}. the number of ascending [descending] ladder variables will be finite and have a finite mean if. if we know that the first point of the random walk that exceeds t occurs at a distance y past I. p[P. E[U(t + a) . and only if. being the excess at 1 of a nonlattice renewal process.U(t)IY(t)] = g(Y(t». E[g(Y(t»] will converge to E[g(y(oo»J where y(oo) has the limiting distribution of Y(l) Thus we have shown the existence of lim [u(t + a) .U(/) does not depend on t That is.p) [p*(1 . < 1 We are now ready to prove Blackwell's theorem The proof will be in parts First we will argue that u(t + a) . ]IL 5. say by M. t] after having gone past t Since the random vanable N.BLACKWELL'S THEOREM ON THE LINE 351 Now let h(a) = lim [u(t + a) 1-0) u(t)] Then h(a + b) = lim [u(t + a + b) . 5.2) ---E[N. denote the first n for which Sn > t If the XI are bounded. for which the random walk is less than SN.] t IL If the XI are not bounded. which implies that for some constant c (751) h(a) = lim [u(t + a) 1-'" u(t)] = ca To identify the value of c let N. and so (75.t+M I~I Taking expectations and using Wald's equation (E[N.u(t)] 1-'" = lim[u(t + b + a) 1-'" 1-'" u(t + b)] + lim [u(t + b) .* will be no greater than the number of points occurring after time N.00. it follows that I (754) E[N1]:$ E[number of n for which Sn < 0] . then we can use a truncation argument (exactly as in the proof of the elementary renewal theorem) to establish (75 2) Now U(t) can be expressed as (753) where Nt is the number of times Sn lands in (.] < to that used in Proposition 71 1) yields t < E[N. 00 by an argument similar t + M.u(t + b) + u(t + b) .u(t)] = h(a) + h(b). then NI t<2:X. 1=1 ~ u(i + 1) . the expected additional time until the random walk again becomes positive is finite At that point there will again be a positive probability 1 . 
which.352 RAN DOM WALKS We will now argue that the right-hand side of the above equation is finite.5 5) --u(t) t 1 /L The argument that the right-hand side of (7 5 4) is finite runs as follows' we note from Proposition 71 1 that E[N] < 00 when N is the first value of n for which Sn > 0 At time N there is a positive probability 1 . and so from (7.p* that no future value of the random walk will ever fall below SN If a future value does fall below SN. and so on We can use this as the basis of a proof showing that E [number of n for which Sn < 0] :5 E[NIXI <0] 1 < -p* 00 Thus we have shown (755) We will now complete our proof by appealing to (75 1) and (7.p* that no future value will fall below the present positive one. or. even though the distribution of Y(t) converges to that of Y( (0). Y n units of water flow into the dam from outside sources such as rainfall and river flow At the end of each day . during day n. it does not necessarily follow that E[g(Y(t))] will converge to E[g(Y(oo))]. implying that L. Namely. u(n + 1) n u(1) -c. ---'----'----'-'.1. Suppose that. from (755) implies that c = lI/L' and the proof is complete Remark The proof given lacks rigor in one place..u(i) n c as n _ 00.52) and (753) (7.55) From (751) we have u(i + 1) . We should have proven this convergence directly PROBLEMS 7.uU) - c as i _ 00. equivalently. Consider the following model for the flow of water in and out of a dam. then again by an argument similar to that used in Proposition 7 1 1. show that {Sn.. < X(n) are the X. is an infinite seq uence of exchangeable random vanables.8.4. n ~ 0 denote a random walk for which . Let Xl. ~1. with E[X1] < 00.3. X 2 . . Let Xl. for instance. n ~ 1} is a random walk with reflecting barriers at 0 and C .2. are independent and identically distributed. 7. Conclude from this that the probability of winning is 114 for all strategies 7. only assumes the values 0. . C).. X 2 . n ~ 1. If it is less than or equal to a. ." and if the next card exposed is a spade then you win and if not then you lose For any strategy. Let Sn denote the amount of water in the dam immediately after the water has been released at the end of day n. and once at capacity any additional water that attempts to enter the dam is assumed lost Thus.. then the level at the end of the day (before any water is released) is min(x + Y n . if the water level at the beginning of day n is x.. . in ordered arrangement. X.6.X(n)]. An ordinary deck of cards is randomly shuffled and then the cards are exposed one at a time At some time before all the cards have been exposed you must say "next. Xn be exchangeable. show that at the moment you call "next" the conditional probability that you win is eq ual to the conditional probability that the last card is a spade. The capacity of the dam is C. If Xl.a 7.) ) Give a counterexample when the set of exchangeable random variables is finite 7.5. X(2):oS . Xn be equally likely to be any of the n! perm utations of (1.. . Assuming that the Y n . For the simple random walk compute the expected number of visits to state k. 7. . . Compute E[Xdx(I» X(2» . then the amount a is released. Let Sn.. 2. where X(l) :s. then the total contents of the dam are released. n) Argue that 7. Argue that the random walk for which X. .:!:.7.M and E[Xr1 = 0 is null recurrent 7. show that Cov(X I .PROBLEMS 353 water is released from the dam according to the following rule: If the water content of the dam is greater than a. . . X 2 ) ~ 0 (Hint: Look at Var(L. L = E[X] > 0 argue that. B > 0.0. 
For a random walk with J. where Gis an appropriately defined geometric random variable. 7.) 7. Ai = P{first positive value of Sn equals i}. N = min{w Sn ~ A or Sn ::=:: . with probability 1. denote the probability that a ladder height equals i-that is. Let Sn = ~~ X.10. satisfies p(A) = I~ I:+(I p(A + ct . which states that E[f(X)] ~ f(E[X]) (J whenever fis convex to prove that if 1. E[X] < 0. (Hint: Argue that there exists a value k such that P{Sk > A + B} > O. (a) Show that if q. ----')0- u(t) t 1 J. In the insurance ruin problem of Section 74 explain why the company will eventually be ruined with probability 1 if E[Y] ~ cE[X]. Use Jensen's inequality. 7. and E[e BX ] = 7. be a random walk and let A" i > 0.354 RANDOM WALKS Let. Show that E [N] < 00. then (J > 0 ¥.U. = j} = { j= -1 a" then Ai satisfies . Then show that E[N] S kE[G].9. In the ruin problem of Section 7.13.L ~ where u(t) equals the number of n for which 0 Sn ~ t. the probability that a company starting with A units of assets is ever ruined.B}.11.x) dG(x) dF(t) + I~ G(A + ct) dF(t).4 let F denote the interarrival distribution of claims and let G be the distribution of the size of a claim. P{X. for A > 0. Show that p(A). 7. Wiley. Methuen. An Introduction to Probability Theory and its Applications. Van Nostrand. 2. REFERENCES Reference 5 is the standard text on random walks. New York. 5 S Asmussen. Wiley. Vols I and II.REFERENCES 355 (b) If P{Xc = j} = 1f5. New York.14. and 4. Wiley.1966 F Spitzer.2. j = -2. 1965 B DeFinetti. 1. denote a random walk in which Xc has distribution F Let G(t. New York. s) dF(y). 2 3 4.s) = F(t + s) . show that 1 +v'5 AI = 3 + v'5' 2 A2 = -3-+-v'5----=S 7. Pnnceton NJ.0. s) denote the probability that the first value of Sn that exceeds t is less than or equal to t + s That is. n : : -: 0. Reference 3 should be consulted for additional results both on exchangeable random vanables and on some special random walks Our proof of Spitzer's identity is from Reference 1. 1970 W Feller. Theory of Probability. 1964 . s) = P{first sum exceeding t is ~ t + s} Show that G(t. Applied Probability and Queues. -1.F(t) + t". G(t . London. Let Sn. Theory of Stochastic Processes. G(t. 1985 0 R Cox and H 0 Miller. Useful chapters on this subject can be found in References 1.y. Principles of Random Walks. 1 1) where x I == {+1 -1 if the ith step of length Il x is to the right if it is to the left. what we obtain is Brownian motion More precisely suppose that each Ilt time units we take a step of size Ilx either to the left or to the right with equal probabilities. If we let X(t) denote the position at time t. then (8.CHAPTER 8 Brownian Motion and Other Markov Processes 8. are assumed independent with P{X.] = 1. == -1} == t (81 1) that Since E[XJ == 0. Var(X. Var(X(t)) = (IlX)2 [~J 356 . and where the X.1 INTRODUCTION AND PRELIMINARIES Let us start by considering the symmetric random walk that in each time unit is equally likely to take a unit step either to the left or to the right Now suppose that we speed up this process by taking smaller and smaller steps in smaller and smaller time intervals If we now go to the limit in the correct manner. == 1} == P{X.) == E[X. we see from (812) E[X(t)] = 0. sometimes called the Wiener process. it would appear that. X(t) is normally distributed with mean 0 and variance c 2 t The Brownian motion process. 
Var(X(t» --+ c 2 t We now list some intuitive properties of this limiting process obtained by taking dx = c~ and then letting dE --+ 0 From (81 1) and the central limit theorem we see that. (iii) for every t > 0. then from (8 1 2) we see that as dt --+ 0 E[X(t)] = 0. as the distribution of the change in position of the random walk over any time interval depends only on the length of that interval.INTRODUClION AND PRELIMINARIES 357 We shall now let d x and dE go to 0 However. (i) X(t) is normal with mean 0 and variance c 2 t In addition. t ~ O} has independent increments Finally. if we let dx = dE and then let dE --+ 0. we must do it in a way to keep the resulting limiting process nontrivial (for instance. t ~ O} has stationary increments We are now ready for the following definition Definition A stochastic process [X(t). t (i) X(O) ~ 0] is said to be a Brownian motion process if = 0. as the changes of value of the random walk in nonoverlapping time intervals are independent. t 2= O} has stationary independent increments. is one of the most useful stochastic processes in applied probability theory It originated in physics as a description of Brownian motion This phenomenon. is the motion exhibited by a small particle that is totally immersed in a liquid or . (iii) {X(t). (ii) {X(t). then from the above we see that E[X(t)] and Var(X(t» would both conv~e to 0 and thus X(t) would equal 0 with probability 1) If we let dx = cY dE for some positive constant c. we have (ii) {X(t). who discovered it. named after the English botanist Robert Brown. analyzing the price levels on the stock market. u < s} P{X(t + s) . • . The independent increment assumption implies that the change in position between time points sand t + s-that is. its density function is given by !rex) = 1 --e- X 2 V21ii 121 From the stationary independent increment assumption. 0 < u < s.X(s)-is independent of all process values before time s. as we might expect from its limiting random walk interpretation. and quantum mechanics. X(u). and.1) suggests that X(t) should be a continuous function of t. it can be proven (though it's deep) that. Also. the above concise definition of this stochastic process underlying Brownian motion was given by Wiener in a series of papers originating in 1918. which states that the conditional distribution of a future state X(t + s) given the present Xes) and the past X(u).. the process is often called standard Brownian motion. in fact.Xes) = a .Xes) <: a . we shall suppose throughout that c = 1 The interpretation of Brownian motion as the limit of the random walks (8. As any Brownian motion can always be converted to the standard process by looking at X(t)/c.) is given by . with probability 1. When c = 1. and no proofr shall be attempted.358 BROWNIAN MonON AND OTHER MARKOV PROCESSES gas Since its discovery. X(t) is indeed a continuous function of t This fact is quite deep. X(t) is nowhere differentiable. 0 s. The first explanation of the phenomenon of Brownian motion was given by Einstein in 1905 He showed that Brownian motion could be explained by assuming that the immersed particle was continually being subject to bombardment by the molecules of the surrounding medium. X(t + s) . X(t. This turns out to be the case.x IXes) = x. However. it easily follows that the joint density of X(t l ). u < s} <: <: = P{X(t + s) . 0 s. the process has been used beneficially in such areas as statistical testing of goodness of fit. it is in no wayan ordinary function For. X(t) is always pointy and thus never smooth. 
depends only on the present A process satisfying this condition is called a Markov process. and it may be proven that.s al Xes) = x}. with probability 1. Since X(t) is normal with mean 0 and variance t.x} P{X(t + s) . we should note in passing that while the sample path X(t) is always continuous. X(u).1. Hence P{X(t + s) = al Xes) = x. it follows that Brownian motion could also be defined as a Gaussian process having E[X(I)] = 0 and. where s < t.X(tn) has a joint distribution that is multivariate normal.X(s» =s.BSII)2} 2s (1.s)ll.s ) .(B) 2 _ {-x (B-X)2} -K)exp b . given that X(I) = B. X(s) + X(I) . if we let sit = a. for s ::.X(s» = Cov(X(s). Definition A stochastic process {X(t). X(s» + Cov(X(s). It also follows from (81. X(I) . . then the conditional distribution of X(s) given X(t) is normal with mean aX(I) and variance a(1 . It is interesting to note that the conditional variance of X(s). where the last equality follows from independent increments and Var(X(s» = s . X(t» = Cov(X(s). The conditional density is + ( lslt x IB) = t(x)/'-s(B . does not depend on B! That is. X(tn) has Since a multivariate normal distribution is completely determined by the marginal mean values and the covariance values. .a)l.). Hence the conditional distribution of X(s) given that X(I) = B is.1 3).4a) (814b) E[X(s) I X(t) Var(X(s) IX(I) = B] = Bslt. tn a multivariate normal distnbution for all t1 .INTRODUCTION AND PRELIMINARIES 359 By using (8. t 2: O} is called a Gaussian process if X(t. we may compute in principle any desired probabilities For instance suppose we require the conditional distribution of X(s) given that X(t) = B. I.O < a < 1. Cov(X(s). for s < I. = B) = S(I .x) /. normal with mean and variance given by (81.3) that X(t).. . and thus a Brownian motion process is a Gaussian process where we have made use of the following definition. s < t.2(I-s) _ K 2 exp {_ I(X . Since.t). X(1)) . s :5 t. then {Z(t).4b» = s(1 .I). 11 X(1) = O}. Cov[(X(s). and the proof is complete . That is.s Cov(X(1). X(t)) + st Cov(X( 1).::= O} be a Brownian motion process and consider the process values between 0 and 1 conditional on X(1) = O. all we need venfy is that E[Z(t)J = 0 and Cov(Z(s).1. 1. consider the conditional stochastic process {X( t). Thus the Brownian Bridge can be defined as a Gaussian process with mean value 0 and covariance function s(1 . t ~ O} is a Gaussian process.st . By the same argument as we used in establishing (8 1. X(1) = O]IX(1) = 0] = E[X(t)E[X(s)IX(t)]IX(1) = 0] = E [x(t) 7 IX(1) = X(I) 0] (by (8 1. Z(t)) = s(1 . This leads to an alternative approach to obtaining such a process.t) when s:5 t The former is immediate and the latter follows from Cov(Z(s). from (8 1. 0 :5 t:5 1} is a Brownian Bridge process when Z(t) = X(t) . we can show that this process. t ~ O} is Brownian motion. known as the Brownian Bridge (as it is tied down both at 0 and l).sX(l).t).4).4a» = -1(1 . X(t) .360 BROWNIAN MOTION AND OTHER MARKOV PROCESSES Let {X(t).st + st = s(1 . E[X(s) IX(I) we have that. X(t)) .4). t .tX(1)) = Cov(X(s). X(t» IX(I) 0] = E[X(s)X(t) IX(1) = 0] = E[E[X(s)X(I)IX(I). is a Gaussian process Let us compute its covariance function. for s < t < = = 0] = 0 for s < 1. PROPOSITION 8.t Cov(X(s). Z(t)) = Cov(X(s) .1 If {X(t).tX(1) Proof Since it is immediate that {Z(t).I) s t (by (8 l. X(1)) = s . 0 ~ 1 :s. INTRODUcrlON AND PRELIMINARIES 361 The Brownian Bridge plays a pivotal role in the study of empirical distribution functions.<1 sl ~ 0 asn ~ 00 It also follows. - Let US study the limiting properties. 
X 2 . 0 ~ s ~ I} To start with. by the central limit theorem. is called the Empirical Distribution Function Let us study its limiting properties as n ~ 00 Since Nn(s) is a binomial random variable with parameters nand s. • be independent uniform (0. for s < t. for fixed s.(s). That is. as n ~ 00 with probability 1 In fact. That is.s). it can be proven (the so-called Glivenko-Cantelli theorem) that this convergence is uniform in s. note that. that the asymptotic joint distribution of an(s) and an(t) should be a bivariate normal distribution In fact. using the central limit theorem.s) has an asymptotic normal distribution with mean 0 and variance s(1 . To see this let XI .Nn(s). 0 ~ s ~ 1. That is. Vn(Fn(s) . ~ s otherwise The random function Fn(s) = Nn(s)/n. of the stochastic process fanes). Hence it would seem. P{an(s) < x} where ~ V21TS(1 1 s) -'" II exp {2 s (~y2 s )} ds. it follows from the strong law of large numbers that.s). similar reasoning makes it plausible to expect the limiting process (if one exists) to be a Gaussian process To see .= I n l. given Nn(s).Nn(s) and (t . the conditional distribution of Nn(t) . as n ~ 00. as the number of the first n that are less than or equal to s.s)/(1 . 0 < S < 1. sup IFn(s) - 0<. Nn(s) where = 2: l. is just the binomial distribution with parameters n . .(s) = {~ if X. with probability 1. 1) random variables and define Nn(s). that for fixed s. 362 BROWNIAN MOTION AND OTHER MARKOV PROCESSES which one.n· F(X. Now.n 2 sl n n = s(1 .) are uniformly distributed over (0. < :5 s) . suppose we want to study the limiting distribution of Vn sup IFn(x) . for 0 :5 S < I < 1. s Hence it seems plausible (and indeed can be rigorously shown) that the limiting stochastic process is a Gaussian process having a mean value function equal to 0 and a covariance function given by s(1 .F(y. upon simplification. But this is just the Brownian Bridge process Whereas the above analysis has been done under the assumption that the X. its scope can be widened by noting that if the distribution function is F. we need to compute E[an(s)] and Cov(an(s).) . Vn[Fn(F-l(s» .I).s] Vn[Fn(Ys) .I). For instance.s] =:: =:: . 1) distribution. have a uniform (0. . Vn[(number of X" i = 1. an(t». when F is continuous. from using that Nn(s) is binomial with parameters n. 1). where Fn(x) is the proportion of the first n of the X" independent random variables each having distribution F. Fn(t» 1 = - n Cov(Nn(s).. = Cov(an(s). 0 :5 s < I :5 1. an(t» n Cov(Fn(s). Nn(t» E[ E[Nn(s)Nn(t) INn(s)]] . the random variables F(X.s] F-l(S» . and. where the last equality follows. then. that are less than or equal to x From the preceding it follows that if we let an(s) =:: Vn[(number of x" i =:: = 1.F(x)1 x for an arbitrary continuous distribution F.n· X.)]. t] and.2. AND ARC SINE LAWS Let us denote by To the first time the Brownian motion process hits a When a > 0 we will compute P{Ta ~ t} by considering P{X(t) ::> a'} and conditioning on whether or not To <:: t This gives (8.I'" e-Y 212 dy ~ U 1 . 0 ~ s ~ 1} converges to the Brownian Bridge process Hence the limiting distribution of Vn sUPx(Fn(x) F(x» is that of the supremum (or maximum. it is just as likely to be above a or below a at time t That is. P{X(t) ::> alTa <:: t} = ~ Since the second right-hand term of (8 2 1) is clearly equal to 0 (since by continuity.IX ~ alV.1) P{X(t) ~ a} = P{X(t) ~ alTa ~ t}P{Ta ~ t} + P{X(t) ~ al Ta > t}P{Ta > t}. AND ARC SINE LAWS 363 where Ys = F-1(s). then the process hits a at some point in [0. 
Now if To ~ t.y2 !2 dy ' a>O Hence. MAXIMUM VARIABLE. MAXIMUM VARIABLE. the process value cannot be greater than a without having yet hit a). 8.HllTING TIMES. by symmetry. dx 2 = .2 2) P{Ta ~ t} = 2P{X(t) ::> a} = -2-I'" ev'21ii a x212 . by continuity) of the Brownian Bridge Thus. then {an(s). e.2 HITTING TIMES. we see that P{Tu < oo} = lim P{Ta ~ t} 1_'" = -2. we see that (8. t ~ O} is the Brownian Bridge process. where {Z(t).. but its mean time is infinite. Its distribution is obtained as follows. For a > 0 (by continuity) Let 0(11' 12) denote the event that the Brownian motion process takes on the value 0 at least once in the interval (II. the distribution of To is. the same as that of T. the Brownian motion process will eventually hit a. we .2. using (8. (Is this intuitive? Think of the symmetric random walk) For a < 0. from (8.2) we obtain = 00 Thus it follows that Ta. by symmetry.364 BROWNIAN MOTION AND OTHER MARKOV PROCESSES In addition. has an infinite expecta~ tion That is. with probability 1.o Hence.2 2) we obtain Another random variable of interest is the maximum value the process attains in [0. though finite with probability 1. 12)' To compute P{O(/I' 12 )}. I]. HITTING TIMES. MAXIMUM VARIABLE. using (82. n)} = ~ arc sineYx 1T with the approximation becoming exact as n _ 00 Since Brownian motion is the limiting case of symmetric random walk when the jumps come quicker and quicker (and have smaller and smaller sizes). PROPOSITION 8.2.2) we obtain The above integral can be explicitly evaluated to yield Hence we have shown the following.2. it seems intuitive that for .1 For 0 < x < 1. AND ARC SINE LAWS 365 condition on X(II) as follows: Using the symmetry of Brownian motion about the origin and its path continuity gives Hence. t)} = ~ arc sineYx 1T Remarks Proposition 8. P{no zeros in (xt.1 does not surprise us For we know by the remark at the end of Section 3 7 of Chapter 3 that for the symmetric random walk P{no zeros in (nx. 2. tJ the process is 2 '. an integrated version. P{A (t)/ t :s. 8. that the proportion of time the symmetric random walk is positive.1 Brownian Motion Absorbed at a Value Let {X(t)} be Brownian motion and recall that Tx is the first time it hits x. Define Z(t) by X(t) Z(t) = { x ift < T. and the second assumes that it is not allowed to become negative.2 For Brownian motion. in the limit.3 VARIATIONS ON BROWNIAN MOTION In this section we consider four variations on Brownian motion The first supposes that Brownian motion is absorbed upon reaching some given value. PROPOSrrlON 8. That is. for 0 < x < 1.366 BROWNIAN MOTION AND OTHER MARKOV PROCESSES Brownian motion the above approximation should hold with equality. let A(t) denote the amount of time in positive Then. then {Z(t).7 of Chapter 3-namely.3. and the fourth. The third variation deals with a geometric. x} rO.2». the following proposition can be proven. obeys. r -arCSIne vx 1T 8. x > O. the arc sine distribution-can also be shown to be exactly true for Brownian motion. Proposition 8. t ::> O} is Brownian motion that when it hits x remains there forever The random variable Z(t) has a distribution that has both discrete and continuous parts The discrete part is P{Z(t) x} = P{Tx <: t} 2 f: e- y2J21 dy (from (82. .2 1 verifies this The other arc sine law of Section 3. y in the additional time t .2) and (833) we have P {X(t) (since y < x).u 2. ~~.P {X(t) < ~~:. X(s) < x} y.YI~~:r X(s) > x} ~ y. = P{X(t) ~ y} . 
~~.VARIATIONS ON BROWNIAN MOTION 367 For the continuous part we have for y < x (831) P{Z(t) < y} = P {X(t) ~ y.3. (8.y.. 2rdu .2) P {X(t) ~ y.Tx By symmetry it is just as likely to increase by that amount Hence. where Tx < t.y} Y .S:5d X(s) > x is equivalent to the event that Tx < t.33) P {X(t) ~ YI~~:r X(s) > x} = P {X(t) ~ 2x . X(s) > x} = P{X(t) :> 2 x y} From (83. and if the Brownian motion process hits x at time Tx.1) P{Z(t) ~ y} = P{X(t) = ~ ~ y} .r X(s) > x} = p{X(t) ~ YI~~:. ~~:. and from (8.2x} (by symmetry of the normal distribution) P{X(t) 1 =-- vI2m JY y-2x e.P{X(t) y} . X(s) > x} P {~~:I X(s) >x} Now the event that maXO'.P{X(t) ~ ~ 2x . X(s) > x} We compute the second term on the right-hand side as follows (83. ~~.r X(s) >x} = p{X(t) ~ 2x . then in order for it to be below y at time t it must decrease by at least x . is called geometric Brownian motion Since X(t) is normal with mean 0 and variance t. t ~ O} is Brownian motion. P{Z(t) ~ y} = P{X(t) ~ y} = P{X(t) 1 ~ . Var(Z(t)) = (1 -.3.34) E[Z(t)] = V2t!TT. t Z(t) = IX(t)!. t ~ O}. its moment generating function is given by and so E[Y(t)] = E[eX(II] e l12 (E[Y(t)])2 e 1 Var(Y(/)) = E[y2(/)] E[e 2X(/I] Geometric Brownian motion is useful in modeling when one thinks that the percentage changes (and not the absolute changes) are independent and identically distributed For instance..368 BROWNIAN MOTION AND OTHER MARKOV PROCESSES 8.Y. then the process {Z(t). defined by yet) = e XUI . t :> O} is Brownian motion.2 Brownian Motion Reflected at the Origin If {X(t). suppose that Yen) is the price of some commodity at time n Then it might be reasonable to suppose that yen)! Yen 1) (as opposed to Y" . then the process {yet). The mean and variance of Z(t) are easily computed and (8. The distribution of Z(t) is easily obtained For y > 0.3 Geometric Brownian Motion If {X(t). t ~ 0. where is called Brownian motion reflected at the origin. :> O}.-l) are independent and identically distrib- .) t 8.3.y} 2P{X(t) <: y} where the next to last equality follows since X(t) is normal with mean O. For instance.da e. and so log Y(n) = L log X" .1). t ~ O} is Brownian motion. Supposing that the present value of the stock is y and that its price varies according to geometric Brownian motion. suppose we are interested in modeling the price of a commodity throughout time Letting Z(t) denote the price at t.Ja> Ja> log[(K + a)ly] V2rrT a 8. one unit of a stock at a fixed price K. log Y(n). As the option will be exercised if the stocks pnce at time T is K or higher. taking Y(O) = = Y(n)/Y(n . we might want to assume that the rate of change of Z(t) follows a Brownian motion. 1. Letting Xn then. we might suppose that the rate of change of the commod- . and so {Y(n)} would be approximately geometric Brownian motion. t Z(t) ~ O} defined by (83. are independent and identically distributed. let us compute the expected worth of owning the option. rather than assuming that {Z(t)} is Brownian motion (or geometric Brownian motion).= I n Since the X. its expected worth is E[max(Y(T) . then.K. would be approximately Brownian motion.X 2f2T dx da a} = -1 .5) = t X(s) ds is called integrated Brownian motion As an illustration of how such a process may arise in practice. when suitably normalized.O)] J: = J: P{ = = P{Y(T) . Suppose one has the option of purchasing. EXAMPLE 8.4 Integrated Brownian Motion If {X(t).K ye X( T) - > a} da K > a} da a> J P { X(T) > a K+ log-y. then the process {Z(t). 
at a time T in the future.3(A) The Value of a Stock Option.VARIATIONS ON BROWNIAN MOTION 369 uted.3. .Wn are also jointly normal.. . For s ~ I. t (1: + t y dy u dY) du = S2 (~-~) .. n. We now compute these' £[Z(I)] = £ = [t X(s) dSJ t £[X(s)] ds =0.370 BROWNIAN MOTION AND OTHER MARKOV PROCESSES ity's price is the current inflation rate. d dl Z(I) = X(I) or Z(I) = Z(O) + J~ X(s) ds {Z(I).3 5) as a limit of approximating sums.m are independent normal random variables. . . where U"j = 1. It follows from the fact that Brownian motion is a Gaussian process that 1 ~ O} is also Gaussian To prove this first recall that WI. . (8. From this it follows that any set of partial sums of WI. Z(I)] = £[Z(s)Z(I)] = £ [tX(Y)dY tX(u)du ] =£ = = [J~ t X(y)X(u) dy dUJ £[X(y)X(u)] dy du u) dy du = tt t t min(y. Since {Z(I). The fact that Z(II).. . = . . 1 ~ O} is Gaussian.=1 2: a"U" m i = 1. Z(tn) is jointly normal can then be shown by writing the integral in (8. which is imagined to vary as Brownian motion Hence. . it follows that its distribution is characterized by its mean value and covariance function. Wn are said to have a joint normal distribution if they can be represented as W. .3 6) Cov[Z(s).. then :t Wet) or = X(t) Wet) Wet) = W(O) exp {f: Xes) dS}.VARIATIONS ON BROWNIAN MOTION 371 The process {Z(t).Z(t . Cov(X(t). the vector process {(Z(t). X(t) has a bivariate normal distribution with £[Z(I)] = £[X(t)] 0. t . that they are jointly normal. Z(t). (Why not?) However. X(I» + o(h).Z(t h» Cov (Z(I). we see that .h» = Cov(Z(t). Z(t» . X(t». t :> O} is a Markov process. Another form of integrated Brownian motion is obtained if we suppose that the percent rate of change of price follows a Brownian motion process. That is. Z(t) . Taking W(O) = 1. Z(t . Z(t» = t 2/2. (Why?) We can compute the joint distribution of Z(t). To compute their covariance we use (8 3 6) as follows' Cov(Z(t). Cov(Z(t). by the same reasoning as before. J:-h Xes) tis) = Cov(Z(t). Hence. if Wet) denotes the price at t. X(t» = t 112. Z(I) .Cov(Z(t). and so Cov(Z(t). where {X(t)} is Brownian motion.h» = ~- (t h)2 [i t 6 h] t 2hl2 + o(h) However. X(t) by first noting. hX(I) + o(h» = h Cov(Z(t).::: O} defined by (8 3 5) is not a Markov process. p ~(1 + j. then X(I).(2p .P If we let x. suppose that for every dl time unit the process either goes one step of length dx in the positive direction or in the negative direction. and let dl E[X(I)] Var(X(I» -? -? -? 0.t.116) . We start with the probability that the process will hit A before -B. then j. P(x) = P{X(I) hits A before .B < x < A That is. B > 0 Let P(x) denote this probability conditionally on the event that we are now at x. It can.:::: 13/3. A. (ii) {X(I).4 BROWNIAN MOTION WITH DRIFT ~ We say that {X(I). and indeed {X(I)} converges to Brownian motion with drift coefficient j. .tl. I.:::: ~.t~). with respective probabilities p and 1 .1)2]. we see that E[W(I)] = exp{16/6} 8.BIX(O) = x} .t We now compute some quantities of interest for this process. To see this.t if. is Now E[X(I)] = dx [II dl](2p = 1).372 BROWNIAN MOTION AND 0 fHER MARKOV PROCESSES Since Z(I) is normal with mean 0 and variance 12(112 . as Brownian motion. Thus if we let dx . (iii) X(I) is normally distributed with mean j. I j. I ::> O} has stationary and independent increments. where {B(I)} is standard Brownian motion Thus a Brownian motion with drift is a process that tends to drift off at a rate j. Var(X(I» = (dX)2[lldl][1 . O} is a Brownian motion process with drift coefficient (i) X(O) = 0. 
also be defined as a limit of random walks. the position at time I.tl. = { 1 -1 if the ith step is in the positive direction otherwise.tl and variance I We could also define it by saying that X(I) = B(I) + j. Thus .BROWNIAN MOTION WITH DRIFT 373 We shall obtain a differential equation by conditioning on Y = X(h) . A or -8. equivalently. by time h. or or.t+--.. ) P"(x) _ P '(xJ.1) we have P '(x ) J. ] + o(h) Since Y is normal with mean J. where o(h) in the above refers to the probability that the process would have already hit one of the barriers.X(O). the change in the process between time 0 and time h This yields P(x) = E[ P(x + Y)] + o(h).0 2 Integrating the above we obtain or.t + -2-. upon integration. we obtain (841) P(x) = P(x) + P'(x)J.th + P"(x) J.th and variance h.h ' P"(x) _ o(h) and letting h ~ 0. Proceeding formally and assuming that P( y) has a Taylor series expansion about x yields P(x) = E[P(x) + P'(x)Y + P"(x)y 2/2 + . From (84.t 2 2h 2 + h + o(h) since the sum of the means of all the terms of differential order greater than 2 is o(h). .B e -2/1. we have hm . the probability of reaching A before .p p)(A When p = (n(1 + J.L~x J.2/1.. by letting 0 we see from (844) that P{ up A before down B} =1_ 1 ..e.( .B) = 0..- (844) P{ up A before down B} = . (1 .p .L ~x) Illlx Hence. H -2 A e 1 e .e /I. when each gamble goes up or down ~x units with respective probabilities p and 1 .p)HIIlX 1.+B)lllx 11.p)IIIlX -p ~x ~ = hm Ilx-O .B is thus (843) P{process goes up A before down B} = 2 B /1.p. P( . Remarks (1) Equation (843) could also have been obtained by using the limiting random walk argument For by the gamblers ruin problem (see Example 4 4(A) of Chapter 4) it follows that the probability of going up A before going down B.(A +B)' which agrees with (843) .374 BROWNIAN MOTION AND OTHER MARKOV PROCESSES Using the boundary conditions that P(A) C I and C2 to obtain = 1. we can solve for and thus (842) Starting at x = 0. 2/1. (1 1+ Ilx-O J.L~x). P(O). is 1 .( . and this is easily seen to be x =A + 1/2d -7 For Brownian motion we obtain by letting J.4{a) .A )e. then that player becomes the "doubling player" In other words. the process drifts off to negative infinity and its maximum is an exponential random variable with rate -2J. at some time in the future. The current market price of the stock is taken to be 0.BROWNIAN MOTION WITH DRIFT 375 (2) If J.L < 0 we see from (S. EXAMPLE 8. the option then switches to the other player A popular game often played with a doubling option is backgammon EXAMPLE 8. and we suppose that it changes in accordance with a Brownian motion process having a negative drift coefficient -d. if ever.4.3). each time the doubling player exercises the option. Consider two individuals that for a stake are playing some game of chance that eventually ends with one of the players being declared the winner Initially one of the players is designated as the "doubling player. where P(x) is the probability that the process will ever reach x. independent of its current market price.4. Suppose we have the option of buying.3) = P{Brownian motion goes up A before down B} A+ BB Optimal Doubling in Backgammon.2d x.A).L.4.L (S. one unit of a stock at a fixed price A. the optimal value of x is the one maximizing (x ." which means that at any time he has the option of doubling the stakes If at any time he exercises his option.4(A) Exercising a Stock Option. in this case. From (S. when. 
then the other player can either quit and pay the present stake to the doubling player or agree to continue playing for twice the old stakes If the other player decides to continue playing.4.4 5) we see that x> 0. where d > 0 The question is.5) P{process ever goes up A} = e2 1'A Thus. should we exercise our option? Let us consider the policy that exercises the option when the market price is x. Our expected gain under such a policy is P(x)(x . by letting B approach infinity that (S.6) 0 in Equation (S. then player II is indifferent as to quitting or accepting the double Hence. and since player II will accept. we will take advantage of their indifference Let the stake be 1 unit Now if player I doubles at p*. it follows that player II would quit if X(t) = p and player I doubles Hence at X(t) = p player I can guarantee himself an expected gain of the present stake by doubling. that a player can announce his strategy and the other player could not do any better even knowing this information) It is intuitively clear that the optimal strategies should be of the following type. then. then player II wins. OPTIMAL STRATEGIES Suppose player I(II) has the option of doubling. and since player II can always guarantee that I never receives more (by quitting if player I ever doubles). player I will win with probability x Also each player's objective is to maximize his expected return.4.376 BROWNIAN MOTION AND OTHER MARKOV PROCESSES We will suppose that the game consists of watching Brownian motion starting at the value t If it hits 1 before 0. From (8. since player I wins 1 under the former Proof Suppose p* . and if the reverse occurs. for instance. then player I wins. player I can do better by waiting for X(t) to either hit 0 or p** If it hits p**. if the game is to be continued until conclusion. then player I(II) should double at time t iff (if and only if) X(t) 2: p* (X(t) ~ 1 . then he can double.6) it follows that if the present state is x. if 0 is hit before p**. rather than doubling at p*. player I will be in the same position as if he doubled at p* On the other hand. player II's optimal strategy is to accept at t iff X(t) ~ p* By continuity it follows that both players are indifferent as to their choices when the state is p* To compute p*. it follows that it must be optimal for player I to double Hence p* ~ p** Lemma 2 p* = p** < p** We will obtain a contradiction by showing that player I has a better strategy than p* Specifically. then his optimal strategy is to double at tiff X(t) 2: p* Similarly. and we will suppose that each player plays optimally in the game theory sense (this means.p**) It remains to compute p* and p** Lemma I p* Proof For any p ~ p** > p**. then under the new policy he will only lose the onginal stake whereas under the p* policy he would have lost double the stake Thus from Lemmas 1 and 2 there is a single critical value p* such that if player I has the option.p"') Player II's (I's) optimal strategy is to accept a double at tiff X(t) ~ p** (X(t) 2: 1 . then II will lose 2 units Hence. which he will exercise if X(t) ever hits 1 p*. and in doing so we will restrict attention to policies that attempt a repair when the state of the process is x. In this example.4(c) Controlling a Production Process.p*)lp*. (1 .p* before 1. 0 < x < B.p* (that is.p*] and so 1 + 2-'--p* or = -2. it breaks down) The cost of attempting a repair is C We shall attempt to determine the policy that minimizes the long-run average cost per time. 
we consider a production process that tends to deteriorate with time Specifically. then II has the option of the next double.BROWNIAN MOTION WITH DRIFT 377 alternative. when starting at p* is. > 0 When the state of the process is B. If it never hits 1 . p* = !. then II will double the stakes to 4 units and I will be indifferent about accepting or not. then the process returns to state 0. Hence. by (846). we have 1 = E[gain to Ilhits 1 p*] 1p* +2 -1 p* Now if it hits 1 .61 of Chapter ExAMPLE . On the other hand. the process is assumed to break down and a cost R must be paid to return the process back to state O. it is clear that returns to state 0 constitute renewals. we have Elgain to Ilhits 1 . For these policies. we have 1 = E[gain to player I if player II accepts at p*]. Now if player II accepts at p*. and if it is unsuccessful. If the state is x and an attempt to repair the process is made. then this attempt will succeed with probability ax and fail with probability 1 . since the probability of hitting 1 . as I will lose 2 if he quits.ax If the attempt is successful. 8. we may attempt to repair the process before it reaches the breakdown point B.p*. then we assume that the process goes to B (that is. and thus by Theorem 3. we suppose that the production process changes its state in accordance with a Wiener process with drift coefficient p" p. if it hits 1 first). f"(x)/2.4.. 1 2 or.phf'(x) + rex) + o(h). Therefore. and from (8. f(x) is of the form f(x) = ex. equivalently.f'(x) --+- f"(x) o(h) h' and letting h -.ax)] x .7) E[cost of a cycle] = E [length of a cycle] C + R(1 .ax) E [time to reach x]' Let f(x) denote the expected time that it takes the process to reach x We derive a differential equation for f(x) by conditioning on Y = X(h) X(O) the change in time h This yields f(x) ~ould = h + E[f(x .] + o(h) :::: h + f(x) . 1 = p.8) 1 = p. Expanding in a Taylor series gIves f(x) = h + E [ f(x) - y2 Yf'(x) +"2 f"(x) + . .[C + R(I .f' (x) . the policy that attempts to repair when the state is x. has a long~run average cost of p. 0 < x < B. Rather than attempting to solve the above directly note that f(x + y) E[time to reach x + y from 0] E[time to reach x] + E[time to reach x + y from x] = f(x) + fey)· Hence.8) we see that e = 11 p. from (84. where the o(h) term represents the probability that the process have already hit x by time h.7).V)] + o(h). Hence. 0 (84. f(x) = xl p.378 BROWNIAN MOTION AND OTHER MARKOV PROCESSES 3 the average cost is just (8.4. where the term o(h) results from the possibility that the process hits x by time h Expanding the above in a Taylor series about x yields [(x) = e.OTx}] E [exp{ . We start by noting (S 4. in much the same manner as E [Tx] was computed in Example S.X(O).V)] + o(h).lIh E [[(X) .OTy}]' where the last equality follows from stationary and the next to last from independent increments.Tx)}] E[ exp{ . its moment generating function. This yields [(x) = E[exp{-O(h + Tx.9) E[exp{-OTx+y}] = E[exp{-O(Tx + Tx+y = = TJ}] E[exp{-OTx}]E[exp{-O(Tx+y .lIh [[(X) .y)}] + o(h) = e.IITx ].4(C).Yf'(x) + ~. for x > 0. But (S. To determine c let We will obtain a differential equation satisfied by [by conditioning on Y = X(h) .lIh E[t(x . we can then use calculus to determine the policy that minimizes the long-run average cost. Using e.JLhf'(x) +"2 ["(x) + o(h) .lIh = 1 .2 ["(x) +. 0 > 0.4 9) implies for some c > O.JLhf'(x) + + o(h).8h + o(h) now gives [(x) = [(x)(1 . ~ ["(x) ] h ] + o(h) = e. 
Let Tx denote the time it takes the Brownian motion process with dnft coefficient JL to hit x when JL > 0 We shall compute E[e.BROWNIAN MOTION WITH DRIFT 379 while the policy that never attempts to repair has a long-run average cost of RJLIB For a given function a(x).Oh) . 2 If {X(t). with .... or Hence. then. max X(s) lim . denote the time that Brownian motion with drift coefficient p.2 + 2fJ - p. we see that when J.. since c > 0. ~ 0 We will end this section by studying the limiting average value of the maximum variable Specifically we have the following PROPOSITION 8... hits x Then for fJ > O.380 BROWNIAN MOTION AND OTHER MARKOV PROCESSES Dividing by h and letting h -l> 0 yields 8f(x) and...Lf'(x) + U"(x). = -J.)} if p..fJT..L ~ 0 Thus we have the following PROPOSITION 8. p. x> 0. (84 11) E[exp{ .1 Let T. probability 1.4..= p. we see that either (8410) And.}j = exp{ -x(v'p. t ~ O} is a Brownian motion process with drift coefficient p.4.. using f(x) = e-cX. 1_'" t 2! 0. :. PROPOSITION 8.Tn _I . we may think of the Tn as being the times at which events occur in a renewal process Letting N(t) be the number of such renewals by t. 0:.O$U$s] + E[B(I) . we have ET. and for n > 0 let 1~ denote the time at which the process hits n It follows.c 2tI2}. s] .1 Using Martingales to Analyze Brownian Motion Brownian motion with drift can also be analyzed by using martingales There are three important martingales associated with standard Brownian motion. = 11fL. and hence the result follows from (84 12) and the well-known renewal result N(t)lt.:. t (a) Yet) 2:: O} is a martingale when = B(t). and the one in part (c) has mean 1 Proof In all cases we write B(t) as B(s) + [B(t) . and exp{cB(t) . N(t) + 1 O~'''t Now. that Tn . 11 ET. B2(t) t. 8..BROWNIAN MOTION WITH DRIFT 381 Proof Let To 0.:.. from the assumption of stationary.O:. independent increments.U$s] = E[B(s)IB(u).4. we have (84 12) N(t):.:.B(s)IB(u).4..:.3 Let {B(t). from the results of Example 8 4(C).B(s)] = B(s) + E[B(I) = B(s) where the next to last equality made use ofthe independent increments property of Brownian motion . 1 2: O} be standard Brownian motion Then {Y(t).. (b) Y(t) (c) Yet) where c is any constant The martingales in parts (a) and (b) have mean 0. Ii:.B( s) Jand utilize independent incre- ments (a) E[B(t)IB(u).. are independent and identically distributed Hence. max X(f):. n 2:: 1. since B(T) :::: X(T) . X(T) is either A or .B(sWIB(u).cJJ-T .c 2T12] Letting c =1 = . Y(t) = exp{cB(t) .gives E[exp{ -2JJ-X(T)}] = 1 But. and so we obtain that and so.c 2T12] or. namely.For positive A and B.B(s)] s)] + E[{B(t) .JJ.2JJ. We will find PA .B(s)IB(u).B(sW] =B2(S) + E[B2(t = B2(S) + t s which verifies that B2(t) . O:s u:5 s] .3) . 0:5 u:5 s] + E[{B(t) .c 2 t/2}.t is a martingale We leave the verification that exp{cB(t) . let X(t) = B(t) + JJ-t and so {X(t).::= O} is Brownian motion with drift JJ.T. : P{X(T) = A} by making use of the martingale in Proposition 843(C).:: B2(S) + 2B(s)E[B(t) .382 BROWNIAN MOTION AND OTHER MARKOV PROCESSES (b) E[B2(t)IB(u). thus verifying the result of Eq uation (84.B. t 2: 0 is a martingale as an exercise Now.c 2tI2}. O:s u:5 s] == B2(S) + 2B(s)E[B(t) . =1 E[exp{cX(T) . O:s u:5 s] = E[{B(s) + B(t) - B(sWIB(u). define the stopping time T by T = min{t· X(t) = A or X(t) = -B}. Since this martingale has mean 1. t :. it follows from the martingale stopping theorem that E[exp{cB(T) . h) = xIX(O) Xii}' and so p(x. I. t. y) = E[p(x.BACKWARD AND FORWARD DIFFUSION EQUATIONS 383 If we now use the fact that {B(t).h. 
y) were actually a probability. X(h) = Xii} :: P{X(t . consider a Brownian motion process with drift coefficient JL and let p(x.JLT] JLE[Tl JLE[Tl B(l . y) denote the probability density of X(I). t > O} is a zero-mean martingale then. For instance suppose we want the density of the random variable X(I) The backward approach conditions on the value of X(h)-that is. y) = lim P{x < X(I) Ax_O < X + axlx(O) y}! ax. That is. X(h)}]. by the stopping theorem. The backward approach is to condition on X(h). y) = E[P{X(t) = xIX(O) = y. it looks all the way back to the process at time h. given X(O) = y. the backwards and the forwards technique. t . X(h»l. which is normal with mean JLh + Y and variance h.5 BACKWARD AND FORWARD DIFFUSION EQUATIONS The derivation of differential equations is a powerful technique for analyzing Markov processes There are two general techniques for obtaining differential equations.h). 0= E[B(T)] = E[X(T) = E[X(T)] APA - . Now P{X(t) = xIX(O) = y. As an illustration. where the expectation is with respect to X(h). Acting formally as if p(x. I. p(x. Assuming that we can expand the right-hand side of . we have p(x. I.PA ) Using the preceding formula for PA gives that 8. The forward approach conditions on X(t . I. p(x. t.. t. y) + o(h) ax h a2 .y) at a + ph-p(x.. X(I .Lh and variance h..5. y) + (X(h) h-p(x.2 p(x. Now P{X(t) = xIX(O) = y.p(x. t. y) = E [P(X. y) .. I. y) ax at ] ~ + 2 ax 2P (X. y) da [P(X. fw(x-a)da a2 p(x.p(x. I.384 BROWNIAN MOTION AND OTHER MARKOV PROCESSES the above in the Taylor series about (x.h . t.y) ay a + -2 . y) (a X)2 + (a x) J!.t.J. y) + J.1) is caned the backward diffusion equation.Lh -.. we thus have p(x.. t. I. y) = = f f fw(x . Letting its density be fw. y) . J = p(x.2 p(x. t. The forward equation is obtained by conditioning on X(I .y)+ . t. where W is a normal random variable with mean J. y) uX at a a + 2 .t. t. ay Dividing by h and then letting it approach 0 gives (8.. 1.y) at a y)-p(x.t. 1 a2 a a Equation (8.a)p(a.y) +-2 -2P(X. y) + o(h).L ay p(x. y) h-p(x.h) = a} P{X(h)::::: xIX(O) ::::: a} = P{W=x -a}. y).I.l. y) = at p(x. I.y)2 a p(x t y) a h2 a2 + (X(h) 2 2 ay2 " + .Y) ay at . y). I. y) .5 1) h a2 2 ay2 p(x. I. we obtain p(x.h). t.h.h p(x. t.t. when it enters state i. . which was first employed in obtaining the distribution of N( t) for a Poisson process.(t) by A. x) = hm h h_O That is.(t)/H.(t).6 ApPLICATIONS OF THE KOLMOGOROV EQUATIONS TO OBTAINING LIMITING DISTRIBUTIONS The forward differential equation approach.(i.A. x) is the probability density that the state at time tis (i.(t) dt + o(dt) We can analyze the semi-Markov process as a Markov process by letting the "state" at any time be the pair (i. is useful for obtaining limiting distributions in a wide variety of models This approach derives a differential equation by computing the probability distribution of the system state at time t + h in terms of its distribution at time t. x)(l .1 Semi-Markov Processes A semi-Markov process is one that. is A. are continuous and we define the hazard rate function A.6. x + h) . 8.2) Equation (8.1) Pl+h(i.(x)h) + o(h) . x). P.APPLICATIONS OF THE KOLMOGOROV EQUATIONS 385 Dividing by h and then letting it go to zero yields (85.: p.6. spends a random time having distribution H. and then lets t -. 00. For x > 0 we have that (8. If the time spent in state i is x. x) with i being the present state and x the amount of time the process has spent in state i since entering Let P {at time t state is i and time since} . and mean J-L. 
in that state before making a transition.(t) = h. then the transition will be into state j with probability p'J(x).h and x p. the first of which has previously been studied by other methods. (i.(l. is the density of H. i. where h. 8. We will now illustrate its use in some models. entering is between x . j ~ 0 We shall suppose that all the distributions H. Thus the conditional probability that the process will make a transition within the next dt time units.. given that it has spent t units in state i.52) is called the forward diffusion equation. (x).2) p(i. x) and integrating yields P(i.X) = -A.(x).O) or X)) = . p(i. )+o(h) h 1\.2)) = .(y) dY ) thus yields (8.X)A. x P t.(i. log ( p(i. x) = p(i.(X)P.(x) dx L p(j. we have :Xp(i. x) at time t and no transitions must have occurred in the next h time units Assuming that the limiting density p(i. LP(j.(x)~/(X)dX . (x) dx . X h And letting h - 0.. x) = p(i. Lp(j. x) = lim/_co P.6.1). O)H. since the process will instantaneously go from state (j.(x) = exp ( - t A. x) exists. we also have p(i. 0) exp ( - t A/(y) dY ) The identity (see Section 1.JX A.6 of Chapter 1) H. 0) with probability intensity A/x)~. In addition. 0) (from (8 6. by letting t _ 00. 0) L H. x + h) at time t + h the process must have been in state (i. Lh. Dividing by p(i.x). x) to state (i.x+h)-p(i'X)=_i() (. 0) =L = .(x)p(i. (x)Aj(x)~.(y) dy 0 p(i. we have from (86.386 BROWNIAN MOTION AND OTHER MARKOV PROCESSES since in order to be in state (i. Since L. it follows that.J 1T.APpLICATIONS OFTHE KOLMOGOROV EQUATIONS 387 Now hj(x)Pj.fJ-. x) dx = p(i. . and.' 1 L and so. O)fJ-.fJ-. then since p(i. which has transition probabilities Pj " is ergodic and has limiting probabilities 1T" i > 0.fJ-. If we now suppose that the Markov chain of successive states. from (8 6 5). (863) p(l. (864) P{state is i} = Jp(i. From (8 64) we note that (866) ""} P{ state IS l = 1T.(x) dx is just the probability that when the process enters state j it will next enter i Hence.(x) = ~ -L. 0) = Ix L p(j. H. the time already in that state has the equilibrium distribution of H. (865) p(l. fJ-.fJ-. from (862) and (863). L. we see that c==--1T.J 1T. calling that probability Pj " we have P(i. .(y) dy o fJ-. (from (862» (from (8 63». O)P j j . i > 0. P{state is i} = 1. 0). Thus the limiting probability of being in state i is as given by (8 66) and agrees with the result of Chapter 4. by integrating over x. satisfy the stationarity equations.X) • 1T. given that the state is i.fJ-. P{time in state ~ ylstate is i} = JY H.. fJ. = C1T. for some constant c.62) we obtain. 0) = C1T" all i From (8. ~ and. A(x)h)(1 .A(x»G(s. x) n=1 Differentiation yields -G(s.. x) with n denoting the number in the system at that time and x the amount of time the person being served has already been in service Letting p. x) exists. x)(1 . or if (b) the state at time tis (n 1.x) X '" L sn[(-A - = A(x»p(n. x) == '" L snp(n.x) == -(A + A(x»p(n.2 The M/G/l Queue Consider the MIG/I queue where arrivals are at a Poisson rate A. n=>1 (868) dxp(n. x) + >.x) = aX a n. x) = lim. and there is a single server whose service distribution is G. x)Ah + o(h). we have. x) + Ap(n l. x) + Ap(n -1._ '" P.I '" d L sn-d p(n. x) by G(s. The above follows since the state at time t + h will be (n.x)] (from (8 68» n=1 = (As A .x)+h' o(h) and upon letting h 0 l. x + h) if either (a) the state at time t is n.x). This model can be analyzed as a Markov process by letting the state of any time be the pair (n.6. x) denote the density of the state at time t. 
x) and integrating yields .p(n d Let us now define the generating function G(s. when n => 1. x) and there is a single arrival and no departures in the next h time units Assuming that the limiting density p( n.7) PI+h(n.C n.1.(n. x) Dividing both sides by G(s. and suppose that G is continuous and has hazard rate function A(t).388 BROWNIAN MOTION AND OTHER MARKOV PROCESSES 8. x + h) h pen x) ' ~ = -(A + A(x»p(n. X + h) p.Ah) + p/(n . we obtain from (867) pen. that (86.(n. x and in the next h time units there are no arrivals and no service completions. s)P(O) . where the last equality is a result of Equation (8. O) = G(s. 0).6.6. note that the equation for pen. we derive an equation for P(O) as follows: P{empty at t + h} = P{empty at t}(l . 0) 0::: Ji sn+ Ip(n + 1. x) dx. Substituting this back in (8 6 10).sA(1 . 0) = {Jpen + 1. O) = J(G(s. Thus n= I i sn+lp(n. x) dx + o(h).Ah) + JA(x)hpl1. n > 0.6. x)A(x) dx + P(O)A where P(O) = P{system is empty} n>l n = 1. 0) Je-A(I-s)xg(x) dx .10) sG(s. -4 Letting t -4 (8611) 00 and then letting h AP(O) 0 yields = JA(x)p(l.6.9). is Jpen + 1. where the above has made use of the identity To obtain G(s.9) G(s.10).s)) . To evaluate the second term on the right-hand side of (8. x))A(x) dx + s 2AP(0) = G(s. x) . x) = G(S. 0).O)G(A(l .sp(l.x)A(x) dx + s2AP(0).x)A(x) dx + s2AP(0) n= I or (8.APPLICATIONS OF THE KOLMOGOROV EQUATIONS 389 or (8. we obtain sG(s. x)A(x) dx p(n. O)e-A(I -S)xG(X).s J p(l. _ P(O)(l S)G(A(1 . G(A(l .s)P(O) . This yields 1= = '" 2: P{n in system} n=O P(O) lim (1_ S)O(A(1 .s») A(1 .s where O(s) f e-.J ~.e-A(I-s)y) dG( ) y _ G(s.s lim dd (1 P(O) $-1 s) S !~ ~ [O(A(l P(O) 1 . n=O G(A(l . G(s.s] 1) .O) A(1 . let s approach 1 in the above. e-A(I-S)J: rdG(y) dx = G(s.s» .AE[S] (by L'hopital's rule since 0(0) .G(A(l .6 9) as follows: n=l i snp{n in system} :::::: = = f.i:.s) Hence. 0) f.612) snp{n In system} = P(O) + sP(O)(l . 0) f.s» .390 BROWNIAN MOTION AND OTHER MARKOV PROCESSES or (8. 0) f. integrate (8. To obtain the marginal probability generating function of the number in the system. f~ e-A(I-S)J: dx dG(y) = G(s. x) dx G(s.s) fa> (1 0 .s» 5_1 G(A(1 .612) G(s 0) = .u dG(x) is the Laplace transform of the service distribution. e-A(I-S)J:G(x) dx G(s.::A(l .. from (8.s» .s» .s») _~~_~'--~i:.s L.s» O(A(l-s»-s To obtain the value of P(O).O)(1 G(A(l .. x) can be solved.8) disappears and so we end up with d dx p (l.x) = -(A + A(x))p(1. and. at least in theory We could then attempt to use p(2. For instance.O)e. we have (8.A£[S].613) Thus p(l.6 11). Solving this eq uation yields (8. x) to solve for p(3. and so on (2) It follows from (8614) that p(YIl). x).x).O) Je-AJ:g(x)dx. where £[S] =f x dG(x) is the mean service time Remarks (1) We can also attempt to obtain the functions pen. x) = p(l.x G(A) This formula may now be substituted into (86. we see that P( .14) p( 1 ) = Ae. using (8. 1 0) = A(1 ~ A£[S]) G(A) Finally. when n = I.A£[S].6.8) and the differential equation for p(2. the second right-hand term in (8. given a single customer .x)dx = p(1. we obtain AP(O) = p(1. x) recursively starting with n = 1 and then using (8.APPLICATIONS OF THE KOLMOGOROV EQUATIONS 391 or Po = 1 . O)G(A) Since P(O) == 1 .J:(1 -_A£[S])G(x) A .6 8) for the recursion. using (86 13).Ax G(x) JA(x)p(1.6. the conditional density of time the person being served has already spent in service. the above analysis requires that AE[S] < 1 for the limiting distributions to exist. is given by p(yI1) = J =:: -AYG( ) e _ y . 
if we are interested in a first passage time distribution.t) (3) Of course. are the successive claims.ILY . they . is a Poisson process with rate A.e. the conditional distribution is exponential with rate A + J. we have Hence. then the appropriate approach is by way of the forward equations. On the other hand. Generally.392 BROWNIAN MOTION AND OTHER MARKOV PROCESSES in the system. that the company always remains solvent That is. We are interested in the probability. the number of claims received by an insurance company by time t. of course.t and thus is not the equilibrium distribution (which. we wish to determine R(x) = P {x + t. whereas if a single claim is made.6. as a function of the initial capital x.3 A Ruin Problem in Risk Theory Assume that N(t). Suppose also that the dollar amount of successive claims are independent and have distribution G If we assume that cash is received by the insurance company at a constant rate of 1 per unit time. then it is usually the backward equation that is most valuable That is in such problems we condition on what occurs the first h time units 8. > 0 for all t} To obtain a differential equation for R(x) we shall use the backward approach and condition on what occurs the first h time units If no claims are made. then the company's assets are x + h.. is exponential with rate J. e-AYG(y) dy In the special case where G(y) 1 . then its cash balance at time t can be expressed as cash balance at t X =:: +t- ~ ~ I~ 1 Y " where x is the initial capital of the company and Y" i ~ 1.~ Y . if we are interested in computing a limiting probability distribution of a Markov process {X(t)}. Y)]Ah + o(h). S" the time of the ith shock Then the total shock value at time t. are assumed to be independent and identically distributed and {X" i 2:!: I} is independent of the Poisson process {N(t). we call {X(t). 8. and so R(x + h) ~ h R(x) = AR(x + h) AE[R(x +h Y)] + o(h) h Letting h -I> 0 yields R'(x) = AR(x) ~ AE[R(x Y)] or R'(x) = AR(x) A I: R(x y) dG(y) This differential equation can sometimes be solved for R. can be expressed as where a is a constant that determines the exponential rate of decrease When the X" i 2:!: 1.Ah) + E[R(x + h . We can compute the moment generating function of X(t) by first conditioning on N(t) and then using Theorem 23. call it X(t).7 A MARKOV SHOT NOISE PROCESS Suppose that shocks occur in accordance to a Poisson process with rate A.A MARKOV SHOT NOISE PROCESS 393 are x + h . R(x) === R(x + h)(1 . which states that . t 2:!: O}.1 of Chapter 2. the number of shocks by t. X" the value of the ith shock. let us denote by N( t). Hence. Associated with the ith shock is a random variable X" i 2:!: 1. t 2:!: O} a shot noise process A shot noise process possesses the Markovian property that given the present state the future is conditionally independent of the past.Y. which represents the "value" of that shock The values are assumed to be additive and we also suppose that they decrease over time at a deterministic exponential rate That is. .. where X(s) has the same distribution as X(s) and is independent of X(t). X(s) is the contribution at time t + s of events arising in (t. the unordered set of arrival times are independent uniform (0. e-a(r. Yar[X(t)J == AE[X2](1 .asAE[X 2J(1 - e. t + s) Hence. n~O n '" == exp { A J~ [q.2a ')/2a 00 The limiting distribution of X(t) can be obtained by letting t _ This gives lim E[exp{sX(t)}] == exp {A in (871). 
we use the representation X(t + s) = e-a'X(t) + X(s).ay ) - 1] dY} The moments of X(t) can be obtained by differentiation of the above.394 BROWNIAN MOTION AND OTHER MARKOV PROCESSES given N(t) = n. Hence. E[X(t)] == AE[X](1 ~ To obtain Cov(X(t). t) random variables Continuing the equality and using independence gives E[exp{sX(t)}IN(t) = n] == (E[exp{sX. /_'" J'" [4>(se. X(t + s».Ar .ay ) ~ 1] dY } 0 .2ar )/2a. t) random variables This gives E[exp{sX(t)}IN(t) = n] = E [exp {s Ita x.e-a(r-u. (871) E[exp{sX(t)}J =L (At)n fre .e. where VI. That is. where 4>(u) = E[e"xJ is the moment generating function of X.(se. X(t + s)) = e-asYar(X(t)) = e. Vn are independent uniform (0. and we leave it for the reader to verify that (872) e-a')/a.j J. Cov(X(t).U')}J)n = [t 4>(se-ay ) dY1tr == fr. _'" = exp {A J'0" ( O~se 0 dx A =exp { ~ IS . c/J(u) and so. and 0 Hence the limiting density of X(t). ~ O<y< 00. O-s But (0/(0 .s))Ala is the moment generating function of a gamma random variable with parameters A/a.A MARKOV SHOT NOISE PROCESS 395 Let us consider now the special case where the X.. Let us suppose for the remainder of this section that the X. 0 events in (t ~ s. lim E[exp{sX(t)}j = -n-' u. Calling this random variable A(t). when the X.3) Be-8y( Oy )Ala .) . t)} h-O P{y < X(t) < Y + h} = lim f(yea')ea'he. are exponential random variables with rate O. Hence. we have (8. we see that the conditional hazard rate . the time since the last Poisson event prior to t.1 fey) ~ r(A/a. From (874). is (8. oo~x = -a - y 1) dY } (_0 )Ala. are exponential with rate 0.00 Suppose that X(t) = Y An interesting computation is to determine the distribution of the time since the last increase-that is.-} a. in this case.U o .As + o(h) h_O f(y)h + o(h) I)} = exp{~Oy(eaS ~ It should be noted that we have made use of the assumption that the process is in steady state to conclude that the distributions of X(t) and X(t ~ s) are both given by (873). are indeed exponential with rate 0 and that the process is in steady state.7. we can either imagine that X(O) is chosen according to the distribution (87 3) or (better yet) that the process originated at t = .4) P{A(t) > sIX(t) = = h_O = y} lim P{A(t) > sly < X(t) < Y + h} lim P{ye as < X(t ~ s) < (y + h )e a.7. For this latter requirement. X(t" + s) have the same joint distribution In other words.396 BROWNIAN MOTION AND OTHER MARKOV PROCESSES function of A(t)-given X(t) = y. A(Oly) = £Jay) and increases exponentially as we go backwards in time until an event happens. is independent of X(t) and is exponential with rate A). It should be noted that this differs markedly from the time at t until the next event (which. call it A(sly)-is given by tis P{A(t) A(sly) d <: sIX(t) y} = P{A(t) > sIX(t) = y} From this we see that given X(t) y. where L > 0 is a fixed constant and {N(t). .) and X(t 1 + s).:::: O} when X(t) N(t + L) .:::: O} is said to be a stationary process if for all n. of course.:::: O} when X(t) is the age at time t of an equilibrium renewal process (iii) {X(t).:::: O} when P{X(O) j} PI' j::> 0. t .:::: O} are the stationary probabilities. t. j . and so we define the process {X(t). s.::: O} to be a second-order stationary. t . X(t + s)) does not depend on t That is. t ::> 0. or a covariance stationary. t l . process if E[X(t)] = c and Cov(X(t).8 STATIONARY PROCESSES A stochastic process {X(t).N(t). the ensuing process has the same probability law. . Some examples of stationary processes are' (i) An ergodic continuous-time Markov chain {X(t). (ii) {X(t). . 
the hazard rate of the backwards time to the last event starts at t with rate £Jay (that is. t ::> O} is a Poisson process having rate A The first two of the above processes are stationary for the same reason: they are Markov processes whose initial state is chosen according to the limiting state distribution (and thus they can be thought of as ergodic Markov processes that have already been in operation an infinite time) That the third examplewhere X(t) represents the number of events of a Poisson process that occur between t and t + L-is stationary follows from the stationary and independent increment assumption of the Poisson process The condition for a process to be stationary is rather stringent. a process is second-order stationary (a further name sometimes seen in the literature is weakly stationary) if the first two moments of X(t) are the same for all t and the covariance between X(s) and X(t) depends only . . . where {PJ . 8. . a process is stationary if choosing any fixed point as the origin. t .. t .. t" the random vectors X(t l ). X(t.. _I + Z" and so = " 2: A"-tA. Z..8. where A2 < L Define (8.. let R(s) = Cov(X(t). be uncorrelated random variables with £[Z. we see that {X".. there are many examples of second-order stationary processes that are not stationary.] = 0.sl. . state at time n Iterating (8.:: 1.2) yields X" = A(AX'''_2 + Z. Xes + t)).] = 0..8.:: O} is weakly stationary ..._I + Z._z + AZ. However. Let Zo.8(A) n 0 n .) is a constant multiple of the 1 plus a random error term (Z.) where the above uses the fact that Z.' n2L The process {X". For a second-order stationary process. X" = AX'..+m . n . and 8._l) + 2 z" = A X.=0 tCov(Zt. n .. it follows that a second-order stationary Gaussian process is stationary.:: O} is called a first-order auto-regressive process It says that the state at time n(X.STATIONARY PROCESSES 397 on It . n 2 0..)..1) (8... and Z/ are uncorrelated when i ~ j Since £[X. ZI> Zz.8.. EXAMPLE An Auto Regressive Process. ..2) Xo = Zo. As the finite-dimensional distributions of a Gaussian process (being multivariate normal) are determined by their means and covariances. L)2J ~ 0 if.)2] = 0 if.8(a) A Moving A verage Process.-is_n---. l l PROPOSITION 8.p.1 Let {Xn . W l . and Vn = ~. Xn + I)' and let X nS ~.L.i) n .m)(1"2 k +12 0 ( ) (k if 0 s m s k ifm >k Hence. Let W o. if ever.In and suppose that ~. n :2!: 0. L = 1 R(i)/ n ~ O. which at each time keeps track of the arithmetic average of the most recent k + 1 values of the W's. does Xn == L = 1 X._ _ n2 22:2: R(j . and covanance function R(i) = Cov(Xn.] n I-I I<JSn = R(O) + _'<-. n :2!: 1} be a second-order stationary process with E[Xn] = J. . n :2!: k} is a second-order stationary process Let {Xn. we see that _ { Cov(Xn . and for some pcsitive integer k define ExAMPLE The process {Xn . + 2 2:2: y./ n Then n_'" lim £[(Xn _p. n >0. and only if.8.L will converge to 0 if.n= I Y. That is. W 2.L and Var(Wn) = (1"2.n_ I X.J. Proof Let Y .L The n following proposition shows that E[(Xn . {Xn . the expected sq uare of the difference between X nand J. n :2!: k}. are uncorrelated. and only if.In converge to J. X n + m ) - + 1 .Y. and only if. = X. is called a moving average process Using the fact that the Wn .n= I R(i)/n _ 0 We want to show that this implies that £[ Y~] _ 0 Now £[Y~] =~ £ [~ Y.398 BROWNIAN MOTION AND OTHER MARKOV PROCESSES (the definition for a discrete-time process is the obvious analogue of that given for continuous-time processes) 8. n 2:: I} be a second-order stationary process having mean p. 
the limiting average value of R(i) converges to O. be uncorrelated with E[WnJ = J. An n important question is when. where tl < s < t2 8. which shows that ~.PROBLEMS 399 We leave it for the reader to verify that the right-hand side of the above goes to 0 when ~~ R(i)/n .. 8. Let W(/) = X(a 2 1)/a for a> O. Compute the conditional distribution of Xes) given that X(tl) X(1 2) = B.-. O} denote a Brownian Bridge process Show that if X(/) = (I + 1)2(II(t + 1». t . then {X(I). and 8.3. I :> = A.::: O} denote a Brownian motion process..SI . What is the distribution of Yet)? Compute Cov(Y(s). E[YnE[Y~]. 82. yet»~ Argue that {yet)..4.3.)]2 ::.. 00 The reader should note that the above makes use of the Cauchy-Schwarz inequality. 8.-. t ..~ RO)/n. Let (a) (b) (c) (d) yet) = tX(Vt). 0.>]2 n n. suppose that E[Y!]. E[X2]E[y2] (see Exercise 828) PROBLEMS In Problems 8 1.-. let {X(t). then (~I R(i))2 = [1.1. ±COV(YI' y.. 0 To go the other way. Let {2(t).::: O} is also Brownian motion Let T inf{t > 0: X(t) O} Using (c) present an argument that P{T = O} = 1..-. Verify that Wet) is also Brownian motion 8. which states that for random variables X and Y (E[Xy])2 ::.-0 [Cov(Y!> Yn)P = [E(YI Y. I 2: O} is a Brownian motion process .2.n.. 0 as n . ± 1. (b) P{T) < s. a> 0 8.11. t 2= O} denote a birth and death process that is allowed to go negative and that has constant birth and death rates An == A.n == p" n = 0. Let T) denote the largest zero of X(t) that is less than t and let T2 be the smallest zero greater than t Show that (a) P{T2 < s} = (2/1T) arc cos ~.400 BROWNIAN MOTION AND OTHER MARKOV PROCESSES 8. t 8.tn (a) Prove that a necessary and sufficient condition for a Gaussian process to be stationary is that Cov(X(s).1. Characterize. t ~ O} be Brownian motion and define Show that {Vet). Find the distribution of (a) IX(t)1 (b) ~ O} denote Brownian motion Io~!~( Xes) I Osss( (c) max Xes) . let {X(t).a. s > t.7. A stochastic process {X(t). ±2.10. 8.6. (b) Let {X(t). in the manner of Proposition 81.9. Compute the density function of Tn the time until Brownian motion hits x 8. P. s < t < y 8.X(tn + a) for all n. T2 > y} = (2/1T) arc sine -VS. . {X(t). Let M(t) = maxOSsSt Xes) and show that P{M(t) > aIM(t) = X(t)} = e. ". t :> u} converges to Brownian motion as A ~ 00 In Problems 87 through 812. Let {X(t). and E[X(t)] = c. Verify the formulas given in (834) for the mean and variance of IX(t)1 .X(t). s ~ t.. X(tn) has the same joint distribution as X(t) + a).5. t ~ O} is said to be stationary if X(t)).8. Suppose X(1) = B.12. Define p. and e as functions of A in such a way that {eX(t).s. 0 -< t ~ I} given that X(1) = B 8..t 1 .Y. t :> O} is a stationary Gaussian process It is called the Ornstein-Uhlenbeck process. X(t)) depends only on t . 8.al12t . . x > O. (c) What is the relationship between g(x). and g(x + y) in (b)? (d) Solve for g(x).PROBLEMS 401 8.20. 8.L < 0. A > 0. show that for x > ° 8. for Brownian motion with drift J.L. (a) Derive a differential equation for f(x). which is not allowed to become negative. (b) For J.L as t. g(y). t ~ O} be a Brownian motion with drift coefficient J.19.L.L --.18. 8. Let To denote the time Brownian motion process with drift coefficient J. (c) Use a limiting random walk argument (see Problem 422 of Chapter 4) to verify the solution in part (b). Let plx) denote the density function of XI' (a) Compute a differential equation satisfied by PI(X) (b) Obtain p(x) = lim /_ PI(X) OJ 8. B > 0. In Example 8.L hits a (a) Derive a differential equation which f(a. 
let g(x) = Var(TJ and derive a differential equation for g(x).L > 0. For a Brownian motion process with drift coefficient J. Find the limiting distribution of X(t).4(B). where A > 0.16. (e) Verify your solution by differentiating (84. -B < x < A. with probability 1. Consider Brownian motion with reflecting barriers of -Band A. t} satisfies. 8. t) == P{To ::s.-7 X(t) t J. J. Let {X(t).11) 8.14. Let Tx denote the time until Brownian motion 'hits x Compute 8. suppose Xo = x and player I has the doubling option Compute the expected winnings of I for this situation.L let f(x) = E[time to hit either A or -Blxo = x].13.17. For Brownian motion with drift coefficient J. Prove that. B > O.-7 00 .15. (b) Solve this equation. Show that R(') 2:.N(t). In Problem 816. By using the trigonometric identity cos x cos y = Hcos(x + y) + cos(x .402 BROWNIAN MOTION AND OTHER MARKOV PROCESSES 8. Verify Equation (8. t ~ O} is a martingale with mean 1. 8. when Y(t) = exp{cB(t) .21. Let Ube uniformly distributed over (-1T. 8. and let Xn = cos(nU)..24.26.27. t . verify that {Xn .y)].r) 2r2r Vim satisfies the backward and forward diffusion Equations (8. Show that p(x. show that for any s > 0 n -I 1=0 2: p{IX n - J.L for which L.8 1 8.2).c 2 t12} 8. t is a Poisson process .0 asn- 00 .-0 n n 1 implies .LI > s}. 8. Verify that if {B(t).=1 2:2: R( J.7.' Start with the in~uahtY 21xYI '$ x 2 xrV E[X2] for x and YI E[y2] for y) + y2 and then substitute 8.5.25.29.:::= O} is stationary when {N(t)} 8.23. 8. For a second-order stationary process with mean J.22. find Var(Ta) by using a martingale argument. 1T).:::= O} is standard Brownian motion then {Y(t).n:-ol R(i)1 n _ 0. Prove the Cauchy-Schwarz ineq uality: (Hint. Verify that {X(t) = N(t + L) .1) and (852). n > I} is a second-order stationary process. ~ ') -0 n I r<J<n thus completing the proof of Proposition 8. t.28.y) 1 == __ e-(x-y-. Time Series Analysis-Forecasting and Control. Cox and H D. Wiley. Box and G Jenkins.• Academic Press. 2nd ed . . Methuen. A First Course in Stochastic Processes. 1968 2 G. Theory of Stochastic Processes. 4 H Cramer and M Leadbetter. 1975. Wiley. 5 S Karlin and H. New York. P Billingsley. Miller. Stationary and Related Stochastic Processes. New York. the interested (and mathematically advanced) reader should see Reference 1 Both References 3 and 5 provide nice treatments of Brownian motion The former emphasIZes differential equations and the latter martingales in many of their computations For additional material on stationary processes the interested reader should see Reference 2 or Reference 4 1. San Francisco. 1966.REFERENCES 403 REFERENCES For a rigorous proof of the convergence of the empincal distnbution function to the Brownian Bndge process. 1965. 1970 3 D R. HoldenDay. Taylor. Convergence of Prohahilily Measures. London. New York. in Section 92. that the n step transition probabilities in a finite-state ergodic Markov chain converge exponentially fast to their limiting probabilities In Section 9 3 we consider hazard rate orderings. and. written X ~Sl Y. if (9. in Section 96. in fact. some stochastic monotonicity properties of birth and death processes. having interarrival distributions that are decreasing failure rate. where we introduce the coupling approach and illustrate its usefulness. we use coupling to establish. we use it to prove Blackwell's theorem ofrenewal theory when the interarrival distribution is continuous Some monotonicity properties of renewal processes. 
(ii) a renewal process and a Poisson process (Section 962).1.1 STOCHASTICALLY LARGER We say that the random variable X is stochastically larger than the random variable Y. are also presented In Section 9 4 we consider likelihood ratio orderings In Section 9.CHAPTER 9 Stochastic Order Relations INTRODUCTION In this chapter we introduce some stochastic order relations between random variables. In Section 9 1 we consider the concept of one random variable being stochastically larger than another Applications to random variables having a monotone hazard rate function are presented We continue our study of the stochastically larger concept in Section 92.2.7 we consider associated random variables 9.1) 404 P{X> a} ~ P{Y > a} for all a. we present applications of this to comparison of (i) queueing systems (Section 961). . In Section 9. and to prove.5 we consider the concept of one random variable having more vanability than another. which are stronger than stochastically larger orderings. and (iii) branching processes. In particular. between random variables We show how to use this idea to compare certain counting processes and. in Section 92 1. from Lemma 91 1. z- {~ X-::5 Y. then E[X] ~ E[Y] Proof Assume first that X and Yare nonnegative random variables Then E[X] = J: PiX> a}da ~ J: P{Y> a}da Z Z+. E[f(X)] ~ E[f(Y)] .1. PROPOSITION 9.Z-.STOCHASTICALLY LARGER 405 If X and Yhave distributions F and G. then P{f(X) > a} PiX > f-I(a)} ~ P{Y > rl(a)} = P{f(Y) > a} Hence.\ ifZ~O ifZ<O Now we leave it as an exercise to show that X~ t.1) is equivalent to F(a):> G(a) for all a Lemma 9.1. The next proposition gives an alternative definition of stochastically larger. then (9.f Y=>X+- ~ st Y\ Hence. f(X) ~'I f(Y) and so.! Y ~ E[f(X)] ~ E[f(Y)] for all increasing functionsf Proof Suppose first that X Y and let f be an increasing function We show that f(X) ~'t f(Y) as follows Letting f-l(a) = infix f(x) ~ a}. respectively.2 X ~ . = E[Y] In general we can write any random variable Z as the difference of two nonnegative random variables as follows where ifZ ~ 0 ifZ<O.1 If X ~S! Y.1.. 406 STOCHASTIC ORDER RELATIONS Now suppose that E(f(X)] 2= E(f(Y)] for all increasing functions f. we see that X is IFR (DFR) means that the older the item is the more (less) likely it is to fail in a small time dl. F(I) We say that X is an increasing failure rate (IFR) random variable if A(I) i I. E[f. Suppose now that the item has survived to time I and let XI denote its additional life from I onward. For any a let f.(X)] = P{X> a}. I + dl).3 X is IFR ~ XI is stochastically decreasing in t. then since A(I) dl is the probability that a I-unit-old item fails in the interval (I. Let X be a nonnegative random variable with distribution F and density f Recall that the failure (or hazard) rate function of X is defined by A(I) = 1S 1) .(Y)] = P{Y> a}. XI will have distribution ~ given by (9 1. X is D FR ~ XI is stochastically increasing in t . PROPOSITION 9..l>aIX>I} = F(I + a)/F(I). then E[f. If we think of X as the life of some item.1(A) Increasing and Decreasing Failure Rate.2) FI(a) = P{XI > a} = P{X.1.. and we see that X 2=.( Y ExAMPLE 9. and we say that it is a decreasing failure rate (DFR) random variable if A(I) ~ I.. denote the increasing function ifx> a if x Sa. be argued as follows Ar(a) = lim P{a < X. 0 < ex. for some distribution G.) Mixtures occur when we sample from a population made up of distinct types. where A is the hazard rate function of X Equation (9 1. F(x) = J. if. 
then its distribution of remaining life is still a mixture of the two exponential distributions. This is so since its remaining life will still be exponential with rate AI if it is a type1 item or with rate A] if it is a type-2 item However. where AI < A2 • To show that this mixture distribution is DFR. The value of an item from the type characterized by ex. G is the distribution of the characterization quantities Consider now a mixture of two exponential distributions having rates AI and A2 .STOCHASTICALLY LARGER 407 Proof It can be shown that the hazard rate function of Xr-call it AI-is given by (913) Ar(a) A(I + a). has the distribution Fa. then "P1(s) is decreasing (increasing) in 1 Similarly.3) can be formally proven by using (9 1 2). < a + hlXr2: a}/h h-O = lim P{a < X h-O 1< a + hlX 2: I. FaCx) dG(ex. the probability that it . if F.(s) is decreasing (increasing) in t. X 12: a}/h = lim P{I h-O + a < X < 1 + a + hlX 2: 1 + a}/h = A(I + a) Since (914) "p/(S) exp {- = exp { 1: A/(a) da} t"S A(y) dY }. or can. then (914) implies that A(y) is increasing (decreasing) in y Thus the lifetime of an item is IFR (DFR) if the older the item is then the stochastically smaller (larger) is its additional life A common class of DFR distributions is the one consisting of mixtures of exponentials where we say that the distribution F is a mixture of the distributions Fa. < 00. it follows that if A(y) is increasing (decreasing). note that if the item selected has survived up to time t. more intuitively. 408 STOCHASTIC ORDER RELATIONS is a type-1 item is no longer the (prior) probability p but is now a conditional probability given that it has survived to time t. then F is DFR. In fact. life> t} l P{life > t} Since the above is increasing in t.(t) dG(a) F(t) Proof d dt F(t) AAt)=---- F(t) . the older the item is. its probability of being a type-l itern is P{t ype > ll rf e t }::.: P{type 1. provided the integrals exist We may now state the following.1.4 The Cauchy-Schwarz Inequality ~ For any distribution G and functions h(t). It turns out that the class of DFR distributions are-closed under mixtures (which implies the above since the exponential distribution.1. /Xl). is a DFR distribution for all 0 < a < /Xl and G a distribution function on (0. t 0.5 If Fa. it follows that the larger t is the more likely it is that the item in use is a type 1 (the better one. is both IFR and DFR). as it has a constant hazard rate function. since AI < A2 ) Hence. To prove this we need the following well-known lemma whose proof is left as an exercise. the less likely it is to fail. where F(t) = J. Lemma 9. k(t). PROPOSITION 9. and thus the mixture of exponentials is DFR. Fa(t) dG(a) J: fa. . implying which proves (9 1 6) and established the result (The above also shows f~(/) so k(a) :5 (-f~(/»112 was well defined) ~ 0.2 COUPLING If X ~5[ Y..~(/) dl Fa (I) = Fa(/)f~(/) + f~(/) F~(/) .' ..P(/) ff~(/) (J r Since F(/) = fFa(/) dG(a).... by assumption. and applying the Cauchy-Schwarz Hence to prove (9 1 5) it suffices to show (916) Now Fa is.. then there exists random variables X* and y* having the same distributions of X and Y and such that X* is.' .. and 9..t:. we need to show By letting h(a) = (Fa (/»112. DFR and thus o=:!!!. with probability 1. we see = (-f~(/»II2.COUPLING 409 We will argue that AA/) decreases in I by first assuming that all derivatives exist and then showing that (d/dl)AA/) ~ 0 Now d dl F(/) dG(a) + . k(a) inequality..... it follows from the above that to prove that (d/dl)AF(/) :s 0.. 
at least as large as y* Before proving this we need the following lemma ..(/) dG(a) [AF(/) J = ..= . ~Sl V. then for any increasing f f(X I " . known as coupling.22 to generate independent Yi. Proof Let XI. then there exists random variables X and Y having distributions F and G respectively such that P{X ~ Y} = 1 Proof We'll present a proof when F and G are continuous distribution functions Let X have distribution F and define Y by Y = G-1(F(X)) Then by Lemma 9 2 1 Y has distribution G But as F ~ G. by some examples EXAMPLE 9. YII be independent If X. Y:. ' . where Y.2.2(A} Stochastic Ordering of Vectors. .XII)~f(YIo' 51 " YII ). ..1.1 Let F and G be continuous distribution functions If X has distribution F then the random variable G -1(F(X)) has distribution G Proof P{G-l(F(X)) ~ a} = P{F(X)::S G(a)} = P{X:5 F-l(G(a))} = F(F-1(G(a)) = G(a) PROPOSITION 9..* has the distribu- . and (ii) Y -< X We illustrate this method.2. it follows that F..1 ~ G. which proves the result Oftentimes the easiest way to prove F :> G is to let Xbe a random variable having distribution F and then define a random variable Y in terms of X such that (i) Yhas distribution G.410 STOCHASTIC ORDER RELATIONS Lemma 9. XII be independent and use Proposition 9. and so Y = G-I(F(X)) ::s F-l(F(X)) = X.2 If F and G are distributions such that F(a) 2" G(a).. XII be independent and YI . .. Let XI. . COUPLING 411 • tion of Y, and Y; :s X, Then !(XI , since! is increasing Hence for any a !(Yt, ,XII) ~ !(Yt,. ,Y:) and so P{f(Yt,. ,Y:) > a} <: P{f(X I , • ,XII) > a} Since the left-hand side of the above is equal to P{f(YI. YII) > a}, the result follows. Stochastic Ordering of Poisson Random Variables. We show that a Poisson random variable is stochastically increasing in its mean Let N denote a Poisson random variable with mean A For any p, 0 < P < 1, let II. /2,. be independent of each other and of N and such that EXAMPLE 9.2(8) / = , {I with probability p with probability 1 - P 0 Then L~ ,-I N is Poisson with mean Ap (Why?) Since the result follows We will make use of Proposition 9 1 2 to present a stochastically greater definition first for vectors and then for stochastic processes Definition We say that the random vector K = (X h random vector Y = (Y 1 , ,Yn ), written X £[f(X)] 2: Xn) is stochastically greater than the 2:" 2::: if for all increasing functions 1 , £[/(2:::)] 4U We say that the stochastic process {X(t), t O} if ~ STOCHASTIC ORDER RELATIONS O} is stochastically greater than {yet), t;::: for all n, t l , . ,tn' It follows from Example 9.2(A) that if X and Yare vectors of independent components such that X, ;::::51 }'., then X >51 Y. We leave it as an exercise for the reader to present a counterexample when the independence assumption is dropped. In proving that one stochastic process is stochastically larger than another, coupling is once again often the key. EXAMPLE 9.2(c) t ~ O}, i Comparing Renewal Processes. Let N, = {N,(t), = 1,2, denote renewal pr~esses having interarrival distri~ butions F and G, respectively. If F 51 G, then {M(t), t ~ O} :::; {N2(t), t ~ O}. To prove the above we use coupling as follows. Let XI, X 2 , ••• be independent and distnbuted according to F. Then the renewal process generated by the X,-call it N'( -has the same probability distributions as N I • Now generate independent random variables YI , Y2 , ••• having distribution G and such that Y, :5 X,. 
Then the renewal process generated by the interarrival times Y,-call it Nt -has the same probability distributions as N 2 • However, as Y, :5 X, for all i, it follows that N('(t) which proves the result. Our next example uses coupling in conjunction with the strong law of large numbers. S Ni(t) for all t, Let XI. Xl,. . be a sequence of independent Bernoulli random variables, and let P, = P{X, = I}, i ~ 1 If P, > P for all i, show that with probability 1 EXAMPlE 9.2(D) lim inf 2: XJn ~ p. If If ,=1 COUPLING n 413 (The preceding means that for any s > 0, only a finite number of n.) 2: X/n i=1 :::;; P e for Solution We start by coupling the sequence X I1 i :> 1 with a sequence of independent and identically distributed Bernoulli random variables Y" i ~ 1, such that P{YI = 1} = P and XI ~ YI for all i. To accomplish this, let V" i ~ 1, be an independent sequence of uniform (0, 1) random variables. Now, for i = 1, , n, set x, G if VI <: PI otherwise, if V, <:p and otherwise Since P :::;; PI it follows that Y/ :::;; X" Hence, lim inf n 2: X/n ~ lim inf 2: YJn. 1=1 n 1=1 n n But it follows from the strong law of large numbers that, with probability 1, lim inf n 2: YJn ,wi n = p. ExAMPLE 9.2(E) Bounds on the Coupon Collector's Problem. Suppose that there are m distinct types of coupons and that each coupon collected is type j with probability PI' j = 1, . , m. Letting N denote the number of coupons one needs to collect in order to have at least one of each type, we are interested in obtaining bounds for E[N]. To begin, let ii' . ,im be a permutation of 1, , m Let TI denote the number of coupons it takes to obtain a type ii, and for j> 1, let ~ denote the number of additional coupons after having at least one of each type ii' .. ,i,-l until one also has at least one of type il (Thus, if a type i, coupon is obtained before at least one of the types ii, . , ir I, then ~ = 0, and if not then T is a geometric random variable with parameter P,) Now, I j and so, (9.2 1) E[N] 2: P{i is the last ofit. j m . , iJI P I( 1=1 Now, rather than supposing that a coupon is collected at fixed time points, it clearly would make no difference if we supposed that 414 STOCHAST[C ORDER RELATIONS they are collected at random times distributed according to a Poisson process with rate 1. Given this assumption, we can assert that the times until the first occurrences of the different types of coupons are independent exponential random variables with respective (for a type j coupon) rates Pi' j = 1, .. ,m. Hence, if we let Xi' j 1,. ,m be independent exponential random variables with rates 1, then X/ p) will be independent exponential random variables with rates Pi' j = 1, .. , m, and so P{i) is the last of ii, . , i) to be collected} I (9.2.2) = P{X,IP, J I = max(X,IP,1 , . . , X,IP, )}. 1 I :5. Now, renumber the coupon types so that PI will use a coupling argument to show that P2 < .. :5. Pm We (9.2.3) (9.24) P{j is the last of 1, .. ,j} < Ilj, P{j is the last of m, m - 1, . . ,j}:> 1/(m - j + 1). To verify the inequality (9.2.3), note the following' P{ j is the last of 1, .. ,j to be obtained} = P{X/ PI = max X/ P,} IS'''''i !5 P{X/ p) = IS,s) XJ p)} max since P, :5. p) l/j. Thus, inequality (9.2.3) is proved. By a similar argument (left as an exercise) inequality (9.2.4) is also established. Hence, upon utihzing Equation (9.2.1), first with the permuta, m (to obtain an upper bound) and then with m, tion 1, 2, , 1 (to obtain a lower bound) we see that m 1, Another lower bound for E[N] is given in Problem 9.17. 
9.2(F) A Bin Packing Problem. Suppose that n items, whose weights are independent and uniformly distributed on (0, 1), are to be put into a sequence of bins that can each hold at most one unit of weight. Items are successively put in bin 1 until an item is reached whose additional weight when added to those presently in the bin would exceed the bin capacity of one unit. At that point, bin 1 is packed away, the item is put in bin 2, and the process continues. Thus, for instance, if the weights of the first four items ExAMPLE COUPLING 415 are 45, 32, .92, and .11 then items one and two would be in bin 1, item three would be the only item in bin 2, and item four would be the initial item in bin 3 We are interested in £[8], the expected number of bins needed. To begin, suppose that there are an infinite number of items, and let N, denote the number of items that go in bin i Now, if W, denotes the weight of the initial item in bin i (that is, the item that would nut fit in bin i-I) then (9.25) d N, = max{j W, + VI + . . + V/_ I d <: I}, where X Y means that X and Y have the same distribution, and where VI> V h . is a sequence of independent uniform (0, 1) random variables that are independent of W. Let A.- J be the amount of unused capacity in bin i-I, that is, the weight of all items in that bin is 1 - A,_I Now, the conditional distribution of W" given A.-I, is the same as the conditional distribution of a uniform (0, 1) random variable given that it exceeds A,_I' That is, P{W. > xlA,-d = P{V > xlv> A.-I} where V is a uniform (0, 1) random variable As P{V > xlv> A-I} > P{V > x}, we see that W. is stochastically larger than a uniform (0, 1) random , N._ 1 variable. Hence, from (9.2.5) independently of N 1 , (926) N,<:max{j VI ,[ + ... + Vl sl} Note that the right-hand side of the preceding has the same distribution as the number of renewals by time 1 of the renewal process whose interarrival distribution is the uniform (0, 1) distribution. The number of bins needed to store the n items can be expressed as However, if we let X" i :> 1, be a sequence of independent random variables having the same distribution as N(1), the number of renewals by time 1 of the uniform (0, 1) renewal process, then from (92.6) we obtain that 82: N, '[ 416 where N== min STOCHASTIC ORDER RELATIONS {m. i XI:> I~I n}. But, by Wald's equation, E [~XI] = E[N]E[XI]. Also, Problem 37 (whose solution can be found in the Answers and Solutions to Selected Problems Appendix) asked one to show that E[XI ] N = E[N(1)] = e - 1 and thus, since L X, ~ n, we can conclude that ,~1 E[N] 2:-. n e-1 Finally, using that B :> Sl N, yields that E[B]~-. n e- 1 If the successive weights have an arbitrary distribution F concentrated on [0, 1] then the same argument shows that E[B] ~ m(1) n where m(1) is the expected number of renewals by time 1 of the renewal process with interarrival distribution F. 9.2.1 Stochastic Monotonicity Properties of Birth and Death Processes Let {X(t), t ~ O} be a birth and death process. We will show two stochastic monotonicity properties of {X(t), t :> O}. The first is that the birth and death process is stochastically increasing in the initial state X(O). COUPLING 417 PROPOSITION 9.2.3 {X(t), t ;;:: O} is stochastically increasing in X(O) That is, E[f(X(td. 
X(O) = i] is increasing in i for all t), , tn and increasing functions f ,X(tn»1 Proof Let {X)(t), t ~ O} and {X2(t), t ~ O} be independent birth and death processes having identical birth and death rates, and suppose X1(0) i + 1 and X 2(O) = i Now since X)(O) > X 2(0), and the two processes always go up or down only by 1, and never at the same time (since this possibility has probability 0), it follows that either the X1(/) process is always larger than the X 2(/) process or else they are equal at some time. Let us denote by T the first time at which they become equal That is, oo T- if Xt(t) > X 2(t) foraHt X\(/) { 1stt. = X 2(t) otherwise Now if T < 00, then the two processes are equal at time T and so, by the Markovian property, their continuation after time T has the same probabilistic structure Thus if we define a third stochastic process-call it {X3(t)}-by ift < T ift;;:: T, then {X3(t)} will also be a birth and death process, having the same parameters as the other two processes, and with X 3 (0) == X 1(0) = i + 1 However, since by the definition of T for t < T, we see for all which proves the result I, Our second stochastic monotonicity property says that if the initial state is 0, then the state at time t increases stochastically in t. PROPOSITION 9.2.4 P{X(t) ~ jIX(O) = O} increases in t for all j 418 Proof For s < t STOCHASTIC ORDER RELATIONS P{X(t) ~ iIX(O) = O} == = 2: P{X(t) ~JIX(O) = 0, X(t 2: P{X(t) ~ ilX(t 2: P{X(s) ~ iIX(O) s) == i}P{X(t s):::: iIX(O) = O} s) = i}Po,(t - s) s) (by the Markovian property) = 2: P{X(s) ~ iIX(O) = i} Po,(t , ~ O} Po,(t - s) (by Proposition 9 23) == P{X(s) ~ iIX(O) :::: O} 2: POt(t I s) = P{X(s) ~ iIX(O) == O} Remark Besides providing a nice qualitative property about the transition probabilities of a birth and death process, Proposition 9.2 4 is also useful in applications. For whereas it is often quite difficult to determine~explicitly the values POj(t) for fixed t, it is a simple matter to obtain the limiting probabilities Pj' Now from Proposition 924 we have P{X(t) :> jIX(O) = O} <:: lim P{X(t) 2: jIX(O) = O} 1-"" which says that X(t) is stochastically smaller than the random variable-call it X( 00 )-having the limiting distribution, thus supplying a bound on the distribution of X(t). 9.2.2 Exponential Convergence in Markov Chains Consider a finite-state irreducible Markov chain having transition probabilities P~ We will use a coupling argument to show that P~ converges exponentially fast, as n ~ 00, to a limit that is independent of i To prove this we will make use of the result that if an ergodic Markov chain has a finite number-say M-of states, then there must exist N, e > 0, such that (9.2.7) P~> e for all i, j n 2: O} and {X~, n 2: O}, where one starts in i, say P{Xo Consider now two independent versions of the Markov chain, say {Xn , = i} = 1, and the COUPLING 419 other such that P{Xo == j} = 1T" j = 1, ... , M, where the 1TJ are a set of stationary probabilities. That is, they are a nonnegative solution of 1TJ = L 1T P I M ,l , 1 1, .. "M, 1=1 Let T denote the first time both processes are in the same state. That is, T = min{n: Now Xn X~}. and so where A, is the event that X Nl -:/= X NI . From (9.2.7) it follows that no matter what the present state is, there is a probability of at least e that the state a time N in the future will be j. 
Hence no matter what the past, the probability that the two chains will both be in state j a time N in the future is at least e 2, and thus the probability that they will be in the same state is at least Me 2• Hence all the conditional probabilities on the right-hand side of (9.2.8) are no greater than 1 - Mel. Thus (92.9) where ex == Mel, Let us now define a third Markov chain-call it {Xn , n :> O}-that is equal to X' up to time T and is equal to X thereafter That is. for n:5 T forn:> T Since = X T , it is clear that {Xn , n ~ O} is a Markov chain with transition probabilities P'J whose initial state is chosen according to a set of stationary probabilities Now P{Xn = j} = P{Xn Xr = jlT:5 n}P{T:5 n} + P{Xn == liT> n}P{T> n} = jlT:5 n}P{T:5 n} = P{Xn + P{Xn == j,T> n}. 420 Similarly, P~ STOCHASTIC ORDER RELATIONS P{X,,:= j} = P{X" = jlT~ n}P{T~ n} + P{X II j, T> n} Hence P~ - P{XII = j} = P{X II = j, T> n} - P{XII = j, T> n} implying that Ip~ - P{XII = nl < P{T> n} < (1 - a)"JN-l (by (9.2.9». But it is easy to verify (say by induction on n) that and thus we see that IJ IP" - 1TJ 1< 13" --1-' -a where 13 = (1 - a)IJN Hence P~ indeed converges exponentially fast to a limit not depending on i. (In addition the above also shows that there cannot be more than one set of stationary probabilities.) Remark We will use an argument similar to the one given in the above theorem to prove, in Section 9.3, Blackwell's theorem for a renewal process whose interarrival distribution is continuous. 9.3 HAZARD RATE ORDERING AND ApPLICATIONS TO COUNTING PROCESSES The random variable X has a larger hazard (or failure) rate function than does Y if (931) for all t:> 0, where Ax{t) and Ay(t) are the hazard rate functions of X and Y. Equation (9.3.1) states that, at the same age, the unit whose life is X is more likely to instantaneously perish than the one whose life is Y. In fact, since P{X> t + six> t} = exp {-J:+s A(y) dY}, . .HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 421 it follows that (9.1) is equivalent to P{X> t + siX> t} <: P{Y > t + slY> t}. XIS. '" Let if an event of the counting process occurs at time S. Sz.-1 = 0 ifj = max{k.L(S. P{/. we will then argue.L(t) be such that We first show how the delayed renewal process can be generated by a random sampling from a nonhomogeneous Poisson process having intensity function J. Sz. i ~ 1. Hence. k < i. S2. for i (9. otherwise. to define the counting process we need specify the joint distribution of the I. . t :> O} with intensity function J. Ik I} . and Y.3) > 1.3. = 11/1>' .. . where X. = 1.take (93.L(t).2) and.) = if II = . This is done as follows: Given Sl.. is a delayed renewal process with initial renewal distribution G and interarrival distribution Fsuch that events can only occur at times SI.' denote the times at which events occur in the nonhomogeneous Poisson process {N(t).L(t). To illustrate this let us begin by considering a delayed renewal process whose first renewal has distnbution G and whose other interarrivals have distribution F. Hazard rate ordering can be quite useful in comparing counting processes. equivalently. 51 Y1 for all t ~ 0. are. where both F and G are continuous and have failure rate functions AF(t) and Adt) Let J.. We will now define a counting process-which. the remaining lives of a t-unit-old item having the same distributions as X and Y.It-!} AG(SJ J. or. Let Sl. respectively.3. » J. III). 
t + h)lhistory up to t} = P{an event of the nonhomogeneous Poisson process occurs in (t.422 STOCHASTIC ORDER RELATIONS To obtain a feel for the above. whereas if II = 1. is the mean interarrival time . and the other values of A(S.L(t)h + o(h»P{it is counted Ihistory up to t} [J.1'-1 For instance. it is the time at t since the last event of the counting process that occurred before t. . . We claim that the counting process defined by the I" i :> 1. We now use this representation of a delayed renewal process as a random sampling of a nonhomogeneous Poisson process to give a simple probabilistic proof of Blackwell's theorem when the interarrival distribution is continuous THEOREM Let {N*(t).L(t)h + o(h)] = ~(~j = Ac(t)h + o(h) AF~~g» = AF(A(t»h + o(h) if A(t) =t = [J. assumed finite. constitutes the desired delayed renewal process To see this note that for the counting process the probability intensity of an event at any time t.) are recursively obtained from II. Then A(SI) = SI. then A(S2) = S2 . and it is countedlhistory up to t} (J.) = s. a ast- 00.32) and (933) are equivalent to AC(S. t F Then (Blackwell's Theorem) ~ O} denote a renewal process with a continuous interarrival distribution m(t + a) . . given the past.) if A(S.SI Then (9.m(t)-fJ.) J..) < S.L(S. then A(S2) = S2.L(S. t + h).) P{/. that is. the probability (intensity) of an event at any time t depends only on the age at that time and is equal to Ac(t) if the age is t and to AF(A(t» otherwise But such a counting process is clearly a delayed renewal process with interarrival distribution F and with initial distribution G. where m(t) = E[N*(t)J and fJ. = if A(S.. let A(t) denote the age of the counting process at time t. is given by P{event in (t.L(t)h + o(h)] if A(t) < t Hence. AF(A(S. if II = 0. . and the counting process in which events occur at those times 5. t + a)15N :s t]P{5 N :s t} + E[N(t. for which i = 1 is a delayed renewal process with initial distribution G and interarnval distnbution F whose event times starting at time 5 N are identical to those of {N(t). 52. t 2:: O} Letting N(t. t 2:: OJ-is a delayed renewal process with interarrival distnbution F. for which 1. t + a)15 N > t]P{5N > t} = E[N(t. Thus the counting process {N(t).12 . . t + a) = N(t + a) . be generated by (9 3 3) with p. = I:' and hence = llIl' P{N < oo} Now define a third sequence I" i 2:: =1 1. we will assume there is 0 < AI < A2 < 00 such that (934) for all t Also let G be a distribution whose failure rate function also lies between AI and A2 Consider a Poisson process with rate A2 Let its event times be 51. t + a)15N :s t]P{5 N :S t} + E[N(t. t + a)15 N > t])P{5 N > t} . Ii. t + a)15 N > t] P{5N > t} = E[N(t. for which I. the failure rate function of F.N(t) and similarly for N. 52. t 2:: OJ-is a renewal process with interarrival distribution F Let = N = mm{i.E[N(t. and generated by (93 3) with G = F and p. N is the first event of the Poisson process that is counted by both generated processes Since independently of anything else each Poisson event will be counted by a given generated process with probability at least AI/ A2 . = I-call it {N(t). is bounded away from 0 and 00 That is. t + a)] = E[N(t.HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 423 Proof We will prove Blackwell's theorem under the added simplifying assumption that AF(t).(t) =A2' and let 11 . ) of the sequence It. it follows that P{I. t + a)15N > t] . by for i:s N for i 2:: N 1. we have E[N(t. Ii. Now let Ii. That is.(t) A2 • Thus. 
be conditionally independent (given 5t. t 2:: O} in which events occur at those values of 5. = I:' = I}.* = I-call this process {No(t). t + a)] + (E[N(t. the counting process in which events occur at those times 5. I.= _ {Il* I. . t .co But if we now take G = F" where Ft(t) f~F(Y) dylp. for any sets of time points TJ.424 STOCHASTIC ORDER RELATIONS Now it easily follows that E[N(t. ast. t + a)] E[N(t. and the others have interarrival distnbution F (That is. Tn. which completes the proof We will also find the approach of generating a renewal process by a random sampling of a Poisson process useful in obtaining certain monotonicity results about renewal processes whose interarrival distributions have decreasing failure rates. it is easy to establish (see the proof of part (i) of Theorem 352 of Chapter 3) that when G = Ft .:::: O} denote a renewal process whose interamval distribution F is decreasing failure rate Also. t .1 Let N = {N(t).. As a prelude we define some additional notation Deflnltron For a counting process {N(t). from (935).3. starting at y. t + a)] _ 0 ast.:::: O} denote a delayed renewal process whose first interamval time has the distnbution Hp where Hv is the distribution of the excess at time y of the renewal process N.. . E[N(!)] = tIp. t .:::: O} define. we see from the preceding that (935) E[N(t. for any set of time points T. and so.t +a)]-p. let Ny = {Ny(t). t + a)ISN > t] :s A2a and similarly for the term with N replacing N Hence. N(T) to be the number of events occurring in T We start with a lemma Lemma 9. since N < co implies that P{SN > t} _ 0 as t _ co. Ny can be thought of as a continuation.co. a E[N(t. of a renewal process having the same interarrival distribution as N) Then. i 2: l-call it Ny-is a delayed (since A(O) = A*(y» renewal process whose initial distnbution is Hy Since events of Ny can only occur at time points where events of N occur (l s I..»/p. or. for all i). it follows that N(T) 2: Ny(T) for all sets T. 2: 1 If I. if there have been no events by t. and we let II indicate whether or not there is an event at SI Let A(t) denote the time at t since the last event of this process. define it to be t + A*(y) Let the Lbe such that if 11 = 0. and since A is nonincreasing A(A(S.-I}P{/. this generated counting process will be a renewal process with interarnval distnbution F We now define a~other counting process that also can have events or. • . then II 0. = 1} = A(A(S. then we let where A(SI) = SI and. for i > 1. = 111. independent of N. I. if I.» otherwise The abov~is well defined since. t 2: O} whose interarrival distribution is DFR Then both A(t) and yet) increase stochastically in t 'Ibat is.3. we will always have that A(t) 2: A(t).» s A(A(S. P{A(t} > a} and P{Y(I) > a} are both increasing in t for all a . t s y} denote the first y time units of a renewal process. = 1.!!Y at times Sit i 2: 1. is said to be counted if I.y when I. as L can equal 1 o!!!. = 1. then l. but also having interarrival distribution F We will interpret Ny as the continuation of N* from time y onward Let A*(y) be the age at time y of the renewal process N* denote the times Consider a Poisson process with rate p. as before. A(O) and let SI> Sh at which events occur Use the Poisson process to generate a counting process-call it N-in which events can only occur at times S" . where A(S.»/ A(A(S.HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 425 Proof Let N* = {N*(t). and the lemma follows PROPOSITION 9.» Hence we see . 
1.2 Monotoniclty Properties of DFR Renewal Processes Let A(t) and yet) denote the age and excess at time t of a renewal process N = {N(t).) is the time at SI since the last counted event pnor to S" where the Poisson event at S. and so the counting process generated by the L. G with probability A(A(S.II-I} = P{ll = 11/1 . equals 1 when an event occurs at SI and 0 otherwise. Then. ~I x.3 2 can be used to obtain some nice bounds on the renewal function of a DFR renewal process and on the distribution of a DFR random variable Corollary 9. equivalently. and using Wald's equation we obtain ILI(m(t) + 1) = t + E[Y(t)] . where Y(t) is the excess at t By taking expectations. 21L1 t ILl t ILl 1L2 (ii) F(t) ~ exp . (i) If m(t) is the renewal function of a renewal process having interamval distnbution F. t + a] Proposition 9.+ .3. P{A(t) :s a} ~ P{A(t + y) :s a} The proof for the excess is similar except we now let T = [t.1.2 . or.a.3 Let F denote a DFR distribution whose first two moments are ILl = J x dF(x)..426 Proof Suppose we want to show P{A(t STOCHASTIC ORDER RELATIONS + y) > a} ~ P{A(t) > a} To do so we interpret A(t) as the age at time t of the renewal process Nand A(t + y) as the age at time t of the renewal process NY' of Lemma 93 1 Then letting T = [t . "" t + Y(t). then -:s m(t) :s .- {t ILl -2 21L1 1L2} +1 Proof (i) Let XI> X 2 • denote the interarrival times of a renewal process {N(t). N(I)+I t 2: O} having interarrival distribution F Now 2: . 1L2 == J x 2dF(x). t] we have from Lemma 93 1 P{N(T) 2: I} 2: P{Ny(T) 2: I}. 6 of Chapter 3) lim E[Y(t)] = 2IL2 . t + . :s. and thus m'(t) is equal to the probability (intensity) of a renewal at time t But since A(A(t» is the probability (intensity) of a renewal at t given the history up to that time. /-+"' = ILl ILl we see I IL2 + ILl :s. t + dt)} + o[d(t)].. we thus have m'(/) = E[A(A(t»] ~ A(t). and since E[Y(O)] Proposition 34. 2ILI or (ij) Differentiating the identity m(/) = 2::"1 Fn(t) yields m'(t) dl "' 2: F~(t) dt n=1 = 2: P{nth renewal occurs in (t. Since F(t) = exp ( - we see and the result follows from part (i) . t + dt)} + o[(dt)] n"l "' P{a renewal occurs in (t. I where the above inequality follows since A is decreasing and A(t} above inequality yields met) Integrating the ~ f: A(s) ds f: A(s) ds) .HAZARD RATE ORDERING AND APPLICATIONS TO COUNTING PROCESSES 427 and (see But by Proposition 932 E[Y(t)] is increasing in t. ILI(m(t) + 1) :s. = <: 1. f(x) dx f(/) g(/) 0: f(/) .428 STOCHASTIC ORDER RELATIONS 9.1 Let X and Y be nonnegative random variables having densities rate functions Ax and Ay If X:::::Y. is nondecreasing in x We start by noting that this is a stronger ordering than failure rate ordering (which is itself stronger than stochastic ordering) PROPOSITION 9. for x ::::: AAI) I.4 LIKELIHOOD RATIO ORDERING Let X and Y denote continuous nonnegative random variables having respective densities f and g We say that X is larger than Y in the sense of likelihood ratio. it follows that. and write X~Y LR if for all x ~ y Hence X ~LR Y if the ratio of their respective densities.f' g(x)f(/)lg(/) dx == f' g(x)dx . LR f and g and hazard then for aliI::::: 0 Proof Since X :::::LR Y. f(x)/g(x). f(x) ::::: g(x)f(/)lg(/) Hence.4. jL. we must decide on either f or g A decision rule for the above problem is a function 4>(x). 0 :::. it maximizes The optimal decision rule according to this criterion is given by the following proposition. 
A central problem in statistics is that of making inferences about the unknown distribution of a given random variable In the simplest case. and so X ExAMPLE 2::LR Y when A :::. . known as the Neyman-Pearson lemma. the one that maximizes f g(x)tjJ(x) dx is tjJ* given by if f(x)/g(x) ~ c if f(x)/g(x) < c. and then restrict attention to decision rules 4> such that (941) f f(x)4>(x) dx -< a Among such rules it then attempts to choose the one that maximizes the probability of rejecting f when it is false That is. The classical approach to obtaining a decision rule is to fix a constant a.4(A) jL. 9. which takes on either value 0 or value 1 with the interpretation that if X is observed to equal x. Neyman-Pearson Lemma Among all decision rules tjJ satisfying (94 1).LIKELIHOOD RATIO ORDERING 429 EXAMPLE 9. let us first note that Ix 4>(x)=1 f(x) dx f f(x) 4>(x) dx represents the probability of rejecting f when it is in fact the true density. we suppose that X is a continuous random variable having a density function known to be either f or g Based on the observed value of X. a ::5 1.4(a) A Statistical Inference Problem. with rate If X is exponential with rate A and Y exponential then f(x) = A e(I'-A)x g(x) jL . then we decide on f if 4>(x) = 0 and on g if 4>(x) 1 To help us decide upon a good decision rule. Likelihood ratio orderings have important applications in optimization theory The following proposition is quite useful PROPOSITION 9. f (</J*(x) . then both are nonpositive Hence. which proves the result If we suppose now thatf and g have a monotone likelihood ratio order-that is.</J(x))(cg(x) .f </J(x)f(x) dx 2: 0.f(x)) 2: O. with densities f and g. The above inequality follows since if </J*(x) = 1. the optimal decision rule is to decide on f when the observed value is greater than some critical number and to decide on g otherwise. f(x)lg(x) is nondecreasing in x-then the optimal decision rule can be written as ~'(x) = G if x < k.f </J(x)g(x) dx ] f </J*(x)f(x) dx .f(x)) and so c 2: dx 2: 0 [J </J*(x)g(x) dx . and suppose that X2:Y LR . then both terms in the product are nonnegative.430 where c is chosen so that STOCHASTIC ORDER RELATIONS f f(x) </J*(x) Proof Let </J satisfy (94 1) For any x dx = Ot (</J*(x) .</J(x))(cg(x) .2 Suppose that X and Yare independent. and if </J*(x) = 0. That is. if x :> k where k is such that J:" f(x) dx = cx.4. X) is also concentrated on the two points h(u. assigning probability A2 == P{Y= max(X. conditional on U u and V = v.2 with probability 8. y) 2: hey. Y)IU = u. v) Similarly. Y). and the result now follows by taking expectations of both sides of the above Remark It is perhaps somewhat surprising that the above does not necessarily hold when we only assume that X :> sl Y. v) and h (v. x) whenever x 2: y. Y) is stochastically larger than P{h(X. then heX. Y)IU u. note that 2x + y ~ x + 2y whenever x ~ y However.LlKELlHOOD RATIO ORDERING 431 If hex. V = v} f(u)g(v) f(u)g(v) + f(v)g(u) to the larger value h(u. with probability .! U 2: = max(X. heY. Y) is concentrated on the two points h(u. V = v} = Since u 2: V. y) is a real-valued function satisfying hex. Y). X) . Y = min(X. X= min(X. V = min(X. X) That is. =u and V = v. and so. Y) 2: heY. Y) 2: al U. Y) Then conditional on U = u. conditional on U heY. assigning probability Proof Let U AI == P{X max(X. v. the conditional distnbution of heX. . if X and Yare independent with x=G y=G with probability 2 with probability 8. u). V = v. g(u)f(v) g(u)f(v) + f(u)g(v) f(u)g(v) 2: g(u )f(v). v) and h(v. V} 2: P{h(Y. u). 
X) 2: al U. V}. For a counterexample. Y). heX. . . X has increasing likelihood ratio if c + X increases in likelihood ratio as c increases For a second interpretation. . i 3. and if the order chosen is ii. > a} = F(t + a)/F(t).. Continuing with such interchanges leads to the conclusion that 1.Yr. Thus 2X + Y is not stochastically larger than 2Y + X Proposition 94. .1. (A similar argument shows that n. . are to be scheduled in some order For instance.n.Yi+l.+I. .. . . .1) stochastically maximizes (minimizes) the return. i h 1.Xi.. n . 1 stochastically minimizes return. LR ~ LR Xn..Yn) > h(y]. . . . . i'!" ..Yn) whenever Y. say X" i = 1. the measurable characteristic of an item might be the time it takes to complete work on that item Suppose that if XI is the measurable characteristic of item i. as the remaining life from t onward of a unit.2 to show that the ordering (i1' 1. . then it follows from Proposition 94. )..c). For instance.432 STOCHASTlC ORDER RELA TlONS then X >SI Y.1.. .2 that the ordering 1.. 2. .4... in. . note that the random variable c + X has density f(x .we can usc Proposition 9.X. . To see this consider any ordering that does not start with item I-say 01. Let us now suppose that the measurable characterisz tic of item i is a random variable.YoYI-I. and so I) ¢:::> ¢:::> logf(x . . " . then the return is given by h(Xi I . 2.Y. .XI' •. 2.logf(x logf(x) C2) i X for all CI > C2 is concave. > Yi-I. in-I) leads to a stochastically larger return. .. i2.. recall the notation X. If ~ XI ~ LR X 2 > . suppose that n items. ..8) = 96.Yr-I. and if h satisfies h(YI.Ct) . .n (n. . . a permutation of 1.) The continuous random variable X having density fis said to have increasing likelihood ratio iflog(f(x» is concave.XI.2)(.8 and P{2 Y + X > Il} = 8 + (. in-I)' By conditioning on the values X. and is said to have decreasing likelihood ratio if logf(x) is convex.• . each having some measurable characteristic. n. n stochastically maximizes the return.(a) := P{X. Hence. which has reached the age of t Now.-1 . but P{2X + Y > ll} = . having lifetime X.2 has important applications in optimal scheduling problems. n .. F.'" ... . . To motivate this terminology.. 4.(a) Hence. then X is DFR Proof Xs ~ X.STOCHASTiCALLY MORE VARIABLE 433 and so the density of X. X I. is given by f. (2) The likelihood ratio ordering can also be defined for discrete random variables that are defined over the same set of values We say that X ~LR Y if P{X = x}/P{Y = x} increases in x 9. . X has increasing likelihood ratio if Xs decreases in likelihood ratio as s increases Similarly. if X has decreasing likelihood ratio.I Remarks (1) A density function f such that log f(x) is concave is called a Polya frequency of order 2. = f(1 + a)/F(/). I ~ f(s + a) i a f(1 + a) ~ for all s $. for all s $. I logf(x) is concave Therefore. . Recall that a function h is said to be convex if for all 0 < A < 1. it has decreasing likelihood ratio if Xs increases in likelihood ratio as s increases PROPOSITION 9. ~ Ax ::s Ax LR " (from Proposition 9 41) ~Xs~X. then X is I FR Similarly.3 If X has increasing likelihood ratio.5 STOCHASTICALLY MORE VARIABLE X2. 5.5.434 STOCHASTIC ORDER RELATIONS ~y We say that X is more variable than Y. suppose (952) is valid for all a ~ 0 and let h denote a convex increasing function that we shall suppose i<.5.pectively. we have from (952) (953) f: h"(a) 1: F(x) dx da. thu<.1) holds.aV > x} dx = f: P{X> Ii + x}dx = =fa'" F(y) dy And. 
a convex increasing function.1) holds until we prove the following results PROPOSITION 9. similarly. if £[h(Y)] for all increasing. E[hAY)] = t G(y) dy.aV == {O x-a x~a x>a Since hQ i<.1 If X and Yare nonnegative random variables with distributions F and G re<. We will defer an explanation as to why we call X more variable than Y when (9. ha(x) == (x . and only if. we have E[hQ(X)] ~ E[hQ(Y)] But E [hQ(X)] f: P{(X .>: 1: h"(a) f G(x) dx da . twice differentiable Since h convex is equivalent to h" ~ 0. establishing (952) To go the other way.1) £[h(X)] ~ Y.5.. (952) f F(x) dx ~ f G(x) dx ~v for all a ~ 0 Proof Suppose first that X Y Let ha be defined by . then X ~v Y if. convex h. then we also say that F 2: y G when (9. If X and Y have respective distributions F and G. and write X (9. and only if.E[h(Y)] 2: h'(O)(E[XJ .h(O) . intuitively X will be more variable than Y if it gives more weight to the extreme values. from (952) by setting a = 0 Corollary 9. we see from (953) (954) E[h(X)] . the inequality (954). then X if.sumption that h is convex. Y E[h(X)] 2: E[h(Y)J for all convex h Proof Let h be convex and suppo<. It is for this reason we say that X .E[Y]) The right-hand side of the above inequality is nonnegative since h'(O) 2: 0 (h is increasing) and <.h'(O)E[X] = t J: r h'(x) dx dF(y) . which was obtained under the a<. we would have that Var(X) > Var(Y) ) . which follow<.2:y Y means that X has more variability than Y That is.h'(O)E[X] E[h(X)] .h(O) . Y Then as E[XJ = E[Y].h'(O)E[X] Since a similar identity is valid when Fis replaced by G.5. reduces to E[h(X)] 2: E[h(Y)]. since E[X] = E[Y] and since h(x) = x 2 is convex. and one way of guaranteeing this is to require that E[h(X)] ~ E[h(Y)] whenever h is convex (For instance.2: E[h(Y)] for all convex functions h.STOCHASTICALLY MORE VARIABLE 435 Working with the left-hand side of the above " Jo h"(a) Je: F(x) dx da = J'" JXII h"(a) da F(x) dx 0 Q = J: h'(x)F(x) dx . and the result is proven Thus for two nonnegative random variables having the same mean we have that X ~y Y if E[h(X)] .h'(O)E[X] = t h'(x) dF(y) dx .e that X 2:. 2:.ince E[X] 2: E[Y].2 If X and Yare nonnegative random variables such that E[X] = E[Y].h'(O)E[X] = = J: h(y) dF(y) . :::.1 Again let g and h be increasing and convex Now E[h(g(Xh X 2 . . Y2 • = E[h(g(XI' Y2 .Y.3 If X and Yare nonnegative with E[X] = E[Y].X. . however.x) is convex Our next result deals with the preservation of the variability ordering. X 2 . PROPOSITION 9. = E[h(g(x. Yimplies that . = h"(g(x»(g'(x»)2 + h'(g(x»g"(x).5.:::. . 1.:::g(YI'" y Yn) for all increasing convex functions g that are convex in each argument Proof Start by assuming that the set of 2n random vanables is independent The proof is by induction on n When n = 1 we must show that when g and h are increasing and convex and Xl . Xn»IX I = x] .5.:::y Y I as the function h(g(x» is increasing and convex since dx h(g(x» d = h'(g(x»g'(x) . Proof Let h denote an increasing convex function We must show that E[h(-X)] 2: E[h(-Y)] This.436 STOCHASTIC ORDER RELATIONS Corollary 9. . .::: 0. then X. Y I This follows from the definition of XI . and X/ .::: dx 2 h(g(x» d2 ° Assume the result for vectors of size n . then g(XI' • • Yn are independent. Xn»IXI = x] (by independence of XI. Y 2 .Xn). .:::.Xn) = E[h(g(x. Xn are independent and Y I . follows from Corollary 952 since the function f(x) = h( . Xn»] (by the induction hypothesis) (by the independence of XI. X2> . n.::: E[h(g(x. .:::.4 If XI. Y" i = . 
APPLICATIONS OF VARIABILITY ORDERINGS 437 Taking expectations gives that But.alX> a] $ E[X] for all a 2:: 0 It is said to be new worse than used in expectation (NWUE) if E[X . If X is NBUE and F is the distribution of X. Yn and using the result for n = 1. .6 ApPLICATIONS OF VARIABILITY ORDERINGS Before presenting some applications of variability orderings we will determine a class of random variables that are more (less) variable than an exponential. then X being NBUE (NWUE) means that the expected additional life of any used item is less (greater) than or equal to the expected life of a new item. then we say that F is an NBUE distribution. v where exp(IL) is the exponential distribution with mean IL The inequality is reversed if F is NWUE.1 If F is an NBUE distnbution having mean IL. then F$ exp(IL). .alX> a] 2:: E[X] for all a 2:: 0 If we think of X as being the lifetime of some unit.Yn) the result dent does not affect the distributions of g(Xh • •• Xn) and g(Yh remains true under the weaker hypothesis that the two sets of n random van abies are independent 9. Since assuming that the set of 2n random variables is indepen. Definit[on The nonnegative random van able X is said to be new better than used in expectation (NBUE) if E[X . PROPOSITION 9.6. and similarly for NWUE. by conditioning on Y2 . we can show that which proves the result. a> xiX> a}dx F(a +x) dx o F(a) dy F(a) Hence. We can evaluate the left-hand side by making the change of variables x = Ia"' F(y) dy.1 we must show that (961) for all c ~ 0 Now if X has distnbution F. where x(c) = Ic"' F(y) dy Integrating yields or which proves (96 1) When F is NWUE the proof is similar .P.. x ~ E. for F NBUE with mean p.p.alX> a] = = = r r ~(y) a J: P{X .. we have or which implies r(t o F(a) ) da F(y)dy ~ E.dx = -F(a) da to obtain _ JX(C) dx . then E[X . By Proposition 95.438 STOCHASTIC ORDER RELATIONS Proof Suppose F is NBUE with mean p. n . by Proposition 95 4. (by Corollary 9 5 3) S~l). then it is easy to verify (see Section 7 1 of Chapter 7 if you cannot verify it) the following recursion formula.6. D! =0.X! .:: 1 Let D~) denote the delay in queue of the nth customer in system i. D" + Sf! X n I }.2 Consider two G/G/l systems The ith.6. i = 1.X~ v Thus. it follows from the recursion (962) and Proposition 954 that (i) D n+1 :> v D(2) 11+1 thus completing the proof .:: . are independent and identically distributed as are the successive service times Sf!' n :> 1 There is a single server and the service discipline is "first come first served.:: D~2) v + S~2) - X~~I Since h(x) = max(O.2. i = 1.2) Dn+! = max{O..1 Comparison of G/ G/1 Queues The GIG/1 system supposes that the interarrival times between customers. + THEOREM 9. Xn .:: S~2) v .." If we let Dn denote the delay in queue of the nth customer.. Since it is obvious for n D~l). assume it for n.. having interarrival times X~) and seruice times S~). (by assumption).APPLICATIONS OF VARIABILITY ORDERINGS 439 9. Now (by the induction hypothesis).2 If (i) and (0) then for all n Proof The proof is by induction. x) is an increasing convex function. (96. n ~ 1.:: D~2) v = 1. DiO + S~l) x~lll .. 
Proof From Proposition 961 we have that an NBUE distribution is less vanable than an exponential distnbution with the same mean Hence in (i) we can compare the system with an MIG/1 and in (ii) with a GIMII The result follows from Theorem 962 since the right-hand side of the inequalities in (i) and (ii) are respectively the average customer delay in queue in the systems MIG/I and GIM/1 (see Examples 4 3(A) and 4 3(B) of Chapter 4) 9.6.3 For a GIG/l queue with E[S] < E[X] (i) If the interarnval distribution is NBUE with mean lIA. Our objective in this section is to prove the following theorem.2 A Renewal Process Application Let {Np(t).440 Let WQ STOCHASTIC ORDER RELATIONS = lim E[Dn] denote the average time a customer waits in queue Corollary 9.4 If F is NBUE with mean IL. where {N(t)} is a Poisson process with rate IIIL. t . then NF(t) ::Su N(t).6. the inequality being reversed when F is NWUE Interestingly enough it turns out to be easier to prove a more general result. THEOREM 9.:::: O} denote a renewal process having interarrival distribution F.AE[S]) The inequality is reversed if the distribution is NWUE (ii) If the service distribution is NBUE with mean 1I1L. then <: WQ - AE[S2] 2(1 .6. (We'll have more to say about that later) . then where {3 is the solution of and G is the interarrival distribution The inequality is reversed if G is NWUE. (t)+ 1 (964) E[ ~ ] X.=1 . X 2 . and is thus by NBUE.S. I-k .)(t) ~ 2: G(I)(t). each having mean IL. (963) where * F. i ~ 1. I + time from I until N* increases But E[time from t until N* increases] is equal to the expected excess life of one of the X.=1 X. which can be shown to hold when the X. and let 1 let Xl.)(t) . .. (96. :S t} = 2: (FI * ."'I * F. = 2: P{XI + . = E[X]E[N*(t) + 1] by Wald's equation. nonnegative. less than or equal to IL That is.. and all have the same mean (even though they are not identically distnbuted) However. * represents convolution. N'(I)+1 E[ and.5) However.. • be Now N. are independent.5 Let F. and let G denote the exponential distnbution with mean IL Then for each k :2: 1. and G(I) is the i-fold convolution of G with itself Proof The proof is by induction on k To prove it when k independent with X. ~ X.=1 OJ . we also have that N'(/I+ 1 2: . be NBUE distributions. having distribution F..APPLICATIONS OF VARIABILITY ORDERINGS 441 Lemma 9. ~t+IL ] E[N*(t)] ~ tIlL E[N*(t)] = 2: P{N*(t) ~ i} . + X. from (964). k .=k whenever F is NBUE and G exponential with the same mean.-d(1 - x) dF. d(l) I~k = ((~ G 1l * F. G(lJ(I).).:S L G(I)(t) I~k ~ ~ . the mean of the exponential renewal process at 1 (or 11p. IJ(I) I". =L (G(/) * F. ::= L .4 Proof of Theorem 9. si milarly..(x) (by the induction hypothesis) . .442 STOCHASTIC ORDER RELATIONS Since the right-hand side of (9 (j 3) is. Then we could have tried to use induction on k The proof when k = 1 would have been .. I) * G) (I) U(by the induction hypothesis) ::= L Gu .. . .~k"'l . we see from (9 (j 5) that (9 (j 3) is established when k = 1 Now assume (9 (j 3) for k We have * F.3) Remark Suppose we would have tried to prove directly that L F(I)(t). which completes the proof We are now ready to prove Theorem 9 6.. when k = 1. ..64 k ~ By Proposition 951 we must show that for all ~ 1 L P{NF(t) ~ i}:5 L P{N(t) ~ i} '~k ~ '=k But the left-hand side is equal to the left-hand side of (963) with each F. equaling F and the right-hand side is just the right-hand side of (96. 6 If z~) = 1.2. To review this model.. 
at this point the induction hypothesis would not have been strong enough to let us conclude that the right-hand side was less than or equal to (Lj:k G{J») * F. when we then tried to go from assuming the result for k to proving it for k + 1.6. then Z~I) ~. denote the size of the nth generation of the ith process. we suppose that the number of offspring per individual is more variable in the first process. the reader should refer to Section 4 5 of Chapter 4 . Z~2) for all n Proof The proof is by induction on n Since it is true for n 7~t) = 0.J 1=1 ~ y J' where the ~ are independent and have distribution FI (XI representing the number of offspring of the jth person of the nth generation of process 1) and the ~ are .J XI and 7~2) Z n+1 (2) - - L. Let Z~). The moral is that sometimes when using induction it is easier to prove a stronger result since the induction hypothesis gives one more to work with. assume it for n. Now Z(I) n+1 =~ 1=1 L..6.3 A Branching Process Application· Consider two branching processes and let FI and F2 denote respectively the distributions of the number of offspring per individual in the two processes. 9. i = 1. we would have reached a point in the proof where we had shown that However. Suppose that That is. i = 1.APPLICATIONS OF VARIABILITY ORDERINGS 443 identical with the one we used in Lemma 965 However. 2.. THEOREM 9. y .~1 Y . is nonnegative. N2!M=:>2:X. defined by g(n) x" the above will follow if we can Xn)]. sequences. • Let Nand M be integer-valued nonnegative random van abies that are independent of the X.7 Let Xl.2!2: Y.6.. = £[h(XI + .444 independent and have distribution F2 Since STOCHASTIC ORDER RELATIONS (by the hypothesis) and (by the induction hypothesis). the result follows from the subsequent lemma Lemma 9.6) Let h denote an increasing convex function To prove (9. or. and Y.66) we must show that (967) Since N 2!. 2! Y. equivalently. and they are independent of the show that the function g(n). .8) To prove this let Sn g(n + 1) - g(n) is increasing in n = L~ X" g(n and note that g(n) + 1) - = £[h(Sn + X n+ l ) - h(Sn)] . be a sequence of nonnegative independent and identically distnbuted random variables. X 2 . and similarly Yl .. Y2 . that (96. M. it remains to show that g is convex.= 1 Proof We wi1l first show that (96. Then N M X. + is an increasing convex function of n Since it is clearly increasing because h is and each X. since Sn increases in n. that for increasing.g(n).6. which states that the population size of the nth generation is more variable in the first process than it is in the second process. . E[h(Sn + X n+ l ) - h(Sn)ISn = t] = E[h(t + X n+l ) =f(t} (say). it follows that f(t) is increasing in t Also. equivalently.6. 2 2: Y. then it is less likely at each generation to become extinct. h(t)] Since h is convex. I • I m m =E[h(~Yr)IM mJ. and thus (968) and (96. convex h But E[h (~XI) jM = m] E[h (~Xr) ] (by independence) 2E[h(~Yt)] since 2: X.APPLICATIONS OF V ARIABILITY ORDERINGS 445 Now. The result follows by taking expectations of both sides of the above Thus we have proven Theorem 9.7) are satisfied We have thus proven that and the proof will be completed by showing that or. We will end this section by showing that if the second (less variable) process has the same mean number of offspring per individual as does the first. we see that E[f(Sn)] increases in n But E[f(Sn)] = g(n + 1) . F2 . and F.=. y.1 and ""2 denote respectively the mean of F.2.7 ASSOCIATED RANDOM VARIABLES •• The set of random variables XI. . equivalently... 
since E[Zn1 == we have that 2: P{Zn ~ i} == . Xn are independent The proof that they are associated is by induction As the result has already been proved when n == 1 (see Proposition 721). or.. p2 p2 . or which proves the result 9.n if h(xl' .' .. ~.6.Xn is said to be associated if for all E[f(X)]E[g(X)] where X = (XI. and F2 ..7.1 independent random variables are associated and . and thus from Proposition 95 1 2: p{Z~I) ~ i} ~ 2: p{Z~2) ~ i}. . i == 1. X 2 .8 Let . == ""2 == . ::s. .... for i = 1. PROPOSITION 9... the offspring distnbutions If zg) == 1.. assume that any set of n ..... Yn) whenever X...n.446 STOCHASTIC ORDER RELATIONS Corollary 9. and where we say that h is an increasing function . Z~2). then P{Z~') == O} ~ p{Z~2) == O} for aU n Proof From Theorem 9 66 we have that Z~I) ~.. X n ).. .. . Xn) -< h(YI. increasing functions f and g E[f(X)g(X)] > .1 Independent Random Variables Are Associated Proof Suppose that X" . Xn-!.x)] let J and g be increasing functions Let X E[J(X)g(x)lxn=x]=E[J(Xh . . then EXAMPLE 9.Xn_l. at least one of the components in each of these subsets is working..X)g(Xh . from Proposition 97. .I .7(A) r where Y.ASSOCIATED RANDOM VARIABLES 447 (Xl. and so.. In addition suppose that there is a family of component subsets C1 . E[J(X)g(X)] ~ E[E[J(X)fXn]E[g(X)IXnll However. Let XI equal 1 if component i is working and 0 if it is failed. are independent and that P{X.1 . (These subsets are called cut sets and when they are chosen so that none is a proper subset of another they are called minimal cut sets.Xn .) Hence.. . each of which is either working or failed.=maxX. Hence. n.Xn_j. Xn) Now. . E[J(X)IXn and E[g(X)IXn are both increasing functions of Xn and so ] ] Proposition 72 1 yields that E[J(X)g(X)] ~ E[E[J(X)IXnll' E[E[g(X)IXnll = E[J(X)]E[g(X)] which proves the result It follows from the definition of association that increasing functions of associated random variables are also associated.x)IXn=x] .1 we see that increasing functions of independent random variables are associated Consider a system composed of n components.Xn.Cr such that the system works if. . = I} PI.x)]E[g(Xh x]E[g(X)IXn = E[J(XIo ~ E[J(X1 . x)] . and the ineq uality from the induction hypothesis Hence. = E[J(X)IXn = x] where the last two eq ualities follow from the independence of the X. and suppose that the X. and only if. i = 1. . X n . jEC.. if we let S equal 1 if the system works and 0 otherwise. x)g(Xb . . let tl < t2 < .X(t..] ~ "~ n £[Y. • X k are associated by representing them as increasing functions of a set of independent random variables Let Fn J denote the conditional distribution function of X. Y.)}. are all increasing functions of the independent random variables XI.1 PROPOSITION 9. I 2:: O} is said to be associated if for all nand IJ.2 A stochastically monotone Markov process is associated Proof We will show that Xu.. it is stochastically monotone if the state at time n is stochastically increasing in the state at time n . alX n. In Any stochastic process {XC t).-I)..) .7(8) The Markov process {Xn' n ~ O} is said to be stochastically monotone if P{Xn ::s.]£ [II y. • X(ln) are associated . . •• .448 STOCHASTIC ORDER RELATIONS As Y I . Definition The stochastic process {X(/)._ I = x Let VI' • Vn be independent uniform . .. is associated. ..] ~ £[Y. i = 1.7. X(t n ) are all increasing functions of the independent random variables X(t.1 = x} is a decreasing function of x for all n and a That is. the random vanables X(/I). < tn Then as X(tl)' . EXAMPLE 9. 
t ~ O} having independent increments and X(O) = 0.ll P. Since Y i is equal to 1 if at least one of the components in C. they are associated Hence. To venfy this assertion. works. such as a Poisson process or a Brownian motion process. I(1 - Often the easiest way of showing that random vanables are associated is by representing each of them as an increasing function of a specified set of independent random variables. . given that X. . we obtain that P{system works} '" IT { . X n . P{S = I} = £[S] = £ [n y. n (where to = 0) it follows that they are associated.] where the inequalities follow from the association property. .~o(VI) is an increasing function of Xo and Vj.7. . .n If the elements are to be arranged in an ordered list. i = 1.. n. associated..:::: Z} .li-I (V. i = 1. conditional on A A. . Z are independent..:::: Y} 2!: !.l (by the stochastic monotonicity assumption) we see that X. Vn it follows that XI. If X Y. does this imply that P{X .) == infIx V. X2 = F (V2) is an increasing function of XI and V 2 and thus is an increasing function of X o• VI.. . define successively X. and for i = I.' " Xn are independent with a common distribution FA If FA(x) is decreasing in A for all x then XI._I.(x) is increasing in x. . EXAMPLE 9.). given X.n As the X. is an increasing function of the independent random variables X o• VI. .:::: Z} 2!: !? Prove or give a counterexample 9. xi-t (x)} It is easy to check that. n 2: O} Now. Y. since F. 1) random vanables that are independent of {Xn. i = 1. PROBLEMS 9.4. . x.Xn are associated i\ Our next example is of interest in the Bayesian theory of statistics. F.3. . Yj .. show that P{X:> Y} 2!: t Assume independence (b) If P {X. the random variables Xl. has the appropnate conditional distribution Since F.-J. are increasing functions of the independent random variables A. . . i 1. XI = F". F. XII are associated This result can be verified by an argument similar to the one used in Proposition 9. (a) If X Y. n.1.PROBLEMS 449 (0.::. It states that if a random sample is stochastically increasing in the value of a parameter having an a priori distribution then this random sample is.. . .(x) is decreasing in X . Vn be independent uniform (0.' . X.::::!. let Xo have the distnbution specified by the process.7(c) Let A be a random vanable and suppose that. is increasing in X. is increasing in V" Hence. 1) random variables that are independent of A.Xn recursively by Xl = FAI (V.2. it follows that X. VI. Suppose X. prove that X+ Y+ and Y- :>st X- 9. and so on As each of the X. and V 2 . Vn they are associated. unconditionally. let VI. Namely. Then define Xl. P{Y. .' .2. ... and X. Also. . x. . 2 Show by counterexample that it is not necessanly true that Xl + X 2 Y1 + Y 2 • 9. . One of n elements will be requested-it will be i with probability Pi. a > 0. and (c) negative binomial random variables are all discrete IFR random variables.n = 1). = fro e-. Po.8. .. show that min(Xl' " .S. (b) Poisson random variables. Show that a binomial n. i + k(i). Theorem: If F and G are IFR.p increases in n and in p. 1. (Hint The proof is made easier by using the preservation of IFR distribution both under convolution (the theorem stated in Problem 9. n . { P. Consider a Markov chain with states 0.= if} = i . then so if F * G. Give a counterexample to show that max (Xl . . k = 0..9. . (i = 1. 9.. .. . and under limits) 9.6. by where rea). n and with transition probabilities C. .10. .1 ifj = 1 .1. 
the convolution of F and G Show that the above theorem is not true when DFR replaces IFR 9.7. for some A > 0... The following theorem concerning IFR distributions can be proven. n. . 1.. A random variable taking on nonnegative integer values is said to be discrete IFR if P{X = klX:> k} is nondecreasing in k. If X. which remains true for discrete IFR distributions.c. p distnbution Bn.o = Pn. That is..1X"-1 dx. . Bn. Xn) need not be. Show that (a) binomial random variables. A random variable is said to have a gamma distribution if its density is given. 9. .p increases stochastically both as n increases and as p increases.. i = 1. 0 Show that this is IFR when a :> 1 and DFR when a $ 1 9. 9.7). are independent IFR random variables.450 STOCHASTIC ORDER RELATIONS find the arrangement that stochastically minimizes the position of the element requested.. Xn) is also IFR. prove the coupling bound IP{X E A} . Prove that a normal distnbution with mean J. for all increasing functions f.P{Y E A}I <: P{X ¥.12. L.) 9. r 2 O}.' " Cn-I). i = 1. 9.15.L and variance stochastically as J.n denote the time from r until the next event..13. Then let the first chain run (keeping the second one fixed) until it is either in the same state as the second one or in state n. r ~ O} be a counting process that. (b) Show that L.(r) denote the hazard rate function of F" i = 1. given that A = A. denote the probability that this Markov chain ever enters state 0 given that it starts in state i Show that J.PROBLEMS 451 where k(i) ~ O. 9.:k P" increases in i for all k. That is. 2. en-I) and the other (el. denote two renewal processes with respective interarrival distributions Fl and F2 • Let A.Y} . where P~.(r). If for all s. given that N(r) = n Does Y1. is a Poisson process with rate A. Let J.L increases.n increase stochastically in r and decrease stochastically in n? 9. start the procedure over.e.11. . ••• .2. (a) Derive an expression for G 1 n (b) Does G"n increase stochastically in n and decrease stochastically in r? (c) Let Y1. Tn.k P~ increases in i for all k.2. . What about as (J2 increases? (J2 increases 9. is an increasing function of (el' . show that for any sets T l . (a) Show that. r. where C. p. n >. Suppose both start in state i Couple the processes so that the next state of the first is no less than that of the second. For random variables X and Y and any set of values A. Let G"n denote the conditional distribution of A given that N(r) = n. >.6 of Chapter 2). Consider a Markov chain with transition probability matrix P" and suppose that L. If it is in the same state.14. Let {N. Consider a conditional Poisson process (see Section 2.JU) increases in i. en-I)' (Hint· Consider two such chains. . are the n-step transition probabilities. one having (e l . let A be a nonnegative random variable having distribution G and let {N(r). . is the last of types RI . P{XI > t} P{X2> t} P{XI > s} .17. there exist independent continuous random variables Y and Z such that Y has distribution G and min(Y.. P{R. . and only if.20. ..R. Show that AI(t) :> A2(t) for all t if.16.. . -< ------::. for j = 1. 9. to be obtained.19. to be collected]. = if. Let X and Y have respective hazard rate functions Ax(t) ::s. Let F and G have hazard rate functions AF and AG Show that AF(t) ~ Adt) for all t if. . •• . and only if. m} = 11m' (a) In the coupon collectors problem described in Example 9 2(E). is stochastically smaller than the unconditional distribution of PR • ! (c) Prove that 9. m in the sense that for any permutation iI. 
i m .Rm be a random permutation of the numbers 1. with distribution P{X > } = P{X> t + s} I S P{X> t} Show that X has the same distribution as X 9. • . where N is the n~mber of coupons needed to have at least one of each type (b) Argue that the conditional distribution of PR given that R.18. then _ {t X- with probability Ax(t)/ Ay(t) with probability 1. argue that E[N] = Lj:1 j-I E[1/ PR IR. Let XI and X 2 have respective hazard rate functions AI(t) and A2(t). ------::. Ay(t) for all t Define a random variable X as follows: If Y = t. Z) has distribution F . is the last of types R I .P{X2 > s} for all s < t. Verify the inequality (924) in Example 92 (E). t+ XI where XI is a random variable.Ax(t)/Ay(t). 9. Let R I . R.452 STOCHASTIC ORDER RELATIONS 9. . independent of all else. a .. .21. The jobs must be processed sequentially and the objective is to stochastically maximize the number of jobs that are processed by a (fixed) time t (a) Determine the optimal strategy if x.aXl . n fixed. this is equivalent to accepting I whenever the observed value of X is greater than some critical value. 2 is used.. i = 1.. 9. .. 0. A stockpile consists of n items Associated with the ith item is a random variable XI' i = 1. . Consider the statistical inference problem where a random variable X is known to have density either lor g The Bayesian approach is to postulate a prior probability p that I is the true density The hypothesis that I were the true density would then be accepted if the postenor probability given the value of X is greater than some critical number. 0) (d) XII is gamma with parameters (n. . 0 E [a. that is. 9. its field life is X. n . b J} is said to be a monotone likelihood ratio family if XII I :5 LR XII 2 when 0 1 :5 O2 • Show that the following families have monotone likelihood ratio· (a) XII is binomial with parameters n.24. Show that the random variables.23. (b) What if we only assumed that :5 st X I +I . ... If X. . having the following densities.25. .\(.n .. We have n jobs with the ith requiring a random time XI to process. what ordering stochastically maximizes the total field life of all items? Note that if n = 2 and the ordering 1. .\ fixed 9.. have increasing likelihood ratio. n.e. n . then the total field life is XI + X 2e. < XI LR XI+ I' i = 1. i = 1..1? 9. 2 2 x>o. the log of the densities are concave..\e-Al(. a ~ 1 (b) Weibull· fix) = a. .\x)a-le-(Al)Q.\x)a-I/f(a). A family of random variables {XII.1.22. a ..PROBLEMS 453 9.\). (a) Gamma· I(x) = . I(x) = 1 aV'2iia e-(~-/L) f2rr. ~LR X. Show that if/(x)/g(x) is non decreasing inx.+I.::: 1 (c) Normal truncated to be positive. n fixed (e) XII is gamma with parameters (0.l. If the ith item is put into the field at time t... (b) XII is Poisson with mean 0 (c) XII is uniform (0.. 11 0). G. Show that n = I} = P. E[f(X)] (8) Prove Jensen's inequality ~ f(E[X]). (Hint' Prove it first for n = 2. .In. F(x) dx 9. P{X = 2} = MI2 where 0 < M < 2. n. Hence.p). If Y is a nonnegative integer-valued random variable such that P{Y = I} =: 0 and E[Y] = M.29.NG(t). Give a counterexample when independence is not assumed. the equilibrium distribution of F.454 STOCHASTIC ORDER RELATIONS 9.27. Jensen's inequality states that for a convex function f. Let Xl... and only if.::s Bin(n. 9. . for instance. Xn are independent and YI .. where Fe.30.31.26. then NF(t) :::>. 9. a Poisson random variable is more variable than a binomial having the same mean.:1 • where Bin(n. ~v Y" i = 1. Prove or give a counterexample to the following: If F :::> v G.28. <. n.. 
= 1 P{)(. H = F. Y.=l X. p) is a binomial random variable with parameters nand p= L. . then show that X <. . . Suppose XI. 9. LX.. . where Y is Poisson with mean np.. ' . = OJ. . Y. . Xn be independent with P{Xi i = 1. prove that . is defined by with p. 9. Suppose P{X = OJ = 1 M12. = f. . . If )(. where {Nu(t) t ~ O} is the renewal process with interarrival distribution H. Show that F is the distribution of an NBUE random variable if.) Use the above to show that L... . Y n are independent."I P. for all a. 32. . all of the components of at least one of these subsets work (If none of the MJ is a proper subset of another then they are called minimal path sets. 9. . (b) if X and Yare independent and E[Y] = 0 then X (c) if X" i ::> 1. and let (J = E[X]. show that X::> E[X].=1 (d) X + E[X] Sy 2X. n . for X 9. . are independent and identically distnbuted then 1 n 1 n+1 -LX'~-LX. M m such that the system will work if.33.35. If E[X] 0. (c) Suppose that there exists a random variable Z such that E[ZiY] ::> o and such that X has the same distribution as Y + Z. . Suppose that P{O s X s I} = 1. Show that E[YIX] Sy Y.36. it can be shown (though the other direction is difficult to prove) that this is a necessary and sufficient condition Y.) 9. . v where E[X] is the constant random variable.) Show that p{S = I} < 1 - If (1 9. Show that X Y. 9.Xn are all decreasing functions of a specified set of associated random variables then they are associated.. . Show that an infinite sequence of exchangeable Bernoulli random variables is associated .34. P{Y I} (J = 1 .7(A). Show that if XI. Use this to show that: (a) if X and Yare independent then XE[Y] Sy XY.p{Y = O}.::1 v n + 1 . X + Y.PROBLEMS 455 (b) If X has mean E[X].. (That is. In the model of Example 9.37. Show that X where Y is a Bernoulli random variable with parameter (J. and only if. one can always find a family of component subsets M I. In fact. show that eX ::>y X when e ::> 1 Y 9. pp 210-220 T Lindvall." Annals of Probability. Inequalities Theory of Majorization and Its Applications. Academic Press. "A Unified Approach to Stochastic Dominance. 1994 2. "Bounds. Lectures on the Coupling Method. on the topic of stochastically larger. indeed." in Stochastic Optimization Models in Finance. Inequalities and Monotonicity Properties for Some Specialized Renewal Processes. the approach used in Section 922 is from this reference The approach leading up to and establishing Proposition 9 32 and Corollary 9 3 3 is taken from References 2 and 3 Reference 4 presents some useful applications of variability orderings R E Barlow." Bulletin Societe Mathematique de France. L Brumelle." Annals of Probability (1981) S. Stochastic Orders and Their Applications. 3 4 5 6 7. Statistical Theory of Reliability and Life Testing. "Further Monotonicity Properties for Specialized Renewal Processes. pp. San Diego. FL. 8 . 1979 J G Shanthikumar and M Shaked. New York. "Sur Deux Problemes de M Kolmogoroff Concernant Les Chaines Denombrables. 1975 M. 1975 W Doeblin. Orlando. which goes back to Reference 5. and F Proschan. 1992 A Marshall and [ Olkin. and R G Vickson. Brown. New York. Holt. edited by W Ziemba and R Vickson. CA. Academic Press. 66 (1938). Academic Press. 227-240 M. Brown.456 REFERENCES STOCHASTIC ORDER RELATIONS Reference 8 is an important reference for all the results of this chapter In addition. Wiley. New York. 
the reader should consult Reference 4 Reference 1 should be seen for additional material on IFR and DFR random vanables Coupling is a valuable and useful technique. 8 (1980). for all i E 2:: 0. i = 1. we consider another approximation for P{W = k}. = O}. that is often an improvement on the one specified above. I~I n The" Poisson paradigm" states that if the A. In Section 10 3. . then W will have a distribution that is approximately Poisson with mean A = and so 2: A" 1=1 n In Section 10.P{X. In Section 10. Xn be Bernoulli random variables with P{X. . 10. n 2: X. for establishing the validity of the preceding approximation.1 BRUNts SIEVE The validity of the Poisson approximation can often be established upon applying the following result. are either independent or at most "weakly" dependent. and let W = = 1} = A. known as Brun's sieve PROPOSITION 10.1.C HAP T E R 10 Poisson Approximations INTRODUCTION Let XI. = 1 . are all small and if the X.1 Let W be a bounded nonnegative integer-valued random variable If..2 we give the Stein-Chen method for bounding the error of the Poisson approximation. [(~)] =A'/i' 457 . . based on a result known as Brun's sieve.1 we present an approach. j) " 2: (W)(W-j) J k~O k-O (-I)k (_1)k J k = f ( . with the under. be defined by 1 J = {I ifW j 0 otherwise Then. we have that = = . P{W = j} = f k-O E [( .O EXAMPLE 10. W )] J+k AI+k (j + k) ( -1 )k k """ ~ (j + k )1 "" (j + k) (_1)k k AI'" =-2: (-A)klk l Jf k.1(A) Suppose that W is a binomial random variable with parameters nand p.458 then POISSON APPROXIMATIONS Proof Let I" J 2: 0.. Let A = np and interpret W as the number of successes in n independent trials where each is a success with probability p. (W) L (Wk.W) (j + k) (_I)k k-O J+ k k Taking expectations gives. and so (~) . where n is large and p is small.tanding that G) W-J is eq ual to 0 for r < 0 or k > r. then lfi is not small in relation to n.1(8) Let us reconsider the matching problem. then E [ (7) ] = 0 = A'I. Now. individuals that all select their own hats For each of the (:) sets of . Then. equal to 1 if all these trials result in successes and equal to 0 otherwise. if i is small in relation to n. for each of the (:) sets of. t) is the indicator for the jth set of . trials.!. which was considered in Example 1 3(A) In this problem. define an indicator variable X.BRUN'S SIEVE 459 represents the number of sets of i successful trials. represents the number of sets of .i + l)pl i' Hence. individuals define an indicator variable that is equal to 1 if all the members of this set select their own hats and equal to 0 otherwise Then if X" j . individuals we have . and thus _ n(n . EXAMPLE 10. and so we obtain the classical Poisson approximation to the binomial.1)·· (n . n individuals mix their hats and then each randomly chooses one Let W denote the number of individuals that choose his or her own hat and note that (7) ~ 1. each of i specified outcomes has occurred k or more times.. in the first n tnals. and so if ilr is small. Now. for i <:: POiSSON APPROXIMATIONS n. where r is large. note that (~) is equal to the number of sets of i outcomes that have all occurred k or more times in the first n trials Upon defining an indicator vanable for each of the (. we will now argue that the distnbution of W is approximately Poisson.460 and so. are approximately independent Poisson random variables.i = IIi! Thus. E i i + 1) [( W)] = (n) -n(n--:--.1'-(n. Consider those Y trials that result in one of the i specified outcomes. each with mean ~ l (nilr) = nlr. Therefore. 
then P{X> n} P{W = O}. is approximately Poisson distributed with mean nil r. Letting W denote the number of outcomes that have occurred at least k times in the first n trials. and categorize each by specifying which of the i outcomes resulted. then by Example 1. it follows from the Poisson approximation to the binomial that Y. EXAMPLE :10. from Brun's sieve. To begin. i.. Let X denote the number of trials needed until at least one of the outcomes has occurred k times. Assuming that it is unlikely that any specified outcome would have occurred at least k times in the first n tnals. from the .5(H) it follows that the Y" j = 1. If ~ is the number that resulted in the jth of the i specified outcomes.1(c) Consider independent trials in which each tnal is equally likely to result in any of r possible outcomes. We will approximate the probability that X is larger than n by making use of the Poisson approximation. each trial will result in anyone of i specified outcomes with probability ilr. the number of matches approximately has a Poisson distribution with mean 1. the number of the first n trials that result in any of i specified outcomes.) sets of i outcomes we see that where p is the probability that. .. . is not small.)'lj! FO t-l Thus.1 Therefore. where 1- an = 2: e-nlr(nl. E[ C) ] = 0 = (ra. with 88 people in a room there is approximately a 51 percent chance that at least 3 have the same birthday . and so P{X> n} :::: P{W = O} = exp{-ra n} As an application of the preceding consider the birthday problem in which one is interested in determining the number of individuals needed until at least three have the same birthday. in all cases we see that E r(.! Hence. is small """ (ran)'m In addition. P{X> n} """ exp{-365an } where an is the probability that a Poisson random variable with mean nl365 is at least 3 For instance.)'f. k = 3 Thus. W is approximately Poisson with mean . This is in fact the preceding with. when iI.BRUN'S SIEVE 461 V"~ independence of the we see that p = (an)'. That is.)'f. = 365. ass = 001952 and so P{X> 88} = exp{-365a ss} """ 490.) ] = (ra. since we have supposed that the probability that any specified outcome occurs k or more times in the n trials is small it follows that an (the Poisson approximation to this binomial probability) is also small and so when ii.an. with n = 88. . let W = 2: x.. n.Wg(W)] = P{W E A} This is done by recursively defining g as follows: 2: e-A.1)1 ~ 1 .Alli' lEA The following property of g will be needed. II A) . .462 POISSON APPROXIMATIONS 10.jg(j) and so. [IU E A A} - lEA 2: e-A.g(j .2. for j 0.Alli! lEA g(O) and for j ~ = 0 0.Alli! lEA E[Ag(W + 1) .. where the XI are Bernoulli random 1=1 n variables with respective means Ao i = 1.2 THE STEIN-CHEN METHOD FOR BOUNDING THE ERROR OF THE POISSON ApPROXIMATION As in the previous section. We omit its proof.2 1) 2: e-A. .Wg(W) = I{W E A} Taking expectations gives that (10. Lemma 10. In this section we present an approach. = IU E A} - 2: e-AAlli! lEA Ag(W + 1) . 2: lEA To begin.1 For any A and A Ig(j) . Hence.Alli' + jg(j)] ~ where I{j E A} is 1 if j E A and 0 otherwise.Wg(W)] = P{W E A} - 2: e-A. known as the Stein-Chen method. g(j + 1) = 1.eA A ~ min(l. for fixed A and A. we define a function g for which E[Ag(W + 1) . for bounding the error when approximating P{W E A} by e-A.AI/i!. Set A = 2: I=l n AI and let A denote a set of nonnegative integers. Ag(j + 1) . .g(j .E[RIX. . = 1 That is. k} P{LX. we obtain from Lemma 10 21 and the triangle inequality that.E[g(V. 
is any random variable such that distribution of L 1#/ P{V.2) + + g(i + 1) g(i). V..g(i) = g(j) . .E[g(W)IX.=kIX/=l} fj. for i < j g(j) . = 1]A. we let R = g( W) then we obtain that E[Wg(W)] = L A. = 1] If. given that X. (1022) Ig(j) . where V.E[RIX. = 11 L A.-I = L E[RX. is any random variable whose distribution is the same as the conditional X.2. 1IA) To continue our analysis we will need the following lemma Lemma 10.IX. in Lemma 10 2 2. = 1] .THE STEIN-CHEN METHOD FOR BOUNDING THE ERROR 463 Since.1) + g(j 1) g(j .] .-1 n =L" /~I A.=1 n (1023) " + 1)].g(i)1 s Ii iI min(l.=1 n Proof n L E[RX.2 For any random van able R E[WR] = L A. = I. 1=1 L At min(l. = k} = P{L ~ = klxt = J~' I}.2 3 strengthens our intuition as to the validity of the Poisson paradigm From Equation (10. for each i. min(I.2. be any random variable that is distributed as the conditional distribution of L ~ given that X. L AtE[lg(W + 1) n . l/A)E[IW - w here the final ineq uality used Equation (10 2.4) we see that the Poisson approximation will be precise if Wand VI have approximately the same distribution for all i That is.2. i = 1. is such that J*' P{V.E[g(W + 1) . V.$. = 1 is approximately the distribution of W. for all i But this will be the case if . I/A):i A.2) Therefore. (10 24) L A.g(VI + 1)]/ g(Vt + 1)1] V.3 Let V. .n That is. if the conditional distribution of 2: X.I] 1 Remark The proof of Theorem 10.E[lW lEA V. given rl-I that X.=1 for all k Then for any set A of nonnegative integers P{W E A} - 1 L e-AA'/i11 :::.-I IP{W E A} . we have proven the following THEOREM 10.I] :::.E[g(W + 1)] we obtain from Equations (10 2 1) and . . + 1)])/ A.464 n POISSON APPROXIMATIONS Since E[Ag(W + 1)] (10 2 3) that.~ e ANli11 /~ ~(E[g( W + 1)] = /~ n E[g(V. p. .3 is in terms of any random variables VI' i = 1.. . the Poisson 1*' I approximation will be precise provided the A. 11 A) ± . = L x.-I A. are negatively correlated. = I} = (1 .. in many cases where the X. and so the information that X. are small and the X. Let XI be the indicator for the event that urn i is empty and let W = L x. and L Xi approxnnately have the same distribution.n. are independent Bernoulli random variables then we can let V.=1 n is large enough so that each of the A. we can attempt to couple V. there is a coupling that results in VI being less than or equal to W. denote the number of empty urns." That is. ExAMPLE 10. . are only weakly dependent. = P{X. we might . = 1 makes it less likely for other X. to equal 1. with W so as to simplify the computation of £[1 W . then we might expect that W should approximately have a Poisson distribution with mean A = LA.2. with each ball independently going into urn i with probability PI' i = 1. If Wand VI are so coupled in this manner. If m . since urn i being empty makes it less likely that other urns will also be empty.. and also Al = P{X.THE STEIN CHEN METHOD FOR BOUNDING THE ERROR d 465 where:::::: means "approximately has the same distribution. must be weak enough so that the information that a specified one of them is equal to 1 does not much affect the distribution of the sum of the others. 1=1 As the XI have a negative dependence.n. lEA L e-AA'li! I~ min(1.2(8) Suppose that m balls are placed among n urns. it follows that ExAMPLE 10. 1*' and so. = I} must be small for all i so that L x. Hence.. having the distributions specified.VII] For instance.)m n is small.. Since the bound on the Poisson approximation given in Theorem 10.2(A) If the X. the dependence between the X. Therefore. 
and W so that W ~ VI' For in this case we have that 2: A.i. imagine that the m balls are distributed according to the specified probabilities and let W denote the number of empty urns.] I) = A2 + A = A2 + A ~ AI £ [1 + ~ X. 11 A) ~ A.~ ( 1 P.VI] = £[W] A1*1 £[Vr ] 2: (l_~)m. 1 P. it follows that £[lW . we have shown the following.I] = 2: AI£[W] I 2: AI£[VI] I = A2 - 2: AI(£[l + V.PI Thus. independently. lEA :S. If we now let VI denote the number of urns j. j =I=.466 POISSON APPROXIMATIONS suppose that we can construct a coupling that results in W ~ V r • To show that this is indeed possible.V. 1 . Since W >. where the next to last equality follows by setting R = W in Lemma 10 2 2 Thus. = £[W2] 1] 1] = A2 + A =A - Var(W}. . among the other urns according to the probabilities p/(l .V.I] = £[W .3. [ A .i.£[lW I V.PI}' j =I=. then it is easy to see that VI has the appropriate distribution specified in Theorem 10 2. )m] Added insight about when the error of the Poisson approximation is small is obtained when there is a way of coupling V. for any set A.I ~ 2: ~£[WIX. min(1. that are empty. Now take each of the balls that are in urn i and redistribute them. . Proof Both parts follow from Lemma 10 2 2 To prove (a). = 1. = I}. n.=1 n . we obtain that P{W> O} = 2: A.E[I/(1 + V.E[l/wlx. In this section we present another approach to approximating the probability mass function of W = LX.\'/i! I:s min(1. let ifW>O ifW=O Then from Lemma 10 2 2. given that X.~ e-A. and W so that W 2: V" then IP{W E A} . = '~I n 1] = 2: A. let XI.=1 n A. (a) P{W> O} = 2: .P{V. It is based on the following proposition 1=1 n PROPOSITION 10. distnbuted as the conditional distribution of 2: X. Xn be Bernoulli random variables with A. = k . for any set A ..3. makes it less likely for the other X.Var(W)] Thus. = P{X. 10. for each i.4 If. i = 1.IMPROVING THE POISSON APPROXIMATION 467 PROPOSITION 10. in cases where the occurrence of one of the X. to occur.E[l/(1 + V.1}... ... there is a way of coupling V.1 With V.)] .3 IMPROVING THE POISSON ApPROXIMATION Again. . 1/ A) [A .2.)] A. the Poisson approximation will be quite precise provided Var(W) :::::: E[W]. loosely speaking. k 2: (b) P{W = k} = i~ 1. are only "weakly" dependent then the conditional distribution of VI will approximately be Poisson with mean a. exp{ -aJ 2: a:+I/U + I)! a I = exp{ -a. conditional on X. F~l j Now. If the X.}a:-I/(k -1)!.3.}a. Assuming this is so.})la" (10.1 leads to the following approximations' P{W> O}::::: 2: A.2 to obtain POISSON APPROXIMATIONS P{W == k} = 2: . exp{ -a. if the a. = 1] = .2: '\.!.2. k:=: 1 and E[l/(l + V. and apply Lemma 10.}(exp{a. are all small and if. Vj is the sum of smaller mean .(l 1 n I-I n exp{-a.lj! ) =.)] ::::: 2: (1 + j)-I exp{ -a.\(.exp{-a. let ifW= k otherwise. the remaining ~.l)la.P{V( = k '~1 k k I~l n 1 1 n 1} Let at = E[VJ = 2: E[X.}a~-'/(k - I)!. Using the preceding in conjunction with Proposition 10. are "negatively" dependent then given that X.468 which proves (a) To prove (b). Hence. = 1 it becomes less likely for the other X j to equal 1.1) P{W k}::::: -k 2: A. = (1 .IX. = 1].1}::::: exp{-a.P{W = klx. = 1. i.3.} . then *' P{V( = k .})la.. k P{W = k} New Approx.3(A) (Bernoulli Convolutions). jt-. .1). and let it be 0 otherwise. and p. k P{W = k} New Approx Usual Approx o 1 2 86690 13227 . but now let W denote the number of outcomes . i = 1. .. .. n. Since the nonoccurrence of outcome i will make it even less likely for other outcomes not to occur there is a negative dependence between the X.] = . 
where a'=LE[~lx. are negatively dependent.Pn. are independent Bernoulli random variables with E[X. 10. .=I]=L (1 . equal 1 if none of the trials result in outcome i. . = il10 W is the number of outcomes that never occur.000020 946485 052056 001431 000026 10. it has been shown in Reference [4] that the preceding approximation tends to be more precise than the ordinary Poisson approximation when the X. Usual Approx 0 1 2 3 EXAMPLE 946302 . ..3. m = 20..3(8) Example 10 2(B) was concerned with the number of multinomial outcomes that never occur when each of m independent tnals results in any of n possible outcomes with respective probabilities PI. .00084 86689 13230 . 1 . Suppose that X" i = 1.P. In fact. Thus we might expect that the approximations given by Equation (10.3(c) Again consider m independent trials.IMPROVING THE POISSON APPROXIMATION 469 Bernoullis than is Wand so we might expect that its distribution is even closer to Poisson than is that of W. each of which results in any of n possible outcomes with respective probabilities PI.. .00080 87464 11715 00785 EXAMPLE 10.052413 001266 000018 946299 052422 001257 . pj)m are more precise than the straight Poisson approximations when A< 1 The following table compares the two approximations when n = 4. provided that A < 1 The following examples provide some numerical evidence for this statement EXAMPLE 10.pn Let X.rt-..11000. The following table compares the new approximation given in this section with the usual Poisson approximation. 1.32 can be quite precise For instance.947751 when the exact probability is 946302 In Examples lO 3(8) and lO. to be approximately Poisson? Explain. A set of n components. 9072911 0926469 0000624 Usual Approx 91140 . The system is said to fail if k consecutive components fail (a) For i < n + 1 .2 P{W > O} 2: f "-. W should be roughly Poisson.3(A) it provides the upper bound P{W = O} !5 . = 1/6.3.. The following table compares the two approximations when n = 6.i + k .08455 00392 o 1 2 9072910 0926468 0000625 Proposition lO 3 1(a) can also be used to obtain a lower bound for P{W > O}. in Example lO.k. let Y. W is the number of outcomes that occur at least five times k P{W = k} New Approx.1 all fail. Corollary 10. . are in a linear arrangement.3(C) it provides upper bounds.=L Proof Since f(x) = l/x is a convex function for x 2: 0 the result follows from Proposition 10 31 (a) upon applying Jensen's inequality The bound given in Corollary lO. as the indicators of Ware negatively dependent (since outcome i occurring at least r times makes it less likely that other outcomes have also occurred that often) the new approximation should be quite precise when A < 1. In addition. be the event that components i.) . m = 10. In cases where it is unlikely that any specified outcome occurs at least r times. that are less than the straight Poisson approximations PROBLEMS 10. Would you expect the distribution of Y.470 POISSON APPROXIMATIONS that occur at least r times./(1 + a. 2: . and p. i + 1. P{W = O} < 86767 and p{W = 0}!5 90735. each of which independently fails with probability p. use Brun's sieve to argue that the distribution of W is approximately Poisson. (a) Approximate P{W = k} (b) Bound this approximation 10. If nand m are both small in companson to N. show that E[Xh(X)] = AE[h(X + 1)] provided these expectations exist. of which n are red. n + 1 . 10. 10. n = 3.3 2.6. > o}.j the events {X. are randomly arranged at a round table. (c) Approximate P(system failure). consisting of n couples.2. 
(d) For N = 20.1 are arranged in a circle. Suppose that N people toss their hats in a circle and then randomly make selections Let W denote the number of the first n people that select their own hat (a) Approximate P{W = o} by the usual Poisson approximation. where X is the indicator for the event that all components are failed. Approximate P(system failure) and use Brun's sieve to justify this approximation. . (b) Approximate P{W = o} by using the approximation of Section 10.PROBLEMS 471 (b) Define indicators X" i = 1.4. = I} and {X.3. (c) Determine the bound on P{W = o} that is provided by Corollary 10. 10.. Suppose that the components in Problem 10. Let W denote the number of couples seated next to each other. . Let W denote the number of red balls selected.5. If X is a Poisson random variable with mean A. compute the exact probability and compare with the preceding bound and approximations. Suppose that m balls are randomly chosen from a set of N..3. such that for all i ¥. 10. = I} are either mutually exclusive or independent. A group of2n individuals.k. and for which P(system failure) = P{X + ~ X. 2(A) for the usual Poisson approximation REFERENCES For additional results on the Poisson paradigm and Brun's sieve. Say that any set of three vertices constitutes a tnangle if the graph contains all of the G) edges connecting these vertices.3. The Probabilistic Method. and S Janson. M Ross. Suppose that W is the sum of independent Bernoulli random variables.3 is taken from Reference 4 Many nice applications regarding Poisson approximations can be found in Reference 1 1 D. 1992 4 E Pekoz and S. Barbour. (b) using the approximation in Section 10. Clarendon Press. Call the sets of vertices and edges a graph. Also determine the bou'nd provided by Corollary 10. Aldous. New York. J Spencer. New York. Vol 8. December. Probability Approximations via the Poisson Clumping Heuristic. 2 N Alon.2. pp.472 POISSON APPROXIMATIONS 10. Oxford.9. 1994. Spnnger-Verlag. L Holst.3. The approximation method of Section 10. Wiley. including a proof of the limit theorem version of Proposition 10 1 1. 1992. approximate P{W = k} by (a) using the Poisson approximation. Compare this bound with the one obtained in Example 1O. Assuming that p is small. Bound the difference between P{W = k} and its approximation as given in Equation (10. the reader should see Reference 2 For learning more about the Stein-Chen method. 1989.3. Let W denote the number of triangles in the graph. Given a set of n vertices suppose that between each of the (~) pairs of vertices there is (independently) an edge with probability p. 10. Use the approximation of Section 10 3 to approximate P( system failure) in Problem 10.7.8. England. "Improving Poisson Approximations. 10.1). No 4." Probability in the Engineering and Informational Sciences. and P Erdos. 3 A D. Poisson Approximations. 449-462 . we recommend Reference 3.3. E[N.)".. P J 1.(t .(1 .NJINlll. nmP. .Answers and Solutions to Selected Problems CHAPTER loS.P.PJ) .NJ ] otherwise .Pin. nP.] E[N)] .nPi E[N. + n2 p. •• .nP. + n 2p. E[IJI ] (1 . .-m 2 p. .P. 1 . I' = E[N. = 0. (b) E[N.P.]E[NJ] = -nP.nr-a = n1 -r- IT P~" '~I r IT n. 1 .E[N7]P.P) 1 Cov(N" N) 1 (c) Let I) PJ n P'PJ 2 - P nP.P.n 2p.. J t n 2p)p.nP'PJ + nP.PI E[N N] = nE[NI]P.PI' i # j = {0 if outcome j never occurs E[I)] = (1 Var[/J] = (1 i # j.nP: = nP" nPp = nP. . nand E[N?] E[Ni] L r n.PlY' 473 .PJ)n(t .NJ] = E[E[N.(1 . .=1 where n. E[N. = n . .NJINI = m] = mE[N.PJ n 2PJP. . (0) PN I 1 N 1-1 (n l .m) 1 p.INJ = = m] = m(n .p. 
1.6. (a) Let I_j equal 1 if there is a record at time j and 0 otherwise. Then E[I_j] = 1/j and, since the I_j are independent,
E[N_n] = Σ_{j=1}^n 1/j,  Var(N_n) = Σ_{j=1}^n (1/j)(1 - 1/j).
(b) Let T = min{n: n > 1 and a record occurs at n}. Since T > n if and only if X_1 is the largest of X_1, ..., X_n, P{T > n} = 1/n. Hence lim_n P{T > n} = 0, so P{T < ∞} = 1, while
E[T] = Σ_{n≥1} P{T > n} = Σ_{n≥1} 1/n = ∞.
(c) Let T_y denote the time of the first record value greater than y, and let X_{T_y} be the record value at time T_y. Then
P{X_{T_y} > x | T_y = n} = P{X_n > x | X_1 < y, ..., X_{n-1} < y, X_n > y} = P{X_n > x | X_n > y},
which equals 1 when x < y and F̄(x)/F̄(y) when x > y. Since P{X_{T_y} > x | T_y = n} does not depend on n, we conclude that T_y and X_{T_y} are independent.

1.10. The number of outcomes that do not occur equals Σ_i I_i, with the I_i as in Problem 1.5(c). Hence
E[Σ_i I_i] = Σ_i (1 - P_i)^n
and
Var(Σ_i I_i) = Σ_i (1 - P_i)^n[1 - (1 - P_i)^n] + 2ΣΣ_{i<j} [(1 - P_i - P_j)^n - (1 - P_i)^n(1 - P_j)^n].

1.13. Suppose that the outcome of each contest is independent, with either contestant having probability 1/2 of winning. For any set S of k contestants, let A(S) be the event that no member of S^c beats every member of S. Since there are C(n,k) sets of size k, and since the probability that each of the n - k contestants not in S lost at least one of his or her k matches with players in S is [1 - (1/2)^k]^{n-k},
P{∪_S A(S)} ≤ Σ_S P{A(S)} = C(n,k)[1 - (1/2)^k]^{n-k}.
Hence, if C(n,k)[1 - (1/2)^k]^{n-k} < 1, then there is a positive probability that none of the events A(S) occurs.

1.14. (a) Let Y_j denote the number of 1's that occur between the (j - 1)st and jth even number. Since each outcome that is either a 1 or an even number is even with probability 3/4, conditioning on whether the first appearance of either a 1 or an even number is a 1 gives
E[Y_j] = (1 + E[Y_j])(1/4),
so that 1 + Y_j is a geometric random variable with mean 4/3, and E[Y_j] = 1/3. Therefore Σ_{j=1}^{10} (1 + Y_j) = 10 + X_1 is a negative binomial random variable with parameters n = 10 and p = 3/4, giving
P{X_1 = i} = P{Neg Bin(10, 3/4) = 10 + i} = C(9 + i, 9)(3/4)^10(1/4)^i,
and E[X_1] = 10E[Y_1] = 10/3.
(b) Let W_j denote the number of 2's that occur between the (j - 1)st and jth even number; then
E[W_j] = E[W_j | even number is 2](1/3) + E[W_j | even number is not 2](2/3) = 1/3.
(d) Since there are 10 outcomes that are even, and each is independently a 2 with probability 1/3, X_2 is a binomial random variable with parameters n = 10, p = 1/3. That is,
P{X_2 = i} = C(10, i)(1/3)^i(2/3)^{10-i}.

1.17. Let F_{i,n} denote the distribution function of the ith smallest of X_1, ..., X_n.
(a) Conditioning on whether X_n ≤ x,
F_{i,n}(x) = F_{i-1,n-1}(x)F(x) + F_{i,n-1}(x)F̄(x).
(b) Conditioning on whether X_n is among the i smallest of X_1, ..., X_n, an event of probability i/n which, by symmetry, does not change the joint distribution of the remaining values,
F_{i,n}(x) = F_{i-1,n-1}(x)(i/n) + F_{i,n-1}(x)(1 - i/n).

1.21. (a) P{-(1/λ)log U ≤ x} = P{U ≥ e^{-λx}} = 1 - e^{-λx}, so -(1/λ)log U is exponential with rate λ.
(b) Let N = max{n: U_1···U_n ≥ e^{-λ}}, where U_1···U_0 ≡ 1. Then N = n if and only if
U_1···U_n ≥ e^{-λ} > U_1···U_{n+1},
or, equivalently,
Σ_{i=1}^n -log U_i ≤ λ < Σ_{i=1}^{n+1} -log U_i.
Since the -log U_i are independent exponentials with rate 1, N is distributed as the number of events by time λ of a Poisson process with rate 1; hence P{N = n} = e^{-λ}λ^n/n!.
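The construction in 1.21(b) is a standard way of generating Poisson random variables from uniforms; the following Python sketch implements it and checks the mean (the value of lam is an arbitrary test parameter).

    import math, random

    def poisson_via_uniforms(lam, rng=random.random):
        """Return N = max{n : U_1 * ... * U_n >= exp(-lam)}, which is Poisson(lam)."""
        n, prod, target = 0, 1.0, math.exp(-lam)
        while True:
            prod *= rng()
            if prod < target:
                return n
            n += 1

    lam = 2.5
    samples = [poisson_via_uniforms(lam) for _ in range(100_000)]
    print(sum(samples) / len(samples))   # should be close to lam = 2.5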
1.22. Conditioning on which variable attains the minimum,
P{X_1 < X_2 | min(X_1, X_2) = t} = P{X_1 = t, X_2 > t}/[P{X_1 = t, X_2 > t} + P{X_2 = t, X_1 > t}].
Writing P{X_i = t} = λ_i(t)P{X_i > t} and using independence, the numerator is λ_1(t)P{X_1 > t}P{X_2 > t} and the denominator is [λ_1(t) + λ_2(t)]P{X_1 > t}P{X_2 > t}. Hence
P{X_1 < X_2 | min(X_1, X_2) = t} = 1/[1 + λ_2(t)/λ_1(t)].

1.34. Var(X | Y) = E[(X - E(X|Y))² | Y] = E[X² | Y] - 2[E(X|Y)]² + [E(X|Y)]² = E[X² | Y] - [E(X|Y)]².
Hence
Var(X) = E[X²] - (E[X])²
= E(E[X² | Y]) - (E[E[X|Y]])²
= E[Var(X|Y) + (E[X|Y])²] - (E[E[X|Y]])²
= E[Var(X|Y)] + E[(E[X|Y])²] - (E[E[X|Y]])²
= E[Var(X|Y)] + Var(E[X|Y]).

1.37. Since (S_1, S_2, S_3) determines, and is determined by, (X_1, X_2, X_3), the joint density of S_1, S_2, S_3 is
f(s_1, s_2, s_3) = f(s_1)f(s_2 - s_1)f(s_3 - s_2).

1.39. Let I_n equal 1 if a peak occurs at time n, and let it be 0 otherwise. Since each of X_{n-1}, X_n, X_{n+1} is equally likely to be the largest of the three, E[I_n] = 1/3. As {I_2, I_5, I_8, ...}, {I_3, I_6, ...}, and {I_4, I_7, ...} are all independent and identically distributed sequences, it follows from the strong law of large numbers that the average of the first n terms of each sequence converges to 1/3; but this implies that, with probability 1,
lim_n (I_2 + ··· + I_n)/n = 1/3.

For the symmetric random walk, let T_i denote the number of steps to go from i - 1 to i. Then E[T_1] = 1 and, for i > 1, conditioning on the first step gives
E[T_i] = 1 + (1/2)E[time to go from i - 2 to i] = 1 + (1/2)(E[T_{i-1}] + E[T_i]),
so that E[T_i] = 2 + E[T_{i-1}]; thus E[T_2] = 3, E[T_3] = 5, and in general E[T_i] = 2i - 1. Hence, if T_{0n} is the number of steps to go from 0 to n,
E[T_{0n}] = E[Σ_{i=1}^n T_i] = 2n(n + 1)/2 - n = n².
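The n² answer above is easy to check by simulation. A minimal Python sketch, assuming (as the recursion does, via E[T_1] = 1) that the walk is reflected at 0, so that from 0 it must step to 1:

    import random

    def steps_to_reach(n, trials=20_000, rng=random.random):
        """Average number of steps for a symmetric walk on {0,1,...}, reflected at 0, to go 0 -> n."""
        total = 0
        for _ in range(trials):
            pos = steps = 0
            while pos < n:
                pos += 1 if (pos == 0 or rng() < 0.5) else -1
                steps += 1
            total += steps
        return total / trials

    for n in (2, 4, 8):
        print(n, steps_to_reach(n), n * n)   # simulated mean vs. n^2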
is the time between the (i . counted with probability p(s).i:l. where p(s) = { :(t t.-) . Now. + G(t + tv .24. . it follows that G(x) == F(Llx).}. t-s s I Thus the number of type-l cars is Poisson with mean 2.x)n.r}dt J: (U = JA. (d) Yes (e) E[T] : J: P{T > t} dt :J:P{T.1)1 dx) dt (f) T L~l XI> where X.. whereas one occurnng at time s. otherwise s) . independently of other events.~1' (n.z(p.22. . Let v be the speed of the car entenng at time t. independent of other cars. s < t. will be counted with probability P{s + T < t + t. and let tv = Llv denote the resulting travel time If we let G denote the distnbution of travel time. Let an event correspond to a car entenng the road and say that the event is counted if that car encounters the one entenng at time I. will be counted with probability P{s + T > t + tv}. > t}) dt P (by independence) (iI J'" P. will be type 1 if its velocity V is such that a < (t s)V < b. That is.l)st and ith flip Since N is independent of the sequence of X" we obtain E[T] = E[E[TIN]] =E[NE[X]] = E[N] since E[X] 1 2.-l .>t. and so it will be type 1 with probability F (.. differentiate to obtain. I) random variables A second.41. A + dA)IN(/) = n. given Sn = I. . since G(I + lu) = 0 when I is large. 0 :.: s :.Sn-IIN(r) == n . G(y) dy + A f'u G(y) dy To choose the value of lu that minimizes the preceding.:. N(/) = n} (by Theorem 2 3 1) Hence. =I (by independent increments) . somewhat heunstic. N(/) = n . the optimal travel time I~ is the median of the distnbution. Sn-1 are distributed as the order statistics from a set of n .Sn = sn} -1: _ e-AI(At)n dG(A) e-AI(AW dG(A) . SI' . = snl A .: I} is equivalent to knowing N(/) and the arnval times SJ. that or. the number of encounters is Poisson with mean A f. It can be shown in the same manner as Theorem 231 that. (a) It cannot have independent increments since knowledge of the number of events in any interval changes the distribution of A (b) Knowing {N(s).ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 481 Hence. N(/) == n. .Sn = sn} . . which implies that t~ optimal speed v* = LlI~ is such that G(Llv*) 112 As G(Llv*) = F(v*) this gives the result 2. Setting this equal to 0 gives.1 uniform (0. Thus. SI == Sl.Sn-IISn = SI. argument is :=0 SJ. = SI.:.Sn = PtA = A}P{N(/) = nlA = A}P{SI = SJ.1. PtA :=0 A.SN(I) Now (arguing heuristically) for 0 < Sl < .s) ds = A f'+" I. < Sn < I.Sn-IIN(/) = n-l 2. S) == SJ. p(s) ds Af: G(I + = lu - s) ds +A 0 r'o G(I + lu .26. . PtA E (A. a• o n' (a}.1)' = amf'(m + n .)m-I d)' (m .482 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Hence the conditional distribution of A depends only on N(t) This is so since. dG(}.A1 (At)" ae.eh M ) dG(}. .1)1 (a}.) ::= f:O }. 2. 1. the answer is the mean of the conditional distnbution which by (b) is (m + n)/(a + t) (a + t)}. t) population) no matter what the value of A (c) P{time of first event after t is greater than t + sIN(t) = n} e-·'e-·'(At)" dG(}.1)(~)m(_t n t)e-(atl)A a+t )" «a + (m + n . (a) P{N(t) = n} f'" e. The renewal equation is.) (d) lim f"'_1 _e-_· dG(}.) =~.1)1 n'(m -1)'(a + t)"'+" f'" (a + t)e-ratl)A «a + t)}. for t ::::.ae ( m -a. .1)' ( (b) PiA m +n n - 1) (~)"'(_t)" a+t a+t = nlA = }'}P{A = }. (m ..)m+n-l ..1)1 (c) By part (d) of Problem 241.)m-l a+t + n .42..} P{N(t) = n} -AI (At)" = }.. SN(I) will be distributed the same (as the order statistics from a uniform (0. SI. CHAPTER 3 3.7.)mt"-I d)' 0 (m + n . given the value of N(t). 
CHAPTER 3

3.7. For t ≤ 1 the renewal equation is
m(t) = t + ∫_0^t m(t - s) ds = t + ∫_0^t m(y) dy.
Differentiating yields m'(t) = 1 + m(t); letting h(t) = 1 + m(t) gives h'(t) = h(t), or h(t) = ce^t. Upon evaluating at t = 0 we see that c = 1, and so m(t) = e^t - 1. As N(1) + 1 is the number of uniform (0, 1) random variables that need be summed to exceed 1, it follows that its expected value equals m(1) + 1 = e.

3.17. First note that if g = h + g*F, then iterating gives
g = h + (h + g*F)*F = h + h*F + g*F² = ··· = h + h*Σ_{n=1}^k F^n + g*F^{k+1};
letting k → ∞ and using F^n → 0 yields g = h + h*Σ_{n≥1} F^n = h + h*m.
(a) Conditioning on Z_1 + Y_1,
P(t) = P{Z_1 > t} + ∫_0^t P(t - s) dF(s),
so by the above and the key renewal theorem,
P(t) → ∫_0^∞ P{Z_1 > t} dt / (E[Z] + E[Y]) = E[Z]/(E[Z] + E[Y]).
(b) With g(t) = E[A(t)], conditioning on X_1 gives
g(t) = tF̄(t) + ∫_0^t g(t - s) dF(s),
and hence
g(t) → ∫_0^∞ tF̄(t) dt / μ = ∫_0^∞ ∫_t^∞ t dF(s) dt / μ = ∫_0^∞ s² dF(s) / (2μ) = E[X²]/(2μ).

3.24. Let T denote the number of draws needed until four successive cards of the same suit appear. Imagine that you continue drawing cards indefinitely, and say that a renewal occurs any time the last four cards drawn have all been of the same suit. By Blackwell's theorem,
E[time between renewals] = (lim_n P{renewal at n})^{-1} = (1/4)^{-3} = 64.
Also,
E[time between renewals] = 1 + (3/4)(E[T] - 1),
this equation being true since if the next card is of the same suit then the time between renewals is 1, and if it is of a different suit then it is like the first card of the initial cycle. Hence E[T] - 1 = 63(4/3), and E[T] = 85.

3.27. By conditioning on S_{N(t)},
E[R_{N(t)+1}] = ∫_0^t E[R_1 | X_1 > t - s]F̄(t - s) dm(s) + E[R_1 | X_1 > t]F̄(t),
and, by the key renewal theorem,
E[R_{N(t)+1}] → ∫_0^∞ E[R_1 | X_1 > t]F̄(t) dt / μ = ∫_0^∞ sE[R_1 | X_1 = s] dF(s) / μ = E[R_1X_1]/μ.
(We have assumed E[R_1X_1] < ∞.) In the special case R_i ≡ X_i this yields lim_t E[X_{N(t)+1}] = E[X²]/μ, which exceeds E[X] since E[X²] > E²[X] except when X is constant with probability 1.

3.33. (a) Both processes are regenerative.
(b) Let T be the length of a cycle and N the number of customers served in a cycle. Imagine that each customer pays, at any time, at a rate equal to that customer's remaining service time. Then
reward in cycle = ∫_0^T V(s) ds,
and also, letting Y_i denote the service time of customer i and D_i its delay in queue,
reward in cycle = Σ_{i=1}^N [D_iY_i + Y_i²/2].
Since D_i and Y_i are independent, Wald's equation gives
E[reward in cycle] = E[N](E[Y]W_Q + E[Y²]/2).
Equating this to E[∫_0^T V(s) ds] = V̄E[T] and using the relationship E[T] = E[N]/λ establishes the identity.

3.34. Suppose P{X < Y} = 1 and k = 3; for instance, P{X = 1} = 1 and Y uniform on (2, 3).

3.35. (a) The process is regenerative.
(b) The long-run proportion of time that i packages are waiting equals
E[time during a cycle that i packages are waiting]/E[cycle length],
where E[time i packages are waiting during a cycle] = ∫ E[time | cycle length x] dF(x); the inner expectation follows from Problem 2.17 of Chapter 2.
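The conclusion of Problem 3.7 above, that on average e uniform (0, 1) variables are needed for the sum to exceed 1, is easy to check by simulation. A minimal Python sketch:

    import math, random

    def count_to_exceed_one(rng=random.random):
        total, n = 0.0, 0
        while total <= 1.0:
            total += rng()
            n += 1
        return n

    trials = 200_000
    mean_n = sum(count_to_exceed_one() for _ in range(trials)) / trials
    print(mean_n, math.e)   # simulated mean vs. e = 2.71828...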
3.36. By the theory of regenerative processes,
lim_{t→∞} (1/t) ∫_0^t r(X(s)) ds = E[∫_0^T r(X(s)) ds]/E[T] = Σ_j r(j) E[amount of time in j during T]/E[T].

CHAPTER 4

4.10. Suppose that i ↔ j and that i is positive recurrent. Let m be such that P^m_{ij} > 0, let N_k denote the kth time the chain is in state i, and let
I_k = 1 if X_{N_k + m} = j, and 0 otherwise.
By the strong law of large numbers,
(number of visits to j by time N_n + m)/(N_n + m) ≥ Σ_{k=1}^n I_k/(N_n + m) → P^m_{ij}/E[T_{ii}] > 0,
where T_{ii} is the time between visits to i. Hence j is also positive recurrent. If i is null recurrent then, as i ↔ j and recurrence is a class property, j is recurrent; and if j were positive recurrent, then by the above so would be i. Hence j is null recurrent.

4.13. Suppose i is null recurrent, and let C denote the class of states communicating with i. Then all states in C are null recurrent, implying that lim_n P^n_{ij} = 0 for all j ∈ C. But if C is finite this is a contradiction, since Σ_{j∈C} P^n_{ij} = 1. For the same reason, not all states in a finite-state chain can be transient.

4.14. (a) The transition probabilities are given by
P{X_{n+1} = k | X_n = i, Y_n = j} = C(j, k)α^k(1 - α)^{j-k},  0 ≤ k ≤ j.
(b) No. (c) No. (d) Yes.

4.16. Let I_n(j) equal 1 if X_n = j and 0 otherwise. Then the long-run rate at which transitions from i to j occur is
lim_n (1/n) Σ_{k=1}^n E[I_{k-1}(i)I_k(j)] = lim_n (1/n) Σ_{k=1}^n E[I_{k-1}(i)]P_{ij} = π_iP_{ij}.
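The identity in 4.16, that the long-run rate of i → j transitions is π_iP_{ij}, can be confirmed numerically. A minimal Python sketch; the three-state matrix P below is an arbitrary illustrative chain, not one from the text.

    import numpy as np

    rng = np.random.default_rng(3)
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])   # hypothetical transition matrix

    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi /= pi.sum()

    steps, x, count01 = 200_000, 0, 0
    for _ in range(steps):
        nxt = rng.choice(3, p=P[x])
        count01 += (x == 0 and nxt == 1)   # count transitions 0 -> 1
        x = nxt
    print(count01 / steps, pi[0] * P[0, 1])   # empirical rate vs. pi_0 * P_01 = 0.125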
observed is a simple random walk with Hence from gambler's ruin results P{error} = P{down M before up M} 1 _ 1 . - Y.(q/p)M) 1 . 1T.. n. Y. io)} = k~1 n n P'A' iH k=1 n P. the sequence of (reverse) states until it again enters state i (after T transitions) has exactly the same distnbution as Yp j 0. P{e. it suffices to show that under the one-closer rule P P{eJ precedes e.pP .})] = 1 + 2:2: (P.} + P}(l. > PI Now under the front-of-the-line rule PI Pie} precedes e.} + PIP{e.precedesel}] .} as large as possible when PJ > P. we would want to make Pie} precedes e. and as small as possible when P.P{eJprecedese. say ( . j) Pie.i".precedese.! ...l + 2:2: PI Hence.."r "= 1 + 2: 2: P. ) By successive transpositions using (*). i). and only if. 1+ I since under the front-of-the-line rule element j will precede element i if. the last request for either i or j was for j Therefore.490 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS for all permutations Now the average position of the element requested can be expressed as average position 2: P. } .<) = 1 + 2: [P.i. we have 1T( .j.j) < P p: a(j. Letting a(i.} = . to minimize the average position of the element requested.l I I'" = 1 + 2: [P. ii' (P)hl 1T( PI Now when PI > P" the above implies that 1T( ) < pi 1T( I p. precedes el}' we see by summing over all states for which i precedes j and by using the above that a(i. Pie} precedes e. - PI)P{el precedes e. [1 + 2: P{elementj precedes element i}] I !.} > p ' p J+ when PI> P.ibj.P{elprecedese. ) =.E[position of element i] 2: P. to show that the one-closer rule is better than the front-of·the-line rule. I Now consider any state where element i precedes element j. (N) (d) For the symmetric random walk. (8) Let N(t) denote the number of transitions by t It is easy to show in this case that P{N(t) 2: n}:5 J~1l '" L e- M (Mt)i . This transition probability matrix is doubly stochastic and thus i =::.1.ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 491 which.: 1 (e) Use Theorem 4 7 1 CHAPTER 5 5.' 4. j :$. P. E[time the X process spends in j between visits to i] . .. (b) Proportion of time in j = 11/2.~o 11" 0 :$.3.::: =! = P. i:= 1. from (b).i+1 P OO ..~h ~:::: POI.N Hence. .(N) = (E[ number of V-transitions between V-visits to i])-I 11 (N) J = E[ number of V-visits to j between V-visits to i] E[ number of V-transitions between V-visits to i] _ E[ number of X-visits to j between X-visits to i] 1111. yields P aU.-. i).- J and thus P{N(t) < co} = 1 5. (8) Yes. .aU. the V-transition probabilities are P. we see that the conditional distnbution of the k births is that of k independent random variables each having distribution given by (5 32) . . since a(i.46. 0.. Rather than imagining a single Yule process with X(O) := i. j) = 1 . N . N (c) Note that by thinking of visits to i as renewals 11. . imagine i independent Yule process each with X(O) 1 By conditioning on the sIZes of each of these i populations at t. i) > PI.5. 1. ' + 0(1) Similarly for i p'J(t) yf j. P. T.2: PI!) J"'M for all M Letting M .. transition time stiX(O) yf jIX(O) == i} = i} .}P{T... from the above lim P{~2 transitions by tlXo == f}lt::S .. lim P{~2 transitions by tlXo = i}/t 1-0 Now PIT. o(t) N(t) = Poisson process with rate v and Hence. T' = independent exponentials with rates v = max(v" v J ) P{N(t) .::::2 transitions by tlXo = i} = 2: P.. respectively.(t) :::0 i} no transitions by tjX(O) = i} + P{X(t) = i.. at least 2 transitions by tIX(O) = i} == e-U. We start by showing that the probability of 2 or more transitions in a time t is o(t). 
and TJ are independent exponentials with respective rates Vr and vI' they represent the times to leave i and j. first state visited P{first state visited isj. Conditioning on the next state visited gives P{. is independent of the information that j is the state visited from it) Hence.492 ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS . } where T.5. . + ~:$ t} s PIT + T' :S t}.:::: 2}. 00 now gives the desired result P{X(t) == ilX(O) P{X(t) =:. + P{X(t) = j. (Note that T.. Now.S.-0 Vr (1 . + ~:S t}. Let qX and px denote the transition rate and stationary probability functions for the X(t) process and sinularly qr.k)A. _ P{X(t) = i. the result follows. The expected time to go from state ] to state n is n-I k~1 2: lIAk = 2: lI{k(n k~l n-I k)A} 5. 5. this is a pure birth process with birth rates Ak = k(n .::: O}. or 5.IX(t) E B. j) to (i. X(t ) E G} = P{X(t) E B. X(t ) E G} 2: P{X(r) = j}P{X(t) = iIX(r) == j} 2: P{X(r) = j}P{X(t) E BIX(t-) = j} JEG JEG .14. all we need do is check the reversibility equations Now (by time reversibility of X(t» Since the verification for transitions of the type (i. (0) p/2: P JEB J .21.ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 493 Hence. t . With the state equal to the number of infected members of the population. Y(t». j') is similar. X(r) E G} (b) P{X(t) = . we have We claim that the limiting probabilities are To venfy this claim and at the same time prove time reversibility. pr for the Y(t) process For the chain {(X(t).32. we obtain upon conditioning on the state visited after i.ST Inext isj]PI } VI + s I Now E[e.(s) = 2: 2: P.EB 2: P.(s + v')F.F.(s) 2: P.} JEB lEG Multiplying by P.qll + 2: 2: P. FI(s) = E[e-S{T+T')] =E[e-ST]E[e.ST Inext isj] = v {~ ~(s) ifjE G ifj E B.(s) = 2: 2: P.} + 2: q.F.q.q'l IEB rEG IEB where the last equality follows from and the next to last from (d) From this we see that s .494 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS (c) Let T denote the time to leave state i and T' the additional time after leaving i until G is entered Using the independence of T and T'.q. and summing over all i E B yields IEB 2: P. the number of transitions from G to B must equal to within 1 the number from B to G Hence the long-run rate at which transitions from G to B occur (the left-hand side of (d» must equal the long-run rate of ones from B to G (the right-hand side of (d» (e) It follows from (c) that (s + v.(s)q'l + 2: 2: P. which proves (c) (d) In any time t./(l rEGIEB F](s» (f) .ST ] = _1-2: E[e.)f.(s) = 2: F.} IEB IEB .(s)q.EB lEG ::= IEB 2: F. sT.(t) = I'" tI'" dFr(y)dtIErTv ] o .ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 495 (g) From (f).) .] = I: e-S'P{T.form The result then follows from the one-to-one correspondence between transforms and distributions (j) E[T.F.(s) /EB Dividing by s and then letting s _ 0 yields (h) .(s) (from (a» (from(e» (from (f» _ 1 . ' '" IY t dt dFr• (y)1 E[TIl ] Io 0 E[n] = 2E[Tv] E[n] .EB 2: P.r] == I: t dFr. from (h) the hypothesized distribution yields the correct Laplace tran.:: (E[Tu since Var(T.r s t} is as given..<.] - sE[Tu] (by (g» (i) If P{T.E[e.] Hence.E[e.Sr.F. > t} dtlEfTu] = I'" e.:: 0 J)2 .Sr. then E[e.(Y)I E[Tu] sE[TuJ _ 1 . I~ e-$1 dt dFr. we obtain upon using (e) s 2: P.Sl I'" dFr (y) dtl ErTv ] () I ' = - J. l)s) .e.N«n . f~ P{R(s) = i} ds = pq(AJ . from the above.l)s)IA«n . it follows from Example 58(A) (or 5 4(A» that P{R(t) = I} = pe. E[N(t)] == E [ ~ sA«n .496 ANSWERS AND SOLUTIONS TO SELEcrED PROBLEMS 5.X' + (1 .N«n .r ')A2pIA. + h)}/h + o(h)lh . E[N(ns) . E[A(s)] == P{renewal in (s. 
To prove (b).IE + -s- to(S)] Now let s ~ 0 to obtain E[N(t)] == E [J~ A(s) dS] = ~ A. {R(t).l)s)s + o(s) Thus. E[N(ns) .l)s)] = A«n .A2r ( _~ 1 .l)s)] 2: [N(ns) n~J + o(s) Now. s + h)} and so.e AI) A2 J 2 +-- A At A' where the last equality follows from (*) Another (somewhat more heunstic) argument for (b) is as follows P{renewal in (s. s = hE[A(s)] + o(h).l)s)] + o(s) Hence. let 1\(t) == N(t) = AR(" "~E and write for s > 0 N«n .l)s)] = sE[A«n .33. t ~ O} is a two-state Markov chain that leaves state 1(2) to enter state 2(1) at any exponential rate AJ q(A2P) Since P{R(O) = I} = p. . .) ''''.n 1] =:E[E[Xnlx.. + Y" i 1.n . (by independence) . E[S~IST.) But.1] (lit) Now. .S~-d + 2E(Xn)E[Sn-IISi. .1] .i=I •. . . =: m'(s). E[X. . = E[(Z.S~-d == S~_I .IZ1. + Y"i = 1..n-ll1Xi +Y.7.nu 1SL 2 .1] =: + Y" i 1. • Z.Z. m(t) = J~ E[A(s)] ds CHAPTER 6.n ..Zi-I)(Z.Z.._!+X... • S~_I] + E[X~ ISL . i = E[x.= I.1] + E[Y.• S~-d + 2E[XnSn. . .Z. + Y. Var Zn 6 = var(~ n I x.i: I.(n . ._I)] = E[E[(Z.) =: E[X. "" 2: Var(X.n . .fISi"".-d(Z.. Z.S. .S~-I] = E[S~-dsi. . S~_I .. =: .1)u 2 6. E[S~ . . .Z._I)(E(Z.]] .ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 497 Letting h __ 0 gives E[A(s)] and so.8.])] Z. . + Y. =: I.S~-d=E[S.Z.. .] : E[(Z. ..lx.1] = E[Xn-dX.) + 2:2: COV(X" X.-d + E[X~] Hence.) I = E[(Z..X"i = E[E[X. .IX._I)(Z.n l]IX.n . .-I)IZI.IX. Cov(X" X. for i < j.lx"i =: I. E[Zi-dZI.1Isi. n . (0) E[Xn + Y. n -1] 1.2. + y..Z.X. . .IX. .+y'.)] =0 6. . 7 Var (* XI) =n 0 Var(XJ) + n(n .20.• denote the shortest path and let h (x) denote its 6. .::: O} is a martingale .II given . 2 exp{-b Z/2n}. + Y"i = 1. It follows from (8 1 4) that. O::=.1] The proof of (b) is similar. length Now if y differs from x by only a single component. + Y"-I .1) COV(Xl' Xz) . ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS E[Y"IX. starting at some given time..n-l] Hence.. . 1 . .8.3. + Y"i == 1. with T = h(X) P{ITl4 . 2 . and set Y I = 0.n .::: 1. Without independence.498 Similarly.y" . +Y"i==l. i . "-I It is easy to see that neither {X" Y".. = X"_I .+Y"i=l. Then. both are false Let V.::: For a counterexample in the finite case.. n . Let X. let X] be normal with mean 0 and let Xz = -XI = 7. by Jensen's inequality.. say the ith one.X"2 . the conditional distnbution of the change in the value of Brownian motion in a time s .n-l]==E[Y"_IIX.::: O} nor {X" + Y". the above implies that () > 0 CHAPTER 8 8. X. then the shortest path connecting all the y points is less than or equal to the length of the path X.::: b} ::=. + Y"-dX. be independent and equally likely to be either 1 or -1 Set X" and for n > 1. Y" " = 2: V.X. n .E[T]/41 .::: 0 ~ Cov(X" X z) .• But the length of this path differs from h(x) by at most 4 Hence. Letting a = 4b gives the result CHAPTER 7.n -1] = E[X"_.X.4. Use the fact that f(x) e 9x is convex. Since E[X] < 0. .. from (*) E[X" + Y"Ix. = 2: V. hl4 satisfies the conditions of Corollary 634 and so.X. .1) Cov(X] . . X z) for all n Var(XI ) + (n . 1 . t.1.1. it foUows that it is Brownian motion 8.sf + sf s.+ (B + A) (2 B -2 A) 1.e"'-ef' 8.15. W(l» ] == set + 1) .A in a time 12 . All three have the same density 8.6.II is normal with mean (B . where {Wet)} is Brownian motion. Z(s/(s + 1))) Since {Z(t)} has the same probability law as {W(t) .S)/(t2 . A. we see that Cov(X(s) . Hence. (b) E[T.1. C Vl/2A.A)(s . l. Cov(X(s). [(x) = . II < s < t 2. given X(II) = A. X(t» (s + 1)(t + 1) [ Coy ( W C: 1)' W C 1)) : -t: s: COY (W(l).x e. 1. X(t» = (s + 1)(t + 1) Cov(Z(t/(1 + 1». since E[Tx] =~.1. 8.II) and variance (s .IW(I)}.Sf . 
For s :::.ANSWERS AND SOLUTIONS TO SELECfED PROBLEMS 499 that the process changes by B ..14. ss I Since it is easy to see that {X(f)} is Gaussian with E[X(t)] = 0. Var(TxIX(h» Var(h + Tx-X(h) + o(h) g(x . i. we: 1)) 1 COY ( W ) C 1)' W( 1) : + (s + l.16. 1.7.I.X(h» + o(h).II).II) X (t2 .e-2p. X(s) is normal with mean and variance 8. X(t 2) = B.IX(h)] E[h + Tx-x(hll + o(h) h + x .4.~t + 1) Cov(W(I).tl)/(t2 .x 8.2f'A .X(h) + o(h) 1. A . . so is X from the convolution result . from the conditional variance formula.8. i = 1.X(h)g'(x) . (c) Var(Tx+y) o = -/Lg'(x) + g"(x)12 + 1I/L2 = Var(Tx + Tx+> . g(x) = ex (d) By (c). = O} = p.Tx) + Var(Ty) (by independence) = Var(Tx) So. g(x) = Var(E[TxIX(h)]) + E[Var(TxIX(h»] = 2 + E[g(x .500 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS Hence.5x/4) CHAPTER 9 9. ] + o(h) h = /L 2 + g(x) + 2" gH(X) + o(h) h Upon dividing by h and letting h _ 0.X(h))] + o(h) /L = h :2 + h E [g(X) . • Xn are independent with P{X. . we see that g(x) = ex and by (b) this implies g(x) == x//LJ 8. g(x + y) = g(x) + g(y) implying that. = 1} = 1 - P{X.Tx) = Var(Tx) + Var(Tx+y ./Lhg'(x) + Xi ) g//(x) + .17. (a) Use the representation where XI.n Since the Xi are discrete IFR. min(1. 15.1 ~ k} is increasing in i and thus PX. From (a).{X" 2!: k} = E. Start with the inequality I{X E A} . and so the right-hand side is also 1 Take expectations to obtain P{X E A} . k ~ 1 Since P{X. and E. the result follows since X" converges to a Poisson random vanable with mean A (c) Let X" i ~ 1. ?: k} = p.{X"-l ~ k}-we have 9.{X" 2!: klXI }] = E. = k} = p(1 . P{X. and so for any increasing function g(Xl)-in particular for g(Xl ) = PX.Y} which proves the result 9.[P. denote the next state from i in the Markov chain The hypothesis yields that T. is IFR. Let T. Prove (b) by induction on n as follows writing P.{Xn .F Using the fact that the hazard rate function of the minimum of two independent random variables is equal to the sum of their hazard rate functions gives .12. min(Y. = klx. to mean probability and expectation conditional on Xo = i. Z) . it follows that X. where np" = A By preservation of the IFR property under limits.I{Y E A} :5 I{X oF Y}. which follows because if the left-hand side is equal to 1 then X is in A but Y is not.)].1 ~ k}] Now the induction hypothesis states that P.J(s) = E[f(T. which gives the result 9. Z are independent and Y .G.[PX. ::::::SI T. i ~ 1 Hence ~~ X. Xl is stochastically increasing in the initial state i. is discrete IFR.P{Y E A} :::::: P{X oF Y} But reversing the roles of X and Y establishes that P{Y E A} .p)k-I.P{X E A} :5 P{X oF. we have P.+h i ~ 1 Hence (a) follows since ~J p.{X"-l ~ k} is an increasing function of Xl. Suppose Y.ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 501 (b) Let X" be the binomial with parameters n. be independent geometric random variables-that is.20.p".{Xn . 29.502 ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS To go the other way suppose AI'(t) ~ AG(t) Let Y have distnbution F and define Z to be independent of Yand have hazard rate function The random vanables Y. :5 Bin(n.p) • .1 Ii n n-l 1~2 Repeating this argument (and thus continually showing that the sum of the present set of X's is less vanable than it would be if the X having the largest P and the X having the smallest P were replaced with two X's having the average of their P's) and then going to the limit yields " 2: X.p). Let n 2 Then 2: P{X showing that 2 1 + X 2 ~ i} = P1P2 + Pl + g - PI P2 = 2: P{Bin(2. I I . Z satisfy the required conditions 9.. 1". we see. 2:x..+1 n II n+m . 
that where XI. X" are independent of all else and are Bernoulli random variables with P{X] = I} = P{X" = I} = PI + P" 2 Taking expectations of the above shows 2: x. ~ P" Now consider the n case and suppose PI ~ P2 :5 increasing and convex. using the result for n Letting f be = 2. v I Now write 2:x. :5 XI + X" + 2: X..+2:x.p) ~ i} 2 XI + X 2 ~ Bin(2. I Note.. ~X.32. " [ " 1"+1] = n (d) Let X and Y be independent and identically distributed Then.ANSWERS AND SOLUTIONS TO SELECTED PROBLEMS 503 where X.l)E[Xf'(X)] Now the functions g(X) = X and g(X) = f'(X) are both increasing in X for X ~ 0 (the latter by the convexity of f) and so by Proposition 72 1. We are assuming here that the variability holds in the limit. 2X2: X v + Y2: E[X + YIX] = X + E[YIX] == X + E[X] v where the first inequality follows from (c) .) . where X < Z < eX Taking expectations and using the convexity of f gives E[f(eX)] 2: E[f(X)] + (e . == 0. "+1 (c) ~X. Let f be convex Then 2: E[f(X)] E[f(Y)] = E[E[f(Y)IX]] 2: E[f(E[YIX])] where the inequality follows from Jensen's inequality applied to a random vanable whose distribution is the conditional distnbution of Y given X (a) follows from the preceding since. E[Xf'(X)] 2: E[X]E[f'(X)] = 0.34. n + m Then from the preceding Bin n + m. ( v "p) 2: _'_ n+m I Letting m _ 00 yields ix.l)X + J"(Z)(eX . n + 1 $. which shows that E[f(eX)] 9. n+ 1 ~X. Poisson v (i p. XY2: E[XYIX] v = XE[YIX] (b) X +Y 2: v E[X + YIX] = X + E[YIX] = X + E[Y] = X. I $. " 2: X. Let fbe a convex function. i $. 9.Z)2/2.~E ~X. A Taylor senes expansion of f(Xe) about X gives f(eX) = f(X) + f'(X)(e . I $. 234 Birthday problem. 383-384 Balance equations. 360. 46. 446-449 Associated stochastic process. 1 Azuma's inequality. 321 Backward diffusion equation. 232 Age-dependent branching process. 368 505 .450.Index Absorbing state. 114. 169 Arc sine distribution. 397 Axioms of probability.453. 307 generalized form of. 191. 364. 365 as limit of symmetric random walk. 1 Borel Cantelli lemma. 150 Arithmetic-geometnc mean inequality. 400 reflected at the origin.443-446 Brownian bridge process. 349-352 for renewal reward processes. 130 converse to. 399 and empirical distribution functions. 33-35 are not unbiased. 422-424 applied to mean times for patterns. 414-416 Binomial random variable. 296. 34 Beta random variable. 4. 253 Ballot problem. 147-148 Arc sine laws. 125-126 for random walks.211-213 Alternating renewal process. 361-363 Brownian motion process. 26 Bayes estimators. 357. 148. 5 Branching process. 253-254 stochastic monotonicity properties. 233 limiting probabilities.136. 16. 226. 121 Age of a renewal process. 110-111. 399 absorbed at a value. 363-364 distribution of maximum. 17 Bin packing problem. 316. 448 Auto regressive process. 54 Associated random variables.180-181. 48. 461 Blackwell's theorem. 57. 259-260 Birth rates.454 Birth and death process. 366-367 arc sine law for. 135-136 Aperiodic. 159 Boole's inequality. 356-357 distribution of first passage times. 25 applications of. 416-418 time reversibility of. 116-117. 358 Elementary renewal theorem. 107 Embedded Markov chain. 20 Conditional distribution of the arrival times of a Poisson process. 20 Conditional variance. 67 Conditional expectation. 340 Death rates. 124. 409-418 Coupon collecting problem. 310-311. 111-112 Distnbution function. 41 for the number of renewals by time t. 231-232 Convolution. 402. 21. 108-109 Chapman-Kolmogorovequations. 318-319 Doubly stochastic transition probability matrix. 413-414. 
25 Counting process.33 Conditional Poisson process. 167. 82 probability identity. 82-87 moment generating function. 9 Covariance stationary process. 85 recursive formula for probabilities. 86 Conditional distribution function. 234 Decreasing failure rate random variables (DFR). 96 Compound Poisson random variable. 185 Communicating states of a Markov chain. 20 Conditional probability mass function. 2 Delayed renewal process. 96 Conditional probability density function. 39-40 for Poisson random variables.8 Continuous-time Markov chain. 297. 253 Ergodic state of a Markov chain. 51 Conditional variance formula. 408-409. 406-407. 426-427 monotonicity properties of age and excess. 2-3. 379-381 Brun's sieve. 84 recursive formula for moments. 59 Coupling. 131 Erdos. 425-426 Decreasing likelihood ratio. 165 of a semi-Markov process.433 DecreaSing failure rate renewal processes. 169 Closed class. 309. 131 Equilibrium renewal process. 51 Continuity property of probability. 214 Equilibrium distribution. see second-order stationary De Finetti's theorem. 452 Covariance.506 INDEX Brownian motion with drift. 424. 457-458 Cauchy-Schwarz inequality. 87 asymptotical normality. 174 . p. 168 Compound Poisson process. 432-433 Decreasing sequence of events. 399. 88. 40 Class of states of a Markov chain. 7 Doob type martingale. 14 Ergodic Markov chain. 20. 125 Directly Riemann integrable function. 177.426-427. 329 Einstein. 221 Duality principle of random walks. 18 Chernoff bounds. 372-375. 239-240 Characteristic function.408 Central limit theorem. 335. see delayed renewal process Geometric Brownian motion. 433. 182-183 Glivenko-Cantelli theorem. 37 Inventory model. 369-372 Interarrival times of a Poisson process. 241-242 Kolmogorov's inequality for submartingales. 232 Instantaneous transition rAte. 368-369 Geometric random variable. 314 Korolyook's theorem. 119-120. Hazard rate ordering. 7 Joint moment generating function. see hazard rate ordering Forward diffusion equation.224-225. 324 Hastings-Metropolis algorithm.14 Hamiltonian. 112 relation to Blackwell's theorem. 53 Failure rate ordenng. 48 Gibbs sampler. 118 Instantaneous state. 275-278 Event. 59 Independent random variables. 17. 354 Joint characteristic function. 18 Joint distribution function. 2 Independent increments. 81 Inspection paradox. 52.8 Key renewal theorem. 406. 169 Jensen's inequality. 40. 117.INDEX 507 Erlang loss formula. 165. 240 Kolmogorov's forward equations. 179-180 GambIer'S ruin problem. 9 of sums of random variables. 361 Graph. 54. 453 relation to exponential. see transition rate Integrated Brownian motion. 41 Infinite server Poisson queue. 116-117.211-213 Exchangeable random vanables. 384-385 GIGIl queue.343-344 Gambling result. 204-205 Hazard rate function. 344-346. 42.338-341. 136.49 Failure rate function. 316 Gamma random variable. 420-421. 333. 9 Exponential random variable. 186-188. 35. 70 output process. 46 Expected value. 8 Index set. 450. 18 Joint probability density function. 450 Increasing IikeHhood ratio. see expected value Expectation identities. 64-65 Gaussian process. 452 Increasing failure rate random variable (IFR). 359 General renewal process. 8 Jointly continuous random vanables.223. 16. 1 Excess life of a renewal process. 112-113 Kolmogorov's backward equations. 38. 17. 152 439-440 G/M/1 queue.353 Expectation. 39. see failure rate function . 47 Hamming distance. 64 Interrecord times. 432-433 Increasing sequence of events. 118-119 Irreducible Markov chain. 190. 154. 26-28. 36. 459-460. 
437-438 renewal process comparison with Poisson process. 35 of the exponential. 16. 349 descending. 469.453 . 271--275 New better than used in expectation (NBUE). 18-19 n-Step transition probabilities. 73. 358 Markovian property. 349 Ladder variable. 453 Moving average process. 39. 234 List ordering rules. 300 Matching problem. 254-255. 350 Laplace transform. see nonhomogeneous Poisson process Normal random variable. 175 Limiting probabilities of a semiMarkov process. 210-211. 465. 193-195.470 Multivariate normal. 437. 48 Network of queues. 234. 2 Limiting probabilities of continuous-time Markov chain. 408-409 Moment generating function. 224. 198-202. 163 exponential convergence of limiting probabilities. 223-224 Memoryless property. 278-282 M/MIl queue. 407. 451. 167 Negative binomial random variable. 251 Limiting probabilities of discretetime Markov chain. 437-438 renewal process comparison with Poisson process. 16. 429-430 Nonhomogeneous Poisson process. 37-38 Mixture of distributions. 471 Mean of a random vanable.177-179. 304. 15 Monotone likelihood ratio family. 19 Lattice random variable. 57 Martingale. 349 ascending. 232 Markov's inequality. 47. 428-433 Limiting event. 10.164. 341-344 Martingale convergence theorem. 80 Nonstationary Poisson process. 437 more variable than exponential. 388-392 busy period. 315 Martingale stopping theorem. 440-442 Neyman-Pearson lemma. 418-420 Markov chain model of algorithmic efficiency. 428 relation to hazard rate ordering. 109 Likelihood ratio ordering. 228 M/GIl queue. 381-383 use in analyzing random walks. 262 with finite capacity. see expected val ue Mean transition times of a Markov chain. 24. 398 Multinomial distribution. 227 Markov process.508 INDEX Ladder height. 214. 295 use in analyzing Brownian motion processes. 454 less variable than exponential. 78-79 as a random sample from a Poisson process. 17. 261 M/M/s queue. 215 Linear growth model. 260 Markov chain. 440-442 New worse than used in expectation (NWUE). 73-78 M/G/l shared processor system. 132-133 Renewal type equation. 392-393 Runs. 171-172 Spitzer's identity. 400 Parallel system. 465. 41 Sample space. 1 Random hazard. 298 Random variable. 257-258. 45. 140-142. 323 Reversed process. 206-208 Range of a random walk. 170. 453 Polya frequency of order 2. 98 Renewal reward process. 235 Queueing identity. 1 Second-order stationary process. 169. 341 Regenerative process. 42-43. 213. 121. 354. 11-14 Pure birth process. 330-331 Range of the simple random walk. 60 Rate of a renewal process. 151 . 7 discrete. 166-167. 154 Renewal function. 396 Semi-Markov process.433 Positive recurrent. 48 Probability identities. 38 Records. 285 Order statistics. 471 stochastic ordering of. 220 Recurrent state. 44 Random walk on a graph. 335-337 application to mean queueing delays. 66-67. 203. 32.INDEX 509 Null recurrent. 287 Reliability bounds. 100. 195. 104 Rate of the exponential. 63. 289 Random time. 457 Poisson process. 393-396 Shuffling cards. 183-185 Occupation time. 36. 42. 169 Poisson approximation to the binomial. 171. 211. 14 Probability density function. 48. 128-129 Patterns. 173-174. 7 Random walk. 233. 80-81. 347-348. 47. 328 Random walk on a circle. 451 Renewal equation. 157 Reverse martingale. 447-448. or ordinary. 337-338 Standard Brownian motion process. 458-459. 90. 160 Random experiment. 138-140. 173-174 Probabilistic method. 49 Simple random walk. 110 Period of a state of a Markov chain. 47 Ruin problem. 154 Renewal process. 59 Stationary point process. 205-206 Random walk on a star graph. 
385-387 Shot noise process. 9 continuous. 227 Sample path. 270 Round-robin tournament. 59-61 Poisson random variable. 411. 469 Poisson paradigm. 51. 331-332 Rate of a Poisson process. 7 Probability generating function.301 Period of a lattice random variable. 358 Stationary distribution.353 Number of transitions between visits to a state. 99. 28. 93 Ornstein-Uhlenbeck process. 16. 174 Stationary increments. 149 regular. 166. 161 Regular continuous-time Markov chain. 125-128. 92. 429. 16 Tandem queue.172. 41 Stochastically larger. 235-238. 288 .317-318 Submartingale.220. 412 of stochastic processes. 313 Symmetnc random walk. 298 Strong law for renewal processes. 233 of the reverse chain. 258 Truncated chain. 53 Time-reversible Markov chains. 95 Two sex population growth model.290. 410-412 Stochastic population model. 10 Waiting times of a Poisson process. 64-65 Wald's equation. 34 Uncorrelated random variables. 104. 453 Steady state. 318 Variance.228. 22 Sum of independent Poisson processes. 143.148-149. 9 of sums of random variables. 448-449 Stochastically more variable. 17. 17 Table of discrete random variables. 102 Strong law of large numbers. 257 Stein-Chen method for bounding Poisson approximatipns. 41. see stochastically larger Stopped process. 261 Two-dimensional Poisson process. 228. 41 discrete time. 89 Supermartingale. 462-467 Stirling's approximation. 284-286 Unbiased estimator. 155. 332-333 INDEX Table of continuous random variables. 404-406 Stochastically monotone Markov process. 144. 203.219. see Brownian motion process Yule process. 105. 298 Stopping time.510 Stationary process. see second order stationary process Weibull random vanable. 242-243.313 Sum of a random number of random vanables. 145-147. 262-263 Tilted density function.291 Transient state. 171 Stochastic ordering of renewal processes. 453 Uniformization.259. 203. 396-399 Statistical inference problem. 9 Uniform random vanable. 56-58. 169.209. 411-412 of vectors. 453 Wiener process. 300 Weakly stationary process. 282-284 Uniformly integrable. 433-437 Stochastically smaller. 244-249 Two-state continuous-time Markov chain. 41 continuous time.46. 170 Transition rate of a continuoustime Markov chain. 263-270 Stochastic process.