Time Series Analysis: Univariate and Multivariate Methods, Second Edition
William W. S. Wei, Department of Statistics, The Fox School of Business and Management, Temple University

Boston San Francisco New York London Toronto Sydney Tokyo Singapore Madrid Mexico City Munich Paris Cape Town Hong Kong Montreal

Publisher: Greg Tobin. Executive Editor: Deirdre Lynch. Assistant Editor: Sara Oliver. Managing Editor: Ron Hampton. Production Supervisor: Kathleen A. Manley. Marketing Manager: Phyllis Hubbard. Marketing Assistant: Celena Carr. Senior Designer: Barbara Atkinson. Prepress Supervisor: Caroline Fell. Manufacturing Buyer: Evelyn Beaton. Composition, Production Coordination, and Text Design: Progressive Information Technologies. Illustrations: Jim McLaughlin. Cover Designer: Leslie Haimes.

The cover image appears as Figure 17.1 within the text and embodies all the concepts of time series analysis. For instance, the thin solid line represents a stationary series as discussed in Chapters 2 and 3; the heavy solid line is a nonstationary series as discussed in Chapter 4. As you learn more, you may also see in the cover image the concepts of a second nonstationary series, forecasting, backcasting, seasonal series, heteroscedasticity and GARCH models, input and output series, noise series, vector time series, nonlinear time series, and cointegration.

Library of Congress Cataloging-in-Publication Data
Wei, William W. S.
Time series analysis: univariate and multivariate methods / William W. S. Wei. - 2nd ed.
p. cm. Includes index. ISBN 0-321-32216-9. 1. Time-series analysis. I. Title.

Copyright (c) 2006 Pearson Education, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

About the Author
William W. S. Wei is Professor of Statistics at Temple University in Philadelphia, PA. He earned his B.A. in Economics from the National Taiwan University (1966), B.A. in Mathematics from the University of Oregon (1969), and M.S. (1972) and Ph.D. (1974) in Statistics from the University of Wisconsin-Madison. From 1982 to 1987, he was the Chair of the Department of Statistics at Temple University. His research interests include time series analysis, forecasting methods, statistical modeling, and applications of statistics in business and economics. He has developed new methodology in seasonal adjustment, aggregation and disaggregation, outlier detection, robust estimation, and vector time series analysis. Some of his most significant contributions include extensive research on the effects of aggregation, methods of measuring information loss due to aggregation, new stochastic procedures for performing data disaggregation, model-free outlier detection techniques, robust methods of estimating autocorrelations, and statistics for analyzing multivariate time series. He is a Fellow of the American Statistical Association, a Fellow of the Royal Statistical Society, and an Elected Member of the International Statistical Institute. He is an Associate Editor of the Journal of Forecasting and the Journal of Applied Statistical Science.
Contents

Preface

CHAPTER 1 Overview
1.1 Introduction
1.2 Examples and Scope of This Book

CHAPTER 2 Fundamental Concepts
2.1 Stochastic Processes
2.2 The Autocovariance and Autocorrelation Functions
2.3 The Partial Autocorrelation Function
2.4 White Noise Processes
2.5 Estimation of the Mean, Autocovariances, and Autocorrelations: 2.5.1 Sample Mean; 2.5.2 Sample Autocovariance Function; 2.5.3 Sample Autocorrelation Function; 2.5.4 Sample Partial Autocorrelation Function
2.6 Moving Average and Autoregressive Representations of Time Series Processes
2.7 Linear Difference Equations
Exercises

CHAPTER 3 Stationary Time Series Models
3.1 Autoregressive Processes: 3.1.1 The First-Order Autoregressive AR(1) Process; 3.1.2 The Second-Order Autoregressive AR(2) Process; 3.1.3 The General pth-Order Autoregressive AR(p) Process
3.2 Moving Average Processes: 3.2.1 The First-Order Moving Average MA(1) Process; 3.2.2 The Second-Order Moving Average MA(2) Process; 3.2.3 The General qth-Order Moving Average MA(q) Process
3.3 The Dual Relationship Between AR(p) and MA(q) Processes
3.4 Autoregressive Moving Average ARMA(p, q) Processes: 3.4.1 The General Mixed ARMA(p, q) Process; 3.4.2 The ARMA(1, 1) Process
Exercises

CHAPTER 4 Nonstationary Time Series Models
4.1 Nonstationarity in the Mean: 4.1.1 Deterministic Trend Models; 4.1.2 Stochastic Trend Models and Differencing
4.2 Autoregressive Integrated Moving Average (ARIMA) Models: 4.2.1 The General ARIMA Model; 4.2.2 The Random Walk Model; 4.2.3 The ARIMA(0, 1, 1) or IMA(1, 1) Model
4.3 Nonstationarity in the Variance and the Autocovariance: 4.3.1 Variance and Autocovariance of the ARIMA Models; 4.3.2 Variance Stabilizing Transformations
Exercises

CHAPTER 5 Forecasting
5.1 Introduction
5.2 Minimum Mean Square Error Forecasts: 5.2.1 Minimum Mean Square Error Forecasts for ARMA Models; 5.2.2 Minimum Mean Square Error Forecasts for ARIMA Models
5.3 Computation of Forecasts
5.4 The ARIMA Forecast as a Weighted Average of Previous Observations
5.5 Updating Forecasts
5.6 Eventual Forecast Functions
5.7 A Numerical Example
Exercises

CHAPTER 6 Model Identification
6.1 Steps for Model Identification
6.2 Empirical Examples
6.3 The Inverse Autocorrelation Function (IACF)
6.4 Extended Sample Autocorrelation Function and Other Identification Procedures: 6.4.1 The Extended Sample Autocorrelation Function (ESACF); 6.4.2 Other Identification Procedures
Exercises

CHAPTER 7 Parameter Estimation, Diagnostic Checking, and Model Selection
7.1 The Method of Moments
7.2 Maximum Likelihood Method: 7.2.1 Conditional Maximum Likelihood Estimation; 7.2.2 Unconditional Maximum Likelihood Estimation and Backcasting Method; 7.2.3 Exact Likelihood Functions
7.3 Nonlinear Estimation
7.4 Ordinary Least Squares (OLS) Estimation in Time Series Analysis
7.5 Diagnostic Checking
7.6 Empirical Examples for Series W1-W7
7.7 Model Selection Criteria
Exercises

CHAPTER 8 Seasonal Time Series Models
8.1 General Concepts
8.2 Traditional Methods: 8.2.1 Regression Method; 8.2.2 Moving Average Method
8.3 Seasonal ARIMA Models
8.4 Empirical Examples
Exercises

CHAPTER 9 Testing for a Unit Root
9.1 Introduction
9.2 Some Useful Limiting Distributions
9.3 Testing for a Unit Root in the AR(1) Model: 9.3.1 Testing the AR(1) Model without a Constant Term; 9.3.2 Testing the AR(1) Model with a Constant Term; 9.3.3 Testing the AR(1) Model with a Linear Time Trend
9.4 Testing for a Unit Root in a More General Model
9.5 Testing for a Unit Root in Seasonal Time Series Models: 9.5.1 Testing the Simple Zero Mean Seasonal Model; 9.5.2 Testing the General Multiplicative Zero Mean Seasonal Model
Exercises

CHAPTER 10 Intervention Analysis and Outlier Detection
10.1 Intervention Models
10.2 Examples of Intervention Analysis
10.3 Time Series Outliers: 10.3.1 Additive and Innovational Outliers; 10.3.2 Estimation of the Outlier Effect When the Timing of the Outlier Is Known; 10.3.3 Detection of Outliers Using an Iterative Procedure
10.4 Examples of Outlier Analysis
10.5 Model Identification in the Presence of Outliers
Exercises

CHAPTER 11 Fourier Analysis
11.1 General Concepts
11.2 Orthogonal Functions
11.3 Fourier Representation of Finite Sequences
11.4 Fourier Representation of Periodic Sequences
11.5 Fourier Representation of Nonperiodic Sequences: The Discrete-Time Fourier Transform
11.6 Fourier Representation of Continuous-Time Functions: 11.6.1 Fourier Representation of Periodic Functions; 11.6.2 Fourier Representation of Nonperiodic Functions: The Continuous-Time Fourier Transform
11.7 The Fast Fourier Transform
Exercises

CHAPTER 12 Spectral Theory of Stationary Processes
12.1 The Spectrum: 12.1.1 The Spectrum and Its Properties; 12.1.2 The Spectral Representation of Autocovariance Functions: The Spectral Distribution Function; 12.1.3 Wold's Decomposition of a Stationary Process; 12.1.4 The Spectral Representation of Stationary Processes
12.2 The Spectrum of Some Common Processes: 12.2.1 The Spectrum and the Autocovariance Generating Function; 12.2.2 The Spectrum of ARMA Models; 12.2.3 The Spectrum of the Sum of Two Independent Processes; 12.2.4 The Spectrum of Seasonal Models
12.3 The Spectrum of Linear Filters: 12.3.1 The Filter Function; 12.3.2 Effect of Moving Average; 12.3.3 Effect of Differencing
12.4 Aliasing
Exercises

CHAPTER 13 Estimation of the Spectrum
13.1 Periodogram Analysis: 13.1.1 The Periodogram; 13.1.2 Sampling Properties of the Periodogram; 13.1.3 Tests for Hidden Periodic Components
13.2 The Sample Spectrum
13.3 The Smoothed Spectrum: 13.3.1 Smoothing in the Frequency Domain: The Spectral Window; 13.3.2 Smoothing in the Time Domain: The Lag Window; 13.3.3 Some Commonly Used Windows; 13.3.4 Approximate Confidence Intervals for Spectral Ordinates
13.4 ARMA Spectral Estimation
Exercises

CHAPTER 14 Transfer Function Models
14.1 Single-Input Transfer Function Models: 14.1.1 General Concepts; 14.1.2 Some Typical Impulse Response Functions
14.2 The Cross-Correlation Function and Transfer Function Models: 14.2.1 The Cross-Correlation Function (CCF); 14.2.2 The Relationship between the Cross-Correlation Function and the Transfer Function
14.3 Construction of Transfer Function Models: 14.3.1 Sample Cross-Correlation Function; 14.3.2 Identification of Transfer Function Models; 14.3.3 Estimation of Transfer Function Models; 14.3.4 Diagnostic Checking of Transfer Function Models; 14.3.5 An Empirical Example
14.4 Forecasting Using Transfer Function Models: 14.4.1 Minimum Mean Square Error Forecasts for Stationary Input and Output Series; 14.4.2 Minimum Mean Square Error Forecasts for Nonstationary Input and Output Series; 14.4.3 An Example
14.5 Bivariate Frequency-Domain Analysis: 14.5.1 Cross-Covariance Generating Functions and the Cross-Spectrum; 14.5.2 Interpretation of the Cross-Spectral Functions; 14.5.3 Examples; 14.5.4 Estimation of the Cross-Spectrum
14.6 The Cross-Spectrum and Transfer Function Models: 14.6.1 Construction of Transfer Function Models through Cross-Spectrum Analysis; 14.6.2 Cross-Spectral Functions of Transfer Function Models
14.7 Multiple-Input Transfer Function Models
Exercises

CHAPTER 15 Time Series Regression and GARCH Models
15.1 Regression with Autocorrelated Errors
15.2 ARCH and GARCH Models
15.3 Estimation of GARCH Models: 15.3.1 Maximum Likelihood Estimation; 15.3.2 Iterative Estimation
15.4 Computation of Forecast Error Variance
15.5 Illustrative Examples
Exercises

CHAPTER 16 Vector Time Series Models
16.1 Covariance and Correlation Matrix Functions
16.2 Moving Average and Autoregressive Representations of Vector Processes
16.3 The Vector Autoregressive Moving Average Process: 16.3.1 Covariance Matrix Function for the Vector AR(1) Model; 16.3.2 Vector AR(p) Models; 16.3.3 Vector MA(1) Models; 16.3.4 Vector MA(q) Models; 16.3.5 Vector ARMA(1, 1) Models
16.4 Nonstationary Vector Autoregressive Moving Average Models
16.5 Identification of Vector Time Series Models: 16.5.1 Sample Correlation Matrix Function; 16.5.2 Partial Autoregression Matrices; 16.5.3 Partial Lag Correlation Matrix Function
16.6 Model Fitting and Forecasting
16.7 An Empirical Example: 16.7.1 Model Identification; 16.7.2 Parameter Estimation; 16.7.3 Diagnostic Checking; 16.7.4 Forecasting; 16.7.5 Further Remarks
16.8 Spectral Properties of Vector Processes
Supplement 16.A Multivariate Linear Regression Models
Exercises

CHAPTER 17 More on Vector Time Series
17.1 Unit Roots and Cointegration in Vector Processes: 17.1.1 Representations of Nonstationary Cointegrated Processes; 17.1.2 Decomposition of Z_t; 17.1.3 Testing and Estimating Cointegration
17.2 Partial Process and Partial Process Correlation Matrices: 17.2.1 Covariance Matrix Generating Function; 17.2.2 Partial Covariance Matrix Generating Function; 17.2.3 Partial Process Sample Correlation Matrix Functions; 17.2.4 An Empirical Example: The U.S. Hog Data
17.3 Equivalent Representations of a Vector ARMA Model: 17.3.1 Finite-Order Representations of a Vector Time Series Process; 17.3.2 Some Implications
Exercises

CHAPTER 18 State Space Models and the Kalman Filter
18.1 State Space Representation
18.2 The Relationship between State Space and ARMA Models
18.3 State Space Model Fitting and Canonical Correlation Analysis
18.4 Empirical Examples
18.5 The Kalman Filter and Its Applications
Supplement 18.A Canonical Correlations
Exercises

CHAPTER 19 Long Memory and Nonlinear Processes
19.1 Long Memory Processes and Fractional Differencing: 19.1.1 Fractionally Integrated ARMA Models and Their ACF; 19.1.2 Practical Implications of the ARFIMA Processes; 19.1.3 Estimation of the Fractional Difference
19.2 Nonlinear Processes: 19.2.1 Cumulants, Polyspectrum, and Tests for Linearity and Normality; 19.2.2 Some Nonlinear Time Series Models
19.3 Threshold Autoregressive Models: 19.3.1 Tests for TAR Models; 19.3.2 Modeling TAR Models
Exercises

CHAPTER 20 Aggregation and Systematic Sampling in Time Series
20.1 Temporal Aggregation of the ARIMA Process: 20.1.1 The Relationship of Autocovariances between the Nonaggregate and Aggregate Series; 20.1.2 Temporal Aggregation of the IMA(d, q) Process; 20.1.3 Temporal Aggregation of the AR(p) Process; 20.1.4 Temporal Aggregation of the ARIMA(p, d, q) Process; 20.1.5 The Limiting Behavior of Time Series Aggregates
20.2 The Effects of Aggregation on Forecasting and Parameter Estimation: 20.2.1 Hilbert Space; 20.2.2 The Application of Hilbert Space in Forecasting; 20.2.3 The Effect of Temporal Aggregation on Forecasting; 20.2.4 Information Loss Due to Aggregation in Parameter Estimation
20.3 Systematic Sampling of the ARIMA Process
20.4 The Effects of Systematic Sampling and Temporal Aggregation on Causality: 20.4.1 Decomposition of Linear Relationship between Two Time Series; 20.4.2 An Illustrative Underlying Model; 20.4.3 The Effects of Systematic Sampling and Temporal Aggregation on Causality
20.5 The Effects of Aggregation on Testing for Linearity and Normality: 20.5.1 Testing for Linearity and Normality; 20.5.2 The Effects of Temporal Aggregation on Testing for Linearity and Normality
20.6 The Effects of Aggregation on Testing for a Unit Root: 20.6.1 The Model of Aggregate Series; 20.6.2 The Effects of Aggregation on the Distribution of the Test Statistics; 20.6.3 The Effects of Aggregation on the Significance Level and the Power of the Test; 20.6.4 Examples; 20.6.5 General Cases and Concluding Remarks
20.7 Further Comments
Exercises

References
Appendix: Time Series Data Used for Illustrations
Statistical Tables
Author Index
Subject Index

Preface to the Second Edition

Since the publication of the first edition, this book has been used by many researchers and universities worldwide, and I am very grateful for the numerous encouraging letters and comments received from researchers, instructors, and students. Although the original chapters in the book still form the necessary foundation for time series analysis, many new theories and methods have been developed during the past decade, and the time has come to incorporate these new developments into a more comprehensive view of the field. In this revision, I follow the fundamental theme of the first edition and continue to balance the emphasis between theory and applications. In the process of updating this book, I also took the opportunity to clarify certain concepts and correct previous errors.

In time series analysis, we often encounter nonstationary time series, and a formal testing procedure for a unit root has now become a standard routine in time series modeling. Accordingly, Chapter 9, on unit root tests for both nonseasonal and seasonal models, has been added.

Regression analysis is the most commonly used statistical method, and time series data are widely used in regression modeling, particularly in business and economic research. However, the standard assumptions of uncorrelated errors and constant variance are often violated when time series variables are used in the model. In a separate new Chapter 15, the use of time series variables in regression analysis is discussed. In particular, this chapter introduces models with autocorrelated errors and ARCH/GARCH models for heteroscedasticity that are useful in many economic and financial studies.

Although the basic procedures of model building for univariate time series and vector time series are the same, there are some important phenomena unique to vector time series. After an introduction to various vector time series models in Chapter 16, cointegration, partial processes, and equivalent representations of a vector time series model are covered in the new Chapter 17. To aid understanding, two supplements have been added: Supplement 16.A on multivariate linear regression models, and Supplement 18.A on canonical correlations.

Many time series exhibit characteristics that cannot be described by linear models. Chapter 19, on long memory processes and nonlinear time series models that are useful in describing these long memory and nonlinear phenomena, was therefore included. In the chapter on aggregation, some new results are included on the effects of aggregation on testing for linearity, normality, cointegration, and unit roots.

As with the first edition, methodologies are introduced with proper theoretical justifications and illustrated with empirical data sets that may be downloaded from the Web site http://www.sbm.temple.edu/~wwei/. Exercise problems are included at the end of each chapter to enhance the reader's understanding of the subject. The book should be useful for graduate and advanced undergraduate students who have proper backgrounds and are interested in learning the subject. It should also be helpful as a reference for researchers who encounter time series data in their studies.

As indicated in the first edition, the book was developed from a one-year course given in the Department of Statistics at Temple University. Topics of univariate time series analysis from Chapters 1 through 13 were covered during the first semester, and the remaining chapters related to multivariate time series, plus supplemental journal articles, were discussed in the second semester. With the proper selection of topics, the book can be used for a variety of one- or two-semester courses in time series analysis, model building, and forecasting.

I wish to thank Dr. Mukhtar Ali of the University of Kentucky, Dr. Olcay Akman of the College of Charleston, Dr. H. K. Hsieh of the University of Massachusetts, Dr. Robert Miller of the University of Wisconsin, Dr. Mohsen Pourahmadi of Northern Illinois University, Dr. David Quigg of Bradley University, and Dr. Tom Short of Indiana University of Pennsylvania for their numerous suggestions and comments that have improved this revision. I am grateful to Ceylan Yozgatligil for her help in preparing some of the updated examples and tables. Finally, I would like to thank Ms. Deirdre Lynch, Executive Editor of Statistics, Addison-Wesley, for her continuing interest and assistance with this project, as well as Ms. Kathleen Manley, Ms. Barbara Atkinson, Mr. Jim McLaughlin, and the staff at Progressive Publishing Alternatives, who provided wonderful assistance in the production of the book.

William W. S. Wei
Department of Statistics
The Fox School of Business and Management
Temple University
Philadelphia, Pennsylvania, USA
April 2005
Preface to the First Edition

Time series analysis has been an area of considerable activity in recent years. Several books on the subject have been published, most of them concentrating either on time-domain analysis or solely on frequency-domain analysis. Within each of these categories some books provide inadequate theoretical background material, while others offer very few applications. The majority of these books concentrate only on univariate time series, and those few books that discuss multivariate time series focus only on theoretical aspects of the subject. This book, therefore, attempts to provide a comprehensive introduction to both time-domain and frequency-domain analyses of univariate and multivariate time series.

Since my own student days at the University of Wisconsin-Madison, my attitude toward time series analysis has been influenced greatly by Professor George E. P. Box's motto that statistics will not flourish without a happy marriage between theory and practice. I hope that a proper balance between theory and applications has been achieved within the pages of this volume. Included in the book are many recent advances in univariate and multivariate time series methods, such as inverse autocorrelation functions, extended sample autocorrelation functions, Akaike's information criterion (AIC), intervention analysis and outlier detection, vector ARMA models, partial lag correlation matrix functions, partial processes, the state-space model and Kalman filtering, aggregation problems, and many others.

This book was developed from a one-year course given in the Department of Statistics at Temple University. The course was offered to graduate students with backgrounds in statistical theory and regression analysis. The first nine chapters were covered in the first semester; the remaining chapters were covered in the second semester. The book, however, provides sufficient flexibility regarding the order in which topics can be covered. For those interested in frequency-domain analysis, Chapters 10, 11, and 12 can be covered earlier; this author believes, however, that students have a better appreciation of the frequency-domain approach after studying time-domain analysis. With an appropriate selection of topics, the book can be used for a variety of one- or two-semester courses in time series analysis and forecasting.

The book should be useful for graduate and advanced undergraduate students in statistics, economics, business, engineering, meteorology, various areas of social science, and any other field where analysis and research of time series are prevalent. It should also be useful as a reference for research workers interested in learning the subject. Exercise problems are included at the end of each chapter. The purpose of these problems is to enhance the student's understanding of the subject and not solely to test the student's mathematical skills. Students are encouraged to attempt to solve as many of these exercises as possible.

I wish to thank all those who assisted in the preparation of this book, which evolved from a series of lecture notes. I am greatly indebted to Professor George E. P. Box, who introduced me to time series analysis, and to Professor George C. Tiao, who supervised my research in the field while I was a graduate student at the University of Wisconsin-Madison. This book could not have been written without their inspiration as great teachers and researchers. I wish to extend my gratitude to Professor Donald B. Owen and Addison-Wesley Publishing Company for permission to adopt the t and Chi-Square tables from their book, Handbook of Statistical Tables, and to the late Professor E. S. Pearson and the Biometrika trustees for permission to adopt the F table from their book, Biometrika Tables for Statisticians, Volume I.

Further acknowledgments go to John Schlater and Alan Izenman for their valuable comments; to Dror Rom and Hewa Saladasa for proofreading several chapters of the notes; and to Wai-Sum Chan, Leonard Cupingood, and Jong-Hyup Lee for their careful reading and checking of the entire manuscript, as well as their assistance with numerous computations and the initial draft of figures using SAS (SAS Institute, 1985) and SCA (Scientific Computing Associates, 1986) software. I would also like to thank David Reilly for his excellent time-domain software AUTOBOX and MTS, which were used to analyze many univariate and multivariate time series data sets in the book. I am indebted to Ruth Jackson for typing and to Karen Garrison for typing most of the final manuscript, the preparation of which began in the summer of 1987, when my administrative responsibility as Department Chairman at Temple was completed. I would like to thank Allan Wylde, Grace Sheldrick, Frederick Bartlett, and the rest of the Addison-Wesley staff for their interest and assistance with this project, and the various manuscript reviewers for their helpful suggestions and criticisms; many of their valuable suggestions were incorporated into the book.

Special thanks goes to my wife, Susanna, for helping to make this project a family effort. She carefully checked the final manuscript, provided many helpful suggestions, and supervised our children in a variety of sorting tasks. I am very grateful to my then thirteen-year-old son, Stephen, who developed a sorting system for indices. Of course, any errors and omissions that remain are entirely my responsibility, and I would appreciate having them brought to my attention.

William Wu-Shyong Wei
March 1989

CHAPTER 1
Overview

1.1 Introduction

A time series is an ordered sequence of observations. Although the ordering is usually through time, particularly in terms of some equally spaced time intervals, the ordering may also be taken through other dimensions, such as space. Time series occur in a variety of fields. In agriculture, we observe annual crop production and prices. In business and economics, we observe daily closing stock prices, weekly interest rates, monthly price indices, quarterly sales, and yearly earnings. In engineering, we observe sound, electric signals, and voltage. In geophysics, we record turbulence, such as ocean waves and earth noise in an area. In medical studies, we measure electroencephalogram (EEG) and electrocardiogram (EKG) tracings. In meteorology, we observe hourly wind speeds, daily temperature, and annual rainfall. In quality control, we monitor a process according to a certain target value. In the social sciences, we study annual birth rates, mortality rates, accident rates, and various crime rates. The list of areas in which time series are observed and studied is endless.

There are various objectives for studying time series. They include the understanding and description of the generating mechanism, the forecasting of future values, and optimal control of a system. The intrinsic nature of a time series is that its observations are dependent or correlated, and the order of the observations is therefore important. Hence, statistical procedures and techniques that rely on an independence assumption are no longer applicable, and different methods are needed. The body of statistical methodology available for analyzing time series is referred to as time series analysis.

A time series that can be recorded continuously in time, such as electric signals and voltage, is said to be continuous. A time series that is taken only at specific time intervals, such as interest rates, yields, and volume of sales, is said to be discrete. In this book, we deal exclusively with discrete time series observed at equal intervals. We do so because even continuous time series provide only digitized values at discrete intervals for computations.
1.2 Examples and Scope of This Book

Figure 1.1 shows four time series that are studied later in the book. Although they each show the salient feature of dependence among observations, they also exhibit other very distinctive characteristics.

[Figure 1.1 Examples of some time series: (a) daily average number of truck manufacturing defects; (b) yearly U.S. tobacco production; (c) quarterly U.S. beer production; (d) contaminated blowfly data.]

The daily average number of defects per truck found at the end of the assembly line of a truck manufacturing plant, shown in Figure 1.1(a), appears to vary about a fixed level. Time series that exhibit this phenomenon are said to be stationary in the mean and are special cases of stationary time series. The yearly U.S. tobacco production, shown in Figure 1.1(b), does not vary about a fixed level and exhibits an overall upward trend instead. Moreover, the variance of this tobacco series increases as the level of the series increases. Time series that exhibit these phenomena are said to be nonstationary in mean and variance and are examples of nonstationary time series. The quarterly U.S. beer production in Figure 1.1(c) presents another characteristic pattern that is repetitive in nature due to seasonal variation. Time series containing seasonal variation are called seasonal time series. Nonstationary time series such as those shown in Figures 1.1(b) and (c) can be reduced to stationary series by proper transformations.

After introducing in Chapter 2 the fundamental concepts needed to characterize time series, we study in Chapters 3 to 8 a very general class of parametric models, the autoregressive integrated moving average (ARIMA) models, which are useful for describing stationary, nonstationary, seasonal, and nonseasonal time series such as those shown in Figures 1.1(a), (b), and (c). Specifically, in Chapter 2 we introduce the basic ideas of stationarity and nonstationarity, the autocorrelation function, the partial autocorrelation function, representations of time series processes, and the linear difference equations needed for parametric modeling.
Chapter 3 introduces stationary autoregressive moving average (ARMA) models, and Chapter 4 discusses nonstationary ARIMA models. Chapter 5 develops the theory and methods of minimum mean square error (MMSE) forecasting. Chapters 6 and 7 illustrate the iterative procedure of time series model building, which includes model identification, parameter estimation, diagnostic checking, and model selection. Additional identification tools, such as the inverse autocorrelation function (IACF) and the extended sample autocorrelation function (ESACF), are introduced. Because several different models may adequately represent a given series, we also discuss useful model selection criteria such as Akaike's information criterion (AIC). Chapter 8 extends these ideas to seasonal time series models. Many time series are nonstationary, and a formal testing procedure for nonstationarity has become a standard routine in time series modeling; in Chapter 9, we discuss formal unit root tests for both nonseasonal and seasonal models.

Time series are often interrupted by external events. The fourth time series, shown in Figure 1.1(d), is a contaminated laboratory series of blowfly data. It reflects another phenomenon of nonstationarity, due to a change in structure in the series from some external disturbance, and this type of nonstationarity cannot be removed by a standard transformation. Additional examples include production series affected by strikes and recorded series marred by a machine failure. Such external disturbances are termed interventions or outliers. To model external interruptions, intervention and outlier models are studied in Chapter 10.

Chapters 2 through 13 deal with the analysis of a single (univariate) time series. The time series approach presented in Chapters 2 through 10, which uses the autocorrelation and partial autocorrelation functions to study the evolution of a time series through parametric models, is known as time domain analysis. An alternative approach, which uses spectral functions to study the nonparametric decomposition of a time series into its different frequency components, is known as frequency domain analysis; it is presented in Chapters 11 to 13. Although these two approaches are mathematically equivalent in the sense that the autocorrelation function and the spectrum function form a Fourier transform pair, on occasion one approach is more advantageous than the other. Basic knowledge of both approaches is a necessary tool for time series data analysis and research.

Time series data, however, often involve simultaneous observations on several variables. In business, one often tries to promote sales by advertising. Figure 1.2 shows the Lydia Pinkham annual data of advertising and sales. It seems natural to build a dynamic model that relates current sales to present and past values of advertising, in order to study the effect of advertising and to improve sales forecasting. Thus, in Chapter 14 we present transfer function models, which are useful for relating an output series to one or more input series and for understanding and analyzing relationships among time series variables. In analyzing multivariate time series, both time domain and frequency domain approaches to the problem are discussed. The fundamental tool used in the time domain approach is the cross-correlation function (CCF), and the fundamental tool used in the frequency domain approach is the cross-spectral function; they also form a Fourier transform pair in time series analysis.

The other useful method to analyze the relationship between variables is regression analysis. The regression model is without a doubt the most commonly used statistical model, but the standard regression assumptions are often violated when time series variables are used in the model. Therefore, in Chapter 15 we discuss the use of time series variables in regression analysis, including models with autocorrelated errors and various autoregressive conditional heteroscedasticity (ARCH) models for heterogeneous variances.
In many fields of application, the transfer function and regression models may not be appropriate because complicated feedforward and feedback relationships may exist among the variables. For example, in the Lydia Pinkham data, although advertising is expected to promote sales, sales may also affect advertising because advertising budgets are often set as a percentage of sales.

[Figure 1.2 The Lydia Pinkham advertising and sales data (annual, 1900-1960).]

In Chapter 16, we extend results to multivariate vector time series and study the joint relationship of several time series variables. Vector autoregressive moving average models are introduced, detailed discussions are given of identification tools such as correlation matrix functions, partial autoregression matrices, and partial lag correlation matrix functions, and spectral properties of vector processes are also introduced. In Chapter 17, we discuss useful notions such as cointegration, partial processes, and equivalent representations of a vector time series model. In Chapter 18, we discuss the state space model and Kalman filtering; the relationship between the state space and ARMA models is also established. This alternative approach to time series modeling can be used for both univariate and multivariate time series. After discussing various univariate and multivariate linear time series models and their applications, in Chapter 19 we present long memory processes and nonlinear time series models, which have been found useful in modeling many time series. Like the other methods introduced in the book, these approaches are illustrated with empirical data sets.

Finally, in Chapter 20 we address a nontrivial and important problem in time series analysis. In working with time series data, one must first decide on the time unit to be used in the analysis. In some cases, there exists a natural time unit for the series, such as the annual yield of grain per acre. There are other cases, however, in which several different time units may be available for adoption; for example, in analyzing sales, one may consider monthly sales, quarterly sales, or yearly sales. Does the choice of time interval affect the type of conclusions that are drawn? In Chapter 20, we try to supply some answers to this question when we discuss aggregation and systematic sampling in time series.

CHAPTER 2
Fundamental Concepts

In this chapter, we introduce some fundamental concepts that are necessary for a proper understanding of the time series models discussed in this book. We begin with a simple introduction to stochastic processes, the autocovariance and autocorrelation functions, the partial autocorrelation function, and the notion of white noise processes. We then discuss estimation of the mean, the autocovariances, and the autocorrelation and partial autocorrelation functions; with these results, we can illustrate the sampling phenomena of the time series models introduced starting in Chapter 3. Next, we present the moving average and autoregressive representations of time series processes, which are useful for understanding the logic underlying the parsimonious linear processes used in time series analysis. Finally, we give a simple introduction to linear difference equations, with a special emphasis on the solution of homogeneous difference equations, which play a fundamental role in parsimonious linear time series processes. These concepts, we believe, will enhance the appreciation of model identification discussed in later chapters.

2.1 Stochastic Processes

A stochastic process is a family of time indexed random variables $Z(\omega, t)$, where $\omega$ belongs to a sample space and $t$ belongs to an index set. For a fixed $t$, $Z(\omega, t)$ is a random variable. For a given $\omega$, $Z(\omega, t)$, as a function of $t$, is called a sample function or realization. The population that consists of all possible realizations is called the ensemble in stochastic processes and time series analysis. Thus, a time series is a realization or sample function from a certain stochastic process. With the proper understanding that a stochastic process $\{Z(\omega, t): t = 0, \pm 1, \pm 2, \ldots\}$ is a set of time indexed random variables defined on a sample space, we usually suppress the variable $\omega$ and simply write $Z(\omega, t)$ as $Z(t)$ or $Z_t$, just as we often denote a random variable by $X$ rather than by $X(\omega)$. In our discussion, we assume that the index set is the set of all integers unless mentioned otherwise.

Consider a finite set of random variables $\{Z_{t_1}, Z_{t_2}, \ldots, Z_{t_n}\}$ from a stochastic process $\{Z(\omega, t): t = 0, \pm 1, \pm 2, \ldots\}$. The $n$-dimensional distribution function is defined by
$$F_{Z_{t_1},\ldots,Z_{t_n}}(x_1, \ldots, x_n) = P\{\omega : Z_{t_1} \le x_1, \ldots, Z_{t_n} \le x_n\},$$
where $x_i$, $i = 1, \ldots, n$, are any real numbers.
A process is said to be first-order stationary in distribution if its one-dimensional distribution function is time invariant, that is, if $F_{Z_{t_1}}(x_1) = F_{Z_{t_1+k}}(x_1)$ for any integers $t_1$ and $k$; second-order stationary in distribution if
$$F_{Z_{t_1},Z_{t_2}}(x_1, x_2) = F_{Z_{t_1+k},Z_{t_2+k}}(x_1, x_2) \qquad (2.1.1)$$
for any integers $t_1$, $t_2$, and $k$; and $n$th-order stationary in distribution if
$$F_{Z_{t_1},\ldots,Z_{t_n}}(x_1, \ldots, x_n) = F_{Z_{t_1+k},\ldots,Z_{t_n+k}}(x_1, \ldots, x_n) \qquad (2.1.2)$$
for any $n$-tuple $(t_1, \ldots, t_n)$ and $k$ of integers. A process is said to be strictly stationary if (2.1.2) is true for any $n$, $n = 1, 2, \ldots$. The terms strongly stationary and completely stationary are also used to denote a strictly stationary process. Clearly, if (2.1.2) is true for $n = m$, it is also true for $n \le m$, because the $m$th-order distribution function determines all distribution functions of lower order. Hence, a higher order of stationarity always implies a lower order of stationarity.

The process is called a real-valued process if it assumes only real values. Unless mentioned otherwise, the processes discussed in this book are real-valued processes. For a given real-valued process $\{Z_t: t = 0, \pm 1, \pm 2, \ldots\}$, we define the mean function of the process
$$\mu_t = E(Z_t),$$
the variance function of the process
$$\sigma_t^2 = E(Z_t - \mu_t)^2,$$
the covariance function between $Z_{t_1}$ and $Z_{t_2}$
$$\gamma(t_1, t_2) = E(Z_{t_1} - \mu_{t_1})(Z_{t_2} - \mu_{t_2}),$$
and the correlation function between $Z_{t_1}$ and $Z_{t_2}$
$$\rho(t_1, t_2) = \frac{\gamma(t_1, t_2)}{\sqrt{\sigma_{t_1}^2}\sqrt{\sigma_{t_2}^2}}.$$

For a strictly stationary process, since the distribution function is the same for all $t$, the mean function $\mu_t = \mu$ is a constant, provided $E(|Z_t|) < \infty$. Likewise, if $E(Z_t^2) < \infty$, then $\sigma_t^2 = \sigma^2$ for all $t$ and hence is also a constant. Moreover, letting $t_1 = t - k$ and $t_2 = t$, we have
$$\gamma(t_1, t_2) = \gamma(t - k, t) = \gamma_k \quad \text{and} \quad \rho(t_1, t_2) = \rho(t - k, t) = \rho_k.$$
Thus, for a strictly stationary process with the first two moments finite, the covariance and the correlation between $Z_t$ and $Z_{t+k}$ depend only on the time difference $k$.
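The mean, variance, and covariance functions just defined are ensemble quantities: averages across all possible realizations at fixed time points. The following short Python sketch, which is not from the book, approximates them by averaging over a large simulated ensemble; the process used, $Z_t = a_t + 0.5\,a_{t-1}$ with i.i.d. standard normal $a_t$, is an arbitrary illustrative choice and not one of the book's examples.

```python
# A minimal sketch (assumed example, not from the text): ensemble averages
# approximate the mean, variance, and covariance functions of a process.
import numpy as np

rng = np.random.default_rng(0)
n_realizations, n_time = 10_000, 50

a = rng.standard_normal((n_realizations, n_time + 1))
Z = a[:, 1:] + 0.5 * a[:, :-1]          # one realization per row

mu_t = Z.mean(axis=0)                    # mean function E(Z_t), one value per t
var_t = Z.var(axis=0)                    # variance function Var(Z_t)
t1, t2 = 10, 12
gamma_t1_t2 = np.mean((Z[:, t1] - mu_t[t1]) * (Z[:, t2] - mu_t[t2]))

print(mu_t[:5].round(2))                 # close to 0 for every t
print(var_t[:5].round(2))                # close to 1 + 0.5**2 = 1.25
print(round(gamma_t1_t2, 2))             # close to 0, since |t1 - t2| > 1 here
```

Because this illustrative process has constant mean and variance and a covariance that depends only on the lag, the printed ensemble averages do not change with $t$, which is exactly the behavior the stationarity definitions above describe.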
EXAMPLE 2.1 Let $Z_t = Z(\omega, t)$ be twice the value shown by a true die on the $t$th toss. If a true die is tossed independently and repeatedly, then we have a stochastic process $Z_t = Z(\omega, t)$ with the index set being the positive integers and the sample space of each toss being $\{1, 2, 3, 4, 5, 6\}$. If the die is tossed three times independently, the ensemble contains a total of $6^3 = 216$ possible realizations. For example, if the results of the three tosses are 1, 3, and 2, the realization is $(2, 6, 4)$; if the results are 2, 4, and 1, the realization is $(4, 8, 2)$. If the die is tossed indefinitely, the total number of possible realizations in the ensemble is infinite. Because the process relates to a sequence of independent and identically distributed random variables, it is strictly stationary.

So far, we have discussed a strong sense of stationarity of a process in terms of its distribution function. A trivial example of a strictly stationary process is a sequence of independent identically distributed (i.i.d.) random variables; such a sequence, however, usually does not exist or renders no interest in time series. Other than this simple i.i.d. case, it is very difficult or impossible actually to verify a distribution function, particularly a joint distribution function, from an observed time series. Therefore, in time series analysis we often use a weaker sense of stationarity in terms of the moments of the process.

A process is said to be $n$th-order weakly stationary if all its joint moments up to order $n$ exist and are time invariant. Thus, a second-order weakly stationary process will have constant mean and variance, with the covariance and the correlation being functions of the time difference alone. Sometimes the terms stationary in the wide sense or covariance stationary are also used to describe a second-order weakly stationary process. It follows from the definitions that a strictly stationary process with the first two moments finite is also a second-order weakly or covariance stationary process. A strictly stationary process, however, may not have finite moments and hence may not be covariance stationary. An example is a sequence of i.i.d. Cauchy random variables: the process is clearly strictly stationary, but it is not weakly stationary of any order because no joint moments exist.

EXAMPLE 2.2 Consider the time sequence
$$Z_t = A\sin(\omega t + \theta), \qquad (2.1.9)$$
where $A$ is a random variable with zero mean and unit variance and $\theta$ is a random variable with a uniform distribution on the interval $[-\pi, \pi]$ independent of $A$. Now $E(Z_t) = E(A)\,E[\sin(\omega t + \theta)] = 0$, and
$$E(Z_t Z_{t+k}) = E(A^2)\,E[\sin(\omega t + \theta)\sin(\omega(t+k) + \theta)] = \tfrac{1}{2}\cos(\omega k),$$
which depends only on the time difference $k$. Hence, the process is covariance stationary.

EXAMPLE 2.3 Let $Z_t$ be a sequence of independent random variables alternately following a standard normal distribution $N(0, 1)$ and a two-valued discrete uniform distribution with equal probability $1/2$ of being $1$ or $-1$. Clearly, $E(Z_t) = 0$ and $E(Z_t^2) = 1$ for all $t$, and
$$\gamma(t, t+k) = \begin{cases} 1, & k = 0, \\ 0, & k \ne 0, \end{cases}$$
which depends only on the time difference $k$. Hence, the process is covariance stationary. The process, however, is not strictly stationary; in fact, it is not stationary in distribution for any order.
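A quick numerical check of Example 2.2 can make the covariance-stationarity claim concrete. The Python sketch below, which is not from the book, simulates the ensemble of $Z_t = A\sin(\omega t + \theta)$ and compares the ensemble averages with $E(Z_t) = 0$ and $E(Z_t Z_{t+k}) = \tfrac{1}{2}\cos(\omega k)$; the frequency $\omega = 0.7$ and the lag shown are arbitrary illustrative choices.

```python
# A minimal sketch (assumed parameter values): simulating the ensemble of
# Z_t = A*sin(w*t + theta) and checking E(Z_t) = 0 and E(Z_t Z_{t+k}) = cos(w*k)/2.
import numpy as np

rng = np.random.default_rng(1)
n_realizations, w = 200_000, 0.7
A = rng.standard_normal(n_realizations)              # zero mean, unit variance
theta = rng.uniform(-np.pi, np.pi, n_realizations)   # independent of A

def Z(t):
    return A * np.sin(w * t + theta)

t, k = 5, 3
print(round(Z(t).mean(), 3))                         # approximately 0
print(round(np.mean(Z(t) * Z(t + k)), 3))            # approximately cos(w*k)/2
print(round(0.5 * np.cos(w * k), 3))
```

The two printed covariance values agree to within simulation error for any choice of $t$, illustrating that the covariance depends only on the lag $k$.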
From these discussions and examples, it is clear that covariance stationarity is a much weaker form of stationarity than strict stationarity or stationarity in distribution. We often work with covariance stationary or second-order weakly stationary processes in time series analysis because it is relatively simple to check the first two moments. Henceforth, unless mentioned otherwise, we use the term stationary to refer to all processes that are covariance stationary.

Like other areas in statistics, most time series results are established for Gaussian processes. A stochastic process is said to be a normal or Gaussian process if its joint probability distribution is normal. Because a normal distribution is uniquely characterized by its first two moments, strict stationarity and weak stationarity are equivalent for a Gaussian process. Unless mentioned otherwise, the processes we discuss are assumed to be Gaussian.

2.2 The Autocovariance and Autocorrelation Functions

For a stationary process $\{Z_t\}$, we have the mean $E(Z_t) = \mu$ and the variance $\mathrm{Var}(Z_t) = E(Z_t - \mu)^2 = \sigma^2$, which are constant, and covariances $\mathrm{Cov}(Z_t, Z_s)$, which are functions only of the time difference $|t - s|$. Hence, in this case, we write the covariance between $Z_t$ and $Z_{t+k}$ as
$$\gamma_k = \mathrm{Cov}(Z_t, Z_{t+k}) = E(Z_t - \mu)(Z_{t+k} - \mu) \qquad (2.2.1)$$
and the correlation between $Z_t$ and $Z_{t+k}$ as
$$\rho_k = \frac{\mathrm{Cov}(Z_t, Z_{t+k})}{\sqrt{\mathrm{Var}(Z_t)}\sqrt{\mathrm{Var}(Z_{t+k})}} = \frac{\gamma_k}{\gamma_0}, \qquad (2.2.2)$$
where we note that $\mathrm{Var}(Z_t) = \mathrm{Var}(Z_{t+k}) = \gamma_0$. As functions of $k$, $\gamma_k$ is called the autocovariance function and $\rho_k$ is called the autocorrelation function (ACF) in time series analysis, because they represent the covariance and the correlation between $Z_t$ and $Z_{t+k}$ from the same process, separated only by $k$ time lags.

It is easy to see that for a stationary process, the autocovariance function $\gamma_k$ and the autocorrelation function $\rho_k$ have the following properties:
1. $\gamma_0 = \mathrm{Var}(Z_t)$ and $\rho_0 = 1$.
2. $|\gamma_k| \le \gamma_0$ and $|\rho_k| \le 1$.
3. $\gamma_k = \gamma_{-k}$ and $\rho_k = \rho_{-k}$ for all $k$; that is, $\gamma_k$ and $\rho_k$ are even functions and hence symmetric about the lag $k = 0$. This property follows from the time difference between $Z_t$ and $Z_{t+k}$ and between $Z_t$ and $Z_{t-k}$ being the same. Therefore, the autocorrelation function is often plotted only for the nonnegative lags, as in Figure 2.1; such a plot is sometimes called a correlogram. [Figure 2.1 Example of an autocorrelation function (ACF).]
4. Another important property of $\gamma_k$ and $\rho_k$ is that they are positive semidefinite in the sense that
$$\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j \gamma_{|t_i - t_j|} \ge 0 \quad (2.2.3) \qquad \text{and} \qquad \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j \rho_{|t_i - t_j|} \ge 0 \quad (2.2.4)$$
for any set of time points $t_1, t_2, \ldots, t_n$ and any real numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$. The result (2.2.3) follows from considering the random variable $X = \sum_{i=1}^{n} \alpha_i Z_{t_i}$ and noting that $\mathrm{Var}(X) \ge 0$; the similar result (2.2.4) for $\rho_k$ follows immediately by dividing (2.2.3) through by $\gamma_0$. Consequently, not every arbitrary function satisfying properties (1) to (3) can be the autocovariance or autocorrelation function of a process: a necessary condition is that it be positive semidefinite.

2.3 The Partial Autocorrelation Function

In addition to the autocorrelation between $Z_t$ and $Z_{t+k}$, we may want to investigate the correlation between $Z_t$ and $Z_{t+k}$ after their mutual linear dependency on the intervening variables $Z_{t+1}, Z_{t+2}, \ldots, Z_{t+k-1}$ has been removed. The conditional correlation
$$\mathrm{Corr}(Z_t, Z_{t+k} \mid Z_{t+1}, \ldots, Z_{t+k-1})$$
is usually referred to as the partial autocorrelation in time series analysis.

Consider a stationary process $\{Z_t\}$ and, without loss of generality, assume that $E(Z_t) = 0$. Let the linear dependence of $Z_{t+k}$ on $Z_{t+1}, \ldots, Z_{t+k-1}$ be defined as the best linear estimate in the mean square sense,
$$\hat{Z}_{t+k} = \alpha_1 Z_{t+k-1} + \alpha_2 Z_{t+k-2} + \cdots + \alpha_{k-1} Z_{t+1},$$
where the $\alpha_i$ ($1 \le i \le k-1$) are the mean squared linear regression coefficients obtained from minimizing $E(Z_{t+k} - \hat{Z}_{t+k})^2$. Similarly, let $\hat{Z}_t = \beta_1 Z_{t+1} + \beta_2 Z_{t+2} + \cdots + \beta_{k-1} Z_{t+k-1}$ be the best linear estimate of $Z_t$, with the $\beta_i$ ($1 \le i \le k-1$) obtained by minimizing $E(Z_t - \hat{Z}_t)^2$. The routine minimization method through differentiation gives the linear system of equations
$$\gamma_i = \alpha_1 \gamma_{i-1} + \alpha_2 \gamma_{i-2} + \cdots + \alpha_{k-1} \gamma_{i-k+1}, \qquad 1 \le i \le k-1,$$
or, dividing through by $\gamma_0$,
$$\rho_i = \alpha_1 \rho_{i-1} + \alpha_2 \rho_{i-2} + \cdots + \alpha_{k-1} \rho_{i-k+1}, \qquad 1 \le i \le k-1.$$
In matrix notation, this system becomes
$$\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{k-2} \\ \rho_1 & 1 & \cdots & \rho_{k-3} \\ \vdots & \vdots & & \vdots \\ \rho_{k-2} & \rho_{k-3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{k-1} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_{k-1} \end{bmatrix}.$$
Minimization for the $\beta_i$ leads to exactly the same system, which implies that $\alpha_i = \beta_i$ for $1 \le i \le k-1$. Solving the system for the $\alpha_i$ by Cramer's rule gives each $\alpha_i$ as the ratio of two determinants, where the matrix in the numerator is the same as the symmetric matrix in the denominator except that its $i$th column is replaced by $(\rho_1, \rho_2, \ldots, \rho_{k-1})'$.

It follows that the partial autocorrelation between $Z_t$ and $Z_{t+k}$ equals the ordinary autocorrelation between the residuals $(Z_t - \hat{Z}_t)$ and $(Z_{t+k} - \hat{Z}_{t+k})$. Letting $P_k$ denote the partial autocorrelation between $Z_t$ and $Z_{t+k}$, we have
$$P_k = \frac{\mathrm{Cov}(Z_t - \hat{Z}_t,\; Z_{t+k} - \hat{Z}_{t+k})}{\sqrt{\mathrm{Var}(Z_t - \hat{Z}_t)}\sqrt{\mathrm{Var}(Z_{t+k} - \hat{Z}_{t+k})}}.$$
Using $\alpha_i = \beta_i$ and the normal equations above, all the remaining cross terms reduce to zero, and $P_k$ can be expressed as a ratio of two determinants involving the autocorrelations $\rho_1, \ldots, \rho_k$.

The partial autocorrelation can also be derived as follows. Consider the regression model in which the dependent variable $Z_{t+k}$ from a zero mean stationary process is regressed on the $k$ lagged variables $Z_{t+k-1}, Z_{t+k-2}, \ldots, Z_t$, that is,
$$Z_{t+k} = \phi_{k1} Z_{t+k-1} + \phi_{k2} Z_{t+k-2} + \cdots + \phi_{kk} Z_t + e_{t+k},$$
where $\phi_{ki}$ denotes the $i$th regression parameter and $e_{t+k}$ is an error term with mean 0 and uncorrelated with $Z_{t+k-j}$ for $j = 1, 2, \ldots, k$. Multiplying both sides by $Z_{t+k-j}$ and taking expectations, we get
$$\gamma_j = \phi_{k1}\gamma_{j-1} + \phi_{k2}\gamma_{j-2} + \cdots + \phi_{kk}\gamma_{j-k},$$
and hence
$$\rho_j = \phi_{k1}\rho_{j-1} + \phi_{k2}\rho_{j-2} + \cdots + \phi_{kk}\rho_{j-k}.$$
For $j = 1, 2, \ldots, k$, we have the following system of equations:
$$\begin{aligned}
\rho_1 &= \phi_{k1}\rho_0 + \phi_{k2}\rho_1 + \cdots + \phi_{kk}\rho_{k-1},\\
\rho_2 &= \phi_{k1}\rho_1 + \phi_{k2}\rho_0 + \cdots + \phi_{kk}\rho_{k-2},\\
&\;\;\vdots\\
\rho_k &= \phi_{k1}\rho_{k-1} + \phi_{k2}\rho_{k-2} + \cdots + \phi_{kk}\rho_0.
\end{aligned}$$
Solving this system by Cramer's rule successively for $k = 1, 2, \ldots$, we obtain $\phi_{11} = \rho_1$,
$$\phi_{22} = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2},$$
and, in general,
$$\phi_{kk} = \frac{\begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-2} & \rho_1 \\ \rho_1 & 1 & \cdots & \rho_{k-3} & \rho_2 \\ \vdots & \vdots & & \vdots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & \rho_1 & \rho_k \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{vmatrix}}, \qquad (2.3.19)$$
where the matrix in the numerator is the same as the symmetric matrix in the denominator except that its last column is replaced by $(\rho_1, \rho_2, \ldots, \rho_k)'$. Comparing (2.3.19) with the earlier expression for $P_k$, we see that $\phi_{kk}$ equals $P_k$. Thus, the partial autocorrelation between $Z_t$ and $Z_{t+k}$ can also be obtained as the regression coefficient associated with $Z_t$ when regressing $Z_{t+k}$ on its $k$ lagged variables $Z_{t+k-1}, \ldots, Z_t$. As a function of $k$, $\phi_{kk}$ is usually referred to as the partial autocorrelation function (PACF). Because $\phi_{kk}$ has become the standard notation for the partial autocorrelation between $Z_t$ and $Z_{t+k}$ in the time series literature, we use this notation in this book.

2.4 White Noise Processes

A process $\{a_t\}$ is called a white noise process if it is a sequence of uncorrelated random variables from a fixed distribution with constant mean $E(a_t) = \mu_a$, usually assumed to be 0, constant variance $\mathrm{Var}(a_t) = \sigma_a^2$, and $\gamma_k = \mathrm{Cov}(a_t, a_{t+k}) = 0$ for all $k \ne 0$. By definition, it immediately follows that a white noise process $\{a_t\}$ is stationary, with autocovariance function, autocorrelation function, and partial autocorrelation function
$$\gamma_k = \begin{cases} \sigma_a^2, & k = 0,\\ 0, & k \ne 0, \end{cases} \qquad \rho_k = \begin{cases} 1, & k = 0,\\ 0, & k \ne 0, \end{cases} \qquad \phi_{kk} = \begin{cases} 1, & k = 0,\\ 0, & k \ne 0. \end{cases}$$
The basic phenomenon of the white noise process is that its ACF and PACF are identically equal to zero at all nonzero lags. The ACF and PACF of a white noise process are shown in Figure 2.2. [Figure 2.2 ACF and PACF of a white noise process: $Z_t = \mu + a_t$.] A white noise process is Gaussian if its joint distribution is normal. Although this process hardly ever occurs in practice, it plays an important role as a basic building block in the construction of time series models, just like the roles played by the sine and cosine functions, $\{\sin(nx), \cos(nx)\}$, in Fourier analysis; more precisely, it plays the role of an orthogonal basis in general vector and function analysis.

Because $\rho_0 = 1$ and $\phi_{00} = 1$ by definition for any process, when we talk about autocorrelations and partial autocorrelations we refer only to $\rho_k$ and $\phi_{kk}$ for $k \ne 0$, unless mentioned otherwise.

2.5 Estimation of the Mean, Autocovariances, and Autocorrelations

A stationary time series is characterized by its mean $\mu$, variance $\sigma^2$, autocorrelations $\rho_k$, and partial autocorrelations $\phi_{kk}$. The exact values of these parameters can be calculated if the ensemble of all possible realizations is known; more generally, they can be estimated if multiple independent realizations are available. In most applications, however, it is difficult or impossible to obtain multiple realizations, and most available time series constitute only a single realization, which makes it impossible to calculate the ensemble average. For a stationary process, however, we have a natural alternative of replacing the ensemble average by the time average. In the following discussion, we examine conditions under which we can estimate, with good statistical properties, the mean and autocovariances, and hence the autocorrelations, by using time averages.

2.5.1 Sample Mean

With only a single realization, a natural estimator for the mean $\mu = E(Z_t)$ of a stationary process is the sample mean
$$\bar{Z} = \frac{1}{n}\sum_{t=1}^{n} Z_t,$$
which is the time average of $n$ observations. The question becomes whether this is a valid or good estimator. Clearly, $E(\bar{Z}) = \mu$, which implies that $\bar{Z}$ is an unbiased estimator of $\mu$. It can also be easily shown, letting $k = (t - s)$, that
$$\mathrm{Var}(\bar{Z}) = \frac{\gamma_0}{n}\sum_{k=-(n-1)}^{n-1}\left(1 - \frac{|k|}{n}\right)\rho_k.$$
If $\lim_{n\to\infty}\sum_{k=-(n-1)}^{n-1}(1 - |k|/n)\rho_k$ is finite, then $\mathrm{Var}(\bar{Z}) \to 0$ as $n \to \infty$. A sufficient condition for this result to hold is that $\rho_k \to 0$ as $k \to \infty$: for any $\epsilon > 0$ we can choose an $N$ such that $|\rho_k| < \epsilon/4$ for all $k \ge N$, and for $n > (N + 1)$ the remaining terms can be made smaller than $\epsilon/2$ by choosing $n$ large enough. Thus $\bar{Z}$ is a consistent estimator of $\mu$ in mean square. Intuitively, these results simply say that if $Z_t$ and $Z_{t+k}$ sufficiently far apart are almost uncorrelated, then some useful new information is continually added, so that the time average approaches the ensemble average. The process is said to be ergodic for the mean if this convergence holds.

2.5.2 Sample Autocovariance Function

Similarly, with only a single realization, we employ the following estimators, using the time average, to estimate the autocovariance function $\gamma_k$:
$$\hat{\gamma}_k = \frac{1}{n}\sum_{t=1}^{n-k}(Z_t - \bar{Z})(Z_{t+k} - \bar{Z}) \qquad (2.5.8)$$
and
$$\tilde{\gamma}_k = \frac{1}{n-k}\sum_{t=1}^{n-k}(Z_t - \bar{Z})(Z_{t+k} - \bar{Z}).$$
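The variance formula for $\bar{Z}$ can be verified directly by simulation. The Python sketch below, which is not from the book, simulates many independent realizations of a stationary AR(1) process, compares the empirical variance of the sample mean across realizations with $(\gamma_0/n)\sum_k (1 - |k|/n)\rho_k$, and uses $\phi = 0.6$, $\sigma_a^2 = 1$, and $n = 100$ as arbitrary illustrative choices (the AR(1) model itself is only introduced in Chapter 3).

```python
# A minimal sketch (assumed example values): checking
# Var(Zbar) = (gamma_0/n) * sum_{k} (1 - |k|/n) * rho_k for a stationary AR(1).
import numpy as np

rng = np.random.default_rng(4)
phi, n, n_realizations = 0.6, 100, 20_000

# start each realization from the stationary distribution, then iterate
z = rng.standard_normal(n_realizations) / np.sqrt(1 - phi**2)
zbar_sum = z.copy()
for _ in range(n - 1):
    z = phi * z + rng.standard_normal(n_realizations)
    zbar_sum += z
zbar = zbar_sum / n                                   # sample mean of each realization

gamma0 = 1.0 / (1 - phi**2)                           # Var(Z_t) for this AR(1)
k = np.arange(-(n - 1), n)
theory = gamma0 / n * np.sum((1 - np.abs(k) / n) * phi ** np.abs(k))

print(round(zbar.var(), 4), round(theory, 4))         # the two values should be close
```

The close agreement of the two printed numbers illustrates why a single long realization suffices for estimating $\mu$ when $\rho_k$ dies out: the variance of the time average shrinks roughly like $1/n$.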
It is clear that both estimators are biased. When we ignore the term $\mathrm{Var}(\bar{Z})$, which represents the effect of estimating $\mu$ (we approximate terms such as $(n-k)(\bar{Z} - \mu)^2$), $\tilde{\gamma}_k$ becomes unbiased but $\hat{\gamma}_k$ is still biased; in general, $\hat{\gamma}_k$ has a larger bias than $\tilde{\gamma}_k$, especially when $k$ is large with respect to $n$. For this reason, for a given $n$, it is often suggested that at most $n/4$ sample autocovariances be calculated in time series analysis. If $\rho_k \to 0$ as $k \to \infty$, so that the process is ergodic for the mean and $\lim_{n\to\infty}\mathrm{Var}(\bar{Z}) = 0$ as shown above, then both estimators are asymptotically unbiased.

Some may argue that because both $\hat{\gamma}_k$ and $\tilde{\gamma}_k$ are biased, it would be more appropriate to compare their mean square errors. It can be shown that for certain types of processes $\hat{\gamma}_k$ has smaller mean square error than $\tilde{\gamma}_k$ (see, e.g., Parzen (1961b)). In addition, the estimator $\hat{\gamma}_k$ is always positive semidefinite, like $\gamma_k$, but $\tilde{\gamma}_k$ is not necessarily so. Hence, we use $\hat{\gamma}_k$ in (2.5.8) as the sample autocovariance function to estimate $\gamma_k$.

When the process $\{Z_t\}$ is Gaussian, Bartlett (1946) has shown the approximate result
$$\mathrm{Var}(\hat{\gamma}_k) \simeq \frac{1}{n}\sum_{i=-\infty}^{\infty}\left(\gamma_i^2 + \gamma_{i+k}\gamma_{i-k}\right),$$
together with a corresponding approximation for the covariance between sample autocovariances at different lags. From this expression we see that the variance of $\hat{\gamma}_k$ can be substantial for large $k$, resulting in an unstable and erratic estimate. Next, we would like to ask when the process is ergodic for the autocovariance function, so that $\hat{\gamma}_k$ converges to $\gamma_k$ in mean square. A rigorous proof of this statement is complicated; for relevant readings, interested readers are referred to Gnedenko (1962), Hannan (1970, p. 201), and Fuller (1996), among others. For our purpose, it suffices to note that, because the sample autocovariance is an asymptotically unbiased estimator of $\gamma_k$, a sufficient condition for $\hat{\gamma}_k$ to be mean square consistent, and hence for the process to be ergodic for the autocovariances, is that the autocovariance function be absolutely summable, that is, $\sum_k|\gamma_k| < \infty$. Ergodicity is assumed to hold for the remainder of this book.

2.5.3 Sample Autocorrelation Function

For a given observed time series $Z_1, Z_2, \ldots, Z_n$, the sample ACF is defined as
$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} = \frac{\sum_{t=1}^{n-k}(Z_t - \bar{Z})(Z_{t+k} - \bar{Z})}{\sum_{t=1}^{n}(Z_t - \bar{Z})^2}, \qquad k = 0, 1, 2, \ldots,$$
where $\bar{Z} = \sum_{t=1}^{n}Z_t/n$ is the sample mean of the series. A plot of $\hat{\rho}_k$ versus $k$ is sometimes called a sample correlogram. For a stationary Gaussian process, Bartlett (1946) has shown that for $k > 0$ and $k + j > 0$, the sample autocorrelations at different lags are correlated, and that for large $n$, $\hat{\rho}_k$ is approximately normally distributed with mean $\rho_k$ and variance
$$\mathrm{Var}(\hat{\rho}_k) \simeq \frac{1}{n}\sum_{i=-\infty}^{\infty}\left(\rho_i^2 + \rho_{i+k}\rho_{i-k} - 4\rho_k\rho_i\rho_{i-k} + 2\rho_k^2\rho_i^2\right).$$
For processes in which $\rho_k = 0$ for $k > m$, Bartlett's approximation becomes
$$\mathrm{Var}(\hat{\rho}_k) \simeq \frac{1}{n}\left(1 + 2\sum_{i=1}^{m}\rho_i^2\right), \qquad k > m.$$
In practice, the $\rho_i$ ($i = 1, 2, \ldots, m$) are unknown and are replaced by their sample estimates $\hat{\rho}_i$, and we have the following large-lag standard error of $\hat{\rho}_k$:
$$S_{\hat{\rho}_k} = \sqrt{\frac{1}{n}\left(1 + 2\sum_{i=1}^{m}\hat{\rho}_i^2\right)}.$$
To test a white noise process, we use $S_{\hat{\rho}_k} = \sqrt{1/n}$.

EXAMPLE 2.4 To illustrate the computation of the sample ACF, consider the ten values of a time series given in the accompanying table; the sample mean of these ten values is $\bar{Z} = 10$. The sample autocorrelations $\hat{\rho}_1, \hat{\rho}_2, \ldots$ are computed from the formula above, and we note that $\hat{\rho}_{-k} = \hat{\rho}_k$; in other words, the sample ACF is also symmetric about the origin $k = 0$.
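Since the ten data values of Example 2.4 are not reproduced here, the following Python sketch, which is not from the book, shows the same computation on an arbitrary simulated series instead: it evaluates the sample ACF defined above and Bartlett's large-lag standard error $S_{\hat{\rho}_k}$; the simulated AR(1)-type series, the cutoff $m = 3$, and the series length are all illustrative assumptions.

```python
# A minimal sketch (assumed example data): sample ACF and the large-lag
# standard error sqrt((1 + 2*sum_{i<=m} rho_hat_i**2) / n).
import numpy as np

def sample_acf(z, kmax):
    z = np.asarray(z, dtype=float)
    n, zbar = len(z), np.mean(z)
    denom = np.sum((z - zbar) ** 2)
    return np.array([np.sum((z[:n - k] - zbar) * (z[k:] - zbar)) / denom
                     for k in range(kmax + 1)])

def large_lag_se(rho_hat, m, n):
    """Standard error of rho_hat_k for k > m, treating rho_i = 0 for i > m."""
    return np.sqrt((1 + 2 * np.sum(rho_hat[1:m + 1] ** 2)) / n)

rng = np.random.default_rng(3)
n = 200
z = np.zeros(n)
for t in range(1, n):                        # an illustrative autocorrelated series
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()

r = sample_acf(z, 10)
print(r.round(2))                            # rho_hat_0 = 1, then decaying values
print(round(large_lag_se(r, m=3, n=n), 3))   # compare |rho_hat_k| for k > 3 with 2*SE
```

Comparing $|\hat{\rho}_k|$ for $k > m$ with roughly two large-lag standard errors is the usual informal way of judging whether the autocorrelations have effectively died out beyond lag $m$.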
2.5.4 Sample Partial Autocorrelation Function

The sample PACF $\hat{\phi}_{kk}$ is obtained by substituting $\hat{\rho}_i$ for $\rho_i$ in the determinant expression for $\phi_{kk}$ given in Section 2.3. Instead of calculating the complicated determinants for large $k$, a recursive method starting with $\hat{\phi}_{11} = \hat{\rho}_1$ for computing $\hat{\phi}_{kk}$ has been given by Durbin (1960) as follows:

$\hat{\phi}_{k+1,\,k+1} = \frac{\hat{\rho}_{k+1} - \sum_{j=1}^{k}\hat{\phi}_{kj}\,\hat{\rho}_{k+1-j}}{1 - \sum_{j=1}^{k}\hat{\phi}_{kj}\,\hat{\rho}_j}$

and

$\hat{\phi}_{k+1,\,j} = \hat{\phi}_{kj} - \hat{\phi}_{k+1,\,k+1}\,\hat{\phi}_{k,\,k+1-j}, \qquad j = 1, \ldots, k.$

The method holds also for calculating the theoretical PACF $\phi_{kk}$. It was shown by Quenouille (1949) that, on the hypothesis that the underlying process is a white noise sequence, the variance of $\hat{\phi}_{kk}$ can be approximated by $\mathrm{Var}(\hat{\phi}_{kk}) \simeq 1/n$. Hence, $\pm 2/\sqrt{n}$ can be used as critical limits on $\hat{\phi}_{kk}$ to test the hypothesis of a white noise process.

EXAMPLE 2.5 Using the data in Example 2.4, we have $\hat{\phi}_{11} = \hat{\rho}_1$ and, from the recursion, $\hat{\phi}_{22} = (\hat{\rho}_2 - \hat{\rho}_1^2)/(1 - \hat{\rho}_1^2)$; the other $\hat{\phi}_{kk}$ can be calculated similarly.
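The recursion above is easy to program. The sketch below is a minimal implementation, assuming the input is the vector of sample autocorrelations $\hat{\rho}_1, \hat{\rho}_2, \ldots$; the function name and array layout are our own choices.

import numpy as np

def pacf_durbin(acf):
    # acf[0] = rho_hat_1, acf[1] = rho_hat_2, ...; returns phi_hat_11, phi_hat_22, ...
    r = np.asarray(acf, dtype=float)
    K = len(r)
    phi = np.zeros((K + 1, K + 1))      # phi[k, j] holds phi_hat_{k, j}
    pacf = np.zeros(K)
    pacf[0] = phi[1, 1] = r[0]
    for k in range(1, K):
        num = r[k] - np.dot(phi[k, 1:k + 1], r[k - 1::-1])
        den = 1.0 - np.dot(phi[k, 1:k + 1], r[:k])
        phi[k + 1, k + 1] = num / den
        phi[k + 1, 1:k + 1] = phi[k, 1:k + 1] - phi[k + 1, k + 1] * phi[k, k:0:-1]
        pacf[k] = phi[k + 1, k + 1]
    return pacf

# Supplying theoretical autocorrelations reproduces the theoretical PACF: for an AR(1)
# with rho_k = 0.8**k, only the first partial autocorrelation is nonzero.
print(np.round(pacf_durbin(0.8 ** np.arange(1, 6)), 4))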
2.6 Moving Average and Autoregressive Representations of Time Series Processes

In time series analysis, there are two useful representations for expressing a time series process. One is to write a process $Z_t$ as a linear combination of a sequence of uncorrelated random variables, i.e.,

$Z_t = \mu + a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \mu + \sum_{j=0}^{\infty}\psi_j a_{t-j},$  (2.6.1)

where $\psi_0 = 1$ and $\{a_t\}$ is a zero-mean white noise process. Here and in the following, an infinite sum of random variables is defined as the limit in quadratic mean (mean square) of the finite partial sums. By introducing the backshift operator $B$, defined by $B^j\dot{Z}_t = \dot{Z}_{t-j}$, and writing $\dot{Z}_t = Z_t - \mu$, we can write (2.6.1) in the compact form

$\dot{Z}_t = \psi(B)\,a_t,$ where $\psi(B) = \sum_{j=0}^{\infty}\psi_j B^j.$

For the process in (2.6.1), $E(a_t\dot{Z}_{t-j}) = \sigma_a^2$ for $j = 0$ and $0$ for $j > 0$. To see that the process is stationary, we have to show that $\gamma_k$ is finite for each $k$; indeed,

$\gamma_k = \sigma_a^2\sum_{j=0}^{\infty}\psi_j\,\psi_{j+k},$ and $\rho_k = \gamma_k/\gamma_0,$

so $\sum_{j=0}^{\infty}\psi_j^2 < \infty$ is a required condition for the process in (2.6.1) to be stationary. Clearly, the autocovariance and autocorrelation functions are functions of the time difference $k$ only. The form (2.6.1) is called a moving average (MA) representation of a process. Wold (1938) proved that a stationary process that is purely nondeterministic (i.e., a process that contains no deterministic component that can be forecast or predicted exactly from its own past) can always be expressed in this form. Hence, the representation is also known as Wold's representation in the literature, and any process that can be represented in this form is called a nondeterministic process. Sometimes the term linear process is also used to refer to the process in (2.6.1).

For a given sequence of autocovariances $\gamma_k$, $k = 0, \pm 1, \pm 2, \ldots$, the autocovariance generating function is defined as

$\gamma(B) = \sum_{k=-\infty}^{\infty}\gamma_k B^k,$

where the variance of the process, $\gamma_0$, is the coefficient of $B^0$ and the autocovariance of lag $k$, $\gamma_k$, is the coefficient of both $B^k$ and $B^{-k}$. Using the MA representation and stationarity, one can show that

$\gamma(B) = \sigma_a^2\,\psi(B)\,\psi(B^{-1}),$

where we let $j = i + k$ and note that $\psi_j = 0$ for $j < 0$. This method is a convenient way of calculating the autocovariances for some linear processes. The corresponding autocorrelation generating function is $\rho(B) = \gamma(B)/\gamma_0$.

Another useful form is to write a process $\dot{Z}_t$ in an autoregressive (AR) representation, in which we regress the value of $Z$ at time $t$ on its own past values plus a random shock, i.e.,

$\dot{Z}_t = \pi_1\dot{Z}_{t-1} + \pi_2\dot{Z}_{t-2} + \cdots + a_t,$ or $\pi(B)\dot{Z}_t = a_t,$

where $\pi(B) = 1 - \sum_{j=1}^{\infty}\pi_j B^j$ and $1 + \sum_{j=1}^{\infty}|\pi_j| < \infty$. Box and Jenkins (1976) call a process invertible if it can be written in this form. They argue that, in forecasting, a noninvertible process is meaningless; the autoregressive representation is useful in understanding the mechanism of forecasting.

It is easily seen that not every stationary process is invertible, and an invertible process is not necessarily stationary. For $\dot{Z}_t = \psi(B)a_t$ to be invertible, so that it can be rewritten in the AR representation, the roots of $\psi(B) = 0$ as a function of $B$ must lie outside the unit circle: if $\beta$ is a root of $\psi(B)$, then $|\beta| > 1$, where $|\beta|$ is the standard Euclidean modulus; when $\beta$ is a real number, $|\beta|$ equals its absolute value, and when $\beta$ is a complex number $c + di$, $|\beta| = \sqrt{c^2 + d^2}$. Similarly, for the process $\pi(B)\dot{Z}_t = a_t$ to be stationary, it must be possible to rewrite it in an MA representation $\dot{Z}_t = \psi(B)a_t$ with $\sum_j\psi_j^2 < \infty$; the required condition is that the roots of $\pi(B) = 0$ all lie outside the unit circle.

Although the autoregressive and moving average representations are useful, they are not the model forms we use at the beginning stage of model building, because they contain an infinite number of parameters that are impossible to estimate from a finite number of available observations. Instead, in modeling a phenomenon, we construct models with only a finite number of parameters. This modeling criterion is the principle of parsimony in model building recommended by Tukey (1967) and Box and Jenkins (1976): other things being equal, the more parameters in a model, the less efficient is the estimation of those parameters. In the autoregressive representation, if only a finite number of $\pi$ weights are nonzero, i.e., $\pi_1 = \phi_1, \ldots, \pi_p = \phi_p$ and $\pi_k = 0$ for $k > p$, the resulting process is said to be an autoregressive process of order $p$. In the moving average representation, if only a finite number of $\psi$ weights are nonzero, i.e., $\psi_1 = -\theta_1, \ldots, \psi_q = -\theta_q$ and $\psi_k = 0$ for $k > q$, the resulting process is said to be a moving average process of order $q$. For a fixed number of observations, even for a finite-order moving average and a finite-order autoregressive model the number of parameters may still be prohibitively large, because a high-order model is often needed for good approximation. A natural alternative is the mixed autoregressive moving average model. In the following chapters, we discuss these very useful parsimonious time series models and their properties.

2.7 Linear Difference Equations

Linear difference equations play an important role in the time series models discussed in this book. All of the finite-parameter models mentioned above relate an output $Z_t$ to an input $a_t$ in terms of linear difference equations, so they are sometimes referred to as linear difference equation models. The properties of these models often depend on the characteristics of the roots of the associated difference equations. To better understand these models, we give a brief introduction to linear difference equations, especially to the method of solving them.
A general $n$th-order linear difference equation with constant coefficients is given by

$C_0 Z_t + C_1 Z_{t-1} + C_2 Z_{t-2} + \cdots + C_n Z_{t-n} = e_t,$  (2.7.1)

where the $C_i$ are constants. Without loss of generality, we can set $C_0 = 1$. The function $e_t$ in (2.7.1) is called a forcing function; the equation is said to be nonhomogeneous (or complete) if $e_t \neq 0$, and homogeneous if $e_t = 0$. Using the backshift operator and letting $C(B) = 1 + C_1 B + C_2 B^2 + \cdots + C_n B^n$, we can rewrite (2.7.1) as

$C(B)Z_t = e_t,$

where, when $B$ is used as an operator, it operates on the time index $t$. $C(B) = 0$ is called the auxiliary equation associated with the given linear difference equation. As will be seen, most of the time series models considered in this book are of the difference equation type, and the behavior of the time series generated by such models is governed by the nature of the roots of the associated auxiliary equation. The solution of a homogeneous difference equation depends on the roots of the auxiliary equation; the particular solution of a nonhomogeneous difference equation depends also on the form of the forcing function. In the remainder of this section, we concentrate on the solution of a general homogeneous linear difference equation. The solution rests primarily on the following lemmas, which can be easily proved using the definition of a solution and the property of linearity.

Lemma 2.7.1. If $z_t^{(1)}$ and $z_t^{(2)}$ are solutions of the homogeneous equation $C(B)Z_t = 0$, then $b_1 z_t^{(1)} + b_2 z_t^{(2)}$ is also a solution for any arbitrary constants $b_1$ and $b_2$.

Lemma 2.7.2. Let $(1 - B)^m Z_t = 0$. Then a solution is given by $Z_t = b\,t^j$, where $b$ is any constant and $j$ is a nonnegative integer less than $m$.

Proof. For $m = 1$, $j < m$ gives $j = 0$, and for $Z_t = b$ we have $(1 - B)b = b - b = 0$. Now assume that the result holds for $(1 - B)^{m-1}$ and $j < (m - 1)$. For $Z_t = b\,t^j$ with $j < m$,

$(1 - B)^m\,b\,t^j = (1 - B)^{m-1}\bigl[(1 - B)\,b\,t^j\bigr] = (1 - B)^{m-1}\,b\,[\,t^j - (t-1)^j\,],$

and each term in the expansion of $t^j - (t-1)^j$ contains only integer powers of $t$ less than $(m - 1)$; hence, by the induction hypothesis, each term is reduced to zero by the operator $(1 - B)^{m-1}$. The lemma is proved.

Lemma 2.7.3. If $Z_t^{(H)}$ is a solution of the homogeneous equation and $Z_t^{(P)}$ is a particular solution of the nonhomogeneous (complete) equation, then $Z_t^{(H)} + Z_t^{(P)}$ is the general solution of the complete equation.

Lemma 2.7.4. Let $(1 - RB)^m Z_t = 0$. Then a solution is given by $Z_t = \bigl(\sum_{j=0}^{m-1} b_j t^j\bigr) R^t$, where the $b_j$ are constants. To see this, note that $(1 - RB)\,[f(t)R^t] = R^t(1 - B)f(t)$; repeated application of this result gives $(1 - RB)^m\,[f(t)R^t] = R^t(1 - B)^m f(t)$, which equals zero for $f(t) = \sum_{j=0}^{m-1} b_j t^j$ by Lemmas 2.7.1 and 2.7.2.

Combining these results, we have the following main result.

Theorem 2.7.1. Let $C(B)Z_t = 0$ be a given homogeneous linear difference equation, where

$C(B) = 1 + C_1 B + C_2 B^2 + \cdots + C_n B^n = \prod_{i=1}^{N}(1 - R_i B)^{m_i}$ and $\sum_{i=1}^{N} m_i = n,$

so that $B_i = R_i^{-1}$ ($i = 1, \ldots, N$) are roots of multiplicity $m_i$ of $C(B) = 0$. Then the general solution is given by

$Z_t = \sum_{i=1}^{N}\Bigl(\sum_{j=0}^{m_i - 1} b_{ij}\,t^j\Bigr) R_i^t.$

In particular, if $m_i = 1$ for all $i$ and the $R_i$ ($i = 1, \ldots, n$) are all distinct, then $Z_t = \sum_{i=1}^{n} b_i R_i^t$. Note that for a real-valued linear difference equation, a complex root of $C(B) = 0$ must appear in conjugate pairs: if $(c + di)$ is a root, then its complex conjugate $(c - di)$ is also a root. A general complex number can always be written in the polar form $(c \pm di) = \alpha(\cos\phi \pm i\sin\phi)$, where $\alpha = \sqrt{c^2 + d^2}$.
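Lemmas 2.7.2 and 2.7.4 can be checked numerically. The short sketch below, with arbitrary illustrative values of $R$, $m$, and $j$ (they are not taken from the text), applies the operator $(1 - RB)$ repeatedly to the sequence $t^j R^t$ and confirms that $m$ applications reduce it to zero.

import numpy as np

# Check of Lemma 2.7.4: Z_t = t**j * R**t is annihilated by (1 - R*B)**m whenever j < m.
R, m, j = 0.9, 3, 2                      # illustrative values only
t = np.arange(30, dtype=float)
z = t**j * R**t

def one_minus_RB(x, R):
    # Apply (1 - R*B) once: x_t - R * x_{t-1}, dropping the first (undefined) value.
    return x[1:] - R * x[:-1]

w = z
for _ in range(m):
    w = one_minus_RB(w, R)
print(np.allclose(w, 0.0))               # True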
Because $(c \pm di)^t = \alpha^t(\cos\phi t \pm i\sin\phi t)$, it follows that for each pair of complex roots of multiplicity $m$, the solution of the homogeneous difference equation should contain the sequences

$\alpha^t\cos\phi t,\ \alpha^t\sin\phi t,\ t\alpha^t\cos\phi t,\ t\alpha^t\sin\phi t,\ \ldots,\ t^{m-1}\alpha^t\cos\phi t,\ t^{m-1}\alpha^t\sin\phi t.$

For an illustration, consider a second-order difference equation whose auxiliary equation has the pair of complex roots of multiplicity one $R_1 = c + di = \alpha(\cos\phi + i\sin\phi)$ and $R_2 = c - di = \alpha(\cos\phi - i\sin\phi)$. By Theorem 2.7.1,

$Z_t = e_1(c + di)^t + e_2(c - di)^t,$

where $e_1$ and $e_2$ are any (complex) constants. For a real-valued process, we can always choose $e_2$ to be the complex conjugate of $e_1$ (if $e_1 = x + iy$, then $e_2 = x - iy$), so that

$Z_t = (e_1 + e_2)\,\alpha^t\cos\phi t + (e_1 - e_2)\,i\,\alpha^t\sin\phi t = b_1\,\alpha^t\cos\phi t + b_2\,\alpha^t\sin\phi t,$

where $b_1$ and $b_2$ can easily be seen to be real constants.

EXAMPLES 2.6 through 2.8 of the text apply Theorem 2.7.1 to specific equations. In Example 2.6, the auxiliary equation has a repeated root, so the solution contains a term that is linear in $t$. In Example 2.7, a third-order equation has $R_1 = 1$ together with the complex pair $R_2 = (1 - i)/2$ and $R_3 = (1 + i)/2$; writing the pair in polar form gives $\alpha = 1/\sqrt{2}$ and $\phi = \pi/4$, so the closed-form solution is $Z_t = b_1 + (1/\sqrt{2})^t\,[\,b_2\cos(\pi t/4) + b_3\sin(\pi t/4)\,]$. Example 2.8 solves an equation whose operator factors into first-order terms, one of which has a repeated root. The reader can easily verify that each $R_i$ obtained in these examples is indeed a root of the corresponding auxiliary equation written in terms of $R$.

Remark. The $R_i$ used in the solution of $C(B)Z_t = 0$ in Theorem 2.7.1 is the inverse of the root $B_i$ of $C(B) = 0$. Thus, to find the $R_i$ we could first find the roots $B_i$ of $C(B) = 0$ and then calculate their inverses. This cumbersome procedure can be avoided by noting that if we let $R = B^{-1}$ and multiply the auxiliary equation by $R^n = (B^{-1})^n$, we obtain a polynomial in $R$ whose roots are exactly the $R_i$; that is, $B_i$ is a root of $C(B) = 0$ if and only if $R_i = B_i^{-1}$ is a root of the transformed equation.

Exercises

2.1 Let $Z_t$ be a sequence of independent random variables whose distribution, as specified in the text, differs between odd and even $t$. (a) Is the process first-order stationary in distribution? (b) Is it second-order stationary in distribution?

2.2 For the process $Z_t$ defined in the text: (a) Is $Z_t$ strictly stationary? (b) Is $Z_t$ covariance stationary?

2.3 Prove or disprove that each of the following processes is covariance stationary: (a) $Z_t = A\sin(2\pi t + \theta)$, where $A$ is a random variable with zero mean and unit variance and $\theta$ is a constant; (b) $Z_t = A\sin(2\pi t + \theta)$, where $A$ is a constant and $\theta$ is a random variable uniformly distributed on $[0, 2\pi]$; (c) $Z_t = (-1)^t A$, where $A$ is a random variable with zero mean and unit variance; (d) $Z_t = U\sin(2\pi t) + V\cos(2\pi t)$, where $U$ and $V$ are independent random variables, each with mean 0 and variance 1.

2.4 Is the function given in the text a valid autocorrelation function for a real-valued covariance stationary process? Why?

2.5 Verify the following properties of the autocorrelation function of a stationary process: (a) $\rho_0 = 1$; (b) $|\rho_k| \leq 1$; (c) $\rho_k = \rho_{-k}$.
2.6 It is important to note that not every sequence satisfying the properties specified in Exercise 2.5 is an ACF of a process. Show that the function given in the text satisfies those properties but is not an ACF for any stationary process.

2.7 Given the time series of observations listed in the text: (a) Plot the series. (b) Can you guess an approximate value for the first-lag autocorrelation coefficient $\rho_1$ based on the plot of the series? (c) Plot $Z_t$ against $Z_{t-1}$, and try again to guess the value of $\rho_1$. (d) Calculate and plot the sample ACF. (e) Calculate and plot the sample PACF.

2.8 Show that the estimate $\hat{\gamma}_k$ is always positive semidefinite, but $\tilde{\gamma}_k$ is not necessarily so.

2.9 Consider a stationary series with the theoretical autocorrelation function given in the text. Find the variance of $\hat{\rho}_k$ using Bartlett's approximation.

2.10 Let $Z_t = \mu + \sum_{j=0}^{\infty}\psi_j a_{t-j}$ or, equivalently, $Z_t = \mu + \psi(B)a_t$, where $\psi_0 = 1$ and the $a_t$ are white noise with mean 0 and variance $\sigma_a^2 < \infty$. Prove that the process is covariance stationary if $\sum_{j=0}^{\infty}|\psi_j| < \infty$.

2.11 Prove that $n\,\mathrm{Var}(\bar{Z}) \to \sum_{k=-\infty}^{\infty}\gamma_k$ as $n \to \infty$, where $\gamma_k$ is the $k$th autocovariance of the process in Exercise 2.10, provided $\sum_{k=-\infty}^{\infty}|\gamma_k| < \infty$.

2.12 Find a closed-form solution for each of the homogeneous difference equations given in the text.

CHAPTER 3 Stationary Time Series Models

Limited by a finite number of available observations, we often construct a finite-order parametric model to describe a time series process. In this chapter, we introduce the autoregressive moving average model, which includes the autoregressive model and the moving average model as special cases. This model contains a very broad class of parsimonious time series processes found useful in describing a wide variety of time series. After giving detailed discussions of the characteristics of each process in terms of the autocorrelation and partial autocorrelation functions, we illustrate the results with examples.

3.1 Autoregressive Processes

As mentioned earlier in Section 2.6, in the autoregressive representation of a process, if only a finite number of $\pi$ weights are nonzero, i.e., $\pi_1 = \phi_1, \pi_2 = \phi_2, \ldots, \pi_p = \phi_p$ and $\pi_k = 0$ for $k > p$, then the resulting process is said to be an autoregressive process (model) of order $p$, which is denoted as AR($p$). It is given by

$\dot{Z}_t = \phi_1\dot{Z}_{t-1} + \cdots + \phi_p\dot{Z}_{t-p} + a_t,$ or $\phi_p(B)\dot{Z}_t = a_t,$

where $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\dot{Z}_t = Z_t - \mu$. Because the $\pi$ weight sequence is finite, the process is always invertible. To be stationary, the roots of $\phi_p(B) = 0$ must lie outside the unit circle. The AR processes are useful in describing situations in which the present value of a time series depends on its preceding values plus a random shock. Yule (1927) used an AR process to describe the phenomena of sunspot numbers and the behavior of a simple pendulum. To study the model, let us first consider the following simple cases.
3.1.1 The First-Order Autoregressive AR(1) Process

For the first-order autoregressive process AR(1),

$(1 - \phi_1 B)\dot{Z}_t = a_t,$ i.e., $\dot{Z}_t = \phi_1\dot{Z}_{t-1} + a_t.$

As a finite autoregressive model, the process is always invertible. To be stationary, the root of $(1 - \phi_1 B) = 0$ must be outside the unit circle, that is, $|\phi_1| < 1$.

ACF of the AR(1) Process. Multiplying both sides by $\dot{Z}_{t-k}$ and taking expectations gives $\gamma_k = \phi_1\gamma_{k-1}$ for $k \geq 1$, and hence, using $\rho_0 = 1$,

$\rho_k = \phi_1\rho_{k-1} = \phi_1^k, \qquad k \geq 0.$

Hence, when $|\phi_1| < 1$ and the process is stationary, the ACF decays exponentially in one of two forms depending on the sign of $\phi_1$. If $0 < \phi_1 < 1$, all autocorrelations are positive; if $-1 < \phi_1 < 0$, the sign of the autocorrelations shows an alternating pattern beginning with a negative value.

PACF of the AR(1) Process. For an AR(1) process, the PACF is

$\phi_{11} = \rho_1 = \phi_1$ and $\phi_{kk} = 0$ for $k \geq 2.$

Hence, the PACF of the AR(1) process shows a positive or negative spike at lag 1, depending on the sign of $\phi_1$, and then cuts off, as shown in Figure 3.1. The AR(1) process is sometimes called the Markov process because the distribution of $Z_t$ given $Z_{t-1}, Z_{t-2}, \ldots$ is exactly the same as the distribution of $Z_t$ given $Z_{t-1}$.

FIGURE 3.1 ACF and PACF of the AR(1) process: $(1 - \phi_1 B)\dot{Z}_t = a_t$.

EXAMPLE 3.1 For illustration, we simulated 250 values from an AR(1) process, $(1 - 0.9B)(Z_t - 10) = a_t$, where the white noise series $a_t$ are independent normal $N(0, 1)$ random variables. Figure 3.2 shows the plot of the series; it is relatively smooth. Table 3.1 and Figure 3.3 show the sample ACF and the sample PACF for the series. Clearly, $\hat{\rho}_k$ decreases exponentially and $\hat{\phi}_{kk}$ cuts off after lag 1, because none of the sample PACF values beyond lag 1 is significant and, more important, these insignificant values do not exhibit any pattern. The associated standard error of the sample ACF $\hat{\rho}_k$ is computed from the large-lag formula of Section 2.5.3, and the standard error of the sample PACF $\hat{\phi}_{kk}$ is set to $1/\sqrt{n}$; these are standard outputs in most time series software.

FIGURE 3.2 A simulated AR(1) series: $(1 - 0.9B)(Z_t - 10) = a_t$.
TABLE 3.1 Sample ACF and sample PACF for a simulated series from $(1 - 0.9B)(Z_t - 10) = a_t$.
FIGURE 3.3 Sample ACF and sample PACF of a simulated AR(1) series: $(1 - 0.9B)(Z_t - 10) = a_t$.
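A simulation like the one in Example 3.1 is easy to reproduce. The following sketch is illustrative only (the seed, burn-in length, and layout are our own choices); it generates 250 values from $(1 - 0.9B)(Z_t - 10) = a_t$ and compares the sample ACF with the theoretical values $\phi_1^k$.

import numpy as np

rng = np.random.default_rng(0)
phi, mu, n, burn = 0.9, 10.0, 250, 200

# Simulate (1 - phi*B)(Z_t - mu) = a_t with N(0, 1) shocks, discarding a burn-in.
a = rng.standard_normal(n + burn)
z = np.empty(n + burn)
z[0] = mu
for t in range(1, n + burn):
    z[t] = mu + phi * (z[t - 1] - mu) + a[t]
z = z[burn:]

# Sample ACF at lags 1..10 next to the theoretical ACF phi**k.
zbar = z.mean()
c0 = np.sum((z - zbar) ** 2)
for k in range(1, 11):
    r_k = np.sum((z[:-k] - zbar) * (z[k:] - zbar)) / c0
    print(k, round(r_k, 3), round(phi ** k, 3))

Changing phi to a negative value reproduces the alternating pattern discussed in Example 3.2 below.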
EXAMPLE 3.2 This example shows a simulation of 250 values from the AR(1) process $(1 + 0.65B)(Z_t - 10) = a_t$, with the $a_t$ being Gaussian $N(0, 1)$ white noise. The series is plotted in Figure 3.4 and is relatively jagged. The sample ACF and sample PACF of the series are shown in Table 3.2 and Figure 3.5. We see the alternating decreasing pattern, beginning with a negative value, in the sample ACF, and $\hat{\phi}_{11}$ is also negative. Although only the first two or three sample autocorrelations are significant, the overall pattern, and even the cutoff property of the sample PACF, clearly indicates the phenomenon of an AR(1) model with a negative value of $\phi_1$.

FIGURE 3.4 A simulated AR(1) series: $(1 + 0.65B)(Z_t - 10) = a_t$.
TABLE 3.2 Sample ACF and sample PACF for a simulated series from $(1 + 0.65B)(Z_t - 10) = a_t$.
FIGURE 3.5 Sample ACF and sample PACF of a simulated AR(1) series: $(1 + 0.65B)(Z_t - 10) = a_t$.

In discussing stationary autoregressive processes, we have assumed that the zeros of the autoregressive polynomial $\phi_p(B)$ lie outside the unit circle. When $|\phi_1| \geq 1$, the AR(1) process is regarded as nonstationary, because we have implicitly assumed that the process is expressed as a linear combination of present and past white noise variables. If we also consider processes that are expressed as linear combinations of present and future random shocks, then there exists an AR(1) process with its parameter greater than 1 in absolute value that is still stationary in the usual sense of the term as defined in Section 2.1. It is straightforward to verify, for example, that a process with autocorrelation function $\rho_k = (0.5)^{|k|}$ can be written either as the stationary AR(1) model

$Z_t = 0.5\,Z_{t-1} + a_t,$

in terms of present and past shocks, or as the AR(1) model

$Z_t = 2\,Z_{t-1} + b_t,$

in terms of present and future shocks, where both $a_t$ and $b_t$ are zero-mean white noise processes. The two models share exactly the same ACF, but the variance of $b_t$ in the second representation is four times larger than the variance of $a_t$. In summary, although a process with an ACF of this form could be represented with a parameter larger than 1 in absolute value, for practical purposes we will choose the representation in terms of a stationary AR(1) process; that is, we always refer to the case in which the parameter value is less than 1 in absolute value.

3.1.2 The Second-Order Autoregressive AR(2) Process

For the second-order autoregressive AR(2) process,

$(1 - \phi_1 B - \phi_2 B^2)\dot{Z}_t = a_t.$

As a finite autoregressive model, it is always invertible. To be stationary, the roots of $\phi(B) = (1 - \phi_1 B - \phi_2 B^2) = 0$ must lie outside the unit circle. For example, the process $(1 - 1.5B + 0.56B^2)\dot{Z}_t = a_t$ is stationary because $(1 - 1.5B + 0.56B^2) = (1 - 0.7B)(1 - 0.8B) = 0$ gives $B = 1/0.7$ and $B = 1/0.8$ as the two roots, both of which lie outside the unit circle. The process $(1 - 0.2B - 0.8B^2)\dot{Z}_t = a_t$, however, is not stationary, because one of the roots of $(1 - 0.2B - 0.8B^2) = 0$ is $B = 1$, which is not outside the unit circle.

The stationarity condition of the AR(2) model can also be expressed in terms of its parameter values. Let $B_1$ and $B_2$ be the roots of $(1 - \phi_1 B - \phi_2 B^2) = 0$; then $B_1^{-1} + B_2^{-1} = \phi_1$ and $B_1^{-1} B_2^{-1} = -\phi_2$. The required condition $|B_i| > 1$ implies $|1/B_i| < 1$ for $i = 1$ and $2$, which leads to the following necessary conditions for stationarity, regardless of whether the roots are real or complex:

$\phi_2 + \phi_1 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad -1 < \phi_2 < 1.$

For real roots we need $\phi_1^2 + 4\phi_2 \geq 0$, and for complex roots $\phi_1^2 + 4\phi_2 < 0$. Thus, the stationarity condition of the AR(2) model is given by the triangular region shown in Figure 3.6.

FIGURE 3.6 Stationary regions for the AR(2) model.
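Checking the root condition numerically is straightforward. The sketch below is a minimal helper of our own; only the polynomials $(1 - 1.5B + 0.56B^2)$ and $(1 - 0.2B - 0.8B^2)$ come from the examples above.

import numpy as np

def roots_outside_unit_circle(coeffs):
    # coeffs = [c1, ..., cn] for the polynomial 1 + c1*B + ... + cn*B**n.
    # numpy.roots expects coefficients ordered from the highest power down.
    poly = np.r_[np.asarray(coeffs, dtype=float)[::-1], 1.0]
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0)), roots

# (1 - 1.5B + 0.56B^2) = (1 - 0.7B)(1 - 0.8B): roots 1/0.7 and 1/0.8, both outside the circle.
print(roots_outside_unit_circle([-1.5, 0.56]))
# (1 - 0.2B - 0.8B^2): B = 1 is a root, so the corresponding process is not stationary.
print(roots_outside_unit_circle([-0.2, -0.8]))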
ACF of the AR(2) Process. We obtain the autocovariances by multiplying $\dot{Z}_{t-k}$ on both sides of the AR(2) model and taking expectations:

$\gamma_k = \phi_1\gamma_{k-1} + \phi_2\gamma_{k-2}, \qquad k \geq 1.$

Hence, the autocorrelation function satisfies

$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2}, \qquad k \geq 1.$

Specifically, when $k = 1$ and $2$ we have $\rho_1 = \phi_1 + \phi_2\rho_1$ and $\rho_2 = \phi_1\rho_1 + \phi_2$, which implies that

$\rho_1 = \frac{\phi_1}{1 - \phi_2}$ and $\rho_2 = \phi_2 + \frac{\phi_1^2}{1 - \phi_2},$

and $\rho_k$ for $k \geq 3$ is calculated recursively through the difference equation. Using Theorem 2.7.1, the general solution can be written (for distinct roots) as $\rho_k = b_1 R_1^k + b_2 R_2^k$, where the constants $b_1$ and $b_2$ are solved using the initial conditions above. The pattern of the ACF is thus governed by this difference equation: the ACF will be an exponential decay if the roots of $(1 - \phi_1 B - \phi_2 B^2) = 0$ are real, and a damped sine wave if the roots are complex. Because of this behavior, the process is also sometimes called the Yule process; the AR(2) process was originally used by G. U. Yule in 1921 to describe the behavior of a simple pendulum.

PACF of the AR(2) Process. Because $\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2}$ for $k \geq 1$, the last column of the numerator determinant in the PACF expression is a linear combination of the first two columns whenever $k \geq 3$. Similarly to the AR(1) case, we can therefore show that

$\phi_{11} = \rho_1, \qquad \phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}, \qquad \phi_{kk} = 0 \ \text{for } k \geq 3.$

Hence, the PACF of an AR(2) process cuts off after lag 2. Figure 3.7 illustrates the PACF and corresponding ACF for a few selected AR(2) processes.

FIGURE 3.7 ACF and PACF of AR(2) processes.

EXAMPLE 3.3 Table 3.3 and Figure 3.8 show the sample ACF and the sample PACF for a series of 250 values simulated from the AR(2) process $(1 + 0.5B - 0.3B^2)\dot{Z}_t = a_t$, with the $a_t$ being Gaussian $N(0, 1)$ white noise. The oscillating pattern of the sample ACF is similar to that of an AR(1) model with a negative parameter value; the rate of decrease of the autocorrelations, however, rejects the possibility of an AR(1) model, and the sample PACF cuts off after lag 2. Both give a fairly clear indication of an AR(2) model.

TABLE 3.3 Sample ACF and sample PACF for a simulated series from $(1 + 0.5B - 0.3B^2)\dot{Z}_t = a_t$.
FIGURE 3.8 Sample ACF and sample PACF of a simulated AR(2) series: $(1 + 0.5B - 0.3B^2)\dot{Z}_t = a_t$.

EXAMPLE 3.4 To consider an AR(2) model whose associated polynomial has complex roots, we simulated a series of 250 values from $(1 - B + 0.5B^2)\dot{Z}_t = a_t$, with the $a_t$ being Gaussian $N(0, 1)$ white noise. Table 3.4 and Figure 3.9 show the sample ACF and the sample PACF of this series. The sample ACF exhibits a damped sine wave, and the sample PACF cuts off after lag 2; that $\hat{\phi}_{kk}$ cuts off after lag 2 indicates an AR(2) model.

3.1.3 The General pth-Order Autoregressive AR(p) Process

The $p$th-order autoregressive process AR($p$) is

$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\dot{Z}_t = a_t.$

ACF of the General AR(p) Process. To find the autocovariance function, we multiply $\dot{Z}_{t-k}$ on both sides and take the expected value, recalling that $E(a_t\dot{Z}_{t-k}) = 0$ for $k > 0$:

$\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p}, \qquad k > 0.$

Hence, we have the recursive relationship for the autocorrelation function

$\rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}, \qquad k > 0,$

so the ACF is determined by the difference equation $\phi_p(B)\rho_k = 0$ for $k > 0$.
Using the difference equation result in Theorem 2.7.1, if $\phi_p(B) = \prod_{i=1}^{N}(1 - G_i B)^{d_i}$ with $\sum_i d_i = p$, then the general solution of $\phi_p(B)\rho_k = 0$ is a sum of terms of the form $t^j G_i^k$; if $d_i = 1$ for all $i$ and the $G_i$ are all distinct, this reduces to

$\rho_k = b_1 G_1^k + b_2 G_2^k + \cdots + b_p G_p^k.$

For a stationary process, $|G_i^{-1}| > 1$ and hence $|G_i| < 1$. Thus, the ACF $\rho_k$ tails off as a mixture of exponential decays and/or damped sine waves, depending on the roots of $\phi_p(B) = 0$; damped sine waves appear if some of the roots are complex.

PACF of the General AR(p) Process. Because $\rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}$ for $k > 0$, we can easily see that when $k > p$ the last column of the matrix in the numerator of the PACF expression can be written as a linear combination of previous columns of the same matrix. Hence, the PACF $\phi_{kk}$ vanishes after lag $p$. This property is useful in identifying an AR model as a generating process for a time series, as discussed in Chapter 6.

TABLE 3.4 Sample ACF and sample PACF for a simulated series from $(1 - B + 0.5B^2)\dot{Z}_t = a_t$.
FIGURE 3.9 Sample ACF and sample PACF of a simulated AR(2) series: $(1 - B + 0.5B^2)\dot{Z}_t = a_t$.

3.2 Moving Average Processes

As noted in Section 2.6, in the moving average representation of a process, if only a finite number of $\psi$ weights are nonzero, i.e., $\psi_1 = -\theta_1, \psi_2 = -\theta_2, \ldots, \psi_q = -\theta_q$ and $\psi_k = 0$ for $k > q$, then the resulting process is said to be a moving average process or model of order $q$, denoted MA($q$). It is given by

$\dot{Z}_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} = \theta_q(B)\,a_t,$ where $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q.$

Because $1 + \theta_1^2 + \cdots + \theta_q^2 < \infty$, a finite moving average process is always stationary. The moving average process is invertible if the roots of $\theta_q(B) = 0$ lie outside the unit circle. Moving average processes are useful in describing phenomena in which events produce an immediate effect that lasts only for a short period of time. The process arose as a result of the study by Slutzky (1927) on the effect of the moving average of random events. To discuss the properties of the MA($q$) process, let us first consider the following simpler cases.

3.2.1 The First-Order Moving Average MA(1) Process

When $\theta(B) = (1 - \theta_1 B)$, we have the first-order moving average MA(1) process

$\dot{Z}_t = (1 - \theta_1 B)\,a_t,$

where $\{a_t\}$ is a zero-mean white noise process with constant variance $\sigma_a^2$. The mean of the process is $E(Z_t) = \mu$. As a finite moving average model, the MA(1) process is always stationary, for any $\theta_1$. For the process to be invertible, the root of $(1 - \theta_1 B) = 0$ must lie outside the unit circle; because that root is $B = 1/\theta_1$, we require $|\theta_1| < 1$ for an invertible MA(1) process.

ACF of the MA(1) Process. From the autocovariance generating function, or directly from the definition, the autocovariances of the process are

$\gamma_0 = \sigma_a^2(1 + \theta_1^2), \qquad \gamma_1 = -\theta_1\sigma_a^2, \qquad \gamma_k = 0 \ \text{for } k \geq 2.$

The autocorrelation function becomes

$\rho_1 = \frac{-\theta_1}{1 + \theta_1^2}, \qquad \rho_k = 0 \ \text{for } k \geq 2,$

which cuts off after lag 1. Two remarks are in order. First, it is easy to see that $2|\rho_1| \leq 1$ for any $\theta_1$, so $|\rho_1| \leq 0.5$. Second, if the root of $(1 - \theta_1 B) = 0$ lies outside the unit circle, then the root of $(1 - (1/\theta_1)B) = 0$ lies inside the unit circle, yet the process $\dot{Z}_t = (1 - \theta_1 B)a_t$ and the process $\dot{Z}_t = (1 - (1/\theta_1)B)a_t$ have exactly the same autocorrelation function. In other words, among the two processes that produce the same autocorrelations, one and only one is invertible. Hence, for uniqueness and for the meaningful implications for forecasting discussed in Chapter 5, we restrict ourselves to invertible processes in model selection.
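The two remarks are easy to confirm numerically. The short sketch below is ours (the function name and the specific $\theta_1$ values are illustrative choices); it shows that $\theta_1$ and $1/\theta_1$ give identical lag-1 autocorrelations and that $|\rho_1|$ never exceeds 0.5.

import numpy as np

def ma1_rho1(theta):
    # Lag-1 autocorrelation of Z_t = (1 - theta*B) a_t.
    return -theta / (1.0 + theta**2)

# An invertible value and its non-invertible reciprocal give the same rho_1.
for theta in (0.4, 2.5, -0.8, -1.25):
    print(theta, round(ma1_rho1(theta), 4))

# |rho_1| stays below 0.5 over a wide grid of theta values.
print(max(abs(ma1_rho1(th)) for th in np.linspace(-5, 5, 2001)))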
PACF of the MA(1) Process. Using the determinant expression for $\phi_{kk}$ together with $\rho_1 = -\theta_1/(1 + \theta_1^2)$ and $\rho_k = 0$ for $k \geq 2$, the PACF of an MA(1) process can be shown to be

$\phi_{kk} = \frac{-\theta_1^k\,(1 - \theta_1^2)}{1 - \theta_1^{2(k+1)}}, \qquad k \geq 1.$

Contrary to its ACF, which cuts off after lag 1, the PACF of an MA(1) model tails off exponentially in one of two forms, depending on the sign of $\theta_1$ (hence on the sign of $\rho_1$): if it alternates in sign, it begins with a positive value; otherwise, it decays on the negative side, as shown in Figure 3.10. We also note that $|\phi_{11}| = |\rho_1| \leq 0.5$.

FIGURE 3.10 ACF and PACF of MA(1) processes: $\dot{Z}_t = (1 - \theta_1 B)a_t$.

EXAMPLE 3.5 The sample ACF and sample PACF are calculated from a series of 250 values simulated from the MA(1) model $\dot{Z}_t = (1 - 0.5B)a_t$, using $a_t$ that are Gaussian $N(0, 1)$ white noise. They are shown in Table 3.5 and plotted in Figure 3.11. Clearly, $\hat{\rho}_k$ cuts off after lag 1 and $\hat{\phi}_{kk}$ tails off. Statistically, only one autocorrelation and two partial autocorrelations are significant; the overall pattern, however, indicates a clear MA(1) phenomenon.

TABLE 3.5 Sample ACF and sample PACF for a simulated series from $\dot{Z}_t = (1 - 0.5B)a_t$.
FIGURE 3.11 Sample ACF and sample PACF of a simulated MA(1) series: $\dot{Z}_t = (1 - 0.5B)a_t$.

3.2.2 The Second-Order Moving Average MA(2) Process

When $\theta(B) = (1 - \theta_1 B - \theta_2 B^2)$, we have the second-order moving average process

$\dot{Z}_t = (1 - \theta_1 B - \theta_2 B^2)\,a_t,$

where $\{a_t\}$ is a zero-mean white noise process. As a finite-order moving average model, the MA(2) process is always stationary. For invertibility, the roots of $(1 - \theta_1 B - \theta_2 B^2) = 0$ must lie outside the unit circle, i.e.,

$\theta_2 + \theta_1 < 1, \qquad \theta_2 - \theta_1 < 1, \qquad -1 < \theta_2 < 1,$

which is parallel to the stationarity condition of the AR(2) model.

ACF of the MA(2) Process. Using the autocovariance generating function $\gamma(B) = \sigma_a^2\,\theta(B)\theta(B^{-1})$, the variance is

$\gamma_0 = \sigma_a^2(1 + \theta_1^2 + \theta_2^2),$

and the other autocovariances are $\gamma_1 = \sigma_a^2(-\theta_1 + \theta_1\theta_2)$, $\gamma_2 = -\theta_2\sigma_a^2$, and $\gamma_k = 0$ for $k \geq 3$. Hence, the autocorrelation function is

$\rho_1 = \frac{-\theta_1(1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_k = 0 \ \text{for } k \geq 3,$

which cuts off after lag 2.

PACF of the MA(2) Process. From the determinant expression we obtain $\phi_{11} = \rho_1$ and $\phi_{22} = (\rho_2 - \rho_1^2)/(1 - \rho_1^2)$, and the higher-order terms do not vanish. In general, the PACF of the MA(2) process tails off as an exponential decay or a damped sine wave, depending on the signs and magnitudes of $\theta_1$ and $\theta_2$ or, equivalently, on whether the roots of $(1 - \theta_1 B - \theta_2 B^2) = 0$ are real or complex; it is a damped sine wave if the roots are complex. The MA(2) process contains the MA(1) process as a special case. Some ACF and PACF patterns are shown in Figure 3.12.

FIGURE 3.12 ACF and PACF of MA(2) processes: $\dot{Z}_t = (1 - \theta_1 B - \theta_2 B^2)a_t$.

EXAMPLE 3.6 A series of 250 values is simulated from the MA(2) process $\dot{Z}_t = (1 - 0.65B - 0.24B^2)a_t$, with a Gaussian $N(0, 1)$ white noise series $a_t$. The sample ACF and sample PACF are in Table 3.6 and plotted in Figure 3.13. We see that $\hat{\rho}_k$ clearly cuts off after lag 2 and $\hat{\phi}_{kk}$ tails off, as expected for an MA(2) process.

TABLE 3.6 Sample ACF and sample PACF for a simulated MA(2) series from $\dot{Z}_t = (1 - 0.65B - 0.24B^2)a_t$.
FIGURE 3.13 Sample ACF and sample PACF of a simulated MA(2) series: $\dot{Z}_t = (1 - 0.65B - 0.24B^2)a_t$.
3.2.3 The General qth-Order Moving Average MA(q) Process

The general $q$th-order moving average process is

$\dot{Z}_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\,a_t.$

For this general MA($q$) process, the variance is

$\gamma_0 = \sigma_a^2\sum_{j=0}^{q}\theta_j^2,$ where $\theta_0 = 1,$

and, using that $\gamma_k = 0$ for $k > q$, the autocorrelation function becomes

$\rho_k = \frac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \quad k = 1, \ldots, q, \qquad \rho_k = 0, \quad k > q.$

The autocorrelation function of an MA($q$) process cuts off after lag $q$. This important property enables us to identify whether a given time series is generated by a moving average process. From the discussion of the MA(1) and MA(2) processes, we can easily see that the partial autocorrelation function of the general MA($q$) process tails off as a mixture of exponential decays and/or damped sine waves, depending on the nature of the roots of $(1 - \theta_1 B - \cdots - \theta_q B^q) = 0$; the PACF will contain damped sine waves if some of the roots are complex.

3.3 The Dual Relationship Between AR(p) and MA(q) Processes

For a given stationary AR($p$) process

$\phi_p(B)\dot{Z}_t = a_t,$ where $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p,$

we can write

$\dot{Z}_t = \psi(B)\,a_t,$ with $\psi(B) = \phi_p(B)^{-1} = 1 + \psi_1 B + \psi_2 B^2 + \cdots,$

which implies that $\phi_p(B)\psi(B) = 1$. Thus, the $\psi$ weights can be derived by equating the coefficients of $B^j$ on both sides of this identity. For example, for the AR(2) process we have

$(1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \cdots) = 1,$

and equating coefficients gives

$B^1:\ \psi_1 - \phi_1 = 0,$ hence $\psi_1 = \phi_1;$
$B^2:\ \psi_2 - \phi_1\psi_1 - \phi_2 = 0,$ hence $\psi_2 = \phi_1^2 + \phi_2;$
$B^j:\ \psi_j = \phi_1\psi_{j-1} + \phi_2\psi_{j-2}$ for $j \geq 2.$

In the special case $\phi_2 = 0$, the process reduces to an AR(1) process and we have $\psi_j = \phi_1^j$ for $j \geq 0$. This result implies that a finite-order stationary AR process is equivalent to an infinite-order MA process.
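The coefficient matching above is mechanical and can be done by a short recursion. The sketch below is a minimal implementation of our own for a general pair of operators, with the sign conventions $\phi(B) = 1 - \phi_1 B - \cdots$ and $\theta(B) = 1 - \theta_1 B - \cdots$ used here; passing an empty list for theta gives the pure AR case.

def psi_weights(phi, theta, n_weights=10):
    # Coefficients of psi(B) = theta(B) / phi(B), i.e. the MA representation of
    # phi(B) Z_t = theta(B) a_t, with phi = [phi_1, ..., phi_p] and theta = [theta_1, ..., theta_q].
    psi = [1.0]
    for j in range(1, n_weights + 1):
        val = -(theta[j - 1] if j <= len(theta) else 0.0)
        for i in range(1, min(j, len(phi)) + 1):
            val += phi[i - 1] * psi[j - i]
        psi.append(val)
    return psi

# AR(1) with phi_1 = 0.5: psi_j = 0.5**j.
print(psi_weights([0.5], [], 6))
# AR(2): psi_1 = phi_1 and psi_2 = phi_1**2 + phi_2, as derived above.
print(psi_weights([0.5, 0.3], [], 6))

Exchanging the roles of the two operators (and changing signs, since $\pi(B)$ is written as $1 - \pi_1 B - \cdots$) gives the $\pi$ weights discussed next.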
Given a general invertible MA($q$) process, $\dot{Z}_t = \theta_q(B)a_t$, we can likewise rewrite it in the AR form

$\pi(B)\dot{Z}_t = a_t,$ with $\pi(B) = \theta_q(B)^{-1} = 1 - \pi_1 B - \pi_2 B^2 - \cdots,$

so that a finite-order invertible MA process is equivalent to an infinite-order AR process. The $\pi$ weights are obtained by equating the coefficients of $B^j$ in $\theta_q(B)\pi(B) = 1$. For example, for the MA(2) process this gives $\pi_1 = -\theta_1$, $\pi_2 = -\theta_1^2 - \theta_2$, and in general $\pi_j = \theta_1\pi_{j-1} + \theta_2\pi_{j-2}$ for $j \geq 3$. When $\theta_2 = 0$ and the process becomes the MA(1) process, we have $\pi_j = -\theta_1^j$ for $j \geq 1$.

In summary, a finite-order stationary AR($p$) process corresponds to an infinite-order MA process, and a finite-order invertible MA($q$) process corresponds to an infinite-order AR process. This dual relationship between the AR($p$) and MA($q$) processes also exists in the autocorrelation and partial autocorrelation functions: the AR($p$) process has its autocorrelations tailing off and its partial autocorrelations cutting off, whereas the MA($q$) process has its autocorrelations cutting off and its partial autocorrelations tailing off.

3.4 Autoregressive Moving Average ARMA(p, q) Processes

3.4.1 The General Mixed ARMA(p, q) Process

A natural extension of the pure autoregressive and the pure moving average processes is the mixed autoregressive moving average process. As we have shown, a stationary and invertible process can be represented either in a moving average form or in an autoregressive form. A problem with either representation, though, is that it may contain too many parameters, even for a finite-order moving average and a finite-order autoregressive model, because a high-order model is often needed for good approximation; in general, a large number of parameters reduces efficiency in estimation. Thus, in model building it may be necessary to include both autoregressive and moving average terms in a model, which leads to the following useful mixed autoregressive moving average (ARMA) process:

$\phi_p(B)\dot{Z}_t = \theta_q(B)\,a_t,$

where

$\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q.$

To be stationary, we require that the roots of $\phi_p(B) = 0$ lie outside the unit circle; to be invertible, we require that the roots of $\theta_q(B) = 0$ lie outside the unit circle. We also assume that $\phi_p(B) = 0$ and $\theta_q(B) = 0$ share no common roots. Henceforth, we refer to this process as the ARMA($p$, $q$) process or model, in which $p$ and $q$ are used to indicate the orders of the associated autoregressive and moving average polynomials, respectively. The process contains a large class of parsimonious time series models that are useful in describing a wide variety of time series encountered in practice. The stationary and invertible ARMA process can be written in a pure autoregressive representation, $\pi(B)\dot{Z}_t = a_t$ with $\pi(B) = \theta_q(B)^{-1}\phi_p(B)$, or in a pure moving average representation, $\dot{Z}_t = \psi(B)a_t$ with $\psi(B) = \phi_p(B)^{-1}\theta_q(B)$.

ACF of the ARMA(p, q) Process. To derive the autocovariance function, we rewrite the model as

$\dot{Z}_t = \phi_1\dot{Z}_{t-1} + \cdots + \phi_p\dot{Z}_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q},$

multiply both sides by $\dot{Z}_{t-k}$, and take the expected value to obtain

$\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p} + E(\dot{Z}_{t-k}a_t) - \theta_1 E(\dot{Z}_{t-k}a_{t-1}) - \cdots - \theta_q E(\dot{Z}_{t-k}a_{t-q}).$

Because $E(\dot{Z}_{t-k}a_{t-i}) = 0$ for $k > i$, we have

$\gamma_k = \phi_1\gamma_{k-1} + \cdots + \phi_p\gamma_{k-p}, \qquad k \geq (q + 1),$

and hence

$\rho_k = \phi_1\rho_{k-1} + \cdots + \phi_p\rho_{k-p}, \qquad k \geq (q + 1).$
Thus, the autocorrelation function of an ARMA($p$, $q$) model tails off after lag $q$ just like an AR($p$) process: beyond lag $q$ it satisfies the $p$th-order homogeneous difference equation, which depends only on the autoregressive parameters in the model. The first $q$ autocorrelations $\rho_q, \rho_{q-1}, \ldots, \rho_1$ depend on both the autoregressive and the moving average parameters and serve as initial values for the pattern. This distinction is useful in model identification.

PACF of the ARMA(p, q) Process. Because the ARMA process contains the MA process as a special case, its PACF is also a mixture of exponential decays and/or damped sine waves, depending on the roots of $\phi_p(B) = 0$ and $\theta_q(B) = 0$. Hence, that both the ACF and the PACF tail off indicates a mixed ARMA model.

3.4.2 The ARMA(1, 1) Process

The ARMA(1, 1) process is

$(1 - \phi_1 B)\dot{Z}_t = (1 - \theta_1 B)\,a_t.$

For stationarity, we require $|\phi_1| < 1$, and for invertibility, $|\theta_1| < 1$. When $\phi_1 = 0$, the model reduces to an MA(1) process, and when $\theta_1 = 0$, to an AR(1) process; thus, we can regard the AR(1) and MA(1) processes as special cases of the ARMA(1, 1) process.

In the pure autoregressive representation, $\pi(B)\dot{Z}_t = a_t$ with $\pi(B) = (1 - \phi_1 B)/(1 - \theta_1 B)$, equating coefficients of $B^j$ gives $\pi_j = \theta_1^{j-1}(\phi_1 - \theta_1)$ for $j \geq 1$. We can also write the ARMA(1, 1) process in a pure moving average representation, $\dot{Z}_t = \psi(B)a_t$, with $\psi_j = \phi_1^{j-1}(\phi_1 - \theta_1)$ for $j \geq 1$.

ACF of the ARMA(1, 1) Process. Multiplying $\dot{Z}_{t-k}$ on both sides of the model and taking expected values, and noting that $E(\dot{Z}_t a_t) = \sigma_a^2$ and $E(\dot{Z}_t a_{t-1}) = (\phi_1 - \theta_1)\sigma_a^2$, we obtain

$\gamma_0 = \phi_1\gamma_1 + \sigma_a^2 - \theta_1(\phi_1 - \theta_1)\sigma_a^2,$
$\gamma_1 = \phi_1\gamma_0 - \theta_1\sigma_a^2,$
$\gamma_k = \phi_1\gamma_{k-1}, \qquad k \geq 2.$

Solving the first two equations gives

$\gamma_0 = \frac{1 + \theta_1^2 - 2\phi_1\theta_1}{1 - \phi_1^2}\,\sigma_a^2,$

and hence the ARMA(1, 1) model has the autocorrelation function

$\rho_1 = \frac{(\phi_1 - \theta_1)(1 - \phi_1\theta_1)}{1 + \theta_1^2 - 2\phi_1\theta_1}, \qquad \rho_k = \phi_1\rho_{k-1} = \phi_1^{k-1}\rho_1, \quad k \geq 2.$

Note that the autocorrelation function of an ARMA(1, 1) model combines characteristics of both AR(1) and MA(1) processes: the moving average parameter $\theta_1$ enters into the calculation of $\rho_1$, and beyond $\rho_1$ the autocorrelation function follows the same pattern as the autocorrelation function of an AR(1) process.

PACF of the ARMA(1, 1) Process. The general form of the PACF of a mixed model is complicated and is not needed. It suffices to note that, because the ARMA(1, 1) process contains the MA(1) process as a special case, the PACF of the ARMA(1, 1) process also tails off exponentially like its ACF, with its shape depending on the signs and magnitudes of $\phi_1$ and $\theta_1$. Due to the combined effect of $\phi_1$ and $\theta_1$, the PACF of the ARMA(1, 1) process contains many more different shapes than the PACF of the MA(1) process. Some of the ACF and PACF patterns for the ARMA(1, 1) model are shown in Figure 3.14. Thus, that both the ACF and PACF tail off indicates a mixed ARMA model.

FIGURE 3.14 ACF and PACF of ARMA(1, 1) processes.

EXAMPLE 3.7 A series of 250 values is simulated from the ARMA(1, 1) process $(1 - 0.9B)\dot{Z}_t = (1 - 0.5B)a_t$, with the $a_t$ being a Gaussian $N(0, 1)$ white noise series. The sample ACF and sample PACF are shown in Table 3.7 and also plotted in Figure 3.15. That both $\hat{\rho}_k$ and $\hat{\phi}_{kk}$ tail off indicates a mixed ARMA model. Solely on the basis of the sample PACF, without looking at the sample ACF, we know that the phenomenon cannot be an MA process, because the MA process cannot have a positively, exponentially decaying PACF. To decide the proper orders of $p$ and $q$ in a mixed model is a much more difficult and challenging task, sometimes requiring considerable experience and skill; some helpful methods are discussed in Chapter 6 on model identification. For now, it is sufficient to identify tentatively from the sample ACF and sample PACF whether the phenomenon is a pure AR, pure MA, or mixed ARMA model.

TABLE 3.7 Sample ACF and sample PACF for a simulated ARMA(1, 1) series: $(1 - 0.9B)\dot{Z}_t = (1 - 0.5B)a_t$.
FIGURE 3.15 Sample ACF and sample PACF of a simulated ARMA(1, 1) series: $(1 - 0.9B)\dot{Z}_t = (1 - 0.5B)a_t$.

EXAMPLE 3.8 The sample ACF and PACF are calculated for a series of 250 values, as shown in Table 3.8 and plotted in Figure 3.16. None is statistically significantly different from 0, which would indicate a white noise phenomenon. In fact, the series is the simulation result from the ARMA(1, 1) model $(1 - 0.6B)\dot{Z}_t = (1 - 0.5B)a_t$, with $\phi_1 = 0.6$ and $\theta_1 = 0.5$. The sample ACF and sample PACF are both small because the AR polynomial $(1 - 0.6B)$ and the MA polynomial $(1 - 0.5B)$ almost cancel each other out. Recall that the ACF of the ARMA(1, 1) process is $\rho_k = \phi_1^{k-1}(\phi_1 - \theta_1)(1 - \phi_1\theta_1)/(1 + \theta_1^2 - 2\phi_1\theta_1)$ for $k \geq 1$, which is approximately equal to zero when $\phi_1 \simeq \theta_1$. Thus, the sample phenomenon of a white noise series implies that the underlying model is either a random noise process or an ARMA process with its AR and MA polynomials being nearly equal. The assumption of no common roots between $\phi_p(B) = 0$ and $\theta_q(B) = 0$ in the mixed model is needed to avoid this confusion.

TABLE 3.8 Sample ACF and sample PACF for a simulated series from the ARMA(1, 1) process $(1 - 0.6B)\dot{Z}_t = (1 - 0.5B)a_t$.
FIGURE 3.16 Sample ACF and sample PACF of a simulated ARMA(1, 1) series: $(1 - 0.6B)\dot{Z}_t = (1 - 0.5B)a_t$.

Before closing this chapter, we note that the ARMA($p$, $q$) model can also be written in terms of $Z_t$ rather than the deviation $\dot{Z}_t = Z_t - \mu$ as

$\phi_p(B)Z_t = \theta_0 + \theta_q(B)\,a_t,$ where $\theta_0 = \mu(1 - \phi_1 - \cdots - \phi_p).$

In terms of this form, the AR($p$) model becomes $(1 - \phi_1 B - \cdots - \phi_p B^p)Z_t = \theta_0 + a_t$, and the MA($q$) model becomes $Z_t = \theta_0 + (1 - \theta_1 B - \cdots - \theta_q B^q)a_t$, where it is clear that in the MA($q$) process $\theta_0 = \mu$.
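The ACF formula for the ARMA(1, 1) model, and the near cancellation seen in Example 3.8, can be checked with a few lines of code; the function below is a minimal sketch of ours using the expressions derived above.

def arma11_acf(phi, theta, max_lag=10):
    # Theoretical ACF of (1 - phi*B) Z_t = (1 - theta*B) a_t.
    rho1 = (phi - theta) * (1.0 - phi * theta) / (1.0 + theta**2 - 2.0 * phi * theta)
    acf = [rho1]
    for _ in range(max_lag - 1):
        acf.append(phi * acf[-1])
    return acf

# Example 3.7: phi_1 = 0.9, theta_1 = 0.5; the ACF tails off slowly from rho_1.
print([round(r, 3) for r in arma11_acf(0.9, 0.5, 5)])
# Example 3.8: phi_1 = 0.6, theta_1 = 0.5; the ACF is small at every lag.
print([round(r, 3) for r in arma11_acf(0.6, 0.5, 5)])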
.-2 = a.98Zt-1 = a. = 1. (a) Zl ...5 Consider the AR(2) process Z.2Z. (c) Find the PACF 4. (b) Z. ..3Ztd2= a .l.-1 .1.. . = (a) Calcufate pl.1.8Zt-1 .5Z.4.7 Show that if an AR(2) process is stationary. (b) Find the ACF for the model in part (a) with a = -4.. (c) Z..?Z1-l 3. (a) Find'the ACE using the definition of autocovariance function. 1. (a) Find the general expression for pk. + 3.2.. . + 3.4 (a) Show that the ACF pk for the AR(1) process satisfies the difference equation (b) Find the general expression for pk. (c) Calculate the values pk for k = 1. (d) Z. . (b) Use pol p1 as starting values and the difference equation to obtain the general _ expression for pk. . 3.. (b) Find the ACF using the autocovariance generating function.. for k = 0. . 10. 1. 3. (b) Plot the pk. . . 2. plot the simulated series.6 (a) Find the range of a such that the AR(2) process is stationary. (c) Calculate r+2by assuming that cr. .3 Simulate a series of 100 observations from each of the models with a: = 1 in Exercise 3.-1 = a.2. 3.-2 + a. 1. . 5 ~ ~ ) a .. then 3. . 10. and calculate and study its sample ACF jkand PACF fork = 0.2 Consider the following AR(2) models: (i) 2.5Z. .la. yl = 0. For each sipulated series.-l 3. ~ B ~ ) z=. y2 = -4. (1 .Exercises 67 3.4B .8B)a.25.~ . (b) Examine stationarity and invertibility for the process obtained in part (a). (c) Express the model in an AR representation if it exists. . (iv) (1 . (a) Is the model stationary? Why? (b) Is the model invertible? Why? (c) Find the ACF for the above process. .3..10 (a) Find a process that has the follgwing autocovariance function: yo = 10. .. (a) Verify whether it is stationary. or both. calculate. plot the series. l B 4.and $33 for the MA(1) process using both procedures.5B)al. 3.6B)Z. & fork = 0. 1. .19) or (2. yk = 0.9 Find an invertible process which has the following ACF: po = 1. (ii) (1 . = (1 .6B)Z. .. + (ii) (1 .. 3.20.1.25). . 3. j ? ~ ~and .2B + . = 1 in Exercise 3 3 . and pk = 0 fork r 2. Plot the simulated series and calculate and study its sample ACF. 2 1 ~ . for lk] > 2.1. (iii) (1 .8B)Zt = (1 .2. (b) Express the model in an MA representation if it exists. PACF.13 One can calculate the PACF using either (2.. (a) Find the ACF pk (b) Find the PACF #ldi for k = 1. =I 3.. . 1.3.14 Consider each of the following models: (i) (1 . 2 ~ ~ ) a .15. 422. = (1 . .15 Consider each of the foilowing processes: (i) (1 . ..l.9B)a.5B)at..5. . .1.1. 3. (c) Find the autocovariance generating function. a.7B + .H Consider the MA(2) process 2. 7 2 ~ ~ ) a . Illustrate the computation of +11. and study its sample ACF bk and PACF & for k = 0. or invertible. = (1 .R)Z.20. ~ B ~ ) z=. 3..16 Simulate a series of 100 observations from each of the models with oh2 = 1 in Exercise 3.12 Simulate a series of 100 observations from the model with rr. P I = . . 3. (1 . With respect to the class of covariance stationary processes. They could have nonconstant means p. or both of these properties. .1 clearly shows that the mean level changes with time. particularly those arising from economic and business areas. timevarying second moments such as nonconstant variance u:.Nonstationary Time Series Models The time series processes we have discussed so far are a11 stationary processes. the monthly series of unemployed women between ages 16 and 19 in the United States from January 1961 to August 2002 plotted in Figure 4. but many applied time series. nonstationq time series can occur in many different ways..S. 
FIGURE 4.1 Monthly series of unemployed women between ages 16 and 19 in the United States from January 1961 to August 2002.

The plot of the yearly U.S. tobacco production between 1871 and 1984, shown in Figure 4.2, indicates both that the mean level depends on time and that the variance increases as the mean level increases.

FIGURE 4.2 Yearly U.S. tobacco production between 1871 and 1984.

In this chapter, we illustrate the construction of a very useful class of homogeneous nonstationary time series models, the autoregressive integrated moving average (ARIMA) models. Some useful differencing and variance stabilizing transformations are introduced to connect the stationary and nonstationary time series models.

4.1 Nonstationarity in the Mean

A process nonstationary in the mean could pose a very serious problem for the estimation of the time-dependent mean function without multiple realizations. Fortunately, there are models that can be constructed from a single realization to describe this time-dependent phenomenon. Two such classes of models that are very useful in modeling time series nonstationary in the mean are introduced in this section.

4.1.1 Deterministic Trend Models

The mean function of a nonstationary process could be represented by a deterministic trend of time. In such a case, a standard regression model might be used to describe the phenomenon.
For example, if the mean function $\mu_t$ follows a linear trend, $\mu_t = \alpha_0 + \alpha_1 t$, one can use the deterministic linear trend model

$Z_t = \alpha_0 + \alpha_1 t + a_t,$

with the $a_t$ being a zero-mean white noise series. More generally, if the deterministic trend can be described by a $k$th-order polynomial of time, one can use

$Z_t = \alpha_0 + \alpha_1 t + \cdots + \alpha_k t^k + a_t.$

If the deterministic trend can be represented by a sine-cosine curve, one can use

$Z_t = \nu_0 + \nu\cos(\omega t + \theta) + a_t = \nu_0 + \alpha\cos(\omega t) + \beta\sin(\omega t) + a_t,$

where $\alpha = \nu\cos\theta$ and $\beta = -\nu\sin\theta$. We call $\nu$ the amplitude, $\omega$ the frequency, and $\theta$ the phase of the curve. More generally, we can have

$Z_t = \nu_0 + \sum_{j=1}^{m}\bigl(\alpha_j\cos\omega_j t + \beta_j\sin\omega_j t\bigr) + a_t,$

which is often called the model of hidden periodicities. These models can be analyzed using standard regression analysis and are discussed again later in Chapter 13.

4.1.2 Stochastic Trend Models and Differencing

Although many time series are nonstationary, different parts of these series behave very much alike except for their difference in the local mean levels. Box and Jenkins (1976, p. 85) refer to this kind of nonstationary behavior as homogeneous nonstationary. In terms of the ARMA models, the process is nonstationary if some roots of its AR polynomial do not lie outside the unit circle. By the nature of homogeneity, the local behavior of this kind of homogeneous nonstationary series is independent of its level. Hence, letting $\varphi(B)$ be the autoregressive operator describing this behavior, we must have, for any constant $C$,

$\varphi(B)(Z_t + C) = \varphi(B)Z_t.$

This equation implies that $\varphi(B)$ must be of the form

$\varphi(B) = \phi(B)(1 - B)^d$

for some $d > 0$, where $\phi(B)$ is a stationary autoregressive operator. Thus, a homogeneous nonstationary series can be reduced to a stationary series by taking a suitable difference of the series. In other words, the series $\{Z_t\}$ is nonstationary, but its $d$th differenced series $\{(1 - B)^d Z_t\}$, for some integer $d \geq 1$, is stationary. For example, if the $d$th differenced series follows a white noise phenomenon, we have

$(1 - B)^d Z_t = a_t.$

To see the implication of this kind of homogeneous nonstationary series, consider $d = 1$ in the model above. Given the past information $Z_{t-1}, Z_{t-2}, \ldots$, the level of the series at time $t$,

$Z_t = Z_{t-1} + a_t,$

is its previous level plus a stochastic disturbance. The mean level of the process thus changes through time stochastically, and we characterize the process as having a stochastic trend. This result is different from the deterministic trend model mentioned in the previous section, where the mean level of the process at time $t$ is a pure deterministic function of time.

4.2 Autoregressive Integrated Moving Average (ARIMA) Models

A homogeneous nonstationary time series can be reduced to a stationary time series by taking a proper degree of differencing. The stationary process resulting from a properly differenced homogeneous nonstationary series is not necessarily white noise, however. In this section, we use differencing to build a large class of time series models, the autoregressive integrated moving average models, which are useful in describing various homogeneous nonstationary time series.
d.-I 4.1) plus a step in a random direction at time t.2b) This model has been widely used to describe the behavior of the series of a stock price. q) model. and we recall from (3. however..B)z.2a) with a nonzero constant t& + Z.2. a .1 The General ARIMA Model Obviously.4. -2. That the ACF for both series decays very slowly indicates that they are nonstationary. in general.2 Autoregressive Integrated Moving Average (ARIMA) Models 73 With reference to a time origin k.B)Zt = a. the nonstationarity of the random walk model without drift is shown through a stochastic trend and its values are free to wander. . # 0 is usually called the random walk model with drift.4. this term can become very dominating so that it forces the series to follow a deterministic pattern.2 and Figure 4.-r. Given Z.3. + at-l 2. If we look at their behaviors as plotted in Figure 4. at time t is . . contains a deterministic trend with slope or drift 19. and the model (1 . in both models are i. When Oo = 0. then the difference is striking.k)Oo + = Zk t aj. In fact.d. W P L E 4.6. both exhibit phenomena of a white noise process. The series of the random walk model with drift is clearly dominated by the deterministic linear trend with slope 4. To identify the model properly.k and +&are identical for the two models. when d > 0. we have z.4. .i. More generally.1) through the term Zt-l as well as by the deterministic component through the slope go. which is influenced by the stochastic disturbance at time (t .1) can be shown to correspond to the coefficient cud of Id in the deterministic trend. ) . the patterns of . -+ a .2. as shown in Table 4. Then how can we tell the difference between the regular random walk model and the random walk model with drift? We cannot if we rely only on their autocorrelation structures..B ) ~ z .BIZ.1 and Figure 4.+. although the sample ACF of the original series fiom a random walk model with drift generally decays more slowly.2. for the model involving the dth differenced series ((1 . = 4 + a. Thus. we assume that Oo = 0 unless it is clear from the data or the nature of the problem that a deterministic component is really needed. 80 is referred to as the deterministic trend term. Z.B)Z. where the a. = Ztd2 + 20a + 'a.4) the mean level of the series 2.2. we simulated 100 observations each from the model (1 . .+ a l t 4%id. when d > 0.3) with 8. For this reason. it is dear that 2.4. A? expected. On the other hand. normal N(0. 1) white noise. = + ( t . The sample ACF and PACF of the original series are shown in Table 4. + a. q.. we have a model with only a stochastic bend.the nonzero Oo in (4. Hence. For large t. we calculate the sample ACF and PACF of the differenced series (1 . The process in (4.1 To illustrate the results of the random walk mode1 discussed in this section.by (4. j=kf 1 fort > k. however. by successive substitution.5. 1) becomes where -1 < 6 < 1. simulated from random walk models. (1 . = 4 + a.B)Zt = at k 1 2 3 k 1 2 3 4 5 6 7 8 9 10 8 9 10 @] For ZZ. d = 1. 1) model with 6 = 0.B)Z.B)(1 + BB + o2Il2 + . For-1<6<1.from (1 .74 C W T E R 4 Nonstationary Time Series Models TkBLE 4. 1) Model When p = 0. This IMA(1. - (1 . (a) For 2. 4 5 6 7 4. the model in (4. the basic phenomenon of the IMA(1.2. Thus.l) model for Zt is seduced to a stationaty MA(I) model for the first differenced series.BIZ. 
4.2.3 The ARIMA(0, 1, 1) or IMA(1, 1) Model

When p = 0, d = 1, and q = 1, the model (4.2.1) becomes the IMA(1, 1) model

(1 - B)Z_t = (1 - θB)a_t,   where -1 < θ < 1.

For -1 < θ < 1, the nonstationary model for Z_t is reduced to a stationary MA(1) model for the first differenced series W_t = (1 - B)Z_t. Thus, the basic phenomenon of the IMA(1, 1) model is characterized by the sample ACF of the original series failing to die out and by the sample ACF of the first differenced series exhibiting the pattern of a first-order moving average process. The random walk model is a special case of the IMA(1, 1) model with θ = 0.

TABLE 4.1 Sample ACF and sample PACF for the original series Z_t simulated from the random walk models (1 - B)Z_t = a_t and (1 - B)Z_t = 4 + a_t.
TABLE 4.2 Sample ACF and sample PACF for the corresponding differenced series (1 - B)Z_t.
FIGURE 4.3 Sample ACF and PACF of the random walk model.
(Plots of the two simulated random walk series, with and without drift, accompany these tables.)

To illustrate these patterns further, series were simulated from three ARIMA models: an ARIMA(1, 1, 0) model, an ARIMA(0, 1, 1) model, and an ARIMA(1, 1, 1) model. The sample ACF and sample PACF of the original series show in every case the same phenomenon of a sustained large ACF and an exceptionally large first-lag PACF. The dominating phenomenon of nonstationarity overshadows the finer details of the underlying characteristics of these models. To remove this shadow, we take differences of each of the series. The sample ACF and PACF for the differenced series W_t = (1 - B)Z_t are shown in Table 4.4 and Figure 4.7, and now the fine details are evident. The ACF of W_t from the ARIMA(0, 1, 1) model cuts off after lag 1, whereas its PACF tails off; the ACF of W_t from the ARIMA(1, 1, 0) model tails off, whereas its PACF cuts off after lag 1; and both the ACF and the PACF of W_t from the ARIMA(1, 1, 1) model tail off, as expected from their characteristics discussed in Chapter 3.

TABLE 4.3 and FIGURE 4.6 Sample ACF and sample PACF for the original series Z_t simulated from the three ARIMA models.
TABLE 4.4 and FIGURE 4.7 Sample ACF and PACF for the differenced series W_t = (1 - B)Z_t from the three ARIMA models.
(A plot of the three simulated series accompanies these tables.)

4.3 Nonstationarity in the Variance and the Autocovariance

Differencing can be used to reduce a homogeneous nonstationary time series to a stationary time series. Many nonstationary time series, however, are nonhomogeneous: the nonstationarity of these series is not because of their time-dependent means but because of their
+ (1 .82 CHAPTER 4 Nonstationary 'Erne Series Models time-dependent variances and autocovariances. although the model is nonstationary. to a series of no observations. 2. (Z1. &.3. 4.. i. the mean function of the ARIMA model is t h e dependent.+l . With reference to this time origin no.. That is.. for t by successive substitutions. We now show that the ARXMA model is also nonstationaty in its variance and autocovariance functions. will also be nonstationary in the variance and the autocovariance. For example.). > 110. f (I .-. the complete future evolution of the process can be developed from the ARIMA model fitted to a given data set.3.1 Variance and Autocovariance of the ARf MA Models A process that is stationary in the mean is not necessarily stationary in the variance and the autocovariance. however. + . we need transformations other than differelicing. A process that is nonstationary in the mean.2) . suppose that we fit the IMA(1.e. for t + a. the & the Oj. we note a very fundamental phenomenon about the ARIMA model. . . ) of the process are afso time dependent and hence are not invariant with respect to time translation.8. The vafiance Var(Z. although the original series Zt is nonstationary. Thus.3.$+. are known values with respect to the time origin no. we can apply a proper differencing to reduce it to stationary process.7).2 Variance Stabilizing Transformations Not all nonstationary series can be transformed to stationary series by differencing.2.BP) and e ( B ) = (1 . is stationary and can be represented as an ARMA process where $(B) = (1 - 41B - . Many time series are stationary in the mean but are nonstationary in the variance. If r is large with respect to no.4.3. FortunateIy.-~. z. the parameters #i.d l B - . horn (4. - In general.C0rr(2. it is difficult or impossible to make the statistical inferences of a process that is nonstationary in both the mean and the autocovariance or autoco~elationfunction.)[5 1.B)~z. The variance. in exactly the same way as the stationary case discussed in Chapter 7. . its properly differenced series Wt = (1 . Bj. 3.) and the autocorrelation COK(Z. and Now.-~. 2. 4.3 Nonstationarity in the Variance and the Autocovariance 83 where we note that Z. and a. 4. In other words.) 1.-~. Var(Z.) is unbounded as t 4 co. Because lcorr(~.-~.). and a2 that control the evolution of the nonstationary phenomenon of 2.Bq) have all their roots outside of the unit circle. . The autocovariance COV(Z. we have the foliowing important observations: 1. of the ARTMA process is time dependent.4) to (4.3. Thus. can be estimated from the differenced series W.3. we need a proper variance stabilizing transformation. and Var(Zt) # Var(Zt-k) fork # 0. then from (4. for the homogeneous nonstationary process. That is. with only a single realization. it implies that the autocorrelation function vanishes slowly as kjncreases.- .7). It is very common for the variance of a nonstationary process to change as its level changes. 2. To overcome this problem. they are not only functions of the time difference k but are also functions of both the time origin t and the original reference point no.Z. . if the variance of the series is proportional to the level so that Var(Zt) = cp.) = cp..) to be constant. Thus. if the standard deviation of Var(Z. Next. Now Thus. we approximate the desired function by a first-order Taylor series about the point pt. has a constant variance? To illustrate the method.12) implies that For example. in order for the variance of T(Z. 
the variance stabilizing transformation T(Z. ln(Zt).'. will have a constant variance. a desired transformation that has a constant variance will be the reciprocal IjZ.. Third. will have a constant variance..) must be chosen so that Equation (4.) = c$.) evaluated at p.84 CHAPTER 4 Nonstationary Time Series Models for some positive constant c and function f. then Therefore. then fi. T(Zr). How do we find a function T so that the transformed series. a logarithmic transformation (the base is irrelevant) of the series. then st series is proportional to the level so that Hence. a squxe root transformation of the series. if the standard deviation of the series is proportionai to the square of the level so that Var(Z.3.Let where T'(pJ is the iirst derivative of T(Z. and choose the value of X that gives the minimum residual mean square error. where is the geometric mean of the data coming from the Jacobin of the transformation.17) is that we can treat A as a transfarmation parameter and estimate its value from data. 2..+IB p ) = (1 . we can use the power transfonnation introduced by Box and Cox (1964).3. For example. Values of A (lambda) Transformation * hZ. This class of transformations contains many of the previously discussed transformations as special cases.3 Nonstationarity in the Variance and the Autocovariance 85 More generally. $ p ~ ~ ) ( ~ -i A ) .4. (1 . we can include A as a parameter in the model. Thus. for A = 0. For example.BIB . the following table shows some commonly used values of A and their associated transformations. we note that One great advantage of using the transformation in (4. Box and Cox (1964) show that the maximum likelihood estimate of A is the one that minimizes the residual mean square error calculated from the fitted model on the 'hormalized" data . the residual mean square error is computed from the fitted model on .B.Bq)a. (no transformation) To see why X = 0 corresponds to the logarithmic transformation.. 38at-*.81at-..B I ~ Z= .n weights.9B)(1 ..5B)a. 1. Is it stationary? 2 at 1... . 4. The variance stabilizing transformations introduced above are defined only for positive series. = a. and PACF.1 Consider the model (a) Is the model for Zt stationary? Why? (b) Let W. ..9B)(1 . (iv) (1 .. is not as restrictive as it seems because a constant can always be added to the series without affecting the correlation structure of the series. = a.BIZ. (Ei) (1 .1 ...6B)a.B)~z..B)Z. .B)Z. = (1 .. (b) (I . k = 0. 4.6B)a.cat-1 -I. should be performed before any other analysis such as differencing.5 f (1 . (c) Calculate and discuss the behavior of the sample ACF. -I.86 CHAPTER 4 Nonstationary Time Series Models In a preliminary data analysis. however.3 (a) Simulate a series of 100 observations from each of the following models: (i) (I . . for t where c is a constant. 2. . a. (a) Find'the mean and covariance for 2. (ii) (1 ..BIZ. A variance stabilizing transformation.5B)a. = (1 .B)Zt = (1 . = ( 1 . Suppose that Z. + .. Express each of the above processes in the AR representation by actually finding and plotting the . 1. the transfornation not only stabilizes the variance.. if needed. but also improves the approximation of the distribution by a normal distribution. . Frequently.. (b) Plot the simulated series.2 Consider the following processes: (a) (1 . Is it stationary? (b) Find the mean and covariance for (1 .B)Z. 4. 2 0 for each simulated series and its differences. = . is generated according to Z. + cal.. 4.Is the model for Wt stationary? Why? 
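As a concrete illustration of the grid-search idea for the power transformation described above, here is a minimal sketch (the helper names are mine, not code from the text): each candidate λ is applied through the "normalized" Box-Cox transform with the geometric-mean Jacobian factor, a low-order AR model is fitted by ordinary least squares as a rough stand-in for the tentative model, and the λ with the smallest residual mean square error is kept. The series must be positive, as noted above.

```python
import numpy as np

def boxcox_normalized(z, lam):
    """Box-Cox power transform scaled by the geometric mean (Jacobian factor),
    so residual mean square errors are comparable across lambda. Requires z > 0."""
    g = np.exp(np.mean(np.log(z)))            # geometric mean of the data
    if lam == 0:
        return g * np.log(z)
    return (z**lam - 1.0) / (lam * g**(lam - 1.0))

def ar_rmse(x, p=2):
    """Residual mean square error from an OLS AR(p) fit, used here only as a
    rough approximation to fitting the tentative model at each lambda."""
    y = x[p:]
    X = np.column_stack([np.ones(len(y))] + [x[p - j: -j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.mean(resid**2)

def choose_lambda(z, grid=(-1.0, -0.5, 0.0, 0.5, 1.0), p=2):
    """Return the lambda in the grid with the smallest residual mean square error."""
    z = np.asarray(z, float)
    scores = {lam: ar_rmse(boxcox_normalized(z, lam), p) for lam in grid}
    return min(scores, key=scores.get), scores
```

With the default grid, λ = 0 corresponds to the logarithmic and λ = 1 to no transformation, matching the table of commonly used values above.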
(c) Find the ACF for the second-order differences Wt..4 &. 3. one can use an AR model as an approximation to obtain the value of A through an AR fitting that minimizes the residual mean square error. The choice of the optimum value for A is based on the evaluations of the residual mean square error on a grid of A values. This definition. Some important remarks are in order. p2)2/~t). (c) Find the variance of the transformed variable. It is known that r.6 Let ZI. &. 2..Z.Exercises 87 4. (a) Show that the variance of Z. . @) Find a proper transformation so that the variance of the transformed variable is independent of p.r. jVt = 2..)) is actually a variance-stabilizing transformation. is asymptotically distributed as N (p. .7 Let r.-l. = a. be the Pearson correlation coeff~cientfrom a sample size of n. . depends on its mean p. PACF. Show that Fisher's z-transformation Z = ln((1 r n ) / ( l . 4.5 Consider the simple white noise process. Znbe a random sample from a Poisson distribution with mean p. and AR representation of the differenced series. (1 . . 4. 4 + . Discuss the consequence of overdifferencing by examining the ACF.where p is the population correlation coefficient. . and Whittle (1983). Consider the general ARIMA@.#qB . which leads us to the minimum mean square error forecast. among others. . . are derived from a general theory of lineaf prediction developed by Kolmogorov (1939. The termforecasrirlg is used more frequently in recent time series literature than the term prediction.BIB . and investment analysis. we develop the minimum mean square error forecasts for a time series following the stationary and nonstationary time series models introduced in Chapters 3 and 4. our objective is to produce an optimum forecast that has no error or as little error as possible. Kalrnan (1960). Forecasting is essential for planning and operation control in a variety of areas such as production management. its operation is usually based on forecasting. The deterministic trend parameter Go is omitted for simplicity but no loss of generality.. + 5.l. . In this chapter.1 Introduction One of the most important objectives in the analysis of a time series is to forecast its future values.Forecasting Uncertainty is a fact of life for both individuals and organizations. Wiener (1949). 5. Yaglom (1962). Equation (5. following this model for both cases when d = 0 and d 0.8. We also discuss the implication of the constructed time series model in terms of its eventual forecast function. however.Bq). financial planning.l) is one of the most commonly used models in forecasting applications. quality control. Formulas will also be developed to update forecasts when new information becomes available. inventory systems. Even if the final purpose of time series modeling is for the control of a system. Most forecasting results. d. and the series a. is a Gaussian N(0.. . This forecast will produce an optimum future value with the minimum error in terms of the mean square error criterion. We discuss the minimum mean square error forecast of 2. cr:) white noise process. 1941). .2 Minimum Mean Square Error Forecasts In forecasting. q) model ' - O(B) = (1 .~.I~~BP). where 4(B) = (1 . + $~+I%-I+ '!'r+zan-2 f ' . . the stationary ARMA model Because the mode1 is stationary.2 Minimum Mean Square Error Forecasts 89 5. we first consider the case when d = 0 and . Z.(l) of Zn+lbe where the 1c.2.e. a . . i. ZR-l. Because 2. Zn-2.2. we can rewrite it in a moving average representation.. Using (5. . Zn-*.f are to be determined. 
%an alI be written in the form of (5.2). for t = n. The mean square error of the forecast is which is easily seen to be minimized when k ( ~= )$10. ..5.u = 0..2.2. . . n .-I. .4) and that we have $rq = Hence. .1 Minimum Mean Square Error Forecasts for ARMA Models To derive the minimum mean square enor forecasts. . where Suppose that at time t = n we have the observations Z. we can let the minimum mean square error forecast Z.1. . and wish to forecast I-step ahead of future value as a linear combination of the observations Z. -l. ..2. The forecast error e. . This correlation is true for the fdrecast errors in(l)+ and which are made at the same lead time 1 but different origins n and n . the one-step ahead forecast error is A Thus.8) is a linear combination of the future random shocks entering the system after time n. if one-step ahead forecast errors are correlated. That is.?n(l)is usually read as the I-step ahead forecast of Zntr at the forecast origin n.j for j c I.(l) as shown in (5. the minimum mean square error forecast of Z. The forecast error is Because ~ ( e . .5 n) = 0.. the (1 . howevel. .a) 100% forecast limits are where Nao is the standard normal deviate such that P(N > Nan) = a/2. For example. which implies that Zn(l) is indeed the best forecast of Zntl. a. the forecast is unbiased with the error variance For a normal process. The forecast errors for longer Icad times. is given by its conditional expectation. are correlated. . the one-step ahead forecast errors are independent. Otherwise. Specifically. and hence improve the forecast of Zntl by simply using &. an-?.90 CHAPTER 5 Forecasting Thus. It is also true for the forecast errors for different lead times made from the same time origin.+l as the forecast. .. then one can construct a forecast & + l from the available errors a. ( l ) ] ~t. we apply the operator 1 + i.18) and obtain where no = . .l ~ i .. It can be easily shown that . of course. q) model with d P 0. we rewrite the model at time t + I in an AR representation that exists because the model is invertible. when the series is known up to time n. the complete evolution of the process is completely determined by a finite number of fixed parameters. .2 Minimum Mean Square Error Forecasts 91 Minimum Mean Square Error Forecasts for AFUMA Models We now consider the general nonstationary ARJMA(p d. respectively. .. Thus. + $ l . as shown in Chapter 4. where or. we can view the forecast of the process as the estimation of a function of these parameters and obtain the minimum mean square error forecast using a Bayesian argument.Bq) is an invertible MA operator. . +~Iz.+) is given by its conditional expectation Zn-l.. the optimal forecast of Z.2. . .6. q) model with d = 0.The minimum mean square error forecast for the stationary ARMA model discussed earlier is. equivalently.ho = 1. Although for this process the mean and the second-order moments such as the variance and the autocovariance functions vary over time.). 5. -~ .h13+ . which corresponds to a squared loss function. Following Wegman (1986).I and i..$$$) is a stationary AR operator and 613) = (1 8 . a special case of the forecast for the ARIMA(p.l to (5.2 .4 E B .2. To derive the variance of the forecast for the general ARIMA model. It is well known that using this approach with respect to the mean square error criterion. Hence.5.. E(z. d.- where 4 ( B ) = (1 .is. . Hence. increases and becomes unbounded as I -+ oo.J. 
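A small sketch of these formulas may help. The helpers below (function names are mine; the code assumes the ARMA model is written with the sign convention used in this chapter) compute the ψ weights recursively and then the (1 - α) forecast limits Z_n(l) ± N_{α/2} [σ_a² Σ_{j=0}^{l-1} ψ_j²]^{1/2}.

```python
import numpy as np
from scipy.stats import norm

def psi_weights(phi, theta, lags):
    """psi weights of the MA(infinity) form of an ARMA model written as
    (1 - phi_1 B - ...) Z_t = (1 - theta_1 B - ...) a_t."""
    phi = np.asarray(phi, float)
    theta = np.asarray(theta, float)
    psi = np.zeros(lags + 1)
    psi[0] = 1.0
    for j in range(1, lags + 1):
        ar_part = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        ma_part = -theta[j - 1] if j <= len(theta) else 0.0
        psi[j] = ar_part + ma_part
    return psi

def forecast_limits(point_forecasts, sigma2_a, phi, theta, alpha=0.05):
    """(1 - alpha) limits  Z_n(l) +/- N_{alpha/2} sqrt(Var[e_n(l)]),
    with Var[e_n(l)] = sigma_a^2 * sum_{j=0}^{l-1} psi_j^2."""
    f = np.asarray(point_forecasts, float)
    psi = psi_weights(phi, theta, len(f) - 1)
    var_e = sigma2_a * np.cumsum(psi**2)      # element l-1 is Var[e_n(l)]
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var_e)
    return f - half, f + half
```

For example, psi_weights([0.5], [0.3], 5) returns the first ψ weights of an ARMA(1, 1) model with φ = 0.5 and θ = 0.3, namely 1, 0.2, 0.1, 0.05, and so on.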
Hence.92 CHAPTER 5 Forecasting Choosing $ weights so that we have where Thus.14) hold for both stationary and nonstationary ARMA models. the forecast limits in this case become wider and wider as the forecast lead 1 becomes larger Ad larger as shown in Figure 5. 1-1 2 For a stationary process.2. In fact.10). t I11) = 0. For a nonstationary process. from (5.7) through (5. For more discussion on the properties of the mean square error forecasts. Thus. for t 5 n. given Z. does not exist..25) is exactly the same as (5.$~ $.2.21).:~i#f .2. we have because E(a.+j[ Z.2. for j > 0. the eventual forecast limits approach two horizontally parallel lines as shown in Figure 5.2. liml. see Shaman (1983).2.l(b). x. E:l=o$j exists.. liml. the results given in (5.l(a). The forecast enor is where the $j weights.. by (5. The practical implication of the latter case is that the forecaster becomes less certain about the result as the forecast lead time gets larger.8). can be calculated recursively h m the weights as follows: Note that (5. @) Forecasts for nonstationary processes. 93 .2 Minimum Mean Square Error Forecasts (a) X X Actual Observation Forecast + Forecast Limit FIGURE 5.5.1 (a] Forecasts for stationary processes. O.94 CHAPTER 5 Forecasting 5. we have Taking the conditional expectation at time origin n. 1) model: . Let The general ARIMA(p. we consider the l-step ahead forecast Zn(I)of Zn+[for'the following ARIMA(I. we obtain where and W P L E 5.15) can be written as the following difference equation form: For t = n + I. d. 1) or ARMA(1.1 To illustrate the previous results.3 computation of Forecasts A From the result that the minimum mean square error forecasts &(I) of Zn+[at the forecast origin n is given by the conditional expectation we can easily obtain the actual forecasts by directly using the difference equation form of the model. q) model (5.2. 7) Equating the coeEcients of Bj on both sides gives Hence. To calculate the $weights ance.. I+] (1 .4B) and B(B) = (1 .3.e. the model in (5. because the MA representation does not exist. Calculate the forecast. kn(l)as the conditional expectation from the difference equation form.@)] as l -+ m.3.5.bB)(l + $lB + $ 2 +~ .l) process i needed for the forecast variwhere we note that (1 . which corresponds to taking the first difference. When 4 = 1. That is. the forecast error variance becomes which converges to q:[l + ( 4 . (5. we can calculate the $ weights from the moving average representation (5.2)with +(B) = (1 .2.BB). and 2. When < 1. Calculate the forecast error variance Var(e. .3 Computation of Forecasts 95 1. . For t = n + I. we first rewrite (5. we can rewrite the above model in (5.0 ) ~ / ( 1 .4) becomes an IMA(1.10) in an AR form when time equals t + 1.) = (1 . i.4) as Hence.r.(l)) = CT.3.B)p = 0.$I~ $f.3.~ .BB). the choice between a stationary ARMA(1.4 The ARIMA Forecast as a Weighted Average of Previous Observations Recall that we can always represent an invertible ARIMA model in an autoregressive representation.8). As expected. j r 1. Z. + which approaches w as I + oo. Thus. Equating the coefficients of ~i on both sides gives = (1 ..9) when 4 + 1.3.3. + nl+l= (1 .I.rrlB.T ~ O ) B-~( 7 ~ r r 2 B ) ~ 3- - .B) = I .) * . 1 5 j becomes 5 1 . ]#I 5. of _the process as b +a.d).B ) .19)~= (1 . Zn(l) approaches the meat. from (5.dB) or (1 . equivalently.( r 2.. by applying (52. the first equation of (5. . 1) model has very different implications for forecasting.1) for all I. 
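The recursive computation of forecasts from the difference equation form can be sketched in code for the ARIMA(1, 1, 1) case discussed here. This is an illustration under my own function name; in practice a_n would be taken as the last residual from the fitted model.

```python
import numpy as np

def arima111_forecasts(z, a_n, phi, theta, steps):
    """l-step ahead forecasts for (1 - phi*B)(1 - B)Z_t = (1 - theta*B)a_t,
    computed from the difference equation form
        Z_t = (1 + phi) Z_{t-1} - phi Z_{t-2} + a_t - theta a_{t-1},
    replacing future shocks by 0 and future Z values by their forecasts."""
    hist = {0: z[-1], -1: z[-2]}             # Z_n(0) = Z_n and Z_n(-1) = Z_{n-1}
    fc = []
    for l in range(1, steps + 1):
        val = (1 + phi) * hist[l - 1] - phi * hist[l - 2]
        if l == 1:
            val -= theta * a_n               # only the current shock a_n survives the conditioning
        hist[l] = val
        fc.append(val)
    return np.array(fc)
```

Each forecast feeds back into the recursion as the stand-in for the not-yet-observed value, exactly as in the conditional-expectation rules above.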
is expressed as an infinite weighted sum of previous observations plus a random shock.r n B 2 - . the forecast is free to wander.8)8 + (1 .1 1) Now. Thus.e.. the variance of e.(l) = Z.26) we obtain $1 = $2 = a1 = (1 .(E). That is.6b) implies that Z.3.(Z . when 4 is close to 1. a * - = ( 1 .9).2. (5.3. or.When # + 1.3.12) is the limiting case of (5.(nl + B)B . we have Gj = (1 . That is. i.96 CHAPTER 5 Forecasting where n(B) = 1 . with no tendency for the values to remain clustered around a fixed level. (5.Jt can be seen even more clearly hfom the limiting behavior of the I-step ahead forecast &(I) in (5. p. In this representation. 1) model and a nonstationary MA(1.6b). For 4 1.B) (1 . This optimal property is not shared in general by the moving average and the exponentiaf smoothing methods. . For example. are special cases of ARIMA forecasting. By repeated substitutions. many smoothing results. it can be shown by successive substitutions that where and T)" = Tj.4 The ARIMA Forecast as a Weighted Average of Previous Observations 97 where Thus. Thus. where More generally. we see that & ( I ) can be expressed as a weighted sum of current and past observations Z.5. The user does not have to specify either the number or the form of weights as required in the moving average method and the exponential smoothing method. for f 5 n. Note dso that the ARlMA forecasts are minimum mean square error forecasts. ARIMA models provide a very natural and optimal way to obtain the required weights for forecasting. such as moving averages and exponential smoothing. 4.9) and (5. in(2)is a weighted average of the previous observations with tbe ?rF) weights given by = #(# . we see that iry) which. The associated ?r weights provide very useful information for many important managerial decisions.4) form a convergent series.3). with d -: 0.rr weights in (5.from (5.4.4.3.3) or (5.2 For the ARMA(1..98 CHAPTER 5 Forecasting For an invertible process.4. Again.6b) with 1 = 2 and p = 0. from (5. which impties &at for a given degree of accuracy.B ) B j . as expected. j= 1 When I = 1.e)ej-lk(r .3.4. agrees with (5.4. Equating the coefficients of ~j on both sides gives Assuming that p = 0. EXAMPLE 5. Note that by comparing (5. I) model in (5. we have.5). .10).4) with 181 c 1. we get For 1 = 2.(I) depends only on a finite number of recent observations. Z.4. we have &(r) x(# 03 = .4.' for j a 1.4) and (5. from (5. the .2).j). The mod> 1 and 181 1 are both noninvertible.25).l)j-l(I + 4) for 8 = -1. note that if 161 > 1. is listed again in the following: In particular.33. the . Hence.5.4. we have the following updating equation: . els corresponding to /$[ = I$( ]%I - 5. a meaningful forecast can be derived only from an invertible process. which implies that the more recent observations have a greater influence on the forecast. the one-step ahead forecast error is Clearly.5. For 2 1. which. From (5.1) for 0 = 1 and 9 = (. the forecast error at the forecast origin (11 .5. and using (5. the 1-step ahead minimum mean square forecast error for the forecast origin n using the general ARMA model is obtained in (5. is available for t 5 n. which have the same absolute value for all j. although the model is still stationary. let us examine (5.rr weights become = (4 . To see the trouble. Thus.1) is * zn. it is clear that where and Hence. For 161 < 1. its AR representation does not exist. which iniplies that the more remote past observations have a much greater influence on the forecast. 
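To make the weighted-average interpretation concrete, the sketch below (illustrative helpers, not the text's code) evaluates the one-step IMA(1, 1) forecast from its π weights, π_j = (1 - θ)θ^{j-1}, and compares it with the familiar exponential-smoothing recursion with smoothing constant α = 1 - θ; for a reasonably long series the two agree closely, since the recursion merely truncates the weights at the start of the series.

```python
import numpy as np

def ima11_one_step_forecast(z, theta, max_lag=None):
    """One-step IMA(1,1) forecast as a weighted average of past observations,
    Z_n(1) = sum_j pi_j Z_{n+1-j} with pi_j = (1 - theta) * theta**(j - 1)."""
    z = np.asarray(z, float)
    m = len(z) if max_lag is None else min(max_lag, len(z))
    j = np.arange(1, m + 1)
    pi = (1 - theta) * theta ** (j - 1)
    return np.sum(pi * z[::-1][:m])          # most recent observation gets the largest weight

def exp_smooth_forecast(z, alpha):
    """The same forecast via the usual simple exponential smoothing recursion."""
    s = z[0]
    for x in z[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```

For θ close to 0 the forecast leans heavily on the most recent observations; for θ near 1 the weights decay slowly and distant observations remain influential.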
When = 1.5 Updating Forecasts Recall that when a time series 2. $-I(#8) converges to 0 as j * m.9) more carefully. then the a weights rapidly approach -# or +m as j increases. I). for convenience.5 Updating Forecasts 99 To see the implications bf these weights. all past and present observations are equally important in their effect on the forecast. after substituting.zndI(l) = an.2. rearranging. The updated forecast is obtained by adding to the previous forecast a constant multipIe of the one-step ahead forecast error a.1.6.7. the points Z.1). equivalently. Zn I for 13 q satisfies the homogeneous difference equation of order (p -I-d).1) for I > q.Zn(q . The are constants that depend on time origin n.+l = &(I)).B ) ~Recall .2) keing the splution of Equation (5.. 4(B)(l .(q). . .d). .Zn(l). 2.3) that When I > q. - + 5. when I = q + 1.2) holds for all I 2 (g . from Equation (5.6. Then. the general solution is given by A L) which.3.6. they are fixed constants for all forecast lead times 1.p .6. we wiU naturally modify the forecast in(11) made earlier by proportionalty adding a constant multiple of this error.d 1) actually iie on the curve represented by Equation (5. or.+l = Zn+l .100 CHAPTER 5 Forecasting . Thus.6 Eventual Forecast Functions Let the ARTMA model be where $(B) =. holds actually for all 1 2 (q . The reason that Equation (5. when the value ZM1 becomes available and is found to be higher than the previous forecast (resulting in apositive forecast error a.2). Let P ( B ) = Jli=l(l. The constants change when the forecast origin is changed.d + 1) and hence is often referred to as the eventual forecast function. + . as seen shortly.p .RiB)mt with N xi=lnri = (p 4. For a given forecast origin n. For example. from Theorem 2.p . That is. ..(1)becomes where B is now operating on I such that 23gn(l)= in([ .d + 1) foltows from (5. which is certainly sensible. where Zn(-j) = Znlj for j 2 0.d +.4).6.I).(q). 1)model The eventual forecast function is the solution of (1 . The h which holds for 1 r (q .&(l))/(l EXAMPLE 5. .4 Consider the -A( 1. the forecast &(I) satisfies the difference equation (1 . c?)= ..p . .2). I . theAfunction.3.. Zn(g .p)+d. hence.4). EXAMPLE 5. 0.p]$'-'. + &'+ A EXAMPLE 5.5. 1) model given in (5. As 1 -+ co. Thus.1) = 1.# B ) ( ~ ~ (-I )p ) eventual forecast function is given by = 0 for 1 > 1. the function passes through the first forecast i n ( l ) and the last observation Z. we have cP) c F ~ cIn'=(in(1) . i.+zn)/(l - 9). Thus. and they can be used to solve the constants.6 Eventual Forecast Functions 101 In other wozds. Zn(l) approaches the mean p of the stationary process as expected.62) is the unique curve that pass? through the (p + d) values given by Z.3 For the ARIMA(1.p . the function passes through Zn(l). & ( l ) = + [ i n ( l ) .@)(I A . Zn(q .d + I). These (p + d) values can serve as initial conditions for solving the constants in (5.5 Consider the ARUIA(O.2. Thus.B)Z.in(5.(l) = 0 and is given by which holds for 1 2 (q . .p . .d t 1) = 0. Because i n ( l ) = + and Z. . = &(o) = df')+&).1)model . in(l)= a and c?)= (&(l) .e. $)'.2 that the deterministicterm do in this case is related to the coefficient of a first-order deterministic time trend.&. Combining the solution (5.6.6 Consider the followingARlMA(1. 1) model with a deterministic trend term The eventual forecast function is the solution of (1 .2. Thus. The solution for the homogeneous equation is given in Equation (5.(I) .6. 
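The updating equation is simple to apply in code. A minimal sketch (the helper name is mine) that reuses the ψ weights of the model's moving average representation:

```python
def update_forecasts(prev_forecasts, z_new, psi):
    """Update origin-n forecasts once Z_{n+1} is observed, using
       Z_{n+1}(l) = Z_n(l+1) + psi_l * a_{n+1},  a_{n+1} = Z_{n+1} - Z_n(1).
    prev_forecasts = [Z_n(1), Z_n(2), ..., Z_n(L)];
    psi = [psi_1, psi_2, ...] from the model's MA(infinity) form."""
    a_new = z_new - prev_forecasts[0]                  # one-step ahead forecast error
    updated = [prev_forecasts[l] + psi[l - 1] * a_new  # new Z_{n+1}(l) for l = 1, ..., L-1
               for l in range(1, len(prev_forecasts))]
    return updated, a_new
```

Here prev_forecasts holds Z_n(1), ..., Z_n(L); the function returns the updated forecasts Z_{n+1}(1), ..., Z_{n+1}(L-1) together with the realized one-step error a_{n+1}, so a positive error shifts all remaining forecasts upward in proportion to the ψ weights, as described above.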
For a particular solution.7) is clearly dominated by the last term related to a deterministic trend. Because Z (1) + b?) 4 f Bo/(l . CUP)+ Because ( I .4 ) .9).(l) = Zn + (.2) it is given by in(l) = CY) + C$)'I.Bol/(l .p . u!.6.l")l.. and + =!?) The difference between Equations (5. the forecast from model (5.i n ( l ) ) ( l . = & ( I ) . andi.6) which holds for I r ( q . and from (5. we have Cln) = Zn. the function is a straight line passing through i n ( l ) and Z. + cF) CP' ExiihfPLE 5.6.Zn)l.4).+B)aj")= Bo and ap)= 00 /(1 . ( l ) = ~ j " ) and Zn = &(0) = Cj").. we have bvJ = [(&(I) .4) of the homogeneous equation and a particular solution given in (5.p . .B)$(I) = Bo for 1 > 1.?.5) is drastic. When the forecast lead time 1 is large.4Zn)(l . 1.102 CHAPTER 5 Forecasting The eventual forecast function is the solution of ( 1 .) are combined new canstants to be determined. Because i .6..6.6.+).d + 1) = 0.+I2.4) and Zn = Zn(O) = b y ) by).B)(a[")+ ar'l) = ap! it follows that (1 .6.4) + 001/(1 . by) = [(Zn . Thus. where b p ) and b(.1 1) and (5. and a particular solution becomes which holds also for 1 > (q . we get the eventual forecast function which passes through i n ( l ) and Zn.B ) ~ & ( z )= 0 for 1 (5.+B)(l . we note from the discussion in Section 4. > 1.d + 1) = 0. consider the AR(1) model with 4 = .7 A Numerical Example As a numerical example.2.5.1) . Zlm. Zlm. We proceed as follows: 1.2. The results are shown in Figure 5. The AR(1) model can be written as and the general form of the forecast equation is it[l)= P + + ( i t ( l .p).4 =p + +yz.10)are The 95% forecast limits for Zroz. To obtain the forecast limits. Suppose that we have the observations Zg7 = 9. 1 2 1.9. and Zlol with their associated 95% forecast limits. The 95% forecast limits forZlol fiom (5. and and want to forecast Zlol.5. . p = 9.are The 95% forecast limits for ZlO3 and ZlO4can be obtained similarly. . we obtain the )t weights from the relationship That is.I. Zgg = 9. 2. Zlm = 8. 299 = 9.6. and cr: = .7 A Numerical Example 103 5. Thus. unknown and have to be estimated from the given . In practice.Zlm.5) as follows: The earlier forecasts for ZIM. Suppose now that the observation at t = 101 turns out to be Zlol = 8.5. the parameters are. and Zloj by using (5. 3.8.2 Forecasts with 95% forecast limits for an ARC11 process.2 I X Actual Observation Forecast + Forecast Limit + 4- + + 8-4 FIGURE 5.ZIM.and zIo4 made at t to the negative forecast error made for Zlol. of course.104 CHAPTER 5 Forecasting 8.we can update the forecasts for ZIo2. = 100 are adjusted downward due This forecasting is based on the assumption that the parameters are known in the model. Because +r = #' = (.6)[. = (1 . hence.l. (b) Find the 95% forecast limits for the forecasts in part (a). . .4 Verify Equation (5. 5.2. 5. .2. (b) Compute-and plot the correlations between the error of the forecast ~ ~ ( and 3 ) those of Z. (b) Find the variance of the 6-step ahead forecast error for 1 = 1.5 A sales series was fitted by the ARIMA(2. 1. 5.O) model where &: = 58.20). . .PI = at.3 Consider the model (a) Compute and plot thecomlations between the error of the forecast & ( 5 ) and those of the forecasts Ztlj(5) for j = 1.(I) for I = 1.5. (i) (1 .2. (a) Calculate the forecasts of the next three observations. (c) Find the eventual forecast function. = 800. Zn-l = 770. 5.4 4 . . With respect to the forecasting origin t = n. (iii) (1 . and 3.2. . . the estimates are known constants. . . (a) Find the I-step ahead forecast & ( I ) of Zn. &.BIZ. 5 . 
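For the AR(1) case used in the numerical example, the forecasts, their limits, and the eventual forecast function can be computed directly. The sketch below uses placeholder parameter values; they are not necessarily the exact values of the example, which should be read from the text.

```python
import numpy as np
from scipy.stats import norm

def ar1_forecasts(z_n, mu, phi, sigma2_a, steps, alpha=0.05):
    """AR(1) l-step forecasts Z_n(l) = mu + phi**l (Z_n - mu), with
    (1 - alpha) limits using Var[e_n(l)] = sigma_a^2 (1 - phi**(2l)) / (1 - phi**2)."""
    l = np.arange(1, steps + 1)
    fc = mu + phi**l * (z_n - mu)
    var_e = sigma2_a * (1 - phi**(2 * l)) / (1 - phi**2)
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var_e)
    return fc, fc - half, fc + half

# Illustrative placeholder values only:
fc, lo, hi = ar1_forecasts(z_n=8.9, mu=9.0, phi=0.6, sigma2_a=0.1, steps=4)
```

As l grows, φ^l tends to 0, so the forecasts approach μ and the limits approach two horizontal lines, which is the eventual forecast behavior of a stationary process described earlier.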
.41BI(Zt .Exercises 105 observations {Z1.# 2 ~ 3 ( ~-t P) = at. Estimation of the model parameters is discussed in Chapter 7.41B)(l .1 Consider each of the following models: (0 (1 .Zn}.000 and the last three observations are Zn-2 = 640.2 (a) Show that the covariance between forecast errors fiom different origins is given by (b) Show that the covariance between forecast errors from the same origin but with different lead times is given by 5. the results remain the same under this conditional sense.OIB)at. . however. and 2. . 6.4.9.p) = a. ZS1 became known and equaled 34.6B)Z.Zg3. 1) model A (a) (b) (c) (d) Write down the forecast equation that generates the forecasts. 1.2.23) and (5.. Suppose that we have the observations q6= 60. Find the updated forecasts for Za2. 5..4. Express the forecasts as a weighted average of previous observations. l q = 1. 1) model (a) Find the rr weights for the AR representation Z.3~')a.2.. (b) Let i ( 2 ) = Express ?rj2)in terms of the weights TI-) where 5... = .4. (b) (1 . (a) Forecast GI. .6 Consider the IMA(1.3B)(1 . 5.and Zg4. (c) Suppose that the observation at t = 81 turns out to be Zal = 62. = Z.(I).&B . where 49 = . Zn = 58.8 Consider the ARZMA(0.4. and rri = 2.2..4 f a. ZSo= 33. be the two-step ahead forecasts of Z..6. (b) What is the eventual forecast function for the forecasts made at t = 50? (c) At time t = 51.+ 2at time t. + .ZE3.B)Z. Discuss the connection of this forecast ~ J z )with the one generated by the simple exponential smoothing method. = (1 . and vi = 1.23) and (5.10 Consider the model = 33. Consider an AR(2) model (1 . ZT9=: 70. -t a.106 CHAPTER 5 Forecasting 5.11 Obtain the eventual forecast function for the following models: (a) (1 .7 (a) Show that (5. (b) Illustrate the equivalence of (52.. for 1 = 1.and ZW.5) are equivalent.1.6. 5.Zg2. and the observations (a) Compute the forecast Z50(l).9 = 1. 3 and their 90% forecast limits.5) using the model in Exercise 5. Find the 95% forecast limits for & ( l ) . and Zso= 62. (X1 and show that Xj.7. Z.= 2. 2. = 64.9. p = 65.&B2)(Z.2. 5.8B . (b) Find the 95% forecast limits for the forecasts in part (a). Update the forecasts obtained in part (a).4. 2. where the a.. . (c) Find the forecasts. 5.6 ~(Zt ~) 65) = n.81Zrd2+ a..12 Consider the AR(2) model. and Z50= 21. (b) Find the variance of the process. = -2+ 1... (e) (1 . (d) Find the 95%forecast limits in part (c). (d) (1 .75B + . Z48 = 22.2B . = (1 . (a) Find the mean of the process.8Z. ?50(l). 3 4 ~ ~ ) a .-1 . Za = 17.1. and 3. and Z47= 19.B)~z.3ft2)at. for E = 1.B)~z.2B . (e) Find the eventual forecast function for the forecast made at f = 50 and its h i t ing value..6B) (1 . is the Gaussian white noise with mean 0 and variance 4.2.= 50 -I-(1 . .Exercises 107 + (c) (1 ... aM. and PACF. For example. and PACF. q) model Model identification refers to the methodology in identifying the required transformations. In any time series analysis. we use the following useful steps to identify a tentative model. particularly the characteristics of these processes in terms of their ACF. pk. and sample PACF. the first step is to plot the data. outliers. d. pk. such as variance stabilizing transformations and differencing transformations.Model Identification In time series analysis. Through careful examination of the plot. .5. This understanding often provides a basis for postulating a possible data transformation. Plot the time series data and choose proper transformations. practice. 
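The first identification steps described in this chapter, namely examining the plot and then the sample ACF and PACF of the original and differenced series against the rough ±2/√n bound, are easy to automate. The sketch below is a convenience wrapper of my own (the "slow decay" flag is an ad hoc heuristic, not a rule from the text), using statsmodels for the sample ACF and PACF.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def identification_summary(z, nlags=None):
    """First-pass identification aid: sample ACF/PACF of the series and of its
    first difference, with the rough +/- 2/sqrt(n) significance bound."""
    z = np.asarray(z, float)
    n = len(z)
    nlags = nlags or max(1, n // 4)          # roughly n/4 lags, as suggested in the text
    bound = 2.0 / np.sqrt(n)
    out = {}
    for name, x in [("original", z), ("first difference", np.diff(z))]:
        r = acf(x, nlags=nlags)[1:]
        p = pacf(x, nlags=nlags)[1:]
        out[name] = {
            "acf": r,
            "pacf": p,
            "significant acf lags": list(np.flatnonzero(np.abs(r) > bound) + 1),
            "slow acf decay (consider differencing)": bool(np.mean(np.abs(r[:10]) > bound) > 0.7),
        }
    return out
```

A slowly decaying sample ACF for the original series together with a clean ACF/PACF pattern for the first difference points toward d = 1, after which p and q are read off the characteristics summarized in Table 6.1.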
the most crucial steps are to identify and build a model based on the available data. they have to be estimated by the sample ACF. Step 1. a large single signscant spike at lag 1 for Fk will indicate an MA(1) model as a possible underlying process. these ACF and PACF are unknown. discussed in Section 2. We also discuss some other identification tools such as the inverse autocorrelation and extended sample autocorrelation functions.1 Stem for Model Identification - - To illustrate the model identification. and for a given observed time series ZI. J M . seasonality. . . we give illustrative examples of identifying models for a wide variety of actual time series data. 6. . and other nonnormal and nonstationary phenomena. we usually get a good idea about whether the series contains a trend. because we know that the ACF cuts off at lag 1 for an MA(1) model. with the known patterns of the ACF. Given a time series. These steps require a good understanding of the processes discussed in Chapters 3 and 4. for the ARMA models. nonconstant variances.. &. Pk. and the proper orders of p and q for the model. in model idenacation. . the decision to include the deterministic parameter Bo when d 2 1. kk. and Z. I $ ~In. our goal is to match patterns in the sample ACF. After introducing systematic and useful steps for model identification. we consider the general ARIMA(p. &. and PACF. Thus. . are also stationary. 2. One can also use the unit root test proposed by Dickey and Fuller (1979).Bq). In most cases. In a borderline case. we can apply Box-Cox's power transfornation discussed in Section 4. Lf the sample ACF decays very slowly (the individual ACF may not be large) and the sample PACF cuts off after lag 1. . Step 3.1 Steps for Model Identification 109 In time series analysisj the most commonly used transformations are variance-stabilizing transformations and differencing.B ) ~ z is .B.&BP). A series with nonconstant variance often needs a logarithmic transformation. but do beware of the artifacts created by overdifferencing so that unnecessary overparameterization can be avoided. we refer to this transformed data as the original series in the following discussion unless mentioned otherwise. More generally.#lB .B)Z. the needed orders of these p and q are less than or equal to 3. although occasionally for data of good quality one may be able to identify an adequate model with a smaller sample size.6. TABLE 6. and Miller [1986])..B ) ~ z . to stabilize the variance. To identify a reasonably appropriate ARSMA model. if necessary. Some authors argue that the consequences of unnecessary differencing are much less serious than those of underdifferencing. 1.for d > 1. A strong duality exists between the AR and the MA models in terms of their ACFs and PACFs. More generally. then it indicates that differencing is needed.. we should always apply-variance-stabilizing transformations before taking differences. or 2. Because a variance-stabilizing transformation.B ) ~ + ~for z . . is often chosen before we do any further analysis. differencing is generally recommended (see Dickey. . and q is the highest order in the moving average polynomial (1 . Bell. 2.2. Process ACF AREP ) Tails off as exponentia1 decay or damped sine wave Cuts off after lag p MA(q) Cuts off after lag q Tails off as exponential decay or damped sine wave ARMA(P. Because variance-stabilizing transformations such as the power transformation require non-negative values and differencing may create some negative vdues. TabIe 6. 
q ) Tails off after lag (q .1 summarizes the important results for selectingp and q. Compute and examine the sample ACF and PACF of the properly transformed and differenced series to identify the orders of p and q where we recall that p is the highest order in the autoregressive polynomial (1 .3. Note that if (1 . .p) Tails off after lag (p . to remove nonstationarity we may need to consider a higher-order differencing (1 .1 Characteristics of theoretical ACF and PACF for stationary processes. ideally. then (1 .q ) PACF . and the number of sample lag-k autocorrelations and partial autocorrelations to be calculated should be about 11/4. Some general rules are the following: 1. Compute and examine the sample ACF and the sample PACF of the original series to further confirm a necessary degree of differencing so that differenced series is stationary. d is either 0. we need a minimum of n = 50 observations.fllB . . Try taking the fist differencing (1 . i = 1. stationary. Step 2. Usually. for a nonstationary model. (1 .5.n ~ a r ( m )= E. then we can test for its inclusion by comparing the sample mean of the differenced series W.110 CHAPTER 6 Model Identification We identify the orders p and q by matching the patterns in the sample ACF and PAW with the theoretical patterns of known models. the parameter Bo is usually omitted so that it is capable of representing series with random changes in the level. Most criminals disguise themselves to avoid being recognized.6.21) and (2.27) are very rough approximations. however. because the underlying model is unknown.11 that lim.B ) 2.5. d. Model improvement can be easily achieved at a later stage of diagnostic checking. If there is reason to believe that the diEerenced series contains a deterministic trend mean. Test the deterministic trend term Oo when d > 0. the variance and hence the standard error for is model dependent.5 often disguise the theoretical ACF and PACF patterns. so that where we note that a$ = mj/(1 - 42)The . slope. Some authors recommend that a conservative threshold of 1. ~vhichis also fnte of ACF and PACF.. we have. For example..2. most available software use the . Thus. At the model identification phase.. = (1 . for i h e m A ( 1 . however. hence. required standard error is Expressions of Sv for other models can be derived similarly. As discussed in Section 4.6. especially for a relatively short series. or trend. yj. The estimated variances of both the sample ACF and PACF given in (2. q!t(B)(l . To derive Sc. from (2.9). 0) model. The s m p h g variation and the cornlation among the sample ACF and PACF as shown in Section 2..B)d Zt = f B(B)a. ~ with its approximate standard error Ss.5 standard deviations be used in checking the significance of the short-term lags of these ACF and PACF at the initial model identification phase..we note kom Exercise 2. in the initial model idenGcation we always concentrate on the general broad features of these sampleACF and PACF without focusing on the fine details.8) and y(1) is its value at B = 1. Thl: art of a time series analyst's model identification is very much like the method of an FBI agent's criminal search. Hence. where y(B) is the autocovariance generating function defined in (2.+B)Wt = a. Step 4. 1 shows Series Wl.1 Daily average number of truck manufacturing defects (Series W l ) . SPLUS.5) reduces to Alternatively. we present a variety of real-world examples to illustrate the method of model identiiication. MINITAB. 6.}. 
and SPSS are available to facilitate the necessary computations.I. . EXAMPLE 6. Many computer programs such as AUTOBOX. The data consist' of 45 daily observations of consecutive business days between FIGURE 6.6. one can include do initially and discard it at the final model estimation if the preliminary estimation result is not significant..$k 11 1 are the fist k significant sample ACFs of { W. which is the daily average number of defects per truck found in the final inspection at the end of the assembly line of a truck manufacturing plant.2 Empirical Examples where .1 Figure 6. Under the null hypothesis pk = 0 fork 2 1. SCA. SAS. Some programs are available for both the mainframe and personal computers. EVIEWS.. RATS. Parameter estimation is discussed in Chapter 7.1. .. Equation (6.Go is the sample variance and .2 Empirical Examples In this section. November 4 to January 10. named after Wolfer. p. who was a student of Wolf. The figure suggests a stationary process with constant mean and variance. giving a total of n = 302 observations. telecommunications. This series is also known as the Wolfer sunspot numbers. .YuIe (19271. EXAMPLE 6. BartIett (1950). FIGURE 6.2. as reported in Burr (1976. for exampte. The sample ACF and sample PACF ate reported in Table 6. and Brillinger and Rosenblatt (19671. Whittle (1954).2 Series W2 is the classic series of the Wolf yearly sunspot numbers between 1700 and 2001. 134).112 CHAPTER 6 Model Identification TABLE 6. and others. For an interesting account of the history of the series. see Izenman (1985).2 and plotted in Figure 6.2 Sample ACF and sample PACF of the daily average number of truck manufacturing defects (Series Wl). That the sample ACF decays exponentially and the sample PACF has a single significant spike at lag 1 indicates that the series is likely to be generated by an AR(1) process.2 Sample ACF and sample PACF for the daily average number of truck manufacturing defects (Series Wl). The Wolf sunspot numbers have been discussed extensively in time series literature. Scientists believe that the sunspot numbers affect the weather of Earth and hence human activities such as agriculture. 26 256.887.3 Residual mean square errors in the power transformation.3.43 6.2 with the result given in Table 6.4.040.O 0.3. The plot of the data is given in Figure 6.2.83 2.81 241.4 and Figure 6. The sample ACF shows a dmping sine-cosine wave. TABLE 6.964.3. h Residual mean square error 1.0 -0.2 2700 1750 1800 1850 Year 1900 Empirical ExampIes 1950 2 13 2000- FIGURE 6. and 9.5 0.3 Wolf yearly sunspot numbers. It suggests that a square root transformation be applied to the data.01 . The sample ACF and sample PACF of the transformed data are shown in Table 6. It indicates that the series is stationary in the mean but may not be stationary in variance. and the sample PACF has relatively large spikes at lags 1. 2 700-2001 (Series W2). we use the power transformation analysis discussed in Section 4.6.0 277. suggesting that a tentative model may be an ARf2).5 -1. To investigate the required transformation for variance stabilization. C) = (1. is also possible. even though AR(3) model.4 Sample ACF and sample PACF for the square root transformed sunspot numbers (Series W2).114 CHAPTER 6 Mode1 Identification TGBLE 6. and Reinsel (1994) suggest that an .4 Sample ACF and sample PACF for the square root transformed sunspot numbers [Series W2). Box. By ignoring the values of $fi beyond lag 3.# J ~ -B ~ . Jenkins. FIGURE 6.. (1 . 
A fixed number of adult blowflies with balanced sex ratios were kept inside a cage and given a fixed amount of food daily. The data plot suggests that the series is stationary in the mean but possibly nonstationary in variance.6. 5 f 5 299.3 Series W3 is a laboratory series of blowfly data taken from Nicholson (1950). Thus. BriIiinger et al. The sample decays exponentially and #Akcuts off after Jag 1.many scientists believe that the series is cyclical with a period of 11 years. Because of the large autocorrelation of .6.5 indicates that no transformation is needed and the series is stationary in variance. and argued that the series Blowfly A is possibly generated by a nonlinear modei. We examine this series more carefully in later chapters. between 20 Blowfly 13: for Z.5 Blowfly data (Series W3). giving a total of n 364 observations. EXAMPLE 6.2 Empirical Examples 11 5 their analysis was based on the nontransformed data between 1770 and 1869. between 218 5 t 5 145. the following AR(1) model is entertained: Fk FIGURE 6. Later Tong (1983) considered the following two subseries: - Blowfly A: for Z. The sample ACF_and PACF are shown in Table 6. The blowfly popufation was then enumerated every other day for approximately two years.6 and Figure 6. (1980) h t applied time series analysis on the data set. .67 at lag 11. The power transformation analysis as shown in Table 6. Series W3 used in our analysis is the series Blowfly B of 82 observations as shown in Figure 6.5. It is clearly nonstationary in the mean.1. The differenced series as shown in Figure 6. suggesting differencing is needed. .5 Results of the power transformation oii blowfly data.116 CHAPTER 6 Model Identification TMLE 6.7. k 1 2 3 4 5 6 7 8 9 10 FIGURE 6.6 Sample ACF and sample PACF for the blowfly data (Series W3).7 and Figure 6. This conclusion is further confumed by the sustained large spikes of the sample ACF shown in Table 6.8 seems to be stationary. Residual mean square error h TABLE 6.6 Sample ACF and sample PACF for the blowfly data (Series W3). EXAMPLE 6.4 Series W4 is the monthly number of unemployed young women between ages 16 and 19 in the United States from January 1961 to August 2002 as shown in Figure 4. 1) model for the original series. To test whether a deterministic trend parameter Oo is needed.7 Sample ACF and sample PACF of Series W4: the U. Figure 6.10 shows Series W5. This nonstationarity is also shown by the slowly decaying ACF in Table 6. monthly series of unemployed young women between ages 16 and 19 from January 1961 to August 2002.000 population) of Pennsylvania between 1930 and 2000 published in the 2000 Pennsylvania Vital Statistics Annual Report by the Pennsylvania Department of Health. monthly series of unemployed young women between ages 16 and 19 from January 1961 to August 2002.8 and also plotted in Figure 6 9 . Thus. The sample ACF and sample PACF for this differenced series are reported in Table 6. This pattern is very similar to the ACF and PAW for MA(1) with positive O1 as shown in Figure 3.1926.2 Empirical Examples 117 TABLE 6. our proposed model is EXAMPLE 6.10. The sample ACF now cuts off after lag 1 while the sample PACF tails off.6. FIGURE 6. we calculate the t-ratio ~ / S E= .5 The death rate is of major interest for many state and federal governments. The series is clearly nonstationary with an increasing trend. per 100.7 Sample ACF and sample PACF of Series W4: the U.11. suggesting an MA(1) model for the differenced series and hence an IMA(1.11 and the evaluation of power .S.3707/1. 
which is not significant. Both Figure 6.925 = . which is the yearly cancer death rate (all forms.S.9 and Figure 6. Hence.B)Z.8 Sample ACF and sample PACF of the differenced U S . rnonthIy series of unemployed young women between ages 16 and 19 from January 1961 to August 2002 (Series W4). = (1 . the following random walk model with drift is entertained: (1 . The sample ACF and PACP of the differenced data shown in Table 6.S.158. implies that a determi-nistic trend term is recommended.6) .118 CHAPTER 6 Model Identification 1960 1970 1980 Year 1990 2000 FIGURE 6. of unemployed young women between ages 16 and 19 (Series W4). The t-ratio. W/sW= 2.BIZ. W.. of the U. = O0 + a.. Wt = (1 . transformations indicate no transformation other than differencing is needed.12 suggest a white noise process for the differenced series.8 The differenced monthly series.BIZ.0543/.2. monthly series.3336 = 6.10 and Figure 6. TGBLE 6. (6. 2 FIGURE 6.6.BIZ. however. . = (1 Empirical Examples 1 19 Sample ACF and sample PACF of the differenced U.9. monthly series. the yearly Pennsylvania cancer death rate between 1930 and 2000. We will investigate both The clear upward trend. . 194 1960 1980 2000 Year FIGURE 6.10 Series W5. suggests models in Chapter 7 when we discuss parameter estimation. of unemployed young women between ages 16 and 19 [Series W43.S. Based on the sample ACF and PACF of the original nondifferenced data in Table 6.9 W. one may also suggest an alternativeAR(1) model to be close to 1. It suggests that an IMA(1.12 with their plots in Figure 6. which is not significant. the yearly Pennsylvania cancer death rate between 1930 and 2000.6 We now examine Series W6.1 1 Sample ACF and sample PACF of Series WS. These transformed data are plotted in Figure 6. FIGURE 6.120 CHAPTER 6 Model Identification TGBLE 6.9 Sample ACF and sample PACF of the Pennsylvania cancer death rate between 1930 and 2000 [Series W5).14 further supports the need for di£€erencing.0186 = .3.2.0147/. we entertain the following IMA(1. The ACF cuts off after lag 1. the sample ACF and PACF for the differenced data.15. the standard deviation of the series is changing in time roughly in proportion to the fevel of the series. we examine the t-ratio. which looks very similar to Figure 3.11 and Figure 6.B ) h Z.10 with 81 > 0.l) model as our tentative model: . EXAMPLE 6. The very slowly decaying ACF as shown in Table 6.7903. 1) is a possible model.= (I .S. Hence. are reported in Table 6.Hence. Hence. t = R/sw = . tobacco production from 1871 to 1984 published in the 1985 Agricultural Statistics by the United States DepaEtment of Revenue and shown in Figure 4. In fact. which is the yearly U. The plot indicates that the series is nonstationary both in the mean and the variance.13 and show an upward trend with a constant variance. from the results of Section 4. To determine whether a deterministic trend term Bo is needed. and the PACF tails off exponentially.2. a logarithmic transformation is suggested. W. 13 indicates that a logarithmic transformation is required..p) = a.10 Sample ACF and sample PACF for the differenced series of the Pennsylvania cancer death rate between 1930 and 2000 [Series W53. Tong (1977). EXAMPLE 6.16(b).2. and Lin (1987).+lB .7 Figure 6. Priestley (I 981. The result of the power transformation in Table 6. The series of lynx pelt sales that we analyze here is much shorter and has received much less attention in the literature. 
the yearly number of lynx pelts sold by the Hudson's Bay Company in Canada between 1857 and 1911 as reported in Andrew and Herzberg (1985). FIGURE 6. We use this series extensively for various illustrations in this book. Thus. References include Campbell and Walker (19771. The three significant +w.2 Empirical Examples 12 1 TABLE 6. our entertained model is (1 . (6.17 show a cle.16(a) shows Series W7. strongly suggest p = 3.9) A related series that was studied by many time series analysts is the number of Canadian lynx trapped for the years from 1821 to 1934. Section 5 .& B ~.6.14 and Figure 6. The sample ACF in Table 6. .12 Sample ACF and sample PACF for the differenced series of the Pennsylvania cancer death rate between 1930 and 2000 (Series WS].ar sine-cosine phenomenon indicating an AR@) model with p r 2.&B3)(lnzt . 3 . The natural logarithm of the series is stationary and is plotted in Figure 6. ..(B) = (1 .S.1 1 Sample ACF and sample PACF for natural logarithms of the U.BIB . . . .B.122 CHAPTER 6 Model Identification 1870 I I I I I I 1890 1910 1930 1950 1970 1990 ) Year FIGURE 6.&BP) is a stationary is an invertible moving average . - #qB - .S. TABLE 6.3 The Inverse Autocorrelation Function (IACF) Let be an -A@. yearly tobacco production (Series W6). 6.Bq . yearly tobacco production in million pounds (Series W6)..13 Natural logarithms of the U. B. q) model where 4 J B ) = ( 1 autoregressive operator. 6.14 Sample ACF and sample PACF for natural logarithms of the U. TABLE 6. operator. is a white noise series with a zero mean and a constant variance ui.6.12 Sample ACF and sample PACF for the differenced series of natural logarithms of the U. the autocovariance generating function of this model is given by .9). We can rewrite Equation (6.S. yearly tobacco production (Series W63. and the a.(B).3 The Inverse Autocorrelation Function (IACF) 123 FIGURE 6.S.3. From (2.1) as the moving average representation where +(B) = Bq(B)/$. yearly tobacco production (Series W6). Year Year FIGURE 6. @) The natural logarithms of yearly numbers of lynx pelts sold in Canada between 1857 and 1911.16 The Canadian lynx pelt sales data (Series W7). [a) The yearly number of lynx pelts sofd in Canada between 1857 and 1911.124 CHAPTER 6 Model Identification FIGURE 6. Assume that y(B) # 0 for a11 1 3 1 = 1 and let .15 Sample ACF and sample PACF for the differenced natural logarithms of the tobacco data (Series W6). With respect to the model given in (6. a function of k. they are proper autocovstriances for a process. Hence.). y(I1(B) is also referred to as the inverse autocovariance generating function of {Z. &om the assumptions on tive definite.14 Sample ACF and sample PACF for natural logarithms of the yearly number of lynx pelts sold (Series W7).610). the coefficients {&)} in (6.4) form a posiClearly. from (2.is given by The kth lag inverse autocorrelation is defined as which.3 The Inverse Autocorrelation Function (IACF) 125 TABLE 6. called the inverse autocorrelation function (IACF).).6. is equal to the coefficient of B~ or B . Natwdly.3.11. of course.~ in p ( o ( ~ )As . pi!is . square surnmable sequence that is symmetric about the origin.3. the process having #')(B) as its autocovariance generating function is referred to as the inverse process. and Bq(B). h Residual mean square error TABLE 6. the inverse autocorrelation generating function of {Z.13 Results of the power transformation on the lynx pelt sales. Hence. 126 C M E R 6 Model Identification FIGURE 6. . Similarly. 
the inverse autocorrelation function of an AR@) process will cut off at lag p. it is easily seen that the inverse autocorrelation function is given by Because one can approximate an ARMA model by an AR(p) model with a suficiently largep.3.) is an ARMA (p.3) and (6. Specifically. then the inverse process will be an MA@) process with its autocorrelations cutting off at lag p. the inverse autocorrelation function of a process exhibits characteristics similar to the partial autocorrelation function and can be used as an identification tool in model identification. I n other words. p) process. i.] is an AR(p) process with autocorrelations tailing off. if (Z. one method to obtain a sample inverse autocorrelation is to replace the AR parameters of the approximate AR model by their estimates in (6. then the inverse process will be an ARMA(q.3.4). it is clear that if (Z.17 Sample ACF and sample PACF for naturaI logarithms of the yearly number of lynx pelts sold (Series W7).e. an MA(q) process with autocorrelations cutting off at lag q will have its inverse autocorrelations tailing off. For an M(p)process.3. From Equations (6. q) process.. Hence.7). 1 in which we entertained an AR(1) model for the series.15 The sample inverse autocorrelation function (SfACFJ. however. because the standard error of the sample IACF is -14. Note that although the inverse autocorrelation function seems to cut off at lag 1.3.3. one can use the limits f2 1 6 to assess the significance of the sample inverse autocorrelations. however.15(a) is the sample inverse autocoxrelation for the daily average series of truck manufacturing defects discussed in Example 6. the maximum AR order will be 2. which is clearly not appropriate from the sample ACF in Table 6. Table 6.6. examples. Table 6. the standard error of $) is given by Thus. EXAMPLE 6. It is not unusual that the values of the IACF are smaller than those of PACF in the above = p f ) = 0 for k > p.8 As illustrations.14(a).15 shows sample inverse autocorrelations for two time series examined in Section 6.3 The Inverse Autocorrelation Function (IACF) 127 The parameter estimation is discussed in Chapter 7.15(b) is the sample inverse autocorrelation for the natural logarithms of the Canadian lynx pelt sales that was identified as an AR(3) model based on the three significant PACF examined in Example 6. If the underlying modd is an A&!@). it is not statistically significant at a = -05. In terms of the LACF. In fact. implies that and from the discussion in the closing paragraph of Section 2.7. we have TGBLE 6. k (a) The daily average number of truck manufacturing defects (Series W1) 1 2 3 4 5 6 7 8 9 - k - 10 - (b] The natural logarithms of Canadian lynx pelt sales (Series W7) 1 2 3 4 5 6 7 8 9 10 .7). the model implied by the IACF may be an AR(l). Under the no11 hypothesis of a white noise process. then we know that Equation (6.2.Table 6. 6. equivalently. and WCF are relatively simple.4. In one study.e. due to the cutting off property of the PACF and MCF for AR models and the same cutting off property of the ACF for MA models. One commonly used method considers that if Z. Abraham and Ledolter (1984) conclude that. and IACF all exhibit tapering off behavior. as an identification aid. We return to this point in Chapter 12.-. the ACE PACF.128 CHAPTER 6 Model Identification Hence. For a mixed ARMA process..9. which makes the identification of the orders p and q much more difficult. the partial autocorrelation function. 
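The approach just described, approximating the series by a long AR(p) and substituting the estimated coefficients into the AR(p) inverse autocorrelation formula given above, can be sketched in a few lines. The function below is my own illustration; ordinary least squares is used for the AR fit, but other estimators work equally well.

```python
import numpy as np

def sample_iacf(z, p, nlags):
    """Sample inverse autocorrelation function from a long AR(p) approximation:
    rho_k^(I) = (-phi_k + sum_{j=1}^{p-k} phi_j phi_{j+k}) / (1 + sum_j phi_j^2)
    for k <= p, and 0 for k > p."""
    z = np.asarray(z, float) - np.mean(z)
    y = z[p:]
    X = np.column_stack([z[p - j: len(z) - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)       # OLS AR(p) coefficients
    denom = 1.0 + np.sum(phi**2)
    iacf = []
    for k in range(1, nlags + 1):
        if k > p:
            iacf.append(0.0)
            continue
        num = -phi[k - 1] + np.sum(phi[: p - k] * phi[k:])
        iacf.append(num / denom)
    return np.array(iacf)
```

The resulting values can be compared against roughly ±2/√n, as in the assessment of the sample IACF discussed above.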
the identitication of the order p of an AR model and the order q of an MA mode1 through the sample ACF. Bq)a. q) model or.l > Ipf)l. we introduce the extended sample autocodation function and orher procedures that wiI1 help us identify the orders of mixed ARMA models. follows an ARMA(p. leading to another method to obtain a sample inverse autocorrelation function. and the inverse autoconelation function are useful in identifying the orders of AR and MA models. PACF. I#. The inverse autacorrelation function was &st introduced by Cleveland (1972) through the inverse of a spectrum and the relationship between the spectrum and the autocorrelation function..4 Extended Sample Autocorrelation Function and Other Identification Procedures The autocorrelatian function. it seems clear that. In this section. then follows an MA(^) model = ( 1 . .1 The Extended Sample Autocorrelation Function (ESACF) From the previous empirical examples. and values of the sample IACF in general are smaller than values of the sample PACF. 6..B . however. the PACF generally outperforms the IACE Some computer programs such as SAS (1999) and SCA f 1992) provide both PACF and IACF options for analysis. particularly at lower lags. q) with q > 0. . we consider an AR(p) regression plus an added term i?:? i.') represents the correspondingarror term. if q 2 1.If an AR(p) model is fitted to the data. then the lagged values i = I.6.e. q) model. First..4. which leads to the following iterated regressions. The OLS estimates will be consistent if q = 1. Consistent est@ates can be . .4 Extended Sample Autocorreiation Function and Other Identification Procedures 129 where without loss of generality we assume Bo = 0. . the OLS estimates #!q) obtained from the following qth iterated AR(p) regression will be consistent: . then the #(" will again be inconsistent..e. For q > 2. In fact. . where e. . an MA(2) residual process from an AR(1) fitting implies a mixed ARMA(l. For example.Ia). some authors such as Tiao and Box (1981) suggest using the sample ACF of the estimated residuals from an ordinary least squares (OLS) AR fitting to identify q and hence obtaining the orders p and q for the ARMA(p. the estimated residuals Zjf) are not white noise. however. i = 1.. however. We thus consider the second iterated AR(p) regression +I1' $y) The OLS estimates will be consistent if q = 2. .4) are not consistent when the underlying model is a mixed ARMA(p. then the OLS estimates inconsistent and the estimated residuals of +i. will contain some information about the process Z. i. . .p. Thus. . will not be white noise. the will again be inconsistent.. But as we show in Section 7. q) process in (6. however. because the OLS estimates of the AR parameter #is in (6.4. if #J~.. I f q > 1.4. and the lagged values 2!?i will contain some information about Z.obtained by repeating the above iteration. will be zFi.q. represents the error term. To derive consistent estimates of suppose that n observations adjusted for mean are available from the ARMA(p.2) model. the procedure may lead to incorrect identification. That is. where the superscript (1) refers to the fi5t iterated regression and e. we have . . TABLE 6. Of course. by (6. .4.j)-I on the hypothesis that the transformed pfm). the true order p and q of the ARMAO-.2. the second row gives the first ESACF . be the OLS estimates from the jth iterated AR(m) regression of the ARMA process Z. the vertex of the zero triangle in the asymptotic ESACF will be at the (p. .X C ~ $ ~ ~ ) Z-. form = 1. . 
The zeros can be seen to form a triangle with the vertex at the (1. .m . 1) position.Ois the estimated residual of the fi iterated AR(p) regression and the Jim and are the corresponding least squares estimates. as the sample autocorrelation function for the transformed series PY by). i. the asymptotic variance of Bartlett's formula or more crudely by (ti .q may X can be approximated using not be exactly zero. . OIm-p€j-q. of Z. and the columns are numbered in a similar way for the MA order. However. 1. . q ) process. q) position. . .10). for an AIZMA@. for m = 0. 2. . . nt. . In fact. . .. let #.. q) process. we have the following convergence in probability. Specifically. Based on the preceding consideration.. to specify the AR order. { + 0.and j = 1. .2. we have finite samples. In practice. . Thus. They define the mth ESACF at lag j.jl). The rows are numbered 0. More generally. 861[{" -~ ?'i-. Tsay and Tiao (1984) suggest a general set of iterated regressions and introduce the concept of the extended sample autocorrelaJion function (ESACF) to estimate the orders p and q. in practice. the number of observations. otherwise.p < j . however. the ESACF can be a useful tool in model identification.16 where the first row It is useful to mange corresponding to $) gives the standard sample ACP of Z. and so on.e. particularly for a mixed ARMA model. it can be shown (see Tsay and Tiao [1984]) that for an ARMA(p.130 CHAPTER 6 Model Identification where Lii) = 2.. . .16 The ESACF table.17. i = 1. . the asymptotic ESACF table for an ARMA(1. in a two-way table as shown in Table 6. and the . even though Note that the ESACF it is not explicitly shown.@) for 0 5 m . . is a function of r l . q) model are usually unknown and have to be estimated. 1) model becomes the one shown in Table 6. 1. Hence. 18(b) the corresponding indicator symbols for the ESACF of the series. 1) model. TABLE 6.18 (a) The ESACF for natural logarithms of Canadian lynx pelt sales. Table 16.18(a) shows the ESACF and Table 6. The vertex of the triangle suggests a mixed ARMA(2.4. This model is different from an AR(3) model that we entertained earlier based on the characteristic of the PACE We further examine the series in Chapter 7.9) is white noise. (a) The ESACF (b] The ESACF with indicator symbols . we use SCA to compute the ESACF for the natural logarithms of Canadian lynx pelt sales discussed in Example 6..4 Extended Sample Autocorrelation Function and Other Identification Procedures 131 TMLE 6. EXAMPLE 6. The ESACF table can then be constructed using indicator symbols with X referring to values greater than or less than 33 standard deviations and 0 for values within f2standard deviations. 2 3 4 .17 The asymptotic ESACF for an ARMA(1. series ~ l j of) (6. MA AR 0 1 . .6.9 To illustrate the method. 1) model.7. the ESACF approach suggests an AR(2) model for this series. invertible or noninvertible. the task of identifying nonstationarity through this approach is generally difficult. and Reinsel (1994) suggested an ARIMA(1.2. the ARIMA(p d. i? the previous ~e'i. OLS estimation is used in the iterated regression. the iterated AR estimates can be examined to see whether the AR polynomial contains a nonstationary factor with roots on the unit circle. it indicates that the AR polynomial contains the factor (1 B). they suggest that for given specified values of p and q. For example. and Reinsel (1994). the pattern in the ESACF table from most time series may not be as clear-cut as those shown in the above examples. Jenkins. Jenkins. 2. 
q) process is identified as the ARMA(e q) process with P = p d. 0) and ARIMA(0. Because the vertex of the zero triangle occurs at the (2. Of model.1 and 410) = -32. and Reinsel. 1. Because the OLS regression estimation can always be applied to a regression model regardless of whether a series is stationary or nonstationary. Table 6..3)(1 . For example.8B).19 The ESACF of Series C in Box. which generally require stationarity.O)position.10 As another example. are identified as an ARMA(2. Because the ESACF proposed by Tsay and Tiao (1984) is defined on the original series. From the author's experience. Because +to)(3) (1 . Both these ARIMA(1. Box. 0 1 2 3 4 5 X X 0 X X x X X 0 X X 0 X x 0 X X 0 X X 0 0 0 0 0 0 0 0 x o X X 0 0 0 0 X x 0 X X 0 0 0 0 0 0 0 0 0 0 X X 0 EXAMPLE 6. however. however. particularly because a tentatively identified model will be subjected to more efficient estimation procedures (such as the maximum likelihood estimation). + - . Jenkins. and ESACF.132 CHAPTER 6 Model Identification TABLE 6. 0) rnodefs. Other than a few nice exceptions. they argue that the use of ESACF eliminates the need for differencing and provides a unified identification procedure to both stationary and nonstationary processes. Jenkins. the iterated AR estimates are 41') = 11.0) model or an ARIMA(O. PACF. Due to sampling variations and correlations among sample ACF. the ESACF is computed fiom the original nondifferenced data. though. As a result. models can usually be identified without much difficulty through a joint study of ACF.19 shows the ESACF for Series C of 226 temperature readings in Box. Tsay and Tiao (1984) allow the roots of AR and MA polynomials to be on or outside the unit circle in their definition of the ESACE Hence. This advantage can be used much better if the ESACF is used for a properly transformed stationary series. based on the sample ACF and PACF of Series C as given in Table 6. To see whether a series is nonstationary. y d Reinsel (1994). The real advantage of the ESACF is for the identification of p and q for the mixed ARMA models.20.& C in Box. I. 0)model in term of the ESACF. and Reinsel (1994).6. Gourieroux. model identification truly becomes the most interesting aspect of time series analysis. . ESACF.is discussed in Chapter 7.4. Through careful examination of the ACF. and Monfort (1980). and other properties of time series. At this point. Interested readers are referred to their original research papers listed in the reference section of this book. the R-and-S-array introduced by Gray. and Mchtire (1978). PACF. and the comer method suggested by Beguin. Kelley. The statistical properties of the statistics used in the R-and-S-array approach and the corner method are still largely unknown. it is appropriate to say that model identification is both a science and an art. One should not use one method to the exclusion of others. 6.2 Other Identification Procedures Other model identification procedures include the information criterion (AIC) proposed by Akaike (1974b).4 Extended Sample Autocorrelation Function and Other Identification Procedures 133 TABLE 6. IACF. Jenkins.20 Sample ACF and PACF of Series C in Box. The information criterion. and the software needed for these methods is not easily avaitable. Some computer programs such as AUTOBOX (1987) and SCA (1992) provide the option to compute the ESACF in the model identification phase. 
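Constructing the indicator-symbol display from a table of extended sample autocorrelations is mechanical and can be sketched as follows. The routine assumes the ESACF values themselves have already been computed (for example, by a package such as SCA or SAS), and it uses the crude variance 1/(n − m − j) mentioned earlier; the numerical entries in the example array are hypothetical values chosen only to reproduce the triangle pattern of an ARMA(1, 1) process.

```python
import numpy as np

def esacf_indicator_table(esacf, n):
    """Turn a (max_p+1) x (max_q+1) array of extended sample autocorrelations
    into the X/O display, using the crude standard error (n - m - j)^(-1/2)
    for the entry in row m, column j."""
    rows, cols = esacf.shape
    table = []
    for m in range(rows):
        row = []
        for j in range(cols):
            se = 1.0 / np.sqrt(max(n - m - j, 1))
            row.append("O" if abs(esacf[m, j]) <= 2.0 * se else "X")
        table.append(row)
    return table

# Hypothetical ESACF values for a series of length n = 100.
example = np.array([
    [0.62, 0.48, 0.37, 0.29],
    [0.35, 0.05, 0.03, -0.02],
    [0.33, 0.25, 0.04, 0.01],
    [0.30, 0.22, 0.24, 0.02],
])
for row in esacf_indicator_table(example, n=100):
    print(" ".join(row))
```

The suggested orders (p, q) are read off as the vertex of the triangle of O symbols, which for this hypothetical table is row 1 and column 1, i.e., an ARMA(1, 1) model.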
227 1.622 .082 1.282 -1.960 .3.566 .535 -1.478 -.15 -.581 -1.I40 -.463 2.317 -.865 .186 3.750 4.557 2.453 -.918 1.719 -1.588 2.k -.701 .237 -.05 .084 1.907 -2.80 bw(k) .I7 4 5 .727 -.349 -1. data = 2.06 (d) 6 7 -06 -.334 1.002 -.89 -86 3 5 .12 .93 .09 -.317 2.356 3.10 .605 -2.04 .368 .3 3 .340 1.I22 -1.325 1.08 .491 -4.36 -.532 -2.912 .454 1.595 .09 W = 2.367 .08 5 6 .270 -1. Wt = (1 . k 1 2 3 4 5 6 7 8 9 1 0 ?k .I95 . k 1 2 3 .676 3.082 .126 -.790 -1.153 1.239 -1.SO5 .279 3.463 .2 Identify proper ARlMA models for the following data sets (read across): (a) -2.z(k) = 1500 (e) n = 100.07 (c) n = 250.604 2.970 2.809 -. data = Z.07 .I1 (b) n = 250.048 -.542 -1.81 .610 2.63 .060 .03 .572 -..I74 .154 -1.867 -. W.740 .09 d. (a) n = 121.531 -1.082 1.167 -A55 2.569 (b) -1.I79 -.769 .645 .984 1.259 1.5.I8 . data =: 2.296 .540 3.83 C W ( ~ ) .08 -08 -03 .421 .35 -.469 .119 -3.738 -1.B)Z.954 2.714 2.476 -2.060 -.765 .06 -.O1 -01 .01 7 -.289 -.I40 -.33 .= (1 .00 6. 7 8 9 1 0 6 3 4 5 1 2 -99 -98 .04 .45 -.551 2.413 -. S$ = 20.19 .50 .1 Identify appropriate AMMA models from the sample ACF below.236 -.02 .493 -2.968 1.089 1.98 .118 .896 1.84 .80 . data = Z.322 2.10 -.024 -.960 -1..I7 .01 -. k 1 2 3 4 6 k -.343 -.193 1.134 CHAPTER 6 Model Identification 6.69 .041 1.97 .986 -.961 -.192 2.033 1.589 1.639 -.077 .353 -.90 -89 -87 $6 .891 .200 -2.370 ..858 .07 10 .479 -2. Justify your choice using the knowledge of the theoretical ACF for ARIMA models.91 .382 -.697 1. data = 2.295 .823 -.07 -.746 2.16 -. k .939 .610 370 1..292 3.203 1.693 .B)Z.875 2. 5 6 7 8 9 1 0 1 2 3 4 k k(k) -94 .370 1.263 3.472 . = 35.09 12 = 100.06 8 9 .328 1.600 1.08 .10 .574 .04 8 9 .401 -.94 .04 -.I1 10 .I17 -.05 . 419 1.315 ..Exercises (c) (d) 1 35 3.769 .054 .637 9.777 6.1.613 3.999 .452 4.822 .204 6.012 8.935 4.014 .1.749 -2.358 7.1.404 4.485 4.718 .424 .1.059 4.3 Identify models for Series W1 through W7 using ESACF and compare them with the models suggested in the book using ACF and PACE .219 7.598 3.677 .778 4.561 6.047 -.653 -2.501 .311 6. . dq). and a: = ~ ( a : in ) the model where Z. With full generality. q) model. & ) I . 8 = (dl. the next step is to estimate the parameters in the model.) is estimated by 2. . . .d. sample variance To. That is. N(0. . Very often several models can adequately represent a given series. TO estimate +. and Model Selection After identifying a tentative model. 7. after introducing diagnostic checking. . = Z. .ifor their theoretical counterparts and solving the resultant equations to obtain estimates of unknown parameters.i.Parameter Estimation. For example. (t = 1. Thus.1 The Method of Moments The method of moments consists of substituting sample moments such as the sample mean. . + $pk-p for k 2 1to obtain the following system of YuleWalker equations: - . Once parameters have been estimated. . . Z. we discuss the estimation of parameters 4 = &.p. 02. Z. . Several widely used estimation procedures are discussed. p = E(Z. .+~ . and sample ACF j. in the AR(p) process the mean p = E(z. Diagnostic Checking.) white noise. we also present some criteria that are commonly used for model selection in time series model building. we h t use that pk = 41~k-l + & P ~ . we check on the adequacy of the model for the series. . .). r+. n) are 11 observed stationary or properly transformed stationary time series and { a t ) are i. we consider the general ARMA(p. 2. where To is the sample variance of the Z.7. That is.1 For the AR(1) model the Yule-Walker estimator for from (7.1 The Method of Moments ck. . 
4pby solving the above Iinear system of equations. Next. .3). series.49. h A A 137 Then. Iet us consider a simple MA(1) model . .1. we use the result and obtain the moment estimator for a : as EXAMPLE 7. is The moment estimators for p and o-2 are given by $=Z and respectively. .. . Having obtained 41. replacing pk by we obtain the moment estimators 41.. These estimators are u_suailycallec Yule-Walker estimators. .#*. #p. as discussed in Section 3. Diagnostic Checking.5. and Model Selection Again.2. 7. They are usually used to provide initial estimates needed for a more efficient nonlinear estimation discussed later in this chapter.2. there exist two distinct real-valued solutions and we always choose the one that satisfies the invedbility condition. regardless of AR. we discuss several alternative maximum likelihood estimates that are used in time series analysis. then we have a unique solution = kl.l. the moment estimates are very sensitive to rounding errors. This example of the MA(1) model shows that the moment estimators for MA and mixed ARMA models are complicated.138 CHAPTER 7 Parameter Estimation.2 Maximum Likelihood Method Because of its many nice properties. particularly for the MA and mixed ARMA models. If Ifill > . q) model . or ARMA models.1 Conditional Maximum Likelihood Estimation For the general stationary ARMA(p. pected. because a real-valued MA(1) model always has For < . After having obtained we calculate the moment estimator for w: as I€ Ipl l sl.which gives a noninvertible estimated model. This result is ex< . 7.5. The moment estimators are not recommended for final estimation results and should not be used if the process is close to being nonstationary or noninvertible. we use that and solve the above quadratic equation for BI after replacing pl by The result leads to 4 = i . In this section. the maximum likelihood method has been widely used in estimation. 5 .5.For GI. p is estimated by 2. MA. then the real-valued moment estimator does not exist. More generally. Lf (7. Q)' are known.1) as we can write down the likelihood function of the parameters ( 4 . these estimators are the same as the conditional least squares estimators obtained from minimizing the conditional sum of squares function S+(+. N(0. Because lnL* (4.. of a: is calcu- whe_retheAnumberof degrees of freedom d. . .i. p . we may also assume ap = ap-1 = = ~ l p + l -= ~ 0 and calculate a.) is a series of i. . az. where h A is the conditional sum of squares function. For the model in (7. .2. and a* = (al-.p . by the sample mean and the unknown a.7. and 0. There are a few alternatives for specifying the initial conditions Z* and at.2. Based on the assumptions that {Z.} is stationslry and (a. N(0.} are i. by its expected value of 0.i. . Let Z = (Zl.2.4). m i ) .3). are called the conditional maximum likelihood estimators.6) is used to calcufate the . the joint probability density of a = (al.p and {ci. fi.d.2.a.adl. . which. we note.Zn)' and assume the initial conditions ZI = (ZI-pJ . and lated from --- a. The quantities of 4. After obtaining the parameter estimates #. does not contain the parameter cr:. the estimate 3. L1.d.)' is given by . .5) thus becomes + which is aIso the form used by most computer pzograms. fort 2 @ 1)using (7. The conditional sum of squares function In (7. 0). equals the number of terms used in the sum of S*(& b.random variables.2.2. uz). . .2 Maximum Likelihood Method 139 where 2.. 0.f. . &. . . G:) white noise. we can replace the unknoxm 2.) involves the data only through S*(+. g.1). 
8 ) minus the number of parameters estimated. Rewriting (7. p . p. = Z. which maximize Function (7. . O). b. 8. The conditionaI log-likelihood function is a' . In practice. p. .2..9) should have exactly the same autocovariance structure..2.2. &. 4. 8. Thus. one asks whether we can back-forecast or backcast the unknown values Zr = (Zllp. q)' needed in the computation of the sum of squares and likelihood functions. and Z.f.1 1) is approximated by a finite fonn 1 where M is a sufficiently large integer such that the backcast increment JE(z. 8.l .f. .8) to forecast the unknown future values for j > 0 based on the data (Zl. that is possible because aay ARMA model can be written in either the forward form . where S(4.these unconditional maximum likelihood estimators are equivalent to the unconditiond least squares estimators obtained by minimizing S{+.2 Unconditional Maximum Likelihood Estimation and Backcasting Method As seen from Chapter 5. a-1.Z) E(Z. Therefore.8) and (7. (7. p. Z. one of the most important functions of a time series model is to forecast the unknown future values. and 8 that maximize Function (7. p . 8. we can also use the backward form (7. given 4. p. + . p. Indeed. p . 0) is the unconditional sum of squares function given by I and E{a. = 2.( p d.(2p + q + 1).2. and Reinsel (1994) suggest the follo~vingunconditional log-likelihood function: . Z .10) are called unconditional maximum likelihood estimators.2. Because of the stationarity..2.2. . Box. z)]is less than any arbitrary predetermined small E value for t 5 . p. . p . . Diagnostic Checking. which implies that (e. in the same way as we use the forward form (7. . for a further improvement in estimation.Z1). Naturally.. . Jenkins. . For other models. 6.-l. Z. 6.} is also a white noise series with mean zero and variance cr.uz) involves the data only through S(4. 4 . + q + 1) = n . . and Model Selection sum of squares. 8). the summation in (7. d.-I 4. Z) is the conditional expectation of a.140 CHAPTER 7 Parameter Estimation.9) to backcast the unknown past values 3 and hence compute aj for j 5 0 based on the data {Z.(M + 1). &)' and a* = (al-. The quantities 4. Some of these terms have to be calcylated using backcasts illustrated in Example 7.). %).p) . . .. should be adjusted accordingly. or the backward form where FjZ. $.2. I . Again because lnL(#.. = (n . the 7. p. 8 . the use of backcasts for parameter estimation is important for seasonal models (discussed in Chapter 81. using the backcasting . . consider the AR(1) model that can be written in the forward form or. Z ) is negligible for t 5 .) = 0.1 . in the backward form where.(M -t. = a. 2)C= p.2 Maximum Likelihood Method 141 I This expression implies that E[Zt 1 4 .ZIO)from the process. EXAMPLE 7. which are listed in Table 7. &. Z = (ZI. . without loss of generality. for models that are close to being nonstationary.+BIZ. and especiaIly for series that are relatively short. p. TABLE 7. . the estimate 6. . After obtaining the parameter estimates 4.2 To illustrate the backcasting method. . and 0. Most computer programs have implemented this option.1 CaIcufation of St4 = . Consider a very simple example with ten observations.b. we assume that E(Z.3) for (1 method.7. of uz can then be calculated as A A For efficiency. 0. E(at 4 .1). hence. equivalently. 142 CHAPTER 7 Parameter Estimation.2. e.2. . Now. Z) .E(ZIdl 4 = . . and Model Selection I under the column E(Z..2. the predetermined e value. 10.IS). we can return to the forward form in (7.. i. 
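Before turning to the backcasting calculation, the conditional sum-of-squares computation of Section 7.2.1 may be sketched in code. For an ARMA(1, 1) model the recursion is a_t = Z_t − φZ_{t−1} + θa_{t−1}, started at t = 2 with a_1 set to its expected value of zero. The grid search below merely illustrates that the conditional least squares estimates minimize S*(φ, θ); the simulated series and the grid are illustrative, and practical programs use the nonlinear least squares methods of Section 7.3 instead.

```python
import numpy as np

def conditional_sum_of_squares(z, phi, theta, mu=0.0):
    """Conditional S*(phi, theta) for an ARMA(1, 1) model: start the
    recursion at t = 2 with a_1 set to its expected value of zero."""
    z = np.asarray(z, dtype=float) - mu
    a = 0.0
    s = 0.0
    for t in range(1, len(z)):
        a = z[t] - phi * z[t - 1] + theta * a   # a_t for the ARMA(1, 1) model
        s += a * a
    return s

# Grid evaluation: the conditional least squares estimates are the
# (phi, theta) pair minimizing S*; illustrative simulated data only.
rng = np.random.default_rng(1)
n, phi0, theta0 = 200, 0.7, 0.4
a = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi0 * z[t - 1] + a[t] - theta0 * a[t - 1]

grid = np.linspace(-0.9, 0.9, 37)
best = min((conditional_sum_of_squares(z, p, q), p, q)
           for p in grid for q in grid)
print("conditional LS estimates:", round(best[1], 2), round(best[2], 2))
```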
shocks with respect to the observationsZ.E(Z-:! z)]= .~ Z) ._l.00378 < . . Z. fort I 0.3.3. we choose M = 2. for cfJ = . and Zl. we note that in terns of the backward form. Hence.18) I I Because ] E ( z .3 and we want to calculate the unconditional sum of squares IE I 1 + where M is chosen so that (2. 4 = . 2)for f = 1.2. (7. 4 . 5 0. Z) as E(ar Z) and E(ZI f q5 = . for t 5 0.. . for t 5 O are unknown future random .e. z)[< . with these backcasted values 2. Z) as E(Zt Z)..2.3 ftom t = -2 to t = 10 as follows: 1 .005. involves the unknown Z.3. E(et 1 2) = 0. To do so.3. Suppose that 4 = . First. To simplify the notations for this example we write E (a. .2. To obtain E(aI Z) we use (7. we have from (7.17) to compute E(at Z) for 4 = . Z) for f 5 1. values for t which need to be backcasted. however. we use the backward form in (7.005 for t 5 -(M 1). Diagnostic Checking.3.19) Therefore.&. .14) and compute 1 1 I I I This computation of E(a.= . d. .2. p.2. 233). For more detailed examples. = 4zt-1 + a. but the procedure is the same. Rewriting the process in the moving average representation. For the AR(1) model we do not need the value E(e. we can obtain S(4)for other values of 4 and hence find its minimum.3 Exact Likelihood Functions Both the conditional and unconditional likelihood functions (7.u:). we have . Jenkins. and Reinsel (1994. where 2.7. To illustrate the derivation of the exact likelihood function for a time series model.4) and (7.2 Maximum Likelihood Method 143 All the above computations can be carried out systematically as shown in Table 7.i.10) are approximations. = (2. see Box. are i.p).1 and we obtain Similarly. consider the AR(1) process z. Z) for t 2 1. ]I$] < 1 and the a. For other models. they may be required. 1 7.2. N(0. &. the joint probability density of (el. Now consider the following transformation: z. . .. at. . The z.$2)). . Diagnostic Checking. and Model Selection Clearly. The Jacobian for the transformation. rr:/(1 . are highly conelated. a*. .2. Hence. is . ) (&. however. .$2)).). follows the normal distribution N ( 0 ..) is .144 CHAPTER 7 Parameter Estimation. w.. -t- a.in) and hence the likelihood function for the parameters. To derive the joint probability density function P ( z ~ i2.= #Zn-.. we consider .221. the 2.. and they are all independent of one another.: / ( 1.. for 2 5 t 5 n. a. .~ ~ 5 of . Note that el follows the normal dis&ibution~ ( 0~. will be distributed as N(O. from (7. is nonlinear in parameters. 7. Interested readers are advised also to see Ali (1977). and Nicholls and Hall (1979). . .z. consider a simple ARMA(1. Newbold (1974) derived it for a general W ( p . . p . . 9 ) or the unconditional sum of squares S(#. To see that. . a*.) = P(el. . For a model containing an MA factor... &. the a.7. .. a. and at is clearly linear in parametefs. llao and Ali (1971) derived it for an W ( 1 . q) model. . p. Hence. . For an AR(p) process.3 Nonlinear Estimation 145 It follows that P(z. The exact closed form of the likelihood function of a general ARMA model is complicated. . 1)model . 1) model. . which are the sums of squares of the error terms at's.3 Nonlinear Estimation It is clear that the maximum likelihood estimation and the least squares estimation involve minimizing either the conditional sum of squares St(+. e). Ansley (1979). for a given series (zl. however. among others for additional references. i. Hillmer and T5ao (1979).) we have the following exact log-likelihood function: where is the sum of squares term that is a function of only 4 and p. 
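The backcasting scheme of Example 7.2 is easy to program for the AR(1) case: backcast Z_t for t ≤ 0 from the backward form Z_t = φZ_{t+1} + e_t (with E(e_t | Z) = 0 for t ≤ 0), and then accumulate the squared values of E(a_t | Z) forward. A minimal sketch follows; the ten data values and the choice M = 20 are illustrative only and are not the values of Table 7.1.

```python
import numpy as np

def unconditional_ss_ar1(z, phi, M=20):
    """S(phi) for a zero-mean AR(1) model using backcasts: backcast
    E(Z_t | Z) for t = -M, ..., 0, then sum [E(a_t | Z)]^2 forward."""
    z = np.asarray(z, dtype=float)
    back = np.zeros(M + 1)                 # E(Z_t | Z) for t = -M, ..., 0
    back[-1] = phi * z[0]                  # E(Z_0 | Z) = phi * Z_1
    for i in range(M - 1, -1, -1):
        back[i] = phi * back[i + 1]        # E(Z_{t-1} | Z) = phi * E(Z_t | Z)
    full = np.concatenate([back, z])       # E(Z_t | Z) for t = -M, ..., n
    a = full[1:] - phi * full[:-1]         # E(a_t | Z) for t = -M+1, ..., n
    return np.sum(a ** 2)

# Compare a few phi values on a short illustrative series.
z = np.array([1.2, 0.8, -0.3, 0.5, 1.1, 0.9, -0.2, -0.7, 0.4, 0.6])
for phi in (0.1, 0.3, 0.5):
    print(phi, round(unconditional_ss_ar1(z, phi), 3))
```

Evaluating S(φ) over a grid of φ values and choosing the minimizer in this way gives the unconditional least squares estimate.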
Ljung and Box (1979). N ( 0 . . u:) independent of all the Xti.- . which follows a multivariate normal distribution MN(a. 2. for a general ARMA model..E2. we know that the Ieast squares estimators are given by for t = 1. .7) as . Y2. . .3. a2. Yn)'. From results in linear regression analysis. . Diagnostic Checking. V(&)) with The minimum residual (error) sum of squares is The least squares estimates in (7. . ap)' be an initial guess value of a = (al. .3. Let Z = (G. . and Model Selection To calculate a . . . we note that This equation is clearly nonlinear in the parameters. .5) can also be obtained by the following two-step procedure discussed in Miller and Wichern (1977). 11. ap)'and be the corresponding matrix for the independent variables Xti's. Let Y = (Yl. a2. . We can rewrite the minimum residual sum of squares in (7.. (~p)'. .i.d.a = (a1. Because a linear model is a special case of the nonlinear model. Hence. . . The nonlinear least squares procedure involves an iterative search technique. where the et's are i. we can illustrate the main ideas of the nonlinear least squares using the following linear regression model: . . a nonlinear least squares estimation procedure must be used to obtain estimates.146 CHAPTER 7 Parameter Estimation. a2.5) and (7. compute the residual Z = (Y .p and t = 1. .. Now. Y. . Once the values S = (81. . - Step 1. . . Given any vector of initial guess values Z. where represents a guess of the regression equation obtained by using the original model and the given initial values Ziys. . . .) are calculated.)'. Yz. . . That is. where = squares estimates are given by Note that the residual Zt is calculated as I. (X.3. is a white noise series having zero mean and constant variance (T: independent of X. .3.. it is clear from Equation (7.Y) and the residual sum of squares where ? = f(Z} is a vector of values obtained by replacing the unknown parameters by the initial guess values. f(X2.g. e. S(S) in Equation (7. .147 7. . 2.. Now..n.Moreover. .4) that .9) and S ( & ) in (7. S2.7) are in the same form. . 8. a). the matrix used in the equation of the least squares estimates in (7.3.3 Nonlinear Estimation where the Z.)' and f(a)= XI. . . .3. ..+)' is a vector of parameters. . Let Y = (Yl.Z). a). the least squares estimators (linear or nonlinear) can always be calculated iteratively as follows. Xtp) is a set of independent variables corresponding to the observations. the least squares value of 8 is (zl.4. a)with the frrst-order Taylor series expansion about the initial value i'. . the least N . Hence. . .3. for i = 1. a = (a1.From the previous discussion.'s are estimated residuals based on the initial given values G and S = (6 . and e.10) is actually the matrix of the partial derivatives of the regression function with respect to each of the parameters. .2. Approximate the model f(Xt.. ..f(X. - . . a)]'. consider the following model (linear or nonlinear): where X.Xt2. =. Hence. The estimated variance-covariance matrix V ( 6 ) of ii is where $2 is estimated as in (7. One of the algorithms commonly used is due to Marquardt (1963). step 2 only leads to new initial values for further iterations. We note that Si in S represents the difference or change in the parameter values. That is. For a Enear model.V ( & ) ) .148 CHAPTER 7 Parameter Estimation. and Model Selection z) where S = (a and gz = [Xlj] is the n X p matrix of the partial derivatives at ii. p. many search algorithms are developed. The iterations continue until some specified convergence criteria are reached. 
Step 2. Obtain the updated least squares estimates â = ā + δ̂ and the corresponding residual sum of squares S(â).

Note that for a linear model the matrix X̃ of partial derivatives is fixed and equal to X; hence, step 2 gives the final least squares estimates. For a nonlinear model, however, X̃ changes from iteration to iteration, so step 2 only leads to new initial values for further iterations.

In summary, for a given general ARMA(p, q) model we can use the nonlinear least squares procedure to find the least squares estimates that minimize the error sum of squares S*(φ, μ, θ) or S(φ, μ, θ). The nonlinear least squares routine starts with initial guess values of the parameters; it monitors these values in the direction of the smaller sum of squares and updates the initial guess values. The iterations continue until some specified convergence criteria are reached. Some convergence criteria that have been used are the relative reduction in the sum of squares, the maximum change in the parameter values being less than a specified level, or the number of iterations exceeding a certain number. To achieve proper and faster convergence, many search algorithms have been developed. One of the algorithms commonly used is due to Marquardt (1963); it is a compromise between the Gauss-Newton method and the method of steepest descent. For more discussion on nonlinear estimation, see Draper and Smith (1981), among others.

Properties of the Parameter Estimates

Let α = (φ, μ, θ)', let α̂ be the estimate of α, and let X̃ be the matrix of partial derivatives in the final iteration of the nonlinear least squares procedure. We know that asymptotically α̂ is distributed as a multivariate normal, MN(α, V(α̂)), with estimated variance-covariance matrix V(α̂) = σ̂²_a (X̃'X̃)⁻¹, where σ̂²_a is obtained from the minimum residual sum of squares. We can test the hypothesis H₀: αᵢ = αᵢ₀ using the t statistic

t = (α̂ᵢ − αᵢ₀) / sqrt(V̂(α̂ᵢ)),

which follows a t distribution whose degrees of freedom equal the sample size used in estimation minus the number of parameters estimated in the model, i.e., n − (p + q + 1) for the general ARMA model. The estimated correlation matrix of these estimates is obtained by standardizing V(α̂). A high correlation among estimates indicates overparameterization, which should be avoided because it often causes difficulties in the convergence of the nonlinear least squares. The results presented in this section are perfectly general; they hold for both linear and nonlinear models. We illustrate their uses in the following simple example.

EXAMPLE 7.3 Consider the AR(1) model

Z_t = φZ_{t−1} + a_t,  t = 2, . . . , n.

To put the estimation of φ in the context of this section, write Y_t = Z_t, f = φZ_{t−1}, and e_t = a_t. Applying the nonlinear estimation procedure and choosing, with no loss of generality, the initial guess value of φ to be 0, the column of partial derivatives is simply the set of lagged values Z_{t−1}, and we obtain

φ̂ = Σ_{t=2}ⁿ Z_{t−1}Z_t / Σ_{t=2}ⁿ Z²_{t−1},

with estimated variance V̂(φ̂) = σ̂²_a / Σ_{t=2}ⁿ Z²_{t−1}. For the asymptotic distribution of φ̂, note that when |φ| < 1, φ̂ is asymptotically distributed as N(φ, (1 − φ²)/n) or, equivalently, √n(φ̂ − φ) converges in distribution to N(0, 1 − φ²).
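A one-parameter version of the iterative procedure just summarized can be written out for the MA(1) model, using a numerical difference for the derivative column X̃. The sketch below is illustrative only: it uses simulated data, conditional residuals with a_0 = 0, and no Marquardt-type safeguard on the step size.

```python
import numpy as np

def residuals_ma1(z, theta):
    # a_t = Z_t + theta * a_{t-1}, with a_0 = 0 (conditional form)
    a = np.zeros(len(z))
    for t in range(len(z)):
        a[t] = z[t] + (theta * a[t - 1] if t > 0 else 0.0)
    return a

def gauss_newton_ma1(z, theta=0.0, max_iter=20, tol=1e-6, h=1e-5):
    """Gauss-Newton iteration for Z_t = a_t - theta * a_{t-1}; the derivative
    column is obtained by a numerical difference, mirroring steps 1 and 2."""
    for _ in range(max_iter):
        a = residuals_ma1(z, theta)
        x = -(residuals_ma1(z, theta + h) - a) / h   # X_t = -d a_t / d theta
        delta = np.dot(x, a) / np.dot(x, x)          # (X'X)^{-1} X' a
        theta += delta
        if abs(delta) < tol:
            break
    return theta, np.sum(residuals_ma1(z, theta) ** 2)

rng = np.random.default_rng(2)
a = rng.normal(size=300)
z = a[1:] - 0.6 * a[:-1]                             # simulated MA(1), theta = 0.6
print(gauss_newton_ma1(z))
```

Returning to the AR(1) model of Example 7.3 and the asymptotic distribution of φ̂ just given,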
we can test the hypothesis Ho: = c $ ~using the standard normal statistic if u: is known. G ( b . Thus. 4. 3. particularly when time series data are involved. Now.. based on available data. The answer depends on the stochastic nature of the error term e. that assumption 4 is crucial for this result to hold. the explanatory variables are usually also random variables. To see that. however. we rewrite 4 as and consider the following two cases. In a noncontro1lable study. 2. In this section.:E(X. Assumption 4 automatically follows if the explanatory variables are nonstochastic.: 1. Note however. we discuss some problems of OLS estimation in time series analysis. .4 Ordinary Least Squares (OLS) Estimation in Time Series Analysis Regression analysis is possibly the most commonly used statistical method in data analysis. Consider the simple linear regression model Under the following basic assumptions on the error term e. consider the foliowing time series model: The OLS estimator of 4. As a result.7. Zero mean: E(eJ = 0 Constant variance: E(e:) = ~2 Nonautocorrelation: E(et ek) = 0 for t f k Uncorrelated with explanatory variable X. the ordinary least squares (OLS) estimation developed for linear regression models is perhaps also the most frequently used estimation procedure in statistics. is 6 We would like to ask whether is still unbiased and consistent in this case when the explanatory variable is a lagged dependent variable.eJ = 0 it is well known that the OLS estimator is a consistent and the best linear unbiased estimator of 4.4 Ordinary Least Squares (OLS) Estimation in Time Series Analysis 151 7. In this case. e. is a zero mean white noise series of constant variance a:. Because this residual series is the product of parameter estimation.14). the series 2. and for an ARMA(1.] are white noise. the OLS estimator for the parameter of an explanatory variable in a regression model will be inconsistent unless the error term is unconelated with the explanatory variable.If 141 < 1 and hence 2. the a. That is.4) is equivalent to the first lag sample autocorrelation jl for the series 4. 1)process.). = (1 .4. That is. model diagnostic checking is accompLished through a careful analysis of the residual series (6. by (3. we have to assess model adequacy by checking whether the model assumptions are satisfied.. In this case.'s are uncorrelated random shocks with zero mean and constant variance.5 is a consistent estimator of p l . it is easy to see that in (7. becomes an AR(1) process with an absolutely summable autocone~ationfunction. 1) process Cme 2: and - This result shows that autocorrelation in the error term not only violates assumption 3 but also causes a violation of ?sumption 4 when the explanatory variables contai? a lagged dependent variabfe. q) models. the residuals 2. After parameter estimation.152 CHAPTER 7 Parameter Estimation. becomes an ARMA(1. 6 e. Estimation methods discussed in Sections 7.4. Under this condition.. Diagnostic Checlung.3 are more efficient and commonly used in time series analysis.2 and 7. It starts with model identification and parameter estimation. 7.5 Diagnostic Checking Time series model building is an iterative procedure. is an MA(1) process. which is equal to $. one can construct a histogram of the standardized residuals . the model diagnostic checking is usually contained in the estimation phase of a time series package. then by Section 2. 
is a zero mean white noise series of constant variance t ~ z hence .'s are estimates of these unobserved white noise at's. this condition usually does not hold except when q = 0. For any estimated model. = a.4) is an asymptotically unbiased and consistent estimator of 4.4. In summary. the e. Hence. For ARMA(p.BB)a. To check whether the errors are normally distributed. where the a.. 4 is no longer a consistent estimator of 4 because 4 is a consistent estimator for p l . Thus. $ in (7./Go and compare it with the standard nonnal distribution using . and Mode1 Seiection Case I : e. The basic assumption is that the {a. if OLS estimation is used and the model should be indeed a mixed model. Based on the results of these residual analyses. 7. we estimated the AR(3) model identified in Example 6.. Another useful test is the portmanteau Iack of fit test. we can examine the plot of residuals. Then we should re-identify an ARMA(1. if the entertained model is inadequate.6 Empirical Examples for Series WI-W7 153 the chi-square goodness of fit test or even Tukey's simple five-number summary.e. the above procedure of using the residuals to modify models usuaIIy works h e . where m = p q. assume that the entertained AR(1) model + produces an MA(1) residual series instead of a white noise series. To check whether the residuals are approximately white noise. As mentioned earlier. i. For example.e. 1) model and go through the iterative stages of the model building until a satisfactory model is obtained.rn) distribution. we compute the sample ACF and sample PACF (or IACF) of the residuals to see whether they do not form any pattern and are all statistically insignificant.6 Empirical Examples for Series W1-W7 For an illustration. a new model can be easily derived.7 for Series W7-the yexly numbers of lynx pelt sales-and obtained the following result: . i. Ljung and Box (1978) and Ansley and Newbold (1979b) show that the Q statistic approximately follows the X ~ ( K. Although that may sometimes cause problems.7. then the OM estimates of the AR parameters based on a misidentified model are inconsistent. This test uses all the residual sample ACFs as a unit to check the joint null hypothesis with the test statistic This test statistic is the modified Q statistic originally proposed by Box and Pierce (1970).. within two standard deviations if a = -05. To check whether the variance is constant. Under the null hypothesis of model adequacy. The residual autoconelations from this ARMA(2.116. Diagnostic Checking. raising the question of model selection to be discussed next. For K 24. we recalt that in Example 6. Related tables are not shown here. we conclude that the AR(3) model is adequate for the data. 1) model shown in Table 7. To check model adequacy Table 7.7. The result was also presented in Table 7.2 for Series W1 to W6. we use the nonlinear estimation procedure discussed in Section 7. All models except the AR(2) for Series W2 are adequate.3 = 21. In fact. Diagnostic checking similar to the one for Series W7 was performed for each model fitted in Table 7. TABLE 7. both AR(3) and ARMA(2.rn = 24 . and Model Selection and &: = .2 Residual ACF and PACF for the AR(33 model for Series W7. Series W7 was alternatively identified as an ARMA(2. and the model can be refitted with 42removed if necessary.3 to fit the models identified in Section 6.3. the Q statistic is Q = 26.4.3.05 for the degrees of freedom = K . 
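The two cases just discussed are easy to verify by simulation: when the error term is white noise the OLS estimator of φ converges to φ, whereas when the error term is MA(1), so that Z_t is really an ARMA(1, 1) process, the OLS estimator converges to ρ₁ instead. A minimal sketch with illustrative parameter values φ = 0.6 and θ = 0.5:

```python
import numpy as np

def ols_ar1_coef(z):
    # OLS slope from regressing Z_t on Z_{t-1}
    y, x = z[1:], z[:-1]
    return np.dot(x, y) / np.dot(x, x)

rng = np.random.default_rng(3)
n, phi, theta = 5000, 0.6, 0.5

# Case 1: white-noise errors -> OLS is consistent for phi
a = rng.normal(size=n)
z1 = np.zeros(n)
for t in range(1, n):
    z1[t] = phi * z1[t - 1] + a[t]

# Case 2: MA(1) errors, i.e., an ARMA(1, 1) process treated as an AR(1)
e = a[1:] - theta * a[:-1]
z2 = np.zeros(n - 1)
for t in range(1, n - 1):
    z2[t] = phi * z2[t - 1] + e[t]

print("case 1 (white noise errors):", round(ols_ar1_coef(z1), 3))   # near 0.6
print("case 2 (MA(1) errors):      ", round(ols_ar1_coef(z2), 3))   # biased away from 0.6
```

With these values, ρ₁ = (φ − θ)(1 − φθ)/(1 − 2φθ + θ²) ≈ 0.11, so the second OLS estimate is badly biased away from φ = 0.6.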
The residual ACF and PACF are all small and exhibit no patterns.2 gives the residual ACF and PACF and the Q statistics.9. The estimation of this model gives - with all parameters being significant and 6. 1) models fit the data almost equally well. f ) model using the ESACF. . the chi-square value at the significance level a = .4 also indicate the adequacy of the model. They are all significant except for +z. where the values in the parentheses under the estimates refer to the standud errors of those estimates. Instead.3.154 CHAPTER 7 Parameter Estimation. Section 6. = . which is not significant as x i 5 ( 2 I ) = 32.124. Similarly. The results are summarized in Table 7.7.1. Thus. 4 Residual autocarrelations. of observations Fitted models 6: TABLE 7.6 Empirical Examples for Series Wl-W7 TABLE 7.3 Summaryof models fitted to Series W1-W7 (values in the parentheses under estimates refer to the standard errors of those estimates). . Series No. from the ARMA(2.155 7. 1) model for Series W7. Fk. numerous criteria for model comparison have been introduced in the literature for model selection.7. other times the choice can be very difficult.10) that the log-likelihood function is Maximizing (7.5) .7.3) is a constant.7. Thus. Because the second term in (7. we introduce some model selection criteria based on residuals.7 Model Selection Criteria In time series analysis or more generally in any data analysis. indistinguishable in terms of these functions. In this section. and ESACF are used only for identifying likely adequate models. and ui. recall from (7. when there are multiple adequate models. I. several models may adequately represent a given data set. Diagnostic Checking.7. Residuals from all adequate models are approximately white noise and are. the selection criterion is normally based on summary statistics from residuals computed fYom a fitted model or on forecast errors calculated &om the out-sample forecasts. (7.2) wih respect to 4. p. the AIC criterion reduces to The optimal order of the model is chosen by the value of M.. Shibata (1976) has shown that the AIC tends to overestimate the order of the autoregression.M)ln +Mlnn + Mln - >I 1 M . called the Bayesian information criterion (BIC).156 CHAPTER 7 Parameter Estimation.2. from (7. For a given data set. Model identification tools such as ACF. we have. so that AIC(M) is minimum. They are different from the modd identification methods discussed in Chapter 6. Akaike's M C and BXC Assume that a statistical model of M parameters is fitted to data. The criterion has been called AIC (Akaike's information criterion) in the literature and is defined as AIC(M) = -2 ln[maximum likelihood] + 2M. 1974b) introduced an information criterion. Akaike (1973. Akaike (1978. (7. 0.2. Criteria based on out-sample forecast errors are discussed in Chapter 8.1) where M is the number of parameters in the model. 1979) has developed a Bayesian extension of the minimum AIC procedure. The latter is often accomplished by using the first portion of the series for model construction and the remaining portion as a holdout period for forecast evduation. PACF. For the ARMA model and n effective number of observations.13). which is a function of p and q. Sometimes. the best choice is easy. and Model Selection 7. in general. IACF. To assess the quality of the model fitting. which takes the fom BIC(M) = ~ l l n & z -(n . it is clear that the minimum AIC occurs for p = 2 and q = 1. Hannan (1980). Canadian lynx pelt sales. an ARMA(2. I) model should be selected for the data. 
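In practice the comparison of candidate models by AIC and BIC, together with a portmanteau check of the residuals, can be scripted directly. The sketch below assumes the statsmodels and scipy packages are available; the simulated ARMA(1, 1) series, the candidate orders, and the choice K = 24 are illustrative only.

```python
import numpy as np
from scipy import stats
from statsmodels.tsa.arima.model import ARIMA   # assumed available

def ljung_box_q(resid, K, n_params):
    """Modified Q = n(n+2) * sum_{k=1}^{K} r_k^2 / (n - k), compared with a
    chi-square on K - m degrees of freedom, m = number of ARMA parameters."""
    e = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(e)
    c0 = np.dot(e, e)
    q = 0.0
    for k in range(1, K + 1):
        r_k = np.dot(e[:-k], e[k:]) / c0
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q, stats.chi2.ppf(0.95, K - n_params)

# Simulated data; in practice z would be the (transformed) series at hand.
rng = np.random.default_rng(4)
a = rng.normal(size=300)
z = np.zeros(300)
for t in range(1, 300):
    z[t] = 0.7 * z[t - 1] + a[t] - 0.4 * a[t - 1]

for order in [(1, 0, 0), (2, 0, 0), (1, 0, 1), (2, 0, 1)]:
    res = ARIMA(z, order=order).fit()
    q, crit = ljung_box_q(res.resid, K=24, n_params=order[0] + order[2])
    print(order, "AIC %.1f  BIC %.1f  Q %.1f (5%% crit %.1f)"
          % (res.aic, res.bic, q, crit))
```

A model is judged adequate when Q is below the chi-square critical value, and among adequate models the one with the smallest AIC (or BIC) is preferred.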
Hence. There are many other criteria introduced in the literature. Note that a competitive AR(3) model that we fitted earlier to the same data set gives the second smallest value of AIC.7. From the table. Stone (19791. Parzen's CAT Parzen (1977) has suggested the following model selection criterion.4 The AIC has become a standard tool in time series model fitting. and its computation is available in many time series programs.7. and r?$ is the sample variance of the series. which has been called SBC (Schwartz's Bayesian criterion): Again in (7. The optimal order of p is chosen so that CAT(p) is minimum. 3. We have introduced only some commonly used model selection criteria. we use the SAS/ETS (1999) software to compute AIC for Series W7.7 Model Selection Criteria 157 where Gi is the maximum likelihood estimate of a i . In Table 7. M is the number of parameters in the model.6). which he called CAT (criterion for autoregressive transfer functions): where 6. and n is the effective number of observations that is equivalent to the number of residuals that can be calculated from the series. and others. 2.5. &: is the maximum Ekelihood estimate of r+z. Schwartz's SBC Similar to Akaike's BIC. Schwartz (1978) suggested the following Bayesian criterion of model selection. .5 AIC values for Series W7. Interested readers are referred to Hannan and Quinn (1979). TDLE 7. EXAMPLE 7. Through a simulation study Akaike (1978) has claimed that the BIC is less likely to overestimate the order of the autoregression. based on the AIC. A4 is the number of parameters. is the unbiased estimate of uz when an AR(j3 model is fitted to the series and n is the number of observations. see FindIey (1985). For further discussion on the of ATC. 3 .and IT:. 1) model with 8 = . and compare it with the fitted model given in Table 7. and modify your fitted model from the result of residual analysis.158 CHAPTER 7 Parameter Estimation. and f . and Model Selection 7.&) for the MA(2) process with = . 1. 5 . illustrate how to calculate the conditional sum of squares function S(#Q.calculate the conditional sum of squares S(OI.5.3 Given the set of observations 2. moments estimates for = . Perform residual analysis and model checking for each of the fitted models. and compare with the m e parameter values of the model. (a) Fit the simulated series with an AR(1) or an MA(1) model.O. I) model.5 and 62 = -2.418.2.2.8. 7. (b) Estimate the parameters of your modified model.3. 2 . 7. Diagnostic Checking.5 Consider the following observations from an ARIMACO.2. 2 . 7..0. 3. and 1.2 Assume that 100 observations from an AR(2) model gave the following sample ACF: = . Carry out diagnostic checking. 1) model. Find method of 7. 7.7 A summary of models fitted for the series W1 to W7 is given in Table 7.1. Estimate and &. (b) Calculate the unconditionaj sum of squares using the backcasting method as shown in Table 7.4.523. B1) for theARMA(1. 7. . 7.1.5.5. and i2= . 1) model gave the following estimates: G: = 10.1 Assume that 100 obsirvations from an ARMA(1.4: (a) Calculate the conditional sum of squares (with a0 = 0).8 Use ~kto find a model for each of the series W l to W7.3. 1.4 Given the set of observations 6 . . 4 .6 Simulate 100 observations from an ARMA(1. 4 .4.3. and j3= .1. F2 = . 01. 2. 9 Suppose that (1 --@)Z. 7. a. (c) Discuss the relationship between the ordinary least square estimator and the maximum likelihood estimator for # in the given model. for the AR(2) model [GI. 
find the maximum likelihood estimator for # and its associated variance. and calculate 95% forecast limits.12 Derive the joint asymptotical distribution of the least squares estimators..Exercises 7.13 Consider the yearly data of lumber production (in billions of board feet) in the United States given as follows: Year Production (a) Plot the data and perform necessary analysis to construct an appropriate model for the series. (c) Update your forecasts when the 1983 observation became available and equaled 34. (b) The AR(2) model Zt = $lZf-~+ $2Zr-2 $. Given calculate the unconditional sum of squares for I$ = . = (1 - 159 BB)a. 7..3 using the following models: (a) The MA(1) model Zt = (1 .8.10 Consider the AR(1) model (1 . A . 7.W ( Z t - PI = at- (a) For p = 0. (b) Find the maximum likelihood estimators for # and p when p # 0.11 Illustrate the nonlinear estimation procedure discussed in Section 7.6.4 and 0 = . is a tentatively entertained model for a process.BB)a. (b) Find and plot the forecasts for the next four years. . 7. we extend the autoregressive integrated moving average models to represent seasonal series. which affects many business and economic activities like tourism and home building. The phenomenon repeats itself every 12 months. giving a seasonal period of 4. toys. The numbers increase dramatically in the summer months. As an illustration. and thus the seasond period is 12. ) is seasonal with seasonal period s. The seasonal period in these later cases is 12. The seasonal nature of the series is apparent. and the monthly sales of toys rise every year in the month of December. the table has also been called the Buys-Ballot table in the literature.1 according to the period and season and including the totals and averages. which is closely related to safes such as jewelry. After a brief introduction of some basic concepts and conventional methods. and stamps. we devote a separate chapter to seasonal t h e series. The smallest time period for this repetitive phenomenon is called the seasonal period. suppose that the series { Z . monthly auto saIes and earnings tend to decrease during August and September every year because of the changeover to new models. and graduation ceremonies in the summer months. monthly employment figures (in thousands) for young men aged between 16 and 19 years from 1971 to 1981. greeting cards. . Detailed examples are given to illustrate the methods. with peaks occurring in the month of June when schools are not in session. which are directly related to the labor force status in these months.1 General Conce~ts Many business and economic time series contain a seasonai phenomenon that repeats itself after a regular period of time.1 shows the U. Wold (1938) credits these arrangements of the table to Buys-Ballot (1847).S. 8. and decrease in the fall months when schools reopen. To analyze the data. hence. Figure 8. Seasonal phenomena may stem from factors such as weather. and the series repeats _thisphenomenon each year.Seasonal Time Series Models Because of their common occurrence in our daily activities. Similarly. it is helpful to arrange the series in a two-dimensional table as shown in Table 8. More generally. cultural events like Christmas. For example. the quarterly series of ice-cream sales is high each summer. . .1 Buvs-Ballot table for a seasonal time series. Z.n Z.2 z(n-l>+3 * ' ' zm T. = the total for the jth season. Tj.. FIGURE 8.8.. T j = the total for the jth'period. 
monthly employment figures (in thousands) for young men aged between 16 and 19 years from 1971 to 1981. . 1 Z[n-1>+l Total Average TI. z3. TW T T/s Tltl T/w .2 Z. zl. 7. 2..S. = the average for the jth season.1 U. s Total Average z1 z 2 ZJ+2 z 3 zr+3 . z(n-l>+2 T2. z2 T3.. TABLE 8.... ..] = the average for the jth period. . zs Zs+l 2. E1 T.. T = the grand total of the series.l General Concepts 0 I970 1 I 1972 1974 I 1976 Year I I I 1978 1980 1982 16'1 '. 2 3 . + St + e. time series have been thought to consist of a mixture of trend-cycle (P. For example. For example. as 2. Alternatively. can be written as .. (8. seasonal (S.I) seasonal dummy variables. an additive seasonal time series is written as the following regression model: where P. and irregular components (eJ.. and the Wit are the trend-cycle variables. represents the seasonal effect of thejth period as compared with the period s. then one can write the time series Z. /3. a seasonal series of period s can be written as where Dj. = CYO + XyL1cyiq.162 CHAPTER 8 Seasonal Time Series Models 8.).2. a linear trend-cycle component P. Note that when the seasonal period equals s. we need only (s . a trend-cycle component can be written as an mth-order polynomial in time as S i d a r l y . In other words. = 1 if t corresponds to the seasonal period j and 0 othenvise. is set to be 0 so that the coefficient pi. can be written as k More generally. = P.. S. 8.2.1 Regression Method In the regression method.1) To estimate these components. several decomposition methods have been introduced in the literature. j # a. If these components are assumed to be additive.2 Traditional Methods Conventionally. the seasonal component St can be described as a lineat combination of seasonal dummy (indicator) variables or as a linear combination of sine-cosine functions of various frequencies. = Zj=lPjl$. S. and the yl are the seasonal variables.). Pi.2 Traditional Methods where [sD]is the integer portion of sI2.. e. and yp sion method can be used to obtain estimates Gi.2 Moving Average Method The moving average method is developed based on the assumption that an annual sum of a seasonal time series possesses little seasonal variation.P.2. for Equation (8.. For Equation (8. Thus. The estimates of P. the standard least squares regresand yj of the parameters ai.7) are then given by and 2t A A = zt . .2.8)they are given by and 8. and specified values of~?t and s.163 8.2. A model of this type is discussed hrther in Chapter 13. = P. and e.2. Thus.2) bticomw For a given data set Z. S. the model (8.. be + . by letting N.S. i. For an illustration. but also the employment levels for the month of June in the previous years.-.e.zits. i. the decomposition procedures are also known as seasonal adjustment. i.. .164 CHAPTER 8 Seasonal Time Series Models the nonseasonal component of the series.3 Seasonal ARIMA Models Traditional methods presented in Section 8.)contains within periods and between periods relationships. . That is.. An estimate of the seasonal component is derived by subtracting N. The topic is presented only briefly here. there is great interest in government and industry to adjust the series seasonally.. which has been widely employed in government and industry.. are not as well behaved.} contains the between period seasonal variation and fit a nonseasonal ARIRiIA model for the series. . however. and Cupingood and Wei (1986). where nt is a positive integer and the $ are constants such that I+ = h-! and X%-. In this section. .. 
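The regression method of Section 8.2.1 amounts to an ordinary least squares fit of a trend polynomial plus s − 1 seasonal dummy variables, with one season serving as the reference. A minimal sketch follows; the quarterly data are simulated for illustration, and the trend order, seasonal pattern, and noise level are arbitrary choices.

```python
import numpy as np

def seasonal_dummy_fit(z, s, trend_order=1):
    """Least squares fit of Z_t = trend polynomial + seasonal dummies,
    with the last season used as the reference (its coefficient set to 0)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    t = np.arange(1, n + 1)
    cols = [t ** i for i in range(trend_order + 1)]        # 1, t, t^2, ...
    for j in range(s - 1):                                  # s - 1 dummies
        cols.append(((np.arange(n) % s) == j).astype(float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    fitted = X @ beta
    return beta, z - fitted

# Quarterly illustration: linear trend plus a fixed seasonal pattern plus noise.
rng = np.random.default_rng(5)
n, s = 48, 4
pattern = np.array([3.0, -1.0, -2.5, 0.5])
z = 10 + 0.2 * np.arange(1, n + 1) + np.tile(pattern, n // s) + rng.normal(0, 0.5, n)
beta, resid = seasonal_dummy_fit(z, s)
print("trend coefficients:", np.round(beta[:2], 2))
print("seasonal effects (vs. season 4):", np.round(beta[2:], 2))
```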
and between periods relationships represent correlation among . from the original series. an estimate of the nonseasonal component can be obtained by using a symmetric moving average operator.2 are based on assumptions that the seasonal component is deterministic and independent of other nonseasonal components. . the seasonal component may be stochastic and correlated with nonseasonal components. Interested readers are referred to an excellent collection of articles edited by Zellner (1978) and to some research papers on the topic such as Dagum (1980).e. Within periods relationships represent the correlation among Zt-2. the Buys-Ballot table implies that (2. ~t-25.2. Pierce (1980).Zf-s1ZI. ' ' ' ' Suppose that we do not know that the series {Z. The series with seasonal component removed. Hence.. let us examine the U. A saccessful example of using this moving average procedure is the Census X-12 method. A These estimates may be obtained iteratively by repeating various moving average operators. to forecast the employment level for the month of June. is referred to as the seasonally adjusted series.Ji = 1. Hillmer and Tiao (1982).. A 8. Thus.e. Zt .. Z. Zit*. The table indicates that the employment figures are not only related from month to month but also related from year to year. In general.S. Bell and Hillmer (1984). . we extend the stochastic ARIMA models discussed in previous chapters to model seasonal time series. Many time series. Z. one should examine not only the employment levels in the n e i g h b o ~ gmonths such as May and July. &+a. Based on prevailing belief that the seasonal component is a well-behaved phenomenon that can be predicted with a reasonable accuracy. .S. monthly employment figures (in thousands) for young men aged between 16 and 19 years &om 1971 to 1981 using the Buys-Ballot table shown in Table 8. More likely. 818 Total 109. 8. 8. Mar.590 12. Avg. May June July 7.181 Nov.805 Apr. 8.048 11. monthly employment figures (in thousands) for young men aged between 16 and 19 years.986 7.094 . 8.052 Aug.770 Sept.091.971 Dec 8.S. 8.17 . 9.288 Oct.ys-Ballot table for the U. It can be easily seen that this between periods relationship can also be represented as an AF3MA model where and are polynomials in BS with no common roots.we have E 0 = -8. The roots of these polynomials lie outside of the unit circle. Then ( I . D = 0.if P = 0.3. Combining (8. ) will not be white noise as it contains unexplained between period (seasonal) correlations. s = 12.] is a zero mean white noise process. For illustration.@ ~ ' ~ )=b a. and Q = 0.9. D = 0.] becomes pxlz) = (.9)j as shown in Figure 8. suppose that in (8.166 CHAPTER 8 Seasonal Time Series Models Then obviously the series { b .3. Similarly.3. and {a.3) we get the well-known Box-Jenkins multiplicative seasonal ARZMA inodel.3. . . then the autocorrelation function becomes which is shown in Figure 8. (8.] representing the unexplained between periods relationship.3) P = 1.4) Zf @ = ..2.3.1) and (8. Let be the autocorrelation function for {b. s = 12. and Q = 1. then the autocorrelation function of (b. 3 ACF for 6. where ifd=D=O.9BE)bt = a. For convenience. where the subindex s refers to the seasonal period. q ) X (R D. The model in (8. Q).3. we often call &(B) and d4(B) the regular autoregressive and moving average factors (polynomials) and Dp(BS)and Bp(BS)the seasonal autoregressive and moving average factors (or polynomials).8B12)a.... . = (1 - . otherwise.2 ACF for (I - 2 67 . respectively.6) is often denoted as ARIMA(p. d.8. 
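The between-period correlation structure can be illustrated by simulating the purely seasonal AR model (1 − ΦB¹²)b_t = a_t with Φ = 0.9, as in the example above; the sample ACF then shows spikes of roughly Φ^j at lags 12j and is close to zero elsewhere. A short sketch (the series length and random seed are arbitrary):

```python
import numpy as np

# Simulate (1 - PHI * B^12) b_t = a_t with PHI = 0.9 and inspect the sample ACF.
rng = np.random.default_rng(6)
n, s, PHI = 600, 12, 0.9
b = np.zeros(n)
for t in range(s, n):
    b[t] = PHI * b[t - s] + rng.normal()

def sample_acf(x, nlags):
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / c0 for k in range(1, nlags + 1)])

r = sample_acf(b, 36)
for lag in (12, 24, 36):
    print("lag", lag, "sample ACF", round(r[lag - 1], 2),
          "theory", round(PHI ** (lag // 12), 2))
```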
FIGURE 8.3 Seasonal ARIMA Models FIGURE 8. 3. Hence. This extension is usefuf in describing many nonstandard time series that.6. M autoregressive factors. 1. + e2)a. ylz = yj = 0. The autocovariances of \Vr = (1 . 1. Because it is this general form that most time series software use. may contain a mixture of seasonal phenomena of different periods. 7 1 3 = eed. be easily found to be (1 -t d2)(1 yo 71= -6(1 YII +. the model may contain K differencing factors. The ith difi'erencing factor is .4. and N moving average factors. otherwise.B)(1 .. the plot of pk is shown in Figure 8. the ACF becomes pj = 0.B ' ~ ) z can . we now explain this general model in more detail. l)lz model This model has been found very useful to represent a variety of seasonal time series such as auline data and trade series. Hence. 1) X (0. The General AFUMA Models More generally. we can write a general ARWA model as follows: Thus.2. (8.4 and 0 = . otherwise. it is also referred to as the airline model in the literature.9) For 0 = . The model was first introduced by Box and Jenkins to represent the international air travel data. for example. .e(l + e2)c:. = 60~2.1 Let us consider the AWMA(0.168 C m E R 8 Seasonal Time Series Modeb EXAMPLE 8.82)u. . because the ESACF provides only the information about the maximum order of p and q. .or damped sine waves at the seasonal and nonseasonal lags. = Z. one may use the factors in any desired order-for example. its use in modeling seasonal time series is very limited. and ESACF for Seasonal Models The PACF and IACF for seasonal models are more complicated. Z. On the other hand. PACE IACF. Clearly. We have found that in identifying seasonal time series.6B1*)a. and N are usually less than or equal to 2. Parameters in autoregressive and moving average factors beyond the first factor of each kind are normally considered seasonal parameters in the model.4 ACF for W. The kth moving average factor is and contains one or more moving average parameters.4B)(1 169 .If K = 0. M.8. as the mean of the series. The same is true for the differencing factors..The jth autoregressive factor contains one or more autoregressive parameters.. the seasonal and nonseasonal autoregressive components have their PACF and IACF cutting off at the seasonal and nonseasonal lags. Otherwise. = 2. then 2. cjjm. the seasonal and nonseasonal moving average components produce PACF and IACF that show exponential decays. does not exist. In general.p.3 Seasonal ARIMA Models FIGURE 8. dh. the values of K. In most applications. The computation of the ESACF for seasonal models is time consuming. Besides. with the order si (the power of B) and the degree di... = (1 . making the first factors in each case as the "seasonal factors"-if so desired. The parameter O0 represents the deterministic trend and is considered only when K # 0. the standard ACF analysis is still the most useful method. and the patterns are in general very complicated. 0 = .8. Table 8. and forecasting for these models follow the same general methods introduced in Chapters 5. The series is listed as Series W8 in the appendix. series (1 .5 A simulated ARIMA(0. we illustrate the application of these methods to several seasonal time series in the next section. (1 . 1.These values indicate that the series is nonstationary and that differencing is called for.1) white noise. being Gaussian N(0.6.6 show the sample ACF and PACF for the series. it is clearly seasonal with an upward trend.B) . The description of these methods is not repeated here. 
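For the airline model the nonzero autocorrelations of the doubly differenced series occur only at lags 1, s − 1, s, and s + 1, and they follow directly from the moving average representation. The small routine below evaluates them; the values θ = 0.4 and Θ = 0.6 are used purely for illustration.

```python
import numpy as np

def airline_acf(theta, Theta, s=12):
    """Nonzero autocorrelations of w_t = (1 - theta*B)(1 - Theta*B^s) a_t,
    i.e., of the differenced series in the ARIMA(0,1,1) x (0,1,1)_s model."""
    denom = (1 + theta ** 2) * (1 + Theta ** 2)
    return {1: -theta / (1 + theta ** 2),
            s - 1: theta * Theta / denom,
            s: -Theta / (1 + Theta ** 2),
            s + 1: theta * Theta / denom}

print(airline_acf(0.4, 0.6))
```

For θ = 0.4 and Θ = 0.6 this gives ρ₁ ≈ −0.34, ρ₁₁ = ρ₁₃ ≈ 0.15, and ρ₁₂ ≈ −0.44, with all other autocorrelations equal to zero.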
and 7..4 Empirical Examples EXAMPLE 8 2 This example shows a simulation of 150 values from an W ( O . Instead.. 1) X (0. As shown in Figure 8. and a.3 and Figure 8. diagnostic checking. I.5.170 CHAPTER 8 Seasonal Time Series Models Model Building and Forecasting for Seasonal Models Because seasonal models are special forms of the ARMA models introduced in Chapters 3 and 4.8B) ( 1 . 0 I I I I I 30 60 90 120 150 * Time FIGURE 8. 8. model with 8 = .6B4]a. 1. The ACF is large and decays very slowly. parameter estimation. the model identification. 1. whereas the PACF has a single large spike at lag 1. 1) x (0. 6..B4fZt = ( 1 . 7(a). That is.. Hence.7(b). 1) X (0. Parameter estimation procedures can now be used to estimate the values of 8 and 8. 0.B ~ is) also needed so as to achieve stationarity.4(a) and Figure 8. 3. .8B)(1 .4 Empirical Examples 171 - TABLE 8.B)Z. ..B)(1 . FIGURE 8.8. = (I .B ~ Zin.8B)(1 .B)(1 . (I . the series is differenced and the sample ACF and PACF of the differenced series (1 . 1j4model is zero except at lags 1. 1. 1.7(b) implies an ARIMA(0.B)(1 .}.B)(1 .B4fZ. are computed as shown in Table 8.B ~ ) z as .6lI4)a.3 Sample ACF and sample PACF for a simulated series from (1 ... 4. the sample ACF of (1 .6B4)a..6 Sample ACF and sample PACF for a simulated series from (I. 1) X (0. It is known that the ACF of an ARIMA(0. we examine the sample ACF and sample PACF for (1 .B4)Z.4(b) and Figure 8. Table 8.. given in Table 8. Thus. That the ACF decays very slowly at multipIes of seasonal period 4 implies that a seasonal differencing ( I . model for the original series {Z. 0. To remove nonstationarity. and 5.4(b) and Figure 8. 1.5(b) and regular differencing (1 .2 indicate a strong seasonal variation with seasonal period 12.5.10.S. we obtain the following results: . Because W = . k 1 2 (a) For Wt = (1 .66/(7 1. imply that there is a stochastic trend in the series.85/*) = . on the other hand. The row totals over the years.. = (1 .1. Hence. The series was shown earlier in Figure 8.3 We now examine the U. = 71. n = 119. n = 149) 6 7 8 9 10 11 12 EXAMPLE 8. process as a tentative model for the series Parameter Estimation and Diagnostic Checking Using the standard nonlinear estimation procedure. 3 4 5 (m= 1. 0) X (0.4 Sample ACF and sample PACF for the differenced series simulated from ( 1 . which is not significant. S.} as shown in Table. the t-value of significant spike at lag 12. The sample ACF for the original and differenced series are computed and given in Table 8.BIZ. which is listed as Series W9 in the appendix. Model Identificatiotz The columns and thcassociated marginal column totals of the BuysBallot table in Table 8. and thus the deterministic &end term Bo is not needed. ST.B)(1 .87.B I Z . monthly employment figures for young men between 16 and 19 years of age f ~ o m1971 to 1981.312)Z. The need for both is clear &om Table 8. we identify the following ARIMA(0.5(d) has only one = .. 1. = 3.6B4fat.66. the ACF for (JK = (1 . Now.85.8B)fl .1. This trend has to be removed by differencing before a model can be identified.EI4)(1 .B ) and seasonal differencing (1 (c).172 CHAPTER 8 Seasonal 'Erne Series Models TABLE 8. 8. fb) For Wr = (1 . . Thus.6. (a) For W. j.1.e. The residua1 ACF for the fitted model as shown in Table 8.B4]Zt.. however.B)(l .7 Sample ACF and sample PACF of the differenced series. has a significant spike at lag 1.4 Empirical Examples 173 FIGURE 8. we modify the model to an ARlMA (0.8..B)Z. with &: = 3327. = (1 . 1112.3.1. 1) X (0. 85. 
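The identification steps of this example, take the seasonal difference (1 - B^s), take the regular difference (1 - B), and inspect the sample ACF and PACF at each stage, translate directly into a few lines of code. The sketch below is generic rather than tied to Series W8 (whose values are not reproduced here); it builds a placeholder series from a simulated ARIMA(0, 1, 1) x (0, 1, 1)_4 process so that it runs stand-alone, and uses the acf and pacf helpers from statsmodels.

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
s, n = 4, 200
theta, Theta = 0.8, 0.6

# Placeholder data: simulate (1-B)(1-B^s) Z_t = (1 - theta B)(1 - Theta B^s) a_t,
# then integrate twice (once seasonally, once regularly) to get Z_t.
a = rng.normal(size=n + s + 1)
w = (a[s + 1:] - theta * a[s:-1]
     - Theta * a[1:-s] + theta * Theta * a[:-s - 1])
y = np.zeros(n)
for t in range(n):
    y[t] = w[t] + (y[t - s] if t >= s else 0.0)   # undo (1 - B^s)
z = np.cumsum(y)                                   # undo (1 - B)

def show(x, label, nlags=12):
    print(label)
    print("  ACF :", np.round(acf(x, nlags=nlags, fft=False)[1:], 2))
    print("  PACF:", np.round(pacf(x, nlags=nlags)[1:], 2))

d1  = np.diff(z)                 # (1 - B) Z_t
d1s = d1[s:] - d1[:-s]           # (1 - B)(1 - B^s) Z_t

show(z,   "Z_t (undifferenced): expect slow ACF decay")
show(d1,  "(1 - B) Z_t: expect slow decay at seasonal lags")
show(d1s, "(1 - B)(1 - B^s) Z_t: expect spikes near lags 1 and s only")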
Sw = 114.5 Sample ACF for Series W9. Values of the residual ACF of this modified model as shown in Table 8.(w ) = 26. Sz = 161. n = 119) Parameter estimation gives with Gz = 3068.47. For K = 24.B ~ ~ ) Z . the value 20.81.7 of the Q statistic is not significant .B)(1 . = (1 .7 are all small and exhibit no patterns.L for {Z1)(Z= 826.174 CHAPTER 8 Seasonal Time Series Models TABLE 8. n = 132) (b) jkfor {Wt = (1 . Sly = 148.98. n = 120) .B ) z . ) ( ~= 22. la] . Sw = 71.66.89.B ' ~ ) Z .6. n = 131) Ti (c) ik for {Wt = ( 1 (d) ikfor {TV. ) [=W.22. As discussed in Chapter 5.4. 1. forecasts can be calculated directly from the difference equation. because X$5(22) = 33. . for a given forecast origin..k.33B) [l .B'2]Z. 1) X (0.B](1 (1 .. the fitted ARIhllA(0. 1.BI2)Z. model in (8.7 Residual ACF. For Equation (8.4 Empirical Examples 175 TABLE 8.4.5) is Forecasting Because the model in (8..9.8. we can use it to forecast the future employment figures. judged adequate for the series.82fI1*)a.5)we have Thus. Thus.5) is adequate.. &. the 1-step ahead forecast from the time origin t = 132 is given by where and TABLE 8. from the fitted model (1 .Bf(1 . = (1 . say t = 132.79B12)a..4. = . . from the fitted model (1 .6 Residual ACF. By equating the coefficients of Bj on both sides of (8. the $j weights that are needed for calculating the forecast variance can be easily obtained as folloivs: and where the .4. we first rewrite the model in the following AR representation because the model is nonstationary but invertible: where and.the variance of the forecast error is and the 99% forecast h i t s .4. by (5. from (5.8).nj weights are given in (8.26).9).2.7).10).2.2. we have Now. are given by .176 CHAPTER 8 Seasonal Erne Series Models To derive the forecast variance. By (5. hence. 8 Forecasts and their 95%confidence limits for youth employment at the origin of December 1981. Standard error .4 Empirical Examples 177 The first 12 forecasts. TGBLE 8. and their standard errors are given in Table 88. i. .8.. These forecasts and their 95% forecast limits are also plotted in Figure 8. 2.&. for 1 = 1. . 12. . Because the process is nonstationary. the limits become wider as the forecast lead time 1 becomes larger.8 Forecasts for youth employment at the origin of December 1981. A Lead time Forecasts Zk32(I) X Actual Observation Forecast + Forecast Limit FIGURE 8. ?132(~).8. . IJ. in millions of barrels.9 shows the 32 consecutive quarters of U. . beer production.4. It is not evident that we need a seasonal differencing.. they are decreasing and not large. Thus.178 CHAPTER 8 Seasonal Time Series Models FIGURE 8.S. The numbers of sample ACF and PACF values used to identify a seasonal model with a seasonal period s should be at least 3s. the model is inadequate. The First Tentative Model Based on the Original Series To assess the forecasting performance of a model.11) The estimation of the above model gives 6 = -93 with the standard error of . The series is listed as Series W10 in the appendix.9 Quarterly beer production. We choose it because sometimes we do need to construct a model £rom a relatively short series. The sample ACF and sample PACF as shown in Table 8. EXAMPLE 8. The residual ACF has a significant spike at lag 4 (table not shown). we use only the first 30 observations of the series for model construction.4 Figure 8. Series WlO may be too short. The power transformation analysis of these 30 observations indicates that there is no need for a power transformation. between 1975 and 1982 (Series W10). 
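The psi weights and forecast limits described here can be computed mechanically: psi(B) is the power-series quotient of the moving average operator by the (nonstationary) autoregressive operator, and Var[e(l)] = sigma_a^2 (1 + psi_1^2 + ... + psi_{l-1}^2). The sketch below uses theta = 0.33, Theta = 0.79, and sigma_a^2 = 3068, values in the neighborhood of the estimates reported above for the refitted employment model; it is an illustration of the calculation, not a reproduction of the text's forecast table.

import numpy as np

theta, Theta = 0.33, 0.79            # close to the reported estimates
sigma2 = 3068.0                      # illustrative residual variance

# Polynomial coefficients in B (index = power of B).
ar = np.zeros(14); ar[0], ar[1], ar[12], ar[13] = 1, -1, -1, 1      # (1-B)(1-B^12)
ma = np.zeros(14); ma[0], ma[1], ma[12], ma[13] = 1, -theta, -Theta, theta * Theta

L = 12                               # forecast horizon
psi = np.zeros(L)
for j in range(L):                   # equate coefficients in ar(B)*psi(B) = ma(B)
    acc = ma[j]
    for i in range(1, min(j, len(ar) - 1) + 1):
        acc -= ar[i] * psi[j - i]
    psi[j] = acc

var_l = sigma2 * np.cumsum(psi**2)   # Var[e(l)] = sigma2 * (psi_0^2 + ... + psi_{l-1}^2)
half_width = 1.96 * np.sqrt(var_l)   # 95% forecast limits: Zhat(l) +/- half_width
print("psi weights   :", np.round(psi, 3))
print("95% half-width:", np.round(half_width, 1))

As expected for a nonstationary model, the psi weights do not die out, so the half-widths grow with the lead time l, which is why the forecast limits in the figure fan out.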
from the first quarter of 1975 to the fourth quarter of 1982. in this example. . Although the sample ACF at multiples of 4 are larger than their neighboring values. in millions of barrels. The 95% confidence interval of @ clearly contains 1 as a possible value. we start with a tentative model of the form (1 - w)(z.07. hence.) = a. (8.9 demonstrate a strong seasonal pattern of period 4. 10(a) has only one significant spike at lag 4 suggests the following tentative model: where Bo is included because Parameter estimation gives SF = 1 . we consider the seasonally differenced series W.BJ)2.B ~ } zwith . n = 30) Models Based on the Seasortally DllfferencedSeries As a result of the previous analysis. = 5. S.B ~ ) z .9 Sample ACF and sample PACF for U.S. and :?t = 2. Careful readers may notice.10(b) also has only one significant spike at lag 4. The residual ACF as shown in Table 8. that the sample PACF of (1 . quarterly beer production (Series W10).42. its sample ACE and sample PACF shown in Table 8. howver. as shown in Table 8.as shown in Table 8.11 indicates no model inadequacy.8. with its magnitude approximateIy .39. where values in parentheses below the parameter estimates are the associated standard errors.92.= (1 .) (2 = 44. which is significant.10. (a) bk for {Z. 3 6 / ( 2 . That the sample ACF of the series (1 .67.4 Empirical Examples 179 TABLE 8. 0 3 / 6 ) = 3. (8..B ~ ) z . jk. Sw = 2.03. TABLE 8.4. = 1.14) .87B4)a. quarterly beer production.180 CHAPTER 8 Seasonal Time Series Models Sample ACF and sample PACF for the seasonally differenced series of U. = do + a..S.10 (a] h for {w. )(F= 1.11 Residual ACF. This fact suggests that the following purely seasonal AR process could also be a good tentative model: (I - @ ~ j ) (-l E4)Z.= (1 .from the fitted model [l . n = 26) TABLE 8.49 + (1 .36.B4)Z.. equal to the sample ACF at the same lag. + The estimation results are with G: = 3. Let the 1-step ahead forecast error be where rz is the forecast origin larger than or equal to the length of the series so that the evaluation is based on outsample forecasts..51 + a. from the fitted model (1 .B4)2.12 Residual ACF. - Tfie mean square enor is 1 " MSE = . .4 Empirical Examples 181 TABLE 8.66114)(l . The comparison is usually based on the folIo\ving summary statistics. If the main purpose of a model is to forecast future values. Model Selection Based on Forecast Errors It is not uncommon that several adequate models can be found to represent a series. The ultimate choice of a model may depend on the goodness of fit. = 2.2 e f . then alternative criteria for model selection can be based on forecast errors. 1. The residual ACF as shown in Table 8.12 also indicates model adequacy. -z2+)loo%. which is also referred to as bias because it measures forecast bias.8.06.k. The mean percentage error. Ml=l . is MPE = 2. such as the residual mean square or criteria discussed in Chapter 7. 15) Forecast value Error MPE MSE MAE W E 3. (b) (1 . VZ. Lead time Actual value Model (8.13) Forecast value Error Model (8.4.4.4. or VV4Zt.1s).15) in terms of forecasts. data = Z.aIP)(1 . The mean absolute percentage error is MAPE = -x I - )loo%.13) slightly outperforms Equation (8. 8.13) and (8.1 Find the ACF for the following seasonal models: (a) Z.4.B1B)ar.15).(PIBS)Z.B4)Z. For illustration..13 Comparison of the forecasts between models (8. = f 1 . = (1 . VZ. .-z 1 lej[.133 and (8. 8.2 Identify appropriate time series models for the data from the following sample autocorrelations: (a) n = 56.4..4. Ml=l 4.13. 
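The out-of-sample comparison statistics used in this section (the MSE and MPE defined here, together with the MAE and MAPE defined next) are straightforward to compute once the l-step ahead forecasts from each candidate model are in hand. A small generic helper is sketched below; the function name, variable names, and the made-up hold-out numbers are mine, not the text's.

import numpy as np

def forecast_error_summary(actual, forecast):
    """MSE, MAE, MPE (forecast bias, %), and MAPE (%) of hold-out forecasts."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    e = actual - forecast
    return {
        "MSE":  np.mean(e**2),
        "MAE":  np.mean(np.abs(e)),
        "MPE":  100.0 * np.mean(e / actual),
        "MAPE": 100.0 * np.mean(np.abs(e) / np.abs(actual)),
    }

# Illustration: two competing models forecasting the same two hold-out quarters.
actual = [52.1, 47.3]
print("model A:", forecast_error_summary(actual, [50.8, 46.0]))
print("model B:", forecast_error_summary(actual, [49.9, 48.9]))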
The mean absolute error is MAE .4. we calculate the one-step ahead and two-step ahead forecasts &O(l) for 1 = 1 and 2 from the forecast origin 30 for the two competing models (8..= (1 .182 CHAPTER 8 Seasonal Time Series Models TABLE 8. (c) (1 .4. The forecast errors and the corresponding summarystatistics are shown in Table 8.B)(1 . The results indicate that Equation (8.BIBS)a.#J$)& = at.B)Z.BIB)(l .. where W is used to refer to the corresponding series Z. VV4Zl = (1 . compare two models based on forecast errors. s& = 183 102.24. (b) Give a preliminary evaluation of the series using Buys-Ballot table. withdl = . weights that are needed to evaluate the forecast variance. Compute and plot the rr weights. and examine carefully the trend and seasonal phenomenon contained in the data. (e) Calculate the forecasts for the next 4 periods using the models obtained in parts (c) and (d). ?k 8.2. 1 = ( 1 .5 Consider the number of visitors (in thousands) to a Colorado ski resort given as follows: Year Winter Spring Summer Falf (a) Plot the series. @ = . (d) Fit a seasonal ARIMA model to the data using oniy observations between 1981 and 1985.anduz = 1.. 2 . (a) Express the model in terms of the AR form.~ ' * ) l nZ.~ ' ~ )-( B)Z. (b) Find the eventual forecast function. .B)(1 .8.3 Consider the following model: ( 1 .4 Consider the following model: (a) Find the one-step ahead forecast in terms of past observations. W k = . (c) Find the forecasts and their 95% forecast limits for the next 12 periods.Exercises (b) = 102.8) to the data using only observations between 1981 and 1985.B I B ) ( l . 8. 8. (c) Fit a proper regression'model such as Equation (8. (b) Compute and plot the IC.81B12)a. data = (1 .38. Apr.S. (a) Build a seasonal ARIMA model for the series. Mar. (b) Forecast the next 4 observations and find their associated 95% forecast limits. 8. Sept. Dec. . Nov. Oct. personal consumption of gasoline and oil (in billions of dollars) between 1967 and 1982: Year I - I1 I11 1V (a) Build a seasonal ARIMA model for the series.7 Consider the following U. and find their 95% forecast limits. Feb. June May July Aug.184 CHAPTER 8 Seasonal Time Series Models 8.S. liquor sales (in millions of dollars): Year Jan. (b) Forecast the next 12 observations.6 Consider the following U. 8 Consider the following monthly employment figures for young men between ages 16 and 19 in the United States from January 1982 to November 2002 (in thousands): Year Jan. May June July Aug. (a) Build two alternative seasonal models for the series using observations from January 1982 to December 2001. Apr. . (b) Compare your models with the model discussed in Example 8. Feb.Exercises 185 8. Nov. Dec. Oct. Mar. and comment on a model selection in terms of forecasting performance. (c) Forecast the next 12 observations.3. Sept. For distinction. in this case is referred to as an integrated process or series. A process W ( t ) is . 9. .2 Some Useful Limiting Distributions To develop a desired test statistic and discuss its properties. a nonstationary time series can often be reduced to a stationary series by differencing. we introduced a homogeneous nonstationary time series that can be transformed to a stationary series by taking a proper order of differencing. Ad Z. Empirical examples are used to illustrate the tests. then the Z. Alternative proofs of. we will use the approach employed by Chan and Wei (1988) and begin with the following useful notions. To decide whether to take the difference. Specifically. 
Because an ARIMA model can be regarded as a more general ARMA model with a unit root in the associated AR polynomial. so far we have relied on our own judgement after examining the plot of the series and the pattern of its autocorrelation function.B ) ~ z .. 9. we present several unit root tests for nonseasonal and seasonal time series i n this chapter. Chapter 10). If we decided that was the case. SV(t). = (1 . d. The Z. A process. is nonstationary and its dth difference. For a formal treatment. we often write a continuous process as W(t)rather than TV. In model identification discussed in Chapter 6. the decision for differencing is based on an informal procedure relying on a subjective judgment of whether the sample ACF of a time series fails to decay rapidly.1 Introduction In Chapter 4. Most of these tests can be performed by simply using the output from a standard multiple regression procedure.some results presented in this chapter can also be found in Fuller (1996. for a more formal procedure we can introduce tests to test for a unit root in our model fitting.Testing for a Unit Root As discussed in Chapter 4. then we would fit the series with an ARXMA model. q) model. is said to follow an ARIMA(p. if a time series Z. is continuous if its time index t belongs to an interval of a real line. q) process. is stationary and also can be represented as a stationary AfZMA(p. t 2 ) and (tg. it foflorvs that F. W(t) is distributed as N(0. Now. - .2 Some Useful Limiting Distributions 187 said to be a Wiener process (also known as Brownian motion process) if it contains the following properties: 1.e. we also use the notation X.a:) random variable by the central limit theorem. 4. to X in probability as n m. Given the i. E [ W ( t ) ] = 0.. jconverges to a N(0. which wjlI be denoted by D P Fn(x) N(0.d. In the following discussion. . ) . X U $ ) as n +00. and W(t) has independent increments. where x E [O. x u . . we consider tin the closed interval between 0 and 1. 11. i-e.9.e.. [W(t2) . W~thno loss of generality.W(tl)] and [W(t4) . W(0)= a.W(t3)] are independent for any nonoverlapping time intervals (tl. Because. . n.(x) converges in distribution to f i N ( 0 . W(t) follows a nondegenerate normal distribution for each t. random variables a. . for t = 1. That is. It can be easily seen that the limit of the sequence of the random variable Fn(x)/u.define for 0 5 x < l/n. t). with mean 0 and variance uz. --+X to indicate the convergence of X. can be described by a Wiener process. 2. t E 10. as - --r m. then the process is also called standard Brownian motion.l] and [xn] represents the integer portion of (xlz). Furthermore. if for any t.i. i. . i.. u:) = N(0. for 2/11 5 x < 3/n. for l/n 5 x < 2/11. [ - 6 and ~ z a .. / t / i . t4). . 3. 2. 2: = zt-. \ z"/fi. Then.e.4). Specifically.a:]. x).. ..188 CHAPTER 9 Testing for a Unit Root where W(x)at time t = x follows an N(0.2. by (9.I). (9. Thus.2. where W(1)follows an N(0. it is clear that the integral j:Fn(x) dx is simply the sum of the area defined in (9.2.6) I.+ a.2..6).a.-la. = 1 ?[z: - z:-. Similarly.We can rewrite Fn(x) in (9. and Next. Let Zt = al + a* + .i. forx = (9.1) as for0 5 x < l/n. for 2/12 5 x < 3/n.)2 = 2:-1 + 2Z.. and Zo = 0.5)implies that and Also. + a. for l/n 5 x < 2/n. + a:. HI : #I < 1. . It is tempting to use the normal distribution given in (7. where the a.1 1) P which follows from (9.8) and that [ ~ : = l a f / n ] o:. n and Zo = 0. 
is the Gaussian N(0.3 Testing for a Unit Root in the AR(1) Model The simplest model that may contain a unit root is the AR(1) model.I).26) or the t-distribution given in (7..3 Testing for a Unit Root in the AR(1) Model 189 and summing from I to n gives Therefore.3. . the null hypothesis will be rejected if #I is much less than 1 or if ( 4 . the unit root test implies a test for a random walk model. 2 (9.9. With the assumed Zo and following (7.f l : { [ ~ ( l ) ] ~ .2. however. . we note that . We discuss three cases for this model: a model without a constant term. . 9.: 1 1 = .4. only hold when ]#I < 1. if tbe test is based on the statiGic 4 used to estimate 4. 9. a:) white noise process. i. A 'cT.[~(l)]' 2 - .3. Under the hypothesis Ha : 4 = 1. HO: 4 = 1.4) we obtain the OLS estimator of I$ in (9. These distributions..1 Testing the AR(1) Model without a Constant Term For the simple AR(I) model with t = 1.1) is too negative.1) as. The alternati-ve is that the series is stationary. a model with a constant term. and a model with a linear time trend.3.27) to test the above hypothesis. Thus.2.e.3. the null hypothesis is rejected only when n ( 4 . As a result. By the law of large numbers.6827 as n becomes large. in (9.4) were constructed by Dickey (1976) using the Monte Carlo method and reported in Fuller (1996.27) that when 1$1 fa. 1) random variable.I ) < 0 approaches . They are reproduced in Table F(a) of the appendix.For any given probability density function f(x).. The result folfows from (9.1) is clearly skewed to the left. In fact.3.. lV(1) is known to be an N(0. 641). When and the integration cannot be solved algebraically. Hence.2. The OLS estim_ator4 clearly underestimates the true value in this case.on fO.1) is really too negative. (9. which is simply The percentiles for the empirical distribution of n($ .e. The probability that a X2(1) random variable is_less than I is . (9.3. generate n independent uniform random variables XI.3.e.1 1) and that under Ho. much less than the rejection limit when the normal or the t-distribution is used.. .6827..190 CHAPTER 9 Testing for a Unit Root Thus.1) becomes a random walk model that can be written as Zr = a l + az+ * * ' Recall from (7. (9. we can test the hypothesis Ho : 4 = t$ = 1.. if we wish to calculate +o using either the normal or the t-distribution.3. i. Then compute . Because the denominator is always positive.5) < 1.4). thejrobability that 11(4 . fW(1)12 follows the chi-square distribution with one degree of freedom.2. p. . however..I]. i. and the limiting distribution of n(+-. x2(1). First. we can use the Monte Carlo method to approximate the integral by a sum in the following way.26) and (7.3.10).1) given in (9.3. we can no longer use these distributions. this sum converges to E[f(X)] as n becomes large. X. 1).3. (9. 11.I)(+ .2.-. It is clear that the large sample results of ( a .10). = $Z. = . The use of the assumed initial value Zo = 0 is purely for the convenience of deriving a limiting distribution.3. . To test whether the underlying process contains a unit root. which is the mean adjusted data set of the 71 annual cancer death rates of Pennsylvanians between 1930 and 2000. They are reproduced in Table G(a) of the appendix. We reject Ho if Tis too large negatively.2).. The nature of the distribution of T is similar to that of n ( 4 .1) sums of squares and produc9 when estimating the regression parameter in (9. we normally use only real data points..1)(4 .-d2 (11 .8) . + a. where Y.3. we often use only (11 .3 of Chapter 7. p. 
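The step-function process F_n(x) built from the partial sums of the a_t's is easy to examine by simulation: F_n(x)/sigma_a should have variance close to x and independent increments over nonoverlapping intervals, the two defining properties of the Wiener process used repeatedly below. The sketch that follows is my own numerical illustration of these facts.

import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 500, 5000, 1.0

# Many partial-sum paths Z_t = a_1 + ... + a_t; F_n(x) = Z_[nx] / sqrt(n).
a = rng.normal(scale=sigma, size=(reps, n))
Z = np.cumsum(a, axis=1)
for x in (0.25, 0.5, 1.0):
    Fnx = Z[:, int(n * x) - 1] / np.sqrt(n)
    print(f"x = {x:4.2f}: Var[F_n(x)] = {Fnx.var():.3f}  (theory: {sigma**2 * x:.3f})")

# Independent increments: F_n(1) - F_n(1/2) should be uncorrelated with F_n(1/2).
inc1 = Z[:, n // 2 - 1] / np.sqrt(n)
inc2 = (Z[:, -1] - Z[:, n // 2 - 1]) / np.sqrt(n)
print("corr of nonoverlapping increments:",
      round(np.corrcoef(inc1, inc2)[0, 1], 3))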
That is.9. ls the annual cancer death rate and is the mean of the series. we fit Series W5 with two alternative models.=I t=1 (Zt . .AWiththat in mind.1).3.P. .1 As shown in Table 7.1) is computed only for t = 2.3) can be repla_ced by (rt . Z.I) are exactly the same. we fit the following regression model 2. Now let us more rigorously test for a unit root.2.$2.3. As a result.~ )x -1 ID and i+. In actual data analysis. a random walk model and an AR(1) model with a large value of 4.1 1) and that $2 is a consistent estimator of 6.1) given in (9. The percentiles for the empiricaf distribution of T were also constructed by Dickey (1975) using the Monte Carlo method and reported in Fuller (1996. = .. (9.1) ' It can be easily seen from (9.3) that which follows from (9. .1) and n ( 4 . hence (9. To facilitate the test. EXAMPLE 9. we consider the series Z. the test statistic n(4 . 642).3 Testing for a Unit Root in the AR(1) Model 191 The commonly used t-statistic under Ho is where s4 = [ ~ ~ (]e ~ . p = E(Zt).05. (9. Thus.3.4.$1. we cannot reject the null hypothesis.2 Testing the AR(1) Model with a Constant Term The mean.3.. First.9) for t = I. which is not less than the 5% critical value -7. and we conclude that the underlying process for the data contains a unit root. + a. . where werecall from (3.95 given in Table G(a). of P is given by = (x'x)-'X'Y. Thus. we consider the foflowing model with a constant term Z. For a nonzero mean AR(1) process.1) = 70(. = a + 4Zi-.3.n.4) as where The OLS estimator. b. Ho : 4 = 1.I)(+ . we have ( n .7 for n = 50 or -7. . note that in terms of the matrix form of a standard regression model.1) = -1. Ho : 4 = 1. The t statistic is given by which again is not less than the 5% critical value -1. . the null hypothesis.9 for n = 100 given in Table F(a). Hence.985 .192 CHAPTER 9 Testing for a Unit Root The OLS regression equation is 4.16) that a = p(1 . .1) was assumed to be 0. 9. we can rewrite (9. and obtain the same conclusion that the underlying model contains a unit root. where theAvaluegiven in the parenthesis is the standard error of With n = 71. .3. of the AR(1) process given in (9. cannot be rejected. 9). It follows that .10). (9.it follows that under the null hypothesis we have (b Thus.8 ) = (x'x)-'X'Eand under Ho : # = 1.9).(9.5).3 Testing for a Unit Root in the AR(1) Model aP 193 where following Dickey And Fuller (1979).2. and (9. with a constant term related to the mean of the process.2.2.3.11). by (9.9. we use the subscript p in to denote that the estimate is based on (9. we have a = 0. Because .2. from (9. to denote that (9.194 CHAPTER 9 Testing for a Unit Root The corresponding t statistic is where from (7. with a constant term related to the mean of the series. Czt-1 cz:-1 1=1 t=l For the sampling distribution of T. were constructed by Dickey (1974) and reported in Fuller (1996.1 1) and (9. we first note that hence.9).u is used in T. ] l D . .2.3. . the-subscript . pp.. as and using (9. They are reproduced in Table F(b) and Table G(b). 541-642).13). The empirical percentiles of n ( 4 .~ ] . S$p = { "[o I] [ n ~ ~ l .19).9) and (9.3. ..3.10) Rewriting T. respectively.2. . is used.1) and T. we have that Again.3. again the null hypothesis. (9. It was shown in Example 9. the subscript t is used in 4. = cr + 4Z. One example is Z.13. Again.with the model Z. Ho : 4 = 1. The t statistic in this case becomes which is not less than the 5% critical value -2.. = (4.3. and T.to 1984.3.9. 
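The Monte Carlo construction described above is also the easiest way to see why normal or t critical values fail for this test. The sketch below simulates random walks, runs the no-constant regression of Z_t on Z_{t-1}, and reports P[n(phi_hat - 1) < 0], which should be near 0.68, together with the empirical 5th percentiles of n(phi_hat - 1) and of T, which should be close to the tabulated Dickey-Fuller values (roughly -8 and -1.95) rather than to normal-theory values.

import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 20000
stat_n, stat_t = np.empty(reps), np.empty(reps)

for r in range(reps):
    a = rng.normal(size=n)
    Z = np.cumsum(a)                         # random walk with Z_0 = 0
    y, x = Z[1:], Z[:-1]                     # regress Z_t on Z_{t-1}, no intercept
    phi = x @ y / (x @ x)
    resid = y - phi * x
    s2 = resid @ resid / (len(y) - 1)
    se = np.sqrt(s2 / (x @ x))
    stat_n[r] = n * (phi - 1.0)
    stat_t[r] = (phi - 1.0) / se

print("P[n(phi_hat - 1) < 0]   :", np.mean(stat_n < 0))        # about 0.68
print("5th pct of n(phi_hat-1) :", np.percentile(stat_n, 5))   # about -8
print("5th pct of T            :", np.percentile(stat_t, 5))   # about -1.95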
They F e reproduced in Table F(c) and Table G(c).3 Testing the AR(I) Model with a Linear Time Trend Sometimes a time series process may contain a deterministic linear trend.88 for n = 250 given in Table G(b).Thus. The empirical percentiles of the distributions of ? I ( + .3 Testing for a Unit Root in the AR(1) Model 195 EXAMPLE 9. the null hypothesis.7 for n = 100 and . tobacco production from January 1871.2. 9. 641-642).6841.2. denoted by #J.3.9 for n = 250 given in Table F(b).. Thus. Recall fiom Table 7.13. the 114 yearly observations of U. we will now try to fit the series that is presently denoted as 2. respectively.17) with a linear time trend is used. . EXAMPLE 9. the 71 annual cancer death rates of Pennsylvanians between 1930 and 2000.. Ho : # = 1. With ti = 114.3 Let us again consider Series W5. (9. cannot be rejected.1)/S& under the hypothesis that 4 = 1 and S = 0 were constructed by Dickey (1976) and reported in FulIer (1996.3.90 for 31 = 100 and -2.9143 . and we conclude that the underlying process for the data contains a unit root with a = 0. . . cannot be rejected.2 Let us consider Series W6.17) h The test statistic based on the OLS estimator of 4 . we have (ti .-1 + a.1 ) = 113(. we fit the foflowing regression model Z . To test for a unit root.S. Thus.-I + a. which is not less than the 5% critical value .I)(#.a. the implication is that the model may contain a deterministic component.3 that a random walk model with drift was used to fit the series: As shown in Section 4. = a + fit + #JZrll 4.1) and I.1) = -9. to denote that (9.. = a + St + 4Z. and its limiting distribution can be derived using_the same procedure as in previous cases.16) The OLS regression equation is where the_ values in the parentheses are the standard errors. . pp.1 that the underlying process for the series contains a unit root. Ho : 4 = 1. Not rejecting Ho : = 1 in (9.7 for n =: 50 or -20.3.is a general linear process . cannot be rejected. 2. however.3.3. is the Gaussian white noise process.9013 . and we conclude that the model for the series of annual cancer death rates contains a unit root.909 which is not less than the 5% critical value -19. the conclusion in this case is that the model contains a + unit root with or without a deterministic tern. In a preliminary analysis.181. i. Thus. + a. a random walk modef with drift could be used to fit the series.. the null hypothesis. Thus. It is important to note the difference between (9.Z.. More generally.3.cannot be rejected.3. For the following discussion.is a nonwhite noise stationary process. not rejecting the null hypothesis. Ho : 4 = 1.I)(+. The t statistic in this case becomes which is not less than the 5% critical value -3. where a. 9. depending on whether the intercept term is statistically significant.9) simply implies that the model contains a unit root because 4 = 1 also indicates that cu = 0. a2 . the constant term is statistically significant at the 10% level.17).B)Z.1) is reduced under the null hypothesis of a unit root to a random walk model (1 .45 for n = 100 given in Table G(c). =: a.50 for n = 50 or -3.e. we may have + + - where X. Now the value of the test statistic becomes ( n .1) = 70(. In (9. As shown in (9. we assume that X.3. again the null hypothesis..196 CHAPTER 9 Testing for a Unit Root The OLS regression equation is where the values in the pahrentheses are the standard errors. = a.5). Thus. Ho : 4 = 1. As shown in (9. 
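In practice the three regressions (no constant, constant, and constant plus linear trend) are fit by ordinary least squares and the resulting n(phi_hat - 1) and t-type statistics are referred to Tables F and G. A bare-bones sketch is given below; the helper name and the simulated placeholder series are mine, and the critical values still have to be looked up in the Dickey-Fuller tables rather than in normal tables.

import numpy as np
import statsmodels.api as sm

def df_regression(z, trend="c"):
    """OLS fit of Z_t on Z_{t-1} (plus 1 and t as requested); returns n(phi-1), t-stat."""
    z = np.asarray(z, float)
    y, zlag = z[1:], z[:-1]
    n = len(z)
    cols = [zlag]
    if trend in ("c", "ct"):
        cols.append(np.ones_like(zlag))      # constant term
    if trend == "ct":
        cols.append(np.arange(1, n))         # linear time trend
    X = np.column_stack(cols)
    fit = sm.OLS(y, X).fit()
    phi, se_phi = fit.params[0], fit.bse[0]
    return n * (phi - 1.0), (phi - 1.0) / se_phi

# Placeholder series: a pure random walk, so the unit-root null is true.
rng = np.random.default_rng(7)
z = np.cumsum(rng.normal(size=120))
for tr in ("n", "c", "ct"):
    print(tr, np.round(df_regression(z, tr), 2))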
in this case becomes the sum o f t independent and identically distributed random variables.1) = -6.17).6 for n = 100 given in Table F(c).3.9) and (9. . does not necessxily imply that ru = 0.4 Testing for a Unit Root in a More General Model Note that the ARfl) model without a constant term in (9. and Ej=ojl$jl < 00. we have and For the sampling distribution of (9.4.'@ = ~0.1) becomes Now. . Ho: # = 1. we note that by substitution (9.429. we can fit the following OLS regression and consider the estimator Under the null hypothesis. Hence.9.4 Testing for a Unit Root in a More General Model 03 197 CO where @ ( B ) = $j~i. To test for a unit root in this general case. 4. comparing (9. ~ o ( ~ ~ o @ j + l + = i )~ a ~f . for 1/12 I x < 2/11. Clearly. a stationary process.2.4.+ zo]/Ih+ 0. J=O I Thus.2. i o a i a .Yo &. + + P where Z. given in (9.1).2.(x)is given by % . Because it follows that the process Y. is stationary.198 CHAPTER 9 Testing for a Unit Root w where I = -~ .8) forx = 1. . a.61. and the initial conditions.4. Let for0 s x < l/n.2. I$(l)I < a.f.8) with (9. the nonstationq process 2.1) or (9.1) is actually the sum of a random walk.4) that under (9. -and i ori = .4.] 4.Yt . we see fiom (9. the limiting distribution of+H. Thus. Because [Y. for 2/n 5 x < 3/14 (9. = @(l)[al+ a2 + . .4.4.I-ua$(l) --+ ~(x)&.13) . % .4.10) and that Z@Z P ---+ 0 and [8:=Iz/~] ob P + (9. Thus.2.4.4 Testing for a Unit Root in a More General Model Specifically. which follows from (9. r-1 IX(.9.y= It is easily seen from (9. 0 and Because summing from I to rt we get Hence.2. (6) F.9) and (9.10) that 1 1 n . and /fi are the same. -= . we also have or.7) that the limiting distributions of ~ . equivalentlyy which is often known as the central limit theorem for stationary processes. It follows similarly from (9.3 / 2 e ~ . D 1 1 .Hn(l) fi - D 199 (9.10) ua$(l)W(l). .17).14) and (9.8) and (2. we can use the same Table F of the appendix for the significance test.~ variance ] of the estimator which follows from (9.$8.6. Here we note fiom (2..6. 4. where &$ and ?(I) are consistent estimates of u$ and y(l).3). following Phillips and Perron (1988). Hence.14). where yk is the kth autocovariance of the stationary process X.200 CHAPTER 9 Testing for a Unit Root It follows from (9. in our test. With the modified statistic (9.4.4. we can use the following modified statistic.15) that Note that zF the. Further note that u $ / [ L ~ = ~ is in tbe OLS regression in (9.4.4.4.9) that y ( B ) = u$@(B)$(B-') is the autocovariance generating function. Thus. we refer readers to their 1988 paper and to Hamilton f 1994).and where the +k's are the sample autocovariances of the OLS regression residuals. To test for a unit root.1 + j=1 Cqj~ztAj + a. P-1 q p . For more detailed discussion on the procedure of Phillips and Perron. testing for a unit root is equivalent to testing c$ = 1 in the model P. .c$) white noise process with ~ ( a ! < ) cu. because any stationary process can be approximated by an autoregressive model.4.~has ) roots lying outside the unit circle..B).3) can also be modified so that the same Table G of the appendix can be used for the significance test.. The value of m is chosen such that Iqkl is negligible for k > m.~B P .21) j= 1 Thus.4. Suppose that the series is generated from the A R ( p ) model where the a. Z. we assume that (B) = 6-1(B)(l . the t statistic from the OLS regression fitting in (9. is the Gaussian N(O.22) .9. = 4zrU1 where where = (Zt-j (9.4. 
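The "central limit theorem for stationary processes" invoked here says that n^{-1/2} * sum of X_t converges to a normal law with variance sigma_a^2 [psi(1)]^2, which equals the sum of all the autocovariances of X_t. A quick numerical check for an MA(1) process X_t = a_t + 0.5 a_{t-1} (my own choice of example, with sigma_a^2 = 1, so the long-run variance is (1.5)^2 = 2.25) is sketched below.

import numpy as np

rng = np.random.default_rng(3)
psi1, n, reps = 0.5, 2000, 5000

sums = np.empty(reps)
for r in range(reps):
    a = rng.normal(size=n + 1)
    x = a[1:] + psi1 * a[:-1]          # X_t = a_t + psi1 * a_{t-1}
    sums[r] = x.sum() / np.sqrt(n)     # n^{-1/2} * sum of X_t

lrv_theory = (1.0 + psi1) ** 2         # sigma_a^2 * psi(1)^2 with sigma_a^2 = 1
print("simulated variance of scaled sum:", round(sums.var(), 3))
print("theoretical long-run variance   :", lrv_theory)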
we can rewrite (9.4 201 Testing for a Unit Root in a More General Model In practice.@. Alternatively. Similarly..B)Zt = (Z! - - C V ( Z-~Zt-j-l) -~ = a. . and aP(B) = 1 Q ~ B .4. for any given accuracy.3) for G$.22) as . This assumption implies that . Dickey and Fuller (1979) suggest the use of a higher-order AR model to solve the problem.... BP.4. . we can use residual mean square error from the OLS regression fitting in (9. (9.Zt-j-l). where 6-1 (B) = ( 1 p l ~ .l f B ) ( l . . lvhich may contain a unit root. Tn terms of the matrix form.p P . 202 CHAPTER 9 Testing for a Unit Root The OLS estimator. of @ is given by = (x'x)-l~'~. + (X'X)-~X'E. (X'X)-l~'(~#l + E) = p under the null hypothesis of a unit root. Thus. we have Let We note that . = Because = (X'X)-~X'Y 8. it follows that P .P = (x'X)-~X'E. = AZ.2). = X.4.-l (B) is assumed to satisfy stationarity and the conditions given in (9.14) and (9. from (9.15).B ) Z .4.9. it is clear that under the null hypothesis we have ( 1 .4.4..4 Testing for a Unit Root in a More General Model 203 Hence. we have and . (9.21).27) where I ) @ ) = l/cp.Thus.4. From (9. = $(B)a. 7) and from (9. The qj's are parameters on the stationary regressors (AZt-l. we see that Table F(a) can be used when applying the statistic n($ . .4. from (9. on AZt-l./n 0.4.2.1)$(1). they can be tested using the standard t statistics provided in the regression output. Asymptotically.4.22) is the same as the standard asymptotic distribution for the OLS estimators obtained by regressing the differenced series AZ. .4. (9. .4. (9.4.30) to (9.4. . Applying (9. .YO Zo)a.4. . we have s:+p+l + P where we use (9. . we obtain the following interesting results: + 1. and (9.204 CHAF'TER 9 Testing for a Unit Root for j = 1. (p . equivalently.31) with (9. Hence. 2. following (9. AZ.26).1).4.4. .3. Similarly.7). . we see that the "t statistic" in this case is . or.4).28) and (9. the estimators of and 9's are independent as shown in (9. The parameter q5 relates to the unit root test.28).4.26).27).Furthermore.+l). . and (9. .28).4.3. Comparing (9. and asymptotically by (9.4.1 1) and that (5-1 . The Imiting distribution of their OLS estimators Gj's obtained from (9.301.30).29). (9.-. 3.4. .29). AZi-p+l. .4.32) resulting &om the use of a higher-order AR model are often referred to as the augmented Dickey-Fuller tests in the literature. . . The limiting distributions of (11 . and . .54. we note that under the null hypothesis (9.Ztll).4.. The OLS regression equation becomes .QI ..4..22) and the regression is computed for [IZ .22) forp = 3 with a constant. the Table G(a) can be used in this general case.p ) (# .e. where AZ.4 As an illustration.p ) ( $ . .p) observation:. More rigorously. Without using assumed initial values.I). 4. = p + X. we can show that a reasonable model for the series is an AR(2) after differencing. Based on the ACFs and PACPs of the original series and its differencing.4.0and Xt = $@)a.9. where the values in the parentheses are the standard errors. i. . .*B2))a.l ) $ ( l ) instead of tt($ . The tests such as (9.n. let us consider the 54 yearly advertising expenditures of the Lydia Pinkham data listed as Series W12 in the appendix. Similar results extend to the use of Tables F@)and (c) and G(b) and (c).~ ( B )=) a[1/(1 ~ . for the model with a constant tern and the model with a line_ar time trend.. Thus.4 Testing for a Unit Root in a More Genera1 Model 205 As a result. EXAMPLE 9. we can assume an AR(3) model and test for a unit root. 
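When the noise is approximated by a higher-order autoregression, the test is usually run in the differenced form, with lagged values of delta Z_t added as regressors; this is what most software calls the augmented Dickey-Fuller test. The sketch below uses the adfuller function from statsmodels (a real routine, though its defaults vary somewhat across versions) on a placeholder series generated with a unit root, with regression="c" for the constant-term case and maxlag fixing the number of lagged differences.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(11)
n = 200

# Placeholder: AR(2) dynamics in the differences plus a unit root, so H0 is true.
a = rng.normal(size=n)
dz = np.zeros(n)
for t in range(2, n):
    dz[t] = 0.4 * dz[t - 1] - 0.2 * dz[t - 2] + a[t]
z = np.cumsum(dz)

res = adfuller(z, maxlag=2, regression="c", autolag=None)
stat, pval, crit = res[0], res[1], res[4]
print(f"ADF tau statistic = {stat:.2f}, p-value = {pval:.3f}")
print("5% critical value :", round(crit["5%"], 2))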
Again we often use ( ~ t.1)+(1) if no initial values are assumed in Equation (9.4. To compute the test statistic. for Z. i.l)$( I ) are exactly the same.1)$(1) and n(# . . .B)Z. t = ( p -it.33) becomes (1 where p = &/(I mus..plB . we will fit the model with t -. . we fit (9. = (Z. respectively.31) and (9. = [ l / q ~ ~ .e. 3 for rt = 50 given in Table F(b). however. only that the series Zt is integrated of order d greater than or equal to 1. Equation (9.4.22) can also be written as where a = 4 . Ho : C#J = 1. see Said and Dickey (1985) or Phiuips and Penon (1988). For more references. Ho : = 1. among others. In such a case. 9. 3.34)becomes AZ. In such - . Recall that if 2. cannot be rejected. A nonstationary series may not be homogeneous and no differencing of any order will transform it into a stationary series.1. then its seasonal differencing is (1 . and we conclude that the process for yearly advertising expenditures can be approximated by an AR(3) model with a unit root. It is clear that Equation (9. we can also equivalently run the regression (9.5 Testing for a Unit Root in Seasonal Time Series Models As discussed in Chapter 8. = 2. A2Zt.4.For the AR(1) model. however.-I + a . (9. 1. To find out the order of required differencing.4. It is clear that with the general structure introduced in this section. the null hypothesis. The t statistic in this case becomes which is not less than the 5% critical value -2.34) and test Ho : a = 0 against HI : a < 0. Thus. is a seasonal time series with a seasonal period s. cannot be rejected. the test can be used for any general mixed ARMA model. in testing a unit root. + Some remarks are in order.BS)Z.206 CHAPTER 9 Testing for a Unit Root which is not less than the 5% critical value -13.93 for n = 50 given in Table G(b). = aZ. for most homogeneous nonstationary series the required order of differencing is rarely greater than 2.. If the null hypothesis is not rejected. A g i n the null hypolhesis. and so on until an order of integration is reached. Thus. a seasonal time series can also be nonstationary and requires a seasonal differencing to remove its nonstationarity. some other transformations such as those discussed in Chapter 4 may be necessary.. we conclude that the model contains a unit root.435) 2. In practice.-. we can repeat the above test on the series AZ. This conclusion implies. 2. . . A formal testing procedure is introduced in this section.. and their percentiles for various sample sizes and for s = 2..2 Testing the General Multiplicative Zero Mean Seasonal Model For the more general multiplicative seasonal model.. .i. &).6) is a nonlinear function of (a. Zo are initial conditions and the a.5. 9. are discussed by Dickey. we say that the associated time series model contains a seasonal unit root. 9.1 Testing the Simple Zero Mean Seasonal Model Consider the seasonal time series model (1 . are i. .(PBS)Z. The Studentized statistic for testing the null hypothesis Ho : @ = 1 is where and The sampling distributions of n ( 6 . in (9. Z. . is Gaussian.4pBP) = 0 lie outside of the unit circle.d. . -- . Zz-.1) and T. random variables with mean 0 and constant variance a:. The linear that the a. We note 4) where 4 = (+I. (9. and 12 are reproduced in Tables H and I. or..5. The OLS estimator of @ is given by which is also the maximum likelihood estimator when a. equivalently.9.5. and Fuller f1984).4. Hasza. consider where the roots of (1 . . + a. . = @Zt-.= a.B .5 Testing for a Unit Root in Seasonal Time Series Models 207 a case. . . 
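The remark that one tests Z_t first, then its first difference, then its second difference, and so on until the unit-root null is rejected, translates directly into a small loop. The sketch below again leans on adfuller and on simulated placeholder data that are integrated of order 1, so the loop should stop at d = 1; the 5% decision rule and the cap of two differences mirror the discussion above.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def order_of_integration(z, max_d=2, alpha=0.05):
    """Difference until the ADF test rejects a unit root at level alpha."""
    x = np.asarray(z, float)
    for d in range(max_d + 1):
        pval = adfuller(x, regression="c", autolag="AIC")[1]
        if pval < alpha:
            return d            # stationarity accepted after d differences
        x = np.diff(x)
    return max_d + 1            # still nonstationary; consider other transformations

rng = np.random.default_rng(5)
z = np.cumsum(rng.normal(size=150))          # an I(1) series
print("estimated order of integration d =", order_of_integration(z))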
under the nut1 hypothesis Ho : Q = 1.1) where Zl-.5. 5. The extension to the seasonal model with a nonzero mean is straightforward from the results of Section 9..5.1) and Tin (9. Equation (9.~to] obtain the OLS esti. where e. = a. . t$). Under the null hypothesis. Equations (95. Gp~~)~.B3Zt.3.208 CHAPTER 9 Testing for a Unit Root A h approximation of this function a. to obtain an initial estimator 4 of 4.(l. we have a = 0.-.-s.-l. where ASZ. I$).(@. we wiU reject Ho : @ = 1if the estimator of a = (@ . In other words. 6) --.8) and use Equation (9.. . 5 (1 .R. The models in (9.3 and 9.4. Regress ASZt on .ASZ. . To test the null hypothesis Ho : 0 = 1. then it implies that < 1 and the process is stationary. For most practical purposes-a purely seasonal model without any regular ARMA component is rare. using Taylor's expansion gives where Rt is the Taylor series remainder. Equivalently. Dickey. If' a is much less than 0.).5. Most of the time the question of a zero or nonzero constant and a zero or nonzero mean arises when one needs to decide the proper regular differencing discussed in Sections 9. in testing a unit root with a nonzero constant or a nonzero mean. AsZ.-2.and Fuller (1984) show that under Ho : @ = 1. we usually .5.+ p ) .I) is significantly negative.I) = a! and be used to test the null hypothesis No: <P = 1.5. . (@ .5. This estimator can be easily seen as consistent because under the null hypothesis Ho : @ = 1..5. A S Z ~ .) = 0.(@. 4) . . 4) .rPp . on [(I .6) becomes which is a stationary AR(p) model for the seasonally differenced series ASZI. The estimator of a =.5.6) and (9.7) to regress a. the percentiles of Tables H and I for n(@ . evaluated at (@.6) imply that E(Z.1) and (9.1) can mator of (0. . Hence.B $Z. H%sza. Compute a t ( l . . from (9.7) suggest the following two-step regression procedure: A 1.6). 2.1) are also applicable for the general multiplicative seasonal model in (9. 5 Testing for a Unit Root in Seasonal Time Series Models 209 apply the results given in Sections 9. beer productions fiom the fhst quarter of 1975 to the fourth quarter of 1982. where Y. we will use (rr .S. which is the mean adjusted data set of Series W10 of the 32 quarterly U.s)(@ . we refer the interested reader to Dickey.3 and 9.7608. Thus. . EXAMPLE 9. n ( 6 .S. of @ and its associated statistics presented earlier should be modified accordingly. Instead. Thus. . . The OLS regression becomes where the value in^ the parenthesis is the standard error. we will not present the extension of the purely seasonal mode1 with a nonzero constant here.s Consider the series.67 for n = 40 given in Table H. the estimator Q. The value of the test statistic becomes (tt .9014 . .4. .1) = (32 4)(. and we condude that the model for the series of quarterly beer productions contains a unit root at the seasonal period of 4. Hasza.9. is quarterly U. which we studied i n Example 8. the null hypothesis.. Thus.32. cannot be rejected. Thus. .4. with t < 1. and & available. The t statistic in this case becomes - . In actual data anatyzis. and In testing the hypothesis I& : Q.s)@ . With no knowledge of the underlying model.1) instead of although the limiting distributions of the two statistics are exactly the same. we can begin with the test of whether the model contains a unit root at the seasonal period.1) = -2. which is not less than the 5% critical value -8. Gms. . without assuming any initial values for Z. = I. Z. That is. when there are no good initial values of Zl-.. Z.I). . 
Ho : CD = 1.F.beer productions and is the mean of the series. we fit the regression model for t = 5. . and Fuller (1984). s = 4. = 1. and we conclude that the model for the first seasonal differences of quarterly beer production no longer contains unit roots.s)(& ..1) = (28 . (9. the null hypothesis. Ho : @ = 1. . TO see whether the second-order seasonal difference is required we can let bV. cannot be'rejected.is called for. The t statistic in this case becomes we have fta .0628 . which is much less than the 5% critical value -8.3.5.7 for n = 50 given in Table F(a).1) = (28 . for t = 5. (1 . Ho : iP = 1. No : Q. in the regression mode{ IK = #W-l + a. The t statistic in this case becomes which is also much less than the 5% critical value -137 for s = 4 and n = 40 given in Table I. and test the hypothesis. i n the regression model . = 1. : @ = 1.1)(. From the OLS regression results .. . which is much less than the 5% critical value -7.67 for s = 4 and n = 40 given in Table H.. where for convenience we reindex the 28 available differences starting from time 1. . again the null hypothesis. Thus. 32. Thus. To see whether the series also needs a regular differencing.87 for n = 40 given in Table I.4)(.'s. should be rejected.B ~ z .336 .1) = -22. and we should reject the null hypothesis.1 and test the hypothesis. Ho : 4 = 1.3 for n = 25 or -7.1) = -17. the null hypothesis. 28.928. The above results imply that the series requires one seasonal difFerencing of period 4. and we conclude that the model for the first seasonal differences of quarterly beer productions no longer contains a unit root at the seasonal period. for t = 1.2 20 CHAPTER 9 Testing for a Unit Root which is not less than the 5% critical value -1. The OLS regression results are The value of the test statistic becomes (n .14) . = (1 . with a total of 28 W. . Ho : 4 = 1.. we can apply the results in Section 9.B~)z. .I)($ . should be rejected. Thus.4928. The above test result suggests that a seasonal difference. 9.B~)z.3. Z = p + (@a. we reach the same conclusion. and the a.17).5. 9. (a) Asgme that the a. .i. = p +~.-~ + a.5.1) in testing a unit root for the model where -1 < 8 < 1. where zt = Z.. is anAR(1) process. . Z.zol$jI $(B) = $ j ~$0 ~=. 9. are white noise with mean 0 and variance (T:.17).3.p ) . I. Prove that # 0. (b) Test for a unit root on Series W6.3 Derive the limiting distribution of T. = #z. (b) Assume that xJ:i+j variables. The implication is that we can identify a stationary model for the quarterly differences.1).f l r ~ ( 0ui(l . personal consumption of gasoline and oil given in Exercise 8. are i. hence. u:) random D f i ( Z . 9. where yj is the jth autocovariance of the process Z. equivalently.6 Derive the limiting distribution of n(& .t(& . (0. for the model given in (9.2 Derive the limiting distribution of .p and 1#1 < 1.1. 9.11) for the model given in (9.. (1 .5 (a) Find the limiting distribution of n(4.1). 9.S.@)-l). 9.95 given in Table G(a).Exercises 21 1 which is also much less than the 5% critical value .1) for the model given in (9.1 Let Z.d. and < m. then f i ( z . .4 (a) Test for a unit root on Series W7. (b) Test for a unit root on Series W16. .7 Derive the limiting distribution of Tfor the model given in (9. Show that &=-Yj = u1[I&(1)l2. (c) Show that if Z.9 Perform unit root tests on Series W14 and determine the required differencing to make the series stationary.~iI&j xJzo or. 9. where z. 9.7 and determine the required differencing to make the series stationary. 
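For the simple purely seasonal case, the test is just an ordinary least squares regression of the (mean-adjusted) series on its own value s periods earlier, with the statistics (n - s)(Phi_hat - 1) and T referred to the Dickey-Hasza-Fuller percentiles in Tables H and I. A bare sketch with s = 4 and simulated placeholder data is given below; for the general multiplicative model, the two-step procedure described above would add the preliminary regression of the seasonally differenced series on its own lags before this step.

import numpy as np

def seasonal_unit_root_stats(z, s=4):
    """OLS fit of Z_t on Z_{t-s}; returns (n - s)*(Phi_hat - 1) and the t-type statistic."""
    z = np.asarray(z, float) - np.mean(z)      # mean-adjusted series
    y, x = z[s:], z[:-s]
    Phi = x @ y / (x @ x)
    resid = y - Phi * x
    s2 = resid @ resid / (len(y) - 1)
    se = np.sqrt(s2 / (x @ x))
    n = len(z)
    return (n - s) * (Phi - 1.0), (Phi - 1.0) / se

# Placeholder data: a seasonal random walk Z_t = Z_{t-4} + a_t, so H0 is true.
rng = np.random.default_rng(9)
a = rng.normal(size=60)
z = np.zeros(60)
for t in range(4, 60):
    z[t] = z[t - 4] + a[t]
print(np.round(seasonal_unit_root_stats(z, s=4), 2))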
..a) 3 ~ ( 0u~[4(1)]').8 Perform unit root tests on the series of U. 1975). 10. 1981). Intervention analysis has been successfully used to study the impact of air pollution control and economic policies (Box and Tiao. There are two common types of intervention variables.In this chapter. the impact of the New York blackout (Izenman and Zabell. The t-test. we introduce the technique. The method is then generalized to study the impact of the events when the timing of interventions is uoknown and hence leads to the general time series outlier analysis. called iritemention analysis. it is extremely sensitive to the violation of the independence assumption as shown by Box and Tiao (1965). to evaluate the effect of these external events. . by how much? One may initially think that the traditional two-sample t-test could be used to analyze this problem in terms of comparing the pre-intervention data with the post-intervention data. the intervention is a step function.Intervention Analysis and Outlier Detection Time series are frequently affected by certain external events such as holidays. however. is there any evidence of a change in the time series (such as the increase of the mean level) and. who developed the intervention analysis to study a time series structural change due to external events (Box and Tiao. if so. the impact of the Arab oil embargo (Montgomery and Weatherby. and many other events. We first discuss the analysis when the timing of the interventions is known. 1975). and other policy changes. sales promotions. We call these external events i~zterventions. 1980). One represents an intervention occurring at time T that remains in effect thereafter. Even though the t-test is known to be robust with respect to the normality assumption. assumes both normality and independence. strikes. That is.1 Intervention Models Given that a known intervention occurs at time T. the impact increases linearly without bound. For example. Note that the pulse function can be produced by differencing the step function s J ~That .1. If 6 = 1.1. where 0 5 6 5 1. respectively.3) and (10.5) and (10. an intervention model can be represented equally well with the step function or with the pulse function. we have 0 < S < I. An impact of an intervention is felt b periods after the intervention. A fixed unknown impact of an intervention is felt b periods after the intervention. For 6 = 0.B)$q. we may have the response .1. it is a pulse function.1 Intervention Models 213 The other one represents an intervention taking place at only one time period. plT) = . For illustrations.4). Therefore. we plot the above interventions with b = 1 and 0 < S < 1 in Figure 10. The use of a specific form is usually based on the convenience of interpretation. but the response is gradual. and the response is gradual. We illustrate some commonly encountered ones. Tfius. depending on the type of intervention. Thus. is. the impact is 2. For most cases. we have and for a pulse input.1. sI~ 1. There are many possible responses to the step and pulse interventions.10. (10. For a step input.Sig = (1 .6) reduce to (10.1. Note that various responses can be produced by different combinations of step and pulse inputs. and the effect of a price or a tax increase on imports may be represented in Figure 10. as represented in Figure 10. As mentioned-earlier.214 CHAPTER 10 Intervention Analysis and Outlier Detection I n J l u J Response UB S p WEpt(T) FIGURE: 10.~~IT).1 Responses t o step and pulse inputs. - S B ) ] P ~+) o. 
however.7) can also be written as This model is useful to represent the phenomenon in which an intervention produces a response that tapers off gradually but leaves a permanent residue effect in the system. The impact of an intervention such as advertising on sales can be represented as shown in Figure 10.1. (b) oo < Oand wl < 0.2 Response t o combined inputs. FIGURE 10.2(a).!$~! response (10. [woB/(l (a) w g > Oand w l > 0.2.2(b).B). because p133 = (1 . . light and power. As the result. It contains many business units. The main purpose of intervention models is to measure the effect of interventions. and the root outside the unit circle represents a phenomenon that has a gradual response. its stock becomes an important component in many investment portfolios. .j = 1.osBSand 6(B) = 1 . and water. before the date of intervention. Its business is regulated by federal and state authorities and normalfy is given an exclusive right to perform its services in a specific area. Field . appropriate modifications must be made to the modd and estimation and diagnostic checking repeated. then we can make appropriate inferences about the intervention. on the other hand. .wlB .. The polynomial S(B). 10. k are intervention variables. These intervention variabfes can be either step or pulse functions. : t < T } . they can be proper indicator variables. and its stock does not fluctuate in value as much as most common stocks. its revenue-generating ability is relatively stable. The unit root represents an impact that increases linearly. . a response may be represented as a rational function where a(B) .10.is usually identified using the univariate model identifleation procedure based on the time series 2.. and the weights mi's in the polynomial o(B) often represent the expected initial effects of the intervention. and its model is hence known as the noise model. telephone and telegraph services.2 and 10. otherwise.6. Thus. Therefore. If diagnostic checking of the model reveals no model inadequacy.. It generally pays a high rate of dividend in addition to offering moderate growth potential. where lit. measures the behavior of the permanent effect of the intervention. (2. The form W ~ ( B ) B ~ ~ / Sfor ~ ( the B ) jth intervention is postulated based on the expected form of the response given knowledge of the intervention. For multiple intervention inputs. The noise model [B(B)/!P(B)]~. .e. free from any competition. . The roots of 6 ( B ) = 0 are assumed to be on or outside the unit circle. Br are polynomials in B. More generally. .North Carolina..4. gas and gas transmission. as shown later in Examples 10. . the time series free of intervention is called the noise series and is denoted by N. the model in (10. with respect to the intervention variables lit. Duke Energy Corporation is an utility company headquartered in Charlotte.2 Examples of Intervention h a l y s i s EXAMPLE 10. For a nonstationary process. 2. b is the time delay for the intervention effect.10) normally does not contain a constant term Bo.1.= o o .2 Exampfes of Intervention Analysis 215 More generally. including Franchised Electric. Natural Gas Transmission.1 A utility company provides electricity. i. .aLB .. we have the following general class of models: . 2002. Pennsylvania. On July 9. 'There are 166 observations in total. In this example. . Like all other companies. about the true value of DUK stock between July 22.06. it engages in businesses not only in North Carolina but also other areas inside and outside the United States. 
a law firm located outside Philadelphia.3 Daily stock price of Duke Energy from January 3. The law firm tried to recover damages for the company's breaches of its fiduciary duties and violations of the federal securities laws on behalf of at1 investors who bought Duke Energy Corporation stock during the period. to August 31. Following the announcement. International Energy. corresponds to the 129th observation. 2002. we perform an intervention analysis to assess the effect of this classaction securities litigation announcement on Duke Energy Corporation's stock prices. and the closing price.3 shows Series W11. fiIed a compfaint in the United States District Court for the Southern District of New York against Duke Energy Corporation alleging that the company misled investors about its business and financial conditions and. 2002. 2002. that specializes in representing shareholders and consumers in many classaction litigations throughout the United States. Schiffrin & Barroway. The company's shares are traded in the NeivYork Stock Exchange as DUK. was the first trading day after the announcement. 1999. it issued statements fiom time to time about its business and financial conditions. is regarded as the noise series containing no major intervention that would affect the stock FIGURE 10. $29.21 6 CHAPTER 10 Intervention Analysis and Outlier Detection Services. LLP. Because the announcement was made late in the day and July 10. Figure 10. and July 9. of July 9. 2002. 2002.2002. and Other Operations. and August 31. the period between January 3.25 and closed at $29. and May 17. consequently.2002. the daily closing stock price of Duke Energy Corporation between January 3.2002.06 per share that day.2002. Duke Energy North America. the stock of Duke Energy Corporation fell $1. Through these units. Error .1 Sample ACF and PACF of Duke Energy Corporation stock prices price. The sample ACF and sample PACF of the original and the differenced noise series as shown in Table 10..2.B)N. t r 130 (July 10.2002). t < 130 (July 10. one could propose the response hnction where represents the impact of the litigation announcement and 0. = a.1) By assuming that the effect of the class-action litigation against the company is to cause an immediate level change in stock prices.10. (10.2 217 Examples of Intervention Analysis TABLE 10.2002).1 suggest the random walk model (1 . Estimate St. 1. The intervention model becomes The estimation results are Parameter . 2 18 CHAPTER 10 Internention Analysis and Outlier Detection The sample ACF for the residuals show no signs of model inadequacy. The problem comes from substances produced by chemical reactions in sunlight among some primary pollutants such as oxides of nitrogen and reactive hydrocarbons.4 shows the monthly average of hourly readings of O3 in parts per hundred million (pphm) in downtown Los Angeles from 1955 to 1972. To ease the air poliution problem. Tfius. after 1966 special regulations were implemented to require engine design changes in new cars in order to reduce the production of 03. One measured product that is indicative of the degree of this photochemical pollution is ozone. 1955-1972. and the estimate of the intervention parameter is highly significant. often denoted by 03. It was through the study of the effect of these events on the pollution problem that Box and Tiao (1975) introduced the intervention analysis. which causes such health hazards as eye irritation and lung damage. 
there is evidence that the announcement of the litigation against Duke Energy Corporation did cause a downward effect on its stockprices. FIGURE 10.4 Monthly average of hourly readings of O3 (pphm) in do~vntownLos Angeles. The products of these chemical reactions are responsible for the notorious Los Angeles smog.2 Los Angeles has been known to be plagued by a special air pollution problem. including the diversion of traffic in early 1960 by the opening of the Golden State Freeway and the inception of a new law (Rule 63) that reduced the allowable proportion of reactive hydrocarbons in the gasoline sold locally. Figure 10. . different methods were instituted. Also. EXAMPLE 10. Unfortunately. 1. no such data are available.. has significant spikes only at lags 1 and 12.to 1960 is assumed to be free of intervention effects and is used to estimate the noise model for. Estimation results of the above model are Parameter Estimate St.B'~)N. within this period suggests nonstationary and highly seasonal behavior.10. The sampleACE.2 Examples of Intervention Anaiysis 219 The period from 1955. t 2 January 1960.. Error . The effect of I2would be most accurately measured by the proportion of new cars having specified engine changes in the car population over time. the effect of I2 would be different in these two seasons. "winter" months November-May beginning in 1966. One might. {I' 0. The ACF of the seasonally differenced series (1 . Intervention I2 would be represented by the regulations implemented in 1966 requiring engine changes in new cars. represent the effect of I2 as an annual trend reflecting the effect of the increased proportion of new design vehicles in the population. "summer" months June-October beginning in 1966. otherwise. 1. which might be expected to produce a step change in the O3 level at the beginning of 1960. t<January1960. otherwise. 0. which implies the following noise model: Box and Tiao (1975) suggest that the opening of the Golden State Freeway and the implementation of Rule 63 in 1960 represent an intervention I. Because of the differences in the intensity of sunlight and other meteorological conditions between the summer months and the winter months. however.N. Thus. Box and Tiao (1975) proposed the model where Ilt 13' = = 0. Montgomery and Weatherby (1980) applied an intervention model to the natural logarithm of monthly electricity consumption from January 1951 to April 1977 using a total of 316 observations. there is a reduction of O3 level in the summer months but not in the winter months.2) X (O. To test the validity of this belief. Thus. Error Because the estimate of S1is not statistically significant. they used 275 months of data from January 1951 to November 1973 to model the noise series and obtained the folloiving ARZMA(0. one could propose the intervention model where oo represents the initial impact of the oil embargo and The estimates and the associated standard errors are Parameter Estimate St. there is evidence that (1) intervention l1 reduces the level of O. show no obvious inadequacies in the model.1. The prices of petroleum products rocketed. Montgomery and Weatherby dropped the parameter S1 in the foIlowing modified model: . and (2) associated with intervention 12.. 1. and the cries for energy conservation were loud and clear everywhere in the United States. EXAMPLE 10. Montgomery and Weatherby (1980) assumed that the effect of this embargo was not felt until December 1973.220 CHAPTER 10 Intervention Analysis and Outlier Detection Residuals of 6. Thus. 
EXAMPLE 10.3  The Arab oil embargo in November 1973 greatly affected the supply of petroleum products to the United States. The prices of petroleum products rocketed, and the cries for energy conservation were loud and clear everywhere in the United States. It is generally believed that the rate of growth of energy consumption following this event was lower in the post-embargo years. Montgomery and Weatherby (1980) applied an intervention model to the natural logarithm of monthly electricity consumption from January 1951 to April 1977, a total of 316 observations. Although the embargo started in November 1973, they assumed that its effect was not felt until December 1973. Thus, they used the 275 months of data from January 1951 to November 1973 to model the noise series and obtained a multiplicative seasonal ARIMA(0, 1, 2) x (0, 1, 1)_12 model for N_t.

By assuming that the effect of the embargo is to cause a gradual change in consumption, one could propose the intervention model

Z_t = [omega_0/(1 - delta_1 B)] I_t + N_t,

where omega_0 represents the initial impact of the oil embargo and

I_t = 0 for t < December 1973 and 1 for t >= December 1973.

The estimates and the associated standard errors are given in the accompanying table. Because the estimate of delta_1 is not statistically significant, Montgomery and Weatherby dropped the parameter delta_1 and fitted the modified model

Z_t = omega_0 I_t + N_t.

The estimates of this modified model are all statistically significant, and the residual ACF does not exhibit any model inadequacy. The result implies that the embargo induced a permanent level change in electricity consumption. Because the model was built using the natural logarithm of electricity consumption, the estimate of the intervention effect in terms of the original megawatt-hour (MWH) metric is e^(omega_0 hat) = e^(-.07) = .93. Thus, the post-intervention level of electricity consumption was 93% of the pre-intervention level; that is, the effect of the Arab oil embargo was to reduce electricity consumption by about 7%.

EXAMPLE 10.4  At exactly 5:27 P.M. on Wednesday, November 9, 1965, most of New York City was plunged into darkness by a massive power failure. The blackout was long, and most of the city remained dark during the night. On August 10, 1966, the New York Times carried a front-page article with the headline "Births Up 9 Months After the Blackout." Numerous articles were published afterward in newspapers and magazines both inside and outside the United States alleging a sharp increase in the city's birthrate, and a number of medical and demographic articles then appeared with contradictory statements regarding the blackout effect.

Izenman and Zabell (1981) applied intervention analysis to the phenomenon, using a total of 313 weekly birth totals in New York City from 1961 to 1966, which are plotted in Figure 10.5. Obstetrical and gynecological studies show that the mode of the gestational interval, defined as the time from the onset of the last menstrual period (LMP) to birth, occurs at 40 or 41 weeks after the LMP. Because the blackout, which occurred on November 9, 1965, falls in the middle of the 254th week, the first 254 weekly birth totals are used to model the noise series, giving the noise process reported in the text. Izenman and Zabell (1981) suggested the intervention form omega_0 I_t, where omega_0 represents the effect of the blackout and

I_t = 1 for t = 292, 293, 294, 295 and 0 otherwise;

that is, I_t takes the value 1 for weeks 38 through 41 after the blackout, to give the model the maximum possible chance of detecting an effect. Combining this term with the fitted noise process gives the intervention model in (10.2.12). The estimation results (estimates and associated standard errors) are given in the accompanying table, and the residual ACF shows no signs of model inadequacy. Because the parameter estimate of omega_0 is not statistically significant, we conclude that intervention analysis of the New York City birth data does not detect a significant increase in births that can be ascribed to the blackout. We note that the model estimated in (10.2.12) is slightly different from the model given in Izenman and Zabell (1981).

FIGURE 10.5  Total weekly births in New York City, 1961-1966.
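A quick arithmetic check of the log-scale conversion used in Example 10.3 is sketched below; the value of omega_0 hat is the one quoted above, and the calculation simply exponentiates a level-shift estimate obtained on the log scale.

```python
# Back-of-the-envelope check: a level shift omega_hat estimated on log(consumption)
# corresponds to a multiplicative effect exp(omega_hat) on the original MWH scale.
import math

omega_hat = -0.07
effect = math.exp(omega_hat)   # about 0.93
print(f"multiplicative effect: {effect:.2f} (about {100 * (1 - effect):.0f}% reduction)")
```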
10.3 Time Series Outliers

Time series observations are sometimes influenced by interruptive events, such as strikes, outbreaks of war, sudden political or economic crises, unexpected heat or cold waves, or even unnoticed errors of typing and recording. The consequences of these interruptive events are spurious observations that are inconsistent with the rest of the series. Such observations are usually referred to as outliers. When the timing and causes of interruptions are known, their effects can be accounted for by using the intervention model discussed in Sections 10.1 and 10.2. In practice, however, the timing of interruptive events is sometimes unknown. Because outliers are known to wreak havoc in data analysis, making the resulting inference unreliable or even invalid, it is important to have procedures that will detect and remove such outlier effects. The detection of time series outliers was first studied by Fox (1972), where two statistical models, additive and innovational, were introduced. Other references on this topic include Abraham and Box (1979), Martin (1980), Hillmer, Bell, and Tiao (1983), Chang and Tiao (1983), Tsay (1986), and Chang, Tiao, and Chen (1988).

10.3.1 Additive and Innovational Outliers

With no loss of generality, consider a zero-mean stationary process. Let Z_t be the observed series and X_t be the outlier-free series. Assume that X_t follows a general ARMA(p, q) model

phi(B) X_t = theta(B) a_t,

where phi(B) = 1 - phi_1 B - ... - phi_p B^p and theta(B) = 1 - theta_1 B - ... - theta_q B^q are stationary and invertible operators sharing no common factors, and {a_t} is a sequence of white noise variables, identically and independently distributed as N(0, sigma_a^2).

An additive outlier (AO) model is defined as

Z_t = X_t + omega I_t^(T),

whereas an innovational outlier (IO) model is defined as

Z_t = X_t + [theta(B)/phi(B)] omega I_t^(T),

where I_t^(T) is the indicator variable representing the presence or absence of an outlier at time T, i.e., I_t^(T) = 1 if t = T and 0 otherwise. Hence, an additive outlier affects only the Tth observation, Z_T, whereas an innovational outlier affects all observations Z_T, Z_(T+1), . . . beyond time T through the memory of the system described by theta(B)/phi(B). More generally, a time series might contain several, say k, outliers of different types, and we have the following general outlier model:

Z_t = sum over j = 1, ..., k of omega_j v_j(B) I_t^(T_j) + X_t,

where v_j(B) = 1 for an AO and v_j(B) = theta(B)/phi(B) for an IO at time t = T_j.

10.3.2 Estimation of the Outlier Effect When the Timing of the Outlier Is Known

To motivate the procedure for detecting AO and IO, we first consider the simpler case in which T and all the parameters of the model are known. Letting

pi(B) = phi(B)/theta(B) = 1 - pi_1 B - pi_2 B^2 - ...

and defining e_t = pi(B) Z_t, we have, from the AO and IO models,

AO:  e_t = omega pi(B) I_t^(T) + a_t,
IO:  e_t = omega I_t^(T) + a_t.
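The contrast between the two outlier types can be seen directly by simulation. The sketch below is illustrative only: the AR(1) process, the outlier time, and the magnitude omega are assumed values chosen for the demonstration, and for a pure AR(1) model the memory theta(B)/phi(B) reduces to 1/(1 - phi B).

```python
# Sketch: an AO perturbs a single observation; an IO propagates through psi(B).
import numpy as np

rng = np.random.default_rng(0)
n, T, phi, omega = 100, 50, 0.7, 5.0
a = rng.normal(0.0, 1.0, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + a[t]          # outlier-free AR(1) series X_t

I = np.zeros(n); I[T] = 1.0               # pulse indicator I_t^(T)
z_ao = x + omega * I                      # AO: only Z_T is shifted

psi = phi ** np.arange(n)                 # psi-weights of 1/(1 - phi B)
z_io = x + omega * np.convolve(I, psi)[:n]    # IO: pulse filtered through psi(B)

print("AO changes index:", np.flatnonzero(z_ao != x))                     # only T
print("IO changes indices:", np.flatnonzero(np.abs(z_io - x) > 1e-8)[:5]) # T, T+1, ...
```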
For n available observations, the AO model can be viewed as a regression of the residual series e_t on x_t = pi(B) I_t^(T). Let omega_AT hat denote the least squares estimator of omega for the AO model. From least squares theory,

omega_AT hat = tau^(-2) pi*(F) e_T = [e_T - sum over j = 1, ..., n - T of pi_j e_(T+j)] / tau^2,   with   Var(omega_AT hat) = sigma_a^2 / tau^2,

where F is the forward shift operator such that F e_t = e_(t+1), pi*(F) = 1 - pi_1 F - ... - pi_(n-T) F^(n-T), and tau^2 = sum over j = 0, ..., n - T of pi_j^2 with pi_0 = 1. Similarly, letting omega_IT hat be the least squares estimator of omega for the IO model, we have

omega_IT hat = e_T,   with   Var(omega_IT hat) = sigma_a^2.

Thus, the best estimate of the effect of an IO at time T is the residual e_T, whereas the best estimate of the effect of an AO is a linear combination of e_T, e_(T+1), . . . , e_n, with weights depending on the structure of the time series process. It is easily seen that Var(omega_AT hat) <= Var(omega_IT hat) = sigma_a^2, and in some cases Var(omega_AT hat) can be much smaller than sigma_a^2.

Various tests can be performed for the hypotheses H0: Z_T is neither an AO nor an IO, H1: Z_T is an AO, and H2: Z_T is an IO. The likelihood ratio test statistics for AO and IO are

lambda_(1,T) = omega_AT hat * tau / sigma_a   and   lambda_(2,T) = omega_IT hat / sigma_a.

Under the null hypothesis H0, both lambda_(1,T) and lambda_(2,T) are distributed as N(0, 1).

In practice, the time series parameters phi_j, theta_j, and sigma_a^2 are usually unknown and have to be estimated. It is known that the existence of outliers makes the parameter estimates seriously biased; in particular, sigma_a^2 will tend to be overestimated.

10.3.3 Detection of Outliers Using an Iterative Procedure

If T is unknown but the time series parameters are known, we can compute lambda_(1,t) and lambda_(2,t) for each t = 1, 2, . . . , n and make decisions based on these sampling results. In practice, however, the parameters are also unknown, and a time series may contain several outliers. Chang and Tiao (1983) proposed an iterative detection procedure, outlined in the following steps, to handle the situation when an unknown number of AOs or IOs may exist.

Step 1. Model the series {Z_t} by assuming that there are no outliers. Compute the residuals from the estimated model,

e_t hat = pi hat(B) Z_t,   where   pi hat(B) = phi hat(B)/theta hat(B),

and obtain an initial estimate sigma_a^2 hat from these residuals.

Step 2. Calculate lambda_(1,t) hat and lambda_(2,t) hat for each t = 1, 2, . . . , n, and define

lambda_T hat = max over t of { max( |lambda_(1,t) hat|, |lambda_(2,t) hat| ) },

where T denotes the time at which the maximum occurs. If lambda_T hat = |lambda_(1,T) hat| > C, where C is a predetermined positive constant usually taken to be some value between 3 and 4, then there is an AO at time T with its effect estimated by omega_AT hat; this effect is removed by modifying the data at time T, and new residuals are defined accordingly. If lambda_T hat = |lambda_(2,T) hat| > C, then there is an IO at time T with its effect estimated by omega_IT hat; this IO effect is removed by modifying the residual at time T. In either case, a revised estimate sigma_a^2 hat is computed from the modified residuals, while the initial estimates in pi hat(B) remain unchanged.

Step 3. Recompute lambda_(1,t) hat and lambda_(2,t) hat based on the modified residuals, and repeat Step 2 until all outliers are identified.

Step 4. Suppose that Step 3 terminates with k outliers tentatively identified at times T1, T2, . . . , Tk. Treat these times as if they were known, and estimate the outlier parameters omega_1, . . . , omega_k and the time series parameters simultaneously using the model

Z_t = sum over j = 1, ..., k of omega_j v_j(B) I_t^(T_j) + [theta(B)/phi(B)] a_t,

where v_j(B) = 1 for an AO and v_j(B) = theta(B)/phi(B) for an IO at t = T_j. A sketch of one sweep of the test statistics used in Step 2 is given below.
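The following sketch computes lambda_(1,t) and lambda_(2,t) for a single sweep, under the simplifying assumptions that the model is a known AR(1), so that pi(B) = 1 - phi B (hence pi_1 = phi and pi_j = 0 for j >= 2), and that sigma_a is known; the function name and the simulated data are illustrative, not the book's.

```python
# Sketch: one sweep of the AO/IO likelihood-ratio statistics for a known AR(1).
import numpy as np

def outlier_sweep(z, phi, sigma_a):
    n = len(z)
    e = np.empty(n)
    e[0] = z[0]                                # boundary approximation for the first residual
    e[1:] = z[1:] - phi * z[:-1]               # e_t = pi(B) Z_t
    lam_io = e / sigma_a                       # lambda_{2,t}
    lam_ao = np.zeros(n)
    for T in range(n):
        tau2 = 1.0 + (phi ** 2 if T < n - 1 else 0.0)
        w_ao = (e[T] - (phi * e[T + 1] if T < n - 1 else 0.0)) / tau2
        lam_ao[T] = w_ao * np.sqrt(tau2) / sigma_a   # lambda_{1,t}
    T = int(np.argmax(np.maximum(np.abs(lam_ao), np.abs(lam_io))))
    return T, lam_ao[T], lam_io[T]

rng = np.random.default_rng(1)
z = np.zeros(120)
for t in range(1, 120):
    z[t] = 0.6 * z[t - 1] + rng.normal()
z[60] += 6.0                                   # plant an additive outlier
print(outlier_sweep(z, phi=0.6, sigma_a=1.0))  # compare |lambda| against C in [3, 4]
```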
Steps 2 through 4 are repeated until all outliers are identified and their impacts are simultaneously estimated. A computer program based on a modification of the above procedure was written by Bell (1983), and both the time series software SCA (1992) and AUTOBOX have implemented the procedure, making the analysis easier.

10.4 Examples of Outlier Analysis

The above outlier detection procedure can be easily implemented in any existing intervention analysis software or linear regression package. We illustrate the following examples using AUTOBOX and a standard regression package.

EXAMPLE 10.5  The outlier detection procedure discussed in Section 10.3 has been applied, using the software AUTOBOX, to the U.S. quarterly series of beer production between 1975 and 1982, which was fitted earlier by a seasonal ARIMA model in Chapter 8. The result indicates that no outliers are evident in the series at the .05 significance level. To check the efficiency of the proposed procedure, we artificially contaminated the series by replacing the original observation Z12 = 36.54 with the new observation Z12 = 56.54, which could arise, for example, from a typing error; the contaminated series is plotted in Figure 10.6. Applying the procedure to this outlier-contaminated series produces a table of detected outliers (iteration, time, type, and magnitude omega hat). The procedure correctly identifies the AO at t = 12 in the first iteration; although another observation is also flagged as an IO in the second iteration, its effect is much smaller.

FIGURE 10.6  Outlier-contaminated U.S. quarterly series of beer production between 1975 and 1982.

EXAMPLE 10.6  Series W1 was analyzed in Sections 6.2 and 7.6, resulting in an AR(1) model. The series is the daily average number of defects per truck found in the final inspection at the end of the assembly line of a truck manufacturing plant. To ensure the maintenance of quality, the detection of outliers is always an important task in quality control; for a production process under proper quality control, one would expect the series of defects to behave like random white noise. In this example, we consider an outlier model for the series that allows for both additive and innovational outliers.
Applying the outlier detection procedure to the data set, we obtain a set of detected outliers (the iteration, time, and type of each are reported in the accompanying table). Simultaneous estimation of the model parameters and the effects of the detected outliers yields the fitted outlier model reported in the text. Comparing this fit with the model estimated without accounting for outliers, we see a substantial reduction in the estimated variance sigma_a^2 hat of the a_t's when the effects of the four outliers are taken into account. The change in the estimate of the autoregressive parameter is also substantial, decreasing from .43 to a much smaller value. A few more outliers would be identified if a smaller critical value C were chosen.

After outliers are identified, one can adjust the data for their effects and pursue the analysis based on the adjusted data. A more fruitful approach, however, is to search for the causes of the identified outliers and to fine-tune the fitted model accordingly. In searching for the causes of an outlier, one may find the nature of the disturbance; for example, some outliers may turn out to be important intervention variables arising from policy changes with which the analyst was unfamiliar and that were therefore overlooked during the preliminary stage of data analysis.

In addition to the additive outlier (AO) and the innovational outlier (IO), the following level shift (LS) and temporary change (TC) can also be incorporated in the iterative outlier detection procedure discussed in Section 10.3:

LS:  Z_t = X_t + [1/(1 - B)] omega I_t^(T),
TC:  Z_t = X_t + [1/(1 - delta B)] omega I_t^(T).

The four response patterns are contrasted in the sketch below. For empirical examples that contain possible LS and TC situations, we refer readers to Chen and Liu (1991, 1993). Other than the model-based iterative detection procedure, many other time series outlier detection procedures have been introduced in the literature, including Tsay (1988) and Ljung (1993), among others; Lee and Wei (1995) proposed a model-independent outlier detection procedure.
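The sketch below generates the response of the observed series to a single pulse omega I_t^(T) under the four outlier types; the IO response is shown for an AR(1) memory, and the values of omega, delta, and phi are arbitrary illustrative choices.

```python
# Sketch: dynamic response to a unit pulse under AO, IO (AR(1) memory), LS, and TC.
import numpy as np

h = np.arange(0, 10)                      # lags after the outlier time T
omega, delta, phi = 1.0, 0.7, 0.5

ao = np.where(h == 0, omega, 0.0)                 # one-time spike
io = omega * phi ** h                             # filtered by psi_j = phi^j
ls = np.full(h.shape, omega, dtype=float)         # permanent level shift
tc = omega * delta ** h                           # decays back toward zero

for name, resp in [("AO", ao), ("IO", io), ("LS", ls), ("TC", tc)]:
    print(name, np.round(resp, 3))
```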
10.5 Model Identification in the Presence of Outliers

As indicated earlier, intervention analysis is used when the timing and causes of interruptions are known. Because the timing and causes of interruptions are sometimes unknown, outlier detection is needed, and, as the following example shows, identified outliers may themselves point to intervention events.

EXAMPLE 10.7  As an example, consider Series W14, the monthly airline passengers in the United States from January 1995 to March 2002, plotted in Figure 10.7. Without looking at the plot and blindly applying the outlier detection method introduced in this chapter to the series, we obtain a list of detected outliers. If we use a significance level less than .01, the only outlier found is the observation at time 81, which corresponds to September 2001, the month of the World Trade Center tragedy in New York City and clearly an intervention event. The outlier procedure not only detects the event but also suggests the form of the intervention.

FIGURE 10.7  The monthly airline passengers in the United States from January 1995 to March 2002 (Series W14).

Standard time series modeling of the subseries from January 1995 to August 2001 suggests the seasonal ARIMA(2, 0, 0) x (0, 1, 0)_12 model given in (10.5.3). Thus, we combine this model with the information about the observation at time 81 in the intervention model containing the term omega_0 I_t, where

I_t = 0 for t < 81 (before September 2001) and 1 for t >= 81 (September 2001 and after).

The estimation results (parameter estimates and standard errors) are given in the accompanying table. The effect of the September 11, 2001, tragedy on the airline industry is clearly devastating. Thus, instead of using data adjusted by removing the effects of outliers, the analyst should incorporate the information into the model by introducing proper intervention variables and response functions as discussed in Sections 10.1 and 10.2. This explicit form of a combined intervention-outlier model is usually more useful in forecasting and control than a univariate model fitted to the outlier-adjusted data.

The iterative outlier detection procedure is based on the assumption that the underlying model for the outlier-free series is either known or can be identified. In practice, the underlying model is usually unknown and has to be identified through the standard analysis of sample statistics such as the sample ACF, sample PACF, and ESACF. It should be emphasized, however, that the detection procedure of Section 10.3 works well only when the outlier effect is moderate and does not overshadow and obscure the underlying pattern contained in the sample statistics of the outlier-free series. In more serious cases, outlier contamination may make model identification impossible.

For example, as shown in Table 6.3, the blowfly data exhibit a very clear autocorrelation pattern of an AR(1) process; the estimate of the AR(1) parameter was found to be .73 in Chapter 7. The data are listed as Series W3 in the appendix. Now suppose that, due to a typing error, the value 22221 was used for Z20 instead of 2221. Figure 10.8 shows the series contaminated by this outlier at Z20. Its sample ACF, sample PACF, and ESACF, computed and shown in Table 10.2, all indicate a white noise phenomenon: the underlying AR(1) characteristics have been completely washed out by this single outlier.

TABLE 10.2  Sample ACF, sample PACF, and ESACF for an outlier-contaminated AR(1) series.

FIGURE 10.8  Outlier-contaminated blowfly data.

A white noise series is itself an outlier in empirical time series. Thus, when encountering a white noise phenomenon in studying sample statistics such as the sample ACF, sample PACF, and ESACF, one should first examine whether there is an obvious outlier in the data. Such an outlier can often be detected simply by plotting the series, which is always the first aid for any data and outlier analysis.

To reduce the adverse effect of outliers on parameter estimation, a number of robust estimation methods have been introduced in the literature. In terms of estimation of the ACF, Chan and Wei (1992) introduced the alpha-trimmed sample autocorrelation function (TACF). Let Z(1) <= Z(2) <= ... <= Z(n) be the ordered observations of the given time series Z1, . . . , Zn, and define

delta_t = 0 if Z_t <= Z(g) or Z_t >= Z(n - g + 1), and delta_t = 1 otherwise,

where g is the integer part of alpha*n and 0 <= alpha <= .5. The alpha-trimmed sample autocorrelation function rho_alpha hat(k) is then computed from the sample ACF formula with each product term weighted by delta_t delta_(t+k), so that observations flagged as extreme are excluded from the computation.
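A plausible implementation of such an alpha-trimmed sample ACF is sketched below. Because the displayed formula did not survive extraction, the exact weighting and normalization here are an assumption in the spirit of Chan and Wei (1992), and the function name and test data are illustrative.

```python
# Sketch: alpha-trimmed sample ACF; values among the g = [alpha*n] smallest or largest
# order statistics get delta_t = 0 and are excluded from the cross products.
import numpy as np

def trimmed_acf(z, max_lag, alpha=0.02):
    z = np.asarray(z, dtype=float)
    n = len(z)
    g = int(alpha * n)
    order = np.sort(z)
    if g > 0:
        lo, hi = order[g], order[n - g - 1]
        delta = ((z > lo) & (z < hi)).astype(float)
    else:
        delta = np.ones(n)
    zc = z - z[delta == 1].mean()                  # center on the trimmed mean
    denom = np.sum(delta * zc ** 2)
    acf = []
    for k in range(1, max_lag + 1):
        num = np.sum(delta[:-k] * delta[k:] * zc[:-k] * zc[k:])
        acf.append(num / denom)
    return np.array(acf)

rng = np.random.default_rng(7)
x = np.zeros(200)
for t in range(1, 200):
    x[t] = 0.7 * x[t - 1] + rng.normal()
x[100] += 25.0                                     # a large additive outlier
print(np.round(trimmed_acf(x, 5, alpha=0.02), 2))  # AR(1) decay is largely preserved
```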
The TACF thus trims roughly alpha*100% of the most extreme values from the computation of the sample ACF. Because {delta_t} is a deterministic sequence of zeros and ones, the asymptotic theory for time series containing amplitude-modulated observations can be applied to the TACF. Following Dunsmuir and Robinson (1981), under the white noise model it can be shown that the rho_alpha hat(k) for k >= 1 are asymptotically independent normal random variables with asymptotic variances 1/v(k), where v(k) is the effective number of product terms retained at lag k; the standard error of the TACF can therefore be approximated by 1/sqrt(v(k)). Note that v(k) < n when alpha > 0, i.e., the standard error of rho_alpha hat(k) exceeds that of the ordinary rho hat(k) under the white noise model, which is expected because the information in the trimmed series is always less than that in the original series under the outlier-free situation.

The choice of alpha is very important for the TACF. If we choose a large value of alpha, the TACF is no longer an efficient estimator, because v(k) becomes smaller as alpha increases. On the other hand, if we select a very small value of alpha, the purpose of the estimator, namely to trim out possible outliers in the series, might not be fully fulfilled. Because, in practice, we may not know whether outliers exist, we recommend using alpha = 1 to 2% for general series, alpha = 3 to 5% for moderately contaminated series, and alpha = 6 to 10% for heavily contaminated series. Chan and Wei (1992) show that, in estimating the ACF, the simple TACF not only compares favorably with other robust estimators in terms of root mean squared error but also performs well in the presence of both additive and innovational outliers. Other robust estimators of the ACF include the jackknife estimator proposed by Quenouille (1949) and the robust sample partial autocorrelation and autocorrelation functions of Masarotto (1987). In these studies, the proposed robust ACF estimators outperform the standard sample ACF, and for proper model identification the use of robust estimates of autocorrelations is recommended. For other useful discussions of robust methods and model identification in the presence of outliers, we refer readers to Tsay (1986), Abraham and Chuang (1989), Chen and Liu (1991, 1993), Wei and Wei (1998), and Li (2003).

In closing the chapter, we note that intervention analysis is a useful technique for studying the impacts of interventions and outliers, which normally cause a level shift of a time series. On occasion, however, the shift occurs in the variance of a series, and different techniques are needed to study this problem. Because of the length of the chapter, we do not discuss this interesting problem and instead refer readers to Wichern, Miller, and Hsu (1976) and Abraham and Wei (1984).

Exercises

10.1 Consider response functions of interventions of the forms omega_0 I_t^(T), [omega_0/(1 - B)] I_t^(T), [omega_0/(1 - delta B)] I_t^(T), and [omega_0/((1 - delta B)(1 - B))] I_t^(T). (a) Plot the response functions. (b) Discuss possible applications of the various interventions.

10.2 Find a time series that was affected by some external events. Carry out an intervention analysis and submit a written report of your analysis.

10.3 Perform and report an iterative outlier analysis for the time series given in the text (read across).

10.4 Find a time series of your interest that was likely contaminated by some outliers or interventions. (a) Fit a best ARIMA model for the series. (b) Detect all possible outlier and intervention variables, and build an outlier-intervention model for the series. (c) Compare and discuss your findings in parts (a) and (b).

10.5 (a) Find a time series model for the data given in Exercise 10.3 using the sample ACF. (b) Find a time series model for the same data using the 5% trimmed sample autocorrelation function. (c) Compare and comment on your results in parts (a) and (b).
CHAPTER 11  Fourier Analysis

The time series approach presented in earlier chapters, which uses functions such as autocorrelations and partial autocorrelations to study the evolution of a time series through parametric models, is known as time domain analysis. An alternative approach, which tries to describe the fluctuation of a time series in terms of sinusoidal behavior at various frequencies, is known as frequency domain analysis. In this chapter, we introduce some basic concepts of Fourier analysis; with a proper understanding of these techniques, we will be ready for the frequency domain approach to time series analysis in Chapters 12 and 13.

11.1 General Concepts

Often it is more convenient to represent a function by a set of elementary functions, called a basis, such that all functions under study can be written as linear combinations of the elementary functions in the basis. One very useful set of these elementary functions involves the sines and cosines or complex exponentials. The study of this subject is often referred to as Fourier analysis, after the 18th-century French mathematician J. B. J. Fourier, who claimed in 1807 that any periodic function could be represented as a series of harmonically related sinusoids. Later he also obtained a representation for nonperiodic functions as weighted integrals of sinusoids that are not all harmonically related. The development of the techniques of Fourier analysis has a long history involving many mathematicians in addition to Fourier, including L. Euler, D. Bernoulli, J. L. Lagrange, P. S. Laplace, and P. L. Dirichlet, who studied such problems as the physical motion of a vibrating string and heat propagation and diffusion. Although the original work of Fourier analysis was concerned solely with phenomena in continuous time, the essential ideas carry over to phenomena in discrete time. In this chapter, we concentrate on Fourier analysis of discrete-time functions, or sequences, and study how to construct an arbitrary function from these sinusoids. Specifically, we establish the orthogonality property of sine-cosine functions and of complex exponentials in Section 11.2. Using this property, we develop the Fourier series representations of finite sequences and of periodic sequences in Sections 11.3 and 11.4. The Fourier transform of an arbitrary nonperiodic sequence is discussed in Section 11.5. The Fourier representation of a continuous function is presented in Section 11.6. The fast Fourier transform, used in the actual computation of the Fourier representation, is presented in Section 11.7.

11.2 Orthogonal Functions

Let phi_k(t) and phi_j(t) be complex-valued functions defined on a domain D, a subset of the real line. Discrete-time functions (also called sequences) phi_k(t) and phi_j(t) defined on a discrete set are said to be orthogonal if

sum over t of phi_k(t) phi_j*(t) = 0 for k not equal to j,

where phi_j*(t) denotes the complex conjugate of phi_j(t). Continuous-time functions phi_k(t) and phi_j(t) defined on an interval of the real line are orthogonal if the integral of phi_k(t) phi_j*(t) over the interval is zero for k not equal to j. There are many kinds of orthogonal functions; in this book we concentrate on discrete-time functions, and in this section we introduce two equivalent systems of orthogonal functions that are useful in time series analysis.

Suppose that the trigonometric sine and cosine functions sin(2*pi*k*t/n) and cos(2*pi*k*t/n) are defined on a finite number of n points, t = 1, 2, . . . , n, and consider the system

{ cos(2*pi*k*t/n), k = 0, 1, . . . , [n/2];  sin(2*pi*k*t/n), k = 1, . . . , [(n - 1)/2] },   (11.2.3)

where [x] is the greatest integer less than or equal to x. If n is odd, then [n/2] = (n - 1)/2, and the system contains cos(2*pi*0*t/n) = 1 together with sin(2*pi*k*t/n) and cos(2*pi*k*t/n) for k = 1, . . . , (n - 1)/2. If n is even, [n/2] = n/2, and the system also contains cos(2*pi*(n/2)*t/n) = cos(pi*t) = (-1)^t; the corresponding sine function sin(pi*t) is identically zero and is excluded. In either case, the system contains exactly n functions, none of which is identically zero.

Next, we show that the system (11.2.3) is a collection of orthogonal functions. To do so, we use the Euler relation

e^(i*omega) = cos(omega) + i sin(omega),

together with the identities cos(omega) = (e^(i*omega) + e^(-i*omega))/2 and sin(omega) = (e^(i*omega) - e^(-i*omega))/(2i). Summing the geometric series sum over t = 1, ..., n of e^(i*omega*t) and taking real and imaginary parts gives, for omega not equal to 0,

sum over t of cos(omega*t) = cos(omega*(n + 1)/2) * sin(omega*n/2) / sin(omega/2),
sum over t of sin(omega*t) = sin(omega*(n + 1)/2) * sin(omega*n/2) / sin(omega/2).

Letting omega = 2*pi*k/n, these results imply that

sum over t = 1, ..., n of cos(2*pi*k*t/n) = 0 for k not a multiple of n, and = n for k a multiple of n (including k = 0),
sum over t = 1, ..., n of sin(2*pi*k*t/n) = 0 for all k.

Using the trigonometric identities cos(omega)cos(lambda) = [cos(omega + lambda) + cos(omega - lambda)]/2, sin(omega)sin(lambda) = [cos(omega - lambda) - cos(omega + lambda)]/2, and sin(omega)cos(lambda) = [sin(omega + lambda) + sin(omega - lambda)]/2, we then obtain, for k, j = 0, 1, . . . , [n/2],

sum over t of cos(2*pi*k*t/n) cos(2*pi*j*t/n) = 0 if k not equal to j;  n/2 if k = j not equal to 0 or n/2;  n if k = j = 0 or n/2 (n even);
sum over t of sin(2*pi*k*t/n) sin(2*pi*j*t/n) = 0 if k not equal to j;  n/2 if k = j not equal to 0 or n/2;  0 if k = j = 0 or n/2 (n even);
sum over t of cos(2*pi*k*t/n) sin(2*pi*j*t/n) = 0 for all k and j.

This result shows that the system (11.2.3) is a set of orthogonal functions.
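These orthogonality relations are easy to verify numerically; the short check below uses n = 12 and a few representative frequency pairs (the choice of n and of the pairs is arbitrary).

```python
# Quick numerical check of the sine-cosine orthogonality relations on t = 1, ..., n.
import numpy as np

n = 12
t = np.arange(1, n + 1)
c = lambda k: np.cos(2 * np.pi * k * t / n)
s = lambda k: np.sin(2 * np.pi * k * t / n)

print(round(np.sum(c(2) * c(3)), 10))   # 0      (different frequencies)
print(round(np.sum(c(2) * s(2)), 10))   # 0      (cosine vs sine)
print(round(np.sum(c(2) * c(2)), 10))   # 6  = n/2
print(round(np.sum(s(2) * s(2)), 10))   # 6  = n/2
print(round(np.sum(c(0) * c(0)), 10))   # 12 = n  (k = 0)
print(round(np.sum(c(6) * c(6)), 10))   # 12 = n  (k = n/2, n even)
```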
The system (11.2.3) can also be represented in complex form. In terms of this representation, the corresponding system contains the following complex exponentials:

e^(i*2*pi*k*t/n):  -(n/2) + 1 <= k <= n/2 if n is even, and -(n - 1)/2 <= k <= (n - 1)/2 if n is odd,

which again consists of exactly n functions. To show the orthogonal property of these complex exponentials, we note that

sum over t = 1, ..., n of e^(i*2*pi*k*t/n) [e^(i*2*pi*j*t/n)]* = sum over t of e^(i*2*pi*(k - j)*t/n) = n if k = j, and 0 if k not equal to j,

where we use that (e^(i*2*pi*k/n))^n = e^(i*2*pi*k) = 1. Hence, the system of complex exponentials is also orthogonal. This form is much simpler and more compact than the sine-cosine system and is very useful in certain applications.

11.3 Fourier Representation of Finite Sequences

In vector analysis, for a given n-dimensional space we often construct a set of vectors called a basis so that every vector in the space can be written as a linear combination of the elements of the basis; any set of n orthogonal vectors forms such a basis. Let Z1, Z2, . . . , Zn be a sequence of n numbers. This sequence can be regarded as the coordinates of a point in an n-dimensional space, and hence it can be written as a linear combination of the n orthogonal trigonometric functions in the system (11.2.3):

Z_t = sum over k = 0, ..., [n/2] of ( a_k cos(omega_k t) + b_k sin(omega_k t) ),  t = 1, 2, . . . , n,   (11.3.1)

where omega_k = 2*pi*k/n. These frequencies are called the Fourier frequencies, the a_k and b_k are called the Fourier coefficients, and Equation (11.3.1) is called the Fourier series of the sequence Z_t. Using the orthogonality of the trigonometric functions, we obtain the a_k and b_k by multiplying both sides of (11.3.1) by cos(2*pi*k*t/n) or sin(2*pi*k*t/n) and summing over t = 1, 2, . . . , n (to avoid possible confusion, the reader can replace the index k in (11.3.1) by j in this operation). This gives

a_0 = (1/n) sum over t of Z_t,
a_k = (2/n) sum over t of Z_t cos(omega_k t),  k = 1, . . . , [(n - 1)/2],
a_(n/2) = (1/n) sum over t of (-1)^t Z_t,  if n is even,
b_k = (2/n) sum over t of Z_t sin(omega_k t),  k = 1, . . . , [(n - 1)/2].

Using the system of complex exponentials, we can also write the Fourier series of Z_t as

Z_t = sum over k of c_k e^(i*omega_k t),

where the sum runs over k = -(n/2) + 1, . . . , n/2 if n is even and over k = -(n - 1)/2, . . . , (n - 1)/2 if n is odd,
and where the Fourier coefficients c_k are given by

c_k = (1/n) sum over t = 1, ..., n of Z_t e^(-i*omega_k t).

The Fourier coefficients a_k, b_k, and c_k are easily seen to be related as

c_0 = a_0,  c_k = (a_k - i b_k)/2,  c_(-k) = (a_k + i b_k)/2,  and  c_(n/2) = a_(n/2) (n even).

The coefficient c_0 = a_0 = (1/n) sum over t of Z_t is often referred to as the d.c. value, the constant average value of the sequence. The above material shows that any finite sequence can be written as a linear combination of the sine-cosine sequences or of the complex exponentials.

11.4 Fourier Representation of Periodic Sequences

A general function f(t) is said to be periodic with period P if there exists a positive constant P such that

f(t + P) = f(t),  for all t.   (11.4.1)

The smallest positive value of P for which (11.4.1) holds is called the fundamental period, or simply the period, of the function. Obviously, a function that is periodic with period P is also periodic with periods 2P, 3P, and so on. A fundamental property of a periodic function is that it is uniquely determined by its pattern over a range of one period; outside that range the function is just a repetition of the pattern in that range.

Suppose that the sequence (or discrete-time function) Z_t is periodic with period n, where n is a positive integer. Following the results for the Fourier series of a finite sequence in Section 11.3, with omega_k = 2*pi*k/n, we can write Z_t, for t = 1, . . . , n, as the linear combination of the orthogonal sine and cosine functions

Z_t = sum over k = 0, ..., [n/2] of ( a_k cos(omega_k t) + b_k sin(omega_k t) ),

or, equivalently, as the linear combination of complex exponentials

Z_t = sum over k of c_k e^(i*omega_k t),

where the coefficients a_k, b_k, and c_k are given by the formulas of Section 11.3 and are related as above. It is easy to see that the sine-cosine functions sin(omega_k t) and cos(omega_k t) and the complex exponentials e^(i*omega_k t) are themselves periodic with period n, so these representations are valid for all integers t; in other words, a periodic sequence with period n is uniquely defined by its values at t = 1, 2, . . . , n.
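The exact recovery of a finite (or one period of a periodic) sequence from its Fourier coefficients can be verified with a few lines of numpy. The sketch below uses np.fft.fft, whose indexing starts at t = 0 rather than t = 1, so the coefficients agree with the c_k above only up to that indexing convention; the sample values are arbitrary.

```python
# Sketch: Fourier coefficients of a finite sequence and exact reconstruction.
import numpy as np

z = np.array([4.0, 2.0, 7.0, 1.0, 3.0])
n = len(z)
k = np.arange(n)
c = np.fft.fft(z) / n                              # c_k = (1/n) sum Z_t e^{-i 2 pi k t / n}

t = np.arange(n).reshape(-1, 1)
recon = (c * np.exp(2j * np.pi * k * t / n)).sum(axis=1)
print(np.allclose(recon.real, z))                  # True: the Fourier series reproduces Z_t
print(abs(c[0]), z.mean())                         # c_0 equals the average (d.c.) value
```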
For a given periodic sequence Z_t of period n, the energy associated with the sequence in one period is defined as sum over t = 1, ..., n of Z_t^2. Multiplying both sides of the Fourier series by Z_t, summing from t = 1 to t = n, and using the orthogonality relations, we obtain

sum over t of Z_t^2 = n a_0^2 + (n/2) sum over k = 1, ..., [(n-1)/2] of (a_k^2 + b_k^2) + n a_(n/2)^2 (the last term appearing only when n is even)
                    = n sum over k of |c_k|^2,

where |c_k|^2 = c_k c_k*. This equation is known as Parseval's relation for Fourier series.

The terms for k = +j and k = -j in the representation both have frequency j(2*pi/n) and are referred to as the jth harmonic components; in particular, the terms for k = 1 and k = -1 both have fundamental period n (and hence fundamental frequency 2*pi/n) and are collectively referred to as the first harmonic components. All the terms in the Fourier series representation have frequencies that are multiples of the same fundamental frequency; hence, they are harmonically related.

Because a periodic sequence repeats indefinitely, its total energy over the whole time horizon t = 0, +-1, +-2, . . . is infinite. Hence, we consider instead its energy per unit time,

(1/n) sum over t = 1, ..., n of Z_t^2 = sum over k of |c_k|^2,

which is called the power of the sequence. We can interpret the quantity f_k = |c_k|^2 as the contribution to the total power from the term in the Fourier series of Z_t at the kth frequency omega_k = 2*pi*k/n. The quantity f_k plotted as a function of omega_k, as shown in Figure 11.1, is called the power spectrum and describes how the total power is distributed over the various frequency components of the sequence Z_t. As noted above, the jth harmonic components include the terms for both k = +j and k = -j, since they correspond to the same frequency j(2*pi/n).

FIGURE 11.1  Power spectrum.

EXAMPLE 11.1  Let Z1 = 1, Z2 = 2, Z3 = 3, and Z_(t+3j) = Z_t for j = +-1, +-2, . . . , so that Z_t is periodic with period 3, as shown in Figure 11.2. Clearly, n = 3, which is odd, and [n/2] = 1. Hence, the Fourier representation (11.4.3) becomes

Z_t = a_0 + a_1 cos(2*pi*t/3) + b_1 sin(2*pi*t/3),

where the Fourier coefficients are obtained through (11.4.4) and (11.4.5):

FIGURE 11.2  Periodic sequence in Example 11.1.
a_0 = (1/3)(1 + 2 + 3) = 2,
a_1 = (2/3)[1 cos(2*pi/3) + 2 cos(4*pi/3) + 3 cos(2*pi)] = 1,
b_1 = (2/3)[1 sin(2*pi/3) + 2 sin(4*pi/3) + 3 sin(2*pi)] = -.5773503.

That is,

Z_t = 2 + cos(2*pi*t/3) - .5773503 sin(2*pi*t/3),  t = 1, 2, 3.

Similarly, using (11.3.4), we can also represent Z_t in terms of complex exponentials as

Z_t = 2 + (1/2)(1 + .5773503 i) e^(i*2*pi*t/3) + (1/2)(1 - .5773503 i) e^(-i*2*pi*t/3).

The coefficients a_k, b_k, and c_k are clearly related as shown in (11.3.5). Thus, the power spectrum of the sequence is given by

f_k = 4 for k = 0,  f_k = (1/4)[1 + (.5773503)^2] = 1/3 for k = +-1,  and f_k = 0 otherwise.

The plot is given in Figure 11.3.

FIGURE 11.3  Power spectrum of Example 11.1.

11.5 Fourier Representation of Nonperiodic Sequences: The Discrete-Time Fourier Transform

Consider a general nonperiodic sequence, or discrete-time function, Z_t of finite duration such that Z_t = 0 for |t| > M. Let n = 2M + 1 and define a new function Y_t that equals Z_t for -(n - 1)/2 <= t <= (n - 1)/2 and is extended periodically with period n outside this interval. Using (11.4.6), we can express Y_t in a Fourier series

Y_t = sum over k of c_k e^(i*omega_k t),   with   c_k = (1/n) sum over t of Y_t e^(-i*omega_k t).

Now let

f(omega) = (1/(2*pi)) sum over t = -infinity, ..., infinity of Z_t e^(-i*omega*t).

Then c_k = f(k*Delta_omega) Delta_omega, where Delta_omega = 2*pi/n is the spacing of the sample frequencies, and

Y_t = sum over k of f(k*Delta_omega) e^(i*k*Delta_omega*t) Delta_omega.

Each term in this summation represents the area of a rectangle of height f(k*Delta_omega) e^(i*k*Delta_omega*t) and width Delta_omega. As n goes to infinity, Y_t goes to Z_t and Delta_omega goes to 0, and the limit of the summation becomes an integral; because the summation is carried out over n consecutive frequencies of total width 2*pi, the interval of integration always has width 2*pi. Therefore,

Z_t = integral from -pi to pi of f(omega) e^(i*omega*t) d omega   and   f(omega) = (1/(2*pi)) sum over t of Z_t e^(-i*omega*t).

The function f(omega) is usually referred to as the (discrete-time) Fourier transform of Z_t, and Z_t given by the integral is often referred to as the (discrete-time) inverse Fourier transform of f(omega); they form a Fourier transform pair. Because f(omega) e^(i*omega*t), as a function of omega, is periodic with period 2*pi, the interval of integration can be taken as any interval of length 2*pi; in this book we take it to be [-pi, pi]. Some books define the Fourier transform without the factor 1/(2*pi), or place that factor in the inverse transform instead; these modified definitions lead to equivalent Fourier transform pairs. For our purposes, we use the Fourier transform pair as given above.

The inverse Fourier transform represents the sequence Z_t as a linear combination of complex sinusoids infinitesimally close in frequency with amplitudes f(omega) d omega. Thus, the quantity |f(omega)|, as a function of omega, is often referred to as the spectrum or amplitude spectrum of the sequence, as it provides information on how Z_t is composed of complex sinusoids at different frequencies. The energy associated with the sequence is given by the following Parseval's relation:

sum over t of Z_t^2 = 2*pi integral from -pi to pi of f*(omega) f(omega) d omega = 2*pi integral from -pi to pi of |f(omega)|^2 d omega.

Parseval's relation thus relates energy in the time domain to energy in the frequency domain: the energy may be determined either by computing the energy per unit time, Z_t^2, and summing over all time, or by computing the energy per unit frequency, 2*pi*|f(omega)|^2, and integrating over all frequencies. Hence, g(omega) = 2*pi*|f(omega)|^2, as a function of omega, is also referred to as the energy spectrum or the energy spectral density function.

In the above construction, Z_t was assumed to be of arbitrary finite duration. In fact, the Fourier transform pair remains valid for a general nonperiodic sequence of infinite duration, provided the infinite summation defining f(omega) converges. A condition on Z_t that guarantees this convergence is that the sequence {Z_t} be absolutely summable, i.e., sum over t of |Z_t| < infinity. The theory also holds when Z_t is merely square summable, sum over t of Z_t^2 < infinity; absolute summability implies square summability, but the converse is not true. The proof of the result under the weaker condition depends on an alternative presentation of the Fourier transform, which may be found in other books, and is omitted here.
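Parseval's relation for the discrete-time transform is easy to confirm numerically; in the sketch below, the short symmetric sequence and the size of the frequency grid are arbitrary choices made for the illustration.

```python
# Sketch: evaluate f(w) of a finite-duration sequence on a grid and check
#   sum_t Z_t^2 = 2*pi * integral over [-pi, pi] of |f(w)|^2 dw.
import numpy as np

z = np.array([1.0, 2.0, 3.0, 2.0, 1.0])     # Z_t for t = -2, ..., 2, zero elsewhere
tt = np.arange(-2, 3)
w = np.linspace(-np.pi, np.pi, 4001)

f = (z[None, :] * np.exp(-1j * w[:, None] * tt[None, :])).sum(axis=1) / (2 * np.pi)

energy_time = np.sum(z ** 2)
energy_freq = 2 * np.pi * np.trapz(np.abs(f) ** 2, w)
print(energy_time, round(energy_freq, 6))   # the two energies agree
```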
Important differences, however, exist between the frequency domain properties of periodic and nonperiodic sequences:

1. The energy over the whole time horizon t = 0, +-1, +-2, . . . for periodic sequences is infinite; hence, we study their properties in terms of the power spectrum over a finite set of harmonically related frequencies. The energy over the whole time horizon for nonperiodic sequences is finite, as guaranteed by the summability condition above; hence, we describe their properties in terms of the energy spectrum over a continuum of frequencies.

2. The spectrum frequencies of periodic sequences are harmonically related and form a finite discrete set; the resulting spectra are hence sometimes referred to as line spectra. The spectrum frequencies of nonperiodic sequences form a continuum of frequencies.

EXAMPLE 11.2  Consider the two-sided decaying sequence Z_t = (1/2)^|t|, which is graphed in Figure 11.4. Now

f(omega) = (1/(2*pi)) sum over t of (1/2)^|t| e^(-i*omega*t) = (1/(2*pi)) [ 1 + 2 sum over t >= 1 of (1/2)^t cos(omega*t) ] = 3 / [2*pi*(5 - 4 cos omega)],  0 <= omega <= pi.

The function f(omega) is graphed in Figure 11.5. It shows that the spectrum is a positive and symmetric function dominated by low frequencies.

FIGURE 11.4  Nonperiodic sequence in Example 11.2.
FIGURE 11.5  The spectrum of the sequence in Example 11.2.

EXAMPLE 11.3  Let the sequence be the symmetric oscillating sequence Z_t = (-1/2)^|t|, shown in Figure 11.6. Following the same arguments as in Example 11.2, we obtain

f(omega) = 3 / [2*pi*(5 + 4 cos omega)],  0 <= omega <= pi.

The spectrum f(omega), shown in Figure 11.7, is again positive and symmetric; however, it is dominated by high frequencies. This phenomenon is typical for a sequence that tends to oscillate rapidly.

FIGURE 11.6  Symmetric oscillating sequence in Example 11.3.
FIGURE 11.7  The spectrum of the oscillating sequence in Example 11.3.

EXAMPLE 11.4  In this example, we find the sequence Z_t corresponding to a given spectrum. For the spectrum

f(omega) = (1 + cos omega) / (2*pi),  -pi <= omega <= pi,

we can calculate the corresponding sequence by using the inverse Fourier transform given in (11.5.10):

Z_t = integral from -pi to pi of f(omega) e^(i*omega*t) d omega = (1/(2*pi)) integral from -pi to pi of (1 + cos omega) e^(i*omega*t) d omega,

which gives

Z_t = 1 for t = 0,  Z_t = 1/2 for t = +-1,  and Z_t = 0 otherwise.
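The closed-form spectrum in Example 11.2 can be checked against a truncated numerical transform, as sketched below; the truncation point M and the small set of evaluation frequencies are assumptions made for the demonstration, and the code uses the sequence Z_t = phi^|t| with phi = 1/2 as reconstructed above.

```python
# Sketch: compare a truncated numerical Fourier transform of Z_t = phi^|t| with
#   f(w) = (1 - phi^2) / (2*pi*(1 - 2*phi*cos(w) + phi^2)),
# which for phi = 0.5 equals 3 / (2*pi*(5 - 4*cos(w))).
import numpy as np

phi, M = 0.5, 200                              # phi^M is negligible, so truncation is safe
t = np.arange(-M, M + 1)
z = phi ** np.abs(t)
w = np.linspace(0, np.pi, 5)

numeric = (z[None, :] * np.exp(-1j * w[:, None] * t[None, :])).sum(axis=1).real / (2 * np.pi)
closed = (1 - phi ** 2) / (2 * np.pi * (1 - 2 * phi * np.cos(w) + phi ** 2))
print(np.round(numeric, 6))
print(np.round(closed, 6))                     # the two agree; the spectrum peaks at w = 0
```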
That is.6.12) represents the area of a rectangle of height H(kwo) and width wo. and where we note that in the limit the quantity kwo becomes a continuous variable. and The function F ( w ) in (1 1.13) is called the inverse Fourier transform or inverse Fourier integral . oo +0.14) is called the Fourier transform or Fourier integral of f(t). The Parseval's relation for the Fourier transform pair in (1 1. f (t)is absolutely integrable. 1 1. also known as Fourier transforms: . Z.3.7 The Fast Fourier Transform Because.6.e. Although the results are valid for some functions of infinite duration. The following Dirichlet conditions provide a useful set of sufficient conditions of the existence of T ( w ) .258 CHAPTER 11 Fourier Analysis of F ( o ) . f(t) has only a finite number of maxima and minima and a finite number of discontinuities in any finite interval. we use (1 1.4) to calculate the following Fourier coefficients. . Zz. the following alternative forms of the Fourier transform pairs.6. we are given a sequence of 11 values Z1. by the same arguments as in Section 11. and may also be found in other books.13) and (11..f(#)is assumed to be of finite duration. .14)can be easily shown to be In the above construction. f ($1df < m. . in actual time series analysis. not all functions of infinite durations have a Fourier transform.They form a Fourier transform (integral) pak Similarly.5. 2.. i. Dirichlet Conditions for Nanperiodic Functions lz 1 1 1. require a complex multiplication. .1) become where . to simplify the notations we let + + . then the direct computation requires 202. strictly speaking. e-'"tf is equal to 44 and does not. there is a corresponding storage problem. the required number of operations is clearly substantial.1) in the form we see that for each k. For example. To illustrate the algorithm. an efficient algorithm. we consider an even value of n. When JZ is large. . -n/2 2.).112) using (11. Hence.. the direct computation of the whole set {q : k = -n/2 1.7. = 22t-1represent the odd-indexed values of (2. .1 1. . Additionally. Major contributions to the method include Good (1958) and Cooley and Tukey (1965).500 complex multiplications and additions and storage locations. Hence. Furthermore. this computation may become formidable if not impossible. known as the fast Fourier transform 0. the n-point Fourier transforms in (11.1) requires approximately 1z2 complex multiplications and additions. .= Z2' represent the even-indexed values of {Z..7 The Fast Fourier Transform 259 where we assume that fa is even and recall that wk = 2mkl1~. and we let X.} and Y. the computation of ck requires approximately n complex multiplications and additions if we ignore that for some values o f t and k. .Rewriting (1 1. Then. if n =. 450. was developed to compute these Fourier transforms.7.0.7. For large values of n. Consequently.6).7.7. The total number of required complex multiplications and additions for the overall computation is seen to be approximately equal to .4) is f If n/2 is also even.5) and (11. then we can continue this process for r iterations and end with a set of 2-point Fourier transforms.7. Hence. we only need to calculate fk and gk for k = 1. r1/2.7.and Y. . 2.7.7.6) are clearly the (n/2)-point Fourier transforms of X.7. then by repeating the same process.5) and (1 1. the required total number of operations for the overall computation leading to ck's is + If n = 2'with r being a prime number..7. in Equations (11. . Equation (11. 
..260 CHAPTER 11 Fourier Analysis Note that fk and gk in (11. the total number of complex multiplications and additions required in the computation of Equation (11. . The direct computation of these fi and g k through (1 1. Then the linear combination of these two (n/2)-point Fourier transforms through (11.6) requires approximately 2(n/212 complex multiplications and additions.4) implies that the n-point Fourier transforms can be calculated in terms of the linear combination of the simpler Fourier transforms of length n/2.4ikgk to fk. we simply follow the same logic as described in the previous paragraph and note that it is equal to n/2 2(n/4)2. each of the (nJ2)-point Fourier transforms fk and gk in (11.6) can be computed via two (ti/4)-point Fourier transforms. Hence.5) and (11.5) and (11. Because we have Thus.7.4) requires rt complex multiplications corresponding to multiplyin gk by A? and also n complex additions corresponding to adding the product .7. respectively.7. To find the number of required operations to compute each of the (n/2)-point transforms via the (11/4)-point transforms. the FEPr is an iterative algorithm that uses the properties of trigonometric functions and complex exponentials in reducing the computation of n-point Fourier transforms to simpler transforms.e.2 Consider the following finite sequence: Find the Fourier representation of the sequence.1) requires approximately 4. then the transforms will be calculated at the frequencies wk = 2ak/m rather than at o k = 2mk/n. we illustrated the algorithm when n = 2'. rarely of the form 2'. Tukey [1967j. the actual length of a sequence is. . the 2-point Fourier transforms in the final iteration are actually obtained only by simple addition and subtraction. some authors have recommended that the unpadded portion of a modified sequence be "tapered" to smooth the transition from nonzero to zero values (see. say n = nln2. Because most available computer software for fast Fourier transforms is based on the form n = 2' introduced in this section.194. e.then the direct computation using (1 1. It is clear that for a given frequency wk these appended zeros will not affect the nurneticaf values'of the Fourier transforms. We leave as an exercise the simple case when n is the product of two integers. For example. . whereas the fast Fourier transform requires only 22. Godfrey [1974]. we can always write .e-i2n/2 = -1. r2. i. if n = 2048 = 211.Exercises 261 Clearly. where rl. we will not go into the method with other forms of n. In this section. we know Erorn number theory that an integer can always be written as the product of prime numbers. AIthough n is not necessarily of the f o m n = 2'. we refer readers to Bloomfield (2000). because h2 --. To attenuate potential undesirable effects. The reduction-ismore than 99%.g.304 operations.. In essence. Furthermore. .. for a large value of 11. After we add zeros to increase the length of a sequence from n to rn (say). rm are prime numbers not necessarily equal to 2. . always add zeros to the sequence (known as padding) until it reaches the required form. It is clear that the FTT for this form of n can be similarly developed.528 operations. One can.7. For other cases. of course. however. In practice. however.1 Prove the identity 11. the reduction in the required operations from n2 to n fog2ti is very substantial. and Brillinger [1975]). 11. 11. Z2 = 3.3 Find the Fourier representation for each of the following sequences: if 1 5 t 5 m. Show that Zt is periodicwith periodp. 
(c) f(t) '= sin 4t.9 Find the Fourier representation for each of the foIfawing functions: (a) f ( t ) = 1. 3 . 11... = tt .3 5 m.6 Consider the following sequence: (a) Is the sequence absolutely summable? (b) Find the Fourier transform of 2. (c) Plot and discuss the spectrum of the sequence. (b) f ( t ) = cos +t.5 Let Z1 = 4. a. (b) Z... 4 and j = 3 1.Y.cos 0 cos 2w (a) f (0. (b) Find the Fourier representation of 2. the function is called the convolution of gfx) and h(x). 11. = 11.. Z3 = 2. for 1 5 t 5 11..6. in terms of complex exponentials. and Zt+4j= Z.4 Consider 2. 11. . Z4 = I . 118 For absolutely integrable functions g ( x ) and h(x) defined on the real line. (c) Find and plot the power spectrum of 2. (a) Find the Fourier representation of Z. for t = 1.3) (c) f ( 4= + = ? 297 1 + cos 40 2rr ' -?r TT -rr 5 (O 5 0. Let X. and define 2. (a) Z r = i f m < t r n. -?r 5 t 5 IT. f2. m 2). andf ( t ) = 0 otherwise. .10 Show the Parseval's relation given in Equation (11. 11.2. and Yt be absolutely summable sequences. {i: + Xke x p ( .t .. = X.i ( 2 n k t ) l p ) . where #is not an integer.262 CHAPTER 11 Fourier Analysis 11. . where 4 is not an integer..7 Find the corresponding sequence Z. for each of the following spectra: 1 . is the convolution of the spectrums for Xi and Y. Show that the spectrum of 2.17). such that provided that f(t) is continuous at t = to.e. ' / ~ -m < t < m. (b) Find the Fourier transfom pair of a delta function. ..12 Discuss the fast Fourier transform when the number of observations n is the product of two integers 111 and 112.11 A generalized function S ( t ) represents a sequence of functions 6. ri = nl n2. yields a Dirac's delta function.(t) for ti = 1.Exercises 263 11. The function S(t) is also called Dirac's delta function. (a) Show that the sequence s.(t) = ( ~ l / a )efnf2. . . i. . 2. 11. we are now ready to study the frequency domain approach of time series analysis. More generally. sin 0 = 0. Then.sin ok. the spectrum of a stationary process is simply the Fourier transform of the absolutely summable autocovariance function of the process. from the results in Section 11.11) equals where we have used the properties that yk = Y-R.k ) = .1. After developing the general spectral theory. The sequence yk can be recovered from f(o) through the inverse Fourier transform 264 .Spectral Theory of Stationary Processes With the proper background of the Fourier transform discussed in Chapter 11. be a real-valued stationary process with absolutely summable autocovariance sequence yk. such as the spectrum of ARMA models. In fact. Related questions of the effect of Iinear filters and aliasing are also addressed. sin a ( . and cos w(-k) = cos ok. the Fourier transform of yk exists and by Equation (11.1 The Spectrum and Its Properties Let 2. 12. a stationary process can always be represented by a spectral distribution function.5. we examine the spectrum of some common processes.1 The Spectrum 12.5. The function f(o) has the following important properties. I 1, f ( o ) is a continuous real-valued nonnegative function, i.e., f(o)/= f(wf. Thus, from the discussion following Equation (11.5.15) in Section 11.5, f(w) is also referred to as the spectrum of the autocovariance sequence yk or the corresponding stationary process Z,. It is obvious that f ( o ) is a continuous real-valued function. To show it is nonnegative, we note that yk, as an autocovariance function, is positive semidefinite, i.e., for any real numbers ci, cj and any integers kj and kj. 
In particular, let ci cj = cos wj, ki = i, and kj = j. We have n n C CY(i-j) cos mi cos oj r O. i=l j=] Likewise, n n 2 j=l2y(i-j) sin wi sin w j r 0. Thus, n n y(i-j) [cos oi cos oj i-1 j-1 + sin wi sin oj] 2 j=1 cY(i-j) cos w(i - j) n = R .. - = 2 k=-(n-1) where k = (i - j). Therefore, (n - Ikl)yk cos ok = cos oi, 266 CHAPTER 1 2 Spectral Theory of Stationary Processes Now, that ys; is absolutely sumable implies that for any given E N > 0 such that > 0, we can choose an Thus, for n > N, we have Clearly, for any fixed N, Because E is arbitrary, it follows that Hence, I because yk cos wkl5l ykl.Therefore, by (12.1.7) and (12.1.Q 1 f (o) = 2?r k 2= 4 yk cos ok CQ 2. f ( o ) = f(o f 2v) and hence f ( o ) is periodic with period 27r. Furthermore, because f ( o ) = f(-o), i.e., f ( o ) is a symmetric even function, its graph is normally presented only for 0 5 w 5 rr. 267 12.1 The Spectrum TI Frequency (0 to rr) FIGURE 12.1 An example of a spectrum. 3. From (12.1.4),we have TI -77 which shows that the spectrum f(w) may be interpreted as the decomposition of the variance of a process. The term f(o)dw is the contribution to the variance attributable to the component of the process with frequencies in the interval (o, o do). A peak in the spectrum indicates an important contribution to the variance from the components at frequencies in the corresponding interval. For example, Figure 12.1 shows that the components near frequency ooare of greatest importance and the high-frequency components near rr are of Little importance. 4. Equations (12.1.1) and (12.1.4) imply that the spectrum f(w) and the autocovariance sequence yk form a Fourier transform pair, with one being uniquely determined from the other. Hence, the time domain approach and the frequency domain approach are theoretically equivalent. The reason to consider both approaches is that there are some occasions when one approach is preferable to the other for presentation or interpretation. + 12.1.2 The Spectral Representation of Autocovariance Functions: The Spectral Distribution Function Note that the spectral representation of yk given in (12.1.1) and (12.1.4)holds only for an absolutely summable autocovariance function. More generally, for a given autocovariance function yk, we can always have its specttal representation in terms of the following Fourier-Stieltjes integral: yk = j:aimk dF(o), (12.1.11) 268 CHAPTER 12 Spectral Theory of Stationary Processes where F ( o ) is known as the spectral distribution function. Equation (12.1.11) is usually referred to as the spectral representation of jhe autocovariance function yk.Like any statistical distribution function, the spectral distribution function is a nondecreasing function that can be partitioned into three components: (1) a step function consisting of a countable number of finite jumps, (2) an absolutely continuous function, and (3) a "singular" function. The third component is insignificant and is ignored in most applications. Thus, we may write the spectral distribution function as where Fs(u) is the step function and F,(o) is the absolutely continuous component. Equation (12.1.4) shows that for a process with absolutely summable autocovariance function, F(w) = Fc(w) and dF(w) = f(o)d o . 
To illustrate a step spectral distribution function, consider the general linear cyclical model where the Ai are constants and the Oi are independent uniformly distributed random variables on the interval t-T, T ] .The oi are distinct frequencies contained in the interval [-T, T ] .Then and where we note that 12.1 The Spectrum 269 Clearly, the autocovariance function yk in (12.1.15) is not absolutely summable. Hence, we cannot represent it in the form of (12.1.4). We can, however, always represent it in the form of (12.1.11). To do that we make the following important observations: 1. 2, is the sum of M independent components. 2. wi can be anywhere in the interval [-'T? TI, and yk in (12.1.11) is also the autocovariance function for the process given in (12.1.13) at frequencies -wi. and hence the variance A ; F / ~of the ith component is its contribution to the total variance of 2, at both frequencies -mi and oj.In other words, one half of the variance of the ith component is associated with the frequency -wi, whereas the other half is associated with the frequency wi. Thus, Equation (12.1.15) can be rewritten as where or, equivalently, where F(w) is a monotonically nondecreasing step function with steps of size Af/4 at w = fw i# 0 and a step of size A?/2 for o = wi = 0 for i = 1, , . , ,M, Figure 12.2 illustrates the spectral distribution function for a process with M = 2, Al = 1, A2 = 2, and nonzero frequencies oland 02. The corresponding spectrum representing the variance contributions at different frequencies is a set of discrete lines and is shown in Figure 12.3. Because (12.1.17) is clearIy periodic, this result is certainly expected. As shown in Section 11.5, the spectrum for a periodic sequence is a discrete or line spectrum. Although the spectral distribution function F(w) is a nonnegative and nondecreasing function, it does not quite have the properties of a probability distribution function because dF(w) = yo, which is not necessarily equal to 1. If we define 270 CHAPTER 1 2 Spectral Theory of Stationary Processes FIGUFE 1 2 2 Spectral distribution function for 2 Z, = 2 i = I A jsin(@&+ O i ) ,Al = 1, Az = 2. FIGUY 12.3 Line spectrum for Z, = Zi=*Ai sin(wit + Of), A1 = 1, Az however, then G.(o) r 0 and = 2. Jz d G ( w ) = 1. When dF(o) =f (o) do,we have p ( o ) do = dC(o)="f do. Yo (12.1.20) 12.1 The Spectrum 271 Hence, fkom (12.1.1) and (12,1.4),we get the following corresponding Fourier transform pair: and The function p(w) has the properties of a probability density function over the range [ - T , T] and is often referred to as the spectral density function, although the same term has also been used to refer to the spectrumf(o)in the literature. 12.1.3 Wold's Decomposition of a Stationary Process Note that the process 2, in (12.1.13) is clearly stationary because E(Z,) = 0 and E(Z,Z,+k) = 4 ~ ~ ~ I ~o i fk c= o7 s~ There . is, however, a very important difference between this process and the processes discussed in Chapters 2 through 10. The iinear cyclical model given in (12.1.13) is deterministic in the sense that it can be predicted without error from its past record. A process such as the ARIMA model that does not have this praperty is termed nondeterministic. Obviously, a process may contain both deterministic and nondeterministic components. For example, the model where A is a constant, A is a uniformly distributed random variable in I-v, 4,and [B(B)/+(B)] a, is a stationary and invertible A W A model, contains both deterministic and nondeterministic components. 
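Before turning to Wold's decomposition, a brief simulation sketch may make the cyclical model concrete. The amplitudes A_1 = 1 and A_2 = 2 follow the Figure 12.2 example, but the two frequencies are assumed values chosen only for illustration; the point is that the time-averaged products reproduce γ_k = Σ_i (A_i²/2) cos ω_i k, a sequence that oscillates indefinitely and is therefore not absolutely summable.

```python
# Illustrative sketch (assumed frequencies pi/6 and pi/3): the autocovariance of
# Z_t = sum_i A_i sin(w_i t + Theta_i), with independent uniform phases, is
# gamma_k = sum_i (A_i^2 / 2) cos(w_i k), which never dies out.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([1.0, 2.0])                       # amplitudes as in the Figure 12.2 example
w = np.array([np.pi / 6.0, np.pi / 3.0])       # two distinct frequencies (assumed values)
theta = rng.uniform(-np.pi, np.pi, size=2)     # independent uniform phases

n = 5000
t = np.arange(n)
z = (A[:, None] * np.sin(np.outer(w, t) + theta[:, None])).sum(axis=0)

lags = np.array([0, 1, 6, 12, 60, 120])
acov_hat = np.array([np.mean(z[: n - k] * z[k:]) for k in lags])
acov_theory = np.array([np.sum(A ** 2 / 2.0 * np.cos(w * k)) for k in lags])
print(np.round(acov_hat, 3))
print(np.round(acov_theory, 3))                # gamma_k oscillates; it is not absolutely summable
```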
In fact, Wold (1938) showed that any covariance stationary process can be represented as where is a purely deterministic component and ~ j " is ) a purely nondeterministic component. This representation is known as Wold's decompositionof a stationary process. This decomposition is similar to the decomposition of spectral distribution function given in (12.1.12)with 2fd)having a step spectral distribution function and zIn)having an absolutely continuous spectral distribution function. 272 CHAPTER 12 Spectral Theory of Stationary Processes 12.1.4 The Spectral Representation of Stationary Processes Using exactly the same argument that an autocovariance sequence yk can always be represented in the form given in (12.1.1 I), a given rime series realization Zt can be written as the folIowing FourierStieltjes integral: where U(o) plays a similar role to F(o)in Equation (12.1.11) for a single realization, Because every realization can be represented in this form, however, we certainly should not expect the function U(o) to be the same for all realizations. Instead, the function U(u) should, in general, vary from realization to realization. In other words, corresponding to each realization of the process Z, there will be a realization of U(o). If (12.1.27) is used to represent the process Zt of all possible realizations, then U ( o ) is itself a complex valued stochastic process for each o.The integral on the right-hand side of (12.1.27) thus becomes a stochastic integral, and the equality is defined in the mean-square sense, i.e., The relation (12.1.27) is called the spectral representation of the stationary process 2,. To study the properties of U(u), we assume, with no loss of generality, that Z, is a zeromean stationary process. Hence, E [ d U ( w ) ] = 0, for all o. (12.1.29) Next, we note that for a zero-mean complex valued stationary process Z,, the autocovariance function yk is defined as where Z;+k denotes the complex conjugate of Z,+k. This general definition is necessary so that the variance of Zr will be real. Clearly, when Z, is a real valued process, we have Z;+k = Zt+k and hence yk = E(Z,Z,+k). Taking the complex conjugate of both sides of (12.1.27), we obtain Hence, Zt is real-valued if and only if for all o 12.1 The Spectrum 273 Now, consider which should be independent of t as 2, is assumed to be stationary. This result implies that the contribution to the double integral in (12.1.33)must be zero when o A, i.e., + E [ d U ( o )dU"(A)] = 0, for all w # A. (12.1.34) In other words, U ( o )is an orthogonal process. Letting h = w in (12.1.33),we get Ti- = Le"dF(w), where dF(w) = ~ [ i d U ( w ) l=~ E] d ~ f w ) d ~ * ( w ) ,for -rr 5 w 5 .rr, (12.1.36) and F(o)is the spectral distribution function of 2,. If 2, is purely nondeterministic so that F(o) = F,(o), then we have f ( ~ ) d w= dF(w) = ~ [ [ d ~ ( w ) / * ] , for -T 5 (12.1.37) o 5 T, and Ti- yk = JTe "kf(w)d o . The above result shows that a stationary process can always be represented as the Iimit of the sum of complex exponentials or, equivalently, sine and cosine functions with uncorrelated random coefficients. This representation is also known as Cramer's representation. The derivation leading from (12.1.27)to (12.1.38)provides a mare explicit physical interpretation of the spectrum discussed earlier in (12.1.10).That is, f(w) d o is the contribution to the variance of 2, attributable to the component Z,(w) of 2,having frequencies in the range (w, w do). 
+ 274 CHAPTER 12 Spectral Theory of Stationary Processes 12.2 The Spectrum of Some Common Processes 12.2.1 The Spectrum and the Autocovariance Generating Function Recall from Section 2.6 that, for a given sequence of autocovariances yk, k = 0, i l , 3 2 , the autocovariance generating function is defined as . ,. , where the variance of the process yo is the coefficient of BO = 1 and the autocovariance of lag k, yk, is the coefficient of both B~ and B-k. Now, we know that if the given autocovariance sequence yk is absolutely summable, then the specbxm or the spectral density exists and equals 1 2rT k=, f (o)= - C yke-lUk. Comparing Equations (12.2.1) and (12.2,2), we note that for a process with an absolutely summable autocovariance sequence, its spectrum and autocovariance generating function are related as follows: 1 f (o)= -y(e-io). 2a This result is very useful in deriving the spectrum for many commonly used processes. 12.2.2 The Spectrum of ARMA Models Any nondeterministic stationary process Z, can be written as zFo $ j ~ j ,t,bo = 1 . Without loss of generality, we assume that E(Z,) = 0. where $ ( B ) = From (2.6.9), the autocovariance generating function of this linear process is given by Now, a general stationary ARMA(p, q) model - with +JB) = (1. - I$~B- . . - &B*) and 04(B) = (1 - OIB common factors can be written as ..- - BqB4) sharing no 12.2 The Spectrum of Some Common Processes 275 with $(B) = 8q(B)/4p(B).Hence, the autocovariance generating function of the ARMA(p, q) model becomes When the model is stationary, the roots of 4p(B) = 0 Lie outside of the unit circle, guaranteeing the absolute sumability of the autocovariance function. Consequently, the spectrum of a stationary ARMA(p, q) model is given by which is also known as a rational spectrum. If the model is invertible, i.e., the roots of flq(B) = 0 lie outside of the unit circle, 0,(e-iW)9,(eio) will not vanish. Hence, the inverse of f(o) exists and is given by which can be easily seen to be the spectrum of an ARMA(q, p) process. Applying the inverse Fourier transform offl(w), we get which is known as the inverse autocovariance function. Naturally, is called the inverse autocorrelatio~function, a concept due to Cleveland (1972). As discussed in Section 6.3, the inverse autoconelation function can be used as an identification tool that has similar characteristics as the partial autocorrelation function. Equations (12.2.9) and (12.2.1 1) provide an alternative method to calculate the sample inverse autoconelation function, 276 CHAPTER 12 Spectral Theory of Stationary Processes The Spectrum of a White Noise Process A white noise process is a series of uncorrelated random variables with autocovariance function and autocovariance generating function ~ ( 3=) a:. Hence, by (12.1.l) or equivalently by (12.2.31, its spectrum is which is a horizontal straight line as shown in Figure 12.4. This result implies that the contrjbutions to the variance at all frequencies are equal. It is well known that white light is produced when the light spectrum at aU frequencies is the same. In fact, the white noise process is named because of its similarity to white light in terms of the spectrum. In model building, the diagnostic checking for white noise can be performed by examining whether the spectrum of residuals is relatively flat. The Spectrum of an AR(I) Process The autocovariance generating function of a stationary AR(1) process, (1 - +B)Z, = a,, is given by 0 7r Frequency (0 to ?r) FIGURE 12.4 The spectrum of a white noise process. 
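The rational spectrum f(ω) = (σ_a²/2π) |θ_q(e^{-iω})|² / |φ_p(e^{-iω})|² is straightforward to evaluate numerically by direct polynomial evaluation at e^{-iω}. The sketch below is illustrative only; the parameter values are arbitrary, and the purely seasonal call anticipates the seasonal models discussed later in this section.

```python
# Sketch of the rational ARMA spectrum, using the text's sign conventions
# phi(B) = 1 - phi_1 B - ... and theta(B) = 1 - theta_1 B - ...
import numpy as np

def arma_spectrum(w, phi=(), theta=(), sigma2=1.0):
    z = np.exp(-1j * np.asarray(w, dtype=float))
    num = np.ones_like(z) - sum(th * z ** (j + 1) for j, th in enumerate(theta))
    den = np.ones_like(z) - sum(ph * z ** (j + 1) for j, ph in enumerate(phi))
    return sigma2 / (2.0 * np.pi) * np.abs(num) ** 2 / np.abs(den) ** 2

w = np.linspace(0.0, np.pi, 601)
f_wn  = arma_spectrum(w)                             # white noise: flat line at sigma2 / 2pi
f_ar1 = arma_spectrum(w, phi=[0.7])                  # AR(1), phi > 0: mass near w = 0
f_ma1 = arma_spectrum(w, theta=[0.7])                # MA(1), theta > 0: mass near w = pi
f_s12 = arma_spectrum(w, phi=[0.0] * 11 + [0.5])     # (1 - 0.5 B^12) Z_t = a_t: peaks at 2 pi k / 12
print(round(f_wn[0], 3), round(f_ar1[0], 3), round(f_ma1[-1], 3))
```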
a low-frequency dominating spectrum indicates a relatively smooth series.OB)a. hence. The Spectrum of an MA(1) Process The autocovariance generating function of a moving average process of order 1. whereas a high-frequency dominating spectrum implies a more ragged series.cos 0) can be taken to be the spectrum of a random walk model.2. The implication of this phenomenon in data analysis is that a sample spectrum with strong peak near zero frequencies indicates a possible need for differencing.2 The Spectrum of Some Common Processes 0 277 97 Frequency (0 to . iuustrated earlier in Examples 11. We remind the reader that these phenomena were.rr) Frequency (0 to q) FIGURE 12. Z. As shown in Figure 12. is a well-behaved nonnegative even function other than a possible problem at o =: 0 where the peak of the function becomes infinite in height.12. strictly speaking. by (12. although. the spectrum is dominated by low-frequency (long-period) components. = (1 . a random walk does not have a spectrum because its autocovariance sequence is not absolutely summabfe.5 Spectrum for the ARCl) process. when # > 0 and hence the series is positively autoconelated.3. c u f(&) = 1 G(1. In other words. If t$ < 0.17) however..2.2 and 11. in fact. is . the ARfl) process in the limit becomes a random walk model. The function (12. As t$ approaches I.8a) the spectrum is The shape of the spectrum depends on the sign of 4.5. Thus.then the series is negatively autocorrelated and the spectrum is dominated by high-frequency (short-period) components. when 0 is positive. the series is positively autocorrelated and hence relatively smooth. the shape of f(o) depends on the sign of 8 .. the autocorrelation function in the time domain and the spectrum in the frequency domain contain exactly the same information about the process.6 Spectrum for the MA(1) process. Frequency (0 to ?r) . On the other hand.6.2. Both cases are shown in Figure 12. the series is negatively autocorrelated and hence relatively ragged.3 The Spectrum of the Sum of Two Independent Processes Consider the process Z. The corresponding phenomenon in terns of the spectrum is its dominating values at the lower frequencies. which is the s&n of two independent stationary processes X. Frequency (0 to ?r) FIGURE 12.and Y. when 6 is negative.. As expected. 12.278 CHAPTER 12 Spectral Theory of Stationary Processes and its spectrum is Again. Recall that ( 0 9 otherwise. which is represented by a spectrum with higher values at the higher frequencies. Thus. 4 The Spectrum of Seasonal Models From the previous results in Section 11. .23). The spectrum of 2.. processes. = a. Then.3.f6 . (1 . X.1 2. will be the sum of two independent spectra.2.2.2. . we know that a deterministic seasonal component with seasonal periods will have a discrete (line) spectrum at the harmonic Fourier) frequencies ok= 27~k/s.. . *[s/Z]. f2. from the result in Section 12.. and Y. Hence. it follows that That is. (12.. the resulting spectrum will have spikes at the harmonic seasonal frequencies ok= 27412 for k = fl .4 and Section 12.1.2.. 12. can be used to fit a monthly series with a cyclical trend. the spectrum of the sum of two independent processes is the sum of the two independent spectra.yx(B).2.2 The Spectrum of Some Common Processes 279 Let yZ(B). For example. 4Z2. the model .k = 4Z1. . Now consider the following seasonal ARMA model with seasonal period 12: . and the other represents the error process a.24) . . One represents the deterministic component in (12. 
and yy(B) be the autocovariance generating functions for the Z. in (12. represent a white noise error process.@B'*)z.2. Because the spectrum of the white noise emor process is a horizontal flat line.23). where the a. The general shape of the spectrum for @ > O and 4 > 0 is shown in Figure 12.1)/12 for k = 1.(PB'2]~.3. = at.3.2.280 CHAPTER 12 Spectral Theory of Stationary Processes The autocovariance generating function is If the process is stationary. The spectrum for Qt < 0 can be derived similarly.5.(1 + rp2 ..27) From (12. k = 1. (12.6. the spectrum can easily be seen to be d f ( ~= ) 1 ?. 3. (12.2. $ 6 .1 at o = v(2k .16) and (12.5.28) .24 cos 0 ) .2 0 cos 120) (1 4. and 6 and troughs at frequencies w = ~ ( 2 k. 3.8. Thus.5.@ B ' ~(1 ) . > 0.4. for Q. = a. Frequency (0 to ?r) FIGURE 12.4. and cos(l2o) = 1for w = 0 and w = 2mk/12.2.26).4. the - spectrum will also exhibit peaks at seasonal harmonic frequencies 2~rk/12fork 1.f#2 .11/12 for k = 1. other than a peak at o = 0.7 Spectrum for (1 .7. and 6 as shown in Rgure 12. Consider a multiplicative seasonal ARMA model having both seasonal and nonseasonal ARMA components. 2.+B)Z.2. then its spectrum exists and equals Note that cos(l2o) = .2.2.4. (1 .2.. 12.3 The Spectrum of Linear Filters 281 Frequency (0 to rr) FIGURE 12.8 Spectrum for (1 - @Bl2)[1 - +BIZ, = a,. The sample spectrum discussed in the next chapter does not behave as regularly as those in Figures 12.7 or 12.8. If the data indeed contain a seasonal phenomenon, though, then the spectrum should exhibit peaks or bumps at the harmonicafly related seasonal frequencies. Again, we remind the reader about the difference between the spectra for the Equations (12.2.23) and (12.2.24). Although both are purely seasonal models, the cyclical trend term in (12.2.23) is deterministic and hence has a discrete (line) spectrum, whereas the seasonal phenomenon in (12.2.24) is nondeterministic with a continuous spectrum. This difference implies that, in data analysis, a sharp spike in the sample spectrum may indicate a possible deterministic cyclical component, whereas broad peaks, unless due to the smoothing effect to be discussed in the next chapter, often imply a nondeterministic seasonal component similar to the multiplicative seasonal AlUMA process. In either case, it is generally impossible to determine the exact form or the order of the underlying model from a spectrum, In terms of the ARIMA model building, the autoconelations and partial autocorrelations are, in general, more useful. In any event, for model building the autocorreIation and the spectrum should complement rather than compete with each other. 12.3 The Spectrum of Linear Filters 12.3.1 The Filter Function Let Z, be a stationary process with' an absolutely summable autocovariance sequence yZ(k) and corresponding spectrum 282 CHAPTER 12 Spectral Theory of Stationary Processes Consider the following linear filter of 2, given by w CO where a(B) = Zj=-aj@and Xj=-tql < 00. Without loss of generaIity, assume that E(Z,) = 0 and hence E(Y,)= 0.We have E(,,+,) = E[( 5 h = - a ah+-.) whereweletl=k-j+h. Now, y y (k) is absolutely summable, because Hence, we have (2 j=-Ol q Z t +k- j ) ] 12.3 The Spectrum of Linear Filters 283 where and which is often known as the f2ter function or the transfer function. This filter function can be used to measure the effect of applying a filter on a series. In many studies, one important objective is to design a good filter so that the output series or signals satisfy certain desired properties. 
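A small numerical sketch of the filter function just defined: the squared gain |α(e^{iω})|² of a filter with absolutely summable weights can be evaluated directly from the weights. The two filters below, a first difference and a 12-term moving average, are chosen only to anticipate the high-pass and low-pass behavior discussed in the following subsections; they are not the text's worked examples.

```python
# Squared gain |alpha(e^{iw})|^2 of a linear filter alpha(B) = sum_j a_j B^j,
# evaluated from the filter weights.
import numpy as np

def squared_gain(weights, lags, w):
    w = np.asarray(w, dtype=float)
    alpha = sum(a * np.exp(-1j * w * j) for a, j in zip(weights, lags))
    return np.abs(alpha) ** 2

w = np.array([0.0, np.pi / 6.0, np.pi / 2.0, np.pi])

# first difference (1 - B): gain 2(1 - cos w), zero at w = 0, maximal at w = pi
print(np.round(squared_gain([1.0, -1.0], [0, 1], w), 4))

# 12-term moving average (1/12) sum_{j=0}^{11} B^j: gain 1 at w = 0,
# zero at the seasonal frequencies 2 pi k / 12
print(np.round(squared_gain(np.full(12, 1.0 / 12.0), range(12), w), 4))
```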
12.3.2 ~ f f e c of t Moving Average Consider the simple moving average where a ( B ) = ( E Y L ~ ' B ~Then, ) / ~ . from (12.3.4), - -1 1 - cos mw m2 1 - c o s o ' - Note that cos mw = -1 when mw = (2k - 1)1T, i.e., w (2k - l)m/m for k = 1 , . . . , [(m 1)/2],and cos mw = 1 when mo = 2k .rr or w = 2k ?rim for k = 1, . . , [m/2].Also, + . Hence, for ni r 2, the filter function has its absolute maximum at w = 0 and relative maxima at frequencies w = (2k - I)+ for k = 1, , , [(ni 11/21, The function vanishes at frequencies o = 2 k d n t for k = 1, . . . , [mn/2].Because (1 - cos o)is an increasing function between 0 and v , the overall shape of the filter function looks like that shown in Figure 12.9. When m is large, the first peak around the zero frequency becomes dominating and the relative maxima at other frequencies become negligible, which implies that Y,will mainly contain .. + 284 CHAPTER I2 Spectral Theory of Stationary Processes 77 Frequency (0 to d FIGURE 12.9 Filter function for Icy(eiw) 1 = l/m2[(l - cos mo]/(l - cos w ] ] . only the low frequency component of the original series 2,. Because a monotonic trend has an infinite period that corresponds to a zero frequency, the moving average bas often been used to estimate the monotonic trend component of a series. A filter like this one, which preserves the low-frequency component and attenuates or eliminates the high-frequency component of a series, is called a low pass filter. 12.3.3 Effect of Differencing Next, we consider the differencing operator where n(B) = (1 - B). From (12.3.4), we have [a(eio)t2= (1 - eiw)( 1 - e-iu) = 2(1 - cos w ) . (12.3.8) Clearly, Ia(ei")12is 0 at w = 0.It increases between 0 and rr and attains its maximum at o = a, as shown in Figure 12.10. Thus, the differencing filter preserves the higher-frequency component and essentially eliminates the low-frequency component of a series. Such a filter is called a high pass fitter and is often employed to remove the trend of a series. The seasonal differencing has an effect similar to a high pass filter and is left as an exercise. 12.4 Aliasing 285 7l Frequency [O to T ) FIGURE 12.10 Filter function for differencing 1 a(eiw) l2= 211 - cos o). 12.4 Aliasing Consider the following cosine curves: eos[(o -t 2 n j / ~ t ) t ] , forj = 0, +1, &2, . . . and -m < t < m. (12.4.1) For any given o,the curves cos(ot) and cos[(w f2 ~ r j / A t ) ffor ] j # 0 are clearly different, as shown in figure 12.11 for cas(ot) and cos[(w 2 ~ r / A t ) twith ] o = 7r/2 and At = 1. Now, suppose that we study the curve cos(ot) as shown in the dotted line and observe it only at the time points k Af for k = 0,f1, +2, . . . , where A t is the sampling interval. These curves become indistinguishableat the time points t = k A t for k = 0,Ifr 1, f2, . . . because + coa[(w k)*: FIGURE 12.11 The aliasing. At] - cos(uk At i 2 ~ r j k )= cor(uk At), (12.4.2) 286 CHAPTER 12 Spectral Theory of Stationary Processes The same problem occurs for the sine and exponential functions. Thus, in Fourier representation, if we observe a process 2, at t = k At, for k = 0,fI , f2, . , then the components of 2, with frequencies o 3~ 2 a j / A t , for j = k l , f2 , . . will all appear to have frequency o,These frequencies are said to be aliases of w. The phenomenon, in which a high frequency component takes on the identity of a lower frequency, is called aliasirig, For a given sample interval At, if we observe a process at time points k At for k = 0, f1, 2t2, , . . 
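The indistinguishability stated in (12.4.2) is easy to verify numerically; the frequency ω = π/2 and sampling interval Δt = 1 below are the values used for Figure 12.11, and the rest of the sketch is merely illustrative.

```python
# Aliasing: with Delta t = 1, cos(w t) and cos((w + 2 pi j) t) coincide at every
# sample point t = 0, 1, 2, ..., so frequencies above the Nyquist frequency fold back.
import numpy as np

w = np.pi / 2.0                               # the frequency used in Figure 12.11
dt = 1.0                                      # sampling interval Delta t = 1
t = np.arange(0, 12) * dt                     # observation times t = k * Delta t

z_low  = np.cos(w * t)                        # the curve we think we are observing
z_high = np.cos((w + 2.0 * np.pi / dt) * t)   # an alias at frequency w + 2 pi / Delta t
print(np.allclose(z_low, z_high))             # True: the sampled curves agree everywhere
```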
, the fastest observable oscillation goes above the mean at one time point and below the mean at the next time point. The period of this fastest oscillation is 2At. Thus, the highest frequency used in Fourier transform is 2?r/2At -- ?r/At, which is also known as the Nyquist frequency with the given sampling interval At. With this Nyquist frequency v / A t , if o < m/At, then, as discussed above, the components with frequencies w A 2vj/At, for j = 0 , f1, f2, . . , are all confounded and are aliases of one another, and the sum of all their contributions will appear at frequency o in F(w) orf(o), making the estimation of contributions of different components difficult or impossible. To avoid this confounding or aliasing problem, we thus should choose the sampling interval so that the frequencies under study are all less than the Nyquist frequency. For example, if a phenomenon is a linear combination of sine-cosine components with frequencies wl, 02, . . . , and om,then we shouId choose At SO that loj]< (v/At) for j = 1,2, . , m. Otherwise, if v / A t < 0.9,say, then will be aliased with some other gequency in the interval fO,?r/Af] and hence make the estimation of the component at oj impossible. In data analysis, we often take At = 1. . .. .. 12.1 Consider the process where A1 and A2 are independent random variables, each with mean 0 and variance T. (a) Find the autocovariance function ykfor 2,. {b) Is yk absolutely summable? Find the spectral distribution fuhction or the spectrum of 2,. 12.2 Consider the process where A is a constant, A is a uniformly distributed random variable on [-a,v], and the at's are i.i.d. random variables with mean 0 and variance 271.. Also, h and a, are independent for all t. (a) E n d the autocovariance function yk for 2,. (b) Is yk absolutely summable? Find the spectral distribution function or the spectrum of Z,. Exercises 287 12.3 A spectral distribution function G(w) can be decomposed in the form G(o) = pGa(o) + q~d(o), where p and q are nonnegative, p $; q = 1, Ga(w) is an absolutely continuous spectral w a) discrete or step spectral distribution function, distribution function, and ~ ~ ( is Express the following specttd distribution function in the above decomposition form: 12,4 Given the following spectral distribution of Z,, Find the autocovariance function yh for Z,. 12.5 Find and discuss the shape of the spectra for AR(1) processes, (1 - $B)Z, with the following values of 6: = a,, fa) 4 = .6, (b) 4 = .99, (c) + = -.5. 12.6 Find and discuss the spectrum for each of the following processes: (a) (1 - .7B + . 1 2 ~ ~=) a,, ~, (b) 2, = (1 - .7B - .18B2)a,, + (c) (1 - .5B)Zt = (1 .6B)aj, (d) Z, = (1 - .4B)(1 - .8&)a,, where the a,'s are i.i.d. N(0, 1). 12.7 Find and discuss the shape of the spectrum for (1 - @ B ~ ) z = , a,, where the a, is a Gaussian white noise process with mean 0 and variance 1. 12.8 Let X, and Y, be independent stationary processes with the spectra fx(w) and fy(o), respectively. Assume that Y,follows the AR(2) model, (1 - .6B - .2B2)Y, = a,, and X, and a, are independent white noise processes, each with mean 0 and variance cr2. If Z, = X,+ Y,, then find and discuss the shape of the spectrum of 2,. 288 CHAPTER 12 Spectral Theory of Stationary Processes 12.9 Consider the following linear filter of 2,: where Z, is a stationary process with an absolutely summable autocovariance function. Find and discuss the filter function defined in (12.3.4). 
12.10 Differencing operators have been used widely to reduce a nonstationary process to a stationary process. Discuss the effect through the filter function for each of the following differencing operators: fa) Wt = (1 - B > ~ z ~ , (b) I.V, = (1 - B~)Z,, (c) IV, = (I - B)(1 - B ' ~ ) Z ~ . Estimation of the Spectrum In this chapter, we'discuss estimation of the spectrum. We begin with periodogram analysis, which is a useful technique to search for hidden periodicities. The smoothing of the sample spectrum and related concepts such as the lag and spectral windows are discussed. Empirical examples are presented to illustrate the procedures and results. 13.1 Periodogram Analysis 13.1.1 The Periodogram Given a time series of n observations, we can use the results in Section 11.3 to represent the n observations in the following Fourier representation: InPI 2, = 2 (akcos o k t + bk sin o k t ) , k=O where o k = 2ak/n, k = 0, 1, . . . , [11/2], are Fourier frequencies, and and are Fourier coefficients. It is interesting to note the close relationship between the above Fourier representation and regressidn analysis. In fact, the Fourier coefficients are essentially the least squares estimates of the coefficients in fitting the following regression model: mn/21 Z, = k=O (akcos o k t + bk sin wkt) + el, 290 CHAPTER 13 Estimation of the Spectrum where the ok are the same Fourier frequencies. The fitting will be perfect, but we can use the notion from regression analysis to regard the Parseval's relation as an analysis of variance presented in Table 13.1. The quantity I(wn) defined by mi, k n - (a: 4- 2 bz), nu52 , = 0, k = 1, . . . ,[(n - 1)/2], (13,1.4) ?I k = - when n is even, 2 is called the periodograrn. It was introduced by Schuster (1898) to search for a periodic component in a series. Sampling Properties of the Periodogram Assume that Zl, Z2, . . . , and Z, are i.i,d.N ( 0 , u2).Then 13.1.2 TABLE 13.1 Analysis of variance table for periodogram analysis. Source Degrees of freedom Sum of squares -- Frequency wo = 0 (Mean) Frequency w~ = 2m/n 2 11 -(a: 2 + b:) Frequency oz = 4r/n Frequency W f [ n - u / 2 ] = [(n-l)/2]2r/n 2 Frequency w , / ~= .rr (exists only for even n) 1 Total n 2 2 y ( a [ ( n - ~ ) / ~+] b[(n-1)/21) 13.1 Periodogram Analysis 291 and where we use the fact that ak and aj are independent for k # j because of (11.2.13). Hence, the ak for k = 1, 2, . , [(n - 1)/2], are i,i.d, N(0, 2g2/1z), and the naf/2u2, k = 1,2,. , [ ( n - 1)/2f, are i,i.d. chi-squares with one degree of freedom. Similarly, the nb$/2cr2, k = 1,2, , , [ ( t i - 1)/2], are i.i.d. chi-squares with one degree of freedom. Furthermore, naj/2u2 and nbj2/2u2 are independent for k = 1, . . . , [(n - 11/21 and j = 1, 2, . , [(n - 1)/23, because, ak and bj are normal and by the orthogonality property of the cosine and sine system, we have .. .. .. . . Ii Z, cos mkt = 4 " C [E(z?) n2 4ff2 112 1=1 =0 - cos wkt sin wit] C" [COSukt =- .C Zu sin ojv u=l sin ojt] for any k and j. (13.1.5) It follows that the periodogram ordinates for k = 1,2, . . . , f(n - 1)/2] are i.i.d. chi-squares with two degrees of freedom, Clearly, by u ~ even n) follows the chi-square distrithe same arguments, each of 1(0)/c~and I ( T ~ ) / (for bution with one degree of fteedom. With the adjustment for Z(T) in mind, we assume, without loss of generaiity, that the sample size n is odd in the following discussion. Suppose that a time series of n observations contains a periodic component and can be represented by Z, = a. 
't ak cos wkt + bksin o k t -k e,, (13.1.7) where ok = 27rk/n, k f 0, and the e, are i.i.d. N(0, (r2). To test the hypothesis in (13.1.7), it is equivalent to test in Table 13.1 Ho:ak=bk=O vs. Hl:akSCO or bk+O 292 CHAPTER 13 Estimation of the Spectrum using the test statistic - (n - 3) (a: + bz) 2 f:f3 (a,? bj2)' + 2 J#k which follows the F-distribution F(2,11- 3) with 2 and (n - 3) degrees of fieedom. Neither the numerator nor the denominator in (13.1.8) includes the term ao, corresponding to the zero frequency. In fact, because the periodogram at frequency zero reflects the sample mean and not the periodicity of the series, it is usually removed from consideration in the analysis. More generafly, we can test whether a series contains multiple m periodic components by postulating the model Zt = a. r.. + (ak,cos uk1t+ bki sin mkit) + e,, (13.1.9) i= 1 . where the e, are i,i,d. N(0, m2), wk, = 2?rk JII, and the set I = {ki : i = 1, . . , ma} is a subset of {k : k = 1,2, . . . , [n/2]).The corresponding test statistic from Table 13.1 follows the F-distribution F(2m, n - 2m freedom. - 1) with 2m and (n - 2m - 1) degrees of 13.1.3 Tests for Hidden Periodic Components In practice, even if we believe that a time series contains a periodic component, the underlying frequency is often unknown. For example, we might test H o : a = /3 = 0 vs. Hl:aZ 0 or /3 $ 0 for the model Zt = p + a cos ~t + /3 sin w t + e,, (13.1.11) where {er] is a Gatmian white noise series of i.i.d. N(0, a*)and the kquency w is unknown. Because the frequency w is not known, the F-distribution and the test statistic as discussed i n . u2is often unknown and has to be estimated. we can search out the maximum periodogram ordinate and test whether this ordinate can be reasonably considered as the maximum in a random sample of [n/2J i." If the time series indeed contains a single periodic component at frequency w. Now. In this case. then we could use (13.-toindicate the Fourier frequency with the maximum periodograrn ordinate.i. the { Z . for any g 3 0.are i.d. we have If u2were hown. random variables. a natural test statistic will be where we use w(l.13. .1. under the null hypothesis Ho.d. The periodogram analysis.i. however.k = 1.14)to derive an exact test for the maximum ordinate. the periodogram ordinates I(#R)/u2. [n/2]. .1 Periodogram Analysis 293 the previous section are not directiy applicable. Thus. however. . the original purpose of the periodogram was to search for "hidden periodicities. In practice. To derive an unbiased estimator of cr2. which have the probability density function Thus. can still be used. . 2. chi-square random variabIes with two degrees of freedom. it is hoped that the periodogram I(o& at the Fourier &equency wk closest to o will be the maximum. In fact. ) are Gaussian white noise N(0. each being a multiple of a chi-square distribution with two degrees of h e dom. cr2). we note that under the null hypothesis Thus. Hence. 20). and nt is the largest integer less than l/g. [11/2] are independent.05 as given by Fisher (1929) is shown in Table 13.20) to find the critical value g..2 is an approximation obtained by using only the first term in (13.2.1. .e. The critical values of T for the sigdicance level a = . contains a periodic component.1. . based an the following statistic: Under the null hypothesis of the Gaussian white noise process N(0. / u ~for . such that If the T value calculated from the series is larger than g.. for large samples. 
and by replacing u2by &2 it leads to the test statistics Using that k = 1. An exact test for max{I(o&] was derived by Fisher (1929). . g > 0. . This test procedure is known as Fisher's test. The third column in Table 13.i.we can use Equation (13.Thus.e. the distribution of V can be approximated by the same distribution as ~ ( ~ ) ( o ( ~ ..294 CHAPTER 13 Estimation of the Spectrum is an unbiased estimator of a2.0. u2)for Z. then we reject the null hypothesis and conclude that the series Z.. . ~ h u sfor . Fisher (1929) showed that where N = [r1/2]. any g '. we have It follows that G2is a consistent estimator of a2.2. )i. any given significance level a. 1 Periodogram Analysis 295 TABLE 13. for most practical purposes. The test for seasonality is then performed by the F ratio of the mean square for the six seasonal frequencies 277k/12.d. to examine whether a monthly time series is seasonal with period 12. It leads to an estimate of m.1. The approximation is very close to the exact result. @y first term only) .wcl.1) if n is even.2 . 2. we can use standard regression analysis and construct the usual regression test.20)is taken as the distribution of T2 with N being replaced by (N . 6. The procedure can be continued until an insignificant result is obtained.i. 5. Alternatively.1.05 for the ratio of the largest periodogram ordinate t o the sum.1 ) . The critical values for a = . because is chosen only from the Fourier frequencies and not from all the possible frequencies between 0 and rr.1. 4. . For example. 3. (1952) suggested extending Fisher's test for this second largest ordinate based on the test statistic where the distribution in (13. to the residual mean square. for the test. however.21)to derive the critical value g. has shown that the unknown w with the maximum periodic component can be safely estimated by and that the probability that lo . we might postulate the model where the et are i. Hartley (19491.1)/2 if n is odd and N = (n/2 . which implies that there exists a periodic component in the series at some frequency o. u2).the number of periodic components present in the series. Under the null hypothesis Ho in (13. we can use Equation (13.13. N* *N= fn g. for k = 1 . Hence. a significant value of ~ ( ' ) ( w (leads ~ ) ) to the rejection of Ho.This o is not necessarily equal to w ( l ) . (by exact formula) g. N(0. Let I ( ~ ) ( w (be ~ )the ) second large-st periodogram ordinate at Fourier fiequency ~ ( 2 )Whittle .l > 29~/11is less than the significance level of the test.11). . For Fisher's exact test for the maximum periodogram.7518 >> .3 are the values of the F statistics given in (13.68544. .52) = 3. the result is highly significant and we conclude that the series contains a periodic component at the frequency 0)s = .05 = .13. we have which gives g = .68544.21493. For significance level a = . It indicates that the data exhibit an approximate nine-year cycle.21493.21).1. For a = .O5. we have From Table 13. F.8) to test the significance of the periodogram at each Fourier frequency.22805 for N = 25. Because T = . This fresuencycorresponds to a period of P = 2v/o = 2v/w6 = 9. we can use the first term approximation given in (13. g. nated by a very latge peak at frequency w6 = 21r(6)/55 = . and the periodogram is significantonly at frequency m6 = . and g.1 Periodogram Analysis 297 Frequency (0 to T) FIGURE 13.05(2.1667 years.2.1.1 Periodogram of the logarithms of the annual Canadian lynx pelt sales.O5.I9784 for N = 30.os= . More precisely for N = 27.19533. 
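A compact computational sketch of the two ingredients discussed above: the periodogram ordinates I(ω_k) = (n/2)(a_k² + b_k²) at the Fourier frequencies, with a_k = (2/n) Σ_t Z_t cos ω_k t and b_k = (2/n) Σ_t Z_t sin ω_k t, followed by Fisher's statistic T and the first-term tail approximation N(1 − g)^{N−1}. The simulated input series is an arbitrary example, not one of the data sets used in the text.

```python
import numpy as np

def periodogram(z):
    """Periodogram ordinates at w_k = 2 pi k / n, k = 1, ..., [(n - 1)/2]."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    t = np.arange(1, n + 1)
    k = np.arange(1, (n - 1) // 2 + 1)
    wk = 2.0 * np.pi * k / n
    a = 2.0 / n * (np.cos(np.outer(wk, t)) @ z)
    b = 2.0 / n * (np.sin(np.outer(wk, t)) @ z)
    return wk, n / 2.0 * (a ** 2 + b ** 2)

def fisher_test(I):
    """Fisher's T and the first-term approximation to P(T > g)."""
    N = len(I)
    T = I.max() / I.sum()
    return T, min(1.0, N * (1.0 - T) ** (N - 1))

rng = np.random.default_rng(1)
n = 111                                            # odd sample size, as assumed in the discussion
t = np.arange(1, n + 1)
z = 1.5 * np.cos(2.0 * np.pi * 9 * t / n) + rng.normal(0.0, 1.0, n)   # one hidden periodicity

wk, I = periodogram(z)
T, p = fisher_test(I)
print("largest ordinate at k =", 1 + int(np.argmax(I)), " T =", round(T, 3), " approx p =", p)
```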
Also included in Table 13.68544. l). we can only calculate fork = 0. k = I . we investigate the estimation of the spectrum of time series with absolutely summabIe autocovariances.From (12.i sin wkt) I 2 " Z*(COS okt 'i. i sin @kt) . Tk and call it the sample spectrum. To see. To further examine the properties of the sample spectrum. the sample spectrum and the periodogram are closely related. Z.(cos okt. . it is natural to estimate f ( o ) by replacing the theoretical autocovariances yk by the sample autocovariances For a given time series of n observations.l 5 v. Based on sample data. j ( w ) is asyrnptaticaIly unbiased and looks very promising as a potential estimator forf (w).298 CHAPTER 13 Estimation of the Spectrum 13.5. we have Tk is asymptotically unbiased. At these Fourier frequencies.3). as discussed in lim ~ ( f ( w )=) f ( w ) .(a . .l. Because Section 2. . however. -?r 5 6. . .[11/2].2 The Sample Spectrum In this section. .. we note that .1. Hence.the spectrum is given by 03 1 = -(YO 2~ + 2k=lx y kCOS OK). we estimate f ( o ) by Tk. let us consider f ( w k ) at the Fourier frequencies ok= Zn-k/n. 11-cO Thus.2. 2. .~)/2?r = mn. and letting j = f . In generai.s in (13. we get Hence. fork = 1.J2?r if n is even. . denote it as where x2(2) is the chi-square distribution with two degrees of freedom. (11 . .7) is the spectrum of Z. eiokt = 299 2:-e-iwir = 0.2b). It follo\vs that if Z. Here.rr)x2(2) = ( ~ ~ / 2 1 ~ ) ~ ~ (and 2 ) /we 2 .2.2. .2.IS a Gausslan white noise series with mean 0 and constant variance a2.13.2 The Sample Spectrum where we use that Because x:=.4). we note that u2/27r in (13. are distributed independently and identically as (a2/4.. fro~n(13. we have where we note that j(wr) = I(~. then f(wk). .. Now. it can be then shown that if Zt is a Gaussian process with the spectrumf(o). following the same arguments.1)/2. we have for any two different Fourier frequencies wk and mi. Thus. as the sample size n increases and the mesh of the Fourier Frequency (0 to . as shown in (13. . it is an unsatisfactory estimator because it is not consistent as the variance of f fok)does not tend to zero as the sample size n tends to infinity.300 CHAPTER 13 Estimation of the Spectrum and which does not depend on the sample size n. by (13.Ir) Frequency (0 to T ) Frequency (0 to ?rf FIGURE 13.2 Sample spectrum of a white noise process. More generally. although the sample spectrum cdculated at the Fourier frequencies is_unbiased.6).6) and that the periodogram ordinates are independent.1.n) Frequency @ to . even for other distinct non-Fourier frequencies o and A.2. Furthermore. 2(wj) = 0. As a result. where the mk = 27rk/n. nt.i.[n/2]. The weighting function W n ( w j ) is called the spectral window because only some of the spectral ordinates are utilized and shown in the smoothing summation. the spectral estimator is the smoothed spectrum obtained from the following weighted average of m values to the right and left of a target frequency ok. . Iff(o) is flat and approximately constant within the window. k = 0. f1. n being the sample size. and W. which is often chosen such that mn -+ m but (m.e.3 The Smoothed Spectrum 301 frequencies becomes finer and finer. f ( .2./n) +0 as n +m..1 Smoothing in the Frequency Domain: The Spectral Window A natural way to ieduce the variance of the sample spectrum is to smooth the sample spectrum locally in the neighborhood of the target frequency. are Fourier frequencies. . ( o ) is unstable and tends to be very jagged.3 The Smoothed Spectrum 13. 
Note that the pattern persists regardless of the size of the sample. then we have .(wj) is the weighting function having properties that and m" Iim IJ+M 2 w.13. 13. the covariance between j ( w ) and j(h) still tends to zero as TI e m. as illustrated in Figure 13. indicates that m is a function of n. In other words. . The effort of correcting these unsatisfactory properties leads to the smoothing of the periodogram and the sample spectrum. j=-m. f.3. . more stable. a common dilemma with statistical estimators. increases.3. the bias will also increase as the bandwidth increases because more and more spectral ordinates are used in the smoothing.10).2a).and not only at the Fourier frequencies. the resuiting estimator becomes smoother. As the bandwidth increases.302 CHAPTER 13 Estimation of the Spectrum Also.4) of the spectral window implies that the variance of the smoothed spectrum decreases as m. and has smaller variance. The value of rn.(A) is the spectral window satisfying the conditions and The spectral window is also known as the kernel in the literature. Thus. represents the number of frequencies used in the smoothing.2. we can write the general smoothed spectrum in terms of the following integral form: where W. also known as the bandwidth of the window. . More generally. hence. The property (13. the sample spectrum is defined at any frequency w between -T to a. by (13. where we also use that the sample spectrum is independent at Fourier frequencies. however. This value is directly related to the width of the spectral window. Unless f(w) is really flat. from (13. more spectral ordinates are averaged. We are thus forced to compromise between variance reduction and bias.2. catculation is only necessary for the frequency range between zero and a.3 The Smoothed Spectrum 303 If the spectrum is approximately constant within the bandwidth of the spectral window. Equivalently.(w) is a consistent estimator Off(&). . The covariance will be large if the spectral windows overlap considerably. and not the spectrum. it is not included in the smoothing.As the periodogram is also symmetric about frequency zero. the covariance between smoothed spectral ordinates is easily seen to be proportional to the amount of overlap between the spectral windows centered at o and A. var(f.3.9) implies that For the variance. however. we can extend the periodogram and hence the sample spectrum using the periodic property. the condition (13. f. we can fold the weights back into the interval -T to a.11). Because the periodogram is periodic with period 2 7 ~ .1) is used at the Fourier frequencies. By condition (13.3.. we note that (13.(w)) +0 as n + w. the smoothing is essentially applied to the periodogram. spectral ordinates at two different frequencies o and A are in general correlated because they may contain some common terms in the smoothing. the discrete form of (13.3.8) can be approximated by the sum .3.13.Also. Clearly. In the actual computation of the smoothed spectrum. when the window covers frequencies that fail to lie entirely in the range between -v and a. where mk = 2 ~ k / 1 1Thus. Thus. hence. and tbe value at wl is used in its place. Because the sample spectral ordinates at different Fourier frequencies are independent. the covariance will be small if the spectral windows overlap only slightly. because the periodogram at frequency zero reflects the sample mean of Z. A few other remarks are in order.. with its weights inversely proportional to the magnitude of k. Hence.2. This weighting function W. 
(k) = I V ( i ) . The lag window and the spectral window are closely related because the autocovariance is the iiyerse Fourier transform of the spectrum. Thus.2a).e. an alternative to spectrum smoothing is to apply a weighting function W ( k ) to the sample autocovariances. from (13.2a).(k) for the autocovariances is called the lag window.3. the weighting function W(k) should also be chosen to be symmetric.(k) is chosen to be an absolutely summable sequence W. Tk Tk Because the sample autocovariance function is symmetric and because is less reliable for larger k. From (13. which is often derived from a bounded even continuous function W(x) satisfying The value of M is the truncation point that depends on the sample size tt.2..304 CHAPTER 13 Estimation of the Spectrum 13. the inverse Fourier transform of f ( h ) is Thus. . we have where the weighting function W.2 Smoothing in the Time Domain: The Lag Window Note that the spectrum f(m) is the Fourier transform of the autocovariance function yk.i. Because the lag window is often derived from a bounded even continuous function as given in (13. we have . The ~veightirtgfunctiort was the standard term used in the earlier literature.3 The Smoothed Spectrum 305 where is the spectral window. i. the fag window and the spectral window form a Fourier transform pair. Both the terms lag ~ v f ~ d oand were introduced by BIackman and Tukey (1959).. with one ~ u spectral witldolv being uniquely determined by the other.3.21) that the spectral window is the Fourier transform of the lag window and that the lag window is the inverse Fourier transform of the spectral window. By Parseval's relation from (11.14) can also be written in terms of a lag window. however. The variance expression given in (13.13.16). hence.5.3.181.e.3. It is clear from (13. Hence. The rectangular lag and spectral windows are both shown in Figure 13.12c)). which is the proportion of sample autocovarjances for which the lag weights are nonzero.306 CHAPTER I3 Estimation of the Spectrum Thus.3.3. + [sin(o(M ci 112)) .3. we now introduce some lag and spectral windows commonly used in time series analysis.sin(w/2)] .P(&)/ W2(x)&. 13. sin(o/2) (using (11. The Rectangular Window The so-called rectangular or truncated lag window where M is the truncation point less than (n .21). the corresponding spectral window is given by 1 =-[I 2~ - sin(oM/2) + 2 cos(o(M ~+h (1)/2) O/2) 1 sin(o/2) 2%- 1 .8)). (using (11. 1 vu[f (o)] " .11.3 Some Commonly Used Windows With the above general background. is derived from the continuous function From (13.2.2. -1 This result implies that the variance of the smoothed spectrum for a given lag window is related to the ratio M/n. . As a result. For a given spectral window W.2. .30) . the smoothed spectrum estimator using this window may lead to negative values for some fiquencies o.4 Bandwidth of the spectral windo\v. (13. as shown in Figure 13.3 has a main lobe at w = 0 with a height of (2M 1)/2a. one commonly used definition for the bandwidth of the window is the distance between the half-power points on the main lobe. [b) Rectangular spectral window.3 The rectangular lag and spectral windows. this result is certainly not desirable.3. (a) Rectangular lag window.4. . .(w) that attains its maximum at w = 0. and side peaks (lobes) of decreasing magnitude approximately at o = 6f4j l)n/(2M 1) for j = 1.13. Because the spectrum is known to be a nonnegative function. Note that the spectral window as shown in Figure 13.(f wl) = ~ " ( 0 ) .. 
FIGURE 13.3 The Smoothed Spectrum 307 FIGURE 13. 4 where wl is such that W. zeros at o = f2j7~/(2M I). + + + + Bandwidth = 2wJ. That is. Q can be approximated by . the value of O.9) and (11. the bandwidth of the spectral window is inversely related to the truncation point M used in the lag window.2. as M increases. In general. The corresponding spectral window is given by f sin(o/2) cos(wM/2) sin(w(M .308 CHAPTER 13 Estimation of the Spectrum Lf wll is the first zero of W. In terms of the above rectangular spectral window ~ i ( w ) the .2.8) . the window is also known as the &iangular window.all.3.1. bandwidth is approximately equal to 2 d 2 M 1). On the other hand. hence. + BartIett's Window Bartlett (1950) proposed the lag window based on the triangular function Hence. hence. the variance of the smoothed spectrum increases although the bias decreases. the bandwidth increases and the variance decreases while the bias increases. the bandwidth is approximately equal to oll.1)/2) sin(w/2) } by (11. as M decreases. the bandwidth decreases. Thus.(w). as discussed in Section 13. 5. the Bartlett's spectrum estimator is always nonnegative.3.3.cos w ) -t. a direct comparison of Equations (13. The effect of large side lobes is to allow f (A) to make large contributions in the smoothing at frequencies distant from w. (b) Bartlett's spectral window.2.cos wM) 2~rM[iin(w/2)]~2 2 {L( .12b) 1 . Furthermore. 309 by (11. lag window.-(cos o . (a] Bartlett's triangular .33) shows that the side lobes of the Bartlett window are sm~flerthan the side lobes of the rectangular window.3 - - The Smoothed Spectrum 1 -(1 . the resulting spectrum FIGURE 13. Because the spectral window Wf(w) is nonnegative.29) and (13.cos w M ) 2~r~[sin(m/2>]~ Bartlett's triangular lag window and its corresponding spectral window are shown in Figure 13.13.5 BartIett's lag and spectral windows. Hence. 2a + 2a cos vx.29).3. M is the truncation point for the sample autocovariance function and the constant a is chosen in the range 0 < a 5 . a result.?r/M).This phenomenon is known as leakage.and (o4.29)As . Ixt > 1. Because lobes are produced by the Fourier transform of sharp corners. the general principle of the sefection of lag windows is to avoid functions with sharp corners.?r/M). 1x1 5 1.e.. o. Thus.310 CHAPTER 13 Estimation of the Spectrum estimate fw(o)may reflect significant spectral components at other frequencies different from o. Again. where W. The corresponding spectral window can be derived as follows: where we have used (13.(o) is given by (13. i. . the Blackman-Tukey spectral window is a weighted linear combination of the spectral windows for the rectangular lag function at the frequencies (o. The Blackman-Tukey Window Blackman and Tukey (1959) suggested the lag window based on the continuous weighting function W(x) = 1 .3. the Blackman-Tukey spectrum estimator may also be negative at some frequencies o.v$(k)r 0 for all k.25 so that t. (13.' C W. The window (13.rr k=-&f +2 2 (1 . The Parzen Window Parzen (1961b) suggested the lag window which is based on the continuous weighting hnction The corresponding spectral window for an even value of M is given by 1 W./ k l / ~cos ) ~wk M/2<lkl~df where we refer readers to a more 'detailed derivation in Parzen (1963). The resulting window is which is also known as the Tukey-Hamming window in the Literature.W.(k) COS wk 2.25 is called as hanni~tgafter the Austrian meteorologist Julius von Hann.P(u)= - ".3.34) with a = . 
who was not directly related with the subject.3.23 as hamtning after R.13.3. When M is large.41) is approximated by .34) with a = .3 The Smoothed Spectrum 31 1 Blackman and 'I'ukey (1959) called the window in (13. one of nkey's coauthors who had worked on the subject.Hamming. The window is given by which is also known as the Tukey-Hanning window or the Tukey window in the literature. equivalently. the Tukey-Hanning estimator will have a smaller bias than the Parzen estimator. Many other windows are introduced in the literature. Because W:(o) is always nonnegative. The spectrum estimates produced for the same window shape and different bandwidths vary. Chapter 6) for more details. the Parzen window will produce a nonnegative and smoother spechum estimator. we are concerned not only about the design . the form of the weighting function) and the bandwidth of the window (or. the truncation point). in spectrum smoothing.@) Parzen lag window [M = 53. (d) Parzen spectral window (M = 5).e. Interested readers are referred to an excellent book by Priestley (1981. Both the Tukey-Hanning window and the P m n window are plotted in Figure 13. (c) Tukey-Hanning spectral window (M = 51. for the same mncation point M.As can be seen from Figure 13. Thus. however. Hence. The above windows are some commonly used ones. The quality of a smoothed spectrum is determined by the shape of the window (i. particularly in terms of the commercially available software. (a] Tukey-Hanning lag window (M = 5).6.6(b) and (d).. the bandwidth of the Tukey-Hanning spectral window is narrower than the bandwidth of the Parzen spectral window.6 The Tukey-Hanning and the Parzen windows.312 CHAPTER 13 Estimation of the Spectrum FIGURE 13. The latter concern often is a more crucial and difficult problem in time series analysis because for a given window shape there is no single criterion for choosing the optimal bandwidth. 13. on the other hand. . Parzen. a truncation point M can also be chosen as a function of n.Also shown are the smoothed sample spectrum functions using the Bartlett window with truncation points M = 2. . Z.13. and 10. In fact. Chapter 7) suggest choosing three values of M.7 shows the sample spectrum of the lynx pelt sales data over the interval (0. The sequences of spectral estimates are then computed and examined with these M values to see whether an intermediate value between MI and M3. choose a spectral window with an acceptable and desirable shape. . from a process with the spectrum f(w). and Tukey-Hanning windows all lead to the periodic component at the frequency w = . To compare different estimates.3 The Smoothed Spectrum 313 of spectral windows with desirable shapes (or "window carpentry" as John W.are independent and identically distributed as .4 Approximate Confidence Intervals for Spectral Ordinates . Tukey called it). but the rectangular window does not lead to this periodic component. Otherwise. 5. For a moderate size 11. and M3 with M3 = 4M1. The estimates using the Bartlett.T). The spectral bandwidth can also be determined by choosing a truncation point M such that ?(k) for k > Mare negligible.3. the smoothed spectrum will blur these peaks. Figure 13. To ease the difficulty.Ail. if we wish to be able to distinguish several peaks offlu) at frequencies XI. the number of observations in a series. This choice of M has been found practical and was used successfully in a number of cases. The spectrum for M = 10 clearly is improper. 
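For reference, the sketch below evaluates a lag-window spectrum estimate of the form f̂(ω) = (1/2π) Σ_{|k|≤M} W(k/M) γ̂_k cos ωk with either the Tukey-Hanning weights ½(1 + cos πx) or the standard piecewise-cubic Parzen weights. The truncation point M = 20 and the simulated AR(1) input are arbitrary choices, not the text's data sets.

```python
import numpy as np

def tukey_hanning(x):
    return np.where(np.abs(x) <= 1.0, 0.5 * (1.0 + np.cos(np.pi * x)), 0.0)

def parzen(x):
    ax = np.minimum(np.abs(x), 1.0)
    w = np.where(ax <= 0.5, 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3, 2.0 * (1.0 - ax) ** 3)
    return np.where(np.abs(x) <= 1.0, w, 0.0)

def lag_window_spectrum(z, M, w, window=parzen):
    """Smoothed spectrum from sample autocovariances weighted by a lag window."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    n = len(z)
    gamma = np.array([np.sum(z[: n - k] * z[k:]) / n for k in range(M + 1)])
    k = np.arange(1, M + 1)
    weights = window(k / M)
    return (gamma[0] + 2.0 * np.cos(np.outer(w, k)) @ (weights * gamma[1:])) / (2.0 * np.pi)

rng = np.random.default_rng(2)
a = rng.normal(size=400)
z = np.zeros(400)
for t in range(1, 400):
    z[t] = 0.7 * z[t - 1] + a[t]                       # AR(1) example series
w = np.linspace(0.0, np.pi, 200)
print(lag_window_spectrum(z, M=20, w=w, window=parzen)[:3])
print(lag_window_spectrum(z, M=20, w=w, window=tukey_hanning)[:3])
```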
The smoothed spectrum given by the rectangular window is clearly undesirable. or a value larger than M3 should be chosen. narrower bandwidths Iead to smaller bias and smudging and hence to higher resolution of the resulting spectrum estimates. M can be chosen to be M = [n/10]. Then initially calculate the spectral estimates using a large bandwidth and then recalculate the estimates by using gradually smaller bandwidths until the required stability and resolution are achieved. Alternatively. but also about the bandwidth of the windows. the sample spectral ordinates f ( w k ) at Fourier frequencies u k = 2.2 Figure 13. &. with okf 0 and a. As. the following steps are often suggested. First.. . A2. Jenkins and Watts (1968. M2." Because the bandwidth of a spectral window is inversely related to the truncation point M used in the lag window. . Additionally. a commonly chosen bandwidth is equal to minj]hi. This procedure is often referred to as "window closing.8 shows the spectrum estimates using various windows with the same truncation point M = 5.rrk/n. In principle. . or a value smaller than MI. say. M I . wider bandwidths produce smoother spectrums with smaller variance of the estimators. EXAMPLE 13.the bandwidth of the spectral window should not exceed the minimum interval between these adjacent peaks. . A compromise is needed between high stability and high resolution. Recall that for a given sarnple ZI. 1 .68544. the smoothed spectral ordinate &(wk) will be distributed as + where the degrees of freedom DF = (4m 2) is simply the sum of the degrees of freedom of (2m 1) x2(2) random variables. In fact. Thus. The estimator in + .6 Frequency (0 to w) Frequency (0 to ?r) FIGURE 13.35 0.314 CHAPTER 13 Estimation of the Spectrum [a) Spectrum for pelt sales data @) Bartlett window (M= 2) Frequency (0 to T) Frequency (0 to T) C (c) Barlleft (d) Bartlett window (M = 10) window (M= 5) t 0. i. one of the pioneers of spectrum estimation.7 Sample spectrum of the logarithms of the Canadian lynx pelt sales and Bartlett estimates for different values of M.e.. if we smooth the sample spectrum with the simple (2m + 1) term moving average. this property is the first smoothing method introduced by D a ~ d f1946). 44) is also known as Daniell's estimator.(0) 3.(4 where c and v are chosen such that and . this nice additive property no longer holds for if o is not a Fourier frequency or if the sample spectrum is smoothed by a spectral window with nonuniform weights.80 Frequency (0 to TF) Frequency (0 to T) FIGURE 13. w e can only approximate a general smoothed spechm by 3.3.13..3 The Smoothed Spectrum 31 5 @) Bartlett window (M= 5) (a) Rectangular window (IM= 5) Frequency (0 to T ) (c) Tukey-Hanning window (M= 5) (d) Parzen window (M = 5) 0.8 Smoothed sample spectrum of the logarithms of the Canadian lynx pelt sales using different windows. Thus. However. (13. 4.14). we can also write the equivalent degrees of freedom in terms of the lag window where W[x) is the continuous weighting function used in the associated lag window. we have ~ [ f w ( u l= l f(m) and Hence.3.3.3.23) and (13. .3.12)and (13.26) or (13. Using (13.(w). for any given spectral window W. The equivalent degrees of fieedom v for some commonly used windows are given in Table 13.25).316 CHAPTER 13 Estimation of the Spectrum Now.from (13. cv = f (o) and giving It folIows that where v is often referred to as equivalent degrees of freedom for a smoothed spectrum.3. & ( 4 1 ) = 60.56./M 3n/M 2.3.4 Equivalent degrees of freedom for various windows. 
& ~ ( 4 = 1 )25. we have.53). For n = 55 and M = 5.49) or (13.3. we calculate the 95% confidence interval for the spectrum of the Canadian lynx pelt sales discussed in earlier examples using Parzen's window with M = 5.13.709nlM By (13.53) is proportional to j.54).79 = 41.2. EXAMPLE 13. Rectangular Bartlett Tukey-Hamming Tukey-Hanning Parzen . becomes .3. however.52).(&) are proportional to f(w) and f2(w).orj100% confidence interval f o r f f : ~is) given by where Y is the equivalent degrees of freedom calculat2d by (13. the logarithmic transformation of the spectrum estimator lnfw(o) is often suggested.3.709(55/5) = 40. from the discussion in Section 4. the (1 .3 For illustration. will be the same for any &equencyo.(o) and hence varies with frequency.53)..3.3.4. Hence.a)l00% confidence interval for lnfiw) is given by Note that the width of the interval for?(@) as given in (13.3. respectively. Recall that the asymptotic mean and variance of A. the 95% confidence interval for f(w). a (1 . we have where X:(V) is the upper a% point of the chi-square distribution with v degrees of freedom.22 and ~ . Hence.5 16?1/M 8n/3M 3.3.51). from (13. from Table 13. Because ~ . From (13. The width of the confidence interval for lnf(w) as given in (13. v = 3.3 The Smoothed Spectrum 317 TABLE 13. and 9 be the estimates of +I. . A reasonable for some p.74 occ'm at w =: . any stationary process can be approximated by an AR(p) model. Frequency (0 to ?r) FIGURE 13.4 ARMA Spectral Estimation As discussed in Chapters 2 and 3. Let alternative to spectrum estimation is to substitute these parameter estimates in the theoretical expression for the spectrum of the AR@) model discussed in Section 12. . Note that the maximum of fw(m) = .69 is given by The confidence intervals for f@) at other frequencies w can be ca1culated similarly and are shown in Figure 13. . . b. &.. .9 The 95%confidence intervals for the spectrum of the logarithms of the Canadian lynx pelt sales using Parzen's window. and ui.2.9.69. 13. the 95% confidence interval forf(o)at w = .3 18 CHAPTER 1 3 Estimation of the Spectrum where jw(o) is the estimated specbum using Panen's window with M = 5 given in Figure 13. .. Thus.8. i. .2.e. . . . p should not be too small because an inadequate order of p leads to a poor approximation and hence may increase the bias of the estimated spectrum. truncation point.01. .13. the choice of the orders p and q can be determined using criteria such as Akaike's AIC. dl. and (ri. For large n. q) model al.(e-'") = (1 .(Pie-I" . . . similar to the choice of bandwidth. (PP. the quality of the ARMA spectral estimation relies on the proper choices of p and q in the approximation. On the other hand. .Let . .. Consequently.4 ARMA Spectral Estimation 31 9 : where &.5) is known as ARMA spectral estimation... If the AR order p is correctly determined.7.. to control the variance. As discussed in Section 7. and different windows. the order of p chosen to approximate the process should not be too large. Sitnilar to the AR spectral estimation. we can approximate the unknown process by an ARMAb. More generally. Parzen (1977) suggested the use of CAT criterion in choosing the optimal order p. . . .&. . then asymptotic simultaneous confidence bands for autoregressive spectra can be constructed using the method by Newton and Pagano (1984). .This method of spectrum estimation through an AR approximation was suggested by Akaike f 1969) and Panen (1974) and is often known as autoregressive spectral estimation. . for some p and 4.O. 
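The chi-square interval for f(ω) described above can be computed directly once the equivalent degrees of freedom are known. The sketch below uses ν = 3.709 n/M, the value for Parzen's window; the usage line plugs in rough values from the lynx example (n = 55, M = 5, a smoothed ordinate of about .74 near the peak) and is meant only as an illustration.

```python
import numpy as np
from scipy.stats import chi2

def spectrum_ci(f_hat, n, M, alpha=0.05, nu_factor=3.709):
    """Approximate (1 - alpha) interval [nu f_hat / chi2_{alpha/2}(nu), nu f_hat / chi2_{1-alpha/2}(nu)];
    nu_factor is the equivalent-degrees-of-freedom constant (3.709 n / M for Parzen's window)."""
    nu = nu_factor * n / M
    lower = nu * np.asarray(f_hat) / chi2.ppf(1.0 - alpha / 2.0, nu)
    upper = nu * np.asarray(f_hat) / chi2.ppf(alpha / 2.0, nu)
    return nu, lower, upper

# rough values from the lynx example: n = 55, M = 5, smoothed ordinate about .74 at the peak
print(spectrum_ci(0.74, n=55, M=5))
```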
the determination of the order p in the autoregressive approximation becomes a crucial step in the AR spectral estimation. ...and &: be the consistent estimates of #J~. .4. .The spectrum for the underlying process can also be estimated by h A where and Equation (13. .iq. Parzen (1974) showed that A Thus.GPe-bu). . I) spectrum Frequency (0 to ?r) Frequency (0to n) FIGURE 13.10.116.I24 and the ARMA(2. They are extremely similar.4.. 13.17 years. After carefully examining Figures 13.68544 yielding a period of 9. were adequate.4 Recall that we fitted the logarithms of Canadian lynx pelt sales with AF&lA models in Chapter 7 and found that both the AR(3) model with 6: = .7. and 13. the spectrum of the logarithms of Canadian lynx pelt sales can also be estimated by Both the spectra (13. . (a) AR(3) spectrum @f ARR.8) and (13.10 ARMA spectral estimation for the logarithms of Canadian lynx pelt sales.l) model with 62 = .320 C W T E R 13 Estimation of the Spectrum EXAMPLE 13.98).9) are plotted in Figure 13.10. Thus. where X. and both have the sharp peak at o = .4.TA(2. = (In 2.8. one should find that the ARMA spectral estimation is comparable with other estimates for this data set. 4 (a) Compute the smoothed spectrum for the data set given in Exercise 131 using the Tukey-Hanning window with M = 10.6 Consider the first differences of Series W4 discussed in Chapters 6 and 7. 13. and 15.8 Find and discuss the ARMA spectral estimates for Series W2 and W3. (b) Perform Fisher's test for the maximum periodogram ordinate.5 (a) Compute the smoothed spectrum for the Series W2 using Parzen's window with a proper choice of M. (d) Comment your results obtained in parts (b) and (c). 13. (b) Compute and plot a smoothed spectrum using a proper spectral window.6. Repeat the analysis requested in Exercise 13. 13.1 Consider the following data set (read across): (a) Compute the periodogram. 13. 10. 13. (c) Find and plot an ARMA spectral estimate.Exercises 32 1 13.3 Compute and discuss the smoothed spectrum for the data set given in Exercise 13. 13. (b) Perform Fisher's test for the maximum periodogram ordinate. (a) Compute the periodogram.2 Consider each of the series W2 and W3 discussed in Chapters 6 and 7.1 using Bartlett's window with truncation points M = 5. (b) Find the 95% confidence interval for the spectrum of the unknown underlying generating process for Series W2 using the result obtained in part (a). (a) Compute the periodogram. . 13. (b) Find the 95% confidence interval for the spectrum of the unknown underlying generating process for the data set using the results obtained in part (a). (c) Perfom Whittle's test for the second largest periodogram ordinate.7 Consider the proper differenced sequence of Series W9 discussed in Chapter 8. and diagnostic checking of these models. The frequency domain approach of bivariate processes is also presented. . Jenkins. 14. In this chapter.w where v(B) = vl.1) are often called the impulse response weights. and Reinsel call Quation (14. and y.Transfer Function Models In earlier chapters. The transfer function model is said to be stable if the sequence of these impulse response weights is absolutely summable.1. Detailed step-by-step model building is illustrated with an empirical example. For example. Thus. Jenkins. when x. we discuss the identification. Recall from Section 12. are properly transformed series so that they are both stationary.e. The coefficients vj in the transfer function model (14. 
that is, if Σ_{j=0}^{∞} |v_j| < ∞, so that in a stable system a bounded input always produces a bounded output. Recall from Section 12.3 that the term transfer function is also used in the literature to describe the frequency response function. The transfer function model is said to be causal if v_j = 0 for j < 0; in a causal model the present output is affected only by the system's current and past input values, so the system does not respond to an input series until it has actually been applied to the system. A causal model is also called a realizable model, as it appears that all real physical systems are causal.

In earlier chapters we were concerned with the time-domain and frequency-domain analysis of a univariate time series. In this chapter we consider transfer function models in which an output series is related to one or more input series. For example, sales may be related to advertising expenditures, or a daily electricity consumption series may be related to weather variable series such as the daily maximum outdoor temperature and relative humidity. We study the single-output linear system with both a single-input and a multiple-input linear system, and we discuss the identification, estimation, and diagnostic checking of these models. After studying the basic characteristics of these transfer function models, detailed step-by-step model building is illustrated with an empirical example. Forecasting using transfer function models is also discussed.

14.1 Single-Input Transfer Function Models

14.1.1 General Concepts

Assume that x_t and y_t are properly transformed series so that they are both stationary. In a single-input, single-output linear system, the output series y_t and the input series x_t are related through a linear filter as

y_t = v(B) x_t + n_t,    (14.1.1)

where v(B) = Σ_j v_j B^j is referred to as the transfer function of the filter by Box, Jenkins, and Reinsel (1994), and n_t is the noise series of the system, which is independent of the input series x_t. As a function of j, v_j is also called the impulse response function, and Equation (14.1.1) is also known as the ARMAX model. In practice, we often consider only the stable and causal model

y_t = v(B) x_t + n_t,  with v(B) = Σ_{j=0}^{∞} v_j B^j and Σ_{j=0}^{∞} |v_j| < ∞.    (14.1.2)

[FIGURE 14.1 Dynamic transfer function system.]

The purposes of transfer function modeling are to identify and estimate the transfer function v(B) and the noise model for n_t, based on the available information on the input series x_t and the output series y_t. The difficulties are that the information on x_t and y_t is finite and that the transfer function v(B) in (14.1.2) may contain an infinite number of coefficients. To alleviate these difficulties, we represent the transfer function v(B) in the rational form

v(B) = ω(B) B^b / δ(B),    (14.1.3)

where ω(B) = ω_0 − ω_1 B − ⋯ − ω_s B^s, δ(B) = 1 − δ_1 B − ⋯ − δ_r B^r, and b is a delay parameter representing the actual time lag that elapses before an impulse of the input variable produces an effect on the output variable. For a stable system, we assume that the roots of δ(B) = 0 are outside the unit circle.
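To make the correspondence between the rational form (14.1.3) and the implied impulse response weights concrete, the following Python sketch (not from the text; the parameter values in the example call are purely hypothetical) computes v_0, v_1, ... by equating coefficients of B^j in δ(B) v(B) = ω(B) B^b.

import numpy as np

def impulse_weights(omega, delta, b, n_weights=12):
    """Impulse response weights v_j implied by v(B) = omega(B) B^b / delta(B), with
    omega(B) = omega[0] - omega[1]B - ... - omega[s]B^s and
    delta(B) = 1 - delta[0]B - ... - delta[r-1]B^r.
    Obtained by matching coefficients of B^j in delta(B) v(B) = omega(B) B^b."""
    s, r = len(omega) - 1, len(delta)
    v = np.zeros(n_weights)
    for j in range(n_weights):
        v[j] = sum(delta[i] * v[j - 1 - i] for i in range(r) if j - 1 - i >= 0)
        if j == b:
            v[j] += omega[0]
        elif b < j <= b + s:
            v[j] -= omega[j - b]
    return v

# hypothetical example: b = 2, omega(B) = 3.0 - 1.5B, delta(B) = 1 - 0.6B
print(impulse_weights(omega=[3.0, 1.5], delta=[0.6], b=2))
# -> 0, 0, 3.0, 0.3, 0.18, 0.108, ...: b zero weights, then a decay governed by delta(B)

The printed pattern — b zero weights, a few irregular starting values, and then a decay whose form is dictated by δ(B) — is exactly the kind of behavior tabulated next for various choices of r and s.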
In this case.2.The value r is determined by the pattern of the impulse response weights in a similar manner to the identification of the order p for a univariate ARTMA model through the pattern of the autocorrelation function. + + In summary.1. . In this case. that follows the pattern given in (14. vb-I s . r = 0. then the value of s is found by checking where the pattern of decay for impulse response weight begins. vj. b is determined by vj = 0 for j < b and vb # 0. ..V ~ + ~ . 4. v b + ~ .2 Some Typical Impulse Response Functions In practice.r 1 weights vb. if r # 0. vl. . for j > b s. r.3) consist of the following: 1. the impulse response weights for the system (14.1. . + 14. . then the value of s can be easily found using that vj = 0 for j > b s. For a given value of b. vb+s-rthat do not follow a fixed pattern r starting impulse response weights v ~ + . .and vb+. . the transfer function contains only a finite number of impulse response weights starting with vb = 00and ending at vb + . .324 CHAPTER 14 Transfer Function Models The orders of s. we have that . and from vb + XS = 2 as shown in Table 14. Transfer function Typicai impulse weights o p e 3.~) Transfer function %pica1 impulse weights TAl3LE 14. if Si 4a2 r 0.2 Transfer function for r = 1. are both univariate stationary processes and the cross-covariance function between x. i. if Sf 4S2 < 0..2. Given two stochastic processes xt and y. and y. the impulse response weights exhibit either a damped exponential or a damped sine wave depending on the nature of the roots of the polynomial S(B) = (1 .S I B .e... f 1.S2g2)= 0. i. The value of s is determined similarly as described earlier. Cov(xt.2 The Cross-Correlation Function and Transfer Function Models 14. + +- 14. and y. for t = 0. y.2 The Cross-Correlation Function and Transfer Function Models 325 TABLE 14.They follow an exponential decay if the roots are real. .3 illustrates some examples of impulse response weights exhibiting a damped sine wave. and y. Table 14.). (b.e. . .1 The Cross-CorrelationFunction (CCF) The cross-correlation function is a useful measure of strength and direction of correlation between two random variables. is a function . f2.14. r = 2. are jointly stationary if x. they follow a damped sine wave if the roots are complex. In this case. we say that x.1 Transfer function for r = 0. . .) = y d . we have the following cross-correlation function (CCF): for k = 0. and y. . Instead.~ c y )= E(y. It is important to note that the cross-covariancefunction yq(k) and cross-correlation functions p.2. . + k h ) ( x t . it is important to examine the CCF. the CCF measures not only the strength of an association but also its direction. Upon standardization.2. for both positive lags k > 0 and negative lags k < 0. and the series y.: t) only.(k) = pJ-k). f2. Typical impulse weights Transfer function of the time difference (s function between x. we can rewrite Equation (14.)(yl + k .1 Consider the simple AR(I) process ( 1 .k ) . Ttius..(k) # p d . the cross-correlation function is not symmetric. we have p. f1 . where px = E(x. = a. is a white noise series with zero mean and constant variance a:.(k). .(k) = E(x.)..p.p. because y.3) as + . where axand my are the staadard deviations of x. i. pJk) = px(-k). however. For time t k. The graph of the CCF is sometimes called the cross-conefogram.) and py = E(y. p. To see the full picture of the relationship between the series x.e.326 CHAPTER 14 Transfer Function Models TABLE 14. ... (14. p. . which is symmetric around the origin. 
.3 Transfer function for r = 2. - EXAMPLE 14.(k). for k = 0. In such cases.e. we have the following cross-covariance .@)Z.(k) are generalizations of autocovarimces and autocorrelation functions because y d k ) = yJk) and p d k ) = p.3) where 141 < 1 and n. Unlike the autocorrelation function px(k). and y.k ) . *2. i. f1. and the trmsfer function v(B)= Bq(B)/+p(B). = a.2. for (f - #B)Z.$2). FIGURE 14./(l . is Hence. Because a univariate ARMA model &(B)Zt = Oq(B)atcan be written as G#I = . and 2. is the output. and Z.&3) = at+k + $a.2 The Cross-Correlation Function and Transfer Function Models 327 1 Zt+k = ( 1 .+k-l + +2at+k-2 + ' ' The cross-covariance function between a. the white noise series a. . The graph of this CCF for shown in Figure 14. In this formulation.14.5. is the input.2 CCF between a. the univariate time series 2. with 4 = . the CCF becomes where we recall that cr? = Var(Zt) = a.5 is the above example shows that the univariate ARMA model can be regarded as a special case of the transfer function model without a noise term. Even if r = 0 in (14. is defined only when x.7) reduces to Thus.7).7)to form a system of equations to solve for vj as a function of p. as shown in Equation (14. (k) = 0 for all k. i. Hence. To achieve the required stationarity. foIlows an ARMA process .2.1.(k) is difficult. unless mentioned otherwise.e.2.(k) and p. The CCF. are assumed to be jointly stationary in our discussions. we have where we recall that y.1-2)can be written as Without loss of generality we assume that ~ 1=. the use of Equation (14.2 The Relationship between the Cross-Correlation Function and the Transfer Function For time t + k. then Equation (14. In the general transfer function model we assume that the input series x. The variance and covariance of the sample estimate of p.5) and taking expectations. and y. the processes x.(k). 2. are jointly bivariate stationary processes. p. The relationship between the CCF.328 CHAPTER 1 4 Transfer Function Models 14. 0 and py = 0. and y.(k) are clearly contaminated by the autocorrelation structure of the input series x. Thus. pe(k) and the impulse response function vj is clearly contaminated by the autocorrelation structure of the input series xt. Multiplying by xt on both sides of Equation (14. however.(k).3) and hence the transfer function v(B) contains only a finite number of impulse response weights. the impulse response function vk is directly proportional to the cross-correlation function p.(k) = 0 for k # 0. 1 . p. Some remarks are in order.making the identification of both p.2.(k) and uk difficult. If the input series is white noise. the transfer function model (14.2. some differencing operators and variance stabilization transformations may be necessary.2.. 3.1 Sample Cross-Correlation Function For a given set of time series data x. (14. Letting E .is white noise. Sy= m. The series a. Applying the same prewhitening transformation to the output series y.3. ' ( B ) ~ ~ ( B the ) ~ . = B . 1 5 t 5 n. respectively..9) becomes The impulse response weights vj for the transfer function can therefore be found as This result leads us to the fundamental steps of the identification of the transfer function models to be discussed in the next section. the cross-coxelation function is estimated by the sample cross-correlation function sx= V'xm.329 14.2. and Z and are the sample means of the xt and y. function model (14. 14. and y. is often called the prewhitened input series.transfer . we obtain a filtered output series.. 
series..3 Construction of Transfer Function Models 14.3) .3 Construction of Transfer Function Models where a. we can test the hypothesis that the two series xt and y..(k) with their approximate standard errors I/-.3. we compare the sample CCF &(k) with their standard errors. series is usually not white noise. are uncorrelated and the x.4) becomes It follows that Thus.. (14. Tn practice. Under the hypothesis that the two series xt and y. as shown in the next section. series is white noise.(k) are zero. the x. are not cross-cqrrelated by comparing the sample CCF . Under the normal assumption.(k j). . Bartlett (1955) derived the approximate variance and covariance between two sample cross-correlations &(k) and . when the xt series is white noise. and we have to prewhiten it and also filter the output series.330 C W T E R 14 Transfer Function Models To test whether certain values of the p. The covariance is given by + Hence. 4.w l B ..k)-'fi.1 and 14. S(B) = (1 . Identify b.1. and s are chosen. to estimate ~ k : The significance of the CCF and its equivalent Ck can be tested by comparing it with its standard error (n . we have a preliminary estimate of the transfer function v(B) as ij Identification of the Noise Model Once we obtain the preliminary transfer function. where at is a white noise series with mean zero and variance 2.and P. Thus. .2 Identification of Transfer Function Models Based on the above discussions. and w(B) = ( o o . B S ) by matching the pattern of Gk with the h o w n theoretical patterns of the v k discussed in Sections 14.1. . transform the output series y.alB . as shown in Equation (14. r.o . the transfer function v(B) is obtained from the following simple steps: 1. That is.. using the above prewhitening model to generate the series 3.1.2.3 Construction of Transfer Function Modeb 331 14.Br).6.14. we can calculate the estimated noise series .3. Prewhiten the input series. preliminary estimates hj and can be found from their relationships with vk. Calculate the sample CCF. Calculate the filtered output series.4). Once b. &(k) between a. 332 CHAPTER 14 Transfer Function Models The appropriate model for the noise can then be identified by examining its sample ACF and PACF or by other univariate time series identification tools. however. Thus. In the construction of the model. giving Remarks on the Identification of the Transfer Function Model Combining (14.3.14). This process is often referred to as double prewhitening. . . .3 Estimation of Transfer Function Models After identifying a tentative transfer function model .16) as . . and x.12) and (14. are all stationary. . see Granger and Newbold (1986). For a good reference and example of constructing a noncausal system. x. . d = (dl.a. .)'. w.3. In the above process of identifying the transfer function v(B). + . ) I .. we have the following entertained transfer function model: Some important remarks are in order in the construction of this transfer function model. however. but not necessarily to whiten it. both input and output series should be prewhitened before examining its CCP..3. is also influenced by y. and a. is influenced by x. 1.)'. o = (wO..3. and ui. . where y. This method for constructing a causal transfer function model is nonnal and simple. The prewhitening model is applied to Hter the output series. 14. 2. where # - . for nonstationary time series. equivalently. it is assumed that the variables y. 
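The four identification steps just listed can be sketched compactly in Python. The fragment below is only a sketch under simplifying assumptions: the arrays x and y and the AR order of the prewhitening filter are hypothetical, a pure AR filter is used (an input model with an MA part would require the full ARMA prewhitening filter), and only non-negative lags of the CCF are computed, matching the steps above.

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def prewhiten_ccf(x, y, ar_order, max_lag=15):
    """Steps 1-4 above: prewhiten the (stationary) input x with a fitted AR filter,
    apply the same filter to the output y, compute the sample CCF of the filtered
    series, and convert it into preliminary impulse response estimates v_hat_k."""
    phi = AutoReg(x, lags=ar_order, trend="c").fit().params[1:]

    def ar_filter(z):                                   # (1 - phi_1 B - ... - phi_p B^p) z_t
        z = np.asarray(z, dtype=float)
        out = z[ar_order:].copy()
        for i, p in enumerate(phi, start=1):
            out -= p * z[ar_order - i: len(z) - i]
        return out

    alpha, beta = ar_filter(x), ar_filter(y)            # prewhitened input, filtered output
    alpha, beta = alpha - alpha.mean(), beta - beta.mean()
    n, s_a, s_b = len(alpha), alpha.std(), beta.std()
    ccf = np.array([np.sum(alpha[: n - k] * beta[k:]) / (n * s_a * s_b)
                    for k in range(max_lag + 1)])
    v_hat = (s_b / s_a) * ccf                           # v_hat_k = (s_beta / s_alpha) r_ab(k)
    se = 1.0 / np.sqrt(n - np.arange(max_lag + 1))      # approximate standard errors
    return ccf, v_hat, se

Comparing each v_hat_k with roughly two standard errors and matching the surviving pattern against the tabulated shapes then suggests tentative values of b, r, and s.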
generally more advantageous to model a noncausal system using a vector process discussed in Chapter 16.We can fewrite (14. . . . It is. or. some variance-stabilization and differencing transformations should be used k s t to achieve stationarity.)'.O. .. For constructing a possible noncausal system with feedback phenomenon.. we prewhiten the input series.W I . we need to estimate the parameters 6 = (a1 . Note that so far we have assumed that b is known. p. the nonlinear least squares estimate of these parameters is obtained by minimizing w f i e r e f o = m a x { p + r + 1 .3.)-"/~exp 1 (14.For example. yo. by setting the unknown a's equal to their conditional expected values of zero. we have the following conditional likelihood function: &kp L ( 6 .yo. x. In general. and ek are functions of ai. and (ri. the OLS estimates in this case are not necessarily consistent.14.3. dj. As pointed out by Miiller and Wei (19971. B e y introduce an iterative regression procedure to produce consistent estimates.#. Then b is selected to be the value that gives the overall minimum of the sum of squares. Y.2. d 1 b. then (14. are N(0. if we need aIso to estimate b. . ao) = (2?rt~. Under the assumption that the a. &fl. and on the basis of these consistent estimates they also propose an alternative method of model specification. 6. s. 8. For given values of r. b + p + s + I}. one may be tempted to use it to estimate the parameters. in terms of the general transfer function model in (14.3. however.2. Because the ordinary least squares method is the most commonly used estimation procedure.20) where xo.9). and 6. and q.21)can be optimized for a likely range of values of b.3 Construction of Transfer Function Models 333 and Thus.. where ci. a:) white noise series. We refer readers to Miiller and Wei's paper for details. the estimation methods introduced in Chapter 7 can also be used to estimate the parameters o . a0 are some proper starting values for computing a. o. from (14.19) similar to the starting values needed in the estimation of univariate ARIMA models discussed in Section 7. xo. uj. we have to examine the residuals 4 from the noise model as well as the residuals cy. In the transfer function model.(k) 0 and p. between Gt and or. from the prewhitened input model to see whether the assumptions hold.As shown in Pierce (19681. 2.334 CHAPTER 14 Transfer Function Models 14. For an adequate model. A portmanteau Q test similar to the one in (7.. we assume that the a. In terms of model inadequacy.to 1. both the sample ACF and PACF of ii. we have both p. For an adequate model. should show no patterns and lie within their two standard errors 2(n . In this case.5. and M is the number of parameters Siand wj estimated in the transfer function v(3)= o(B)/8(B). calculated.Thus.q) degrees of freedom depending only on the number of parameters in the noise model. and hence are also independent of the prewhitened input series q.3. are white noise and are independent of the input series x. and other purposes. and the input series x. are independent. which is the number of residuals Ci. suppose that the correct model is + but we incorrectly choose a wrong transfer function vo(B)leading to an error term b. the number of degrees of fkeedorn for Qo is independent of the number of parameters estimated in the noise model.1) can also be used.p . 1. Case 1: The transfer function v(B) is incorrect. it is necessary to check the model adequacy before we can use it for forecasting. 
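For readers who want to see the conditional (nonlinear) least squares idea in code, here is a minimal sketch for one simple specification. It is not the book's estimation routine: the arrays y and x, the delay b = 1, the orders r = s = 1 with AR(1) noise, and the zero pre-sample values are all illustrative assumptions, and in practice the likelihood-based methods of Chapter 7, careful starting values, and standard errors would be handled more fully.

import numpy as np
from scipy.optimize import minimize

def css(params, y, x, b):
    """Conditional sum of squares for the illustrative model
       y_t = (w0 - w1 B) / (1 - d1 B) x_{t-b} + a_t / (1 - phi1 B),
    with pre-sample quantities set to zero (conditional least squares)."""
    w0, w1, d1, phi1 = params
    n = len(y)
    v = np.zeros(n)
    for t in range(n):
        v[t] = (d1 * v[t - 1] if t >= 1 else 0.0) \
             + (w0 * x[t - b] if t - b >= 0 else 0.0) \
             - (w1 * x[t - b - 1] if t - b - 1 >= 0 else 0.0)
    noise = y - v
    a = noise[1:] - phi1 * noise[:-1]                  # (1 - phi1 B) n_t = a_t
    return np.sum(a ** 2)

# y, x: properly transformed (stationary, mean-corrected) output and input arrays
fit = minimize(css, x0=np.array([1.0, 0.0, 0.5, 0.3]), args=(y, x, 1), method="Nelder-Mead")
w0_hat, w1_hat, d1_hat, phi1_hat = fit.x
sigma2_a = css(fit.x, y, x, 1) / (len(y) - 1)          # rough residual variance estimate

If b also has to be estimated, the same sum of squares can be minimized over a likely range of b values and the overall minimum selected, as described above.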
in the diagnostic checking of a transfer function model.(k) # 0 for some k. where m = n . The following portmanteau test can also be used: + which approximately follows a X2 distribution with (K 1) .k)-'/'.. we can calculate + The Q1 statistic approximately follows a x2 distribution with (K .4 Diagnostic Checking of Transfer Function Models After the model has been identified and its parameters estimated.M degrees of freedom. Cross-correlation check: To check whether the noise series a. j z ( k ) . . regardless of whether the noise model is adequate. A~ctoco~~relafion check: To check whether the noise model is adequate. two situations may arise. To see. In other words. contral. the sample CCF. Then. should not show any patterns. i... and only the noise model is inadequate. both .(k) = 0 for a11 k. The pattern of b. Case 2: The transfer function v(B) is correct. . In summary. Lydia Pinkham's Vegetable Compound was first brewed and prepared \ \/ 011900 I 1910 I 1920 I 1930 Advertising I 1940 I 1950 I * 1960 Year FIGURE 14.(k) and { . In essence...3 Construction of Transfer Function Models 335 Thus. That is hue even \& +o(B) = t$(B). there is no need to check the autocorrelations for &. p. the Ernedy in this case is first to reidentify the transfer function v(B)and then to modify the noise model. but p. for a transfer function model to be adequate. In this case...(k) can be used to modify the noise model [B(B)/+(B)]a.3 Lydia Pinkham annual advertising and sales data (1907-1960). Because the noise model will be contaminated by an incorrect transfer function. but also \viU be autocorrelated.5 An Empirical Example The Data The Lydia Pinkham annual data listed as Series W12 in the appendix and plotted in Figure 14. and hence with a. 14. will not only be cross-conefated with x.3. One should reidentify the transfer function v(3)and repeat the iterative model-building steps until a satisfactory model is obtained. this step is similar to the analysis of a variance model with an interaction term where the test of an interaction effect should be performed before the test of the main effects. it is prudent to carry out the cross-correlation check first. when the noise model is correct. b.3 provide an interesting case for studying the dynamic relationship between advertising and sales. Hence. If the cross-correlation test fails. in the diagnostic checking of transfer function models. ( k ) should be statistically insignificant and show no patterns.14.e.(k) # 0 for some k. the product remained essentially unchanged in taste.336 CHAPTER 1 4 Transfer Function Models by the lady of the same name as a home remedy for her friends. The company relied exclusiveIy on advertising to increase primary demand for the product. = (1 . In addition. (14. = (1 . was considered to be effective against "women's weakness. and the a.. It was reexamined later by Heyse and Wei (1986). an herbal extract in atcoholic solution. A transfer function model using sales Y. and in 1941 the FTC permitted the company to strengthen its advertising copy. although in 1914 the solid content of the liquid compound was increased sevenfold at the prodding of the Internal Revenue Service. were used. 1976) between current sales and present and past advertising. Furthermore.. such as a sales force. Such lagged advertising effects on sales may result from delayed response to the marketing effort." Additional medical claims followed with the marketing success of the compound. or quantity discounts.3.as an output series and advertising expenditures X. 
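The two portmanteau checks described above — the autocorrelation check on the noise-model residuals and the cross-correlation check between the prewhitened input and the residuals — can be sketched as follows. This is an assumption-laden illustration, not the book's exact statistics: a Ljung–Box-type normalization is used here, while the degrees of freedom, (K − p − q) for the autocorrelation check and (K + 1) − M for the cross-correlation check, are the ones quoted above; the residual arrays and order counts in the usage comments are hypothetical.

import numpy as np
from scipy.stats import chi2

def sample_corr(u, w, max_lag):
    """Sample correlations r(k) = sum_t u_t w_{t+k} / (n s_u s_w), k = 0, ..., max_lag
    (set w = u to get the ordinary autocorrelations of the residuals)."""
    u, w = u - u.mean(), w - w.mean()
    n = len(u)
    denom = n * u.std() * w.std()
    return np.array([np.sum(u[: n - k] * w[k:]) / denom for k in range(max_lag + 1)])

def portmanteau(r, lags, m, dof):
    """Q = m(m+2) * sum_j r_j^2 / (m - lag_j), referred to a chi-square with dof degrees of freedom."""
    r, lags = np.asarray(r), np.asarray(lags)
    Q = m * (m + 2.0) * np.sum(r ** 2 / (m - lags))
    return Q, 1.0 - chi2.cdf(Q, dof)

# hypothetical usage: a_hat = noise-model residuals, alpha_hat = prewhitened input,
# K = number of lags, p_, q_ = noise-model orders, M_ = number of delta and omega parameters
# m = len(a_hat)
# Q1, pval1 = portmanteau(sample_corr(a_hat, a_hat, K)[1:], range(1, K + 1), m, K - p_ - q_)
# Q0, pval0 = portmanteau(sample_corr(alpha_hat, a_hat, K), range(0, K + 1), m, K + 1 - M_)

Following the logic above, one would look at Q0 (the cross-correlation check) first; only if it passes is the autocorrelation check Q1 informative about the noise model alone.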
Diagnostic checking does not reveal any inadequacies in the model. credit. indicates that the original series X. when sales reached $500. the U. x. For the purpose of forecasting comparisons. form. 1 .41~')x. as adopted in the following.. or to an increase in demand from existing customers.S.07B + . There were no close product substitutes on the market. the company was able to demonstrate the possibility that the active herbal ingredients had some medical benefits. the prewhitening filter for x.27) where x.601. but the company began to experiment with radio and television in the years following World War II.4. Prelvhiteriirtg of the advertising series. to a holdover of new customer demand. A Transfer Function Model between Advertising and Sales A strong relationship has been demonstrated (Clarke.B)Y. = (1 .. where we note that the original series of sales is also nonstationary.as an input series was first suggested by Helmer and Johansson (1977) to model the Lydia Pinkham data. The fitted model is ( 1 . The advertising budget was primarily for newspaper copy.000.are stationary and can be modeled as an AR(2) model. Thus. Based on Palda (1964). will be used to filter the differenced output series. which was considering an alcohol tax on the compound. The first step in building a transfer function model is to prewhiten the input series: advertising expenditures. Examination of the sample ACF and PACF. and action through the period in question. shown in Table 14. y. = 52. they hold back the last 14 observations and use only the first 40 observations (1907-1946) for model building. A part of the profits was reinvested immediately into high intensity advertising campaigns. Palda (1964) was later able to obtain the data for 1907-1960. The first commercial sale of the medicine was in 1873.are the white noise with mean 0 and 2. Records of a court litigation involving a family argument produced a complete listing of the firm's annual advertising expenditures and sales over the period 1908-1935. . = a. The compound.B)X.is nonstationary but its differences. Finally.S. Food and Drug Administration in 1925 to cease and desist from continuing to make strong medical claims. There were some changes in the character of the advertising copy used because of orders fiom the U. no other marketing instruments. Federal Trade Commission in 1938 was also examining the medical uses of the compound.B)X. advertising may have a cumulative effect on demand. 41g2)y. 5 6 7 8 9 9 10 11 12 10 11 12 TABLE 14. and the filtered output series p. where &. Identijcatiort of the imptrlse response functiort and trarrsferfunctiort. 4 1 ~ ~ ) x . .3.. where G p = 287. (14.14. (a) f i k and k 1 2 3 4 k 1 2 3 4 6 5 4mkfor X. Lag (4 CCF(L. 7 8 (b) filr and t$bh for x. = 229.) 2.4 Sample ACF and PACF of the advertising series X..07B + . = (1 . and the impulse response function.B)X.54).28) .5 Sample CCF and impulse response function for the prewhitened input and the filtered output series (i?. = (I . for the prewhitened input senes a.+. Gk = (Gfi/&a)bap(k).5 gives sample cross-correlation.3(k)) Impdse response function (9. = (1 ...3 Construction of Transfer Function Models 337 TABLE 14. = 229. 0 7 ~ .54.35. Table 14.35 and Gp = 287. . That is.29).. Ide~ifijcatioaof the noise niodel.Table 14.1. however. = (1 .B)Y. For the transfer function given in (14.3. and s = 0. 3. i. 
we can obtain preliminary estimates of the parameters in the transfer function by utilizing the relationships between the impulse response weights and the parameters given in Equation (14. and x.r = 1. from Table 14.'e. The estimated noise series is then obtained by where y.2 suggests a transfer function with b = 0. Once a tentative transfer function is obtained.BJX. then. To be conservative.6 Sample ACF and PACF for the estimated noise series. one can try an AR(1) model TABLE 14.338 CHAPTER 14 Transfer Function Models The pattern of the sample impulse response function Ck in Table 14.1. a transfer function with b = 0. = (1 . If only the first two large impulse weights Go and GI were retained and the remaining smaller ones were ignored.4). r = 0. we have Thus. The pattern indicates a white noise model.6 shows the sample ACF and PACF of the estimated noise series.. and s = 1 is also a possibility. & and the CCF between the prewhitened input a. Combining the above results.3. . Inspection of these two functions indicates that both the ACF and CCF are zero at these lags. Final adoption of a proposed model requires study of the residuals.E.E.33) Parameter Estimate St.3.34) and (14. between the prewhitened input and the residuals from Equations (14. St. one might conclude that Equation (14. 5.3.34) Parameter Estimate The results indicate that the parameters S1.3.B)I. wl. coufd be deleted from the (14.3. + a.35) The fitted model becomes with 6: = 46. and These models were in fact originally suggested by H e h e r and Johansson (1977).33) is adequate. Similar results hold for the residual ACF and the CCE.3.Estimates of the parameters and their associated standard errors are as follows: Equation (14. 4.7.935.35). hence. and one could consider the following model: (1 . Estimation of the transfer fitnctiotz model. are indeed white noise and are independent of the input series X..and the residuals 2. the following two tentative transfer function models are obtained. Diagnostic checking. = w o ( l .and hence independent of the prewhitened input series apThe first 11 lags of the residual ACF of .B)X..33) are shown in Table 14. fiom Equation (14.3 Construction of Transfer Function Models 339 and check the result of estimation in the combined transfer function-noise model to see whether = 0. one has to check whether the a.and models.3.14. Specifically. Equation (14. is said to cause series x. hence. denoted by x .3. it is important to examine p. and y.2.33).340 CHAPTER 14 Transfer Function Models TABLE 14.(O) # 0.I7 . Unfortunately. one needs consider only its nonnegative lags.(k) for both positive lags k > 0 and negative lags k < 0.and the filtered output series j3.18 -.333. if pq(k) # 0 for some k < 0 and p9(k) = 0 for all k > 0.. Now. &(. .I and paB(k) = 0 for k > 0 at a 5% significance level.18 .. and the presence of a lagged association from sales to advertising.19 .. TABLE 14. and y. As mentioned in Section 14. is said to cause series y. A causal transfer function model exists between series x. A feedback relationship between x. and y.3. the above procedure relied only on nonnegative lags.32 .e..).18 .43 -17 ..(O) # 0 and pry(k) = 0 for all k # 0. +.ll . The cross-conelations between the prewhitened input series a. Reexamination of the Transfer Function Model Are the transfer function models in Equations (14.This phenomenon happens because advertising budgets are often set as a percentage of earlier sales.. -+ y.4.17 . we have sales (Y.I8 . &(k) St.) rather than advertising (X. (14. 
To see the complete picture of the relationship between two time series x. The pattern clearly indicates a contemporaneous relationship. the resulting conclusions are dubious.51. Because pap(k) # 0 for k = 0 . are said to have a contemporaneous relationship if p.24 . series y. these procedures are wrong.8 Sample CCF bettveen the prewhitened input and the filtered output series.16 . They are said to be related only contemporaneously if p. Series x. E.26 .34). if pq(k) # 0 for some k > 0 and p.3. .8 and plotted in Rgure 14. This approach has been widely used in empirical research. fiap(0) = .51 .)-t advertising (X. denoted by x. fet us reexamine the Lydia Pinkham data. exists.)+ sales (Y.1 ) = . Two series x. . if p.(k) # 0 for some k < 0 as well as for some k > 0. and y.43.17 . Because the ACF is symmetric about zero.35) really adequate for the Lydia Pinkham data? Because in practice one often considers only a reatizable system. however.18 .09 . the CCF is generally asymmetric and measures both the strength and direction of an association.I7 -. denoted by x. i.7 Sample ACF and CCF for the residuals from Equation (14.H y.23 .00 .y.3. only when pq(k) =: 0 for k > 0 or pv(k) = 0 for k < 0.18 . and (14. a causal transfer function model. at both negative and nonnegative lags are shown in Table 14..(k) = 0 for all k < 0. 4 Forecasting Using Transfer Function Models Once a transfer function model is found to be adequate.3.9 Sample CCF between the prewhitened input series and the residuals.04 .17 -.I0 .4 Forecasting Using Transfer Function Models 341 FIGURE 14. .18 Equations (14.3..14.18 -.9 strongly suggests that the model is inadequate. the more detailed CCF between the prewhitened input series and the residuals from Equation (143.331. and the associated input series X.18 -.18 -.01 -17 .18 -19 -18 -05 . ( k ) St. 14.33). (14.4 Sample CCF between the prewhitened input and Htered output series with hvo standard error limits. TABLE 14. it can be used to improve the forecast of the output series Y.32 -18 .46 .3.18 -.3.(-I). or (14.31 .3. The significant spike.E.35) are constructed without using the information contained in the CCF at negative lags..341.I4 .17 .3. . at lag k = -1 implies that there is a feedback relationship from sales to advertising at one period which is not properly accounted for by transfer function models such as (14.34).That is particularly tnre if the input series is a leading indicator. by using the past history of both the output series Y.-.06 .35).331 as shown in Table 14. and (14. As a result. .02 . (14. although the residual ACF indicates that the residuals are white noise. and a. Let = * CO Zj=Oui+jat-j 4. are independent zero mean white noise series with variances ui and C T ~ . OfB).+l-i.1 Minimum Mean Square Error Forecasts for Stationary Input and Output Series Suppose that Y. Then the fore- where 00 cast error becomes .Zj=o$ & j ~ . d ( 3 ) = 0. and B.(B) = 0 are all outside of the unit circle.~be the I-step ahead optimal forecast of Y. Thus. the roots of 6(B) = 0.are stationary and are related in the stable transfer function model and where w(B). +x(B). = Zj-o ujs+. . and O.+x(B) = 0. = 1.and X.+l. respectively.(B) are finite order of polynomials in B.4.-j + XjEo $ju. and a. 4(B) = 0. Let and We can rewrite (14. S(B).+(B).1) as 2(1) 02 03 $.342 CHAPTER 14 Tsansfer Function Models 14.4. ( 1 . Equation (14. the minimum mean square error forecast of Yt+[ at time f is also given by the conditional expectation of Y.4. 
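Because the distinction among unidirectional causality, feedback, and a purely contemporaneous relationship rests on the sample CCF at both negative and non-negative lags, the following short Python sketch computes the two-sided CCF and flags values outside approximate two-standard-error limits. The arrays alpha and beta stand for the prewhitened input and correspondingly filtered output and are hypothetical.

import numpy as np

def two_sided_ccf(alpha, beta, max_lag=10):
    """Sample CCF r_ab(k), k = -max_lag, ..., max_lag, where r_ab(k) correlates
    alpha_t with beta_{t+k}; spikes at k < 0 point to feedback from output to input."""
    a, b = alpha - alpha.mean(), beta - beta.mean()
    n = len(a)
    denom = n * a.std() * b.std()
    ccf = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            ccf[k] = np.sum(a[: n - k] * b[k:]) / denom
        else:
            ccf[k] = np.sum(a[-k:] * b[: n + k]) / denom
    return ccf

# alpha, beta: prewhitened input and filtered output (hypothetical arrays)
ccf = two_sided_ccf(alpha, beta)
flagged = {k: round(r, 2) for k, r in ccf.items()
           if abs(r) > 2.0 / np.sqrt(len(alpha) - abs(k))}   # outside two standard errors

A significant entry at a negative lag, such as the spike at k = -1 in the Lydia Pinkham data, is the kind of evidence of feedback that a one-directional transfer function model cannot accommodate.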
are nonstationary.4 Forecasting Using Transfer Function Models 343 The mean square of the forecast error E[X+[ .2 for a general A D A model.1)and (14. q) models. and X.4.14.2. weights for the ARIMA model. and the weights 19 and qj in (14. 6(B)..4.9).B ) ~ ) u . .4. Specificafly.(B).8) holds also for the nonstationary case. Because E(Y. +#3). are nonstationary but that ( 1 . and a.B ) ~ x are .at time t. (14. stationary and are related in the stable transfer function model and where w(B).the . can be reduced to stationary series by some proper transformations and differencing so that they follow some ARTNIA(p. the forecast is unbiased and the variance of the forecast is given by A 14. a.B ) ~ Yand . The required uj weights come from the first term ( ~ ( B ) / ~ ( B ) ) Band ~ x .+Iat time t even if Y .8) can be derived by the same method as Equation (5.and X. That is true as long as Y.. . suppose that Y.2). are defined as in (14.From and Y. we have 8. is the sum of two independent processes. and X.2.2 Minimum Mean Square Error Forecasts for Nonstationary Input and Output Series By the same argument used in Section 5.8(B).+.4. d. the minimum mean which is minimized when lcLj = ul+j and $f~i+~ = * square error forecast x(1) of Yt+[at time origin t is given by the conditional expectation of Y. .26) used to compute the rC. Likewise.i(l)j2 is given by * In other words.q(I)) = 0.4. weights come from the second term (B(B)/#(B)(f.+. Thus.16) where Next. To derive the uj weights. (14.the II. we first express Xt in (14. .26)to compute the following +)'I weights: .. .15) through (5.4..fB) = 0 are assumed to be outside of the unit circle. Thus. The required ui weights for j = 0. = I. (14.are thus equal to the coeffiand cient of B] in the following expression that exists because w(B) is a finite-order polynomiai in B and the roots of 6(B) = 0 are outside of the unit circle: It can be easily seen that if b # 0. we let and express el in an AR representation that exists because the roots of d(B) = 0 are outside the unit circle. .4..13) where By the results from (5. = 0 for j < b..26). .344 CHAPTER 14 Transfer Function Models To derive the Gj weights. .2.I). ~.BS)..10) in terms of its AR representation that again exists because the roots of 8.olB. T ( ~ ) ( L= I )a.weights are calculated recursively from the weights in (14.4. if b = 0. then uo # 1 unless =1 .2.o.2.4. a. T ( ~ ) ( B ) X=.then uj in o(B)= (oo.14) as follows: and $0 = 1. we use the result (5. (1 . 1. 4 Forecasting Using Transfer Function Models 345 For the actual computation of forecasts. and Gt is generated either by Equation (14.. is clear that for 1 > (g + r). = d(B)Xt-+ + e(B)a.i .10) as discussed in Section 5.9) and rewrite it in the difference equation form c(B)I.3 for the univariate AIUMA models.4. The required forecast of the input series can be easiIy obtained from Equation (14.14.20) or (14. (14.20) where and Thus.4. the forecast i ( l ) of Y. we consider the general transfer function model (14.4. . .+)depends only on the past history of Y and the present and past history of X.4.19) or as the one-step forecast error 1.3.~ ( l ) It. 3. and 6: = 52601. To compute the forecast. Y38= 2518. Note that Equation (14.. (14.4.bj.41B2)(1 .3 An Example The numerical computation of the minimum mean square error forecasts for stationary input and output series is simpler than that of nonstationary input and output series. EXAMPLE 14. = a. Ya = 2177. and Xjl = 836. 
Although the causal transfer function model has been found inadequate in describing the relationship between the Lydia Pinkham annual advertising budget and sales. we have that and (1 .05.23) as Hence. Now.2 From (14.101. we rewrite (14.4.07B where 6: = + . For illustration. the model equations themselves can be used to illustrate the computational aspect of the forecasting procedure described above. we have = . othenvise.24) 45175.B)X. X40= 1012.346 CHAPTER 14 Transfer Function Models 14.24).33). To obtain the forecast variance.. X39= 1145.4. from the data and model fitting. Thus. it equals the one-step ahead forecast of Xdl determined from its own past through Equation (14.4. Y39= 2637. The one-step ahead forecast from the forecast origin n = 40 is where k m ( l ) = Xll if it is h o w .4. we consider the nonstationary case of the Lydia Pinkham example. we need to compute the weights uj and t.23) can be written as . we have the following i..4. and Using (14. a.151. where Thus. we note that where 347 .4 Forecasting Using Transfer Function Modeb Let and a(a)(~)= e .14.bj weights: To obtain the tij weights. 348 CHAPTER 14 Transfer Function Models Hence.181.4.1 are equal to the coefficient of Bjin the following expression: .19). 1 . and np)= 0. .4. by (14. Using (14.the uj weights for j = 0. 1. Thus. for j 2 4.we have and . . aad y. For the 14 one-step ahead sates forecasts using the actual one-step ahead advertising value.3. .it does not seem appropriate to assume that the next period's advertising budget is known with certainty when forecasting the next period's sales. Reed from Section 14. Because of a feedback relationship from sales to advertising at lag 1.3. y.34). respectively. . 14.33)or (14. Rather than the actual advertising levels.4. and y . see Liu (1987).4.33)or (14. and y. For more examples and discussion on forecasting using transfer function models. .. are jointly stationary with the cross-covariance function y.080 obtained from a univariate time series model of sales..3.5 Bivariate Frequency-Domain Analysis 349 Having obtained the uj and weights. and is given by .23)and 14. Helmer and Johansson (1977) reported a mean squared forecasting error of 8912 for Equation (14.14. . then its Fourier &ansfom exists and is called the moss-spectrum (or cross-spectral density) between x.5 Bivariate Frequency-Domain Analysis 14. and y.34).4. if one-step ahead advertising levels from Equation (14.(B) = yx(B) and yyy(B)= yy(B) are autocovariance generating functions for x. Specifically.respectively.3. we can now calculate the one-step ahead forecast error variance using (14. & 1. both of which are considerably smaller than the MSE of 16.e. Zk=-lyxy(k)l < m. the last 14 observations (1947-1960) were held back to examine the forecasting performance.3. Define the cross-covariance generating function as where the coefficient of gk is the kth cross-covariance between x.1 Cross-Covariance Generating Functions and the Cross-Spectrum Suppose that x.23) and 9155 for (14.177 for (14. the corresponding MSEs of the 14 one-step ahead sales forecasts are 15.5.24) are used.044 for (14.4. This equation is the generalization of the autocovariance generating function introduced earlier.8)and get The forecasts fm(l)and their forecast error variances V ( I ) for other 1 can be calculated similarly. i..(k) for k = 0. If 03 the cross-covariancesequence yxy(k)is absolutely sumable.5 that in fitting the data set. f2. (w) will in general be complex because yxy(k) # yx. respectively. 
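The recursive computation of point forecasts can be sketched as follows for a model of the same general structure as the illustration: an AR(2) model for the differenced input and a transfer function relating the differenced output to the differenced input. This is only a sketch under stated assumptions — the function name, the parameter values in the usage comment, and the series passed in are placeholders, not the book's estimates — and future shocks are simply replaced by their conditional mean of zero.

import numpy as np

def forecast_transfer(Y, X, w0, w1, phi1, phi2, steps=5):
    """Point forecasts for an illustrative nonstationary structure:
         (1 - phi1 B - phi2 B^2)(1 - B) X_t = alpha_t     (input model)
         (1 - B) Y_t = (w0 - w1 B)(1 - B) X_t + a_t       (transfer function model)."""
    x, y = list(map(float, X)), list(map(float, Y))
    dx = [x[i] - x[i - 1] for i in range(1, len(x))]      # (1 - B) X_t
    for _ in range(steps):
        dx_next = phi1 * dx[-1] + phi2 * dx[-2]           # forecast of the differenced input
        dx.append(dx_next)
        x.append(x[-1] + dx_next)
        y.append(y[-1] + w0 * dx[-1] - w1 * dx[-2])       # accumulate (1 - B) Y_t forecasts
    return np.array(x[len(X):]), np.array(y[len(Y):])

# x_fc, y_fc = forecast_transfer(Y, X, w0=0.5, w1=0.1, phi1=0.1, phi2=-0.4, steps=3)

The forecast error variance still requires the u_j and psi_j weights derived above, since V(l) is the sum of sigma_a^2 * sum psi_j^2 and sigma_alpha^2 * sum u_j^2 over j = 0, ..., l - 1.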
The coherence (or the squared coherency) is defined by whch is essentially the standardized cross-amplitude specaum.More precisely. fx.) l o o =y J k ) sin ok. we can express f.(-k). we can write where cXy(u)and -q.5. Note that even when yxy(k) is real for the real processes x. f.rr k 2 = 4 (14. and y.Alternatively. Specifically...(w) = &(o) are spectrums for x . however. and q (a)is called the quadrature spectrum.. XY of x.(o.2. 2.(w) andf.350 CHAPTER I4 Transfer Function Models which is the generalization of the spectrum (or spectral density) for the univariate series as introduced in Section 12. are difficult to ~nterpret. respectively. and y. Two other useful functions are the gain function and the coherence. . The gain function is defined as which is the ratio of the cross-amplitude spectrum to the input spectrum.(u) are the real and imaginary parts offxY(w).(o) in the polar form where and The functions Axy(o) and +xy(o) are known as the cross-amplitude spectrum and the phase spectrum....5) The function cxy(@)is called the cospectrum. and q+ . These functions. Hence.(o) = f.. and y. 14). several cross-spectral functions are needed to describe the relationship between two series in the frequency domain. namely. we obtain for w # A. letting A = o in (14. In general. we recall from Section 12.5. for the right-hand side to be a function of k only.14. and the coherence. Therefore.. they are possibly the three most commonly used cross-spectral functions in kequency domain analysis.respectively. are also jointly stationary. the phase spectrum.(o) and dU.(w) and dUy(w).14) must be zero when w # A.5 Bivariate Frequency-Domain Analysis 351 In practice. 14.4 that any stationary zero mean process can be written in terms of the spectral or Cramer's representation.(w) are the complex conjugates of dU. where dU. (14. Thus.(w) are each orthogonal processes such that for w f A. Hence.5. E [ d & ( w ) dU. the integrand in (14.5.(A)] = 0.1. and y.13). That is. in addition to (14.5.5. where dU:(o) and dU. we have Now. we also require cross-orthogonality. If x.2 Interpretation of the Cross-Spectral Functions To see the implications of the above functions associated with the cross-spectrum. the cross-spectrum between two series is more easily analyzed through the gain function.15) . we can write . Thus. which implies that the gain function GI.A&) = If. E [ d U .1. we have and where and E[dU.(@) is simply the absolute value of the standard least squares regression coefficient of the o-frequency of x.e. By the same argument that was used to derive (12..(w)!..dUy(o)).(w).(A)] = 0.(w) dU.. Similarly. Also.19) we see that the cross-amplitude spectrum. for o # A. are both purely nondeterministic and jointly stationary zero-mean processes with absolutely s u m a b l e autocovariance and cross-covariance functions.. measures the covariance between the w-frequency component of x.37). and the o-frequency component of y. from (14. ( o )d q t o ) ] = Cov(dU.352 CHAPTER 14 Transfer Function Models Assume that x. and y. Now. i.5. K$(w) is square of the correlation coefficient between the o-frequency component ofx. It follows that when a(eio) and P ( e i w )are nonzero. we have or. To see. 0 5 K$(o) 5 1. . and the w-ffequency'component of yy.14. the coherence is invariant under linear transformations.5. by (12.3. A value of ~ z ~ ( o ) close to one implies that the o-frequency components of the two series are highly linearly ) zero implies that they are only slightly linearly related. related.17) and (14.clearly. dU. 
hence. suppose that and Then.3).A~(m). equivalently.18). and a value of ~ & ( w near Like the correlation coefficient between two random variables.5. fWY(w)= ~l(e-~")~(e~~). by f 14.(o) = a(e-'") dU' (o) and Hence. the coherence is invariant under linear transformation.5 Bivariate Frequency-Domain Analysis 353 Hence.. e. The value of #x. because +. or x. between the w-frequency components of x..e..rr to w.5. dUx(w) and dU. where there is no feedback relationship between x.. and y. is then the angle between the positive half of the c.(o) is an odd symmetric function that is zero at o = 0. the shift in time units is #~. and y. Thus.#y(w)j. the phase spectrum is defined only mod(2n). In terms of a causal model y. a. - E[Ax(w)Ay( o ) 1E [e'[+1(~)-4~(~)]] .5. if the phase +xy(w)is positive..354 CHAPTER 14 Transfer Function Modets To learn the interpretation of the phase spectrum. .(w) .(o) is a solution to (14.(o).5. the actual time delay of the component of y. leads the o-frequency component of y. we recall from Section 12. It follows that +xy(w) and -qXy(w) have the same sign. A. and y. however. Hence.8). Thus. the range of +x.1. From (14. For a given +xy(o). then + x y ( ~= ) +iy(w) f 2 r k for any integer k is also a solution of (14.-.(o) and +x(o) are the amplitude spectrum and the phase spectrum of the x.8).8). = ax.(o). In practice. [+. the phase spectrum is normally graphed only for o over the range 0 5 w 5 ?r. The w-frequency component of x.6) and (14.5.(w)/w. which is the argument of fxy(o). Similarly.4 that for any o . and the phase spectrum # x y ( ~ ) represents the average phase shift. and Ay(o) and $(w) are the amplitude spectrum and the phase specirum of the y. Hence. series.. lags the o-frequency component of y. -q. we can write where A.. series. i. = py. at frequency o is equal to + + which is not necessarily an integer. The w-frequency component of x.?(w) axis and the line joining the origin to the point (cx. if the phase +xy(w)is negative. it is clear that if &. where for simplicity we assume that the amplitude spectrum and the phase spectrum are independent..(&) can also be thought as the average value of the product of amplitudes of the o-frequency components of x. if we take the given function to be nonnegative. Furthermore.(w) are complex valued random variables. the phase spectrum is a measure of the extent to which each frequency component of one series leads the other.-.(w)). Consequently.(w) is conventionally restricted to the interval from -.. GX@) = 0. - = O.n i ) .4 Consider the model where we assume that nl processes. are jointly stationary and uncorrelated processes. Equation (145. are jointly independent zero mean stationary yV(k) = E ( x f ~ t + k ) = E[xt(ffxttk-.2a). at each frequency as emphasized above in the phrase such as "the w-frequency components of x.5 Bivariate Frequency-Domain Analysis 355 From (14. Then yxy(k) = 0 for all k.5..3 Examples EXAMPLE 14.AX@) = 0. and ~ & ( o )0 EXAMPLE 14.5. &. and y. Hence. and e.2a). Then > 0 and x. which implies that c.5.. Hence. for dl w . we have &(o) = 0. and y. Thus.33) implies that + ett~>] .3 Assume that x.15) the components of x.y(o) = 0. by (14.5.." 14. = ayx(k . and y.(w) = 0.14. we need only examine the rilationship between the corresponding components of xt and y. are independent at different fkquencies. q&) for all w. by (14. the actual time delay of they. This phase spectrum shown in Figure 14. the absolute value of the least squares regression coefficient. if m instead equaled 4. . we also need f. 
The gain function for this example is given by which equals.2. the time dday between two series may change from frequency to frequency. with m = 4.5 The phase spectrum for ExarnpIe 14. Thus. series equals m at all frequencies.22)and (12. which.30). By Equation (14. the slope of each line is (-m).5is a straight line with slope (-m) and is worthy of further discussion. T ) we would have a graph Like Figure 14.3).6 The ~ h a s espectrum for Example 14. equals FIGURE 14. In this case.4.(w). In general. by (12. Because the phase function is negative for all o . with m = 1. In Figure 14.3. series at all frequencies. then following the convention of restricting the phase to the interval ( .5.6. series leads the y. the x.4.5. as expected. the phase spectrum between two series may not be a straight line.T .356 CHAPTER I4 Transfer Function Modeb FIGURE 14. For the coherence. 14. for t = 1... x . .then = 1 for all frequencies o. in this case. satisfy the exact linear relationship y. is the truncation point. With similar notation. = 0. series can be estimated by the smoothed spectrum where K ( k ) is the sample autocovariance function. KG(@) If e. we can write jXy(m) as where .(k) and Mxy are the corresponding lag window and the truncation point for the smoothing. the smoothed spectrum for the y. . we have learned from Chapter 13 that the spectrumfJo)for the x.5. 2. and y.-. and M. series is given by Simifarly.5 Bivariate Frequency-Domain Analysis 357 Hence.This result is also expected because. = ax. Clearly. 14. we can estimate the cross-spectrum by where Txy(k)is the sample cross-covariancefunction and bVx. and n. and y. . For a given set of time series data x.4 Estimation of the Cross-Spectrum . WX(k)is the lag window. (k)sin wk 2 g k=-hiJy 2 are the estimated cospecirum and quadrature spectrum. the truncation points M. and the lag windows W. We do not discuss the sampling properties of cross-spectral estimates. we may estimate the crossamplitude spectrum. My. That is. The time lag r can be estimated by the lag corresponding to the maximum cross-correlation or by Equation (14. Having estimated the cospectrum and quadrature spectrum. the phase spectrum.Y?X. Interested readers are referred to the books by Jenkins and Watts (1968).. then the estimated cross-spectrum will be biased. if one series leads another and the cross-covariance does not peak at zero lag. two series can be aligned by shifting one of the series by a time fag T so that the peak in the cross-correlation function of the aligned series occurs at the zero lag. and others.5.(k) are chosen in a similar way to those used in the univariate spectral analysis. the bias can be reduced by the proper alignment of the series. Brillinger (1975). they are often chosen to be the same. and gain are unreliable when the coherence is small. and M.. and the coherence as follows: and In the estimated cross-spectrum.(k). because of the simplification of the sampling properties of the spectrum estimates. Hannan (19701. and Wxy(k)do not necessarily have to be the same and should be allowed to vary depending on the characteristics of the univatiate as well as the joint series. phase. . The truncation point MXyand the lag window Wx. In practice. Rigorously. Because the coherence is invariant under the linear transformation. Priestley (1981). the gain function.358 CEIAPTER 14 Transfer Function Models and 1 &(o) = 'GV. Thus. It is also worth noting that the estimates of cross-amplitude. 
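The phase-slope interpretation of a pure delay can be checked numerically. The following self-contained Python sketch (simulated data; the sample size, delay, lag-window choice, and frequency grid are arbitrary illustrative settings) generates y_t = α x_{t-m} + n_t, estimates the cross-spectrum with a Bartlett lag window, and recovers the delay m from the slope of the unwrapped phase.

import numpy as np

rng = np.random.default_rng(0)
n, m, a_coef = 400, 4, 1.0
x = rng.standard_normal(n)
y = np.zeros(n)
y[m:] = a_coef * x[:-m]
y += 0.5 * rng.standard_normal(n)                       # y_t = a x_{t-m} + n_t

def cross_cov(x, y, k):
    xd, yd = x - x.mean(), y - y.mean()
    n = len(x)
    return (np.sum(xd[: n - k] * yd[k:]) if k >= 0 else np.sum(xd[-k:] * yd[: n + k])) / n

M = 40
lags = np.arange(-M, M + 1)
w = 1.0 - np.abs(lags) / M                              # Bartlett lag window
c = np.array([cross_cov(x, y, k) for k in lags])
freqs = np.linspace(0.05, np.pi, 60)
# smoothed cross-spectrum f_xy(w) = (1/2pi) sum_k W(k) c_xy(k) e^{-iwk}
f_xy = np.array([np.sum(w * c * np.exp(-1j * lags * om)) for om in freqs]) / (2 * np.pi)
phase = np.unwrap(np.angle(f_xy))
slope = np.polyfit(freqs, phase, 1)[0]
print("estimated delay:", -slope)                       # close to m = 4; the phase slope is -m

With a finite sample the fitted slope is noisy, but it settles near -m, in line with the straight-line phase spectra of the examples above.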
we discuss the relationship between the cross-spectrum and the transfer function model in the next section. That is especially true for the fag window.30) and the slope of the phase spectrum. the lag window places heavier weight on the sample crosscovariances around the zero fag. This bias is especially severe for the estimated coherence.Wy(k). Instead. however. respectively. and nt are assumed to be independent and jointly stationary zero mean processes.14.(o) # 0.lb) where v(B) = Xj=.2b).6. + n. Then..1 Construction of Transfer Function Models through Cross-SpectrumAnalysis Consider the two-sided general transfer function model =.5) shows that the frequency response function is the ratio of the cross-spectrum to the input spectrum. CO Multiplying both sides of (14. .6 The Cross-Spectrum and Transfer Function Models 14.v j ~such j that Zj. we have Hence. for all w such thatf. we get By (14.Ivj[ < m. The function u(w) is called the frequency response function of the linear (transfer function) system.6.6 The Cross-Spectrum and Transfer Function Models 359 14.6. Equation (14. The series x. The impulse response function vkcan be obtained from b(o)eiwi TI vk = do. (14.6. 03 v(B)x.5.2) by Bkand summing. 6.9) and (14.3) and obtain Hence.5.5.(k).4.2 Cross-Spectral Functions of Transfer Function Models The gain function between the input x. and the output y. the bansfer function model can also be constructed through the use of cross-spectrum analysis.5. In other words. the phase spectrum between the input series x. becausef. The noise autocovariance function y.2.360 CHAPTER 14 Transfer Function Models To derive the noise spectrum (or error spectrum)f.(w) is nonnegative and real-valued.19).5.(w) as described in Chapter 13 and Section 14.22) and (12.6.(w). . that f$(o)= &(o).3. we have. 14. we again use the results from Equations (12. that Hence.5) as Similarly. and f. can be easily seen from (14.5). These estimates can then be used to identify the transfer functionnoise model as discussed in Section 14.(o). depends only on the phase shift of the frequency response function v(o). from (14.f. we obtain estimates of the impulse response function v k and the noise autocorrelation function p.31) and (14. and the output series y. from (14.5.(k) can then be obtained from By substituting the estimates of &(o).3. where we note. (w)/f.7 Multiple-Input Transfer Function Models 361 For the coherence. Other things being equal.(o) = Ivfo)12fx(u) represents the signal of the system.(o)/f. a model with a uniformly smaller noise-to-signal ratio and hence a Iarger signal-to-noiseratio is obviously more preferable.6. The noise-to-signal ratio is large when the coherence is small. the coherence approaches zero as the ratio fn(w)/'(w) approaches unity. and we have the multiple-input causal model . 14. because the output equals the buried signal plus the noise. equivalently. we can form the noise-to-signal ratio wheref.. Conversely. the higher the signal-to-noise ratio.(w) approaches zero.6. that Thus.6.7). The coherence K$(W) approaches unity as the ratio f. Equivalently.14.The larger the coherence.7). the signal-to-noiseratio can be used as a criterion for model selection. and vice versa.4) and (14. the coherence is a measure of the ratio of the noise spectrum to the output spectrum at frequency w. it can be shown that Hence. we note. the coherence is also a direct measure of the signal-to-noise ratio f. From (14. 
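The estimated cospectrum, quadrature spectrum, cross-amplitude, phase, gain, and coherence can all be produced from the same lag-window machinery. The function below is a minimal sketch under simplifying assumptions — a common Bartlett window and truncation point are used for both series and for the cross terms, no alignment is performed, and the sign conventions follow f_xy(ω) = c_xy(ω) − i q_xy(ω) as defined earlier; the argument names are placeholders.

import numpy as np

def cross_spectral_estimates(x, y, M=30, n_freq=100):
    """Bartlett lag-window estimates of f_x, f_y, f_xy and the derived cospectrum,
    quadrature spectrum, cross-amplitude, phase, gain, and coherence (squared coherency)."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = len(x)
    lags = np.arange(-M, M + 1)
    w = 1.0 - np.abs(lags) / M

    def cov(u, v, k):
        return (np.sum(u[: n - k] * v[k:]) if k >= 0 else np.sum(u[-k:] * v[: n + k])) / n

    cxx = np.array([cov(x, x, k) for k in lags])
    cyy = np.array([cov(y, y, k) for k in lags])
    cxy = np.array([cov(x, y, k) for k in lags])
    freqs = np.linspace(0.02, np.pi, n_freq)
    E = np.exp(-1j * np.outer(freqs, lags))             # e^{-i w k}
    fx = (E @ (w * cxx)).real / (2 * np.pi)
    fy = (E @ (w * cyy)).real / (2 * np.pi)
    fxy = (E @ (w * cxy)) / (2 * np.pi)
    co, quad = fxy.real, -fxy.imag                      # f_xy = c_xy - i q_xy
    amp = np.abs(fxy)
    return dict(freqs=freqs, fx=fx, fy=fy, fxy=fxy,
                cospectrum=co, quadrature=quad, amplitude=amp,
                phase=np.arctan2(-quad, co), gain=amp / fx,
                coherence=amp ** 2 / (fx * fy))

Dividing the estimated cross-spectrum by the estimated input spectrum gives an estimate of the frequency response function, and its inverse Fourier transform recovers the impulse response weights — the frequency-domain route to the transfer function taken up next.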
the output series may be influenced by multiple input series.7 Multiple-Input Transfer Function Models More generally.(o) at frequency o. from (14. The noise-to-signal ratio or. in general. We close this chapter with the following remarks.k. however. . 1.2. and xi. If the input series xi. some input series xjt could be intervention or outlier variables. the resulting model is.362 CHAPTER 14 Transfer. linearity in parameter is a normal case with regression models but a special case with transfer function models. One important element in a time series model is the time unit used in the model. one may think that the multiple regression models are special cases of the transfer function models because the noise processes in the regression models are often assumed to be uncorrelated white noise. where linear models are usually the desired end products. The reader may notice the close relationship between the transfer function models and the multiple regression models. then all aspects of the time domain and frequency domain analysis of the single-input transfer function model discussed above can be extended and applied with no difficulty. There is. The intervention and outlier models introduced in Chapter 10 are special cases of a transfer function model. In fact. are where vj (B) = u~(B)B~~/S~(B) assumed to be independent of each of the input series xj. we refer readers to Pack (1979).. The noise model is then identified through the study of the generated noise series . that much more research is still needed in the model building of multiple-input transfer function models when input series are cross-correlated. and xj. . and xjt for i # j are correlated. Liu and Hanssen (1982) suggested that a common filter be chosen from the AR polynomials of the input series with roots dose to one to reduce their cross-correlations. We believe. highly nonlinear in terms of its parameters. a vital difference between the transfer function and regression models in terms of modeling. The model building is much more complicated when the input series xi. unlike the regression modeling. some transformations must be taken to produce stationarity. are uncorrelated for i # j. we can construct the transfer function vj(B) relating y. The relationship is even more apparent if a transfer function model is written in terms of the difference equation form. if the original series are nonstationary. however. separately. Again. one at a time. Thus. In other words. For example. although the transfer function is linear. who fitted the Lydia Pinkham monthly data to a transfer function model and incorporated two price increases as isolated intervening events. To obtain proper and valid statistical inferences. . 2. As an example. j = 1. Function Models or is the transfer function for the jth input series xjt and a. the choice of an appropriate time unit is . The construction of both the transfer function and the noise models is important in the process. Ln other words. If the model for a phenomenon under investigation is regraded appropriate in terms of a certain time unit.4 Consider the transfer function model and (1 . ( c ) v ( B ) = (B ~ ' ) / ( 1. each with mean 0 and variance 3. are unconelated white noise processes. ol = -2..53. 4 Calculate and plot the impulse response weights for the following transfer functions: (a) v ( B ) = B / ( l . and y.. = a. In fact. Tiao and Wei (1976) and Wei (1981) show that a causal relationship is often destroyed by temporal aggregation.6B). 
This problem is addressed in more detail later in Chapter 20. then proper inferences about the underlying model should be drawn in terms of this same monthly time unit.ui = . say a month. and the last few observations on x..2 Consider the A R M A ( 1 .4 B ) x . 6 = . More seriously.8B . 4 = . where x.4. and a. where oo = 1. u: = . is a white noise series with zero mean and constant variance 1 . CalcuIate the cross-correlation function between a.35. it could cause a misspecification of lag structure in causality study. 14. + + + + 14. and y. Improper use of data in some larger time unit such as quarterly or annual quantities could lead to a substantial information loss in parameter estimation and forecasting.6 ~ ~ ) . + rt.. = (2 + B)x.agg = a100 = 0. 8 = . .3 Suppose that y. 2. and 2.. l ) process where a.2.. and 3.5. are as follows: Calculate the forecasts $loo ( I ) and their forecast error variances for 1 = 1. Find the CCF between x. 14. (b) v(B) = (3 2B)/(1 .Exercises 363 very important.8B). 292 96.827 33.704 69. (e) Estimate the parameters of your tentative model and perform diagnostic checking for model adequacy.007 75. where a. 52.194 81.882 92.078 51.453 131. + 14. series and filter the Y.748 79.909 71.828 69.078 131.287 91.982 91.592 79. (read across) in thousands of units.6 Suppose that y. (c) The amplitude.340 99.806 81.461 77. each with mean zero and varian'ce 5.491 77.362 76.51 7 93.650 Housing sales XI (read across) in thousands of units.291 75.167 131.826 71.504 86.856 58.443 39..404 40.157 40. 38 44 53 49 54 57 51 58 48 44 42 29 35 34 34 45 51 54 36 29 43 32 43 40 29 49 56 58 42 33 53 41 46 43 36 62 60 66 53 44 49 44 46 42 42 62 65 63 53 54 49 49 43 43 43 58 64 64 55 56 40 47 41 44 44 59 63 60 48 51 40 46 44 39 44 64 63 53 47 51 36 47 47 40 48 62 72 52 43 53 29 43 41 33 45 50 61 44 39 45 31 45 40 32 44 52 65 40 33 45 42 26 34 32 31 40 50 51 36 30 44 37 23 31 32 28 37 44 47 28 23 38 (a) Plot and examine the two series in the same graph.504 68. (c) Compute the sample CCF (for lags from -24 to 24) between the prewhitened and filtered series.522 80. (b) The cross-spectrum.961 54.408 80.5 Consider the following monthly housing sales X. 8 ~ .004 70.301 71.848 83.150 100.136 120. (f) Forecast housing starts and attach their 95% forecasts limits for the next 12 months.364 CHAPTER 14 Transfer Function Models 14.791 39.088 47. and e.013 115.584 116.959 62. and x.627 102. Plot them together with f2 standard errors.424 86.069 42.198 46.318 90.643 84.068 69.646 55.767 43. For the above model.561 84.398 82.362 59.234 86.777 92.870 119.839 87. are independent white noise processes.149 47.324 120.904 80.560 105.498 77. = a.300 47. .185 135.105 73. between January 1965 and December 1975: Housing starts Y. series.274 66. and housing starts Y.941 84.488 46.351 61.876 85.782 73. (b) Prewhiten the X.205 82.026 45. find (a) The cross-covariance function. (d) Identify a tentatively transfer function-noise model for the data set.149 102. .715 79.782 84.039 55. + .363 74.341 78.931 98.750 72.~ e. = x. nt. (f) The coherence. 14.~ ~ 't.7 Given yt = ( x . find the frequency response function.. where x. . are assumed to be independent and jointly stationary zero mean processes. (e) The gain function. calculate. (d) The coherence. 14. and n.5.-3) 2 4. gain function.-I 3.8 For the data set given in Exercise 14. (c) The gain function. (b) The phase spectrum. + x. plot.x. and discuss (a) The cospectrum.Exercises 365 (d) The phase. and phase function. and Xi are independent for all s and r.1). . 
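The prewhitening procedure referred to in this chapter (fit an AR filter to the input series, apply the same filter to the output series, then examine the sample CCF of the filtered series) can be sketched in Python as follows. The AR order p, the series names, and the use of statsmodels' AutoReg are illustrative assumptions, not the text's own computations for the housing data.

import numpy as np
from scipy.signal import lfilter
from statsmodels.tsa.ar_model import AutoReg

def prewhitened_ccf(x, y, p=3, max_lag=24):
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    fit = AutoReg(x, lags=p, trend="n").fit()
    ar_poly = np.r_[1.0, -np.asarray(fit.params)]   # (1 - phi_1 B - ... - phi_p B^p)
    alpha = lfilter(ar_poly, [1.0], x)[p:]          # prewhitened input
    beta = lfilter(ar_poly, [1.0], y)[p:]           # filtered output
    def r(k):                                       # sample CCF: corr(alpha_t, beta_{t+k})
        a = (alpha - alpha.mean()) / alpha.std()
        b = (beta - beta.mean()) / beta.std()
        n = len(a)
        if k >= 0:
            return float((a[: n - k] * b[k:]).sum() / n)
        return float((a[-k:] * b[: n + k]).sum() / n)
    return {k: r(k) for k in range(-max_lag, max_lag + 1)}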
conditional on Xi.i. Xi = .1 Regression with Autocorrelated Errors -- -- In standard regression analysis. .l)becomes an AR(k) model (x-l. .)&I = . u:). we illustrate the use of some t h e series techniques in regression analysis..d. .e. Under these standard assumptions..& ) I . When Xi is a vector of k lagged values of Y. 15. equivalently. is a (k X 1) vector of parameters. When Xi is stochastic in (15. N(0.. however. where Ytis a dependent variable. . . In this chapter.the results about the OLS estimator of P also hold as long as E. we will introduce the autoregressive conditional heteroscedasticity (ARCH) and generalized autoregressive conditional heteroscedasticity (GARCH) models which have been found to be useful in many economic and financial studies. the model in (15. are often violated when time series data are used. we consider the model or. i. Xi is a (1 X k) vector of independent variables. The standard assumptions associated with these models. and et is white noise. and E.Time Series Regression and Garch Models Regression models are without a doubt the most commonly used statistical models. .1. In the process. f . it is well known that the OLS estimator of P is a minimum variance unbiased estimator. is an error term often assumed to be i. 1.i. Z can be easily calculated. The diagonal element of X is the variance of E. the generalized least squares (GLS) estimator is known to be a minimum variance unbiased estimator. we will not know the variance-covariance matrix T: of because even if e.1. Given Z. the estimator is no longer consistent. are autocorrelated. the jth off-diagonal element corresponds to the jth autocovariance of et. a residual analysis is an important step in regression analysis when time series variables are involved in the study. could still be autocornelated unless the correct order of k is chosen. Thus. where and then. are i. A final analysis can then be performed on a modified model with a specified model substituted for the error term. When time series variables are used in a model.4. Normally.1).1. . Even in univariate time series variables analysis when the underlying process is known to be an AR model as in (15. the following iterative GLS is often used: . an underlying model can be identified for the error term using the identification techniques discussed in Chapter 6. .1. however. (r2). The latter method is particularly useful. Many methods can be used to test for autocorrelation.d. .1. .n. the error terns E.1 Regression with Autocorrelated Errors 367 The OLS estimator of P is asymptotically unbiased and consistent. it is the norm rather than the exception that the error terms are autocorrelated. the u 2 and AR parameters q are usually unknown.. we can either use the test based on the Durbin-Walson statistic or we can perform the residual autocorrelation analysis discussed in Chapter 6. In time series regression analysis.4) and the model is stationary. follows an AR(p) model given in (15. This caveat is important. the following model is often proposed to account for the autoconelated errors: for t = 1.3) is where S follows a multivariate normal distribution N(0. For example. As a remedy.15. . When that is the case. qp and a2 are known in (15.41.2). .When PI. Based on the residuals from a preliminary analysis of (15. 2.As shown in Section 7. Let The matrix form of the model in (15. and they can be easily computed as shown in Chapter 3. . this result no longer holds when the E . N(0. 2). . For example.4) based on the OLS residuals. 
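A minimal sketch of the residual-analysis step just described, assuming y is the response vector and X the regressor matrix: fit the regression by OLS, then inspect the Durbin-Watson statistic and the residual ACF/PACF to suggest an AR(p) model for the error term.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.stattools import acf, pacf

def residual_diagnostics(y, X, nlags=12):
    ols = sm.OLS(y, sm.add_constant(X)).fit()
    e = ols.resid
    return {
        "durbin_watson": durbin_watson(e),
        "resid_acf": acf(e, nlags=nlags),     # patterns here suggest an error model
        "resid_pacf": pacf(e, nlags=nlags),
    }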
using B obtained in step 3. Indeed. In many practical applications. are uncorrelated but have variances that change over time. 15. the weighted regression is often used if the error variance at different times is known.1. we can also use the nonlinear estimation or maximum likelihood estimation discussed in Chapter 7 to jointly estimate the regression and error model parameters and q ' s as we did in Chapter 14.31. Compute Z from (151. this assumption may not be realistic.2 ARCH and GARCH Models One of the main assumptions of the standard regression analysis and regression models with autocorrelated errors introduced in Section 15.3). can be autocorrelated in the regression model. and Phillips (1986). More generally. therefore. let us assume that the error term can be modeled as . from OLS fitting of (15. m2. of the errors it. Estimate yrj and a2for the AR(p) model in (15.. 3. Many approaches can be used to deal with heterocedasticity. In such a case. from the GLS model f~ttingin step 4. P = (X'B-lX)-l x'Z-IY. Although the error term. it should be stationary. A nonstationary error structure could produce a spurious regression where a significant regression can be achieved for totally unrelated series as shown in Granger and Newbold (1986). Compute the GLS estimator.1 is that the variance. and repeat steps 2 through 4 until some convergence criterion (such as the maximum change in the =timates between iterations becoming less than some specified quantity) is reached. it is generally agreed that stock markets' volatility is rarely constant over time. (15. is constant. In practice. the error structure can be modified to include an ARMA model. in financial investment. a simple conditional OLS estimation can be used. the study of the market volatility as it relates to time is the main interest for many researchers and investors. e. one should properly difference the series before estimating the regression. models to account for the heteroscedasticity are needed. by substituting the error model in the regression equation. it. Alternatively. Let us first consider the standard regression model with uncorrelated error where ct = ttt and the n. however.1. the error variance is normally unknown.368 CHAPTER 15 Time Series Regression and Garch Models 1. The above GLS iterative estimation can still be used except that a nonlinear least squares estimation instead of OLS is needed to estimate the parameters in the error model. Compute the residuals E.using any method discussed in Chapter 7. 4. Following Engle (1982). 2.1. Calculate OLS residuals 2. Such a model incorporating the possibility of a nonconstant error variance is called a heteroscedasticity model.4) using_the values of q and u2obtained in step 2. For example. For example. random variables with mean 0 and variance 1. it is an ARCH model of order s and is often denoted ARCHCs). specify the following ARCS) model to ti:: . A large error through n:-i gives rise to the variance. To test for ARCH.4) is simply the optimal forecast of n f if n? follows the folloiving AR(s) model where a. Thus. we can form the series S: and examine its ACF and PACF to see whether it follows an AR model.2) and (15. . Engle (1982) calied the model of the error term ti. Engle. = 0.15. Because - ~2 = regression sum of squares total sum of squares ' . and Robins (1987) generalize (15. is an N ( 0 .d. . Then. a'). = 8. we have Ho : = . and it changes over time. we can. . Thus.. equivalently. 
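The iterative GLS scheme outlined above is close in spirit to what statsmodels implements in its GLSAR class, which alternates between estimating the regression coefficients and the AR(p) error parameters. A hedged sketch, assuming y, X, and an AR order p chosen from the residual analysis:

import numpy as np
import statsmodels.api as sm

def iterative_gls(y, X, p=1, maxiter=8):
    # Alternates OLS/GLS estimation of beta with re-estimation of the AR(p)
    # error parameters from the residuals until the iterations settle down.
    model = sm.GLSAR(y, sm.add_constant(X), rho=p)
    results = model.iterative_fit(maxiter=maxiter)
    return results.params, model.rho        # regression estimates and AR(p) estimates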
The term has also been used for the model for Y.2) and (15.5) the autoregressive conditional heteroscedasticity (ARCH) model. a:) white noise process. the conditional variance of the nt becomes which is related to the squares of past errors. . + + .2.1) to and call it the ARCH-in-mean or ARCH-M model.2. .2.independent of past realizations of rz.2.j. which tends to be followed by another large error. for t = s 1. More specifically. one can first fit the ordinary least squares regression in (15.1) for t = 1.3).2. are i. This phenomenon of volatility clustering is common in many financial time series. with the variance specitication given in (15. in (15. = 0.3) or.I). In financial applications. Using the forecasting results from Chapter 5. because the mean return is often expected to be related to the variance.1) with an error variance structure given in (15. are i.. .Under the null hypothesis that the 11. The fist step is to compute the OLS residuals i.2.2.2.i.d. s 2. .i. n. Alternatively. if Ho is true we expect that the estimates of Bj's are close to zero and that the value of the coefficient of determination R~ is small. following Engle (1982). we see that Equation (15.2. n and test Ho : B1 = = 8. in (15. N(0. and Given all the information up to time (f .. = fit.2.2 ARCH and GARCH Models 369 where the e. Lilien. the model in (15. In fact.2.#. and therefore. neither n f nor CT? plays this proper role. .rPIB - . and the roots of ( 1 B . is the associated white noise process for the process of n:. A natural extension to the ARCH model is to consider that the conditional variance of the error process is related not only to the squares of past errors but also to the past conditional variances. .2.9)and conclude that the conditional variance follows an ARMA(c s) model with r being the AR order and s being the MA order. the associated error is a white noise process corresponding to the one-step ahead forecast error.2. independent of past realizations of n.-i.8) with the property (15.i.11a) is a proper ARMA model. To guarantee u: > 0. The model for the error tern given in (15.2. which follows because ~ .a.+r Br)ut. (15.1 la) 0 for i > r.~ ( n :=) u f .9)in terms of n! and a. as follows: ( 1 .10)or (15. Qi = 0 for i > s.370 CHAPTER 15 Time Series Regression and Garch Models and under the null hypothesis an ARCH phenomenon exists when the value of (n . we assume that Oo > O and that +i and Oj are nonnegative. Thus. .Bn')n: = Oo -k (1 . and We claim that a. we have the more general error process where the e.S)R* is statistically significant.u:) so that a.cr. the model reduces to the ARCH(s) process. = (nt .arlB where nt = max(l.a. = (n: .2. (15.91. when r = 0.2. random variables with mean O and variance 1. One should not be misled by (15. .. We can rewrite (15.9)is not a proper ARMA model. Br) = 0 are outside the unit circle. are i.9) is known as the generalized autoregressive condition& heteroscedasticity (GARCH)model of order ( I .2. From Chapter 5.2. we see that in an ARMA process. . s) and is denoted by GARCH(c s).Ciearly. This model was first proposed by Bollerslev (1986). + + +i =. . s). Let a.). In (152.d. and n? are always nonnegative in (15.13) is called exponential GARCH.2 ARCH and GARCH Models 371 is the one-step ahead forecast of r z z .in the conditional variance.111 (15.11a) with the AR order being m = max(r. In other words.a)). - . .e. a positive error has the same effect as a negative error of the same magnitude. when 0 < at < m. g(a.9). Thus.2. +i. The coefficients t.2.al . 
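A small sketch of the ARCH test just described, assuming resid holds the OLS residuals: regress the squared residuals on their first s lags and refer (n - s)R^2 to a chi-square distribution with s degrees of freedom. (statsmodels also provides a packaged Lagrange-multiplier version of this test, het_arch, which could be used instead.)

import numpy as np
import statsmodels.api as sm
from scipy import stats

def arch_test(resid, s=4):
    e2 = np.asarray(resid, float) ** 2
    y = e2[s:]                                                       # e_t^2, t = s+1, ..., n
    X = np.column_stack([e2[s - j: -j] for j in range(1, s + 1)])    # e_{t-1}^2, ..., e_{t-s}^2
    r2 = sm.OLS(y, sm.add_constant(X)).fit().rsquared
    stat = len(y) * r2                                               # (n - s) * R^2
    return stat, stats.chi2.sf(stat, df=s)                           # statistic and p-value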
the effect of errors on the conditional variance is symmetric. Nelson (I991) proposed that 2pla..2.12) for i f j./a. and E (aiaj) = E (it . financial variables and their volatility changes and to relax the restriction on the coefficients in the model. In this case. with slope (6 a). depending on the sign of a. For example. < 0.13) implies that u! will be positive regardless of the sign of the coefficients. I ) (e2 . the GARCH(c s) model in (15.. .2.2. .. Moreover..9) implies that 1t: follows an ARMA(m. i.2. The model in which the evolution of the conditional variance satisfies (15. From (15. or simply EGARCH. we see that this process will have a unit root if (1 .i.9).when -03 < a.10) or (15.. + xj'_lr$j+ where qo = 1 and a.d. GARCH. it is clear that in the specification of ARCH. the model will be called an integrated GARCH process.hj are often assumed to relate to an ARMA specification to the model.2. where we note that the ef are i.2. The function g is chosen to allow for asymmetric changes. x2 (1).am) = 0.) is linear with slope (6 . r) model in (15. To accommodate the asymmetric relation between many cients Bo.3) and (15.e. From (15. g(at) is linear in a.u3)( n f .i. g(a.2.uf) = E [rrfu!(ef = 0. = a.2. The specification of (15. or IGARCH models.a:)(crfej . because CT.8) and (15. is the corresponding one-step ahead forecast error. . we may choose + In this case.15. and a.11a).uf) = ~ ( u f e f.e. it is obvious that some restrictions on the coeffiand Bj are needed. The polynomial $(B) is such that the roots of 4(B) = 0 are all outside the unit circle and share no common factor with the polynomial 8(B).) altows the conditional variance to respond asymmetrically to the rises and falls of the process... a7 where Zj=o@jBj = $(B) = 6(B)/#(B). if = Eyil(4i Bi) = 2j=16j = 1 . i. s).2.Thus. or simply IGARCH. 2. A pattern of these ACF and PACF not only indicates ARCH or GARCH errors. u: = do (15. .2. N(0.. 3 + . + p p ~ t .2.e. for a pure ARCH(s) model. we can also compute its PACF. where Similarly. both ACF and PACF of 2. e. will show patterns of an exponential decay. we f i s t note that as shown in (15. s). reduces to the AR(s) model Hence. = p1~~-l (15. 1) and independent of past realizations of ?I. r) model for n! with m = max(l.15).e. ~ ~ .2. 3.3p.+. 2. 4.15) to the g. Obtain the residuals i tfrom the AR fitting in (15.14). To see how the ACF and PACF pattern can help identify a GARCH model. i.372 CHAPTER 15 Time Series Regression and Gar& Models More generally.. On the other hand.. its ARMA representation. from the OLS fitting of (15. Fit an AR(p) model (15. we can perform the following steps: 1. Then.16) 11 = IT. )ti. .2..2. from (15. we can cbeck whether these ACF and PACF follow any pattern. Form the series if and compute its sample ACF. ~ . s) model for u! corresponds to an ARMACnr.2. CalcuIate OLS residuals 2. a general GAICCH(r.i.2. but it also forms a good basis for their order specification..1 la).15) + $luf-l + .17) ~ ~ ~ + B . Thus. + B ~ n ~ ~ ~ + (15.d. i. The orders r and s can be identified using the similar techniques of identifying the orders of an ARMA modef. where &I + . To test for heieroscedasticity in the error variance. the regression model with autocorrelated error can be combined with the conditional heteroscedasticity model. . the corresponding PACF will cut off after lag s.11a). n ~ ~ ~ and the et are i. 1) and independent of past realizations of ~ z . = Utei.i.15. the next step is to estimate its parameters. 
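The ACF/PACF-based identification of the variance model sketched above can be computed as follows (an illustration; resid denotes the residuals from the fitted mean equation). A PACF of the squared residuals that cuts off after lag s points toward an ARCH(s) specification, while mixed decay in both functions suggests a GARCH(r, s) model.

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def squared_residual_patterns(resid, nlags=20):
    e2 = np.asarray(resid, float) ** 2
    # ACF/PACF of the squared residuals, examined for AR- or ARMA-like patterns
    return acf(e2, nlags=nlags), pacf(e2, nlags=nlags)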
under the null hypothesis that the ?$are white noise.d.2.1 Maximum Likelihood Estimation Note that the above mode1 can be rewritten as .3 Estimation of GARCH Models Once a model is identified. 15. . 15. Let us consider the general regression model with autoconelated error and conditional heterocedasticity GARCH structure for the error variance. . In this case.3 Estimation of Garch Models 373 Alternatively. N(0. and the et are i. Recall that to test for an ARCH model. the significance of the Q(k) statistic occurring only for a small value of k indicates an ARCH model..2. and a persistent significance for a large value of k implies a GARCH model.k as suggested by McLeod and Li (1983): pi(n:) As in Chapter 7.3..~The . following methods can be used to estimate the parameters. ~ n.6) and check the significance of the statistic given in (15.also use the following portmanteau Q statistic to test for = 0.+Q + ~ E ~ -n. we can . . i = 1. the Q(k) statistic approximately follows a x2(k) distribution. .7). we can also fit model (15.2. . where et = qqst-1 +. ). s). X. and identify and fit its AR(p) representation. where the a.r) model on the squared residuals . 4. p = (ql. n.2..3. cp.. r) model for n. . 15. s) model for of in (15. Y. series. 3. . #.3. because the GARCH(. & Let &. Compute the residuals. Fit the identified ARMA (nz. b).4).. we have . with m = max (r.1la. .8. for t = I . Len. or the log-likeljhood function . zo. are zero mean white noise. Let g. ..374 CHAPTER 15 Time Series Regression and Gar& Models .6). .3. .I. Then. we can obtain the maximum likelihood estimates of the parameters through maximizing the conditional likelihood function . Let Y = ( Y l .).).6). from the A R ( p ) fitting of the 2. . and be the estimates of the parameters in this ARMA[m. .. . r) model. we can estimate the parameters using the following steps: 1. . .3. dl. similat to the estimation of transfer function models discussed in Section 14.2 Iterative Estimation Alternatively.). . and 0 = (do.3. n from (15. Compute GLS estimator of /3 and Q using the iterative GLS method discussed in Section 15. . be the error series obtained from the last iteration of the GLS estimation. where u: is given in (15.3.1. . Calculate the sample ACF and PACF of the g.4) is equivalent to an ARMA(n1. Then from (15. is given in (15. and Yo and & be some proper starting values required for computing n. .. X = (XI.e. . k.3. series in step 2.:A i. . 2. 4 = (41. we may need a: and 11. BollersIev (1986) suggests that these terms be chosen as g: = 11: = G2. . (15. i.4.6) .3.The associated 1-step ahead forecast error is E.9) using the estimated parameters obtained earlier.. + 1. (15.3.0. we see that given all the information up to time t.3.+/ . (15. the I-step ahead forecast error variance is the conditional variance where crf+l-j = E. = n. .+[ = Y. We first note that the optimal 1-step ahead forecast of Y. . in using (15. fort = -m + 1 .Et(Yt+l).. and 0. which can be calculated from (15.15.1.3.4.4) or (15. .9). .To calculate the conditional variance.2) as (l-cplB-. .(nf+l-i). where Q0 = 1 and Using the result in Section 5. for t = -m where nz = max(r.3. s). equivalently.4) or.. .e. Clearly. ..+)given all information up to time t is the conditional expectation Er(Yr+I).4 Computation of Forecast Error Variance 375 15 -4 Computation of Forecast Error Variance An important use of an ARCH or GARCH model is to find a forecast error variance that changes over time.- pPBP)c.2. we rewrite (15.1) Thus. and we have Var. 
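A minimal sketch of conditional maximum likelihood estimation for a zero-mean GARCH(1,1) error process, with the variance recursion started at the sample variance. The theta/phi parameter convention follows this chapter (theta for lagged squared errors, phi for lagged conditional variances); the optimizer settings are illustrative assumptions, and dedicated packages such as arch provide more complete implementations.

import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, nt):
    theta0, theta1, phi1 = params
    sigma2 = np.empty(len(nt))
    sigma2[0] = nt.var()                      # one common choice of starting value
    for t in range(1, len(nt)):
        sigma2[t] = theta0 + theta1 * nt[t - 1] ** 2 + phi1 * sigma2[t - 1]
    # negative Gaussian conditional log-likelihood
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + nt ** 2 / sigma2)

def fit_garch11(nt):
    nt = np.asarray(nt, float)
    start = [0.1 * nt.var(), 0.1, 0.8]
    res = minimize(garch11_neg_loglik, start, args=(nt,), method="L-BFGS-B",
                   bounds=[(1e-8, None), (0.0, 1.0), (0.0, 1.0)])
    return res.x                              # (theta0, theta1, phi1)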
the one-step ahead conditional forecast enor variance is simply (T:.(~. This series contains 46 monthly observations on Louisiana and Oklahoma spot prices for natural gas from January 1988 to October 1991. measured in dollars per million British thermal units ($/MBtu). which were originally presented in Bryant and Smith (1995).1. the model reduces to the GARCH(c s) model of uncorrelated errors. given the information up to time ( t .1 Spot prices of natural gas in Oklahoma (solid h e ) and Louisiana (dotted line) between January 1988 and October 1991. we consider the bivariate series listed as Series W15 in the appendix and plotted in Figure 15.3.+~) = u:+~. ..2) that when E.1 In this example. Thus.11. It was known 1988 1989 1990 1991 Year FIGURE 15. The spot market for natural gas is the market in which natural gas is sold under a contract term of no more than one month.5 Illustrative Examples EXAMPLE 15. = n. which changes over time.376 CHAPTER 15 Time Series Regression and Carch Models where As a special case. we note that in (15. 15. gt = 0. gives = . The AR(1) fitting of the series g.. The PACF of the g. = 6. .-1.377 15..5. respectively. we will try a time series regression model. We consider the OLS estimation for the model ot = PO 4- PIL. t-value P-value + Thus. let 0.1 Test for heteroscedasticity. TMLE 15. we compute the residuals. Preliminary Analysis We begin with a simple regression model with uncorrelated errors. the OLS regression equation is 0. k Q(k) Statistic P-value . = -12 .. Thus.8L.and L. .502i. To check for heteroscedasticity in error variance. f (15. i.e. be the spot prices for Oklahoma and Louisiana..5 Illustrative Examples that the spot price for Louisiana natural gas was known on or before the first day of trading on the Oklahoma market. That is. which gives R~ = . and we can compute the OLS residuals.1. .6.1) &I.and calculate the portmanteau Q(k) statistic on the squared ii? series. series shows a significant spike at lag 1. To examine whether one can predict the Oklahoma spot price from the current spot price of Louisiana. we propose an AR(1) model for the error term.502.9444 and the following estimation result: Estimate Parameter St.E. with the results summarized in Table 15.. Ci. however.10). is a zero mean white noise process.i.4.4) the I-step ahead forecast error variance is the conditional variance TABLE 15. t-value P-value . Thus. I).E.5. given all the information up to time f.956~~. equivalently.378 CHAPTER 15 Time Series Regression and Gar& Models Construction of a GARCH (r. we will refit the spot prices with the model where the e.3) that with & = 1 t. we note that from (15.d. N(0. from (15. Thus.I. Coefficient Value St. 2. (15. .= -05.bj = (*502)jforj = 1.4. I] model. la. For a = . s) Model At a . Hence. we see that the estimate is not significant but that 81 is highly significant. by (15.2. From this table. . Computation of the Conditional Forecast Error Variance To calculate the conditional variance.2) and (15. are i. a0 or.1 shows that there is no evidence of heteroscedasticity.. independent of and The estimation results of this model are summarized in Table 15.8) where a.2 Estimation of the GARCHEO. .-~ + a. Table 15.2.4. = 1.. these Q(k) statistics clearly indicate an ARCH(1) or GARCH(0. 1) model. where the e. The other 1-step ahead conditional forecast error variances can be calculated similarly.4.5.5). Although we introduce the conditional variance through a regression model. we now fit an ARIMA(0. random variables with mean 0 and variance 1.. 
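The l-step-ahead conditional variance recursion takes a particularly simple form for a GARCH(1,1) error term, as in the hedged sketch below; with an ARMA mean equation, these conditional variances would be combined with the squared psi-weights as in the formula above.

def garch11_variance_forecasts(theta0, theta1, phi1, last_nt, last_sigma2, L=12):
    # E_t[sigma^2_{t+1}] uses the last observed error and conditional variance;
    # for l >= 2 the recursion E_t[sigma^2_{t+l}] = theta0 + (theta1 + phi1) * E_t[sigma^2_{t+l-1}] applies.
    h = [theta0 + theta1 * last_nt ** 2 + phi1 * last_sigma2]
    for _ in range(2, L + 1):
        h.append(theta0 + (theta1 + phi1) * h[-1])
    return h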
following (15. given the information up to October 1991 that corresponds to t = 46.17291~f .# l B . Examination of ACF and PACF of n!. The series is more volatile at some times than at others. however. The estimated model is where for consistency of notations used in this chapter we use n.2 We recall Series WS. .independent of past realizations of a.OqB4)a. .. to denote the white noise error series.d.369811?-~ a. 1) model to the original series and try to examine its residuals. Instead of taking a transformation. Let us consider the following example. 1. may follow a GARCH(r s) model.4) and (15.e. are i.. i. first considered in Chapter 4 with its plot shown in Figure 4. the concept clearly applies to the univariate time series models. the white noise process a. the variance of 12. and or an ARCH model if r = 0.i.2.( + (1 .094) .Illustrative Examples 15. suggests the following ARCH(2) model: + + nf = ..i. In other words.5 379 where 2 = ~ . is assumed to be constant. we can compute the conditional forecast error variances for t = 47 and t = 48 as follows: where n~ is the residual at t = 46 obtained from the autoregressive error fitting in (15.089) (.4. (.4). ( n : + ~ -Specifically. = %. in the general ARMA model (1 . ~). the yearly US tobacco production from 1871 to 1984. Traditionally.B I B - " ' . - + p B P ) z . EXAMPLE 15. c#+ 2 0. and Lamoureux and Lastrapes (1993). . - + 15. and Show that for the unconditional variance of n. We use the above simple examples to illustrate the procedure.3 Consider the general GARCHfl.i. Oj r 0. Other examples include Engle. . Indeed.and +i 2&.Oj < 1. Day and Lewis (1992).. is Var(n. random variable with mean 0 and variance 1. the GARCH model and its variations have been applied to many risk and volatility studies.If we fit the model by OLS + + . be the Gaussian white noise process with mean 0 and variance a2. where the e. show that ( N . = Ute. where R~ is the 15. to be positive and finite the coefficients need to satisfy the constraints do > 0.s multiple coefficient of determination. = Ute.j = Oo/(l el).d. N ..i.d. are i. . are i. a very simple GARCH(c s) model with r 5 2 and s 5 2 is often sufficient to provide a significant improvement over the traditional homoscedasticity models. Ng. where the e. 15. random variables with mean 0 and variance 1. ) d ----* ~ ~X2 (s). we can compute the conditional error variances for t = 1I5 and t = 116 as follows: which shows that the variances are related to the squares of past errors and they change over time. Toward these ends. and Rothschild (1990).Show that the unconditional variance of rz. s) model.2 Let 11. The investigation of conditional variance models has been one of the main areas of study in time series analysis of financial markets. s 2. 11.1 Consider the ARCH(1) model n.-!_l. These studies suggest that the selected model does not have to be complicated. and cr? =.380 CHAPTER 1 5 Time Series Regression and Gar& Models Given the information up to 1984 that corresponds to t = 114. Oo O1 n. for t = s 1. + . 6 Find some related time series of your interest. . and construct a GARCH model for the error variance if needed.Exercises 381 15. and construct a GARCH model if needed.S. perform a residual anaiysis. liquor sales between 1970 and 1980 given in Exercise 8. Fit the series with a univariate model.4 Consider the U.5 Find a financial time series. 15. (a) Fit a regression model using OLS. (a) Build an appropriate ARIMA-GARCH model for the series. 
(b) Compute the one-step ahead and two-step ahead conditional forecast error variances. [c) Compare and comment on the fesults obtained from the ARfMA-GARCH model and the standard ARIMA model. 15.6 of Chapter 8. identify the error process. (b) Compute the next 12-month forecasts and their conditional variances from the model. perform an analysis on its squared residuals. 1 6. and the lag-k cdvariance matrix . and the cross-covariance between Zj. . .t). . .. m..t. m.) = pi is constant for each i = 1.. . . are functions only of the time difference (s . . stationary and nonstationary vector ARMA models are discussed. . Hence. Detailed empirical studies are presented. 2.= {ZIrt. In the transfer function model. be an rn-dimensional jointly stationary real-valued vector process so that the mean E(Zi. After an introduction of some basic concepts for vector processes. In regression time series models. 2.]'. The chapter ends with some descriptions of spectral properties of vector processes.e. i. Z.. t = 0.. we also concentrate on relationship between a dependent variable and a set of independent variables. . fI. In this chapter.Zz. prices.. an input-output relationship between one output variable and one or several input variables. nt and j = 1. For example. for afi i = 1.1 Covariance and Correlation Matrix Functions . Some useful notions such as partial autoregression and partial lag correlation matrix function are introduced. in a sales performance study.. . .. f2. . we introduce a more general class of vector time series models to describe relationships among several time series variables. the variables might include sales volume. In many applications. 2.Vector Time Series Models The time series data in many empirical studies consist of observations from several variables. . . . . and Zj. safes force. we have the mean vector . .. we study a specific relationship between variables. Let 2. however. and advertising expenditures.. transfer function and regression models may not be appropriate. r(k) is called the covariance matrix function for the vector process Z. . j)th off-diagonal element of p(k). the covariance and correlation matrix functions are also positive semidefinitein the sense that . 2.&2. m. . ... Like the univariate autocovarizince and autocorrelation functions. 2. i = 1. .The matrix r(0)is easily seen to be the contemporaneous variancecovariance matrix of the process. for i f j. . however. A jointly stationary process implies that every univariate component process is stationary. represents the cross-correlation function between ZL and Zj. . For i j..16.. . yij(k) is the cross-covariance function between Z. . . A vector of univariate stationary processes. i.nt. nr and j = 1. pii(k). . where D is the diagonal matrix in which the ith diagonal element is the variance of the ith process. fl. . .2. . Clearly. is the autoconelatian function for the ith component series ZC whereas the (i. .. the ith diagonal eIement of p(k). is not necessarily a jointly stationary vector process. .1 Covariance and Correlation Matrix Functions 383 where - .. .. As a function of k. and j = 1. . . yii(k)is the autocovariance function for the ith component process Zi. fork = 0. . . .e. The correlation matrix function for the vector process is defined by for i = 1.2.i and q.m. and its standardization. . Note. we can write zZ0 q s B S .. . 
is said to be a linear process or a purely nondeterministic vector process if it can be written as a linear combination of a sequence of nz-dimensional white noise random vectors.The above representation is known as the where 2. that yv(k) P yU(-k) for i # j and also that r ( k ) # I'(-k). and an. the 'Pis are nz X m coefficient matrices.Pi)(z. .. = a. .t . where 'Po= I is the ne X nl identity matrix. and the a. Equivalently. because ~ i j ( k= ) EI(Zi. . .pj)1 = Et(Zj..-.2 Moving Average and Autoregressive Representations of Vector Processes An m-dimensional stationary vector process 2. although the elements of a. Sometimes. . however. we have . at different times are uncorrelated. they may be contemporaneously correlated.Pi11 = yji (-k)..'s are m-dimensional white noise random vectors with zero mean and covariance matrix structure where X is a ni X m symmetric positive definite matrix. the covariance and correlation matrix functions are also called the autocovariance and autocorrelation matrix functions. Instead. Thus. = Z. 16. using the backshift operator BSa.p and g ( B ) = moving average or Wold's representation. . t2. . .+. i.e. t+k . t .pjl(Zi.a2.384 CHAPTER 16 Vector Time Series Models and for any set of time points q.The results follow immediately by evaluating the variance of Ey=iaiZ. tn and any set of real vectors al. For a vector process with a stationary moving average representation to be invertible. lie on or inside the unit circle. denoted by I ~ ( B ) I . An invertible process is not necessarily stationary...l i~f =i = j a n d I I ~ . W e h a v e ~ ( B ) = [Eij(B)].. .i=1. a = O CO if i # j....=[$~. such that E[(i. .. are absolutely summable.be sqye summable in the sense that each of the m X 111 sequences $c+ is square summable.bv. i. Z~olIIii. o = O i f i # j j .. i. m . no zeros of the determinant of the moving average matrix polynomial should lie on or inside the unit circle. $1..$ < 00 for i = 1.=* $$. vector process is said to be invertible if the autoregressive coefficient matrices TZ.a. where and the H. Z.2q j a t j ) ] o -+ j=O j=O asn - m.. j = 1 . i.4) Another useful form to express a vector process is the autoregressive representation.5 The .Here and in the following an infinite sum of random variables is defined as the limit in quadratic mean of the finite partial sums. E ~ . - 5 'U. in which we regress the value of Z at time t on its own past values plus a vector of random shocks. ttf and j = 1.e. a stationary process is not necessarily invertible.sl < ca for all i and j.e. A l s ~ ... . . w h e r e + i i .2. .2. .2. .. where nii(B) = I ? . m. .2. .16. We can write Y ( B ) = [$ii(B)]. i . 2. ~ o = I .BS.. in (16. III(B>I # 0 for IBI 5 1. For the process to be stationary. m . . . I*(B)~ # 0 for IBI 5 1..s]. m.2.e.-j)'(it . in terms of the backshift operator.e. For a vector process with an invertible autoregressive representation to be stationary. 2.1)or (16. = l i f i = j a n d $ e . .. or. Let E. X.0 .2. m .3)is dehed . = Illv. where @ii(B) = Es=Ot. .z ~ ~ I ST ~ . 2 . i. (16. we require that no zeros of the determinant of the autoregressive matrix polynomial. i = 1. e ... we require that the coefficient matrices q. j = 1.e. i.. Thus.2 Moving Average and Autoregressive Representations of Vector Processes 385 Let!P.. are m X m autoregressive coefficient matrices. Similarly.. is positive definite. When p = 0. the with no loss of generality.(B)Jare outside the unit circle. = = I. For any nondegenerate case where the covariance matrix X of a. = + . 
and Qo and goare nonsingular m X rn matrices.is square summable. (16. the m X nt identity matrix.386 CHAPTER 16 Vector Time Series Models 16. q) process where and are the autoregressive and moving average matrix polynomials of orders p and q. we can write where such that the sequence 9. In this case. enabling us to write . The process is invertible if the zeros of the determinantal polynomial le4(B)1 are outside the unit circle.3..3 The Vector Autoregressive Moving Average Process A useful class of parsimonious models is the vector autoregressive moving average ARMA(p.3) The process is stationary if the zeros of the determinmtal polynomial J@. that process becomes a vector MA(q) model When q = 0. respectively. the process becomes a vector AR(p) model Z. + @ p ~ t . we assume in the following discussions.p + a. . . - Procedure 1 is simpler and more commonly used than procedures 2 and 3. . Among these choose one with minimal autoregressive order p. Hannan considers the class of stationary processes satisfying the following conditions: A. 1970. if @.1) by an arbitrary nonsingular matrix or by a matrix polynomial in B yields a class of equivalent models that ail have identical covariance matrix structures. The only common left divisors of @.e.1) is not identifiable in the sense that we cannot uniquely determine the values of p and q and the coefficient matrices aiand Bj from the covariance matrices of 2. q) model.= @ti.1). and j = 1.-I + a.16. and B.3. a given autocovariance or autocorrelation function may correspond to more than one ARMA(p.(B). . left-multiplying both sides of (16. Recall from Section 3.(B) are unimodular ones. 1976. is less than or equal to the order of +ii(B) for all i = 1.(B) and B. 3. where &(B) = I .. If the order of +ij(B). The resulting representation will be unique if the rank t@. for univariate time series. To ensure the unique representation.3. The zeros of IQp(~)I(possibly after some proper dEerencing and transformations) are outside the unit circle and the zeros of IB~(B)Iare on or outside the unit circle. Represent @. Vector M(1)Models The vector AR(1) model is given by Z. From each class of equivalent modeh choose those with minimal moving average order q..(B) = &(B)I..$lB is a univariate AR polynomial of order p. Represent e p ( B )in lower triangular form. i. Thus. Other important phenomena on a vector ARMA representation are presented in Chapter 17. j ) element of @. without further restrictions. Similarly.(B) = C(B)H(B) and 8 ( B ) = C(B)K(B). 1979). .3. 2. . the model in (16. then the determinant [c(B)I is a constant.(B) in a form such that @.3 T h e Vector Autoregressive Moving Average Process 387 with such that the sequence is absolutely summable.2 that. then the model is identifiable. He then argues that with the above conditions the model in (16.1) may be identified by any one of the following procedures: 1. In.Bq3 is equal to nt.3. . we have imposed invertibility conditions in the model selection. . ni. the (i. The model is identifiable if +p # 0. . in the vector ARMA model in (16. The problem of identsability for vector ARMA models has been studied by Hannan (1969. . we let .. with one period delay with effect because the input series Zl.3.to the output Z2. Thus... however.. ..are correlated.12) as a causal transfer function model This is so relating the input Z1. mistakenly regard (16. Thus.9). then Equation (16. 
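A numerical version of the stationarity condition above can be sketched as follows: the zeros of the determinant of the autoregressive matrix polynomial lie outside the unit circle exactly when every eigenvalue of the companion matrix formed from Phi_1, ..., Phi_p lies inside it, and the same check applied to the moving average matrices gives the invertibility condition. The matrices supplied by the caller are placeholders, not examples from the text.

import numpy as np

def is_stationary(phi_list):
    # phi_list = [Phi_1, ..., Phi_p], each an m x m array
    m, p = phi_list[0].shape[0], len(phi_list)
    F = np.zeros((m * p, m * p))
    F[:m, :] = np.hstack(phi_list)            # first block row: Phi_1 ... Phi_p
    F[m:, :-m] = np.eye(m * (p - 1))          # identity blocks below
    return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1))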
- Vector ArWIA and Transfer Function Models If 412 0 in (16.388 CHAPTER 16 Vector Time Series Models For nt = 2..10) implies that current sales depend not only on the previous sales but also on the advertising expenditure at the last time period.. then we have One should not. if zlg.. ..12) to a causal transfer function model.3. and &. we have . depends not only lagged values of iiit but also lagged values of other variables $ For example. Thus. . and the noise series a2. each ii.3. which violates a basic assumption in the transfer function model.The error term bz. on al.3. represent the sales and the advertising expenditures (measured f?om some level p) of a company at time t. there is a feedback relationship between two series. the current advertising expenditure will also be influenced by the sales performance at the previous time period.is independent of al. and hence bl. Moreover. where a is the regression coefficient of a*. apart from current shocks. To reduce (16. More generdiy.. Let the variance-covariance matrix X of [al.. then a joint multivariate vector model can be reduced to a block uansfer function model where a feedback relationship could exist among series within each block.3 The Vector Autoregressive Moving Average Process 389 This result shows that the input series ZI. actually has an instantaneous effect a and a oneperiod delay effect (#21 .CY#I~). treating the model in (16. a2.. the zeros of the determinantal equation must be outside the unit circle. we have 1 @k~I .12) as a transfer function model could be very misleading if u12# 0. In summary. a joint multivariate vector AR model can be reduced to a causal transfer function model. For the process to be stationary.!]' be It can be easily shown that Thus.(B) is triangular.16. if is block triangular. Stationarity and Invertibility Conditians The vector AR(I) process is clearly invertible. if the autoregressive matrix polynomial (P.3. Letting A = B-'. . the zeros of [I . . be the associated eigenvectors of QI.hi= &hi for i 1. . . n-+w n+w . Let hi. .. h. lhil < 1 for i = 1. . A*.' = 0 A It follows that and Thus. . . lim An = lim diag[Af. .M. In such case.. Let A = diag[hl..Am]and H = [hl. . . We have H = HA and - . .~ ~ are 3 outside 1 the unit circle if and only if all the eigenvalues hi are inside the unit circle. Hence. rn. . we can write ITAH-'. .. Far simplicity. h2. . h2. be the eigenvalues and hl. the zeros of are related to the eigenvalues of (Dl.Ah] = 0 and lim @: = lim HAHI' n-03 = fim H A ~ H . such that @.A. .. HAH-' n+m ~ M (16. assume that the eigenvectors are linearly independent.3. . This result implies that an equivalent condition for stationarity of a vector AR(1) process is that all the eigenvalues of lie inside the unit circle.22) ...e.2. Now. A$. . h2$.]. . Thus. 2.. .390 CHAPTER I6 Vector Time Series Models 11 . i.. h. . and m.. j ) element of the matrix cPf (B).)'] = + E [ Z . Gantmacher 119601 and Noble [1969]) so that where the adjacent A's may be equal and the 6's are either 0 or 1. are linearly indeRecall that in the above derivation we assume that the eigenvectors of pendent. . we can rewrite (16.I).2.I).~ @ ~~ ~ . however.~ a : ] where we note that E ( z . . e. where = I and the cPlf matrix decreases toward the 0 matrix as s increases.~ Z .111 .3. the eigenvectors of Qll may not be all linearly independent. . . The above results still < I for i = 1. follow if 16.~ a. In such case. .= ) 0 fork 2 I.g. . 
the autoregressive components of the univariate series may not be identical.26) implies that each Z i t from a joint multivariate AR(1) model is marginally a univariate ARMA process of order up to (m. Fork = 1.] + = E [ Z . Because cancellations may occur for some common factors between IQ~(B)Iand @$ ( B ) . the (i.~ ( @ ~ Z .~ ~ .16. Hence.1 Covariance Matrix Function for the Vector AR(1) Model r(k) .hence.24a) as Because the order of the polynomial IB~(B)Iis m and the maximum order of the polynomiaf @ t ( B ) is (m . . Note.3. . we have r(I) = r(O)@i. there still exists a nonsingular matrix H (see. . (16. however. In general.= E[z~-~z.3 The Vector Autoregressive Moving Average Process 39 1 a.3. that where (B) is the adjoint matrix of a l ( B ) . we let vec(X) be the column vector formed by stacking the columns of a matrix X.392 CHAPTER 16 Vector Time Series Models and Given the covariance matrix function r(k). we can compute r(0) by noting that (16.30) implies that Thus. we have from (16.3.31) for r(O). Solving Equation (16. Equations (16..3.30) can be used to calculate Z and a. p. Conversely. vec(ABC) = (C' @ A ) vec(B). 3333).3. For example.33) where 63 denotes the Kronecker product.r(O)@. if then Using the result (see SearIe E1982.3.3.3. (16.29) and (16.31) + vec(r(0)) = vet(@ . given cPl and X.) vec(Z) = (al63 Ql) vec(r(0)) + vec (2) . when are . we have Specifically.3 The Vector Autoregressive Moving Average Process 393 The use of (16. Because the eigenvalues of (A C 3 3 )are products of eigenvalues of A with those of .34). The function T(k) for k 2 1 can then be obtained from the second part of (16.4 and -9.3. the process is the eigenvalues of stationary. hence.34) is given in Example 16.1.3.28).1 Consider the two-dimensional vector AR(1) process where and From (16.16.3. EXAMPLE 16. which are inside the unit circle. the roots of be inside the unit circle.3.+ * * .3. hence. equivalently. it follows that all eigenvalues of (rPI (16. For the process to be stationary we require that the zeros of 1 1 . 16.394 CHAPTER 16 Vector Time Series Models B. Thus. by (16.2 VectorAR(p) Models The general vector AR(p) process (I- a l ~ ' -.gP\lie outside the unit circle or.3. the inverse in and.GJIB .34) exists. we can write . .281. @ Q1)are inside the unit circle.3. .( P ~ B P=) a~.35a) is obviously invertible. In such case. . ~ (16. 35a) and then taking expectations.3 The Vector Autoregressive Moving Average Process 395 where the weights are square summable and can be obtained by equating the coefficients of ~j in the matrix equation ( I . is a polynomial in B at most of order nip. i.(PIB - * .3.37a) as Now..e.a.16.Bp)(Z + YIB + * 2 ~ 2 + .38) 1. (16. .e. and each 4 ( B ) is a polynomial in B at most of order ( n ~. !PI = Because where @:(B) IQ~(B)I = [@$ ( B ) ] is the adjoint matrix of (B). each individual component series follows a univariate ARMA model up to a maximum order (mp. Thus. The order can be much less if there are common factors between the AR and MA polynomials. Thus.. . The covariance matrix function is obtained by Ieft-multiplying on both sides of the transposed equation of (16. we can write (16.3.1)p.1)p). . .3. (m .) = I . for k = 1. is Note that f(-1) = r l ( l ) and that r(k) cuts off after lag 1.396 CHAPTER 16 Vector Time Series Models where 0 is the m X m zero matrix. we note that. . we have The covariance matrix function of Z. generalized Yule-Walker matrix equations: .3 Vector MAGI) Models The vector MA(1) models are given by where the a. . . from (16.3.45) I . 
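The vec/Kronecker computation of the covariance matrix function for a vector AR(1) model can be sketched as follows; the Phi_1 and Sigma passed in are arbitrary placeholders rather than the matrices of Example 16.1. It computes vec(Gamma(0)) = (I - Phi_1 (x) Phi_1)^{-1} vec(Sigma) and then Gamma(k) = Gamma(k-1) Phi_1'.

import numpy as np

def var1_covariances(phi1, sigma, K=3):
    m = phi1.shape[0]
    vec_sigma = sigma.reshape(-1, order="F")                  # column stacking (vec)
    vec_gamma0 = np.linalg.solve(np.eye(m * m) - np.kron(phi1, phi1), vec_sigma)
    gammas = [vec_gamma0.reshape(m, m, order="F")]            # Gamma(0)
    for _ in range(K):
        gammas.append(gammas[-1] @ phi1.T)                    # Gamma(k) = Gamma(k-1) Phi_1'
    return gammas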
For m = 2.p. To solve for and X in terms of r(k)'s. we obtain the following 16. are m X 1 vector white noise with mean zero and variance-covariance matrix X. More compactly.3.44).3. a behavior that parallels the univariate MA<1) process. and (16. the zeros must be outside the unit circle or. In such a case. equivalently. The process is always stationary.81~1 be all less than 1 in absolute value. If it is also invertible.3..3 The Vector Autoregressive Moving Average Process 397 Hence. s=l where the Qf matrix decreases toward the 0 matrix as s increases.16. For the process to be invertible. The vector MA(1) process is clearly stationary. IZ and we can write Q) Z.4 Vector MA(q) Models The general vector MA(q) process is given by The covariance matrix function is where e0= I and F(-k) = r 1 ( k ) . the eigenvalues of 8 1 should of . which is a quadratic matrix equation in QI. then we can write with . The proof is exactly identical to that of the stationarity condition for tbe vector ARf 1)process. + 2 @fZf-s= a.'Amethod to solve such an equation can be found in Gantmacher (1960).Note that r(k) cuts off after lag q. 16. = I.Bq] no= -I s = 1..3.BIB - = I.53) n. we can write where the 'Psweights are obtained by equating the coefficients of Bj in the matrix equation ( I .8 i ~are( outside the unit circle or if aII the eigenvalues of are inside the unit circle.54) and llj = 0 f o r j < 0.. The pracess is invertible if the zeros of 11 . In such a case.(3)I be outside the unit circle. In such a case.2. the only requirement needed for the model to be invertible is that all the zeros of16.3.3.. -I I ~ B .l ) q .'PIB + 'P2B2 -I- -) = (I .5 Vector ARMA(1.(B) [I Hence. 1) model is given by 1 The model is stationary if the zeros of the determinantal polynomial . Because the elements of the adjoint matrix 8:(8) are polynomials in B of maximum order (m ..E).l) Models The vector ARMA(1.Q ~ B Iare outside the unit circle or if all the eigenvalues of are inside the unit circle. we can write such that .-. (16. can be obtained by equating coefficients of ~j in ll (B)8. where . + n.398 CHAPTER I 6 Vector Time Series Models so that the II.BIB). The II...*I[ ~ 1 .o. are absolutely summable.a I B ) ( I -t.. can be calculated recursively from the Qj as n.eli. (16. 16.= ns-. 3 The Vector Autoregressive Moving Average Process 399 and For the covariance matrix function. we get . for k = 2. r(0)can be solved in the same way as illustrated in Example 16.60). given the covariance matrix function r(k).3. Thus.31). Conversely. Equation (16.60). First. we consider and note that Hence.3. we obtain Using the first two equations in (16.60). using the first two equations in (16.3.3.3. we have and When the parameter matrices and Z are known. we obtain Next.60). from the third equation of (16.3. we can obtain the parameter matrices as follows.61) is of the same form as Equation (16.1 and is given by The other covariance matrices r(k)can then be calculated through (16.16. in univariate time series. is reduced to a stationary series (1 . we have where Equation (16. it is very common to observe series that exhibit nonstationary behavior.60) into (16.3. i. Substituting the second equation of (16. For example.64) 16.3.B ) ~ z for . it can be reduced to a stationary vector series by applying the differencing operator D(B). To be more flexible.e.4 Nonstationary Vector Autoregressive Moving ~ v e r & eModels In the analysis of time series.so we can write where gS. This extension. may be nonstationary. 
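As a small check of the cutoff property noted for the vector MA(1) model, the following sketch computes the nonzero covariance matrices under the convention Z_t = a_t - Theta_1 a_{t-1} with Cov(a_t) = Sigma; the sign convention is stated explicitly here because it varies across presentations.

import numpy as np

def vma1_covariances(theta1, sigma):
    gamma0 = sigma + theta1 @ sigma @ theta1.T     # Gamma(0)
    gamma1 = -sigma @ theta1.T                     # Gamma(1); Gamma(-1) = Gamma(1)'
    return gamma0, gamma1                          # Gamma(k) = 0 for |k| > 1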
implies that all component series are differenced the same number of times. equivalently.1) to the vector process is or. some d > 0.3.3.65).46) and can be solved similarly for B1. A seemingly natural extension of (16. however. a nonstationrvy series 2.400 CHAPTER 16 Vector Time Series Models Hence. we assume that even though 2.3. .66) is of the same form as Equation (16.Having and B1.. One useful way to reduce nonstationq series to stationary series is by dserencing.Z can be calculated from (16.(B) is a stationary AR operator.4. This restriction is obviously unnecessary and undesirable. dm)is a set of nonnegative integers. . Furthermore. :. Differencing on vector time series is much more complicated and should be handled carefully. [B~(B)I where the zeros of IOp(3)[and are outside the unit circle. 16. dz.and Z. 16. In this circumstance.5. One should be particularly careful when the orders of differencing for each component series are the same.. . . one can use the foilowing generalization of the vector ARMA model proposed by Tiao and Box (1981): I where the zeros of the determinantal polynomial [ Q ~ ( Bare ) on or outside the unit circle. . .1 Sample Correlation Matrix Function Given a vector time series of la observations ZI. as discussed in Chapter 17. . .5 Identification of Vector Time Series Models 401 where and (dl. . Z2. Thus. Box and Tiao (1977) show that identical differencing inlposed on a vector process may lead to a noninvertible representation. in some cases. For a given observed vector time series Z1. a linear combination of nonstationary series could become stationary. we have the nonstationary vector ARMA model for Z. OverdiEerencing may lead to compfications in model fitting. we can calculate the sample correlation matrix function . .5 Identification of Vector Time Series Models In principle. we identify its underlying model from the pattern of its sample correlation and partial correlation matrices after proper transformations are applied to reduce a nonstationary series to a stationary series.16. identification of vector time series models is similar to the identification of univariate time series models discussed in Chapters 6 and 8. &. a model purely based on differences may not even exist. . In such a case.Zn. Their method is to note the symbols -. Ifpgi(k)= 0 for [ k ]> q for some q.e. 228) shows that j(k) is a consistent estimator that is asymptotically normally distributed.denotes a value greater than 2 times the estimated standard enors. Unfortunately.) Z.k) is often replaced by n in the above expressions.zj) .3. . series are white noise. the matrices are complex when the dimension of the vector is increased. and .5) are rather complicated and depend on the underlying models. fiii(k + S ) ] - 1 n-k and For large samples. from (1 4. Bartlett's approximation for the asymptotic covariance simplifies to Cov [ & j ( k ) .+k-l has been removed. i. and Z. For a stationary vector process.then.denotes a value less than -2 times the estimated standard enors. .z. it is the correlation between Z.2 n = [x. .4) and (14.t+k . zi and and ndj are the sample means of the corresponding component series.zj)2]1p zi) . 16.k [ I + 2~ S=I P ~ ~ ( S ) P ~ ~ ( ~ I )kI] . . As dehed in Section 2. we have Var[ie(k>] - 1 4 n .k after their mutual linear dependency on the intervening variables ZrS1.2 Partial Autoregression Matrices A useful tool for identifying the order of a univariate AR model is the partial autocorrelation function.3.l(zi. +. j ) position of the sample correlation matrix.3. 
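A numpy sketch of the sample correlation matrix function, assuming Z is an n x m array with one row per time point; the (i, j) element of the lag-k matrix estimates the cross-correlation between the ith component at time t and the jth component at time t + k.

import numpy as np

def sample_corr_matrix(Z, k):
    Z = np.asarray(Z, float)
    n, m = Z.shape
    Zc = Z - Z.mean(axis=0)
    gamma_k = Zc[: n - k].T @ Zc[k:] / n           # sample Gamma(k)
    d = np.sqrt(np.diag(Zc.T @ Zc / n))            # sample standard deviations
    return gamma_k / np.outer(d, d)                # sample rho(k)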
(ta .=~(Z~. Tiao and Box (1981) introduced a convenient method of summarizing the sample correlations.402 CHAPTER 16 Vector Time Series Models where the . and Z.51.+. .Zt+2. and in the (i.ij(k) - ZZ!(zipt (zj..5.> 4. The crowded numbers often make recognition of patterns difficult. When the 2...denotes a value within 2 estimated standard errors. where -t. The variance and covariance of these sample cross-correlation Fii (k) obtained by Bartlett (1966) and given in (14. Hannan (1970. p. To alleviate the problem.ij(k) are the sample cross-conelations for the ith and jth component series .. The sample correlation matrix function is very useful in identifying a finite-order moving average modd as the correlation matrices are 0 beyond lag q for the vector MA(q) process. .3. .5 Identification of Vector Erne Series Models A 403 * where 2. is the error term. . k.. and Zt+Rare the minimum mean squared error linear regression estimators of Zt and Zt+k based on Zt+f. and the nt X nt matrix coefficients a . The Tiao and Box definition of 9 t s ) leads to solving (16. . those that minimize . Tiao and Box (1981) define the partial autoregression matrix at lag s. .+. denoted by $(s).. 2.5. to be the last matrix coefficient when the data are fitted to a vector autoregressive process of order s. In this section. . the partial autocorrelation between Z. If.8) for a. p. .7) leads to a multivariate generalization of the Yule-Walker equations in unnormalized form. This definition is a direct extension of the Box and Jenkins (1976.. we consider a generalization of the concept to a vector time series process proposed by Tiao and Box (1981). The partial autocorrelation is useful for identifying the order of a univariate AR(p) model because +kk is zero for [kl > p. . k = 1. are \where /%I2= YV. in the multivariate linear regression where. .5.% . and Z. and determining the partial autoregression matrix requires solving (16.. . That is. and Zr+kis also equal to the last regression coefficient associated with 2. for s 2 2.16. when regressing Zt+kon its k lagged variables Zf+R-l.5. 64) definition of the partial autocorrelation function for univariate time series. and Zt+k-l. B(s) is equal to @s. s. AS shown in Section 2.. Minimization of (16.ZIC2.3. we let . for the m-dimensional vector V.Zt+k-2r.8) for successively higher orders of s. 15) The 9 ( s ) so defined has the property that if the model is a vector AR(p). the partial autoregression matrix function. .= (r(0) .. s > 1. has the cutoff property for vector AR processes.15) and hence can be applied to any stationary process. 9 ( s ) . Although the autoregression matrix function. (16. (16S.b' (s)[A(s) ]-lc(s)}. then Like the partial autocorrelation function for the univariate case. Li = 1.b' (s)[A(s) 1-'b(s))-l (r(s). we have @.5. the elements of P(s) are not proper correlation coefficients.5. Equation (16. 9 ( s ) .11) implies that Substituting (16.14) The partial autoregression matrix function is therefore defined to be B(s) = ri(1)~r(o~~-l.5.8) can be written as Hence.: .13) into (16. is motivated from consideration of vector GR models. {rf(s) . Unllke the univariate partid autocorrelation function.12) and solving for @j.cl(s)[A(s)]-lb(s)) (r(0) - bf (s)[A(s)]-lb(s))-I. the function is defined in terms of the F(k)'s through (16.404 CHAPTER 16 Vector Time Series Models and then the system (16.5. however.5.5. 
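The +, -, . summary of the sample correlation matrices described above can be produced as in the following sketch, using the rough 2/sqrt(n) limit (the simplified standard error under the white-noise approximation) as the significance threshold.

import numpy as np

def indicator_symbols(rho_k, n):
    limit = 2.0 / np.sqrt(n)
    return np.where(rho_k > limit, "+",
                    np.where(rho_k < -limit, "-", "."))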
estimates of the P ( s ) can be obtained by fitting vector autoregressive models of successively higher orders by standard multivariate linear regression models. we note that if a process is a stationary vector AR(p).5.A. we have where When the a. Thus. we can write it as where r' is a constant vector. First. . . Zm)' is the sample mean vector. The maximum likelihood estimate of Z is given by . . are N(0.16. and fort = p + 1. the following results. .15) by the sample covariance matrices qs): . &. . from the discussion of multivariate linear models given in Supplement 16. Alternatively.- wbere z = (Z1. 2.5 Identification of Vector % m e Series Models 405 Sample Partial Autoregression Matrices Sample estimates of the P ( s ) ca_n be obtained by replacing the unknown r(s) in (16. n. is joint multivariate normal with the mean and variance-covariancematrix where @ denotes the Kronecker product. . . . Z) Gaussian vector white noise. we have. given n observations. A The estimated variance-covariance matrix of P is .pm .5.e.I) d. and is denoted as WN-pm-I(Z).f.p and S ( p ) is the residual sum of squares and cross-products.406 CHAPTER 16 Vector Time Series Models tvhereN = n .. 4. TOdo so.. i. The maximum likelihood function is given by Summary Statistics for Partial Autoregression To test for the significance of the sample partial autoregression. P is independent of S(P). we note that Equation (16.. 3. S(p) follows aAWishartdistribution with (N.and &p are provided by many statistical software that contain multivariate Linear models. we can use an overall X2 test. . Furthermore.18) can also be written as where and To test the hypothesis . The estimated variances and correlations of the elements of such as 41. Ideally. Otherwise. the statistic asymptotically has a chi-square distribution with m2 degrees of freedom. Some software programs provide both M ( p ) and AIC.e.5 Identification of Vector Time Series Models 407 we note that under Ho. it becomes and one selects a mode1 that gives the smallest AIC value. i. The corresponding maximum likelihood function under Ho is given by Thus. we see that under Ho. . For example.1)is the residual sum of squares and cross-products under Ho.. the estimate of Z is given by where S ( p . For an m-dimensional vector AR(p) model. use of good judgment from a model builder may be required. the software SCA computes AIC as 2 AIC = . We reject Ho if U is small and hence when M(p) is large.In(maximum likelihood) II + -2n (the number of parameters). different criteria will reach the same model. the likelihood ratio statistic for testing QP = 0 against QP f 0 is Following the result given in Supplement 16. One can also use Akaike's information criterion (AIC) to select a model..A.16. .+.can also be obtained as the last regression coefficient associated with Z.~ . . 221).+~/~] from a regression of Z. . . however. ]. As we have noted.33) is the residual from a regression of 2. for the univariate time series is the correlation coefficient between Z. unlike the univariate partial autocorrelation function. we let . . and that (16.3 Partial Lag Correlation Matrix Function By definition. and Z. . on its s lagged variables Zl+s-I. Note that (16. Haman [1970. b) extend the definition of univariate partial autocorrelation to vector time series and derive the correlation matrix between Z. p.-l. 'Iiao and Box (1981) extend the latter result to the vector case and intxoduce the partial autoregression matrix function P(s).5. 
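Order selection for a vector AR model by an information criterion can be sketched with statsmodels' VAR class, which fits successive autoregressive orders and reports the order minimizing the chosen criterion; the likelihood-ratio statistic M(p) discussed above can likewise be formed from the residual covariance determinants of successive fits, though that step is not shown here.

import numpy as np
from statsmodels.tsa.api import VAR

def select_var_order(Z, maxlags=8):
    results = VAR(np.asarray(Z, float)).fit(maxlags=maxlags, ic="aic")
    return results.k_ar            # autoregressive order chosen by AIC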
after the hear dependence of each op the intervening variables Z. .. / ~respectively.5.. for example. k are those that and E [ ~ V . on the carriers Z. Z.2t+2. This correlation matrix is defined as the correlation between the residual vectors and The multivariate linear regression coefficient matrices ad-l. For s 2 2..k and Ps-l. Heyse and Wei (1985a.. . Zt+. and q+. and ZI+. .+s-i. As shown in Section 2.+l. the elements of 9 ( s ) are not proper correlation coefficients. after removing the linear dependence of each on the intervening vectors ZtLtl. this partial autocorrelation between 2.-.3. .5. Z. .+s-l is removed (see.-z.5. when regressing &+.l.and &+.Z. the partial autocorrelation function between 2. . ..32) is the residual minimize E[~U. and Zt. . and use A(s) and c(s) from (16. . on the same set of carriers.408 CHAPTER 16 Vector Time Series Mod& 16. .9) and consider the minimization of . .. + I. Zt+s-1. . Solving the system (15. . .. letting .(s) are uncorrelated and that so Similarly.(s) defines the linear projection of Zt+.36)or (16. onto the space spanned by Z. p.5. Z.5.-l. .-l..5 Identification of Vector Time Series Models 409 Taking the derivative of (165. Because we have that u.+.+.e.37) gives the familiar multivariate linear regression coefficient matrices The Linear combination cu(s)Z. . and Z. on i. .35) with respect to the elements of lu'(s) and setting the resulting equations equal to 0 gives (Graham [1981. 541) which are recognized as the multivariate normal equations for the regression of Zt+. .16. . it is easily seen from (16.(l) = V.I by V. as a matrix of autocorrelation coefficients.410 CHAPTER 16 Vector Time Series Modeh and using A(s) and b(s) from (16. The covariance between v.-~. us-1.(s).we can write the normal equations for the of (16.(s). Heyse and Wei (1985a.~.15) that Tiao and Box's partial autoregression matrix at lag s equals 9 ( s ) = v&. Z.(s)]-'.9) and Z. var(vs-1.. .5. they recommend referring to P(s) as the partial lag autoconelation function even in the univariate case.33) that V. For the case s = 1.. we simply refer to P(k) as the partial lag correlation matrix function.+. vS-~. Recognizing that the univariate case corresponds to the special case when nt = 1. . is a vector extension of the partial autocorrelation function in the same manner as the autocorrelation matrix function is a vector extension of the autocorrelation function.5.+l. and cov(vs-I..34). . and Z.5. and Z.. .44) The following is a summary of the relationships and differences between our partial lag correlation matrix P(s) and Tiao and Box's partial autoregression matrix B(s) .+. They call P(s). . Z.I by Vv. Also.+.5. Note that c o v ( ~ . b) define the partiat lag autoconelation matrix at lag s to be where Dv(s) is a diagonal matrix in which the ith diagonal element is the square root of the ith diagonal element of Vv(s) and D. .after the removal of their linear dependence on the vectors at the intervening lags.) by Vv(s). as a function of s. In this book.) is equal to Vk(s).(s) from (16.391. .5.+. P(s).(l) = r(0) and that Vw(l) = r(1) because there are no intervening vectors between Z.5. the partial lag autocorrelationmatrix function.32) and (16. . in a similar manner to (16.-l. and us-l.+$. The ternpartial lag autocorrelation ntatri~is used because it is the autocorrelation matrix between the elements of Z.33) as which has the solution / I r (s) = [A(s)]-I b(s).(s).(s) is similarly defined for V. (s) [v. We denote var(us-1..5. is .l. . . (16S. Note from (16. 36) or (16. 
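The definition of the partial lag correlation matrix translates directly into a brute-force computation: regress Z_{t+s} and Z_t separately on the intervening vectors Z_{t+1}, ..., Z_{t+s-1} and form the ordinary cross-correlation matrix of the two residual series. The sketch below does exactly that; it is a direct illustration of the definition rather than the more efficient recursive algorithm described later in the chapter, and the function names are my own.

```python
import numpy as np

def partial_lag_corr_matrix(Z, s):
    """Sample partial lag correlation matrix P_hat(s), s >= 1, for an (n x m) series Z,
    computed directly from the defining regressions on the intervening vectors."""
    n, m = Z.shape
    Zc = Z - Z.mean(axis=0)
    lead, lag = Zc[s:], Zc[:n - s]                 # Z_{t+s} and Z_t
    if s == 1:                                     # no intervening vectors: P(1) = rho(1)
        u, v = lead, lag
    else:
        W = np.hstack([Zc[j:n - s + j] for j in range(1, s)])   # Z_{t+1}, ..., Z_{t+s-1}
        Bu, *_ = np.linalg.lstsq(W, lead, rcond=None)
        Bv, *_ = np.linalg.lstsq(W, lag, rcond=None)
        u, v = lead - W @ Bu, lag - W @ Bv         # residuals u_{s-1,t+s} and v_{s-1,t}
    Vvu = (v.T @ u) / len(u)
    Dv = np.sqrt(np.diag((v.T @ v) / len(v)))
    Du = np.sqrt(np.diag((u.T @ u) / len(u)))
    return Vvu / np.outer(Dv, Du)                  # properly normalized correlation coefficients
```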
Jenkins.5.5.5. Box. when s = I. The partial lag correlation matrix P(s) is a correlation coefficient matrix because each of its elements is a properly normalized correlation coefficient. however. Recursive Algorithm for Computing P(s) In this section.5. Given the regressions from (16. however.5. That is.33).-l. . 4.44). P(s) and B(s) share the same cut-off property for autoregressive processes.32) and (16.for s 2 2. When m = 1. This method is also mentioned in Ansley and Newbold (1979a) and provides a vector generalization of Durbin's (1960) recursive computational procedure for univariate partial autocorrelations. we see from (16. . . . Thus.16. 3. the partial lag correlation matrix at lag I reduces to the regular correlation matrix at lag 1. and Reinsel (1994.40) that and Thus. This natural property that also hoIds for the univariate partial autocorrelation function 41ek. except in the univariate case mentioned in step 2. Here. Hence. p. and Z... which is to be expected because in this case there are no intervening vectors between 2. is not shared by 9 ( s ) except in the univariate case when nz = 1.{s) = 0. .+l. 2.5 Identification of Vector Erne Series Models 41 1 1. we discuss a recursive procedure for computing the partial lag correlation matrices.+. P ( s ) is not equal to the correlation coefficient matrix between the elements 2.5.43) and (16. Zt+. and Z.43) that P(1) = p(1). we see that P(s) = P ( s ) = 0 if and only if V.5. allowing for Zt+l. From (16. 87) give a nice summary of Durbin's procedure. Even for a general m-dimensional vector process. This property is one of the reasons that Tiao and Box called 9 ( s ) the partial autoregression matrix. the partial lag correlation matrix and Tiao and Box's partial autoregression matrix are equal in the univariate case. it is easily shown by reversing the order of the equations in either (16. .5.5. vs-I. . in the regression relation (16.45) on the residual v.+l on the carriers Z.-l. t+s+l.5. Z.+l.5. Therefore. + . Zt+r.. . Because the multivariate linear regression coefficient matrix that defmes the regression of Z. .l.45) and (16. be the regression of the residual us-1.. t) [ ~ = ( ~ s -t)l 1-I .in (16.the coefficients as.47).48). Substituting for us-.5.Us. and vs-I. thus. from COV(U~-I. the multivariate linear regression coefficients P ..5.5.. . . k i n (16.47) and (16. in fact.+.+s.. the partial autoregression matrix at lag s is equal to a. = us. From (16.+.5. Let . . Hence. = VL ( s ) [V. kZt+s+~-td. and us. .. .(s)]-' .. on the carriers Zt+. k and p.46). is. on us-1. from (16.5.546) in the regression (16. . from (16. where the multivariate linear regression coefficient is given by 1Yf = . equal to the partial autoregression matrix at Iag s..44) that a: is. Note from (16. we require the computation of the multivariate linear regression coefficient matrices u s . is the same as those that define the regression of Zt+.5. a1 = a . t+r.5. Corresponding to the definition of the partial lag cornelation matrix.48) can be found by considering the regression of vS-l.412 CHAPTER t 6 Vector Time Series Models we want to consider the regressions S Zt+s+l = 2 k= I ff... . k in (16..49).and uf+.47) can be computed from and Similarly.+. however. .49) gives .-1. (16. our interest is in the correlation between v. .+.+. and V.50). and (16. from (16. where D.41)]. [from (16. 
Follo~vingQuenouille (1957.l vU(s) = r(0).5.as in (16.5 Identification of Vector Time Series Models 413 and This development leads naturally to the following recursive procedure for computing are obtained using (16.p. Zn addition.39).5. to a vector autoregressive process of order s can be computed from 8 .5.(s).5. Vv(s).43).16.krf(k) [from (16. 40).(s) is similarly defined for V.k = as. the partiat autoregression matrix. P(s) in which V. vve note that this algorithm aIso provides 9 ( k ) . For s = 1. Ansley and Newbold (1979a) define the multivariate partial autocorrelation matrix at lag s to be . s. kr(k) Vv(s) = r(O)- s- 1 k= 1 Ps-l.(s).(16.5.5.&) respectively.42). The partial lag correlation matrix at lag s is.41). The matrix coefficients for fitting Z.1.5.fs) is a diagonal matrix whose ith diagonal element is the square root of the ith diagonal element of Vv(s)and D.39)1.k=21ffs. however. is asymptotically distributed as a X2 with m2 degrees of freedom.(s). . X-la. . 0 = (81.-. I+s and v.(s) are the symmetric square roots of V. .cPIZ. fnL(@. (16.+. 400). Z2.@. less than -2/2/. are independent and asymptotically normally distributed with mean 0 and variance 1/11. 16..(s) and [W.. .6.(s)12 = Vv(s). Thus.-1nIXI 2 2 1 " 2 . allowing for Z. the elements of P(s).1) is identified. . q ) process.). are uncorrelated white $oise series. .@.). Under the null hypothesis that Z. the log-likelihood function is given by ..(s) and W.(s)12 is asymptotically distributed as a X2 with one degree of freedom. we can use Tia? and Box's -. notation introduced at the end of Section 16.5. which implies that . The term X(s)provides a diagnostic aid for determining the order of an autoregressive model.6 Model Fitting and Forecasting Once a vector ARMA model in (16. the two residuajs u. 1. .3. In fact.(s) and V. . . . - 2. + + . .+. +. where a. Because P(s) is a proper correlation matrix. and 2 / G . are obtained by using r ( j )in place of r ( j )f o r j = 0... U$ng Quenouille (1957. 41) and Hannan (1970.-1.(S)]~ = V. and X. a.a.. B.414 CHAPTER 16 Vector Time Series Models where W. 8. . parameter matrices = (@I.=I -- a..Z. In addition. Q(s) js not equal to the correla.-r.) from a zero mean Gaussian vector ARMA(p. and Z... Z. For example.-l. . or between -2/\/.-. s . defined such that [W. its elements are not correlation coefficients except in the univariate case in which S ( s ) = Q ( s ) = P(s). denoted I$(s). n[Z. Sample Partial _Lag Correlation Matrices Estimates o_f the partid lag correlation matrices. ..1 in P(s).2) . is an autoregressive process of order s .812) = nnt 11 -ln27r . p. denoted_P(s). the methods discussed in Chapter 7 for the univariate case can be generalized to estimate the associated. given Z = (ZI. the results of sample correlation matrices can be used for its inference. p..1.-l . Like 9 ( s ) . and tion coefficient matrix between the elements of Z. .-1. .1 to indicate values of ej(s) that are larger than 2/1/. . . 4) have been discussed by Nicholls (1976.a. has a nonzero mean vector p.e. .. are assumed to be zero.16. Z i is the inverse of the information matrix and the derivatives are evaluated at the true parameter vector q.. and the ap+~-.q) has an asymptotic multivariate normal distribution. . Let q be the vector of all parameters and be the corresponding maximum likelihood estimator. where Z is the vector of sample means.6 Model Fitting and Forecasting 415 and Alternatively..6. Hence. fort 5 0 are not available.6. d where --+denotes convergence in distribution. If the process Z. 
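Because the elements of the sample partial lag correlation matrix are approximately independent N(0, 1/n) when the true order has been exceeded, the summary statistic X(s) described above can be screened lag by lag as an order-determination aid. The short sketch below, which reuses the partial_lag_corr_matrix function from the earlier sketch and whose names are my own, prints X(s), its asymptotic chi-square(m^2) p-value, and the corresponding indicator symbols.

```python
import numpy as np
from scipy.stats import chi2

def partial_lag_order_check(Z, max_s):
    """X(s) = n * sum_ij [P_hat_ij(s)]^2, referred to a chi-square with m^2 degrees of freedom,
    together with an indicator-symbol display of P_hat(s)."""
    n, m = Z.shape
    lim = 2.0 / np.sqrt(n)
    for s in range(1, max_s + 1):
        P = partial_lag_corr_matrix(Z, s)          # from the earlier sketch
        X = n * float(np.sum(P ** 2))
        print(f"s = {s:2d}   X(s) = {X:7.2f}   p-value = {chi2.sf(X, m * m):.3f}")
        print(np.where(P > lim, "+", np.where(P < -lim, "-", ".")))
```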
maximizing the log-likelihood function in general leads to a nonlinear estimation problem. Properties of the maximum likelihood estimators obtained from (16. They have shown that 6 is asymptotically unbiased and consistent and that fi(G .1) can be replaced by the conditional log-likelihood function where . i.It can be shown that where and fi = Z is asymptotically independent of 6. 1977) and Anderson (1980).it can be estimated by the sample mean. asymptotically.. i. . (16. because 2. The sample mean is then subtracted from the series prior to estimating the parameters q.this result is equivalent to jointly estimating p with the other parameters by the maximum likelihood method. Clearly.e. Nicholls and Hall (1979).2) are generalized directly to the vector case simply by replacing scalars by vectors and matrices.5. Dunsmuir. and E(anq) = 0 for j > 0. the residuals should be white noise. may reveal subtle relationships that suggest directions of improvement in the proposed model. Phadke and Kedern (1978).f i ) orhenvise and where the &i and ej are estimates of the parameters $and For an adequate model.5 indicates that there is a feedforward and feedback relationship between advertising and sales. The estimation algorithms are complicated and in general very slow.3) and (5. Estimation of a vector time series model often takes multiple stages. We now apply the joint vector time series approach introduced in the preceding sections to andyze this data set.26).3) for computing forecasts. the forecast limits of (5. the forecast error variance (5.E(al+j) = an+jfor j cast error variance-covariancematrix is given by 5 0.41 6 CHAPTER 16 Vector Time Series Models The exact likelihood function for vector time series has been studied by Osborn (1977). Based on the initial unrestricted fit. The fore- where the 'Pj matrices are computed using (16. one should check the adequacy of the fitted mode1 through a careful diagnostic analysis of the residuals whey Z. Initially.+j forj 5 0.2.9).3. the computation of the qjweights (5. we would estimate every element of a parameter matrix. forecast updating (5.is now used to denote Z. if p = 0 and to denote (2. 16.6.2. the autoregressive representation of forecasts (5. Forecasting for a vector ARMA model may be carried out in exactly the same way as in the univariate case discussed in Chapter 5.3) or (5. we might consider the restricted fit by retaining only the statistically significant coefficients with the nonsignificant coefficients set to zero.5). FLillmer and Tiao (1979). and the eventual forecast function (5. the {-step ahead minimum mean square enor forecast can be recursively calculated by * where Zn(j ) = Z.5) and qo= I .10). A close inspection of the correlation sfructure. and Deistler (1980).2.3. The recursion (5.44. Once the parameters have been estimated. and Hannan. the correlation matrices of it should be insignificant and have no pattern. Z a o and Box recommend using the conditional method in the preliminary stages of model building and later switching to the exact method. .3. .4.2.7 An Empirical Example - - Recall that the analysis of the Lydia Pinkham data performed in Section 14. For example. Overall x2 tests based on the sample comlations of the residuals are suggested by Hosking (1980) and Li and McLeod (1981). however. Hence. . The pattern of bll(k) indicates a possible low-order AR model for the series (1 . and the pattern of c22(k)implies a possible MA(1) model for the series (1 . We use MTS in the followving analysis.Q I B . 
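The forecasting recursion of Section 16.6 is easy to program once the AR coefficient matrices and the innovation covariance matrix are in hand. The sketch below is a simplified illustration for the pure vector AR case (moving average terms are omitted for brevity): it produces the recursive minimum mean square error forecasts and the l-step forecast error covariance matrices built from the Psi weights. All names and the calling convention are my own.

```python
import numpy as np

def var_forecast(phis, c, Z_hist, steps, Sigma):
    """Forecasts for Z_t = c + Phi_1 Z_{t-1} + ... + Phi_p Z_{t-p} + a_t from origin n,
    plus the l-step forecast error covariance  sum_{j=0}^{l-1} Psi_j Sigma Psi_j'."""
    p, m = len(phis), len(c)
    hist = [Z_hist[-j] for j in range(1, p + 1)]        # Z_n, Z_{n-1}, ..., Z_{n-p+1}
    fcst = []
    for _ in range(steps):
        z = c + sum(phis[j] @ hist[j] for j in range(p))
        fcst.append(z)
        hist = [z] + hist[:-1]                           # forecasts replace unknown future values
    # Psi weights from the AR recursion: Psi_0 = I, Psi_j = sum_{i=1}^{min(j,p)} Phi_i Psi_{j-i}
    psis = [np.eye(m)]
    for j in range(1, steps):
        psis.append(sum(phis[i - 1] @ psis[j - i] for i in range(1, min(j, p) + 1)))
    err_cov = [sum(psis[j] @ Sigma @ psis[j].T for j in range(l + 1)) for l in range(steps)]
    return np.array(fcst), err_cov                       # err_cov[l-1] is the l-step covariance
```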
16.7 An Empirical Example

Only a few time series software programs are available for modeling vector ARMA models. These include SCA, MTS, and the newer version of SAS. SCA (1997) implements the partial autoregression matrices proposed by Tiao and Box (1981); MTS, a multivariate time series program developed by Automatic Forecasting Systems, Inc. (1987), implements the partial lag correlation matrices suggested by Heyse and Wei (1985a, b).

16.7.1 Model Identification

Let Z1,t be the series of annual advertising expenditures and Z2,t be the series of annual sales for the Lydia Pinkham data between 1907 and 1960 as plotted in Figure 14.2. As shown in Section 14.3, there is a need for differencing the data. Table 16.1 shows, and Figure 16.1 plots, the sample correlation matrices together with their indicator symbols for the differenced series of advertising and sales from lag 1 through 10. (Table 16.1: Sample correlation matrices for the differenced Lydia Pinkham data.) We note that the (i, j) element of the lag k matrix is the estimate of rho_ij(k) if i <= j and of rho_ji(-k) if i > j. For more clues, we examine the partial lag correlation matrix function together with the indicator symbols and summary statistics shown in Table 16.2. The result indicates a possible AR(2) model for the differenced data,

(I - Phi_1 B - Phi_2 B^2)(1 - B) Z_t = a_t.   (16.7.1)

16.7.2 Parameter Estimation

Once a tentative model is identified, efficient estimates of the parameter matrices Phi_i and Sigma are determined by maximizing the likelihood function. The parameters were estimated in three stages, and the results of the estimation for Equation (16.7.1) are shown in Table 16.3. In the first
and Z2.2) to compute the one-step ahead forecast of Zs5 from the forecast origin t = 54.2) as and Hence. we have ..3 Diagnostic Checking To guard agynst modelAmisspec%cation. 1 = 1016.7. . .7. 52 = 639.4 Forecasting For illustration. . &. 1 and 14. 2) > 0.. 1.tqt = 17.7.32)(43. these matters include the sales.3). Z2.8 Spectral Properties of Vector Processes The spectral results in Sections 12.. This estimate indicates that advertising and sales were also contemporaneously related. This feedback relationship is even more explicit horn the forecast equation of the advertising given in (16. the contemporaneous relationship between components of vector series is modeled through the off-diagonal elements of 2. we choose a.7.1 = &. It should be emphasized.Zl.05/v(31. 16. Note that el..05. Model building is a delicate process that also requires proper understanding of related subject matters under study. For an interesting study of building a model for the Lydia Pinkham data using other external information. that the above results by no means imply that the proposed vector model is the best choice for describing the advertising and sales phenomena of Lydia Pinkham Medicine Company.I.16.t l l . In this case.4) that there is no contemporaneous relationship between sales and advertising.. The above example is used to illustrate a vector time series model and its impfications for the data set when only the past time series observations are used. One should not conclude from (16. .. shown in Table 14. we conclude that cPl(l.7. we can estimate the correlation between the residuals irl. among others. 16. however.I ) = 31. .1 = a2. Hence. 1. go= I.7.. errors.Zm. In our form of ARMA models..7.. Var(e2. 54(1) be the one-step ahead forecast Let el. and Cov(el. and operation of the company. From 2 in (16. 1) = 17. ez. - w q. .e2. 1 = Zr.5 Further Remarks From Equation (16. and which equals $.3) and (167. but a general form of the variance-covariance matrix X for the innovation a.1) = 43.46.32. The advantage of a joint vector model over a transfer function model is evident when there is a feedback relationship between variables.5 for the univariate and bivariate processes are readily generalized to the m-dimensional vector process.9.2). the spectral representation of 2.30) = .21. marketing.This conclusion confirms our suspicions from the cross-correlation functions estimated from the residual series of the transfer function model.54(l) and e*.Var(el..is given by . .55. and the forecast error variance for [el. e2.8 Spectral Properties of Vector Processes * 421 &. For a jointly stationary nt-dimensional vector process Zt = [ZI. that a feedback relationship with sales leading advertising existed in the data. is given by T%us.30. the reader is referred to Palda (1964). I = alS55 and.55 . (m) in (16. It is easily seen that the spectral density matrix function f (w) is positive semi-definite. h ( w ) = fz(w) for all i and j. Thus.. and the element fj...8. . . i. ..(w>]. Also. . . The element fii(w) in (16.e.!w)ll and the dUi(w). the matrixf (w) is Hermitian.4) is the cross-spectrum or the cross-spectrd density of Zi. and the Zj.e. . nt. . The results of bivariate frequency-domain analysis can be used for the multivariate case.422 CHAPTER 16 Vector Time Series Models - .8. .. . .. gain function. Specifically. If the covariance matrix function is absolutely summable in the sense that each of the m X m sequences yg(k) is absolutely summable. 
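The diagnostic checks of Section 16.7.3 and the contemporaneous-relationship calculation discussed above can be carried out with a few lines of code. The sketch below, a minimal illustration whose names are my own, computes the residual correlation matrices at low lags with indicator symbols (mirroring Table 16.4) and the contemporaneous correlations implied by the estimated innovation covariance matrix (for the Lydia Pinkham fit, 17.05 / sqrt(31.32 x 43.30) is approximately .46).

```python
import numpy as np

def residual_check(resid, max_lag):
    """Residual correlation matrices at lags 1..max_lag, flagged against +-2/sqrt(n),
    and the contemporaneous correlation matrix implied by Sigma_hat."""
    n, m = resid.shape
    e = resid - resid.mean(axis=0)
    Sigma_hat = (e.T @ e) / n
    d = np.sqrt(np.diag(Sigma_hat))
    print("contemporaneous correlations of the innovations:")
    print(np.round(Sigma_hat / np.outer(d, d), 2))
    lim = 2.0 / np.sqrt(n)
    for k in range(1, max_lag + 1):
        R = ((e[k:].T @ e[:n - k]) / n) / np.outer(d, d)
        print(f"lag {k}:")
        print(np.where(R > lim, "+", np.where(R < -lim, "-", ".")))
```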
then the spectnun matrix or the spectral density matrix function exists and is given by .2. f.. phase spectrum. (16.i. . . the cospectrum. and coherence may now be defined for any pair of component processes in Z. we can write and f ( w ) = 2~ 2 T( k ) e-iwk. and Zj.4) is the spectrum or the spectral density of Zi.3) d ~ * ( wis) the transpose of the complex conjugate of dU(w). i = 1. dU2(w).dU. The spectral representation of the covariance matrix function is given by where = [ E( d Ui (o)dU. and F(w) is the spectral distribution matrix function of 2. c'f(w)c 2 0 for any nonzero m-dimensional complex vector c. (w ) )1 = [dFi. where dU(w) [dUl(w).8. are bothi orthogonal as well as cross-orthogonal.The diagonal elements Fiio) are the spectral distribution functions of the Zi and the off-diagonal elements Fij(o) are the cross-spectral distribution functions between the Zi. Hence. . . Xil.2) or (16. X. .3) is the so-called multivariate linear regression model. NXm where Y.A Muttivariate Linear Regression Models 423 Supplement 16.1) and that we let Yi = [Yil.. X. .]' has mean E(a) = 0 and variance-covariance Var(a) = X. .. respectively. we also list their dimensions underneath. We can represent the system in the matrix form . . Let Yl. Z) for i = 1. fi N[P.Supplement 16. a.] denote the constant term and predictor variables for the ith trial. a!. .. .A. more compactly.A.]' denote the corresponding errors. Yim3' denote the responses for the ith trial. or. Xr be a common set of r predictor (independent) variables. Y.3.A Multivariate Linear Regression Models . .I:(X'X)-'1. be the m response (dependent) variables and XI. . . . N. . and ai = [ail. . . Assume that each response follows its own multiple regression such that where the error term a = [al. We have the following results: + 1. .. then the error specification becomes and The model given in (16. and E denote the matrices of responses. . Assume further that rank (X) = (r 1) < Nand ai N(O.A. . For clarity. parameters. . p. The MLE for P is fi = (XiX)-'X'Y. If ai and aj are uncorrelated for i # j. and errors. predictors. X Y = NXfn p N X ( r + 1) (r+I)Xm + E . . [l.. Now suppose that there are a total of N trials in the system of (16. 2. - - . . P(kf2)i. Let us label these predictors [Xk+t. . The estimate of X in this case now . In certain applications. it may be of interest to test the null hypothesis that some of the predictor variables are unnecessary. ~ ( b 5) . Pn. . .424 CHAPTER 16 Vector Time Series Models and ~ 4.e.) is the likelihood function.3) as The hypothesis that the predictors in X2are unnecessary is equivalent to testing that Ho : P(21= 0 versus Ha: p(2)# 0. .Y for the reduced model of (16.. .. I - f :WN-r-l(X). we can rewrite (16. . .I l l .A.r .x~)-~x.i = l r . i.A.6). = ( 2 n ) ~ 1m51~W' where L (.. a Wishart distribution with (N .1) degrees of freedom. Under the hypothesis we have Let c(l) be the estimate of becomes and = (x. e-N1n/2 5. and 2 are independent. By partitioning the matrices X and P for the N trials presented in (16A.3) accordingly. .Xr] and label the corresponding parameters for the ith response [/3(k+l)i. Xk+21 .]. r = pln. .. It is clear that 0 < A 5 1. When testing the hypothesis we have N = ( n . . ..I1. .A.A Multivariate Linear Regression Models 425 Let fl be the original parameter space for (16. = [Zl. In the vector AR(p) modeI. . We reject the null hypothesis when A is smail or -21n A and Bartlett's modified statistic is large. .. .3) and nobe the reduced parameter space under Ho.k)m). . X2((r. we have . 
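As a concrete illustration of the estimation and testing results for the multivariate linear regression model of Supplement 16.A, the sketch below computes the least squares/maximum likelihood estimate of the coefficient matrix, the estimate of Sigma, and the likelihood ratio statistic for the hypothesis that a block of predictors is unnecessary. Bartlett's small-sample modification of the multiplier is omitted here, and all names are my own; this is a sketch of the idea, not a replacement for the formal results stated in the supplement.

```python
import numpy as np
from scipy.stats import chi2

def mv_regression_lr_test(Y, X1, X2):
    """Fit Y = [1  X1  X2] beta + error (N trials, m responses) and test H0: the X2 block is zero,
    using -2 ln Lambda = N * ln(|S_reduced| / |S_full|) with (#cols of X2) * m degrees of freedom."""
    N, m = Y.shape
    X_full = np.hstack([np.ones((N, 1)), X1, X2])
    X_red = np.hstack([np.ones((N, 1)), X1])
    def fit(X):
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # MLE/least squares coefficient matrix
        E = Y - X @ B
        return B, E.T @ E                            # residual sum of squares and cross-products
    B_full, S_full = fit(X_full)
    _, S_red = fit(X_red)
    stat = N * np.log(np.linalg.det(S_red) / np.linalg.det(S_full))
    df = X2.shape[1] * m
    return B_full, S_full / N, stat, chi2.sf(stat, df)
```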
we can apply the likelihood ratio test It is welf known that which follows approximately a chi-square distribution with the degree of freedom equal to [dim(St) . i. and k = (p .1)m. and where Z. .X). T = {T. . For a better approximation. To test the null hypothesis Ho.dim(&)]. T.' the mi are m X m coefficient matrices. Thus. .p).].Z the at are Gaussian vector white noise N(0. one can use the modified statistic due to Bartlett (1947).Supplement 16.e. . . - (e) (I . 16.2 . and f3.6 Consider the process - where 141 < 1.I' is a two-dimensional stationary vector process.GIB)Zt = (I . = (1 . (a) Express the model for [Q. 12 5 [:4 and z = I and Z = I. 16.4 = [. t]' N ( 0 .I' stationary7 Invertible? . (d) (I . 18]..822B)b2. r(k). a.6]'md' =I.3 Find the marginal process Zl.2 Consider the following simple demand and supply model: Q. = ao 3. (b) Find the correlation matrix function.t.5 Find the AR and MA representations for the models given in parts (d) and (e) of Exercise 16. . (a) Find the covariance matrix function.t Qt = Po + @&-I + a2. Z). p (k).3 .1. .1.1.ale + al. and f3.81B)a. (b) Is the process [Z1. az. ..t. Z ) . for k = f1. (b) Repeat the problem in part fa) if al. = (I .OllB)bl. and at = [ a l . = [bl.. (a) Write the process in a vector form.4 Consider the models given in Exercise 16..81B)al. and a. N(0. where @I = 8 3 : 6 ] 3'1 ..]' is a vector white noise process. where b. where GI= (c) Zt = (I . and az. bz. *l. Pt]' as a joint vector process if a.@$)a. represents quantity demanded at time t and Pi represents the price at time t. = (1 .f2. where 16.. for the models given in Exercise 16.426 CHAPTER I 6 Vector Time Series Models 1 Determine the stationarity and invertibifity of the following two-dimensional vector models: (a) (I . where Bl = [:!:36]..for k -. t (demand) where Q. zk2.?.QIB)Zt = a. 181 < 1. and a = [al... 0. 5. 16. is a vector white noise process.QIB)Z. 16. Exercises 427 (c) Write down the model for the vector of first differences (I ..10 and 16. (b) Repeat part (a) for the properly differenced series. Consider the U. (a) Calculate the sample correlation matrix function ji(k). where 2.. -. &. with k = 1.I1. 2.S. . for the models given in parts (a) and (c) of Exercise 16. . . . . it is required that 2 . 16. . .5.[Z. hog data given in the appendix as Series W13. for the original series between 1965 and 1974.S.. Show that to properly estimate the parameters in the model. . h Calculate the one-step ahead forecast Z. . . 2. 10. Repeat Exercise 1610 for the partial lag correlation matrix functions P(k). Complete the construction of a joint vector model for the data of U. .3. Calculate the partial lag correlation matrix function P(k) with k for the models given in parts (a) and (c) of Exercise 16.(l) and the forecast error variance for the model given in part (d) of Exercise 16.17). = 1.5. IS the resulting model stationary? Invertible? .. .11. Build a joint vector model for the five series of U. 10. housing sales and starts between 1965 and 1974 from the information obtained in Exercises 16.5.1. 15.1. Forecast the housing starts for 1975 using the joint vector model and compare the result with the forecast obtained in Exercise 14. . Let n be the number of observations for the vector ARfp) model given in (16. .7 Calcufate the partial autoregression matrices 9 ( k ) with k = 1.IB)Z. 2. housing sales and housing starts given in Exercise 14.1.( +i)m + f. p' is called a cointegrating vector which is sometimes also called the long-run parameter. - . 
and equivalent representations of a vector time series model. an I(0) process will be stationary.B)Z. an integrated series can wander extensively. Z. AZ. for 2. tend to keep many of these integrated series together and form equilibrium relationships. is stationary. Thus. to be Z(1) we mean that AZ.Bld2. In such a case. is Ifl) if Zt is nonstationary but its first difference.1 Unit Roots and Cointenration in Vector Processes A univariate nonstationary process or series Zt is said to be integrated of order d. Under this restriction. Under this notion. = $(B)n. and it wiU be used in our discussions. we have $(I) # 0. 6(B)(1 . is not only stationary but is actually I ( 0 ) . is stationary. is stationary. b).. then so too is cp'Z. and nominal GNP. Examples include short-term and long-term interest rates. Because the difference of a stationary process is stationary. The cointegrating vector is clearly not unique because if P'Z. if its (d . Zt = B(B)a. = (1 .~~a. Thus. Understanding these concepts is important in the analysis and model building of vector time series. In this chapter. = (1 . and +(I) = 0. 17. 2. the differenced series. = +(B)a.B)a. however. for any nonzero constant c. is said to be cointegrated of order (d. some important phenomena are unique in vector time series. partial processes.~B~B~. of a stationary process. Adz. = (1 .More on Vector Time Series Although the basic procedures of model building between univariate time series and vector time series are the same. An (m X 1) vector time series 2. . where B(B) = and 2. cp' is also a cointegrating vector. Economic and market forces.. =: (1 . zFo z. income and consumption. This leads to the concept of cointegration.B)Z.Fl~9j[ < W. As a nonstationary process. we introduce concepts of cointegrakion. to restrict ourselves to a real integrated process we define an I(0) process to be a stationary process where in terms of its MA representation. denoted as I(d). where 0 < b 5 d. AZ.B)Z.. b).b) for some nonzero (1 X m) constant vector P'. = t/. however.-~. is not I(0) because the resulting MA representation of the differenced series is AZ. denoted as CI(d. The most cornmon case is d = b = 1. if each of its component series ZL is I(d) but some linear combination of the series j3'Zt is I(d .1)th difference is nonstationary but the dth difference. Hence. form a basis for the space of the cointegrating vectors known as the cointegration space. consider the two-dimensional vector AR(1) process Clearly. are each nonstationary. This concept has been widely used in many interesting applications. As an illustration. where . However. especially in the area of economics and finance. .we have 5. vectors Pf are not unique. Pi. Clearly. neither is Pi. but their linear combination is stationary. See. Kremers (1989). The vectors pi.. and Z2.omponent series is an I(1) process. Thus. Granger (1986) and Engle and Granger (1987) further developed these ideas and-proposed the concept of cointegration. each being I ( l ) . . Pi such that P'Zt is a stationary (k X 1) vector process.is nonstationary..= -+Zl. More generally. Zt is cointegrated with a cointegrating vector P' = [+ 11. If for any other (1 X m) vector b' that is Linearly independent of the rows of P'. then the linear combination is clearly I(0). . then there may exist k (< m) linearly independent (1 X m) vectors Pi.17. then the 2. is said to be cointegrated of rank k. let p' = [# 11. This interesting phenomenon is shown in Figure 17. For the component &. we have that bJZ. 
if the (rn X 1) vector series Z.is a random walk... for example.*. the component ZIT. . . where Z1. who showed that nonstationarity of a vector process can be caused by a small number of nonstationary components. contains more than two components. . . becomes + an. . however. Hence. . is a (k X m) cointegrating matrix. Pi. this equation which follows an MA(1) process..After differencing. each c.1.1 Unit Roots and Cointegration in Vector Processes 429 The dangers of differencing all the components of a vector time series was reported by Box and Tiao (1977). 430 CHAPTER 17 More on Vector Time Series FIGURE 17.1 A cointegrated process where Z1,, (a heavy solid line] and Zz, (a dotted line] are each nonstationary, but their linear combination Yt fa thin solid line) is stationary. , 17.1.1 Representations of Nonstationary Cointegrated Processes Representation Because each component of 2,is I(l), the implication is that AZ, = (I - B ) Z , is stationary, Let e0 = E(AZ,). We can write AZ, in the MA form AZ, - O0 = 'P(B)a,, (17.1 -4) where a, is the vector white noise process with mean E(aJ = 0, variance-covariance matrix E(s,s:) = Z, q ( B ) = q j B j , q,, = I, and the coefficient matrices qj is absolutely summable. Let X, = *(B)a,. Then, AZ, - O0 = X,and zFo Z, = Zpl + Oo + XI. Applying a direct substitution for Z, - in (17.1.5) gives (17.1.5) 431 17.1 Unit Roots and Cointegration in Vector Processes A straightforward extension of Equations (9.4.5) and (9.4.7) to vector processes gives where Y , = x;o'P:a,-j and P; = - E E I ~ +which i is an absolutely sumable matrix sequence. Hence, the vector process Y, is stationary. Combining (17.1.6) and (17.1.7), we get and P'Z, - @'(Zo - Yo)+ P'eOt + f3'T(l)[al -t- a2 + . + a,] + P'Y,. * (17.1.9) + + -+ Clearly, the process br[al a2 . a,] is nonstationary for any nonzero (1 X m) vector b'; hence, (17.1.9) implies that P'Z, will be stationary if and only if and Equation (17.1.10) places a restriction on the nonzero drift across the components, whereas Equation (17.1.11) says that the determinant I ~ ( B ) \ = 0 at B = 1; hence, Q(B) is noninvertible. By (17.1.4), this implies that we cannot invert the MA form and hence can never represent a cointegrated process with a vector AR form in AZ,. The vector AR representation of a cointegrated process must be in terms of Z, directly. EXAMPLE 17.1 It can be easily shown from (17.1.2) and (17.1.3) that the MA representation of the two-dimensional cointegrated process given in (17.1.1) is given by 0 Note that *(B) = O ] and *(l) = [-4 O] 0 f 0, but lq(1)) = 0. Thus, !PCB) is not invertible, and we cannot represent the process in an AR form in terms of AZ,. AR Representation Now, suppose that 2, is nonstationary and can be represented as an AR(p) model 432 CHAPTER 17 More on Vector Time Series t such that I@,(B)~= 0 contains some unit roots, where @,(B) = I - a I B each component of Z, is 1(1), then, from (17.1.4), ( 1 - B ) Z , = O0 . .- BP. I f + 'P(B)a,. Hence, Multiplying (1 - 3)on both sides of (17.1.12) and comparing with (17.1.141, we have where we note that (1 implies that - B)Bo = 0. Because Equation (17.1.15) holds for any a,, however, it and for any B. Hence, Comparing (17.1.10) and (17.1.1 1) with (17.1.16) and (17.1.171, we see that in terms of an AR representation, the matrix @,(l) must belong to the space spanned by the rows of P'. That is, for some (m X k) matrix M. 
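The behavior shown in Figure 17.1 is easy to reproduce by simulating the two-dimensional AR(1) example: each component wanders like an integrated series while the linear combination beta'Z_t = phi Z1_t + Z2_t stays stationary. In the sketch below the sample size, the random seed, and the numeric value phi = 0.6 are my own illustrative choices (the text keeps phi symbolic).

```python
import numpy as np

rng = np.random.default_rng(123)
n, phi = 300, 0.6
Phi1 = np.array([[1.0, 0.0],
                 [-phi, 0.0]])                 # AR(1) coefficient matrix of the example process
Z = np.zeros((n, 2))
for t in range(1, n):
    Z[t] = Phi1 @ Z[t - 1] + rng.normal(size=2)

Y = phi * Z[:, 0] + Z[:, 1]                    # beta' Z_t with beta' = [phi, 1]
for x, name in [(Z[:, 0], "Z1"), (Z[:, 1], "Z2"), (Y, "beta'Z")]:
    print(f"{name:7s}  variance of levels {np.var(x):9.1f}   of first differences {np.var(np.diff(x)):6.2f}")
# The level variances of Z1 and Z2 grow with n, while beta'Z_t behaves like a stationary series.
```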
EXAMPLE 17.2 Although an integrated process cannot be represented in an AR form in terms of AZ,, its AR representation does exist in terms of 2, itself. Recall the two-dimensional cointegrated process and its AR presentation in terms of Z, given in (17.1.1): In this case, 17.1 Unit Roots and Cointegration in Vector Processes 433 and the cointegrating matrix P' = 14 11. Thus, Z, is a cointegrrtted process of rank k,= 1. Note that %(I) = 0 [$ 0 = MB' = MI4 11, where Error-Correction Representation Next, note that in the AR representation, the matrix polynomial Op(B) can be written as Qp(B) = I - QIB - - QPBP = (I - AB) - ( q B + . . + rup-l~p-') (1 - B), (17.1.19) + --+ where A = @, . @, and 9 = -(Qj+1 (17.1.12) can be rewritten as Z, = O0 Subtracting = + Qp) f o r j = 1,2, . . . , p - 1. Thus, + AZtil + ( Y ~ A Z +~ - ~ + L Y ~ - ~ A Z+~ a,,- ~ + ~ (17.1.20) a * from both sides of (17.1.20) gives AZf = e0 + yZ,-$ -t- alAZ,-l where y + A - I = --@,(I) = + - . . + ru,-lAZ,-,+I + a,, (17.1.21) -MP' by (17.1.18) and (17.1.19). Hence, for some fm X k) matrix M, where Y, - 1 = P'Z,-* is a (k X $)-dimensionalstationary process. Equations (17.1.21) and (17.1.22) imply that the differenced series AZ, of a cointegrated process 2, cannot be described using only the values of past lagged differences, AZj, for j < t. The process must include an "error-correction" term, MY,-l = MP'Z,-l. If we regard the relation AZ, in terms of its own past lagged values AZj for j < f as a long-run equilibrium, then the term Ytml = P'Zf-l can be viewed as an error from the equilibrium and the coefficient M is the required adjustment or correction for this error. Accordingly, 434 \ CHAPTER 1 7 More on Vector Time Series Equation (17.1.21) or (17.1.22) is known as the error-correction representation of the cointegrating system. The representation was first proposed by Davidson et al. (1978). EXAMPLE 17.3 For the cointegrated process given in (17.1.1), we have p = 1, O0 = 0, Qj = 0 for j 2 2, and hence aj= 0 for j 1 1. From Example 17.2, M= [i] and &-I = p'Z,-I = [4 11[ Z1, t-1 2 2 , r-1 ] = @l,t-l + Z2,,-1. Thus, the error-correction representation of (17.1.1) is given by where Yt-l - +Z1, + 5,t - l is the stationary error-correction term. 17.1.2 Decomposition of 2, Assume that nonstationarity of Zt is due to unit roots in the AR matrix polynomial. That is, in the AR representation Qp(B)Zt = 00 + at, (17.1,23) some of the raots of I @ , ( B ) / = 0 are equal to 1. More specifically, assume that there are h (5I ) unit roots and that all other roots of J@, ( B ] I = 0 are outside the unit circle. Thus, ]@p(l)[= 11 - Zf'=l@jl = 0; the matrix I&l@j has h eigenvdues equal to 1 and ( ~ n- h) other eigenvalues less than 1 in absolute value. Let P be an ( m X m) matrix such that P-l[Z&l@j]~= diag[Ih Ak]= J, where k = (m - h) and J is the Jordan canonical form of Ef= @j. Then, r P Let P = [S R] and (P-')' ate (nl X k) matrices so that - [H 1 PI, where S and H are (nt x h) matrices, and R and p 17.1 Unit Roots and Cointegration in Vector Processes 435 Recall that y = -@,(I) in (17.1.21),which can therefore also be written as AZ, = Q0 + C Y ~ ~ ' Z+, alAZt-l -~ + - . + ap-1AZ,-p+l + a,, (17.1.26) , where a = R(AA- 19 = -M. It follows that although 2, is nonstationary, the k (= fnt - h)) linear combinations BIZr are stationary, where k = rank(@,(l)) = rank(Pf). That is, 2, is cointegrated of rank k. Let Y1,, = HrZ,. Because P-' is nonsingular, the rows of H' are linearly independent of the rows of pi. Thus, Yl,, must be nonstationary. 
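The algebra leading from the levels AR representation to the error-correction form is purely mechanical, so it can be checked with a small routine. The sketch below (names mine) forms gamma = -(I - Phi_1 - ... - Phi_p) and the alpha_j coefficients from the level coefficient matrices and, for the bivariate example, verifies that gamma has rank one, i.e., that it factors as a column vector times the cointegrating vector beta' = [phi, 1].

```python
import numpy as np

def to_error_correction(phis):
    """Rewrite a levels VAR(p), Z_t = Phi_1 Z_{t-1} + ... + Phi_p Z_{t-p} + a_t, as
       dZ_t = gamma Z_{t-1} + alpha_1 dZ_{t-1} + ... + alpha_{p-1} dZ_{t-p+1} + a_t,
       with gamma = -(I - Phi_1 - ... - Phi_p) and alpha_j = -(Phi_{j+1} + ... + Phi_p)."""
    m = phis[0].shape[0]
    gamma = -(np.eye(m) - sum(phis))
    alphas = [-sum(phis[j:]) for j in range(1, len(phis))]
    return gamma, alphas

# Bivariate example with Phi_1 = [[1, 0], [-phi, 0]]; phi = 0.6 is an illustrative value.
phi = 0.6
gamma, _ = to_error_correction([np.array([[1.0, 0.0], [-phi, 0.0]])])
print(gamma)                              # a rank-one matrix proportional to a column times [phi, 1]
print(np.linalg.matrix_rank(gamma))       # -> 1, i.e., one cointegrating relation
```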
In fact, rank(Hr) = h, which also equals the number of unit roots. This h-dimensional Yl,process is actually the underlying common stochastic trend related to the k unit roots, which is aIso known as the common factors in the process of Z,. It is the common driving force for the nonstationary phenomenon in each component Zi, of 2,. Let Y2,, = PfZf.Then, , , and Hence, the cointegrated process 2, is a linear combination of the 11-dimensionalpurely nonstationary component, Y1,, = H'Z,, and the k-dimensional stationary component, Y2, = plZI, where k = (m - h). The concepts discussed in this section are closely related to the common factor analysis studied by Pena and Box (1987), Stock and Watson (1988), and many others. 17.1.3 Testing and Estimating Cointegration Following the above discussion, we see that the first step in testing cointegration is to test the null hypothesis of a unit root in each component series Zi individually using the unit root tests discussed in Chapter 9. If the hypothesis is not rejected, then the next step is to test cointegration among the components, i.e., to test whether Y, = P'Z, is stationary for Sometimes the choice of the matrix/vector p' is based on some some matrix or vector theoretical consideration. For example, if Z, = [Z1,, Z2,J', where Z1,, represents revenue and Z2,,represents expenditure, we may want to test whether the revenues and expenditures are in some long-run equilibrium relation and therefore whether Y, = Zl, - Z2, is stationary. In such a case, we choose pr = [l -11. Tn testing for cointegration in 2,with a known cointegrating vector P', we can fomulate the null hypothesis to test whether the process Y, = PfZ, contains a unit root so that we can again use the tests discussed in Chapter 9. We will conclude that Zt is cointegrated if the null hypothesis is rejected. When the cointegrating vector is unknown, we can use the following methods to test and estimate the cointegration. , a'. , , 436 CHAPTER 17 More on Vector Erne Series The Regression Method First, note that if the m-dimensional vector process Z', = [ZI,,, Zz,,, . . . ,Z,,,] is cointegrated, then there is a nonzero 111 X 1 vector p' = [el, ~ 2 ., . . , c,] such that P'Z, is stationary. With no loss of generality, say cl # 0. Then, (l/cl)P1 is also a cointegrating vector. Thus, a very natural approach that was also suggested by Engle and Granger (1987) to test and estimate cointegration is to consider the regression model for Zl, , and check whether the error series E, is I(1) or IfO). If E, is I(1) and contains a unit root, then Zt cannot be cointegrated. Othenvise, if E, is I(O), then Z, is cointegrated with a normalizing cointegrating vector given by P' = [ l , . . . , The results remain the same for the above model with a constant term included. There are many subtle points for the regression model in (17.1.29). In drawing inferences using the OLS estimated model, the following should be noted: .I,$ e series for nonstationarity, we calculate the OLS estimate, 1. in t e ~ t i n g ~ t herr2r P' = [ 1, 41, ,#rnJ and use the residual series i, for the test. 2. The estimates 4i do not have asymptotic t-distributions, and the standard t-test for #i = O is not applicable unless E, is I(0). As indicated in Chapter 15, the regression in (17.1-29)will produce spurious regression if E, is non~tationary.~ 3. If 6, is I@) and Z, is cointegrated, then we can let Y,-l = P'Z,-l in (17.1.22) and perform the estimation of the error-correction model. 4. 
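The split of Z_t into a nonstationary common-trend part Y1_t = H'Z_t and a stationary part Y2_t = beta'Z_t can be illustrated numerically from the eigenstructure of Phi_1 + ... + Phi_p. The sketch below is only a rough illustration: it assumes the matrix is diagonalizable with real eigenvalues and classifies a root as a unit root by a numerical tolerance, whereas for an estimated model one would rely on the formal tests of Section 17.1.3. All names and the tolerance are my own.

```python
import numpy as np

def common_trend_decomposition(phis, tol=1e-6):
    """Rows of P^{-1} with |eigenvalue| ~ 1 give H' (common stochastic trends);
    the remaining rows give beta' (the cointegration space)."""
    A = sum(phis)
    eigval, P = np.linalg.eig(A)
    Pinv = np.linalg.inv(P)                      # row i of P^{-1} is a left eigenvector of A
    unit = np.isclose(np.abs(eigval), 1.0, atol=tol)
    return np.real(Pinv[unit]), np.real(Pinv[~unit])

phi = 0.6                                        # illustrative value, as before
H_T, beta_T = common_trend_decomposition([np.array([[1.0, 0.0], [-phi, 0.0]])])
print("H'   =", H_T)                             # one unit root -> the random walk component is the trend
print("beta'=", beta_T)                          # proportional to [phi, 1]
```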
We estimate the cointegrating vector by normalizing the first element to be unity. Clearly, we can normalize any nonzero element ci and regress 4 , on other variables in estimating the regression. Although most of the time the results are consistent, inconsistent conclusions could sometimes occur, which is a weakness of the approach. Because of its simplicity, however, the approach is still very commonly used. . .: To test whether the error series st is I(l), Engle and Granger (1987) proposed several test statistics, including the standard Dickey-Fuller test and the augmented Dickey-Fuller test. For these later tests, recall from Equation (9.4.22), in terms of the error term process in (17,1,29), that to test for a unit root in the general case we either test Ho : p = 1 against HI : q < 1 for the model or, equivalently, test Ho : h = 0 against HI : h < 0 for the model In testing, note that we use the residuafs i,obtained from the OLS fitting of (17.1.29), but they are only estimates and not the actual error terns. To adjust for the difference, instead of using 17.1 Unit Roots and Cointegration in Vector Processes TmLE 17.1 Critical values of T, for Ho : A Level of significance: =0 against HI : h 1% 437 < 0. 5% the standard Dickey-Fuller tables, Engle and Granger (1987) conducted a Monte Carlo experiment on the model * and produced the critical values given in Table f7,l for the t-statistic, T = h/S$., One should reject the null hypothesis if the T-value is less than the critical value and conclude that cointegration exists among the variables in the vector. EXAMPLE 17.4 From the discussion of this section, it is clear that in modeling vector time series, one should not automatically difference the component series even if they are each nonstationary. In fact, in terms of a vector AR model, the results of this section show that if the component series are cointegrated, then a stationary vector AR model for the differenced series does not exist. One should either construct a nonstationary AR vector model for the original series or construct an error-correction mode1 for the differenced series. Let us now further examine the Lydia Pinkham data discussed in Chapter 16 and test whether the advertising and sales series are cointegrated. Let Zl, be the advertising series and Z 2 , , be the sales series. To test whether they are each I(d), first, we obtain the following separate OLS estimated regression for each series: , where the values in the parentheses under estimates are their standard errors. Thus, T = (-8251 - f)/.07965 = -2.196 and T = (.9222 - 1)/.05079 = - 1.532, respectively, At 5%, the critical value from Table G of the appendix is -2.93. Thus, the hypothesis of a unit root is not rejected for either series, and both ZIT,and &, are I(&, with d I1, To find the order of d for the two series, we now consider the following OLS estimated regressions on their differences: , 438 CHAPTER 1 7 More on Vector Time Series where the values in the parentheses under estimates are their standard errors. Now, T = (.(I253 - 1)/.1409 = -6.918, and T = (.429 - I)/. 1278 = -4.468, respectively,At 5%, the critical value kom Table G is -2.93. Thus, the hypothesis of a unit root is rejected for either differenced series, and we conclude that both ZIP,and &, are I(1). 
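The regression approach just described amounts to two ordinary least squares fits. The sketch below (names mine) regresses one component on the other, then runs the Dickey-Fuller style regression of the differenced residuals on their lagged level; the resulting T statistic must be compared with the Engle-Granger critical values of Table 17.1, not with ordinary t tables, and the augmented version of the test simply adds lagged differences of the residual to the second regression.

```python
import numpy as np

def engle_granger_step(z1, z2):
    """Step 1: OLS of z1 on a constant and z2.  Step 2: regress d(resid) on resid_{t-1}
    without a constant and return the T statistic for the unit-root hypothesis."""
    n = len(z1)
    X = np.column_stack([np.ones(n), z2])
    b, *_ = np.linalg.lstsq(X, z1, rcond=None)
    e = z1 - X @ b                               # estimated equilibrium error
    y, x = np.diff(e), e[:-1]
    lam = (x @ y) / (x @ x)                      # OLS slope, no constant
    resid = y - lam * x
    se = np.sqrt((resid @ resid) / (len(y) - 1) / (x @ x))
    return b, lam / se                           # cointegrating regression coefficients and T

# usage with the simulated cointegrated pair from the earlier sketch:
# coefs, T = engle_granger_step(Z[:, 0], Z[:, 1]); print(coefs, T)
```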
Next, we consider the regression model , The OLS estimated equation is with the estimated regression of (17.1,32) for p = I, From Table 17.1, because the value of T = -.421/.1177 = -3.58 is less than the critical value -3.37 at the 5% level, we reject the hypothesis of a unit root. Thus, we conclude that the residua1 series is I(0) and the advertising and sales are cointegrated. It will be interesting to see whether the same conclusion will be reached when sales is regressed on advertising. In this case, we consider the regression model The OLS estimated equation is with the following estimated regression of A$ on From Table 17.1, because the value of T = -.3005/.0972 = -3.09 is not less than the critical values -3.37 at the 5% level, we cannot reject the hypothesis of a unit root. Thus, the residual series is not I(O), and the advertising and sales series are not cointegrated, which is different from the earlier conclusion. It should be noted, however, at 1% significance level, the critical value is -4.07 from Table 17.1. Both cases become insignificant, and we reach the same conclusion that the advertising and sales series are not cointegrated. This may be expected. For most companies, especially growth companies, we expect their revenues after expenditures to grow. Unit Roots and Cointegration in Vector Processes 17.1 439 The Likelihood Ratio Test Let Z, be a nonstationary (m x I) vector process that can be represented as a vector AR(p.)form iPp(B)Z, = where @,(B) = I - a I B written as - - Q0 + a,, (17.1.33) iPpBP. When 2,is cointegrated, by (17.1.21), it can be If in (17.1.33) lap(B)I = 5 contains li unit roots and k = (rt~- h) roots outside the unit circle, then, from the discussion of Sections 17,l.l and 17.1.2, it is equivalent to stating that the process Z, contains k cointegrating relations. Under this null hypothesis Ho, we have for some (m X k) matrix a , and Y, = PIZt is a (k X 1)-dimensional stationary process. Thus, only k linear combinations, p'Zt-l, of Ztw1,which are stationary, will actually appear in (17.1.34). If the vector white noise process at is Gaussian, i.e., if the a, are i.i.d. N(O, X), then given a sample of (11 p) observations on Z,, i.e., (Z-p+l, Z-p+2, . . . ,Z1, , . ,Zn), the likelihood function of {Z1, Z2,, . . ,Z,,] conditional on {Z-p+l, Z-p+2,, , , &) is + . . The MLE of (00, y, a*, . . . , ap-l, Z) will be chosen so that the likelihood function is maximized subject to the constraint that y can be written in the form of (17.1.35) or, equivalently, subject to the reduced-rank restriction on y so that rank(?) = k. To test the null hypothesis, one can use the following likelihood ratio Under the null hypothesis, some linear combinations of Z, components are nonstationary, so the asymptotic distribution of -2 In A is no longer a chi-square distribution. This particular distribution was studied by Johansen (1988,1991) and Reinsel and Ahn (1992). They show that 440 CHAPTER 1 7 More on Vector Time Series ... . where j ? ~ ~ + ~ ,, and 1;,(i12 2 fizz 2 . . 2 Cm2) are the (nl - k) = h smallest sample partial canonical correlations between the components of AZ,and Z,-l given . . . , AZt-p+l. Furthermore, when (17.1.34) does not contain a constant k m , where W(x) is a h-dimensional standard Brownian motion process, which depends only on h and not on the orderp of the AR model. The asymptotic distribution of -2 In A when the model contains a constant term can be derived similarly, and we refer readers to Reinsel and Ahn (1992) for details. 
We reject the null hypothesis when A is small or, equivalently, when -2 In A is large. The approximate critical values of the asymptotic distribution of -2 In A for the likelihood ratio test of Ho:rank(y) 5 k against a general alternative, for both cases without and with a constant term, based on Monte Carlo simulations have been obtained by Johansen (1988) and Reinsel and Ahn (1992). Some commonly used values selected from Reinsel and Ahn are shown in Table 17.2. The estimate of the cointegrating matrix can be obtained from the estimation of the error-correction model in (17.1.34). Note that from the discussion in Section 17.1.2, it is clear that testing for k cointegratjng relations is equivalent to testing for h = (nt - k) unit roots in the process. When h = 1, the asymptotic distribution in (17.1.39) reduces to TABLE 17.2 Critical values for the likelihood ratio test statistic under the nu11 hypothesis of rankfy) = k = [trt - h), against the alternative of rank fy) > k, where h is the number of unit roots. . -.. Case 1: Without a Constant Case 2: With a Constant h Probability of a Smaller Value .90 .95 .99 17.1 Unit Roots and Cointegration in Vector Processes 441 TABLE 17.3 Critical values for the likelihood ratio test statistic under the nu11 hypothesis of rankcy) = k = (nt - h) against the alternative of rankCy) = k + f , where h is the number of unit roots. h Case I: Without a Constant Probability of a Smaller Value .9 .95 .99 Case 2: With a Constant which can be seen to be the square of the asymptotic distribution of T statistic given in (9.3.7) that we have used to test the unit root for a univariate time series model. We may also consider the likelihood ratio test of the null hypothesis of k cointegrating relations against the alternative of (k 1) cointegrating relations. The test statistic has been derived by Johansen (1991) and is given by + + where jk+l is the (k 1)' largest sample partial canonical correlation between the components of AZ, and Z,-r given AZ,-I, . . . , AZt-pf The critical values of the asymptotic distribution for the test statistic have been obtained through simulations by Johansen and Juselius (1990), and some commonly used values are given in Table 17.3. EXAMPLE 17,5 For the Lydia Pinkham data, we have shown in Example 17.4 that the advertising and sales series are each I(1). To illustrate the use of the likelihood ratio test, following (17.1.34) and the fitted model (16.7.1) discussed in Section 16.7, we now consider the error-correction model The squared partial canonical comlations between AZt and Z,-I given and AZti2 are obtained as if= .2466 and = .1166. Because nt = 2,the rank (y) to be tested will be either 0 or 1.We begin with testing tbe hypothesis that rank (y) = k = 0 or equivalently, h = 2. Now, ;z and m3.2. Thus. .2 Partial Process and Partial Process Correlation Matrices In Section 16.3 for h = 1 comes from different simulations. the value of the test statistic 21-99 is less than the critical value 22. the testing stops. are square matrices satisfying the square summability condition.+I.q l B .. Z. .2. after removing the linear dependence of each on the vectors at intervening lags Z._l.2.16 from Table 17. In this section. we consider the correlation (expressed as a matrix of dimension m .. In testing rank ( y ) = k = 1. and Z. and we conclude that advertising and sales series are not cointegrated. we reject the null hypothesis and conclude that the series of advertising and sales are possibly cointegrated. 
we test the null hypothesis that rank (y) = k = 1 or equivalently. 17. The 5% critical value is 8. we discussed the partial lag correlation matrix at lag s as the correlation (expressed as an rn X m matrix) between the vectors Z...2.1 Covariance Matrix Generating Function Consider a zero mean covariance stationary vector time series 2.08. That is.of dimension nt having moving average representation . Heyse and Wei (1984) call this the partial process correlation matrix at lag s. . Next. + .97 from Table 17. The hypothesis is not rejected.. we can also use h = 1 in Table 17.t. Z. If 1% level of significance is used in testing the hypothesis that rank(y) = k = 0 or equivalently.79 from Table 17.3.)' and (Zi. h = 2...2 and Table 17.is due to an intrinsic reIationship of each wjth Z3. q s . ) I after removing the linear dependence of each on the third component series Zj. . .Zh. respectively. 25.3. .. Thus. In this case.5. such that the coefficients. Thus.. . ... the regression method and the likelihood ratio test reach the same conclusion.+.)' of dimensions ml. we don't reject the null hypothesis and conclude that the series of advertising and sales are cointegrated. in which Z. because In = 2.and &. at 1% level of significance. The minor difference between Tables 17.. h = 1. is partitioned into three subvectors.* 2 ~ 2 operator B. which is less than the 5% critical value 8.. Zi..442 CHAPTER 17 More on Vector %me Series which is larger than the 5% critical value 17. ) is an m X m matrix polynomial in the backshift where q(B) = . These correlations are useful in determining whether an apparent linear relationship between Z1.1 4 ) between the subvectors (Zi. 17. we introduce another form of partial correlation proposed by Heyse and Wei (1984). Z5. = (Zi?..m2. . is suitably partitioned into 3 subvectors. it can be shown that where F = 13-'. m2.2) for this particular partition of Z.. the moving average representation for Z. 25. (17. and t. can be written as . is defined as Extending a result in Box. Jenkins... p.2 Partial Covariance Matrix Generating Function . The lag s covariance matrix can be similarly partitioned as where r i i ( s ) = E[Zi. . Our interest is in the correlation matrix at lag s for the process [Zi.2.. Zj.is the coefficient of BS in (17.j = 1. Z$.17. . + where the components q!tii(B) are matrix polynomials in B with 1Clii(0)= I.note that is the autocovariance generating function for the ith component process of 2.2. and m3 with ml m2 1113 = ni.1).. = (Zi. ' h e variance of the process Zi yii(0). Suppose that Z. for i # j. . . . Partitioning conformably the *(B) and a. yU(B) = y i i ( ~ ) i~ =s .Ip. . can be written as + . .. The covariance matrix generating function (17. m. . .2 Partial Process and Partial Process Correlation Matrices 443 The covariance matrix generating function for the process Z...)'.].bij(0) = Ornix.. Similarly. Z$. Writing GZ(B)= [ y @ ( ~ we ) ] .2. . yii(s). and i # j.+.is the coefficient of BO = 1.. and Reinsel (1994. 2.. 85)..:x 17.4). of dimensions nt. after removing the finear dependence on the process Z3. 1... is the cross-covariance generating function between the ith and jth components of 2.rn.2. and the autocovariance at lag s. . +. . Z 5.Zi!)' and (Z . respectively. and i2.1trespectively..444 CHAPTER 17 More on Vector Time Series where Note that because rij(s)= rji(-s). and consider the correlation matrix function for the residual process . 541) and setting the resulting derivative equal to 0 leads to the normal equations . . 
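Once the squared partial canonical correlations between the differenced series and the lagged levels have been obtained, the likelihood ratio statistics themselves are a one-line calculation. The sketch below (names mine) reproduces the arithmetic of Example 17.5 from the correlations quoted there; the statistic for testing rank(gamma) <= k uses the m - k smallest squared correlations, and the resulting values (about 22.0 and 6.7 here) are then compared with the simulated critical values in Tables 17.2 and 17.3.

```python
import numpy as np

def cointegration_lr_statistics(rho_sq, n):
    """-2 ln Lambda = -n * sum_{i > k} ln(1 - rho_sq_i) for H0: rank(gamma) <= k,
    where rho_sq are the squared partial canonical correlations, sorted in descending order."""
    rho_sq = np.sort(np.asarray(rho_sq))[::-1]
    return {k: -n * np.sum(np.log(1.0 - rho_sq[k:])) for k in range(len(rho_sq))}

# Example 17.5: squared partial canonical correlations .2466 and .1166, n = 54 annual observations
print(cointegration_lr_statistics([0.2466, 0.1166], 54))
# k = 0 (two unit roots) gives roughly 22.0; k = 1 (one unit root) gives roughly 6.7
```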
p...2.7) with respect to acj (Graham [1981. let and where the ai and Pi are chosen to minimize.)I after removing the linear dependence of each on the series &.. . we let Zl.+. and Z. and Taking the partial derivative of (17.t be the linear projections of Zl.?..on Z3. That is. the components of GZ(B)are related as To find the correlation between (Zi. 2 Partial Process and Partial Process Correlation Matrices 445 Multiplying the jth equation in (17.9) by Bj and summing over j gives Hence.@(B)Z3. [rll( s )BS .r t 3 ( s )BSa'(B) - . from which ive have the solution * The covariance matrix at lag s for the residual process ( Z 1 . .e.2.10) by BS and summing overs...Z is equal to The covariance matrix generating function for the residual process can be determined by multiplying (17. i. E[1&.2.17. 12] (Y ( B ) r 3 * ( s BS ) + ~ r ( B ) r ~ B~ S( ~s )' ( B ) ] is minimized by choosing The covariance matrix at lag s for the residual process ( Z 2 .. t) is equal to . . = s=-cn Similarly.2 2 . ZGG3.446 CHAPTER f 7 More on Vector Time Series from which the covariance matrix generating function can be determined as Also. t . at fag s. and Note that from (17.11) to (17. Z5. (B). ~ Specifically.3. . t)f If we let be the covariance matrix for the residual process (z11.3. and Thus.3. ~ (iss ) obtained directly as the coefficient of BS in G ~ ~ .t .2.2.3(0)is the coefficient of BQ in (17. I'f2. then r f ~ .15) and represents the variance-covariance matrix for the residual process.2.14) we have the covariance matrix generating function for the residual process (Z1. ~ ( sfor ) .5. . E a sample of the partiat process j = [ ~ i .)' component of Z. is the correlation matrix function for the patial process. ~could ( s ) be estimated directly using the procedures discussed in Section 16. ~ ..2. .. t .. one intuitively reasonable estimation procedure would be first to compute the residuds &om the regression equations z .2.3(0).. then clealy ~ f 2 . The partial process correlation matrix at lag s.15) ) the partial process covarinent series Z3*has been removed.P.17. ~]' .n . because it represents the remaining stochastic process in Zl.. Z5.i . P f 2 ... can be estimated by the sample correlation matrix at lag s for the series where t = Q + 1. . . ..nl m2)-dimensional (Zi.. ~~ . .. They call G ~ * . Zb.3 Partial Process Sample Correlation Matrix Functions + . the (.3.)'.2 Partial Process and Partial Process Correlation Matrices 447 Let D be the diagonal matrix.16) the partial process correlation matrix function for the partial process of (Zi. Q + 2.&.6) the partial process of (Zi. . Heyse and Wei (1984) call the residual process (Zi. 17.. with its ith diagonal element being the square root of the ith diagonal element of rr2. Thus. ~ in ance matrix generating function and &(s) in (17.2. and. Q2) and P = max(P1. were observable. Zh. with Q = max(Ql.. Z5. allowing for 5.3.That is. 3 ( ~ ) . using multivariate least squares estimates for Gk and where rr is the number of obsexved values of 2.1. The partial process conelation matrix function.. P ~ * . allowing for Z3. P2).)'.)' in (17. and il.2.then we can define the correlation matrix at lag s for the residual process as . after the linear relationship of each with the third stochastic compo( S(17. after removing the linear dependence on Z3.. .2.2. 3 ( ~ do ) not change dramatically.P I .This problem is common in building statistical models. . 3 ( ~ ) .Q ) is the effective sample size.k = .17) and (17. . Hog Data The classical U. 
It is apparent that the choice of the P's and Q's involves a trade-off. C.: Hog prices in dollars per head on January 1. . .2. and Z2.4 An Empirical Example: The U. .I). it is generally advisable to choose several values for the limits to be sure that the resulting estimates of P f 2 .(s).and Q2. . Q. Nonetheless. . . Let f. 453) shows that the jk(s) differ from the yh(s) by quantities that are of the order (no . Zt is the vector of means for kf. then some effects of the third series will remain in fif2. Q1. which we recall was computed from the series of estimated residuals ~ f ( s in ) (17. Note also that because the effective sample size no uses max(P1. and I = -P2.17)and (17. P2. .: Corn prices in dollars per bushel on December I. .2. The five series are HI:Hog numbers recorded on January 1in the Census of Agriculture.Q2. . and we don't seek to solve it here..17) and (17. 17. (nrl nt2).P . Hannan (1970.Q1.). on 23. j)th element of yiz3(s) Also denote the (i. In practice. Z$.3(s)be the partial process sample covariance matrix at lag s for the unobservable partial process z:. P2.3(s). . ~ by ( s ?). j)th element of ~ J z .S. P. one can choose both sets of limits at the maximum in (17..18).If they are too large. which implies that -$$(s) has the same asymptotic distribution as yi(s). -(PI . and j = 1.y h ( s ) ] converges in probability to zero. One problem with this procedure for computing f i f 2 . P2) and max(Q1. In other words.2. .: Corn supply in bushels produced during the year. for i = 1. diagnostic aids that examine the statistical significance of the parameters are helpful. let yL(s) be the (i. for large tt and moderate choices for PI. If they are too small. no + + 6 .18). for the purpose of estimating the remaining correlation structure in ( Z i . our interest is in removing the linear dependence of Z.448 CWPTER 1 7 More on Vector Time Series where " (n .2. Q1. then the effective sample size 110 will be too small. .s)ll and that ~ [ ? J ( S ) . p. . .Instead. elements of the partial process sample correlation matrix obtained by computing the sample correlation matrix for the estimated residuals (17.and Q2 for computing the residuals in (17. Qz).2. we must remind ourselves that our interest is not in estimating the linear regression coefficients ak. For convenience. . \VI:Farm wage rate.S. hog data consist of five series of 82 annual observations for the years 1867 to 1948.18)can be interpreted in the usual way.17) and (17.]2.A(~trl n13. In dealing with these choices. -(P2 .2.and b is the diagonal rn2trix in which the kth diagonal element js the square root of the corresponding element of I ' ] 2 . the data are listed in the appendix as Series W13..2.18) without reducing ?la. 3 ( ~ )is with the choices of P I .I). These cross-correlationsare persistently high and provide very little infomation regarding the relationship between hog numbers and prices. and 23. Our attention focuses on the relationship between hog numbers Ht and hog prices P.22 and ..3..22. f 1.for hog numbers Ht and hog prices P. who fit the data to a five-dimensional vector autoregressive moving average model. I I i I A description of the data appears in Quenouille (1957). indicates cross-correlations between -. and Tiao and Tsay (1983). + indicates cross-corretations greater than . Z2. The data were logged and linearly coded by Quenouifle.22. iI2(s).f 12. at lags.4 Estimated cross-correlation for hog numbers H. 
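As a minimal illustration of how a cross-correlation summary such as Table 17.4 is produced, the following sketch computes the sample cross-correlations ρ̂12(s) between two series for s = -8, ..., 8 and flags values that exceed two approximate standard errors, 2/√n. The series here are simulated placeholders; the actual hog data are Series W13 in the appendix.

```python
# A minimal sketch (simulated placeholder series, not the hog data) of the sample
# cross-correlation table: rho_12(s) with "+", "-", "." indicators marking values
# beyond two approximate standard errors (2 / sqrt(n)).
import numpy as np

def cross_correlation(x, y, max_lag):
    """rho_xy(s) = Corr(x_t, y_{t+s}) for s = -max_lag, ..., max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = len(x)
    denom = n * np.std(x) * np.std(y)
    out = {}
    for s in range(-max_lag, max_lag + 1):
        if s >= 0:
            out[s] = np.sum(x[: n - s] * y[s:]) / denom
        else:
            out[s] = np.sum(x[-s:] * y[: n + s]) / denom
    return out

rng = np.random.default_rng(0)
n = 82                                      # length of the annual series
hog_numbers = np.cumsum(rng.normal(size=n))  # placeholder series
hog_prices = np.cumsum(rng.normal(size=n))   # placeholder series

ccf = cross_correlation(hog_numbers, hog_prices, max_lag=8)
bound = 2.0 / np.sqrt(n)                    # two approximate standard errors
for s in sorted(ccf):
    flag = "+" if ccf[s] > bound else "-" if ccf[s] < -bound else "."
    print(f"lag {s:+d}: {ccf[s]:+.2f} {flag}")
```

With two integrated (nearly nonstationary) placeholder series, most of the printed correlations exceed the bound, which mirrors the persistently high and uninformative cross-correlations noted for the hog data.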
2) element of the sample correlation matrix 6 (s) for the five series.In terns of our development thus far.+.are shown in Table 17.4 for lags s = 0. and hog prices P. . . WJt. Lag s $12 (4 Indicator Lag s i 1 2 (s) Indicator Note: 2 x estimated standard error = . we first want to compute the residuals and . . .17. The estimated cross-correlations.+.2 Partial Process and Partial Process Correlation Matrices 449 TMLE 17. CI. except possibly that the series are nearly nonstationq or are commonly related to a third series. = P. Box and Tiao (1977). Following our definition of the partial process and the development in Section 17.22. we are considering the case where Zf.2. = Ht.. and this form of the data was used for the present analysis. These values were taken from the (1.t = (Qt. . .2. .20) and (17. and wage rates. and hokprices P.31 indicates that hog numbers Hi and hog prices P.3(~) Indicator Note: 2 X estimated standard error . i. however.e. . +. and j12.23 and .3(0) = -.3 Tndicator Lag s $f2. + + r .indicates cross-correlations less than . and jj]2.5 indicates that hog numbers and prices were related in the period under study beyond their common dependence on corn prices. ' .5 for the choice L = 3 along with the -. $12.5 Estimated partial process cross-correlations for hog numbers H.3(-1) = -23 indicates that hog prices P.3(0). the three estimates are shown for each of the choices L = 0. corn supply.3(1). at time t were negatively correlated contemporaneousfy. symbols indicating whether correlation coefficients exceed two times their estimated standard errors (1/75)'12. and wage rates. b12(0).450 C W E R 17 More on Vector rime Series TABLE 17.3(s) are larger than twice their estimated standard errors.30 is not clear and may be due to random variation. . .23. The value L =.. .. The partial process sample cor_elajion matrix function 6L2. To observe the effect of increasing the limits in the regressions (17. Lag s fif2.3(-1) because the meaning of jf2. b12. In Table 17. fifi.2. at time t and hog prices at time t 1 were negatively correlated. and 612. Next.23.the sample cross-correlations jj12(. f ) for each of these choices and was used to estimate the cross-correlation between hog numbers and prices after removing the linear dependence on corn price. a indicates cross-correlations between -.23.51 shows that hog numbers H. 8.S(l) = -.3(. at lag s.3(~)in Table 17. 1. A using ordinary least squares estimates for the Gik and Pn.= . . we examine the effect of increasing the h i t s of the re essions (17. null represents the null case. j312. -k indicates cross-correlations greater than . Only four values of bf2. and b12(1).2. Here..I). 1.. 8.21) on the three partial process sample cross-conrelations 5 2. Interpretation of ij12. we used the symmetric limits L = P I = Ql = P2 = Q2 = 0. corn supply.20) and (17. These values are shown in Table 17.2.3(-~1) = -.3(s) was then computed from the estimated residual series ( ~ f~ . and of these we consider only j?#f2. at time t were positively correlated with hog numbers HttI at time t 1.3(~).6.1).21).3(f)..23. respectively. This phenomenon becomes apparent for this data.2(a).#B). For example. The results of this analysis indicate that the effect of increasing the limit L on the l) although ij[2. is a white noise series with mean zero and constant variance u2. + + 17.indicate partial process cross-correlations which exceed t w o times their estimated standard errors.+. ~is(minimal.21). closer to zero as the limit increases.3(0). 
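The partial process cross-correlations reported in Table 17.5 come from cross-correlating regression residuals, as in the regressions (17.2.20) and (17.2.21). The sketch below uses simulated placeholder data in place of Series W13, symmetric limits L = P = Q = 3, and effective sample size n0 = n - 2L; only the assumed coefficients and names are illustrative.

```python
# A minimal sketch of the partial process estimation: regress two series on lags and
# leads (+/- L) of a third series, keep the OLS residuals, and compute their sample
# cross-correlations.  All data below are simulated placeholders.
import numpy as np

def residuals_after_regression(y, z, L):
    """OLS residuals of y_t on a constant and z_{t-L}, ..., z_{t+L}, for t = L, ..., n-L-1."""
    n = len(y)
    t = np.arange(L, n - L)
    X = np.column_stack([np.ones(len(t))] + [z[t + k] for k in range(-L, L + 1)])
    coef, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    return y[t] - X @ coef

rng = np.random.default_rng(1)
n, L = 82, 3
corn_price = np.cumsum(rng.normal(size=n))              # placeholder "third" series
hog_numbers = 0.6 * corn_price + rng.normal(size=n)     # placeholder series
hog_prices = -0.4 * corn_price + rng.normal(size=n)     # placeholder series

e1 = residuals_after_regression(hog_numbers, corn_price, L)
e2 = residuals_after_regression(hog_prices, corn_price, L)

n0 = len(e1)                       # effective sample size n - 2L
bound = 2.0 / np.sqrt(n0)
for s in (-1, 0, 1):
    if s >= 0:
        r = np.corrcoef(e1[: n0 - s], e2[s:])[0, 1]
    else:
        r = np.corrcoef(e1[-s:], e2[: n0 + s])[0, 1]
    note = "(exceeds 2 s.e.)" if abs(r) > bound else ""
    print(f"partial cross-correlation at lag {s:+d}: {r:+.2f} {note}")
```

Because both placeholder series are driven only by the common third series, their partial cross-correlations are near zero even though their ordinary cross-correlations are large, which is precisely the distinction the partial process analysis is designed to reveal.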
The estimates ρ̂12.3(-1), ρ̂12.3(0), and ρ̂12.3(1) are plotted against the limit L in Figure 17.2(a), (b), and (c). The effect of increasing the limit L on ρ̂12.3(-1) and ρ̂12.3(1) is minimal, although ρ̂12.3(0) does diminish toward zero as the limit increases. Hence, our interpretation of the contemporaneous cross-correlation between hog numbers H_t and hog prices P_t from Table 17.5 may be questionable. The additional evidence enhances our conclusion that hog numbers H_t and hog prices P_t at time t were negatively correlated, and that hog prices P_t at time t and hog numbers H_{t+1} at time t + 1 were positively correlated, for the period under study beyond their common dependence on corn prices. This result corresponds to simple economic theories relating supply and prices, and it is revealed only through the study of the partial process and the partial process correlation function.

TABLE 17.6 Estimated partial process cross-correlations ρ̂12.3(j) for increasing values of the limits P = Q in (17.2.20) and (17.2.21); the limits considered are Null, 0, 1, 2, 3, 4, 5, 6, 7, and 8, where Null indicates the ordinary sample cross-correlations.

FIGURE 17.2 ρ̂12.3(j) for increasing values of the limits in the regressions (17.2.20) and (17.2.21).

17.3 Equivalent Representations of a Vector ARMA Model

An ARMA model can often be represented by different forms. For example, consider the zero mean first-order univariate autoregressive (AR(1)) model

(1 - φB)Z_t = a_t,

where φ(B) = (1 - φB), B is the backshift operator such that BZ_t = Z_{t-1}, and a_t is a white noise series with mean zero and constant variance σ². If the zero of the polynomial φ(B) lies outside the unit circle, then the model can also be written as the infinite moving average (MA) form

Z_t = ψ(B)a_t, where ψ(B) = [φ(B)]⁻¹ = (1 + φB + φ²B² + ⋯).

Similarly, with no loss of generality, the first-order univariate moving average (MA(1)) model

Z_t = (1 - θB)a_t,

with the zero of (1 - θB) outside the unit circle, can alternatively be expressed as the infinite autoregressive (AR) form π(B)Z_t = a_t, where π(B) = (1 - θB)⁻¹ = (1 + θB + θ²B² + ⋯). Because a vector time series is a generalization of a univariate time series, it is intuitively reasonable to expect that the above dual representations also hold for a vector process. In other words,
we expect that a finite-order vector AR process wilt correspond to an infinite-order vector MA process and that a finite-order vector MA process will correspond to an infinite-order vector AR process.. = [Z1. = [al. The generalization of (17.I' 17.3.a. and a finite-order MA process corresponds to an infinite-order AR process. . In Section 17. when all the eigenvalues of @ lie inside the unit circle. however. a finite-order stationary AR process can be represented by an infinite-order MA form.*.3. suppose that Z.. . is an m X 1 vector white noise process with mean vector 0 and positive definite variance-covariance matrix 2. a finiteorder vector MA model. These alternative representations are useful.its zero outside the unit circle.3. a finite-order AR process corresponds to an infinite-order MA process. we discuss some of the implications of these special types of vector time series models.2. for example. where @(B) = (I . Q =.4) indeed contain an infinite number of terms as long as the parameters 4 and 0 are not zero. That is.3 Equivalent Representations of a Vector ARMA Mode1 453 with O(B) = (1 . can alternatively be expressed as the infinite autoregressive (AR) form rr(B)Z. it is known from Section 16. We also establish the maximum order of these representations.4) ). @(B) = (I . For the example given in (17. We note that in the vector AR(1) model given in (17. = [ I . To see why that happens.3. the inverse of a nondegenerate AR(1) matrix polynomial (i. (e) The matrix @ is nilpotent.@BI = 1.3. any vector AR(1) model with a nilpotent coefficient matrix can also be written as a finite-order vector MA model. 21. Zy='=r+ii=O.7).6) always an infinite order? To answer the question.36) represents a vector MA(2) model. ik) is the matrix of with its il. there exists an integer k such that Gkf 0.7) with Because Q2 # 0 and @ j = 0 for j > 2.OBI . Now.l@(i)l = O . To understand this special feature of a vector ARMA model. .rP~lisindependent of B. where @(il.e. The order of the resulting matrix polynomial in B of its adjoint matrix clearly will be finite. . we have the following results (more detailed proofs can be found in Shen and Wei [19951): 1. . we investigate in this section conditions for these seemingly different but equivalent representations. In fact. @(B) # I) will be of a finite order if the determinant \@<B)\is independent of B. the follotving conditions are equivalent: (a) The vector AR(1) process can be expressed as a finite-order vector MA(q) model. with B = 0. Given the m-dimensional vector AR(1) model defined in (17.@BI = 1. the form in (17.Q B ] a. Is the MA form in (17.... (d) The eigenvalues of @ are all 0.5).3."=.iq. i..454 CHAPTER 1 7 More on Vector Time Series where Go = I.. Thus.. .3.(PB)Z. . . let us consider the three-dimensional vector AR(1) model taken from Newbold (19821. From the above observations. In fact. 11 . . and (Pu denotes the ith diagonal element of a. let I A ~ denote the determinant of a matrix A and adj(A) denote the adjoint matrix of A. (17.. and @ j = 0 forj > k. = 1 adj [I . (I .j)l = O . (b) The determinantal polynomial 11 .5).@B) is a matrix polynomial in B of order 1. 2. Z.. (c) I@[ = 0 .e. ik columns and rows removed. Espedally. we note that the determinant 11 . we have11 .3. . .@B]-Ia. = a.[@(i. OBI is independent of 3. respectively. k) model withpl r p and k .@B) is an m X m matrix with its elements being polynomiafs in B of order 1. On the other hand.1) pl q] + .e l B . (I . 
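These conditions are easy to check numerically. The sketch below uses illustrative coefficient matrices (not the ones from the example in the text): it computes the MA weights Ψ_j = Φ^j of a vector AR(1) model, checks stationarity through the eigenvalues of Φ, and detects the nilpotent case, in which the MA representation terminates after finitely many terms.

```python
# A small sketch (illustrative Phi matrices) of the conditions above: for a vector AR(1)
# model (I - Phi B) Z_t = a_t, the MA weights are Psi_j = Phi**j, and the MA form is
# finite exactly when Phi is nilpotent (all eigenvalues equal to zero).
import numpy as np

def ma_weights(phi, max_order):
    """Return [Psi_0, Psi_1, ...] = [I, Phi, Phi^2, ...] for a vector AR(1) model."""
    psi, out = np.eye(phi.shape[0]), []
    for _ in range(max_order + 1):
        out.append(psi.copy())
        psi = psi @ phi
    return out

def describe(phi, name):
    eigvals = np.linalg.eigvals(phi)
    stationary = np.all(np.abs(eigvals) < 1)
    nilpotent = np.allclose(eigvals, 0)
    print(f"{name}: eigenvalues {np.round(eigvals, 3)}, "
          f"stationary={stationary}, nilpotent={nilpotent}")
    if nilpotent:
        # the largest power of Phi that is nonzero gives the finite MA order
        weights = ma_weights(phi, max_order=phi.shape[0])
        order = max(j for j, w in enumerate(weights) if not np.allclose(w, 0))
        print(f"  finite MA representation of order {order}")

# hypothetical coefficient matrices, chosen only to illustrate the two cases
phi_nilpotent = np.array([[0.0, 0.5, 0.3],
                          [0.0, 0.0, 0.4],
                          [0.0, 0.0, 0.0]])   # strictly upper triangular => nilpotent
phi_stationary = np.array([[0.5, 0.1, 0.0],
                           [0.0, 0.3, 0.2],
                           [0.1, 0.0, 0.4]])

describe(phi_nilpotent, "Phi (nilpotent)")
describe(phi_stationary, "Phi (general stationary)")
```

The scalar case is recovered by taking Φ = [φ], for which nilpotency is impossible unless φ = 0, consistent with the fact that the univariate duality between finite AR and finite MA forms never occurs.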
.epBq) are the process can be represented as a finite-order vector MA(k) model with k r [(nl .@33 are polynomials in B of order less than or equal to (m . Because . 5 [(m. . (d) The matrix 63 is nilpotent.i denotes the ith diagonal element of 6.1) q -t. the nr-dimensional vector AR(1) model can be expressed as a finite m-dimensional vector MA(q) model with q 5 (ni .1). ll).I).@BIis a constant. .can be expressed as a finite-order vector AR(p) model.I)..@) = (I . j < j i j d i i = O . j? element Oimj. then the process can be written as the finite-order vector ARMA(p . if any one of the following holds: (a) The determinantal polynomial 11 .17.3.(B) = Hp. @. because + the process can be represented as a finite-order vector AR(n) model with ti 5 [(nt .QpP)and8. let us consider the na-dimensional vector ARMA(p.3 Equivalent Representations of a Vector ARMA Model 455 Because 1 .(B) = (I .1) p q] if the determinantal polynomial I a p ( ~ )is[ independent of B. ik) is the matrix of 8 with its il. where @fir. in (17. with p 5 (m . . if. . the m-dimensional vector MA(1) model where O = [ O i j ] is an nz X nz coefficient matrix with (i. q) process where@.(B)flp-.(B)~ is independent of B. 3.pl. and the elements of the adjoint matrix adj fl . Next. p] if the determinantal polynomial I O.. Similarly. . . More generally.(B)I is independent of B. 2. iA columns and rows removed and Bi. (b) 0 6 0 . (c) The eigenvalues of 6 are all 0.(B) such that IH.e l B matrix polynomials in B of order p and q. 4. which is not independent It can be easily shown that 11 .elB . (I . then the process can be written as the finite-order vector ARMA(n. 3) model where Similarly. Because. which is not independent of B.(B)r.@. however.3. in (17.456 CHAPTER 17 More on Vector Time Series On the other hand.4B f ..m)(I - ne). B ~= ) ( I .2) process where = (1 .(B) such that IV. hence. .1) ql f p] EXAMPLE 17.a l B of B.45B2). however.3.-.3. ( B)[ is independent of B. 11 . with Hence.18~'). It can be shown.16) can also be written as the vectorAKMA(1. the process cannot be represented by a finite-order purely AR vector model.B + .r B ) . .e2lI21 = (1 . by (17. that the (I - @I B .14).6 Consider the two-dimensional vector AFWA(2.(B) = V. (17. 0.16) cannot be represented as a finite-order vector MA model.3.1. if. q ..ql) model with ql r q and n 5 [(m. Model in (17. Hence.11).elB - 8 2 ~ =~ ( I) - V B ) (I . 3. we examine some implications of these representations. ~ g ~ < $ C*3$ for.2 Some Implications The results in Section 17.. or a finite-order vector ARMA model. can be written in an MA form k] are square summable in the such that the nl X nr coefficient matrices q k= sense that each of the nt X nt sequences is square summable.15). .3 Equivalent Representations of a Vector A R M Model 457 with by (17.TB)al. This property is in drastic contrast to the univariate time series process. -. li]. Tiao and Tsay (1989) also call these equivalent models exchangeable models.1 indicate that for a vector time series.3. a phenomenon that cannot exist in the univariate time series.3. = where HI = [189 n2 [-. = (I . ~ . and (17. . Four implications are as follows: 1.17.3. A finite-order vector AR or ARMA model is a finite period memory model if it can also be represented as a finite-order vector MA model.l0 o -51' The models in (17. (17.16) can also be written as the vectorARPVIA(3. an m-dimensional stationary vector process 2. 
where the mean becomes the optimal forecast for an AR or an ARMA model only as a limit when the forecast period tends to infinity. a finite-order vector MA model.3.17).I. a certain process can be simultaneously represented as a finite-order vector AR model. i.18) are equivalent.3.6 and n3 = [-. That is. 2.e.l. In terms of stationarity in the wide sense.16). 1)model (I . 17.nlB - n 2 ~-2lI3Il3)Z.3. Some implications are direct consequences of the representations. (17. the optimal forecast of the process after a finite period of time is the mean vector of the process. In this section. . where 3 is a Jordan matrix whose elements are zero except for those on the first superdiagonal.3. i.e.. = (Z1. 2.. a finite-order vector MA. such that the nt X nz coefficient matrices Ilk = Inii.21) If the process can also be written as a finite-order vector MA model. . i. . cannot be simultaneously represented by a finite-order vector AR. an nr X 1 vector process Z. In other words. . . is. As discussed in Section 17. = a. . . ~ g ~ <t 03~for~i =.1.1)th difference is nonstationary but its dth difference is stationary. . 4. The results for other cases are simif ar. let us consider an m-dimensional vector AR(1) process . . l. . for some matrix C'. which are equal to 0 or 1. then they must be in terms of its linear combination C'Z.is 1(1). no noninvertible vector process can be simultaneously represented by both a finite-order vector AR model and a finite-order vector MA model. . . no nonstationary vector process can be represented by both a finite-order AR vec'tor model and a finite-order vector MA model. If multiple representations occur. .1. 3' is said to be cointegrated if each of its component series Zi.111 and j = I. An ni-dimensional vector process 2. 3. is said to be invertible if it can be written in an AR form i = 1. The discussion in item 2 implies that a cointegrated vector process Z.kRf are absolutely summable.458 CHAPTER 17 More on Vector Time Series . A nonstationary process 2.e. If a vector time series process can simultaneously be represented by a finite-order AR model and a finite-order MA model.is said to be integrated of order d. for some k X m matrix C' with k < m. C'Z. .~ . there exists a nonsingular matrix H such that H@H-' = J.. . .. In.Thus. Hence.. (17. .is a stationary k X 1 vector process. nt and j = 1. if a vector time series process can be simultaneously represented by a finite-order AR model and a finite-order MA model.. denoted as I(d). then the process can be reduced to an inputoutput type of transfer function model discussed in Chapter 14. First.. m.. if its (d . Hence.@B)Z.but some linear combination of Z. then the process must be both stationary and invertible. then by earlier results the eigenvalues of cP are all zero. or a finite-order vector ARMA process. We illustrate the phenomenon through a first-order vector process. (I . gives H(I . . there exists a nonsingular matrix R such that ROR-' = J. gives R(1 + 8 B -F + 8 2 ~ 2. = Ra. = r. and Because 8 is nilpotent. (17. then there exists an integerp so that 8p # 0 but @" 0 for n > p. Thus.. where 3 is a Jordan matrix defined in (17.3.= Ra. and r.26) and letting W.22).@B)H-'RZ.23) becomes Next. Multiplying H on both sides of (17. and those on the first superdiagonal.. and e. where e.17. is the m-dimensional white noise process with mean vector 0 and variancecovariance matrix W H ' . = Ha.3 Equivalent Representations of a Vector ARMA Mode1 459 where S = 0 or 1. 
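As a small illustration of the cointegration idea in the implications above (with hypothetical coefficients, not taken from the text), the following simulation builds two I(1) components that share a common random-walk trend; the linear combination C'Z_t with C' = (2, -1) removes the trend and is stationary even though each component is not.

```python
# A minimal simulation sketch of cointegration: each component of Z_t is I(1), yet the
# combination 2*z1 - z2 eliminates the common stochastic trend and is stationary.
import numpy as np

rng = np.random.default_rng(42)
n = 500
trend = np.cumsum(rng.normal(size=n))             # common random-walk trend (I(1))
z1 = trend + rng.normal(scale=0.5, size=n)        # component series 1
z2 = 2.0 * trend + rng.normal(scale=0.5, size=n)  # component series 2
combo = 2.0 * z1 - z2                             # cointegrating combination C'Z_t

# A crude indication of (non)stationarity: the sample variance of an I(1) series keeps
# growing with the sample size, while that of the stationary combination stabilizes.
for name, x in [("z1", z1), ("z2", z2), ("2*z1 - z2", combo)]:
    print(f"{name:>10}: var(first half) = {np.var(x[:n//2]):8.2f}, "
          f"var(full sample) = {np.var(x):8.2f}")
```

A formal analysis would of course replace the variance comparison with unit root and cointegration tests of the kind discussed in Chapters 9 and 16; the sketch only makes the defining property of a cointegrated vector visible.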
consider an nt-dimensional vector MA(1) process If the process can also be written as a finite-order vector AR model. + R@PR-'BP)W. = HZ.3.. Multiplying R on both sides of (17.21) and letting Y .3. which are all equal to -6B. = Ha. = RZ. or (I + RBR-'B + . + @PBP)R-~RZ.JB) is a square matrix whose elements are zero except for those on the principal diagonal.3. which are all equal to 1. (I . where the a. Thus. Note that *(I) = 0. time series process can simultaneously be represented by a finite-order vector AR model and a finite-order vector MA model. . (a) Why is 2.3 Consider the bivariate process given in Exercise 17.1. and show that the resulting process is asymptotically stationary and thus is not an Z(1) integrated process.a. More explicitly. it is not I(0). .27) becomes These examples show that if a vector. (d) Find its error-correctionrepresentation. (17. 17. is the m-dimensional white noise process with mean vector 0 and variancecovariance matrix RXR'.d. random variables with mean 0 and variance 1. then it can be written in a vector AR form with a triangular coefficient matrix.3.1.460 CHAPTER 17 More on Vector Time Series Because RakR-I = Jk. = (1 . = a. is I(1).3. cointegrated? Find its cointegrating vector.2 Consider the two-dimensional vector AR(1) process Verify that each component series of Z.B)a.2. . (c) Find its AR representation.1 Consider the process.i. (b) Fiid its MA representation. 17. Find the process of its sum. 2. are i. where r. hence. 17. = T(B)a. the process can be reduced to an input-output type of transfer function model as discussed in Section 16. (d) Find the partid rocess covariance matrix function 11[2.Ir allowing for 23. (a) BuiId a vector ARMA model for the series.. t. a3. and examine the interrelationship among the component series.5 Let Z.4 Find a vector time series with its component series being nonstationary. = (ar. = (al. 17.. = (ZIP. find an error-correction model for the series that describes a long-term relationship among the components.3(k). = I.J' is a vector of white noise process with mean zero and variance-covariance matrix I. P 17. (e) Comment on your findings. . .Ir and where a. (a) Find the covariance matrix generating function GZ (B) of Z. 5... =. Z3.. (b) Find correlation matrix function pI2(k) for (Zl.2 2 . . a*.J' is a vector of white noise process with mean zero and variance-covariance matrix (a) Find the covariance matrix generating function GZ(B)of Z. a2.r.... where a. (c) Find the partial process covariance matrix generating function G ] * . . .. (d) If the components are cointegrated.t ) l and . (b) Find the covariance matrix function r z ( k ) and correlation matrix function pZ(k) of Z. (c) Test whether the components ate cointegrated.&.Exercises 461 17.. Z2..r. .3(k)and correlation matrix function p 2. ~ ( Bfor ) (Zl.6 Let 2. (b) Perform the unit root test to verify whether each component series is 41).. (Z1. Z2. a3. 462 CHAPTER 17 More on Vector Time Series (c) Find the partial process covariance matrix generating function G ~ z . ~ ( for B) (21,,,Z2, A ' allowing for 23, (d) Find the partial process correlation matrix function p12.3(k) for fZ1, 22,,)' allowing for &,t . (e) Comment on your findings. ,, 17.7 Find a vector time series and perform a partial process analysis as discussed in Section 17.2.4. 17.8 Let 2, be the m-dimensional vector AR(1) model (I - cPB)Z, = a,, where a, is the rn-dimensional white noise vector process with mean 0 and variancecovariance matrix Z. 
Show that the following conditions are equivalent: (a) The vector AR(1) process can be expressed as a finite-order vector MA(q) model. (b) The determinantal polynomial 11 - @BI is independent of B. Especially, when ~ = 0 , w e h a v e [ 1 - @ B / = 1. (c) The eigenvalues of Q1 are all 0, (d) The matrix is nilpotent, i.e., there exists an integer k such that @ # 0, and @ j = 0 for j > k. 17.9 fa) Find an example of a vector process that has both finite AR and finite MA representations. (b) Express the model in part (a) in a transfer function model. State Space Models and the Kalman Filter .. Having presented various univariate and multivariate time series models in previous chapters, we now give a brief introduction to the state space model and Kalman filtering, This very genera1 approach in time series modeling can be used for both univariate and multivariate time series. 18.1 State Space Representation . The state space representation of a system is a fundamental concept in modern control theory. The state of a system is defined to be a minimum set of information fiom the present and past such that the future behavior of the system can be completely described by the knowledge of the present state and the future input. Thus, the state space representation is based on the Markov property, which implies that given the present state, the future of a system is independent of its past. Consequently, the state space representation of a system is also calfed the Markovian represenhtion of the system. Let Y1, and Yube the outputs of a system to the inputs XI, and Xa, respectively. A system is said to be linear if and only if a tinear combination of the inputs, ax1, bX2,, produces the same h e a r combination of the outputs, aY1, 4- by2,, for any constants a and b. A system is said to be time-invariant if the characteristics of a system do not change with time so that if the input X, produces the output Y,,then the input X,-,,wit1 produce the output Y,-,,. A system is said to be linear time-invariant if it is both linear and time-invariant. Many physical processes can be modeled by h e a r time-invariant systems that contain stationary processes discussed in previous chapters as special cases. For a linear time-invariant system, its state space representation is described by the state equation + and the output equation 464 CHAPTER X 8 State Space Models and the Kalman Fitter where Y, is a state vector of dimension k, A is a k X k transition matrix, G is a k X ~z input matrix, X,is an n X 1 vector of the input to the system, 2, is an nt X 1 vector of the output, and H is an nt X k output or observation matrix. If both the input X, and the output 2,are stochastic processes, then the state space representation is given by AY, + Ga,+1 Z, = HY, + b,, Y,+1 = I where a,+l = XI+* -E(XI+IXi, i 5 i ) is the a X 1 vector of one-step ahead forecast error of the input process X,and b, is an rn X 1 vector of disturbances assumed to be independent of a,. The vector a,+l is also known as the innovation of the input X, at time (t 1- 1). When 2, = X,,b, vanishes from (18.1,4), the state space representation of a stationary stochastic process 2, becomes Thus, the process 2, is the output of a time-invariant linear stochastic system driven by a white noise input a,. The Y , is known as the state of the process. The state equation is also called the system equation or the transition equation, and the output equation is also referred to as the measurement equation or the observation equation. 
The state space representation of a system is related to the Kalman filter and was originally developed by control engineers (see Kalman f19601, Kalman and Bucy {1961], and Kalman, Falb, and Arbib l19691). The system is also known as a state space model. Akaike (1974a) seems to have been the first to apply the concept directly to the analysis of ARMA models. 18.2 The Relationship between State Space and ARMA Models To see the relationship between state space and ARMA models for either the univariate or multivariate case, consider the zero mean stationary nl-dimensional vector ARMA(p, q) model - where @(B) = (I - Q I B - - . - @,BP), 8(B) =(I - BIB nt-dimensional multivariate zero mean white noise process. . . . - QpB4),and a, is an 18.2 The Relationship Between State Space and ARMA Models Rewriting (18.2.la) in a moving average form where Let Then Now, Thus, = I, we have 465 466 CHAPTER 18 State Space Models and the KaIman Filter We assume, without loss of generality, that p > q by adding sary. Then, from (I8.2. lb), we have @i = 0 in (18.2.lb) if neces- . In fact, it is clear that Z1+p,il, for i e 0 is a function of Z,, Zl+llr,. , and Z,+p-llt, Hence, the state vector is {Z,, Z,+Il,, . . , Z,+,-ll,), and the state space representation of the vector ARMA(p, q) model is given by . and Note that in the above representation, the first I? components of the state vector are equal to Z,. Equation (18.2.6) is usually suppressed in the representation of an ARMA model. Now suppose that a process 2, has a state space representation where it is assumed that Y, is a nap X 1 vector of the state and a, is the innovation of Z,. If the characteristic polynomial of A is given by IAI - A( = Xy=O#iA~-i, where tho = 1, then by the Cayley-Hamilton theorem (see, for example, Noble [I9691 and Searle 119821) 18.2 The Relationship Between State Space and A R M Models 467 By successive substitution in the first equation of (18.2.7), we have, for i > 0, Now, Hence, Zt has an AlZMA representation where H(Ap +l~i-l + #qAp-l + . + # J ~ - . ~+A +pX)Yt = 0 by (18.2.8) and Bi = H(Ai + + .., + + i ~ ) ~ . In the representation (18.2.101, we use the Cayley-Hamilton theorem and obtain the AR polynomial as Qp(B) = $p(B)I where $(B) = 1 - #lB - . - +,LIP is a univariate polynomial of order p. Recall from Section 16.3, this is one of possible representations of an identifiable ARMA model. Note that in the state space representation (18.2.5) of an ARMA model, because the first m components of the state vector are equal to Z,, we can also recast the state representation into an ARMA modef by directly solving the state space system of equations for the first m components. tVe illustrate this method in the following examples. . - 468 CHAPTER 18 State Space Models and the Kalman Filter EXAMPLE 18.1 Consider the univariate ARMA(2, 1) model, By rewriting (18.2.11) in the moving average form where !Po = 1, 'PI = c#q - 81, . . . ,we have, from (18.2.5) and (18.2,6), the following state space representation of (18.2.11): To recast the state space representation (18.2.12) into an ARMA model for Z,, we note that (18.2.12) implies that By (18.2.13), we have Because Zt+l!t+l= Z,+, and Zl+21t+2= Z,+2, by substituting (f8,2.13) into (18.2.14) and (18.2.14) into (18,2.15), we obtain which is theARMA(2, 1) model given in (18.2.11). For a multivariate vector process, it is possible for some components of Z,+d, to be linearly combinations of .the remaining components. 
In such cases, the state vector consists of a subset of the possible components of {Z,+,!, 1 i = 0, I, . . . , ( p - 1)).Thus, depending on whether the redundant components, which are linearly dependent on others, have been eliminated, seemingly different but equivalent structure may arise. We illuseate this point in the following example, 18.2 The Relationship Between State Space and ARMA Models 469 EXAMPLE 18.2 Consider the following two-dimensional vector ARMA(1, 1) model: By (18.2.5), the state space representation of (18.2.16) is given by where 1= I'[ $21 '"] and 0 2 = [o0 0 0] is a zero matrix added to change the ARMA 422 (1, 1) model to an ARMA(2, I) model so that p > q, By the above remarks, the state vector is equal to fZ[, Z~+llt]' if all components of this vector are linearly independent and equal to a subset of [Z:, Z:+llt]r if some components of the vector are linearly dependent. To examine the linear dependency of the components of the possible state vector [Z:, Z~+llt]', we note that [ZI, Z:+lltI1 = [Z1,!, 22,f , Zt, t+llfr22,r+~ltl',NOIV, Hence, the state vector is [Z1,,, &,1, Zl, t+llt]',Because the component &,,+dt is a linear as shown in (18.2.21), it is not included in the state vector. combination of Zl,, and The representation (18.2.17) can now be simplified by representing Z1, 22, t+l, and and Z1, fcdt+l in terms of the state vector [Zl, ,, 22, !, Zl, t+llt]'and the input noises at, a2,f + l . Clearly, from (18.2.18), (18.2.19), and (18.2.20), we have 470 CHAPTER 18 State Space Models and the Kalrnan Filter Combining (18.2.22), (18.2.23), and (18,2.24), we obtain the reduced state space representation We may recast the state space form (18.2.25) into ARMA form by expressing the system of equations in (18.2.25)in tems of Zl,, and &, Using (18.2.22), we can write the conditional expectations in (18.2.24) as ,. Hence, Combining (18.2.27) and (18.2.23), we get the vector ARMA(1, 1) model given in (18.2.16). Thus, theoretically there is no distinction between the state space and the ARMA representations of a stationary process. The results obtained fiom one representation can be used for the other representation. Unless all components of a state vector are linexly independent, however, when converting a state space form into an ARMA form, there will be nontrivial common factors in the autoregressive and moving average polynomials. Hence, the resulting ARMA model may have superficially inflated the orders o f p and q. 18.3 State Space Model Fitting and Canonical Correlation Analysis The state space.representation given in Sections 18.1 and 18.2 is clearly not unique. For example, given (18.1.5), we can form a new state vector V, = MY, for any nonsingular matrix M and obtain a new state space representation 471 18.3 State Space Model Fitting and Canonical Correlation Analysis K+I = A I +~Glat+l (18.3.1) and where Al = MAMA1,GI = MG, and HI = EM-'. Through a canonical representation as shown in Akaike (1976), however, we can obtain a unique solution. In the canonical correlation representation, the state vector is uniquely determined through the canonical correlation analysis between the set of current and past observations (Z,, ZnV1;. . . ) and the set of current and future values (Z,, Zn+lln,. . . ). 
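A small numerical check of the ARMA(2, 1) construction in Example 18.1 can be written directly from the matrices of the state space form, with state Y_t = (Z_t, Z_{t+1|t})', A = [[0, 1], [φ2, φ1]], G = (1, ψ1)' where ψ1 = φ1 - θ1, and H = (1, 0). The parameter values below are assumed for illustration only.

```python
# A sketch (hypothetical parameter values) of the state space form of a univariate
# ARMA(2,1) model: Y_{t+1} = A Y_t + G a_{t+1}, Z_t = H Y_t.  Simulating through the
# state equations reproduces a direct simulation of the ARMA recursion with the same shocks.
import numpy as np

phi1, phi2, theta1 = 1.2, -0.5, 0.4           # assumed ARMA(2,1) parameters
A = np.array([[0.0, 1.0],
              [phi2, phi1]])
G = np.array([1.0, phi1 - theta1])            # (1, psi_1)'
H = np.array([1.0, 0.0])

rng = np.random.default_rng(7)
n = 200
a = rng.normal(size=n)

# simulate through the state equations
y = np.zeros(2)
z_state = np.empty(n)
for t in range(n):
    y = A @ y + G * a[t]
    z_state[t] = H @ y

# simulate the same ARMA(2,1) model directly with the same shocks
z_direct = np.zeros(n)
for t in range(n):
    z_direct[t] = (phi1 * (z_direct[t - 1] if t >= 1 else 0.0)
                   + phi2 * (z_direct[t - 2] if t >= 2 else 0.0)
                   + a[t] - theta1 * (a[t - 1] if t >= 1 else 0.0))

print("state space and direct simulations agree:", np.allclose(z_state, z_direct))
```

The agreement holds because the state space form is an exact algebraic rearrangement of the ARMA recursion once the initial conditions are matched, which is the sense in which the two representations are equivalent.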
In terms of an AR(p) model, because the eventual forecast function is determined by the AR polynomial and (Z,, Zn-l, , , Z,-p) contains essentially all the information relevant for the future values of the process, the canonical correlation analysis is simply performed between the data space .. and the predictor space Consider tire following block Hankel matrix of the covariance between Dn= (Zh, Zk-l,. Zb-p)' and F,, = (Zk, Zh+lln,. . , ZA+pln)'defined by . . ., where we use a property of conditional expectations and hence that COV(Z,-~,Z,+iln) = COV(Z,-~,Zn+j). Akaike f1974a, 1976) shows that under the nonsingularity assumption of r(O), for general vector ARMA models, the rank of I' is equal to the dimension of the state vector and thus is also equal to the number of nonzero canonical correlations between D,, and F,. When the model is unknown, the choice of the order p is obtained from the optimal AR fitting of the data, which is often .based on AIC discussed in Chapter 7. For the vector process, AIC is defined as 472 CHAPTER 18 State Space Models and the K a h a n Filter where 11 IX,~ = the number of observations, = determinant of the covariance matrix for the innovations or the white noise series in the AR(p) fitting, and m = the dimensions of the vector process 2,. The optimal AR order p is chosen so that AIC is minimum. Thus, the canonical correlation analysis will be based on the block Hankel matrix of sample covariances, i.e., * - where r(j), j 0,1 , . .., 2 p , are the sample covariance matrices as defined in (16.5.16). As discussed in Example 18.2, because some components of a prediction vector Zn++, may be Linearly combinations of remaining components, the canonical correlation analysis is performed between dl the components of the data space and the components of the predictor space Because the state vector is known to be a subset of the predictor space, a sequence of potential state vectors, Yi,is determined through a canonical correlation analysis ketween a sequence, F;,of subset? of Fn, and the data space D,, based on the submatrix FJ forming from the columns of r, which correspond to the components of 'D, and .Fi. More specifically, because the canonical correlation between 2, = [ZIP., Z2, n , . . , Zm,,If and 'D, are 1, . . . , 1, which are clearly nonzero, the current state vector is set to Zn and the first subset in the sequent:, is set to [Z1,, &,,,, . . . , 2%n, Z1, n+llnl'. If the smallest canonical correlation of F' is judged to be zero, then a linear combination of F,!, is uncorrelated with the data space 'D,. Thus, the component Z1,ntlln and any Zl, n+qn are excluded from further consideration for components in the state vector. If the smallest canonical correlation is nonzero, then Zl, n+lin is added to the current state vector. The sequence F;is now generalized by adding to the current state vector the next component of Fnthat does not correspond to the component that has previously failed to be included . FA, 18.3 State Space Model Fitting and Canonical Correlation Analysis 473 in the state vector. The smallest canonical correlation of fj is computed and tested for its significance. If it is significaly different from zero, then the component is added to the state vector. Otherwise, the component is excluded from the current state vector and also from any further consideration. The state vector selection is complete when no elements of Fnare left to be added to the current state vector. 
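The block Hankel matrix of covariances between the data space D_n and the predictor space F_n can be assembled directly from sample covariance matrices, using the property noted above that Cov(Z_{n-i}, Z_{n+j|n}) = Cov(Z_{n-i}, Z_{n+j}). The following sketch uses simulated placeholder data; the numerical rank of the matrix (its dominant singular values) suggests the dimension of the state vector.

```python
# A minimal sketch of the block Hankel matrix: the (i, j) block is Gamma_hat(i + j),
# i, j = 0, ..., p.  For a univariate AR(1) the population matrix has rank 1, so the
# sample version shows one dominant singular value.
import numpy as np

def sample_cov(z, lag):
    """Sample covariance matrix Gamma_hat(lag) = Cov(Z_t, Z_{t+lag}) for an (n x m) series."""
    zc = z - z.mean(axis=0)
    n = len(zc)
    return zc[: n - lag].T @ zc[lag:] / n

def hankel_matrix(z, p):
    """Block matrix with (i, j) block Gamma_hat(i + j)."""
    m = z.shape[1]
    H = np.zeros((m * (p + 1), m * (p + 1)))
    for i in range(p + 1):
        for j in range(p + 1):
            H[i * m:(i + 1) * m, j * m:(j + 1) * m] = sample_cov(z, i + j)
    return H

rng = np.random.default_rng(11)
n = 500
z = np.zeros((n, 1))
for t in range(1, n):
    z[t] = 0.7 * z[t - 1] + rng.normal()      # placeholder AR(1) data

H = hankel_matrix(z, p=2)
print("singular values:", np.round(np.linalg.svd(H, compute_uv=False), 3))
```

In finite samples the smaller singular values are not exactly zero, which is why the rank decision is made through the sequence of canonical correlation tests rather than by inspection alone.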
For each step in the sequence of canonical correlation analysis, the significance of the smallest canonical correlation, denoted by F ~ , ,is based on the following information criterion from Akaike (1976): where q is the dimension of F/, at the current step, If C 5 0, p i , is taken to be zero; othenvise, it is taken to be nonzero. To test the significance of the canonical correlation & ;, one can also foilow Supplement 18A and use the approximate ,y2 test given by Bartlett (1941), who shows that under the null hypothesis of zero canonical correlation, the statistic + + has an approximate chi-square distribution with [m(p 1) - q 11 degrees of freedom, i.e., x2(m(p 1) - q 1). Once the state vector is identified, we have the canonical representation of a state space model + + where a, is a Gaussian vector white noise series with mean 0 and variance-covariance matrix 2, i.e., N(0, X), and H = [I,, Oj where I, is the m X ni identity matrix. Clearly, from the results of Section 18.2, estimates for the matrices A, G , and I:can be obtained from the parameter estimates of the optimal AR model fitting. More natural estimates of the elements for the transition matrix A, however, are obtained from the canonical correlation analysis. For example, let k be the number of components of the final state vector Y,, hence, A is a k X k transition matrix, From (18.3.9), it is known that rn 5 k 5 m(p + I). We now illustrate how the estimates in the lirst row of A are derived, which are related to the first step of the sequence of canonical correlation analyses when Zl, ,,++ is added to the vector Z, to form the first subset Fj;in deciding whether Zl, n+lln should be included in the state vector. When the smaflest canonical correlation between F,!and , Dnis judged to be nonzero, Zl, n+lln becomes the (?n 4- 1)th component of the state vector. Thus, the first row of A will have 1 in the (ni + 11th column and 0 elsewhere. When the smallest canonical correlation is judged to be zero, a linear combination of 3: is uncorrelated with the-data space V,,, and Zl, n+lln is excluded from the state vector, Because the determinant of r(0)is nonzero, we can take the coefficient of Z1,n+lln in this linear combination to be unity. Thus, we have the the maximum likelihood estimation involves a highly nonlinear iterative procedure. the STATESPACE procedure uses a multiplicative . and 2. Its STATESPACE procedure is used in the analysis of both examples. SAS~ETSof the SAS Institute (1999) is one of a few computer software that can be used to perform state space modeling. For a given sequence of n observations Z. and the second is a multivariate vector process.. The estimates in the other rows of A are obtained similarly. .3. one can use the maximum likelihood procedure b obtain more efficient estimates of A.16) where The standard maximum likelihood estimation.6. 1 8. can now be used to derive estimates of A. G . . 2 (18.AB)-lG]-'Z. the coefficients of the vector a are used as estimates of the first m columns of the first row of the transition matrix A. = [H(I .jn) columns of the first row are zero. in Equation (18. subject to additive constants. the log-likelihood function. .3. = or'Z.. because we have and a.G).11).and Z.-trZ-'S(A.6.3.) cx 1z . G . becomes In L(A.12) is identified. . Z2.. Alternatively. . and the remaining (k .. .n+rl. one can set certain elements of A and G to some constants such as zero or one. Z. In the process of estimation. We note. 
X .4 Empirical Examples We now give two examples of the state space modeling.-In 2 1 121 . . The fust one is a univariate time series. however. once a state space model (18. and 2. The estimates obtained from the canonical correlations analysis may serve as initial estimates in this more efficient estimation procedure.As pointed out in Section 16. Thus.474 CHAPTER 18 State Space Models and the Kalrnan Filter relationship Z1.. G . as discussed in Section 16. 63 < X $ 5 ( l ) = 3.q I]).45 + + constant. instead of -(n . Withp = 1. although in a large sample. -(n .2. The first step in the state space modeling is to find an optimal AR fitting of the data. The above canonical correlation data analysis is summarized in Table 18. the AIC is minimum at p = 1.4 Empirical Examples TABLE 18.04.q I]).1/2fm(p f 1) -t. The minimum canonical conelation coefficient equals . Hence. and the information criterion equals which is negative. the optimal AR order p is chosen to be 1. State vector Correlations Infor.0891.56 + 337. Z. EXAMPLE 18. the difference in using these two different multiplicative constants will not affect the conclusion in most cases.2 Canonical correlation analysis for blowfly data..1 / 2 [ m ( p 1) .1.Ib}.which we analyzed in Chapters 6 and 7.37 338.. This result is also supported by Bartlett's chi-square test which is not significant as x2 = . Z.1 AIC of AR models for the blowfly data.93 335.93 340. Z.-l} and the predictor space {Z.3 The univariate time series used in this illustration is the square root transformed blowfly data of 82 observations (Series W3).. (C) Chi-square DF . we apply the canonical correlation analysis between the data space {Z. Hence.475 18. TABLE 18. From Table 18.+lb is not included in the state vector. AIC 333. which corresponds to the same model we identified for the series in Chapter 6 using the sample ACF and PACF. . we next apply the canonical correlation analysis between the data space {Z. the state vector consists only of 2. . Z. Gz = .62 -308.. . are potential elements of the state vector does not mean that all their components ~ ~ +yn+iln. and 6: = .149. which is a scalar and is equal to . EXAMPLE 18. the canonical correlation analysis is performed between and TABLE 18. y.88 -608.I.476 CHAPTER 18 State Space Models and the Kalrnan Filter Thus. .00 .ll. = (1 .B)Y.73.= 1. Jenkins.-$) and the predictor space {Z. Jenkins.4 As a second illustration.]' for f = 1... . the optimal AR order is chosen to be 5. = [x.B)X. 5 are needed in the final state vector. . in constructing a state space model for the data. . we f i s t search for the optimal order in the AR fitting of the data. As mentioned earlier.. .3 AIC of AR models for the differences of series M in Box. and the canonical representation of the state space model for the data is given by which is the AR(1) model we fitted in Chapter 7. . . Thus. taken from Series M in Box.. .2. = (1 .3 because AIC is minimum at p = 5. The estimate of the transition matrix A is . . Again. ~ lfor~ i . however.97 -665. Z. Based on p = 5.0676. through Zn+51. 2. and Reinsel AIC -230. consider the bivariate series of a leading indicator X. and Reinsel (1994). Zn+lb.24 -271.. They fitted the data with the transfer function model where x. that Zn. We now apply the state space procedure to construct a state space model for the series 2. . and sales Y.+51n). y.0484. this result is certainly expected. .. Because the AR(1) is a Markov process. From Table 18. . State vector Correlation Infor. 
The estimates of A.(C) - Chi-square DF as shown in Table 18. from Table 18.is The resulting state space model is where [ar. Because the C value equals 12. ~is ~ considered.63. and Reinsel. . X) vector white noise process. The C values are positive when the components y. GIand X are given by and .+.+lln and Yn+nln are considered but negative when y . Hence.]' is an N(0.. + . Jenkins..4.4 Canonical correlation analysis for the differences of Series M in Box. a2.18.4. which is negative when the component xn+lln is considered. xn+in for i 2 1 are excluded from the state vector. the final state vector.4 Empirical Examples 477 TABLE 18. M). Because the t-value of the (4.d. we set it to zero and reestimate the restricted model. a.0011. respectively. We leave it as an exercise. The final fitted state space model becomes and Diagnostic tests indicate that the residuals are not different from white noise series. and H are fixed matrices of order k X k.4. i..+1 Z.1. and bt are independent of each other.4) in terms of variable relationships and forecasting. and b. and we conclude that the model is adequate. 18. and btis the measurement noise. 1) element of the input matrix is much less than 1 (t = . Z ) series.1. G.i.4). Y. it is assumed that the two noise series a. It would be interesting to compare the state space model (18. More precisely.e.d. which is an i. which is an i.denoted by . Gaussian white noise N(0.478 CHAPTER 18 State Space Modeb and the Kalman Filter where the value in parentheses below an estimate is the standard error of the estimate. Furthermore.4. are assumed to have a joint multivariate normal distribution with mean 0 and a block diagonal covariance matrix.5 The Kalman Filter and Its Applications Consider a general state space model of a stochastic system given by (18. k X n. and rn X k. Gaussian white noise N ( 0 . The vector a. b. { = AX + Ga. is the system noise.+1 + where A. = HY. a)series..097 = .3) and (18.7) and the transfer function model (18.i. This equation..(l)depends on the quality of the estimate Y. when a new observation becomes available. together with the system equation in (18. In this section. . These distributions present our belief prior to having observed the data Z. from (18. . we introduce the Kalman filter. .1). ZF). To improve forecasts. determines the distribution of the state vectors p(YJ.5. the I-step ahead forecasts from the forecast origin time t can be calculated as follows: Hence.. After observing the data. inn(18. which is a recursive procedure used to make inferences about the state vector Y. for t = 1. where Clearly. i s . the accuracy of the forecasts k. To specify the distribution of the state vectors Y. .4). . we revise our prior distribution and obtain the posterior distribution of the state vector.which summarizes the information fiom the past that is needed to forecast the future. which is the conditional distributionp(Y. (is.18:5 The Kalman Filter and fts Applications 479 Once a state space model is constructed. where 1 ZT = ( ~ ~ 51 1 5 t). we assume that the distributionp(Yo) of the state vector Yo at time zero has mean Yo and covariance Vo.* and are referred to as the prior distributions. it should be used to update the state vector and hence to update the forecasts. 2.5. . . of the state vector Y. for t = 1 .5. 2 . .6) .5.1). . having observed Zf but prior to observing we have. 0). I 2:) at an earlier' time becomes a prior distribution for calculating a new posterior di~tributionp(Y..+l = AV. the prior distribution I + where R. 
l).+~ I ZT+l)when a new observation Zr+lbecomes available. a posterior distribution p(Y.5. .Af + GXG'.+l. from (18.+l. Because knowing Z. of the state vector Y tat time t is a normal distribution with a mean Y. however. (18.e. Note that at time (t I). Hence. + ~Thus.5.5.+.+I . ZT).. to derive the posterior distribution P(Y. (18.+~I ZT+l) = P ( Y .+l I e. ZT). when a new observation Ztcl becomes available. suppose that the posterior distribution. we have the posterior or conditional distribution (er+l I T + I23 ~ . follows immediately if we can find the joint normal distribution of IfY. e.91. and a covariance matrix V. p(Y. The conditional distribution the posterior distributi~np(Y~+~ of (Y.5. we would like to update the state vector and derive the new posterior distribution P(Y.+~I ZFtl).+lett1.480 CHAPTER 18 State Space Models and the Kalman Filter By Bayes's theorem.11) After observing Zt+l.ZT).12) . we need only to find ) el+l. 1 Zf).+l)' I GI. via the system equation in (18.+l..A$. i. Now.ZT)of (Y. At time (r + I).+l is equivalent to knowing the one-step ahead forecast error e .N(H(Y. + ~I Z. 14).18.+l and X1 correspond to e. and by (18. and (18. (18.5. by the same equations. Also.5.5 The Kalman Filter and Its Applications 481 Recall that for any two random vectors X1 and X2.5.Then.5.12). we have if and only if Now.10) and (18. we get . from (18.16). In terms of a joint multivariate normal distribution.15).+l.5. we have Hence. let X2correspond to Y. Otherwise. which ahead forecast error e . the sum of the projected es~imateusing observations up to time t.l in (18.5.482 CHAPTER 18 State Space Models and the Kalman Filter Thus. In that case. following a principle of uninformative prior. Clearly.5..5. some guess values from experience can be used.+l.15). Kalman filtering is a recursive updating procedure that consists of forming a preliminary estimate of the state and then revising the estimate by adding a correction to this preliminary estimate. In fact. it can be easily seen from (18.5. The updated estimates Y.21). Thus. the results will be dominated by the information from the data.5. Thus. from (18. It follows.5). The Kalman filter will tend toward a steady state when the Kalman gain approaches a limit. Zt) - ~ ( i . These initial values can be the values from the previous studies.1). Z.22) are the basic recursive formula used to update the mean and the covariance matrix and hence the distribution of the sta? vector.+I of the state is. The matrix determines the weight given to the forecast error. after the new observation.+l conditional on Z.21) and V.VI+1)7 where with Equations (18. Y. the Vo is often chosen as a diagonal matrix with large elements. for the given state space model (18.22) require initial values Yo and Vo specified in (18. the rehcursive equations to compute in (18.+1. has become available. that (X+l le.Zt(l). + l .21) and (18. for a large n.5.+l. from (18.+l .5.21) that the Kalman gain is simply the coefficient of the least squares regression of Yt+l on the forecast error e. and the one-step is called the Kalman gain. The magnitude of the correction is determined by how t e l l the preliminary estimate predicted the new observation.5. + ~= Z.5.5. ..13). by (18. Bayesian forecasting (Harrison and Stevens [1976]). The advantage of this formulation is that one can estimate the time-varying coefficients. Priestley and Subba Rao (1975) derive the K a h a n filter algorithm through the statistical technique of factor analysis. 
these methods have now become well known and widely used in many areas of applications. P. (an m X 1 vector of nt coefficents). where + The commonly used multiple regression model. Other applications include time series forecasting (Mehra [1979]). Bohlin [1976]). missing observations (Kohn and Ansley 1198311. A = I. For example. Y. quality control (Phadke [1981]). For further references. = A x u 1 + a. and data disaggregation (Gudmundsson [1999]). The time-varying regression model is the above state space model with H = X. The recursive updating equations can be derived using different approaches. we are interested in finding the correlation between the Iinear combination of the variables in X and the linear . see K a h a n (1960). An expository article is given by Meinhold and Singpunvalla (1983).A Canonical Correlations 483 The methods of Kalman filtering were originalfy developed by Kalman (1960) within the context of Iinear systems. can be easily seen as the state space model with H = X (a I X nt vector of independent variables). assume that q 5 r.. = P.A Canonical Correlations Let X be a q X 1 vector and Y be an r X 1 vector with the variance-covariance matrix Z given by Wlth no loss of generality. = BY. and Z J. Kalman and Bucy (1961). however.Y. the analysis of time series models with timevarying coefficients (Young [1974]. Jazwinski (1970). For example.0. Supplement 18. b. and X = 0. Z.Supplement 18. = XP b. Because of the ease of implementation of the algorithm on digital computers. consider the simple state space model + Z. Y. and others.= P... regression analysis (O'Hagan [1978]). temporal aggregation (Schmidt and Gamerman 11997]).. using Kalman filtering. In many applications. we need to maximize such that IX~Zxal = PiXYS1= 1. we get Multiplying the first equation in (18. such that Var (ruiX) = 1 and Var (PiY) = 1.A.A.484 CHAPTER 18 State Space Models and the Kalrnan Filter combination of variables in Y. the system to solve for a1 and pl in (18.The first canonical correlation. Thus.e. is defined to be the maximum correlation between U1and Vl over all possible arl and 01.PI...e. p(l). and 42and setting them to 0.5) by ~ i the . Using Lagrange multipliers. Vl)= ~ l l j 2 ~For ~ fa1 i ~and .+I. i.5) becomes and p = 241 = 242 = Corr(Ul. the correlation between atXand P'Y for some nonzero q X I vector IX and r X 1vector p. i. second equation by pi. and using the constraints in the last two equations. the . Let U1 = aiX and Vl = P$Y. implication is that PI not identically zero. we maximize the function: Taking partial derivatives with respect to al. 2 pi.The ith canonical correlation.To test that X and Y are uncorrelated. from the results of partitioned matrices.A.A. The corresponding vectors oil and P1 are obtained through (18. it is equivalent to .e. i.1) canonical variates. q. and Vi = P:Y.8) be ordered so that pf 2 p$ r . i = I . . we have and therefore the equivalent implication Clearly. The resulting U1 = atjX and Vl = PiY are called the first canonical variates. r Assume that X and Y follow a joint multivariate normal distribution N ( p .. it can be shown that p(i) is the positive square root of the ith largest solution.. and hence they are independent. where xj and yj are q X 1 and r X 1 vector observations of X and Y.6). i.e. p. If we then let the q solutions of (18. pti). Let be the sample variance-covariance matrix computed from a sample of size n.Supplement 18. .. there are q solutions for p2 in (18.A Canonical Correlations 485 Note that if lZyl # 0.. 
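A compact sketch of the recursions just described (predict, form the innovation, correct by the Kalman gain) is given below for the state space model Y_{t+1} = A Y_t + G a_{t+1}, Z_t = H Y_t + b_t with Var(a_t) = Σ and Var(b_t) = Ω. The matrix names follow the notation above; the example applies the filter to a random walk plus noise model with arbitrary numerical values.

```python
# A minimal Kalman filter sketch: prediction step, innovation, Kalman gain, update.
import numpy as np

def kalman_filter(z, A, G, H, Sigma, Omega, y0, V0):
    """Return one-step-ahead forecasts of Z_t and the filtered state means."""
    y, V = y0.copy(), V0.copy()
    forecasts, states = [], []
    for zt in z:
        # prediction step
        y_pred = A @ y
        R = A @ V @ A.T + G @ Sigma @ G.T
        forecasts.append(H @ y_pred)
        # update step with the new observation
        innovation = zt - H @ y_pred
        S = H @ R @ H.T + Omega
        K = R @ H.T @ np.linalg.inv(S)        # Kalman gain
        y = y_pred + K @ innovation
        V = R - K @ H @ R
        states.append(y.copy())
    return np.array(forecasts), np.array(states)

# toy example: a random walk plus noise (local level) model with a scalar state
rng = np.random.default_rng(5)
n = 100
level = np.cumsum(rng.normal(scale=0.3, size=n))          # unobserved random walk
z = (level + rng.normal(scale=1.0, size=n)).reshape(-1, 1)

A = G = H = np.eye(1)
Sigma = np.array([[0.3 ** 2]])
Omega = np.array([[1.0 ** 2]])
fc, st = kalman_filter(z, A, G, H, Sigma, Omega, y0=np.zeros(1), V0=np.eye(1) * 10.0)
print("last filtered level estimate:", st[-1])
```

The large initial V0 plays the role of an uninformative prior, so the filtered estimates are quickly dominated by the data; as the recursions proceed, the variance matrix and the Kalman gain settle toward their steady-state values.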
we choose p(1) to be the positive square root of the maximum solution. . and U i and V iare uncorrelated with all previous (i . 2). . Let Ui = a : X .A. The sample canonical correlations are . is defined to be the maximum correlation between Ui and Vi so that Var(a1X) = 1 and Var(PiY) = 1. . XXy # 0. where fif r fij 2 are the solutions of .8)..(i). statistic .ie. p(q)) are jointly zero against the alternative that they are not jointly zero. we have To improve the X2 approximation.A. . Thus. . from the large sample result of Wilks (19381. It can be shown that the likelihood ratio statistic is # 0.we have hence. Bartlett (1941) suggests replacing the multiplicative factorn by the factor (rt -(r q 1)/2). It is clear from (18.A. . when tix( (gY[# 0. and From (18.486 CHAPTER 18 State Space Models and the Kalrnan Fiiter test the null hypothesis HI : (p(l). + + If H I is rejected.9). it follows that Under the null hypothesis. .. . p(q)) are jointly zero.10) that 62 is a characteristic root of 2gi%xy$~2ux. We use the test . we next test that Hz : (p(2). . . Under the null hypothesis we have In the canonical correlation analysis of state space modeling. we have + q + 1) I d In A2 +X 2 ( ( r . then p(f) is the only significant canonical correlation. a.0 ~ 3 ~ ) a .16) I f H l is rejected and Hz is not rejected. = + b.-l .B)Zt = (1 . for example.I)).-. See. using the test statistic . we have r = (p 4. Z. however.BIB)al. Johnson and Wichern (1998).1)(q .1)1n and k = q. (1 .Exercises 487 Under the nuU hypothesis H2.3 Consider the so-called autoregressive signal plus noise model in which we observe 2. It is an important data-reduction method discussed in most books on multivariate analysis. . = (1 . .1 Find the state space representation for the following models: (a) (b) (c) (d) ZI = (1 .A. . Canonical correlation analysis is used in testing the independence of two sets of variables. we can continue the process. (18. = C$~Z. and the unobserved process Y.41B . p(q)) are jointly zero.&B~)z. an AR(1) process .f12at-z.2 (a) Find the state space representation for the model given in Exercise 16. say. + 18.1. (b) Is the representation obtained in part (a) unique? Why? 18. . part (c). 18.OIB (1 . we test that Hk:(p(k). .Bla. At the kth step.OIB .&B)(1 . If Hz is rejected.follows an autoregressive process. (a) Find a proper univariate ARMA model for the series. (b) Find the steady state of the Kalman filter. I) model. 18. 18.3. = r: + b. if 4 = 1. are independent zero mean Gaussian white noise processes with variances cr: and mg.5 Compare the forecasts obtained from the transfer function model (18. 18. (b) Show that the random walk plus noise model is in fact anAHMA(0.4) and the state space model (18. (b) Find a state space model for the series. where the unobservable process Y. 1. (c) Compare your results in parts (a) and (b).71. follows a random walk given by (a) Express the model in the state space form. ..respectively.4.488 CHAPTER I8 State Space Models and the Kalman Filter where a.6 Consider the corn supply data in series W13.4.4 In Exercise 18. 18. the autoregressive signal plus noise model becomes the random walk plus noise model 2. (a) Express the model in the state space form.7 Use canonical correlation analysis to construct a state space model for the Lydia Pinkham data Series W12. and b. a. for a process to be stationary.1 Fractionally Integrated ARMA Models and Their ACF The ARIMA(p. however. are short memory processes with autocorrelation functions that decrease fairly fast. 
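The eigenvalue characterization derived in this supplement translates directly into a few lines of code: the squared canonical correlations between X and Y are the eigenvalues of Σ_xx⁻¹ Σ_xy Σ_yy⁻¹ Σ_yx, estimated here from sample covariance matrices of simulated placeholder data.

```python
# A minimal numerical sketch of canonical correlations from a partitioned covariance matrix.
import numpy as np

def canonical_correlations(x, y):
    """Sample canonical correlations between the columns of x (n x q) and y (n x r)."""
    n = len(x)
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    Sxx = x.T @ x / n
    Syy = y.T @ y / n
    Sxy = x.T @ y / n
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    return np.sqrt(np.clip(eig, 0.0, 1.0))

rng = np.random.default_rng(9)
n = 400
common = rng.normal(size=(n, 1))                          # shared factor
x = np.hstack([common + 0.5 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 1))])                  # q = 2
y = np.hstack([common + 0.5 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 2))])                  # r = 3

print("sample canonical correlations:", np.round(canonical_correlations(x, y), 3))
```

Only the first canonical correlation is large in this construction, because the two sets of variables share a single common factor; the remaining correlations would be judged zero by the Bartlett-type chi-square tests described above.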
Long Memory and Nonlinear Processes

In earlier chapters, we discussed various methods for analyzing both univariate and multivariate time series, and all the time series models presented there are linear. All the time series variables discussed so far (after differencing, if necessary) are short memory processes with autocorrelation functions that decrease fairly fast. Many time series, however, exhibit characteristics such as sudden bursts of large amplitude at irregular time epochs, which are commonly found in seismological data as well as in some financial series, and these linear models cannot properly describe them. To fit these linear models, we sometimes apply transformations so that the transformed series can fit the models well. Many of the commonly used transformations, such as the logarithmic transformation and the square root transformation, are nonlinear. A transformed series following a linear model clearly indicates that the original series may follow a nonlinear model. In this chapter, we introduce long memory processes and some nonlinear time series models that are useful in describing these long memory and nonlinear phenomena.

19.1 Long Memory Processes and Fractional Differencing

19.1.1 Fractionally Integrated ARMA Models and Their ACF

The ARIMA(p, d, q) process plays a dominant role in all the chapters we have studied. In this process, the original series Z_t is nonstationary, but its dth difference for some integer d is stationary. In fact, we have assumed that the value of d is an integer. We would, though, like to know more exactly the range of d for the process

phi_p(B)(1 - B)^d Z_t = theta_q(B) a_t,   (19.1.1)

where phi_p(B) != 0 for |B| <= 1 and a_t is a white noise process with mean zero and constant variance sigma_a^2, to be stationary and invertible. With no loss of generality, to answer the question we consider the simple case

(1 - B)^d Z_t = a_t.   (19.1.2)

From the Taylor series expansion, we have the general binomial formula

(1 - B)^d = sum_{k=0}^{infinity} C(d, k) (-B)^k,  where C(d, k) = Gamma(d + 1) / [Gamma(k + 1) Gamma(d - k + 1)]

and Gamma(.) is the gamma function. If the process in (19.1.2) is stationary, then we should be able to write it as Z_t = (1 - B)^{-d} a_t = sum_{j=0}^{infinity} psi_j a_{t-j} such that {psi_j} is square summable. Using Stirling's formula, we have psi_j proportional to j^{d-1} for large j; clearly, {psi_j} is square summable if and only if 2(1 - d) > 1, i.e., d < .5. Similarly, writing the process in its autoregressive form and using Stirling's formula again, we see that the process is invertible if and only if -.5 < d. Thus, the process in (19.1.2), or more generally the process in (19.1.1), is stationary and invertible if and only if -.5 < d < .5.

The process in (19.1.2) with -.5 < d < .5 is called the fractionally integrated (or differenced) noise. The process in (19.1.1) with -.5 < d < .5 is therefore called the autoregressive fractionally integrated moving average model, often denoted the ARFIMA(p, d, q) model.

From the spectral results of Chapter 12, the spectrum of the fractionally differenced noise exists and equals (sigma_a^2 / 2 pi) |1 - e^{-i omega}|^{-2d}, although it diverges at the zero frequency. Now, for the ACF, using an integral result given in Gradshteyn and Ryzhik (1965), we have

rho_k approximately c k^{2d-1} for large k,   (19.1.8)

where c is a constant with respect to k. Thus, the ACF of the fractionally differenced noise decays very slowly, a phenomenon similar to the sample ACF of a nonstationary time series discussed in Chapter 4.
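To make the binomial expansion of (1 - B)^d concrete, the sketch below (illustrative code, not from the text) computes the weights of the fractional difference operator from the recursion pi_0 = 1, pi_k = pi_{k-1}(k - 1 - d)/k, which follows from the gamma-function form of the binomial coefficients, and uses the analogous MA(infinity) recursion psi_k = psi_{k-1}(k - 1 + d)/k to generate an approximate fractionally integrated noise series.

```python
import numpy as np

def frac_diff_weights(d, n_weights):
    """Weights pi_k of (1 - B)^d from the binomial expansion."""
    pi = np.empty(n_weights)
    pi[0] = 1.0
    for k in range(1, n_weights):
        pi[k] = pi[k - 1] * (k - 1 - d) / k
    return pi

def simulate_fractional_noise(d, n, rng, trunc=2000):
    """Approximate fractionally integrated noise via a truncated MA(infinity) expansion of (1 - B)^{-d}."""
    psi = np.empty(trunc)
    psi[0] = 1.0
    for k in range(1, trunc):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    a = rng.normal(size=n + trunc)
    return np.convolve(a, psi, mode="full")[trunc:trunc + n]

rng = np.random.default_rng(42)
d = 0.3
z = simulate_fractional_noise(d, n=1000, rng=rng)
pi = frac_diff_weights(d, 50)
w = np.convolve(z, pi, mode="full")[:len(z)]     # approximate (1 - B)^d z_t
print("pi_1..pi_4:", np.round(pi[1:5], 4))        # e.g., pi_1 = -d
print("variance of z: %.3f, of the fractionally differenced series: %.3f"
      % (z.var(), w[50:].var()))
```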
A stationary process is said to be a short memory process if its ACF is geometrically bounded, i.e., if |rho_k| <= c r^k, where c is a positive constant and 0 < r < 1. The stationary ARMA models that we studied in earlier chapters are short memory processes. On the other hand, a stationary process is said to be a long memory process if its ACF follows an asymptotic hyperbolic decay, i.e., if rho_k is approximately c k^a as k becomes large, where a < 0. A long memory process with an ACF that is not absolutely summable, i.e., sum_k |rho_k| = infinity, is further said to be persistent. From (19.1.8), it is clear that the ACF of the fractionally differenced noise follows an asymptotic hyperbolic decay; hence, the fractionally differenced noise process, and thus the ARFIMA(p, d, q) process, is a long memory process. Moreover, sum_k |rho_k| converges only if (2d - 1) < -1, i.e., d < 0, so the fractionally differenced noise process, and thus the ARFIMA(p, d, q) process, is in fact persistent when 0 < d < .5.

19.1.2 Practical Implications of the ARFIMA Processes

An ARFIMA process contains several characteristics that, in finite samples, are similar to those of a nonstationary process. For instance, the ACF of a persistent ARFIMA(p, d, q) model decays very slowly, and realizations of both persistent ARFIMA models and nonstationary ARIMA models have periodograms that diverge at the zero frequency. These similarities often lead to model misspecification. For example, a stationary ARFIMA model could be misspecified as a nonstationary ARIMA model; a misspecified nonstationary model will therefore produce a bias and a forecast error with an inflated variance. On the other hand, differencing a long memory series to remove the apparent nonstationarity amounts to overdifferencing, and the consequence of this overdifferencing has some undesirable effects on parameter estimation and forecasting. In terms of forecasting, we showed in Chapter 5 that the forecast based on a stationary process converges to the mean value of the process; the forecast of a long memory process should also converge to the mean value of the process, although it will converge at a much slower rate. For more details, see Crato and Ray (1996).

19.1.3 Estimation of the Fractional Difference

For the ARFIMA model given in (19.1.1), let W_t = (1 - B)^d Z_t, and let f_W(omega) and f_Z(omega) be the spectral density functions of {W_t} and {Z_t}, respectively. Because d is known to be less than .5, W_t is a stationary ARMA(p, q) process, and f_W(omega) is the spectral density of a regular ARMA(p, q) model. Then

f_Z(omega) = f_W(omega) |1 - e^{-i omega}|^{-2d}.   (19.1.11)

Note that f_Z(omega) tends to infinity as omega tends to 0 when d > 0. Taking logarithms on both sides of (19.1.11), we get

ln f_Z(omega) = ln f_W(omega) - d ln |1 - e^{-i omega}|^2.

Replacing omega by the Fourier frequencies omega_j = 2 pi j / n, j = 1, ..., [n/2], where [.] is the greatest integer function, and adding ln I_Z(omega_j), the periodogram of {Z_t}, to both sides gives

ln I_Z(omega_j) = ln f_W(omega_j) - d ln |1 - e^{-i omega_j}|^2 + ln [I_Z(omega_j) / f_Z(omega_j)].

For omega_j near zero, ln f_W(omega_j) is approximately ln f_W(0), so we obtain the regression relationship

Y_j = c + d X_j + e_j,

where Y_j = ln I_Z(omega_j), X_j = ln |1 - e^{-i omega_j}|^{-2}, c is approximately ln f_W(0), and e_j = ln [I_Z(omega_j) / f_Z(omega_j)]. Here we recall from Chapter 13 that the sequence I_Z(omega_j) / f_Z(omega_j), and therefore the e_j, are approximately i.i.d. random variables. The regression is carried out over j = 1, 2, ..., m, where m << (n/2) is chosen such that m/n goes to 0 as n goes to infinity. The least squares estimator of d is then the usual slope estimate from this regression of Y_j on X_j,
+j and Bj.494 CHAPTER 19 Long Memory and Nonlinear Processes Geweke and Porter-Hudak (1983) showed that The 95% corhdence interval for d is obtained by In practice. can always be written in the form where t/io = 1. and fiom. and Crato (1999).1) is the general form of a linear process. however.2 Nonlinear Processes With no loss of generality. . and 03 $I ( B ) = (4BiIf zjzo $.2. section 12. . 19. a time series process can be written as . a. It is known that a nondeterministic linear time serie-s process Z. . Once d is estimated. One should beware.[fi]. 4 m. The estimated model can then be used to forecast the future values with the results discussed in Chapter 5. we now consider a zero mean process. More generally. Wei. Specifically. if + .) random variables.is often known as the nth-order cumulant spectrum or the nth-order polyspectrum. Note that E[exp (itZ)] is the characteristic function of Z. because of the nth-order stationarity.2. In addition..2..in (19. we also require that (a. k2.) = ln{E[exp(itlZt 4. Then.3).it2Zl+k. a .+ a 1).2.} in (19. the second-order polyspectntm is and the third-order polyspectmm becomes . 19. . As a result.495 19. . we choose cumulants.. . then its Fourier transform exists and equals where -T 5 wi 5 a for i = 1. exists and is defined to be the coefficient of int1t2 the Taylor expansion about the origin of . .1 Cumulants. Clearly. Following the simiIar results from Chapter 12.I. C(kl.4) is also known as the cumulant generating function. f (wl. For a Gaussian random variable with mean p and variance a2. 12 5 w. (19. kn-. w. and Tests for Linearity and Normality Other than regular moments.4) ..t.i ( 4 +wk2). are absolutely summable. In such a case. -T 5 I. the function. k2. t. .). .itnZt+k. . Because of their many desirable properties. if the nth-order cumulants. .3) be a process of iqi. C(kl. for the models to be presented in this section. The higher-order moments are called for even with a weakly stationary process. Z. rp(tl. . the 11th-order t. . Clearly.-~)..3) is linear if it contains only the first term.2. It also follotvs that its cumulants of order greater than two are zero. in (14-2. In addition. .2..f2. .7) kl=-co k2=-w which is also known as a bispectrum. many higher-order moments are available for describing a process. in cumulant. tz. . kn-.2 Nonlinear Processes The process. all its joint regular moments up to order rl exist and are time-invariant. second-order properties such as autocorrelation functions and spectra are no longer sufficient to describe the genera1 nonlinear process given in (19... . . a q(tl.(0. from Chapter 2.3) is an 11th order weakly stationary process.2. . . Assume that the process 'in (19. it is known that its characteristic function is given by exp[it (it)2 u2/2j. cr.). .2.d. the cumulant generating function is independent of time origin.-l) a (14. The process is nonlinear if it contains more than the first term. Polyspectrum. 12. -i. The function. .). Similarly. To illustrate the procedure of deriving cumufants. the third-order cumulant equals (19.o) = E(Zt). it follows that all cumulants of order greater than two are also zero. I a ~([exp(itlZ. let us consider the second-order cumulant.P ) = Ykl) and the second-order polyspectrum is simply the spectrum where -T 5 u1k 'IT.9) . which is the coefficient of i2t1t2in the Taylor expansion about the origin of From the Taylor series for a function of several variables. C(kl) = E(Zt .496 CHAPTER 19 Long Memory and Nonlinear Processes Y and Z are independent.2. 440]). 
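The log-periodogram regression of Section 19.1.3 is straightforward to carry out numerically. The sketch below (illustrative only; the function names are not from the text) computes the periodogram at the first m = [sqrt(n)] Fourier frequencies, regresses log I_Z(omega_j) on X_j = -ln[4 sin^2(omega_j / 2)] = -ln|1 - e^{-i omega_j}|^2, and reports the slope as the estimate of d; the ordinary regression standard error is shown only as a rough gauge of precision rather than the asymptotic variance given in the text.

```python
import numpy as np

def gph_estimate(z, m=None):
    """Geweke-Porter-Hudak log-periodogram regression estimate of d (sketch)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    if m is None:
        m = int(np.sqrt(n))                       # common practical choice m = [sqrt(n)]
    freqs = 2 * np.pi * np.arange(1, m + 1) / n    # Fourier frequencies near zero
    dft = np.fft.fft(z - z.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2 * np.pi * n)
    y = np.log(periodogram)
    x = -np.log(4 * np.sin(freqs / 2) ** 2)        # -ln|1 - exp(-i w_j)|^2
    X = np.column_stack([np.ones(m), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (m - 2)
    se_d = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    return beta[1], se_d

# Quick check on a simulated fractionally integrated noise series with d = 0.3.
rng = np.random.default_rng(1)
trunc, n, d = 2000, 2000, 0.3
ratios = (np.arange(1, trunc) - 1 + d) / np.arange(1, trunc)
psi = np.cumprod(np.concatenate(([1.0], ratios)))
a = rng.normal(size=n + trunc)
z = np.convolve(a, psi, mode="full")[trunc:trunc + n]
d_hat, se = gph_estimate(z)
print("GPH estimate of d: %.3f (rough s.e. %.3f)" % (d_hat, se))
```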
the bispectrum and all higher-order polyspectra of a Gaussian process are identically zero. we have Now.+ it2Zt+lrllliZ. Rao t1965. Because a multivariate normal can be defined as a vector of linear combinations of independent univariate normal variables (see.+k15 at. hence. C(kl).w)(Zt+kl . for example. where p E[exp(it1Zt + it2~t+q) (o. Hence. p. for any constants a and b we have: E[exp(itlaY + it2bZ)] = E[exp(itlaY)] E[exp(it2bZ)]. One can use these higher-order polyspecka to measure the departure of the process from normality. the function is constant. Thus.19.j .2 Nonlinear Processes 497 Let It can be seen that T ( q . . Hence.and ) +j = 0 for j < 0. Testing for linearity can therefore be based on checking the constancy of the following sample statistic over a grid of frequencies . under the null hypothesis that a process is linear.and j=O where ru = ~ ( a . we have 2. w2) can be used to test whether a process is linear because under the - M - null hypothesis the process is linear. = z @ j a . 2. 1984).3) contains an infinite number of parameters. As a result. A non-Gaussian process could also have a zero bispectrum. Many nonlinear models have been proposed in the literature with the objective of describing different characteristics exhibited by time series encountered in practice that cannot be appropriately accounted for by linear processes. kl.2 Some Nonlinear Time Series Models The general form of the process given in (19. u2)and the function T ( o l . which was discussed in Chapter 13. For these reasons. 0 2 ) is statistically zero over a grid of frequencies. one should really be testing all the higherorder polyspectra. great skill is required for its successful use. In addition to the finite order form of (19. m2) or T(wl. however. we see Subba Rao and Gabr (1980. For more details of this and the previous tests. the test suffers in terms of its power.w2) equal zero at all frequencies.3) that has been called nonlinear moving average (NLMA) models. Interested readers should see excellent references in .2. the performance of the test is also affected by the choice of the lag windows. for a comprehensive test. as pointed out in Chan and Tong [1986). 19. we give below some other general finite parametric nonlinear models without details. Additionaliy.. Both the bispectrurnf(ol. The principal advantage of the above test is that it can be used for any type of nonlinear processes.esting for normality can also be based on checking whether the sample statistic f(ml. we see that. Once nonlinearity in the process is c o n h e d . From the earlier discussion. then (Y = ~ ( a ? = ) 0. the truncation points.2. W(k) is the lag window. and If the process is Gaussian. k2). Thus. and the placing of the grids of frequencies.498 CHAPTER 19 Long Memory and Nonlinear Processes where M is the truncation point. m = max(0. the next task is to build a suitable finite number parameter nonlinear model. such as logistic and exponential. we discuss the general k regime threshold autoregressive model further in the next section. and I(.p and Zt+ is a model-dependent variable. just as in linear time series modeling. The threshold autoregressive PAR)mode1 is where d is the delay parameter.. 1990).) in model 3 above is an indicator function. e . 4.p ~ Z f . d ) model) is Z. = 2 (+. 7. When the function I(. The exponential autoregressive model of order (p.3 Threshold Autoregressive Models 499 Subba Rao and Gabr (1984). q. Specifically. T and S are the location and scale parameters. 
the model becomes a simple two regime threshold autoregressive model. (0.d. . ui) random variables. . Because of its many interesting properties and close connection with the models studied in earlier chapters. d ) (EXPAR(p. in above models are all i. Priestley (1988). They are piecewise linear models where the linear relationship changes according to the values of the process. with respect to the NLAR model. 7. For example. respectively. Tong (1983.) is a smooth function. + i+ a . or an indicator function.i. The series a. The nonlinear autoregressive W A R ) model of order p (NLAR(p) model) is wheref (. and Fan and Yao (2003). 3. 1983. The models are as follows: 1. . q. let us partition the real . i= l where pi 2 0 for i = 1. the orders that have been commonly chosen are relatively small.3 Threshold Autoregressive Models Threshold autoregressive models were originally proposed by Tong (1978. The bilinear model of order ( p . .19. NLAR(1) and NLAR(2) models are the most commonly used. 1990) and Tong and Lim (1980). 8s)( W p .~ a. $1model) is 2.) is a nonlinear function. 19. . In practice. These models are in the form of general orders.d ) ~ . pk.where RI = (-m. ... a n d d following simple TAR(2. k . then the value of 2.1 Tests for TAR Models Let p = maxIpl. With the understanding that (19. p. + a!". and . and a(jl is a sequence of independently identically distributed random noise with mean zero and variance C T ~For . such as limit cycles. and the weight for each regime is equal to the probability of 2. 4jj) = 0 for i > pi. uj.1 is positive. At time (t . .d. Moreover. the number of observations in the negative regime is less than that in the positive regime (regime 2).1. separated by (k . Rk = Irk. the series has large upward jumps in the negative regime (regime I). ri) for line R into k intervals or regimes. .rn < rl < .1) as a TAR(k. is a weighted average of the conditional means of the two regimes.). = [O. and jump phenomena.. 1.3. R = i = 2. we will refer to (19. pl. The series exhibits an asymmetric pattern of increasing and decreasing. where k is the number of regimes. w).we can rewrite . If the value of 2. .3. . .p. The model contains many interesting features. Ri = fq-1. . d ) model. R.I). it tends to remain positive for a few periods before it changes to negative. which is nonzero.500 CHAPTER 19 Long Memory and Nonlinear Processes k Ri. rl).< r k .1) thresholds ris. = &I + $#jj)z.i = 1. however. 19. convenience.1 < 03 are the thresholds. = I-m.k. (0. c$) random variables. . ..3. at the next period tends to be positive. 1) model.l)as . it becomes a random level differ for different regimes. (T?. O).m). which cannot be described by a linear time series process. Many financial time series exhibit phenomena illustrated in this example.2. . d is a positive integer and is known as the delay parameter. It becomes a nonhomogeneous linear AR model when only the noise variances differ for different regimes. of the error term in the positive regime. is a self-exciting threshold autoregressive (TAR) process if it follows the model . The process is a linear autoregressive model in each regime and is nonlinear when there are at least two regimes with different linear models. and pi denotes the order of the autoregressive model in the jth regime. .-l is negative.pk). = 1 forthe where a!'). . if the value of Z. .1 Letk = 2.of the error term in the negative regime is normally larger than the variance.-.R. As a consequence. and the variance. it reduces to a linear model when k = 1. 
shift model when only the constant terms EXAMPLE 19. 1.i. The mean of the series. P 2. i= 1 where j = 1. amplitude dependent frequencies. = E(Z. A time series process 2. is a sequence of i. . = p z = 1 . in that regime. 1. where rd.. . .we can compute the one-step ahead standardized forecast error. . 342). ( n . p.3. . + . Z.l v(Tl . . . Zr-d. Suppose that the threshold variable Zt-d assume values {Zh.p 1 . p. where h = rnax (1. r1. . . . for r = rd. . For example. . Under assulnption of linearity. d) model. 1971.i for t = (p I). Petruccelli and Davies (1986) proposed a test for threshold nonlinearity using the concept of ordered autoregression.(11 . the one-step ahead standardized forecast errors Zj. n. For any given A R ( p ) regression with p observations. . = (n/lO) p. 1. .h I). . + With a proper choice of r. = $o ~ ~ = ' = l + i ~ . Tsay (1989) suggested an iterative fitting with r = r-.4). .- r&~r~(tt-d-/[+I) d + 1 .d ) . -t. After each iterative AR fitting of the model in (19.s are known to be asymptoticallyindependent and identically distributed with zero mean and unit variance. .19.. . .. let us call (Z. . we illustrate the test using the TAR(2.d . Z. the above-rearranged autoregression can be fitted iteratively. we can form the cumulative sum + + S. Tsay (1989) further developed the method into a modeling procedure for TAR models.a.d . . . d ) model.h I). . p. the least squares estimates of an AR model are consistent and the one-step ahead standardized forecast errors 2. .d . For simplicity. = i=r-+I Zi. Thus.rhn)' From the invariance principle (Feller. I) 5 i 5 (n . but with no loss of generality. . ZI-p) a case of data for the AR(p) model.h -k 1) and compute the test statistic T= max Is. are not only asymptotically independent and + + . . . . ( r ~ . we can compute the asymptotic p-value of the test statistic T by The linearity is rejected if P(T < x) is less than the sign5cance level a. We can obtain the folloiving arranged autoregression with cases rearranged according to the ordered values of the delay variable ZtPd: + + + . It is we11 known that when the mode1 is linear.3 Threshold Autoregressive Models 501 and simply call it the TAR(k.Zn-d). Let (t> be the time index of the ith smallest observation of (Zh. 2 Modeling TAR Models The AIC Approach Tong and Lim (1980) proposed a TAR modeling procedure based on Akaike's information criterion.d . .d .. . and minimize the AIC over this set.7) and giis the least squares residual of (19.h . follows a TAR(k. Let d be fixed and allow r to vary over a set of potential candidates R = { r l . .2 are chosen so that where nj is the number of observations in the arranged autoregression of the jth regime. reR . . x)lI2 is the norm of a vector x. r ~ .3.r ~ .Let Step 2.r. Based on the given r. . The orders pi for j = 1.h . and \]XI[ = (x'. d ) model.d .7). -t. construct an arranged autoregression for each regime. r)}. Search for the threshold value r.3.Under the 19earity assumption. the sample statistic F(p.. d) is asymptotically distributed as an F distribution with @ 1) y d (11 .) (19.3. Tsay (1989) suggested the least squares regression for (rd. p.3. Step 1.1) S i 5 (n . 6)model with k = 1 . let P be the maximum order to be entertained for the two piecewise linear AR models. Search for the orders of the piecewise AR models. Zco+d-p). Thus. Given any fixed values d and r.11) . when Z. > Fa.h . pl. ?) = min(AIC(d. 
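A short simulation helps visualize the kind of asymmetric, regime-dependent behavior described in Example 19.1 above. The sketch below uses hypothetical parameter values chosen only for illustration: a two-regime TAR series with delay d = 1, threshold r = 0, different AR dynamics in the two regimes, and a larger noise variance in the negative regime.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 500
z = np.zeros(n)
for t in range(1, n):
    if z[t - 1] <= 0:                       # regime 1: Z_{t-1} <= r, with threshold r = 0
        z[t] = 0.5 - 0.4 * z[t - 1] + rng.normal(scale=2.0)
    else:                                   # regime 2: Z_{t-1} > 0
        z[t] = 0.2 + 0.7 * z[t - 1] + rng.normal(scale=1.0)

neg = z <= 0
steps = np.diff(z)
print("share of observations in the negative regime: %.2f" % neg.mean())
print("std. dev. of one-step changes starting in the negative regime: %.2f" % steps[neg[:-1]].std())
print("std. dev. of one-step changes starting in the positive regime: %.2f" % steps[~neg[:-1]].std())
```

As in the example, the series spends fewer periods in the negative regime, moves more violently there, and tends to drift back toward the positive regime, a pattern a single linear AR model cannot reproduce.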
We illustrate the procedure for a TAR (2.502 CHAPTER 19 Long Memory and NonIinear Processes identically distributed but are also orthogonal to the regressors {Z(o+d-lr Thus. pz.e.p) degrees of freedom. The choice of P is subjective. Tong (1990) suggested P = 1 na with a < 2. .r.p ) - + 19.(p+l. The other cases follow similarly. gj@i) is the vector of residuals from an AR (4)fitting. + I) and the associated F statistic where the summations are over all the observations in (19. The linearity assumption is rejected if F ( ~ . we choose the value "rf r so that AIC(d. i. r2. ... 2. and the residual variances are 258. To initiate the iteration. where r. P). . 11. . 4. The estimation resufts are . In addition to the formal AIC approach. d. raO}. This tentative selection of the AR order can be based on the PACF or on other criteria such as AIC. . 1990) suggested performing a preliminary analysis of a time series using graphical methods such as reverse data plots.2. Tentatively select the AR order. is the qth percentile of the data and D = {1. The Arranged Autoregression Approach We now illustrate the arranged autoregression approach proposed by Tsay (1989).(p. however.r30. directed scatter diagrams. Select the maximum order. we use the original numbers instead of their square roots. 3) model.44. and bivariate histograms to search for evidence of nonlinear time series characteristics such as time irreversibility. Let D = {I. we should normalize the AIC by the effective number of observations (n . For details. is chosen such that . R = {rZ0. Then fit the arranged autoregressions for the given P and every element d of D. Step 3. Because the effective number of observations in the airanged autoregression varies as the value of d changes. Let D be the set of possible delay lags.19. Step 1. and nonnormality.nd). this step may take several iterations based on refined grids of potential candidates. Tong (1983. Often. . EXAMPLE 19. An estimate of the delay parameter.66 and 71. we refer readers to Tong's books. Search for the delay parameter d and hence the threshold variable ZtVd. P. The numbers of observations are 129 and 144. Let P = 11. and perform the threshold nonlinearity test. . P). . The AIC approach leads to a TAR(2. Search for the delay parameter d. to be entertained for the two piecewise 1inearAR models. we let D = (1. . d)) dED . . We now let d vary and choose its value 2 by where lid = max(d. .. i ( ~2).6).3 Threshold Autoregressive Models 503 Because r can be any value in an interval.= ~nax{. 2. In this example. cyclical behavior (limit cycles). Step 2. . T} be the set of iotential candidates of the delay parameter d. respectively.2 Consider the series of annual sunspot numbers between 1700 and 1983 that is a subseries of Series W2 discussed in earlier chapters. Tong (1990) suggested using the sample percentiles as initial candidates of r. $j is a consistent estimate of $j.15) The argument of the above selection is that if a TAR model is called for. Step 3. D p-value . Refine the AR order and threshold values in each regime using tbe available linear autoregression techniques. Search for the threshold values. EXAMPLE 19.j be the recursive estimate of the lag-j AR parameter in the arranged autoregression. Under linearity assumption.+. Refine the results. then a reasonable starting point for the delay parameter is to begin with the one that gives the most significant result in the threshold nonlinearity test. Thus. hence D = 11.. 6 A A Gj Step 4. 
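The arranged-autoregression idea behind the threshold nonlinearity test described above can be sketched in a few lines of code. In the rough implementation below (illustrative only), cases are sorted by the delay variable Z_{t-d}, an AR(p) is refitted recursively over the arranged cases, standardized one-step predictive residuals are collected, and these residuals are regressed on the original regressors to form an F statistic; the leverage correction in the standardized forecast errors and the exact small-sample conventions (initial sample size, degrees of freedom) are simplified here, so this should be read as a sketch of the idea rather than a faithful reproduction of Tsay's (1989) procedure.

```python
import numpy as np

def threshold_nonlinearity_F(z, p, d):
    """Arranged-autoregression F statistic for threshold nonlinearity (simplified sketch)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    start = max(p, d)
    t_idx = np.arange(start, n)
    X = np.column_stack([np.ones(len(t_idx))] + [z[t_idx - j] for j in range(1, p + 1)])
    y = z[t_idx]
    order = np.argsort(z[t_idx - d])          # arrange cases by the delay variable Z_{t-d}
    X, y = X[order], y[order]

    n_init = len(y) // 10 + p + 1              # initial recursive sample (rule of thumb)
    e_std, X_pred = [], []
    for i in range(n_init, len(y)):
        beta, *_ = np.linalg.lstsq(X[:i], y[:i], rcond=None)
        resid = y[:i] - X[:i] @ beta
        s = np.sqrt(resid @ resid / (i - p - 1))
        e_std.append((y[i] - X[i] @ beta) / s)  # standardized one-step predictive residual
        X_pred.append(X[i])
    e_std, X_pred = np.array(e_std), np.array(X_pred)

    # Under linearity the predictive residuals are unrelated to the regressors.
    beta, *_ = np.linalg.lstsq(X_pred, e_std, rcond=None)
    resid = e_std - X_pred @ beta
    df1, df2 = p + 1, len(e_std) - (p + 1)
    F = ((e_std @ e_std - resid @ resid) / df1) / (resid @ resid / df2)
    return F, df1, df2

rng = np.random.default_rng(0)
n = 400
z = np.zeros(n)
for t in range(1, n):                          # a two-regime TAR series for illustration
    z[t] = (-0.5 * z[t - 1] if z[t - 1] <= 0 else 0.6 * z[t - 1]) + rng.normal()
F, df1, df2 = threshold_nonlinearity_F(z, p=2, d=1)
print("F = %.2f on (%d, %d) df" % (F, df1, df2))
```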
the estimate #j begins to change and the t-ratios start to deviate. Once the recursion reaches the threshold value r.1. pvalue(2) = min{p-value[fi(~.1 The p-values of F tests for the annuat sunspot numbers (Series W2). To be comparable with the AIC approach. if necessary. 2. 11).3 We now analyze the same subseries of 284 annual sunspot numbers using the arranged autoregression method.d ) ] 1. the threshold values can be identified through the change points in the convergence pattern of the t-ratios from a scatterplot of the recursive t ratios of versus the values of the threshold variable Z. and the t-ratios of +j will behave exactly as those of a linear AR model before the recursion reaches the threshold * value r.3. The results of threshold nonlinearity F tests based on the given P and D are summarized in Table 19. deD (19. TABLE 19. Thus. Let b. more precisely. To search for the . . . we select ZlW2as our threshold variable.504 CHAPTER 19 Long Memory and Nonlinear Processes or. The value d = 2 gives the minimum p-value. we also choose P = 11. . 10. From the plot. 2) model: The numbers of observations are f 18. we construct a scatterplot of the t-ratios of #2 versus the ordered values of Zfd2in Figure 19. respectively.23. 11. 89. 125.1 Scatterplot of recursive t-ratios of the lag-2 AR coefficient +z versus ordered 2t-2for the sunspot numbers.1.19.11. suggesting that rl = 35 and i-2 = 70.68. We then use AIC to refine the AR orders in each regime.3 Threshold Autoregressive Models 505 TIGURE 19. once at Zt-2 = 35 and again at around Z1-2 = 70. and the residua1 variances are 174. and 89. 10. and 66. we see that the t-ratio changes its direction twice. h threshold value. The results lead to the foflowing TAR(3. be a stationary Gaussian process. (b) Perform the linearity test on the series.d. There are many studies on nonlinear models and their applications. (b) Build a TAR model for the series.1 Consider the AIZFIMA(0. and Pena. (c) Build a TAR model for the series.i a. . 19.13) and (19. Discuss the different phenomena of the model when [ ~ . ~ = (1 19.2. Show that its bispectrurn is identically zero.5 Consider Series W2 of the yearly sunspot numbers from 1700 to 2001 given in the appendix.-l 3. Compare and comment your TAR model with the standard AR models obtained in Example 6. = -F + i e + b l z : . Chan (1990).4.i. we refer readers to Subba Rao and Gabr (1984). 19.3). Tsay (1989.4 Consider the EXPAR(2.3 Show that the bilinear model.8B)a. and show that the EXPAR model behaves rather Like the TAR model where the coefficients change smoothly between two extreme values.2 Let Z. . 19.a. = qbIZ. 1) random variables. are 19. 1) model.6 Consider Series W7 of the yearly number of lynx pelts sold by the Hudson's Bay Company in Canada from 1857 to 1911 given in the appendix. Tiao. (a) Test the linearity of the underlying process using the bispectrurn test. Compare and comment your TAR model with the standard AR models obtained in Example 6. z:=~(+~ + 19.l ) ~ t . - . For more examples.B ) .506 CHAPTER 19 Long Memory and Nonlinear Processes Chan (1989) gave a detailed analysis of annual sunspot numbers including TAR models in (19. 2. (1 . . (a) Find and plot its spectrum.~ ~ is small. (b) Find and plot its autocorrelation function.$zZ.. 1991).~ 1is large and when I Z . Tong (1990).3.. N(O.16).3. (a) Perform the normality test on the series. can be expressed in the form of (19. Z. and Tsay (2001) among others. Because their ability to capture the asymmetric phenomena. Zt i.-l 3. .. 
(c) Comment your hdings in part (a) and (b). 1) model.-la. the models or the hybrids of these models such as TAR-GARCH models are found useful in many economic and financial studies.2.7. where the a. In many studies. Although time series observations may be available. a time series variable is either a flow variable or a stock variable. Thus.Aggregation and Systematic Sampling in Time Series In any scientific investigation. based on theoretical grounds. a . Suppose that. the assumed time unit in the model may not be the same as the time unit for the observed data. data often are available only through aggregation or systematic sampling. A flow variable such as industrial production exists only through aggregation over a time interval. controlled experiments may not be possible. we study the consequences of aggregation and systematic sampling on model structure. In this chapter. Generally speaking. In some cases. after making a conjecture or a proposition about the underlying model of a phenomenon. parameter estimation. A stock variable such as the price of a given commodity exists at every time point. a scientist proposes an underlying model for a time series process in terms of a basic time unit t. 20. a scientist must test his or her proposition against observable data. forecasting. however. In some fields.B ) ~ zfollows . the investigator often cannot choose the time interval. a scientist can test his or her proposition by analyzing the data from a designed experiment in terms of the same time unit t used in his or her proposition. The values of a flow variable are usually obtained through aggregation over equal time intervals. whereas the values of a stock variable are often obtained through systematic sampling.1 Temporal Aggregation of the ARIMA Process Let zt be the equally spaced basic series whose dth difference ivf = (1 covariance stationary process with mean 0 and autocovariance function . and various time series tests. and we refer to z.' Thus. 20.1. Thus. defined as where T is the aggregate time unit and m is fixed and is called the order of aggregation.. if t represents the time unit of month and m is 3. We call ZT an aggregate series. Not+. then the ZT are quarterly sums of the monthly series 2. by letting we have . for example. and its aggregate series ZT. as a basic or nonaggregate series.1 The Relationship of Autocovariances between the Nonaggregate and Aggregate Series To derive the relationship between the autocovariances of the nonaggregate series z. we first define the following nt-period overlapping sum: and note that Let be the backshift operator on the aggregate time unit Tsuch that L ~ Z T= &--I.508 CHAPTER 20 Aggregation and Systematic Sampling in Time Series Assume that the observed time series ZT is the m-period nonoverlapping aggregates of z. Stram and Wei (1986b) show that the autocovariance function y. we can.1 Temporal Aggregation of the ARIMA Process 509 Hence. and y.. .. .and yw(l). . .1) columns corresponding to yw[-(d I)(m .~ ) ~ (Because ~+~). + - 1) columns EXAMPLE 20.yLl(k)is the linear transformation of the autocovariances y W O from j mk . em write this transformation in the follo~ving matrix form: + + + The coefficient matrix A is equal to + + + + .(j .l)(m . y. is the resufting matrix formed by this deletion of the first (d -t l)(nt and adding them to the proper remai&ng columns of the matrix A. delete the first (d l)[ni .1) I] vector of Ci.(k) for ~ v and .(d 4. respectively.is a 1 X . Thus. . in the above matrix.1). + + where A$. . 
which is the coefficient of B' in the polynomial (1 B B ~ .aggregate series.1) 10j = rnk (d l)(m . . + where 0.1 To clarify the construction of the matrices A and A. the autocovariance function yU(k) for UT are related as follows: - where B now operates on the index of yw(J3 such that By. the dth difFerenced.t vector of zeros and C is a 1 X [2(d l)(m . . UT = (1 .I)]. Thus. by adding them to the columns corresponding to y. . .(-1).. covariance stationary as it is a finite moving average of a stationary process w.B ) d ~ Tis.1).&? = y...20. we consider an MA(2) model. ~ m . The coefficients in the Iinear transformation are found by expanding the polynomial (1 f B 4.~ ) % ~ +We l ) .(-k) = yw(k) for all k.E(d + l)(az .I)]. have d = 0.(f) = (-el 4. .1. By using (20. + + where yA0) = (I + a 2 ) a 3 = 3[1 81 + O$](T.1.9).1. and from equation (20.8). nt = 3. The parameters.1 1) is reduced to Thus. To find the model for its third order aggregate.8162)u.(O) = (1 Of B$)ai. and (TI through 'the relations. + 4[-01 + f1192]~.8 2 d . O and oi. [jl > 2.1. the third aggregate of an MA(2) model is an MA(1) model.2. yz(2) = . equation (20.B2. ~[-o~]cT$.510 CHAPTER 20 Aggregation and Systematic Sampling in Time Series + + where y. by (20. & = (1 + B + B * ) z ~we ~ . and Y2(3= 0.and y d l ) = -0u3 = 2[-02]u: + [-dl + d182]u2.7) Hence. y.of the aggregate model for ZT can be found to be functions of 01. and [XI is used to denote the integer part of x. one need only consider the first (q* 1) rows and (q 1) columns of the coefficient matrix A$.OIB . We would like to know the corresponding model for the ntth-order aggregate series & defined in (20.(k) for which ml . which implies that the n~th-orderaggregate series &-= (1 B -IB'"l)~mTfollo~san IMA(d. are white noise series with mean 0 and variance a:. q) model. let us consider the case when m = 3 and the model for z. yv(2) will be identically zero if I > q* = [d 1 + (q . q* = 1.LjjdzT.1. equations (20. In fact. we see that yu(l) is a weighted sum of those y. = (1 .+ where the AT is a white noise series with mean 0 and variance ui. is an MA(2) process.(d -I 1)(nz . q) Process Suppose that the nonaggregate series zt folfows an IMA(d. . . They can be derived from solving the system of equations in (20.1). we have + + + It should be noted that q* in (2011. Let UT = (1 . yufq*).1.1).1.1. which reduces to a much simpler form for an IMA(d. The parameters Pi's and ui are functions of 8. ?%us. .20. q) model where the a. by (20.2 Temporal Aggregation of the IMA(d. yu(l). .1 Temporal Aggregation of the ARlMA Process 511 and 20. Because y.1) Ik 5 mE + (d lf(rn .. Letting A $ ( ~ be ) this submatrix consisting of these first (q* t 1) rows and (q 1)columns of A$.Also.15) is only an upper bound. From (20.1. . z. No) process + + + .d . h this case.15) impfy that in shrdying temporal aggregation for the IMA(4 q) model.the only nonzero autocovariances of UT are at most yu(0).€J2~')a.1.9).s and a:..1.7).l)/m]. To see that.13) and (20.7).(k) = 0 for lkl > q.. we have that and where ~ v = . z.. into b distinct sets Ai such that sk and sj E Ai if and only if 6f: = Sjm. let b equal the number of distinct values Sin for i = 1. (20. and (20. @P)z. . + a .- .20) Let &(B) = (1 .. . .. if %2 = 01/(01 . each with multiplicity q such that si = p. .16). 20.. Now. For my given value of m. + + .512 CHAPTER 20 Aggregation and Systematic Sampling in Time Series Thus.# I ~ B P )and 6 ~ for ' i = 1. . . . partition the numbers si. from (20.p*. 
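The MA(2) aggregation result in Example 20.1 is easy to check by simulation. The sketch below (illustrative, with arbitrary parameter values) generates an MA(2) series, forms third-order non-overlapping aggregates Z_T = z_{3T-2} + z_{3T-1} + z_{3T}, and compares sample autocorrelations: the aggregate behaves like an MA(1), with only the lag-1 autocorrelation clearly different from zero.

```python
import numpy as np

def sample_acf(x, nlags):
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([x[:len(x) - k] @ x[k:] / denom for k in range(nlags + 1)])

rng = np.random.default_rng(2024)
n, m = 30000, 3
theta1, theta2 = 0.6, -0.3                         # arbitrary illustrative MA(2) parameters
a = rng.normal(size=n + 2)
z = a[2:] - theta1 * a[1:-1] - theta2 * a[:-2]     # z_t = (1 - theta1 B - theta2 B^2) a_t
Z = z[: (n // m) * m].reshape(-1, m).sum(axis=1)   # non-overlapping 3-period aggregates

print("ACF of z   (lags 1-4):", np.round(sample_acf(z, 4)[1:], 3))
print("ACF of Z_T (lags 1-4):", np.round(sample_acf(Z, 4)[1:], 3))
```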
then yufl) = 0 and the aggregates become an uncorrelated white noise series.p* be the distinct roots of &(B).#lB .3 Temporal Aggregation of the AR(p) Process Suppose that the nonaggregate series z.1. . i = 1. . * ..8).2).1.1. Furthermore.p*. and UT = ZT.= (1 B Bm-')zmT. &.folfows an ARMA(M.1. = a.9).1. . . Then Stram and Wei (1986b) show that the nzth-order aggregate series.. (20. follows a stationary AR@) process (l-$lIB-. N l ) model ~71. so that Thus. which are and 1 They are distinct.(2 + 1)/3] = 1. the AT are white noise with mean 0 and variance -3. q ) model . = -(.20. the process given in (20.1.1. 61 = S$ = 5. 20. For a more detailed discussion. Thus. follows the AR(2) model with = -(. and CT: ~n = 3. q) Process Now assume that the nonaggregate series follows a mixed ARTMA(p.24) has a hidden periodicity of order 3. HenceM = 1 andNl = [l f 1 . we first find the roots of #p(B) = (1 = 1. It is worth noting that A model is said to have hidden periodicity of order m if the reduction of the AR order on aggregation in the above example is due to the hidden periodicity. and the third-order aggregate series. see Stram and Wei (1986b).2 Assume that the nonaggregate series z.1 Temporal Aggregation of the ARlMA Process 513 where maxAi = the largest element in ini. Clearly.5)'j3.413 . sl = q = 1.5)2fl. EXAMPLE 20. d. I) model. # Sj but ST = 6y.# 2 ~ 2 ) . pj. But. and u$are functions of +. ZT. p* =: 2. follows anARMA(1. and ai.4 Temporal Aggregation of the ARIMA(p. To derive the aggregate model for ZT when .s and ui. d. each with multiplicity 1. whichimplies that b = 1 and$II = ( 1 ) . 514 CHAPTER 20 Aggregation and Systematic Sampling in Time Series where and the a.the mth-order aggregate .'.(B) are assumed to have their roots outside the unit circle and to have no roots in common. are white noise. the AR order is unchanged by aggregation and that the roots of the AR polynomial of the aggregate series model are the mth powers of the roots of the nonaggregate series AR polynomial. If the ARIMA(p. %. the aggregate series ZT = ZmTfollows an ARIMAf p.3.26). N2) model where the AT are white noise with mean 0 and variance o j .1. q) model of the nonaggregate series z. has hidden periodicity of order ni in the AR polynomial. Letting &(B) = llf==I(l.where Thus. d.SjB) and multiplying on both sides of (20. d.Note that in this case. then S? = Sf' if and only if ai = Si. we get Let It is easily seen that for K > N2. . The polynomials $(B) and O. then by the results of Section 20. . Additionally. are the roots of +p (B).1. .s. . i = 1. we also assume that the model has no hidden periodicity of order m in the sense that if a.p. and 02. and the parameter Pi's and a$ are functions of &'s. 5 The Limiting Behavior of Time Series Aggregates To see the implications of time series aggregation. is a white noise process with mean 0 and variance a:. QIs model with a seasonal period S.when s = mS for some integer S and nz < S.When m 4 m. d. the parameters Pi's and mi are functions of 4. d.1. M is determined by (20. (e where which all have roots outside of the unit circle. and the MA order N is given by More generally. Tiao (1972) shows that the limiting model for the aggregates. will follow a mixed ARIMA(M. parameters a i s are functions of #j's. d. N by (20.22). it is interesting to study the limiting behavior of the time series aggregates. Assume that the nonaggregate series z. .1. follows a general multiplicative seasonal ARIMA(p.26). q) X (E: D. it is easily seen that the mth-order aggregate series. 
will follow a multiplicative seasonal m A ( M . q) model in (20. When ni r S.s.31)..1. Suppose that z. ZT. N) model. 20. Oils.22). The AR order M is determined by (20.e.Q). Using the results from previous sections.. where have roots outside of the unit circle and the a. and the AT is a white noise process with mean 0 and variance ~ 1The .1. model with a seasonal period s. i. &-. follows an ARIMA(p. aggregation of seasonal ARIMA models was studied by Wei (1978b). N) X D. aggregation reduces a seasonal model to a regular AIUMA model.1 Temporaf Aggregation of the ARIMA Process 515 series.20. and cr.1. d. &-. d ) process. with no loss of generality. Wei (1978b) shows that the limiting aggregate mode1 of ZT becomes an IMA (D d. = (1 . follows a seasonal ARWIA(p. d ) model.as + where A$(d) becomes a (d -t. 2. to study the limiting model of a time series aggregate. yw(j)'s. which is independent of p and q.(d)is nonsingular. then 1 is also a root of the moving average polynomial for the Zjprocess. The Limiting model of the time series aggregates from a stationary model is a white noise process. we have where and a\ is the first row of ~ $ ( d )Note .3)d are related.B)'ZT and the autocovariances yw(j?for rv. of the underlying IMA(d.516 CHAPTER 20 Aggregation and Systematic Sampling in Time Series exists and equals the IMA(d. to the autocovariances. of the IMA(d. D + d ) process. d. q) X D. yutj)'s. follows an IMA(d. Upon normalization. 1) X (d 1) square matrix. the autocovariances yu(k) for UT = (I . d ) model for z. Wei and Stram (1988) show that the results hold also for norinvertible models and that if 1 is a root of the moving average polynomial for the zt process. model. by (20. ( d the ) aggregation matrix for the linear transformation that maps the autocovariances. From the construction. When z.1. d) aggregate model for G. that .16). Two interesting consequences of the results are the following: (e + 1. The limiting model is unique and is independent of the stationary portions of the nonaggregate model. In this case. Wei and Stram (1988) call the matrix ~ f . As a result.Q). one needs only consider the case where the underlying nonaggregate series z. it is easily seen that the matrix Ad. (d))'. Now.1. y. ( 1 .35)is continuous. Wei and Stram (1988) prove that the autocorrelation vector p y ) for the invertible limiting aggregate mode1 in (20. the eigenvector is scaled so that its first element equals 1. ( d where ). then Thus.40)is given by the eigenvector associated with the largest eigenvdue of ~ $ .If then Hence. roots of 8$?')(23) = 0 are all outside the unit circle. sip. and the AT are white noise with mean 0 and variance ui. by (20. Let the invertible limiting model of aggregates be - where 8iw)(B) = (1 . Let fm be the mth-order aggregation transformation given in (20. Wei and Stram give an interesting note .B & ~ ) L ?the ~ . > 0 and the aggregation transformation defined in (20.BldzT. . .1. = (y.1. if we let p r ) be the autoconelation vector of the limiting model.34). . yw(l).0jpO)f3. .1.1 Temporal Aggregation of the ARIMA Process where y. .Hence.. the limiting autocorrelation vector p r j is simply the eigen-vector of the aggregation matrix A$(d) with the associated eigenvalue a i p r ) . the autocorrelation vector of a limiting model is unaffected by aggregation. 517 .B)~z.and which is the variance of UT = (1 .y..(O). That is.35).20.(O) is the variance of ~ v =. we can choose m = 2.1.(k) of {[I . 
Table 20.2 repeat the sample autocorrelation functions of the original series and its first differences. For convenience.35) is continuous.).518 CHAPTER 20 Aggregation and Systematic Sampling in Time Series TABLE 20.e.1 and 20.2 Sample autocorrelations j?~. is stationary following an MA(1) model.3 Sample autocorrelations $o(k) of ((I . let us consider the number of monthly unemployed young women per 1000 persons between the ages of 16 and 19 in the United States from January 1961 to August 2002. Tables 20.show nonstationai-ity.. which was examined in Chapters 5 and 7.B)ZT] = {UT) . The limit can then be taken through n$ as k -+ a. we compute the sample autocorrelations of for nt = 6 and rn = 12 corresponding to the semiannual and annual total of unemployed young women. i. --+ EXAMPLE 20.B ) t . ( dcan ) be any integer larger than or equd to 2. TABLE 20. The autocorrelations for &. To examine the limiting behavior of the autocorrelation function of the aggregates.3 shows the autocorrelations of the differenced aggregates.Specifically. ) = {w.1 Sample autocorrelations &(k) of (z. It is evident that the original series z. is nonstationary and that the series of its first differences (1 .3 To illustrate the above results.B)z. the nt in ~ d . TABLE 20. which greatly simplifies the calculation of the coefficients of ~j in the polynomial (1 f 3 f and hence simplifies the construction of A$d) as illustrated in the following example.). when deriving the aggregate model. that because the linear transformation given i n (20. Because rank (AI(1) . When m = 12. there is one free variable.. the .. the autocorrefation structure of the Limiting aggregate model becomes Taus. det(Ai(1) . let us derive the invertibfe autoconelations &)(o) and p r ) ( l ) of the limiting aggregate model (1 .1.16).1. by (20. and it can easily be shown that Thus.3 is expected from the limiting behavior of temporal aggregates. as shown in Table 203.AI) = 0.7) gives y. 1) model for the aggregates &. with 500 observations of 6. k and (20. 1. (20.81) an associated eigenvector is =. Take nz = 2 and consider the Iimit of mk as k + 03. is and the eigenvalues are easily seen to be 2 and 8.B)zT = o [ ~ ) ( B ) A ~ .8Z)x = 0.Table 20.1 Temporal Aggregation of the ARIMA Process 519 Is the above phenomenon expected? To answer this question.e. The result with nr = 6. A. = 8 and its corresponding eigenvector is the solution of the system of equations.20. ( 22).1. For example. 1)model for 2.(k) = (1 ~ ) ~ ~ . the result shown in Table 20. The characteristic equation.1.2 clearly implies an IMA(1. also implies an IMA(I. i. This result may not be evident purely based on empirical values.9) imply that + + Hence.8) and (20.. Thus. ( ~ 4 ( 1 ). For m = 2 and d = 1. For each x in C. 2 ) + (y.B)ZT when nl = 6 and becomes .. 11. y ) for any real number a. 20. a(bx) = (ab)x. From an inner product. In summary. (y + 2). (a + b)x = ax -t. 2. it also tends to simplify the model fonn. This reduction in fiu(l) is a direct result of temporal aggregation.3.41 for $v. 2. however. however. the value of i U ( l )is .bx. (x + y) + 2 = x 3.x+y-y+x. y) = a(x. ( x t. The operation of addition and scalar multiplication satisfies 1.there exists a unique . 2. Two elements x and y are said to be orthogonal if (x.white noise phenomenon. 3.1 Hilbert Space A linear vector space L is a nonernpty set that is closed under vector addition and scalar multiplication such that for any elements x. x ) 2 0 and (x. As the order of aggregation becomes large. 3.x ) = 0. y). (x. 4. 
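The limiting behavior discussed in Example 20.3 can also be illustrated by simulation. Assuming only the aggregation scheme already defined, the sketch below aggregates a simulated random walk for increasing orders m and reports the lag-1 sample autocorrelation of the differenced aggregates (1 - B)Z_T; it should settle near 0.25, consistent with the limiting IMA(1, 1) structure derived above.

```python
import numpy as np

def lag1_acf(x):
    x = np.asarray(x, dtype=float) - np.mean(x)
    return (x[:-1] @ x[1:]) / (x @ x)

rng = np.random.default_rng(5)
n = 600_000
z = np.cumsum(rng.normal(size=n))                  # a random walk: (1 - B) z_t = a_t

for m in (1, 2, 6, 12, 48):
    Z = z[: (n // m) * m].reshape(-1, m).sum(axis=1)   # m-period non-overlapping aggregates
    U = np.diff(Z)                                     # U_T = (1 - B) Z_T
    print("m = %3d   lag-1 autocorrelation of (1-B)Z_T: %.3f" % (m, lag1_acf(U)))
```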
The operation of addition satisfies l.2 The Effects of Aggregation on Forecasting and Parameter Estimation 20. often denoted by [x[. ( x . a ( x + y ) = a x + a y .34 for Z& = (1 . we define as a norm of x. A simple AR process becomes a mixed ARMA model upon aggregation. the sample autocorrelation for this case indicates a. y) = 0. z in C and any real numbers a and b. m. There exists a unique element 0 in C such that 0 + x = x.B)z. 4. ~ = ) @.y. The operation of scalar multiplication satisfies 1.x in C such that x + ( . lx = x. It is easy to see that the finite linear combination of zero mean random variables is a linear vector space and E(XY) is an inner product.520 CHAPTER 20 Aggregation and Systematic Sampling in Time Series standard deviation of iU(k)is approximately equal to -2.2. (ax. = (1 .25 when nr = 12.TI. 2. as shown in Table 20. y. temporal aggregation may complicate model structure.y [ is defined as the distance between x and y and is called the metric of the space.. x) = 0 if and only if x = 0. Even though the value of i w ( l )is -. 2). An inner product is a real-valued function on a space C denoted by (x. 2 ) = (x. . It satisfies the following conditions: Then [x . such that the following are true: 1. the operations satisfy the following properties: I. hence. :t 5 N } . The point p(x) is called the orthogonal projection of x on ?lo. Under the inner product (I.*l.. this orthogonal projection is the same as the .) = E(z.p(x) [ = rnin.p(x)] is the residual of x from 'Ho. [x .2.then 71 = 7fo @ H & . is said to be a Cauchy sequence if A Hilbert space. and x. in the space.2 The Effects of Aggregation on Forecasting and Parameter Estimation 52 1 A sequence x. By the above Projection Theorem.. k2. be a zero mean stationary process. then ax by is also in the subset for any real numbers a and b. Thus.:t = 0.).y [. 20.41). Consider the closed linear manifold L{z. Then the sum of H I and 'Hz is defined as the set If 'HI and Z2are orthogonal subspaces in the sense that every element x E 'HI is orthogonal to every element y E 'Hz. or Koopmans 11971.ptxfl E Xd and Lx . we are interested in forecasting ZN + 1.denoted by Ff = L(z. : t 5 N ) . Every element x 6 7-l is uniquely expressed as where p(x) E 7&. then HI 71z is closed and hence is a Hilbert space. Under the minimum mean square error criterion deduced from the inner product E(z+. In time series analysis. it is known that the closed linear manifold H. and [x . is equivalent to finding the element ? N ( I ) in LIZ. X. Suppose that 'If1 and 'If2 are subspaces of a Hilbert space 7. Section 7.is a complete inner product linear vector space in the sense that the limit of every Cauchy sequence is also an element in the space so that the space is closed.x.:t 5 N j . the best linear forecast of Z N + ~is simply its orthogonal projection on the subspace L(z.). for some 1 2 1.2 The Application of Hilbert Space in Forecasting A linear manifold is a nonempty subset of a Hilbert space such that if x and y are in the subset.xJ2 is the distance norm between the elements z. .. It is clear that C{zt:t 5 N } is a subspace of 71.61.:r 5 N ) such that the mean squared error + . In this case. Section 7. f . based on its past history. we will write the sum as @ ?f2.% [x . where 7-ld is rhe orthogonal complement of 540. finding the best linear forecast of z ~ +which ~ . x. is an element in H ' .20.We have the following useful theorem [for the proof see Halmos [1951]f: + Projection Theorem If ? islo a subspace of a Hilbert space X. 
is a Hilbert space (see Parzen f1961al Anderson [1971. based on the knowledge of past history (z.1. E(zt . Let z.. It is easy to see that X.I..1. follows the model in (20. : ~ by the same theorem. the best linear forecasts of ZN+[based on the nonaggregate process zl and the aggreg$e s e d a Q are simply the projections of ZNfl on the two subspaces and 'FI. . ~Also.2.P. then this forecast can be derived from tbe nonaggregate model of z. . . and be the closed linear manifold spanned by the mth-order aggregate series ZT. letting Zi(1) and ZN(i) be the best linear forecasts of ZN+h E 2 1. Thus. .33). follows a general multiplicative seasonal ARIMA model in (20.is a subspace of 'Hd and that H$ is a subspace of a#. and the aggregate process &.1.Thus. based on the nonaggregate process z. be the closed linear manifoId spanned by the past history of the aggregate series {ZN. 20. : ts N j . Let 7& = C ( z t : t = 0.. the forecast error (z~+ . If both nonaggregate and aggregate series are available. The ..32) and hence the aggregate series ZI. f2.3 The Effect of Temporal Aggregation on Forecasting Suppose that we are interested in forecasting a future aggregate ZM1 at time N. ~ .respectively. and ZrnN-1. forecasting from an aggregate model is in general less efficient than forecasting from a nonaggregate model. we have Thus. f l . . Suppose that the nonaggregate series z.) be the closed linear manifold spanned by the nonaggregate series z.). ZN. Let be the dosed linear manifold spanned by the past history of the nonaggregate series { z . . or from the aggregate model of ZT.522 CHAPTER 20 Aggregation and Systematic Sampling in Time Series conditional expectation E [ Z ~ + ~5 ~NZ] .~&(l)f is in the space C L { z t:t 3 N ) and hence is orthogonal to the subspace C { z . It is of interest to examine the relative efficiency between the two forecasts. 2. - . 1) = 1for all m and 1 if p = d q = 0 and s = mS for some integer S.2 The Effects of Agsregation on Forecasting and Parameter Estimation 523 variances of the forecast errors based on the two models can be explicitly calculated. we first note that where $ro = 1 and the *'s are obtained similarly but based on the nonaggregate model (20.1. the variance of the foiwasterror based on Lhe aggregate model is given by where !Po = 1 and the qj's can be calculated from the aggregate model (20.1. [ ( I ) << I. .32).33) using the method discussed in Section 5. To calculate the variance of the forecast error from the nonaggregate model. [ ( l ) = hm J. it was shown by Wei (1978b) that ((m. 3. if d = 0. I)= 1 for all I. I ) 5 1. Because and &.(I) are obviously unbiased forecasts for (20.if d > 0. Let It was shown in Wei (1978b) that the variance of the forecast error based on the nonaggregate model is where m. 4. b(nl. In fact. and Y(1) -+ 1 only when 1 + 07. 0 5 c(m. I) 5 1 for all m and I.20.2. Specifically.5) implies that 0 5 l(m.2. (n~.1) has the following properties: 1.1 Caj = 2 Gj-ii=O The relative efficiency of forecasting future aggregates using the aggregate model can be measured by the variance ratio where m is used to indicate the order of aggregation. 2.2. For large samples. The loss in efficiency is less serious for long-termforecasting. particularly when the nonseasonal component is stationary.2. the variance-covariance matrix likelihood estimator .10). It often happens that over the relevant region of the parameter space. the loss in efficiency through aggregation can be substantial if the nonseasonal component of the model is nonstationary. 
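The Projection Theorem is easy to verify numerically in the finite-dimensional Euclidean special case of the Hilbert space setting above. The sketch below (purely illustrative; the subspace and vector are arbitrary) projects a vector onto a subspace by least squares, checks that the residual is orthogonal to the subspace, and checks that the projection minimizes the distance over a few other elements of the subspace.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
U = rng.normal(size=(n, 3))                          # columns span a 3-dimensional subspace H0
x = U @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(U, x, rcond=None)
proj = U @ beta                                      # orthogonal projection p(x) of x on H0
resid = x - proj

print("inner products of x - p(x) with the basis of H0:", np.round(U.T @ resid, 10))
print("||x - p(x)||^2 = %.3f, compared with other elements of H0:" % (resid @ resid))
for _ in range(3):
    y = U @ (beta + rng.normal(scale=0.2, size=3))   # some other point in H0
    print("   ||x - y||^2 = %.3f" % ((x - y) @ (x - y)))
```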
There is no loss in efficiency due to aggregation if the basic model is a purely seasonal process. the above results imply that insofar as forecasting the future aggregates is concerned.10). let r ) be the k x 1 vector of all the k parameters. and where we use that. Using (7. the partial derivatives lii are constant over the region of approximation. . 20. in the quadratic approximation (20. we have Hence.j is given by V(4) for the maximum where I(q) = -E[lii] is the information matrix for the parameter q.524 CHAPTER 20 Aggregation and Systematic Sampling in Time Series h summary. the fog-likelihood function is approximately quadratic in the elements of r) so that where the partial derivatives are constant. and the seasonal periods s and S for the basic and aggregate models are related by s = 111s where m is the order of aggregation.4 Information Loss Due to Aggregation in Parameter Estimation For a given time series model. 12.20. Unfortunately.(?) are the corresponding information matrices based on the nonaggregate and the aggregate models. respectively. Define 7(nt) = 1 det V(6) . respectively.1. N(0. Thus.In fact. r(m) in (20.19)is an increasing function ofm and the number of parameters in the model.i. N(0. It follows that and hence To see the implication of (20. the relationship between the parameters in the aggregate and nonaggregate models is extremely complicated. the more serious is the information loss in estimation.2 The Effects of Aggregation on Forecasting and Parameter Estimation 525 Let 6 and 5 be the maximum likelihood estimators of q based on the nonaggregate model and the aggregate model.assume that P = 1. mu:). As shown in Section 20.i. if the nonaggregate model of z.d. is a purely seasonal model with the a. We have ~ ( 2 )= ~ ( 3 = ) 3.2. a:). and the nonaggregate series is a 2 5 monthly series with s -. hence.-det V ( G ) Z=- 1- det Ia(rl) det Id(?) ' We can use ~ ( m to ) measure the information loss in estimation due to aggregation. ~ ( 6=) 5. a. when estimating the parameters in the underlying model. .4. temporal aggregation leads to a tremendous loss in information. and ~ ( 1 2 )= $. then the aggregate model for the mth-order aggregate ZT with s = nzS for some integer S becomes where the AT's are i. Q = 0.19).d. being i.2. the derivation of ~ ( mfor ) a general model is difficult. however. Then and where I d ( q ) and I. The higher the order of aggregation and the more parameters in the model. a2. is now a moving average MA[(p i. and the a. Then. I3 is the backshift operator on y ~ Now. In quality control.d)l process. which is nonzero only when . .. be the reciprocal of the roots of #p(B). We examine the relationship between the underlying process zt and the model based on the sampled subseries yn where y~ = z . We now study the phenomenon of a stock variable. Let + where B = Bm. = flfX1(1 . Note that ByT = BmzmT= Z.. ~ and rn is the systematic sampling interval..p . in the stock market. it is rarely the ease that every possible value of a stock variable is observed and recorded. . ..3.B ~ ) ~ zwhich . .6 P m ) ( l . assume that the underlying process zt follows theARIMA(p. In practice.T-~ = Z. the often-reported series is the daily closing price.~Hence. (20. For example.1) can be written as Multiplying both sides of the above equation by we get Let t. 4 q) process where 4pfB) and 8&B) are stationary and invertible AR and MA polynomials.'s are white noise series with mean 0 and variance ~ 2 Let . E(CTCT-j) = E(VnlTVmT-mj). With no loss of generality. 
we discussed temporal aggregation of a flow variable. The value of a stock variable such as the price of a given commodity exists at every time point. the observed series is often only a subseries obtained by a scheme of systematic sampling. and 8.526 CHAPTER 20 Aggregation and Systematic Sampling in Time Series 20. however. respectively.(T-~) = y ~ .3 Systematic Sampling of the ARIMA Process In the previous sections. al.d)nr ( q . only every lath item is inspected. Instead.. . . the series y~ where r 5 [ { p d ) (q . autocovariance generating function.d)/nz].Now let = Then. . That is. d.p)/mj. as where (:) = . and the Pj's and the are functions of +is. r) process + + 527 + [q .(B). we consider first the case in which the underlying process zt follows an integrated moving average IMA(d.the a:s are functions of +.3 Systematic Sampling of the ARIMA Process + + n z j S { ( p d ) m ( q . (1 -~ ) ~ The z .z!/(r!(n - r)l). the erYsare white noise with mean zero and variance ma.p . q) process becomes an W ( p .2) reduces to a result obtained by Brewer (1973). and &. series. ej's. r ) process with r 5 [ p (q . . W e n d = 0. . Let We obtain the autocovariance generating function of v. q) process G-a + Let ~ v = . g.d ) } or j 5 [ ( p + d ) follows an ARIMA(p. V . (20.20. ~ T= (2 .3.B m ) d ~ m T(1 - B)dyl.d)/nl]. To study the limiting behavior of the sampled subseries.p . where (:) = 0 for n < I-. Thus. of jv. is given by where yw(j] is the jth autocovariance of the IV. the subseries systematically sampled from a stationary ARMA(p.p.s. . and Weiss (1984). as m increases. we do not repeat them here. Because an ARIMA(p. q) process can be approximated with any desired accuracy by an IMA process. It would be interesting to investigate whether the random walk model is indeed the underlying model for the stock prices or simply a consequence of systematic sampling. Werner (1982). d . The results presented in this section are taken from Wei (1981) and mainly deal with the effect of systematic sampling on model structure.as the linear causality from xt to y. . y as the instantaneous linear causality between x.. 10 for it 3 d. F.2. it approaches an IMA(d.. the effects of systematic sampling on parameter estimation and forecasting can be examined analogously to the effects of temporal aggregation discussed in Section 20.. v f2 . Other references on this topic include Quenouille (1958). + .. and y.. Thus... the above limiting result holds also for any homogeneous autoregressive integrated moving average process.1) process..4.Once the resulting model of a sampled subseries is known.1 Decomposition of Linear Relationship between Two Time Series The study of causal relationships among a set of time series variables has been one of the most important problems in the literature. Geweke (1982) derived a decomposition of linegr relationships behveen two time series variables xt and y. q) process approaches a simple random walk model.4 The Effects of Systematic Sampling and Temporal Aggregation on Causality 20. . and to Wei and Tiao (1975) and Liitkepohl (1987) for the effect of systematic sampling on forecasting.. In summary. q) process follows an ARIMA(p.d)/nt).. Systematic sampling of a stock variable is an important special case of a more general problem known as missing observations in the literature. F. In an interesting paper. .When rn becomes large.and F. as the linear causality horn y. + F. and y.as the linear relationship between x. d . 
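A quick simulation illustrates the main point of this section: systematically sampling an ARIMA(p, 1, q) series at wider and wider intervals m drives the sampled subseries toward an IMA(1, 1) and eventually toward a random walk. The sketch below (illustrative parameter values) samples every mth observation of a simulated ARIMA(1, 1, 0) series and reports the first two sample autocorrelations of the differenced subseries, which shrink toward zero as m grows.

```python
import numpy as np

def acf(x, nlags):
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([x[:len(x) - k] @ x[k:] / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(11)
n, phi = 400_000, 0.7
w = np.empty(n)
w[0] = rng.normal()
for t in range(1, n):                   # w_t = (1 - B) z_t follows an AR(1)
    w[t] = phi * w[t - 1] + rng.normal()
z = np.cumsum(w)                        # z_t is an ARIMA(1, 1, 0) series

for m in (1, 3, 10, 30):
    y = z[::m]                          # systematically sampled subseries y_T = z_{mT}
    print("m = %2d   ACF of (1-B)y_T at lags 1, 2:" % m, np.round(acf(np.diff(y), 2), 3))
```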
The random walk model has been widely used to describe the behavior of stock prices. Because reported stock prices are themselves systematically sampled values of a stock variable (for example, daily closing prices), it would be interesting to investigate whether the random walk model is indeed the underlying model for stock prices or simply a consequence of systematic sampling.

Systematic sampling of a stock variable is an important special case of a more general problem known in the literature as missing observations. For reference, we refer readers to Zellner (1966) and Jones (1980) for the effect of missing observations on parameter estimation in ARMA models, and to Wei and Tiao (1975) and Lütkepohl (1987) for the effect of systematic sampling on forecasting.

20.4 The Effects of Systematic Sampling and Temporal Aggregation on Causality

20.4.1 Decomposition of the Linear Relationship between Two Time Series

The study of causal relationships among a set of time series variables has been one of the most important problems in the literature. In an interesting paper, Geweke (1982) derived a decomposition of the linear relationship between two time series variables $x_t$ and $y_t$. Denote by $F_{x \to y}$ the linear causality from $x_t$ to $y_t$, by $F_{y \to x}$ the linear causality from $y_t$ to $x_t$, by $F_{x \cdot y}$ the instantaneous linear causality between $x_t$ and $y_t$, and by $F_{x,y}$ the total linear relationship between $x_t$ and $y_t$. He showed that
$$F_{x,y} = F_{x \to y} + F_{y \to x} + F_{x \cdot y}.$$

Consider a bivariate time series $x_t$ and $y_t$ with the invertible moving average representation (20.4.1), where $[a_{1t}, a_{2t}]'$ is a bivariate white noise series with mean 0 and variance-covariance matrix $\Sigma$. To define the measures, consider the linear projections of $y_t$ on the following subspaces:

1. The projection of $y_t$ on $L\{y_j : j < t\}$, with white noise error $u_{1t}$ of variance $\sigma_1^2$; this projection (20.4.3) is obtained by inverting (20.4.1) into an autoregressive representation.
2. The projection of $y_t$ on $L\{\{y_j : j < t\} \cup \{x_j : j < t\}\}$, with white noise error $u_{2t}$ of variance $\sigma_2^2$; this projection (20.4.5) is obtained from the first component of the autoregressive form of the joint model (20.4.4).
3. The projection of $y_t$ on $L\{\{y_j : j < t\} \cup \{x_j : j \le t\}\}$, with white noise error $u_{3t}$ of variance $\sigma_3^2$; this projection (20.4.7) is obtained from the first component of the system after premultiplying by the matrix that diagonalizes the variance-covariance matrix in (20.4.6).
4. The projection of $y_t$ on $L\{\{y_j : j < t\} \cup \{x_j : -\infty < j < \infty\}\}$, with white noise error $u_{4t}$ of variance $\sigma_4^2$; this projection (20.4.8) involves the cross-covariance generating function of $[y_t, x_t]'$.

In terms of these one-step-ahead prediction error variances, $F_{x \to y} = \ln(\sigma_1^2/\sigma_2^2)$, $F_{x \cdot y} = \ln(\sigma_2^2/\sigma_3^2)$, $F_{x,y} = \ln(\sigma_1^2/\sigma_4^2)$, and $F_{y \to x}$ is defined analogously from the projections of $x_t$ on its own past with and without the past of $y_t$; the decomposition above then follows.

20.4.2 An Illustrative Underlying Model

For the purpose of illustration and discussion, suppose that on some theoretical grounds the output variable $y_t$ is related to the input variable $x_t$, in terms of the time unit $t$, by the one-sided model (20.4.12), in which $y_t$ depends on lagged $x_t$ with coefficient $v$ plus a noise term $b_t$, the input $x_t$ is driven by $a_t$ with parameter $\phi$, and $a_t$ and $b_t$ are zero mean white noise processes with variances $\sigma_a^2$ and $\sigma_b^2$, respectively, and they are independent of each other. The model can be written in the invertible bivariate moving average form (20.4.13), and the projection coefficients can be obtained from (20.4.4), with $\alpha$ found by solving the associated equation subject to $-1 < \alpha < 1$. It can easily be seen that, for this model, $F_{y \to x} = F_{x \cdot y} = 0$ and $F_{x,y} = F_{x \to y}$; that is, the linear relationship between $x$ and $y$ as represented in (20.4.12) contains only the one-sided linear causality from $x_t$ to $y_t$. For numerical illustration, letting $v = 1$ and $\sigma_a^2/\sigma_b^2 = 1$, Table 20.4 shows the causality measures for the underlying model in Equation (20.4.12).

20.4.3 The Effects of Systematic Sampling and Temporal Aggregation on Causality

The Effect of Systematic Sampling on Causality. Suppose that both $y_t$ and $x_t$ in (20.4.12) are stock variables and that only the subseries $Y_T = y_{mT}$ and $X_T = x_{mT}$ are obtained, where $m$ is the systematic sampling interval. It can easily be shown that, for the model given in (20.4.12), the corresponding model for $[Y_T, X_T]'$ is (20.4.14), where $A_T$ and $E_T$ are two independent white noise series with mean 0. Thus, the linear relationship between $X_T$ and $Y_T$ still contains only the one-sided linear causality from $X_T$ to $Y_T$: systematic sampling preserves the direction of the causal relationship among the variables. It weakens the relationship, however, as shown by the magnitude of the coefficient of $X_T$ in (20.4.14), the causal effect being weaker because $|\phi| < 1$. For numerical illustration, letting $v = 1$, $\sigma_a^2/\sigma_b^2 = 1$, and $m = 3$, Table 20.5 gives the measures showing the effect of systematic sampling on causality.

The Effect of Temporal Aggregation on Causality. Now suppose that both $y_t$ and $x_t$ in (20.4.12) are flow variables and that the observed series are the $m$th-order aggregates $Y_T$ and $X_T$. For the given one-sided causal model in (20.4.12), the corresponding model for $Z_T = [Y_T, X_T]'$, in terms of a moving average representation, was shown in Tiao and Wei (1976) to be (20.4.15), where $\eta = \phi^m$, $C_T$ is a bivariate white noise process with mean 0 and variance-covariance matrix $\Sigma_C$, and the coefficient matrix $H$ and $\Sigma_C$ are related through the autocovariance structure of the aggregated noise. For $\phi = 0$ and $\phi = 0.5$ with $m = 3$, we obtain the linear relationship and causality measures shown in Table 20.6. It is clear from Table 20.6 that upon aggregation the instantaneous linear causality between $X_T$ and $Y_T$ becomes a dominant force in the linear relationship between $X_T$ and $Y_T$. In other words, as discussed in Tiao and Wei (1976), temporal aggregation turns the one-sided causal relationship from $x_t$ to $y_t$ into a pseudo two-sided feedback system. This result is true even for the case where the input is white noise, i.e., when $\phi = 0$. As pointed out by Wei (1982), the magnitudes of the measures $F$ cannot be used for comparison unless the time units used are the same.
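As a rough empirical counterpart to Geweke's measures (not part of the text), the sketch below estimates $F_{x \to y}$, $F_{x \cdot y}$, and $F_{y \to x}$ for a simulated one-sided system by comparing residual variances of nested least-squares projections on finite lag windows. The simulated model, lag length, and sample size are illustrative assumptions, and finite lags only approximate the infinite projections in the definitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20_000, 8                       # illustrative sample size and lag window

# One-sided system (assumed for illustration): x_t is white noise,
# y_t depends on lagged x_t plus its own noise.
x = rng.standard_normal(n)
y = 0.8 * np.roll(x, 1) + rng.standard_normal(n)
y[0] = rng.standard_normal()

def resid_var(target, regressors):
    """Residual variance of the OLS projection of target on the given regressors."""
    X = np.column_stack(regressors)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.var(target - X @ beta)

t = np.arange(p, n)
ylags = [y[t - k] for k in range(1, p + 1)]
xlags = [x[t - k] for k in range(1, p + 1)]

v1 = resid_var(y[t], ylags)                    # past y only
v2 = resid_var(y[t], ylags + xlags)            # past y and past x
v3 = resid_var(y[t], ylags + xlags + [x[t]])   # add current x

print("F_{x->y} ~", np.log(v1 / v2))
print("F_{x.y}  ~", np.log(v2 / v3))

w1 = resid_var(x[t], xlags)                    # past x only
w2 = resid_var(x[t], xlags + ylags)            # past x and past y
print("F_{y->x} ~", np.log(w1 / w2))
```

For this one-sided design, the first estimate should be clearly positive while the other two should be close to zero, mirroring the pattern in Table 20.4.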
20.5 The Effects of Aggregation on Testing for Linearity and Normality

In this section, we study the effects of using time series aggregates in performing linearity and normality tests.

20.5.1 Testing for Linearity and Normality

One widely used linearity and normality test is the bispectral test. As described in Section 19.2, this test was proposed by Subba Rao and Gabr (1980, 1984) and was later modified by Hinich (1982). In this discussion, we use Hinich's modified bispectral test.

Let $z_t$ be a third-order stationary process. Recall from Section 19.2 that the bispectral density function is
$$f(\omega_1, \omega_2) = \frac{1}{(2\pi)^2} \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} C(k_1, k_2)\, e^{-i(\omega_1 k_1 + \omega_2 k_2)},$$
where $C(k_1, k_2)$ is the third-order cumulant of $z_t$. From the symmetry properties of $C(k_1, k_2)$, the principal domain of $f(\omega_1, \omega_2)$ is the triangular set $\Omega = \{(\omega_1, \omega_2) : 0 \le \omega_2 \le \omega_1 \le \pi,\ 2\omega_1 + \omega_2 \le 2\pi\}$, and it is sufficient to consider only the frequencies in this region.

To estimate the bispectral density function from a realization $Z_1, \ldots, Z_n$ of a third-order stationary series, let $\omega_h = 2\pi h/n$ for $h = 0, 1, \ldots$, and for each pair of integers $j$ and $s$ define $F(j, s)$ in terms of the finite Fourier transform of $Z_1, \ldots, Z_n$ as in (20.5.1). Define also the lattice of points $L = \{((2u-1)M/2,\ (2v-1)M/2) : u, v = 1, 2, \ldots\}$ and the set $D = \{(j, s) : 0 < j \le n/2,\ 0 < s \le j,\ 2j + s \le n\}$. The estimator $\hat{f}(\omega_1, \omega_2)$ at a lattice point is constructed by averaging the $F(j, s)$ over a square of $M^2$ points centered at that point, where $Q_{u,v}$ is the number of $(j, s)$ in the square that are in $D$ but not on the boundaries $j = s$ or $2j + s = n$, plus twice the number on these boundaries. The estimator is consistent if $M = n^c$, rounded to the nearest integer, with $\tfrac{1}{2} < c < 1$.

The bispectral test is based on the variables $2|X_{u,v}|^2$, where $X_{u,v}$ is the estimated bispectrum at the lattice point $(u, v)$ standardized by the estimated spectral densities at the corresponding frequencies; here $\hat{f}(\omega)$ denotes the estimator of the spectral density function constructed by averaging $M$ adjacent periodogram ordinates. Hinich (1982) shows that, for squares wholly inside $D$, $2|X_{u,v}|^2$ is approximately distributed as $\chi^2(2, \lambda_{u,v})$, a noncentral chi-square variable with two degrees of freedom and noncentrality parameter $\lambda_{u,v}$ given in (20.5.4).

Test of Linearity. A zero mean stationary linear process $z_t$ can be written in the form $z_t = \psi(B) a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}$ with $\sum_j |\psi_j| < \infty$, where the $a_t$ are independent and identically distributed random variables with zero mean, $\mathrm{Var}(a_t) = \sigma_a^2$, and $E(a_t^3) = \mu_3$. From Sections 12.2 and 19.2, for such a process we have
$$f(\omega_1, \omega_2) = \frac{\mu_3}{(2\pi)^2}\, \psi(e^{-i\omega_1})\, \psi(e^{-i\omega_2})\, \psi^*(e^{-i(\omega_1 + \omega_2)}) \quad \text{and} \quad f(\omega) = \frac{\sigma_a^2}{2\pi}\, |\psi(e^{-i\omega})|^2,$$
where $*$ denotes the complex conjugate. We see from equations (20.5.7) and (20.5.8) that for a linear process the noncentrality parameter in (20.5.4) is constant for all squares in $D$, with a common value $\lambda_0$ proportional to $\mu_3^2/\sigma_a^6$ (the proportionality factor depends on $n$ and $M$). Because the $2|X_{u,v}|^2$ are approximately independent, $\lambda_0$ can be consistently estimated, and the test of linearity is based on the constancy of this parameter. If the null hypothesis is false, the sample dispersion of the $2|X_{u,v}|^2$ will be larger than expected under the null hypothesis. Ashley et al. (1986) proposed using the 80% quantile of the $2|X_{u,v}|^2$ to measure their dispersion: the sample 80% quantile $\hat{\xi}_{0.8}$ has an asymptotic distribution $N\bigl(\xi_{0.8},\ 0.8(1-0.8)/\{Q_D [g(\xi_{0.8})]^2\}\bigr)$, where $\xi_{0.8}$ is the 80% quantile of the $\chi^2(2, \lambda_0)$ distribution with density $g$ and $Q_D$ denotes the number of squares wholly in $D$. Linearity is rejected if $\hat{\xi}_{0.8}$ is significantly larger than $\xi_{0.8}$.

Test of Normality. If $z_t$ is a stationary linear Gaussian process, then $\mu_3 = 0$, its bispectral density function $f(\omega_1, \omega_2)$ is identically zero for all $\omega_1$ and $\omega_2$, and the noncentrality parameter is zero. Thus, testing a linear process for normality can be achieved by checking whether its bispectral density function is zero at all frequencies. Under the null hypothesis, the statistic defined in (20.5.5), which sums the $2|X_{u,v}|^2$ over the squares wholly in $D$, is approximately $\chi^2(2Q, 0)$, a central chi-square, and the hypothesis is rejected if the statistic exceeds the upper $\alpha$ percentile, where $\alpha$ is the significance level of the test. Rejection of the null hypothesis implies that $z_t$ cannot be a linear Gaussian process.

20.5.2 The Effects of Temporal Aggregation on Testing for Linearity and Normality

Let $Z_T$ be the nonoverlapping aggregates of $z_t$, i.e., $Z_T = (1 + B + \cdots + B^{m-1}) z_{mT}$, where $T$ is the aggregate time unit and $m$ is the order of aggregation; $z_t$ is the unobserved nonaggregate time series and $Z_T$ is the aggregate time series. For a given stationary linear Gaussian process $z_t$, it is known from Tiao and Wei (1976) and Wei (1978b) that $Z_T$ is also a stationary linear Gaussian process, with a representation of the form (20.5.6) in which $e_T$ is a sequence of i.i.d. random variables with zero mean and variance $\sigma_e^2$; the parameters $\Psi_j$ and $\sigma_e^2$ of the aggregate representation are functions of the parameters $\psi_j$ and $\sigma_a^2$ of the nonaggregate process. Thus, under the null hypothesis, the tests based on the $Z_T$ series follow the same lines as those for the $z_t$ series.

Some Simple Nonlinear Time Series Models. As introduced in Chapter 19, many nonlinear time series models have been proposed in the literature. To study the effects of aggregation on the linearity test, we consider four simple models: a bilinear BL(1, 1) model, a nonlinear autoregressive NLAR(1) model, a nonlinear moving average NLMA(2) model, and a threshold autoregressive TAR model.

Aggregation Effects on the Test of Linearity. Recall that for linear models the noncentrality parameter $\lambda_{u,v}$ is constant over the squares wholly in the principal domain. Because nonlinear models do not generally share this property, we first analyze how aggregation affects the dispersion of this parameter over the lattice in the principal domain. Teles and Wei (2000) reported a simulation experiment in which 1,000 time series of 12,000 observations each were generated from the above nonlinear models and were temporally aggregated with orders of aggregation m = 2, 3, 4, 6, 8, and 12,
denoting the (sample) standard deviation of the estimates of the noncentrality parameters computed from the simulation by SL)and S& for the nonaggregates and the aggregates.7 and note that SDA= SD when m = I. Threshold TAR(2.606 0. 0 0 4 ~ .d.601 4.i. Order of aggregation Model 1 2 3 4 6 8 12 BL NLAR NLMA TAR 18.3.563 0. 3) model.7 Aggregation effects on the dispersion of the noncentrality parameter. z.568 0. NLAR(1) model. We use the value c = 0..6.144 1. --t ~ a.863 2. 12.3.249 0.899 . In the above models. NLMA(2) model. The reduction of the TABLE 20.990 4.409 0. The reslllts dearly show that the dispersion of the values of the noncentrality parameters decreases as a consequence of aggregation and that this effect generally increases with the order of aggregation. Aggregation Effects on the Test of Linearity To analyze the effects of aggregation on the test of linearity.008 0. A. we show the values of ~ ~ ~ l ( 2 n Tin-Table l ) 20. v (for squares wholly in the principal domain) for linear models.692 2. The effect of aggregation on the test statistic is severe even for low values of m.1 62 1.593 1.55zt-1) 4.749 0. we fist analyze how aggregation affects the dispersion of this parameter in the lattice in the principal domain. 013 0. we know that the noncenttality parameter of the asymptotic distribution of 21q. More important.019 0.9).027 0. the test of linearity is very sensitive to aggregation. The null hypothesis is rejected for large values of iA. i.008 0.747 0. nz = 2. asymptotically X 2 (2Q..095 0. Because m 2 2 and < c < 1.002 0.000 0.249 0.Xo = QAo.000 0. Thus.. is non-Gaussian.657 0. the power is very low. 4 A Recall that if the process z. 3 4 6 8 12 0. n i / ( 1 2 ~ 3 = ) rn2{'-') n / ( 1 2 ~ * ) w ln2Cc-')Q and MA = n i . Aggregation Effects on the Test of Normality Under the null hypothesis of Gaussianity.024 0. t_he distribution of the test statistic S is . A similar study was conducted by Granger and Lee (1989) on some time domain tests.l2 decreases as a consequence of aggregation and therefore the hypothesis of linearity will be less rejected. we denote. where p20 5t 0.456 0. Even for the lowest order of aggregation. it implies that QA< Q.Y = 2nk-I pjo/u. the test statistic based on the aggregate series as &. for large n~ = rdm.. For a h e a r non-Gaussian process z.01 1 0. We now analyze how this parameter is affected.. which is asymptotically X 2 ( 2 ~ A0) . Order of aggregation Model 1 2 BL NLAR 0.001 0.044 0.Consequently. aggregation leads to a decrease in the degrees of freedom of the asymptotic distribution of the test statistic as a consequence of the smaller number of observations in the aggregate time series.. it causes the noncentrality parameter to change..5.281 0. Thus.6= ho for all u. reflecting the strong decrease in the variability of the noncentrality parameter noted above.001 0. Teles and Wei (2000) also investigated the power of the test with the results shown in Table 20.5. The effects of temporal aggregation we have obtained are much more severe than the results reported in their study. . We can draw two conclusions: I.990 0.003 0.798 0. and the higher the order of aggregation.e.5 The Effects of Aggregation on Testing for Linearity and Normality 539 TABLE 20. the test based on the nonaggregate time series).110 0. i. assuming for simplicity that all squares Ee entirely in in.. the greater the loss.416 NLMA TAR . because QA < Q. A =: E.5).e. folfo\ving (20. 2. Similarly.005 0. the statistic SA will be asymptotically x 2 ( 2 ~ AA*).. 
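The aggregation scheme used in such simulations is simple to reproduce. The sketch below (not from the text) generates one series from a bilinear-type recursion and forms its non-overlapping temporal aggregates of order $m$; the particular bilinear equation and coefficient are assumptions made for illustration, not the exact models used by Teles and Wei (2000).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 12_000, 3                     # series length and order of aggregation

# An illustrative bilinear-type recursion (assumed form):
# z_t = a_t + 0.4 * z_{t-1} * a_{t-1}
a = rng.standard_normal(n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = a[t] + 0.4 * z[t - 1] * a[t - 1]

# Non-overlapping temporal aggregates Z_T = z_{mT-m+1} + ... + z_{mT}
Z = z[: (n // m) * m].reshape(-1, m).sum(axis=1)
print(len(z), len(Z))                # the aggregate series has n/m observations
```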
The most striking conclusion is the abrupt loss of power that occurs for aggregate data.148 0.e. i.v)2 is constant as shown in expression (20. A).013 0.8 (nz = 1 represents the situation of no aggregation. A U. where QA F. Aggregation causes the power of the test to decrease..8 Aggregation effects on the power of the bispectral test.20. v and for all squares in D.000 variability of the valtes of the noncentrality parameter implies that the sample dispersion of the estimators 2l1r..000 0. that follows a stationary and invertibleAKMA(1. . and a.e.6 The Effects of Aggregation on Testing for a Unit Root To test for a unit root. hA = QahA.13) shows that the noncentrality parameter of the asymptotic distribution of 21t. For example. It also means that aggregation induces normality.e. i.and & depends on the relationships between pjcand E. Consequently. & with [#I < 1 and 101 < 1. the above expression shows that the relationship between hA. 1) model. the stronger this effect will be. we consider the AR(1) model and test the null hypothesis Ho : 4 = 1 against the alternative HI : # < 1.. and between v. i. AA < A. and the value of hA decreases at the rate of the square of the aggregation order.. The higher the order of aggregation.. the noncenkality parameter of the distribution of the test statistic iA also decreases. fewer rejections will happen. a ceneal chi-square.540 .. is assumed to be a sequence of independent and identically distributed variables with zero mean and variance a : . Teles and Wei (2002) show that the relationship between the noncentrality parameters A. and Q I= 0.Bearing in mind that m 2 2 and < c < 1. therefore.CJ. The decrease of the noncentrality parameter implies that temporal aggregation brings the approximate distribution of the test statistic closer to that valid under the null hypothesis.5.. the corresponding noncentrality parameter is also constant: where pjs = E(EQ and oi = Var(q). Thus. and hA. 20.. the null hypothesis will be less rejected when tested with the aggregate time series than with the basic series. for which the noncentrality parameter is zero.is given by Because $ < c < 1 and m r 2.12 decreases at a rate of m2 as the order of aggregation increases.. if & is the mth-order aggregate series from the series z. As a consequence of the reduction of A&. expression (20. where a. CHAPTER 20 Aggregation and Systematic Sampling in Time Series Because ZTis also a linear process. The corresponding model for the aggregate series is shown by Teles and Wei (1999) as . -2 1 = 12 . time series are frequently obtained through temporal aggregation.B)z. the series used in testing for a unit root are often time series aggregates. its where S..i.The Effects of Aggregation on Testing for a Unit Root 20. As a result. which is also commonly known The limiting distribution of this statistic under the null hypothesis is given by A where W(u) represents the standard Wiener process.6. The Studentized statistic of conventional regression t statistic. where 541 . is the estimated standard deviation of $ and 8: is the OLS estimator of o:.e. It is important to kno-iv the consequences of unit root testing when nonaggregate series are not available and aggregate data are used. i. model (20. The limiting distribution of Z$ under the null hypothesis is given by Because of the process of data collection or the practice of researchers.1 The Model of Aggregate Series Under the nuU hypothesis Ho : # = 1.=2 &zt-*)2. " -2 1 11 2 A? " 2 (z. 4.1) becomes a random walk.6.6 Recall from Chapter 9. (1 . 
an ARIMA(0, 1, 1) model,
$$(1 - \mathcal{B}) Z_T = (1 - \Theta \mathcal{B})\,\varepsilon_T,$$
where the $\varepsilon_T$ are independent and identically distributed variables with zero mean and variance $\sigma_\varepsilon^2$, and the parameters $\Theta$ and $\sigma_\varepsilon^2$ are determined by the order of aggregation $m$ as given in (20.6.6) and (20.6.7). These results imply that, under the null hypothesis, the aggregate series follows an ARIMA(0, 1, 1) model; the unit root remains after temporal aggregation, so the aggregate time series is still nonstationary. Moreover, the moving average parameter $\Theta$ of the aggregate model is negative and depends exclusively on the order of aggregation $m$. If $m = 1$, then $\Theta = 0$ and $\sigma_\varepsilon^2 = \sigma_a^2$; if $m \ge 2$, then $\Theta < 0$, and it is easy to see from (20.6.7) that $\Theta$ is a decreasing function of $m$, so that $|\Theta|$ increases as $m$ increases.

20.6.2 The Effects of Aggregation on the Distribution of the Test Statistics

To examine whether the aggregate series $Z_T$ contains a unit root, we test the null hypothesis $H_0\colon \Phi = 1$ against the alternative $H_1\colon \Phi < 1$ in the model
$$Z_T = \Phi Z_{T-1} + Y_T,$$
where, under the null hypothesis, $Y_T$ follows the MA(1) model $Y_T = \varepsilon_T - \Theta\,\varepsilon_{T-1}$. The OLS estimator of $\Phi$ is $\hat{\Phi} = \sum_{T=2}^{n_A} Z_T Z_{T-1} \big/ \sum_{T=2}^{n_A} Z_{T-1}^2$, where $n_A$ is the number of aggregate observations, and from these results we obtain the corresponding expressions for the normalized unit root estimator $n_A(\hat{\Phi} - 1)$ and the Studentized statistic $T_\Phi = (\hat{\Phi} - 1)/S_{\hat{\Phi}}$.
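For concreteness, the following sketch (not from the text) computes the two test statistics for a given series by ordinary least squares in the no-constant AR(1) regression; the same function can be applied to a basic series or to an aggregate series formed by non-overlapping sums of order m.

```python
import numpy as np

def aggregate(z, m):
    """Non-overlapping temporal aggregates of order m: Z_T = z_{mT-m+1} + ... + z_{mT}."""
    n = (len(z) // m) * m
    return z[:n].reshape(-1, m).sum(axis=1)

def unit_root_stats(z):
    """Normalized unit root estimator n*(phi_hat - 1) and the Studentized statistic
    from the OLS regression of z_t on z_{t-1} without a constant."""
    y, x = z[1:], z[:-1]
    n = len(z)
    phi_hat = (x @ y) / (x @ x)
    resid = y - phi_hat * x
    s2 = resid @ resid / (len(y) - 1)      # residual variance estimate
    se_phi = np.sqrt(s2 / (x @ x))
    return n * (phi_hat - 1.0), (phi_hat - 1.0) / se_phi

rng = np.random.default_rng(2)
z = np.cumsum(rng.standard_normal(600))    # a random walk, as in the simulations
print(unit_root_stats(z))                  # basic series
print(unit_root_stats(aggregate(z, 3)))    # third-order aggregate series
```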
The limiting distributions of the test statistics under the null hypothesis are again shown by TeIes and Wei (1999) as Expressions (20.. pp.12) and (20.8.10).3 The Effects of Aggregation on the Significance Level and the Power of the Test The shifi to the right shown by the distributions of the test statistics implies that the percentiles tabulated by Dickey (1976.6. The generated time series were aggregated with orders of aggregation nt = 2 .3) and (20.51. under the null hypothesis. pp.6.3. 52-53) andFuller (1996. and the impact is very strong.2.01 50 0. For negative values of the moving average parameter. pp.1.021 1 0. we followed the procedure of Dickey (1976). the tables of percentiles provided by Dickey (1975.0213 0. and Cheung and Lai (1995) showing that the Dickey-Fuller test is severely affected by the presence of a moving average component in the model. As a result. The distribution shift of the test statistics also has important implications on the power of the test.0174 0. however.9 Aggregation effects on the empirical significance level of the test. the observed significance levels of the test in their simulations were also lower than the nominal levels. Pantula (1991).6. 0. where nn = 1 represents the basic case when there is no aggregation.0153 0. 1 2 Order of aggregation 3 4 6 8 12 0. To obtain the corrected percentiles. Because the null hypothesis is rejected when the test statistics are lower than the critical points.0154 Finite sample NURE Stdt.0165 0. we constructedTables J and K in the appendix. Stat.0509 0. NLJRE and Stdt. the critical points provided by Dickey and Fuller are appropriate. for the basic series (m = I). and 12.0161 Limiting distribution based on the critical points provided by Dickey (1976. even for large sample sizes. which is the situation we have. 641-642) need to be adjusted when aggregate time series are used in testing for a unit root. 52-53) and by Fuller (1996. The observed significance levels decrease when the order of aggregation increases. pp. In fact. i. which are suitable for thg distributions of the test statistics for the most commonly used orders of aggregation m -. the empirical significance levels are practically coincident with the nominal level of 5%.0500 0. We adapted the procedure . Thus. pp.0157 0.9. These conclusions are also supported by several simulation studies like those reported by Schwert (1989). Agiaklogfou and Newbold (1992).the power of the t a t will decrease and will be lower than that obtained for basic time series. respectively. we observe that a serious impact occurs even for nt = 2. In the table. pp. where m = 1 represents no aggregation with values close to those in Tables F and G except for rounding.e. 641-642) obtained from the simulation are shown in Table 20. The nominal significance level used is 5%. It is clear that the effect of aggregation on the test is severe for any order of aggregation. there will be fewer rejections of the null hypothesis than expected under the alternative hypothesis of no unit roots.4. Moreover.544 CHAPTER 20 Aggregation and Systematic Sampling in Time Series TABLE 20.3. Stat. which is based on Monte Carlo simulations. the use of the unadjusted percentiles as critical points for the test based on aggregate series will lead to smaller rejection regions than implied by the signi6cance level adopted for the test.. The simulation results clearly show that. To avoid the problem.0173 0.0154 0.0150 0. represent the normalized unit root estimator and the Studentized statistic. 20. 
52-53) and by Fuller (1996. new one-family houses sold. First. Consequently. From the tables provided by Dickey (1976.9205 .i. 1). hence. pp.60.0182 = -2.1)/0.9603 . N(0. 641-642). Because the test is based on an aggregate time series with m = 3 and n~ = 80. we analyze two examples: a simulated series and the time series of the U. we reject the null hypothesis of a unit root and conclude that the aggregate series ZT is stationary. 1) model gives a nonsignificant MA term. is stationary. At a 5% significance level. Tables J and K are for models wvithout a constant term. The least squares estimation leads to the following model (standard error in parenthesis): The values of the test statistics are rtA(6 .20.53 and = (0. and for the Studentized statistic it is -1.1) = -9.9603 . We refer readers to Teles (1999) for more detailed tables including tables for models with a constant term. we will use the adjusted critical points with nn = 3 from Tables J and K..4 Examples To illustrate the effects of aggregation on the tests of a unit root. For example.36 and (0. The sample autocorrelations and partial autocorrelations (table not shown) suggest an AR(1) model.77. where the a.1)/0. 9 5 ~ . we will not assume a known model and will proceed with a standard model building procedure to identify a best model for the series. The sample sizes shown in the tables are the sizes of aggregate series.6 The Effects of Aggregation on Testing for a Unit Root 545 and computed the test statistics for aggregate series.~ series was aggregated with rn = 3. Let us now consider the aggregate series ZT for m = 3 (80 observations) and use it to test for a unit root. The least squares estimation leads to the following result [standard error in parenthesis): To_test the hypothesis of a unit root with this series. Z& = . are i.82. however. we generated a time series 4. .6. Without assuming a known model.0448 = -1. = 0 .18. both suggest an AR(1) model.205 . we examine its sample autocorrelation function and partial autocorrelation function.40. The result is certainly consistent with the underlying simulated model. the 5% percentile of the normalized unit root estimator based on an aggregate time series of size r t = ~ 100 resulting from an order of aggregation nt = 2 is -5. This of 240 observations from the model z.4 In this example of a simulated Time Series.0 and -7. An overfitting with an ARhllA(1.d. To be realistic with actual data analysis.1) = -6.95. the critical point for the normalized unit root estimator is behveen -5.1) = SO(0. we will retain the AR(1) model. the critical point for the normalized unit root estimator is behveen -8. and for the Studentized statistic it is -1. The conclusion of the test is to reject the hypothesis of a unit root.1) = 240(0. let us test the basic series z.a. pp. EXAMPLE 20.45 and -5. for n = 240 and a 5% significance level. the values of the test statistics are n(# .S. and we conclude that z.9. (240 observations) for a unit root. . Department of Commerce. i. Stram and Wei (1986b).1.S. new one-family houses sold in the United States from January 1968 through December 1982 (180 observations) listed as W15 in the appendix. . Thus. the series has been adjusted by subtracting the corresponding monthly average seasonal values. Next.'641442).9 and -7. The source of the data is the Current Construction Reports published by the Bureau of the Census.new one-family house sales.b).S. To avoid this problem.5 We now consider the time series of the monthly U. 
then the critical points for n = 80 at a 5% significance level would be between -7.95 for the Studentized statistic. We would then retain the hypothesis of a unit root and conclude that the series ZT is nonstationary. and Lutkepohl(1987). 52-53) and by Fuller (1996. it is known that the aggregate series of a stationary time series is stationary. There have been extensive studies on the aggregation of stationary processes. pp. The least squares estimation (standard error in parenthesis) gives . the sample autocarrelation and partial autocorrelation functions of ZTgiven in the following table lead to the identification of an AR(1) model: Sample autocorrelations of ZT - - - - Sample partial autocorrefationof ZT An overfitting with an ARMA(1. To test & for a unit root. U.7 for the normalized unit root estimator and . From these studies. EXAMPLE 20.S. 1) model gives the nonsignificant MA term and we will retain the AR(1) model. Suppose first that we are interested in modeling the quarterly U. Weiss (1984). including those by Amemiya and Wu (19721. pp.546 CHAPTER 20 Aggregation and Systematic Sampling in Time Series If we derived our conclusion of this test from the unadjusted percentiles given by Dickey (1976. This wrong conclusion would lead us to overdifference ZT and therefore distort its analysis. the adjusted percentiles should be used when the test is based on an aggregate series.e. we first note that as a seasonally adjusted series. &-(60 observations). Brewer (1973). Wei (1 978a. The data have been seasonally adjusted. this example shows that using the unadjusted critical points to test for a unit root with an aggregate series may lead to the conclusion that a stationary process contains a unit root and to overdifferencing. it has a zero mean. with least squares.QipBP)&. Therefore. as in the previous example. and for the Studentized statistic it is -1. We note that the data were collected monthly. most payment plans are monthly. then the critical points for n = 60 at a 5% significance level would be between -7. clearly leads to a wrong conclusion. and they reflect the way the demand for new homes is formed. -10.1) = 60(. Consequentiy. 52-53) and by Fuller (1996. we conclude that the aggregate series ZT is stationary. we consider the AR (p) approximation of the aggregate series.B) - . is the basic series.84. the series z. The estimation results are (standard error in parenthesis) Ton test the hypothesis of a unit root with this series.7 for the normalized unit root estimator and -1. At a 5% significance level. For a general process.33.5 General Cases and Concluding Remarks We analyzed how the use of aggregate t h e series affects testing for a unit root.9. the decision of buying a home is strongly dependent on mortgage payment plans. we reject the null hypothesis of a unit root.e. Consequently. The critical point for the normalized unit root estimator is between -8. is stationary.6 The Effects of Aggregation on Testing for a Unit Root 547 A The values of the test statistics are nA(@ ..40.0252 = -2. we reject the hypothesis of a unit root and conclude that z.1)/. concluding that ZT is nonstationary.1) = -6. Based on its sample autocorrelation function and partial autocorrelation function (table not shown). i.8915 . For a nonzero mean process.9412 . . As in Example 20. Therefore. we should use the adjusted critical points from Tables J and K. we can simply compute the mean adjusted series before applying the testing procedure. 
the 'critical point for the normalized unit root estimator is between -5. we would then retain the hypothesis of a unit root and conclude that the series ZT is nonstationary. pp. We now test I . We then fit an AR(1) model to z. If we based the conclusion of this test on the unadjusted percentiles given by Dickey (1976.ET.9412 . we can consider the month as the appropriate time unit for the series of the new one-family houses sold because the demand for new homes depends on the capacity of home buyers to pay for the houses they buy. In the period we are studying.1) = 180(. Because z.45 and -5.60.1)/. To test for a unit root. pp.p.0 and -7. . In fact.51 and T&= (3915 . and for the Studentized statistic it is -1. 20._l(B)(l . we compute the test statistics nf+ .9 and -7. Consequently. is identified as an AR(1) model.059 = -1. as implied by the use of the unadjusted critical points. The overwhelming majority of home buyers rely on mortgages to finance their homes.4. (1 Q ~ B . we assume that (B) = q. .95 for the Studentized statistic.1) =. we will use the tables provided by Dickey and Fuller. we may take the month as the basic time unit and therefore the monthly series zt as the basic series.95 for 11 = 180 and a 5% significance level. 641-642).58 and 7$ = (.20.. for a unit root. Because the test is based on an aggregate time series with m = 3 and IZA = 60. The presentation is based on the model originally proposed in Dickey and Fuller (1979).6. 6.b) test. one could not examine the underlying causes of the effects as a result of order of aggregation illustrated in this paper. the corresponding The limiting distributions of these test statistics in (20.19) and the r statistic in (20. however. One may argue that many tests are available for a series with correlated errors.iand ) .548 CHAPTER 20 Aggregation and Systematic Sampling in Time Series - . so it is hoped that the results presented in this study will be useful to data analysts. Similar to the results in Section 9. the adjusted tables based on the order of aggregation should be used as long as an aggregate time series is used in the analysis. Following the results in Section 9.20) are the same as those in (20. Thus.4.6. where%-I (B) = (1 .~has ) roots lying outside the unit circle. .~P . we can use the same adjusted Table J and Table K for the normalized unit root estimator in (20..Zw-l). respectively.6. leads to empirical significance levels lower than the nominal level. Although aggregation induces correlation. .6.r P p .cprB . see Teles (1999). and Q(B) = (1 t statistic is . . As a result. The Dickey-Fuller test is possibly one of tbe most widely used tests in time series analysis.4. which is clear from both the simulated and real-life examples in Section 20. for a proper inference. the augmented Dickey-Fuller test. The essence of the results in this study is that although the standard Dickey-Fuller test can still be used. and significantly reduces the power of the tests.4.6. .B . when aggregate data are used.6.. Because their study does not provide the explicit distribution of the test statistic used in testing aggregate series. the standard Dickey-Fuller test instead of other desirable tests may still be used to make the inference. for example. For details and examples. . the Phillips (f987a. Both simulation and empirical examples clearly show that the use of the adjusted tables is necessary for a proper test whenever aggregate time series are used. A similar study was reported by Pierse and Snell (1995).l ~ ~ .12) and (20. 
We constructed adjusted tables of critical points suitable for testing for a unit root based on aggregate time series: Tables J and K in the appendix give percentiles of the distributions of the normalized unit root estimator and the Studentized statistic for the most commonly used orders of aggregation. Tables J and K are for models without a constant term; we refer readers to Teles (1999) for more detailed tables, including tables for models with a constant term. For a nonzero mean process, we can simply compute the mean-adjusted series before applying the testing procedure.

For a general process, we consider the AR(p) approximation of the aggregate series. Following the results in Section 9.4, testing for a unit root is then equivalent to testing $\Phi = 1$ in a regression that also includes lagged differences $W_{T-j} = Z_{T-j} - Z_{T-j-1}$, i.e., the augmented Dickey-Fuller test. Similar to the results in Section 9.4, the limiting distributions of the resulting test statistics in (20.6.19) and (20.6.20) are the same as those in (20.6.12) and (20.6.13), respectively; hence, we can use the same adjusted Tables J and K for the normalized unit root estimator in (20.6.19) and the $t$ statistic in (20.6.20).

One may argue that many tests are available for a series with correlated errors, for example, the Phillips (1987a,b) test and the Phillips and Perron (1988) test. Although aggregation induces correlation, the induced correlation is often not strong enough to show significance in the process of empirical model fitting; as a result, the standard Dickey-Fuller test, rather than these alternatives, may still be used to make the inference. The presentation here is based on the model originally proposed in Dickey and Fuller (1979). The essence of the results in this study is that, although the standard Dickey-Fuller test can still be used, the adjusted tables based on the order of aggregation should be used whenever an aggregate time series is used in the analysis. Aggregation causes the distributions of the test statistics to shift to the right, leads to empirical significance levels lower than the nominal level, and significantly reduces the power of the tests, as is clear from both the simulated and real-life examples in Section 20.6.4. A similar study was reported by Pierse and Snell (1995); because their study does not provide the explicit distribution of the test statistic used in testing aggregate series, it could not examine the underlying causes of the effects of the order of aggregation illustrated here. The Dickey-Fuller test is possibly one of the most widely used tests in time series analysis, so it is hoped that the results presented in this study will be useful to data analysts.

20.7 Further Comments

Temporal aggregation and systematic sampling pose important problems in time series analysis. In working with time series data, one must first decide on the observation time unit to be used in the analysis. If the model for a phenomenon under study is regarded as appropriate in terms of a certain time unit $t$, then the proper inferences about the underlying model should be drawn from the analysis of data in terms of this time unit. As shown in the preceding sections, improper use of data in some larger time unit could lead to a substantial loss of information in parameter estimation and forecasting, and the improper use of aggregate data could distort results in terms of lag structure and causality and hence may lead to very serious consequences for final policy and decision making.

Most time series procedures have been established under assumptions such as linearity, normality, and stationarity. Many time series encountered in practice, however, are nonlinear, nonnormal, or nonstationary. To check whether an observed series conforms to these assumptions, tests have been introduced for linearity, normality, and nonstationarity. In this chapter, we studied the effects of using aggregate time series in performing such tests. Through the study of the impact of aggregation on the sampling distributions of the test statistics, we showed that the use of time series aggregates results in a serious loss of power and hence often leads to erroneous conclusions; special care is therefore required in interpreting testing results when time series aggregates are used.

References on the effect of temporal aggregation on model structure include Engle and Liu (1972), Amemiya and Wu (1972), Brewer (1973), Tiao and Wei (1976), Wei (1978a,b), Weiss (1984), and Lütkepohl (1987); the book by Lütkepohl (1987) deals with temporal aggregation of vector ARMA processes and contains extensive references. The effect of temporal aggregation on forecasting has been investigated by Amemiya and Wu (1972), Tiao (1972), Abraham (1982), Lütkepohl (1984), Stram and Wei (1986a), and Wei and Stram (1988), among others, and its consequences for parameter estimation have been studied by Zellner and Montmarquette (1971), Engle (1984), and Weiss (1984). The problems of aggregation and seasonal adjustment are also studied by Lee and Wei (1979, 1980).

To recover the information lost through aggregation, several disaggregation procedures have been introduced; due to space limitations, we do not discuss them here and refer interested readers to, among others, Boot, Feibes, and Lisman (1967), Denton (1971), Chow and Lin (1971), Stram and Wei (1986b), Al-Osh (1989), and Guerrero (1990). A related problem that has not been discussed in this chapter is the contemporaneous aggregation of time series, which deals with the aggregation of several component series in a system. For example, the total money supply is the aggregate of a demand deposit component and a currency component, and national housing starts is the aggregate of component housing starts in the northeast,
north central.9 Further Comments - Temporal aggregation and systematic sampling pose important problems in time series analysis. Niemi (1984). Harvey (1981). It deals with aggregation of several component series in a system. Brewer (1973). Tiao and Wei (1976). Wei (1978a. and AhsanulIah and Wei (1984b).b). References on the effect of temporal aggregation on model structure include Engle and Liu (1972).7 Further Comments 549 20. Many time series encountered in practice are. and Lisman (1967). and Padberg (1971). Wei and Abraham (1981). and Liitkepohl (1987). nonnormal. the total money supply is the aggregate of a demand deposit component and a currency component. south. Kohn (1982). tests have been introduced to test linearity. Ahsanullah and Wei (1984a). Terasvirta (1977). and west regions. In working with time series data. Wei and Mehta (1980). we show that the use of time series aggregates results in serious Ioss of power of the tests and hence often leads to erroneous conclusions. More important. of ZT for k = 0. is a white noise process with mean 0 and variance IT:.2. . where a. e = rn. 2 ~ ~ ) awhere . (d) Compare and comment on your findings in parts (a). yZ(k).b). and the model for &. . (a) Let ZT = (1 B ~ ~ )Find z the ~ autocovariance ~ . (1 + + 20. 2. the autocovariance function. and find the model for ZT.I). 4 = 1. (a) Let ZT = (1 iB + B2)z3T. . of &-for k = 0. .{)(I . function. 20. (b) Let ZT = Z ~ T Find . 1) white noise process. = (1 . -8of.1 Consider the nonaggregate model z. and 3.550 CHAPTER 20 Aggregation and Systematic Sampling in Time Series Stram (1990).20OOa. a. Show that E(H.. 1.. Find the autocovariance function.19)~u. (b) Let ZT = z 3 ~Find .of & for k = 0. is a zero mean Gaussian process with a: = 2.1B . [m(L . Forecast the next nine-month data for unemployed young women. 2 ..4 (a) Let H. = (1 .2. yZ(k).13)~+ 26]u. 20. otherwise. . 10. the autocovariance function.9B)at. Chan (1993). of ZT for k = 0. =: ( Bj)x. (m .9B)z. and the model for ZT.. (c) Derive the implied quarterly aggregate model of the monthly model obtained in part (a). and (c).. (b). and 3. consider x. yz(k). Gudmundsson (1999). Prove that j=O (b) In part (a). . and 3 of 2002. Hotta and Vasconcellos (1999). and compute forecasts for the next three quarters. m-1 20. and 3. is a Gaussian N(0. and 3. 1.2.. ( m . 1 . among others. Forecast the next three quarters. and obtain the corresponding quarterly forecasts for quarters 1. where a.3 Consider Series W4 of the monthly unemployed young women between ages 16 and 19 in the United States from January 1961 to August 2002. yz(k). and Di Fonzo (2003)..Ht- 0= I t = 0.2. Hodgess and Wei (1996. and the model for ZT. (a) Build a monthly model using the data from January 1961 to December 2001.2 Consider the nonaggregate model. (b) Build a quarterly model for the quarterly unemployed data from the first quarter of 1961 to the fourth quarter of 2001. 1. = a. .d. (d) Repeat the above for Series W14.and a:. whereF = XT is an MA(1) process with the autocovarince generating function (AGF) given by yx(B) = oi{[m(l . Show that the inh order temporal aggregation turns the one-sided causal relationship into a two-sided feedback system. and a.-l + e. 4 20.. (c) Compare and comment on your findings in parts (a) and (b).i.6(B F)}.1 ) + B ] [ ( n t .0l2 + 26 .be the noncentrality parameter for z. yr = ax.I) + F ] + nta:. is a sequence of i. ( b ) Perform the unit root test on the third-order aggregates of Series W4. Show that hA. = (1 . where < c < 1. 
random variable with mean 0 and variance (~2. .Exercises 55 1 (c> Consider the one-sided causal model. = h o / d c . respectively. and ZT. and the UTis an error process independent of XT with the AGF given by + a20:{[m(l - el2 + 281 .B(B + F ) - + q 4 [ ( r n .. where x. and hA.6 ( B + F ) ] (1 - 20. et and at are independent white noise processes with mean 0 and variance c ~ . [nt(1 .6 (a) Perform the unit root test on Series W4. where zt = a. Let A.6 ) 2 291 .respectively.8B)a.5 Let ZT be the m* order aggregate of z. . A dynamic linear model approach for disaggregating time series data. P. (1972). Bionletrika. Proc.W. Matkovian representation of stochastic processes and its application to the analysis of autoregressive moving average processes. and Box.343-352.241-248. A Bayesian extension of the minimum AIC procedure of autoregressive model fitting. W. Math. and Chuang. The StatisficaIAnalysis of Etne Series. Biomem.. (1992). Ahsanullah. 71. 9-14. Abraham. J.471-483. J. Bayesian analysis of some outlier problems in time series. and Wei. T. 267-281. G . and Wei. Metrih. B. No.229-236. Budapest. AC-19. New York. Statist. Sfatist. k m . Math..kz. 1-11. Abraham.407-419. B. 1) process. Abraham.363-387.285-291. 66. M. 31. Al-Osh. Inst.IEEE Transactiorts ott Ailto?natic Control. 26. G. T. (1976). M. A. (1984b).. 30A. (1984). Biornetrika.. S.183-194. Academic Press. Corrrputatio~ralStatistics Qirarterly. (1984). A Bayesian analysis of the minimum AIC procedure. Inst. 553 . fohn Wiley. Arnemiya. Bio~netrikn. and Ledolter. The effects of time aggregation on the AR(1) process.85-96. N. (1969). Power spectrum estimation through autoregressive model fitting. Temporal aggregation and time series.64. B. Intematio!ial Statistical Raiew. Csaki). 66. (1979). M o t i s ) . Math. Akaike. ASA Proceedings of Bzcsiness and Econoaiic Statistics Sectiott. (1984a). IV. H. A new look at the statistical model identification. Ali.. W. Jounlal of Forecasting. H. A note on inverse autocorrelations. Mehra and D. Abraham. and Ledolter. 4. S. Empirical evidence on Dickey-Fuller-type tests. Akaike. Tecl~rzotnetrics. B. Petrov and F.. W. 1.. Akademiai Kiado.. B. The effect of aggregation on prediction in the autoregressive model. 13. W. (1982). (1971). Akaike. (1978). 609-614. C.References Abraham. Akaike. Anel: Statist Assoc. J. R. No.628432. E. and Wu. Akaike. (1973). 339. Analysis of ARMA models--estimation and prediction. Inferences about the parameters of a time series model with changing variance. and Newbold. S. 2nd btfernational Sytiipositmr oti Infontzatio~zTheory (33s. 50. Ann. 297-302. and Wei. Information theory and an extension of the maximum likelihood principle. Y. Akaike. W. Canonical correlations analysis of time series and the use of an information criterion. Akaike. H. H . H. Outlier detection and time series modeling.716-723. M. Effects of temporal aggregation on forecasting in an ARMA(1... B. 21. W. (1974b). 31. Agiakloglou. New York. M. Itjst. (1979). New York and London. 3. (1977). 8. N . Abraham. Anderson. P. (1974a). Statistical Method for Forecastittg. John Wiley.237-242.. M. Anrr. Ahsanullah. Statist. R. (1983).. in Systet~tIderttificatiotl: Advances atld Case Studies (Eds. (1989). (1989). Joanial of Ktne Series AnalysysB. . P. Fourier Analysis of Erne Series: An Introduction. Berlin. R. NJ.181-192. in Systent Identijcation: Advances nnd Cases Strldies (Eds. G . J. Biornetn'h. 31. R. D. Bartlett. J. An analysis of transformations. and Herzberg. M. Ansley.Journal of Econottletr i a . Mehra and D. 
(1987). S. (1985).. Bionletrika. S. Biottietrika. An algorithm for the exact likelihood of a mixed autoregressive moving average process. G. M.. B. Stocltastic Processes. Academic Press. D. (1979a). Feibes. (1976). Four cases of identification of changing systems. C . A. Ansley. 1-16.. Allied Statistics. Bollerslev. M. Time Series Analysis. The Data: A Collectiotl of Problems front Statistics. G. W..ASA Proceedings of Business at~dEconomicStatistics Section. P.Gourieroux. Bartlett.291-320. Boot. Forecasting and Control. (19461. Fa(1979). B8. Wiley Interscience. Issues involved with the seasonal adjustment of economic time series. A diagnostic test for nonlinear serial dependence in time-series fitting errors. P. Bell. Blackman. Automatic Forecasting Systems. Biorttetrifka. B. and Tukey. M. 2. P. Roy Stat. A change in Ievel of a non-stationary time series. 26. 0. (1980). T. (1965). Box. An htroduction to Stochastic Processes with Reference to Merhods aj~dApplication. Cambridge. Joicmal. Box. identification of a mixed autoregressivemoving average process: The comer method. (1964). Box. and Cox. (1979b). (1950). 185-282 and 485-569[1958]). Amer. W. M. Brillinger and G. (1955). Box. C .. (1947). C. M. On the finite sample distribution of residual autocorrelations in autoregressive-moving average models. New York and London. F. and llao. On the theoretical specification of sampling properties of autocorrelated time series. (1967). (1994). 3rd ed. J.211-252. Asllley. (1959).. E. and Hinich.29-38. in Directions in Time Series (Eds. G. Anderson). Periodogram analysis and continuous spectra. Statist.. JournaE of the Royal Statistical Socieq Srpplement. E. San Franscisco. E. R. Soc. Time Series Analysis Forecasting a~tdCo~ztrol. Jenkins... (1941). New York (Originally published in Bell Systenis Tech. (1976). G.2nd ed. Ladotis]. Hatboro.307-327. R. I?. Patterson. PA... E. G .547-554.Dover. and Newbold.. Assoc. Bloomfield.2741. G. New York. M.FV. Biotnetrika. Bartlett. C. Cambridge University Press. J. A. Andrew.. R. A. P. 176-197.. E. 2nd ed. G. R. 423-436. C. The Meamrement of Power Spectnimfrom the Poi~tto f W w of Con~ntrrrlicationsE~zgirteentig. (2000). Further methods of derivation of quarterly figures from annual data. 66. Joimtal of 7itne Series Analysis.. and Newbold. 65.J. and Monfort. S. NorthHolland. (1970). C. 349-353. Springer-Verlag.. Royal Stat. 1665-67. MTS User's Guide. Distribution of residual autocorselations in autoregressiveintegrated moving average time series models. ASA Proceedings of Business artd Econottlic Statistics Section. iV. Bartlett. Prentice Hall. 3 9 . (1984). (1986). T. G. Maximum likelihood estimation for vector autoregressive moving aterage models. M. 7. C. A. fine Series (Ed. D. Multivariate partial autocorrelations..37.554 References Anderson. S. and Hillmer. A computer program for detecting outliers in time series. 2nd ed. Bell. and Lisman. and Jenkins. G. Institute of Mathematical Statistics. Soc.. F. C. Cambridge University Press. Box. 37. Generalized autoregressive conditional heteroskedasticity. The statistical significance of canonical correlation. J. S. Multivariate analysis. E. Ser. Holden-Day. P.. (1980). Englewood Cliffs. . W. C. Amsterdam. Inc. (1966).49-59. 1. M. C. Beguin. M. H. and Reinsel. London. P.. J.(1986).165-178. 1509-1526. Bartlett. Bartlett. S.D. (1983). T. B o b . 32. G. Tiao).59455. F. M . D. 52. Ansley. 66. M. S. and Pierce. 634-639. Journal of Bicsiness and Eco~iomicStatistics. R.. . N. and a new analysis. C. (1971). Btillinger. W. 
(1991). W. (1973). Brillinger. 88. (1971). B. D. 13. G.8.367401. Burr. Campbell. Forecasting time series with outliers. Bio~netrika. Chen.(1980). 149-163. Antes Starist. Brillinger.. Rinehart and Winston.. and Rosenblatt. Guckenheimer.13-35. 345-357. Some consequences of temporal aggregation and systematic sampling for ARMA and ARMAX models. Assoc. 14. D. Soc. J. W. Y.65-90. A. 12. S. J. 119-129. Soc. A canonical analysis of multiple time series. 133-154. Disaggregation of annual time series data to quarterly figures: a comparative study..493498. and Tong. C. (1976). Chow. P. Practical Data Analysis: Case Studies in Business Sratistics. unpublished Ph. and Smith. A survey of statistical work on the MacKenzie River series of annual Canadian lynx trappings for the years 1821-1934. (1976). Applied Stahtics.372-375. 2. (1977). Best linear unbiased interpolation. J. Brewer. Estimation of time series parameters in the presence of outliers. (1989).. Chan. (1995). Roy. C. D. A. I. S. H. and Walker.References 555 Box.University of Chicago. Discussion 448-468. and Wei. Clarke. S. New York.. (1993)... 16. L. G. Jo~cntalof Forecnsti~ig. G. Chan.. Utrecht. '8. Annals of Statistics.Marcel Dekker. C. Cheung. M . E. Tecl~nonternks. and Liu. Autoregressive approaches to disaggregation of time series data.S.. Sortte Robirst Methods for Etne Series Modeling. K. (1847). E. Statistics Research Center. Intervention analysis with applications to economic and environmental problems.R.193-204. and Tiao.(1988). (1967). (1986). S . 5. Lecfirreson Matkenzatics in the Life Science. and Chen. Joitn~alof Econorjtetrics. 53... (1990). G. M. B. Journal of Forecmti~tg. (1988). 2. On tests for non-linearity in time series. Asymptotic theory of kth order spectra. A... New York. John Wiley. Review ofEcoilomic Statistics. W. W. Chang. M. Ser. The inverse autocorrelations of a time series and their applications. and Tiao.. (1993). I. C. W. Joint estimation of model parameters and outliers effect in time series. Tiao.411431. Temple University. Rote Series: Data Ar~alysisand Tlteory. distribution and extrapolation of time series by related series. Kernink et Fils. Jortrnal of Marketing Reseadt. (1972). . Cleveland.. Cohen. E. S. Statist. P. (1975). L.. and Oster. Estimation of time series parameters in the presence of outliers. M. M. G. 20. 153-188. G. M. Spectral Analysis of Zi~neSeries (Ed. and Wei. On tests for no linearity in Hong Kong stock returns. Miiller. D.) 13.70-79.. S. Teclznontetrics. (1975). R. G.284-297. W. G . P. and Lin. 70. Chang. No..LRo CIraentenfsP2riodiqrres de Te~rtpBrature.355-365. Limiting distribution of least squares estimates of unstable autoregressive processes. P. W. Holt.D. and Padberg. R. Ho~tgKong Jotrntal of Btcsiness Martoger?tettt. Jountal of Etne Series Analysis. A comparison of some estimators of time series autocorrelations. W. Technical Report 8.677488. Richard hvin Inc. Guttorp. 12. I. Comnptttational Statistics and Data Altalysis. Estimating finite sample critical values for unit root tests using pure random walk processes: a note. Chan. Harris).W. Buys-Ballot. Jozrntal of the Aaerica~rStatistical Association. I). Bryant. K.. J. Jotrnzal of Forecasting. S. 64. W. 30. Chan. C.. Dissertation. H. and Tiao. and t i u . New York. M. (1995). (Amer. 277-293..217-228. Statistical Qllality Coritrol Metl~ods. 14. Math. C. Empirical modeling of population time series data: The case of age and density dependent vital rates. C. G. 140. 16. J.I. Burr Ridge. C. 
Econometric measurement of the duration of advertising effect on sales. 1-11.. G. C. A. Chen. (1977). Chan. K. Chan. Box. and Lai. Illinois. W. (1983). H. (1992). A. J. Temporal disaggregation of economic time series: towards rt dynamic extension. B. 50. W. John Wiley. C. (1979). M. Applied Regression Analysis. Co-integration and error correction: representation. and Ray.213-237. J. (1972). R. (1987). hnex Statist. G. D. New York. L. F. R. S. Econometric modeling of the aggregate time series relationship between consumers' expenditure and income in the United Kingdom. F. and testing. estimation. Estimating time varying risk premia in the term structure: the ARCH-M model. Sankhya. Columbia University Press. 15.88-90. (1986). Di Fonzo. (1984).. J. (19815. J. Ottawa. R. Vol. Hasza. W. The X. 45. Durbin. (1986). No. F. 88. Hickman). and Fuller. Luxembourg. D. Working Paper #I2 of Eurostate. and Miller. Jo~~rnal of Econometrics. E.556 References Cooley. Adjustment of monthly or quarterly series to annud totals: an approach based 'on quadratic minimization. 74.. for the machine calculation of complex Fourier series. J. Roy. A. 79. M. Fan.E. T. J. 107-125. Engle. (1971). 159-171. 1 4 1 . Engei... R.. 2 (Ed.391-407. Iowa State University. (2003).661492.. Testing for unit roots in seasonal lime series. Office for Official Publications of the European Communities. and Liu. Assoc. D. V. 673-737. (2003). C . F. New York.. Day. N. M. A.. D. Draper. The fitting of time series models. A. Jorrmal of Eco~zol?retrics. F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom Inflation. Jourtlal of Forecastirzg. Ecotrotrzetrica. Reviav of the Institute of Intenratiotral Statistics. W. New York.(1996).. (1965). Dickey. and Robinson.and Smith. P. 3. and Rothschild.. H.267-287. B. E. 5.. 66. (1987). Statist. 2nd ed.427-431. Catalogue 12-564. Soc. Assoc.(1992). and Fuller. T. Dunsmuir.. IF'. R. unpublished PhD.S.. 28. \V. Econo~ne~ca. 55. . F. 26C-281. N. 1. (1984).. M. Springer-Verlag. W. A. Stock market volatility and the information content of stock index options. R. and Lewis.. Denton..987-1007. 8. 2. (1976). W. 2nd ed. Srba..473-484.11 -ARIMA seasonal adjustment method. Ontario.. Engle. (1981). Ecotloniic Jountal. T.. (1990). A.233-244. Asset pricing with a factor-arch covariance structure: empirical estimates for Treasury bills. 4. A. A unified approach to the study of sums. Nonlinear Erne Series.. R. B. Conzp. Dickey. Engle. Engle. Feller. (1946). M. Dagum. J. F. Model selection and forecasting for long-range dependent processes. E. and Granger. Dickey. Estirnatio~~ and Hypothesis Testit~gin No~zstatiorralyTirne Series. (1960).251-276. D.. (1980). Q. New York. An Introdziction to Probability Theory and Its Appiications.. K. W. Ng. Journal of Erne Se~iesAnalysis. An algorithm.. B. J. J. No.. R. K. Bell. Vol. Wiley-Interscience. Engle. Seasonal adjustment of time series using one-sided filters.99-101. Henry. 52. A. Discussion on symposium on autoconelation in time sedes. 12-26. Amer. Jo~mralof Business and Econotnic Statistics. and Yaa. Daniell.. Unit Roots in time series models: Tests and implications. (1982). Dickey. Distribution of the estimates for autoregressive time series with a unit root. Crato. P. C . Statistics Canada. R. E. and Yea. Statist. (1978). Cupingood. Assoc. D. 3. H. Anler: Statist. A. time-aggregation and other functions of ARMA processes. Davidson. Matft. 197-301. 55.355-367. 
Effects of aggregation over time on dynamic characteristics of an econometric model. P. D. \V.Lilien. Suppl. and Wei. and Tbkey. Tlte American Statistician.dissertation.. in Econometric Models of Cyclical Behaviors. 19.. 40. Asymptotic theory for time series containing missing and amplitude modulated observations. and Robins. 1. Eco~to~netrica. (1971). products. Proc. 2nd ed. J. (1986). G. 6 . (1974). b~tematiottal Staiistical Review. On the . and Stevens. New York. 0. Tests of significancein harmonic analysis.. R.. Academic Pfess. J. The interaction algorithm and partial Fourier series. 361-372. NewYork. B. R. C .. W. Getveke. J. Granger. New York.275-295. E. Princeton University Press. (1969).194-201. Roy. Soc. D.: Statist. E. P. 77.M. Greenwich. H. Measurement of linear dependence and feedback between multiple time series. 87. The Theory of Matrices. Guerrero. B. Chelsea.. and Mclntire. Graham. Good. (1979). Mariano (Ed. S .304-3 13. W. 10. NJ. E. Princeton. Academic Press. I. Statist.. M.References 557 Findley. D.and Deistfer. John Wiley. W. A. Tabks of hztegrals.. (1976). Roy.unbiased property of AIC for exact or approximating linear stochastic time series models. and Newbold. J. Jorrrnal of M~/ilrltivariate Analysis. M. J. ~rishnaiah). Roy. B. The determination of the order of an autoregression. 229-252. Harrison. (1980). (1981). Bitll bist. f. Hamilton. Outliers in time series. 18. Computational methods for time series. J. Hannan. Geweke. 10. R. A new approach to AFWA modeling.. and Lee.A~tn. T. and Quinn. J. 125. J. Soc. 190-195. R. Soc. 2nd ed. Hannan. Addendum (1960).JAI. B. P. K. Fox. (1962).Statist. 8.372-375. Estimation of vector ARMAX models. and Porter-Hudak.. Hannan. Enre Series Models. Gudmundsson. Oxford Brrlleti~tof Ecot1oniics and Statistics. (1988). W. J. Gray. E. Kelley. T)re Statistical Theory of Littear Systetns.4th ed. (1958). Academic Press. B. J. J. and Ryzhik. F. Roy. Chelsea. Multiple Kttte Series. John Wiey. P. D. Fuller. D. Ztttrodrrction to ~tatisficalKtne Series.. Fisher. (1978). C. A. Ser.223-225. C. 5 8 . New York.221-238. Halmos. J. J. Roy. New York. (1965). Gantmacher. Granger C . Granger. E. (1972). E. San Diego). The statistical theory of linear systems. (1979). J. John W~ley& Sons. 48.38. NewYork and London. (1989). and Deistlet. D. Ser. The identification of vector mixed autoregressive moving average system. J. h Developn~etttsiti Statistics (Ed. Kitre Serieshzalysis. 1071-1081. John Wiley. 20. also in R. Hartley. The estimation and application of long-memory time series models. (1981). J. 18.81-121. Hannan.. T. Advances ill Statistical Analysis and Statistical Co~nplrting. New York.. A.205-247. report. bltroduction to Hilberf Space. Arne. 41. New York. M. Hannan. New York..). Hannan. G. l3. V. Statist. F. (1990).224-227. Ser. Gradshteyn. Biometrika. Series and Pmd~rcts. A. Math Appl. V. E. 1. Soc.132. Bayesian forecasting (with discussion). Sac.. Seckler). CT. Statist. Biontetrikn. (19Hl). I. A. . (1970).213-228. Chelsea.. 36. The estimation of the order of an ARMA process. B. Tests of significancein harmonic analysis.33-37. Ser. Jortnaal of i7t1teSeries Analysis.54-59. A.. Harvey. Hannan. Godfrey. The effects of aggregation on nonlinearity. Temporai disaggregation of time series: an ARlMA based approach. (1951). M. (1985).. 1. (1982).350-363. Dunsmuir. (University of California. (1994). G. J. Ser. Review of multiple time series. P.M. Statist. (1996). Press. Jorrnial of Forecasting. C. D. 56. E (1976). Gnedenko. Assoc. 2.. J.L. E. J. 4. 
Jounlalof Erne Series Analysis. Kronecker Products and Matrix Calclrlus: Wtlt Applications. New York. tech. (1 986).43.. (1983).. (1999). H. M. 2 9 4 .. New York.Vol. 322. Developments in the study of co-integrated economic variables. (1980). (1929).. The TIzeoly of Probability (Trans. S M Reviaus.1-77. S. W. Contrn~tnicatiotls ka Statistics. D. Halsted Press. Forecast Ecoiiotnic Eine Series.(1949). I. Disaggregation of annual flow data with multiplicative trends. Hannan. . K. 12. W. S. S. and Zabell. 1551-1580. Holden-Day. An autoregressive disaggregation method.S. ?V. W. W. and Vasconcellos. W. ASA Pruceedirigs of Biisiness and Economic Statistics Section... Amel: Statist.. Ser. Babies and the blackout: The genesis of a misconception. Trans. and Wei. (1990). 4th ed. S. New York and London. Joimtal of Forecasfitig. (1984). D. and Wei.. ASME J. Spectral Analysis and Its Applications. Antel: Statist. tech. Temple University.. Johansen. Johansen. (2000a). W. A. (15180). H. Johnson. G. E. W. L. and Juselius.. Kalrnan. 74. 169-176. Statistical analysis of cointegration vectors. 221-235. F. R. Economerrica. E. Prentice Hall.. vations. report # 30. S. Jolrnial of Marketing Research. S. No. New Jersey. Stochcntic Processes and Filtering Theory. J. Hillrner. The multivariate portmanteau statistic. Maximum likelihood filtering of ARMA models to time series with missing obser22. 82. (1985a).652-660. andlvei. 52.. S. Heyse. R. Jazwinski. Hinich. (2000b). 20. (1991). E. 59. Estimation and hypothesis testing of cointegration vector in Gaussian vector autoregressive models.231-254. Johansen. 389-395. Bell. J. (1985). Izenman. S. Heyse. Heyse. Zeher). Department of Statistics. S. in Applied Time Series Analysis ofEcotromic Data (Ed. Wolf and the Zurich sunspot relative numbers. (1999). Applied Mrdtivariate Statistical Analysis. 14. (1982). 74-100. Department of Statistics. Hotta. 155-171. K. and Watts. M.. DC. A.. J. 75. Hodgess. 165-183. J.4. J. Temporal disaggregation of time series. L. D. 77. W. (1977). C. C. 10. Temple University. Testing for Gaussianity and linearity of a stationary time series.602607. Bark Engrg. W. Temporal disaggregation of stationary bivariate time series.27-33.. R. Assoc. Journal of Tirne Series Aaalysis. (1980). and Wichern. Teclu~oitietrics. H. The partial lag autocorrelation function. M. and Tiao. G. 169-210.63-70..G. report # 32.. Applied Statistical Science V. Likelihood function of stationary multiple autoregressive moving average models. A. ?V.. 1. Inverse and partial lag autocorrelation for vector time series.. Jenkins. Maximum likelihood estimation and inference on cointegration with applications to the demand for money Oxford Bulleiitz of Econoniics and Statistics. and Wei. E. S.3545.. A. (1981). S. Hodgess. and Wei. R. (1982). J. J. Upper Saddle River. G. (1998). Partial process autocorrelation for vector time series. S. W. W. and Johansson. Bureau of the Census. IV. C. HUmer. M. G. W. (1960). Hodgess. E. 7 . L. Hosking. S . D. . J. W. Aggregation and disaggregation of stmctural time series models. Modelling the advertising-sale relationship through use of multiple time series techniques. An approach to seasonal adjustment. 33-44.282-299. K. R. Washington. R. (1983). and Wei. C. (1996). Assoc. and Tiao. San Francisco. 321. M. and 'Iiao. an re^ Statist. M.227-239. E.. U. 3. Academic Press. J. R. 175-196. (1970). Assoc. J. J. M. Izenman. ModeUing considerations in the seasonal adjustment of economic time series. C. 3. (1979). Linear Afgebra arid Its Applications. C. 
Applied Sratistical Science I. Social Sciertce Research. (1988). Hillmer. A. and Wei. Jones. 233-237. (1968). An exposition of the Box-Jenkins transfer function analysis with an application to the advertising-sates relationship. (1986).. Heyse. W. F. Jounial of Time Series Analysis.558 References Helmer.. J. A new approach to linear Htering and prediction problems. tech. M. The Mathematical Ititelligeltcer. S. R. S.. (1985b). Journal ofEco~io?nic Dy~iamicsarzd Control. S.265-270. L. Exact likelihood of vector autoregressive moving average process with missing or aggregated data. R. 208. Lundort. (1963). Series D. P.K. C . i?at~s. L. Jonnral of Busitzess and Economic Stafistics. 3. W. M. (1981). Review of Fi~rurtcialStudies. R. C. 366-370. 161-176. New York and London.(1983). Lamoureux. Ind. Jozrn~alof Monetary Econoinics. 5. S. Koopmans. Roy.3-14.Jorrmal of lime Series Analysis. D.. G. R.337-349. The Census X-11 program and quartedy seasonal adjustments... Cot~trnrttticationk Statistics. New York. 3. Kremers. Ser. Journal of T h e Series Analysis.. 2. A. (Nauk). W. Lin. Soc. Appl.. 11. S. L. Chapman and HWCRC. S.269-273. Acad.. G. Falb. G. Biometriku. (1969). (1982). M. Sur l'interpolation et l'extrapolation des suites statiomaires. 559 Kalman. (1989). P. Sales forecasting using multi-equation transfer function models.231-239. (1941). Li. 83. On outlier detection in time series. and Lastrapes. Berlin.B. ASA Proceedings of Brlsitiess and Ecor~or~ric Section. Robust identification of autoregressive moving average models. Ljung. (1939).. Diagnostic Checks i~aTinze Series. No. H. An algorithm for least squares estimation of non-linear parameters. 427-43 1 . W. E.. Mcleod.. Marquardt. Outlier diagnostics in time series models. Kalman. Math. C. (1978). 11. W. . (1987).S. and Li. Liitkepohl. 4.. CA. No. W.. I. G. H.(1987). LV. Masarotto.. A.317-324.345-359. D. ASA Proceeditrgs of Business and Econotltic Section. U.(1961).2043-2045. C.43.225-278. J.E. Springer-Verlag.References J. and Wei. R. Acad. 254-288. Identification of multiple input transfer function models. SOC. Journal of Enie Series Analysis.2.201-214. 4. Forecasting contemporaneously aggregated vector ARMA processes. and Wei. Kolmogorov. W. and Bucy. (1993). E. K.223-238. S. D. W. 2. H. Kohn. and Box. 2. and McLeod. Bionretn'ka. (1980). Jorrmal of Forecasting. and Box. A. and Ansley.559-567. interpolation und extrapolation von stationtiren Z u w g e n Folgen. New results in linear filtering and prediction problems.297-303. 297-314. J. The likelihood function of stationary autoregressive moving average models. W. Ledolter. R. Ljung. R.431441. Kohn. 5. Forecasting stock return variance: toward an understanding of stochastic implied volatitities. J. and Arbib. Ljung. 8. (2003).. I. (1982). Topics in Mather~rnfical Systern Theory. Applied Statistics. USSR. Math. (1980). R. Basic Eltgrg. W. in Directions irr Tirne Series (Ms. D. P. R. Lee. 23.214-220. B. M. Lee.B55. Kolmogorov. M. M. Lee. A comparative study of various univariate time series models for Canadian lynx data. M. and Hanssen. 95-108. Bull. 66. 65. Jorrmnl of Applied Statistical Science. M. institute of Mathematical Statistics. K. Stat. (1979). R. M. Sci. Hagwood. and Wei. ASME. (1987). G. Academic Press. (1984).70. Tiao). Soc. K. Distribution of the residual autocorrelations in multivariate ARMA times series models. A. Bior~letrika. J.. (1990). Model based seasonal adjustment and temporal aggregation. Liu. L. P. (1995). Li. (1974).. Brillinger and G. 
Robust estimation of autoregressive models. E (1983).McGraw-Kill. A model-independent outlier detection procedure. G.. Martin. M. 36. Diagnosdc checking ARMA time series models using squaredresidual autocorrelations. 18. W. (1987).293-326. Ser. R. A. The Spectral Analysis of 7I11teSeries. 6.. federal indebtedness and the conduct of fiscal policy. (1993). Roy. H. Forecasting Aggregated VectorARMA Processes. Llltkepohl. Sci. When is an aggregate of a time series efficient forecast by its past? Jotoaal of Ecorrometrics. G. (1979).219-23 8. Liu. On a measure of lack of fit in time series models. Paris. Statist.. S. B. (1982). (1983). and Pagano. (1977). D. Holt.3. Joronol of Business atsd Econornic Statistics. 31. and Wichern. Pena. W. D. G. G . and Hall. 66. Some recent advances in time series modelling. Miller. E. Rinehart and Winston. and Weatherby. Biometrtka. Parzen. Statist Assoc. (1963). Econo~netricn. S . R. (1987). E (1975). Curve fitting and optimal design for prediction. Iterative least squares estimation and identification of the transfer function model.. 40. Identifying a simplifying structure in time series. H. 165. Oak Ridge. (1974). W. R. Nelson. A. Noah-Holland. (1950). (1961a). Bio~tzetrika. Nicholls. 64. and Box. in Forecasting W s . Notes on Fourier analysis and spectral windows. Roy. TN. The exact likelihood function of multivariate autoregressive moving average models. The effrcient estimation of vector h e a r time series models. 57. D. Wheelwright). An approach to time series analysis. J. J.. Multiple time series modeling: Determining the order of approximating autoregressive schemes. J. fittermediate Busir~essStatistics.43-50. AlIE Traasactions.476477. 283-295.259-264. ZEEE Trans. E. report. E. Forecasting time series affected by identifiable isolated events and an explanatory variable.723-729.. Makridakis and S. 12. North-Holland. 63. (1979). Englewood Cliffs.. 0. E~neSeries A~ialysis:Tl~eotyand Practice 1 (Ed. 61. in Mz~ltiva~iutehzalysk N (Ed. S. C. Pantula. 289-307. North Holland. tech....579-592. A. 82..71.951-989. Understanding €he Kalman filter. TIMS Studies in the Marzagernent Sciertce. D. Anderson). (1979). D. Conditional heteroskedasticity in asset return: a new approach. O'Hagan. (1977).347-370. New York. C.836843. Newton. Ser. E. Pack. D. P. D. Bio~>ietrika. Montgomery. Nature.423-426. Amsterdam. Techiornetrics. l? Krishnaiah). M. (1984). Jottntol of Enre Series Atialysis. Kalman filters and their applications to forecasting. (1961b). The invertibility of sampled and aggregated ARMA models. Statirt.167-190. . Newbold.123-127.. AC-19. 3. P. Palda. 114-1 18. (1974). W. R. Parzen. A comparison of estimation methods for vector linear time series models.63-7 1 . 3. G. 9. Niemi. Parzen. N. Nicholls. in Etlte Series Analysis Papers (Ed. D. Simultaneous confidence bands for autoregressive spectra. Newbold. S. E. (1969). and Wei. Pamen. Amsterdam. NJ. Asymptotic distributions of unit-root tests when the process is nearly stationary. D. and Shgpurwalla. Arrto~naricConrrol. J. Bionzetrika. Ser. San Francisco. (1 977). 197-202. Parzen). 701-716. The exact likelihood function for a mixed autoregressive moving average process. Metrib. 59. Computer Science Division. Statist. London. 18. 74-894. D. E. Qsborn. 1 4 2 . Mathematical considerations in the estimation of spectra. B. Nicholson. Population oscillations caused by competition for food. Englewood Cliffs. D. 85-90. (1991). (1964).560 References Mehra. F. (1991). (1997). (1978).381-390. 
Modeling and forecast time series using transfer function and intervention methods.. Soc. H. Roy. Oak Ridge National Laboratory. Meinhold. K. R.39. Prentice Hall. Soc. Noble. AppEied LinearAlgebra. Causality testing in economics. ~men'cmStatistician. Nicholls. ofAmer. Math Statist. D. Parzen. (1984). 32. Amsterdam. E. R. (1979). E. Ann. (1980). Prentice Hall. Holden-Day. (1977). NJ. K. Biornetrika. B. A. The Measitrenlent of Cumulative Advehing Effects. Exact and approximate maximum likelihood estimators for vector moving average processes. P. Mliller. J. 33.Version 8. and Snell.73. D. Inc.277-301.323-346. SCA Corp. P. (1997). Phadke. Roy. Phillips.535-547. Priestley. J. and Ahn.. New York. A. 125-1 34. R. 5. %me series regression with a unit root. M. Assoc.John Wiley. Hypothesis testing in A R M @ . The Analysis of Mrdtiple E ~ n eSeries. (1965). M.333-345. P. Joztrnal of Forecasting. S. SCA Corp. Approximate tests of correlation in time series.34.. Jolrr-rralof Eco~~onlerrics. Abnos. (1985). Vol. R.. likelihood ratio test. (1988).335-346. Jotinla1 of Ti. Searle. John Wiley. Univ.ne Series Analysis. Littear Statistical I~lferenceand Its Applications. and Kedem. Jartntal of Business atid Ecorzontic Statistics. 2. J. SAS Institute. 5. P. Quenouille. 7. Said. Schuster. IL. Distribution of residual correlations in dynamiclstochastic time series models. (1949).(1992). Griffin.65. of ?Visconsin-Madison. 1. B11. (1989). . The h ~ e r i c a Siaiisti~t cia~z. C . and forecasting. S A S m User's Guide. 16. Computation of the exact likelihood function of multivariate moving average models. B.687-694. G. Vols. MatrixAlgebra Usefrilfor Stadistics. London. Academic Press.. (1977). 65. C. (1992). (1981). Academic Press.(1987a). 13. M. C.. Bio~ttetrika. 1. D. M. Testing for a unit root in time series regression. report 173.511-519. 75. Schwartz. (1958). 3. S. New York. 68-84. Priestley. Vector autoregressive models with unit roots and reduced rank structure: estimation.369-374. M. Reinsel. D. (1898). Pierce.55. S. Rose. Pierce. A. a. Towards a unified asymptotic theory for autoregression. El. S.London. D. A survey of recent developments in seasonal adjustment. 74. Temporal aggregation and the power of tests for a unit root. (1999). Jol~rnalof Econo~tzefrics. (1968). (1997). Biotnetrika. S. M.461464. P M p s .293-310. D. (1987b). 21.. G. 13-41. and Tsay. W. C .353-375. Soc. London.. G . A. Quality audit using adaptive Kalman filtering.Estimating the dimension of a model. (1986). A. Int. Ecoilotttefrica. D. H. D. DeKalb.: Statist. 147-159. 1.. Quenouille. 1045-1052. of Statistics. H. and Gameman. A portmanteau test for self-exciting threshold autoregressivetype nonlinearity in time series. Schmidt. 3. Elect. Biontetrika. Pierse. 80. Forecasting aggregates of independent Al2IMA process. Forecasfitzg and Elite Series Analysis Using the SCA Statistical System.. J. tech. On the investigation of hidden periodicities with application to a supposed 26-day period of meteorological phenomena. and Davies. Me~ika. Control. A Colrrse i71 llitne Series kralysis. Jozcmal of Econonretrics. (1982). M. Forecastitig and Tiltie Series Analysis Usktg the SCA Statistical System. B.. Stat. Dept. Rao. E. Ann. (1957). G. (2001). B. Schwert. Tiao. I and 11. Phillips. Spectral Analysis nnd Enre Series. Priestley. Phillips. Tests for unit roots: a Monte CarIo investigation.. Quenouille. Vol. G.References 561 Pena. (1981). Phadke. Bio~netrika. A. and Dickey. P. ASQC Qrraliry Congress Trarisactions-Sa?zFrancisco. M. and Subba Rao. 
New York.971-975. (1975). (1 988). M. Statist. John Wiley. (1980). R. S. Understanding spurious regressions in econometrics. G. R. (1978). B. Temporal aggregation in dynamic hear models. A. C . P. (1986). Terx Mag.21-27. q) models. C. K. DeKdb.. Discrete autoregressive schemes with varying time-intervals. htre.311-340. NC. Norr-Linear and Non-stationary Etne Series Analysis. (1995). The estimation of factor scores and Kalmm filtering for discrete parameter stationary processes. (1978). C. J.. N. B. Cary. Petruccelli.. H. and Perron. T. 6. W. P. (1981). Subba Rao. Amer: Statist. (2000). Teles. Journal of Tirne Series Analysis.. Teles. Ser. S. Y. (1989). P. Bioinetrika. Ser. M. Tiao. Statist. Modeling multiple time series with applications. E. Soc. and Wei. G. The use of aggregate time series in testing for long memory. G. Comments on model selection criteria of Akaike and Schwartz. J. 37-51. 1999IWEWNSF 31neSeries Col~erenceProceedings IX & X . 4. and Tsay. (1988). T. G. 3.e Series. 7 . 1. Slutzky.. Temporal aggregation in the ARIMA process. A methodological note on the disaggregation of time series totals.No. Assoc. 63.. and Gabr... (1986b).. Roy. h e r : Sfah'st. Biotnetrika. Model specification in multivariate time series (with discussion). Roy. J. Stone. The summation of random causes as the source of cyclic processes. 4 . 12. 331-342. Testing for common trends. W. Some comments on the Canadian lynx data with discussion. 117-126. Academia Sinica. D. and Guttman. Taiwan. (1976). Tiao. H. and Watson. Biotrteti-ika. The effects of temporal aggregation on tests of Linearity of a time series.279-292.and Wei. Wei. Journal of Time Series Analysis. Assoc. (1986a). W. Teles. Selection of the order of an autoregressivemodel by Akaike's information criterion.293-302.63.448468. R. (1977). Joun~alof Tinte Series Atmlysis.Berlin. C.802-816. M. Statist.91-103. (1983). (1979). M. 58. liao. E. in Sttidies itt Eco~lometlics. Stram. Statist. M. W. W.. Jolrrnal of721neSeries Alzalysis. Tong..276-278. R. (1999). P. (1972). W. Effect of temporal aggregation on the dynamic relationship of two time series variables. Jolrrnal of Econometrics. Springer-Verlag. 23.. Tiao. 83. C. W. W. 5. W. Bulletin of the International Statistical Institute. C. W. and Crato.D.unpublished Ph..562 References Shaman. Asymptotic behavior of temporal aggregates of time series..95-116. C. (1995).341-342.. dissertation. (2002). W. Conjuncture Institute Moscow).. No. Tiao. N. The invertibility of sums of discrete MA and ARMA processes. Biortretrika. B. 41.W.157-213. S. M. 1. Shibata.525-53 1. Roy. A. J. P. and . S. Economefrica. Tiao.219-230.. Soc. S. Forecasting contemporal aggregates of multiple time series. S. S. (1983). J.43-56. 59. Stram. 7 . Jountal of Applied Statistical Science.No. Multiple time series modeling and extended sample cross-correlations. D. The use of aggregate series in time series testing. TerSisvirta.2Trint. S . W. and Tsay. B51. Translated from the earlier paper of the same title in Problems of Ecorrott~ic Conditiotrs (Ed. Academic Press. The use of aggregate time series in testing for Gaussianity. I. G. W. Taipai. (1999). (1999).1097-1 107. (1984). and Wei.. W. 31 1-3 18. (1927). Soc. Shen. J. Cot~rprttationalStatisrics arid Data Analysis. A. Amemiya. M. Tiao. Karlin. 140. Teles. and Wei.. 2 . P. M..S.J. 1-29. The Effects of Teittporal Aggregation otr Erne Series Tests.. C. T. 4. Goodman). R. and Gabr. and Box. (1980). P. (1976). and M~tltivariateStatistics Rds. and Wei.. Stock. C. G . 
A note on the representation of a vector M A model. (1980). G.. Joitmal of Bruitless and Ecoliornic Statistics. S. (1977). 1. 34. M. G. \V. O. New York and London. G. and Wei. Temple University.513-523. 76. S. (1971). and AIi. S. H. 145-158. T. . 165-170. f 05-146 (1937). and L. A test for linearity of stationary time series. P. C. Subba Rao. Scandirzaviatr Joirnral of Statistics. Teles. T. W. An bitmdrrction to Bispectral Analysis atrd Bilitlear Erfze Series Models. S...Wei. O. Analysis of correlated random effects: linear model with two random components. Properties of estimates of the mean square error of prediction in autoregressive models. Assoc. K. 79. W. A. Effect of systematic sampling on ARlMA models. Springer. 8. J. J. and Abraham. Wei. Soc. Some consequences of temporal aggregation in seasonal time series models.51-58. Forecasting contemporal time series agmgates. Sijthoff and Noordhoff.132-141. Co~nmirtnications in Statistics-Tl~eoc mid Meth.Chen). The effect of systematic sampling and temporal aggregation on causality-A cautionary note. (1990). 1335-1344. H. Si~aula. Amsterdam. 523-530. 101-1 10. C. Wei. New York.26. J. Statist. J. S. On the temporal aggregation in discrete dynamical systems. %me series model specification in the presence of outliers. Jolmial of Ecotzotnetrics. New Yo& Tong. W. Trabajos. The Astrophs. Inst. (1988). Tsay. A statistical investigation of sunspot observations with special reference to H. (1980). S. Jounial of Eco~rometrics. (1986). W.. (1981). . \V. (1984). Wei. Another look at Box-Jenkins forecasting procedure. Threshold autoregression.245-292. E.. W. Estadist. in Systent Modelitzg and Optitnization (Eds.255-278. (1990).. (1980). S. S. S. (1982). Wei.. 84. W.References 563 Tong. 433-444. (1998). Soc. W. and Tiao.. level shifts. and Lim. Harris). and Mehta. 819-825. A Dytmnic System Appmaclt.New York.. H. (1967). Wei. 120.W. Non-linear Eine Series. G. Statist. in Pattent Recognition and Signal Ptacessing (Ed. in Analyzitlg n n ~ eSeries (Ed. B52. (1989). Tong. R. S .. W.. An eigenvdue approach to the limiting behavior of time series aggregates. Statistics Sinica. S. I? (1952). 1. R. J . 0. Whittle. J. 2389-2398. 1-22. J. J. Washington. Disaggregation of time series models. J. S. H.dord University Press. John Wiley. Drenick and F. S. S. Tong. C . 0. P. W. R.B. North-Holland. J. S. Detecting and modeling no linearity in univariate time series analysis. Statist. (1975). 7. Werner.251-260. W. Jattnlal of Applied Statistical Science. The effect of temporal aggregation of parameters estimation in distributed lag model.IV. A. R. Wei. A~necStatist. S. F. Amsterdam. (1978a). Systematic sampling and temporal aggregation in time series models. Wei. 0. and Xao. Tsay. lS{Z). Anti..S.. Wei.237-246. Anel: Statist.. limit cycles. Weiss.. Kozin).453467.84-96. Statist. W. Coaittiztn. Amer. Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and non-stationary A M models. The simultaneous estimation of a time series' harmonic covariance structure. Wegman. S. IV. Wei. W. Whittle. \V. (1981). S. Assoc. D. 391400. 613417. 2546. Comparison of some estimates of autoregressive parameters. D. IV. H. (1978b). 0. W. S. R. Jozrrtlal of Forecasting. Tsay. A10(23). in Advanced Se~ninaron Spectral Ai~alysis@d. G.431451.. J. Ray. Bureau of the Census. and variance changes in time series.. N. Roy. D. (1986).43-57.. Cortimrmicatio~tsin StafisticsTheox and Math. 
An introduction to the calcuIations of numerical spectrum analysis. DC. Statist. ?V. Testing and modeling threshold autoregressive processes. U. The effect of temporal aggregation on forecasting in the dynamic model. 77. and IVei. Temporal aggregation and information loss in a distributed lag model. W. Springer-Verlag. C. Assoc. ASA Proceedings of Btcsitzess and Ecorro~nicSection. 40. Tsay. Wei. (1488)..231-240. Matlr.. New York.316-319. Outliers. H . Alfven's sunspot model. and Stram. TttrestroldModels in Nora-linear Etne Series Analysis. A. and Stram. and cyclical data (with discus' sion). (1984). S. Anrel: Statist. Tukey. 8. 3. in Seasonal Analysis of Ecorzoinic Zt?re Series (Ed. (1954). (1978). Anderson). S. On a threshold model. A10(13). (1991). R. Tsay.. W. B. B42. Department of Commerce. ASSOC. Zellner). 81. (1 983). (1982). D. New York. D.Irtterpolatiot~and St~rootlairtgof Statio~raryE I HSeries ~ wifh Eltgineeri~rgApplicafions. . (1 974).S. On the analysis of first order autoregressive models with incompiete data. 2nd ed.564 References Whittle. Zellner. A. 53. W. Yaglom.. P. 10. NJ... 226. A. Math. Intenmtioaal Econornic Reviews. 0. (1932). Oxford. (1962). 25 (3). On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Ahqvist and Wiksell. U. (1966).. Zellner).A Stzdy in the Analysis of Stafiortury7i11leSeries. and Hsu. Blackwell.72-76. Engleivood Cliffs.. G .. Wichern.209-224. (1983). A study of some aspects of temporal aggregation problems in econometric analyses. Washington DC. Lotzdotl.A. 248-256. MMir. S. Plzilos. (1949). (1978). Review of Ecotrot~iicsand Statistics. U. Appl. Zellner. S. H . @id.John Wiley.335-342. Bull. 7. C. Uppsala.A. Department of Commerce. A. Prediction and Regulatiott Lty Liriear hast-Square Methods. Yule. Wiener. Zellner. Stat. An bttmduction to the Themy of Stationary Rat~dotnAatctiotzs. Appl. A. Prentice Hall. R. Inst. and Montmarquette. 24. Changes af variance in first-order autoregressive time series models with an application. A. Wilk.471-494. Young. Wold. Ser. (1976). A. Bureau of the Census. Bio~~ietrika. (1927). M. Trans. (1938). Roy. B. Certain generalization in the analysis of variance. Recursive approaches to time series analysis. Soc. P. The Extrapolatio~i. Semo~zalA~zniysisof Econo~)zicE~neSeries. (1971). N. C.267-298. Appendix Series W l Series W2 Series W3 Series W4 Daily Average Number of Truck Manufacturing Defects Wolf Yearly Sunspot Numbers from 1700 to 2001 Blowfly Data Monthly Unemployed Young Women between Ages 16 and 19 in the United States from January 1961 toAugust 2002 Series W5 yearly Cancer (All Forms) Death Rate (per 100.000) for Pennsylvania between 1930 and 2000 Yearly U.S. 2002 and August 31. New One-Family Houses Sold (thousands) from January 1968 to December 1982 Table A Table B Table C Table D Table E Table F Tail Areas of the Standard Normal Distribution Critical Values of the &Distribution Critical Values of the Chi-Square (x2)Distribution Critical Values of the F-Distribution for F0.S.S.1) for # = 1 . Hog Data from 1867 to 1948 Series W14 Number of Monthly Airline Passengers in the United States from January 1995 to March 2002 Series W15 Oklahoma and Louisiana Spot Prices for Natural Gas between January 1988 and October 1991 Series W16 U.S. 
Appendix

Time Series Data Used for Illustrations

Series W1   Daily Average Number of Truck Manufacturing Defects (read across)
Series W2   Wolf Yearly Sunspot Numbers from 1700 to 2001
Series W3   Blowfly Data (read across)
Series W4   Monthly Unemployed Young Women between Ages 16 and 19 in the United States from January 1961 to August 2002 (in thousands)
Series W5   Yearly Cancer (All Forms) Death Rate (per 100,000) for Pennsylvania between 1930 and 2000
Series W6   Yearly U.S. Tobacco Production from 1871 to 1984 (in millions of pounds)
Series W7   Yearly Number of Lynx Pelts Sold by the Hudson's Bay Company in Canada from 1857 to 1911
Series W8   Simulated Seasonal Time Series
Series W9   Monthly Employment Figures for Males between Ages 16 and 19 in the United States from January 1971 to December 1981 (in thousands)
Series W10  Quarterly U.S. Beer Production from the First Quarter of 1975 to the Fourth Quarter of 1982 (in millions of barrels)
Series W11  Daily Closing Stock Price of Duke Energy Corporation between January 3, 2002 and August 31, 2002
Series W12  Lydia Pinkham Annual Advertising and Sales Data from 1907 to 1960 (advertising and sales, in $1,000)
Series W13  The Five Series: U.S. Hog Data from 1867 to 1948 (hog numbers, hog prices, corn prices, corn supply, and wage rates)
Series W14  Number of Monthly Airline Passengers in the United States from January 1995 to March 2002
Series W15  Oklahoma and Louisiana Spot Prices for Natural Gas between January 1988 and October 1991
Series W16  U.S. New One-Family Houses Sold (in thousands) from January 1968 to December 1982

Statistical Tables

Table A  Tail Areas of the Standard Normal Distribution
Table B  Critical Values of the t-Distribution
         Source: D. B. Owen, Handbook of Statistical Tables, copyright 1962 by Addison-Wesley Publishing Co., Inc. (copyright renewal 1990); reprinted by permission of Pearson Education, Inc., publishing as Pearson Addison Wesley.
Table C  Critical Values of the Chi-Square (χ²) Distribution
         For large values of ν, one can use the approximation χ²_α(ν) ≈ (N_α + √(2ν − 1))²/2, where N_α is the upper-tail α standardized normal deviate given in Table A. For example, when ν = 100, χ²_0.05(100) ≈ (1.645 + √199)²/2 = 124.059. (A short numerical check of this approximation follows the list of tables below.)
Table D  Critical Values of the F-Distribution for F_0.05(ν₁, ν₂)
Table E  Critical Values of the F-Distribution for F_0.01(ν₁, ν₂)
         Source for Tables D and E: Table 18 of Biometrika Tables for Statisticians, Vol. 1, 1958; reproduced by permission of E. S. Pearson and the Biometrika Trustees.
Table F  Empirical Cumulative Distribution of n(φ̂ − 1) for φ = 1
Table G  Empirical Cumulative Distribution of T for φ = 1
         Source for Tables F and G: Tables 10.A.1 and 10.A.2 in Fuller (1996), with minor notation adjustment; used by permission of John Wiley & Sons, Inc., copyright 1996.
Table H  Percentiles of n(Φ̂ − 1) for the Zero Mean Seasonal Models
Table I  Percentiles of the Studentized T Statistic for the Zero Mean Seasonal Models
         Source for Tables H and I: Tables 2 and 3 in Dickey, Hasza, and Fuller (1984), with minor notation adjustment; reproduced by permission of the American Statistical Association, copyright 1984.
Table J  Empirical Percentiles of the Normalized Unit Root Estimator for Aggregate Time Series: AR(1) Model without Intercept, n_A(Φ̂ − 1)
Table K  Empirical Percentiles of the Studentized Statistic for Aggregate Time Series: AR(1) Model without Intercept, T_Φ̂ = (Φ̂ − 1)/S_Φ̂
         Tables J and K give percentiles for orders of aggregation m = 1, 2, 3, 4, 6, and 12; they were constructed by the Monte Carlo method, with details given in Teles (1999).
         In Tables F through K the tabulated probabilities (0.01, 0.025, 0.05, 0.10, 0.90, 0.95, 0.975, and 0.99) are probabilities of a smaller value.
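The large-ν approximation quoted under Table C is easy to check numerically. The short Python sketch below is illustrative only and not part of the original appendix; the function name and the use of scipy for the exact comparison value are assumptions.

```python
from math import sqrt
from scipy.stats import chi2, norm

def chi2_upper_approx(alpha: float, nu: int) -> float:
    """Large-nu approximation to the upper-tail chi-square critical value:
    chi2_alpha(nu) ~ (N_alpha + sqrt(2*nu - 1))**2 / 2."""
    n_alpha = norm.ppf(1.0 - alpha)      # upper-tail alpha normal deviate (Table A)
    return (n_alpha + sqrt(2 * nu - 1)) ** 2 / 2

# The worked example from the note to Table C: alpha = 0.05, nu = 100.
print(chi2_upper_approx(0.05, 100))      # roughly 124.06
print(chi2.ppf(0.95, 100))               # exact value, roughly 124.34
```

At ν = 100 the approximation is within about 0.3 of the exact critical value, so it serves for degrees of freedom beyond the range of the printed table.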
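The footnotes to Tables F, G, J, and K state that these percentiles are empirical, obtained by Monte Carlo simulation (Fuller, 1996; Teles, 1999). The following sketch shows the kind of simulation involved for the AR(1) model without intercept; it is a minimal illustration, not the authors' code, and the function name, number of replications, and seed are assumptions.

```python
import numpy as np

def unit_root_percentiles(n: int, n_rep: int = 20000, seed: int = 1):
    """Empirical percentiles of n*(phi_hat - 1) and of the studentized
    statistic (phi_hat - 1)/S_phi when an AR(1) without intercept is
    fitted to a simulated random walk (true phi = 1)."""
    rng = np.random.default_rng(seed)
    normalized = np.empty(n_rep)
    studentized = np.empty(n_rep)
    for i in range(n_rep):
        # Random walk y_0 = 0, y_t = y_{t-1} + a_t with standard normal shocks.
        y = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n))))
        y_lag, y_cur = y[:-1], y[1:]
        phi_hat = (y_lag @ y_cur) / (y_lag @ y_lag)   # least squares, no intercept
        resid = y_cur - phi_hat * y_lag
        s2 = (resid @ resid) / (n - 1)                # residual variance estimate
        s_phi = np.sqrt(s2 / (y_lag @ y_lag))         # standard error of phi_hat
        normalized[i] = n * (phi_hat - 1.0)
        studentized[i] = (phi_hat - 1.0) / s_phi
    probs = [1, 2.5, 5, 10, 90, 95, 97.5, 99]         # probabilities of a smaller value
    return np.percentile(normalized, probs), np.percentile(studentized, probs)

# For example, percentiles for n = 100, to be compared with the n = 100 rows
# of Tables F and G.
norm_pct, stud_pct = unit_root_percentiles(100)
print(np.round(norm_pct, 2))
print(np.round(stud_pct, 2))
```

For Tables J and K, the same idea is applied to temporal aggregates of the simulated series (the order of aggregation m in the table headings), with the statistics computed from the aggregate observations; Teles (1999) gives the details.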
F.99 Order of aggregation m = 6 Order of aggregation m = 12 - 25 50 100 250 500 750 cn -8.76 -6.77 261 2.44 .21 1.60 2.62 1.97 1.20 1.51 -6.44 -8.31 1.17 -5.60 -3.05 -9.48 2.975 1.12 -5.Statistical Tables Table J (continued) 597 .76 -8.07 2.18 2.21 -3.02 -9. 1)/S& Order of aggregation nt = 1 Order of aggregation m = 2 Order of aggregation rn = 3 .598 Appendix Table K Empirical Percentiles of the SJudentized Statistic for Aggregate Time Series: AR(1) Mode1 without Intercept T& = (@ . 975 . Teles using the Monte CarIo method.95 .025 ... E C.99 .05 . .10 -90 . .Statistical Tables 599 Table K (continued) ---.- Order of aggregation r?t = 4 Order of aggregationm = 6 n~ -01 . . Details are given in Teles (1999). Order of aggregation tn = 12 This table was constructed by Paulo J. . 546. 464.556 Falb.559. B.429.556 Dickey. 314. M..554 Bell. G.C. A.417.429..193.. W. M. 156. 549. A.. N. E. E. 550.561 Di Fonzo. M. S.358.554 Beguin. W. B.561 Day.555 Campbell.207. 25. 145.380.. M. 464. F..554.555 Bucy.554 Bartlett..440.556.. A. K.112. 157.556 EngIe. I.554 Bollerslev. T.. I. 483..319.411.559 Arbib.261.554 Box. 109. 85. A.. F...... 223. J. P..411..330. 549.. A. W.4IO. R. G.473.223. 549. 160. E. 145.546. R. T..85.561 Ahsanufiah.. A.235.223. 380. 145.128. W. 499. C.553 Ali. J.416.206.449. 112. Y. D. 129. W..562 Brewer. 476. W. 128. 549.. 425. 416.. H..554 Boot.553. R. 464.553 Akaike.132. R.553 Ahn.311. C.J.310. M.555 Chan.557 Denton.553 Clarke. 164.554 BoMin.Author Index Abraham..555 Cohen. T. J. 121.402.26.219.401.549.437. E.556.464. 153. 549.. 547.369.549..554 Bloomfield.555 Cheung. T.556 Feibes.563 Agiaktoglou.194.554.. 336. K.554 Andrew..555 Chen.556 Davies.209. 549. 223. 41 1. 556 Cox.549. S.549. L. 544.556 DeistIer.. F..471.543.555 Cooley. 112.413..556 Findley. J.555 Bryant...308. C. E.G. R.557 . R.528. M.498. 376.. C.. H.556 Dagum. N.435. 439.375.559 Burr.553.114..235.408.20. M.. 143..191.70. W. 76.190.. f 21. P. P. 201.195. R. 19. H. R.212.555 Cleveland.560. J. 133. H. S.555. 501. M.506.. R. E. 501. 403. S.521.555 Chuang. M. 546.553 A1-Osh.153. G. 261.554 Ansley. D.554. G. 370.. R. E.556 Draper. B. C.557 Durbh..223.494. K. 109.555 Brjllinger.556.. 556 Davidson.545.226.550. D. 544.553.W.. 492..555 Buys-Ball~t. 235.554 Crato.218. L..559 Fan.553.556 Engel.558 Blackman. M.555 Chow.443.562 Arnemiya. 537.486. T.555 Chan. N..453. W.. J.555 Chang.483.559 Ashley...544. 186.556 Daniell. P. 157. 233. 434.557 Fisher. 259. S.. 415. 294. W.234. 527.554 Felier.148.. T... 368. T.556 Dunsmuir.. 164.473. 549. D. 133.562 Cupingood.436..115. 305. W. N.208. 549.556. P. D.230. A.235. J. P. 164.553 Anderson. D.322..402.483.275. A.M...339. 336.559 Kedem. 483 Hartley. 391.143. 145.H.. 429. 483.557 Graham.. J.. M.. 544.416.557 Hannan.408.557 Cuckenheimer. R. 20.416. 521.554 Liu.555 Hall..208.557 Geweke. M. 133. 349. 121.416. 223.. 76. 528. 483. M. D..554.557 Hamitton. 133. 549. M.556.554... K..556. 333. 261.. R. 230.549. I. D.336. 121. M.. H.209.440..554 Heyse. M. M.164. L.558 Jenkins. 332. C.417. 212.B. 88. 295. D.559 Liu.313.557 McLeod. 560. 549.557 Gourieroux. G. 25. J. D. J.556 Ljung.464.414.558 Hoskng.559 Koopmans. 487.. 439.339. S.563 Monforl. R. D.322.557 Gradshteyn..402.436.499. R. 416.559 Kremers. W. 146. 539.221. P.S.235.555 Muller. 133.C.157.368.556. 549.558 Jazwinski...114.. C.556 Helmer. J. M. A.559 Lee. 483. S. 186. A. 409. L. 434.. 380.563 Izenman. G.560 Miller. A.. O. R. J. R. 547.554 Montgomery.506.70. 499..443. 195.557 Gray..559 Lai.. D. R. G.557 Gabr.543. I... H.. A. 235. A. D.558 Johnson. 
E..528.535..207.557 Good.. E. A. 133.416. H.193. 534. C.. 549.558 Hillmer.546.559 Liiien..220.558 Juselius. P..558 Kalman. 207. 109.557 Hasza. T.349. J. 115. J.. 483. 20. 441. D. 549. K. J.408. K. J.. I.349..556 Herzberg.. R. K. P. M.558 Hinich. 223.563 Meinhold. T. S. 380. F.W. 358. 550. 235..26.483... H. S.208.. D.. 373.557 Guttman. J. J.556 Lim. 259.. K. R.190.558 Hsu. W.209.. V.559 Marquardt. 336.194. 483.560 Walmos.. J. G.. 549. M.447.230. 230.560 Montmarquette.441.557 Granger.559 Lee.559 Lisman..502. M.557 Lewis.. K.559 Ledolter.558 Hotta.534. C. C.557 FuUer.559 Mehra.559 McIntire. D.602 Author lndex Fox. F.. E. R.555 Gudmundsson.476....558 Jones. 491..559 Harrison. 362. 549. G. R. H. W.557 Gnedenko. 145. J. 380.. J. 498.230. D.556 ti. 20. D. 128..560 Mehta. D.558 Johansen. 549. A.. 539. 191. H..557 Kohn. H.429. 416. 483.442.410. 416..560 . I.212..562 Guttorp.563 Miiler.559 LUtkepohl.201..223..358. 115.537.554.. J. 145.A. 521. 549.557 Godfrey.222. M..411.453.553. K. 201..545. W.557 Guerrero. 133.562 Gamerman.544. 112.558.559 Maain. I? J.557 Hanssen.558 Henry.397.448..546. L. A. 528.. A. A.132..557 Harvey. M.387.558 Hodgess. C. F. J. M..A.563 Lin.561 Keliey.. R.550.235.554. M.555 Lamoureux..437. 369.... 88. D.. P. E. M. D. 109. R. 494.554 Lee. 235. 549. C.153. 550. H.559 Lastrapes.559 Masarotto.559 Kolmogorov.362. J. 148..561 Gantmacher.373.549.558 Sohansson. 555...555 Lin. W.444. C. 556 Stevens.539. D. 449. S.556 Robinson. 164. N. S...562 Terhvirta.557 Stock.561 Schuster.559. 164. W.561 Porter-Hudak.. 212.401.561 Quenouilte. 380.219. W.454. A.563 Tukey. M.515. 549.561 Quim.. S.M.. 200. Y.539.556 Reinsel.562 Singpurwalla.417. H. P.562 Trao. 206. G. 319. 363.143.311...235. 156.449. 561. 371. JV. R. G. C.A. 115. A. C.358.G. M. E. 290. S.554 Pena.499. 501.413. K. J.549... G. P. 148.560 Newton. 3...261.562 Stone. 483. R.521. 234..561 Penon.230.523.513.416. 509.358. S.233. 435.563 Tong.206.206. 47.220.560 Nicholson.561.560 Newbold..553.E. 26.132..534.. C.498. 544. B. G... S.D.223. M.368.562 Shen. Z. A.235. 200.447. M. S.557 Said.517. 112.153.562 Shibata. 153. 483.546. P. 549.506. R.466. Q.. W. E.. D..561 Srba.561 Phillips.230. 319.563 Wei. B.556 Smith.494.562 Smith..528. E.498.560 Pdda.. 19. 435.421..528. 313.235.560 Padberg.561. 549. 186.550.416. D.541...513. S.560 Oster.563 603 .333.454. W.. M..402.561 Seacle.. G.516.414.534. 546. 515. 496. H.410.157.429.541..543. 392. 562 Watts. K.516. J.562.560 O'Nagan. G. 235.. S.554.259.560 Patterson.561 Phadke. D. R.466. 218. H. 483.561 Shaman.554.319. R.449. 362. 416. S.121.483. D. 501.563 Weatherby.556 Rose. F. P. M.499. B.554. J.561 Pierse.561 Robins. 548.550.560 Slutzky.145. M. 548. A. R. 132..... 548.512. H. 129.560 Niemi.368. A. R. 369.557 Rao.457. 435. 130. B.506. 506. C. 537.212. 558..537.533. 440. 554.. 544.561 Ray.483.548.557. 164...561 Petruccelii.. 483. 91.503.555 Snell.. 130..403. D.560 Pantula.312. J.415..322. 157.562..201.. J. 336. M.503. H. A.563 Walker.562 Teles.554.. M. D..336. 533. G. 376. F.563 Subba Rao.561 Schwartz.560 Ng. C. M. P.410.562.556 RyzWc.. 491. 549. 561 Schwert. 380. J.537.555.558. K. 114..561 Schmidt.562 Stram.332. 483.443.540.556. H. 22.555 Pagano. 509.411.408. R.555. 548..562. A.502. 391. 562.540.. 549.506. G..563 Tsay.555 Wei. 408. 494..W.534.556 Nicholls.549. 115.476. V.Author Index Nelson..457.560 Osborn. C.439..499... 121..442.558 Wegman (1986). 121. 115.553.560 Noble.310. 434.417. G. 157.555 Watson. D..454. 92.305. J.492.132.506. M.538.226.502.528.. 145.. M.512..538. T. 416..157...560.311. M. 555 Pack.. E.. 
544.555.557 Priestley.413.555 Rothschild. 494.560 Panen.411. B.561 Pierce.223..517.543. T.561 Rosenblatt.556..201. 145. D.363.560 Wei. 88..549.563 Werner.. L. 88. A. J.......604 Author Index Weiss. Y. G.222.235. 33.553 Yaglom.546. H.558 Dllner. A.563 . A. P. S. 546.549. .563 Wold. 563 Wichem.560.563 Wu. C . A. 112. N. S. 528.. 112. S..295. 434. 212. 528. U.563 Zabell. P..563 Whittle. H. 146.42..556 Young. 486.. 88. 160.A. 24.\V.487.271.26.563 Wiener. 483. 499.556 Yeo.549. R.558.563 Yao. 164. G. M. D.528.563 Wiks.549. 0.222.. S.563 Yule. 71 ARIMA model. 72. 25 of ARMA(p. 148 vector ARMA process.35 Autocorrelation matrix function. 507 of t h e m process.492 ARMA (See Autoregressive moving average). 233 standard error of sample. 34 AR(2) process.Subject Index Additive outlier. q) model. 250. 57 ARTMA (See Autoregressive htegrated moving average). 74 forecasting of. 164 Autoregressive moving average processes. 1) process. 58 partial.407 Aliasing. 51 of MA(q) process. 290 ARCH models. 71 ARMAX model. 507 of seasonal model. 267 Autocovariance generating function. 384 Autocovariance function. 10 definition of.64 of ARMA(p.l. 136 exact likelihood function. q) process. 223 Aggregate Series. 285. 156. 18 spectral representation of. 45 ARMA(1. 21. 386 . 1) process. 20 a-trimmed sample. 26. 91 IMA(1. 274 state space representation of. 122 matrix. q) process. 41 of AR@) process. 508 limiting behavior. 25 sample. 20 for stationary processes. 11 properties of. 515 matrix. 515 Aggregation.368 ARFlMA model. 143 MA presentation. 384 Autoregressive integrated moving average (ARMA) processes. 1) process. 70 Amplitude.272 generating. 10 covariance of sample. 48 of MA(2) process. 25 inverse.354 Analysis of variance for periodogram. 109 generating. 39 A R ( p ) process. 466 variance-covariance matrix of estimates. 34 of AR(2) process.57 AR representation. 45 of MA(1) process. 322 Autocorrelation function (ACF). 52 spectrum of. 384 of AR[l) process. spectrum. aggregate series. 60.523 (see aIso effect of aggregation. 48 MA(2) process. and temporal aggregation) Ilkaike's information criterion (AIC). 74 seasonal. 10.286 Amplitude. 508. 516 order of.271 ARIMA(O. 275 inverse. 23 AR(1) process. 51 MA(q) process. 10 sample. 52 of ARMA(1. 1) process. 59 ARMA(p. 123 Autocovariance matrix function. 57 estimation of. 23 MA(1) process. 490. 366. 354 Causality. 165 CAT criterion. 3 13 for spectral ordinates. 495 Cyclical model.569 monthly U. 115. 430 representation of.391 Covariance matrix generating function. 310 Box-Cox's transformation.475.116. 139 Confidence interval. 434 error-correction representation of. housing sales and housing starts. 335. 240 Conditional sum of squares function. 350. 273. q.178. 23. 157 Canonical correlations.606 Subject Index Autoregressive processes. 1 . 111. 33 first-order. electricity consumption. appendix. 364 monthly U. 187 . 565 blowfiy data. 329 Cross-covariance function.437. 428 AR representation of.161.S. 528 Cayley-Hamilon theorem. 358 Cointegrated processes.569 contaimated blowfly data. 357 interpretation of.440 Buys-Ballot table. 326.S. 483 Causal model. 2. 384 Autoregressive representation. 183 daily average number of truck manufacturing defects.109 Brownian motion. 495 Blackman-Tukey window.574 Monthly airline passengers in the U. 302. 408 sample. 68. 359 estimation of. 326 Cross-correlation function (CCF). 401 Correlogram. 433 MA representation of. 268 Daniell's estimator.340. 218 monthly U.467 Central limit theorem for stationary processes. of window.361 estimation of.228. 
537 Bispectrum. 2.153.580 Oklahoma and Louisiana spot prices for natural gas.141 Backshift operator.384 Bandwidth. 319 Backcasting method.. new one-family houses sold. 85. 382 partial lag. 23 Autoregressive signal plus noise model. 382.359 and transfer function models.223 Colorado ski resort data.34 pth-order.S.S. K s) model. 220 monthly employment figures for young man.383.223. 442 Cramer's representation. 428 testing and estimating.566 daily closing stock price of Duke Energy Corporation. 216. 20 Cospectrurn.416. 1) model.185.27. 499 BL(l. 153. 156 Bilinear model.38 vector.579 Quarterly U.272. 7 Correlation matrix function. 160.161. 325 sample. 45 second-order. 351 Cumulants. 315 Data. 262 Correlation function. beer production. 350 Covariance matrix function. 435 Complex conjugate. 323.363 and the transfer function.441. 238.O.228. 199 Coherence. 33. 1 Convolution. 466. 428 Cointegration. 23 1. 546.383 Cross-covariancegenerating function. 487 Autoregressive spectral estimation.351 Complex exponentials. 340.572 monthly unemployed young women. 10 sample.578 monthly average of hourly readings of ozone in downtown Los Angela.S. 140.307 Bayesian information criterion (BTC). 2. 172. 2.363 linear.573 . 39 stationarity of. 573 Lydia Pinkham annual advertising and sales data. 431 decomposition of. 313 Continuous time series. 349. 376. 499 BL(p.351 Cross-conelogram. 153.518. 430 Cointegrating vectors. 325. 349 Cross-spectrum. 27 homogeneous.575 simulated ARCI) series.545 simulated ARIMA series. 43 simulated ARMAtf. 143 Exponential autoregressive model. 284 of moving average.lumber production. 110 Diagnostic checking. 153.Subject Index Data Continlred series C of Box. 222 Wolf yearly sunspot numbers. 266 Eventual forecast functions. 139. 117. 16 nonlinear. 152. 387. 72.571 yearly U S . 26 auxiliary equation. 255 for periodic funclions. 263 Deterministic trends. on forecasting.36.172 of ARMA models. 16 of autocovariances. 27 particular solution of.296. 255 for nonperiodic functions.317. 289 of transfer function models. 70. 19. I10 test for. 140 of mean. of a vector ARMA model. 283 of systematic s a m p h g and temporal aggregation on causality.124. 18 Estimation ARMA spectral. 69 Deterministic trend term.20 Ensemble.451 Ergodicity.69.537 of differencing.S. 27 nonhomogeneous. 138 of cross-spectrum. 151 maximum likelihood. and Reinsel. 16 autoregressive spectral. 540 of aggregation. 34. 112. 448. 499 Exponential smoothing.570 yeady number of lynx pelts sold.320. and Reinsel. 319 conditional maximum likelihood. 523 Energy spectrum. I Dual relationship. 2. 100 Exact likelihood function. 532 of temporal aggregation on causality. 145 parameter. 78 simulated seasonal time series. of freedom for smoothed spectrum. 533 of temporal aggregation.566 yearly cancer death rate for Pennsylvania.572 weekly births in New York City. f 09 Dirac's delta function. 534. personal consumption of gasoline and oil. 120. 153. 471 Delta function. on testing for linearity and normality. 520 of aggregation. 54 607 Effect (s) of aggregation.153. 140 of vector time series models.119. 476 U. 136 of spectrum. 414 Euler relation. 332 'unconditional maximum likelihood. 153 of transfer function models. 416 Difference equations. 430 simulated MA(1) series. 109 overdifferencing. 34. 130 . 159 Data space. 238 Even function. 134. 128. 316 EquivaIent representation.153. Jenkins. on testing for a unit root. EXPAR(p. 334 of vector time series models. 236 simulated random walk series. relative. 135 simulated cointegrated series. d) model. 
Subject Index (continued)

D: Differencing; Dirichlet conditions; Disaggregation, for forecasting future aggregates; Discrete time series; Dual relationship between AR(p) and MA(q) processes
E: Efficiency; Equivalent degrees of freedom; Extended sample autocorrelation function (ESACF)
F: Fast Fourier transform; Filter; Filtered output series; Fisher's test; Flow variable; Forcing function; Forecasting (minimum mean square error forecasts, computation of forecasts, forecast error and its variance, forecast limits, updating forecasts, eventual forecast functions, forecasting with transfer function models, forecasts for Series W1-W7, application of Hilbert space in forecasting); Fourier analysis; Fourier representation; Fourier series; Fourier transform (continuous-time, discrete-time, of finite sequences, of periodic and nonperiodic sequences, of periodic and continuous-time functions, inverse transform); Fractional difference; Frequency (fundamental, Nyquist); Frequency domain analysis; Frequency response function
G: Gain function; GARCH models, estimation of; Gaussian process; General ARIMA models; Generalized function; Generalized least squares (GLS)
H: Harmonic components; Heteroscedasticity; Hidden periodic components; Hidden periodicity; High pass filter; Hilbert space; Homogeneous equation
I: Identifiability; Identification of ARIMA models; Impulse response function; Impulse response weights, typical; Information criterion (AIC); Information loss due to aggregation in parameter estimation; Inner product; Innovation; Innovational outlier; Input matrix; Input series, prewhitened; Intervention analysis; Intervention models; Interventions, examples of; Inverse autocorrelation function (IACF); Inverse autocorrelation generating function; Inverse autocovariance function; Inverse Fourier transform; Inverse process; Invertibility
K: Kalman filter; Kalman gain; Kernel; Kronecker product
L: Lag window; Leakage; Least squares estimation (conditional, unconditional, nonlinear, ordinary); Likelihood ratio test; Linear difference equations; Linear manifold; Linear time-invariant system; Linear vector space; Linearity test; Long memory processes; Low pass filter
M: Maximum likelihood estimation (conditional, unconditional, exact); m-dimensional white noise random vectors; Mean absolute error; Mean absolute percentage error; Mean function; Mean percentage error; Mean square error; Measurement equation; Method of moments; Metric; Minimum mean square error forecasts (for ARMA models, for ARIMA models); Model identification; Model selection criteria (Akaike's AIC and BIC, Schwartz's SBC, Parzen's CAT, criteria based on forecast errors); Models (ARMA, ARIMA, deterministic trend, GARCH, intervention, nonlinear time series, nonstationary time series, nonstationary vector ARMA, seasonal, state space, stationary time series, time series regression, transfer function, vector); Moving average (MA) and autoregressive representations of time series processes; Moving average processes (MA(1), MA(2), MA(q): autocorrelation function, invertibility, stationarity); Multiple-input transfer function models; Multiplicative model; Multivariate linear regression models; Multivariate partial autocorrelation matrix
N: NLAR(1) model; NLAR(p) model; Noise model; Noise spectrum; Noise-to-signal ratio; Nonaggregate series; Nonhomogeneous equation; Nonlinear autoregressive model; Nonlinear moving average (NLMA) model; Nonlinear processes; Nonlinear time series models; Nonperiodic sequence; Nonstationarity (homogeneous, in the mean, in the variance and the autocovariance); Nonstationary time series models; Nonstationary vector autoregressive moving average models; Normality test; Nyquist frequency
O: Observation equation; Order of aggregation; Ordinary least squares (OLS) estimation; Orthogonal basis; Orthogonal functions; Orthogonal subspaces; Outlier model; Outliers (additive, innovational, level shift, temporary change, detection of); Output equation; Output (observation) matrix; Overdifferencing
P: Parseval's relation; Parsimony; Partial autocorrelation function (PACF); Partial autoregression matrices; Partial autoregression matrix function; Partial covariance matrix generating function; Partial lag autocorrelation matrix; Partial lag correlation matrix function; Partial process; Parzen's CAT criterion; Period; Periodic components; Periodicity, hidden; Periodogram, sampling properties of; Phase; Phase spectrum; Polyspectrum; Portmanteau lack-of-fit test; Positive semidefinite; Power spectrum; Power transformation; Prediction; Predictor space; Prewhitened input series; Prewhitening; Projection theorem; Pulse function
Q: Quadrature spectrum
R: Random walk model, with drift; Random walk plus noise model; Rational spectrum; Realizable model; Realization; Rectangular window; Recursive algorithm; Relationship between cross-correlation and transfer functions; Relationship between state space and ARMA models
S: Sample autocorrelation function; Sample function; SBC criterion; Schwartz's Bayesian criterion; Seasonal adjustment; Seasonally adjusted series; Seasonal ARIMA models; Seasonal dummy; Seasonal models (regression method, moving average method, model building, forecasting, IACF and PACF, testing for unit root); Seasonal period; Seasonal time series; Second-order stationary; Sequence (absolutely summable, square summable, periodic, nonperiodic); Short memory process; Signal; Signal-to-noise ratio; Single-input transfer function models; Smoothed spectrum; Smoothing (in frequency domain, in time domain); Space (Hilbert, linear vector, predictor); Spectral density; Spectral density matrix function; Spectral distribution function; Spectral distribution matrix function; Spectral estimation; Spectral representation; Spectral window; Spectrum (amplitude, line, noise, phase, power, quadrature, rational, sample, smoothed; of ARMA models, of seasonal models, of white noise processes, of linear filters, of the sum of two independent processes; properties of; and the autocovariance generating function; see also Cospectrum, Cross-spectrum); Squared coherency; Stationarity (complete, covariance, in distribution, second-order, strict, strong, weak, wide sense); Stationary time series models; State; State equation; State space model; State space representation of ARMA models; State vector; Steady state; Step function; Stochastic integral; Stochastic process; Stochastic trend models, and differencing; Stock variable; Strongly stationary; Subspaces, orthogonal; Summary statistics; Sunspot numbers; System equation; Systematic sampling in time series, effect on causality; Systematic sampling interval
T: Tables, statistical (standard normal, t, chi-square, and F distributions; analysis of variance table for periodogram analysis; Buys-Ballot table for seasonal data; Fisher's test for the largest periodogram ordinate; characteristics of ACF and PACF for stationary processes; equivalent degrees of freedom for various windows; typical impulse response functions; likelihood ratio test for cointegration rank; unit root and cointegration tests; summary of models for series W1-W7); TAR-GARCH models; Temporal aggregation; Threshold autoregressive (TAR) models, self-exciting; Time delay; Time domain analysis; Time-invariant system; Time series; Time series aggregates; Time series outliers; Time series regression; Time-varying regression model; Traditional methods; Transfer function; Transfer function models (single-input, multiple-input; construction, identification, estimation, diagnostic checking, forecasting; and cross-spectrum; and vector ARMA models); Transition equation; Transition matrix; Trend (deterministic, stochastic); Trend-cycle; Triangular window; Trigonometric functions, orthogonality of; Trigonometric identities; Truncation point; Tukey window; Tukey-Hamming window
U: Unconditional least squares estimators; Unconditional maximum likelihood estimation; Unconditional sum of squares; Unit root, testing for; Updating forecasts
V: Variance-covariance matrix; Variance function; Variance of forecast errors; Variance of the sample mean, autocovariances, autocorrelations, partial autocorrelations, cross-correlations, spectrum, and smoothed spectrum; Variance stabilizing transformations; Vector processes, autoregressive and moving average representations of; Vector time series models (AR(1), AR(p), MA(1), MA(q), ARMA(1, 1); AR and MA representations; construction, identification, estimation, forecasting; identifiability, invertibility, stationarity, nonstationarity; spectral properties; unit roots and cointegration)
W: Weakly stationary; Weighting function; White noise processes, spectrum of; Wiener process; Window (bandwidth of; Bartlett's, Blackman-Tukey, Hamming, Hanning, lag, Parzen, rectangular, spectral, triangular, Tukey, Tukey-Hamming); Window carpentry; Window closing; Wold's decomposition; Wold's representation
X: X-12 method
Y: Yule process; Yule-Walker equations; Yule-Walker estimators, generalized