Journal of Financial Econometrics Advance Access originally published online on February 26, 2008
Journal of Financial Econometrics 2008 6(2):171-207; doi:10.1093/jjfinec/nbn003

© The Author 2008.
Published by Oxford University Press. All rights reserved. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oupjournals.org

Time-Varying Arrival Rates of Informed and Uninformed Trades

David Easley
     Cornell University

Robert F. Engle
     New York University

Maureen O'Hara
     Cornell University

Liuren Wu
     Baruch College, CUNY

Address correspondence to Robert F. Engle, Stern School of Business, New York University, 44 West 4th Street, Suite 9-62, NY 10012-1126, or e-mail: rengle{at}stern.nyu.edu.

JEL Classification: C51, C53, G10, G12, G14


    Abstract
 Top
 Abstract
 Introduction
 1 MODEL FORMULATION
 2 DATA AND ESTIMATION
 3 THE ARRIVAL RATE...
 4 FORECASTING MARKET LIQUIDITY...
 5 DIAGNOSTIC ANALYSIS
 6 CONCLUSION
 References
 Footnotes
 
We propose a dynamic econometric microstructure model of trading, and we investigate how the dynamics of trades and trade composition interact with the evolution of market liquidity, market depth, and order flow. We estimate a bivariate generalized autoregressive intensity process for the arrival rates of informed and uninformed trades for 16 actively traded stocks over 15 years of transaction data. Our results show that both informed and uninformed trades are highly persistent, but that the uninformed arrival forecasts respond negatively to past forecasts of the informed intensity. Our estimation generates daily conditional arrival rates of informed and uninformed trades, which we use to construct forecasts of the probability of information-based trade (PIN). These forecasts are used in turn to forecast market liquidity as measured by bid-ask spreads and the price impact of orders. We observe that PINs vary across assets and over time, and most importantly that they are correlated across assets. Our analysis shows that one principal component explains much of the daily variation in PINs and that this systemic liquidity factor may be important for asset pricing. We also find that PINs tend to rise before earnings announcement days and decline afterwards.

KEYWORDS: Arrival rates, informed trades, uninformed trades, autoregressive process, market depth, liquidity


    Introduction
 
A fundamental insight of the microstructure literature is that order flow is informative regarding subsequent price movements. This informational role arises because orders arrive from both informed and uninformed traders, and market observers can infer new information regarding the value of the asset from the composition and existence of trades. Thus, market parameters such as volume, volatility, market depth, and liquidity are all linked in the sense that each is influenced by the underlying order arrival processes. In this paper, we propose a dynamic econometric microstructure model of trading, and we investigate how the dynamics of trades and trade composition interact with the evolution of market liquidity, market depth, and order flows.

There are many reasons why understanding market liquidity and depth is important. From a practical perspective, the cost of trading in a security is inextricably linked to these market variables, and market professionals devise trading strategies that explicitly incorporate these factors. From a more academic perspective, understanding the evolution of liquidity and its interaction with information flow provides insight into the price formation process as well as into more fundamental asset pricing issues as formulated by Easley, Hvidkjaer, and O'Hara (2002), O'Hara (2003), and Acharya and Pedersen (2005). We argue in this paper that understanding market parameters such as liquidity requires understanding a more basic market variable, the order arrival process.

Our dynamic microstructure model follows Easley and O'Hara (1992) by letting the arrival of informed and uninformed traders dictate the order flow and the price formation. Unlike that model, however, our model explicitly allows the arrival rates of informed and uninformed trades to be time-varying and predictable. We propose a forecasting relation for the bivariate arrival rate process that is analogous to the GARCH (Bollerslev 1986) specifications for volatilities. We estimate the parameters that govern the forecasting dynamics using a maximum likelihood method. The likelihood function is determined by the probability of having a given set of buy and sell orders each day, as a function of the arrival rate forecasts. Thus, our model specification allows us to forecast the arrival rates of informed and uninformed orders, and then to forecast the resultant measures of liquidity based on these order arrival processes.

Our modeling approach is a blending of model-based microstructure (see, for example, Easley and O'Hara 1992) with the literature analyzing the econometric determinants of the joint dynamics between trades and prices. Examples of the latter include Hasbrouck (1991), Dufour and Engle (2000), Engle (2000), Engle and Russell (1998), Manganelli (2000), Engle and Lange (2001), Chordia, Roll, and Subrahmanyam (2000, 2001a, 2001b, 2002, 2005), Chordia and Subrahmanyam (2004), Hasbrouck and Seppi (2001), and Korajczyk and Sadka (2006). In common with this econometric literature, our model generates direct forecasts of market liquidity and depth. Unlike that literature, however, we do not rely on exogenous dynamic specifications of trade and price linkages. Instead, our inclusion of a GARCH-style specification into a microstructure model allows us to show why particular components of order imbalance matter, thus providing an econometric structure for investigating order flow information and its resultant effects on market liquidity and depth.

To illustrate the potential of our methodology, we estimate the dynamic model for 16 actively traded stocks using daily numbers of buys and sells over 15 years from January 1983 to December 1998. We find that both the informed and uninformed order flows are highly persistent. More trade today generates more trade tomorrow by both kinds of traders. However, the uninformed arrival forecasts respond negatively to past forecasts on the informed arrival. Informed trade arrival responds more to past order imbalance than it does to overall trade volumes, with the impulse responses to both variables positive and the decay exponential. Uninformed trade responds more to past uninformed trade than it does to past informed trade. The impulse responses suggest a slower decay to the uninformed trading behavior.

We use the estimated model to generate forecasts on the arrival rates of informed and uninformed traders. Based on the arrival rate forecasts, we compute forecasts of the probability of information-based trading (PIN), which has been shown to have explanatory power for both spreads and returns. We also use the arrival rate forecast to predict trading-cost relevant measures such as bid-ask spreads and price impacts. For example, our microstructure model directly links the arrival rates of informed and uninformed traders to the bid-ask spread, and so our arrival rate forecasts can be used to predict bid-ask spreads. We illustrate the power of this approach by predicting opening spreads for a sample of stocks, and we find significantly positive results for most stocks. Similarly, given the arrival rate forecasts, we can use Bayesian updating to calculate the price impact of any given sequence of order flows. As an illustration, we define a measure of market depth we term the half-life. This measure is defined as the number of consecutive buys needed for the price impact to exceed half of the exogenously specified maximum impact. The half-life estimates provide a compact forecast of the market depth based on the forecasts of arrival rates of informed and uninformed traders.

We also illustrate the value of our dynamic model of trading by showing how our estimated PINs vary around earnings announcement days. One might expect PINs to be high before earnings announcements, and low afterwards as earnings announcements turn private information about earnings into public information. In a recent working paper, Benos and Jochec (2007) ask whether constant PINs estimated from the static model over time periods of at least 28 trading days before and after earnings announcements have this property. They find that their PIN estimates do not have the expected property. Our belief is that this occurs because the variation in trade based on private information occurs in short periods before and after announcements, and using long periods to estimate PINs obscures this effect. Using our dynamic model, we find significant variation in PIN, in the predicted direction, in the week or so before and after earnings announcement days. This result suggests that with our dynamic specification PIN can be used in event studies.

We believe that our results will have an impact in three areas of finance. First, institutional investors need to predict trading costs in order to evaluate the efficiency of alternative trading strategies. In order to do this, it is necessary to predict the price impact of hypothetical trades. Our approach allows us to do a better job of making these predictions than standard microstructure models. We provide an illustrative example in Section 4. Second, the liquidity of assets is important for risk management, as one of the risks associated with an asset position is the cost of reversing the position. We can predict the PIN, which in turn allows us to forecast liquidity. Third, our more sophisticated model of PIN shows that PINs are both autocorrelated and cross-correlated. Since PIN can be viewed as a simple measure of liquidity, our results show that liquidity covaries across assets. Acharya and Pedersen (2005) argue that liquidity risk matters for asset pricing, and our PIN analysis shows that there is a systemic liquidity factor. Further, our new PINs should allow us to improve on the asset pricing results of Easley, Hvidkjaer, and O'Hara (2002).

The paper is organized as follows. We begin in Section 1 by setting out our dynamic microstructure models. Section 2 describes the data set and our estimation procedure. Section 3 provides our estimation results on the order arrival processes, and we examine the impulse response functions to shocks to trade imbalances and overall volume levels. Section 4 investigates the application of the arrival rate forecasts to the prediction of bid-ask spreads and price impacts. This section also illustrates how to use our dynamic model of PINs in an event study. Section 5 provides some diagnostic analysis of the forecasting results. Section 6 concludes.


    1 MODEL FORMULATION
 
In this section, we propose a dynamic microstructure model of trading. We use this model as a vehicle to investigate how the dynamics of trades and trade composition interact with the evolution of market liquidity and depth. From a practical perspective, portfolio managers observe the order flow of buys and sells on an asset, but not information on what type of player is behind each order and why that player sends a particular order. The idea of building the dynamic microstructure model is to provide a theoretical base according to which portfolio managers can infer the unobservable arrival rates of different types of players from the publicly observable streams of buys and sells. From an academic perspective, the microstructure framework enables us to separate information risk and liquidity risk, and their different impacts on asset pricing.

To build our dynamic model, we use the model of Easley and O'Hara (1992) as our benchmark, but allow the arrival rates of different types of trades to follow autoregressive processes. Every day agents update their parameter estimates based on past information before embarking on their trading day. We can use the microstructure model in a conditional form to construct the likelihood function of the observed order flows. By maximizing the likelihood function, we identify the parameters that govern the dynamic processes of the arrival rates. Using the estimated model, we can generate forecasts on the arrival rates, information flow, market liquidity, and depth.

1.1 The Static Model Benchmark
We follow Easley and O'Hara (1992) and Easley, Kiefer, and O'Hara (1996, 1997a, 1997b) in modeling a market in which a competitive market maker trades a risky asset with uninformed and informed traders. Trade occurs over discrete trading days and, within each trading day, trade occurs in continuous time. Information events occur between trading days with probability α. When these events occur, they are either bad news with probability δ, or good news with probability 1−δ. Traders informed of bad news sell and those informed of good news buy. We assume that orders from these informed traders follow a Poisson process with daily arrival rate µ. Uninformed traders trade for liquidity reasons. We assume that buy and sell orders from uninformed traders each arrive at the market according to a Poisson process with daily arrival rate ε. A more extensive discussion of this structure can be found in Easley, Kiefer, and O'Hara (1996, 1997a, 1997b).

Under this model, the probability of observing B number of buys and S number of sells at a given date t is given by


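$$
\Pr\{y_t = (B, S)\} = \alpha(1-\delta)\, e^{-(\mu+2\varepsilon)}\, \frac{(\mu+\varepsilon)^{B}\, \varepsilon^{S}}{B!\, S!} + \alpha\delta\, e^{-(\mu+2\varepsilon)}\, \frac{(\mu+\varepsilon)^{S}\, \varepsilon^{B}}{B!\, S!} + (1-\alpha)\, e^{-2\varepsilon}\, \frac{\varepsilon^{B+S}}{B!\, S!}
$$

(1)
where $y_t = (B, S)$ denotes the observation vector (number of buys and sells) for day t. The probability can be regarded as a mixture of three Poisson probabilities, weighted by the probability of having a "good news day" $\alpha(1-\delta)$, a "bad news day" $\alpha\delta$, and a "no news day" $(1-\alpha)$. The model is static in the sense that each day the arrivals of an information event, and trades conditional on information events, are drawn from identical and independent distributions.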
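The mixture described above can be evaluated numerically. The following is a minimal sketch, assuming the standard reading of the benchmark model: buys and sells are independent Poisson counts with rates (µ+ε, ε) on good-news days, (ε, µ+ε) on bad-news days, and (ε, ε) on no-news days. The function name is hypothetical, and the computation is done in logs (an implementation choice, not something the text prescribes) so that large counts do not overflow.

```python
import math


def log_prob_buys_sells(buys, sells, alpha, delta, mu, eps):
    """Log probability of one day's (buys, sells) under the three-state mixture.

    Assumes 0 < alpha < 1, 0 < delta < 1, mu > 0, eps > 0.
    """
    def log_poisson(k, rate):
        # log of exp(-rate) * rate**k / k!
        return -rate + k * math.log(rate) - math.lgamma(k + 1)

    # Good news: informed traders buy, so buys arrive at rate mu + eps.
    good = (math.log(alpha * (1 - delta))
            + log_poisson(buys, mu + eps) + log_poisson(sells, eps))
    # Bad news: informed traders sell, so sells arrive at rate mu + eps.
    bad = (math.log(alpha * delta)
           + log_poisson(buys, eps) + log_poisson(sells, mu + eps))
    # No news: only uninformed traders on both sides.
    none = (math.log(1 - alpha)
            + log_poisson(buys, eps) + log_poisson(sells, eps))

    # Combine the three components with a max shift to avoid underflow.
    top = max(good, bad, none)
    return top + math.log(sum(math.exp(v - top) for v in (good, bad, none)))


if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the paper.
    print(log_prob_buys_sells(12, 7, alpha=0.3, delta=0.4, mu=5.0, eps=3.0))
```

Summing the exponentiated values over a sufficiently large grid of (buys, sells) pairs should return a total close to one, which is a quick sanity check on the weights.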

1.2 Time-Varying Arrival Rates of Trades
The benchmark model assumes constant arrival rates for both informed and uninformed traders. In reality, agents continually gain information about the trading environment and consequently update their estimates of these arrival rates. To capture this effect econometrically, we specify how the arrival rates evolve and what the key information sources are about the arrival rates. With the dynamics specification, the arrival rates in Equation (1) become conditional arrival rate forecasts, and the probabilities of buys and sells vary over time with the conditional arrival rate forecasts.

1.2.1 The information content of trades.
According to the benchmark microstructure model, data on daily numbers of buys and sells contain important information about the underlying arrival rates of informed and uninformed traders. Let $TT = B + S$ denote the total number of trades per day. The expected value of the total trades, $E[TT]$, is equal to the sum of the Poisson arrival rates of informed and uninformed trades:

$$
E[TT] = \alpha\mu + 2\varepsilon
$$

Furthermore, the expected value of the trade imbalance $K = S - B$ is given by:

$$
E[K] = \alpha\mu(2\delta - 1)
$$

Hence, when the probability of bad news δ is not exactly one-half, the mean of trade imbalance provides information on the arrival of informed trades. A more informative quantity is the absolute value of the trade imbalance. The expectation on absolute differences of Poisson variables takes on rather complicated forms (see Katti 1960), but the first-order term of this expectation relates directly to the arrival of the informed trades: $E[|K|] \approx \alpha\mu$.
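These moment relations can be checked by brute-force enumeration of the mixture. The sketch below assumes the three-state Poisson structure of the benchmark model and the sign convention K = S − B (an assumption of this example); the function names and parameter values are hypothetical. It computes the expected total trades, the expected imbalance, and the expected absolute imbalance by summing over a truncated grid of counts.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of observing k events at the given rate."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def trade_moments(alpha, delta, mu, eps, max_count=150):
    """Return (E[TT], E[K], E|K|) with TT = B + S and K = S - B.

    The sums are truncated at max_count, which must sit far in the
    upper tail of every Poisson rate involved.
    """
    # (state probability, buy rate, sell rate)
    states = [
        (alpha * (1 - delta), mu + eps, eps),  # good news: informed buy
        (alpha * delta, eps, mu + eps),        # bad news: informed sell
        (1 - alpha, eps, eps),                 # no news
    ]
    e_tt = e_k = e_abs_k = 0.0
    for weight, buy_rate, sell_rate in states:
        buy_pmf = [poisson_pmf(b, buy_rate) for b in range(max_count)]
        sell_pmf = [poisson_pmf(s, sell_rate) for s in range(max_count)]
        for b, pb in enumerate(buy_pmf):
            for s, ps in enumerate(sell_pmf):
                p = weight * pb * ps
                e_tt += p * (b + s)
                e_k += p * (s - b)
                e_abs_k += p * abs(s - b)
    return e_tt, e_k, e_abs_k


if __name__ == "__main__":
    alpha, delta, mu, eps = 0.4, 0.3, 30.0, 10.0  # illustrative values only
    e_tt, e_k, e_abs_k = trade_moments(alpha, delta, mu, eps)
    print("E[TT]:", e_tt, "vs alpha*mu + 2*eps:", alpha * mu + 2 * eps)
    print("E[K]: ", e_k, "vs alpha*mu*(2*delta-1):", alpha * mu * (2 * delta - 1))
    print("E|K|: ", e_abs_k, "vs alpha*mu:", alpha * mu)
```

The first two comparisons should agree up to truncation error. The third is only a first-order relation: uninformed buys and sells also create some imbalance by chance, so the enumerated E|K| sits above αµ.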

These relations provide the key information sources that agents would use to update their arrival rate estimates. In this paper, we model the arrival rate dynamics with a forecasting specification that uses past values of balanced and imbalanced trade as well as past arrival forecasts to forecast informed and uninformed arrival rates. It seems reasonable to allow arrival rates to depend on these variables as traders can observe them and can thus condition their trading choices on this data.

1.2.2 A generalized autoregressive specification on arrival rates of trades.
The arrival rate of informed trades is $\alpha\mu$ and the arrival rate of the uninformed trades is $2\varepsilon$. We use $\psi = [\alpha\mu,\ 2\varepsilon]^\top$ to denote the vector of the two arrival rates. To remove any deterministic trend in arrival rates, we model the detrended arrival rates $\tilde{\psi}$ as a vector stationary process, where the vector $g$ captures the growth rates of the two intensities.

In order to allow our arrival rate forecasts to depend on past observables, we specify that the detrended arrival rate forecasts follow a bivariate vector autoregressive process with predetermined forcing variables,

$$
\tilde{\psi}_t = \omega + \Phi\, \tilde{\psi}_{t-1} + \Gamma\, \tilde{Z}_t
$$

(2)
where $\tilde{\psi}_t$ denotes the detrended time-t forecast of the arrival rate vector at time t+1, $Z_t = [\,|K_t|,\ TT_t - |K_t|\,]^\top$ denotes the time-t observed absolute trade imbalance and balanced trades, and $\tilde{Z}_t$ denotes the detrended trade quantities. This equation is directly analogous to a GARCH equation (Bollerslev 1986), where unobservable quantities (arrival rates) are modeled as a function of observables (imbalanced and balanced trades). In principle, as in GARCH-type specifications, we can incorporate any predetermined observables into the forecasting equation as long as they are informative about the informed and uninformed trade arrivals.

To compute multistep forecasts of the arrival rates, it is necessary to forecast future values of Formula based on the model. As a first-order approximation, Formula . Then, as in GARCH models, the above forecasting relation can be rewritten as an Formula process:


Formula 3

(3)
where


Formula

and Formula denotes the forecasting error. The stationarity of the process requires that the eigenvalues of Formula be less than one.
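The recursion and the stationarity condition can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the forecasting equation has an intercept vector `omega`, a 2×2 autoregressive matrix `phi`, and a 2×2 loading matrix `gamma` on the detrended trade quantities (all names hypothetical). It further assumes that, under the first-order approximation in which expected trade quantities equal the previous arrival rate forecasts, the matrix governing persistence is `phi + gamma`.

```python
import math


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def update_forecast(omega, phi, gamma, psi_prev, z_now):
    """One step of psi_t = omega + phi @ psi_{t-1} + gamma @ z_t (all detrended)."""
    ar_part = mat_vec(phi, psi_prev)
    forcing = mat_vec(gamma, z_now)
    return [omega[i] + ar_part[i] + forcing[i] for i in range(2)]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a 2x2 real matrix (closed form)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = trace * trace - 4.0 * det
    if disc >= 0.0:
        root = math.sqrt(disc)
        return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))
    # Complex conjugate pair: the squared modulus equals the determinant.
    return math.sqrt(det)


def is_stationary(phi, gamma):
    """Check that all eigenvalues of phi + gamma lie inside the unit circle."""
    total = [[phi[i][j] + gamma[i][j] for j in range(2)] for i in range(2)]
    return spectral_radius_2x2(total) < 1.0


if __name__ == "__main__":
    # Illustrative coefficients only; they are not estimates from the paper.
    omega = [1.0, 2.0]
    phi = [[0.5, 0.05], [-0.1, 0.6]]
    gamma = [[0.2, 0.02], [0.1, 0.3]]
    psi = update_forecast(omega, phi, gamma, psi_prev=[10.0, 20.0], z_now=[5.0, 30.0])
    print(psi, is_stationary(phi, gamma))
```

A check of this kind is useful inside an optimizer, where candidate parameter vectors that imply explosive arrival rates can be rejected before the likelihood is evaluated.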

For model estimation, we set Formula . Adding back the time trend, we can rewrite the forecasting relation as


Formula 4

(4)
where Formula is the Hadamard product.

Equation (4) forecasts the product of the parameter α and the arrival rate of informed traders µ. However, the likelihood function needs separate inputs for the two quantities. To separate them, we assume that α, the probability of an information event, is constant over time. In reality, informed trades could vary because of variations in either the arrival rate of informed traders µ or the probability of an information event α, or both. We find it more plausible that the arrival rate of informed traders is time varying than that the probability of an information event is time varying. Some information events are more important than others. We use the time-varying arrival rate of informed traders to capture the variation in the importance of the information events. More important information events attract more informed traders. Nevertheless, it is possible that the probability of having an information event also follows a stochastic process that we misidentify as variation in informed traders with this assumption.

1.3 Maximum Likelihood Estimation
With daily observations on the number of buys and sells, we use a maximum likelihood method to estimate the parameters that govern the dynamics of the arrival rates of informed and uninformed trades [$\omega, \Phi, \Gamma, g$], the probability of an information event α, and the probability of bad news δ. First, given initial guesses on the model parameters, we use Equation (4) to forecast the informed and uninformed trade arrival rates at each time t based on information at time $t-1$ to obtain [$\mu_{t|t-1}, \varepsilon_{t|t-1}$]. Second, conditional on the time-$(t-1)$ forecasts of the time-t arrival rates, we compute the time-$(t-1)$ conditional probability of having $B_t$ buys and $S_t$ sells at time t according to the benchmark microstructure model,

$$
\Pr\{y_t = (B_t, S_t) \mid \mathcal{F}_{t-1}\} = \alpha(1-\delta)\, e^{-(\mu_{t|t-1}+2\varepsilon_{t|t-1})}\, \frac{(\mu_{t|t-1}+\varepsilon_{t|t-1})^{B_t}\, \varepsilon_{t|t-1}^{S_t}}{B_t!\, S_t!} + \alpha\delta\, e^{-(\mu_{t|t-1}+2\varepsilon_{t|t-1})}\, \frac{(\mu_{t|t-1}+\varepsilon_{t|t-1})^{S_t}\, \varepsilon_{t|t-1}^{B_t}}{B_t!\, S_t!} + (1-\alpha)\, e^{-2\varepsilon_{t|t-1}}\, \frac{\varepsilon_{t|t-1}^{B_t+S_t}}{B_t!\, S_t!}
$$

(5)
where $\mathcal{F}_{t-1}$ denotes the time-$(t-1)$ filtration. Equation (5) represents a direct extension of Equation (1), where the constant arrival rates of informed and uninformed traders are replaced by their conditional forecasts.

We construct the aggregate log likelihood function on the time series of buys and sells as a summation of the logarithm of the daily conditional probabilities given in (5):


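$$
\mathcal{L}(\Theta) = \sum_{t=1}^{T} \ln \Pr\{y_t = (B_t, S_t) \mid \mathcal{F}_{t-1};\ \Theta\}
$$

(6)
where T denotes the number of daily observations and $\Theta$ denotes the vector of model parameters, $\Theta \equiv [\omega, \Phi, \Gamma, g, \alpha, \delta]$. We obtain the parameter estimates by maximizing this aggregate likelihood function on the number of buys and sells.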

Although the estimation procedure is straightforward, we often encounter numerical problems when performing the estimation in practice. The three components of the conditional probability in Equation (5) all have the factorials of buys and sells in the denominator and have the arrival rates raised to the power of buys and sells in the numerator. As the numbers of buys and sells become very large for some heavily traded stocks, the computation generates overflow errors for both the numerator and the denominator. Furthermore, the exponential operation on the negative of the arrival rates can also generate underflow errors when the arrival rates are large.

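To circumvent the numerical difficulty, we factor out a common term from the three components of the conditional probability, $e^{-2\varepsilon_{t|t-1}}\, (\mu_{t|t-1}+\varepsilon_{t|t-1})^{B_t+S_t} / (B_t!\, S_t!)$, and rewrite the log likelihood function as,

$$
\mathcal{L}(\Theta) = \sum_{t=1}^{T} \left[ -2\varepsilon_{t|t-1} + (B_t+S_t) \ln(\mu_{t|t-1}+\varepsilon_{t|t-1}) \right] + \sum_{t=1}^{T} \ln\left[ \alpha(1-\delta)\, e^{-\mu_{t|t-1}}\, x_t^{S_t} + \alpha\delta\, e^{-\mu_{t|t-1}}\, x_t^{B_t} + (1-\alpha)\, x_t^{B_t+S_t} \right] - \sum_{t=1}^{T} \ln(B_t!\, S_t!)
$$

(7)
with $x_t \equiv \varepsilon_{t|t-1} / (\mu_{t|t-1}+\varepsilon_{t|t-1}) \in [0, 1]$. For model estimation, we also drop the last term $\sum_{t=1}^{T} \ln(B_t!\, S_t!)$ as it does not vary with the choice of model parameters.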
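The factoring trick can be illustrated for a single day. The sketch below assumes the common term is exp(−2ε)(µ+ε)^(B+S)/(B! S!), which leaves powers of x = ε/(µ+ε) inside the logarithm. The function names are hypothetical, and the extra max shift inside the bracket is an additional safeguard of this example rather than something described in the text. A naive version is included only so the two can be compared on small counts.

```python
import math


def daily_loglik_stable(buys, sells, alpha, delta, mu, eps):
    """One day's log likelihood with the constant -log(B! S!) dropped."""
    x = eps / (mu + eps)  # lies strictly between 0 and 1 for positive rates
    log_x = math.log(x)
    # Logs of the three terms left inside the bracket after factoring.
    terms = [
        math.log(alpha * (1 - delta)) - mu + sells * log_x,  # good news
        math.log(alpha * delta) - mu + buys * log_x,         # bad news
        math.log(1 - alpha) + (buys + sells) * log_x,        # no news
    ]
    top = max(terms)
    log_bracket = top + math.log(sum(math.exp(t - top) for t in terms))
    return -2.0 * eps + (buys + sells) * math.log(mu + eps) + log_bracket


def daily_loglik_naive(buys, sells, alpha, delta, mu, eps):
    """Direct evaluation of the mixture; only usable for small counts."""
    denom = math.factorial(buys) * math.factorial(sells)
    good = alpha * (1 - delta) * math.exp(-(mu + 2 * eps)) * (mu + eps) ** buys * eps ** sells
    bad = alpha * delta * math.exp(-(mu + 2 * eps)) * eps ** buys * (mu + eps) ** sells
    none = (1 - alpha) * math.exp(-2 * eps) * eps ** (buys + sells)
    return math.log((good + bad + none) / denom)


if __name__ == "__main__":
    args = (7, 4, 0.3, 0.4, 5.0, 3.0)  # illustrative values only
    dropped = math.lgamma(7 + 1) + math.lgamma(4 + 1)  # log(B! S!)
    print(daily_loglik_stable(*args) - dropped, daily_loglik_naive(*args))
    # Counts of this size overflow the naive version but not the stable one.
    print(daily_loglik_stable(20000, 19000, 0.3, 0.4, 1500.0, 19000.0))
```

Because the dropped term does not depend on the parameters, maximizing the stable version over the parameters gives the same estimates as maximizing the full likelihood.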

Our model formulation combines the strength of GARCH-type specifications in forecasting arrival rate dynamics with a microstructure setting to generate a likelihood function that is tightly linked to the interactions between informed and uninformed traders. The GARCH specification in Equation (4) makes a static microstructure model dynamic and enables a highly stylized microstructure story to capture observed order flow behaviors. On the other hand, the microstructure backdrop provides guidance on the forecasting dynamics specifications and informative observable choices. It also generates structural interpretations on the estimated model parameters.


    2 DATA AND ESTIMATION
 
We select 16 actively traded stocks to illustrate our approach to estimating the arrival rates dynamics and forecasting trading costs.1 These stocks are Ashland (ASH), Exxon Mobil (XOM), Duke Energy (DUK), Enron (ENE), AOL Time Warner (AOL), Philip Morris (MO), ATT (T), Pfizer (PFE), Southwest Air (LUV), AMR (AMR), Dow Chemical (DOW), CitiGroup (C), JP Morgan Chase (JPM), Wal Mart (WMT), Home Depot (HD), and General Electric (GE). We choose representative stocks from a variety of industries that had high trading volume and were listed on the NYSE. The latter criterion is intended to avoid differences introduced by different trading platforms. Trade data for these stocks are taken from the TAQ transactions database over 15 years for the period January 3rd, 1983, to December 24th, 1998 (3891 business days). A minimum level of trading activity is necessary to extract the information changes from each day, so we exclude days when there are either no buys or no sells. The least active stock is Enron, from which we drop 244 inactive days, then JP Morgan Chase (244 days), Ashland (65 days), Duke Energy (61 days), Wal Mart (19 days), Exxon Mobil (18 days), Southwest Air (7 days), Pfizer (4 days), ATT (4 days), and Philip Morris (3 days). Furthermore, the data for AOL Time Warner, CitiGroup, and Home Depot start late. The starting dates are, respectively, September 16, 1996; October 29, 1986; and April 19, 1984.

The TAQ data provide a complete listing of quotes, depths, trades, and volume at each point in time for each traded security. For our analysis, we require the number of buys and sells for each day, but the TAQ data record only transactions, not who initiated the trade. The classification problem has been dealt with in a number of ways in the literature, with most methods using some variant on the uptick or downtick property of buys and sells. In this article, we use a technique developed by Lee and Ready (1991). Those authors propose defining trades above the midpoint of the bid-ask spread to be buys and trades below the midpoint of the spread to be sells. Trades at the midpoint are classified depending upon the price movement of the previous trade. Thus, a midpoint trade will be a sell if the midpoint moves down from the previous trade (a downtick) and will be a buy if the midpoint moves up. If there is no price movement, we move back to the prior price movement and use that as our benchmark. We apply this algorithm to each transaction in our sample to determine the daily numbers of buys and sells. The first trade each day is excluded from our sample as it is determined by a different mechanism.
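The classification rule can be sketched as follows. This is a simplified illustration rather than the authors' implementation. It assumes each trade arrives already matched to its prevailing bid and ask, it applies the tick test to trade prices (the usual reading of the rule), and it omits any timing adjustment between quotes and trades. All names and the sample data are hypothetical.

```python
def classify_trades(trades):
    """Label each trade 'buy', 'sell', or None when no rule applies.

    trades: list of (price, bid, ask) tuples in time order.
    """
    labels = []
    last_price = None      # price of the previous trade
    last_direction = None  # sign of the most recent nonzero price change
    for price, bid, ask in trades:
        # Update the tick direction only when the price actually moves,
        # so an unchanged price falls back on the prior movement.
        if last_price is not None and price != last_price:
            last_direction = "buy" if price > last_price else "sell"
        midpoint = (bid + ask) / 2.0
        if price > midpoint:
            label = "buy"
        elif price < midpoint:
            label = "sell"
        else:
            # Midpoint trade: uptick means buy, downtick means sell.
            label = last_direction
        labels.append(label)
        last_price = price
    return labels


def daily_counts(labels):
    """Return (number of buys, number of sells), ignoring unclassified trades."""
    return labels.count("buy"), labels.count("sell")


if __name__ == "__main__":
    # Made-up quotes and prices on an eighths grid (exact in binary floats).
    day = [
        (10.25, 10.0, 10.25),   # above midpoint
        (10.125, 10.0, 10.25),  # at midpoint after a downtick
        (10.0, 10.0, 10.25),    # below midpoint
        (10.125, 10.0, 10.25),  # at midpoint after an uptick
        (10.125, 10.0, 10.25),  # at midpoint, no move: reuse prior uptick
    ]
    labels = classify_trades(day)
    # Drop the first trade of the day before counting, as in the text.
    print(labels, daily_counts(labels[1:]))
```

The exact equality test against the midpoint is a simplification; with real price data a tolerance or integer tick units would be safer.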

We begin by analyzing the properties of the trade variables. Table 1 reports the summary statistics of the trade quantities $|K|$ and $TT - |K|$, the number of imbalanced and balanced trades. We observe the following features:

  • Trades are increasing. The daily number of balanced trades $TT - |K|$ grows faster than the trade imbalance $|K|$. The estimated annual growth rate for the balanced trade ranges from 2.4% for DOW to 94% for AOL. The growth rate for the trade imbalance ranges from negative for XOM (–3.66%) and DOW (–1.51%) to 133% for AOL.
  • The number of balanced trades is more volatile than trade imbalance. For all stocks investigated, the standard deviation of the balanced trades is much larger than the standard deviation of the trade imbalance. Standard deviations are measured on the detrended residuals. Furthermore, the intercept of the detrending regression is also larger for the number of balanced trades $TT - |K|$ than for the trade imbalance $|K|$, implying that the number of balanced trades dominates the total trades.
  • Trades are highly persistent. Balanced trades are more persistent than the trade imbalance. The first-order autocorrelation for balanced trade ranges from 0.697 to 0.953 while that for the trade imbalance ranges from 0.145 to 0.772. Autocorrelations are measured on the detrended residuals.
  • Balanced trades and trade imbalances are cross-correlated. The two quantities are generally positively correlated. The cross-correlation coefficient between the balanced trade Formula and the trade imbalance Formula ranges from Formula for XOM to 0.802 for Citigroup.



 
Table 1 Summary statistics of trading activities.

 
The above observations suggest a level of complexity to the order arrival process that is not well captured by static models. The observations also suggest that informed and uninformed trade behaviors exhibit complex dynamic interactions, which are the key motivations for our dynamic specifications of the arrival rates. The observation that balanced and imbalanced trades show both serial and cross-sectional dependence indicates that the arrival rates of informed and uninformed trades are not constant over time, but instead follow some correlated, autoregressive dynamics. The observation that the trades are increasing over time prompts us to also incorporate a deterministic time trend in the arrival rate dynamics specification.

Using the time series of balanced and imbalanced trades on each of the 16 stocks, we maximize the log likelihood defined in Equation (7) to estimate the parameters that govern the dynamics of the arrival rates of informed and uninformed trades. These estimated parameters indicate how the two arrival rates interact with each other and how they move over time. From the estimated dynamics and observations on order flows, we then construct arrival rate forecasts, which in turn predict market liquidity, depth, and potential trading cost in each stock.


    3 THE ARRIVAL RATE DYNAMICS
 
Table 2 reports the parameter estimates and the maximized log likelihood values for each stock. Our focus here is on the dynamics of informed and uninformed order flow rather than directly on the parameter estimates. We first discuss how to construct the dynamics from the parameter estimates. In the next section, we turn our attention to the impact of the dynamics on market liquidity, depth, and trading cost analysis.



 
Table 2 Maximum likelihood estimates for model parameters.

 
To understand how the arrival rates of the two types of trades interact with each other and how they respond to innovations in the order flow, we rewrite the generalized autoregressive process as,


$$
\begin{aligned}
\tilde{\psi}_t &= \omega + \Phi\, \tilde{\psi}_{t-1} + \Gamma\, \tilde{Z}_t \\
&\approx \omega + (\Phi + \Gamma)\, \tilde{\psi}_{t-1} + \Gamma\, e_t
\end{aligned}
$$

The second line is obtained via a linear approximation on the expectation of the balanced and imbalanced trades. The term $(\Phi + \Gamma)$ captures the first-order persistence of the arrival rate forecasts and $e_t$ denotes the forecasting error, or innovation, in trading quantities. Based on this linear approximation, the multiperiod impact of a trade innovation on the arrival rate forecasts is given by the following impulse response function:

$$
\frac{\partial \tilde{\psi}_{t+k}}{\partial e_t} = (\Phi + \Gamma)^{k}\, \Gamma
$$

(8)
where $[\,\cdot\,]_{ij}$ denotes the $(i,j)$th element of the impulse response matrix and captures the impact of the $j$th element of the shock $e_t$ on the $i$th element of the arrival rate, $\tilde{\psi}_{t+k}$. In this system, the estimates on $\Gamma$ capture the instantaneous impact of the time-t innovation on the time-t forecast of the next period's arrival rates. In contrast, the autoregressive matrix $\Phi$ measures the persistence of the arrival rate forecasts and determines to a large degree the multiperiod impact of the trade innovations. The whole picture of dynamics is obtained by a joint analysis of the instantaneous impact $\Gamma$, the autoregressive matrix $\Phi$, and the whole impulse response function of each element.
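The impulse responses can be traced out numerically once the coefficient matrices are known. The sketch below assumes the linearized system has persistence matrix `phi + gamma` and instantaneous impact matrix `gamma`, so that the response at horizon k is the k-th power of the persistence matrix times `gamma`. The matrix names and numbers are hypothetical and are not estimates from Table 2.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def impulse_responses(phi, gamma, horizon):
    """Return the list [gamma, A gamma, ..., A^horizon gamma] with A = phi + gamma.

    Entry [k][i][j] is the effect of a unit shock to trade quantity j
    on the forecast of arrival rate i, k periods later.
    """
    persistence = [[phi[i][j] + gamma[i][j] for j in range(2)] for i in range(2)]
    responses = []
    current = [row[:] for row in gamma]  # horizon 0: instantaneous impact
    for _ in range(horizon + 1):
        responses.append(current)
        current = mat_mul(persistence, current)
    return responses


if __name__ == "__main__":
    # Illustrative coefficients only.
    phi = [[0.5, 0.05], [-0.1, 0.6]]
    gamma = [[0.2, 0.02], [0.1, 0.3]]
    for k, response in enumerate(impulse_responses(phi, gamma, 5)):
        print(k, response)
```

When every eigenvalue of the persistence matrix lies inside the unit circle, the entries of these matrices shrink toward zero as the horizon grows, which corresponds to the decaying responses discussed in the text.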

3.1 The Instantaneous Impact of Trade Innovations
The instantaneous impact of trade innovations Formula on the arrival rate forecasts Formula is captured by the Formula matrix. Inspecting the estimates of the Formula matrix in Table 2, we find that the estimates for all elements of the matrix are positive for all the 16 stocks. Therefore, shocks to both balanced and imbalanced trades have positive instantaneous impacts on the arrival rate of both informed and uninformed agents. Further inspection shows that the estimates for the Formula and Formula elements are larger than the estimates for the Formula and Formula estimates, indicating that both trade innovations have a larger impact on the arrival rate forecast of uninformed trades than on the arrival rate forecast of informed trades. As a result, we can more effectively forecast the uninformed arrival rate than the informed.

The elements Formula and Formula capture the instantaneous impact of the innovation in trade imbalance Formula on the informed and uninformed arrival forecasts, respectively, holding the number of balanced trades constant. Hence, the positive coefficients imply that given a fixed number of balanced trades, increasing trade imbalances increase the arrival forecasts on both informed and uninformed arrivals, potentially because increasing the trade imbalance in this scenario also increases the total number of trades.

On the other hand, if we hold the total number of trades constant, the instantaneous effect of a relative increase in the trade imbalance is captured by Formula on the informed arrival forecast and by Formula on the uninformed arrival forecast. We find that the estimates for the difference Formula remain predominantly positive, with Citigroup as the only exception. Thus, we conclude that a relative increase in the share of imbalanced trades also increases the arrival forecasts of informed trades for most stocks. However, the estimates for the difference Formula have mixed signs: negative for seven firms and positive for nine firms. Hence, the impact of a relative increase in the share of imbalanced trades on the arrival forecast of uninformed trades is ambiguous.

Overall, we find that an absolute increase in either balanced or imbalanced trades increases the forecasts of both informed and uninformed arrivals. So we forecast greater arrival rates for both types of traders following an increase in trade of either type. However, an increase in the relative composition of the imbalanced trades while holding the total number of trades constant has a positive impact on the arrival forecast of informed trades, but an ambiguous impact on the arrival forecast of uninformed trades. So we forecast a greater arrival rate for informed traders following an increase in the share of trades that are imbalanced, but there is no clear effect of the share of imbalanced trades on the forecast of uninformed arrivals.

3.2 The Serial Dependence of Arrival Rate Forecasts
The Formula matrix captures the first-order persistence of the vector arrival rate forecasts on informed and uninformed trades. The diagonal terms of Formula capture how the current forecast is correlated with the lagged forecast of the same arrival rate. The parameter estimates reported in Table 2 indicate that the diagonal terms of Formula are mostly positive, indicating trend-following or herding behavior for both types of arrival rate forecasts. Table 3 reports the eigenvalues of this impact multiplier for the 16 stocks in our sample. Under the linear approximation, both eigenvalues should be less than one in absolute value for the vector process to be stationary. Given the nonlinearity inherent in the dependence of Formula on Formula , we cannot directly use the eigenvalues to determine the stationarity of the system. Nevertheless, the magnitudes of the eigenvalues give us an approximate picture of the persistence. For all 16 stocks, we find that the second eigenvalue of the multiplier matrix is very close to one, demonstrating the extreme persistence of the system.
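The eigenvalue check under the linear approximation can be sketched as follows. The matrix below is hypothetical and chosen only for illustration; it is not taken from Table 2 or Table 3, and the function name is an assumption of this sketch.

```python
import numpy as np

def eigenvalue_moduli(persistence):
    """Moduli of the eigenvalues of a square matrix, sorted ascending."""
    eigenvalues = np.linalg.eigvals(np.asarray(persistence, dtype=float))
    return np.sort(np.abs(eigenvalues))

# Hypothetical 2x2 impact multiplier (not an estimate from the article)
persistence = np.array([[0.60, 0.02],
                        [-0.30, 0.95]])

moduli = eigenvalue_moduli(persistence)

# Under the linear approximation only: stationary if every modulus is below one
is_stationary_linear = bool(moduli.max() < 1.0)
```

A largest modulus close to one signals slow decay of shocks, which is the sense in which the eigenvalues give an approximate picture of persistence even when they cannot settle stationarity of the nonlinear system.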


Table 3 Stationarity of the dynamic arrival rate processes.

 
The dynamics of the vector arrival rate processes are further complicated by the presence of large off-diagonal terms in Formula . In particular, the Formula th element of the impact multiplier, Formula , captures the impact of the previous informed arrival rate forecast on the current uninformed arrival rate forecast. For all 16 stocks, the estimates for Formula in Table 2 are remarkably negative. Thus, a forecasted increase in the arrival rate of informed trades leads to a systematic decrease in our forecasts of the uninformed arrival rate. This forecasting relation is not predicted by traditional microstructure models, which view the only determinant of uninformed trading as the presence of other uninformed traders. The behavior is more in line with models that allow discretionary behavior by liquidity traders, e.g., Admati and Pfleiderer (1988), Foster and Viswanathan (1990), and Lei and Wu (2000).

The impact of the previous day's uninformed order arrival forecast on today's informed arrival forecast is captured by the Formula th element of the impact multiplier, Formula . The estimates of Formula reported in Table 2 are small, and are not consistently positive or negative across the 16 stocks. Hence, the arrival forecasts of informed trades do not depend much on lagged forecasts of the uninformed arrivals. This dynamic behavior is consistent with the hypothesis that informed traders act mainly on information, and do not respond strongly to the activity of uninformed traders.

3.3 The Multiperiod Impact of Trade Innovation
The impulse response function, defined in Equation (8), describes how a shock to one of the state variables alters the evolution of these variables through time. Such shocks typically decay over time, but in this case there is substantial persistence. The impulse response function is determined jointly by the instantaneous impact matrix Formula and the impact multiplier Formula . In Figure 1, we plot the normalized impulse response function for the 16 stocks in our sample, computed based on Equation (8). To compare the relative persistence of each of the four elements, we normalize each element of the impulse response function by the corresponding element in Formula so that all elements of the impulse response are normalized to one at the instantaneous level Formula . The 16 stocks generate very similar persistence patterns. In particular, the arrival rate of uninformed trades (dotted line) is much more persistent than the arrival rate of informed trades (solid line), with AOL (the fifth panel) as the one exception. The persistence of cross-impacts falls between the two direct impacts.


Fig. 1 The normalized impulse response function. Lines depict the impulse response functions of the bivariate arrival rate system for the 16 companies. Each panel is for one company. In each panel, the solid line captures the impact of the trade imbalance on the arrival of informed trades, the dashed line captures the impact of the trade imbalance on the arrival of uninformed trades, the dash-dotted line captures the impact of the balanced trade on the informed trade arrival, and the dotted line captures the impact of the balanced trade on the uninformed arrival. For ease of comparison, we normalize all responses at the instantaneous level to one.

 
This persistent behavior of informed and uninformed trades is not unexpected given that many studies have shown volume to be significantly and positively autocorrelated. But this result is at variance with the predictions of microstructure models in which trades are viewed as iid. Perhaps more importantly, the result reveals that trade patterns are predictable across trading days.

3.4 Robustness of Arrival Rate Dynamics with Respect to Model Perturbations
We have also done the estimation with a generalized autoregressive process on the logarithm of the arrival rates instead of the arrival rates themselves. This specification is analogous to the EGARCH model of Nelson (1991). The maximized log likelihood values from the two models are very close to one another, with neither model consistently dominating the other across all stocks. More importantly, parameter estimates from both models imply similar dynamic behaviors for the informed and uninformed arrivals, showing the robustness of the results.2 For both models, uninformed trades tend to be highly persistent. Uninformed order arrivals clump together, with high-volume days more likely to follow high-volume days, and conversely. However, an increase in the forecast of the informed arrival rate leads to a decline in the future forecast of the uninformed arrival rate. The informed arrival rates also exhibit complex patterns, but the forecast of the informed arrival rate depends little on past forecasts of the arrival rates of uninformed trades.


    4 FORECASTING MARKET LIQUIDITY AND DEPTH
In addition to providing insights on how the informed and uninformed dynamically interact with each other, the estimation of our dynamic model also generates direct forecasts on the arrival rates of informed and uninformed trades. These forecasts are informative in predicting the market liquidity and market depth. Thus, they are useful not only for academics in better understanding the market microstructure, but also for practitioners in better positioning their trades, and for risk managers seeking to measure the risks of illiquidity.

We also use our dynamic model to generate a time series of the PIN. This variable has been used in many studies to provide insight into the microstructure questions, such as the determinants of bid-ask spreads, and asset pricing questions, such as the determinants of the cost of capital. But all prior work using PIN required an assumption that it was constant over a substantial period of time. So PIN could not be used to provide insight into short-term, transitory changes in information-based trading. Here we show how to use the time series of PINs produced by our dynamic model to investigate the effects of earnings announcements on the variation in information-based trading.

4.1 Market Liquidity and Bid-Ask Spread
Market liquidity is often measured by the bid-ask spread: markets in which the bid-ask spread is small are interpreted as liquid markets. Our model links bid-ask spreads directly to the trade sequence and the arrival rates of informed and uninformed trades. By forecasting the arrival rates, we can predict the dynamics of bid-ask spreads.

We start by analyzing the bid quote in response to a sell order. Under our model, an application of Bayes rule shows that the probabilities of a good and a bad information event conditional on a sell order at time t are given by, respectively,


Formula 9

(9)
where Formula and Formula denote the prior probabilities at time t of a good and a bad information event, respectively, and (Formula ) denote the time-Formula forecast of the arrival rates of informed and uninformed traders at time t. In a competitive market, the bid price must provide the market maker zero expected profit conditional on a trade at the bid, i.e., the arrival of a sell order. Thus, the bid price should be equal to the expected value of the asset conditional on history and on the arrival of a sell order. If we use Formula to denote the expected asset value conditional on good news and Formula the expected value conditional on bad news, we can derive the bid price as


Formula 10

(10)
where Formula is the probability of no information event and Formula denotes the unconditional expected value of the asset.

Now, we consider the ask price for a buy order. Again, we can apply the Bayes rule to derive the probabilities of a good and a bad information event conditional on a buy order,


Formula 11

(11)
The ask price is the expected value of the asset conditional on this buy order,


Formula 12

(12)
From Equations (10) and (12), we can compute the bid-ask spread as a function of the trade sequence and the arrival rates of informed and uninformed traders. Therefore, our forecasts on the arrival rates lead to direct forecasts on the market liquidity as measured by bid-ask spreads.

For illustration, we consider the special case at the opening of each day t. We start the day with the unconditional probabilities of good and bad information events,


Formula 13

(13)
Plugging the unconditional priors in (13) into Equations (10) and (12), we obtain the date-t opening bid-ask spread (Formula ):


Formula 14

(14)
If we further assume that Formula , i.e., bad and good news have equal probabilities, the opening bid-ask spread simplifies to


Formula 15

(15)
where Formula denotes the time-(Formula ) forecast of the fraction of trades at time t that are based on information. Hence, the opening bid-ask spread is directly linked to the expected trade composition.
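The link between the opening spread and the expected trade composition can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the article's own formulas: it assumes the standard sequential-trade setup in which uninformed buys and uninformed sells each arrive at rate `eps`, informed traders arrive at rate `mu` on event days (buying on good news, selling on bad news), an information event occurs with probability `alpha`, and the event is bad news with probability `delta`. All function and parameter names are hypothetical.

```python
def update_on_trade(p_good, p_bad, mu, eps, side):
    """Bayes update of (P(good news), P(bad news)) after one trade.

    ASSUMED likelihood weights: a buy arrives at rate eps + mu under good
    news and at rate eps otherwise; a sell arrives at rate eps + mu under
    bad news and at rate eps otherwise.
    """
    if side == "buy":
        w_good, w_bad = eps + mu, eps
    elif side == "sell":
        w_good, w_bad = eps, eps + mu
    else:
        raise ValueError("side must be 'buy' or 'sell'")
    p_none = 1.0 - p_good - p_bad
    norm = p_good * w_good + p_bad * w_bad + p_none * eps
    return p_good * w_good / norm, p_bad * w_bad / norm

def conditional_value(p_good, p_bad, v_low, v_high, delta):
    """Expected asset value given the event probabilities."""
    v_star = delta * v_low + (1.0 - delta) * v_high  # unconditional value
    p_none = 1.0 - p_good - p_bad
    return p_good * v_high + p_bad * v_low + p_none * v_star

def opening_spread(alpha, delta, mu, eps, v_low, v_high):
    """Ask minus bid for the first trade of the day, from the priors."""
    p_good, p_bad = alpha * (1.0 - delta), alpha * delta
    ask = conditional_value(*update_on_trade(p_good, p_bad, mu, eps, "buy"),
                            v_low, v_high, delta)
    bid = conditional_value(*update_on_trade(p_good, p_bad, mu, eps, "sell"),
                            v_low, v_high, delta)
    return ask - bid

def pin(alpha, mu, eps):
    """Expected informed arrivals over expected total arrivals."""
    return alpha * mu / (alpha * mu + 2.0 * eps)

# Hypothetical parameter values with equally likely good and bad news
spread = opening_spread(alpha=0.4, delta=0.5, mu=60.0, eps=40.0,
                        v_low=0.0, v_high=1.0)
share_informed = pin(alpha=0.4, mu=60.0, eps=40.0)
```

With `delta = 0.5` and the value range normalized to one, `spread` and `share_informed` coincide under these assumptions, which is the sense in which the opening spread is proportional to the expected fraction of information-based trades.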

Our dynamic model provides conditional expectations of the arrival rates of informed and uninformed trades. We use the arrival rate forecasts to compute forecasts of the probability of informed trades, PIN. This conditional PIN is interpreted as the forecast of the probability that a trade on the next day will be from an informed agent. Then, we use these conditional PINs to predict market liquidity, exemplified by the opening bid-ask spread, using (14). The summary statistics for the PIN forecasts are reported in Table 4.


Table 4 Sample properties of the forecasts on proportion of informed trades (PIN).

 
Figure 2 plots the time series of the PIN forecasts for each stock. For ease of comparison, we apply the same scale for all panels. We observe an obvious decline in the PIN forecasts over time for several stocks, especially during the last several years of our sample.


Fig. 2 The time series of PIN forecasts. Lines depict the time series of the PIN forecasts from our estimated dynamic model for each stock. PIN denotes the probability of informed trades, defined as the arrival of informed trades over the arrival of total trades. Each panel represents one stock. For ease of comparison, we apply the same scale on all panels.

 
A new generation of asset pricing theories ascribes a role to liquidity. Easley, Hvidkjaer, and O'Hara (2002), O'Hara (2003), and Acharya and Pedersen (2005) differ on the measures of liquidity but agree on its importance. A simple measure of illiquidity is PIN, or the probability of informed trading. High values imply wide bid-ask spreads, small market depths, and costly trading by uninformed traders. From Table 4 and Figure 2, it is clear that PIN varies across assets and over time. Although the average level of PIN is substantially different for these 16 stocks, perhaps even more important is the movement in this indicator. For each stock, the PIN estimate varies greatly over time. The minimum PIN estimates for most stocks are in single digits (in percentage points), but the maximum can be well over 30 percentage points.

From an asset pricing point of view, the covariance of illiquidity across assets is also of importance. Just as with the risk of return, diversification can reduce the risk that an investor must sell when an asset is particularly illiquid. Hence, the strength of correlation matters; see, for example, Hasbrouck and Seppi (2001) and Chordia, Roll, and Subrahmanyam (2000). It is clear from Figure 2 that PIN moves similarly across assets. Table 5 reports the cross-correlation estimates between the PIN time series on different stocks. The correlations are estimated using the common sample of the two stocks involved. The estimates differ greatly across different stock pairs, ranging from Formula to Formula . Based on the common sample of 14 stocks,3 we perform principal component analysis and plot the normalized eigenvalues of each principal component in Figure 3. The plots show that one principal component explains 37% of the daily variation in the 14 PIN series. This estimate suggests that there is a systematic liquidity factor that underlies the stocks in our sample. While diversification can remove the idiosyncratic component of the liquidity risk, the systematic liquidity risk in each stock should be priced.
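The principal component calculation can be sketched as follows, taking the normalized eigenvalues of the covariance matrix of daily changes as the share of variation explained by each component. The data are synthetic, generated with one hypothetical common factor, and stand in for the PIN series only for illustration; the function name and all numerical values are assumptions.

```python
import numpy as np

def normalized_eigenvalues(series):
    """Share of variation explained by each principal component.

    series: array of shape (T, n) with n time series in columns.
    Uses the covariance matrix of first differences; eigenvalues are
    sorted in descending order and scaled to sum to one.
    """
    changes = np.diff(np.asarray(series, dtype=float), axis=0)
    covariance = np.cov(changes, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(covariance)[::-1]
    return eigenvalues / eigenvalues.sum()

# Synthetic stand-in for 14 series: one common factor plus idiosyncratic noise
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
idiosyncratic = rng.normal(size=(500, 14))
synthetic_levels = np.cumsum(0.8 * common + idiosyncratic, axis=0)

shares = normalized_eigenvalues(synthetic_levels)
```

In this synthetic example the first entry of `shares` is far larger than the rest because every series loads on the same factor; a dominant first component of this kind is what the text reads as evidence of a systematic factor.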


Table 5 Cross-correlations of the PIN forecasts on different stocks.

 


Fig. 3 Percentage variation explained by each principal component of the PIN time series on 14 stocks. The length of bars denotes the normalized eigenvalues of the covariance matrix of the daily changes in the 14 time series of PIN estimates from our dynamic model. The normalized eigenvalues can be interpreted as the percentage variation explained by each principal component.

 
To examine how informative the arrival rate forecasts are in predicting the opening bid-ask spread, we run the following forecasting regression on each stock:


Formula 16

(16)
where OPS denotes the percentage opening bid-ask spread of a stock, defined as


Formula 17

(17)
where we normalize the bid-ask spread by the average of the bid and ask levels. The normalization has two purposes. First, we want to abstract from the impact of the scale of the quote. Second, we use the mid-quote as a proxy for the maximum impact of the information event. The term Formula denotes the time-Formula forecast of the proportion of informed trades at time t. In addition to PIN, we also include three control variables: (1) the lagged spread Formula , (2) a standard GARCH(1,1) volatility estimate on the stock returns, Formula , which measures the time-(Formula ) forecast of time-t return volatility, and (3) the aggregate trading volume at time Formula . We use these control variables to capture variations in the spread that are not explained by the proportion of informed trades. The first variable captures the unexplained persistence of the spread. The second variable captures the contribution of the price data, which can potentially reveal information about the variation in the spread between the upper and lower bounds of the valuation (Formula ). The last variable captures the impact of trade size, which is absent from our model. The significance of the estimates on Formula indicates how informative our PIN forecasts are in predicting the opening bid-ask spread, on top of the predictions from the three control variables.

Since the estimate for δ is not exactly at Formula for most stocks, in theory we should use a more complicated function of arrival rates as in (14) rather than PIN. Nevertheless, we use PIN for its simplicity and its intuitive interpretation as a measure for expected trade composition. Furthermore, several studies have generated the PIN estimates from the static model (based on either a rolling or a nonoverlapping window) and explored their implications. Using PIN from our dynamic model provides a comparison with these studies.

We estimate the regressions using the generalized method of moments, with the weighting matrix calculated according to Newey and West (1987) with 30 lags. Table 6 reports the slope estimates, their standard errors (in parentheses), and the R-squares of the regressions in (16). The forecasting performance of the PIN forecasts is quite remarkable. The estimates for the Formula coefficient, which captures the impact of the probability of informed trades, are significantly positive for all but two stocks. The sample average of Formula over the 16 stocks is 0.253, with an average standard deviation of 0.105. The strong statistical significance of the coefficient estimates is remarkable given that the arrival rate forecasts are obtained purely from trade quantities while the opening bid-ask spread is a price behavior.
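A forecasting regression of this kind can be sketched with ordinary least squares and Newey-West standard errors. For an exactly identified linear regression the GMM point estimates coincide with OLS, so the sketch uses OLS and lets only the standard errors depend on the Newey-West weighting; treating the setup this way is an assumption, as are the function name, the column layout, and the synthetic data, none of which come from the article.

```python
import numpy as np

def ols_newey_west(y, X, lags):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors.

    y: array of shape (T,); X: array of shape (T, k) including a constant.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    residuals = y - X @ beta
    moments = X * residuals[:, None]      # row t holds x_t * u_t
    long_run = moments.T @ moments        # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett weight
        gamma = moments[lag:].T @ moments[:-lag]
        long_run += weight * (gamma + gamma.T)
    covariance = xtx_inv @ long_run @ xtx_inv
    return beta, np.sqrt(np.diag(covariance))

# Synthetic data; hypothetical columns: constant, PIN forecast, lagged
# spread, volatility forecast, lagged volume
rng = np.random.default_rng(0)
n_obs = 2000
X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, 4))])
true_beta = np.array([0.1, 0.25, 0.5, 0.05, -0.02])
y = X @ true_beta + 0.01 * rng.normal(size=n_obs)

beta, standard_errors = ols_newey_west(y, X, lags=30)
```

The Bartlett weights keep the estimated long-run covariance positive semidefinite, which is why this weighting is a common choice when residuals may be serially correlated, as with a persistent series such as the spread.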


Table 6 Forecasting opening bid-ask spread.

 
The Formula coefficient estimates on the autoregressive component are also significantly positive for all stocks, indicating that the persistence of the bid-ask spreads cannot be fully explained by the arrival rate forecasts. Furthermore, the coefficient estimates Formula on the GARCH volatility are on average positive and those on the trading volume are on average negative, suggesting that the opening bid-ask spread is higher if the previous day's volatility is high but trading volume is low. Overall, the regression in (16) exhibits pronounced forecasting power, with an average R-square of 31.2%.

It is important to note that our arrival rate forecasts can be used to forecast the bid-ask spreads under any given trade sequence. Here, we use the specific regressions on the opening bid-ask spreads to illustrate their forecasting power and potential usefulness in forecasting the time-variation in market liquidity.

4.2 Market Depth and Price Impacts of Trade Orders
When a portfolio manager tries to purchase or liquidate a large position by sending consecutive buy or sell orders to the market, the price change induced by this series of orders could be significant. Using our dynamic microstructure framework, we can compute the price impact of this sequence of orders as a function of the arrival rates of informed and uninformed trades. Since we have forecasts of the arrival rates, our dynamic model can also be used to forecast the market depth and the potential cost of loading or unloading a position.

We use a sequence of Formula consecutive buy orders as an example. Let Formula and Formula denote the probabilities of a good and a bad information event conditional on N–1 consecutive buy orders. From (12), we can derive the price impact of N consecutive buys as


Formula

where Formula captures the impact of N consecutive buys:


Formula 18

(18)
The probabilities Formula and Formula can be readily updated via Bayes rule as in (11), starting from the unconditional priors at the opening. As the number of consecutive buy orders increases, the probability of a good information event increases and approaches unity while the probability of a bad information event approaches zero. The price impact Formula converges to δ, and the price converges to the expected upper bound of the asset value Formula . The speed of convergence governs the depth of the market and is determined by the arrival rate forecasts (Formula ).

To illustrate how the arrival rate forecasts impact the market depth, we use the first stock of our sample, Ashland Oil, as an example and consider three dates in our sample period when the PIN forecasts on Ashland are at the sample minimum, median, and maximum, respectively. At each of the three PIN levels, we use the estimated model parameters on Ashland Oil and the arrival rate forecasts for that date to compute the price impacts of N consecutive buy orders (Formula ) according to Equation (18) and then normalize the impacts by their convergent value δ. Figure 4 plots the three normalized price impact curves (Formula ) as a function of the number of consecutive buy orders (N) at the three selected PIN levels for Ashland.
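The computation of a normalized price impact curve can be sketched as follows. This is an illustration under stated assumptions rather than the article's Equation (18): it assumes that each buy arrives at rate `eps + mu` under good news and at rate `eps` otherwise, that an information event occurs with probability `alpha` and is bad news with probability `delta`, and that the value range is normalized to one. The function name and the parameter values are hypothetical and are not the Ashland Oil estimates.

```python
def normalized_price_impact(alpha, delta, mu, eps, n_buys):
    """Price impact after 0, 1, ..., n_buys consecutive buys, divided by delta.

    Impact is the conditional expected value minus the unconditional
    expected value, with the value range (upper bound minus lower bound)
    normalized to one. Likelihood weights are ASSUMED, see the lead-in.
    """
    p_good, p_bad = alpha * (1.0 - delta), alpha * delta  # opening priors
    curve = []
    for _ in range(n_buys + 1):
        # With bounds 0 and 1 the unconditional value is 1 - delta, so
        # E[V | history] - (1 - delta) = delta * p_good - (1 - delta) * p_bad
        impact = delta * p_good - (1.0 - delta) * p_bad
        curve.append(impact / delta)
        # Bayes update after one more buy order
        p_none = 1.0 - p_good - p_bad
        norm = p_good * (eps + mu) + (p_bad + p_none) * eps
        p_good, p_bad = p_good * (eps + mu) / norm, p_bad * eps / norm
    return curve

# Hypothetical parameter values
curve = normalized_price_impact(alpha=0.4, delta=0.5, mu=60.0, eps=40.0,
                                n_buys=50)
```

The curve starts at zero and rises toward one; raising `mu` relative to `eps` makes each buy more informative and the curve steeper, which corresponds to a shallower market.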


Fig. 4 The price impact of consecutive buy orders on Ashland Oil. The lines depict the normalized price impact curves of consecutive buys, computed based on the arrival rate forecasts on Ashland Oil from our dynamic model on three different dates, when the forecasted proportion of informed trades (PIN) is at the minimum (left panel), median (middle panel), and maximum (right panel), respectively.

 
All three normalized curves start at zero with zero trade and converge to one as the stock price converges to its upper bound Formula with increasing number of consecutive buy orders. The speeds of convergence are captured by the slope of t