Journal of Financial Econometrics Advance Access originally published online on February 26, 2008
Journal of Financial Econometrics 2008 6(2):171-207; doi:10.1093/jjfinec/nbn003
Time-Varying Arrival Rates of Informed and Uninformed Trades
Cornell University
New York University
Cornell University
Baruch College, CUNY
Address correspondence to Robert F. Engle, Stern School of Business, New York University, 44 West 4th Street, Suite 9-62, NY 10012-1126, or e-mail: rengle{at}stern.nyu.edu.
JEL Classification: C51, C53, G10, G12, G14
Abstract
We propose a dynamic econometric microstructure model of trading, and we investigate how the dynamics of trades and trade composition interact with the evolution of market liquidity, market depth, and order flow. We estimate a bivariate generalized autoregressive intensity process for the arrival rates of informed and uninformed trades for 16 actively traded stocks over 15 years of transaction data. Our results show that both informed and uninformed trades are highly persistent, but that the uninformed arrival forecasts respond negatively to past forecasts of the informed intensity. Our estimation generates daily conditional arrival rates of informed and uninformed trades, which we use to construct forecasts of the probability of information-based trade (PIN). These forecasts are used in turn to forecast market liquidity as measured by bid-ask spreads and the price impact of orders. We observe that PINs vary across assets and over time, and most importantly that they are correlated across assets. Our analysis shows that one principal component explains much of the daily variation in PINs and that this systemic liquidity factor may be important for asset pricing. We also find that PINs tend to rise before earnings announcement days and decline afterwards.
KEYWORDS: Arrival rates, informed trades, uninformed trades, autoregressive process, market depth, liquidity
Introduction
A fundamental insight of the microstructure literature is that order flow is informative regarding subsequent price movements. This informational role arises because orders arrive from both informed and uninformed traders, and market observers can infer new information regarding the value of the asset from the composition and existence of trades. Thus, market parameters such as volume, volatility, market depth, and liquidity are all linked in the sense that each is influenced by the underlying order arrival processes. In this paper, we propose a dynamic econometric microstructure model of trading, and we investigate how the dynamics of trades and trade composition interact with the evolution of market liquidity, market depth, and order flows.
There are many reasons why understanding market liquidity and depth is important. From a practical perspective, the cost of trading in a security is inextricably linked to these market variables, and market professionals devise trading strategies that explicitly incorporate these factors. From a more academic perspective, understanding the evolution of liquidity and its interaction with information flow provides insight into the price formation process as well as into more fundamental asset pricing issues as formulated by Easley, Hvidkjaer, and O'Hara (2002), O'Hara (2003), and Acharya and Pedersen (2005). We argue in this paper that understanding market parameters such as liquidity requires understanding a more basic market variable, the order arrival process.
Our dynamic microstructure model follows Easley and O'Hara (1992) by letting the arrival of informed and uninformed traders dictate the order flow and the price formation. Unlike their model, however, ours explicitly allows the arrival rates of informed and uninformed trades to be time-varying and predictable. We propose a forecasting relation for the bivariate arrival rate process which is analogous to the GARCH (Bollerslev 1986) specifications on volatilities. We estimate the parameters that govern the forecasting dynamics using a maximum likelihood method. The likelihood function is determined by the probability of having a given set of buy and sell orders each day, as a function of the arrival rate forecasts. Thus, our model specification allows us to forecast the arrival rates of informed and uninformed orders, and then to forecast the resultant measures of liquidity based on these order arrival processes.
Our modeling approach blends model-based microstructure (see, for example, Easley and O'Hara 1992) with the literature analyzing the econometric determinants of the joint dynamics between trades and prices. Examples of the latter include Hasbrouck (1991), Dufour and Engle (2000), Engle (2000), Engle and Russell (1998), Manganelli (2000), Engle and Lange (2001), Chordia, Roll, and Subrahmanyam (2000, 2001a, 2001b, 2002, 2005), Chordia and Subrahmanyam (2004), Hasbrouck and Seppi (2001), and Korajczyk and Sadka (2006). In common with this econometric literature, our model generates direct forecasts on market liquidity and depth. Unlike that literature, however, we do not rely on exogenous dynamic specifications of trade and price linkages. Instead, our inclusion of a GARCH-style specification into a microstructure model allows us to show why particular components of order imbalance matter, thus providing an econometric structure for investigating order flow information and its resultant effects on market liquidity and depth.
To illustrate the potential of our methodology, we estimate the dynamic model for 16 actively traded stocks using daily numbers of buys and sells over 15 years from January 1983 to December 1998. We find that both the informed and uninformed order flows are highly persistent. More trade today generates more trade tomorrow by both kinds of traders. However, the uninformed arrival forecasts respond negatively to past forecasts on the informed arrival. Informed trade arrival responds more to past order imbalance than it does to overall trade volumes, with the impulse responses to both variables positive and the decay exponential. Uninformed trade responds more to past uninformed trade than it does to past informed trade. The impulse responses suggest a slower decay to the uninformed trading behavior.
We use the estimated model to generate forecasts on the arrival rates of informed and uninformed traders. Based on the arrival rate forecasts, we compute forecasts of the probability of information-based trading (PIN), which has been shown to have explanatory power for both spreads and returns. We also use the arrival rate forecast to predict trading-cost relevant measures such as bid-ask spreads and price impacts. For example, our microstructure model directly links the arrival rates of informed and uninformed traders to the bid-ask spread, and so our arrival rate forecasts can be used to predict bid-ask spreads. We illustrate the power of this approach by predicting opening spreads for a sample of stocks, and we find significantly positive results for most stocks. Similarly, given the arrival rate forecasts, we can use Bayesian updating to calculate the price impact of any given sequence of order flows. As an illustration, we define a measure of market depth we term the half-life. This measure is defined as the number of consecutive buys needed for the price impact to exceed half of the exogenously specified maximum impact. The half-life estimates provide a compact forecast of the market depth based on the forecasts of arrival rates of informed and uninformed traders.
We also illustrate the value of our dynamic model of trading by showing how our estimated PINs vary around earnings announcement days. One might expect PINs to be high before earnings announcements, and low afterwards as earnings announcements turn private information about earnings into public information. In a recent working paper, Benos and Jochec (2007) ask whether constant PINs estimated from the static model over time periods of at least 28 trading days before and after earnings announcements have this property. They find that their PIN estimates do not have the expected property. Our belief is that this occurs because the variation in trade based on private information occurs in short periods before and after announcements, and using long periods to estimate PINs obscures this effect. Using our dynamic model, we find significant variation in PIN, in the predicted direction, in the week or so before and after earnings announcement days. This result suggests that with our dynamic specification PIN can be used in event studies.
We believe that our results will have an impact in three areas of finance. First, institutional investors need to predict trading costs in order to evaluate the efficiency of alternative trading strategies. In order to do this, it is necessary to predict the price impact of hypothetical trades. Our approach allows us to do a better job of making these predictions than standard microstructure models. We provide an illustrative example in Section 4. Second, the liquidity of assets is important for risk management as one of the risks associated with an asset position is the cost of reversing the position. We can predict the PIN, which in turn allows us to forecast liquidity. Third, our more sophisticated model of PIN shows that PINs are both autocorrelated and cross-correlated. Since PIN can be viewed as a simple measure of liquidity, our results show that liquidity covaries across assets. Acharya and Pedersen (2005) argue that liquidity risk matters for asset pricing and our PIN analysis shows that there is a systemic liquidity factor. Further, our new PINs should allow us to improve on the asset pricing results of Easley, Hvidkjaer, and O'Hara (2002).
The paper is organized as follows. We begin in Section 1 by setting out our dynamic microstructure models. Section 2 describes the data set and our estimation procedure. Section 3 provides our estimation results on the order arrival processes, and we examine the impulse response functions to shocks to trade imbalances and overall volume levels. Section 4 investigates the application of the arrival rate forecasts to the prediction of bid-ask spreads and price impacts. This section also illustrates how to use our dynamic model of PINs in an event study. Section 5 provides some diagnostic analysis of the forecasting results. Section 6 concludes.
1 MODEL FORMULATION
In this section, we propose a dynamic microstructure model of trading. We use this model as a vehicle to investigate how the dynamics of trades and trade composition interact with the evolution of market liquidity and depth. From a practical perspective, portfolio managers observe the order flow of buys and sells on an asset, but not information on what type of player is behind each order and why that player sends a particular order. The idea of building the dynamic microstructure model is to provide a theoretical base according to which portfolio managers can infer the unobservable arrival rates of different types of players from the publicly observable streams of buys and sells. From an academic perspective, the microstructure framework enables us to separate information risk and liquidity risk, and their different impacts on asset pricing.
To build our dynamic model, we use the model of Easley and O'Hara (1992) as our benchmark, but allow the arrival rates of different types of trades to follow autoregressive processes. Every day agents update their parameter estimates based on past information before embarking on their trading day. We can use the microstructure model in a conditional form to construct the likelihood function of the observed order flows. By maximizing the likelihood function, we identify the parameters that govern the dynamic processes of the arrival rates. Using the estimated model, we can generate forecasts on the arrival rates, information flow, market liquidity, and depth.
1.1 The Static Model Benchmark
We follow Easley and O'Hara (1992) and Easley, Kiefer, and O'Hara (1996, 1997a, 1997b) in modeling a market in which a competitive market maker trades a risky asset with uninformed and informed traders. Trade occurs over discrete trading days and, within each trading day, trade occurs in continuous time. Information events occur between trading days with probability $\alpha$. When these events occur, they are either bad news with probability $\delta$, or good news with probability $1-\delta$. Traders informed of bad news sell and those informed of good news buy. We assume that orders from these informed traders follow a Poisson process with daily arrival rate $\mu$. Uninformed traders trade for liquidity reasons. We assume that buy and sell orders from uninformed traders each arrive at the market according to a Poisson process with daily arrival rate $\varepsilon$. A more extensive discussion of this structure can be found in Easley, Kiefer, and O'Hara (1996, 1997a, 1997b).
Under this model, the probability of observing B number of buys and S number of sells at a given date t is given by
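$$
\Pr(B_t = B,\, S_t = S) = \alpha(1-\delta)\, e^{-(\mu+\varepsilon)} \frac{(\mu+\varepsilon)^{B}}{B!}\, e^{-\varepsilon} \frac{\varepsilon^{S}}{S!} + \alpha\delta\, e^{-\varepsilon} \frac{\varepsilon^{B}}{B!}\, e^{-(\mu+\varepsilon)} \frac{(\mu+\varepsilon)^{S}}{S!} + (1-\alpha)\, e^{-\varepsilon} \frac{\varepsilon^{B}}{B!}\, e^{-\varepsilon} \frac{\varepsilon^{S}}{S!}. \tag{1}
$$
The likelihood is a mixture of three Poisson probabilities, weighted by the probabilities of a "good news day" $\alpha(1-\delta)$, a "bad news day" $\alpha\delta$, and a "no news day" $(1-\alpha)$. The model is static in the sense that each day the arrivals of an information event, and trades conditional on information events, are drawn from identical and independent distributions.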
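As a minimal sketch of the static mixture probability described above, the following Python function evaluates the probability of a given number of buys and sells on one day. The function names, argument names, and parameter values are illustrative assumptions and do not come from the paper; the symbols follow the usual Easley, Kiefer, and O'Hara convention (information-event probability, bad-news probability, informed rate, uninformed rate).

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k arrivals with daily rate lam (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def static_trade_prob(buys, sells, alpha, delta, mu, eps):
    """Mixture probability of (buys, sells) on one day in the static model.

    alpha: probability of an information event
    delta: probability that the event is bad news
    mu:    daily arrival rate of informed orders
    eps:   daily arrival rate of uninformed buys (and of uninformed sells)
    """
    # Good news: informed traders add to the buy side.
    good = alpha * (1 - delta) * poisson_pmf(buys, mu + eps) * poisson_pmf(sells, eps)
    # Bad news: informed traders add to the sell side.
    bad = alpha * delta * poisson_pmf(buys, eps) * poisson_pmf(sells, mu + eps)
    # No news: only uninformed orders arrive.
    none = (1 - alpha) * poisson_pmf(buys, eps) * poisson_pmf(sells, eps)
    return good + bad + none

# Hypothetical parameter values, chosen only for illustration.
p = static_trade_prob(buys=12, sells=7, alpha=0.4, delta=0.5, mu=6.0, eps=8.0)
print(p)
```

Because the three components are proper Poisson products weighted by probabilities that sum to one, the values over all (buys, sells) pairs should sum to one, which is a convenient sanity check.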
1.2 Time-Varying Arrival Rates of Trades
The benchmark model assumes constant arrival rates for both informed and uninformed traders. In reality, agents continually gain information about the trading environment and consequently update their estimates of these arrival rates. To capture this effect econometrically, we specify how the arrival rates evolve and what the key information sources are about the arrival rates. With the dynamics specification, the arrival rates in Equation (1) become conditional arrival rate forecasts, and the probabilities of buys and sells vary over time with the conditional arrival rate forecasts.
1.2.1 The information content of trades.
According to the benchmark microstructure model, data on daily numbers of buys and sells contain important information about the underlying arrival rates of informed and uninformed traders. Let $TT = B + S$ denote the total number of trades per day. The expected value of the total trades, $E[TT]$, is equal to the sum of the Poisson arrival rates of informed and uninformed trades:
$$
E[TT] = \alpha\mu + 2\varepsilon.
$$
Furthermore, the expected value of the trade imbalance $K = S - B$ is given by:
$$
E[K] = \alpha\mu\,(2\delta - 1).
$$
If $\delta$ is not exactly one-half, the mean of trade imbalance provides information on the arrival of informed trades. A more informative quantity is the absolute value of the trade imbalance. The expectation on absolute differences of Poisson variables takes on rather complicated forms (see Katti 1960), but the first-order term of this expectation relates directly to the arrival of the informed trades: $E[|K|] \approx \alpha\mu$. These relations provide the key information sources that agents would use to update their arrival rate estimates. In this paper, we model the arrival rate dynamics with a forecasting specification that uses past values of balanced and imbalanced trade as well as past arrival forecasts to forecast informed and uninformed arrival rates. It seems reasonable to allow arrival rates to depend on these variables as traders can observe them and can thus condition their trading choices on this data.
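The two moment relations in this subsection can be checked by averaging expected buys and sells over the three day types of the benchmark model. The sketch below is illustrative only: the function name and the parameter values are hypothetical, and the imbalance is taken as sells minus buys, which is an assumed sign convention.

```python
def expected_trades(alpha, delta, mu, eps):
    """Return (E[buys], E[sells]) by averaging over the three day types."""
    # Each entry: (probability of day type, expected buys, expected sells).
    day_types = [
        (alpha * (1 - delta), mu + eps, eps),  # good news: informed traders buy
        (alpha * delta, eps, mu + eps),        # bad news: informed traders sell
        (1 - alpha, eps, eps),                 # no news: only uninformed trades
    ]
    exp_buys = sum(p * b for p, b, _ in day_types)
    exp_sells = sum(p * s for p, _, s in day_types)
    return exp_buys, exp_sells

# Hypothetical parameter values, chosen only for illustration.
alpha, delta, mu, eps = 0.4, 0.3, 50.0, 100.0
exp_buys, exp_sells = expected_trades(alpha, delta, mu, eps)
total = exp_buys + exp_sells       # should equal alpha*mu + 2*eps
imbalance = exp_sells - exp_buys   # should equal alpha*mu*(2*delta - 1)
print(total, imbalance)
```

With a bad-news probability below one-half, as in this example, the expected imbalance is negative: informed traders buy more often than they sell.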
1.2.2 A generalized autoregressive specification on arrival rates of trades.
The arrival rate of informed trades is $\alpha\mu$ and the arrival rate of the uninformed trades is $2\varepsilon$. We use
to denote the vector of the two arrival rates. To remove any deterministic trend in arrival rates, we model the detrended arrival rates
as a vector stationary process, where the vector
captures the growth rates of the two intensities.
In order to allow our arrival rate forecasts to depend on past observables, we specify that the detrended arrival rate forecasts follow bivariate vector autoregressive process with predetermined forcing variables,
|
| (2) |
To compute multistep forecasts of the arrival rates, it is necessary to forecast future values of
based on the model. As a first-order approximation,
. Then, as in GARCH models, the above forecasting relation can be rewritten as an
process:
|
| (3) |
|
|
For model estimation, we set
. Adding back the time trend, we can rewrite the forecasting relation as
|
| (4) |
Equation (4) forecasts the product of the parameter $\alpha$ and the arrival rate of informed traders $\mu$. However, the likelihood function needs separate inputs for the two quantities. To separate them, we assume that $\alpha$, the probability of an information event, is constant over time. In reality, informed trades could vary because of variations in either the arrival rate of informed traders $\mu$ or the probability of an information event $\alpha$, or both. We find it more plausible that the arrival rate of informed traders is time varying than that the probability of an information event is time varying. Some information events are more important than others. We use the time-varying arrival rate of informed traders to capture the variation in the importance of the information events. More important information events attract more informed traders. Nevertheless, it is possible that the probability of having an information event also follows a stochastic process that we misidentify as variation in informed traders with this assumption.
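The forecasting recursion described in this subsection can be sketched as a GARCH-style update in which the detrended forecast depends on its own lag and on the lagged detrended trade observations. The code below is a minimal illustration under stated assumptions, not the paper's exact specification: the names `omega`, `phi`, `gamma`, `g`, and `psi0` are hypothetical, the trend is assumed to be exponential, and the numbers are made up.

```python
import math

def forecast_arrival_rates(z, omega, phi, gamma, g, psi0):
    """One-step-ahead forecasts of [informed, uninformed] arrival rates.

    z:     list of daily observations [imbalanced trades, balanced trades]
    omega: intercept vector (length 2)
    phi:   2x2 matrix on the lagged detrended forecast
    gamma: 2x2 matrix on the lagged detrended observation
    g:     growth rates of the two intensities (length 2)
    psi0:  starting value for the detrended forecast (length 2)
    Returns one forecast per observation, with the trend added back.
    """
    psi_tilde = list(psi0)
    forecasts = []
    for t, z_t in enumerate(z):
        # Detrend the day-t observation (assumed exponential trend).
        z_tilde = [z_t[i] * math.exp(-g[i] * t) for i in range(2)]
        # GARCH-style update: intercept + persistence + response to trades.
        psi_tilde = [
            omega[i]
            + sum(phi[i][j] * psi_tilde[j] for j in range(2))
            + sum(gamma[i][j] * z_tilde[j] for j in range(2))
            for i in range(2)
        ]
        # Add the trend back to obtain the forecast for day t + 1.
        forecasts.append([psi_tilde[i] * math.exp(g[i] * (t + 1)) for i in range(2)])
    return forecasts

# Hypothetical inputs, chosen only for illustration.
z = [[30.0, 400.0], [55.0, 420.0], [20.0, 390.0]]
forecasts = forecast_arrival_rates(
    z,
    omega=[2.0, 20.0],
    phi=[[0.6, 0.0], [-0.1, 0.8]],
    gamma=[[0.3, 0.02], [0.1, 0.15]],
    g=[0.0, 0.0],
    psi0=[30.0, 400.0],
)
print(forecasts)
```

The first element of each forecast corresponds to the product of the information-event probability and the informed arrival rate, so dividing by a constant event probability would recover the informed rate, in line with the separation assumption discussed above.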
1.3 Maximum Likelihood Estimation
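With daily observations on the number of buys and sells, we use a maximum likelihood method to estimate the parameters that govern the dynamics of the arrival rates of informed and uninformed trades, the probability of an information event $\alpha$, and the probability of bad news $\delta$. First, given initial guesses on the model parameters, we use Equation (4) to forecast the informed and uninformed trade arrival rates at each time $t$ based on information at time $t-1$ to obtain the forecasts $[\alpha\mu_t,\, 2\varepsilon_t]$. Second, conditional on the time-$(t-1)$ forecasts of the time-$t$ arrival rates, we compute the time-$(t-1)$ conditional probability of having $B_t$ buys and $S_t$ sells at time $t$ according to the benchmark microstructure model,
$$
\Pr(B_t, S_t \mid \mathcal{F}_{t-1}) = \alpha(1-\delta)\, e^{-(\mu_t+\varepsilon_t)} \frac{(\mu_t+\varepsilon_t)^{B_t}}{B_t!}\, e^{-\varepsilon_t} \frac{\varepsilon_t^{S_t}}{S_t!} + \alpha\delta\, e^{-\varepsilon_t} \frac{\varepsilon_t^{B_t}}{B_t!}\, e^{-(\mu_t+\varepsilon_t)} \frac{(\mu_t+\varepsilon_t)^{S_t}}{S_t!} + (1-\alpha)\, e^{-\varepsilon_t} \frac{\varepsilon_t^{B_t}}{B_t!}\, e^{-\varepsilon_t} \frac{\varepsilon_t^{S_t}}{S_t!}. \tag{5}
$$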
We construct the aggregate log likelihood function on the time series of buys and sells as a summation of the logarithm of the daily conditional probabilities given in (5):
$$
\mathcal{L} = \sum_{t=1}^{T} \ln \Pr(B_t, S_t \mid \mathcal{F}_{t-1}). \tag{6}
$$
Although the estimation procedure is straightforward, we often encounter numerical problems when performing the estimation in practice. The three components of the conditional probability in Equation (5) all have the factorials of buys and sells in the denominator and have the arrival rates raised to the power of buys and sells in the numerator. As the numbers of buys and sells become very large for some heavily traded stocks, the computation generates overflow errors for both the numerator and the denominator. Furthermore, the exponential operation on the negative of the arrival rates can also generate underflow errors when the arrival rates are large.
To circumvent the numerical difficulty, we factor out a common term from the three components of the conditional probability,
, and rewrite the log likelihood function as,
|
| (7) |
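One common way to avoid the overflow and underflow problems described above is to work with each mixture component in logs and combine them with a log-sum-exp step. The sketch below illustrates that general technique; it is not necessarily the factorization the authors use, and the function names, the `days` layout, and the parameter values are hypothetical.

```python
import math

def log_poisson(k, lam):
    """Log of the Poisson probability, computed without forming k! or lam**k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def log_trade_prob(buys, sells, alpha, delta, mu, eps):
    """Log of the three-component mixture probability for one day.

    Assumes 0 < alpha < 1 and 0 < delta < 1 so that every log weight is finite.
    """
    # Log of each mixture component: good news, bad news, no news.
    terms = [
        math.log(alpha * (1 - delta)) + log_poisson(buys, mu + eps) + log_poisson(sells, eps),
        math.log(alpha * delta) + log_poisson(buys, eps) + log_poisson(sells, mu + eps),
        math.log(1 - alpha) + log_poisson(buys, eps) + log_poisson(sells, eps),
    ]
    # Log-sum-exp: factor out the largest term before exponentiating.
    top = max(terms)
    return top + math.log(sum(math.exp(x - top) for x in terms))

def log_likelihood(days, alpha, delta):
    """Sum of daily log probabilities; days holds (buys, sells, mu_t, eps_t)."""
    return sum(log_trade_prob(b, s, alpha, delta, mu, eps) for b, s, mu, eps in days)

# Hypothetical heavy-volume days: the naive formula would overflow here.
days = [(60000, 55000, 8000.0, 55000.0), (52000, 58000, 7000.0, 54000.0)]
print(log_likelihood(days, alpha=0.4, delta=0.5))
```

Factoring out the largest log term keeps every exponent at or below zero, so the exponentials can underflow harmlessly to zero but cannot overflow.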
Our model formulation combines the strength of GARCH-type specifications in forecasting arrival rate dynamics with a microstructure setting to generate a likelihood function that is tightly linked to the interactions between informed and uninformed traders. The GARCH specification in Equation (4) makes a static microstructure model dynamic and enables a highly stylized microstructure story to capture observed order flow behaviors. On the other hand, the microstructure backdrop provides guidance on the forecasting dynamics specifications and informative observable choices. It also generates structural interpretations on the estimated model parameters.
2 DATA AND ESTIMATION
We select 16 actively traded stocks to illustrate our approach to estimating the arrival rates dynamics and forecasting trading costs.1 These stocks are Ashland (ASH), Exxon Mobil (XOM), Duke Energy (DUK), Enron (ENE), AOL Time Warner (AOL), Philip Morris (MO), ATT (T), Pfizer (PFE), Southwest Air (LUV), AMR (AMR), Dow Chemical (DOW), CitiGroup (C), JP Morgan Chase (JPM), Wal Mart (WMT), Home Depot (HD), and General Electric (GE). We choose representative stocks from a variety of industries that had high trading volume and were listed on the NYSE. The latter criterion is intended to avoid differences introduced by different trading platforms. Trade data for these stocks are taken from the TAQ transactions database over 15 years for the period January 3rd, 1983, to December 24th, 1998 (3891 business days). A minimum level of trading activity is necessary to extract the information changes from each day, so we exclude days when there are either no buys or no sells. The least active stock is Enron, from which we drop 244 inactive days, then JP Morgan Chase (244 days), Ashland (65 days), Duke Energy (61 days), Wal Mart (19 days), Exxon Mobil (18 days), Southwest Air (7 days), Pfizer (4 days), ATT (4 days), and Philip Morris (3 days). Furthermore, the data for AOL Time Warner, CitiGroup, and Home Depot start late. The starting dates are, respectively, September 16, 1996; October 29, 1986; and April 19, 1984.
The TAQ data provide a complete listing of quotes, depths, trades, and volume at each point in time for each traded security. For our analysis, we require the number of buys and sells for each day, but the TAQ data record only transactions, not who initiated the trade. The classification problem has been dealt with in a number of ways in the literature, with most methods using some variant on the uptick or downtick property of buys and sells. In this article, we use a technique developed by Lee and Ready (1991). Those authors propose defining trades above the midpoint of the bid-ask spread to be buys and trades below the midpoint of the spread to be sells. Trades at the midpoint are classified depending upon the price movement of the previous trade. Thus, a midpoint trade will be a sell if the midpoint moves down from the previous trade (a downtick) and will be a buy if the midpoint moves up. If there is no price movement, we move back to the prior price movement and use that as our benchmark. We apply this algorithm to each transaction in our sample to determine the daily numbers of buys and sells. The first trade each day is excluded from our sample as it is determined by a different mechanism.
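The classification rule described above can be sketched as a quote test followed by a tick test. The code below is a simplified illustration and not the authors' implementation: the tuple layout, function names, and sample prices are hypothetical, the tick test is assumed to compare successive trade prices, and quote-timing adjustments are ignored.

```python
def classify_trades(trades):
    """Label each trade 'buy', 'sell', or None using a quote-then-tick rule.

    trades: list of (price, bid, ask) tuples in time order, where bid and
    ask are the quotes assumed to prevail at the time of the trade.
    """
    labels = []
    last_price = None       # price of the previous trade
    last_direction = None   # +1 after the most recent uptick, -1 after a downtick
    for price, bid, ask in trades:
        # Update the tick direction; an unchanged price keeps the prior one.
        if last_price is not None and price != last_price:
            last_direction = 1 if price > last_price else -1
        mid = (bid + ask) / 2
        if price > mid:
            labels.append("buy")
        elif price < mid:
            labels.append("sell")
        elif last_direction is None:
            labels.append(None)  # no earlier price movement to fall back on
        else:
            labels.append("buy" if last_direction > 0 else "sell")
        last_price = price
    return labels

def daily_counts(labels):
    """Number of buys and sells, skipping the first trade of the day."""
    rest = labels[1:]
    return rest.count("buy"), rest.count("sell")

# Hypothetical trades for one day, quoted in eighths.
trades = [
    (20.000, 19.875, 20.125),
    (20.125, 20.000, 20.125),
    (20.125, 20.000, 20.250),
    (20.000, 20.000, 20.250),
    (20.125, 20.000, 20.250),
]
labels = classify_trades(trades)
print(labels, daily_counts(labels))
```

Keeping the last nonzero tick direction in a single variable is what implements the "move back to the prior price movement" step for trades that occur at an unchanged price.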
We begin by analyzing the properties of the trade variables. Table 1 reports the summary statistics of the trade quantities $|K|$ and $TT - |K|$, the number of imbalanced and balanced trades. We observe the following features:
- Trades are increasing. The daily number of balanced trades $TT - |K|$ grows faster than the trade imbalance $|K|$. The estimated annual growth rate for the balanced trade ranges from 2.4% for DOW to 94% for AOL. The growth rate for the trade imbalance ranges from negative for XOM (–3.66%) and DOW (–1.51%) to 133% for AOL.
- The number of balanced trades is more volatile than trade imbalance. For all stocks investigated, the standard deviation of the balanced trades is much larger than the standard deviation of the trade imbalance. Standard deviations are measured on the detrended residuals. Furthermore, the intercept of the detrending regression is also larger for the number of balanced trades $TT - |K|$ than for the trade imbalance $|K|$, implying that the number of balanced trades dominates the total trades.
- Trades are highly persistent. Balanced trades are more persistent than the trade imbalance. The first-order autocorrelation for balanced trade ranges from 0.697 to 0.953 while that for the trade imbalance ranges from 0.145 to 0.772. Autocorrelations are measured on the detrended residuals.
- Balanced trades and trade imbalances are cross-correlated. The two quantities are generally positively correlated. The cross-correlation coefficient between the balanced trade $TT - |K|$ and the trade imbalance $|K|$
ranges from
for XOM to 0.802 for Citigroup.
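The trade quantities summarized in the list above can be built directly from daily buy and sell counts. The sketch below is illustrative only: the function names and the sample counts are hypothetical, balanced trades are assumed to be total trades minus the absolute imbalance, and the detrending step used for the reported statistics is omitted for brevity.

```python
def trade_quantities(buys, sells):
    """Split daily order flow into imbalanced and balanced trades."""
    imbalanced = [abs(s - b) for b, s in zip(buys, sells)]
    balanced = [b + s - k for b, s, k in zip(buys, sells, imbalanced)]
    return imbalanced, balanced

def autocorr1(x):
    """First-order sample autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i - 1] for i in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

# Hypothetical daily counts, chosen only for illustration.
buys = [100, 120, 90, 150, 130]
sells = [110, 100, 95, 120, 140]
imbalanced, balanced = trade_quantities(buys, sells)
print(imbalanced, balanced, autocorr1(balanced))
```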
The above observations suggest a level of complexity to the order arrival process that is not well captured by static models. The observations also suggest that informed and uninformed trade behaviors exhibit complex dynamic interactions, which are the key motivations for our dynamic specifications of the arrival rates. The observation that balanced and imbalanced trades show both serial and cross-sectional dependence indicates that the arrival rates of informed and uninformed trades are not constant over time, but instead follow some correlated, autoregressive dynamics. The observation that the trades are increasing over time prompts us to also incorporate a deterministic time trend in the arrival rate dynamics specification.
Using the time series of balanced and imbalanced trades on each of the 16 stocks, we maximize the log likelihood defined in Equation (7) to estimate the parameters that govern the dynamics of the arrival rates of informed and uninformed trades. These estimated parameters indicate how the two arrival rates interact with each other and how they move over time. From the estimated dynamics and observations on order flows, we then construct arrival rate forecasts, which in turn predict market liquidity, depth, and potential trading cost in each stock.
3 THE ARRIVAL RATE DYNAMICS
Table 2 reports the parameter estimates and the maximized log likelihood values for each stock. Our focus here is on the dynamics of informed and uninformed order flow rather than directly on the parameter estimates. We first discuss how to construct the dynamics from the parameter estimates. In the next section, we turn our attention to the impact of the dynamics on market liquidity, depth, and trading cost analysis.
To understand how the arrival rates of the two types of trades interact with each other and how they respond to innovations in the order flow, we rewrite the generalized autoregressive process as,
|
|
|
| (8) |
3.1 The Instantaneous Impact of Trade Innovations
The instantaneous impact of trade innovations
on the arrival rate forecasts
is captured by the
matrix. Inspecting the estimates of the
matrix in Table 2, we find that the estimates for all elements of the matrix are positive for all the 16 stocks. Therefore, shocks to both balanced and imbalanced trades have positive instantaneous impacts on the arrival rate of both informed and uninformed agents. Further inspection shows that the estimates for the
and
elements are larger than the estimates for the
and
estimates, indicating that both trade innovations have a larger impact on the arrival rate forecast of uninformed trades than on the arrival rate forecast of informed trades. As a result, we can more effectively forecast the uninformed arrival rate than the informed.
The elements
and
capture the instantaneous impact of the innovation in trade imbalance
on the informed and uninformed arrival forecasts, respectively, holding the number of balanced trades constant. Hence, the positive coefficients imply that given a fixed number of balanced trades, increasing trade imbalances increase the arrival forecasts on both informed and uninformed arrivals, potentially because increasing the trade imbalance in this scenario also increases the total number of trades.
On the other hand, if we hold the total number of trades constant, the instantaneous effect of a relative increase in the trade imbalance is captured by
on the informed arrival forecast and by
on the uninformed arrival forecast. We find that the estimates for the difference
remain predominantly positive, with only one exception in Citigroup. Thus, we conclude that a relative increase in the composition of the imbalanced trades also increases the arrival forecasts of informed trades for most stocks. However, the estimates for the difference
have mixed signs: negative for seven firms and positive for nine firms. Hence, the impact of a relative increase in the composition of imbalanced trades is ambiguous on the arrival forecast of uninformed trades.
Overall, we find that an absolute increase in either balanced or imbalanced trades increases the forecasts of both informed and uninformed arrivals. So we forecast greater arrival rates for both types of traders following an increase in trade of either type. However, an increase in the relative composition of the imbalanced trades while holding the total number of trades constant has a positive impact on the arrival forecast of informed trades, but an ambiguous impact on the arrival forecast of uninformed trades. So we forecast a greater arrival rate for informed traders following an increase in the share of trades that are imbalanced, but there is no clear effect of the share of imbalanced trades on the forecast of uninformed arrivals.
3.2 The Serial Dependence of Arrival Rate Forecasts
The
matrix captures the first-order persistence of the vector arrival rate forecasts on informed and uninformed trades. The diagonal terms of
capture how the current forecast is correlated with the lagged forecast of the same arrival rate. The parameter estimates reported in Table 2 indicate that the diagonal terms of
are mostly positive, indicating a trend following or herding behavior for both types of arrival rate forecasts. Table 3 reports the eigenvalues of this impact multiplier for the 16 stocks in our sample. Under the linear approximation, both eigenvalues should be less than one for the vector process to be stationary. Given the nonlinearity inherent in the dependence of
on
, we cannot directly use the eigenvalues to determine the stationarity of the system. Nevertheless, the magnitudes of the eigenvalues give us an approximate picture of the persistence. For all the 16 stocks, we find that the second eigenvalue of the multiplier matrix is very close to one, demonstrating the extreme persistence of the system.
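The eigenvalue check described above is easy to reproduce for a two-by-two matrix using its trace and determinant. The sketch below uses a made-up multiplier matrix, not an estimate from Table 2, and the function name is hypothetical.

```python
import cmath

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # cmath.sqrt handles a negative discriminant (complex eigenvalues).
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2

# Hypothetical impact multiplier, not an estimate from Table 2.
multiplier = [[0.70, 0.02], [-0.10, 0.97]]
eigs = eigenvalues_2x2(multiplier)
# Under the linear approximation, both moduli below one indicate stationarity.
print([abs(e) for e in eigs])
```

A larger eigenvalue modulus close to one corresponds to the kind of near-unit-root persistence discussed in the text.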
The dynamics of the vector arrival rate processes is further complicated by the presence of large off-diagonal terms in
The impact of previous day's uninformed order arrival forecast on today's informed arrival forecast is captured by the
th element of impact multiplier,
. The estimates on
reported in Table 2 are small, and are not consistently positive or negative across the 16 stocks. Hence, the arrival forecasts of informed trades do not depend much on lagged forecasts on the uninformed arrivals. This dynamic behavior is consistent with the hypothesis that informed traders act mainly on information, and do not respond strongly to the activity of uninformed traders.
3.3 The Multiperiod Impact of Trade Innovation
The impulse response function, defined in Equation (8), describes how a shock to one of the state variables will alter the evolution of these variables through time. Such shocks will typically decay over time but in this case there is substantial persistence. The impulse-response function is determined jointly by the instantaneous impact matrix
and the impact multiplier
. In Figure 1, we plot the normalized impulse-response function for the 16 stocks in our sample, computed based on Equation (8). To compare the relative persistence of each of the four elements, we normalize each element of the impulse-response function by the corresponding element in
so that all elements of the impulse response are normalized to one at the instantaneous level
. The 16 stocks generate very similar persistence patterns. In particular, the arrival rate of uninformed trades (dotted line) is much more persistent than the arrival rate of informed trades (solid line), with one exception on AOL (the fifth panel). The persistence of cross-impacts falls between the two direct impacts.
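The normalization described above can be sketched by assuming that the response at horizon k equals the k-th power of the impact multiplier times the instantaneous impact matrix, divided element by element by the instantaneous impact. That functional form, the function names, and the matrices below are assumptions for illustration, not the paper's Equation (8) or its estimates.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def normalized_impulse_response(multiplier, impact, horizons):
    """Element-wise ratio of multiplier**k @ impact to impact, for k = 0..horizons.

    Requires every element of impact to be nonzero.
    """
    response = [row[:] for row in impact]  # horizon 0: the instantaneous impact
    path = []
    for _ in range(horizons + 1):
        path.append([[response[i][j] / impact[i][j] for j in range(2)] for i in range(2)])
        response = matmul2(multiplier, response)
    return path

# Hypothetical matrices, chosen only for illustration.
multiplier = [[0.70, 0.02], [-0.10, 0.97]]
impact = [[0.30, 0.02], [0.10, 0.15]]
for k, step in enumerate(normalized_impulse_response(multiplier, impact, 3)):
    print(k, step)
```

By construction every element starts at one, so the paths can be compared directly for their rates of decay.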
This persistent behavior of informed and uninformed trades is not unexpected given that many studies have shown volume to be significantly and positively autocorrelated. But this result is at variance with the predictions of microstructure models in which trades are viewed as iid. Perhaps more importantly, the result reveals that trade patterns are predictable across trading days.
3.4 Robustness of Arrival Rate Dynamics with Respect to Model Perturbations
We have also estimated the model with a generalized autoregressive process on the logarithm of the arrival rates instead of the arrival rates themselves. This specification is analogous to the EGARCH model of Nelson (1991). The maximized log-likelihood values from the two models are very close to one another, with neither model consistently dominating the other across all stocks. More importantly, parameter estimates from both models imply similar dynamic behavior for the informed and uninformed arrivals, which shows the robustness of the results.2 For both models, uninformed trades tend to be highly persistent. Uninformed order arrivals clump together, with high-volume days more likely to follow high-volume days, and conversely. However, an increase in the forecast of the informed arrival rate leads to a decline in future forecasts of the uninformed arrival rate. The informed arrival rates also exhibit complex patterns, but the forecast of the informed arrival rate depends little on past forecasts of the arrival rates of uninformed trades.
| 4 FORECASTING MARKET LIQUIDITY AND DEPTH |
|---|
|
|
|---|
In addition to providing insights into how informed and uninformed traders interact dynamically, the estimation of our dynamic model also generates direct forecasts of the arrival rates of informed and uninformed trades. These forecasts are informative in predicting market liquidity and market depth. Thus, they are useful not only for academics seeking to better understand market microstructure, but also for practitioners seeking to better position their trades and for risk managers seeking to measure the risks of illiquidity.
We also use our dynamic model to generate a time series of the PIN. This variable has been used in many studies to provide insight into microstructure questions, such as the determinants of bid-ask spreads, and into asset pricing questions, such as the determinants of the cost of capital. But all prior work using PIN required the assumption that it was constant over a substantial period of time, so PIN could not be used to provide insight into short-term, transitory changes in information-based trading. Here we show how to use the time series of PINs produced by our dynamic model to investigate the effects of earnings announcements on the variation in information-based trading.
4.1 Market Liquidity and Bid-Ask Spread
Market liquidity is often measured by the bid-ask spread: markets in which the bid-ask spread is small are interpreted as liquid markets. Our model links bid-ask spreads directly to the trade sequence and the arrival rates of informed and uninformed trades. By forecasting the arrival rates, we can predict the dynamics of bid-ask spreads.
We start by analyzing the bid quote in response to a sell order. Under our model, an application of Bayes' rule shows that the probabilities of a good and a bad information event conditional on a sell order at time t are given by, respectively,
|
| (9) |
|
| (10) |
Now, we consider the ask price for a buy order. Again, we can apply Bayes' rule to derive the probabilities of a good and a bad information event conditional on a buy order,
|
| (11) |
|
| (12) |
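The belief updates behind Equations (9) to (12) can be sketched as follows. The sketch assumes the standard sequential trade setup in which uninformed buys and uninformed sells each arrive at rate `eps`, while informed traders arrive at rate `mu` and buy only on good news and sell only on bad news. The function name and argument names are hypothetical, and the exact expressions in the paper's equations may be arranged differently.

```python
def update_beliefs(p_good, p_bad, mu, eps, side):
    """Posterior (p_good, p_bad) after observing one trade.

    p_good, p_bad: prior probabilities of a good and a bad information event.
    mu, eps: assumed arrival rates of informed trades and of uninformed
             trades on each side of the market.
    side: "buy" or "sell".
    """
    if side == "buy":
        # A buy is more likely when the news is good, so p_good rises.
        denom = eps + mu * p_good
        return p_good * (eps + mu) / denom, p_bad * eps / denom
    if side == "sell":
        # A sell is more likely when the news is bad, so p_bad rises.
        denom = eps + mu * p_bad
        return p_good * eps / denom, p_bad * (eps + mu) / denom
    raise ValueError("side must be 'buy' or 'sell'")
```

The probability of no information event is one minus the two returned values, so the three posteriors always sum to one.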
For illustration, we consider the special case at the opening of each day t. We start the day with the unconditional probabilities of good and bad information events,
|
| (13) |
|
| (14) |
|
| (15) |
Our dynamic model provides conditional expectations of the arrival rates of informed and uninformed trades. We use the arrival rate forecasts to compute forecasts of the probability of informed trades, PIN. This conditional PIN is interpreted as the forecast of the probability that a trade on the next day will be from an informed agent. Then, we use these conditional PINs to predict market liquidity, exemplified by the opening bid-ask spread, using (14). The summary statistics for the PIN forecasts are reported in Table 4.
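A minimal sketch of this step is given below. It assumes the standard definition of PIN as the expected share of informed trades among all trades, and it covers only the special case in which good and bad news are equally likely, where the opening spread reduces to PIN times the range of possible asset values. The function names and the `value_range` argument are hypothetical; the general case requires the fuller expression in (14).

```python
def pin_forecast(informed_rate, uninformed_rate):
    """Forecast PIN from the forecast arrival rates.

    informed_rate: forecast expected number of informed trades per day.
    uninformed_rate: forecast expected number of uninformed trades per day
                     (buys plus sells).
    """
    return informed_rate / (informed_rate + uninformed_rate)

def opening_spread_forecast(informed_rate, uninformed_rate, value_range):
    """Opening spread under the equal-probability assumption stated above;
    value_range is the gap between the high and low asset values."""
    return pin_forecast(informed_rate, uninformed_rate) * value_range
```

For example, forecast rates of 20 informed and 80 uninformed trades imply a PIN forecast of 0.2.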
Figure 2 plots the time series of the PIN forecasts for each stock. For ease of comparison, we apply the same scale for all panels. We observe an obvious decline in the PIN forecasts over time for several stocks, especially during the last several years of our sample.
A new generation of asset pricing theories ascribes a role to liquidity. Easley, Hvidkjaer, and O'Hara (2002
From an asset pricing point of view, the covariance of illiquidity across assets is also of importance. Just as with the risk of return, diversification can reduce the risk that an investor must sell when an asset is particularly illiquid. Hence, the strength of the correlation matters; see, for example, Hasbrouck and Seppi (2001) and Chordia, Roll, and Subrahmanyam (2000). It is clear from Figure 2 that PIN moves similarly across assets. Table 5 reports the cross-correlation estimates between the PIN time series on different stocks. The correlations are estimated using the common sample of the two stocks involved. The estimates differ greatly across different stock pairs, ranging from
to
. Based on the common sample of 14 stocks,3 we perform principal component analysis and plot the normalized eigenvalues of each principal component in Figure 3. The plots show that one principal component explains 37% of the daily variation in the 14 PIN series. This estimate suggests that there is a systematic liquidity factor underlying the stocks in our sample. While diversification can remove the idiosyncratic component of liquidity risk, the systematic liquidity risk in each stock should be priced.
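The normalized eigenvalues can be computed along the following lines. This is a sketch under the assumption that the principal components are taken from the correlation matrix of the PIN series on their common sample; the paper does not state here whether the correlation or the covariance matrix is used. The function name is hypothetical and the panel is simulated data, not the PIN forecasts themselves.

```python
import numpy as np

def explained_variance_shares(pin_panel):
    """Share of total variation explained by each principal component.

    pin_panel: array of shape (days, stocks) observed on a common sample.
    Returns the eigenvalues of the correlation matrix, sorted from largest
    to smallest and divided by their sum.
    """
    corr = np.corrcoef(pin_panel, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues / eigenvalues.sum()

# Simulated panel: 14 series that share one common factor plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
panel = 0.6 * common + rng.normal(size=(500, 14))
shares = explained_variance_shares(panel)
```

The first element of `shares` plays the role of the 37% figure quoted above: it is the fraction of daily variation attributable to the leading component.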
To examine how informative the arrival rate forecasts are in predicting the opening bid-ask spread, we run the following forecasting regression on each stock:
|
| (16) |
|
| (17) |
Since the estimate for
is not exactly at
for most stocks, in theory we should use a more complicated function of arrival rates as in (14) rather than PIN. Nevertheless, we use PIN for its simplicity and its intuitive interpretation as a measure of expected trade composition. Furthermore, several studies have generated PIN estimates from the static model (based on either a rolling or a nonoverlapping window) and explored their implications. Using PIN from our dynamic model provides a comparison with these studies.
We estimate the regressions using the generalized method of moments, with the weighting matrix calculated according to Newey and West (1987) with 30 lags. Table 6 reports the slope estimates, their standard errors (in parentheses), and the R-squares of the regressions in (16). The forecasting performance of the PIN forecasts is quite remarkable. The estimates for the
coefficient, which captures the impact of the probability of informed trades, are significantly positive for all but two stocks. The sample average of
over the 16 stocks is 0.253, with an average standard deviation of 0.105. The strong statistical significance of the coefficient estimates is remarkable given that the arrival rate forecasts are obtained purely from trade quantities, while the opening bid-ask spread is a price behavior.
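For readers who want to reproduce this kind of estimate, the sketch below regresses a spread series on a constant and a single PIN forecast and computes Newey-West standard errors with Bartlett weights. For a just-identified linear regression, the GMM point estimates coincide with ordinary least squares, which is what the code uses. The single-regressor form is an assumption, since the exact specification in (16) may include other terms, and the data are simulated rather than taken from the sample.

```python
import numpy as np

def ols_newey_west(y, x, lags):
    """OLS of y on a constant and x, with Newey-West standard errors.

    Returns (coefficients, standard errors, R-squared).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    g = X * resid[:, None]                  # moment contributions x_t * u_t
    s = g.T @ g                             # lag-zero term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)   # Bartlett kernel weight
        cross = g[lag:].T @ g[:-lag]        # sum over t of g_t g_{t-lag}'
        s += weight * (cross + cross.T)
    cov = xtx_inv @ s @ xtx_inv             # sandwich covariance estimate
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, np.sqrt(np.diag(cov)), r2

# Simulated data with a true slope of 0.25, for illustration only.
rng = np.random.default_rng(1)
pin = rng.uniform(0.1, 0.4, size=2000)
spread = 0.10 + 0.25 * pin + rng.normal(0.0, 0.05, size=2000)
beta, se, r2 = ols_newey_west(spread, pin, lags=30)
```

The Bartlett weights keep the estimated covariance matrix positive semidefinite, so the standard errors are always well defined.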
The
It is important to note that our arrival rate forecasts can be used to forecast the bid-ask spreads under any given trade sequence. Here, we use the specific regressions on the opening bid-ask spreads to illustrate their forecasting power and potential usefulness in forecasting the time variation in market liquidity.
4.2 Market Depth and Price Impacts of Trade Orders
When a portfolio manager tries to purchase or liquidate a large position by sending consecutive buy or sell orders to the market, the price change induced by this series of orders could be significant. Using our dynamic microstructure framework, we can compute the price impact of this sequence of orders as a function of the arrival rates of informed and uninformed trades. Since we have forecasts of the arrival rates, our dynamic model can also be used to forecast the market depth and the potential cost of loading or unloading a position.
We use a sequence of
consecutive buy orders as an example. Let
and
denote the probabilities of a good and a bad information event conditional on N–1 consecutive buy orders. From (12), we can derive the price impact of N consecutive buys as
|
|
|
| (18) |
, and the price converges to the expected upper bound of the asset value
To illustrate how the arrival rate forecasts impact the market depth, we use the first stock of our sample, Ashland Oil, as an example and consider three dates in our sample period when the PIN forecasts on Ashland are at the sample minimum, median, and maximum, respectively. At each of the three PIN levels, we use the estimated model parameters on Ashland Oil and the arrival rate forecasts for that date to compute the price impacts of N consecutive buy orders (
) according to Equation (18) and then normalize the impacts by their convergent value
. Figure 4 plots the three normalized price impact curves (
) as a function of the number of consecutive buy orders (N) at the three selected PIN levels for Ashland.
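A normalized price impact curve of this kind can be sketched by applying the buy-order belief update repeatedly. The sketch assumes the standard sequential trade setup (uninformed buys arrive at rate `eps`, informed traders arrive at rate `mu` and buy only on good news), measures the price impact as the change in the expected asset value after N buys, and rescales values so that the low value is 0, the high value is 1, and the no-news value is 1 - delta. The function name and all parameter values are hypothetical, not the Ashland Oil estimates.

```python
def normalized_buy_impact(alpha, delta, mu, eps, n_max):
    """Normalized price impact after 0, 1, ..., n_max consecutive buys.

    alpha: probability of an information event; delta: probability that
    the event is bad news. Returns a list that starts at 0 and approaches 1.
    """
    p_good = alpha * (1.0 - delta)
    p_bad = alpha * delta
    v_no_news = 1.0 - delta  # value on a no-news day, with V_low = 0 and V_high = 1

    def expected_value(pg, pb):
        return pg * 1.0 + pb * 0.0 + (1.0 - pg - pb) * v_no_news

    base = expected_value(p_good, p_bad)
    limit = 1.0 - base       # impact once the price reaches the upper bound
    curve = [0.0]
    for _ in range(n_max):
        denom = eps + mu * p_good
        # Each buy shifts probability toward the good-news state.
        p_good, p_bad = p_good * (eps + mu) / denom, p_bad * eps / denom
        curve.append((expected_value(p_good, p_bad) - base) / limit)
    return curve

low_pin = normalized_buy_impact(alpha=0.4, delta=0.5, mu=0.5, eps=1.0, n_max=100)
high_pin = normalized_buy_impact(alpha=0.4, delta=0.5, mu=2.0, eps=1.0, n_max=100)
```

In this sketch a higher informed arrival rate relative to the uninformed rate makes each buy more informative, so the `high_pin` curve rises toward one faster than the `low_pin` curve.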
All three normalized curves start at zero with zero trades and converge to one as the stock price converges to its upper bound












