


Journal of Financial Econometrics Advance Access published online on November 28, 2007

Journal of Financial Econometrics, doi:10.1093/jjfinec/nbm019

© 2007 The Authors
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Nonparametric Estimation of Expected Shortfall

Song Xi Chen
     Iowa State University

Address for correspondence: Department of Statistics, Iowa State University, Ames, IA 50011-1210, email: songchen@iastate.edu


    Abstract
 Top
 Abstract
 Introduction
 1 Nonparametric Estimators
 2 Main Results
 3 Standard Errors
 4 Simulation Study
 5 Empirical Study
 Appendix: Proofs
 References
 Footnotes
 
The expected shortfall is an increasingly popular risk measure in financial risk management, and it possesses the desired sub-additivity property, which the value at risk (VaR) lacks. We consider two nonparametric expected shortfall (ES) estimators for dependent financial losses. One is a sample average of excessive losses larger than a VaR. The other is a kernel smoothed version of the first estimator (Scaillet, 2004, Mathematical Finance), proposed in the hope that smoothing would yield more accurate estimation. Our analysis reveals that the extra kernel smoothing does not produce more accurate estimation of the shortfall. This differs from the estimation of the VaR, where smoothing has been shown to reduce both the variance and the mean square error of estimation. Therefore, the simpler ES estimator based on the sample average of excessive losses is attractive for shortfall estimation.

KEYWORDS: expected shortfall, kernel estimator, risk measures, value at risk, weakly dependent


    Introduction
 
The expected shortfall (ES) and the value at risk (VaR) are popular measures of financial risks for an asset or a portfolio of assets. Artzner, Delbaen, Eber, and Heath (1999) show that VaR lacks the sub-additivity property.¹ Sub-additivity implies convexity of a risk measure, which is used in defining a coherent risk measure; see Artzner, Delbaen, Eber, and Heath (1999) for details. In contrast, ES is coherent (Föllmer and Schied, 2002) and has become a more attractive alternative in financial risk management.

Let $\{X_t\}_{t=1}^n$ be the market values of an asset or a portfolio of assets over n periods of a time unit. Let $Y_t = -\log(X_t/X_{t-1})$ be the negative log return (log loss) over the tth period. Suppose $\{Y_t\}_{t=1}^n$ is a stationary process with the stationary distribution function F. Given a positive value p close to zero, the VaR at a confidence level $1 - p$ is

$$\nu_p = \inf\{u : F(u) \ge 1 - p\}, \qquad (1)$$

which is the $(1 - p)$th quantile of the loss distribution F. The VaR specifies a level of excessive losses such that the probability of a loss larger than $\nu_p$ is no more than p. See Duffie and Pan (1997) and Jorion (2001) for the financial background, statistical inference, and applications of VaR. A major shortcoming of VaR, in addition to not being a coherent risk measure, is that it provides no information on the extent of excessive losses other than specifying a level that defines the excessive losses. In contrast, ES is a risk measure that is not only coherent but also informative on the extent of losses greater than $\nu_p$.

The ES associated with a confidence level $1 - p$, denoted as $\mu_p$, is the conditional expectation of a loss given that the loss is larger than $\nu_p$, that is,

$$\mu_p = E(Y_t \mid Y_t > \nu_p). \qquad (2)$$

Estimation of the ES can be carried out by assuming a parametric loss distribution, which is the method commonly used in actuarial studies. Frey and McNeil (2002) propose a binomial mixture model approach to estimate ES and VaR for a large, balanced portfolio. The extreme-value theory approach (Embrechts, Klüppelberg, and Mikosch, 1997) can be viewed as a semiparametric approach, which uses the asymptotic distribution of exceedances over a high threshold to model the excessive losses and then carries out parametric inference within the framework of the generalized Pareto distributions. Recently, Scaillet (2004) has proposed a nonparametric kernel estimator and applied it to sensitivity analysis in the context of portfolio allocation.

An advantage of the nonparametric method is that it is model-free, and hence robust to model choice, and avoids the bias caused by using a mis-specified loss distribution. Financial risk management is primarily concerned with characteristics of the tail part of the loss distribution. However, data are generally sparse in the tail, and hence finding a parametric loss model that is adequate for the tail part is not trivial. This is where the nonparametric method can play a significant role. Another advantage of the nonparametric approach is that it allows a wide range of data dependence, which makes it adaptable in the context of financial losses. The nonparametric estimators considered in this paper can accommodate data dependence explicitly, since the effect of dependence on the variance of ES estimation can be clearly spelled out in the variance formula. This is different from the extreme-value approach, as the latter effectively treats high exceedances as independent observations, which is true asymptotically under the so-called D and D' conditions (Leadbetter, Lindgren, and Rootzén, 1983). An empirical study by Bellini and Figá-Talamanca (2002), carrying out a nonparametric runs test, has shown that financial returns can exhibit strong tail dependence even for large threshold levels. This indicates the need for considering the dependence in financial returns directly, which is the approach taken by the nonparametric estimators considered in this paper.

In this paper, we evaluate two nonparametric ES estimators. One is based on a weighted sample average of excessive losses defined by a VaR estimator $\hat\nu_p$ based on an order statistic. The other is the kernel estimator proposed in Scaillet (2004), which employs kernel smoothing in both the initial VaR estimation and the final averaging of the excessive losses. It was hoped that the kernel smoothing would produce a more accurate estimator, as in the case of VaR estimation studied by Chen and Tang (2005).

A main finding of the current paper is that the variance and the mean square error of the kernel estimator proposed by Scaillet (2004) are not necessarily smaller than those of the sample weighted average estimator. This is because the second order variance term of the kernel ES estimator vanishes instead of taking a negative value as in the case of the VaR estimator. This indicates no meaningful variance reduction due to the kernel smoothing. As kernel smoothing introduces a bias, the lack of variance reduction makes the smoothing not worthwhile, as the overall mean square error increases. Another finding is that the weighted average estimator has the same asymptotic variance as the kernel estimator. Therefore, for estimation of the ES, the sample weighted average of excessive losses is attractive, as it is easy to compute as far as point estimation is concerned. This may be surprising considering that kernel smoothing leads to smaller variance in quantile estimation for both independent (Sheather and Marron, 1990) and dependent (Chen and Tang, 2005) observations. The underlying reason for these different effects of kernel smoothing is that the unconditional ES is effectively a mean parameter, which can be estimated accurately by simple averaging.

The paper is structured as follows. We introduce the two nonparametric ES estimators in Section 1. Their statistical properties are discussed in Section 2. Variance estimation for the purpose of supplying standard errors for the ES estimates is discussed in Section 3. Section 4 reports simulation results, which are followed by an empirical study on two financial series in Section 5. All the technical details are given in the appendix.


    1 Nonparametric Estimators
 
The first nonparametric estimator of the ES considered in this paper is

$$\hat\mu_p = \frac{\sum_{t=1}^n Y_t I(Y_t \ge \hat\nu_p)}{\sum_{t=1}^n I(Y_t \ge \hat\nu_p)}, \qquad (3)$$

which is a weighted average of excessive losses larger than $\hat\nu_p$, where I(·) is the indicator function, $\hat\nu_p = Y_{([n(1-p)]+1)}$ is the sample VaR (quantile) estimator of $\nu_p$ and $Y_{(r)}$ is the rth order statistic of $\{Y_t\}_{t=1}^n$.

The kernel estimator proposed by Scaillet (2004) is the following. Let K be a kernel function, which is a symmetric probability density function, and let $G(t) = \int_t^\infty K(u)\,du$ and $G_h(t) = G(t/h)$, where h is a positive smoothing bandwidth. The kernel estimator of the survival function $S(x) = 1 - F(x)$ is

$$S_h(z) = n^{-1}\sum_{t=1}^n G_h(z - Y_t). \qquad (4)$$

A kernel estimator of $\nu_p$, denoted as $\hat\nu_{p,h}$, is the solution of $S_h(z) = p$, as proposed by Gourieroux, Laurent, and Scaillet (2000).² By replacing the indicator function and $\hat\nu_p$ with the smoother $G_h$ and $\hat\nu_{p,h}$ respectively in Equation (3), Scaillet (2004) proposed the following kernel estimator

$$\hat\mu_{p,h} = (np)^{-1}\sum_{t=1}^n Y_t G_h(\hat\nu_{p,h} - Y_t). \qquad (5)$$

Based on the improvement of the kernel VaR estimator $\hat\nu_{p,h}$ over $\hat\nu_p$, it is expected that the kernel ES estimator $\hat\mu_{p,h}$ would improve the estimation accuracy of the unsmoothed estimator $\hat\mu_p$. Confirming this or otherwise is the focus of the next section.
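The two estimators of this section can be sketched in a few lines of code. The block below is only an illustration under stated assumptions: the sample VaR is taken as the order statistic of rank [n(1 – p)] + 1, the kernel K is Gaussian (so that G is the standard normal survival function), and the equation S_h(z) = p is solved by bisection. All function names are hypothetical and do not come from the paper.

```python
import math


def gaussian_survival(t):
    """G(t): integral of the standard normal density from t to infinity."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def sample_var(losses, p):
    """Order-statistic VaR; the rank [n(1-p)] + 1 is an assumption."""
    ordered = sorted(losses)
    n = len(ordered)
    rank = int(math.floor(n * (1.0 - p) + 1e-9)) + 1  # 1-based rank
    return ordered[min(rank, n) - 1]


def sample_es(losses, p):
    """Unsmoothed ES: average of the losses at or above the sample VaR."""
    var_hat = sample_var(losses, p)
    tail = [y for y in losses if y >= var_hat]
    return sum(tail) / len(tail)


def kernel_survival(z, losses, h):
    """S_h(z) = n^{-1} * sum_t G((z - Y_t) / h)."""
    return sum(gaussian_survival((z - y) / h) for y in losses) / len(losses)


def kernel_var(losses, p, h, tol=1e-10):
    """Solve S_h(z) = p by bisection; S_h is non-increasing in z."""
    lo, hi = min(losses) - 10.0 * h, max(losses) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_survival(mid, losses, h) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_es(losses, p, h):
    """Kernel ES: (np)^{-1} * sum_t Y_t * G((nu_hat_{p,h} - Y_t) / h)."""
    var_hat = kernel_var(losses, p, h)
    n = len(losses)
    total = sum(y * gaussian_survival((var_hat - y) / h) for y in losses)
    return total / (n * p)
```

For a list of log losses `y`, `sample_es(y, 0.01)` and `kernel_es(y, 0.01, h)` give the two 99% ES estimates; the bandwidth `h` has to be supplied by the user, and as h shrinks the kernel estimate approaches the unsmoothed one.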

The commonly employed stochastic models in financial data modeling and risk assessment can generate data to which the proposed ES estimation may be applied. These models include the linear process


Formula

with independent and identically distributed innovations $\{\xi_s\}_{s=0}^\infty$; the Markov process


Formula

where Formula are p-lagged values of $Y_t$, $\{\epsilon_t\}_{t=1}^T$ are independent and identically distributed random variables, and m(·) and $\sigma^2(\cdot)$ are respectively the conditional mean and volatility functions of $Y_t$ given Formula; the GARCH(p, q) model


Formula

where c, $\alpha_i$, and $\beta_j$ are all positive parameters; as well as continuous-time diffusion models and stochastic volatility models.
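As an illustration of the kind of dependent loss series these models generate, the sketch below simulates the simplest GARCH case, GARCH(1, 1), using the standard recursion in which the conditional variance is c plus alpha times the squared previous loss plus beta times the previous conditional variance. The parameter values, the Gaussian innovations, and the function name are assumptions made for the example only; they are not taken from the paper.

```python
import math
import random


def simulate_garch11(n, c=0.05, alpha=0.1, beta=0.85, burn_in=500, seed=None):
    """Simulate n losses from a GARCH(1, 1) model with N(0, 1) innovations.

    Parameter values are illustrative assumptions, not the paper's settings.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    rng = random.Random(seed)
    sigma2 = c / (1.0 - alpha - beta)  # start at the unconditional variance
    y = 0.0
    losses = []
    for t in range(n + burn_in):
        # Conditional variance depends on the previous loss and variance.
        sigma2 = c + alpha * y * y + beta * sigma2
        y = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in to approximate stationarity
            losses.append(y)
    return losses
```

A call such as `simulate_garch11(500, seed=1)` returns a list of 500 simulated log losses that can be passed to any ES estimator.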


    2 Main Results
 
The properties of these two nonparametric ES estimators are evaluated in this section. We start with some conditions.

Let $\mathcal{F}_k^l$ be the σ-algebra of events generated by $\{Y_t, k \le t \le l\}$ for $l > k$. The α-mixing coefficient introduced by Rosenblatt (1956) is

$$\alpha(k) = \sup_{A \in \mathcal{F}_1^i,\; B \in \mathcal{F}_{i+k}^\infty} |P(AB) - P(A)P(B)|.$$

The series is said to be α-mixing if $\lim_{k\to\infty}\alpha(k) = 0$. The dependence described by α-mixing is the weakest among the common mixing conditions, as it is implied by other types of mixing; see Doukhan (1994) for a comprehensive discussion. The following conditions are assumed in our study:

  (i) There exists a $\rho \in (0, 1)$ such that $\alpha(k) \le C\rho^k$ for all $k \ge 1$ and a positive constant C.
  (ii) The stationary distribution F of the stationary process $\{Y_t\}$ is absolutely continuous with probability density f, which has continuous second derivatives in a neighborhood of $\nu_p$; for $k \ge 1$, $F_k$, the joint distribution function of $(Y_1, Y_{k+1})$, has all its second partial derivatives bounded in this neighborhood; $E(|Y_t|^{2+\delta}) \le C$ for some $\delta > 0$ and a positive constant C.
  (iii) K is a symmetric probability density satisfying the moment conditions $\int_{-1}^{1} uK(u)\,du = 0$ and $\int_{-1}^{1} u^2K(u)\,du = \sigma_K^2 > 0$, and K has a bounded and Lipschitz continuous derivative.
  (iv) h satisfies $h \to 0$, $nh^{3-\beta} \to \infty$ for any $\beta > 0$ and $nh^4\log^2(n) \to 0$ as $n \to \infty$.

Condition (i) means that the time series is geometrically α-mixing, which is satisfied by many commonly used financial time series models; some of them are listed at the end of the last section. For instance, Carrasco and Chen (2002) established α-mixing for ARCH models, and Genon-Catalot, Jeantheau, and Larédo (2000) did so for diffusion models. Condition (ii) contains standard conditions, which require underlying smoothness for the marginal and pair-wise joint densities together with finite moments for the absolute returns. Conditions (iii) and (iv) are extra ones required by the kernel estimator. While Condition (iii) has the usual requirements on the kernel, Condition (iv) specifies a range for the bandwidth which includes $O(n^{-1/3})$, the optimal order for VaR estimation. These conditions are comparable to conditions imposed by other authors.

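Let $\gamma(k) = \mathrm{Cov}\{(Y_1 - \nu_p)I(Y_1 \ge \nu_p), (Y_{k+1} - \nu_p)I(Y_{k+1} \ge \nu_p)\}$ for positive integers k and

$$\sigma_0^2(p; n) = \mathrm{Var}\{(Y_1 - \nu_p)I(Y_1 \ge \nu_p)\} + 2\sum_{k=1}^{n-1}(1 - k/n)\gamma(k).$$

Assumption (i) and the Davydov inequality (see Bosq, 1998) imply that $\sigma_0^2(p; n)$ is finite for each n and converges as $n \to \infty$.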

We start by evaluating the unsmoothed estimator $\hat\mu_p$ to provide a point of reference for the kernel estimators. The derivation given in the appendix shows that under conditions (i) and (ii), and for an arbitrary positive κ,

$$\hat\mu_p = \mu_p + p^{-1}\Big\{n^{-1}\sum_{t=1}^n (Y_t - \nu_p)I(Y_t \ge \nu_p) - p(\mu_p - \nu_p)\Big\} + o_p(n^{-3/4+\kappa}). \qquad (6)$$

This is a Bahadur-type expansion (Bahadur, 1966), which leads to the following theorem regarding the asymptotic normality of $\hat\mu_p$.

Theorem 1
Under conditions (i) and (ii), as $n \to \infty$,

$$\sqrt{n}\,p\,\sigma_0^{-1}(p; n)(\hat\mu_p - \mu_p) \xrightarrow{d} N(0, 1). \qquad (7)$$

This theorem indicates that the asymptotic variance of $\hat\mu_p$ is $\sigma_0^2(p; n)/(np^2)$, which is the variance of $p^{-1}\{n^{-1}\sum_{t=1}^n (Y_t - \nu_p)I(Y_t \ge \nu_p) - p(\mu_p - \nu_p)\}$, the leading order term in expansion (6). The dependence in the original time series is reflected in the asymptotic variance through the covariances in $\sigma_0^2(p; n)$. This means that we need to accommodate the dependence in further statistical inference for the shortfall estimation; see Section 3 for estimation of the variance. We note also that the effective sample size for the ES estimation is $np^2$. As p is small, ranging between 1% and 5% as commonly used in financial risk management, the ES estimator is subject to high volatility, which is a common challenge for statistical inference of risk measures.

The following theorem summarizes the properties of the kernel estimator (5).

Theorem 2
Under conditions (i)–(iv), as $n \to \infty$,

$$\sqrt{n}\,p\,\sigma_0^{-1}(p; n)(\hat\mu_{p,h} - \mu_p) \xrightarrow{d} N(0, 1) \qquad (8)$$
and furthermore,


Formula 9

(9)


Formula 10

(10)

By comparing with Theorem 1, it is found that the kernel estimator has the same asymptotic normal distribution as the unsmoothed sample estimator $\hat\mu_p$. This is similar to the corresponding results for VaR estimation as reported in Chen and Tang (2005). We also note that both $\hat\mu_p$ and $\hat\mu_{p,h}$ converge to $\mu_p$ at the rate of Formula or more precisely at the rate of Formula; whereas the VaR estimators $\hat\nu_p$ and $\hat\nu_{p,h}$ converge to $\nu_p$ at the rate of Formula or, more precisely, at the rate of Formula where f is the probability density of $Y_t$.

The second part of the theorem tells a story different from that of VaR estimation. First of all, unlike in VaR estimation, the kernel estimator does not offer a variance reduction at the second order of $n^{-1}h$, as the second order term vanishes. At the same time, the smoothing brings in a bias that leads to an overall increase in the mean square error. Therefore, for the purpose of estimating the ES, the kernel smoothing is counterproductive. The underlying reason is that the ES is effectively a mean parameter, which can be estimated rather accurately without smoothing. The situation is similar to nonparametric estimation of the mean parameter, which can be estimated well by the sample mean.

It should be noted that the above conclusion applies only to point estimation of ES. For constructing confidence intervals and testing hypotheses on $\mu_p$ in the presence of data dependence, kernel smoothing, as shown in the next section, plays a significant role in estimating $\sigma_0^2(p; n)$. For estimation of conditional ES (Scaillet, 2005), smoothing is needed due to the involvement of conditioning variables.


    3 Standard Errors
 
In this section we introduce a method of obtaining standard errors for the nonparametric ES estimates considered earlier. Although smoothing is not advised for point estimation of ES, it is needed for variance estimation so as to supply standard errors for the ES estimates. A similar approach is used in Chen and Tang (2005) for obtaining standard errors for VaR estimates.

Let $\phi$ be the spectral density of $\{(Y_t - \nu_p)I(Y_t \ge \nu_p)\}$. From Brockwell and Davis (1991),

$$\sigma_0^2(p; n) \to 2\pi\phi(0) \quad \text{as } n \to \infty,$$

which means that the leading order term of $\mathrm{Var}(\hat\mu_p)$ is $2\pi\phi(0)(np^2)^{-1}$. Hence, the key is estimating $\phi(0)$.

Let $Z_t = (Y_t - \hat\nu_p)I(Y_t \ge \hat\nu_p)$ for t = 1, ..., n. We propose estimating $\phi(0)$ by smoothing a set of sample periodograms of $\{Z_t\}_{t=1}^n$ close to the zero frequency. One may use $G_h(\hat\nu_p - Y_t)$ to replace $I(Y_t \ge \hat\nu_p)$ in order to reduce the variability in the estimation of $\phi(0)$. Let

$$I_n(\omega_j) = n^{-1}\Big|\sum_{t=1}^n Z_t e^{-it\omega_j}\Big|^2 \qquad (11)$$

be the sample periodograms at frequencies $\omega_j = 2\pi j/n \in [-\pi, \pi]$ for $j \in T = \{\pm 1, \ldots, \pm[n/2]\}$.

Let $W_j = \log\{I_n(\omega_j)/(2\pi)\} + 0.57721$ and $m(\omega) = \log\{\phi(\omega)\}$. Following the lines of Fan and Gijbels (1996) and Chen and Tang (2005), a Nadaraya–Watson kernel estimator of m(0) based on a symmetric kernel $K_1$ and a smoothing bandwidth $\lambda$ is

$$\hat m(0) = \frac{\sum_{j \in T} K_1(\omega_j/\lambda)W_j}{\sum_{j \in T} K_1(\omega_j/\lambda)}, \qquad (12)$$

where $\lambda \to 0$ and $n\lambda \to \infty$ as $n \to \infty$. Then, an estimator of $\phi(0)$ is $\hat\phi(0) = \exp\{\hat m(0)\}$.

An important issue here is the selection of $\lambda$. An objective function we may use in guiding the bandwidth selection is to minimize


Formula 13

(13)
by defining weights $q_{nj} = I(|j| \le [k_n])$ where $k_n$ is an n-dependent integer. We choose $k_n = [0.05n]$, which means that only the 10% of sample periodograms closest to the zero frequency are considered. It may be shown that an unbiased estimate of $R(\lambda)$ is


Formula 14

(14)
Ignoring the term not involving $\lambda$, the objective function to be minimized for $\lambda$ selection is


Formula

The proposed standard error estimation method with the proposed bandwidth selection will be applied in analyses of some financial data sets in Section 5.
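The standard error recipe of this section (excess losses, periodogram near frequency zero, Nadaraya–Watson smoothing of the log periodogram, back-transformation) can be sketched as below. This is a minimal illustration under several assumptions that are not from the paper: the bandwidth `lam` is fixed by hand rather than chosen by the data-driven criterion above, $K_1$ is taken to be the Epanechnikov kernel, the excess losses are built from the order-statistic VaR of rank [n(1 – p)] + 1, and the function name is hypothetical.

```python
import cmath
import math
import random


def es_standard_error(losses, p, lam=0.3):
    """Spectral standard error sqrt(2*pi*phi_hat(0) / (n*p^2)) for the ES.

    lam is a hand-picked bandwidth (an assumption); K_1 is Epanechnikov.
    """
    n = len(losses)
    ordered = sorted(losses)
    rank = int(math.floor(n * (1.0 - p) + 1e-9)) + 1
    var_hat = ordered[min(rank, n) - 1]
    # Excess losses Z_t = (Y_t - nu_hat) * I(Y_t >= nu_hat).
    z = [(y - var_hat) if y >= var_hat else 0.0 for y in losses]

    num = den = 0.0
    # The periodogram and the kernel weights are symmetric in j, so the
    # positive frequencies give the same weighted average as both signs.
    for j in range(1, n // 2 + 1):
        omega = 2.0 * math.pi * j / n
        u = omega / lam
        if u >= 1.0:
            break  # outside the kernel support
        weight = 0.75 * (1.0 - u * u)
        dft = sum(z_t * cmath.exp(-1j * omega * t)
                  for t, z_t in enumerate(z, start=1))
        periodogram = abs(dft) ** 2 / n
        if periodogram <= 0.0:
            continue  # the log is undefined; skip degenerate ordinates
        w_j = math.log(periodogram / (2.0 * math.pi)) + 0.57721
        num += weight * w_j
        den += weight
    if den == 0.0:
        raise ValueError("no usable frequencies; increase lam or n")
    phi0_hat = math.exp(num / den)
    return math.sqrt(2.0 * math.pi * phi0_hat / (n * p * p))
```

For independent data the spectral density is flat, so the result should be roughly comparable to the naive standard error based on the sample variance of the excess losses; for dependent data the two can differ substantially, which is the reason for the spectral approach.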


    4 Simulation Study
 
In this section we report results from a simulation study which evaluates the performance of the nonparametric ES estimators. The main objective is to confirm our theoretical findings in Section 2.

The models chosen for the log loss Yt in the simulation are


Formula 15

(15)


Formula 16

(16)
We are interested in estimating $\mu_{0.01}$, the 99% ES. In constructing the kernel estimator, the Gaussian kernel $K(u) = (2\pi)^{-1/2}\exp(-u^2/2)$ is employed. The sample sizes considered in the simulation are 250 and 500. The number of simulations is 1000.

Figures 1 and 2 display the bias, variance, and mean square errors of the kernel ES estimator $\hat\mu_{p,h}$ and the kernel VaR estimator $\hat\nu_{p,h}$ over a set of bandwidth values. For comparison, the figures also include the corresponding quantities for the unsmoothed ES estimator $\hat\mu_p$ and the unsmoothed VaR estimator $\hat\nu_p$. Although the sample size considered in these figures is 250, the same pattern of results is observed for the sample size 500 as well. One feature worth noting from Figures 1 and 2 is that a large bandwidth increased both the variance and the MSE of the kernel ES estimator. At the same time, the impact of a large bandwidth on the bias was quite limited, as shown by the drop of the bias for large h. The main revelation of the simulation is that $\hat\mu_{p,h}$ has a larger variance and, to a large extent, a larger MSE than $\hat\mu_p$ for both models. In contrast, the kernel VaR estimator $\hat\nu_{p,h}$ delivers both variance and mean square error reduction, as revealed in Chen and Tang (2005). This confirms that there is no need to smooth the data for ES estimation.
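A small Monte Carlo experiment in the spirit of this comparison can be sketched as follows. It is not the paper's simulation: the AR model is assumed here to be a Gaussian AR(1) with coefficient 0.5 and unit innovation variance (a choice that happens to reproduce the true values 2.686 and 3.078 quoted in the caption of Figure 1), the bandwidth `h`, the number of replications, and all function names are assumptions, and the sample VaR uses the order statistic of rank [n(1 – p)] + 1.

```python
import math
import random
from statistics import NormalDist, pstdev


def G(t):
    """Standard normal survival function (Gaussian kernel)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def sample_es(y, p):
    ordered = sorted(y)
    rank = int(math.floor(len(y) * (1.0 - p) + 1e-9)) + 1
    var_hat = ordered[min(rank, len(y)) - 1]
    tail = [v for v in y if v >= var_hat]
    return sum(tail) / len(tail)


def kernel_es(y, p, h, tol=1e-8):
    n = len(y)
    lo, hi = min(y) - 10.0 * h, max(y) + 10.0 * h
    while hi - lo > tol:  # bisection for S_h(z) = p
        mid = 0.5 * (lo + hi)
        if sum(G((mid - v) / h) for v in y) / n > p:
            lo = mid
        else:
            hi = mid
    var_hat = 0.5 * (lo + hi)
    return sum(v * G((var_hat - v) / h) for v in y) / (n * p)


def simulate_ar1(n, rng, coef=0.5, burn_in=100):
    """Gaussian AR(1) losses; the coefficient is an assumption."""
    y, out = 0.0, []
    for t in range(n + burn_in):
        y = coef * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(y)
    return out


def true_ar1_risk(p, coef=0.5):
    """Closed-form VaR and ES of the Gaussian stationary distribution."""
    s = 1.0 / math.sqrt(1.0 - coef * coef)
    z = NormalDist().inv_cdf(1.0 - p)
    return s * z, s * NormalDist().pdf(z) / p


def monte_carlo(n=250, p=0.01, h=0.3, reps=200, seed=1):
    """Return {estimator: (SD, RMSE)} over simulated AR(1) samples."""
    rng = random.Random(seed)
    _, mu = true_ar1_risk(p)
    draws = {"sample": [], "kernel": []}
    for _ in range(reps):
        y = simulate_ar1(n, rng)
        draws["sample"].append(sample_es(y, p))
        draws["kernel"].append(kernel_es(y, p, h))
    return {name: (pstdev(v),
                   math.sqrt(sum((x - mu) ** 2 for x in v) / reps))
            for name, v in draws.items()}
```

Calling `monte_carlo()` for a grid of bandwidths gives SD and RMSE curves that can be compared with the pattern described above; the specific numbers depend on the assumed model, bandwidth, and seed.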


Figure 1. Simulated average standard deviation (SD) and root mean square error (RMSE) of the kernel 99% ES estimator $\hat\mu_{0.01,h}$ in panels (a) and (b) and the kernel 99% VaR estimator $\hat\nu_{0.01,h}$ in panels (c) and (d), together with their unsmoothed counterparts (legend "sample") $\hat\mu_{0.01}$ in (a) and (b) and $\hat\nu_{0.01}$ in (c) and (d), for the AR model with n = 250. Here $\mu_{0.01} = 3.078$ and $\nu_{0.01} = 2.686$.

 


Figure 2. Simulated average standard deviation (SD) and root mean square error (RMSE) of the kernel 99% ES estimator $\hat\mu_{0.01,h}$ in panels (a) and (b) and the kernel 99% VaR estimator $\hat\nu_{0.01,h}$ in panels (c) and (d), together with their unsmoothed counterparts (legend "sample") $\hat\mu_{0.01}$ in (a) and (b) and $\hat\nu_{0.01}$ in (c) and (d), for the ARCH model with n = 250. Here $\mu_{0.01} = 5.8042$ and $\nu_{0.01} = 5.6647$.

 

    5 Empirical Study
 
We apply the proposed kernel estimator to estimate the ES of two financial time series. The two financial series are the CAC 40 and the Dow Jones series from October 1, 2001 to September 30, 2003, which consist of 500 observations (2 years' data). The log-return series are displayed in Figure 3 together with their sample autocorrelation functions (ACFs). To confirm the existence of dependence, we carry out the Box–Pierce test with the test statistic $Q = n\sum_{k=1}^{m}\hat\rho_k^2$, where $\hat\rho_k$ is the sample autocorrelation for lag k and m is the number of lags included. The statistic Q takes the value 51.146 for the CAC 40 and 43.001 for the Dow Jones, which gives p-values of 0.0068 for the CAC 40 and 0.0455 for the Dow Jones, respectively. Therefore, the dependence is significant for both series at the 5% level of significance.
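The Box–Pierce check can be reproduced on any return series with a few lines of code. The sketch below computes the statistic as n times the sum of squared sample autocorrelations and its chi-square p-value; the number of lags is a user input (the value used for the figures quoted above is not recoverable from the text), the chi-square tail is evaluated with the classical finite-series formulas for integer degrees of freedom, and the function names are hypothetical.

```python
import math


def sample_autocorrelation(x, k):
    """Lag-k sample autocorrelation of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom


def box_pierce(x, max_lag):
    """Box-Pierce statistic: n times the sum of squared autocorrelations."""
    n = len(x)
    return n * sum(sample_autocorrelation(x, k) ** 2
                   for k in range(1, max_lag + 1))


def chi2_sf(x, df):
    """Upper tail probability of a chi-square with integer df."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    chi = math.sqrt(x)
    total = 0.0
    term = chi
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


def box_pierce_pvalue(x, max_lag):
    """p-value of the Box-Pierce test with max_lag degrees of freedom."""
    return chi2_sf(box_pierce(x, max_lag), max_lag)
```

A small p-value from `box_pierce_pvalue(returns, max_lag)` indicates serial correlation, which is the motivation for the dependence-adjusted standard errors of Section 3.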


Figure 3. The two financial return series in (a) and (c) and their sample autocorrelation functions (ACF) in (b) and (d).

 
We carry out the analysis over three periods for each series: the first year (2001–2002), the second year (2002–2003), and the entire two years (2001–2003). Table 1 presents the estimates of the 99% ES, $\mu_{0.01}$, and their standard errors. The standard errors were obtained by using the approach outlined in Section 3. The table also provides the kernel estimates for the 99% VaR. It is observed that for both indices the year 2001–2002 had the largest estimates of the ES and the VaR, and hence the highest risk, which reflected the high volatility after the setback in the ".com" business and the events of September 11. The level of risk settled down in the year 2002–2003. It is interesting to see that the CAC was more risky than the Dow Jones, as its estimates of the ES and the VaR were all larger than their Dow Jones counterparts. The variability of the ES estimate for the Dow was much higher than that for the CAC in the year 2001–2002, and the situation reversed in the second year, when the ES estimates of the CAC became more variable. We observed, as expected, that the variability of the ES estimates based on the entire 2-year observations was smaller than that of each individual year (Table 1).


Table 1. Estimates for $\nu_{0.01}$, $\mu_{0.01}$, and standard errors (SE).

 
We then extend the analysis to 20 equally spaced levels of p ranging from 0.01 to 0.03. The kernel estimates of $\mu_p$ and their 95% confidence bands are displayed in Figure 4. The confidence bands were constructed by adding and subtracting 1.96 times the standard errors. These plots show that, as expected, the ES estimate declined as p increased. For both indices, the year 2001–2002 experienced larger risk than the year 2002–2003. The plots reveal again that the CAC was more volatile than the Dow Jones, as its ES estimates were always larger than those of the Dow for each of the three time periods and at each fixed p level.


Figure 4. Expected shortfall estimates and their confidence bands for the two financial return series.

 

    Appendix: Proofs
 
Throughout this section we use C and $C_i$ to denote generic positive constants. The proofs of Theorems 1 and 2 require the following lemmas.

Lemma 1
Under Condition (i), Formula exponentially fast as $n \to \infty$.

Proof
We only give the proof for Formula as that for Formula can be treated similarly.

Let Formula . It is easily shown that


Formula A1

(A.1)

Let $X_t = I(Y_t < \nu_p + \epsilon_n) - F(\nu_p + \epsilon_n)$. Clearly $E(X_t) = 0$ and $|X_t| \le 2$. Choose $q = b_0n\epsilon_n$, $p = n/(2q)$ and Formula. From an inequality given in Yokoyama (1980), $u^2(q) \le Cp$. Applying Theorem 1.3 in Bosq (1998) for α-mixing sequences,


Formula A2

(A.2)
where $\sigma^2(q) = 2p^{-2}u^2(q) + \epsilon_n = C\epsilon_n$. It is obvious that


Formula A3

(A.3)
where $C_2 > 0$. Since $n\epsilon_n^2 \to \infty$ means $q\epsilon_n \to \infty$, the first term in (A.2) converges to zero exponentially fast. For the second term of (A.2), the geometric α-mixing implies that


Formula A4

(A.4)
which converges to zero exponentially fast too. This completes the proof of Lemma 1.    ∎

Lemma 2
Under Conditions (i) and (ii) and for any $\kappa > 0$,


Formula

Proof
Let Formula . We first evaluate E(Wt). Note that Formula where


Formula

Furthermore, let Formula and Formula where, for $a \in (0, 1/2)$ and $\eta > 0$,


Formula

Applying the Cauchy–Schwarz inequality, for k = 1 and 2,


Formula

Then Lemma 1 and the fact that Formula imply


Formula A5

(A.5)

To evaluate Formula , we note that Formula This means


Formula

Using exactly the same approach we can show that Formula as well. These and (A.5) mean, by choosing $a = -1/2 + \gamma$ where $\gamma > 0$ is arbitrarily small,


Formula A6

(A.6)
for an arbitrarily small positive {kappa}, which in turn implies


Formula A7

(A.7)

We now consider $\mathrm{Var}(W_t)$. For $a \in (0, 1/2)$,


Formula

Note that


Formula

which converge to zero exponentially fast as implied by Lemma 1. Applying the Cauchy–Schwarz inequality, we have


Formula

converges to zero exponentially fast as well. Then, applying the same method that establishes (A.6), we have


Formula

In summary, we have $E(W_t^2) = o(n^{-3/2+\kappa})$. This and (A.6) mean $\mathrm{Var}(W_t) = o(n^{-3/2+\kappa})$. By slightly modifying the above derivation for $\mathrm{Var}(W_t)$, it may be shown that for any $t_1$, $t_2$, Formula. Therefore,


Formula A8

(A.8)
This together with (A.7) readily establishes the lemma.    ∎

Lemma 3
Let Formula and Formula . Under the conditions (i)–(iv),


Formula

Proof
We only present the proof of (a) as the proofs for the others are similar. Define Formula. Let Formula, Formula and Formula for some functions $\psi_j$, j = 1, 2, and 3, such that $E\{\psi_j(Y_t)\} = 0$. For instance, $\psi_2(Y_t) = K_h(\nu_p - Y_t) - E\{K_h(\nu_p - Y_t)\}$ and $\psi_3(Y_t) = G_h(\nu_p - Y_t) - E\{G_h(\nu_p - Y_t)\}$.

Using the approach in Billingsley (1968, p. 173),


Formula A9

(A.9)
where [6] indicates all the six different permutations among the three indices. Let $p = 2 + \delta$, $q = 2 + \delta$ and $s^{-1} = 1 - p^{-1} - q^{-1}$ for some positive δ. From the Davydov inequality,


Formula

Since $|\psi_3(Y_{i+j})| \le 2$ and $E|\psi_2(Y_i)|^{2+\delta} \le Ch^{-1-\delta}$,


Formula

This and the fact that $\|\psi_1(Y_1)\|_p = E^{1/p}|\psi_1(Y_1)|^p \le C$ lead to


Formula

Similarly, Formula . Therefore,


Formula A10

(A.10)
From (A.9) and (A.10), and the fact that $\alpha(k)$ is monotonic non-increasing,


Formula

since $\sum_j \alpha^{\delta/(2+\delta)}(j) < \infty$ as implied by Condition (i).    ∎

Lemma 4
Under the conditions (i)–(iv) and for $l_1, l_2 = 0$ or 1,


Formula

Proof
The case of $l_1 = l_2 = 0$ has been proved in Chen and Tang (2005); the proofs for the other cases are almost the same and hence are not given here.    ∎

Proof of Theorem 1
Let $\phi_1(z) = n^{-1}\sum_{t=1}^n Y_tI(Y_t \ge z)$ and $\phi_2(z) = n^{-1}\sum_{t=1}^n I(Y_t \ge z)$. Then, $\hat\mu_p = \phi_1(\hat\nu_p)/\phi_2(\hat\nu_p)$. Note that $E\{\phi_1(\nu_p)\} = p\mu_p$, $E\{\phi_2(\nu_p)\} = p$ and Formula. From Lemma 2, for an arbitrarily small positive κ,


Formula A11

(A.11)
These lead to


Formula A12

(A.12)

We need to employ the blocking technique and Bradley's lemma to establish the asymptotic normality. Write Formula where $T_{i,n} = \sigma_0^{-1}(p; n)\,p^{-1}\{(Y_i - \nu_p)I(Y_i \ge \nu_p) - p(\mu_p - \nu_p)\}$.

Let k and k' be positive integers such that $k' \to \infty$, $k'/k \to 0$ and $k/n \to 0$ as $n \to \infty$. Let r be a positive integer so that $r(k + k') \le n < r(k + k' + 1)$. Define the large blocks


Formula

the smaller blocks


Formula

and the residual block $\delta_n = T_{r(k+k')+1,n} + \cdots + T_{n,n}$. Then


Formula

We note that $E(S_{n,2}) = E(S_{n,3}) = 0$ and as $n \to \infty$,


Formula

Therefore, for l = 2 and 3


Formula A13

(A.13)

We are left to prove the asymptotic normality of $S_{n,1}$. From Bradley's lemma (see Bosq, 1998), there exist independent and identically distributed random variables $W_{j,n}$ such that each $W_{j,n}$ is identically distributed as $V_{j,n}$ and


Formula A14

(A.14)

Let $\Delta_n = S_{n,1} - n^{-1/2}\sum_{j=1}^r W_{j,n}$. Then


Formula A15

(A.15)
By choosing $r = n^a$ for $a \in (0, 1)$ and $k' = n^c$ such that $c \in (0, 1 - a)$, we can show that the left-hand side of (A.15) converges to 0 as $n \to \infty$. Hence


Formula A16

(A.16)
Therefore, $S_{n,1} = n^{-1/2}\sum_{j=1}^r W_{j,n} + o_p(1)$.    ∎

By applying the inequality established in Yokoyama (1980) and the construction of $W_{j,n}$, we have $E(W_{j,n}^4) = E(V_{j,n}^4) \le C_1k^2$ and $\mathrm{Var}(W_{j,n}) = E(V_{j,n}^2) \le C_2k$. Thus,


Formula

as $n \to \infty$, which is the Liapounov condition for the central limit theorem of triangular arrays. Therefore,


Formula A17

(A.17)
Thus, the proof of the theorem is completed by combining (A.13), (A.16), (A.17) and the Slutsky theorem.

Proof of Theorem 2
We first derive (9) and (10). From derivations given in Chen and Tang (2005), $\hat\nu_{p,h}$ admits an expansion: Formula. From the bias of $\hat\nu_{p,h}$ given in Chen and Tang (2005),


Formula A18

(A.18)

The kernel ES estimator


Formula A19

(A.19)
Note that


Formula A20

(A.20)

Let $\eta = E\{p^{-1}Y_tK_h(Y_t - \nu_p)\} = p^{-1}\int(\nu_p - hu)K(u)f(\nu_p - hu)\,du = p^{-1}\nu_pf(\nu_p) + O(h^2)$. Using standard derivations for α-mixing sequences, for instance those given in Bosq (1998), we have Formula. Hence, from (A.18),


Formula A21

(A.21)
Combine (A.19), (A.20), and (A.21),


Formula

which establishes the bias given in (9).

We now derive the variance of $\hat\mu_{p,h}$. Let Formula be the leading order term of the expansion (A.19).

Then,


Formula A22

(A.22)
It is easy to see that


Formula

Let $c_K = \int_{-\infty}^{\infty} uK(u)\,du\int_u^{\infty} K(v)\,dv$. It may be shown that


Formula A23

(A.23)

Equation (A.23) and Lemma 3 mean


Formula A24

(A.24)

The second term on the right-hand side of (A.22) is


Formula

It may be shown, by using the fact that $\eta = p^{-1}\nu_pf(\nu_p) + O(h^2)$, that


Formula A25

(A.25)
From the inequality given in Yokoyama (1980) for α-mixing sequences,


Formula

Applying the Cauchy–Schwarz inequality and Lemma 3,


Formula A26

(A.26)


Formula A27

(A.27)
Combine (A.25), (A.26), and (A.27),


Formula A28

(A.28)

From Lemma 3, the covariance term on the right-hand side of (A.22) is


Formula

Since $\mathrm{Cov}\{Y_tG_h(\nu_p - Y_t), G_h(\nu_p - Y_t)\} = p(1 - p)\mu_p - 2\nu_pf(\nu_p)hc_K + o(h)$,


Formula A29

(A.29)
Substituting (A.25), (A.28), and (A.29) into (A.22), we note that all the second order terms of $O(n^{-1}h)$ cancel each other out and therefore