Journal of Financial Econometrics Advance Access originally published online on November 28, 2007
Journal of Financial Econometrics 2008 6(1):87-107; doi:10.1093/jjfinec/nbm019
Nonparametric Estimation of Expected Shortfall
Iowa State University
Address for correspondence: Department of Statistics, Iowa State University, Ames, IA 50011-1210, email: songchen{at}iastate.edu
| Abstract |
|---|
The expected shortfall is an increasingly popular risk measure in financial risk management and it possesses the desired sub-additivity property, which is lacking for the value at risk (VaR). We consider two nonparametric expected shortfall estimators for dependent financial losses. One is a sample average of excessive losses larger than a VaR. The other is a kernel smoothed version of the first estimator (Scaillet, 2004, Mathematical Finance), proposed in the hope that smoothing would yield more accurate estimation. Our analysis reveals that the extra kernel smoothing does not produce more accurate estimation of the shortfall. This is different from the estimation of the VaR, where smoothing has been shown to reduce both the variance and the mean square error of estimation. Therefore, the simpler ES estimator based on the sample average of excessive losses is attractive for the shortfall estimation.
KEYWORDS: expected shortfall, kernel estimator, risk measures, value at risk, weakly dependent
| Introduction |
|---|
The expected shortfall (ES) and the value at risk (VaR) are popular measures of financial risks for an asset or a portfolio of assets. Artzner, Delbaen, Eber, and Heath (1999
Let $\{X_t\}_{t=1}^n$ be the market values of an asset or a portfolio of assets over $n$ periods of a time unit. Let $Y_t = -\log(X_t/X_{t-1})$ be the negative log return (log loss) over the $t$th period. Suppose $\{Y_t\}_{t=1}^n$ is a stationary process with the stationary distribution function $F$. Given a positive value $p$ close to zero, the VaR at a confidence level $1 - p$ is

$$\nu_p = \inf\{u : F(u) \ge 1 - p\}, \tag{1}$$

so that the probability of a loss larger than $\nu_p$ is less than $p$. See Duffie and Pan (1997).

The ES associated with a confidence level $1 - p$, denoted as $\mu_p$, is the conditional expectation of a loss given that the loss is larger than $\nu_p$, that is,

$$\mu_p = E(Y_t \mid Y_t > \nu_p). \tag{2}$$
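As a small numerical illustration of definitions (1) and (2), the sketch below evaluates the VaR and ES of a normally distributed loss, for which both quantities have closed forms. The normal loss distribution and the function name `normal_var_es` are assumptions made only for this example; the paper itself does not assume a parametric loss model.

```python
from statistics import NormalDist

def normal_var_es(p, mean=0.0, sd=1.0):
    """VaR (nu_p) and ES (mu_p) at confidence level 1 - p for N(mean, sd^2) losses."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)            # (1 - p)th quantile of N(0, 1)
    var = mean + sd * z                 # nu_p = inf{u : F(u) >= 1 - p}
    es = mean + sd * std.pdf(z) / p     # mu_p = E(Y | Y > nu_p) for a normal loss
    return var, es

var_05, es_05 = normal_var_es(0.05)
print(var_05, es_05)
```

The ES always exceeds the VaR at the same level, since it averages only the losses beyond the VaR.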
Estimation of the ES can be carried out by assuming a parametric loss distribution, which is the method commonly used in actuarial studies. Frey and McNeil (2002) propose a binomial mixture model approach to estimate ES and VaR for a large, balanced portfolio. The extreme-value theory approach (Embrechts, Kluppelberg, and Mikosch, 1997) can be viewed as a semiparametric approach, which uses the asymptotic distribution of exceedances over a high threshold to model the excessive losses and then carries out a parametric inference within the framework of the generalized Pareto distributions. Recently, Scaillet (2004) has proposed a nonparametric kernel estimator and applied it to sensitivity analysis in the context of portfolio allocation.
An advantage of the nonparametric method is that it is model-free and hence is model robust and avoids bias caused by using a mis-specified loss distribution. Financial risk management is primarily concerned with characteristics of the tail part of the loss distribution. However, data are generally sparse in the tail and hence finding a proper parametric loss model that is adequate for the tail part is not trivial. This is where the nonparametric method can play a significant role. Another advantage of the nonparametric approach is that it allows a wide range of data dependence, which makes it adaptable in the context of financial losses. The nonparametric estimators considered in this paper can accommodate data dependence explicitly since the effect of dependence on the variance of ES estimation can be clearly spelled out in the variance formula. This is different from the extreme-value approach as the latter effectively treats high exceedances as independent observations, which is true asymptotically under the so-called D and D' conditions (Leadbetter, Lindgren, and Rootzén, 1983). An empirical study by Bellini and Figá-Talamanca (2002), carrying out a nonparametric runs test, has shown that financial returns can exhibit strong tail dependence even for large threshold levels. This indicates the need for considering the dependence in financial returns directly, which is the approach taken by the nonparametric estimators considered in this paper.
In this paper, we evaluate two nonparametric ES estimators. One is based on a weighted sample average of excessive losses defined by a VaR estimator $\hat{\nu}_p$ based on an order statistic. The other is the kernel estimator proposed in Scaillet (2004), which employs kernel smoothing in both the initial VaR estimation and the final averaging of the excessive losses. It was hoped that the kernel smoothing would produce a more accurate estimator, as in the case of VaR estimation studied by Chen and Tang (2005).
A main finding of the current paper is that the variance and the mean square error of the kernel estimator proposed by Scaillet (2004) are not necessarily smaller than those of the sample weighted average estimator. This is because the second order variance term of the kernel ES estimator vanishes instead of taking a negative value as in the case of the VaR estimator. This indicates no meaningful variance reduction due to the kernel smoothing. As kernel smoothing introduces a bias, the lack of variance reduction makes the smoothing not worthwhile as the overall mean square error increases. Another finding is that the weighted average estimator has the same asymptotic variance as the kernel estimator. Therefore, for estimation of the ES, the sample weighted average of excessive losses is attractive as it is easy to compute as far as point estimation is concerned. This may be surprising considering that kernel smoothing leads to smaller variance in quantile estimation for both independent (Sheather and Marron, 1990) and dependent (Chen and Tang, 2005) observations. The underlying reason for these different effects of kernel smoothing is that the unconditional ES is effectively a mean parameter, which can be estimated accurately by simple averaging.
The paper is structured as follows. We introduce the two nonparametric ES estimators in Section 1. Their statistical properties are discussed in Section 2. Variance estimation for the purpose of supplying standard errors for the ES estimates is discussed in Section 3. Section 4 reports simulation results, which are followed by an empirical study on two financial series in Section 5. All the technical details are given in the appendix.
| 1 Nonparametric Estimators |
|---|
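The first nonparametric estimator of the ES considered in this paper is

$$\hat{\mu}_p = \frac{\sum_{t=1}^n Y_t I(Y_t \ge \hat{\nu}_p)}{\sum_{t=1}^n I(Y_t \ge \hat{\nu}_p)}, \tag{3}$$

where $I(\cdot)$ is the indicator function, $\hat{\nu}_p = Y_{([n(1-p)]+1)}$ is the sample VaR estimator of $\nu_p$ and $Y_{(r)}$ is the $r$th order statistic of $\{Y_t\}_{t=1}^n$.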
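The unsmoothed estimator amounts to sorting the losses, reading off an order statistic as the VaR estimate, and averaging the losses at or above it. A minimal sketch follows, assuming the VaR estimate is the order statistic of rank $[n(1-p)]+1$; the function names `sample_var` and `sample_es` are hypothetical.

```python
import math

def sample_var(losses, p):
    """Order-statistic VaR estimate: Y_(r) with r = [n(1 - p)] + 1 (1-based rank)."""
    ordered = sorted(losses)
    r = math.floor(len(ordered) * (1.0 - p)) + 1
    return ordered[min(r, len(ordered)) - 1]

def sample_es(losses, p):
    """Average of the losses that are at least as large as the sample VaR."""
    v = sample_var(losses, p)
    tail = [y for y in losses if y >= v]
    return sum(tail) / len(tail)

losses = [float(i) for i in range(1, 11)]   # toy data: 1, 2, ..., 10
print(sample_var(losses, 0.25), sample_es(losses, 0.25))
```

With the toy data and $p = 0.25$, the rank is $[7.5] + 1 = 8$, so the VaR estimate is the 8th smallest loss and the ES estimate averages the three largest losses.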
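The kernel estimator proposed by Scaillet (2004) is the following. Let $K$ be a kernel function, which is a symmetric probability density function, and $G(t) = \int_t^{\infty} K(u)\,du$ and $G_h(t) = G(t/h)$ where $h$ is a positive smoothing bandwidth. The kernel estimator of the survival function $S(x) = 1 - F(x)$ is

$$S_h(x) = n^{-1}\sum_{t=1}^n G_h(x - Y_t). \tag{4}$$

The kernel estimator of $\nu_p$, denoted as $\hat{\nu}_{p,h}$, is the solution of $S_h(x) = p$, and the kernel estimator of the ES is

$$\hat{\mu}_{p,h} = (np)^{-1}\sum_{t=1}^n Y_t G_h(\hat{\nu}_{p,h} - Y_t). \tag{5}$$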
Based on the improvement of the kernel VaR estimator $\hat{\nu}_{p,h}$ over $\hat{\nu}_p$, it is expected that the kernel ES estimator $\hat{\mu}_{p,h}$ would improve the estimation accuracy of the unsmoothed estimator $\hat{\mu}_p$. Confirming this or otherwise is the focus of the next section.
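The kernel estimator can be sketched in two steps: solve the smoothed survival function for the level $p$ to obtain the kernel VaR, then form the kernel-weighted average of the losses. The code below is an illustration under stated assumptions, not the implementation used in the paper: it takes $G(t)=\int_t^{\infty}K(u)\,du$, uses the biweight kernel on $[-1, 1]$ as one possible choice of $K$, and finds the root by bisection. All function names are hypothetical.

```python
def G(t):
    """G(t) = integral of the biweight kernel K(u) = (15/16)(1 - u^2)^2 over (t, 1]."""
    if t <= -1.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    return 0.5 - (15.0 / 16.0) * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)

def kernel_survival(x, losses, h):
    """Smoothed survival function: the average of G((x - Y_t) / h)."""
    return sum(G((x - y) / h) for y in losses) / len(losses)

def kernel_var(losses, p, h, tol=1e-10):
    """Solve kernel_survival(x) = p by bisection (the function is non-increasing in x)."""
    lo, hi = min(losses) - h, max(losses) + h   # survival is 1 at lo and 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_survival(mid, losses, h) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kernel_es(losses, p, h):
    """Kernel ES: (np)^{-1} times the sum of Y_t * G((nu_hat - Y_t) / h)."""
    v = kernel_var(losses, p, h)
    return sum(y * G((v - y) / h) for y in losses) / (len(losses) * p)

losses = [float(i) for i in range(1, 101)]      # toy data: 1, 2, ..., 100
print(kernel_var(losses, 0.05, 0.5), kernel_es(losses, 0.05, 0.5))
```

As $h$ shrinks, $G_h(x - Y_t)$ approaches the indicator of $Y_t > x$, so the kernel estimates approach their unsmoothed counterparts.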
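The commonly employed stochastic models in financial data modeling and risk assessment can generate data to which the proposed ES estimation may be applied. These models include the linear process $Y_t = \sum_{s=0}^{\infty} a_s \varepsilon_{t-s}$ with coefficients $\{a_s\}_{s=0}^{\infty}$; the Markov process $Y_t = m(Y_{t-1}) + \sigma(Y_{t-1})\varepsilon_t$, where $\{\varepsilon_t\}_{t=1}^T$ are independent and identically distributed random variables, and $m(\cdot)$ and $\sigma^2(\cdot)$ are respectively the conditional mean and volatility functions of $Y_t$ given $Y_{t-1}$; the GARCH model $Y_t = \sigma_t \varepsilon_t$ with $\sigma_t^2 = \alpha_0 + \sum_i \alpha_i Y_{t-i}^2 + \sum_j \beta_j \sigma_{t-j}^2$, where $\alpha_0$, $\alpha_i$, and $\beta_j$ are all positive parameters; as well as the continuous-time diffusion models and the stochastic volatility models.
| 2 Main Results |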
|---|
The properties of these two nonparametric ES estimators are evaluated in this section. We start with some conditions.
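Let $\mathcal{F}_k^l$ be the $\sigma$-algebra of events generated by $\{Y_t, k \le t \le l\}$ for $l > k$. The $\alpha$-mixing coefficient introduced by Rosenblatt (1956) is

$$\alpha(k) = \sup_{A \in \mathcal{F}_1^i,\ B \in \mathcal{F}_{i+k}^{\infty}} |P(AB) - P(A)P(B)|.$$

The series is said to be $\alpha$-mixing if $\lim_{k \to \infty} \alpha(k) = 0$. The dependence described by the $\alpha$-mixing is the weakest as it is implied by other types of mixing; see Doukhan (1994).
- (i) There exists a $\rho \in (0, 1)$ such that $\alpha(k) \le C\rho^k$ for all $k \ge 1$ and a positive constant $C$.
- (ii) The stationary distribution $F$ of the stationary process $\{Y_t\}$ is absolutely continuous with probability density $f$ which has continuous second derivatives in $\mathcal{B}(\nu_p)$, a neighborhood of $\nu_p$; for $k \ge 1$, $F_k$, the joint distribution functions of $(Y_1, Y_{k+1})$, have all their second partial derivatives bounded in $\mathcal{B}(\nu_p)$; $E(|Y_t|^{2+\delta}) \le C$ for some $\delta > 0$ and a positive constant $C$.
- (iii) $K$ is a symmetric probability density satisfying the moment conditions $\int_{-1}^{1} uK(u)\,du = 0$ and $\int_{-1}^{1} u^2K(u)\,du = \sigma_K^2 > 0$, and $K$ has a bounded and Lipschitz continuous derivative.
- (iv) $h$ satisfies $h \to 0$, $nh^{3-\beta} \to \infty$ for any $\beta > 0$ and $nh^4 \log^2(n) \to 0$ as $n \to \infty$.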
Condition (i) means that the time series is geometric $\alpha$-mixing, which is satisfied by many commonly used financial time series; some of them are listed at the end of the last section. For instance, Carrasco and Chen (2002) established the $\alpha$-mixing for the ARCH model, and Genon-Catalot, Jeantheau, and Larédo (2000) for diffusion models. Condition (ii) contains standard conditions, which require underlying smoothness for the marginal and pairwise joint densities together with finite moments for the absolute returns. Conditions (iii) and (iv) are extra ones required by the kernel estimator. While Condition (iii) has the usual requirements on the kernel, Condition (iv) specifies a range for the bandwidth which includes $O(n^{-1/3})$, the optimal order for VaR estimation. These conditions are comparable to conditions imposed by other authors.
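To see how the bandwidth range in Condition (iv) behaves for the order $n^{-1/3}$, the sketch below evaluates the two quantities in that condition along an increasing sequence of sample sizes. The constant `c`, the value `beta = 0.1`, and the function name are illustrative assumptions.

```python
import math

def bandwidth_diagnostics(n, beta=0.1, c=1.0):
    """Return h = c * n^(-1/3), n * h^(3 - beta) and n * h^4 * log(n)^2."""
    h = c * n ** (-1.0 / 3.0)
    diverging = n * h ** (3.0 - beta)          # should grow without bound
    vanishing = n * h**4 * math.log(n) ** 2    # should shrink to zero
    return h, diverging, vanishing

for n in (10**3, 10**6, 10**9):
    print(n, bandwidth_diagnostics(n))
```

With $h = n^{-1/3}$ the first quantity equals $n^{\beta/3}$ and the second equals $n^{-1/3}\log^2(n)$, so both limits hold, although the second converges slowly because of the logarithmic factor.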
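Let $\gamma(k) = \mathrm{Cov}\{(Y_1 - \nu_p)I(Y_1 \ge \nu_p), (Y_{k+1} - \nu_p)I(Y_{k+1} \ge \nu_p)\}$ for positive integers $k$ and

$$\sigma_0^2(p; n) = \mathrm{Var}\{(Y_1 - \nu_p)I(Y_1 \ge \nu_p)\} + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\gamma(k).$$

The quantity $\sigma_0^2(p; n)$ is finite for each $n$ and converges as $n \to \infty$.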
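We start with evaluating the unsmoothed estimator $\hat{\mu}_p$ to provide a point of reference for the kernel estimators. Derivation given in the appendix shows that under conditions (i) and (ii), and for an arbitrary positive $\kappa$,

$$\hat{\mu}_p = \mu_p + p^{-1}\Big\{n^{-1}\sum_{t=1}^n (Y_t - \nu_p)I(Y_t \ge \nu_p) - p(\mu_p - \nu_p)\Big\} + o_p(n^{-3/4+\kappa}). \tag{6}$$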
Theorem 1
Under conditions (i) and (ii), as $n \to \infty$,

$$\sqrt{n}\,p\,\sigma_0^{-1}(p; n)(\hat{\mu}_p - \mu_p) \xrightarrow{d} N(0, 1). \tag{7}$$

This theorem indicates that the asymptotic variance of $\hat{\mu}_p$ is $\sigma_0^2(p; n)/(np^2)$, which is the variance of $p^{-1}\{n^{-1}\sum_{t=1}^n (Y_t - \nu_p)I(Y_t \ge \nu_p) - p(\mu_p - \nu_p)\}$, the leading order term in expansion (6). The dependence in the original time series is reflected in the asymptotic variance through the covariances in $\sigma_0^2(p; n)$. This means that we need to accommodate the dependence in further statistical inference for the shortfall estimation; see Section 3 for estimation of the variance. We note also that the effective sample size for the ES estimation is $np^2$. As $p$ is small, ranging between 1% and 5% as commonly used in financial risk management, the ES estimator is subject to high volatility, which is a common challenge for statistical inference of risk measures.
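To make the role of the autocovariances concrete, the sketch below forms a plug-in version of $\sigma_0^2(p; n)$ from sample autocovariances of the excess series $(Y_t - v)I(Y_t \ge v)$, truncated at a user-chosen lag, and converts it into a standard error $\sigma_0/(p\sqrt{n})$. The truncation lag, the function names, and the flooring at zero are assumptions made for illustration; this is not the spectral method developed in Section 3.

```python
import math

def excess_series(losses, v):
    """Z_t = (Y_t - v) I(Y_t >= v) for a given VaR value v."""
    return [(y - v) if y >= v else 0.0 for y in losses]

def autocov(z, k):
    """Sample autocovariance of z at lag k (divisor n)."""
    n = len(z)
    zbar = sum(z) / n
    return sum((z[t] - zbar) * (z[t + k] - zbar) for t in range(n - k)) / n

def sigma0_sq(losses, v, max_lag):
    """gamma(0) + 2 * sum_{k=1}^{max_lag} (1 - k/n) * gamma(k), using sample values."""
    z = excess_series(losses, v)
    n = len(z)
    total = autocov(z, 0)
    for k in range(1, max_lag + 1):
        total += 2.0 * (1.0 - k / n) * autocov(z, k)
    return total

def es_std_error(losses, v, p, max_lag):
    """Standard error sigma_0 / (p * sqrt(n)); a negative plug-in value is floored at 0."""
    s2 = max(sigma0_sq(losses, v, max_lag), 0.0)
    return math.sqrt(s2) / (p * math.sqrt(len(losses)))

print(es_std_error([0.0, 0.0, 0.0, 4.0], 2.0, 0.25, 0))
```

Setting `max_lag = 0` ignores the dependence altogether; comparing the result against a positive lag gives a rough sense of how much the covariance terms contribute.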
The following theorem summarizes the properties of the kernel estimator (5).
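Theorem 2
Under conditions (i)–(iv), as $n \to \infty$,

$$\sqrt{n}\,p\,\sigma_0^{-1}(p; n)(\hat{\mu}_{p,h} - \mu_p) \xrightarrow{d} N(0, 1), \tag{8}$$

and furthermore,

$$\mathrm{Bias}(\hat{\mu}_{p,h}) = -\tfrac{1}{2}p^{-1}\sigma_K^2 f(\nu_p)h^2 + o(h^2), \tag{9}$$

$$\mathrm{Var}(\hat{\mu}_{p,h}) = p^{-2}n^{-1}\sigma_0^2(p; n) + o(n^{-1}h). \tag{10}$$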
By comparing with Theorem 1, it is found that the kernel estimator has the same asymptotic normal distribution as the unsmoothed sample estimator $\hat{\mu}_p$. This is similar to the corresponding results for VaR estimation as reported in Chen and Tang (2005). We also note that both $\hat{\mu}_p$ and $\hat{\mu}_{p,h}$ converge to $\mu_p$ at the rate of $\sqrt{n}$ or more precisely at the rate of $\sqrt{n}\,p$; whereas the VaR estimators $\hat{\nu}_p$ and $\hat{\nu}_{p,h}$ converge to $\nu_p$ at the rate of $\sqrt{n}$ or, more precisely, at the rate of $\sqrt{n}f(\nu_p)$ where $f$ is the probability density of $Y_t$.
The second part of the theorem conveys a story different from VaR estimation. First of all, unlike the VaR estimation, the kernel estimator does not offer a variance reduction at the second order of n–1h as the second order term vanishes. At the same time, the smoothing brings in a bias that leads to an overall increase in the mean square error. Therefore, for the purpose of estimating the ES, the kernel smoothing is counterproductive. The underlying reason is the fact that the ES is effectively a mean parameter, which can be estimated rather accurately without smoothing. The situation is similar to nonparametric estimation of the mean parameter, which can be estimated well by the sample mean.
It should be noted that our above conclusion is only applicable to point estimation of ES. For constructing confidence intervals and testing hypotheses on $\mu_p$ in the presence of data dependence, the kernel smoothing as shown in the next section will play a significant role in estimating $\sigma_0^2(p; n)$. For estimation of conditional ES (Scaillet, 2005), smoothing is needed due to the involvement of conditioning variables.
| 3 Standard Errors |
|---|
In this section we introduce a method of obtaining standard errors for the nonparametric ES estimates considered earlier. Although it has not been advised for point estimation of ES, smoothing is needed for variance estimation so as to supply standard errors for the ES estimates. A similar approach is used in Chen and Tang (2005).
Let $\phi(\omega)$ be the spectral density of $\{(Y_t - \nu_p)I(Y_t \ge \nu_p)\}$. From Brockwell and Davis (1991),

$$\lim_{n \to \infty}\sigma_0^2(p; n) = 2\pi\phi(0),$$

so that the asymptotic variance of the ES estimators is approximately $2\pi\phi(0)(np^2)^{-1}$. Hence, the key is estimating $\phi(0)$.
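Let $Z_t = (Y_t - \hat{\nu}_p)I(Y_t \ge \hat{\nu}_p)$ for $t = 1, \ldots, n$. We propose estimating $\phi(0)$ by smoothing a set of sample periodograms close to the zero frequency of $\{Z_t\}_{t=1}^n$. One may use $G_h(\hat{\nu}_p - Y_t)$ to replace $I(Y_t \ge \hat{\nu}_p)$ in order to reduce the variability in the estimation of $\phi(0)$. Let

$$I_n(\omega_j) = n^{-1}\Big|\sum_{t=1}^n Z_t \exp(-\mathrm{i}t\omega_j)\Big|^2 \tag{11}$$

be the periodogram of $\{Z_t\}_{t=1}^n$ at the frequency $\omega_j = 2\pi j/n \in [-\pi, \pi]$ for $j \in T = \{\pm 1, \ldots, \pm[n/2]\}$.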
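Let $W_j = \log\{I_n(\omega_j)/(2\pi)\} + 0.57721$ and $m(\omega) = \log\{\phi(\omega)\}$. Following the lines of Fan and Gijbels (1996) and Chen and Tang (2005), a Nadaraya–Watson kernel estimator of $m(0)$ based on a symmetric kernel $K_1$ and a smoothing bandwidth $b$ is

$$\hat{m}(0) = \frac{\sum_{j \in T} K_1(\omega_j/b)W_j}{\sum_{j \in T} K_1(\omega_j/b)}, \tag{12}$$

where $b \to 0$ and $nb \to \infty$ as $n \to \infty$. Then, an estimator of $\phi(0)$ is $\hat{\phi}(0) = \exp\{\hat{m}(0)\}$.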
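The standard-error recipe of this section can be sketched as follows: compute the periodogram of the excess series at the Fourier frequencies, smooth the bias-corrected log periodogram near frequency zero with a Nadaraya–Watson weight, exponentiate, and plug the result into $2\pi\phi(0)/(np^2)$. The biweight choice for $K_1$, the symbol `b` for the bandwidth, and all function names are assumptions for illustration; the bandwidth is supplied by the user rather than selected automatically.

```python
import cmath
import math

EULER_GAMMA = 0.57721  # the constant added to the log periodogram in W_j

def biweight(u):
    """Symmetric kernel K_1 supported on [-1, 1] (an illustrative choice)."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def periodogram(z, omega):
    """I_n(omega) = n^{-1} |sum_t Z_t exp(-i t omega)|^2 with t = 1, ..., n."""
    n = len(z)
    total = sum(z_t * cmath.exp(-1j * omega * (t + 1)) for t, z_t in enumerate(z))
    return abs(total) ** 2 / n

def spectral_density_at_zero(z, b, kernel=biweight):
    """exp of the kernel-weighted average of W_j over frequencies near zero.

    Only j > 0 is used: for a real series the periodogram is symmetric in omega,
    so the +j and -j terms contribute equally to numerator and denominator.
    math.log raises ValueError if a periodogram ordinate is exactly zero.
    """
    n = len(z)
    num = den = 0.0
    for j in range(1, n // 2 + 1):
        omega = 2.0 * math.pi * j / n
        weight = kernel(omega / b)
        if weight > 0.0:
            w_j = math.log(periodogram(z, omega) / (2.0 * math.pi)) + EULER_GAMMA
            num += weight * w_j
            den += weight
    if den == 0.0:
        raise ValueError("bandwidth b is too small: no frequency receives weight")
    return math.exp(num / den)

def es_standard_error(losses, var_estimate, p, b):
    """sqrt(2 * pi * phi_hat(0) / (n * p^2)) for the excess series at var_estimate."""
    z = [(y - var_estimate) if y >= var_estimate else 0.0 for y in losses]
    phi0 = spectral_density_at_zero(z, b)
    return math.sqrt(2.0 * math.pi * phi0 / len(losses)) / p

print(spectral_density_at_zero([1.0, 0.0, 0.0, 0.0], b=4.0))
```

Smoothing on the log scale keeps the estimate of $\phi(0)$ positive, which a direct truncated sum of sample autocovariances does not guarantee.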
An important issue here is the selection of the smoothing bandwidth. An objective function we may use in guiding the bandwidth selection is to minimize
|
| (13) |
[kn]) where kn is an n-dependent integer. We choose kn = [0.05n], which means that only the 10% sample periodograms close to the zero frequency are considered. It may be shown that an unbiased estimate of R(
) is
|
| (14) |
, the object function to be minimized for
selection is |
|
The proposed standard error estimation method with the proposed bandwidth selection will be applied in analyses of some financial data sets in Section 5.
| 4 Simulation Study |
|---|
In this section we report results from a simulation study which evaluates the performance of the nonparametric ES estimators. The main objective is to confirm our theoretical findings in the preceding section.
The models chosen for the log loss Yt in the simulation are
|
| (15) |
|
| (16) |
Figures 1 and 2 display the bias, variance, and mean square errors of $\hat{\mu}_{p,h}$ and the kernel VaR estimator $\hat{\nu}_{p,h}$ over a set of bandwidth values. For comparison, the figures also include the bias, variance, and mean square errors of the unsmoothed VaR estimator $\hat{\nu}_p$ and the estimator $\hat{\mu}_p$, respectively. Although the sample size considered in these figures is 250, the same pattern of results is observed for the sample size 500 as well. One feature that is worth noting from Figures 1 and 2 is that a large bandwidth increased both the variance and MSE of the kernel ES estimator. At the same time, the impact of a large bandwidth on the bias was quite limited as shown by the drop of the bias for large $h$. The main revelation of the simulation is that $\hat{\mu}_{p,h}$ has a larger variance and, to a large extent, a larger MSE than $\hat{\mu}_p$ for both models. In contrast, the kernel VaR estimator $\hat{\nu}_{p,h}$ delivers both variance and mean square error reduction as revealed in Chen and Tang (2005). This confirms that there is no need to smooth the data for ES estimation.
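A comparison of this kind can be reproduced in outline with a small Monte Carlo experiment. The sketch below is not the simulation design of the paper: it assumes a Gaussian AR(1) loss process with coefficient 0.5 (for which the true ES is available in closed form), a biweight kernel, and a single bandwidth, all chosen only for illustration. It reports the bias, variance, and MSE of the unsmoothed and kernel ES estimators.

```python
import math
import random
from statistics import NormalDist

def G(t):
    """Integral of the biweight kernel over (t, 1]."""
    if t <= -1.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    return 0.5 - (15.0 / 16.0) * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)

def sample_es(y, p):
    """Average of the order statistics from rank [n(1 - p)] + 1 upward."""
    ordered = sorted(y)
    r = math.floor(len(y) * (1.0 - p)) + 1
    tail = ordered[r - 1:]
    return sum(tail) / len(tail)

def kernel_es(y, p, h, tol=1e-8):
    """Kernel ES with the kernel VaR found by bisection on the smoothed survival function."""
    n = len(y)
    lo, hi = min(y) - h, max(y) + h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(G((mid - v) / h) for v in y) / n > p:
            lo = mid
        else:
            hi = mid
    v_hat = 0.5 * (lo + hi)
    return sum(v * G((v_hat - v) / h) for v in y) / (n * p)

def simulate_ar1(n, rho, rng, burn_in=100):
    """Gaussian AR(1) path Y_t = rho * Y_{t-1} + e_t after discarding a burn-in."""
    y, path = 0.0, []
    for t in range(n + burn_in):
        y = rho * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(y)
    return path

def compare(reps=200, n=250, p=0.05, h=0.3, rho=0.5, seed=1):
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(1.0 - rho**2)             # stationary standard deviation
    z = NormalDist().inv_cdf(1.0 - p)
    true_es = sd * NormalDist().pdf(z) / p         # ES of the stationary normal law
    draws = {"unsmoothed": [], "kernel": []}
    for _ in range(reps):
        y = simulate_ar1(n, rho, rng)
        draws["unsmoothed"].append(sample_es(y, p))
        draws["kernel"].append(kernel_es(y, p, h))
    summary = {}
    for name, est in draws.items():
        mean = sum(est) / reps
        var = sum((e - mean) ** 2 for e in est) / reps
        bias = mean - true_es
        summary[name] = {"bias": bias, "var": var, "mse": var + bias**2}
    return summary

if __name__ == "__main__":
    for name, stats in compare().items():
        print(name, stats)
```

Rerunning `compare` over a grid of `h` values gives a rough analogue of the bandwidth profiles described above; the outcome depends on the assumed model, bandwidth, and seed.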
| 5 Empirical Study |
|---|
We apply the proposed kernel estimator to estimate the ES of two financial time series. The two financial series are the CAC 40 and the Dow Jones series from October 1st 2001 to September 30th 2003, which consist of 500 observations (2 years' data). The log-return series are displayed in Figure 3 together with their sample autocorrelation functions (ACFs). To confirm the existence of dependence, we carry out the Box–Pierce test with the test statistic
|
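For reference, the Box–Pierce statistic in its standard form is $Q = n\sum_{k=1}^{m}\hat{\rho}_k^2$, where $\hat{\rho}_k$ is the lag-$k$ sample autocorrelation; under the null hypothesis of no serial correlation it is compared with a chi-square distribution with $m$ degrees of freedom. The sketch below computes $Q$ under that standard definition. The number of lags `m` and the function names are illustrative assumptions.

```python
def sample_autocorr(x, k):
    """Lag-k sample autocorrelation of the series x."""
    n = len(x)
    xbar = sum(x) / n
    c0 = sum((v - xbar) ** 2 for v in x) / n
    ck = sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / n
    return ck / c0

def box_pierce(x, m):
    """Box-Pierce statistic Q = n * sum of squared autocorrelations at lags 1..m."""
    n = len(x)
    return n * sum(sample_autocorr(x, k) ** 2 for k in range(1, m + 1))

print(box_pierce([1.0, 2.0, 3.0, 4.0], 1))
```

A large value of $Q$ relative to the chi-square reference distribution is evidence of serial dependence in the series.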
We carry out analysis over three periods on each series, which are the first year (2001–2002), the second year (2002–2003), and the entire two years (2001–2003), respectively. Table 1 presents the ES estimates
|
We then extend the analysis for 20 equally spaced levels of p ranging from 0.01 to 0.03. The kernel estimates of
|
| Appendix: Proofs |
|---|
Throughout this section we use C and Ci to denote generic positive constants. The proof of Theorems 1 and 2 requires the following lemmas.
Lemma 1
Under Condition (i),exponentially fast as n
![]()
.
Proof
We only give the proof foras that for
can be treated similarly,
(A.1) Let Xi = I(Yt <
p +
n) – F(
p +
n). Clearly E(Xi) = 0 and |Xi|
2. Choose q = b0n
n, p = n/(2q) and
. From an equality given in Yokoyama (1980
), u2(q)
Cp. Apply Theorem 1.3 in Bosq (1998
) for
-mixing sequences,
where
(A.2) 2(q) = 2p–2u2(q) +
n = C
n. It is obvious that
where C2>0. Since n
(A.3) 2n
![]()
means q
n
![]()
, the first term in (A.2) converges to zero exponentially fast. On the second term of (A.2), the geometric
-mixing implies that
which converges to zero exponentially fast too. This completes the proof of Lemma 1.
(A.4)
Lemma 2
Under the Conditions (i) and (ii) and for any>0,
Proof
Let. We first evaluate E(Wt). Note that
where
Furthermore, let
and
where, for a
(0, 1/2) and
>0,
Applying the Cauchy–Schwarz inequality, for k = 1 and 2,
Then Lemma 1 and the fact that
imply
(A.5) To evaluate
, we note that
This means
Using exactly the same approach we can show that
as well. These and (A.5) mean, by choosing a = –1/2 +
where
>0 is arbitrarily small,
for an arbitrarily small positive
(A.6) , which in turn implies
(A.7) We now consider Var(Wi). For a
(0, 1/2),
Note that
which converge to zero exponentially fast as implied by Lemma 1. Applying the Cauchy–Schwarz inequality, we have
converge to zero exponentially fast as well. Then, applying the same method that establishes (A.6), we have
In summary, we have E(W2t) = o(n–3/2+
). This and (A.6) mean Var(Wt) = o(n–3/2+
). By slightly modifying the above derivation for Var(Wt), it may be shown that for any t1, t2
. Therefore,
This together with (A.7) readily establishes the lemma.
(A.8)
Lemma 3
Letand
. Under the conditions (i)–(iv),
Proof
We only present the proof of (a) as the proofs for the others are similar. Define. Let
,
and
for some functions
j, j = 1, 2, and 3, such that E{
j(Yt)} = 0. For instance,
2(Yt) = Kh(
p – Yt) – E{Kh(
p – Yt)} and
3(Yt) = Gh(
p – Yt) – E{Gh(
p – Yt)}.
Using the approach in Billingsley (1968
, p. 173),
where [6] indicates all the six different permutations among the three indices. Let p = 2 +
(A.9) , q = 2 +
and s–1 = 1 – p–1 – q–1 for some positive
. From the Davydov inequality,
Since |
3(Yi+j)|
2 and E|
2(Yi)|2+
![]()
Ch–1–
,
This and the fact that ||
(Y1)||p = E1/p|
1(Y1)|p
C lead to
Similarly,
. Therefore,
From (A.9) and (A.10), and the fact that
(A.10) (k) is monotonic non-increasing,
since
j
/(2 +
)(j) <
as implied by Condition (i).
Lemma 4
Under the conditions (i)–(v) and for l1, l2 = 0 or 1,
Proof
The case of l1 = l2 = 0 has been proved in Chen and Tang (2005) and the proofs for the other cases are almost the same, and hence are not given here.
Proof of Theorem 1
Let1(t) = n–1
ni=1YtI(Yt
t) and
2(t) = n–1
ni=1I(Yt
t). Then,
. Note that E{
1(
p)} = pµp, E{
2(
p)} = p and
. From Lemma 2, for an arbitrarily small positive
,
These lead to
(A.11)
(A.12) We need to employ the blocking technique and Bradley's Lemma to establish the asymptotic normality. Write
where $T_{i,n} = \sigma_0^{-1}(p; n)p^{-1}\{(Y_i - \nu_p)I(Y_i \ge \nu_p) - p(\mu_p - \nu_p)\}$.
Let $k$ and $k'$ be respectively positive integers such that $k' \to \infty$, $k'/k \to 0$ and $k/n \to 0$ as $n \to \infty$. Let $r$ be a positive integer so that $r(k + k') \le n < r(k + k' + 1)$. Define the large blocks
the smaller blocks
and the residual block
n = Tr(k+k')+1,n + + Tn,n. Then
We note that E(Sn,2) = E(Sn,3) = 0 and as n
![]()
,
Therefore, for l = 2 and 3
(A.13) We are left to prove the asymptotic normality of Sn,1. From Bradley's lemma (see Bosq, 1998
), there exist independent and identically distributed random variables Wj,n such that each Wj,n is identically distributed as Vj,n and
(A.14) Let
n = Sn,1 – n–1/2
rj=1Wj,n. Then
By choosing r = na for a
(A.15) (0, 1) and k' = nc such that c
(0, 1 – a), we can show that the left-hand side of (A.15) converges to 0 as n
![]()
. Hence
Therefore, Sn,1 = n–1/2
(A.16) rj=1Wj,n + op(1).
By applying the inequality established in Yokoyama (1980
) and the construction of Wj,n, we have E(Wj,n)4 = E(V4j,n)
C1k2 and Var(Wj,n) = E(V2j,n)
C2k. Thus,
|
|
, which is the Liapounov condition for the central limit theorem of triangular arrays. Therefore,
|
| (A.17) |
Proof of Theorem 2
We first derive (9) and (10). From derivations given in Chen and Tang (2005),
admits an expansion:
From the bias of
given in Chen and Tang (2005
)
(A.18) Note that
(A.19)
(A.20) Let
= E{p–1YtKh(Yt –
p)} = p–1
(
p – hu)K(u)f(
p – hu)du = p–1
pf(
p) + O(h2). Using a standard derivations for
-mixing sequences, for instance those given in Bosq (1998
), we have
. Hence, from (A.18),
Combine (A.19), (A.20), and (A.21),
(A.21) which establishes the bias given in (9).
We now derive the variance of
. Let
be the leading order term of the expansion (A.19).
It is easy to see that
(A.22) Let cK =
–
uK(u)du
u–
K(v)dv. It may be shown that
(A.23) Equation (A.23) and Lemma 3 mean
(A.24) The second term on the right-hand side of (A.22) is
It may be shown by using the fact that
= p–1
pf(
p) + O(h2)
From the inequality given in Yokoyama (1980
(A.25) ) for
-mixing sequences,
Applying the Cauchy–Schwarz inequality and Lemma 3,
(A.26) Combine (A.25), (A.26), and (A.27),
(A.27)
(A.28) From Lemma 3, the covariance term on the right-hand side of (A.22) is
Since Cov{YtGh(
p – Yt), Gh(
p – Yt)} = p(1 – p)µp – 2
pf(
p)hcK + o(h),
Substituting (A.25), (A.28), and (A.29) to (A.22), we note that all the second order terms of O(n–1h) cancel out each other and therefore
(A.29) which establishes (10).
(A.30)
The asymptotic normality of
can be established from (A.19) by using the same blocking method as that in the proof of Theorem 1.
| Footnotes |
|---|
The author thanks the Editor-in-Chief Professor Eric Renault, an Associate Editor and two referees for valuable comments and suggestions which have improved the presentation of the paper. The author also thanks Cheng Yong Tang for valuable computational support and acknowledges support of a National Science Foundation Grant (DMS-0604563).
1 The sub-additivity of a risk measure means that the risk for the sum of two independent risky events is not greater than the sum of the risks of the two events.
2 Its statistical properties and how to obtain the standard errors are considered in Chen and Tang (2005). See also Cai (2002) and Fan and Gu (2003) for kernel conditional quantile estimation. A kernel estimator of conditional ES is proposed in Scaillet (2005). See Fan and Yao (2003) for other applications of the kernel method for nonlinear time series analysis.
Received July 12, 2007; revised April 16, 2007; accepted September 24, 2007
| References |
|---|
Artzner P., Delbaen F., Eber J-M., Heath D. "Coherent measures of risk." Mathematical Finance (1999) 9:203–228.
Bahadur R. R. "A note on quantiles in large samples." The Annals of Mathematical Statistics (1966) 37:577–580.
Bellini F., Figá-Talamanca G. "Detecting and modeling tail dependence." (2002) Manuscript and a paper presented in Quantitative Methods in Finance 2003 Conference, December 2003: Cairns, Australia.
Billingsley P. Convergence of Probability Measures (1968) New York: Wiley.
Bosq D. Nonparametric Statistics for Stochastic Processes. Lecture Notes in Statistics 110. (1998) Heidelberg: Springer.
Brockwell P. J., Davis R. A. Time Series: Theory and Methods (1991) New York: Springer.
Cai Z.-W. "Regression quantiles for time series data." Econometric Theory (2002) 18:169–192.
Carrasco M., Chen X. "Mixing and Moment Properties of Various GARCH and Stochastic Volatility Models." Econometric Theory (2002) 18:17–39.
Chen S. X., Tang C. Y. "Nonparametric inference of Value at Risk for dependent financial returns." Journal of Financial Econometrics (2005) 3:227–255.
Duffie D., Pan J. "An overview of value at risk." Journal of Derivatives (1997) 4:7–49.
Doukhan P. Mixing. Lecture Notes in Statistics (1994) 85. Heidelberg: Springer.
Embrechts P., Klueppelberg C., Mikosch T. Modeling Extremal Events for Insurance and Finance (1997) Berlin: Springer.
Fan J., Gijbels I. Local Polynomial Modeling and Its Applications (1996) London: Chapman and Hall.
Fan J., Gu J. "Semiparametric Estimation of Value-at-Risk." Econometrics Journal (2003) 6:261–290.
Fan J., Yao Q. Nonlinear Time Series: Nonparametric and Parametric Methods. (2003) New York: Springer.
Föllmer H., Schied A. "Convex measures of risk and trading constraints." Finance and Stochastics (2002) 6:429–447.
Frey R., McNeil A. J. "VaR and expected shortfall in portfolios of dependent credit risks: conceptual and practical insights." Journal of Banking and Finance (2002) 26:1317–1334.
Genon-Catalot V., Jeantheau T., Larédo C. "Stochastic Volatility Models as Hidden Markov Models and Statistical Applications." Bernoulli (2000) 6:1051–1079.
Gouriéroux C., Scaillet O., Laurent J. P. "Sensitivity analysis of Values at Risk." Journal of Empirical Finance (2000) 7:225–245.
Jorion P. Value at Risk (2001) 2nd ed. New York: McGraw-Hill.
Leadbetter M. R., Lindgren G., Rootzén H. Extremes and Related Properties of Random Sequences and Processes (1983) Berlin: Springer.
Rosenblatt M. "A central limit theorem and a strong mixing condition." Proceedings of the National Academy of Sciences, USA (1956) 42:43–47.
Scaillet O. "Nonparametric estimation and sensitivity analysis of expected shortfall." Mathematical Finance (2004) 14:115–129.
Scaillet O. "Nonparametric estimation of conditional expected shortfall." Insurance and Risk Management Journal (2005) 74:639–660.
Sheather S. J., Marron J. S. "Kernel quantile estimators." Journal of the American Statistical Association (1990) 85:410–416.
Yokoyama R. "Moment bounds for stationary mixing sequences." Probability Theory and Related Fields (1980) 52:45–87.