There are some difficulties common to all nonparametric bootstrap estimates of confidence intervals (CI), some that affect both the "empirical" (called "basic" in the boot.ci() function of the R boot package; Ref. 1) and the "percentile" CI estimates (as described in Ref. 2), and some that can be exacerbated with percentile CIs.
TL;DR: In some circumstances percentile bootstrap CI estimates might work adequately, but if certain assumptions do not hold then the percentile CI might be the worst choice, with the empirical/basic bootstrap the next worst. Other bootstrap CI estimates can be more reliable, with better coverage. All can be problematic. Always looking at diagnostic plots helps avoid the errors that can come from simply accepting the output of a software routine.
The bootstrap setup
Generally following Ref. 1, we have a sample of data y1, ..., yn drawn from independent and identically distributed random variables Yi that share a cumulative distribution function F. The empirical distribution function (EDF) constructed from the data sample is F^. We are interested in a characteristic θ of the population, estimated by a statistic T whose value in the sample is t. We would like to know how well T estimates θ, for example, the distribution of (T−θ).
Nonparametric bootstrap uses sampling from the EDF F^ to mimic sampling from F, taking R samples each of size n with replacement from the yi. Values calculated from the bootstrap samples are denoted with "*". For example, the statistic T calculated on bootstrap sample j provides a value T∗j.
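As a concrete illustration of this setup, here is a minimal sketch in R with the boot package; the choice of the median as T and of simulated exponential data as the sample are assumptions purely for illustration.

```r
## Minimal sketch of the setup above. The median as the statistic T and
## exponential data as the sample F are illustrative assumptions only.
library(boot)

set.seed(101)
y <- rexp(50)        # sample y1, ..., yn, here drawn from an exponential F
t_hat <- median(y)   # t: the value of T in the original sample

## boot() expects the statistic as a function of the data and an index vector
T_stat <- function(data, idx) median(data[idx])

b <- boot(y, T_stat, R = 2000)   # R bootstrap samples of size n, with replacement
head(b$t)                        # the T*_j values, one per bootstrap sample
```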
Empirical/basic versus percentile bootstrap CIs
The empirical/basic bootstrap uses the distribution of (T∗−t) among the R bootstrap samples from F^ to estimate the distribution of (T−θ) within the population described by F itself. Its CI estimates are thus based on the distribution of (T∗−t), where t is the value of the statistic in the original sample.
This approach is based on the fundamental principle of bootstrapping (Ref. 3):
The population is to the sample as the sample is to the bootstrap samples.
The percentile bootstrap instead uses quantiles of the T∗j values themselves to determine the CI. These estimates can be quite different if there is skew or bias in the distribution of (T−θ).
Say that there is an observed bias B such that:
T¯∗=t+B,
where T¯∗ is the mean of the T∗j over the bootstrap samples. Say the 5th and 95th percentiles of the T∗j are expressed as T¯∗−δ1 and T¯∗+δ2, where δ1 and δ2 are each positive and potentially different to allow for skew. The 5th and 95th percentile-based CI estimates would then be given directly by:
T¯∗−δ1 = t+B−δ1; T¯∗+δ2 = t+B+δ2.
The 5th and 95th percentile CI estimates by the empirical/basic bootstrap method would be respectively (Ref. 1, eq. 5.6, page 194):
2t−(T¯∗+δ2) = t−B−δ2; 2t−(T¯∗−δ1) = t−B+δ1.
So percentile-based CIs both get the bias wrong and flip the directions of the potentially asymmetric positions of the confidence limits around a doubly-biased center. The percentile CIs from bootstrapping in such a case do not represent the distribution of (T−θ).
This behavior is nicely illustrated on this page, for bootstrapping a statistic so negatively biased that the original sample estimate is below the 95% CIs based on the empirical/basic method (which directly includes appropriate bias correction). The 95% CIs based on the percentile method, arranged around a doubly-negatively biased center, are actually both below even the negatively biased point estimate from the original sample!
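To make the two formulas above concrete, here is a minimal sketch computing both intervals by hand from the same bootstrap replicates, reusing the illustrative median-of-exponential setup from the first sketch; boot.ci() may differ slightly because of its quantile interpolation rules.

```r
## Hand computation of the percentile and empirical/basic 90% CIs
## (5th and 95th percentiles), under the illustrative setup used earlier.
library(boot)
set.seed(101)
y <- rexp(50)
t_hat <- median(y)
b <- boot(y, function(data, idx) median(data[idx]), R = 2000)

q <- quantile(b$t, c(0.05, 0.95))       # 5th and 95th percentiles of T*

perc_ci  <- unname(q)                   # percentile CI: quantiles of T* directly
basic_ci <- unname(2 * t_hat - rev(q))  # empirical/basic CI: 2t minus the
                                        # flipped quantiles, as in the formula
perc_ci
basic_ci
boot.ci(b, conf = 0.90, type = c("basic", "perc"))  # for comparison
```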
Should the percentile bootstrap never be used?
That might be an overstatement or an understatement, depending on your perspective. If you can document minimal bias and skew, for example by visualizing the distribution of (T∗−t) with histograms or density plots, the percentile bootstrap should provide essentially the same CI as the empirical/basic CI. These are probably both better than the simple normal approximation to the CI.
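For instance, a minimal version of that visual check, again under the illustrative median-of-exponential setup from the sketches above:

```r
## Visual check for bias and skew in (T* - t); the data and statistic are
## illustrative assumptions only.
library(boot)
set.seed(101)
y <- rexp(50)
t_hat <- median(y)
b <- boot(y, function(data, idx) median(data[idx]), R = 2000)

hist(b$t - t_hat, breaks = 50, xlab = "T* - t",
     main = "Bootstrap distribution of T* - t")
abline(v = 0, lty = 2)                       # a marked offset from 0 suggests bias
abline(v = mean(b$t) - t_hat, col = "red")   # estimated bias B
```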
Neither approach, however, provides the accuracy in coverage that can be provided by other bootstrap approaches. Efron from the beginning recognized potential limitations of percentile CIs but said: "Mostly we will be content to let the varying degrees of success of the examples speak for themselves." (Ref. 2, page 3)
Subsequent work, summarized for example by DiCiccio and Efron (Ref. 4), developed methods that "improve by an order of magnitude upon the accuracy of the standard intervals" provided by the empirical/basic or percentile methods. Thus one might argue that neither the empirical/basic nor the percentile methods should be used, if you care about accuracy of the intervals.
In extreme cases, for example sampling directly from a lognormal distribution without transformation, no bootstrapped CI estimates might be reliable, as Frank Harrell has noted.
What limits the reliability of these and other bootstrapped CIs?
Several issues can tend to make bootstrapped CIs unreliable. Some apply to all approaches, others can be alleviated by approaches other than the empirical/basic or percentile methods.
The first, general, issue is how well the empirical distribution F^ represents the population distribution F. If it doesn't, then no bootstrapping method will be reliable. In particular, bootstrapping to determine anything close to extreme values of a distribution can be unreliable. This issue is discussed elsewhere on this site, for example here and here. The few, discrete, values available in the tails of F^ for any particular sample might not represent the tails of a continuous F very well. An extreme but illustrative case is trying to use bootstrapping to estimate the maximum order statistic of a random sample from a uniform U[0,θ] distribution, as explained nicely here. Note that bootstrapped 95% or 99% CIs are themselves at the tails of a distribution and thus could suffer from such a problem, particularly with small sample sizes.
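That uniform-maximum case is easy to reproduce; here is a minimal sketch, with θ = 1 chosen purely for illustration:

```r
## The extreme case above: bootstrapping the maximum of a U[0, theta]
## sample, with theta = 1 an illustrative assumption.
library(boot)
set.seed(202)
n <- 100
y_unif <- runif(n)                 # U[0, 1]; the target theta is 1
t_max  <- max(y_unif)

b_max <- boot(y_unif, function(data, idx) max(data[idx]), R = 2000)

## A bootstrap maximum can never exceed the sample maximum, and it equals
## it with probability 1 - (1 - 1/n)^n, about 0.63 here:
mean(b_max$t == t_max)
hist(b_max$t, breaks = 50, main = "Bootstrap distribution of the maximum")
```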
Second, there is no assurance that sampling of any quantity from F^ will have the same distribution as sampling it from F. Yet that assumption underlies the fundamental principle of bootstrapping. Quantities with that desirable property are called pivotal. As AdamO explains:
This means that if the underlying parameter changes, the shape of the distribution is only shifted by a constant, and the scale does not necessarily change. This is a strong assumption!
For example, if there is bias it's important to know that sampling from F around θ is the same as sampling from F^ around t. And this is a particular problem in nonparametric sampling; as Ref. 1 puts it on page 33:
In nonparametric problems the situation is more complicated. It is now unlikely (but not strictly impossible) that any quantity can be exactly pivotal.
So the best that's typically possible is an approximation. This problem, however, can often be addressed adequately. It's possible to estimate how close a sampled quantity is to being pivotal, for example with pivot plots as recommended by Canty et al. These can display how distributions of bootstrapped estimates (T∗−t) vary with t, or how well a transformation h provides a quantity (h(T∗)−h(t)) that is pivotal. Methods for improved bootstrapped CIs can try to find a transformation h such that (h(T∗)−h(t)) is closer to pivotal for estimating CIs in the transformed scale, then transform back to the original scale.
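A crude version of that idea (not the pivot plots of Canty et al., just a rough comparison of shape on two scales) is sketched below; the sample variance of exponential data as T and h = log are assumptions purely for illustration.

```r
## Rough shape comparison of (T* - t) versus (log T* - log t), assuming
## T = sample variance of exponential data and h = log, both illustrative.
library(boot)
set.seed(303)
y <- rexp(100)
t_var <- var(y)
b_var <- boot(y, function(data, idx) var(data[idx]), R = 2000)

op <- par(mfrow = c(1, 2))
hist(b_var$t - t_var,           breaks = 50, main = "T* - t")
hist(log(b_var$t) - log(t_var), breaks = 50, main = "log(T*) - log(t)")
par(op)
## If the log-scale distribution looks more symmetric and stable, CIs set
## on the log scale and transformed back may behave better.
```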
The boot.ci() function provides studentized bootstrap CIs (called "bootstrap-t" by DiCiccio and Efron) and BCa CIs (bias corrected and accelerated, where the "acceleration" deals with skew) that are "second-order accurate" in that the difference between the desired and achieved coverage α (e.g., 95% CI) is on the order of n−1, versus only first-order accurate (order of n−0.5) for the empirical/basic and percentile methods (Ref. 1, pp. 212-3; Ref. 4). These methods, however, require keeping track of the variances within each of the bootstrapped samples, not just the individual values of the T∗j used by those simpler methods.
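For example, a minimal sketch (the mean of simulated exponential data, an assumption just for illustration) that supplies the per-sample variance boot.ci() needs for the studentized interval:

```r
## Sketch of the second-order accurate intervals. For type = "stud",
## boot.ci() takes the per-sample variance from the second component of
## the statistic; the mean of exponential data is illustrative only.
library(boot)
set.seed(404)
y <- rexp(100)

mean_stat <- function(data, idx) {
  d <- data[idx]
  c(mean(d), var(d) / length(d))   # T*_j and its estimated variance
}

b2 <- boot(y, mean_stat, R = 2000)
boot.ci(b2, type = c("basic", "perc", "stud", "bca"))
```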
In extreme cases, one might need to resort to bootstrapping within the bootstrapped samples themselves to provide adequate adjustment of confidence intervals. This "Double Bootstrap" is described in Section 5.6 of Ref. 1, with other chapters in that book suggesting ways to minimize its extreme computational demands.
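As a flavor of the nesting involved (though not the full calibration scheme of Section 5.6), here is a sketch in which an inner bootstrap inside each outer sample supplies a variance for studentized CIs; the median of exponential data is again an illustrative assumption.

```r
## Nested resampling sketch: an inner bootstrap inside each outer
## bootstrap sample estimates the variance of the median, which has no
## simple formula. This illustrates only the nesting, not the full
## double-bootstrap calibration of Section 5.6 of Ref. 1.
library(boot)
set.seed(505)
y <- rexp(100)

med_stat <- function(data, idx) {
  d <- data[idx]
  inner <- replicate(100, median(sample(d, replace = TRUE)))  # inner bootstrap
  c(median(d), var(inner))          # T*_j and its inner-bootstrap variance
}

b3 <- boot(y, med_stat, R = 1000)   # note: 1000 x 100 resamples, already slow
boot.ci(b3, type = "stud")
```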
References

1. Davison, A. C. and Hinkley, D. V. Bootstrap Methods and their Application, Cambridge University Press, 1997.
2. Efron, B. Bootstrap Methods: Another look at the jackknife, Ann. Statist. 7: 1-26, 1979.
3. Fox, J. and Weisberg, S. Bootstrapping regression models in R. An Appendix to An R Companion to Applied Regression, Second Edition (Sage, 2011). Revision as of 10 October 2017.
4. DiCiccio, T. J. and Efron, B. Bootstrap confidence intervals. Stat. Sci. 11: 189-228, 1996.
5. Canty, A. J., Davison, A. C., Hinkley, D. V., and Ventura, V. Bootstrap diagnostics and remedies. Can. J. Stat. 34: 5-27, 2006.