Testing for finite variance?


29

Is it possible to test for the finiteness (or existence) of the variance of a random variable, given a sample? As a null hypothesis, either {the variance exists and is finite} or {the variance does not exist / is infinite} would be acceptable. Philosophically (and computationally) this seems very strange, because there should be no difference between a population without finite variance and a population with a very, very large variance (say $> 10^{400}$), so I am not hopeful this problem can be solved.

One approach that had been suggested to me was via the Central Limit Theorem: assuming the sample is iid and the population has finite mean, one could somehow check whether the sample mean has the right standard error as the sample size grows. I'm not convinced this method would work, though. (In particular, I don't see how to turn it into a proper test.)


1
Related: stats.stackexchange.com/questions/94402/… If there is the slightest possibility that the variance does not exist, it is better to use a model that does not assume finite variance. Do not even think about testing for it.
kjetil b halvorsen

Answers:


13

This is impossible, because a finite sample of size $n$ cannot reliably distinguish between, say, a normal population and a normal population contaminated by a $1/N$ amount of a Cauchy distribution where $N \gg n$. (Of course the former has finite variance and the latter has infinite variance.) Thus any fully nonparametric test will have arbitrarily low power against such alternatives.
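As a rough numerical illustration of this point (not part of the original answer), here is an R sketch with the hypothetical values $n = 100$ and $N = 10^6$: a sample from the contaminated distribution almost never contains a draw from the Cauchy component, so it is practically indistinguishable from a pure normal sample.

set.seed(1)
n <- 100    # sample size available to the analyst
N <- 1e6    # contamination probability is 1/N, with N >> n
# Probability that the sample contains at least one Cauchy draw:
1 - (1 - 1/N)^n    # approximately n/N = 1e-4
# One draw of size n from the contaminated distribution:
r_contaminated <- function(n, N) {
  from_cauchy <- runif(n) < 1/N
  ifelse(from_cauchy, rcauchy(n), rnorm(n))
}
x <- r_contaminated(n, N)    # with high probability, every entry is a normal draw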


4
This is a very good point. However, don't most hypothesis tests have arbitrarily low power against some alternative? For example, a test for zero mean will have very low power when given a sample from a population with mean $\epsilon$ for $0 < |\epsilon|$ small. I still wonder whether such a test can be constructed sanely at all, much less whether it has low power in some cases.
shabbychef

2
Also, 'contaminated' distributions such as the one you cite have always seemed to me at odds with the idea of being 'identically distributed'. Perhaps you would agree. Saying a sample is drawn iid from some distribution without specifying the distribution seems meaningless (well, the 'independent' part of iid is meaningful).
shabbychef

2
(1) You are right about low power, but the issue here (it seems to me) is that there is no gradual step from "finite" to "infinite": there is no natural way to say what constitutes a "small" departure from the null compared to a "large" one. (2) The form of the distribution is irrelevant to considerations of iid. It does not mean that 1% of the data come from a Cauchy and 99% from a Normal; it means that 100% of the data come from a distribution that is almost normal but has Cauchy tails. In this sense the data can be iid for a contaminated distribution.
whuber

2
Has anyone read this paper? sciencedirect.com/science/article/pii/S0304407615002596
Christoph Hanck

3
@shabbychef If every observation comes from exactly the same mixture process, they are identically distributed, each drawn from that mixture distribution. If some observations are necessarily from one process and others necessarily from a different process (observations 1 to 990 are Normal and observations 991 to 1000 are Cauchy, say), then they are not identically distributed (even though the combined sample might be indistinguishable from a 99%-1% mixture). It essentially comes down to the model of the process you are using.
Glen_b -Reinstate Monica

16

No, you cannot be certain without knowing the distribution. But there are certain things you can do, such as look at what might be called the "partial variance": if you have a sample of size $N$, plot the variance estimated from the first $n$ terms, with $n$ running from 2 to $N$.

With a finite population variance, you hope that the partial variance soon settles down close to the population variance.

If the population variance is infinite, you see jumps in the partial variance followed by slow declines, until the next very large value appears in the sample.

Here is an illustration with Normal and Cauchy random variables (on a log scale): [Figure: Partial Variance]
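A minimal R sketch (not from the original answer) of how such a partial-variance plot can be produced; the sample size and seed are arbitrary choices.

set.seed(42)
N <- 10000
# Partial (running) variance: variance of the first n observations, n = 2, ..., N
# (naive O(N^2) computation; fine for a sketch)
partial_var <- function(x) sapply(2:length(x), function(n) var(x[1:n]))
pv_normal <- partial_var(rnorm(N))
pv_cauchy <- partial_var(rcauchy(N))
plot(2:N, pv_cauchy, type = "l", log = "y", col = "red",
     xlab = "n", ylab = "partial variance")
lines(2:N, pv_normal, col = "blue")
legend("topright", legend = c("Cauchy", "Normal"), col = c("red", "blue"), lty = 1)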

This may not help if the shape of the distribution is such that it would take a far larger sample size than you have to identify it with sufficient confidence, i.e., where very large values are fairly (but not extremely) rare for a distribution with finite variance, or extremely rare for a distribution with infinite variance. For a given distribution there will be sample sizes that are more likely than not to fail to reveal its nature; conversely, for a given sample size there are distributions that are more likely than not to disguise their nature at that sample size.


4
+1 I like this: (a) a graphic usually reveals much more than a test and (b) it is practical. I am a little concerned that it has an arbitrary aspect: when the "partial variance" appears to settle down early because of one or two extreme values, this picture can be deceptive. I wonder whether there is a good solution to this problem.
whuber

1
+1 for great graphic. Really solidifies the concept of "no variance" in the Cauchy distribution. @whuber: Sorting the data in all possible permutations, running the test for each, and taking some kind of average? Not very computationally efficient, I'll grant you :) but maybe you could just choose a handful of random permutations?
naught101

2
@naught101 Averaging over all permutations won't tell you anything, because you will get a perfectly horizontal line. Perhaps I misunderstand what you mean?
whuber

1
@whuber: I actually meant taking the average of some kind of test for convergence, not the graph itself. But I'll grant it's a pretty vague idea, and that's largely because I have no idea what I'm talking about :)
naught101

7

Here's another answer. Suppose you could parametrize the problem, something like this:

$H_0: X \sim t(\mathrm{df}=3)$ versus $H_1: X \sim t(\mathrm{df}=1)$.

Then you could do an ordinary Neyman-Pearson likelihood ratio test of $H_0$ versus $H_1$. Note that $H_1$ is Cauchy (infinite variance) and $H_0$ is the usual Student's $t$ with 3 degrees of freedom (finite variance), which has PDF:

$$f(x\,|\,\nu)=\frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1+\frac{x^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},$$

for $-\infty < x < \infty$. Given simple random sample data $x_1, x_2, \ldots, x_n$, the likelihood ratio test rejects $H_0$ when

$$\Lambda(\mathbf{x})=\frac{\prod_{i=1}^{n}f(x_i\,|\,\nu=1)}{\prod_{i=1}^{n}f(x_i\,|\,\nu=3)}>k,$$
where $k\geq 0$ is chosen such that
$$P(\Lambda(\mathbf{X})>k\,|\,\nu=3)=\alpha.$$

It's a little bit of algebra to simplify

$$\Lambda(\mathbf{x})=\left(\frac{\sqrt{3}}{2}\right)^{n}\prod_{i=1}^{n}\frac{(1+x_i^{2}/3)^{2}}{1+x_i^{2}}.$$

So, again, we get a simple random sample, calculate $\Lambda(\mathbf{x})$, and reject $H_0$ if $\Lambda(\mathbf{x})$ is too big. How big? That's the fun part! It's going to be hard (impossible?) to get a closed form for the critical value, but we can approximate it as closely as we like, for sure. Here's one way to do it, with R. Suppose $\alpha=0.05$ and, for laughs, let's say $n=13$.

We generate a bunch of samples under $H_0$, calculate $\Lambda$ for each sample, and then find the 95th quantile.

set.seed(1)
# 1,000,000 samples of size n = 13 under H0 (Student's t with 3 df), one per row
x <- matrix(rt(1000000*13, df = 3), ncol = 13)
# Lambda for each sample, up to the constant factor (sqrt(3)/2)^13
y <- apply(x, 1, function(z) prod((1 + z^2/3)^2)/prod(1 + z^2))
# Critical value (before multiplying by the constant) at alpha = 0.05
quantile(y, probs = 0.95)

This turns out (after a few seconds on my machine) to be 12.8842, which, multiplied by $(\sqrt{3}/2)^{13}$, gives $k \approx 1.9859$. Surely there are other, better ways to approximate this, but we're just playing around.
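As a usage sketch (not part of the original answer): given the critical value just computed, the test could be applied to a new sample like this; the sample z below is hypothetical.

z <- rt(13, df = 1)    # hypothetical new sample of size 13 (here actually Cauchy)
Lambda <- (sqrt(3)/2)^13 * prod((1 + z^2/3)^2 / (1 + z^2))
Lambda > 1.9859        # TRUE => reject H0 at level alpha = 0.05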

In summary, when the problem is parametrizable you can set up a hypothesis test just as you would in other problems, and it's pretty straightforward, except in this case for some tap dancing near the end. Note that we know from our theory that the test above is a most powerful test of $H_0$ versus $H_1$ (at level $\alpha$), so it doesn't get any better than this (as measured by power).

Disclaimers: this is a toy example. I do not have any real-world situation in which I was curious to know whether my data came from Cauchy as opposed to Student's t with 3 df. And the original question didn't say anything about parametrized problems, it seemed to be looking for more of a nonparametric approach, which I think was addressed well by the others. The purpose of this answer is for future readers who stumble across the title of the question and are looking for the classical dusty textbook approach.

P.S. It might be fun to play a little more with the test, e.g. testing $H_1: \nu \leq 1$, or something else, but I haven't done that. My guess is that it would get pretty ugly pretty fast. I also thought about testing different types of stable distributions, but again, it was just a thought.


2
Estimating the $\alpha$ in stable distributions is notoriously difficult.
shabbychef

1
You could also test $H_1: \nu \leq 2$, because the $t$ distribution has finite variance only for $\nu > 2$.
probabilityislogic

2
Re: $\alpha$, I didn't know it was notoriously difficult, but it sounds right, thanks. @probability, you are right, and the only reason I picked 3 versus 1 was because it meant fewer fractions. And BTW, I liked probability's answer better than mine (+1).

1
Maybe I misremembered the result: it was something about tail index estimation when $\alpha$ is near 2; the paper is by Weron, I think. That aside, testing $\alpha=2$ against a sum-stable alternative is a kind of normality test! Such tests usually reject given sufficient (real) data: see e.g. stats.stackexchange.com/questions/2492/…
shabbychef

6

In order to test such a vague hypothesis, you would need to average over all densities with finite variance and all densities with infinite variance. That is likely to be impossible; you basically need to be more specific. One more specific version of the problem is to have two hypotheses for a sample $D \equiv \{Y_1, Y_2, \ldots, Y_N\}$:

  1. $H_0: Y_i \sim \mathrm{Normal}(\mu,\sigma)$
  2. $H_A: Y_i \sim \mathrm{Cauchy}(\nu,\tau)$

One hypothesis has finite variance, one has infinite variance. Just calculate the odds:

$$\frac{P(H_0|D,I)}{P(H_A|D,I)}=\frac{P(H_0|I)}{P(H_A|I)}\,\frac{\int P(D,\mu,\sigma|H_0,I)\,d\mu\,d\sigma}{\int P(D,\nu,\tau|H_A,I)\,d\nu\,d\tau}$$

Where $\frac{P(H_0|I)}{P(H_A|I)}$ is the prior odds (usually 1),

$$P(D,\mu,\sigma|H_0,I)=P(\mu,\sigma|H_0,I)\,P(D|\mu,\sigma,H_0,I)$$
and
$$P(D,\nu,\tau|H_A,I)=P(\nu,\tau|H_A,I)\,P(D|\nu,\tau,H_A,I).$$

Now you normally would not be able to use improper priors here, but because both densities are of the "location-scale" type, if you specify the standard non-informative prior with the same ranges $L_1 < \mu, \nu < U_1$ and $L_2 < \sigma, \tau < U_2$, then we get for the numerator integral:

$$\frac{(2\pi)^{-\frac{N}{2}}}{(U_1-L_1)\log\left(\frac{U_2}{L_2}\right)}\int_{L_2}^{U_2}\sigma^{-(N+1)}\int_{L_1}^{U_1}\exp\left(-\frac{N\left[s^{2}+(\overline{Y}-\mu)^{2}\right]}{2\sigma^{2}}\right)d\mu\,d\sigma$$

where $s^{2}=N^{-1}\sum_{i=1}^{N}(Y_i-\overline{Y})^{2}$ and $\overline{Y}=N^{-1}\sum_{i=1}^{N}Y_i$. And for the denominator integral:

$$\frac{\pi^{-N}}{(U_1-L_1)\log\left(\frac{U_2}{L_2}\right)}\int_{L_2}^{U_2}\tau^{-(N+1)}\int_{L_1}^{U_1}\prod_{i=1}^{N}\left(1+\left[\frac{Y_i-\nu}{\tau}\right]^{2}\right)^{-1}d\nu\,d\tau$$

And now taking the ratio we find that the important parts of the normalising constants cancel and we get:

$$\frac{P(D|H_0,I)}{P(D|H_A,I)}=\left(\frac{\pi}{2}\right)^{\frac{N}{2}}\,\frac{\int_{L_2}^{U_2}\sigma^{-(N+1)}\int_{L_1}^{U_1}\exp\left(-\frac{N\left[s^{2}+(\overline{Y}-\mu)^{2}\right]}{2\sigma^{2}}\right)d\mu\,d\sigma}{\int_{L_2}^{U_2}\tau^{-(N+1)}\int_{L_1}^{U_1}\prod_{i=1}^{N}\left(1+\left[\frac{Y_i-\nu}{\tau}\right]^{2}\right)^{-1}d\nu\,d\tau}$$

And all integrals are still proper in the limit so we can get:

$$\frac{P(D|H_0,I)}{P(D|H_A,I)}=\left(\frac{\pi}{2}\right)^{\frac{N}{2}}\,\frac{\int_{0}^{\infty}\sigma^{-(N+1)}\int_{-\infty}^{\infty}\exp\left(-\frac{N\left[s^{2}+(\overline{Y}-\mu)^{2}\right]}{2\sigma^{2}}\right)d\mu\,d\sigma}{\int_{0}^{\infty}\tau^{-(N+1)}\int_{-\infty}^{\infty}\prod_{i=1}^{N}\left(1+\left[\frac{Y_i-\nu}{\tau}\right]^{2}\right)^{-1}d\nu\,d\tau}$$

The denominator integral cannot be analytically computed, but the numerator can, and we get for the numerator:

$$\int_{0}^{\infty}\sigma^{-(N+1)}\int_{-\infty}^{\infty}\exp\left(-\frac{N\left[s^{2}+(\overline{Y}-\mu)^{2}\right]}{2\sigma^{2}}\right)d\mu\,d\sigma=\sqrt{\frac{2\pi}{N}}\int_{0}^{\infty}\sigma^{-N}\exp\left(-\frac{Ns^{2}}{2\sigma^{2}}\right)d\sigma$$

Now make the change of variables $\lambda=\sigma^{-2}\implies d\sigma=-\frac{1}{2}\lambda^{-\frac{3}{2}}d\lambda$ and you get a gamma integral:

$$\sqrt{\frac{2\pi}{N}}\,\frac{1}{2}\int_{0}^{\infty}\lambda^{\frac{N-1}{2}-1}\exp\left(-\lambda\frac{Ns^{2}}{2}\right)d\lambda=\sqrt{\frac{2\pi}{N}}\,\frac{1}{2}\left(\frac{2}{Ns^{2}}\right)^{\frac{N-1}{2}}\Gamma\left(\frac{N-1}{2}\right)$$

And we get as a final analytic form for the odds for numerical work:

$$\frac{P(H_0|D,I)}{P(H_A|D,I)}=\frac{P(H_0|I)}{P(H_A|I)}\times\frac{\pi^{\frac{N+1}{2}}\,N^{-\frac{N}{2}}\,s^{-(N-1)}\,\Gamma\left(\frac{N-1}{2}\right)}{2\int_{0}^{\infty}\tau^{-(N+1)}\int_{-\infty}^{\infty}\prod_{i=1}^{N}\left(1+\left[\frac{Y_i-\nu}{\tau}\right]^{2}\right)^{-1}d\nu\,d\tau}$$

So this can be thought of as a specific test of finite versus infinite variance. We could also put a $t$ distribution into this framework to get another test (testing the hypothesis that the degrees of freedom is greater than 2).
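As a rough numerical sketch (not part of the original answer), the odds could be evaluated in R with the analytic numerator and a numerical double integral for the denominator; the equal prior odds and the example data below are illustrative assumptions, and the quadrature may need care for larger samples.

# Posterior odds of Normal (H0) versus Cauchy (HA), assuming equal prior odds
odds_finite_variance <- function(Y) {
  N  <- length(Y)
  s2 <- mean((Y - mean(Y))^2)                 # s^2, the MLE of the variance
  num <- 0.5 * pi^((N + 1)/2) * N^(-N/2) * s2^(-(N - 1)/2) * gamma((N - 1)/2)
  # inner integral over nu (location), then outer integral over tau (scale)
  inner <- function(tau) {
    f <- function(nu) sapply(nu, function(v) prod(1 / (1 + ((Y - v)/tau)^2)))
    integrate(f, -Inf, Inf)$value
  }
  den <- integrate(function(tau) sapply(tau, function(t) t^(-(N + 1)) * inner(t)),
                   0, Inf)$value
  num / den
}
set.seed(1)
odds_finite_variance(rnorm(10))      # typically > 1: favors the finite-variance model
odds_finite_variance(rcauchy(10))    # typically < 1: favors the infinite-variance model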


1
When you started to integrate, you introduced a term $s^2$. It persists through the final answer. What is it?
whuber

2
@whuber - $s$ is the standard deviation MLE, $s^{2}=N^{-1}\sum_{i=1}^{N}(Y_i-\overline{Y})^{2}$. I thought it was the usual notation for standard deviation, just as $\overline{Y}$ is usual for the average - which I had incorrectly written as $\overline{x}$; will edit accordingly.
probabilityislogic

5

The counterexample is not relevant to the question asked. You want to test the null hypothesis that a sample of i.i.d. random variables is drawn from a distribution having finite variance, at a given significance level. I recommend a good reference text like "Statistical Inference" by Casella to understand the use and the limits of hypothesis testing. Regarding hypothesis testing on finite variance, I don't have a reference handy, but the following paper addresses a similar, but stronger, version of the problem, namely whether the distribution's tails follow a power law.

A. Clauset, C. R. Shalizi, and M. E. J. Newman, "Power-Law Distributions in Empirical Data," SIAM Review 51 (2009): 661-703.


1

"One approach that had been suggested to me was via the Central Limit Theorem."

This is an old question, but I want to propose a way to use the CLT to test for heavy tails.

Let $X=\{X_1,\ldots,X_n\}$ be our sample. If the sample is an i.i.d. realization from a light-tailed distribution, then the CLT holds. It follows that if $Y=\{Y_1,\ldots,Y_n\}$ is a bootstrap resample from $X$, then the distribution of

$$Z=\sqrt{n}\times\frac{\operatorname{mean}(Y)-\operatorname{mean}(X)}{\operatorname{sd}(Y)}$$

is also close to the $N(0,1)$ distribution function.

Now all we have to do is perform a large number of bootstraps and compare the empirical distribution function of the observed $Z$'s with the cdf of a $N(0,1)$. A natural way to make this comparison is the Kolmogorov–Smirnov test.

The following pictures illustrate the main idea. In both pictures, each colored line is constructed from an i.i.d. realization of 1000 observations from the particular distribution, followed by 200 bootstrap resamples of size 500 to approximate the ecdf of $Z$. The black continuous line is the $N(0,1)$ cdf.

[Two figures: empirical cdfs of the bootstrap $Z$ values (colored lines) against the $N(0,1)$ cdf (black line).]
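A minimal R sketch of this procedure (not from the original answer); the heavy-tailed example sample and the resampling sizes are illustrative choices.

set.seed(1)
x <- rcauchy(1000)     # illustrative heavy-tailed sample; replace with your data
B <- 200               # number of bootstrap resamples
m <- 500               # size of each resample
Z <- replicate(B, {
  y <- sample(x, m, replace = TRUE)
  sqrt(m) * (mean(y) - mean(x)) / sd(y)
})
# Compare the ecdf of Z with the N(0,1) cdf via Kolmogorov-Smirnov
ks.test(Z, "pnorm")    # a small p-value suggests the CLT approximation fails (heavy tails)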


2
No amount of bootstrapping will get you anywhere against the problem I raised in my answer. That's because the vast majority of samples will not supply any evidence of a heavy tail--and bootstrapping, by definition, uses only the data from the sample itself.
whuber

1
@whuber If the X values are taken from a symmetric power law, then the generalized CLT applies and the KS test will detect the difference. I believe that your observation does not correctly characterize what you call a "gradual step from 'finite' to 'infinite'".
Mur1lo

1
The CLT never "applies" to any finite sample. It's a theorem about a limit.
whuber

1
When I say that it "applies" I'm only saying that it provides a good approximation if we have a large sample.
Mur1lo

1
The vagueness of "good approximation" and "large" unfortunately fails to capture the logic of hypothesis tests. Implicit in your statement is the possibility of collecting an ever larger sample until you are able to detect the heavy-tailedness: but that's not how hypothesis tests usually work. In the standard setting you have a given sample and your task is to test whether it is from a distribution in the null hypothesis. In this case, bootstrapping won't do that any better than any more straightforward test.
whuber
Licensed under cc by-sa 3.0 with attribution required.