Unbiased estimator of the median


16

Suppose we have a random variable X supported on [0, 1]. How can we obtain an unbiased estimator of the median of X?

Of course, I could generate some samples and take the sample median, but I understand that this is not unbiased in general.
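For concreteness, here is a quick Monte Carlo sketch of that bias (the Beta(2, 5) distribution is an arbitrary skewed example of my own choosing, not from the question):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Beta(2, 5) lives on [0, 1] and is right-skewed, so the bias is visible.
true_median = beta(2, 5).median()

n = 5           # a small, odd sample size
reps = 200_000  # Monte Carlo replications

samples = rng.beta(2, 5, size=(reps, n))
sample_medians = np.median(samples, axis=1)
bias = sample_medians.mean() - true_median
print(f"E[sample median] - median ≈ {bias:.4f}")  # small but positive here
```

For this skewed example the estimated bias comes out small but positive; it shrinks as n grows, which is consistency rather than unbiasedness.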

Note: this question is related to my last question, but is not the same; in this case X can only be sampled approximately.

Answers:


13

No such estimator exists.

The intuition is that probability density can be shifted freely from one side of the median to the other while leaving the median fixed, so an estimator whose expectation equals the median of one distribution will have a different expectation under the altered distribution and will therefore be biased for it. The following account makes this intuition a little more rigorous.


Let us focus on distributions F having a unique median m, so that by definition F(m) ≥ 1/2 and F(x) < 1/2 for all x < m. Fix a sample size n ≥ 1 and suppose t: [0,1]^n → [0,1] estimates m. (It would be enough for t to be bounded, but we do not seriously entertain estimators that can produce impossible values.) We make no further assumptions about t: it does not even have to be continuous anywhere.

What it means for t to be unbiased (for a fixed sample size n) is that

E_F[t(X1, …, Xn)] = m

for iid samples Xi ~ F. For t to be an "unbiased estimator of the median", this property must hold for every such F.

Suppose an unbiased estimator exists. We will derive a contradiction by applying it to a particularly simple family of distributions. Consider distributions F = F_{x,y,m,ε} with the following properties:

  1. 0 ≤ x < y ≤ 1;

  2. 0 < ε < (y − x)/4;

  3. x + ε < m < y − ε;

  4. Pr(X = x) = Pr(X = y) = (1 − ε)/2;

  5. Pr(m − ε ≤ X ≤ m + ε) = ε; and

  6. F is uniform on [m − ε, m + ε].

These distributions place probability (1 − ε)/2 at each of x and y and a tiny amount of probability symmetrically placed around m between x and y. This makes m the unique median of F. (If you are concerned that this is not a continuous distribution, then convolve it with a very narrow Gaussian and truncate the result to [0, 1]: the argument will not change.)

Now, for any putative median estimator t, an easy estimate shows that E[t(X1, X2, …, Xn)] is strictly within ε of the average of the 2^n values t(x1, x2, …, xn), where the xi vary over all possible combinations of x and y. However, we can vary m between x + ε and y − ε, a change of more than 2ε (by virtue of conditions 2 and 3). Thus there exists an m, and whence a corresponding distribution F_{x,y,m,ε}, for which this expectation does not equal the median, QED.
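As a numerical sanity check on this argument, here is a small simulation of my own (with arbitrary values x = 0.1, y = 0.9, ε = 0.005, n = 3, and the sample median standing in for t): the expectation of t stays nearly fixed while m moves across almost the whole interval.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_F(x, y, m, eps, size):
    """Draw from F_{x,y,m,eps}: mass (1 - eps)/2 at each of x and y,
    and mass eps spread uniformly on [m - eps, m + eps]."""
    u = rng.random(size)
    out = np.where(u < (1 - eps) / 2, x, float(y))
    middle = u >= 1 - eps                       # probability-eps middle band
    out[middle] = rng.uniform(m - eps, m + eps, middle.sum())
    return out

x, y, eps, n, reps = 0.1, 0.9, 0.005, 3, 400_000

# Same atoms at x and y, but two very different medians:
means = {}
for m in (0.2, 0.8):
    draws = sample_F(x, y, m, eps, (reps, n))
    means[m] = np.median(draws, axis=1).mean()  # estimate E[sample median]

print(means)
print(abs(means[0.8] - means[0.2]))
```

Moving m from 0.2 to 0.8 changes the median by 0.6, yet the expectation of the sample median changes by less than a hundredth: exactly the insensitivity the proof exploits.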


(+1) Nice proof. Did you come up with it, or is it something you remembered from grad school?
StasK

4
Here is another proof: most Bernoulli random variables have median 0 or 1. The estimate from n trials depends only on the average values of the estimator on the vertices of [0,1]^n with k ones, and the weight of each of these average values is a polynomial in p of degree n. If this is an unbiased estimator, it must have average value 1 for any p > 1/2, and there are more than n + 1 such values of p, so this polynomial must be constant... but it must be 0 on lower values of p, so it can't be unbiased there, too.
Douglas Zare
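Zare's polynomial argument is easy to verify concretely. A small sketch of my own (taking n = 3 and the majority-vote sample median as the candidate estimator): its expectation under Bernoulli(p) is a degree-n polynomial in p, which cannot match the step-function median that is 0 for p < 1/2 and 1 for p > 1/2.

```python
import numpy as np
from itertools import product

n = 3
t = np.median  # candidate estimator: the sample median (majority vote)

def expected_t(p):
    """E[t(X1, ..., Xn)] for iid Xi ~ Bernoulli(p): a weighted sum over
    the 2^n vertices of {0,1}^n, hence a polynomial in p of degree n."""
    total = 0.0
    for outcome in product([0, 1], repeat=n):
        k = sum(outcome)
        total += p**k * (1 - p)**(n - k) * t(outcome)
    return total

# A smooth cubic cannot equal the step-function median:
print(expected_t(0.6))  # ≈ 0.648, far from the true median 1
print(expected_t(0.4))  # ≈ 0.352, far from the true median 0
```

Here expected_t(p) = 3p² − 2p³, so the estimator is biased at every p ≠ 0, 1/2, 1, matching the comment's conclusion.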

1
@Douglas That's a great proof. I suspect some people might feel a little uneasy about the scope of its applicability, though, because the median for a Bernoulli variable is somewhat special, being coincident with one of its two support points (except when p = 1/2). Readers might be tempted to declare this as "pathological" and try to bar such monsters by looking only at continuous distributions with everywhere positive densities on their domains. That's why I took care to show that such efforts will fail.
whuber

3

Finding an unbiased estimator without having a parametric model would be difficult! But you could use bootstrapping, and use that to correct the empirical median to get an approximately unbiased estimator.
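A minimal sketch of that suggestion, assuming the plain nonparametric bootstrap (the Beta(2, 5) example data are my own, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_corrected_median(data, n_boot=2000):
    """Estimate the bias of the sample median by bootstrapping,
    then subtract that estimate from the sample median."""
    theta_hat = np.median(data)
    boot = np.array([
        np.median(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    bias_est = boot.mean() - theta_hat  # bootstrap estimate of the bias
    return theta_hat - bias_est

data = rng.beta(2, 5, size=25)          # a skewed distribution on [0, 1]
corrected = bootstrap_corrected_median(data)
print(np.median(data), corrected)
```

This only reduces the bias to higher order; as the accepted answer shows, no estimator can remove it entirely for every distribution.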


If this is impossible, is it possible to prove it? For example, if X1, X2, …, Xn are independent samples from X, then can one prove that f(X1, …, Xn) cannot be unbiased for any choice of f?
robinson

2
I think kjetil is saying that in a nonparametric framework there is no method that will give an unbiased estimate for every possible distribution. But in the parametric framework you probably could. Bootstrapping a biased sample estimate can allow you to estimate the bias and adjust it to get a bootstrap estimate that is nearly unbiased. That was his suggestion for handling the problem in the nonparametric framework. Proving that an unbiased estimate is not possible would also be difficult.
Michael R. Chernick

2
If you really want to try to prove that no unbiased estimator exists, there is a book, Ferguson, "Mathematical Statistics: A Decision Theoretic Approach", which has some examples of that kind of thing!
kjetil b halvorsen

I imagine that the regularity conditions for the bootstrap will be violated with the distribution functions that whuber considers in his answer. Michael, can you comment?
StasK

2
@Stas As I pointed out, my functions can be made to look very "nice" by mollifying them. They can also be generalized to mollifications of large finite mixtures of atoms. The class of such distributions is dense in all distributions on the unit interval, so I don't think bootstrap regularity would be involved here.
whuber

0

I believe quantile regression will give you a consistent estimator of the median. Consider the model Y = α + u; you want to estimate med(Y) = med(α + u) = α + med(u), since α is a constant. All you need is med(u) = 0, which should hold so long as you have independent draws. As far as unbiasedness goes, however, I don't know. Medians are difficult.
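To see the quantile-regression view concretely: with only an intercept, the 0.5-quantile regression objective is the absolute-error loss, whose minimizer is exactly the sample median. A small sketch (the exponential errors and sample size are my own arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

# Model Y = alpha + u with alpha = 0 and skewed errors u ~ Exp(1),
# so med(Y) = log(2) ≈ 0.693.
y = rng.exponential(size=501)

# Intercept-only median regression: minimize sum |y_i - a| over a.
res = minimize_scalar(lambda a: np.abs(y - a).sum(),
                      bounds=(y.min(), y.max()), method="bounded")

print(res.x, np.median(y))  # the minimizer recovers the sample median
```

The recovered intercept equals the sample median, which converges to med(Y); this illustrates the consistency the comment claims, while saying nothing about unbiasedness.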


See @whuber's answer
Peter Flom - Reinstate Monica
Licensed under cc by-sa 3.0 with attribution required.