Unbiased estimator of the median


16

Suppose we have a random variable X supported on [0, 1]. How can we obtain an unbiased estimator of the median of X?

Of course, I could generate some samples and take the sample median, but I understand that this is not unbiased in general.
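For concreteness, here is a quick Monte Carlo sketch of that bias (the Beta(2, 5) distribution is an arbitrary skewed example of my own choosing, not from the question):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

# Beta(2, 5) lives on [0, 1] and is right-skewed, so the bias is visible.
true_median = beta(2, 5).median()

n = 5           # a small, odd sample size
reps = 200_000  # Monte Carlo replications

samples = rng.beta(2, 5, size=(reps, n))
sample_medians = np.median(samples, axis=1)
bias = sample_medians.mean() - true_median
print(f"E[sample median] - median ≈ {bias:.4f}")  # small but positive here
```

For this skewed example the estimated bias comes out small but positive; it shrinks as n grows, which is consistency rather than unbiasedness.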

Note: this question is related to my last question, but is not the same; in this case X can only be sampled approximately.

Answers:


13

No such estimator exists.

The intuition is that probability density can be shifted freely from one side of the median to the other while leaving the median fixed, so an estimator whose expectation equals the median of one distribution will have a different expectation under the altered distribution and will therefore be biased for it. The following account makes this intuition a little more rigorous.


Let us focus on distributions F having a unique median m, so that by definition F(m) ≥ 1/2 and F(x) < 1/2 for all x < m. Fix a sample size n ≥ 1 and suppose t: [0,1]^n → [0,1] estimates m. (It would be enough for t to be bounded, but we do not seriously entertain estimators that can produce impossible values.) We make no further assumptions about t: it does not even have to be continuous anywhere.

What it means for t to be unbiased (for a fixed sample size n) is that

E_F[t(X1, …, Xn)] = m

for iid samples Xi ~ F. For t to be an "unbiased estimator of the median", this property must hold for every such F.

Suppose an unbiased estimator exists. We will derive a contradiction by applying it to a particularly simple family of distributions. Consider distributions F = F_{x,y,m,ε} with the following properties:

  1. 0 ≤ x < y ≤ 1;

  2. 0 < ε < (y − x)/4;

  3. x + ε < m < y − ε;

  4. Pr(X = x) = Pr(X = y) = (1 − ε)/2;

  5. Pr(m − ε ≤ X ≤ m + ε) = ε; and

  6. F is uniform on [m − ε, m + ε].

These distributions place probability (1 − ε)/2 at each of x and y and a tiny amount of probability symmetrically placed around m between x and y. This makes m the unique median of F. (If you are concerned that this is not a continuous distribution, then convolve it with a very narrow Gaussian and truncate the result to [0, 1]: the argument will not change.)

Now, for any putative median estimator t, an easy estimate shows that E[t(X1, X2, …, Xn)] is strictly within ε of the average of the 2^n values t(x1, x2, …, xn), where the xi vary over all possible combinations of x and y. However, we can vary m between x + ε and y − ε, a change of more than 2ε (by virtue of conditions 2 and 3). Thus there exists an m, and whence a corresponding distribution F_{x,y,m,ε}, for which this expectation does not equal the median, QED.
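As a numerical sanity check on this argument, here is a small simulation of my own (with arbitrary values x = 0.1, y = 0.9, ε = 0.005, n = 3, and the sample median standing in for t): the expectation of t stays nearly fixed while m moves across almost the whole interval.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_F(x, y, m, eps, size):
    """Draw from F_{x,y,m,eps}: mass (1 - eps)/2 at each of x and y,
    and mass eps spread uniformly on [m - eps, m + eps]."""
    u = rng.random(size)
    out = np.where(u < (1 - eps) / 2, x, float(y))
    middle = u >= 1 - eps                       # probability-eps middle band
    out[middle] = rng.uniform(m - eps, m + eps, middle.sum())
    return out

x, y, eps, n, reps = 0.1, 0.9, 0.005, 3, 400_000

# Same atoms at x and y, but two very different medians:
means = {}
for m in (0.2, 0.8):
    draws = sample_F(x, y, m, eps, (reps, n))
    means[m] = np.median(draws, axis=1).mean()  # estimate E[sample median]

print(means)
print(abs(means[0.8] - means[0.2]))
```

Moving m from 0.2 to 0.8 changes the median by 0.6, yet the expectation of the sample median changes by less than a hundredth: exactly the insensitivity the proof exploits.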


(+1) Nice proof. Did you come up with it, or is it something you remembered from grad school?
StasK

4
Here is another proof: most Bernoulli random variables have median 0 or 1. The estimate from n trials depends only on the average values of the estimator on the vertices of [0,1]^n with k ones, and the weight of each of these average values is a polynomial in p of degree n. If this is an unbiased estimator, it must have average value 1 for any p > 1/2, and there are more than n + 1 such values of p, so this polynomial must be constant... but it must be 0 on lower values of p, so it can't be unbiased there, too.
Douglas Zare
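Zare's polynomial argument is easy to verify concretely. A small sketch of my own (taking n = 3 and the majority-vote sample median as the candidate estimator): its expectation under Bernoulli(p) is a degree-n polynomial in p, which cannot match the step-function median that is 0 for p < 1/2 and 1 for p > 1/2.

```python
import numpy as np
from itertools import product

n = 3
t = np.median  # candidate estimator: the sample median (majority vote)

def expected_t(p):
    """E[t(X1, ..., Xn)] for iid Xi ~ Bernoulli(p): a weighted sum over
    the 2^n vertices of {0,1}^n, hence a polynomial in p of degree n."""
    total = 0.0
    for outcome in product([0, 1], repeat=n):
        k = sum(outcome)
        total += p**k * (1 - p)**(n - k) * t(outcome)
    return total

# A smooth cubic cannot equal the step-function median:
print(expected_t(0.6))  # ≈ 0.648, far from the true median 1
print(expected_t(0.4))  # ≈ 0.352, far from the true median 0
```

Here expected_t(p) = 3p² − 2p³, so the estimator is biased at every p ≠ 0, 1/2, 1, matching the comment's conclusion.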

1
@Douglas That's a great proof. I suspect some people might feel a little uneasy about the scope of its applicability, though, because the median for a Bernoulli variable is somewhat special, being coincident with one of its two support points (except when p = 1/2). Readers might be tempted to declare this as "pathological" and try to bar such monsters by looking only at continuous distributions with everywhere positive densities on their domains. That's why I took care to show that such efforts will fail.
whuber

3

Finding an unbiased estimator without having a parametric model would be difficult! But you could use bootstrapping, and use that to correct the empirical median to get an approximately unbiased estimator.
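A minimal sketch of that suggestion, assuming the plain nonparametric bootstrap (the Beta(2, 5) example data are my own, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_corrected_median(data, n_boot=2000):
    """Estimate the bias of the sample median by bootstrapping,
    then subtract that estimate from the sample median."""
    theta_hat = np.median(data)
    boot = np.array([
        np.median(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    bias_est = boot.mean() - theta_hat  # bootstrap estimate of the bias
    return theta_hat - bias_est

data = rng.beta(2, 5, size=25)          # a skewed distribution on [0, 1]
corrected = bootstrap_corrected_median(data)
print(np.median(data), corrected)
```

This only reduces the bias to higher order; as the accepted answer shows, no estimator can remove it entirely for every distribution.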


If this is impossible, is it possible to prove it? For example, if X1, X2, …, Xn are independent samples from X, then can one prove that f(X1, …, Xn) cannot be unbiased for any choice of f?
robinson

2
I think kjetil is saying that in a nonparametric framework there is no method that will give an unbiased estimate for every possible distribution. But in the parametric framework you probably could. Bootstrapping a biased sample estimate can allow you to estimate the bias and adjust it to get a bootstrap estimate that is nearly unbiased. That was his suggestion for handling the problem in the nonparametric framework. Proving that an unbiased estimate is not possible would also be difficult.
Michael R. Chernick

2
If you really want to try to prove that no unbiased estimator exists, there is a book, Ferguson, "Mathematical Statistics: A Decision Theoretic Approach", which has some examples of that kind of thing!
kjetil b halvorsen

I imagine that the regularity conditions for the bootstrap will be violated with the distribution functions that whuber considers in his answer. Michael, can you comment?
StasK

2
@Stas As I pointed out, my functions can be made to look very "nice" by mollifying them. They can also be generalized to mollifications of large finite mixtures of atoms. The class of such distributions is dense in all distributions on the unit interval, so I don't think bootstrap regularity would be involved here.
whuber

0

I believe quantile regression will give you a consistent estimator of the median. Consider the model Y = α + u; you want to estimate med(Y) = med(α + u) = α + med(u), since α is a constant. All you need is med(u) = 0, which should hold so long as you have independent draws. As far as unbiasedness goes, however, I don't know. Medians are difficult.
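To see the quantile-regression view concretely: with only an intercept, the 0.5-quantile regression objective is the absolute-error loss, whose minimizer is exactly the sample median. A small sketch (the exponential errors and sample size are my own arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

# Model Y = alpha + u with alpha = 0 and skewed errors u ~ Exp(1),
# so med(Y) = log(2) ≈ 0.693.
y = rng.exponential(size=501)

# Intercept-only median regression: minimize sum |y_i - a| over a.
res = minimize_scalar(lambda a: np.abs(y - a).sum(),
                      bounds=(y.min(), y.max()), method="bounded")

print(res.x, np.median(y))  # the minimizer recovers the sample median
```

The recovered intercept equals the sample median, which converges to med(Y); this illustrates the consistency the comment claims, while saying nothing about unbiasedness.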


See @whuber's answer
Peter Flom - Reinstate Monica
Licensed under cc by-sa 3.0 with attribution required.