Computing standard error in weighted mean estimation


16

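Suppose that $w_1, w_2, \ldots, w_n$ and $x_1, x_2, \ldots, x_n$ are each drawn i.i.d. from some distributions, with the $w_i$ independent of the $x_i$. The $w_i$ are strictly positive. You observe all the $w_i$, but not the $x_i$; rather, you observe $\sum_i x_i w_i$. I am interested in estimating $\operatorname{E}[x]$ from this information. Clearly the estimator

$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

is unbiased, and can be computed given the information at hand.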

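How might I compute the standard error of this estimator? For the sub-case where $x_i$ takes only the values 0 and 1, I naively tried

$$se \approx \frac{\sqrt{\bar{x}(1-\bar{x})\sum_i w_i^2}}{\sum_i w_i},$$

basically ignoring the variability in the $w_i$, but found that this performed poorly for sample sizes smaller than around 250. (This probably depends on the variance of the $w_i$.) It seems that maybe I don't have enough information to compute a 'better' standard error.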
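As a rough illustration of the estimator and the naive standard error described above, here is a minimal R sketch. The helper name `naive_se_from_sum` and the simulated distributions are assumptions made for the example; only the weights and the weighted sum are treated as observed.

```r
# Hypothetical helper: estimate E[x] and the naive standard error when only
# the weights w and the weighted sum s = sum(w * x) are observed, and x is
# assumed to take the values 0 and 1 only.
naive_se_from_sum <- function(s, w) {
  xbar <- s / sum(w)                                   # weighted mean estimate
  se   <- sqrt(xbar * (1 - xbar) * sum(w^2)) / sum(w)  # treats the w_i as fixed
  list(estimate = xbar, se = se)
}

# Illustrative data (assumed distributions, not taken from the question)
set.seed(1)
n <- 500
w <- rexp(n) + 0.1        # strictly positive weights
x <- rbinom(n, 1, 0.3)    # the unobserved 0/1 values
res <- naive_se_from_sum(sum(w * x), w)
print(res)
```

With equal weights the naive formula collapses to the familiar binomial standard error $\sqrt{\bar{x}(1-\bar{x})/n}$.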

Answers:


17

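I ran into the same issue recently. The following is what I found.

Unlike a simple random sample with equal weights, there is no widely accepted definition of the standard error of the weighted mean. These days, it is straightforward to run a bootstrap, obtain the empirical distribution of the mean, and estimate the standard error from that.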
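The bootstrap idea can be sketched in R roughly as follows. This assumes the individual pairs $(x_i, w_i)$ are observed; the function name `boot_se_weighted_mean`, the number of resamples `B`, and the simulated data are illustrative choices, not something taken from the cited sources.

```r
# Bootstrap standard error of a weighted mean: resample (x, w) pairs with
# replacement and take the standard deviation of the resampled weighted means.
boot_se_weighted_mean <- function(x, w, B = 2000) {
  n <- length(x)
  means <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)  # resample indices, keeping pairs together
    weighted.mean(x[idx], w[idx])
  })
  sd(means)
}

# Illustrative data (assumed distributions)
set.seed(42)
x <- rnorm(100, mean = 5)
w <- runif(100, 0.5, 2)
boot_se_weighted_mean(x, w)
```

Resampling whole pairs, rather than the $x_i$ and $w_i$ separately, lets the variability of the weights contribute to the estimated standard error.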

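What if one wanted to use a formula to do this estimation?

The main reference is this paper by Donald F. Gatz and Luther Smith, in which 3 formula-based estimators are compared with bootstrap results. The best approximation to the bootstrap result comes from Cochran (1977):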

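$$(SEM_w)^2 = \frac{n}{(n-1)\left(\sum P_i\right)^2}\left[\sum\left(P_i X_i - \bar{P}\bar{X}_w\right)^2 - 2\bar{X}_w\sum\left(P_i - \bar{P}\right)\left(P_i X_i - \bar{P}\bar{X}_w\right) + \bar{X}_w^2\sum\left(P_i - \bar{P}\right)^2\right]$$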

The following is the corresponding R code that came from this R listserve thread.

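weighted.var.se <- function(x, w, na.rm = FALSE)
#  Computes the variance of a weighted mean following Cochran 1977 definition.
#  Returns the squared standard error; take sqrt() of the result for the SE.
{
  # Drop observations with missing x (and their weights) if requested
  if (na.rm) { i <- !is.na(x); w <- w[i]; x <- x[i] }
  n <- length(w)
  xWbar <- weighted.mean(x, w, na.rm = na.rm)  # weighted mean of x
  wbar <- mean(w)                              # mean weight
  out <- n / ((n - 1) * sum(w)^2) *
    (sum((w * x - wbar * xWbar)^2) -
       2 * xWbar * sum((w - wbar) * (w * x - wbar * xWbar)) +
       xWbar^2 * sum((w - wbar)^2))
  return(out)
}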
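One quick sanity check on the formula: with equal weights it should reduce to the usual squared standard error of the mean, $s^2/n$. The sketch below restates the formula compactly under the hypothetical name `cochran_var` so that it runs on its own; the simulated data are only illustrative.

```r
# Compact restatement of the Cochran (1977) formula (hypothetical name),
# returning the variance of the weighted mean, i.e. the squared standard error.
cochran_var <- function(x, w) {
  n  <- length(w)
  xw <- weighted.mean(x, w)
  wb <- mean(w)
  a  <- w * x - wb * xw   # P_i X_i - Pbar * Xbar_w
  b  <- w - wb            # P_i - Pbar
  n / ((n - 1) * sum(w)^2) * (sum(a^2) - 2 * xw * sum(a * b) + xw^2 * sum(b^2))
}

set.seed(123)
x <- rnorm(50)
w <- rep(1, 50)                                 # equal weights
v <- cochran_var(x, w)
c(formula = v, classical = var(x) / length(x))  # both entries should agree
sqrt(v)                                         # the standard error itself
```

Because the result is a variance, a standard error is obtained by taking its square root.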

Hope this helps!


This is pretty cool, but for my problem I don't even observe the $P_i X_i$; rather, I observe the sum $\sum_i P_i X_i$. My question is very weird because it involves some information asymmetry (a third party is reporting the sum, and trying to perhaps hide some information).
shabbychef

Gosh you're right, sorry I did not fully understand the question you posed. Suppose we boil your problem down to the simplest case where all $w_i$ are Bernoulli RVs. Then you are essentially observing the sum of a random subset of $n$ RVs. My guess is there is not a lot of information here to estimate with. So what did you end up doing for your original problem?
Ming K

@Ming-ChihKao this cochran formula is interesting but if you build a confidence interval off this when the data is not normal there is no consistent interpretation correct? How would you handle non-normal weighted average mean confidence intervals? Weighted quantiles?
user3022875

I think there is an error with the function. If you substitute w=rep(1, length(x)), then weighted.var.se(rnorm(50), rep(1, 50)) is about 0.014. I think the formula is missing a sum(w^2) in the numerator, since when P=1, the variance is 1/(n*(n-1)) * sum((x-xbar)^2). I can't check the cited article as it is behind a paywall, but I think that correction is needed. Oddly enough, Wikipedia's (different) solution becomes degenerate when all weights are equal: en.wikipedia.org/wiki/….
Max Candocia

These may work better in general: analyticalgroup.com/download/WEIGHTED_MEAN.pdf
Max Candocia

5

The variance of your estimate given the $w_i$ is

$$\frac{\sum w_i^2 \operatorname{Var}(X)}{\left(\sum w_i\right)^2} = \operatorname{Var}(X)\frac{\sum w_i^2}{\left(\sum w_i\right)^2}.$$

Because your estimate is unbiased for any $w_i$, the variance of its conditional mean is zero. Hence, the variance of your estimate is

$$\operatorname{Var}(X)\operatorname{E}\left(\frac{\sum w_i^2}{\left(\sum w_i\right)^2}\right)$$

With all the data observed, this would be easy to estimate empirically. But with only a measure of location of the $X_i$ observed, and not their spread, I don't see how it's going to be possible to get an estimate of $\operatorname{Var}(X)$ without making rather severe assumptions.
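The variance expression above can be checked by simulation. The R sketch below is only an illustration under assumed distributions (shifted exponential weights, Bernoulli(0.3) values, $n = 50$); it compares the empirical variance of the estimator across many replications with $\operatorname{Var}(X)$ times the average of $\sum w_i^2 / (\sum w_i)^2$.

```r
# Monte Carlo check of Var(estimate) = Var(X) * E( sum(w^2) / sum(w)^2 ).
# All distributions and sizes here are assumptions chosen for illustration.
set.seed(7)
n    <- 50
reps <- 20000
p    <- 0.3                      # X ~ Bernoulli(p), so Var(X) = p * (1 - p)

sim <- replicate(reps, {
  w <- rexp(n) + 0.1             # strictly positive weights, independent of x
  x <- rbinom(n, 1, p)
  c(est = sum(w * x) / sum(w), ratio = sum(w^2) / sum(w)^2)
})

empirical <- var(sim["est", ])                      # variance across replications
predicted <- p * (1 - p) * mean(sim["ratio", ])     # formula with E(.) estimated
c(empirical = empirical, predicted = predicted)
```

In practice the expectation over the weights is unknown, so the observed value of $\sum w_i^2 / (\sum w_i)^2$ would stand in for it; the difficulty noted in the answer, estimating $\operatorname{Var}(X)$ without seeing the individual $X_i$, remains.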

At least in the specific case where the $x_i$ have a Bernoulli distribution, I can estimate the variance of $x$ by $\bar{x}(1-\bar{x})$, as noted above. Even in this case, as noted in the question, I need a larger sample size than I would have expected.
shabbychef
Licensed under cc by-sa 3.0 with attribution required.