Is it possible to find the combined standard deviation?


Answers:



So, to combine these two samples into one, proceed as follows. For the individual samples:

$$s_1 = \sqrt{\frac{1}{n_1}\sum_{i=1}^{n_1}(x_i - \bar{y}_1)^2}$$

$$s_2 = \sqrt{\frac{1}{n_2}\sum_{i=1}^{n_2}(y_i - \bar{y}_2)^2}$$

where $\bar{y}_1$ and $\bar{y}_2$ are the sample means and $s_1$ and $s_2$ are the sample standard deviations.

To combine them, let $z_i$ denote the $n_1 + n_2$ pooled observations. Then:

$$s = \sqrt{\frac{1}{n_1+n_2}\sum_{i=1}^{n_1+n_2}(z_i - \bar{y})^2}$$

This is not so simple, because the new mean $\bar{y}$ differs from $\bar{y}_1$ and $\bar{y}_2$:

$$\bar{y} = \frac{1}{n_1+n_2}\sum_{i=1}^{n_1+n_2} z_i = \frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n_1+n_2}$$

The final formula is:

$$s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2 + n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2}{n_1+n_2}}$$
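As a quick sanity check of this formula, here is a minimal Python sketch. The function name `combined_pstdev` and the toy data are illustrative, not from the answer. It assumes `sd1` and `sd2` are population standard deviations ($n$ denominator, matching the definitions above) and compares the result with `statistics.pstdev` on the pooled data.

```python
import math
import statistics


def combined_pstdev(n1, mean1, sd1, n2, mean2, sd2):
    """Return (mean, sd) of the union of two disjoint samples.

    sd1 and sd2 are population SDs (n denominator); only the per-sample
    size, mean and SD are used, never the raw observations.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n  # combined mean
    total = (n1 * sd1 ** 2 + n2 * sd2 ** 2
             + n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2)
    return mean, math.sqrt(total / n)


# Illustrative data: compare against computing directly on the pooled values.
x = [2.0, 4.0, 4.0, 5.0]
y = [7.0, 9.0, 10.0]
mean, sd = combined_pstdev(len(x), statistics.fmean(x), statistics.pstdev(x),
                           len(y), statistics.fmean(y), statistics.pstdev(y))
print(mean, statistics.fmean(x + y))
print(sd, statistics.pstdev(x + y))
```

Both printed pairs should agree up to floating-point rounding.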

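For the commonly used Bessel-corrected ($n-1$ denominator) version of the standard deviation, the combined mean is the same as before, but the formula becomes: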

$$s = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2}{n_1+n_2-1}}$$
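The same check for the Bessel-corrected version, again only a sketch: `sd1` and `sd2` are assumed to be `statistics.stdev`-style ($n-1$ denominator) values, and the hypothetical helper `combined_stdev` is compared with `statistics.stdev` on the pooled data.

```python
import math
import statistics


def combined_stdev(n1, mean1, sd1, n2, mean2, sd2):
    """Bessel-corrected SD of the union of two disjoint samples.

    sd1 and sd2 must themselves be Bessel-corrected (n - 1 denominator).
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n  # same combined mean as before
    total = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
             + n1 * (mean1 - mean) ** 2 + n2 * (mean2 - mean) ** 2)
    return math.sqrt(total / (n - 1))


x = [2.0, 4.0, 4.0, 5.0]
y = [7.0, 9.0, 10.0]
sd = combined_stdev(len(x), statistics.fmean(x), statistics.stdev(x),
                    len(y), statistics.fmean(y), statistics.stdev(y))
print(sd, statistics.stdev(x + y))
```

Mixing conventions (population SDs fed into this function, or vice versa) gives a slightly wrong answer, which is the pitfall raised in the comments below.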

You can read more info here: http://en.wikipedia.org/wiki/Standard_deviation


If the OP is using the Bessel-corrected ($n-1$ denominator for the variance) version of the sample standard deviation (as almost everyone who asks here will be doing), this answer won't quite give them what they seek.
Glen_b -Reinstate Monica

In that case, this section does the trick. (Edited to link to an old Wikipedia version, since the section has been removed from the current one.)
Glen_b -Reinstate Monica

@Glen_b Good catch. Can you edit this into the answer to make it more useful then?
sashkello

I went to Wikipedia to find the proof, but unfortunately this formula is no longer there. Care to elaborate (the proof) or improve Wikipedia? :)
Rauni Lillemets



This obviously extends to K groups:

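$$s = \sqrt{\frac{\sum_{k=1}^{K}\left[(n_k-1)s_k^2 + n_k(\bar{y}_k - \bar{y})^2\right]}{\left(\sum_{k=1}^{K} n_k\right) - 1}}$$

where $\bar{y} = \sum_{k} n_k\bar{y}_k \big/ \sum_{k} n_k$ is the combined mean.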
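A sketch of the $K$-group version in Python, assuming each group's SD is Bessel-corrected; `combined_stdev_k` and the example groups are illustrative only.

```python
import math
import statistics


def combined_stdev_k(sizes, means, sds):
    """Bessel-corrected SD of the union of K disjoint groups, computed from
    per-group sizes, means and Bessel-corrected SDs."""
    n = sum(sizes)
    grand_mean = sum(nk * mk for nk, mk in zip(sizes, means)) / n
    total = sum((nk - 1) * sk ** 2 + nk * (mk - grand_mean) ** 2
                for nk, mk, sk in zip(sizes, means, sds))
    return math.sqrt(total / (n - 1))


groups = [[1.0, 2.0, 3.0], [10.0, 12.0], [5.0, 5.5, 6.0, 7.0]]
sd = combined_stdev_k([len(g) for g in groups],
                      [statistics.fmean(g) for g in groups],
                      [statistics.stdev(g) for g in groups])
pooled = [v for g in groups for v in g]
print(sd, statistics.stdev(pooled))
```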

This is a bit brief by our standards. Could you say a bit more about how this is derived and why this is the correct answer?
Sycorax says Reinstate Monica


I had the same problem: given the standard deviations, means and sizes of several disjoint subsets, compute the standard deviation of their union.

I like the answer by sashkello and Glen_b, but I wanted to find a proof of it. I worked one out as follows and am leaving it here in case it helps anybody.


So the aim is to see that indeed:

$$s = \left(\frac{n_1 s_1^2 + n_2 s_2^2 + n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2}{n_1+n_2}\right)^{1/2}$$

Step by step:

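$$
\begin{aligned}
&\left(\frac{n_1 s_1^2 + n_2 s_2^2 + n_1(\bar{y}_1-\bar{y})^2 + n_2(\bar{y}_2-\bar{y})^2}{n_1+n_2}\right)^{1/2} \\
&= \left(\frac{\sum_{i=1}^{n_1}(x_i-\bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_i-\bar{y}_2)^2 + n_1(\bar{y}_1-\bar{y})^2 + n_2(\bar{y}_2-\bar{y})^2}{n_1+n_2}\right)^{1/2} \\
&= \left(\frac{\sum_{i=1}^{n_1}\left((x_i-\bar{y}_1)^2 + (\bar{y}_1-\bar{y})^2\right) + \sum_{i=1}^{n_2}\left((y_i-\bar{y}_2)^2 + (\bar{y}_2-\bar{y})^2\right)}{n_1+n_2}\right)^{1/2} \\
&= \left(\frac{\sum_{i=1}^{n_1}\left(x_i^2 + \bar{y}^2 + 2\bar{y}_1^2 - 2x_i\bar{y}_1 - 2\bar{y}_1\bar{y}\right)}{n_1+n_2} + \frac{\sum_{i=1}^{n_2}\left(y_i^2 + \bar{y}^2 + 2\bar{y}_2^2 - 2y_i\bar{y}_2 - 2\bar{y}_2\bar{y}\right)}{n_1+n_2}\right)^{1/2} \\
&= \left(\frac{\sum_{i=1}^{n_1}\left(x_i^2 + \bar{y}^2 - 2\bar{y}\frac{\sum_{j=1}^{n_1}x_j}{n_1}\right) + 2n_1\bar{y}_1^2 - 2\bar{y}_1\sum_{i=1}^{n_1}x_i}{n_1+n_2} + \frac{\sum_{i=1}^{n_2}\left(y_i^2 + \bar{y}^2 - 2\bar{y}\frac{\sum_{j=1}^{n_2}y_j}{n_2}\right) + 2n_2\bar{y}_2^2 - 2\bar{y}_2\sum_{i=1}^{n_2}y_i}{n_1+n_2}\right)^{1/2} \\
&= \left(\frac{\sum_{i=1}^{n_1}\left(x_i^2 + \bar{y}^2 - 2\bar{y}\frac{\sum_{j=1}^{n_1}x_j}{n_1}\right) + 2n_1\bar{y}_1^2 - 2\bar{y}_1 n_1\bar{y}_1}{n_1+n_2} + \frac{\sum_{i=1}^{n_2}\left(y_i^2 + \bar{y}^2 - 2\bar{y}\frac{\sum_{j=1}^{n_2}y_j}{n_2}\right) + 2n_2\bar{y}_2^2 - 2\bar{y}_2 n_2\bar{y}_2}{n_1+n_2}\right)^{1/2} \\
&= \left(\frac{\sum_{i=1}^{n_1}\left(x_i^2 + \bar{y}^2 - 2\bar{y}\frac{\sum_{j=1}^{n_1}x_j}{n_1}\right)}{n_1+n_2} + \frac{\sum_{i=1}^{n_2}\left(y_i^2 + \bar{y}^2 - 2\bar{y}\frac{\sum_{j=1}^{n_2}y_j}{n_2}\right)}{n_1+n_2}\right)^{1/2}
\end{aligned}
$$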

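Now the trick is to realize that we can reorder the sums: since each term $2\bar{y}\frac{\sum_{j=1}^{n_1}x_j}{n_1}$ appears $n_1$ times, its total is $2\bar{y}\sum_{i=1}^{n_1}x_i$, so we can rewrite the first numerator as

$$\sum_{i=1}^{n_1}\left(x_i^2 + \bar{y}^2 - 2\bar{y}x_i\right) = \sum_{i=1}^{n_1}(x_i - \bar{y})^2$$

(the same argument applies to the $y_i$ sum),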

and hence, continuing with the equality chain:

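$$= \left(\frac{\sum_{i=1}^{n_1}(x_i-\bar{y})^2}{n_1+n_2} + \frac{\sum_{i=1}^{n_2}(y_i-\bar{y})^2}{n_1+n_2}\right)^{1/2} = \left(\frac{\sum_{i=1}^{n_1+n_2}(z_i-\bar{y})^2}{n_1+n_2}\right)^{1/2} = s$$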
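The heart of the chain is that, for each group, the within-group sum of squares plus $n_1(\bar{y}_1-\bar{y})^2$ equals that group's sum of squared deviations from the pooled mean. A small numerical check of this identity on made-up data (variable names mirror the symbols above; the data are illustrative):

```python
import math
import statistics

# Illustrative data; ybar1, ybar2 are the group means and ybar the pooled mean.
x = [2.0, 4.0, 4.0, 5.0]
y = [7.0, 9.0, 10.0]
n1, n2 = len(x), len(y)
ybar1, ybar2 = statistics.fmean(x), statistics.fmean(y)
ybar = statistics.fmean(x + y)

# With the population SD, n1 * s1**2 is the within-group sum of squares.
within_x = n1 * statistics.pstdev(x) ** 2
within_y = n2 * statistics.pstdev(y) ** 2

# Within-group SS plus n_k * (group mean - pooled mean)**2 ...
lhs_x = within_x + n1 * (ybar1 - ybar) ** 2
lhs_y = within_y + n2 * (ybar2 - ybar) ** 2
# ... should equal the group's squared deviations from the pooled mean.
rhs_x = sum((xi - ybar) ** 2 for xi in x)
rhs_y = sum((yi - ybar) ** 2 for yi in y)

print(math.isclose(lhs_x, rhs_x), math.isclose(lhs_y, rhs_y))
```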

That being said, there is probably a simpler way to do this.

The formula extends to $K$ subsets, as stated before. The proof would be by induction on the number of sets: the base case is proven above, and the induction step applies the same equality chain to the union of the first $K-1$ sets together with the $K$-th set.
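The induction argument also suggests a practical approach: fold the groups together two at a time with the two-group formula. A minimal sketch, assuming each group is summarized as a tuple `(n, mean, ss)` where `ss` is the sum of squared deviations from the group's own mean (that is, $n s^2$ for a population SD or $(n-1)s^2$ for a Bessel-corrected one); `merge` and `summary` are hypothetical helper names.

```python
import math
import statistics
from functools import reduce


def merge(a, b):
    """Merge two (n, mean, ss) summaries with the two-group formula.

    ss is the sum of squared deviations from that group's own mean.
    """
    na, ma, ssa = a
    nb, mb, ssb = b
    n = na + nb
    m = (na * ma + nb * mb) / n
    ss = ssa + ssb + na * (ma - m) ** 2 + nb * (mb - m) ** 2
    return n, m, ss


def summary(n, mean, sd):
    """Build a summary from a group's size, mean and Bessel-corrected SD."""
    return n, mean, (n - 1) * sd ** 2


groups = [[1.0, 2.0, 3.0], [10.0, 12.0], [5.0, 5.5, 6.0, 7.0]]
summaries = [summary(len(g), statistics.fmean(g), statistics.stdev(g))
             for g in groups]
n, mean, ss = reduce(merge, summaries)  # ((g1 + g2) + g3), as in the induction

pooled = [v for g in groups for v in g]
pop_sd = math.sqrt(ss / n)           # n denominator
sample_sd = math.sqrt(ss / (n - 1))  # n - 1 denominator
print(pop_sd, statistics.pstdev(pooled))
print(sample_sd, statistics.stdev(pooled))
```

Carrying `ss` rather than the SD itself means either denominator can be applied at the very end.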


I don't see how the question is clear. Are the two data sets assumed to come from the same distribution? Does the OP have the actual observations available or just the sample estimates of mean and standard deviation?
Michael R. Chernick

Yes they are assumed to come from the same distribution. Observations are not available, just the mean and standard deviation of the subsets.
iipr

Then why are you using a formula that involves the individual observations?
Michael R. Chernick

Maybe my answer is not clear. I am simply posting a mathematical proof of the formula above, which lets you compute s from the standard deviations, means and sizes of two subsets. The formula itself makes no reference to the individual observations. The proof does, but it's just a proof, and from my point of view a correct one.
iipr
Licensed under cc by-sa 3.0 with attribution required.