14

Note: SST = total sum of squares, SSE = sum of squared errors, and SSR = regression sum of squares. The equation in the title is often written as:

$$\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$$

It's a very simple question, but I'm looking for an intuitive explanation. Intuitively, it seems to me that $SST \ge SSE + SSR$ would make more sense. For example, suppose the point $x_i$ has corresponding y-value $y_i = 5$ and $\hat{y}_i = 3$, where $\hat{y}_i$ is the corresponding point on the regression line. Suppose also that the mean y-value of the dataset is $\bar{y} = 0$. Then for this particular point $i$, $SST = (5-0)^2 = 5^2 = 25$, whereas $SSE = (5-3)^2 = 2^2 = 4$ and $SSR = (3-0)^2 = 3^2 = 9$. Clearly $9 + 4 < 25$. Wouldn't this result generalize to the entire data set? I don't get it.


Answers:


15

Add and subtract:

$$\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$$

So we need to show that $\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = 0$. Write

$$\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = \sum_{i=1}^n (y_i - \hat{y}_i)\hat{y}_i - \bar{y}\sum_{i=1}^n (y_i - \hat{y}_i)$$

So, (a) the residuals $e_i = y_i - \hat{y}_i$ need to be orthogonal to the fitted values, $\sum_{i=1}^n (y_i - \hat{y}_i)\hat{y}_i = 0$, and (b) the sum of the fitted values needs to be equal to the sum of the dependent variable, $\sum_{i=1}^n y_i = \sum_{i=1}^n \hat{y}_i$.

Actually, I think (a) is easier to show in matrix notation for general multiple regression of which the single variable case is a special case:

$$e'X\hat{\beta} = (y - X\hat{\beta})'X\hat{\beta} = \left(y - X(X'X)^{-1}X'y\right)'X\hat{\beta} = y'\left(X - X(X'X)^{-1}X'X\right)\hat{\beta} = y'(X - X)\hat{\beta} = 0$$
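If you want to see (a) numerically, here is a minimal numpy sketch (made-up data, ordinary least squares with a constant column; not part of any particular library's API) that checks the residuals are orthogonal to the fitted values:

```python
import numpy as np

# Made-up data for illustration only
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with a constant
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS: (X'X)^{-1} X'y
fitted = X @ beta_hat
resid = y - fitted

print(resid @ fitted)  # e'X beta_hat: ~0 up to floating-point error
```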
As for (b), the derivative of the OLS criterion function with respect to the constant (so you need one in the regression for this to be true!), aka the normal equation, is
$$\frac{\partial}{\partial \hat{\alpha}} \sum_i (y_i - \hat{\alpha} - \hat{\beta}x_i)^2 = -2\sum_i (y_i - \hat{\alpha} - \hat{\beta}x_i) = 0,$$

which can be rearranged to

$$\sum_i y_i = n\hat{\alpha} + \hat{\beta}\sum_i x_i$$

The right hand side of this equation evidently also equals $\sum_{i=1}^n \hat{y}_i$, as $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$.
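And (b) can be checked numerically in the same spirit: with an intercept in the model, the fitted values sum to the same total as $y$. A small sketch (made-up data; np.polyfit is just one convenient way to get a least-squares line):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 0.5 - 1.5 * x + rng.normal(size=40)

# np.polyfit(x, y, 1) returns [slope, intercept] of the least-squares line
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

print(y.sum(), y_hat.sum())  # the two sums agree up to rounding
```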

3

(1) Intuition for why SST=SSR+SSE

When we try to explain the total variation in Y (SST) with one explanatory variable, X, then there are exactly two sources of variability: the variability captured by X (the regression sum of squares, SSR) and the variability not captured by X (the error sum of squares, SSE). Hence, $SST = SSR + SSE$ (exact equality).
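As a quick check (a sketch only, with arbitrary simulated data and np.polyfit standing in for any OLS routine), you can compute the three sums of squares and watch the equality hold to machine precision:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=30)
y = 3.0 + 0.7 * x + rng.normal(scale=2.0, size=30)

slope, intercept = np.polyfit(x, y, 1)  # least-squares line with intercept
y_hat = intercept + slope * x
y_bar = y.mean()

SST = np.sum((y - y_bar) ** 2)
SSE = np.sum((y - y_hat) ** 2)
SSR = np.sum((y_hat - y_bar) ** 2)

print(SST, SSE + SSR)  # equal up to floating-point error
```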

(2) Geometric intuition

Please see the first few pictures here (especially the third): https://sites.google.com/site/modernprogramevaluation/variance-and-bias

Some of the total variation in the data (the distance from a data point to $\bar{Y}$) is captured by the regression line (the distance from the regression line to $\bar{Y}$), and the rest is error (the distance from the point to the regression line). There is no room left for SST to be greater than SSE + SSR.

(3) The problem with your illustration

You can't look at SSE and SSR in a pointwise fashion. For a particular point, the residual may be large, so that there is more error than explanatory power from X. However, for other points, the residual will be small, so that the regression line explains a lot of the variability. They will balance out and ultimately SST=SSR+SSE. Of course this is not rigorous, but you can find proofs like the above.
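To illustrate the balancing (again just a sketch with made-up data and np.polyfit), you can compare the pointwise pieces with the totals: for some points $SSE_i + SSR_i$ falls short of $SST_i$, as in your example, while for others it overshoots, and the sums agree exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=20)
y = 2.0 * x + rng.normal(size=20)

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
y_bar = y.mean()

sst_i = (y - y_bar) ** 2        # pointwise "total" pieces
sse_i = (y - y_hat) ** 2        # pointwise error pieces
ssr_i = (y_hat - y_bar) ** 2    # pointwise regression pieces

print(np.any(sse_i + ssr_i < sst_i))       # some points behave like the one in the question
print(np.any(sse_i + ssr_i > sst_i))       # others compensate in the opposite direction
print(sst_i.sum(), (sse_i + ssr_i).sum())  # but the totals match exactly
```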

Also notice that regression is not defined for a single point: $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$, and you can see that the denominator would be zero, making estimation undefined.

Hope this helps.

--Ryan M.


1

When an intercept is included in linear regression (so that the sum of the residuals is zero), $SST = SSE + SSR$.

Proof:

$$SST = \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = SSE + SSR + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$$
We just need to show that the last term is equal to 0:

$$\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)(\beta_0 + \beta_1 x_i - \bar{y}) = (\beta_0 - \bar{y})\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) + \beta_1 \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)x_i$$
In least-squares regression, the sum of the squared errors is minimized:

$$SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$$

Take the partial derivative of SSE with respect to $\beta_0$ and set it to zero:

$$\frac{\partial SSE}{\partial \beta_0} = -2\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0$$
So
$$\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0$$
Take the partial derivative of SSE with respect to $\beta_1$ and set it to zero:

$$\frac{\partial SSE}{\partial \beta_1} = -2\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)x_i = 0$$
So
$$\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)x_i = 0$$
Hence,
$$\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = (\beta_0 - \bar{y})\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) + \beta_1\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)x_i = 0$$

and therefore

$$SST = SSE + SSR + 2\sum_{i=1}^n (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = SSE + SSR$$
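A short numerical check of that cross term (a sketch only, using made-up data and np.polyfit, which fits an intercept) shows it is zero up to rounding, so the identity holds:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=25)
y = -1.0 + 0.5 * x + rng.normal(size=25)

b1, b0 = np.polyfit(x, y, 1)   # slope, intercept
y_hat = b0 + b1 * x

cross = np.sum((y - y_hat) * (y_hat - y.mean()))
print(cross)                   # ~0, hence SST = SSE + SSR
```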

