Let's say that you choose some linear combination of these variables, e.g. $A + 2B + 5C$.
This question can be understood in two different ways, leading to two different answers.
A linear combination corresponds to a vector, which in your example is $(1, 2, 5, 0, 0, 0)$. This vector, in turn, defines an axis in the 6D space of the original variables. What you are asking is, how much variance does projection on this axis "describe"? The answer is given via the notion of "reconstruction" of original data from this projection, and measuring the reconstruction error (see Wikipedia on Fraction of variance unexplained). Turns out, this reconstruction can be reasonably done in two different ways, yielding two different answers.
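To make "projection on this axis" concrete, here is a minimal numpy sketch; the dataset itself is made up purely for illustration, and the coefficient vector is normalized to unit length, as assumed throughout below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered dataset: n = 100 samples, d = 6 variables (A..F);
# the data are made up purely for illustration.
X = rng.standard_normal((100, 6))
X -= X.mean(axis=0)

# The linear combination A + 2B + 5C corresponds to this coefficient vector;
# normalizing it gives the unit vector defining an axis in the 6-D space.
w = np.array([1.0, 2.0, 5.0, 0.0, 0.0, 0.0])
w /= np.linalg.norm(w)

projection = X @ w        # value of the (normalized) linear combination for each sample
print(projection.shape)   # (100,)
```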
Approach #1
Let $X$ be the centered dataset ($n$ rows corresponding to samples, $d$ columns corresponding to variables), let $\Sigma$ be its covariance matrix, and let $w$ be a unit vector from $\mathbb R^d$. The total variance of the dataset is the sum of all $d$ variances, i.e. the trace of the covariance matrix: $T=\operatorname{tr}(\Sigma)$. The question: what proportion of $T$ does $w$ "describe"? The two answers by @todddeluca and @probabilityislogic both amount to the following: take the projection $Xw$, compute its variance and divide by $T$:
$$R^2_\mathrm{first}=\frac{\operatorname{Var}(Xw)}{T}=\frac{w^\top\Sigma w}{\operatorname{tr}(\Sigma)}.$$
This might not be immediately obvious, because e.g. @probabilityislogic suggests to consider the reconstruction $Xww^\top$ and then to compute
$$\frac{\|X\|^2-\|X-Xww^\top\|^2}{\|X\|^2},$$
but with a little algebra this can be shown to be an equivalent expression.
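Here is a quick numerical check of that equivalence; the dataset and the unit vector $w$ below are arbitrary and generated only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary centered dataset and arbitrary unit vector w, just for the check.
X = rng.standard_normal((200, 4))
X -= X.mean(axis=0)
w = rng.standard_normal(4)
w /= np.linalg.norm(w)

Sigma = np.cov(X, rowvar=False)

# Approach #1 written directly as w' Sigma w / tr(Sigma) ...
r2_direct = (w @ Sigma @ w) / np.trace(Sigma)

# ... and via the reconstruction X w w' suggested by @probabilityislogic.
reconstruction = np.outer(X @ w, w)
r2_reconstruction = (np.sum(X**2) - np.sum((X - reconstruction)**2)) / np.sum(X**2)

print(np.isclose(r2_direct, r2_reconstruction))   # True
```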
Approach #2
Okay. Now consider the following example: $X$ is a $d=2$ dataset with covariance matrix
$$\Sigma=\begin{pmatrix}1 & 0.99\\ 0.99 & 1\end{pmatrix}$$
and $w=(1,0)^\top$ is simply the $x$-axis vector:
The total variance is $T=2$. The variance of the projection onto $w$ (shown as red dots) is equal to $1$. So according to the above logic, the explained variance is equal to $1/2$. And in some sense it is: the red dots ("reconstruction") are far away from the corresponding blue dots, so a lot of the variance is "lost".
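These numbers follow directly from the covariance matrix of the example; a short sketch (working with $\Sigma$ itself rather than with sampled data):

```python
import numpy as np

Sigma = np.array([[1.0, 0.99],
                  [0.99, 1.0]])
w = np.array([1.0, 0.0])     # projection onto the first coordinate axis

T = np.trace(Sigma)          # total variance: 2
var_proj = w @ Sigma @ w     # variance of the projection: 1

print(T, var_proj, var_proj / T)   # 2.0 1.0 0.5
```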
On the other hand, the two variables have $0.99$ correlation and so are almost identical; saying that one of them describes only $50\%$ of the total variance is weird, because each of them contains "almost all the information" about the other one. We can formalize it as follows: given the projection $Xw$, find the best possible reconstruction $Xwv^\top$ with $v$ not necessarily the same as $w$, and then compute the reconstruction error and plug it into the expression for the proportion of explained variance:
$$R^2_\mathrm{second}=\frac{\|X\|^2-\|X-Xwv^\top\|^2}{\|X\|^2},$$
where $v$ is chosen such that $\|X-Xwv^\top\|^2$ is minimal (i.e. $R^2$ is maximal). This is exactly equivalent to computing the $R^2$ of multivariate regression predicting the original dataset $X$ from the $1$-dimensional projection $Xw$.
It is a matter of straightforward algebra to use the regression solution for $v$ to find that the whole expression simplifies to
$$R^2_\mathrm{second}=\frac{\|\Sigma w\|^2}{w^\top\Sigma w\cdot\operatorname{tr}(\Sigma)}.$$
In the example above this is equal to $0.9901$, which seems reasonable.
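A quick check of this value with the example covariance matrix:

```python
import numpy as np

Sigma = np.array([[1.0, 0.99],
                  [0.99, 1.0]])
w = np.array([1.0, 0.0])

r2_second = np.sum((Sigma @ w) ** 2) / ((w @ Sigma @ w) * np.trace(Sigma))
print(r2_second)   # 0.99005, i.e. ~0.9901
```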
Note that if (and only if) $w$ is one of the eigenvectors of $\Sigma$, i.e. one of the principal axes, with eigenvalue $\lambda$ (so that $\Sigma w=\lambda w$), then both approaches to computing $R^2$ coincide and reduce to the familiar PCA expression
$$R^2_\mathrm{PCA}=R^2_\mathrm{first}=R^2_\mathrm{second}=\lambda/\operatorname{tr}(\Sigma)=\lambda\Big/\sum\lambda_i.$$
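A small sketch confirming this coincidence numerically, using the example covariance matrix from above and its leading eigenvector:

```python
import numpy as np

Sigma = np.array([[1.0, 0.99],
                  [0.99, 1.0]])

# Take w to be the leading eigenvector (a principal axis) of Sigma.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
lam, w = eigenvalues[-1], eigenvectors[:, -1]

r2_first = (w @ Sigma @ w) / np.trace(Sigma)
r2_second = np.sum((Sigma @ w) ** 2) / ((w @ Sigma @ w) * np.trace(Sigma))
r2_pca = lam / np.trace(Sigma)

print(r2_first, r2_second, r2_pca)   # all equal: 1.99 / 2 = 0.995
```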
PS. See my answer here for an application of the derived formula to the special case of $w$ being one of the basis vectors: Variance of the data explained by a single variable.
Appendix. Derivation of the formula for $R^2_\mathrm{second}$
Finding $v$ minimizing the reconstruction error $\|X-Xwv^\top\|^2$ is a regression problem (with $Xw$ as univariate predictor and $X$ as multivariate response). Its solution is given by
$$v^\top=\big((Xw)^\top (Xw)\big)^{-1}(Xw)^\top X=(w^\top\Sigma w)^{-1}w^\top\Sigma.$$
Next, the $R^2$ formula can be simplified as
$$R^2=\frac{\|X\|^2-\|X-Xwv^\top\|^2}{\|X\|^2}=\frac{\|Xwv^\top\|^2}{\|X\|^2}$$
due to the Pythagoras theorem, because the hat matrix in regression is an orthogonal projection (but it is also easy to show directly).
Plugging in the equation for $v$, we obtain for the numerator (taking $\Sigma=X^\top X$ for convenience; the constant normalization factor cancels in the final ratio anyway):
$$\|Xwv^\top\|^2=\operatorname{tr}\big(Xwv^\top(Xwv^\top)^\top\big)=\operatorname{tr}(Xww^\top\Sigma\,\Sigma ww^\top X^\top)/(w^\top\Sigma w)^2=\operatorname{tr}(w^\top\Sigma\,\Sigma w)/(w^\top\Sigma w)=\|\Sigma w\|^2/(w^\top\Sigma w).$$
The denominator is equal to $\|X\|^2=\operatorname{tr}(\Sigma)$, resulting in the formula given above.
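As a sanity check of this derivation, one can compare the closed-form expression against an explicit regression on simulated data; the data below are arbitrary, and only the agreement of the two numbers matters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary centered dataset and arbitrary unit vector w, just for the check.
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
X -= X.mean(axis=0)
w = rng.standard_normal(3)
w /= np.linalg.norm(w)

# Explicit regression of X on the univariate predictor Xw:
Xw = X @ w
v = (Xw @ X) / (Xw @ Xw)              # regression solution v' = ((Xw)'Xw)^{-1} (Xw)'X
r2_regression = 1 - np.sum((X - np.outer(Xw, v)) ** 2) / np.sum(X ** 2)

# Closed-form expression derived above (the scale of Sigma cancels in the ratio):
Sigma = np.cov(X, rowvar=False)
r2_formula = np.sum((Sigma @ w) ** 2) / ((w @ Sigma @ w) * np.trace(Sigma))

print(np.isclose(r2_regression, r2_formula))   # True
```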