Reversing ridge regression: given a response matrix and regression coefficients, find suitable predictors

Consider a standard OLS regression problem$\newcommand{\Y}{\mathbf Y}\newcommand{\X}{\mathbf X}\newcommand{\B}{\boldsymbol\beta}\DeclareMathOperator*{argmin}{argmin}$: I have matrices $\Y$ and $\X$ and I want to find $\B$ to minimize

$L=\|\Y-\X\B\|^2.$

The solution is given by

$\hat\B=\argmin_\B\{L\} = (\X^\top\X)^+\X^\top \Y.$

I can also pose a "reverse" problem: given $\Y$ and $\B^*$, find $\hat\X$ that would yield $\hat\B\approx \B^*$, i.e. would minimize $\|\argmin_\B\{L\}-\B^*\|^2$. In words, I have the response matrix $\Y$ and the coefficient vector $\B^*$, and I want to find the predictor matrix that would yield coefficients close to $\B^*$. This is, of course, also an OLS regression problem, with solution

$\hat\X = \argmin_\X\Big\{\|\argmin_\B\{L\}-\B^*\|^2\Big\} = \Y\B^\top(\B\B^\top)^{+}.$

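Clarification update: As @GeoMatt22 explained in his answer, if $\Y$ is a vector (i.e. if there is only one response variable), then this $\hat \X$ will be rank one, and the reverse problem is massively under-determined. In my case, $\Y$ is actually a matrix (i.e. there are many response variables; it is a multivariate regression). So $\X$ is $n\times p$, $\Y$ is $n\times q$, and $\B$ is $p\times q$.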
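The rank-one claim for a single response can be checked numerically. The following is a minimal sketch, assuming random data and illustrative sizes; the names `y`, `b`, and `Xhat` are hypothetical and not taken from the original post.

```matlab
% Sketch with a single response variable (q = 1); sizes are illustrative assumptions
n = 10; p = 20;
y = rand(n,1);                        % response vector
b = rand(p,1);                        % target coefficient vector (plays the role of beta*)
Xhat = y * pinv(b);                   % same as y*b'/(b'*b), i.e. an outer product
r = rank(Xhat)                        % an outer product of nonzero vectors has rank 1
recovered = norm(pinv(Xhat)*y - b)    % minimum-norm OLS on Xhat returns b up to rounding
```

The reconstructed predictor matrix has a single nonzero singular value, so almost nothing about an $n\times p$ matrix is pinned down by one response vector.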

I am interested in solving the "reverse" problem for ridge regression. That is, my loss function is now

$L=\|\Y-\X\B\|^2+\mu\|\B\|^2$

and the solution is

$\hat\B=\argmin_\B\{L\}=(\X^\top \X+\mu\mathbf I)^{-1}\X^\top \Y.$

The "reverse" problem is to find

$\hat\X = \argmin_\X\Big\{\|\argmin_\B\{L\}-\B^*\|^2\Big\} = \;?$

Again, I have a response matrix $\Y$ and a coefficient vector $\B^*$, and I want to find a predictor matrix that would yield coefficients close to $\B^*$.

In fact, there are two related formulations:

1. Find $\hat\X$ given $\Y$, $\B^*$, and $\mu$.
2. Find $\hat\X$ and $\hat \mu$ given $\Y$ and $\B^*$.

Does either of them have a direct solution?

Here is a brief Matlab excerpt to illustrate the problem:

% generate some data
n = 10; % number of samples
p = 20; % number of predictors
q = 30; % number of responses
Y = rand(n,q);
X = rand(n,p);
mu = 0;
I = eye(p);

% solve the forward problem: find beta given y,X,mu
betahat = pinv(X'*X + mu*I) * X'*Y;

% backward problem: find X given y,beta,mu
% this formula works correctly only when mu=0
Xhat =  Y*betahat'*pinv(betahat*betahat');

% verify if Xhat indeed yields betahat
betahathat = pinv(Xhat'*Xhat + mu*I)*Xhat'*Y;
max(abs(betahathat(:) - betahat(:)))

This code outputs zero if mu=0, but not otherwise.

regression least-squares ridge-regression

— amoeba says Reinstate Monica

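Because $B$ and $\mu$ are given, they do not affect the variations in the loss. Therefore in (1) you are still doing OLS. (2) is equally simple, because the loss can be made arbitrarily small by taking $\hat\mu$ arbitrarily negative, within the limits of whatever constraints you impose on it. That reduces it to case (1).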

— whuber

@whuber Thanks. I think I did not explain it clearly enough. Consider (1). $B$ and $\mu$ are given (let's call it $B^*$), but I need to find $X$ that would yield ridge regression coefficients close to $B^*$. In other words, I want to find $X$ minimizing

$\Big\|\operatorname*{argmin}_B\big\{ L_\mathrm{ridge}(X,B)\big\} - B^*\Big\|^2.$

I don't see why this should be OLS.

— amoeba says Reinstate Monica

It's like I have $f(v,w)$ and I want to find $v$ such that $\operatorname{argmin}_w f(v,w)$ is close to a given $w^*$. That is not the same as finding $\operatorname{argmin}_v f(v,w^*)$.

— amoeba says Reinstate Monica

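The exposition in your post is confusing on this point, because you are evidently not actually using $L$ as a loss function. Could you elaborate on the specifics of problems (1) and (2) in the post?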

— whuber

@hxd1011 Many columns in X is usually called "multiple regression"; many columns in Y is usually called "multivariate regression".

— amoeba says Reinstate Monica

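Now that the question has converged on a more precise formulation of the problem of interest, I have found a solution for case 1 (known ridge parameter). This should also help for case 2 (not exactly an analytical solution, but a simple formula and some constraints).

Summary: Neither of the two inverse problem formulations has a unique answer. In case 2, where the ridge parameter $\mu\equiv\omega^2$ is unknown, there are infinitely many solutions $X_\omega$, for $\omega\in[0,\omega_\max]$. In case 1, where $\omega$ is given, there is a finite number of solutions for $X_\omega$, due to an ambiguity in the singular-value spectrum.

(The derivation is a bit lengthy, so TL;DR: there is working Matlab code at the end.)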

Under-determined case ("OLS")

The forward problem is

$\min_B\|XB-Y\|^2$

where $X\in\mathbb{R}^{n\times p}$, $B\in\mathbb{R}^{p\times q}$, and $Y\in\mathbb{R}^{n\times q}$.

Based on the updated question, we will assume $n<p<q$, so that $B$ is under-determined given $X$ and $Y$. As in the question, we will assume the "default" (minimum $L_2$-norm) solution

$B=X^+Y$

where $X^+$ is the pseudo-inverse of $X$.

From the singular value decomposition (SVD) of $X$, given by*

$X=USV^T=US_0V_0^T$

the pseudo-inverse can be computed as**

$X^+=VS^+U^T=V_0S_0^{-1}U^T$

(* The first expression uses the full SVD, while the second uses the reduced SVD. ** For simplicity I assume $X$ has full rank, i.e. $S_0^{-1}$ exists.)

So the forward problem has the solution

$B\equiv X^+Y=\left(V_0S_0^{-1}U^T\right)Y$

For future reference, note that $S_0=\mathrm{diag}(\sigma_0)$, where $\sigma_0>0$ is the vector of singular values.

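In the inverse problem, we are given $Y$ and $B$. We know that $B$ came from the above process, but we do not know $X$. The task is then to determine the appropriate $X$.

As noted in the updated question, in this case $X$ can be recovered using essentially the same approach, i.e.

$X_0=YB^+$

now using the pseudo-inverse of $B$.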
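As a consistency check (a standard pseudo-inverse identity, added here as a side note rather than taken from the answer), this expression agrees with the formula given in the question:

$B^+ = B^T\left(BB^T\right)^+ \quad\Longrightarrow\quad X_0 = YB^+ = YB^T\left(BB^T\right)^+$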

Over-determined case (Ridge estimator)

In the "OLS" case, the under-determined problem was solved by choosing the minimum-norm solution, i.e. the "unique" solution was implicitly regularized.

Rather than choosing the minimum-norm solution, we now introduce a parameter $\omega$ to control how small the norm should be, i.e. we use ridge regression.

In this case we have a series of forward problems for $\beta_k$, $k=1,\ldots,q$, given by

$\min_\beta\|X\beta-y_k\|^2+\omega^2\|\beta\|^2$

Collecting the solution and right-hand-side vectors into

$B_{\omega}=[\beta_1,\ldots,\beta_q] \quad,\quad Y=[y_1,\ldots,y_q]$

this collection of problems reduces to the "OLS" problem

$\min_B\|\mathsf{X}_\omega B-\mathsf{Y}\|^2$

where we have introduced the augmented matrices

$\mathsf{X}_\omega=\begin{bmatrix}X \\ \omega I\end{bmatrix} \quad , \quad \mathsf{Y}=\begin{bmatrix}Y \\ 0 \end{bmatrix}$
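The equivalence between the ridge closed form and the augmented least-squares problem can be checked numerically. The following is a minimal sketch, assuming random data, illustrative sizes, and a hypothetical value `w = 0.7` for $\omega$ (so $\mu=\omega^2$); none of these values come from the original post.

```matlab
% Sketch: ridge closed form vs. least squares on the stacked (augmented) system
n = 3; p = 5; q = 8; w = 0.7;                  % illustrative sizes, w = omega = sqrt(mu)
X = rand(n,p); Y = rand(n,q);
B_closed = (X'*X + w^2*eye(p)) \ (X'*Y);       % (X'X + mu*I)^{-1} X'Y
B_aug    = [X; w*eye(p)] \ [Y; zeros(p,q)];    % least squares with [X; w*I] and [Y; 0]
gap = norm(B_closed - B_aug, 'fro')            % difference between the two solutions
```

With $\omega>0$ the stacked matrix has full column rank, so the least-squares solution is unique and the backslash operator can be used without the minimum-norm caveat.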

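In this over-determined case the solution is still given by the pseudo-inverse

$B_\omega = \mathsf{X}_\omega^+\mathsf{Y}$

but the pseudo-inverse has now changed, resulting in*

$B_\omega = \left(V_0S_\omega^{-2}U^T\right) Y$

where the new "singularity spectrum" matrix has (inverse) diagonal**

$\sigma_\omega^2 = \frac{\sigma_0^2+\omega^2}{\sigma_0}$

(* The derivation is omitted here; it parallels the SVD treatment of ridge regression for the $p\leq n$ case. ** The entries of the vector $\sigma_\omega$ are expressed in terms of the vector $\sigma_0$, with all operations entry-wise.)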
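One way to sketch the step behind the modified spectrum, assuming $\omega>0$ and that $X$ has full row rank $n$ (so $U$ is square and orthogonal and $V_0^TV_0=I$). This is a standard route and not necessarily the one the answer had in mind:

$(X^TX+\omega^2I)^{-1}X^T = X^T(XX^T+\omega^2I)^{-1} \quad\text{since}\quad X^T(XX^T+\omega^2I) = (X^TX+\omega^2I)X^T$

$XX^T+\omega^2I = U\left(S_0^2+\omega^2I\right)U^T$

$X^T(XX^T+\omega^2I)^{-1} = V_0S_0U^T\,U\left(S_0^2+\omega^2I\right)^{-1}U^T = V_0\,S_0\left(S_0^2+\omega^2I\right)^{-1}U^T$

Hence $S_\omega^{-2}=S_0\left(S_0^2+\omega^2I\right)^{-1}$, whose diagonal entries are $\sigma_0/(\sigma_0^2+\omega^2)$, i.e. the reciprocals of $\sigma_\omega^2$.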

In this problem we can still formally recover a "base solution" as

$X_\omega=YB_\omega^+$

but this is no longer a true solution.

However, the analogy still holds in that this "solution" has the SVD

$X_\omega=US_\omega^2V_0^T$

with the singular values $\sigma_\omega^2$ given above.

So we can derive a quadratic equation relating the desired singular values $\sigma_0$ to the recoverable singular values $\sigma_\omega^2$ and the regularization parameter $\omega$. The solution is then

$\sigma_0=\bar{\sigma} \pm \Delta\sigma \quad , \quad \bar{\sigma} = \tfrac{1}{2}\sigma_\omega^2 \quad , \quad \Delta\sigma = \sqrt{\left(\bar{\sigma}+\omega\right)\left(\bar{\sigma}-\omega\right)}$
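To make the quadratic explicit (a worked restatement with illustrative numbers, not part of the original derivation): multiplying $\sigma_\omega^2=(\sigma_0^2+\omega^2)/\sigma_0$ by $\sigma_0$ gives

$\sigma_0^2-\sigma_\omega^2\,\sigma_0+\omega^2=0 \quad\Longrightarrow\quad \sigma_0=\tfrac{1}{2}\sigma_\omega^2\pm\sqrt{\tfrac{1}{4}\sigma_\omega^4-\omega^2}$

and the two roots multiply to $\omega^2$. For example, with $\sigma_0=2$ and $\omega=1$:

$\sigma_\omega^2=\tfrac{5}{2} \quad,\quad \bar{\sigma}=\tfrac{5}{4} \quad,\quad \Delta\sigma=\sqrt{\tfrac{9}{4}\cdot\tfrac{1}{4}}=\tfrac{3}{4} \quad,\quad \bar{\sigma}\pm\Delta\sigma\in\left\{2,\tfrac{1}{2}\right\}$

Both $2$ and $\tfrac{1}{2}=\omega^2/2$ map to the same $\sigma_\omega^2$, which is where the sign ambiguity comes from.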

The Matlab demo below (tested online via Octave) shows that this solution method appears to work in practice as well as theory. The last line shows that all the singular values of $X$ are in the reconstruction $\bar{\sigma}\pm\Delta\sigma$ , but I have not completely figured out which root to take (sgn = $+$ vs. $-$ ). For $\omega=0$ it will always be the $+$ root. This generally seems to hold for "small" $\omega$ , whereas for "large" $\omega$ the $-$ root seems to take over. (Demo below is set to "large" case currently.)

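% Matlab demo of "Reverse Ridge Regression"
n = 3; p = 5; q = 8; w = 1*sqrt(1e+1); sgn = -1; % w = omega = sqrt(mu); sgn selects the quadratic root
Y = rand(n,q); X = rand(n,p);
I = eye(p); Z = zeros(p,q);
err = @(a,b)norm(a(:)-b(:),Inf); % largest absolute entry-wise difference

% forward ridge problem, solved via the augmented least-squares system
B = pinv([X;w*I])*[Y;Z];
% "base solution" Y*B^+ (a true solution only when w = 0)
Xhat0 = Y*pinv(B);
dBres0 = err( pinv([Xhat0;w*I])*[Y;Z] , B )

% singular values of Xhat0 are sigma_omega^2 = (sigma_0.^2 + w^2)./sigma_0
[Uw,Sw2,Vw0] = svd(Xhat0, 'econ');

% solve sigma_0^2 - sigma_omega^2*sigma_0 + w^2 = 0 entry-wise
sw2 = diag(Sw2); s0mid = sw2/2;
ds0 = sqrt(max( 0 , s0mid.^2 - w^2 ));
s0 = s0mid + sgn * ds0;
Xhat = Uw*diag(s0)*Vw0';

% Xhat should reproduce B, although it need not equal X
dBres = err( pinv([Xhat;w*I])*[Y;Z] , B )
dXerr = err( Xhat , X )
sigX = svd(X)', sigHat = [s0mid+ds0,s0mid-ds0]' % all there, but which sign?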

I cannot say how robust this solution is, as inverse problems are generally ill-posed, and analytical solutions can be very fragile. However cursory experiments polluting $B$ with Gaussian noise (i.e. so it has full rank $p$ vs. reduced rank $n$ ) seem to indicate the method is reasonably well behaved.

As for problem 2 (i.e. $\omega$ unknown), the above gives at least an upper bound on $\omega$ . For the quadratic discriminant to be non-negative we must have

$\omega \leq \omega_{\max} = \bar{\sigma}_n = \min[\tfrac{1}{2}\sigma_\omega^2]$
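The bound, and the claim that every admissible $\omega$ yields a valid predictor matrix, can be explored numerically. The following is a minimal sketch, assuming random data, illustrative sizes, a hypothetical "true" ridge parameter `w_true`, and the $+$ root throughout; the variable names are not from the original demo.

```matlab
% Sketch for case 2 (omega unknown): scan candidate values up to the bound
n = 3; p = 5; q = 8; w_true = 0.5;             % illustrative sizes and hidden ridge parameter
X = rand(n,p); Y = rand(n,q);
B = [X; w_true*eye(p)] \ [Y; zeros(p,q)];      % forward ridge solution (the given data)
[Uw,Sw2,Vw0] = svd(Y*pinv(B), 'econ');         % singular values are sigma_omega^2
sbar = diag(Sw2)/2;
w_max = min(sbar)                              % upper bound on omega
res = zeros(1,5); k = 0;
for wc = linspace(0, w_max, 5)                 % candidate ridge parameters
    k = k + 1;
    s0 = sbar + sqrt(max(0, sbar.^2 - wc^2));  % "+" root of the quadratic
    Xc = Uw*diag(s0)*Vw0';                     % candidate predictor matrix
    Bc = pinv([Xc; wc*eye(p)]) * [Y; zeros(p,q)];
    res(k) = norm(Bc(:) - B(:), Inf);          % mismatch of the forward ridge solution
end
res
```

Each candidate in $[0,\omega_{\max}]$ should reproduce $B$ up to rounding, which illustrates why case 2 has infinitely many solutions; the hidden `w_true` always lies below `w_max`, because $(\sigma_0^2+\omega^2)/(2\sigma_0)\geq\omega$.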

For the quadratic-root sign ambiguity, the following code snippet shows that independent of sign, any $\hat{X}$ will give the same forward $B$ ridge-solution, even when $\sigma_0$ differs from $\mathrm{SVD}[X]$ .

Xrnd=Uw*diag(s0mid+sign(randn(n,1)).*ds0)*Vw0'; % random signs
dBrnd=err(pinv([Xrnd;w*I])*[Y;Z],B) % B is always consistent ...
dXrnd=err(Xrnd,X) % ... even when X is not

— GeoMatt22

+11. Thanks a lot for all the effort that you put into answering this question and for all the discussion that we had. This seems to answer my question entirely. I felt that simply accepting your answer is not enough in this case; this deserves much more than two upvotes that this answer currently has. Cheers.

— amoeba says Reinstate Monica

@amoeba thanks! I am glad it was helpful. I think I will post a comment on whuber's answer you link asking if he thinks it is appropriate and/or if there is a better answer to use. (Note he prefaces his SVD discussion with the proviso $p\leq n$, i.e. an over-determined $X$.)

— GeoMatt22

@GeoMatt22 my comment on original question says using pinv is not a good thing, do you agree?

— Haitao Du

@hxd1011 In general you (almost) never want to explicitly invert a matrix numerically, and this holds also for the pseudo-inverse. The two reasons I used it here are 1) consistency with the mathematical equations + amoeba's demo code, and 2) for the case of underdetermined systems, the default Matlab "slash" solutions can differ from the pinv ones. Almost all of the cases in my code could be replaced by the appropriate \ or / commands, which are generally to be preferred. (These allow Matlab to decide the most effective direct solver.)

— GeoMatt22

@hxd1011 to clarify on point 2 of my previous comment, from the link in your comment on the original question: "If the rank of A is less than the number of columns in A, then x = A\B is not necessarily the minimum norm solution. The more computationally expensive x = pinv(A)*B computes the minimum norm least-squares solution.".

— GeoMatt22
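The difference between the backslash operator and `pinv` for under-determined systems, discussed in the comments above, can be examined with a small experiment. The following is a minimal sketch, assuming random data and illustrative sizes; the exact backslash result depends on the implementation (Matlab typically returns a basic solution, while other environments such as Octave may return the minimum-norm one), so only implementation-independent properties are compared.

```matlab
% Sketch: backslash vs. pinv on an under-determined system (3 equations, 5 unknowns)
A = rand(3,5); b = rand(3,1);
x_bs = A \ b;                   % some exact solution, implementation-dependent
x_pi = pinv(A) * b;             % minimum-norm least-squares solution
res_bs = norm(A*x_bs - b)       % both residuals should be at rounding level
res_pi = norm(A*x_pi - b)
nrm = [norm(x_bs), norm(x_pi)]  % the pinv solution never has the larger norm
```

If the two norms differ, the backslash result is a valid solution but not the minimum-norm one, which is the case where the formulas in the answer require `pinv`.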