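Edit: I misunderstood your question. There are two aspects.
a) na.omit and na.exclude both perform casewise deletion with respect to predictors and criteria alike. They differ only in how extractor functions behave: with na.exclude, residuals() and fitted() pad their output with NAs for the omitted cases, so the output has the same length as the input variables.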
> N <- 20 # generate some data
> y1 <- rnorm(N, 175, 7) # criterion 1
> y2 <- rnorm(N, 30, 8) # criterion 2
> x <- 0.5*y1 - 0.3*y2 + rnorm(N, 0, 3) # predictor
> y1[c(1, 3, 5)] <- NA # some NA values
> y2[c(7, 9, 11)] <- NA # some other NA values
> Y <- cbind(y1, y2) # matrix for multivariate regression
> fitO <- lm(Y ~ x, na.action=na.omit) # fit with na.omit
> dim(residuals(fitO)) # use extractor function
[1] 14 2
> fitE <- lm(Y ~ x, na.action=na.exclude) # fit with na.exclude
> dim(residuals(fitE)) # use extractor function -> = N
[1] 20 2
> dim(fitE$residuals) # access residuals directly
[1] 14 2
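One practical consequence of the padding is that extractor output lines up row by row with the original data. The following is only a minimal sketch with a single criterion; the seed, the variable names and the data frame `dat` are assumptions made for illustration and are not part of the example above.

```r
set.seed(1)                          # assumption: seed added for reproducibility
N <- 20
x <- rnorm(N)                        # predictor
y <- 2 * x + rnorm(N)                # criterion
y[c(1, 3, 5)] <- NA                  # some NA values

fitE <- lm(y ~ x, na.action = na.exclude)
resE <- residuals(fitE)              # padded with NA, length N

naPos <- unname(which(is.na(resE)))  # positions of the omitted cases: 1 3 5
dat   <- data.frame(x, y, res = resE)  # residuals align with the original rows
```

With na.omit the last line would fail, because the residual vector would have only 17 elements.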
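b) The real issue is not this difference between na.omit and na.exclude. You don't seem to want casewise deletion that takes the criterion variables into account, which is what both of them do.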
> X <- model.matrix(fitE) # design matrix
> dim(X) # casewise deletion -> only 14 complete cases
[1] 14 2
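To see which cases are lost, one can inspect the na.action component of the fit. This is a small sketch that regenerates data of the same shape as above with an assumed seed, so the numbers differ from the transcript; `dropped` and `usable` are names introduced here for illustration.

```r
set.seed(42)                               # assumption: seed added for reproducibility
N  <- 20
y1 <- rnorm(N, 175, 7)
y2 <- rnorm(N, 30, 8)
x  <- 0.5*y1 - 0.3*y2 + rnorm(N, 0, 3)
y1[c(1, 3, 5)]  <- NA
y2[c(7, 9, 11)] <- NA
Y  <- cbind(y1, y2)

fit     <- lm(Y ~ x, na.action = na.exclude)
dropped <- as.integer(na.action(fit))      # rows removed: NA in y1 OR in y2
usable  <- colSums(!is.na(Y))              # cases available per criterion
nUsed   <- nrow(model.matrix(fit))         # cases actually used in the joint fit
```

Here each criterion has 17 usable cases on its own, but the joint fit keeps only the 14 rows that are complete in both columns.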
The regression results depend on the pseudoinverse of the design matrix $X$, $X^{+} = (X'X)^{-1}X'$ (which gives the coefficients $\hat{\beta} = X^{+}Y$), and on the hat matrix $H = XX^{+}$ (which gives the fitted values $\hat{Y} = HY$). If you don't want casewise deletion, you need a different design matrix $X$ for each column of $Y$, so there's no way around fitting separate regressions for each criterion. You can try to avoid the overhead of lm() by doing something along the lines of the following:
> Xf <- model.matrix(~ x) # full design matrix (all cases)
# function: manually calculate coefficients and fitted values for single criterion y
> getFit <- function(y) {
+     idx   <- !is.na(y)                     # keep only cases where y is observed
+     Xsvd  <- svd(Xf[idx, ])                # SVD of the reduced design matrix
+     # get X+ = (X'X)^-1 X' = V D^-2 V' X', but note: there might be better ways
+     Xplus <- tcrossprod(Xsvd$v %*% diag(Xsvd$d^(-2)) %*% t(Xsvd$v), Xf[idx, ])
+     list(coefs=(Xplus %*% y[idx]), yhat=(Xf[idx, ] %*% Xplus %*% y[idx]))
+ }
> res <- apply(Y, 2, getFit) # get fits for each column of Y
> res$y1$coefs
                   [,1]
(Intercept) 113.9398761
x             0.7601234
> res$y2$coefs
                 [,1]
(Intercept) 91.580505
x           -0.805897
> coefficients(lm(y1 ~ x)) # compare with separate results from lm()
(Intercept)           x
113.9398761   0.7601234
> coefficients(lm(y2 ~ x))
(Intercept)           x
  91.580505   -0.805897
Note that there might be numerically better ways to calculate $X^{+}$ and $H$; you could look into a QR decomposition instead. The SVD approach is explained here on SE. I have not timed the above approach with big matrices $Y$ against actually using lm().
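For what it's worth, a QR-based variant could look roughly like the sketch below. It uses base R's qr(), qr.coef() and qr.fitted(); the function name getFitQR, the seed and the regenerated data are assumptions for illustration, and I have not benchmarked it either.

```r
set.seed(123)                              # assumption: seed added for reproducibility
N  <- 20
y1 <- rnorm(N, 175, 7)
y2 <- rnorm(N, 30, 8)
x  <- 0.5*y1 - 0.3*y2 + rnorm(N, 0, 3)
y1[c(1, 3, 5)]  <- NA
y2[c(7, 9, 11)] <- NA
Y  <- cbind(y1, y2)
Xf <- model.matrix(~ x)                    # full design matrix (all cases)

# hypothetical helper: fit a single criterion y via QR, skipping its NA cases
getFitQR <- function(y) {
    idx <- !is.na(y)                       # keep only cases where y is observed
    Xqr <- qr(Xf[idx, , drop = FALSE])     # QR decomposition of the reduced X
    list(coefs = qr.coef(Xqr, y[idx]),     # least-squares coefficients
         yhat  = qr.fitted(Xqr, y[idx]))   # fitted values for the kept cases
}

resQR <- lapply(as.data.frame(Y), getFitQR)   # one fit per column of Y

# compare with separate results from lm()
all.equal(unname(resQR$y1$coefs), unname(coef(lm(y1 ~ x))))
all.equal(unname(resQR$y2$coefs), unname(coef(lm(y2 ~ x))))
```

This avoids forming $(X'X)^{-1}$ explicitly, which is the usual argument for preferring QR in least-squares problems.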