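I just came across this paper, which describes how to compute the repeatability (a.k.a. reliability, a.k.a. intraclass correlation) of a measurement via mixed-effects modelling. The R code would be: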

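#load lme4, which provides lmer() and VarCorr()
library(lme4)

#fit the model
fit = lmer(dv~(1|unit),data=my_data)

#obtain the variance estimates
#(VarCorr() elements are named after the grouping factor, here 'unit')
vc = VarCorr(fit)
residual_var = attr(vc,'sc')^2
intercept_var = attr(vc$unit,'stddev')[1]^2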

#compute the unadjusted repeatability
R = intercept_var/(intercept_var+residual_var)

#compute n0, the repeatability adjustment
n = as.data.frame(table(my_data$unit))
k = nrow(n)
N = sum(n$Freq)
n0 = (N-(sum(n$Freq^2)/N))/(k-1)

#compute the adjusted repeatability
Rn = R/(R+(1-R)/n0)
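As a quick sanity check on the adjustment, here is a minimal base-R sketch on toy data. The helper name `compute_n0`, the toy group labels, and the value `R = 0.5` are all made up for illustration and do not come from the paper. For a balanced design, `n0` should reduce to the common number of observations per unit.

```r
# Hypothetical helper (not from the paper): n0 from a vector of unit labels
compute_n0 <- function(unit) {
  counts <- as.vector(table(unit))  # observations per unit
  k <- length(counts)               # number of units
  N <- sum(counts)                  # total number of observations
  (N - sum(counts^2) / N) / (k - 1)
}

# Balanced toy data: 4 units with 5 observations each
balanced <- rep(c("a", "b", "c", "d"), each = 5)
n0_balanced <- compute_n0(balanced)

# Unbalanced toy data: 3 units with 2, 5 and 8 observations
unbalanced <- rep(c("a", "b", "c"), times = c(2, 5, 8))
n0_unbalanced <- compute_n0(unbalanced)

# Adjusted repeatability for an assumed (made-up) unadjusted value
R <- 0.5
Rn <- R / (R + (1 - R) / n0_balanced)

print(c(n0_balanced = n0_balanced, n0_unbalanced = n0_unbalanced, Rn = Rn))
```

In the unbalanced case `n0` falls below the simple mean of the group sizes, so the adjustment is slightly more conservative than plugging in the average.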

I believe this approach can also be used to compute the reliability of effects (i.e. the sum-contrast effect of a variable with 2 levels), as in:

#make sure the effect variable has sum contrasts
contrasts(my_data$iv) = contr.sum

#fit the model
fit = lmer(dv~(iv|unit)+iv,data=my_data)

#obtain the variance estimates
#(VarCorr() elements are named after the grouping factor, here 'unit')
vc = VarCorr(fit)
residual_var = attr(vc,'sc')^2
effect_var = attr(vc$unit,'stddev')[2]^2

#compute the unadjusted repeatability
R = effect_var/(effect_var+residual_var)

#compute n0, the repeatability adjustment
n = as.data.frame(table(my_data$unit,my_data$iv))
k = nrow(n)
N = sum(n$Freq)
n0 = (N-(sum(n$Freq^2)/N))/(k-1)

#compute the adjusted repeatability
Rn = R/(R+(1-R)/n0)

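Three questions:

(1) Do the above computations for obtaining point estimates of the repeatability of an effect make sense?
(2) When I have multiple variables whose repeatability I want to estimate, adding them all to the same fit (e.g. lmer(dv~(iv1+iv2|unit)+iv1+iv2)) seems to yield higher repeatability estimates than creating a separate model for each effect. This makes sense computationally to me, as inclusion of multiple effects will tend to decrease the residual variance, but I'm not positive that the resulting repeatability estimates are valid. Are they?
(3) The paper cited above suggests that likelihood profiling might help obtain confidence intervals for the repeatability estimates, but as far as I can tell, confint(profile(fit)) only provides intervals for the intercept and effect variances, whereas I would additionally need the interval for the residual variance to compute the interval for the repeatability, no?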


— Mike Lawrence

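I think I can answer your questions at least concerning the unadjusted repeatability estimates, i.e., the classical intraclass correlations (ICCs). As for the "adjusted" repeatability estimates, I skimmed the paper you linked and could not find where the formula you apply appears in it. Based on the mathematical expression, it appears to be the repeatability of mean scores rather than of individual scores. But it's not clear that this is a crucial part of your question anyway, so I will ignore it.

(1) Do the above computations for obtaining point estimates of the repeatability of an effect make sense?

Yes, the expression you propose does make sense, but a slight modification to your proposed formula is necessary. Below I show how one could derive your proposed repeatability coefficient. I hope this both clarifies the conceptual meaning of the coefficient and shows why it would be desirable to modify it slightly.

To start, let's take the repeatability coefficient in your first case and clarify what it means and where it comes from. Understanding this will help us understand the more complicated second case.

Random intercepts only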

In this case, the mixed model for the $i$th response in the $j$th group is

$$y_{ij} = \beta_0 + u_{0j} + e_{ij},$$

where the random intercepts $u_{0j}$ have variance $\sigma^2_{u_0}$ and the residuals $e_{ij}$ have variance $\sigma^2_e$.

The correlation between two random variables $x$ and $y$ is defined as

$$corr = \frac{cov(x, y)}{\sqrt{var(x)var(y)}}.$$

The expression for the ICC comes from letting $x$ and $y$ be two observations drawn from the same group $j$:

$$ICC = \frac{cov(\beta_0 + u_{0j} + e_{i_1j}, \beta_0 + u_{0j} + e_{i_2j})}{\sqrt{var(\beta_0 + u_{0j} + e_{i_1j})var(\beta_0 + u_{0j} + e_{i_2j})}},$$

which simplifies to

$$ICC = \frac{\sigma^2_{u_0}}{\sigma^2_{u_0} + \sigma^2_e}.$$
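The simplification can be spelled out as follows. This is a sketch under the usual mixed-model assumptions, which are not stated explicitly here: the random intercepts are independent of the residuals, and residuals for different observations are independent of each other. The constant $\beta_0$ contributes nothing to the covariance or the variances.

$$\begin{aligned}
cov(\beta_0 + u_{0j} + e_{i_1j},\ \beta_0 + u_{0j} + e_{i_2j}) &= var(u_{0j}) = \sigma^2_{u_0}, \\
var(\beta_0 + u_{0j} + e_{i_1j}) = var(\beta_0 + u_{0j} + e_{i_2j}) &= \sigma^2_{u_0} + \sigma^2_e.
\end{aligned}$$

The denominator is therefore $\sqrt{(\sigma^2_{u_0} + \sigma^2_e)^2} = \sigma^2_{u_0} + \sigma^2_e$, and the ratio is the ICC given above.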

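Random intercepts and random slopes

Now for the second case, we first have to clarify what precisely is meant by "the reliability of effects (i.e. the sum-contrast effect of a variable with 2 levels)."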

In this case, the mixed model for the $i$th response in the $j$th group under the $k$th level of the predictor $x$ is

$$y_{ijk} = \beta_0 + \beta_1x_k + u_{0j} + u_{1j}x_k + e_{ijk},$$

where the random intercepts have variance $\sigma^2_{u_0}$, the random slopes have variance $\sigma^2_{u_1}$, the random intercepts and slopes have covariance $\sigma_{u_{01}}$, and the residuals $e_{ijk}$ have variance $\sigma^2_e$.

So what is the "repeatability of an effect" under this model? I think a good candidate definition is the expected correlation between two difference scores computed within the same cluster $j$, but across different pairs of observations $i$.

The two difference scores in question are as follows (remember that $x$ is contrast coded so that $|x_1|=|x_2|=x$):

$$\begin{aligned}
y_{i_1jk_2}-y_{i_1jk_1} &= (\beta_0-\beta_0)+\beta_1(x_{k_2}-x_{k_1})+(u_{0j}-u_{0j})+u_{1j}(x_{k_2}-x_{k_1})+(e_{i_1jk_2}-e_{i_1jk_1}) \\
&= 2x\beta_1+2xu_{1j}+e_{i_1jk_2}-e_{i_1jk_1}
\end{aligned}$$

and

$$y_{i_2jk_2}-y_{i_2jk_1}=2x\beta_1+2xu_{1j}+e_{i_2jk_2}-e_{i_2jk_1}.$$

Plugging these into the correlation formula gives us

$$ICC = \frac{cov(2x\beta_1+2xu_{1j}+e_{i_1jk_2}-e_{i_1jk_1}, 2x\beta_1+2xu_{1j}+e_{i_2jk_2}-e_{i_2jk_1})}{\sqrt{var(2x\beta_1+2xu_{1j}+e_{i_1jk_2}-e_{i_1jk_1})var(2x\beta_1+2xu_{1j}+e_{i_2jk_2}-e_{i_2jk_1})}},$$

which simplifies down to

$$ICC = \frac{2x^2\sigma^2_{u_1}}{2x^2\sigma^2_{u_1} + \sigma^2_e}.$$

Notice that the ICC is technically a function of $x$! However, in this case $x$ can only take 2 possible values, and the ICC is identical at both of these values.
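To see where the factor $2x^2$ comes from, the simplification can be worked through term by term. This is a sketch assuming, as is standard though not stated above, that the random slopes are independent of the residuals and that the four residuals involved are mutually independent with variance $\sigma^2_e$. The fixed term $2x\beta_1$ is a constant and drops out.

$$\begin{aligned}
cov(2xu_{1j}+e_{i_1jk_2}-e_{i_1jk_1},\ 2xu_{1j}+e_{i_2jk_2}-e_{i_2jk_1}) &= 4x^2\sigma^2_{u_1}, \\
var(2xu_{1j}+e_{i_1jk_2}-e_{i_1jk_1}) = var(2xu_{1j}+e_{i_2jk_2}-e_{i_2jk_1}) &= 4x^2\sigma^2_{u_1} + 2\sigma^2_e,
\end{aligned}$$

so that

$$ICC = \frac{4x^2\sigma^2_{u_1}}{4x^2\sigma^2_{u_1} + 2\sigma^2_e} = \frac{2x^2\sigma^2_{u_1}}{2x^2\sigma^2_{u_1} + \sigma^2_e}.$$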

As you can see, this is very similar to the repeatability coefficient that you proposed in your question. The only difference is that the random slope variance must be appropriately scaled if the expression is to be interpreted as an ICC or "unadjusted repeatability coefficient." The expression that you wrote works in the special case where the $x$ predictor is coded $\pm\frac{1}{\sqrt{2}}$, but not in general.
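As a rough illustration of the scaling, the base-R sketch below compares the unscaled expression from the question with the scaled ICC for two codings of a two-level predictor. The helper name `effect_icc` and the variance values are made up for illustration; in practice the inputs would come from `VarCorr(fit)`. With `contr.sum`, a two-level factor is coded $\pm 1$, so $x = 1$.

```r
# Hypothetical helper: ICC of a two-level effect whose levels are coded +x / -x
effect_icc <- function(slope_var, residual_var, x) {
  scaled <- 2 * x^2 * slope_var
  scaled / (scaled + residual_var)
}

# Made-up variance components, for illustration only
slope_var    <- 1
residual_var <- 2

# Unscaled expression from the question
unscaled <- slope_var / (slope_var + residual_var)

# contr.sum codes a two-level factor as +1 / -1, so x = 1
icc_sum_coding <- effect_icc(slope_var, residual_var, x = 1)

# With +/- 1/sqrt(2) coding, the scaled and unscaled expressions coincide
icc_sqrt2_coding <- effect_icc(slope_var, residual_var, x = 1 / sqrt(2))

print(c(unscaled = unscaled, sum = icc_sum_coding, sqrt2 = icc_sqrt2_coding))
```

Note that the slope variance estimated by a fitted model itself depends on the coding, so the same `slope_var` would not be obtained under both codings; a single fixed value is used here only to show the arithmetic.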

(2) When I have multiple variables whose repeatability I want to estimate, adding them all to the same fit (e.g. lmer(dv~(iv1+iv2|unit)+iv1+iv2)) seems to yield higher repeatability estimates than creating a separate model for each effect. This makes sense computationally to me, as inclusion of multiple effects will tend to decrease the residual variance, but I'm not positive that the resulting repeatability estimates are valid. Are they?

I believe that working through a similar derivation as presented above for a model with multiple predictors with their own random slopes would show that the repeatability coefficient above would still be valid, except for the added complication that the difference scores we are conceptually interested in would now have a slightly different definition: namely, we are interested in the expected correlation of the differences between adjusted means after controlling for the other predictors in the model.

If the other predictors are orthogonal to the predictor of interest (as in, e.g., a balanced experiment), I would think the ICC / repeatability coefficient elaborated above should work without any modification. If they are not orthogonal then you would need to modify the formula to take account of this, which could get complicated, but hopefully my answer has given some hints about what that might look like.

— Jake Westfall

You are right, Jake. The adjusted ICC refers to section VII. EXTRAPOLATED REPEATABILITY AND HERITABILITY in the linked paper. The authors write: "It is important to distinguish between the repeatability of individual measurements $R$ and the repeatability of measurement means $R_n$."

— Gabra

Computing repeatability of effects from an lmer model