Comparing logistic coefficients on models with different dependent variables

14

This is a follow-up to a question I asked a couple of days ago. I feel it puts a different slant on the issue, so I have listed it as a new question.

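The question is: can I compare the magnitude of coefficients across models with different dependent variables? For example, on a single sample, say I want to know whether the economy is a stronger predictor of votes for the House of Representatives or for President. In this case, my two dependent variables would be the vote for the House (coded 1 for Democrat and 0 for Republican) and the vote for President (1 for Democrat and 0 for Republican), and my independent variable is the economy. I'd expect a statistically significant result for both offices, but how do I assess whether it has a 'bigger' effect on one than on the other? This might not be a particularly interesting example, but I am curious whether there is a way to compare. I know one can't just look at the 'size' of the coefficient. So, is comparing coefficients on models with different dependent variables possible? And if so, how can it be done?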

If any of this doesn't make sense, let me know. All advice and comments are appreciated.

regression logistic

— Ejs

2

How do you know that one can't just look at the 'size' of the coefficient?

— onestop

I've merged your two accounts. You should still register, as indicated in the FAQ. (Thx to @onestop for pointing out the duplicate.)

— chl

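From the answers to my previous question, I assumed you can't compare the 'effects' of predictors across models by looking at the coefficients. Is there something different about the example above?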

— Ejs

2

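Starting a bounty on this: it seems like an important question, with three very different answers and not a single vote. We can do better. The paper Andy W linked to on this related question seems relevant.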

— Matt Parker

4

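The short answer is "yes, you can", but you should compare the maximum likelihood estimates (MLEs) from the "big model", that is, the model containing all covariates from either model, fitted to both outcomes.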

This is a "quasi-formal" way to get probability theory to answer your question.

In the example, $Y_{1}$ and $Y_{2}$ are the same type of variable (fractions/percentages), so they are comparable. I will assume that you fit the same model to both. So we have two models:

$M_{1}:Y_{1i}\sim Bin(n_{1i},p_{1i})$

$\log\left(\frac{p_{1i}}{1-p_{1i}}\right)=\alpha_{1}+\beta_{1}X_{i}$

$M_{2}:Y_{2i}\sim Bin(n_{2i},p_{2i})$

$\log\left(\frac{p_{2i}}{1-p_{2i}}\right)=\alpha_{2}+\beta_{2}X_{i}$

So you have the hypothesis you want to assess:

$H_{0}:\beta_{1}>\beta_{2}$

And you have some data $\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n}$ and some prior information $I$ (such as the use of the logistic model). So you calculate the probability:

$P=Pr(H_0|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I)$

Now $H_0$ doesn't depend on the actual value of any of the regression parameters, so they must have been removed by marginalising:

$P=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(H_0,\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$

The hypothesis simply restricts the range of integration, so we have:

$P=\int_{-\infty}^{\infty} \int_{\beta_{2}}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Pr(\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}|\{Y_{1i},Y_{2i},X_{i}\}_{i=1}^{n},I) d\alpha_{1}d\alpha_{2}d\beta_{1}d\beta_{2}$

Because the probability is conditional on the data, it will factor into the two separate posteriors for each model:

$Pr(\alpha_{1},\beta_{1}|\{Y_{1i},X_{i},Y_{2i}\}_{i=1}^{n},I)Pr(\alpha_{2},\beta_{2}|\{Y_{2i},X_{i},Y_{1i}\}_{i=1}^{n},I)$

Now because there are no direct links between $Y_{1i}$ and $\alpha_{2},\beta_{2}$, only indirect links through $X_{i}$, which is known, $Y_{1i}$ will drop out of the conditioning in the second posterior. The same holds for $Y_{2i}$ in the first posterior.

From standard logistic regression theory, and assuming uniform prior probabilities, the posterior for the parameters is approximately bivariate normal with mean equal to the MLEs and variance equal to the inverse of the information matrix, denoted by $V_{1}$ and $V_{2}$, which do not depend on the parameters, only on the MLEs. So you have straightforward normal integrals with a known variance matrix. $\alpha_{j}$ marginalises out with no contribution (as would any other "common variable") and we are left with the usual result (I can post the details of the derivation if you want, but it's pretty "standard" stuff):

$P=\Phi\left(\frac{\hat{\beta}_{1,MLE}-\hat{\beta}_{2,MLE}}{\sqrt{V_{1:\beta,\beta}+V_{2:\beta,\beta}}}\right)$

Where $\Phi()$ is just the standard normal CDF. This is the usual test for comparing normal means. But note that this approach requires the same set of regression variables in each model. In the multivariate case with many predictors, if the two models have different regression variables, the integrals effectively reduce to the same test, but computed from the MLEs of the two betas in the "big model" that includes all covariates from both models.
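As a rough numerical illustration of the recipe above, the sketch below fits both logistic models to the same $X$ and then evaluates the normal-approximation probability that $\beta_{1}>\beta_{2}$, which under independent normal posteriors is $\Phi$ of $(\hat{\beta}_{1}-\hat{\beta}_{2})/\sqrt{V_{1:\beta,\beta}+V_{2:\beta,\beta}}$. Everything here is an assumption made for illustration: the data are simulated, the true slopes (1.0 and 0.4) are made up, and `fit_logistic` and `prob_beta1_greater` are hypothetical helper names, not functions from any library.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson MLE for a logistic regression.

    X is an (n, k) design matrix that already contains an intercept column,
    y is an (n,) vector of 0/1 outcomes. Returns (beta_hat, V), where V is
    the inverse of the information matrix evaluated at the MLE.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

def prob_beta1_greater(b1, v1, b2, v2):
    """Pr(beta_1 > beta_2 | data) under independent normal posteriors."""
    z = (b1 - b2) / math.sqrt(v1 + v2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# Simulated data (assumption): one predictor, two binary outcomes.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y1 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.2 + 1.0 * x))))
y2 = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.1 + 0.4 * x))))

beta1, V1 = fit_logistic(X, y1)
beta2, V2 = fit_logistic(X, y2)

# Index 1 picks the slope on x; V[1, 1] is its approximate posterior variance.
P = prob_beta1_greater(beta1[1], V1[1, 1], beta2[1], V2[1, 1])
print(f"beta1_hat={beta1[1]:.3f}  beta2_hat={beta2[1]:.3f}  Pr(beta1 > beta2)={P:.4f}")
```

With real data you would replace the simulated `x`, `y1` and `y2` with your own columns. If the two models carry different controls, the design matrix `X` should contain all covariates from both, in line with the "big model" remark.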

— probabilityislogic

3

Why not? The models are estimating how much 1 unit of change in any model predictor will influence the probability of "1" for the outcome variable. I'll assume the models are the same, i.e. that they have the same predictors in them. The most informative way to compare the relative magnitudes of any given predictor in the 2 models is to use the models to calculate (either deterministically or, better, by simulation) how much some meaningful increment of change (e.g., +/- 1 SD) in the predictor affects the probabilities of the respective outcome variables, & compare them! You'll want to determine confidence intervals for the two estimates as well, so you can satisfy yourself that the difference is "significant," practically & statistically.
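A minimal sketch of the simulation route, assuming each model has already been fitted and has an intercept plus one slope: draw coefficient vectors from a normal approximation around the estimates, convert each draw into the change in predicted probability when the predictor moves from mean - 1 SD to mean + 1 SD, and read off percentile intervals. The coefficient values, covariance matrices, and the names `simulate_first_difference` and `summarize` are all hypothetical. Treating the two models' draws as independent is a further simplifying assumption.

```python
import numpy as np

def inv_logit(z):
    """Map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def simulate_first_difference(beta_hat, V, x_low, x_high, n_draws=10000, seed=0):
    """Draws of Pr(Y=1 | x_high) - Pr(Y=1 | x_low).

    beta_hat is (intercept, slope) and V its 2x2 covariance matrix; the
    coefficients are drawn from a normal approximation centred on beta_hat.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta_hat, V, size=n_draws)
    p_low = inv_logit(draws[:, 0] + draws[:, 1] * x_low)
    p_high = inv_logit(draws[:, 0] + draws[:, 1] * x_high)
    return p_high - p_low

def summarize(diff):
    """Mean and 95% percentile interval of the simulated differences."""
    lo, hi = np.percentile(diff, [2.5, 97.5])
    return float(diff.mean()), float(lo), float(hi)

# Hypothetical estimates, made up for illustration (not from any real survey).
house_beta, house_V = np.array([0.10, 0.40]), np.diag([0.004, 0.003])
pres_beta, pres_V = np.array([0.20, 1.00]), np.diag([0.004, 0.004])
x_mean, x_sd = 0.0, 1.0  # assumed mean and SD of the economy measure

d_house = simulate_first_difference(house_beta, house_V,
                                    x_mean - x_sd, x_mean + x_sd, seed=1)
d_pres = simulate_first_difference(pres_beta, pres_V,
                                   x_mean - x_sd, x_mean + x_sd, seed=2)

house_fd = summarize(d_house)
pres_fd = summarize(d_pres)
gap = summarize(d_pres - d_house)  # difference between the two effects

print("House:     %.3f [%.3f, %.3f]" % house_fd)
print("President: %.3f [%.3f, %.3f]" % pres_fd)
print("Gap:       %.3f [%.3f, %.3f]" % gap)
```

An interval for `gap` that excludes zero is one way to argue that the predictor moves one outcome's probability more than the other's. With additional controls in the model, they would need to be fixed at chosen values (for example their means) inside the linear predictor.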

— dmk38

Thanks dmk38, very useful. Some follow-up points/questions: is this what is often meant when referring to varying the variable of interest (the economy from bad to good, for example) while holding all control variables at their means? What do you mean by deterministically? How do I determine the confidence intervals around the probabilities?

— Ejs

2

Consult the King. He will not disappoint. King, G., Tomz, M., & Wittenberg, J. (2000). Making the Most of Statistical Analyses: Improving Interpretation and Presentation. Am. J. Pol. Sci., 44(2), 347-361.

— dmk38

2

I assume that by "my independent variable is the economy" you're using shorthand for some specific predictor.

At one level, I see nothing wrong with making a statement such as

X predicts Y1 with an odds ratio of _ and a 95% confidence interval of [ _ , _ ] while X predicts Y2 with an odds ratio of _ and a 95% confidence interval of [ _ , _ ].

@dmk38's recent suggestions look very helpful in this regard.

You might also want to standardize the coefficients to facilitate comparison.
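To make the reporting template concrete, here is a small sketch that turns a logit coefficient and its standard error into an odds ratio with a Wald-type 95% confidence interval, and rescales the coefficient to "per 1 SD of X" as one simple form of standardization. The numbers and the function names `odds_ratio_ci` and `per_sd_coefficient` are hypothetical; other standardization schemes exist and may suit your data better.

```python
import math

def odds_ratio_ci(beta_hat, se, z=1.959964):
    """Odds ratio and Wald 95% CI.

    The interval is built on the log-odds scale (beta_hat +/- z * se)
    and then exponentiated, so it is asymmetric around the odds ratio.
    """
    return (math.exp(beta_hat),
            math.exp(beta_hat - z * se),
            math.exp(beta_hat + z * se))

def per_sd_coefficient(beta_hat, sd_x):
    """Coefficient for a 1-SD change in X instead of a 1-unit change."""
    return beta_hat * sd_x

# Hypothetical estimates for the same predictor X in the two models.
models = {"Y1": (0.40, 0.055), "Y2": (1.00, 0.063)}  # (coefficient, std. error)
sd_x = 1.3                                           # assumed SD of X

report = {}
for name, (b, se) in models.items():
    or_, lo, hi = odds_ratio_ci(per_sd_coefficient(b, sd_x),
                                per_sd_coefficient(se, sd_x))
    report[name] = (or_, lo, hi)
    print(f"X predicts {name} with an odds ratio of {or_:.2f} per SD "
          f"and a 95% confidence interval of [{lo:.2f}, {hi:.2f}]")
```

Because both outcomes share the same X here, scaling by `sd_x` changes the units of the comparison but not its direction. It matters more when predictors measured on different scales are compared.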

At another level, beware of taking inferential statistics (standard errors, p-values, CIs) literally when your sample constitutes a nonrandom sample of the population of years to which you might want to generalize.

— rolando2

Yes, 'the economy' is shorthand for perceptions of national economic conditions. Does the same advice apply when other predictors (controls) are included in the model?

— Ejs

@Ejs - I'm afraid there's no short answer to your last question. You're getting into what it means to assess relationships when using statistical control - a fabulously intricate topic worthy of extensive study. You're also probably getting into the topic of variable selection, which is a big one as well. Imho the best source for the committed student of these topics is Pedhazur's amazon.com/Multiple-regression-behavioral-research-Pedhazur/…

— rolando2

1

Let us say the interest lies in comparing two groups of people: those with $X_{1} = 1$ and those with $X_{1} = 0$ .

The exponential of $\beta_{1}$ , the corresponding coefficient, is interpreted as the ratio of the odds of success for those with $X_{1} = 1$ over the odds of success for those with $X_{1} = 0$ , conditional on the other variables in the model.
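To spell out where the exponential comes from, here is a short worked step. The symbols $\alpha$ (intercept) and $\eta$ (the combined contribution of the other variables in the model, held fixed) are introduced here only for illustration. Under the logistic model the odds of success are the exponential of the linear predictor, so

$\frac{\text{odds}(X_{1}=1)}{\text{odds}(X_{1}=0)}=\frac{\exp(\alpha+\beta_{1}\cdot 1+\eta)}{\exp(\alpha+\beta_{1}\cdot 0+\eta)}=\exp(\beta_{1})$

The terms $\alpha$ and $\eta$ cancel only because they are the same in the numerator and the denominator, which is what "conditional on the other variables in the model" means here.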

So, if you have two models with different dependent variables, then the interpretation of $\beta_{1}$ changes, since it is not conditioned upon the same set of variables. As a consequence, the comparison is not direct...

— ocram

Does this have any implications for rolando2's suggestion?

— Ejs

@Ejs. Do you refer to the standardisation step? By the way, does my answer help? Have I misunderstood the question?

— ocram