Statistics and Big Data: r

2

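I built a logistic regression model in R using glm. There are two independent variables. How can I draw the model's decision boundary on a scatter plot of the two variables? For example, how can I produce a figure like the one at http://onlinecourses.science.psu.edu/stat557/node/55? Thanks.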

16 r logistic
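
For the decision-boundary question above, one possible approach is sketched here. It assumes a 0.5 probability cutoff, at which the linear predictor of the fitted model is zero, so the boundary is a straight line in the plane of the two predictors. The data are simulated and the names x1, x2, y, boundary_intercept, and boundary_slope are hypothetical, chosen only for illustration.

```r
# Simulated data; x1, x2, and y are hypothetical names, not from the question.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1 - 2 * x2))

fit <- glm(y ~ x1 + x2, family = binomial)
b   <- coef(fit)  # b[1] = intercept, b[2] = x1 coefficient, b[3] = x2 coefficient

# At a predicted probability of 0.5 the linear predictor is zero:
#   b0 + b1 * x1 + b2 * x2 = 0   =>   x2 = -(b0 + b1 * x1) / b2
boundary_intercept <- unname(-b[1] / b[3])
boundary_slope     <- unname(-b[2] / b[3])

# Scatter plot of the two predictors with the boundary line drawn on top.
plot(x1, x2, col = ifelse(y == 1, "red", "blue"), pch = 19)
abline(a = boundary_intercept, b = boundary_slope, lwd = 2)
```

A different cutoff p would shift the line: the right-hand side of the equation becomes qlogis(p) instead of zero.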

3

How does `predict.randomForest` estimate class probabilities?

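When using the randomForest package, how are the class probabilities returned by predict(model, data, type = "prob") estimated? I had been using ranger with the probability = T argument to train random forests that predict probabilities. The ranger documentation refers to Malley et al. (2012). I simulated some data, tried both packages, and got very different results (see the code below). So …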

16 r random-forest prediction

1

Which multiple comparison method should I use for an lmer model: lsmeans or glht?

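I am analyzing a data set using a mixed-effects model with one fixed effect (condition) and two random effects (participant, because of the within-subject design, and pair). The model was fitted with the lme4 package: exp.model<-lmer(outcome~condition+(1|participant)+(1|pair),data=exp). Next, I ran a likelihood ratio test of this model against the model without the fixed effect (condition), and the difference is significant. My data set has 3 conditions, so multiple …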

16 r repeated-measures multiple-comparisons post-hoc lsmeans

3

Using glm() in place of a simple chi-squared test

I am interested in changing the null hypothesis when using glm() in R. For example: x = rbinom(100, 1, .7) summary(glm(x ~ 1, family = "binomial")) tests the hypothesis that p = 0.5. How can I change the null to p = some arbitrary value within glm()? I know this can also be done with prop.test() and …

15 r hypothesis-testing generalized-linear-model chi-squared offset
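
The offset tag on this question suggests one route, sketched here under stated assumptions. The null value p0 = 0.6 is hypothetical. With the offset fixed at logit(p0), the intercept estimates logit(p) - logit(p0), so the usual Wald test of "intercept = 0" becomes a test of p = p0.

```r
set.seed(42)
x  <- rbinom(100, 1, 0.7)
p0 <- 0.6  # hypothetical null value, chosen only for illustration

# The offset pins the baseline of the linear predictor at logit(p0);
# the fitted intercept is then the log-odds difference from that null.
fit <- glm(x ~ 1, family = binomial, offset = rep(qlogis(p0), length(x)))

# Wald z-test of H0: intercept = 0, i.e. H0: p = p0
summary(fit)$coefficients
```

Adding the offset back to the intercept and applying plogis() recovers the sample proportion, which is a quick way to check that the model was specified as intended.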

1

How do I fit a mixed model when the response variable is between 0 and 1?

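I am trying to use lme4::glmer() to fit a binomial generalized linear mixed model (GLMM) with a dependent variable that is not binary but a continuous variable between 0 and 1. One can think of this variable as a probability; in fact, it is a probability reported by human subjects (in an experiment that I am helping to analyze). That is, it is not a "discrete" fraction but a continuous variable. My glmer() call (see below) does not work as expected. Why? …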

15 r logistic mixed-model glmm lme4-nlme

4

Gradient boosting machine accuracy decreases as the number of iterations increases

I am experimenting with the gradient boosting machine algorithm via the caret package in R. Using a small college admissions data set, I ran the following code: library(caret) ### Load admissions dataset. ### mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv") ### Create yes/no levels for admission. ### mydata$admit_factor[mydata$admit==0] <- "no" mydata$admit_factor[mydata$admit==1] <- "yes" ### Gradient boosting machine algorithm. ### set.seed(123) fitControl …

15 machine-learning caret boosting gbm

1

Writing out the mathematical equation for a multilevel mixed-effects model

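The CV question: I am trying to give a detailed yet concise mathematical representation of a mixed-effects model. I am using the lme4 package in R. What is the correct mathematical representation for my model? Data, scientific question, and R code: My data set consists of species in different regions. The prevalence of a species leading up to extinction (extinction is not necessarily permanent; recolonization is possible) …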

15 r mixed-model multilevel-analysis lme4-nlme

2

Predictions from a BSTS model (in R) fail completely

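After reading this blog post about Bayesian structural time series models, I wanted to implement one for a problem where I had previously used ARIMA. I have data with known (but noisy) seasonal components: there are annual, monthly, and weekly components, as well as effects due to special days (such as federal or religious holidays). I used the bsts package to implement this, and …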

15 r time-series bayesian mcmc bsts

1

Understanding QR decomposition

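I have a worked example (in R) that I am trying to understand better. I am using Limma to fit a linear model, and I am trying to understand what happens step by step in the fold-change calculations. I am mostly trying to figure out what happens when the coefficients are calculated. From what I can tell, QR decomposition is used to obtain the coefficients, so essentially the equations being calculated …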

15 r regression linear-model
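
As a rough illustration of how least-squares coefficients can be obtained from a QR decomposition, here is a minimal base-R sketch. It is not Limma's internal code. The model mpg ~ drat on the built-in mtcars data is an assumption made for the example. With X = QR, the normal equations reduce to the triangular system R beta = Q'y.

```r
# Design matrix and response; mtcars and the formula are illustrative assumptions.
X <- model.matrix(mpg ~ drat, data = mtcars)
y <- mtcars$mpg

qrx <- qr(X)      # QR decomposition of the design matrix
Q   <- qr.Q(qrx)  # orthonormal columns
R   <- qr.R(qrx)  # upper-triangular factor

# Solve R %*% beta = t(Q) %*% y by back-substitution.
beta_qr <- backsolve(R, crossprod(Q, y))

# qr.coef() does the same in a single call.
beta_qr2 <- qr.coef(qrx, y)
```

Both results should agree with coef(lm(mpg ~ drat, mtcars)) up to floating-point error.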

1

Does Breiman's random forest use information gain or the Gini index?

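I would like to know whether Breiman's random forest (the random forest in the R randomForest package) uses information gain or the Gini index as its splitting criterion (attribute selection criterion). I looked at http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm and in the documentation for the randomForest package in R. The only thing I found, however, is that the Gini index can be used for computing variable importance.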

15 r random-forest entropy gini
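
To make the two candidate criteria concrete, the sketch below computes the Gini impurity and the Shannon entropy (the quantity behind information gain) of a vector of class labels. The helper names gini_impurity and entropy_bits are hypothetical, and nothing here is taken from the randomForest source.

```r
# Gini impurity: 1 - sum of squared class proportions.
gini_impurity <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  1 - sum(p^2)
}

# Shannon entropy in bits: -sum(p * log2(p)), skipping empty classes.
entropy_bits <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

node <- c("a", "a", "b", "b")  # a perfectly mixed two-class node
gini_impurity(node)
entropy_bits(node)
```

Either measure is zero for a pure node and largest for an evenly mixed one; a split criterion compares the parent's impurity with the size-weighted impurity of the child nodes.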

2

How to fit a mixture model for clustering

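I have two variables, X and Y, and I need to form clusters, with a maximum (and optimum) of 5. The ideal plot of the variables looks like this: I want to make 5 clusters out of this. Something like this: So I think this is a mixture model with 5 clusters. Each cluster has a center point and a confidence circle around it. The clusters are not always like this, …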

15 r clustering gaussian-mixture

2

Calculating AIC "by hand" in R

I tried to calculate the AIC of a linear regression in R without using the AIC function, like this: lm_mtcars <- lm(mpg ~ drat, mtcars) nrow(mtcars)*(log((sum(lm_mtcars$residuals^2)/nrow(mtcars))))+(length(lm_mtcars$coefficients)*2) [1] 97.98786 However, AIC gives a different value: AIC(lm_mtcars) [1] 190.7999 Can someone tell me what I am doing wrong?

15 r aic information-theory
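
One way to reconcile the two numbers in the question above is sketched here. The shortcut n * log(RSS / n) + 2p drops the constant terms of the Gaussian log-likelihood, and it does not count the residual variance as an estimated parameter. Keeping both gives the full expression below. The names rss, k, loglik, and aic_by_hand are introduced only for this example.

```r
lm_mtcars <- lm(mpg ~ drat, mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(lm_mtcars)^2)

# Parameters counted: the regression coefficients plus the residual variance.
k <- length(coef(lm_mtcars)) + 1

# Gaussian log-likelihood at the maximum-likelihood estimates,
# with the variance estimated as rss / n and all constants kept.
loglik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)

aic_by_hand <- -2 * loglik + 2 * k
aic_by_hand
AIC(lm_mtcars)
```

The two printed values should agree. Because the dropped terms depend only on n, the shortcut formula still ranks models fitted to the same data in the same order.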

1

What is the intuition behind exchangeable samples under the null hypothesis?

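Permutation tests (also called randomization tests, re-randomization tests, or exact tests) are very useful. They come in handy when, for example, the normal-distribution assumption required by the t-test is not met, and when transforming the values to ranks, as in a non-parametric test such as the Mann-Whitney U test, would lose more information. However, when using this kind of test, one and only one assumption …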

15 hypothesis-testing permutation-test exchangeability
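
The role of exchangeability can be illustrated with a minimal two-sample permutation test in base R. If the null hypothesis holds and the observations are exchangeable, the group labels carry no information, so shuffling them yields statistics from the same distribution as the observed one. The simulated data, the difference-in-means statistic, and the 5000 permutations are all assumptions made for this sketch.

```r
set.seed(123)
values <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))  # simulated samples
group  <- rep(c("a", "b"), each = 20)

# Test statistic: difference in group means for a given labelling.
mean_diff <- function(v, g) mean(v[g == "b"]) - mean(v[g == "a"])

observed <- mean_diff(values, group)

# Under the null, any relabelling is as likely as the observed one.
perm_stats <- replicate(5000, mean_diff(values, sample(group)))

# Two-sided p-value; the +1 counts the observed labelling itself.
p_value <- (sum(abs(perm_stats) >= abs(observed)) + 1) / (length(perm_stats) + 1)
p_value
```

If the two groups differed in something other than the mean, for example in variance, the labels would no longer be exchangeable under the null and the shuffled statistics would not form a valid reference distribution.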

1

Interpreting LASSO variable trace plots

I am new to the glmnet package, and I am still unsure how to interpret the results. Could anyone help me read the following trace plots? The graphs were obtained by running the following: library(glmnet) return <- matrix(ret.ff.zoo[which(index(ret.ff.zoo)==beta.df$date[2]), ]) data <- matrix(unlist(beta.df[which(beta.df$date==beta.df$date[2]), ][ ,-1]), ncol=num.factors) model <- cv.glmnet(data, return, standardize=TRUE) op <- par(mfrow=c(1, 2)) plot(model$glmnet.fit, "norm", label=TRUE) plot(model$glmnet.fit, "lambda", label=TRUE) par(op)

15 r data-visualization interpretation lasso glmnet

1

How do I interpret the coefficients from a beta regression?

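I have data bounded between 0 and 1. I used the betareg package in R to fit a regression model with the bounded data as the dependent variable. My question is: how do I interpret the coefficients of the regression?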

15 r regression interpretation beta-distribution regression-coefficients

Questions tagged «r»