How to choose between learning algorithms

21

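I need to implement a program that classifies records into 2 categories (true/false) based on some training data, and I am wondering which algorithm or methodology I should be looking at. There seem to be many to choose from (artificial neural networks, genetic algorithms, machine learning, Bayesian optimization, and so on), and I am not sure where to start. So my question is: how should I choose the learning algorithm to use for my problem?

If it helps, here is the problem I need to solve.

Training data:
The training data consists of many rows like this: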

Precursor1, Precursor2, Boolean (true/false)

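The run:
I will be given a bunch of precursors. Then:

1. I choose an algorithm A from among different algorithms (or generate an algorithm dynamically), apply it to every possible combination of these precursors, and collect the "records" that are emitted. A "record" consists of several key-value pairs*.
2. I apply some awesome algorithm and classify these records into 2 categories (true/false).
3. I generate a table in the same format as the training data: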
Precursor1, Precursor2, Boolean

The whole program is then scored on how many true/false values I got right.

*: A "record" looks like this (I hope this makes sense):

Record         [1...*] Score
-Precursor1             -Key
-Precursor2             -Value

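There is only a finite number of possible keys. Records contain different subsets of these keys (some records have key1, key2, key3, ...; other records have key3, key4, ...; and so on).

I actually need 2 learning components. One is for step 1: I need a module that looks at the precursor pairs and decides which algorithm to apply in order to emit a record for comparison. The other is for step 2: I need a module that analyzes the collection of records and classifies them into the 2 categories (true/false).

Thank you in advance!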

— Enno Shioji

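16

There is an R package called caret, which stands for "Classification And REgression Training". I think it is a good place to start, because it makes it easy to apply dozens of different learning algorithms to your data and then evaluate each one's accuracy by resampling (for example, cross-validation).

Here is an example that you can modify for your own data and other methods: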

install.packages('caret',dependencies = c('Depends','Suggests'))
library(caret)

set.seed(999)
Precursor1 <- runif(25)
Precursor2 <- runif(25)
#Use a factor outcome so that train() treats this as a classification problem
Target <- factor(sample(c('T','F'),25,replace=TRUE))
MyData <- data.frame(Precursor1,Precursor2,Target)
str(MyData)

#Try Logistic regression
model_Logistic <- train(Target~Precursor1+Precursor2,data=MyData,method='glm')

#Try Neural Network
model_NN <- train(Target~Precursor1+Precursor2,data=MyData,method='nnet',trace=FALSE)

#Try Naive Bayes
model_NB <- train(Target~Precursor1+Precursor2,data=MyData,method='nb')

#Try Random Forest
model_RF <- train(Target~Precursor1+Precursor2,data=MyData,method='rf')

#Try Support Vector Machine
model_SVM<- train(Target~Precursor1+Precursor2,data=MyData,method='svmLinear')

#Try Nearest Neighbors
model_KNN<- train(Target~Precursor1+Precursor2,data=MyData,method='knn')

#Compare the accuracy of each model
cat('Logistic:',max(model_Logistic$results$Accuracy),'\n')
cat('Neural:',max(model_NN$results$Accuracy),'\n')
cat('Bayes:',max(model_NB$results$Accuracy),'\n')
cat('Random Forest:',max(model_RF$results$Accuracy),'\n')
cat('Support Vector Machine:',max(model_SVM$results$Accuracy),'\n')
cat('Nearest Neighbors:',max(model_KNN$results$Accuracy),'\n')

#Look at other available methods
?train

Another idea would be to split your data into a training set and a test set, and then compare how each model performs on the test set. If you like, I can show you how to do that.
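A minimal sketch of such a hold-out comparison, written in base R so that it runs without extra packages. The 70/30 split, the larger sample size `n`, and the 0.5 cutoff are assumptions made for illustration; with caret, `createDataPartition()` and `predict()` could play the same roles. Because the toy labels are random, accuracy near 0.5 is all one should expect here.

```r
# Hold-out evaluation sketch in base R (illustrative, not tuned).
set.seed(999)
n <- 200  # assumption: more rows than the toy example so the split is meaningful
MyData <- data.frame(
  Precursor1 = runif(n),
  Precursor2 = runif(n),
  Target     = factor(sample(c("T", "F"), n, replace = TRUE))
)

# Hold out 30% of the rows as a test set
test_idx  <- sample(seq_len(n), size = round(0.3 * n))
train_set <- MyData[-test_idx, ]
test_set  <- MyData[test_idx, ]

# Fit logistic regression on the training rows only
fit <- glm(Target ~ Precursor1 + Precursor2, data = train_set, family = binomial)

# glm models the probability of the second factor level ("T"; levels sort as F, T)
p_true <- predict(fit, newdata = test_set, type = "response")
pred   <- factor(ifelse(p_true > 0.5, "T", "F"), levels = levels(MyData$Target))

# Accuracy and confusion table on rows the model never saw
holdout_accuracy <- mean(pred == test_set$Target)
print(table(Predicted = pred, Actual = test_set$Target))
cat("Hold-out accuracy:", holdout_accuracy, "\n")
```

The same split can be reused for every candidate model, so that each one is scored on identical unseen rows.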

— Zach

8

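I would start with probability theory, and then pick whichever algorithm best calculates what probability theory tells you to do. So you have training data $T$, some new precursors $X$, an object to classify $Y$, and your prior information $I$.

You want to know about $Y$. Probability theory then says: just calculate its probability, conditional on all the information available to you.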

$P(Y|T,X,I)$

Now we can use the rules of probability theory to manipulate this into things that we do know how to calculate. Using Bayes' theorem, you get:

$P(Y|T,X,I)=\frac{P(Y|T,I)P(X|Y,T,I)}{P(X|T,I)}$

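Now $P(Y|T,I)$ is usually easy. Unless your prior information can tell you something about $Y$ beyond the training data (for example, correlations), it is given by the rule of succession, or basically the observed fraction of times that $Y$ was true in the training data set.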
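For reference, the rule of succession can be written out explicitly. The symbols $n$ (number of training rows) and $n_Y$ (number of those rows where $Y$ is true) are introduced here for illustration, and a uniform prior on the underlying frequency is assumed:

$P(Y|T,I)=\frac{n_Y+1}{n+2}$

$\frac{P(Y|T,I)}{P(\overline{Y}|T,I)}=\frac{n_Y+1}{n-n_Y+1}$

For example, with $n=100$ and $n_Y=30$ this gives $31/102\approx 0.304$, close to the observed fraction $0.30$; the two only differ noticeably when $n$ is small.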

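The second term, $P(X|Y,T,I)$, is your model. This is where most of your work will go, and where different algorithms do different things. $P(X|T,I)$ is a bit of a vicious beast to calculate, so we use the following trick to avoid it: take the odds of $Y$ against $\overline{Y}$ (that is, not $Y$), so that it cancels: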

$O(Y|T,X,I)=\frac{P(Y|T,X,I)}{P(\overline{Y}|T,X,I)}=\frac{P(Y|T,I)}{P(\overline{Y}|T,I)}\frac{P(X|Y,T,I)}{P(X|\overline{Y},T,I)}$

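Classifying then comes down to a decision rule on these odds: label $Y$ as true when they exceed a threshold that you choose, and as false otherwise.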

Suppose the model $P(X|Y,T,I)$ depends on parameters $\theta_{Y}$. Integrating them out gives:

$P(X|Y,T,I)=\int P(X,\theta_{Y}|Y,T,I) d\theta_{Y} = \int P(X|\theta_{Y},Y,T,I)P(\theta_{Y}|Y,T,I) d\theta_{Y}$

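If the observations are independent given the parameters, then $P(X|\theta_{Y},Y,T,I)=P(X|\theta_{Y},Y,I)$ and $T$ drops out of that factor. $P(\theta_{Y}|Y,T,I)$ is the posterior distribution for the parameters of the model. This is the part that the training data determines, and it is probably where most of the work will go.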

If the model itself is uncertain, it is treated in the same way as $\theta_{Y}$: call the $i$th candidate model $M_i$, with parameters $\theta^{(i)}_{Y}$, and sum over the models:

$P(X|Y,T,I)= \sum_{i}P(M_{i}|Y,T,I)\int P(X|\theta_{Y}^{(i)},M_{i},Y,T,I)P(\theta_{Y}^{(i)}|M_{i},Y,T,I) d\theta_{Y}^{(i)}$

where the posterior weight of each model is, up to a normalizing constant that makes the weights sum to one over $i$,

$P(M_{i}|Y,T,I)\propto P(M_{i}|Y,I)\int P(\theta_{Y}^{(i)}|M_{i},Y,I)P(T|\theta_{Y}^{(i)},M_{i},Y,I) d\theta_{Y}^{(i)}$

(Note: $M_i$ is a proposition of the form "the $i$th model is the best in the set that is being considered". No improper priors are allowed if you are integrating over models: the infinities do not cancel in this case, and you will be left with nonsense.)

Up to this point, all results are exact and optimal (this is step 2: apply some awesome algorithm to the data). But it is a daunting task to undertake. In the real world, the mathematics required may not be feasible in practice, so you will have to compromise. You should always "have a go" at the exact equations, because any maths that you can simplify will save you time at the PC. This first step is important because it sets "the target" and makes it clear what is to be done. Otherwise you are left (as you seem to be) with a whole host of potential options and nothing to choose between them.

At this stage we are still in the world of "symbolic logic", where nothing really makes sense yet. So you need to link these quantities to your specific problem:

$P(M_{i}|Y,I)$ is the prior probability for the $i$th model; generally it will be equal for all $i$.
$P(\theta_{Y}^{(i)}|M_{i},Y,I)$ is the prior for the parameters in the $i$th model (it must be proper!).
$P(T|\theta_{Y}^{(i)},M_{i},Y,I)$ is the likelihood function for the training data, given the $i$th model.
$P(\theta_{Y}^{(i)}|T,M_{i},Y,I)$ is the posterior for the parameters in the $i$th model, conditional on the training data.
$P(M_{i}|Y,T,I)$ is the posterior for the $i$th model, conditional on the training data.

There will be another set of equations for $\overline{Y}$.

Note that the equations will simplify enormously if a) one model is a clear winner, so that $P(M_{j}|Y,T,I)\approx 1$ and b) within this model, its parameters are very accurate, so the integrand resembles a delta function (and integration is very close to substitution or plug-in estimates). If both these conditions are met you have:

$P(X|Y,T,I)\approx P(X|\theta_{Y}^{(j)},M_{j},Y,T,I)_{\theta_{Y}^{(j)}=\hat{\theta}_{Y}^{(j)}}$

This is the "standard" approach to this kind of problem.
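The plug-in approximation can be sketched in a few lines of base R. Everything specific here is an assumption made for illustration and is not part of the problem as posed: the simulated data, the single model in which the two precursors are independent Gaussians within each class, and the decision threshold of 1 on the odds. The prior odds use the rule of succession, and the likelihood ratio uses point estimates in place of the integrals.

```r
# Plug-in classification by posterior odds under an assumed Gaussian model.
set.seed(1)
n  <- 200
Y  <- runif(n) < 0.4                      # simulated labels, purely illustrative
X1 <- rnorm(n, mean = ifelse(Y, 1, 0))    # Precursor1 shifts with the class
X2 <- rnorm(n, mean = ifelse(Y, -1, 0))   # Precursor2 shifts the other way

# Prior odds P(Y|T,I) / P(not Y|T,I) from the rule of succession
p_Y <- (sum(Y) + 1) / (n + 2)
prior_odds <- p_Y / (1 - p_Y)

# Plug-in parameter estimates (theta-hat) for each class and precursor
est   <- function(x) c(mean = mean(x), sd = sd(x))
th_Y1 <- est(X1[Y]);  th_Y2 <- est(X2[Y])
th_N1 <- est(X1[!Y]); th_N2 <- est(X2[!Y])

# Likelihood P(X | theta-hat, class) for new precursors x = c(x1, x2)
lik <- function(x, t1, t2) {
  dnorm(x[1], t1["mean"], t1["sd"]) * dnorm(x[2], t2["mean"], t2["sd"])
}

# Posterior odds = prior odds * likelihood ratio
posterior_odds <- function(x) {
  prior_odds * lik(x, th_Y1, th_Y2) / lik(x, th_N1, th_N2)
}

x_new <- c(1.2, -0.8)                     # hypothetical new precursor pair
odds  <- unname(posterior_odds(x_new))
cat("Posterior odds:", odds, " P(Y | T, X, I):", odds / (1 + odds), "\n")
cat("Classified as true:", odds > 1, "\n")  # threshold of 1 is an assumption
```

Swapping in a different likelihood function changes the model while leaving the odds calculation untouched, which is the sense in which the probability calculation sets the target and the algorithm only approximates it.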

— probabilityislogic