Which algorithms need feature scaling, besides SVM?

17

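I am working with many algorithms: RandomForest, DecisionTrees, NaiveBayes, SVM (kernel = linear and rbf), KNN, LDA and XGBoost. All of them were pretty fast except for SVM. That is when I learned that SVM needs feature scaling to work faster. Then I started wondering whether I should do the same for the other algorithms.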

— 아이 자삭

Related: How and why do normalization and feature scaling work?

— Franck Dernoncourt

Also: Variables are often adjusted (e.g. standardized) before making a model. When is this a good idea, and when is it a bad one?

— Franck Dernoncourt

21

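In general, algorithms that exploit distances or similarities (e.g. in the form of a scalar product) between data samples, such as k-NN and SVM, are sensitive to feature transformations.

Graphical-model-based classifiers, such as Fisher LDA or Naive Bayes, as well as decision trees and tree-based ensemble methods (RF, XGB), are invariant to feature scaling, but it might still be a good idea to rescale or standardize your data.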
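To make the distinction concrete, here is a minimal pure-Python sketch. The toy data and the helper names `nearest`, `rescale` and `best_stump` are made up for illustration and do not come from any library. Changing the units of one feature flips the nearest neighbour of a query point, whereas a single-threshold tree split partitions the samples in exactly the same way.

```python
import math

def nearest(query, points):
    """Index of the point closest to `query` in Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def rescale(point, factors):
    """Multiply each feature by its own positive factor (a change of units)."""
    return tuple(v * f for v, f in zip(point, factors))

def best_stump(xs, ys):
    """Partition (True = left side) of the one-threshold split with fewest errors."""
    values = sorted(set(xs))
    best_err, best_left = None, None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = tuple(x <= threshold for x in xs)
        err = 0
        for side in (True, False):
            labels = [y for y, is_left in zip(ys, left) if is_left == side]
            err += min(labels.count(0), labels.count(1))  # majority-vote errors
        if best_err is None or err < best_err:
            best_err, best_left = err, left
    return best_left

# Hypothetical toy data: feature 0 in metres, feature 1 in kilograms.
points = [(1.0, 50.0), (5.0, 52.0)]
query = (1.5, 53.0)
to_grams = (1.0, 1000.0)  # express feature 1 in grams instead

nn_raw = nearest(query, points)
nn_rescaled = nearest(rescale(query, to_grams),
                      [rescale(p, to_grams) for p in points])

weights_kg = [50.0, 52.0, 53.0, 60.0, 61.0]
labels = [0, 0, 1, 1, 1]
split_raw = best_stump(weights_kg, labels)
split_rescaled = best_stump([w * 1000.0 for w in weights_kg], labels)

print(nn_raw, nn_rescaled)          # the nearest neighbour changes
print(split_raw == split_rescaled)  # the tree split does not
```

The tree is unaffected because a threshold split depends only on the ordering of the values within one feature, and a positive rescaling preserves that ordering.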

— Yell Bond

3

+1. Just note that XGBoost actually implements a second algorithm too, based on linear boosting. Scaling will make a difference there.

— usεr11852

1

Could you elaborate on rescaling/standardizing data for RF and XGB? I don't see how it can influence the quality of the model.

— Tomek Tarczynski

17

Here is a list I found at http://www.dataschool.io/comparing-supervised-learning-algorithms/ indicating which classifiers need feature scaling.

[Full table: image not preserved]

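In k-means clustering, you also need to normalize your input.

In addition to considering whether the classifier exploits distances or similarities, as Yell Bond mentioned, Stochastic Gradient Descent is also sensitive to feature scaling (since the learning rate in the update equation is the same for every parameter {1}).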
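The learning-rate point can be illustrated with a small sketch. It uses plain full-batch gradient descent rather than true SGD so that the run is deterministic, and the data, the helper names `standardize` and `gd_mse`, and the learning rates are all assumptions chosen for illustration. With raw features, the large-scale column forces a tiny learning rate, so the weight of the small-scale column barely moves. After standardization, one ordinary learning rate suits every parameter.

```python
def standardize(col):
    """Z-scores of a list of numbers (population standard deviation)."""
    mean = sum(col) / len(col)
    sd = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
    return [(v - mean) / sd for v in col]

def gd_mse(x1, x2, y, lr, steps):
    """Fit y ~ b + w1*x1 + w2*x2 by gradient descent; return the final MSE."""
    w1 = w2 = b = 0.0
    n = len(y)
    for _ in range(steps):
        res = [b + w1 * a + w2 * c - t for a, c, t in zip(x1, x2, y)]
        # One shared learning rate multiplies every partial derivative.
        grad_b = 2 * sum(res) / n
        grad_w1 = 2 * sum(r * a for r, a in zip(res, x1)) / n
        grad_w2 = 2 * sum(r * c for r, c in zip(res, x2)) / n
        b -= lr * grad_b
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    res = [b + w1 * a + w2 * c - t for a, c, t in zip(x1, x2, y)]
    return sum(r * r for r in res) / n

# Hypothetical toy data: one small-scale and one large-scale feature.
x1 = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
x2 = [100.0, 900.0, 300.0, 700.0, 500.0, 0.0]
y = [2.0 * a + 0.003 * c for a, c in zip(x1, x2)]  # noise-free target

# Raw features: a much larger rate than 1e-6 would diverge because of x2.
mse_raw = gd_mse(x1, x2, y, lr=1e-6, steps=1000)
# Standardized features: a common rate of 0.1 is stable for all parameters.
mse_std = gd_mse(standardize(x1), standardize(x2), y, lr=0.1, steps=1000)

print(mse_raw, mse_std)
```

In this toy run the standardized fit reaches an essentially exact solution, while the raw fit is still far from it after the same number of steps.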

References:

{1} Elkan, Charles. "Log-linear models and conditional random fields." Tutorial notes at CIKM 8 (2008). https://scholar.google.com/scholar?cluster=5802800304608191219&hl=ko&as_sdt=0,22 ; https://pdfs.semanticscholar.org/b971/0868004ec688c4ca87aa1fec7ffb7a2d01d8.pdf

— Franck Dernoncourt

What this answer lacks is an explanation of why! See my answer for that.

— kjetil b halvorsen

2

@kjetilbhalvorsen I did explain it for k-means and SGD, but there are many other algorithms and models. There is a 30k-character limit on Stack Exchange :)

— Franck Dernoncourt

Somewhat related: stats.stackexchange.com/questions/231285/…

— kjetil b halvorsen

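@FranckDernoncourt Can I ask you a question building on this? I have a dataset of both categorical and continuous data, for which I'm building an SVM. The continuous data is highly skewed (long tail). To transform the continuous variables, should I apply a log transformation / Box-Cox and then also normalise the resulting data to get limits between 0 and 1? So I'll be normalising the log values. Then calculate the SVM on the continuous and categorical (0-1) data together? Cheers for any help you can provide.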

— Chuck

7

Adding to the excellent (but too short) answer by Yell Bond: look at what happens with a linear regression model. We write it with only two predictors, but the issue does not depend on that.

$Y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \epsilon_i, \quad i = 1, \dots, n$

Now, suppose you center and scale the predictors to get

$x_i^* = (x_i - \bar{x})/\text{sd}(x), \quad z_i^* = (z_i - \bar{z})/\text{sd}(z)$

and instead fit the model (using ordinary least squares)

$Y_i = \beta_0^* + \beta_1^* x_i^* + \beta_2^* z_i^* + \epsilon_i$

Then the fitted parameters (betas) will change, but they change in a way that you can calculate by simple algebra from the transformation applied. So if we call the estimated betas from the model using transformed predictors $\beta_{1,2}^*$ and the betas from the untransformed model $\hat{\beta}_{1,2}$, we can calculate one set of betas from the other, knowing the means and standard deviations of the predictors. The relationship between the transformed and untransformed parameters is the same as between their estimates, when based on OLS. Some algebra gives the relationship as

$\beta_0 = \beta_0^* - \frac{\beta_1^* \bar{x}}{\text{sd}(x)} - \frac{\beta_2^* \bar{z}}{\text{sd}(z)}, \quad \beta_1 = \frac{\beta_1^*}{\text{sd}(x)}, \quad \beta_2 = \frac{\beta_2^*}{\text{sd}(z)}$

So standardization is not a necessary part of modelling. (It might still be done for other reasons, which we do not cover here.) This answer also depends on our using ordinary least squares. For some other fitting methods, such as ridge or lasso, standardization is important, because we lose the invariance we have with least squares. This is easy to see: both lasso and ridge regularize based on the size of the betas, so any transformation that changes the relative sizes of the betas will change the result!
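For readers who want the algebra spelled out, the relationship between the two sets of betas can be obtained by substituting the definitions of $x_i^*$ and $z_i^*$ into the standardized model and collecting the terms in $x_i$ and $z_i$. This is only a sketch of that one step, using the same symbols as above:

```latex
\begin{aligned}
Y_i &= \beta_0^* + \beta_1^* \frac{x_i - \bar{x}}{\text{sd}(x)}
                 + \beta_2^* \frac{z_i - \bar{z}}{\text{sd}(z)} + \epsilon_i \\
    &= \underbrace{\left(\beta_0^* - \frac{\beta_1^* \bar{x}}{\text{sd}(x)}
                 - \frac{\beta_2^* \bar{z}}{\text{sd}(z)}\right)}_{\beta_0}
     + \underbrace{\frac{\beta_1^*}{\text{sd}(x)}}_{\beta_1}\, x_i
     + \underbrace{\frac{\beta_2^*}{\text{sd}(z)}}_{\beta_2}\, z_i + \epsilon_i
\end{aligned}
```

Matching coefficients with the untransformed model gives the three identities.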

And this discussion for the case of linear regression tells you what to look for in other cases: is there invariance, or not? Generally, methods that depend on distance measures among the predictors will not show invariance, so standardization is important. Another example is clustering.
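The invariance argument can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions: the data are made up, the helpers `solve`, `fit`, `standardize` and `to_original_scale` are hypothetical names written for this illustration, and the ridge penalty `lam` is applied to the two slopes only (not the intercept). OLS betas fitted on standardized predictors map back exactly to the raw-scale betas, while ridge betas do not.

```python
import math
from statistics import mean, stdev

def solve(A, b):
    """Solve the square system A w = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(x, z, y, lam=0.0):
    """Fit y ~ b0 + b1*x + b2*z with penalty lam*(b1^2 + b2^2); lam=0 is OLS."""
    cols = [[1.0] * len(y), x, z]
    A = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    A[1][1] += lam  # penalize the slopes only,
    A[2][2] += lam  # the intercept is left unpenalized
    rhs = [sum(p * q for p, q in zip(c, y)) for c in cols]
    return solve(A, rhs)

def standardize(v):
    """Center and scale a list of numbers (sample standard deviation)."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]

def to_original_scale(bs, x, z):
    """Map betas fitted on standardized predictors back to the raw scale."""
    b1 = bs[1] / stdev(x)
    b2 = bs[2] / stdev(z)
    b0 = bs[0] - b1 * mean(x) - b2 * mean(z)
    return [b0, b1, b2]

# Hypothetical toy data with predictors on very different scales.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [100.0, 300.0, 200.0, 600.0, 400.0, 500.0]
y = [1.2, 2.9, 2.1, 5.3, 3.8, 5.1]
xs, zs = standardize(x), standardize(z)

ols_raw = fit(x, z, y)
ols_back = to_original_scale(fit(xs, zs, y), x, z)
ridge_raw = fit(x, z, y, lam=10.0)
ridge_back = to_original_scale(fit(xs, zs, y, lam=10.0), x, z)

ols_agree = all(math.isclose(a, b, abs_tol=1e-6)
                for a, b in zip(ols_raw, ols_back))
ridge_agree = all(math.isclose(a, b, abs_tol=1e-6)
                  for a, b in zip(ridge_raw, ridge_back))
print("OLS betas agree after back-transformation:", ols_agree)
print("Ridge betas agree after back-transformation:", ridge_agree)
```

On the raw scale, the penalty hardly touches the coefficient of the large-scale predictor `z` but shrinks the coefficient of `x` heavily. After standardization both are penalized comparably, which is why the two ridge fits differ.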

— kjetil b halvorsen

1

Can you explicitly show how one calculates one set of betas from the other in this particular example of scalings you have applied?

— Mathews24

@kjetil Can I ask you a question building on this? I have a dataset of both categorical and continuous data, for which I'm building an SVM. The continuous data is highly skewed (long tail). To transform the continuous variables, should I apply a log transformation / Box-Cox and then also normalise the resulting data to get limits between 0 and 1? So I'll be normalising the log values. Then calculate the SVM on the continuous and categorical (0-1) data together? Cheers for any help you can provide.

— Chuck

1

Can you please add this as a new question, with a reference back here?

— kjetil b halvorsen