Data Science kaggle

3

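I am participating in a Kaggle competition. The dataset has about 100 features, and it is not known what any of them actually represents. Basically, they are just numbers. People are doing a lot of feature engineering on these features. I wonder how exactly one can perform feature engineering on features whose meaning is unknown. On unknown features, feature …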

19 machine-learning feature-selection feature-extraction feature-engineering kaggle
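One way to approach the question above is to build features that do not depend on what each column means. The following is a minimal sketch, not a recipe from the original thread: the helper `add_generic_features` and the column names `f0`..`f3` are hypothetical, and the choice of row statistics plus one pairwise product is only an assumption about what might help.

```python
import numpy as np
import pandas as pd


def add_generic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add meaning-agnostic features to an all-numeric frame (hypothetical helper)."""
    out = df.copy()
    # Row-wise summaries need no knowledge of what each column represents.
    out["row_mean"] = df.mean(axis=1)
    out["row_std"] = df.std(axis=1)
    out["row_zero_count"] = (df == 0).sum(axis=1)
    # One cheap interaction: product of the two highest-variance columns.
    top = df.var().nlargest(2).index
    out[f"{top[0]}_x_{top[1]}"] = df[top[0]] * df[top[1]]
    return out


# Synthetic stand-in for an anonymised competition dataset.
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.normal(size=(5, 4)), columns=["f0", "f1", "f2", "f3"])
engineered = add_generic_features(raw)
```

Whether any of these columns helps is an empirical question; checking them with cross-validation is the usual safeguard.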

5

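Make seaborn heatmap bigger

I create a corr() df out of an original df. The corr() df comes out at 70 x 70, and it is impossible to visualize the heatmap with sns.heatmap(df). If I try to display corr = df.corr(), the table does not fit the screen and I cannot see all the correlations. Is there a way to either print the entire df regardless of its size or to control the size of the heatmap?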

17 visualization pandas plotting
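For the heatmap question above, a common approach is to create the matplotlib figure yourself and hand its axes to seaborn. This is a sketch under stated assumptions: the 70-column frame is synthetic, the `figsize` of 20 x 16 inches is an arbitrary choice, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data standing in for the original df (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 70)),
                  columns=[f"f{i}" for i in range(70)])
corr = df.corr()

# Control the heatmap size by sizing the figure, then pass the axes in.
fig, ax = plt.subplots(figsize=(20, 16))
sns.heatmap(corr, ax=ax, cmap="coolwarm", center=0, square=True)

# Render the whole table as text, lifting pandas' row/column truncation.
with pd.option_context("display.max_rows", None,
                       "display.max_columns", None):
    full_table = repr(corr.round(2))
```

Calling `fig.savefig(...)` afterwards writes the enlarged heatmap to a file if it still does not fit on screen.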

3

Why do we convert skewed data into a normal distribution?

I was going through a solution to the House Prices competition on Kaggle (Human Analog's Kernel on House Prices: Advance Regression Techniques) and came across this part: # Transform the skewed numeric features by taking log(feature + 1). # This will make the features more normal. from scipy.stats import …

15 regression feature-extraction feature-engineering kaggle feature-scaling
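The transformation quoted in the question above can be sketched as follows. The excerpt is truncated, so this is not the kernel's code: the synthetic columns `lot_area` and `rooms` and the skewness cutoff of 0.75 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

# Synthetic data: one right-skewed column, one roughly symmetric column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "lot_area": rng.lognormal(mean=9.0, sigma=1.0, size=1000),
    "rooms": rng.normal(loc=6.0, scale=1.0, size=1000),
})

# Measure skewness per column and pick those above the (assumed) cutoff.
skewness = df.apply(lambda col: skew(col))
skewed_cols = skewness[skewness.abs() > 0.75].index

# log(feature + 1) compresses the long right tail of the selected columns.
transformed = df.copy()
transformed[skewed_cols] = np.log1p(df[skewed_cols])
```

Comparing `skew(df["lot_area"])` with `skew(transformed["lot_area"])` shows how much closer to symmetric the column becomes.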

1

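Hashing trick: what actually happens

When ML algorithms, e.g. Vowpal Wabbit or some of the factorization machines that win click-through rate competitions (Kaggle), mention that features are 'hashed', what does that actually mean for the model? Say there is a variable that represents the ID of an internet ad, which takes values such as '236BG231'. I understand that this feature is then hashed to a random integer. But my question is: now in the model …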

12 machine-learning predictive-modeling kaggle
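To make the hashing idea in the question above concrete, here is a minimal sketch of mapping categorical values to indices in a fixed-size weight vector. It is an illustration only: `hash_feature` and `hashed_row` are hypothetical helpers, MD5 is swapped in for whatever hash a real system uses, and the bucket count of 2**20 is an arbitrary assumption.

```python
import hashlib


def hash_feature(name: str, value: str, n_buckets: int = 2**20) -> int:
    """Map a (feature name, value) pair to a bucket index (hypothetical helper)."""
    key = f"{name}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()  # MD5 chosen only for determinism
    return int.from_bytes(digest[:8], "little") % n_buckets


def hashed_row(row: dict, n_buckets: int = 2**20) -> dict:
    """Turn a row of categorical values into a sparse {index: value} vector."""
    indices = {}
    for name, value in row.items():
        idx = hash_feature(name, str(value), n_buckets)
        # Colliding features simply add up in the same bucket.
        indices[idx] = indices.get(idx, 0.0) + 1.0
    return indices


# The ad ID from the question becomes one active index; "site" is made up.
row = {"ad_id": "236BG231", "site": "example.com"}
sparse = hashed_row(row)
```

In this sketch the integer is not used as a numeric value; it only selects which weight of a linear model the feature activates, much like a one-hot column whose position is computed instead of looked up.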

1

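How many LSTM cells should I use?

Are there any rules of thumb (or actual rules) pertaining to the minimum, maximum and "reasonable" number of LSTM cells I should use? Specifically, I am referring to BasicLSTMCell from TensorFlow and its num_units property. Please assume that I have a classification problem defined by: t - number of time steps n - length of input vector in each time step m - length of output vector …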

12 rnn machine-learning
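The question above does not have a formula for the "right" num_units, but it can help to see how that setting drives model size. The sketch below counts trainable parameters of a standard LSTM layer without peephole connections, which is an assumption about the cell variant; `lstm_param_count` is a hypothetical helper, and `input_size` plays the role of n from the question.

```python
def lstm_param_count(input_size: int, num_units: int) -> int:
    """Trainable parameters of one standard LSTM layer (no peepholes)."""
    # Four gates (input, forget, cell candidate, output). Each has a weight
    # matrix over [input, previous hidden state] and a bias of size num_units.
    return 4 * ((input_size + num_units) * num_units + num_units)


# Doubling num_units roughly quadruples the recurrent part of the count.
small = lstm_param_count(input_size=10, num_units=32)   # 5504
large = lstm_param_count(input_size=10, num_units=64)   # 19200
```

Comparing this count with the number of training examples is one rough way to judge whether a given num_units is plausible, not a rule.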

2

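Why does gradient boosting regression predict negative values when there are no negative y-values in my training set?

As I increase the number of trees in scikit-learn's GradientBoostingRegressor, I get more negative predictions, even though there are no negative values in my training or test set. I have about 10 features, most of which are binary. Some of the parameters I tuned were: the number of trees/iterations; learning depth; and learning rate. The percentage of negative values is ~2 …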

8 machine-learning python algorithms scikit-learn kaggle
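Regarding the question above: with squared-error boosting, the prediction is a sum of tree outputs fitted to residuals, and nothing constrains that sum to stay inside the range of the training targets. One possible workaround, sketched here on synthetic data and assuming strictly positive targets, is to fit in log space and map back with exp, which keeps every prediction positive. The data, the 50 estimators, and the choice of log/exp are assumptions, not taken from the original thread.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in: 10 binary features, strictly positive target.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 10)).astype(float)
y = np.exp(X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=300))

# Fit the booster on log(y); predictions are passed back through exp,
# so they are positive by construction.
model = TransformedTargetRegressor(
    regressor=GradientBoostingRegressor(n_estimators=50, random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
pred = model.predict(X)
```

If the targets can be exactly zero, log is undefined there; clipping the raw predictions at zero with `np.clip` is a simpler alternative in that case.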

Questions tagged «kaggle»