Data Science encoding

6

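Is it better to encode features such as month and hour as factors or as numbers in a machine learning model? On the one hand, I think numeric encoding is reasonable because time is a forward-progressing process (month 5 is followed by month 6). On the other hand, I think categorical encoding might be more reasonable because of the cyclic nature of years and days (the 12th month is followed by the first …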

23 machine-learning feature-extraction feature-engineering encoding numerical
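
One common way to keep both the ordering and the wrap-around the question describes is to place each value on a circle with a sine/cosine pair. The following is a minimal sketch of that idea, not an answer taken from the thread; the helper names `encode_cyclic` and `circle_distance` are made up for illustration, and it assumes Python 3.8+ for `math.dist`.

```python
import math

def encode_cyclic(value, period):
    """Map a periodic value (e.g. month 1-12, hour 0-23) to a point on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.dist(a, b)

# Months 1..12 with period 12: December (12) and January (1) end up adjacent.
dec, jan, jun = (encode_cyclic(m, 12) for m in (12, 1, 6))
print(circle_distance(dec, jan))  # small: neighbouring months
print(circle_distance(dec, jun))  # large: half a year apart
```

With this encoding, month 12 sits as close to month 1 as month 5 sits to month 6, which a plain integer column cannot express. The cost is two columns per feature instead of one.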

3

What is the positional encoding in the Transformer model?

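I'm new to ML and this is my first question, so I apologize if it is a silly one. I am trying to read and understand the paper Attention Is All You Need, and there is a figure in it. I don't understand what positional encoding is. From watching some YouTube videos, I learned that it is an embedding that carries both the meaning and the position of a word, and that it has something to do with sin( …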

23 nlp encoding attention-mechanism transformer
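
For reference, the paper defines the encoding as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), and adds the resulting vector to the token embedding at each position. Below is a minimal pure-Python sketch of that table; the function name `positional_encoding` and the tiny sizes are chosen here for illustration only.

```python
import math

def positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(d_model):
            i = k // 2  # index of the sin/cos pair this dimension belongs to
            angle = pos / (10000 ** (2 * i / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=6)
print(pe[0])  # position 0: sin terms are 0.0, cos terms are 1.0
```

Each position gets a distinct vector, and lower dimensions oscillate faster than higher ones, so the model can read position from the pattern rather than from a single growing number.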

2

sparse_categorical_crossentropy vs. categorical_crossentropy (keras, accuracy)

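Which one is better for accuracy, or are they the same? Of course, if you use categorical_crossentropy you use one-hot encoding, and if you use sparse_categorical_crossentropy you encode the labels as plain integers. Also, when is one better than the other?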

20 neural-network keras loss-function encoding
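
The two losses compute the same quantity and differ only in the label format they expect. The sketch below reproduces the math for a single sample in plain Python. It is not Keras's implementation, which also handles batching and numerical-stability details not shown here, and the probability vector is a made-up example.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot target vector: -sum(y_k * log(p_k))."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

def sparse_categorical_crossentropy(class_index, probs):
    """Loss for an integer target: -log(p[class_index])."""
    return -math.log(probs[class_index])

probs = [0.1, 0.7, 0.2]  # hypothetical softmax output for 3 classes
loss_dense = categorical_crossentropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)
print(loss_dense, loss_sparse)  # the two values coincide
```

Since the loss values match, the choice should not change accuracy by itself. The integer form mainly saves memory when the number of classes is large.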

1

How to handle string labels in multi-class classification with keras?

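I am a beginner in machine learning and keras, and I am now working on a multi-class image classification problem using keras. The input is tagged images. After some preprocessing, the training data is represented in a Python list as: [["dog", "path/to/dog/imageX.jpg"],["cat", "path/to/cat/imageX.jpg"], ["bird", "path/to/cat/imageX.jpg"]] where "dog", "cat", and "bird" are the class labels. I think one-hot encoding should be used for this problem, but it is not clear to me how to handle these string labels …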

18 machine-learning scikit-learn tensorflow keras encoding
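
A minimal sketch of the usual two-step approach: map each string label to an integer id, then expand the id into a one-hot vector. It uses only the standard library; `build_label_index` and `one_hot` are hypothetical helper names. In practice, scikit-learn's `LabelEncoder` and Keras's `to_categorical` cover the same two steps.

```python
def build_label_index(labels):
    """Assign each distinct string label an integer id, in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}

def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(num_classes)]

# Same shape as the list in the question; the paths are placeholders.
samples = [["dog", "path/to/dog/imageX.jpg"],
           ["cat", "path/to/cat/imageX.jpg"],
           ["bird", "path/to/cat/imageX.jpg"]]

label_index = build_label_index(label for label, _ in samples)
targets = [one_hot(label_index[label], len(label_index)) for label, _ in samples]
print(label_index)  # {'bird': 0, 'cat': 1, 'dog': 2}
print(targets)
```

Keeping `label_index` around matters: the same mapping has to be reused at prediction time to turn the predicted class id back into a string.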

5

Make seaborn heatmap bigger

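I create a corr() df from an original df. The corr() df comes out 70 x 70, and it is impossible to visualize as a heatmap with sns.heatmap(df). If I try to display corr = df.corr(), the table does not fit the screen and I cannot see all the correlations. Is there a way to either print the entire df regardless of its size or to control the size of the heatmap?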

17 visualization pandas plotting

4

One-hot encoding alternatives for large categorical values?

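I have a data frame with large categorical values, over 1600 categories. Is there any way I can find alternatives so that I don't end up with over 1600 columns? I found this interesting link: http://amunategui.github.io/feature-hashing/#sourcecode. But they are converting to a class/object, which I don't want. I want my final output as a data frame so that I can test it with different machine learning models. Or …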

13 machine-learning dataset dimensionality-reduction encoding
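
The feature hashing idea from the linked page can be sketched without any wrapper class: hash each category name into one of a fixed number of buckets and emit one column per bucket. This is an illustrative sketch only. The bucket count of 32, the `category_{i}` names, and the helper names are assumptions, and distinct categories will collide when buckets are fewer than categories.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Map a category string to a stable bucket index in [0, num_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def hash_encode(values, num_buckets):
    """Encode each category as a fixed-width row with a single 1 in its bucket."""
    rows = []
    for value in values:
        row = [0] * num_buckets
        row[hash_bucket(value, num_buckets)] = 1
        rows.append(row)
    return rows

# 1600 hypothetical category names squeezed into 32 columns.
categories = [f"category_{i}" for i in range(1600)]
encoded = hash_encode(categories, num_buckets=32)
print(len(encoded), len(encoded[0]))  # 1600 32
```

The result is a plain list of rows, so it can be wrapped in a data frame (for example with `pandas.DataFrame(encoded)`) and passed to any model. A larger bucket count trades more columns for fewer collisions.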

1

What is the difference between global and universal compression methods?

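I understand that compression methods can be split into two main sets: global and local. The first set works regardless of the data being processed; that is, these methods do not rely on any characteristic of the data and so do not need to preprocess any part of the dataset (before the compression itself). Local methods, on the other hand, analyze the data and extract information that usually improves the compression rate. In some of these methods …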

12 classification algorithms encoding

1

How many LSTM cells should I use?

Are there any rules of thumb (or actual rules) for the minimum, maximum, and "reasonable" number of LSTM cells I should use? Specifically, I am referring to BasicLSTMCell from TensorFlow and its num_units property. Assume I have a classification problem defined by:
t - number of time steps
n - length of input vector in each time step
m - length of output vector …

12 rnn
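
This does not give a rule of thumb, but it may help to see how num_units drives model size. A standard LSTM layer has four gates, each with a weight matrix over the concatenated input and hidden state plus a bias vector. The sketch below counts those parameters; `lstm_parameter_count` is a hypothetical helper, and n = 10 is an arbitrary example value.

```python
def lstm_parameter_count(input_size, num_units):
    """Trainable parameters of one standard LSTM layer (four gates, with biases)."""
    weights_per_gate = (input_size + num_units) * num_units
    biases_per_gate = num_units
    return 4 * (weights_per_gate + biases_per_gate)

n = 10  # hypothetical input vector length per time step
for num_units in (16, 32, 64):
    print(num_units, lstm_parameter_count(n, num_units))
```

The count grows roughly quadratically in num_units and does not depend on the number of time steps t. Mapping the hidden state to an output of length m needs a separate projection layer, which is not included here.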

Questions tagged «encoding»