Data Science: nlp

4

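Alternatives to TF-IDF and cosine similarity when comparing documents of differing formats

I've been working on a small, personal project that takes a user's job skills and suggests the most suitable career based on those skills. I use a database of job listings to achieve this. At the moment, the code works as follows: 1) Process the text of each job listing to extract the skills mentioned in the listing. 2) For each career (e.g. "Data Analyst"), …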

12 nlp text-mining similarity cosine-distance
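For reference, the TF-IDF plus cosine similarity baseline that this question wants alternatives to can be sketched with the standard library alone. The career names, skill lists, and the smoothed IDF formula below are illustrative assumptions, not the asker's code or data.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps weights positive for terms found everywhere.
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical data: skills extracted per career, plus one user's skills.
careers = {
    "data analyst": ["sql", "excel", "python", "statistics"],
    "web developer": ["javascript", "html", "css", "python"],
}
user_skills = ["python", "sql", "statistics"]

names = list(careers)
vectors = tfidf_vectors([careers[name] for name in names] + [user_skills])
user_vector = vectors[-1]
scores = {name: cosine(vec, user_vector) for name, vec in zip(names, vectors)}
best_career = max(scores, key=scores.get)
```

A skill shared by every career (here "python") gets the lowest weight, so the ranking is driven by the rarer skills.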

3

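Efficient database model for storing data indexed by n-grams

I'm working on an application that requires creating a very large database of the n-grams that occur in a large text corpus. I need three types of operation to be efficient: lookup and insertion indexed by the n-gram itself, and querying for all n-grams that contain a given sub-n-gram. It sounds to me as though the database should be a gigantic document tree, and that a document database such as Mongo should be able to do the job well, but …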

12 nlp databases
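The three operations named in the question can be illustrated with a toy in-memory index. This is only a sketch of the access pattern: the class and method names are hypothetical, and it says nothing about which database would suit a corpus of real size.

```python
from collections import defaultdict


class NgramIndex:
    """Toy in-memory index; a real store would persist both maps."""

    def __init__(self):
        self.counts = defaultdict(int)      # n-gram -> occurrence count
        self.containing = defaultdict(set)  # sub-n-gram -> n-grams containing it

    def insert(self, ngram):
        ngram = tuple(ngram)
        self.counts[ngram] += 1
        # Register every contiguous sub-sequence so that a containment
        # query becomes a single dictionary lookup.
        for start in range(len(ngram)):
            for end in range(start + 1, len(ngram) + 1):
                self.containing[ngram[start:end]].add(ngram)

    def lookup(self, ngram):
        return self.counts.get(tuple(ngram), 0)

    def find_containing(self, sub):
        return set(self.containing.get(tuple(sub), set()))


index = NgramIndex()
index.insert(["the", "quick", "brown"])
index.insert(["quick", "brown", "fox"])
index.insert(["the", "quick", "brown"])
matches = index.find_containing(["quick", "brown"])
```

The trade-off is space: each n-gram of length k adds k(k+1)/2 entries to the containment map, which is acceptable for short n-grams but grows quickly for long ones.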

3

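Help regarding NER in NLTK

I have been working with NLTK in Python for a while. The problem I am facing is that there is no help available on training NLTK's NER with my own custom data. They used MaxEnt and trained it on the ACE corpus. I have searched the web a lot but could not find any way to train NLTK's NER. Could anyone point me to the training dataset format used for training NLTK's NER …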

12 machine-learning python nlp

1

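How many LSTM cells should I use?

Are there any rules of thumb (or actual rules) pertaining to the minimum, maximum and "reasonable" number of LSTM cells I should use? Specifically, I am referring to BasicLSTMCell from TensorFlow and its num_units property. Please assume that I have a classification problem defined by: t - number of time steps n - length of input vector in each time step m - length of output vector …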

12 rnn
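One concrete way to reason about num_units is to look at how many trainable parameters it implies. The sketch below assumes a standard LSTM layer without peepholes or a projection layer: four gate blocks, each with input weights, recurrent weights and a bias. The function name is hypothetical, and the count is a capacity estimate rather than a rule for picking the size.

```python
def lstm_param_count(input_dim, num_units):
    """Trainable parameters in one standard LSTM layer (no peepholes)."""
    # Each of the four gate blocks has a weight matrix over the
    # concatenated [input, previous hidden state] plus one bias vector.
    weights_per_gate = num_units * (input_dim + num_units)
    biases_per_gate = num_units
    return 4 * (weights_per_gate + biases_per_gate)


# Example: input vectors of length n = 10 and 32 hidden units.
params_small = lstm_param_count(10, 32)
params_large = lstm_param_count(10, 64)
```

Because of the recurrent term, the count grows roughly quadratically in num_units, so doubling the units more than doubles the number of parameters to fit.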

3

Is there any good out-of-the-box language model for Python?

I'm prototyping an application and I need a language model to compute the perplexity of some generated sentences. Is there any trained language model in Python that I can readily use? Something simple like model = LanguageModel('en') p1 = model.perplexity('This is a well constructed sentence') p2 = model.perplexity('Bunny lamp robert junior pancake') assert p1 < p2 Some frameworks …

11 python nlp language-model
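To make the requested interface concrete, here is a minimal sketch of a perplexity computation using an add-one smoothed bigram model trained on three toy sentences. The class name and corpus are hypothetical, and this is a stand-in for illustration, not a pretrained English model of the kind the question asks about.

```python
import math
from collections import Counter


class BigramLanguageModel:
    """Add-one smoothed bigram model; a toy stand-in for a trained model."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # The extra 1 reserves probability mass for unseen words.
        self.vocab_size = len(self.unigrams) + 1

    def perplexity(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        log_prob = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        # Perplexity: exponential of the average negative log-probability.
        return math.exp(-log_prob / (len(tokens) - 1))


corpus = [
    "this is a well constructed sentence",
    "this is a simple sentence",
    "a sentence is a sequence of words",
]
model = BigramLanguageModel(corpus)
p1 = model.perplexity("This is a well constructed sentence")
p2 = model.perplexity("Bunny lamp robert junior pancake")
```

With this corpus the well-formed sentence scores a lower perplexity than the word salad, which is the property the question's assert checks.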

4

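Extracting information from a sentence

I'm creating a simple chatbot. I want to obtain information from the user's response. An example scenario: Bot: Hi, what is your name? User: My name is Edwin. I want to extract the name Edwin from the sentence. However, the user can respond in different ways, such as: User: Edwin is my name. User: I am Edwin. User: Edwin. …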

11 python nlp
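The four reply shapes listed in the question can be covered by a small set of ordered patterns. This is a rule-based sketch under the assumption that the name is a single word; the pattern list and function name are hypothetical, and a trained named-entity recognizer would likely generalize better to replies not anticipated here.

```python
import re

# Patterns are tried in order; the bare single-word reply comes last
# because it is the least specific.
NAME_PATTERNS = [
    re.compile(r"^my name is (?P<name>\w+)\.?$", re.IGNORECASE),
    re.compile(r"^(?P<name>\w+) is my name\.?$", re.IGNORECASE),
    re.compile(r"^(?:i am|i'm) (?P<name>\w+)\.?$", re.IGNORECASE),
    re.compile(r"^(?P<name>\w+)\.?$"),
]


def extract_name(reply):
    """Return the name found in a reply, or None if no pattern matches."""
    text = reply.strip()
    for pattern in NAME_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("name")
    return None


replies = ["My name is Edwin.", "Edwin is my name.", "I am Edwin.", "Edwin."]
names = [extract_name(reply) for reply in replies]
```

Replies that fit none of the patterns return None, which lets the bot ask again instead of guessing.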

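2

How do "intent recognisers" work?

Amazon's Alexa, Nuance's Mix and Facebook's Wit.ai all use a similar system for specifying how to convert a text command into an intent, i.e. something a computer can understand. I'm not sure what the "official" name for this is, but I call it "intent recognition". Basically, it is a way to go from "set the lights to 50% brightness" to lights.setBrightness(0.50). They are specified by having the developer provide examples that are associated with an intent and optionally with "entities" (basically …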

11 machine-learning nlp
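The mapping from a command to an intent plus parameters can be sketched with exact-match templates. Everything here is an assumption made for illustration: the template strings, the "{level}" slot syntax and the lights.turnOff intent are hypothetical, and the commercial services named in the question are not known to work this way internally; they presumably generalize beyond literal matches.

```python
import re

# Hypothetical templates per intent; "{level}" marks a numeric slot.
INTENT_TEMPLATES = {
    "lights.setBrightness": [
        "set the lights to {level} % brightness",
        "set the lights to {level}% brightness",
    ],
    "lights.turnOff": ["turn off the lights"],
}


def compile_template(template):
    """Turn a template into a regex with a named group for the slot."""
    before, slot, after = template.partition("{level}")
    pattern = re.escape(before)
    if slot:
        pattern += r"(?P<level>\d+)" + re.escape(after)
    return re.compile(pattern, re.IGNORECASE)


COMPILED = [
    (intent, compile_template(template))
    for intent, templates in INTENT_TEMPLATES.items()
    for template in templates
]


def recognize(utterance):
    """Return (intent, slots), or (None, {}) when nothing matches."""
    text = utterance.strip().rstrip(".!")
    for intent, pattern in COMPILED:
        match = pattern.fullmatch(text)
        if match:
            slots = match.groupdict()
            if "level" in slots:
                # Convert a percentage such as "50" into the fraction 0.5.
                return intent, {"level": int(slots["level"]) / 100}
            return intent, {}
    return None, {}


intent, slots = recognize("Set the lights to 50 % brightness")
```

The returned pair can then be dispatched to a call such as lights.setBrightness(slots["level"]).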

1

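How to determine whether a character sequence is an English word or noise

What kind of features would you try to extract from a list of words for future prediction of whether something is an existing word or just a jumble of characters? There is a description of the task, which I found. You have to write a program that can answer whether a given word is English. That would be easy, since you would only need to look the word up in a dictionary, but there is an important restriction: the program must not be larger than 64 KiB. So, to solve the problem, using logistic regression …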

11 machine-learning nlp text-mining algorithms
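One compact family of features for this task is character n-grams with word-boundary markers. The sketch below extracts bigrams and computes a single feature, the fraction of a word's bigrams seen in a known word list. The function names and the tiny word list are hypothetical; a classifier such as logistic regression would take features like these as input, and whether the result fits in 64 KiB is not addressed.

```python
def char_ngrams(word, n=2):
    """Character n-grams of a word, with ^ and $ marking its boundaries."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def known_ngram_set(words, n=2):
    """All n-grams that occur in a list of known words."""
    return {gram for word in words for gram in char_ngrams(word, n)}


def known_fraction(word, known, n=2):
    """Share of the word's n-grams that were seen in the known words."""
    grams = char_ngrams(word, n)
    return sum(gram in known for gram in grams) / len(grams)


known = known_ngram_set(["the", "then", "there", "hen", "her"])
score_word = known_fraction("here", known)
score_noise = known_fraction("xqzv", known)
```

A word built from familiar letter pairs scores near 1 and random strings score near 0, although short noise strings can still pass by chance.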

4

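How to use word2vec to identify unseen words and relate them to already trained data

I was working with a word2vec gensim model and found it really interesting. I am interested in finding out how an unknown/unseen word, when checked against the model, could get similar terms from the trained model. Is this possible? Can word2vec be tweaked for this? Or does the training corpus need to contain every word I want to find similarities for?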

11 nlp deep-learning word-embeddings unsupervised-learning
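A plain word2vec model has no vector for a word it never saw, so one workaround is to fall back on spelling. The sketch below is not word2vec itself: it maps an unseen word to the in-vocabulary word with the highest character-trigram Jaccard overlap, whose vector could then be used as a proxy. The function names and vocabulary are hypothetical. Subword-based models such as gensim's FastText class take a related approach by learning vectors for character n-grams.

```python
def char_trigrams(word):
    """Set of character trigrams, with < and > marking word boundaries."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def closest_known_word(word, vocabulary):
    """In-vocabulary word with the highest trigram Jaccard overlap."""
    target = char_trigrams(word)

    def jaccard(candidate):
        grams = char_trigrams(candidate)
        return len(target & grams) / len(target | grams)

    return max(vocabulary, key=jaccard)


# Hypothetical vocabulary of a trained model.
vocabulary = ["running", "apple", "network"]
proxy = closest_known_word("runing", vocabulary)
```

This only helps with misspellings and inflected forms; a genuinely new word with no similar spelling still needs to appear in the training corpus.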

1

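Applying word2vec on small text files

I'm totally new to word2vec, so please bear with me. I have a set of text files, each containing between 1000 and 3000 tweets. I have chosen a common keyword ("kw1") and want to find semantically relevant terms for "kw1" using word2vec. For example, if the keyword is "apple", I would expect to see related terms such as "ipad", "os", "mac"... based on the input files. So …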

11 machine-learning nlp text-mining

3

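What is the difference between a hashing vectorizer and a tfidf vectorizer?

I'm converting a corpus of text documents into word vectors for each document. I've tried this using a TfidfVectorizer and a HashingVectorizer. I understand that a HashingVectorizer does not take IDF scores into consideration the way a TfidfVectorizer does. The reason I'm still working with a HashingVectorizer is the flexibility it gives when dealing with huge datasets, as explained here and here. (My original dataset has 3 …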

11 nlp scikit-learn text-mining tfidf
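The core idea behind a hashing vectorizer can be shown in a few lines: each token is hashed straight to a column index, so no vocabulary has to be built or kept in memory. This sketch uses MD5 from the standard library for reproducibility and a hypothetical function name; scikit-learn's HashingVectorizer uses a different hash function and additional options, so the indices here will not match its output.

```python
import hashlib


def hashed_vector(tokens, n_features=16):
    """Count vector built with the hashing trick (no stored vocabulary)."""
    vector = [0] * n_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Map the hash to one of n_features columns; distinct tokens may
        # collide, which is the price of not keeping a vocabulary.
        index = int.from_bytes(digest[:4], "big") % n_features
        vector[index] += 1
    return vector


doc_a = hashed_vector("the cat sat on the mat".split())
doc_b = hashed_vector("the cat sat on the mat".split())
```

Because there is no fitted vocabulary, documents can be vectorized one at a time in a stream, but there are also no document frequencies to compute IDF from and no way to map a column back to its word.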

3

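Unsupervised feature learning for NER

I have implemented a NER system using the CRF algorithm with handcrafted features, and it gave quite good results. The thing is that I used lots of different features, including POS tags and lemmas. Now I want to build the same NER for a different language. The problem is that I can't use POS tags and lemmas there. Articles on deep learning and unsupervised feature learning …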

11 nlp text-mining feature-extraction

3

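How to process natural language queries?

I'm curious about natural language querying. Stanford has what looks like a strong set of software for natural language processing. I've also seen the Apache OpenNLP library and the General Architecture for Text Engineering. There is an incredible number of uses for natural language processing, which makes the documentation of these projects hard to absorb quickly. A brief, high-level outline of the tasks needed for a basic translation of simple questions into SQL …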

11 nlp

3

Best languages for scientific computing

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago. Most languages have some scientific computing libraries available. Python has Scipy, Rust has SciRust, C++ has several, including ViennaCL and Armadillo …

10 efficiency statistics tools knowledge-base

3

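Are Word2Vec and Doc2Vec both distributional representations or distributed representations?

I have read that distributional representation is based on the distributional hypothesis, that words occurring in similar contexts tend to have similar meanings. Word2Vec and Doc2Vec are both modeled according to this hypothesis. However, the original papers are even titled Distributed representation of words and phrases and Distributed representation of sentences and documents. So are these algorithms based on distributional representation or distributed representation? LDA …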

10 nlp word-embeddings terminology word2vec

Questions tagged «nlp»