Programming: beautifulsoup

28

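UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having trouble handling Unicode characters in text fetched from different web pages (on different sites). I am using BeautifulSoup. The problem is that the error is not always reproducible: sometimes it works with some pages, and sometimes it fails with a UnicodeEncodeError. I have tried just about everything I can think of, and I still have not found anything that works consistently without raising some Unicode-related error. The part causing the problem …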

1296 python unicode beautifulsoup python-2.x python-unicode
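A minimal sketch of the usual remedy for this class of error, written for Python 3 rather than the Python 2 of the question: encode explicitly instead of relying on an implicit ASCII conversion. The sample string is a hypothetical stand-in for scraped text containing a non-breaking space.

```python
# Hypothetical scraped text containing a non-breaking space (U+00A0).
text = "price:\xa0100"

# Encoding to ASCII fails because U+00A0 is outside the 0-127 range.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly to UTF-8 works for any Unicode text.
utf8_bytes = text.encode("utf-8")

# If ASCII output is really required, choose an error policy explicitly.
ascii_fallback = text.encode("ascii", errors="replace")

print(repr(utf8_bytes), repr(ascii_fallback))
```

Which error policy is appropriate ("replace", "ignore", "backslashreplace") depends on what the output is used for.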

16

How to find elements by class

I'm having trouble parsing HTML elements that have a "class" attribute using Beautifulsoup. The code looks like this: soup = BeautifulSoup(sdata) mydivs = soup.findAll('div') for div in mydivs: if (div["class"] == "stylelistrow"): print div I get an error on the same line "after" the script finishes. File "./beautifulcoding.py", line 130, in getlanguage if (div["class"] == "stylelistrow"): File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in …

386 python html web-scraping beautifulsoup
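One way to approach this with the current bs4 package (an assumption, since the question uses the older BeautifulSoup 3 module) is to filter by class in the search itself, or to read the attribute with .get() so that a div without a class does not raise KeyError. The markup below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div has no class attribute at all.
html = (
    '<div class="stylelistrow">row 1</div>'
    '<div>no class</div>'
    '<div class="stylelistrow other">row 2</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Let the search do the filtering; class_ matches any one of the classes.
rows = soup.find_all("div", class_="stylelistrow")
row_texts = [div.get_text() for div in rows]

# Equivalent manual loop: .get() returns a default instead of raising KeyError.
safe = [div for div in soup.find_all("div")
        if "stylelistrow" in div.get("class", [])]

print(row_texts)
```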

12

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

... soup = BeautifulSoup(html, "lxml") File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__ % ",".join(features)) bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library? The above is the output in my terminal. I am on Mac OS 10.7.x. I have Python 2.7.1 and, following this tutorial, Beautiful Soup …

224 python python-2.7 beautifulsoup lxml
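A hedged sketch of one workaround, assuming the lxml package may or may not be installed: catch FeatureNotFound and fall back to the parser bundled with Python. Installing lxml is the other option. The markup is a hypothetical sample.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = "<p>hello</p>"  # hypothetical sample markup

try:
    # Works only when the third-party lxml package is installed.
    soup = BeautifulSoup(html, "lxml")
    parser_used = "lxml"
except FeatureNotFound:
    # html.parser ships with the standard library, so it is always available.
    soup = BeautifulSoup(html, "html.parser")
    parser_used = "html.parser"

print(parser_used, soup.p.get_text())
```

Note that the two parsers can build slightly different trees from malformed HTML, so results may differ between machines.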

6

UnicodeEncodeError: 'charmap' codec can't encode characters

I'm trying to scrape a website, but I get an error. I'm using the following code: import urllib.request from bs4 import BeautifulSoup get = urllib.request.urlopen("https://www.website.com/") html = get.read() soup = BeautifulSoup(html) print(soup) And I get the following error: File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to …

205 python beautifulsoup urllib
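The traceback points at the cp1252 codec, which suggests the failure happens when print() encodes the text for the console. A minimal sketch of that behaviour and one way around it, using a hypothetical sample string instead of a real page:

```python
# Hypothetical text: U+2713 (check mark) has no cp1252 equivalent.
text = "caf\u00e9 \u2713"

# This is what print() effectively attempts on a cp1252 console.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Escaping unencodable characters keeps the output printable.
printable = text.encode("cp1252", errors="backslashreplace")

# Writing UTF-8 bytes (for example to a file) avoids the problem entirely.
utf8_bytes = text.encode("utf-8")

print(cp1252_ok, printable)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another option when the terminal can display UTF-8.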

5

TypeError: a bytes-like object is required, not 'str' in Python and CSV

TypeError: a bytes-like object is required, not 'str'. I get this error while running the Python code below to save HTML table data to a CSV file. I don't know how to get rid of it; please help. import csv import requests from bs4 import BeautifulSoup url='http://www.mapsofindia.com/districts-india/' response=requests.get(url) html=response.content soup=BeautifulSoup(html,'html.parser') table=soup.find('table', attrs={'class':'tableizer-table'}) list_of_rows=[] for row in table.findAll('tr')[1:]: list_of_cells=[] for cell in row.findAll('td'): …

173 csv python-3.x beautifulsoup html-table
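The excerpt is cut off before the file is opened, so the following is an assumption about the cause: in Python 3 the csv module writes str, and a file opened in binary mode ("wb") rejects it with this TypeError. A sketch with hypothetical rows and a temporary file:

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for the scraped table cells.
rows = [["District", "State"], ["Pune", "Maharashtra"]]

path = os.path.join(tempfile.mkdtemp(), "districts.csv")

# Text mode with newline="" is what the csv module expects in Python 3.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm the round trip.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)
```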

11

Extracting a div and its contents by ID with Beautiful Soup

soup.find("tagName", { "id" : "articlebody" }) Why does this not return the <div id="articlebody"> ... </div> tags and the stuff in between? It returns nothing. And I know for a fact that it exists, because I'm staring right at it in soup.prettify(). soup.find("div", { "id" : "articlebody" }) also does not work. (Edit: BeautifulSoup parsing my page correctly …

147 python beautifulsoup
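For comparison, a small sketch of the lookup on markup that parses cleanly (hypothetical sample HTML). If this pattern returns None on a real page, the parser's view of the document is a likely suspect, as the question's edit hints.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed markup.
html = '<html><body><div id="articlebody"><p>First paragraph</p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")

# id can be passed as a keyword argument or inside an attrs dictionary.
article = soup.find("div", id="articlebody")
same_article = soup.find("div", {"id": "articlebody"})

# find() returns None when nothing matches, so guard before using the result.
paragraph_text = article.p.get_text() if article is not None else None

print(paragraph_text)
```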

16

Retrieving links from a web page using Python and BeautifulSoup

How can I retrieve the links on a web page and copy their URL addresses using Python?

141 python web-scraping hyperlink beautifulsoup
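A minimal sketch of one common approach, using hypothetical markup in place of a downloaded page: search for anchor tags that actually carry an href attribute and read it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second anchor has no href.
html = (
    '<a href="https://example.com/a">A</a>'
    '<a name="anchor">no href</a>'
    '<a href="/b">B</a>'
)
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that have an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(links)
```

Relative URLs such as "/b" can be resolved against the page address with urllib.parse.urljoin.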

16

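ImportError: No module named bs4 (BeautifulSoup)

I'm working in Python and using Flask. When I run my main Python file on my computer, it works perfectly, but when I activate the venv and run the Flask Python file in the terminal, it says my main Python file has "No Module Named bs4". Any comments or advice would be greatly appreciated.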

138 python beautifulsoup flask importerror

8

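Difference between BeautifulSoup and the Scrapy crawler?

I want to build a website that shows a comparison between Amazon and eBay product prices. Which of these will work better, and why? I am somewhat familiar with BeautifulSoup but not so much with the Scrapy crawler.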

134 python beautifulsoup scrapy web-crawler

10

BeautifulSoup Grab Visible Webpage Text

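Basically, I want to use BeautifulSoup to grab strictly the visible text on a web page. For instance, this web page is my test case. I mainly want just the body text (the article) and maybe a few tab names here and there. I have tried the suggestion in this SO question, which returns lots of <script> tags and HTML comments that I don't want. …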

124 python text beautifulsoup html-content-extraction
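A rough sketch of one filtering approach, on hypothetical markup: walk every text node and drop comments and anything whose parent is a non-rendered tag. The HIDDEN_PARENTS set and the is_visible helper are illustrative names, and this heuristic ignores text hidden by CSS.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical page with script, style, title and an HTML comment.
html = (
    "<html><head><title>T</title><style>p{}</style>"
    "<script>var x=1;</script></head>"
    "<body><!-- note --><p>Hello</p><p>world</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Tags whose text is not rendered as page content (illustrative list).
HIDDEN_PARENTS = {"style", "script", "head", "title", "meta", "[document]"}

def is_visible(node):
    """Return True for text nodes a reader would normally see."""
    if isinstance(node, Comment):
        return False
    return node.parent.name not in HIDDEN_PARENTS

visible = [s.strip() for s in soup.find_all(string=True)
           if is_visible(s) and s.strip()]

print(visible)
```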

17

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

I'm practicing the code from 'Web Scraping with Python', and I keep running into this certificate problem. from urllib.request import urlopen from bs4 import BeautifulSoup import re pages = set() def getLinks(pageUrl): global pages html = urlopen("http://en.wikipedia.org"+pageUrl) bsObj = BeautifulSoup(html) for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")): if 'href' in link.attrs: if link.attrs['href'] not in pages: #We have …

123 python web-scraping beautifulsoup scrapy ssl-certificate

6

How to find the children of a node using BeautifulSoup

I want to get all the <a> tags that are children of <li>. <div> <li class="test"> <a>link1</a> <ul> <li> <a>link2</a> </li> </ul> </li> </div> I know how to find an element with a particular class like this: soup.find("li", { "class" : "test" }) But I don't know how to find all <a> that are children of <li class=test> and not of any others. …

115 python html beautifulsoup
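A short sketch using the markup from the question: passing recursive=False restricts find_all to direct children, so the nested link is excluded.

```python
from bs4 import BeautifulSoup

# Markup taken from the question, collapsed onto one line.
html = ('<div><li class="test"><a>link1</a>'
        '<ul><li><a>link2</a></li></ul></li></div>')
soup = BeautifulSoup(html, "html.parser")

li = soup.find("li", class_="test")

# Direct children only: the <a> nested inside <ul><li> is skipped.
direct = [a.get_text() for a in li.find_all("a", recursive=False)]

# Default behaviour searches all descendants.
all_links = [a.get_text() for a in li.find_all("a")]

print(direct, all_links)
```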

6

Extracting an attribute value with BeautifulSoup

I am trying to extract the content of a single "value" attribute from a specific "input" tag on a web page. I use the following code: import urllib f = urllib.urlopen("http://58.68.130.147") s = f.read() f.close() from BeautifulSoup import BeautifulStoneSoup soup = BeautifulStoneSoup(s) inputTag = soup.findAll(attrs={"name" : "stainfo"}) output = inputTag['value'] print str(output) TypeError: list indices must be integers, not str From the Beautifulsoup documentation I understand that strings …

111 python parsing attributes beautifulsoup
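The TypeError arises because findAll returns a list of tags, and a list cannot be indexed by the string 'value'. A sketch with the current bs4 package and hypothetical markup (the question itself uses BeautifulSoup 3):

```python
from bs4 import BeautifulSoup

# Hypothetical form containing the input named in the question.
html = '<form><input name="stainfo" value="42"/></form>'
soup = BeautifulSoup(html, "html.parser")

# find() returns a single tag (or None), which supports attribute lookup.
input_tag = soup.find(attrs={"name": "stainfo"})
value = input_tag["value"]

# With find_all(), index into the list or iterate over it first.
values = [tag["value"] for tag in soup.find_all(attrs={"name": "stainfo"})]

print(value, values)
```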

9

Can XPath be used with BeautifulSoup?

I am using BeautifulSoup to scrape a URL, and I have the following code: import urllib import urllib2 from BeautifulSoup import BeautifulSoup url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html" req = urllib2.Request(url) response = urllib2.urlopen(req) the_page = response.read() soup = BeautifulSoup(the_page) soup.findAll('td',attrs={'class':'empformbody'}) In the code above I can use findAll to get the tags and the information related to them, but I want to use XPath. Is it possible to use XPath with BeautifulSoup …

106 python xpath beautifulsoup urllib
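BeautifulSoup does not evaluate XPath expressions itself; libraries such as lxml do. As a hedged alternative that stays within bs4, CSS selectors via select() cover many of the same queries. The table below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical table using the class name from the question.
html = (
    '<table><tr><td class="empformbody">Alice</td><td>skip</td></tr>'
    '<tr><td class="empformbody">Bob</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# CSS selector roughly equivalent to the XPath //td[@class="empformbody"].
cells = soup.select("td.empformbody")
cell_texts = [td.get_text() for td in cells]

print(cell_texts)
```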

7

Python: BeautifulSoup - getting an attribute value based on the name attribute

I want to print an attribute value based on its name. Take for example <META NAME="City" content="Austin"> I want to do something like this: soup = BeautifulSoup(f) //f is some HTML containing the above meta tag for meta_tag in soup('meta'): if meta_tag['name'] == 'City': print meta_tag['content'] The above code gives a KeyError: 'name'. I believe this is because name is used by BeautifulSoup, so it can't be used as a keyword argument …

95 python beautifulsoup
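A minimal sketch of one way around both issues, on hypothetical markup: pass the name filter through attrs (since name is already the first parameter of find and find_all), and read attributes with .get() so that meta tags without a name do not raise KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical head section; the first meta tag has no name attribute.
html = ('<head><meta charset="utf-8">'
        '<META NAME="City" content="Austin"></head>')
soup = BeautifulSoup(html, "html.parser")

# attrs avoids the clash with the "name" parameter of find().
city_tag = soup.find("meta", attrs={"name": "City"})
city = city_tag["content"] if city_tag is not None else None

# Loop variant: .get() returns None for tags that lack the attribute.
cities = [m["content"] for m in soup.find_all("meta")
          if m.get("name") == "City"]

print(city, cities)
```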

Questions tagged «beautifulsoup»