152

I know the URL of an image on the Internet.

For example: http://www.digimouth.com/news/media/2011/09/google-logo.jpg (contains the Google logo)

Now, how can I download this image using Python without actually opening the URL in a browser and saving the file manually?

python web-scraping

— 판 카이 밧사

1

Downloading a file over HTTP using Python

— Jaydev

316

Python 2

Here is a more straightforward way if all you want to do is save it as a file:

import urllib

urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work with Python 3.

import urllib.request

urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

— Liquid_Fire

55

A good way to get the filename from the link is filename = link.split('/')[-1]

— heltonbiker
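split('/')[-1] keeps any query string or fragment in the name, so a link like .../photo.jpg?w=200 yields "photo.jpg?w=200". A minimal Python 3 sketch that strips those parts first with urllib.parse; the helper name filename_from_url and its fallback name "download.bin" are hypothetical:

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of url, without query string or fragment.

    Hypothetical helper: falls back to `default` when the path ends in '/'.
    """
    path = urlsplit(url).path        # drops scheme, host, ?query and #fragment
    name = basename(unquote(path))   # decode %20 etc., then keep the last segment
    return name or default


print(filename_from_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg"))
# google-logo.jpg
```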

2

Using urlretrieve I just get a 1KB file containing a dict and 404 error text. If I enter the URL in my browser, I can see the picture.

— Yebach

2

@Yebach: The site you are downloading from may use cookies, the User-Agent, or other headers to decide what content to serve you. These will differ between your browser and Python.

— Liquid_Fire 2016
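Building on the comment above, a minimal Python 3 sketch using only the standard library: it sends a browser-like User-Agent and refuses to save a response whose Content-Type is not image/*, so an HTML error page is never written out under an image name. The User-Agent string and the function names are placeholders, and some sites also need cookies or other headers that this does not send:

```python
import shutil
import urllib.request

# Placeholder browser-like User-Agent (an assumption, not a required value); some servers
# answer Python's default "Python-urllib/3.x" agent with 403 or an HTML error page.
DEFAULT_UA = "Mozilla/5.0 (compatible; image-downloader-sketch)"


def build_request(url, user_agent=DEFAULT_UA):
    """Wrap url in a Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image(url, local_path, user_agent=DEFAULT_UA, timeout=10):
    """Save url to local_path, refusing responses whose Content-Type is not image/*."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        content_type = resp.info().get_content_type()  # "text/plain" if the header is missing
        if not content_type.startswith("image/"):
            raise ValueError(f"expected an image, got {content_type!r} from {url}")
        with open(local_path, "wb") as out:  # created only after the type check passed
            shutil.copyfileobj(resp, out)
    return local_path
```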

27

Python 3: import urllib.request and urllib.request.urlretrieve().

— SergO

1

@SergO - can you add the Python 3 part to the original answer?

— Sreejith Menon

27

import urllib
resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg","wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.

— Noufal Ibrahim

2

You need to open the file in binary mode: open("file01.jpg", "wb"). Otherwise you may corrupt the image.

— Liquid_Fire

2

urllib.urlretrieve can save the image directly.

— heltonbiker
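The urlopen-and-write approach above can be sketched for Python 3 with only the standard library. Context managers close the response and the file even if an error occurs, and the file is opened in "wb" mode: Python 3 refuses to write bytes to a text-mode file, and in Python 2 on Windows text mode translates line endings and corrupts the image. save_url is a hypothetical name:

```python
import shutil
from urllib.request import urlopen


def save_url(url, local_path, chunk_size=64 * 1024):
    """Stream the body of url into local_path, opened in binary ("wb") mode."""
    # Both the response and the file are closed automatically, even if an exception occurs.
    with urlopen(url) as resource, open(local_path, "wb") as output:
        # copyfileobj copies chunk_size bytes at a time instead of reading everything at once.
        shutil.copyfileobj(resource, output, chunk_size)
```

For example: save_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "file01.jpg").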

17

I wrote a script that does exactly this, and it is available on my GitHub for your use.

I used BeautifulSoup to parse websites for images. If you will be doing much web scraping (or intend to use my tool), I suggest you sudo pip install beautifulsoup4. Information on BeautifulSoup is available here.

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib

# use this image scraper from the location that 
#you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    #compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

#a standard call looks like this
#get_images('http://www.wookmark.com')

— Yup.
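The src values collected above are often relative (for example /media/logo.png), and urlretrieve cannot fetch those on their own. A sketch of the extraction step in Python 3 that resolves them with urljoin; it swaps in the standard library's html.parser for BeautifulSoup to avoid the dependency, and ImgSrcCollector and extract_image_urls are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a usable src
                # A relative path only becomes downloadable once joined with the page URL.
                self.urls.append(urljoin(self.base_url, src.strip()))


def extract_image_urls(html, base_url):
    """Return absolute URLs for all <img src> values found in html."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```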

11

You can do this with requests. Load the page and dump the binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)

— AlexG

1

If the request fails, set user headers in requests :)

— 1UC1F3R616

8

Python 3

urllib.request: Extensible library for opening URLs

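from urllib.error import HTTPError
from urllib.request import urlretrieve

image_url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"  # example URL from the question
image_local_path = "local-filename.jpg"
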
try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)   # something wrong with local path
except HTTPError as err:
    print(err)  # something wrong with url

— SergO
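urlretrieve can also raise URLError (for example a DNS failure or a refused connection) when there is no HTTP response at all, which the code above does not catch. Because HTTPError subclasses URLError, which in turn subclasses OSError, the except clauses must go from most to least specific. A sketch; safe_download and its True/False return value are assumptions:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve


def safe_download(url, local_path):
    """Try to save url to local_path; return True on success, False otherwise."""
    try:
        urlretrieve(url, local_path)
    except HTTPError as err:   # the server answered, but with an error status (404, 403, ...)
        print("HTTP error:", err.code, err.reason)
    except URLError as err:    # no usable response: DNS failure, refused connection, missing file:// path
        print("URL error:", err.reason)
    except OSError as err:     # local problem, e.g. the target directory does not exist
        print("Local file error:", err)
    else:
        return True
    return False
```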

6

A solution that works with both Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

Or, if the additional requirement of requests is acceptable and the URL is http(s):

def load_requests(source_url, sink_path):
    """
    Load a file from a URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

— Martin Thoma

5

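I made a script that expands on Yup.'s script. I fixed a few things: it now bypasses the 403: Forbidden problem, it won't crash when an image fails to download, it tries to avoid corrupted previews, it gets the right absolute URLs, it prints more information, and it can be run with an argument from the command line.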

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print '  An error occurred. Continuing.'
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

— madprops

3

Using the requests library:

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')#saving images to Images folder
os.makedirs(path, exist_ok=True)  # create the Images folder if it doesn't exist yet

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)


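url = 'http://www.digimouth.com/news/media/2011/09/google-logo.jpg'  # example URL from the question
ImageDl(url)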

— 소한 다스

In my case the headers really seemed to matter: I was getting a 403 error, and this worked.

— Ishtiyaq Husain
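The retry loop in the answer above can also be sketched with only the standard library, adding an exponentially growing pause between attempts. This is a hypothetical variant, not the answer's code: download_with_retries and base_delay are made-up names, and it retries on every OSError, including 404s that will never succeed, so a stricter version might give up immediately on 4xx statuses:

```python
import time
from urllib.request import urlopen


def download_with_retries(url, local_path, attempts=5, base_delay=1.0):
    """Save url to local_path, retrying with exponential backoff.

    Returns True on success, False once every attempt has failed.
    """
    for attempt in range(attempts):
        try:
            with urlopen(url, timeout=5) as resp:
                data = resp.read()  # whole body in memory; fine for typical images
            with open(local_path, "wb") as f:  # file is only created once the body is complete
                f.write(data)
            return True
        except OSError as err:  # URLError, HTTPError and socket timeouts are all OSError subclasses
            print(f"attempt {attempt + 1}/{attempts} failed: {err}")
            if attempt + 1 < attempts:
                time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the default
    return False
```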

2

This is a very short answer.

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

— OO7

2

Python 3 version

I adapted @madprops' code for Python 3.

# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib.request.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print('Getting: ' + filename)
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except Exception:
            print('  An error occurred. Continuing.')
    print('Done.')

if __name__ == '__main__':
    get_images(sys.argv[1])

— Giovanni G. PY

1

New for Python 3, using requests:

Comments are in the code. The function is ready to use.

import requests
from os import path

def get_image(image_url):
    """
    Get image based on url.
    :return: Image name if everything OK, False otherwise
    """
    image_name = path.split(image_url)[1]
    try:
        image = requests.get(image_url)
    except OSError:  # A little too broad, but works OK with no additional imports needed. Catches all connection problems
        return False
    if image.status_code == 200:  # we could have retrieved an error page
        base_dir = path.join(path.dirname(path.realpath(__file__)), "images") # Use your own path or "" to use current working directory. Folder must exist.
        with open(path.join(base_dir, image_name), "wb") as f:
            f.write(image.content)
        return image_name
    return False  # non-200 status: nothing was saved

get_image("https://apod.nasddfda.gov/apod/image/2003/S106_Mishra_1947.jpg")

— 파벨 판 코차

0

Late answer, but for python>=3.6 you can use dload:

import dload
dload.save("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

If you need the image as bytes, use:

img_bytes = dload.bytes("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

Install it using pip3 install dload

— CONvid19
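If you would rather not add a dependency, the standard library can also return the image as bytes. A minimal sketch; fetch_bytes is a hypothetical name, and the whole image is held in memory:

```python
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Return the response body of url as bytes."""
    # The with block closes the connection once the body has been read.
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example: img_bytes = fetch_bytes("http://www.digimouth.com/news/media/2011/09/google-logo.jpg").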

-2

import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg')

with open('file_name.jpg', 'wb') as handler:
    handler.write(img_data.content)

— 루이스 만

4

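Welcome to Stack Overflow! While this may solve the user's problem, code-only answers are not very helpful to users who come to this question in the future. Please edit your answer to explain why your code solves the original problem.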

— Joe C

1

TypeError: a bytes-like object is required, not 'Response'. It should be handler.write(img_data.content)

— TitanFighter

It should be handler.write(img_data.read()).

— jdhao

How to save an image locally using Python whose URL address I already know?
