urllib.request Python tutorial

What is urllib?

Since Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has no timeout and can hang. Currently, the socket timeout is not exposed at the http.client or urllib.request levels. However, you can set the default timeout globally for all sockets through the socket module.
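As a minimal sketch of the global timeout described above, assuming the standard-library function socket.setdefaulttimeout is the call in question (the 10-second value is only an illustration):

```python
import socket

# Hypothetical value: wait at most 10 seconds on any new socket.
timeout_seconds = 10
socket.setdefaulttimeout(timeout_seconds)

# Every socket created from now on, including those opened indirectly
# by urllib.request, starts with this timeout.
current_timeout = socket.getdefaulttimeout()
print(current_timeout)
```

In current Python versions urlopen also accepts a timeout argument, which limits the wait for a single call without changing the global default.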

urllib is a Python module that can be used to open URLs. It defines functions and classes that help with URL handling.

With Python you can also access and retrieve data from the internet, such as XML, HTML, JSON, etc., and work with that data directly. In this tutorial we will see how to retrieve data from the web. As an example, we use the guru99.com video URL: we access the URL and print the HTML of the page using Python.

In this tutorial we will learn:

How to open a URL using urllib
How to read an HTML file from a URL in Python

How to open a URL using urllib

Before we run the code to connect to Internet data, we need to import the URL-handling library module, "urllib".
Import the "urllib" module.
Define the main function.
Declare the variable "webUrl".
Then call the urlopen function from the urllib library.
The URL we are opening is the guru99 tutorial channel on YouTube.
Next, we print the result code.
The result code is retrieved by calling the "getcode" function on the webUrl variable we just created.
We convert it to a string so that it can be concatenated with the "result code" string.

This will be a regular HTTP code "200", indicating that the HTTP request was processed successfully.

How to get the HTML file from a URL in Python

You can also read the HTML content by using the "read" function in Python; when you run the code, the HTML data is printed.
Call the read function on the webUrl variable.
Reading the response returns the contents of the page that the URL points to.
Store the entire content returned by the URL in a variable, data.

Run the code and it prints the data in HTML form to the screen.

Here is the complete code (Python 2 version):

#
# read the data from the URL and print it (Python 2)
#
import urllib2

def main():
    # open a connection to a URL using urllib2
    webUrl = urllib2.urlopen("https://www.youtube.com/user/guru99com")

    # get the result code and print it
    print "result code: " + str(webUrl.getcode())

    # read the data from the URL and print it
    data = webUrl.read()
    print data

if __name__ == "__main__":
    main()

Python 3 example:

#
# read the data from the URL and print it
#
import urllib.request
# open a connection to a URL using urllib
webUrl  = urllib.request.urlopen('https://www.youtube.com/user/guru99com')

#get the result code and print it
print ("result code: " + str(webUrl.getcode()))

# read the data from the URL and print it
data = webUrl.read()
print (data)
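In Python 3, read() returns bytes rather than text. The sketch below shows one way to close the connection with a with block and decode the result. The helper name fetch_text and the UTF-8 default are assumptions made for illustration, and a data: URL is used only so the example runs without a network connection.

```python
import urllib.request


def fetch_text(url, encoding="utf-8"):
    """Open url, read the raw bytes, and decode them to str."""
    # The with block closes the connection even if read() fails.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
    # Assumed encoding; undecodable bytes are replaced, not raised.
    return raw.decode(encoding, errors="replace")


# A data: URL keeps the demo offline; an http(s) URL is passed the same way.
text = fetch_text("data:text/plain;charset=utf-8,hello%20world")
print(text)
```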

Browser sniffing is a very bad practice for website design - building sites using web standards is much more sensible. Unfortunately a lot of sites still send different versions to different browsers.

The user agent for MSIE 6 is 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'.

For details of more HTTP request headers, see Quick Reference to HTTP Headers.
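A request header such as the user agent can be attached to a Request object through its headers argument. This is a minimal sketch: the example.com URL is a placeholder, and nothing is sent over the network.

```python
import urllib.request

user_agent = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
              "SV1; .NET CLR 1.1.4322)")

# Placeholder URL; the request object is only built, never sent.
req = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": user_agent},
)

# Request stores header names capitalized, so the key is "User-agent".
stored_agent = req.get_header("User-agent")
print(stored_agent)
```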

In my case I have to use a proxy to access the internet at work. If you attempt to fetch localhost URLs through this proxy it blocks them. IE is set to use the proxy, which urllib picks up on. In order to test scripts with a localhost server, I have to prevent urllib from using the proxy.
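One way to keep urllib from using a proxy it has detected is to build an opener around a ProxyHandler with an empty mapping. This is a sketch of that approach, not necessarily the exact setup the author used:

```python
import urllib.request

# An empty dict means "no proxies", overriding any detected settings.
proxy_support = urllib.request.ProxyHandler({})
opener = urllib.request.build_opener(proxy_support)

# opener.open(url) now bypasses the proxy for that call only.
# urllib.request.install_opener(opener) would apply it to urlopen as well.
print(type(opener).__name__)
```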

urllib opener for SSL proxy (CONNECT method): ASPN Cookbook Recipe.

Author

Michael Foord

Introduction

urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function. This is capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations - like basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers.

urllib.request supports fetching URLs for many "URL schemes" (identified by the string before the ":" in the URL - for example "ftp" is the URL scheme of "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most common case, HTTP.

RFC 2616 is a technical document and not intended to be easy to read. This HOWTO aims to illustrate using urllib, with enough detail about HTTP to help you through. It is not intended to replace the urllib.request docs, but is supplementary to them.

The simplest way to use urllib.request is as follows:

import urllib.request
with urllib.request.urlopen('http://python.org/') as response:
   html = response.read()

To retrieve a resource from a URL and store it in a temporary location, you can combine shutil.copyfileobj() with tempfile.NamedTemporaryFile():

import shutil
import tempfile
import urllib.request

with urllib.request.urlopen('http://python.org/') as response:
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        shutil.copyfileobj(response, tmp_file)

with open(tmp_file.name) as html:
    pass

HTTP is based on requests and responses - the client makes requests and servers send responses. urllib.request mirrors this with a Request object which represents the HTTP request you are making. In its simplest form you create a Request object that specifies the URL you want to fetch. Calling urlopen with this Request object returns a response object for the URL requested. This response is a file-like object, which means you can for example call .read() on the response:

import urllib.request

req = urllib.request.Request('http://www.voidspace.org.uk')
with urllib.request.urlopen(req) as response:
   the_page = response.read()

Note that urllib.request makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like so:

req = urllib.request.Request('ftp://example.com/')
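To illustrate that the same Request class covers different schemes, the sketch below builds one HTTP and one FTP request and inspects them without contacting any server. The attribute names type and host and the method get_method() belong to urllib.request.Request.

```python
import urllib.request

http_req = urllib.request.Request("http://www.voidspace.org.uk")
ftp_req = urllib.request.Request("ftp://example.com/")

# Both objects expose the same interface regardless of scheme.
for req in (http_req, ftp_req):
    print(req.type, req.host, req.get_method())
```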

In the case of HTTP, there are two extra things that Request objects allow you to do: First, you can pass data to be sent to the server. Second, you can pass extra information ("metadata") about the data or about the request itself, to the server - this information is sent as HTTP "headers". Let's look at each of these in turn.

Data

Sometimes you want to send data to a URL (often the URL will refer to a CGI (Common Gateway Interface) script or other web application). With HTTP, this is often done using what's known as a POST request. This is often what your browser does when you submit an HTML form that you filled in on the web. Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way, and then passed to the Request object as the data argument. The encoding is done using a function from the urllib.parse library.

import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }

data = urllib.parse.urlencode(values)
data = data.encode('ascii') # data should be bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
   the_page = response.read()
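The effect of the data argument can be checked without sending anything. This sketch reuses the values from the example above and only inspects the Request objects:

```python
import urllib.parse
import urllib.request

url = "http://www.someserver.com/cgi-bin/register.cgi"
values = {"name": "Michael Foord",
          "location": "Northampton",
          "language": "Python"}

# urlencode produces a str; Request needs bytes for the body.
data = urllib.parse.urlencode(values).encode("ascii")

post_req = urllib.request.Request(url, data)
get_req = urllib.request.Request(url)  # no data argument

print(post_req.get_method())  # request with a body
print(get_req.get_method())   # request without a body
print(post_req.data)
```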

Note that other encodings are sometimes required (e.g. for file upload from HTML forms - see HTML Specification, Form Submission for more details).

If you do not pass the data argument, urllib uses a GET request. One way in which GET and POST requests differ is that POST requests often have "side-effects": they change the state of the system in some way (for example by placing an order with the website for a hundredweight of tinned spam to be delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to always cause side-effects, and GET requests never to cause side-effects, nothing prevents a GET request from having side-effects, nor a POST request from having no side-effects. Data can also be passed in an HTTP GET request by encoding it in the URL itself.

This is done as follows:

>>> import urllib.request
>>> import urllib.parse
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.parse.urlencode(data)
>>> print(url_values)  # The order may differ from below.  
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> data = urllib.request.urlopen(full_url)

Note that the full URL is created by adding a ? to the URL, followed by the encoded values.
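As a small offline check of this construction, the sketch below builds the full URL and then parses the query string back with urllib.parse, confirming that the values survive the round trip:

```python
import urllib.parse

params = {"name": "Somebody Here",
          "location": "Northampton",
          "language": "Python"}

# Encode the values and append them after a "?".
query = urllib.parse.urlencode(params)
full_url = "http://www.example.com/example.cgi" + "?" + query
print(full_url)

# Parse the query back; parse_qs returns a list of values for each key.
parts = urllib.parse.urlsplit(full_url)
decoded = urllib.parse.parse_qs(parts.query)
print(decoded)
```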

Handling exceptions

urlopen raises URLError when it cannot handle a response (though, as usual with Python APIs, built-in exceptions such as ValueError, TypeError etc. may also be raised).

HTTPError is the subclass of URLError raised in the specific case of HTTP URLs.

The exception classes are exported from the urllib.error module.

URLError

Often, URLError is raised because there is no network connection (no route to the specified server), or the specified server doesn't exist. In this case, the exception raised will have a 'reason' attribute, which is a tuple containing an error code and a text error message.

For example:

>>> req = urllib.request.Request('http://www.pretend_server.org')
>>> try: urllib.request.urlopen(req)
... except urllib.error.URLError as e:
...     print(e.reason)      
...
(4, 'getaddrinfo failed')

HTTPError

Every HTTP response from the server contains a numeric "status code". Sometimes the status code indicates that the server is unable to fulfil the request. The default handlers will handle some of these responses for you (for example, if the response is a "redirection" that requests the client fetch the document from a different URL, urllib will handle that for you). For those it can't handle, urlopen will raise an HTTPError. Typical errors include '404' (page not found), '403' (request forbidden), and '401' (authentication required).

See section 10 of RFC 2616 for a reference on all the HTTP error codes.

The HTTPError instance raised will have an integer 'code' attribute, which corresponds to the error sent by the server.

Error Codes

Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.

http.server.BaseHTTPRequestHandler.responses is a useful dictionary of response codes that shows all the response codes used by RFC 2616. The dictionary is reproduced here for convenience:

# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),

    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted',
          'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),

    300: ('Multiple Choices',
          'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified',
          'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this '
          'resource.'),
    307: ('Temporary Redirect',
          'Object moved temporarily -- see URI list'),

    400: ('Bad Request',
          'Bad request syntax or unsupported method'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment Required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
    }
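The dictionary can be used to turn a numeric code into a readable message. The following is a small sketch; the helper name describe_status and its fallback text are made up for illustration.

```python
from http.server import BaseHTTPRequestHandler


def describe_status(code):
    """Return 'code short: long' for a known status code."""
    entry = BaseHTTPRequestHandler.responses.get(code)
    if entry is None:
        return "%d (unknown status code)" % code
    short_message, long_message = entry
    return "%d %s: %s" % (code, short_message, long_message)


print(describe_status(404))
print(describe_status(999))
```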

When an error is raised the server responds by returning an HTTP error code and an error page. You can use the HTTPError instance as a response on the page returned. This means that as well as the code attribute, it also has read, geturl, and info methods, as returned by the urllib.response module.
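The sketch below illustrates treating an HTTPError as a response object. To stay runnable without a network, it swaps in a hypothetical fake_opener that raises a hand-built 404 error; the function name fetch_or_report and the opener parameter are likewise assumptions for illustration, not part of urllib.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_or_report(url, opener=urllib.request.urlopen):
    """Return (status, body), reading the error page on HTTPError."""
    try:
        with opener(url) as response:
            return response.getcode(), response.read()
    except urllib.error.HTTPError as e:
        # The exception doubles as a response for the error page.
        return e.code, e.read()


def fake_opener(url):
    # Hypothetical stand-in for a server that answers 404.
    raise urllib.error.HTTPError(
        url, 404, "Not Found", Message(),
        io.BytesIO(b"<html>no such page</html>"),
    )


status, body = fetch_or_report(
    "http://www.example.com/missing.html", opener=fake_opener)
print(status)
print(body)
```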

Wrapping it up

So if you want to be prepared for HTTPError or URLError there are two basic approaches. I prefer the second approach.

Number 1
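A sketch of the first approach, with a separate except clause for each exception class. The function name, the opener parameter, and the two fake openers are assumptions added so the example runs offline; it is not the author's original listing.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_two_clauses(url, opener=urllib.request.urlopen):
    """Approach 1: separate except clauses, HTTPError listed first."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status.
        return "The server couldn't fulfill the request. Error code: %d" % e.code
    except urllib.error.URLError as e:
        # No answer at all: no connection, unknown host, and so on.
        return "We failed to reach a server. Reason: %s" % (e.reason,)
    else:
        with response:
            return "Fetched %d bytes" % len(response.read())


# Hypothetical stand-ins for urlopen so the demo needs no network.
def refuse(url):
    raise urllib.error.HTTPError(url, 403, "Forbidden", Message(), io.BytesIO(b""))


def unreachable(url):
    raise urllib.error.URLError("no route to host")


result_http = fetch_two_clauses("http://www.example.com/", opener=refuse)
result_url = fetch_two_clauses("http://www.example.com/", opener=unreachable)
print(result_http)
print(result_url)
```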

Note

The except HTTPError clause must come first, otherwise except URLError will also catch an HTTPError.

Number 2
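A sketch of the second approach, which catches URLError once and then tells the two cases apart by looking for the code attribute. As with the previous sketch, the function name, the opener parameter, and the fake openers are illustrative assumptions rather than the author's original listing.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def fetch_single_clause(url, opener=urllib.request.urlopen):
    """Approach 2: one except clause that inspects the exception."""
    try:
        response = opener(url)
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            # Only HTTPError carries a numeric status code.
            return "The server couldn't fulfill the request. Error code: %d" % e.code
        return "We failed to reach a server. Reason: %s" % (e.reason,)
    else:
        with response:
            return "Fetched %d bytes" % len(response.read())


# Hypothetical stand-ins for urlopen so the demo needs no network.
def missing(url):
    raise urllib.error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO(b""))


def unreachable(url):
    raise urllib.error.URLError("getaddrinfo failed")


single_http = fetch_single_clause("http://www.example.com/", opener=missing)
single_url = fetch_single_clause("http://www.example.com/", opener=unreachable)
print(single_http)
print(single_url)
```

Checking for code before reason matters here, because an HTTPError also exposes a reason attribute in current Python versions.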

thông tin và geturl¶

The response returned by urlopen (or the HTTPError instance) has two useful methods, info() and geturl(), and is defined in the module urllib.response.

geturl - this returns the real URL of the page fetched. This is useful because urlopen (or the opener object used) may have followed a redirect. The URL of the page fetched may not be the same as the URL requested.

info - this returns a dictionary-like object that describes the page fetched, particularly the headers sent by the server. It is currently an http.client.HTTPMessage instance.

Typical headers include 'Content-length', 'Content-type', and so on. See the Quick Reference to HTTP Headers for a useful listing of HTTP headers with brief explanations of their meaning and use.
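The two methods can be tried without a network connection. The sketch below is only an illustration: it assumes a data: URL with a made-up payload as a stand-in for a real page, so no redirect happens and geturl() simply returns the URL that was passed in. For a data: URL the header object is a plain email message rather than an http.client.HTTPMessage, but it is queried the same way.

```python
import urllib.request

# A data: URL keeps the sketch offline; the payload "hello" is made up.
url = "data:text/plain;charset=utf-8,hello"

with urllib.request.urlopen(url) as response:
    final_url = response.geturl()            # URL actually fetched
    headers = response.info()                # dictionary-like header object
    content_type = headers["Content-type"]   # header lookup is case-insensitive
    body = response.read()

print(final_url)
print(content_type)
print(body)
```

With a real http URL, comparing geturl() against the requested URL is a simple way to notice that a redirect was followed.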

Openers and Handlers

When you fetch a URL you use an opener (an instance of the perhaps confusingly named urllib.request.OpenerDirector). Normally we have been using the default opener, via urlopen, but you can create custom openers. Openers use handlers. All the "heavy lifting" is done by the handlers. Each handler knows how to open URLs for a particular URL scheme (http, ftp, etc.), or how to handle an aspect of URL opening, for example HTTP redirections or HTTP cookies.

You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or to get an opener that does not handle redirections.

To create an opener, instantiate an OpenerDirector, and then call .add_handler(some_handler_instance) repeatedly.

Alternatively, you can use build_opener, which is a convenience function for creating opener objects with a single function call. build_opener adds several handlers by default, but provides a quick way to add more and/or override the default handlers.

Other sorts of handlers you might want can handle proxies, authentication, and other common but slightly specialised situations.

install_opener can be used to make an opener object the (global) default opener. This means that calls to urlopen will use the opener you have installed.

Opener objects have an open method, which can be called directly to fetch URLs in the same way as the urlopen function: there's no need to call install_opener, except as a convenience.
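The three routes described above (a hand-built OpenerDirector, build_opener, and install_opener) can be sketched as follows. This is a minimal illustration rather than a recommended setup: the data: URL and its payload are assumptions chosen so the code runs offline, and the cookie handler merely stands in for "some specific handler".

```python
import urllib.request
from http.cookiejar import CookieJar

# Offline stand-in for a real page; the payload is made up for this sketch.
url = "data:text/plain,opened"

# Route 1: build the opener by hand, one handler at a time.
manual_opener = urllib.request.OpenerDirector()
manual_opener.add_handler(urllib.request.DataHandler())
with manual_opener.open(url) as response:
    manual_body = response.read()

# Route 2: build_opener() starts from the default handlers and adds ours.
cookie_jar = CookieJar()
cookie_opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)
with cookie_opener.open(url) as response:
    built_body = response.read()

# Optional: make the custom opener the global default used by urlopen().
urllib.request.install_opener(cookie_opener)
with urllib.request.urlopen(url) as response:
    default_body = response.read()
```

The hand-built opener knows only the data: scheme, because that is the only handler it was given; the one from build_opener also handles http, ftp, file and the other defaults.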

Basic Authentication

To illustrate creating and installing a handler we will use the HTTPBasicAuthHandler. For a more detailed discussion of this subject, including an explanation of how Basic Authentication works, see the Basic Authentication Tutorial.

When authentication is required, the server sends a header (as well as the 401 error code) requesting authentication. This specifies the authentication scheme and a 'realm'. The header looks like: WWW-Authenticate: SCHEME realm="REALM".

e.g.

WWW-Authenticate: Basic realm="cPanel Users"

The client should then retry the request with the appropriate name and password for the realm included as a header in the request. This is 'basic authentication'. In order to simplify this process we can create an instance of HTTPBasicAuthHandler and an opener to use this handler.
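The retry step above amounts to sending an Authorization header whose value is the word Basic followed by the base64 encoding of "name:password". The sketch below shows roughly what the handler builds on your behalf. The helper name basic_auth_header and the credentials are hypothetical, and the UTF-8 encoding is an assumption of this sketch.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an Authorization header for Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


# Made-up credentials, for illustration only.
header_value = basic_auth_header("user", "pass")
print(header_value)
```

Base64 is an encoding, not encryption, so Basic authentication only protects the credentials when the connection itself is encrypted.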

The HTTPBasicAuthHandler uses an object called a password manager to handle the mapping of URLs and realms to passwords and usernames. If you know what the realm is (from the authentication header sent by the server), then you can use a HTTPPasswordMgr. Frequently one doesn't care what the realm is. In that case, it is convenient to use HTTPPasswordMgrWithDefaultRealm. This allows you to specify a default username and password for a URL. This will be supplied in the absence of you providing an alternative combination for a specific realm. We indicate this by providing None as the realm argument to the add_password method.

The top-level URL is the first URL that requires authentication. URLs "deeper" than the URL you pass to .add_password() will also match.
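The matching rule can be checked without contacting a server by asking the password manager directly. In this sketch the URLs, the realm string and the credentials are all placeholders; find_user_password is the lookup the handler performs when a server challenges a request.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Register placeholder credentials for a top-level URL, with no specific realm.
password_mgr.add_password(None, "http://example.com/foo/", "user", "secret")

# A URL "deeper" than the registered one matches, whatever the realm is.
deeper = password_mgr.find_user_password(
    "Any Realm", "http://example.com/foo/bar/page.html"
)

# A URL outside the registered path does not match.
outside = password_mgr.find_user_password(
    "Any Realm", "http://example.com/other/"
)

print(deeper)
print(outside)
```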

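import urllib.request

# Placeholder credentials and URLs; substitute real values.
username = 'user'
password = 'secret'
top_level_url = "http://example.com/foo/"
a_url = "http://example.com/foo/bar.html"

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)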

Note

In the above example we only supplied our HTTPBasicAuthHandler to build_opener. By default openers have the handlers for normal situations: ProxyHandler (if a proxy setting such as an http_proxy environment variable is set), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, DataHandler, HTTPErrorProcessor.


Proxies

urllib will auto-detect your proxy settings and use those. This is through the ProxyHandler, which is part of the normal handler chain when a proxy setting is detected. Normally that's a good thing, but there are occasions when it may not be helpful [5]. One way to do this is to set up our own ProxyHandler, with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler:

>>> import urllib.request
>>> proxy_support = urllib.request.ProxyHandler({})
>>> opener = urllib.request.build_opener(proxy_support)
>>> urllib.request.install_opener(opener)
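To see what an empty ProxyHandler changes, the sketch below simulates an environment with a proxy configured and compares a default handler with one given an empty mapping. The proxy address is hypothetical, the environment is patched only inside the with block, and proxies is the mapping the handler stores at construction (an implementation attribute, inspected here purely for illustration).

```python
import os
import urllib.request
from unittest import mock

# Hypothetical proxy address, used only to simulate a configured environment.
fake_env = {"http_proxy": "http://proxy.invalid:3128"}

with mock.patch.dict(os.environ, fake_env, clear=True):
    detected = urllib.request.ProxyHandler()     # auto-detects the setting
    disabled = urllib.request.ProxyHandler({})   # explicit: no proxies at all

    detected_proxies = dict(detected.proxies)
    disabled_proxies = dict(disabled.proxies)

    # Passing our handler replaces the default ProxyHandler in the chain.
    opener = urllib.request.build_opener(disabled)

print(detected_proxies)
print(disabled_proxies)
```

Because build_opener drops a default handler when an instance of the same class is supplied, requests made through this opener go directly to the target host.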

Note

Currently urllib.request does not support fetching of https locations through a proxy. However, this can be enabled by extending urllib.request as shown in the recipe [6].


Note

HTTP_PROXY will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies().
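The effect of REQUEST_METHOD can be observed with a patched environment. This is a sketch under assumptions: the proxy address is made up, and it calls urllib.request.getproxies_environment(), the environment-only lookup that getproxies() relies on, so the result does not depend on system-wide proxy settings.

```python
import os
import urllib.request
from unittest import mock

proxy = "http://proxy.invalid:3128"  # hypothetical proxy address

# Ordinary process: the upper-case HTTP_PROXY variable is honoured.
with mock.patch.dict(os.environ, {"HTTP_PROXY": proxy}, clear=True):
    normal = urllib.request.getproxies_environment()

# CGI-like process: REQUEST_METHOD is set, so HTTP_PROXY is ignored.
cgi_env = {"HTTP_PROXY": proxy, "REQUEST_METHOD": "GET"}
with mock.patch.dict(os.environ, cgi_env, clear=True):
    under_cgi = urllib.request.getproxies_environment()

print(normal)
print(under_cgi)
```

The variable is skipped because, in a CGI script, HTTP_PROXY can be populated from a client-supplied "Proxy:" request header, so trusting it would let a remote client redirect outgoing requests.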

Sockets and Layers

Python's support for fetching resources from the web is layered. urllib uses the http.client library, which in turn uses the socket library.
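To make the layering concrete, here is a minimal sketch that fetches the same plain-HTTP URL once through urllib.request and once through http.client directly. The helper names fetch_with_urllib and fetch_with_http_client are hypothetical, chosen only for this illustration, and the second helper assumes an http:// URL (no TLS, no redirects, no proxies).

```python
import http.client
import urllib.request
from urllib.parse import urlsplit


def fetch_with_urllib(url):
    """Top layer: urllib.request manages the connection for us."""
    with urllib.request.urlopen(url) as response:
        return response.status, response.read()


def fetch_with_http_client(url):
    """Layer below: talk to the server through http.client directly.

    Assumes a plain http:// URL; this is an illustrative helper only.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        # http.client opened a socket underneath; close it explicitly.
        conn.close()
```

Both helpers return a (status, body) pair, so for a simple page they should agree. The http.client version has to split the URL and close the connection itself, which is the bookkeeping urllib.request normally hides.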

#
# Open a URL, print the result code, then print the data read from it
#
import urllib.request

# Open a connection to a URL using urllib; the with block closes it afterwards
with urllib.request.urlopen('https://www.youtube.com/user/guru99com') as web_url:
    # Get the result code and print it
    print("result code: " + str(web_url.getcode()))

    # Read the data (bytes) from the URL and print it
    data = web_url.read()
    print(data)

As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications that have to fetch web pages. By default the socket module has no timeout and can hang. The original text notes that the socket timeout is not exposed at the http.client or urllib.request levels; in current Python versions urllib.request.urlopen() does accept a timeout argument. Either way, you can set the default timeout globally for all sockets using socket.setdefaulttimeout().
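A minimal sketch of the global approach described above. The 10-second value and the fetch helper are assumptions made for illustration, not part of the original text.

```python
import socket
import urllib.request

# Timeout in seconds; 10 is an arbitrary illustrative value.
TIMEOUT_SECONDS = 10

# Sockets created after this call inherit the timeout, including
# the ones urllib.request opens internally.
socket.setdefaulttimeout(TIMEOUT_SECONDS)


def fetch(url):
    """Return the body of url as bytes (hypothetical helper).

    urlopen() now uses the default timeout set in the socket module.
    """
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        return response.read()
```

Because the setting is process-wide, it also affects unrelated code that creates sockets. Passing timeout= to a single urlopen() call is the narrower alternative when only one request needs a limit.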

Footnotes

1

This document was reviewed and revised by John Lee.

2

Google, for example.

3

Browser sniffing is a very bad practice for website design - building sites using web standards is much more sensible. Unfortunately, a lot of sites still send different versions to different browsers.

4

The user agent for MSIE 6 is 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'

5

For details of more HTTP request headers, see Quick Reference to HTTP Headers.

6

In my case I have to use a proxy to access the internet at work. If you attempt to fetch localhost URLs through this proxy, it blocks them. IE is set to use the proxy, which urllib picks up on. To test scripts with a localhost server, I have to prevent urllib from using the proxy.
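One way to do what footnote 6 describes is to build an opener with an empty ProxyHandler and install it as the default. This is a sketch under the assumption that bypassing every proxy for the whole process is acceptable.

```python
import urllib.request

# An empty mapping tells ProxyHandler to use no proxies at all,
# overriding any proxy settings detected from the system or environment.
proxy_support = urllib.request.ProxyHandler({})
opener = urllib.request.build_opener(proxy_support)

# Make this opener the default for later urllib.request.urlopen() calls.
urllib.request.install_opener(opener)
```

After install_opener(), calls to urllib.request.urlopen() in the same process go directly to the target host, so localhost URLs are no longer routed through the proxy.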
