Checking Website Status with Asyncio in Python

Updated: 2023-03-30 14:23:42 Author: 冷凍工廠

This article explains in detail how to use asyncio in Python to check the status of websites. The sample code is explained step by step, so interested readers can follow along.

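1. How to Check HTTP Status with Asyncio

The asyncio module supports opening socket connections and reading and writing data through streams. We can use this capability to check the status of a web page.

This involves four steps:

Open a connection
Write a request
Read the response
Close the connection

2. Opening an HTTP Connection

A connection can be opened in asyncio with the asyncio.open_connection() function. Among its many arguments, the function takes a host name as a string and a port number as an integer.

It is a coroutine that must be awaited, and it returns a StreamReader and a StreamWriter for reading from and writing to the socket.

This can be used to open an HTTP connection on port 80.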

...
# open a socket connection
reader, writer = await asyncio.open_connection('www.google.com', 80)

We can also open an SSL connection by passing the ssl=True argument. This can be used to open an HTTPS connection on port 443.

...
# open a socket connection
reader, writer = await asyncio.open_connection('www.google.com', 443, ssl=True)
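When ssl=True is passed, asyncio uses a default context from ssl.create_default_context(). The sketch below passes such a context explicitly, which is useful if the settings need to be inspected or adjusted. The helper names make_ssl_context() and open_https() are hypothetical and not part of the original example.

```python
import asyncio
import ssl

def make_ssl_context():
    # the default context verifies the server certificate and host name
    return ssl.create_default_context()

async def open_https(host, port=443):
    # hypothetical helper: open a TLS connection with an explicit context
    context = make_ssl_context()
    return await asyncio.open_connection(host, port, ssl=context)
```

An ssl.SSLContext object can be passed anywhere the examples in this article pass ssl=True.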

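3. Writing the HTTP Request

Once the connection is open, we can write a query to the StreamWriter to make an HTTP request. An HTTP version 1.1 request is plain text. For example, a request for the file path "/" might look like this: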

GET / HTTP/1.1
Host: www.google.com

Importantly, each line must end with a carriage return and a line feed (\r\n), and the request must end with an empty line.

As Python strings, this might look as follows:

'GET / HTTP/1.1\r\n'
'Host: www.google.com\r\n'
'\r\n'

Before it is written to the StreamWriter, the string must be encoded as bytes. This can be done by calling the encode() method on the string itself. The default "utf-8" encoding is likely sufficient.

...
# encode string as bytes
byte_data = string.encode()
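As a minimal sketch of the two steps so far, the request text can be assembled and encoded in one small function. The name build_request() is an assumption made for illustration and does not appear in the original code.

```python
def build_request(host, path='/'):
    # request line, Host header, then an empty line to end the request
    lines = [f'GET {path} HTTP/1.1', f'Host: {host}', '', '']
    # every line is terminated with \r\n, then the text is encoded to bytes
    return '\r\n'.join(lines).encode()

print(build_request('www.google.com'))
# b'GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n'
```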

The bytes can then be written to the socket via the StreamWriter's write() method.

...
# write query to socket
writer.write(byte_data)

After writing the request, it is good practice to wait until the bytes have been sent and the socket is ready. This can be done with the drain() method, which is a coroutine that must be awaited.

...
# wait for the socket to be ready.
await writer.drain()

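4. Reading the HTTP Response

Once the HTTP request has been made, we can read the response. This is done via the socket's StreamReader. The response can be read with the read() method, which reads a chunk of bytes, or the readline() method, which reads one line of bytes.

We may prefer the readline() method because HTTP is a text-based protocol whose status line and headers are sent one line at a time. The readline() method is a coroutine and must be awaited.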

...
# read one line of response
line_bytes = await reader.readline()

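An HTTP 1.1 response has two parts: a header terminated by an empty line, followed by a body. The header indicates whether the request succeeded and what type of file will be sent, and the body contains the content of the file, such as an HTML web page.

The first line of the HTTP header contains the HTTP status of the requested page on the server. Each line must be decoded from bytes into a string.

This can be done by calling the decode() method on the bytes. Again, the default encoding is "utf-8".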

...
# decode bytes into a string
line_data = line_bytes.decode()
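The decoded status line has three space-separated fields: the protocol version, a numeric status code, and a reason phrase. The sketch below splits them apart; parse_status_line() is a hypothetical helper, not part of the original example, and it assumes a well-formed status line.

```python
def parse_status_line(line_bytes):
    # decode the raw bytes and drop the trailing \r\n
    line = line_bytes.decode().strip()
    # split into at most three parts; the reason phrase may contain spaces
    parts = line.split(' ', 2)
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ''
    return version, code, reason

print(parse_status_line(b'HTTP/1.1 301 Moved Permanently\r\n'))
# ('HTTP/1.1', 301, 'Moved Permanently')
```

Working with the integer code makes it easy to test for ranges, such as treating any 2xx or 3xx code as "reachable".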

5. Closing the HTTP Connection

We can close the socket connection by closing the StreamWriter. This is done by calling the close() method.

...
# close the connection
writer.close()

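This call does not block and may not close the socket immediately. Now that we know how to make HTTP requests and read responses with asyncio, let's look at some examples of checking the status of web pages.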
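Because close() returns before the socket is actually closed, a program that needs to be sure the connection is gone can additionally await the StreamWriter's wait_closed() method. The sketch below walks through all four steps against a small local server, so it runs without network access. The server, its fixed reply, and the names handle_client() and demo() are assumptions made for illustration.

```python
import asyncio

async def handle_client(reader, writer):
    # local stand-in for a web server: read the request header lines
    while (await reader.readline()) not in (b'\r\n', b''):
        pass
    # reply with a fixed status line and an empty body
    writer.write(b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n')
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def demo():
    # port 0 asks the operating system for any free port
    server = await asyncio.start_server(handle_client, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # 1. open a connection
        reader, writer = await asyncio.open_connection('127.0.0.1', port)
        # 2. write a request
        writer.write(b'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n')
        await writer.drain()
        # 3. read the first line of the response
        line = await reader.readline()
        # 4. close the connection and wait until it is really closed
        writer.close()
        await writer.wait_closed()
    return line.decode().strip()

if __name__ == '__main__':
    print(asyncio.run(demo()))
```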

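6. Example of Checking HTTP Status Sequentially

We can develop an example that uses asyncio to check the HTTP status of multiple websites.

In this example, we first develop a coroutine that checks the status of a given URL. We then call this coroutine once for each of the top 10 websites.

First, we can define a coroutine that takes a URL string and returns the HTTP status.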

# get the HTTP/S status of a webpage
async def get_status(url):
	# ...

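The URL must be parsed into its component parts. We need the host name and the file path when making the HTTP request. We also need the URL scheme (HTTP or HTTPS) to determine whether SSL is required.

This can be done with the urllib.parse.urlsplit() function, which takes a URL string and returns a named tuple of all the URL elements.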

...
# split the url into components
url_parsed = urlsplit(url)
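To make the parsed fields concrete, the sketch below prints the parts of a sample URL. It also shows one edge case: a URL without a trailing slash has an empty path, while an HTTP request needs at least "/". The helper request_target() is hypothetical and only illustrates one way to handle that case.

```python
from urllib.parse import urlsplit

def request_target(url):
    # hypothetical helper: path (and query) to place in the GET request line
    parts = urlsplit(url)
    # fall back to "/" when the URL has no path component
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    return target

parts = urlsplit('https://www.google.com/search?q=asyncio')
print(parts.scheme)    # https
print(parts.hostname)  # www.google.com
print(parts.path)      # /search
print(parts.query)     # q=asyncio
print(request_target('https://yahoo.com'))  # /
```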

We can then open the HTTP connection based on the URL scheme, using the URL host name.

...
# open the connection
if url_parsed.scheme == 'https':
    reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
else:
    reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)

Next, we can create the HTTP GET request from the host name and file path and write the encoded bytes to the socket using the StreamWriter.

...
# send GET request
query = f'GET {url_parsed.path} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
# write query to socket
writer.write(query.encode())
# wait for the bytes to be written to the socket
await writer.drain()

Next, we can read the HTTP response. We only need the first line of the response, which contains the HTTP status.

...
# read the single line response
response = await reader.readline()

The connection can then be closed.

...
# close the connection
writer.close()

Finally, we can decode the bytes read from the server, strip the trailing whitespace, and return the HTTP status.

...
# decode and strip white space
status = response.decode().strip()
# return the response
return status

Tying this together, the complete get_status() coroutine is listed below. It has no error handling for cases such as an unreachable host or a slow response. Adding this would make a good extension for the reader.

# get the HTTP/S status of a webpage
async def get_status(url):
    # split the url into components
    url_parsed = urlsplit(url)
    # open the connection
    if url_parsed.scheme == 'https':
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)
    # send GET request
    query = f'GET {url_parsed.path} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
    # write query to socket
    writer.write(query.encode())
    # wait for the bytes to be written to the socket
    await writer.drain()
    # read the single line response
    response = await reader.readline()
    # close the connection
    writer.close()
    # decode and strip white space
    status = response.decode().strip()
    # return the response
    return status
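One possible way to add the missing error handling is to wrap the call in asyncio.wait_for() and catch connection errors. This is only a sketch under stated assumptions: the wrapper name check_with_timeout(), the 5 second default, and the returned message strings are all choices made for illustration.

```python
import asyncio

async def check_with_timeout(coro, timeout=5.0):
    # hypothetical wrapper: run a status check with a time limit
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        # the host was too slow to connect or respond
        return 'timed out'
    except OSError as e:
        # covers DNS failures, refused connections, and SSL errors
        return f'connection failed: {e.__class__.__name__}'
```

It could then be used as status = await check_with_timeout(get_status(url)), leaving get_status() itself unchanged.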

Next, we can call the get_status() coroutine for each web page or website we want to check. In this case, we define a list of the top 10 web pages in the world.

...
# list of top 10 websites to check
sites = ['https://www.google.com/',
    'https://www.youtube.com/',
    'https://www.facebook.com/',
    'https://twitter.com/',
    'https://www.instagram.com/',
    'https://www.baidu.com/',
    'https://www.wikipedia.org/',
    'https://yandex.ru/',
    'https://yahoo.com/',
    'https://www.whatsapp.com/'
    ]

We can then query each one in turn with our get_status() coroutine. In this case, we do so sequentially in a loop and report each status in turn.

...
# check the status of all websites
for url in sites:
    # get the status for the url
    status = await get_status(url)
    # report the url and its status
    print(f'{url:30}:\t{status}')

We can do better than sequential execution with asyncio, but this gives us a good starting point to improve on later. Tying this together, the main() coroutine below queries the status of the top 10 websites.

# main coroutine
async def main():
    # list of top 10 websites to check
    sites = ['https://www.google.com/',
        'https://www.youtube.com/',
        'https://www.facebook.com/',
        'https://twitter.com/',
        'https://www.instagram.com/',
        'https://www.baidu.com/',
        'https://www.wikipedia.org/',
        'https://yandex.ru/',
        'https://yahoo.com/',
        'https://www.whatsapp.com/'
        ]
    # check the status of all websites
    for url in sites:
        # get the status for the url
        status = await get_status(url)
        # report the url and its status
        print(f'{url:30}:\t{status}')

Finally, we can create the main() coroutine and use it as the entry point of the asyncio program.

...
# run the asyncio program
asyncio.run(main())

Tying this together, the complete example is listed below.

# SuperFastPython.com
# check the status of many webpages
import asyncio
from urllib.parse import urlsplit
 
# get the HTTP/S status of a webpage
async def get_status(url):
    # split the url into components
    url_parsed = urlsplit(url)
    # open the connection
    if url_parsed.scheme == 'https':
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)
    # send GET request
    query = f'GET {url_parsed.path} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
    # write query to socket
    writer.write(query.encode())
    # wait for the bytes to be written to the socket
    await writer.drain()
    # read the single line response
    response = await reader.readline()
    # close the connection
    writer.close()
    # decode and strip white space
    status = response.decode().strip()
    # return the response
    return status
 
# main coroutine
async def main():
    # list of top 10 websites to check
    sites = ['https://www.google.com/',
        'https://www.youtube.com/',
        'https://www.facebook.com/',
        'https://twitter.com/',
        'https://www.instagram.com/',
        'https://www.baidu.com/',
        'https://www.wikipedia.org/',
        'https://yandex.ru/',
        'https://yahoo.com/',
        'https://www.whatsapp.com/'
        ]
    # check the status of all websites
    for url in sites:
        # get the status for the url
        status = await get_status(url)
        # report the url and its status
        print(f'{url:30}:\t{status}')
 
# run the asyncio program
asyncio.run(main())

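Running the example first creates the main() coroutine and uses it as the entry point of the program. The main() coroutine runs and defines the list of the top 10 websites. The list of websites is then traversed sequentially. The main() coroutine suspends and calls the get_status() coroutine to query the status of one website.

The get_status() coroutine runs, parses the URL, and opens a connection. It constructs an HTTP GET query and writes it to the host. The response is read, decoded, and returned. The main() coroutine then resumes and reports the HTTP status of the URL.

This is repeated for each URL in the list. The program takes about 5.6 seconds to complete, or about half a second per URL on average. This highlights how we can use asyncio to query the HTTP status of web pages.

Nevertheless, it does not take full advantage of asyncio to execute tasks concurrently.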

https://www.google.com/ :   HTTP/1.1 200 OK
https://www.youtube.com/ :   HTTP/1.1 200 OK
https://www.facebook.com/ :   HTTP/1.1 302 Found
https://twitter.com/ :   HTTP/1.1 200 OK
https://www.instagram.com/ :   HTTP/1.1 200 OK
https://www.baidu.com/ :   HTTP/1.1 200 OK
https://www.wikipedia.org/ :   HTTP/1.1 200 OK
https://yandex.ru/ :   HTTP/1.1 302 Moved temporarily
https://yahoo.com/ :   HTTP/1.1 301 Moved Permanently
https://www.whatsapp.com/ :   HTTP/1.1 302 Found

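7. Example of Checking Website Status Concurrently

One benefit of asyncio is that we can execute many coroutines concurrently. We can query the status of websites concurrently in asyncio with the asyncio.gather() function.

This function takes one or more coroutines, suspends the caller while the provided coroutines execute, and returns the result of each coroutine as a list, in the order the coroutines were passed. We can then traverse the list of URLs together with the list of return values and report the results.

This is a fairly simple approach. First, we can create a list of coroutines.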

...
# create all coroutine requests
coros = [get_status(url) for url in sites]

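Next, we can execute the coroutines and get the results with asyncio.gather().

Note that we cannot pass the list of coroutines directly. Instead, we must unpack the list into separate expressions that are passed to the function as positional arguments.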

...
# execute all coroutines and wait
results = await asyncio.gather(*coros)

This executes all of the coroutines concurrently and retrieves their results. We can then traverse the list of URLs and the returned statuses and report each in turn.

...
# process all results
for url, status in zip(sites, results):
    # report status
    print(f'{url:30}:\t{status}')
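By default, asyncio.gather() raises the first exception from any coroutine to the caller, so one unreachable site would hide the results of the others. Passing return_exceptions=True places exception objects in the result list instead. The sketch below uses two stand-in coroutines, ok() and boom(), which are assumptions made for illustration and do not touch the network.

```python
import asyncio

async def ok(value):
    # stand-in for a successful status check
    await asyncio.sleep(0)
    return value

async def boom():
    # stand-in for a check that fails to connect
    raise ConnectionError('unreachable')

async def main():
    # exceptions are returned in place rather than raised
    return await asyncio.gather(ok('a'), boom(), ok('b'), return_exceptions=True)

if __name__ == '__main__':
    for result in asyncio.run(main()):
        if isinstance(result, Exception):
            print(f'failed: {result!r}')
        else:
            print(f'ok: {result}')
```

The results keep the order of the arguments, so they can still be paired with the URL list using zip().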

Tying this together, the complete example is listed below.

# SuperFastPython.com
# check the status of many webpages
import asyncio
from urllib.parse import urlsplit
 
# get the HTTP/S status of a webpage
async def get_status(url):
    # split the url into components
    url_parsed = urlsplit(url)
    # open the connection
    if url_parsed.scheme == 'https':
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 443, ssl=True)
    else:
        reader, writer = await asyncio.open_connection(url_parsed.hostname, 80)
    # send GET request
    query = f'GET {url_parsed.path} HTTP/1.1\r\nHost: {url_parsed.hostname}\r\n\r\n'
    # write query to socket
    writer.write(query.encode())
    # wait for the bytes to be written to the socket
    await writer.drain()
    # read the single line response
    response = await reader.readline()
    # close the connection
    writer.close()
    # decode and strip white space
    status = response.decode().strip()
    # return the response
    return status
 
# main coroutine
async def main():
    # list of top 10 websites to check
    sites = ['https://www.google.com/',
        'https://www.youtube.com/',
        'https://www.facebook.com/',
        'https://twitter.com/',
        'https://www.instagram.com/',
        'https://www.baidu.com/',
        'https://www.wikipedia.org/',
        'https://yandex.ru/',
        'https://yahoo.com/',
        'https://www.whatsapp.com/'
        ]
    # create all coroutine requests
    coros = [get_status(url) for url in sites]
    # execute all coroutines and wait
    results = await asyncio.gather(*coros)
    # process all results
    for url, status in zip(sites, results):
        # report status
        print(f'{url:30}:\t{status}')
 
# run the asyncio program
asyncio.run(main())

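Running the example executes the main() coroutine as before. In this case, the list of coroutines is created in a list comprehension.

The asyncio.gather() function is then called with the coroutines, and the main() coroutine is suspended until they are all done. The coroutines execute, querying each website concurrently and returning its status.

The main() coroutine resumes and receives the list of status values. This list is then traversed together with the list of URLs using the built-in zip() function, and the statuses are reported.

This highlights a simpler way to execute coroutines concurrently and report the results once all tasks are complete. It is also faster than the sequential version above, completing in about 1.4 seconds on my system.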

https://www.google.com/ :   HTTP/1.1 200 OK
https://www.youtube.com/ :   HTTP/1.1 200 OK
https://www.facebook.com/ :   HTTP/1.1 302 Found
https://twitter.com/ :   HTTP/1.1 200 OK
https://www.instagram.com/ :   HTTP/1.1 200 OK
https://www.baidu.com/ :   HTTP/1.1 200 OK
https://www.wikipedia.org/ :   HTTP/1.1 200 OK
https://yandex.ru/ :   HTTP/1.1 302 Moved temporarily
https://yahoo.com/ :   HTTP/1.1 301 Moved Permanently
https://www.whatsapp.com/ :   HTTP/1.1 302 Found
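The difference in run time between the sequential and concurrent versions can be reproduced without any network access by replacing each request with asyncio.sleep(). The sketch below is only an illustration: fake_check() and the delay values are assumptions, and the actual timings depend on the machine.

```python
import asyncio
import time

async def fake_check(delay):
    # stand-in for get_status(): wait as if a server were responding
    await asyncio.sleep(delay)
    return 'HTTP/1.1 200 OK'

async def run_sequential(delays):
    start = time.perf_counter()
    for delay in delays:
        # each check must finish before the next one starts
        await fake_check(delay)
    return time.perf_counter() - start

async def run_concurrent(delays):
    start = time.perf_counter()
    # all checks wait at the same time
    await asyncio.gather(*(fake_check(delay) for delay in delays))
    return time.perf_counter() - start

if __name__ == '__main__':
    delays = [0.1] * 10
    print(f'sequential: {asyncio.run(run_sequential(delays)):.2f}s')
    print(f'concurrent: {asyncio.run(run_concurrent(delays)):.2f}s')
```

The sequential time is roughly the sum of the delays, while the concurrent time is close to the longest single delay.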
