Several ways to retry timed-out requests in a Python crawler (6 methods)
Method 1
```python
import requests

headers = {}
url = 'https://www.baidu.com'
proxies = None
try:
    response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
except requests.RequestException:
    # logdebug('requests failed one time')
    try:
        response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
    except requests.RequestException:
        # logdebug('requests failed two times')
        print('requests failed two times')
```
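Summary: The code is redundant. Every extra retry needs another nested try block, so the line count grows with the number of retries. On the plus side, it is easy to log each individual failure.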
Method 2
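```python
import requests


def requestDemo(url):
    headers = {}
    proxies = None
    trytimes = 3  # number of attempts
    for i in range(trytimes):
        try:
            response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
            # Note: the status code here could also be 302 or similar
            if response.status_code == 200:
                return response
        except requests.RequestException:
            # logdebug(f'requests failed {i + 1} time(s)')
            print(f'requests failed {i + 1} time(s)')
    return None  # every attempt raised or returned a non-200 status
```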
Summary: Looping is much simpler than the first method, and logging each failure is just as easy.
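To try the loop from Method 2 without touching the network, the request can be passed in as a callable. This is a sketch under stated assumptions: the hypothetical `fetch` callable returns a `(status_code, body)` pair, and `fetch_with_retries` and the fake `flaky` function are illustrative names, not part of requests. It catches `OSError` because requests' exception classes derive from `IOError` (an alias of `OSError` in Python 3), and the built-in `TimeoutError` used in the demo is also an `OSError`.

```python
from typing import Callable, Optional, Tuple


def fetch_with_retries(fetch: Callable[[], Tuple[int, str]], trytimes: int = 3) -> Optional[str]:
    """Call fetch() up to `trytimes` times and return the body of the first 200 response."""
    for i in range(trytimes):
        try:
            status, body = fetch()  # hypothetical stand-in for requests.get(...)
        except OSError as exc:  # requests.RequestException subclasses OSError (IOError)
            print(f'attempt {i + 1} failed: {exc!r}')
            continue
        if status == 200:
            return body
        print(f'attempt {i + 1}: unexpected status {status}')
    return None  # all attempts raised or returned a non-200 status


# Usage: a fake fetch that times out twice, then succeeds.
calls = {'n': 0}

def flaky() -> Tuple[int, str]:
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')
    return 200, '<html>ok</html>'

print(fetch_with_retries(flaky))  # two "failed" lines, then <html>ok</html>
```

Passing the request in as a function keeps the retry loop separate from the HTTP details, so the same loop can be tested with fakes like `flaky`.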
Method 3
```python
import requests


def requestDemo(url, times=1):
    headers = {}
    proxies = None
    try:
        response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
        html = response.text  # .text is a property, not a method
        # todo: normal processing logic goes here
        return html
    except Exception:
        # logdebug(f'requests failed {times} time(s)')
        trytimes = 3  # maximum number of attempts
        if times < trytimes:
            return requestDemo(url, times + 1)
        return 'out of maxtimes'
```
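Summary: Recursion looks more sophisticated, and if any other error occurs while the response is being processed, the request is still retried. Drawbacks: it is harder to understand and easier to get wrong, and when the try block covers too much code, every processing error triggers another full request, which wastes time.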
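One way to address the last drawback while keeping the recursive style is to put only the network call inside the try. In this sketch, `request_recursive` and the `fetch` callable are hypothetical stand-ins, and `.strip()` is a placeholder for real processing. Errors in the processing step now surface immediately instead of triggering another request.

```python
from typing import Callable


def request_recursive(fetch: Callable[[], str], times: int = 1, trytimes: int = 3) -> str:
    """Method 3's recursive retry, with only the network call inside the try."""
    try:
        html = fetch()  # hypothetical callable standing in for requests.get(...).text
    except OSError:  # requests' exceptions subclass OSError (IOError)
        if times < trytimes:
            return request_recursive(fetch, times + 1, trytimes)
        return 'out of maxtimes'
    # Processing runs outside the try: a bug here raises immediately
    # instead of silently triggering another request.
    return html.strip()  # placeholder for real parsing


# Usage: a fake fetch that fails twice with a connection error, then succeeds.
attempts = []

def fail_twice() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError('simulated connection error')
    return '  <html></html>  '

print(request_recursive(fail_twice))  # <html></html>
```

Each retry adds one stack frame, which is harmless for three attempts; for large retry counts the loop from Method 2 is the safer shape.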
Method 4
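```python
import functools

import requests


# The decorator must be defined before it is used with @retry(...)
def retry(times):
    def wrapper(func):
        @functools.wraps(func)
        def inner_wrapper(*args, **kwargs):
            i = 0
            while i < times:
                try:
                    print(i)
                    return func(*args, **kwargs)
                except Exception:
                    # log here; func.__name__ is the name of the decorated function
                    print("logdebug: {}()".format(func.__name__))
                    i += 1
            # all attempts failed: falls through and returns None
        return inner_wrapper
    return wrapper


@retry(3)  # number of attempts: 3
def requestDemo(url):
    headers = {}
    proxies = None
    response = requests.get(url, headers=headers, verify=False, proxies=proxies, timeout=3)
    html = response.text
    # todo: normal processing logic goes here
    return html
```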
Summary: The advantage of a decorator is that the same retry logic can be reused across many functions, which makes it very convenient to use.
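Below is a variant of the `retry` decorator from Method 4, offered as a sketch rather than a drop-in replacement. It accepts a tuple of exception types that should trigger a retry (an assumed `exceptions` parameter) and re-raises the last exception after the final attempt, whereas the original silently returns `None`. It is applied to two hypothetical functions to show the reuse.

```python
import functools


def retry(times, exceptions=(Exception,)):
    """Retry the decorated function up to `times` attempts (a sketch).

    Only exception types listed in `exceptions` trigger a retry; after the
    last failed attempt the exception is re-raised instead of returning None.
    """
    def wrapper(func):
        @functools.wraps(func)  # keep the wrapped function's __name__ and docstring
        def inner_wrapper(*args, **kwargs):
            for i in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    print(f'logdebug: {func.__name__}() attempt {i + 1} failed: {exc!r}')
                    if i == times - 1:
                        raise  # out of attempts: propagate the last error
        return inner_wrapper
    return wrapper


# Usage: the same decorator reused on two hypothetical functions.
counter = {'n': 0}

@retry(3, exceptions=(TimeoutError,))
def flaky_download():
    counter['n'] += 1
    if counter['n'] < 2:
        raise TimeoutError('simulated timeout')
    return 'page'

@retry(2)
def always_broken():
    raise ValueError('bad data')

print(flaky_download())  # page (after one logged failure)
```

Restricting `exceptions` to network errors (for example `requests.RequestException`) avoids retrying on bugs that another attempt cannot fix.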
Method 5
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import warnings

import requests
from lxml import etree

warnings.filterwarnings("ignore")  # hide the InsecureRequestWarning triggered by verify=False


def get_xiaomi():
    url = "https://www.mi.com/"
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        # "Cookie": "xmuuid=XMGUEST-D80D9CE0-910B-11EA-8EE0-3131E8FF9940; Hm_lvt_c3e3e8b3ea48955284516b186acf0f4e=1588929065; XM_agreement=0; pageid=81190ccc4d52f577; lastsource=www.baidu.com; mstuid=1588929065187_5718; log_code=81190ccc4d52f577-e0f893c4337cbe4d|https%3A%2F%2Fwww.mi.com%2F; Hm_lpvt_c3e3e8b3ea48955284516b186acf0f4e=1588929099; mstz=||1156285732.7|||; xm_vistor=1588929065187_5718_1588929065187-1588929100964",
        "Host": "www.mi.com",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36"
    }
    title = "error"  # returned if every attempt raises
    for a in range(5):  # retry up to 5 times
        print(a)
        # The try sits inside the loop, so a timeout or parse error
        # moves on to the next attempt instead of ending the whole loop.
        try:
            response = requests.get(url, headers=headers, timeout=10, verify=False)
            html = etree.HTML(response.text)
            result = etree.tostring(html)
            print(result.decode("utf-8"))
            title = html.xpath('//head/title/text()')[0]
            print("title==", title)
            # Stop once the title contains the expected text ("左左" is the
            # author's test string; checking response.status_code == 200 also works)
            if "左左" in title:
                break
        except Exception:
            title = "error"
    return title


if __name__ == '__main__':
    print(get_xiaomi())
```
Method 6
Using the Python retry module retrying (pip install retrying)
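```python
import requests
from retrying import retry


# Set the maximum number of attempts
@retry(stop_max_attempt_number=5)
def get_proxies(self):  # a method of a crawler class (not shown)
    r = requests.get('<proxy address>')  # placeholder: URL of your proxy API
    print('Fetching...')
    raise Exception("error")  # raised on purpose to test retrying; remove in real use
    print('Got the latest proxy = %s' % r.text)
    params = dict()
    if r and r.status_code == 200:
        proxy = str(r.content, encoding='utf-8')
        params['http'] = 'http://' + proxy
        params['https'] = 'https://' + proxy
    return params
```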
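```python
# Alternative @retry settings; apply one of them to a function such as get_proxies.

# Maximum total time (ms) spent retrying, measured from the first attempt.
# retrying falls back to 100 ms internally, but the limit is only enforced
# when stop_max_delay is actually passed.
@retry(stop_max_attempt_number=5, stop_max_delay=50)
# With 50 ms you will see that the task stops before it has run 5 times:
# retrying stops at whichever limit is reached first.

# Wait a fixed time (ms) between attempts
@retry(stop_max_attempt_number=5, wait_fixed=2000)

# Wait a random time (ms) between attempts
@retry(stop_max_attempt_number=5, wait_random_min=100, wait_random_max=2000)

# Increase the wait by a fixed amount (ms) after each attempt
@retry(stop_max_attempt_number=5, wait_incrementing_increment=1000)
```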
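To make the wait options concrete, the sketch below computes the pauses, in milliseconds, between five attempts when every attempt fails. It does not import retrying; the helper names are hypothetical, and the formulas are an assumption based on my reading of retrying's source: `wait_incrementing_start` defaults to 0, and the pause after attempt n is start + increment * (n - 1).

```python
import random


def fixed_waits(attempts, wait_fixed):
    """Pauses (ms) for wait_fixed: the same pause after every failed attempt but the last."""
    return [wait_fixed] * (attempts - 1)


def incrementing_waits(attempts, increment, start=0):
    """Pauses (ms) for wait_incrementing_*: start + increment * (n - 1) after attempt n."""
    return [start + increment * (n - 1) for n in range(1, attempts)]


def random_waits(attempts, wait_min, wait_max):
    """Pauses (ms) for wait_random_*: a random integer in [wait_min, wait_max] each time."""
    return [random.randint(wait_min, wait_max) for _ in range(attempts - 1)]


print(fixed_waits(5, 2000))         # [2000, 2000, 2000, 2000]
print(incrementing_waits(5, 1000))  # [0, 1000, 2000, 3000]
print(random_waits(5, 100, 2000))   # four random values between 100 and 2000
```

So if all five attempts fail, `wait_fixed=2000` adds 8 seconds of waiting in total and `wait_incrementing_increment=1000` adds 6 seconds.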
Retrying based on the exception type. First, a simple example:
```python
from retrying import retry


def retry_if_io_error(exception):
    # Return True to retry, False to re-raise immediately
    return isinstance(exception, IOError)


@retry(retry_on_exception=retry_if_io_error)
def read_a_file():
    with open("file", "r") as f:
        return f.read()
```
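If read_a_file raises an exception, retrying passes it to the function given in retry_on_exception and checks whether that function returns True or False. If it returns True, the call is retried until the stop condition (such as stop_max_attempt_number) is reached, and then the exception is raised; if it returns False, the exception is raised immediately. Note that this example sets no stop condition, so an IOError is retried indefinitely.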
When I was testing this, the many copy-pasted posts online claimed that retry_on_exception takes a function that returns the specified exception, and that the call is retried if it does and exits otherwise. Really misleading!
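The decision rule described above can be sketched without the library. `call_with_retry` below is a hypothetical stand-in written for illustration, not retrying's implementation, and it assumes a maximum number of attempts as the stop condition: when the predicate returns True the call is retried until attempts run out, and when it returns False the exception is re-raised at once.

```python
def retry_if_io_error(exception):
    # Same predicate as above: True means "retry", False means "re-raise now"
    return isinstance(exception, IOError)


def call_with_retry(func, retry_on_exception, stop_max_attempt_number=3):
    """Hypothetical stand-in illustrating retry_on_exception; not retrying's code."""
    attempt = 1
    while True:
        try:
            return func()
        except Exception as exc:
            if not retry_on_exception(exc):
                raise  # predicate returned False: give up immediately
            if attempt >= stop_max_attempt_number:
                raise  # predicate returned True but no attempts left
            attempt += 1


calls = []

def missing_file():
    calls.append('io')
    raise FileNotFoundError('no such file')  # a subclass of IOError/OSError

def bad_value():
    calls.append('value')
    raise ValueError('not an I/O problem')

for f in (missing_file, bad_value):
    try:
        call_with_retry(f, retry_if_io_error)
    except Exception as exc:
        print(type(exc).__name__)

print(calls)  # ['io', 'io', 'io', 'value']
```

The I/O error is attempted three times before propagating, while the ValueError is raised after a single attempt, which matches the behavior described above rather than the copy-pasted claim.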
Let's look at a proxy-fetching application (purely to test the retrying module).
That wraps up this article on several ways to retry timed-out requests in a Python crawler. For more on the topic, search the earlier articles on 腳本之家 (Script House).

