Python crawler request header setup code

Updated: 2020-07-28 14:06:15 Author: yang

This article collects the ways to set request headers in a Python crawler. Readers who need this can use it as a reference.

I. Setting request headers with requests:

import requests

url = "http://www.targetweb.com"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}
res = requests.get(url, headers=headers)

# Downloading an image needs a byte stream; request it like this
# (headers must be passed as a keyword argument after stream=True):
# res = requests.get(url, stream=True, headers=headers)
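The streamed image request mentioned in the comment above could be written out roughly as follows. This is a minimal sketch, not the author's code: the function name `download_image`, the optional `session` parameter, the 10-second timeout, and the chunk size are all assumptions made for illustration.

```python
import requests

# Header values taken from the article; trim or extend as needed.
HEADERS = {
    "Referer": "http://www.baidu.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36"
    ),
}


def download_image(url, path, headers=HEADERS, session=None, chunk_size=8192):
    """Stream the response body to `path` in chunks and return `path`.

    `session` is a hypothetical hook: pass a requests.Session to reuse
    connections, or leave it as None to use the module-level requests.get.
    """
    getter = session or requests
    # stream=True defers the body download until iter_content() is consumed.
    with getter.get(url, headers=headers, stream=True, timeout=10) as res:
        res.raise_for_status()  # fail early on 4xx/5xx responses
        with open(path, "wb") as fh:
            for chunk in res.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks
                    fh.write(chunk)
    return path
```

A call such as `download_image("http://www.targetweb.com/logo.png", "logo.png")` would then write the file to disk without loading the whole image into memory; the image URL here is a placeholder.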

II. Setting request headers with Selenium + Chrome:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--lang=zh-CN')  # use Chinese as the browser language
# Override the User-Agent header (no inner quotes, or they become part of the value)
options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400')
# options= replaces the deprecated chrome_options= keyword
browser = webdriver.Chrome(options=options)
url = "http://www.targetweb.com"
browser.get(url)
browser.quit()

III. Setting request headers with Selenium + PhantomJS:

# Note: PhantomJS support was deprecated in Selenium 3.8 and removed in
# Selenium 4, so this snippet needs a Selenium 3.x installation.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

des_cap = dict(DesiredCapabilities.PHANTOMJS)
# Override the User-Agent that PhantomJS sends with every page request
des_cap["phantomjs.page.settings.userAgent"] = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400")
browser = webdriver.PhantomJS(desired_capabilities=des_cap)
url = "http://www.targetweb.com"
browser.get(url)
browser.quit()

IV. Setting request headers in the Scrapy crawler framework:

Add the following to settings.py:

DEFAULT_REQUEST_HEADERS = {
    'accept': 'image/webp,*/*;q=0.8',
    'accept-language': 'zh-CN,zh;q=0.8',
    'referer': 'https://www.baidu.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}

V. Setting request headers with asynchronous aiohttp:

import asyncio

import aiohttp

url = "http://www.targetweb.com"
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Referer': 'http://www.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
}


async def main():
    # "async with" is only valid inside a coroutine
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as resp:
            print(resp.status)
            print(await resp.text())


asyncio.run(main())

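Further notes:

1. Why set headers?

When you request a page with a crawler, the returned text may contain messages such as "Sorry, access denied". This means the site is blocking crawlers, and you need a way around its anti-crawling mechanism.

Setting headers is one way to get around anti-crawling checks with requests: it makes the request look as though it comes from an ordinary visitor's browser rather than from a crawler script.

For pages with anti-crawler measures, you can set some header fields so that the crawler imitates a browser visiting the site.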
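To see what a site sees by default, the sketch below compares the headers requests would send with and without a custom `headers` argument. It prepares the requests but never sends them, so no network access is needed. The helper name `effective_headers` is an assumption made for illustration; `Session.prepare_request` is the requests call that merges session defaults with per-request headers.

```python
import requests

URL = "http://www.targetweb.com"  # placeholder target used in the article

BROWSER_HEADERS = {
    "Referer": "http://www.baidu.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36"
    ),
}


def effective_headers(extra_headers=None):
    """Return the headers requests would send, without sending anything."""
    session = requests.Session()
    request = requests.Request("GET", URL, headers=extra_headers)
    # prepare_request merges the session's default headers with ours.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


default = effective_headers()
custom = effective_headers(BROWSER_HEADERS)

# Without custom headers the User-Agent identifies the requests library,
# which is an easy signal for a server to filter on.
print("default:", default["User-Agent"])
print("custom: ", custom["User-Agent"])
```

The default value starts with `python-requests/` followed by the installed version, while the custom one carries the browser string from the dictionary.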

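2. Where do you find the headers?

In Chrome or Firefox, right-click on the page and choose Inspect, open the Network panel, then press F5 (Fn+F5 on some laptop keyboards) to reload the page. Select a request in the list to see its request headers.

In some browsers the menu entry is right-click -> Inspect Element; refresh the page afterwards in the same way.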
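Once the request headers have been copied out of the browser's developer tools as plain text, they still have to be turned into a Python dict. The helper below is one possible sketch of that step; the name `parse_raw_headers` and the sample text are assumptions made for illustration, not part of any library.

```python
def parse_raw_headers(raw):
    """Convert "Name: value" lines copied from the browser into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":method".
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, since values (URLs) contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line
        headers[name.strip()] = value.strip()
    return headers


RAW = """
:method: GET
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Cache-Control: max-age=0
Connection: keep-alive
Referer: http://www.baidu.com/
"""

headers = parse_raw_headers(RAW)
print(headers)
```

The resulting dict can be passed directly as the `headers` argument in the requests and aiohttp examples.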
