Scraping basic content from a bilibili web page with a Python crawler

Updated: 2022-01-25 16:01:09 Author: 小木_.

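This article shows how to scrape basic content from a bilibili web page with a Python crawler: it collects the title and link of every entry in the Games category of bilibili's ranking page. The implementation is walked through step by step below, for anyone who needs a reference.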

We will use a crawler to collect every title and link in the Games category of bilibili's ranking page:

Import requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

Next, pass in the page URL, parse the page, and print the result:

e = requests.get('https://www.bilibili.com/v/popular/rank/game')  # URL of the page to scrape
html = e.content
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML
print(soup)
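The request above sends no headers, sets no timeout, and does not check the HTTP status. A slightly more defensive fetch might look like the sketch below. The `fetch_html` helper, the `DEFAULT_HEADERS` value, and the 10-second timeout are illustrative assumptions, not part of the original article.

```python
import requests

RANK_URL = "https://www.bilibili.com/v/popular/rank/game"

# Hypothetical header value; the original code sends no headers at all.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; rank-demo/0.1)"}


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response.content
```

Calling `fetch_html(RANK_URL)` returns the same bytes as `e.content` in the original code, but it fails loudly if the server responds with an error status or does not answer within the timeout.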

This prints the page's entire HTML: a dense wall of markup that is hard to read, so let's narrow it down.

Add the following line, which selects only the element with the class we are interested in and makes the output much shorter:

div_people_list = soup.find('ul', attrs={'class': 'rank-list'})  # select the <ul> element whose class is rank-list

The result is still not concise enough.

Continue by adding the following code:

ca_s = div_people_list.find_all('a', attrs={'class': 'title'})  # select the <a> elements whose class is title
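Note that `soup.find` returns `None` when nothing matches, in which case the `find_all` call above would raise an `AttributeError`. The sketch below shows the same two-step selection with a guard, run against a small inline sample instead of the live page. The `SAMPLE_HTML` markup, the second video ID, and the `extract_title_links` helper are hypothetical and only mimic the structure the article relies on; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a ul.rank-list containing a.title links.
# The second video ID is made up for illustration.
SAMPLE_HTML = """
<ul class="rank-list">
  <li><a class="title" href="//www.bilibili.com/video/BV1yZ4y1D7ef">First video</a></li>
  <li><a class="title" href="//www.bilibili.com/video/BV0000000000">Second video</a></li>
</ul>
"""


def extract_title_links(html):
    """Return the a.title tags inside ul.rank-list, or [] if the list is absent."""
    soup = BeautifulSoup(html, "html.parser")
    rank_list = soup.find("ul", attrs={"class": "rank-list"})
    if rank_list is None:  # find() returns None when nothing matches
        return []
    return rank_list.find_all("a", attrs={"class": "title"})


links = extract_title_links(SAMPLE_HTML)
print([(a.get_text(strip=True), a["href"]) for a in links])
print(extract_title_links("<p>no ranking here</p>"))  # guard path: empty list
```

Returning an empty list keeps the later loop over the links safe even if the site changes its markup or the ranking list is missing from the downloaded HTML.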

Now the links and titles have been extracted, but the output still has flaws.

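Add the following loop to print each title and link one by one. The extracted links look like //www.bilibili.com/video/BV1yZ4y1D7ef: they have no http: in front, so clicking them produces an error. We therefore prepend http: to each link when printing: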

for t in ca_s:
    url = t['href']
    name = t.get_text()
    print(name + '\tClick the link to watch: ' + f'http:{url}')
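Prepending `http:` works for protocol-relative links like the one above, but it would produce a broken URL if a link were already absolute or were a plain relative path. As an alternative (not the author's method), the standard library's `urllib.parse.urljoin` can resolve each `href` against the page URL. The `absolute_url` helper name is an assumption made for this sketch.

```python
from urllib.parse import urljoin

# The page the links were scraped from; it supplies the scheme and host.
BASE_URL = "https://www.bilibili.com/v/popular/rank/game"


def absolute_url(href: str, base: str = BASE_URL) -> str:
    """Resolve a scraped href against the URL of the page it came from."""
    return urljoin(base, href)


# Protocol-relative link: inherits the scheme of the base URL.
print(absolute_url("//www.bilibili.com/video/BV1yZ4y1D7ef"))
# Already-absolute link: returned unchanged.
print(absolute_url("https://www.bilibili.com/video/BV1yZ4y1D7ef"))
```

Because the base URL uses https, the resolved links use https as well, matching the scheme the page itself was fetched with.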

As you can see, both the titles and the links have now been scraped.

Complete code:

import requests
from bs4 import BeautifulSoup

e = requests.get('https://www.bilibili.com/v/popular/rank/game')  # URL of the page to scrape
html = e.content
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML
div_people_list = soup.find('ul', attrs={'class': 'rank-list'})  # select the <ul> element whose class is rank-list
ca_s = div_people_list.find_all('a', attrs={'class': 'title'})  # select the <a> elements whose class is title

# assign each item to t in turn, then print its title and link
for t in ca_s:
    url = t['href']
    name = t.get_text()
    print(name + '\tClick the link to watch: ' + f'http:{url}')
