Scraping Web Page Data with Python and Saving It to CSV

Updated: 2022-01-07 09:10:50 Author: wh來啦

This article shows how to scrape data from a web page with Python and save it to a CSV file. If you find it useful, bookmark it for later reference.

Task:

Scrape one URL and save the data found on that page to a CSV file.

Target URL:

https://www.iqiyi.com/ranks1/1/0?vfrm=pcw_home&vfrmblk=&vfrmrst=712211_dianyingbang_rebo_title

Target page (screenshot):

Result of running the code (screenshot):

Implementation:

Import the packages:

import requests
import parsel
import csv

Set up the CSV file:

Define the columns in which the scraped data will be stored.

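# Open the output file; newline='' avoids blank rows between records on Windows.
# mode='w' starts a fresh file on each run, so the header row is written only once.
f = open('whxixi.csv', mode='w', encoding='utf-8', newline='')

# Column names
csv_writer = csv.DictWriter(f, fieldnames=['電影名字',  # movie title
    '彈幕總數(shù)',  # total bullet comments
    '新增評論',  # new comments
    '電影鏈接',  # movie link
    '電影日期',  # release year
    '電影類型',  # genre
    '電影演員',  # actors
    '電影介紹'])  # synopsis

# Write the column names as the header row
csv_writer.writeheader()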
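
To see what this setup produces without touching the disk, the same DictWriter calls can be pointed at an in-memory buffer. This is only a minimal sketch: the English column names, the sample row, and the helper rows_to_csv_text are made up for illustration and are not part of the article's code.

```python
import csv
import io

# Hypothetical column names, for illustration only
FIELDNAMES = ['title', 'link']

def rows_to_csv_text(rows):
    """Write dict rows to an in-memory buffer and return the CSV text."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()    # first line: the column names
    writer.writerows(rows)  # one line per dict
    return buffer.getvalue()

# Made-up sample row
csv_text = rows_to_csv_text([
    {'title': 'Example Movie', 'link': 'https://example.com/v1'},
])
print(csv_text)
```

Note that with the default settings DictWriter raises ValueError when a row contains a key that is not listed in fieldnames, so the keys of every dict written later have to match the column names exactly.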

Fetch the data:

Request the page's HTML to get the raw data (the result is stored in response).

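# URL of the iQIYI trending movies chart
url = 'https://www.iqiyi.com/ranks1/1/0?vfrm=pcw_home&vfrmblk=&vfrmrst=712211_dianyingbang_rebo_title'

# Send a browser-like user-agent header with the request
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}

# Fetch the page and store the result in response;
# the timeout keeps the script from hanging if the server does not answer
response = requests.get(url=url, headers=headers, timeout=10)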

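Process the data:

Process the raw page data, that is, extract the useful fields from it. Note: the selectors passed to css() depend on the page being scraped. Different sites structure their pages differently, so inspect the page in the browser's developer tools (F12) and adjust the selectors as needed.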

# Wrap response.text in a Selector object, which supports re, CSS, and XPath queries
webtext = parsel.Selector(response.text)

# First pass: select every entry in the ranking, giving a list to loop over
items = webtext.css('.rvi__list a')

# Second pass: loop over the entries and extract each field
for li in items:
    title = li.css('.rvi__con .rvi__tit1::text').get()
    bulletcomments = li.css('.rvi__con .rvi__tag__box span:nth-child(1)::text').get()  # total bullet comments
    newcomments = li.css('.rvi__con .rvi__tag__box span:nth-child(2)::text').get()  # new comments
    # The link is expected to be protocol-relative ('//...'); prepend the scheme only in that case
    href = li.css('::attr(href)').get()
    if href.startswith('//'):
        href = 'http:' + href
    # The info string has the form 'year / genre / actors'
    movie_info = li.css('.rvi__con .rvi__type1 span::text').get().split('/')
    year = movie_info[0].strip()
    genre = movie_info[1].strip()
    actor = movie_info[2].strip()
    filmIntroduction = li.css('.rvi__con p::text').get().strip()
    dic = {
        '電影名字':title,
        '彈幕總數(shù)':bulletcomments,
        '新增評論':newcomments,
        '電影鏈接':href,
        '電影日期':year,
        '電影類型':genre,
        '電影演員':actor,
        '電影介紹':filmIntroduction
    }
    csv_writer.writerow(dic)  # write the row to the CSV file

f.close()  # close the file once all rows are written
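
The loop assumes that every entry has a link and an info string with exactly three parts separated by "/". If the page layout differs for some entries, those lines fail with an AttributeError or IndexError. A more defensive version of the two steps might look like the sketch below. The helper names normalize_href and split_movie_info are hypothetical, the sample strings are made up, and urljoin comes from the standard library.

```python
from urllib.parse import urljoin

# The page the links were scraped from (the article's target URL, query string omitted)
BASE_URL = 'https://www.iqiyi.com/ranks1/1/0'

def normalize_href(href, base_url=BASE_URL):
    """Turn a protocol-relative or relative link into an absolute URL."""
    if not href:
        return ''
    # urljoin takes the missing scheme (and host, if needed) from base_url
    return urljoin(base_url, href)

def split_movie_info(text, parts=3):
    """Split 'year / genre / actors' and pad missing parts with ''."""
    # maxsplit keeps any extra '/' inside the last part instead of dropping it
    pieces = [p.strip() for p in (text or '').split('/', parts - 1)]
    return pieces + [''] * (parts - len(pieces))

# Made-up sample values
print(normalize_href('//www.example.com/v_1.html'))
print(split_movie_info('2021 / Comedy / Actor A Actor B'))
print(split_movie_info('2021 / Comedy'))
```

With helpers like these, a missing field ends up as an empty cell in the CSV file rather than stopping the whole run.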

Complete code:

import requests
import parsel
import csv

# Open the output file; mode='w' starts a fresh file so the header is written only once
f = open('whxixi.csv', mode='w', encoding='utf-8', newline='')

# Column names
csv_writer = csv.DictWriter(f, fieldnames=['電影名字',  # movie title
    '彈幕總數(shù)',  # total bullet comments
    '新增評論',  # new comments
    '電影鏈接',  # movie link
    '電影日期',  # release year
    '電影類型',  # genre
    '電影演員',  # actors
    '電影介紹'])  # synopsis

# Write the column names as the header row
csv_writer.writeheader()

# URL of the iQIYI trending movies chart
url = 'https://www.iqiyi.com/ranks1/1/0?vfrm=pcw_home&vfrmblk=&vfrmrst=712211_dianyingbang_rebo_title'

headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}

# Fetch the page and store the result in response
response = requests.get(url=url, headers=headers, timeout=10)

# Wrap response.text in a Selector object, which supports re, CSS, and XPath queries
webtext = parsel.Selector(response.text)

# First pass: select every entry in the ranking, giving a list to loop over
items = webtext.css('.rvi__list a')

# Second pass: loop over the entries and extract each field
for li in items:
    title = li.css('.rvi__con .rvi__tit1::text').get()
    bulletcomments = li.css('.rvi__con .rvi__tag__box span:nth-child(1)::text').get()  # total bullet comments
    newcomments = li.css('.rvi__con .rvi__tag__box span:nth-child(2)::text').get()  # new comments
    # The link is expected to be protocol-relative ('//...'); prepend the scheme only in that case
    href = li.css('::attr(href)').get()
    if href.startswith('//'):
        href = 'http:' + href
    # The info string has the form 'year / genre / actors'
    movie_info = li.css('.rvi__con .rvi__type1 span::text').get().split('/')
    year = movie_info[0].strip()
    genre = movie_info[1].strip()
    actor = movie_info[2].strip()
    filmIntroduction = li.css('.rvi__con p::text').get().strip()
    dic = {
        '電影名字':title,
        '彈幕總數(shù)':bulletcomments,
        '新增評論':newcomments,
        '電影鏈接':href,
        '電影日期':year,
        '電影類型':genre,
        '電影演員':actor,
        '電影介紹':filmIntroduction
    }
    csv_writer.writerow(dic)  # write the row to the CSV file

f.close()  # close the file once all rows are written
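
After a run, one way to check the output is to read the file back with csv.DictReader, which returns each row as a dict keyed by the header row. The sketch below first writes a made-up row to a temporary file so that it does not depend on a scraping run; the read_rows helper, the English column names, and the sample data are all hypothetical.

```python
import csv
import os
import tempfile

def read_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, mode='r', encoding='utf-8', newline='') as f:
        return list(csv.DictReader(f))

# Write a small sample file so the example is self-contained
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(sample_path, mode='w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'year'])
    writer.writeheader()
    writer.writerow({'title': 'Example Movie', 'year': '2021'})

rows = read_rows(sample_path)
print(len(rows), rows[0]['title'])
```

For the file produced by the script above, the call would be read_rows('whxixi.csv'). All values come back as strings, so numeric columns need converting before any arithmetic.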