Getting Started with the Python Beautiful Soup Library: Installation and Basics

Updated: 2021-08-30 14:31:21  Author: Cachel wood

Beautiful Soup is a Python library whose main purpose is extracting data from web pages. This article is an introductory tutorial to the library.

Installing Beautiful Soup

pip install beautifulsoup4

Understanding Beautiful Soup

Beautiful Soup is a library for parsing, traversing, and maintaining the "tag tree" of a document.

Importing Beautiful Soup

from bs4 import BeautifulSoup
import bs4

The BeautifulSoup class

A BeautifulSoup object corresponds to the entire content of an HTML/XML document.
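As a minimal sketch of that idea, the snippet below builds a BeautifulSoup object from a string instead of a downloaded page. The HTML content is a made-up sample (an assumption for illustration, not the demo page used later), so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document, used only so the sketch needs no network access
html = "<html><head><title>Sample page</title></head><body><p>Hello</p></body></html>"

# One BeautifulSoup object represents the whole parsed document
soup = BeautifulSoup(html, "html.parser")

print(type(soup))         # the class of the document object
print(soup.title.string)  # text inside <title>
print(soup.p.string)      # text inside the first <p>
```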

Reviewing demo.html

import requests

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
print(demo)

<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
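<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>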
</body></html>

Tag

Element	Description
Tag	A tag, the most basic unit of information organization; its start and end are marked with <> and </>

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo,"html.parser")
print(soup.title)
tag = soup.a
print(tag)

<title>This is a python demo page</title>
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>

Any tag that exists in HTML syntax can be accessed as soup.<tag>. When the HTML document contains several tags with the same name, soup.<tag> returns the first one.
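A small sketch of the "first match" behaviour, using a made-up fragment with two links (the ids are hypothetical). Attribute access returns only the first <a>, while find_all collects every match in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two <a> tags
html = '<p><a id="first">One</a> and <a id="second">Two</a></p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.a                   # attribute access: only the first <a>
all_links = soup.find_all("a")   # every <a>, in document order

print(first["id"])
print([a["id"] for a in all_links])
```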

Tag name

Element	Description
Name	The tag's name; for example, the name of <p>…</p> is 'p'. Format: <tag>.name

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo,"html.parser")
print(soup.a.name)
print(soup.a.parent.name)
print(soup.a.parent.parent.name)

a
p
body

Tag attrs (attributes)

Element	Description
Attributes	The tag's attributes, organized as a dictionary. Format: <tag>.attrs

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo,"html.parser")
tag = soup.a
print(tag.attrs)
print(tag.attrs['class'])
print(tag.attrs['href'])
print(type(tag.attrs))
print(type(tag))

{'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
['py1']
http://www.icourse163.org/course/BIT-268001
<class 'dict'>
<class 'bs4.element.Tag'>
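Because .attrs is a plain dictionary, a missing key raises KeyError. The sketch below, on a made-up link (the URL and class names are assumptions), shows the shorthand tag["..."] form together with get() and has_attr() for attributes that may be absent.

```python
from bs4 import BeautifulSoup

# Hypothetical tag; the href and class values are placeholders
html = '<a href="http://example.com/" class="py1 extra" id="link1">Basic Python</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

print(tag["href"])        # shorthand for tag.attrs["href"]
print(tag["class"])       # class is multi-valued, so it comes back as a list
print(tag.get("title"))   # missing attribute: get() returns None instead of raising
print(tag.has_attr("id"))
```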

Tag NavigableString

Element	Description
NavigableString	The non-attribute string inside a tag, that is, the string between <> and </>. Format: <tag>.string
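A short sketch of .string on a made-up fragment. It assumes the documented behaviour that .string is None when a tag has more than one child, in which case get_text() can be used to join all the text.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical fragment: <b> has one string child, <div> has two children
html = "<p><b>Bold text</b></p><div>one<i>two</i></div>"
soup = BeautifulSoup(html, "html.parser")

text = soup.b.string
print(text, type(text))       # a NavigableString, which is also a str

print(soup.div.string)        # None: <div> has more than one child
print(soup.div.get_text())    # all text under <div>, concatenated
```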

Tag Comment

Element	Description
Comment	The comment part of the string inside a tag; Comment is a special kind of NavigableString

from bs4 import BeautifulSoup

# <b> holds only a comment; <p> holds ordinary text
newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")
print(newsoup.b.string)        # comment text, printed without the <!-- --> markers
print(type(newsoup.b.string))
print(newsoup.p.string)
print(type(newsoup.p.string))

This is a comment
<class 'bs4.element.Comment'>
This is not a comment
<class 'bs4.element.NavigableString'>
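Since both results print as plain text, the type has to be checked to tell a comment from ordinary content. The helper below is a hypothetical illustration (visible_string is not part of bs4) of filtering comments with isinstance.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

soup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser"
)

def visible_string(tag):
    """Hypothetical helper: return tag.string as str, or None for comments/missing text."""
    s = tag.string
    if s is None or isinstance(s, Comment):
        return None
    return str(s)

print(visible_string(soup.b))  # None, because the only content is a comment
print(visible_string(soup.p))  # the ordinary text
# A Comment is still a NavigableString, so check for Comment first
print(isinstance(soup.b.string, NavigableString))
```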

Basic HTML structure

Downward traversal of the tag tree

Attribute	Description
.contents	List of child nodes; all children are stored in a list
.children	Iterator over child nodes; similar to .contents, used to loop over children
.descendants	Iterator over all descendant nodes, used to loop over descendants

The BeautifulSoup object is the root node of the tag tree.

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text

soup = BeautifulSoup(demo,"html.parser")
print(soup.head)
print(soup.head.contents)
print(soup.body.contents)
print(len(soup.body.contents))
print(soup.body.contents[1])

<head><title>This is a python demo page</title></head>
[<title>This is a python demo page</title>]
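['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>, '\n']
5
<p class="title"><b>The demo python introduces several python courses.</b></p>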

for child in soup.body.children:
    print(child)  # iterate over child nodes
for child in soup.body.descendants:
    print(child)  # iterate over all descendant nodes
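To make the difference between children and descendants concrete, here is a sketch on a small made-up body (an assumption, not the demo page). It keeps only Tag nodes, since both iterators also yield strings.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment without whitespace between tags
html = "<body><p><b>bold</b></p><p>plain <a>link</a></p></body>"
soup = BeautifulSoup(html, "html.parser")

# .children stops at direct children; .descendants walks the whole subtree
child_names = [c.name for c in soup.body.children if isinstance(c, Tag)]
descendant_names = [d.name for d in soup.body.descendants if isinstance(d, Tag)]

print(child_names)
print(descendant_names)
```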

Upward traversal of the tag tree

Attribute	Description
.parent	The node's parent tag
.parents	Iterator over the node's ancestor tags, used to loop over ancestors

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text

soup = BeautifulSoup(demo,"html.parser")
print(soup.title.parent)
print(soup.html.parent)

<head><title>This is a python demo page</title></head>
<html><head><title>This is a python demo page</title></head>
<body>
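<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>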
</body></html>

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text

soup = BeautifulSoup(demo,"html.parser")
for parent in soup.a.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)

p
body
html
[document]
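The same walk can be collected into a list. This sketch uses a made-up nested fragment and also checks the two ends of the chain: the parent of <html> is the BeautifulSoup object itself, and that object has no parent.

```python
from bs4 import BeautifulSoup

# Hypothetical nested document
html = "<html><body><p><a>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Names of every ancestor of <a>, from nearest to the document root
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)

print(soup.html.parent is soup)  # the root tag's parent is the BeautifulSoup object
print(soup.parent)               # the document object itself has no parent
```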

Sibling traversal of the tag tree

Attribute	Description
.next_sibling	Returns the next sibling node in HTML document order
.previous_sibling	Returns the previous sibling node in HTML document order
.next_siblings	Iterator over all following sibling nodes in HTML document order
.previous_siblings	Iterator over all preceding sibling nodes in HTML document order

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text

soup = BeautifulSoup(demo,"html.parser")
print(soup.a.next_sibling)
print(soup.a.next_sibling.next_sibling)

print(soup.a.previous_sibling)
print(soup.a.previous_sibling.previous_sibling)

print(soup.a.parent)

and 
<a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>
Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:

None
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>

for sibling in soup.a.next_siblings:
    print(sibling)  # iterate over following siblings
for sibling in soup.a.previous_siblings:
    print(sibling)  # iterate over preceding siblings
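Siblings are not always tags: text between tags is a sibling too. The sketch below uses a made-up paragraph (the ids are assumptions) to show the mixed result and two ways of keeping only tags, an isinstance filter and find_next_sibling.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical paragraph with text between the two links
html = '<p>Intro: <a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>'
soup = BeautifulSoup(html, "html.parser")
first = soup.a

following = list(first.next_siblings)   # strings and tags, in document order
tag_siblings = [s for s in following if isinstance(s, Tag)]

print(following)
print([t["id"] for t in tag_siblings])
print(first.find_next_sibling("a")["id"])  # skips over the text node
print(repr(first.previous_sibling))
```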


The prettify() method in bs4

import requests
from bs4 import BeautifulSoup
r = requests.get("http://python123.io/ws/demo.html")
demo = r.text

soup = BeautifulSoup(demo,"html.parser")
print(soup.prettify())

<html>
 <head>
  <title>
   This is a python demo page
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The demo python introduces several python courses.
   </b>
  </p>
  <p class="course">
   Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
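   <a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">
    Basic Python
   </a>
   and
   <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">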
    Advanced Python
   </a>
   .
  </p>
 </body>
</html>

.prettify() adds a '\n' after each tag <> and its content in the HTML text.
.prettify() can also be called on a single tag. Format: <tag>.prettify()
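A brief sketch of calling prettify() on one tag rather than the whole document, using a made-up fragment. It compares the compact str() form with the prettified one.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment
soup = BeautifulSoup("<div><p>Hello <b>world</b></p></div>", "html.parser")

compact = str(soup.p)        # the tag as a single line
pretty = soup.p.prettify()   # same tag, one node per line with indentation

print(compact)
print(pretty)
```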

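Encoding in bs4

bs4 converts any HTML input to Unicode and writes output as UTF-8 by default.
Python 3.x supports UTF-8 by default, so Chinese and other non-ASCII text parses without problems.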

from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>中文</p>", "html.parser")
print(soup.p.string)

print(soup.p.prettify())

中文

<p>
 中文
</p>
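The conversion can be seen more directly with byte input. This sketch assumes the page arrived as GBK-encoded bytes (a hypothetical case) and passes from_encoding as a hint; the parsed text is ordinary Unicode, and encode() turns a tag back into UTF-8 bytes.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the same markup, but as GBK-encoded bytes
raw = "<p>中文</p>".encode("gbk")

# from_encoding tells bs4 which encoding to try for the incoming bytes
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

print(soup.p.string)            # decoded text as a Unicode string
print(soup.original_encoding)   # the encoding bs4 used to decode the input
print(soup.p.encode("utf-8"))   # the tag serialized back to UTF-8 bytes
```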

