Using Rows in Python to Work with CSV Files Quickly

Updated: 2022-09-01 11:55:34  Author: Python实用宝典

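Rows is a third-party Python module built specifically for working with tables. Once Rows has read a CSV file, it gives you Python objects you can compute with directly. This article uses examples to show how to work with CSV files quickly using Rows.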

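Compared with pandas' pd.read_csv, I think Rows' strengths are its easy-to-understand syntax for computations and its many convenient export and conversion commands. It can easily extract the text from a PDF, convert CSV files into an SQLite database, merge CSV files, and even run SQL queries against CSV files, which makes it fairly powerful.

Of course, it is nowhere near as influential as Pandas, but it is worth getting to know; you can never have too many skills.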

1. Preparation

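Before you begin, make sure Python and pip are installed on your computer. If they are not, follow the article "Ultra-Detailed Python Installation Guide" to install them.

(Optional 1) If you use Python for data analysis, you can simply install Anaconda, which ships with Python and pip.

(Optional 2) We also recommend the VSCode editor, which has many advantages.

Open a terminal in any one of the following ways and enter the command below to install the dependency: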

1. On Windows, open Cmd (Start > Run > CMD).

2. On macOS, open Terminal (press Command+Space and type Terminal).

3. If you use the VSCode editor or PyCharm, you can use the Terminal panel at the bottom of the window.

pip install rows
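
To confirm the installation worked, a quick check like the one below can be run from Python. It is only a sketch: it uses the standard library's importlib to see whether a module named rows can be found, and the helper name is_installed is hypothetical, not part of Rows.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("rows"):
        print("rows is installed")
    else:
        print("rows is missing; run: pip install rows")
```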

2. Basic Usage

The small example below shows the basics of using Rows.

Suppose we have CSV data like this:

state,city,inhabitants,area
AC,Acrelandia,12538,1807.92
AC,Assis Brasil,6072,4974.18
AC,Brasiléia,21398,3916.5
AC,Bujari,8471,3034.87
AC,Capixaba,8798,1702.58
[...]
RJ,Angra dos Reis,169511,825.09
RJ,Aperibé,10213,94.64
RJ,Araruama,112008,638.02
RJ,Areal,11423,110.92
RJ,Armação dos Búzios,27560,70.28
[...]

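To find the cities whose state is RJ and whose population exceeds 500,000, all you need is:

import rows

# import_from_csv detects each column's type, so inhabitants and area are numbers
cities = rows.import_from_csv("data/brazilian-cities.csv")
rio_biggest_cities = [
    city for city in cities
    if city.state == "RJ" and city.inhabitants > 500000
]
for city in rio_biggest_cities:
    # Population density in inhabitants per square kilometre
    density = city.inhabitants / city.area
    print(f"{city.city} ({density:5.2f} ppl/km2)")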
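
For comparison, here is a sketch of the same filter written with only the standard library's csv module. It shows what Rows saves you: csv.DictReader returns every value as a string, so the numeric columns must be converted by hand, whereas Rows detects column types on import. The sample text reuses a few rows from the table above, and find_big_cities and its parameters are hypothetical names for illustration.

```python
import csv
import io

# A few rows copied from the sample table above.
SAMPLE = """state,city,inhabitants,area
AC,Acrelandia,12538,1807.92
AC,Brasiléia,21398,3916.5
RJ,Angra dos Reis,169511,825.09
RJ,Aperibé,10213,94.64
RJ,Araruama,112008,638.02
"""


def find_big_cities(csv_file, state, min_inhabitants):
    """Return (city, density) pairs for cities in `state` above `min_inhabitants`."""
    result = []
    for row in csv.DictReader(csv_file):
        # DictReader yields strings, so convert the numeric columns explicitly.
        inhabitants = int(row["inhabitants"])
        area = float(row["area"])
        if row["state"] == state and inhabitants > min_inhabitants:
            result.append((row["city"], inhabitants / area))
    return result


if __name__ == "__main__":
    # The excerpt has no RJ city above 500000, so a lower threshold is used here.
    for city, density in find_big_cities(io.StringIO(SAMPLE), "RJ", 100000):
        print(f"{city} ({density:5.2f} ppl/km2)")
```

To run it against the real file, pass open("data/brazilian-cities.csv", newline="", encoding="utf-8") instead of the io.StringIO sample (assuming the file is UTF-8 encoded).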

It feels a lot like Pandas, but the syntax is simpler, and the module as a whole is much lighter than Pandas.

To create a "table" of your own, you can write:

from collections import OrderedDict
from rows import fields, Table


# Column name -> field type; the OrderedDict keeps the column order
country_fields = OrderedDict([
    ("name", fields.TextField),
    ("population", fields.IntegerField),
])

countries = Table(fields=country_fields)
# Population is given as a string; IntegerField converts it to int
countries.append({"name": "Argentina", "population": "45101781"})
countries.append({"name": "Brazil", "population": "212392717"})
countries.append({"name": "Colombia", "population": "49849818"})
countries.append({"name": "Ecuador", "population": "17100444"})
countries.append({"name": "Peru", "population": "32933835"})

Then you can iterate over it:

for country in countries:
    print(country)
# Result:
# Row(name='Argentina', population=45101781)
# Row(name='Brazil', population=212392717)
# Row(name='Colombia', population=49849818)
# Row(name='Ecuador', population=17100444)
# Row(name='Peru', population=32933835)
# "Row" is a namedtuple created from `country_fields`
 
# We've added population as a string, the library automatically converted to
# integer so we can also sum:
countries_population = sum(country.population for country in countries)
print(countries_population) # prints 357378595
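
The comments in the example say that Row is a namedtuple built from country_fields and that the population strings were converted to integers automatically. The sketch below imitates that idea using only collections.namedtuple and a mapping from column name to converter function (str, int). It illustrates the concept only, not how Rows is implemented internally, and SimpleTable is a hypothetical name.

```python
from collections import OrderedDict, namedtuple


class SimpleTable:
    """A tiny table: each appended dict is converted and stored as a namedtuple."""

    def __init__(self, fields):
        # `fields` maps column name -> converter (e.g. str, int), in column order.
        self.fields = OrderedDict(fields)
        self.Row = namedtuple("Row", list(self.fields))
        self._rows = []

    def append(self, data):
        # Convert each value with its column's converter; missing keys become None.
        values = [
            None if data.get(name) is None else convert(data[name])
            for name, convert in self.fields.items()
        ]
        self._rows.append(self.Row(*values))

    def __iter__(self):
        return iter(self._rows)

    def __len__(self):
        return len(self._rows)


if __name__ == "__main__":
    countries = SimpleTable([("name", str), ("population", int)])
    countries.append({"name": "Argentina", "population": "45101781"})
    countries.append({"name": "Brazil", "population": "212392717"})
    for country in countries:
        print(country)  # Row(name='Argentina', population=45101781), ...
    print(sum(country.population for country in countries))  # 257494498
```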

You can also export the table to CSV or any other supported format:

import rows

# CSV
rows.export_to_csv(countries, "some-LA-countries.csv")

# HTML (may require the HTML extra: pip install rows[html])
rows.export_to_html(countries, "some-LA-countries.html")
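
To make the two export formats concrete, here is a standard-library sketch of what such exports contain for a list of namedtuples: csv.writer produces the CSV text, and a hand-built table escaped with html.escape produces the HTML. The function names to_csv_text and to_html_table are hypothetical, and Rows' own HTML output may differ in markup details.

```python
import csv
import html
import io
from collections import namedtuple

Row = namedtuple("Row", ["name", "population"])


def to_csv_text(records):
    """Serialize a non-empty list of namedtuples to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(records[0]._fields)
    writer.writerows(records)
    return buffer.getvalue()


def to_html_table(records):
    """Serialize a non-empty list of namedtuples to a minimal HTML table."""
    def cells(tag, values):
        # Escape every cell so characters like < and & stay literal text.
        return "".join(f"<{tag}>{html.escape(str(v))}</{tag}>" for v in values)

    lines = ["<table>", f"  <tr>{cells('th', records[0]._fields)}</tr>"]
    lines += [f"  <tr>{cells('td', record)}</tr>" for record in records]
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    data = [Row("Argentina", 45101781), Row("Brazil", 212392717)]
    print(to_csv_text(data))
    print(to_html_table(data))
```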

Import data from dictionaries into a rows object:

import rows
 
data = [
    {"name": "Argentina", "population": "45101781"},
    {"name": "Brazil", "population": "212392717"},
    {"name": "Colombia", "population": "49849818"},
    {"name": "Ecuador", "population": "17100444"},
    {"name": "Peru", "population": "32933835"},
    {"name": "Guyana", }, # Missing "population", will fill with `None`
]
table = rows.import_from_dicts(data)
print(table[-1]) # Can use indexes
# Result:
# Row(name='Guyana', population=None)
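
Here import_from_dicts has to reconcile dictionaries whose keys differ, filling in None where a key is missing, as the Guyana row shows. A standard-library sketch of that step: collect the union of keys in first-seen order, fill the gaps with None, and convert a column to int only when every non-missing value parses as an integer. The helper name dicts_to_rows and the int-only type check are assumptions for illustration; Rows itself detects more types than this.

```python
from collections import namedtuple


def is_int(value):
    """True if `value` is a string that int() can parse."""
    if not isinstance(value, str):
        return False
    try:
        int(value)
    except ValueError:
        return False
    return True


def dicts_to_rows(dicts):
    """Turn a list of dicts with possibly different keys into namedtuple rows."""
    # Union of all keys, preserving the order in which they first appear.
    columns = []
    for item in dicts:
        for key in item:
            if key not in columns:
                columns.append(key)

    # A column becomes int only if all of its present values are integer strings.
    int_columns = {
        col for col in columns
        if all(is_int(d[col]) for d in dicts if d.get(col) is not None)
    }

    Row = namedtuple("Row", columns)
    result = []
    for item in dicts:
        values = []
        for col in columns:
            value = item.get(col)  # a missing key becomes None
            if value is not None and col in int_columns:
                value = int(value)
            values.append(value)
        result.append(Row(*values))
    return result


if __name__ == "__main__":
    data = [
        {"name": "Argentina", "population": "45101781"},
        {"name": "Guyana"},  # no "population" key
    ]
    table = dicts_to_rows(data)
    print(table[-1])  # Row(name='Guyana', population=None)
```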

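3. Command-Line Tools

Besides writing Python code, you can also use Rows' command-line tool directly. Here are a few subcommands you are likely to use often.

Extract the text from a PDF file and save it to a file: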

# Install the PDF extra first: pip install rows[pdf]
URL="http://www.imprensaoficial.rr.gov.br/app/_edicoes/2018/01/doe-20180131.pdf"
rows pdf-to-text "$URL" result.txt  # save to a file, with a progress bar
rows pdf-to-text --quiet "$URL" result.txt  # save to a file, without a progress bar
rows pdf-to-text --pages=1,2,3 "$URL"  # print pages 1, 2 and 3 to the terminal
rows pdf-to-text --pages=1-3 "$URL"  # same three pages, using a range
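
The --pages option accepts either a comma-separated list (1,2,3) or a range (1-3). The helper below is a hypothetical sketch of how such a page specification can be expanded into page numbers; it is not Rows' own parser, and accepting mixed forms such as 1-3,5 is an assumption of this sketch rather than documented Rows behavior.

```python
def parse_pages(spec: str) -> list:
    """Expand a page spec like "1,2,3", "1-3" or "1-3,5" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range "a-b" includes both ends.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


if __name__ == "__main__":
    print(parse_pages("1,2,3"))  # [1, 2, 3]
    print(parse_pages("1-3"))    # [1, 2, 3]
```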

Convert CSV to SQLite:

rows csv2sqlite \
     --dialect=excel \
     --input-encoding=latin1 \
     file1.csv file2.csv \
     result.sqlite
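
Conceptually, csv2sqlite reads each CSV file and writes its rows into a table of an SQLite database. The standard-library sketch below does this for a single CSV source with csv and sqlite3, storing every column as TEXT for simplicity (Rows also detects column types, which this sketch does not). The function name csv_to_sqlite and the table name are hypothetical.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_file, connection, table_name):
    """Copy every row of `csv_file` into a new table; all columns are TEXT.

    Assumes the table and column names contain no double quotes.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so column names with spaces are still valid SQL.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    connection.execute(f'CREATE TABLE "{table_name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )
    connection.commit()


if __name__ == "__main__":
    sample = "state,city,inhabitants,area\nRJ,Areal,11423,110.92\nAC,Bujari,8471,3034.87\n"
    # Use a filename such as "result.sqlite" instead of ":memory:" to keep the data.
    conn = sqlite3.connect(":memory:")
    csv_to_sqlite(io.StringIO(sample), conn, "table1")
    print(conn.execute('SELECT city FROM "table1" ORDER BY city').fetchall())
```

To read a Latin-1 encoded file like the one in the command above, open it with open(path, newline="", encoding="latin1") and pass the file object in.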

Merge several CSV files:

rows csv-merge \
     file1.csv file2.csv.bz2 file3.csv.xz \
     result.csv.gz
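
In this example csv-merge reads plain and compressed inputs (.bz2, .xz) and writes a compressed result (.gz). A standard-library sketch of that idea: choose gzip, bz2 or lzma from the file extension, take the union of all headers in order, and write the combined rows, leaving cells empty where a file lacks a column. The function names are hypothetical, and the union-of-headers behavior is an assumption of this sketch, not a documented detail of the Rows command.

```python
import bz2
import csv
import gzip
import lzma
import os
import tempfile

# File extension -> opener; anything else is treated as plain text.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(path, mode):
    """Open `path` in text mode ("r" or "w"), (de)compressing by extension."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            return opener(path, mode + "t", encoding="utf-8", newline="")
    return open(path, mode, encoding="utf-8", newline="")


def merge_csv(inputs, output):
    """Merge CSV files into `output`, using the union of all headers."""
    fieldnames = []
    for path in inputs:
        with open_text(path, "r") as f:
            for name in next(csv.reader(f), []):
                if name not in fieldnames:
                    fieldnames.append(name)

    with open_text(output, "w") as out:
        # restval="" fills columns that a given input file does not have.
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in inputs:
            with open_text(path, "r") as f:
                writer.writerows(csv.DictReader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "file1.csv")
        second = os.path.join(tmp, "file2.csv.bz2")
        result = os.path.join(tmp, "result.csv.gz")
        with open_text(first, "w") as f:
            f.write("state,city\nAC,Bujari\n")
        with open_text(second, "w") as f:
            f.write("state,city,area\nRJ,Areal,110.92\n")
        merge_csv([first, second], result)
        with open_text(result, "r") as f:
            print(f.read())
```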

Run an SQL query against a CSV file:

# needs: pip install rows[html]
rows query \
    "SELECT * FROM table1 WHERE inhabitants > 1000000" \
    data/brazilian-cities.csv \
    --output=data/result.html
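
The query above refers to the CSV file as table1. A standard-library sketch of the same idea: load the CSV into an in-memory SQLite table named table1 with declared numeric column types, then run the SQL. Declaring INTEGER or REAL matters because SQLite then stores values like "169511" as numbers, so inhabitants > 1000000 is compared numerically; with TEXT columns the comparison would be done on strings. The function name query_csv and the explicit column_types mapping are assumptions for illustration, not how Rows works internally.

```python
import csv
import io
import sqlite3


def query_csv(csv_file, sql, column_types):
    """Load `csv_file` into an in-memory table named table1 and run `sql`.

    `column_types` maps column name -> SQLite type (e.g. "INTEGER", "REAL");
    unlisted columns are stored as TEXT. Returns (column names, result rows).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{n}" {column_types.get(n, "TEXT")}' for n in header)
        conn.execute(f"CREATE TABLE table1 ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO table1 VALUES ({placeholders})", reader)
        cursor = conn.execute(sql)
        names = [description[0] for description in cursor.description]
        return names, cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = (
        "state,city,inhabitants,area\n"
        "RJ,Angra dos Reis,169511,825.09\n"
        "RJ,Areal,11423,110.92\n"
    )
    names, result = query_csv(
        io.StringIO(sample),
        "SELECT city FROM table1 WHERE inhabitants > 100000",
        {"inhabitants": "INTEGER", "area": "REAL"},
    )
    print(names, result)  # ['city'] [('Angra dos Reis',)]
```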

For more features, see the official Rows documentation:

http://turicas.info/rows
