Python使用pandas將表格數(shù)據(jù)進(jìn)行處理

更新時(shí)間：2022年08月31日 14:42:05 作者：IT之一小佬

這篇文章主要介紹了Python使用pandas將表格數(shù)據(jù)進(jìn)行處理，文章圍繞主題展開詳細(xì)的內(nèi)容介紹，具有一定的參考價(jià)值，感興趣的小伙伴可以參考一下

前言

任務(wù)描述：

當(dāng)前有一份excel表格數(shù)據(jù)，里面存在缺失值，需要對缺失的數(shù)據(jù)到es數(shù)據(jù)庫中進(jìn)行查找并對其進(jìn)行把缺失的數(shù)據(jù)進(jìn)行補(bǔ)全。

excel表格數(shù)據(jù)如下所示：

一、構(gòu)建es庫中的數(shù)據(jù)

1.1 創(chuàng)建索引

# 創(chuàng)建physical_examination索引
PUT /physical_examination
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "nums": {
        "type": "integer"
      },
      "name": {
        "type": "text"
      },
      "sex": {
        "type": "text"
      },
      "phone": {
        "type": "integer"
      },
      "result": {
        "type": "text"
      }
    }
  }
}

1.2 插入數(shù)據(jù)

【注意：json數(shù)據(jù)不能格式化換行，否則報(bào)錯(cuò)】

# 向physical_examination索引中添加數(shù)據(jù)
POST physical_examination/_bulk
{"index":{"_id":"1"}}
{"nums":1,"name":"劉一","sex":"男","phone":1234567891,"result":"優(yōu)秀"}
{"index":{"_id":"2"}}
{"nums":2,"name":"陳二","sex":"男","phone":1234567892,"result":"優(yōu)秀"}
{"index":{"_id":"3"}}
{"nums":3,"name":"張三","sex":"男","phone":1234567893,"result":"優(yōu)秀"}
{"index":{"_id":"4"}}
{"nums":4,"name":"李四","sex":"男","phone":1234567894,"result":"優(yōu)秀"}
{"index":{"_id":"5"}}
{"nums":5,"name":"王五","sex":"男","phone":1234567895,"result":"優(yōu)秀"}

1.3 查詢數(shù)據(jù)

【注意：默認(rèn)查詢索引下的所有數(shù)據(jù)】

# 查詢索引中的所有數(shù)據(jù)
GET physical_examination/_search
{
  "query": {
    "match_all": {}
  }
}

二、對excel表格中的數(shù)據(jù)處理操作

2.1 導(dǎo)出es查詢的數(shù)據(jù)

方法一：直接在kibana或postman查詢的結(jié)果中進(jìn)行復(fù)制粘貼到一個(gè)文檔。
方法二：使用kibana導(dǎo)出數(shù)據(jù)。
方法三：使用postman導(dǎo)出數(shù)據(jù)保存到本地。

使用python處理數(shù)據(jù)，獲取需要的數(shù)據(jù)。

示例代碼：

# 讀取json中體檢信息
with open('./data/physical_examination.json', 'r', encoding='utf-8') as f:
    data_json = f.read()
print(data_json)
 
# 處理json數(shù)據(jù)中的異常數(shù)據(jù)
if 'false' in data_json:
    data_json = data_json.replace('false', "False")
 
data_json = eval(data_json)
print(data_json)
 
print(data_json['hits']['hits'])
print('*' * 100)
 
valid_data = data_json['hits']['hits']
need_data = []
for data in valid_data:
    print(data['_source'])
    need_data.append(data['_source'])
print(need_data)

讀取缺失數(shù)據(jù)的excel表格，把缺失的數(shù)據(jù)填補(bǔ)進(jìn)去。

# 讀取需要填補(bǔ)數(shù)據(jù)的表格
data_xlsx = pd.read_excel('./data/體檢表.xlsx', sheet_name='Sheet1')
# print(data_xlsx)
 
# 獲取excel表格的行列
row, col = data_xlsx.shape
print(row, col)
 
# 修改表格中的數(shù)據(jù)
for i in range(row):
    bb = data_xlsx.iloc[i]
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    if pd.isnull(bb['手機(jī)號(hào)']):
        bb['手機(jī)號(hào)'] = '666'
        for cc in need_data:
            if cc['name'] == bb['姓名']:
                bb['手機(jī)號(hào)'] = cc['phone']
            data_xlsx.iloc[i, 3] = bb['手機(jī)號(hào)']
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    print("-" * 100)
print(data_xlsx)

將最終處理好的數(shù)據(jù)保存在新建的文件中。

# 保存數(shù)據(jù)到新文件中
data_xlsx.to_excel('./data/new_data.xlsx', sheet_name='Sheet1', index=False, header=True)

完整代碼如下：

import pandas as pd
 
# 讀取json中體檢信息
with open('./data/physical_examination.json', 'r', encoding='utf-8') as f:
    data_json = f.read()
print(data_json)
 
# 處理json數(shù)據(jù)中的異常數(shù)據(jù)
if 'false' in data_json:
    data_json = data_json.replace('false', "False")
 
data_json = eval(data_json)
print(data_json)
 
print(data_json['hits']['hits'])
print('*' * 100)
 
valid_data = data_json['hits']['hits']
need_data = []
for data in valid_data:
    print(data['_source'])
    need_data.append(data['_source'])
print(need_data)
 
# 讀取需要填補(bǔ)數(shù)據(jù)的表格
data_xlsx = pd.read_excel('./data/體檢表.xlsx', sheet_name='Sheet1')
# print(data_xlsx)
 
# 獲取excel表格的行列
row, col = data_xlsx.shape
print(row, col)
 
# 修改表格中的數(shù)據(jù)
for i in range(row):
    bb = data_xlsx.iloc[i]
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    if pd.isnull(bb['手機(jī)號(hào)']):
        bb['手機(jī)號(hào)'] = '666'
        for cc in need_data:
            if cc['name'] == bb['姓名']:
                bb['手機(jī)號(hào)'] = cc['phone']
            data_xlsx.iloc[i, 3] = bb['手機(jī)號(hào)']
    print(bb['姓名'], bb['手機(jī)號(hào)'])
    print("-" * 100)
print(data_xlsx)
 
# 保存數(shù)據(jù)到新文件中
data_xlsx.to_excel('./data/new_data.xlsx', sheet_name='Sheet1', index=False, header=True)

運(yùn)行效果，最終處理好的數(shù)據(jù)如下所示：