Python: How to Merge Multiple Excel Sheets

Updated: 2025-09-02 11:00:20    Author: trayvontang

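This article shows how to process Excel reports with openpyxl and pandas. With openpyxl you write the merge logic by hand; with pandas, concat (or append in older versions) does most of the work. It also highlights the data_only parameter, which keeps formulas from interfering with the merged data, and notes that pandas can filter and transform the data along the way.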

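Introduction

I have a large number of Excel reports to consolidate every day, so I wrote a script to handle it.

The job is to find specific sheets in each Excel file, read specific columns from those sheets, and merge them into a single sheet.

Because the data in each sheet differs slightly, this takes a bit of care. Below, the task is handled in two ways: with openpyxl and with pandas.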

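The openpyxl approach

With openpyxl you have to implement the merge logic yourself, which is more work.

One thing to watch for: an Excel file may contain formulas. When reading it, you can load the workbook like this: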

load_workbook(data_file_path, data_only=True)

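With data_only=True you get the value the formula evaluated to rather than the formula itself. This matters because a formula copied into another sheet may no longer be valid, or may even give the wrong result. Note that openpyxl does not evaluate formulas itself: it returns the value that was cached in the file the last time a spreadsheet application calculated and saved it.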
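
To make the difference concrete, here is a minimal sketch that builds a small workbook in memory and loads it both ways. The cell layout is made up for illustration. Because the file in this sketch is produced by openpyxl and never calculated by Excel, there is no cached result to return, so data_only=True yields None here; a report saved from Excel would return the computed number instead.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook in memory with one formula cell (illustrative data).
wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = 2
ws["A3"] = "=A1+A2"

buffer = BytesIO()
wb.save(buffer)
raw_bytes = buffer.getvalue()

# Default load: the cell value is the formula text.
formula_value = load_workbook(BytesIO(raw_bytes)).active["A3"].value

# data_only=True: the cell value is whatever result was cached in the file.
# This file was never calculated by a spreadsheet application, so no cached
# result exists.
cached_value = load_workbook(BytesIO(raw_bytes), data_only=True).active["A3"].value

print(repr(formula_value))
print(repr(cached_value))
```

If a merged column unexpectedly comes back empty, it is worth checking whether the source file was ever opened and saved in Excel.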

Here is some sample code, for reference only:

"""
 pip install openpyxl
"""
from openpyxl import load_workbook
from openpyxl import Workbook
import os
import re

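# Template file
TEMPLATE_FILE = r'H:\合并\合并模板.xlsx'
# Merged result file
RESULT_FILE = r'H:\合并\結果.xlsx'
# Directory containing the data files
DATA_ROOT_DIR = r"H:\合并"

# File name pattern: "<city>-合同導入臺賬<8-digit date>.xlsx"
# (the dot is escaped so that it only matches a literal ".")
DATA_FILE_REG = r"(.*?)-合同導入臺賬\d{8}\.xlsx$"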


# Collect the files to process, mapping each file path to the number of leading sheets to read from it
def get_deal_file_map():
    file_sn_map = {}
    fs = os.listdir(DATA_ROOT_DIR)
    for f in fs:
        match = re.match(DATA_FILE_REG, f)
        if match:
            city = match.group(1)
            sn = 2
            if city == '成都':
                sn = 4
            elif city == '杭州':
                sn = 3
            file_sn_map[os.path.join(DATA_ROOT_DIR, f)] = sn
    return file_sn_map


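# Normalize a column name: drop everything from the first full-width "（" onward and trim whitespace
def get_normal_column_name(origin_col_name):
    if not origin_col_name:
        # Empty header cell: there is no usable name
        return None
    # Header cells may hold non-string values such as numbers
    name = str(origin_col_name)
    start = name.find("（")
    if start == -1:
        return name.strip()
    return name[0:start].strip()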


# Map each normalized column name to its column coordinate
def get_col_name_coordinate_map(sheet_row):
    name_coor_map = {}
    for cell in sheet_row:
        # cell.column is the 1-based column index; cell.column_letter would give 'A', 'B', ... instead
        name_coor_map[get_normal_column_name(cell.value)] = cell.column
    return name_coor_map


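# Map the template file's column names to their column coordinates
def get_template_name_coordinate_map(template_file_path):
    template_wbook = load_workbook(template_file_path)
    # The template header lives in the first sheet
    table = template_wbook[template_wbook.sheetnames[0]]
    # table[1:1] is the first row, returned as a tuple of cells
    row = table[1:1]
    return get_col_name_coordinate_map(row)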


def deal_data_content():
    """
        Merge the contents of the data files into one sheet
    """
    dfile_sn_map = get_deal_file_map()
    save_book = Workbook()
    wsheet = save_book.active
    wsheet.title = 'merge-data'
    tmp_col_coor_map = get_template_name_coordinate_map(TEMPLATE_FILE)
    wsheet.append(list(tmp_col_coor_map.keys()))
    line = 2
    for data_file_path in dfile_sn_map.keys():
        sheet_num = dfile_sn_map[data_file_path]
        wbook = load_workbook(data_file_path, data_only=True)

        names = wbook.sheetnames

        for i in range(0, sheet_num):
            table = wbook[names[i]]
            row = table[1:1]
            data_col_coor_map = get_col_name_coordinate_map(row)
            use_col = data_col_coor_map.keys() & tmp_col_coor_map.keys()
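            for row in table.iter_rows(min_row=2, values_only=True):
                # Skip rows whose '城市' (city) cell is empty
                rcol_index = data_col_coor_map['城市']
                city = row[rcol_index - 1]
                # str() guards against non-string cell values such as numbers
                if city is None or not str(city).strip():
                    continue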
                for col_name in use_col:
                    rcol_index = data_col_coor_map[col_name]
                    wcol_index = tmp_col_coor_map[col_name]
                    wsheet.cell(line, wcol_index, row[rcol_index - 1])
                line += 1
    save_book.save(RESULT_FILE)


if __name__ == '__main__':
    deal_data_content()
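
The core of the script above is the column mapping: each sheet's header row is normalized, matched against the template's columns, and the values are written into the template's column positions. The sketch below shows that same idea on plain Python lists, with no Excel files involved. The function names `normalize` and `merge_rows` and the sample rows are hypothetical, chosen only for illustration; unlike the script, this version does not skip rows with an empty city.

```python
def normalize(name):
    """Drop everything from the first full-width '（' onward and trim whitespace."""
    if name is None:
        return None
    return str(name).split("（", 1)[0].strip()


def merge_rows(template_header, sheets):
    """Merge sheets (lists of rows, header row first) into the template's column order.

    Columns a sheet lacks are filled with None; columns that are not in the
    template are dropped.
    """
    merged = [list(template_header)]
    for rows in sheets:
        if not rows:
            continue
        # Map each normalized header name to its 0-based position in this sheet.
        position = {normalize(h): i for i, h in enumerate(rows[0]) if h}
        for row in rows[1:]:
            merged.append([
                row[position[col]] if col in position else None
                for col in template_header
            ])
    return merged


# Illustrative data: two sheets with different column orders and extra columns.
template = ["城市", "姓名", "charge"]
sheet_a = [["姓名", "城市（必填）", "charge"], ["Alice", "成都", 10]]
sheet_b = [["城市", "charge", "extra"], ["杭州", 20, "n/a"]]

merged = merge_rows(template, [sheet_a, sheet_b])
for line in merged:
    print(line)
```

Keeping the mapping keyed by column name rather than by position is what lets sheets with different layouts land in the same output columns.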

The pandas approach

Compared with using openpyxl directly, pandas is far more convenient: the concat function does the merge in one call.

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

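Parameters

Parameter	Meaning
objs	A list (or other sequence) of Series or DataFrame objects; Panel has since been removed from pandas
axis	Default 0, which concatenates row-wise
join	inner or outer; default "outer"
keys	list; builds the outermost level of a hierarchical index; use tuples for multiple levels
levels	list; specific levels to use when building the MultiIndex
names	list; names of the levels in the resulting hierarchical index
copy	boolean, default True; if False, data is not copied unnecessarily
join_axes	Deprecated and removed in pandas 1.0; call reindex on the result instead
ignore_index	boolean, default False; if True, the original index values are discarded and the result is numbered 0, 1, 2, ...
verify_integrity	boolean, default False; checks whether the new concatenated axis contains duplicates
sort	boolean, default False; whether to sort the non-concatenation axis when it is not already aligned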
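
As a quick illustration of how join, ignore_index, and keys change the result, here is a small sketch on in-memory frames. The two DataFrames and their values are invented for the example; the column names are borrowed from the report used in this article.

```python
import pandas as pd

# Two frames that share only the "城市" column (illustrative data).
df_a = pd.DataFrame({"城市": ["成都"], "charge": [10]})
df_b = pd.DataFrame({"城市": ["杭州"], "姓名": ["Alice"]})

# join="outer" (the default) keeps the union of columns; gaps become NaN.
outer = pd.concat([df_a, df_b], ignore_index=True, sort=False)

# join="inner" keeps only the columns present in every frame.
inner = pd.concat([df_a, df_b], join="inner", ignore_index=True)

# keys adds an outer index level recording which input each row came from.
labeled = pd.concat([df_a, df_b], keys=["a", "b"], join="inner")

print(outer)
print(inner)
print(labeled)
```

For merging reports whose sheets differ slightly, the default outer join is usually the safer starting point, since no column is silently dropped.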

Here is an example:

# coding:utf-8
import pandas as pd

# Read the specified sheet from each file (sheet_name is a 0-based position)
df1 = pd.read_excel(r'H:\merge\cd-contract-charge-1-20200807.xlsx', header=0, sheet_name=0)
df2 = pd.read_excel(r'H:\merge\cd-contract-charge-2-20200807.xlsx', header=0, sheet_name=1)
df3 = pd.read_excel(r'H:\merge\cd-contract-charge-3-20200807.xlsx', header=0, sheet_name=2)
df4 = pd.read_excel(r'H:\merge\hz-contract-charge-1-20200807.xlsx', header=0, sheet_name=0)
df5 = pd.read_excel(r'H:\merge\hz-contract-charge-2-20200807.xlsx', header=0, sheet_name=1)

# Concatenate row-wise
data = pd.concat([df1, df2, df3, df4, df5], sort=False, ignore_index=True)

# Select the columns we need
header = ['日期', '合同號', '城市', '姓名', 'charge']
data = data.loc[:, header]

# Write the result to the specified Excel file
data.to_excel(r'H:\merge\result.xlsx', index=False)
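
One caveat with selecting columns via .loc: in pandas 1.0 and later, it raises a KeyError if any requested column is absent from the concatenated frame, which can happen when none of the input sheets carries that column. A possible alternative, sketched below on invented in-memory data, is reindex(columns=...), which creates the missing column filled with NaN instead of failing.

```python
import pandas as pd

# Illustrative frames: neither of them has a "姓名" column.
df1 = pd.DataFrame({"城市": ["成都"], "charge": [10], "extra": ["x"]})
df2 = pd.DataFrame({"城市": ["杭州"], "charge": [20]})
data = pd.concat([df1, df2], sort=False, ignore_index=True)

wanted = ["城市", "姓名", "charge"]

# .loc raises KeyError because "姓名" is not among the columns ...
try:
    data.loc[:, wanted]
    loc_failed = False
except KeyError:
    loc_failed = True

# ... while reindex returns exactly the wanted columns, adding "姓名" as NaN.
selected = data.reindex(columns=wanted)

print(loc_failed)
print(selected)
```

Which behavior is preferable depends on the report: failing loudly catches a misnamed column early, while reindex keeps a daily batch job running.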

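Most of the work here is reading the Excel files. For more on reading and writing files with pandas, see the related article "pandas讀寫文件" (reading and writing files with pandas).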

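Besides concat, older pandas versions also offered the DataFrame.append method. append is a special case of concat: it corresponds to axis=0, which is also concat's default. Note that append was deprecated in pandas 1.4 and removed in pandas 2.0, so new code should use concat.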
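
For code that has to run on pandas 2.0 or later, where DataFrame.append no longer exists, an append call can be rewritten with concat. The sketch below uses two invented frames to show the equivalent form.

```python
import pandas as pd

# Illustrative frames with the same columns.
base = pd.DataFrame({"城市": ["成都"], "charge": [10]})
more = pd.DataFrame({"城市": ["杭州"], "charge": [20]})

# Older code would have been: base.append(more, ignore_index=True)
# The concat form below gives the same rows and works on old and new pandas.
combined = pd.concat([base, more], ignore_index=True)

print(combined)
```

When appending inside a loop, collecting the frames in a list and calling concat once at the end avoids copying the accumulated data on every iteration.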

Since pandas is already in play, you can also take the opportunity to filter, fill, and transform the data along the way.
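
As a rough sketch of what such clean-up might look like, the example below filters, transforms, and fills a small in-memory frame. The data is invented, and treating a missing charge as 0 is an assumption made only for this illustration; the empty-city filter mirrors the check in the openpyxl script.

```python
import pandas as pd

# Illustrative merged data with a missing city and a missing charge.
data = pd.DataFrame({
    "日期": ["2020-08-07", "2020-08-08", "2020-08-09"],
    "城市": ["成都", None, " 杭州 "],
    "charge": [10.0, 5.0, None],
})

# Filter: drop rows without a city.
cleaned = data.dropna(subset=["城市"]).copy()

# Transform: trim stray whitespace and parse the date strings.
cleaned["城市"] = cleaned["城市"].str.strip()
cleaned["日期"] = pd.to_datetime(cleaned["日期"])

# Fill: treat a missing charge as 0 (an assumption for this sketch).
cleaned["charge"] = cleaned["charge"].fillna(0)

print(cleaned)
```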