问题:有没有一种方法可以使用pandas.ExcelWriter自动调整Excel列的宽度?

我被要求生成一些Excel报告。我目前正在大量使用pandas作为数据,所以自然地我想使用pandas.ExcelWriter方法生成这些报告。但是,固定的列宽是一个问题。

到目前为止,我的代码很简单。假设我有一个名为“ df”的数据框:

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

我正在查看pandas代码,但实际上没有看到任何设置列宽的选项。宇宙中是否有技巧可以使列自动调整为数据?还是在事实之后我可以对xlsx文件做一些事情来调整列宽?

(我正在使用OpenPyXL库,并生成.xlsx文件-如果有区别的话。)

谢谢。

I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.

The code I have so far is simple enough. Say I have a dataframe called ‘df’:

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

I was looking over the pandas code, and I don’t really see any options to set column widths. Is there a trick out there in the universe to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?

(I am using the OpenPyXL library, and generating .xlsx files – if that makes any difference.)

Thank you.


回答 0

user6178746的回答启发,我有以下内容:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.save()

Inspired by user6178746’s answer, I have the following:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.save()

回答 1

我发布此消息是因为我遇到了同样的问题,发现Xlsxwriter和pandas的官方文档仍然将此功能列为不受支持。我找到了解决我遇到的问题的解决方案。我基本上只是遍历每列,并使用worksheet.set_column设置列宽==该列内容的最大长度。

但是,重要的一点。此解决方案不适合列标题,仅适合列值。这应该是一个容易的更改,但是,如果您需要改头,则可以。希望这可以帮助某人:)

import pandas as pd
import sqlalchemy as sa
import urllib


read_server = 'serverName'
read_database = 'databaseName'

read_params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.save()

I’m posting this because I just ran into the same issue and found that the official documentation for Xlsxwriter and pandas still have this functionality listed as unsupported. I hacked together a solution that solved the issue i was having. I basically just iterate through each column and use worksheet.set_column to set the column width == the max length of the contents of that column.

One important note, however. This solution does not fit the column headers, simply the column values. That should be an easy change though if you need to fit the headers instead. Hope this helps someone :)

import pandas as pd
import sqlalchemy as sa
import urllib


read_server = 'serverName'
read_database = 'databaseName'

read_params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.save()

回答 2

现在可能尚无自动方法,但是当您使用openpyxl时,以下几行(由Bufke用户改写了有关手动操作的另一答案)允许您指定合理的值(以字符宽度表示):

writer.sheets['Summary'].column_dimensions['A'].width = 15

There is probably no automatic way to do it right now, but as you use openpyxl, the following line (adapted from another answer by user Bufke on how to do in manually) allows you to specify a sane value (in character widths):

writer.sheets['Summary'].column_dimensions['A'].width = 15

回答 3

我最近开始使用一个不错的程序包,称为StyleFrame。

它获得了DataFrame并允许您非常轻松地对其进行样式设置…

默认情况下,列宽是自动调整的。

例如:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

您还可以更改列宽:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)


更新

在1.4版中,best_fit参数已添加到中StyleFrame.to_excel。请参阅文档

There is a nice package that I started to use recently called StyleFrame.

it gets DataFrame and lets you to style it very easily…

by default the columns width is auto-adjusting.

for example:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

you can also change the columns width:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)

UPDATE 1

In version 1.4 best_fit argument was added to StyleFrame.to_excel. See the documentation.

UPDATE 2

Here’s a sample of code that works for StyleFrame 3.x.x

from styleframe import StyleFrame
import pandas as pd

columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer, 
    best_fit=columns,
    columns_and_rows_to_freeze='B2', 
    row_to_add_filters=0,
)
excel_writer.save()

回答 4

通过使用pandas和xlsxwriter,您可以完成任务,下面的代码将在Python 3.x中完美地工作。有关使用XlsxWriter和熊猫的更多详细信息,此链接可能会有用:https: //xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.save()

By using pandas and xlsxwriter you can do your task, below code will perfectly work in Python 3.x. For more details on working with XlsxWriter with pandas this link might be useful https://xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.save()

回答 5

我发现基于列标题而不是列内容来调整列更有用。

使用 df.columns.values.tolist() I生成列标题的列表,并使用这些标题的长度来确定列的宽度。

请参阅下面的完整代码:

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet

header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)

writer.save() # Save the excel file

I found that it was more useful to adjust the column with based on the column header rather than column content.

Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the width of the columns.

See full code below:

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet

header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)

writer.save() # Save the excel file

回答 6

在工作中,我总是将数据帧写入excel文件。因此,我没有反复编写相同的代码,而是创建了一个模数。现在,我只是将其导入并使用它来编写和设置excel文件。但是有一个缺点,如果数据帧过大,则需要花费很长时间。所以这是代码:

def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                    datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format.
        format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # define a max width to not get to wide column
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.save()
    writerReport.close()
    return output_dir + output_name

At work, I am always writing the dataframes to excel files. So instead of writing the same code over and over, I have created a modulus. Now I just import it and use it to write and formate the excel files. There is one downside though, it takes a long time if the dataframe is extra large. So here is the code:

def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                    datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format.
        format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # define a max width to not get to wide column
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.save()
    return output_dir + output_name


回答 7

动态调整所有列长

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

使用列名手动调整列

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

使用列索引手动调整列

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

如果以上任何一个失败

AttributeError: 'Worksheet' object has no attribute 'set_column'

确保安装xlsxwriter

pip install xlsxwriter

Dynamically adjust all the column lengths

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

Manually adjust a column using Column Name

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

Manually adjust a column using Column Index

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

In case any of the above is failing with

AttributeError: 'Worksheet' object has no attribute 'set_column'

make sure to install xlsxwriter:

pip install xlsxwriter

回答 8

结合其他答案和评论,还支持多索引:

def autosize_excel_columns(worksheet, df):
  autosize_excel_columns_df(worksheet, df.index.to_frame())
  autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
  for idx, col in enumerate(df):
    series = df[col]
    max_len = max((
      series.astype(str).map(len).max(),
      len(str(series.name))
    )) + 1
    worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.save()

Combining the other answers and comments and also supporting multi-indices:

def autosize_excel_columns(worksheet, df):
  autosize_excel_columns_df(worksheet, df.index.to_frame())
  autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
  for idx, col in enumerate(df):
    series = df[col]
    max_len = max((
      series.astype(str).map(len).max(),
      len(str(series.name))
    )) + 1
    worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.save()

回答 9

import re
import openpyxl
..
for col in _ws.columns:
    max_lenght = 0
    print(col[0])
    col_name = re.findall('\w\d', str(col[0]))
    col_name = col_name[0]
    col_name = re.findall('\w', str(col_name))[0]
    print(col_name)
    for cell in col:
        try:
            if len(str(cell.value)) > max_lenght:
                max_lenght = len(cell.value)
        except:
            pass
    adjusted_width = (max_lenght+2)
    _ws.column_dimensions[col_name].width = adjusted_width
import re
import openpyxl
..
for col in _ws.columns:
    max_lenght = 0
    print(col[0])
    col_name = re.findall('\w\d', str(col[0]))
    col_name = col_name[0]
    col_name = re.findall('\w', str(col_name))[0]
    print(col_name)
    for cell in col:
        try:
            if len(str(cell.value)) > max_lenght:
                max_lenght = len(cell.value)
        except:
            pass
    adjusted_width = (max_lenght+2)
    _ws.column_dimensions[col_name].width = adjusted_width

回答 10

最简单的解决方案是在set_column方法中指定列宽。

    for worksheet in writer.sheets.values():
        worksheet.set_column(0,last_column_value, required_width_constant)

Easiest solution is to specify width of column in set_column method.

    for worksheet in writer.sheets.values():
        worksheet.set_column(0,last_column_value, required_width_constant)

回答 11

def auto_width_columns(df, sheetname):
    workbook = writer.book  
    worksheet= writer.sheets[sheetname] 

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)
def auto_width_columns(df, sheetname):
    workbook = writer.book  
    worksheet= writer.sheets[sheetname] 

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。