标签归档:Excel

如何在不覆盖数据的情况下(使用熊猫)写入现有的excel文件?

问题:如何在不覆盖数据的情况下(使用熊猫)写入现有的excel文件?

我使用熊猫以以下方式写入excel文件:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx已经包含许多不同的选项卡。但是,它尚未包含“ Main”。

熊猫正确地写入了“主要”表,不幸的是,它也删除了所有其他标签。

I use pandas to write to excel file in the following fashion:

import pandas

writer = pandas.ExcelWriter('Masterfile.xlsx') 

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Masterfile.xlsx already consists of number of different tabs. However, it does not yet contain “Main”.

Pandas correctly writes to “Main” sheet, unfortunately it also deletes all other tabs.


回答 0

熊猫文档说,它对xlsx文件使用openpyxl。快速浏览一下其中的代码ExcelWriter可以提示可能会发生以下情况:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

Pandas docs says it uses openpyxl for xlsx files. Quick look through the code in ExcelWriter gives a clue that something like this might work out:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book

## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.

writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

回答 1

这是一个辅助函数:

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None
    """
    from openpyxl import load_workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

注意:对于<0.21.0的熊猫,请替换sheet_namesheetname

用法示例:

append_df_to_excel('d:/temp/test.xlsx', df)

append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)

Here is a helper function:

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None

    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    from openpyxl import load_workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        
        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()
            

NOTE: for Pandas < 0.21.0, replace sheet_name with sheetname!

Usage examples:

append_df_to_excel('d:/temp/test.xlsx', df)

append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False)

append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2', index=False, startrow=25)

回答 2

使用openpyxlversion 2.4.0pandasversion 0.19.2,@ ski提出的过程变得更加简单:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

With openpyxlversion 2.4.0 and pandasversion 0.19.2, the process @ski came up with gets a bit simpler:

import pandas
from openpyxl import load_workbook

with pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') as writer:
    writer.book = load_workbook('Masterfile.xlsx')
    data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])
#That's it!

回答 3

从pandas 0.24开始,您可以使用mode关键字参数简化此操作ExcelWriter

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 

Starting in pandas 0.24 you can simplify this with the mode keyword argument of ExcelWriter:

import pandas as pd

with pd.ExcelWriter('the_file.xlsx', engine='openpyxl', mode='a') as writer: 
     data_filtered.to_excel(writer) 

回答 4

老问题了,但我猜有些人还在搜索这个-所以…

我发现此方法不错,因为所有工作表都加载到工作表名称和数据框对的字典中,该字典由熊猫使用sheetname = None选项创建。在将电子表格读取为dict格式并将其从dict写回之前,添加,删除或修改工作表很简单。对于我来说,就速度和格式而言,xlsxwriter在执行此特定任务方面比openpyxl更好。

注意:未来版本的熊猫(0.21.0+)将把“ sheetname”参数更改为“ sheet_name”。

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

对于2013年问题中的示例:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

Old question, but I am guessing some people still search for this – so…

I find this method nice because all worksheets are loaded into a dictionary of sheet name and dataframe pairs, created by pandas with the sheetname=None option. It is simple to add, delete or modify worksheets between reading the spreadsheet into the dict format and writing it back from the dict. For me the xlsxwriter works better than openpyxl for this particular task in terms of speed and format.

Note: future versions of pandas (0.21.0+) will change the “sheetname” parameter to “sheet_name”.

# read a single or multi-sheet excel file
# (returns dict of sheetname(s), dataframe(s))
ws_dict = pd.read_excel(excel_file_path,
                        sheetname=None)

# all worksheets are accessible as dataframes.

# easy to change a worksheet as a dataframe:
mod_df = ws_dict['existing_worksheet']

# do work on mod_df...then reassign
ws_dict['existing_worksheet'] = mod_df

# add a dataframe to the workbook as a new worksheet with
# ws name, df as dict key, value:
ws_dict['new_worksheet'] = some_other_dataframe

# when done, write dictionary back to excel...
# xlsxwriter honors datetime and date formats
# (only included as example)...
with pd.ExcelWriter(excel_file_path,
                    engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd',
                    date_format='yyyy-mm-dd') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

For the example in the 2013 question:

ws_dict = pd.read_excel('Masterfile.xlsx',
                        sheetname=None)

ws_dict['Main'] = data_filtered[['Diff1', 'Diff2']]

with pd.ExcelWriter('Masterfile.xlsx',
                    engine='xlsxwriter') as writer:

    for ws_name, df_sheet in ws_dict.items():
        df_sheet.to_excel(writer, sheet_name=ws_name)

回答 5

我知道这是一个较旧的线程,但这是您在搜索时发现的第一项,并且如果需要将图表保留在已创建的工作簿中,则上述解决方案将不起作用。在这种情况下,xlwings是一个更好的选择-它允许您写入Excel书并保留图表/图表数据。

简单的例子:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

wb = xw.Book('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

I know this is an older thread, but this is the first item you find when searching, and the above solutions don’t work if you need to retain charts in a workbook that you already have created. In that case, xlwings is a better option – it allows you to write to the excel book and keeps the charts/chart data.

simple example:

import xlwings as xw
import pandas as pd

#create DF
months = ['2017-01','2017-02','2017-03','2017-04','2017-05','2017-06','2017-07','2017-08','2017-09','2017-10','2017-11','2017-12']
value1 = [x * 5+5 for x in range(len(months))]
df = pd.DataFrame(value1, index = months, columns = ['value1'])
df['value2'] = df['value1']+5
df['value3'] = df['value2']+5

#load workbook that has a chart in it
wb = xw.Book('C:\\data\\bookwithChart.xlsx')

ws = wb.sheets['chartData']

ws.range('A1').options(index=False).value = df

wb = xw.Book('C:\\data\\bookwithChart_updated.xlsx')

xw.apps[0].quit()

回答 6

在pandas 0.24中有一个更好的解决方案:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

之前:

后:

因此,立即升级您的熊猫:

pip install --upgrade pandas

There is a better solution in pandas 0.24:

with pd.ExcelWriter(path, mode='a') as writer:
    s.to_excel(writer, sheet_name='another sheet', index=False)

before:

after:

so upgrade your pandas now:

pip install --upgrade pandas

回答 7

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

这非常完美,只有主文件(添加新工作表的文件)的格式丢失了。

def append_sheet_to_master(self, master_file_path, current_file_path, sheet_name):
    try:
        master_book = load_workbook(master_file_path)
        master_writer = pandas.ExcelWriter(master_file_path, engine='openpyxl')
        master_writer.book = master_book
        master_writer.sheets = dict((ws.title, ws) for ws in master_book.worksheets)
        current_frames = pandas.ExcelFile(current_file_path).parse(pandas.ExcelFile(current_file_path).sheet_names[0],
                                                               header=None,
                                                               index_col=None)
        current_frames.to_excel(master_writer, sheet_name, index=None, header=False)

        master_writer.save()
    except Exception as e:
        raise e

This works perfectly fine only thing is that formatting of the master file(file to which we add new sheet) is lost.


回答 8

writer = pd.ExcelWriter('prueba1.xlsx'engine='openpyxl',keep_date_col=True)

“ keep_date_col”希望对您有所帮助

writer = pd.ExcelWriter('prueba1.xlsx'engine='openpyxl',keep_date_col=True)

The “keep_date_col” hope help you


回答 9

book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()
book = load_workbook(xlsFilename)
writer = pd.ExcelWriter(self.xlsFilename)
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, sheet_name=sheetName, index=False)
writer.save()

大熊猫可以使用列作为索引吗?

问题:大熊猫可以使用列作为索引吗?

我有一个像这样的电子表格:

Locality    2005    2006    2007    2008    2009

ABBOTSFORD  427000  448000  602500  600000  638500
ABERFELDIE  534000  600000  735000  710000  775000
AIREYS INLET459000  440000  430000  517500  512500

我不想手动将列与行交换。是否可以使用熊猫将数据读取到列表中,如下所示:

data['ABBOTSFORD']=[427000,448000,602500,600000,638500]
data['ABERFELDIE']=[534000,600000,735000,710000,775000]
data['AIREYS INLET']=[459000,440000,430000,517500,512500]

I have a spreadsheet like this:

Locality    2005    2006    2007    2008    2009

ABBOTSFORD  427000  448000  602500  600000  638500
ABERFELDIE  534000  600000  735000  710000  775000
AIREYS INLET459000  440000  430000  517500  512500

I don’t want to manually swap the column with the row. Could it be possible to use pandas reading data to a list as this:

data['ABBOTSFORD']=[427000,448000,602500,600000,638500]
data['ABERFELDIE']=[534000,600000,735000,710000,775000]
data['AIREYS INLET']=[459000,440000,430000,517500,512500]

回答 0

是的,使用set_index可以创建Locality行索引。

data.set_index('Locality', inplace=True)

如果inplace=True未提供,则set_index返回修改后的数据帧。

例:

> import pandas as pd
> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                     ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> df
     Locality    2005    2006
0  ABBOTSFORD  427000  448000
1  ABERFELDIE  534000  600000

> df.set_index('Locality', inplace=True)
> df
              2005    2006
Locality                  
ABBOTSFORD  427000  448000
ABERFELDIE  534000  600000

> df.loc['ABBOTSFORD']
2005    427000
2006    448000
Name: ABBOTSFORD, dtype: int64

> df.loc['ABBOTSFORD'][2005]
427000

> df.loc['ABBOTSFORD'].values
array([427000, 448000])

> df.loc['ABBOTSFORD'].tolist()
[427000, 448000]

Yes, with set_index you can make Locality your row index.

data.set_index('Locality', inplace=True)

If inplace=True is not provided, set_index returns the modified dataframe as a result.

Example:

> import pandas as pd
> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                     ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> df
     Locality    2005    2006
0  ABBOTSFORD  427000  448000
1  ABERFELDIE  534000  600000

> df.set_index('Locality', inplace=True)
> df
              2005    2006
Locality                  
ABBOTSFORD  427000  448000
ABERFELDIE  534000  600000

> df.loc['ABBOTSFORD']
2005    427000
2006    448000
Name: ABBOTSFORD, dtype: int64

> df.loc['ABBOTSFORD'][2005]
427000

> df.loc['ABBOTSFORD'].values
array([427000, 448000])

> df.loc['ABBOTSFORD'].tolist()
[427000, 448000]

回答 1

您可以使用进行更改,如已经说明的那样set_index。您无需手动将行与列交换data.T,pandas中有一个transpose()方法可以为您完成此操作:

> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                    ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> newdf = df.set_index('Locality').T
> newdf

Locality    ABBOTSFORD  ABERFELDIE
2005        427000      534000
2006        448000      600000

然后您可以获取数据框列值并将其转换为列表:

> newdf['ABBOTSFORD'].values.tolist()

[427000, 448000]

You can change the index as explained already using set_index. You don’t need to manually swap rows with columns, there is a transpose (data.T) method in pandas that does it for you:

> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                    ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> newdf = df.set_index('Locality').T
> newdf

Locality    ABBOTSFORD  ABERFELDIE
2005        427000      534000
2006        448000      600000

then you can fetch the dataframe column values and transform them to a list:

> newdf['ABBOTSFORD'].values.tolist()

[427000, 448000]

回答 2

您可以在从Pandas中的电子表格读取数据时使用可用的index_col参数设置列索引。

这是我的解决方案:

  1. 首先,将熊猫作为pd导入: import pandas as pd

  2. 使用pd.read_excel()读入文件名(如果电子表格中有数据),并通过指定index_col参数将索引设置为“ Locality”。

    df = pd.read_excel('testexcel.xlsx', index_col=0)

    在此阶段,如果出现“没有名为xlrd的模块”错误,请使用进行安装pip install xlrd

  3. 为了进行视觉检查,请读取数据框,使用df.head()该数据框将打印以下输出

  4. 现在,您可以获取数据框所需列的值并进行打印

You can set the column index using index_col parameter available while reading from spreadsheet in Pandas.

Here is my solution:

  1. Firstly, import pandas as pd: import pandas as pd

  2. Read in filename using pd.read_excel() (if you have your data in a spreadsheet) and set the index to ‘Locality’ by specifying the index_col parameter.

    df = pd.read_excel('testexcel.xlsx', index_col=0)

    At this stage if you get a ‘no module named xlrd’ error, install it using pip install xlrd.

  3. For visual inspection, read the dataframe using df.head() which will print the following output

  4. Now you can fetch the values of the desired columns of the dataframe and print it


有没有一种方法可以使用pandas.ExcelWriter自动调整Excel列的宽度?

问题:有没有一种方法可以使用pandas.ExcelWriter自动调整Excel列的宽度?

我被要求生成一些Excel报告。我目前正在大量使用pandas作为数据,所以自然地我想使用pandas.ExcelWriter方法生成这些报告。但是,固定的列宽是一个问题。

到目前为止,我的代码很简单。假设我有一个名为“ df”的数据框:

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

我正在查看pandas代码,但实际上没有看到任何设置列宽的选项。宇宙中是否有技巧可以使列自动调整为数据?还是在事实之后我可以对xlsx文件做一些事情来调整列宽?

(我正在使用OpenPyXL库,并生成.xlsx文件-如果有区别的话。)

谢谢。

I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However the fixed column widths are a problem.

The code I have so far is simple enough. Say I have a dataframe called ‘df’:

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

I was looking over the pandas code, and I don’t really see any options to set column widths. Is there a trick out there in the universe to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?

(I am using the OpenPyXL library, and generating .xlsx files – if that makes any difference.)

Thank you.


回答 0

user6178746的回答启发,我有以下内容:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.save()

Inspired by user6178746’s answer, I have the following:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx, idx, max_len)  # set column width
writer.save()

回答 1

我发布此消息是因为我遇到了同样的问题,发现Xlsxwriter和pandas的官方文档仍然将此功能列为不受支持。我找到了解决我遇到的问题的解决方案。我基本上只是遍历每列,并使用worksheet.set_column设置列宽==该列内容的最大长度。

但是,重要的一点。此解决方案不适合列标题,仅适合列值。这应该是一个容易的更改,但是,如果您需要改头,则可以。希望这可以帮助某人:)

import pandas as pd
import sqlalchemy as sa
import urllib


read_server = 'serverName'
read_database = 'databaseName'

read_params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.save()

I’m posting this because I just ran into the same issue and found that the official documentation for Xlsxwriter and pandas still have this functionality listed as unsupported. I hacked together a solution that solved the issue i was having. I basically just iterate through each column and use worksheet.set_column to set the column width == the max length of the contents of that column.

One important note, however. This solution does not fit the column headers, simply the column values. That should be an easy change though if you need to fit the headers instead. Hope this helps someone :)

import pandas as pd
import sqlalchemy as sa
import urllib


read_server = 'serverName'
read_database = 'databaseName'

read_params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.save()

回答 2

现在可能尚无自动方法,但是当您使用openpyxl时,以下几行(由Bufke用户改写了有关手动操作的另一答案)允许您指定合理的值(以字符宽度表示):

writer.sheets['Summary'].column_dimensions['A'].width = 15

There is probably no automatic way to do it right now, but as you use openpyxl, the following line (adapted from another answer by user Bufke on how to do in manually) allows you to specify a sane value (in character widths):

writer.sheets['Summary'].column_dimensions['A'].width = 15

回答 3

我最近开始使用一个不错的程序包,称为StyleFrame。

它获得了DataFrame并允许您非常轻松地对其进行样式设置…

默认情况下,列宽是自动调整的。

例如:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

您还可以更改列宽:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)


更新

在1.4版中,best_fit参数已添加到中StyleFrame.to_excel。请参阅文档

There is a nice package that I started to use recently called StyleFrame.

it gets DataFrame and lets you to style it very easily…

by default the columns width is auto-adjusting.

for example:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

you can also change the columns width:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)

UPDATE 1

In version 1.4 best_fit argument was added to StyleFrame.to_excel. See the documentation.

UPDATE 2

Here’s a sample of code that works for StyleFrame 3.x.x

from styleframe import StyleFrame
import pandas as pd

columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer, 
    best_fit=columns,
    columns_and_rows_to_freeze='B2', 
    row_to_add_filters=0,
)
excel_writer.save()

回答 4

通过使用pandas和xlsxwriter,您可以完成任务,下面的代码将在Python 3.x中完美地工作。有关使用XlsxWriter和熊猫的更多详细信息,此链接可能会有用:https: //xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.save()

By using pandas and xlsxwriter you can do your task, below code will perfectly work in Python 3.x. For more details on working with XlsxWriter with pandas this link might be useful https://xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.save()

回答 5

我发现基于列标题而不是列内容来调整列更有用。

使用 df.columns.values.tolist() I生成列标题的列表,并使用这些标题的长度来确定列的宽度。

请参阅下面的完整代码:

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet

header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)

writer.save() # Save the excel file

I found that it was more useful to adjust the column with based on the column header rather than column content.

Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the width of the columns.

See full code below:

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet

header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)

writer.save() # Save the excel file

回答 6

在工作中,我总是将数据帧写入excel文件。因此,我没有反复编写相同的代码,而是创建了一个模数。现在,我只是将其导入并使用它来编写和设置excel文件。但是有一个缺点,如果数据帧过大,则需要花费很长时间。所以这是代码:

def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                    datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format.
        format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # define a max width to not get to wide column
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.save()
    writerReport.close()
    return output_dir + output_name

At work, I am always writing the dataframes to excel files. So instead of writing the same code over and over, I have created a modulus. Now I just import it and use it to write and formate the excel files. There is one downside though, it takes a long time if the dataframe is extra large. So here is the code:

def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                    datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for i, dataframe in enumerate(dataframes_list):
        sheet_name = sheet_names_list[i]  # choose the sheet name from sheet_names_list
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Add a header format.
        format = workbook.add_format({
            'bold': True,
            'border': 1,
            'fg_color': '#0000FF',
            'font_color': 'white'})
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # define a max width to not get to wide column
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.save()
    return output_dir + output_name


回答 7

动态调整所有列长

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

使用列名手动调整列

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

使用列索引手动调整列

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

如果以上任何一个失败

AttributeError: 'Worksheet' object has no attribute 'set_column'

确保安装xlsxwriter

pip install xlsxwriter

Dynamically adjust all the column lengths

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

Manually adjust a column using Column Name

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

Manually adjust a column using Column Index

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

In case any of the above is failing with

AttributeError: 'Worksheet' object has no attribute 'set_column'

make sure to install xlsxwriter:

pip install xlsxwriter

回答 8

结合其他答案和评论,还支持多索引:

def autosize_excel_columns(worksheet, df):
  autosize_excel_columns_df(worksheet, df.index.to_frame())
  autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
  for idx, col in enumerate(df):
    series = df[col]
    max_len = max((
      series.astype(str).map(len).max(),
      len(str(series.name))
    )) + 1
    worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.save()

Combining the other answers and comments and also supporting multi-indices:

def autosize_excel_columns(worksheet, df):
  autosize_excel_columns_df(worksheet, df.index.to_frame())
  autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
  for idx, col in enumerate(df):
    series = df[col]
    max_len = max((
      series.astype(str).map(len).max(),
      len(str(series.name))
    )) + 1
    worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.save()

回答 9

import re
import openpyxl
..
for col in _ws.columns:
    max_lenght = 0
    print(col[0])
    col_name = re.findall('\w\d', str(col[0]))
    col_name = col_name[0]
    col_name = re.findall('\w', str(col_name))[0]
    print(col_name)
    for cell in col:
        try:
            if len(str(cell.value)) > max_lenght:
                max_lenght = len(cell.value)
        except:
            pass
    adjusted_width = (max_lenght+2)
    _ws.column_dimensions[col_name].width = adjusted_width
import re
import openpyxl
..
for col in _ws.columns:
    max_lenght = 0
    print(col[0])
    col_name = re.findall('\w\d', str(col[0]))
    col_name = col_name[0]
    col_name = re.findall('\w', str(col_name))[0]
    print(col_name)
    for cell in col:
        try:
            if len(str(cell.value)) > max_lenght:
                max_lenght = len(cell.value)
        except:
            pass
    adjusted_width = (max_lenght+2)
    _ws.column_dimensions[col_name].width = adjusted_width

回答 10

最简单的解决方案是在set_column方法中指定列宽。

    for worksheet in writer.sheets.values():
        worksheet.set_column(0,last_column_value, required_width_constant)

Easiest solution is to specify width of column in set_column method.

    for worksheet in writer.sheets.values():
        worksheet.set_column(0,last_column_value, required_width_constant)

回答 11

def auto_width_columns(df, sheetname):
    workbook = writer.book  
    worksheet= writer.sheets[sheetname] 

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)
def auto_width_columns(df, sheetname):
    workbook = writer.book  
    worksheet= writer.sheets[sheetname] 

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(col) + 2)
        worksheet.set_column(i, i, column_len)

使用Pandas对同一工作簿的多个工作表进行pd.read_excel()

问题:使用Pandas对同一工作簿的多个工作表进行pd.read_excel()

我有一个较大的电子表格文件(.xlsx),正在使用python pandas处理。碰巧我需要该大文件中两个选项卡中的数据。选项卡中的一个包含大量数据,另一个仅包含几个正方形单元格。

当我在任何工作表上使用pd.read_excel()时,在我看来整个文件都已加载(而不仅仅是我感兴趣的工作表)。因此,当我两次使用该方法(每张纸一次)时,我实际上不得不使整个工作簿被读两次(即使我们仅使用指定的工作表)。

我使用的是错误的还是仅限于这种方式?

谢谢!

I have a large spreadsheet file (.xlsx) that I’m processing using python pandas. It happens that I need data from two tabs in that large file. One of the tabs has a ton of data and the other is just a few square cells.

When I use pd.read_excel() on any worksheet, it looks to me like the whole file is loaded (not just the worksheet I’m interested in). So when I use the method twice (once for each sheet), I effectively have to suffer the whole workbook being read in twice (even though we’re only using the specified sheet).

Am I using it wrong or is it just limited in this way?

Thank you!


回答 0

尝试pd.ExcelFile

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

正如@HaPsantran指出的那样,在ExcelFile()调用过程中将读取整个Excel文件(似乎没有办法解决此问题)。这仅使您不必每次访问新表时都必须读取相同的文件。

请注意,sheet_name参数to pd.read_excel()可以是工作表的名称(如上所述),指定工作表编号的整数(例如0、1等),工作表名称或索引的列表或None。如果提供了列表,它将返回一个字典,其中的键是工作表名称/索引,值是数据框。默认设置是仅返回第一张纸(即sheet_name=0)。

如果None指定,则将所有表作为{sheet_name:dataframe}字典返回。

Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn’t appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer specifying the sheet number (eg 0, 1, etc), a list of sheet names or indices, or None. If a list is provided, it returns a dictionary where the keys are the sheet names/indices and the values are the data frames. The default is to simply return the first sheet (ie, sheet_name=0).

If None is specified, all sheets are returned, as a {sheet_name:dataframe} dictionary.


回答 1

有3个选项:

将所有表直接阅读到有序词典中。

import pandas as pd

# for pandas version >= 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheet_name=None)

# for pandas version < 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheetname=None)

感谢@ihightower指出它,并@toto_tico指出版本问题。

将第一张表直接读入数据框

df = pd.read_excel('excel_file_path.xls')
# this will read the first sheet into df

阅读excel文件并获取工作表列表。然后选择并加载纸张。

xls = pd.ExcelFile('excel_file_path.xls')

# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]

# to read just one sheet to dataframe:
df = pd.read_excel(file_name, sheetname="house")

阅读所有工作表并将其存储在字典中。与第一个相同,但更明确。

# to read all sheets to a map
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)

更新:感谢@toto_tico指出版本问题。

sheetname:字符串,整数,字符串/整数的混合列表,或无,默认值0,自0.21.0版开始不推荐使用:使用sheet_name代替源链接

There are a few options:

Read all sheets directly into an ordered dictionary.

import pandas as pd

# for pandas version >= 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheet_name=None)

# for pandas version < 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheetname=None)

Read the first sheet directly into dataframe

df = pd.read_excel('excel_file_path.xls')
# this will read the first sheet into df

Read the excel file and get a list of sheets. Then chose and load the sheets.

xls = pd.ExcelFile('excel_file_path.xls')

# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]

# to read just one sheet to dataframe:
df = pd.read_excel(file_name, sheetname="house")

Read all sheets and store it in a dictionary. Same as first but more explicit.

# to read all sheets to a map
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)
    # you can also use sheet_index [0,1,2..] instead of sheet name.

Thanks @ihightower for pointing it out way to read all sheets and @toto_tico for pointing out the version issue.

sheetname : string, int, mixed list of strings/ints, or None, default 0 Deprecated since version 0.21.0: Use sheet_name instead Source Link


回答 2

您还可以为工作表使用索引:

xls = pd.ExcelFile('path_to_file.xls')
sheet1 = xls.parse(0)

将给出第一个工作表。对于第二个工作表:

sheet2 = xls.parse(1)

You can also use the index for the sheet:

xls = pd.ExcelFile('path_to_file.xls')
sheet1 = xls.parse(0)

will give the first worksheet. for the second worksheet:

sheet2 = xls.parse(1)

回答 3

您还可以将工作表名称指定为参数:

data_file = pd.read_excel('path_to_file.xls', sheet_name="sheet_name")

将仅上传工作表"sheet_name"

You could also specify the sheet name as a parameter:

data_file = pd.read_excel('path_to_file.xls', sheet_name="sheet_name")

will upload only the sheet "sheet_name".


回答 4

pd.read_excel('filename.xlsx') 

默认情况下,请阅读工作簿的第一张表。

pd.read_excel('filename.xlsx', sheet_name = 'sheetname') 

阅读工作簿的特定表并

pd.read_excel('filename.xlsx', sheet_name = None) 

从Excel到pandas数据框读取所有工作表,因为OrderedDict的类型表示嵌套数据框,所有工作表都作为在数据框内部收集的数据框,其类型为OrderedDict。

pd.read_excel('filename.xlsx') 

by default read the first sheet of workbook.

pd.read_excel('filename.xlsx', sheet_name = 'sheetname') 

read the specific sheet of workbook and

pd.read_excel('filename.xlsx', sheet_name = None) 

read all the worksheets from excel to pandas dataframe as a type of OrderedDict means nested dataframes, all the worksheets as dataframes collected inside dataframe and it’s type is OrderedDict.


回答 5

是的,不幸的是,它将始终加载整个文件。如果您重复执行此操作,则最好将工作表提取为单独的CSV,然后分别加载。您可以使用d6tstack自动化该过程,它还添加了其他功能,例如检查所有工作表或多个Excel文件中的所有列是否相等。

import d6tstack
c = d6tstack.convert_xls.XLStoCSVMultiSheet('multisheet.xlsx')
c.convert_all() # ['multisheet-Sheet1.csv','multisheet-Sheet2.csv']

请参阅d6tstack Excel示例

Yes unfortunately it will always load the full file. If you’re doing this repeatedly probably best to extract the sheets to separate CSVs and then load separately. You can automate that process with d6tstack which also adds additional features like checking if all the columns are equal across all sheets or multiple Excel files.

import d6tstack
c = d6tstack.convert_xls.XLStoCSVMultiSheet('multisheet.xlsx')
c.convert_all() # ['multisheet-Sheet1.csv','multisheet-Sheet2.csv']

See d6tstack Excel examples


回答 6

如果您已将excel文件与python程序(相对地址)保存在同一文件夹中,则只需提及工作表编号以及文件名。语法= pd.read_excel(Filename,SheetNo)示例:

    data=pd.read_excel("wt_vs_ht.xlsx","Sheet2")
    print(data)
    x=data.Height
    y=data.Weight
    plt.plot(x,y,'x')
    plt.show()

If you have saved the excel file in the same folder as your python program (relative paths) then you just need to mention sheet number along with file name.

Example:

 data = pd.read_excel("wt_vs_ht.xlsx", "Sheet2")
 print(data)
 x = data.Height
 y = data.Weight
 plt.plot(x,y,'x')
 plt.show()

写入Excel电子表格

问题:写入Excel电子表格

我是Python的新手。我需要将程序中的一些数据写入电子表格。我在网上搜索过,似乎有很多可用的软件包(xlwt,XlsXcessive,openpyxl)。其他人则建议写入.csv文件(从不使用CSV,也不真正了解它是什么)。

该程序非常简单。我有两个列表(浮点数)和三个变量(字符串)。我不知道两个列表的长度,它们的长度可能不一样。

我希望布局如下图所示:

粉色列将具有第一个列表的值,绿色列将具有第二个列表的值。

那么最好的方法是什么?

PS我正在运行Windows 7,但运行该程序的计算机上不一定安装了Office。

import xlwt

x=1
y=2
z=3

list1=[2.34,4.346,4.234]

book = xlwt.Workbook(encoding="utf-8")

sheet1 = book.add_sheet("Sheet 1")

sheet1.write(0, 0, "Display")
sheet1.write(1, 0, "Dominance")
sheet1.write(2, 0, "Test")

sheet1.write(0, 1, x)
sheet1.write(1, 1, y)
sheet1.write(2, 1, z)

sheet1.write(4, 0, "Stimulus Time")
sheet1.write(4, 1, "Reaction Time")

i=4

for n in list1:
    i = i+1
    sheet1.write(i, 0, n)



book.save("trial.xls")

我是根据您的所有建议写的。它可以完成工作,但可以稍作改进。

如何将在for循环中创建的单元格(list1值)格式化为科学或数字格式?

我不想截断这些值。程序中使用的实际值在小数点后大约有10位数字。

I am new to Python. I need to write some data from my program to a spreadsheet. I’ve searched online and there seem to be many packages available (xlwt, XlsXcessive, openpyxl). Others suggest to write to a .csv file (never used CSV and don’t really understand what it is).

The program is very simple. I have two lists (float) and three variables (strings). I don’t know the lengths of the two lists and they probably won’t be the same length.

I want the layout to be as in the picture below:

The pink column will have the values of the first list and the green column will have the values of the second list.

So what’s the best way to do this?

P.S. I am running Windows 7 but I won’t necessarily have Office installed on the computers running this program.

import xlwt

x=1
y=2
z=3

list1=[2.34,4.346,4.234]

book = xlwt.Workbook(encoding="utf-8")

sheet1 = book.add_sheet("Sheet 1")

sheet1.write(0, 0, "Display")
sheet1.write(1, 0, "Dominance")
sheet1.write(2, 0, "Test")

sheet1.write(0, 1, x)
sheet1.write(1, 1, y)
sheet1.write(2, 1, z)

sheet1.write(4, 0, "Stimulus Time")
sheet1.write(4, 1, "Reaction Time")

i=4

for n in list1:
    i = i+1
    sheet1.write(i, 0, n)



book.save("trial.xls")

I wrote this using all your suggestions. It gets the job done but it can be slightly improved.

How do I format the cells created in the for loop (list1 values) as scientific or number?

I do not want to truncate the values. The actual values used in the program would have around 10 digits after the decimal.


回答 0

import xlwt

def output(filename, sheet, list1, list2, x, y, z):
    book = xlwt.Workbook()
    sh = book.add_sheet(sheet)

    variables = [x, y, z]
    x_desc = 'Display'
    y_desc = 'Dominance'
    z_desc = 'Test'
    desc = [x_desc, y_desc, z_desc]

    col1_name = 'Stimulus Time'
    col2_name = 'Reaction Time'

    #You may need to group the variables together
    #for n, (v_desc, v) in enumerate(zip(desc, variables)):
    for n, v_desc, v in enumerate(zip(desc, variables)):
        sh.write(n, 0, v_desc)
        sh.write(n, 1, v)

    n+=1

    sh.write(n, 0, col1_name)
    sh.write(n, 1, col2_name)

    for m, e1 in enumerate(list1, n+1):
        sh.write(m, 0, e1)

    for m, e2 in enumerate(list2, n+1):
        sh.write(m, 1, e2)

    book.save(filename)

有关更多说明:https : //github.com/python-excel

import xlwt

def output(filename, sheet, list1, list2, x, y, z):
    book = xlwt.Workbook()
    sh = book.add_sheet(sheet)

    variables = [x, y, z]
    x_desc = 'Display'
    y_desc = 'Dominance'
    z_desc = 'Test'
    desc = [x_desc, y_desc, z_desc]

    col1_name = 'Stimulus Time'
    col2_name = 'Reaction Time'

    #You may need to group the variables together
    #for n, (v_desc, v) in enumerate(zip(desc, variables)):
    for n, v_desc, v in enumerate(zip(desc, variables)):
        sh.write(n, 0, v_desc)
        sh.write(n, 1, v)

    n+=1

    sh.write(n, 0, col1_name)
    sh.write(n, 1, col2_name)

    for m, e1 in enumerate(list1, n+1):
        sh.write(m, 0, e1)

    for m, e2 in enumerate(list2, n+1):
        sh.write(m, 1, e2)

    book.save(filename)

for more explanation: https://github.com/python-excel


回答 1

使用DataFrame.to_excel大熊猫。Pandas允许您以功能丰富的数据结构表示数据,并且还可以读取 excel文件。

您首先必须将数据转换为DataFrame,然后将其保存到excel文件中,如下所示:

In [1]: from pandas import DataFrame
In [2]: l1 = [1,2,3,4]
In [3]: l2 = [1,2,3,4]
In [3]: df = DataFrame({'Stimulus Time': l1, 'Reaction Time': l2})
In [4]: df
Out[4]: 
   Reaction Time  Stimulus Time
0              1              1
1              2              2
2              3              3
3              4              4

In [5]: df.to_excel('test.xlsx', sheet_name='sheet1', index=False)

和出来的Excel文件看起来像这样:

请注意,两个列表的长度必须相等,否则熊猫会抱怨。要解决此问题,请将所有缺失的值替换为None

Use DataFrame.to_excel from pandas. Pandas allows you to represent your data in functionally rich datastructures and will let you read in excel files as well.

You will first have to convert your data into a DataFrame and then save it into an excel file like so:

In [1]: from pandas import DataFrame
In [2]: l1 = [1,2,3,4]
In [3]: l2 = [1,2,3,4]
In [3]: df = DataFrame({'Stimulus Time': l1, 'Reaction Time': l2})
In [4]: df
Out[4]: 
   Reaction Time  Stimulus Time
0              1              1
1              2              2
2              3              3
3              4              4

In [5]: df.to_excel('test.xlsx', sheet_name='sheet1', index=False)

and the excel file that comes out looks like this:

Note that both lists need to be of equal length else pandas will complain. To solve this, replace all missing values with None.


回答 2

  • xlrd / xlwt(标准):Python在其标准库中没有此功能,但我认为xlrd / xlwt是读取和写入excel文件的“标准”方式。制作工作簿,添加工作表,编写数据/公式以及格式化单元格非常容易。如果您需要所有这些东西,那么使用此库可能会获得最大的成功。我认为您可以选择openpyxl代替,它会非常相似,但是我没有使用过。

    要使用xlwt设置单元格格式,请定义a XFStyle并在写入工作表时包括样式。这是具有许多数字格式的示例。请参见下面的示例代码。

  • Tablib(功能强大,直观):Tablib是用于处理表格数据的功能更强大但更直观的库。它可以编写具有多个工作表以及其他格式(例如csv,json和yaml)的excel工作簿。如果不需要格式化的单元格(例如背景色),则可以使用该库,这将使您从长远角度上受益匪浅。

  • csv(简单):计算机上的文件是文本二进制文件。文本文件只是字符,包括特殊字符(例如换行符和选项卡),并且可以在任何地方轻松打开(例如记事本,Web浏览器或Office产品)。csv文件是一种以某种方式设置格式的文本文件:每一行都是一个值列表,以逗号分隔。Python程序可以轻松读取和写入文本,因此,csv文件是将数据从python程序导出到excel(或其他python程序)的最简单,最快的方法。

    Excel文件是二进制文件,需要知道文件格式的特殊库,这就是为什么您需要用于python的附加库或诸如Microsoft Excel,Gnumeric或LibreOffice之类的特殊程序来读取/写入它们的原因。


import xlwt

style = xlwt.XFStyle()
style.num_format_str = '0.00E+00'

...

for i,n in enumerate(list1):
    sheet1.write(i, 0, n, fmt)
  • xlrd/xlwt (standard): Python does not have this functionality in it’s standard library, but I think of xlrd/xlwt as the “standard” way to read and write excel files. It is fairly easy to make a workbook, add sheets, write data/formulas, and format cells. If you need all of these things, you may have the most success with this library. I think you could choose openpyxl instead and it would be quite similar, but I have not used it.

    To format cells with xlwt, define a XFStyle and include the style when you write to a sheet. Here is an example with many number formats. See example code below.

  • Tablib (powerful, intuitive): Tablib is a more powerful yet intuitive library for working with tabular data. It can write excel workbooks with multiple sheets as well as other formats, such as csv, json, and yaml. If you don’t need formatted cells (like background color), you will do yourself a favor to use this library, which will get you farther in the long run.

  • csv (easy): Files on your computer are either text or binary. Text files are just characters, including special ones like newlines and tabs, and can be easily opened anywhere (e.g. notepad, your web browser, or Office products). A csv file is a text file that is formatted in a certain way: each line is a list of values, separated by commas. Python programs can easily read and write text, so a csv file is the easiest and fastest way to export data from your python program into excel (or another python program).

    Excel files are binary and require special libraries that know the file format, which is why you need an additional library for python, or a special program like Microsoft Excel, Gnumeric, or LibreOffice, to read/write them.


import xlwt

style = xlwt.XFStyle()
style.num_format_str = '0.00E+00'

...

for i,n in enumerate(list1):
    sheet1.write(i, 0, n, fmt)

回答 3

我调查了一些用于Python的Excel模块,发现openpyxl是最好的。

免费书籍《用Python自动化无聊的东西》中有关于openpyxl的一章,其中有更多详细信息,或者您可以查看Read Docs网站。您无需安装Office或Excel即可使用openpyxl。

您的程序如下所示:

import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

stimulusTimes = [1, 2, 3]
reactionTimes = [2.3, 5.1, 7.0]

for i in range(len(stimulusTimes)):
    sheet['A' + str(i + 6)].value = stimulusTimes[i]
    sheet['B' + str(i + 6)].value = reactionTimes[i]

wb.save('example.xlsx')

I surveyed a few Excel modules for Python, and found openpyxl to be the best.

The free book Automate the Boring Stuff with Python has a chapter on openpyxl with more details or you can check the Read the Docs site. You won’t need Office or Excel installed in order to use openpyxl.

Your program would look something like this:

import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

stimulusTimes = [1, 2, 3]
reactionTimes = [2.3, 5.1, 7.0]

for i in range(len(stimulusTimes)):
    sheet['A' + str(i + 6)].value = stimulusTimes[i]
    sheet['B' + str(i + 6)].value = reactionTimes[i]

wb.save('example.xlsx')

回答 4

CSV代表逗号分隔的值。CSV就像一个文本文件,可以简单地通过添加.CSV扩展名来创建

例如,编写以下代码:

f = open('example.csv','w')
f.write("display,variable x")
f.close()

您可以使用excel打开此文件。

CSV stands for comma separated values. CSV is like a text file and can be created simply by adding the .CSV extension

for example write this code:

f = open('example.csv','w')
f.write("display,variable x")
f.close()

you can open this file with excel.


回答 5

import xlsxwriter


# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()
import xlsxwriter


# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()

回答 6

也尝试看看以下库:

xlwings-用于将数据从Python进出电子表格以及处理工作簿和图表

ExcelPython-一个Excel加载项,用于用Python代替VBA编写用户定义的函数(UDF)和宏

Try taking a look at the following libraries too:

xlwings – for getting data into and out of a spreadsheet from Python, as well as manipulating workbooks and charts

ExcelPython – an Excel add-in for writing user-defined functions (UDFs) and macros in Python instead of VBA


回答 7

OpenPyxl 是一个很好的库,用于读取/写入Excel 2010 xlsx / xlsm文件:

https://openpyxl.readthedocs.io/en/stable

另一个参考答案是使用折旧函数(get_sheet_by_name)。如果没有它,这是怎么做的:

import openpyxl

wbkName = 'New.xlsx'        #The file should be created before running the code.
wbk = openpyxl.load_workbook(wbkName)
wks = wbk['test1']
someValue = 1337
wks.cell(row=10, column=1).value = someValue
wbk.save(wbkName)
wbk.close

OpenPyxl is quite a nice library, built to read/write Excel 2010 xlsx/xlsm files:

https://openpyxl.readthedocs.io/en/stable

The other answer, referring to it is using the deperciated function (get_sheet_by_name). This is how to do it without it:

import openpyxl

wbkName = 'New.xlsx'        #The file should be created before running the code.
wbk = openpyxl.load_workbook(wbkName)
wks = wbk['test1']
someValue = 1337
wks.cell(row=10, column=1).value = someValue
wbk.save(wbkName)
wbk.close

回答 8

xlsxwriter库非常适合创建.xlsx文件。以下代码段.xlsx从字典列表中生成文件,同时说明顺序显示的名称

from xlsxwriter import Workbook


def create_xlsx_file(file_path: str, headers: dict, items: list):
    with Workbook(file_path) as workbook:
        worksheet = workbook.add_worksheet()
        worksheet.write_row(row=0, col=0, data=headers.values())
        header_keys = list(headers.keys())
        for index, item in enumerate(items):
            row = map(lambda field_id: item.get(field_id, ''), header_keys)
            worksheet.write_row(row=index + 1, col=0, data=row)


headers = {
    'id': 'User Id',
    'name': 'Full Name',
    'rating': 'Rating',
}

items = [
    {'id': 1, 'name': "Ilir Meta", 'rating': 0.06},
    {'id': 2, 'name': "Abdelmadjid Tebboune", 'rating': 4.0},
    {'id': 3, 'name': "Alexander Lukashenko", 'rating': 3.1},
    {'id': 4, 'name': "Miguel Díaz-Canel", 'rating': 0.32}
]

create_xlsx_file("my-xlsx-file.xlsx", headers, items)


💡注1-我故意不回答OP提出的确切情况。取而代之的是,我提出了一种大多数访客都希望使用的更为通用的解决方案。这个问题的标题在搜索引擎中有很好的索引,并跟踪大量流量

💡注2-如果您使用Python3.6或更高版本,请考虑OrderedDict在中使用headers。在Python3.6之前,dict未保留顺序。


The xlsxwriter library is great for creating .xlsx files. The following snippet generates an .xlsx file from a list of dicts while stating the order and the displayed names:

from xlsxwriter import Workbook


def create_xlsx_file(file_path: str, headers: dict, items: list):
    with Workbook(file_path) as workbook:
        worksheet = workbook.add_worksheet()
        worksheet.write_row(row=0, col=0, data=headers.values())
        header_keys = list(headers.keys())
        for index, item in enumerate(items):
            row = map(lambda field_id: item.get(field_id, ''), header_keys)
            worksheet.write_row(row=index + 1, col=0, data=row)


headers = {
    'id': 'User Id',
    'name': 'Full Name',
    'rating': 'Rating',
}

items = [
    {'id': 1, 'name': "Ilir Meta", 'rating': 0.06},
    {'id': 2, 'name': "Abdelmadjid Tebboune", 'rating': 4.0},
    {'id': 3, 'name': "Alexander Lukashenko", 'rating': 3.1},
    {'id': 4, 'name': "Miguel Díaz-Canel", 'rating': 0.32}
]

create_xlsx_file("my-xlsx-file.xlsx", headers, items)


💡 Note 1 – I’m purposely not answering to the exact case the OP presented. Instead, I’m presenting a more generic solution IMHO most visitors seek. This question’s title is well-indexed in search engines and tracks lots of traffic

💡 Note 2 – If you’re not using Python3.6 or newer, consider using OrderedDict in headers. Before Python3.6 the order in dict was not preserved.



回答 9

导入精确数字的最简单方法是在l1和中的数字后面添加小数l2。Python将此小数点解释为您的指示,以包括确切的数字。如果需要将其限制在小数点后一位,则应该能够创建一个限制输出的打印命令,类似于以下内容:

print variable_example[:13]

假设您的数据在小数点后还有两个整数,则将其限制在小数点后十位。

The easiest way to import the exact numbers is to add a decimal after the numbers in your l1 and l2. Python interprets this decimal point as instructions from you to include the exact number. If you need to restrict it to some decimal place, you should be able to create a print command that limits the output, something simple like:

print variable_example[:13]

Would restrict it to the tenth decimal place, assuming your data has two integers left of the decimal.


回答 10

您可以尝试hfexcel基于对人友好的面向对象的Python库XlsxWriter

from hfexcel import HFExcel

hf_workbook = HFExcel.hf_workbook('example.xlsx', set_default_styles=False)

hf_workbook.add_style(
    "headline", 
    {
       "bold": 1,
        "font_size": 14,
        "font": "Arial",
        "align": "center"
    }
)

sheet1 = hf_workbook.add_sheet("sheet1", name="Example Sheet 1")

column1, _ = sheet1.add_column('headline', name='Column 1', width=2)
column1.add_row(data='Column 1 Row 1')
column1.add_row(data='Column 1 Row 2')

column2, _ = sheet1.add_column(name='Column 2')
column2.add_row(data='Column 2 Row 1')
column2.add_row(data='Column 2 Row 2')


column3, _ = sheet1.add_column(name='Column 3')
column3.add_row(data='Column 3 Row 1')
column3.add_row(data='Column 3 Row 2')

# In order to get a row with coordinates:
# sheet[column_index][row_index] => row
print(sheet1[1][1].data)
assert(sheet1[1][1].data == 'Column 2 Row 2')

hf_workbook.save()

You can try hfexcel Human Friendly object-oriented python library based on XlsxWriter:

from hfexcel import HFExcel

hf_workbook = HFExcel.hf_workbook('example.xlsx', set_default_styles=False)

hf_workbook.add_style(
    "headline", 
    {
       "bold": 1,
        "font_size": 14,
        "font": "Arial",
        "align": "center"
    }
)

sheet1 = hf_workbook.add_sheet("sheet1", name="Example Sheet 1")

column1, _ = sheet1.add_column('headline', name='Column 1', width=2)
column1.add_row(data='Column 1 Row 1')
column1.add_row(data='Column 1 Row 2')

column2, _ = sheet1.add_column(name='Column 2')
column2.add_row(data='Column 2 Row 1')
column2.add_row(data='Column 2 Row 2')


column3, _ = sheet1.add_column(name='Column 3')
column3.add_row(data='Column 3 Row 1')
column3.add_row(data='Column 3 Row 2')

# In order to get a row with coordinates:
# sheet[column_index][row_index] => row
print(sheet1[1][1].data)
assert(sheet1[1][1].data == 'Column 2 Row 2')

hf_workbook.save()

回答 11

如果您需要修改现有工作簿,则最安全的方法是使用pyoo。您需要安装一些库,并且要花一些时间才能完成,但是一旦安装完成,这将是防弹的,因为您可以利用LibreOffice / OpenOffice的广泛而可靠的API。

请参阅我的要点,了解如何设置linux系统以及如何使用pyoo进行一些基本编码。

这是代码示例:

#!/usr/local/bin/python3
import pyoo
# Connect to LibreOffice using a named pipe 
# (named in the soffice process startup)
desktop = pyoo.Desktop(pipe='oo_pyuno')
wkbk = desktop.open_spreadsheet("<xls_file_name>")
sheet = wkbk.sheets['Sheet1']
# Write value 'foo' to cell E5 on Sheet1
sheet[4,4].value='foo'
wkbk.save()
wkbk.close()

If your need is to modify an existing workbook, the safest way would be to use pyoo. You need to have some libraries installed and it takes a few hoops to jump through but once its set up, this would be bulletproof as you are leveraging the wide and solid API’s of LibreOffice / OpenOffice.

Please see my Gist on how to set up a linux system and do some basic coding using pyoo.

Here is an example of the code:

#!/usr/local/bin/python3
import pyoo
# Connect to LibreOffice using a named pipe 
# (named in the soffice process startup)
desktop = pyoo.Desktop(pipe='oo_pyuno')
wkbk = desktop.open_spreadsheet("<xls_file_name>")
sheet = wkbk.sheets['Sheet1']
# Write value 'foo' to cell E5 on Sheet1
sheet[4,4].value='foo'
wkbk.save()
wkbk.close()

熊猫:在Excel文件中查找工作表列表

问题:熊猫:在Excel文件中查找工作表列表

新版本的Pandas使用以下界面加载Excel文件:

read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

但是,如果我不知道可用的图纸怎么办?

例如,我正在使用以下工作表的excel文件

数据1,数据2 …,数据N,foo,bar

但我不知道N先验。

有什么方法可以从Pandas的excel文档中获取工作表列表吗?

The new version of Pandas uses the following interface to load Excel files:

read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

but what if I don’t know the sheets that are available?

For example, I am working with excel files that the following sheets

Data 1, Data 2 …, Data N, foo, bar

but I don’t know N a priori.

Is there any way to get the list of sheets from an excel document in Pandas?


回答 0

您仍然可以使用ExcelFile类(和sheet_names属性):

xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet to DataFrame

有关更多选项,请参阅文档以进行解析

You can still use the ExcelFile class (and the sheet_names attribute):

xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet to DataFrame

see docs for parse for more options…


回答 1

您应该将第二个参数(工作表名称)明确指定为“无”。像这样:

 df = pandas.read_excel("/yourPath/FileName.xlsx", None);

“ df”都是作为DataFrames字典的工作表,您可以通过运行以下命令进行验证:

df.keys()

结果是这样的:

[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']

请参阅pandas doc了解更多详细信息: https //pandas.pydata.org/pandas-docs/stable/generation/pandas.read_excel.html

You should explicitly specify the second parameter (sheetname) as None. like this:

 df = pandas.read_excel("/yourPath/FileName.xlsx", None);

“df” are all sheets as a dictionary of DataFrames, you can verify it by run this:

df.keys()

result like this:

[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']

please refer pandas doc for more details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html


回答 2

这是我发现最快的方法,灵感来自@divingTobi的答案。所有基于xlrd,openpyxl或pandas的答案对我来说都很慢,因为它们都首先加载整个文件。

from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

with ZipFile(file) as zipped_file:
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]

This is the fastest way I have found, inspired by @divingTobi’s answer. All The answers based on xlrd, openpyxl or pandas are slow for me, as they all load the whole file first.

from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

with ZipFile(file) as zipped_file:
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]


回答 3

以@dhwanil_shah的答案为基础,您不需要提取整个文件。有了zf.open它,可以直接从一个压缩文件中读取。

import xml.etree.ElementTree as ET
import zipfile

def xlsxSheets(f):
    zf = zipfile.ZipFile(f)

    f = zf.open(r'xl/workbook.xml')

    l = f.readline()
    l = f.readline()
    root = ET.fromstring(l)
    sheets=[]
    for c in root.findall('{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheets/*'):
        sheets.append(c.attrib['name'])
    return sheets

连续两个 readline s很难看,但内容仅在文本的第二行中。无需解析整个文件。

该解决方案似乎比该read_excel版本要快得多,而且很有可能比完整提取版本还快。

Building on @dhwanil_shah ‘s answer, you do not need to extract the whole file. With zf.open it is possible to read from a zipped file directly.

import xml.etree.ElementTree as ET
import zipfile

def xlsxSheets(f):
    zf = zipfile.ZipFile(f)

    f = zf.open(r'xl/workbook.xml')

    l = f.readline()
    l = f.readline()
    root = ET.fromstring(l)
    sheets=[]
    for c in root.findall('{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheets/*'):
        sheets.append(c.attrib['name'])
    return sheets

The two consecutive readlines are ugly, but the content is only in the second line of the text. No need to parse the whole file.

This solution seems to be much faster than the read_excel version, and most likely also faster than the full extract version.


回答 4

我已经尝试过xlrd,pandas,openpyxl和其他类似的库,并且随着读取整个文件时文件大小的增加,它们似乎都花费了指数时间。上面提到的其他使用’on_demand’的解决方案对我不起作用。如果只想最初获取工作表名称,则以下功能适用于xlsx文件。

def get_sheet_details(file_path):
    sheets = []
    file_name = os.path.splitext(os.path.split(file_path)[-1])[0]
    # Make a temporary directory with the file name
    directory_to_extract_to = os.path.join(settings.MEDIA_ROOT, file_name)
    os.mkdir(directory_to_extract_to)

    # Extract the xlsx file as it is just a zip file
    zip_ref = zipfile.ZipFile(file_path, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()

    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
        dictionary = xmltodict.parse(xml)
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheet_details = {
                'id': sheet['@sheetId'],
                'name': sheet['@name']
            }
            sheets.append(sheet_details)

    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets

由于所有xlsx基本上都是压缩文件,因此我们提取基本的xml数据并直接从工作簿中读取工作表名称,与库函数相比,此过程只需花费一秒钟的时间。

基准测试:(在具有4张纸的
6mb xlsx文件上)Pandas,xlrd: 12秒
openpyxl: 24秒
建议的方法: 0.4秒

由于我的要求只是读取工作表名称,因此读取整个时间不必要的开销困扰着我,所以我改用了这种方法。

I have tried xlrd, pandas, openpyxl and other such libraries and all of them seem to take exponential time as the file size increase as it reads the entire file. The other solutions mentioned above where they used ‘on_demand’ did not work for me. If you just want to get the sheet names initially, the following function works for xlsx files.

def get_sheet_details(file_path):
    sheets = []
    file_name = os.path.splitext(os.path.split(file_path)[-1])[0]
    # Make a temporary directory with the file name
    directory_to_extract_to = os.path.join(settings.MEDIA_ROOT, file_name)
    os.mkdir(directory_to_extract_to)

    # Extract the xlsx file as it is just a zip file
    zip_ref = zipfile.ZipFile(file_path, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()

    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
        dictionary = xmltodict.parse(xml)
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheet_details = {
                'id': sheet['@sheetId'],
                'name': sheet['@name']
            }
            sheets.append(sheet_details)

    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets

Since all xlsx are basically zipped files, we extract the underlying xml data and read sheet names from the workbook directly which takes a fraction of a second as compared to the library functions.

Benchmarking: (On a 6mb xlsx file with 4 sheets)
Pandas, xlrd: 12 seconds
openpyxl: 24 seconds
Proposed method: 0.4 seconds

Since my requirement was just reading the sheet names, the unnecessary overhead of reading the entire time was bugging me so I took this route instead.


回答 5

from openpyxl import load_workbook

sheets = load_workbook(excel_file, read_only=True).sheetnames

对于我正在使用的5MB Excel文件,load_workbook没有read_only标记花费了8.24秒。带有read_only标志,仅花费了39.6 ms。如果您仍然想使用Excel库而不是使用xml解决方案,那将比解析整个文件的方法快得多。

from openpyxl import load_workbook

sheets = load_workbook(excel_file, read_only=True).sheetnames

For a 5MB Excel file I’m working with, load_workbook without the read_only flag took 8.24s. With the read_only flag it only took 39.6 ms. If you still want to use an Excel library and not drop to an xml solution, that’s much faster than the methods that parse the whole file.


Pandas实战教程:将Excel转为html格式

大家谈及用Pandas导出数据,应该就会想到to.xxx系列的函数。

这其中呢,比较常用的就是pd.to_csv()pd.to_excel()。但其实还可以将其导成Html网页格式,这里用到的函数就是pd.to_html()

读取Excel

今天我们要实现Excel转为html格式,首先需要用读取Excel中的表格数据。

import pandas as pd
data = pd.read_excel('测试.xlsx')

查看数据

data.head()

下面我们来学习把DataFrame转换成HTML表格的方法。

生成Html

to_html()函数可以直接把DataFrame转换成HTML表格,只需一行代码即可实现:

html_table = data.to_html('测试.html')

运行上面代码后,工作目录中多了测试.html文件,使用网页浏览器打开它,显示内容如下👇

print(data.to_html())

通过print打印,可以看到DataFrame的内部结构被自动转换为嵌入在表格中的<TH>,<TR>,<TD>标签,保留所有内部层级结构。

调整格式

我们还可以自定义修改参数,来调整生成HTML的格式。

html_table = data.to_html('测试.html',header = True,index = False,justify='center')

再次打开新生成的测试.html文件,发现格式已经发生了变化。

如果想对格式进行进一步调整(增加标题、修改颜色等),就需要一些HTML知识了,可以对生成的测试.html文件中的文本进行调整。

对于有些小伙伴可能需要进行页面展示,就要搭配Flask库来使用了。

小结

Pandas提供read_html()to_html()两个函数用于读写html格式的文件。这两个函数非常有用,一个轻松将DataFrame等复杂的数据结构转换成HTML表格;另一个不用复杂爬虫,简单几行代码即可抓取Table表格型数据,简直是个神器!

今天篇幅很短,主要讲了Pandas中to_html()这个函数。使用该函数最大的优点是:我们在不了解html知识的情况下,就能生成一个表格型的HTML。本文转自快学Python

我们的文章到此就结束啦,如果你喜欢今天的 Python 教程,请持续关注Python实用宝典。

有任何问题,可以在公众号后台回复:加群,回答相应验证信息,进入互助群询问。

原创不易,希望你能在下面点个赞和在看支持我继续创作,谢谢!

给作者打赏,选择打赏金额
¥1¥5¥10¥20¥50¥100¥200 自定义

​Python实用宝典 ( pythondict.com )
不只是一个宝典
欢迎关注公众号:Python实用宝典

Excel+Python=精美壁纸日历 任意DIY

广东的太阳还是那么大,隔着玻璃都能感受到热浪。

明明前不久才立夏(明明已经过去三个月!!)

时间跑,日程赶。

昨日又迎来了立秋,正在放暑假的童靴是不是有点忘记时间了呢~

什么?真的忘记了?没关系,今日小编为大家带来一款由Excel简易DIY的小日历

给自己10分钟(滑稽),python回你一个智慧与美貌并存的备忘小神器 相信它会给你带来不少方便(滑稽)

一、环境说明

开始之前,当然要跟小伙伴们交代一下运行环境咯

我们使用的python版本为 Python3.6,需要使用到的包为 openpyxlcalendar,后者是python自带的,而前者则需要小伙伴们打开Cmd/Terminal,运行以下指令安装,如果你还没有安装python,请看这篇文章

pip install openpyxl

安装完成后我们就可以正式开始啦!

二、代码说明

我们会给大家先讲解一些细节的东西,等大家都理解明白了原理,最后会献上完整的源代码~

1. 首先,绘制一份日历,我们得先知道每个月份有多少天,每天都是星期几,我们使用calendar包获得这些信息:
calendar.monthcalendar(2019, i)

通过这个函数,我们能得到 2019年i月的日历,它类似一个j*k的矩阵,因此我们可以这样遍历得到每一个日期:

    for j in range(len(calendar.monthcalendar(2019, i))):
        for k in range(len(calendar.monthcalendar(2019, i)[j])):
            value = calendar.monthcalendar(2019, i)[j][k]
2. 其次,我们怎么样绘制得到日历呢?

openpyxl包给予了我们答案,最方便的做法是我们先将日历绘制到Excel中,然后再从Excel中提取图片出来。openpyxl怎么用?给大家一个设置单元格字体的例子:

sheet.cell(row=j + 4 + count, column=k + 2).font = Font(u'微软雅黑', color=text_color , size=14)

sheet是对应的表格,row和column就是某个单元格的位置,然后对font属性进行设置,调用Font类并设置参数,如果大家不知道Font类有什么参数,可以参考openpyxl官方文档:https://openpyxl.readthedocs.io/en/stable/,你可以看到里面大部分单元格的属性都是这样设置的,非常简单。

3. 我们的作品是每个月份都有一个图在旁边做装饰,其添加方法如下:
imgs = ['12/1.jpg','12/2.jpg','12/3.jpg','12/4.jpg','12/5.jpg','12/6.jpg','12/7.jpg','12/8.jpg','12/9.jpg','12/10.jpg','12/11.jpg','12/12.jpg']

img = Image(imgs[i-1])        
sheet.add_image(img, 'J2')

imgs是每个图的相对路径,如12/1.jpg 是名字为12的文件夹下的1.jpg. 图像路径要导入到openpyxl的Image对象中: img=Image(’12/1.jpg’),然后将该变量放置到某个单元格上:sheet.add_image(img, ‘J2).

这样看你可能会有点糊涂, i-1是哪里来的?sheet是哪里来的?没关系,其实是因为讲解的时候只能给大家献上部分代码,看完下面的完整代码你们就懂啦:

from openpyxl.styles import Alignment, PatternFill, Font, Border, Side
from openpyxl.utils import get_column_letter
from openpyxl.drawing.image import Image
import openpyxl
import calendar

def set_information(date, text):
    t = {}
    t['month'] = date.split('-')[1]
    t['day'] = date.split('-')[2]
    t['text'] = text
    flex_text.append(t)

def set_month_value(i, sheet, border_color, text_color, color_one, color_two):
    # i: 月份
    # sheet: 该月份的excel
    # border_color: 边框颜色
    count = 0
    # render_color 用来设定单元格背景色,交替进行
    render_color_1 = 1
    render_color_2 = 0

    for j in range(len(calendar.monthcalendar(2019, i))):
        for k in range(len(calendar.monthcalendar(2019, i)[j])):
            value = calendar.monthcalendar(2019, i)[j][k]
            # 将0值变为空值
            bd = Border(right=Side(color=border_color, style='thick'),
                        top=Side(color=border_color, style='thick'),
                        left=Side(color=border_color, style='thick'))
            right_bd = Border(right=Side(color=border_color, style='thick'),
                              left=Side(color=border_color, style='thick'),
                              bottom=Side(color=border_color, style='thick'))
            
            if value == 0:
                value = ''
                sheet.cell(row=j + 4 + count, column=k + 2).value = value
                sheet.cell(row=j + 4 + count, column=k + 2).border = bd
                sheet.cell(row=j + 5 + count, column=k + 2).border = right_bd
            else:
                sheet.cell(row=j + 4 + count, column=k + 2).value = value
                sheet.cell(row=j + 4 + count, column=k + 2).border = bd
                sheet.cell(row=j + 4 + count, column=k + 2).font = Font(u'微软雅黑', color=text_color , size=14)
                sheet.cell(row=j + 5 + count, column=k + 2).border = right_bd
                # 单元格文字设置,右对齐,垂直居中

            if render_color_1 &gt; render_color_2:
                sheet.cell(row=j + 4 + count, column=k + 2).fill = PatternFill("solid", fgColor=color_one)
                sheet.cell(row=j + 5 + count, column=k + 2).fill = PatternFill("solid", fgColor=color_one)
                render_color_2 += 1
            else:
                sheet.cell(row=j + 4 + count, column=k + 2).fill = PatternFill("solid", fgColor=color_two)
                sheet.cell(row=j + 5 + count, column=k + 2).fill = PatternFill("solid", fgColor=color_two)
                render_color_1 += 1

            # 提取当天所有事件
            text = ''
            for t in flex_text:
                if int(t['day']) == value and int(t['month']) == i:
                    print(t)
                    text = text + t['text']+'\n'

            # 设置事件信息
            if text != '':
                sheet.cell(row=j + 5 + count, column=k + 2).value = text
                sheet.cell(row=j + 5 + count, column=k + 2).font = Font(u'宋体',color=text_color, size=13)
                align = Alignment(horizontal='right', vertical='center', wrapText=True)
                # wrapText 设置单元格可包含多行字符
                sheet.cell(row=j + 5 + count, column=k + 2).alignment = align
        count += 1

def set_week_line(sheet, border_color, workday_color, otherday_color, text_color):
    # 设置星期栏
    align = Alignment(horizontal='center', vertical='center')
    days = ['星期日', '星期一', '星期二', '星期三', '星期四', '星期五', '星期六']
    bd_day = Border(right=Side(color=border_color, style='thick'), 
                      top=Side(color=border_color, style='thick'),
                      left=Side(color=border_color, style='thick'))
    num = 0
    # 单元格填充色属性设置
    for day in range(2, 9):
        sheet.cell(row=3, column=day).value = days[num]
        sheet.cell(row=3, column=day).alignment = align
        sheet.cell(row=3, column=day).border = bd_day
        sheet.cell(row=3, column=day).font = Font(u'微软雅黑', color=text_color, bold=True , size=12)
        # 设置列宽12
        c_char = get_column_letter(day)
        sheet.column_dimensions[get_column_letter(day)].width = 12
        # 行高27
        sheet.row_dimensions[day].height = 27
        num += 1
        if day == 2 or day == 8:
            for r in range(3, 14):
                sheet.cell(row=r, column=day).fill = PatternFill("solid", fgColor=otherday_color)
        else:
            sheet.cell(row=3, column=day).fill = PatternFill("solid", fgColor=workday_color)

def set_month_year(i, sheet, year_color, month_color):
    # 添加年份及月份
    sheet.cell(row=2, column=8).value = '2019'
    sheet.cell(row=2, column=8).font = Font(u'微软雅黑', size=30, color=year_color)
    sheet.cell(row=2, column=2).value = str(i) + '月'
    sheet.cell(row=2, column=2).font = Font(u'微软雅黑', size=25, color=month_color)
    sheet.row_dimensions[2].height = 35

def get_month_xlsx(wb, imgs, flex_text):
    for i in range(1, 13):
        sheet = wb.create_sheet(index=0, title=str(i) + '月')
        # 添加工作表

        text_color = '000000'
        # 日历文字颜色
        text_color_week = '000000'
        # 星期栏文字颜色
        BorderCorlor = 'B7E0E8'
        # 边框颜色
        backgroundColor = 'FFFFFF'
        # 背景颜色
        workday_color = 'CBEEEE'
        # 工作日背景颜色
        otherday_color = '7FD4D2'
        # 其他日背景颜色
        year_color = '000000'
        # 年份颜色
        month_color = '000000'
        # 月份颜色
        s_color = 'CBEEEE'
        # 单数颜色
        d_color = 'A5E1E0'
        # 双数颜色
        
        # 单元格的背景色
        for k1 in range(1, 15):
            for k2 in range(1, 16):
                sheet.cell(row=k1, column=k2).fill = PatternFill("solid", fgColor=backgroundColor)

        set_month_value(i, sheet, BorderCorlor, text_color, s_color, d_color)
        # 设定月份的值,参数:月份, 表, 边框颜色

        set_week_line(sheet, BorderCorlor, workday_color, otherday_color, text_color_week)
        # 设定星期栏

        set_month_year(i, sheet, year_color, month_color)
        # 设定年份和月份的格式

        # 设置日历主体行高
        for row in range(3, 19):
            if row % 2 == 0 or row == 3:
                sheet.row_dimensions[row].height = 28
            else:
                sheet.row_dimensions[row].height = 56

        # 合并单元格
        sheet.merge_cells('I1:P14')

        # 添加图片
        img = Image(imgs[i-1])
        sheet.add_image(img, 'J2')

        # 添加二维码
        img = Image('2.png')
        sheet.add_image(img, 'O2')

calendar.setfirstweekday(firstweekday=6)
wb = openpyxl.Workbook()
flex_text = []

imgs = ['12/1.jpg','12/2.jpg','12/3.jpg','12/4.jpg','12/5.jpg','12/6.jpg','12/7.jpg','12/8.jpg','12/9.jpg','12/10.jpg','12/11.jpg','12/12.jpg']
# 每个月的图片

set_information('2019-12-1', '考试')
set_information('2019-12-1', '约会')
# 需要添加的信息

get_month_xlsx(wb, imgs, flex_text)
# 得到文档

wb.save('my_calendary.xlsx')
# 保存文档
4. 我们还有一个神秘功能

差点忘了告诉大家了,我们的日历能支持备注哦,在调用get_month_xlsx得到文档前,通过set_information()放入你某一天想做的事情即可。如:

set_information('2019-12-5', '面试')

三、运行代码

终于到了激动人心的运行代码部分了,运行这份代码,你只需要把你想要的图片变量名改一下即可,即imgs. 然后在本地cmd/terminal运行:

python 这份代码的文件名(滑稽.py

会自动生成一个Excel表格叫my_calendary.xlsx. 怎样从里面把日历提取成图片呢?很简单,复制拉取你想要的部分,粘贴到聊天窗口就能变成一个图片!这里给大家献上源代码里的12张图,希望大家喜欢!

下载地址:https://pythondict-1252734158.file.myqcloud.com/home/www/pythondict/wp-content/uploads/2019/08/2019080909262065.zip


根据大家的喜好,大家可以自己设置背景色、边框色、交替色和图片,还有把这个讨厌的好看的二维码去掉。如果有阅读完注释还是不懂的地方,欢迎在下方留言区讨论,我们会抽空回答的~


Python实用宝典 (pythondict.com)

不只是一个宝典

欢迎关注公众号:Python实用宝典