Python 实用宝典

Question 1

The new version of Pandas uses the following interface to load Excel files:

read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

but what if I don’t know the sheets that are available?

For example, I am working with excel files that the following sheets

Data 1, Data 2 …, Data N, foo, bar

but I don’t know N a priori.

Is there any way to get the list of sheets from an excel document in Pandas?

Question 2

You can still use the ExcelFile class (and the sheet_names attribute):

xl = pd.ExcelFile('foo.xls')

xl.sheet_names  # see all sheet names

xl.parse(sheet_name)  # read a specific sheet to DataFrame

see docs for parse for more options…

Question 3

You should explicitly specify the second parameter (sheetname) as None. like this:

 df = pandas.read_excel("/yourPath/FileName.xlsx", None);

“df” are all sheets as a dictionary of DataFrames, you can verify it by run this:

df.keys()

result like this:

[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']

please refer pandas doc for more details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Question 4

This is the fastest way I have found, inspired by @divingTobi’s answer. All The answers based on xlrd, openpyxl or pandas are slow for me, as they all load the whole file first.

from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

with ZipFile(file) as zipped_file:
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]

Question 5

Building on @dhwanil_shah ‘s answer, you do not need to extract the whole file. With zf.open it is possible to read from a zipped file directly.

import xml.etree.ElementTree as ET
import zipfile

def xlsxSheets(f):
    zf = zipfile.ZipFile(f)

    f = zf.open(r'xl/workbook.xml')

    l = f.readline()
    l = f.readline()
    root = ET.fromstring(l)
    sheets=[]
    for c in root.findall('{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheets/*'):
        sheets.append(c.attrib['name'])
    return sheets

The two consecutive readlines are ugly, but the content is only in the second line of the text. No need to parse the whole file.

This solution seems to be much faster than the read_excel version, and most likely also faster than the full extract version.

Question 6

I have tried xlrd, pandas, openpyxl and other such libraries and all of them seem to take exponential time as the file size increase as it reads the entire file. The other solutions mentioned above where they used ‘on_demand’ did not work for me. If you just want to get the sheet names initially, the following function works for xlsx files.

def get_sheet_details(file_path):
    sheets = []
    file_name = os.path.splitext(os.path.split(file_path)[-1])[0]
    # Make a temporary directory with the file name
    directory_to_extract_to = os.path.join(settings.MEDIA_ROOT, file_name)
    os.mkdir(directory_to_extract_to)

    # Extract the xlsx file as it is just a zip file
    zip_ref = zipfile.ZipFile(file_path, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()

    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
        dictionary = xmltodict.parse(xml)
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheet_details = {
                'id': sheet['@sheetId'],
                'name': sheet['@name']
            }
            sheets.append(sheet_details)

    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets

Since all xlsx are basically zipped files, we extract the underlying xml data and read sheet names from the workbook directly which takes a fraction of a second as compared to the library functions.

Benchmarking: (On a 6mb xlsx file with 4 sheets)
Pandas, xlrd: 12 seconds
openpyxl: 24 seconds
Proposed method: 0.4 seconds

Since my requirement was just reading the sheet names, the unnecessary overhead of reading the entire time was bugging me so I took this route instead.

Question 7

from openpyxl import load_workbook

sheets = load_workbook(excel_file, read_only=True).sheetnames

For a 5MB Excel file I’m working with, load_workbook without the read_only flag took 8.24s. With the read_only flag it only took 39.6 ms. If you still want to use an Excel library and not drop to an xml solution, that’s much faster than the methods that parse the whole file.

Question 8

I am trying to create a .csv file with the values from a Python list. When I print the values in the list they are all unicode (?), i.e. they look something like this

[u'value 1', u'value 2', ...]

If I iterate through the values in the list i.e. for v in mylist: print v they appear to be plain text.

And I can put a , between each with print ','.join(mylist)

And I can output to a file, i.e.

myfile = open(...)
print >>myfile, ','.join(mylist)

But I want to output to a CSV and have delimiters around the values in the list e.g.

"value 1", "value 2", ...

I can’t find an easy way to include the delimiters in the formatting, e.g. I have tried through the join statement. How can I do this?

Question 9

import csv

with open(..., 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)

Edit: this only works with python 2.x.

To make it work with python 3.x replace wb with w (see this SO answer)

with open(..., 'w', newline='') as myfile:
     wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
     wr.writerow(mylist)

Question 10

Here is a secure version of Alex Martelli’s:

import csv

with open('filename', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(mylist)

Question 11

For another approach, you can use DataFrame in pandas: And it can easily dump the data to csv just like the code below:

import pandas
df = pandas.DataFrame(data={"col1": list_1, "col2": list_2})
df.to_csv("./file.csv", sep=',',index=False)

Question 12

The best option I’ve found was using the savetxt from the numpy module:

import numpy as np
np.savetxt("file_name.csv", data1, delimiter=",", fmt='%s', header=header)

In case you have multiple lists that need to be stacked

np.savetxt("file_name.csv", np.column_stack((data1, data2)), delimiter=",", fmt='%s', header=header)

Question 13

Use python’s csv module for reading and writing comma or tab-delimited files. The csv module is preferred because it gives you good control over quoting.

For example, here is the worked example for you:

import csv
data = ["value %d" % i for i in range(1,4)]

out = csv.writer(open("myfile.csv","w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)

Produces:

"value 1","value 2","value 3"

Question 14

You could use the string.join method in this case.

Split over a few of lines for clarity – here’s an interactive session

>>> a = ['a','b','c']
>>> first = '", "'.join(a)
>>> second = '"%s"' % first
>>> print second
"a", "b", "c"

Or as a single line

>>> print ('"%s"') % '", "'.join(a)
"a", "b", "c"

However, you may have a problem is your strings have got embedded quotes. If this is the case you’ll need to decide how to escape them.

The CSV module can take care of all of this for you, allowing you to choose between various quoting options (all fields, only fields with quotes and seperators, only non numeric fields, etc) and how to esacpe control charecters (double quotes, or escaped strings). If your values are simple, string.join will probably be OK but if you’re having to manage lots of edge cases, use the module available.

Question 15

This solutions sounds crazy, but works smooth as honey

import csv

with open('filename', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL,delimiter='\n')
    wr.writerow(mylist)

The file is being written by csvwriter hence csv properties are maintained i.e. comma separated. The delimiter helps in the main part by moving list items to next line, each time.

Question 16

To create and write into a csv file

The below example demonstrate creating and writing a csv file. to make a dynamic file writer we need to import a package import csv, then need to create an instance of the file with file reference Ex:- with open(“D:\sample.csv”,”w”,newline=””) as file_writer

here if the file does not exist with the mentioned file directory then python will create a same file in the specified directory, and “w” represents write, if you want to read a file then replace “w” with “r” or to append to existing file then “a”. newline=”” specifies that it removes an extra empty row for every time you create row so to eliminate empty row we use newline=””, create some field names(column names) using list like fields=[“Names”,”Age”,”Class”], then apply to writer instance like writer=csv.DictWriter(file_writer,fieldnames=fields) here using Dictionary writer and assigning column names, to write column names to csv we use writer.writeheader() and to write values we use writer.writerow({“Names”:”John”,”Age”:20,”Class”:”12A”}) ,while writing file values must be passed using dictionary method , here the key is column name and value is your respective key value

import csv 

with open("D:\\sample.csv","w",newline="") as file_writer:

   fields=["Names","Age","Class"]

   writer=csv.DictWriter(file_writer,fieldnames=fields)

   writer.writeheader()

   writer.writerow({"Names":"John","Age":21,"Class":"12A"})

Question 17

Jupyter notebook

Lets say that your list is A

Then you can code the following ad you will have it as a csv file (columns only!)

R="\n".join(A)
f = open('Columns.csv','w')
f.write(R)
f.close()

Question 18

you should use the CSV module for sure , but the chances are , you need to write unicode . For those Who need to write unicode , this is the class from example page , that you can use as a util module:

import csv, codecs, cStringIO

class UTF8Recoder:
    """
    Iterator that reads an encoded stream and reencodes the input to UTF-8
    """
    def __init__(self, f, encoding):
        self.reader = codecs.getreader(encoding)(f)

def __iter__(self):
    return self

def next(self):
    return self.reader.next().encode("utf-8")

class UnicodeReader:
    """
    A CSV reader which will iterate over lines in the CSV file "f",
    which is encoded in the given encoding.
    """

def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
    f = UTF8Recoder(f, encoding)
    self.reader = csv.reader(f, dialect=dialect, **kwds)

def next(self):
    row = self.reader.next()
    return [unicode(s, "utf-8") for s in row]

def __iter__(self):
    return self

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
"""

def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
    # Redirect output to a queue
    self.queue = cStringIO.StringIO()
    self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
    self.stream = f
    self.encoder = codecs.getincrementalencoder(encoding)()

def writerow(self, row):
    self.writer.writerow([s.encode("utf-8") for s in row])
    # Fetch UTF-8 output from the queue ...
    data = self.queue.getvalue()
    data = data.decode("utf-8")
    # ... and reencode it into the target encoding
    data = self.encoder.encode(data)
    # write to the target stream
    self.stream.write(data)
    # empty queue
    self.queue.truncate(0)

def writerows(self, rows):
    for row in rows:
        self.writerow(row)

Question 19

Here is another solution that does not require the csv module.

print ', '.join(['"'+i+'"' for i in myList])

Example :

>>> myList = [u'value 1', u'value 2', u'value 3']
>>> print ', '.join(['"'+i+'"' for i in myList])
"value 1", "value 2", "value 3"

However, if the initial list contains some “, they will not be escaped. If it is required, it is possible to call a function to escape it like that :

print ', '.join(['"'+myFunction(i)+'"' for i in myList])

Python 实用宝典

标签归档：xlrd

熊猫：在Excel文件中查找工作表列表

问题：熊猫：在Excel文件中查找工作表列表

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

使用Python列表中的值创建.csv文件

问题：使用Python列表中的值创建.csv文件

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

Jupyter笔记本

Jupyter notebook

回答 9

回答 10

有趣好用的Python教程