Tag archive: csv

Writing to a CSV line by line in Python

Question: Writing to a CSV line by line in Python

I have data which is being accessed via an http request and is sent back by the server in a comma-separated format. I have the following code:

site= 'www.example.com'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
soup = soup.get_text()
text=str(soup)

The content of text is as follows:

april,2,5,7
may,3,5,8
june,4,7,3
july,5,6,9

How can I save this data into a CSV file? I know I can do something along the lines of the following to iterate line by line:

import StringIO
s = StringIO.StringIO(text)
for line in s:

But I'm unsure how to now properly write each line to the CSV.

EDIT: Thanks for the feedback; as suggested, the solution was rather simple and can be seen below.

Solution:

import StringIO
s = StringIO.StringIO(text)
with open('fileName.csv', 'w') as f:
    for line in s:
        f.write(line)

Answer 0


General way:

##text=List of strings to be written to file
with open('csvfile.csv','wb') as file:
    for line in text:
        file.write(line)
        file.write('\n')

OR

Using CSV writer :

import csv
with open(<path to output_csv>, "wb") as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for line in data:
            writer.writerow(line)

OR

Simplest way:

f = open('csvfile.csv','w')
f.write('hi there\n') #Give your csv text here.
## Python will convert \n to os.linesep
f.close()

Answer 1


You could just write to the file as you would write any normal file.

with open('csvfile.csv','wb') as file:
    for l in text:
        file.write(l)
        file.write('\n')

If just in case, it is a list of lists, you could directly use built-in csv module

import csv

with open("csvfile.csv", "wb") as file:
    writer = csv.writer(file)
    writer.writerows(text)

Answer 2


I would simply write each line to a file, since it’s already in a CSV format:

write_file = "output.csv"
with open(write_file, "w") as output:
    for line in text:
        output.write(line + '\n')

I can’t recall how to write lines with line-breaks at the moment, though :p

Also, you might like to take a look at this answer about write(), writelines(), and '\n'.
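As a small aside (not part of the original answer), here is a minimal sketch of the write() vs. writelines() point, using made-up sample rows; note that writelines() does not add any separators itself:

lines = ['april,2,5,7', 'may,3,5,8', 'june,4,7,3']

with open('output.csv', 'w') as f:
    f.writelines(line + '\n' for line in lines)

# equivalent, using write() in a loop:
with open('output.csv', 'w') as f:
    for line in lines:
        f.write(line + '\n')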


Answer 3


To complement the previous answers, I whipped up a quick class to write to CSV files. It makes it easier to manage and close open files and achieve consistency and cleaner code if you have to deal with multiple files.

class CSVWriter():

    filename = None
    fp = None
    writer = None

    def __init__(self, filename):
        self.filename = filename
        self.fp = open(self.filename, 'w', encoding='utf8')
        self.writer = csv.writer(self.fp, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')

    def close(self):
        self.fp.close()

    def write(self, elems):
        self.writer.writerow(elems)

    def size(self):
        return os.path.getsize(self.filename)

    def fname(self):
        return self.filename

Example usage:

mycsv = CSVWriter('/tmp/test.csv')
mycsv.write((12,'green','apples'))
mycsv.write((7,'yellow','bananas'))
mycsv.close()
print("Written %d bytes to %s" % (mycsv.size(), mycsv.fname()))

Have fun


Answer 4


What about this:

with open("your_csv_file.csv", "w") as f:
    f.write("\n".join(text))

str.join() returns a string which is the concatenation of the strings in the iterable. The separator between elements is the string providing this method.
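As a small illustration (not from the original answer), the separator is the string the method is called on:

parts = ['april,2,5,7', 'may,3,5,8', 'june,4,7,3']
print("\n".join(parts))
# april,2,5,7
# may,3,5,8
# june,4,7,3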


Method to read only the first few rows of a pandas DataFrame

Question: Method to read only the first few rows of a pandas DataFrame


Is there a built-in way to use read_csv to read only the first n lines of a file without knowing the length of the lines ahead of time? I have a large file that takes a long time to read, and occasionally only want to use the first, say, 20 lines to get a sample of it (and prefer not to load the full thing and take the head of it).

If I knew the total number of lines I could do something like footer_lines = total_lines - n and pass this to the skipfooter keyword arg. My current solution is to manually grab the first n lines with python and StringIO it to pandas:

import pandas as pd
from StringIO import StringIO

n = 20
with open('big_file.csv', 'r') as f:
    head = ''.join(f.readlines(n))

df = pd.read_csv(StringIO(head))

It’s not that bad, but is there a more concise, ‘pandasic’ (?) way to do it with keywords or something?


Answer 0


I think you can use the nrows parameter. From the docs:

nrows : int, default None

    Number of rows of file to read. Useful for reading pieces of large files

which seems to work. Using one of the standard large test files (988504479 bytes, 5344499 lines):

In [1]: import pandas as pd

In [2]: time z = pd.read_csv("P00000001-ALL.csv", nrows=20)
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s

In [3]: len(z)
Out[3]: 20

In [4]: time z = pd.read_csv("P00000001-ALL.csv")
CPU times: user 27.63 s, sys: 1.92 s, total: 29.55 s
Wall time: 30.23 s

Getting pandas.read_csv to read empty values as empty string instead of nan

Question: Getting pandas.read_csv to read empty values as empty string instead of nan


I’m using the pandas library to read in some CSV data. In my data, certain columns contain strings. The string "nan" is a possible value, as is an empty string. I managed to get pandas to read “nan” as a string, but I can’t figure out how to get it not to read an empty value as NaN. Here’s sample data and output

One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven

>>> pandas.read_csv('test.csv', na_values={'One': [], "Three": []})
    One  Two  Three
0    a    1    one
1    b    2    two
2  NaN    3  three
3    d    4    nan
4    e    5   five
5  nan    6    NaN
6    g    7  seven

It correctly reads "nan" as the string "nan", but still reads the empty cells as NaN. I tried passing str in the converters argument to read_csv (with converters={'One': str}), but it still reads the empty cells as NaN.

I realize I can fill the values after reading, with fillna, but is there really no way to tell pandas that an empty cell in a particular CSV column should be read as an empty string instead of NaN?


Answer 0


I added a ticket to add an option of some sort here:

https://github.com/pydata/pandas/issues/1450

In the meantime, result.fillna('') should do what you want

EDIT: in the development version (to be 0.8.0 final) if you specify an empty list of na_values, empty strings will stay empty strings in the result
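A minimal sketch of the fillna('') workaround mentioned above (the file name comes from the question; everything else is just illustration):

import pandas as pd

result = pd.read_csv('test.csv')
result = result.fillna('')  # replace the NaN produced by empty cells with empty strings
print(result)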


Answer 1


I was still confused after reading the other answers and comments. But the answer now seems simpler, so here you go.

Since Pandas version 0.9 (from 2012), you can read your csv with empty cells interpreted as empty strings by simply setting keep_default_na=False:

pd.read_csv('test.csv', keep_default_na=False)

This issue is explained in more detail, and was fixed, in pandas version 0.9 (Aug 19, 2012).


Answer 2


We have a simple argument in Pandas read_csv for this:

Use:

df = pd.read_csv('test.csv', na_filter= False)

The pandas documentation clearly explains how the above argument works.


Writing a Python list of lists to a csv file

Question: Writing a Python list of lists to a csv file


I have a long list of lists of the following form —

a = [[1.2,'abc',3],[1.2,'werew',4],........,[1.4,'qew',2]]

i.e. the values in the list are of different types: float, int, strings. How do I write it into a csv file so that my output csv file looks like

1.2,abc,3
1.2,werew,4
.
.
.
1.4,qew,2

Answer 0


Python’s built-in CSV module can handle this easily:

import csv

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(a)

This assumes your list is defined as a, as it is in your question. You can tweak the exact format of the output CSV via the various optional parameters to csv.writer() as documented in the library reference page linked above.

Update for Python 3

import csv

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(a)
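To illustrate the optional parameters mentioned above, here is a hedged sketch in the Python 3 form; the delimiter and quoting values are arbitrary choices for the example:

import csv

a = [[1.2, 'abc', 3], [1.2, 'werew', 4], [1.4, 'qew', 2]]

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=';', quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(a)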

Answer 1


You could use pandas:

In [1]: import pandas as pd

In [2]: a = [[1.2,'abc',3],[1.2,'werew',4],[1.4,'qew',2]]

In [3]: my_df = pd.DataFrame(a)

In [4]: my_df.to_csv('my_csv.csv', index=False, header=False)

Answer 2


import csv
with open(file_path, 'a') as outcsv:   
    #configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    writer.writerow(['number', 'text', 'number'])
    for item in list:
        #Write item to outcsv
        writer.writerow([item[0], item[1], item[2]])

official docs: http://docs.python.org/2/library/csv.html


Answer 3


If for whatever reason you wanted to do it manually (without using a module like csv, pandas, numpy etc.):

with open('myfile.csv','w') as f:
    for sublist in mylist:
        for item in sublist:
            f.write(str(item) + ',')  # convert each value to str so numbers work too
        f.write('\n')

Of course, rolling your own version can be error-prone and inefficient … that’s usually why there’s a module for that. But sometimes writing your own can help you understand how they work, and sometimes it’s just easier.


Answer 4


Using csv.writer on my very large list took quite a while. I decided to use pandas; it was faster and easier to control and understand:

 import pandas

 yourlist = [[...],...,[...]]
 pd = pandas.DataFrame(yourlist)
 pd.to_csv("mylist.csv")

The good part is that you can change a few things to make a better csv file:

 yourlist = [[...],...,[...]]
 columns = ["abcd","bcde","cdef"] #a csv with 3 columns
 index = [i[0] for i in yourlist] #first element of every list in yourlist
 not_index_list = [i[1:] for i in yourlist]
 pd = pandas.DataFrame(not_index_list, columns = columns, index = index)

 #Now you have a csv with columns and index:
 pd.to_csv("mylist.csv")

Answer 5


Ambers’s solution also works well for numpy arrays:

from pylab import *
import csv

array_=arange(0,10,1)
list_=[array_,array_*2,array_*3]
with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(list_)

Answer 6


If you don’t want to import csv module for that, you can write a list of lists to a csv file using only Python built-ins

with open("output.csv", "w") as f:
    for row in a:
        f.write("%s\n" % ','.join(str(col) for col in row))

Answer 7


Make sure to indicate lineterminator='\n' when creating the writer; otherwise, an extra empty line might be written into the file after each data line when the data source is another csv file…

Here is my solution:

import csv

with open('csvfile', 'a') as csvfile:
    # the delimiter must be a single character; a tab is assumed here
    spamwriter = csv.writer(csvfile, delimiter='\t', quotechar='|', quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for i in range(0, len(data)):
        spamwriter.writerow(data[i])

Answer 8


How about dumping the list of lists into a pickle and restoring it with the pickle module? It's quite convenient.

>>> import pickle
>>> 
>>> mylist = [1, 'foo', 'bar', {1, 2, 3}, [ [1,4,2,6], [3,6,0,10]]]
>>> with open('mylist', 'wb') as f:
...     pickle.dump(mylist, f) 


>>> with open('mylist', 'rb') as f:
...      mylist = pickle.load(f)
>>> mylist
[1, 'foo', 'bar', {1, 2, 3}, [[1, 4, 2, 6], [3, 6, 0, 10]]]
>>> 

Answer 9


I got an error message when following the examples with a newline parameter in the csv.writer function. The following code worked for me.

 with open(strFileName, "w") as f:
    writer = csv.writer(f, delimiter=',',  quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)
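For what it's worth, in Python 3 the newline='' argument belongs to open() rather than csv.writer(), which may be why the examples with a newline parameter raised an error; a minimal sketch (the file name and rows are made up for the example):

import csv

strFileName = "output.csv"
result = [[1.2, 'abc', 3], [1.4, 'qew', 2]]

with open(strFileName, "w", newline="") as f:
    writer = csv.writer(f, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(result)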

How to add a header row to a Pandas DataFrame

Question: How to add a header row to a Pandas DataFrame


I am reading a csv file into pandas. This csv file consists of four columns and some rows, but does not have a header row, which I want to add. I have been trying the following:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame=pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

But when I apply the code, I get the following Error:

ValueError: Shape of passed values is (1, 1), indices imply (4, 1)

What exactly does the error mean? And what would be a clean way in python to add a header row to my csv file/pandas df?


Answer 0


You can use names directly in the read_csv

names : array-like, default None
    List of column names to use. If file contains no header row, then you should explicitly pass header=None

Cov = pd.read_csv("path/to/file.txt", 
                  sep='\t', 
                  names=["Sequence", "Start", "End", "Coverage"])

Answer 1


Alternatively, you could read your csv with header=None and then add the column names with df.columns:

Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)
Cov.columns = ["Sequence", "Start", "End", "Coverage"]

Answer 2


col_Names=["Sequence", "Start", "End", "Coverage"]
my_CSV_File= pd.read_csv("yourCSVFile.csv",names=col_Names)

Having done this, just check it with the following (well, obviously you already know this, but still…):

my_CSV_File.head()

Hope it helps … Cheers


Answer 3


To fix your code you can simply change [Cov] to Cov.values, the first parameter of pd.DataFrame will become a multi-dimensional numpy array:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame=pd.DataFrame(Cov.values, columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

But the smartest solution is still to use pd.read_csv with header=None and names=columns_list.


How to convert this list of dictionaries to a csv file?

Question: How to convert this list of dictionaries to a csv file?


I have a list of dictionaries that looks something like this:

toCSV = [{'name':'bob','age':25,'weight':200},{'name':'jim','age':31,'weight':180}]

What should I do to convert this to a csv file that looks something like this:

name,age,weight
bob,25,200
jim,31,180

Answer 0


import csv
toCSV = [{'name':'bob','age':25,'weight':200},
         {'name':'jim','age':31,'weight':180}]
keys = toCSV[0].keys()
with open('people.csv', 'w', newline='')  as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(toCSV)

EDIT: My prior solution doesn’t handle the order. As noted by Wilduck, DictWriter is more appropriate here.


Answer 1


This is for when you have a single dictionary to write:

import csv
with open('names.csv', 'w') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})

Answer 2


In python 3 things are a little different, but way simpler and less error prone. It's a good idea to open your CSV file with utf8 encoding, as that makes the data more portable to others (assuming you aren't using a more restrictive encoding, like latin1).

import csv
toCSV = [{'name':'bob','age':25,'weight':200},
         {'name':'jim','age':31,'weight':180}]
with open('people.csv', 'w', encoding='utf8', newline='') as output_file:
    fc = csv.DictWriter(output_file, 
                        fieldnames=toCSV[0].keys(),

                       )
    fc.writeheader()
    fc.writerows(toCSV)
  • Note that csv in python 3 needs the newline='' parameter, otherwise you get blank lines in your CSV when opening in excel/opencalc.

Alternatively: I prefer to use the csv handling in the pandas module. I find it is more tolerant of encoding issues, and pandas will automatically convert string numbers in CSVs into the correct type (int, float, etc.) when loading the file.

import pandas
dataframe = pandas.read_csv(filepath)
list_of_dictionaries = dataframe.to_dict('records')
dataframe.to_csv(filepath)

Note:

  • pandas will take care of opening the file for you if you give it a path, and will default to utf8 in python3, and figure out headers too.
  • a dataframe is not the same structure as what CSV gives you, so you add one line upon loading to get the same thing: dataframe.to_dict('records')
  • pandas also makes it much easier to control the order of columns in your csv file. By default, they’re alphabetical, but you can specify the column order. With vanilla csv module, you need to feed it an OrderedDict or they’ll appear in a random order (if working in python < 3.5). See: Preserving column order in Python Pandas DataFrame for more.
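A small sketch of controlling the column order with pandas, reusing the toCSV list from above (the columns list simply repeats the example's three field names):

import pandas

toCSV = [{'name':'bob','age':25,'weight':200},
         {'name':'jim','age':31,'weight':180}]
dataframe = pandas.DataFrame(toCSV)
dataframe.to_csv('people.csv', index=False, columns=['name', 'age', 'weight'])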

Answer 3


Because @User and @BiXiC asked for help with UTF-8, here is a variation of the solution by @Matthew. (I'm not allowed to comment, so I'm answering.)

import unicodecsv as csv
toCSV = [{'name':'bob','age':25,'weight':200},
         {'name':'jim','age':31,'weight':180}]
keys = toCSV[0].keys()
with open('people.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(toCSV)

Answer 4


import csv

with open('file_name.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(('colum1', 'colum2', 'colum3'))
    for key, value in dictionary.items():
        writer.writerow([key, value[0], value[1]])

This would be the simplest way to write data to a .csv file.


Answer 5


Here is another, more general solution assuming you don’t have a list of rows (maybe they don’t fit in memory) or a copy of the headers (maybe the write_csv function is generic):

import csv
from collections import OrderedDict

def gen_rows():
    yield OrderedDict(name='bob', age=25, weight=200)
    yield OrderedDict(name='jim', age=31, weight=180)

def write_csv():
    it = gen_rows()
    first_row = next(it)  # works in both Python 2 and 3
    with open("people.csv", "w") as outfile:
        wr = csv.DictWriter(outfile, fieldnames=list(first_row))
        wr.writeheader()
        wr.writerow(first_row)
        wr.writerows(it)

Note: the OrderedDict constructor used here only preserves order in python >3.4. If order is important, use the OrderedDict([('name', 'bob'),('age',25)]) form.


Answer 6

import csv
from datetime import date
toCSV = [{'name':'bob','age':25,'weight':200},
         {'name':'jim','age':31,'weight':180}]
header=['name','age','weight']     
try:
   with open('output'+str(date.today())+'.csv',mode='w',encoding='utf8',newline='') as output_to_csv:
       dict_csv_writer = csv.DictWriter(output_to_csv, fieldnames=header,dialect='excel')
       dict_csv_writer.writeheader()
       dict_csv_writer.writerows(toCSV)
   print('\nData exported to csv successfully and sample data')
except IOError as io:
    print('\n',io)

csv.Error: iterator should return strings, not bytes

Question: csv.Error: iterator should return strings, not bytes


Sample.csv contains the following:

NAME    Id   No  Dept
Tom     1    12   CS
Hendry  2    35   EC
Bahamas 3    21   IT
Frank   4    61   EE

And the Python file contains the following code:

import csv
ifile  = open('sample.csv', "rb")
read = csv.reader(ifile)
for row in read :
    print (row) 

When I run the above code in Python, I get the following exception:

File "csvformat.py", line 4, in <module>
    for row in read :
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

How can I fix it?


Answer 0


Open the file in text mode.

More specifically:

ifile  = open('sample.csv', "rt", encoding=<theencodingofthefile>)

Good guesses for the encoding are "ascii" and "utf8". You can also leave the encoding off, and it will use the system default encoding, which tends to be UTF8, but may be something else.
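Putting the pieces together, a minimal sketch of the full fix (utf8 is only an assumption; use whatever encoding your file actually has):

import csv

ifile = open('sample.csv', "rt", encoding="utf8")
read = csv.reader(ifile)
for row in read:
    print(row)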


Answer 1


The reason it is throwing that exception is because you have the argument rb, which opens the file in binary mode. Change that to r, which will by default open the file in text mode.

Your code:

import csv
ifile  = open('sample.csv', "rb")
read = csv.reader(ifile)
for row in read :
    print (row) 

New code:

import csv
ifile  = open('sample.csv', "r")
read = csv.reader(ifile)
for row in read :
    print (row)

Answer 2


Your problem is you have the b in the open flag. The flag rt (read, text) is the default, so, using the context manager, simply do this:

with open('sample.csv') as ifile:
    read = csv.reader(ifile) 
    for row in read:
        print (row)  

The context manager means you don’t need generic error handling (without which you may get stuck with the file open, especially in an interpreter), because it will automatically close the file on an error, or on exiting the context.

The above is the same as:

with open('sample.csv', 'r') as ifile:
    ...

or

with open('sample.csv', 'rt') as ifile:
    ...

Answer 3


In Python 3, csv.reader expects that the passed iterable returns strings, not bytes. Here is one more solution to this problem, which uses the codecs module:

import csv
import codecs
ifile  = open('sample.csv', "rb")
read = csv.reader(codecs.iterdecode(ifile, 'utf-8'))
for row in read :
    print (row) 

Answer 4


I had this error when running an old python script developed with Python 2.6.4

When updating to 3.6.2, I had to remove all ‘rb’ parameters from open calls in order to fix this csv reading error.


Creating a dictionary from a CSV file?

Question: Creating a dictionary from a CSV file?


I am trying to create a dictionary from a csv file. The first column of the csv file contains unique keys and the second column contains values. Each row of the csv file represents a unique key, value pair within the dictionary. I tried to use the csv.DictReader and csv.DictWriter classes, but I could only figure out how to generate a new dictionary for each row. I want one dictionary. Here is the code I am trying to use:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
    writer = csv.writer(outfile)
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict = {k:v for k, v in rows}
    print(mydict)

When I run the above code I get a ValueError: too many values to unpack (expected 2). How do I create one dictionary from a csv file? Thanks.


Answer 0


I believe the syntax you were looking for is as follows:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {rows[0]:rows[1] for rows in reader}

Alternately, for python <= 2.7.1, you want:

mydict = dict((rows[0],rows[1]) for rows in reader)

Answer 1


Open the file by calling open and then csv.DictReader.

input_file = csv.DictReader(open("coors.csv"))

You may iterate over the rows of the csv file dict reader object by iterating over input_file.

for row in input_file:
    print(row)

Or, to access the first line only:

dictobj = csv.DictReader(open('coors.csv')).next() 

UPDATE In python 3+ versions, this code would change a little:

reader = csv.DictReader(open('coors.csv'))
dictobj = next(reader) 

Answer 2

import csv
reader = csv.reader(open('filename.csv', 'r'))
d = {}
for row in reader:
   k, v = row
   d[k] = v

Answer 3


This isn't elegant, but here is a one-line solution using pandas.

import pandas as pd
pd.read_csv('coors.csv', header=None, index_col=0, squeeze=True).to_dict()

If you want to specify dtype for your index (it can’t be specified in read_csv if you use the index_col argument because of a bug):

import pandas as pd
pd.read_csv('coors.csv', header=None, dtype={0: str}).set_index(0).squeeze().to_dict()

Answer 4


You just have to convert the csv.reader output to a dict:

~ >> cat > 1.csv
key1, value1
key2, value2
key2, value22
key3, value3

~ >> cat > d.py
import csv
with open('1.csv') as f:
    d = dict(filter(None, csv.reader(f)))

print(d)

~ >> python d.py
{'key3': ' value3', 'key2': ' value22', 'key1': ' value1'}

Answer 5


You can also use numpy for this.

from numpy import loadtxt
key_value = loadtxt("filename.csv", delimiter=",")
mydict = { k:v for k,v in key_value }

Answer 6


I’d suggest adding if rows in case there is an empty line at the end of the file

import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)

Answer 7


One-liner solution

import pandas as pd

dict = {row[0] : row[1] for _, row in pd.read_csv("file.csv").iterrows()}

Answer 8


If you are OK with using the numpy package, then you can do something like the following:

import numpy as np

lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
   my_dict[lines[i][0]] = lines[i][1]

Answer 9


For simple csv files, such as the following

id,col1,col2,col3
row1,r1c1,r1c2,r1c3
row2,r2c1,r2c2,r2c3
row3,r3c1,r3c2,r3c3
row4,r4c1,r4c2,r4c3

You can convert it to a Python dictionary using only built-ins

with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]

(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row   
    csv_dict[key] = {key: value for key, value in zip(header, values)}

This should yield the following dictionary

{'row1': {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'},
 'row2': {'col1': 'r2c1', 'col2': 'r2c2', 'col3': 'r2c3'},
 'row3': {'col1': 'r3c1', 'col2': 'r3c2', 'col3': 'r3c3'},
 'row4': {'col1': 'r4c1', 'col2': 'r4c2', 'col3': 'r4c3'}}

Note: Python dictionaries have unique keys, so if your csv file has duplicate ids you should append each row to a list.

for row in data:
    key, *values = row

    if key not in csv_dict:
            csv_dict[key] = []

    csv_dict[key].append({key: value for key, value in zip(header, values)})

Answer 10


You can use this, it is pretty cool:

import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
      records, metadata = commas.parse(f)
      for row in records:
            print('this is row in dictionary:' + str(row))

Answer 11


Many solutions have been posted, and I'd like to contribute mine, which works for any number of columns in the CSV file. It creates a dictionary with one key per column, and the value for each key is a list with the elements in that column.

    input_file = csv.DictReader(open(path_to_csv_file))
    csv_dict = {elem: [] for elem in input_file.fieldnames}
    for row in input_file:
        for key in csv_dict.keys():
            csv_dict[key].append(row[key])

Answer 12


With pandas it is much easier. For example, assume you have the following data as CSV and let's call it test.txt / test.csv (you know a CSV is a sort of text file):

a,b,c,d
1,2,3,4
5,6,7,8

now using pandas

import pandas as pd
df = pd.read_csv("./text.txt")
df_to_doct = df.to_dict()

for each row, it would be

df.to_dict(orient='records')

and that’s it.


Answer 13


Try to use a defaultdict and DictReader.

import csv
from collections import defaultdict
my_dict = defaultdict(list)

with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)

It returns:

{'key1':[value_1, value_2, value_3], 'key2': [value_a, value_b, value_c], 'Key3':[value_x, Value_y, Value_z]}

How to get rid of the "Unnamed: 0" column in a pandas DataFrame?

Question: How to get rid of the "Unnamed: 0" column in a pandas DataFrame?


I have a situation wherein sometimes when I read a csv from df I get an unwanted index-like column named unnamed:0.

file.csv

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

The CSV is read with this:

pd.read_csv('file.csv')

   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9

This is very annoying! Does anyone have an idea on how to get rid of this?


Answer 0


It’s the index column, pass index=False to not write it out, see the docs

Example:

In [37]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

Answer 1


This issue most likely manifests because your CSV was saved along with its RangeIndex (which usually doesn’t have a name). The fix would actually need to be done when saving the DataFrame, but this isn’t always an option.

Avoiding the Problem: read_csv with index_col argument

IMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0] argument to pd.read_csv; this reads in the first column as the index.

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')

pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by using index=False when creating the output CSV, if your DataFrame does not have an index to begin with.

df.to_csv('file.csv', index=False)

But as mentioned above, this isn’t always an option.


Stopgap Solution: Filtering with str.match

If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Answer 2


Another case that this might be happening is if your data was improperly written to your csv to have each row end with a comma. This will leave you with an unnamed column Unnamed: x at the end of your data when you try to read it into a df.
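As an illustration (the data here is made up), a csv whose rows all end with a trailing comma produces an extra unnamed column, which can then be dropped with the same filtering trick shown in the answer above:

import io
import pandas as pd

csv_text = "a,b,c,\n1,2,3,\n4,5,6,\n"  # note the trailing comma on every line
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns)
# Index(['a', 'b', 'c', 'Unnamed: 3'], dtype='object')

df = df.loc[:, ~df.columns.str.match('Unnamed')]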


Answer 3


To get rid of all Unnamed columns, you can also use a regex such as df.drop(df.filter(regex="Unname"), axis=1, inplace=True)


Answer 4


Simply delete that column using: del df['column_name']


Writing to an Excel spreadsheet

Question: Writing to an Excel spreadsheet


I am new to Python. I need to write some data from my program to a spreadsheet. I’ve searched online and there seem to be many packages available (xlwt, XlsXcessive, openpyxl). Others suggest to write to a .csv file (never used CSV and don’t really understand what it is).

The program is very simple. I have two lists (float) and three variables (strings). I don’t know the lengths of the two lists and they probably won’t be the same length.

I want the layout to be as in the picture below:

The pink column will have the values of the first list and the green column will have the values of the second list.

So what’s the best way to do this?

P.S. I am running Windows 7 but I won’t necessarily have Office installed on the computers running this program.

import xlwt

x=1
y=2
z=3

list1=[2.34,4.346,4.234]

book = xlwt.Workbook(encoding="utf-8")

sheet1 = book.add_sheet("Sheet 1")

sheet1.write(0, 0, "Display")
sheet1.write(1, 0, "Dominance")
sheet1.write(2, 0, "Test")

sheet1.write(0, 1, x)
sheet1.write(1, 1, y)
sheet1.write(2, 1, z)

sheet1.write(4, 0, "Stimulus Time")
sheet1.write(4, 1, "Reaction Time")

i=4

for n in list1:
    i = i+1
    sheet1.write(i, 0, n)



book.save("trial.xls")

I wrote this using all your suggestions. It gets the job done but it can be slightly improved.

How do I format the cells created in the for loop (list1 values) as scientific or number?

I do not want to truncate the values. The actual values used in the program would have around 10 digits after the decimal.


Answer 0


import xlwt

def output(filename, sheet, list1, list2, x, y, z):
    book = xlwt.Workbook()
    sh = book.add_sheet(sheet)

    variables = [x, y, z]
    x_desc = 'Display'
    y_desc = 'Dominance'
    z_desc = 'Test'
    desc = [x_desc, y_desc, z_desc]

    col1_name = 'Stimulus Time'
    col2_name = 'Reaction Time'

    # group each description with its value and track the row number
    for n, (v_desc, v) in enumerate(zip(desc, variables)):
        sh.write(n, 0, v_desc)
        sh.write(n, 1, v)

    n+=1

    sh.write(n, 0, col1_name)
    sh.write(n, 1, col2_name)

    for m, e1 in enumerate(list1, n+1):
        sh.write(m, 0, e1)

    for m, e2 in enumerate(list2, n+1):
        sh.write(m, 1, e2)

    book.save(filename)

for more explanation: https://github.com/python-excel


Answer 1


Use DataFrame.to_excel from pandas. Pandas allows you to represent your data in functionally rich data structures and will let you read in excel files as well.

You will first have to convert your data into a DataFrame and then save it into an excel file like so:

In [1]: from pandas import DataFrame
In [2]: l1 = [1,2,3,4]
In [3]: l2 = [1,2,3,4]
In [3]: df = DataFrame({'Stimulus Time': l1, 'Reaction Time': l2})
In [4]: df
Out[4]: 
   Reaction Time  Stimulus Time
0              1              1
1              2              2
2              3              3
3              4              4

In [5]: df.to_excel('test.xlsx', sheet_name='sheet1', index=False)

and the excel file that comes out looks like this:

Note that both lists need to be of equal length else pandas will complain. To solve this, replace all missing values with None.
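A minimal sketch of the padding mentioned above, assuming the shorter list is simply extended with None before building the DataFrame:

from pandas import DataFrame

l1 = [1, 2, 3, 4]
l2 = [1, 2]  # shorter list
l2 = l2 + [None] * (len(l1) - len(l2))  # pad with None so both columns have equal length
df = DataFrame({'Stimulus Time': l1, 'Reaction Time': l2})
df.to_excel('test.xlsx', sheet_name='sheet1', index=False)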


Answer 2

  • xlrd / xlwt(标准):Python在其标准库中没有此功能,但我认为xlrd / xlwt是读取和写入excel文件的“标准”方式。制作工作簿,添加工作表,编写数据/公式以及格式化单元格非常容易。如果您需要所有这些东西,那么使用此库可能会获得最大的成功。我认为您可以选择openpyxl代替,它会非常相似,但是我没有使用过。

    要使用xlwt设置单元格格式,请定义a XFStyle并在写入工作表时包括样式。这是具有许多数字格式的示例。请参见下面的示例代码。

  • Tablib(功能强大,直观):Tablib是用于处理表格数据的功能更强大但更直观的库。它可以编写具有多个工作表以及其他格式(例如csv,json和yaml)的excel工作簿。如果不需要格式化的单元格(例如背景色),则可以使用该库,这将使您从长远角度上受益匪浅。

  • csv(简单):计算机上的文件要么是文本文件,要么是二进制文件。文本文件只包含字符(包括换行符、制表符等特殊字符),可以在任何地方轻松打开(例如记事本、Web浏览器或Office产品)。csv文件是一种按特定格式组织的文本文件:每一行都是一个值列表,以逗号分隔。Python程序可以轻松读取和写入文本,因此,csv文件是将数据从python程序导出到excel(或其他python程序)的最简单、最快的方法。

    Excel文件是二进制文件,需要知道文件格式的特殊库,这就是为什么您需要用于python的附加库或诸如Microsoft Excel,Gnumeric或LibreOffice之类的特殊程序来读取/写入它们的原因。


import xlwt

# Define a number format (here, scientific notation) once and reuse it
style = xlwt.XFStyle()
style.num_format_str = '0.00E+00'

...

# Pass the style as the fourth argument so each cell is formatted
for i, n in enumerate(list1):
    sheet1.write(i, 0, n, style)

  • xlrd/xlwt (standard): Python does not have this functionality in its standard library, but I think of xlrd/xlwt as the “standard” way to read and write excel files. It is fairly easy to make a workbook, add sheets, write data/formulas, and format cells. If you need all of these things, you may have the most success with this library. I think you could choose openpyxl instead and it would be quite similar, but I have not used it.

    To format cells with xlwt, define an XFStyle and include the style when you write to a sheet (xlwt supports many number formats). See the example code below.

  • Tablib (powerful, intuitive): Tablib is a more powerful yet intuitive library for working with tabular data. It can write excel workbooks with multiple sheets as well as other formats, such as csv, json, and yaml. If you don’t need formatted cells (like background color), you will do yourself a favor by using this library, which will get you farther in the long run; a short Tablib sketch follows the xlwt example below.

  • csv (easy): Files on your computer are either text or binary. Text files are just characters, including special ones like newlines and tabs, and can be easily opened anywhere (e.g. notepad, your web browser, or Office products). A csv file is a text file that is formatted in a certain way: each line is a list of values, separated by commas. Python programs can easily read and write text, so a csv file is the easiest and fastest way to export data from your python program into excel (or another python program).

    Excel files are binary and require special libraries that know the file format, which is why you need an additional library for python, or a special program like Microsoft Excel, Gnumeric, or LibreOffice, to read/write them.


import xlwt

# Define a number format (here, scientific notation) once and reuse it
style = xlwt.XFStyle()
style.num_format_str = '0.00E+00'

...

# Pass the style as the fourth argument so each cell is formatted
for i, n in enumerate(list1):
    sheet1.write(i, 0, n, style)
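
For comparison, here is a minimal Tablib sketch (it assumes the tablib package plus an xlsx backend such as openpyxl is installed; this is an illustration, not code from the original answer):

import tablib

data = tablib.Dataset(headers=['Stimulus Time', 'Reaction Time'])
for row in zip([1, 2, 3], [2.3, 5.1, 7.0]):
    data.append(row)

# export('xlsx') returns the workbook as bytes, so write in binary mode
with open('tablib_demo.xlsx', 'wb') as f:
    f.write(data.export('xlsx'))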

回答 3

我调查了一些用于Python的Excel模块,发现openpyxl是最好的。

免费书籍《用Python自动化无聊的东西》(Automate the Boring Stuff with Python)中有关于 openpyxl 的一章,其中有更多详细信息;您也可以查看 Read the Docs 网站。您无需安装 Office 或 Excel 即可使用 openpyxl。

您的程序如下所示:

import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

stimulusTimes = [1, 2, 3]
reactionTimes = [2.3, 5.1, 7.0]

for i in range(len(stimulusTimes)):
    sheet['A' + str(i + 6)].value = stimulusTimes[i]
    sheet['B' + str(i + 6)].value = reactionTimes[i]

wb.save('example.xlsx')

I surveyed a few Excel modules for Python, and found openpyxl to be the best.

The free book Automate the Boring Stuff with Python has a chapter on openpyxl with more details or you can check the Read the Docs site. You won’t need Office or Excel installed in order to use openpyxl.

Your program would look something like this:

import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')

stimulusTimes = [1, 2, 3]
reactionTimes = [2.3, 5.1, 7.0]

for i in range(len(stimulusTimes)):
    sheet['A' + str(i + 6)].value = stimulusTimes[i]
    sheet['B' + str(i + 6)].value = reactionTimes[i]

wb.save('example.xlsx')

回答 4

CSV代表逗号分隔的值。CSV就像一个文本文件,可以简单地通过添加.CSV扩展名来创建

例如,编写以下代码:

f = open('example.csv','w')
f.write("display,variable x")
f.close()

您可以使用excel打开此文件。

CSV stands for comma separated values. CSV is like a text file and can be created simply by adding the .CSV extension

for example write this code:

f = open('example.csv','w')
f.write("display,variable x")
f.close()

you can open this file with excel.
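
If you prefer to let Python handle the commas and quoting, the standard csv module can write the question's two lists as columns; a minimal sketch (the column names and data here are illustrative):

import csv

stimulus_times = [1, 2, 3]        # illustrative data, matching the question's two lists
reaction_times = [2.3, 5.1, 7.0]

# newline='' is the Python 3 idiom; on Python 2 open the file in 'wb' mode instead
with open('example.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Stimulus Time', 'Reaction Time'])
    writer.writerows(zip(stimulus_times, reaction_times))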


回答 5

import xlsxwriter


# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()
import xlsxwriter


# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()

回答 6

也尝试看看以下库:

xlwings - 用于在 Python 与电子表格之间读写数据,以及操作工作簿和图表

ExcelPython-一个Excel加载项,用于用Python代替VBA编写用户定义的函数(UDF)和宏

Try taking a look at the following libraries too:

xlwings – for getting data into and out of a spreadsheet from Python, as well as manipulating workbooks and charts

ExcelPython – an Excel add-in for writing user-defined functions (UDFs) and macros in Python instead of VBA
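
As a rough idea of the xlwings workflow, here is a minimal sketch; it assumes xlwings 0.9 or newer and a local Excel installation, and the data is invented for illustration:

import xlwings as xw

wb = xw.Book()                       # opens a new workbook in a running Excel instance
sht = wb.sheets[0]
sht.range('A1').value = ['Stimulus Time', 'Reaction Time']   # written as one row
sht.range('A2').value = [[1, 2.3], [2, 5.1], [3, 7.0]]       # a 2-D block starting at A2
wb.save('xlwings_demo.xlsx')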


回答 7

OpenPyxl 是一个很好的库,用于读取/写入Excel 2010 xlsx / xlsm文件:

https://openpyxl.readthedocs.io/en/stable

另一个答案引用的是已弃用的函数(get_sheet_by_name)。下面是不使用它的写法:

import openpyxl

wbkName = 'New.xlsx'        #The file should be created before running the code.
wbk = openpyxl.load_workbook(wbkName)
wks = wbk['test1']
someValue = 1337
wks.cell(row=10, column=1).value = someValue
wbk.save(wbkName)
wbk.close()

OpenPyxl is quite a nice library, built to read/write Excel 2010 xlsx/xlsm files:

https://openpyxl.readthedocs.io/en/stable

The other answer referring to it uses the deprecated function (get_sheet_by_name). This is how to do it without it:

import openpyxl

wbkName = 'New.xlsx'        #The file should be created before running the code.
wbk = openpyxl.load_workbook(wbkName)
wks = wbk['test1']
someValue = 1337
wks.cell(row=10, column=1).value = someValue
wbk.save(wbkName)
wbk.close()
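
Since the snippet assumes New.xlsx already contains a sheet named test1, a one-off script along these lines (a sketch, not part of the original answer) can create it first:

import openpyxl

wb = openpyxl.Workbook()
wb.active.title = 'test1'   # rename the default sheet so wbk['test1'] in the snippet above works
wb.save('New.xlsx')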

回答 8

xlsxwriter 库非常适合创建 .xlsx 文件。以下代码段从一个字典列表生成 .xlsx 文件,同时指定列的顺序和显示名称:

from xlsxwriter import Workbook


def create_xlsx_file(file_path: str, headers: dict, items: list):
    with Workbook(file_path) as workbook:
        worksheet = workbook.add_worksheet()
        worksheet.write_row(row=0, col=0, data=headers.values())
        header_keys = list(headers.keys())
        for index, item in enumerate(items):
            row = map(lambda field_id: item.get(field_id, ''), header_keys)
            worksheet.write_row(row=index + 1, col=0, data=row)


headers = {
    'id': 'User Id',
    'name': 'Full Name',
    'rating': 'Rating',
}

items = [
    {'id': 1, 'name': "Ilir Meta", 'rating': 0.06},
    {'id': 2, 'name': "Abdelmadjid Tebboune", 'rating': 4.0},
    {'id': 3, 'name': "Alexander Lukashenko", 'rating': 3.1},
    {'id': 4, 'name': "Miguel Díaz-Canel", 'rating': 0.32}
]

create_xlsx_file("my-xlsx-file.xlsx", headers, items)


💡 注 1 - 我有意没有针对题主给出的具体场景作答,而是给出了一个我认为大多数访客更需要的通用方案。这个问题的标题在搜索引擎中索引很好,带来了大量访问流量。

💡 注 2 - 如果您使用的不是 Python 3.7 或更新版本,请考虑在 headers 中使用 OrderedDict。在 Python 3.7 之前,dict 并不保证保留插入顺序(CPython 3.6 只是作为实现细节保留了顺序)。


The xlsxwriter library is great for creating .xlsx files. The following snippet generates an .xlsx file from a list of dicts while stating the order and the displayed names:

from xlsxwriter import Workbook


def create_xlsx_file(file_path: str, headers: dict, items: list):
    with Workbook(file_path) as workbook:
        worksheet = workbook.add_worksheet()
        worksheet.write_row(row=0, col=0, data=headers.values())
        header_keys = list(headers.keys())
        for index, item in enumerate(items):
            row = map(lambda field_id: item.get(field_id, ''), header_keys)
            worksheet.write_row(row=index + 1, col=0, data=row)


headers = {
    'id': 'User Id',
    'name': 'Full Name',
    'rating': 'Rating',
}

items = [
    {'id': 1, 'name': "Ilir Meta", 'rating': 0.06},
    {'id': 2, 'name': "Abdelmadjid Tebboune", 'rating': 4.0},
    {'id': 3, 'name': "Alexander Lukashenko", 'rating': 3.1},
    {'id': 4, 'name': "Miguel Díaz-Canel", 'rating': 0.32}
]

create_xlsx_file("my-xlsx-file.xlsx", headers, items)


💡 Note 1 – I’m purposely not answering the exact case the OP presented. Instead, I’m presenting a more generic solution that IMHO most visitors seek. This question’s title is well-indexed in search engines and attracts lots of traffic.

💡 Note 2 – If you’re not using Python 3.7 or newer, consider using OrderedDict in headers. Before Python 3.7, insertion order in dict was not guaranteed (CPython 3.6 preserved it only as an implementation detail).
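
On those older interpreters, the headers mapping above could be declared like this (a minimal sketch):

from collections import OrderedDict

# Same keys and display names as above, with insertion order guaranteed
headers = OrderedDict([
    ('id', 'User Id'),
    ('name', 'Full Name'),
    ('rating', 'Rating'),
])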



回答 9

导入精确数字的最简单方法是在 l1 和 l2 中的数字后面加上小数点。Python 会把这个小数点理解为您要保留精确数值的指示。如果需要把它限制到某个小数位,则可以写一个限制输出长度的打印命令,类似于下面这样:

print variable_example[:13]

假设您的数据在小数点左边有两位整数,上面的写法会把输出限制到小数点后第十位。

The easiest way to import the exact numbers is to add a decimal after the numbers in your l1 and l2. Python interprets this decimal point as instructions from you to include the exact number. If you need to restrict it to some decimal place, you should be able to create a print command that limits the output, something simple like:

print variable_example[:13]

Would restrict it to the tenth decimal place, assuming your data has two integers left of the decimal.
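
If what you actually want is to control the displayed precision of a number, string formatting is a more robust option than slicing; a minimal sketch (not from the original answer):

value = 12.3456789012345
print('{:.10f}'.format(value))   # prints 12.3456789012 -- ten digits after the decimal point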


回答 10

您可以试试 hfexcel,一个基于 XlsxWriter、对人友好(Human Friendly)的面向对象 Python 库:

from hfexcel import HFExcel

hf_workbook = HFExcel.hf_workbook('example.xlsx', set_default_styles=False)

hf_workbook.add_style(
    "headline", 
    {
       "bold": 1,
        "font_size": 14,
        "font": "Arial",
        "align": "center"
    }
)

sheet1 = hf_workbook.add_sheet("sheet1", name="Example Sheet 1")

column1, _ = sheet1.add_column('headline', name='Column 1', width=2)
column1.add_row(data='Column 1 Row 1')
column1.add_row(data='Column 1 Row 2')

column2, _ = sheet1.add_column(name='Column 2')
column2.add_row(data='Column 2 Row 1')
column2.add_row(data='Column 2 Row 2')


column3, _ = sheet1.add_column(name='Column 3')
column3.add_row(data='Column 3 Row 1')
column3.add_row(data='Column 3 Row 2')

# In order to get a row with coordinates:
# sheet[column_index][row_index] => row
print(sheet1[1][1].data)
assert(sheet1[1][1].data == 'Column 2 Row 2')

hf_workbook.save()

You can try hfexcel, a human-friendly object-oriented Python library based on XlsxWriter:

from hfexcel import HFExcel

hf_workbook = HFExcel.hf_workbook('example.xlsx', set_default_styles=False)

hf_workbook.add_style(
    "headline", 
    {
       "bold": 1,
        "font_size": 14,
        "font": "Arial",
        "align": "center"
    }
)

sheet1 = hf_workbook.add_sheet("sheet1", name="Example Sheet 1")

column1, _ = sheet1.add_column('headline', name='Column 1', width=2)
column1.add_row(data='Column 1 Row 1')
column1.add_row(data='Column 1 Row 2')

column2, _ = sheet1.add_column(name='Column 2')
column2.add_row(data='Column 2 Row 1')
column2.add_row(data='Column 2 Row 2')


column3, _ = sheet1.add_column(name='Column 3')
column3.add_row(data='Column 3 Row 1')
column3.add_row(data='Column 3 Row 2')

# In order to get a row with coordinates:
# sheet[column_index][row_index] => row
print(sheet1[1][1].data)
assert(sheet1[1][1].data == 'Column 2 Row 2')

hf_workbook.save()

回答 11

如果您需要修改现有的工作簿,最安全的方法是使用 pyoo。您需要先安装一些库,配置上也要费一点功夫,但一旦设置完成就会非常可靠,因为您利用的是 LibreOffice / OpenOffice 广泛而稳定的 API。

请参阅我的 Gist,了解如何配置 Linux 系统,以及如何用 pyoo 进行一些基本的编码。

这是代码示例:

#!/usr/local/bin/python3
import pyoo
# Connect to LibreOffice using a named pipe 
# (named in the soffice process startup)
desktop = pyoo.Desktop(pipe='oo_pyuno')
wkbk = desktop.open_spreadsheet("<xls_file_name>")
sheet = wkbk.sheets['Sheet1']
# Write value 'foo' to cell E5 on Sheet1
sheet[4,4].value='foo'
wkbk.save()
wkbk.close()

If your need is to modify an existing workbook, the safest way would be to use pyoo. You need to have some libraries installed and there are a few hoops to jump through, but once it’s set up this would be bulletproof, as you are leveraging the wide and solid APIs of LibreOffice / OpenOffice.

Please see my Gist on how to set up a linux system and do some basic coding using pyoo.

Here is an example of the code:

#!/usr/local/bin/python3
import pyoo
# Connect to LibreOffice using a named pipe 
# (named in the soffice process startup)
desktop = pyoo.Desktop(pipe='oo_pyuno')
wkbk = desktop.open_spreadsheet("<xls_file_name>")
sheet = wkbk.sheets['Sheet1']
# Write value 'foo' to cell E5 on Sheet1
sheet[4,4].value='foo'
wkbk.save()
wkbk.close()