如何使用Python将文本文件读取到列表或数组中

问题:如何使用Python将文本文件读取到列表或数组中

我正在尝试将文本文件的行读入python中的列表或数组中。创建后,我只需要能够单独访问列表或数组中的任何项目。

文本文件的格式如下:

0,0,200,0,53,1,0,255,...,0.

...以上,有实际的文本文件中有数百或数千多个项目。

我正在使用以下代码尝试将文件读入列表:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()

我得到的输出是:

['0,0,200,0,53,1,0,255,...,0.']
1

显然,它将整个文件读入一个项目列表,而不是单个项目列表。我究竟做错了什么?

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.

The text file is formatted as follows:

0,0,200,0,53,1,0,255,...,0.

Where the ... is above, there actual text file has hundreds or thousands more items.

I’m using the following code to try to read the file into a list:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()

The output I get is:

['0,0,200,0,53,1,0,255,...,0.']
1

Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?


回答 0

您将必须使用以下方法将字符串拆分为值列表 split()

所以,

lines = text_file.read().split(',')

You will have to split your string into a list of values using split()

So,

lines = text_file.read().split(',')

EDIT: I didn’t realise there would be so much traction to this. Here’s a more idiomatic approach.

import csv
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        # do something

回答 1

您也可以使用numpy loadtxt

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)

You can also use numpy loadtxt like

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)

回答 2

所以您想创建一个列表列表…我们需要从一个空列表开始

list_of_lists = []

接下来,我们逐行读取文件内容

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # in alternative, if you need to use the file content as numbers
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

一个常见的用例是列式数据,但我们的存储单位是文件的行,我们已逐一读取它,因此您可能需要转置 列表列表。这可以通过以下成语来完成

by_cols = zip(*list_of_lists)

另一个常见的用法是为每列命名

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

这样您就可以对同类数据项进行操作

 mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples_sold'])]

我编写的大多数内容都可以使用csv标准库中的模块来加速。另一个第三方模块是pandas,它使您可以自动化典型数据分析的大多数方面(但具有许多依赖性)。


更新虽然在Python 2中zip(*list_of_lists)返回了一个不同的列表(换位后的列表),但在Python 3中情况发生了变化,并zip(*list_of_lists)返回了一个不能下标的zip对象

如果您需要索引访问,则可以使用

by_cols = list(zip(*list_of_lists))

为您提供了两个Python版本中的列表列表。

另一方面,如果您不需要索引访问,而您想要的只是构建一个按列名称索引的字典,那么zip对象就可以了。

file = open('some_data.csv')
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file)))
d = {}
for name, column in zip(names, columns): d[name] = column

So you want to create a list of lists… We need to start with an empty list

list_of_lists = []

next, we read the file content, line by line

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # in alternative, if you need to use the file content as numbers
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

A common use case is that of columnar data, but our units of storage are the rows of the file, that we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom

by_cols = zip(*list_of_lists)

Another common use is to give a name to each column

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

so that you can operate on homogeneous data items

 mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples_sold'])]

Most of what I’ve written can be speeded up using the csv module, from the standard library. Another third party module is pandas, that lets you automate most aspects of a typical data analysis (but has a number of dependencies).


Update While in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.

If you need indexed access you can use

by_cols = list(zip(*list_of_lists))

that gives you a list of lists in both versions of Python.

On the other hand, if you don’t need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine…

file = open('some_data.csv')
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file)))
d = {}
for name, column in zip(names, columns): d[name] = column

回答 3

这个问题问如何将文件中的逗号分隔值内容读取到可迭代列表中:

0,0,200,0,53,1,0,255,...,0.

最简单的方法是使用以下csv模块:

import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')

现在,您可以spamreader像这样轻松地进行迭代:

for row in spamreader:
    print(', '.join(row))

有关更多示例,请参见文档

This question is asking how to read the comma-separated value contents from a file into an iterable list:

0,0,200,0,53,1,0,255,...,0.

The easiest way to do this is with the csv module as follows:

import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')

Now, you can easily iterate over spamreader like this:

for row in spamreader:
    print(', '.join(row))

See documentation for more examples.