问题:Python将csv导入列表
我有一个大约有2000条记录的CSV文件。
每个记录都有一个字符串和一个类别:
This is the first line,Line1
This is the second line,Line2
This is the third line,Line3
我需要将此文件读入如下列表:
data = [('This is the first line', 'Line1'),
('This is the second line', 'Line2'),
('This is the third line', 'Line3')]
如何使用Python将CSV导入到我需要的列表中?
I have a CSV file with about 2000 records.
Each record has a string, and a category to it:
This is the first line,Line1
This is the second line,Line2
This is the third line,Line3
I need to read this file into a list that looks like this:
data = [('This is the first line', 'Line1'),
('This is the second line', 'Line2'),
('This is the third line', 'Line3')]
How can import this CSV to the list I need using Python?
回答 0
使用csv模块:
import csv
with open('file.csv', newline='') as f:
reader = csv.reader(f)
data = list(reader)
print(data)
输出:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
如果您需要元组:
import csv
with open('file.csv', newline='') as f:
reader = csv.reader(f)
data = [tuple(row) for row in reader]
print(data)
输出:
[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]
旧的Python 2答案,也使用csv
模块:
import csv
with open('file.csv', 'rb') as f:
reader = csv.reader(f)
your_list = list(reader)
print your_list
# [['This is the first line', 'Line1'],
# ['This is the second line', 'Line2'],
# ['This is the third line', 'Line3']]
Using the csv module:
import csv
with open('file.csv', newline='') as f:
reader = csv.reader(f)
data = list(reader)
print(data)
Output:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
If you need tuples:
import csv
with open('file.csv', newline='') as f:
reader = csv.reader(f)
data = [tuple(row) for row in reader]
print(data)
Output:
[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]
Old Python 2 answer, also using the csv
module:
import csv
with open('file.csv', 'rb') as f:
reader = csv.reader(f)
your_list = list(reader)
print your_list
# [['This is the first line', 'Line1'],
# ['This is the second line', 'Line2'],
# ['This is the third line', 'Line3']]
回答 1
已针对Python 3更新:
import csv
with open('file.csv', newline='') as f:
reader = csv.reader(f)
your_list = list(reader)
print(your_list)
输出:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
Updated for Python 3:
import csv
with open('file.csv', newline='') as f:
reader = csv.reader(f)
your_list = list(reader)
print(your_list)
Output:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
回答 2
熊猫非常擅长处理数据。这是一个如何使用它的示例:
import pandas as pd
# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')
# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]
# or export it as a list of dicts
dicts = df.to_dict().values()
一大优势是,熊猫自动处理标题行。
如果您还没有听说过Seaborn,建议您看看。
另请参阅:如何使用Python读写CSV文件?
熊猫#2
import pandas as pd
# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()
# Convert
dicts = df.to_dict('records')
df的内容是:
country population population_time EUR
0 Germany 82521653.0 2016-12-01 True
1 France 66991000.0 2017-01-01 True
2 Indonesia 255461700.0 2017-01-01 False
3 Ireland 4761865.0 NaT True
4 Spain 46549045.0 2017-06-01 True
5 Vatican NaN NaT True
字典的内容是
[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
{'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
{'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
{'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
{'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
{'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]
熊猫#3
import pandas as pd
# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()
# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]
的内容lists
是:
[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
['Ireland', 4761865.0, NaT, True],
['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
['Vatican', nan, NaT, True]]
Pandas is pretty good at dealing with data. Here is one example how to use it:
import pandas as pd
# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')
# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]
# or export it as a list of dicts
dicts = df.to_dict().values()
One big advantage is that pandas deals automatically with header rows.
If you haven’t heard of Seaborn, I recommend having a look at it.
See also: How do I read and write CSV files with Python?
Pandas #2
import pandas as pd
# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()
# Convert
dicts = df.to_dict('records')
The content of df is:
country population population_time EUR
0 Germany 82521653.0 2016-12-01 True
1 France 66991000.0 2017-01-01 True
2 Indonesia 255461700.0 2017-01-01 False
3 Ireland 4761865.0 NaT True
4 Spain 46549045.0 2017-06-01 True
5 Vatican NaN NaT True
The content of dicts is
[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
{'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
{'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
{'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
{'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
{'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]
Pandas #3
import pandas as pd
# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()
# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]
The content of lists
is:
[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
['Ireland', 4761865.0, NaT, True],
['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
['Vatican', nan, NaT, True]]
回答 3
Python3更新:
import csv
from pprint import pprint
with open('text.csv', newline='') as file:
reader = csv.reader(file)
res = list(map(tuple, reader))
pprint(res)
输出:
[('This is the first line', ' Line1'),
('This is the second line', ' Line2'),
('This is the third line', ' Line3')]
如果csvfile是文件对象,则应使用打开newline=''
。
CSV模组
Update for Python3:
import csv
from pprint import pprint
with open('text.csv', newline='') as file:
reader = csv.reader(file)
res = list(map(tuple, reader))
pprint(res)
Output:
[('This is the first line', ' Line1'),
('This is the second line', ' Line2'),
('This is the third line', ' Line3')]
If csvfile is a file object, it should be opened with newline=''
.
csv module
回答 4
如果你相信有您的输入没有逗号,以外的其他类别分开,你可以逐行读取文件中的行和分裂上,
,然后推结果List
也就是说,您似乎正在查看CSV文件,因此您可以考虑为其使用模块
If you are sure there are no commas in your input, other than to separate the category, you can read the file line by line and split on ,
, then push the result to List
That said, it looks like you are looking at a CSV file, so you might consider using the modules for it
回答 5
result = []
for line in text.splitlines():
result.append(tuple(line.split(",")))
result = []
for line in text.splitlines():
result.append(tuple(line.split(",")))
回答 6
正如评论中已经说过的那样,您可以csv
在python中使用该库。csv的意思是逗号分隔的值,这似乎与您的情况完全相同:标签和由逗号分隔的值。
作为类别和值类型,我宁愿使用字典类型而不是元组列表。
无论如何,在下面的代码中,我都会同时显示两种方式:d
是字典,l
是元组列表。
import csv
file_name = "test.txt"
try:
csvfile = open(file_name, 'rt')
except:
print("File not found")
csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l = list()
for row in csvReader:
d[row[1]] = row[0]
l.append((row[0], row[1]))
print(d)
print(l)
As said already in the comments you can use the csv
library in python. csv means comma separated values which seems exactly your case: a label and a value separated by a comma.
Being a category and value type I would rather use a dictionary type instead of a list of tuples.
Anyway in the code below I show both ways: d
is the dictionary and l
is the list of tuples.
import csv
file_name = "test.txt"
try:
csvfile = open(file_name, 'rt')
except:
print("File not found")
csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l = list()
for row in csvReader:
d[row[1]] = row[0]
l.append((row[0], row[1]))
print(d)
print(l)
回答 7
一个简单的循环就足够了:
lines = []
with open('test.txt', 'r') as f:
for line in f.readlines():
l,name = line.strip().split(',')
lines.append((l,name))
print lines
A simple loop would suffice:
lines = []
with open('test.txt', 'r') as f:
for line in f.readlines():
l,name = line.strip().split(',')
lines.append((l,name))
print lines
回答 8
不幸的是,我发现没有一个现有的答案特别令人满意。
这是一个使用csv模块的简单,完整的Python 3解决方案。
import csv
with open('../resources/temp_in.csv', newline='') as f:
reader = csv.reader(f, skipinitialspace=True)
rows = list(reader)
print(rows)
注意skipinitialspace=True
参数。这是必要的,因为不幸的是,OP的CSV在每个逗号后都包含空格。
输出:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
Unfortunately I find none of the existing answers particularly satisfying.
Here is a straightforward and complete Python 3 solution, using the csv module.
import csv
with open('../resources/temp_in.csv', newline='') as f:
reader = csv.reader(f, skipinitialspace=True)
rows = list(reader)
print(rows)
Notice the skipinitialspace=True
argument. This is necessary since, unfortunately, OP’s CSV contains whitespace after each comma.
Output:
[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]
回答 9
稍微扩展您的需求,并假设您不关心行的顺序,并希望将它们分组在类别下,则以下解决方案可能适用于您:
>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
... for line in f:
... text, cat = line.rstrip("\n").split(",", 1)
... dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})
这样,您可以在字典中键为类别下获得所有可用的相关行。
Extending your requirements a bit and assuming you do not care about the order of lines and want to get them grouped under categories, the following solution may work for you:
>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
... for line in f:
... text, cat = line.rstrip("\n").split(",", 1)
... dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})
This way you get all relevant lines available in the dictionary under key being the category.
回答 10
这是Python 3.x中最简单的将CSV导入多维数组的方法,它仅4行代码而无需导入任何内容!
#pull a CSV into a multidimensional array in 4 lines!
L=[] #Create an empty list for the main array
for line in open('log.txt'): #Open the file and read all the lines
x=line.rstrip() #Strip the \n from each line
L.append(x.split(',')) #Split each line into a list and add it to the
#Multidimensional array
print(L)
Here is the easiest way in Python 3.x to import a CSV to a multidimensional array, and its only 4 lines of code without importing anything!
#pull a CSV into a multidimensional array in 4 lines!
L=[] #Create an empty list for the main array
for line in open('log.txt'): #Open the file and read all the lines
x=line.rstrip() #Strip the \n from each line
L.append(x.split(',')) #Split each line into a list and add it to the
#Multidimensional array
print(L)
回答 11
接下来是一段代码,该代码使用csv模块,但使用第一行(即csv表的标头)将file.csv内容提取到字典列表中
import csv
def csv2dicts(filename):
with open(filename, 'rb') as f:
reader = csv.reader(f)
lines = list(reader)
if len(lines) < 2: return None
names = lines[0]
if len(names) < 1: return None
dicts = []
for values in lines[1:]:
if len(values) != len(names): return None
d = {}
for i,_ in enumerate(names):
d[names[i]] = values[i]
dicts.append(d)
return dicts
return None
if __name__ == '__main__':
your_list = csv2dicts('file.csv')
print your_list
Next is a piece of code which uses csv module but extracts file.csv contents to a list of dicts using the first line which is a header of csv table
import csv
def csv2dicts(filename):
with open(filename, 'rb') as f:
reader = csv.reader(f)
lines = list(reader)
if len(lines) < 2: return None
names = lines[0]
if len(names) < 1: return None
dicts = []
for values in lines[1:]:
if len(values) != len(names): return None
d = {}
for i,_ in enumerate(names):
d[names[i]] = values[i]
dicts.append(d)
return dicts
return None
if __name__ == '__main__':
your_list = csv2dicts('file.csv')
print your_list