Question: Load data from a txt file with pandas
I am loading a txt file containing a mix of float and string data. I want to store the values in an array where I can access each element. Right now I am just doing
import pandas as pd
data = pd.read_csv('output_list.txt', header=None)
print(data)
This is the structure of the input file: 1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt
Currently each line is imported as a single column. How can I split it so that the elements are stored separately (so I can call data[i,j])? And how can I define a header?
Answer 0
You can use:
data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]
Add sep=" " to your call, leaving a single space between the quotes, so that pandas splits each line into columns at the spaces. The data.columns assignment names the columns; the list needs exactly one name per column (six for the sample line above), so "etc." stands for the remaining names.
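A minimal sketch of this approach, reading from an in-memory copy of the sample line instead of output_list.txt. The second row and the column names are invented for illustration. It also shows .iloc[i, j], which is how pandas does the data[i,j] style of access the question asks about.

```python
import io
import pandas as pd

# Stand-in for output_list.txt: the first line comes from the question,
# the second line is made up for illustration.
sample = io.StringIO(
    "1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt\n"
    "2 1 1500.0 12.5 300.25 /other_file.txt\n"
)

# Hypothetical column names; there must be exactly one per column.
columns = ["id", "flag", "value", "x", "y", "path"]

data = pd.read_csv(sample, sep=" ", header=None, names=columns)

# Positional access (row i, column j) goes through .iloc
first_value = data.iloc[0, 2]
# Label-based access (row label, column name) goes through .loc
first_path = data.loc[0, "path"]

print(data.dtypes)
print(first_value, first_path)
```

Numeric columns are parsed as numbers and the path column is kept as text, so each element can be read back with its own type.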
Answer 1
To add to the answers above, you could directly use
df = pd.read_fwf('output_list.txt')
fwf stands for fixed-width formatted lines. By default the first line is treated as the header, so pass header=None if the file has none.
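A small sketch of read_fwf on in-memory data, assuming the columns really are aligned at fixed positions. The second row is invented and padded so that it lines up with the sample line from the question.

```python
import io
import pandas as pd

# Two rows whose fields start at the same character positions.
# The second row is made up for illustration.
sample = io.StringIO(
    "1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt\n"
    "2 1 1500.0 12.5000000000 300.250000000 /other_file.txt\n"
)

# header=None because this sample has no header row;
# column boundaries are inferred from the whitespace gaps.
df = pd.read_fwf(sample, header=None)

print(df.shape)
print(df.iloc[1, 3])
```

If the fields are not aligned from row to row, the inferred boundaries may be wrong, and a whitespace-delimited read_csv is likely the safer choice.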
Answer 2
@Pietrovismara’s solution is correct but I’d just like to add: rather than having a separate line to add column names, it’s possible to do this from pd.read_csv.
df = pd.read_csv('output_list.txt', sep=" ", header=None, names=["a", "b", "c"])
Answer 3
You can use this, which splits on tabs:
import pandas as pd
dataset = pd.read_csv("filepath.txt", delimiter="\t")
Answer 4
If your data has no index column and you are not sure what the spacing is, you can use the following to let pandas assign a default index and treat any run of whitespace as the delimiter.
df = pd.read_csv('filename.txt', delimiter=r'\s+', index_col=False)
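To illustrate, here is a sketch with deliberately irregular spacing (mixed tabs and runs of spaces) read from memory. The data rows beyond the question's sample line are invented, and header=None is an assumption that the file has no header row.

```python
import io
import pandas as pd

# Fields separated by an inconsistent mix of spaces and tabs.
# The second row is made up for illustration.
sample = io.StringIO(
    "1   0\t2000.0  70.2836942112 1347.28369421\t/file_address.txt\n"
    "2 1 1500.0    12.5   300.25  /other_file.txt\n"
)

# r"\s+" is a raw string holding a regular expression:
# one or more whitespace characters count as a single delimiter.
df = pd.read_csv(sample, sep=r"\s+", header=None)

print(df.shape)
print(df.iloc[0, 3])
```

The raw-string prefix keeps Python from trying to interpret the backslash before the pattern reaches pandas.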
Answer 5
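You can do this:
import pandas as pd
df = pd.read_csv(r'file_location\filename.txt', delimiter="\t")
(for example, df = pd.read_csv(r'F:\Desktop\ds\text.txt', delimiter="\t"))
The r prefix makes these raw strings, so backslash sequences in the Windows path such as \f and \t are not interpreted as escape characters.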
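The escape-character pitfall with Windows paths can be seen without touching the file system. This sketch only builds path strings. The directory and file names are the placeholders from the answer, and using pathlib is a suggestion that the original answer does not make.

```python
from pathlib import PureWindowsPath

# Without the r prefix, "\f" is parsed as a form-feed character,
# so the string no longer contains the intended path.
broken = 'file_location\filename.txt'

# A raw string keeps every backslash literally.
raw = r'file_location\filename.txt'

# Building the path from parts avoids typing backslashes at all.
joined = PureWindowsPath('file_location') / 'filename.txt'

print(repr(broken))
print(repr(raw))
print(str(joined))
```

Forward slashes are another option, since Python accepts them in Windows paths as well.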
Answer 6
Based on recent changes in pandas, you can use read_csv instead of read_table, which was deprecated in pandas 0.24 (the deprecation was reversed in 0.25):
import pandas as pd
pd.read_csv("file.txt", sep="\t")
Answer 7
You can import the text file using read_table like so:
import pandas as pd
df = pd.read_table('output_list.txt', header=None)
Because read_table splits on tabs by default, a space-separated file loads as a single column, so some preprocessing is needed after loading.
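One possible form of that preprocessing, sketched on in-memory data: split the single column on whitespace, then convert the numeric columns. The second row and the choice of types per column are assumptions based on the sample line in the question.

```python
import io
import pandas as pd

# Stand-in for output_list.txt; the second line is made up.
sample = io.StringIO(
    "1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt\n"
    "2 1 1500.0 12.5 300.25 /other_file.txt\n"
)

# No tabs in the data, so every line lands in one column (label 0).
raw = pd.read_table(sample, header=None)

# Split each line on whitespace into separate columns 0..5.
parts = raw[0].str.split(expand=True)

# After splitting, every value is text; convert the numeric columns.
parts = parts.astype({0: int, 1: int, 2: float, 3: float, 4: float})

print(raw.shape, parts.shape)
print(parts.iloc[0, 2])
```

Passing the right separator to the reader in the first place avoids this extra step.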
Answer 8
I usually look at the data first, or just import it and call data.head(). If the columns are separated by \t, specify sep="\t"; otherwise, use sep=" ".
import pandas as pd
data = pd.read_csv('data.txt', sep=" ", header=None)
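The "look first, then choose the separator" step could be sketched as below. guess_sep is a hypothetical helper, not a pandas function, and the text is an in-memory stand-in for the first line of data.txt.

```python
import io
import pandas as pd

def guess_sep(first_line):
    """Hypothetical helper: pick tab if the line contains one, else a space."""
    return "\t" if "\t" in first_line else " "

# In-memory stand-in for the file contents (sample line from the question).
text = "1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt\n"

# Inspect the first line before deciding how to parse.
first_line = text.splitlines()[0]
sep = guess_sep(first_line)

data = pd.read_csv(io.StringIO(text), sep=sep, header=None)

print(repr(sep), data.shape)
```

This check is deliberately naive. Files with mixed or repeated whitespace are probably better handled with sep=r"\s+".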