Question: Sort a pandas DataFrame by date

I have a pandas dataframe as follows:

Symbol  Date
A       02/20/2015
A       01/15/2016
A       08/21/2015

I want to sort it by Date, but the column's dtype is just object.

I tried to make the column a date object, but I ran into an issue: the dates are not in the format I need, which is ISO style (2015-02-20, etc.).

So now I’m trying to figure out how to have numpy convert the ‘American’ dates into the ISO standard, so that I can make them date objects and then sort by them.

How would I convert these American dates into the ISO standard, or is there a more straightforward method I’m missing within pandas?


Answer 0

You can use pd.to_datetime() to convert to a datetime object. It takes a format parameter, but in your case I don’t think you need it.

>>> import pandas as pd
>>> df = pd.DataFrame( {'Symbol':['A','A','A'] ,
    'Date':['02/20/2015','01/15/2016','08/21/2015']})
>>> df
         Date Symbol
0  02/20/2015      A
1  01/15/2016      A
2  08/21/2015      A
>>> df['Date'] = pd.to_datetime(df.Date)
>>> df.sort('Date') # Sorts in date order (old pandas only: DataFrame.sort was deprecated in 0.17 and removed in later releases)
        Date Symbol
0 2015-02-20      A
2 2015-08-21      A
1 2016-01-15      A

For anyone finding this later: in current pandas versions, use sort_values instead:

>>> df.sort_values(by='Date') # This now sorts in date order
        Date Symbol
0 2015-02-20      A
2 2015-08-21      A
1 2016-01-15      A
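
As a minimal sketch of the same approach on a current pandas version: the explicit format string below is an assumption that every value is month/day/year, and the variable names sorted_df and iso_strings are only illustrative. The last step shows one way to get ISO-style strings back, in case text output is needed.

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A", "A", "A"],
    "Date": ["02/20/2015", "01/15/2016", "08/21/2015"],
})

# Parse the month/day/year strings explicitly (assumed format).
# Values that do not match the format raise an error instead of being guessed.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# sort_values returns a new, sorted DataFrame and keeps the original index labels.
sorted_df = df.sort_values(by="Date")
print(sorted_df)

# Optional: render the datetimes as ISO 8601 strings (e.g. for export).
iso_strings = sorted_df["Date"].dt.strftime("%Y-%m-%d")
print(iso_strings.tolist())
```

Passing format is optional here, but it avoids any ambiguity between month-first and day-first parsing.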

Answer 1

The sort method has been deprecated and replaced with sort_values. After converting the column to datetime objects with df['Date']=pd.to_datetime(df['Date']), sort like this:

df.sort_values(by=['Date'])

Note: to sort in-place and/or in a descending order (the most recent first):

df.sort_values(by=['Date'], inplace=True, ascending=False)
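
A small sketch of the descending sort without inplace=True, in case you prefer to keep the original frame around. The name newest_first is only illustrative, and resetting the index is an optional choice, not something the answer requires.

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A", "A", "A"],
    "Date": pd.to_datetime(
        ["02/20/2015", "01/15/2016", "08/21/2015"], format="%m/%d/%Y"
    ),
})

# Assigning the result leaves df itself in its original order.
# reset_index(drop=True) renumbers the rows 0..n-1 after sorting.
newest_first = (
    df.sort_values(by="Date", ascending=False)
      .reset_index(drop=True)
)
print(newest_first)
```

Note that with inplace=True, sort_values returns None, so its result should not be assigned to a variable.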

Answer 2

@JAB’s answer is fast and concise. But it changes the DataFrame you are trying to sort, which you may or may not want.

(Note: You almost certainly will want it, because your date columns should be dates, not strings!)

In the unlikely event that you don’t want to change the dates into dates, you can also do it a different way.

First, get the index from your sorted Date column:

In [25]: pd.to_datetime(df.Date).order().index
Out[25]: Int64Index([0, 2, 1], dtype='int64')

Then use it to index your original DataFrame, leaving it untouched:

In [26]: df.ix[pd.to_datetime(df.Date).order().index]
Out[26]: 
        Date Symbol
0 2015-02-20      A
2 2015-08-21      A
1 2016-01-15      A

Magic!

Note: for pandas versions 0.20.0 and later, use loc instead of ix, which is now deprecated. Likewise, Series.order() has been replaced by Series.sort_values().
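
For reference, a sketch of the same idea on a current pandas version, where order() and ix are no longer available. The names order, sorted_view and sorted_with_key are illustrative, the format string assumes month/day/year input, and the key argument of sort_values needs pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["A", "A", "A"],
    "Date": ["02/20/2015", "01/15/2016", "08/21/2015"],
})

# Sort a temporary datetime copy of the column and keep only its index labels.
order = pd.to_datetime(df["Date"], format="%m/%d/%Y").sort_values().index

# Reorder the original frame by label; the Date column stays as strings.
sorted_view = df.loc[order]
print(sorted_view)

# Alternative (pandas >= 1.1): let sort_values convert the column only for comparison.
sorted_with_key = df.sort_values(
    by="Date",
    key=lambda s: pd.to_datetime(s, format="%m/%d/%Y"),
)
print(sorted_with_key)
```

Both variants return a new DataFrame and leave df and its string dates unchanged.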


Answer 3

If the data comes from a CSV file, the date column can be parsed while reading it:

data = pd.read_csv(file_path, parse_dates=[date_column])

If the column still needs converting after the file is read, use pd.to_datetime() with an explicit format:

data[date_column] = pd.to_datetime(data[date_column], format='%d/%m/%y')

Change the format string to match your data (for the dates in the question it would be '%m/%d/%Y').
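
A self-contained sketch of the read-then-sort flow. The in-memory CSV text stands in for a real file and is purely an assumption for illustration; with a file on disk you would pass its path to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Symbol,Date\nA,02/20/2015\nA,01/15/2016\nA,08/21/2015\n"

# parse_dates asks read_csv to convert the named column to datetimes while loading.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Once the column holds datetimes, sorting is chronological rather than alphabetical.
data = data.sort_values(by="Date")
print(data)
```

If the column still comes back as plain strings for your data, converting it afterwards with pd.to_datetime and an explicit format is the fallback.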

