Question: Extracting just the month and year separately from a pandas datetime column

I have a Dataframe, df, with the following column:

df['ArrivalDate'] =
...
936   2012-12-31
938   2012-12-29
965   2012-12-31
966   2012-12-31
967   2012-12-31
968   2012-12-31
969   2012-12-31
970   2012-12-29
971   2012-12-31
972   2012-12-29
973   2012-12-29
...

The elements of the column are pandas.tslib.Timestamp.

I want to just include the year and month. I thought there would be a simple way to do it, but I can’t figure it out.

Here’s what I’ve tried:

df['ArrivalDate'].resample('M', how = 'mean')

I got the following error:

Only valid with DatetimeIndex or PeriodIndex 

Then I tried:

df['ArrivalDate'].apply(lambda(x):x[:-2])

I got the following error:

'Timestamp' object has no attribute '__getitem__' 

Any suggestions?

Edit: I sort of figured it out.

df.index = df['ArrivalDate']

Then, I can resample another column using the index.

But I’d still like a method for reconfiguring the entire column. Any ideas?


Answer 0

If you want new columns showing year and month separately you can do this:

df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month

or…

df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

Then you can combine them or work with them just as they are.
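As a minimal sketch of the two variants above, assuming a small made-up DataFrame (the dates are borrowed from the question; the third date and the variable name year_from_index are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; the first two dates come from the question.
df = pd.DataFrame(
    {'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-02'])}
)

# Variant 1: wrap the column in a DatetimeIndex and read its attributes.
year_from_index = pd.DatetimeIndex(df['ArrivalDate']).year

# Variant 2: use the .dt accessor directly on the datetime column.
df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month

print(df)
```

Both variants should give the same integers; the .dt form keeps the result aligned with the DataFrame's own index.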


Answer 1

The best way I found:

df['date_column'] has to be in datetime format.

df['month_year'] = df['date_column'].dt.to_period('M')

You could also use D for day, 2M for 2 months, etc. for different sampling intervals. If you have time series data with timestamps, you can use more granular intervals such as 45Min for 45 minutes or 15Min for 15 minutes.
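A short sketch of what to_period('M') produces, assuming made-up sample dates (the column name month_year_str is hypothetical and only there to show a text rendering):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {'date_column': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-15'])}
)

# to_period('M') keeps year and month and drops the day;
# the resulting column has a monthly period dtype.
df['month_year'] = df['date_column'].dt.to_period('M')

# Periods can be rendered as text when a plain label is needed.
df['month_year_str'] = df['month_year'].astype(str)

print(df)
```

Unlike a formatted string, the period column still compares and sorts chronologically.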


Answer 2

You can directly access the year and month attributes, or request a datetime.datetime:

In [15]: t = pandas.Timestamp.now()

In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)

In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)

In [18]: t.day
Out[18]: 5

In [19]: t.month
Out[19]: 8

In [20]: t.year
Out[20]: 2014

One way to combine year and month is to make an integer encoding them, such as: 201408 for August, 2014. Along a whole column, you could do this as:

df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)

or many variants thereof.

I’m not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.

The calendar module is useful for obtaining the number value of certain days such as the final weekday. Then you could do something like:

import calendar
import datetime
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)
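For the simpler "first day of the month" convention mentioned above, a vectorized sketch is possible without calendar. This is an alternative under stated assumptions, not the answer's own method: the sample dates and the MonthStart column name are made up.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {'ArrivalDate': pd.to_datetime(['2012-12-31', '2012-12-29', '2013-01-15'])}
)

# Truncate each date to a monthly period, then convert back to a
# timestamp at the start of that month (the default for to_timestamp).
df['MonthStart'] = df['ArrivalDate'].dt.to_period('M').dt.to_timestamp()

print(df)
```

The column stays in a datetime dtype, so later date arithmetic and alignment keep working.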

If you happen to be looking for a way to solve the simpler problem of just formatting the datetime column into some string representation, you can use the strftime function from the datetime.datetime class, like this:

In [5]: df
Out[5]: 
            date_time
0 2014-10-17 22:00:03

In [6]: df.date_time
Out[6]: 
0   2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]

In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]: 
0    2014-10-17
Name: date_time, dtype: object

Answer 3

If you want a combined month-year value, using apply is pretty sleek.

df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y')) 

This outputs month-year in one column.

Don’t forget to convert the column to datetime first; I often forget.

df['date_column'] = pd.to_datetime(df['date_column'])

Answer 4

To extract the year from, say, ['2018-03-04']:

df['Year'] = pd.DatetimeIndex(df['date']).year  

Assigning to df['Year'] creates a new column. If you want to extract the month instead, just use .month


Answer 5

You can first convert your date strings with pandas.to_datetime, which gives you access to all of the numpy datetime and timedelta facilities. For example:

df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate'])
df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
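To make the effect of the astype('datetime64[M]') cast visible, here is a small sketch with made-up dates (the intermediate variable months is hypothetical). The cast truncates each value to month resolution at the NumPy level; once assigned back to the DataFrame, the values appear as the first day of each month.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, starting from strings as in the answer.
df = pd.DataFrame({'ArrivalDate': ['2012-12-31', '2012-12-29', '2013-01-15']})
df['ArrivalDate'] = pd.to_datetime(df['ArrivalDate'])

# Casting to month resolution truncates each value to its month.
months = df['ArrivalDate'].values.astype('datetime64[M]')
print(months)

# Stored in the frame, each value is the first day of its month.
df['Month'] = months
print(df)
```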

Answer 6

Thanks to jaknap32, I wanted to aggregate the results according to Year and Month, so this worked:

df_join['YearMonth'] = df_join['timestamp'].apply(lambda x:x.strftime('%Y%m'))

Output was neat:

0    201108
1    201108
2    201108
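The aggregation step itself is not shown in the answer. A minimal sketch of how it might look, assuming an invented amount column to aggregate and using the vectorized .dt.strftime in place of apply:

```python
import pandas as pd

# Hypothetical data: 'amount' is an invented column used only
# to have something to aggregate.
df_join = pd.DataFrame({
    'timestamp': pd.to_datetime(['2011-08-03', '2011-08-20', '2011-09-01']),
    'amount': [10, 20, 5],
})

# Build the year-month key as a zero-padded string such as '201108'.
df_join['YearMonth'] = df_join['timestamp'].dt.strftime('%Y%m')

# Sum the invented 'amount' column per year-month key.
totals = df_join.groupby('YearMonth')['amount'].sum()
print(totals)
```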

Answer 7

@KieranPC’s solution is the correct approach for Pandas, but it is not easily extensible to arbitrary attributes. For this, you can use getattr within a generator expression and combine the results using pd.concat:

import pandas as pd

# input data
list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})

# define list of attributes required
# (note: 'weekofyear' was removed from the .dt accessor in pandas 2.0; drop it from the list there)
L = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'weekofyear', 'quarter']

# define generator expression of series, one for each attribute
date_gen = (getattr(df['ArrivalDate'].dt, i).rename(i) for i in L)

# concatenate results and join to original dataframe
df = df.join(pd.concat(date_gen, axis=1))

print(df)

  ArrivalDate  year  month  day  dayofweek  dayofyear  weekofyear  quarter
0  2012-12-31  2012     12   31          0        366           1        4
1  2012-12-29  2012     12   29          5        364          52        4
2  2012-12-30  2012     12   30          6        365          52        4

Answer 8

df['year_month']=df.datetime_column.apply(lambda x: str(x)[:7])

This worked fine for me. I didn’t expect pandas to interpret the resulting date strings as dates, but when I plotted the data, the year_month strings were ordered properly. Gotta love pandas!
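A likely reason the ordering comes out right is that zero-padded 'YYYY-MM' strings sort lexicographically in the same order as the dates they came from, so no date parsing is needed for them to line up. A small sketch with made-up dates:

```python
import pandas as pd

# Hypothetical sample data, deliberately out of chronological order.
df = pd.DataFrame(
    {'datetime_column': pd.to_datetime(['2013-02-10', '2012-12-31', '2013-01-05'])}
)

# str(Timestamp) starts with 'YYYY-MM-DD', so the first 7 characters are 'YYYY-MM'.
df['year_month'] = df['datetime_column'].apply(lambda x: str(x)[:7])

# Plain string sorting already yields chronological order for this format.
print(sorted(df['year_month']))
```

This only holds for formats that put the year first and zero-pad the month; a format such as '%B-%Y' would sort alphabetically by month name instead.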


Answer 9

There are two steps to extract the year for the whole DataFrame without using apply.

Step 1

Convert the column to datetime:

df['ArrivalDate'] = pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')

Step 2

Extract the year or the month using DatetimeIndex():

pd.DatetimeIndex(df['ArrivalDate']).year

Answer 10

Single line: adding a column with 'year-month' pairs (pd.to_datetime first converts the column to datetime dtype before the operation):

df['yyyy-mm'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%Y-%m')


Accordingly, for an extra 'year' or 'month' column:

df['yyyy'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%Y')

df['mm'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%m')
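Note that strftime returns strings, whereas the .dt.year and .dt.month attributes return integers. A brief sketch of the difference, assuming made-up input strings (the dates variable and the year column are illustrative only):

```python
import pandas as pd

# Hypothetical sample data stored as strings.
df = pd.DataFrame({'ArrivalDate': ['2012-12-31', '2013-01-15']})
dates = pd.to_datetime(df['ArrivalDate'])

df['yyyy'] = dates.dt.strftime('%Y')   # strings such as '2012'
df['year'] = dates.dt.year             # integers such as 2012

print(df['yyyy'].tolist(), df['year'].tolist())
```

The string form is handy for labels and grouping keys; the integer form is the better fit for arithmetic or numeric comparisons.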

