Question: Combining date and time columns using Python pandas
I have a pandas dataframe with the following columns:
Date Time
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
How do I combine data['Date'] and data['Time'] to get the following? Is there a way of doing it using pd.to_datetime?
Date
01-06-2013 23:00:00
02-06-2013 01:00:00
02-06-2013 21:00:00
02-06-2013 22:00:00
02-06-2013 23:00:00
03-06-2013 01:00:00
03-06-2013 21:00:00
03-06-2013 22:00:00
03-06-2013 23:00:00
04-06-2013 01:00:00
Answer 0
It's worth mentioning that you may have been able to read this in directly, e.g. if you were using read_csv, by passing parse_dates=[['Date', 'Time']].
Assuming these are just strings, you could simply add them together (with a space), allowing you to apply to_datetime:
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Note: surprisingly (for me), this works fine with NaNs being converted to NaT, but it is worth checking the conversion (perhaps using the errors argument of to_datetime, e.g. errors='raise').
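One thing to watch in the output above: the strings are read month-first, so 01-06-2013 becomes 2013-01-06. The consecutive dates in the question suggest they are really day-first (1 June, 2 June, ...), although the question never says so. A minimal sketch under that assumption, passing an explicit format and using errors='coerce' so that unparseable rows become NaT instead of raising (the third row is a deliberately invalid date added for illustration):

```python
import pandas as pd

# First two rows come from the question; the third is an invented invalid date.
df = pd.DataFrame({
    "Date": ["01-06-2013", "02-06-2013", "31-02-2013"],
    "Time": ["23:00:00", "01:00:00", "21:00:00"],
})

# Plain string concatenation, exactly as in the answer above.
combined = df["Date"] + " " + df["Time"]

# Assumption: the dates are day-first, so the format is spelled out explicitly.
# errors="coerce" turns rows that do not match into NaT rather than raising.
parsed = pd.to_datetime(combined, format="%d-%m-%Y %H:%M:%S", errors="coerce")

print(parsed)
```

Counting the NaT values afterwards (parsed.isna().sum()) is a cheap way to see how many rows failed to convert.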
Answer 1
The accepted answer works for columns that are of datatype string. For completeness: I came across this question when searching for how to do this when the columns are of datatypes date and time.
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']),1)
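As far as I know, the pd.datetime alias was deprecated in pandas 1.0 and removed in 2.0, so on newer versions the line above fails with an AttributeError. A sketch of the same idea using the standard library's datetime.combine directly; the column names are the placeholders from the line above and the sample values are invented:

```python
from datetime import date, datetime, time

import pandas as pd

# Invented sample rows holding datetime.date and datetime.time objects.
df = pd.DataFrame({
    "date_column_name": [date(2013, 6, 1), date(2013, 6, 2)],
    "time_column_name": [time(23, 0), time(1, 0)],
})

# Row-wise combine; axis=1 passes each row to the lambda.
combined = df.apply(
    lambda r: datetime.combine(r["date_column_name"], r["time_column_name"]),
    axis=1,
)

print(combined)
```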
Answer 2
You can use this to merge the date and time into a single column of the dataframe.
import pandas as pd
data_file = 'data.csv' #path of your file
Reading the .csv file with the merged column Date_Time:
data = pd.read_csv(data_file, parse_dates=[['Date', 'Time']])
You can use this line to keep the other two columns as well.
data.set_index(['Date', 'Time'], drop=False)
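I believe recent pandas releases deprecate the nested-list form of parse_dates, so here is a sketch of an alternative that does not rely on it: read both columns as plain strings, then build Date_Time afterwards. This also leaves the original Date and Time columns in place. The inline CSV text, the Value column, and the day-first format are assumptions made for illustration:

```python
import io

import pandas as pd

# Hypothetical file contents standing in for data.csv.
csv_text = (
    "Date,Time,Value\n"
    "01-06-2013,23:00:00,1\n"
    "02-06-2013,01:00:00,2\n"
)

# Keep Date and Time as strings so nothing is parsed implicitly.
data = pd.read_csv(io.StringIO(csv_text), dtype={"Date": str, "Time": str})

# Build the merged column; the source columns are left untouched.
data["Date_Time"] = pd.to_datetime(
    data["Date"] + " " + data["Time"],
    format="%d-%m-%Y %H:%M:%S",  # assumes day-first dates
)

print(data)
```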
Answer 3
You can cast the columns if the types are different (datetime and timestamp or str) and use to_datetime:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Answer 4
I don't have enough reputation to comment on jka.ne's answer, so:
I had to amend jka.ne's line for it to work:
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']).time(),1)
This might help others.
Also, I have tested a different approach, using replace instead of combine:
def combine_date_time(df, datecol, timecol):
    return df.apply(lambda row: row[datecol].replace(
        hour=row[timecol].hour,
        minute=row[timecol].minute),
        axis=1)
which in the OP’s case would be:
combine_date_time(df, 'Date', 'Time')
I have timed both approaches for a relatively large dataset (>500,000 rows), and they have similar runtimes, but using combine is faster (59s for replace vs 50s for combine).
Answer 5
The answer really depends on what your column types are. In my case, I had datetime and timedelta.
> df[['Date','Time']].dtypes
Date datetime64[ns]
Time timedelta64[ns]
If this is your case, then you just need to add the columns:
> df['Date'] + df['Time']
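A small self-contained sketch of this case, with invented sample values, showing that adding a datetime64 column and a timedelta64 column is a plain vectorized operation with no row-wise apply:

```python
import pandas as pd

# Invented sample data already in the dtypes described above.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-01", "2013-06-02"]),
    "Time": pd.to_timedelta(["23:00:00", "01:00:00"]),
})

print(df.dtypes)

# datetime64 + timedelta64 yields datetime64.
combined = df["Date"] + df["Time"]

print(combined)
```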
Answer 6
You can also convert to datetime without string concatenation, by combining datetime and timedelta objects. Combined with pd.DataFrame.pop, you can remove the source series simultaneously:
df['DateTime'] = pd.to_datetime(df.pop('Date')) + pd.to_timedelta(df.pop('Time'))
print(df)
DateTime
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
print(df.dtypes)
DateTime datetime64[ns]
dtype: object
Answer 7
First make sure you have the right data types:
df["Date"] = pd.to_datetime(df["Date"])
df["Time"] = pd.to_timedelta(df["Time"])
Then you can easily combine them:
df["DateTime"] = df["Date"] + df["Time"]
Answer 8
Use the combine function:
datetime.datetime.combine(date, time)
Answer 9
My dataset had 1-second resolution data for a few days, and parsing with the methods suggested here was very slow. Instead I used:
dates = pandas.to_datetime(df.Date, cache=True)
times = pandas.to_timedelta(df.Time)
datetimes = dates + times
Note that using cache=True makes parsing the dates very efficient, since there are only a couple of unique dates in my files, which is not true for a combined date and time column.
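A runnable sketch of the same idea on a tiny invented frame with many timestamps but only two distinct dates. The ISO date format is an assumption, and the speed benefit would only show on much larger data. As far as I know, newer pandas versions already default to cache=True, so passing it mainly documents the intent:

```python
import pandas as pd

# Invented data: two unique dates, one-second steps within each day.
df = pd.DataFrame({
    "Date": ["2013-06-01"] * 3 + ["2013-06-02"] * 3,
    "Time": ["00:00:00", "00:00:01", "00:00:02"] * 2,
})

# Each unique date string only needs to be parsed once when the cache is used.
dates = pd.to_datetime(df["Date"], format="%Y-%m-%d", cache=True)

# Times are parsed as offsets from midnight.
times = pd.to_timedelta(df["Time"])

# Vectorized sum gives the full timestamps.
datetimes = dates + times

print(datetimes)
```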
Answer 10
DATA:
<TICKER>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>
SPFB.RTS,1,20190103,100100,106580.0000000,107260.0000000,106570.0000000,107230.0000000,3726
CODE:
data.columns = ['ticker', 'per', 'date', 'time', 'open', 'high', 'low', 'close', 'vol']
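# Assign with bracket syntax: attribute-style assignment (data.datetime = ...) does not create a new column.
data['datetime'] = pd.to_datetime(data.date.astype(str) + ' ' + data.time.astype(str), format='%Y%m%d %H%M%S')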
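One possible pitfall with this layout: if the time column is read as an integer, times before 10:00:00 lose their leading zero (for example 95900), and shorter strings could fail to parse or be misread under '%H%M%S'. Whether the real feed contains such rows is an assumption. A sketch that pads the time to six digits first, using an invented second row:

```python
import pandas as pd

# First row matches the sample above; the second (09:59:00) is invented.
data = pd.DataFrame({
    "date": [20190103, 20190103],
    "time": [100100, 95900],
})

# Left-pad to six digits so that 95900 becomes "095900".
time_str = data["time"].astype(str).str.zfill(6)

data["datetime"] = pd.to_datetime(
    data["date"].astype(str) + " " + time_str,
    format="%Y%m%d %H%M%S",
)

print(data)
```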