Question: Python strptime() and timezones?

I have a CSV dump file from a Blackberry IPD backup, created using IPDDump. The date/time strings in here look something like this (where EST is an Australian time zone):

Tue Jun 22 07:46:22 EST 2010

I need to be able to parse this date in Python. At first, I tried to use the strptime() function from datetime.

>>> datetime.datetime.strptime('Tue Jun 22 12:10:20 2010 EST', '%a %b %d %H:%M:%S %Y %Z')

However, for some reason, the datetime object that comes back doesn’t seem to have any tzinfo associated with it.

I did read on this page that datetime.strptime apparently discards tzinfo silently; however, I checked the documentation and can’t find anything to that effect documented here.

I have been able to get the date parsed using a third-party Python library, dateutil, but I’m still curious: how was I using the built-in strptime() incorrectly? Is there any way to get strptime() to play nicely with timezones?


Answer 0

The datetime module documentation says:

Return a datetime corresponding to date_string, parsed according to format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])).

See that [0:6]? That gets you (year, month, day, hour, minute, second). Nothing else. No mention of timezones.

Interestingly, [Win XP SP2, Python 2.6, 2.7] passing your example to time.strptime doesn’t work, but if you strip off the " %Z" and the " EST" it does work. Using "UTC" or "GMT" instead of "EST" also works. "PST" and "MEZ" don’t work. Puzzling.
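
The "puzzling" behaviour has a documented cause: the time module documentation says %Z support is based on time.tzname and time.daylight, so it is platform-specific except that UTC and GMT are always recognized. A minimal Python 3 sketch of that (zone_accepted is a hypothetical helper written for illustration, not a library function):

```python
import time
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y %Z"


def zone_accepted(abbrev):
    """Hypothetical helper: True if %Z accepts `abbrev` on this machine."""
    try:
        time.strptime(f"Tue Jun 22 12:10:20 2010 {abbrev}", FMT)
        return True
    except ValueError:
        return False


# "UTC" and "GMT" are always recognized by %Z, on every platform.
print(zone_accepted("UTC"), zone_accepted("GMT"))  # True True

# Any other abbreviation matches only if it appears in time.tzname, so the
# results for "EST", "PST" or "MEZ" depend on the local time zone settings.
print(time.tzname, zone_accepted("EST"), zone_accepted("PST"), zone_accepted("MEZ"))

# Even when %Z matches, datetime.strptime still returns a naive datetime.
dt = datetime.strptime("Tue Jun 22 12:10:20 2010 UTC", FMT)
print(dt, dt.tzinfo)  # 2010-06-22 12:10:20 None
```

This is why the same call can succeed on one machine and raise ValueError on another.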

It’s worth noting this has been updated as of version 3.2 and the same documentation now also states the following:

When the %z directive is provided to the strptime() method, an aware datetime object will be produced. The tzinfo of the result will be set to a timezone instance.

Note that this doesn’t work with %Z, so the case is important. See the following example:

In [1]: from datetime import datetime

In [2]: start_time = datetime.strptime('2018-04-18-17-04-30-AEST','%Y-%m-%d-%H-%M-%S-%Z')

In [3]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: None

In [4]: start_time = datetime.strptime('2018-04-18-17-04-30-+1000','%Y-%m-%d-%H-%M-%S-%z')

In [5]: print("TZ NAME: {tz}".format(tz=start_time.tzname()))
TZ NAME: UTC+10:00
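
Because %z is the directive that yields an aware result, one stdlib workaround for the original string is to translate the abbreviation into a numeric offset yourself and then parse with %z. A sketch, assuming EST in this data means Australian Eastern Standard Time (UTC+10) as stated in the question; the ABBREV_OFFSETS table and parse_with_abbrev helper are hypothetical, and you have to maintain the table yourself since abbreviations are ambiguous:

```python
from datetime import datetime, timezone

# Hypothetical lookup table. Abbreviations are ambiguous, so you decide what
# they mean; here "EST" is taken as Australian Eastern Standard Time (UTC+10).
ABBREV_OFFSETS = {"EST": "+1000", "AEST": "+1000", "UTC": "+0000"}


def parse_with_abbrev(s):
    """Hypothetical helper for strings like 'Tue Jun 22 07:46:22 EST 2010'.

    Raises KeyError for abbreviations missing from ABBREV_OFFSETS and
    ValueError if the string does not have exactly six fields.
    """
    wday, mon, day, hms, abbrev, year = s.split()
    offset = ABBREV_OFFSETS[abbrev]
    # Swap the abbreviation for a numeric offset so %z can build the tzinfo.
    return datetime.strptime(f"{wday} {mon} {day} {hms} {offset} {year}",
                             "%a %b %d %H:%M:%S %z %Y")


dt = parse_with_abbrev("Tue Jun 22 07:46:22 EST 2010")
print(dt)                           # 2010-06-22 07:46:22+10:00
print(dt.tzname())                  # UTC+10:00
print(dt.astimezone(timezone.utc))  # 2010-06-21 21:46:22+00:00
```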

Answer 1

I recommend using python-dateutil. Its parser has been able to parse every date format I’ve thrown at it so far.

>>> from dateutil import parser
>>> parser.parse("Tue Jun 22 07:46:22 EST 2010")
datetime.datetime(2010, 6, 22, 7, 46, 22, tzinfo=tzlocal())
>>> parser.parse("Fri, 11 Nov 2011 03:18:09 -0400")
datetime.datetime(2011, 11, 11, 3, 18, 9, tzinfo=tzoffset(None, -14400))
>>> parser.parse("Sun")
datetime.datetime(2011, 12, 18, 0, 0)
>>> parser.parse("10-11-08")
datetime.datetime(2008, 10, 11, 0, 0)

and so on. No dealing with strptime() format nonsense… just throw a date at it and it Does The Right Thing.

Update: Oops, I missed that your original question mentions you already used dateutil; sorry about that. Still, I hope this answer is useful to others who stumble across this question with date-parsing problems and see the utility of that module.


Answer 2

Your time string is similar to the time format in RFC 2822 (the date format used in email and HTTP headers). You can parse it using only the stdlib:

>>> from email.utils import parsedate_tz
>>> parsedate_tz('Tue Jun 22 07:46:22 EST 2010')
(2010, 6, 22, 7, 46, 22, 0, 1, -1, -18000)

See solutions that yield timezone-aware datetime objects for various Python versions: parsing date with timezone from an email.

In this format, EST is semantically equivalent to -0500 (US Eastern Standard Time). In general, though, a timezone abbreviation is not enough to identify a timezone uniquely.
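
On Python 3.3+, email.utils.parsedate_to_datetime goes one step further and returns an aware datetime directly. A short sketch; the second half assumes, as the question states, that the data's EST really means Australian EST (+1000), which the email parser cannot know:

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

s = "Tue Jun 22 07:46:22 EST 2010"

# parsedate_to_datetime (Python 3.3+) returns an aware datetime directly.
dt = parsedate_to_datetime(s)
print(dt)  # 2010-06-22 07:46:22-05:00

# The email parser always reads "EST" as US Eastern (-0500). If the data
# really means Australian EST (+1000), keep the wall-clock fields and swap
# the tzinfo; replace() does not shift the hour.
dt_au = dt.replace(tzinfo=timezone(timedelta(hours=10)))
print(dt_au)  # 2010-06-22 07:46:22+10:00
```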


Answer 3

Ran into this exact problem.

What I ended up doing:

from datetime import datetime

import pendulum

# starting with date string
sdt = "20190901"
sdt_format = '%Y%m%d'

# create naive datetime object
dt = datetime.strptime(sdt, sdt_format)

# extract the relevant date time items
dt_formatters = ['%Y', '%m', '%d']
dt_vals = tuple(map(lambda formatter: int(datetime.strftime(dt, formatter)), dt_formatters))

# set timezone
tz = pendulum.timezone('UTC')

# rebuild the datetime with the timezone attached
dt_tz = datetime(*dt_vals, tzinfo=tz)
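
For a fixed zone such as UTC, the stdlib alone is enough: datetime.replace() attaches a tzinfo to the naive result, so the strftime round trip and the pendulum dependency are not needed. A sketch of that alternative (a swapped-in technique, not the approach above):

```python
from datetime import datetime, timezone

sdt = "20190901"
sdt_format = "%Y%m%d"

# strptime yields a naive datetime; replace() attaches tzinfo without
# changing any of the date/time fields.
dt_tz = datetime.strptime(sdt, sdt_format).replace(tzinfo=timezone.utc)
print(dt_tz)  # 2019-09-01 00:00:00+00:00
```

For named zones, a zoneinfo.ZoneInfo instance (Python 3.9+) can be passed to replace() the same way.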
